Results 11 - 20
of
239
Computer Go: an AI Oriented Survey
- Artificial Intelligence
, 2001
"... Since the beginning of AI, mind games have been studied as relevant application fields. Nowadays, some programs are better than human players in most classical games. Their results highlight the efficiency of AI methods that are now quite standard. Such methods are very useful to Go programs, bu ..."
Abstract
-
Cited by 68 (17 self)
- Add to MetaCart
Since the beginning of AI, mind games have been studied as relevant application fields. Nowadays, some programs are better than human players in most classical games. Their results highlight the efficiency of AI methods that are now quite standard. Such methods are very useful to Go programs, but they do not enable a strong Go program to be built. The problems related to Computer Go require new AI problem solving methods. Given the great number of problems and the diversity of possible solutions, Computer Go is an attractive research domain for AI. Prospective methods of programming the game of Go will probably be of interest in other domains as well. The goal of this paper is to present Computer Go by showing the links between existing studies on Computer Go and different AI related domains: evaluation function, heuristic search, machine learning, automatic knowledge generation, mathematical morphology and cognitive science. In addition, this paper describes both the practical aspects of Go programming, such as program optimization, and various theoretical aspects such as combinatorial game theory, mathematical morphology, and MonteCarlo methods. B. Bouzy T. Cazenave page 2 08/06/01 1.
An analytic solution to discrete bayesian reinforcement learning
- In Proc. ICML
, 2006
"... Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving this goal because the amount of exploration required is often too costly and/or too time consuming ..."
Abstract
-
Cited by 53 (6 self)
- Add to MetaCart
Reinforcement learning (RL) was originally proposed as a framework to allow agents to learn in an online fashion as they interact with their environment. Existing RL algorithms come short of achieving this goal because the amount of exploration required is often too costly and/or too time consuming for online learning. As a result, RL is mostly used for offline learning in simulated environments. We propose a new algorithm, called BEETLE, for effective online learning that is computationally efficient while minimizing the amount of exploration. We take a Bayesian model-based approach, framing RL as a partially observable Markov decision process. Our two main contributions are the analytical derivation that the optimal value function is the upper envelope of a set of multivariate polynomials, and an efficient pointbased value iteration algorithm that exploits this simple parameterization. 1.
Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales
- Journal of Artificial Intelligence Research
, 1998
"... Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion o ..."
Abstract
-
Cited by 51 (7 self)
- Add to MetaCart
Learning, planning, and representing knowledge at multiple levels of temporal abstraction are key challenges for AI. In this paper we develop an approach to these problems based on the mathematical framework of reinforcement learning and Markov decision processes (MDPs). We extend the usual notion of action to include options—whole courses of behavior that may be temporally extended, stochastic, and contingent on events. Examples of options include picking up an object, going to lunch, and traveling to a distant city, as well as primitive actions such as muscle twitches and joint torques. Options may be given a priori, learned by experience, or both. They may be used interchangeably with actions in a variety of planning and learning methods. The theory of semi-Markov decision processes (SMDPs) can be applied to model the consequences of options and as a basis for planning and learning methods using them. In this paper we develop these connections, building on prior work by Bradtke and Duff (1995), Parr (in prep.) and others. Our main novel results concern the interface between the MDP and SMDP levels of analysis. We show how a set of options can be altered by changing only their termination conditions
Computer Evolution of Buildable Objects
"... Creating artificial life forms through evolutionary robotics faces a "chicken and egg" problem: learning to control a complex body is dominated by inductive biases specific to its sensors and effectors, while building a body which is controllable is conditioned on the pre-existence of a brain. The i ..."
Abstract
-
Cited by 44 (10 self)
- Add to MetaCart
Creating artificial life forms through evolutionary robotics faces a "chicken and egg" problem: learning to control a complex body is dominated by inductive biases specific to its sensors and effectors, while building a body which is controllable is conditioned on the pre-existence of a brain. The idea of co-evolution of bodies and brains is becoming popular, but little work has been done in evolution of physical structure because of the lack of a general framework for doing it. Evolution of creatures in simulation has been constrained by the "reality gap" which implies that resultant objects are usually not buildable. The work we present takes a step in the problem of body evolution by applying evolutionary techniques to the design of structures assembled out of parts. Evolution takes place in a simulator we designed, which computes forces and stresses and predicts failure for 2-dimensional Lego structures. The final printout of our program is a schematic assembly, which can then be b...
Learning to Take Actions
, 1998
"... We formalize a model for supervised learning of action strategies in dynamic stochastic domains and show that PAC-learning results on Occam algorithms hold in this model as well. We then identify a class of rule-based action strategies for which polynomial time learning is possible. The representati ..."
Abstract
-
Cited by 43 (8 self)
- Add to MetaCart
We formalize a model for supervised learning of action strategies in dynamic stochastic domains and show that PAC-learning results on Occam algorithms hold in this model as well. We then identify a class of rule-based action strategies for which polynomial time learning is possible. The representation of strategies is a generalization of decision lists; strategies include rules with existentially quantified conditions, simple recursive predicates, and small internal state, but are syntactically restricted. We also study the learnability of hierarchically composed strategies where a subroutine already acquired can be used as a basic action in a higher level strategy. We prove some positive results in this setting, but also show that in some cases the hierarchical learning problem is computationally hard. 1 Introduction We formalize a model for supervised learning of action strategies in dynamic stochastic domains, and study the learnability of strategies represented by rule-based syste...
Learning to Drive a Bicycle using Reinforcement Learning and Shaping
, 1998
"... We present and solve a real-world problem of learning to drive a bicycle. We solve the problem by online reinforcement learning using the Sarsa()-algorithm. Then we solve the composite problem of learning to balance a bicycle and then drive to a goal. In our approach the reinforcement function is in ..."
Abstract
-
Cited by 42 (3 self)
- Add to MetaCart
We present and solve a real-world problem of learning to drive a bicycle. We solve the problem by online reinforcement learning using the Sarsa()-algorithm. Then we solve the composite problem of learning to balance a bicycle and then drive to a goal. In our approach the reinforcement function is independent of the task the agent tries to learn to solve. 1 Introduction Here we consider the problem of learning to balance on a bicycle. Having done this we want to drive the bicycle to a goal. The second problem is not as straightforward as it may seem. The learning agent has to solve two problems at the same time: Balancing on the bicycle and driving to a specific place. Recently, ideas from behavioural psychology have been adapted by reinforcement learning to solve this type of problem. We will return to this in section 3. In reinforcement learning an agent interacts with an environment or a system. At each time step the agent receives information on the state of the system and chooses ...
Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning
- Proc. of the 20th International Conference on Machine Learning
, 2003
"... We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We de ne a probabilistic generative model for the value function by imposing a Gaussian prior over value functions and assuming a Gaussian noise model. ..."
Abstract
-
Cited by 42 (7 self)
- Add to MetaCart
We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We de ne a probabilistic generative model for the value function by imposing a Gaussian prior over value functions and assuming a Gaussian noise model.
Interactive Drama, Art and Artificial Intelligence
, 2002
"... This research was funded in part through fellowships from the Litton and Intel Corporations. Any opinions, findings and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect those of the sponsors. ..."
Abstract
-
Cited by 40 (5 self)
- Add to MetaCart
This research was funded in part through fellowships from the Litton and Intel Corporations. Any opinions, findings and conclusions or recommendations expressed in this publication are those of the author and do not necessarily reflect those of the sponsors.
Efficiently exploring architectural design spaces via predictive modeling
- in Proc. of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems
, 2006
"... Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simul ..."
Abstract
-
Cited by 40 (4 self)
- Add to MetaCart
Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate, confident predictive design-space models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models produce highly accurate performance estimates for other points in the space, can be queried to predict performance impacts of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions. We validate our approach via sensitivity studies on memory hierarchy and CPU design spaces: our models generally predict IPC with only 1-2% error and reduce required simulation by two orders of magnitude. We also show the efficacy of our technique for exploring chip multiprocessor (CMP) design spaces: when trained on a 1 % sample drawn from a CMP design space with 250K points and up to 55× performance swings among different system configurations, our models predict performance with only 4-5 % error on average. Our approach combines with techniques to reduce time per simulation, achieving net time savings of three-four orders of magnitude.

