Results 11 - 20
of
1,489
Acting Optimally in Partially Observable Stochastic Domains
, 1994
"... In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the oper ..."
Abstract
-
Cited by 243 (16 self)
- Add to MetaCart
In this paper, we describe the partially observable Markov decision process (pomdp) approach to finding optimal or near-optimal control strategies for partially observable stochastic environments, given a complete model of the environment. The pomdp approach was originally developed in the operations research community and provides a formal basis for planning problems that have been of interest to the AI community. We found the existing algorithms for computing optimal control strategies to be highly computationally inefficient and have developed a new algorithm that is empirically more efficient. We sketch this algorithm and present preliminary results on several small problems that illustrate important properties of the pomdp approach.
Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents
- In Proceedings of the Tenth International Conference on Machine Learning
, 1993
"... Intelligent human agents exist in a cooperative social environment that facilitates learning. They learn not only by trialand -error, but also through cooperation by sharing instantaneous information, episodic experience, and learned knowledge. The key investigations of this paper are, "Given the sa ..."
Abstract
-
Cited by 220 (0 self)
- Add to MetaCart
Intelligent human agents exist in a cooperative social environment that facilitates learning. They learn not only by trialand -error, but also through cooperation by sharing instantaneous information, episodic experience, and learned knowledge. The key investigations of this paper are, "Given the same number of reinforcement learning agents, will cooperative agents outperform independent agents who do not communicate during learning?" and "What is the price for such cooperation?" Using independent agents as a benchmark, cooperative agents are studied in following ways: (1) sharing sensation, (2) sharing episodes, and (3) sharing learned policies. This paper shows that (a) additional sensation from another agent is beneficial if it can be used efficiently, (b) sharing learned policies or episodes among agents speeds up learning at the cost of communication, and (c) for joint tasks, agents engaging in partnership can significantly outperform independent agents although they may learn slo...
Experiences with an Interactive Museum Tour-Guide Robot
, 1998
"... This article describes the software architecture of an autonomous, interactive tour-guide robot. It presents a modular and distributed software architecture, which integrates localization, mapping, collision avoidance, planning, and various modules concerned with user interaction and Web-based telep ..."
Abstract
-
Cited by 217 (63 self)
- Add to MetaCart
This article describes the software architecture of an autonomous, interactive tour-guide robot. It presents a modular and distributed software architecture, which integrates localization, mapping, collision avoidance, planning, and various modules concerned with user interaction and Web-based telepresence. At its heart, the software approach relies on probabilistic computation, on-line learning, and any-time algorithms. It enables robots to operate safely, reliably, and at high speeds in highly dynamic environments, and does not require any modifications of the environment to aid the robot's operation. Special emphasis is placed on the design of interactive capabilities that appeal to people's intuition. The interface provides new means for human-robot interaction with crowds of people in public places, and it also provides people all around the world with the ability to establish a "virtual telepresence" using the Web. To illustrate our approach, results are reported obtained in mid-...
Tagging English Text with a Probabilistic Model
, 1994
"... In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used ..."
Abstract
-
Cited by 212 (0 self)
- Add to MetaCart
In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: using text that has been tagged by hand and computing relative frequency counts, using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle
AntNet: Distributed stigmergetic control for communications networks
- Journal of Artificial Intelligence Research
, 1998
"... This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurr ..."
Abstract
-
Cited by 205 (29 self)
- Add to MetaCart
This paper introduces AntNet, a novel approach to the adaptive learning of routing tables in communications networks. AntNet is a distributed, mobile agents based Monte Carlo system that was inspired by recent work on the ant colony metaphor for solving optimization problems. AntNet's agents concurrently explore the network and exchange collected information. The communication among the agents is indirect and asynchronous, mediated by the network itself. This form of communication is typical of social insects and is called stigmergy. We compare our algorithm with six state-of-the-art routing algorithms coming from the telecommunications and machine learning elds. The algorithms' performance is evaluated over a set of realistic testbeds. We run many experiments over real and arti cial IP datagram networks with increasing number of nodes and under several paradigmatic spatial and temporal tra c distributions. Results are very encouraging. AntNet showed superior performance under all the experimental conditions with respect to its competitors. We analyze the main characteristics of the algorithm and try to explain the reasons for its superiority. 1.
The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
- Machine Learning
, 1995
"... Abstract. Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and ap ..."
Abstract
-
Cited by 203 (8 self)
- Add to MetaCart
Abstract. Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game-theory and computational geometry to e ciently and adaptively concentrate high resolution only on critical areas. The currentversion of the algorithm is designed to nd feasible paths or trajectories to goal regions in high dimensional spaces. Future versions will be designed to nd a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in less than ten trials and a few minutes.
Learning and Sequential Decision Making
- LEARNING AND COMPUTATIONAL NEUROSCIENCE
, 1989
"... In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of squential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of ..."
Abstract
-
Cited by 185 (10 self)
- Add to MetaCart
In this report we show how the class of adaptive prediction methods that Sutton called "temporal difference," or TD, methods are related to the theory of squential decision making. TD methods have been used as "adaptive critics" in connectionist learning systems, and have been proposed as models of animal learning in classical conditioning experiments. Here we relate TD methods to decision tasks formulated in terms of a stochastic dynamical system whose behavior unfolds over time under the influence of a decision maker's actions. Strategies are sought for selecting actions so as to maximize a measure of long-term payoff gain. Mathematically, tasks such as this can be formulated as Markovian decision problems, and numerous methods have been proposed for learning how to solve such problems. We show how a TD method can be understood as a novel synthesis of concepts from the theory of stochastic dynamic programming, which comprises the standard method for solving such tasks when a model of the dynamical system is available, and the theory of parameter estimation, which provides the appropriate context for studying learning rules in the form of equations for updating associative strengths in behavioral models, or connection weights in connectionist networks. Because this report is oriented primarily toward the non-engineer interested in animal learning, it presents tutorials on stochastic sequential decision tasks, stochastic dynamic programming, and parameter estimation.
Rapidly-Exploring Random Trees: Progress and Prospects
- Algorithmic and Computational Robotics: New Directions
, 2000
"... this paper, which presents randomized, algorithmic techniques for path planning that are particular suited for problems that involve dierential constraints. ..."
Abstract
-
Cited by 185 (24 self)
- Add to MetaCart
this paper, which presents randomized, algorithmic techniques for path planning that are particular suited for problems that involve dierential constraints.
Collaborative Multi-Robot Exploration
, 2000
"... In this paper we consider the problem of exploring an unknown environment by a team of robots. As in single-robot exploration the goal is to minimize the overall exploration time. The key problem to be solved therefore is to choose appropriate target points for the individual robots so that they sim ..."
Abstract
-
Cited by 184 (30 self)
- Add to MetaCart
In this paper we consider the problem of exploring an unknown environment by a team of robots. As in single-robot exploration the goal is to minimize the overall exploration time. The key problem to be solved therefore is to choose appropriate target points for the individual robots so that they simultaneously explore different regions of their environment. We present a probabilistic approach for the coordination of multiple robots which, in contrast to previous approaches, simultaneously takes into account the costs of reaching a target point and the utility of target points. The utility of target points is given by the size of the unexplored area that a robot can cover with its sensors upon reaching a target position. Whenever a target point is assigned to a specific robot, the utility of the unexplored area visible from this target position is reduced for the other robots. This way, a team of multiple robots assigns different target points to the individual robots. The technique has...
Intelligent Scissors for Image Composition
- In Computer Graphics, SIGGRAPH Proceedings
, 1995
"... We present a new, interactive tool called Intelligent Scissors which we use for image segmentation and composition. Fully automated segmentation is an unsolved problem, while manual tracing is inaccurate and laboriously unacceptable. However, Intelligent Scissors allow objects within digital images ..."
Abstract
-
Cited by 184 (5 self)
- Add to MetaCart
We present a new, interactive tool called Intelligent Scissors which we use for image segmentation and composition. Fully automated segmentation is an unsolved problem, while manual tracing is inaccurate and laboriously unacceptable. However, Intelligent Scissors allow objects within digital images to be extracted quickly and accurately using simple gesture motions with a mouse. When the gestured mouse position comes in proximity to an object edge, a live-wire boundary “snaps ” to, and wraps around the object of interest. Live-wire boundary detection formulates discrete dynamic programming (DP) as a two-dimensional graph searching problem. DP provides mathematically optimal boundaries while greatly reducing sensitivity to local noise or other intervening structures. Robustness is further enhanced with on-the-fly training which causes the boundary to adhere to the specific type of edge currently being followed, rather than simply the strongest edge in the neighborhood. Boundary cooling automatically freezes unchanging segments and automates input of additional seed points. Cooling also allows the user to be much more free with the gesture path, thereby increasing the efficiency and finesse with which boundaries can be extracted. Extracted objects can be scaled, rotated, and composited using live-wire masks and spatial frequency equivalencing. Frequency equivalencing is performed by applying a Butterworth filter which matches the lowest frequency spectra to all other image components. Intelligent Scissors allow creation of convincing compositions from existing images while dramatically increasing the speed and precision with which objects can be extracted. 1.

