Results 1 - 7 of 7
Probabilistic Robot Navigation in Partially Observable Environments
- In Proceedings of IJCAI-95
, 1995
Cited by 231 (9 self)
Autonomous mobile robots need very reliable navigation capabilities in order to operate unattended for long periods of time. This paper reports on first results of a research program that uses partially observable Markov models to robustly track a robot's location in office environments and to direct its goal-oriented actions. The approach explicitly maintains a probability distribution over the possible locations of the robot, taking into account various sources of uncertainty, including approximate knowledge of the environment, and actuator and sensor uncertainty. A novel feature of our approach is its integration of topological map information with approximate metric information. We demonstrate the robustness of this approach in controlling an actual indoor mobile robot navigating corridors. 1 Introduction We are interested in the task of long-term autonomous navigation in an office environment (with corridors, foyers, and rooms). While the state of the art in autonomous office nav...
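The idea of maintaining a probability distribution over the robot's possible locations can be illustrated with a discrete Bayes filter. This is a minimal sketch, not the model from the paper: the four-cell corridor, the `p_move` and `p_hit` probabilities, and the function names `predict` and `correct` are all assumptions made up for illustration.

```python
def predict(belief, p_move=0.8):
    """Actuator uncertainty: a forward move succeeds with probability p_move,
    otherwise the robot stays put (hypothetical motion model)."""
    n = len(belief)
    new_belief = [0.0] * n
    for cell, prob in enumerate(belief):
        nxt = min(cell + 1, n - 1)          # the last cell is a dead end
        new_belief[nxt] += p_move * prob
        new_belief[cell] += (1.0 - p_move) * prob
    return new_belief


def correct(belief, observation, door_cells, p_hit=0.9):
    """Sensor uncertainty: the sensor reports the true feature ("door" or
    "wall") with probability p_hit (hypothetical sensor model)."""
    weighted = []
    for cell, prob in enumerate(belief):
        expected = "door" if cell in door_cells else "wall"
        likelihood = p_hit if observation == expected else 1.0 - p_hit
        weighted.append(likelihood * prob)
    total = sum(weighted)
    return [w / total for w in weighted]    # renormalize to a distribution


belief = [0.25, 0.25, 0.25, 0.25]           # robot is initially lost
belief = predict(belief)                    # one forward move
belief = correct(belief, "door", door_cells={2})
most_likely = max(range(len(belief)), key=belief.__getitem__)
```

After one move and one "door" reading, most of the probability mass concentrates on the cell that contains the door, while the other cells keep a small nonzero probability.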
Exact Solutions to Time-Dependent MDPs
- in Advances in Neural Information Processing Systems
, 2000
Cited by 52 (4 self)
We describe an extension of the Markov decision process model in which a continuous time dimension is included in the state space. This allows for the representation and exact solution of a wide range of problems in which transitions or rewards vary over time. We examine problems based on route planning with public transportation and telescope observation scheduling.
The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms: The proofs
, 1995
Cited by 45 (4 self)
Abstract. We analyze the complexity of on-line reinforcement-learning algorithms applied to goal-directed exploration tasks. Previous work had concluded that, even in deterministic state spaces, initially uninformed reinforcement learning was at least exponential for such problems, or that it was of polynomial worst-case time-complexity only if the learning methods were augmented. We prove that, to the contrary, the algorithms are tractable with only a simple change in the reward structure (“penalizing the agent for action executions”) or in the initialization of the values that they maintain. In particular, we provide tight complexity bounds for both Watkins’ Q-learning and Heger’s Q-hat-learning and show how their complexity depends on properties of the state spaces. We also demonstrate how one can decrease the complexity even further by either learning action models or utilizing prior knowledge of the topology of the state spaces. Our results provide guidance for empirical reinforcement-learning researchers on how to distinguish hard reinforcement-learning problems from easy ones and how to represent them in a way that allows them to be solved efficiently.
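The reward structure that "penalizes the agent for action executions" can be sketched with one-step Q-learning on a small deterministic chain. This is an illustrative toy, not the construction used in the proofs: the chain domain, the zero initialization, the learning rate of one, the absence of discounting, and the tie-breaking rule are assumptions chosen for the example, and `explore_chain` is a hypothetical name.

```python
def explore_chain(n_states, max_steps=10_000):
    """Greedy Q-learning on a chain 0..n_states-1 with the goal at the right end.

    Every action execution costs -1 (action-penalty representation) and all
    Q-values start at zero. Action 0 moves left, action 1 moves right; ties
    are broken toward "left", the unhelpful direction.
    """
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]
    state, steps = 0, 0
    while state != goal and steps < max_steps:
        action = 0 if q[state][0] >= q[state][1] else 1
        nxt = max(state - 1, 0) if action == 0 else state + 1
        # Deterministic domain, learning rate 1, no discounting.
        # The goal's Q-values are never updated, so they stay at 0.
        q[state][action] = -1.0 + max(q[nxt])
        state = nxt
        steps += 1
    return steps, q


steps, q = explore_chain(5)
```

Each executed action lowers its Q-value, so the greedy agent is pushed away from actions it has already tried and toward untried ones, which is what drives it to the goal without any explicit exploration noise.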
Sensor planning with non-linear utility functions
- In Proceedings of the Fifth European Conference on Planning (ECP-99)
, 1999
Cited by 7 (4 self)
Abstract. Sensor planning is concerned with when to sense and what to sense. We study sensor planning in the context of planning objectives that trade off between minimizing the worst-case, expected, and best-case plan-execution costs. Sensor planning with these planning objectives is interesting because they are realistic and the frequency of sensing changes with the planning objective: more pessimistic decision makers tend to sense more frequently. We perform sensor planning by combining one of our techniques for planning with non-linear utility functions with an existing sensor-planning method. The resulting sensor-planning method is not only as easy to implement as the sensor-planning method that it extends but also (almost) as efficient. We demonstrate empirically how sensor plans change as the planning objective changes, using a common testbed for sensor planning.
Risk-averse auction agents
- In
, 2003
Cited by 5 (0 self)
Auctions are an important means for purchasing material in the era of e-commerce. Research on auctions often studies them in isolation. In practice, however, auction agents are part of complete supply-chain management systems and have to make the same decisions as their human counterparts. To address this issue, we generalize results from auction theory in three ways. First, auction theory provides the optimal bidding function for the case where auction agents want to maximize the expected profit. Since companies are often risk-averse, we derive a closed form of the optimal bidding function for auction agents that maximize the expected utility of the profit for concave exponential utility functions. Second, auction theory often assumes that auction agents know the bidder’s valuation of an auctioned item. However, the valuation depends on how the item can be used in the production process. We therefore develop theoretical results that enable us to integrate our auction agents into production-planning systems to derive the bidder’s valuation automatically. Third, auction theory often assumes that the probability distribution over the competitors’ valuations of the auctioned item is known. We use simulations of the combined auction- and production-planning system to obtain crude approximations of these probability distributions automatically. The resulting auction agents are part of a complete supply-chain management system and seamlessly combine ideas from auction theory, utility theory, and dynamic programming.
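To make "concave exponential utility functions" concrete, the sketch below computes the certainty equivalent of an uncertain profit under the standard form u(x) = -exp(-gamma * x). It does not reproduce the paper's closed-form bidding function; the two-outcome lottery, the value of `gamma`, and the function name `certainty_equivalent` are assumptions for illustration only.

```python
import math


def certainty_equivalent(outcomes, gamma):
    """Sure profit that a risk-averse agent values as much as the lottery.

    outcomes: list of (probability, profit) pairs whose probabilities sum to 1.
    gamma:    risk-aversion parameter (> 0) of u(x) = -exp(-gamma * x).
    """
    expected_utility = sum(p * -math.exp(-gamma * x) for p, x in outcomes)
    # Invert u to map the expected utility back to a profit.
    return -math.log(-expected_utility) / gamma


# Hypothetical lottery: win the auction half the time for a profit of 10.
lottery = [(0.5, 0.0), (0.5, 10.0)]
expected_profit = sum(p * x for p, x in lottery)
ce = certainty_equivalent(lottery, gamma=0.5)
nearly_neutral = certainty_equivalent(lottery, gamma=1e-6)
```

The certainty equivalent falls below the expected profit as `gamma` grows, and approaches the expected profit as `gamma` goes to zero, which is the sense in which an expected-profit maximizer is the risk-neutral special case.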
The interaction of representations and planning objectives for decision-theoretic planning tasks
- Journal of Experimental and Theoretical Artificial Intelligence
, 2002
Cited by 2 (1 self)
We study decision-theoretic planning or reinforcement learning in the presence of traps such as steep slopes for outdoor robots or staircases for indoor robots. In this case, achieving the goal from the start is often the primary objective while minimizing the travel time is only of secondary importance. We study how this planning objective interacts with possible representations of the planning tasks, namely whether to use a discount factor that is one or smaller than one and whether to use the action-penalty or the goal-reward representation. We show that the action-penalty representation without discounting guarantees that the plan that maximizes the expected reward also achieves the goal from the start (provided that this is possible) but neither the action-penalty representation with discounting nor the goal-reward representation with discounting have this property. We then show exactly when this trapping phenomenon occurs, using a novel interpretation of discounting, namely that it models agents that use convex exponential utility functions and thus are optimistic in the face of uncertainty. Finally, we show how our Selective State-Deletion Method can be used in conjunction with standard decision-theoretic planners to eliminate the trapping phenomenon.
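The trapping phenomenon described in this abstract can be reproduced on a two-plan toy problem. This is a hedged sketch with made-up numbers, not an example from the article: a safe plan reaches the goal in `k` steps for certain, while a risky plan reaches it in one step with probability `p` and otherwise lands in a trap from which the goal is unreachable. The function names and parameter values are assumptions.

```python
def safe_value(k, gamma, step_reward, goal_reward):
    """Value of k deterministic steps; the goal reward arrives with the last one."""
    penalties = sum(gamma ** t * step_reward for t in range(k))
    return penalties + gamma ** (k - 1) * goal_reward


def risky_value(p, gamma, step_reward, goal_reward):
    """Value of a one-step gamble that ends at the goal or in a trap.

    Assumes step_reward <= 0. A trapped agent keeps paying step_reward forever.
    """
    if step_reward == 0:
        trapped = 0.0
    elif gamma < 1:
        trapped = step_reward / (1 - gamma)   # geometric series of penalties
    else:
        trapped = float("-inf")               # undiscounted penalties diverge
    return p * (step_reward + goal_reward) + (1 - p) * trapped


k, p = 10, 0.9
representations = {
    "action-penalty, undiscounted": (1.0, -1.0, 0.0),
    "action-penalty, discounted": (0.9, -1.0, 0.0),
    "goal-reward, discounted": (0.9, 0.0, 1.0),
}
prefers_safe = {
    name: safe_value(k, g, r_step, r_goal) > risky_value(p, g, r_step, r_goal)
    for name, (g, r_step, r_goal) in representations.items()
}
```

With these numbers only the undiscounted action-penalty representation prefers the safe plan; both discounted representations rank the gamble higher even though it risks never reaching the goal.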

