Results 1  10
of
29
A Bayesian framework for reinforcement learning
 In Proceedings of the Seventeenth International Conference on Machine Learning
, 2000
"... The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed tha ..."
Abstract

Cited by 76 (1 self)
 Add to MetaCart
The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed that the learning process estimates online the full posterior distribution over models. To determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to the hypothesis is obtained by dynamic programming. By using a different hypothesis for each trial appropriate exploratory and exploitative behavior is obtained. This Bayesian method always converges to the optimal policy for a stationary process with discrete states. 1.
Exploiting Structure to Efficiently Solve Large Scale Partially Observable Markov Decision Processes
, 2005
"... Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithm ..."
Abstract

Cited by 63 (6 self)
 Add to MetaCart
Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in realworld problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finitehorizon discrete POMDP is PSPACEcomplete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces. On the other hand,
Exploration of MultiState Environments: Local Measures and BackPropagation of Uncertainty
, 1998
"... . This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Qlearning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The rst is to dene a ..."
Abstract

Cited by 44 (1 self)
 Add to MetaCart
. This paper presents an action selection technique for reinforcement learning in stationary Markovian environments. This technique may be used in direct algorithms such as Qlearning, or in indirect algorithms such as adaptive dynamic programming. It is based on two principles. The rst is to dene a local measure of the uncertainty using the theory of bandit problems. We show that such a measure suers from several drawbacks. In particular, a direct application of it leads to algorithms of low quality that can be easily misled by particular congurations of the environment. The second basic principle was introduced to eliminate this drawback. It consists of assimilating the local measures of uncertainty to rewards, and backpropagating them with the dynamic programming or temporal dierence mechanisms. This allows reproducing globalscale reasoning about the uncertainty, using only local measures of it. Numerical simulations clearly show the eciency of these propositions. Keywords: ...
Bayesian sparse sampling for online reward optimization
 In ICML ’05: Proceedings of the 22nd international conference on Machine learning
, 2005
"... We present an efficient “sparse sampling ” technique for approximating Bayes optimal decision making in reinforcement learning, addressing the well known exploration versus exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making whil ..."
Abstract

Cited by 37 (5 self)
 Add to MetaCart
We present an efficient “sparse sampling ” technique for approximating Bayes optimal decision making in reinforcement learning, addressing the well known exploration versus exploitation tradeoff. Our approach combines sparse sampling with Bayesian exploration to achieve improved decision making while controlling computational cost. The idea is to grow a sparse lookahead tree, intelligently, by exploiting information in a Bayesian posterior—rather than enumerate action branches (standard sparse sampling) or compensate myopically (value of perfect information). The outcome is a flexible, practical technique for improving action selection in simple reinforcement learning scenarios. 1.
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
 Proc. IEEE
, 2000
"... Recent advances in automatic speech recognition are accomplished by designing a plugin maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and ..."
Abstract

Cited by 27 (4 self)
 Add to MetaCart
Recent advances in automatic speech recognition are accomplished by designing a plugin maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximumlikelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for highperformance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
Learning Models for Robot Navigation
, 1998
"... Hidden Markov models (hmms) and partially observable Markov decision processes (pomdps) provide a useful tool for modeling dynamical systems. They are particularly useful for representing environments such as road networks and office buildings, which are typical for robot navigation and planning. Th ..."
Abstract

Cited by 26 (2 self)
 Add to MetaCart
Hidden Markov models (hmms) and partially observable Markov decision processes (pomdps) provide a useful tool for modeling dynamical systems. They are particularly useful for representing environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information into both the models and the algorithm that learns them. By taking advantage of such information, learning hmms/pomdps can be made better and require fewer iterations, while being robust in the face of data reduction. That is, the performance of our algorithm does not significantly deteriorate as the training sequences provided to it become significantly shorter. Formal proofs for the convergence of the algorithm to a local maximum of the likelihood function are provided. Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach....
Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition
"... In this paper a theoretical framework for Bayesian adaptive learning of discrete HMM and semicontinuous one with Gaussian mixture state observation densities is presented. Corresponding to the wellknown BaumWelch and segmental kmeans algorithms respectively for HMM training, formulations of MAP ..."
Abstract

Cited by 26 (4 self)
 Add to MetaCart
In this paper a theoretical framework for Bayesian adaptive learning of discrete HMM and semicontinuous one with Gaussian mixture state observation densities is presented. Corresponding to the wellknown BaumWelch and segmental kmeans algorithms respectively for HMM training, formulations of MAP (maximum aposteriori) and segmental MAP estimation of HMM parameters are developed. Furthermore, a computationally efficient method of the segmental quasiBayes estimation for semicontinuous HMM is also presented. The important issue of prior density estimation is discussed and a simplified method of moment estimate is given. The method proposed in this paper will be applicable to some problems in HMM training for speech recognition such as sequential or batch training, model adaptation, and parameter smoothing, etc.
Percentile Optimization for Markov Decision Processes with Parameter Uncertainty
 Operations Research
"... doi 10.1287/opre.1080.0685 ..."
The AdvisorPOMDP: A Principled Approach to Trust through Reputation in Electronic Markets
 In PST
, 2005
"... This paper examines approaches to representing uncertainty in reputation systems for electronic markets with the aim of constructing a decision theoretic framework for collecting information about selling agents and making purchase decisions in the context of a social reputation system. A selection ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
This paper examines approaches to representing uncertainty in reputation systems for electronic markets with the aim of constructing a decision theoretic framework for collecting information about selling agents and making purchase decisions in the context of a social reputation system. A selection of approaches to representing reputation using DempsterShafter Theory and Bayesian probability are surveyed and a model for collecting and using reputation is developed using a Partially Observable Markov Decision Process.
A LEARNING ALGORITHM FOR COMMUNICATING MARKOV DECISION PROCESSES WITH UNKNOWN TRANSITION MATRICES
, 2007
"... This study is concerned with finite Markov decision processes (MDPs) whose state are exactly observable but its transition matrix is unknown. We develop a learning algorithm of the rewardpenalty type for the communicating case of multichain MDPs. An adaptively optimal policy and an asymptotic sequ ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
This study is concerned with finite Markov decision processes (MDPs) whose state are exactly observable but its transition matrix is unknown. We develop a learning algorithm of the rewardpenalty type for the communicating case of multichain MDPs. An adaptively optimal policy and an asymptotic sequence of adaptive policies with nearly optimal properties are constructed under the average expected reward criterion. Also, a numerical experiment is given to show the practical effectiveness of the algorithm.