Results 11–20 of 167
Cross-entropy optimization for independent process analysis
, 2006
Abstract

Cited by 21 (16 self)
We treat the problem of searching for hidden multidimensional independent autoregressive processes. First, we transform the problem to Independent Subspace Analysis (ISA). Our main contribution concerns ISA. We show that under certain conditions, ISA is equivalent to a combinatorial optimization problem. For the solution of this optimization we apply the cross-entropy method. Numerical simulations indicate that the cross-entropy method can provide considerable improvements over other state-of-the-art methods.
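The combinatorial optimization step described above can be illustrated on a toy problem. The sketch below (plain NumPy, not the paper's ISA formulation; all names and the objective are illustrative) samples binary candidates from independent Bernoulli parameters, keeps an elite fraction, and moves the parameters toward the elite sample mean:

```python
import numpy as np

def cross_entropy_maximize(score, n_bits=10, pop=100, elite_frac=0.2,
                           iters=50, smooth=0.7, seed=0):
    """Generic cross-entropy search over binary vectors.

    Samples candidates from independent Bernoulli distributions,
    keeps the elite fraction, and moves the sampling parameters
    toward the elite sample mean (with smoothing)."""
    rng = np.random.default_rng(seed)
    p = np.full(n_bits, 0.5)              # initial sampling distribution
    n_elite = max(1, int(pop * elite_frac))
    best, best_score = None, -np.inf
    for _ in range(iters):
        samples = (rng.random((pop, n_bits)) < p).astype(int)
        scores = np.array([score(s) for s in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]
        p = smooth * elite.mean(axis=0) + (1 - smooth) * p  # smoothed update
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best, best_score = samples[i].copy(), scores[i]
    return best, best_score

# Toy objective: count of ones (optimum is the all-ones vector).
best, val = cross_entropy_maximize(lambda s: s.sum())
```

On this toy objective the sampling distribution concentrates on the all-ones vector within a few iterations; the smoothing factor keeps the parameters from collapsing prematurely.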
Learning Complementary Multiagent Behaviors: A Case
Abstract

Cited by 16 (6 self)
As machine learning is applied to increasingly complex tasks, it is likely that the diverse challenges encountered can only be addressed by combining the strengths of different learning algorithms. We examine this aspect of learning through a case study grounded in the robot soccer context. The task we consider is Keepaway, a popular benchmark for multiagent reinforcement learning from the simulation soccer domain. Whereas previous successful results in Keepaway have limited learning to an isolated, infrequent decision that amounts to a turn-taking behavior (passing), we expand the agents’ learning capability to include a much more ubiquitous action (moving without the ball, or getting open), such that at any given time, multiple agents are executing learned behaviors simultaneously. We introduce a policy search method for learning “GetOpen” to complement the temporal difference learning approach employed for learning “Pass”. Empirical results indicate that the learned GetOpen policy matches the best hand-coded policy for this task, and outperforms the best policy found when Pass is learned. We demonstrate that Pass and GetOpen can be learned simultaneously to realize tightly-coupled soccer team behavior.
An empirical analysis of value function-based and policy search reinforcement learning
 AAMAS ’09: Proceedings of the 8th international
, 2009
Abstract

Cited by 14 (6 self)
In several agent-oriented scenarios in the real world, an autonomous agent that is situated in an unknown environment must learn through a process of trial and error to take actions that result in long-term benefit. Reinforcement Learning (or sequential decision making) is a paradigm well-suited to this requirement. Value function-based methods and policy search methods are contrasting approaches to solving reinforcement learning tasks. While both classes of methods benefit from independent theoretical analyses, these often fail to extend to the practical situations in which the methods are deployed. We conduct an empirical study to examine the strengths and weaknesses of these approaches by introducing a suite of test domains that can be varied for problem size, stochasticity, function approximation, and partial observability. Our results indicate clear patterns in the domain characteristics for which each class of methods excels. We investigate whether their strengths can be combined, and develop an approach to achieve that purpose. The effectiveness of this approach is also demonstrated on the challenging benchmark task of robot soccer Keepaway. We highlight several lines of inquiry that emanate from this study.
Application of the Cross-Entropy Method to Clustering and Vector Quantization. Forthcoming in Journal of Global Optimization
, 2006
Abstract

Cited by 13 (3 self)
We apply the cross-entropy (CE) method to problems in clustering and vector quantization. The CE algorithm involves the following iterative steps: (a) the generation of clusters according to a certain parametric probability distribution, and (b) updating the parameters of this distribution according to the Kullback-Leibler cross-entropy. Through various numerical experiments we demonstrate the high accuracy of the CE algorithm and show that it can generate near-optimal clusters for fairly large data sets. We compare the CE method with well-known clustering and vector quantization methods such as K-means, fuzzy K-means and linear vector quantization, and apply each method to benchmark and image analysis data.
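The two iterative steps (a) and (b) can be sketched for a simple centroid-based clustering, assuming independent Gaussian sampling distributions over centroid positions (an illustrative parametrization, not necessarily the paper's):

```python
import numpy as np

def ce_cluster(data, k=2, pop=200, elite_frac=0.1, iters=40, seed=0):
    """Cross-entropy clustering sketch: (a) sample candidate centroid
    sets from Gaussians, score by within-cluster distortion, then
    (b) refit the Gaussians to the elite candidates."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    mu = np.tile(data.mean(axis=0), (k, 1))    # sampling means
    sigma = np.full((k, d), data.std(axis=0))  # sampling std devs
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        cands = mu + sigma * rng.standard_normal((pop, k, d))
        # Canonical centroid order (by first coordinate) removes the
        # label-swap ambiguity between equivalent candidate solutions.
        order = np.argsort(cands[..., 0], axis=1)
        cands = np.take_along_axis(cands, order[..., None], axis=1)
        # Distortion: each data point to its nearest candidate centroid.
        dists = np.linalg.norm(
            data[None, :, None, :] - cands[:, None, :, :], axis=-1)
        loss = dists.min(axis=2).sum(axis=1)
        elite = cands[np.argsort(loss)[:n_elite]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Two well-separated blobs; centroids should land near (0,0) and (5,5).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, .3, (50, 2)), rng.normal(5, .3, (50, 2))])
centers = ce_cluster(data, k=2)
```

The elite refit shrinks the sampling variance as the search concentrates, which is what lets CE settle on near-optimal centroids without a gradient.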
Generalized Cross-entropy Methods with Applications to Rare-event Simulation and Optimization
Efficient Selection of Multiple Bandit Arms: Theory and Practice
Abstract

Cited by 13 (4 self)
We consider the general, widely applicable problem of selecting from n real-valued random variables a subset of size m of those with the highest means, based on as few samples as possible. This problem, which we denote Explore-m, is a core aspect in several stochastic optimization algorithms, and applications of simulation and industrial engineering. The theoretical basis for our work is an extension of a previous formulation using multi-armed bandits that is devoted to identifying just the one best of n random variables (Explore-1). In addition to providing PAC bounds for the general case, we tailor our theoretically grounded approach to work efficiently in practice. Empirical comparisons of the resulting sampling algorithm against state-of-the-art subset selection strategies demonstrate significant gains in sample efficiency.
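For contrast with the adaptive sampling the abstract describes, a naive fixed-budget baseline for the Explore-m problem (pull every arm equally often, return the m largest empirical means) can be sketched as follows; the function names and the Bernoulli arm model are illustrative:

```python
import numpy as np

def select_top_m(sample_arm, n_arms, m, pulls_per_arm, seed=0):
    """Naive Explore-m baseline: pull every arm equally often and
    return the m arms with the highest empirical means. PAC-style
    algorithms adapt the sampling instead of fixing the budget;
    this version just illustrates the problem setup."""
    rng = np.random.default_rng(seed)
    means = np.array([
        np.mean([sample_arm(a, rng) for _ in range(pulls_per_arm)])
        for a in range(n_arms)])
    return set(np.argsort(means)[-m:].tolist())

# Five Bernoulli arms; arms 3 and 4 have the highest true means.
true_p = [0.1, 0.2, 0.3, 0.7, 0.9]
chosen = select_top_m(lambda a, rng: rng.random() < true_p[a],
                      n_arms=5, m=2, pulls_per_arm=500)
```

An adaptive method would stop sampling clearly suboptimal arms early, which is where the sample-efficiency gains reported in the abstract come from.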
Fitted Q-iteration by Advantage Weighted Regression
Abstract

Cited by 12 (6 self)
Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantage-weighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high-dimensional action spaces.
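The advantage-weighted regression idea (weight each state-action sample by an exponential of its advantage, then fit the policy by weighted regression) can be sketched with a linear-in-state policy; the linear model and the temperature `beta` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def advantage_weighted_regression(states, actions, advantages, beta=0.1):
    """Policy-improvement step as a weighted linear regression:
    samples with higher advantage get exponentially larger weight,
    approximating soft-greedy action selection without an explicit
    argmax over a continuous action space."""
    w = np.exp(advantages / beta)
    w /= w.sum()
    X = np.hstack([states, np.ones((len(states), 1))])  # bias term
    # Weighted least squares: minimize sum_i w_i * (a_i - X_i @ theta)^2
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ actions)
    return theta

# Toy data: the best action is roughly 2*s; advantage peaks there,
# so the fitted linear policy should recover a slope near 2.
rng = np.random.default_rng(0)
s = rng.uniform(-1, 1, (200, 1))
a = rng.uniform(-3, 3, (200, 1))
adv = -((a - 2 * s) ** 2).ravel()
theta = advantage_weighted_regression(s, a, adv)
```

Because the argmax is replaced by a closed-form weighted regression, the step stays cheap even when actions are continuous, which is the computational point the abstract makes.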
The Cross-Entropy Method for Policy Search in Decentralized POMDPs
, 2008
Abstract

Cited by 11 (9 self)
Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs. Povzetek (Slovenian abstract): The paper describes a new method for multiagent planning.
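The sampling loop the abstract describes (sample pure policies from a parametrized stochastic policy, evaluate them, refit toward the best) can be sketched for a stateless toy problem with per-observation categorical distributions; the toy objective stands in for a real Dec-POMDP policy evaluation, and all names are illustrative:

```python
import numpy as np

def ce_policy_search(evaluate, n_obs, n_act, pop=100, elite_frac=0.1,
                     iters=30, smooth=0.8, seed=0):
    """CE policy search sketch: sample pure policies (one action per
    observation) from per-observation categorical distributions, keep
    the elite, and shift the distributions toward the elite action
    frequencies."""
    rng = np.random.default_rng(seed)
    probs = np.full((n_obs, n_act), 1.0 / n_act)
    n_elite = max(1, int(pop * elite_frac))
    best, best_val = None, -np.inf
    for _ in range(iters):
        # Each row of `pols` is a pure policy: an action per observation.
        pols = np.array([[rng.choice(n_act, p=probs[o])
                          for o in range(n_obs)] for _ in range(pop)])
        vals = np.array([evaluate(p) for p in pols])
        elite = pols[np.argsort(vals)[-n_elite:]]
        for o in range(n_obs):
            freq = np.bincount(elite[:, o], minlength=n_act) / n_elite
            probs[o] = smooth * freq + (1 - smooth) * probs[o]
        i = int(np.argmax(vals))
        if vals[i] > best_val:
            best, best_val = pols[i].copy(), vals[i]
    return best, best_val

# Toy "evaluation": reward 1 per observation mapped to a target action.
target = np.arange(5) % 3
best, val = ce_policy_search(lambda p: float(np.sum(p == target)),
                             n_obs=5, n_act=3)
```

In the paper's setting the evaluation step is the expensive part (exact or approximate policy evaluation in the Dec-POMDP); the surrounding loop is exactly this sample-evaluate-refit cycle.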
ADVANCES IN DISTRIBUTED OPTIMIZATION USING PROBABILITY COLLECTIVES
 ADVANCES IN COMPLEX SYSTEMS
, 2006
Abstract

Cited by 10 (5 self)
Recent work has shown how information theory extends conventional full-rationality game theory to allow bounded rational agents. The associated mathematical framework can be used to solve distributed optimization and control problems. This is done by translating the distributed problem into an iterated game, where each agent’s mixed strategy (i.e., its stochastically determined move) sets a different variable of the problem. So the expected value of the objective function of the distributed problem is determined by the joint probability distribution across the moves of the agents. The mixed strategies of the agents are updated from one game iteration to the next so as to converge on a joint distribution that optimizes that expected value of the objective function. Here a set of new techniques for this updating is presented. These and older techniques are then extended to apply to uncountable move spaces. We also present an extension of the approach to include (in)equality constraints over the underlying variables. Another contribution is that we show how to extend the Monte Carlo version of the approach to cases where some agents have no Monte Carlo samples for some of their moves, and derive an “automatic annealing schedule”.
A Stochastic approximation method for inference in probabilistic graphical models
Abstract

Cited by 9 (0 self)
We describe a new algorithmic framework for inference in probabilistic models, and apply it to inference for latent Dirichlet allocation (LDA). Our framework adopts the methodology of variational inference, but unlike existing variational methods such as mean field and expectation propagation it is not restricted to tractable classes of approximating distributions. Our approach can also be viewed as a “population-based” sequential Monte Carlo (SMC) method, but unlike existing SMC methods there is no need to design the artificial sequence of distributions. Significantly, our framework offers a principled means to exchange the variance of an importance sampling estimate for the bias incurred through variational approximation. We conduct experiments on a difficult inference problem in population genetics, a problem that is related to inference for LDA. The results of these experiments suggest that our method can offer improvements in stability and accuracy over existing methods, and at a comparable cost.