Results 1-10 of 34
Covariate shift adaptation by importance weighted cross validation
, 2000
"... A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapo ..."
Abstract

Cited by 122 (55 self)
 Add to MetaCart
(Show Context)
A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when predictions are extrapolated outside the training region. The situation where the training and test input points follow different distributions while the conditional distribution of output values given input points is unchanged is called covariate shift. Under covariate shift, standard model selection techniques such as cross validation do not work as desired since their unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), which we prove to be unbiased even under covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of the proposed method is illustrated by simulations, and further demonstrated in a brain-computer interface, where strong nonstationarity effects can be seen between training and test sessions. © 2000 Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller.
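The importance-weighting idea behind IWCV can be sketched in a few lines. The setup below is our own toy construction, not the paper's experiments: training and test input densities are known 1-D Gaussians, so the importance w(x) = p_test(x)/p_train(x) is exact (in practice it must be estimated), and each held-out squared error is reweighted by w before averaging.

```python
import math
import random

random.seed(0)

def gauss_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Training inputs ~ p_train = N(1, 1); test inputs would follow
# p_test = N(2, 0.25).  The conditional p(y|x) is unchanged (covariate shift).
x = [random.gauss(1.0, 1.0) for _ in range(200)]
y = [math.sin(v) + 0.1 * random.gauss(0.0, 1.0) for v in x]
w = [gauss_pdf(v, 2.0, 0.5) / gauss_pdf(v, 1.0, 1.0) for v in x]  # importance

def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys)) / \
        sum((xi - mx) ** 2 for xi in xs)
    return a, my - a * mx

def iwcv_loss(k=5):
    """Importance-weighted k-fold CV estimate of the test-domain loss."""
    idx = list(range(len(x)))
    folds = [idx[i::k] for i in range(k)]
    losses = []
    for va in folds:
        held = set(va)
        tr = [i for i in idx if i not in held]
        a, b = fit_line([x[i] for i in tr], [y[i] for i in tr])
        # Weight each validation error by the importance of its input point.
        num = sum(w[i] * (a * x[i] + b - y[i]) ** 2 for i in va)
        losses.append(num / sum(w[i] for i in va))
    return sum(losses) / k

loss = iwcv_loss()
```

Comparing `iwcv_loss` across candidate models (instead of the ordinary unweighted CV loss) is what restores unbiased model selection under the shift.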
Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation
"... Covariate shift is a situation in supervised learning where training and test inputs follow different distributions even though the functional relation remains unchanged. A common approach to compensating for the bias caused by covariate shift is to reweight the training samples according to importa ..."
Abstract

Cited by 37 (21 self)
 Add to MetaCart
(Show Context)
Covariate shift is a situation in supervised learning where training and test inputs follow different distributions even though the functional relation remains unchanged. A common approach to compensating for the bias caused by covariate shift is to reweight the training samples according to importance, which is the ratio of test and training densities. We propose a novel method that allows us to directly estimate the importance from samples without going through the hard task of density estimation. An advantage of the proposed method is that the computation time is nearly independent of the number of test input samples, which is highly beneficial in recent applications with large numbers of unlabeled samples. We demonstrate through experiments that the proposed method is computationally more efficient than existing approaches with comparable accuracy.
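The paper's own estimator is not reproduced here; the sketch below illustrates the general direct-estimation idea with a different, well-known device: a logistic classifier that discriminates test from training samples yields the density ratio without ever estimating either density, since by Bayes' rule p_test(x)/p_train(x) = (n_train/n_test) * P(test|x)/P(train|x). All numbers are invented for the toy 1-D problem.

```python
import math
import random

random.seed(1)

# Toy 1-D covariate shift: training inputs ~ N(0,1), test inputs ~ N(1,1).
x_tr = [random.gauss(0.0, 1.0) for _ in range(500)]
x_te = [random.gauss(1.0, 1.0) for _ in range(500)]

# Logistic regression separating test (label 1) from training (label 0)
# samples, fit by plain batch gradient descent on the cross-entropy loss.
data = [(v, 0.0) for v in x_tr] + [(v, 1.0) for v in x_te]
a, b = 0.0, 0.0
for _ in range(1500):
    ga = gb = 0.0
    for v, t in data:
        p = 1.0 / (1.0 + math.exp(-(a * v + b)))
        ga += (p - t) * v   # gradient of cross-entropy w.r.t. a
        gb += (p - t)       # ... and w.r.t. b
    a -= 0.1 * ga / len(data)
    b -= 0.1 * gb / len(data)

def ratio(v):
    """Estimated importance p_test(v) / p_train(v) via the classifier."""
    p = 1.0 / (1.0 + math.exp(-(a * v + b)))
    return (len(x_tr) / len(x_te)) * p / (1.0 - p)
```

As in the paper's setting, the estimated ratio is larger where the test density dominates (here, larger x), and no intermediate density estimate is ever formed.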
Learning from scarce experience
In Proceedings of the Nineteenth International Conference on Machine Learning
, 2002
"... Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each change of the target policy, its value is estimated from the results of following that very policy. This requires a large ..."
Abstract

Cited by 31 (0 self)
 Add to MetaCart
(Show Context)
Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each change of the target policy, its value is estimated from the results of following that very policy. This requires a large number of interactions with the environment as different policies are considered. We present a family of algorithms based on likelihood ratio estimation that use data gathered when executing one policy (or collection of policies) to estimate the value of a different policy. The algorithms combine estimation and optimization stages. The former utilizes experience to build a nonparametric representation of the optimized function. The latter performs optimization on this estimate. We show positive empirical results and provide a sample complexity bound.
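The likelihood-ratio reuse of experience can be illustrated on a two-armed bandit (a deliberately minimal stand-in; the payoff probabilities and both policies below are invented): the value of a target policy is estimated from episodes generated by a different behavior policy by reweighting each observed reward with the ratio of action probabilities under the two policies.

```python
import random

random.seed(2)

# Two-armed bandit: arm 0 pays 1 with prob 0.3, arm 1 with prob 0.7.
behavior = [0.5, 0.5]   # policy that actually gathered the experience
target = [0.1, 0.9]     # policy whose value we want, without running it
payoff = [0.3, 0.7]

episodes = []
for _ in range(20000):
    arm = 0 if random.random() < behavior[0] else 1
    r = 1.0 if random.random() < payoff[arm] else 0.0
    episodes.append((arm, r))

# Likelihood-ratio (importance-sampling) estimate of the target value:
# V_hat = mean of r * pi_target(a) / pi_behavior(a) over behavior episodes.
v_hat = sum(r * target[a] / behavior[a] for a, r in episodes) / len(episodes)
# True value of the target policy: 0.1 * 0.3 + 0.9 * 0.7 = 0.66
```

The same reweighting extends to multi-step trajectories by taking the product of per-step action-probability ratios, which is what makes optimization over many candidate policies possible from one batch of experience.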
Reinforcement Learning by Policy Search
, 2000
"... One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations could be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are know ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
One objective of artificial intelligence is to model the behavior of an intelligent agent interacting with its environment. The environment's transformations can be modeled as a Markov chain, whose state is partially observable to the agent and affected by its actions; such processes are known as partially observable Markov decision processes (POMDPs). While the environment's dynamics are assumed to obey certain rules, the agent does not know them and must learn. In this dissertation we focus on the agent's adaptation as captured by the reinforcement learning framework. Reinforcement learning means learning a policy, a mapping of observations into actions, based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. The set of policies being searched is constrained by the architecture of the agent's controller. POMDPs require a controller to have a memory. We investigate various architectures for controllers with memory, including controllers with external memory, finite state controllers, and distributed controllers for multi-agent systems. For these various controllers we work out the details of algorithms which learn by ascending the gradient of expected cumulative reinforcement. Building on statistical learning theory and experiment design theory, a policy evaluation algorithm is developed for the case of experience reuse. We address the question of sufficient experience for uniform convergence of policy evaluation and obtain sample complexity bounds for various estimators. Finally, we demonstrate the performance of the proposed algorithms on several domains, the most complex of which is simulated adaptive packet routing in a telecommunication network.
Reinforcement Learning for Adaptive Routing
In Proceedings of the International Joint Conference on Neural Networks (IJCNN)
, 2002
"... Reinforcement learning means learning a policya mapping of observations into actions based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. We present an application of gradient a ..."
Abstract

Cited by 29 (0 self)
 Add to MetaCart
(Show Context)
Reinforcement learning means learning a policy, a mapping of observations into actions, based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. We present an application of a gradient ascent algorithm for reinforcement learning to the complex domain of packet routing in network communication, and compare the performance of this algorithm to other routing methods on a benchmark problem.
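Gradient ascent on expected reward can be sketched with a REINFORCE-style update on a softmax policy over two actions. This is a generic toy stand-in with invented reward probabilities, not the paper's routing domain or its exact algorithm:

```python
import math
import random

random.seed(3)

# Softmax policy over two actions, parameterized by theta.
theta = [0.0, 0.0]
reward = [0.2, 0.8]   # hypothetical success probability of each action

def probs():
    e = [math.exp(t) for t in theta]
    s = sum(e)
    return [v / s for v in e]

alpha = 0.1
for _ in range(3000):
    p = probs()
    a = 0 if random.random() < p[0] else 1
    r = 1.0 if random.random() < reward[a] else 0.0
    # REINFORCE: theta_i += alpha * r * d/dtheta_i log pi(a),
    # and for a softmax policy d/dtheta_i log pi(a) = 1[i == a] - p[i].
    for i in range(2):
        theta[i] += alpha * r * ((1.0 if i == a else 0.0) - p[i])
```

After training, the policy concentrates on the higher-reward action; in the routing setting each node would maintain such a policy over its outgoing links.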
A Survey of Multi-Objective Sequential Decision-Making
"... Sequential decisionmaking problems with multiple objectives arise naturally in practice and pose unique challenges for research in decisiontheoretic planning and learning, which has largely focused on singleobjective settings. This article surveys algorithms designed for sequential decisionmakin ..."
Abstract

Cited by 16 (5 self)
 Add to MetaCart
Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
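One of the distinctions the survey draws can be made concrete in a few lines. With hypothetical two-objective value vectors for four deterministic policies, linear scalarization can only ever return policies on the convex hull of the Pareto front, so a Pareto-optimal policy in a concave region of the front is never selected for any weight:

```python
# Hypothetical policies with two-objective value vectors (x, y); higher is better.
policies = {"A": (1.0, 5.0), "B": (3.0, 2.5), "C": (5.0, 1.0), "D": (2.0, 2.0)}

def dominates(u, v):
    """u Pareto-dominates v: at least as good everywhere, better somewhere."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

# Pareto front: policies not dominated by any other policy.
pareto = {k for k, v in policies.items()
          if not any(dominates(w, v) for w in policies.values())}

# Sweep the weight of a linear scalarization w*x + (1-w)*y and record
# every policy it can ever return: only convex-hull points appear.
hull = set()
for i in range(101):
    w = i / 100
    best = max(policies, key=lambda k: w * policies[k][0] + (1 - w) * policies[k][1])
    hull.add(best)
```

Here B is Pareto-optimal but lies below the segment from A to C, so `hull` contains only A and C; recovering B requires a nonlinear scalarization or a method that tracks the Pareto front directly, which is exactly the kind of scenario the taxonomy separates out.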
Learning decisions: Robustness, uncertainty, and approximation
, 2004
"... Decision making under uncertainty is a central problem in robotics and machine learning. This thesis explores three fundamental and intertwined aspects of the problem of learning to make decisions. The first is the problem of uncertainty. Classical optimal control techniques typically rely on perfec ..."
Abstract

Cited by 13 (3 self)
 Add to MetaCart
(Show Context)
Decision making under uncertainty is a central problem in robotics and machine learning. This thesis explores three fundamental and intertwined aspects of the problem of learning to make decisions. The first is the problem of uncertainty. Classical optimal control techniques typically rely on perfect state information. Real-world problems never enjoy such conditions. Perhaps more critically, classical optimal control algorithms fail to degrade gracefully as this assumption is violated. Closely tied to the problem of uncertainty is that of approximation. In large-scale problems, learning decisions inevitably requires approximation. The difficulties of approximation inside the framework of optimal control are well-known [Gordon, 1995]. Often, especially in robotics applications, we wish to operate learned controllers in domains where failure has relatively serious consequences. It is important to ensure that the decision policies we generate are robust both to uncertainty in our models of systems and to our inability to accurately capture true system dynamics. We present new classes of algorithms that gracefully handle uncertainty, approximation,
Intelligent Market-Making in Artificial Financial Markets
, 2003
"... This thesis describes and evaluates a marketmaking algorithm for setting prices in financial markets with asymmetric information, and analyzes the properties of artificial markets in which the algorithm is used. The core of our algorithm is a technique for maintaining an online probability density ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
(Show Context)
This thesis describes and evaluates a market-making algorithm for setting prices in financial markets with asymmetric information, and analyzes the properties of artificial markets in which the algorithm is used. The core of our algorithm is a technique for maintaining an online probability density estimate of the underlying value of a stock. Previous theoretical work on market-making has led to price-setting equations for which solutions cannot be achieved in practice, whereas empirical work on algorithms for market-making has focused on sets of heuristics and rules that lack theoretical justification. The algorithm presented in this thesis is theoretically justified by results in finance, and at the same time flexible enough to be easily extended by incorporating modules for dealing with considerations like portfolio risk and competition from other market-makers. We analyze the performance of our algorithm experimentally in artificial markets with different parameter settings and find that many reasonable real-world properties emerge. For example, the spread increases in response to uncertainty about the true value of a stock, average spreads tend to be higher in more volatile markets, and market-makers with lower average spreads perform better in environments with multiple competitive market-makers. In addition, the time series data generated by simple markets populated with market-makers using our algorithm replicate properties of real-world financial time series, such as volatility clustering and the fat-tailed nature of return distributions, without the need to specify explicit models for opinion propagation and herd behavior in the trading crowd.
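An "online probability density estimate of the underlying value" can be sketched as a discrete Bayesian update over candidate values. The order model below (a fixed fraction of traders is informed and buys only when the true value exceeds the quote) is our simplifying assumption for illustration, not the thesis's exact model:

```python
# Discrete belief over the stock's true value V, updated after each order.
values = [90, 100, 110]
belief = {v: 1 / 3 for v in values}   # uniform prior

def update(belief, order, quote, informed=0.4):
    """Bayes update of the belief after observing one order at `quote`.

    Assumed likelihood: an informed trader buys iff V > quote; an
    uninformed trader buys or sells with probability 1/2 each, so
    P(buy | V) = informed * 1[V > quote] + (1 - informed) * 0.5.
    """
    post = {}
    for v, p in belief.items():
        p_buy = informed * (1.0 if v > quote else 0.0) + (1 - informed) * 0.5
        like = p_buy if order == "buy" else 1.0 - p_buy
        post[v] = p * like
    z = sum(post.values())
    return {v: p / z for v, p in post.items()}

belief = update(belief, "buy", 100)   # a buy shifts mass toward higher values
```

A market-maker would then set bid and ask around the belief's mean, with the spread widening as the belief's variance (its uncertainty about V) grows, matching the spread behavior the abstract reports.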
Asymptotic Bayesian generalization error when training and test distributions are different
, 2007
"... In supervised learning, we commonly assume that training and test data are sampled from the same distribution. However, this assumption can be violated in practice and then standard machine learning techniques perform poorly. This paper focuses on revealing and improving the performance of Bayesian ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
(Show Context)
In supervised learning, we commonly assume that training and test data are sampled from the same distribution. However, this assumption can be violated in practice, and standard machine learning techniques then perform poorly. This paper focuses on revealing and improving the performance of Bayesian estimation when the training and test distributions are different. We formally analyze the asymptotic Bayesian generalization error and establish its upper bound under a very general setting. Our important finding is that lower-order terms, which can be ignored in the absence of a distribution change, play an important role under the distribution change. We also propose a novel variant of stochastic complexity which can be used for choosing an appropriate model and hyperparameters under a particular distribution change.
Modeling Stock Order Flows and Learning Market-Making from Data
AI Memo 2002-009, MIT
, 2002
"... Stock markets employ specialized traders, marketmakers, designed to provide liquidity and volume to the market by constantly supplying both supply and demand. In this paper, we demonstrate a novel method for modeling the market as a dynamic system and a reinforcement learning algorithm that learns ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
Stock markets employ specialized traders, market-makers, designed to provide liquidity and volume to the market by constantly supplying both supply and demand. In this paper, we demonstrate a novel method for modeling the market as a dynamic system, together with a reinforcement learning algorithm that learns profitable market-making strategies when run on this model. The sequence of buys and sells for a particular stock, the order flow, is modeled as an Input-Output Hidden Markov Model fit to historical data. When combined with the dynamics of the order book, this creates a highly nonlinear and difficult dynamic system. Our reinforcement learning algorithm, based on likelihood ratios, is run on this partially observable environment. We demonstrate learning results for two separate real stocks.