Results 1–10 of 23
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
 Machine Learning Journal 71:89–129, 2008
Value-iteration based fitted policy iteration: learning with a single trajectory
 In IEEE ADPRL, 2007
Cited by 12 (4 self)
Abstract — We consider batch reinforcement learning problems in continuous space, expected total discounted-reward Markovian Decision Problems, when the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is policy iteration, where in successive iterations the action-value functions of the intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced, thereby significantly extending the scope of previous results.
Convergence and consistency of regularized boosting algorithms with stationary β-mixing observations
 In NIPS, 2006
Cited by 12 (0 self)
Abstract:
We study the statistical convergence and consistency of regularized boosting methods where the samples are not independent and identically distributed (i.i.d.) but come from empirical processes of stationary β-mixing sequences. Utilizing a technique that constructs a sequence of independent blocks close in distribution to the original samples, we prove the consistency of the composite classifiers resulting from a regularization achieved by restricting the 1-norm of the base classifiers' weights. When compared to the i.i.d. case, the nature of sampling manifests in the consistency result only through a generalization of the original condition on the growth of the regularization parameter.
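The independent-blocks construction mentioned in this abstract partitions the dependent sample into consecutive blocks and keeps every other block; for a β-mixing process, the kept blocks are close in distribution to mutually independent ones, so i.i.d.-style arguments can be applied to them. A minimal illustrative sketch of the splitting step only (the block length and helper name are generic choices for illustration, not taken from the cited paper):

```python
# Sketch of the blocking step used in analyses of learning from
# beta-mixing sequences: split the sample into consecutive blocks of
# length `block_len` and keep the blocks at even positions 0, 2, 4, ...
# For beta-mixing processes, the kept blocks are close in distribution
# to a collection of mutually independent blocks.

def odd_blocks(samples, block_len):
    """Partition `samples` into consecutive blocks of `block_len` and
    return every other block; trailing leftovers are discarded."""
    n_blocks = len(samples) // block_len
    blocks = [samples[i * block_len:(i + 1) * block_len]
              for i in range(n_blocks)]
    return blocks[::2]

# Example: 12 samples, block length 3 -> four blocks; keep blocks 0 and 2.
kept = odd_blocks(list(range(12)), 3)
```

The discarded in-between blocks act as "spacers": their length controls how small the β-mixing coefficient between kept blocks is, which is where the mixing rate enters the resulting bounds.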
Methods and techniques of complex systems science: An overview, 2003
Cited by 11 (0 self)
Abstract:
In this chapter, I review the main methods and techniques of complex systems science. As a ...
Dynamics of Bayesian updating with dependent data and misspecified models, 2009
Cited by 10 (3 self)
Abstract:
Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in nonparametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or “Shannon-McMillan-Breiman”) property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the “replicator dynamics” of evolutionary theory.
Learning from dependent observations, 2006
Cited by 7 (3 self)
Abstract:
In most papers establishing consistency for learning algorithms it is assumed that the observations used for training are realizations of an i.i.d. process. In this paper we go far beyond this classical framework by showing that support vector machines (SVMs) essentially only require that the data-generating process satisfies a certain law of large numbers. We then consider the learnability of SVMs for α-mixing (not necessarily stationary) processes for both classification and regression, where for the latter we explicitly allow unbounded noise. Keywords: support vector machine, consistency, non-stationary mixing process, classification, regression.
Rademacher Complexity Bounds for Non-I.I.D. Processes
Cited by 6 (0 self)
Abstract:
This paper presents the first Rademacher-complexity-based error bounds for non-i.i.d. settings, a generalization of similar existing bounds derived for the i.i.d. case. Our bounds hold in the scenario of dependent samples generated by a stationary β-mixing process, which is commonly adopted in many previous studies of non-i.i.d. settings. They benefit from the crucial advantages of Rademacher complexity over other measures of the complexity of hypothesis classes. In particular, they are data-dependent and measure the complexity of a class of hypotheses based on the training sample. The empirical Rademacher complexity can be estimated from such finite samples and leads to tighter generalization bounds. We also present the first margin bounds for kernel-based classification in this non-i.i.d. setting and briefly study their convergence.
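The data-dependence the abstract highlights means the empirical Rademacher complexity can be computed from a sample alone, by averaging the best correlation with random sign vectors over many draws. A generic Monte Carlo sketch for a finite hypothesis class (the toy class, sample, and trial count are invented for illustration and are not from the paper):

```python
import random

def empirical_rademacher(hypotheses, sample, n_trials=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity
        R_hat = E_sigma[ sup_h (1/m) * sum_i sigma_i * h(x_i) ],
    where sigma_i are i.i.d. uniform {-1, +1} signs, for a finite set
    of hypotheses evaluated on a fixed sample of size m."""
    rng = random.Random(seed)
    m = len(sample)
    total = 0.0
    for _ in range(n_trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * h(x) for s, x in zip(sigma, sample)) / m
                     for h in hypotheses)
    return total / n_trials

# Toy example: the two constant classifiers {+1, -1} on a 4-point sample.
# The supremum then equals |mean(sigma)|, whose expectation for m = 4
# is 0.375, so the estimate should land near that value.
hs = [lambda x: 1.0, lambda x: -1.0]
r_hat = empirical_rademacher(hs, sample=[0, 1, 2, 3])
```

Plugging such an estimate into a data-dependent bound replaces a worst-case capacity term (e.g. a VC-dimension term) with a quantity measured on the actual training set, which is what makes the resulting bounds tighter.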
Finite-Sample Analysis of Least-Squares Policy Iteration
 Journal of Machine Learning Research (JMLR), 2011
Cited by 5 (1 self)
Abstract:
In this paper, we report a performance bound for the widely used least-squares policy iteration (LSPI) algorithm. We first consider the problem of policy evaluation in reinforcement learning, that is, learning the value function of a fixed policy using the least-squares temporal-difference (LSTD) learning method, and report a finite-sample analysis for this algorithm. To do so, we first derive a bound on the performance of the LSTD solution evaluated at the states generated by the Markov chain and used by the algorithm to learn an estimate of the value function. This result is general in the sense that no assumption is made on the existence of a stationary distribution for the Markov chain. We then derive generalization bounds in the case when the Markov chain possesses a stationary distribution and is β-mixing. Finally, we analyze how the error at each policy-evaluation step is propagated through the iterations of a policy iteration method, and derive a performance bound for the LSPI algorithm.
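LSTD, the policy-evaluation method analyzed above, fits a linear value function V(s) = w·φ(s) by solving a linear system built from observed transitions. A minimal sketch in the single-feature case, where the system collapses to scalars (the toy chain and feature map are invented for illustration; this is not the paper's code):

```python
# Minimal LSTD sketch with one feature phi(s), so the usual system
# A w = b is scalar: from observed transitions (s, r, s'),
#   A = sum_t phi(s_t) * (phi(s_t) - gamma * phi(s'_t))
#   b = sum_t phi(s_t) * r_t
# and the fitted weight is w = b / A, giving V(s) = w * phi(s).

def lstd_1d(transitions, phi, gamma):
    """Scalar LSTD: `transitions` is a list of (s, r, s_next) tuples."""
    A = sum(phi(s) * (phi(s) - gamma * phi(s2)) for s, r, s2 in transitions)
    b = sum(phi(s) * r for s, r, s2 in transitions)
    return b / A

# Toy chain: state 0 always transitions to itself with reward 1, so the
# true discounted value is 1 / (1 - gamma) = 10 for gamma = 0.9.
transitions = [(0, 1.0, 0)] * 10
w = lstd_1d(transitions, phi=lambda s: 1.0, gamma=0.9)  # V(0) = w * phi(0)
```

With d features the same estimates become a d×d matrix A and d-vector b solved for w, and the finite-sample analysis in the paper bounds how far this sample-based solution is from the fixed point of the projected Bellman operator.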
Probably approximately correct learning with beta-mixing input sequences, private communication
Cited by 4 (0 self)
Abstract. In this paper, we study the behaviour of PAC learning algorithms when the input sequence is not i.i.d., but is β-mixing instead. A meta-theorem is proved, showing that if an algorithm is (i) PAC when the inputs are i.i.d., and (ii) ‘subadditive’ in a sense defined in the paper, then the same algorithm continues to be PAC even with β-mixing inputs. It is shown that if a function family is distribution-free learnable or consistently learnable with i.i.d. inputs, then every consistent algorithm is PAC even when the input sequence is β-mixing. Explicit quantitative estimates are derived for the learning rates with β-mixing inputs, in terms of the learning rates with i.i.d. inputs and the β-mixing coefficients of the input sequence. Finally, it is shown that a large class of Markov chains has the β-mixing property. Hence the results derived here have wide applicability.
Stability bounds for non-i.i.d. processes
 In Advances in Neural Information Processing Systems, 2007
Cited by 4 (1 self)
Abstract:
The notion of algorithmic stability has been used effectively in the past to derive tight generalization bounds. A key advantage of these bounds is that they are designed for specific learning algorithms, exploiting their particular properties. But, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed (i.i.d.). In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence, as is clear in system diagnosis or time-series prediction problems. This paper studies the scenario where the observations are drawn from a stationary mixing sequence, which implies a dependence between observations that weakens over time. It proves novel stability-based generalization bounds that hold even in this more general setting. These bounds strictly generalize the bounds given in the i.i.d. case. It also illustrates their application to several general classes of learning algorithms, including Support Vector Regression and Kernel Ridge Regression.