Results 1–10 of 23
Infinite-horizon policy-gradient estimation
 Journal of Artificial Intelligence Research
, 2001
Abstract

Cited by 155 (5 self)
Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm's chief advantages are that it requires storage of only twice the number of policy parameters, uses one free parameter β ∈ [0, 1) (which has a natural interpretation in terms of bias-variance trade-off), and requires no knowledge of the underlying state. We prove convergence of GPOMDP, and show how the correct choice of the parameter is related to the mixing time of the controlled POMDP. We briefly describe extensions of GPOMDP to controlled Markov chains, continuous state, observation and control spaces, multiple agents, higher-order derivatives, and a version for training stochastic policies with internal states. In a companion paper (Baxter, Bartlett, & Weaver, 2001) we show how the gradient estimates generated by GPOMDP can be used in both a traditional stochastic gradient algorithm and a conjugate-gradient procedure to find local optima of the average reward.
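The eligibility-trace estimator this abstract describes can be sketched in a few lines. The toy two-state MDP, the function name gpomdp_estimate, and all hyperparameters below are illustrative assumptions, not the paper's code; the sketch only shows the shape of the update:

```python
import numpy as np

# Minimal GPOMDP-style sketch on a toy 2-state, 2-action MDP (assumed, not
# from the paper). Note the storage: just z and delta, i.e. twice the number
# of policy parameters, matching the claim in the abstract.
def gpomdp_estimate(theta, beta=0.9, T=5000, seed=0):
    rng = np.random.default_rng(seed)
    # P[s, a] = next-state distribution; r[s] = reward received in state s.
    P = np.array([[[0.9, 0.1], [0.1, 0.9]],
                  [[0.8, 0.2], [0.3, 0.7]]])
    r = np.array([1.0, 0.0])
    s = 0
    z = np.zeros_like(theta)      # eligibility trace
    delta = np.zeros_like(theta)  # running gradient estimate
    for t in range(1, T + 1):
        # Softmax policy over the two actions in the current state.
        logits = theta[s] - theta[s].max()
        pi = np.exp(logits) / np.exp(logits).sum()
        a = rng.choice(2, p=pi)
        # grad of log pi(a|s) w.r.t. theta: e_a - pi on row s, zero elsewhere.
        grad_log = np.zeros_like(theta)
        grad_log[s] = -pi
        grad_log[s, a] += 1.0
        # Discounted eligibility trace; beta trades bias against variance.
        z = beta * z + grad_log
        s = rng.choice(2, p=P[s, a])
        # Running average of r_t * z_t estimates the average-reward gradient.
        delta += (r[s] * z - delta) / t
    return delta

grad = gpomdp_estimate(np.zeros((2, 2)))
```

Larger β reduces the bias of the estimate (the trace looks further back along the sample path) at the cost of higher variance, which is the trade-off the abstract refers to.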
Direct gradient-based reinforcement learning: I. Gradient estimation algorithms
 National University
, 1999
Abstract

Cited by 64 (3 self)
In [2] we introduced GPOMDP, an algorithm for computing arbitrarily accurate approximations to the performance gradient of parameterized partially observable Markov decision processes (POMDPs). The algorithm's chief advantages are that it requires only a single sample path of the underlying Markov chain, it uses only one free parameter β ∈ [0, 1) which has a natural interpretation in terms of bias-variance trade-off, and it requires no knowledge of the underlying state. In addition, the algorithm can be applied to infinite state, control and observation spaces.
Stochastic processes as concurrent constraint programs
 In Symposium on Principles of Programming Languages
, 1999
Abstract

Cited by 30 (1 self)
Vineet Gupta (vgupta@mail.arc.nasa.gov), Caelum Research Corporation, NASA Ames Research Center, Moffett Field, CA 94035, USA; Radha Jagadeesan (radha@cs.luc.edu), Dept. of Math. and Computer Sciences, Loyola University, Lake Shore Campus, Chicago, IL 60626, USA; Prakash Panangaden (prakash@cs.mcgill.ca), School of Computer Science, McGill University, Montreal, Quebec, Canada. Abstract: This paper describes a stochastic concurrent constraint language for the description and programming of concurrent probabilistic systems. The language can be viewed both as a calculus for describing and reasoning about stochastic processes and as an executable language for simulating stochastic processes. In this language, programs encode probability distributions over (potentially infinite) sets of objects. We illustrate the subtleties that arise from the interaction of constraints, random choice and recursion. We describe operational semantics of these programs (programs are run by sampling random choices), deno...
Adaptivity with moving grids
, 2009
Abstract

Cited by 15 (2 self)
In this article we look at the modern theory of moving meshes as part of an r-adaptive strategy for solving partial differential equations with evolving internal structure. We firstly examine the possible geometries of a moving mesh in both one and higher dimensions, and the discretization of partial differential equations on such meshes. In particular, we consider such issues as mesh regularity, equidistribution, variational methods, and the error in interpolating a function or truncation error on such a mesh. We show that, guided by these, we can design effective moving mesh strategies. We then look in more detail at how these strategies are implemented. Firstly we look at position-based methods and the use of moving mesh partial differential equations (MMPDEs), variational and optimal transport methods. This is followed by an analysis of velocity-based methods such as the geometric conservation law (GCL) methods. Finally we look at a number of examples where the use of a moving mesh method is effective in applications. These include scale-invariant problems, blow-up problems, problems with moving fronts and problems in meteorology. We conclude that, whilst r-adaptive methods are still at a relatively early stage of development, with many outstanding questions remaining, they have enormous potential, and for many problems they represent an optimal form of adaptivity.
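The equidistribution principle mentioned in this abstract admits a very short 1D sketch: place mesh points x_0 < ... < x_N so that the integral of a monitor function M is equal over every cell, by inverting the cumulative integral of M. The function name, monitor choice, and test problem below are illustrative assumptions, not taken from the article:

```python
import numpy as np

# Minimal 1D equidistribution sketch (illustrative, not the article's code).
# Choose x_0 < ... < x_N so that int_{x_i}^{x_{i+1}} M(x) dx is the same for
# every cell, by inverting the cumulative integral of the monitor M.
def equidistribute(M, N=10, a=0.0, b=1.0, samples=10001):
    xs = np.linspace(a, b, samples)
    mid = (M(xs[1:]) + M(xs[:-1])) / 2.0                   # trapezoidal rule
    c = np.concatenate([[0.0], np.cumsum(mid * np.diff(xs))])
    targets = np.linspace(0.0, c[-1], N + 1)               # equal integral per cell
    return np.interp(targets, c, xs)                       # invert c(x)

# Arc-length monitor M = sqrt(1 + u'^2) for u(x) = tanh(20(x - 0.5)):
# the resulting mesh clusters points near the steep front at x = 0.5.
mesh = equidistribute(lambda x: np.sqrt(1.0 + (20.0 / np.cosh(20.0 * (x - 0.5))**2)**2))
```

With a constant monitor this recovers the uniform mesh; a monitor concentrated at a front shrinks the cells there, which is the basic mechanism behind the r-adaptive strategies the article surveys.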
How to divide a territory? A new simple differential formalism for optimization of set functions
 International Journal of Intelligent Systems
, 1999
Accessibility and stable ergodicity for partially hyperbolic diffeomorphisms with 1D-center bundle
 Invent. Math.
Abstract

Cited by 8 (3 self)
Abstract. We prove that stable ergodicity is C^r open and dense among conservative partially hyperbolic diffeomorphisms with one-dimensional center bundle, for all r ∈ [2, ∞]. The proof follows the Pugh-Shub program [21]: among conservative partially hyperbolic diffeomorphisms with one-dimensional center bundle, accessibility is C^r open and dense, and essential accessibility implies ergodicity.
The Bell-Kochen-Specker Theorem
Abstract

Cited by 6 (0 self)
Meyer, Kent and Clifton (MKC) claim to have nullified the Bell-Kochen-Specker (Bell-KS) theorem. It is true that they invalidate KS's account of the theorem's physical implications. However, they do not invalidate Bell's point: that quantum mechanics is inconsistent with the classical assumption that a measurement tells us about a property previously possessed by the system. This failure of classical ideas about measurement is, perhaps, the single most important implication of quantum mechanics. In a conventional colouring there are some remaining patches of white. MKC fill in these patches, but only at the price of introducing patches where the colouring becomes "pathologically" discontinuous. The discontinuities mean that the colours in these patches are empirically unknowable. We prove a general theorem which shows that their extent is at least as great as the patches of white in a conventional approach. The theorem applies not only to the MKC colourings, but also to any other such attempt to circumvent the Bell-KS theorem (Pitowsky's colourings, for example). We go on to discuss the implications. MKC do not nullify the Bell-KS theorem. They do, however, show that we did not, hitherto, properly understand the theorem. For that reason their results (and Pitowsky's earlier results) are of major importance.
A SURVEY ON PARTIALLY HYPERBOLIC DYNAMICS.
, 2006
Abstract

Cited by 5 (2 self)
Abstract. Some of the guiding problems in partially hyperbolic systems are the following: (1) Examples, (2) Properties of invariant foliations, (3) Accessibility, (4) Ergodicity, (5) Lyapunov exponents, (6) Integrability of central foliations, (7) Transitivity and (8) Classification. Here we will survey the state of the art on these subjects, and propose related problems.
PROPER POSTERIORS FROM IMPROPER PRIORS FOR AN UNIDENTIFIED ERRORS-IN-VARIABLES MODEL
Abstract

Cited by 1 (1 self)
The problem considered is inference in a simple errors-in-variables model where consistent estimation is impossible without introducing additional exact prior information. The probabilistic prior information required for Bayesian analysis is found to be surprisingly light: despite the model's lack of identification, a proper posterior is guaranteed for any bounded prior density, including those representing improper priors. This result is illustrated with the improper uniform prior, which implies marginal posterior densities obtainable by integrating the likelihood function; surprisingly, the posterior mode for the regression slope is the usual least squares estimate. KEYWORDS: Errors-in-variables, Bayesian inference, identification, improper priors, proper posteriors, finitely additive probabilities, coherence.