
## Computationally efficient Gaussian process changepoint detection and regression (2014)

Citations: 2 (0 self)

### Citations

5604 | Reinforcement learning: an introduction
- Sutton, Barto
- 1998
Citation Context ...ple efficient and, in contrast to model-based algorithms, capable of acting in real time, as demonstrated on a five-dimensional aircraft simulator. C.2 Introduction In Reinforcement Learning (RL) [118], several new algorithms for efficient exploration in continuous state spaces have been proposed, including GP-Rmax [61] and C-PACE [95]. In particular, C-PACE was shown to be PAC-MDP, an important cl...

2207 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963 |

1997 |
Nonlinear Systems
- Khalil
- 2002
Citation Context ...ment here is restricted to control-affine systems, sufficient conditions exist to convert a class of non-affine in control nonlinear systems to the control-affine form in (B.1) (see Chapter 13 in [65]), and the AMI-MRAC framework can also be extended to a class of non-affine in control systems [64, 66]. The AMI-MRAC approach used here feedback linearizes the system by finding a pseudocontrol input...

1897 | Markov decision processes: Discrete stochastic dynamic programming
- Puterman
- 1994
Citation Context ...ns F that may be detected using the nonstationary algorithm. These are detailed in the preliminaries Section 2.2. 1.2.2 Reinforcement Learning An RL environment is modeled as a Markov Decision Process [99] (MDP). An MDP M is a tuple (S, A, R, T, γ) where S is the potentially infinite set of states, A is a finite set of actions available to the agent, and 0 < γ < 1 is a discount factor for the sums of r...
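The context above defines an MDP as a tuple (S, A, R, T, γ). A minimal sketch of that tuple and of value iteration over it, with a hypothetical two-state, two-action example (the numbers are illustrative, not from the thesis):

```python
# Hypothetical two-state MDP illustrating the (S, A, R, T, gamma) tuple;
# all values here are made up for illustration.
import numpy as np

S, A = 2, 2                      # two states, two actions
gamma = 0.9                      # discount factor, 0 < gamma < 1
# T[s, a, s'] = transition probability, R[s, a] = expected reward
T = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Value iteration: V(s) <- max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V(s') ]
V = np.zeros(S)
for _ in range(500):
    V = (R + gamma * T @ V).max(axis=1)
print(V)  # fixed point of the Bellman optimality operator
```

In state 1 the best action yields reward 2 forever, so its value converges to 2/(1 − γ) = 20.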

1231 | Reinforcement Learning
- SUTTON, BARTO
- 1998
Citation Context ...CRL-GP-CPD). UCRL-GP-CPD uses UCRL-GP to explore the state space and learn an optimal policy in between changepoints, and uses GP-NBC to detect changepoints as they occur. Reinforcement Learning (RL) [119] is a widely studied framework in which an agent interacts with an environment and receives rewards for performing actions in various states. Through exploring the environment, the agent learns to opt...

718 | Gaussian processes for Machine Learning
- Rasmussen, Williams
- 2006
Citation Context ...environment from noisy observations. Gaussian Processes (GPs) have emerged as a widely applied framework for inference in many machine learning and decision making applications, including regression [102], classification [85], adaptive control [26], and reinforcement learning [38]. However, most existing (online) GP inference algorithms assume that the underlying generative model is stationary, tha...
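The GP regression framework cited as [102] can be sketched in a few lines: a squared-exponential kernel, a Cholesky solve for the posterior mean, and the predictive variance. Hyperparameters and data here are assumptions for illustration, not values from the thesis:

```python
# Minimal GP regression sketch (squared-exponential kernel).
# Lengthscale, signal/noise variances, and data are illustrative only.
import numpy as np

def k(a, b, ell=1.0, sf=1.0):
    d = a[:, None] - b[None, :]
    return sf**2 * np.exp(-0.5 * (d / ell)**2)

X = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = np.sin(X)
sn = 0.1                                  # observation noise std

K = k(X, X) + sn**2 * np.eye(len(X))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

Xs = np.array([0.5])                      # query point
Ks = k(Xs, X)
mu = Ks @ alpha                           # posterior mean
v = np.linalg.solve(L, Ks.T)
var = k(Xs, Xs) - v.T @ v                 # posterior variance
print(mu, var)
```

The posterior mean at x = 0.5 lands near sin(0.5), and the variance shrinks near observed inputs, which is the property the exploration strategies later in this listing exploit.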

690 | Detection of Abrupt Changes: Theory and Applications - Basseville, Nikiforov - 1993 |

592 | An introduction to kernel-based learning algorithms
- Müller, Mika, et al.
- 2001
Citation Context ...observations. Gaussian Processes (GPs) have emerged as a widely applied framework for inference in many machine learning and decision making applications, including regression [102], classification [85], adaptive control [26], and reinforcement learning [38]. However, most existing (online) GP inference algorithms assume that the underlying generative model is stationary, that is, the data generat...

555 | On the method of bounded differences
- McDiarmid
- 1989
Citation Context ...ovide information about the distribution structure for smaller numbers of samples. For this purpose, two large deviations inequalities are used, Hoeffding's Inequality [53] and McDiarmid's Inequality [81], to bound the probability of (2.8) deviating from its expected value by more than some specified amount. These are often referred to as concentration inequalities. Theorem 2.1 (Hoeffding's Inequality...
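Hoeffding's inequality, as invoked above, bounds P(|X̄ − E[X̄]| ≥ t) ≤ 2·exp(−2nt²/(b − a)²) for i.i.d. samples bounded in [a, b]. A quick empirical check of that bound (the sample sizes and threshold are arbitrary choices for illustration):

```python
# Empirical check of Hoeffding's inequality for bounded i.i.d. variables:
# P(|mean - E[mean]| >= t) <= 2 exp(-2 n t^2 / (b - a)^2).
import numpy as np

rng = np.random.default_rng(0)
n, t = 100, 0.1
a, b = 0.0, 1.0                 # uniform on [0, 1], so E[mean] = 0.5
trials = 20000
means = rng.uniform(a, b, size=(trials, n)).mean(axis=1)
emp = np.mean(np.abs(means - 0.5) >= t)
bound = 2 * np.exp(-2 * n * t**2 / (b - a)**2)
print(emp, bound)   # empirical frequency stays below the bound
```

The bound is loose (it holds for any bounded distribution), which is exactly why it is useful when only the range of the samples is known.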

461 | Least-squares policy iteration
- Lagoudakis, Parr
Citation Context ...], a model-free algorithm which uses GPs, and GP-Rmax [60], a model-based algorithm which uses GPs. The algorithms are tested on two synthetic domains: puddle world [9] and a cart-pole balancing task [72]. In the first domain, puddle world, an agent must navigate a continuous planar world S : [0, 1]^2 from one corner to another while avoiding a puddle. The agent receives a reward of -1 for every step ...

452 | ROS: An open-source robot operating system
- Quigley, Gerkey, et al.
- 2009
Citation Context ... a monitoring facility with a Mobotix Q24 fish eye camera. The UAV used in this experiment is the AR-Drone Parrot 2.0 Quadrotor controlled by a client program using the Robot Operating System (ROS) [101]. Figure 4-6: The flight-test bed at the DAS lab at OSU emulates an urban environment and is equipped with a motion capture facility. Discussion of Res...

379 | On the problem of the most efficient tests of statistical hypotheses
- Neyman, Pearson
- 1933
Citation Context ...probability of a false alarm. The decision rule that maximizes the probability of detection subject to some maximum probability of a false hypothesis choice is still an LRT by the Neyman-Pearson lemma [90]. A variation of the Generalized Likelihood Ratio (GLR) [7], an instance of non-Bayesian hypothesis testing, is utilized to perform changepoint detection. The GLR detects changes by comparing a window...
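The context describes GLR-style changepoint detection by comparing a sliding window against reference data via a likelihood ratio. A minimal sketch in that spirit for unit-variance Gaussian data with an unknown mean shift; the window size, threshold, and data are illustrative assumptions, not the thesis's choices:

```python
# Windowed log-likelihood-ratio changepoint test on Gaussian data,
# in the GLR spirit: "two means" vs "one mean" for unit-variance samples.
# Window size, threshold, and simulated data are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(1.5, 1.0, 200)])   # true change at t = 200

w, thresh = 30, 10.0
alarms = []
for t in range(w, len(x) - w):
    ref, win = x[t - w:t], x[t:t + w]
    pooled = np.concatenate([ref, win])
    # LLR = 0.5 * (pooled SSE - separate SSEs) under unit variance
    llr = 0.5 * (np.sum((pooled - pooled.mean())**2)
                 - np.sum((ref - ref.mean())**2)
                 - np.sum((win - win.mean())**2))
    if llr > thresh:
        alarms.append(t)
print(alarms[0] if alarms else None)  # first alarm near the true changepoint
```

The statistic reduces to (n₁n₂/(n₁+n₂))·(m₁ − m₂)²/2, so a 1.5σ mean shift with 30-sample windows comfortably clears the threshold around the true changepoint.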

340 | Near-Optimal sensor placements in Gaussian Processes: Theory, efficient algorithms and empirical studies
- KRAUSE, SINGH, et al.
Citation Context ...S instead of exploring. The use of predictive variance for exploration is similar to "knownness" based model-free MDP solvers [11] and information entropy maximizing active sensor placement algorithms [71]. Description of the Experiment The mission scenario is for the agents to go from a predefined ingress location to a predefined egress location (referred to as the goal location) in the domain. The ag...

305 | Generalization in reinforcement learning: Safely approximating the value function
- Boyan, Moore
- 1995
Citation Context ...t the Q function directly, DGPQ [52], a model-free algorithm which uses GPs, and GP-Rmax [60], a model-based algorithm which uses GPs. The algorithms are tested on two synthetic domains: puddle world [9] and a cart-pole balancing task [72]. In the first domain, puddle world, an agent must navigate a continuous planar world S : [0, 1]^2 from one corner to another while avoiding a puddle. The agent rece...

297 | R-max – a general polynomial time algorithm for near-optimal reinforcement learning.
- Brafman, Tennenholtz
- 2002
Citation Context ...above a certain threshold, the agent should only exploit its knowledge of S instead of exploring. The use of predictive variance for exploration is similar to "knownness" based model-free MDP solvers [11] and information entropy maximizing active sensor placement algorithms [71]. Description of the Experiment The mission scenario is for the agents to go from a predefined ingress location to a predefine...

244 | Variational Inference for Dirichlet Process Mixtures. Bayesian Analysis
- Blei, Jordan
- 2005
Citation Context ...he Dirichlet prior may not converge to the correct number of models [83] and comes with no guarantees on regression accuracy or rates of convergence to the true posterior. While variational inference [8, 13] can reduce computation, these approaches require a variational approximation of the distribution, which does not exist for GPs, to the author's knowledge. In contrast to these methods, GP-NBC has been ...

236 | Universal approximation using radial-basis-function networks.
- Park, Sandberg
- 1991
Citation Context ...perceptron Neural Networks [67, 75], the RBFNs are linear-in-parameters universal function approximators. The accuracy of an RBFN representation, however, greatly depends on the choice of RBF centers [93]. Typically, authors have assumed that the operating domain of the system is known and have pre-allocated a fixed quantity of Gaussian RBF centers over the presumed domain [23, 67, 88, 94, 125]. Howev...
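"Linear-in-parameters" in the context above means that once the Gaussian RBF centers are fixed, fitting the network is just linear least squares on the output weights. A sketch with pre-allocated centers on a grid, as the cited works assume (the target function, center count, and width are illustrative choices):

```python
# Linear-in-parameters RBF network: with fixed Gaussian centers, fitting
# the output weights is ordinary least squares. Centers, width, and the
# target function below are illustrative assumptions.
import numpy as np

centers = np.linspace(-3, 3, 12)          # pre-allocated RBF centers
width = 0.8

def phi(x):
    # Feature matrix: one Gaussian bump per center
    return np.exp(-((x[:, None] - centers[None, :]) / width)**2)

X = np.linspace(-3, 3, 200)
y = np.tanh(X)                            # "unknown" function to approximate
w, *_ = np.linalg.lstsq(phi(X), y, rcond=None)

err = np.max(np.abs(phi(X) @ w - y))
print(err)   # small on the training grid; accuracy depends on the centers
```

Moving or removing centers degrades the fit, which is the sensitivity to center placement that the context (and the GP-based alternatives later in this listing) is about.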

230 | Gaussian networks for direct adaptive control
- Sanner, Slotine
- 1991
Citation Context ...owest score. The online sparse GP algorithm allocates new basis vector locations dynamically, thus preventing ad-hoc a priori feature selection as is common with other methods such as neural networks [107]. Hyperparameters can be optimized online as well [50, 106]. 2.2 Nonstationary Prediction and Learning In the stationary analysis Section 4.2.1, no assumptions are made about the structure of the dist...

216 | Analysis of Recursive Stochastic Algorithms
- Ljung
- 1977
Citation Context ... the adaptive law that trains on recorded data was found to be Fwb = 0.5. Theoretically, MRAC and CL-MRAC learning rates can be kept constant in adaptive control, however stochastic stability results [79] indicate that driving the learning rates to zero is required for guaranteeing convergence in presence of noise. The problem with this approach is that the effect of adaptation would be eventually re...

182 | Sparse on-line Gaussian processes
- Csató, Opper
Citation Context ...ackup on each step. In DGPQ, the delayed updates of DQL are adapted to continuous state spaces using a GP. It is proven that DGPQ is PAC-MDP and shown that using a sparse online GP implementation [33] DGPQ can perform in real-time. The empirical results, including those on an F-16 simulator, show DGPQ is both sample efficient and orders of magnitude faster in per-sample computation than other PAC-...

139 | An analytic solution to discrete Bayesian reinforcement learning.
- Poupart, Vlassis, et al.
- 2006
Citation Context ...t prior can be used [129], however, it has been shown that the Dirichlet prior is not a consistent estimator of the number of tasks [83]. While Bayesian methods have had a variety of success in RL [98], Bayesian methods do not offer any theoretical guarantees on performance or against negative transfer. Indeed, if the prior distribution is significantly different than the true, or posterior, distrib...

134 | Reinforcement learning with Gaussian processes.
- Engel, Mannor, et al.
- 2005
Citation Context ...s GPs to model reward and transition functions, and it is proven that UCRL-GP is sample efficient, i.e. PAC-MDP. GPs have been studied in both model-free and model-based RL. In model-free RL, GPSARSA [43] has been used to model a value function and heuristically extended for better empirical exploration (iGP-SARSA) [32]. However, I prove in [52] that these algorithms may require an exponential (in 11-...

98 | Near-optimal Regret Bounds for Reinforcement Learning.
- Jaksch, Ortner, et al.
- 2010
Citation Context ...air after sufficient samples have been observed. Thus, until that number of samples is collected, the algorithm ignores all previous information at that state action pair. On the other hand, the UCRL [5] and MBIE [116] algorithms maintain optimistic upper bounds on the reward function and error bounds on the empirical transition probability function, using the information collected in the state actio...

91 | Reinforcement Learning and Dynamic Programming Using Function Approximators. CRC Press
- Busoniu, Babuska, et al.
- 2010
Citation Context ... total accumulated reward over each model. Exploration of the domain is required by any reinforcement learning algorithm to ensure that it obtains sufficient information to compute the optimal policy [15]. In this architecture, the exploration strategy is to compute a policy that guides the agents to areas of the domain with high variance, rather than using the estimate of the GP reward model. However...

84 | Pilco: A model-based and data-efficient approach to policy search.
- Deisenroth, Rasmussen
- 2011
Citation Context ...e spaces using a GP to model the Q-function and determine regions of high accuracy. In model-based RL, the PILCO algorithm trained GPs to represent T, and then derived policies using policy search [37]. However, PILCO does not include a provably efficient (PAC-MDP) exploration strategy. GP-Rmax [62] does include an exploration strategy, specifically replacing areas of low confidence in the (T and R...

82 | A theoretical analysis of model-based interval estimation.
- Strehl, Littman
- 2005
Citation Context ...ficient samples have been observed. Thus, until that number of samples is collected, the algorithm ignores all previous information at that state action pair. On the other hand, the UCRL [5] and MBIE [116] algorithms maintain optimistic upper bounds on the reward function and error bounds on the empirical transition probability function, using the information collected in the state action pair already...

77 | Gaussian process priors with uncertain inputs - application to multiple-step ahead time series forecasting
- Girard, Candela, et al.
- 2003
Citation Context ...ence between the true position and mean predicted position, where the mean position is the GP mean that incorporates the uncertainty in previous predictions at each time step [48]. Prediction errors for each cluster are averaged across all trajectories at each time step. The prediction error computed by both algorithms for trajectories within the three training clusters was si...

74 | A Bayesian sampling approach to exploration in reinforcement learning.
- Asmuth, Li, et al.
- 2009
Citation Context ...ge from a different task to the current task, resulting in extremely poor performance. This phenomenon is known as negative transfer. While some algorithms provide guarantees against negative transfer [3, 14, 74], others offer no such guarantees [122, 129]. In multi-task reinforcement learning, one of the more common frameworks is the hierarchical Bayesian RL framework [129]. In this framework, tasks are draw...

71 | Knows What It Knows: A Framework for Self-Aware Learning
- Li, Littman, et al.
- 2008
Citation Context ... goal of learning π*. For value-based methods (as opposed to policy search [63]), there are roughly two classes of RL algorithms: model-based and model-free. Model-based algorithms, such as KWIK-Rmax [77], build models of T(s, a) and R(s, a) and then use a planner to find Q*(s, a). Many model-based approaches have sample efficiency guarantees, that is, bounds on the amount of exploration they perform. ...

70 | Real-time indoor autonomous vehicle test environment
- How, Bethke, et al.
- 2008
Citation Context ...4-11 Space Explored by each planner indicative of the variance. B-1 Two MIT quadrotors equipped to fly in the ACL Real Time Indoor Autonomous Vehicle Test Environment (RAVEN) [54]. The baseline controller on both quadrotors is PID. The small quadrotor uses gains and thrust mappings from the bigger one, resulting in relatively poor trajectory tracking performance...

69 | R-max - a general polynomial time algorithm for near-optimal reinforcement learning
- Brafman, Tennenholtz
- 2003
Citation Context ... reward and values, UCRL-GP uses an upper probabilistic bound on the estimate of the reward. A similar analogy between GP-Rmax and UCRL-GP exists in discrete RL as well. In particular, discrete R-max [12] only uses information about a state action pair after sufficient samples have been observed. Thus, until that number of samples is collected, the algorithm ignores all previous information at that st...

63 | Bayesian online changepoint detection. ArXiv e-prints,
- Adams, MacKay
- 2007
Citation Context ...e learned online. Approaches to CPD and time series monitoring using GPs include the GP-changepoint detection algorithm (GP-CPD) [106] based on the Bayesian online changepoint detection (BOCPD) algorithm [1] and using special kernels which include changepoint times [47]. GP-CPD creates a GP model for every possible run length, and performs Bayesian inference to get a distribution over changepoint times. ...

52 | Reinforcement Learning in Finite MDPs : PAC Analysis.
- Strehl, Li, et al.
- 2009
Citation Context ...2.5 Probably Approximately Correct in MDPs Framework In the following section, the Probably Approximately Correct in MDPs (PAC-MDP) framework is reviewed. Definitions and theorems are reproduced from [114]. The PAC-MDP framework is a methodology for analyzing the learning complexity of a reinforcement learning algorithm. Learning complexity refers to the number of steps for which an algorithm must act ...

48 | Neural Networks for Control: Theory and Practice
- Narendra
- 1996
Citation Context ...d Slotine [107], Radial Basis Function Networks (RBFNs, also referred to as RBF Neural Networks) are perhaps the most widely used adaptive elements when a basis for the modeling uncertainty is unknown [67, 88, 120]. Unlike multi-layer perceptron Neural Networks [67, 75], the RBFNs are linear-in-parameters universal function approximators. The accuracy of an RBFN representation, however, greatly depends on the c...

42 | Adaptive Trajectory Control for Autonomous Helicopters,”
- Johnson, Kannan
- 2005
Citation Context ...sure stability and performance in presence of such modeling uncertainty. MRAC has been implemented on several experimental and in-service aerospace flight platforms, including the Georgia Tech GT-MAX [58], the JDAM guided munition [110], F-36 aircraft [111], and the X-33 launch vehicle [59]. It has also been used for fault-tolerant adaptive control research in the context of adaptive control in presence o...

40 |
Constrained control allocation
- Durham
- 1993
Citation Context ...nd uniqueness of the solution to (B.1) over D. Also assume that l ≤ n (while restrictive for overactuated systems, this assumption can be relaxed through the design of appropriate control assignment [42]). Further note that while the development here is restricted to control-affine systems, sufficient conditions exist to convert a class of non-affine in control nonlinear systems to the control-affine...

37 | Efficient Reinforcement Learning using Gaussian Processes.
- Deisenroth
- 2009
Citation Context ...idely applied framework for inference in many machine learning and decision making applications, including regression [102], classification [85], adaptive control [26], and reinforcement learning [38]. However, most existing (online) GP inference algorithms assume that the underlying generative model is stationary, that is, the data generating process is time-invariant. Yet, many online learning t...

35 | Adaptive output feedback control of nonlinear systems using neural networks
- Calise, Hovakimyan, et al.
- 2001
Citation Context ...d Model Reference Adaptive Control AMI-MRAC is an approximate feedback-linearization based MRAC method that allows the design of adaptive controllers for a general class of nonlinear plants (see e.g. [16, 58]). The GP-MRAC approach is introduced in the framework of AMI-MRAC, although it should be noted that it is applicable to other MRAC architectures (see e.g. [4, 57, 89, 121]). Let x(t) = [x1(t)^T, x2(t)^T]^T ∈ Dx ⊂ R^n, such that x1(t) ∈ R^ns, x2(t) ∈ R^ns, and n = 2ns. Let δ ∈ Dδ ⊂ R^l, and consider the following multiple-input controllable control-affine nonlinear uncertain d...

35 | Online learning of non-stationary sequences.
- Monteleoni, Jaakkola
- 2003
Citation Context ...thesis considers time series with abrupt changepoints. For abrupt changes, many CPD algorithms do not use or learn models. CPD algorithms with models have been proposed given a set of possible models [84], but this thesis considers the case where new models may need to be learned online. Approaches to CPD and time series monitoring using GPs include the GP-changepoint detection algorithm (GP-CPD) [106] ba...

34 | Movement templates for learning of hitting and batting,
- Kober, Muelling, et al.
- 2010
Citation Context ...ated with other pedestrians and with previous goals selected by the pedestrian [6]. Additionally, the pedestrian may change his/her intent at any time within the trajectory [45]. In robotic ping pong [69], where the person aims the ball is likely to be highly correlated with past strategies. Lastly, these approaches require a costly sampling procedure using Gibbs sampling to obtain the posterior. In m...

31 | Streaming variational bayes
- Broderick, Boyd, et al.
- 2013
Citation Context ...he Dirichlet prior may not converge to the correct number of models [83] and comes with no guarantees on regression accuracy or rates of convergence to the true posterior. While variational inference [8, 13] can reduce computation, these approaches require a variational approximation of the distribution, which does not exist for GPs, to the author's knowledge. In contrast to these methods, GP-NBC has been ...

30 | Constructing skill trees for reinforcement learning agents from demonstration trajectories. Advances in neural information processing systems
- Konidaris, Kuindersma, et al.
- 2010
Citation Context ... in between episodes. Therefore, the changepoint times are known, so no changepoint detection algorithm is required. Algorithms do exist which perform changepoint detection using hidden Markov models [70] and by formulating the problem as a partially observed MDP (POMDP) [6], but these algorithms assume that the models are known a priori. In many applications, however, the time of the changepoint i...

30 | Development of a reconfigurable flight control law for tailless aircraft
- Calise, Lee, et al.
- 2001
Citation Context ...odeling uncertainty. MRAC has been implemented on several experimental and in-service aerospace flight platforms, including the Georgia Tech GT-MAX [58], the JDAM guided munition [110], F-36 aircraft [111], and the X-33 launch vehicle [59]. It has also been used for fault-tolerant adaptive control research in the context of adaptive control in presence of loss of lifting surface or actuator uncertainties [...

29 | Nonlinear network structures for feedback control
- Lewis
- 1999
Citation Context ...eferred to as RBF Neural Networks) are perhaps the most widely used adaptive elements when a basis for the modeling uncertainty is unknown [67, 88, 120]. Unlike multi-layer perceptron Neural Networks [67, 75], the RBFNs are linear-in-parameters universal function approximators. The accuracy of an RBFN representation, however, greatly depends on the choice of RBF centers [93]. Typically, authors have assum...

25 | Gaussian process dynamic programming, Neurocomputing
- Deisenroth, Rasmussen, et al.
- 2009
Citation Context ...ximation error [34]. In addition, one does not need to manually make the mesh grid finer or coarser to control resolution. In order to perform interpolation, a GP is used in a method similar to GP-DP [40]. At each iteration of the value iteration algorithm (Algorithm 5), a GP is trained using the values at the basis vectors. However, unlike GP-DP, instead of performing updates at every observation loc...

20 | Necessary and sufficient condition for parameter convergence in adaptive control
- Boyd, Sastry
- 1984
Citation Context ...hortcomings of the standard gradient based MRAC parameter update laws, such as the lack of convergence guarantees and the possibility of bursting (parameters growing unboundedly) in presence of noise [4, 10, 87]. Efficient "budgeted" online algorithms were presented that ensure tractable computational complexity for online inference and efficient methods for optimizing hyperparameters online while ensuring s...

20 | Parallelizing exploration-exploitation tradeoffs in gaussian process bandit optimization.
- Desautels, Krause, et al.
- 2014
Citation Context ...NBC may make per phase, and the empirical results show that GP-NBC is orders of magnitude faster than existing methods. In contrast to the sample efficiency results for GPs in the optimization context [41], the bounds given here are designed specifically for online prediction with non-stationary data. 1.3.2 Reinforcement Learning In this section, relevant work in Reinforcement Learning (RL) is reviewed...

20 | Bayesian multivariate autoregressive models with structured priors - Penny, Roberts - 2002 |

20 | Adaptive Control Based on Retrospective Cost Optimization,”
- Santillo, Bernstein
- 2010
Citation Context ...-tolerant adaptive control research in the context of adaptive control in presence of loss of lifting surface or actuator uncertainties [22]. Several active directions in MRAC research exist, such as [17, 31, 73, 91, 108, 130], and a central element of these and many other MRAC architectures is a parametrized adaptive element, the parameters of which are tuned online by the MRAC architecture to capture the modeling uncerta...

19 | Sequential Bayesian prediction in the presence of changepoints
- Garnett, Osborne, et al.
- 2009
Citation Context ...er is important in many domains in which models may be revisited, such as pedestrian tracking [6] and robotic ping pong [126]. Online GP changepoint detection algorithms have been proposed previously [47, 106]. However, these algorithms come with no theoretical guarantees on accuracy and cannot reuse models after changepoints. Additionally, as shown in the experiments section, Section 3.3, these algorithms...

18 | High-Level Feedback Control with Neural Networks
- Kim, Lewis
- 1998
Citation Context ...d Slotine [107], Radial Basis Function Networks (RBFNs, also referred to as RBF Neural Networks) are perhaps the most widely used adaptive elements when a basis for the modeling uncertainty is unknown [67, 88, 120]. Unlike multi-layer perceptron Neural Networks [67, 75], the RBFNs are linear-in-parameters universal function approximators. The accuracy of an RBFN representation, however, greatly depends on the c...

17 | Limited Authority Adaptive Flight Control for Reusable Launch Vehicles
- Johnson, Calise
- 2003
Citation Context ...een implemented on several experimental and in-service aerospace flight platforms, including the Georgia Tech GT-MAX [58], the JDAM guided munition [110], F-36 aircraft [111], and the X-33 launch vehicle [59]. It has also been used for fault-tolerant adaptive control research in the context of adaptive control in presence of loss of lifting surface or actuator uncertainties [22]. Several active directions...

16 | PAC model-free reinforcement learning
- 2006
Citation Context ...of the naïve model-free approach. The underlying analogy is that while GP-Rmax and C-PACE are generalizations of model-based Rmax [11], DGPQ is a generalization of model-free Delayed Q-learning (DQL) [114, 115] to general continuous spaces. That is, while early PAC-MDP RL algorithms were model-based and ran a planner after each experience [77], DQL is a PAC-MDP algorithm that performs at most a single Bellma...

15 | A simple example of dirichlet process mixture inconsistency for the number of components.
- Miller, Harrison
- 2013
Citation Context ... inference methods fail to achieve the required computational efficiency for many real-time prediction applications. Additionally, the Dirichlet prior may not converge to the correct number of models [83] and comes with no guarantees on regression accuracy or rates of convergence to the true posterior. While variational inference [8, 13] can reduce computation, these approaches require a variational a...

15 | Infinite mixtures of Gaussian process experts
- Rasmussen, Ghahramani
- 2001
Citation Context ...the other hand, algorithms that learn hierarchical GP-mixture models largely deal with changepoints using BNP approaches, such as the Infinite Mixture of Experts (referred to in this thesis as DP-GP) [105] and the algorithm proposed in [113], which is referred to as MCMC-CRP in this thesis. DP-GP uses a Dirichlet prior and Gibbs sampling over individual data points to obtain the posterior distribution ...

14 |
Intentionaware motion planning
- Bandyopadhyay, Won, et al.
- 2013
Citation Context ..., and cannot make use of previously learned models that may reappear and become applicable again. The latter is important in many domains in which models may be revisited, such as pedestrian tracking [6] and robotic ping pong [126]. Online GP changepoint detection algorithms have been proposed previously [47, 106]. However, these algorithms come with no theoretical guarantees on accuracy and cannot r...

14 | Actuator constrained trajectory generation and control for variable-pitch Quadrotor
- Cutler, How
- 2012
Citation Context ..., which does not distinguish between real and simulated agents. The simulated agents use a physics-based dynamics model for a medium sized Quadrotor and quaternion based trajectory tracking controllers [36, 55]. The real UAVs fly in an experimental testbed; the testbed area is 16 x 12 ft and it is equipped with the Optitrack motion capture system and is designed to conduct real-time testing in an emulated u...

14 | An Empirical Analysis of Value Function-Based and Policy Search Reinforcement Learning
- Kalyanakrishnan, Stone
- 2009
Citation Context ...ice the algorithm performs well even in stochastic domains. In RL, an agent is given S, A, and γ and then acts in M with the goal of learning π*. For value-based methods (as opposed to policy search [63]), there are roughly two classes of RL algorithms: model-based and model-free. Model-based algorithms, such as KWIK-Rmax [77], build models of T(s, a) and R(s, a) and then use a planner to find Q*(s, a...

14 | An adaptive autopilot design for guided munitions
- Calise, Sharma
- 1998
Citation Context ...in presence of such modeling uncertainty. MRAC has been implemented on several experimental and in-service aerospace flight platforms, including the Georgia Tech GT-MAX [58], the JDAM guided munition [110], F-36 aircraft [111], and the X-33 launch vehicle [59]. It has also been used for fault-tolerant adaptive control research in the context of adaptive control in presence of loss of lifting surface or act...

11 | Robust adaptive control in the presence of bounded disturbances
- Narendra, Annaswamy
- 1986
Citation Context ...hortcomings of the standard gradient based MRAC parameter update laws, such as the lack of convergence guarantees and the possibility of bursting (parameters growing unboundedly) in presence of noise [4, 10, 87]. Efficient "budgeted" online algorithms were presented that ensure tractable computational complexity for online inference and efficient methods for optimizing hyperparameters online while ensuring s...

11 | Stable Adaptive Systems
- Narendra, Annaswamy
- 1989
Citation Context ...class of nonlinear plants (see e.g. [16, 58]). The GP-MRAC approach is introduced in the framework of AMI-MRAC, although it should be noted that it is applicable to other MRAC architectures (see e.g. [4, 57, 89, 121]). Let x(t) = [x1(t)^T, x2(t)^T]^T ∈ Dx ⊂ R^n, such that x1(t) ∈ R^ns, x2(t) ∈ R^ns, and n = 2ns. Let δ ∈ Dδ ⊂ R^l, and consider the following multiple-input controllable control-affine nonlinear uncertain d...

10 |
Gaussian processes for sample efficient reinforcement learning with rmax-like exploration
- Jung, Stone
- 2012
(Show Context)
Citation Context ...suming that a state has a maximal value Q(s, a) = Vmax until enough data points have been observed in that state-action pair such that the reward and transition estimates are highly accurate. GP-Rmax [60] is a previously proposed algorithm which generalizes Rmax to continuous domains by modeling the Q function with a GP and assuming that Q(s, a) is Vmax if the variance of the GP is too high. GP-Rmax w... |
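The GP-Rmax rule described in this context (return the optimistic value Vmax wherever the GP's posterior variance is too high) can be sketched as below. The GP implementation, variance threshold, and Vmax value are illustrative assumptions, not the thesis's actual implementation:

```python
import math

def rbf(x, y, lengthscale=1.0):
    # squared-exponential kernel on scalar inputs
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting; fine for the tiny systems here
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xq, X, y, noise=1e-2):
    # posterior mean and variance of a zero-mean GP at query point xq
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    k = [rbf(xq, xi) for xi in X]
    alpha = solve(K, y)
    mean = sum(k[i] * alpha[i] for i in range(n))
    v = solve(K, k)
    var = rbf(xq, xq) - sum(k[i] * v[i] for i in range(n))
    return mean, var

VMAX = 10.0         # hypothetical optimistic value
VAR_THRESH = 0.1    # hypothetical "known-ness" variance threshold

def optimistic_q(xq, X, y):
    # GP-Rmax-style rule: high-variance (unexplored) queries get Vmax,
    # well-explored queries get the GP posterior mean
    mean, var = gp_posterior(xq, X, y)
    return VMAX if var > VAR_THRESH else mean
```

With observations at X = [0.0, 0.1, 0.2] and y = [1, 1, 1], a query at 0.1 returns roughly 1.0 (low variance, known), while a query at 5.0 returns VMAX (no nearby data).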

9 |
Adaptive Control of Systems in Cascade with Saturation
- Kannan
- 2005
(Show Context)
Citation Context ...f non-affine in control nonlinear systems to the control-affine form in (B.1) (see Chapter 13 in [65]), and the AMI-MRAC framework can also be extended to a class of non-affine in control systems [64, 66]. The AMI-MRAC approach used here feedback linearizes the system by finding a pseudocontrol input ν(t) ∈ R^{n_s} that achieves a desired acceleration. If the exact plant model in (B.1) is known and inver... |

9 |
Improved Methods in Neural Network Based Adaptive Output Feedback Control, with Applications to Flight Control
- Kim
- 2003
(Show Context)
Citation Context ...f non-affine in control nonlinear systems to the control-affine form in (B.1) (see Chapter 13 in [65]), and the AMI-MRAC framework can also be extended to a class of non-affine in control systems [64, 66]. The AMI-MRAC approach used here feedback linearizes the system by finding a pseudocontrol input ν(t) ∈ R^{n_s} that achieves a desired acceleration. If the exact plant model in (B.1) is known and inver... |

9 | Gaussian process change point models
- Saatci, Turner, et al.
- 2010
(Show Context)
Citation Context ...er is important in many domains in which models may be revisited, such as pedestrian tracking [6] and robotic ping pong [126]. Online GP changepoint detection algorithms have been proposed previously [47, 106]. However, these algorithms come with no theoretical guarantees on accuracy and cannot reuse models after changepoints. Additionally, as shown in the experiments section, Section 3.3, these algorithms... |

8 |
Design and Control of an Autonomous Variable-Pitch Quadrotor Helicopter
- Cutler
- 2012
(Show Context)
Citation Context ...nds being calculated at 1 kHz. Due to limitations of the speed controllers, the smaller quadrotor motors only accept motor updates at 490 Hz. More details on the autopilot and attitude control are in [35, 36]. B.6.2 Augmentation of baseline linear control with adaptation The outer loop (position and velocity) of the baseline proportional-integral-derivative controller (PID) on the quadrotor was augmented ... |

8 |
Reducing reinforcement learning to KWIK online regression
- Li, Littman
- 2010
(Show Context)
Citation Context ...actually is PAC-MDP, but the algorithm's planning phase (comparable to the C-PACE planning in the experiments) after each update makes it computationally infeasible for real-time control. REKWIRE [76] uses KWIK linear regression for a model-free PAC-MDP algorithm. However, REKWIRE needs H separate approximators for finite horizon H, and assumes Q can be modeled as a linear function, in contrast to... |

Gaussian processes for nonlinear signal processing: An overview of recent advances
- Perez-Cruz, Vaerenbergh, et al.
- 2013
(Show Context)
Citation Context ...can be reliably characterized for applications requiring learning rates and safety guarantees. One way of dealing with non-stationarity is to add time, or another counting mechanism, to the GP kernel [26, 97]. However, this technique forgets useful information from earlier phases of learning if no changepoints occur, and cannot make use of previously learned models that may reappear and become applicable ... |
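The "add time to the GP kernel" idea in this context is commonly realized as a product of a spatial kernel with a kernel over timestamps, so correlation with old observations decays; the specific kernel form and lengthscales below are hypothetical:

```python
import math

def rbf(a, b, lengthscale):
    # squared-exponential similarity on scalars
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def nonstationary_kernel(x1, t1, x2, t2, ls_x=1.0, ls_t=50.0):
    # Product kernel over (state, timestamp) pairs: spatial similarity is
    # down-weighted by elapsed time, so stale data is gradually "forgotten" --
    # exactly the behavior the cited passage criticizes, since information is
    # discarded even when no changepoint has occurred.
    return rbf(x1, x2, ls_x) * rbf(t1, t2, ls_t)
```

For identical states, the kernel value is 1 at equal timestamps but decays toward 0 as the time gap grows, which is why useful early-phase data is eventually forgotten.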

7 |
Design and analysis of a novel adaptive control architecture with guaranteed transient performance
- Cao, Hovakimyan
- 2008
(Show Context)
Citation Context ...-tolerant adaptive control research in the context of adaptive control in presence of loss of lifting surface or actuator uncertainties [22]. Several active directions in MRAC research exist, such as [17, 31, 73, 91, 108, 130], and a central element of these and many other MRAC architectures is a parametrized adaptive element, the parameters of which are tuned online by the MRAC architecture to capture the modeling uncerta... |

6 | Sample complexity of multi-task reinforcement learning. arXiv preprint arXiv:1309.6821
- Brunskill, Li
- 2013
(Show Context)
Citation Context ...r past experience to speed up learning and improve performance. Most multi-task reinforcement solutions that have been proposed previously require that all tasks be encountered in a training phase [14, 123, 129] or require a distributional assumption about how the MDPs are generated [129]. These algorithms also assume that changes may only occur in between episodes, requiring that the changepoint time is kno... |

6 | Sequential transfer in multi-armed bandit with finite set of models.
- Azar, Lazaric, et al.
- 2013
(Show Context)
Citation Context ...ge from a different task to the current task, resulting in extremely poor performance. This phenomenon is known as negative transfer. While some algorithms provide guarantees against negative transfer [3, 14, 74], others offer no such guarantees [122, 129]. In multi-task reinforcement learning, one of the more common frameworks is the hierarchical Bayesian RL framework [129]. In this framework, tasks are draw... |

6 | Scalable reward learning from demonstration
- Michini, Cutler, et al.
- 2013
(Show Context)
Citation Context ...occurred. Lastly, in inverse reinforcement learning (IRL), the agent attempts to learn a task through demonstration by an expert. In the case that the task contains multiple subtasks, many algorithms [70, 82] exist to automatically decompose a task into the relevant subtasks, or segments. While there are many technical parallels between decomposing a task and changepoint detection for multi-task RL, the f... |

6 | PAC optimal exploration in continuous space Markov decision processes.
- PAZIS, PARR
- 2013
(Show Context)
Citation Context ...this work is proving that GPs, a nonparametric FA, provide such a notion of predictive confidence. The full proof that GPs are KWIK-learnable is available in Appendix C. Lastly, the C-PACE algorithm [95] has already been shown to be PAC-MDP in the continuous setting, though it does not use a GP representation. C-PACE stores data points that do not have close-enough neighbors to be considered "known".... |
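The C-PACE "known" test described in this context can be sketched as a nearest-neighbor distance check with Lipschitz-style optimistic value bounds. The scalar metric, eps, Lipschitz constant, and vmax below are illustrative assumptions, not C-PACE's actual parameters:

```python
def is_known(query, data, eps=0.5):
    # C-PACE-style rule: a query point is "known" only if some stored
    # sample lies within eps of it (hypothetical 1-D absolute-difference metric)
    return any(abs(query - x) <= eps for x in data)

def value_bound(query, data, values, vmax=10.0, lip=1.0):
    # Optimistic upper bound on the value at `query`: each stored neighbor
    # contributes its value plus a Lipschitz distance penalty, and everything
    # is capped at vmax so unexplored regions stay maximally attractive.
    bounds = [v + lip * abs(query - x) for x, v in zip(data, values)]
    return min(bounds + [vmax])
```

A point near stored data gets a tight bound (e.g. midway between samples valued 2.0 and 3.0 the bound is 2.5), while a far-away point's bound saturates at vmax, driving exploration toward it.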

6 |
Fully Tuned Radial Basis Function Neural Networks For Flight Control,"
- Sundararajan, Saratchandran, et al.
- 2002
(Show Context)
Citation Context ...rs over the presumed domain [23, 67, 88, 94, 125]. However, if the system operates outside of the domain of the RBF kernel's centers [93], or if the bandwidth of the kernel is not correctly specified [86, 109, 117], the modeling error will be high and stability cannot be guaranteed globally. In fact, RBFN-based stability results require enforcing constraints such that the system does not l... |

5 |
Concurrent learning for convergence in adaptive control without persistency of excitation
- Chowdhary, Johnson
- 2010
(Show Context)
Citation Context ...e choice of RBF centers [93]. Typically, authors have assumed that the operating domain of the system is known and have pre-allocated a fixed quantity of Gaussian RBF centers over the presumed domain [23, 67, 88, 94, 125]. However, if the system operates outside of the domain of the RBF kernel's centers [93], or if the bandwidth of the kernel is not correctly specified [86, 109, 117], the modeling error will be high a... |

5 | Neural Network based Adaptive Algorithms for Nonlinear Control
- Nardi
- 2000
(Show Context)
Citation Context ...rs over the presumed domain [23, 67, 88, 94, 125]. However, if the system operates outside of the domain of the RBF kernel's centers [93], or if the bandwidth of the kernel is not correctly specified [86, 109, 117], the modeling error will be high and stability cannot be guaranteed globally. In fact, RBFN-based stability results require enforcing constraints such that the system does not l... |

5 | Bayesian inference for change points in dynamical systems with reusable states—a Chinese restaurant process approach
- Stimberg, Opper, et al.
(Show Context)
Citation Context ...n hierarchical GP-mixture models largely deal with changepoints using BNP approaches, such as the Infinite Mixture of Experts (referred to in this thesis as DP-GP) [105] and the algorithm proposed in [113], which is referred to as MCMC-CRP in this thesis. DP-GP uses a Dirichlet prior and Gibbs sampling over individual data points to obtain the posterior distribution over model assignments and the numbe... |

4 | Sample efficient reinforcement learning with Gaussian processes
- Grande, Walsh, et al.
- 2014
(Show Context)
Citation Context ...model-free and model-based RL. In model-free RL, GPSARSA [43] has been used to model a value function and heuristically extended for better empirical exploration (iGP-SARSA) [32]. However, I prove in [52] that these algorithms may require an exponential (in 11-) number of samples to reach optimal behavior since they only use a single GP to model the value function. This work is reproduced in Appendix ... |

4 |
Reproducing kernel Hilbert space approach for the online update of radial bases in neuro-adaptive control. Neural Networks and Learning Systems
- Kingravi, Chowdhary, et al.
- 2012
(Show Context)
Citation Context ...need not be pre-allocated. GPs utilize a Bayesian framework which models uncertainties as distributions over functions, which differs from the traditional deterministic weight-space based approaches [67, 68, 88, 125]. Furthermore, Bayesian inference in GP-MRAC overcomes the shortcomings of the standard gradient based MRAC parameter update laws, such as the lack of convergence guarantees and the possibility of bur... |

4 |
Combined/composite model reference adaptive control
- Lavretsky
- 2009
(Show Context)
Citation Context ...-tolerant adaptive control research in the context of adaptive control in presence of loss of lifting surface or actuator uncertainties [22]. Several active directions in MRAC research exist, such as [17, 31, 73, 91, 108, 130], and a central element of these and many other MRAC architectures is a parametrized adaptive element, the parameters of which are tuned online by the MRAC architecture to capture the modeling uncerta... |

4 |
Online regret bounds for undiscounted continuous reinforcement learning
- Ortner, Ryabko
- 2012
(Show Context)
Citation Context ...sufficiently high, i.e. the variance is sufficiently low, whereas UCRL-GP maintains an optimistic upper bound on the reward function, thus allowing for a less conservative approach to planning. UCRLC [92] is also an algorithm for continuous RL that maintains an optimistic upper bound on the reward, but this algorithm relies on state space discretization, and requires a specific planner that may not be... |

3 |
Human aware uas path planning in urban environments using nonstationary mdps
- Allamaraju, Kingravi, et al.
- 2014
(Show Context)
Citation Context ...00 RBF centers were generated using a uniform random distribution over a domain where the states were expected to evolve. The centers for the position and velocity for the x,y axes were spread across [-2, 2]. For the vertical z axis, the position and velocity were spread across [-0.5, 0.5] and [-0.6, 0.6] respectively. The centers for quaternions were placed within [-1, 1]. The bandwidth for each radial ... |

3 |
Nonparametric adaptive control using Gaussian processes with online hyperparameter estimation
- Grande, Chowdhary, et al.
- 2013
(Show Context)
Citation Context ...tive baseline control to aggressive adaptive control based on the learned model. It should be noted that preliminary flight test results over a limited experiment were presented in a conference paper [19]. B.3 Approximate Model Inversion based Model Reference Adaptive Control AMI-MRAC is an approximate feedback-linearization based MRAC method that allows the design of adaptive controllers for a genera... |

3 |
Concurrent learning for improved parameter convergence in adaptive control
- Chowdhary, Johnson
- 2010
(Show Context)
Citation Context ...eter RBF-NNs in terms of tracking error, transient performance, and learning model uncertainty. GP-MRAC is also compared with the previously flight-tested Concurrent Learning MRAC (CL-MRAC) algorithm [24, 25, 30], as well as a variant of GP-MRAC which automatically learns hyperparameters in an online setting. The experiments show that the ability to dynamically place new kernel centers enables GP-MRAC to main... |

3 |
Nonparametric adaptive control of time-varying systems using Gaussian processes. Aerospace control laboratory technical report
- Chowdhary, Kingravi, et al.
- 2013
(Show Context)
Citation Context ...ment for use in a control law. To overcome the aforementioned limitations of RBFNs and other fixed-parameter adaptive elements, Gaussian Process (GP) Bayesian nonparametric adaptive elements are used [20, 27, 28], creating a new class of BNP adaptive control in an architecture called GP-MRAC. Unlike traditional MRAC, GP-MRAC is nonparametric and makes no assumptions about the operating domain. Using only data ... |

3 |
Domestic Use of Aerial Drones by Law Enforcement Likely to Prompt Privacy Debate
- Finn
(Show Context)
Citation Context ...cameras, which lead to growing concerns among the public that UAS on missions that require flying over inhabited areas could invade privacy by taking pictures of humans in private locations (see e.g. [46]). Human-aware UAS path planning algorithms that minimize the likelihood of a UAS flying over areas with high human density could potentially address this issue. Such algorithms would also be useful f... |

3 |
Experimental validation of bayesian nonparametric adaptive control using Gaussian processes,”
- Grande, Chowdhary, et al.
- 2014
(Show Context)
Citation Context ...r past models correctly. Lastly, the appendices of this thesis include work for two other projects which I worked on during my Masters. These include my work on Gaussian Processes in adaptive control [49, 50] in Appendix B, as well as the first sample complexity results ever for a model-free continuous RL algorithm, DGPQ [52], which is presented in Appendix C. Chapter 2 Preliminaries This section discus... |

3 |
Some PAC-Bayesian theorems
- McAllester
- 1998
(Show Context)
Citation Context ...tist approaches. These bounds, however, are notoriously loose, and are more interesting for asymptotic analysis and theoretical justification of algorithms rather than actual use in practice. PAC-Bayesian [80] is a class of methods which employ priors and perform posterior inference, making them Bayesian, but also employ some aspect of PAC proofs to theoretically prove that an algorithm is correct. It woul... |

3 |
Asymptotic linearity of optimal control modification adaptive law with analytical stability margins
- Nguyen
- 2010
(Show Context)
Citation Context |

3 |
Self-Organizing radial basis function networks for adaptive flight control and aircraft engine state estimation
- Shankar
- 2007
(Show Context)
Citation Context ...rs over the presumed domain [23, 67, 88, 94, 125]. However, if the system operates outside of the domain of the RBF kernel's centers [93], or if the bandwidth of the kernel is not correctly specified =-=[86, 109, 117]-=-, the modeling error will be high and stability cannot be guaranteed globally. In fact, RBFN based stability results require that the system operate enforce constraints such that the system does not l... |

2 |
Markov decision processes (MDP) toolbox. http://www7.inra.fr/mia/T/MDPtoolbox/MDPtoolbox.html
- Chadès, Chapron, Cros, Garcia, Sabbadin
- 2012
(Show Context)
Citation Context ...onding GP. The urban arena is split into grids of dimension 50 x 50, and the agent action set consists of a decision to move to any directly adjacent grid. An efficient policy-iteration based planner [18] is used to compute the value function at the beginning of each run using the learned reward model. The resulting policy is further discretized into a set of waypoints, which are then sent to the agen... |

2 |
Autonomous guidance and control of airplanes under actuator failures and severe structural damage
- Chowdhary, Johnson, et al.
(Show Context)
Citation Context ...eter RBF-NNs in terms of tracking error, transient performance, and learning model uncertainty. GP-MRAC is also compared with the previously flight-tested Concurrent Learning MRAC (CL-MRAC) algorithm =-=[24, 25, 30]-=-, as well as a variant of GP-MRAC which automatically learns hyperparameters in an online setting. The experiments show that the ability to dynamically place new kernel centers enables GP-MRAC to main... |

2 | Off-policy reinforcement learning with gaussian processes. Acta Automatica Sinica
- Chowdhary, Liu, et al.
- 2014
(Show Context)
Citation Context ... Section 2.1. C.3.3 Related Work GPs have been used for both model-free and model-based RL. In model-free RL, GP-Sarsa [43] has been used to model a value function and extended to off-policy learning [29], and for (heuristically) better exploration in iGP-Sarsa [32]. However, it is proven in Section C.5.1 that these algorithms may require an exponential (in I.) number of samples to reach optimal behav... |

2 |
Rapid transfer of controllers between UAVs using learning based adaptive control
- Chowdhary, Wu, et al.
- 2013
(Show Context)
Citation Context ... settings from η ∈ [0.1, 2] and m ∈ [3, 50] is analyzed. For η > 2, the algorithm does not trigger a changepoint; for η < 0.1, false positives start occurring. For reasonable values of η ≈ 0.5, m ∈ [3, 30] results in similar performance. This demonstrates the robustness of GP-NBC to parameter selection. 3.3.2 Real Dataset. Figure 3-3: Heat map of the Mean Abs E... |

2 |
Gaussian processes for informative exploration in reinforcement learning
- Chung, Lawrance, Sukkarieh
- 2013
(Show Context)
Citation Context ...have been studied in both model-free and model-based RL. In model-free RL, GPSARSA [43] has been used to model a value function and heuristically extended for better empirical exploration (iGP-SARSA) [32]. However, I prove in [52] that these algorithms may require an exponential (in 11-) number of samples to reach optimal behavior since they only use a single GP to model the value function. This work ... |

2 | Real-time predictive modeling and robust avoidance of pedestrians with uncertain, changing intentions
- Ferguson, Luders, et al.
- 2014
(Show Context)
Citation Context ...pedestrian is highly correlated with other pedestrians and with previous goals selected by the pedestrian [6]. Additionally, the pedestrian may change his/her intent at any time within the trajectory [45]. In robotic ping pong [69], where the person aims the ball is likely to be highly correlated with past strategies. Lastly, these approaches require a costly sampling procedure using Gibbs sampling to... |

2 | Online regression for data with changepoints using Gaussian processes and reusable models - Grande, Walsh, et al. |

2 |
Handbook of Unmanned Aerial Vehicles, chapter Linear Flight Control Techniques for Unmanned Aerial Vehicles
- How, Frazzoli, Chowdhary
- 2012
(Show Context)
Citation Context ..., which does not distinguish between real and simulated agents. The simulated agents use a physics-based dynamics model for a medium-sized quadrotor and quaternion based trajectory tracking controllers [36, 55]. The real UAVs fly in an experimental testbed; the testbed area is 16 x 12 ft and it is equipped with the Optitrack motion capture system and is designed to conduct real-time testing in an emulated u... |

2 |
Neural network based model reference adaptive control system
- Patino, Liu
- 2000
(Show Context)
Citation Context ...e choice of RBF centers [93]. Typically, authors have assumed that the operating domain of the system is known and have pre-allocated a fixed quantity of Gaussian RBF centers over the presumed domain [23, 67, 88, 94, 125]. However, if the system operates outside of the domain of the RBF kernel's centers [93], or if the bandwidth of the kernel is not correctly specified [86, 109, 117], the modeling error will be high a... |

1 |
Concurrent learning adaptive control of linear systems with exponentially convergent bounds
- Chowdhary, Yucelen, Mühlegg, Johnson
- 2012
(Show Context)
Citation Context |