Results 1  10
of
3,934
Reinforcement Learning I: Introduction
, 1998
"... In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search ..."
Abstract

Cited by 5614 (118 self)
 Add to MetaCart
In which we try to give a basic intuitive sense of what reinforcement learning is and how it differs and relates to other fields, e.g., supervised learning and neural networks, genetic algorithms and artificial life, control theory. Intuitively, RL is trial and error (variation and selection, search) plus learning (association, memory). We argue that RL is the only field that seriously addresses the special features of the problem of learning from interaction to achieve longterm goals.
Reinforcement learning: a survey
 Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract

Cited by 1714 (25 self)
 Add to MetaCart
(Show Context)
This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trialanderror interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Multicast Routing in Datagram Internetworks and Extended LANs
 ACM Transactions on Computer Systems
, 1990
"... Multicasting, the transmission of a packet to a group of hosts, is an important service for improving the efficiency and robustness of distributed systems and applications. Although multicast capability is available and widely used in local area networks, when those LANs are interconnected by store ..."
Abstract

Cited by 1074 (5 self)
 Add to MetaCart
(Show Context)
Multicasting, the transmission of a packet to a group of hosts, is an important service for improving the efficiency and robustness of distributed systems and applications. Although multicast capability is available and widely used in local area networks, when those LANs are interconnected by storeandforward routers, the multicast service is usually not offered across the resulting internetwork. To address this limitation, we specify extensions to two common internetwork routing algorithmsdistancevector routing and linkstate routingto support lowdelay datagram multicasting beyond a single LAN. We also describe modifications to the singlespanningtree routing algorithm commonly used by linklayer bridges, to reduce the costs of multicasting in large extended LANs. Finally, we discuss how the use of multicast scope control and hierarchical multicast routing allows the multicast service to scale up to large internetworks.
Dynamic programming algorithm optimization for spoken word recognition
 IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING
, 1978
"... This paper reports on an optimum dynamic programming (DP) based timenormalization algorithm for spoken word recognition. First, a general principle of timenormalization is given using timewarping function. Then, two timenormalized distance definitions, ded symmetric and asymmetric forms, are der ..."
Abstract

Cited by 788 (3 self)
 Add to MetaCart
This paper reports on an optimum dynamic programming (DP) based timenormalization algorithm for spoken word recognition. First, a general principle of timenormalization is given using timewarping function. Then, two timenormalized distance definitions, ded symmetric and asymmetric forms, are derived from the principle. These two forms are compared with each other through theoretical discussions and experimental studies. The symmetric form algorithm superiority is established. A new technique, called slope constraint, is successfully introduced, in which the warping function slope is restricted so as to improve discrimination between words in different categories. AND SEIBI CHIBA vestigations were made, based on the assumption that speech patterns are timesampled with a common and uniform sampling period, as in most general cases. One of the problems discussed in this paper involves the relative superiority of either a symmetric form of DPmatching or an asymmetric one. In the asymmetric form, timenormalization is achieved by transforming the time axis of a speech pattern onto that of the other. In the symmetric form, on the other hand, both time axes are transformed onto a temporarily defined common axis. Theoretical and experimental comparisons show that the symmetric form gives better recognition than the asymmetric one. Another problem discussed concerns slope constraint technique. Since too much of the warping function flexibility sometimes results in poor discrimination between words in different The effective slope constraint characteristic is qualitatively analyzed, and the optimum slope constraint condition is determined through experiments. The optimized algorithm is then extensively subjected to experimentat comparison with various DPalgorithms, previously applied to spoken word recognition by different research groups. The experiment shows that the present algorithm gives no more than about twothirds errors, even compared to the best conventional algorithm. categories, a constraint is newly introduced on the warping I.
Constrained model predictive control: Stability and optimality
 AUTOMATICA
, 2000
"... Model predictive control is a form of control in which the current control action is obtained by solving, at each sampling instant, a finite horizon openloop optimal control problem, using the current state of the plant as the initial state; the optimization yields an optimal control sequence and t ..."
Abstract

Cited by 738 (16 self)
 Add to MetaCart
Model predictive control is a form of control in which the current control action is obtained by solving, at each sampling instant, a finite horizon openloop optimal control problem, using the current state of the plant as the initial state; the optimization yields an optimal control sequence and the first control in this sequence is applied to the plant. An important advantage of this type of control is its ability to cope with hard constraints on controls and states. It has, therefore, been widely applied in petrochemical and related industries where satisfaction of constraints is particularly important because efficiency demands operating points on or close to the boundary of the set of admissible states and controls. In this review, we focus on model predictive control of constrained systems, both linear and nonlinear and discuss only briefly model predictive control of unconstrained nonlinear and/or timevarying systems. We concentrate our attention on research dealing with stability and optimality; in these areas the subject has developed, in our opinion, to a stage where it has achieved sufficient maturity to warrant the active interest of researchers in nonlinear control. We distill from an extensive literature essential principles that ensure stability and use these to present a concise characterization of most of the model predictive controllers that have been proposed in the literature. In some cases the finite horizon optimal control problem solved online is exactly equivalent to the same problem with an infinite horizon; in other cases it is equivalent to a modified infinite horizon optimal control problem. In both situations, known advantages of infinite horizon optimal control accrue.
UPPAAL in a Nutshell
, 1997
"... . This paper presents the overall structure, the design criteria, and the main features of the tool box Uppaal. It gives a detailed user guide which describes how to use the various tools of Uppaal version 2.02 to construct abstract models of a realtime system, to simulate its dynamical behavior, ..."
Abstract

Cited by 662 (51 self)
 Add to MetaCart
(Show Context)
. This paper presents the overall structure, the design criteria, and the main features of the tool box Uppaal. It gives a detailed user guide which describes how to use the various tools of Uppaal version 2.02 to construct abstract models of a realtime system, to simulate its dynamical behavior, to specify and verify its safety and bounded liveness properties in terms of its model. In addition, the paper also provides a short review on casestudies where Uppaal is applied, as well as references to its theoretical foundation. 1 Introduction Uppaal is a tool box for modeling, simulation and verification of realtime systems, based on constraintsolving and onthefly techniques, developed jointly by Uppsala University and Aalborg University. It is appropriate for systems that can be modeled as a collection of nondeterministic processes with finite control structure and realvalued clocks, communicating through channels and (or) shared variables [34, 26]. Typical application areas in...
Randomized kinodynamic planning
 THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH 2001; 20; 378
, 2001
"... This paper presents the first randomized approach to kinodynamic planning (also known as trajectory planning or trajectory design). The task is to determine control inputs to drive a robot from an initial configuration and velocity to a goal configuration and velocity while obeying physically based ..."
Abstract

Cited by 626 (35 self)
 Add to MetaCart
This paper presents the first randomized approach to kinodynamic planning (also known as trajectory planning or trajectory design). The task is to determine control inputs to drive a robot from an initial configuration and velocity to a goal configuration and velocity while obeying physically based dynamical models and avoiding obstacles in the robot’s environment. The authors consider generic systems that express the nonlinear dynamics of a robot in terms of the robot’s highdimensional configuration space. Kinodynamic planning is treated as a motionplanning problem in a higher dimensional state space that has both firstorder differential constraints and obstaclebased global constraints. The state space serves the same role as the configuration space for basic path planning; however, standard randomized pathplanning techniques do not directly apply to planning trajectories in the state space. The authors have developed a randomized
DecisionTheoretic Planning: Structural Assumptions and Computational Leverage
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1999
"... Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives ..."
Abstract

Cited by 515 (4 self)
 Add to MetaCart
Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDPrelated methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to de...