Results 1  10
of
1,195
Reinforcement learning: a survey
 Journal of Artificial Intelligence Research
, 1996
"... This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem ..."
Abstract

Cited by 1298 (23 self)
 Add to MetaCart
This paper surveys the field of reinforcement learning from a computerscience perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trialanderror interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
Planning and acting in partially observable stochastic domains
 ARTIFICIAL INTELLIGENCE
, 1998
"... In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm ..."
Abstract

Cited by 822 (31 self)
 Add to MetaCart
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (mdps) and partially observable mdps (pomdps). We then outline a novel algorithm for solving pomdps offline and show how, in some cases, a finitememory controller can be extracted from the solution to a pomdp. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to pomdps, and of some possibilities for finding approximate solutions.
A First Step towards Automated Detection of Buffer Overrun Vulnerabilities
 In Network and Distributed System Security Symposium
, 2000
"... We describe a new technique for finding potential buffer overrun vulnerabilities in securitycritical C code. The key to success is to use static analysis: we formulate detection of buffer overruns as an integer range analysis problem. One major advantage of static analysis is that security bugs can ..."
Abstract

Cited by 339 (10 self)
 Add to MetaCart
We describe a new technique for finding potential buffer overrun vulnerabilities in securitycritical C code. The key to success is to use static analysis: we formulate detection of buffer overruns as an integer range analysis problem. One major advantage of static analysis is that security bugs can be eliminated before code is deployed. We have implemented our design and used our prototype to find new remotelyexploitable vulnerabilities in a large, widely deployed software package. An earlier hand audit missed these bugs. 1.
Some efficient solutions to the affine scheduling problem  Part I Onedimensional Time
, 1996
"... Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A s ..."
Abstract

Cited by 216 (18 self)
 Add to MetaCart
Programs and systems of recurrence equations may be represented as sets of actions which are to be executed subject to precedence constraints. In many cases, actions may be labelled by integral vectors in some iteration domain, and precedence constraints may be described by affine relations. A schedule for such a program is a function which assigns an execution date to each action. Knowledge of such a schedule allows one to estimate the intrinsic degree of parallelism of the program and to compile a parallel version for multiprocessor architectures or systolic arrays. This paper deals with the problem of finding closed form schedules as affine or piecewise affine functions of the iteration vector. An efficient algorithm is presented which reduces the scheduling problem to a parametric linear program of small size, which can be readily solved by an efficient algorithm.
A Logic for Reasoning about Probabilities
 Information and Computation
, 1990
"... We consider a language for reasoning about probability which allows us to make statements such as “the probability of E, is less than f ” and “the probability of E, is at least twice the probability of E,, ” where E, and EZ are arbitrary events. We consider the case where all events are measurable ( ..."
Abstract

Cited by 214 (19 self)
 Add to MetaCart
We consider a language for reasoning about probability which allows us to make statements such as “the probability of E, is less than f ” and “the probability of E, is at least twice the probability of E,, ” where E, and EZ are arbitrary events. We consider the case where all events are measurable (i.e., represent measurable sets) and the more general case, which is also of interest in practice, where they may not be measurable. The measurable case is essentially a formalization of (the propositional fragment of) Nilsson’s probabilistic logic. As we show elsewhere, the general (nonmeasurable) case corresponds precisely to replacing probability measures by DempsterShafer belief functions. In both cases, we provide a complete axiomatization and show that the problem of deciding satistiability is NPcomplete, no worse than that of propositional logic. As a tool for proving our complete axiomatizations, we give a complete axiomatization for reasoning about Boolean combinations of linear inequalities, which is of independent interest. This proof and others make crucial use of results from the theory of linear programming. We then extend the language to allow reasoning about conditional probability and show that the resulting logic is decidable and completely axiomatizable, by making use of the theory of real closed fields. ( 1990 Academic Press. Inc 1.
Model Checking of Probabilistic and Nondeterministic Systems
, 1995
"... . The temporal logics pCTL and pCTL* have been proposed as tools for the formal specification and verification of probabilistic systems: as they can express quantitative bounds on the probability of system evolutions, they can be used to specify system properties such as reliability and performance. ..."
Abstract

Cited by 200 (13 self)
 Add to MetaCart
. The temporal logics pCTL and pCTL* have been proposed as tools for the formal specification and verification of probabilistic systems: as they can express quantitative bounds on the probability of system evolutions, they can be used to specify system properties such as reliability and performance. In this paper, we present modelchecking algorithms for extensions of pCTL and pCTL* to systems in which the probabilistic behavior coexists with nondeterminism, and show that these algorithms have polynomialtime complexity in the size of the system. This provides a practical tool for reasoning on the reliability and performance of parallel systems. 1 Introduction Temporal logic has been successfully used to specify the behavior of concurrent and reactive systems. These systems are usually modeled as nondeterministic processes: at any moment in time, more than one future evolution may be possible, but a probabilistic characterization of their likelihood is normally not attempted. While ma...
Algebraic Algorithms for Sampling from Conditional Distributions
 Annals of Statistics
, 1995
"... We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so a ..."
Abstract

Cited by 192 (16 self)
 Add to MetaCart
We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include generating tables with fixed row and column sums and higher dimensional analogs. The algorithms involve finding bases for associated polynomial ideals and so an excursion into computational algebraic geometry.
A Fast LinearArithmetic Solver for DPLL(T
, 2006
"... Abstract. We present a new Simplexbased linear arithmetic solver that can be integrated efficiently in the DPLL(T) framework. The new solver improves over existing approaches by enabling fast backtracking, supporting a priori simplification to reduce the problem size, and providing an efficient for ..."
Abstract

Cited by 183 (7 self)
 Add to MetaCart
Abstract. We present a new Simplexbased linear arithmetic solver that can be integrated efficiently in the DPLL(T) framework. The new solver improves over existing approaches by enabling fast backtracking, supporting a priori simplification to reduce the problem size, and providing an efficient form of theory propagation. We also present a new and simple approach for solving strict inequalities. Experimental results show substantial performance improvements over existing tools that use other Simplexbased solvers in DPLL(T) decision procedures. The new solver is even competitive with stateoftheart tools specialized for the difference logic fragment. 1
Algorithms for Sequential Decision Making
, 1996
"... Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one ..."
Abstract

Cited by 175 (8 self)
 Add to MetaCart
Sequential decision making is a fundamental task faced by any intelligent agent in an extended interaction with its environment; it is the act of answering the question "What should I do now?" In this thesis, I show how to answer this question when "now" is one of a finite set of states, "do" is one of a finite set of actions, "should" is maximize a longrun measure of reward, and "I" is an automated planning or learning system (agent). In particular,
Combinatorial auctions: A survey
, 2000
"... Many auctions involve the sale of a variety of distinct assets. Examples are airport time slots, delivery routes and furniture. Because of complementarities (or substitution effects) between the different assets, bidders have preferences not just for particular items but for sets or bundles of items ..."
Abstract

Cited by 170 (1 self)
 Add to MetaCart
Many auctions involve the sale of a variety of distinct assets. Examples are airport time slots, delivery routes and furniture. Because of complementarities (or substitution effects) between the different assets, bidders have preferences not just for particular items but for sets or bundles of items. For this reason, economic efficiency is enhanced if bidders are allowed to bid on bundles or combinations of different assets. This paper surveys the state of knowledge about the design of combinatorial auctions. Second, it uses this subject as a vehicle to convey the aspects of integer programming that are relevant for the