Results 11–20 of 98
Estimating semiparametric ARCH(∞) models by kernel smoothing methods
 Econometrica
, 2005
"... Contents: ..."
Finite-Time Bounds for Fitted Value Iteration
"... In this paper we develop a theoretical analysis of the performance of samplingbased fitted value iteration (FVI) to solve infinite statespace, discountedreward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come i ..."
Abstract

Cited by 21 (2 self)
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence rate results obtained allow us to show that both versions of FVI are well behaved in the sense that, by using a sufficiently large number of samples, arbitrarily good performance can be achieved with high probability for a large class of MDPs. An important feature of our proof technique is that it permits the study of weighted Lp-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds show a dependence on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation power of the function space, the inherent Bellman residual, which reflects how well the function space is “aligned” with the dynamics and rewards of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings.
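As a rough illustration of the setting this abstract analyses (a sketch, not the authors' algorithm verbatim), sampling-based FVI with a linear-in-parameters approximator repeatedly evaluates Monte Carlo Bellman backups from a generative model at a fixed set of sampled states and refits the value function by least squares. The dynamics, reward, and basis below are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-d MDP for illustration: the generative model returns a
# sampled next state and reward for a (state, action) pair.
def generative_model(s, a):
    s_next = np.clip(s + 0.1 * a + 0.05 * rng.standard_normal(), -1.0, 1.0)
    return s_next, -s_next ** 2          # quadratic cost around the origin

actions = np.array([-1.0, 0.0, 1.0])
gamma = 0.9

def features(s):
    # Any linear-in-parameters approximator would do here.
    return np.array([1.0, s, s ** 2])

# Fixed set of sampled states; each iteration evaluates an empirical
# Bellman backup at every sample and refits V by least squares.
states = rng.uniform(-1.0, 1.0, size=100)
Phi = np.array([features(s) for s in states])
w = np.zeros(3)

for _ in range(30):
    targets = np.empty_like(states)
    for i, s in enumerate(states):
        q_vals = []
        for a in actions:
            backup = 0.0
            for _ in range(5):           # Monte Carlo estimate of (TV)(s, a)
                s_next, r = generative_model(s, a)
                backup += r + gamma * features(s_next) @ w
            q_vals.append(backup / 5)
        targets[i] = max(q_vals)
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print(w.round(2))
```

The least-squares refit is the L2 special case of the weighted Lp-norm fits covered by the paper's bounds; with this cost structure the fitted value should be highest near the origin.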
The browser war: econometric analysis of Markov perfect equilibrium in markets with network effects
 TRIAL EXHIBIT: MICROSOFT OEM SALES FY ’98 MIDYEAR REVIEW (GX 421) IN UNITED STATES V. MICROSOFT CORPORATION, CIVIL ACTION NO
, 2004
"... When demands for heterogeneous goods in a concentrated market shift over time due to network or contagion effects, forwardlooking firms consider the strategic impact of investment, pricing, and other conduct. Network effects may be a substantial barrier to entry, giving both entrants and incumbents ..."
Abstract

Cited by 19 (3 self)
When demands for heterogeneous goods in a concentrated market shift over time due to network or contagion effects, forward-looking firms consider the strategic impact of investment, pricing, and other conduct. Network effects may be a substantial barrier to entry, giving both entrants and incumbents powerful strategic incentives to “tip” the market. A Markov perfect equilibrium model captures this strategic behavior, and permits the comparison of “as is” market trajectories with “but for” trajectories under counterfactuals where “bad acts” by some firms are eliminated. Our analysis is applied to a stylized description of the browser war between Netscape and Microsoft. Appendices give conditions for econometric identification and estimation of a Markov perfect equilibrium model from observations on partial trajectories, and discuss estimation of the impacts of ...
Solving factored MDPs with hybrid state and action variables
 J. Artif. Intell. Res. (JAIR)
"... Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model tha ..."
Abstract

Cited by 18 (4 self)
Efficient representations and solutions for large decision problems with continuous and discrete variables are among the most important challenges faced by the designers of automated decision support systems. In this paper, we describe a novel hybrid factored Markov decision process (MDP) model that allows for a compact representation of these problems, and a new hybrid approximate linear programming (HALP) framework that permits their efficient solution. The central idea of HALP is to approximate the optimal value function by a linear combination of basis functions and optimize its weights by linear programming. We analyze both theoretical and computational aspects of this approach, and demonstrate its scale-up potential on several hybrid optimization problems.
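The core HALP idea (fit the weights of a linear value-function approximation by LP) can be sketched on a tiny fully discrete MDP. Everything below is invented for illustration: the transition kernel, rewards, basis, and state-relevance weights; the actual HALP framework additionally exploits factored hybrid state/action structure:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Small synthetic discrete MDP: transition kernel P[a, s, s'], reward r[s, a].
n_s, n_a, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(n_s), size=(n_a, n_s))
r = rng.uniform(0.0, 1.0, size=(n_s, n_a))

# Value function approximated as V = Phi @ w over a small hand-picked basis.
Phi = np.array([[1.0, 0.0],
                [1.0, 0.3],
                [1.0, 0.6],
                [1.0, 1.0]])

# Approximate LP: minimise c^T (Phi w) subject to, for every (s, a),
#   (Phi w)(s) >= r(s, a) + gamma * sum_{s'} P[a, s, s'] (Phi w)(s')
c = np.ones(n_s) / n_s                   # state-relevance weights
A_ub, b_ub = [], []
for s in range(n_s):
    for a in range(n_a):
        A_ub.append(-(Phi[s] - gamma * P[a, s] @ Phi))
        b_ub.append(-r[s, a])
res = linprog(c @ Phi, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * Phi.shape[1])
V = Phi @ res.x
print(V.round(3))
```

With a constant basis function in the span, the LP is feasible and bounded; the resulting V is an upper envelope on the Bellman backups at every state-action pair.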
A unifying framework for computational reinforcement learning theory
, 2009
"... Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervisedlearning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understand ..."
Abstract

Cited by 18 (6 self)
Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms, for example in terms of their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploration, which may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and the sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts, in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies ...
Random sampling of states in dynamic programming
 in Proc. NIPS Conf., 2007
"... Abstract—We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding s ..."
Abstract

Cited by 17 (4 self)
Abstract—We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding steady-state policies for deterministic time-invariant discrete-time control problems with continuous states and actions, often found in robotics. In this paper, we describe our approach and provide initial results on several simulated robotics problems. Index Terms—Dynamic programming, optimal control, random sampling.
Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice
 Advances in Neural Information Processing Systems
, 2000
"... Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporaldifference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algo ..."
Abstract

Cited by 17 (3 self)
Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algorithms are frequently unstable.
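The kernel-based alternative avoids that instability because a kernel smoother is a convex averager, hence a non-expansion, so the value-iteration map below is a contraction. The sketch uses an invented one-dimensional, single-action chain in the discounted setting (the paper itself treats average-cost problems and a portfolio application):

```python
import numpy as np

rng = np.random.default_rng(2)

# Sampled one-step transitions (s_i, r_i, s'_i) from a hypothetical
# 1-d chain with a single action (illustration only).
N, gamma, h = 80, 0.9, 0.2
S = rng.uniform(-1.0, 1.0, size=N)
S_next = np.clip(0.8 * S + 0.1 * rng.standard_normal(N), -1.0, 1.0)
R = 1.0 - np.abs(S_next)                 # reward peaks at the origin

def kernel_weights(s):
    # Normalised Gaussian weights over the sampled states: a convex
    # averager, hence a sup-norm non-expansion, which is what keeps
    # this iteration stable where parametric TD methods can diverge.
    k = np.exp(-0.5 * ((s - S) / h) ** 2)
    return k / k.sum()

# Kernel-based value iteration: with v_j = V(s'_j), iterate
#   v <- K (R + gamma v),  where K[j, i] is the weight of s'_j on s_i.
# K is row-stochastic, so the map is a gamma-contraction.
K = np.array([kernel_weights(sp) for sp in S_next])
v = np.zeros(N)
for _ in range(300):
    v = K @ (R + gamma * v)

def value(s):                            # V at an arbitrary query state
    return kernel_weights(s) @ (R + gamma * v)

print(round(value(0.0), 3))
```

Because the composite map is a gamma-contraction, the iterates converge to the unique fixed point regardless of initialisation.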
Robust strategies for managing rangelands with multiple stable attractors
 Journal of Environmental Economics and Management
, 2004
"... Savanna rangelands are characterized by dynamic interactions between grass, shrubs, fire and livestock driven by highly variable rainfall. When the livestock are grazers (only or preferentially eating grass) the desirable state of the system is dominated by grass, with scattered trees and shrubs. Ho ..."
Abstract

Cited by 14 (1 self)
Savanna rangelands are characterized by dynamic interactions between grass, shrubs, fire and livestock driven by highly variable rainfall. When the livestock are grazers (only or preferentially eating grass), the desirable state of the system is dominated by grass, with scattered trees and shrubs. However, the system can have multiple stable attractors, and a perturbation such as a drought can cause it to move from such a desired configuration into one that is dominated by shrubs with very little grass. In this paper, using the rangelands of New South Wales in Australia as an example, we provide a methodology to find robust management strategies in the context of this complex ecological system driven by stochastic rainfall events. The control variables are sheep density and the degree of fire suppression. By comparing the optimal solution where it is assumed the manager has perfect knowledge and foresight of rainfall conditions with one where the rainfall variability is ignored, we found that rainfall variability and the related uncertainty lead to a reduction of the possible expected returns from grazing activity by 33%. Using a genetic algorithm, we develop robust management strategies for highly variable rainfall that more than double expected returns compared to those obtained under variable rainfall using an optimal solution based on average rainfall (i.e., where the manager ignores rainfall variability).
A time aggregation approach to Markov decision processes
 Automatica
, 2002
"... Abstract We propose a time aggregation approach for the solution of infinite horizon average cost Markov decision processes via policy iteration. In this approach, policy update is only carried out when the process visits a subset of the state space. As in state aggregation, this approach leads to a ..."
Abstract

Cited by 13 (4 self)
We propose a time aggregation approach for the solution of infinite horizon average cost Markov decision processes via policy iteration. In this approach, policy update is only carried out when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which may lead to a substantial reduction in computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of the Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Estimation algorithms based on a single sample path are developed that allow the time aggregation approach to be implemented online for practical systems. Some numerical and simulation examples are presented to illustrate the ideas and the potential computational savings.
A Parallel Cutting-Plane Algorithm for the Vehicle Routing Problem With Time Windows
, 1999
"... In the vehicle routing problem with time windows a number of identical vehicles must be routed to and from a depot to cover a given set of customers, each of whom has a specified time interval indicating when they are available for service. Each customer also has a known demand, and a vehicle may on ..."
Abstract

Cited by 13 (1 self)
In the vehicle routing problem with time windows, a number of identical vehicles must be routed to and from a depot to cover a given set of customers, each of whom has a specified time interval indicating when they are available for service. Each customer also has a known demand, and a vehicle may only serve the customers on a route if the total demand does not exceed the capacity of the vehicle. The most effective solution method proposed to date for this problem is due to Kohl, Desrosiers, Madsen, Solomon, and Soumis. Their algorithm uses a cutting-plane approach followed by a branch-and-bound search with column generation, where the columns of the LP relaxation represent routes of individual vehicles. We describe a new implementation of their method, using Karger's randomized minimum-cut algorithm to generate cutting planes. The standard benchmark in this area is a set of 87 problem instances generated in 1984 by M. Solomon; making use of parallel processing in both the cutting-pla...