Finite-Time Bounds for Fitted Value Iteration
Cited by 17 (2 self)
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence rate results obtained allow us to show that both versions of FVI are well behaved in the sense that by using a sufficiently large number of samples for a large class of MDPs, arbitrarily good performance can be achieved with high probability. An important feature of our proof technique is that it permits the study of weighted L^p-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds show a dependence on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation power of the function space, the inherent Bellman residual, which reflects how well the function space is “aligned” with the dynamics and rewards of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings.
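To make the idea of sampling-based fitted value iteration concrete, here is a minimal sketch under loudly stated assumptions: the toy MDP, the constants, and every function name below are hypothetical and are not taken from the paper. The function class is piecewise constant over bins, so the least-squares "fit" reduces to a per-bin mean, and fresh samples are drawn at every iteration. It illustrates the general scheme only, not the authors' exact procedure or their bounds.

```python
import random

GAMMA = 0.9            # discount factor (assumed)
ACTIONS = (-0.1, 0.1)  # two hypothetical actions: step left or right
N_BINS = 10            # piecewise-constant function class over [0, 1]


def generative_model(x, a, rng):
    """Hypothetical toy MDP: noisy walk on [0, 1]; reward equals the next state."""
    nxt = min(1.0, max(0.0, x + a + rng.gauss(0.0, 0.02)))
    return nxt, nxt


def bin_of(x):
    """Index of the bin that contains state x."""
    return min(N_BINS - 1, int(x * N_BINS))


def backup(x, a, values, n_next, rng):
    """Monte Carlo estimate of the Bellman backup for action a at state x."""
    total = 0.0
    for _ in range(n_next):
        nxt, reward = generative_model(x, a, rng)
        total += reward + GAMMA * values[bin_of(nxt)]
    return total / n_next


def fitted_value_iteration(n_states=200, n_next=5, n_iters=50, seed=0):
    rng = random.Random(seed)
    values = [0.0] * N_BINS  # V_0 = 0
    for _ in range(n_iters):
        sums = [0.0] * N_BINS
        counts = [0] * N_BINS
        for _ in range(n_states):
            x = rng.random()  # base point sampled uniformly from [0, 1)
            target = max(backup(x, a, values, n_next, rng) for a in ACTIONS)
            sums[bin_of(x)] += target
            counts[bin_of(x)] += 1
        # Least-squares fit over piecewise-constant functions: the bin mean.
        # A bin with no samples keeps its previous value.
        values = [sums[b] / counts[b] if counts[b] else values[b]
                  for b in range(N_BINS)]
    return values
```

Calling `fitted_value_iteration()` returns one value per bin. Swapping the bin mean for another regression method changes the function class without changing the outer loop.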
A Dynamic Oligopoly Game of the US Airline Industry: Estimation and Policy Experiments
2009
Cited by 16 (3 self)
This paper estimates the contribution of demand, cost and strategic factors to explain why most companies in the US airline industry operate using a hub-spoke network. We postulate and estimate a dynamic oligopoly model where airline companies decide, every quarter, which routes (directional city-pairs) to operate, the type of product (direct flight vs. stop-flight), and the fare of each route-product. The model incorporates three factors which may contribute to the profitability of hub-spoke networks. First, consumers may value the scale of operation of an airline in the origin and destination airports (e.g., more convenient checking-in and landing facilities). Second, operating costs and entry costs may depend on the airline’s network because of economies of density and scale. And third, a hub-spoke network may be a strategy to deter the entry of non-hub-spoke carriers in some routes. We estimate our dynamic oligopoly model using panel data from the Airline Origin and Destination Survey with information on quantities, prices, and entry and exit decisions for every airline company over more than two thousand city-pair markets and several years. Demand and variable cost parameters are estimated using demand equations and Nash-Bertrand equilibrium conditions for prices. In a second step, we estimate fixed operating costs and sunk costs from the dynamic entry-exit game. Counterfactual experiments show that hub-size effects on entry costs are, by far, the most important factor to explain hub-spoke networks. Strategic entry deterrence is also significant and more important to explain hub-spoke networks than hub-size effects on demand, variable costs or fixed costs.
Random sampling of states in dynamic programming
in Proc. NIPS Conf., 2007
Cited by 15 (4 self)
We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding steady-state policies for deterministic time-invariant discrete-time control problems with continuous states and actions often found in robotics. In this paper, we describe our approach and provide initial results on several simulated robotics problems. Index Terms: Dynamic programming, optimal control, random sampling.
Estimating semiparametric ARCH(∞) models by kernel smoothing methods
Econometrica, 2005
"... Contents: ..."
Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice
Advances in Neural Information Processing Systems, 2000
Cited by 14 (2 self)
Many approaches to reinforcement learning combine neural networks or other parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures is that the resulting learning algorithms are frequently unstable.
The browser war - econometric analysis of Markov perfect equilibrium in markets with network effects
TRIAL EXHIBIT: MICROSOFT OEM SALES FY ’98 MIDYEAR REVIEW (GX 421) IN UNITED STATES V. MICROSOFT CORPORATION, CIVIL ACTION NO, 2004
Cited by 13 (1 self)
When demands for heterogeneous goods in a concentrated market shift over time due to network or contagion effects, forward-looking firms consider the strategic impact of investment, pricing, and other conduct. Network effects may be a substantial barrier to entry, giving both entrants and incumbents powerful strategic incentives to “tip” the market. A Markov perfect equilibrium model captures this strategic behavior, and permits the comparison of “as is” market trajectories with “but for” trajectories under counterfactuals where “bad acts” by some firms are eliminated. Our analysis is applied to a stylized description of the browser war between Netscape and Microsoft. Appendices give conditions for econometric identification and estimation of a Markov perfect equilibrium model from observations on partial trajectories, and discuss estimation of the impacts of
A Parallel Cutting-Plane Algorithm for the Vehicle Routing Problem With Time Windows
1999
Cited by 11 (1 self)
In the vehicle routing problem with time windows a number of identical vehicles must be routed to and from a depot to cover a given set of customers, each of whom has a specified time interval indicating when they are available for service. Each customer also has a known demand, and a vehicle may only serve the customers on a route if the total demand does not exceed the capacity of the vehicle. The most effective solution method proposed to date for this problem is due to Kohl, Desrosiers, Madsen, Solomon, and Soumis. Their algorithm uses a cutting-plane approach followed by a branch-and-bound search with column generation, where the columns of the LP relaxation represent routes of individual vehicles. We describe a new implementation of their method, using Karger's randomized minimum-cut algorithm to generate cutting planes. The standard benchmark in this area is a set of 87 problem instances generated in 1984 by M. Solomon; making use of parallel processing in both the cutting-pla...
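For readers unfamiliar with the randomized minimum-cut idea mentioned in this abstract, the following is a minimal sketch of the basic random-contraction scheme on an unweighted graph. The abstract does not say which variant the authors use or how cuts become cutting planes, so the function names, the example graph, and the trial count here are all assumptions for illustration only.

```python
import random


def contract_to_two(edges, n_vertices, rng):
    """One contraction trial: merge random edges until two super-nodes remain.

    Processing the edges in a uniformly random order and skipping edges whose
    endpoints are already merged is equivalent to repeatedly contracting a
    uniformly random remaining edge.
    """
    parent = list(range(n_vertices))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    remaining = n_vertices
    order = list(edges)
    rng.shuffle(order)
    for u, v in order:
        if remaining == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            remaining -= 1
    # Edges whose endpoints ended up in different super-nodes form the cut.
    return sum(1 for u, v in edges if find(u) != find(v))


def karger_min_cut(edges, n_vertices, trials=200, seed=0):
    """Smallest cut seen over repeated independent trials (hypothetical helper)."""
    rng = random.Random(seed)
    return min(contract_to_two(edges, n_vertices, rng) for _ in range(trials))


# Hypothetical example: two triangles joined by the single edge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
```

Each trial returns the size of some valid cut, so the minimum over trials never falls below the true minimum cut; repeating trials raises the probability of actually finding it.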
On the design of globally optimal communication strategies for real-time communication systems with noisy feedback
IEEE J. SELECT. AREAS COMMUN, 2008
Cited by 10 (5 self)
A real-time communication system with noisy feedback is considered. The system consists of a Markov source, forward and backward discrete memoryless channels, and a receiver with limited memory. The receiver can send messages to the encoder over the backward noisy channel. The encoding at the encoder and the decoding, the feedback, and the memory update at the receiver must be done in real time. A distortion metric that does not tolerate delays is given. The objective is to design an optimal real-time communication strategy, i.e., design optimal real-time encoding, decoding, feedback, and memory update strategies to minimize a total expected distortion over a finite horizon. This problem is formulated as a decentralized stochastic optimization problem and a methodology for its sequential decomposition is presented. This results in a set of nested optimality equations that can be used to sequentially determine optimal communication strategies. The methodology exponentially simplifies the search for determining an optimal real-time communication strategy. Index Terms: Markov decision processes, real-time communication, noisy feedback, dynamic teams, information state, common knowledge, common belief.
Randomly sampling actions in dynamic programming
in Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007
Cited by 10 (4 self)
We describe an approach towards reducing the curse of dimensionality for deterministic dynamic programming with continuous actions by randomly sampling actions while computing a steady-state value function and policy. This approach results in globally optimized actions, without searching over a discretized multidimensional grid. We present results on finding time-invariant control laws for two-, four-, and six-dimensional deterministic swing-up problems with up to 480 million discretized states.
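The idea of sampling actions instead of searching an action grid can be sketched as follows. This is an illustration under assumptions, not the authors' algorithm: the one-dimensional dynamics, the cost, the nearest-neighbor lookup, and the rule "keep the sampled action only if its backup beats the stored action" are all hypothetical choices made for this example.

```python
import random


def sampled_action_dp(n_states=21, sweeps=200, gamma=0.95, seed=0):
    """Value iteration on a state grid with one random candidate action per
    state per sweep, instead of a search over a discretized action grid."""
    rng = random.Random(seed)
    xs = [-1.0 + 2.0 * i / (n_states - 1) for i in range(n_states)]
    value = [0.0] * n_states
    policy = [0.0] * n_states  # stored best-so-far action for each state

    def nearest(x):
        # Index of the grid point closest to x, after clipping to [-1, 1].
        x = min(1.0, max(-1.0, x))
        return round((x + 1.0) / 2.0 * (n_states - 1))

    def backup(i, a):
        nxt = xs[i] + 0.2 * a               # hypothetical deterministic dynamics
        cost = xs[i] ** 2 + 0.1 * a ** 2    # hypothetical one-step cost
        return cost + gamma * value[nearest(nxt)]

    for _ in range(sweeps):
        for i in range(n_states):
            candidate = rng.uniform(-1.0, 1.0)  # continuous action in [-1, 1]
            best_a = policy[i]
            if backup(i, candidate) < backup(i, best_a):
                best_a = candidate
            policy[i] = best_a
            value[i] = backup(i, best_a)
    return xs, value, policy
```

The per-sweep work grows with the number of states rather than with the size of an action grid, which is the trade-off this kind of sampling aims for.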
A time aggregation approach to Markov decision processes
Automatica, 2002
Cited by 9 (1 self)
We propose a time aggregation approach for the solution of infinite horizon average cost Markov decision processes via policy iteration. In this approach, the policy update is carried out only when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which may lead to a substantial reduction in computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of the Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single sample path-based estimation algorithms are developed that allow the time aggregation approach to be implemented online for practical systems. Some numerical and simulation examples are presented to illustrate the ideas and potential computational savings.