A Parallel Cutting-Plane Algorithm for the Vehicle Routing Problem With Time Windows
, 1999
Cited by 8 (1 self)
In the vehicle routing problem with time windows, a number of identical vehicles must be routed to and from a depot to cover a given set of customers, each of whom has a specified time interval indicating when they are available for service. Each customer also has a known demand, and a vehicle may only serve the customers on a route if the total demand does not exceed the capacity of the vehicle. The most effective solution method proposed to date for this problem is due to Kohl, Desrosiers, Madsen, Solomon, and Soumis. Their algorithm uses a cutting-plane approach followed by a branch-and-bound search with column generation, where the columns of the LP relaxation represent routes of individual vehicles. We describe a new implementation of their method, using Karger's randomized minimum-cut algorithm to generate cutting planes. The standard benchmark in this area is a set of 87 problem instances generated in 1984 by M. Solomon; making use of parallel processing in both the cutting-pla...
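The cutting planes above come from minimum cuts. As a rough illustration of Karger's randomized contraction algorithm on its own (the abstract does not say how the authors turn cuts into cutting planes, so that step is not shown), here is a minimal Python sketch. The function names and the edge-list input format are our own assumptions, and the graph is assumed to be a connected multigraph.

```python
import math
import random


def _find(parent, v):
    # Union-find "find" with path halving.
    while parent[v] != v:
        parent[v] = parent[parent[v]]
        v = parent[v]
    return v


def contract_once(n, edges, rng):
    """One run of Karger's contraction: merge random edges until two super-vertices remain.

    Assumes a connected multigraph on vertices 0..n-1 given as a list of (u, v) edges.
    """
    parent = list(range(n))
    components = n
    while components > 2:
        u, v = edges[rng.randrange(len(edges))]
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # an edge that became a self-loop is rejected and redrawn
            parent[ru] = rv
            components -= 1
    # Cut size = number of original edges whose endpoints ended up on different sides.
    return sum(1 for u, v in edges if _find(parent, u) != _find(parent, v))


def karger_min_cut(n, edges, trials=None, seed=0):
    """Repeat the contraction and keep the smallest cut found (a Monte Carlo algorithm)."""
    rng = random.Random(seed)
    if trials is None:
        # About C(n, 2) * ln(n) trials keep the failure probability at most about 1/n.
        trials = math.ceil(n * (n - 1) / 2 * math.log(n)) + 1
    return min(contract_once(n, edges, rng) for _ in range(trials))
```

Each run finds a particular minimum cut with probability at least 2/(n(n-1)), so repetition is what makes the result reliable. For example, two triangles joined by a single edge have a minimum cut of 1, which the repeated contraction recovers with high probability.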
Finite-Time Bounds for Fitted Value Iteration
Cited by 8 (2 self)
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results come in the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence rate results obtained allow us to show that both versions of FVI are well-behaved in the sense that, for a large class of MDPs, arbitrarily good performance can be achieved with high probability by using a sufficiently large number of samples. An important feature of our proof technique is that it permits the study of weighted $L^p$-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds show a dependence on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation power of the function space, the inherent Bellman residual, which reflects how well the function space is “aligned” with the dynamics and rewards of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings.
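To make the loop being analysed concrete, here is a minimal Python sketch of sampling-based FVI with a generative model: sample base states, estimate each Bellman backup by averaging a few sampled transitions per action, then fit the targets by least squares. The 1-D state space [0, 1], the quadratic basis, uniform state sampling, and the names `features` and `sampled_fvi` are all hypothetical choices for illustration, not the paper's setup.

```python
import numpy as np


def features(x):
    # Hypothetical quadratic basis [1, x, x^2] for a 1-D state space [0, 1].
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)


def sampled_fvi(generative_model, actions, n_states, n_next, gamma, n_iters, seed=0):
    """Sampling-based fitted value iteration with a linear-in-features approximator.

    generative_model(xs, a, rng) -> (next_states, rewards) draws one transition for
    every state in xs under action a (the generative-model assumption).
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(features(0.0).shape[-1])
    for _ in range(n_iters):
        xs = rng.uniform(0.0, 1.0, size=n_states)        # fresh base points each iteration
        targets = np.full(n_states, -np.inf)
        for a in actions:
            q = np.zeros(n_states)
            for _ in range(n_next):                        # Monte Carlo estimate of the backup
                x_next, r = generative_model(xs, a, rng)
                q += r + gamma * (features(x_next) @ theta)
            targets = np.maximum(targets, q / n_next)      # max over actions (reward maximization)
        theta, *_ = np.linalg.lstsq(features(xs), targets, rcond=None)  # least-squares fit
    return theta
```

As a sanity check, a model that always pays reward 1 with gamma = 0.9 should yield a fitted value close to 1 / (1 - 0.9) = 10 everywhere, because the constant function lies in the span of the basis.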
Random sampling of states in dynamic programming
- in Proc. NIPS Conf., 2007
Cited by 8 (1 self)
We combine three threads of research on approximate dynamic programming: sparse random sampling of states, value function and policy approximation using local models, and using local trajectory optimizers to globally optimize a policy and associated value function. Our focus is on finding steady-state policies for deterministic, time-invariant, discrete-time control problems with continuous states and actions, as often found in robotics. In this paper, we describe our approach and provide initial results on several simulated robotics problems. Index Terms: Dynamic programming, optimal control, random sampling.
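The first of the three threads, sparse random sampling of states, can be illustrated on its own. The Python sketch below runs value iteration only at randomly sampled states of a toy deterministic 1-D problem, using the value of the nearest sampled state as a stand-in for a local model. The dynamics x' = clip(x + u, -1, 1), the cost x² + u², the action grid, and the function name are hypothetical; the paper's local models and trajectory optimizers are not reproduced here.

```python
import numpy as np


def sampled_state_value_iteration(n_samples=200, gamma=0.95, n_iters=200, seed=0):
    """Value iteration on randomly sampled states of a toy deterministic 1-D problem.

    Assumed toy problem: x' = clip(x + u, -1, 1), per-step cost x^2 + u^2, with u taken
    from a grid on [-0.2, 0.2]. The "local model" is simply the nearest sampled state.
    """
    rng = np.random.default_rng(seed)
    states = rng.uniform(-1.0, 1.0, size=n_samples)          # sparse random state samples
    actions = np.linspace(-0.2, 0.2, 9)                       # coarse grid over a continuous action range
    next_states = np.clip(states[:, None] + actions[None, :], -1.0, 1.0)
    cost = states[:, None] ** 2 + actions[None, :] ** 2
    # Nearest sampled state for every successor; fixed because the dynamics are deterministic.
    nearest = np.abs(next_states[..., None] - states).argmin(axis=-1)
    values = np.zeros(n_samples)
    for _ in range(n_iters):
        values = (cost + gamma * values[nearest]).min(axis=1)  # Bellman backup, cost minimization
    return states, values
```

States near the origin end up with low cost-to-go and states near the boundary with high cost-to-go. Replacing the nearest-neighbour lookup with a richer local model is where the other two threads would enter.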
The browser war -- econometric analysis of Markov perfect equilibrium in markets with network effects
- TRIAL EXHIBIT: MICROSOFT OEM SALES FY ’98 MID-YEAR REVIEW (GX 421) IN UNITED STATES V. MICROSOFT CORPORATION, CIVIL ACTION NO
, 2004
Cited by 7 (1 self)
When demands for heterogeneous goods in a concentrated market shift over time due to network or contagion effects, forward-looking firms consider the strategic impact of investment, pricing, and other conduct. Network effects may be a substantial barrier to entry, giving both entrants and incumbents powerful strategic incentives to “tip” the market. A Markov perfect equilibrium model captures this strategic behavior, and permits the comparison of “as is” market trajectories with “but for” trajectories under counterfactuals where “bad acts” by some firms are eliminated. Our analysis is applied to a stylized description of the browser war between Netscape and Microsoft. Appendices give conditions for econometric identification and estimation of a Markov perfect equilibrium model from observations on partial trajectories, and discuss estimation of the impacts of
Efficient Approximate Planning in Continuous Space Markovian Decision Problems
, 2001
Cited by 7 (2 self)
In this paper we consider another on-line planning algorithm that will be shown to scale polynomially with the horizon-time, as well. The price of this is that we have to assume more regularity on the MDPs we consider. In particular, we will restrict ourselves to stochastic MDPs with finite action spaces and a continuous state space and, more importantly, assume that the transition probability kernel of the MDPs is subject to the Lipschitz condition

$$|p(x' \mid x_1, a) - p(x' \mid x_2, a)| \le L_p \|x_1 - x_2\|_1$$

for any states $x_1, x_2, x'$ and action $a \in A$. Here $L_p > 0$ is a given fixed number and $\|\cdot\|_1$ denotes the $\ell_1$ norm of vectors. Another restriction (quite common in the literature) that we will assume is the uniform boundedness of the transition probabilities (the bound shall be denoted by $K_p$) and of the immediate rewards (bound denoted by $K_r$). Further, our bounds will depend on the dimension of the state space, $d$
Performance loss bounds for approximate value iteration with state aggregation
- Mathematics of Operations Research
, 2005
Cited by 6 (1 self)
We consider approximate value iteration with a parameterized approximator in which the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performance loss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to using invariant distributions of appropriate policies as projection weights. Such projection weighting relates to what is done by temporal-difference learning. Our analysis also leads to the first performance loss bound for approximate value iteration with an average-cost objective. Key words: approximate value iteration; state aggregation; temporal-difference learning
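The approximator described above (one constant per partition, fitted with projection weights) can be sketched in a few lines of Python for a small finite, discounted, cost-minimizing MDP. The array layout, the function name, and the discounted setting are assumptions for illustration; the weights are an input here, and the abstract's suggestion of using invariant distributions of appropriate policies is left to the caller.

```python
import numpy as np


def aggregated_value_iteration(P, g, gamma, partition, weights, n_iters=500):
    """Approximate value iteration with piecewise-constant (state-aggregation) cost-to-go.

    P: (A, S, S) transition matrices, g: (A, S) per-stage costs,
    partition: length-S integer array mapping each state to a cluster index,
    weights: length-S positive projection weights (e.g. an invariant distribution).
    """
    n_clusters = partition.max() + 1
    den = np.bincount(partition, weights=weights, minlength=n_clusters)
    theta = np.zeros(n_clusters)
    for _ in range(n_iters):
        J = theta[partition]                                 # piecewise-constant cost-to-go
        TJ = (g + gamma * P @ J).min(axis=0)                 # Bellman backup (cost minimization)
        num = np.bincount(partition, weights=weights * TJ, minlength=n_clusters)
        theta = num / den                                    # weighted projection onto the partition
    return theta
```

Because weighted averaging within each cluster never expands sup-norm distances, the iteration is a gamma-contraction and converges to a unique fixed point. With singleton clusters it reduces to ordinary value iteration.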
On-Line Sampling-Based Control For Network Queueing Problems
, 2001
Cited by 5 (2 self)
A time aggregation approach to Markov decision processes
- Automatica
, 2002
Cited by 5 (0 self)
We propose a time aggregation approach for the solution of infinite-horizon average-cost Markov decision processes via policy iteration. In this approach, the policy update is carried out only when the process visits a subset of the state space. As in state aggregation, this approach leads to a reduced state space, which may lead to a substantial reduction in computational and storage requirements, especially for problems with certain structural properties. However, in contrast to state aggregation, which generally results in an approximate model due to the loss of the Markov property, time aggregation suffers no loss of accuracy, because the Markov property is preserved. Single-sample-path-based estimation algorithms are developed that allow the time aggregation approach to be implemented online for practical systems. Some numerical and simulation examples are presented to illustrate the ideas and potential computational savings.
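Why time aggregation preserves the Markov property can be seen for a single fixed policy. Observing the chain only on visits to a subset A gives another Markov chain with transition matrix P_AA + P_AB (I - P_BB)^-1 P_BA, and the renewal-reward theorem recovers the exact average cost from the expected cost and length of the segments between visits. The Python sketch below shows only this relation. The paper's policy-iteration updates and sample-path estimators are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def embedded_chain(P, c, subset):
    """Chain observed only on visits to `subset`, under one fixed policy.

    P: (S, S) transition matrix, c: (S,) per-step costs. Returns the embedded
    transition matrix, expected cost per segment, and expected segment length.
    """
    S = P.shape[0]
    mask = np.zeros(S, dtype=bool)
    mask[subset] = True
    A, B = np.flatnonzero(mask), np.flatnonzero(~mask)
    P_AA, P_AB = P[np.ix_(A, A)], P[np.ix_(A, B)]
    P_BA, P_BB = P[np.ix_(B, A)], P[np.ix_(B, B)]
    N = np.linalg.inv(np.eye(len(B)) - P_BB)   # expected visits to B before returning to A
    P_emb = P_AA + P_AB @ N @ P_BA              # still Markov: no accuracy is lost
    R = c[A] + P_AB @ N @ c[B]                  # expected cost accumulated per segment
    T = 1.0 + P_AB @ N @ np.ones(len(B))        # expected number of steps per segment
    return P_emb, R, T


def stationary(P):
    # Stationary distribution: left eigenvector of P for eigenvalue 1, normalized to sum 1.
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()
```

With `pi_e = stationary(P_emb)`, the ratio `(pi_e @ R) / (pi_e @ T)` equals the full chain's average cost `stationary(P) @ c`, even though only the states in the subset are ever examined.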
Robust Combination of Local Controllers
- Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI-01)
, 2001
Cited by 4 (0 self)
Finding solutions to high-dimensional Markov Decision Processes (MDPs) is a difficult problem, especially in the presence of uncertainty or if the actions and time measurements are continuous. Frequently this difficulty can be alleviated by the availability of problem-specific knowledge. For example, it may be relatively easy to design controllers that are good locally, though having no global guarantees. We propose a nonparametric method to combine these local controllers to obtain globally good solutions. We apply this formulation to two types of problems: motion planning (stochastic shortest path problems) and discounted-cost MDPs. For motion planning, we argue that only considering the expected cost of a path may be overly simplistic in the presence of uncertainty. We propose an alternative: finding the minimum-cost path, subject to the constraint that the robot must reach the goal with high probability. For this problem, we prove that a polynomial number of samples is sufficient to obtain a high-probability path. For discounted MDPs, we consider various problem formulations that explicitly deal with model uncertainty. We provide empirical evidence of the usefulness of these approaches using the control of a robot arm.
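One way to see why a modest number of samples can certify a high-probability path is a standard Hoeffding argument. The number of rollouts needed to estimate a path's success probability to within epsilon, with confidence 1 - alpha, does not depend on the dimension of the state space. The Python sketch below uses that argument to accept or reject a candidate path. It is not the paper's bound or algorithm, and `simulate_path` is a hypothetical stochastic rollout supplied by the caller.

```python
import math
import random


def hoeffding_rollouts(epsilon, alpha):
    """Rollouts needed so the empirical success rate is within epsilon of the true
    rate with probability at least 1 - alpha (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / alpha) / (2.0 * epsilon ** 2))


def accept_path(simulate_path, p_min, epsilon=0.05, alpha=0.05, seed=0):
    """Accept a candidate path if a Hoeffding lower confidence bound on its
    probability of reaching the goal is at least p_min.

    simulate_path(rng) -> bool is a hypothetical stochastic rollout of the path.
    """
    rng = random.Random(seed)
    n = hoeffding_rollouts(epsilon, alpha)
    successes = sum(bool(simulate_path(rng)) for _ in range(n))
    return successes / n - epsilon >= p_min
```

For epsilon = alpha = 0.05 this asks for 738 rollouts per candidate path. Among accepted candidates, one would then keep the path with the lowest cost.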
ACTIVE LEARNING IN REGRESSION, WITH APPLICATION TO STOCHASTIC DYNAMIC PROGRAMMING
Cited by 4 (1 self)
We study active learning as a derandomized form of sampling. We show that full derandomization is not suitable in a robust framework, propose partially derandomized samplings, and develop new active learning methods (i) in which expert knowledge is easy to integrate, (ii) with a parameter for the exploration/exploitation dilemma, and (iii) that are less randomized than full-random sampling (yet also not deterministic). Experiments are performed in the case of regression for value-function learning on a continuous domain. Our main results are (i) efficient partially derandomized point sets, (ii) moderate-derandomization theorems, (iii) experimental evidence of the importance of the frontier, and (iv) a new regression-specific, user-friendly sampling tool that is less robust than blind samplers but sometimes works very efficiently in large dimensions. All experiments can be reproduced by downloading the source code and running the provided command line.
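A common way to obtain point sets that sit between full randomness and full determinism is to apply a random shift (a Cranley-Patterson rotation) to a low-discrepancy Halton sequence. The points keep the even coverage of the deterministic sequence, while the shift supplies the remaining randomness. The Python sketch below is offered only as an illustration of partial derandomization. It is not necessarily the sampler the authors use, and the function names are hypothetical.

```python
import random

_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]


def radical_inverse(i, base):
    # Mirror the base-`base` digits of i around the radix point, e.g. 3 = 11_2 -> 0.11_2 = 0.75.
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def shifted_halton(n, dim, seed=0):
    """n points in [0, 1)^dim: a Halton sequence with one random shift per coordinate."""
    if dim > len(_PRIMES):
        raise ValueError("this sketch supports at most 10 dimensions")
    rng = random.Random(seed)
    shift = [rng.random() for _ in range(dim)]   # the only random ingredient
    return [
        [(radical_inverse(i + 1, p) + s) % 1.0 for p, s in zip(_PRIMES[:dim], shift)]
        for i in range(n)
    ]
```

Setting the shift to zero gives a fully deterministic sampler, while replacing the Halton coordinates with independent uniforms gives a fully random one. This sketch sits between the two extremes.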

