Planning and acting in partially observable stochastic domains
 ARTIFICIAL INTELLIGENCE
, 1998
Abstract

Cited by 822 (31 self)
In this paper, we bring techniques from operations research to bear on the problem of choosing optimal actions in partially observable stochastic domains. We begin by introducing the theory of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). We then outline a novel algorithm for solving POMDPs offline and show how, in some cases, a finite-memory controller can be extracted from the solution to a POMDP. We conclude with a discussion of how our approach relates to previous work, the complexity of finding exact solutions to POMDPs, and some possibilities for finding approximate solutions.
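The belief state is the central object in the POMDP theory the abstract introduces: the agent keeps a probability distribution over hidden states and revises it by Bayes' rule after each action and observation. The sketch below shows that update with NumPy. The array layout `T[a, s, s2]` / `O[a, s2, o]`, the function name `belief_update`, and the toy parameters are assumptions for illustration, not the paper's notation or code.

```python
import numpy as np


def belief_update(b, a, o, T, O):
    """Bayes-filter update of a POMDP belief state (illustrative sketch).

    b:           current belief over states, shape (S,)
    T[a, s, s2]: probability of moving from s to s2 under action a
    O[a, s2, o]: probability of observing o after action a lands in s2
    Returns b'(s2) proportional to O(s2, a, o) * sum_s T(s, a, s2) * b(s).
    """
    predicted = b @ T[a]                 # one-step prediction: sum_s b(s) T(s, a, s2)
    unnormalized = O[a, :, o] * predicted
    evidence = unnormalized.sum()        # Pr(o | b, a)
    if evidence == 0:
        raise ValueError("observation o has zero probability under belief b and action a")
    return unnormalized / evidence


# Toy two-state example (assumed parameters): one "listen" action that leaves the
# state unchanged and reports the true state with probability 0.85.
T = np.array([[[1.0, 0.0], [0.0, 1.0]]])
O = np.array([[[0.85, 0.15], [0.15, 0.85]]])
print(belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O))
```

Starting from a uniform belief, one observation that favors state 0 moves the belief to (0.85, 0.15); repeating the same observation keeps pushing it toward state 0.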
Communication over fading channels with delay constraints
 IEEE Transactions on Information Theory
, 2002
Abstract

Cited by 170 (7 self)
We consider a user communicating over a fading channel with perfect channel state information. Data is assumed to arrive from some higher layer application and is stored in a buffer until it is transmitted. We study adapting the user's transmission rate and power based on the channel state information as well as the buffer occupancy; the objectives are to regulate both the long-term average transmission power and the average buffer delay incurred by the traffic. Two models for this situation are discussed: one corresponding to fixed-length/variable-rate codewords and one corresponding to variable-length codewords. The tradeoff between the average delay and the average transmission power required for reliable communication is analyzed. A dynamic programming formulation is given to find all Pareto optimal power/delay operating points. We then quantify the behavior of this tradeoff in the regime of asymptotically large delay. In this regime we characterize simple buffer control policies which exhibit optimal characteristics. Connections to the delay-limited capacity and the expected capacity of fading channels are also discussed.
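The abstract mentions a dynamic programming formulation for power/delay operating points. A minimal sketch of that idea, under heavily simplified assumptions, is value iteration over (buffer occupancy, channel state) with a per-slot cost that adds transmission power to a weighted delay proxy. Everything here is hypothetical: the power model `(2**u - 1) / gain`, single-packet Bernoulli arrivals, the buffer cap `Q`, the i.i.d. two-level channel, and the use of a discounted cost (the abstract speaks of long-term averages) are choices made only to keep the example short.

```python
import numpy as np


def buffer_power_value_iteration(beta, Q=10, gains=(0.5, 2.0), gain_probs=(0.5, 0.5),
                                 p_arrival=0.6, gamma=0.95, tol=1e-9, max_iters=5000):
    """Discounted value iteration for a toy buffer/power control problem (hypothetical model).

    State (q, g): q packets in the buffer, g indexes an i.i.d. channel gain.
    Action u in {0, ..., q}: packets sent this slot.
    Per-slot cost: (2**u - 1) / gain  (power to send u packets)  +  beta * q  (delay proxy).
    At most one packet arrives per slot, with probability p_arrival; the buffer is capped at Q.
    """
    G = len(gains)
    probs = np.asarray(gain_probs)
    V = np.zeros((Q + 1, G))
    policy = np.zeros((Q + 1, G), dtype=int)
    for _ in range(max_iters):
        EV = V @ probs  # EV[q] = E_{g'} V(q, g'), since the next gain is drawn independently
        V_new = np.empty_like(V)
        for q in range(Q + 1):
            for g, h in enumerate(gains):
                costs = []
                for u in range(q + 1):
                    rem = q - u
                    # Expected continuation value after a possible arrival (capped at Q)
                    future = p_arrival * EV[min(rem + 1, Q)] + (1 - p_arrival) * EV[rem]
                    costs.append((2.0 ** u - 1.0) / h + beta * q + gamma * future)
                policy[q, g] = int(np.argmin(costs))
                V_new[q, g] = min(costs)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, policy
```

Re-running with different values of `beta` and measuring the average power and queue length of each returned policy (for example by simulation) would trace operating points of the kind the abstract describes; a weighted-sum scan like this finds only the operating points that some weighting makes optimal.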
Monetary Policy Evaluation with Noisy Information
, 1998
Abstract

Cited by 132 (23 self)
This paper investigates the implications of noisy information regarding the measurement of economic activity for the evaluation of monetary policy. A common implicit assumption in such evaluations is that policymakers observe the current state of the economy promptly and accurately and can therefore adjust policy based on this information. However, in reality, decisions are made in real time when there is considerable uncertainty about the true state of affairs in the economy. Policy must be made with partial information. Using a simple model of the U.S. economy, I show that failing to account for the actual level of information noise in the historical data provides a seriously distorted picture of feasible macroeconomic outcomes and produces inefficient policy rules. Naive adoption of policies identified as efficient when such information noise is ignored results in macroeconomic performance worse than actual experience. When the noise content of the data is properly taken into account, policy reactions are cautious and less sensitive to the apparent imbalances in the unfiltered data. The resulting policy prescriptions reflect the recognition that excessively activist policy can increase rather than decrease economic instability.
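The conclusion that policy should react cautiously to noisy data can be illustrated with textbook signal extraction; this is not the paper's model of the U.S. economy. If the observed gap equals the true gap plus independent zero-mean Gaussian noise, the best estimate of the true gap is the observation scaled by k = var_signal / (var_signal + var_noise), so a rule tuned to respond with coefficient phi to the true gap should respond with only phi * k to the raw data. The function names and the Gaussian assumption are illustrative.

```python
import random


def attenuation_factor(var_signal, var_noise):
    """Signal-extraction weight k = var_signal / (var_signal + var_noise).

    Assumes observed = actual + noise with independent zero-mean Gaussian terms,
    so that E[actual | observed] = k * observed.
    """
    return var_signal / (var_signal + var_noise)


def attenuated_response(phi, var_signal, var_noise):
    """Coefficient on the *observed* gap for a rule that responds with phi to the actual gap."""
    return phi * attenuation_factor(var_signal, var_noise)


def simulated_slope(var_signal, var_noise, n=100_000, seed=0):
    """Monte Carlo check: least-squares slope (no intercept) of actual on observed values."""
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        actual = rng.gauss(0.0, var_signal ** 0.5)
        observed = actual + rng.gauss(0.0, var_noise ** 0.5)
        sxy += actual * observed
        sxx += observed * observed
    return sxy / sxx
```

With equal signal and noise variances, k is 0.5, so the response to the unfiltered data is halved; with no noise the original coefficient is recovered.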
Congestion-Dependent Pricing of Network Services
 IEEE/ACM Transactions on Networking
, 1998
Abstract

Cited by 123 (0 self)
We consider a service provider (SP) who provides access to a communication network or some other form of online service. Users access the network and initiate calls that belong to a set of diverse service classes, differing in resource requirements, demand pattern, and call duration.
Optimal control of execution costs
 JOURNAL OF FINANCIAL MARKETS 1 (1998) 1–50
, 1998
Abstract

Cited by 98 (2 self)
We derive dynamic optimal trading strategies that minimize the expected cost of trading a large block of equity over a fixed time horizon. Specifically, given a fixed block $\bar{S}$ of shares to be executed within a fixed finite number of periods $T$, and given a price-impact function that yields the execution price of an individual trade as a function of the shares traded and market conditions, we obtain the optimal sequence of trades as a function of market conditions (in closed form in some cases) that minimizes the expected cost of executing $\bar{S}$ within $T$ periods. Our analysis is extended to the portfolio case, in which price impact across stocks can have an important effect on the total cost of trading a portfolio.
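For intuition about the closed-form cases, consider an assumed special case: linear permanent price impact $P_t = P_{t-1} + \theta S_t + \varepsilon_t$ with zero-mean noise and no other market information, where the $S_t$ shares bought in period $t$ pay $P_t$. The expected cost is then $P_0\bar{S} + \tfrac{\theta}{2}\bigl(\bar{S}^2 + \sum_t S_t^2\bigr)$, which is smallest when the block is split evenly, $S_t = \bar{S}/T$. The sketch computes this expected cost and confirms the even split by brute force on a small integer example; the function and parameter names are hypothetical.

```python
from itertools import product


def expected_execution_cost(schedule, p0, theta):
    """Expected cost of buying shares according to `schedule` (shares per period).

    Assumed dynamics: P_t = P_{t-1} + theta * S_t + eps_t with E[eps_t] = 0, and the
    S_t shares bought in period t pay P_t. Hence
    E[cost] = sum_t S_t * (p0 + theta * (S_1 + ... + S_t)).
    """
    cost, cumulative = 0.0, 0
    for s in schedule:
        cumulative += s
        cost += s * (p0 + theta * cumulative)
    return cost


def best_integer_schedule(total, periods, p0, theta):
    """Brute-force search over all integer schedules that buy exactly `total` shares."""
    best_cost, best = float("inf"), None
    for head in product(range(total + 1), repeat=periods - 1):
        last = total - sum(head)
        if last < 0:
            continue  # the first periods already bought more than the whole block
        schedule = (*head, last)
        cost = expected_execution_cost(schedule, p0, theta)
        if cost < best_cost:
            best_cost, best = cost, schedule
    return best
```

For example, buying 100 shares over 4 periods with p0 = 50 and theta = 0.01 costs 5062.5 in expectation with an even split of 25 per period, versus 5100.0 when everything is bought in the first period.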
Temporal differences-based policy iteration and applications in neuro-dynamic programming
, 1996
Approximate Dynamic Programming For Sensor Management
, 1997
Abstract

Cited by 39 (0 self)
This paper studies the dynamic scheduling of multimode sensor resources for classifying multiple unknown objects. Because of the uncertain nature of the object types, the problem is formulated as a partially observed Markov decision problem with a large state space. The paper describes a hierarchical algorithmic approach for efficient solution of sensor scheduling problems with large numbers of objects, based on a combination of stochastic dynamic programming and nondifferentiable optimization techniques. The algorithm is illustrated with an application involving classification of 10,000 unknown objects.
1 Introduction
Many modern avionics systems include multiple sensors, as well as individual sensors capable of focusing on different objects with different modes. In order to achieve the most accurate possible representation of all objects of interest, it is important to coordinate the allocation and scheduling of the different sensors and sensor modes across the diff...
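The abstract combines stochastic dynamic programming with nondifferentiable optimization. One common way such combinations work, assumed here rather than taken from the paper, is Lagrangian relaxation: a shared sensor-time budget couples the objects, pricing that budget with a multiplier λ decouples them into independent per-object choices, and λ is adjusted by a subgradient step on the (nondifferentiable) dual function. In this sketch each object's options and their values are simply given; in a full system they might come from per-object dynamic programs. The function name, step-size rule, and data are hypothetical.

```python
def lagrangian_allocation(objects, budget, steps=200, step0=1.0):
    """Allocate a shared sensor-time budget across objects by Lagrangian relaxation.

    objects: list of per-object option lists, each option a (sensor_time, value) pair;
             include a (0, 0) "do nothing" option so feasible allocations exist.
    Relaxing sum(time) <= budget with a multiplier lam splits the problem into
    independent per-object choices; lam is tuned by projected subgradient descent
    on the dual function. Returns (best feasible total value, chosen options),
    or (-inf, None) if no feasible allocation was encountered.
    """
    lam = 0.0
    best_feasible = (float("-inf"), None)
    for k in range(1, steps + 1):
        # Independent subproblems: each object maximizes value - lam * time
        choice = [max(opts, key=lambda opt: opt[1] - lam * opt[0]) for opts in objects]
        used = sum(t for t, _ in choice)
        total_value = sum(v for _, v in choice)
        if used <= budget and total_value > best_feasible[0]:
            best_feasible = (total_value, choice)
        # budget - used is a subgradient of the dual function at lam; step with size step0 / k
        lam = max(0.0, lam - (step0 / k) * (budget - used))
    return best_feasible
```

Because relaxation can leave a duality gap, the best feasible allocation found this way is a heuristic in general rather than a guaranteed optimum.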
Pricing in Multiservice Loss Networks: Static Pricing, Asymptotic Optimality, and Demand Substitution Effects
 IEEE/ACM Transactions on Networking
, 2002
Abstract

Cited by 32 (0 self)
We consider a communication network with fixed routing that can accommodate multiple service classes, differing in bandwidth requirements, demand pattern, call duration, and routing. The network charges a fee per call, which can depend on the current congestion level and which affects users' demand. Building on the single-node results of Paschalidis and Tsitsiklis, 2000, we consider both revenue and welfare maximization and show that static pricing is asymptotically optimal in a regime of many relatively small users. In particular, the performance of an optimal (dynamic) pricing strategy is closely matched by a suitably chosen class-dependent static price, which does not depend on instantaneous congestion. This result holds even when we incorporate demand substitution effects into the demand model. More specifically, we model the situation where price increases for one class of service might lead users to use another class as an imperfect substitute. For both revenue and welfare maximization objectives, we characterize the structure of the asymptotically optimal static prices, expressing them as a function of a parsimonious number of parameters. We employ a simulation-based approach to tune those parameters and to efficiently compute an effective policy away from the limiting regime. Our approach can handle large, realistic instances of the problem.
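To make the static-pricing idea concrete in the simplest possible setting, the sketch below prices a single service class on a single link with a fixed number of circuits (an Erlang loss system), not the multiclass network of the paper. The price is held fixed regardless of congestion, admitted traffic is the arrival rate times one minus the Erlang B blocking probability, and a grid search picks the revenue-maximizing price. The exponential demand curve, the grid, and all names are assumptions for illustration.

```python
import math


def erlang_b(load, servers):
    """Blocking probability of an M/M/C/C loss system (Erlang B), via the stable recursion
    B(0) = 1, B(c) = load * B(c-1) / (c + load * B(c-1))."""
    b = 1.0
    for c in range(1, servers + 1):
        b = load * b / (c + load * b)
    return b


def best_static_price(prices, capacity, base_rate, sensitivity, mean_duration=1.0):
    """Grid-search one static price maximizing the long-run revenue rate.

    Assumed demand: arrival rate base_rate * exp(-sensitivity * price).
    Revenue rate = price * arrival rate * (1 - blocking probability).
    """
    def revenue(p):
        rate = base_rate * math.exp(-sensitivity * p)
        return p * rate * (1.0 - erlang_b(rate * mean_duration, capacity))
    return max(prices, key=revenue)
```

With ample capacity the blocking term is negligible and the search recovers the uncongested optimum 1 / sensitivity (for example 2.0 when sensitivity is 0.5); with tight capacity, blocking enters the revenue and can move the chosen price.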
Self-Optimizing and Pareto-Optimal Policies in General Environments based on Bayes-Mixtures
 Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence
, 2002
Abstract

Cited by 31 (15 self)
The problem of making sequential decisions in unknown probabilistic environments is studied. In cycle t, action y_t results in perception x_t and reward r_t, where all quantities may in general depend on the complete history. The perception x_t and reward r_t are sampled from the (reactive) environmental probability distribution μ. This very general setting includes...
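A Bayes mixture ξ = Σ_ν w_ν ν predicts by averaging the candidate environments ν, with each weight updated in proportion to how well that environment predicted the history so far. The sketch shows only the passive prediction case, with a hypothetical finite class of i.i.d. Bernoulli environments; the paper's setting, with actions, rewards, and reactive environments that may depend on the whole history, is far more general, and the function name is an assumption.

```python
def bayes_mixture_predict(models, prior, history):
    """Predictive probability that the next bit is 1 under a Bayes mixture.

    models:  hypothetical i.i.d. Bernoulli environments, each given by P(bit = 1).
    prior:   prior weights w_nu (summing to 1).
    history: observed bits (0 or 1).
    Posterior weights are proportional to prior * likelihood(history), and the mixture
    predicts xi(1 | history) = sum_nu posterior_nu * nu(1).
    """
    weights = list(prior)
    for bit in history:
        # Multiply each weight by that environment's probability of the observed bit
        weights = [w * (theta if bit == 1 else 1.0 - theta) for w, theta in zip(weights, models)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return sum(w * theta for w, theta in zip(weights, models))
```

With two candidate environments, P(1) = 0.2 and P(1) = 0.8, and a uniform prior, the mixture predicts 0.5 before any data, 0.68 after a single 1, and approaches 0.8 as more 1s arrive, i.e. it converges to the predictions of the environment that fits the data.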
Synthesis of hierarchical finitestate controllers for POMDPs
 In Proceedings of ICAPS
, 2003
Abstract

Cited by 29 (1 self)
We develop a hierarchical approach to planning for partially observable Markov decision processes (POMDPs) in which a policy is represented as a hierarchical finite-state controller. To provide a foundation for this approach, we discuss some extensions of the POMDP framework that allow us to formalize the process of abstraction by which a hierarchical controller is constructed. We describe a planning algorithm that uses a programmer-defined task hierarchy to constrain the search space of finite-state controllers, and prove that this algorithm converges to a hierarchical finite-state controller that is ε-optimal in a limited but well-defined sense, related to the concept of recursive optimality.
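A basic building block for searching over finite-state controllers, hierarchical or not, is evaluating a fixed controller. If node n takes action ψ(n) and moves to node η(n, o) after observation o, the value of being in node n while the hidden state is s satisfies V(n, s) = R(s, ψ(n)) + γ Σ_{s'} T(s, ψ(n), s') Σ_o O(s', ψ(n), o) V(η(n, o), s'), a linear system over (node, state) pairs. The sketch evaluates a flat, non-hierarchical controller with NumPy; it is not the paper's hierarchical algorithm, and the array layout and names are assumptions.

```python
import numpy as np


def evaluate_fsc(action, successor, T, O, R, gamma):
    """Value V[n, s] of a flat finite-state controller in a POMDP (illustrative sketch).

    action[n]:       action taken in controller node n
    successor[n][o]: next node after observing o in node n
    T[a, s, s2], O[a, s2, o], R[s, a]: POMDP model; gamma < 1 is the discount factor.
    Solves V = r + gamma * P V over the product space of (node, state) pairs.
    """
    N, S = len(action), T.shape[1]
    num_obs = O.shape[2]
    P = np.zeros((N * S, N * S))
    r = np.zeros(N * S)
    for n in range(N):
        a = action[n]
        for s in range(S):
            i = n * S + s
            r[i] = R[s, a]
            for s2 in range(S):
                for o in range(num_obs):
                    # Land in s2, observe o, and move to the controller's successor node
                    j = successor[n][o] * S + s2
                    P[i, j] += T[a, s, s2] * O[a, s2, o]
    V = np.linalg.solve(np.eye(N * S) - gamma * P, r)
    return V.reshape(N, S)
```

A search procedure would call an evaluator like this repeatedly while modifying node actions and successors; the hierarchical structure in the paper constrains which controllers are considered, which this flat sketch does not model.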