Efficient top-k query evaluation on probabilistic data
In ICDE, 2007
Cited by 106 (26 self)
Abstract
Modern enterprise applications are forced to deal with unreliable, inconsistent and imprecise information. Probabilistic databases can model such data naturally, but SQL query evaluation on probabilistic databases is difficult: previous approaches have either restricted the SQL queries, computed approximate probabilities, or failed to scale, and it was shown recently that precise query evaluation is theoretically hard. In this paper we describe a novel approach that efficiently computes and ranks the top-k answers to a SQL query on a probabilistic database. The restriction to top-k answers is natural, since imprecisions in the data often lead to a large number of low-quality answers, and users are interested only in the answers with the highest probabilities. The idea in our algorithm is to run several Monte-Carlo simulations in parallel, one for each candidate answer, and to approximate each probability only to the extent needed to correctly compute the top-k answers. The algorithm is in a certain sense provably optimal and scales to large databases: we have measured running times of 5 to 50 seconds for complex SQL queries over a large database (10M tuples, of which 6M are probabilistic). Additional contributions of the paper include several optimization techniques and a simple data model for probabilistic data that achieves completeness by using SQL views.
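The parallel-simulation idea described above can be sketched as follows. This is a simplified illustration, not the paper's algorithm. Each candidate answer is assumed to expose a hypothetical `samplers[i]()` that runs one Monte Carlo trial and returns whether the answer holds in the sampled world. Hoeffding intervals, with a fixed `delta` per check and no correction for repeated looks, stand in for whatever confidence bounds the authors actually use. Simulation effort goes only to candidates whose interval still straddles the top-k boundary.

```python
import math
import random
from typing import Callable, List


def hoeffding_halfwidth(n: int, delta: float) -> float:
    """Half-width of a two-sided Hoeffding interval for the mean of n samples in [0, 1]."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def top_k_multisim(samplers: List[Callable[[], bool]], k: int, delta: float = 1e-4,
                   batch: int = 50, max_rounds: int = 10_000) -> List[int]:
    """Return indices of the k candidates with the highest estimated probability.

    samplers[i]() is a hypothetical interface: one Monte Carlo trial for candidate i,
    returning True if the candidate answer is present in that sampled world.
    """
    m = len(samplers)
    hits = [0] * m
    trials = [0] * m

    def simulate(i: int) -> None:
        # Run another batch of Monte Carlo trials for candidate i only.
        for _ in range(batch):
            hits[i] += samplers[i]()
            trials[i] += 1

    def ranking() -> List[int]:
        return sorted(range(m), key=lambda i: hits[i] / trials[i], reverse=True)

    for i in range(m):
        simulate(i)

    for _ in range(max_rounds):
        est = [hits[i] / trials[i] for i in range(m)]
        half = [hoeffding_halfwidth(trials[i], delta) for i in range(m)]
        order = ranking()
        top, rest = order[:k], order[k:]
        if not rest:
            return top
        lo_top = min(est[i] - half[i] for i in top)    # weakest lower bound inside the top-k
        hi_rest = max(est[i] + half[i] for i in rest)  # strongest upper bound outside it
        if lo_top >= hi_rest:
            return top  # intervals separate: the top-k set is settled
        # Refine only candidates whose interval still straddles the top-k boundary,
        # choosing the one with the widest (least certain) interval.
        ambiguous = [i for i in top if est[i] - half[i] < hi_rest]
        ambiguous += [i for i in rest if est[i] + half[i] > lo_top]
        simulate(max(ambiguous, key=lambda i: half[i]))
    return ranking()[:k]
```

Refining the widest ambiguous interval is only one simple selection rule. The paper's own rule, and its optimality argument, may differ.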
A knowledge-gradient policy for sequential information collection
SIAM J. on Control and Optimization
Cited by 7 (7 self)
Abstract
In a sequential Bayesian ranking and selection problem with independent normal populations and common known variance, we study a previously introduced measurement policy which we refer to as the knowledge-gradient policy. This policy myopically maximizes the expected increment in the value of information in each time period, where the value is measured according to the terminal utility function. We show that the knowledge-gradient policy is optimal both when the horizon is a single time period and in the limit as the horizon extends to infinity. We show furthermore that, in some special cases, the knowledge-gradient policy is optimal regardless of the length of any given fixed total sampling horizon. We bound the knowledge-gradient policy’s suboptimality in the remaining cases, and show through simulations that it performs competitively with or significantly better than other policies.
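For independent normal beliefs, the knowledge-gradient factor of measuring alternative x is widely cited in a closed form. Let μ_x be the posterior mean, σ_x² the posterior variance, and λ the measurement-noise variance. Define σ̃_x = σ_x²/√(σ_x² + λ) and ζ_x = -|μ_x - max_{x'≠x} μ_{x'}|/σ̃_x. Then KG_x = σ̃_x (ζ_x Φ(ζ_x) + φ(ζ_x)).

The sketch below computes this value for each alternative and measures the one with the largest value. The function names are illustrative. The sketch assumes at least two alternatives, each with strictly positive posterior variance.

```python
import math
from typing import List


def _pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _cdf(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means: List[float], variances: List[float], noise_var: float) -> List[float]:
    """KG factor of measuring each alternative once, under independent normal beliefs."""
    kg = []
    for x in range(len(means)):
        # Std. dev. of the change in x's posterior mean caused by one measurement.
        sigma_tilde = variances[x] / math.sqrt(variances[x] + noise_var)
        best_other = max(m for j, m in enumerate(means) if j != x)
        zeta = -abs(means[x] - best_other) / sigma_tilde
        kg.append(sigma_tilde * (zeta * _cdf(zeta) + _pdf(zeta)))
    return kg


def kg_choice(means: List[float], variances: List[float], noise_var: float) -> int:
    """Index of the alternative the myopic knowledge-gradient rule would measure next."""
    scores = knowledge_gradient(means, variances, noise_var)
    return max(range(len(scores)), key=scores.__getitem__)
```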
The Knowledge-Gradient Algorithm for Sequencing Experiments in Drug Discovery
INFORMS J. on Computing, 2010
Cited by 3 (3 self)
Abstract
We present a new technique for adaptively choosing the sequence of molecular compounds to test in drug discovery. Beginning with a base compound, we consider the problem of searching for a chemical derivative of the molecule that best treats a given disease. The problem of choosing molecules to test to maximize the expected quality of the best compound discovered may be formulated mathematically as a ranking-and-selection problem in which each molecule is an alternative. We apply a recently developed algorithm, known as the knowledge-gradient algorithm, that uses correlations in our Bayesian prior distribution between the performance of different alternatives (molecules) to dramatically reduce the number of molecular tests required; however, it has heavy computational requirements that limit the number of possible alternatives to a few thousand. We develop computational improvements that allow the knowledge-gradient method to consider much larger sets of alternatives, and we demonstrate the method on a problem with 87,120 alternatives.
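The benefit of correlated beliefs comes from the Bayesian update: testing one molecule shifts the estimates of every correlated molecule. Below is a minimal sketch of the standard multivariate normal update after observing y = θ_x + ε with ε ~ N(0, λ):

μ' = μ + (y - μ_x)/(λ + Σ_xx) · Σe_x
Σ' = Σ - Σe_x e_xᵀΣ/(λ + Σ_xx)

Plain Python lists stand in for the matrix library that a realistic implementation with tens of thousands of alternatives would need. The function name is hypothetical, and the sketch does not implement the correlated knowledge-gradient computation itself.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def correlated_update(mu: List[float], cov: Matrix, x: int, y: float,
                      noise_var: float) -> Tuple[List[float], Matrix]:
    """Posterior mean and covariance after observing y = theta_x + noise under a
    multivariate normal prior N(mu, cov) and normal noise with variance noise_var."""
    n = len(mu)
    col = [cov[i][x] for i in range(n)]  # Sigma e_x: covariance of each alternative with x
    denom = noise_var + cov[x][x]
    gain = (y - mu[x]) / denom
    # Every alternative correlated with x moves, not only x itself.
    new_mu = [mu[i] + gain * col[i] for i in range(n)]
    new_cov = [[cov[i][j] - col[i] * col[j] / denom for j in range(n)] for i in range(n)]
    return new_mu, new_cov
```

For example, with prior means [0, 0], covariance [[1, 0.5], [0.5, 1]], and noise variance 1, observing y = 2 on alternative 0 also raises the estimate of the untested alternative 1.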
Efficient PAC Learning for Episodic Tasks with Acyclic State Spaces
Cited by 2 (2 self)
Abstract
This paper considers the problem of computing an optimal policy for a Markov Decision Process (MDP) without complete a priori knowledge of (i) the branching probability distributions that determine the evolution of the process state when the different actions are executed, and (ii) the probability distributions characterizing the immediate rewards returned by the environment as a result of executing these actions at different states of the process. In addition, it is assumed that the underlying process evolves in a repetitive, episodic manner, with each episode starting from a well-defined initial state and evolving over an acyclic state space. A novel efficient algorithm for this problem is proposed, and its convergence properties and computational complexity are rigorously characterized in the formal framework of computational learning theory. Furthermore, in deriving these results, the presented work generalizes Bechhofer’s “indifference-zone” approach to the Ranking & Selection problem, which arises in statistical inference theory, so that it applies to populations with bounded general distributions.
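One elementary way to make the indifference-zone idea concrete for bounded distributions uses Hoeffding's inequality. This is our illustration, not the paper's derivation. Sample each of k populations supported on [a, b] a common n times and select the largest sample mean.

Suppose the best mean exceeds every other mean by at least δ. A union bound then gives a false-selection probability of at most (k - 1)·exp(-nδ²/(2(b - a)²)). So n ≥ 2(b - a)²/δ² · ln((k - 1)/α) guarantees correct selection with probability at least 1 - α. The function name is hypothetical.

```python
import math


def hoeffding_iz_sample_size(k: int, alpha: float, delta: float, a: float, b: float) -> int:
    """Per-population sample size so that selecting the largest sample mean is correct
    with probability >= 1 - alpha whenever the best mean exceeds all others by >= delta,
    for k populations whose observations lie in [a, b]."""
    if k < 2:
        return 0  # nothing to choose between
    # A paired difference X_i - X_best lies in an interval of width 2(b - a), so Hoeffding
    # gives P(mean_i >= mean_best) <= exp(-n delta^2 / (2 (b - a)^2)) for each rival i;
    # a union bound over the k - 1 rivals yields the sample size below.
    return math.ceil(2.0 * (b - a) ** 2 / delta ** 2 * math.log((k - 1) / alpha))
```

Unlike Bechhofer's normal-theory procedure, this bound needs no variance information, only the support [a, b]. In exchange, it is typically conservative.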
Information collection on a graph
2010
Cited by 2 (2 self)
Abstract
We derive a knowledge gradient policy for an optimal learning problem on a graph, in which we use sequential measurements to refine Bayesian estimates of individual edge values in order to learn about the best path. This problem differs from traditional ranking and selection, in that the implementation decision (the path we choose) is distinct from the measurement decision (the edge we measure). Our decision rule is easy to compute, and performs competitively against other learning policies, including a Monte Carlo adaptation of the knowledge gradient policy for ranking and selection.
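A Monte Carlo knowledge-gradient estimate of the kind mentioned above can be sketched roughly as follows. The sketch assumes independent normal beliefs on edge values, defines the best path as the maximum-sum path in a DAG, and considers one noisy measurement of a single edge. After such a measurement, the edge's posterior mean moves by σ̃·Z, where σ̃ = σ²/√(σ² + λ) and Z is standard normal. The expected improvement in best-path value can therefore be estimated by sampling Z. All names are hypothetical, and this is not the authors' closed-form decision rule.

```python
import math
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def best_path_value(order: List[str], means: Dict[Edge, float], src: str, dst: str) -> float:
    """Value of the best (maximum-sum) src->dst path in a DAG; `order` must be topological."""
    best = {v: -math.inf for v in order}
    best[src] = 0.0
    for u in order:
        if best[u] == -math.inf:
            continue  # u is unreachable from src
        for (a, b), w in means.items():
            if a == u:
                best[b] = max(best[b], best[u] + w)
    return best[dst]


def mc_edge_kg(order: List[str], means: Dict[Edge, float], src: str, dst: str, edge: Edge,
               prior_var: float, noise_var: float, n_pairs: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected gain in best-path value from measuring `edge` once."""
    rng = random.Random(seed)
    # Std. dev. of the change in the edge's posterior mean after one noisy measurement.
    sigma_tilde = prior_var / math.sqrt(prior_var + noise_var)
    base = best_path_value(order, means, src, dst)
    total = 0.0
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)
        # Antithetic pair: the best-path value is convex in one edge's mean, so each
        # pair's contribution is non-negative (up to rounding).
        for s in (z, -z):
            shifted = dict(means)
            shifted[edge] += sigma_tilde * s
            total += best_path_value(order, shifted, src, dst) - base
    return total / (2 * n_pairs)
```

The estimate shows why the implementation decision and the measurement decision differ. Measuring an edge has value only if plausible outcomes could change which path is chosen, so an edge on a clearly dominated path scores zero.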
Online Supplement to “Some Almost-sure Convergence Properties Useful in Sequential Analysis”
2005
Cited by 1 (1 self)
Abstract
Kim and Nelson [14] propose sequential procedures for selecting the simulated system with the largest steady-state mean from a set of alternatives that yield stationary output processes. Each procedure uses a triangular continuation region so that sampling stops when the relevant test statistic first reaches the region’s boundary. In applying the generalized continuous mapping theorem to prove the asymptotic validity of these procedures as the indifference-zone parameter tends to zero, we are given (i) a sequence of functions (which are right-continuous with left-hand limits) converging to a realization of a certain Brownian motion process with drift; and (ii) a sequence of triangular continuation regions corresponding to the functions in sequence (i) and converging to the triangular continuation region for the Brownian motion process. From each function in sequence (i) and its corresponding continuation region in sequence (ii), we obtain the associated boundary-hitting point; and we prove that the resulting sequence of boundary-hitting points converges almost surely to the boundary-hitting point for the Brownian motion process. The method of proof can be adapted to study the asymptotic behavior of certain steady-state simulation output analysis procedures as well as sequential-analysis procedures with continuation regions of various shapes. Key Words: Sequential ranking-and-selection procedures; Steady-state computer simulation; Crossing problems; Brownian motion with drift.
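The boundary-hitting map at the heart of the argument is easy to state for a discretized path. The sketch below assumes a symmetric triangular region {(t, x): |x| < a - bt}, an illustrative choice rather than Kim and Nelson's exact parameterization. It returns the first grid point at which a step-function path reaches the boundary. A Brownian motion with drift is simulated on a grid to produce such paths. Names and parameters are hypothetical.

```python
import math
import random
from typing import List, Optional, Tuple


def first_boundary_hit(path: List[float], dt: float, a: float,
                       b: float) -> Optional[Tuple[int, float, float]]:
    """First grid point (index, time, value) at which a discretized path leaves the
    triangular continuation region {(t, x): |x| < a - b t}."""
    for n, x in enumerate(path):
        t = n * dt
        # On or past the boundary; at the apex t = a / b the region has shrunk to nothing.
        if abs(x) >= a - b * t:
            return n, t, x
    return None  # the path ends before reaching the boundary


def brownian_with_drift(drift: float, sigma: float, dt: float, steps: int,
                        seed: int = 0) -> List[float]:
    """Grid values of drift * t + sigma * W(t), started at 0."""
    rng = random.Random(seed)
    path = [0.0]
    for _ in range(steps):
        path.append(path[-1] + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return path
```

The supplement's result concerns the continuous-time limit. Informally, it says that as the approximating paths and regions converge, the hitting points returned by a map like this converge almost surely to the Brownian motion's hitting point.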
Efficient Risk Estimation via Nested Sequential Simulation
2010
Cited by 1 (1 self)
Abstract
We analyze the computational problem of estimating financial risk in a nested simulation. In this approach, an outer simulation is used to generate financial scenarios and an inner simulation is used to estimate future portfolio values in each scenario. We focus on one risk measure, the probability of a large loss, and we propose a new algorithm to estimate this risk. Our algorithm sequentially allocates computational effort in the inner simulation based on marginal changes in the risk estimator in each scenario. Theoretical results show that the risk estimator converges at a faster rate than under the conventional uniform inner sampling approach. Numerical results consistent with the theory are presented.
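A rough sketch of sequential inner-sample allocation for estimating P(loss > threshold) follows. Each outer scenario first gets `n0` inner samples. Each remaining inner sample then goes to the scenario whose exceedance indicator is least certain, measured by the standardized distance of its loss estimate from the threshold.

This greedy rule is our stand-in for the paper's marginal-change criterion, which the abstract does not specify. `inner[i]()` is a hypothetical per-scenario inner simulator. A uniform scheme would instead give every scenario `budget / m` samples.

```python
import math
import random
from typing import Callable, List, Tuple


def sequential_nested_risk(inner: List[Callable[[], float]], threshold: float,
                           n0: int, budget: int) -> Tuple[float, List[int]]:
    """Estimate P(loss > threshold) over m outer scenarios with adaptive inner sampling.

    inner[i]() draws one inner-simulation loss sample for outer scenario i (hypothetical
    interface). Returns the risk estimate and the number of inner samples per scenario.
    """
    m = len(inner)
    sums, sumsq, counts = [0.0] * m, [0.0] * m, [0] * m

    def draw(i: int) -> None:
        x = inner[i]()
        sums[i] += x
        sumsq[i] += x * x
        counts[i] += 1

    def closeness(i: int) -> float:
        # Standardized distance of scenario i's loss estimate from the threshold;
        # small values mark scenarios whose exceedance indicator is least certain.
        mean = sums[i] / counts[i]
        var = max(sumsq[i] / counts[i] - mean * mean, 1e-12)
        return abs(mean - threshold) / math.sqrt(var / counts[i])

    for i in range(m):
        for _ in range(n0):
            draw(i)
    for _ in range(budget - m * n0):
        draw(min(range(m), key=closeness))

    risk = sum(1 for i in range(m) if sums[i] / counts[i] > threshold) / m
    return risk, counts
```

Scenarios whose losses are far from the threshold stop receiving samples almost immediately, so most of the budget is spent where an extra sample could change the indicator.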
A Large Deviations Perspective on Ordinal Optimization
Abstract
We consider the problem of optimally allocating a computing budget to maximize the probability of correct selection in the ordinal optimization setting. This problem has been studied in the literature in an approximate mathematical framework under the assumption that the underlying random variables have a Gaussian distribution. We use large deviations theory to develop a mathematically rigorous framework for determining the optimal allocation of computing resources even when the underlying variables have general, non-Gaussian distributions. Further, in a simple setting we show that when there exists an indifference zone, quick stopping rules may be developed that exploit the exponential decay rates of the probability of false selection. In practice, the distributions of the underlying variables are estimated from generated samples, and the resulting estimation errors degrade performance. On a positive note, we show that the corresponding estimates of optimal allocations converge to their true values as the number of samples used for estimation increases to infinity.
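Consider two systems whose sample means obey large deviations with convex rate functions I_1 (the best system) and I_2. If they receive budget fractions p_1 and p_2, the probability of false selection decays with rate inf_x [p_1 I_1(x) + p_2 I_2(x)]. The allocation maximizing this rate is the large-deviations-optimal split.

The sketch below evaluates that rate numerically for arbitrary convex rate functions, which is what allows non-Gaussian distributions, and then searches a grid for the best split. It restricts x to [lo, hi], the interval between the two means, and its function names are hypothetical. For Gaussian rate functions, it should reproduce the closed form (μ_1 - μ_2)²/(2(σ_1²/p_1 + σ_2²/p_2)), which is maximized at p_1 = σ_1/(σ_1 + σ_2).

```python
import math
from typing import Callable, Tuple

RateFn = Callable[[float], float]


def gaussian_rate(mu: float, sigma: float) -> RateFn:
    """Cramer rate function of the sample mean of N(mu, sigma^2) observations."""
    return lambda x: (x - mu) ** 2 / (2.0 * sigma ** 2)


def pairwise_rate(I_best: RateFn, I_other: RateFn, p_best: float, p_other: float,
                  lo: float, hi: float, iters: int = 200) -> float:
    """min over x in [lo, hi] of p_best * I_best(x) + p_other * I_other(x).

    Ternary search is valid because a non-negative combination of convex functions is convex.
    """
    f = lambda x: p_best * I_best(x) + p_other * I_other(x)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return f(0.5 * (lo + hi))


def best_two_system_split(I_best: RateFn, I_other: RateFn, lo: float, hi: float,
                          grid: int = 300) -> Tuple[float, float]:
    """Grid search for the budget fraction given to the best system that maximizes the rate."""
    best_p, best_r = 0.0, -math.inf
    for j in range(1, grid):
        p = j / grid
        r = pairwise_rate(I_best, I_other, p, 1.0 - p, lo, hi)
        if r > best_r:
            best_p, best_r = p, r
    return best_p, best_r
```

Swapping `gaussian_rate` for another convex rate function, for example one estimated from generated samples, leaves the rest of the computation unchanged.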
Proceedings of the 2002 Winter Simulation Conference
Abstract
A simulation model is successful if it leads to policy action, i.e., if it is implemented. Studies show that for a model to be implemented, it must correspond well with the mental model of the system held by its user. The user must feel confident that the simulation model corresponds to this mental model, which requires an understanding of how the model works. Simulation models intended for implementation must be developed step by step, starting with a simple model, the simulation prototype. After this prototype has been explained to the user, a more detailed model can be developed on the basis of the user's feedback. Software for simulation prototyping is discussed with regard to, e.g., the ease with which models and output can be explained and the speed with which small models can be written.
Proceedings of the 2003 Winter Simulation Conference
Abstract
The model used in this report focuses on the analysis of ship waiting statistics and stock fluctuations under different arrival processes. However, the basic outline is the same: central to both models are a jetty and accompanying tank farm facilities belonging to a new chemical plant in the Port of Rotterdam. Both the supply of raw materials and the export of finished products occur through ships loading and unloading at the jetty. Since disruptions in the plant's production process are very expensive, buffer stock is needed to absorb variations in ship arrivals and in overseas exports through large ships.
Ports provide jetty facilities for ships to load and unload their cargo. Since ship delays are costly, terminal operators attempt to minimize their number and duration. Here, simulation has proved to be a very suitable tool. However, in port simulation models, the impact of the arrival process of ships on the model outcomes tends to be underestimated. This article considers three arrival processes: stock-controlled, equidistant per ship type, and Poisson. We assess how their deployment in a port simulation model, based on data from a real case study, affects the efficiency of the loading and unloading process. Poisson, which is the chosen arrival process in many client-oriented simulations, actually performs worst in terms of both ship delays and required storage capacity. Stock-controlled arrivals perform best with regard to both ship delays and required storage capacity.
In the case study, two types of arrival processes were considered. The first type is the so-called stock-controlled arrival process, i.e., ship arrivals are scheduled such that a base stock level is maintained in the tanks. Given a base stock level of a raw material or ...
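The effect of the arrival process on ship delays can be illustrated with a toy single-jetty FIFO queue, far simpler than the article's model. Waiting times follow the Lindley recursion W_{n+1} = max(0, W_n + S_n - A_{n+1}), where S_n is the handling time of ship n and A_{n+1} is the gap before the next arrival. The handling times, rates, and function name are assumptions for illustration. Stock-controlled arrivals and storage are not modeled.

```python
import random
from typing import List


def mean_wait(interarrivals: List[float], services: List[float]) -> float:
    """Average waiting time at a single FIFO jetty via the Lindley recursion
    W[n+1] = max(0, W[n] + S[n] - A[n+1]); the first ship waits zero."""
    w, total = 0.0, 0.0
    for n in range(1, len(services)):
        w = max(0.0, w + services[n - 1] - interarrivals[n])
        total += w
    return total / len(services)


if __name__ == "__main__":
    n = 10_000
    rng = random.Random(3)
    service = [0.8] * n                                   # assumed constant handling time
    equidistant = [1.0] * n                               # one ship per time unit
    poisson = [rng.expovariate(1.0) for _ in range(n)]    # same mean rate, exponential gaps
    print(mean_wait(equidistant, service), mean_wait(poisson, service))
```

With a constant handling time of 0.8 and one arrival per time unit on average, equidistant arrivals never queue. Exponential interarrival gaps (Poisson arrivals) produce a positive average wait, which is consistent with the qualitative point above.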

