Results 1  10
of
17
Efficient exact stochastic simulation of chemical systems with many species and many channels
 J. Phys. Chem. A
, 2000
"... There are two fundamental ways to view coupled systems of chemical equations: as continuous, represented by differential equations whose variables are concentrations, or as discrete, represented by stochastic processes whose variables are numbers of molecules. Although the former is by far more comm ..."
Abstract

Cited by 419 (5 self)
 Add to MetaCart
There are two fundamental ways to view coupled systems of chemical equations: as continuous, represented by differential equations whose variables are concentrations, or as discrete, represented by stochastic processes whose variables are numbers of molecules. Although the former is by far more common, systems with very small numbers of molecules are important in some applications (e.g., in small biological cells or in surface processes). In both views, most complicated systems with multiple reaction channels and multiple chemical species cannot be solved analytically. There are exact numerical simulation methods to simulate trajectories of discrete, stochastic systems, (methods that are rigorously equivalent to the Master Equation approach) but these do not scale well to systems with many reaction pathways. This paper presents the Next Reaction Method, an exact algorithm to simulate coupled chemical reactions that is also efficient: it (a) uses only a single random number per simulation event, and (b) takes time proportional to the logarithm of the number of reactions, not to the number of reactions itself. The Next Reaction Method is extended to include timedependent rate constants and nonMarkov processes and is applied to a sample application in biology (the lysis/lysogeny decision circuit of lambda phage). The performance of the Next Reaction Method on this application is compared with one standard method and an optimized version of that standard method. 1.
Join synopses for approximate query answering
 In SIGMOD
, 1999
"... In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for joinqueries using only statistic ..."
Abstract

Cited by 167 (9 self)
 Add to MetaCart
In large data warehousing environments, it is often advantageous to provide fast, approximate answers to complex aggregate queries based on statistical summaries of the full data. In this paper, we demonstrate the difficulty of providing good approximate answers for joinqueries using only statistics (in particular, samples) from the base relations. We propose join synopses (join samples) as an effective solution for this problem and show how precomputing just one join synopsis for each relation suffices to significantly improve the quality of approximate answers for arbitrary queries with foreign key joins. We present optimal strategies for allocating the available space among the various join synopses when the query work load is known and identify heuristics for the common case when the work load is not known. We also present efficient algorithms for incrementally maintaining join synopses in the presence of updates to the base relations. One of our key contributions is a detailed analysis of the error bounds obtained for approximate answers that demonstrates the tradeoffs in various methods, as well as the advantages in certain scenarios of a new subsampling method we propose. Our extensive set of experiments on the TPCD benchmark database show the effectiveness of join synopses and various other techniques proposed in this paper. 1
Synopsis Data Structures for Massive Data Sets
"... Abstract. Massive data sets with terabytes of data are becoming commonplace. There is an increasing demand for algorithms and data structures that provide fast response times to queries on such data sets. In this paper, we describe a context for algorithmic work relevant to massive data sets and a f ..."
Abstract

Cited by 116 (13 self)
 Add to MetaCart
Abstract. Massive data sets with terabytes of data are becoming commonplace. There is an increasing demand for algorithms and data structures that provide fast response times to queries on such data sets. In this paper, we describe a context for algorithmic work relevant to massive data sets and a framework for evaluating such work. We consider the use of "synopsis" data structures, which use very little space and provide fast (typically approximated) answers to queries. The design and analysis of effective synopsis data structures o er many algorithmic challenges. We discuss a number of concrete examples of synopsis data structures, and describe fast algorithms for keeping them uptodate in the presence of online updates to the data sets.
Optimal Prediction for Prefetching in the Worst Case
, 1998
"... Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, whe ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching [J. Assoc. Comput. Mach., 143 (1996), pp. 771–793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worstcase analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of finite state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely for every page request sequence to the fault rate of the optimal finite state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worstcase manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM
Aqua project white paper
, 1997
"... Viswanath Poosala z In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. The goal is to provide an estimated response in orders of magnitude less time than the time to compute an exact answer, by avoiding o ..."
Abstract

Cited by 19 (10 self)
 Add to MetaCart
Viswanath Poosala z In large data recording and warehousing environments, it is often advantageous to provide fast, approximate answers to queries, whenever possible. The goal is to provide an estimated response in orders of magnitude less time than the time to compute an exact answer, by avoiding or minimizing the number of accesses to the base data. This white paper describes the Approximate QUery Answering (AQUA) Project underway in the Information Sciences Research Center at Bell Labs. We present a framework for an approximate query engine that observes new data as it arrives and maintains small synopsis data structures on that data. These data structures are used to provide fast, approximate answers to a broad class of queries. We describe metrics for evaluating approximate query answers. We also present new synopsis data structures, and new techniques for approximate query answers. We report on the goals and status of the Aqua project, and plans for future work.
A nearly lineartime PTAS for explicit fractional packing and covering linear programs
 Algorithmica
"... Abstract We give an approximation algorithm for fractional packing and covering linear programs (linear programs with nonnegative coefficients). Given a constraint matrix with n nonzeros, r rows, and c columns, the algorithm (with high probability) computes feasible primal and dual solutions whose ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
(Show Context)
Abstract We give an approximation algorithm for fractional packing and covering linear programs (linear programs with nonnegative coefficients). Given a constraint matrix with n nonzeros, r rows, and c columns, the algorithm (with high probability) computes feasible primal and dual solutions whose costs are within a factor of 1 + ε of OPT (the optimal cost) in time O((r + c) log(n)/ε2 + n).
Efficient sampling methods for discrete distributions
 In Proc. 39th International Colloquium on Automata, Languages, and Programming (ICALP’12
, 2012
"... Abstract We study the fundamental problem of the exact and efficient generation of random values from a finite and discrete probability distribution. Suppose that we are given n distinct events with associated probabilities p1,..., pn. We consider the problem of sampling a subset, which includes the ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
Abstract We study the fundamental problem of the exact and efficient generation of random values from a finite and discrete probability distribution. Suppose that we are given n distinct events with associated probabilities p1,..., pn. We consider the problem of sampling a subset, which includes the ith event independently with probability pi, and the problem of sampling from the distribution, where the ith event has a probability proportional to pi. For both problems we present on two different classes of inputs – sorted and general probabilities – efficient preprocessing algorithms that allow for asymptotically optimal querying, and prove almost matching lower bounds for their complexity. 1
Succinct Sampling from Discrete Distributions
"... We revisit the classic problem of sampling from a discrete distribution: Given n nonnegative wbit integers x1,..., xn, the task is to build a data structure that allows sampling i with probability proportional to xi. The classic solution is Walker’s alias method that takes, when implemented on a W ..."
Abstract

Cited by 4 (2 self)
 Add to MetaCart
(Show Context)
We revisit the classic problem of sampling from a discrete distribution: Given n nonnegative wbit integers x1,..., xn, the task is to build a data structure that allows sampling i with probability proportional to xi. The classic solution is Walker’s alias method that takes, when implemented on a Word RAM, O(n) preprocessing time, O(1) expected query time for one sample, and n(w+2 lg n+o(1)) bits of space. Using the terminology of succinct data structures, this solution has redundancy 2n lg n + o(n) bits, i.e., it uses 2n lg n + o(n) bits in addition to the information theoretic minimum required for storing the input. In this paper, we study whether this space usage can be improved. In the systematic case, in which the input is readonly, we present a novel data structure using r + O(w) redundant
Using Learning and Difficulty of Prediction to Decrease Computation: A Fast Sort and Priority Queue on Entropy Bounded Inputs
"... There is an upsurge in interest in the Markov model and also more general stationary ergodic stochastic distributions in theoretical computer science community recently, (e.g. see [Vitter, Krishnan, FOCS91], [Karlin, Philips, Raghavan, FOCS92] [Raghavan92]) for use of Markov models for online algor ..."
Abstract
 Add to MetaCart
There is an upsurge in interest in the Markov model and also more general stationary ergodic stochastic distributions in theoretical computer science community recently, (e.g. see [Vitter, Krishnan, FOCS91], [Karlin, Philips, Raghavan, FOCS92] [Raghavan92]) for use of Markov models for online algorithms e.g., cashing and prefetching). Their results used the fact that compressible sources are predictable (and vise versa), and show that online algorithms can improve their performance by prediction. Actual page access sequences are in fact somewhat compressible, so their predictive methods can be of benefit. This paper investigates the interesting idea of decreasing computation by using learning in the opposite way, namely to determine the difficulty of prediction. That is, we will approximately learn the input distribution, and then improve the performance of the computation when the input is not too predictable, rather than the reverse. To our knowledge, this is first case of a computational problem where we do not assume any particular fixed input distribution and yet computation is decreased when the input is less predictable, rather than the reverse. We concentrate our investigation on a basic computational problem: sorting and a basic data structure problem: maintaining a priority queue. We present the first known case of sorting and priority queue algorithms whose complexity depends on the binary entropy H ≤ 1 of input keys where assume that input keys are generated from an unknown but arbitrary stationary ergodic source. This is, we assume that each of the input keys can be each arbitrarily long, but have entropy H. Note that H can be