Results 11–20 of 1,217
On the computation of multidimensional aggregates
 In Proceedings of the International Conference on Very Large Databases
, 1996
Abstract

Cited by 205 (18 self)
At the heart of all OLAP or multidimensional data analysis applications is the ability to simultaneously aggregate across many sets of dimensions. Computing multidimensional aggregates is a performance bottleneck for these applications. This paper presents fast algorithms for computing a collection of group-bys. We focus on a special case of the aggregation problem: computation of the CUBE operator. The CUBE operator requires computing group-bys on all possible combinations of a list of attributes, and is equivalent to the union of a number of standard group-by operations. We show how the structure of CUBE computation can be viewed in terms of a hierarchy of group-by operations. Our algorithms extend sort-based and hash-based grouping methods with several optimizations, such as combining common operations across multiple group-bys, caching, and using precomputed group-bys to compute other group-bys. Empirical evaluation shows that the resulting algorithms give much better performance than straightforward methods. This paper combines work done concurrently on computing the data cube by two different teams, as reported in [SAG96] and [DANR96].
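As an illustration of the CUBE operator itself (not of the paper's optimized algorithms), here is a minimal sketch that computes every group-by of a small fact table directly; the table, the column names, and the SUM aggregate are invented for the example:

```python
from itertools import chain, combinations

def cube(rows, dims, measure):
    """Compute SUM(measure) for every group-by in the CUBE of `dims`.

    Naive sketch: each of the 2^d group-bys is computed directly from
    the base data; the paper's algorithms instead derive finer
    group-bys from coarser precomputed ones.
    """
    results = {}
    for subset in chain.from_iterable(
            combinations(dims, r) for r in range(len(dims) + 1)):
        agg = {}
        for row in rows:
            key = tuple(row[d] for d in subset)
            agg[key] = agg.get(key, 0) + row[measure]
        results[subset] = agg
    return results

# Hypothetical fact table for illustration.
sales = [
    {"product": "pen", "region": "east", "amount": 3},
    {"product": "pen", "region": "west", "amount": 5},
    {"product": "ink", "region": "east", "amount": 2},
]
out = cube(sales, ("product", "region"), "amount")
# out[()] holds the grand total; out[("product",)] the per-product sums.
```

With two attributes this produces all four group-bys — (), (product), (region), and (product, region) — whose union is exactly the CUBE result.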
Closest Point Search in Lattices
 IEEE Trans. Inform. Theory
, 2000
Abstract

Cited by 194 (1 self)
In this semi-tutorial paper, a comprehensive survey of closest-point search methods for lattices without a regular structure is presented. The existing search strategies are described in a unified framework, and differences between them are elucidated. An efficient closest-point search algorithm, based on the Schnorr-Euchner variation of the Pohst method, is implemented. Given an arbitrary point x ∈ R^m and a generator matrix for a lattice Λ, the algorithm computes the point of Λ that is closest to x. The algorithm is shown to be substantially faster than other known methods, by means of a theoretical comparison with the Kannan algorithm and an experimental comparison with the Pohst algorithm and its variants, such as the recent Viterbo-Boutros decoder. The improvement increases with the dimension of the lattice. Modifications of the algorithm are developed to solve a number of related search problems for lattices, such as finding a shortest vector, determining the kissing number, compu...
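To make the problem statement concrete, a brute-force sketch of closest-point search: enumerate lattice points uG over a bounded range of integer coefficients and keep the nearest one. The Schnorr-Euchner search in the paper prunes this enumeration rather than scanning it exhaustively; the generator matrix and query point below are invented:

```python
import itertools
import numpy as np

def closest_lattice_point(G, x, radius=3):
    """Naive closest-point search: try every integer coefficient
    vector u with entries in [-radius, radius] and return the lattice
    point uG minimizing ||x - uG||.  Exponential in the dimension;
    for illustration only.
    """
    G = np.asarray(G, dtype=float)
    x = np.asarray(x, dtype=float)
    best, best_d = None, float("inf")
    for u in itertools.product(range(-radius, radius + 1),
                               repeat=G.shape[0]):
        p = np.array(u) @ G
        d = float(np.sum((x - p) ** 2))
        if d < best_d:
            best, best_d = p, d
    return best

# For the square lattice Z^2, the closest point to (1.2, -0.7)
# is simply the componentwise rounding (1, -1).
p = closest_lattice_point(np.eye(2), [1.2, -0.7])
```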
Diffusion kernels on graphs and other discrete input spaces
 in: Proceedings of the 19th International Conference on Machine Learning
, 2002
Abstract

Cited by 187 (7 self)
The application of kernel-based learning algorithms has, so far, largely been confined to real-valued data and a few special data types, such as strings. In this paper we propose a general method of constructing natural families of kernels over discrete structures, based on the matrix exponentiation idea. In particular, we focus on generating kernels on graphs, for which we propose a special class of exponential kernels called diffusion kernels, which are based on the heat equation and can be regarded as the discretization of the familiar Gaussian kernel of Euclidean space.
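A diffusion kernel of the kind described can be sketched as K = exp(-βL), the matrix exponential of the graph Laplacian L = D - A; the adjacency matrix and the choice β = 0.5 below are invented for the example, and the exponential is computed by eigendecomposition since L is symmetric:

```python
import numpy as np

def diffusion_kernel(adjacency, beta):
    """Diffusion kernel K = exp(-beta * L) on an undirected graph,
    where L = D - A is the graph Laplacian.  beta controls how far
    "heat" diffuses along the edges.
    """
    A = np.asarray(adjacency, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)   # L is symmetric
    return eigvecs @ np.diag(np.exp(-beta * eigvals)) @ eigvecs.T

# Path graph on three nodes: 0 -- 1 -- 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
K = diffusion_kernel(A, beta=0.5)
# K is a symmetric positive-definite kernel matrix; adjacent nodes
# (0 and 1) get a larger similarity than nodes two hops apart (0 and 2).
```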
Convergence Analysis of Canonical Genetic Algorithms
 IEEE Transactions on Neural Networks
, 1994
Abstract

Cited by 173 (0 self)
This paper analyzes the convergence properties of the canonical genetic algorithm (CGA) with mutation, crossover and proportional reproduction applied to static optimization problems. It is proved by means of homogeneous finite Markov chain analysis that a CGA will never converge to the global optimum regardless of the initialization, crossover operator and objective function. But variants of CGAs that always maintain the best solution in the population, either before or after selection, are shown to converge to the global optimum due to the irreducibility property of the underlying original nonconvergent CGA. These results are discussed with respect to the schema theorem.

Keywords: canonical genetic algorithm, global convergence, Markov chains, schema theorem

1 Introduction

Canonical genetic algorithms (CGA) as introduced in [1] are often used to tackle static optimization problems of the type

max { f(b) | b ∈ B^l }    (1)

assuming that 0 < f(b) < ∞ for all b ∈ B^l = {0, 1}^l and ...
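The elitist modification the result hinges on — always carrying the best individual found so far into the next population — can be sketched as follows; the operators, parameter values, and the OneMax-style fitness are invented for illustration, not taken from the paper:

```python
import random

def elitist_ga(fitness, bit_len, pop_size=20, generations=200,
               p_mut=0.01, seed=0):
    """Toy CGA with elitism: proportional selection, one-point
    crossover, bit-flip mutation, and the incumbent best solution
    copied unchanged into every new population.  `fitness` maps a
    bit tuple to a positive value, matching the 0 < f(b) assumption.
    """
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(bit_len))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        weights = [fitness(b) for b in pop]          # proportional selection
        nxt = [best]                                 # elitism: keep the incumbent
        while len(nxt) < pop_size:
            a = rng.choices(pop, weights=weights)[0]
            b = rng.choices(pop, weights=weights)[0]
            cut = rng.randrange(1, bit_len)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = tuple(bit ^ (rng.random() < p_mut) for bit in child)
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)                 # elitism keeps this monotone
    return best

# Invented example: maximize the number of ones (shifted to stay positive).
best = elitist_ga(lambda b: sum(b) + 1, bit_len=10)
```

Without the `nxt = [best]` line this is the canonical GA, which the paper proves never converges to the global optimum.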
Communication over fading channels with delay constraints
 IEEE Transactions on Information Theory
, 2002
Abstract

Cited by 170 (7 self)
We consider a user communicating over a fading channel with perfect channel state information. Data is assumed to arrive from some higher layer application and is stored in a buffer until it is transmitted. We study adapting the user's transmission rate and power based on the channel state information as well as the buffer occupancy; the objectives are to regulate both the long-term average transmission power and the average buffer delay incurred by the traffic. Two models for this situation are discussed; one corresponding to fixed-length/variable-rate codewords and one corresponding to variable-length codewords. The tradeoff between the average delay and the average transmission power required for reliable communication is analyzed. A dynamic programming formulation is given to find all Pareto optimal power/delay operating points. We then quantify the behavior of this tradeoff in the regime of asymptotically large delay. In this regime we characterize simple buffer control policies which exhibit optimal characteristics. Connections to the delay-limited capacity and the expected capacity of fading channels are also discussed.
Updating and Querying Databases that Track Mobile Units
Abstract

Cited by 155 (12 self)
In this paper we consider databases representing information about moving objects (e.g. vehicles), particularly their location. We address the problems of updating and querying such databases. Specifically, the update problem is to determine when the location of a moving object in the database (namely its database location) should be updated. We answer this question by proposing an information cost model that captures uncertainty, deviation, and communication. Then we analyze dead-reckoning policies, namely policies that update the database location whenever the distance between the actual location and the database location exceeds a given threshold, x. Dead-reckoning is the prevalent approach in military applications, and our cost model enables us to determine the threshold x. We propose several dead-reckoning policies and we compare their performance by simulation. Then we consider the problem of processing range queries in the database. An example of a range query is 'retrieve the ob...
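The threshold test at the core of a dead-reckoning policy fits in a few lines; the coordinates and threshold value below are invented for illustration:

```python
import math

def dead_reckoning_update(actual, database, threshold):
    """Dead-reckoning policy sketch: send a location update only when
    the deviation between the object's actual position and its
    database position exceeds the threshold x.
    """
    deviation = math.dist(actual, database)
    return deviation > threshold

# Hypothetical vehicle: database (predicted) position vs. GPS reading.
needs_update = dead_reckoning_update((12.0, 5.0), (10.0, 5.0), threshold=1.5)
# deviation = 2.0 > 1.5, so an update message is sent
```

The paper's cost model is about choosing `threshold` so that the combined cost of uncertainty, deviation, and update messages is minimized; this sketch only shows the trigger condition.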
Logarithmic Asymptotics for Steady-State Tail Probabilities in a Single-Server Queue
, 1993
Abstract

Cited by 150 (14 self)
We consider the standard single-server queue with unlimited waiting space and the first-in first-out service discipline, but without any explicit independence conditions on the interarrival and service times. We find conditions for the steady-state waiting-time distribution to have small-tail asymptotics of the form x^{-1} log P(W > x) → -q* as x → ∞ for some q* > 0. We require only stationarity of the basic sequence of service times minus interarrival times and a Gärtner-Ellis condition for the cumulant generating function of the associated partial sums, i.e., n^{-1} log E e^{q S_n} → y(q) as n → ∞, plus regularity conditions on the decay rate function y. The asymptotic decay rate q* is the root of the equation y(q) = 0. This result in turn implies a corresponding asymptotic result for the steady-state workload in a queue with general nondecreasing input. This asymptotic result covers the case of multiple independent sources, so that it provides additional theoretical support for a concept of effective bandwidths for admission control in multiclass queues based on asymptotic decay rates.
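Finding the decay rate q* as the positive root of y(q) = 0 can be sketched numerically; the M/M/1 instance below (exponential interarrival and service times, rates lam and mu) is a textbook special case chosen because its root is known in closed form, q* = mu - lam:

```python
import math

def decay_rate(y, lo=1e-6, hi=10.0, tol=1e-9):
    """Positive root of y(q) = 0 by bisection.
    Assumes y(lo) < 0 < y(hi) with y continuous on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if y(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# M/M/1: S_n is a sum of i.i.d. (service - interarrival) terms, so
#   y(q) = log E[e^{q(service - interarrival)}]
#        = log(mu / (mu - q)) + log(lam / (lam + q)),   0 < q < mu.
lam, mu = 1.0, 2.0
y = lambda q: math.log(mu / (mu - q)) + math.log(lam / (lam + q))
q_star = decay_rate(y, hi=mu - 1e-6)
# q_star ≈ mu - lam = 1.0, giving the tail P(W > x) ≈ e^{-q* x}
```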
Probabilistic Models for Information Retrieval based on Divergence from Randomness
 ACM Transactions on Information Systems
, 2002
Abstract

Cited by 149 (5 self)
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes we study the binomial distribution and Bose–Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document–query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed document. The second normalization is related to the document length and to other statistics. These two normalization methods are applied to the basic models in succession to obtain weighting formulae. Results show that our framework produces different nonparametric models forming baseline alternatives to the standard tf-idf model.
Heavy-Tailed Phenomena in Satisfiability and Constraint Satisfaction Problems
 J. of Autom. Reasoning
, 2000
Abstract

Cited by 148 (27 self)
We study the runtime distributions of backtrack procedures for propositional satisfiability and constraint satisfaction. Such procedures often exhibit a large variability in performance. Our study reveals some intriguing properties of such distributions: They are often characterized by very long tails or “heavy tails”. We will show that these distributions are best characterized by a general class of distributions that can have infinite moments (i.e., an infinite mean, variance, etc.). Such nonstandard distributions have recently been observed in areas as diverse as economics, statistical physics, and geophysics. They are closely related to fractal phenomena, whose study was introduced by Mandelbrot. We also show how random restarts can effectively eliminate heavy-tailed behavior. Furthermore, for harder problem instances, we observe long tails on the left-hand side of the distribution, which is indicative of a non-negligible fraction of relatively short, successful runs. A rapid restart strategy eliminates heavy-tailed behavior and takes advantage of short runs, significantly reducing expected solution time. We demonstrate speedups of up to two orders of magnitude on SAT and CSP encodings of hard problems in planning, scheduling, and circuit synthesis.

Key words: satisfiability, constraint satisfaction, heavy tails, backtracking
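The rapid-restart strategy can be sketched generically: abort any run that exceeds a cutoff and restart with a fresh random seed, paying the cutoff cost for each aborted attempt. The wrapper and the heavy-tailed toy solver below are invented for illustration, not the paper's solvers:

```python
import random

def run_with_restarts(solve, cutoff, max_restarts=1000, seed=0):
    """Restart policy sketch: `solve(rng, cutoff)` returns the number
    of steps it used, or None if it hit the cutoff without solving.
    Returns total steps spent across all attempts, or None if every
    restart failed.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(max_restarts):
        steps = solve(random.Random(rng.random()), cutoff)
        if steps is not None:
            return total + steps          # solved within the cutoff
        total += cutoff                   # pay for the aborted run
    return None

# Toy solver whose run length is heavy-tailed (Pareto): most runs are
# short, but occasional runs are enormous -- exactly the regime where
# restarts with a short cutoff pay off.
def toy_solver(rng, cutoff):
    steps = int(rng.paretovariate(0.8))
    return steps if steps <= cutoff else None

cost = run_with_restarts(toy_solver, cutoff=100)
```

Because the left tail contains many short successful runs, the expected cost under restarts is finite even when the unrestarted run length has infinite mean.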