Results 11-20 of 115
A General Lower Bound on the I/O-Complexity of Comparison-based Algorithms
In Proc. Workshop on Algorithms and Data Structures, LNCS 709, 1993
"... We show a general relationship between the number of comparisons and the number of I/Ooperations needed to solve a given problem. This relationship enables one to show lower bounds on the number of I/Ooperations needed to solve a problem whenever a lower bound on the number of comparisons is known ..."
Abstract

Cited by 32 (11 self)
 Add to MetaCart
We show a general relationship between the number of comparisons and the number of I/O operations needed to solve a given problem. This relationship enables one to show lower bounds on the number of I/O operations needed to solve a problem whenever a lower bound on the number of comparisons is known. We use the result to show lower bounds on the I/O complexity of a number of problems where known techniques give only trivial bounds. Among these are the problems of removing duplicates from a multiset, a problem of great importance in e.g. relational database systems, and of determining the mode (the most frequently occurring element) of a multiset. We develop algorithms for these problems in order to show that the lower bounds are tight.
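The two multiset problems named in the abstract are comparison-based at heart. The paper's algorithms run in external memory; the sketch below is only a hypothetical in-memory illustration of the underlying comparison structure (sort once, then one scan, so equal elements become adjacent):

```python
def remove_duplicates(items):
    """Return the distinct elements of a multiset using only comparisons."""
    out = []
    for x in sorted(items):          # O(N log N) comparisons
        if not out or out[-1] != x:  # equal elements are now adjacent
            out.append(x)
    return out

def mode(items):
    """Return (element, multiplicity) for a most frequent element."""
    best, best_count = None, 0
    run, run_count = None, 0
    for x in sorted(items):
        if x == run:
            run_count += 1
        else:
            run, run_count = x, 1
        if run_count > best_count:
            best, best_count = run, run_count
    return best, best_count
```

The external-memory versions replace the in-memory sort with multiway merging of disk-resident runs, which is where the I/O lower bounds bite.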
The Cell Probe Complexity of Succinct Data Structures
In Automata, Languages and Programming, 30th International Colloquium (ICALP 2003), 2003
"... We show lower bounds in the cell probe model for the redundancy/query time tradeoff of solutions to static data structure problems. ..."
Abstract

Cited by 30 (0 self)
 Add to MetaCart
We show lower bounds in the cell probe model for the redundancy/query time tradeoff of solutions to static data structure problems.
Adaptive Intersection and t-Threshold Problems
2002
"... Consider the problem of computing the intersection of k sorted sets. In the comparison model, we prove a new lower bound which depends on the nondeterministic complexity of the instance, and implies that the algorithm of Demaine, LopezOrtiz and Munro [2] is usually optimal in this \adaptive" sense ..."
Abstract

Cited by 30 (12 self)
 Add to MetaCart
Consider the problem of computing the intersection of k sorted sets. In the comparison model, we prove a new lower bound which depends on the nondeterministic complexity of the instance, and implies that the algorithm of Demaine, Lopez-Ortiz and Munro [2] is usually optimal in this "adaptive" sense. We extend the lower bound and the algorithm to the t-Threshold Problem, which consists in finding the elements which are in at least t of the k sets. These problems are motivated by boolean queries in text database systems.
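A sketch of the adaptive idea, in the spirit of (but not identical to) the Demaine, Lopez-Ortiz and Munro algorithm: take candidates from the smallest set and locate them in the other sets by galloping (doubling) search, so that easy instances cost far fewer comparisons than a full merge:

```python
from bisect import bisect_left

def gallop(arr, x, lo=0):
    """Smallest index i >= lo with arr[i] >= x, found by doubling steps
    followed by a binary search, so the cost adapts to how far we move."""
    if lo >= len(arr) or arr[lo] >= x:
        return lo
    step = 1
    while lo + step < len(arr) and arr[lo + step] < x:
        step *= 2
    return bisect_left(arr, x, lo + step // 2 + 1, min(lo + step + 1, len(arr)))

def intersect(sets):
    """Adaptive intersection of k sorted lists (illustrative sketch)."""
    sets = sorted(sets, key=len)          # smallest set drives the search
    if not sets or not sets[0]:
        return []
    pos = [0] * len(sets)
    out = []
    while pos[0] < len(sets[0]):
        x = sets[0][pos[0]]
        in_all = True
        for j in range(1, len(sets)):
            pos[j] = gallop(sets[j], x, pos[j])
            if pos[j] == len(sets[j]):
                return out                # some set is exhausted
            if sets[j][pos[j]] != x:
                in_all = False
                break
        if in_all:
            out.append(x)
        pos[0] += 1
    return out
```

On an instance whose nondeterministic certificate is short (e.g. the sets barely interleave), the galloping probes skip over long runs, which is the kind of adaptivity the lower bound is measured against.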
PRIMES is in P
Ann. of Math., 2002
"... We present an unconditional deterministic polynomialtime algorithm that determines whether an input number is prime or composite. 1 ..."
Abstract

Cited by 26 (2 self)
 Add to MetaCart
We present an unconditional deterministic polynomial-time algorithm that determines whether an input number is prime or composite.
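The AKS algorithm builds on the classical identity that n > 1 is prime iff (X + 1)^n ≡ X^n + 1 (mod n), i.e. iff n divides every binomial coefficient C(n, k) with 0 < k < n. Checking this identity directly takes time exponential in the bit length of n, so the sketch below is only an illustration of the characterization, not the paper's polynomial-time algorithm (which tests the identity modulo X^r - 1 for a small r):

```python
from math import comb

def is_prime_via_binomials(n):
    """n > 1 is prime iff n divides C(n, k) for every 0 < k < n.
    Exponential-time illustration of the identity underlying AKS."""
    if n < 2:
        return False
    return all(comb(n, k) % n == 0 for k in range(1, n))
```

For example, 4 fails because C(4, 2) = 6 is not divisible by 4, while every middle coefficient of row 7 of Pascal's triangle is a multiple of 7.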
Outlier mining in large high-dimensional data sets
IEEE Transactions on Knowledge and Data Engineering, 2005
"... In this paper a new definition of distancebased outlier and an algorithm, called HilOut, designed to efficiently detect the top n outliers of a large and highdimensional data set are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k ..."
Abstract

Cited by 23 (1 self)
 Add to MetaCart
In this paper a new definition of distance-based outlier, and an algorithm called HilOut designed to efficiently detect the top n outliers of a large and high-dimensional data set, are proposed. Given an integer k, the weight of a point is defined as the sum of the distances separating it from its k nearest neighbors. Outliers are those points scoring the largest weights. The algorithm HilOut makes use of the notion of space-filling curve to linearize the data set, and it consists of two phases. The first phase provides an approximate solution, within a rough factor, after the execution of at most d + 1 sorts and scans of the data set, with temporal cost quadratic in d and linear in N and in k, where d is the number of dimensions of the data set and N is the number of points in the data set. During this phase, the algorithm isolates candidate outliers and reduces this set at each iteration. If the size of this set becomes n, then the algorithm stops, reporting the exact solution. The second phase calculates the exact solution with a final scan that further examines the candidate outliers remaining after the first phase. Experimental results show that the algorithm always stops, reporting the exact solution, during the first phase after many fewer than d + 1 steps. We present both an in-memory and a disk-based implementation of the HilOut algorithm, and a thorough scaling analysis for real and synthetic data sets showing that the algorithm scales well in both cases.
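The weight definition itself is easy to state. The naive O(N^2) scorer below computes exactly the quantity HilOut approximates (it is not HilOut itself, which avoids the quadratic scan via Hilbert space-filling curves):

```python
def top_n_outliers(points, k, n):
    """Top-n distance-based outliers: the weight of a point is the sum of
    Euclidean distances to its k nearest neighbors; the outliers are the
    n points of largest weight.  Naive O(N^2) reference implementation."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    weights = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        weights.append((sum(ds[:k]), i))
    weights.sort(reverse=True)
    return [i for _, i in weights[:n]]
```

A point far from every cluster accumulates large k-NN distances and therefore a large weight, which is why this definition is robust to clusters of differing density compared to a fixed-radius definition.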
An asymptotic theory for Cauchy-Euler differential equations with applications to the analysis of algorithms
2002
"... CauchyEuler differential equations surfaced naturally in a number of sorting and searching problems, notably in quicksort and binary search trees and their variations. Asymptotics of coefficients of functions satisfying such equations has been studied for several special cases in the literature. We ..."
Abstract

Cited by 22 (10 self)
 Add to MetaCart
Cauchy-Euler differential equations surface naturally in a number of sorting and searching problems, notably in quicksort and binary search trees and their variations. Asymptotics of the coefficients of functions satisfying such equations have been studied for several special cases in the literature. We study in this paper the most general framework for Cauchy-Euler equations and propose an asymptotic theory that covers almost all applications where Cauchy-Euler equations appear. Our approach is very general and requires almost no background on differential equations. Indeed, the whole theory can be stated in terms of recurrences instead of functions. Old and new applications of the theory are given. New phase changes of the limit laws of new variations of quicksort are systematically derived. We apply our theory to about a dozen diverse examples in quicksort, binary search trees, urn models, increasing trees, etc.
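For reference, the standard equidimensional form usually meant by "Cauchy-Euler equation" (the textbook definition, not necessarily the paper's exact framework) is:

```latex
% Order-m Cauchy-Euler (equidimensional) equation: each derivative
% y^{(j)} is paired with the matching power x^j, so the substitution
% y = x^r reduces the homogeneous case to a polynomial (indicial) equation.
x^{m} y^{(m)}(x) + a_{m-1}\, x^{m-1} y^{(m-1)}(x) + \cdots
  + a_{1}\, x\, y'(x) + a_{0}\, y(x) = g(x)
```

In the homogeneous second-order case $x^2 y'' + a_1 x y' + a_0 y = 0$, trying $y = x^r$ gives the indicial equation $r(r-1) + a_1 r + a_0 = 0$; extracting Taylor coefficients from such equations is what produces the recurrences in terms of which, per the abstract, the whole theory can be restated.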
Incremental Organization for Data Recording and Warehousing
1997
"... Data warehouses and recording systems typically have a large continuous stream of incoming data, that must be stored in a manner suitable for future access. Access to stored records is usually based on a key. Organizing the data on disk as the data arrives using standard techniques would result in e ..."
Abstract

Cited by 20 (0 self)
 Add to MetaCart
Data warehouses and recording systems typically have a large continuous stream of incoming data that must be stored in a manner suitable for future access. Access to stored records is usually based on a key. Organizing the data on disk as the data arrives using standard techniques would result in either (a) one or more I/Os to store each incoming record (to keep the data clustered by the key), which is too expensive when data arrival rates are very high, or (b) many I/Os to locate records for a particular customer (if data is stored clustered by arrival order). We study two techniques, inspired by external sorting algorithms, to store data incrementally as it arrives, simultaneously providing good performance for recording and querying. We present concurrency control and recovery schemes for both techniques. We show the benefits of our techniques both analytically and experimentally.
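The external-sorting-inspired idea can be caricatured in a few lines: batch incoming records in memory, flush each full batch as a sorted run, and answer key lookups by probing the buffer plus every run. This toy `IncrementalStore` (a hypothetical name; the paper's disk layout, concurrency control, and recovery are not modeled) shows why both recording and querying stay cheap:

```python
import bisect

class IncrementalStore:
    """Toy sketch: buffer records in memory, flush full buffers as
    sorted runs, and look up a key by probing buffer and runs."""

    def __init__(self, buffer_size=4):
        self.buffer_size = buffer_size
        self.buffer = {}          # key -> most recent record in this batch
        self.runs = []            # each run: sorted list of (key, record)

    def insert(self, key, record):
        self.buffer[key] = record
        if len(self.buffer) >= self.buffer_size:
            self.runs.append(sorted(self.buffer.items()))  # one sequential write
            self.buffer = {}

    def lookup(self, key):
        """Return all records stored under key, newest first."""
        found = []
        if key in self.buffer:
            found.append(self.buffer[key])
        for run in reversed(self.runs):                    # newest run first
            i = bisect.bisect_left(run, (key,))            # (key,) sorts before (key, r)
            while i < len(run) and run[i][0] == key:
                found.append(run[i][1])
                i += 1
        return found
```

Writes cost amortized much less than one I/O per record (whole runs go out sequentially), while a lookup touches one sorted run per flush rather than scanning by arrival order; periodically merging runs, as in external merge sort, bounds the number of probes.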
Total Path Length for Random Recursive Trees
1998
"... Total path length, or search cost, for a rooted tree is defined as the sum of all roottonode distances. Let T n be the total path length for a random recursive tree of order n. Mahmoud (1991) showed that W n := (T n \Gamma E[T n ])=n converges almost surely and in L 2 to a nondegenerate limiting ..."
Abstract

Cited by 19 (0 self)
 Add to MetaCart
Total path length, or search cost, for a rooted tree is defined as the sum of all root-to-node distances. Let T_n be the total path length for a random recursive tree of order n. Mahmoud (1991) showed that W_n := (T_n - E[T_n])/n converges almost surely and in L^2 to a nondegenerate limiting random variable W. Here we give recurrence relations for the moments of W_n and of W and show that W_n converges to W in L^p for each 0 < p < ∞. We confirm the conjecture that the distribution of W is not normal. We also show that the distribution of W is characterized among all distributions having zero mean and finite variance by the distributional identity W =_d U(1 + W) + (1 - U)W* - E(U), where E(x) := -x ln x - (1 - x) ln(1 - x) is the binary entropy function, U is a uniform(0, 1) random variable, W* and W have the same distribution, and U, W, and W* are mutually independent. Finally, we derive an approximation for the distribution of W usi...
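A random recursive tree of order n is built by attaching node i to a uniformly random earlier node, and T_n is just the sum of node depths. A minimal simulation (illustrative only; the paper's results concern the limit law of (T_n - E[T_n])/n, not this sampler):

```python
import random

def total_path_length(n, rng=None):
    """Build a random recursive tree of order n and return T_n,
    the sum of all root-to-node distances."""
    rng = rng or random.Random()
    depth = [0] * n                      # depth[0] = 0: node 0 is the root
    for i in range(1, n):
        parent = rng.randrange(i)        # uniform over nodes 0 .. i-1
        depth[i] = depth[parent] + 1
    return sum(depth)
```

Since node i has depth at least 1 and at most i, T_n always lies between n - 1 and n(n - 1)/2; averaging many samples of (T_n - mean)/n gives an empirical look at the limiting variable W.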
On the Cell Probe Complexity of Polynomial Evaluation
1995
"... We consider the cell probe complexity of the polynomial evaluation problem with preprocessing of coefficients, for polynomials of degree at most n over a finite field K. We show that the trivial cell probe algorithm for the problem is optimal if K is sufficiently large compared to n. As an applicati ..."
Abstract

Cited by 18 (4 self)
 Add to MetaCart
We consider the cell probe complexity of the polynomial evaluation problem with preprocessing of coefficients, for polynomials of degree at most n over a finite field K. We show that the trivial cell probe algorithm for the problem is optimal if K is sufficiently large compared to n. As an application, we give a new proof of the fact that P ≠ incrTIME(o(log n / log log n)). 1 Introduction Let K be a field. We consider the polynomial evaluation problem with preprocessing of coefficients. This problem is as follows: Given a polynomial f(X) ∈ K[X], preprocess it, so that later, for any field element a ∈ K, f(a) can be computed efficiently. It is a classical problem in the theory of algebraic complexity and has been intensively investigated in the model of arithmetic straight line programs. In this model, a solution for the polynomials of degree at most n is given by two objects: • A map φ from the set of polynomials of degree at most n into K^s, where s is any integer, called t...
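The "trivial" scheme the abstract refers to stores the n + 1 coefficients as the cells and evaluates with Horner's rule, probing each cell exactly once. A sketch over the prime field Z_p (the function names are illustrative, not the paper's notation):

```python
def preprocess(coeffs):
    """Trivial preprocessing: the stored cells are the coefficients
    themselves, i.e. the map into K^s with s = n + 1."""
    return list(coeffs)

def evaluate(cells, a, p):
    """Horner's rule over Z_p: probes each of the n + 1 cells once.
    cells[i] holds the coefficient of X^i."""
    acc = 0
    for c in reversed(cells):
        acc = (acc * a + c) % p
    return acc
```

The lower bound says that when |K| is large enough relative to n, no clever preprocessing can beat these n + 1 probes by more than lower-order terms.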
Duality between prefetching and queued writing with applications to external sorting
In European Symposium on Algorithms, volume 2161 of Lecture Notes in Computer Science, 1998
"... Parallel disks promise to be a cost effective means for achieving high bandwidth in applications involving massive data sets, but algorithms for parallel disks can be difficult to devise. To combat this problem, we define a useful and natural duality between writing to parallel disks and the seeming ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
Parallel disks promise to be a cost-effective means for achieving high bandwidth in applications involving massive data sets, but algorithms for parallel disks can be difficult to devise. To combat this problem, we define a useful and natural duality between writing to parallel disks and the seemingly more difficult problem of prefetching. We first explore this duality for applications involving read-once accesses using parallel disks. We get a simple linear-time algorithm for computing optimal prefetch schedules and analyze the efficiency of the resulting schedules for randomly placed data and for arbitrary interleaved accesses to striped sequences. Duality also provides an optimal schedule for the integrated caching and prefetching problem, in which blocks can be accessed multiple times. Another application of this duality gives us the first parallel disk sorting algorithms that are provably optimal up to lower-order terms. One of these algorithms is a simple and practical variant of multiway merge sort, addressing a question that has been open for some time.
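The merge at the heart of that sorting variant is the classical heap-based k-way merge of sorted runs. The sketch below shows only this merging core; the paper's actual contribution, an optimal prefetching schedule for feeding blocks of these runs from parallel disks into the merge, is not modeled here:

```python
import heapq

def multiway_merge(runs):
    """k-way merge of sorted runs using a min-heap of (value, run, index)
    entries; each pop emits the globally smallest remaining value."""
    heap = [(run[0], i, 0) for i, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        val, i, j = heapq.heappop(heap)
        out.append(val)
        if j + 1 < len(runs[i]):
            heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
    return out
```

The merge consumes blocks from the k runs in a data-dependent order; predicting that order (known in advance for a read-once merge) is exactly the prefetching problem the write/read duality solves.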