Results 11–20 of 21
Distribution-Sensitive Algorithms
NORDIC J. COMPUT., 1998
"... We investigate a new paradigm of algorithm design for geometric problems that can be termed distributionsensitive. Our notion of distribution is more combinatorial in nature than spatial. We illustrate this on problems like planarhulls and 2Dmaxima where some of the previously known outputsensit ..."
Abstract

Cited by 7 (0 self)
We investigate a new paradigm of algorithm design for geometric problems that can be termed distribution-sensitive. Our notion of distribution is more combinatorial in nature than spatial. We illustrate this on problems like planar hulls and 2D maxima where some of the previously known output-sensitive algorithms are recast in this setting. In a number of cases, the distribution-sensitive analysis yields superior results for the above problems. Moreover these bounds are shown to be tight in the linear decision tree model. Our approach owes its spirit to the results known for sorting multisets, and we exploit this relationship further to derive fast and efficient parallel algorithms for sorting multisets along with the geometric problems.
Instance-optimal geometric algorithms
"... ... in 2d and 3d, and offline point location in 2d. We prove the existence of an algorithm A for computing 2d or 3d convex hulls that is optimal for every point set in the following sense: for every set S of n points and for every algorithm A ′ in a certain class A, the maximum running time of ..."
Abstract

Cited by 7 (1 self)
... in 2d and 3d, and offline point location in 2d. We prove the existence of an algorithm A for computing 2d or 3d convex hulls that is optimal for every point set in the following sense: for every set S of n points and for every algorithm A′ in a certain class A, the maximum running time of A on input 〈s1, ..., sn〉 is at most a constant factor times the maximum running time of A′ on 〈s1, ..., sn〉, where the maximum is taken over all permutations 〈s1, ..., sn〉 of S. In fact, we can establish a stronger property: for every S and A′, the maximum running time of A is at most a constant factor times the average running time of A′ over all permutations of S. We call algorithms satisfying these properties instance-optimal in the order-oblivious and random-order setting. Such instance-optimal algorithms simultaneously subsume output-sensitive algorithms and distribution-dependent average-case algorithms, and all algorithms that do not take advantage of the order of the input or that assume the input is given in a random order. The class A under consideration consists of all algorithms in a decision tree model where the tests involve only multilinear functions with a constant number of arguments. To establish an instance-specific lower bound, we deviate from traditional Ben-Or-style proofs and adopt an interesting adversary argument. For 2d convex hulls, we prove that a version of the well-known algorithm by Kirkpatrick and Seidel (1986) or Chan, Snoeyink, and Yap (1995) already attains this lower bound. For 3d convex hulls, we propose a new algorithm. To demonstrate the potential of the concept, we further obtain instance-optimal results for a few other standard problems in computational geometry, such as maxima in 2d and 3d, orthogonal line segment intersection in 2d, finding bichromatic L∞-close pairs in 2d, offline orthogonal range searching in 2d, offline dominance reporting in 2d and 3d, and offline halfspace range reporting.
Distributed Computation of the Mode
PODC '08, 2008
"... This paper studies the problem of computing the most frequent element (the mode) by means of a distributed algorithm where the elements are located at the nodes of a network. Let k denote the number of distinct elements and further let mi be the number of occurrences of the element ei in the ordered ..."
Abstract

Cited by 4 (1 self)
This paper studies the problem of computing the most frequent element (the mode) by means of a distributed algorithm where the elements are located at the nodes of a network. Let k denote the number of distinct elements and let m_i be the number of occurrences of the element e_i, with the occurrence counts ordered as m_1 ≥ m_2 ≥ ... ≥ m_k. We give a deterministic distributed algorithm with time complexity O(D + k), where D denotes the diameter of the graph, which is essentially tight. As our main contribution, a Monte Carlo algorithm is presented which computes the mode in O(D + (F_2/m_1^2) · log k) time with high probability, where the frequency moment F_ℓ is defined as F_ℓ = Σ_{i=1}^{k} m_i^ℓ. This algorithm is substantially faster than the deterministic algorithm for various relevant frequency distributions. Moreover, we provide a lower bound of Ω(D + F_5/(m_1^5 · B)), where B is the maximum message size, that captures the effect of the frequency distribution on the time complexity of computing the mode.
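The frequency moments F_ℓ appearing in these bounds can be made concrete with a short Python sketch (the function name is illustrative, not from the paper):

```python
from collections import Counter

def frequency_moment(elements, l):
    """F_l = sum over distinct elements of (occurrence count)^l."""
    return sum(m ** l for m in Counter(elements).values())

# Example: 'a' occurs 3 times, 'b' twice, 'c' once.
data = ['a', 'a', 'a', 'b', 'b', 'c']
assert frequency_moment(data, 1) == 6   # F_1 = total number of elements
assert frequency_moment(data, 2) == 14  # 9 + 4 + 1
```

F_2/m_1^2 is small when one element dominates (the mode is easy to find) and large for near-uniform distributions, which is how the bound reflects the distribution.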
Distribution-sensitive set multipartitioning
1st International Conference on the Analysis of Algorithms, 2005
"... Given a set S with realvalued members, associated with each member one of two possible types; a multipartitioning of S is a sequence of the members of S such that if x, y ∈ S have different types and x < y, x precedes y in the multipartitioning of S. We give two distributionsensitive algorithms ..."
Abstract

Cited by 3 (2 self)
Given a set S of real-valued members, each associated with one of two possible types, a multipartitioning of S is a sequence of the members of S such that if x, y ∈ S have different types and x < y, then x precedes y in the multipartitioning of S. We give two distribution-sensitive algorithms for the set multipartitioning problem and a matching lower bound in the algebraic decision-tree model. One of the two algorithms can be made stable and can be implemented in place. We also give an output-sensitive algorithm for the problem.
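The defining condition can be sketched in Python (names and the sort-based baseline are illustrative, not from the paper; the point of the paper's distribution-sensitive algorithms is precisely to beat this full sort):

```python
def is_multipartitioning(seq):
    """Check: if x, y have different types and x < y, then x precedes y."""
    for i, (x, tx) in enumerate(seq):
        for y, ty in seq[i + 1:]:
            if tx != ty and y < x:  # smaller, differently-typed item appears later
                return False
    return True

def naive_multipartition(items):
    """Fully sorting by value trivially satisfies the condition,
    but does more work than a multipartitioning requires:
    same-type members need not be mutually ordered."""
    return sorted(items, key=lambda p: p[0])

items = [(5, 'A'), (1, 'B'), (3, 'A'), (2, 'B')]
assert is_multipartitioning(naive_multipartition(items))
```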
Data Reduction Through Early Grouping
In Proceedings of the 1994 IBM CAS Conference, 1994
"... SQL queries containing GROUPBY and aggregation occur frequently in decision support applications. Grouping with aggregation is typically done by first sorting the input and then performing the aggregation as part of the output phase of the sort. The most widely used external sorting algorithm is mer ..."
Abstract

Cited by 3 (0 self)
SQL queries containing GROUP BY and aggregation occur frequently in decision support applications. Grouping with aggregation is typically done by first sorting the input and then performing the aggregation as part of the output phase of the sort. The most widely used external sorting algorithm is merge sort, consisting of a run formation phase followed by a (single) merge pass. The amount of data output from the run formation phase can be reduced by a technique that we call early grouping. The idea is straightforward: simply form groups and perform aggregation during run formation. Each run will now consist of partial groups instead of individual records. These partial groups are then combined during the merge phase. Early grouping always reduces the number of records output from the run formation phase. The relative output size depends on the amount of memory relative to the total number of groups and the distribution of records over groups. When the input data is uniformly distributed...
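The early-grouping idea can be sketched as follows, with a hypothetical memory_limit on the number of in-memory groups and the merge phase collapsed into a single pass (in the real system it is part of the external merge sort):

```python
from collections import defaultdict

def form_runs_with_early_grouping(records, memory_limit):
    """Aggregate (key -> partial sum) while forming runs.  When the
    in-memory group table is full, flush it as one sorted run of
    partial groups instead of writing individual records."""
    runs, groups = [], defaultdict(int)
    for key, value in records:
        if key not in groups and len(groups) >= memory_limit:
            runs.append(sorted(groups.items()))
            groups = defaultdict(int)
        groups[key] += value
    if groups:
        runs.append(sorted(groups.items()))
    return runs

def merge_partial_groups(runs):
    """Merge phase: combine partial groups that share a key."""
    final = defaultdict(int)
    for run in runs:
        for key, value in run:
            final[key] += value
    return dict(final)

records = [('x', 1), ('y', 2), ('x', 3), ('z', 4), ('y', 5)]
runs = form_runs_with_early_grouping(records, memory_limit=2)
assert merge_partial_groups(runs) == {'x': 4, 'y': 7, 'z': 4}
```

With 5 input records and 2 runs of at most 2 partial groups each, the run formation phase here writes 4 records instead of 5; the saving grows when many records fall into few groups.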
CACHE-OBLIVIOUS SEARCHING AND SORTING IN MULTISETS
"... presented to the University of Waterloo ..."
Distribution-Sensitive Construction of Minimum-Redundancy Prefix Codes
, 2005
"... Abstract. A new method for constructing minimumredundancy prefix codes is described. This method does not build a Huffman tree; instead it uses a property of optimal codes to find the codeword length of each weight. The running time of the algorithm is shown to be O(nk), where n is the number of we ..."
Abstract

Cited by 1 (0 self)
Abstract. A new method for constructing minimum-redundancy prefix codes is described. This method does not build a Huffman tree; instead it uses a property of optimal codes to find the codeword length of each weight. The running time of the algorithm is shown to be O(nk), where n is the number of weights and k is the number of different codeword lengths. When the given sequence of weights is already sorted, it is shown that the codes can be constructed using O(log^(2k−1) n) comparisons, which is sublinear if the value of k is small.
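For contrast with the tree-free method described above, the standard Huffman construction (the baseline the paper avoids, not its algorithm) can be sketched by tracking, for each merged node, the set of leaves beneath it:

```python
import heapq

def huffman_code_lengths(weights):
    """Codeword length of each weight under a standard Huffman
    construction; each merge pushes every leaf under the merged
    node one level deeper in the implicit tree."""
    heap = [(w, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, ids1 = heapq.heappop(heap)
        w2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, ids1 + ids2))
    return lengths

# Weights 1, 1, 2, 4 yield lengths 3, 3, 2, 1 (Kraft sum 1/8 + 1/8 + 1/4 + 1/2 = 1).
assert huffman_code_lengths([1, 1, 2, 4]) == [3, 3, 2, 1]
```

Note that this run produces only k = 3 distinct codeword lengths for n = 4 weights; it is this small k that the paper's O(nk) bound exploits.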
From Time to Space: Fast Algorithms that yield Small and Fast Data Structures
"... Abstract. In many cases, the relation between encoding space and execution time translates into combinatorial lower bounds on the computational complexity of algorithms in the comparison or external memory models. We describe a few cases which illustrate this relation in a distinct direction, where ..."
Abstract

Cited by 1 (1 self)
Abstract. In many cases, the relation between encoding space and execution time translates into combinatorial lower bounds on the computational complexity of algorithms in the comparison or external memory models. We describe a few cases which illustrate this relation in a distinct direction, where fast algorithms inspire compressed encodings or data structures. In particular, we describe the relation between searching in an ordered array and encoding integers; merging sets and encoding a sequence of symbols; and sorting and compressing permutations.
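One concrete instance of the searching/encoding relation is the classical correspondence between doubling (unbounded) search and the Elias gamma code; the example below is in the spirit of the abstract, not necessarily one of the paper's own:

```python
def elias_gamma(n):
    """Elias gamma code of n >= 1: floor(log2 n) zeros followed by the
    binary representation of n.  Its length, 2*floor(log2 n) + 1,
    mirrors the comparison count of doubling search locating position n
    in an unbounded sorted array: the zeros correspond to the doubling
    phase, the binary digits to the final binary search."""
    b = bin(n)[2:]
    return '0' * (len(b) - 1) + b

assert elias_gamma(1) == '1'
assert elias_gamma(9) == '0001001'  # 3 zeros, then 9 = 1001 in binary
```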
Duplicate Detection and Deletion in the Extended NF² Data Model
IN PROC. OF THE 1989 INTL. CONF. ON FOUNDATIONS OF DATA ORGANIZATION AND ALGORITHMS, 1989
"... A current research topic in the area of relational databases is the design of systems based on the Non First Normal Form (NF ) data model. One particular development, the socalled extended NF data model, even permits structured values like lists and tuples to be included as attributes in rel ..."
Abstract
A current research topic in the area of relational databases is the design of systems based on the Non First Normal Form (NF²) data model. One particular development, the so-called extended NF² data model, even permits structured values like lists and tuples to be included as attributes in relations. It is thus well suited to represent complex objects for non-standard database applications. A DBMS which uses this model, called the Advanced Information Management Prototype, is currently being implemented at the IBM Heidelberg Scientific Center. In this paper we examine the problem of detecting and deleting duplicates within this data model. Several alternative approaches are evaluated and a new method, based on sorting complex objects, is proposed, which is both time- and space-efficient.
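A much-simplified sketch of sort-based duplicate elimination for nested values (the helper names are ours, and the actual extended-NF² algorithm handles far richer structures): map each complex object to a canonical, comparable form, sort on it, and keep one representative per equal group.

```python
def canonical(value):
    """Map a nested value to a hashable, comparable canonical form:
    dicts and sets are normalized by sorting, lists keep their order,
    so structurally equal complex objects compare equal."""
    if isinstance(value, dict):
        return tuple(sorted((k, canonical(v)) for k, v in value.items()))
    if isinstance(value, (list, tuple)):
        return tuple(canonical(v) for v in value)
    if isinstance(value, set):
        return tuple(sorted(canonical(v) for v in value))
    return value

def remove_duplicates(relation):
    """Sort-based duplicate elimination: sort by canonical form and
    keep the first member of each run of equal objects."""
    seen, result = set(), []
    for row in sorted(relation, key=canonical):
        key = canonical(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result

rows = [{'id': 1, 'tags': [1, 2]}, {'tags': [1, 2], 'id': 1}, {'id': 2, 'tags': [2]}]
assert len(remove_duplicates(rows)) == 2
```

Sorting brings equal objects together, so duplicates are detected in adjacent positions rather than by pairwise comparison of the whole relation, which is the source of the time and space efficiency claimed above.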
Empirical entropy in context
, 708
"... In statistics as in life, many things become clearer when we consider context. Statisticians’ use of context itself becomes clearer, in fact, when we consider the past century. It was anathema to them prior to 1906, when Markov [22] proved the weak law of large numbers applies to chains of dependent ..."
Abstract
In statistics as in life, many things become clearer when we consider context. Statisticians’ use of context itself becomes clearer, in fact, when we consider the past century. It was anathema to them prior to 1906, when Markov [22] proved the weak law of large numbers applies to chains of dependent events over finite domains (i.e., finite-state Markov processes). He published several papers on the statistics of dependent events and in 1913 gave an example of dependence in language: he analyzed the first 20000 characters of Pushkin’s Eugene Onegin and found the likelihood of a vowel was strongly affected by the presence of vowels in the four preceding positions. Many other examples have been found since, in physics, chemistry, biology, economics, sociology, psychology — every branch of the natural and social sciences. While Markov was developing the idea of Markov processes, another probability theorist, Borel, was starting an investigation into examples beyond their scope. Borel [2] defined a number to be normal in base b if, in its infinite b-ary representation, every k-tuple occurs with relative frequency 1/b^k; he called a number absolutely normal if it is normal in every base. Using the Borel-Cantelli Lemma, he showed nearly all numbers are absolutely normal, although his proof was completely nonconstructive. Sierpinski [28] gave the first example