Results 11–20 of 22
Linear-Space Data Structures for Range Mode Query in Arrays
Abstract

Cited by 7 (2 self)
A mode of a multiset S is an element a ∈ S of maximum multiplicity; that is, a occurs at least as frequently as any other element in S. Given an array A[1: n] of n elements, we consider a basic problem: constructing a static data structure that efficiently answers range mode queries on A. Each query consists of an input pair of indices (i, j) for which a mode of A[i: j] must be returned. The best previous data structure with linear space, by Krizanc, Morin, and Smid (ISAAC 2003), requires O(√n log log n) query time. We improve their result and present an O(n)-space data structure that supports range mode queries in O(√(n / log n)) worst-case time. Furthermore, we present strong evidence that a query time significantly below √n cannot be achieved by purely combinatorial techniques; we show that boolean matrix multiplication of two √n × √n matrices reduces to n range mode queries in an array of size O(n). Additionally, we give linear-space data structures for orthogonal range mode in higher dimensions (queries in near-O(n^{1−1/(2d)}) time) and for halfspace range mode in higher dimensions (queries in O(n^{1−1/d²}) time).
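For orientation, a range mode query can always be answered by directly counting the elements of the queried subarray. The sketch below shows only this trivial O(j − i) per-query baseline, not the O(√(n / log n))-time data structure of the abstract above:

```python
from collections import Counter

def range_mode(A, i, j):
    """Return a mode of A[i: j] (1-based, inclusive indices, as in the
    abstract) by direct counting -- the trivial baseline, not the paper's
    O(sqrt(n / log n))-time linear-space structure."""
    counts = Counter(A[i - 1:j])        # A[i: j] in 1-based notation
    return max(counts, key=counts.get)  # any element of maximum multiplicity
```

For example, range_mode([1, 2, 2, 3, 2, 1, 3], 2, 6) counts the subarray [2, 2, 3, 2, 1] and returns 2.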
Distributed Computation of the Mode
 PODC'08
, 2008
Abstract

Cited by 4 (1 self)
This paper studies the problem of computing the most frequent element (the mode) by means of a distributed algorithm where the elements are located at the nodes of a network. Let k denote the number of distinct elements, and further let m_i be the number of occurrences of the element e_i in the ordered list of occurrences m_1 ≥ m_2 ≥ ... ≥ m_k. We give a deterministic distributed algorithm with time complexity O(D + k), where D denotes the diameter of the graph, which is essentially tight. As our main contribution, a Monte Carlo algorithm is presented which computes the mode in O(D + (F_2/m_1²) log k) time with high probability, where the frequency moment F_ℓ is defined as F_ℓ = Σ_{i=1}^{k} m_i^ℓ. This algorithm is substantially faster than the deterministic algorithm for various relevant frequency distributions. Moreover, we provide a lower bound of Ω(D + F_5/(m_1^5 B)), where B is the maximum message size, that captures the effect of the frequency distribution on the time complexity of computing the mode.
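The quantities the bounds are stated in (the mode, its multiplicity m_1, and the frequency moments F_ℓ) can be computed centrally as below. This is only a sketch of the definitions; it does not model the distributed setting that is the paper's actual subject:

```python
from collections import Counter

def mode_and_moment(elements, ell):
    """Compute the mode, its multiplicity m_1, and the frequency moment
    F_ell = sum over distinct elements of (multiplicity ** ell).
    Centralized illustration of the abstract's definitions only."""
    counts = Counter(elements)
    mode, m1 = counts.most_common(1)[0]
    F_ell = sum(m ** ell for m in counts.values())
    return mode, m1, F_ell
```

For instance, for the multiset {1, 1, 1, 2, 2, 3} the mode is 1 with m_1 = 3, and F_2 = 3² + 2² + 1² = 14.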
Distribution-sensitive set multipartitioning
 1st International Conference on the Analysis of Algorithms
, 2005
Abstract

Cited by 3 (2 self)
Given a set S with real-valued members, each member associated with one of two possible types, a multipartitioning of S is a sequence of the members of S such that if x, y ∈ S have different types and x < y, then x precedes y in the multipartitioning of S. We give two distribution-sensitive algorithms for the set multipartitioning problem and a matching lower bound in the algebraic decision-tree model. One of the two algorithms can be made stable and can be implemented in place. We also give an output-sensitive algorithm for the problem.
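The defining property can be checked directly from the definition above. The quadratic verifier below is an illustrative sketch for pairs of (value, type); it is not one of the paper's algorithms:

```python
def is_multipartitioning(seq):
    """Check the multipartitioning property for a sequence of (value, type)
    pairs: whenever x and y have different types and x < y by value, x must
    precede y in the sequence. Brute-force O(n^2) verifier, for
    illustration only."""
    n = len(seq)
    for a in range(n):
        for b in range(n):
            (xv, xt), (yv, yt) = seq[a], seq[b]
            if xt != yt and xv < yv and a > b:
                return False  # a smaller value of a different type appears later
    return True
```

Note that same-type members may appear in any relative order, so [(1, 'A'), (3, 'A'), (2, 'B')] is invalid (2 < 3 across types but follows it), while [(1, 'A'), (2, 'B'), (3, 'A')] is valid.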
Data Reduction Through Early Grouping
 In Proceedings of the 1994 IBM CAS Conference
, 1994
Abstract

Cited by 3 (0 self)
SQL queries containing GROUP BY and aggregation occur frequently in decision support applications. Grouping with aggregation is typically done by first sorting the input and then performing the aggregation as part of the output phase of the sort. The most widely used external sorting algorithm is merge sort, consisting of a run formation phase followed by a (single) merge pass. The amount of data output from the run formation phase can be reduced by a technique that we call early grouping. The idea is straightforward: simply form groups and perform aggregation during run formation. Each run will now consist of partial groups instead of individual records. These partial groups are then combined during the merge phase. Early grouping always reduces the number of records output from the run formation phase. The relative output size depends on the amount of memory relative to the total number of groups and the distribution of records over groups. When the input data is uniformly distributed...
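The idea of aggregating during run formation can be sketched as below. The SUM aggregate, the (key, value) record shape, and the flush-on-full-table policy are illustrative assumptions; the paper's technique is independent of the particular aggregate function:

```python
def early_grouping_runs(records, memory_limit):
    """Sketch of early grouping: aggregate (key, value) records into partial
    groups during run formation, flushing a sorted run whenever the
    in-memory group table would exceed `memory_limit` distinct keys.
    SUM is an illustrative aggregate."""
    runs, groups = [], {}
    for key, value in records:
        if key not in groups and len(groups) >= memory_limit:
            runs.append(sorted(groups.items()))  # flush one run of partial groups
            groups = {}
        groups[key] = groups.get(key, 0) + value  # aggregate during run formation
    if groups:
        runs.append(sorted(groups.items()))
    return runs

def merge_runs(runs):
    """Combine the partial groups from all runs (the merge phase)."""
    final = {}
    for run in runs:
        for key, value in run:
            final[key] = final.get(key, 0) + value
    return final
```

Each flushed run holds at most `memory_limit` partial groups rather than every input record, which is exactly the data reduction the abstract describes.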
CACHE-OBLIVIOUS SEARCHING AND SORTING IN MULTISETS
"... presented to the University of Waterloo ..."
Distribution-Sensitive Construction of Minimum-Redundancy Prefix Codes
, 2005
Abstract

Cited by 1 (0 self)
Abstract. A new method for constructing minimum-redundancy prefix codes is described. This method does not build a Huffman tree; instead it uses a property of optimal codes to find the codeword length of each weight. The running time of the algorithm is shown to be O(nk), where n is the number of weights and k is the number of different codeword lengths. When the given sequence of weights is already sorted, it is shown that the codes can be constructed using O(log^{2k−1} n) comparisons, which is sublinear if the value of k is small.
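As a point of comparison, the codeword lengths of a minimum-redundancy prefix code can be obtained by classical Huffman merging. The sketch below is that baseline, not the tree-free method the abstract describes:

```python
import heapq

def codeword_lengths(weights):
    """Codeword lengths of a minimum-redundancy prefix code via classical
    Huffman merging -- the baseline the abstract's method avoids.
    Each merge deepens every leaf covered by the merged nodes by one."""
    if len(weights) == 1:
        return [1]
    # heap entries: (weight, unique id for tie-breaking, covered leaf indices)
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    counter = len(weights)
    while len(heap) > 1:
        w1, _, l1 = heapq.heappop(heap)
        w2, _, l2 = heapq.heappop(heap)
        for leaf in l1 + l2:
            lengths[leaf] += 1  # one level deeper in the implicit Huffman tree
        heapq.heappush(heap, (w1 + w2, counter, l1 + l2))
        counter += 1
    return lengths
```

For weights [1, 1, 2, 4] this yields lengths [3, 3, 2, 1]; note the Kraft sum 2⁻³ + 2⁻³ + 2⁻² + 2⁻¹ = 1, and that k = 3 distinct codeword lengths occur.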
From Time to Space: Fast Algorithms that yield Small and Fast Data Structures
Abstract

Cited by 1 (0 self)
Abstract. In many cases, the relation between encoding space and execution time translates into combinatorial lower bounds on the computational complexity of algorithms in the comparison or external memory models. We describe a few cases which illustrate this relation in a distinct direction, where fast algorithms inspire compressed encodings or data structures. In particular, we describe the relation between searching in an ordered array and encoding integers; merging sets and encoding a sequence of symbols; and sorting and compressing permutations.
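One concrete instance of the searching/encoding correspondence is the link between doubling (exponential) search in an ordered array and the Elias gamma code: both spend about 2 log₂ n + 1 steps or bits on a value of magnitude n. The abstract does not name this particular pair, so the sketch below is an illustrative assumption:

```python
def elias_gamma(n):
    """Elias gamma code for an integer n >= 1: a unary prefix giving the
    length of n's binary representation, followed by that representation.
    Its 2*floor(log2 n) + 1 bit length mirrors the comparison count of
    doubling search for position n in an ordered array."""
    assert n >= 1
    b = bin(n)[2:]                 # binary representation, no leading zeros
    return "0" * (len(b) - 1) + b  # unary length prefix + binary payload
```

For example, 1 encodes as "1" and 9 (binary 1001) encodes as "0001001".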
Duplicate Detection and Deletion in the Extended NF² Data Model
 IN PROC. OF THE 1989 INTL. CONF. ON FOUNDATIONS OF DATA ORGANIZATION AND ALGORITHMS
, 1989
Abstract
A current research topic in the area of relational databases is the design of systems based on the Non First Normal Form (NF²) data model. One particular development, the so-called extended NF² data model, even permits structured values like lists and tuples to be included as attributes in relations. It is thus well suited to represent complex objects for non-standard database applications. A DBMS which uses this model, called the Advanced Information Management Prototype, is currently being implemented at the IBM Heidelberg Scientific Center. In this paper we examine the problem of detecting and deleting duplicates within this data model. Several alternative approaches are evaluated and a new method, based on sorting complex objects, is proposed, which is both time- and space-efficient.
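Sorting-based duplicate elimination for nested values can be sketched generically: map each complex object to a canonical form, sort by it, and drop equal neighbours. This is an illustration in the spirit of the proposed method, not the paper's actual algorithm, and the nested-list value shape is an assumption:

```python
def canonical(value):
    """Map a nested value (lists/tuples/scalars) to a canonical, hashable form."""
    if isinstance(value, (list, tuple)):
        return tuple(canonical(v) for v in value)
    return value

def dedupe_complex(objects):
    """Remove duplicate complex objects by sorting canonical forms (via their
    repr, so mixed shapes stay comparable) and skipping equal neighbours."""
    ordered = sorted(objects, key=lambda o: repr(canonical(o)))
    out = []
    for obj in ordered:
        if not out or canonical(out[-1]) != canonical(obj):
            out.append(obj)
    return out
```

Sorting places duplicates adjacent, so deletion costs one linear pass after the sort, which is the source of the time and space efficiency the abstract claims for the sorting-based approach.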
Empirical entropy in context
, 708
Abstract
In statistics as in life, many things become clearer when we consider context. Statisticians' use of context itself becomes clearer, in fact, when we consider the past century. It was anathema to them prior to 1906, when Markov [22] proved the weak law of large numbers applies to chains of dependent events over finite domains (i.e., finite-state Markov processes). He published several papers on the statistics of dependent events and in 1913 gave an example of dependence in language: he analyzed the first 20,000 characters of Pushkin's Eugene Onegin and found the likelihood of a vowel was strongly affected by the presence of vowels in the four preceding positions. Many other examples have been found since, in physics, chemistry, biology, economics, sociology, psychology: every branch of the natural and social sciences. While Markov was developing the idea of Markov processes, another probability theorist, Borel, was starting an investigation into examples beyond their scope. Borel [2] defined a number to be normal in base b if, in its infinite b-ary representation, every k-tuple occurs with relative frequency 1/b^k; he called a number absolutely normal if normal in every base. Using the Borel–Cantelli Lemma, he showed nearly all numbers are absolutely normal, although his proof was completely nonconstructive. Sierpinski [28] gave the first example
Abstract
, 907
Abstract
Although many authors have considered how many ternary comparisons it takes to sort a multiset S of size n, the best known upper and lower bounds still differ by a term linear in n. In this paper we restrict our attention to online stable sorting and prove upper and lower bounds that are within o(n) not only of each other but also of the best known upper bound for offline sorting. Specifically, we first prove that if the number of distinct elements σ = o(n / log n), then (H + 1)n + o(n) comparisons are sufficient, where H is the entropy of the distribution of the elements in S. We then give a simple proof that (H + 1)n − o(n) comparisons are necessary in the worst case. Key words: Analysis of algorithms; online algorithms; sorting.
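The entropy H of the distribution of elements in S, and hence the (H + 1)n bound (ignoring the o(n) terms), can be computed directly from the multiplicities:

```python
import math
from collections import Counter

def entropy_bound(S):
    """Return (H, (H + 1) * n) for a multiset S, where H is the entropy of
    the distribution of elements: H = sum over distinct elements of
    (m/n) * log2(n/m). Lower-order o(n) terms of the abstract's bounds
    are omitted."""
    n = len(S)
    counts = Counter(S)
    H = sum((m / n) * math.log2(n / m) for m in counts.values())
    return H, (H + 1) * n
```

For a multiset of n = 8 elements split evenly between two values, H = 1 bit and the bound is 16 comparisons.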