Results 1 - 10
of
41
Adaptive Set Intersections, Unions, and Differences
- In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA
, 2000
"... Motivated by boolean queries in text database systems, we consider the problems of finding the intersection, union, or difference of a collection of sorted sets. While the worst-case complexity of these problems is straightforward, we consider a notion of complexity that depends on the particular in ..."
Abstract
-
Cited by 51 (6 self)
- Add to MetaCart
Motivated by boolean queries in text database systems, we consider the problems of finding the intersection, union, or difference of a collection of sorted sets. While the worst-case complexity of these problems is straightforward, we consider a notion of complexity that depends on the particular instance. We develop the idea of a proof that a given set is indeed the correct answer. Proofs, and in particular shortest proofs, are characterized. We present adaptive algorithms that make no a priori assumptions about the problem instance, and show that their running times are within a constant factor of optimal with respect to a natural measure of the difficulty of an instance. In the process, we develop a framework for designing and evaluating adaptive algorithms in the comparison model. 1 Introduction and Overview Our work can be seen in the general context of performing searches quickly in a database or data warehousing environment. The broad issue is that of characterizing what type ...
Conditions on input vectors for consensus solvability in asynchronous distributed systems
- Journal of the ACM
, 2001
"... Abstract. This article introduces and explores the condition-based approach to solve the consensus problem in asynchronous systems. The approach studies conditions that identify sets of input vectors for which it is possible to solve consensus despite the occurrence of up to f process crashes. The f ..."
Abstract
-
Cited by 30 (9 self)
- Add to MetaCart
Abstract. This article introduces and explores the condition-based approach to solve the consensus problem in asynchronous systems. The approach studies conditions that identify sets of input vectors for which it is possible to solve consensus despite the occurrence of up to f process crashes. The first main result defines acceptable conditions and shows that these are exactly the conditions for which a consensus protocol exists. Two examples of realistic acceptable conditions are presented, and proved to be maximal, in the sense that they cannot be extended and remain acceptable. The second main result is a generic consensus shared-memory protocol for any acceptable condition. The protocol always guarantees agreement and validity, and terminates (at least) when the inputs satisfy the condition with which the protocol has been instantiated, or when there are no crashes. An efficient version of the protocol is then designed for the message passing model that works when f < n/2, and it is shown that no such protocol exists when f ≥ n/2. It is also shown how the protocol’s safety can be traded for its liveness.
Adaptive Intersection and t-Threshold Problems
, 2002
"... Consider the problem of computing the intersection of k sorted sets. In the comparison model, we prove a new lower bound which depends on the non-deterministic complexity of the instance, and implies that the algorithm of Demaine, Lopez-Ortiz and Munro [2] is usually optimal in this \adaptive" sense ..."
Abstract
-
Cited by 26 (11 self)
- Add to MetaCart
Consider the problem of computing the intersection of k sorted sets. In the comparison model, we prove a new lower bound which depends on the non-deterministic complexity of the instance, and implies that the algorithm of Demaine, Lopez-Ortiz and Munro [2] is usually optimal in this \adaptive" sense. We extend the lower bound and the algorithm to the t-Threshold Problem, which consists in nding the elements which are in at least t of the k sets. These problems are motivated by boolean queries in text database systems.
Faster adaptive set intersections for text searching
- Experimental Algorithms: 5th International Workshop, WEA 2006, Cala Galdana, Menorca
, 2006
"... Abstract. The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001], by us ..."
Abstract
-
Cited by 22 (5 self)
- Add to MetaCart
Abstract. The intersection of large ordered sets is a common problem in the context of the evaluation of boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001], by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study from Demaine et al. on comparison based intersection algorithms. Second, we show that in practice replacing binary search and galloping (one-sided binary) search [4] by interpolation search improves the performance of each main intersection algorithms. Third, we introduce and test variants of interpolation search: this results in an even better intersection algorithm.
Adaptive searching in succinctly encoded binary relations and tree-structured documents (Extended Abstract)
- THEORETICAL COMPUTER SCIENCE
, 2005
"... This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multi-labeled trees, where several labels can be assigned to each node of a tree. To find th ..."
Abstract
-
Cited by 21 (9 self)
- Add to MetaCart
This paper deals with succinct representations of data types motivated by applications in posting lists for search engines, in querying XML documents, and in the more general setting (which extends XML) of multi-labeled trees, where several labels can be assigned to each node of a tree. To find the set of references corresponding to a set of keywords, one typically intersects the list of references associated with each keyword. We view this instead as having a single list of objects [n] = {1,..., n} (the references), each of which has a subset of the labels [σ] = {1,..., σ} (the keywords) associated with it. We are able to find the objects associated with an arbitrary set of keywords in time O(δk lg lg σ) using a data structure requiring only t(lg σ +o(lg σ)) bits, where δ is the number of steps required by a non-deterministic algorithm to check the answer, k is the number of keywords in the query, σ is the size of the set from which the keywords are chosen, and t is the number of associations between references and keywords. The data structure is succinct in that it differs from the space needed to write down all t occurrences of keywords by only a lower order term. An XML document is, for our purpose, a labeled rooted tree. We deal primarily with “non-recursive labeled trees”, where no label occurs more than once on any root to leaf path. We find the set of nodes which path from the root include a set of keywords in the same time, O(δk lg lg σ), on a representation of the tree using essentially minimum space, 2n + n(lg σ + o(lg σ)) bits, where n is the number of nodes in the tree. If we permit nodes to have multiple
A framework for adaptive algorithm selection in STAPL
- IN PROC. ACM SIGPLAN SYMP. PRIN. PRAC. PAR. PROG. (PPOPP), PP 277–288
, 2005
"... Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the runtime environment, and input data characteristics. This is even more challenging on parallel and distr ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
Writing portable programs that perform well on multiple platforms or for varying input sizes and types can be very difficult because performance is often sensitive to the system architecture, the runtime environment, and input data characteristics. This is even more challenging on parallel and distributed systems due to the wide variety of system architectures. One way to address this problem is to adaptively select the best parallel algorithm for the current input data and system from a set of functionally equivalent algorithmic options. Toward this goal, we have developed a general framework for adaptive algorithm selection for use in the Standard Template Adaptive Parallel Library (STAPL). Our framework uses machine learning techniques to analyze data collected by STAPL installation benchmarks and to determine tests that will select among algorithmic options at run-time. We apply a prototype implementation of our framework to two important parallel operations, sorting and matrix multiplication, on multiple platforms and show that the framework determines run-time tests that correctly select the best performing algorithm from among several competing algorithmic options in 86-100 % of the cases studied, depending on the operation and the system.
Self-improving algorithms
- in SODA ’06: Proceedings of the seventeenth annual ACMSIAM symposium on Discrete algorithm
"... We investigate ways in which an algorithm can improve its expected performance by finetuning itself automatically with respect to an arbitrary, unknown input distribution. We give such self-improving algorithms for sorting and computing Delaunay triangulations. The highlights of this work: (i) an al ..."
Abstract
-
Cited by 14 (1 self)
- Add to MetaCart
We investigate ways in which an algorithm can improve its expected performance by finetuning itself automatically with respect to an arbitrary, unknown input distribution. We give such self-improving algorithms for sorting and computing Delaunay triangulations. The highlights of this work: (i) an algorithm to sort a list of numbers with optimal expected limiting complexity; and (ii) an algorithm to compute the Delaunay triangulation of a set of points with optimal expected limiting complexity. In both cases, the algorithm begins with a training phase during which it adjusts itself to the input distribution, followed by a stationary regime in which the algorithm settles to its optimized incarnation. 1
Implementing Sorting in Database Systems
- ACM Comput. Surv
, 2006
"... Most commercial database systems do (or should) exploit many sorting techniques that are publicly known, but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in mul ..."
Abstract
-
Cited by 12 (3 self)
- Add to MetaCart
Most commercial database systems do (or should) exploit many sorting techniques that are publicly known, but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in multiuser operations. This survey collects many of these techniques for easy reference by students, researchers, and product developers. It covers in-memory sorting, disk-based external sorting, and considerations that apply specifically to sorting in database systems.
Compressed representations of permutations, and applications
- SYMPOSIUM ON THEORETICAL ASPECTS OF COMPUTER SCIENCE
"... We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases mat ..."
Abstract
-
Cited by 12 (8 self)
- Add to MetaCart
We explore various techniques to compress a permutation π over n integers, taking advantage of ordered subsequences in π, while supporting its application π(i) and the application of its inverse π −1 (i) in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on applications such as the encoding of a permutation in order to support iterated applications π k (i) of it, of integer functions, and of inverted lists and suffix arrays.
A framework for speeding up priorityqueue operations
, 2004
"... Abstract. We introduce a framework for reducing the number of element comparisons performed in priority-queue operations. In particular, we give a priority queue which guarantees the worst-case cost of O(1) per minimum finding and insertion, and the worst-case cost of O(log n) with at most log n + O ..."
Abstract
-
Cited by 8 (8 self)
- Add to MetaCart
Abstract. We introduce a framework for reducing the number of element comparisons performed in priority-queue operations. In particular, we give a priority queue which guarantees the worst-case cost of O(1) per minimum finding and insertion, and the worst-case cost of O(log n) with at most log n + O(1) element comparisons per minimum deletion and deletion, improving the bound of 2log n + O(1) on the number of element comparisons known for binomial queues. Here, n denotes the number of elements stored in the data structure prior to the operation in question, and log n equals max {1,log 2 n}. We also give a priority queue that provides, in addition to the above-mentioned methods, the priority-decrease (or decrease-key) method. This priority queue achieves the worst-case cost of O(1) per minimum finding, insertion, and priority decrease; and the worst-case cost of O(log n) with at most log n + O(log log n) element comparisons per minimum deletion and deletion. CR Classification. E.1 [Data Structures]: Lists, stacks, and queues; E.2 [Data

