Results 1 - 10
of
24
Fast Computation of Database Operations Using Graphics Processors
- Proc. of ACM SIGMOD
, 2004
"... We present new algorithms on commodity graphics processors for performing fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical databa ..."
Abstract
-
Cited by 61 (14 self)
- Add to MetaCart
We present new algorithms on commodity graphics processors for performing fast computation of several common database operations. Specifically, we consider operations such as conjunctive selections, aggregations, and semi-linear queries, which are essential computational components of typical database, data warehousing, and data mining applications. While graphics processing units (GPUs) have been designed for fast display of geometric primitives, we utilize the inherent pipelining and parallelism, single instruction and multiple data (SIMD) capabilities, and vector processing functionality of GPUs, for evaluating boolean predicate combinations and semi-linear queries on attributes and executing database operations e#ciently. Our algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. Our algorithms have been implemented on a programmable GPU (e.g. NVIDIA's GeForce FX 5900) and applied to databases consisting of up to a million records. We have compared their performance with an optimized implementation of CPU-based algorithms. Our experiments indicate that the graphics processor available on commodity computer systems is an e#ective co-processor for performing database operations.
Efficient Computation of Frequent and Top-k Elements in Data Streams
- IN ICDT
, 2005
"... We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and top-k elements with tight guarantees on errors. For ..."
Abstract
-
Cited by 35 (6 self)
- Add to MetaCart
We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and top-k elements with tight guarantees on errors. For general data distributions, our top-k algorithm returns k elements that have roughly the highest frequencies; and it uses limited space for calculating frequent elements. For realistic Zipfian data, the space requirement of the proposed algorithm for solving the exact frequent elements problem decreases dramatically with the parameter of the distribution; and for top-k queries, the analysis ensures that only the top-k elements, in the correct order, are reported. The experiments, using real and synthetic data sets, show space reductions with no loss in accuracy. Having proved the effectiveness of the proposed approach through both analysis and experiments, we extend it to be able to answer continuous queries about frequent and top-k elements. Although the problems of incremental reporting of frequent and top-k elements are useful in many applications, to the best of our knowledge, no solution has been proposed.
Efficient Construction of Program Dependence Graphs
- ACM International Symposium on Software Testing and Analysis
, 1993
"... We present a new technique for constructing a program dependence graph that contains a program’s control flow, along with the usual control and data dependence information. Our algorithm constructs a program dependence graph while the program is be-ing parsed. For programs containing only structured ..."
Abstract
-
Cited by 28 (8 self)
- Add to MetaCart
We present a new technique for constructing a program dependence graph that contains a program’s control flow, along with the usual control and data dependence information. Our algorithm constructs a program dependence graph while the program is be-ing parsed. For programs containing only structured transfers of control, our algorithm does not require in-formation provided by the control flow graph or post dominator tree and therefore obviates the construction of these auxiliary graphs. For programs cent aining explicit transfers of control, our algorithm adjusts the partial control dependence subgraph, constructed dur-ing the parse, to incorporate exact control dependence information. There are several advantages to our ap-proach. For many programs, our algorithm may result in substantial savings in time and memory since our construction of the program dependence graph does not require the auxiliary graph. Furthermore, since we incorporate control and data flow as well as ex-act control dependence information into the program dependence graph, our graph has a wide range of ap-plicability. We have implemented our algorithm by incorporating it into the Free Software Foundation’s GNU C compiler; currently we are performing experi-ments that compare our technique with the traditional approach.
A Logical Analysis of Aliasing in Imperative Higher-Order Functions
- INTERNATIONAL CONFERENCE ON FUNCTIONAL PROGRAMMING, ICFP’05
, 2005
"... We present a compositional program logic for call-by-value imperative higherorder functions with general forms of aliasing, which can arise from the use of reference names as function parameters, return values, content of references and part of data structures. The program logic ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
We present a compositional program logic for call-by-value imperative higherorder functions with general forms of aliasing, which can arise from the use of reference names as function parameters, return values, content of references and part of data structures. The program logic
Quickselect and Dickman function
- Combinatorics, Probability and Computing
, 2000
"... We show that the limiting distribution of the number of comparisons used by Hoare's quickselect algorithm when given a random permutation of n elements for finding the m-th smallest element, where m = o(n), is the Dickman function. The limiting distribution of the number of exchanges is also derived ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
We show that the limiting distribution of the number of comparisons used by Hoare's quickselect algorithm when given a random permutation of n elements for finding the m-th smallest element, where m = o(n), is the Dickman function. The limiting distribution of the number of exchanges is also derived. 1 Quickselect Quickselect is one of the simplest and e#cient algorithms in practice for finding specified order statistics in a given sequence. It was invented by Hoare [19] and uses the usual partitioning procedure of quicksort: choose first a partitioning key, say x; regroup the given sequence into two parts corresponding to elements whose values are less than and larger than x, respectively; then decide, according to the size of the smaller subgroup, which part to continue recursively or to stop if x is the desired order statistics; see Figure 1 for an illustration in terms of binary search trees. For more details, see Guibas [15] and Mahmoud [26]. This algorithm , although ine#cient in the worst case, has linear mean when given a sequence of n independent and identically distributed continuous random variables, or equivalently, when given a random permutation of n elements, where, here and throughout this paper, all n! permutations are equally likely. Let C n,m denote the number of comparisons used by quickselect for finding the m-th smallest element in a random permutation, where the first partitioning stage uses n 1 comparisons. Knuth [23] was the first to show, by some di#erencing argument, that E(C n,m ) = 2 (n + 3 + (n + 1)H n (m + 2)Hm (n + 3 -m)H n+1-m ) , n, where Hm = 1#k#m k -1 . A more transparent asymptotic approximation is E(C n,m ) (#), (#) := 2 #), # Part of the work of this author was done while he was visiting School of C...
Modeling High-Dimensional Index Structures using Sampling
- In Proc. ACM SIGMOD Int. Conf. on Management of Data
, 2001
"... A large number of index structures for high-dimensional data have been proposed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Previous techniques either assume uniformity of the data or are not applicabl ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
A large number of index structures for high-dimensional data have been proposed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Previous techniques either assume uniformity of the data or are not applicable to high-dimensional data. We propose the use of sampling to predict the number of accessed index pages during a query execution. Sampling is independent of the dimensionality and preserves clusters which is important for representing skewed data. We present a general model for estimating the index page layout using sampling and show how to compensate for errors. We then give an implementation of our model under restricted memory assumptions and show that it performs well even under these constraints. Errors are minimal and the overall prediction time is up to two orders of magnitude below the time for building and probing the full index without sampling. 1.
Top-k Spatial Joins of Probabilistic Objects
- ICDE
, 2008
"... Probabilistic data have recently become popular in applications such as scientific and geospatial databases. For images and other spatial datasets, probabilistic values can capture the uncertainty in extent and class of the objects in the images. Relating one such dataset to another by spatial joins ..."
Abstract
-
Cited by 11 (0 self)
- Add to MetaCart
Probabilistic data have recently become popular in applications such as scientific and geospatial databases. For images and other spatial datasets, probabilistic values can capture the uncertainty in extent and class of the objects in the images. Relating one such dataset to another by spatial joins is an important operation for data management systems. We propose probabilistic spatial join (PSJ) queries, which rank the results according to a score that incorporates both the uncertainties associated with the objects and the distances between them. We present the first algorithms for two kinds of PSJ queries: Threshold PSJ queries, which return all pairs that score above a given threshold, and top-k PSJ queries, which return the k top-scoring pairs. For threshold PSJ queries, we propose a plane sweep algorithm that, because it exploits the special structure of the problem, runs in O(n (log n + k)) time, where n is the number of points and k is the number of results. We extend the algorithms to multidimensional data and to top-k PSJ queries. To further speed up top-k PSJ queries, we develop a scheduling technique that estimates the scores at the level of blocks, then hands the blocks to the plane sweep algorithm. By finding high-scoring pairs early, the scheduling allows a large portion of the datasets to be pruned. Experiments demonstrate speed-ups of two orders of magnitude.
Heuristics for automatic localization of software faults
, 1992
"... Developing effective debugging strategies to guarantee the reliability of software is important. By analyzing the debugging process used by experienced programmers, four distinct tasks are found to be consistently performed: (1) determining statements involved in program failures, (2) selecting susp ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Developing effective debugging strategies to guarantee the reliability of software is important. By analyzing the debugging process used by experienced programmers, four distinct tasks are found to be consistently performed: (1) determining statements involved in program failures, (2) selecting suspicious statements that might contain faults, (3) making hypotheses about suspicious faults (variables and locations), and (4) restoring program state to a specific statement for verification. If all four tasks could be performed with direct assistance from a debugging tool, the debugging effort would become much easier. We have built a prototype debugging tool, Spyder, to assist users in conducting the first and last tasks. Spyder executes the first task by using dynamic program slicing and the fourth task by backward execution. This research focuses on the second task, reducing the search domain containing faults, referred to as fault localization. Several heuristics are presented here based on dynamic program slices and information obtained from testing. A family tree of the heuristics is constructed to study effective application of the heuristics. The relationships among the heuristics and the potential order of using them are also explored. A preliminary
Optimal incremental sorting
- In Proc. 8th Workshop on Algorithm Engineering and Experiments (ALENEX
, 2006
"... Let A be a set of size m. Obtaining the first k ≤ m elements of A in ascending order can be done in optimal O(m+k log k) time. We present an algorithm (online on k) which incrementally gives the next smallest element of the set, so that the first k elements are obtained in optimal time for any k. We ..."
Abstract
-
Cited by 7 (5 self)
- Add to MetaCart
Let A be a set of size m. Obtaining the first k ≤ m elements of A in ascending order can be done in optimal O(m+k log k) time. We present an algorithm (online on k) which incrementally gives the next smallest element of the set, so that the first k elements are obtained in optimal time for any k. We also give a practical algorithm with the same complexity on average, which improves in practice the existing online algorithm. As a direct application, we use our technique to implement Kruskal’s Minimum Spanning Tree algorithm, where our solution is competitive with the best current implementations. We finally show that our technique can be applied to several other problems, such as obtaining an interval of the sorted sequence and implementing heaps. 1

