Results 1  10
of
38
Fast computation of database operations using graphics processors
 Proc. of ACM SIGMOD
, 2004
"... We present new algorithms for performing fast computation of several common database operations on commodity graphics processors. Specifically, we consider operations such as conjunctive selections, aggregations, and semilinear queries, which are essential computational components of typical databa ..."
Abstract

Cited by 91 (16 self)
 Add to MetaCart
(Show Context)
We present new algorithms for performing fast computation of several common database operations on commodity graphics processors. Specifically, we consider operations such as conjunctive selections, aggregations, and semilinear queries, which are essential computational components of typical database, data warehousing, and data mining applications. While graphics processing units (GPUs) have been designed for fast display of geometric primitives, we utilize the inherent pipelining and parallelism, single instruction and multiple data (SIMD) capabilities, and vector processing functionality of GPUs, for evaluating boolean predicate combinations and semilinear queries on attributes and executing database operations efficiently. Our algorithms take into account some of the limitations of the programming model of current GPUs and perform no data rearrangements. Our algorithms have been implemented on a programmable GPU (e.g. NVIDIA’s GeForce FX 5900) and applied to databases consisting of up to a million records. We have compared their performance with an optimized implementation of CPUbased algorithms. Our experiments indicate that the graphics processor available on commodity computer systems is an effective coprocessor for performing database operations.
Efficient Computation of Frequent and Topk Elements in Data Streams
 IN ICDT
, 2005
"... We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and topk elements with tight guarantees on errors. For ..."
Abstract

Cited by 50 (7 self)
 Add to MetaCart
We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and topk elements with tight guarantees on errors. For general data distributions, our topk algorithm returns k elements that have roughly the highest frequencies; and it uses limited space for calculating frequent elements. For realistic Zipfian data, the space requirement of the proposed algorithm for solving the exact frequent elements problem decreases dramatically with the parameter of the distribution; and for topk queries, the analysis ensures that only the topk elements, in the correct order, are reported. The experiments, using real and synthetic data sets, show space reductions with no loss in accuracy. Having proved the effectiveness of the proposed approach through both analysis and experiments, we extend it to be able to answer continuous queries about frequent and topk elements. Although the problems of incremental reporting of frequent and topk elements are useful in many applications, to the best of our knowledge, no solution has been proposed.
Abbadi, “An integrated efficient solution for computing frequent and topk elements in data streams
 ACM Trans. Database Syst
, 2006
"... We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and topk elements with tight guarantees on errors. Fo ..."
Abstract

Cited by 32 (6 self)
 Add to MetaCart
We propose an approximate integrated approach for solving both problems of finding the most popular k elements, and finding frequent elements in a data stream coming from a large domain. Our solution is space efficient and reports both frequent and topk elements with tight guarantees on errors. For general data distributions, our topk algorithm returns k elements that have roughly the highest frequencies; and it uses limited space for calculating frequent elements. For realistic Zipfian data, the space requirement of the proposed algorithm for solving the exact frequent elements problem decreases dramatically with the parameter of the distribution; and for topk queries, the analysis ensures that only the topk elements, in the correct order, are reported. The experiments, using real and synthetic data sets, show space reductions with hardly any loss in accuracy. Having proved the effectiveness of the proposed approach through both analysis and experiments, we extend it to be able to answer continuous queries about frequent and topk elements. Although the problems of incremental reporting of frequent and topk elements are useful in many applications, to the best of our knowledge, no solution has been proposed.
Efficient Construction of Program Dependence Graphs
 ACM International Symposium on Software Testing and Analysis
, 1993
"... We present a new technique for constructing a program dependence graph that contains a program’s control flow, along with the usual control and data dependence information. Our algorithm constructs a program dependence graph while the program is being parsed. For programs containing only structured ..."
Abstract

Cited by 32 (8 self)
 Add to MetaCart
We present a new technique for constructing a program dependence graph that contains a program’s control flow, along with the usual control and data dependence information. Our algorithm constructs a program dependence graph while the program is being parsed. For programs containing only structured transfers of control, our algorithm does not require information provided by the control flow graph or post dominator tree and therefore obviates the construction of these auxiliary graphs. For programs cent aining explicit transfers of control, our algorithm adjusts the partial control dependence subgraph, constructed during the parse, to incorporate exact control dependence information. There are several advantages to our approach. For many programs, our algorithm may result in substantial savings in time and memory since our construction of the program dependence graph does not require the auxiliary graph. Furthermore, since we incorporate control and data flow as well as exact control dependence information into the program dependence graph, our graph has a wide range of applicability. We have implemented our algorithm by incorporating it into the Free Software Foundation’s GNU C compiler; currently we are performing experiments that compare our technique with the traditional approach.
A Logical Analysis of Aliasing in Imperative HigherOrder Functions
 INTERNATIONAL CONFERENCE ON FUNCTIONAL PROGRAMMING, ICFP’05
, 2005
"... We present a compositional program logic for callbyvalue imperative higherorder functions with general forms of aliasing, which can arise from the use of reference names as function parameters, return values, content of references and part of data structures. The program logic ..."
Abstract

Cited by 31 (3 self)
 Add to MetaCart
We present a compositional program logic for callbyvalue imperative higherorder functions with general forms of aliasing, which can arise from the use of reference names as function parameters, return values, content of references and part of data structures. The program logic
Quickselect and Dickman function
 Combinatorics, Probability and Computing
, 2000
"... We show that the limiting distribution of the number of comparisons used by Hoare's quickselect algorithm when given a random permutation of n elements for finding the mth smallest element, where m = o(n), is the Dickman function. The limiting distribution of the number of exchanges is also de ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
(Show Context)
We show that the limiting distribution of the number of comparisons used by Hoare's quickselect algorithm when given a random permutation of n elements for finding the mth smallest element, where m = o(n), is the Dickman function. The limiting distribution of the number of exchanges is also derived. 1 Quickselect Quickselect is one of the simplest and e#cient algorithms in practice for finding specified order statistics in a given sequence. It was invented by Hoare [19] and uses the usual partitioning procedure of quicksort: choose first a partitioning key, say x; regroup the given sequence into two parts corresponding to elements whose values are less than and larger than x, respectively; then decide, according to the size of the smaller subgroup, which part to continue recursively or to stop if x is the desired order statistics; see Figure 1 for an illustration in terms of binary search trees. For more details, see Guibas [15] and Mahmoud [26]. This algorithm , although ine#cient in the worst case, has linear mean when given a sequence of n independent and identically distributed continuous random variables, or equivalently, when given a random permutation of n elements, where, here and throughout this paper, all n! permutations are equally likely. Let C n,m denote the number of comparisons used by quickselect for finding the mth smallest element in a random permutation, where the first partitioning stage uses n 1 comparisons. Knuth [23] was the first to show, by some di#erencing argument, that E(C n,m ) = 2 (n + 3 + (n + 1)H n (m + 2)Hm (n + 3 m)H n+1m ) , n, where Hm = 1#k#m k 1 . A more transparent asymptotic approximation is E(C n,m ) (#), (#) := 2 #), # Part of the work of this author was done while he was visiting School of C...
Topk spatial joins of probabilistic objects
 in ICDE
, 2008
"... Abstract — Probabilistic data have recently become popular in applications such as scientific and geospatial databases. For images and other spatial datasets, probabilistic values can capture the uncertainty in extent and class of the objects in the images. Relating one such dataset to another by sp ..."
Abstract

Cited by 18 (1 self)
 Add to MetaCart
(Show Context)
Abstract — Probabilistic data have recently become popular in applications such as scientific and geospatial databases. For images and other spatial datasets, probabilistic values can capture the uncertainty in extent and class of the objects in the images. Relating one such dataset to another by spatial joins is an important operation for data management systems. We consider probabilistic spatial join (PSJ) queries, which rank the results according to a score that incorporates both the uncertainties associated with the objects and the distances between them. We present algorithms for two kinds of PSJ queries: Threshold PSJ queries, which return all pairs that score above a given threshold, and topk PSJ queries, which return the k topscoring pairs. For threshold PSJ queries, we propose a plane sweep algorithm that, because it exploits the special structure of the problem, runs in O(n (log n + k)) time, where n is the number of points and k is the number of results. We extend the algorithms to 2D data and to topk PSJ queries. To further speed up topk PSJ queries, we develop a scheduling technique that estimates the scores at the level of blocks, then hands the blocks to the plane sweep algorithm. By finding highscoring pairs early, the scheduling allows a large portion of the datasets to be pruned. Experiments demonstrate speedups of two orders of magnitude. I.
Linear time bounds for median computations
, 1971
"... New upper and lower bounds are presented for the maximum number of comparisons • f(i,n), required to select the ith largest of n numbers. An upper bound is found, by an analysis of a new selection algorithm, to be a linear function of n: f(i,n) ~ 103n/18 < 5.73n, for i < i < n. A lower bo ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
(Show Context)
New upper and lower bounds are presented for the maximum number of comparisons • f(i,n), required to select the ith largest of n numbers. An upper bound is found, by an analysis of a new selection algorithm, to be a linear function of n: f(i,n) ~ 103n/18 < 5.73n, for i < i < n. A lower bound is shown deductively to be: f(i,n)> n+ min(i,ni+l) + [log2(n)] 4, for 2 < i < ni, or, for the case of computing medians: f([n/2],n)>3n/23
Heuristics for automatic localization of software faults
, 1992
"... Developing effective debugging strategies to guarantee the reliability of software is important. By analyzing the debugging process used by experienced programmers, four distinct tasks are found to be consistently performed: (1) determining statements involved in program failures, (2) selecting susp ..."
Abstract

Cited by 15 (0 self)
 Add to MetaCart
Developing effective debugging strategies to guarantee the reliability of software is important. By analyzing the debugging process used by experienced programmers, four distinct tasks are found to be consistently performed: (1) determining statements involved in program failures, (2) selecting suspicious statements that might contain faults, (3) making hypotheses about suspicious faults (variables and locations), and (4) restoring program state to a specific statement for verification. If all four tasks could be performed with direct assistance from a debugging tool, the debugging effort would become much easier. We have built a prototype debugging tool, Spyder, to assist users in conducting the first and last tasks. Spyder executes the first task by using dynamic program slicing and the fourth task by backward execution. This research focuses on the second task, reducing the search domain containing faults, referred to as fault localization. Several heuristics are presented here based on dynamic program slices and information obtained from testing. A family tree of the heuristics is constructed to study effective application of the heuristics. The relationships among the heuristics and the potential order of using them are also explored. A preliminary
Modeling HighDimensional Index Structures using Sampling
 In Proc. ACM SIGMOD Int. Conf. on Management of Data
, 2001
"... A large number of index structures for highdimensional data have been proposed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Previous techniques either assume uniformity of the data or are not applicabl ..."
Abstract

Cited by 15 (4 self)
 Add to MetaCart
(Show Context)
A large number of index structures for highdimensional data have been proposed previously. In order to tune and compare such index structures, it is vital to have efficient cost prediction techniques for these structures. Previous techniques either assume uniformity of the data or are not applicable to highdimensional data. We propose the use of sampling to predict the number of accessed index pages during a query execution. Sampling is independent of the dimensionality and preserves clusters which is important for representing skewed data. We present a general model for estimating the index page layout using sampling and show how to compensate for errors. We then give an implementation of our model under restricted memory assumptions and show that it performs well even under these constraints. Errors are minimal and the overall prediction time is up to two orders of magnitude below the time for building and probing the full index without sampling. 1.