Space-Efficient Framework for Top-k String Retrieval Problems
Cited by 40 (8 self)
Given a set D = {d1, d2, ..., dD} of D strings of total length n, our task is to report the “most relevant” strings for a given query pattern P. This involves somewhat more advanced query functionality than the usual pattern matching, as some notion of “most relevant” is involved. In the information retrieval literature, this task is best achieved by using inverted indexes. However, inverted indexes work only for some predefined set of patterns. In the pattern matching community, the most popular pattern-matching data structures are suffix trees and suffix arrays. However, a typical suffix tree search involves going through all the occurrences of the pattern over the entire string collection, which might be many more than the required relevant documents. The first formal framework to study this kind of retrieval problem was given by Muthukrishnan [25]. He considered two metrics for relevance: frequency and proximity. He took a threshold-based approach on these metrics and gave data structures taking O(n log n) words of space. We study this problem in a slightly different framework of reporting the top k most relevant documents (in sorted order) under similar and more general relevance metrics. Our framework gives a linear-space data structure with optimal query times for arbitrary score functions. As a corollary, it improves the space utilization for the problems in [25] while maintaining optimal query performance. We also develop compressed variants of these data structures for several specific relevance metrics.
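To make the problem statement concrete, here is a minimal brute-force sketch that scores each document by the frequency of P and reports the top k in sorted order. It rescans every document on each query, so it is only a baseline and not the linear-space index the abstract describes. The function names `count_occurrences` and `top_k_by_frequency` are hypothetical, and frequency is assumed as the relevance metric.

```python
import heapq


def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def top_k_by_frequency(docs, pattern, k):
    """Return up to k (doc_index, frequency) pairs, highest frequency first.

    Documents that do not contain the pattern are never reported.
    Ties are broken in favour of the smaller document index.
    """
    scored = []
    for index, doc in enumerate(docs):
        freq = count_occurrences(doc, pattern)
        if freq > 0:
            scored.append((freq, index))
    best = heapq.nlargest(k, scored, key=lambda t: (t[0], -t[1]))
    return [(index, freq) for freq, index in best]
```

For example, `top_k_by_frequency(["banana", "bandana", "cabana", "apple"], "ana", 2)` ranks "banana" first because it contains two overlapping occurrences of "ana".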
On the limits of cache-obliviousness
In Proc. 35th Annual ACM Symposium on Theory of Computing, 2003
Cited by 39 (6 self)
In this paper, we present lower bounds for permuting and sorting in the cache-oblivious model. We prove that (1) I/O-optimal cache-oblivious comparison-based sorting is not possible without a tall cache assumption, and (2) there does not exist an I/O-optimal cache-oblivious algorithm for permuting, not even in the presence of a tall cache assumption. Our results for sorting show the existence of an inherent tradeoff in the cache-oblivious model between the strength of the tall cache assumption and the overhead for the case M >> B, and show that Funnelsort and recursive binary mergesort are optimal algorithms in the sense that they attain this tradeoff.
A skip list cookbook
1990
Cited by 32 (1 self)
Skip lists are a probabilistic data structure that seem likely to supplant balanced trees as the implementation method of choice for many applications. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space. The original paper on skip lists only presented algorithms for search, insertion and deletion. In this paper, we show that skip lists are as versatile as balanced trees. We describe and analyze algorithms to use search fingers, merge, split and concatenate skip lists, and implement linear list operations using skip lists. The skip list algorithms for these actions are faster and simpler than their balanced tree cousins. The merge algorithm for skip lists we describe has better asymptotic time complexity than any previously described merge algorithm for balanced trees.
Finding maximal pairs with bounded gap
Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1645 of Lecture Notes in Computer Science, 1999
Cited by 29 (5 self)
A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left or to the right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs with gap in an upper and lower bounded interval in time O(n log n + z), where z is the number of reported pairs. If the upper bound is removed, the time reduces to O(n + z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well-known methods for finding tandem repeats.
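The definitions above can be illustrated with a small brute-force enumeration. This is only a sketch for checking the definitions on short strings: it takes quadratic time or worse, unlike the O(n log n + z) methods of the paper. The function name `maximal_pairs` and the tuple layout `(i, j, length)` are assumptions made for this example, and `min_gap` is assumed to be non-negative so that overlapping occurrences are excluded.

```python
def maximal_pairs(s, min_gap, max_gap):
    """Enumerate maximal pairs (i, j, length) with min_gap <= gap <= max_gap.

    A pair is two occurrences s[i:i+length] == s[j:j+length] with i < j.
    The gap is j - (i + length), the number of characters between them.
    """
    n = len(s)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Left-maximal: the characters before the two occurrences differ
            # (or the first occurrence starts at the beginning of the string).
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # Extend to the right as far as possible, which makes the pair
            # right-maximal by construction.
            length = 0
            while j + length < n and s[i + length] == s[j + length]:
                length += 1
            if length == 0:
                continue
            gap = j - (i + length)
            if min_gap <= gap <= max_gap:
                pairs.append((i, j, length))
    return pairs
```

With `min_gap = max_gap = 0` the function reports only tandem repeats, matching the remark that a tandem repeat is a pair whose gap is zero.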
Challenges and advances in parallel sparse matrix-matrix multiplication
In The 37th International Conference on Parallel Processing (ICPP’08), 2008
Cited by 19 (5 self)
We identify the challenges that are special to parallel sparse matrix-matrix multiplication (PSpGEMM). We show that sparse algorithms are not as scalable as their dense counterparts because, in general, there are not enough nontrivial arithmetic operations to hide the communication costs as well as the sparsity overheads. We analyze the scalability of 1D and 2D algorithms for PSpGEMM. While the 1D algorithm is a variant of existing implementations, the 2D algorithms presented are completely novel. Most of these algorithms are based on previous research on parallel dense matrix multiplication. We also provide results from preliminary experiments with 2D algorithms.
Unimodal Regression via Prefix Isotonic Regression
Cited by 14 (8 self)
This paper gives optimal algorithms for determining real-valued univariate unimodal regressions, that is, for determining the optimal regression which is increasing and then decreasing. Such regressions arise in a wide variety of applications. They are shape-constrained nonparametric regressions, closely related to isotonic regression. For unimodal regression on n weighted points our algorithm for the L_2 metric requires only Θ(n) time, while for the L_1 metric it requires Θ(n log n) time. For unweighted points our algorithm for the L_1 metric also requires only Θ(n) time. Previous algorithms were for the L_2 metric and required Ω(n²) time. All previous algorithms used multiple calls to isotonic regression, and our major contribution is to organize these into a prefix isotonic regression, determining the regression on all initial segments. The prefix approach reduces the total time required by utilizing the solution for one initial segment to solve the next.
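The "multiple calls to isotonic regression" strategy that the abstract attributes to earlier work can be sketched as follows for the L_2 metric. This is the quadratic baseline, not the paper's prefix algorithm: it tries every split point, fits a nondecreasing regression on the left part and a nonincreasing one on the right part, and keeps the best. The isotonic fit uses the standard pool-adjacent-violators method. The function names `isotonic_l2` and `unimodal_l2_naive` are hypothetical.

```python
def isotonic_l2(y, w=None):
    """Weighted L_2 nondecreasing fit via pool adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [mean, total_weight, point_count]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def unimodal_l2_naive(y, w=None):
    """Best increasing-then-decreasing L_2 fit by trying every split point."""
    n = len(y)
    if w is None:
        w = [1.0] * n
    best_err, best_fit = None, []
    for k in range(n + 1):
        left = isotonic_l2(y[:k], w[:k])
        # A nonincreasing fit is a nondecreasing fit on the reversed data.
        right = isotonic_l2(y[k:][::-1], w[k:][::-1])[::-1]
        fit = left + right
        err = sum(wi * (yi - fi) ** 2 for yi, fi, wi in zip(y, fit, w))
        if best_err is None or err < best_err:
            best_err, best_fit = err, fit
    return best_fit
```

Each split recomputes both isotonic fits from scratch, which is the redundancy the prefix approach is described as removing.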
A fast algorithm for optimal buffer insertion
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2005
Cited by 13 (4 self)
The classic buffer insertion algorithm of van Ginneken has time and space complexity O(n^2), where n is the number of possible buffer positions. For more than a decade, van Ginneken’s algorithm has been the foundation of buffer insertion. In this paper, we present a new algorithm that computes the same optimal buffer insertion, but runs much faster. For 2-pin nets, our time complexity is O(n log n) and space complexity is O(n). For multi-pin nets, our time complexity is O(n log^2 n) and space complexity is O(n log n). The speedup is achieved by four novel techniques: predictive pruning, candidate tree, fast redundancy check, and fast merging. On industrial test cases, the new algorithm is 2–80 times faster than van Ginneken’s algorithm and uses 1/4–1/500 of the memory. Since van Ginneken’s algorithm and its variations are used by most existing algorithms on buffer insertion and buffer sizing, our new algorithm significantly improves the performance of all these algorithms. The predictive pruning technique has been applied to buffer cost minimization (Shi et al., 2004), and significantly improved the running time. Index Terms: Buffer insertion, data structure, Elmore delay, interconnect, routing.
Highly Parallel Sparse Matrix-Matrix Multiplication
2010
Cited by 13 (3 self)
Generalized sparse matrix-matrix multiplication is a key primitive for many high-performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on two-dimensional block distribution of sparse matrices where serial sections use a novel hypersparse kernel for scalability. We give a state-of-the-art MPI implementation of one of our algorithms. Our experiments show scaling up to thousands of processors on a variety of test scenarios.
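For readers unfamiliar with the primitive itself, the following is a minimal serial sketch of sparse matrix-matrix multiplication using a row-by-row accumulation. It is not the paper's hypersparse kernel or its two-dimensional parallel algorithm; it only shows what is being computed. The dictionary-of-rows representation and the function name `spgemm` are assumptions made for this example, and ordinary addition and multiplication are assumed rather than a general semiring.

```python
def spgemm(a, b):
    """Multiply sparse matrices stored as {row: {col: value}} dictionaries.

    Only nonzero entries are stored. Rows of the result that end up with
    no entries are omitted.
    """
    c = {}
    for i, a_row in a.items():
        acc = {}  # sparse accumulator for row i of the result
        for k, a_ik in a_row.items():
            # Row k of b contributes a_ik * b[k][j] to every c[i][j].
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            c[i] = acc
    return c
```

The work done is proportional to the number of nonzero products rather than to the matrix dimensions, which is why the ratio of arithmetic to data movement is low compared with the dense case.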
Fast Set Intersection in Memory
Cited by 12 (1 self)
Set intersection is a fundamental operation in information retrieval and database systems. This paper introduces linear-space data structures to represent sets such that their intersection can be computed in a worst-case efficient way. In general, given k (preprocessed) sets with n elements in total, we will show how to compute their intersection in expected time O(n/√w + kr), where r is the intersection size and w is the number of bits in a machine word. In addition, we introduce a very simple version of this algorithm that has weaker asymptotic guarantees but performs even better in practice; both algorithms outperform the state-of-the-art techniques for both synthetic and real data sets and workloads.
Solving the string statistics problem in time O(n log n)
Proc. 29th International Colloquium on Automata, Languages, and Programming, 2002
Cited by 11 (0 self)
The string statistics problem consists of preprocessing a string of length n such that given a query pattern of length m, the maximum number of non-overlapping occurrences of the query pattern in the string can be reported efficiently...
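The quantity being queried can be computed without any preprocessing by a greedy left-to-right scan: always taking the leftmost occurrence that does not overlap the previous one yields the maximum count. The sketch below shows that baseline only; it rescans the string on every query and is not the preprocessed structure the abstract refers to. The function name `max_nonoverlapping` is hypothetical.

```python
def max_nonoverlapping(text, pattern):
    """Maximum number of non-overlapping occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        # Resume searching just past the end of the occurrence we took.
        pos = text.find(pattern, pos + len(pattern))
    return count
```

For instance, "aba" occurs three times in "abababa" when overlaps are allowed, but only two of those occurrences can be chosen without overlapping.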