q-gram based database searching using a suffix array (QUASAR)
- Proceedings of the Third Annual International Conference on Computational Molecular Biology (RECOMB 99)
, 1999
Cited by 59 (5 self)
With the increasing amount of DNA sequence information deposited in public databases, searching for similarity to a query sequence has become a basic operation in molecular biology. But even today’s fast algorithms reach their limits when applied to all-versus-all comparisons of large databases. Here we present a new database searching algorithm called QUASAR (Q-gram Alignment based on Suffix ARrays) which was designed to quickly detect sequences with strong similarity to the query in a context where many searches are conducted on one database. Our algorithm applies a modification of q-tuple filtering implemented on top of a suffix array. Two versions were developed, one for a RAM-resident suffix array and one for access to the suffix array on disk. We compared our implementation with BLAST and found that our approach is an order of magnitude faster. It is, however, restricted to the search for strongly similar DNA sequences as is typically required, e.g., in the context of clustering expressed sequence tags (ESTs).
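The idea of q-gram filtering on top of a suffix array can be illustrated with a small sketch. This is not the QUASAR implementation: the function names, the naive suffix-array construction, and the `block_size` and `threshold` parameters are all assumptions made for illustration. The sketch looks up every q-gram of the query by binary search in the suffix array and keeps only those database blocks that collect enough hits.

```python
from collections import Counter


def build_suffix_array(text):
    """Naive construction by sorting suffixes; adequate only for toy inputs."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def qgram_positions(text, sa, qgram):
    """Return all start positions of qgram in text via two binary searches."""
    q = len(qgram)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose q-prefix is >= qgram
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + q] < qgram:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose q-prefix is > qgram
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + q] <= qgram:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def candidate_blocks(text, sa, query, q, block_size, threshold):
    """Count q-gram hits per database block and keep blocks above threshold."""
    counts = Counter()
    for i in range(len(query) - q + 1):
        for pos in qgram_positions(text, sa, query[i:i + q]):
            counts[pos // block_size] += 1
    return sorted(b for b, c in counts.items() if c >= threshold)


# Hypothetical toy database: 8 T's, then "ACGTACGT", then 8 G's.
db = "TTTTTTTT" + "ACGTACGT" + "GGGGGGGG"
sa = build_suffix_array(db)
hits = candidate_blocks(db, sa, "ACGTAC", q=3, block_size=8, threshold=3)
```

Only the blocks returned by the filter would then be passed to a more expensive alignment step; the suffix array is built once and reused across many queries.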
The Niagara Internet Query System
- IEEE Data Engineering Bulletin
, 2001
Cited by 58 (9 self)
Recently, there has been a great deal of research into XML query languages to enable the execution of database-style queries over XML files. However, merely being an XML query-processing engine does not render a system suitable for querying the Internet. A useful system must provide mechanisms to (a) find the XML files that are relevant to a given query, and (b) deal with remote data sources that either provide unpredictable data access and transfer rates, or are infinite streams, or both. The Niagara Internet Query System was designed from the bottom up to provide these mechanisms. In this article we describe the overall Niagara architecture, and how Niagara finds relevant XML documents by using a collaboration between the Niagara XML-QL query processor and the Niagara “text-in-context” XML search engine. The Niagara Internet Query System is public domain software that can be found at
Deterministic Sorting and Randomized Median Finding on the BSP model
, 1996
Cited by 45 (22 self)
We present new BSP algorithms for deterministic sorting and randomized median finding. We sort n general keys by using a partitioning scheme that achieves the requirements of efficiency (one-optimality) and insensitivity against data skew (the accuracy of the splitting keys depends solely on the step distance, which can be adapted to meet the worst-case requirements of our application). Although we employ sampling in order to realize efficiency, we can give a precise worst-case estimation of the maximum imbalance which might occur. We also investigate optimal randomized BSP algorithms for the problem of finding the median of n elements that require, with high probability, 3n/(2p) + o(n/p) comparisons, for a wide range of values of n and p. Experimental results for the two algorithms are also presented.
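Splitter-based partitioning with a fixed step distance can be sketched as a sequential simulation of p processors. This is a generic regular-sampling sort, not the algorithm from the paper; the name `regular_sample_sort` and the parameters `p` and `step` are assumptions chosen for illustration. A smaller `step` yields more samples and therefore more accurate splitters.

```python
from bisect import bisect_right
from heapq import merge


def regular_sample_sort(keys, p, step):
    """Simulate a splitter-based sort on p processors; returns p sorted buckets."""
    # Superstep 1: distribute keys and sort locally on each simulated processor.
    chunk = -(-len(keys) // p)  # ceiling division
    local = [sorted(keys[i * chunk:(i + 1) * chunk]) for i in range(p)]

    # Superstep 2: take every step-th key of each sorted run as a sample.
    samples = sorted(x for run in local for x in run[step - 1::step])

    # Choose p - 1 evenly spaced splitters from the sorted sample.
    splitters = [samples[(i * len(samples)) // p] for i in range(1, p)] if samples else []

    # Superstep 3: cut each sorted run at the splitters and route the pieces.
    buckets = [[] for _ in range(p)]
    for run in local:
        start = 0
        for b, s in enumerate(splitters):
            end = bisect_right(run, s, start)
            buckets[b].append(run[start:end])
            start = end
        buckets[len(splitters)].append(run[start:])

    # Superstep 4: each processor merges the sorted pieces it received.
    return [list(merge(*pieces)) for pieces in buckets]


# Hypothetical input: 200 keys generated by a fixed formula.
keys = [(i * 37) % 101 for i in range(200)]
buckets = regular_sample_sort(keys, p=4, step=5)
largest = max(len(b) for b in buckets)  # observed imbalance for this input
```

Concatenating the buckets in order gives the sorted sequence; `largest` shows how evenly the splitters divided the keys for this particular input.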
Generalizing “Search” in Generalized Search Trees
- in Proceedings of the 14th International Conference on Data Engineering
, 1998
Cited by 21 (2 self)
The generalized search tree, or GiST, defines a framework of basic interfaces required to construct a hierarchical access method for database systems. As originally specified, GiST only supports record selection. We show how a small number of additional interfaces enable GiST to support a much larger class of operations. Members of this class, which includes nearest-neighbor and ranked search, user-defined aggregation and index-assisted selectivity estimation, are increasingly common in new database applications. The advantages of implementing these operations in the GiST framework include reduction of user development effort and the ability to use industrial-strength concurrency and recovery mechanisms provided by expert implementers.
Sorting And Indexing With Partitioned B-Trees
Cited by 19 (5 self)
Partitioning within a B-tree, based on an artificial leading key column and combined with online reorganization, can be exploited during external merge sort for accurate deep read-ahead and dynamic resource allocation, during index creation for a reduced delay until the first query can search the new index, during data loading for streaming integration of new data into a fully indexed database, and for miscellaneous other operations.
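The effect of an artificial leading key column can be illustrated with a toy index. In this sketch a sorted Python list of (partition, key) pairs stands in for the B-tree; the class `PartitionedIndex` and its methods are hypothetical and do not come from the paper. New data is loaded into its own partition, queries probe each partition, and reorganization merges everything into one partition.

```python
from bisect import bisect_left, insort


class PartitionedIndex:
    """Sorted list of (partition, key) pairs standing in for a B-tree."""

    def __init__(self):
        self.entries = []

    def insert(self, partition, key):
        # The partition number is the artificial leading key column.
        insort(self.entries, (partition, key))

    def partitions(self):
        # A real B-tree would find these by skipping between partitions;
        # this toy version simply scans all entries.
        return sorted({p for p, _ in self.entries})

    def contains(self, key):
        # One binary-search probe per partition.
        for p in self.partitions():
            i = bisect_left(self.entries, (p, key))
            if i < len(self.entries) and self.entries[i] == (p, key):
                return True
        return False

    def merge_into(self, target):
        # Reorganization: move every record into a single partition.
        self.entries = sorted((target, k) for _, k in self.entries)


index = PartitionedIndex()
for k in (40, 10, 30):
    index.insert(1, k)  # an existing, fully indexed partition
for k in (25, 5):
    index.insert(2, k)  # a newly loaded batch, searchable immediately
found_before = index.contains(25)
index.merge_into(0)
```

Before the merge a lookup costs one probe per partition; after it, a single probe suffices, which is the trade-off that reorganization resolves.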
Selection on the Bulk-Synchronous Parallel Model with Applications to Priority Queues
- In Proceedings of the 1996 International Conference on Parallel and Distributed Processing Techniques and Applications
, 1996
Cited by 19 (7 self)
In this paper we present a new randomized selection algorithm on the Bulk-Synchronous Parallel (BSP) model of computation, along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our methods improve upon previous results in both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, descriptions of their performance in terms of the BSP parameters are somewhat involved. The main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model. We also present experimental results for the selection algorithm that reinforce our claims.
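Randomized selection over distributed data can be sketched as a simulation in which each list plays the role of one processor's local memory. This is a plain pivot-based selection, not the algorithm from the paper; the function `bsp_select` and its parameters are assumptions for illustration. Each round corresponds roughly to one superstep: a pivot is broadcast, every processor partitions its local keys, and the counts are combined to decide which side to keep.

```python
import random


def bsp_select(parts, k, rng=None):
    """Return the k-th smallest key (0-based) across all simulated processors."""
    rng = rng or random.Random()
    parts = [list(p) for p in parts]
    total = sum(len(p) for p in parts)
    if not 0 <= k < total:
        raise IndexError("rank out of range")
    while True:
        # Pick a pivot uniformly at random among the remaining keys.
        idx = rng.randrange(sum(len(p) for p in parts))
        for p in parts:
            if idx < len(p):
                pivot = p[idx]
                break
            idx -= len(p)
        # Local computation: each processor partitions around the pivot.
        less = [[x for x in p if x < pivot] for p in parts]
        n_equal = sum(sum(1 for x in p if x == pivot) for p in parts)
        # Communication: combine the local counts into a global count.
        n_less = sum(len(part) for part in less)
        if k < n_less:
            parts = less
        elif k < n_less + n_equal:
            return pivot
        else:
            k -= n_less + n_equal
            parts = [[x for x in p if x > pivot] for p in parts]


# Hypothetical data spread over three simulated processors.
parts = [[9, 1, 5], [7, 3], [8, 2, 6, 4, 0]]
median = bsp_select(parts, 5, random.Random(1))
```

Every round discards at least the pivot, so the loop always terminates; the number of rounds depends on how well the random pivots split the remaining keys.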
Communication Efficient Data Structures on the BSP model with Applications
- In Proceedings of Euro-Par'96
, 1996
Cited by 18 (8 self)
The implementation of data structures on distributed memory models such as the Bulk-Synchronous Parallel (BSP) model, rather than shared memory ones such as the Parallel Random Access Machine (PRAM), offers a serious challenge. In this work we undertake the architecture-independent study of the computation and communication requirements of searching ordered h-level graphs, which include many of the standard data structures. We propose multi-way search as a general tool for the design, analysis and implementation of BSP algorithms. This technique allows elegant high-level design and analysis of algorithms, using data structures similar to those of sequential models. Applications to computational geometry and sorting are also presented. In particular, our new randomized sorting algorithm improves upon previously known BSP randomized sorting algorithms in the amount of parallel slackness required to achieve optimality. Moreover, our methods are within a 1 + o(1) multiplicative factor of the ...
Primitive Operations on the BSP Model
, 1996
Cited by 17 (14 self)
The design of a complex algorithm relies heavily on a set of primitive operations and the instruments required to compile these operations into an algorithm. In this work, we examine some of these basic primitive operations and present algorithms that are suitable for the Bulk-Synchronous Parallel model. In particular, we consider algorithms for the following primitive operations: broadcasting, parallel prefix, merging, generalized and integer sorting. While our algorithms are fairly simple themselves, description of their performance in terms of the BSP parameters is somewhat complicated. The main reward for quantifying these complications is that it enables software to be written once and for all that can be migrated efficiently among a variety of parallel machines.
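Of the primitives listed, parallel prefix is the easiest to illustrate. The sketch below simulates a common three-phase scheme on blocks of data, one block per processor; it is a textbook formulation rather than the algorithm analyzed in the paper, and the name `bsp_prefix_sums` is an assumption.

```python
from itertools import accumulate


def bsp_prefix_sums(blocks):
    """Inclusive prefix sums over data split into one block per processor."""
    # Phase 1 (local): each processor scans its own block.
    local = [list(accumulate(b)) for b in blocks]

    # Phase 2 (communication): block totals are exchanged, and each processor
    # learns the sum of all blocks to its left (an exclusive scan of totals).
    totals = [run[-1] if run else 0 for run in local]
    offsets = [0] + list(accumulate(totals))[:-1]

    # Phase 3 (local): each processor adds its offset to its local results.
    return [[x + off for x in run] for run, off in zip(local, offsets)]


result = bsp_prefix_sums([[1, 2], [3], [4, 5, 6]])
# result == [[1, 3], [6], [10, 15, 21]]
```

Only the block totals cross processor boundaries, so the communication volume depends on the number of processors rather than on the input size.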
A Randomized Sorting Algorithm on the BSP model
- In Proceedings of IPPS
, 1997
Cited by 15 (5 self)
We present a new randomized sorting algorithm on the Bulk-Synchronous Parallel (BSP) model. The algorithm improves upon the parallel slack of previous algorithms to achieve optimality. Tighter probabilistic bounds are also established. It uses sample sorting and utilizes recently introduced search algorithms for a class of data structures on the BSP model. Moreover, our methods are within a 1 + o(1) multiplicative factor of the respective sequential methods in terms of speedup for a wide range of the BSP parameters.

