Results 1 - 6 of 6
Database indexing for large DNA and protein sequence collections
2002
Cited by 22 (3 self)
Our aim is to develop new database technologies for the approximate matching of unstructured string data using indexes. We explore the potential of the suffix tree data structure in this context. We present a new method of building suffix trees, allowing us to build trees in excess of RAM size, which has hitherto not been possible. We show that this method performs in practice as well as the O(n) method of Ukkonen [70]. Using this method we build indexes for 200 Mb of protein and 300 Mbp of DNA, whose disk image exceeds the available RAM. We show experimentally that suffix trees can be effectively used in approximate string matching with biological data. For a range of query lengths and error bounds, the suffix tree reduces the size of the unoptimised O(mn) dynamic programming calculation required in the evaluation of string similarity, and the gain from indexing increases with index size. In the indexes we built this reduction is significant, and less than 0.3% of the expected matrix is evaluated. We detail the requirements for further database and algorithmic research to support efficient use of large suffix indexes in biological applications.
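For orientation, the unoptimised O(mn) dynamic programming calculation mentioned in this abstract can be sketched as follows. This is the standard unindexed baseline for approximate matching under unit-cost edit distance, not the suffix-tree method of the paper; the function name `approx_occurrences` and the unit edit costs are assumptions made for illustration.

```python
def approx_occurrences(pattern, text, k):
    """Return 1-based end positions in `text` where `pattern` matches
    some substring with at most k edits (unit-cost edit distance).

    Illustrative baseline only: every one of the m*n matrix cells is
    evaluated, which is the cost an index aims to avoid.
    """
    m = len(pattern)
    # prev[i] = edit distance between pattern[:i] and the empty text prefix
    prev = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        # cur[0] stays 0 because a match may start at any text position
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur[i] = min(prev[i - 1] + cost,  # match or substitution
                         prev[i] + 1,         # extra character in text
                         cur[i - 1] + 1)      # missing character in text
        if cur[m] <= k:
            hits.append(j)
        prev = cur
    return hits


if __name__ == "__main__":
    print(approx_occurrences("ACGT", "TTACGTTT", 0))  # [6]
    print(approx_occurrences("ACGT", "TTACGTTT", 1))  # [5, 6, 7]
```

Only two columns of the matrix are kept in memory, but the running time is still proportional to the product of the pattern and text lengths.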
Suffix Binary Search Trees and Suffix Arrays
2001
Cited by 4 (1 self)
Suffix arrays and suffix binary search trees are two data structures that have been proposed as alternatives to the classical suffix tree to facilitate efficient online string searching. Here, we explore the relationship between these two structures. In particular, we present an alternative view of a suffix array, with its auxiliary information, as a perfectly balanced suffix binary search tree, and describe an elegant and efficient algorithm to construct the suffix array and its auxiliaries from an arbitrary suffix binary search tree.
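As a rough illustration of the suffix array side of this relationship, the sketch below builds a suffix array by naive sorting and searches it by binary search. The midpoints visited by the binary search trace a root-to-leaf path in an implicit, perfectly balanced binary search tree over the suffixes, which is the view the abstract describes. The function names are hypothetical, the construction is deliberately simple rather than efficient, and the auxiliary information mentioned in the abstract is omitted.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction by sorting suffix slices; fine for short strings,
    not intended for large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return sorted start positions of `pattern` in `text` using `sa`."""
    m = len(pattern)

    def prefix(rank):
        # First m characters of the suffix with the given rank.
        start = sa[rank]
        return text[start:start + m]

    # Lower bound: first rank whose prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2  # root of the implicit balanced subtree
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                  # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))   # [1, 3]
```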
The SBC-Tree: An Index for Run-Length Compressed Sequences
Cited by 4 (0 self)
Run-Length Encoding (RLE) is a data compression technique that is used in various applications, e.g., time series, biological sequences, and multimedia databases. One of the main challenges is how to operate on (e.g., index, search, and retrieve) compressed data without decompressing it. In this paper, we introduce the String B-tree for Compressed sequences, termed the SBC-tree, for indexing and searching RLE-compressed sequences of arbitrary length. The SBC-tree is a two-level index structure based on the well-known String B-tree and a 3-sided range query structure [7]. The SBC-tree supports pattern matching queries such as substring matching, prefix matching, and range search operations over RLE-compressed sequences. The SBC-tree has an optimal external-memory space complexity of O(N/B) pages, where N is the total length of the compressed sequences, and B is the disk page size. Substring matching, prefix matching, and range search execute in an optimal O(log_B N + p + T) I/O operations, where p is the
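To make the setting concrete, here is a minimal sketch of run-length encoding together with one operation that works directly on the compressed form. It is not the SBC-tree and says nothing about its I/O bounds; the function names (`rle_encode`, `rle_decode`, `char_at`) are hypothetical and chosen only for this example.

```python
from bisect import bisect_right
from itertools import accumulate, groupby


def rle_encode(sequence):
    """Compress a string into a list of (character, run_length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(sequence)]


def rle_decode(runs):
    """Expand (character, run_length) pairs back into the original string."""
    return "".join(ch * length for ch, length in runs)


def char_at(runs, index):
    """Return the character at `index` of the original sequence
    without decompressing it, by binary search over run end offsets."""
    ends = list(accumulate(length for _, length in runs))
    total = ends[-1] if ends else 0
    if not 0 <= index < total:
        raise IndexError("index out of range")
    # The first run whose end offset exceeds `index` contains that position.
    return runs[bisect_right(ends, index)][0]


if __name__ == "__main__":
    runs = rle_encode("AAACCGTTTT")
    print(runs)              # [('A', 3), ('C', 2), ('G', 1), ('T', 4)]
    print(char_at(runs, 5))  # G
```

The lookup touches one entry per run rather than one per character, which is the kind of saving that motivates indexing the compressed representation directly.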
A General Technique for Managing Strings in Comparison-Driven Data Structures
Cited by 3 (0 self)
This paper presents a general technique for optimally transforming any dynamic data structure D that operates on atomic and indivisible keys by constant-time comparisons, into a data structure D′ that handles unbounded-length keys whose comparison cost is not a constant.
Using Treaps for Optimization of Graph Storage
An adjacency matrix is an effective technique for representing a graph or social network comprising a large number of vertices and edges. The intent of this paper is to optimize graph storage and mapping without using a large adjacency matrix to represent a large graph. A special data structure, the treap, which combines a binary search tree and a heap, is used as a replacement for a large adjacency matrix. Experimental evaluation shows that the proposed approach significantly reduces the space occupied by an adjacency matrix and allows the graph to grow dynamically without affecting the current data structure.
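The abstract does not give the exact layout used, so the following is only one plausible sketch of the idea: directed edges are stored as (source, target) keys in a single treap, so space grows with the number of edges rather than with the square of the number of vertices. The class and function names, the directed-edge assumption, and the fixed random seed are all assumptions made for illustration.

```python
import random


class Node:
    __slots__ = ("key", "priority", "left", "right")

    def __init__(self, key, rng):
        self.key = key                 # edge as a (source, target) tuple
        self.priority = rng.random()   # random heap priority
        self.left = None
        self.right = None


def rotate_right(node):
    left = node.left
    node.left, left.right = left.right, node
    return left


def rotate_left(node):
    right = node.right
    node.right, right.left = right.left, node
    return right


def insert(node, key, rng):
    """Insert `key` by BST order, then rotate to restore the max-heap
    order on priorities. Duplicate keys are ignored."""
    if node is None:
        return Node(key, rng)
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key, rng)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node


def collect_neighbors(node, source, out):
    """In-order walk restricted to keys whose first component is `source`."""
    if node is None:
        return
    if node.key[0] >= source:
        collect_neighbors(node.left, source, out)
    if node.key[0] == source:
        out.append(node.key[1])
    if node.key[0] <= source:
        collect_neighbors(node.right, source, out)


class TreapGraph:
    """Directed graph whose edge set lives in one treap (hypothetical layout)."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._root = None

    def add_edge(self, source, target):
        self._root = insert(self._root, (source, target), self._rng)

    def has_edge(self, source, target):
        node, key = self._root, (source, target)
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False

    def neighbors(self, source):
        out = []
        collect_neighbors(self._root, source, out)
        return out


if __name__ == "__main__":
    g = TreapGraph()
    for edge in [(1, 2), (1, 3), (2, 3), (1, 2)]:
        g.add_edge(*edge)
    print(g.has_edge(1, 3), g.has_edge(3, 1))  # True False
    print(g.neighbors(1))                      # [2, 3]
```

New vertices need no resizing step: adding an edge that mentions an unseen vertex is just another insertion, which matches the abstract's point about the graph growing dynamically.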