Results 11–20 of 273
Compressed text databases with efficient query algorithms based on the compressed suffix array
 Proceedings of ISAAC'00, number 1969 in LNCS
, 2000
"... A compressed text database based on the compressed suffix array is proposed. The compressed su#x array of Grossi and Vitter occupies only O(n) bits for a text of length n; however it also uses the text itself that occupies O(n log bits for the alphabet #. On the other hand, our data structure does n ..."
Abstract

Cited by 61 (3 self)
 Add to MetaCart
A compressed text database based on the compressed suffix array is proposed. The compressed suffix array of Grossi and Vitter occupies only O(n) bits for a text of length n; however, it also uses the text itself, which occupies O(n log |Σ|) bits for an alphabet Σ. Our data structure, on the other hand, does not use the text itself, and supports important operations for text databases: inverse, search and decompress. Our algorithms can find the occ occurrences of any substring P of the text ...
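As background for the operations named in the abstract: a plain, uncompressed suffix array already supports finding the occ occurrences of a pattern P by binary search; the paper's contribution is supporting this in O(n) bits without storing the text. A minimal Python sketch of the uncompressed baseline (illustrative only, not the paper's compressed structure):

```python
import bisect

def suffix_array(text: str) -> list[int]:
    """Sorted starting positions of all suffixes (naive quadratic build,
    fine for illustration; real indexes use O(n) or O(n log n) construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text: str, sa: list[int], p: str) -> list[int]:
    """All positions where p occurs, via binary search over the suffix array."""
    # Compare p against the length-|p| prefix of each suffix.
    prefixes = [text[i:i + len(p)] for i in sa]
    lo = bisect.bisect_left(prefixes, p)
    hi = bisect.bisect_right(prefixes, p)
    return sorted(sa[lo:hi])
```

For example, `find_occurrences("banana", suffix_array("banana"), "ana")` returns `[1, 3]`.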
Compressing Integers for Fast File Access
 The Computer Journal
, 1999
"... this paper we show experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access. We compare di#erent approaches to compressing integers, including the Elias gamma and delta codes, Golom ..."
Abstract

Cited by 59 (14 self)
 Add to MetaCart
In this paper we show experimentally that, for large or small collections, storing integers in a compressed format reduces the time required for either sequential stream access or random access. We compare different approaches to compressing integers, including the Elias gamma and delta codes, Golomb coding, and a variable-byte integer scheme. As a conclusion, we recommend that, for fast access to integers, files be stored compressed.
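Of the codes the abstract compares, the Elias gamma code is the simplest: a positive integer n is written as ⌊log₂ n⌋ zero bits followed by the binary form of n. A hedged Python sketch (bitstrings represented as `str` for clarity, not speed):

```python
def elias_gamma_encode(n: int) -> str:
    """Elias gamma code: (len-1) zero bits, then the binary form of n (n >= 1)."""
    assert n >= 1
    b = bin(n)[2:]                     # binary digits without the '0b' prefix
    return "0" * (len(b) - 1) + b

def elias_gamma_decode(bits: str) -> int:
    """Inverse: count leading zeros z, then read the next z+1 bits as n."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    return int(bits[zeros:zeros + zeros + 1], 2)
```

For example, 9 = 1001₂ encodes as `"0001001"`; small integers get short codes, which is why gamma suits inverted-list gaps.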
A Subquadratic Sequence Alignment Algorithm for Unrestricted Cost Matrices
, 2002
"... The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix, and compares two strings of size n in O(n 2 ) time. We address the challenge of computing the similarity of two strings in subquadratic time, for metrics which use a scoring ..."
Abstract

Cited by 56 (4 self)
 Add to MetaCart
The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix, and compares two strings of size n in O(n^2) time. We address the challenge of computing the similarity of two strings in subquadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global alignment computations. The speedup is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n^2 / log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(hn^2 / log n), where h ≤ 1 is the entropy of the text.
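The classical O(n^2) dynamic program that the paper accelerates can be sketched as follows: a generic global-alignment scorer taking an arbitrary scoring function and gap penalty (illustrative names, not the paper's algorithm):

```python
def global_alignment_score(s: str, t: str, score, gap: int) -> int:
    """Classical O(|s|*|t|) dynamic-programming similarity (Needleman-Wunsch
    style) with an arbitrary scoring matrix, given as a function score(a, b)."""
    n, m = len(s), len(t)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # aligning a prefix of s against nothing
        D[i][0] = D[i - 1][0] + gap
    for j in range(1, m + 1):          # aligning a prefix of t against nothing
        D[0][j] = D[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # match/mismatch
                          D[i - 1][j] + gap,                            # gap in t
                          D[i][j - 1] + gap)                            # gap in s
    return D[n][m]
```

The paper's speedup replaces this cell-by-cell fill with block-level work over a Lempel-Ziv parse of the inputs.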
Compact Encodings of Planar Graphs via Canonical Orderings and Multiple Parentheses
, 1998
"... . We consider the problem of coding planar graphs by binary strings. Depending on whether O(1)time queries for adjacency and degree are supported, we present three sets of coding schemes which all take linear time for encoding and decoding. The encoding lengths are significantly shorter than th ..."
Abstract

Cited by 50 (11 self)
 Add to MetaCart
We consider the problem of coding planar graphs by binary strings. Depending on whether O(1)-time queries for adjacency and degree are supported, we present three sets of coding schemes which all take linear time for encoding and decoding. The encoding lengths are significantly shorter than the previously known results in each case. 1 Introduction This paper investigates the problem of encoding a graph G with n nodes and m edges into a binary string S. This problem has been extensively studied with three objectives: (1) minimizing the length of S, (2) minimizing the time needed to compute and decode S, and (3) supporting queries efficiently. A number of coding schemes with different tradeoffs have been proposed. The adjacency-list encoding of a graph is widely useful but requires 2m⌈log n⌉ bits. (All logarithms are of base 2.) A folklore scheme uses 2n bits to encode a rooted n-node tree into a string of n pairs of balanced parentheses. Since the total number of such trees is...
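The folklore 2n-bit parentheses scheme the abstract mentions is easy to make concrete: traverse the tree in preorder, writing "(" on entering a node and ")" on leaving it, so an n-node tree costs exactly 2n symbols (one bit each). A small Python sketch, representing ordered trees as nested lists of children (illustrative, not the paper's multiple-parentheses scheme):

```python
def tree_to_parens(tree: list) -> str:
    """Encode a rooted ordered tree as balanced parentheses:
    '(' on entering a node, ')' on leaving it."""
    return "(" + "".join(tree_to_parens(child) for child in tree) + ")"

def parens_to_tree(s: str) -> list:
    """Inverse: rebuild the nested-list tree from a balanced-parentheses string."""
    stack = [[]]                       # sentinel frame collects the root
    for ch in s:
        if ch == "(":
            stack.append([])           # open a new node
        else:
            node = stack.pop()         # close it and attach to its parent
            stack[-1].append(node)
    return stack[0][0]
```

A root with two children, the second of which has one child (4 nodes), round-trips through the 8-character string `"(()(()))"`.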
Adding Compression to Block Addressing Inverted Indexes
, 2000
"... . Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, lowoverhead text retrieval. Modern text compression techniques can reduce the text to less than 30% of its size and allow searching it dire ..."
Abstract

Cited by 49 (28 self)
 Add to MetaCart
Inverted index compression, block addressing and sequential search on compressed text are three techniques that have been separately developed for efficient, low-overhead text retrieval. Modern text compression techniques can reduce the text to less than 30% of its size and allow searching it directly, faster than the uncompressed text. Inverted index compression achieves a significant reduction of the index's original size at the same processing speed. Block addressing makes the inverted lists point to text blocks instead of exact positions, paying for the reduction in space with some sequential text scanning. In this work we combine the three ideas in a single scheme. We present a compressed inverted file that indexes compressed text and uses block addressing. We consider different techniques to compress the index and study their performance with respect to the block size. We compare the index against three separate techniques for varying block sizes, showing that our index is superior to each isolated approach. For instance, with just 4% of extra space overhead the index has to scan less than 12% of the text for exact searches and about 20% when allowing one error in the matches. Keywords: Text compression, inverted files, block addressing, text databases.
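The block-addressing idea can be illustrated in a few lines: the index records only which fixed-size blocks each term occurs in, and a query re-scans the candidate blocks sequentially to recover exact positions. A hypothetical minimal Python sketch (word-granularity blocks, no compression):

```python
def build_block_index(text: str, block_size: int):
    """Block-addressing inverted index: each term maps to the set of
    fixed-size word blocks it occurs in, not to exact positions."""
    words = text.split()
    index: dict[str, set[int]] = {}
    for pos, w in enumerate(words):
        index.setdefault(w, set()).add(pos // block_size)
    return index, words

def search(index, words, block_size: int, term: str) -> list[int]:
    """Exact word positions: the index narrows the search to candidate
    blocks, which are then scanned sequentially (the space/time tradeoff)."""
    hits = []
    for b in sorted(index.get(term, ())):
        start = b * block_size
        for pos in range(start, min(start + block_size, len(words))):
            if words[pos] == term:
                hits.append(pos)
    return hits
```

Larger blocks shrink the index (fewer distinct block numbers per list) but force more sequential scanning per query, which is exactly the tradeoff the paper studies.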
Indexing and Retrieval for Genomic Databases
 IEEE Transactions on Knowledge and Data Engineering
, 2002
"... Genomic sequence databases are widely used by molecular biologists for homology searching. Aminoacid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationall ..."
Abstract

Cited by 45 (6 self)
 Add to MetaCart
Genomic sequence databases are widely used by molecular biologists for homology searching. Amino-acid and nucleotide databases are increasing in size exponentially, and mean sequence lengths are also increasing. In searching such databases, it is desirable to use heuristics to perform computationally intensive local alignments on selected sequences only and to reduce the costs of the alignments that are attempted. We present an index-based approach both for selecting sequences that display broad similarity to a query and for fast local alignment. We show experimentally that the indexed approach results in significant savings in computationally intensive local alignments, and that index-based searching is as accurate as existing exhaustive search schemes.
Short Encodings of Planar Graphs and Maps
 Discrete Applied Mathematics
, 1993
"... We discuss spaceefficient encoding schemes for planar graphs and maps. Our results improve on the constants of previous schemes and can be achieved with simple encoding algorithms. They are nearoptimal in number of bits per edge. 1 Introduction In this paper we discuss spaceefficient binary enco ..."
Abstract

Cited by 42 (0 self)
 Add to MetaCart
We discuss space-efficient encoding schemes for planar graphs and maps. Our results improve on the constants of previous schemes and can be achieved with simple encoding algorithms. They are near-optimal in the number of bits per edge. 1 Introduction In this paper we discuss space-efficient binary encoding schemes for several classes of unlabeled connected planar graphs and maps. In encoding a graph we must encode the incidences among vertices and edges. By maps we understand topological equivalence classes of planar embeddings of planar graphs. In encoding a map we are required to encode the topology of the embedding, i.e., incidences among faces, edges, and vertices, as well as the graph. Each map is an embedding of a unique graph, but a given graph may have multiple embeddings. Hence maps must require more bits to encode than graphs in some average sense. There are a number of recent results on space-efficient encoding. A standard adjacency-list encoding of an unlabeled graph G requires...