A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation
In Proceedings of the IEEE Data Compression Conference, Snowbird, Utah, March 30 - April 1, 1998
Cited by 24 (3 self)
We propose a fast and memory-efficient algorithm for sorting the suffixes of a text in lexicographic order. Sorting suffixes is important because the array of suffix indexes in sorted order, called the suffix array, is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in Block Sorting text compression, so fast sorting algorithms are desired. We compare ...
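To make the two objects in this abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it builds a suffix array by naively sorting suffixes and then reads the Burrows-Wheeler transform off it. The function names and the `$` sentinel (assumed absent from the text and smaller than every other character) are illustrative assumptions.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order of the suffixes.

    Naive construction: sort positions by the suffix each one starts.
    Worst case O(n^2 log n) time, since a comparison may scan O(n) characters,
    and O(n^2) memory for the slice keys.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform read off the suffix array.

    The sentinel is assumed not to occur in `text` and to sort before every
    character that does; the output is the character preceding each sorted suffix.
    """
    s = text + sentinel
    # For the suffix starting at 0, s[i - 1] is s[-1], i.e. the sentinel (cyclic wrap).
    return "".join(s[i - 1] for i in suffix_array(s))


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
print(bwt("banana"))           # annb$aa
```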
Fast lightweight suffix array construction and checking
14th Annual Symposium on Combinatorial Pattern Matching, 2003
Cited by 24 (5 self)
We describe an algorithm that, for any $v \in [2, n]$, constructs the suffix array of a string of length $n$ in $O(vn + n \log n)$ time using $O(v + n/\sqrt{v})$ space in addition to the input (the string) and the output (the suffix array). By setting $v = \log n$, we obtain an $O(n \log n)$ time algorithm using $O(n/\sqrt{\log n})$ ...
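Substituting $v = \log n$ into the general time and space bounds shows how the $O(n \log n)$ time figure arises and what extra space remains; this worked step is our own arithmetic, not quoted from the paper:

```latex
\begin{aligned}
\text{time:}\quad & O(vn + n\log n) = O(n\log n + n\log n) = O(n\log n),\\
\text{extra space:}\quad & O\!\left(v + \frac{n}{\sqrt{v}}\right)
  = O\!\left(\log n + \frac{n}{\sqrt{\log n}}\right)
  = O\!\left(\frac{n}{\sqrt{\log n}}\right),
\end{aligned}
```

where the last step uses $\log n = o\!\left(n/\sqrt{\log n}\right)$.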
In-memory Hash Tables for Accumulating Text Vocabularies
Information Processing Letters, 2001
Cited by 24 (9 self)
In this paper we experimentally evaluate the performance of several data structures for building vocabularies, using a range of data collections and machines. Given the well-known properties of text and some initial experimentation, we chose to focus on the most promising candidates, splay trees and chained hash tables, also reporting results with binary trees. Of these, our experiments show that hash tables are by a considerable margin the most efficient. We propose and measure a refinement to hash tables, the use of move-to-front lists. This refinement is remarkably effective: as we show, using a small table in which there are large numbers of strings in each chain has only limited impact on performance. Moving frequently accessed words to the front of the list has the surprising property that the vast majority of accesses are to the first or second node. For example, our experiments show that in a typical case a table with an average of around 80 strings per slot is only 10% to 40% slower than a table with around one string per slot (while a table without move-to-front is perhaps 40% slower again) and is still over three times faster than using a tree. We show, moreover, that a move-to-front hash table of fixed size is more efficient in space and time than a hash table that is dynamically doubled in size to maintain a constant load average.
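The move-to-front refinement can be sketched as follows. This is a minimal Python illustration under our own assumptions (class and method names are hypothetical, Python's built-in `hash` stands in for the paper's hash function, and new words are inserted at the head of their chain); it is not the authors' code.

```python
from typing import List, Optional


class _Node:
    """Chain node holding one word and its occurrence count."""
    __slots__ = ("word", "count", "next")

    def __init__(self, word: str, nxt: "Optional[_Node]") -> None:
        self.word = word
        self.count = 1
        self.next = nxt


class MTFHashTable:
    """Chained hash table; a found word is moved to the front of its chain (hypothetical sketch)."""

    def __init__(self, num_slots: int = 1024) -> None:
        self.slots = [None] * num_slots  # each slot is the head of a singly linked chain

    def _index(self, word: str) -> int:
        return hash(word) % len(self.slots)

    def add(self, word: str) -> int:
        """Record one occurrence of `word` and return its updated count."""
        i = self._index(word)
        prev, node = None, self.slots[i]
        while node is not None:
            if node.word == word:
                node.count += 1
                if prev is not None:
                    # Unlink the node and splice it in at the head of the chain.
                    prev.next = node.next
                    node.next = self.slots[i]
                    self.slots[i] = node
                return node.count
            prev, node = node, node.next
        # Not found: insert the new word at the head (an assumption of this sketch).
        self.slots[i] = _Node(word, self.slots[i])
        return 1

    def chain(self, word: str) -> List[str]:
        """Words in the chain that `word` hashes to, front first (for inspection)."""
        out, node = [], self.slots[self._index(word)]
        while node is not None:
            out.append(node.word)
            node = node.next
        return out
```

With `num_slots=1` every word shares one chain, which makes the reordering easy to observe: after adding `the cat the dog the`, the chain reads `the, dog, cat`, so the most recently accessed word is always found at the first node.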
On Sorting Strings in External Memory
1997
Cited by 23 (12 self)
Lars Arge, Paolo Ferragina, Roberto Grossi, Jeffrey Scott Vitter
In this paper we address for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many large-scale text applications. In the standard unit-cost RAM comparison model, the complexity of sorting $K$ strings of total length $N$ is $\Theta(K \log_2 K + N)$. By analogy, in the external memory (or I/O) model, where the internal memory has size $M$ and the block transfer size is $B$, it would be natural to guess that the I/O complexity of sorting strings is $\Theta\left(\frac{K}{B} \log_{M/B} \frac{K}{B} + \frac{N}{B}\right)$, but the known algorithms do not come even close to achieving this bound. Our results show, somewhat counterintuitively, that the I/O complexity of string sorting depends upon the length of the strings relative to the block size. We first consider a simple comparison I/O model, where one is not allowed to break the strings into their characters, and we show ...
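For contrast with the bounds above, here is a conventional external merge sort for strings in Python: sorted runs of at most `run_size` strings are written to temporary files and then merged lazily with `heapq.merge`. It compares whole strings and is not one of the paper's algorithms, nor is it claimed to meet the conjectured bound; the function names, the newline-delimited run format (strings are assumed to contain no newlines), and the `run_size` parameter are illustrative assumptions.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_strings: List[str]) -> str:
    """Write one sorted run to a temporary file, one string per line; return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for s in sorted_strings:
            f.write(s + "\n")
    return path


def _read_run(path: str) -> Iterator[str]:
    """Stream a run back from disk, dropping the newline written by _write_run."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line[:-1]


def external_sort(strings: Iterable[str], run_size: int = 100_000) -> Iterator[str]:
    """Sort strings with bounded memory: sorted runs on disk, then a k-way merge."""
    paths: List[str] = []
    buf: List[str] = []
    for s in strings:
        buf.append(s)
        if len(buf) >= run_size:  # memory budget reached: flush a sorted run
            buf.sort()
            paths.append(_write_run(buf))
            buf = []
    if buf:
        buf.sort()
        paths.append(_write_run(buf))
    try:
        # heapq.merge keeps only one pending string per run in memory.
        yield from heapq.merge(*(_read_run(p) for p in paths))
    finally:
        for p in paths:
            os.remove(p)
```

A real external-memory implementation would also bound the merge fan-in by the available memory $M/B$; this sketch merges all runs in a single pass.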
The Analysis of Hybrid Trie Structures
1998
Cited by 23 (2 self)
This paper provides a detailed analysis of various implementations of digital tries, including the “ternary search tries” of Bentley and Sedgewick. The methods employed combine symbolic uses of generating functions, Poisson models, and Mellin transforms. Theoretical results are matched against real-life data and justify the claim that ternary search tries are a highly efficient dynamic dictionary structure for strings and textual data.
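A ternary search trie stores one character per node with three links (less than, equal to, greater than). The following minimal Python sketch of insertion and membership is written for illustration only; the class and method names are hypothetical, and empty keys are rejected for simplicity.

```python
class _TSTNode:
    __slots__ = ("ch", "lo", "eq", "hi", "is_end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None  # less-than, equal (next char), greater-than links
        self.is_end = False                 # a stored key ends at this node


class TernarySearchTrie:
    """Set of non-empty strings stored in a ternary search trie."""

    def __init__(self):
        self.root = None

    def insert(self, key):
        if not key:
            raise ValueError("empty keys are not supported in this sketch")
        self.root = self._insert(self.root, key, 0)

    def _insert(self, node, key, d):
        ch = key[d]
        if node is None:
            node = _TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, d)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, d)
        elif d + 1 < len(key):
            node.eq = self._insert(node.eq, key, d + 1)  # match: advance to next character
        else:
            node.is_end = True
        return node

    def contains(self, key):
        if not key:
            return False
        node, d = self.root, 0
        while node is not None:
            ch = key[d]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif d + 1 < len(key):
                node, d = node.eq, d + 1
            else:
                return node.is_end
        return False
```

Each lookup compares one character per node, moving sideways (`lo`/`hi`) like a binary search tree and downward (`eq`) like a trie.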
Modifications of the Burrows and Wheeler Data Compression Algorithm
Proceedings of the IEEE Data Compression Conference, 1999
Cited by 23 (3 self)
In this paper we improve upon these previous results on the BW-algorithm. Based on the context tree model, we consider the specific statistical properties of the data at the output of the BWT. We describe six important properties, three of which have not been described elsewhere. These considerations lead to modifications of the coding method, which in turn improve the coding efficiency. We briefly describe how to compute the BWT with low complexity in time and space, using suffix trees in two different representations. Finally, we present experimental results on the compression rate and running time of our method, and compare these results with previous achievements. More references on the methods described in this paper can be found in [1, 5].
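The output of the BWT tends to group equal symbols into runs, and the original Burrows-Wheeler pipeline follows the transform with move-to-front coding, which turns such runs into small integers for the entropy coder. The sketch below shows only that classic stage; it is not the modified coding method proposed in this paper, and the function names and explicit `alphabet` argument are our assumptions.

```python
from typing import List


def mtf_encode(data: str, alphabet: str) -> List[int]:
    """Emit each symbol's current rank in the list, then move that symbol to rank 0.

    Raises ValueError if `data` contains a symbol missing from `alphabet`.
    """
    table = list(alphabet)
    out = []
    for ch in data:
        rank = table.index(ch)
        out.append(rank)
        table.insert(0, table.pop(rank))
    return out


def mtf_decode(ranks: List[int], alphabet: str) -> str:
    """Invert mtf_encode, starting from the same initial symbol order."""
    table = list(alphabet)
    out = []
    for r in ranks:
        ch = table.pop(r)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


print(mtf_encode("aaabbbaaa", "ab"))  # [0, 0, 0, 1, 0, 0, 1, 0, 0]
```

Runs of a repeated symbol become runs of zeros, which is what makes the subsequent coding stage effective.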
Burst Tries: A Fast, Efficient Data Structure for String Keys
ACM Transactions on Information Systems, 2002
Cited by 21 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
An asymptotic theory for Cauchy-Euler differential equations with applications to the analysis of algorithms
2002
Cited by 17 (8 self)
Cauchy-Euler differential equations surface naturally in a number of sorting and searching problems, notably in quicksort and binary search trees and their variations. The asymptotics of coefficients of functions satisfying such equations have been studied for several special cases in the literature. In this paper we study the most general framework for Cauchy-Euler equations and propose an asymptotic theory that covers almost all applications where Cauchy-Euler equations appear. Our approach is very general and requires almost no background on differential equations. Indeed, the whole theory can be stated in terms of recurrences instead of functions. Old and new applications of the theory are given. New phase changes in the limit laws of new variations of quicksort are derived systematically. We apply our theory to about a dozen diverse examples in quicksort, binary search trees, urn models, increasing trees, etc.
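As a worked instance of the kind of equation the abstract refers to (our example, not taken from the paper): for quicksort with a uniformly random pivot and distinct keys, the mean number of comparisons satisfies $C_n = n - 1 + \frac{2}{n}\sum_{k<n} C_k$, and its generating function $C(z) = \sum_n C_n z^n$ satisfies $(1-z)C'(z) - 2C(z) = \frac{2z}{(1-z)^2}$, a first-order equation of Cauchy-Euler type in the variable $1 - z$. The Python check below compares the recurrence, coefficient by coefficient, with that differential equation and with the well-known closed form $C_n = 2(n+1)H_n - 4n$, using exact fractions.

```python
from fractions import Fraction


def quicksort_costs(n_max):
    """Mean comparison counts C_0..C_{n_max} from C_n = n - 1 + (2/n) * sum_{k<n} C_k."""
    c = [Fraction(0)]
    prefix = Fraction(0)  # running sum C_0 + ... + C_{n-1}
    for n in range(1, n_max + 1):
        prefix += c[-1]
        c.append(n - 1 + Fraction(2, n) * prefix)
    return c


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def check(n_max=30):
    c = quicksort_costs(n_max)
    for n in range(n_max + 1):
        # Known closed form of the mean number of comparisons.
        assert c[n] == 2 * (n + 1) * harmonic(n) - 4 * n
    for n in range(n_max):
        # [z^n] of (1 - z) C'(z) - 2 C(z) equals [z^n] of 2z/(1-z)^2, which is 2n.
        assert (n + 1) * c[n + 1] - (n + 2) * c[n] == 2 * n
    return True


print(quicksort_costs(3))  # [Fraction(0, 1), Fraction(0, 1), Fraction(1, 1), Fraction(8, 3)]
```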
Implementing Radixsort
ACM Journal of Experimental Algorithmics, 1998
Cited by 15 (1 self)
We present and evaluate several new optimization and implementation techniques for string sorting. In particular, we study a recently published radix sorting algorithm, Forward radixsort, that has a provably good worst-case behavior. Our experimental results indicate that radix sorting is considerably faster (often more than twice as fast) than comparison-based sorting methods. This is true even for small input sequences. We also show that it is possible to implement a radix sort with good worst-case running time without sacrificing average-case performance. Our implementations are competitive with the best previously published string sorting algorithms. Code, test data, and test results are available from the World Wide Web.

1. Introduction

Radix sorting is a simple and very efficient sorting method that has received too little attention. A common misconception is that a radix sorting algorithm either has to inspect all the characters of the input or use an inordinate amount of extra...
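For reference, a plain most-significant-digit (MSD) radix sort for strings looks like the Python sketch below. It is deliberately simple, using a dictionary of buckets per character position, and is not the paper's Forward radixsort or any of its optimizations; the function name is our own.

```python
from typing import List


def msd_radix_sort(strings: List[str]) -> List[str]:
    """Plain MSD radix sort for strings (illustrative; not Forward radixsort).

    Groups strings by the character at position `depth`; strings that end at
    `depth` go first, since a proper prefix sorts before its extensions.
    """
    def sort(group, depth):
        if len(group) <= 1:
            return group
        out = [s for s in group if len(s) == depth]  # strings exhausted at this depth
        buckets = {}
        for s in group:
            if len(s) > depth:
                buckets.setdefault(s[depth], []).append(s)
        for ch in sorted(buckets):  # only the distinct characters that actually occur
            out.extend(sort(buckets[ch], depth + 1))
        return out

    return sort(list(strings), 0)
```

Each string is examined only up to the depth at which its bucket becomes a singleton, so characters beyond the distinguishing prefix are never inspected.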
TRP++ 2.0: A temporal resolution prover
In Proc. CADE-19, LNAI, 2003
Cited by 14 (8 self)
Temporal logics are extensions of classical logic with operators that deal with time. They have been used in a wide variety of areas within Computer Science and Artificial Intelligence, for example robotics [14], databases [15], hardware verification [8] and agent-based systems [12].

