Results 1  10
of
31
On RAM priority queues
, 1996
"... Priority queues are some of the most fundamental data structures. They are used directly for, say, task scheduling in operating systems. Moreover, they are essential to greedy algorithms. We study the complexity of priority queue operations on a RAM with arbitrary word size. We present exponential i ..."
Abstract

Cited by 70 (9 self)
 Add to MetaCart
Priority queues are some of the most fundamental data structures. They are used directly for, say, task scheduling in operating systems. Moreover, they are essential to greedy algorithms. We study the complexity of priority queue operations on a RAM with arbitrary word size. We present exponential improvements over previous bounds, and we show tight relations to sorting. Our first result is a RAM priority queue supporting insert and extractmin operations in worst case time O(log log n) where n is the current number of keys in the queue. This is an exponential improvement over the O( p log n) bound of Fredman and Willard from STOC'90. Our algorithm is simple, and it only uses AC 0 operations, meaning that there is no hidden time dependency on the word size. Plugging this priority queue into Dijkstra's algorithm gives an O(m log log m) algorithm for the single source shortest path problem on a graph with m edges, as compared with the previous O(m p log m) bound based on Fredman...
Efficient Implementation of Suffix Trees
, 1995
"... this article we discuss how the suffix tree can be used for string searching ..."
Abstract

Cited by 34 (3 self)
 Add to MetaCart
this article we discuss how the suffix tree can be used for string searching
On Sorting Strings in External Memory
, 1997
"... ) Lars Arge Paolo Ferragina y Roberto Grossi z Jeffrey Scott Vitter x Abstract. In this paper we address for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many largescale text applications. In the standard unitcost RAM c ..."
Abstract

Cited by 27 (12 self)
 Add to MetaCart
) Lars Arge Paolo Ferragina y Roberto Grossi z Jeffrey Scott Vitter x Abstract. In this paper we address for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many largescale text applications. In the standard unitcost RAM comparison model, the complexity of sorting K strings of total length N is \Theta(K log 2 K+N). By analogy, in the external memory (or I/O) model, where the internal memory has size M and the block transfer size is B, it would be natural to guess that the I/O complexity of sorting strings is \Theta( K B log M=B K B + N B ), but the known algorithms do not come even close to achieving this bound. Our results show, somewhat counterintuitively, that the I/O complexity of string sorting depends upon the length of the strings relative to the block size. We first consider a simple comparison I/O model, where one is not allowed to break the strings into their characters, and we sho...
A Fast Algorithm for Making Suffix Arrays and for BurrowsWheeler Transformation
 IN PROCEEDINGS OF THE IEEE DATA COMPRESSION CONFERENCE, SNOWBIRD, UTAH, MARCH 30  APRIL 1
, 1998
"... We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an arrayof indexes of suffixes is called suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the ..."
Abstract

Cited by 25 (3 self)
 Add to MetaCart
We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an arrayof indexes of suffixes is called suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the BurrowsWheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desired. We compare
Implementing Radixsort
 ACM Jour. of Experimental Algorithmics
, 1998
"... We present and evaluate several new optimization and implementation techniques for string sorting. In particular, we study a recently published radix sorting algorithm, Forward radixsort, that has a provably good worstcase behavior. Our experimental results indicate that radix sorting is considerab ..."
Abstract

Cited by 19 (1 self)
 Add to MetaCart
We present and evaluate several new optimization and implementation techniques for string sorting. In particular, we study a recently published radix sorting algorithm, Forward radixsort, that has a provably good worstcase behavior. Our experimental results indicate that radix sorting is considerably faster (often more than twice as fast) than comparisonbased sorting methods. This is true even for small input sequences. We also show that it is possible to implement a radix sort with good worstcase running time without sacrificing averagecase performance. Our implementations are competitive with the best previously published string sorting algorithms. Code, test data, and test results are available from the World Wide Web. 1. Introduction Radix sorting is a simple and very efficient sorting method that has received too little attention. A common misconception is that a radix sorting algorithm either has to inspect all the characters of the input or use an inordinate amount of extra...
CacheConscious Sorting of Large Sets of Strings with Dynamic Tries
"... Ongoing changes in computer performance are affecting the efficiency of string sorting algorithms. The size of main memory in typical computers continues to grow, but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing stringso ..."
Abstract

Cited by 12 (4 self)
 Add to MetaCart
Ongoing changes in computer performance are affecting the efficiency of string sorting algorithms. The size of main memory in typical computers continues to grow, but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing stringsorting algorithms as they do not utilise cache particularly well for large data sets. We propose a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets. It is simple, fast, and efficient. We experimentally compare burstsort to existing stringsorting algorithms on large and small sets of strings with a range of characteristics. These experiments show that, for large sets of strings, burstsort is almost twice as fast as any previous algorithm, due primarily to a lower rate of cache miss.
Unifying Text Search And Compression  Suffix Sorting, Block Sorting and Suffix Arrays
, 2000
"... Today many electronic documents are available such as articles of newspapers, dictionaries, books, DNA sequences, etc. and they are stored in databases. We also have many documents on the Internet and have many email documents. Therefore, fast queries on such huge amount of documents and their comp ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
Today many electronic documents are available such as articles of newspapers, dictionaries, books, DNA sequences, etc. and they are stored in databases. We also have many documents on the Internet and have many email documents. Therefore, fast queries on such huge amount of documents and their compression to reduce costs for storing or transferring them are important. In this thesis, a unified method for improving efficiency of search and compression for huge text data is proposed. All search methods and compression methods used in this thesis are related to a data structure called suffix array. The suffix array is a text search data structure and it is used in a text compression method called block sorting. Both are promising search method and compression method and there are many studies on the methods. Now a data structure called inverted file is used for queries from huge amount of documents. Though it is widely used, query unit is a document in order to reduce disk space to sto...
Improvements to the BurrowsWheeler Compression Algorithm: After BWT Stages
, 2003
"... ... This article describes improved algorithms for the run length encoding, inversion frequencies and weighted frequency count stages that follow the BurrowsWheeler Transform. Results for compression rates are presented for different variations of the algorithm together with compression and decompr ..."
Abstract

Cited by 4 (1 self)
 Add to MetaCart
... This article describes improved algorithms for the run length encoding, inversion frequencies and weighted frequency count stages that follow the BurrowsWheeler Transform. Results for compression rates are presented for different variations of the algorithm together with compression and decompression times. Finally, an implementation with a compression rate of 2.238 bps on the Calgary Corpus is introduced, which is the best result published in this field to date.
Generic Discrimination  Sorting and Partitioning Unshared Data in Linear Time
, 2008
"... We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worstcase lineartime discrimination functions (discriminators) can be defined generically, by (co)induction on an expressive language of order denotations. The generic definition yields di ..."
Abstract

Cited by 4 (3 self)
 Add to MetaCart
We introduce the notion of discrimination as a generalization of both sorting and partitioning and show that worstcase lineartime discrimination functions (discriminators) can be defined generically, by (co)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADTfree combinator formulation of discriminators is also given. We give some examples of the uses of discriminators, including a new mostsignificantdigit lexicographic sorting algorithm. Discriminators generalize binary comparison functions: They operate on n arguments at a time, but do not expose more information than the underlying equivalence, respectively ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type), should expose their equality, respectively standard ordering relation, as discriminators: Having only a binary equality test on a type requires Θ(n 2) time to find all the occurrences of an element in a list of length n, for each element in the list, even if the equality test takes only constant time. A discriminator accomplishes this in linear time. Likewise, having only a (constanttime) comparison function requires Θ(n log n) time to sort a list of n elements. A discriminator can do this in linear time.