Results 1 - 10 of 24
Sorting in Linear Time?
, 1995
Cited by 73 (15 self)
We show that a unit-cost RAM with a word length of $w$ bits can sort $n$ integers in the range $0 \ldots 2^w - 1$ in $O(n \log\log n)$ time, for arbitrary $w \ge \log n$, a significant improvement over the bound of $O(n \sqrt{\log n})$ achieved by the fusion trees of Fredman and Willard. Provided that $w \ge (\log n)^{2+\epsilon}$ for some fixed $\epsilon > 0$, the sorting can even be accomplished in linear expected time with a randomized algorithm. Both of our algorithms parallelize without loss on a unit-cost PRAM with a word length of $w$ bits. The first one yields an algorithm that uses $O(\log n)$ time and $O(n \log\log n)$ operations on a deterministic CRCW PRAM. The second one yields an algorithm that uses $O(\log n)$ expected time and $O(n)$ expected operations on a randomized EREW PRAM, provided that $w \ge (\log n)^{2+\epsilon}$ for some fixed $\epsilon > 0$. Our deterministic and randomized sequential and parallel algorithms generalize to the lexicographic sorting problem of sorting multiple-precision integers represented in several words. ...
On RAM priority queues
, 1996
Cited by 69 (9 self)
Priority queues are some of the most fundamental data structures. They are used directly for, say, task scheduling in operating systems. Moreover, they are essential to greedy algorithms. We study the complexity of priority queue operations on a RAM with arbitrary word size. We present exponential improvements over previous bounds, and we show tight relations to sorting. Our first result is a RAM priority queue supporting insert and extract-min operations in worst case time $O(\log\log n)$ where $n$ is the current number of keys in the queue. This is an exponential improvement over the $O(\sqrt{\log n})$ bound of Fredman and Willard from STOC'90. Our algorithm is simple, and it only uses $AC^0$ operations, meaning that there is no hidden time dependency on the word size. Plugging this priority queue into Dijkstra's algorithm gives an $O(m \log\log m)$ algorithm for the single source shortest path problem on a graph with $m$ edges, as compared with the previous $O(m \sqrt{\log m})$ bound based on Fredman...
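As a rough illustration of how a priority queue offering only insert and extract-min plugs into Dijkstra's algorithm, here is a minimal sketch. It uses Python's `heapq` binary heap as a stand-in, which costs O(log n) per operation; it is not the O(log log n) RAM priority queue of the abstract. The function name `dijkstra` and the adjacency-list dictionary format are assumptions made for this example.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source.

    graph: dict mapping node -> list of (neighbor, nonnegative weight).
    This input format is an assumption of the sketch.
    """
    dist = {source: 0}
    heap = [(0, source)]                      # priority queue of (distance, node)
    while heap:
        d, u = heapq.heappop(heap)            # extract-min
        if d > dist.get(u, float("inf")):
            continue                          # stale entry, a shorter path was found already
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v)) # insert (no decrease-key needed)
    return dist
```

Because outdated entries are skipped rather than updated, the queue sees at most one insert and one extract-min per edge plus one for the source, which is why a bound per queue operation translates directly into a bound in terms of m.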
Efficient Implementation of Suffix Trees
, 1995
Cited by 32 (3 self)
In this article we discuss how the suffix tree can be used for string searching
A Fast Algorithm for Making Suffix Arrays and for Burrows-Wheeler Transformation
- IN PROCEEDINGS OF THE IEEE DATA COMPRESSION CONFERENCE, SNOWBIRD, UTAH, MARCH 30 - APRIL 1
, 1998
Cited by 24 (3 self)
We propose a fast and memory efficient algorithm for sorting suffixes of a text in lexicographic order. It is important to sort suffixes because an array of indexes of suffixes is called suffix array and it is a memory efficient alternative of the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in the Block Sorting text compression, therefore fast sorting algorithms are desired. We compare
On Sorting Strings in External Memory
, 1997
Cited by 23 (12 self)
Lars Arge, Paolo Ferragina, Roberto Grossi, Jeffrey Scott Vitter. Abstract. In this paper we address for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many large-scale text applications. In the standard unit-cost RAM comparison model, the complexity of sorting $K$ strings of total length $N$ is $\Theta(K \log_2 K + N)$. By analogy, in the external memory (or I/O) model, where the internal memory has size $M$ and the block transfer size is $B$, it would be natural to guess that the I/O complexity of sorting strings is $\Theta\left(\frac{K}{B} \log_{M/B} \frac{K}{B} + \frac{N}{B}\right)$, but the known algorithms do not come even close to achieving this bound. Our results show, somewhat counterintuitively, that the I/O complexity of string sorting depends upon the length of the strings relative to the block size. We first consider a simple comparison I/O model, where one is not allowed to break the strings into their characters, and we sho...
Implementing Radixsort
- ACM Jour. of Experimental Algorithmics
, 1998
Cited by 15 (1 self)
We present and evaluate several new optimization and implementation techniques for string sorting. In particular, we study a recently published radix sorting algorithm, Forward radixsort, that has a provably good worst-case behavior. Our experimental results indicate that radix sorting is considerably faster (often more than twice as fast) than comparison-based sorting methods. This is true even for small input sequences. We also show that it is possible to implement a radix sort with good worst-case running time without sacrificing average-case performance. Our implementations are competitive with the best previously published string sorting algorithms. Code, test data, and test results are available from the World Wide Web. 1. Introduction Radix sorting is a simple and very efficient sorting method that has received too little attention. A common misconception is that a radix sorting algorithm either has to inspect all the characters of the input or use an inordinate amount of extra...
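To make the idea of radix sorting strings concrete, the following is a minimal sketch of a plain MSD (most-significant-character-first) radix sort. It is not the Forward radixsort studied in the paper and makes no claim to that algorithm's worst-case behavior; the function name `msd_radix_sort` and the dictionary-based buckets are assumptions chosen for brevity.

```python
def msd_radix_sort(strings, pos=0):
    """Sort strings that all share the same first `pos` characters."""
    if len(strings) <= 1:
        return list(strings)          # nothing left to distinguish
    finished = []                     # strings that end exactly at position pos
    buckets = {}                      # character at pos -> strings having it
    for s in strings:
        if len(s) == pos:
            finished.append(s)
        else:
            buckets.setdefault(s[pos], []).append(s)
    result = finished                 # shorter strings sort before longer ones
    for ch in sorted(buckets):        # visit buckets in character order
        result.extend(msd_radix_sort(buckets[ch], pos + 1))
    return result
```

Recursion stops as soon as a bucket holds a single string, so for distinct strings the sketch only looks at the characters needed to tell them apart rather than at every character of the input.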
Cache-Conscious Sorting of Large Sets of Strings with Dynamic Tries
Cited by 10 (4 self)
Ongoing changes in computer performance are affecting the efficiency of string sorting algorithms. The size of main memory in typical computers continues to grow, but memory accesses require increasing numbers of instruction cycles, which is a problem for the most efficient of the existing string-sorting algorithms as they do not utilise cache particularly well for large data sets. We propose a new sorting algorithm for strings, burstsort, based on dynamic construction of a compact trie in which strings are kept in buckets. It is simple, fast, and efficient. We experimentally compare burstsort to existing string-sorting algorithms on large and small sets of strings with a range of characteristics. These experiments show that, for large sets of strings, burstsort is almost twice as fast as any previous algorithm, due primarily to a lower rate of cache miss.
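The trie-with-buckets idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the threshold `BURST_LIMIT`, the names `TrieNode` and `burstsort`, the dictionary-based child table, and the use of Python's built-in `sorted` on each bucket are all choices made for this example, and the sketch does nothing to model cache behavior.

```python
BURST_LIMIT = 4  # hypothetical bucket capacity; a real value would be far larger

class TrieNode:
    def __init__(self):
        self.ended = []     # strings that end exactly at this node
        self.children = {}  # character -> TrieNode, or list used as a bucket

def _insert(node, s, depth):
    while True:
        if depth == len(s):
            node.ended.append(s)
            return
        ch = s[depth]
        child = node.children.get(ch)
        if isinstance(child, TrieNode):
            node, depth = child, depth + 1   # descend one character
            continue
        if child is None:
            child = node.children[ch] = []   # start a new bucket
        child.append(s)
        if len(child) > BURST_LIMIT:
            # Burst: replace the full bucket with a trie node and
            # redistribute its strings one character deeper.
            new_node = TrieNode()
            node.children[ch] = new_node
            for t in child:
                _insert(new_node, t, depth + 1)
        return

def _collect(node, out):
    out.extend(node.ended)
    for ch in sorted(node.children):         # in-order traversal
        child = node.children[ch]
        if isinstance(child, TrieNode):
            _collect(child, out)
        else:
            out.extend(sorted(child))        # small bucket, sorted on its own

def burstsort(strings):
    root = TrieNode()
    for s in strings:
        _insert(root, s, 0)
    out = []
    _collect(root, out)
    return out
```

Each bucket holds strings that share a common prefix, so buckets stay small and can be sorted independently before being concatenated in trie order.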
Unifying Text Search And Compression - Suffix Sorting, Block Sorting and Suffix Arrays
, 2000
Cited by 6 (0 self)
Today many electronic documents are available such as articles of newspapers, dictionaries, books, DNA sequences, etc. and they are stored in databases. We also have many documents on the Internet and have many e-mail documents. Therefore, fast queries on such huge amount of documents and their compression to reduce costs for storing or transferring them are important. In this thesis, a unified method for improving efficiency of search and compression for huge text data is proposed. All search methods and compression methods used in this thesis are related to a data structure called suffix array. The suffix array is a text search data structure and it is used in a text compression method called block sorting. Both are promising search method and compression method and there are many studies on the methods. Now a data structure called inverted file is used for queries from huge amount of documents. Though it is widely used, query unit is a document in order to reduce disk space to sto...
On sorting strings in external memory (extended abstract)
- In STOC ’97: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
, 1997
Cited by 3 (0 self)
Abstract. In this paper we address for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many large-scale text applications. In the standard unit-cost RAM comparison model, the complexity of sorting $K$ strings of total length $N$ is $\Theta(K \log_2 K + N)$. By analogy, in the external memory (or I/O) model, where the internal memory has size $M$ and the block transfer size is $B$, it would be natural to guess that the I/O complexity of sorting strings is $\Theta\left(\frac{K}{B} \log_{M/B} \frac{K}{B} + \frac{N}{B}\right)$ ...
Constructing Suffix Arrays of Large Texts
- PROC OF DEWS98
, 1998
Cited by 2 (0 self)
Recently, Sadakane [12] proposed a new fast and memory-efficient algorithm for sorting suffixes of a text in lexicographic order. Sorting suffixes is important because the array of indexes of the sorted suffixes, called a suffix array, is a memory-efficient alternative to the suffix tree. Sorting suffixes is also used for the Burrows-Wheeler transformation in Block Sorting text compression, so fast sorting algorithms are desired. In full-text databases the texts are of course quite long, and this algorithm makes it possible to use the suffix array data structure and the compression scheme for such large texts. In this paper, we compare the suffix array construction algorithms of Bentley-Sedgewick, Andersson-Nilsson and Karp-Miller-Rosenberg and the suffix tree construction algorithm of Larsson on speed and required memory, and compare them with our new algorithm, which is fast and memory efficient by combining them.
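For readers unfamiliar with the terms, the following minimal sketch shows what a suffix array is and how the Burrows-Wheeler transformation can be read off from it. It uses naive comparison sorting of whole suffixes, which is far slower than any of the algorithms compared in the paper and is shown only to fix the definitions. The function names and the `$` sentinel are assumptions; the sentinel must not occur in the text and must compare smaller than every character in it.

```python
def suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order.

    Naive construction: sorts by comparing whole suffixes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform derived from the suffix array.

    Assumes `sentinel` is absent from text and smaller than all its characters.
    """
    t = text + sentinel
    # For each sorted suffix, emit the character that precedes it;
    # index -1 wraps around to the sentinel for the suffix starting at 0.
    return "".join(t[i - 1] for i in suffix_array(t))

print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
print(bwt("banana"))           # annb$aa
```

The suffix array stores one integer per character of the text, which is the source of its memory advantage over a suffix tree.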

