Results 1 - 10 of 40
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
 In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA
Cited by 190 (7 self)
We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0,...,m − 1} for some integer m, while supporting the operations of rank(x), which returns the number of elements in S that are less than x if x ∈ S, and −1 otherwise; and select(i), which returns the ith smallest element in S. We give a data structure that supports both operations in O(1) time on the RAM model and requires B(n,m) + o(n) + O(lg lg m) bits to store a set of size n, where B(n,m) = ⌈lg (m choose n)⌉ is the minimum number of bits required to store any n-element subset from a universe of size m. Previous dictionaries taking this space only supported (yes/no) membership queries in O(1) time. In the cell probe model we can remove the O(lg lg m) additive term in the space bound, answering a question raised by Fich and Miltersen, and Pagh. We present extensions and applications of our indexable dictionary data structure, including:
• an information-theoretically optimal representation of a k-ary cardinal tree that supports standard operations in constant time,
• a representation of a multiset of size n from {0,...,m − 1} in B(n,m + n) + o(n) bits that supports (appropriate generalizations of) rank and select operations in constant time, and
• a representation of a sequence of n nonnegative integers summing up to m in B(n,m + n) + o(n) bits that supports prefix sum queries in constant time.
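To make the rank and select semantics in this abstract concrete, here is a minimal reference sketch. It is not the succinct structure from the paper: it stores a plain sorted list, so rank takes O(lg n) time and the space is far from B(n,m). The class name `SortedArrayDictionary` and the 0-based indexing of select are assumptions made for illustration.

```python
from bisect import bisect_left


class SortedArrayDictionary:
    """Reference semantics only (hypothetical helper, not the paper's structure)."""

    def __init__(self, elements):
        # Store the set S as a sorted list of distinct integers.
        self.items = sorted(set(elements))

    def rank(self, x):
        # Number of elements smaller than x if x is in S, otherwise -1.
        i = bisect_left(self.items, x)
        if i < len(self.items) and self.items[i] == x:
            return i
        return -1

    def select(self, i):
        # The i-th smallest element, counting from 0 (an assumption), so that
        # select(rank(x)) == x for every x in S.
        return self.items[i]


d = SortedArrayDictionary([42, 3, 15, 8])
```

With this convention, `d.rank(15)` counts the two smaller elements 3 and 8, and `d.rank(4)` reports that 4 is not stored.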
Faster algorithms for the shortest path problem
, 1990
Cited by 103 (10 self)
Efficient implementations of Dijkstra's shortest path algorithm are investigated. A new data structure, called the radix heap, is proposed for use in this algorithm. On a network with n vertices, m edges, and nonnegative integer arc costs bounded by C, a one-level form of radix heap gives a time bound for Dijkstra's algorithm of O(m + n log C). A two-level form of radix heap gives a bound of O(m + n log C/log log C). A combination of a radix heap and a previously known data structure called a Fibonacci heap gives a bound of O(m + n √(log C)). The best previously known bounds are O(m + n log n) using Fibonacci heaps alone and O(m log log C) using the priority queue structure of Van Emde Boas et al. [17].
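The idea of a monotone, bucket-based priority queue for Dijkstra's algorithm can be sketched as follows. This is a commonly used bit-based variant of a one-level radix heap, not necessarily the paper's exact bucket layout: each key is placed in a bucket chosen by the highest bit in which it differs from the last extracted minimum. The names `RadixHeap` and `dijkstra`, the `max_key_bits` parameter, and the use of lazy deletion instead of decrease-key are assumptions made for illustration.

```python
class RadixHeap:
    """Monotone priority queue for nonnegative integer keys (illustrative sketch)."""

    def __init__(self, max_key_bits=32):
        # Bucket 0 holds keys equal to the last extracted minimum; bucket b > 0
        # holds keys whose highest differing bit from that minimum is bit b - 1.
        self.buckets = [[] for _ in range(max_key_bits + 1)]
        self.last = 0
        self.size = 0

    def _bucket(self, key):
        return (key ^ self.last).bit_length()

    def push(self, key, value):
        if key < self.last:
            raise ValueError("keys must not be smaller than the last extracted minimum")
        self.buckets[self._bucket(key)].append((key, value))
        self.size += 1

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty heap")
        if not self.buckets[0]:
            # Take the first non-empty bucket, make its minimum the new reference
            # key, and redistribute its items; each one moves to a lower bucket.
            i = next(j for j, b in enumerate(self.buckets) if b)
            self.last = min(k for k, _ in self.buckets[i])
            items, self.buckets[i] = self.buckets[i], []
            for k, v in items:
                self.buckets[self._bucket(k)].append((k, v))
        self.size -= 1
        return self.buckets[0].pop()


def dijkstra(graph, source):
    """graph maps a vertex to a list of (neighbor, nonnegative integer cost) pairs."""
    dist = {source: 0}
    heap = RadixHeap()
    heap.push(0, source)
    while heap.size:
        d, u = heap.pop()
        if d > dist[u]:
            continue  # stale entry left behind by lazy deletion
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heap.push(nd, v)
    return dist


example_graph = {"a": [("b", 4), ("c", 1)], "c": [("b", 2)], "b": [("d", 5)]}
example_dist = dijkstra(example_graph, "a")
```

Because Dijkstra's algorithm only ever inserts keys that are at least as large as the last extracted distance, the monotonicity requirement in `push` is always met.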
Deterministic Part-of-Speech Tagging with Finite-State Transducers
 Computational Linguistics
, 1995
Cited by 82 (0 self)
Stochastic approaches to natural language processing have often been preferred to rule-based approaches because of their robustness and their automatic training capabilities. This was the case for part-of-speech tagging until Brill showed how state-of-the-art part-of-speech tagging can be achieved with a rule-based tagger by inferring rules from a training corpus. However, current implementations of the rule-based tagger run more slowly than previous approaches. In this paper, we present a finite-state tagger, inspired by the rule-based tagger, that operates in optimal time in the sense that the time to assign tags to a sentence corresponds to the time required to follow a single path in a deterministic finite-state machine. This result is achieved by encoding the application of the rules found in the tagger as a nondeterministic finite-state transducer and then turning it into a deterministic transducer. The resulting deterministic transducer yields a part-of-speech tagger whose speed is dominated by the access time of mass storage devices. We then generalize the techniques to the class of transformation-based systems.
Compact Name-Independent Routing with Minimum Stretch
 In Proceedings of the 16th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA 2004
, 2004
Cited by 64 (12 self)
Given a weighted undirected network with arbitrary node names, we present a compact routing scheme, using an O(√n)-space routing table at each node, and routing along paths of stretch 3, that is, at most thrice as long as the shortest paths. This is optimal in a very strong sense. It is known that no compact routing using o(n) space per node can route with stretch below 3. Also, it is known that any stretch below 5 requires Ω(√n) space per node.
Derandomization, witnesses for Boolean matrix multiplication and construction of perfect hash functions
 Algorithmica
, 1996
Cited by 61 (5 self)
Small sample spaces with almost independent random variables are applied to design efficient sequential deterministic algorithms for two problems. The first algorithm, motivated by the attempt to design efficient algorithms for the All Pairs Shortest Path problem using fast matrix multiplication, solves the problem of computing witnesses for the Boolean product of two matrices. That is, if A and B are two n by n matrices, and C = AB is their Boolean product, the algorithm finds for every entry C_ij = 1 a witness: an index k so that A_ik = B_kj = 1. Its running time exceeds that of computing the product of two n by n matrices with small integer entries by a polylogarithmic factor. The second algorithm is a nearly linear time deterministic procedure for constructing a perfect hash function for a given n-subset of {1,..., m}.
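The definition of a witness can be illustrated with a naive cubic-time baseline. This is not the algorithm from the paper, which relies on fast matrix multiplication and derandomization; it only shows what the output looks like. The function name `boolean_product_witnesses` and the use of `None` for entries with C_ij = 0 are assumptions made for illustration.

```python
def boolean_product_witnesses(A, B):
    """Return W where W[i][j] is some k with A[i][k] = B[k][j] = 1, or None if C[i][j] = 0."""
    n = len(A)
    W = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if A[i][k] and B[k][j]:
                    W[i][j] = k  # the first witness found is enough
                    break
    return W


A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
W = boolean_product_witnesses(A, B)
```

Here the Boolean product C = AB has ones exactly where `W` holds an index, so `W` also encodes C itself.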
LOW REDUNDANCY IN STATIC DICTIONARIES WITH CONSTANT QUERY TIME
 SIAM J. COMPUT.
, 2001
Cited by 49 (7 self)
A static dictionary is a data structure storing subsets of a finite universe U, answering membership queries. We show that on a unit cost RAM with word size Θ(log |U|), a static dictionary for n-element sets with constant worst case query time can be obtained using B + O(log log |U|) + o(n) bits of storage, where B = ⌈log2 (|U| choose n)⌉ is the minimum number of bits needed to represent all n-element subsets of U.
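The information-theoretic minimum B is easy to evaluate for small parameters. The sketch below computes ⌈log2 (|U| choose n)⌉ exactly with integer arithmetic; the function name `information_theoretic_minimum` is an assumption made for illustration, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def information_theoretic_minimum(n, universe_size):
    """Bits needed to distinguish all n-element subsets of a universe of the given size."""
    subsets = comb(universe_size, n)
    # For an integer x >= 1, ceil(log2(x)) equals (x - 1).bit_length().
    return (subsets - 1).bit_length()


bits_for_example = information_theoretic_minimum(4, 16)
```

For a universe of 16 elements and n = 4 there are 1820 subsets, so 11 bits suffice, compared with 16 bits for a plain bit vector over the universe.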
DynFO: A Parallel, Dynamic Complexity Class
 Journal of Computer and System Sciences
, 1994
Cited by 49 (4 self)
Traditionally, computational complexity has considered only static problems. Classical Complexity Classes such as NC, P, and NP are defined in terms of the complexity of checking, upon presentation of an entire input, whether the input satisfies a certain property. For many applications of computers it is more appropriate to model the process as a dynamic one. There is a fairly large object being worked on over a period of time. The object is repeatedly modified by users and computations are performed. We develop a theory of Dynamic Complexity. We study the new complexity class, Dynamic First-Order Logic (DynFO). This is the set of properties that can be maintained and queried in first-order logic, i.e. relational calculus, on a relational database. We show that many interesting properties are in DynFO including multiplication, graph connectivity, bipartiteness, and the computation of minimum spanning trees. Note that none of these problems is in static FO, and this f...
Word Hyphenation by Computer
, 1983
Cited by 43 (0 self)
This thesis describes research leading to an improved word hyphenation algorithm for the TeX82 typesetting system. Hyphenation is viewed primarily as a data compression problem, where we are given a dictionary of words with allowable division points, and try to devise methods that take advantage of the large amount of redundancy present. The new hyphenation algorithm is based on the idea of hyphenating and inhibiting patterns. These are simply strings of letters that, when they match in a word, give us information about hyphenation at some point in the pattern. For example, '-tion' and 'c-c' are good hyphenating patterns. An important feature of this method is that a suitable set of patterns can be extracted automatically from the dictionary. In order to represent the set of patterns in a compact form that is also reasonably efficient for searching, the author has developed a new data structure called a packed trie. This data structure allows the very fast search times characteristic of indexed tries, but in many cases it entirely eliminates the wasted space for null links usually present in such tries. We demonstrate the versatility and practical advantages of this data structure by using a variant of it as the critical component of the program that generates the patterns from the dictionary. The resulting hyphenation algorithm uses about 4500 patterns that compile into a packed trie occupying 25K bytes of storage. These patterns find 89% of the hyphens in a pocket dictionary word list, with essentially no error. By comparison, the uncompressed dictionary occupies over 500K bytes.
Rapid Unit Selection from a Large Speech Corpus for Concatenative Speech Synthesis
 PROC. EUROSPEECH
, 1999
Cited by 32 (6 self)
Concatenative Text-to-Speech (TTS) systems such as those described by Hunt and Black [6] can select at synthesis time from a very large number of recorded units. The selected units are chosen to minimize a combination of target and join costs for a given sentence. However, the join costs, in particular, can be quite expensive to compute, even when this computation has been optimized. If possible, we would avoid this computation by precomputing and caching all the possible join costs, but their number is prohibitive. Although the search space of possible joins is large, we have found that only a small fraction are selected in practice. By synthesizing a large quantity of text and logging the units actually selected, we were able to gather usage statistics and construct a practical and efficient cache of concatenation costs. Use of this cache dramatically decreased the runtime of the AT&T Next-Generation TTS system [1] with negligible effect on speech quality. Experiments show that by ca...
Improved Behaviour of Tries by Adaptive Branching
Cited by 32 (8 self)
We introduce and analyze a method to reduce the search cost in tries. Traditional trie structures use branching factors at the nodes that are either fixed or a function of the number of elements. Instead, we let the distribution of the elements guide the choice of branching factors. This is accomplished in a strikingly simple way: in a binary trie, the i highest complete levels are replaced by a single node of degree 2^i; the compression is repeated in the subtries. This structure, the level-compressed trie, inherits the good properties of binary tries with respect to neighbour and range searches, while the external path length is significantly decreased. It also has the advantage of being easy to implement. Our analysis shows that the expected depth of a stored element is Θ(log* n) for uniformly distributed data.
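The level-compression step described in this abstract can be sketched on fixed-width integer keys. This is a simplified illustration, not the authors' implementation: it omits path compression, represents nodes as tuples, and falls back to an ordinary binary node when not even the top level is complete. The names `complete_levels`, `build_lc_trie`, and `depth` are assumptions made for illustration.

```python
def complete_levels(keys, width):
    """Largest i such that all 2**i possible i-bit prefixes occur among the keys."""
    i = 0
    while i < width and len({k >> (width - (i + 1)) for k in keys}) == 2 ** (i + 1):
        i += 1
    return i


def build_lc_trie(keys, width):
    """Build a trie over width-bit integers, replacing complete top levels by one node."""
    keys = sorted(set(keys))
    if len(keys) <= 1 or width == 0:
        return ("leaf", keys)
    # Use at least one bit per node so the recursion always makes progress.
    i = max(1, complete_levels(keys, width))
    shift = width - i
    children = [[] for _ in range(2 ** i)]
    for k in keys:
        # The top i bits pick the child; the remaining bits go to the subtrie.
        children[k >> shift].append(k & ((1 << shift) - 1))
    return ("node", i, [build_lc_trie(c, shift) for c in children])


def depth(node):
    """Number of internal nodes on the longest root-to-leaf path."""
    if node[0] == "leaf":
        return 0
    return 1 + max(depth(child) for child in node[2])


example_keys = [0, 1, 2, 3, 4, 6]          # 3-bit keys
example_trie = build_lc_trie(example_keys, 3)
```

In this example the top two levels are complete, so the root has degree 2^2 = 4 and the deepest key sits at depth 2 rather than 3 as in a plain binary trie.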