Algorithms on Compressed Strings and Arrays
- In Proc. 26th Ann. Conf. on Current Trends in Theory and Practice of Informatics, 1999
Cited by 16 (0 self)
We survey the complexity issues related to several algorithmic problems for compressed one- and two-dimensional texts without explicit decompression: pattern-matching, equality-testing, computation of regularities, subsegment extraction, language membership, and solvability of word equations. Our basic problem is one- and two-dimensional pattern-matching together with its variations. For some types of compression the pattern-matching problems are infeasible (NP-hard); for other types they are solvable in polynomial time, and we discuss how to reduce the degree of the corresponding polynomials.
1 Introduction
In the last decade a new stream of research related to data compression has emerged: algorithms on compressed objects. It has been driven by the increase in the volume of data and the need to store and transmit masses of information in compressed form. The compressed information has to be quickly accessed and processed without explicit decompression. In this paper we consider severa...
On the Determinization of Weighted Finite Automata
- SIAM J. Comput., 1998
Cited by 13 (0 self)
We study determinization of weighted finite-state automata (WFAs), which has important applications in automatic speech recognition (ASR). We provide the first polynomial-time algorithm to test for the twins property, which determines if a WFA admits a deterministic equivalent. We also provide a rigorous analysis of a determinization algorithm of Mohri, with tight bounds for acyclic WFAs. Given that WFAs can expand exponentially when determinized, we explore why those used in ASR tend to shrink. The folklore explanation is that ASR WFAs have an acyclic, multi-partite structure. We show, however, that there exist such WFAs that always incur exponential expansion when determinized. We then introduce a class of WFAs, also with this structure, whose expansion depends on the weights: some weightings cause them to shrink, while others, including random weightings, cause them to expand exponentially. We provide experimental evidence that ASR WFAs exhibit this weight dependence. ...
Efficiency of Fast Parallel Pattern-Searching in Highly Compressed Texts
Cited by 2 (0 self)
We consider the efficiency of NC-algorithms for pattern-searching in highly compressed one- and two-dimensional texts. "Highly compressed" means that the text can be exponentially large with respect to its compressed version, and "fast" means "in polylogarithmic time". Given an uncompressed pattern P and a compressed version of a text T, the compressed matching problem is to test if P occurs in T. Two types of closely related compressed representations of 1-dimensional texts are considered: the Lempel-Ziv encodings (LZ, in short) and restricted LZ encodings (RLZ, in short). For highly compressed texts there is a small difference between them; in extreme situations both of them compress text exponentially, e.g. Fibonacci words of size N have compressed versions of size O(log N) for LZ and Restricted LZ encodings. An efficient sequential algorithm for LZ-compressed matching was given in [7]; we show that this algorithm is inherently sequential. Despite similarities we prove that LZ-compressed m...
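The remark about Fibonacci words can be illustrated with a minimal sketch. It does not implement the LZ or RLZ encodings from the abstract. As an assumption made here for illustration only, it uses a related grammar-style description: F_1 = "b", F_2 = "a", F_k = F_{k-1} F_{k-2}. The word F_n has length exponential in n, so a word of length N is described by O(log N) rules, and a single character can be read without decompressing the text. The names `fib_lengths`, `char_at`, and `expand` are hypothetical helpers, not taken from the paper.

```python
def fib_lengths(n):
    """Lengths of the Fibonacci words F_1..F_n (index 0 is unused).

    Assumed convention: F_1 = 'b', F_2 = 'a', F_k = F_{k-1} + F_{k-2}.
    """
    lengths = [0, 1, 1]
    for k in range(3, n + 1):
        lengths.append(lengths[k - 1] + lengths[k - 2])
    return lengths


def char_at(n, i, lengths):
    """Return character i (0-based) of F_n without building F_n.

    Walks down the rule F_k = F_{k-1} F_{k-2}: each step decides whether
    position i falls in the left or the right part, so it takes O(n) steps,
    which is O(log N) for a word of length N.
    """
    if not 0 <= i < lengths[n]:
        raise IndexError("position outside the word")
    while n > 2:
        if i < lengths[n - 1]:
            n -= 1                  # position lies in the prefix F_{n-1}
        else:
            i -= lengths[n - 1]     # position lies in the suffix F_{n-2}
            n -= 2
    return "b" if n == 1 else "a"


def expand(n):
    """Explicit decompression of F_n, used only to check small cases."""
    if n == 1:
        return "b"
    prev, cur = "b", "a"
    for _ in range(n - 2):
        prev, cur = cur, cur + prev
    return cur
```

For small n the result of `char_at` can be compared against `expand(n)`; for large n (say n = 80) only `char_at` is practical, since the expanded word would have more than 10^16 characters.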

