Results 1-10 of 22
The Smallest Grammar Problem
IEEE Transactions on Information Theory, 2005
Cited by 24 (0 self)
This paper addresses the smallest grammar problem: What is the smallest context-free grammar that generates exactly one given string σ? This is a natural question about a fundamental object connected to many fields, including data compression, Kolmogorov complexity, pattern identification, and addition chains. Due to the problem’s inherent complexity, our objective is to find an approximation algorithm which finds a small grammar for the input string. We focus attention on the approximation ratio of the algorithm (and implicitly, worst-case behavior) to establish provable performance guarantees and to address shortcomings in the classical measure of redundancy in the literature. Our first results are a variety of hardness results, most notably that every efficient algorithm for the smallest grammar problem has approximation ratio at least 8569/8568 unless P = NP. We then bound approximation ratios for several of the best-known grammar-based compression algorithms, including LZ78, BISECTION, SEQUENTIAL, LONGEST MATCH, GREEDY, and REPAIR. Among these, the best upper bound we show is O(n^(1/2)). We finish by presenting two novel algorithms with exponentially better ratios of O(log^3 n) and O(log(n/m*)), where m* is the size of the smallest grammar for that input. The latter highlights a connection between grammar-based compression and LZ77.
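To make the object of study concrete, the following is a minimal Python sketch of a RePair-style compressor, one of the algorithm families named in the abstract above. It is a naive quadratic-time illustration, not the implementation analyzed in the paper; the function names `repair`, `expand`, and `grammar_size` are hypothetical, and grammar size is assumed here to mean the total number of symbols on all right-hand sides.

```python
from collections import Counter


def repair(text):
    """Naive RePair-style sketch: repeatedly replace the most frequent
    adjacent pair of symbols with a fresh nonterminal.

    Returns (start_sequence, rules), where rules maps a nonterminal name
    to the pair of symbols it stands for.
    """
    seq = list(text)          # terminals are single characters
    rules = {}
    while True:
        # Count adjacent pairs. Overlapping occurrences (as in "aaa") are
        # over-counted, which can cost compression but not correctness.
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        name = f"R{len(rules)}"   # multi-character, so never a terminal
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):       # left-to-right, non-overlapping rewrite
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Derive the terminal string generated by a single symbol."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


def grammar_size(start, rules):
    """Total number of symbols on all right-hand sides (assumed measure)."""
    return len(start) + sum(len(rhs) for rhs in rules.values())
```

Joining `expand(s, rules)` over the returned start sequence reproduces the input exactly, which is the "generates exactly one given string" requirement; comparing `grammar_size` with `len(text)` gives a rough feel for how much structure the heuristic found.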
Approximation Algorithms for Grammar-Based Data Compression
2002
Cited by 9 (0 self)
This thesis considers the smallest grammar problem: find the smallest context-free grammar that generates exactly one given string. We show that this problem is intractable, and so our objective is to find approximation algorithms. This simple question is connected to many areas of research. Most importantly, there is a link to data compression; instead of storing a long string, one can store a small grammar that generates it. A small grammar for a string also naturally brings out underlying patterns, a fact that is useful, for example, in DNA analysis. Moreover, the size of the smallest context-free grammar generating a string can be regarded as a computable relaxation of Kolmogorov complexity. Finally, work on the smallest grammar problem qualitatively extends the study of approximation algorithms to hierarchically structured objects. In this thesis, we establish hardness results, evaluate several previously proposed algorithms, and then present new procedures with much stronger approximation guarantees.
Random Access to Grammar-Compressed Strings
2011
Cited by 9 (0 self)
Let S be a string of length N compressed into a context-free grammar 𝒮 of size n. We present two representations of 𝒮 achieving O(log N) random access time, and either O(n · α_k(n)) construction time and space on the pointer machine model, or O(n) construction time and space on the RAM. Here, α_k(n) is the inverse of the k-th row of Ackermann’s function. Our representations also efficiently support decompression of any substring in S: we can decompress any substring of length m in the same complexity as a single random access query and additional O(m) time. Combining these results with fast algorithms for uncompressed approximate string matching leads to several efficient algorithms for approximate string matching on grammar-compressed strings without decompression. For instance, we can find all approximate occurrences of a pattern P with at most k errors in time O(n(min{|P|k, k^4 + |P|} + log N) + occ), where occ is the number of occurrences of P in S. Finally, we are able to generalize our results to navigation and other operations on grammar-compressed trees. All of the above bounds significantly improve the currently best known results. To achieve these bounds, we introduce several new techniques and data structures of independent interest, including a predecessor data structure, two “biased” weighted ancestor data structures, and a compact representation of heavy paths in grammars.
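For orientation, here is a minimal Python sketch of the baseline that such representations improve on: random access by walking down the grammar using precomputed expansion lengths. This costs time proportional to the grammar's height per query, not the O(log N) bound from the abstract, and it is not the paper's data structure. The grammar encoding (a dict from nonterminal to a pair of symbols) and the names `expansion_lengths` and `access` are assumptions made for illustration.

```python
def expansion_lengths(rules):
    """Map every nonterminal to the length of the string it derives.

    `rules` maps a nonterminal to a (left, right) pair of symbols; any
    symbol that is not a key of `rules` is treated as a terminal.
    """
    lengths = {}

    def length(sym):
        if sym not in rules:
            return 1
        if sym not in lengths:
            left, right = rules[sym]
            lengths[sym] = length(left) + length(right)
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def access(rules, lengths, start, i):
    """Return the i-th character (0-based) of the string derived from
    `start` without decompressing it, by descending the derivation tree."""
    if not 0 <= i < lengths.get(start, 1):
        raise IndexError(i)
    sym = start
    while sym in rules:
        left, right = rules[sym]
        left_len = lengths.get(left, 1)
        if i < left_len:
            sym = left            # target lies in the left child
        else:
            i -= left_len         # skip the left child's expansion
            sym = right
    return sym
```

With the hypothetical grammar `{"X0": ("a", "b"), "X1": ("X0", "X0"), "X2": ("X1", "X0")}`, the symbol `X2` derives `ababab`, and `access` returns each character by following one root-to-leaf path.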
Grammar-based Compression of DNA Sequences
2004
Cited by 7 (0 self)
Grammar-based compression algorithms infer context-free grammars to represent the input data. The grammar is then transformed into a symbol stream and finally encoded in binary. We explore the utility of grammar-based compression of DNA sequences. We strive to optimize the three stages of grammar-based compression for DNA. DNA is notoriously hard to compress, and ultimately, our algorithm fails to achieve better compression than the best competitor.
A simple and fast DNA compressor
Software: Practice and Experience, 2004
Cited by 6 (0 self)
In this paper we consider the problem of DNA compression. It is well known that one of the main features of DNA sequences is that they contain substrings which are duplicated except for a few random mutations. For this reason most DNA compressors work by searching and encoding approximate repeats. We depart from this strategy by searching and encoding only exact repeats. However, we use an encoding designed to take advantage of the possible presence of approximate repeats. Our approach leads to an algorithm which is an order of magnitude faster than any other algorithm and achieves a compression ratio very close to the best DNA compressors. Another important feature of our algorithm is its small space occupancy which makes it possible to compress sequences hundreds of megabytes long, well beyond the range of any previous DNA compressor.
Structure induction by lossless graph compression
In Proc. of the IEEE Data Compression Conference, 53–62, 2007
An approach to phrase selection for offline data compression
Proc. 25th Australasian Computer Science Conference, 2002
Cited by 5 (0 self)
Recently several offline data compression schemes have been published that expend large amounts of computing resources when encoding a file, but decode the file quickly. These compressors work by identifying phrases in the input data, and storing the data as a series of pointers to these phrases. This paper explores the application of an algorithm for computing all repeating substrings within a string for phrase selection in an offline data compressor. Using our approach, we obtain compression similar to that of the best known offline compressors on genetic data, but poor results on general text. It seems, however, that an alternate approach based on selecting repeating substrings is feasible. Keywords: strings, offline data compression, textual substitution, repeating substrings
Linear-time offline text compression by longest-first substitution
In Proc. 10th International Symp. on String Processing and Information Retrieval (SPIRE’03), 2003
Cited by 3 (3 self)
Given a text, grammar-based compression constructs a grammar that generates the text. There are many kinds of text compression techniques of this type. Each compression scheme is categorized as being either offline or online, according to how a text is processed. One representative tactic for offline compression is to substitute the longest repeated factors of a text with a production rule. In this paper, we present an algorithm that compresses a text based on this longest-first principle, in linear time. The algorithm employs a suitable index structure for a text, and involves technically efficient operations on the structure.
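The longest-first principle described above can be illustrated with a deliberately naive Python sketch. It is far from linear time, uses no index structure, and only rewrites the top-level sequence (it does not look for repeats inside rule bodies), so it should be read as an illustration of the substitution idea rather than the paper's algorithm. The names `longest_repeat`, `longest_first`, and `expand` are hypothetical.

```python
def longest_repeat(seq):
    """Return the longest factor (as a tuple, length >= 2) that has two
    non-overlapping occurrences in seq, or None if there is none."""
    n = len(seq)
    for length in range(n // 2, 1, -1):
        first_seen = {}
        for i in range(n - length + 1):
            factor = tuple(seq[i:i + length])
            j = first_seen.setdefault(factor, i)   # earliest occurrence
            if i - j >= length:                    # does not overlap it
                return factor
    return None


def longest_first(text):
    """Repeatedly replace the longest repeated factor with a new rule."""
    seq, rules = list(text), {}
    while (factor := longest_repeat(seq)) is not None:
        name = f"R{len(rules)}"    # multi-character, never a terminal
        rules[name] = list(factor)
        out, i, k = [], 0, len(factor)
        while i < len(seq):        # left-to-right, non-overlapping rewrite
            if tuple(seq[i:i + k]) == factor:
                out.append(name)
                i += k
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(seq, rules):
    """Derive the terminal string generated by a sequence of symbols."""
    return "".join(expand(rules[s], rules) if s in rules else s for s in seq)
```

Each round removes at least two symbols from the sequence, so the loop terminates; `expand(seq, rules)` on the result reproduces the input text.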
In-place update of suffix array while recoding words
In "International Journal of Foundations of Computer Science (IJFCS)", 2010. Symbiose 31
Cited by 3 (0 self)
Motivated by grammatical inference and data compression applications, we propose an algorithm to update a suffix array after the substitution, in the indexed text, of some occurrences of a given word by a new character. Compared to other published index update methods, the problem addressed here may require the modification of a large number of distinct positions over the original text. The proposed algorithm uses the specific internal order of suffix arrays to update groups of entries simultaneously, and ensures that only entries to be modified are visited. Experiments confirm a significant execution time speedup compared to constructing the suffix array from scratch at each step of the application.
Prius: Generic Hybrid Trace Compression for Wireless Sensor Networks
Cited by 2 (1 self)
Several diagnostic tracing techniques (e.g., event, power, and control-flow tracing) have been proposed for runtime debugging and postmortem analysis of wireless sensor networks (WSNs). Traces generated by such techniques can become large, defying the harsh resource constraints of WSNs. Compression is a straightforward candidate to reduce trace sizes, yet is challenged by the same resource constraints. Established trace compression algorithms perform unsatisfactorily under these constraints. We propose Prius, a novel hybrid (offline/online) trace compression technique that enables application of established trace compression algorithms for WSNs and achieves high compression rates and significant energy savings. We have implemented such hybrid versions of two established compression techniques for TinyOS and evaluated them on various applications. Prius respects the resource constraints of WSNs (5% average program memory overhead) whilst reducing energy consumption on average by 46% and 49% compared to straightforward online adaptations of established compression algorithms and the state-of-the-art trace-specific compression algorithm respectively.