Results 1-10 of 172
Handbook of Applied Cryptography
, 1997
"... As we draw near to closing out the twentieth century, we see quite clearly that the informationprocessing and telecommunications revolutions now underway will continue vigorously into the twentyfirst. We interact and transact by directing flocks of digital packets towards each other through cybers ..."
Abstract

Cited by 2477 (30 self)
As we draw near to closing out the twentieth century, we see quite clearly that the information-processing and telecommunications revolutions now underway will continue vigorously into the twenty-first. We interact and transact by directing flocks of digital packets towards each other through cyberspace, carrying love notes, digital cash, and secret corporate documents. Our personal and economic lives rely more and more on our ability to let such ethereal carrier pigeons mediate at a distance what we used to do with face-to-face meetings, paper documents, and a firm handshake. Unfortunately, the technical wizardry enabling remote collaborations is founded on broadcasting everything as sequences of zeros and ones that one's own dog wouldn't recognize. What is to distinguish a digital dollar when it is as easily reproducible as the spoken word? How do we converse privately when every syllable is bounced off a satellite and smeared over an entire continent? How should a bank know that it really is Bill Gates requesting from his laptop in Fiji a transfer of $10,000,000,000 to another bank? Fortunately, the magical mathematics of cryptography can help. Cryptography provides techniques for keeping information secret, for determining that information ...
A universal algorithm for sequential data compression
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 1977
"... A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a nonprobabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainabl ..."
Abstract

Cited by 1142 (4 self)
A universal algorithm for sequential data compression is presented. Its performance is investigated with respect to a non-probabilistic model of constrained sources. The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.
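The core mechanism of this algorithm (LZ77) is a sliding window: each step finds the longest match between the upcoming text and the already-processed window, then emits an (offset, length, next-char) triple. The sketch below is an illustrative toy, not the paper's construction, and omits the coding of the triples that makes the scheme universal.

```python
def lz77_compress(s, window=4096):
    # Emit (offset, length, next_char) triples; offset/length point
    # back into the already-processed text (the sliding window).
    out, i = [], 0
    while i < len(s):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            # Compare forward as long as a next_char remains to emit.
            while i + k < len(s) - 1 and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        out.append((best_off, best_len, s[i + best_len]))
        i += best_len + 1
    return out
```

Note that a match is allowed to run past position i (self-referential copying), which is what lets a short window compress long runs such as "aaaa" into two triples.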
Compression of Individual Sequences via Variable-Rate Coding
, 1978
"... this paper contains two parts: descrip five part (Section II) where all the results are stated 532 XEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 1T24, NO. 5, SEPTEMF..R 1978 and discussed and a formal part (Section III) where all proofs except that of Theorem 2 are given. The proof of Theorem 2, w ..."
Abstract

Cited by 732 (8 self)
This paper contains two parts: a descriptive part (Section II), where all the results are stated and discussed, and a formal part (Section III), where all proofs except that of Theorem 2 are given. The proof of Theorem 2, which is constructive and thus informative, is presented in the mainstream of Section II.
Compressed full-text indexes
 ACM COMPUTING SURVEYS
, 2007
"... Fulltext indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text l ..."
Abstract

Cited by 173 (79 self)
Full-text indexes provide fast substring search over large text collections. A serious problem of these indexes has traditionally been their space consumption. A recent trend is to develop indexes that exploit the compressibility of the text, so that their size is a function of the compressed text length. This concept has evolved into self-indexes, which in addition contain enough information to reproduce any text portion, so they can replace the text. The exciting possibility of an index that takes space close to that of the compressed text, replaces it, and in addition provides fast search over it has triggered a wealth of activity, produced surprising results in a very short time, and radically changed the status of this area in less than five years. The most successful indexes nowadays are able to obtain almost optimal space and search time simultaneously. In this paper we present the main concepts underlying self-indexes. We explain the relationship between text entropy and the regularities that show up in index structures and permit compressing them. Then we cover the most relevant self-indexes to date, focusing on how they exploit the text compressibility and how they solve various search problems efficiently. We aim at giving the theoretical background needed to understand and follow the developments in this area.
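As an uncompressed baseline for the substring search these indexes accelerate, a plain suffix array answers counting queries with two binary searches over lexicographically sorted suffixes. This sketch is illustrative only; it deliberately ignores the compression that is the survey's actual subject.

```python
def suffix_array(s):
    # Naive O(n^2 log n) construction: sort suffix start positions
    # lexicographically. Fine for a small illustration.
    return sorted(range(len(s)), key=lambda i: s[i:])

def count_occurrences(s, sa, p):
    # All suffixes that begin with p form one contiguous block in sa;
    # two binary searches locate its lower and upper boundaries.
    def boundary(strict):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            pref = s[sa[mid]:sa[mid] + len(p)]
            if pref < p or (strict and pref == p):
                lo = mid + 1
            else:
                hi = mid
        return lo
    return boundary(True) - boundary(False)
```

Each occurrence costs O(|p| log n) comparisons here; the self-indexes surveyed in the paper achieve comparable search time in space proportional to the compressed text.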
Universal prediction
 IEEE TRANSACTIONS ON INFORMATION THEORY
, 1998
"... This paper consists of an overview on universal prediction from an informationtheoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. Both the probabili ..."
Abstract

Cited by 136 (11 self)
This paper presents an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described, with emphasis on the analogies and differences between results in the two settings.
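Probability assignment under self-information (log) loss can be illustrated with the classical Krichevsky-Trofimov add-1/2 estimator for binary sequences; the cumulative loss it incurs is exactly the code length a universal compressor would spend. This is a standard textbook example, not a construction from the paper itself.

```python
import math

def kt_prob_of_one(n0, n1):
    # Krichevsky-Trofimov estimator: probability that the next bit
    # is 1, given n0 zeros and n1 ones seen so far.
    return (n1 + 0.5) / (n0 + n1 + 1.0)

def self_information_loss(bits):
    # Cumulative log-loss (in bits) of sequentially assigning
    # probabilities with the KT estimator; equals the ideal code
    # length of the sequence under this universal assignment.
    n0 = n1 = 0
    loss = 0.0
    for b in bits:
        p1 = kt_prob_of_one(n0, n1)
        p = p1 if b == 1 else 1.0 - p1
        loss += -math.log2(p)
        if b == 1:
            n1 += 1
        else:
            n0 += 1
    return loss
```

The first bit always costs exactly one bit (probability 1/2); as counts accumulate, the assignment tracks the empirical bias, which is the link to universal compression the abstract points at.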
Fast and Flexible Word Searching on Compressed Text
, 2000
"... ... text. When searching complex or approximate patterns, our algorithms are up to 8 times faster than the search on uncompressed text. We also discuss the impact of our technique in inverted files pointing to logical blocks and argue for the possibility of keeping the text compressed all the time, ..."
Abstract

Cited by 81 (33 self)
... text. When searching complex or approximate patterns, our algorithms are up to 8 times faster than the search on uncompressed text. We also discuss the impact of our technique on inverted files pointing to logical blocks and argue for the possibility of keeping the text compressed at all times, decompressing only for display purposes.
A New Challenge for Compression Algorithms: Genetic Sequences
 Information Processing & Management
, 1994
"... Universal data compression algorithms fail to compress genetic sequences. It is due to the specificity of this particular kind of "text". We analyze in some details the properties of the sequences, which cause the failure of classical algorithms. We then present a lossless algorithm, biocompress2, ..."
Abstract

Cited by 69 (0 self)
Universal data compression algorithms fail to compress genetic sequences. This is due to the specificity of this particular kind of "text". We analyze in some detail the properties of the sequences which cause the failure of classical algorithms. We then present a lossless algorithm, biocompress-2, to compress the information contained in DNA and RNA sequences, based on the detection of regularities such as the presence of palindromes. The algorithm combines substitutional and statistical methods and, to the best of our knowledge, leads to the highest compression of DNA. The results, although not satisfactory, give insight into the necessary correlation between compression and comprehension of genetic sequences. There are plenty of specific types of data which need to be compressed, for ease of storage and communication. Among them are texts (such as natural language and programs), images, sounds, etc. In this paper, we focus on the compression of a specific kin...
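The palindromes exploited here are biological ones: a segment that equals the reverse complement of itself (e.g. GAATTC). A minimal detector for such regularities, illustrative only and unrelated to the actual biocompress-2 implementation, looks like this:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    # Reverse the sequence and swap each base for its complement.
    return "".join(COMPLEMENT[c] for c in reversed(seq))

def find_palindromes(seq, length):
    # Scan fixed-length windows; a biological palindrome reads the
    # same as its own reverse complement.
    hits = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if window == reverse_complement(window):
            hits.append((i, window))
    return hits
```

Such repeats are exactly the regularities that general-purpose compressors miss, because the repeated material appears reversed and complemented rather than verbatim.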
Asymptotic Behavior of the Lempel-Ziv Parsing Scheme and Digital Search Trees
 Theoretical Computer Science
, 1995
"... The LempelZiv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length n into variable phrases such that a new phrase is the shortest substring not seen in the past as a phrase. The parameter of interest is the n ..."
Abstract

Cited by 64 (30 self)
The Lempel-Ziv parsing scheme finds a wide range of applications, most notably in data compression and algorithms on words. It partitions a sequence of length n into variable phrases such that a new phrase is the shortest substring not seen in the past as a phrase. The parameter of interest is the number M_n of phrases that one can construct from a sequence of length n. In this paper, for the memoryless source with unequal probabilities of symbol generation, we derive the limiting distribution of M_n, which turns out to be normal. This proves a long-standing open problem. In fact, to obtain this result we solved another open problem, namely, that of establishing the limiting distribution of the internal path length in a digital search tree. The latter is a consequence of an asymptotic solution of a multiplicative differential-functional equation often arising in the analysis of algorithms on words. Interestingly enough, our findings are proved by a combination of probabilistic techniques, such as the renewal equation and uniform integrability, and analytical techniques, such as the Mellin transform, differential-functional equations, de-Poissonization, and so forth. In concluding remarks we indicate a possibility of extending our results to Markovian models.
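The parsing rule stated in the abstract, where each new phrase is the shortest substring not previously seen as a phrase, can be sketched directly; the quantity M_n studied in the paper is simply the length of the list this returns. This is an illustrative sketch, not the paper's analysis.

```python
def lz78_phrases(s):
    # Greedy Lempel-Ziv parsing: each phrase is the shortest prefix
    # of the remaining text that has not yet occurred as a phrase.
    seen = set()
    phrases = []
    i = 0
    while i < len(s):
        j = i + 1
        # Extend until the candidate phrase is new or the text ends.
        while s[i:j] in seen and j <= len(s):
            j += 1
        phrases.append(s[i:j])
        seen.add(s[i:j])
        i = j
    return phrases
```

For example, "abababab" parses as a | b | ab | aba | b, so M_8 = 5 (the final phrase may repeat an earlier one because the text runs out). The paper shows M_n is asymptotically normal for memoryless sources.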
A Subquadratic Sequence Alignment Algorithm for Unrestricted Cost Matrices
, 2002
"... The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix, and compares two strings of size n in O(n 2 ) time. We address the challenge of computing the similarity of two strings in subquadratic time, for metrics which use a scoring ..."
Abstract

Cited by 56 (4 self)
The classical algorithm for computing the similarity between two sequences [36, 39] uses a dynamic programming matrix, and compares two strings of size n in O(n^2) time. We address the challenge of computing the similarity of two strings in subquadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global alignment computations. The speedup is achieved by dividing the dynamic programming matrix into variable-sized blocks, as induced by Lempel-Ziv parsing of both strings, and utilizing the inherent periodic nature of both strings. This leads to an O(n^2 / log n) algorithm for an input of constant alphabet size. For most texts, the time complexity is actually O(h n^2 / log n), where h <= 1 is the entropy of the text.
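The classical O(n^2) baseline referred to here is the standard global-alignment dynamic program, which fills each cell from its three neighbours using an arbitrary scoring matrix and a gap penalty. A minimal sketch of that baseline (not the paper's subquadratic algorithm) follows; the `score` callback stands in for the unrestricted scoring matrix.

```python
def global_align_score(a, b, score, gap=-1):
    # Classical dynamic program: cell (i, j) holds the best score of
    # aligning a[:i] with b[:j]. Only two rows are kept in memory.
    n, m = len(a), len(b)
    prev = [j * gap for j in range(m + 1)]  # align empty a against b[:j]
    for i in range(1, n + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + score(a[i - 1], b[j - 1]),  # substitute
                         prev[j] + gap,                            # gap in b
                         cur[j - 1] + gap)                         # gap in a
        prev = cur
    return prev[m]
```

The paper's speedup partitions this matrix into blocks induced by Lempel-Ziv parsing so that identical blocks are processed only once, giving the O(n^2 / log n) bound quoted above.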
Autocorrelation on Words and Its Applications: Analysis of Suffix Trees by String-Ruler Approach
 J. Combin. Theory Ser. A
, 1994
"... We study in a probabilistic framework some topics concerning the way words can overlap. Our probabilistic models assumes that a word is a sequence of i.i.d. symbols generated from a finite alphabet. This defines the so called Bernoulli model. We investigate the length of a subword that can be recopi ..."
Abstract

Cited by 54 (23 self)
We study in a probabilistic framework some topics concerning the way words can overlap. Our probabilistic model assumes that a word is a sequence of i.i.d. symbols generated from a finite alphabet. This defines the so-called Bernoulli model. We investigate the length of a subword that can be recopied, that is, a subword that occurs at least twice in a given word. An occurrence of such a repeated substring is easy to detect in a digital tree called a suffix tree. The length of a repeated substring corresponds to the typical depth in the associated suffix tree. Our main finding shows that the typical depth in a suffix tree is asymptotically distributed in the same manner as the typical depth in a digital tree that stores independent keys (i.e., independent tries). More precisely, we prove that the typical depth in a suffix tree built from the first n suffixes of a random word is normally distributed, with the mean asymptotically (1/h_1) log n and the variance alpha * log n, where ...
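The object being analyzed, the longest repeated substring of a word, corresponds to the deepest internal node of the suffix tree. It can be computed without building the tree by sorting the suffixes and comparing lexicographic neighbours; this brute-force sketch (quadratic in the worst case) is for illustration only.

```python
def longest_repeated_substring(s):
    # A repeated substring is a common prefix of two suffixes, and
    # the longest one appears between lexicographic neighbours in
    # the sorted list of suffixes.
    suffixes = sorted(s[i:] for i in range(len(s)))
    best = ""
    for a, b in zip(suffixes, suffixes[1:]):
        k = 0
        while k < min(len(a), len(b)) and a[k] == b[k]:
            k += 1
        if k > len(best):
            best = a[:k]
    return best
```

For a random word of length n the paper shows this depth concentrates around (1/h_1) log n, where h_1 is determined by the symbol probabilities of the Bernoulli model.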