Results 1-10 of 11
Web Prefetching Using Partial Match Prediction
, 1998
Abstract

Cited by 58 (1 self)
Web traffic is now one of the major components of Internet traffic. One of the main directions of research in this area is to reduce the latencies users experience when navigating through Web sites. Caching is already being used to that end, yet the characteristics of the Web cause caching in this medium to perform poorly. Therefore, prefetching is now being studied in the Web context. This study investigates the use of partial match prediction, a technique taken from the data compression literature, for prefetching in the Web. The main concern when employing prefetching is to predict as many future requests as possible while limiting false predictions to a minimum. The simulation results suggest that a high fraction of the predictions are accurate (e.g., predicting 18%-23% of the requests with 90%-80% accuracy), so that additional network traffic is kept low. Furthermore, the simulations show that prefetching can substantially increase cache hit rates.
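As a concrete illustration of the idea, a prediction-by-partial-match model keeps frequency counts for contexts of several lengths and backs off from the longest matching context. The sketch below (class name, order, and confidence threshold are illustrative choices, not taken from the paper) emits a candidate page to prefetch only when its estimated probability clears the threshold:

```python
from collections import defaultdict, Counter

class PartialMatchPredictor:
    """Order-k context model over a stream of page requests (a sketch of
    partial match prediction applied to Web prefetching; the class name,
    order, and threshold are illustrative assumptions)."""

    def __init__(self, order=2, threshold=0.5):
        self.order = order
        self.threshold = threshold
        # counts[context] -> Counter of pages seen right after that context
        self.counts = defaultdict(Counter)
        self.history = []

    def record(self, page):
        # Update every context length from 1 up to `order`.
        for k in range(1, self.order + 1):
            if len(self.history) >= k:
                ctx = tuple(self.history[-k:])
                self.counts[ctx][page] += 1
        self.history.append(page)

    def predict(self):
        # Longest matching context wins (PPM-style back-off).
        for k in range(self.order, 0, -1):
            if len(self.history) >= k:
                seen = self.counts.get(tuple(self.history[-k:]))
                if seen:
                    page, n = seen.most_common(1)[0]
                    if n / sum(seen.values()) >= self.threshold:
                        return page  # confident enough to prefetch
        return None  # no confident prediction: prefetch nothing

p = PartialMatchPredictor(order=2, threshold=0.5)
for page in ["a", "b", "c", "a", "b", "c", "a", "b"]:
    p.record(page)
print(p.predict())  # -> c
```

Raising the threshold trades coverage for accuracy, which is exactly the coverage/accuracy trade-off the abstract's 18%-23% versus 90%-80% figures describe.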
Applications of Finite Automata Representing Large Vocabularies
, 1992
Abstract

Cited by 23 (2 self)
The construction of minimal acyclic deterministic partial finite automata to represent large natural-language vocabularies is described. Applications of such automata include: spelling checkers and advisers, multi-language dictionaries, thesauri, minimal perfect hashing, and text compression. Part of this research was supported by a grant awarded by the Brazilian National Council for Scientific and Technological Development (CNPq) to the second author. Authors' address: Cláudio L. Lucchesi and Tomasz Kowaltowski, Department of Computer Science, University of Campinas, Caixa Postal 6065, 13081 Campinas, SP, Brazil. Email: lucchesi@dcc.unicamp.br and tomasz@dcc.unicamp.br. 1 Introduction: The use of finite automata (see for instance [5]) to represent sets of words is a well-established technique. Perhaps the most traditional application is found in compiler construction, where such automata can be used to model and implement efficient lexical analyzers (see [1]).
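A minimal sketch of the core idea: build a trie for the vocabulary, then merge structurally identical subautomata bottom-up so equivalent states are shared (the paper's construction is more refined, e.g. incremental; function names here are illustrative):

```python
def build_minimal_dafsa(words):
    """Build a minimal acyclic automaton from a word list: construct a
    trie, then hash-cons equivalent states bottom-up (a simplified sketch
    of minimal-acyclic-automaton construction, not the paper's algorithm).
    Nodes are dicts; the '' key marks an accepting state."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node[''] = True  # end of word: accepting

    registry = {}  # state signature -> canonical shared node

    def minimize(node):
        # Canonicalize children first, then look this state up by its
        # (acceptance, outgoing transitions) signature.
        for ch in list(node):
            if ch:
                node[ch] = minimize(node[ch])
        sig = ('' in node,) + tuple(
            sorted((ch, id(child)) for ch, child in node.items() if ch))
        return registry.setdefault(sig, node)

    return minimize(trie)

def accepts(dfa, word):
    node = dfa
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return '' in node

root = build_minimal_dafsa(["tap", "taps", "top", "tops"])
print(accepts(root, "tops"), accepts(root, "to"))  # -> True False
```

After minimization the shared "-p/-ps" suffix of "tap(s)" and "top(s)" is stored once, which is why such automata stay compact even for very large vocabularies.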
Partial Words and the Critical Factorization Theorem
 J. Combin. Theory Ser. A
, 2007
Abstract

Cited by 11 (6 self)
The study of combinatorics on words, or finite sequences of symbols from a finite alphabet, finds applications in several areas of biology, computer science, mathematics, and physics. Molecular biology, in particular, has stimulated considerable interest in the study of combinatorics on partial words, which are sequences that may contain a number of “do not know” symbols, also called “holes”. This paper is devoted to a fundamental result on periods of words, the Critical Factorization Theorem, which states that the period of a word is always locally detectable in at least one position of the word, resulting in a corresponding critical factorization. Here, we describe precisely the class of partial words w with one hole for which the weak period is locally detectable in at least one position of w. Our proof provides an algorithm which computes a critical factorization when one exists.
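To make the basic definition concrete: in a partial word, two letters are compatible when they are equal or either is a hole, and a weak period p requires every pair of letters at distance p to be compatible. A small sketch of just this definition (the '?' hole symbol and function names are illustrative, not the paper's notation):

```python
HOLE = "?"  # illustrative hole symbol for a "do not know" position

def compatible(a, b):
    """Two letters of a partial word match if equal or either is a hole."""
    return a == b or a == HOLE or b == HOLE

def weak_period(w):
    """Smallest weak period of a partial word: the least p such that every
    pair of letters at distance p is compatible. A sketch of the basic
    definition only, not the paper's critical-factorization algorithm."""
    n = len(w)
    for p in range(1, n + 1):
        if all(compatible(w[i], w[i + p]) for i in range(n - p)):
            return p
    return n

print(weak_period("aba?aba"))  # -> 2: the hole is compatible with both a and b
```

The theorem's content is then that, for the characterized class of one-hole partial words, this globally defined period can be read off locally at some critical position.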
DNA Sequencing and String Learning
Abstract

Cited by 3 (3 self)
In laboratories, the majority of large-scale DNA sequencing is done following the shotgun strategy, which is to randomly sequence large amounts of relatively short fragments and then heuristically find a shortest common superstring of the fragments [26]. We study mathematical frameworks, under plausible assumptions, suitable for massive automated DNA sequencing and for analyzing DNA sequencing algorithms. We model the DNA sequencing problem as learning a string from its randomly drawn substrings. Under certain restrictions, this may be viewed as string learning in Valiant's distribution-free learning model, and in this case we give an efficient learning algorithm and a quantitative bound on how many examples suffice. One major obstacle to our approach turns out to be a quite well-known open question on how to approximate a shortest common superstring of a set of strings, raised by a number of authors in the last ten years [9, 29, 30]. We give the first provably good algorithm, which approximates a shortest superstring of length n by a superstring of length O(n log n). The algorithm works equally well even in the presence of negative examples, i.e., when merging of some strings is prohibited.
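The superstring heuristic the shotgun strategy relies on can be sketched with the classic greedy merge: repeatedly join the pair of fragments with the largest suffix-prefix overlap. Note this is the well-known greedy heuristic, not the paper's O(n log n)-approximation algorithm; fragment data below is made up for illustration:

```python
def overlap(a, b):
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_superstring(strings):
    """Greedy shortest-common-superstring heuristic (a sketch of the
    standard approach, not the paper's algorithm): drop contained
    fragments, then repeatedly merge the maximally overlapping pair."""
    strings = [s for s in strings
               if not any(s != t and s in t for t in strings)]
    while len(strings) > 1:
        best = (-1, None, None)
        for i, a in enumerate(strings):
            for j, b in enumerate(strings):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = strings[i] + strings[j][k:]
        strings = [s for idx, s in enumerate(strings)
                   if idx not in (i, j)] + [merged]
    return strings[0]

print(greedy_superstring(["CATG", "ATGC", "TGCA"]))  # -> CATGCA
```

Greedy is conjectured to be a 2-approximation; the open question the abstract mentions is precisely how close such heuristics come to the true shortest superstring.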
Text Data Compression Algorithms
, 1997
Abstract

Cited by 1 (0 self)
Contents: 1 Text compression; 2 Static Huffman coding (2.1 Encoding, 2.2 Decoding); 3 Dynamic Huffman coding (3.1 Encoding, 3.2 Decoding, 3.3 Updating); 4 Arithmetic coding (4.1 Encoding, 4.2 Decoding, 4.3 Implementation)
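The static Huffman coding covered in this survey follows a standard recipe: count symbol frequencies, combine the two least frequent subtrees with a min-heap until one tree remains, then read each symbol's code off its root-to-leaf path. A minimal sketch of that standard algorithm (not code from the survey):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Static Huffman coding: frequency count, min-heap tree construction,
    code assignment by tree walk (a sketch of the textbook algorithm)."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate one-symbol alphabet
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, tie-breaker, tree). Trees are either a
    # symbol (leaf) or a (left, right) pair; the tie-breaker keeps tuple
    # comparison away from the trees themselves.
    heap = [(n, i, ch) for i, (ch, n) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        n1, _, t1 = heapq.heappop(heap)
        n2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, counter, (t1, t2)))
        counter += 1
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("abracadabra")
encoded = "".join(codes[ch] for ch in "abracadabra")
print(len(encoded))  # 23 bits versus 88 bits of 8-bit ASCII
```

Dynamic Huffman coding (section 3 of the survey) differs in that the tree is updated after every symbol, so encoder and decoder stay synchronized without transmitting the frequency table.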
© World Scientific Publishing Company TREE COMPRESSION AND OPTIMIZATION WITH APPLICATIONS Dedicated to the memory of Markku Tamminen (1945-1989)
, 1990
Abstract
Different methods for compressing trees are surveyed and developed. Tree compression can be seen as a trade-off problem between time and space in which we can choose different strategies depending on whether we prefer better compression results or more efficient operations in the compressed structure. Of special interest is the case where space can be saved while preserving the functionality of the operations; this is called data optimization. The general compression scheme employed here consists of separate linearization of the tree structure and of the data stored in the tree. Some applications of the tree compression methods are also explored. These include the syntax-directed compression of program files, the compression of pixel trees, trie compaction, and dictionaries maintained as implicit data structures.
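One of the simplest strategies in this family, and one that preserves the functionality of read operations, is sharing identical subtrees so the tree becomes a DAG (hash-consing). A small sketch, with trees represented as (label, child, ...) tuples (an illustrative encoding, not the paper's):

```python
def compress(tree, registry):
    """Share identical subtrees so the tree becomes a DAG: a sketch of
    subtree sharing (hash-consing), one of the simplest tree-compression
    strategies of the kind surveyed. Trees are (label, child, ...) tuples;
    `registry` maps each distinct subtree to its canonical copy."""
    node = (tree[0],) + tuple(compress(c, registry) for c in tree[1:])
    return registry.setdefault(node, node)

def size(tree):
    """Total node count of the uncompressed tree, with multiplicity."""
    return 1 + sum(size(c) for c in tree[1:])

# The expression tree for x*y + x*y repeats the whole x*y subtree.
t = ("+", ("*", ("x",), ("y",)), ("*", ("x",), ("y",)))
registry = {}
c = compress(t, registry)
print(size(t), len(registry))  # -> 7 4: seven nodes collapse to four
```

Because lookups and traversals work unchanged on the shared structure, this is an instance of the "data optimization" case the abstract highlights: space is saved while the operations keep their functionality.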
Signaling Compression (SigComp) Users' Guide
, 2006
Abstract
This memo provides information for the Internet community. It does not specify an Internet standard of any kind. Distribution of this memo is unlimited. Copyright Notice: Copyright (C) The Internet Society (2006). This document provides an informational guide for users of the Signaling Compression (SigComp) protocol. The aim of the document is to assist users when making SigComp implementation decisions, for example, the choice of compression algorithm and the level of ...
Comparing Sequences with Segment Rearrangements
Abstract
Computational genomics crucially involves comparing sequences based on "similarity" for detecting evolutionary and functional relationships. Until very recently, available portions of the human genome sequence (and that of other species) were fairly short and sparse. Most sequencing effort was focused on genes and other short units; similarity between such sequences was measured based on character-level differences. However, with the advent of whole-genome sequencing technology, there is emerging consensus that the measure of similarity between long genome sequences must capture the rearrangements of large segments found in abundance in the human genome. In this paper, we abstract the general problem of computing sequence similarity in the presence of segment rearrangements. This problem is closely related to computing the smallest grammar for a string or the block edit distance between two strings. Our problem, like these other problems, is NP-hard. Our main result here is a simple O(1)-factor approximation algorithm for this problem. In contrast, the best known approximations for the related problems are a factor Ω(log n) off from the optimal. Our algorithm works in linear time, and in one pass. In proving our result, we relate sequence similarity measures based on different segment rearrangements to each other, tightly up to constant factors. 1 Introduction: Similarity comparison between biomolecular sequences plays an important role in computational genomics due to the premise that sequence similarity usually indicates evolutionary and functional similarity. Such comparisons are performed to deduce information about the function or the evolutionary background of unknown genomic sequences in several ways: for example, popular computational tools for both multiple sequence alignment and evolutionary tree construction are based on iteratively measuring the similarity between pairs of available sequences.
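The flavor of segment-rearrangement similarity can be illustrated by greedily covering one sequence with the longest substrings of the other: the fewer phrases the cover needs, the more the two sequences look like rearrangements of shared segments. This is only a crude sketch in the spirit of grammar-based and block-edit measures, not the paper's algorithm, and the sequences are made up:

```python
def segment_parse(s, t):
    """Greedily cover s by the longest substrings of t, falling back to a
    single character when no prefix of the remainder occurs in t. The
    phrase count is a crude proxy for similarity under segment
    rearrangements (a sketch, not the paper's O(1)-approximation)."""
    phrases = []
    i = 0
    while i < len(s):
        # Longest prefix of s[i:] occurring anywhere in t.
        k = len(s) - i
        while k > 0 and s[i:i + k] not in t:
            k -= 1
        k = max(k, 1)  # single-character fallback
        phrases.append(s[i:i + k])
        i += k
    return phrases

# Two phrases: the second string is nearly a rearrangement of the first.
print(segment_parse("tagcatgat", "catgatagc"))  # -> ['tagc', 'atgat']
```

A related sequence would parse into a handful of long phrases, while an unrelated one degenerates toward one phrase per character, which is the intuition behind using compressibility as a distance between long genomes.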