Context Modeling for Text Compression
, 1992
Adaptive context modeling has emerged as one of the most promising new approaches to compressing text. A finite-context model is a probabilistic model that uses the context in which input symbols occur (generally a few preceding characters) to determine the number of bits used to code these symbols. We provide an introduction to context modeling and recent research results that incorporate the concept of context modeling into practical data compression algorithms. 1. Introduction One of the more important developments in the study of data compression is the modern paradigm first presented by Rissanen and Langdon [RL81]. This paradigm divides the process of compression into two separate components: modeling and coding. A model is a representation of the source that generates the data being compressed. Modeling is the process of constructing this representation. Coding entails mapping the modeler's representation of the source into a compressed representation. The coding component tak...
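The idea that a finite-context model sets the number of bits spent on each symbol can be illustrated with a minimal sketch. It is not the method of the paper above: the function name `code_length_bits`, the add-one (Laplace) estimator, and the `alphabet_size` parameter are assumptions chosen for illustration. The sketch adapts symbol counts per context and charges an ideal coder -log2(p) bits for each symbol.

```python
import math
from collections import defaultdict


def code_length_bits(text, order=2, alphabet_size=256):
    """Bits an ideal coder would spend on `text` under an adaptive
    order-`order` finite-context model (hypothetical helper)."""
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> symbols seen
    bits = 0.0
    for i, sym in enumerate(text):
        # The context is the (up to) `order` characters preceding position i.
        ctx = text[max(0, i - order):i]
        # Add-one estimate: an assumption, not the estimator of any cited paper.
        p = (counts[ctx][sym] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)
        # Adapt the model only after the symbol has been coded,
        # so a decoder could make the same updates.
        counts[ctx][sym] += 1
        totals[ctx] += 1
    return bits
```

On strongly repetitive input such as "ab" repeated many times, the order-1 model in this sketch spends fewer bits than the order-0 model, because each one-character context is always followed by the same symbol.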
Structural Compression for Document Analysis
 In International Conference on Pattern Recognition
, 1996
In this paper we describe a structural compression technique to be used for document text image storage and retrieval. The primary objective is to provide an efficient representation, storage, transmission and display. A secondary objective is to provide an encoding which allows access to specified regions within the image and facilitates traditional document processing operations without requiring complete decoding. We describe an algorithm which symbolically decomposes a document image and structurally orders the error bitmap based on a probabilistic model. The resultant symbol and error representations lend themselves to reasonably high compression ratios and are structured so as to allow operations directly on the compressed image. The compression scheme is implemented and compared to traditional compression methods.
Moving Distributed Shared Memory to the Personal Computer: The MIRAGE+ Experience
 Department of Computer Science, University of California
, 1993
This paper describes the evolution of a distributed shared memory (DSM) system, Mirage, from its original implementation on VAX computers to its current implementation on modern high-end personal computers. Mirage provides a form of shared memory that is network transparent in a loosely coupled environment. The system hides network boundaries for processes that are accessing shared memory and is upward compatible with the System V UNIX interface. This paper addresses the architectural dependencies in the design of the system and evaluates performance of the implementation. Mirage+ performance is similar to Mirage, but the communication bottleneck has become more severe because of the larger page size used in the implementation. We show how this problem can be resolved on conventional hardware at little additional expense by using compression techniques. (UNIX is a Registered Trademark of AT&T.)
Compression Techniques for Chinese Text
With the growth of digital libraries and the internet, large volumes of text are available in electronic form. The majority of this text is English but other languages are increasingly well represented, including large-alphabet languages such as Chinese. It is thus attractive to compress text written in the large-alphabet languages, but the general-purpose compression utilities are not particularly effective for this application. In this paper we survey proposals for compressing Chinese text, then examine in detail the application to Chinese text of the partial predictive matching compression technique (PPM). We propose several refinements to PPM to make it more effective for Chinese text, and, on our publicly available test corpus of around 50 Mb of Chinese text documents, show that these refinements can significantly improve compression performance while using only a limited volume of memory.
Video on the World Wide Web: Accessing Video from WWW Browsers, Cand.Scient. (Master) thesis
, 1997
This report discusses the inclusion of various kinds of video in browser programs for the World Wide Web. It contains a description of video representation formats, video transfer on the Internet in general, and mechanisms for extending Web browsers to support initially unknown media types. A plugin for Netscape Navigator, capable of displaying inline MPEG movies, is implemented, along with a Java applet for displaying live video captured from a camera connected to a remote computer. The plugin and the applet show that making video available from Web browsers is indeed possible, and not considerably harder than making a standalone video handling program.
A Generalization and Improvement to PPM's "Blending"
, 1997
The best-performing method in the data compression literature for computing probability estimates of sequences online using a suffix-tree model is the blending technique used by PPM. Blending can be viewed as a bottom-up recursive procedure for computing a mixture, barring one missing term for each level of the recursion, where a mixture is basically a weighted average of several probability estimates. We show by decomposition into an inheritance evaluation time and a mixture weighting function that mixtures generalize the techniques used in PPM variants. Doubly controlled experiments with our executable taxonomy of online sequence modeling algorithms and the Calgary Corpus demonstrate the impact of varying inheritance evaluation time, mixture weighting function, and including update exclusion. Keywords: data compression, universal coding, online stochastic modeling, statistical inference, finite-state automata. Portions of this paper also appear in Proceedings of the DCC, March ...
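The description of a mixture as a weighted average computed bottom-up can be sketched as follows. This is an illustration only, not PPM's blending and not the weighting function studied in the paper above: the name `recursive_mixture`, the per-level `escape_weights`, and the `base` probability (for example a uniform estimate) are all hypothetical.

```python
def recursive_mixture(estimates, escape_weights, base):
    """Blend per-order probability estimates for one symbol, bottom-up.

    estimates[k]      -- estimate from the order-k context, lowest order first
    escape_weights[k] -- weight in [0, 1] given to the lower-order result
    base              -- probability used below the lowest order (assumed)
    """
    if len(estimates) != len(escape_weights):
        raise ValueError("one escape weight is needed per estimate")
    mixed = base
    for p, e in zip(estimates, escape_weights):
        # Each level is a weighted average of its own estimate and
        # whatever was inherited from the shorter contexts.
        mixed = (1.0 - e) * p + e * mixed
    return mixed
```

For example, with estimates 0.5 and 0.9, escape weights 0.5 and 0.25, and a base of 0.1, the first level gives 0.5 * 0.5 + 0.5 * 0.1 = 0.3 and the second gives 0.75 * 0.9 + 0.25 * 0.3 = 0.75. If every level's estimates sum to one over the alphabet, so does the blended result.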
Compressed Domain Document Retrieval and Analysis
, 1996
In this paper we first describe a structural compression technique which has been designed to facilitate document text image storage, retrieval, and processing. This technique provides an efficient representation of textual images and lends itself to lossy compression, progressive transmission, direct access to subregions of the document and document processing in the compressed domain. We describe a data structure which can be used to efficiently store the compressed information, provide algorithms for creating and manipulating it, and present results of document processing on images compressed from the University of Washington database. Keywords: document image retrieval, compression, symbolic encoding, clustering, compressed-domain analysis 1 Introduction The amount of data contained in a document image is in general very high, and compression is therefore essential for efficient transmission and archiving. Traditional methods of image compression are based on statistical propertie...
Low-Bandwidth Access: An Evaluation of Application Level Protocol Compressibility
 Proceedings of the 4th International Conference on Telecommunication Systems, Modelling and Analysis
, 1996
Wide area wireless networks will have limited bandwidth. One standard approach for increasing the effective throughput on such links is to employ data compression. In conjunction with any data compression supplied by applications, low-bandwidth systems may also use hardware data compression provided by the data communications device. Unfortunately, compression at such a low level can cause problems with both traffic analysis and cost estimation on limited bandwidth links. This investigation identifies some of the issues that arise from employing hardware compression and of monitoring low-bandwidth systems. It describes a compression investigation using CPU based link level compression rather than hardware compression. The results of this investigation, in conjunction with a previous study identifying the traffic characteristics of low-bandwidth links, detail the compressibility statistics for many of the application level protocols typically flowing over current low-bandwidth netwo...
Dictionary Compression on the PRAM
, 1994
Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described. Dictionary compression removes redundancy by replacing substrings of the input by references to strings stored in a dictionary. Given a static dictionary stored as a suffix tree, we present a CREW PRAM algorithm for optimal compression which runs in O(M + log M log n) time with O(nM^2) processors, where it is assumed that M is the maximum length of any dictionary entry. Under the same model, we give an algorithm for LFF compression which runs in O(log^2 n) time with O(n / log n) processors where it is assumed that the maximum dictionary entry is of length O(log n). We also describe an O(M + log n) time and O(n) processor algorithm for greedy parsing given a static or sliding-window dictionary. For sliding-window compression, a different approach finds the greedy parsing in O(log n) time using O(nM log M / log n) proces...
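Greedy parsing against a static dictionary can be sketched sequentially as below. This is a plain single-processor illustration, not the CREW PRAM algorithm of the paper above. The function name `greedy_parse`, the use of a Python set in place of a suffix tree, and the fallback that any single character is always codable are assumptions made for the example.

```python
def greedy_parse(text, dictionary, max_len):
    """Split `text` into phrases, taking the longest dictionary entry
    (of length at most `max_len`) that matches at each position."""
    phrases = []
    i = 0
    while i < len(text):
        # Assumed fallback: a single character can always be emitted.
        match = text[i]
        # Try candidate lengths from longest to shortest.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        phrases.append(match)
        i += len(match)
    return phrases
```

With the dictionary {"ab", "abc", "ca"} and a maximum entry length of 3, the input "abcab" parses as "abc" followed by "ab". Concatenating the phrases always reproduces the input, and a compressor would emit one dictionary reference per phrase.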
Hidden Markov Models with Mixed States
We note similarities of the state space reconstruction ("Embedology") practiced in numerical work on chaos, state space methods of stochastic systems theory, and the hidden Markov models (HMMs) used in speech research. We review Baum's EM algorithm in general and the specific forward-backward algorithm that optimizes a class of HMM that has a mixed state space consisting of continuous and discrete parts. We then describe forecasts based on models fit to data set D. 1 Introduction In the first part of this paper we hope to provide an intuitive explanation of hidden Markov model (HMM) methods that builds on the notion of state space reconstruction. Later, we provide enough details about the approach to enable a careful reader to develop new variants and to write his own programs. We begin with some thoughts on forecasting and state spaces. We were drawn to work on varieties of hidden Markov models for scalar time series from chaotic dynamics, by the similarity of the notion of reconstru...
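The forward half of the forward-backward algorithm can be sketched for the simplest case. The sketch assumes a purely discrete HMM with discrete observations, which is a simplification of the mixed continuous and discrete state space treated in the paper above. The name `forward_likelihood` and the list-of-lists parameter layout are hypothetical.

```python
def forward_likelihood(init, trans, emit, observations):
    """Probability of `observations` under a discrete HMM.

    init[s]     -- probability of starting in state s
    trans[r][s] -- probability of moving from state r to state s
    emit[s][o]  -- probability of observing symbol o in state s
    """
    if not observations:
        return 1.0  # the empty sequence has probability one
    n = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][observations[0]] for s in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs]
            for s in range(n)
        ]
    return sum(alpha)
```

For long sequences the entries of `alpha` shrink towards zero, so a practical version would rescale them at each step or work with logarithms. The sketch omits this for brevity.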