Structural Compression for Document Analysis
 In International Conference on Pattern Recognition
, 1996
In this paper we describe a structural compression technique to be used for document text image storage and retrieval. The primary objective is to provide efficient representation, storage, transmission and display. A secondary objective is to provide an encoding which allows access to specified regions within the image and facilitates traditional document processing operations without requiring complete decoding. We describe an algorithm which symbolically decomposes a document image and structurally orders the error bitmap based on a probabilistic model. The resultant symbol and error representations lend themselves to reasonably high compression ratios and are structured so as to allow operations directly on the compressed image. The compression scheme is implemented and compared to traditional compression methods.
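The symbol-plus-error decomposition described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the glyph representation (a tuple of integers, one per pixel row), the first-fit prototype matching, the `max_distance` threshold and all function names are assumptions made for illustration, and the probabilistic ordering of the error bitmap is not modelled.

```python
def hamming(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def encode_glyphs(glyphs, max_distance=2):
    """Replace each glyph by a prototype index plus an XOR error bitmap.

    Each glyph is a tuple of integers, one per pixel row (a hypothetical
    representation). A glyph reuses the first prototype within `max_distance`
    pixels; otherwise it becomes a new prototype with an all-zero error.
    """
    prototypes, symbols = [], []
    for glyph in glyphs:
        for index, proto in enumerate(prototypes):
            if len(proto) == len(glyph) and hamming(proto, glyph) <= max_distance:
                break  # close enough: reuse this prototype
        else:
            index = len(prototypes)  # no match: start a new prototype
            prototypes.append(glyph)
        error = tuple(p ^ g for p, g in zip(prototypes[index], glyph))
        symbols.append((index, error))
    return prototypes, symbols


def decode_glyphs(prototypes, symbols):
    """Rebuild the exact glyphs by XOR-ing each error bitmap onto its prototype."""
    return [tuple(p ^ e for p, e in zip(prototypes[i], err)) for i, err in symbols]
```

Because the symbol stream is separate from the error bitmaps, dropping the errors gives a lossy rendering, while keeping them makes decoding exact. A real encoder would also record each glyph's position on the page, which this sketch leaves out.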
Context Modeling for Text Compression
, 1992
Adaptive context modeling has emerged as one of the most promising new approaches to compressing text. A finite-context model is a probabilistic model that uses the context in which input symbols occur (generally a few preceding characters) to determine the number of bits used to code these symbols. We provide an introduction to context modeling and recent research results that incorporate the concept of context modeling into practical data compression algorithms. 1. Introduction One of the more important developments in the study of data compression is the modern paradigm first presented by Rissanen and Langdon [RL81]. This paradigm divides the process of compression into two separate components: modeling and coding. A model is a representation of the source that generates the data being compressed. Modeling is the process of constructing this representation. Coding entails mapping the modeler's representation of the source into a compressed representation. The coding component tak...
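To make the modeling/coding split concrete, the sketch below plays the role of the model only: it is an adaptive finite-context model that reports how many bits an ideal coder would spend on a text. The function name, the default order and the add-one (Laplace) smoothing are assumptions chosen for simplicity, not details taken from the paper.

```python
import math
from collections import defaultdict


def code_length_bits(text, order=2, alphabet_size=256):
    """Ideal code length of `text` under an adaptive order-`order` context model.

    Each symbol is predicted from the `order` preceding characters. Counts are
    smoothed by adding 1 per alphabet symbol (an assumption of this sketch) so
    that unseen symbols never receive probability zero.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, symbol in enumerate(text):
        context = text[max(0, i - order):i]
        p = (counts[context][symbol] + 1) / (totals[context] + alphabet_size)
        bits += -math.log2(p)         # cost an ideal arithmetic coder would pay
        counts[context][symbol] += 1  # adapt the model after coding the symbol
        totals[context] += 1
    return bits
```

For a strongly patterned input such as `"ab" * 50`, calling the function with `order=1` yields fewer bits than `order=0`, because the preceding character determines the next one.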
Video on the World Wide Web: Accessing Video from WWW Browsers
, 1997
This report discusses inclusion of various kinds of video in browser programs for the World Wide Web. It contains descriptions of video representation formats, video transfer on the Internet in general, and mechanisms for extending Web browsers to support initially unknown media types. A plugin for Netscape Navigator, capable of displaying inline MPEG movies, is implemented, along with a Java applet for displaying live video captured from a camera connected to a remote computer. The plugin and the applet show that making video available from Web browsers is indeed possible, and not considerably harder than making a standalone video handling program. Preface This report documents my work on a master's degree in computer science at the Department of Informatics (Ifi), University of Oslo (UiO) in 1996 and 1997. The work was done at the University's Center for Information Technology Services (USIT). . . . and I wish to thank . . . My internal supervisors have been Fritz Albregtsen and Per ...
Compression Techniques for Chinese Text
With the growth of digital libraries and the internet, large volumes of text are available in electronic form. The majority of this text is English but other languages are increasingly well represented, including large-alphabet languages such as Chinese. It is thus attractive to compress text written in the large-alphabet languages, but the general-purpose compression utilities are not particularly effective for this application. In this paper we survey proposals for compressing Chinese text, then examine in detail the application to Chinese text of the partial predictive matching compression technique (PPM). We propose several refinements to PPM to make it more effective for Chinese text, and, on our publicly-available test corpus of around 50 Mb of Chinese text documents, show that these refinements can significantly improve compression performance while using only a limited volume of memory.
A generalization and improvement to PPM's blending
, 1997
The best-performing method in the data compression literature for computing probability estimates of sequences online using a suffix-tree model is the blending technique used by PPM [CW84, Mof90]. Blending can be viewed as a bottom-up recursive procedure for computing a mixture, barring one missing term for each level of the recursion, where a mixture is basically a weighted average of several probability estimates. In [Bun97] we have shown by decomposition into an inheritance weight &{A, B, C, D} and an inheritance evaluation time, Mh, that mixtures generalize the techniques used in DMC variants [CH87], as well as PPM variants, and thus these techniques, along with other variants of mixtures, are interchangeable. Table 1 shows the relative effectiveness of most combinations of mixture weighting functions and inheritance evaluation times. Table 2 is a study on the value of using update exclusion, especially in models using state selection. Table 1: How average compression performance on the Calgary Corpus as a whole is affected by varying mixture inheritance times and mixture weight functions, in models with and without (percolating) state selection.
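The idea of a mixture as a bottom-up weighted average can be sketched as follows. This is a plain mixture without exclusions, not the paper's weighting functions: the PPMC-style escape weight (distinct symbols over total plus distinct), the function name and the count-dictionary representation are all assumptions of this example.

```python
def blended_probability(symbol, contexts, alphabet_size):
    """Mix per-order estimates, from the shortest context up to the longest.

    `contexts` lists symbol-count dicts ordered from order 0 (shortest context)
    to the highest order. The weights follow a PPMC-style escape estimate,
    which is an assumption of this sketch.
    """
    p = 1.0 / alphabet_size  # order -1: uniform over the alphabet
    for counts in contexts:  # bottom-up: each level inherits from the one below
        total = sum(counts.values())
        if total == 0:
            continue  # unseen context: inherit the lower-order estimate as-is
        distinct = len(counts)
        escape = distinct / (total + distinct)  # weight given to lower orders
        # (1 - escape) * count / total simplifies to count / (total + distinct).
        p = counts.get(symbol, 0) / (total + distinct) + escape * p
    return p
```

Because each level is a convex combination of a proper distribution and the level below it, the blended estimates sum to one over the alphabet.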
A Hybrid Fractal-Wavelet Transform Image Data Compression Algorithm
This report describes two seemingly distinct areas of work, wavelet analysis and fractal image compression. A review of these two areas is presented, a new algorithm outlined, and some results presented. Finally, some speculations concerning the future direction of this research are included. Contents: 1 Introduction; 2 Fractal Block Coding; 2.1 The Contraction Mapping Theorem; 2.2 Iterated Function Systems and The Collage Theorem; 2.3 Conventional Fractal Block Coding; 2.4 Reconstructing an Image from a Fractal Block Code; 2.5 Advances in Fractal Coding; 3 The Wavelet Transform; 3.1 The Discrete Wavelet Transform; 3.2 Relation Between Iterated Function Systems and The Wavelet Transform; 4 The New Algorithm; 4.1 Algorithm Basics ...
A Characterization of the Dynamic Markov Compression FSM with Finite Conditioning Contexts
, 1994
Figure 2: Observable Structure in DMC Models. For any state s_i, suffix(s_i) is the original destination of the transition that was redirected to s_i when s_i was created; prefix(s_i) is the source of the transition which was redirected to s_i when s_i was added to the model; and symbol(s_i) labels the transition that was originally redirected to s_i, and any subsequently added transitions into s_i. The context of s_i, context(s_i), labels each state. The nonreflexive transitions of model M_6, pictured in Figure 1, are omitted. However, the reflexive transitions of M_6 are included here to illustrate the consistent substructures they define in the DMC model. There are always |A| reflexive transitions in the model. (Here A = {a, b, c}.) When a reflexive transition is redirected by cloning, the newly added state will have a reflexive transition with the same symbol. For an...
Dictionary Compression on the PRAM
, 1994
Parallel algorithms for lossless data compression via dictionary compression using optimal, longest fragment first (LFF), and greedy parsing strategies are described. Dictionary compression removes redundancy by replacing substrings of the input by references to strings stored in a dictionary. Given a static dictionary stored as a suffix tree, we present a CREW PRAM algorithm for optimal compression which runs in O(M + log M log n) time with O(nM^2) processors, where it is assumed that M is the maximum length of any dictionary entry. Under the same model, we give an algorithm for LFF compression which runs in O(log^2 n) time with O(n / log n) processors where it is assumed that the maximum dictionary entry is of length O(log n). We also describe an O(M + log n) time and O(n) processor algorithm for greedy parsing given a static or sliding-window dictionary. For sliding-window compression, a different approach finds the greedy parsing in O(log n) time using O(nM log M / log n) proces...
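The difference between greedy and optimal parsing is easiest to see in a sequential sketch. The code below is not the PRAM algorithm from the paper: it runs on one processor, stores the dictionary as a Python set rather than a suffix tree, and takes "optimal" to mean the fewest phrases, assuming every dictionary reference costs the same. The function names are hypothetical.

```python
def greedy_parse(text, dictionary, max_len):
    """Parse `text` left to right, always taking the longest dictionary match.

    `max_len` plays the role of M, the length of the longest dictionary entry.
    """
    phrases = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                phrases.append(text[i:i + length])
                i += length
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {i}")
    return phrases


def optimal_parse(text, dictionary, max_len):
    """Fewest-phrase parse, found by dynamic programming over suffixes."""
    n = len(text)
    best = [0] * (n + 1)  # best[i] = minimum number of phrases for text[i:]
    step = [0] * (n + 1)  # step[i] = length of the first phrase chosen at i
    for i in range(n - 1, -1, -1):
        best[i] = float("inf")
        for length in range(1, min(max_len, n - i) + 1):
            if text[i:i + length] in dictionary and 1 + best[i + length] < best[i]:
                best[i] = 1 + best[i + length]
                step[i] = length
    if n > 0 and best[0] == float("inf"):
        raise ValueError("text cannot be parsed with this dictionary")
    phrases, i = [], 0
    while i < n:  # follow the recorded choices from the start of the text
        phrases.append(text[i:i + step[i]])
        i += step[i]
    return phrases
```

With the dictionary `{"a", "b", "c", "ab", "bcc"}` and the input `"abcc"`, the greedy parse commits to `"ab"` and needs three phrases, while the optimal parse uses two (`"a"`, `"bcc"`).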
Low-Bandwidth Access: An Evaluation of Application Level Protocol Compressibility
 Proceedings of the 4th International Conference on Telecommunication Systems, Modelling and Analysis
, 1996
Wide area wireless networks will have limited bandwidth. One standard approach for increasing the effective throughput on such links is to employ data compression. In conjunction with any data compression supplied by applications, low-bandwidth systems may also use hardware data compression provided by the data communications device. Unfortunately, compression at such a low level can cause problems with both traffic analysis and cost estimation on limited bandwidth links. This investigation identifies some of the issues that arise from employing hardware compression and from monitoring low-bandwidth systems. It describes a compression investigation using CPU-based link-level compression rather than hardware compression. The results of this investigation, in conjunction with a previous study identifying the traffic characteristics of low-bandwidth links, detail the compressibility statistics for many of the application-level protocols typically flowing over current low-bandwidth netwo...
Compressed Domain Document Retrieval and Analysis
, 1996
In this paper we first describe a structural compression technique which has been designed to facilitate document text image storage, retrieval, and processing. This technique provides an efficient representation of textual images and lends itself to lossy compression, progressive transmission, direct access to subregions of the document and document processing in the compressed domain. We describe a data structure which can be used to efficiently store the compressed information, provide algorithms for creating and manipulating it, and present results of document processing on images compressed from the University of Washington database. Keywords: document image retrieval, compression, symbolic encoding, clustering, compressed-domain analysis 1 Introduction The amount of data contained in a document image is in general very high, and compression is therefore essential for efficient transmission and archiving. Traditional methods of image compression are based on statistical propertie...