Data Compression Using Adaptive Coding and Partial String Matching
IEEE Transactions on Communications, 1984
Cited by 293 (20 self)
The recently developed technique of arithmetic coding, in conjunction with a Markov model of the source, is a powerful method of data compression in situations where a linear treatment is inappropriate. Adaptive coding allows the model to be constructed dynamically by both encoder and decoder during the course of the transmission, and has been shown to incur a smaller coding overhead than explicit transmission of the model's statistics. But there is a basic conflict between the desire to use high-order Markov models and the need to have them formed quickly as the initial part of the message is sent. This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/character with no prior knowledge of the source.
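The following is a minimal Python sketch of the partial-string-matching idea. The model tries the longest context first. When that context has not yet seen the symbol, the model escapes to a shorter one, so high-order statistics are used as soon as they exist.

The sketch rests on several assumptions:
- escape method A (one escape count per context);
- no exclusions;
- a uniform fallback over 256 byte values.

It returns the ideal code length an arithmetic coder would approach, not an actual bitstream. The name `ppm_code_length` is hypothetical.

```python
import math
from collections import defaultdict


def ppm_code_length(text: str, max_order: int = 2) -> float:
    """Ideal adaptive code length, in bits, of `text` under a simplified PPM model.

    Assumptions (not taken from the paper): escape method A, no exclusions,
    and an order -1 fallback that is uniform over 256 byte values.
    """
    # counts[context][symbol] = occurrences of symbol after context so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, sym in enumerate(text):
        # Try the longest available context first, escaping to shorter ones.
        for order in range(min(max_order, i), -1, -1):
            table = counts[text[i - order:i]]
            total = sum(table.values())
            if table[sym] > 0:
                # Method A: P(sym) = c(sym) / (total + 1)
                total_bits += math.log2((total + 1) / table[sym])
                break
            if total > 0:
                # Method A: P(escape) = 1 / (total + 1); never-seen contexts cost nothing
                total_bits += math.log2(total + 1)
        else:
            # Novel symbol in every context: fall back to order -1.
            total_bits += math.log2(256)
        # Encoder and decoder both update every order after coding the symbol.
        for order in range(min(max_order, i) + 1):
            counts[text[i - order:i]][sym] += 1
    return total_bits
```

For a string `s`, `ppm_code_length(s) / len(s)` gives bits per character. On repetitive input this drops well below 8 bits/character once the higher-order contexts fill in.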
The Emerging JBIG2 Standard
IEEE Trans. Circuits and Systems for Video Technology, 1998
Cited by 43 (1 self)
The Joint Bi-level Image Experts Group (JBIG), an international study group affiliated with ISO/IEC and ITU-T, is in the process of drafting a new standard for lossy and lossless compression of bi-level images. The new standard, informally referred to as JBIG2, will support model-based coding for text and halftones, permitting compression ratios up to three times those of existing standards for lossless compression. JBIG2 will also permit lossy preprocessing without specifying how it is to be done; in this case, compression ratios up to eight times those of existing standards may be obtained with imperceptible loss of quality. It is expected that JBIG2 will become an International Standard by 2000.
Textual Image Compression: Two-stage Lossy/Lossless Encoding of Textual Images
Proceedings of the IEEE, 1994
Cited by 16 (1 self)
This paper combines a lossy technique, in which the compression factor is high but what is reproduced is only an approximation to the original digitized document, with a lossless technique, which enables the original to be reproduced exactly from its compressed form. This is achieved by separating text from noise in the document and compressing the two components separately, each with an appropriate method. Results presented below show that (a) the lossy variant of the method outperforms the best previous lossy compression techniques for textual images, and (b) the lossless variant outperforms the best previous lossless techniques. The two methods combine naturally into a two-stage procedure for "progressive transmission": the lossy image is sent first and then, if desired, extra information is transmitted to refine it into an exact replica of the original.
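The lossless second stage can be illustrated with a simple XOR residual. Whatever the lossy stage reconstructs, the pixels that differ from the original are sent afterwards, and applying them recovers the original exactly.

This minimal sketch assumes bitmaps stored as lists of 0/1 rows. It is not the paper's text/noise separation, and it performs no compression itself; in practice the residual would be entropy coded. The names `residual` and `refine` are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]  # rows of 0/1 pixels


def residual(original: Bitmap, lossy: Bitmap) -> Bitmap:
    """Mark every pixel where the lossy reconstruction differs from the original."""
    return [[o ^ l for o, l in zip(orow, lrow)] for orow, lrow in zip(original, lossy)]


def refine(lossy: Bitmap, res: Bitmap) -> Bitmap:
    """Second stage: flip the marked pixels to recover the original exactly."""
    return [[l ^ r for l, r in zip(lrow, rrow)] for lrow, rrow in zip(lossy, res)]
```

When the lossy stage is good, the residual is sparse (mostly zeros), which is what would make it cheap to code.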
Lossless Document Image Compression
1999
Cited by 8 (1 self)
Document image compression reduces the storage requirements for digitised books or documents by using characters as the fundamental unit of compression. Compression gains can be achieved by identifying regions that contain text, isolating unique characters, and storing them in a codebook. This thesis investigates several fundamental areas of the compression process. Algorithms for each area are tested on a corpus of images and the improvements tested for statistical significance. Methods for isolating characters from a bitmap are investigated along with techniques for determining reading order. We introduce the use of the docstrum to aid image compression and show that it improves upon previous methods. The Hough transform is shown to be an accurate method for determining page skew and gives robust results over a range of image resolutions. Compression is shown to improve when the skew of an image is determined automatically, and used to determine reading order. If images can be segm...
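The following is a simplified sketch of Hough-style skew estimation. Every black pixel votes for its offset from a line tilted by each candidate angle. At the true skew, the votes from each text line pile into a few bins.

Scoring each angle by the sum of squared bin counts is one simple way to measure that peakedness. It is an assumption here, not necessarily the thesis's exact formulation. The angle range, the step, and the name `estimate_skew` are hypothetical, and the sign of the result follows the input's y-axis direction.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def estimate_skew(points: Iterable[Tuple[float, float]],
                  max_deg: float = 5.0, step_deg: float = 0.1) -> float:
    """Return the candidate angle, in degrees, whose accumulator is most peaked.

    `points` are (x, y) coordinates of black pixels. For each candidate angle
    theta, each point votes for rho = -x*sin(theta) + y*cos(theta); all points on
    a line of slope tan(theta) share the same rho, so text lines at the true
    skew concentrate their votes in a few bins.
    """
    pts = list(points)
    n_steps = int(round(2 * max_deg / step_deg))
    best_angle, best_score = 0.0, -1
    for k in range(n_steps + 1):
        deg = -max_deg + k * step_deg
        theta = math.radians(deg)
        s, c = math.sin(theta), math.cos(theta)
        # One-pixel-wide rho bins.
        bins = Counter(round(-x * s + y * c) for x, y in pts)
        score = sum(v * v for v in bins.values())
        if score > best_score:  # ties keep the first (most negative) angle
            best_angle, best_score = deg, score
    return best_angle
```

Because the rho bins are one pixel wide, neighbouring angles can tie. The estimate is therefore only as fine as roughly one step.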
Document Image Compression and Analysis
PhD thesis, University of Maryland, 1997
Cited by 8 (1 self)
Image compression usually treats minimizing storage space as its main objective. It is desirable, however, to code images so that the resulting representation can be processed directly. In this thesis we explore an approach to document image compression that is efficient in both space (storage requirement) and time (processing flexibility). We present a representation in which component-level redundancy is removed by forming a prototype library and a component location table. This representation forms a basis for compression and provides direct access to image components. To generate the prototype library, a new clustering approach suitable for document image components is developed. The distance metric is based on a character degradation model, so that degraded versions of the same character are grouped together. To achieve a lossless representation when required, the residuals are encoded efficiently using a structural distance ordering. OCR is...
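The following is a minimal sketch of the prototype-library and location-table representation. It assumes components have already been extracted as (x, y, bitmap) triples. It groups only pixel-identical bitmaps, which keeps the sketch lossless but replaces the degradation-model clustering the thesis develops. `build_library` and `render` are hypothetical names.

```python
from typing import Dict, List, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # immutable rows, so bitmaps can be dict keys
Entry = Tuple[int, int, int]          # (x, y, prototype index)


def build_library(components: List[Tuple[int, int, Bitmap]]) -> Tuple[List[Bitmap], List[Entry]]:
    """Split (x, y, bitmap) components into a prototype library and a location table."""
    index: Dict[Bitmap, int] = {}
    library: List[Bitmap] = []
    table: List[Entry] = []
    for x, y, bm in components:
        if bm not in index:          # first occurrence becomes a new prototype
            index[bm] = len(library)
            library.append(bm)
        table.append((x, y, index[bm]))
    return library, table


def render(library: List[Bitmap], table: List[Entry], width: int, height: int) -> List[List[int]]:
    """Rebuild the page by pasting each prototype's black pixels at its location."""
    page = [[0] * width for _ in range(height)]
    for x, y, k in table:
        for dy, row in enumerate(library[k]):
            for dx, px in enumerate(row):
                if px:
                    page[y + dy][x + dx] = 1
    return page
```

The table stores each occurrence as a (position, prototype) pair. A query such as "where does this prototype occur?" can therefore be answered by scanning the table, without rendering the page.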
Dictionary Design for Text Image Compression with JBIG2
IEEE Trans. on Image Processing, 2001
Cited by 7 (3 self)
The JBIG2 standard for lossy and lossless bi-level image coding is a very flexible encoding strategy based on pattern matching techniques. This paper addresses the problem of compressing text images with JBIG2. For text image compression, JBIG2 allows two encoding strategies: SPM (soft pattern matching) and PM&S (pattern matching and substitution). We compare in detail the lossless and lossy coding performance of SPM-based and PM&S-based JBIG2, including their coding efficiency, reconstructed image quality, and system complexity. For SPM-based JBIG2, we discuss the bit-rate trade-off associated with symbol dictionary design. We propose two symbol dictionary design techniques: the class-based and tree-based techniques. Experiments show that SPM-based JBIG2 is a more efficient lossless system, leading to 8% higher compression ratios on average. It also provides better control over reconstructed image quality in lossy compression. However, SPM's advantages come at the price of higher encoder complexity. The proposed class-based and tree-based symbol dictionary designs outperform simpler dictionary formation techniques by 8% for lossless and 16-18% for lossy compression.
Keywords: bi-level image coding, text image compression, JBIG2, soft pattern matching, symbol dictionary.
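The lossy PM&S strategy can be sketched as follows. Each symbol is compared with the dictionary. If a close enough entry exists, only that entry's index is transmitted and the differing pixels are discarded.

This sketch assumes plain Hamming distance and a fixed pixel threshold. JBIG2 specifies the bitstream, not the encoder's matching rule, and the names here are hypothetical. SPM instead codes the symbol's own pixels using the matched entry as context, which is what allows it to be lossless.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels


def hamming(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Number of differing pixels, or None if the bitmaps differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def pms_encode(symbols: List[Bitmap], threshold: int) -> Tuple[List[Bitmap], List[int]]:
    """Lossy pattern matching and substitution with a hypothetical pixel threshold."""
    dictionary: List[Bitmap] = []
    indices: List[int] = []
    for sym in symbols:
        best, best_d = None, None
        for k, entry in enumerate(dictionary):
            d = hamming(sym, entry)
            if d is not None and (best_d is None or d < best_d):
                best, best_d = k, d
        if best is not None and best_d <= threshold:
            indices.append(best)                  # substitute; differences are lost
        else:
            dictionary.append(sym)                # no close match: new entry
            indices.append(len(dictionary) - 1)
    return dictionary, indices
```

Raising `threshold` shrinks the dictionary at the cost of reconstruction fidelity. This mirrors the bit-rate and quality trade-off the abstract describes.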
Lossless Compression for Text and Images
International Journal of High Speed Electronics and Systems, 1995
Cited by 5 (0 self)
Most data that is inherently discrete needs to be compressed in such a way that it can be recovered exactly, without any loss. Examples include text of all kinds, experimental results, and statistical databases. Other forms of data may also need to be stored exactly, such as images: particularly bilevel images, images arising in medical and remote-sensing applications, and images that may be required to be certified true for legal reasons. Moreover, lossy compression itself often involves lossless compression of coefficients or other information. This paper surveys techniques for lossless compression. The process of compression can be broken down into modeling and coding. We provide an extensive discussion of coding techniques and then introduce modeling methods appropriate for text and images. Standard methods used in popular utilities (for text) and in international standards (for images) are described.
Keywords: Text compression, ima...
Pattern Matching in Compressed Text and Images
2001
Cited by 4 (4 self)
Normally, compressed data must be decompressed before it is processed, but if the compression has been done in the right way, it is often possible to search the data without decompressing it, or by decompressing it only partially. The problem can be divided according to whether the compression method is lossless or lossy, and in each case the pattern matching can be either exact or inexact. Much work on all of these cases has been reported in the literature, including algorithms suited to pattern matching under various compression methods, and compression methods designed specifically for pattern matching. This paper surveys that work. It also exposes the important relationship between pattern matching and compression, and proposes some performance measures for compressed pattern matching algorithms. Ideas and directions for future work are also described.
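To make the idea concrete, here is a toy word-based scheme in which pattern matching runs directly on the compressed form. The pattern is encoded once with the same lexicon, and the integer codes are compared without rebuilding the text.

This is an illustrative assumption, not a method taken from the survey. It supports only exact whole-word matching, and a system that entropy-coded the integers would need more care to scan them. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def compress_words(text: str) -> Tuple[Dict[str, int], List[int]]:
    """Replace each whitespace-separated word with an integer code."""
    lexicon: Dict[str, int] = {}
    codes: List[int] = []
    for word in text.split():
        # setdefault assigns the next free code the first time a word is seen
        codes.append(lexicon.setdefault(word, len(lexicon)))
    return lexicon, codes


def search_compressed(lexicon: Dict[str, int], codes: List[int], phrase: str) -> List[int]:
    """Word offsets where `phrase` occurs, found by comparing codes only."""
    words = phrase.split()
    if not words or any(w not in lexicon for w in words):
        return []  # a word missing from the lexicon cannot occur in the text
    pattern = [lexicon[w] for w in words]
    m = len(pattern)
    return [i for i in range(len(codes) - m + 1) if codes[i:i + m] == pattern]
```

If a pattern word is absent from the lexicon, the search answers "no match" immediately. That is one small example of compression helping rather than hindering search.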
An Overview of Run-length Encoding of Handwritten Word Images
2000
Cited by 3 (2 self)
Analysis of handwritten word images is closely tied to the method used to represent the images, and each representation has its own advantages and disadvantages. In this paper, we propose a novel method of encoding handwritten images using vertical runs that significantly simplifies the implementation of several image-processing tasks in handwriting recognition. We demonstrate the advantages of both horizontal and vertical run-length encoding and compare them with other widely used representations such as chain code and bitmap. We illustrate how easily horizontal runs support slant-angle correction, image smoothing, and baseline detection, and how vertical runs support skew-angle correction and character segmentation. We believe this paper will serve as a useful tutorial on the image representation schemes used in handwriting analysis and recognition.
1 Introduction. Representation of handwritten images is an important issue in handwriting recognition. I...
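The following is a minimal sketch of the two representations. It assumes a bitmap stored as lists of 0/1 rows, with 1 for ink. Horizontal runs are maximal stretches of ink along each row; vertical runs are the same thing computed on the transposed image. The `(line, start, length)` tuple layout and the function names are hypothetical.

```python
from typing import List, Tuple

Bitmap = List[List[int]]
Run = Tuple[int, int, int]  # (line index, start offset, length) of a run of 1-pixels


def horizontal_runs(img: Bitmap) -> List[Run]:
    """Runs of ink (1) pixels along each row: (row, start column, length)."""
    runs: List[Run] = []
    for r, row in enumerate(img):
        c = 0
        while c < len(row):
            if row[c]:
                start = c
                while c < len(row) and row[c]:
                    c += 1
                runs.append((r, start, c - start))
            else:
                c += 1
    return runs


def vertical_runs(img: Bitmap) -> List[Run]:
    """Runs of ink pixels down each column: (column, start row, length)."""
    transposed = [list(col) for col in zip(*img)]
    return horizontal_runs(transposed)
```

A column with no vertical runs is a blank column. That is one crude cue for a gap between characters.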
Lossless Image Compression by Block Matching
Comput. J., 1997
Cited by 1 (0 self)
...this paper leads to a high-speed method that surpasses the fastest 1-D methods but falls short of the best (but much slower) 2-D methods. An interesting question is whether improved pointer coding, at an acceptable cost in speed, would significantly improve the compression. Experience with 1-D LZ1-type methods (e.g. [15]) has shown that good pointer coding schemes are important, and the problem becomes much worse here. For example, the number of matches used was typically smaller than the number found by the LZ2-based algorithm of Constantinescu and Storer [11], but the naïve coding algorithm uses many more bits per pointer. With rectangular matches, this issue becomes even more significant.
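The following is a naive exhaustive-search sketch of 2-D block matching. It assumes fixed square blocks rather than the paper's rectangular matches. For a block at (r, c), it looks for an identical block lying entirely in rows above it, which the decoder has already reconstructed in raster order.

Each hit must then be sent as a pointer, here the match's top-left corner. That is where the pointer-coding cost discussed above comes from. `find_block_match` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Bitmap = List[List[int]]  # rows of 0/1 pixels


def find_block_match(img: Bitmap, r: int, c: int, b: int) -> Optional[Tuple[int, int]]:
    """Top-left corner of an earlier b-by-b block identical to the one at (r, c), or None.

    Only candidates lying entirely in rows above r are searched, so the decoder
    is guaranteed to have them already.
    """
    target = [row[c:c + b] for row in img[r:r + b]]
    width = len(img[0])
    for rr in range(0, r - b + 1):          # candidate must satisfy rr + b <= r
        for cc in range(0, width - b + 1):
            if [row[cc:cc + b] for row in img[rr:rr + b]] == target:
                return rr, cc
    return None
```

Exhaustive search like this is slow. It is shown only to make the match-plus-pointer structure concrete, not as the paper's high-speed method.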

