Probabilistic Finite-State Machines - Part II
Cited by 5 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked. In part I of this paper, we surveyed these objects and studied their properties. In this part II, we study the relations between probabilistic finite-state automata and other well-known string-generating devices, such as hidden Markov models and n-grams, and provide theorems, algorithms, and properties that represent the current state of the art for these objects.
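As a minimal sketch of the kind of object discussed here, assume the common definition in which each state has a stopping probability plus probabilistic, possibly nondeterministic, transitions. The probability of a string is then the sum over all emitting paths, computed with a forward recursion much like the one used for hidden Markov models. The automaton (`INITIAL`, `FINAL`, `TRANSITIONS`) and the function `string_probability` are hypothetical illustrations, not taken from the paper.

```python
from collections import defaultdict

# Hypothetical 2-state PFA (illustration only). For every state, its stopping
# probability plus the probabilities of all its outgoing transitions sum to 1.
INITIAL = {0: 1.0}
FINAL = {0: 0.2, 1: 0.5}
TRANSITIONS = {
    (0, "a"): [(0, 0.2), (1, 0.2)],  # nondeterministic: 'a' can lead to 0 or 1
    (0, "b"): [(1, 0.4)],
    (1, "a"): [(0, 0.5)],
}


def string_probability(s, initial=INITIAL, transitions=TRANSITIONS, final=FINAL):
    """Sum, over all paths that emit s and then stop, of the path probabilities."""
    forward = dict(initial)  # state -> probability of having emitted the prefix so far
    for symbol in s:
        step = defaultdict(float)
        for state, p in forward.items():
            for target, q in transitions.get((state, symbol), []):
                step[target] += p * q
        forward = step
    # Multiply by the probability of stopping in each reachable state.
    return sum(p * final.get(state, 0.0) for state, p in forward.items())
```

With this automaton, `string_probability("a")` adds two paths, 0.2 * 0.2 + 0.2 * 0.5 = 0.14, while `string_probability("bb")` is 0 because state 1 has no `b` transition.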
Language Modeling With Stochastic Automata
, 1996
Cited by 3 (0 self)
It is well known that language models are effective for increasing accuracy of speech and handwriting recognizers, but large language models are often required to achieve low model perplexity (or entropy) and still have adequate language coverage. We study three efficient methods for stochastic language modeling in the context of the stochastic pattern recognition problem and give results of a comparative performance analysis. In addition we show that a method which combines two of these language modeling techniques yields even better performance than the best of the single techniques tested.
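Perplexity, mentioned above, is 2 raised to the average number of bits the model spends per token. The abstract does not say how the two language-modeling techniques are combined; linear interpolation of a bigram and a unigram model is assumed here purely for illustration, and the helper names (`train`, `token_probs`, `perplexity`) are hypothetical.

```python
import math
from collections import Counter


def perplexity(probs):
    """2 ** (average bits per token); infinite if any token gets probability 0."""
    if any(p == 0.0 for p in probs):
        return math.inf
    return 2 ** (-sum(math.log2(p) for p in probs) / len(probs))


def train(tokens):
    """Maximum-likelihood unigram and bigram counts from a token list."""
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    return unigram, bigram, len(tokens)


def token_probs(test, unigram, bigram, n, lam):
    """Probability of each test token after the first under the assumed mixture
    lam * P_bigram(w | prev) + (1 - lam) * P_unigram(w); lam = 1 is the bigram alone."""
    probs = []
    for prev, w in zip(test, test[1:]):
        p_uni = unigram[w] / n
        p_bi = bigram[(prev, w)] / unigram[prev] if unigram[prev] else 0.0
        probs.append(lam * p_bi + (1 - lam) * p_uni)
    return probs


uni, bi, n = train("a b a b".split())
print(perplexity(token_probs(["a", "a"], uni, bi, n, lam=1.0)))  # inf
print(perplexity(token_probs(["a", "a"], uni, bi, n, lam=0.5)))  # 4.0
```

On this toy data the bigram alone assigns zero probability to the unseen pair `a a`, giving infinite perplexity, whereas the mixture falls back on the unigram estimate. This is one common motivation for combining models, not necessarily the authors' reason.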
Fast and Efficient Algorithms for Text and Video Compression
, 1997
Cited by 3 (1 self)
There is a tradeoff between the speed of a data compressor and the level of compression it can achieve. Improving compression generally requires more computation, and improving speed generally sacrifices compression. In this thesis, we examine a range of tradeoffs for text and video. In text compression, we attempt to bridge the gap between statistical techniques, which achieve greater compression but are computationally intensive, and dictionary-based techniques, which give less compression but run faster. We combine the context modeling of statistical coding with dynamic dictionaries into a hybrid coding scheme we call Dictionary by Partial Matching. In low-bit-rate video compression, we explore the speed-compression tradeoffs with a range of motion estimation techniques operating within the H.261 video coding standard. We initially consider algorithms that explicitly minimize either the bit rate or a combination of rate and distortion. With insights gained from these explicit-minimization algorithms, we propose a new technique for motion estimation that minimizes an efficiently computed heuristic function. The new technique gives compression efficiency comparable to the explicit-minimization algorithms while running much faster. We also explore bit-minimization in a non-standard quadtree-based video coder that codes ...
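For contrast with the rate-based criteria described above, here is conventional full-search block matching that minimizes the sum of absolute differences (SAD), a distortion-only criterion. This is an assumed reference point, not the thesis's heuristic or its explicit bit-minimization algorithms; frames are plain lists of rows of pixel values, and `sad` and `full_search` are hypothetical names.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in the
    current frame and the block displaced by (dx, dy) in the reference frame."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur, ref, bx, by, n, radius):
    """Exhaustively test every displacement within +/- radius that keeps the
    block inside the reference frame; return the (dx, dy) with the lowest SAD.
    Assumes the block itself lies inside the frame, so (0, 0) is always valid."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            inside = (0 <= by + dy and by + dy + n <= height
                      and 0 <= bx + dx and bx + dx + n <= width)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]
```

Full search costs (2 * radius + 1)^2 SAD evaluations per block, which is the kind of computation that faster motion estimation techniques try to avoid.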
PPM Model Cleaning
- In Proceedings of the IEEE Data Compression Conference (DCC’95
, 2003
Cited by 2 (0 self)
The Prediction by Partial Matching (PPM) algorithm uses a cumulative frequency count of input symbols in different contexts to estimate their probability distribution. The excellent compression ratios yielded by the PPM algorithm have not led to broader use of this scheme, mainly because of its high demand for computational resources. In this paper, we present an algorithm which improves the memory usage of the PPM model.
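To make "frequency counts of input symbols in different contexts" concrete, here is a minimal PPM-style sketch, assuming escape method C and omitting exclusions and the arithmetic coder. It is not the paper's memory-reduction algorithm; `build_model` and `ppm_probability` are hypothetical names. The number of entries in `counts` grows with the maximum context order, which is the memory pressure the paper addresses.

```python
from collections import Counter, defaultdict


def build_model(text, max_order):
    """Count how often each symbol follows each context of length 0..max_order."""
    counts = defaultdict(Counter)  # context string -> Counter of following symbols
    for i, sym in enumerate(text):
        for k in range(min(i, max_order) + 1):
            counts[text[i - k:i]][sym] += 1
    return counts


def ppm_probability(counts, history, sym, max_order, alphabet_size):
    """PPM-style estimate with escape method C and no exclusions (a simplification).

    Start at the longest available context; if sym has not been seen there, pay
    the escape probability and fall back to the next shorter context. The final
    fallback (order -1) is uniform over the alphabet.
    """
    p = 1.0
    for k in range(min(len(history), max_order), -1, -1):
        ctx = history[len(history) - k:]
        if ctx not in counts:
            continue  # context never seen: nothing to escape from
        table = counts[ctx]
        total, distinct = sum(table.values()), len(table)
        if sym in table:
            return p * table[sym] / (total + distinct)
        p *= distinct / (total + distinct)
    return p / alphabet_size
```

For example, after `model = build_model("abracadabra", 2)`, the context `"ab"` has only ever been followed by `r` (twice), so `ppm_probability(model, "ab", "r", 2, 256)` is 2 / 3.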
Compression Techniques for Chinese Text
Cited by 2 (0 self)
With the growth of digital libraries and the Internet, large volumes of text are available in electronic form. The majority of this text is English, but other languages are increasingly well represented, including large-alphabet languages such as Chinese. It is thus attractive to compress text written in large-alphabet languages, but general-purpose compression utilities are not particularly effective for this application. In this paper we survey proposals for compressing Chinese text, then examine in detail the application to Chinese text of the prediction by partial matching (PPM) compression technique. We propose several refinements to PPM to make it more effective for Chinese text and, on our publicly available test corpus of around 50 Mb of Chinese text documents, show that these refinements can significantly improve compression performance while using only a limited volume of memory.
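One way to see why alphabet size matters (an illustration we add, not an argument from the abstract): a byte-oriented compressor sees each common Chinese character as three separate symbols in UTF-8, while a character-oriented model sees one symbol drawn from a much larger alphabet. The helper `alphabet_stats` is hypothetical, and UTF-8 is assumed only for the example; the paper's corpus may use a different encoding.

```python
def alphabet_stats(text):
    """Compare the symbols seen by a byte-oriented model with those seen by a
    character-oriented model, assuming the text is stored as UTF-8."""
    data = text.encode("utf-8")
    return {
        "characters": len(text),
        "utf8_bytes": len(data),
        "distinct_characters": len(set(text)),
        "distinct_bytes": len(set(data)),
    }


print(alphabet_stats("中文文本"))
```

For the four-character sample, the byte model must predict 12 symbols, so a fixed-order byte context spans far fewer characters than the same order would in English text.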
Static Compression for Dynamic Texts
- Proc. IEEE Data Compression Conference
, 1994
Cited by 1 (1 self)
Two problems arise when semi-static word-based compression methods are applied to large texts, such as those stored in information retrieval systems. First, the space required for the model during decoding can become very large. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained. Here we show that with careful management the impact of both of these drawbacks can be minimised. Experiments with a word-based model and over 500 Mb of text show that compression rates can be retained even in the face of severe memory limitations on the decoder, and in the face of significant expansion in the size of the text itself.
1 Word-Based Compression
The use of a word-based zero-order compression model to represent English text has been considered by several authors [2, 6, 7, 15]. It is particularly appropriate for compressing full-text document collections, an application in which very large quant...
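A semi-static zero-order word model makes two passes: the first counts word frequencies to fix the model, and the second codes each word with that fixed model. The sketch below assumes whitespace tokenization and charges each word its ideal code length, -log2(f / N), as a stand-in for the Huffman or arithmetic codes a real system would use; it ignores the separators between words. `semi_static_word_cost` is a hypothetical name. The vocabulary it returns is the table the decoder must keep in memory, which is the first problem the abstract raises.

```python
import math
from collections import Counter


def semi_static_word_cost(text):
    """Return (ideal total bits, vocabulary size) for a zero-order word model."""
    words = text.split()
    freq = Counter(words)  # pass 1: static model the decoder must also hold
    n = len(words)
    lengths = {w: -math.log2(f / n) for w, f in freq.items()}
    total_bits = sum(lengths[w] for w in words)  # pass 2: code every word
    return total_bits, len(freq)


print(semi_static_word_cost("a a b c"))
```

Because the model is fixed after the first pass, words introduced by later document insertions have no code at all, which is why such collections otherwise need periodic recompression.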
On the Relation Between Additive Smoothing and Universal Coding
We analyze the performance of smoothing methods for language modeling from the perspective of universal compression. We use existing asymptotic bounds on the performance of simple additive rules for compression of finite-alphabet memoryless sources to explain the empirical predictive abilities of additive smoothing techniques. We further suggest a smoothing method that overcomes some of the problems observed in previous approaches. The new method outperforms existing ones on the Wall Street Journal (WSJ) database for bigram and trigram models. We then suggest possible directions for future research.
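The link between additive smoothing and compression can be sketched directly: used sequentially, an add-beta rule is a probability assignment, so it defines a code whose length is the sum of -log2 of each predicted probability. Beta = 1 gives Laplace's rule and beta = 1/2 the Krichevsky-Trofimov estimator. This sketch does not implement the paper's new smoothing method; `additive_code_length` is a hypothetical name.

```python
import math
from collections import Counter


def additive_code_length(sequence, alphabet, beta):
    """Bits needed to encode `sequence` sequentially with the add-beta rule
    P(x) = (count(x) + beta) / (n + beta * |alphabet|), where n is the number
    of symbols already coded; counts are updated after each symbol."""
    counts = Counter()
    bits = 0.0
    for n, x in enumerate(sequence):
        p = (counts[x] + beta) / (n + beta * len(alphabet))
        bits -= math.log2(p)
        counts[x] += 1
    return bits


print(additive_code_length("ab", "ab", 1.0))  # 1 + log2(3) bits
print(additive_code_length("ab", "ab", 0.5))  # 3.0 bits
```

The smaller beta trusts the observed counts more, which is cheaper when the data are skewed and costlier on unseen symbols, the same tension smoothing methods have to balance.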
A Universal Compression Perspective of Smoothing
We analyze smoothing algorithms from a universal-compression perspective. Instead of evaluating their performance on an empirical sample, we analyze their performance on the least favorable (worst-case) sample possible. Consequently, the performance of the algorithm can be guaranteed even on unseen data. We show that universal compression bounds can explain the empirical performance of several smoothing methods. We also describe a new interpolated additive smoothing algorithm, and show that it has lower training complexity and better compression performance than existing smoothing techniques. Key words: Language modeling, universal compression, smoothing
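The worst-case view can be computed exactly for a plain add-beta rule on a binary alphabet: the regret of a sequence is its code length minus the code length under the best-fitting Bernoulli model chosen in hindsight, and the worst case is the maximum over all sequences of length n. Since the sequential add-beta probability depends only on the counts, it suffices to scan k = 0..n ones. This is an illustration under those assumptions; it does not reproduce the paper's interpolated algorithm, and the function names are hypothetical.

```python
import math


def log2_additive_prob(k, n, beta):
    """log2 of the probability the sequential add-beta rule assigns to any one
    binary sequence of length n containing k ones (it depends only on k)."""
    ln = (math.lgamma(k + beta) + math.lgamma(n - k + beta) - 2 * math.lgamma(beta)
          + math.lgamma(2 * beta) - math.lgamma(n + 2 * beta))
    return ln / math.log(2)


def log2_max_likelihood(k, n):
    """log2 of (k/n)^k ((n-k)/n)^(n-k), the best Bernoulli fit, with 0 log 0 = 0."""
    return sum(c * math.log2(c / n) for c in (k, n - k) if c > 0)


def worst_case_regret(n, beta):
    """Maximum over all binary sequences of length n of the extra bits paid
    relative to the maximum-likelihood code length."""
    return max(log2_max_likelihood(k, n) - log2_additive_prob(k, n, beta)
               for k in range(n + 1))


print(worst_case_regret(7, 1.0))    # approximately log2(8) = 3
print(worst_case_regret(100, 0.5))  # KT rule: noticeably smaller than Laplace
```

For Laplace's rule the worst case is exactly log2(n + 1), reached by the all-zeros or all-ones sequence, while the Krichevsky-Trofimov rule's worst case grows roughly like half of log2 n plus a constant.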

