Results 1–10 of 25
The LOCO-I Lossless Image Compression Algorithm: Principles and Standardization into JPEG-LS
 IEEE TRANSACTIONS ON IMAGE PROCESSING
, 2000
Abstract

Cited by 152 (10 self)
LOCO-I (LOw COmplexity LOssless COmpression for Images) is the algorithm at the core of the new ISO/ITU standard for lossless and near-lossless compression of continuous-tone images, JPEG-LS. It is conceived as a "low complexity projection" of the universal context modeling paradigm, matching its modeling unit to a simple coding unit. By combining simplicity with the compression potential of context models, the algorithm "enjoys the best of both worlds." It is based on a simple fixed context model, which approaches the capability of the more complex universal techniques for capturing high-order dependencies. The model is tuned for efficient performance in conjunction with an extended family of Golomb-type codes, which are adaptively chosen, and an embedded alphabet extension for coding of low-entropy image regions. LOCO-I attains compression ratios similar or superior to those obtained with state-of-the-art schemes based on arithmetic coding. Moreover, it is within a few percentage points of the best available compression ratios, at a much lower complexity level. We discuss the principles underlying the design of LOCO-I, and its standardization into JPEG-LS.
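The Golomb-type codes mentioned in the abstract can be illustrated with their power-of-two special case (Rice codes), where a non-negative integer is split into a unary-coded quotient and a k-bit remainder. The sketch below is illustrative Python, not the exact JPEG-LS coder:

```python
def rice_encode(n, k):
    """Encode a non-negative integer as a Golomb-Rice codeword (divisor 2^k):
    quotient in unary (1s terminated by a 0), remainder in k binary bits."""
    q, r = divmod(n, 1 << k)
    unary = "1" * q + "0"
    binary = bin(r)[2:].zfill(k) if k > 0 else ""
    return unary + binary

def rice_decode(bits, k):
    """Inverse of rice_encode for a single codeword."""
    q = bits.index("0")                        # length of the unary run
    r = int(bits[q + 1 : q + 1 + k] or "0", 2)
    return (q << k) | r

# JPEG-LS codes prediction residuals; a signed residual e is first mapped
# to a non-negative integer by interleaving: 0, -1, 1, -2, 2, ...
def fold(e):
    return 2 * e if e >= 0 else -2 * e - 1
```

For example, `rice_encode(9, 2)` splits 9 into quotient 2 and remainder 1, giving the codeword "11001"; small k suits sharply peaked residual distributions, which is why JPEG-LS adapts k per context.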
On prediction using variable order Markov models
 JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2004
Abstract

Cited by 56 (1 self)
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
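A minimal example of the evaluation protocol used for such predictors: a fixed-order Markov model with add-alpha smoothing, scored by average log-loss (in bits) under sequential predict-then-update. This is an illustrative sketch, not one of the six algorithms studied in the paper:

```python
import math
from collections import Counter, defaultdict

def avg_log_loss(seq, k=2, alpha=1.0):
    """Average per-symbol log-loss (bits) of an order-k Markov predictor
    with add-alpha smoothing, run sequentially: predict, score, then update."""
    alphabet = sorted(set(seq))
    counts = defaultdict(Counter)          # context -> successor-symbol counts
    loss, n = 0.0, 0
    for i in range(k, len(seq)):
        ctx, sym = seq[i - k : i], seq[i]
        c = counts[ctx]
        p = (c[sym] + alpha) / (sum(c.values()) + alpha * len(alphabet))
        loss -= math.log2(p)
        c[sym] += 1
        n += 1
    return loss / n
```

On a perfectly alternating sequence such as `"ab" * 50` the order-1 loss approaches zero, while less predictable sequences score close to one bit per symbol.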
Inequalities between Entropy and Index of Coincidence derived from Information Diagrams
 IEEE Trans. Inform. Theory
, 2001
Abstract

Cited by 19 (11 self)
To any discrete probability distribution P we can associate its entropy H(P) = -Σ p_i ln p_i and its index of coincidence IC(P) = Σ p_i². The main result of the paper is the determination of the precise range of the map P ↦ (IC(P), H(P)). The range looks much like that of the map P ↦ (P_max, H(P)), where P_max is the maximal point probability, cf. research from 1965 (Kovalevskij [18]) to 1994 (Feder and Merhav [7]). The earlier results, which actually focus on the probability of error 1 − P_max rather than P_max, can be conceived as limiting cases of results obtained by the methods presented here. Ranges of maps such as those indicated are called Information Diagrams. The main result gives rise to precise lower as well as upper bounds for the entropy function. Some of these bounds are essential for the exact solution of certain problems of universal coding and prediction for Bernoulli sources. Other applications concern Shannon theory (relations between various measures of divergence), statistical decision theory and rate distortion theory. Two methods are developed. One is topological; the other involves convex analysis and is based on a "lemma of replacement" which is of independent interest in relation to problems of optimization of mixed type (concave/convex optimization).
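Both quantities are straightforward to compute; for the uniform distribution on n points they hit the extreme corners of the information diagram, H = ln n and IC = 1/n, while a point mass gives H = 0 and IC = 1. A quick illustrative check:

```python
import math

def entropy(p):
    """H(P) = -sum(p_i * ln p_i), natural log as in the paper."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def index_of_coincidence(p):
    """IC(P) = sum(p_i^2)."""
    return sum(pi * pi for pi in p)
```

For example, `entropy([0.25] * 4)` returns ln 4 and `index_of_coincidence([0.25] * 4)` returns 0.25, i.e. 1/n for n = 4.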
How Well do Bayes Methods Work for On-Line Prediction of {±1} Values?
 In Proceedings of the Third NEC Symposium on Computation and Cognition. SIAM
, 1992
Abstract

Cited by 18 (11 self)
We look at sequential classification and regression problems in which {±1}-labeled instances are given online, one at a time, and for each new instance, before seeing the label, the learning system must either predict the label, or estimate the probability that the label is +1. We look at the performance of Bayes methods for this task, as measured by the total number of mistakes for the classification problem, and by the total log loss (or information gain) for the regression problem. Our results are given by comparing the performance of Bayes methods to the performance of a hypothetical "omniscient scientist" who is able to use extra information about the labeling process that would not be available in the standard learning protocol. The results show that Bayes methods perform only slightly worse than the omniscient scientist in many cases. These results generalize previous results of Haussler, Kearns and Schapire, and Opper and Haussler.
Lossless compression of continuous-tone images
 Proc. IEEE
, 2000
Abstract

Cited by 17 (2 self)
In this paper, we survey some of the recent advances in lossless compression of continuous-tone images. The modeling paradigms underlying the state-of-the-art algorithms, and the principles guiding their design, are discussed in a unified manner. The algorithms are described and experimentally compared.
Understanding Probabilistic Classifiers
, 2001
Abstract

Cited by 15 (1 self)
Probabilistic classifiers are developed by assuming generative models which are product distributions over the original attribute space (as in naive Bayes) or more involved spaces (as in general Bayesian networks). While this paradigm has been shown experimentally successful on real-world applications, despite vastly simplified probabilistic assumptions, the question of why these approaches work is still open. This paper resolves this question. We show that almost all joint distributions with a given set of marginals (i.e., all distributions that could have given rise to the classifier learned) or, equivalently, almost all data sets that yield this set of marginals, are very close (in terms of distributional distance) to the product distribution on the marginals; the number of these distributions goes down exponentially with their distance from the product distribution. Consequently, as we show, for almost all joint distributions with this set of marginals, the penalty incurred in using the marginal distribution rather than the true one is small. In addition to resolving the puzzle surrounding the success of probabilistic classifiers our results contribute to understanding the trade-offs in developing probabilistic classifiers and will help in developing better classifiers.
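The paper's central object, the product distribution on the marginals, is easy to contrast with a joint distribution directly. An illustrative sketch (the helper names and toy distributions are ours, not the paper's):

```python
def product_of_marginals(joint):
    """Given a joint distribution over pairs {(x, y): p}, return the
    product of its two marginals, the naive-Bayes-style approximation."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return {(x, y): px[x] * py[y] for x in px for y in py}

def tv_distance(p, q):
    """Total variation distance between two distributions over the same keys."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

An independent joint is reproduced exactly (distance 0), while the perfectly correlated joint {(0,0): 0.5, (1,1): 0.5} sits at total variation distance 0.5 from the product of its uniform marginals.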
Sequential prediction and ranking in universal context modeling and data compression
 IEEE Trans. Inform. Theory
, 1997
Abstract

Cited by 7 (4 self)
Prediction is one of the oldest and most successful tools in the data compression practitioner's toolbox. It is particularly useful in situations where the data (e.g., a digital image) originates from a natural physical process (e.g., sensed light), and the data samples (e.g., real numbers) represent a continuously varying physical magnitude (e.g., brightness). In these cases, the value of the next sample can often be accurately
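The point about continuously varying samples can be made concrete: even the trivial previous-sample predictor concentrates residuals near zero, so their empirical entropy drops well below that of the raw samples. An illustrative sketch (the synthetic signal is ours):

```python
import math
from collections import Counter

def empirical_entropy(xs):
    """Empirical (plug-in) entropy of a sample, in bits per symbol."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())

# A smooth synthetic "sensor" signal: many distinct raw values, but the
# previous-sample predictor leaves small residuals concentrated near zero.
signal = [round(50 + 40 * math.sin(i / 10)) for i in range(500)]
residuals = [b - a for a, b in zip(signal, signal[1:])]
```

Here `empirical_entropy(residuals)` comes out far below `empirical_entropy(signal)`, which is why predictive coders entropy-code residuals rather than raw sample values.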
Can Entropy Characterize Performance of Online Algorithms?
 in Symposium on Discrete Algorithms
, 2001
Abstract

Cited by 6 (1 self)
We focus in this work on an aspect of online computation that is not addressed by the standard competitive analysis, namely, identifying request sequences for which nontrivial online algorithms are useful versus request sequences for which all algorithms perform equally badly. The motivation for this work is advanced system and architecture designs which allow the operating system to dynamically allocate resources to online protocols such as prefetching and caching. To utilize these features the operating system needs to identify data streams that can benefit from more resources. Our approach in this work is based on the relation between entropy, compression and gambling, extensively studied in information theory. It has been shown that in some settings entropy can either fully or at least partially characterize the expected outcome of an iterative gambling game. Viewing an online problem with stochastic input as an iterative gambling game, our goal is to study the extent to which the entropy of the input characterizes the expected performance of online algorithms for problems that arise in computer applications. We study bounds based on entropy for three online problems: list accessing, prefetching and caching. We show that entropy is a good performance characterizer for prefetching, but not so good a characterizer for online caching. Our work raises several open questions in using entropy as a predictor in online computation. (Computer Science Department, Brown University, Box 1910, Providence, RI 02912-1910, USA. Email: {gopal, eli}@cs.brown.edu. Supported in part by NSF grant CCR-9731477. A preliminary version of this paper appeared in the proceedings of the 12th annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Washington D.C., 2001.)
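The prefetching setting admits a tiny experiment: a first-order predictor prefetches the most frequent successor of the current page, and predictable (low-entropy) request streams yield high hit rates. An illustrative sketch, not the paper's model:

```python
from collections import Counter, defaultdict

def prefetch_hit_rate(requests):
    """Prefetch the most frequent successor of the current page (first-order
    model, learned online); count how often it matches the next request."""
    succ = defaultdict(Counter)   # page -> counts of pages seen right after it
    hits = total = 0
    prev = None
    for r in requests:
        if prev is not None:
            if succ[prev]:
                total += 1
                hits += succ[prev].most_common(1)[0][0] == r
            succ[prev][r] += 1
        prev = r
    return hits / total if total else 0.0
```

A cyclic stream such as `"abc" * 20` has zero conditional entropy after the first cycle and the hit rate goes to 1, whereas streams whose successor distributions are close to uniform give the prefetcher little to work with.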
Explanation-Based Feature Construction
Abstract

Cited by 4 (0 self)
Choosing good features to represent objects can be crucial to the success of supervised machine learning algorithms. Good high-level features are those that concentrate information about the classification task. Such features can often be constructed as nonlinear combinations of raw or native input features such as the pixels of an image. Using many nonlinear combinations, as do SVMs, can dilute the classification information, necessitating many training examples. On the other hand, searching even a modestly expressive space of nonlinear functions for high-information ones can be intractable. We describe an approach to feature construction where task-relevant discriminative features are automatically constructed, guided by an explanation-based interaction of training examples and prior domain knowledge. We show that in the challenging task of distinguishing handwritten Chinese characters, our automatic feature-construction approach performs particularly well on the most difficult and complex character pairs.
An Information-Theory Framework for the Study of the Complexity of Visibility and Radiosity in a Scene
, 2002
Abstract

Cited by 2 (2 self)
... this dissertation. 1.1 Radiosity, Complexity, and Information Theory. The three fundamental pillars of this thesis are radiosity, complexity, and information theory: one of the most important topics in computer graphics is the accurate computation of the global illumination in a closed virtual environment (scene), i.e. the intensities of light over all its surfaces. "The production of realistic images requires in particular a precise treatment of lighting effects that can be achieved by simulating the underlying physical phenomena of light emission, propagation, and reflection" [82]. This type of simulation is called global illumination and is represented by the rendering equation [43], which is a Fredholm integral equation of the second kind. However, obtaining an exact representation of the illumination is an intractable problem. Many different techniques are used to obtain an approximate quantification of it [12, 82, 33].