Universal schemes for sequential decision from individual data sequences
, 1993
Cited by 28 (11 self)
Sequential decision algorithms are investigated, under a family of additive performance criteria, for individual data sequences, with various application areas in information theory and signal processing. Simple universal sequential schemes are known, under certain conditions, to approach optimality uniformly as fast as n^{-1} log n, where n is the sample size. For the case of finite-alphabet observations, the class of schemes that can be implemented by finite-state machines (FSMs) is studied. It is shown that Markovian machines with sufficiently long memory exist that are asymptotically nearly as good as any given FSM (deterministic or randomized) for the purpose of sequential decision. For the continuous-valued observation case, a useful class of parametric schemes is discussed, with special attention to the recursive least squares (RLS) algorithm.
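As a rough illustration of the Markovian machines mentioned in this abstract, the sketch below runs a simple order-k Markov predictor over a binary sequence and counts its prediction errors. The function name `markov_predict_errors`, the majority-vote rule, and the tie-breaking toward 0 are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict

def markov_predict_errors(seq, order):
    """Count sequential prediction errors of an order-`order` Markov predictor.

    Hypothetical helper: for each position, predict the symbol that has
    followed the current context most often so far (ties go to 0).
    """
    counts = defaultdict(lambda: [0, 0])  # context -> [count of 0s, count of 1s]
    errors = 0
    for t, x in enumerate(seq):
        # The context is the last `order` symbols (shorter near the start).
        ctx = tuple(seq[max(0, t - order):t])
        c0, c1 = counts[ctx]
        guess = 1 if c1 > c0 else 0
        errors += guess != x
        counts[ctx][x] += 1  # update only after predicting
    return errors

alternating = [0, 1] * 10
print(markov_predict_errors(alternating, 0))  # memoryless predictor
print(markov_predict_errors(alternating, 1))  # one symbol of memory
```

On an alternating sequence, the order-1 predictor learns the pattern after a single mistake, while the memoryless predictor keeps erring on every second symbol, which is the kind of gap that longer memory is meant to close.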
Context tree estimation for not necessarily finite memory processes, via BIC and MDL
 IEEE Trans. Inf. Theory
, 2006
Cited by 26 (1 self)
The concept of context tree, usually defined for finite memory processes, is extended to arbitrary stationary ergodic processes (with finite alphabet). These context trees are not necessarily complete, and may be of infinite depth. The familiar BIC and MDL principles are shown to provide strongly consistent estimators of the context tree, via optimization of a criterion for hypothetical context trees of finite depth, allowed to grow with the sample size n as o(log n). Algorithms are provided to compute these estimators in O(n) time, and to compute them online for all i ≤ n in o(n log n) time.
Low Complexity Sequential Lossless Coding for Piecewise Stationary Memoryless Sources
 IEEE Transactions on Information Theory
, 1999
Cited by 24 (2 self)
Three strongly sequential, lossless compression schemes, one with linearly growing per-letter computational complexity, and two with fixed per-letter complexity, are presented and analyzed for memoryless sources with abruptly changing statistics. The first method, which improves on Willems’ weighting approach, asymptotically achieves a lower bound on the redundancy, and hence is optimal. The second scheme achieves redundancy of O(log N / N) when the transitions in the statistics are large, and O(log log N / log N) otherwise. The third approach always achieves redundancy of O(log N / N). Obviously, the two fixed-complexity approaches can be easily combined to achieve the better redundancy between the two. Simulation results support the analytical bounds derived for all the coding schemes. Index Terms: Change detection, ideal code length, minimum description length, piecewise-stationary memoryless source, redundancy, segmentation, sequential coding, source block code, strongly sequential coding, transition path, universal coding, weighting.
An information-theoretic approach to detecting changes in multidimensional data streams
 In Proc. Symp. on the Interface of Statistics, Computing Science, and Applications
, 2006
Cited by 22 (1 self)
An important problem in processing large data streams is detecting changes in the underlying distribution that generates the data. The challenge in designing change detection schemes is making them general, scalable, and statistically sound. In this paper, we take a general, information-theoretic approach to the change detection problem, which works for multidimensional as well as categorical data. We use relative entropy, also called the Kullback-Leibler distance, to measure the difference between two given distributions. The KL distance is known to be related to the optimal error in determining whether the two distributions are the same and draws on fundamental results in hypothesis testing. The KL distance also generalizes traditional distance measures in statistics, and has invariance properties that make it ideally suited for comparing distributions. Our scheme is general; it is nonparametric and requires no assumptions on the underlying distributions. It employs a statistical inference procedure based on the theory of bootstrapping, which allows us to determine whether our measurements are statistically significant. The scheme is also quite flexible from a practical perspective; it can be implemented using any spatial partitioning scheme that scales well with dimensionality. In addition to providing change detections, our method generalizes Kulldorff's spatial scan statistic, allowing us to quantitatively identify specific regions in space where large changes have occurred. We provide a detailed experimental study that demonstrates the generality and efficiency of our approach with different kinds of multidimensional datasets, both synthetic and real.
1 Introduction We are collecting and storing data in unprecedented quantities and varieties: streams, images, audio, text, metadata descriptions, and even simple numbers. Over time, these data streams change as the underlying processes that generate them change. Some changes are spurious and pertain to glitches in the data. Some are genuine, caused by changes in the underlying distributions. Some changes are gradual and some are more precipitous. We would like to detect changes in a variety of settings:
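The relative entropy used in this abstract as a change measure can be sketched for categorical data as follows. This is only a minimal illustration, not the paper's scheme: the helper names `smoothed_dist` and `kl_divergence`, the additive smoothing constant `alpha`, and the toy windows are all assumptions, and the bootstrap significance test and spatial partitioning described in the abstract are omitted.

```python
import math
from collections import Counter

def smoothed_dist(samples, support, alpha=0.5):
    """Empirical distribution over `support` with additive smoothing.

    Smoothing (an assumption of this sketch) keeps every probability
    positive, so the KL distance below is always finite.
    """
    counts = Counter(samples)
    total = len(samples) + alpha * len(support)
    return {s: (counts[s] + alpha) / total for s in support}

def kl_divergence(p, q):
    """Relative entropy D(p || q) in bits for two dicts on the same support."""
    return sum(pi * math.log2(pi / q[s]) for s, pi in p.items() if pi > 0)

# Two hypothetical windows of a categorical stream.
support = ["a", "b", "c"]
reference = smoothed_dist(list("aaabbc"), support)
current = smoothed_dist(list("cccbba"), support)
print(kl_divergence(current, reference))  # larger values suggest a change
```

A detector built on this would still need a threshold; the abstract obtains one through bootstrapping rather than a fixed constant.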
Online learning of nonstationary sequences
 In Advances in Neural Information Processing Systems
, 2003
Cited by 22 (4 self)
We consider an online learning scenario in which the learner can make predictions on the basis of a fixed set of experts. The performance of each expert may change over time in a manner unknown to the learner. We formulate a class of universal learning algorithms for this problem by expressing them as simple Bayesian algorithms operating on models analogous to Hidden Markov Models (HMMs). We derive a new performance bound for such algorithms which is considerably simpler than existing bounds. The bound provides the basis for learning the rate at which the identity of the optimal expert switches over time. We find an analytic expression for the a priori resolution at which we need to learn the rate parameter. We extend our scalar switching-rate result to models of the switching rate that are governed by a matrix of parameters, i.e. arbitrary homogeneous HMMs. We apply and examine our algorithm in the context of the problem of energy management in
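To make the HMM view of switching experts in this abstract concrete, here is a minimal sketch of one step of a fixed-share style update: a Bayesian reweighting by each expert's likelihood, followed by a transition that moves a fraction `alpha` of each expert's weight uniformly to the others. The function name, the uniform transition, and the fixed scalar `alpha` are assumptions of this example; the paper itself concerns learning the switching rate, which is not shown here.

```python
def switching_experts_step(weights, likelihoods, alpha):
    """One update of the weights over experts (hypothetical helper).

    weights     -- current probabilities over experts (sum to 1)
    likelihoods -- probability each expert assigned to the observed outcome
    alpha       -- assumed probability that the best expert switches per step
    Requires at least two experts.
    """
    n = len(weights)
    # Bayes step: reweight each expert by how well it predicted.
    posterior = [w * l for w, l in zip(weights, likelihoods)]
    z = sum(posterior)
    posterior = [p / z for p in posterior]
    # HMM transition: stay with prob. 1 - alpha, else jump uniformly elsewhere.
    return [(1 - alpha) * p + alpha * (1 - p) / (n - 1) for p in posterior]

w = [0.5, 0.5]
w = switching_experts_step(w, likelihoods=[0.9, 0.1], alpha=0.1)
print(w)
```

With `alpha = 0` this reduces to plain Bayesian averaging over static experts; a positive `alpha` keeps some weight on every expert so the learner can recover after a switch.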
Adaptive Mixtures of Probabilistic Transducers
 Neural Computation
, 1996
Cited by 19 (3 self)
We describe and analyze a mixture model for supervised learning of probabilistic transducers. We devise an online learning algorithm that efficiently infers the structure and estimates the parameters of each probabilistic transducer in the mixture. Theoretical analysis and comparative simulations indicate that the learning algorithm tracks the best transducer from an arbitrarily large (possibly infinite) pool of models. We also present an application of the model for inducing a noun phrase recognizer.
1 Introduction Supervised learning of probabilistic mappings between temporal sequences is an important goal of natural data analysis and classification with a broad range of applications, including handwriting and speech recognition, natural language processing and biological sequence analysis. Research efforts in supervised learning of probabilistic mappings have been almost exclusively focused on estimating the parameters of a predefined model. For example, Giles et al. (1992) used a...
Universal Linear Least Squares Prediction: Upper and Lower Bounds
 IEEE Trans. Inf. Theory
, 2002
Cited by 17 (12 self)
Universal linear least squares prediction of real-valued bounded individual sequences in the presence of additive bounded noise is considered. It is shown that there is a sequential predictor observing noisy samples of the sequence to be predicted only, whose loss in terms of the noise-free sequence is asymptotically as small as that of the best batch predictor out of the class of all linear predictors with knowledge of the entire noisy sequence in advance. Index Terms: Prediction, least squares, linear, noise
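For orientation, the sketch below shows a basic sequential linear least squares predictor of order one: at each step it predicts the next sample as `w` times the previous sample, where `w` is the regularized least squares fit to the samples seen so far. It does not handle the noisy-observation setting analyzed in the paper, and the function name, the first-order model, and the regularization constant `delta` are assumptions made for this example.

```python
def sequential_ls_predict(xs, delta=1.0):
    """Return one-step-ahead predictions for the sequence `xs`.

    Hypothetical first-order predictor: x_hat[t] = w[t] * x[t-1], with
    w[t] = sum(x[i] * x[i-1]) / (delta + sum(x[i-1] ** 2)) over i < t.
    """
    num = 0.0    # running sum of x[i] * x[i-1]
    den = delta  # running sum of x[i-1]^2 plus the regularizer
    prev = 0.0   # sample before the start is taken to be zero
    preds = []
    for x in xs:
        w = num / den             # coefficient fitted to past data only
        preds.append(w * prev)    # predict before seeing x
        num += x * prev           # then update the statistics with x
        den += prev * prev
        prev = x
    return preds

print(sequential_ls_predict([1.0, 1.0, 1.0]))
```

The batch competitor in the abstract would instead pick a single coefficient using the whole sequence in advance; the sequential predictor only ever uses past samples.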
Context Weighting for General Finite-Context Sources
 IEEE Trans. Inform. Theory
, 1996
Cited by 16 (0 self)
Context weighting procedures are presented for sources with models (structures) in four different classes. Although the procedures are designed for universal data compression purposes, their generality allows application in the area of classification.
1 Introduction Recently in [14],[15] the authors introduced context-tree weighting as a sequential universal source coding method for the class of binary (bounded memory) tree sources. Tree sources were defined around the same time by Weinberger et al. [13]. The idea behind context weighting procedures can be summarized as follows: The well-known Elias algorithm (described in e.g. Jelinek [1]) produces, for any coding distribution $P_c(x_1^T)$ over all binary sequences of length $T$, a binary prefix code with codeword lengths $L(x_1^T)$ that satisfy $L(x_1^T) < \log \frac{1}{P_c(x_1^T)} + 2$ for all $x_1^T$. (1) (We assume that the base of the $\log(\cdot)$ is 2. Codeword lengths and information quantities are expressed in bits.) If th...
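The codeword-length bound (1) quoted in this excerpt can be checked numerically with a small sketch. The coding distribution used here is the Krichevsky-Trofimov (KT) estimator for a binary memoryless source, chosen only as an example of a sequentially computable $P_c$; the excerpt does not specify it. The codeword length is taken as $\lceil \log_2 (1/P_c) \rceil + 1$, the standard length for an Elias-type code, and both function names are hypothetical.

```python
import math

def kt_probability(bits):
    """Sequential KT probability of a binary sequence (assumed example P_c)."""
    p = 1.0
    zeros = ones = 0
    for b in bits:
        count = ones if b else zeros
        # KT rule: P(next = b) = (count of b so far + 1/2) / (symbols so far + 1)
        p *= (count + 0.5) / (zeros + ones + 1)
        if b:
            ones += 1
        else:
            zeros += 1
    return p

def codeword_length(p):
    """Length in bits of an Elias-type codeword for a sequence of probability p."""
    return math.ceil(-math.log2(p)) + 1

seq = [0, 1, 1, 0, 1, 1, 1, 0]
p = kt_probability(seq)
length = codeword_length(p)
print(length, -math.log2(p) + 2)  # the length stays below the bound in (1)
```

Because the ceiling adds strictly less than one bit, the resulting length is always strictly below $\log_2 (1/P_c) + 2$, which is the inequality stated in (1).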
Beyond Word N-Grams
, 1995
Cited by 16 (1 self)
We describe, analyze, and experimentally evaluate a new probabilistic model for word sequence prediction in natural languages, based on prediction suffix trees (PSTs). By using efficient data structures, we extend the notion of PST to unbounded vocabularies. We also show how to use a Bayesian approach based on recursive priors over all possible PSTs to efficiently maintain tree mixtures. These mixtures have provably and practically better performance than almost any single model. Finally, we evaluate the model on several corpora. The low perplexity achieved by relatively small PST mixture models suggests that they may be an advantageous alternative, both theoretically and practically, to the widely used n-gram models.
Implementing the Context Tree Weighting Method for Text Compression
 In Data Compression Conference
, 2000
Cited by 14 (2 self)
The context tree weighting method is a universal compression algorithm for FSMX sources. Although it is expected to achieve a good compression ratio in practice, it is difficult to implement, and in many cases implementations serve only to estimate the compression ratio. Willems and Tjalkens showed a practical implementation that uses conditional probabilities rather than block probabilities, but it applies only to binary-alphabet sequences. We extend the method to multi-alphabet sequences and show a simple implementation using PPM techniques. We also propose a method to optimize a parameter of context tree weighting for the binary-alphabet case. Experimental results on texts and DNA sequences show that the performance of PPM can be improved by combining it with context tree weighting and that DNA sequences can be compressed to less than 2.0 bpc.