On prediction using variable order Markov models
Journal of Artificial Intelligence Research, 2004
Cited by 56 (1 self)
This paper is concerned with algorithms for prediction of discrete sequences over a finite alphabet, using variable order Markov models. The class of such algorithms is large and in principle includes any lossless compression algorithm. We focus on six prominent prediction algorithms, including Context Tree Weighting (CTW), Prediction by Partial Match (PPM) and Probabilistic Suffix Trees (PSTs). We discuss the properties of these algorithms and compare their performance using real-life sequences from three domains: proteins, English text and music pieces. The comparison is made with respect to prediction quality as measured by the average log-loss. We also compare classification algorithms based on these predictors with respect to a number of large protein classification tasks. Our results indicate that a “decomposed” CTW (a variant of the CTW algorithm) and PPM outperform all other algorithms in sequence prediction tasks. Somewhat surprisingly, a different algorithm, which is a modification of the Lempel-Ziv compression algorithm, significantly outperforms all algorithms on the protein classification problems.
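As a rough illustration of the evaluation measure named in this abstract, the sketch below computes the average log-loss (in bits per symbol) of a simple fixed-order Markov predictor. The function name `average_log_loss`, the add-one smoothing, and the fixed `order` parameter are assumptions made for the example; this predictor is not one of the six algorithms the paper compares.

```python
import math
from collections import defaultdict


def average_log_loss(train, test, alphabet, order=2):
    """Average log-loss, in bits per symbol, of a fixed-order Markov predictor.

    Illustrative assumption: probabilities come from add-one (Laplace)
    smoothed counts, which is not a method taken from the paper.
    """
    # counts[context][symbol] = how often `symbol` followed `context` in training
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(train)):
        context = train[max(0, i - order):i]
        counts[context][train[i]] += 1

    total = 0.0
    for i in range(len(test)):
        context = test[max(0, i - order):i]
        seen = counts[context]
        # Smoothed conditional probability of the symbol that actually occurred
        p = (seen[test[i]] + 1) / (sum(seen.values()) + len(alphabet))
        total -= math.log2(p)
    return total / len(test)


if __name__ == "__main__":
    print(average_log_loss("abababab", "abab", alphabet="ab", order=1))
```

A lower value means the predictor assigned higher probability to the symbols that actually occurred; a predictor that has seen no training data scores log2 of the alphabet size.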
Context tree estimation for not necessarily finite memory processes, via BIC and MDL
IEEE Trans. Inf. Theory, 2006
Cited by 25 (1 self)
The concept of context tree, usually defined for finite memory processes, is extended to arbitrary stationary ergodic processes (with finite alphabet). These context trees are not necessarily complete, and may be of infinite depth. The familiar BIC and MDL principles are shown to provide strongly consistent estimators of the context tree, via optimization of a criterion for hypothetical context trees of finite depth, allowed to grow with the sample size n as o(log n). Algorithms are provided to compute these estimators in O(n) time, and to compute them online for all i ≤ n in o(n log n) time.
Efficient Universal Lossless Data Compression Algorithms Based on a Greedy Sequential Grammar Transform - Part One: Without Context Models
IEEE Transactions on Information Theory, 2000
Cited by 21 (4 self)
A grammar transform is a transformation that converts any data sequence to be compressed into a grammar from which the original data sequence can be fully reconstructed. In a grammar-based code, a data sequence is first converted into a grammar by a grammar transform and then losslessly encoded. In this paper, a greedy grammar transform is first presented; this grammar transform constructs sequentially a sequence of irreducible grammars from which the original data sequence can be recovered incrementally. Based on this grammar transform, three universal lossless data compression algorithms, a sequential algorithm, an improved sequential algorithm, and a hierarchical algorithm, are then developed. These algorithms combine the power of arithmetic coding with that of string matching. It is shown that these algorithms are all universal in the sense that they can achieve asymptotically the entropy rate of any stationary, ergodic source. Moreover, it is proved that their worst-case redundancies among all individual sequences of length n are upper-bounded by c log log n / log n, where c is a constant. Simulation results show that the proposed algorithms outperform the Unix Compress and Gzip algorithms, which are based on LZ78 and LZ77, respectively.
Schemes for Bi-Directional Modeling of Discrete Stationary Sources
2005
Cited by 14 (9 self)
Adaptive models are developed to deal with bidirectional modeling of unknown discrete stationary sources, which can be generally applied to statistical inference problems such as noncausal universal discrete denoising that exploits bidirectional dependencies. Efficient algorithms for constructing those models are developed and implemented. Denoising is a primary focus of the application of those models, and we compare their performance to that of the DUDE algorithm [1] for universal discrete denoising.
A Monte Carlo AIXI Approximation
J. Artif. Intell. Res.
Cited by 13 (6 self)
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
Linear Time Universal Coding and Time Reversal of Tree Sources via FSM Closure
IEEE Trans. Inform. Theory, 2004
Cited by 13 (2 self)
Tree models are efficient parametrizations of finite-memory processes, offering potentially significant model cost savings. The information theory literature has focused mostly on redundancy aspects of the universal estimation and coding of these models. In this paper, we investigate representations and supporting data structures for finite-memory processes, as well as the major impact these structures have on the computational complexity of the universal algorithms in which they are used. We first generalize the class of tree models, and then define and investigate the properties of the finite state machine (FSM) closure of a tree, which is the smallest FSM that generates all the processes generated by the tree. The interaction between FSM closures, generalized context trees, and classical data structures such as compact suffix trees brings together the information-theoretic and the computational aspects, leading to an implementation in linear encoding/decoding time of the semipredictive approach to the Context algorithm, a lossless universal coding scheme in the class of tree models. An optimal context selection rule and the corresponding context transitions are computationally not more expensive than the various steps involved in the implementation of the Burrows-Wheeler transform (BWT) and use, in fact, similar tools. We also present a reversible transform that displays the same "context deinterleaving" feature as the BWT but is naturally based on an optimal context tree. FSM closures are also applied to an investigation of the effect of time reversal on tree models, motivated in part by the following question: When compressing a data sequence using a universal scheme in the class of tree models, can it make a difference whether we read the sequence from...
Lossless compression based on the Sequence Memoizer
In Data Compression Conference, 2010
Cited by 11 (4 self)
 Add to MetaCart
In this work we describe a sequence compression method based on combining a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [2009] in the context of language modelling, allows modelling of long-range dependencies by allowing conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, but is particularly effective in compressing data that exhibits power law properties.
A Monte-Carlo AIXI Approximation
2009
Cited by 11 (5 self)
This paper describes a computationally feasible approximation to the AIXI agent, a universal reinforcement learning agent for arbitrary environments. AIXI is scaled down in two key ways: First, the class of environment models is restricted to all prediction suffix trees of a fixed maximum depth. This allows a Bayesian mixture of environment models to be computed in time proportional to the logarithm of the size of the model class. Secondly, the finite-horizon expectimax search is approximated by an asymptotically convergent Monte Carlo Tree Search technique. This scaled-down AIXI agent is empirically shown to be effective on a wide class of toy problem domains, ranging from simple fully observable games to small POMDPs. We explore the limits of this approximate agent and propose a general heuristic framework for scaling this technique to much larger problems.
An O(n) semipredictive universal encoder via the BWT
IEEE Trans. Inform. Theory, 2004
Cited by 8 (2 self)
We provide an O(N) algorithm for a non-sequential semi-predictive encoder whose pointwise redundancy with respect to any (unbounded depth) tree source is O(1) bits per state above Rissanen’s lower bound. This is achieved by using the Burrows-Wheeler transform (BWT), an invertible permutation transform that has been suggested for lossless data compression. First, we use the BWT only as an efficient computational tool for pruning context trees, and encode the input sequence rather than the BWT output. Second, we estimate the minimum description length (MDL) source by incorporating suffix tree methods to construct the unbounded depth context tree that corresponds to the input sequence in O(N) time. Third, we point out that a variety of previous source coding methods required superlinear complexity for determining which tree source state generated each of the symbols of the input. We show how backtracking from the BWT output to the input sequence makes it possible to solve this problem in O(N) worst-case complexity.
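For readers unfamiliar with the transform this abstract relies on, here is a minimal sketch of the Burrows-Wheeler transform and its inverse. It is a naive textbook version based on sorting rotations, so it runs in superlinear time and does not reproduce the paper's O(N) suffix-tree construction. The function names and the `$` end-of-text sentinel are assumptions made for the example.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform: last column of the sorted rotations.

    Illustrative only: sorting all rotations is superlinear, unlike the
    linear-time methods discussed in the abstract.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    # The original string is the rotation that ends with the sentinel
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

The output groups symbols that share a following context, which is why the transform is useful as a tool for context-based modeling.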
Estimating the Secrecy-Rate of Physical Unclonable Functions with the Context-Tree Weighting Method
In IEEE International Symposium on Information Theory, 2006
Cited by 6 (2 self)
We propose methods to estimate the secrecy-rate of fuzzy sources (e.g. biometrics and Physical Unclonable Functions (PUFs)) using context-tree weighting (CTW, Willems et al. [1995]). In this paper we focus on PUFs. In order to show that our estimates are realistic we first generalize Maurer’s [1993] result to the ergodic case. Then we focus on the fact that the entropy of a stationary two-dimensional structure is a limit of a series of conditional entropies, a result by Anastassiou and Sakrison [1982]. We extend this result to the conditional entropy of one two-dimensional structure given another one. Finally we show that the general CTW-method approaches the source entropy also in the two-dimensional stationary case. We further extend this result to the two-dimensional conditional entropy. Based on the obtained results we do several measurements on (our) optical PUFs. These measurements allow us to conclude that a secrecy-rate of 0.3 bit/location is possible.
I. GENERATING A SHARED SECRET KEY
A shared secret key can be produced by two terminals if these terminals observe dependent sequences and at least one of the terminals is allowed to transmit a message to the other one. Although the transmitted message is public, it need not reveal information about the secret key that is generated. This concept was described by Maurer [6] and was...