Results 1-10 of 71
Bootstraps for Time Series
, 1999
Cited by 57 (4 self)
We compare and review block, sieve and local bootstraps for time series and thereby illuminate theoretical facts as well as performance on finite-sample data. Our (re)view is selective with the intention to get a new and fair picture about some particular aspects of bootstrapping time series. The generality of the block bootstrap is contrasted with sieve bootstraps. We discuss implementational dis/advantages and argue that two types of sieves outperform the block method, each of them in its own important niche, namely linear and categorical processes, respectively. Local bootstraps, designed for nonparametric smoothing problems, are easy to use and implement but exhibit low performance in some cases. Key words and phrases. Autoregression, block bootstrap, categorical time series, context algorithm, double bootstrap, linear process, local bootstrap, Markov chain, sieve bootstrap, stationary process. 1 Introduction Bootstrapping can be viewed as simulating a statistic or statistical pro...
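As a rough illustration of the block idea mentioned in this abstract, the sketch below resamples a series by concatenating randomly chosen contiguous blocks. It assumes the moving-block variant with a fixed block length; the function name `moving_block_bootstrap` and its parameters are hypothetical and not taken from the paper.

```python
import random


def moving_block_bootstrap(series, block_len, rng=None):
    """Return one resampled series of the same length as `series`.

    Illustrative sketch only: overlapping blocks of fixed length are drawn
    uniformly at random with replacement and glued together, which keeps
    the dependence structure inside each block.
    """
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    starts = range(n - block_len + 1)  # every admissible block start
    resampled = []
    while len(resampled) < n:
        s = rng.choice(starts)
        resampled.extend(series[s:s + block_len])
    return resampled[:n]  # trim the last block to the original length
```

A bootstrap distribution of a statistic would then be approximated by calling this function many times and recomputing the statistic on each resample. The block length is a tuning choice that this sketch leaves to the caller.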
Computational mechanics: Pattern and prediction, structure and simplicity
 Journal of Statistical Physics
, 1999
Cited by 44 (8 self)
Computational mechanics, an approach to structural complexity, defines a process’s causal states and gives a procedure for finding them. We show that the causal-state representation, an ε-machine, is the minimal one consistent with
Architectural Bias in Recurrent Neural Networks - Fractal Analysis
 IEEE TRANSACTIONS ON NEURAL NETWORKS
Cited by 33 (7 self)
We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased towards Markov models, i.e. even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky & Benuskova, 2002; Tino, Cernansky & Benuskova, 2002a). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this paper we further extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, e.g. when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box-counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
Predicting the Future of Discrete Sequences From Fractal Representations of the Past
, 2001
Cited by 29 (10 self)
We propose a novel approach for building finite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by first transforming the n-block structure of the training sequence into a geometric structure of points in a unit hypercube, such that the longer the common suffix shared by any two n-blocks, the closer their point representations lie.
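The suffix-to-distance property described in this abstract can be illustrated with a contractive map of the kind used in chaos game representations. The sketch below is an assumption made for illustration, not the authors' stated construction: the function name `block_to_point`, the contraction ratio `k`, and the assignment of symbols to hypercube corners are all hypothetical.

```python
import math
from itertools import product


def block_to_point(block, alphabet, k=0.5):
    """Map a symbol block to a point in the unit hypercube.

    Illustrative sketch only: each symbol pulls the current point towards
    that symbol's corner with contraction ratio k, so the most recent
    symbols (the suffix) dominate the final position.
    """
    dim = max(1, math.ceil(math.log2(len(alphabet))))
    # Assign each symbol a distinct corner of the dim-dimensional cube.
    corners = dict(zip(alphabet, product((0.0, 1.0), repeat=dim)))
    x = [0.5] * dim  # start at the centre of the cube
    for symbol in block:
        corner = corners[symbol]
        x = [k * xi + (1 - k) * ci for xi, ci in zip(x, corner)]
    return tuple(x)


p = block_to_point("abba", "abcd")
q = block_to_point("cbba", "abcd")  # shares the suffix "bba" with p
r = block_to_point("ccba", "abcd")  # shares only the suffix "ba" with p
print(math.dist(p, q) < math.dist(p, r))
```

Under this map, two blocks with a common suffix of length L end up at most `k**L` times the cube's diagonal apart, which is why longer shared suffixes give closer points.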
Blind construction of optimal nonlinear recursive predictors for discrete sequences
 In “Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference”
, 2004
Cited by 26 (2 self)
We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which approximates the ideal predictor from data. We discuss the reliability of CSSR, its data requirements, and its performance in simulations. Finally, we compare our approach to existing methods using variable-length Markov models and cross-validated hidden Markov models, and show theoretically and experimentally that our method delivers results superior to the former and at least comparable to the latter.
Context tree estimation for not necessarily finite memory processes, via BIC and MDL
 IEEE Trans. Inf. Theory
, 2006
Cited by 26 (1 self)
The concept of context tree, usually defined for finite memory processes, is extended to arbitrary stationary ergodic processes (with finite alphabet). These context trees are not necessarily complete, and may be of infinite depth. The familiar BIC and MDL principles are shown to provide strongly consistent estimators of the context tree, via optimization of a criterion for hypothetical context trees of finite depth, allowed to grow with the sample size n as o(log n). Algorithms are provided to compute these estimators in O(n) time, and to compute them online for all i ≤ n in o(n log n) time.
Recurrent Neural Networks With Small Weights Implement Definite Memory Machines
 NEURAL COMPUTATION
, 2003
Cited by 21 (5 self)
Recent experimental studies indicate that recurrent neural networks initialized with 'small' weights are inherently biased towards definite memory machines (Tino, Cernansky, Benuskova, 2002a; Tino, Cernansky, Benuskova, 2002b). This paper establishes a theoretical counterpart: the transition function of a recurrent network with small weights and a 'squashing' activation function is a contraction. We prove that recurrent networks with contractive transition function can be approximated arbitrarily well on input sequences of unbounded length by a definite mem
The mixture transition distribution model for high-order Markov chains and non-Gaussian time series
 Statistical Science
, 2002
Cited by 19 (2 self)
The mixture transition distribution model (MTD) was introduced in 1985 by Raftery for the modeling of high-order Markov chains with a finite state space. Since then it has been generalized and successfully applied to a range of situations, including the analysis of wind directions, DNA sequences and social behavior. Here we review the MTD model and the developments since 1985. We first introduce the basic principle and then we present several extensions, including general state spaces and spatial statistics. Following that, we review methods for estimating the model parameters. Finally, a review of different types of applications shows the practical interest of the MTD model. Key words and phrases: Mixture transition distribution (MTD) model, Markov chains, high-order dependences, time series, GMTD model, EM algorithm,
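For orientation, the basic principle of the MTD model can be written as a mixture of lag-specific contributions drawn from a single transition matrix. The notation below (order \ell, weights \lambda_g, matrix Q = [q_{ij}], state-space size m) is chosen here for illustration and may differ from the paper's.

```latex
% Basic MTD model of order \ell on a finite state space of size m:
% each lag g contributes through the same m x m transition matrix Q.
P(X_t = i_0 \mid X_{t-1} = i_1, \dots, X_{t-\ell} = i_\ell)
  = \sum_{g=1}^{\ell} \lambda_g \, q_{i_g i_0},
\qquad
\lambda_g \ge 0, \quad \sum_{g=1}^{\ell} \lambda_g = 1 .
```

In this form the model has m(m-1) free parameters in Q plus \ell - 1 free weights, compared with m^{\ell}(m-1) for a fully parameterized chain of order \ell.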
Transdimensional Markov Chains: A Decade of Progress and Future Perspectives
 Journal of the American Statistical Association
, 2005
Cited by 18 (2 self)
The last ten years have witnessed the development of sampling frameworks that permit the construction of Markov chains which simultaneously traverse both parameter and model space. In this time substantial methodological progress has been made. In this article we present a survey of the current state of the art and evaluate some of the most recent advances in this field. We also discuss future research perspectives in the context of the drive to develop sampling mechanisms with high degrees of both efficiency and automation.
Schemes for Bi-Directional Modeling of Discrete Stationary Sources
, 2005
Cited by 14 (9 self)
Adaptive models are developed to deal with bidirectional modeling of unknown discrete stationary sources, which can be generally applied to statistical inference problems such as noncausal universal discrete denoising that exploits bidirectional dependencies. Efficient algorithms for constructing those models are developed and implemented. Denoising is a primary focus of the application of those models, and we compare their performance to that of the DUDE algorithm [1] for universal discrete denoising.