Application of Kolmogorov complexity and universal codes to identity testing and nonparametric testing of serial independence for time series
, 2006
Clustering processes
Cited by 6 (6 self)
The problem of clustering is considered, for the case when each data point is a sample generated by a stationary ergodic process. We propose a very natural asymptotic notion of consistency, and show that simple consistent algorithms exist, under most general nonparametric assumptions. The notion of consistency is as follows: two samples should be put into the same cluster if and only if they were generated by the same distribution. With this notion of consistency, clustering generalizes such classical statistical problems as homogeneity testing and process classification. We show that, for the case of a known number of clusters, consistency can be achieved under the only assumption that the joint distribution of the data is stationary ergodic (no parametric or Markovian assumptions, no assumptions of independence, neither between nor within the samples). If the number of clusters is unknown, consistency can be achieved under appropriate assumptions on the mixing rates of the processes. In both cases we give examples of simple (at most quadratic in each argument) algorithms which are consistent.
On Finding Predictors for Arbitrary Families of Processes
Cited by 2 (2 self)
The problem is sequence prediction in the following setting. A sequence x1, ..., xn, ... of discrete-valued observations is generated according to some unknown probabilistic law (measure) µ. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure µ belongs to an arbitrary but known class C of stochastic process measures. We are interested in predictors ρ whose conditional probabilities converge (in some sense) to the “true” µ-conditional probabilities, if any µ ∈ C is chosen to generate the sequence. The contribution of this work is in characterizing the families C for which such predictors exist, and in providing a specific and simple form in which to look for a solution. We show that if any predictor works, then there exists a Bayesian predictor, whose prior is discrete, and which works too. We also find several sufficient and necessary conditions for the existence of a predictor, in terms of topological characterizations of the family C, as well as in terms of local behaviour of the measures in C, which in some cases lead to procedures for constructing such predictors. It should be emphasized that the framework is completely general: the stochastic processes considered are not required to be i.i.d., stationary, or to belong to any parametric or countable family.
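To make the idea of a Bayesian predictor with a discrete prior concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the construction from the paper: the class is taken to be a small finite list of binary-valued models, each model is a hypothetical function mapping the observed history to P(next symbol = 1), and every model is assumed to return probabilities strictly between 0 and 1. The name `bayes_mixture_predictor` is introduced here for illustration only.

```python
import math


def bayes_mixture_predictor(models, prior, sequence):
    """Predict P(next symbol = 1) before each observation of a binary sequence.

    models   -- list of functions: history (list of 0/1) -> P(next = 1),
                assumed to return values strictly inside (0, 1)
    prior    -- list of positive prior weights, one per model (discrete prior)
    sequence -- list of observed symbols, each 0 or 1
    """
    # Posterior weights are kept in log space to avoid underflow.
    log_w = [math.log(p) for p in prior]
    predictions = []
    for t, x in enumerate(sequence):
        history = sequence[:t]
        # Normalise the current posterior weights.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        # Each model's conditional probability that the next symbol is 1.
        probs = [model(history) for model in models]
        # The mixture prediction is the posterior-weighted average.
        predictions.append(sum(wi * pi for wi, pi in zip(w, probs)) / total)
        # Bayes update: multiply each weight by the likelihood of x.
        for i, pi in enumerate(probs):
            likelihood = pi if x == 1 else 1.0 - pi
            log_w[i] += math.log(likelihood)
    return predictions


def bernoulli(p):
    """Hypothetical i.i.d. model: ignores the history, always predicts p."""
    return lambda history: p
```

For example, with three Bernoulli models (p = 0.2, 0.5, 0.8), a uniform prior and a sequence consisting only of ones, the first prediction is the prior average 0.5 and later predictions approach 0.8 as the posterior concentrates on the best-fitting model.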
Confidence Sets in Time–Series Filtering
Cited by 2 (1 self)
The problem of filtering of finite-alphabet stationary ergodic time series is considered. A method for constructing a confidence set for the (unknown) signal is proposed, such that the resulting set has the following properties. First, it includes the unknown signal with probability γ, where γ is a parameter supplied to the filter. Second, the size of the confidence sets grows exponentially at a rate that is asymptotically equal to the conditional entropy of the signal given the data. Moreover, it is shown that this rate is optimal.
Unsupervised model-free representation learning
Cited by 1 (1 self)
Numerous control and learning problems face the situation where sequences of high-dimensional, highly dependent data are available, but little or no feedback is provided to the learner. In such situations it may be useful to find a concise representation of the input signal that preserves as much as possible of the relevant information. In this work we are interested in problems where the relevant information is in the time-series dependence. Thus, the problem can be formalized as follows. Given a series of observations X0, ..., Xn coming from a large (high-dimensional) space X, find a representation function f mapping X to a finite space Y such that the series f(X0), ..., f(Xn) preserves as much information as possible about the original time-series dependence in X0, ..., Xn. For stationary time series, the function f can be selected as the one maximizing the time-series information I∞(f) = h0(f(X)) − h∞(f(X)), where h0(f(X)) is the Shannon entropy of f(X0) and h∞(f(X)) is the entropy rate of the time series f(X0), ..., f(Xn), .... In this paper we study the functional I∞(f) from the learning-theoretic point of view. Specifically, we provide some uniform approximation results, and study the behaviour of I∞(f) in the problem of optimal control.
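The quantity I∞(f) = h0(f(X)) − h∞(f(X)) can be approximated from data in a simple way. The sketch below is only an illustration and not the estimator studied in the paper: it assumes the entropy rate h∞ is replaced by the order-k conditional entropy h_k = H(k+1 blocks) − H(k blocks), and that all entropies are plug-in (empirical frequency) estimates, which are biased for short samples. The function names and the parameter `k` are introduced here for illustration.

```python
import math
from collections import Counter


def block_entropy(symbols, k):
    """Plug-in Shannon entropy (in bits) of overlapping length-k blocks."""
    if k == 0:
        return 0.0
    blocks = [tuple(symbols[i:i + k]) for i in range(len(symbols) - k + 1)]
    n = len(blocks)
    counts = Counter(blocks)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def time_series_information(xs, f, k=2):
    """Approximate I(f) = h0 - h_inf, using h_k in place of h_inf.

    xs -- observed series X0, ..., Xn
    f  -- representation function mapping each observation to a finite set
    k  -- memory order used to approximate the entropy rate (an assumption)
    """
    ys = [f(x) for x in xs]
    h0 = block_entropy(ys, 1)
    # Conditional entropy of the next symbol given the previous k symbols.
    h_k = block_entropy(ys, k + 1) - block_entropy(ys, k)
    return h0 - h_k
```

As a sanity check, for the strictly alternating series 0, 1, 0, 1, ... with f the identity and k = 1, the marginal entropy is 1 bit and the next symbol is fully determined by the previous one, so the estimate is close to 1 bit. A constant f discards everything and gives 0.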
Time-series information and learning
 In Proc. 2013 IEEE International Symposium on Information Theory
, 2013
Cited by 1 (1 self)
in a large (high-dimensional) space X, we would like to find a function f from X to a small (low-dimensional or finite) space Y such that the time series f(X1), ..., f(Xn), ... retains all the information about the time-series dependence in the original sequence, or as much as possible thereof. This goal is formalized in this work, and it is shown that the target function f can be found as the one that maximizes a certain quantity that can be expressed in terms of entropies of the series (f(Xi))i∈N. This quantity can be estimated empirically, and does not involve estimating the distribution on the original time series (Xi)i∈N.
Uniform hypothesis testing for ergodic time series distributions
, 2011
Given a discrete-valued sample X1, ..., Xn we wish to decide whether it was generated by a distribution belonging to a family H0, or by a distribution belonging to a family H1. In this work we assume that all distributions are stationary ergodic, and do not make any further assumptions (e.g. no independence or mixing rate assumptions). We would like to have a test whose probability of error (both Type I and Type II) is uniformly bounded. More precisely, we require that for each ε there exists a sample size n such that the probability of error is upper-bounded by ε for samples longer than n. We find some necessary and some sufficient conditions on H0 and H1 under which a consistent test (with this notion of consistency) exists. These conditions are topological, with respect to the topology of distributional distance.
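One way to write the consistency requirement above symbolically is sketched below. The notation is an assumption made here for illustration: ψ denotes a test that outputs 0 (accept H0) or 1 (accept H1), and "uniformly bounded" is read as a bound holding simultaneously for every distribution ρ in the respective family, hence the suprema.

```latex
\forall \varepsilon > 0 \;\; \exists n_\varepsilon \;\; \forall n \ge n_\varepsilon :
\quad
\sup_{\rho \in H_0} \rho\bigl(\psi(X_1,\dots,X_n) = 1\bigr) \le \varepsilon
\quad\text{and}\quad
\sup_{\rho \in H_1} \rho\bigl(\psi(X_1,\dots,X_n) = 0\bigr) \le \varepsilon .
```

The first supremum bounds the Type I error and the second the Type II error. The key point is that the sample size n_ε depends only on ε, not on the particular distribution generating the sample.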
Confidence Sets in Time–Series Filtering
 In Proc. 2011 IEEE International Symposium on Information Theory
The problem of filtering of finite-alphabet stationary ergodic time series is considered. A method for constructing a confidence set for the (unknown) signal is proposed, such that the resulting set has the following properties. First, it includes the unknown signal with probability γ, where γ is a parameter supplied to the filter. Second, the size of the confidence sets grows exponentially at a rate that is asymptotically equal to the conditional entropy of the signal given the data. Moreover, it is shown that this rate is optimal.
Novosibirsk
The statistical structure of DNA sequences is of great interest to molecular biology, genetics and the theory of evolution (see Chen and others, GIW99, 1999; Aktulga and others, EURASIP J. of Bioinformatics and Systems Biology, 2007; Li, Computers and Chemistry, 1997). One approach is to model sequences using Markov processes of different orders and then estimate their parameters statistically (see Simons and others, JSPI, 2005). In this paper we first use the test for serial independence from Ryabko, Astola (Stat. Methodology, 2006) to estimate the "memory" (or connectivity) of genetic texts, and second we apply the homogeneity test to a DNA-based problem connected to the phylogenetic system of various organisms.