## Reducing statistical time-series problems to binary classification (2012)

Venue: Neural Information Processing Systems (NIPS), Lake Tahoe, Nevada, United States

Citations: 5 (4 self)

### Citations

6496 | LIBSVM: A Library for Support Vector Machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
- Chang, Lin
Citation Context: ...ng is used, with the telescope distance between samples calculated using an SVM, as described in Section 4. In all experiments, SVM is used with radial basis kernel, with default parameters of libsvm [5]. The parameters $w_k$ in the definition of the telescope distance (Definition 1) are set to $w_k := k^{-2}$. 8.1 Synthetic data. For the artificial setting we have chosen highly-dependent time series distribu...

835 | Convergence of Stochastic Processes
- Pollard
- 1984
Citation Context: ...are any integers in $1..n$ and $l_n = n/t_n$. The parameters $t_n$ should be set according to the values of $\beta$ in order to optimize the bound. One can use similar bounds for classes of finite Pollard dimension [18] or more general bounds expressed in terms of covering numbers, such as those given in [12]. Here we consider classes of finite VC dimension only for the ease of the exposition and for the sake of con...

788 | Dynamic programming algorithm optimization for spoken word recognition
- Sakoe, Chiba
- 1978
Citation Context: ..., labeled TSSVM. All the computation for this experiment takes approximately 6 minutes on a standard laptop. The following methods were used for comparison. First, we used dynamic time warping (DTW) [24], which is a popular baseline approach for time-series clustering. The other two methods in Table 1 are from [10]. The comparison is not fully relevant, since the results in [10] are for different set...

183 | Sulla determinazione empirica di una legge di distribuzione
- Kolmogorov
- 1933
Citation Context: ...) and a set $H$ of measurable functions on $X$, one can define the distance $d_H(P,Q) := \sup_{h \in H} |E_P h - E_Q h|$. This metric has been studied since at least [26]; its special cases include Kolmogorov-Smirnov [15], Kantorovich-Rubinstein [11] and Fortet-Mourier [7] metrics. Note that the distance function so defined may not be measurable; however, it is measurable under mild conditions which we assume when nec...

139 | Detecting change in data streams
- Kifer, Ben-David, et al.
- 2004
Citation Context: ... can be used to reduce various statistical problems to the classification problem. This distance was previously applied to such statistical problems as homogeneity testing and change-point estimation [14]. However, these applications so far have only concerned i.i.d. data, whereas we want to work with highly-dependent time series. Thus, the second building block are the recent results of [1, 2], that ...

101 | The Ergodic Theory of Discrete Sample Paths
- Shields
- 1996
Citation Context: ...of 0 and covariance matrix $\mathrm{Id} \times 1/4$. $N_2$ is the same but with mean 1. If $\alpha$ is irrational then the distribution $\rho(\alpha)$ is stationary ergodic, but does not belong to any simpler natural distribution family [25]. The single-dimensional marginal is the same for all values of $\alpha$. The latter two properties make all parametric and most non-parametric methods inapplicable to this problem. In our experiments, we us...

84 | Prediction of random sequences and universal coding
- Ryabko
- 1988
Citation Context: ...al distance. The empirical distance is based on counting frequencies of bins of decreasing sizes and "telescoping." A similar telescoping trick is used in different problems, e.g. sequence prediction [19]. Another related approach to time-series analysis involves a different reduction, namely, that to data compression [20]. Organisation. Section 2 is preliminary. In Section 3 we introduce and discuss ...

72 | Asymptotically optimal classification for multiple tests with empirically observed statistics
- Gutman
- 1989
Citation Context: ...umed to be stationary ergodic, but no further assumptions are made about them (no independence, mixing or memory assumptions). The three-sample problem for dependent time series has been addressed in [9] for Markov processes and in [23] for stationary ergodic time series. The latter work uses an approach based on the distributional distance. Indeed, to solve this problem it suffices to have consisten...

53 | On the need for on-line learning in brain-computer interfaces
- Millan
- 2004
Citation Context: ...es about 5 min. on a standard laptop. 8.2 Real data. To demonstrate the applicability of the proposed methods to realistic scenarios, we chose the brain-computer interface data from BCI competition III [17]. The dataset consists of (pre-processed) BCI recordings of mental imagery: a person is thinking about one of three subjects (left foot, right foot, a random letter). Originally, each time series cons...

40 | A discriminative framework for clustering via similarity functions.
- Balcan, Blum, et al.
- 2008
Citation Context: ...mated. Our approach here is based on the telescope distance, and thus we use $\hat{D}$. The clustering problem is relatively simple if the target clustering has what is called the strict separation property [4]: every two points in the same target cluster are closer to each other than to any point from a different target cluster. The following statement is an easy corollary of Theorem 1. Theorem 3. Let the ...

36 | Robust reductions from ranking to classification.
- Balcan, Bansal, et al.
- 2008
Citation Context: ... learning problems by reducing them to binary classification. This approach has been applied to many different problems, starting with multi-class classification, and including regression and ranking [3, 16], to give just a few examples. However, all of these problems are formulated in terms of independent and identically distributed (i.i.d.) samples. This is also the assumption underlying the theoretica...

29 | Kernel Change-Point Analysis
- Harchaoui, Bach, et al.
- 2009
Citation Context: ...following methods were used for comparison. First, we used dynamic time warping (DTW) [24], which is a popular baseline approach for time-series clustering. The other two methods in Table 1 are from [10]. The comparison is not fully relevant, since the results in [10] are for different settings; the method KCpA was used in change-point estimation method (a different but also unsupervised setting), an...

26 | Convergence de la répartition empirique vers la répartition théorique
- Fortet, Mourier
- 1953
Citation Context: ...define the distance $d_H(P,Q) := \sup_{h \in H} |E_P h - E_Q h|$. This metric has been studied since at least [26]; its special cases include Kolmogorov-Smirnov [15], Kantorovich-Rubinstein [11] and Fortet-Mourier [7] metrics. Note that the distance function so defined may not be measurable; however, it is measurable under mild conditions which we assume when necessary. In particular, separability of $H$ is a suffic...

25 | Nonparametric statistical inference for ergodic processes.
- Ryabko, Ryabko
- 2010
Citation Context: ... approach to address the problems considered here, as well as some related problems about stationary ergodic time series, is based on (consistent) empirical estimates of the distributional distance, see [23, 21, 13] and [8] about the distributional distance. The empirical distance is based on counting frequencies of bins of decreasing sizes and "telescoping." A similar telescoping trick is used in different prob...

19 | Uniform convergence of Vapnik-Chervonenkis classes under ergodic sampling. The Annals of Probability
- Adams, Nobel
- 2010
Citation Context: ...timation [14]. However, these applications so far have only concerned i.i.d. data, whereas we want to work with highly-dependent time series. Thus, the second building block are the recent results of [1, 2], that show that empirical estimates of $d_H$ are consistent (under certain conditions on $H$) for arbitrary stationary ergodic distributions. This, however, is not enough: evaluating $d_H$ for (stationary er...

19 | Discrimination between B-processes is impossible
- Ryabko
Citation Context: ... ergodic; this is one of the weakest assumptions used in statistics. For homogeneity testing we have to make some mixing assumptions in order to obtain consistency results (this is indeed unavoidable [22]). Mixing conditions are also used to obtain finite-sample performance guarantees for the first two problems. The proposed approach is based on a new distance between time-series distributions (that i...

17 | Compression-based methods for nonparametric prediction and estimation of some characteristics of time series
- Ryabko
Citation Context: ...milar telescoping trick is used in different problems, e.g. sequence prediction [19]. Another related approach to time-series analysis involves a different reduction, namely, that to data compression [20]. Organisation. Section 2 is preliminary. In Section 3 we introduce and discuss the telescope distance. Section 4 explains how this distance can be calculated using binary classification methods. Sect...

17 | Clustering processes.
- Ryabko
- 2010
Citation Context: ... approach to address the problems considered here, as well as some related problems about stationary ergodic time series, is based on (consistent) empirical estimates of the distributional distance, see [23, 21, 13] and [8] about the distributional distance. The empirical distance is based on counting frequencies of bins of decreasing sizes and "telescoping." A similar telescoping trick is used in different prob...

13 | Rates of uniform convergence of empirical means with mixing processes. Statistics and Probability Letters
- Karandikar, Vidyasagar
- 2002
Citation Context: ...the weights $w_k$ in Definition 1 of $\hat{D}$, $w_k := 2^{-k}$. (8) The general tool that we use to obtain performance guarantees in this section is the following bound that can be obtained from the results of [12].

$$q_n(\rho, H_k, \varepsilon) := \rho\left( \sup_{h \in H_k} \left| \frac{1}{n-k+1} \sum_{i=1}^{n-k+1} h(X_{i..i+k-1}) - E_\rho h(X_{1..k}) \right| > \varepsilon \right) \le n\beta(\rho, t_n - k) + 8 t_n^{d_k+1} e^{-l_n \varepsilon^2/8}, \tag{9}$$

where $t_n$ are any integers in $1..n$ and $l_n = n/t_n$. The parameters $t_n$...

13 | Online clustering of processes.
- Khaleghi, Ryabko, et al.
- 2012
Citation Context: ... approach to address the problems considered here, as well as some related problems about stationary ergodic time series, is based on (consistent) empirical estimates of the distributional distance, see [23, 21, 13] and [8] about the distributional distance. The empirical distance is based on counting frequencies of bins of decreasing sizes and "telescoping." A similar telescoping trick is used in different prob...

13 | Metric distances in spaces of random variables and their distributions
- Zolotarev
- 1976
Citation Context: ...For two probability distributions $P$ and $Q$ on $(X, F_1)$ and a set $H$ of measurable functions on $X$, one can define the distance $d_H(P,Q) := \sup_{h \in H} |E_P h - E_Q h|$. This metric has been studied since at least [26]; its special cases include Kolmogorov-Smirnov [15], Kantorovich-Rubinstein [11] and Fortet-Mourier [7] metrics. Note that the distance function so defined may not be measurable; however, it is measur...

11 | Predicting Conditional Quantiles via Reduction to Classification. In:
- Langford, Oliveira, et al.
- 2006
Citation Context: ... learning problems by reducing them to binary classification. This approach has been applied to many different problems, starting with multi-class classification, and including regression and ranking [3, 16], to give just a few examples. However, all of these problems are formulated in terms of independent and identically distributed (i.i.d.) samples. This is also the assumption underlying the theoretica...

9 | On a functional space and certain extremal problems
- Rubinstein
- 1957
Citation Context: ...unctions on $X$, one can define the distance $d_H(P,Q) := \sup_{h \in H} |E_P h - E_Q h|$. This metric has been studied since at least [26]; its special cases include Kolmogorov-Smirnov [15], Kantorovich-Rubinstein [11] and Fortet-Mourier [7] metrics. Note that the distance function so defined may not be measurable; however, it is measurable under mild conditions which we assume when necessary. In particular, separa...

7 | Kernel change-point analysis
- Harchaoui, Bach, Moulines
- 2008
Citation Context: ...following methods were used for comparison. First, we used dynamic time warping (DTW) [24], which is a popular baseline approach for time-series clustering. The other two methods in Table 1 are from [10]. The comparison is not fully relevant, since the results in [10] are for different settings; the method KCpA was used in change-point estimation method (a different but also unsupervised setting), an...

5 | Uniform approximation of Vapnik-Chervonenkis classes
- Adams, Nobel
Citation Context: ...timation [14]. However, these applications so far have only concerned i.i.d. data, whereas we want to work with highly-dependent time series. Thus, the second building block are the recent results of [1, 2], that show that empirical estimates of $d_H$ are consistent (under certain conditions on $H$) for arbitrary stationary ergodic distributions. This, however, is not enough: evaluating $d_H$ for (stationary er...