Detecting the Direction of Causal Time Series
Abstract

Cited by 7 (2 self)
We propose a method that detects the true direction of a time series by fitting an autoregressive moving average model to the data. Whenever the noise is independent of the previous samples for one ordering of the observations, but dependent for the opposite ordering, we infer the former direction to be the true one. We prove that our method works in the population case as long as the noise of the process is not normally distributed (in the Gaussian case, the direction is not identifiable). A new and important implication of our result is that it confirms, in the case of time series, a fundamental conjecture in causal reasoning: if after regression the noise is independent of the signal for one direction and dependent for the other, then the former represents the true causal direction. We test our approach on two types of data: simulated data sets conforming to our modeling assumptions, and real-world EEG time series. Our method makes a decision for a significant fraction of both data sets, and these decisions are mostly correct. For real-world data, our approach outperforms alternative solutions to the problem of time direction recovery.
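The decision rule can be illustrated for the simplest AR(1) case. This is a toy sketch, not the paper's full procedure: the skewed (exponential) noise and the squared-residual correlation used as a dependence score are assumptions chosen to make the asymmetry visible.

```python
import numpy as np

def ar1_residuals(x):
    """Least-squares AR(1) fit; returns residuals and the lagged values."""
    past, present = x[:-1], x[1:]
    a = past @ present / (past @ past)
    return present - a * past, past

def dependence_score(res, past):
    """Crude dependence measure: correlation of squared residuals with the
    past values (near zero when residuals are independent of the past)."""
    return abs(np.corrcoef(res**2, past)[0, 1])

rng = np.random.default_rng(0)
n = 20000
noise = rng.exponential(1.0, n) - 1.0   # non-Gaussian noise: direction identifiable
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + noise[t]

fwd = dependence_score(*ar1_residuals(x))        # fit in the true direction
bwd = dependence_score(*ar1_residuals(x[::-1]))  # fit in the reversed direction
print(fwd, bwd)  # fwd should be near zero; bwd noticeably larger
```

Since `fwd < bwd`, the rule infers the forward ordering to be the true one.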
A Fast, Consistent Kernel Two-Sample Test
Abstract

Cited by 6 (4 self)
A kernel embedding of probability distributions into reproducing kernel Hilbert spaces (RKHS) has recently been proposed, which allows the comparison of two probability measures P and Q based on the distance between their respective embeddings: for a sufficiently rich RKHS, this distance is zero if and only if P and Q coincide. In using this distance as a statistic for a test of whether two samples are from different distributions, a major difficulty arises in computing the significance threshold, since the empirical statistic has as its null distribution (where P = Q) an infinite weighted sum of χ² random variables. Prior finite-sample approximations to the null distribution include bootstrap resampling, which yields a consistent estimate but is computationally costly, and fitting a parametric model with the low-order moments of the test statistic, which can work well in practice but has no consistency or accuracy guarantees. The main result of the present work is a novel estimate of the null distribution, computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, and having lower computational cost than the bootstrap. A proof of consistency of this estimate is provided. The performance of the null distribution estimate is compared with the bootstrap and parametric approaches on an artificial example, high-dimensional multivariate data, and text.
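A rough numpy sketch of the idea follows. The Gaussian kernel, the pooled-size scaling of the statistic, and the factor in the weighted χ² draws are assumptions of this demo; the paper gives the precise estimator and its consistency proof.

```python
import numpy as np

def gaussian_gram(X, Y, sigma=1.0):
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    Kxx, Kyy, Kxy = (gaussian_gram(A, B, sigma) for A, B in [(X, X), (Y, Y), (X, Y)])
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

def spectral_threshold(X, Y, sigma=1.0, alpha=0.05, n_draws=2000, rng=None):
    """Null quantile of the scaled MMD^2 statistic, approximated from the
    eigenspectrum of the centred pooled Gram matrix (sketch of the idea)."""
    rng = rng or np.random.default_rng(0)
    Z = np.vstack([X, Y])
    n = len(Z)
    K = gaussian_gram(Z, Z, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    lam = np.clip(np.linalg.eigvalsh(H @ K @ H) / n, 0, None)  # eigenvalue estimates
    z = rng.standard_normal((n_draws, n))
    null = (2 * lam[None, :] * z**2).sum(axis=1)   # weighted chi-square draws
    return np.quantile(null, 1 - alpha)

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
Y = rng.standard_normal((100, 2)) + 3.0      # clearly different distribution
stat = 200 * mmd2_biased(X, Y)               # pooled-size scaling (sketch)
thresh = spectral_threshold(X, Y)
print(stat > thresh)                          # reject the null here
```

Sampling `n_draws` weighted χ² variates is far cheaper than rerunning the statistic on bootstrap resamples.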
Grammatical Inference as a Principal Component Analysis Problem
Abstract

Cited by 4 (1 self)
One of the main problems in probabilistic grammatical inference consists in inferring a stochastic language, i.e. a probability distribution, in some class of probabilistic models, from a sample of strings independently drawn according to a fixed unknown target distribution p. Here, we consider the class of rational stochastic languages, composed of stochastic languages that can be computed by multiplicity automata, which can be viewed as a generalization of probabilistic automata. Rational stochastic languages p have a useful algebraic characterization: all the mappings u̇p : v ↦ p(uv) lie in a finite-dimensional subspace V*p of the vector space R⟨⟨Σ⟩⟩ of all real-valued functions defined over Σ*. Hence, a first step in the grammatical inference process can consist in identifying the subspace V*p. In this paper, we study the possibility of using Principal Component Analysis to achieve this task. We provide an inference algorithm which computes an estimate of this space and then builds a multiplicity automaton which computes an estimate of the target distribution. We prove some theoretical properties of this algorithm and provide results from numerical simulations that confirm the relevance of our approach.
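The subspace-identification step amounts to a rank estimate, which can be sketched as an SVD of an empirical Hankel-style matrix of string probabilities. The rank-one toy target, the prefix/suffix basis, and the singular-value threshold below are assumptions of this illustration.

```python
import numpy as np

def empirical_hankel(strings, prefixes, suffixes):
    """H[u, v] = empirical probability of the concatenated string uv."""
    counts = {}
    for s in strings:
        counts[s] = counts.get(s, 0) + 1
    n = len(strings)
    return np.array([[counts.get(u + v, 0) / n for v in suffixes]
                     for u in prefixes])

# toy rational target over {a}: p("a"*k) = 0.5**(k+1), a one-dimensional V*p
rng = np.random.default_rng(0)
sample = ["a" * (rng.geometric(0.5) - 1) for _ in range(5000)]

basis = ["", "a", "aa"]
H = empirical_hankel(sample, basis, basis)
sing = np.linalg.svd(H, compute_uv=False)
est_dim = int((sing > 0.1).sum())   # PCA-style dimension estimate
print(est_dim)                      # → 1
```

The dominant singular directions then serve as the estimated basis of V*p from which a multiplicity automaton can be built.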
On the Equivalence between Herding and Conditional Gradient Algorithms
, 2012
Abstract

Cited by 4 (1 self)
We show that the herding procedure of Welling (2009b) takes exactly the form of a standard convex optimization algorithm, namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. The experiments indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm tends to approach the maximum-entropy distribution more often, shedding more light on the learning bias behind herding.
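The greedy conditional-gradient step can be illustrated with kernel herding on a candidate grid: each iteration picks the point whose feature map best reduces the current moment discrepancy. The bandwidth, grid, and the grid-based discrepancy measure are assumptions of this sketch.

```python
import numpy as np

def k(a, b, s=0.5):
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * s**2))

rng = np.random.default_rng(0)
data = rng.standard_normal(1000)     # distribution whose integrals we approximate
cand = np.linspace(-4, 4, 401)       # candidate points for the greedy step
mu = k(cand, data).mean(axis=1)      # target mean embedding on the grid

chosen = []
for _ in range(20):
    # conditional-gradient / herding step: maximize the inner product with
    # the residual between target and current empirical embeddings
    penalty = k(cand, np.array(chosen)).mean(axis=1) if chosen else 0.0
    chosen.append(cand[int(np.argmax(mu - penalty))])
chosen = np.array(chosen)

def embed_err(points):
    """Grid L2 gap between the empirical and target mean embeddings."""
    return np.linalg.norm(k(cand, points).mean(axis=1) - mu)

herd_err = embed_err(chosen)
rand_err = embed_err(rng.choice(data, 20))
print(herd_err, rand_err)   # herded points track the embedding more closely
```

The comparison with 20 i.i.d. samples reflects the paper's point: the deterministic greedy sequence approximates integrals faster than random sampling.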
Conditional mean embeddings as regressors
 ICML
, 2012
Abstract

Cited by 4 (2 self)
We demonstrate an equivalence between reproducing kernel Hilbert space (RKHS) embeddings of conditional distributions and vector-valued regressors. This connection introduces a natural regularized loss function which the RKHS embeddings minimise, providing an intuitive understanding of the embeddings and a justification for their use. Furthermore, the equivalence allows the application of vector-valued regression methods and results to the problem of learning conditional distributions. Using this link we derive a sparse version of the embedding by considering alternative formulations. Further, by applying convergence results for vector-valued regression to the embedding problem we derive minimax convergence rates which are O(log(n)/n), compared to current state-of-the-art rates of O(n^{-1/4}), and are valid under milder and more intuitive assumptions. These minimax upper rates coincide with lower rates up to a logarithmic factor, showing that the embedding method achieves nearly optimal rates. We study our sparse embedding algorithm in a reinforcement learning task, where it shows a significant improvement in sparsity over an incomplete Cholesky decomposition.
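The regression view can be sketched as kernel ridge regression: the embedding weights β(x) = (K + nλI)⁻¹ k(x) turn the conditional expectation of any function of Y into a weighted sum over training outputs. The bandwidth, regularizer, and the toy model below are assumptions of this demo.

```python
import numpy as np

def rbf(a, b, s=0.5):
    return np.exp(-(a[:, None] - b[None, :])**2 / (2 * s**2))

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-3, 3, n)
Y = np.sin(X) + 0.1 * rng.standard_normal(n)

lam = 1e-3
K = rbf(X, X)
# embedding weights beta(x) at the query point x = 1.0
beta = np.linalg.solve(K + n * lam * np.eye(n), rbf(X, np.array([1.0])))[:, 0]

# the same weights estimate E[g(Y) | X = 1] for any g; here g(y) = y^2
est = beta @ Y**2
true = np.sin(1.0)**2 + 0.1**2   # analytic conditional expectation for the toy model
print(est, true)
```

One set of regression weights serves every test function g, which is exactly what makes the embedding view convenient.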
Kernel methods for detecting the direction of time series
 In: Proceedings of the 32nd Annual Conference of the German Classification Society (GfKl 2008)
, 2009
Abstract

Cited by 3 (2 self)
Summary. We propose two kernel-based methods for detecting the time direction in empirical time series. First, we apply a Support Vector Machine to the finite-dimensional distributions of the time series (classification method) by embedding these distributions into a Reproducing Kernel Hilbert Space. For the ARMA method, we fit the observed data with an autoregressive moving average process and test whether the regression residuals are statistically independent of the past values. Whenever the dependence in one direction is significantly weaker than in the other, we infer the former to be the true one. Both approaches were able to detect the direction of the true generating model for simulated data sets. We also applied our tests to a large number of real-world time series. The ARMA method made a decision for a significant fraction of them and was mostly correct, while the classification method did not perform as well, but still exceeded chance level.
Universal kernels on nonstandard input spaces
 in Advances in Neural Information Processing Systems
, 2010
Abstract

Cited by 3 (0 self)
In recent years, support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of R^d. Examples include SVMs for the analysis of histograms or colored images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent with respect to the Bayes risk if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp(PX) is dense, or if the SVM uses a universal kernel k. So far, however, no kernels of practical interest are known that satisfy these assumptions when X ⊄ R^d. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of R^d. We apply this technique to the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
Spatially-Aware Comparison and Consensus for Clusterings
Abstract

Cited by 3 (2 self)
This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach builds on the idea of a Hilbert-space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure. This consensus procedure is implemented via a novel reduction to Euclidean clustering, and is both simple and efficient. All of our results apply to both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.
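The core construction, representing a cluster by the mean of its points' feature maps and comparing clusters by the RKHS distance between these means, can be sketched directly. The Gaussian kernel and the toy clusters are assumptions of this illustration.

```python
import numpy as np

def rbf(A, B, s=1.0):
    d2 = ((A[:, None, :] - B[None, :, :])**2).sum(-1)
    return np.exp(-d2 / (2 * s**2))

def cluster_dist2(P, Q):
    """Squared RKHS distance between the mean embeddings of two point sets."""
    return rbf(P, P).mean() + rbf(Q, Q).mean() - 2 * rbf(P, Q).mean()

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 2))
B = rng.standard_normal((50, 2)) + np.array([4.0, 0.0])  # spatially distant cluster
C = rng.standard_normal((50, 2)) + np.array([0.2, 0.0])  # spatially nearby cluster

print(cluster_dist2(A, C) < cluster_dist2(A, B))  # True: spatial awareness
```

Unlike set-overlap measures, this distance sees that cluster C is spatially close to A while B is far away, even though neither shares points with A.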
Two-manifold problems with applications to nonlinear system identification
 In Proc. 29th Intl. Conf. on Machine Learning (ICML)
Abstract

Cited by 3 (3 self)
Recently, there has been much interest in spectral approaches to learning manifolds, so-called kernel eigenmap methods. These methods have had some successes, but their applicability is limited because they are not robust to noise. To address this limitation, we look at two-manifold problems, in which we simultaneously reconstruct two related manifolds, each representing a different view of the same data. By solving these interconnected learning problems together, two-manifold algorithms are able to succeed where a non-integrated approach would fail: each view allows us to suppress noise in the other, reducing bias. We propose a class of algorithms for two-manifold problems, based on spectral decomposition of cross-covariance operators in Hilbert space, and discuss when two-manifold problems are useful. Finally, we demonstrate that solving a two-manifold problem can aid in learning a nonlinear dynamical system from limited data.
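The noise-suppression mechanism can be illustrated in the simplest linear, finite-dimensional setting: an SVD of the empirical cross-covariance between two views keeps only directions correlated across views, so each view's private noise is discarded. The two-view toy model below is an assumption of this sketch (the paper works with cross-covariance operators in Hilbert space).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
t = rng.standard_normal(n)                         # shared latent coordinate
X = np.column_stack([t, rng.standard_normal(n)])   # view 1: signal + private noise
Y = np.column_stack([t, rng.standard_normal(n)])   # view 2: signal + private noise
X -= X.mean(0)
Y -= Y.mean(0)

C = X.T @ Y / n              # empirical cross-covariance between the views
U, s, Vt = np.linalg.svd(C)
print(np.abs(U[:, 0]))       # top direction concentrates on the shared coordinate
```

The private-noise directions of each view are nearly uncorrelated across views, so their singular values vanish as n grows, while the shared direction survives.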