Results 1–10 of 17
Diffusion Kernels on Statistical Manifolds
, 2004
Abstract

Cited by 87 (6 self)
A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over Gaussian and linear kernels, which have been the standard for text classification.
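The multinomial special case admits an easy-to-sketch closed form. Assuming the setup the abstract describes (points on the probability simplex, with the geodesic distance under the Fisher information metric given by d(θ, θ') = 2 arccos Σ_i √(θ_i θ'_i)), a minimal Python sketch of the resulting kernel, using the standard parametrix-style approximation exp(−d²/4t) and ignoring normalization constants, is:

```python
import numpy as np

def multinomial_diffusion_kernel(theta, theta_prime, t=0.5):
    """Parametrix-style approximation to the heat kernel on the multinomial
    simplex: K(x, x') = exp(-d(x, x')^2 / (4t)), where d is the geodesic
    distance under the Fisher information metric."""
    # Bhattacharyya affinity; clip for numerical safety before arccos.
    affinity = np.clip(np.sum(np.sqrt(theta * theta_prime)), -1.0, 1.0)
    d = 2.0 * np.arccos(affinity)          # geodesic distance on the simplex
    return np.exp(-d**2 / (4.0 * t))

# Two term-frequency vectors normalized onto the probability simplex.
x = np.array([3.0, 1.0, 0.0, 2.0]); x /= x.sum()
y = np.array([2.0, 2.0, 1.0, 1.0]); y /= y.sum()
print(multinomial_diffusion_kernel(x, x))   # identical documents -> 1.0
print(multinomial_diffusion_kernel(x, y))   # strictly between 0 and 1
```

Constants and higher-order parametrix terms are omitted here; the point is only that the kernel depends on documents through a spherical (arccos) geometry rather than Euclidean distance.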
The em algorithm for kernel matrix completion with auxiliary data
 Journal of Machine Learning Research
, 2003
Abstract

Cited by 42 (6 self)
In biological data, it is often the case that observed data are available only for a subset of samples. When a kernel matrix is derived from such data, we have to leave the entries for unavailable samples missing. In this paper, the missing entries are completed by exploiting an auxiliary kernel matrix derived from another information source. A parametric model of kernel matrices is created as a set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm (distinguished from the EM algorithm of Dempster et al., 1977) based on the information geometry of positive definite matrices. We report promising results on bacterial clustering experiments using two marker sequences: 16S and gyrB.
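The parametric model is easy to illustrate: the completed matrix is constrained to share the auxiliary kernel's eigenvectors, with free eigenvalues fitted to the observed entries. The sketch below is a deliberate simplification — it replaces the paper's information-geometric em fitting with ordinary least squares (each observed entry is linear in the eigenvalues), and uses synthetic data in place of the 16S/gyrB kernels:

```python
import numpy as np

rng = np.random.default_rng(0)

# Auxiliary kernel from a second information source (here: a random PSD matrix).
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T
_, V = np.linalg.eigh(A)          # eigenvectors of the auxiliary kernel

# Suppose the true kernel shares A's eigenvectors but has other eigenvalues.
true_lam = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 4.0])
K_true = (V * true_lam) @ V.T

# Entries involving the last two samples are missing.
obs = np.ones((n, n), dtype=bool)
obs[4:, :] = obs[:, 4:] = False

# Fit the eigenvalues to the observed entries by least squares:
# K(lam) = sum_i lam_i v_i v_i^T, so every observed entry is linear in lam.
basis = np.stack([np.outer(V[:, i], V[:, i])[obs] for i in range(n)], axis=1)
lam, *_ = np.linalg.lstsq(basis, K_true[obs], rcond=None)
K_hat = (V * lam) @ V.T           # completed kernel matrix
print(np.abs(K_hat - K_true).max())
```

The least-squares fit is only a stand-in: the paper's em iteration respects the positive-definite geometry, which plain least squares does not guarantee.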
Information geometry of U-Boost and Bregman divergence
 Neural Computation
, 2004
Abstract

Cited by 23 (8 self)
We aim to extend AdaBoost to U-Boost, in the paradigm of building a stronger classification machine from a set of weak learning machines. A geometric understanding of the Bregman divergence defined by a generic convex function U leads to the U-Boost method in the framework of information geometry for the finite measure functions over the label set. We propose two versions of the U-Boost learning algorithm, according to whether or not the domain is restricted to the space of probability functions. In the sequential step we observe that the two adjacent classifiers and the initial classifier are associated with a right triangle in the scale via the Bregman divergence, called the Pythagorean relation. This leads to a mild convergence property of the U-Boost algorithm, as seen in the EM algorithm. A statistical discussion of consistency and robustness elucidates the properties of U-Boost methods under a probabilistic assumption on the training data.
Information Geometrical Framework for Analyzing Belief Propagation Decoder
, 2001
Abstract

Cited by 6 (5 self)
The mystery of the belief propagation (BP) decoder, especially of turbo decoding, is studied from an information-geometric viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true "belief" by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism and a new framework for the analysis. Based on this framework, we reveal basic properties of turbo decoding.
The Leave-one-out Kernel
Abstract

Cited by 6 (1 self)
Recently, several attempts have been made to derive data-dependent kernels from distribution estimates with parametric models (e.g. the Fisher kernel). In this paper, we propose a new kernel derived from any distribution estimator, parametric or nonparametric. This kernel is called the Leave-one-out (LOO) kernel, because the leave-one-out process plays an important role in computing it. We show that, when applied to a parametric model, the LOO kernel converges to the Fisher kernel asymptotically as the number of samples goes to infinity.
Clustering with the Fisher Score
 Advances in Neural Information Processing Systems 15
, 2003
Abstract

Cited by 4 (1 self)
Recently the Fisher score (or the Fisher kernel) has increasingly been used as a feature extractor for classification problems. The Fisher score is the vector of parameter derivatives of the log-likelihood of a probabilistic model. This paper gives a theoretical analysis of how class information is preserved in the space of the Fisher score, which turns out to consist of a few important dimensions carrying class information and many nuisance dimensions. When we perform clustering with the Fisher score, k-means-type methods are clearly inappropriate because they make use of all dimensions. We therefore develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit the important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
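Concretely, the Fisher score maps a data point x to ∇_θ log p(x | θ̂), evaluated at fitted parameters. A minimal sketch for a univariate Gaussian model — a toy stand-in for the probabilistic models used in the paper:

```python
import numpy as np

def fisher_score_gaussian(x, mu, sigma):
    """Fisher score of a univariate Gaussian: the gradient of
    log p(x | mu, sigma) with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma**2
    d_sigma = ((x - mu) ** 2 - sigma**2) / sigma**3
    return np.stack([d_mu, d_sigma], axis=-1)

# Fit the model on pooled data, then map each point to its score vector.
rng = np.random.default_rng(0)
data = rng.normal(1.0, 2.0, size=500)
mu_hat, sigma_hat = data.mean(), data.std()
features = fisher_score_gaussian(data, mu_hat, sigma_hat)   # shape (500, 2)
print(features.mean(axis=0))   # approximately zero at the MLE
```

By the first-order condition of maximum likelihood, the empirical mean of the score vanishes at θ̂, so whatever class structure survives lies in how individual points deviate from that zero mean.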
Asymptotic Properties of the Fisher Kernel
 Neural Computation
, 2003
Abstract

Cited by 3 (0 self)
This paper analyses the Fisher kernel (FK) from a statistical point of view. The FK is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data, i.e. of the underlying data density. It is important to analyse and ultimately understand the statistical properties of the FK. To this end, we first establish sufficient conditions under which the constructed posterior model is realizable, i.e. contains the true distribution.
Group Action Induced Distances on Spaces of High-Dimensional Linear Stochastic Processes
 Submitted
Abstract

Cited by 2 (2 self)
Abstract. This paper studies the geometrization of spaces of stochastic processes. Our main motivation is the problem of pattern recognition in high-dimensional time-series data (e.g., video sequence classification and clustering). First, we review some existing approaches to defining distances on spaces of stochastic processes. Next, we focus on the space of processes generated by (stochastic) linear dynamical systems (LDSs) of fixed size and order (this space is a natural choice for the pattern recognition problem). When the LDSs are represented in state-space form, the space of LDSs can be considered as the base space of a principal fiber bundle. We use this fact to introduce a large class of easy-to-compute group-action-induced distances on the space of LDSs and hence on the corresponding space of stochastic processes. We call such a distance an alignment distance. One of our aims is to demonstrate the usefulness of control-theoretic tools in problems related to stochastic processes.
Abstract

Cited by 1 (1 self)
Abstract. Kullback-Leibler relative entropy, in cases involving distributions resulting from relative-entropy minimization, has a celebrated property reminiscent of squared Euclidean distance: it satisfies an analogue of Pythagoras' theorem. Hence this property is referred to as the Pythagoras' theorem of relative-entropy minimization, or the triangle equality, and it plays a fundamental role in geometric approaches to statistical estimation theory such as information geometry. An equivalent of Pythagoras' theorem in the generalized nonextensive formalism is established in (Dukkipati et al.).
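The triangle equality is easy to verify numerically in the classical (extensive) case. In the sketch below, p* is the I-projection of q onto a linear family {p : E_p[f] = a} — an exponential tilt of q, with the multiplier found by bisection — and KL(p‖q) = KL(p‖p*) + KL(p*‖q) holds for any p in the family:

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler relative entropy sum_i p_i log(p_i / q_i)."""
    return float(np.sum(p * np.log(p / q)))

f = np.array([0.0, 1.0, 2.0])            # constraint statistic
q = np.array([0.5, 0.3, 0.2])            # reference distribution
a = 1.2                                   # target mean: E_p[f] = a

# The I-projection of q onto {p : E_p[f] = a} is an exponential tilt
# p*(i) proportional to q(i) exp(lam * f(i)); solve for lam by bisection
# (the tilted mean is monotone increasing in lam).
def mean_of_tilt(lam):
    w = q * np.exp(lam * f)
    w /= w.sum()
    return np.sum(w * f)

lo, hi = -10.0, 10.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if mean_of_tilt(mid) < a:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)
p_star = q * np.exp(lam * f); p_star /= p_star.sum()

# Any other distribution satisfying the same constraint E_p[f] = 1.2.
p = np.array([0.2, 0.4, 0.4])
lhs = kl(p, q)
rhs = kl(p, p_star) + kl(p_star, q)
print(lhs, rhs)                           # equal: the Pythagorean identity
```

The identity follows because kl(p, q) − kl(p, p_star) depends on p only through E_p[f], which is pinned to a by the constraint.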
Bethe Free Energy and Contrastive Divergence Approximations for Undirected Graphical Models
, 2003
Abstract

Cited by 1 (0 self)
Bethe Free Energy and Contrastive Divergence Approximations for Undirected Graphical Models. Yee Whye Teh. Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2003. As the machine learning community tackles more complex and harder problems, the graphical models needed to solve them become larger and more complicated. As a result, performing inference and learning exactly for such graphical models becomes ever more expensive, and approximate inference and learning techniques become ever more prominent.
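Contrastive divergence itself is concrete enough to sketch. CD-1 for a Bernoulli restricted Boltzmann machine — a standard example of the undirected models discussed; the sizes and learning rate below are arbitrary — replaces the intractable model expectation in the log-likelihood gradient with a single Gibbs step started from the data:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid, lr = 6, 3, 0.1
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)   # visible / hidden biases

def cd1_update(v0):
    """One CD-1 step for a Bernoulli RBM: a single Gibbs sweep stands in
    for the intractable negative phase of the log-likelihood gradient."""
    global W, b, c
    ph0 = sigmoid(v0 @ W + c)                     # P(h=1 | v0)
    h0 = (rng.random(n_hid) < ph0).astype(float)  # sample hidden units
    pv1 = sigmoid(h0 @ W.T + b)                   # one-step reconstruction
    v1 = (rng.random(n_vis) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Positive phase (data) minus one-step negative phase.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)

# Train on a small synthetic binary dataset.
data = (rng.random((20, n_vis)) < 0.5).astype(float)
for epoch in range(5):
    for v in data:
        cd1_update(v)
```

The CD update is a biased approximation to maximum likelihood; the thesis analyzes when and how badly such approximations (and the related Bethe free energy) deviate from the exact objective.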