Results 1 - 10 of 11
Diffusion Kernels on Statistical Manifolds, 2004
Cited by 63 (5 self)
A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over Gaussian or linear kernels, which have been the standard for text classification.
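To make the multinomial special case concrete, here is a minimal sketch. It assumes the commonly quoted leading-order approximation of the heat kernel, in which the Euclidean distance of the Gaussian kernel is replaced by the Fisher geodesic distance on the simplex, d = 2 arccos(sum_i sqrt(theta_i theta'_i)). The function names are illustrative and are not taken from the paper.

```python
import math

def multinomial_geodesic(theta, theta_prime):
    """Fisher geodesic distance between two multinomial parameter vectors."""
    # Bhattacharyya coefficient; clamp to [0, 1] to guard against rounding.
    bc = sum(math.sqrt(a * b) for a, b in zip(theta, theta_prime))
    bc = min(1.0, max(0.0, bc))
    return 2.0 * math.acos(bc)

def diffusion_kernel(theta, theta_prime, t):
    """Leading-order approximation of the heat kernel on the simplex.

    Assumption: only the first term of the parametrix expansion is kept,
    so this is a Gaussian in the geodesic distance, not the exact kernel.
    """
    n = len(theta) - 1  # dimension of the simplex
    d = multinomial_geodesic(theta, theta_prime)
    return (4.0 * math.pi * t) ** (-n / 2.0) * math.exp(-d * d / (4.0 * t))

if __name__ == "__main__":
    # Two hypothetical term-frequency vectors, normalised to sum to one.
    doc_a = [0.5, 0.3, 0.2]
    doc_b = [0.4, 0.4, 0.2]
    print(diffusion_kernel(doc_a, doc_b, t=0.5))
```

For text, theta would typically be a normalised term-frequency vector, so each document is treated as a point on the multinomial simplex rather than as a vector in Euclidean space.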
The em algorithm for kernel matrix completion with auxiliary data - Journal of Machine Learning Research, 2003
Cited by 37 (6 self)
In biological data, it is often the case that observed data are available only for a subset of samples. When a kernel matrix is derived from such data, we have to leave the entries for unavailable samples as missing. In this paper, the missing entries are completed by exploiting an auxiliary kernel matrix derived from another information source. The parametric model of kernel matrices is created as a set of spectral variants of the auxiliary kernel matrix, and the missing entries are estimated by fitting this model to the existing entries. For model fitting, we adopt the em algorithm (distinguished from the EM algorithm of Dempster et al., 1977) based on the information geometry of positive definite matrices. We will report promising results on bacteria clustering experiments using two marker sequences: 16S and gyrB.
Information geometry of U-Boost and Bregman divergence - Neural Computation, 2004
Cited by 13 (5 self)
We aim to extend AdaBoost to U-Boost within the paradigm of building a stronger classification machine from a set of weak learning machines. A geometric understanding of the Bregman divergence defined by a generic convex function U leads to the U-Boost method in the framework of information geometry for the finite measure functions over the label set. We propose two versions of U-Boost learning algorithms, depending on whether or not the domain is restricted to the space of probability functions. In the sequential step we observe that the two adjacent classifiers and the initial classifier are associated with a right triangle in the scale of the Bregman divergence, called the Pythagorean relation. This leads to a mild convergence property of the U-Boost algorithm, as seen in the EM algorithm. A statistical discussion of consistency and robustness elucidates the properties of U-Boost methods based on a probabilistic assumption for the training data.
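As a reminder of what a Bregman divergence generated by a convex function looks like, here is a minimal sketch using the generic coordinate-wise textbook definition D_U(p, q) = sum_i [U(p_i) - U(q_i) - U'(q_i)(p_i - q_i)]. This is an assumption made for illustration; the paper's own U-divergence over measure functions may be formulated differently, and the helper names are hypothetical.

```python
import math

def bregman_divergence(p, q, U, dU):
    """Coordinate-wise Bregman divergence generated by a convex function U.

    U is the generating function and dU its derivative (both assumptions
    supplied by the caller; nothing here is specific to U-Boost).
    """
    return sum(U(a) - U(b) - dU(b) * (a - b) for a, b in zip(p, q))

# U(x) = x log x - x yields the generalised Kullback-Leibler divergence.
def U_kl(x):
    return x * math.log(x) - x

def dU_kl(x):
    return math.log(x)

# U(x) = x^2 / 2 yields half the squared Euclidean distance.
def U_sq(x):
    return 0.5 * x * x

def dU_sq(x):
    return x

if __name__ == "__main__":
    p = [0.2, 0.5, 0.3]
    q = [0.3, 0.3, 0.4]
    print(bregman_divergence(p, q, U_kl, dU_kl))
    print(bregman_divergence(p, q, U_sq, dU_sq))
```

When p and q both sum to one, the first choice of U reduces to the ordinary Kullback-Leibler divergence, which is why the restriction to probability functions mentioned in the abstract matters.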
Information Geometrical Framework for Analyzing Belief Propagation Decoder, 2001
Cited by 5 (4 self)
The mystery of the belief propagation (BP) decoder, especially in turbo decoding, is studied from an information-geometrical viewpoint. The loopy belief network (BN) of turbo codes makes it difficult to obtain the true "belief" by BP, and the characteristics of the algorithm and its equilibrium are not clearly understood. Our study gives an intuitive understanding of the mechanism and a new framework for the analysis. Based on the framework, we reveal basic properties of turbo decoding.
The Leave-one-out Kernel
Cited by 5 (1 self)
Recently, several attempts have been made to derive data-dependent kernels from distribution estimates with parametric models (e.g. the Fisher kernel). In this paper, we propose a new kernel derived from any distribution estimator, parametric or nonparametric. This kernel is called the Leave-one-out kernel (LOO kernel), because the leave-one-out process plays an important role in computing it. We show that, when applied to a parametric model, the LOO kernel converges to the Fisher kernel asymptotically as the number of samples goes to infinity.
Clustering with the Fisher Score - Advances in Neural Information Processing Systems 15, 2003
Cited by 4 (1 self)
Recently the Fisher score (or the Fisher kernel) has been increasingly used as a feature extractor for classification problems. The Fisher score is a vector of parameter derivatives of the log-likelihood of a probabilistic model. This paper gives a theoretical analysis of how class information is preserved in the space of the Fisher score. It turns out that the Fisher score consists of a few important dimensions with class information and many nuisance dimensions. When we perform clustering with the Fisher score, K-Means type methods are inappropriate because they make use of all dimensions. We therefore develop a novel but simple clustering algorithm specialized for the Fisher score, which can exploit the important dimensions. This algorithm is successfully tested in experiments with artificial data and real data (amino acid sequences).
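The definition of the Fisher score as a vector of parameter derivatives of the log-likelihood can be illustrated with a minimal sketch. The univariate Gaussian model used here is an assumption chosen for simplicity (the paper works with other models), and the function names are hypothetical.

```python
import math

def gaussian_log_likelihood(x, mu, sigma):
    """Log-density of a univariate Gaussian N(mu, sigma^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - (x - mu) ** 2 / (2.0 * sigma ** 2))

def fisher_score(x, mu, sigma):
    """Gradient of the log-likelihood with respect to (mu, sigma)."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = ((x - mu) ** 2 - sigma ** 2) / sigma ** 3
    return [d_mu, d_sigma]

if __name__ == "__main__":
    # Each observation is mapped to a feature vector with one entry
    # per model parameter.
    for x in [-1.0, 0.0, 2.5]:
        print(x, fisher_score(x, mu=0.0, sigma=1.0))
```

Each data point is mapped to one coordinate per model parameter, so a model with many parameters produces a high-dimensional feature vector, which is the setting in which the abstract distinguishes important dimensions from nuisance dimensions.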
Asymptotic Properties of the Fisher Kernel - Neural Computation, 2003
Cited by 3 (0 self)
This paper analyses the Fisher kernel (FK) from a statistical point of view. The FK is a particularly interesting method for constructing a model of the posterior probability that makes intelligent use of unlabeled data, i.e. of the underlying data density. It is important to analyse and ultimately understand the statistical properties of the FK. To this end, we first establish sufficient conditions that the constructed posterior model is realizable, i.e. that it contains the true distribution.
FreeBSD CVS log for ports/INDEX with Asami' s song texts: http://www.freebsd.org/cgi/cvsweb.cgi/ports/INDEX FreeBSD porters Handbook: http://www.freebsd.org/doc/en_US.ISO8859-1/books/porters-handbook OpenBSD: "Building an OpenBSD port" http://ww
- In the
Cited by 1 (1 self)
Kullback-Leibler relative-entropy, in cases involving distributions resulting from relative-entropy minimization, has a celebrated property reminiscent of squared Euclidean distance: it satisfies an analogue of Pythagoras' theorem. Hence, this property is referred to as Pythagoras' theorem of relative-entropy minimization, or triangle equality, and plays a fundamental role in geometrical approaches to statistical estimation theory such as information geometry. An equivalent of Pythagoras' theorem in the generalized nonextensive formalism is established in (Dukkipati at
Bethe Free Energy and Contrastive Divergence Approximations for Undirected Graphical Models, 2003
Cited by 1 (0 self)
Yee Whye Teh, Doctorate of Philosophy, Graduate Department of Computer Science, University of Toronto, 2003.
As the machine learning community tackles more complex and harder problems, the graphical models needed to solve such problems become larger and more complicated. As a result, performing inference and learning exactly for such graphical models becomes ever more expensive, and approximate inference and learning techniques become ever more prominent.
Information Diffusion Kernels - Advances in Neural Information Processing Systems 15, 2002
A new family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. Based on the heat equation on the Riemannian manifold defined by the Fisher information metric, information diffusion kernels generalize the Gaussian kernel of Euclidean space, and provide a natural way of combining generative statistical modeling with non-parametric discriminative learning. As a special case, the kernels give a new approach to applying kernel-based learning algorithms to discrete data. Bounds on covering numbers for the new kernels are proved using spectral theory in differential geometry, and experimental results are presented for real data sets.

