Results 1–10 of 17
Diffusion Kernels on Statistical Manifolds
, 2004
"... A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian ker ..."
Abstract

Cited by 87 (6 self)
 Add to MetaCart
A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over Gaussian and linear kernels, the standard choices for text classification.
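The multinomial kernel in this abstract is commonly approximated using the Fisher-metric geodesic distance d(p, q) = 2 arccos(Σᵢ √(pᵢ qᵢ)) inside a Gaussian-style envelope. A minimal sketch; the function name and the exact envelope are illustrative assumptions, not quoted from the paper:

```python
import math

def multinomial_diffusion_kernel(p, q, t=0.5):
    """Approximate heat kernel on the multinomial simplex.

    Uses the Fisher-metric geodesic distance
        d(p, q) = 2 * arccos(sum_i sqrt(p_i * q_i))
    inside a Gaussian-style envelope exp(-d^2 / (4t)).
    """
    # Bhattacharyya coefficient, clipped to guard acos against rounding
    bc = min(1.0, sum(math.sqrt(pi * qi) for pi, qi in zip(p, q)))
    d = 2.0 * math.acos(bc)  # geodesic distance on the simplex
    return math.exp(-d * d / (4.0 * t))

# Identical distributions are at distance 0, so the kernel value is 1
print(multinomial_diffusion_kernel([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # ≈ 1.0
```

The bandwidth parameter t plays the role of diffusion time; larger t makes the kernel flatter across the simplex.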
A Scalable Method for Estimating Network Traffic Matrices from Link Counts
, 2000
"... Traffic matrices are extremely useful for network configuration, management, engineering, and pricing. Direct measurement is, however, expensive in general and impossible in some cases. This paper proposes a scalable algorithm for statistically estimating a traffic matrix from the readily available ..."
Abstract

Cited by 29 (5 self)
 Add to MetaCart
Traffic matrices are extremely useful for network configuration, management, engineering, and pricing. Direct measurement is, however, expensive in general and impossible in some cases. This paper proposes a scalable algorithm for statistically estimating a traffic matrix from the readily available link counts. It relies on a divide-and-conquer strategy to lower the computational cost without losing estimation accuracy. The proposed algorithm is tested on a real network with 18 nodes. The estimates are comparable to direct estimates but require dramatically less computation.
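The estimation problem in this abstract inverts a simple linear forward model: each link count is the sum of the origin-destination demands routed over that link. A toy sketch of the forward map only; the routing table and demand values are made up for illustration:

```python
def link_counts(routes, demands):
    """Forward model: aggregate origin-destination demands onto links.

    routes maps each OD pair to the list of links its traffic traverses;
    the paper's estimation problem statistically inverts this
    (typically underdetermined) map from observed link counts.
    """
    counts = {}
    for od, links in routes.items():
        for link in links:
            counts[link] = counts.get(link, 0) + demands[od]
    return counts

# Illustrative 3-node path a - b - c
routes = {("a", "c"): ["a-b", "b-c"], ("a", "b"): ["a-b"], ("b", "c"): ["b-c"]}
demands = {("a", "c"): 5, ("a", "b"): 3, ("b", "c"): 2}
print(link_counts(routes, demands))  # {'a-b': 8, 'b-c': 7}
```

With more OD pairs than links the inverse problem has no unique solution, which is why statistical modeling of the demands is needed.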
Covariance and Fisher information in quantum mechanics
, 2002
"... Variance and Fisher information are ingredients of the CramerRao inequality. We regard Fisher information as a Riemannian metric on a quantum statistical manifold and choose monotonicity under coarse graining as the fundamental property of variance and Fisher information. In this approach we show t ..."
Abstract

Cited by 27 (18 self)
 Add to MetaCart
Variance and Fisher information are ingredients of the Cramér-Rao inequality. We regard Fisher information as a Riemannian metric on a quantum statistical manifold and choose monotonicity under coarse graining as the fundamental property of variance and Fisher information. In this approach we show that there is a kind of dual one-to-one correspondence between the candidates for the two concepts. We emphasize that Fisher information metrics are obtained from relative entropies as contrast functions on the state space, and argue that the scalar curvature might be interpreted as an uncertainty density on a statistical manifold.
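For reference, the classical scalar Cramér-Rao inequality that the abstract builds on can be written as follows (standard textbook form; the notation is generic, not the paper's):

```latex
% Cramér-Rao bound for an unbiased estimator \hat\theta of \theta,
% with F(\theta) the Fisher information of the model p(x \mid \theta)
\operatorname{Var}(\hat\theta) \;\ge\; \frac{1}{F(\theta)},
\qquad
F(\theta) \;=\; \mathbb{E}_\theta\!\left[\left(
\frac{\partial}{\partial\theta} \log p(X \mid \theta)\right)^{\!2}\right]
```

The quantum setting replaces p(x | θ) by a family of density matrices, and the paper's monotonicity requirement singles out which generalizations of F(θ) are admissible.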
Learning Riemannian Metrics
 In Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence (UAI)
, 2003
"... We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given dataset of points. From a stati ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach involves choosing a metric from a parametric family by maximizing the inverse volume of a given dataset of points. From a statistical perspective, this is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex, where the metric candidates are pullback metrics of the Fisher information under a continuous group of transformations. When applied to documents, the resulting geodesics resemble, but outperform, the TF-IDF cosine similarity measure in classification.
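The "probabilities inversely proportional to the Riemannian volume element" idea can be written schematically as follows (a schematic form inferred from the abstract, not quoted from the paper):

```latex
% Density inversely proportional to the volume element \sqrt{\det g(x)}
% of metric g, normalized over the manifold M; the metric is then chosen
% by maximum likelihood over the parametric family of candidates.
p(x \mid g) \;=\; \frac{\big(\det g(x)\big)^{-1/2}}
{\int_{M} \big(\det g(x')\big)^{-1/2}\, dx'},
\qquad
\hat g \;=\; \arg\max_{g} \prod_{i=1}^{n} p(x_i \mid g)
```

Intuitively, shrinking the volume around the observed points raises their likelihood, so the learned metric contracts where the data concentrate.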
The Locally Weighted Bag of Words Framework for Document Representation
"... The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document representation that goes beyond the bag of words represen ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
The popular bag-of-words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document representation that goes beyond the bag-of-words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex, thereby preserving valuable sequential information. In contrast to bag-of-words or n-grams, the new representation robustly captures medium- and long-range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability to various text processing tasks.
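The "smooth curves in the multinomial simplex" idea can be sketched as follows: instead of one global histogram, compute a kernel-weighted histogram at each of several normalized document positions. A minimal illustration; the Gaussian weighting and parameter names are assumptions, not the paper's exact construction:

```python
import math
from collections import Counter

def lowbow_curve(words, mu_values, sigma=0.1):
    """Sketch of a locally weighted bag-of-words representation.

    For each location mu in [0, 1], words whose normalized position in
    the document lies near mu receive higher weight, yielding a curve of
    histograms on the simplex rather than one global histogram.
    """
    n = len(words)
    curve = []
    for mu in mu_values:
        weights = Counter()
        for i, w in enumerate(words):
            pos = i / max(n - 1, 1)  # normalized position in [0, 1]
            weights[w] += math.exp(-((pos - mu) ** 2) / (2 * sigma ** 2))
        total = sum(weights.values())
        curve.append({w: c / total for w, c in weights.items()})
    return curve

doc = "the cat sat on the mat".split()
curve = lowbow_curve(doc, [0.0, 0.5, 1.0])
```

Each point on the curve is itself a probability distribution over the vocabulary, so simplex geometry (e.g. the Fisher metric) applies pointwise along the curve.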
Bayesian Quadratic Discriminant Analysis
 Journal of Machine Learning Research
, 2007
"... Quadratic discriminant analysis is a common tool for classification, but estimation of the Gaussian parameters can be illposed. This paper contains theoretical and algorithmic contributions to Bayesian estimation for quadratic discriminant analysis. A distributionbased Bayesian classifier is deriv ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
Quadratic discriminant analysis is a common tool for classification, but estimation of the Gaussian parameters can be ill-posed. This paper contains theoretical and algorithmic contributions to Bayesian estimation for quadratic discriminant analysis. A distribution-based Bayesian classifier is derived using information geometry. Using a calculus-of-variations approach to define a functional Bregman divergence for distributions, it is shown that the Bayesian distribution-based classifier that minimizes the expected Bregman divergence of each class conditional distribution also minimizes the expected misclassification cost. A series approximation is used to relate regularized discriminant analysis to Bayesian discriminant analysis. A new Bayesian quadratic discriminant analysis classifier is proposed in which the prior is defined using a coarse estimate of the covariance based on the training data; this classifier is termed BDA7. Results on benchmark data sets and simulations show that BDA7 performance is competitive with, and in some cases significantly better than, regularized quadratic discriminant analysis and the cross-validated Bayesian quadratic discriminant analysis classifier Quadratic Bayes.
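For context, classical quadratic discriminant analysis assigns x to the class with the largest Gaussian discriminant score (standard textbook form, not BDA7 itself):

```latex
% QDA score for class k with mean \mu_k, covariance \Sigma_k, prior \pi_k;
% estimating \Sigma_k is what becomes ill-posed when training samples
% are scarce relative to the feature dimension.
g_k(x) \;=\; -\tfrac{1}{2}\log\det\Sigma_k
\;-\; \tfrac{1}{2}(x-\mu_k)^{\top}\Sigma_k^{-1}(x-\mu_k)
\;+\; \log\pi_k
```

When the sample covariance Σ̂ₖ is singular or near-singular, both the log-determinant and the inverse are unstable, which motivates the regularized and Bayesian variants compared in the paper.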
Functional Bregman Divergence and Bayesian Estimation of Distributions
 CoRR
"... Abstract—A class of distortions termed functional Bregman divergences is defined, which includes squared error and relative entropy. A functional Bregman divergence acts on functions or distributions, and generalizes the standard Bregman divergence for vectors and a previous pointwise Bregman diverg ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
A class of distortions termed functional Bregman divergences is defined, which includes squared error and relative entropy. A functional Bregman divergence acts on functions or distributions, and generalizes the standard Bregman divergence for vectors as well as a previous pointwise Bregman divergence defined for functions. A recent result showed that the mean minimizes the expected Bregman divergence. The new functional definition extends this result to the continuous case, showing that the mean minimizes the expected functional Bregman divergence over a set of functions or distributions. It is shown how this theorem applies to the Bayesian estimation of distributions. Estimation of the uniform distribution from independent and identically drawn samples is presented as a case study. Index Terms: Bayesian estimation, Bregman divergence, convexity, Fréchet derivative, uniform distribution.
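A minimal sketch of the pointwise, scalar Bregman divergence that the functional definition generalizes, including the mean-minimization property the abstract mentions (illustrative code, not the paper's functional-derivative machinery):

```python
import math

def bregman(phi, grad_phi, x, y):
    """Scalar Bregman divergence d_phi(x, y) = phi(x) - phi(y) - phi'(y)(x - y)."""
    return phi(x) - phi(y) - grad_phi(y) * (x - y)

# phi(x) = x^2 recovers squared error: d(3, 1) = (3 - 1)^2 = 4
squared_error = bregman(lambda x: x * x, lambda y: 2 * y, 3.0, 1.0)

# phi(x) = x*log(x) (negative entropy) yields a KL-style divergence
kl_like = bregman(lambda x: x * math.log(x), lambda y: math.log(y) + 1.0, 2.0, 1.0)

# The sample mean minimizes the total divergence sum_i d(x_i, y) over y
data = [1.0, 2.0, 6.0]
mean = sum(data) / len(data)
total_at_mean = sum(bregman(lambda x: x * x, lambda y: 2 * y, x, mean) for x in data)
total_elsewhere = sum(bregman(lambda x: x * x, lambda y: 2 * y, x, 2.0) for x in data)
```

The mean-minimization holds in the second argument for any convex phi, which is the scalar analogue of the theorem the abstract extends to distributions.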
Distribution-based Bayesian Minimum Expected Risk for Discriminant Analysis
 In Proc. IEEE Int. Symp. Inf. Theory
"... Abstract — This paper considers a distributionbased Bayesian estimation for classification by quadratic discriminant analysis, instead of the standard parameterbased Bayesian estimation. This approach also yields closed form solutions, but removes the parameterbased restriction of requiring more ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
This paper considers distribution-based Bayesian estimation for classification by quadratic discriminant analysis, instead of the standard parameter-based Bayesian estimation. This approach also yields closed-form solutions, but removes the parameter-based restriction of requiring more training samples than feature dimensions. We investigate how to define a prior so that it has an adaptively regularizing effect: yielding robust estimation when the number of training samples is small compared to the number of feature dimensions, yet converging as the number of data points grows large. Comparative performance on a suite of simulations shows that distribution-based Bayesian discriminant analysis is advantageous in terms of average error.
Information Resonance and Pattern Recognition in Classical and Quantum Systems: Toward a `Language Model' of Hierarchical Neural Structure and Process
, 2000
"... Recent applications of the ShannonMcMillan Theorem to arrays of nonlinear components undergoing what is effectively an `information resonance' (R Wallace, 2000a) may be extended to include many neural models, both classical and quantum. Some consideration reduces the threefold interacting complex o ..."
Abstract

Cited by 5 (2 self)
 Add to MetaCart
Recent applications of the Shannon-McMillan Theorem to arrays of nonlinear components undergoing what is effectively an 'information resonance' (R. Wallace, 2000a) may be extended to include many neural models, both classical and quantum. Some consideration reduces the threefold interacting complex of sensory activity, ongoing activity, and nonlinear oscillator to a single object, a parametrized ergodic information source. Invoking the 'large deviations' program of applied probability, which unifies the treatment of dynamical fluctuations, statistical mechanics, and information theory, allows a natural transfer of thermodynamic and renormalization arguments from statistical physics to information theory, permitting a markedly simplified analysis of neural dynamics. This suggests an inherently language-based foundation, in a large sense, to neural structure and process, and implies that approaches without an intimate relation to language may be seriously incomplete. Key Words: Coevolution, in...
Hyperplane Margin Classifiers on the Multinomial Manifold
 In Proc. of the 21st International Conference on Machine Learning
, 2004
"... The assumptions behind linear classifiers for categorical data are examined and reformulated in the context of the multinomial manifold, the simplex of multinomial models furnished with the Riemannian structure induced by the Fisher information. ..."
Abstract

Cited by 5 (1 self)
 Add to MetaCart
The assumptions behind linear classifiers for categorical data are examined and reformulated in the context of the multinomial manifold, the simplex of multinomial models furnished with the Riemannian structure induced by the Fisher information.
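The multinomial manifold in this abstract is conveniently handled through the classical isometry p ↦ 2√p, which maps the simplex with the Fisher metric onto a portion of a sphere of radius 2; linear decision functions can then be applied to the embedded points. A hedged sketch; the margin helper and its parameters are illustrative, not from the paper:

```python
import math

def sphere_embed(p):
    """Map a multinomial parameter vector to the positive orthant of a
    sphere of radius 2 via p -> 2*sqrt(p) componentwise; under the Fisher
    information metric this embedding is a standard isometry."""
    return [2.0 * math.sqrt(pi) for pi in p]

def margin(w, b, p):
    """Signed margin of a linear decision function applied to the
    embedded point (w and b are illustrative learned parameters)."""
    z = sphere_embed(p)
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# Every multinomial parameter vector lands on a sphere of radius 2,
# since ||2*sqrt(p)||^2 = 4 * sum(p) = 4
z = sphere_embed([0.1, 0.4, 0.5])
radius = math.sqrt(sum(c * c for c in z))
```

Reformulating the linear classifier on this sphere, rather than treating word frequencies as raw Euclidean vectors, is the geometric move the abstract describes.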