FINE: Fisher information nonparametric embedding
- IEEE Transactions on Signal Processing
Cited by 9 (8 self)
Abstract: We consider the problems of clustering, classification, and visualization of high-dimensional data when no straightforward Euclidean representation exists. In this paper, we propose using the properties of information geometry and statistical manifolds to define similarities between data sets using the Fisher information distance. We show that this metric can be approximated using entirely nonparametric methods, as the parameterization and geometry of the manifold are generally unknown. Furthermore, by using multidimensional scaling methods, we are able to reconstruct the statistical manifold in a low-dimensional Euclidean space, enabling effective learning on the data. As a whole, we refer to our framework as Fisher Information Nonparametric Embedding (FINE) and illustrate its uses on practical problems, including a biomedical application and document classification.
Index Terms: Information geometry, statistical manifold, dimensionality reduction, multidimensional scaling.
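The embedding step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a symmetric matrix `D` of pairwise dissimilarities between data sets is already available (in FINE these would be approximated Fisher information distances) and applies classical multidimensional scaling, which is only one of several MDS variants and not necessarily the one the authors used. The function name `classical_mds` and its parameters are illustrative.

```python
import numpy as np

def classical_mds(D, dim=2):
    """Embed n items in `dim` Euclidean dimensions from a symmetric
    n x n dissimilarity matrix D (assumed given, e.g. approximated
    Fisher information distances between data sets)."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Centering matrix removes the arbitrary translation of the embedding.
    J = np.eye(n) - np.ones((n, n)) / n
    # Double-centered matrix of squared dissimilarities (a Gram matrix
    # when D is exactly Euclidean).
    B = -0.5 * J @ (D ** 2) @ J
    # eigh returns eigenvalues in ascending order; keep the largest `dim`.
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    # Clip small negative eigenvalues caused by non-Euclidean input or rounding.
    top = np.clip(vals[idx], 0.0, None)
    return vecs[:, idx] * np.sqrt(top)
```

Each row of the returned array is the low-dimensional coordinate of one data set, which can then be passed to any standard clustering or classification routine.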
FINE: Information embedding for document classification
- in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing, 2008
Cited by 5 (5 self)
The problem of document classification concerns categorizing or grouping documents of various types. Each document can be represented as a bag of words, which has no straightforward Euclidean representation. Relative word counts form the basis for similarity metrics among documents. Endowing the vector of term frequencies with a Euclidean metric has no obvious justification. A more appropriate and commonly used assumption is that the data lie on a statistical manifold, or a manifold of probabilistic generative models. In this paper, we propose calculating a low-dimensional, information-based embedding of documents into Euclidean space. One component of our approach, motivated by information geometry, is the use of the Fisher information distance to define similarities between documents. The other component is the calculation of the Fisher metric over a lower-dimensional statistical manifold estimated in a nonparametric fashion from the data. We demonstrate that in the classification task, this information-driven embedding outperforms both a standard PCA embedding and other Euclidean embeddings of the term frequency vector.
Index Terms: Manifold learning, Riemannian manifold, geodesics, text classification, information geometry
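To make the idea of a Fisher information distance between documents concrete, the sketch below treats each document's relative word counts as a multinomial distribution and uses the well-known closed form for the Fisher information distance on the multinomial family, d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)). This is an illustration under that modeling assumption, not the nonparametric estimator described in the paper; the function names, the `smoothing` parameter, and the toy vocabulary are hypothetical.

```python
import math
from collections import Counter

def term_frequencies(tokens, vocabulary, smoothing=1e-9):
    """Relative word counts over a fixed vocabulary.
    `smoothing` is an illustrative guard against an all-zero count vector."""
    counts = Counter(tokens)
    raw = [counts[w] + smoothing for w in vocabulary]
    total = sum(raw)
    return [c / total for c in raw]

def multinomial_fisher_distance(p, q):
    """Closed-form Fisher information distance between two multinomials:
    twice the angle between the square-root vectors of p and q."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] so rounding error cannot push acos out of its domain.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))

# Toy usage with a hypothetical three-word vocabulary.
vocab = ["manifold", "metric", "football"]
doc_a = term_frequencies(["manifold", "metric", "metric"], vocab)
doc_b = term_frequencies(["football", "football"], vocab)
distance_ab = multinomial_fisher_distance(doc_a, doc_b)
```

Documents with no words in common sit near the maximum distance of pi, while documents with identical term frequencies are at distance zero, in contrast to a Euclidean metric on raw counts.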
Learning on statistical manifolds for clustering and visualization
- in Proceedings of Forty-Fifth Annual Allerton Conference on Communication, Control, and Computing, 2007

