Results 1–10 of 13
Dimensionality reduction on statistical manifolds
2009
Abstract

Cited by 8 (5 self)
This work would not have been possible without the support of many individuals, and I would be remiss if I did not take the opportunity to thank them. To start, I give the utmost thanks to my advisor, Professor Alfred Hero. He not only took me under his wing as a research assistant, but was a major contributor to my professional development. While his otherworldly knowledge base was critical to my maturation as a researcher, his motivation, mentorship, and words of advice kept me going during difficult and stressful times. I would also like to thank Professor Raviv Raich, who has worked side-by-side with me throughout my entire research experience. Whenever I came upon a road block, I knew I could count on Raviv to have the patience and wherewithal to guide me through. My ability to progress so quickly throughout this process was due in large part to my amazing research project. I owe this entirely to Dr. William Finn and the Department of Pathology at the University of Michigan, who came to us with an idea and a lot of data. Dr. Finn was always available for discussion and insight into the process of flow cytometry, and throughout my development he has shown a genuine excitement for all of the work I have done. Without his knowledge, support, and enthusiasm, none of this work would have been completed. This work has also benefited from discussions with the remainder of my committee members. A special thanks goes to Professor Elizaveta Levina and Professor Clayton Scott for their input and support. Their level of expertise in many of the areas directly coinciding with my research topics was very beneficial, and the third-party …
Information preserving component analysis: Data projections for flow cytometry analysis
 Signal Process. (Special Issue on Digital Image Processing Techniques for Oncology)
2009
Abstract

Cited by 6 (5 self)
Flow cytometry is often used to characterize the malignant cells in leukemia and lymphoma patients, traced to the level of the individual cell. Typically, flow cytometric data analysis is performed through a series of 2-dimensional projections onto the axes of the data set. Through the years, clinicians have determined combinations of different fluorescent markers which generate relatively known expression patterns for specific subtypes of leukemia and lymphoma – cancers of the hematopoietic system. By only viewing a series of 2-dimensional projections, the high-dimensional nature of the data is rarely exploited. In this paper we present a means of determining a low-dimensional projection which maintains the high-dimensional relationships (i.e. information) between differing oncological data sets. By using machine learning techniques, we allow clinicians to visualize data in a low dimension defined by a linear combination of all of the available markers, rather than just 2 at a time. This provides an aid in diagnosing similar forms of cancer, as well as a means for variable selection in exploratory flow cytometric research. We refer to our method as Information Preserving Component Analysis (IPCA).
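The abstract does not give IPCA's optimization details, but the idea of a linear projection that preserves the "information" between data sets can be sketched under stated assumptions: here the information between two data sets is stood in for by the closed-form KL divergence between Gaussian fits, and the projection is chosen by random search rather than the paper's Fisher-information-based optimization. `gaussian_kl` and `best_projection` are illustrative names, not the authors' API.

```python
import numpy as np

def gaussian_kl(X, Y):
    """KL divergence KL(N1 || N2) between Gaussian fits to two samples
    (a crude plug-in for the information between data sets)."""
    m1, m2 = X.mean(0), Y.mean(0)
    d = X.shape[1]
    S1 = np.cov(X.T) + 1e-6 * np.eye(d)
    S2 = np.cov(Y.T) + 1e-6 * np.eye(d)
    diff = m2 - m1
    return 0.5 * (np.trace(np.linalg.solve(S2, S1))
                  + diff @ np.linalg.solve(S2, diff)
                  - d + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def best_projection(X, Y, dim=2, trials=200, seed=0):
    """Search random orthonormal projections for the one whose projected
    divergence best matches the full-dimensional value."""
    rng = np.random.default_rng(seed)
    target = gaussian_kl(X, Y)
    best, best_err = None, np.inf
    for _ in range(trials):
        A, _ = np.linalg.qr(rng.normal(size=(X.shape[1], dim)))
        err = abs(gaussian_kl(X @ A, Y @ A) - target)
        if err < best_err:
            best, best_err = A, err
    return best

rng = np.random.default_rng(3)
X = rng.normal(0, 1, (300, 5))
Y = rng.normal(0, 1, (300, 5))
Y[:, 0] += 2.0                     # the two data sets differ in one direction
A = best_projection(X, Y, dim=2, trials=50, seed=1)
print(A.shape)  # (5, 2)
```

Random search is used only to keep the sketch short; a gradient method over the projection matrix would be the practical choice.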
Robust Object Pose Estimation via Statistical Manifold Modeling
Abstract

Cited by 3 (1 self)
We propose a novel statistical manifold modeling approach that is capable of classifying poses of object categories from video sequences by simultaneously minimizing the intra-class variability and maximizing the inter-pose distance. Following the intuition that an object part based representation and a suitable part selection process may help achieve our purpose, we formulate the part selection problem from a statistical manifold modeling perspective and treat part selection as adjusting the manifold of the object (parameterized by pose) by means of the manifold “alignment” and “expansion” operations. We show that manifold alignment and expansion are equivalent, respectively, to minimizing the intra-class distance given a pose and to increasing the inter-pose distance given an object instance. We formulate and solve this (otherwise intractable) part selection problem as a combinatorial optimization problem using graph analysis techniques. Quantitative and qualitative experimental analysis validates our theoretical claims.
SPHERICAL LAPLACIAN INFORMATION MAPS (SLIM) FOR DIMENSIONALITY REDUCTION
Abstract

Cited by 3 (2 self)
There have been several recently presented works on finding information-geometric embeddings using the properties of statistical manifolds. These methods have generally focused on embedding probability density functions into an open Euclidean space. In this paper we propose adding an additional constraint by embedding onto the surface of the sphere in an unsupervised manner. This additional constraint is shown to have superior performance for both manifold reconstruction and visualization when the true underlying statistical manifold is that of a low-dimensional sphere. We call the proposed method Spherical Laplacian Information Maps (SLIM), and we illustrate its utilization as a proof-of-concept on both real and synthetic data. Index Terms — Information geometry, statistical manifold, dimensionality reduction
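A minimal way to see what a sphere-constrained embedding looks like, assuming only the abstract's description: run Laplacian eigenmaps on a divergence matrix, then radially project the coordinates onto the unit sphere. SLIM enforces the constraint inside the optimization; this post-hoc projection is only an illustrative stand-in.

```python
import numpy as np

def spherical_embedding(D, dim=3, sigma=1.0):
    """Sketch of a sphere-constrained embedding: Laplacian eigenmaps on
    a divergence matrix D, followed by radial projection onto the unit
    sphere (not SLIM's constrained optimization)."""
    W = np.exp(-D**2 / (2 * sigma**2))       # affinities from divergences
    L = np.diag(W.sum(1)) - W                # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    Y = vecs[:, 1:dim + 1]                   # skip the constant eigenvector
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)  # onto the sphere

# points with circular divergence structure (an underlying 1-sphere)
n = 12
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
D = np.abs(theta[:, None] - theta[None, :])
D = np.minimum(D, 2 * np.pi - D)             # circular distance
Y = spherical_embedding(D, dim=3)
print(np.allclose(np.linalg.norm(Y, axis=1), 1.0))  # True
```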
Information-geometric dimensionality reduction
 IEEE Signal Processing Magazine
2011
Abstract

Cited by 2 (0 self)
We consider the problem of dimensionality reduction and manifold learning when the domain of interest is a set of probability distributions instead of a set of Euclidean data vectors. In this problem, one seeks to discover a low-dimensional representation, called an embedding, that preserves certain properties such as distance between measured distributions or separation between classes of distributions. Such representations are useful for data visualization and clustering. While a standard Euclidean dimension reduction method like PCA, ISOMAP, or Laplacian Eigenmaps can easily be applied to distributional data – e.g. by quantization and vectorization of the distributions – this may not provide the best low-dimensional embedding. This is because the most natural measure of dissimilarity between probability distributions is the information divergence and not the standard Euclidean distance. If the information divergence is adopted then the space of probability distributions becomes a non-Euclidean space called an information geometry. This article presents methods that are specifically designed for the low-dimensional embedding of information-geometric data, and we illustrate these methods for visualization in flow cytometry and demography analysis. Index Terms — Information geometry, dimensionality reduction, statistical manifold, classification
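The pipeline this abstract describes — replace Euclidean distances between distributions with an information divergence, then embed — can be sketched with histogram estimates and a symmetrized KL divergence (a common computable stand-in for the Fisher information distance; the particular bins and bandwidth below are illustrative choices):

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence between two discrete distributions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def laplacian_eigenmap(D, dim=2, sigma=1.0):
    """Embed a divergence matrix D with Laplacian eigenmaps."""
    W = np.exp(-D**2 / (2 * sigma**2))       # affinities from divergences
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1:dim + 1]                # skip the constant eigenvector

# toy distributional data: histograms of three shifted Gaussians
rng = np.random.default_rng(0)
hists = [np.histogram(rng.normal(mu, 1, 500), bins=20, range=(-5, 8))[0]
         for mu in (0.0, 0.5, 3.0)]
n = len(hists)
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = symmetric_kl(hists[i], hists[j])
Y = laplacian_eigenmap(D)
print(Y.shape)  # (3, 2)
```

The divergence matrix already reflects the information geometry: the two nearby distributions (means 0.0 and 0.5) are far less divergent than either is from the shifted one (mean 3.0), and the embedding inherits that structure.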
On Local Intrinsic Dimension Estimation and Its Applications
Abstract

Cited by 2 (1 self)
Abstract—In this paper, we present multiple novel applications for local intrinsic dimension estimation. There has been much work done on estimating the global dimension of a data set, typically for the purposes of dimensionality reduction. We show that by estimating dimension locally, we are able to extend the uses of dimension estimation to many applications that are not possible with global dimension estimation. Additionally, we show that local dimension estimation can be used to obtain a better global dimension estimate, alleviating the negative bias that is common to all known dimension estimation algorithms. We illustrate the uses of local dimension estimation in additional applications, such as learning on statistical manifolds, network anomaly detection, clustering, and image segmentation. Index Terms—Geodesics, image segmentation, intrinsic dimension, manifold learning, nearest neighbor graph.
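The abstract does not name a specific estimator, but a standard choice for pointwise dimension estimates is the Levina–Bickel maximum-likelihood estimator over k-nearest-neighbor distances; a minimal sketch, with the brute-force distance computation used only for brevity:

```python
import numpy as np

def local_dimension_mle(X, k=10):
    """Levina-Bickel MLE of intrinsic dimension at each point, from the
    distances to its k nearest neighbors: m(x) = (k-1) / sum_j log(T_k/T_j)."""
    n = X.shape[0]
    # brute-force pairwise Euclidean distances (fine for small n)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    est = np.empty(n)
    for i in range(n):
        d = np.sort(D[i])[1:k + 1]           # skip the zero self-distance
        est[i] = (k - 1) / np.sum(np.log(d[-1] / d[:-1]))
    return est

rng = np.random.default_rng(1)
# a 2-D subspace linearly embedded in 5-D ambient space
X = rng.normal(size=(400, 2)) @ rng.normal(size=(2, 5))
d_hat = local_dimension_mle(X, k=15)
print(d_hat.mean())  # near the true intrinsic dimension of 2
```

Because the estimate is computed per point, the same routine supports the local uses listed above (e.g. flagging anomalous regions whose local dimension differs from their surroundings).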
Information preserving embeddings for discrimination
 In Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop
2009
Abstract

Cited by 1 (1 self)
Dimensionality reduction is required for ‘human in the loop’ analysis of high dimensional data. We present a method for dimensionality reduction that is tailored to tasks of data set discrimination. As contrasted with Euclidean dimensionality reduction, which preserves Euclidean distances or Euler angles in the lower-dimensional space, our method seeks to preserve information as measured by the Fisher information distance, or approximations thereof, on the data-associated probability density functions. We will illustrate the approach for multi-class object discrimination problems. Index Terms — Information geometry, statistical manifold, dimensionality reduction, classification, object recognition
Shrinkage Fisher Information Embedding of High Dimensional Feature Distributions
Abstract
Abstract — In this paper, we introduce a dimensionality reduction method that can be applied to clustering of high-dimensional empirical distributions. The proposed approach is based on a stabilized information-geometric representation of the feature distributions. The problem of dimensionality reduction on spaces of distribution functions arises in many applications including hyperspectral imaging, document clustering, and classifying flow cytometry data. Our method is a shrinkage-regularized version of the Fisher information distance, which we call shrinkage FINE (sFINE), implemented by Steinian shrinkage estimation of the matrix of Kullback–Leibler distances between feature distributions. The proposed method involves computing similarities using the shrinkage-regularized Fisher information distance between probability density functions (PDFs) of the data features, then applying Laplacian eigenmaps on the derived similarity matrix to accomplish the embedding and perform clustering. The shrinkage regularization controls the tradeoff between bias and variance and is especially well-suited for clustering empirical probability distributions of high-dimensional data sets. We also show significant gains in clustering performance on both a UCI data set and a spam data set. Finally, we demonstrate the superiority of embedding and clustering distributional data using sFINE as compared to other state-of-the-art methods such as nonparametric information clustering, support vector machines (SVM), and sparse K-means.
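As a toy illustration of the shrinkage step only, the sketch below shrinks a noisy divergence matrix toward its off-diagonal mean. This linear shrinkage with a fixed weight is a simplified stand-in for the Steinian shrinkage estimator described above, where the weight would be chosen from the data's bias-variance tradeoff.

```python
import numpy as np

def shrink_distance_matrix(D, lam=0.2):
    """Shrink a noisy divergence matrix toward its off-diagonal mean
    (simplified stand-in for Steinian shrinkage; lam is fixed here,
    not estimated from the data)."""
    n = D.shape[0]
    off = ~np.eye(n, dtype=bool)
    target = D[off].mean()
    D_s = (1 - lam) * D + lam * target
    np.fill_diagonal(D_s, 0.0)               # distances to self stay zero
    return D_s

# a noisy estimated KL-distance matrix for 4 feature distributions
rng = np.random.default_rng(2)
base = np.abs(rng.normal(1.0, 0.5, size=(4, 4)))
D = (base + base.T) / 2
np.fill_diagonal(D, 0.0)
D_s = shrink_distance_matrix(D, lam=0.3)
# shrinkage reduces the spread (variance) of the off-diagonal entries
off = ~np.eye(4, dtype=bool)
print(D_s[off].std() < D[off].std())  # True
```

The shrunken matrix would then feed the Laplacian-eigenmaps embedding exactly as the raw divergence matrix would, trading a little bias for lower variance in the estimated distances.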
Statistical File Matching of Flow Cytometry Data
Abstract
Flow cytometry is a technology that rapidly measures antigen-based markers associated with cells in a cell population. Although analysis of flow cytometry data has traditionally considered one or two markers at a time, there has been increasing interest in multidimensional analysis. However, flow cytometers are limited in the number of markers they can jointly observe, which is typically a fraction of the number of markers of interest. For this reason, practitioners often perform multiple assays based on different, overlapping combinations of markers. In this paper, we address the challenge of imputing the high-dimensional jointly distributed values of marker attributes based on overlapping marginal observations. We show that simple nearest-neighbor-based imputation can lead to spurious subpopulations in the imputed data, and introduce an alternative approach based on nearest-neighbor imputation restricted to a cell’s subpopulation. This requires us to perform clustering with missing data, which we address with a mixture model approach and a novel EM algorithm. Since mixture model fitting may be ill-posed in this context, we also develop techniques to initialize the EM algorithm using domain knowledge. We demonstrate our approach on real flow cytometry data.
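The subpopulation-restricted nearest-neighbor imputation can be sketched as follows. Cluster labels are assumed already available (in the paper they come from the missing-data EM algorithm), and `impute_within_cluster` is an illustrative name, not the authors' code:

```python
import numpy as np

def impute_within_cluster(X_obs, X_ref, labels_obs, labels_ref):
    """Nearest-neighbor imputation restricted to a cell's subpopulation:
    for each row of X_obs (observed markers only), find its nearest
    neighbor in the reference file among cells with the SAME cluster
    label, and copy that neighbor's values for the unobserved markers."""
    n_obs_cols = X_obs.shape[1]
    imputed = np.empty((X_obs.shape[0], X_ref.shape[1] - n_obs_cols))
    for i, (x, c) in enumerate(zip(X_obs, labels_obs)):
        pool = np.where(labels_ref == c)[0]          # same subpopulation only
        d = np.linalg.norm(X_ref[pool, :n_obs_cols] - x, axis=1)
        imputed[i] = X_ref[pool[d.argmin()], n_obs_cols:]
    return imputed

# two well-separated subpopulations; the third marker is unobserved in X_obs
X_ref = np.array([[0.0, 0.0, 1.0],
                  [0.1, 0.0, 1.0],
                  [5.0, 5.0, 9.0],
                  [5.1, 5.0, 9.0]])
labels_ref = np.array([0, 0, 1, 1])
X_obs = np.array([[0.05, 0.0], [5.05, 5.0]])
labels_obs = np.array([0, 1])
Z = impute_within_cluster(X_obs, X_ref, labels_obs, labels_obs * 0 + labels_ref[::2] + labels_obs) if False else impute_within_cluster(X_obs, X_ref, labels_obs, labels_ref)
print(Z[:, 0])  # [1. 9.]
```

Restricting the candidate pool to one cluster is what prevents the spurious subpopulations: an unrestricted nearest neighbor could mix marker values across subpopulations whose observed markers happen to overlap.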
Domain Adaptation on the Statistical Manifold
Abstract
In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both of these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard object recognition benchmark.
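For univariate Gaussians, the kind of Riemannian distance the authors advocate (the Fisher–Rao geodesic distance) has a closed form via the hyperbolic half-plane, which makes it easy to compute and compare distributions without sampling. This sketch covers only the 1-D Gaussian family, not the general setting of the paper:

```python
import numpy as np

def fisher_rao_gaussian(mu1, s1, mu2, s2):
    """Fisher-Rao geodesic distance between univariate Gaussians.
    Under the Fisher metric ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2, the
    change of variables (mu, sigma) -> (mu / sqrt(2), sigma) maps the
    family onto the hyperbolic half-plane, scaled by sqrt(2)."""
    x1, x2 = mu1 / np.sqrt(2), mu2 / np.sqrt(2)
    num = (x1 - x2) ** 2 + (s1 - s2) ** 2
    return np.sqrt(2) * np.arccosh(1 + num / (2 * s1 * s2))

# identical distributions are at distance zero; distance grows with shift
print(fisher_rao_gaussian(0, 1, 0, 1))   # 0.0
print(fisher_rao_gaussian(0, 1, 2, 1) > fisher_rao_gaussian(0, 1, 1, 1))  # True
```

Unlike a sample-based MMD estimate, this distance is a true geodesic length on the statistical manifold, which is exactly the structural information the abstract argues the MMD ignores.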