Automatic choice of dimensionality for PCA (2000)

by T Minka
Results 1 - 10 of 43

Gaussian process latent variable models for visualisation of high dimensional data

by Michalis K. Titsias, Neil D. Lawrence - Adv. in Neural Inf. Proc. Sys, 2004
Cited by 91 (1 self)
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs.

Constructing Internet Coordinate System Based on Delay Measurement

by Hyuk Lim, Jennifer C. Hou, Chong-Ho Choi, 2003
Cited by 85 (3 self)
In this paper, we consider the problem of how to represent the locations of Internet hosts in a Cartesian coordinate system to facilitate estimate of the network distance between two arbitrary Internet hosts. We envision an infrastructure that consists of beacon nodes and provides the service of estimating network distance between two hosts without direct delay measurement. We show that the principal component analysis (PCA) technique can effectively extract topological information from delay measurements between beacon hosts. Based on PCA, we devise a transformation method that projects the distance data space into a new coordinate system of (much) smaller dimensions. The transformation retains as much topological information as possible and yet enables end hosts to easily determine their locations in the coordinate system. The resulting new coordinate system is termed as the Internet Coordinate System (ICS). As compared to existing work (e.g., IDMaps [1] and GNP [2]), ICS incurs smaller computation overhead in calculating the coordinates of hosts and smaller measurement overhead (required for end hosts to measure their distances to beacon hosts). Finally, we show via experimentation with real-life data sets that ICS is robust and accurate, regardless of the number of beacon nodes (as long as it exceeds certain threshold) and the complexity of network topology.

Segmenting Motion Capture Data into Distinct Behaviors

by Jernej Barbic, Alla Safonova, Jia-Yu Pan, Christos Faloutsos, Jessica K. Hodgins, Nancy S. Pollard - In Graphics Interface, 2004
Cited by 61 (5 self)
Much of the motion capture data used in animations, commercials, and video games is carefully segmented into distinct motions either at the time of capture or by hand after the capture session. As we move toward collecting more and longer motion sequences, however, automatic segmentation techniques will become important for processing the results in a reasonable time frame.

Is there something out there? Inferring space from sensorimotor dependencies

by D. Philipona, J.K. O'Regan, J.-P. Nadal - Neural Computation, 2002
Cited by 45 (3 self)
This paper suggests that in biological organisms, the perceived structure of reality, in particular the notions of body, environment, space, object, and attribute, could be a consequence of an effort on the part of brains to account for the dependency between their inputs and their outputs in terms of a small number of parameters. To validate this idea, a procedure is demonstrated whereby the brain of an organism with arbitrary input and output connectivity can deduce the dimensionality of the rigid group of the space underlying its input output relationship, that is the dimension of what the organism will call physical space.

Minimum description length shape and appearance models

by Hans Henrik Thodberg - In Image Processing Medical Imaging, IPMI, 2003
Cited by 35 (1 self)
The Minimum Description Length (MDL) approach to shape modelling is reviewed. It solves the point correspondence problem of selecting points on shapes defined as curves so that the points correspond across a data set. An efficient numerical implementation is presented and made available as open source Matlab code. The problems with the early MDL approaches are discussed. Finally the MDL approach is extended to an MDL Appearance Model, which is proposed as a means to perform unsupervised image segmentation.

Probabilistic Independent Component Analysis

by Christian F. Beckmann, Stephen M. Smith, 2003
Cited by 28 (8 self)
Independent Component Analysis is becoming a popular exploratory method for analysing complex data such as that from FMRI experiments. The application of such 'model-free' methods, however, has been somewhat restricted both by the view that results can be uninterpretable and by the lack of ability to quantify statistical significance. We present an integrated approach to Probabilistic ICA for FMRI data that allows for non-square mixing in the presence of Gaussian noise. We employ an objective estimation of the amount of Gaussian noise through Bayesian analysis of the true dimensionality of the data, i.e. the number of activation and non-Gaussian noise sources. Reduction of the data to this 'true' subspace before the ICA decomposition automatically results in an estimate of the noise, leading to the ability to assign significance to voxels in ICA spatial maps. Estimation of the number of intrinsic sources not only enables us to carry out probabilistic modelling, but also achieves an asymptotically unique decomposition of the data. This reduces problems of interpretation, as each final independent component is now much more likely to be due to only one physical or physiological process. We also describe other improvements to standard ICA, such as temporal pre-whitening and variance normalisation of timeseries, the latter being particularly useful in the context of dimensionality reduction when weak activation is present. We discuss the use of prior information about the spatiotemporal nature of the source processes, and an alternative-hypothesis testing approach for inference, using Gaussian mixture models. The performance of our approach is illustrated and evaluated on real and complex artificial FMRI data, and compared to the spatio-temporal accuracy of results obtaine...

Non-linear Matrix Factorization with Gaussian Processes

by Neil D. Lawrence, Raquel Urtasun
Cited by 20 (1 self)
A popular approach to collaborative filtering is matrix factorization. In this paper we develop a non-linear probabilistic matrix factorization using Gaussian process latent variable models. We use stochastic gradient descent (SGD) to optimize the model. SGD allows us to apply Gaussian processes to data sets with millions of observations without approximate methods. We apply our approach to benchmark movie recommender data sets. The results show better than previous state-of-the-art performance.

Signal Detection Using ICA: Application to Chat Room Topic Spotting

by Thomas Kolenda, Lars Kai Hansen, Jan Larsen, 2001
Cited by 18 (3 self)
Signal detection and pattern recognition for online grouping huge amounts of data and retrospective analysis is becoming increasingly important as knowledge based standards, such as XML and advanced MPEG, gain popularity. Independent component analysis (ICA) can be used to both cluster and detect signals with weak a priori assumptions in multimedia contexts. ICA of real world data is typically performed without knowledge of the number of non-trivial independent components, hence, it is of interest to test hypotheses concerning the number of components or simply to test whether a given set of components is significant relative to a "white noise" null hypothesis. It was recently proposed to use the so-called Bayesian information criterion (BIC) approximation, for estimation of such probabilities of competing hypotheses. Here, we apply this approach to the understanding of chat. We show that ICA can detect meaningful context structures in a chat room log file.

On ranking the effectiveness of searches

by Vishwa Vinay, Ingemar J. Cox, Natasa Milic-Frayling, Ken Wood - In: Proc. of the 29th Annual Int’l ACM SIGIR Conf. on Research and Development in Information Retrieval, 2006
Cited by 17 (0 self)
There is a growing interest in estimating the effectiveness of search. Two approaches are typically considered: examining the search queries and examining the retrieved document sets. In this paper, we take the latter approach. We use four measures to characterize the retrieved document sets and estimate the quality of search. These measures are (i) the clustering tendency as measured by the Cox-Lewis statistic, (ii) the sensitivity to document perturbation, (iii) the sensitivity to query perturbation and (iv) the local intrinsic dimensionality. We present experimental results for the task of ranking 200 queries according to the search effectiveness over the TREC (discs 4 and 5) dataset. Our ranking of queries is compared with the ranking based on the average precision using the Kendall τ statistic. The best individual estimator is the sensitivity to document perturbation and yields Kendall τ of 0.521. When combined with the clustering tendency based on the Cox-Lewis statistic and the query perturbation measure, it results in Kendall τ of 0.562 which to our knowledge is the highest correlation with the average precision reported to date.
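The Kendall τ comparison described in this abstract can be illustrated with a minimal sketch. It assumes the simplest variant (tau-a, with no tie correction), which may differ from the variant the authors used; the function name `kendall_tau` and the sample scores are hypothetical and not taken from the paper.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a between two equal-length score lists.

    Assumption: no tie correction is applied; tied pairs count as
    neither concordant nor discordant.
    """
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two items to compare rankings")
    concordant = 0
    discordant = 0
    # Compare every pair of queries once.
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1  # both scores order the pair the same way
        elif sign < 0:
            discordant += 1  # the two scores disagree on the pair
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: an estimator's score and the measured average
# precision for four queries.
estimated = [0.9, 0.4, 0.7, 0.1]
average_precision = [0.8, 0.5, 0.3, 0.2]
tau = kendall_tau(estimated, average_precision)
```

A value near 1 means the estimator orders the queries almost exactly as average precision does, a value near 0 means the two orderings are unrelated, and a value near -1 means they are reversed.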

ICA: Model order selection and dynamic source models

by W.D. Penny, S. J. Roberts, 2001
Cited by 12 (2 self)
proved source estimation. The second part of this chapter looks at the use of such dynamic source models, where the sources are modelled using a generalised autoregressive (GAR) process. This is the usual autoregressive process but where the noise has a Generalised Exponential (GE) distribution instead of the usual Gaussian. This chapter consists of six further sections. The first describes the probability model for non-square ICA and derives the Laplace approximation required to calculate the data likelihood. The second section describes the decorrelating manifold and the third describes ICA and PCA model order selection methods. Section four describes different source models including the GAR process. This includes a description of its own model order criterion for determining the number of taps in the GAR filter. Section five describes results from applying the above methods to the unmixing of music sources and the chapter is