Results 1–10 of 11
Learning spectral clustering, with application to speech separation
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 43 (5 self)
Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled datasets. We apply our learning algorithm to the blind one-microphone speech separation problem, casting the problem as one of segmentation of the spectrogram.
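The eigenstructure-based partitioning this abstract builds on can be sketched in a few lines: form a similarity matrix, normalize it, and read a two-way partition off the second eigenvector. A minimal sketch assuming numpy and a fixed Gaussian similarity (not the learned similarity matrix the paper proposes; `spectral_bipartition` and `sigma` are illustrative names):

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Two-way spectral clustering: threshold the second eigenvector of
    the normalized similarity matrix D^{-1/2} W D^{-1/2} at zero."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    W = np.exp(-d2 / (2 * sigma**2))                     # Gaussian similarity
    d = W.sum(1)
    M = W / np.sqrt(np.outer(d, d))                      # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                          # ascending eigenvalues
    return (vecs[:, -2] > 0).astype(int)                 # sign of 2nd-largest

# Two well-separated point groups; the split should recover them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)),
               rng.normal(4.0, 0.1, (10, 2))])
labels = spectral_bipartition(X)
```

Learning the similarity matrix, as the paper does, would replace the fixed Gaussian kernel here with one fitted to labelled partitions.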
Clustering by weighted cuts in directed graphs
 In Proceedings of the 2007 SIAM International Conference on Data Mining
, 2007
Cited by 11 (1 self)
In this paper we formulate spectral clustering in directed graphs as an optimization problem, the objective being a weighted cut in the directed graph. This objective extends several popular criteria like the normalized cut and the averaged cut to asymmetric affinity data. We show that this problem can be relaxed to a Rayleigh quotient problem for a symmetric matrix obtained from the original affinities and therefore a large body of the results and algorithms developed for spectral clustering of symmetric data immediately extends to asymmetric cuts.
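The reduction described above, an asymmetric cut relaxed to a symmetric Rayleigh quotient problem, can be illustrated on a toy directed graph. A plain symmetrization (A + Aᵀ)/2 stands in here for the paper's specific weighted construction; that substitution and the affinity values are assumptions of this sketch:

```python
import numpy as np

# Hypothetical asymmetric affinities for a 4-node directed graph:
# nodes {0,1} link strongly to each other, as do nodes {2,3}.
A = np.array([[0.0, 5.0, 0.1, 0.0],
              [4.0, 0.0, 0.0, 0.1],
              [0.0, 0.1, 0.0, 5.0],
              [0.1, 0.0, 4.0, 0.0]])

# Symmetrize (a simple stand-in for the paper's weighted construction),
# then apply the usual symmetric spectral machinery.
W = 0.5 * (A + A.T)
d = W.sum(1)
M = W / np.sqrt(np.outer(d, d))          # normalized, now symmetric
_, vecs = np.linalg.eigh(M)
labels = (vecs[:, -2] > 0).astype(int)   # two-way cut from 2nd eigenvector
```

Once the problem is symmetric, any standard spectral clustering routine applies unchanged, which is the point the abstract makes.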
An Information Theoretic Approach to Machine Learning
, 2005
Cited by 7 (2 self)
In this thesis, theory and applications of machine learning systems based on information theoretic criteria as performance measures are studied. A new clustering algorithm based on maximizing the Cauchy-Schwarz (CS) divergence measure between probability density functions (pdfs) is proposed. The CS divergence is estimated nonparametrically using the Parzen window technique for density estimation. The problem domain is transformed from discrete 0/1 cluster membership values to continuous membership values. A constrained gradient descent maximization algorithm is implemented. The gradients are stochastically approximated to reduce computational complexity, making the algorithm more practical. Parzen window annealing is incorporated into the algorithm to help avoid convergence to a local maximum. The clustering results obtained on synthetic and real data are encouraging. The Parzen window-based estimator for the CS divergence is shown to have a dual expression as a measure of the cosine of the angle between cluster mean vectors in a feature space determined by the eigenspectrum of a Mercer kernel matrix. A spectral clustering
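The Parzen-window estimate of the CS divergence mentioned above reduces, for Gaussian kernels, to means of pairwise kernel evaluations, since the kernel normalization constants cancel in the ratio. A sketch assuming numpy and a fixed window width (`cs_divergence` and `sigma` are illustrative names):

```python
import numpy as np

def cs_divergence(X, Y, sigma=1.0):
    """Nonparametric Cauchy-Schwarz divergence estimate between two samples:
    D_CS = -log( V_xy^2 / (V_xx * V_yy) ), where each V is a mean of
    Gaussian kernel evaluations over sample pairs (the two Parzen windows
    of width sigma convolve to a kernel of variance 2*sigma^2)."""
    def pot(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (4.0 * sigma**2)).mean()
    vxy, vxx, vyy = pot(X, Y), pot(X, X), pot(Y, Y)
    return -np.log(vxy**2 / (vxx * vyy))

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (50, 2))
Y_near = rng.normal(0.1, 1.0, (50, 2))   # almost the same distribution
Y_far = rng.normal(6.0, 1.0, (50, 2))    # well-separated distribution
```

The estimate is zero for identical samples and grows with separation, which is why maximizing it between clusters drives them apart.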
Model averaging and dimension selection for the singular value decomposition
 Journal of the American Statistical Association
, 2007
Cited by 7 (1 self)
Many multivariate data analysis techniques for an m × n matrix Y are related to the model Y = M + E, where Y is an m × n matrix of full rank and M is an unobserved mean matrix of rank K < (m ∧ n). Typically the rank of M is estimated in a heuristic way and then the least-squares estimate of M is obtained via the singular value decomposition of Y, yielding an estimate that can have a very high variance. In this paper we suggest a model-based alternative to the above approach by providing prior distributions and posterior estimation for the rank of M and the components of its singular value decomposition. In addition to providing more accurate inference, such an approach has the advantage of being extendable to more general data-analysis situations, such as inference in the presence of missing data and estimation in a generalized linear modeling framework.
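The heuristic the abstract contrasts itself with — choose K, then take the rank-K SVD truncation of Y as the least-squares estimate of M — is easy to sketch on synthetic data (assuming numpy; the paper's Bayesian treatment of K and the SVD components is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, K = 30, 20, 3

# Simulate Y = M + E with a rank-K mean matrix M and Gaussian noise E.
M = rng.normal(size=(m, K)) @ rng.normal(size=(K, n))
Y = M + 0.1 * rng.normal(size=(m, n))

# Heuristic least-squares estimate: truncate the SVD of Y at rank K.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
M_hat = U[:, :K] * s[:K] @ Vt[:K]

err_trunc = np.linalg.norm(M_hat - M)   # truncated estimate vs. truth
err_raw = np.linalg.norm(Y - M)         # raw data vs. truth
```

Even when K is chosen correctly, as it is in this toy example, the truncated estimate's variance is what motivates the model-averaging alternative the paper develops.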
Information Theoretic Spectral Clustering
 In Proceedings of International Joint Conference on Neural Networks
Cited by 6 (3 self)
We discuss a new information-theoretic framework for spectral clustering that is founded on the recently introduced Information Cut. A novel spectral clustering algorithm is proposed, where the clustering solution is given as a linearly weighted combination of certain top eigenvectors of the data affinity matrix. The Information Cut provides us with a theoretically well-defined graph-spectral cost function, and also establishes a close link between spectral clustering and nonparametric density estimation. As a result, a natural criterion for creating the data affinity matrix is provided. We present preliminary clustering results to illustrate some of the properties of our algorithm, and we also make comparative remarks.
On Potts Model Clustering, Kernel K-Means and Density Estimation
 JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS
, 2008
Cited by 3 (1 self)
... follow the same recipe: (i) choose a measure of similarity between observations; (ii) define a figure of merit assigning a large value to partitions of the data that put similar observations in the same cluster; and (iii) optimize this figure of merit over partitions. Potts model clustering represents an interesting variation on this recipe. Blatt, Wiseman, and Domany defined a new figure of merit for partitions that is formally similar to the Hamiltonian of the Potts model for ferromagnetism, extensively studied in statistical physics. For each temperature T, the Hamiltonian defines a distribution assigning a probability to each possible configuration of the physical system or, in the language of clustering, to each partition. Instead of searching for a single partition optimizing the Hamiltonian, they sampled a large number of partitions from this distribution for a range of temperatures. They proposed a heuristic for choosing an appropriate temperature and from the sample of partitions associated with this chosen temperature, they then derived what we call a consensus clustering: two observations are put in the same consensus cluster if they belong to the same cluster in the majority of the random partitions. In a sense, the consensus clustering is an “average” of plausible
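The Hamiltonian and the consensus rule described above can be made concrete in a small sketch. Here three hand-written partitions stand in for the Monte Carlo draws at the chosen temperature, and all names and affinity values are illustrative (numpy assumed):

```python
import numpy as np

def potts_energy(labels, J):
    """Potts-style Hamiltonian H = sum_{i<j} J_ij * (1 - delta(s_i, s_j)):
    similar pairs split across clusters are penalized."""
    H = 0.0
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                H += J[i, j]
    return H

def consensus(partitions):
    """Consensus clustering: put i and j together if they share a cluster
    in a strict majority of the sampled partitions."""
    P = np.array(partitions)
    n = P.shape[1]
    same = (P[:, :, None] == P[:, None, :]).mean(0) > 0.5
    # Connected components of the majority relation (tiny union-find).
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if same[i, j]:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

# Toy couplings: observations {0,1,2} mutually similar, {3,4} similar.
J = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4)]:
    J[i, j] = J[j, i] = 1.0

# Three hypothetical sampled partitions (stand-ins for Monte Carlo draws).
samples = [[0, 0, 0, 1, 1],
           [0, 0, 1, 1, 1],
           [0, 0, 0, 2, 2]]
labels = consensus(samples)
```

Even though the middle sample misplaces observation 2, the majority vote recovers the groups {0, 1, 2} and {3, 4}, which is the averaging effect the abstract describes.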
Spectral clustering for speech separation
Spectral clustering refers to a class of recent techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this chapter, we introduce the main concepts and algorithms together with recent advances in learning the similarity matrix from data. The techniques are illustrated on the blind one-microphone speech separation problem, by casting the problem as one of segmentation of the spectrogram.
Functional Connectivity Mapping Using the Ferromagnetic Potts
, 2007
An unsupervised stochastic clustering method based on the ferromagnetic Potts spin model is introduced as a powerful tool to determine functionally connected regions. The method provides an intuitively simple approach to clustering and makes no assumptions about the number of clusters in the data or their underlying distribution. The performance of the method and its dependence on the intrinsic parameters (size of the neighborhood, form of the interaction term, etc.) is investigated on simulated data and real fMRI data acquired during a conventional periodic finger tapping task. The merits of incorporating Euclidean information into the connectivity analysis are discussed. The ability of Potts model clustering to uncover hidden structure in complex data is demonstrated through its application to resting-state data to determine functional connectivity networks of the anterior and posterior cingulate cortices for a group of nine healthy male subjects. Hum Brain Mapp 29:422–
unknown title
, 2008
Model averaging and dimension selection for the singular value decomposition
Unsupervised Learning of Boosted Tree . . .
, 2006
This study proposes an unsupervised learning approach for the task of hand pose recognition. Considering the large variation in hand poses, classification using a decision tree seems highly suitable for this purpose. Various research works have used boosted decision trees and have shown encouraging results for pose recognition. This work also employs a boosted classifier tree learned in an unsupervised manner for hand pose recognition. We use a recursive two-way spectral clustering method, namely the Normalized Cut method (NCut), to generate the decision tree. A binary boosting classifier is then learned at each node of the tree generated by the clustering algorithm. Since the output of the clustering algorithm may contain outliers in practice, the variant of boosting algorithm applied at each node is the Soft Margin version of AdaBoost, which was developed to maximize the classifier margin in a noisy environment. We propose a novel approach to learn the weak classifiers of the boosting process using the partitioning vector given by the NCut algorithm. The algorithm applies a linear regression of feature responses with the partitioning vector and utilizes the sample weights used in boosting to learn the weak hypotheses. Initial results show satisfactory performance in recognizing complex hand poses with large variations in background and illumination. This framework of tree classifiers can also be applied to general multi-class object recognition.
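The recursive two-way NCut step that generates the tree can be sketched as follows. The boosted classifiers trained at each node are omitted; `ncut_split` and `cluster_tree` are illustrative names, numpy is assumed, and the tiny affinity matrix is a toy stand-in for image-feature similarities:

```python
import numpy as np

def ncut_split(W):
    """Two-way split from the normalized-cut relaxation: threshold the
    second eigenvector of D^{-1/2} W D^{-1/2} at zero."""
    d = W.sum(1)
    M = W / np.sqrt(np.outer(d, d))
    _, vecs = np.linalg.eigh(M)
    return vecs[:, -2] > 0

def cluster_tree(W, idx=None, depth=2):
    """Recursively bipartition with NCut to grow a binary cluster tree,
    as in the tree-construction step the abstract describes."""
    if idx is None:
        idx = np.arange(W.shape[0])
    if depth == 0 or len(idx) < 2:
        return idx.tolist()
    mask = ncut_split(W[np.ix_(idx, idx)])
    return [cluster_tree(W, idx[mask], depth - 1),
            cluster_tree(W, idx[~mask], depth - 1)]

# Toy affinities: samples {0,1} are similar, as are samples {2,3}.
W = np.array([[0.00, 1.00, 0.01, 0.01],
              [1.00, 0.00, 0.01, 0.01],
              [0.01, 0.01, 0.00, 1.00],
              [0.01, 0.01, 1.00, 0.00]])
tree = cluster_tree(W, depth=1)
```

In the full method, the partitioning vector `vecs[:, -2]` at each node would additionally drive the regression-based weak learners of the Soft Margin AdaBoost stage.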