Results 1  10
of
47
Support Vector Clustering
, 2001
"... We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several compo ..."
Abstract

Cited by 162 (1 self)
 Add to MetaCart
We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points. We present a simple algorithm for identifying these clusters. The width of the Gaussian kernel controls the scale at which the data is probed while the soft margin constant helps coping with outliers and overlapping clusters. The structure of a dataset is explored by varying the two parameters, maintaining a minimal number of support vectors to assure smooth cluster boundaries. We demonstrate the performance of our algorithm on several datasets.
Multivariate information bottleneck
, 2001
"... The Information bottleneck method is an unsupervised nonparametric data organization technique. Given a joint distribution¢¤£¦¥¨§�©� � , this method constructs a new variable � that extracts partitions, or clusters, over the values of ¥ that are informative about ©. The information bottleneck has a ..."
Abstract

Cited by 86 (14 self)
 Add to MetaCart
The Information bottleneck method is an unsupervised nonparametric data organization technique. Given a joint distribution¢¤£¦¥¨§�©� � , this method constructs a new variable � that extracts partitions, or clusters, over the values of ¥ that are informative about ©. The information bottleneck has already been applied to document classification, gene expression, neural code, and spectral analysis. In this paper, we introduce a general principled framework for multivariate extensions of the information bottleneck method. This allows us to consider multiple systems of data partitions that are interrelated. Our approach utilizes Bayesian networks for specifying the systems of clusters and what information each captures. We show that this construction provides insight about bottleneck variations and enables us to characterize solutions of these variations. We also present a general framework for iterative algorithms for constructing solutions, and apply it to several examples. 1
Pagerank without hyperlinks: structural reranking using links induced by language models
 In Proceedings of SIGIR
, 2005
"... Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural reranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider gener ..."
Abstract

Cited by 80 (11 self)
 Add to MetaCart
Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural reranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of reranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard languagemodelbased retrieval is quite effective at improving precision at top ranks.
The power of word clusters for text classification
 In 23rd European Colloquium on Information Retrieval Research
, 2001
"... The recently introduced Information Bottleneck method [21] provides an information theoretic framework, for extracting features of one variable, that are relevant for the values of another variable. Several previous works already suggested applying this method for document clustering, gene expressio ..."
Abstract

Cited by 68 (6 self)
 Add to MetaCart
The recently introduced Information Bottleneck method [21] provides an information theoretic framework, for extracting features of one variable, that are relevant for the values of another variable. Several previous works already suggested applying this method for document clustering, gene expression data analysis, spectral analysis and more. In this work we present a novel implementation of this method for supervised text classification. Specifically, we apply the information bottleneck method to find wordclusters that preserve the information about document categories and use these clusters as features for classification. Previous work [1] used a similar clustering procedure to show that wordclusters can significantly reduce the feature space dimensionality, with only a minor change in classification accuracy. In this work we present similar results and go further to show that when the training sample is small word clusters can yield significant improvement in classification accuracy (up to ¢¡¤£) over the performance using the words directly. 1
Diffusion maps, spectral clustering and eigenfunctions of fokkerplanck operators
 in Advances in Neural Information Processing Systems 18
, 2005
"... This paper presents a diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points ..."
Abstract

Cited by 65 (10 self)
 Add to MetaCart
This paper presents a diffusion based probabilistic interpretation of spectral clustering and dimensionality reduction algorithms that use the eigenvectors of the normalized graph Laplacian. Given the pairwise adjacency matrix of all points, we define a diffusion distance between any two data points and show that the low dimensional representation of the data by the first few eigenvectors of the corresponding Markov matrix is optimal under a certain mean squared error criterion. Furthermore, assuming that data points are random samples from a density p(x) = e −U(x) we identify these eigenvectors as discrete approximations of eigenfunctions of a FokkerPlanck operator in a potential 2U(x) with reflecting boundary conditions. Finally, applying known results regarding the eigenvalues and eigenfunctions of the continuous FokkerPlanck operator, we provide a mathematical justification for the success of spectral clustering and dimensional reduction algorithms based on these first few eigenvectors. This analysis elucidates, in terms of the characteristics of diffusion processes, many empirical findings regarding spectral clustering algorithms.
Diffusion Maps, Spectral Clustering and Reaction
 Applied and Computational Harmonic Analysis: Special issue on Diffusion Maps and Wavelets
, 2006
"... A central problem in data analysis is the low dimensional representation of high dimensional data, and the concise description of its underlying geometry and density. In the analysis of large scale simulations of complex dynamical systems, where the notion of time evolution comes into play, importan ..."
Abstract

Cited by 58 (13 self)
 Add to MetaCart
A central problem in data analysis is the low dimensional representation of high dimensional data, and the concise description of its underlying geometry and density. In the analysis of large scale simulations of complex dynamical systems, where the notion of time evolution comes into play, important problems are the identification of slow variables and dynamically meaningful reaction coordinates that capture the long time evolution of the system. In this paper we provide a unifying view of these apparently different tasks, by considering a family of di#usion maps, defined as the embedding of complex (high dimensional) data onto a low dimensional Euclidian space, via the eigenvectors of suitably defined random walks defined on the given datasets. Assuming that the data is randomly sampled from an underlying general probability distribution p(x) = e U(x) , we show that as the number of samples goes to infinity, the eigenvectors of each di#usion map converge to the eigenfunctions of a corresponding di#erential operator defined on the support of the probability distribution. Di#erent normalizations of the Markov chain on the graph lead to di#erent limiting di#erential operators.
Learning spectral clustering, with application to speech separation
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost fun ..."
Abstract

Cited by 43 (5 self)
 Add to MetaCart
Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters, with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive new cost functions for spectral clustering based on measures of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing these cost functions with respect to the partition leads to new spectral clustering algorithms. Minimizing with respect to the similarity matrix leads to algorithms for learning the similarity matrix from fully labelled datasets. We apply our learning algorithm to the blind onemicrophone speech separation problem, casting the problem as one of segmentation of the spectrogram.
Information Regularization with Partially Labeled Data
 In Advances in Neural Information Processing Systems 15
, 2003
"... Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (yx) beyond a few available labeled examples. ..."
Abstract

Cited by 41 (6 self)
 Add to MetaCart
Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (yx) beyond a few available labeled examples.
Learning hidden variable networks: The information bottleneck approach
 Journal of Machine Learning Research
, 2005
"... A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in suboptimal local ..."
Abstract

Cited by 23 (0 self)
 Add to MetaCart
A central challenge in learning probabilistic graphical models is dealing with domains that involve hidden variables. The common approach for learning model parameters in such domains is the expectation maximization (EM) algorithm. This algorithm, however, can easily get trapped in suboptimal local maxima. Learning the model structure is even more challenging. The structural EM algorithm can adapt the structure in the presence of hidden variables, but usually performs poorly without prior knowledge about the cardinality and location of the hidden variables. In this work, we present a general approach for learning Bayesian networks with hidden variables that overcomes these problems. The approach builds on the information bottleneck framework of Tishby et al. (1999). We start by proving formal correspondence between the information bottleneck objective and the standard parametric EM functional. We then use this correspondence to construct a learning algorithm that combines an informationtheoretic smoothing term with a continuation procedure. Intuitively, the algorithm bypasses local maxima and achieves superior solutions by following a continuous path from a solution of, an easy and smooth, target function, to a solution of the desired likelihood function. As we show, our algorithmic framework allows learning of the parameters as well as the structure of a network. In addition, it also allows us to introduce new hidden variables during model selection and learn their cardinality. We demonstrate the performance of our procedure on several challenging reallife data sets.
Path Based Pairwise Data Clustering with Application to Texture Segmentation
, 2001
"... Most cost function based clustering or partitioning methods measure the compactness of groups of data. In contrast to this picture of a point source in feature space, some data sources are spread out on a lowdimensional manifold which is embedded in a high dimensional data space. This property ..."
Abstract

Cited by 17 (5 self)
 Add to MetaCart
Most cost function based clustering or partitioning methods measure the compactness of groups of data. In contrast to this picture of a point source in feature space, some data sources are spread out on a lowdimensional manifold which is embedded in a high dimensional data space. This property is adequately captured by the criterion of connectedness which is approximated by graph theoretic partitioning methods.