Results 1–10 of 94
Information-Theoretic Co-Clustering
 In KDD
, 2003
"... Twodimensional contingency or cooccurrence tables arise frequently in important applications such as text, weblog and marketbasket data analysis. A basic problem in contingency table analysis is coclustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views ..."
Abstract

Cited by 342 (12 self)
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory: the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters.
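The objective in this formulation, maximizing the mutual information I(X̂; Ŷ) between the clustered random variables, can be computed directly from the clustered table. Below is a minimal illustrative sketch (the function name, toy table, and cluster assignments are assumptions, not the paper's code):

```python
from collections import defaultdict
from math import log2

def clustered_mutual_information(joint, row_clusters, col_clusters):
    """I(X-hat; Y-hat) for a normalized contingency table `joint`
    (dict of (row, col) -> probability) under the given assignments."""
    p_xy = defaultdict(float)  # joint over (row cluster, col cluster)
    for (r, c), p in joint.items():
        p_xy[(row_clusters[r], col_clusters[c])] += p
    p_x, p_y = defaultdict(float), defaultdict(float)
    for (xh, yh), p in p_xy.items():
        p_x[xh] += p
        p_y[yh] += p
    return sum(p * log2(p / (p_x[xh] * p_y[yh]))
               for (xh, yh), p in p_xy.items() if p > 0)

# Block-diagonal toy table: rows {0,1} pair with cols {0,1}, rows {2,3} with {2,3}.
joint = {(r, c): 0.125 for r in range(4) for c in range(4)
         if (r < 2) == (c < 2)}
# A co-clustering aligned with the blocks vs. one that splits each block.
good = clustered_mutual_information(joint, {0: 0, 1: 0, 2: 1, 3: 1},
                                    {0: 0, 1: 0, 2: 1, 3: 1})
bad = clustered_mutual_information(joint, {0: 0, 1: 1, 2: 0, 3: 1},
                                   {0: 0, 1: 1, 2: 0, 3: 1})
print(good, bad)  # → 1.0 0.0
```

The aligned co-clustering preserves the full bit of mutual information in the table; the misaligned one destroys it, which is exactly what the optimization criterion penalizes.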
Disambiguating Web Appearances of People in a Social Network
 In Proceedings of the 2005 World Wide Web Conference
, 2005
"... Say you are looking for information about a particular person. A search engine returns many pages for that person’s name but which pages are about the person you care about, and which are about other people who happen to have the same name? Furthermore, if we are looking for multiple people who are ..."
Abstract

Cited by 126 (2 self)
Say you are looking for information about a particular person. A search engine returns many pages for that person’s name, but which pages are about the person you care about, and which are about other people who happen to have the same name? Furthermore, if we are looking for multiple people who are related in some way, how can we best leverage this social network? This paper presents two unsupervised frameworks for solving this problem: one based on the link structure of the Web pages, another using Agglomerative/Conglomerative Double Clustering (A/CDC), an application of a recently introduced multi-way distributional clustering method. To evaluate our methods, we collected and hand-labeled a dataset of over 1000 Web pages retrieved from Google queries on 12 personal names appearing together in someone's email folder. On this dataset our methods outperform traditional agglomerative clustering by more than 20%, achieving over 80% F-measure.
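For context, a minimal single-link agglomerative clustering baseline of the kind the authors compare against can be sketched as follows (this is not A/CDC itself; the Jaccard similarity, threshold, and toy pages are illustrative assumptions):

```python
def jaccard(a, b):
    """Similarity between two pages represented as sets of terms/links."""
    return len(a & b) / len(a | b)

def agglomerate(pages, threshold=0.5):
    """Single-link agglomeration: repeatedly merge the two most similar
    clusters until no pair of clusters exceeds the threshold."""
    clusters = [[p] for p in pages]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single link: similarity of the closest pair of members
                s = max(jaccard(a, b) for a in clusters[i] for b in clusters[j])
                if s > best:
                    best, pair = s, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] += clusters.pop(j)
    return clusters

# Two hypothetical "Smith"s: a physicist (two pages) and a musician (one).
pages = [{"smith", "physics", "mit"}, {"smith", "physics", "quantum"},
         {"smith", "guitar", "band"}]
print(len(agglomerate(pages, threshold=0.3)))  # → 2
```

Each resulting cluster is taken to be one real-world person; the paper's contribution is producing better clusterings than this kind of baseline.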
Non-Redundant Data Clustering
, 2004
"... Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to de ..."
Abstract

Cited by 90 (3 self)
Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We present experimental results for applications in text mining and computer vision.
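The conditional mutual information score at the heart of this extension can be computed directly from a joint distribution. A minimal sketch (the function and the toy distributions are illustrative assumptions, not the paper's code):

```python
from collections import defaultdict
from math import log2

def conditional_mutual_information(joint):
    """I(X; Y | Z) from a dict (x, y, z) -> probability. Conditioning on
    known structure Z lets a clustering focus on novel aspects of the data."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    return sum(p * log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), p in joint.items() if p > 0)

# When Z already explains X, conditioning removes all of X's information
# about Y; when X and Y covary independently of Z, a full bit remains.
redundant = {(v, v, v): 0.5 for v in (0, 1)}
novel = {(v, v, z): 0.25 for v in (0, 1) for z in (0, 1)}
print(conditional_mutual_information(redundant),
      conditional_mutual_information(novel))  # → 0.0 1.0
```

Maximizing this conditional score (rather than plain mutual information) is what steers the clustering away from structure already captured by Z.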
Distributional Word Clusters vs. Words for Text Categorization
 Journal of Machine Learning Research
, 2003
"... We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This wordcluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representati ..."
Abstract

Cited by 87 (7 self)
We study an approach to text categorization that combines distributional clustering of words and a Support Vector Machine (SVM) classifier. This word-cluster representation is computed using the recently introduced Information Bottleneck method, which generates a compact and efficient representation of documents. When combined with the classification power of the SVM, this method yields high performance in text categorization. This novel combination of SVM with word-cluster representation is compared with SVM-based categorization using the simpler bag-of-words (BOW) representation. The comparison is performed over three known datasets. On one of these datasets (the 20 Newsgroups) the method based on word clusters significantly outperforms the word-based representation in terms of categorization accuracy or representation efficiency. On the two other sets (Reuters-21578 and WebKB) the word-based representation slightly outperforms the word-cluster representation. We investigate the potential reasons for this behavior and relate it to structural differences between the datasets.
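The word-cluster document representation can be illustrated with a toy sketch; the cluster map below is hand-made for illustration, whereas the paper derives it with the Information Bottleneck method:

```python
from collections import Counter

# Hypothetical word-to-cluster map; in the paper this mapping is learned.
word_to_cluster = {"goal": "sports", "match": "sports", "team": "sports",
                   "stock": "finance", "market": "finance", "bond": "finance"}

def bag_of_clusters(doc, clusters=("sports", "finance")):
    """Project a token list onto a short, dense cluster-count vector,
    in place of a sparse bag-of-words vector over the full vocabulary."""
    counts = Counter(word_to_cluster[w] for w in doc if w in word_to_cluster)
    return [counts[c] for c in clusters]

print(bag_of_clusters(["the", "team", "won", "the", "match"]))  # → [2, 0]
```

These low-dimensional cluster-count vectors are what would then be fed to the SVM classifier in place of BOW vectors.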
Expectation maximization and posterior constraints
 In NIPS 20
, 2008
"... The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables ..."
Abstract

Cited by 73 (12 self)
The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the variables in the model are not observed. Very often, however, our aim is primarily to find a model that assigns values to the latent variables that have intended meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a priori information about latent variables in graphical models without making the models overly complex or intractable. In this paper, we present an efficient, principled way to inject rich constraints on the posteriors of latent variables into the EM algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve the performance over standard baselines and be competitive with more complex, intractable models.
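The key idea, projecting the E-step posteriors onto a constraint set, can be sketched on a toy two-component Bernoulli mixture with a simple balance constraint (the model, data, and constraint are assumptions for illustration; the M-step is omitted):

```python
from math import exp

def e_step(data, theta):
    """Standard E-step: posterior over the two components for each x in {0,1}."""
    post = []
    for x in data:
        lik = [t ** x * (1 - t) ** (1 - x) for t in theta]
        s = sum(lik)
        post.append([l / s for l in lik])
    return post

def project_balance(post, target=0.5):
    """KL-project posteriors onto {q : mean q(z=1) = target}. The projection
    reweights each q by exp(lam) on z=1; lam is found by bisection, since
    the constrained mean is monotone increasing in lam."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        lam = (lo + hi) / 2
        mean = sum(q[1] * exp(lam) / (q[0] + q[1] * exp(lam))
                   for q in post) / len(post)
        lo, hi = (lam, hi) if mean < target else (lo, lam)
    return [[q[0] / (q[0] + q[1] * exp(lam)),
             q[1] * exp(lam) / (q[0] + q[1] * exp(lam))] for q in post]

data = [1, 1, 1, 0, 1, 1]  # biased sample: plain EM would favor component 2
post = project_balance(e_step(data, theta=[0.2, 0.8]))
print(round(sum(q[1] for q in post) / len(post), 3))  # → 0.5
```

The M-step then proceeds as usual but uses the projected posteriors, so the learned model respects the constraint without the graphical model itself becoming more complex.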
Multi-way distributional clustering via pairwise interactions
 In ICML
, 2005
"... We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in cooccurrence data. In this scheme, multiple clustering systems are generated aiming at maximi ..."
Abstract

Cited by 62 (10 self)
We present a novel unsupervised learning scheme that simultaneously clusters variables of several types (e.g., documents, words and authors) based on pairwise interactions between the types, as observed in co-occurrence data. In this scheme, multiple clustering systems are generated aiming at maximizing an objective function that measures multiple pairwise mutual information between cluster variables. To implement this idea, we propose an algorithm that interleaves top-down clustering of some variables and bottom-up clustering of the other variables, with a local optimization correction routine. Focusing on document clustering, we present an extensive empirical study of two-way, three-way and four-way applications of our scheme using six real-world datasets, including the 20 Newsgroups (20NG) and the Enron email collection. Our multi-way distributional clustering (MDC) algorithms consistently and significantly outperform previous state-of-the-art information-theoretic clustering algorithms.
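In notation assumed here for illustration (not taken from the paper), the objective the abstract describes is a sum of pairwise mutual informations between cluster variables, where each variable type V_i gets a cluster variable T_i = C_i(V_i) and E is the set of interacting type pairs:

```latex
\max_{C_1, \dots, C_k} \;\; \sum_{(i,j) \in E} I\bigl(T_i;\, T_j\bigr),
\qquad T_i = C_i(V_i)
```

The two-way case with a single pair (documents, words) recovers the familiar co-clustering criterion of maximizing the mutual information between row and column clusters.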
Generative modelbased document clustering: a comparative study
 Knowledge and Information Systems
, 2005
"... Semisupervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semisupervised clustering. Viewing semisupervis ..."
Abstract

Cited by 48 (0 self)
Semi-supervised learning has become an attractive methodology for improving classification models and is often viewed as using unlabeled data to aid supervised learning. However, it can also be viewed as using labeled data to help clustering, namely, semi-supervised clustering. Viewing semi-supervised learning from a clustering angle is useful in practical situations where the set of labels available in the labeled data is not complete, i.e., the unlabeled data contain new classes that are not present in the labeled data. This paper analyzes several multinomial model-based semi-supervised document clustering methods under a principled model-based clustering framework. The framework naturally leads to a deterministic annealing extension of existing semi-supervised clustering approaches. We compare three (slightly) different semi-supervised approaches for clustering documents: Seeded damnl, Constrained damnl, and Feedback-based damnl, where damnl stands for the multinomial model-based deterministic annealing algorithm. The first two are extensions of the seeded k-means and constrained k-means algorithms studied by Basu et al. (2002); the last one is motivated by Cohn et al. (2003). Through empirical experiments on text datasets, we show that: (a) deterministic annealing can often significantly improve the performance of semi-supervised clustering; (b) the constrained approach is best when the available labels are complete, whereas the feedback-based approach excels when the available labels are incomplete.
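The seeded k-means idea of Basu et al., which the first of these methods extends, can be sketched in a few lines (the 1-D toy data and seed labels are illustrative assumptions):

```python
def seeded_kmeans(points, seeds, iters=10):
    """Seeded k-means: labeled seed points initialize the centroids,
    then standard k-means iterations follow. `seeds` maps a cluster id
    to its list of labeled points; `points` are unlabeled 1-D values."""
    centers = {c: sum(xs) / len(xs) for c, xs in seeds.items()}
    assign = {}
    for _ in range(iters):
        # assignment step: nearest centroid
        assign = {x: min(centers, key=lambda c: abs(x - centers[c]))
                  for x in points}
        # update step: recompute centroids from current members
        for c in centers:
            members = [x for x, a in assign.items() if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, assign

points = [0.1, 0.3, 0.2, 9.8, 10.1, 9.9]
centers, assign = seeded_kmeans(points, seeds={"A": [0.0], "B": [10.0]})
print(sorted(x for x, a in assign.items() if a == "B"))  # → [9.8, 9.9, 10.1]
```

The constrained variant differs only in that seed points stay clamped to their labeled clusters during the assignment step; the paper wraps both in a deterministic annealing schedule.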
Nonnegative matrix factorization for rapid recovery of constituent spectra in magnetic resonance chemical shift imaging of the brain
 IEEE Trans on Med Imaging
, 2004
"... Abstract—We present an algorithm for blindly recovering constituent source spectra from magnetic resonance (MR) chemical shift imaging (CSI) of the human brain. The algorithm, which we call constrained nonnegative matrix factorization (cNMF), does not enforce independence or sparsity, instead only ..."
Abstract

Cited by 38 (1 self)
We present an algorithm for blindly recovering constituent source spectra from magnetic resonance (MR) chemical shift imaging (CSI) of the human brain. The algorithm, which we call constrained nonnegative matrix factorization (cNMF), does not enforce independence or sparsity, instead only requiring the source and mixing matrices to be nonnegative. It is based on the nonnegative matrix factorization (NMF) algorithm, extending it to include a constraint on the positivity of the amplitudes of the recovered spectra. This constraint enables recovery of physically meaningful spectra even in the presence of noise that causes a significant number of the observation amplitudes to be negative. We demonstrate and characterize the algorithm’s performance using ³¹P volumetric brain data, comparing the results with two different blind source separation methods: Bayesian spectral decomposition (BSD) and nonnegative sparse coding (NNSC). We then incorporate the cNMF algorithm into a hierarchical decomposition framework, showing that it can be used to recover tissue-specific spectra given a processing hierarchy that proceeds coarse-to-fine. We demonstrate the hierarchical procedure on ¹H brain data and conclude that the computational efficiency of the algorithm makes it well-suited for use in a diagnostic workup. Index Terms: Blind source separation (BSS), chemical shift imaging (CSI), hierarchical decomposition, magnetic resonance (MR), magnetic resonance spectroscopy (MRS), nonnegative matrix factorization (NMF).
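For orientation, the standard NMF multiplicative updates that cNMF builds on can be sketched as follows (a tiny pure-Python version for clarity; the paper's extra positivity constraint on the recovered spectra is noted but not implemented here):

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iters=500, seed=0):
    """Factor a nonnegative m x n matrix V into W (m x k) times H (k x n)
    using the classic multiplicative updates, which keep W and H
    nonnegative by construction."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    eps = 1e-9
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H

# A rank-2 mixture of two hypothetical "spectra": the factorization
# should reconstruct it closely.
V = [[2, 1, 0], [4, 2, 0], [0, 1, 2]]
W, H = nmf(V, k=2)
R = matmul(W, H)
err = sum(abs(V[i][j] - R[i][j]) for i in range(3) for j in range(3))
print(err < 0.5)
```

In the CSI setting, the rows of H play the role of recovered source spectra and W the per-voxel mixing amplitudes; cNMF adds the positivity constraint on spectral amplitudes on top of these updates.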
Sufficient Dimensionality Reduction
 Journal of Machine Learning Research
, 2003
"... Dimensionality reduction of empirical cooccurrence data is a fundamental problem in unsupervised learning. It is also a well studied problem in statistics known as the analysis of crossclassified data. One principled approach to this problem is to represent the data in low dimension with minimal l ..."
Abstract

Cited by 37 (6 self)
Dimensionality reduction of empirical co-occurrence data is a fundamental problem in unsupervised learning. It is also a well-studied problem in statistics, known as the analysis of cross-classified data. One principled approach to this problem is to represent the data in low dimension with minimal loss of the (mutual) information contained in the original data. In this paper we introduce an information-theoretic nonlinear method for finding such a most informative dimension reduction. In contrast with...