Results 1–10 of 37
Supervised Learning of Quantizer Codebooks by Information Loss Minimization
, 2007
"... This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic fo ..."
Abstract

Cited by 71 (0 self)
This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic for its class label Y. We derive an alternating minimization procedure for simultaneously learning codebooks in the Euclidean feature space and in the simplex of posterior class distributions. The resulting quantizer can be used to encode unlabeled points outside the training set and to predict their posterior class distributions, and has an elegant interpretation in terms of lossless source coding. The proposed method is extensively validated on synthetic and real datasets, and is applied to two diverse problems: learning discriminative visual vocabularies for bag-of-features image classification, and image segmentation.
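The alternating scheme described in this abstract can be sketched in miniature. The following is a hypothetical simplification, not the authors' exact algorithm: it assigns each point to the codeword minimizing a squared Euclidean distance plus a KL term with a made-up weight `lam`, then recomputes centroids and prototype distributions; all names and the toy two-class data are illustrative.

```python
import math
import random

def kl(p, q, eps=1e-9):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def fit_quantizer(xs, posts, K, lam=1.0, iters=20, seed=0):
    """Alternating minimization sketch: each codeword k has a centroid m_k
    in feature space and a prototype distribution q_k on the simplex.
    A point is assigned to the codeword minimizing
        ||x - m_k||^2 + lam * KL(p_x || q_k)."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(xs)), K)
    ms = [list(xs[i]) for i in idx]      # feature-space codebook
    qs = [list(posts[i]) for i in idx]   # simplex codebook
    assign = [0] * len(xs)
    for _ in range(iters):
        # assignment step
        for i, (x, p) in enumerate(zip(xs, posts)):
            costs = [sum((a - b) ** 2 for a, b in zip(x, m)) + lam * kl(p, q)
                     for m, q in zip(ms, qs)]
            assign[i] = min(range(K), key=costs.__getitem__)
        # update step: centroid = mean of features, prototype = mean of posteriors
        for k in range(K):
            members = [i for i in range(len(xs)) if assign[i] == k]
            if not members:
                continue
            ms[k] = [sum(xs[i][d] for i in members) / len(members)
                     for d in range(len(xs[0]))]
            qs[k] = [sum(posts[i][c] for i in members) / len(members)
                     for c in range(len(posts[0]))]
    return ms, qs, assign

# toy data: two blobs with near-deterministic class posteriors
xs = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
posts = [(0.9, 0.1), (0.95, 0.05), (0.1, 0.9), (0.05, 0.95)]
ms, qs, assign = fit_quantizer(xs, posts, K=2)
print(assign)
```

On this toy input the two blobs end up in separate quantizer regions, and each learned prototype remains a valid distribution.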
Discriminative Clustering by Regularized Information Maximization
"... Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information ..."
Abstract

Cited by 26 (1 self)
Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method.
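The balance the abstract describes (class separation vs. class balance vs. classifier complexity) can be made concrete with a small score function. This is a minimal sketch of an RIM-style objective, assuming softmax outputs are already available; `rim_objective`, `lam`, and the toy data are illustrative, not the paper's exact formulation.

```python
import math

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution (nats)."""
    return -sum(pi * math.log(pi + eps) for pi in p)

def rim_objective(P, weight_norm_sq, lam=0.1):
    """RIM-style score for softmax outputs P (one row per data point):
    empirical I(x; y) estimate = H(mean prediction) - mean conditional
    entropy, minus a complexity penalty on the classifier weights.
    Low conditional entropy rewards class separation; high H(mean)
    rewards class balance."""
    n = len(P)
    p_bar = [sum(row[c] for row in P) / n for c in range(len(P[0]))]
    mi = entropy(p_bar) - sum(entropy(row) for row in P) / n
    return mi - lam * weight_norm_sq

# confident, balanced predictions score higher than uncertain ones
confident = [[0.99, 0.01], [0.01, 0.99], [0.99, 0.01], [0.01, 0.99]]
uncertain = [[0.6, 0.4], [0.4, 0.6], [0.55, 0.45], [0.5, 0.5]]
print(rim_objective(confident, 1.0) > rim_objective(uncertain, 1.0))  # True
```

Maximizing this score over classifier parameters (rather than merely evaluating it, as here) is what turns the objective into a clustering procedure.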
A Nonparametric Information Theoretic Clustering Algorithm
"... In this paper we propose a novel clustering algorithm based on maximizing the mutual information between data points and clusters. Unlike previous methods, we neither assume the data are given in terms of distributions nor impose any parametric model on the withincluster distribution. Instead, we u ..."
Abstract

Cited by 15 (1 self)
In this paper we propose a novel clustering algorithm based on maximizing the mutual information between data points and clusters. Unlike previous methods, we neither assume the data are given in terms of distributions nor impose any parametric model on the within-cluster distribution. Instead, we utilize a nonparametric estimation of the average cluster entropies and search for a clustering that maximizes the estimated mutual information between data points and clusters. The improved performance of the proposed algorithm is demonstrated on several standard datasets.
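The nonparametric entropy estimates this abstract relies on can be illustrated with a classical 1-NN (Kozachenko-Leonenko-style) estimator in one dimension. This is a generic sketch of the idea, not the paper's estimator:

```python
import math
import random

def knn_entropy_1d(xs):
    """Kozachenko-Leonenko 1-NN differential entropy estimate for a 1-D
    sample: H ~ (1/n) * sum(log(2 * r_i)) + log(n - 1) + gamma, where r_i
    is the distance from x_i to its nearest neighbor and gamma is the
    Euler-Mascheroni constant."""
    n = len(xs)
    srt = sorted(xs)
    rs = []
    for i, x in enumerate(srt):
        cands = []
        if i > 0:
            cands.append(abs(x - srt[i - 1]))
        if i < n - 1:
            cands.append(abs(x - srt[i + 1]))
        rs.append(max(min(cands), 1e-12))  # clamp to avoid log(0)
    gamma = 0.5772156649
    return sum(math.log(2 * r) for r in rs) / n + math.log(n - 1) + gamma

rng = random.Random(1)
narrow = [rng.gauss(0, 0.1) for _ in range(200)]
wide = [rng.gauss(0, 10.0) for _ in range(200)]
print(knn_entropy_1d(wide) > knn_entropy_1d(narrow))  # True
```

A clustering objective in the spirit of the abstract would then score a partition by the data entropy minus the weighted average of per-cluster entropies estimated this way, and search over partitions.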
PAC-Bayesian Analysis of Co-clustering and Beyond
"... We derive PACBayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as coclustering, matrix trifactorization, graphical models, graph clustering, and pairwise clustering. 1 We begin with the analysis of coclustering, which is a widely used approa ..."
Abstract

Cited by 15 (7 self)
We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering. We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish between two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a tradeoff between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this tradeoff for discriminative prediction tasks. This algorithm achieved state-of-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization
Information bottleneck for non-co-occurrence data
 In Advances in Neural Information Processing Systems 19
, 2007
"... We present a general modelindependent approach to the analysis of data in cases when these data do not appear in the form of cooccurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level ..."
Abstract

Cited by 12 (5 self)
We present a general model-independent approach to the analysis of data in cases when these data do not appear in the form of co-occurrence of two variables X, Y, but rather as a sample of values of an unknown (stochastic) function Z(X, Y). For example, in gene expression data, the expression level Z is a function of gene X and condition Y; or in movie ratings data the rating Z is a function of viewer X and movie Y. The approach represents a consistent extension of the Information Bottleneck method that has previously relied on the availability of co-occurrence statistics. By altering the relevance variable we eliminate the need for a sample of the joint distribution of all input variables. This new formulation also enables simple MDL-like model complexity control and prediction of missing values of Z. The approach is analyzed and shown to be on a par with the best known clustering algorithms for a wide range of domains. For the prediction of missing values (collaborative filtering) it improves the currently best known results.
Learning nearestneighbor quantizers from labeled data by information loss minimization
 In Int. Conf. on AI and Stat
, 2007
"... This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic fo ..."
Abstract

Cited by 11 (1 self)
This paper proposes a technique for jointly quantizing continuous features and the posterior distributions of their class labels based on minimizing empirical information loss, such that the index K of the quantizer region to which a given feature X is assigned approximates a sufficient statistic for its class label Y. We derive an alternating minimization procedure for simultaneously learning codebooks in the Euclidean feature space and in the simplex of posterior class distributions. The resulting quantizer can be used to encode unlabeled points outside the training set and to predict their posterior class distributions, and has an elegant interpretation in terms of universal lossless coding. The promise of our method is demonstrated for the application of learning discriminative visual vocabularies for bag-of-features image classification.
Allegro: Analyzing expression and sequence in concert to discover regulatory programs
, 2008
"... A major goal of system biology is the characterization of transcription factors and microRNAs (miRNAs) and the transcriptional programs they regulate. We present Allegro, a method for denovo discovery of cisregulatory transcriptional programs through joint analysis of genomewide expression data a ..."
Abstract

Cited by 10 (0 self)
A major goal of systems biology is the characterization of transcription factors and microRNAs (miRNAs) and the transcriptional programs they regulate. We present Allegro, a method for de novo discovery of cis-regulatory transcriptional programs through joint analysis of genome-wide expression data and promoter or 3′ UTR sequences. The algorithm uses a novel log-likelihood-based, nonparametric model to describe the expression pattern shared by a group of co-regulated genes. We show that Allegro is more accurate and sensitive than existing techniques, and can simultaneously analyze multiple expression datasets with more than 100 conditions. We apply Allegro to datasets from several species and report on the transcriptional modules it uncovers. Our analysis reveals a novel motif overrepresented in the promoters of genes highly expressed in murine oocytes, and several new motifs related to fly development. Finally, using stem-cell expression profiles, we identify three miRNA families with pivotal roles in human embryogenesis.
Learning and generalization with the information bottleneck method
, 2008
"... The Information Bottleneck (IB) method, introduced in [22], is an informationtheoretic framework for extracting relevant components of an ‘input ’ random variable X, with respect to an ‘output ’ random variable Y. This is performed by finding a compressed, nonparametric and modelindependent repres ..."
Abstract

Cited by 9 (2 self)
The Information Bottleneck (IB) method, introduced in [22], is an information-theoretic framework for extracting relevant components of an ‘input’ random variable X, with respect to an ‘output’ random variable Y. This is performed by finding a compressed, nonparametric and model-independent representation
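The compression-relevance trade-off the abstract refers to is usually written as the standard IB variational problem (this is the generic formulation, not this paper's bounds): a compressed representation T of X is sought that retains information about Y, with β trading compression against relevance.

```latex
% Information Bottleneck Lagrangian: compress X into T while
% preserving information about Y; \beta > 0 sets the trade-off.
\min_{p(t \mid x)} \; I(T; X) \;-\; \beta \, I(T; Y)
```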
Universal function-specificity of codon usage (doi:10.1093/nar/gkp792)
, 2009
"... Synonymous codon usage has long been known as a factor that affects average expression level of proteins in fastgrowing microorganisms, but neither its role in dynamic changes of expression in response to environmental changes nor selective factors shaping it in the genomes of higher eukaryotes hav ..."
Abstract

Cited by 9 (0 self)
Synonymous codon usage has long been known as a factor that affects the average expression level of proteins in fast-growing microorganisms, but neither its role in dynamic changes of expression in response to environmental changes nor the selective factors shaping it in the genomes of higher eukaryotes have been fully understood. Here, we propose that codon usage is ubiquitously selected to synchronize the translation efficiency with the dynamic alteration of protein expression in response to environmental and physiological changes. Our analysis reveals that codon usage is universally correlated with gene function, suggesting its potential contribution to synchronized regulation of genes with similar functions. We directly show that co-expressed genes have similar synonymous codon usages within the genomes of human, yeast, Caenorhabditis elegans and Escherichia coli. We also demonstrate that perturbing the codon usage directly affects the level or even direction of changes in protein expression in response to environmental stimuli. Perturbing tRNA composition also has tangible phenotypic effects on the cell. By showing that codon usage is universally function-specific, our results expand, to almost all organisms, the notion that cells may need to dynamically alter their intracellular tRNA composition in order to adapt to their new environment or physiological role.
Parallel pairwise clustering
 SDM’09, Proceedings of SIAM Data Mining conference
, 2009
"... Given the pairwise affinity relations associated with a set of data items, the goal of a clustering algorithm is to automatically partition the data into a small number of homogeneous clusters. However, since the input size is quadratic in the number of data points, existing algorithms are non feasi ..."
Abstract

Cited by 4 (0 self)
Given the pairwise affinity relations associated with a set of data items, the goal of a clustering algorithm is to automatically partition the data into a small number of homogeneous clusters. However, since the input size is quadratic in the number of data points, existing algorithms are infeasible for many practical applications. Here, we propose a simple strategy to cluster massive data by randomly splitting the original affinity matrix into small manageable affinity matrices that are clustered independently. Our proposal is most appealing in a parallel computing environment where at each iteration, each worker node clusters a subset of the input data and the results from all workers are then integrated in a master node to create a new clustering partition over the entire data. We demonstrate that this approach yields high quality clustering partitions for various real world problems, even though at each iteration only small fractions of the original data matrix are examined and at no point is the entire affinity matrix stored in memory or even computed. Furthermore, we demonstrate that the proposed algorithm has intriguing stochastic convergence properties that provide further insight into the clustering problem.
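The split-and-merge strategy can be caricatured in a few lines. This is a sketch under strong assumptions, not the paper's algorithm: it uses a toy thresholded affinity, clusters overlapping random subsets by connected components, and integrates worker results by union-find; the `cluster_subset` calls are the part that would run on worker nodes (e.g. via `concurrent.futures`), and the paper's integration step is more sophisticated.

```python
import random
from itertools import combinations

def affinity(a, b):
    """Toy affinity on 1-D points: high within a blob, zero across
    (hypothetical; a real application would supply its own)."""
    return 1.0 if abs(a - b) < 2.0 else 0.0

def cluster_subset(pts, thresh=0.5):
    """Cluster one small subset via connected components of the
    thresholded affinity sub-matrix (stand-in for a real worker);
    returns the pairs that ended up co-clustered."""
    parent = {p: p for p in pts}
    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p
    for a, b in combinations(pts, 2):
        if affinity(a, b) > thresh:
            parent[find(a)] = find(b)
    return [(a, b) for a, b in combinations(pts, 2) if find(a) == find(b)]

def parallel_pairwise(points, rounds=8, subset_size=5, seed=0):
    """Master loop: each round draws a random small subset, clusters it
    independently, and merges pairs co-clustered in any round.  The full
    n-by-n affinity matrix is never formed or stored."""
    rng = random.Random(seed)
    parent = {p: p for p in points}
    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p
    for _ in range(rounds):
        subset = rng.sample(points, subset_size)
        for a, b in cluster_subset(subset):  # could run on worker nodes
            parent[find(a)] = find(b)        # master integrates decisions
    return {p: find(p) for p in points}

points = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0]
labels = parallel_pairwise(points)
print(len(set(labels.values())))
```

Because the subsets overlap across rounds, the master can stitch local decisions into a global partition even though no single round sees all the data.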