Sparse probabilistic projections
Cited by 28 (5 self)
Abstract
We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptanalysis as a preprocessing tool for the construction of template attacks.
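For orientation, the non-sparse special case of such a generative projection model, probabilistic PCA, has a closed-form maximum-likelihood solution (Tipping and Bishop, 1999). The NumPy sketch below computes it; the function name `ppca_ml` and the synthetic data are illustrative assumptions, and the sketch implements neither the sparsity-inducing priors nor the variational EM algorithm described above.

```python
import numpy as np

def ppca_ml(X, q):
    """Closed-form maximum-likelihood probabilistic PCA (Tipping & Bishop, 1999).

    X: (n, d) data matrix, q: number of latent dimensions (q < d).
    Returns the loading matrix W (d, q), the noise variance sigma2 and the mean mu.
    """
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)  # ML (1/n) sample covariance; np.cov centres X itself
    vals, vecs = np.linalg.eigh(S)          # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]          # re-sort to descending order
    vals, vecs = vals[order], vecs[:, order]
    sigma2 = vals[q:].mean()                # mean variance of the discarded directions
    # W = U_q (L_q - sigma2 I)^{1/2}, identifiable only up to a rotation
    W = vecs[:, :q] @ np.diag(np.sqrt(np.maximum(vals[:q] - sigma2, 0.0)))
    return W, sigma2, mu

# Synthetic data drawn from the PPCA generative model x = W z + noise
rng = np.random.default_rng(0)
W_true = rng.normal(size=(5, 2))
Z = rng.normal(size=(5000, 2))
X = Z @ W_true.T + 0.5 * rng.normal(size=(5000, 5))  # noise std 0.5, i.e. variance 0.25
W, sigma2, mu = ppca_ml(X, q=2)
```

The fitted model covariance `W @ W.T + sigma2 * np.eye(5)` should approximate the true covariance; sparse variants replace this closed form with iterative inference under sparsity priors.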
Multilabel Prediction via Sparse Infinite CCA
Cited by 18 (2 self)
Abstract
Canonical Correlation Analysis (CCA) is a useful technique for modeling dependencies between two (or more) sets of variables. Building upon the recently suggested probabilistic interpretation of CCA, we propose a nonparametric, fully Bayesian framework that can automatically select the number of correlation components and effectively capture the sparsity underlying the projections. In addition, given (partially) labeled data, our algorithm can also be used as a (semi-)supervised dimensionality reduction technique, and can be applied to learn useful predictive features in the context of learning a set of related tasks. Experimental results demonstrate the efficacy of the proposed approach both for CCA as a standalone problem and for multi-label prediction.
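As background for the probabilistic variants, the classical CCA problem that they reinterpret can be solved directly: the canonical correlations are the singular values of the whitened cross-covariance matrix. A minimal NumPy sketch, assuming paired rows in `X` and `Y`; the small ridge term `reg` is an assumption added for numerical stability and is not part of the Bayesian framework described above.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca(X, Y, reg=1e-6):
    """Classical CCA. X: (n, p), Y: (n, q) with paired rows.

    Returns the canonical correlations and the projection matrices Wx, Wy.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])  # ridge term: assumption
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    k = min(X.shape[1], Y.shape[1])
    Wx = Kx @ U[:, :k]   # map whitened directions back to the original X space
    Wy = Ky @ Vt.T[:, :k]
    return s[:k], Wx, Wy

# Toy data: one latent signal z shared by both views, plus view-specific noise
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 2))])
Y = np.hstack([2 * z + 0.1 * rng.normal(size=(500, 1)), rng.normal(size=(500, 1))])
corrs, Wx, Wy = cca(X, Y)
```

Here the first canonical correlation is close to 1 (the shared signal) and the second is small (noise only). Classical CCA always returns min(p, q) components; choosing how many are meaningful is exactly what the nonparametric prior above is meant to automate.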
Bayesian exponential family projections for coupled data sources
Cited by 1 (0 self)
Abstract
Exponential family extensions of principal component analysis (EPCA) have received a considerable amount of attention in recent years, demonstrating the growing need for basic modeling tools that do not assume the squared loss or a Gaussian distribution. We extend the EPCA model toolbox by presenting the first exponential-family multi-view learning methods for partial least squares and canonical correlation analysis, based on a unified representation of EPCA as matrix factorization of the natural parameters of the exponential family. The models are based on a new family of priors that is generally usable for all such factorizations. We also introduce new inference strategies, and demonstrate how the methods outperform earlier ones when the Gaussianity assumption does not hold.
Variational Bayesian matching
In Proceedings of Asian Conference on Machine Learning
Cited by 1 (0 self)
Abstract
Matching of samples refers to the problem of inferring unknown co-occurrence or alignment between observations in two data sets. Given two sets of equally many samples, the task is to find for each sample a representative sample in the other set, without prior knowledge of a distance measure between the sets. A few alternative solutions have recently been suggested, based on maximization of the joint likelihood or of various measures of between-data statistical dependency. In this work we present a variational Bayesian solution to the problem, learning a Bayesian canonical correlation analysis model with a permutation parameter for reordering the samples in one of the sets. We approximate the posterior over the permutations, and demonstrate that the resulting matching algorithm clearly outperforms all of the earlier solutions.
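The role of the permutation parameter can be illustrated with a toy, non-Bayesian sketch: assume the two sets have already been projected into a shared space (for instance by CCA), so that a distance becomes meaningful. The permutation minimizing the total squared distance is then the solution of an assignment problem. The sketch below solves it by exhaustive search, which is feasible only for very small n; the function name `best_permutation` and the toy data are hypothetical, and this is a point estimate, not the posterior approximation described above.

```python
import itertools
import numpy as np

def best_permutation(A, B):
    """Return perm minimizing sum_i ||A[i] - B[perm[i]]||^2 by exhaustive search.

    A, B: (n, d) arrays of samples in a shared (already projected) space.
    Checks all n! permutations, so it is only practical for small n.
    """
    n = A.shape[0]
    # cost[i, j] = squared distance between sample i of A and sample j of B
    cost = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    best, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        total = cost[np.arange(n), list(perm)].sum()
        if total < best_cost:
            best, best_cost = perm, total
    return np.array(best)

# Toy example: B is a shuffled, slightly perturbed copy of A
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 2))
true_perm = rng.permutation(6)
B = np.empty_like(A)
B[true_perm] = A + 0.001 * rng.normal(size=A.shape)  # row i of A reappears as row true_perm[i] of B
perm = best_permutation(A, B)
```

For realistic n, a polynomial-time assignment solver such as the Hungarian algorithm would replace the exhaustive loop.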
Multivariate
Vol. 26, ISMB 2010, pages i391–i398, doi:10.1093/bioinformatics/btq174
Systematic Use of Computational Methods Allows Stratifying Treatment Responders in
Abstract
Cancers are complex diseases whose comprehensive characterization requires genome-scale molecular data at several levels, from genetics to transcriptomics, as well as clinical data. We use our recently published Anduril framework and introduce novel approaches, such as dependency analysis, to identify key variables at the miRNA, copy number variation, expression, methylation and pathway levels in glioblastoma multiforme (GBM) progression and drug resistance. We also present methods to identify characteristics of clinically relevant subgroups, such as patients treated with the drug temozolomide and patients with an EGFRvIII mutation, a constitutively active variant of EGFR. Our results identify several novel genomic regions and transcript profiles that may contribute to GBM progression and drug resistance. All results and Anduril scripts are available at
Biomarker discovery via dependency analysis of multi-view functional genomics data
Bayesian CCA via Group Sparsity
Abstract
Bayesian treatments of Canonical Correlation Analysis (CCA)-type latent variable models have recently been proposed for coping with overfitting in small sample sizes, as well as for factorizing the data sources into correlated and non-shared effects. However, all current implementations of Bayesian CCA and its extensions are computationally inefficient for high-dimensional data and, as shown in this paper, break down completely for high-dimensional sources with a low sample count. Furthermore, they cannot reliably separate the correlated effects from the non-shared ones. We propose a new Bayesian CCA variant that is computationally efficient and works for high-dimensional data, while also learning the factorization more accurately. The improvements are gained by introducing a group sparsity assumption and an improved variational approximation. The method is demonstrated to work well on multi-label prediction tasks and in analyzing brain correlates of naturalistic audio stimulation.
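One common way to encode a group sparsity assumption of this kind is automatic relevance determination (ARD) with a separate precision for every pair of view and latent component: a component whose loadings vanish in one view becomes specific to the other view, while a component active in all views captures correlated (shared) variation. The toy sketch below applies the point-estimate update alpha[m, k] = D_m / ||w_mk||^2 to fixed loading matrices and labels each component; the function names, the cutoff `max_alpha`, and the toy loadings are hypothetical, and the sketch is not the variational algorithm of the paper.

```python
import numpy as np

def ard_precisions(W_views, eps=1e-12):
    """Point-estimate ARD precisions alpha[m, k] = D_m / ||W_m[:, k]||^2.

    W_views: list of loading matrices W_m of shape (D_m, K), one per view,
    all sharing the same K latent components. eps avoids division by zero.
    """
    return np.array([W.shape[0] / np.maximum((W ** 2).sum(axis=0), eps) for W in W_views])

def label_components(W_views, max_alpha=1e3):
    """Label each component by the views in which it is active.

    A component counts as active in view m when alpha[m, k] < max_alpha;
    this cutoff is a hypothetical choice for illustration.
    """
    active = ard_precisions(W_views) < max_alpha  # boolean array of shape (M, K)
    labels = []
    for k in range(active.shape[1]):
        views = [int(m) for m in np.flatnonzero(active[:, k])]
        if len(views) == len(W_views):
            labels.append("shared")
        elif views:
            labels.append("view " + ", ".join(map(str, views)) + " only")
        else:
            labels.append("inactive")
    return labels

# Toy loadings: component 0 in both views, component 1 only in view 0, component 2 only in view 1
W_x = np.array([[1.0, 0.5, 0.0]] * 4)
W_y = np.array([[0.8, 0.0, 1.2]] * 3)
labels = label_components([W_x, W_y])
```

In a full Bayesian treatment the squared norm would be replaced by its posterior expectation and the precisions would be updated jointly with the loadings, so that pruned groups emerge during inference rather than by thresholding afterwards.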
Date: 10.5.2010 Language: English Pages: 6+47 Faculty of Electronics, Communications and Automation
Aalto University, School of Science and Technology: Abstract of Master's Thesis
Multi-Way, Multi-View Learning
Abstract
We extend multi-way, multivariate ANOVA-type analysis to cases where one covariate is the view, with the features of each view coming from different, high-dimensional domains. The different views are assumed to be connected by having paired samples; this is common in our main application, biological experiments integrating data from different sources. Such experiments typically also include a controlled multi-way experimental setup in which disease status, medical treatment group, gender and time of measurement are the usual covariates. We introduce a multi-way latent variable model for this new task by extending the generative model of Bayesian canonical correlation analysis (CCA), both by taking multi-way covariate information into account as population priors and by reducing the dimensionality with an integrated factor analysis that assumes the features come in correlated groups.