Results 1-10 of 52
Learning the structure of linear latent variable models
Journal of Machine Learning Research, 2006
Cited by 50 (16 self)
We describe anytime search procedures that (1) find disjoint subsets of recorded variables for which the members of each subset are d-separated by a single common unrecorded cause, if such exists; (2) return information about the causal relations among the latent factors so identified. We prove the procedure is pointwise consistent assuming (a) the causal relations can be represented by a directed acyclic graph (DAG) satisfying the Markov Assumption and the Faithfulness Assumption; (b) unrecorded variables are not caused by recorded variables; and (c) dependencies are linear. We compare the procedure with standard approaches over a variety of simulated structures and sample sizes, and illustrate its practical value with brief studies of social science data sets. Finally, we
The Infinite Hierarchical Factor Regression Model
Cited by 27 (4 self)
We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors and in the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple this with a hierarchical model over factors, based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
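To make the building block named in this abstract concrete, here is a minimal sketch of the standard Indian Buffet Process generative scheme in plain Python. It is not the sparse variant the authors propose, and it does not include the coalescent hierarchy; the function names `sample_poisson` and `sample_ibp` and the parameter `alpha` are illustrative assumptions, not taken from the paper.

```python
import math
import random


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson(lam) draw (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_ibp(num_rows, alpha, rng):
    """Draw a binary feature matrix from the standard (non-sparse) IBP."""
    feature_counts = []   # number of earlier rows that use each feature
    rows = []
    for i in range(1, num_rows + 1):
        row = []
        for k, m in enumerate(feature_counts):
            take = rng.random() < m / i        # reuse feature k with prob m_k / i
            row.append(1 if take else 0)
            if take:
                feature_counts[k] += 1
        new = sample_poisson(alpha / i, rng)   # open Poisson(alpha / i) new features
        row.extend([1] * new)
        feature_counts.extend([1] * new)
        rows.append(row)
    width = len(feature_counts)
    # Pad earlier rows with zeros so the matrix is rectangular.
    return [r + [0] * (width - len(r)) for r in rows]
```

Calling `sample_ibp(5, 2.0, random.Random(1))` returns a 5-row 0/1 matrix whose number of columns (the number of active factors) is itself random, which is the property that lets a model of this kind leave the number of factors unspecified.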
Undercomplete blind subspace deconvolution
JMLR, 2007
Cited by 20 (15 self)
We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD (uBSSD). Applying temporal concatenation we reduce this problem to ISA. The associated 'high dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel based algorithms will be adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be tackled, and (ii) explore the working and advantages of the derived kernel-ISA methods.
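The temporal concatenation step mentioned in this abstract can be pictured as stacking several consecutive observation vectors into one longer vector, which is why the resulting ISA problem is higher dimensional. The sketch below is only an illustration of that idea; the function name `temporal_concatenation`, the `window` parameter and the ordering of the lags are assumptions, since the abstract does not give the paper's exact convention.

```python
def temporal_concatenation(observations, window):
    """Stack `window` consecutive observation vectors into one long vector.

    observations -- list of equal-length lists, one per time step
    window       -- hypothetical number of lags to stack (>= 1)
    """
    stacked = []
    for t in range(window - 1, len(observations)):
        vec = []
        for lag in range(window):
            vec.extend(observations[t - lag])   # x(t), x(t-1), ..., x(t-window+1)
        stacked.append(vec)
    return stacked


obs = [[1, 2], [3, 4], [5, 6], [7, 8]]
stacked = temporal_concatenation(obs, window=2)
# Each stacked vector has dimension 2 * 2 = 4, and there are 4 - 2 + 1 = 3 of them.
```

With observation dimension D and a window of L steps, every stacked vector has dimension D times L, so the dimension of the reduced problem grows linearly with the window length.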
Variational Inference for the Nested Chinese Restaurant Process
Cited by 16 (5 self)
The nested Chinese restaurant process (nCRP) is a powerful nonparametric Bayesian model for learning tree-based hierarchies from data. Since its posterior distribution is intractable, current inference methods have all relied on MCMC sampling. In this paper, we develop an alternative inference technique based on variational methods. To employ variational methods, we derive a tree-based stick-breaking construction of the nCRP mixture model, and a novel variational algorithm that efficiently explores a posterior over a large set of combinatorial structures. We demonstrate the use of this approach for text and handwritten digit modeling, where we show we can adapt the nCRP to continuous data as well.
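For readers unfamiliar with stick-breaking, the sketch below draws mixture weights from the ordinary (flat) truncated stick-breaking process. It is not the tree-based construction derived in the paper, which would apply a step of this kind at the nodes of a hierarchy; the function name, the `truncation` level and the choice to give the leftover mass to the last stick are assumptions made for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights from a truncated stick-breaking process.

    alpha      -- concentration parameter (> 0)
    truncation -- number of sticks kept; the last one takes the remainder
    rng        -- a random.Random instance, for reproducibility
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)   # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)             # assign leftover mass to the final stick
    return weights


weights = stick_breaking_weights(alpha=2.0, truncation=10, rng=random.Random(0))
```

Truncating the process at a fixed number of sticks is a common way to obtain a finite set of weights for a variational approximation; whether the paper uses this particular truncation is not stated in the abstract.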
Learning graphical models for hypothesis testing
In Proc. 14th IEEE Statist. Signal Process. Workshop, 2010
Cited by 13 (3 self)
Sparse graphical models have proven to be a flexible class of multivariate probability models for approximating high-dimensional distributions. In this paper, we propose techniques to exploit this modeling ability for binary classification by discriminatively learning such models from labeled training data, i.e., using both positive and negative samples to optimize for the structures of the two models. We motivate why it is difficult to adapt existing generative methods, and propose an alternative method consisting of two parts. First, we develop a novel method to learn tree-structured graphical models which optimizes an approximation of the log-likelihood ratio. We also formulate a joint objective to learn a nested sequence of optimal forest-structured models. Second, we construct a classifier by using ideas from boosting to learn a set of discriminative trees. The final classifier can be interpreted as a likelihood ratio test between two models with a larger set of pairwise features. We use cross-validation to determine the optimal number of edges in the final model. The algorithm presented in this paper also provides a method to identify a subset of the edges that are most salient for discrimination. Experiments show that the proposed procedure outperforms generative methods such as Tree Augmented Naïve Bayes and Chow-Liu as well as their boosted counterparts.
Independent Process Analysis Without a Priori Dimensional Information
Cited by 11 (5 self)
Recently, several algorithms have been proposed for independent subspace analysis where hidden variables are i.i.d. processes. We show that these methods can be extended to certain AR, MA, ARMA and ARIMA tasks. Central to our paper is that we introduce a cascade of algorithms, which aims to solve these tasks without previous knowledge about the number and the dimensions of the hidden processes. Our claim is supported by numerical simulations. As an illustrative application where the dimensions of the hidden variables are unknown, we search for subspaces of facial components.
Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates
, 1005
Cited by 10 (6 self)
The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n, d, k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp. tree) model is the hardest (resp. easiest) to learn using the proposed algorithm in terms of error rates for structure learning.
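The two-stage idea in this abstract (build the Chow-Liu tree, then prune weak edges to obtain a forest) can be sketched as follows. This is an illustration under stated assumptions, not the authors' algorithm: the threshold is passed in as a fixed number, whereas the paper chooses it adaptively as a function of the sample size by a rule the abstract does not give, and the names `mutual_information` and `thresholded_chow_liu` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def thresholded_chow_liu(samples, threshold):
    """Return forest edges: the Chow-Liu tree with weak edges pruned.

    samples   -- list of equal-length tuples of discrete values
    threshold -- hypothetical pruning level; edges with MI below it are dropped
    """
    d = len(samples[0])
    columns = list(zip(*samples))
    weights = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(d), 2)
    }

    # Kruskal's algorithm for a maximum-weight spanning tree (union-find).
    parent = list(range(d))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j, w))

    # Pruning step: keep only edges whose empirical MI clears the threshold.
    return [(i, j) for i, j, w in tree if w >= threshold]


# Variables 0 and 1 are identical; variable 2 is independent of both.
samples = [(a, a, c) for a in (0, 1) for c in (0, 1)]
forest = thresholded_chow_liu(samples, threshold=0.1)
```

In this toy data set the spanning tree must attach variable 2 through an edge of zero mutual information, and the pruning step removes that edge, leaving variable 2 isolated.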
A unifying model for blind separation of independent sources
Signal Processing 85 (7), 2005
"... www.cs.helsinki.fi/aapo.hyvarinen/ ..."
Generalized component analysis and blind source separation methods for analyzing multichannel brain signals
Statistical and Process Models of Cognitive Aging, Mahwah, NJ, 2006
Cited by 8 (4 self)
Blind source separation (BSS) and related methods, e.g., independent component analysis (ICA), are generally based on a wide class of unsupervised learning algorithms and have found potential applications in many areas from engineering to neuroscience. A recent trend in blind source separation and generalized component analysis (GCA) is to consider problems in the framework of matrix factorization or more general signal decomposition with probabilistic generative models and to exploit a priori knowledge about the true nature, morphology or structure of latent (hidden) variables or sources, such as sparseness, spatiotemporal decorrelation, statistical independence, nonnegativity, smoothness or lowest possible complexity. The goal of BSS can be considered as estimation of the true physical sources and parameters of a mixing system, while the objective of GCA is finding a new reduced or hierarchical and structured representation for the observed (sensor) data that can be interpreted as physically meaningful coding or blind signal decompositions. The key issue is to find such a transformation or coding which has true physical meaning and interpretation. In this paper we discuss some promising applications of BSS/GCA for analyzing multimodal, multisensory data, especially EEG/MEG data. Moreover, we propose to apply
Blind Source Separation and Independent Component Analysis: A Review
2004
Cited by 8 (0 self)
Blind source separation (BSS) and independent component analysis (ICA) are generally based on a wide class of unsupervised learning algorithms and have found potential applications in many areas from engineering to neuroscience. A recent trend in BSS is to consider problems in the framework of matrix factorization or more general signal decomposition with probabilistic generative and tree-structured graphical models and to exploit a priori knowledge about the true nature and structure of latent (hidden) variables or sources, such as spatiotemporal decorrelation, statistical independence, sparseness, smoothness or lowest complexity in the sense of, e.g., best predictability. The possible goal of such decomposition can be considered as the estimation of sources that are not necessarily statistically independent and of the parameters of a mixing system, or more generally as finding a new reduced or hierarchical and structured representation for the observed (sensor) data that can be interpreted as physically meaningful coding or blind source estimation. The key issue is to find such a transformation or coding (linear or nonlinear) which has true physical meaning and interpretation. We present a review of BSS and ICA, including various algorithms for static and dynamic models and their applications. The paper mainly consists of three parts: