Results 1–10 of 40
Efficient learning of hierarchical latent class models
 In: Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, Boca Raton, 2004
Abstract
Cited by 20 (5 self)
Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are hidden. In earlier work, we have demonstrated in principle the possibility of reconstructing HLC models from data. In this paper, we address the scalability issue and develop a search-based algorithm that can efficiently learn high-quality HLC models for realistic domains. There are three technical contributions: (1) the identification of a set of search operators; (2) the use of improvement in BIC score per unit of increase in model complexity, rather than BIC score itself, for model selection; and (3) the adaptation of structural EM for situations where candidate models contain different variables than the current model. The algorithm was tested on the COIL Challenge 2000 data set and an interesting model was found.
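The paper's second contribution, ranking candidates by BIC gain per unit of added model complexity rather than by raw BIC, can be sketched as follows. This is a minimal illustration, not code from the paper; the log-likelihoods, dimensions, and sample size below are hypothetical.

```python
import math

def bic(loglik, dim, n):
    # Bayesian Information Criterion: log-likelihood minus a complexity penalty.
    return loglik - 0.5 * dim * math.log(n)

def improvement_ratio(cand, cur, n):
    # Rank a candidate by BIC improvement per unit of added model
    # complexity, rather than by raw BIC score.
    delta_bic = bic(cand["loglik"], cand["dim"], n) - bic(cur["loglik"], cur["dim"], n)
    delta_dim = cand["dim"] - cur["dim"]
    if delta_dim <= 0:
        return float("inf") if delta_bic > 0 else float("-inf")
    return delta_bic / delta_dim

# Hypothetical search step: two candidate models against the current one.
current = {"loglik": -1200.0, "dim": 30}
candidates = [{"loglik": -1150.0, "dim": 50},
              {"loglik": -1160.0, "dim": 34}]
best = max(candidates, key=lambda m: improvement_ratio(m, current, n=500))
```

With these numbers the cheaper candidate (dim 34) wins even though the larger model attains the higher raw likelihood, which is the cost-effectiveness behaviour the criterion is designed to produce.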
Sum-Product Networks: A New Deep Architecture
Abstract
Cited by 19 (4 self)
The key limiting factor in graphical model inference and learning is the complexity of the partition function. We thus ask the question: what are general conditions under which the partition function is tractable? The answer leads to a new kind of deep architecture, which we call sum-product networks (SPNs). SPNs are directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges. We show that if an SPN is complete and consistent it represents the partition function and all marginals of some graphical model, and give semantics to its nodes. Essentially all tractable graphical models can be cast as SPNs, but SPNs are also strictly more general. We then propose learning algorithms for SPNs, based on backpropagation and EM. Experiments show that inference and learning with SPNs can be both faster and more accurate than with standard deep networks. For example, SPNs perform image completion better than state-of-the-art deep networks for this task. SPNs also have intriguing potential connections to the architecture of the cortex.
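As a toy illustration of the architecture (a sketch, not code from the paper), an SPN can be evaluated bottom-up: leaves are variable indicators, product nodes multiply their children, and sum nodes take weighted sums.

```python
# Nodes: ("leaf", var, val), ("prod", [children]), or ("sum", [(weight, child), ...]).
def evaluate(node, assignment):
    kind = node[0]
    if kind == "leaf":
        _, var, val = node
        return 1.0 if assignment[var] == val else 0.0
    if kind == "prod":
        result = 1.0
        for child in node[1]:
            result *= evaluate(child, assignment)
        return result
    # Sum node: weighted sum over children.
    return sum(w * evaluate(child, assignment) for w, child in node[1])

x0, x1 = ("leaf", "X", 0), ("leaf", "X", 1)
y0, y1 = ("leaf", "Y", 0), ("leaf", "Y", 1)
# A product of per-variable mixtures: each sum's children share one
# variable's scope, so this SPN is complete and consistent.
spn = ("prod", [("sum", [(0.3, x0), (0.7, x1)]),
                ("sum", [(0.6, y0), (0.4, y1)])])
p = evaluate(spn, {"X": 1, "Y": 0})  # 0.7 * 0.6 = 0.42
```

Because the structure is a DAG of sums and products, a single bottom-up pass computes the probability of an assignment in time linear in the network size, which is the tractability property the abstract emphasizes.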
Learning Latent Tree Graphical Models
 In: Journal of Machine Learning Research, 2011
Abstract
Cited by 17 (4 self)
We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a preprocessing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare ...
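The sibling test behind recursive grouping can be sketched from the additivity of information distances on a latent tree: two leaves share a parent exactly when the difference of their distances to every other node is the same constant. The distances below are a toy example, not data from the paper.

```python
def likely_siblings(d, i, j, others, tol=1e-9):
    # On a latent tree with additive information distances, leaves i and j
    # are siblings iff d[i][k] - d[j][k] takes the same value for every
    # other node k (the constant is d(i, parent) - d(j, parent)).
    diffs = [d[i][k] - d[j][k] for k in others]
    return max(diffs) - min(diffs) < tol

# Pairwise distances induced by the tree h1--h2 with leaves i, j under h1
# and k, l under h2; edge lengths: i-h1=1, j-h1=2, h1-h2=1, k-h2=1, l-h2=2.
d = {"i": {"j": 3, "k": 3, "l": 4},
     "j": {"i": 3, "k": 4, "l": 5},
     "k": {"i": 3, "j": 4, "l": 3},
     "l": {"i": 4, "j": 5, "k": 3}}
```

For Gaussian variables the information distance can be taken as -log of the absolute correlation, which is additive along tree paths; the test then needs only pairwise statistics of the observed variables.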
The “ideal parent” structure learning for continuous variable networks
 In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, 2004
Abstract
Cited by 14 (2 self)
In recent years, there has been growing interest in learning Bayesian networks with continuous variables. Learning the structure of such networks is a computationally expensive procedure, which limits most applications to parameter learning. This problem is even more acute when learning networks with hidden variables. We present a general method for significantly speeding up the structure search algorithm for continuous variable networks with common parametric distributions. Importantly, our method facilitates the efficient addition of new hidden variables into the network structure. We demonstrate the method on several data sets, both for learning structure on fully observable data, and for introducing new hidden variables during structure search.
Classification using Hierarchical Naïve Bayes models
 In: Machine Learning, 2006
Abstract
Cited by 11 (1 self)
Classification problems have a long history in the machine learning literature. One of the simplest, and yet most consistently well-performing, sets of classifiers is the Naïve Bayes models. However, an inherent problem with these classifiers is the assumption that all attributes used to describe an instance are conditionally independent given the class of that instance. When this assumption is violated (which is often the case in practice) it can reduce classification accuracy due to "information double-counting" and interaction omission.
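The independence assumption being criticized can be made concrete with a minimal Naïve Bayes scorer (all probabilities below are made up for illustration): the class posterior is a product of per-attribute terms, so each attribute is treated as independent evidence given the class.

```python
def nb_posterior(priors, cond, x):
    # P(c | x) is proportional to P(c) * prod_a P(x_a | c): every attribute
    # is assumed conditionally independent given the class.
    scores = {c: priors[c] for c in priors}
    for c in scores:
        for attr, val in x.items():
            scores[c] *= cond[c][attr][val]
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Hypothetical two-class problem with two binary attributes a and b.
priors = {"pos": 0.5, "neg": 0.5}
cond = {"pos": {"a": {0: 0.2, 1: 0.8}, "b": {0: 0.3, 1: 0.7}},
        "neg": {"a": {0: 0.7, 1: 0.3}, "b": {0: 0.6, 1: 0.4}}}
post = nb_posterior(priors, cond, {"a": 1, "b": 1})
```

If a and b were near-duplicates of one another, the product would multiply in the same evidence twice, which is exactly the "information double-counting" the abstract refers to.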
Greedy Learning of Binary Latent Trees
Abstract
Cited by 9 (0 self)
Inferring latent structures from observations helps to model and possibly also understand underlying data-generating processes. A rich class of latent structures are the latent trees, i.e. tree-structured distributions involving latent variables where the visible variables are leaves. These are also called hierarchical latent class (HLC) models. Zhang (2004) proposed a search algorithm for learning such models in the spirit of Bayesian network structure learning. While such an approach can find good solutions, it can be computationally expensive. As an alternative we investigate two greedy procedures: the BIN-G algorithm determines both the structure of the tree and the cardinality of the latent variables in a bottom-up fashion. The BIN-A algorithm first determines the tree structure using agglomerative hierarchical clustering, and then determines the cardinality of the latent variables as for BIN-G. We show that even with restricting ourselves to binary trees we obtain HLC models of comparable quality to Zhang's solutions (in terms of cross-validated log-likelihood), while being generally faster to compute. This claim is validated by a comprehensive comparison on several datasets. Furthermore, we demonstrate that our methods are able to estimate interpretable latent structures on real-world data with a large number of variables. By applying our method to a restricted version of the 20 newsgroups data these models turn out to be related to topic models, and on data from the PASCAL Visual Object Classes (VOC) 2007 challenge we show how such tree-structured models help us understand how objects co-occur in images. For reproducibility of all experiments in this paper, all code and datasets (or links to data) are available.
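The bottom-up flavour of such greedy tree construction can be caricatured with a pairing loop: repeatedly merge the most affine pair of nodes under a new latent parent. The affinities below are hypothetical, and the actual algorithms also choose latent cardinalities, which this sketch omits.

```python
def greedy_binary_tree(aff, leaves):
    # aff: symmetric dict-of-dicts of pairwise affinities between leaves
    # (e.g. empirical mutual information). Repeatedly merge the most
    # affine pair under a new latent parent; the new node inherits the
    # maximum affinity of its children to every remaining node.
    aff = {a: dict(row) for a, row in aff.items()}
    nodes, edges, counter = list(leaves), [], 0
    while len(nodes) > 1:
        i, j = max(((a, b) for a in nodes for b in nodes if a < b),
                   key=lambda p: aff[p[0]][p[1]])
        new = f"h{counter}"
        counter += 1
        aff[new] = {}
        for k in nodes:
            if k not in (i, j):
                aff[new][k] = aff[k][new] = max(aff[i][k], aff[j][k])
        edges += [(new, i), (new, j)]
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    return edges

# Two tight pairs (w, x) and (y, z) with weak cross-affinities.
A = {"w": {"x": 0.9, "y": 0.1, "z": 0.1},
     "x": {"w": 0.9, "y": 0.1, "z": 0.1},
     "y": {"w": 0.1, "x": 0.1, "z": 0.8},
     "z": {"w": 0.1, "x": 0.1, "y": 0.8}}
edges = greedy_binary_tree(A, ["w", "x", "y", "z"])
```

The loop recovers the intended grouping: w and x go under one latent parent, y and z under another, and a root joins the two, i.e. a binary latent tree over the four observed variables.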
Efficient Model Evaluation in the Search-Based Approach to Latent Structure Discovery
Abstract
Cited by 8 (6 self)
Latent tree (LT) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are hidden. We are interested in learning LT models through systematic search. A key problem here is how to efficiently evaluate candidate models during search. The problem is difficult because there is a large number of candidate models, the candidate models contain latent variables, and some of those latent variables are foreign to the current model. A variety of ideas for attacking the problem have emerged from the literature. In this paper we observe that the ideas can be grouped into two distinct approaches. The first is based on data completion, while the second is based on what we call maximum restricted likelihood. We investigate and compare the two approaches in the framework of EAST, a newly developed search procedure for learning LT models.
Kernel Embeddings of Latent Tree Graphical Models
Abstract
Cited by 8 (4 self)
Latent tree graphical models are natural tools for expressing long-range and hierarchical dependencies among many variables, which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum-free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach.
Effective Dimensions of Hierarchical Latent Class Models
 In: Journal of Artificial Intelligence Research, 2002
Abstract
Cited by 5 (2 self)
Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are latent. There are no theoretically well-justified model selection criteria for HLC models in particular and Bayesian networks with latent nodes in general. Nonetheless, empirical studies suggest that the BIC score is a reasonable criterion to use in practice for learning HLC models. Empirical studies also suggest that sometimes model selection can be improved if standard model dimension is replaced with effective model dimension in the penalty term of the BIC score.
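The substitution this abstract mentions changes only the penalty term of the score. The sketch below uses a hypothetical effective dimension; computing the true effective dimension requires the rank of the model's parameter-to-distribution Jacobian, which is beyond this illustration.

```python
import math

def bic_score(loglik, dim, n):
    # BIC: log-likelihood penalized by (dim / 2) * log(sample size).
    return loglik - 0.5 * dim * math.log(n)

def standard_dim(r, leaf_cards):
    # Parameter count of a latent class model: one hidden variable with r
    # states and conditionally independent leaves of the given cardinalities.
    return (r - 1) + r * sum(c - 1 for c in leaf_cards)

n, loglik = 1000, -5000.0
d_std = standard_dim(3, [2, 2, 2, 2])   # 2 + 3 * 4 = 14 free parameters
d_eff = 11                              # hypothetical effective dimension
score_std = bic_score(loglik, d_std, n)
score_eff = bic_score(loglik, d_eff, n)  # milder penalty, higher score
```

When the effective dimension is strictly smaller than the parameter count, the penalized score rises, so models that would be over-penalized under the standard dimension can be retained, which is the improvement the empirical studies observed.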