Results 1–10 of 20
Learning with mixtures of trees
 Journal of Machine Learning Research
, 2000
Abstract

Cited by 136 (2 self)
This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional domains. Mixtures-of-trees generalize the probabilistic trees of Chow and Liu [6] in a different and complementary direction to that of Bayesian networks. We present efficient algorithms for learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss additional efficiencies that can be obtained when data are “sparse,” and we present data structures and algorithms that exploit such sparseness. Experimental results demonstrate the performance of the model for both density estimation and classification. We also discuss the sense in which tree-based classifiers perform an implicit form of feature selection, and demonstrate a resulting insensitivity to irrelevant attributes.
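The Chow and Liu procedure that mixtures-of-trees build on fits a spanning tree over the variables by maximizing total pairwise mutual information. A minimal sketch of that underlying procedure (not the paper's implementation; `data` is assumed to be a list of equal-length discrete tuples):

```python
import math
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information I(X_i; X_j) between columns i and j."""
    n = len(data)
    pi, pj, pij = {}, {}, {}
    for row in data:
        pi[row[i]] = pi.get(row[i], 0) + 1
        pj[row[j]] = pj.get(row[j], 0) + 1
        pij[(row[i], row[j])] = pij.get((row[i], row[j]), 0) + 1
    mi = 0.0
    for (a, b), c in pij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((pi[a] / n) * (pj[b] / n)))
    return mi

def chow_liu_tree(data, n_vars):
    """Maximum-weight spanning tree under MI edge weights
    (Kruskal's algorithm with a union-find over variables)."""
    edges = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # adding the edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

For example, on data where column 0 always equals column 1, the edge (0, 1) carries mutual information log 2 and is always selected first.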
Hierarchical Latent Class Models for Cluster Analysis
 Journal of Machine Learning Research
, 2002
Abstract

Cited by 58 (12 self)
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data
, 2001
Abstract

Cited by 50 (7 self)
We investigate the problem of generating fast approximate answers to queries for large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and develop a number of techniques that are significantly more accurate than a baseline independence model. In particular, we introduce a novel technique for building probabilistic models from frequent itemsets. The itemsets are treated as constraints on the distribution of the query variables and the maximum entropy principle is used online to build a joint probability model for attributes in the query. We show that the resulting probability model defines a Markov random field (MRF) and that the time taken to answer a query scales exponentially as a function of the induced width of the associated MRF graph. We empirically compare the MRF model to other probabilistic models, such as the independence model, the Chow-Liu tree model, the Bernoulli mixture model, and the ADTree model. Experimental resu...
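The maximum-entropy construction the abstract sketches can be imitated with iterative proportional fitting (IPF) over the query variables' joint table. This toy version is my own sketch under stated assumptions, enumerating all 2^k states rather than exploiting the MRF structure the paper derives; `itemset_probs` maps an itemset to the target probability that all its variables are 1:

```python
from itertools import product

def max_entropy_model(n_vars, itemset_probs, iters=200):
    """Fit a max-entropy joint over n_vars binary variables that matches the
    given itemset frequencies, via iterative proportional fitting (IPF)."""
    states = list(product([0, 1], repeat=n_vars))
    p = {s: 1.0 / len(states) for s in states}   # start from uniform
    for _ in range(iters):
        for items, target in itemset_probs.items():
            # current probability that every variable in the itemset is 1
            cur = sum(p[s] for s in states if all(s[i] == 1 for i in items))
            if 0 < cur < 1:
                for s in states:
                    on = all(s[i] == 1 for i in items)
                    # rescale so the itemset marginal matches its target
                    p[s] *= (target / cur) if on else ((1 - target) / (1 - cur))
    return p
```

Each IPF sweep rescales the table so one constraint holds exactly while preserving normalization; with consistent constraints the table converges to the maximum-entropy distribution satisfying all of them.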
Maximum likelihood bounded treewidth Markov networks
 Artificial Intelligence
, 2001
Abstract

Cited by 46 (4 self)
We study the problem of projecting a distribution onto (or finding a maximum likelihood distribution among) Markov networks of bounded treewidth. By casting it as the combinatorial optimization problem of finding a maximum weight hypertree, we prove that it is NP-hard to solve exactly and provide an approximation algorithm with a provable performance guarantee.
Tractable Bayesian Learning of Tree Belief Networks
, 2000
Abstract

Cited by 39 (1 self)
In this paper we present decomposable priors, a family of priors over structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial time. This follows from two main results: First, we show that factored distributions over spanning trees in a graph can be integrated in closed form. Second, we examine priors over tree parameters and show that a set of assumptions similar to (Heckerman et al., 1995) constrain the tree parameter priors to be a compactly parametrized product of Dirichlet distributions. Besides allowing for exact Bayesian learning, these results permit us to formulate a new class of tractable latent variable models in which the likelihood of a data point is computed through an ensemble average over tree structures.

1 Introduction

In the framework of graphical models, tree distributions stand out by their spec...
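The first result, closed-form integration of factored distributions over spanning trees, rests on the weighted Matrix-Tree theorem: the sum over all spanning trees of the product of edge weights equals any cofactor of the weighted graph Laplacian. A small numerical check of that identity (my own sketch, not the paper's code; `W` is a symmetric non-negative edge-weight matrix with zero diagonal):

```python
def det(M):
    """Determinant via Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n = len(M)
    d = 1.0
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) < 1e-12:
            return 0.0
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def tree_partition_function(W):
    """Sum over all spanning trees of the product of edge weights,
    computed as a cofactor of the weighted Laplacian (Matrix-Tree theorem)."""
    n = len(W)
    L = [[(sum(W[i]) - W[i][i]) if i == j else -W[i][j] for j in range(n)]
         for i in range(n)]
    minor = [row[1:] for row in L[1:]]   # delete row 0 and column 0
    return det(minor)
```

With all weights equal to 1 this counts spanning trees: the complete graph on 3 vertices has 3 and on 4 vertices has 16, matching Cayley's formula n^(n-2).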
Detecting coevolution in and among protein domains
 PLoS Comp. Biol
, 2007
Abstract

Cited by 20 (1 self)
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We employ this model to large-scale screenings on the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.
An accelerated Chow and Liu algorithm: fitting tree distributions to high-dimensional sparse data
, 1999
Abstract

Cited by 18 (0 self)
Chow and Liu [2] introduced an algorithm for fitting a multivariate distribution with a tree (i.e. a density model that assumes that there are only pairwise dependencies between variables) and that the graph of these dependencies is a spanning tree. The original algorithm is quadratic in the dimension of the domain, and linear in the number of data points that define the target distribution P. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed-ups of up to 2-3 orders of magnitude in the experiments. Copyright © Massachusetts Institute of Technology, 1998. This report describes research done at the Dept. of Electrical Enginee...
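The sparsity trick can be illustrated in a few lines: store each data point as the set of variables that are 1 and accumulate only the co-occurrences that actually appear, so the work scales with the number of ones rather than with all variable pairs. This is a sketch of the general idea, not the acCL algorithm itself:

```python
from itertools import combinations
from collections import Counter

def sparse_pairwise_counts(transactions):
    """Single and pairwise co-occurrence counts from sparse binary data.
    Each transaction is the set of variable indices that are 1."""
    singles, pairs = Counter(), Counter()
    for t in transactions:
        on = sorted(t)                     # canonical order for pair keys
        singles.update(on)
        pairs.update(combinations(on, 2))  # only pairs present in this row
    return singles, pairs
```

From these counts plus the data set size N, every cell of each pairwise contingency table is recoverable (e.g. the count of "i=1, j=0" is `singles[i] - pairs[(i, j)]`), which is all the Chow and Liu mutual-information computation needs.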
Robust Bayesian linear classifier ensembles
 In Machine Learning: ECML 2005
, 2005
Abstract

Cited by 12 (0 self)
Abstract. Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Usually probabilistic classifiers restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging. Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
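The expectation-maximization variant for learning mixture weights can be sketched as follows. This is a maximum-likelihood-only illustration under stated assumptions, omitting the prior term a MAP version would add; `likelihoods[n][k]` is assumed to be the (fixed) probability model k assigns to the observed label of example n:

```python
def em_mixture_weights(likelihoods, iters=100):
    """EM for the weights of a linear mixture of fixed probabilistic models.
    likelihoods[n][k] = p_k(y_n | x_n), precomputed and held fixed."""
    K = len(likelihoods[0])
    w = [1.0 / K] * K                      # start from uniform averaging
    for _ in range(iters):
        counts = [0.0] * K
        for lik in likelihoods:
            z = sum(w[k] * lik[k] for k in range(K))
            for k in range(K):
                # E-step: responsibility of model k for this example
                counts[k] += w[k] * lik[k] / z
        # M-step: weights proportional to total responsibility
        w = [c / len(likelihoods) for c in counts]
    return w
```

Because only the mixture weights are free, each iteration is a single pass over the precomputed likelihood table, and the weights concentrate on models that consistently assign high probability to the observed labels.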
Learning the Tree Augmented Naive Bayes Classifier from incomplete datasets
Abstract

Cited by 2 (1 self)
The Bayesian network formalism is becoming increasingly popular in many areas such as decision aid or diagnosis, in particular thanks to its inference capabilities, even when data are incomplete. For classification tasks, Naive Bayes and Augmented Naive Bayes classifiers have shown excellent performance. Learning a Naive Bayes classifier from incomplete datasets is not difficult as only parameter learning has to be performed. But there are not many methods to efficiently learn Tree Augmented Naive Bayes classifiers from incomplete datasets. In this paper, we take up the structural EM algorithm principle introduced by (Friedman, 1997) to propose an algorithm to answer this question.
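In the complete-data setting that each structural EM iteration reduces to (with expected counts standing in for observed ones when data are incomplete), the TAN tree over attributes is built from conditional mutual information given the class. A minimal empirical version of that edge weight, assuming discrete rows with the class in column `c`:

```python
import math
from collections import Counter

def conditional_mi(data, i, j, c):
    """Empirical I(X_i; X_j | C): the edge weight used when learning a
    Tree Augmented Naive Bayes structure from complete data."""
    n = len(data)
    n_c = Counter(row[c] for row in data)
    n_ic = Counter((row[i], row[c]) for row in data)
    n_jc = Counter((row[j], row[c]) for row in data)
    n_ijc = Counter((row[i], row[j], row[c]) for row in data)
    mi = 0.0
    for (a, b, cl), cnt in n_ijc.items():
        # p(a,b,cl) * log[ p(a,b|cl) / (p(a|cl) p(b|cl)) ]
        mi += (cnt / n) * math.log(cnt * n_c[cl] / (n_ic[(a, cl)] * n_jc[(b, cl)]))
    return mi
```

A maximum-weight spanning tree over these weights (exactly as in the Chow and Liu procedure, but conditioned on the class) then yields the augmenting tree of the TAN classifier.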
High-dimensional probability density estimation with randomized ensembles of tree structured Bayesian networks