Learning with mixtures of trees (1999)

by M Meilă-Predoviciu

Results 1 - 10 of 11

Learning with mixtures of trees

by Marina Meilă, Michael I. Jordan - Journal of Machine Learning Research , 2000
Abstract - Cited by 91 (2 self)
This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional domains. Mixtures-of-trees generalize the probabilistic trees of Chow and Liu [6] in a different and complementary direction to that of Bayesian networks. We present efficient algorithms for learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss additional efficiencies that can be obtained when data are “sparse,” and we present data structures and algorithms that exploit such sparseness. Experimental results demonstrate the performance of the model for both density estimation and classification. We also discuss the sense in which tree-based classifiers perform an implicit form of feature selection, and demonstrate a resulting insensitivity to irrelevant attributes.
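
For orientation, a mixture of trees can be written as a convex combination of tree-structured distributions. The notation below (m components, mixing weights \lambda_k, tree distributions T^k with edge sets E_k over variables V) is an assumption chosen for illustration and is not quoted from the abstract.

```latex
Q(x) = \sum_{k=1}^{m} \lambda_k \, T^k(x),
\qquad \lambda_k \ge 0, \quad \sum_{k=1}^{m} \lambda_k = 1,

T^k(x) = \frac{\prod_{(u,v) \in E_k} T^k_{uv}(x_u, x_v)}
              {\prod_{v \in V} T^k_v(x_v)^{\deg_k(v) - 1}}
```

Here T^k_{uv} and T^k_v denote the pairwise and single-variable marginals of the k-th tree, and \deg_k(v) is the degree of v in that tree.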

Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data

by Dmitry Pavlov, Heikki Mannila, Padhraic Smyth , 2001
Abstract - Cited by 34 (6 self)
We investigate the problem of generating fast approximate answers to queries for large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and develop a number of techniques that are significantly more accurate than a baseline independence model. In particular, we introduce a novel technique for building probabilistic models from frequent itemsets. The itemsets are treated as constraints on the distribution of the query variables and the maximum entropy principle is used online to build a joint probability model for attributes in the query. We show that the resulting probability model defines a Markov random field (MRF) and that the time taken to answer a query scales exponentially as a function of the induced width of the associated MRF graph. We empirically compare the MRF model to other probabilistic models, such as the independence model, the Chow-Liu tree model, the Bernoulli mixture model, and the ADTree model. Experimental resu...

Hierarchical Latent Class Models for Cluster Analysis

by Nevin L. Zhang - Journal of Machine Learning Research , 2002
Abstract - Cited by 34 (9 self)
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
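
The independence assumption mentioned in this abstract can be stated compactly. The symbols are illustrative assumptions: C is the latent class variable and X_1, ..., X_n are the observed categorical variables.

```latex
P(x_1, \dots, x_n) = \sum_{c} P(C = c) \prod_{i=1}^{n} P(x_i \mid C = c)
```

Local dependence refers to data for which this factorization fails, that is, some pair X_i, X_j remains dependent even after conditioning on C.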

Tractable Bayesian Learning of Tree Belief Networks

by Marina Meila, Tommi Jaakkola , 2000
Abstract - Cited by 33 (1 self)
In this paper we present decomposable priors, a family of priors over structure and parameters of tree belief nets for which Bayesian learning with complete observations is tractable, in the sense that the posterior is also decomposable and can be completely determined analytically in polynomial time. This follows from two main results: First, we show that factored distributions over spanning trees in a graph can be integrated in closed form. Second, we examine priors over tree parameters and show that a set of assumptions similar to (Heckerman et al., 1995) constrain the tree parameter priors to be a compactly parametrized product of Dirichlet distributions. Besides allowing for exact Bayesian learning, these results permit us to formulate a new class of tractable latent variable models in which the likelihood of a data point is computed through an ensemble average over tree structures.

Maximum likelihood bounded tree-width Markov networks

by Nathan Srebro - Artificial Intelligence , 2001
Abstract - Cited by 32 (3 self)
We study the problem of projecting a distribution onto (or finding a maximum likelihood distribution among) Markov networks of bounded tree-width. By casting it as the combinatorial optimization problem of finding a maximum weight hypertree, we prove that it is NP-hard to solve exactly and provide an approximation algorithm with a provable performance guarantee.

An accelerated Chow and Liu algorithm: fitting tree distributions to high-dimensional sparse data

by Marina Meila , 1999
Abstract - Cited by 11 (0 self)
Chow and Liu [2] introduced an algorithm for fitting a multivariate distribution with a tree (i.e. a density model that assumes that there are only pairwise dependencies between variables and that the graph of these dependencies is a spanning tree). The original algorithm is quadratic in the dimension of the domain and linear in the number of data points that define the target distribution P. This paper shows that for sparse, discrete data, fitting a tree distribution can be done in time and memory that is jointly subquadratic in the number of variables and the size of the data set. The new algorithm, called the acCL algorithm, takes advantage of the sparsity of the data to accelerate the computation of pairwise marginals and the sorting of the resulting mutual informations, achieving speed-ups of up to 2-3 orders of magnitude in the experiments. Copyright © Massachusetts Institute of Technology, 1998 This report describes research done at the Dept. of Electrical Enginee...
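
As a point of reference, the basic (quadratic) Chow and Liu procedure that this abstract starts from can be sketched as below. This is a minimal illustration, not the accelerated acCL algorithm: it computes every pairwise mutual information and then builds a maximum-weight spanning tree with Kruskal's method. The function names `mutual_information` and `chow_liu_edges` are hypothetical and chosen for this sketch.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_edges(data):
    """Edges of a maximum-weight spanning tree over pairwise mutual information."""
    d = len(data[0])
    # All d*(d-1)/2 pairs: this is the quadratic step the abstract refers to.
    weighted = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            edges.append((i, j))
    return edges


# Toy data (an assumption for illustration): column 1 copies column 0,
# column 2 is independent of both.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree = chow_liu_edges(toy)
```

On the toy data the strongly dependent pair (0, 1) is selected first, and one further edge attaches variable 2 so that the result spans all three variables.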

Robust Bayesian linear classifier ensembles

by Jesús Cerquides, Ramon López De Màntaras - Proc. 16th European Conf. Machine Learning, Lecture Notes in Computer Science , 2005
Abstract - Cited by 6 (0 self)
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Usually probabilistic classifiers restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging. Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. After that we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
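
To make the idea of learning linear mixture weights by expectation maximization concrete, here is a generic textbook sketch. It is not the authors' algorithm: it assumes the component models are fixed, that `lik[n][k]` holds the (strictly positive) likelihood of data point n under model k, and that the prior on the weights is a symmetric Dirichlet with parameter `alpha` (with `alpha=1.0` the update reduces to plain maximum likelihood). The name `em_mixture_weights` is hypothetical.

```python
def em_mixture_weights(lik, n_iter=100, alpha=1.0):
    """EM for the weights of a linear mixture of fixed models.

    lik[n][k]: likelihood of data point n under model k (assumed > 0).
    alpha: symmetric Dirichlet prior parameter (assumed >= 1).
    """
    n, k = len(lik), len(lik[0])
    w = [1.0 / k] * k  # start from uniform averaging
    for _ in range(n_iter):
        # E-step: accumulate each model's responsibility for every point.
        counts = [0.0] * k
        for row in lik:
            total = sum(wj * lj for wj, lj in zip(w, row))
            for j in range(k):
                counts[j] += w[j] * row[j] / total
        # M-step: MAP update under the symmetric Dirichlet prior.
        denom = n + k * (alpha - 1.0)
        w = [(c + alpha - 1.0) / denom for c in counts]
    return w


# Illustrative likelihoods (made up for this sketch): model 0 fits every
# data point better than model 1.
example_lik = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
weights = em_mixture_weights(example_lik)
```

Uniform averaging corresponds to the starting point of the iteration, which is one way to see that the linear mixture space includes it as a particular case.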

Probabilistic tools for traffic management

by Tomas Singliar , 2007
Abstract
As road capacity expansion is hitting the wall of cost, environmental regulations and engineering challenges, traffic management practitioners are starting to recognize the value of quantitative decision making for improvements in utilization and performance of existing road networks. Civil engineering manuals have characteristically boiled their guidance down to a single number or recommendation. However, traffic behaves stochastically and often cannot be well characterized with such simplification. We design generative models that describe the joint distribution of observed traffic patterns. The primary use we envision for such models is to impute data missing due to sensor malfunctions that are frequent in practice. Computational decision-making machinery such as Markov decision processes offers a principled way to aid traffic managers to absorb more complex traffic descriptions into their decisions. Quantitative decision-making paradigms fundamentally require probabilistic descriptions of traffic state. Again, the standard models of traffic flow are deterministic and fully observable and therefore unfit to provide the requisite probabilities in a stochastic network. We propose to build a model of traffic flow inspired by time-tested deterministic flow models, but one that deals with uncertainty of measurement and unobservability of certain important quantities.

BAYESIAN NETWORK STRUCTURAL LEARNING AND INCOMPLETE DATA

by Olivier François
Abstract
The Bayesian network formalism is becoming increasingly popular in many areas such as decision aid, diagnosis and complex systems control, in particular thanks to its inference capabilities, even when data are incomplete. Besides, estimating the parameters of a fixed-structure Bayesian network is easy. However, very few methods are capable of using incomplete cases as a base to determine the structure of a Bayesian network. In this paper, we take up the structural EM algorithm principle [9, 10] to propose an algorithm which extends the Maximum Weight Spanning Tree algorithm to deal with incomplete data. We also propose to use this extension in order to (1) speed up the structural EM algorithm or (2) in classification tasks extend the Tree Augmented Naive classifier in order to deal with incomplete data.

High-dimensional probability density estimation with randomized ensembles of tree structured Bayesian networks

by Sourour Ammar, Boris Defourny, Louis Wehenkel