Results 1-10 of 21
Learning with mixtures of trees
Journal of Machine Learning Research, 2000
Abstract

Cited by 109 (2 self)
This paper describes the mixtures-of-trees model, a probabilistic model for discrete multidimensional domains. Mixtures of trees generalize the probabilistic trees of Chow and Liu [6] in a different and complementary direction to that of Bayesian networks. We present efficient algorithms for learning mixtures-of-trees models in maximum likelihood and Bayesian frameworks. We also discuss additional efficiencies that can be obtained when data are “sparse,” and we present data structures and algorithms that exploit such sparseness. Experimental results demonstrate the performance of the model for both density estimation and classification. We also discuss the sense in which tree-based classifiers perform an implicit form of feature selection, and demonstrate a resulting insensitivity to irrelevant attributes.
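The Chow and Liu procedure that mixtures of trees build on fits a single dependency tree by running a maximum-weight spanning-tree algorithm over pairwise mutual information. A minimal sketch in pure Python (Prim's algorithm; the function names are illustrative, not from the paper, and `data` is a list of discrete attribute tuples):

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(data, i, j):
    """Empirical mutual information between discrete attributes i and j."""
    n = len(data)
    ci = Counter(row[i] for row in data)
    cj = Counter(row[j] for row in data)
    cij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in cij.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), with counts converted inline
        mi += (c / n) * math.log(c * n / (ci[a] * cj[b]))
    return mi

def chow_liu_tree(data, n_attrs):
    """Maximum-weight spanning tree over attributes (Prim's algorithm),
    with edges weighted by pairwise mutual information."""
    weights = {(i, j): mutual_information(data, i, j)
               for i, j in combinations(range(n_attrs), 2)}
    in_tree = {0}
    edges = []
    while len(in_tree) < n_attrs:
        best = max(((u, v) for u in in_tree
                    for v in range(n_attrs) if v not in in_tree),
                   key=lambda e: weights[(min(e), max(e))])
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

A mixture of trees then learns several such trees, one per mixture component, with EM supplying soft data weights for each component's mutual-information computation.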
Using Tarjan's Red Rule for Fast Dependency Tree Construction
Advances in Neural Information Processing Systems 15, 2002
Abstract

Cited by 14 (6 self)
We focus on the problem of efficient learning of dependency trees. It is well known that given the pairwise mutual information coefficients, a minimum-weight spanning tree algorithm solves this problem exactly and in polynomial time. However, for large datasets it is the construction of the correlation matrix that dominates the running time. We have developed a new spanning-tree algorithm which is capable of exploiting partial knowledge about edge weights. The partial knowledge we maintain is a probabilistic confidence interval on the coefficients, which we derive by examining just a small sample of the data. The algorithm is able to flag the need to shrink an interval, which translates to inspection of more data for the particular attribute pair. Experimental results show running time that is near-constant in the number of records, without significant loss in accuracy of the generated trees. Interestingly, our spanning-tree algorithm is based solely on Tarjan's red-edge rule, which is generally considered a guaranteed recipe for bad performance.
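The interval-maintenance idea can be illustrated with a Hoeffding bound: estimate each coefficient from a small sample, attach a confidence interval, and apply the red rule only when two intervals on a cycle are cleanly separated. A hedged sketch (the function names are hypothetical, and the paper's actual bound on mutual-information coefficients may differ):

```python
import math

def hoeffding_interval(sample_mean, n, value_range, delta=0.05):
    """Hoeffding confidence interval for the true mean of a bounded
    quantity: with probability >= 1 - delta, the true mean lies within
    +/- eps of the mean of n samples."""
    eps = value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return sample_mean - eps, sample_mean + eps

def can_discard(edge_interval, rival_interval):
    """Red-rule test under uncertainty: in a minimum-weight spanning
    tree, the heaviest edge on a cycle can be discarded.  An edge is
    safely discardable only if its whole interval lies above a rival
    cycle edge's interval; otherwise more data must be inspected to
    shrink the intervals."""
    lo, _ = edge_interval
    _, rival_hi = rival_interval
    return lo > rival_hi
```

The near-constant running time in the number of records comes from the square-root shrinkage of `eps`: most edges can be discarded after a sample far smaller than the full dataset.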
Using probabilistic reasoning to automate software tuning
2003
Abstract

Cited by 14 (0 self)
Complex software systems typically include a set of parameters that can be adjusted to improve the system’s performance. System designers expose these parameters, which are often referred to as knobs, because they realize that no single configuration of the system can adequately handle every possible workload. Therefore, users are allowed to tune the system, reconfiguring it to perform well on a specific workload. However, manually tuning a software system can be an extremely difficult task, and it is often necessary to dynamically retune the system as workload characteristics change over time. As a result, many systems are run using either the default knob settings or non-optimal alternate settings, and potential performance improvements go unrealized. Ideally, the process of software tuning should be automated, allowing software systems to determine their own optimal knob settings and to reconfigure themselves as needed in response to changing conditions. This thesis demonstrates that probabilistic reasoning and decision-making techniques can be used as the foundation of an effective, automated approach to ...
Exploiting parameter domain knowledge for learning in Bayesian networks
Carnegie Mellon University, 2005
Cited by 9 (1 self)
Latent classification models
Machine Learning, 2005
Abstract

Cited by 9 (2 self)
One of the simplest, and yet most consistently well-performing, families of classifiers is the Naïve Bayes models. These models rely on two assumptions: (i) all the attributes used to describe an instance are conditionally independent given the class of that instance, and (ii) all attributes follow a specific parametric family of distributions. In this paper we propose a new set of models for classification in continuous domains, termed latent classification models. The latent classification model can roughly be seen as combining the Naïve Bayes model with a mixture of factor analyzers, thereby relaxing the assumptions of the Naïve Bayes classifier. In the proposed model the continuous attributes are described by a mixture of multivariate Gaussians, where the conditional dependencies among the attributes are encoded using latent variables. We present algorithms for learning both the parameters and the structure of a latent classification model, and we demonstrate empirically that the accuracy of the proposed model is significantly higher than the accuracy of other probabilistic classifiers. Keywords: classification, probabilistic graphical models, Naïve Bayes, correlation
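For contrast, the baseline whose two assumptions the latent classification model relaxes can be written down in a few lines. A minimal Gaussian Naïve Bayes sketch in pure Python (illustrative only, not the paper's model):

```python
import math

class GaussianNaiveBayes:
    """Each attribute is modeled as an independent univariate Gaussian
    given the class: exactly assumptions (i) and (ii) above."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.prior, self.mean, self.var = {}, {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.prior[c] = len(rows) / len(X)
            cols = list(zip(*rows))
            self.mean[c] = [sum(col) / len(col) for col in cols]
            # clamp the variance away from zero for numerical safety
            self.var[c] = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                           for col, m in zip(cols, self.mean[c])]
        return self

    def predict(self, x):
        def log_posterior(c):
            lp = math.log(self.prior[c])
            for v, m, s2 in zip(x, self.mean[c], self.var[c]):
                lp += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
            return lp
        return max(self.classes, key=log_posterior)
```

The latent classification model keeps the same class-conditional factorization at the top level but routes attribute dependencies through latent variables, replacing the per-attribute univariate Gaussians with a mixture of factor analyzers.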
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous And Discrete Variables
2000
Abstract

Cited by 7 (2 self)
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in low-dimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kd-trees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which low-dimensional mixtures of Gaussians over different subsets of the domain’s variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete variables and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.
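The core fitting step, EM for a mixture of Gaussians, can be sketched in one dimension as follows. This is the naive version with an O(n) pass per component per iteration; the accelerated algorithm cited above replaces that E-step pass with a multiresolution kd-tree traversal (illustrative code, not from the paper):

```python
import math

def em_gmm_1d(xs, iters=100):
    """EM for a two-component 1-D Gaussian mixture."""
    mu = [min(xs), max(xs)]          # deterministic initialization
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w[j] * math.exp(-(x - mu[j]) ** 2 / (2 * var[j]))
                    / math.sqrt(2 * math.pi * var[j]) for j in range(2)]
            z = sum(dens)
            resp.append([d / z for d in dens])
        # M-step: re-estimate mixing weights, means, and variances
        for j in range(2):
            nj = sum(r[j] for r in resp)
            w[j] = nj / len(xs)
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2
                             for r, x in zip(resp, xs)) / nj, 1e-6)
    return w, mu, var
```

In the proposed networks, each conditional probability model over a small variable subset is a (low-dimensional, possibly multivariate) mixture fitted this way, and the network structure stitches the local mixtures into a joint model.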
Supervised classification with conditional Gaussian networks: Increasing the structure complexity from naive Bayes
 International Journal of Approximate Reasoning
Abstract

Cited by 5 (0 self)
Most Bayesian network-based classifiers are only able to handle discrete variables. However, most real-world domains involve continuous variables. A common practice to deal with continuous variables is to discretize them, with a subsequent loss of information. This work shows how discrete classifier induction algorithms can be adapted to the conditional Gaussian network paradigm to deal with continuous variables without discretizing them. In addition, three novel classifier induction algorithms and two new propositions about mutual information are introduced. The classifier induction algorithms presented are ordered and grouped according to their structural complexity: naive Bayes, tree augmented naive Bayes, k-dependence Bayesian classifiers and semi-naive Bayes. All the classifier induction algorithms are empirically evaluated using predictive accuracy, and they are compared to linear discriminant analysis as a classic statistical benchmark classifier for continuous domains. The accuracies for a set of state-of-the-art classifiers are also included in order to justify the use of linear discriminant analysis as the benchmark algorithm. To better understand the behavior of the conditional Gaussian network-based classifiers, the results include a bias-variance decomposition of the expected misclassification rate. The study suggests that semi-naive Bayes structure based classifiers and, especially, the novel wrapper condensed semi-naive Bayes backward, outperform the rest of the presented classifiers. They also obtain quite competitive results compared to the state-of-the-art algorithms included. Key words: conditional Gaussian network, Bayesian network, naive Bayes, tree augmented naive Bayes, k-dependence Bayesian classifiers, semi-naive Bayes, filter, wrapper.
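The benchmark mentioned above, linear discriminant analysis, models all classes with one shared covariance matrix, in contrast to the independent per-attribute variances of naive Bayes. A two-feature, two-class sketch in pure Python (hypothetical helper names; a real implementation would handle arbitrary dimensions with a linear-algebra library):

```python
import math

def lda_fit(X, y):
    """Two-class LDA with a shared 2x2 pooled covariance matrix.
    Returns a predict function for new two-feature points."""
    classes = sorted(set(y))
    n, d = len(X), 2
    means, priors = {}, {}
    S = [[0.0, 0.0], [0.0, 0.0]]          # pooled covariance accumulator
    for c in classes:
        rows = [x for x, lbl in zip(X, y) if lbl == c]
        priors[c] = len(rows) / n
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
        for r in rows:
            dx = [r[j] - means[c][j] for j in range(d)]
            for a in range(d):
                for b in range(d):
                    S[a][b] += dx[a] * dx[b]
    S = [[S[a][b] / n for b in range(d)] for a in range(d)]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    Sinv = [[S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det, S[0][0] / det]]

    def predict(x):
        def score(c):
            m = means[c]
            # linear discriminant: x' S^-1 m - (1/2) m' S^-1 m + log prior
            xm = [sum(Sinv[a][b] * m[b] for b in range(d)) for a in range(d)]
            return (x[0] * xm[0] + x[1] * xm[1]
                    - 0.5 * (m[0] * xm[0] + m[1] * xm[1])
                    + math.log(priors[c]))
        return max(classes, key=score)
    return predict
```

The conditional Gaussian network classifiers studied in the paper sit between these extremes: their structure determines which pairwise covariance terms are modeled and which are assumed zero.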