A Guide to the Literature on Learning Probabilistic Networks From Data (1996)

by Wray Buntine
Results 11 - 20 of 114

Interpreting Bayesian Logic Programs

by Kristian Kersting, Luc De Raedt, Stefan Kramer - Proceedings of the Work-in-Progress Track at the 10th International Conference on Inductive Logic Programming, 2001
Cited by 92 (7 self)
Various proposals for combining first order logic with Bayesian nets exist. We introduce the formalism of Bayesian logic programs, which is basically a simplification and reformulation of Ngo and Haddawy's probabilistic logic programs. However, Bayesian logic programs are sufficiently powerful to represent essentially the same knowledge in a more elegant manner. The elegance is illustrated by the fact that they can represent both Bayesian nets and definite clause programs (as in "pure" Prolog) and that their kernel in Prolog is actually an adaptation of a usual Prolog meta-interpreter.

Learning Bayesian Networks from Data: An Information-Theory Based Approach

by Jie Cheng, Russell Greiner, Jonathan Kelly, David Bell, Weiru Liu
Cited by 67 (4 self)
This paper provides algorithms that use an information-theoretic analysis to learn Bayesian network structures from data. Based on our three-phase learning framework, we develop efficient algorithms that can effectively learn Bayesian networks, requiring only polynomial numbers of conditional independence (CI) tests in typical cases. We provide precise conditions that specify when these algorithms are guaranteed to be correct as well as empirical evidence (from real world applications and simulation tests) that demonstrates that these systems work efficiently and reliably in practice.

Dynamic Bayesian Multinets

by Jeff A. Bilmes, 2000
Cited by 54 (14 self)
In this work, dynamic Bayesian multinets are introduced where a Markov chain state at time t determines conditional independence patterns between random variables lying within a local time window surrounding t. It is shown how information-theoretic criterion functions can be used to induce sparse, discriminative, and class-conditional network structures that yield an optimal approximation to the class posterior probability, and therefore are useful for the classification task. Using a new structure learning heuristic, the resulting models are tested on a medium-vocabulary isolated-word speech recognition task. It is demonstrated that these discriminatively structured dynamic Bayesian multinets, when trained in a maximum likelihood setting using EM, can outperform both HMMs and other dynamic Bayesian networks with a similar number of parameters. 1 Introduction While Markov chains are sometimes a useful model for sequences, such simple independence assumptions can lead...

Graphical models and automatic speech recognition

by Jeffrey A. Bilmes - Mathematical Foundations of Speech and Language Processing, 2003
Cited by 49 (10 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic-, pronunciation-, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.

Learning Belief Networks from Data: An Information Theory Based Approach

by Jie Cheng, David A. Bell, Weiru Liu - In Proceedings of the Sixth ACM International Conference on Information and Knowledge Management
Cited by 48 (7 self)
This paper presents an efficient algorithm for learning Bayesian belief networks from databases. The algorithm takes a database as input and constructs the belief network structure as output. The construction process is based on the computation of mutual information of attribute pairs. Given a data set that is large enough, this algorithm can generate a belief network very close to the underlying model and, at the same time, enjoys a time complexity of O(N^4) on conditional independence (CI) tests. When the data set has a normal DAG-Faithful (see Section 3.2) probability distribution, the algorithm guarantees that the structure of a perfect map [Pearl, 1988] of the underlying dependency model is generated. To evaluate this algorithm, we present the experimental results on three versions of the well-known ALARM network database, which has 37 attributes and 10,000 records. The results show that this algorithm is accurate and efficient. The proof of correctness and the analysis of c...
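
The pairwise mutual-information computation that the abstract describes can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the function names `mutual_information` and `rank_pairs` and the toy database are hypothetical, and the sketch covers only the scoring of attribute pairs, not the CI tests or the network construction.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(records, a, b):
    """Empirical mutual information (in nats) between columns a and b."""
    n = len(records)
    joint = Counter((r[a], r[b]) for r in records)
    count_a = Counter(r[a] for r in records)
    count_b = Counter(r[b] for r in records)
    # I(A;B) = sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def rank_pairs(records, num_attrs):
    """Return attribute pairs sorted by decreasing mutual information."""
    scores = {
        (a, b): mutual_information(records, a, b)
        for a, b in combinations(range(num_attrs), 2)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy database: column 1 copies column 0, column 2 is unrelated.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
ranking = rank_pairs(toy, 3)
```

On the toy data, the pair (0, 1) ranks first because the two columns are identical, while both pairs involving column 2 score zero.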

Learning Bayesian Nets that Perform Well

by Russell Greiner, Adam J. Grove, Dale Schuurmans - In UAI-97, 1997
Cited by 45 (16 self)
A Bayesian net (BN) is more than a succinct way to encode a probabilistic distribution; it also corresponds to a function used to answer queries. A BN can therefore be evaluated by the accuracy of the answers it returns. Many algorithms for learning BNs, however, attempt to optimize another criterion (usually likelihood, possibly augmented with a regularizing term), which is independent of the distribution of queries that are posed. This paper takes the "performance criteria" seriously, and considers the challenge of computing the BN whose performance --- read "accuracy over the distribution of queries" --- is optimal. We show that many aspects of this learning task are more difficult than the corresponding subtasks in the standard model. To appear in Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), Providence, RI, August 1997. 1 INTRODUCTION Many tasks require answering questions; this model applies, for example, to both expert systems th...

Data Analysis with Bayesian Networks: A Bootstrap Approach

by Nir Friedman, Moises Goldszmidt, Abraham Wyner, 1999
Cited by 41 (7 self)
In recent years there has been significant progress in algorithms and methods for inducing Bayesian networks from data. However, in complex data analysis problems, we need to go beyond being satisfied with inducing networks with high scores. We need to provide confidence measures on features of these networks: Is the existence of an edge between two nodes warranted? Is the Markov blanket of a given node robust? Can we say something about the ordering of the variables? We should be able to address these questions, even when the amount of data is not enough to induce a high scoring network. In this paper we propose Efron's Bootstrap as a computationally efficient approach for answering these questions. In addition, we propose to use these confidence measures to induce better structures from the data, and to detect the presence of latent variables.
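
The bootstrap idea in this abstract can be illustrated with a small sketch: resample the data with replacement, re-evaluate a feature on each resample, and report the fraction of resamples in which it holds. Everything named here is an assumption for illustration. In particular, `x_and_y_agree_mostly` is a hypothetical stand-in for "the network induced from this resample contains an edge between X and Y"; a real application would rerun a structure learner on every resample.

```python
import random


def bootstrap_confidence(data, feature, num_resamples=200, seed=0):
    """Fraction of bootstrap resamples in which feature(resample) is true."""
    rng = random.Random(seed)
    n = len(data)
    hits = 0
    for _ in range(num_resamples):
        # Draw n records with replacement from the original data set.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        if feature(resample):
            hits += 1
    return hits / num_resamples


def x_and_y_agree_mostly(sample):
    """Hypothetical stand-in for an edge test: X equals Y in over 75% of records."""
    agree = sum(1 for x, y in sample if x == y)
    return agree / len(sample) > 0.75


# Hypothetical toy data: X and Y agree in 90 of 100 records.
data = [(0, 0)] * 45 + [(1, 1)] * 45 + [(0, 1)] * 5 + [(1, 0)] * 5
confidence = bootstrap_confidence(data, x_and_y_agree_mostly)
```

A confidence close to 1 indicates that the feature is stable under resampling; a value near 0.5 would suggest that the data do not settle the question either way.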

Robust Learning with Missing Data

by Marco Ramoni, Paola Sebastiani, 1996
Cited by 38 (5 self)
Bayesian methods are becoming increasingly popular in the development of intelligent machines. Bayesian Belief Networks (bbns) are nowadays a prominent reasoning method and, during the past few years, several efforts have been made to develop methods able to learn bbns directly from databases. However, all these methods assume that the database is complete or, at least, that unreported data are missing at random. Unfortunately, real-world databases are rarely complete and the "Missing at Random" assumption is often unrealistic. This paper shows that this assumption can dramatically affect the reliability of the learned bbn and introduces a robust method to learn conditional probabilities in a bbn, which does not rely on this assumption. In order to drop this assumption, we have to change the overall learning strategy used by traditional Bayesian methods: our method bounds the set of all posterior probabilities consistent with the database and proceeds by refining this set as more i...
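
The idea of bounding an estimate rather than assuming a missing-data mechanism can be illustrated at the level of simple relative frequencies. This is a simplified sketch and not the method of the paper, which bounds posterior probabilities: the function `probability_bounds` and the example column are hypothetical, and `None` is assumed to mark a missing entry.

```python
def probability_bounds(values, target):
    """Bounds on the relative frequency of `target` over all possible
    completions of the missing entries (marked as None)."""
    total = len(values)
    observed_hits = sum(1 for v in values if v == target)
    missing = sum(1 for v in values if v is None)
    # Lower bound: no missing entry equals the target value.
    lower = observed_hits / total
    # Upper bound: every missing entry equals the target value.
    upper = (observed_hits + missing) / total
    return lower, upper


# Hypothetical column with two unreported values.
column = [1, 1, 0, None, 1, None, 0, 1, 0, 1]
bounds = probability_bounds(column, 1)  # (0.5, 0.7)
```

The interval narrows as fewer entries are missing and collapses to a single point when the column is complete.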

Optimization by learning and simulation of Bayesian and Gaussian networks

by P. Larrañaga, R. Etxeberria, J. A. Lozano, J. M. Peña, 1999
Cited by 34 (6 self)
Estimation of Distribution Algorithms (EDA) constitute an example of stochastic heuristics based on populations of individuals, each of which encodes a possible solution to the optimization problem. These populations of individuals evolve in successive generations as the search progresses, organized in the same way as most evolutionary computation heuristics. In contrast to most evolutionary computation paradigms, which consider the crossing and mutation operators as essential tools to generate new populations, EDA replaces those operators by the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after making a review of the different approaches based on EDA for problems of combinatorial optimization as well as for problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
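
The estimate-then-sample loop that the abstract describes can be sketched with the simplest member of the EDA family, a univariate model that treats every bit as independent (often called UMDA). This is an assumption made for brevity and not the approach of the paper, which proposes Bayesian and Gaussian network models of the joint distribution. The function name, the OneMax objective (count of ones), and all parameter values are hypothetical.

```python
import random


def umda_onemax(num_bits=20, pop_size=100, num_selected=30,
                generations=30, seed=0):
    """Maximize the number of ones in a bit string with a univariate EDA."""
    rng = random.Random(seed)
    probs = [0.5] * num_bits  # initial model: every bit is a fair coin
    best = None
    for _ in range(generations):
        # Simulation step: sample a new population from the current model.
        population = [
            [1 if rng.random() < p else 0 for p in probs]
            for _ in range(pop_size)
        ]
        population.sort(key=sum, reverse=True)
        if best is None or sum(population[0]) > sum(best):
            best = population[0]
        # Selection step: keep the fittest individuals.
        selected = population[:num_selected]
        # Estimation step: refit independent per-bit marginals, clamped so
        # that no bit value becomes impossible to sample.
        probs = [
            min(0.95, max(0.05, sum(ind[i] for ind in selected) / num_selected))
            for i in range(num_bits)
        ]
    return best


best = umda_onemax()
```

No crossover or mutation operator appears anywhere in the loop; new individuals come only from sampling the fitted distribution. Replacing the independent marginals with a learned Bayesian network would allow the model to capture dependencies between bits.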

Distribution of Mutual Information

by Marcus Hutter - Advances in Neural Information Processing Systems 14, 2001
Cited by 34 (12 self)
The mutual information of two random variables i and j with joint probabilities t_ij is commonly used in learning Bayesian nets as well as in many other fields. The chances t_ij are usually estimated by the empirical sampling frequency n_ij/n, leading to a point estimate I(n_ij/n) for the mutual information. To answer questions like "is I(n_ij/n) consistent with zero?" or "what is the probability that the true mutual information is much larger than the point estimate?" one has to go beyond the point estimate. In the Bayesian framework one can answer these questions by utilizing a (second order) prior distribution p(t) comprising prior information about t. From the prior p(t) one can compute the posterior p(t|n), from which the distribution p(I|n) of the mutual information can be calculated. We derive reliable and quickly computable approximations for p(I|n). We concentrate on the mean, variance, skewness, and kurtosis, and non-informative priors. For the mean we also give an exact expression. Numerical issues and the range of validity are discussed.
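
For reference, the point estimate mentioned in the abstract is the standard plug-in formula for mutual information, obtained by substituting empirical frequencies for the unknown joint probabilities. The marginal-count symbols n_{i+} and n_{+j} below are notation assumed here for illustration; the abstract does not name them.

```latex
I\!\left(\frac{n_{ij}}{n}\right)
  = \sum_{i,j} \frac{n_{ij}}{n}\,
    \log\frac{n_{ij}\, n}{n_{i+}\, n_{+j}},
\qquad
n_{i+} = \sum_{j} n_{ij}, \quad
n_{+j} = \sum_{i} n_{ij}, \quad
n = \sum_{i,j} n_{ij}.
```

Terms with n_{ij} = 0 contribute zero by the usual convention 0 log 0 = 0.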