Results 1 - 10
of
608
A tutorial on learning with Bayesian networks
- Learning in Graphical Models
, 1995
"... A companion set of lecture slides is available at ..."
Abstract
-
Cited by 710 (4 self)
- Add to MetaCart
A companion set of lecture slides is available at
Monte Carlo Statistical Methods
, 1998
"... This paper is also the originator of the Markov Chain Monte Carlo methods developed in the following chapters. The potential of these two simultaneous innovations has been discovered much latter by statisticians (Hastings 1970; Geman and Geman 1984) than by of physicists (see also Kirkpatrick et al. ..."
Abstract
-
Cited by 588 (21 self)
- Add to MetaCart
This paper is also the originator of the Markov Chain Monte Carlo methods developed in the following chapters. The potential of these two simultaneous innovations has been discovered much latter by statisticians (Hastings 1970; Geman and Geman 1984) than by of physicists (see also Kirkpatrick et al. 1983). 5.5.5 ] PROBLEMS 211
Statistical pattern recognition: A review
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
"... The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques ..."
Abstract
-
Cited by 487 (20 self)
- Add to MetaCart
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have bean receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants
- MACHINE LEARNING
, 1999
"... Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in co ..."
Abstract
-
Cited by 449 (2 self)
- Add to MetaCart
Methods for voting classification algorithms, such as Bagging and AdaBoost, have been shown to be very successful in improving the accuracy of certain classifiers for artificial and real-world datasets. We review these algorithms and describe a large empirical study comparing several variants in conjunction with a decision tree inducer (three variants) and a Naive-Bayes inducer.
The purpose of the study is to improve our understanding of why and
when these algorithms, which use perturbation, reweighting, and
combination techniques, affect classification error. We provide a
bias and variance decomposition of the error to show how different
methods and variants influence these two terms. This allowed us to
determine that Bagging reduced variance of unstable methods, while
boosting methods (AdaBoost and Arc-x4) reduced both the bias and
variance of unstable methods but increased the variance for Naive-Bayes,
which was very stable. We observed that Arc-x4 behaves differently
than AdaBoost if reweighting is used instead of resampling,
indicating a fundamental difference. Voting variants, some of which
are introduced in this paper, include: pruning versus no pruning,
use of probabilistic estimates, weight perturbations (Wagging), and
backfitting of data. We found that Bagging improves when
probabilistic estimates in conjunction with no-pruning are used, as
well as when the data was backfit. We measure tree sizes and show
an interesting positive correlation between the increase in the
average tree size in AdaBoost trials and its success in reducing the
error. We compare the mean-squared error of voting methods to
non-voting methods and show that the voting methods lead to large
and significant reductions in the mean-squared errors. Practical
problems that arise in implementing boosting algorithms are
explored, including numerical instabilities and underflows. We use
scatterplots that graphically show how AdaBoost reweights instances,
emphasizing not only "hard" areas but also outliers and noise.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have bee ..."
Abstract
-
Cited by 393 (4 self)
- Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T 3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
Operations for Learning with Graphical Models
- Journal of Artificial Intelligence Research
, 1994
"... This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models ..."
Abstract
-
Cited by 214 (13 self)
- Add to MetaCart
This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper conclu...
Unsupervised learning of finite mixture models
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
"... AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization ..."
Abstract
-
Cited by 201 (16 self)
- Add to MetaCart
AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach. Index TermsÐFinite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectation-maximization algorithm, clustering. 1
A Guide to the Literature on Learning Probabilistic Networks From Data
, 1996
"... This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the ..."
Abstract
-
Cited by 156 (0 self)
- Add to MetaCart
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples. Keywords--- Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery. I. Introduction Probabilistic networks or probabilistic gra...
Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables
- Machine Learning
, 1997
"... We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MD ..."
Abstract
-
Cited by 155 (9 self)
- Add to MetaCart
We discuss Bayesian methods for learning Bayesian networks when data sets are incomplete. In particular, we examine asymptotic approximations for the marginal likelihood of incomplete data given a Bayesian network. We consider the Laplace approximation and the less accurate but more efficient BIC/MDL approximation. We also consider approximations proposed by Draper (1993) and Cheeseman and Stutz (1995). These approximations are as efficient as BIC/MDL, but their accuracy has not been studied in any depth. We compare the accuracy of these approximations under the assumption that the Laplace approximation is the most accurate. In experiments using synthetic data generated from discrete naive-Bayes models having a hidden root node, we find that (1) the BIC/MDL measure is the least accurate, having a bias in favor of simple models, and (2) the Draper and CS measures are the most accurate. 1
Predictive regressions
- Journal of Financial Economics
, 1999
"... When a rate of return is regressed on a lagged stochastic regressor, such as a dividend yield, the regression disturbance is correlated with the regressor's innovation. The OLS estimator's "nite-sample properties, derived here, can depart substantially from the standard regression setting. Bayesian ..."
Abstract
-
Cited by 134 (4 self)
- Add to MetaCart
When a rate of return is regressed on a lagged stochastic regressor, such as a dividend yield, the regression disturbance is correlated with the regressor's innovation. The OLS estimator's "nite-sample properties, derived here, can depart substantially from the standard regression setting. Bayesian posterior distributions for the regression parameters are obtained under speci"cations that di!er with respect to (i) prior beliefs about the autocorrelation of the regressor and (ii) whether the initial observation of the regressor is speci"ed as "xed or stochastic. The posteriors di!er across such speci"cations, and asset allocations in the presence of estimation risk exhibit sensitivity to those

