## A survey of Bayesian Data Mining - Part I: Discrete and semi-discrete Data Matrices (1999)

Venue: SICS TR T99:08, ISSN 1100-3154, ISRN:SICS-T99/08-SE

Citations: 2 (0 self)

### BibTeX

@INPROCEEDINGS{Arnborg99asurvey,

author = {Stefan Arnborg},

title = {A survey of Bayesian Data Mining - Part I: Discrete and semi-discrete Data Matrices},

booktitle = {SICS TR T99:08, ISSN 1100-3154, ISRN:SICS-T99/08-SE},

year = {1999},

publisher = {}

}

### Abstract

This tutorial summarises the use of Bayesian analysis and Bayes factors for finding significant properties of discrete (categorical and ordinal) data. It overviews methods for finding dependencies and graphical models, latent variables, robust decision trees and association rules.

### Citations

2693 | Estimating the dimension of a model
- Schwarz
- 1978
Citation Context ... implicit in the Bayes factor approach is a factor $n^{(p_1 - p_2)/2}$, where $n$ is the number of data points (cases) and $p_i$ is the number of parameters in model $M_i$. This estimate was first found by Schwarz[31], and is known, when used to penalize more detailed models in a likelihood based model comparison, as the Bayesian information criterion (BIC). So deciding between the models using the likelihood rati...

1376 | Statistical Decision Theory and Bayesian Analysis
- Berger
- 1993
Citation Context ... used to make decisions. Clearly, it is not adequate to select a model from its posterior probability without considering the consequences of decisions. In Bayesian decision theory (see, e.g., Berger[1]), we introduce actions and expected utility of actions given a 'state of the world', which could be a model or a model with its parameter. However, in Bayesian decision theory, the rational decision ...

1155 | Graphical Models
- Lauritzen
- 1996
Citation Context ... current survey of MCMC methods, which can solve some complex evaluations required in Bayesian modeling, can be found in the book[17]. Books explaining theory and use of graphical models are Lauritzen[22], Cox and Wermuth[10], and Whittaker[35]. A tutorial on Bayesian network approaches to data mining is found in (Heckerman[18]). This present report describes data mining in a relational data structur...

1150 | Bayesian Theory
- Bernardo, Smith
- 1994
Citation Context ... difficult integrals have to be evaluated, and presently only Markov Chain Monte Carlo (MCMC) methods are available. There are several recent books describing the Bayesian method from both a theoretical[3], an ideological[19, 32] and an application oriented[7] perspective. A main historic influence leading to increased interest in Bayesian methods is Harold Jeffreys, who wrote particularly two books on s...

590 | Probabilistic inference using Markov chain Monte Carlo methods
- Neal
- 1993
Citation Context ...istribution but only finding the quotient of the distribution at two arbitrary given points, and it can be chosen from a set with better convergence properties. A thorough introduction is given by Neal[25]. To sum it up, MCMC methods can be used to estimate distributions that are not tractable analytically or numerically. We get real estimates of posterior distributions and not just approximate maxima ...

572 | Probability Theory: The Logic of Science
- Jaynes
- 2007
Citation Context ...have to be evaluated, and presently only Markov Chain Monte Carlo (MCMC) methods are available. There are several recent books describing the Bayesian method from both a theoretical[3], an ideological[19, 32] and an application oriented[7] perspective. A main historic influence leading to increased interest in Bayesian methods is Harold Jeffreys, who wrote particularly two books on scientific inference and p...

476 | On Bayesian analysis of mixtures with an unknown number of components
- Richardson, Green
- 1997
Citation Context ...sses probabilistically, with the probability of each case membership determined by the likelihood vector for it in the current class parameters[8]. We can also solve the problem with the MCMC approach[28]. The MCMC approach to classification is the following: Assume that we have a data matrix and want a classification of its cases which makes the attributes independent. Define a class assignment randomly...

445 | Bayesian density estimation and inference using mixtures
- Escobar, West
- 1995
Citation Context ...s adaptations of the method, it is possible to express multivariate data as a mixture of multivariate distributions and to find the posterior distribution of the number of classes and their parameters [14, 15, 16, 28]. The missing data problem can also be solved in the sense that parameters and dependence structures can be estimated with missing data without the simple expedient of wasting incomplete cases. The st...

338 | Bayes and Empirical Bayes Methods for Data Analysis
- Carlin, Louis
- 1996
Citation Context ...only Markov Chain Monte Carlo (MCMC) methods are available. There are several recent books describing the Bayesian method from both a theoretical[3], an ideological[19, 32] and an application oriented[7] perspective. A main historic influence leading to increased interest in Bayesian methods is Harold Jeffreys, who wrote particularly two books on scientific inference and probability theory from a Bayesi...

306 | Algorithmic aspects of vertex elimination on graphs
- Rose, Tarjan, et al.
- 1976
Citation Context ... relationships between variables (typically variables in systems of equations or inequalities). They can be characterized in many different but equivalent ways, see (Rose [29], Rose, Lueker and Tarjan[30]). One simple way is to consider a decomposable graph as consisting of the union of a number of maximal complete graphs (cliques, or maximally connected subgraphs), in such a way that (i) there is at...

292 | Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam’s Window
- Madigan, Raftery
- 1994
Citation Context ... connection between them - either directly causal, mediated through another variable, or introduced through sampling bias? These questions are analyzed using graphical models, directed or decomposable[24]. As an example, in figure 1 $M_1$ indicates a model where A and B are dependent, whereas they are independent in model $M_2$. In figure 2, we describe a directed graphical model $M_4''$ indicating that var...

215 | Data Analysis: A Bayesian Tutorial
- Sivia, Skilling
Citation Context ...have to be evaluated, and presently only Markov Chain Monte Carlo (MCMC) methods are available. There are several recent books describing the Bayesian method from both a theoretical[3], an ideological[19, 32] and an application oriented[7] perspective. A main historic influence leading to increased interest in Bayesian methods is Harold Jeffreys, who wrote particularly two books on scientific inference and p...

173 | Bayesian model choice via Markov chain Monte Carlo methods
- Carlin, Chib
- 1995
Citation Context ... applications arising when the dimension of the parameter space changes. Although it can sometimes be avoided it is not always so. The reversible jump process was designed to cope with this phenomenon[6]. 10 Association rules Association rules are special sets of rules used to predict data in data mining. The literature on association rules emphasizes rapid extraction, since typically a data matrix h...

133 | Triangulated Graphs and the Elimination Process
- Rose
- 1970
Citation Context ...ny applications of describing relationships between variables (typically variables in systems of equations or inequalities). They can be characterized in many different but equivalent ways, see (Rose [29], Rose, Lueker and Tarjan[30]). One simple way is to consider a decomposable graph as consisting of the union of a number of maximal complete graphs (cliques, or maximally connected subgraphs), in su...

123 | Hyper Markov laws in the statistical analysis of decomposable graphical models
- Dawid, Lauritzen
- 1993
Citation Context ...sible outcomes) of the distribution. We will find that this analysis is the key step in determining a full graphical model for the data matrix. Our analysis is analogous to those of Dawid and Lauritzen[12] and Madigan and Raftery[24], but their analyses are in many ways more general and use a likelihood approach with penalization of detailed models using the BIC criterion and other similar techniques. ...

81 | Bayesian networks for data mining
- Heckerman
- 1997
Citation Context ...ook[17]. Books explaining theory and use of graphical models are Lauritzen[22], Cox and Wermuth[10], and Whittaker[35]. A tutorial on Bayesian network approaches to data mining is found in (Heckerman[18]). This present report describes data mining in a relational data structure with discrete data (discrete data matrix) and the simplest generalizations to numerical data. A second part will describe ge...

56 | Further experimental evidence against the utility of Occam's razor
- Webb
- 1996
Citation Context ...ned out not to be the case, and the argument that a smallest decision tree should be preferred because of some kind of Occam's razor argument is apparently not valid, neither in theory nor in practice[34, 2]. The Bayesian approach gives the right information on the credibility and generalizing power of a decision tree. It is explained in recent papers by (Chipman, George and McCulloch[9]) and by (Paass a...

55 | Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models
- Dellaportas, Forster
- 1999
Citation Context ...e structures can be estimated with missing data without the simple expedient of wasting incomplete cases. The structure of a graphical model can be obtained as a sample from the posterior distribution[4, 13]. 11.1 Example: Univariate Gaussian Mixture modeling Consider the problem of deciding, for a set of real numbers, the most plausible decompositions of the distribution as a weighted sum (mixture) of a...

52 | Probability, frequency and reasonable expectation
- Cox
- 1946
Citation Context ...t these efforts were more or less ignored when the discipline of statistics was created in the early 20th century. The first derivation of the necessity of Bayesian methods was done by R. T. Cox in 1946[11], and has been repackaged by Jaynes with a lot of motivating discussion. Basically, the analysis investigates which family of rules for reasoning with the plausibility of statements about the world is...

29 | Theory of Probability
- Jeffreys
- 1939
Citation Context ...A main historic influence leading to increased interest in Bayesian methods is Harold Jeffreys, who wrote particularly two books on scientific inference and probability theory from a Bayesian perspective[21, 20]. A current survey of MCMC methods, which can solve some complex evaluations required in Bayesian modeling, can be found in the book[17]. Books explaining theory and use of graphical models are Laurit...

23 | On the foundations of statistical inference (with discussion)
- Birnbaum
- 1962
Citation Context ...f the observed data can influence our belief in a hypothesis. This principle, the Likelihood Principle, was proposed by Fisher and Barnard, but it was first given a detailed analysis by Birnbaum in 1962[5]. In the subsequent debate, frequentists have proposed that the Likelihood Principle is not applicable in this case and that the experimental design could in practice be relevant information. A Bayesi...

23 | Parameter estimation in Bayesian networks from incomplete databases. Intelligent Data Analysis
- Ramoni, Sebastiani
- 1998
Citation Context ...ternalized, and the analysis can proceed as usual, with the important difference that the missing values are not available for analysis. A more sceptical approach was developed by Ramoni and Sebastiani[27], who consider an option to regard the missing values as adversaries (the conclusions on dependence would then be true no matter what the missing values are). The other possibility is that missingness...

12 | A Bayesian predictive approach to determining the number of components in a mixture distribution
- Dey, Kuo, et al.
- 1995
Citation Context ...s adaptations of the method, it is possible to express multivariate data as a mixture of multivariate distributions and to find the posterior distribution of the number of classes and their parameters [14, 15, 16, 28]. The missing data problem can also be solved in the sense that parameters and dependence structures can be estimated with missing data without the simple expedient of wasting incomplete cases. The st...

11 | Bayesian classification (AutoClass): Theory and results
- Cheeseman, Stutz
- 1996
Citation Context ... distribution. The problem of identifying classes is known as unsupervised classification. One comprehensive system for classification based on Bayesian methodology is described by Cheeseman and Stutz[8]. A third question - often the one of highest practical concern - is whether some designated variable can be reliably predicted in the sense that it is well related to combinations of values of other ...

11 | Estimation of finite mixture distributions through Bayesian sampling
- Diebolt, Robert
- 1994
Citation Context ...s adaptations of the method, it is possible to express multivariate data as a mixture of multivariate distributions and to find the posterior distribution of the number of classes and their parameters [14, 15, 16, 28]. The missing data problem can also be solved in the sense that parameters and dependence structures can be estimated with missing data without the simple expedient of wasting incomplete cases. The st...

5 | Dynamic graphical models and Markov chain Monte Carlo methods
- Berzuini, Best, et al.
- 1994
Citation Context ...e structures can be estimated with missing data without the simple expedient of wasting incomplete cases. The structure of a graphical model can be obtained as a sample from the posterior distribution[4, 13]. 11.1 Example: Univariate Gaussian Mixture modeling Consider the problem of deciding, for a set of real numbers, the most plausible decompositions of the distribution as a weighted sum (mixture) of a...

3 | What should be optimized in a decision tree
- Berkman, Sandholm
- 1995
Citation Context ...ned out not to be the case, and the argument that a smallest decision tree should be preferred because of some kind of Occam's razor argument is apparently not valid, neither in theory nor in practice[34, 2]. The Bayesian approach gives the right information on the credibility and generalizing power of a decision tree. It is explained in recent papers by (Chipman, George and McCulloch[9]) and by (Paass a...

3 | Multivariate Dependencies
- Cox, Wermuth
- 1996
Citation Context ...MC methods, which can solve some complex evaluations required in Bayesian modeling, can be found in the book[17]. Books explaining theory and use of graphical models are Lauritzen[22], Cox and Wermuth[10], and Whittaker[35]. A tutorial on Bayesian network approaches to data mining is found in (Heckerman[18]). This present report describes data mining in a relational data structure with discrete data ...

3 | Model selection and accounting for model uncertainty in graphical models using Occam's window
- Madigan, Raftery
- 1993
Citation Context ...ndent on prior information, and comparing their posterior probabilities with respect to the data matrix. A set of highest posterior probability models usually gives many clues to the data dependencies[23, 24], although one must - as always in statistics - constantly remember that dependencies are not necessarily causalities. [Figure 2: Graphical models M3, M3', M4, M4', M4''] A se...

3 | Graphical Models in Multivariate Statistics
- Whittaker
- 2001
Citation Context ...an solve some complex evaluations required in Bayesian modeling, can be found in the book[17]. Books explaining theory and use of graphical models are Lauritzen[22], Cox and Wermuth[10], and Whittaker[35]. A tutorial on Bayesian network approaches to data mining is found in (Heckerman[18]). This present report describes data mining in a relational data structure with discrete data (discrete data matr...

2 | Scientific Inference
- Jeffreys
- 1931

1 | Bayesian classification trees with overlapping leaves applied to credit-scoring
- Paass, Kindermann
- 1998
Citation Context ...n approach gives the right information on the credibility and generalizing power of a decision tree. It is explained in recent papers by (Chipman, George and McCulloch[9]) and by (Paass and Kindermann[26]). A decision tree statistical model is one where a number of boxes are defined on one set of variables by recursive splitting of one box into two by splitting the range of one designated variable into...