## Predictive Data Mining with Finite Mixtures (1996)


Venue: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining

Citations: 10 (5 self-citations)

### BibTeX

@INPROCEEDINGS{Kontkanen96predictivedata,

author = {Petri Kontkanen and Petri Myllymäki and Henry Tirri},

title = {Predictive Data Mining with Finite Mixtures},

booktitle = {Proceedings of the Second International Conference on Knowledge Discovery and Data Mining},

year = {1996},

pages = {176--182}

}

### Abstract

In data mining the goal is to develop methods for discovering previously unknown regularities from databases. The resulting models are interpreted and evaluated by domain experts, but some model evaluation criterion is also needed for the model construction process. The optimal choice would be to use the same criterion as the human experts, but this is usually impossible, as the experts are not capable of expressing their evaluation criteria formally. On the other hand, it seems reasonable to assume that any model possessing the capability of making good predictions also captures some structure of reality. For this reason, in predictive data mining the search for good models is guided by the expected predictive error of the models. In this paper we describe the Bayesian approach to predictive data mining in the finite mixture modeling framework. The finite mixture model family is a natural choice for domains where the data exhibits a clustering structure. In many real-world domains this seems to be the case, as is demonstrated by our experimental results on a set of public domain databases.

Data mining aims at extracting useful information from databases by discovering previously unknown regularities from data (Fayyad et al. 1996). In the most general context, finding such interesting regularities is a process (often called knowledge discovery in databases) which includes the interpretation of the extracted patterns based on the available domain knowledge. Typically the pattern extraction phase is performed by a structure-searching program, and the interpretation phase by a human expert. The various proposed approaches differ in the representation language for the structure to be discovered (association rules (Agrawal et al. 1996), Bayesian networks (Spirtes, Glymour, & Scheines 1993), functional dependencies (Mannila & Raiha 1991), prototypes (Hu & Cercone 1995), etc.), and in the search methodology used for discovering such structures.
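The abstract names finite mixtures as the model family but does not spell out a parameterization. As a minimal sketch, assume the common discrete form: a latent cluster variable $Y$ with $K$ values, observed variables that are independent given $Y$, and therefore $P(x) = \sum_k P(Y=k)\prod_i P(X_i = x_i \mid Y=k)$. The function name `mixture_prob`, the nested-list parameter layout, and the example numbers are all hypothetical.

```python
from typing import Sequence

def mixture_prob(x: Sequence[int], weights: Sequence[float],
                 theta: Sequence[Sequence[Sequence[float]]]) -> float:
    """Joint probability of one discrete data vector under a finite mixture.

    weights[k]     -- mixing probability P(Y = k)
    theta[k][i][v] -- P(X_i = v | Y = k)
    """
    total = 0.0
    for k, w in enumerate(weights):
        p = w  # start from the cluster prior
        for i, v in enumerate(x):
            p *= theta[k][i][v]  # variables are independent given Y = k
        total += p
    return total

# Hypothetical example: K = 2 clusters, two binary variables.
weights = [0.6, 0.4]
theta = [
    [[0.9, 0.1], [0.8, 0.2]],  # cluster 0
    [[0.2, 0.8], [0.3, 0.7]],  # cluster 1
]
print(mixture_prob([0, 0], weights, theta))  # 0.6*0.9*0.8 + 0.4*0.2*0.3
```

Summing `mixture_prob` over all four configurations of the two binary variables gives 1, which is a quick sanity check on the parameter layout.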

### Citations

8089 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

1240 | Bayesian Data Analysis - Gelman, Carlin, et al. - 1995 |

Citation Context: ... or distributions, which can then be coded as mixing distributions in our finite mixture framework. In order to find probabilistic models for making good predictions, we follow the Bayesian approach (Gelman et al. 1995; Cheeseman 1995), as it offers a solid theoretical framework for combining both (suitably coded) a priori domain information and information from the sample database in the model construction process ...

1039 | Bayesian Theory - Bernardo, Smith - 1994 |

Citation Context: ... model $\Theta$ that models the cluster structure of the database, predictive inference can be performed in a computationally efficient manner. The Bayesian approach to predictive inference (see e.g., (Bernardo & Smith 1994)) aims at predicting unobserved future quantities by means of already observed quantities. More precisely, let $I = \{i_1, \ldots, i_t\}$ be the indices of the instantiated variables, and let $X = \{X_i$ ...
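The Bernardo & Smith context above describes predicting unobserved variables from the instantiated ones indexed by $I$. Under the same hypothetical discrete-mixture layout (`weights[k]`, `theta[k][i][v]`), a minimal sketch computes $P(X_j = v \mid x_I) \propto \sum_k P(Y=k)\,P(X_j = v \mid Y=k)\prod_{i \in I} P(X_i = x_i \mid Y=k)$. It assumes a single plug-in parameter estimate rather than full averaging over the parameter posterior; the function name `predictive` is illustrative only.

```python
from typing import Dict, List, Sequence

def predictive(target: int, evidence: Dict[int, int],
               weights: Sequence[float],
               theta: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """P(X_target = v | instantiated variables) for every value v.

    evidence maps each index i in I to its observed value x_i.
    Uses one plug-in parameter estimate (an assumption of this sketch).
    """
    # Unnormalized posterior weight of each cluster given the evidence.
    post = []
    for k, w in enumerate(weights):
        p = w
        for i, v in evidence.items():
            p *= theta[k][i][v]
        post.append(p)
    z = sum(post)
    # Mix the per-cluster conditionals of the target by those weights.
    return [sum(post[k] * theta[k][target][v] for k in range(len(weights))) / z
            for v in range(len(theta[0][target]))]

weights = [0.6, 0.4]
theta = [
    [[0.9, 0.1], [0.8, 0.2]],  # cluster 0
    [[0.2, 0.8], [0.3, 0.7]],  # cluster 1
]
# Predict X_0 after observing X_1 = 0.
print(predictive(0, {1: 0}, weights, theta))
```

The work is $O(K(t + n_j))$ for $t$ instantiated variables and a target with $n_j$ values, independent of the sample size, which fits the later context noting that prediction is cheap when $K$ is small compared to $N$.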

903 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995 |

721 | Cross-Validatory Choices and Assessment of Statistical Prediction (with Discussion) - Stone - 1974 |

Citation Context: ... task, where the errors in incomplete pattern completion can be used as a model measure for the goodness of the model. In this work we adopt the empirical approach and use the cross-validation method (Stone 1974; Geisser 1975) for model selection on a set of public domain databases. In the work presented below we have adopted the basic concepts from the general framework of exploring computational models of ...
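The context above states that model selection is done by cross-validation. A minimal k-fold sketch scores each candidate by its mean held-out log-probability and keeps the best. Two assumptions: log-probability is used as the predictive-error measure (the context does not name the loss), and the candidate models are a hypothetical stand-in, a categorical distribution with pseudo-count `alpha`, chosen only to keep the sketch self-contained. A fitter for mixtures with a given number of clusters could be passed as `fit` instead.

```python
import math
from typing import Callable, List, Sequence

def k_fold_score(data: Sequence[int], k: int,
                 fit: Callable[[List[int]], List[float]],
                 log_prob: Callable[[List[float], int], float]) -> float:
    """Mean held-out log-probability per data point over k folds (higher is better)."""
    n = len(data)
    total = 0.0
    for f in range(k):
        # Point j is held out exactly once, in fold j % k.
        train = [data[j] for j in range(n) if j % k != f]
        test = [data[j] for j in range(n) if j % k == f]
        model = fit(train)
        total += sum(log_prob(model, x) for x in test)
    return total / n

def make_fit(alpha: float, n_values: int) -> Callable[[List[int]], List[float]]:
    """Hypothetical stand-in model: categorical with symmetric pseudo-count alpha > 0."""
    def fit(train: List[int]) -> List[float]:
        counts = [alpha] * n_values
        for x in train:
            counts[x] += 1
        z = sum(counts)
        return [c / z for c in counts]
    return fit

def cat_log_prob(model: List[float], x: int) -> float:
    return math.log(model[x])

data = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
candidates = [0.1, 1.0, 10.0]
best = max(candidates,
           key=lambda a: k_fold_score(data, 5, make_fit(a, 3), cat_log_prob))
print("selected pseudo-count:", best)
```

Setting `k = len(data)` turns the same function into leave-one-out cross-validation.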

693 | Optimal Statistical Decisions - DeGroot - 1970 |

Citation Context: ... of parameters by maximizing the posterior density $P(\Theta \mid D)$. We assume that the prior distributions of the parameters are from the family of Dirichlet densities, since it is conjugate (see e.g., (DeGroot 1970)) to the family of multinomials, i.e., the functional form of the parameter distribution remains invariant in the prior-to-posterior transformation. Finding the exact MAP estimate of $\Theta$ is, however, ...
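The conjugacy mentioned in the DeGroot context can be made concrete for a single multinomial: with a Dirichlet$(\alpha_1, \ldots, \alpha_m)$ prior and counts $n_1, \ldots, n_m$, the posterior is Dirichlet$(\alpha_v + n_v)$, and its mode (the MAP estimate) is $\hat\theta_v = (n_v + \alpha_v - 1)/(N + \sum_v \alpha_v - m)$. The sketch below covers only this one-multinomial case, not the full mixture $\Theta$; the function names and example counts are hypothetical.

```python
from typing import List, Sequence

def dirichlet_posterior(counts: Sequence[float], alpha: Sequence[float]) -> List[float]:
    """Conjugacy: Dirichlet(alpha) prior plus multinomial counts gives Dirichlet(alpha + counts)."""
    return [a + n for a, n in zip(alpha, counts)]

def dirichlet_map(counts: Sequence[float], alpha: Sequence[float]) -> List[float]:
    """Mode of the Dirichlet posterior, i.e. the MAP multinomial parameters."""
    post = dirichlet_posterior(counts, alpha)
    if any(p <= 1 for p in post):
        # Keep the sketch simple: require a unique interior mode.
        raise ValueError("need alpha_v + n_v > 1 for every value v")
    denom = sum(post) - len(post)
    return [(p - 1) / denom for p in post]

# Counts of 3, 1 and 0 for a three-valued variable, Dirichlet(2, 2, 2) prior.
print(dirichlet_map([3, 1, 0], [2.0, 2.0, 2.0]))  # [4/7, 2/7, 1/7]
```

The prior acts as $\alpha_v - 1$ extra pseudo-counts, so a value never seen in the data (here the third one) still receives positive probability.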

624 | Statistical Analysis of Finite Mixture Distributions - Titterington, Smith, et al. - 1985 |

480 | Bayesian classification (AutoClass): Theory and results - Cheeseman, Stutz - 1995 |

Citation Context: ...rithms and model evaluation criteria. Our approach is akin to the AutoClass system (Cheeseman et al. 1988), which has been successfully used for data mining problems, such as LandSat data clustering (Cheeseman & Stutz 1996). In the case of finite mixtures, the model search problem can be seen as searching for the missing values of the unobserved latent clustering variable in the dataset. The model construction process ...
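The AutoClass context above frames model search for finite mixtures as searching for the missing values of the latent clustering variable. The EM algorithm (Dempster, Laird, et al. 1977, listed above) is the standard tool for that view, and adding Dirichlet pseudo-counts in its M-step gives a local MAP estimate. This is a minimal sketch of that generic technique, not necessarily the procedure the paper uses; `em_mixture`, the default `prior = 2.0`, the random initialization, and the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[List[List[float]]]]

def em_mixture(data: Sequence[Sequence[int]], n_values: Sequence[int], K: int,
               iters: int = 50, prior: float = 2.0, seed: int = 0) -> Params:
    """MAP-EM for a finite mixture of independent discrete variables.

    The cluster label of each row is the missing value; `prior` is a
    symmetric Dirichlet hyperparameter (> 1, a hypothetical default) used
    for the mixing weights and for every cluster-conditional distribution.
    """
    rng = random.Random(seed)
    # Random positive initial conditionals theta[k][i][v], uniform weights.
    theta = []
    for _ in range(K):
        cluster = []
        for nv in n_values:
            raw = [rng.random() + 0.5 for _ in range(nv)]
            s = sum(raw)
            cluster.append([r / s for r in raw])
        theta.append(cluster)
    weights = [1.0 / K] * K

    for _ in range(iters):
        # E-step: resp[n][k] = P(Y = k | x_n, current parameters).
        resp = []
        for x in data:
            p = []
            for k in range(K):
                q = weights[k]
                for i, v in enumerate(x):
                    q *= theta[k][i][v]
                p.append(q)
            z = sum(p)
            resp.append([q / z for q in p])
        # M-step: expected counts plus (prior - 1) pseudo-counts, normalized.
        nk = [sum(r[k] for r in resp) for k in range(K)]
        denom = len(data) + K * (prior - 1)
        weights = [(nk[k] + prior - 1) / denom for k in range(K)]
        for k in range(K):
            for i, nv in enumerate(n_values):
                counts = [prior - 1.0] * nv
                for r, x in zip(resp, data):
                    counts[x[i]] += r[k]
                s = sum(counts)
                theta[k][i] = [c / s for c in counts]
    return weights, theta

data = [[0, 0], [0, 0], [1, 1], [1, 1], [0, 0], [1, 1]]
weights, theta = em_mixture(data, n_values=[2, 2], K=2)
print(weights)
```

The products in the E-step underflow once there are many variables; a practical version would accumulate log probabilities instead.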

472 | Fast discovery of association rules - Agrawal, Mannila, et al. - 1996 |

Citation Context: ...ructure searching program, and the interpretation phase by a human expert. The various proposed approaches differ in the representation language for the structure to be discovered (association rules (Agrawal et al. 1996), Bayesian networks (Spirtes, Glymour, & Scheines 1993), functional dependencies (Mannila & Raiha 1991), prototypes (Hu & Cercone 1995) etc.), and in the search methodology used for discovering such ...

296 | Stochastic Complexity in Statistical Inquiry - Rissanen - 1989 |

Citation Context: ...an be predicted using the regularities in the existing configuration database. For estimating the expected predictive performance, there exist theoretical measures (see e.g., (Wallace & Freeman 1987; Rissanen 1989; Raftery 1993)) which offer a solid evaluation criterion for the models, but such measures tend to be hard to compute for high-dimensional spaces. In the case of large databases several approximations ...

240 | AutoClass: A Bayesian classification system - Cheeseman, Kelly, et al. - 1988 |

Citation Context: ...tween the search component and the model measure, and therefore allows modular combinations of different search algorithms and model evaluation criteria. Our approach is akin to the AutoClass system (Cheeseman et al. 1988), which has been successfully used for data mining problems, such as LandSat data clustering (Cheeseman & Stutz 1996). In the case of finite mixtures, the model search problem can be seen as searchin...

201 | Finite Mixture Distributions - Everitt, Hand - 1981 |

Citation Context: ...redefined set, which we call the model space. Examples of such model spaces are the set of all possible association rules with a fixed set of attributes, or a set of all finite mixture distributions (Everitt & Hand 1981; Titterington, Smith, & Makov 1985). A choice of a model space necessarily introduces prior knowledge to the search process. We would like the model space to be simple enough to allow tractable searc...

102 | The predictive sample reuse method with applications - Geisser - 1975 |

Citation Context: ... the errors in incomplete pattern completion can be used as a model measure for the goodness of the model. In this work we adopt the empirical approach and use the cross-validation method (Stone 1974; Geisser 1975) for model selection on a set of public domain databases. In the work presented below we have adopted the basic concepts from the general framework of exploring computational models of scientific dis...

100 | Statistical factor analysis and related methods: Theory and application - Basilevsky - 1994 |

Citation Context: ...s exploratory in nature, i.e., search for any kind of structure in the database in order to understand the domain better. Akin to the practice of multivariate exploratory analysis in social sciences (Basilevsky 1994), much of the work in the data mining area relies on a task-specific expert assessment of the model goodness. We depart from this tradition, and assume that the discovery process is performed with th...

98 | Approximate Bayes Factors and Accounting for Model Uncertainty in Generalized Linear Regression Models, Biometrika - Raftery - 1996 |

Citation Context: ... using the regularities in the existing configuration database. For estimating the expected predictive performance, there exist theoretical measures (see e.g., (Wallace & Freeman 1987; Rissanen 1989; Raftery 1993)) which offer a solid evaluation criterion for the models, but such measures tend to be hard to compute for high-dimensional spaces. In the case of large databases several approximations to these crit...

95 | The Design of Relational Databases - Mannila, Räihä - 1987 |

Citation Context: ...ches differ in the representation language for the structure to be discovered (association rules (Agrawal et al. 1996), Bayesian networks (Spirtes, Glymour, & Scheines 1993), functional dependencies (Mannila & Raiha 1991), prototypes (Hu & Cercone 1995) etc.), and in the search methodology used for discovering such structures. A large body of the data mining research is exploratory in nature, i.e., search for any kin...

20 | Massively parallel case-based reasoning with probabilistic similarity metrics - Myllymäki, Tirri - 1994 |

Citation Context: ... of instantiated variables and $n_i$ the number of values of $X_i$. Observe that $K$ is usually small compared to the sample size $N$, and thus the prediction computation can be performed very efficiently (Myllymaki & Tirri 1994). The predictive distributions can be used for classification and regression tasks. In classification problems, we have a special class variable $X_c$ which is used for classifying data. In more genera...

16 | Probabilistic instance-based learning - Tirri, Kontkanen, et al. - 1996 |

10 | Comparing Bayesian model class selection criteria by discrete finite mixtures - Kontkanen, Myllymaki, et al. - 1996 |

6 | Overview of model selection - Cheeseman - 1993 |

Citation Context: ...hich can then be coded as mixing distributions in our finite mixture framework. In order to find probabilistic models for making good predictions, we follow the Bayesian approach (Gelman et al. 1995; Cheeseman 1995), as it offers a solid theoretical framework for combining both (suitably coded) a priori domain information and information from the sample database in the model construction process. Bayesian appro...

4 | Rough sets similarity-based learning from databases - Hu, Cercone - 1995 |

Citation Context: ...anguage for the structure to be discovered (association rules (Agrawal et al. 1996), Bayesian networks (Spirtes, Glymour, & Scheines 1993), functional dependencies (Mannila & Raiha 1991), prototypes (Hu & Cercone 1995) etc.), and in the search methodology used for discovering such structures. A large body of the data mining research is exploratory in nature, i.e., search for any kind of structure in the database i...
