Predictive Data Mining with Finite Mixtures (1996)

Venue: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining

Citations: 10 - 5 self

### BibTeX

@INPROCEEDINGS{Kontkanen96predictivedata,

author = {Petri Kontkanen and Petri Myllym{\"a}ki and Henry Tirri},

title = {Predictive Data Mining with Finite Mixtures},

booktitle = {Proceedings of the Second International Conference on Knowledge Discovery and Data Mining},

year = {1996},

pages = {176--182}

}

### Abstract

In data mining the goal is to develop methods for discovering previously unknown regularities from databases. The resulting models are interpreted and evaluated by domain experts, but some model evaluation criterion is also needed for the model construction process. The optimal choice would be to use the same criterion as the human experts, but this is usually impossible as the experts are not capable of expressing their evaluation criteria formally. On the other hand, it seems reasonable to assume that any model possessing the capability of making good predictions also captures some structure of the reality. For this reason, in predictive data mining the search for good models is guided by the expected predictive error of the models. In this paper we describe the Bayesian approach to predictive data mining in the finite mixture modeling framework. The finite mixture model family is a natural choice for domains where the data exhibits a clustering structure. In many real world domains this seems to be the case, as is demonstrated by our experimental results on a set of public domain databases.

Data mining aims at extracting useful information from databases by discovering previously unknown regularities from data (Fayyad et al. 1996). In the most general context, finding such interesting regularities is a process (often called knowledge discovery in databases) which includes the interpretation of the extracted patterns based on the domain knowledge available. Typically the pattern extraction phase is performed by a structure searching program, and the interpretation phase by a human expert. The various proposed ap-

