## Statistical Approaches to Predictive Modeling in Large Databases (1998)

Citations: 5 (0 self)

### BibTeX

@MISC{Cheng98statisticalapproaches,
    author = {Shan Cheng},
    title = {Statistical Approaches to Predictive Modeling in Large Databases},
    year = {1998}
}

### Abstract

Prediction, i.e., estimating the potential values or value distributions of certain attributes for objects in a database or data warehouse, is an attractive goal in data mining. Predicting, with high quality, future events that are not recorded in databases can help users make smart business decisions. With both scalability and prediction quality in mind, we propose a predictive modeling algorithm for interactive prediction in large databases and data warehouses. The algorithm consists of three steps: (1) data generalization, which converts data in relational databases or data warehouses into a multidimensional database to which efficient analysis techniques can be applied; (2) relevance analysis, which identifies the attributes that are highly relevant to the prediction, reducing the number of attributes used and thereby improving both the efficiency and the reliability of prediction; and (3) a statistical regression model, called a generalized linear model, is constructed ...
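The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the author's algorithm: the toy relation `ROWS`, the concept hierarchy `CITY_TO_REGION`, the age banding, and the `0.2` relevance threshold are all hypothetical, information gain is assumed as the relevance measure, and a binomial generalized linear model with a logit link (logistic regression) fitted by plain gradient ascent stands in for whatever model and fitting procedure the work actually uses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy relation: (city, age, bought). Not data from the paper.
ROWS = [
    ("Vancouver", 23, 1), ("Vancouver", 31, 1), ("Burnaby", 45, 0),
    ("Toronto", 52, 0), ("Toronto", 27, 1), ("Ottawa", 61, 0),
    ("Vancouver", 38, 1), ("Ottawa", 29, 0),
]
# Hypothetical concept hierarchy used for generalization.
CITY_TO_REGION = {"Vancouver": "West", "Burnaby": "West",
                  "Toronto": "East", "Ottawa": "East"}


def generalize(rows):
    """Step 1: roll tuples up to higher-level concepts and count each cell."""
    cells = Counter()
    for city, age, bought in rows:
        age_band = "young" if age < 40 else "old"
        cells[(CITY_TO_REGION[city], age_band, bought)] += 1
    return cells


def entropy(counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def info_gain(cells, attr_index):
    """Step 2 (assumed measure): information gain of one attribute
    with respect to the target, which is the last field of each cell key."""
    target = Counter()
    by_value = defaultdict(Counter)
    for key, n in cells.items():
        target[key[-1]] += n
        by_value[key[attr_index]][key[-1]] += n
    total = sum(target.values())
    conditional = sum(sum(c.values()) / total * entropy(c.values())
                      for c in by_value.values())
    return entropy(target.values()) - conditional


def fit_logistic(cells, attr_indices, steps=2000, lr=0.1):
    """Step 3 (stand-in): binomial GLM with logit link, fitted by
    gradient ascent on the count-weighted cells."""
    levels = [sorted({k[i] for k in cells}) for i in attr_indices]

    def encode(key):
        # Intercept plus a 0/1 indicator per non-baseline level.
        x = [1.0]
        for i, lv in zip(attr_indices, levels):
            x += [1.0 if key[i] == v else 0.0 for v in lv[1:]]
        return x

    data = [(encode(k), k[-1], n) for k, n in cells.items()]
    total = sum(n for _, _, n in data)
    w = [0.0] * len(data[0][0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y, n in data:
            p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            for j, xj in enumerate(x):
                grad[j] += n * (y - p) * xj
        w = [wi + lr * g / total for wi, g in zip(w, grad)]
    return w, encode


def predict(w, encode, key):
    """Predicted probability that the target equals 1 for a generalized key."""
    z = sum(wi * xi for wi, xi in zip(w, encode(key)))
    return 1 / (1 + math.exp(-z))


cells = generalize(ROWS)
# Keep only attributes whose gain exceeds a hypothetical threshold.
selected = [i for i in (0, 1) if info_gain(cells, i) > 0.2]
weights, encode = fit_logistic(cells, selected)
print(selected, predict(weights, encode, ("West", "young")))
```

Fitting on generalized cells rather than raw tuples is what keeps the regression step small: its cost depends on the number of distinct cells, not on the number of rows in the underlying database.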