## A general framework for mining concept-drifting data streams with skewed distributions (2007)

### Download Links

- [www.siam.org]
- [siam.org]
- [www.cs.uiuc.edu]
- [www.cs.columbia.edu]
- [www.weifan.info]
- [www1.cs.columbia.edu]
- [web.engr.illinois.edu]
- DBLP

### Other Repositories/Bibliography

Venue: In Proc. SDM’07

Citations: 20 (3 self)

### BibTeX

@INPROCEEDINGS{Gao07ageneral,
  author = {Jing Gao and Wei Fan and Jiawei Han and Philip S. Yu},
  title = {A general framework for mining concept-drifting data streams with skewed distributions},
  booktitle = {Proc. SDM'07},
  year = {2007}
}

### Abstract

In recent years, there have been several interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams and cannot handle highly skewed (e.g., few positives but many negatives) and stochastic distributions, which are typical in many data stream applications. In this paper, we propose a new approach to mining data streams that estimates reliable posterior probabilities using an ensemble of models trained to match the distribution over under-samples of negatives and repeated samples of positives. We formally show several interesting and important properties of the proposed framework, e.g., reliability of estimated probabilities on the skewed positive class, accuracy of estimated probabilities, efficiency, and scalability. Experiments are performed on several synthetic as well as real-world datasets with skewed distributions, and they demonstrate that our framework has substantial advantages over existing approaches in estimation reliability and prediction accuracy.
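The sampling-and-ensemble idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions — a single chunk of labeled examples, a toy nearest-centroid base learner, and plain score averaging — not the authors' exact algorithm; names such as `train_chunk_ensemble` are hypothetical:

```python
import random
from statistics import mean

def nearest_centroid_fit(pos, neg):
    """Toy base learner: store the centroid of each class."""
    cp = [mean(col) for col in zip(*pos)]
    cn = [mean(col) for col in zip(*neg)]
    return cp, cn

def train_chunk_ensemble(positives, negatives, k=5, seed=0):
    """Train k models, each on ALL positives plus a random
    under-sample of negatives of the same size, so every model
    sees a balanced class distribution."""
    rng = random.Random(seed)
    models = []
    for _ in range(k):
        neg_sample = rng.sample(negatives, min(len(positives), len(negatives)))
        models.append(nearest_centroid_fit(positives, neg_sample))
    return models

def posterior(models, x):
    """Average per-model positive-class scores — a crude stand-in
    for the ensemble's posterior probability estimate."""
    def score(model):
        cp, cn = model
        dp = sum((a - b) ** 2 for a, b in zip(x, cp))  # distance to positive centroid
        dn = sum((a - b) ** 2 for a, b in zip(x, cn))  # distance to negative centroid
        return dn / (dp + dn + 1e-12)
    return mean(score(m) for m in models)
```

Averaging over several balanced under-samples uses more of the abundant negative data than a single under-sample would, while keeping each base model's training set balanced.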

### Citations

3214 | Data mining: practical machine learning tools and techniques with java implementations
- Witten, Frank
- 2000
Citation Context: ...rated well. In the experiments, the base learners include both parametric and non-parametric classifiers: Decision Tree, Naive Bayes and Logistic Regression. We use the implementation in Weka package [13]. The parameters for single and ensemble models are set to be the same, which are the default values in Weka. 4.2 Empirical Results In this part, we report the experimental results regarding the effec...

2201 | The Elements of Statistical Learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ...er, it is hard to learn parameters accurately from limited examples. If the training examples are far from sufficient, the parametric model would overfit the data and have low generalization accuracy [9]. Therefore, descriptive methods provide a more favorable solution to stream classification problems. Building a descriptive model does not require prior knowledge about the form of data distribution...

658 | Models and issues in data stream systems
- Babcock, Babu, et al.
- 2002
Citation Context: ...tion accuracy. 1 Introduction Many real applications, such as network traffic monitoring, credit card fraud detection, and web click stream, generate continuously arriving data, known as data streams [3]. Since classification could help decision making by predicting class labels for given data based on past records, classification on stream data has been extensively studied in recent years, with many...

263 | Mining time-changing data streams
- Hulten, Spencer, et al.
- 2001
Citation Context: ...decision making by predicting class labels for given data based on past records, classification on stream data has been extensively studied in recent years, with many interesting algorithms developed [10, 14, 7, 2]. However, there are still some open problems in stream classification as illustrated below. First, descriptive (non-parametric) and generative (parametric) methods are two major categories for stream...

190 | Mining concept-drifting data streams using ensemble classifiers
- Wang, Fan, et al.
- 2003
Citation Context: ...decision making by predicting class labels for given data based on past records, classification on stream data has been extensively studied in recent years, with many interesting algorithms developed [10, 14, 7, 2]. However, there are still some open problems in stream classification as illustrated below. First, descriptive (non-parametric) and generative (parametric) methods are two major categories for stream...

136 | Editorial: Special issue on learning from imbalanced data sets
- Chawla, Japkowicz, et al.
Citation Context: ...e cost of misclassifying a credit card fraud as normal will impose thousands of dollars loss on the bank. The deficiency in inductive learning methods on skewed data has been addressed by many people [15, 5, 4]. Inductive learner’s goal is to minimize classification error rate, therefore, it completely ignores the small number of positive examples and predicts every example as negative. This is definitely u...

107 | A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data
- Batista, Prati, et al.
Citation Context: ...e cost of misclassifying a credit card fraud as normal will impose thousands of dollars loss on the bank. The deficiency in inductive learning methods on skewed data has been addressed by many people [15, 5, 4]. Inductive learner’s goal is to minimize classification error rate, therefore, it completely ignores the small number of positive examples and predicts every example as negative. This is definitely u...

86 | The effect of class distribution on classifier learning: an empirical study
- Weiss, Provost
- 2001
Citation Context: ...e cost of misclassifying a credit card fraud as normal will impose thousands of dollars loss on the bank. The deficiency in inductive learning methods on skewed data has been addressed by many people [15, 5, 4]. Inductive learner’s goal is to minimize classification error rate, therefore, it completely ignores the small number of positive examples and predicts every example as negative. This is definitely u...

78 | Analysis of decision boundaries in linearly combined neural classifiers
- Tumer, Ghosh
- 1996
Citation Context: ...e bias measures the difference between the expected probability and the true probability, whereas the variance measures the changes in estimated probabilities using varied training sets. As stated in [12, 14], given x, the output of a classifier can be expressed as: (3.2) fc(x) = P(c|x) + βc + ηc(x), where P(c|x) is the posterior probability of class c given input x, βc is the bias introduced by the classi...
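The decomposition quoted above underlies the standard variance-reduction argument for ensembles (cf. Tumer and Ghosh). Sketched here under an independence assumption not stated in the excerpt — the noise terms of the k base models are independent with equal variance σ² — averaging keeps the bias but shrinks the noise:

```latex
\bar f_c(x) = \frac{1}{k}\sum_{i=1}^{k} f_c^{\,i}(x)
            = P(c \mid x) + \bar\beta_c + \frac{1}{k}\sum_{i=1}^{k}\eta_c^{\,i}(x),
\qquad
\operatorname{Var}\!\left[\frac{1}{k}\sum_{i=1}^{k}\eta_c^{\,i}(x)\right] = \frac{\sigma^2}{k}.
```

So the averaged output has the same bias terms as a single model but roughly 1/k of the noise variance, which is why the ensemble's probability estimates are more reliable.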

53 | On Demand Classification of Data Streams
- Aggarwal, Han, et al.
- 2004
Citation Context: ...decision making by predicting class labels for given data based on past records, classification on stream data has been extensively studied in recent years, with many interesting algorithms developed [10, 14, 7, 2]. However, there are still some open problems in stream classification as illustrated below. First, descriptive (non-parametric) and generative (parametric) methods are two major categories for stream...

49 | Systematic data selection to mine concept-drifting data streams
- Fan
- 2004

26 | Summarizing and mining skewed data streams
- Cormode, Muthukrishnan
- 2005
Citation Context: ...oncept drifts are present. Therefore, a general framework for dealing with skewed data streams is in great demand. Skewed stream problems have been studied in the context of summarization and modeling [6, 11]. However, the evaluation of existing stream classification methods is done on balanced data streams [10, 14, 7, 2]. In reality, the concepts of data streams usually evolve with time. Several stream c...

12 | Probabilistic Score Estimation with Piecewise Logistic Regression
- Zhang, Yang
- 2004
Citation Context: ...ity estimates, on the other hand, provide some information about uncertainties in classification. To ensure the accuracy of probability estimates, some calibration and smoothing methods could be used [16]. We would have high confidence in the prediction of an example with 99% posterior probability while are not sure about the prediction on an example with estimated posterior probability around 50%. Th...
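The calibration step mentioned in this excerpt can be sketched with Platt-style sigmoid scaling — a standard technique chosen here for illustration, not the piecewise logistic regression of [16]. The function name `platt_calibrate` and the toy scores are assumptions:

```python
import math

def platt_calibrate(scores, labels, lr=0.5, iters=3000):
    """Fit p(y=1 | s) = sigmoid(a*s + b) to (score, label) pairs by
    gradient descent on the log loss (Platt-style scaling sketch)."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(iters):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            g = p - y          # gradient of log loss w.r.t. the logit
            ga += g * s
            gb += g
        a -= lr * ga / n       # average-gradient step for the slope
        b -= lr * gb / n       # and for the intercept
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))
```

Feeding held-out raw classifier scores and their true labels to such a map turns uncalibrated scores into probabilities that better track empirical frequencies, which is what makes the 99%-vs-50% confidence distinction in the excerpt meaningful.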

7 | Forecasting skewed biased stochastic ozone days: Analyses and solutions
- Zhang, Fan, et al.
- 2006
Citation Context: ...ata streams [14, 7], however, they regard concept drifts as changes in conditional probability. In our work, it is shown that concept changes may occur in both feature and conditional probability. In [8, 17], two application examples of skewed data mining are studied. But we provide a more general framework for building accurate classification models on skewed data streams. 6 Conclusions This paper has t...

6 | Modeling skew in data streams
- Korn, Muthukrishnan, et al.
- 2006
Citation Context: ...oncept drifts are present. Therefore, a general framework for dealing with skewed data streams is in great demand. Skewed stream problems have been studied in the context of summarization and modeling [6, 11]. However, the evaluation of existing stream classification methods is done on balanced data streams [10, 14, 7, 2]. In reality, the concepts of data streams usually evolve with time. Several stream c...

5 | Mining extremely skewed trading anomalies
- Fan, Yu, et al.
Citation Context: ...ata streams [14, 7], however, they regard concept drifts as changes in conditional probability. In our work, it is shown that concept changes may occur in both feature and conditional probability. In [8, 17], two application examples of skewed data mining are studied. But we provide a more general framework for building accurate classification models on skewed data streams. 6 Conclusions This paper has t...

3 | Sampling approaches to learning from imbalanced datasets: active learning, cost sensitive learning and beyond
- Abe
- 2003
Citation Context: ...g(x) = ∑_{i=1}^{d} a_i x_i x_{d−i+1} − a_0. Then the examples satisfying g(x) < 0 are labeled positive, whereas other examples are labeled negative. Weights a_i (1 ≤ i ≤ d) are initialized by random values in the range of [0, 1]. We set the value of a_0 so that the number of positive examples is much smaller than that...