Outlier management in intelligent data analysis (2000)
Citations: 2 (0 self)
BibTeX
@TECHREPORT{Cheng00outliermanagement,
author = {J. Gongxian Cheng},
title = {Outlier management in intelligent data analysis},
institution = {},
year = {2000}
}
Abstract
Despite the many statistical methods for outlier detection and robust analysis, there is little work on further analysis of the outliers themselves to determine their origins. For example, there are “good” outliers that provide useful information that can lead to the discovery of new knowledge, and “bad” outliers that include noisy data points. Successfully distinguishing between different types of outliers is an important issue in many applications, including fraud detection, medical tests, process analysis and scientific discovery. It requires not only an understanding of the mathematical properties of the data but also relevant knowledge of the domain context in which the outliers occur. This thesis presents a novel attempt at automating the use of domain knowledge to help distinguish between different types of outliers. Two complementary knowledge-based outlier analysis strategies are proposed: one uses knowledge of how “normal data” should be distributed in a domain of interest in order to identify “good” outliers, and the other uses an understanding of “bad” outliers. This kind of knowledge-based outlier analysis is a useful extension to existing work on outlier detection in both the statistical and computing communities.







