## Outlier management in intelligent data analysis (2000)

Citations: 2 (0 self)

### BibTeX

@TECHREPORT{Cheng00outliermanagement,
    author = {J. Gongxian Cheng},
    title = {Outlier management in intelligent data analysis},
    institution = {},
    year = {2000}
}

### Abstract

Despite the many statistical methods for outlier detection and robust analysis, little work has examined the outliers themselves to determine their origins. For example, “good” outliers provide useful information that can lead to the discovery of new knowledge, whereas “bad” outliers are noisy data points. Successfully distinguishing between different types of outliers is an important issue in many applications, including fraud detection, medical tests, process analysis and scientific discovery. It requires not only an understanding of the mathematical properties of the data but also relevant knowledge of the domain context in which the outliers occur. This thesis presents a novel attempt at automating the use of domain knowledge to help distinguish between different types of outliers. Two complementary knowledge-based outlier analysis strategies are proposed: one uses knowledge of how “normal data” should be distributed in a domain of interest in order to identify “good” outliers, and the other uses an understanding of “bad” outliers. This kind of knowledge-based outlier analysis is a useful extension to existing work on outlier detection in both the statistical and computing communities.

### Citations

8965 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context ...r methods which have much to contribute to data analysis, including case-based reasoning [146, 98], fuzzy and rough sets [192, 7, 135], inductive logic programming [129, 142], support vector machines [167], and visualisation [40, 131]. Of course, one should not forget that a vast volume of literature on data analysis can be found in statistics and pattern recognition [42, 118, 101, 63]. 2.3 Diagnosi...

8083 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...neral networks in the fully observable case. A deep statistical analysis of Bayesian networks is provided in [156] for the fixed structure, fully observable case. Work on missing data can be found in [41, 54, 143]. 2.2.3 Extracting Rules from Data Many real-world problem-solving tasks are classification – assigning cases to categories or classes determined by their attributes. For instance, given the categorie...

7335 | Genetic Algorithms
- Goldberg
- 1989
Citation Context ...s and the analyst’s strategy. In addition, given each data set there are often a large number of possible fitting models. Genetic algorithms are a strong contender for many classes of search problems [56, 126]. 2.2.5 Other IDA Methods In the above sections, I have briefly discussed some of the advanced IDA methods, which have been under rapid development for the last decade. These methods have been applied...

7046 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ...tion [153], and combinatorial optimisation [46]. I will provide a more detailed description of how the SOM is applied to an application in the next chapter. 2.2.2 Bayesian Networks A Bayesian network [103, 136, 84] is a directed, acyclic graph (DAG) that encodes probabilistic relationships among variables of interest. The process of using the Bayesian network for problem-solving is to find the appropriate struc...

3917 | Pattern classification and scene analysis
- Duda, Hart
- 1973
Citation Context ...], support vector machines [167], and visualisation [40, 131]. Of course, one should not forget that a vast volume of literature on data analysis can be found in statistics and pattern recognition [42, 118, 101, 63]. 2.3 Diagnosing Eye Diseases Due to the nature of our medical applications, there is much specialised clinical or ophthalmic terminology throughout this dissertation. Because many concepts are crucia...

3718 | Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context ...neral networks in the fully observable case. A deep statistical analysis of Bayesian networks is provided in [156] for the fixed structure, fully observable case. Work on missing data can be found in [41, 54, 143]. 2.2.3 Extracting Rules from Data Many real-world problem-solving tasks are classification – assigning cases to categories or classes determined by their attributes. For instance, given the categorie...

3351 | Induction of decision trees
- Quinlan
- 1986
Citation Context ...ues to acquire classificatory knowledge. Notable examples include the TDIDT (Top Down Induction of Decision Trees) family of learning systems using the chi-square test with various pruning techniques [141, 20]. There have also been attempts to use neural networks to acquire knowledge from noisy data. Some of the most relevant work is Ultsch [165, 166], which uses unsupervised neural networks to find regula...

3238 | Self-Organizing Maps
- Kohonen
- 2001
Citation Context ...e $t_1 < t_2 < t_3$. In the discussion above, the apparent assumption is that the distance between two output nodes is measured by Euclidean distance. This process leads to several important properties [96]. First, similar inputs are mapped onto closely neighboured nodes. Second, each particular node represents one special input cluster. Another important property of the maps is their vector quantisatio...

2864 | Genetic Programming: On the Programming of Computers by Means of Natural Selection
- Koza
- 1992
Citation Context ...olland’s pioneer work, especially in the last decade. We have witnessed an increasing number of interesting practical applications, including “genetic programming,” the evolution of computer programs [99], the prediction of protein structure [152], and the prediction of dynamic systems behaviour [134]. The basic idea of genetic algorithms may be formulated as the problem of “search” – search for solut...

2432 | Mining Association Rules between Sets of Items in Large Databases
- Agrawal, Imielinski, et al.
- 1993
Citation Context ...ical community, may be used [21]. See [151, 127] for overviews of different rule induction methods. Recently there has been much work on the extraction of so-called “association rules” from databases [2]. Association rules are statements of the form “x% of customers who bought items A and B also bought the item C.” Many algorithms have been invented to extract various kinds of rules from data [120, 3...

1826 | Robust statistics
- Huber
- 1981
Citation Context ...view of the difficulties with explicit examination of outliers, a majority of current statistical work adopts an alternative approach that neither rejects nor welcomes an outlier, but accommodates it [81]. This approach is characterised by the development of a variety of statistical estimation or testing procedures, which are robust against or relatively unaffected by outliers. In these procedures, ou...

1458 | An Introduction to Genetic Algorithms
- Mitchell
- 1996
Citation Context ... of candidate solutions to a given problem by using operators inspired by natural genetic variation and natural selection. These approaches form the backbone of the field of “evolutionary computation [126].” The field of genetic algorithms has progressed a long way since Holland’s pioneer work, especially in the last decade. We have witnessed an increasing number of interesting practical application...

1283 | Local computations with probabilities on graphical structures and their application to expert systems
- Lauritzen, Spiegelhalter
- 1988
Citation Context ...tion [153], and combinatorial optimisation [46]. I will provide a more detailed description of how the SOM is applied to an application in the next chapter. 2.2.2 Bayesian Networks A Bayesian network [103, 136, 84] is a directed, acyclic graph (DAG) that encodes probabilistic relationships among variables of interest. The process of using the Bayesian network for problem-solving is to find the appropriate struc...

1233 | Case-Based Reasoning
- Kolodner
- 1993
Citation Context ...ve been applied to a wide range of practical applications. However, it should be noted that there are many other methods which have much to contribute to data analysis, including case-based reasoning [146, 98], fuzzy and rough sets [192, 7, 135], inductive logic programming [129, 142], support vector machines [167], and visualisation [40, 131]. Of course, one should not forget that a vast volume of lite...

1122 | Statistical Analysis with Missing Data
- Little, Rubin
- 1987
Citation Context ...o erroneous analysis results, research on data quality has attracted a significant amount of attention from different communities, including information systems, management, computing, and statistics [178, 168, 145, 106, 143]. 1.2 Outlier Management 1.2.1 Issues and Challenges A strange data value that stands out because it is not like the rest of the data in some sense is commonly called an outlier. An outlier may appear...

1075 | A Bayesian Method for the Induction
- Cooper, Herskovits
- 1992
Citation Context ...lgorithms for trees were developed. An algorithm for learning “polytrees” with unknown structure and fully observable variables is given in [136]. Early work on learning Bayesian networks was done in [39], extended by [71] for recovering the structure of general networks in the fully observable case. A deep statistical analysis of Bayesian networks is provided in [156] for the fixed structure, fully o...

904 | An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ...tion [153], and combinatorial optimisation [46]. I will provide a more detailed description of how the SOM is applied to an application in the next chapter. 2.2.2 Bayesian Networks A Bayesian network [103, 136, 84] is a directed, acyclic graph (DAG) that encodes probabilistic relationships among variables of interest. The process of using the Bayesian network for problem-solving is to find the appropriate struc...

903 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...s were developed. An algorithm for learning “polytrees” with unknown structure and fully observable variables is given in [136]. Early work on learning Bayesian networks was done in [39], extended by [71] for recovering the structure of general networks in the fully observable case. A deep statistical analysis of Bayesian networks is provided in [156] for the fixed structure, fully observable case. Wo...

873 | Rough Sets: Theoretical Aspects of Reasoning about Data
- Pawlak
- 1991
Citation Context ...of practical applications. However, it should be noted that there are many other methods which have much to contribute to data analysis, including case-based reasoning [146, 98], fuzzy and rough sets [192, 7, 135], inductive logic programming [129, 142], support vector machines [167], and visualisation [40, 131]. Of course, one should not forget that a vast volume of literature on data analysis can be found...

589 | Neurons with Graded Response have Collective Computational Properties like those of Two-State Neurons
- Hopfield
- 1984
Citation Context ...ow and Hoff investigated perceptron networks (“Adelins”) and the delta rule [175]. By the late 60’s it became clear that single-layer perceptron networks had very limited capabilities [125]. Hopfield [79] analysed asymmetric networks using statistical mechanics and analogies from physics, and the Boltzmann Machine [75] tightened the link between statistical mechanics and neural network theory even fur...

539 | The meaning and use of the area under a receiver operating characteristic (ROC) curve
- Hanley, McNeil
- 1982
Citation Context ...tliers are deleted from those test records, and the data set with selected outliers (after applying the noise model to eliminate noisy outliers). The Receiver Operator Characteristic, or ROC analysis [65], is used to assess the test’s diagnostic performance by displaying pairs of false alarms and detection rates throughout the whole range of the CCVP’s measurements. While the curves shifted towards th...

532 | Adaptive switching circuits
- Widrow, Hoff
- 1960
Citation Context ...ial neurons [123] caused much excitement, which led to the exploration of variations of this model. In the early 1960s, Widrow and Hoff investigated perceptron networks (“Adelins”) and the delta rule [175]. By the late 60’s it became clear that single-layer perceptron networks had very limited capabilities [125]. Hopfield [79] analysed asymmetric networks using statistical mechanics and analogies from ...

524 | Surround-Screen Projection-Based Virtual Reality: The Design and Implementation of the CAVE
- Cruz-Neira, Sandin, et al.
- 1993
Citation Context ... to contribute to data analysis, including case-based reasoning [146, 98], fuzzy and rough sets [192, 7, 135], inductive logic programming [129, 142], support vector machines [167], and visualisation [40, 131]. Of course, one should not forget that a vast volume of literature on data analysis can be found in statistics and pattern recognition [42, 118, 101, 63]. 2.3 Diagnosing Eye Diseases Due to the na...

508 | Beyond Regression: new tools for prediction and analysis in the behavioral sciences
- Werbos
- 1974
Citation Context ...he Boltzmann Machine [75] tightened the link between statistical mechanics and neural network theory even further. Perhaps the most widely used artificial neural networks are backpropagation networks [23, 149, 173] and self-organising maps (SOM) [95, 96], which are powerful supervised and unsupervised learning methods, respectively. The back-propagation network has a “teacher” who supervises the learning by pro...

472 | Fast discovery of association rules
- Agrawal, Mannila, et al.
- 1996
Citation Context ...ses [2]. Association rules are statements of the form “x% of customers who bought items A and B also bought the item C.” Many algorithms have been invented to extract various kinds of rules from data [120, 3], and they have been found particularly useful for analysing basket data in retail applications for the purposes of cross-marketing, store layout, catalogue design and customer segmentation. 2.2.4 Evo...

470 | Cluster Analysis
- Everitt, Landau, et al.
- 2001
Citation Context ...(due to known measurement noise). I make the following observations regarding the proposed method: 1) The clustering algorithm used in the method can be a traditional statistical clustering algorithm [45], a self-organising neural network [95], or other machine learning methods [48]. 2) The construction of the noise model requires a set of representative and labelled instances on which the “noise mode...

372 | Empirical Methods for Artificial Intelligence
- Cohen
- 1995
Citation Context ...ion step refers to the “best” model in terms of its predictive accuracy, its simplicity or interpretability, misclassification costs, or other appropriate criteria for the problem under investigation [172, 36]. 4) The success of the method very much depends on the correctness and completeness of the noise model constructed. The correctness of the model depends largely on the quality of domain knowledge – a...

365 | Computer Systems that Learn
- Weiss, Kulikowski
- 1995
Citation Context ...ion step refers to the “best” model in terms of its predictive accuracy, its simplicity or interpretability, misclassification costs, or other appropriate criteria for the problem under investigation [172, 36]. 4) The success of the method very much depends on the correctness and completeness of the noise model constructed. The correctness of the model depends largely on the quality of domain knowledge – a...

320 | Bioinformatics: the machine learning approach
- Baldi, Brunak
- 1998
Citation Context ...orrect output. The back-propagation network has been used to implement applications in many domains for a variety of problems, including bioinformatics, control, speech recognition and credit scoring [6, 69]. On the other hand, Kohonen’s SOM automatically models the features found in the input data and reflects these features in topological maps. The resulting maps form local neighbourhoods that act as...

317 | Multivariate Data Analysis
- Hair, Anderson, et al.
- 2004
Citation Context ..., but cannot guarantee, such a tangible explanation for an outlier, no such obvious remedy is available to us, and we have no alternative but to regard the outlier as being of a random nature [8]. In [62], outliers are classified into one of four classes. First, an outlier may arise from a procedural error, such as a data entry error or a mistake in coding. These outliers should be identified in the d...

279 | Statistical pattern recognition
- Webb
- 2002
Citation Context ...12, 130]. In addition, a variety of AI techniques have been used to help detect outliers in datasets, including Bayesian methods, rule-based systems, decision trees, and nearest neighbour classifiers [127, 169]. In applying these methods, the challenge is often to balance two things: the blind removal of outliers, which may result in an inaccurate and often too simplistic model, and an over-fitted model of ...

265 | Model selection and accounting for model uncertainty in graphical models using Occam’s window
- Madigan, Raftery
- 1994
Citation Context ... Third, Bayesian networks can be used both for supervised learning [82] and for unsupervised learning [26]. For these reasons, we are beginning to see more Bayesian networks in practical applications [53, 119]. In AI, work on Bayesian networks can be traced back to [137] in which “message-passing” algorithms for trees were developed. An algorithm for learning “polytrees” with unknown structure and fully...

260 | Learning representations by back-propagating errors. Nature
- Rumelhart, Hinton, et al.
- 1986
Citation Context ... overcome this difficulty, machine learning methods can be applied to generalise from those 27 classification examples provided by the expert. In particular, I have used the backpropagation algorithm [150] for this purpose. The input nodes represent the locations within the visual field, output nodes are those pathological groups, and three hidden nodes are used in the fully configured network. The tra...

240 | AutoClass: A Bayesian classification system
- Cheeseman, Kelly, et al.
- 1988
Citation Context ...and probabilistic semantics, it is an ideal representation for combining prior knowledge and data. Third, Bayesian networks can be used both for supervised learning [82] and for unsupervised learning [26]. For these reasons, we are beginning to see more Bayesian networks in practical applications [53, 119]. In AI, work on Bayesian networks can be traced back to [137] in which “message-passing” algo...

210 | Applied Optimal Control
- Bryson, Ho
- 1969
Citation Context ...he Boltzmann Machine [75] tightened the link between statistical mechanics and neural network theory even further. Perhaps the most widely used artificial neural networks are backpropagation networks [23, 149, 173] and self-organising maps (SOM) [95, 96], which are powerful supervised and unsupervised learning methods, respectively. The back-propagation network has a “teacher” who supervises the learning by pro...

191 | Bayesian analysis in expert systems
- Spiegelhalter, Dawid, et al.
- 1993
Citation Context ...ing Bayesian networks was done in [39], extended by [71] for recovering the structure of general networks in the fully observable case. A deep statistical analysis of Bayesian networks is provided in [156] for the fixed structure, fully observable case. Work on missing data can be found in [41, 54, 143]. 2.2.3 Extracting Rules from Data Many real-world problem-solving tasks are classification – assigni...

188 | Construction and Assessment of Classification Rules
- Hand
- 1997
Citation Context ...], support vector machines [167], and visualisation [40, 131]. Of course, one should not forget that a vast volume of literature on data analysis can be found in statistics and pattern recognition [42, 118, 101, 63]. 2.3 Diagnosing Eye Diseases Due to the nature of our medical applications, there is much specialised clinical or ophthalmic terminology throughout this dissertation. Because many concepts are crucia...

179 | Identification of Outliers
- Hawkins
- 1980
Citation Context ...lid member of the population should it be deleted. 2.1.2 Outlier Detection Many statistical techniques have been proposed to detect outliers and comprehensive texts on this topic are those by Hawkins [67] and by Barnett and Lewis [8]. Outliers can be identified from a univariate, bivariate, or multivariate perspective. The univariate perspective for identifying outliers examines the distribution of observati...

177 | Providing OLAP (On-Line Analytic Processing) to User-Analysts: An
- Codd
- 1994
Citation Context ...ovide more data accessibility; widely available personal computers and mobile computing devices enable easier data collection and processing; and techniques such as On-Line Analytic Processing (OLAP) [35] allow rapid retrieval of data from data warehouses. In addition, many of the advanced computational methods for extracting information from large quantities of data, or “data mining” methods, are beg...

175 | Fuzzy sets, Information and control
- Zadeh
- 1965
Citation Context ...of practical applications. However, it should be noted that there are many other methods which have much to contribute to data analysis, including case-based reasoning [146, 98], fuzzy and rough sets [192, 7, 135], inductive logic programming [129, 142], support vector machines [167], and visualisation [40, 131]. Of course, one should not forget that a vast volume of literature on data analysis can be found...

154 | Self-organizing semantic maps
- Ritter, Kohonen
- 1989
Citation Context ...ty, i.e. their ability to project high dimensional input data onto a two-dimensional map. This makes visualisation of complex data possible, e.g., speech data [94] or symbolic descriptions of objects [147]. Perhaps the most significant property in many applications is that similar input vectors are mapped to geometrically close winner nodes on the output map. This is called neighbourhood ...

145 | The Process of Knowledge Discovery in Databases
- Brachman, Anand
- 1996
Citation Context ... of the abnormal events leading to possible plant shutdown, while nothing much needs to be done for the latter [102]. In certain situations, outliers may lead to the discovery of unexpected knowledge [19]. In the 1880s when the English physicist Rayleigh measured nitrogen from different sources, he found that there were small discrepancies among the density measurements. After closer examination, he d...

141 | Experimentation in Software Engineering
- Basili, Selby, et al.
- 1986
Citation Context ...ading newspapers and magazines, or chatting with others. 8.4 Software System Evaluation A sensible evaluation of AI systems or software systems in general has always been a challenging research issue [10, 90, 27]. A careful assessment of such systems in laboratory environments is important but is no substitute for testing them in the real-world environments that they are developed for. This is especially impo...

130 | Self-organizing neural network that discovers surfaces in random-dot stereograms
- Becker, Hinton
- 1992
Citation Context ...factory way of locating and rejecting noise in the test data. 4.1 A Two-Step Strategy for Outlier Discrimination One fundamental assumption made when a new self-organising neural network was proposed [14] is that interesting properties in data are more stable than the noise [128]. Consider the example of the visual function test for a normal person who does not have visual function loss. The property ...

130 | Self-Organising Maps
- Kohonen
- 2001
Citation Context ...ing for a way of understanding when, where, and how outliers occur in a data set. In this section, I introduce a computational method for this purpose. In particular, Kohonen’s self-organising network [95, 96, 97] is used to model noisy data. Transition trajectories on the maps are utilised to provide graphical information about the nature of the outliers. This visualisation of outlier patterns establishes a f...

120 | Cybernetic solution path of an experimental problem
- Rechenberg
- 1965
Citation Context ...egmentation. 2.2.4 Evolutionary Computation The idea that evolution could be used as an optimisation tool for engineering problems was studied in the early days of computing. For instance, Rechenberg [144] introduced “evolution strategies” and applied them to the optimisation of real-valued parameters for devices such as airfoils. Fogel et al. [51] developed “evolutionary programming,” in which finite-...

111 | Subjective Bayesian methods for rule-based inference systems
- Duda, Hart, et al.
- 1976
Citation Context ...ed heavily, many exclusively, on the acquisition of relevant rules from the experts. Despite being one of the most successful sub-fields of AI for over a decade with many impressive systems developed [154, 43, 47], these systems suffered from the knowledge acquisition brittleness: when a new case falls outside the experts’ considerations, these systems fail to give an appropriate answer. Second, the classifica...

90 | Reverend Bayes on inference engines: A distributed hierarchical approach
- Pearl
- 1982
Citation Context ...2] and for unsupervised learning [26]. For these reasons, we are beginning to see more Bayesian networks in practical applications [53, 119]. In AI, work on Bayesian networks can be traced back to [137] in which “message-passing” algorithms for trees were developed. An algorithm for learning “polytrees” with unknown structure and fully observable variables is given in [136]. Early work on learning B...

84 | A linear method for deviation detection in large databases
- Arning, Agrawal, et al.
- 1996
Citation Context ...te. A rule-based system for outlier detection is introduced in [138] to check the data quality of patient records in Austrian hospitals. Some other computational methods for outlier detection include [5, 22, 92, 122]. 2.1.3 Methods for Handling Outliers Although a considerable amount of work on outlier detection has been done in the statistical community, relatively little work has been done on how to decide whet...

83 | A framework for analysis of Data Quality Research
- Wang, Storey, et al.
- 1995
Citation Context ...o erroneous analysis results, research on data quality has attracted a significant amount of attention from different communities, including information systems, management, computing, and statistics [178, 168, 145, 106, 143]. 1.2 Outlier Management 1.2.1 Issues and Challenges A strange data value that stands out because it is not like the rest of the data in some sense is commonly called an outlier. An outlier may appear...