Results 1-10 of 37
Detecting outliers using transduction and statistical testing
 In Proceedings of the 12th Annual SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2006
Cited by 14 (1 self)
Outlier detection can uncover malicious behavior in fields like intrusion detection and fraud analysis. Although there has been a significant amount of work in outlier detection, most of the algorithms proposed in the literature are based on a particular definition of outliers (e.g., density-based) and use ad hoc thresholds to detect them. In this paper we present a novel technique to detect outliers with respect to an existing clustering model. However, the test can also be used successfully to recognize outliers when the clustering information is not available. Our method is based on Transductive Confidence Machines, which have previously been proposed as a mechanism to provide individual confidence measures on classification decisions. The test uses hypothesis testing to accept or reject whether a point fits in each of the clusters of the model. We experimentally demonstrate that the test is highly robust and produces very few misdiagnosed points, even when no clustering information is available. Furthermore, our experiments demonstrate that the method remains robust when the data are contaminated by outliers. Finally, we show that our technique can be applied successfully to identify outliers in a noisy data set for which no information is available (e.g., ground truth, clustering structure, etc.). As such, our proposed methodology can bootstrap a clean data set from a noisy one, and the clean set can then be used to identify future outliers.
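The per-cluster hypothesis test described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the strangeness measure (sum of distances to the k nearest neighbours), the significance level `alpha`, and the function names `strangeness`, `p_value`, and `is_outlier` are all assumptions chosen for illustration.

```python
import math


def strangeness(point, others, k=3):
    """Assumed strangeness measure: sum of distances to the k nearest neighbours."""
    dists = sorted(math.dist(point, o) for o in others)
    return sum(dists[:k])


def p_value(point, cluster, k=3):
    """Transductive p-value: tentatively add the point to the cluster, score
    every member of the augmented set, and report the fraction of members
    that are at least as strange as the candidate point."""
    augmented = list(cluster) + [point]
    scores = [
        strangeness(x, augmented[:i] + augmented[i + 1:], k)
        for i, x in enumerate(augmented)
    ]
    candidate = scores[-1]
    return sum(1 for s in scores if s >= candidate) / len(augmented)


def is_outlier(point, clusters, alpha=0.05, k=3):
    """Flag the point only if the fit hypothesis is rejected for every cluster."""
    return all(p_value(point, c, k) < alpha for c in clusters)
```

Because the smallest attainable p-value is 1 / (n + 1) for a cluster of n points, a cluster needs at least 20 members before any point can be rejected at `alpha=0.05` in this sketch.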
Subtractive Clustering Based RBF Neural Network Model for Outlier Detection
Cited by 4 (0 self)
Abstract. Outlier detection has many important applications in fraud detection, network robustness analysis, and intrusion detection. Some studies have used neural networks to solve the problem because of their powerful modeling ability. In this paper, we propose an RBF neural network model that uses the subtractive clustering algorithm to select the hidden node centers, which achieves faster training. In addition, the RBF network is trained with a regularization term so as to minimize the variances of the nodes in the hidden layer and to predict more accurately. By defining the degree of outlier, we can effectively find abnormal data whose actual output deviates seriously from its expected value, as long as the output is certain. Experimental results on different datasets show that the proposed RBF model has a higher detection rate and a lower false positive rate than the other methods, and that it can be an effective solution for detecting outliers. Index Terms: outlier detection, radial basis function, neural network, subtractive clustering
A Taxonomy Framework for Unsupervised Outlier Detection Techniques for Multi-Type Data Sets, Technical Report
, 2007
New Outlier Detection Method Based on Fuzzy Clustering
, 2010
Cited by 3 (0 self)
In this paper, a new efficient method for outlier detection is proposed. The proposed method is based on fuzzy clustering techniques. The c-means algorithm is performed first; small clusters are then identified and treated as outlier clusters. Other outliers are then determined by computing the differences between objective function values when points are temporarily removed from the data set. If removing a point causes a noticeable change in the objective function value, the point is considered an outlier. Tests were performed on different well-known data sets from the data mining literature. The results showed that the proposed method performed well.
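The second step of this abstract, scoring a point by how much the clustering objective changes when it is temporarily removed, can be sketched as follows. This is an illustrative sketch rather than the paper's procedure: it uses a basic fuzzy c-means with an assumed deterministic initialisation, and the names `memberships`, `fcm_objective`, and `objective_drops` are hypothetical. The small-cluster step and the threshold for a "noticeable" change are not shown, since the abstract does not specify them.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def memberships(points, centers, m):
    """Standard fuzzy c-means membership update for fuzzifier m > 1."""
    power = 1.0 / (m - 1.0)
    rows = []
    for p in points:
        # The small constant guards against division by zero when a point
        # coincides with a center.
        inv = [(1.0 / (sqdist(p, v) + 1e-12)) ** power for v in centers]
        total = sum(inv)
        rows.append([w / total for w in inv])
    return rows


def fcm_objective(points, c=2, m=2.0, iters=50):
    """Run a basic fuzzy c-means and return the final objective value."""
    # Assumed initialisation: c points spread evenly through the sorted data.
    ordered = sorted(points)
    centers = [ordered[round(j * (len(ordered) - 1) / (c - 1))] for j in range(c)]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = [
            tuple(
                sum(u[i][j] ** m * p[d] for i, p in enumerate(points))
                / sum(u[i][j] ** m for i in range(len(points)))
                for d in range(dim)
            )
            for j in range(c)
        ]
    u = memberships(points, centers, m)
    return sum(
        u[i][j] ** m * sqdist(p, centers[j])
        for i, p in enumerate(points)
        for j in range(c)
    )


def objective_drops(points, c=2, m=2.0):
    """Decrease in the objective when each point is temporarily removed."""
    full = fcm_objective(points, c, m)
    return [
        full - fcm_objective(points[:i] + points[i + 1:], c, m)
        for i in range(len(points))
    ]
```

Points whose removal produces a markedly larger drop than the rest are the candidates for outliers. Each score requires re-running the clustering, so this sketch costs one full fuzzy c-means run per point.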
On Detecting Clustered Anomalies using SCiForest
Cited by 3 (1 self)
Abstract. Detecting local clustered anomalies is an intricate problem for many existing anomaly detection methods. Distance-based and density-based methods are inherently restricted by their basic assumptions: anomalies are either far from normal points or sparse. Clustered anomalies can avoid detection because they defy these assumptions by being dense and, in many cases, in close proximity to normal instances. In this paper, without using any density or distance measure, we propose a new method called SCiForest to detect clustered anomalies. SCiForest separates clustered anomalies from normal points effectively even when the clustered anomalies are very close to normal points. It maintains the ability of existing methods to detect scattered anomalies, and it has better time and space complexity than existing distance-based and density-based methods.
Measuring the Distance from Training Data Set
 in Proc. Int. Symp. on Applied Stochastic Models and Data Analysis
, 2005
Cited by 2 (2 self)
Abstract. In this paper, a new method is proposed for measuring the distance between a training data set and a single, new observation. The novel distance measure reflects the expected squared prediction error, when the prediction is based on the k nearest neighbours of the training data set. The simulation shows that the distance measure correlates well with the true expected squared prediction error in practice. The distance measure can be applied, for example, to assessing the uncertainty of prediction.
MODELLING OF CONDITIONAL VARIANCE AND UNCERTAINTY USING INDUSTRIAL PROCESS DATA
, 2006
Cited by 2 (2 self)
Academic dissertation to be presented, with the assent of
The Robust Distance for Similarity Measure Of Content Based Image Retrieval
Cited by 1 (0 self)
Abstract. Content-based image retrieval (CBIR) is a retrieval technique that uses visual information to retrieve images from collections of digital images. Retrieval is carried out by measuring the similarity between the query image and the images in the database through a similarity measure. Distance is a metric often used as the similarity measure in CBIR. The query image is relevant to an image in the database if the value of the similarity measure is "small". This means that a good CBIR system must be supported by an accurate similarity measure. The classical distance is derived from the arithmetic mean, which is vulnerable to the masking effect. Extreme data inflate the deviation around the arithmetic mean, so the distance to an extreme point or outlier becomes smaller than it should be. This paper proposes a robust distance for the CBIR process, derived from a measure of multivariate dispersion called vector variance (VV). The minimum vector variance (MVV) estimator has a high breakdown point and is insensitive to outliers. Another good property of VV is that it takes less time to compute than the covariance determinant (CD).
An Outlier Detection Method Based on Clustering
 Emerging Applications of Information Technology (EAIT)
Cited by 1 (0 self)
Abstract. In this paper we propose a clustering-based method to capture outliers. We apply the K-means clustering algorithm to divide the data set into clusters. Points lying near the centroid of a cluster are unlikely to be outliers, so we prune such points from each cluster. Next, we calculate a distance-based outlier score for the remaining points. Pruning considerably reduces the computation needed to calculate the outlier scores. Based on the outlier score, we declare the top n points with the highest scores to be outliers. Experimental results on a real data set demonstrate that, even though it requires fewer computations, the proposed method performs better than the existing method. Keywords: outlier; cluster; distance-based
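The pipeline in this abstract (cluster, prune points near the centroids, score the rest, report the top n) can be sketched as below. The abstract does not give the pruning rule or the score, so both are assumptions here: points closer to their centroid than the cluster's mean radius are pruned, and the score is the distance to the `knn`-th nearest neighbour. The names `kmeans` and `top_n_outliers` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)

    def assign():
        return [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assign()


def top_n_outliers(points, k=2, n=1, knn=2, seed=0):
    """Cluster, prune points near their centroid, then rank the remainder."""
    centroids, labels = kmeans(points, k, seed=seed)
    to_centroid = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]

    # Assumed pruning rule: keep only points at or beyond the mean radius
    # of their own cluster.
    candidates = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        mean_radius = sum(to_centroid[i] for i in idx) / len(idx)
        candidates.extend(i for i in idx if to_centroid[i] >= mean_radius)

    # Assumed score: distance to the knn-th nearest neighbour in the full set.
    def score(i):
        dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
        return dists[knn - 1]

    ranked = sorted(candidates, key=score, reverse=True)
    return [points[i] for i in ranked[:n]]
```

The saving described in the abstract comes from calling `score`, which scans the whole data set, only for the unpruned candidates rather than for every point.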
Rough set, kernel set and spatiotemporal outlier detection
 IEEE Trans. Knowledge and Data Engineering
Cited by 1 (0 self)
Abstract. The high availability of data gathered from wireless sensor networks and telecommunication systems has drawn researchers' attention to the problem of extracting knowledge from spatiotemporal data. Detecting outliers that are grossly different from or inconsistent with the rest of a spatiotemporal data set is a major challenge in real-world knowledge discovery and data mining applications. In this paper, we address the outlier detection problem in spatiotemporal data and describe a rough set approach that finds the top outliers in an unlabeled spatiotemporal data set. The proposed method, called Rough Outlier Set Extraction (ROSE), relies on a rough set theoretic representation of the outlier set using the rough set approximations, i.e., the lower and upper approximations. We also introduce a new set, named the Kernel Set, a subset of the original data set that can describe the original data set in terms of both data structure and obtained results. Experimental results on real-world data sets demonstrate the superiority of ROSE, in terms of both quantitative indices and the outliers detected, over various rough fuzzy clustering algorithms and state-of-the-art outlier detection methods. It is also demonstrated that the kernel set detects the same set of outliers in less computational time. Index Terms: spatiotemporal data, outlier detection, spatiotemporal uncertainty management, rough set and granular computing