Anomaly Detection: A Survey, 2007
Cited by 69 (1 self)
Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been developed specifically for certain application domains, while others are more generic. This survey aims to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which the techniques use to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and more succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of its techniques. We also discuss the computational complexity of the techniques, since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and of how techniques developed in one area can be applied in domains for which they were not originally intended.
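As a hedged illustration of the "basic technique plus variants" template, here is one basic detector from a category chosen by assumption (nearest-neighbor-based detection; the abstract itself does not list the categories). Each point is scored by its distance to its k-th nearest neighbor, so points far from all others score highest. The function name `knn_anomaly_scores` and the use of Euclidean distance are illustrative choices, not taken from the survey.

```python
import math
from typing import List, Sequence


def knn_anomaly_scores(points: Sequence[Sequence[float]], k: int = 1) -> List[float]:
    """Score each point by the distance to its k-th nearest neighbor (larger = more anomalous)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, excluding p itself.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = knn_anomaly_scores(data, k=2)
print(scores.index(max(scores)))  # 4
```

Variants in such a category typically change the distance measure, the value of k, or how the neighbor distances are aggregated into a score.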
Trajectory-Based Anomalous Event Detection
Cited by 10 (1 self)
Abstract—In recent years, the task of automatic event analysis in video sequences has gained increasing attention in the research community. The application domains are diverse, ranging from video surveillance to automatic video annotation for sports videos or TV shots. Whatever the application field, most work in event analysis follows one of two main approaches: the former is based on explicit event recognition, focused on finding high-level, semantic interpretations of video sequences, and the latter is based on anomaly detection. This paper deals with the second approach, where the final goal is not the explicit labeling of recognized events but the detection of anomalous events that differ from typical patterns. In particular, the proposed work addresses anomaly detection by means of trajectory analysis, an approach with several application fields, most notably video surveillance and traffic monitoring. The proposed approach is based on single-class support vector machine (SVM) clustering, where the novelty detection capabilities of SVMs are used to identify anomalous trajectories. Particular attention is given to trajectory classification in the absence of a priori information on the distribution of outliers. Experimental results demonstrate the validity of the proposed approach. Index Terms—Anomaly detection, event analysis, support vector machines (SVMs), trajectory clustering.
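The abstract does not say how trajectories are turned into SVM inputs. One common preprocessing step, assumed here, is to resample each trajectory to a fixed number of points equally spaced by arc length, so that trajectories of different durations become vectors of the same size. The helpers `resample_trajectory` and `trajectory_features` are hypothetical and are not the paper's code.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def resample_trajectory(points: Sequence[Point], n_samples: int) -> List[Point]:
    """Resample a polyline to n_samples points equally spaced by arc length."""
    if len(points) < 2 or n_samples < 2:
        raise ValueError("need at least 2 input points and 2 samples")
    cum = [0.0]  # cumulative arc length at each input point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    out, seg = [], 0
    for k in range(n_samples):
        target = cum[-1] * k / (n_samples - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(points) - 2 and cum[seg + 1] < target:
            seg += 1
        span = cum[seg + 1] - cum[seg]
        t = 0.0 if span == 0 else (target - cum[seg]) / span
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def trajectory_features(points: Sequence[Point], n_samples: int = 8) -> List[float]:
    """Flatten a resampled trajectory into a fixed-length vector [x0, y0, x1, y1, ...]."""
    return [c for p in resample_trajectory(points, n_samples) for c in p]


print(resample_trajectory([(0, 0), (2, 0), (2, 2)], 5))
# [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
```

The resulting vectors could then be passed to a one-class SVM such as scikit-learn's `sklearn.svm.OneClassSVM`, whose `predict` method returns -1 for points it treats as outliers and +1 otherwise. Values for `nu`, `kernel`, and `gamma` would need tuning and are not given by the source.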
Multiresolution Motif Discovery in Time Series
Cited by 2 (1 self)
Time series motif discovery is an important problem with applications in a variety of areas ranging from telecommunications to medicine. Several algorithms have been proposed to solve the problem. However, these algorithms rely heavily on expensive random disk accesses or assume that the data fit into main memory. They consider motifs at only a single resolution and are not suited to interactive use. In this work, we tackle motif discovery as an approximate Top-K frequent subsequence discovery problem. We fully exploit the multiresolution capability of the state-of-the-art iSAX representation to obtain motifs at different resolutions. This property enables interactivity, allowing the user to navigate the Top-K motif structure and thereby gain a deeper understanding of the time series database. Further, we apply the ...
Learning from Time Series in the Presence of Noise: Unsupervised and Semi-Supervised Approaches, 2008
Needless to say, I would not have reached this stage of graduate school if it were not for my advisor, Dr. Eamonn Keogh. I have never worked with another person with so much drive and passion for what they do, and I hope that I acquired at least part of these qualities too. Eamonn taught me the basic practices and knowledge a data mining researcher needs, but for what it's worth, it is his attitude that probably made the biggest impact on me. Every time I talked to him, he was positive and encouraging. Thank you, Eamonn, for being there for me and for the rest of your students! I would like to thank my dissertation committee, Dr. Vassilis Tsotras and Dr. Stefano Lonardi. Vassilis sent me my acceptance letter exactly five years ago, promising that I would enjoy the atmosphere at UC Riverside. I really did! Stefano was there for my most important publication: the first one. From him I learned that every detail matters and that every word needs to be placed accurately. I had three great internships with Yahoo!, where I worked with incredible people to whom I am greatly thankful. The first summer my mentor was Dr. Dennis DeCoste. Dennis inspired many of my subsequent interests, such as ensemble learning and support vector ...
Automated Load Curve Data Cleansing in Power Systems
Abstract—Load curve data refers to the electric energy consumption recorded by meters at certain time intervals at delivery points or end-user points. It contains vital information for day-to-day operations, system analysis, system visualization, system reliability performance, energy saving, and adequacy in system planning. Unfortunately, load curves inevitably contain corrupted and missing data due to various random failures in meters and in the data transfer process. This paper presents techniques based on B-spline smoothing and kernel smoothing to automatically cleanse corrupted and missing data. In the implementation, a man–machine dialogue procedure is proposed to enhance performance. Experimental results on real load curve data from the British Columbia Transmission Corporation (BCTC) demonstrate the effectiveness of the presented solution. Index Terms—Load management, load modeling, power systems, smoothing methods, power quality.
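The abstract names kernel smoothing as one of the cleansing techniques but gives no details. What follows is a hypothetical two-pass version built on a Nadaraya–Watson estimator with a Gaussian kernel. Pass 1 flags a reading as corrupted when its leave-one-out residual deviates from the median residual by more than `k` robust standard deviations, estimated from the MAD. Pass 2 re-estimates both flagged readings and missing ones (`None`) from the remaining clean readings only. The bandwidth, the threshold, and the MAD rule are assumptions. The paper's B-spline variant and its man–machine dialogue are not shown.

```python
import math
from statistics import median
from typing import List, Optional, Sequence, Tuple


def nw_estimate(t: float, points: Sequence[Tuple[int, float]], bandwidth: float) -> float:
    """Nadaraya-Watson estimate at time t with a Gaussian kernel."""
    num = den = 0.0
    for ti, vi in points:
        w = math.exp(-0.5 * ((t - ti) / bandwidth) ** 2)
        num += w * vi
        den += w
    return num / den if den > 0 else float("nan")


def cleanse(load: Sequence[Optional[float]], bandwidth: float = 2.0,
            k: float = 3.0) -> Tuple[List[float], List[int]]:
    obs = [(i, v) for i, v in enumerate(load) if v is not None]
    # Pass 1: leave-one-out fit, so a spike cannot "explain" itself.
    resid = {}
    for i, v in obs:
        others = [(j, x) for j, x in obs if j != i]
        resid[i] = v - nw_estimate(i, others, bandwidth)
    med = median(resid.values())
    mad = median(abs(r - med) for r in resid.values())
    scale = 1.4826 * mad if mad > 0 else 1e-12  # MAD -> std under normality
    corrupted = {i for i, r in resid.items() if abs(r - med) > k * scale}
    # Pass 2: re-estimate corrupted and missing readings from clean readings only.
    clean = [(i, v) for i, v in obs if i not in corrupted]
    missing = {i for i, v in enumerate(load) if v is None}
    flagged = sorted(corrupted | missing)
    cleaned = list(load)
    for i in flagged:
        cleaned[i] = nw_estimate(i, clean, bandwidth)
    return cleaned, flagged


cleaned, flagged = cleanse([10, 11, 12, 13, 100, 15, None, 17, 18, 19])
print(flagged)  # [4, 6]
```

Excluding corrupted readings before filling gaps matters here. Otherwise the spike at index 4 would pull the estimate for the missing reading at index 6 well above its neighbors.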
Data Editing Techniques to Allow the Application of Distance-Based Outlier Detection to Streams
Abstract — The problem of finding outliers in data has broad applications in areas as diverse as data cleaning, fraud detection, network monitoring, and invasive species monitoring. While dozens of techniques have been proposed to solve this problem for static data collections, very simple distance-based outlier detection methods are known to be competitive with, or superior to, more complex methods. However, the time and space complexities of distance-based methods make them impractical for streaming data and/or resource-limited sensors. In this work, we show that simple data-editing techniques can make distance-based outlier detection practical for very fast streams and resource-limited sensors. Our technique generalizes to produce two algorithms: relative to the original algorithm, one is guaranteed to produce no false positives and the other is guaranteed to produce no false negatives. Our methods are independent of both data type and distance measure, and are thus broadly applicable.
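The abstract does not describe the editing procedure itself. The following hypothetical sketch shows one way the two guarantees can arise. It assumes a metric distance, whereas the paper claims independence from the distance measure. The reference set is thinned so that every discarded point lies within `eps` of a kept point. Because the kept set is a subset, nearest-neighbor distances can only grow, so testing against the threshold `r` yields no false negatives relative to the full set. By the triangle inequality, the distance to the kept set is at most the original distance plus `eps`, so testing against `r + eps` yields no false positives. All function names are illustrative.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def nn_dist(x: Point, ref: Sequence[Point]) -> float:
    """Distance from x to its nearest neighbor in ref."""
    return min(math.dist(x, y) for y in ref)


def edit_reference(ref: Sequence[Point], eps: float) -> List[Point]:
    """Keep a point only if it is farther than eps from every point kept so far."""
    kept: List[Point] = []
    for y in ref:
        if all(math.dist(y, q) > eps for q in kept):
            kept.append(y)
    return kept


def is_outlier_exact(x: Point, ref: Sequence[Point], r: float) -> bool:
    return nn_dist(x, ref) > r


def is_outlier_no_false_negatives(x: Point, kept: Sequence[Point], r: float) -> bool:
    # kept is a subset of ref, so nn_dist(x, kept) >= nn_dist(x, ref).
    return nn_dist(x, kept) > r


def is_outlier_no_false_positives(x: Point, kept: Sequence[Point], r: float, eps: float) -> bool:
    # Every discarded point is within eps of a kept one, so
    # nn_dist(x, kept) <= nn_dist(x, ref) + eps.
    return nn_dist(x, kept) > r + eps


ref = [(0.0,), (0.1,), (0.2,), (1.0,), (1.05,)]
kept = edit_reference(ref, eps=0.15)
print(kept)  # [(0.0,), (0.2,), (1.0,)]
```

The trade-off is visible in the two tests: the smaller kept set saves memory and distance computations, at the cost of an uncertainty band of width `eps` around the original threshold.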
Load
(Work done while at the University of Illinois.)
Abstract — Outlier detection in vehicle traffic data is a practical problem that has recently gained traction owing to the growing ability to track moving vehicles on city roads. In contrast to other applications, this domain includes a highly dynamic dimension: time. Many existing algorithms address outlier detection only at a single instant in time. This study proposes a method for detecting temporal outliers, with an emphasis on historical similarity trends between data points. Outliers are identified from drastic changes in these trends. Experiments with real-world traffic data show that this approach is effective and efficient.
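The abstract gives no formulas, so the following is a hypothetical reading of "historical similarity trends". For each pair of road sensors, an exponentially smoothed similarity is maintained. At every time step, the code computes each sensor's average absolute change in similarity to the other sensors and reports sensors whose change exceeds a threshold. The similarity function `exp(-|a - b| / scale)`, the smoothing factor `alpha`, and the threshold are all assumptions, not the paper's definitions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def temporal_outliers(snapshots: Sequence[Dict[str, float]], alpha: float = 0.3,
                      threshold: float = 0.5, scale: float = 10.0) -> List[List[str]]:
    """snapshots[t] maps each sensor id to its reading (e.g. speed) at time t."""
    history: Dict[Tuple[str, str], float] = {}
    result = []
    for snap in snapshots:
        ids = sorted(snap)
        change = {i: 0.0 for i in ids}
        for a in range(len(ids)):
            for b in range(a + 1, len(ids)):
                i, j = ids[a], ids[b]
                s = math.exp(-abs(snap[i] - snap[j]) / scale)  # similarity in (0, 1]
                if (i, j) in history:
                    d = abs(s - history[(i, j)])  # deviation from the historical trend
                    change[i] += d
                    change[j] += d
                    history[(i, j)] = alpha * s + (1 - alpha) * history[(i, j)]
                else:
                    history[(i, j)] = s
        n_others = max(len(ids) - 1, 1)
        result.append([i for i in ids if change[i] / n_others > threshold])
    return result


normal = {"A": 50, "B": 52, "C": 51, "D": 50}
snaps = [normal] * 4 + [{"A": 50, "B": 52, "C": 51, "D": 5}]
print(temporal_outliers(snaps))  # [[], [], [], [], ['D']]
```

Averaging the change over all partners is a deliberate choice. Only sensor D breaks its similarity with every other sensor, whereas A, B, and C each lose similarity with a single partner, so only D crosses the threshold.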
Faster and Parameter-Free Discord Search in Quasi-Periodic Time Series
Abstract. Time series discords have proven to be a useful concept for time series anomaly identification. Various algorithms have been developed to search for discords. Most of them rely on pre-building an index (such as a trie) over subsequences, and their users are typically required to choose optimal values for the word-length and/or alphabet-size parameters of that index, which are not intuitive to set. In this paper, we propose an algorithm that directly searches for the top-K discords without building an index or tuning external parameters. The algorithm exploits the quasi-periodicity present in many time series. For quasi-periodic time series, it gains a significant speedup by reducing the number of calls to the distance function.
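A time series discord is commonly defined as the subsequence whose distance to its nearest non-self match is largest. The brute-force baseline here is quadratic in the series length and uses an early-abandon check. It makes the definition concrete but is not the paper's index-free, quasi-periodic algorithm. Two simplifying assumptions are made: any pair of windows whose start positions are closer than the subsequence length `m` counts as a self match, and distances are Euclidean without z-normalization.

```python
import math
from typing import Optional, Sequence, Tuple


def discord_brute_force(ts: Sequence[float], m: int) -> Tuple[Optional[int], float]:
    """Return (start index, nearest-neighbor distance) of the top-1 discord of length m."""
    n = len(ts) - m + 1
    best_dist, best_loc = -1.0, None
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) < m:  # overlapping windows count as self matches
                continue
            d = math.dist(ts[i:i + m], ts[j:j + m])
            if d < nearest:
                nearest = d
                if nearest < best_dist:
                    break  # i can no longer beat the current best discord
        if nearest != math.inf and nearest > best_dist:
            best_dist, best_loc = nearest, i
    return best_loc, best_dist


ts = [0, 1, 0, -1] * 10
ts[20] = 3  # inject a single anomalous value
loc, dist = discord_brute_force(ts, 4)
print(loc, dist)  # the reported window covers index 20
```

In a periodic series most windows have an exact match one period away, so the inner loop abandons early for them. The abstract describes exploiting this kind of quasi-periodicity to reduce the number of distance calls.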
Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL
2011 11th IEEE International Conference on Data Mining
Abstract—Most algorithms for mining or indexing time series data do not operate directly on the original data; instead, they consider alternative representations that include transforms, quantization, approximation, and multiresolution abstractions. Choosing the best representation and abstraction level for a given task or dataset is arguably the most critical step in time series data mining. In this paper, we investigate techniques to discover the natural intrinsic representation model, dimensionality, and alphabet cardinality of a time series. The ability to discover these intrinsic features has implications beyond selecting the best parameters for particular algorithms: characterizing data in this manner is useful in its own right and is an important subroutine in algorithms for classification, clustering, and outlier discovery. We frame the discovery of these intrinsic features in the Minimum Description Length (MDL) framework. Extensive empirical tests show that our method is simpler, more general, and significantly more accurate than previous methods, and has the important advantage of being essentially parameter-free.
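As a simplified, hypothetical illustration of MDL-based cardinality selection, assume the series is already quantized to 8-bit integers. This is not the paper's exact encoding. For each candidate cardinality 2^b, the model cost is b bits per point for the symbol. The data-given-model cost is the length of an Elias gamma code for each reconstruction residual after a zigzag mapping to non-negative integers. The cardinality with the smallest total description length is taken as intrinsic. The uniform bins and the residual code are assumptions.

```python
from typing import Sequence


def gamma_bits(r: int) -> int:
    """Length of an Elias gamma code for residual r after zigzag mapping."""
    z = 2 * r if r >= 0 else -2 * r - 1  # 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    return 2 * ((z + 1).bit_length() - 1) + 1


def description_length(series: Sequence[int], bits: int) -> int:
    """Two-part code length (in bits) using 2**bits uniform levels over [0, 255]."""
    width = 256 >> bits
    total = 0
    for v in series:
        recon = (v >> (8 - bits)) * width + width // 2  # center of v's bin
        total += bits + gamma_bits(v - recon)  # model bits + residual bits
    return total


def intrinsic_cardinality(series: Sequence[int]) -> int:
    if any(not 0 <= v <= 255 for v in series):
        raise ValueError("this sketch expects 8-bit integer values")
    best_bits = min(range(9), key=lambda b: description_length(series, b))
    return 2 ** best_bits


print(intrinsic_cardinality([32, 96, 160, 224, 224, 160, 96, 32] * 5))  # 4
```

With too few levels, the residuals are large and expensive to encode. With too many, every point pays for symbol bits it does not need. The minimum falls where the model matches the data's four levels exactly, which illustrates the trade-off that makes MDL essentially parameter-free.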

