CiteSeerX
Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets

by Dragomir Yankov, Eamonn Keogh
Results 1 - 10 of 20

Anomaly Detection: A Survey

by Varun Chandola, Arindam Banerjee, Vipin Kumar , 2007
Cited by 540 (5 self)
Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

Citation Context

... within a given sequence which is anomalous with respect to the rest of the sequence. Such anomalous subsequences have also been referred to as discords [Bu et al. 2007; Fu et al. 2006; Keogh et al. 2005; Yankov et al. 2007]. This problem formulation occurs in event and time-series data sets where the data is in the form of a long sequence and contains regions that are anomalous. The techniques that address this problem ...

Frequency Analysis,

by R B Randall , 1987
Cited by 66 (7 self)
Abstract not found

Trajectory-Based Anomalous Event Detection

by Claudio Piciarelli, Christian Micheloni, Gian Luca Foresti
Cited by 45 (5 self)
Abstract—In recent years, the task of automatic event analysis in video sequences has gained increasing attention in the research community. The application domains are disparate, ranging from video surveillance to automatic video annotation for sport videos or TV shots. Whatever the application field, most of the works in event analysis are based on two main approaches: the former based on explicit event recognition, focused on finding high-level, semantic interpretations of video sequences, and the latter based on anomaly detection. This paper deals with the second approach, where the final goal is not the explicit labeling of recognized events, but the detection of anomalous events differing from typical patterns. In particular, the proposed work addresses anomaly detection by means of trajectory analysis, an approach with several application fields, most notably video surveillance and traffic monitoring. The proposed approach is based on single-class support vector machine (SVM) clustering, where the novelty detection SVM capabilities are used for the identification of anomalous trajectories. Particular attention is given to trajectory classification in absence of a priori information on the distribution of outliers. Experimental results prove the validity of the proposed approach. Index Terms—Anomaly detection, event analysis, support vector machines (SVMs), trajectory clustering.

Citation Context

...2), is a good measure for outlier detection. To perform this test, we compared the proposed method with another simple yet very effective outlier detection technique, based on the concept of discords [39], [40]: a discord is defined as the trajectory maximizing its Euclidean distance from the nearest neig...
(Footnote 1: The data sets used in this section are available at http://avires.dimi.uniud.it/papers/trclust)
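The discord notion this excerpt refers to (the subsequence whose distance to its nearest non-overlapping neighbor is largest) can be illustrated with a brute-force search. This is only an illustrative sketch, not the cited papers' optimized algorithms; the function name is mine, and the series is assumed to be much longer than the subsequence length:

```python
import numpy as np

def find_discord(series, m):
    """Brute-force discord search: return (start, dist) for the length-m
    subsequence whose Euclidean distance to its nearest non-overlapping
    ("non-self-match") neighbor is largest."""
    series = np.asarray(series, dtype=float)
    n = len(series) - m + 1
    subs = np.stack([series[i:i + m] for i in range(n)])
    best_loc, best_dist = -1, -np.inf
    for i in range(n):
        # Distance from subsequence i to every other subsequence.
        d = np.linalg.norm(subs - subs[i], axis=1)
        # Mask trivial matches: subsequences overlapping subsequence i.
        d[max(0, i - m + 1):i + m] = np.inf
        nn = d.min()
        if nn > best_dist:
            best_loc, best_dist = i, nn
    return best_loc, best_dist
```

On a periodic series with one injected anomaly, the discord lands on the anomalous region, since every normal subsequence has a close repeat elsewhere while the anomaly does not.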

A Complexity-Invariant Distance Measure for Time Series

by Gustavo E.A.P.A. Batista, Xiaoyue Wang, Eamonn J. Keogh
Cited by 21 (3 self)
The ubiquity of time series data across almost all human endeavors has produced a great interest in time series data mining in the last decade. While there is a plethora of classification algorithms that can be applied to time series, all of the current empirical evidence suggests that simple nearest neighbor classification is exceptionally difficult to beat. The choice of distance measure used by the nearest neighbor algorithm depends on the invariances required by the domain. For example, motion capture data typically requires invariance to warping. In this work we make a surprising claim. There is an invariance that the community has missed, complexity invariance. Intuitively, the problem is that in many domains the different classes may have different complexities, and pairs of complex objects, even those which subjectively may seem very similar to the human eye, tend to be further apart under current distance measures than pairs of simple objects. This fact introduces errors in nearest neighbor classification, where complex objects are incorrectly assigned to a simpler class. We introduce the first complexity-invariant distance measure for time series, and show that it generally produces significant improvements in classification accuracy. We further show that this improvement does not compromise efficiency, since we can lower bound the measure and use a modification of triangular inequality, thus making use of most existing indexing and data mining algorithms. We evaluate our ideas with the largest and most comprehensive set of time series classification experiments ever attempted, and show that complexity-invariant distance measures can produce improvements in accuracy in the vast majority of cases.

Citation Context

... that this improvement does not compromise the efficiency of algorithms that make frequent calls to a distance measure (classification [5], clustering [6], motif discovery [17] and outlier detection [20][21]), since we can lower bound the measure and use a minor modification of triangular inequality, and thus avail of all existing indexing and data mining techniques. It is critical to note that the probl...
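The complexity-invariant distance this entry describes can be sketched in a few lines: Euclidean distance scaled by the ratio of the two series' complexity estimates, where complexity is estimated as the length of the series "stretched out" (the root sum of squared consecutive differences). This is a minimal reading of the paper's formula; function names are mine:

```python
import numpy as np

def complexity_estimate(t):
    # CE(T): sqrt of the sum of squared differences between
    # consecutive points -- a proxy for how "wiggly" the series is.
    t = np.asarray(t, dtype=float)
    return np.sqrt(np.sum(np.diff(t) ** 2))

def cid_distance(q, c):
    # CID(Q, C) = ED(Q, C) * max(CE(Q), CE(C)) / min(CE(Q), CE(C)).
    # The correction factor is >= 1, so CID never undercuts plain
    # Euclidean distance; it inflates distances between series of
    # unequal complexity.
    q, c = np.asarray(q, dtype=float), np.asarray(c, dtype=float)
    ed = np.linalg.norm(q - c)
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    return ed * max(ce_q, ce_c) / min(ce_q, ce_c)
```

When the two series have equal complexity estimates, the correction factor is 1 and CID reduces exactly to Euclidean distance.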

Automated Load Curve Data Cleansing in Power Systems

by Jiyi Chen, Wenyuan Li, Adriel Lau, Jiguo Cao, Ke Wang
Cited by 16 (2 self)
Abstract—Load curve data refers to the electric energy consumption recorded by meters at certain time intervals at delivery points or end user points, and contains vital information for day-to-day operations, system analysis, system visualization, system reliability performance, energy saving and adequacy in system planning. Unfortunately, it is unavoidable that load curves contain corrupted data and missing data due to various random failure factors in meters and transfer processes. This paper presents the B-Spline smoothing and Kernel smoothing based techniques to automatically cleanse corrupted and missing data. In implementation, a man–machine dialogue procedure is proposed to enhance the performance. The experimental results on the real British Columbia Transmission Corporation (BCTC) load curve data demonstrated the effectiveness of the presented solution. Index Terms—Load management, load modeling, power systems, smoothing methods, power quality.
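One of the two cleansing techniques the abstract names, kernel smoothing, can be sketched as a Nadaraya-Watson smoother with a Gaussian kernel. This is a generic illustration of the technique, not the paper's exact procedure; the NaN-handling convention and function name are my assumptions:

```python
import numpy as np

def kernel_smooth(t, y, bandwidth):
    """Nadaraya-Watson kernel smoother with a Gaussian kernel.
    NaN entries in y (e.g. missing meter readings) are ignored, so the
    smoother both denoises corrupted points and fills gaps."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    valid = ~np.isnan(y)
    out = np.empty_like(t)
    for k, tk in enumerate(t):
        # Gaussian weights centered on the query time tk.
        w = np.exp(-0.5 * ((t[valid] - tk) / bandwidth) ** 2)
        out[k] = np.sum(w * y[valid]) / np.sum(w)
    return out
```

A missing sample in an otherwise linear load curve is reconstructed from its neighbors' weighted average, which for symmetric neighbors recovers the linear trend almost exactly.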

Temporal outlier detection in vehicle traffic data

by Xiaolei Li, Zhenhui Li, Jiawei Han, Jae-gil Lee - In ICDE ’09
Cited by 15 (2 self)
Abstract — Outlier detection in vehicle traffic data is a practical problem that has gained traction lately due to an increasing capability to track moving vehicles in city roads. In contrast to other applications, this particular domain includes a very dynamic dimension: time. Many existing algorithms have studied the problem of outlier detection at a single instant in time. This study proposes a method for detecting temporal outliers with an emphasis on historical similarity trends between data points. Outliers are calculated from drastic changes in the trends. Experiments with real world traffic data show that this approach is effective and efficient.

Citation Context

...], which detects local outliers. The temporal neighborhood vector can be viewed as a historical record of local neighbors. In time series research, algorithms have been proposed to find outliers [5], [11]. One algorithm [5] defines outliers as time series that have very distant nearest neighbors. Compared to TOD, this definition is very simplistic in the sense that it only measures entire time series ...

Outlier Detection for Temporal Data: A Survey

by Manish Gupta, Jing Gao, Charu C. Aggarwal, Jiawei Han
Cited by 6 (0 self)
Abstract—In the statistics community, outlier detection for time series data has been studied for decades. Recently, with advances in hardware and software technology, there has been a large body of work on temporal outlier detection from a computational perspective within the computer science community. In particular, advances in hardware technology have enabled the availability of various forms of temporal data collection mechanisms, and advances in software technology have enabled a variety of data management mechanisms. This has fueled the growth of different kinds of data sets such as data streams, spatiotemporal data, distributed streams, temporal networks, and time series data, generated by a multitude of applications. There arises a need for an organized and detailed study of the work done in the area of outlier detection with respect to such temporal datasets. In this survey, we provide a comprehensive and structured overview of a large set of interesting outlier definitions for various forms of temporal data, novel techniques, and application scenarios in which specific definitions and techniques have been widely used. Index Terms—temporal outlier detection, time series data, data streams, distributed data streams, temporal networks, spatiotemporal outliers

Citation Context

...n distance, while Compression-based Dissimilarity Measure (CDM) is used as a distance measure in [94]. Yankov et al. [95] solve the problem for a large time series stored on the disk. Chen et al. [96] define the subsequence outlier detection problem for an unequal interval time series which is a sequence X = 〈v1 = (x1, ...

Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL

by Bing Hu, Thanawin Rakthanmanon, Yuan Hao, Scott Evans, Stefano Lonardi, Eamonn Keogh
Cited by 6 (4 self)
Abstract—Most algorithms for mining or indexing time series data do not operate directly on the original data, but instead they consider alternative representations that include transforms, quantization, approximation, and multi-resolution abstractions. Choosing the best representation and abstraction level for a given task/dataset is arguably the most critical step in time series data mining. In this paper, we investigate techniques to discover the natural intrinsic representation model, dimensionality and alphabet cardinality of a time series. The ability to discover these intrinsic features has implications beyond selecting the best parameters for particular algorithms, as characterizing data in such a manner is useful in its own right and an important sub-routine in algorithms for classification, clustering and outlier discovery. We will frame the discovery of these intrinsic features in the Minimal Description Length (MDL) framework. Extensive empirical tests show that our method is simpler, more general and significantly more accurate than previous methods, and has the important advantage of being essentially parameter-free.

Citation Context

..., as characterizing data in such a manner is useful in its own right to understand/describe the data and an important subroutine in algorithms for classification, clustering and outlier discovery [27][37]. To illustrate this, consider the three unrelated datasets in Figure 1. [Figure 1. Three unrelated industrial tim...]

Multiresolution Motif Discovery in Time Series

by Nuno Castro, Paulo Azevedo
Cited by 5 (1 self)
Time series motif discovery is an important problem with applications in a variety of areas that range from telecommunications to medicine. Several algorithms have been proposed to solve the problem. However, these algorithms heavily use expensive random disk accesses or assume the data can fit into main memory. They only consider motifs at a single resolution and are not suited to interactivity. In this work, we tackle the motif discovery problem as an approximate Top-K frequent subsequence discovery problem. We fully exploit the state-of-the-art iSAX representation's multiresolution capability to obtain motifs at different resolutions. This property yields interactivity, allowing the user to navigate along the Top-K motifs structure. This permits a deeper understanding of the time series database. Further, we apply the ...

Citation Context

...e [4, 13, 21] which is inefficient. It is known in the database community that accessing 10% of a disk database randomly takes essentially the same time as traversing the entire database sequentially [18]. Even for moderate sized datasets this becomes an issue. Other techniques [6, 10, 16, 17] tackle this problem by putting the entire dataset in main memory. The assumption that the data can fit in ma...
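The iSAX representation this entry builds on is a multiresolution extension of the SAX discretization: z-normalize, reduce the series to segment means (piecewise aggregate approximation), then map each mean to a symbol using breakpoints that split the standard normal into equal-probability regions. A minimal single-resolution SAX sketch, assuming an alphabet of size 4 (breakpoints ±0.6745 and 0, the standard-normal quartiles) and a series length divisible by the segment count; function names are mine:

```python
import numpy as np

# Breakpoints for alphabet size 4: quartiles of the standard normal.
BREAKPOINTS = np.array([-0.6745, 0.0, 0.6745])

def sax_word(series, n_segments):
    """Convert a series to a SAX word over the alphabet 'a'..'d'.
    Assumes len(series) is divisible by n_segments."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / x.std()                    # z-normalize
    paa = x.reshape(n_segments, -1).mean(axis=1)    # piecewise aggregate approx.
    symbols = np.searchsorted(BREAKPOINTS, paa)     # indices 0..3
    return ''.join('abcd'[s] for s in symbols)
```

A steadily increasing series maps to a monotonically increasing word, e.g. an upward ramp split into four segments yields 'abcd'; iSAX then refines or coarsens such words by varying the per-symbol cardinality.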

Data Editing Techniques to Allow the Application of Distance-Based Outlier Detection to Streams

by Vit Niennattrakul, Eamonn Keogh, Chotirat Ann Ratanamahatana
Cited by 2 (1 self)
Abstract — The problem of finding outliers in data has broad applications in areas as diverse as data cleaning, fraud detection, network monitoring, invasive species monitoring, etc. While there are dozens of techniques that have been proposed to solve this problem for static data collections, very simple distance-based outlier detection methods are known to be competitive or superior to more complex methods. However, distance-based methods have time and space complexities that make them impractical for streaming data and/or resource limited sensors. In this work, we show that simple data-editing techniques can make distance-based outlier detection practical for very fast streams and resource limited sensors. Our technique generalizes to produce two algorithms, which, relative to the original algorithm, can guarantee to produce no false positives, or guarantee to produce no false negatives. Our methods are independent of both data type and distance measure, and are thus broadly applicable.

Citation Context

...s shown in Figure 1. While this algorithm is very simple and, apart from the threshold value, completely parameter free, it is surprisingly effective, as previous works (with minor differences [5][16]) and later experiments will show. In this work, we introduce a solution to this problem using a data editing technique [9]. A simple search-based technique uses some heuristic functions to reduce the...


Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University