HOT SAX: Efficiently finding the most unusual time series subsequence (2005)

by Eamonn Keogh, Jessica Lin

Citing documents: results 1-10 of 108

Anomaly Detection: A Survey

by Varun Chandola, Arindam Banerjee, Vipin Kumar, 2007
Cited by 540 (5 self)
Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

Citation Context

...ect a subsequence within a given sequence which is anomalous with respect to the rest of the sequence. Such anomalous subsequences have also been referred as discords [Bu et al. 2007; Fu et al. 2006; Keogh et al. 2005; Yankov et al. 2007]. This problem formulation occurs in event and time-series data sets where the data is in the form of a long sequence and contains regions that are anomalous. The techniques that ...
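The discord definition cited here (Keogh et al. 2005) can be illustrated with the quadratic brute-force search that HOT SAX is meant to speed up: for every length-n subsequence, compute the distance to its nearest non-self match (a subsequence that does not overlap it) and report the subsequence whose nearest-neighbor distance is largest. The Python below is a minimal sketch of that baseline only; it does not implement the SAX-based ordering heuristics, and the function names and the use of raw (not z-normalized) Euclidean distance are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, n):
    """Return (start, nn_distance) of the length-n subsequence whose nearest
    non-overlapping neighbor is farthest away. Uses O(m^2) comparisons,
    where m is the number of subsequences."""
    m = len(series) - n + 1
    if m < 2:
        raise ValueError("series too short for the window length")
    best_loc, best_dist = None, -1.0
    for p in range(m):
        p_sub = series[p:p + n]
        nn = math.inf
        for q in range(m):
            if abs(p - q) < n:  # skip self-matches (overlapping windows)
                continue
            nn = min(nn, euclidean(p_sub, series[q:q + n]))
        # a window with no non-overlapping partner has no defined nn distance
        if nn != math.inf and nn > best_dist:
            best_loc, best_dist = p, nn
    return best_loc, best_dist
```

For a series that alternates 0.0, 1.0 except for a single 9.0, built as `[0.0, 1.0] * 6 + [0.0, 9.0] + [0.0, 1.0] * 6`, a window length of 2 reports the window `[0.0, 9.0]` at index 12 with a nearest-neighbor distance of 8.0.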

Knowledge Discovery from Data Streams

by Joao Gama
Cited by 47 (4 self)
Abstract not found

Trajectory-Based Anomalous Event Detection

by Claudio Piciarelli, Christian Micheloni, Gian Luca Foresti
Cited by 45 (5 self)
In recent years, automatic event analysis in video sequences has gained increasing attention in the research community. The application domains are disparate, ranging from video surveillance to automatic video annotation for sport videos or TV shots. Whatever the application field, most work in event analysis is based on two main approaches: the first based on explicit event recognition, focused on finding high-level, semantic interpretations of video sequences, and the second based on anomaly detection. This paper deals with the second approach, where the final goal is not the explicit labeling of recognized events, but the detection of anomalous events differing from typical patterns. In particular, the proposed work addresses anomaly detection by means of trajectory analysis, an approach with several application fields, most notably video surveillance and traffic monitoring. The proposed approach is based on single-class support vector machine (SVM) clustering, where the novelty detection SVM capabilities are used for the identification of anomalous trajectories. Particular attention is given to trajectory classification in the absence of a priori information on the distribution of outliers. Experimental results prove the validity of the proposed approach. Index Terms: Anomaly detection, event analysis, support vector machines (SVMs), trajectory clustering.

Citation Context

... a good measure for outlier detection. To perform this test, we compared the proposed method with another simple yet very effective outlier detection technique, based on the concept of discords [39], [40]: a discord is defined as the trajectory maximizing its Euclidean distance from the nearest neighbor i... (Footnote 1: The data sets used in this section are available at http://avires.dimi.uniud.it/papers/trclust)
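The whole-trajectory discord described in this excerpt can be sketched as follows. The sketch assumes every trajectory has already been resampled to the same number of (x, y) points so that a Euclidean distance between trajectories is defined; the resampling step and the function names are assumptions for illustration, not details taken from the cited paper.

```python
import math


def traj_distance(a, b):
    """Euclidean distance between two trajectories of equal length,
    each given as a list of (x, y) points."""
    if len(a) != len(b):
        raise ValueError("trajectories must be resampled to equal length")
    return math.sqrt(sum((ax - bx) ** 2 + (ay - by) ** 2
                         for (ax, ay), (bx, by) in zip(a, b)))


def trajectory_discord(trajs):
    """Return (index, nn_distance) of the trajectory whose nearest
    neighbor (any other trajectory) is farthest away."""
    if len(trajs) < 2:
        raise ValueError("need at least two trajectories")
    best_i, best_d = None, -1.0
    for i, t in enumerate(trajs):
        nn = min(traj_distance(t, u) for j, u in enumerate(trajs) if j != i)
        if nn > best_d:
            best_i, best_d = i, nn
    return best_i, best_d
```

With three nearly identical straight paths and one path that bulges away from them, the bulging path is returned, since its nearest neighbor is much farther away than the straight paths are from each other.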

Semi-Supervised Time Series Classification

by Li Wei et al.
Cited by 38 (2 self)
The problem of time series classification has attracted great interest in the last decade. However, current research assumes the existence of large amounts of labeled training data. In reality, such data may be very difficult or expensive to obtain. For example, it may require the time and expertise of cardiologists, space launch technicians, or other domain specialists. As in many other domains, there are often copious amounts of unlabeled data available. For example, the PhysioBank archive contains gigabytes of ECG data. In this work we propose a semi-supervised technique for building time series classifiers. While such algorithms are well known in text domains, we will show that special considerations must be made to make them both efficient and effective for the time series domain. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, handwritten documents, manufacturing, and video datasets. The experimental results demonstrate that our approach requires only a handful of labeled examples to construct accurate classifiers.
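One common way to exploit unlabeled data of this kind is self-training with a one-nearest-neighbor classifier. The sketch below assumes a positive-class setting: it starts from a handful of labeled positive series and repeatedly moves the unlabeled series closest to the current positive set into that set. The fixed number of additions, `n_add`, is a hypothetical stand-in for a stopping criterion, which the abstract does not describe, so this should not be read as the paper's exact algorithm.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train_positive(labeled, unlabeled, n_add):
    """Grow the positive set by repeatedly adding the unlabeled series whose
    nearest labeled (positive) neighbor is closest.

    Returns (positives, remaining_unlabeled). `n_add` is a hypothetical
    fixed iteration budget standing in for a real stopping criterion."""
    positives = [list(s) for s in labeled]
    pool = [list(s) for s in unlabeled]
    for _ in range(min(n_add, len(pool))):
        # distance from each unlabeled series to its nearest positive
        dists = [min(euclidean(u, p) for p in positives) for u in pool]
        k = dists.index(min(dists))
        positives.append(pool.pop(k))
    return positives, pool
```

Because each step adds only the single most confident example, the positive set grows outward from the labeled seeds, which is why a reliable stopping rule matters in practice.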

Change-point detection in time-series data by direct density-ratio estimation

by Yoshinobu Kawahara, Masashi Sugiyama - Proceedings of the 2009 SIAM International Conference on Data Mining (SDM2009), 2009
Cited by 25 (5 self)
Change-point detection is the problem of discovering time points at which properties of time-series data change. This covers a broad range of real-world problems and has been actively discussed in the community of statistics and data mining. In this paper, we present a novel non-parametric approach to detecting the change of probability distributions of sequence data. Our key idea is to estimate the ratio of probability densities, not the probability densities themselves. This formulation allows us to avoid non-parametric density estimation, which is known to be a difficult problem. We provide a change-point detection algorithm based on direct density-ratio estimation that can be computed very efficiently in an online manner. The usefulness of the proposed method is demonstrated through experiments using artificial and real datasets.

Citation Context

... dataset contains 15 time-series data, each of which records patients’ respiration measured by thorax extension and every time period is manually annotated by a medical expert as ‘awake’, ‘sleep’ etc. [19]. Two examples of the original time-series as well as the annotation results are depicted in the top graphs of Figure 5. The task is to detect the time points at which the state of patients changes fr...

Real-time motion trajectory-based indexing and retrieval of video sequences

by Faisal I. Bashir, Ashfaq A. Khokhar, Dan Schonfeld - IEEE Trans. Multimedia, 2007
Cited by 22 (2 self)
This paper presents a novel motion trajectory-based compact indexing and efficient retrieval mechanism for video sequences. Assuming trajectory information is already available, we represent trajectories as temporal ordering of subtrajectories. This approach solves the problem of trajectory representation when only partial trajectory information is available due to occlusion. It is achieved by a hypothesis testing-based method applied to curvature data computed from trajectories. The subtrajectories are then represented by their principal component analysis (PCA) coefficients for optimally compact representation. Different techniques are integrated to index and retrieve subtrajectories, including PCA, spectral clustering, and string matching. We assume a query-by-example mechanism where an example trajectory is presented to the system and the search system returns a ranked list of most similar items in the dataset. Experiments based on datasets obtained from University of California at Irvine’s KDD archives and Columbia University’s DVMM group demonstrate the superiority of our proposed PCA-based approaches in terms of indexing and retrieval times and precision-recall ratios, when compared to other techniques in the literature. Index Terms: Principal component analysis, spectral clustering, string matching, trajectory retrieval.
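The abstract segments trajectories using curvature data computed from the trajectory points. A minimal sketch of that input quantity is the discrete signed curvature, kappa = (x' y'' - y' x'') / (x'^2 + y'^2)^(3/2), with derivatives estimated by central differences. The difference scheme and unit sample spacing are assumptions here, and the paper's hypothesis test on the curvature values is not reproduced.

```python
import math


def discrete_curvature(xs, ys):
    """Signed curvature at the interior points of a sampled 2-D trajectory.

    First and second derivatives are estimated with central differences,
    assuming unit spacing between samples (curvature does not depend on
    the spacing as long as it is uniform)."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length coordinate lists with >= 3 points")
    kappa = []
    for i in range(1, len(xs) - 1):
        dx = (xs[i + 1] - xs[i - 1]) / 2.0
        dy = (ys[i + 1] - ys[i - 1]) / 2.0
        ddx = xs[i + 1] - 2 * xs[i] + xs[i - 1]
        ddy = ys[i + 1] - 2 * ys[i] + ys[i - 1]
        speed_sq = dx * dx + dy * dy
        # a stationary point has undefined curvature; report 0.0 there
        kappa.append(0.0 if speed_sq == 0 else (dx * ddy - dy * ddx) / speed_sq ** 1.5)
    return kappa
```

A straight path yields zero curvature everywhere, while points sampled counterclockwise on a circle of radius R yield values close to 1/R, so abrupt changes in this sequence are natural places to split a trajectory.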

Citation Context

...e viewed as a time series when - and -projections are combined for representation. There has been tremendous amount of activity in time series representation and retrieval in recent years. Lin et al. [20], [21], have presented a symbolic representation of a time-series approach (SAX) using piecewise aggregate approximation (PAA). Although quite close to our string matching-based system, there are two ...
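The SAX pipeline referred to here (z-normalize the series, reduce it with PAA, then map each PAA coefficient to a symbol) can be sketched in a few lines of Python. The breakpoints split the standard normal distribution into equiprobable regions and are computed with `statistics.NormalDist().inv_cdf` rather than taken from a lookup table. The requirement that the series length be a multiple of the word length `w` is a simplifying assumption of this sketch.

```python
import statistics


def znorm(series):
    """Z-normalize a series (a constant series maps to all zeros)."""
    mu = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0.0] * len(series)
    return [(x - mu) / sd for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w equal-length
    frames. Assumes len(series) is a multiple of w."""
    n = len(series)
    if n % w:
        raise ValueError("this sketch assumes len(series) % w == 0")
    step = n // w
    return [statistics.fmean(series[i:i + step]) for i in range(0, n, step)]


def sax_word(series, w, alphabet_size):
    """SAX word: z-normalize, apply PAA, then discretize each coefficient
    with breakpoints that cut N(0, 1) into equiprobable regions."""
    nd = statistics.NormalDist()
    cuts = [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    letters = "abcdefghijklmnopqrstuvwxyz"[:alphabet_size]
    word = []
    for v in paa(znorm(series), w):
        idx = sum(1 for c in cuts if v > c)  # number of breakpoints below v
        word.append(letters[idx])
    return "".join(word)
```

For example, `sax_word([1, 2, 3, 4, 5, 6, 7, 8], 4, 4)` returns `"abcd"`: the four PAA means of the z-normalized ramp fall into the four equiprobable regions in increasing order.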

Approximate embedding-based subsequence matching of time series

by Vassilis Athitsos, Panagiotis Papapetrou, Michalis Potamias, George Kollios, Dimitrios Gunopulos - In SIGMOD ’08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008
Cited by 21 (6 self)
A method for approximate subsequence matching is introduced, that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors is used to efficiently identify relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
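The final refinement step described above relies on exact DTW-based subsequence matching. A minimal sketch of that exact step is the standard subsequence-DTW recurrence, in which the whole query is aligned while the match may start and end anywhere in the database sequence (row 0 of the cost table is all zeros). This is not the EBSM embedding itself, and the absolute difference as the local cost is an assumption.

```python
import math


def subsequence_dtw(query, series):
    """Best DTW alignment of the whole query against any contiguous part of
    `series`. Returns (cost, end_index), with end_index 0-based in `series`.
    The local cost is the absolute difference between values."""
    m, n = len(query), len(series)
    if m == 0 or n == 0:
        raise ValueError("query and series must be non-empty")
    prev = [0.0] * (n + 1)  # row 0: the match may start at any position
    for i in range(1, m + 1):
        cur = [math.inf] * (n + 1)  # column 0: a query prefix cannot match nothing
        for j in range(1, n + 1):
            cost = abs(query[i - 1] - series[j - 1])
            cur[j] = cost + min(prev[j - 1], prev[j], cur[j - 1])
        prev = cur
    best_j = min(range(1, n + 1), key=lambda j: prev[j])
    return prev[best_j], best_j - 1
```

For example, matching the query `[1, 2, 3]` against `[0, 0, 1, 2, 2, 3, 0, 0]` gives a cost of 0.0 ending at index 5, because warping lets the repeated 2 be absorbed. The table costs O(mn) time, which is exactly why EBSM restricts this step to a few areas of interest.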

Escalation: Complex Event Detection in Wireless Sensor Networks

by Michael Zoumboulakis, George Roussos
Cited by 20 (2 self)
We present a new approach for the detection of complex events in Wireless Sensor Networks. Complex events are sets of data points that correspond to interesting or unusual patterns in the underlying phenomenon that the network monitors. Our approach is inspired by time-series data mining techniques and transforms a stream of real-valued sensor readings into a symbolic representation. Complex event detection is then performed using distance metrics, allowing us to detect events that are difficult or even impossible to describe using traditional declarative SQL-like languages and thresholds. We have tested our approach with four distinct data sets and the experimental results were encouraging in all cases. We have implemented our approach for the TinyOS and Contiki Operating Systems, for the Sky mote platform.

Citation Context

...tively much smaller periods of rare events. 2 Conversion of Streaming Sensor Data to a Symbolic Representation For the conversion to string we use the Symbolic Aggregate Approximation (SAX) algorithm [10,11] which is a very mature and robust solution for mining time-series data. SAX creates an approximation of the original data by reducing its original size while keeping the essential features; this fac...
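The Escalation abstract says only that detection is performed with distance metrics on the symbolic strings. One distance defined for SAX words is MINDIST, which lower-bounds the Euclidean distance between the underlying z-normalized series; the sketch below shows it as an example, with the caveat that the paper may use other metrics. Symbols are assumed to be lowercase letters starting at 'a', and `n` is the length of the original series that each word summarizes.

```python
import math
import statistics


def sax_breakpoints(alphabet_size):
    """Breakpoints that cut N(0, 1) into alphabet_size equiprobable regions."""
    nd = statistics.NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def mindist(word_a, word_b, n, alphabet_size):
    """SAX MINDIST between two equal-length words over 'a', 'b', ...
    that summarize z-normalized series of original length n."""
    if len(word_a) != len(word_b):
        raise ValueError("words must have equal length")
    beta = sax_breakpoints(alphabet_size)
    w = len(word_a)
    total = 0.0
    for ca, cb in zip(word_a, word_b):
        r, c = ord(ca) - ord("a"), ord(cb) - ord("a")
        if abs(r - c) > 1:
            # gap between the nearest edges of the two symbols' regions;
            # identical or adjacent symbols contribute zero
            d = beta[max(r, c) - 1] - beta[min(r, c)]
            total += d * d
    return math.sqrt(n / w) * math.sqrt(total)
```

Because identical and adjacent symbols contribute nothing, MINDIST is cheap to evaluate and never overestimates the true distance. A sensor node could therefore use it to discard clearly dissimilar windows before any finer comparison.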

SAXually Explicit Images: Finding Unusual Shapes

by Li Wei, Eamonn Keogh, Xiaopeng Xi - In Proceedings of the 2006 IEEE International Conference on Data Mining, Hong Kong, Dec. 2006
Cited by 20 (1 self)
Among the visual features of multimedia content, shape is of particular interest because humans can often recognize objects solely on the basis of shape. Over the past three decades, there has been a great deal of research on shape analysis, focusing mostly on shape indexing, clustering, and classification. In this work, we introduce the new problem of finding shape discords, the most unusual shapes in a collection. We motivate the problem by considering the utility of shape discords in diverse domains including zoology, anthropology, and medicine. While the brute force search algorithm has quadratic time complexity, we avoid this by using locality-sensitive hashing to estimate similarity between shapes which enables us to reorder the search more efficiently. An extensive experimental evaluation demonstrates that our approach can speed up computation by three to four orders of magnitude.

Citation Context

... no. We cannot leverage off the existing time series novelty detection techniques because most of them assume that time series subsequences are extracted by sliding a window across a long time series [17][18][34], while we have individual time series here. Another possibility would be to simply project the shape time series into n-dimensional space and use existing outlier detection methods [5][22]. T...

Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized

by Dragomir Yankov, Eamonn Keogh
Cited by 20 (6 self)
The problem of finding unusual time series has recently attracted much attention, and several promising methods are now in the literature. However, virtually all proposed methods assume that the data reside in main memory. For many real-world problems this is not the case. For example, in astronomy, multi-terabyte time series datasets are the norm. Most current algorithms faced with data which cannot fit in main memory resort to multiple scans of the disk/tape and are thus intractable. In this work we show how one particular definition of unusual time series, the time series discord, can be discovered with a disk aware algorithm. The proposed algorithm is exact and requires only two linear scans of the disk with a tiny buffer of main memory. Furthermore, it is very simple to implement. We use the algorithm to provide further evidence of the effectiveness of the discord definition in areas as diverse as astronomy, web query mining, video surveillance, etc., and show the efficiency of our method on datasets which are many orders of magnitude larger than anything else attempted in the literature.
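The two-scan structure described in this abstract can be sketched as follows. The abstract does not spell out the mechanism, so the sketch assumes a caller-supplied distance threshold `r`: the first scan keeps a small set of candidates that have no neighbor closer than `r` among the candidates seen so far, and the second scan computes exact nearest-neighbor distances for the survivors, discarding any that turn out to have a neighbor within `r`. Here `scan` is a hypothetical zero-argument callable that yields `(id, series)` pairs, standing in for one sequential pass over the disk.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_scan_discord(scan, r):
    """Return (id, nn_distance) of the series with the largest nearest-neighbor
    distance among those whose nearest neighbor is at least r away, or None
    if no series qualifies (r too large). `scan()` yields (id, series) pairs."""
    # Scan 1: candidate selection. A series stays a candidate only while no
    # later series comes within r of it; only candidates are kept in memory.
    candidates = {}
    for sid, s in scan():
        is_candidate = True
        for cid in list(candidates):
            if euclidean(s, candidates[cid]) < r:
                del candidates[cid]
                is_candidate = False
        if is_candidate:
            candidates[sid] = s
    # Scan 2: refinement. Compare every series with the surviving candidates,
    # dropping false positives and tracking exact nearest-neighbor distances.
    nn = {cid: math.inf for cid in candidates}
    for sid, s in scan():
        for cid in list(candidates):
            if cid == sid:
                continue  # a series is not its own neighbor
            d = euclidean(s, candidates[cid])
            if d < r:
                del candidates[cid]
                del nn[cid]
            else:
                nn[cid] = min(nn[cid], d)
    if not candidates:
        return None
    best = max(nn, key=nn.get)
    return best, nn[best]
```

Any series whose true nearest neighbor is at least `r` away can never be removed in the first scan, so the result is exact whenever it is not None. Choosing `r` too low keeps many candidates in memory, while choosing it too high returns None and forces another run with a smaller threshold.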

Citation Context

...t our algorithm can tackle multi-gigabyte data sets containing tens of millions of time series in just a few hours. 2. Related Work And Background The time series discord definition was introduced in [13]. Since then, it has attracted considerable interest and followup work. For example, [6] provide independent confirmation of the utility of discords for discovering abnormal heartbeats, in [3] the aut...

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University