Results 1–10 of 30
Searching and mining trillions of time series subsequences under dynamic time warping. In SIGKDD, 2012.
Abstract

Cited by 43 (3 self)
Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at considering a few millions of time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can exactly search under DTW much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We show that our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. In addition to mining massive datasets, we show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
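The abstract does not spell out the four ideas, but two standard ingredients of fast exact DTW search, a Sakoe-Chiba warping constraint and LB_Keogh lower-bound pruning against a best-so-far threshold, can be sketched as follows. This is an illustrative Python sketch for equal-length sequences, not the authors' optimized implementation.

```python
import numpy as np

def dtw_distance(x, y, window):
    """Constrained DTW with a Sakoe-Chiba band of half-width `window`."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - window), min(m, i + window)
        for j in range(lo, hi + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

def lb_keogh(query, candidate, window):
    """LB_Keogh lower bound: distance from the candidate to the query's
    upper/lower envelope. Never exceeds the true DTW distance."""
    total = 0.0
    for i, c in enumerate(candidate):
        seg = query[max(0, i - window): i + window + 1]
        lo, hi = min(seg), max(seg)
        if c > hi:
            total += (c - hi) ** 2
        elif c < lo:
            total += (c - lo) ** 2
    return np.sqrt(total)

def nn_search(query, series_list, window=5):
    """1-NN under DTW, pruning with LB_Keogh before the full computation."""
    best, best_d = None, np.inf
    for idx, s in enumerate(series_list):
        if lb_keogh(query, s, window) >= best_d:
            continue  # cheap lower bound already beats best-so-far: prune
        d = dtw_distance(query, s, window)
        if d < best_d:
            best, best_d = idx, d
    return best, best_d
```

Early abandoning inside the DTW matrix itself (stopping a row once its minimum exceeds the best-so-far) is a further refinement omitted here for brevity.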
Approximate embedding-based subsequence matching of time series. In SIGMOD '08: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008.
Abstract

Cited by 21 (6 self)
A method for approximate subsequence matching is introduced that significantly improves the efficiency of subsequence matching in large time series data sets under the dynamic time warping (DTW) distance measure. Our method is called EBSM, shorthand for Embedding-Based Subsequence Matching. The key idea is to convert subsequence matching to vector matching using an embedding. This embedding maps each database time series into a sequence of vectors, so that every step of every time series in the database is mapped to a vector. The embedding is computed by applying full dynamic time warping between reference objects and each database time series. At runtime, given a query object, an embedding of that object is computed in the same manner, by running dynamic time warping between the reference objects and the query. Comparing the embedding of the query with the database vectors efficiently identifies relatively few areas of interest in the database sequences. Those areas of interest are then fully explored using the exact DTW-based subsequence matching algorithm. Experiments on a large, public time series data set produce speedups of over one order of magnitude compared to brute-force search, with very small losses (< 1%) in retrieval accuracy.
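The embedding step can be illustrated with a small sketch. Here each reference series is aligned against a database series with a start-unconstrained ("subsequence") DTW, and the final DTW row yields one embedding coordinate per database position; the query is embedded the same way, keeping only its final position. The function names and the top-k candidate selection are illustrative assumptions, not the paper's EBSM implementation.

```python
import numpy as np

def subsequence_dtw_row(ref, series):
    """For each position j of `series`, the cost of the best warping match
    of `ref` against a subsequence of `series` ending at j."""
    n, m = len(ref), len(series)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, :] = 0.0  # a match may start at any database position
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (ref[i - 1] - series[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, 1:]  # one embedding coordinate per database position

def embed_database(refs, series):
    """Stack one coordinate per reference: an R-dim vector per position."""
    return np.vstack([subsequence_dtw_row(r, series) for r in refs]).T

def embed_query(refs, query):
    """The query is embedded the same way, keeping only the final position."""
    return np.array([subsequence_dtw_row(r, query)[-1] for r in refs])

def candidate_positions(refs, series, query, k=3):
    """The k database positions whose embeddings are closest to the query's
    embedding -- the 'areas of interest' to refine with exact DTW matching."""
    E = embed_database(refs, series)
    q = embed_query(refs, query)
    dists = np.linalg.norm(E - q, axis=1)
    return np.argsort(dists)[:k]
```

The filter step is thus a cheap vector search; only the returned positions are re-examined with the exact DTW-based subsequence matcher.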
Accelerating dynamic time warping subsequence search with GPUs and FPGAs. In Proc. ICDM, 2010.
Abstract

Cited by 18 (3 self)
Many time series data mining problems require subsequence similarity search as a subroutine. While this can be performed with any distance measure, and dozens of distance measures have been proposed in the last decade, there is increasing evidence that Dynamic Time Warping (DTW) is the best measure across a wide range of domains. Given DTW's usefulness and ubiquity, there has been a large community-wide effort to mitigate its relative lethargy. Proposed speedup techniques include early abandoning strategies, lower-bound-based pruning, indexing, and embedding. In this work we argue that we are now close to exhausting all possible speedup from software, and that we must turn to hardware-based solutions if we are to tackle the many problems that are currently untenable even with state-of-the-art algorithms running on high-end desktops. With this motivation, we investigate both GPU (Graphics Processing Unit) and FPGA (Field Programmable Gate Array) based acceleration of subsequence similarity search under the DTW measure. As we shall show, our novel algorithms allow GPUs, which are typically bundled with standard desktops, to achieve two orders of magnitude speedup. For problem domains which require even greater scale-up, we show that FPGAs costing just a few thousand dollars can be used to produce four orders of magnitude speedup. We conduct detailed case studies on the classification of astronomical observations and similarity search in commercial agriculture, and demonstrate that our ideas allow us to tackle problems that would be simply untenable otherwise.
Keywords: time series; similarity search; dynamic time warping; FPGA; GPU.
V.: A survey of query-by-humming similarity methods. In: Conf. on Pervasive Technologies Related to Assistive Environments (PETRA), 2012.
Abstract

Cited by 3 (0 self)
Performing similarity search in large databases is a problem of particular interest in many communities, such as music, database, and data mining. Although several solutions have been proposed in the literature that perform well in many application domains, there is no best method for solving this kind of problem in a Query-By-Humming (QBH) application. In QBH the goal is to find the song(s) most similar to a hummed query in an efficient manner. In this paper, we focus on providing a brief overview of the representations used to encode music pieces, and also of the methods that have been proposed for QBH or other similarly defined problems.
Efficient Processing of Warping Time Series Join of Motion Capture Data
Abstract

Cited by 3 (0 self)
Discovering non-trivial matching subsequences from two time series is very useful in synthesizing novel time series. This can be applied to applications such as motion synthesis, where smooth and natural motion sequences often need to be generated from existing motion sequences. We first address this problem by defining it as a problem of lε-join over two time series. Given two time series, the goal of lε-join is to find those non-trivial matching subsequences by detecting maximal l-connections from the ε-matching matrix of the two time series. Given a query motion sequence, the lε-join can be applied to retrieve all connectable motion sequences from a database of motion sequences. To support efficient lε-join of time series, we propose a two-step filter-and-refine algorithm, called the Warping Time Series Join (WTSJ) algorithm. The filtering step prunes those sparse regions of the ε-matching matrix where there are no maximal l-connections, without incurring costly computation. The refinement step detects closed l-connections within regions that cannot be pruned by the filtering step. To speed up the computation of the ε-matching matrix, we propose a block-based time series summarization method, based on which the block-wise ε-matching matrix is first computed. Much of the pairwise distance computation between elements can then be avoided by applying the filtering algorithm on the block-wise ε-matching matrix. Extensive experiments on lε-join of motion capture sequences are conducted. The results confirm the efficiency and effectiveness of our proposed algorithm in processing lε-join of motion capture time series.
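A minimal sketch of the basic objects involved: the ε-matching matrix of two series, and the detection of long matching runs within it. Here an l-connection is simplified to a diagonal run of length at least l; the paper's actual definition (and its filter-and-refine machinery) is more general than this illustration.

```python
import numpy as np

def eps_matching_matrix(x, y, eps):
    """M[i, j] is True when |x[i] - y[j]| <= eps."""
    x, y = np.asarray(x), np.asarray(y)
    return np.abs(x[:, None] - y[None, :]) <= eps

def diagonal_connections(M, l):
    """Maximal diagonal runs of True cells of length >= l, returned as
    (start_i, start_j, length) triples. A simplification: the paper's
    l-connections may also admit warping (non-diagonal) steps."""
    n, m = M.shape
    runs = []
    for d in range(-(n - 1), m):        # every diagonal j - i = d
        i, j = max(0, -d), max(0, d)
        length = 0
        while i < n and j < m:
            if M[i, j]:
                length += 1
            else:
                if length >= l:
                    runs.append((i - length, j - length, length))
                length = 0
            i += 1
            j += 1
        if length >= l:                  # run reaching the matrix border
            runs.append((i - length, j - length, length))
    return runs
```

The filtering idea then amounts to computing this matrix block-wise on summarized series first, so that blocks provably containing no run of length l are skipped entirely.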
Online Detecting and Predicting Special Patterns over Financial Data Streams
Abstract

Cited by 3 (0 self)
Detecting special patterns online over financial data streams is an interesting and significant problem. Many existing algorithms treat it as a subsequence similarity matching problem. However, pattern detection on streaming time series is inherently expensive under this formulation. An efficient segmenting algorithm, ONSP (ONline Segmenting and Pruning), is proposed, which is used to find the end points of special patterns. Moreover, a novel metric distance function is introduced which agrees more closely with human perception of pattern similarity. During the process, our system employs a pattern matching algorithm to efficiently match possible emerging patterns among data streams, and a probability prediction approach to predict the possible patterns which have not yet emerged in the system. Experimental results show that these approaches are effective and efficient for online pattern detection and prediction over thousands of financial data streams.
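The abstract does not give ONSP's details, but the general idea of online segmentation, growing a window until a linear fit breaks down and emitting an end point there, can be sketched as follows. This generic sliding-window scheme is only an illustration of the segmentation step; it is not the ONSP algorithm's actual segmenting and pruning rules.

```python
def sliding_window_segments(stream, max_error):
    """Emit segment end points online: grow a window over the stream until
    the best-fit line's maximum residual exceeds `max_error`, record the
    last point that still fit, and restart from the breaking point."""
    ends = []
    start = 0
    for t in range(2, len(stream) + 1):
        seg = stream[start:t]
        n = len(seg)
        # least-squares line through the current window
        xs = range(n)
        mean_x = (n - 1) / 2.0
        mean_y = sum(seg) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, seg))
        slope = sxy / sxx if sxx else 0.0
        resid = max(abs(y - (mean_y + slope * (x - mean_x)))
                    for x, y in zip(xs, seg))
        if resid > max_error:
            ends.append(t - 2)   # last point that still fit is an end point
            start = t - 1        # new segment begins at the breaking point
    return ends
```

Each emitted end point is a candidate boundary at which a pattern matcher could be invoked on the just-closed segment.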
A Generic Framework for Efficient and Effective Subsequence Retrieval, 2012.
Abstract

Cited by 2 (0 self)
This paper proposes a general framework for matching similar subsequences in both time series and string databases. The matching results are pairs of query subsequences and database subsequences. The framework finds all possible pairs of similar subsequences if the distance measure satisfies the "consistency" property, which is a property introduced in this paper. We show that the most popular distance functions, such as the Euclidean distance, DTW, ERP, and the Fréchet distance for time series, and the Hamming distance and Levenshtein distance for strings, are all "consistent". We also propose a generic index structure for metric spaces named the "reference net". The reference net occupies O(n) space, where n is the size of the dataset, and is optimized to work well with our framework. The experiments demonstrate the ability of our method to improve retrieval performance when combined with diverse distance measures. The experiments also illustrate that the reference net scales well in terms of space overhead and query time.
Generalizing Dynamic Time Warping to the Multi-Dimensional Case Requires an Adaptive Approach
Abstract

Cited by 2 (1 self)
This paper is an extended version of our SDM 2015 paper [a]. We are making this version available early as a service to the community (as multi-dimensional DTW has become ubiquitous with the prevalence of wearable sensors) and to solicit feedback and corrections.
M.S.: Subsequence matching of stream synopses under the time warping distance. In PAKDD 2010, LNCS, 2010.
Abstract

Cited by 1 (0 self)
In this paper, we propose a method for online subsequence matching between histogram-based stream synopsis structures under the dynamic time warping distance. Given a query synopsis pattern, the method continuously identifies all the matching subsequences for a stream as the histograms are generated. To effectively reduce the computation time, we design a Weighted Dynamic Time Warping (WDTW) algorithm which computes the warping distance directly between two histogram-based synopses. Our experiments on real datasets show that the proposed method significantly speeds up the pattern matching while sacrificing only a little accuracy.
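The idea of computing a warping distance directly between histogram synopses can be sketched as follows, where each synopsis is a list of (height, width) bars and the per-cell cost weights the height difference by the bars' widths. The specific weighting used here is an assumption for illustration; the paper's WDTW algorithm defines its own weighting.

```python
import math

def weighted_dtw(s1, s2):
    """DTW directly between two histogram synopses: each synopsis is a list
    of (height, width) bars. The per-cell cost weights the height difference
    by the mean of the two bar widths -- an assumed weighting, for
    illustration only; the paper's WDTW defines its own weights."""
    n, m = len(s1), len(s2)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        h1, w1 = s1[i - 1]
        for j in range(1, m + 1):
            h2, w2 = s2[j - 1]
            cost = abs(h1 - h2) * (w1 + w2) / 2.0  # width-weighted difference
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

The payoff is that the DP table has one cell per histogram bar rather than one per raw sample, which is where the claimed speedup comes from.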
AutoPlait: Automatic mining of co-evolving time sequences. In SIGMOD, 2014.
Abstract

Cited by 1 (1 self)
Given a large collection of co-evolving multiple time series, which contains an unknown number of patterns of different durations, how can we efficiently and effectively find typical patterns and the points of variation? How can we statistically summarize all the sequences and achieve a meaningful segmentation? In this paper we present AUTOPLAIT, a fully automatic mining algorithm for co-evolving time sequences. Our method has the following properties: (a) effectiveness: it operates on large collections of time series, and finds similar segment groups that agree with human intuition; (b) scalability: it is linear in the input size, and thus scales up very well; and (c) AUTOPLAIT is parameter-free, and requires no user intervention, no prior training, and no parameter tuning. Extensive experiments on 67GB of real datasets demonstrate that AUTOPLAIT does indeed detect meaningful patterns correctly, and it outperforms state-of-the-art competitors as regards accuracy and speed: AUTOPLAIT achieves near-perfect, over 95% precision and recall, and it is up to 472 times faster than its competitors.