## Time Series Shapelets: A New Primitive for Data Mining

Citations: 25 (7 self)

### BibTeX

@MISC{Ye_timeseries,
    author = {Lexiang Ye and Eamonn Keogh},
    title = {Time Series Shapelets: A New Primitive for Data Mining},
    year = {}
}

### Abstract

Classification of time series has been attracting great interest over the past decade. Recent empirical evidence has strongly suggested that the simple nearest neighbor algorithm is very difficult to beat for most time series problems. While this may be considered good news, given the simplicity of implementing the nearest neighbor algorithm, there are some negative consequences of this. First, the nearest neighbor algorithm requires storing and searching the entire dataset, resulting in a time and space complexity that limits its applicability, especially on resource-limited sensors. Second, beyond mere classification accuracy, we often wish to gain some insight into the data. In this work we introduce a new time series primitive, time series shapelets, which addresses these limitations. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. As we shall show with extensive empirical evaluations in diverse domains, algorithms based on the time series shapelet primitives can be interpretable, more accurate and significantly faster than state-of-the-art classifiers.
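The informal definition above can be made concrete with a small sketch. It assumes a plain Euclidean distance between a candidate subsequence and the best-matching window of each series, and scores a candidate by the information gain of the split its distances induce (the citation contexts below mention information gain and entropy). The function names, the toy data, and the fixed threshold are illustrative assumptions, not the authors' implementation.

```python
import math


def subsequence_dist(series, candidate):
    """Smallest Euclidean distance between `candidate` and any
    same-length window of `series` (hypothetical helper)."""
    length = len(candidate)
    best = math.inf
    for start in range(len(series) - length + 1):
        d = math.sqrt(sum((series[start + i] - candidate[i]) ** 2
                          for i in range(length)))
        best = min(best, d)
    return best


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(distances, labels, threshold):
    """Entropy reduction when objects are split by distance <= threshold."""
    near = [y for d, y in zip(distances, labels) if d <= threshold]
    far = [y for d, y in zip(distances, labels) if d > threshold]
    n = len(labels)
    return (entropy(labels)
            - (len(near) / n) * entropy(near)
            - (len(far) / n) * entropy(far))


# Toy data (made up): class A contains a small bump, class B does not.
dataset = [
    ([0, 0, 1, 2, 1, 0, 0], "A"),
    ([0, 1, 2, 1, 0, 0, 0], "A"),
    ([0, 0, 0, 0, 0, 0, 0], "B"),
    ([1, 1, 1, 1, 1, 1, 1], "B"),
]
candidate = [1, 2, 1]  # assumed shapelet candidate
labels = [y for _, y in dataset]
distances = [subsequence_dist(s, candidate) for s, _ in dataset]
gain = information_gain(distances, labels, threshold=0.5)
print(distances, gain)
```

In this toy case the candidate matches both class-A series exactly and neither class-B series, so a threshold of 0.5 separates the classes completely. A real search would also have to choose the threshold and compare many candidates, which this sketch leaves out.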

### Citations

4457 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context ... some metric to evaluate how well it can divide the entire combined dataset into two original classes. Here, we use concepts very similar to the information gain used in the traditional decision tree [2]. The reader may recall the original definition of entropy which we review here: Definition 6: Entropy. A time series dataset D consists of two classes, A and B. Given that the proportion of objects i...

505 | Individual Comparisons by Ranking Methods
- Wilcoxon
- 1945
Citation Context ...e reader will recall that we used the information gain (or entropy) as that measure. However, there are other commonly used measures for distribution evaluation, such as the Wilcoxon signed-rank test [13]. We adopted the entropy evaluation for two reasons. First, it is easily generalized to the multi-class problem. Second, as we will now show, we can use a novel idea called early entropy pruning to av...

237 | On the need for time series data mining benchmarks: a survey and empirical demonstration
- Keogh, Kasetty
- 2002
Citation Context ...mmarized as the following: “Urtica dioica has a stem that connects to the leaf at almost 90 degrees.” Most other state-of-the-art time series/shape classifiers do not produce interpretable results [4][7]. • Shapelets can be significantly more accurate/robust on some datasets. This is because they are local features, whereas most other state-of-the-art time series/shape classifiers consider global fea...

172 | On comparing classifiers: Pitfalls to avoid and a recommended approach
- Salzberg
- 1997
Citation Context ..., Experimentation 1. INTRODUCTION While the last decade has seen a huge interest in time series classification, to date the most accurate and robust method is the simple nearest neighbor algorithm [4][12][14]. While the nearest neighbor algorithm has the advantages of simplicity and not requiring extensive parameter tuning, it does have several important disadvantages. Chief among these are its space ...

130 | Probabilistic Discovery of Time Series Motifs
- Chiu, Keogh, et al.
Citation Context ...e importantly, an admissible pruning technique that can prune off more than 99.9% of the calculations (cf. Section 5.1). Our work may also be seen as a form of a supervised motif discovery algorithm [3]. 2.1 Notation Table 1 summarizes the notation in the paper; we expand on the definitions below. Table 1: Symbol table. Symbol: Explanation; T, R: time series; S: subsequence; m, |T|: length of time series; l,...

69 | Querying and mining of time series data: experimental comparison of representations and distance measures
- Ding, Trajcevski, et al.
Citation Context ...hms, Experimentation 1. INTRODUCTION While the last decade has seen a huge interest in time series classification, to date the most accurate and robust method is the simple nearest neighbor algorithm [4][12][14]. While the nearest neighbor algorithm has the advantages of simplicity and not requiring extensive parameter tuning, it does have several important disadvantages. Chief among these are its sp...

58 | Pattern Extraction for Time Series Classification
- Geurts
- 2001

43 | LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures
- Keogh, Wei, et al.
Citation Context ...ighlighted section of the time series will be made apparent shortly. Such representations have been successfully used for the classification, clustering and outlier detection of shapes in recent years [8]. However, here we find that using a nearest neighbor classifier with either the (rotation invariant) Euclidean distance or Dynamic Time Warping (DTW) distance does not significantly outperform random...

40 | Fast time series classification using numerosity reduction
- Xi, Keogh, et al.
- 2006
Citation Context ...perimentation 1. INTRODUCTION While the last decade has seen a huge interest in time series classification, to date the most accurate and robust method is the simple nearest neighbor algorithm [4][12][14]. While the nearest neighbor algorithm has the advantages of simplicity and not requiring extensive parameter tuning, it does have several important disadvantages. Chief among these are its space and ...

9 | Interval and dynamic time warping-based decision trees
- Rodriguez, Alonso
- 2004
Citation Context ...shortest time series object in the dataset, the number of shapelet candidates is linear in k, and quadratic in m, the average length of time series objects. For example, the well-known Trace dataset [11] has 200 instances, each of length 275. If we set MINLEN=3, MAXLEN=275, there will be 7,480,200 shapelet candidates. For each of these candidates, we need to find its nearest neighbor within the k tim...

1 | Facsimile edition with commentary: Kommentar zum Faksimile des Codex Manesse: Die grosse Heidelberger Liederhandschrift
- Koschorreck, Werner, et al.
- 1981

1 | A guide to the study of heraldry
- Montagu
Citation Context ...Figure 15 can still correctly classify the shield of Charles II, even though a large fraction of it is missing. Figure 16: The top section of a page of the 1840 text, A guide to the study of heraldry [10]. Note some shields are torn. 5.4 Understanding the Gun/NoGun Problem The Gun/NoGun motion capture time series dataset is perhaps the most studied time series classification problem in the literature [...

1 | The Time Series Shapelet Webpage. www.cs.ucr.edu/~lexiangy/shapelet.html
- Ye
Citation Context ...MENTAL EVALUATION We begin by discussing our experimental philosophy. We have designed and conducted all experiments such that they are easily reproducible. With this in mind, we have built a webpage [15] which contains all of the datasets and code used in this work, together with spreadsheets which contain the raw numbers displayed in all of the figures, and larger annotated figures showing the decis...
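The candidate count quoted in the citation context for the Trace dataset (200 instances of length 275, MINLEN=3, MAXLEN=275) can be checked with a few lines. The sketch below assumes that every contiguous subsequence of every series counts as one candidate, with no de-duplication; the function name is hypothetical.

```python
def count_shapelet_candidates(series_lengths, min_len, max_len):
    """Count all contiguous subsequences with length in [min_len, max_len]
    across a collection of series, given only the series lengths."""
    total = 0
    for m in series_lengths:
        # A series of length m has (m - l + 1) subsequences of length l.
        for l in range(min_len, min(max_len, m) + 1):
            total += m - l + 1
    return total


# Trace-like setting from the quoted passage: 200 series, each of length 275.
trace_like = [275] * 200
print(count_shapelet_candidates(trace_like, min_len=3, max_len=275))  # 7480200
```

For equal-length series this is k times the sum of (m - l + 1) over l from MINLEN to MAXLEN, which is 200 × 37,401 = 7,480,200 here. That matches the figure in the quoted passage and shows the quadratic growth in m.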