## iSAX: Indexing and Mining Terabyte Sized Time Series (SIGKDD 2008)

Citations: 24 (3 self)

### BibTeX

@INPROCEEDINGS{Shieh08isax:indexing,
  author    = {Jin Shieh and Eamonn Keogh},
  title     = {{iSAX}: Indexing and Mining Terabyte Sized Time Series},
  booktitle = {SIGKDD},
  year      = {2008}
}


### Abstract

Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, it has not led to algorithms that can scale to the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we show how a novel multiresolution symbolic representation can be used to index datasets which are several orders of magnitude larger than anything else considered in the literature. Our approach allows both fast exact search and ultra-fast approximate search. We show how to exploit the combination of both types of search as subroutines in data mining algorithms, allowing for the exact mining of truly massive real world datasets, containing millions of time series.
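The symbolic representation the abstract refers to builds on SAX: z-normalize the series, reduce it with Piecewise Aggregate Approximation (PAA), and map each segment mean to a symbol using breakpoints that cut the standard normal distribution into equiprobable regions. A minimal sketch of that pipeline (function names are illustrative, not from the paper):

```python
import numpy as np

# Breakpoints that cut N(0,1) into 4 equiprobable regions (standard SAX values).
BREAKPOINTS_4 = np.array([-0.6745, 0.0, 0.6745])

def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    return np.array([seg.mean() for seg in np.array_split(series, n_segments)])

def sax_word(series, n_segments=8, breakpoints=BREAKPOINTS_4):
    """Z-normalize, reduce with PAA, then map each segment mean to a symbol index."""
    z = (series - series.mean()) / series.std()
    return [int(np.searchsorted(breakpoints, v)) for v in paa(z, n_segments)]
```

iSAX extends this by letting each symbol carry its own cardinality, which is what makes the representation multiresolution and indexable.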

### Citations

424 | Fast subsequence matching in time-series databases
- Faloutsos, Ranganathan, et al.
- 1994
Citation Context: ...o Singular Value Decomposition* o Symbolic - Natural Language - Strings • Non-Lower Bounding [1][7][13] • SAX* [12], iSAX* o Trees • Non-Data Adaptive o Wavelets* o Random Mappings o Spectral - DFT* [6] - DCT* - Chebyshev Polynomials* [4] o Piecewise Aggregate Approximation* [9] • Data Dictated o Clipped Data* Those representations annotated with an asterisk have the very desirable property of allow...

155 | Dimensionality Reduction for fast similarity search in large time series databases
- Keogh, Chakrabarti, et al.
- 2000
Citation Context: ...n-Lower Bounding [1][7][13] • SAX* [12], iSAX* o Trees • Non-Data Adaptive o Wavelets* o Random Mappings o Spectral - DFT* [6] - DCT* - Chebyshev Polynomials* [4] o Piecewise Aggregate Approximation* [9] • Data Dictated o Clipped Data* Those representations annotated with an asterisk have the very desirable property of allowing lower bounding. That is to say, we can define a distance measurement on t...
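The lower-bounding property this excerpt describes can be made concrete for PAA: the distance computed on the reduced representation never exceeds the true Euclidean distance, so pruning with it can never cause a false dismissal. A sketch under the usual assumptions (equal-length series, segment count dividing the length; names are illustrative):

```python
import numpy as np

def paa(series, w):
    """Mean of each of w equal-length segments."""
    return np.array([seg.mean() for seg in np.array_split(series, w)])

def dist_paa(x, y, w):
    """PAA distance, which lower-bounds the Euclidean distance (Keogh et al. [9])."""
    n = len(x)
    return np.sqrt(n / w) * np.linalg.norm(paa(x, w) - paa(y, w))

def dist_euclidean(x, y):
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))
```

With w equal to the series length the bound becomes exact, and smaller w trades tightness for fewer dimensions to index.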

53 | Experiencing sax: a novel symbolic representation of time series
- Lin, Wei, et al.
Citation Context: ...mining of truly massive datasets, with millions of time series, occupying up to a terabyte of disk space. Our approach is based on a modification of the SAX representation to allow extensible hashing [12]. In essence, we show how we can modify SAX to be a multiresolution representation, similar in spirit to wavelets. It is this multiresolution property that allows us to index time series with zero ove...
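The multiresolution property this excerpt alludes to follows from the choice of breakpoints: because they are Gaussian quantiles at every cardinality, each cardinality-2c region splits a cardinality-c region in two, so a coarse symbol is just the finer symbol with its low bit dropped. A stdlib-only sketch of that relationship (names are illustrative):

```python
import bisect
from statistics import NormalDist

def breakpoints(cardinality):
    """Breakpoints cutting N(0,1) into `cardinality` equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / cardinality) for i in range(1, cardinality)]

def symbol(value, cardinality):
    """Index (0 = lowest region) of the region containing `value`."""
    return bisect.bisect_right(breakpoints(cardinality), value)

# Dropping the low bit of a symbol halves its cardinality, so a low-cardinality
# iSAX word is a per-symbol bit prefix of any higher-cardinality word for the
# same series -- the property that enables the extensible hashing above.
```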

50 | Indexing spatio-temporal trajectories with Chebyshev polynomials (SIGMOD ’04, pp. 599–610, ACM)
- Cai, Ng
- 2004
Citation Context: ...ymbolic - Natural Language - Strings • Non-Lower Bounding [1][7][13] • SAX* [12], iSAX* o Trees • Non-Data Adaptive o Wavelets* o Random Mappings o Spectral - DFT* [6] - DCT* - Chebyshev Polynomials* [4] o Piecewise Aggregate Approximation* [9] • Data Dictated o Clipped Data* Those representations annotated with an asterisk have the very desirable property of allowing lower bounding. That is to say, ...

35 | Fast time series classification using numerosity reduction
- Xi
- 2006
Citation Context: ...OUND AND RELATED WORK 2.1 Time Series Distance Measures It is increasingly understood that Dynamic Time Warping (DTW) is better than Euclidean Distance (ED) for most data mining tasks in most domains [17]. It is therefore natural to ask why we are planning to consider Euclidean distance in this work. The well documented superiority of DTW over ED is due to the fact that in small datasets it might be n...
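The relationship between DTW and ED that this excerpt discusses is easy to see in code: with a warping window of zero, the classic dynamic program visits only the diagonal and reduces exactly to the Euclidean distance. A minimal sketch (not the authors' implementation):

```python
import numpy as np

def dtw(x, y, window=None):
    """O(n*m) dynamic-programming DTW with an optional Sakoe-Chiba band.
    window=0 forbids warping entirely, reducing DTW to Euclidean distance."""
    n, m = len(x), len(y)
    w = max(window if window is not None else max(n, m), abs(n - m))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))
```

Since warping can only lower the cost, DTW is never larger than ED; as datasets grow and near neighbors need less warping, the two converge, which is the effect demonstrated in the paper.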

29 | Adaptive Query Processing for Time-Series Data
- Huang, Yu
- 1999

26 | Discovery of Climate Indices using Clustering
- Steinbach, Tan, et al.
- 2003
Citation Context: ...are the eigenvalue analysis techniques such as SVD and PCA. While such techniques give optimal linear dimensionality reduction, we believe they are untenable for massive datasets. For example, while [16] notes that they can transform 70,000 time series in under 10 minutes, this assumes the data can fit in main memory. However, to transform all the out-of-core (disk resident) datasets we consider in t...

24 | Using Signature Files for Querying Time-Series Data
- Andre-Jonsson, Badal
- 1997
Citation Context: ...daptive o Piecewise Polynomials - Interpolation* - Regression o Adaptive Piecewise Constant Approximation* o Singular Value Decomposition* o Symbolic - Natural Language - Strings • Non-Lower Bounding [1][7][13] • SAX* [12], iSAX* o Trees • Non-Data Adaptive o Wavelets* o Random Mappings o Spectral - DFT* [6] - DCT* - Chebyshev Polynomials* [4] o Piecewise Aggregate Approximation* [9] • Data Dictated ...

17 | Three myths about dynamic time warping
- Ratanamahatana, Keogh
Citation Context: ...h the nearest neighbor. However, in larger datasets one is more likely to find a close match without the need to warp. As DTW warps less and less, it degenerates to simple ED. This was first noted in [14] and later confirmed in [17] and elsewhere. For completeness, we will show a demonstration of this effect. We measured the classification accuracy of both DTW and ED on increasingly large datasets con...

14 | Multiresolution Symbolic Representation of Time Series
- Megalooikonomou
- 2005
Citation Context: ...indexing 100,000,000 random walk time series of length 256. To the best of our knowledge, this is at least two orders of magnitude larger than any other dataset considered in the literature [2][4][6][13]. Since the publication of Don Quixote de la Mancha in the 17th century, the idiom “a needle in a haystack” has been used to signify a near impossible search. If each time series in this experiment ...

10 | A wavelet transform-based ECG compression method guaranteeing desired signal quality
- Chen, Itoh
- 1998
- 1998
Citation Context: ...ncate the real valued coefficients to save space? In fact, this is a very common idea in compression of time series data. For example, in the medical domain it is frequently done for both the wavelet [5] and cosine [3] representations. However, recall that we are not interested in compression per se. Our interest is in dimensionality reduction that allows indexing with no false dismissals. If, for th...

9 | The TS-Tree: Efficient Time Series Search and Retrieval
- Assent, Krieger, et al.
Citation Context: ...ow how we can modify SAX to be a multiresolution representation, similar in spirit to wavelets. It is this multiresolution property that allows us to index time series with zero overlap at leaf nodes [2], unlike R-trees and other spatial access methods. As we shall show, our indexing technique is fast and scalable due to intrinsic properties of the iSAX representation. Because of this, we do not requ...

3 | Compression of ECG signals by optimized quantization of discrete cosine transform coefficients. Med Eng Phys
- Batista, Melcher, et al.
- 2001
Citation Context: ...valued coefficients to save space? In fact, this is a very common idea in compression of time series data. For example, in the medical domain it is frequently done for both the wavelet [5] and cosine [3] representations. However, recall that we are not interested in compression per se. Our interest is in dimensionality reduction that allows indexing with no false dismissals. If, for the other approac...

1 | Atlas of states of sleep and wakefulness in infants and children. Somnologie - Schlafforschung und Schlafmedizin 3:4
- unknown authors
- 1999
Citation Context: ...to slowing of the heart rate with expiration and increase of the heart rate with inspiration, given that it is well known that respiration patterns change in conjunction with changes in sleep stages [15]. An obvious naive algorithm to find the TSSD is to do 20,000 exact searches, one for each object in A. This requires (“only”) 325,604,200 Euclidean distance calculations, but it requires approximatel...