## Time Series Analysis Using the Concept of Adaptable Threshold Similarity (2006)

### Download Links

- [www.dbs.informatik.uni-muenchen.de]
- [www.dbs.ifi.lmu.de]
- DBLP

Citations: 1 (1 self-citation)

### BibTeX

@MISC{Assfalg06timeseries,

author = {Johannes Assfalg and Hans-Peter Kriegel and Peer Kröger and Peter Kunath and Alexey Pryakhin and Matthias Renz},

title = {Time Series Analysis Using the Concept of Adaptable Threshold Similarity},

year = {2006}

}

### Abstract

The issue of data mining in time series databases is of utmost importance for many practical applications and has attracted a lot of research in recent years. In this paper, we focus on the recently proposed concept of threshold similarity, which compares time series based on the time frames within which they exceed a user-defined amplitude threshold τ. We propose a novel approach for cluster analysis of time series based on adaptable threshold similarity. The most important issue in threshold similarity is the choice of the threshold τ. We therefore adapt τ automatically to the characteristics of a small training dataset using the concept of support vector machines, so that the optimal τ learned from this small training set yields an accurate clustering of the entire time series database. In our experimental evaluation, we demonstrate that our cluster analysis using adaptable threshold similarity can be successfully applied to many real-world scientific data mining applications.
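
To make the notion of "time frames within which a series exceeds τ" concrete, here is a minimal Python sketch that extracts these threshold-crossing intervals from a sampled series. The helper names and the use of linear interpolation between samples to place the interval boundaries are assumptions for illustration; the abstract prescribes neither.

```python
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]


def _crossing(t0: float, v0: float, t1: float, v1: float, tau: float) -> float:
    """Time at which the segment from (t0, v0) to (t1, v1) crosses tau (linear interpolation)."""
    return t0 + (tau - v0) * (t1 - t0) / (v1 - v0)


def threshold_crossing_intervals(values: Sequence[float],
                                 times: Sequence[float],
                                 tau: float) -> List[Interval]:
    """Hypothetical helper: maximal time frames in which the sampled series lies strictly above tau."""
    if len(values) != len(times):
        raise ValueError("values and times must have the same length")
    intervals: List[Interval] = []
    start: Optional[float] = None  # start of the currently open interval, if any
    for i, v in enumerate(values):
        above = v > tau
        if above and start is None:
            # Entering the region above tau: open at the first sample, or where
            # the segment from the previous sample crosses tau (assumed interpolation).
            start = times[0] if i == 0 else _crossing(times[i - 1], values[i - 1], times[i], v, tau)
        elif not above and start is not None:
            # Leaving the region above tau: close the interval at the crossing point.
            end = _crossing(times[i - 1], values[i - 1], times[i], v, tau)
            intervals.append((start, end))
            start = None
    if start is not None:  # the series ends while still above tau
        intervals.append((start, times[-1]))
    return intervals
```

For example, the series 0, 2, 2, 0 sampled at times 0 to 3 exceeds τ = 1 during the single interval (0.5, 2.5).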

### Citations

9811 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context ... leaves much space for improvement. In this paper, we compute an optimal score by applying the concept of support vector machines (SVMs). In general, SVMs provide an optimal separation of two classes [24] and can easily be extended to multi-class problems. However, in order to apply SVMs to threshold similarity of time series we have to extend the basic concepts of threshold similarity. In the followi...

2113 | Data mining: Concepts and techniques
- Han, Kamber
- 2001
Citation Context ...lysis task. For clustering time series data, most of the various clustering methods proposed in the past decades have been successfully applied. A general overview of clustering methods is given in [18]. 2.2. Semi-Supervised Cluster Analysis. In addition to the similarity information used by unsupervised clustering, in many cases a small amount of knowledge is available concerning either pairwise (mu...

892 | Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization
- Spellman, Sherlock, et al.
- 1998
Citation Context ...e series represents the measurements of one station at a given day containing 48 values for one of 10 different parameters such as temperature, ozone concentration, etc. The gene expression data from [23] contains the expression level of approximately 6,000 genes measured at only 24 different time slots. The expression level of a gene indicates how active it is. This dataset was derived from the Gene ...

441 | Fast Subsequence Matching in Time-Series Databases
- Faloutsos, Ranganathan, et al.
- 1994
Citation Context ...rse of dimensionality. Thus, several more suitable representations of time series data, e.g. by reducing the dimensionality, have been proposed. Most of them are based on the GEMINI indexing approach [15]: extract a few key features for each time series and map each time sequence X to a point f(X) in a lower dimensional feature space, such that the distance between X and any other time series Y is alw...
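
The lower-bounding requirement at the heart of GEMINI (distances in the feature space must never exceed the true distance, so that a filter step produces no false dismissals) can be illustrated with PAA, one of the representations listed in the other excerpts. This is a minimal Python sketch assuming Euclidean distance and a series length divisible by the number of segments; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def paa(x: Sequence[float], segments: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of `segments` equal-width frames."""
    n = len(x)
    if segments <= 0 or n == 0 or n % segments != 0:
        raise ValueError("len(x) must be a positive multiple of segments")
    w = n // segments
    return [sum(x[i * w:(i + 1) * w]) / w for i in range(segments)]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def paa_lower_bound(x: Sequence[float], y: Sequence[float], segments: int) -> float:
    """Feature-space distance scaled by sqrt(frame width); never exceeds euclidean(x, y)."""
    w = len(x) // segments
    return math.sqrt(w) * euclidean(paa(x, segments), paa(y, segments))
```

The scaling by √w is what makes the bound hold: within each frame, w times the squared difference of the means never exceeds the sum of the squared pointwise differences (Cauchy–Schwarz).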

439 | Efficient similarity search in sequence databases
- Agrawal, Faloutsos, et al.
- 1993
Citation Context ... efficient access any well known spatial access method can be used to index the feature space. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be direc...
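
The DFT representation of [1] keeps only the first few Fourier coefficients of each series. A minimal Python sketch, assuming the orthonormal DFT (so that, by Parseval's theorem, the full transform preserves Euclidean distance) and a naive O(nk) sum instead of an FFT library; the function names are ours.

```python
import cmath
import math
from typing import List, Sequence


def dft_prefix(x: Sequence[float], k: int) -> List[complex]:
    """First k coefficients of the orthonormal DFT (naive O(n*k) sum, no FFT)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(k)]


def dft_distance(x: Sequence[float], y: Sequence[float], k: int) -> float:
    """Euclidean distance between the first k DFT coefficients of x and y.

    The orthonormal DFT preserves Euclidean distance, so keeping only k
    coefficients can only shrink it: the result lower-bounds the true distance.
    """
    fx, fy = dft_prefix(x, k), dft_prefix(y, k)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fx, fy)))
```

As the excerpt notes, such reduced representations summarize the series as a whole and cannot be directly applied to threshold similarity.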

376 | OPTICS: Ordering points to identify the clustering structure
- Ankerst, Breunig, et al.
- 1999
Citation Context ... methods for semi-supervised clustering do not take the threshold-based similarity of time series data into account. In our approach, we use the density-based hierarchical clustering algorithm OPTICS [3]. However, any clustering algorithm is applicable in our framework. 2.3. Adaptable Threshold Similarity. The concept of threshold similarity in time series databases has been proposed in [6, 5]. We wil...

350 | Constrained k-means clustering with background knowledge
- Wagstaff, Cardie, et al.
- 2001
Citation Context ...is given in [14] describing SPAM, a supervised variant of PAM; SRIDHCR, a greedy algorithm with random restart; SCEC, an evolutionary algorithm; and TDS, a medoid-based top-down partitioning algorithm. In [25], a variant of a k-means based clustering algorithm is proposed. The authors derive constraints from the labeled objects which are used during the clustering. They distinguish between explicit and can...
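
The constraint check at the core of the constrained k-means algorithm of [25] (COP-KMeans) can be sketched as follows: before an object is assigned to a cluster, verify that no must-link partner already sits in a different cluster and no cannot-link partner already sits in the same one. This is a minimal Python sketch with hypothetical names; the surrounding k-means loop is omitted.

```python
from typing import Dict, Hashable, Iterable, Iterator, Tuple

Pair = Tuple[Hashable, Hashable]


def _partners(obj: Hashable, pairs: Iterable[Pair]) -> Iterator[Hashable]:
    """Yield every object that is paired with `obj` in a constraint list."""
    for a, b in pairs:
        if a == obj:
            yield b
        elif b == obj:
            yield a


def violates_constraints(obj: Hashable, cluster: int,
                         assignment: Dict[Hashable, int],
                         must_link: Iterable[Pair],
                         cannot_link: Iterable[Pair]) -> bool:
    """True if putting `obj` into `cluster` breaks a constraint w.r.t. already-assigned objects."""
    for other in _partners(obj, must_link):
        if other in assignment and assignment[other] != cluster:
            return True  # a must-link partner already sits in another cluster
    for other in _partners(obj, cannot_link):
        if assignment.get(other) == cluster:
            return True  # a cannot-link partner already sits in this cluster
    return False
```

In a COP-KMeans-style loop, each object would go to the nearest centroid for which this check returns False.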

260 | Adaptive duplicate detection using learnable string similarity measures
- Bilenko, Mooney
Citation Context ...aptive similarity measure. The authors of [20] propose to apply a complete-link clustering algorithm after replacing the Euclidean distance with shortest-path distances. The approach described in [10] weights the edit distance using an expectation maximization algorithm to detect approximately duplicate objects in a database. [9] describes a probabilistic framework for semi-supervised clustering t...

246 | Locally adaptive dimensionality reduction for indexing large time series databases
- Chakrabarti, Keogh, et al.
- 2001
Citation Context ...sed to index the feature space. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be directly applied to threshold similarity, because usually, in a reduc...

215 | Efficient Time Series Matching by Wavelets
- Chan, Fu
- 1999
Citation Context ...own spatial access method can be used to index the feature space. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be directly applied to threshold simil...

207 | A probabilistic framework for semi-supervised clustering
- Basu, Bilenko, et al.
- 2004
Citation Context ...based clustering algorithm is proposed. The authors derive constraints from the labeled objects which are used during the clustering. They distinguish between explicit and cannot-link constraints. In [8], a k-means based method is introduced which is based on both types of constraints and which exploits the data distribution. The authors of [13] describe an evolutionary method for semi-supervised clu...

194 | Integrating constraints and metric learning in semi-supervised clustering
- Bilenko, Basu, et al.
- 2004
Citation Context ...istance with the shortest path algorithm. The approach described in [10] weights the edit distance using an expectation maximization algorithm to detect approximately duplicate objects in a database. [9] describes a probabilistic framework for semi-supervised clustering to additionally support several non-Euclidean distance measures, e.g. the cosine distance. All mentioned methods for semi-supervised...

194 | On clustering validation techniques
- Halkidi, Batistakis, et al.
- 2001
Citation Context ...e do indeed yield good results on the whole data set. To evaluate this, we clustered the time series for two different threshold values τ+ and τ− and determined the Rand index and the average entropy [17]. For example, the threshold value τ+ = 710, which corresponds to a high separation score on the DS3 dataset, resulted in a Rand index of 0.97. By contrast, when using a threshold value of τ− = 3064 ...
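
The Rand index used in this evaluation is the fraction of object pairs on which two partitionings agree, i.e. both put the pair into the same cluster or both separate it. A minimal Python sketch, quadratic in the number of objects; the function name is ours.

```python
from itertools import combinations
from typing import Hashable, Sequence


def rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Fraction of object pairs on which the two labelings agree (same/same or different/different)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two objects")
    agree = 0
    pairs = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred  # bool counts as 0 or 1
        pairs += 1
    return agree / pairs
```

A value of 1.0 means the two partitionings are identical up to a renaming of cluster labels, e.g. `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0.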

164 | From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering
- Klein, Kamvar, et al.
- 2002
Citation Context ... reflects the class distribution of the labeled training data, further methods have been developed which use a standard clustering algorithm by applying an adaptive similarity measure. The authors of [20] propose to apply a complete-link clustering algorithm after replacing the Euclidean distance with shortest-path distances. The approach described in [10] weights the edit distance using an expect...
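
One way to read the shortest-path idea in this excerpt: set the distance between must-linked objects to zero, then recompute all-pairs shortest paths (Floyd–Warshall) so that the triangle inequality holds again and objects near one must-linked object are also pulled toward its partner. This is a minimal Python sketch of that reading, not the exact procedure of [20]; the function name is hypothetical and cannot-link constraints are not handled.

```python
from typing import List, Sequence, Tuple


def propagate_must_links(dist: Sequence[Sequence[float]],
                         must_link: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Zero out must-link distances, then restore shortest-path consistency (Floyd-Warshall)."""
    n = len(dist)
    d = [list(row) for row in dist]  # work on a copy of the symmetric distance matrix
    for i, j in must_link:
        d[i][j] = d[j][i] = 0.0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = d[i][k] + d[k][j]  # path i -> k -> j
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d
```

For three points on a line with neighbouring distances 5 and 5, must-linking the first two reduces the distance between the first and the third point from 10 to 5; a complete-link clustering would then run on the modified matrix.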

155 | Fast time sequence indexing for arbitrary Lp norms (VLDB)
- Yi, Faloutsos
- 2000
Citation Context ...l access method can be used to index the feature space. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be directly applied to threshold similarity, bec...

118 | Multi-instance kernels
- Gärtner, Flach, et al.
- 2002
Citation Context ...ervals of all objects in T generated by a given threshold τ. For our approach, in order to compare two time series $X, Y \in T$, we use the set kernel $k(TC_\tau(X), TC_\tau(Y))$, which has been introduced in [16] and is defined by $k(TC_\tau(X), TC_\tau(Y)) := \sum_{t_X \in TC_\tau(X),\; t_Y \in TC_\tau(Y)} \kappa_\chi(t_X, t_Y)$, where $\kappa_\chi$ denotes a kernel on $\chi$, i.e. on single intervals. In order to keep the similarity function invariant...
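
The set kernel in this excerpt sums an interval kernel κχ over all pairs of threshold-crossing intervals of the two series. A minimal Python sketch; choosing a Gaussian (RBF) kernel for κχ, treating each interval as the 2-D point (start, end), and the `gamma` parameter are assumptions for illustration, not the paper's definitions, and the invariance step the excerpt begins to describe is not reproduced.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def rbf_interval_kernel(s: Interval, t: Interval, gamma: float = 1.0) -> float:
    """Assumed kappa_chi: Gaussian kernel on intervals viewed as 2-D points (start, end)."""
    return math.exp(-gamma * ((s[0] - t[0]) ** 2 + (s[1] - t[1]) ** 2))


def set_kernel(tc_x: Sequence[Interval], tc_y: Sequence[Interval], gamma: float = 1.0) -> float:
    """k(TC_tau(X), TC_tau(Y)): sum of kappa_chi over all interval pairs (0.0 if either set is empty)."""
    return sum(rbf_interval_kernel(s, t, gamma) for s in tc_x for t in tc_y)
```

Because every κχ term is a valid kernel, the pairwise sum is again a valid kernel on interval sets and can be plugged into an SVM.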

107 | Efficiently supporting ad hoc queries in large datasets of time sequences
- Korn, Jagadish, et al.
- 1997
Citation Context ...ethod can be used to index the feature space. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be directly applied to threshold similarity, because usually,...

84 | Generalized singular value decomposition for comparative analysis of genome-scale expression data sets of two different organisms
- Alter
- 2003
Citation Context ...ethod can be used to index the feature space. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be directly applied to threshold similarity, because usually,...

71 | Continuous representations of time-series gene expression data
- Bar-Joseph
- 2003

69 | Semi-supervised clustering using genetic algorithms
- Demiriz, Bennett, et al.
- 1999
Citation Context ...inguish between explicit and cannot-link constraints. In [8], a k-means based method is introduced which is based on both types of constraints and which exploits the data distribution. The authors of [13] describe an evolutionary method for semi-supervised clustering. This approach has to be initialized with k arbitrary centroids and optimizes a quality measure considering cluster dispersion and impur...

54 | Indexing Spatio-temporal Trajectories With Chebyshev Polynomials
- Cai, Ng
- 2004
Citation Context ...ce. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be directly applied to threshold similarity, because usually, in a reduced feature space, the origin...

50 | Identifying periodically expressed transcripts in microarray time series data
- Wichert
- 2004
Citation Context ...ny well known spatial access method can be used to index the feature space. The proposed methods mainly differ in the representation of the time series, including among others, DFT [1] and extensions [26], DWT [12], PAA [27], SVD [21, 2], APCA [19], Chebyshev Polynomials [11], and cubic splines [7]. However, all techniques which are based on dimensionality reduction cannot be directly applied to thres...

22 | Supervised Clustering – Algorithms and Benefits
- Eick, Zeidat, et al.
Citation Context ...rved that using unlabeled data can increase the classification accuracy. Several extensions of existing standard clustering algorithms have been proposed in the literature. A brief survey is given in [14] describing SPAM, a supervised variant of PAM; SRIDHCR, a greedy algorithm with random restart; SCEC, an evolutionary algorithm; and TDS, a medoid-based top-down partitioning algorithm. In [25], a variant ...

22 | Analyzing gene expression time-courses
- Schliep, Costa, et al.
- 2005
Citation Context ...ing labeled data as feedback in order to help to cluster unlabeled data. Most of the proposed methods for semi-supervised clustering assume that class labels for all objects to be processed are given. [22] proposes a method based on a mixture of hidden Markov models that makes use of prior knowledge in order to improve the robustness and the quality of the local optima found. The author of [28] introdu...

5 | Semi-supervised sequence classification with HMMs
- Zhong
- 2005
Citation Context ... given. [22] proposes a method based on a mixture of hidden Markov models that makes use of prior knowledge in order to improve the robustness and the quality of the local optima found. The author of [28] introduces a semi-supervised classification for time sequences based on hidden Markov models. Two different semi-supervised learning paradigms are discussed. The author observed that using unlabeled ...

3 | Threshold similarity queries in large time series databases
- Aßfalg, Kriegel, et al.
- 2006
Citation Context ...y between time series, e.g. similar patterns of time series, plays a key role for the analysis. Recently, a novel but very important similarity measure called threshold similarity has been introduced [6, 5] which enables the analysis of time series tightly focused on a specific amplitude spectrum, in particular amplitudes that are important and significant for the analysis goal. Given two time series X ...
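
Once each series has been reduced to its set of threshold-crossing intervals, a threshold distance compares the two sets interval by interval. A minimal Python sketch, assuming each interval is treated as the 2-D point (start, end) and the two sets are compared with the sum-of-minimum-distances measure; the excerpt is cut off before the actual definition in [6, 5], so both choices and the function names are assumptions.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[float, float]


def interval_distance(s: Interval, t: Interval) -> float:
    """Euclidean distance between two intervals viewed as points (start, end)."""
    return math.hypot(s[0] - t[0], s[1] - t[1])


def threshold_distance(tc_x: Sequence[Interval], tc_y: Sequence[Interval]) -> float:
    """Assumed sum-of-minimum-distances between two non-empty interval sets."""
    if not tc_x or not tc_y:
        raise ValueError("both interval sets must be non-empty in this sketch")

    def directed(a: Sequence[Interval], b: Sequence[Interval]) -> float:
        # Average, over intervals in a, of the distance to the closest interval in b.
        return sum(min(interval_distance(s, t) for t in b) for s in a) / len(a)

    return 0.5 * (directed(tc_x, tc_y) + directed(tc_y, tc_x))
```

Averaging both directions keeps the measure symmetric even when the two series exceed τ a different number of times.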

1 | Semi-supervised Threshold Queries on Pharmacogenomics Time Sequences
- Aßfalg, Kriegel, et al.
- 2006
Citation Context ...apted threshold) is performed in order to detect novel and important patterns. So far, there is only one approach to “optimally” adapt the threshold for a threshold similarity analysis of time series [4]. However, this approach uses a very simple and, in most cases, not very accurate method to judge the quality of a given threshold for the analysis task. In this paper, we present a novel semi-super...
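
Adapting the threshold amounts to picking the τ whose induced distance scores best on labeled training data. The sketch below shows only that search skeleton in Python, with a deliberately simple stand-in score (mean between-class distance minus mean within-class distance). It is neither the SVM-based score of this paper nor the method of [4]; `separation_score`, `best_threshold`, and the `dist_at` callback are hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Hashable, Iterable, Sequence

Distance = Callable[[Any, Any], float]


def separation_score(items: Sequence[Any], labels: Sequence[Hashable], dist: Distance) -> float:
    """Stand-in quality score: mean between-class distance minus mean within-class distance."""
    within, between = [], []
    for i, j in combinations(range(len(items)), 2):
        bucket = within if labels[i] == labels[j] else between
        bucket.append(dist(items[i], items[j]))
    if not within or not between:
        raise ValueError("need at least one within-class and one between-class pair")
    return sum(between) / len(between) - sum(within) / len(within)


def best_threshold(candidates: Iterable[float], items: Sequence[Any],
                   labels: Sequence[Hashable],
                   dist_at: Callable[[float], Distance]) -> float:
    """Return the candidate tau whose induced distance scores best (first one on ties)."""
    return max(candidates, key=lambda tau: separation_score(items, labels, dist_at(tau)))
```

With a toy distance that only records whether two values lie on the same side of τ, the labeled values 1, 2 (low) and 8, 9 (high) select τ = 5 from the candidates 0, 5, 10; any better quality measure can replace `separation_score` without changing the search.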