Results 1-10 of 47
Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases
In Proceedings of the ACM SIGMOD Conference on Management of Data, 2002
Abstract

Cited by 252 (28 self)
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), and the Discrete Wavelet Transform (DWT). In this work we introduce a new dimensionality reduction technique which we call Adaptive Piecewise Constant Approximation (APCA). While previous techniques (e.g., SVD, DFT and DWT) choose a common representation for all the items in the database that minimizes the global reconstruction error, APCA approximates each time series by a set of constant value segments of varying lengths such that their individual reconstruction errors are minimal. We show how APCA can be indexed using a multidimensional index structure. We propose two distance measures in the indexed space that exploit the high fidelity of APCA for fast searching: a lower bounding Euclidean distance approximation, and a non-lower bounding, but very tight, Euclidean distance approximation. We show how they can support fast exact searching, and even faster approximate searching, on the same index structure. We theoretically and empirically compare APCA to all the other techniques and demonstrate its superiority.
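The idea of approximating one series with constant segments of varying lengths can be illustrated with a minimal sketch. The greedy bottom-up merge below is a swapped-in stand-in chosen for brevity, not the algorithm from the paper, and the function name, the `num_segments` parameter and the `(mean value, right endpoint)` output layout are assumptions made for illustration.

```python
def adaptive_piecewise_constant(series, num_segments):
    """Approximate `series` with `num_segments` constant segments.

    Hypothetical helper: a greedy bottom-up merge, NOT the paper's method.
    Returns a list of (mean value, index of last point in the segment).
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")

    def sse(start, end):
        # Squared reconstruction error of series[start:end] around its mean.
        chunk = series[start:end]
        mean = sum(chunk) / len(chunk)
        return sum((v - mean) ** 2 for v in chunk)

    # Start with one segment per point, stored as (start, end_exclusive).
    segments = [(i, i + 1) for i in range(len(series))]

    while len(segments) > num_segments:
        best_i, best_cost = 0, float("inf")
        for i in range(len(segments) - 1):
            a, b = segments[i], segments[i + 1]
            # Extra error introduced by merging the two neighbours.
            cost = sse(a[0], b[1]) - sse(*a) - sse(*b)
            if cost < best_cost:
                best_i, best_cost = i, cost
        merged = (segments[best_i][0], segments[best_i + 1][1])
        segments[best_i:best_i + 2] = [merged]

    return [(sum(series[s:e]) / (e - s), e - 1) for s, e in segments]


print(adaptive_piecewise_constant([1, 1, 1, 5, 5, 9, 9, 9, 9], 3))
# prints [(1.0, 2), (5.0, 4), (9.0, 8)]
```

Because segment boundaries are chosen per series, flat regions get long segments and busy regions get short ones, which is the contrast with fixed, global representations that the abstract draws.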
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
SIGKDD'02, 2002
Abstract

Cited by 237 (51 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Discovering similar multidimensional trajectories
In ICDE, 2002
Abstract

Cited by 188 (6 self)
We investigate techniques for analysis and retrieval of object trajectories in a two- or three-dimensional space. Such data usually contain a great amount of noise, which makes all previously used metrics fail. Therefore, here we formalize non-metric similarity functions based on the Longest Common Subsequence (LCSS), which are very robust to noise and furthermore provide an intuitive notion of similarity between trajectories by giving more weight to the similar portions of the sequences. Stretching of sequences in time is allowed, as well as global translation of the sequences in space. Efficient approximate algorithms that compute these similarity measures are also provided. We compare these new methods to the widely used Euclidean and Time Warping distance functions (for real and synthetic data) and show the superiority of our approach, especially under the strong presence of noise. We prove a weaker version of the triangle inequality and employ it in an indexing structure to answer nearest neighbor queries. Finally, we present experimental results that validate the accuracy and efficiency of our approach.
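An LCSS-style similarity for trajectories can be sketched with the usual dynamic program. The matching threshold `eps` (per coordinate), the time window `delta`, and the normalization by the shorter length are assumptions for illustration; the abstract does not spell out these parameters, and the global translation it mentions is not handled here.

```python
def lcss_length(a, b, eps, delta):
    """Length of the longest common subsequence of two point sequences.

    Two points match when every coordinate differs by less than `eps`
    and their positions differ by at most `delta` (assumed parameters).
    """
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_space = all(
                abs(p - q) < eps for p, q in zip(a[i - 1], b[j - 1])
            )
            close_in_time = abs(i - j) <= delta
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a, b, eps, delta):
    """LCSS length normalized by the shorter trajectory (value in [0, 1])."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))


clean = [(0, 0), (1, 1), (2, 2), (3, 3)]
noisy = [(0, 0.1), (1, 1.1), (50, 50), (3, 3.1)]  # one gross outlier
print(lcss_similarity(clean, noisy, eps=0.5, delta=1))
# prints 0.75
```

The outlier at `(50, 50)` is simply left unmatched instead of dominating the score, which is the robustness to noise that the abstract contrasts with Euclidean and Time Warping distances.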
Landmarks: a new model for similarity-based pattern querying in time series databases
In ICDE, 2000
Abstract

Cited by 78 (5 self)
In this paper we present the Landmark Model, a model for time series that yields new techniques for similarity-based time series pattern querying. The Landmark Model does not follow traditional similarity models that rely on pointwise Euclidean distance. Instead, it leads to Landmark Similarity, a general model of similarity that is consistent with human intuition and episodic memory. By tracking different specific subsets of features of landmarks, we can efficiently compute different Landmark Similarity measures that are invariant under corresponding subsets of six transformations; namely, Shifting, Uniform
Patterns of temporal variation in online media
2010
Abstract

Cited by 57 (3 self)
Online content exhibits rich temporal dynamics, and diverse real-time user-generated content further intensifies this process. However, temporal patterns by which online content grows and fades over time, and by which different pieces of content compete for attention, remain largely unexplored. We study temporal patterns associated with online content and how the content's popularity grows and fades over time. The attention that content receives on the Web varies depending on many factors and occurs on very different time scales and at different resolutions. In order to uncover the temporal dynamics of online content we formulate a time series clustering problem using a similarity metric that is invariant to scaling and shifting. We develop the K-Spectral Centroid (K-SC) clustering algorithm that effectively finds cluster centroids with our similarity measure. By applying an adaptive wavelet-based incremental approach to clustering, we scale K-SC to large data sets. We demonstrate our approach on two massive datasets: a set of 580 million Tweets, and a set of 170 million blog posts and news media articles. We find that K-SC outperforms the K-means clustering algorithm in finding distinct shapes of time series. Our analysis shows that there are six main temporal shapes of attention of online content. We also present a simple model that reliably predicts the shape of attention by using information about only a small number of participants. Our analyses offer insight into common temporal patterns of the content on the Web and broaden the understanding of the dynamics of human attention.
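One way to make a distance invariant to scaling and shifting is to minimize the residual over a scale factor and an integer time shift. The sketch below assumes the form min over (alpha, q) of ||x - alpha * y shifted by q|| / ||x||; this formula, the zero padding, and the `max_shift` parameter are assumptions for illustration and are not given in the abstract.

```python
import math


def shift(series, q):
    """Shift `series` by q steps (right if q > 0), padding with zeros."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[: max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def scale_shift_distance(x, y, max_shift):
    """Residual of the best scaled and shifted fit of y to x, relative to ||x||."""
    norm_x = math.sqrt(sum(v * v for v in x))
    if norm_x == 0:
        raise ValueError("x must not be all zeros")
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        y_q = shift(y, q)
        energy = sum(v * v for v in y_q)
        # Least-squares optimal scale for this shift.
        alpha = sum(a * b for a, b in zip(x, y_q)) / energy if energy > 0 else 0.0
        residual = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(x, y_q)))
        best = min(best, residual / norm_x)
    return best


x = [0, 1, 3, 1, 0, 0]
y = [0, 0, 2, 6, 2, 0]  # same shape as x, twice as tall, one step later
print(scale_shift_distance(x, y, max_shift=2))
# prints 0.0
```

For a fixed shift the best scale has a closed form, so only the shift needs to be searched; two peaks with the same shape but different height and timing come out at distance zero.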
Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials
Proc. 2004 SIGMOD, to appear
Abstract

Cited by 54 (0 self)
In this thesis, we investigate the subject of indexing large collections of spatio-temporal trajectories for similarity matching. Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low-order polynomial-like curve, and then incorporate a multidimensional index into the reduced space of polynomial coefficients. There are many possible ways to choose the polynomial, including Fourier transforms, splines, nonlinear regressions, etc. Some of these possibilities have indeed been studied before. We hypothesize that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial. Minimax approximation is particularly meaningful for indexing because in a branch-and-bound search (i.e., for finding nearest neighbours), the smaller the maximum deviation, the more pruning opportunities there exist. In general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute. However, it has been shown that the Chebyshev approximation is almost identical to the optimal minimax polynomial, and is easy to compute [32]. Thus, we shall explore how to use
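A Chebyshev approximation of a one-dimensional signal can be sketched in a few lines. The version below assumes the signal is available as a function on [-1, 1] and samples it at the Chebyshev nodes; how a discretely sampled trajectory is mapped onto that interval is left out, and the function names are hypothetical.

```python
import math


def chebyshev_coefficients(f, degree, num_nodes=None):
    """Coefficients c_0..c_degree such that f(t) ~ sum_j c_j * T_j(t) on [-1, 1]."""
    n = num_nodes or (degree + 1)
    if n <= degree:
        raise ValueError("need more nodes than the polynomial degree")
    angles = [math.pi * (k + 0.5) / n for k in range(n)]
    values = [f(math.cos(a)) for a in angles]  # f sampled at Chebyshev nodes
    coeffs = [
        2.0 / n * sum(v * math.cos(j * a) for v, a in zip(values, angles))
        for j in range(degree + 1)
    ]
    coeffs[0] /= 2.0  # T_0 term carries half the weight of the others
    return coeffs


def chebyshev_eval(coeffs, t):
    """Evaluate sum_j c_j * T_j(t) with T_{j+1} = 2 t T_j - T_{j-1}."""
    t_prev, t_cur = 1.0, t
    total = coeffs[0]
    for c in coeffs[1:]:
        total += c * t_cur
        t_prev, t_cur = t_cur, 2.0 * t * t_cur - t_prev
    return total


coeffs = chebyshev_coefficients(math.exp, degree=5)
grid = [-1.0 + 2.0 * i / 200 for i in range(201)]
max_dev = max(abs(math.exp(t) - chebyshev_eval(coeffs, t)) for t in grid)
print(len(coeffs), max_dev < 1e-3)
# prints 6 True
```

The short coefficient vector is what would be stored in the multidimensional index; the small maximum deviation is the property the abstract ties to pruning power in branch-and-bound search.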
An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases
In ICDE, 2001
Abstract

Cited by 44 (2 self)
This paper discusses effective processing of similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for processing similarity search that supports time warping fail to employ multidimensional indexes without false dismissal, since the time warping distance does not satisfy the triangle inequality. They have to scan the entire database and thus suffer from serious performance degradation in large databases. Another method that employs the suffix tree, which does not assume any distance function, also shows poor performance due to the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance in large databases without permitting any false dismissal. To attain this goal, we devise a new distance function D_tw-lb that consistently unde...
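The filter-and-refine pattern behind a lower-bounding distance can be sketched as follows. Since the abstract is truncated before it defines its own function, the bound used here (largest difference among the first, last, maximum and minimum values) and the time warping variant (absolute differences summed along the warping path) are assumptions for illustration, not the paper's definitions.

```python
def dtw_distance(s, q):
    """Time warping distance with |a - b| as the per-step cost (assumed variant)."""
    n, m = len(s), len(q)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s[i - 1] - q[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def feature_lower_bound(s, q):
    """Cheap bound from four features; never exceeds dtw_distance(s, q)."""
    return max(
        abs(s[0] - q[0]),
        abs(s[-1] - q[-1]),
        abs(max(s) - max(q)),
        abs(min(s) - min(q)),
    )


def range_search(database, query, radius):
    """Return sequences within `radius`, skipping DTW when the bound rules it out."""
    hits = []
    for seq in database:
        if feature_lower_bound(seq, query) > radius:
            continue  # safe to discard: the true distance is at least the bound
        if dtw_distance(seq, query) <= radius:
            hits.append(seq)
    return hits


db = [[1, 1, 2, 3, 4, 4, 3], [0, 5, 2], [9, 9, 9]]
print(range_search(db, [1, 2, 3, 4, 3], radius=1.0))
# prints [[1, 1, 2, 3, 4, 4, 3]]
```

Because the bound never overestimates the true distance, discarding on it cannot cause a false dismissal, and because it depends only on a fixed-size feature vector, those features can be placed in a multidimensional index.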
Robust Similarity Measures for Mobile Object Trajectories
Proc. of DEXA Workshops, 2002
Abstract

Cited by 24 (1 self)
We investigate techniques for similarity analysis of spatio-temporal trajectories for mobile objects. Such data may contain a large number of outliers, which degrade the performance of Euclidean and Time Warping distance. Therefore, here we propose the use of non-metric distance functions based on the Longest Common Subsequence (LCSS), in conjunction with a sigmoidal matching function. Finally, we compare these new methods to various L_p norms and also to Time Warping distance (for real and synthetic data), and we present experimental results that validate the accuracy and efficiency of our approach, especially under the strong presence of noise.
Symbolic Representation and Retrieval of Moving Object Trajectories
2003
Abstract

Cited by 23 (0 self)
Similarity-based retrieval of moving object trajectories is useful in many applications, such as GPS systems and sports and surveillance video analysis. However, due to sensor failures, errors in detection techniques, or different sampling rates, noise, local shifts and scaling may appear in the trajectory records. Hence, it is difficult to design a robust and fast similarity measure for similarity-based retrieval in a large database. In this paper, normalized edit distance (NED) is proposed to measure the similarity between two trajectories. We evaluate the efficacy of NED and compare it with those of Euclidean distance, Dynamic Time Warping (DTW), and Longest Common Subsequences (LCSS), showing that NED is more robust and accurate for trajectories that contain noise and local time shifting. Furthermore, in order to improve the retrieval efficiency, we propose a novel representation of trajectories, called movement pattern strings, which convert the trajectories into a symbolic representation. Movement pattern strings encode both the movement direction and the movement distance information of the trajectories. The distances that are computed in the symbolic space are lower bounds of the distances of the original trajectory data, which guarantees that no false dismissals will be introduced when using movement pattern strings to retrieve trajectories. Finally, we define a modified frequency distance for frequency vectors that are obtained from movement pattern strings, to reduce the dimensionality of movement pattern strings and the computation cost of NED. The experimental results show that the cost of retrieving similar trajectories can be greatly reduced when the modified frequency distance is used as a filter.
GAMPS: Compressing multi-sensor data by grouping and amplitude scaling
In: ACM SIGMOD (2009)
Abstract

Cited by 15 (0 self)
We consider the problem of collectively approximating a set of sensor signals using the least amount of space so that any individual signal can be efficiently reconstructed within a given maximum (L∞) error ε. The problem arises naturally in applications that need to collect large amounts of data from multiple concurrent sources, such as sensors, servers and network routers, and archive them over a long period of time for offline data mining. We present GAMPS, a general framework that addresses this problem by combining several novel techniques. First, it dynamically groups multiple signals together so that signals within each group are correlated and can be maximally compressed jointly. Second, it appropriately scales the amplitudes of different signals within a group and compresses them within the maximum allowed reconstruction error bound. Our schemes are polynomial time O(α, β) approximation schemes, meaning that the maximum (L∞) error is at most αε and it uses at most β times the optimal memory. Finally, GAMPS maintains an index so that various queries can be issued directly on compressed data. Our experiments on several real-world sensor datasets show that GAMPS significantly reduces space without compromising the quality of search and query.
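The L∞ guarantee on its own can be illustrated for a single signal with a greedy piecewise-constant scheme. This sketch is not GAMPS: it leaves out the grouping, amplitude scaling and indexing that the abstract describes, and the function names and bucket layout are assumptions for illustration.

```python
def compress_linf(signal, eps):
    """Greedy piecewise-constant compression with max error at most `eps`.

    Returns a list of (start index, constant value) buckets. A bucket is
    extended while the spread (max - min) of its values stays within 2 * eps,
    so the midpoint is within eps of every value it covers.
    """
    if not signal:
        return []
    buckets = []
    start, lo, hi = 0, signal[0], signal[0]
    for i in range(1, len(signal)):
        v = signal[i]
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo > 2 * eps:
            buckets.append((start, (lo + hi) / 2))  # close the current bucket
            start, lo, hi = i, v, v
        else:
            lo, hi = new_lo, new_hi
    buckets.append((start, (lo + hi) / 2))
    return buckets


def reconstruct(buckets, length):
    """Expand buckets back into a signal of the given length."""
    out = []
    for idx, (start, value) in enumerate(buckets):
        end = buckets[idx + 1][0] if idx + 1 < len(buckets) else length
        out.extend([value] * (end - start))
    return out


signal = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 1.0]
buckets = compress_linf(signal, eps=0.25)
approx = reconstruct(buckets, len(signal))
print(len(buckets), max(abs(a - b) for a, b in zip(signal, approx)) <= 0.25)
# prints 3 True
```

Seven readings collapse to three buckets while every reconstructed value stays within ε of the original, which is the per-signal guarantee the framework is built around.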