Fast Subsequence Matching in Time-Series Databases
SIGMOD '94, 1994
Cited by 430 (21 self)
Abstract
We present an efficient indexing method to locate 1-dimensional subsequences within a collection of sequences, such that the subsequences match a given (query) pattern within a specified tolerance. The idea is to map each data sequence into a small set of multidimensional rectangles in feature space. These rectangles can then be indexed using traditional spatial access methods, such as the R*-tree [9]. In more detail, we slide a window over the data sequence and extract the features of each window; the result is a trail in feature space. We propose an efficient and effective algorithm to divide such trails into subtrails, which are then represented by their Minimum Bounding Rectangles (MBRs). We also examine queries of varying lengths and show how to handle each case efficiently. We implemented our method and carried out experiments on synthetic and real data (stock price movements), comparing it with sequential scanning, which is the only obvious competitor. The results were excellent: our method reduced search time by a factor of 3 to 100.
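A minimal Python sketch of the pipeline described above, under several stated assumptions: Euclidean distance, the first `k` DFT coefficients of each window as its features, and fixed-size subtrails (the paper proposes an adaptive algorithm for splitting trails, which is not reproduced here). A linear scan over the MBRs stands in for the R*-tree, and only queries whose length equals the window length are handled. All function names are illustrative.

```python
import numpy as np

def window_features(seq, w, k):
    """Map every length-w sliding window of seq to its first k DFT coefficients.

    Each coefficient contributes its real and imaginary part, so a window becomes a
    2k-dimensional point; consecutive windows trace out a trail in feature space.
    """
    seq = np.asarray(seq, dtype=float)
    points = []
    for i in range(len(seq) - w + 1):
        # Dividing by sqrt(w) makes the DFT orthonormal, so by Parseval the
        # distance over a prefix of coefficients never exceeds the true distance.
        coeffs = np.fft.fft(seq[i:i + w])[:k] / np.sqrt(w)
        points.append(np.concatenate([coeffs.real, coeffs.imag]))
    return np.array(points)

def subtrail_mbrs(trail, size):
    """Cut the trail into fixed-size subtrails; return (start, low corner, high corner)."""
    return [(s, trail[s:s + size].min(axis=0), trail[s:s + size].max(axis=0))
            for s in range(0, len(trail), size)]

def mindist(point, low, high):
    """Euclidean distance from a point to the closest point of the rectangle [low, high]."""
    gap = np.maximum(low - point, 0) + np.maximum(point - high, 0)
    return float(np.sqrt(np.sum(gap ** 2)))

def subsequence_search(seq, query, eps, k=2, size=8):
    """Return start offsets i with ||seq[i:i+w] - query|| <= eps, where w = len(query)."""
    seq = np.asarray(seq, dtype=float)
    query = np.asarray(query, dtype=float)
    w = len(query)
    trail = window_features(seq, w, k)
    q_point = window_features(query, w, k)[0]
    hits = []
    for start, low, high in subtrail_mbrs(trail, size):
        if mindist(q_point, low, high) > eps:
            continue  # safe to skip: feature distance lower-bounds the true distance
        for i in range(start, min(start + size, len(trail))):
            # Post-processing: verify each candidate against the raw data.
            if np.linalg.norm(seq[i:i + w] - query) <= eps:
                hits.append(i)
    return hits
```

Because the distance from the query point to an MBR can only underestimate the true distance of any window inside it, pruning never causes a false dismissal; the verification loop then removes false alarms.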
Bursty and Hierarchical Structure in Streams
2002
Cited by 260 (2 self)
Abstract
A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. Email and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise: the appearance of a topic in a document stream is signaled by a "burst of activity," with certain features rising sharply in frequency as the topic emerges.
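The abstract states the premise but not the model, so the sketch below is a deliberately naive illustration of that premise rather than the paper's method: it flags runs of `window` consecutive documents in which the fraction containing a term exceeds `factor` times the term's rate over the whole stream. The function name and both parameters are hypothetical.

```python
def bursty_windows(docs, term, window, factor):
    """Return the start index of every run of `window` consecutive documents in which
    the fraction containing `term` exceeds `factor` times its rate over the whole stream.

    docs: documents in arrival order, each given as a set (or list) of tokens.
    """
    if not docs or window <= 0 or window > len(docs):
        return []
    contains = [term in doc for doc in docs]
    base_rate = sum(contains) / len(docs)  # background frequency of the term
    if base_rate == 0:
        return []
    bursts = []
    for start in range(len(docs) - window + 1):
        rate = sum(contains[start:start + window]) / window
        if rate > factor * base_rate:  # term is "rising sharply" in this window
            bursts.append(start)
    return bursts
```

A fixed threshold like this is sensitive to the choice of `window` and `factor`; more principled approaches model the stream's changing rate explicitly.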
Evaluating Probabilistic Queries over Imprecise Data
In SIGMOD, 2003
Cited by 219 (41 self)
Abstract
Sensors are often employed to monitor continuously changing entities like locations of moving objects and temperature. The sensor readings are reported to a database system, and are subsequently used to answer queries. Due to continuous changes in these values and limited resources (e.g., network bandwidth and battery power), the database may not be able to keep track of the actual values of the entities. Queries that use these old values may produce incorrect answers. However, if the degree of uncertainty between the actual data value and the database value is limited, one can place more confidence in the answers to the queries. More generally, query answers can be augmented with probabilistic guarantees of the validity of the answers. In this paper, we study probabilistic query evaluation based on uncertain data. A classification of queries is made based upon the nature of the result set. For each class, we develop algorithms for computing probabilistic answers, and provide efficient indexing and numeric solutions. We address the important issue of measuring the quality of the answers to these queries, and provide algorithms for efficiently pulling data from relevant sensors or moving objects in order to improve the quality of the executing queries. Extensive experiments
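As a small illustration of one query class, the sketch below answers a range query over uncertain sensor readings, assuming each reading is known only to lie in an interval `[low, high]` with a uniform pdf (the abstract does not fix a distribution). `UncertainValue` and `probabilistic_range_query` are illustrative names, not the paper's API.

```python
from dataclasses import dataclass

@dataclass
class UncertainValue:
    low: float   # lower bound of the uncertainty interval
    high: float  # upper bound of the uncertainty interval

    def prob_in_range(self, a, b):
        """P(a <= X <= b), assuming X is uniformly distributed on [low, high]."""
        if self.high == self.low:  # degenerate interval: the value is known exactly
            return 1.0 if a <= self.low <= b else 0.0
        overlap = max(0.0, min(b, self.high) - max(a, self.low))
        return overlap / (self.high - self.low)

def probabilistic_range_query(objects, a, b):
    """Return {object id: probability} for every object with non-zero probability."""
    result = {}
    for oid, val in objects.items():
        p = val.prob_in_range(a, b)
        if p > 0:
            result[oid] = p
    return result
```

For example, a sensor whose last report places it in `[10, 20]` has probability 0.5 of lying in the query range `[15, 20]`.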
Efficient time series matching by wavelets
Proc. of 15th Int'l Conf. on Data Engineering, 1999
Cited by 205 (1 self)
Abstract
Time series stored as feature vectors can be indexed by multidimensional index trees such as R-trees for fast retrieval. Because of the dimensionality curse, transformations are applied to time series to reduce the number of dimensions of the feature vectors. Different transformations can be applied, such as the Discrete Fourier Transform (DFT), the Discrete Wavelet Transform (DWT), the Karhunen-Loeve (KL) transform, or Singular Value Decomposition (SVD). While the use of DFT and of the KL transform or SVD has been studied in the literature, to our knowledge there is no in-depth study of the application of DWT. In this paper, we propose using the Haar Wavelet Transform for time series indexing. The major contributions are: (1) we show that Euclidean distance is preserved in the Haar-transformed domain, so no false dismissals occur; (2) we show through experiments that the Haar transform can outperform DFT; (3) we suggest a new similarity model that accommodates vertical shifts of time series; and (4) we propose a two-phase method for efficient nearest-neighbor queries in time series databases.
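The sketch below implements an orthonormal Haar transform in plain Python for sequences whose length is a power of 2, illustrating contribution (1): because the transform is orthonormal, Euclidean distance is unchanged, and keeping only the first `k` (coarsest) coefficients can only shrink the distance, so a filter on those coefficients causes no false dismissals. The function names are illustrative.

```python
import math

def haar(x):
    """Orthonormal Haar wavelet transform; len(x) must be a power of 2.

    Output order is coarse to fine: overall average first, finest details last.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    out = [float(v) for v in x]
    length = n
    while length > 1:
        half = length // 2
        # Scaling by 1/sqrt(2) (not 1/2) keeps each step orthonormal.
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:length] = avg + diff
        length = half
    return out

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

An index would store only `haar(x)[:k]` for a small `k`; candidates retrieved with that truncated distance are then checked against the full series.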
The TV-tree: an index structure for high-dimensional data
VLDB Journal, 1994
Cited by 193 (7 self)
Abstract
We propose a file structure to index high-dimensionality data, typically points in some feature space. The idea is to use only a few of the features, utilizing additional features only when the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such 'varying-length' feature vectors. Finally, we report simulation results comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses. Type of Contribution: New Index Structure, for high-dimensionality feature spaces. Algorithms and performance measurements. Keywords: Spatial Index, Similarity Retrieval, Query by Content. 1 Introduction. Many applications require enhanced indexing, capable of performing similarity searching on several non-traditional ('exotic') data types. The targ...
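The principle of using only a few features and consulting more only when needed can be illustrated, outside any tree structure, by a range search that examines features a few at a time and abandons a point as soon as its partial distance exceeds the radius. This is a hedged sketch of the idea only, not the TV-tree itself; it assumes the features are already ordered by decreasing discriminatory power (for example, coefficients of an energy-compacting transform), and the function and its `step` parameter are hypothetical.

```python
def progressive_range_search(points, query, radius, step=2):
    """Return indices of points within `radius` of `query` (Euclidean distance).

    Features are examined `step` at a time, in order. Each extra feature can only
    increase the squared distance, so a point is safely discarded as soon as the
    partial distance exceeds the radius; far-away points are rejected early.
    """
    dims = len(query)
    limit = radius ** 2
    results = []
    for idx, p in enumerate(points):
        partial = 0.0
        for start in range(0, dims, step):
            for d in range(start, min(start + step, dims)):
                partial += (p[d] - query[d]) ** 2
            if partial > limit:
                break  # already too far: skip the remaining features
        else:
            results.append(idx)  # all features examined and still within radius
    return results
```

The fewer features a typical point needs before it is rejected, the larger the savings; that is the intuition behind activating extra dimensions only where they add discriminatory power.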
Adaptive Routing for Intermittently Connected Mobile Ad Hoc Networks
In Proc. WOWMOM, 2005
Cited by 112 (28 self)
Abstract
The vast majority of mobile ad hoc networking research makes a very large assumption: that communication can only take place between nodes that are simultaneously accessible within the same connected cloud (i.e., that communication is synchronous). In reality, this assumption is likely to be a poor one, particularly for sparsely or irregularly populated environments. In this paper, we present the Context-Aware Routing (CAR) algorithm. CAR is a novel approach to providing asynchronous communication in partially connected mobile ad hoc networks, based on the intelligent placement of messages. We discuss the details of the algorithm and then present simulation results demonstrating that nodes can exploit context information to make local decisions that lead to good delivery ratios and latencies with small overheads.
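The abstract does not list the context attributes or how they are combined, so the sketch below is an assumption-laden illustration of a local, context-based forwarding decision: each node scores itself and its neighbours with a weighted sum of hypothetical attributes (`meet_prob`, `mobility`, `battery`) and hands the message to the highest-scoring node, keeping it otherwise. The attribute names, the weights, and the weighted-sum utility are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class NodeContext:
    node_id: str
    meet_prob: float  # hypothetical: estimated chance of meeting the destination (0..1)
    mobility: float   # hypothetical: normalized rate of connectivity change (0..1)
    battery: float    # hypothetical: remaining battery fraction (0..1)

# Hypothetical weights; a real deployment would tune or learn these.
WEIGHTS = {"meet_prob": 0.6, "mobility": 0.25, "battery": 0.15}

def utility(ctx):
    """Combine the context attributes into one score (higher is a better carrier)."""
    return (WEIGHTS["meet_prob"] * ctx.meet_prob
            + WEIGHTS["mobility"] * ctx.mobility
            + WEIGHTS["battery"] * ctx.battery)

def choose_carrier(own, neighbors):
    """Return the id of the best carrier; the message stays put if no neighbour beats us."""
    best = own
    for n in neighbors:
        if utility(n) > utility(best):
            best = n
    return best.node_id
```

Because the decision uses only information about the node and its current neighbours, it fits the asynchronous, partially connected setting: when no neighbour is a better carrier, the node stores the message until the situation changes.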
Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data
Proc. 30th Int'l Conf. Very Large Data Bases (VLDB), 2004
Cited by 105 (20 self)
Abstract
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of uncertainty intervals that model each data item as a range and associated probability density function (pdf), rather than a single value, has recently been proposed. Querying these uncertain data introduces imprecision into answers, in the form of probability values that specify the likelihood that the answer satisfies the query. These queries are more expensive to evaluate than their traditional counterparts, but are guaranteed to be correct and are more informative due to the probabilities accompanying the answers. Although the answer probabilities are useful, for many applications it is only necessary to know whether the probability exceeds a given threshold; we term these Probabilistic Threshold Queries (PTQ). In this paper we address the efficient computation of these types of queries. In particular, we develop two index structures and associated algorithms to efficiently answer PTQs. The first index scheme is based on the idea of augmenting an R-tree with uncertainty information. We establish the difficulty
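A minimal sketch of a Probabilistic Threshold Query, assuming each object's pdf is a histogram (uniform within each bin) and using a linear scan in place of the paper's index structures. The only pruning shown is the trivial one, skipping objects whose uncertainty region does not intersect the query range. `HistogramPdf` and `ptq` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class HistogramPdf:
    edges: list   # ascending bin edges; len(edges) == len(masses) + 1
    masses: list  # probability mass of each bin (summing to 1), uniform within a bin

    def prob_in_range(self, a, b):
        """Probability that the value lies in [a, b]."""
        p = 0.0
        for lo, hi, m in zip(self.edges, self.edges[1:], self.masses):
            overlap = max(0.0, min(b, hi) - max(a, lo))
            if overlap > 0:
                p += m * overlap / (hi - lo)  # fraction of this bin inside the query
        return p

def ptq(objects, a, b, threshold):
    """Return ids of objects whose probability of lying in [a, b] exceeds threshold."""
    answers = []
    for oid, pdf in objects.items():
        if pdf.edges[-1] <= a or pdf.edges[0] >= b:
            continue  # uncertainty region disjoint from the query: probability is 0
        if pdf.prob_in_range(a, b) > threshold:
            answers.append(oid)
    return sorted(answers)
```

A PTQ only needs a yes/no answer per object, so an index can prune an object whenever a cheap bound shows its probability cannot exceed the threshold, without computing the exact value; the disjointness test above is the simplest such bound.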
Advanced Spectral Methods for Climatic Time Series
2001
Cited by 96 (30 self)
Abstract
The analysis of uni- or multivariate time series provides crucial information to describe, understand, and predict climatic variability. The discovery and implementation of a number of novel methods for extracting useful information from time series has recently revitalized this classical field of study. Considerable progress has also been made in interpreting the information so obtained in terms of dynamical systems theory.
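The abstract names no specific method, so the sketch below shows only the classical raw periodogram, the baseline that more advanced spectral estimators refine; it is not one of the methods the paper describes. `dominant_period` is a hypothetical helper that returns the period of the strongest spectral peak.

```python
import numpy as np

def dominant_period(x, dt=1.0):
    """Return the period (in units of dt) of the largest peak in the raw periodogram."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                     # remove the mean so the zero frequency does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2  # raw (unnormalized) periodogram
    freqs = np.fft.rfftfreq(len(x), d=dt)
    k = np.argmax(power[1:]) + 1         # skip the zero-frequency bin
    return 1.0 / freqs[k]
```

For monthly data dominated by an annual cycle, the function would be expected to return a period close to 12. The raw periodogram is noisy and leaks power between frequency bins when a period does not fit the record length exactly, which is one motivation for more refined spectral estimators.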
Online Data Mining for Co-Evolving Time Sequences
In Proceedings of the 16th International Conference on Data Engineering, 2000
Cited by 66 (4 self)
Abstract
In many applications, the data of interest comprise multiple sequences that evolve over time; examples include currency exchange rates and network traffic data. We develop a fast method to analyze such co-evolving time sequences jointly to allow (a) estimation/forecasting of missing/delayed/future values, (b) quantitative data mining, and (c) outlier detection. Our method, MUSCLES, adapts to changing correlations among time sequences. It can handle indefinitely long sequences efficiently using an incremental algorithm, and it requires only a small amount of storage and few I/O operations. To make it scale to a large number of sequences, we present a variation, the Selective MUSCLES method, and propose an efficient algorithm to reduce the problem size. Experiments on real datasets show that MUSCLES outperforms popular competitors in prediction accuracy by up to 10 times and discovers interesting correlations. Moreover, Selective MUSCLES scales up very well for large numbers of sequences, reducing response time by up to 110 times compared with MUSCLES, and sometimes even improves prediction quality.
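The abstract does not spell out the estimator, so the following is a hedged sketch of one standard way to get an incremental, small-memory, adaptive linear predictor: recursive least squares with an exponential forgetting factor, which predicts one sequence's current value from the current values of the others. The class name and parameter defaults are illustrative assumptions, not MUSCLES itself.

```python
import numpy as np

class RecursiveLeastSquares:
    """Online linear regression with exponential forgetting (forgetting < 1 discounts old samples)."""

    def __init__(self, n_features, forgetting=0.99, delta=1000.0):
        self.w = np.zeros(n_features)             # regression coefficients
        self.P = np.eye(n_features) * delta       # inverse-covariance estimate, large at start
        self.lam = forgetting

    def predict(self, x):
        """Estimate the target from the current predictor values x."""
        return float(self.w @ np.asarray(x, dtype=float))

    def update(self, x, y):
        """Incorporate one new sample in O(v^2) time and memory, for v predictors."""
        x = np.asarray(x, dtype=float)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        err = y - self.w @ x                      # a priori prediction error
        self.w = self.w + gain * err
        self.P = (self.P - np.outer(gain, Px)) / self.lam
```

To estimate a missing or delayed value at time t, call `predict` with the other sequences' values at t, then call `update` once the true value arrives. A forgetting factor below 1 lets the weights track changing correlations, and storage does not grow with sequence length; a large prediction error can also serve as a simple outlier signal.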
Evolutionary Spectral Clustering by Incorporating Temporal Smoothness
2007
Cited by 62 (7 self)
Abstract
Evolutionary clustering is an emerging research area essential to important applications such as clustering dynamic Web and blog content and clustering data streams. In evolutionary clustering, a good clustering result should fit the current data well while not deviating too dramatically from the recent history. To fulfill this dual purpose, a measure of temporal smoothness is integrated into the overall measure of clustering quality. In this paper, we propose two frameworks that incorporate temporal smoothness in evolutionary spectral clustering. For both frameworks, we start with intuitions gained from the well-known k-means clustering problem, and then propose and solve corresponding cost functions for the evolutionary spectral clustering problems. Our solutions provide more stable and consistent clustering results that are less sensitive to short-term noise while remaining adaptive to long-term cluster drifts. Furthermore, we demonstrate that our methods provide the optimal solutions to the relaxed versions of the corresponding evolutionary k-means clustering problems. Performance experiments over a number of real and synthetic data sets show that our evolutionary spectral clustering methods provide more robust clustering results that are insensitive to noise and can adapt to data drifts.
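A hedged sketch of how temporal smoothness can enter spectral clustering: cluster a blend `alpha * W_now + (1 - alpha) * W_prev` of the current and previous affinity matrices, so that `alpha` trades fitting the current data against staying close to recent history. This simplifies the abstract's setting (two clusters found from the sign of one eigenvector instead of k-means on several) and is not claimed to be either of the paper's two frameworks; the function names are illustrative.

```python
import numpy as np

def two_way_spectral(W):
    """Split nodes into two clusters by the sign of the second eigenvector of D^-1/2 W D^-1/2."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # normalized (symmetric) affinity
    vals, vecs = np.linalg.eigh(M)                      # eigenvalues in ascending order
    v = vecs[:, -2] * d_inv_sqrt                        # second-largest eigenvector, rescaled
    return (v > 0).astype(int)

def evolutionary_two_way(W_now, W_prev, alpha=0.8):
    """Cluster a blend of current and previous affinities; alpha weights the current snapshot."""
    return two_way_spectral(alpha * W_now + (1 - alpha) * W_prev)
```

With `alpha = 1` this reduces to ordinary spectral clustering of the current snapshot; lowering `alpha` damps short-term noise in `W_now` at the cost of reacting more slowly to genuine drift. Cluster labels (0 or 1) are arbitrary and may swap between time steps.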