Local correlation tracking in time series
In ICDM, 2006
Abstract

Cited by 18 (1 self)
We address the problem of capturing and tracking local correlations among time-evolving time series. Our approach is based on comparing the local autocovariance matrices (via their spectral decompositions) of each series and generalizes the notion of linear cross-correlation. In this way, it is possible to concisely capture a wide variety of local patterns or trends. Our method produces a general similarity score, which evolves over time and accurately reflects the changing relationships. Finally, it can also be estimated incrementally, in a streaming setting. We demonstrate its usefulness, robustness and efficiency on a wide range of real datasets.
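The comparison of local autocovariance matrices via their spectral decompositions can be sketched as follows: take the top-k eigenvectors of each series' local lag-window covariance and score the pair by the principal angles between the resulting subspaces. This is a hedged illustration of the general idea, not the authors' exact estimator; the window length `w`, subspace dimension `k`, and function names are illustrative choices.

```python
import numpy as np

def local_autocov_subspace(x, t, w=20, k=3):
    # Lag-window vectors from series x just before time t, then the
    # top-k eigenvectors of their w x w autocovariance matrix.
    seg = np.lib.stride_tricks.sliding_window_view(x[max(0, t - 2 * w):t + 1], w)
    C = np.cov(seg.T)                       # w variables, one per lag
    _, vecs = np.linalg.eigh(C)             # eigenvalues ascending
    return vecs[:, -k:]                     # principal subspace (w x k)

def local_correlation_score(x, y, t, w=20, k=3):
    # Mean squared cosine of the principal angles between the two local
    # principal subspaces: 1 means identical local patterns, 0 unrelated.
    U = local_autocov_subspace(x, t, w, k)
    V = local_autocov_subspace(y, t, w, k)
    s = np.linalg.svd(U.T @ V, compute_uv=False)  # cosines of principal angles
    return float(np.mean(s ** 2))
```

Because the score depends only on subspaces, it is invariant to scaling and offset of either series, which is what makes it a generalization of plain cross-correlation.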
Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking
Abstract

Cited by 11 (0 self)
We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be performed incrementally, using limited processing time and buffer space, making batch approaches unsuitable. Second, the characteristics of streams evolve over time. Consequently, approaches based on global analysis of the data are not adequate. We show that it is possible to efficiently and effectively track the correlation and autocorrelation structure of multivariate streams and leverage it to add noise which maximally preserves privacy, in the sense that it is very hard to remove. Our techniques achieve much better results than previous static, global approaches, while requiring limited processing time and memory. We provide both a mathematical analysis and experimental evaluation on real data to validate the correctness, efficiency, and effectiveness of our algorithms.
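The core idea of correlation-tracked perturbation, shaping the noise so it follows the stream's own correlation structure and is therefore hard to filter out by projecting onto the principal components, can be sketched generically. This is not the paper's streaming algorithm (which tracks correlation incrementally); it is a batch illustration over one window, and `budget` and the proportional variance allocation are illustrative assumptions.

```python
import numpy as np

def correlated_noise_perturb(window, budget, rng):
    """Perturb a window of multivariate stream data (rows = time ticks,
    columns = streams) with Gaussian noise whose covariance mirrors the
    window's own covariance, so PCA-based noise removal is ineffective.
    `budget` is the total noise variance to distribute across components.
    A generic sketch, not the paper's incremental algorithm."""
    C = np.cov(window, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 0.0, None)
    # Allocate noise variance proportionally to each component's energy.
    share = budget * vals / vals.sum()
    z = rng.standard_normal(window.shape)
    noise = (z * np.sqrt(share)) @ vecs.T   # rotate noise into the data's basis
    return window + noise
```

Noise drawn this way has covariance `vecs @ diag(share) @ vecs.T`, i.e. the same principal directions as the data, which is the property the abstract relies on.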
Scalable algorithms for distribution search
In ICDM, 2009
Abstract

Cited by 4 (2 self)
Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data mining for large distribution sets. Given n distributions of multiple categories and a query distribution Q, we want to find similar clouds (i.e., distributions), to discover patterns, rules and outlier clouds. For example, consider the numerical case of sales of items, where, for each item sold, we record the unit price and quantity; then, each customer is represented as a distribution of 2-d points (one for each item he/she bought). We want to find similar users, e.g., for market segmentation or anomaly/fraud detection. We propose to address this problem and present D-Search, which includes fast and effective algorithms for similarity search in large distribution datasets. Our main contributions are (1) approximate KL divergence, which can speed up cloud-similarity computations, and (2) multi-step sequential scan, which efficiently prunes a significant number of search candidates and leads to a direct reduction in the search cost. We also introduce an extended version of D-Search: (3) time-series distribution mining, which finds similar subsequences in time-series distribution datasets. Extensive experiments on real multidimensional datasets show that our solution achieves up to 2,300× faster wall-clock time than the naive implementation without sacrificing accuracy.
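One common way to approximate the KL divergence between two point clouds, and thus speed up cloud-similarity computations, is to fit a Gaussian to each cloud and use the closed-form Gaussian KL. This is a generic stand-in for the paper's approximation, not its exact formula:

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    # Closed-form KL(P || Q) for two multivariate Gaussians.
    d = mu_p.shape[0]
    iq = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(iq @ cov_p) + diff @ iq @ diff - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def cloud_kl(P, Q):
    # Fit a Gaussian to each cloud (rows = points) and compare the fits,
    # replacing an expensive empirical KL estimate with O(d^3) algebra.
    return gaussian_kl(P.mean(0), np.cov(P, rowvar=False),
                       Q.mean(0), np.cov(Q, rowvar=False))
```

The approximation is exact only for Gaussian clouds, but it is cheap and symmetric in cost, which is what matters for pruning candidates in a multi-step scan.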
Efficient Processing of Warping Time Series Join of Motion Capture Data
Abstract

Cited by 3 (0 self)
Discovering non-trivial matching subsequences from two time series is very useful in synthesizing novel time series. This can be applied to applications such as motion synthesis, where smooth and natural motion sequences often have to be generated from existing motion sequences. We first address this problem by defining it as a problem of lε-join over two time series. Given two time series, the goal of the lε-join is to find those non-trivial matching subsequences by detecting maximal l-connections from the ε-matching matrix of the two time series. Given a querying motion sequence, the lε-join can be applied to retrieve all connectable motion sequences from a database of motion sequences. To support efficient lε-join of time series, we propose a two-step filter-and-refine algorithm, called the Warping Time Series Join (WTSJ) algorithm. The filtering step prunes those sparse regions of the ε-matching matrix where there are no maximal l-connections, without incurring costly computation. The refinement step detects closed l-connections within regions that cannot be pruned by the filtering step. To speed up the computation of the ε-matching matrix, we propose a block-based time series summarization method, based on which the blockwise ε-matching matrix is first computed. Much of the pairwise distance computation can then be avoided by applying the filtering algorithm on the blockwise ε-matching matrix. Extensive experiments on lε-join of motion capture sequences are conducted. The results confirm the efficiency and effectiveness of our proposed algorithm in processing lε-join of motion capture time series.
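An ε-matching matrix as described above can be sketched in a few lines: `M[i, j]` records whether the i-th element of one series matches the j-th element of the other within tolerance ε, and matching subsequences appear as diagonal runs of 1s. The run-length helper below is a simplified stand-in for detecting maximal l-connections (it ignores warping and closedness), with illustrative names:

```python
import numpy as np

def eps_matching_matrix(x, y, eps):
    # Binary matrix: M[i, j] = 1 iff |x[i] - y[j]| <= eps (1-d series).
    return (np.abs(x[:, None] - y[None, :]) <= eps).astype(np.uint8)

def longest_diagonal_run(M):
    # Length of the longest strictly diagonal run of matches -- a
    # simplified proxy for a maximal l-connection of length >= l.
    n, m = M.shape
    best = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(n):
        for j in range(m):
            if M[i, j]:
                best[i + 1, j + 1] = best[i, j] + 1
    return int(best.max())
```

The filtering step in the abstract works on a coarse, block-level version of this matrix, so whole sparse blocks are skipped before any element-wise distances are computed.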
Online Detecting and Predicting Special Patterns over Financial Data Streams
Abstract

Cited by 3 (0 self)
Detecting special patterns online over financial data streams is an interesting and significant task. Many existing algorithms treat it as a subsequence similarity matching problem. However, pattern detection on streaming time series is inherently expensive by this means. We propose an efficient segmenting algorithm, ONSP (ONline Segmenting and Pruning), which finds the end points of special patterns. Moreover, we introduce a novel metric distance function that agrees better with human perceptions of pattern similarity. During the process, our system presents a pattern matching algorithm to efficiently match possible emerging patterns among data streams, and a probability-based prediction approach to predict possible patterns that have not yet emerged. Experimental results show that these approaches are effective and efficient for online pattern detection and prediction over thousands of financial data streams.
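End-point detection for patterns on a stream is commonly done by online segmentation: grow a segment while a straight line still fits the buffered points, and emit a breakpoint otherwise. The sketch below shows this generic sliding-window approach; it illustrates the kind of segmentation ONSP performs but is not the ONSP algorithm itself, and `max_err` is an illustrative parameter.

```python
import numpy as np

def online_segment(stream, max_err=0.5):
    """Emit a breakpoint whenever the current segment can no longer be
    fitted by one line within max_err (max absolute residual).  A generic
    sliding-window segmenter, not the paper's ONSP algorithm."""
    breaks, start = [], 0
    for end in range(2, len(stream) + 1):
        seg = np.asarray(stream[start:end], dtype=float)
        t = np.arange(len(seg))
        slope, icept = np.polyfit(t, seg, 1)     # least-squares line
        if np.abs(seg - (slope * t + icept)).max() > max_err:
            breaks.append(end - 1)               # previous point closes the segment
            start = end - 1
    return breaks
```

The emitted breakpoints are exactly the candidate end points a pattern matcher would then inspect, which keeps per-tick cost low compared to sliding full subsequence matching.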
M (2011) D-Search: an efficient and exact search algorithm for large distribution sets. Knowl Inf Syst 29(1):131–157
Abstract

Cited by 2 (0 self)
Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data mining for large distribution sets. Given n distributions of multiple categories and a query distribution Q, we want to find similar clouds (i.e., distributions), to discover patterns, rules and outlier clouds. For example, consider the numerical case of sales of items, where, for each item sold, we record the unit price and quantity; then, each customer is represented as a distribution of 2-d points (one for each item he/she bought). We want to find similar users, e.g., for market segmentation or anomaly/fraud detection. We propose to address this problem and present D-Search, which includes fast and effective algorithms for similarity search in large distribution datasets. Our main contributions are (1) approximate KL divergence, which can speed up cloud-similarity computations, and (2) multi-step sequential scan, which efficiently prunes a significant number of search candidates and leads to a direct reduction in the search cost. We also introduce an extended version of D-Search: (3) time-series distribution mining, which finds similar subsequences in time-series distribution datasets. Extensive experiments on real multidimensional datasets show that our solution achieves a wall-clock time up to 2,300 times faster than the naive implementation without sacrificing accuracy.
PGG: An online pattern-based approach for stream variation management
 J. Comput. Sci. Technol
Abstract

Cited by 1 (1 self)
Many database applications require efficient processing of data streams with value variations and fluctuating sampling frequency. The variations typically imply fundamental features of the stream and important domain knowledge about the underlying objects. In some data streams, successive events seem to recur in a certain time interval, but the data indeed evolves with tiny differences as time elapses. This feature, so-called pseudo-periodicity, poses a new challenge to stream variation management. This study focuses on the online management of variations over such streams. The idea can be applied to many scenarios, such as patient vital sign monitoring in medical applications. This paper proposes a new method named Pattern Growth Graph (PGG) to detect and manage variations over evolving streams with the following features: 1) it adopts the wave-pattern to capture the major information of data evolution and represent it compactly; 2) it detects the variations in a single pass over the stream with the help of a wave-pattern matching algorithm; 3) it stores only the differing segments of the pattern for the incoming stream, and hence substantially compresses the data without losing important information; 4) it distinguishes meaningful data changes from noise and reconstructs the stream with acceptable accuracy. Extensive experiments on real datasets containing millions of data items, as well as a prototype system, demonstrate the feasibility and effectiveness of the proposed scheme.
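The third feature, storing only the differing segments of a recurring pattern, can be sketched as delta storage against a reference cycle: segments whose deviation stays under a tolerance are not stored and are recovered from the reference at reconstruction time. This is a minimal sketch of the idea, not PGG itself; `tol` and `seg_len` are illustrative parameters.

```python
import numpy as np

def delta_store(reference, cycle, tol=0.1, seg_len=8):
    """Store a new quasi-periodic cycle as deltas against a reference
    pattern: keep only the segments whose maximum deviation from the
    reference exceeds tol.  A generic sketch of delta storage, not the
    PGG wave-pattern representation."""
    ref = np.asarray(reference, dtype=float)
    cyc = np.asarray(cycle, dtype=float)
    stored = {}
    for s in range(0, len(cyc), seg_len):
        r, c = ref[s:s + seg_len], cyc[s:s + seg_len]
        if np.abs(c - r).max() > tol:
            stored[s] = c.copy()           # keep only deviating segments
    return stored

def reconstruct(reference, stored):
    # Rebuild the cycle: reference everywhere, stored deltas where kept.
    out = np.asarray(reference, dtype=float).copy()
    for s, seg in stored.items():
        out[s:s + len(seg)] = seg
    return out
```

Deviations under `tol` are treated as noise and dropped, which is also why reconstruction is only accurate up to that tolerance, matching feature 4) above.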
Time Series Data Mining Methods: A Review
, 2015
Abstract
Today, real-world time series data sets can reach a trillion observations and more. The data miner's task is to detect new information hidden in this massive amount of data. While well-known techniques for data mining on cross-sectional data have been developed, time series data mining methods are not yet as sophisticated and established. Large time series bring problems such as very high dimensionality, and to date researchers have not agreed on best practices in this regard. This review gives an overview of the challenges of large time series and the problem-solving approaches proposed by the time series data mining community. We illustrate the most important techniques with Google Trends data. Moreover, we review current research directions and point out open research questions.