Local correlation tracking in time series
In ICDM, 2006
Abstract

Cited by 18 (1 self)
We address the problem of capturing and tracking local correlations among time-evolving time series. Our approach is based on comparing the local autocovariance matrices (via their spectral decompositions) of each series and generalizes the notion of linear cross-correlation. In this way, it is possible to concisely capture a wide variety of local patterns or trends. Our method produces a general similarity score, which evolves over time and accurately reflects the changing relationships. Finally, it can also be estimated incrementally, in a streaming setting. We demonstrate its usefulness, robustness and efficiency on a wide range of real datasets.
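The comparison of local autocovariance matrices via their spectral decompositions can be sketched as follows: take the top-k eigenvectors of each series' local lag-window covariance and score the pair by the principal angles between the resulting subspaces. This is a hedged illustration of the general idea, not the authors' exact estimator; the window length `w`, subspace dimension `k`, and function names are illustrative choices.

```python
import numpy as np

def local_autocov_subspace(x, t, w=20, k=3):
    # Lag-window vectors from series x just before time t, then the
    # top-k eigenvectors of their w x w autocovariance matrix.
    seg = np.lib.stride_tricks.sliding_window_view(x[max(0, t - 2 * w):t + 1], w)
    C = np.cov(seg.T)                       # w variables, one per lag
    _, vecs = np.linalg.eigh(C)             # eigenvalues ascending
    return vecs[:, -k:]                     # principal subspace (w x k)

def local_correlation_score(x, y, t, w=20, k=3):
    # Mean squared cosine of the principal angles between the two local
    # principal subspaces: 1 means identical local patterns, 0 unrelated.
    U = local_autocov_subspace(x, t, w, k)
    V = local_autocov_subspace(y, t, w, k)
    s = np.linalg.svd(U.T @ V, compute_uv=False)  # cosines of principal angles
    return float(np.mean(s ** 2))
```

Because the score depends only on subspaces, it is invariant to scaling and offset of either series, which is what makes it a generalization of plain cross-correlation.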
Hiding in the Crowd: Privacy Preservation on Evolving Streams through Correlation Tracking
Abstract

Cited by 11 (0 self)
We address the problem of preserving privacy in streams, which has received surprisingly limited attention. For static data, a well-studied and widely used approach is based on random perturbation of the data values. However, streams pose additional challenges. First, analysis of the data has to be performed incrementally, using limited processing time and buffer space, making batch approaches unsuitable. Second, the characteristics of streams evolve over time. Consequently, approaches based on global analysis of the data are not adequate. We show that it is possible to efficiently and effectively track the correlation and autocorrelation structure of multivariate streams and leverage it to add noise which maximally preserves privacy, in the sense that it is very hard to remove. Our techniques achieve much better results than previous static, global approaches, while requiring limited processing time and memory. We provide both a mathematical analysis and experimental evaluation on real data to validate the correctness, efficiency, and effectiveness of our algorithms.
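The core idea of correlation-tracked perturbation, shaping the noise so it follows the stream's own correlation structure and is therefore hard to filter out by projecting onto the principal components, can be sketched generically. This is not the paper's streaming algorithm (which tracks correlation incrementally); it is a batch illustration over one window, and `budget` and the proportional variance allocation are illustrative assumptions.

```python
import numpy as np

def correlated_noise_perturb(window, budget, rng):
    """Perturb a window of multivariate stream data (rows = time ticks,
    columns = streams) with Gaussian noise whose covariance mirrors the
    window's own covariance, so PCA-based noise removal is ineffective.
    `budget` is the total noise variance to distribute across components.
    A generic sketch, not the paper's incremental algorithm."""
    C = np.cov(window, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    vals = np.clip(vals, 0.0, None)
    # Allocate noise variance proportionally to each component's energy.
    share = budget * vals / vals.sum()
    z = rng.standard_normal(window.shape)
    noise = (z * np.sqrt(share)) @ vecs.T   # rotate noise into the data's basis
    return window + noise
```

Noise drawn this way has covariance `vecs @ diag(share) @ vecs.T`, i.e. the same principal directions as the data, which is the property the abstract relies on.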
Scalable algorithms for distribution search
In ICDM, 2009
Abstract

Cited by 4 (2 self)
Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data mining for large distribution sets. Given n distributions of multiple categories and a query distribution Q, we want to find similar clouds (i.e., distributions), to discover patterns, rules and outlier clouds. For example, consider the numerical case of sales of items, where, for each item sold, we record the unit price and quantity; then, each customer is represented as a distribution of 2-d points (one for each item he/she bought). We want to find similar users, e.g., for market segmentation or anomaly/fraud detection. We propose to address this problem and present D-Search, which includes fast and effective algorithms for similarity search in large distribution datasets. Our main contributions are (1) approximate KL divergence, which can speed up cloud-similarity computations, and (2) multi-step sequential scan, which efficiently prunes a significant number of search candidates and leads to a direct reduction in the search cost. We also introduce an extended version of D-Search: (3) time-series distribution mining, which finds similar subsequences in time-series distribution datasets. Extensive experiments on real multidimensional datasets show that our solution achieves up to 2,300× faster wall-clock time than the naive implementation without sacrificing accuracy.
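One common way to approximate the KL divergence between two point clouds, and thus speed up cloud-similarity computations, is to fit a Gaussian to each cloud and use the closed-form Gaussian KL. This is a generic stand-in for the paper's approximation, not its exact formula:

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    # Closed-form KL(P || Q) for two multivariate Gaussians.
    d = mu_p.shape[0]
    iq = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(iq @ cov_p) + diff @ iq @ diff - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def cloud_kl(P, Q):
    # Fit a Gaussian to each cloud (rows = points) and compare the fits,
    # replacing an expensive empirical KL estimate with O(d^3) algebra.
    return gaussian_kl(P.mean(0), np.cov(P, rowvar=False),
                       Q.mean(0), np.cov(Q, rowvar=False))
```

The approximation is exact only for Gaussian clouds, but it is cheap and symmetric in cost, which is what matters for pruning candidates in a multi-step scan.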
Efficient Processing of Warping Time Series Join of Motion Capture Data
Abstract

Cited by 3 (0 self)
Discovering non-trivial matching subsequences from two time series is very useful in synthesizing novel time series. This can be applied to applications such as motion synthesis, where smooth and natural motion sequences often have to be generated from existing motion sequences. We first address this problem by defining it as a problem of lε-join over two time series. Given two time series, the goal of the lε-join is to find those non-trivial matching subsequences by detecting maximal l-connections from the ε-matching matrix of the two time series. Given a querying motion sequence, the lε-join can be applied to retrieve all connectable motion sequences from a database of motion sequences. To support efficient lε-join of time series, we propose a two-step filter-and-refine algorithm, called the Warping Time Series Join (WTSJ) algorithm. The filtering step prunes those sparse regions of the ε-matching matrix where there are no maximal l-connections, without incurring costly computation. The refinement step detects closed l-connections within regions that cannot be pruned by the filtering step. To speed up the computation of the ε-matching matrix, we propose a block-based time series summarization method, based on which the blockwise ε-matching matrix is first computed. Much of the pairwise distance computation can then be avoided by applying the filtering algorithm on the blockwise ε-matching matrix. Extensive experiments on lε-join of motion capture sequences are conducted. The results confirm the efficiency and effectiveness of our proposed algorithm in processing lε-join of motion capture time series.
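An ε-matching matrix as described above can be sketched in a few lines: `M[i, j]` records whether the i-th element of one series matches the j-th element of the other within tolerance ε, and matching subsequences appear as diagonal runs of 1s. The run-length helper below is a simplified stand-in for detecting maximal l-connections (it ignores warping and closedness), with illustrative names:

```python
import numpy as np

def eps_matching_matrix(x, y, eps):
    # Binary matrix: M[i, j] = 1 iff |x[i] - y[j]| <= eps (1-d series).
    return (np.abs(x[:, None] - y[None, :]) <= eps).astype(np.uint8)

def longest_diagonal_run(M):
    # Length of the longest strictly diagonal run of matches -- a
    # simplified proxy for a maximal l-connection of length >= l.
    n, m = M.shape
    best = np.zeros((n + 1, m + 1), dtype=int)
    for i in range(n):
        for j in range(m):
            if M[i, j]:
                best[i + 1, j + 1] = best[i, j] + 1
    return int(best.max())
```

The filtering step in the abstract works on a coarse, block-level version of this matrix, so whole sparse blocks are skipped before any element-wise distances are computed.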
Online Detecting and Predicting Special Patterns over Financial Data Streams
Abstract

Cited by 3 (0 self)
Detecting special patterns online over financial data streams is an interesting and significant task. Many existing algorithms treat it as a subsequence similarity matching problem. However, pattern detection on streaming time series is inherently expensive by this means. We propose an efficient segmenting algorithm, ONSP (ONline Segmenting and Pruning), which finds the end points of special patterns. Moreover, we introduce a novel metric distance function that agrees better with human perceptions of pattern similarity. During the process, our system presents a pattern matching algorithm to efficiently match possible emerging patterns among data streams, and a probability-based prediction approach to predict possible patterns that have not yet emerged. Experimental results show that these approaches are effective and efficient for online pattern detection and prediction over thousands of financial data streams.
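End-point detection for patterns on a stream is commonly done by online segmentation: grow a segment while a straight line still fits the buffered points, and emit a breakpoint otherwise. The sketch below shows this generic sliding-window approach; it illustrates the kind of segmentation ONSP performs but is not the ONSP algorithm itself, and `max_err` is an illustrative parameter.

```python
import numpy as np

def online_segment(stream, max_err=0.5):
    """Emit a breakpoint whenever the current segment can no longer be
    fitted by one line within max_err (max absolute residual).  A generic
    sliding-window segmenter, not the paper's ONSP algorithm."""
    breaks, start = [], 0
    for end in range(2, len(stream) + 1):
        seg = np.asarray(stream[start:end], dtype=float)
        t = np.arange(len(seg))
        slope, icept = np.polyfit(t, seg, 1)     # least-squares line
        if np.abs(seg - (slope * t + icept)).max() > max_err:
            breaks.append(end - 1)               # previous point closes the segment
            start = end - 1
    return breaks
```

The emitted breakpoints are exactly the candidate end points a pattern matcher would then inspect, which keeps per-tick cost low compared to sliding full subsequence matching.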
M (2011) D-Search: an efficient and exact search algorithm for large distribution sets. Knowl Inf Syst 29(1):131–157
Abstract

Cited by 2 (0 self)
Distribution data naturally arise in countless domains, such as meteorology, biology, geology, industry and economics. However, relatively little attention has been paid to data mining for large distribution sets. Given n distributions of multiple categories and a query distribution Q, we want to find similar clouds (i.e., distributions), to discover patterns, rules and outlier clouds. For example, consider the numerical case of sales of items, where, for each item sold, we record the unit price and quantity; then, each customer is represented as a distribution of 2-d points (one for each item he/she bought). We want to find similar users, e.g., for market segmentation or anomaly/fraud detection. We propose to address this problem and present D-Search, which includes fast and effective algorithms for similarity search in large distribution datasets. Our main contributions are (1) approximate KL divergence, which can speed up cloud-similarity computations, and (2) multi-step sequential scan, which efficiently prunes a significant number of search candidates and leads to a direct reduction in the search cost. We also introduce an extended version of D-Search: (3) time-series distribution mining, which finds similar subsequences in time-series distribution datasets. Extensive experiments on real multidimensional datasets show that our solution achieves a wall-clock time up to 2,300 times faster than the naive implementation without sacrificing accuracy.
PGG: An online pattern-based approach for stream variation management
 J. Comput. Sci. Technol
Abstract

Cited by 1 (1 self)
Many database applications require efficient processing of data streams with value variations and fluctuating sampling frequency. The variations typically imply fundamental features of the stream and important domain knowledge about the underlying objects. In some data streams, successive events seem to recur in a certain time interval, but the data indeed evolves with tiny differences as time elapses. This feature, so-called pseudo-periodicity, poses a new challenge to stream variation management. This study focuses on the online management of variations over such streams. The idea can be applied to many scenarios, such as patient vital sign monitoring in medical applications. This paper proposes a new method named Pattern Growth Graph (PGG) to detect and manage variations over evolving streams with the following features: 1) it adopts the wave-pattern to capture the major information of data evolution and represent it compactly; 2) it detects the variations in a single pass over the stream with the help of a wave-pattern matching algorithm; 3) it stores only the differing segments of the pattern for the incoming stream, and hence substantially compresses the data without losing important information; 4) it distinguishes meaningful data changes from noise and reconstructs the stream with acceptable accuracy. Extensive experiments on real datasets containing millions of data items, as well as a prototype system, demonstrate the feasibility and effectiveness of the proposed scheme.
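The third feature, storing only the differing segments of a recurring pattern, can be sketched as delta storage against a reference cycle: segments whose deviation stays under a tolerance are not stored and are recovered from the reference at reconstruction time. This is a minimal sketch of the idea, not PGG itself; `tol` and `seg_len` are illustrative parameters.

```python
import numpy as np

def delta_store(reference, cycle, tol=0.1, seg_len=8):
    """Store a new quasi-periodic cycle as deltas against a reference
    pattern: keep only the segments whose maximum deviation from the
    reference exceeds tol.  A generic sketch of delta storage, not the
    PGG wave-pattern representation."""
    ref = np.asarray(reference, dtype=float)
    cyc = np.asarray(cycle, dtype=float)
    stored = {}
    for s in range(0, len(cyc), seg_len):
        r, c = ref[s:s + seg_len], cyc[s:s + seg_len]
        if np.abs(c - r).max() > tol:
            stored[s] = c.copy()           # keep only deviating segments
    return stored

def reconstruct(reference, stored):
    # Rebuild the cycle: reference everywhere, stored deltas where kept.
    out = np.asarray(reference, dtype=float).copy()
    for s, seg in stored.items():
        out[s:s + len(seg)] = seg
    return out
```

Deviations under `tol` are treated as noise and dropped, which is also why reconstruction is only accurate up to that tolerance, matching feature 4) above.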
Time Series Data Mining Methods: A Review
, 2015
Abstract
Today, real-world time series data sets can reach a trillion observations and more. The data miner's task is to detect new information hidden in this massive amount of data. While well-known techniques for data mining on cross-sectional data have been developed, time series data mining methods are not yet as sophisticated and established. Large time series bring problems such as very high dimensionality, and to date researchers have not agreed on best practices in this regard. This review gives an overview of the challenges of large time series and the problem-solving approaches proposed by the time series data mining community. We illustrate the most important techniques with Google Trends data. Moreover, we review current research directions and point out open research questions.