Results 1 - 10 of 44
Probabilistic discovery of time series motifs
2003
Cited by 92 (19 self)
Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or “don’t care” symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research
In Proc. of the 3rd IEEE International Conference on Data Mining, 2003
Cited by 58 (7 self)
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention. In this work we make a surprising claim: clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.
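The second data format named in the abstract above, subsequences extracted from a single series with a sliding window, can be illustrated with a minimal sketch. The function name and the window-length parameter `w` are illustrative choices, not taken from the paper, and the sketch covers only the extraction step, not any clustering.

```python
from typing import List, Sequence


def sliding_window_subsequences(series: Sequence[float], w: int) -> List[List[float]]:
    """Return every length-w subsequence of `series`, one per start position.

    `w` (window length) is a hypothetical parameter name used for illustration.
    """
    if w <= 0 or w > len(series):
        raise ValueError("window length must be between 1 and len(series)")
    # A series of length n yields n - w + 1 windows.
    return [list(series[i:i + w]) for i in range(len(series) - w + 1)]


subs = sliding_window_subsequences([1.0, 2.0, 3.0, 4.0, 5.0], 3)
```

Note that consecutive windows share w - 1 points, so the extracted subsequences are heavily overlapping rather than independent samples.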
Visually mining and monitoring massive time series
In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
Cited by 29 (9 self)
Moments before the launch of every space vehicle, engineering discipline specialists must make a critical go/no-go decision. The cost of a false positive, allowing a launch in spite of a fault, or a false negative, stopping a potentially successful launch, can be measured in the tens of millions of dollars, not including the cost in morale and other more intangible detriments. The Aerospace Corporation is responsible for providing engineering assessments critical to the go/no-go decision for every Department of Defense space vehicle. These assessments are made by constantly monitoring streaming telemetry data in the hours before launch. We will introduce VizTree, a novel time-series visualization tool to aid the Aerospace analysts who must make these engineering assessments. VizTree was developed at the University of California, Riverside and is unique in that the same tool is used for mining archival data and monitoring incoming live telemetry. The use of a single tool for both aspects of the task allows a natural and intuitive transfer of mined knowledge to the monitoring task. Our visualization approach works by transforming the time series into a symbolic representation, and encoding the data in a modified suffix tree in which the frequency and other properties of patterns are mapped onto colors and other visual properties. We demonstrate the utility of our system by comparing it with state-of-the-art batch algorithms on several real and synthetic datasets.
Symbolic Representation and Retrieval of Moving Object Trajectories
In Proc. of the ACM SIGMM International Workshop on Multimedia Information Retrieval, 2004
Cited by 18 (0 self)
Similarity-based retrieval of moving object trajectories is useful in many applications, such as GPS systems and sports and surveillance video analysis. However, due to sensor failures, errors in detection techniques, or different sampling rates, noise, local shifts, and scaling may appear in the trajectory records. Hence, it is difficult to design a robust and fast similarity measure for similarity-based retrieval in a large database. In this paper, normalized edit distance (NED) is proposed to measure the similarity between two trajectories. We evaluate the efficacy of NED and compare it with those of Euclidean distance, Dynamic Time Warping (DTW), and Longest Common Subsequences (LCSS), showing that NED is more robust and accurate for trajectories that contain noise and local time shifting. Furthermore, in order to improve the retrieval efficiency, we propose a novel representation of trajectories, called movement pattern strings, which convert the trajectories into a symbolic representation. Movement pattern strings encode both the movement direction and the movement distance information of the trajectories. The distances that are computed in a symbolic space are lower bounds of the distances of original trajectory data, which guarantees that no false dismissals will be introduced using movement pattern strings to retrieve trajectories. Finally, we define a modified frequency distance for frequency vectors that are obtained from movement pattern strings to reduce the dimensionality of movement pattern strings and computation cost of NED. The experimental results show that the cost of retrieving similar trajectories can be greatly reduced when the modified frequency distance is used as a filter.
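As a rough illustration of an edit distance over symbolic strings, the sketch below computes the standard Levenshtein distance and divides it by the length of the longer string. This normalization is an assumption made for illustration; the abstract above does not give the paper's definition of NED, which may differ, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return edit_distance(a, b) / longest
```

With this normalization the result lies between 0 (identical strings) and 1, which makes scores comparable across string pairs of different lengths.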
Generating English Summaries of Time Series Data Using the Gricean Maxims
In Proc. KDD’03, 2003
Cited by 17 (4 self)
We are developing technology for generating English textual summaries of time-series data, in three domains: weather forecasts, gas-turbine sensor readings, and hospital intensive care data. Our weather-forecast generator is currently operational and being used daily by a meteorological company. We generate summaries in three steps: (a) selecting the most important trends and patterns to communicate; (b) mapping these patterns onto words and phrases; and (c) generating actual texts based on these words and phrases. In this paper we focus on the first step, (a), selecting the information to communicate, and describe how we perform this using modified versions of standard data analysis algorithms such as segmentation. The modifications arose out of empirical work with users and domain experts, and in fact can all be regarded as applications of the Gricean maxims of Quality, Quantity, Relevance, and Manner, which describe how a cooperative speaker should behave in order to help a hearer correctly interpret a text. The Gricean maxims are perhaps a key element of adapting data analysis algorithms for effective communication of information to human users, and should be considered by other researchers interested in communicating data to human users.
An inductive database for mining temporal patterns in event sequences
In Proceedings of the workshop on Mining Spatial and Temporal Data, 2005
Cited by 10 (0 self)
Data mining aims at discovering previously unknown and potentially
Animated People Textures
2004
Cited by 10 (1 self)
This paper introduces a technique to create controllable animations of realistic figures of people starting from live-action video. The described synthesis of such ‘people textures’ extends previous work in video textures to allow the ‘texturing’ of human movement through human-specific feature extraction, coupled with careful data mining. In our approach, the video database is pre-processed to classify the motion of the human figures and identify the movements of repeated sequences using data motifs. Then, based on user input, novel sequences of video are computed with edits that are selected based on the raw footage found in the video database and performed based on morphing between segments to generate the transitions automatically. Applications for such animated people textures include video-based animations for electronic games and creating background elements and special effects for movies.
Detecting Time Series Motifs Under Uniform Scaling
Cited by 7 (1 self)
Time series motifs are approximately repeated patterns found within the data. Such motifs have utility for many data mining algorithms, including rule-discovery, novelty-detection, summarization and clustering. Since the formalization of the problem and the introduction of efficient linear time algorithms, motif discovery has been successfully applied to many domains, including medicine, motion capture, robotics and meteorology. In this work we show that most previous applications of time series motifs have been severely limited by the definition’s brittleness to even slight changes of uniform scaling, the speed at which the patterns develop. We introduce a new algorithm that allows discovery of time series motifs with invariance to uniform scaling, and show that it produces objectively superior results in several important domains. Apart from being more general than all other motif discovery algorithms, a further contribution of our work is that it is simpler than previous approaches; in particular, we have drastically reduced the number of parameters that need to be specified.
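To make the notion of uniform scaling concrete, the sketch below stretches a pattern to a new length so that the same shape unfolding at a different speed can be compared point by point. Linear interpolation is an assumption chosen for illustration; the abstract above does not say how the paper rescales patterns, and the function name is hypothetical.

```python
from typing import List, Sequence


def rescale(pattern: Sequence[float], new_len: int) -> List[float]:
    """Stretch or shrink `pattern` to `new_len` points by linear interpolation."""
    if len(pattern) < 2 or new_len < 2:
        raise ValueError("pattern and new_len must each have at least 2 points")
    last = len(pattern) - 1
    step = last / (new_len - 1)  # spacing of output points on the input axis
    out = []
    for j in range(new_len):
        pos = j * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(pattern[lo] * (1 - frac) + pattern[hi] * frac)
    return out


slow = [0.0, 1.0, 2.0, 3.0, 4.0]     # a ramp developing over 5 points
fast = [0.0, 2.0, 4.0]               # the same ramp developing over 3 points
stretched = rescale(fast, len(slow))
```

Without rescaling, the two ramps have different lengths and cannot be compared with a pointwise distance at all; after rescaling they coincide.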
Mining frequent and periodic association patterns
Dartmouth College, Computer Science and Engineering, Tech Report: TR, 2005
Cited by 6 (0 self)
Profiling clients’ movement behaviors is useful for mobility modeling, anomaly detection, and location prediction. In this paper, we study clients’ frequent and periodic movement patterns in a campus wireless network. We use offline data-mining algorithms to discover patterns from clients’ association history, and analyze the reported patterns using statistical methods. Many of our results reflect the common characteristics of a typical academic campus, though we also observed some unusual association patterns. There are two challenges: one is to remove noise from data for efficient pattern discovery, and the other is to interpret discovered patterns. We address the first challenge using a heuristic-based approach applying domain knowledge. The second issue is harder to address because we do not have knowledge of people’s activities, but we could nonetheless make reasonable interpretations of the common patterns.
Viztree: a tool for visually mining and monitoring massive time series databases
In Proceedings of International Conference on Very Large Data Bases, 2004
Cited by 6 (0 self)
Moments before the launch of every space vehicle, engineering discipline specialists must make a critical go/no-go decision. The cost of a false positive, allowing a launch in spite of a fault, or a false negative, stopping a potentially successful launch, can be measured in the tens of millions of dollars, not including the cost in morale and other more intangible detriments. The Aerospace Corporation is responsible for providing engineering assessments critical to the go/no-go decision for every Department of Defense (DoD) launch vehicle. These assessments are made by constantly monitoring streaming telemetry data in the hours before launch. For this demonstration, we will introduce VizTree, a novel time-series visualization tool to aid the Aerospace analysts who must make these engineering assessments. VizTree was developed at the University of California, Riverside and is unique in that the same tool is used for mining archival data and monitoring incoming live telemetry. Unlike other time series visualization tools, VizTree can scale to very large databases, giving it the potential to be a generally useful data mining and database tool.

