Clustering of Time Series Subsequences is Meaningless: Implications for Past and Future Research
In Proc. of the 3rd IEEE International Conference on Data Mining, 2003
Cited by 59 (7 self)
Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention. In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method which, based on the concept of time series motifs, is able to meaningfully cluster some streaming time series datasets.
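The sliding-window extraction mentioned in this abstract can be illustrated with a minimal sketch. It shows only how subsequences are cut from a single series, not the paper's theorem or its motif-based method; the function name `sliding_windows` and the parameter `w` are hypothetical names chosen for illustration.

```python
def sliding_windows(series, w):
    """Return every length-w subsequence of `series`, one per start offset.

    Hypothetical helper: `w` is the window length, an assumed parameter.
    """
    if w <= 0 or w > len(series):
        raise ValueError("window length must be between 1 and len(series)")
    # Consecutive windows overlap in w - 1 points, so neighbouring
    # subsequences are near-copies of each other.
    return [series[i:i + w] for i in range(len(series) - w + 1)]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
windows = sliding_windows(series, 3)
# windows == [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
```

A series of length n yields n - w + 1 heavily overlapping windows, and these windows are the input that subsequence clustering operates on.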
Indexing large human-motion databases
In Proc. 30th VLDB Conf., 2004
Cited by 36 (5 self)
Data-driven animation has become the industry standard for computer games and many animated movies and special effects. In particular, motion capture data recorded from live actors is the most promising approach offered thus far for animating realistic human characters. However, the manipulation of such data for general use and re-use is not yet a solved problem. Many of the existing techniques dealing with editing motion rely on indexing for annotation, segmentation, and re-ordering of the data. Euclidean distance is inappropriate for solving these indexing problems because of the inherent variability found in human motion. The limitations of Euclidean distance stem from the fact that it is very sensitive to distortions in the time axis. A partial solution to this problem, Dynamic Time Warping (DTW), aligns the time axis
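The contrast between Euclidean distance and DTW described in this abstract can be sketched as follows. This is the classic textbook dynamic-programming form of DTW with no warping-window constraint, assumed here for illustration; it is not the indexing technique the paper proposes. The function names `euclidean` and `dtw` are hypothetical.

```python
import math


def euclidean(a, b):
    """Point-by-point distance; requires equal lengths and no time shift."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained DTW distance via dynamic programming."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])


# The same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(euclidean(a, b))  # 2.0
print(dtw(a, b))        # 0.0
```

Euclidean distance penalises the one-step shift, whereas DTW stretches the time axis until the two bumps coincide.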
Exploratory Medical Knowledge Discovery: Experiences and Issues
SIGKDD Explorations, 2003
Cited by 12 (2 self)
The application of data mining and knowledge discovery techniques to medical and health datasets is a rewarding but highly challenging area. Not only are the datasets large, complex, heterogeneous, hierarchical, time-varying and of varying quality, but there also exists a substantial medical knowledge base which demands a robust collaboration between the data miner and the health professional(s) if useful information is to be extracted. This paper presents the experiences of the authors and others in applying exploratory data mining techniques to medical, health and clinical data. In so doing, it elicits a number of general issues and provides pointers to possible areas of future research in data mining and knowledge discovery more broadly.
Mining Patterns of Events in Students' Teamwork Data
In Educational Data Mining Workshop, held in conjunction with Intelligent Tutoring Systems (ITS), 2006
Cited by 11 (2 self)
It is difficult, but very important, to learn to work effectively as part of a team. One potentially invaluable source of information about the success, or problems, in the way that teams learn can be drawn from the electronic traces of their collaborations. The paper describes data mining of student group interaction data to identify significant sequences of activity. Our goal is to build tools that can flag interaction sequences indicative of problems, so that we can use these to assist student teams in early recognition of problems. We also want tools that can identify patterns that are markers of success so that these might indicate improvements during the learning process. Our first challenge is to transform the raw data available in large quantities, preprocessing it into a suitable alphabet for use in data mining. Then, we need data mining algorithms that can properly account for the temporal nature of the data and the character of group interaction. We envisage that this may involve a two-way process, where theories of effective group behaviour can drive the data mining and, in the opposite direction, the data mining should provide results that are meaningful to groups wishing to improve their effectiveness. We report the results of our work in the context of a semester-long software development project course.
Visualizing Time-Oriented Data -- A Systematic View
Computers & Graphics, 2007
Cited by 11 (1 self)
The analysis of time-oriented data is an important task in many application scenarios. In recent years, a variety of techniques for visualizing such data have been published. This variety makes it difficult for prospective users to select methods or tools that are useful for their particular task at hand. In this article, we develop and discuss a systematic view on the diversity of methods for visualizing time-oriented data. With the proposed categorization we try to untangle the visualization of time-oriented data, which is such an important concern in Visual Analytics. The categorization is not only helpful for users, but also for researchers to identify future tasks in Visual Analytics.
Distributed mining of spatio-temporal event patterns in sensor networks
2007
Cited by 6 (1 self)
Many sensor network applications are concerned with discovering interesting patterns among observed real-world events. Often, only limited a priori knowledge exists about the patterns to be found eventually. Here, raw streams of sensor readings are collected at the sink for later offline analysis, resulting in a large communication overhead. In this position paper, we explore the use of in-network data mining techniques to discover frequent event patterns and their spatial and temporal properties. With that approach, compact event patterns rather than raw data streams are sent to the sink. We also discuss various issues with the implementation of our proposal and report our experience with preliminary experiments.
Learning patterns in the dynamics of biological networks
In KDD, 2009
Cited by 6 (0 self)
Our dynamic graph-based relational mining approach has been developed to learn structural patterns in biological networks as they change over time. The analysis of dynamic networks is important not only to understand life at the system level, but also to discover novel patterns in other structural data. Most current graph-based data mining approaches overlook dynamic features of biological networks, because they focus only on static graphs. Our approach analyzes a sequence of graphs and discovers rules that capture the changes that occur between pairs of graphs in the sequence. These rules represent the graph rewrite rules that the first graph must go through to be isomorphic to the second graph. Then, our approach feeds the graph rewrite rules into a machine learning system that learns general transformation rules describing the types of changes that occur for a class of dynamic biological networks. The discovered graph-rewriting rules show how biological networks change over time, and the transformation rules show the repeated patterns in the structural changes. In this paper, we apply our approach to biological networks to evaluate it and to understand how the biosystems change over time. We evaluate our results using coverage and prediction metrics, and compare them to the biological literature.
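The idea of capturing changes between consecutive graphs in a sequence can be sketched in a heavily simplified form. The sketch assumes every vertex carries a stable, unique label, so no isomorphism search is needed, and it records only added and removed undirected edges; the paper's actual rewrite rules operate on substructures and are not reproduced here. The names `edge_changes` and `changes_over_sequence` are hypothetical.

```python
def edge_changes(g1, g2):
    """Edges to remove from and add to g1 to obtain g2.

    Each graph is an iterable of (u, v) pairs, treated as undirected.
    Assumes vertices have stable unique labels (an illustrative assumption).
    """
    e1 = {frozenset(e) for e in g1}
    e2 = {frozenset(e) for e in g2}
    return {"removed": e1 - e2, "added": e2 - e1}


def changes_over_sequence(graphs):
    """One change record per consecutive pair of graphs in the sequence."""
    return [edge_changes(a, b) for a, b in zip(graphs, graphs[1:])]


graphs = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("C", "D")],
    [("A", "B"), ("C", "D"), ("D", "A")],
]
steps = changes_over_sequence(graphs)
# steps[0]: edge B-C removed, edge C-D added
# steps[1]: nothing removed, edge A-D added
```

A learning step in the spirit of the abstract would then look for change records that recur across many pairs, which this sketch leaves out.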
Visualisation of Temporal Interval Association Rules
2000
Cited by 4 (1 self)
Temporal intervals and the interaction of interval-based events are fundamental in many domains including medicine, commerce, computer security and various types of normalcy analysis. In order to learn from temporal interval data we have developed a temporal interval association rule algorithm. In this paper, we will provide a definition for temporal interval association rules and present our visualisation techniques for viewing them. Visualisation techniques are particularly important because the complexity and volume of knowledge that is discovered during data mining often make it difficult to comprehend. We adopt a circular graph for visualising a set of associations that allows underlying patterns in the associations to be identified. To visualise temporal relationships, a parallel coordinate graph has been developed.

