Bursty and Hierarchical Structure in Streams
, 2002
Cited by 196 (2 self)
A fundamental problem in text data mining is to extract meaningful structure from document streams that arrive continuously over time. E-mail and news articles are two natural examples of such streams, each characterized by topics that appear, grow in intensity for a period of time, and then fade away. The published literature in a particular research field can be seen to exhibit similar phenomena over a much longer time scale. Underlying much of the text mining work in this area is the following intuitive premise --- that the appearance of a topic in a document stream is signaled by a "burst of activity," with certain features rising sharply in frequency as the topic emerges.
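The "burst of activity" premise can be illustrated with a minimal sketch. This is a simple fixed-window threshold heuristic, not the method of the paper above; the function name `find_bursts`, the `window` width, and the `factor` threshold are all assumptions introduced for illustration.

```python
from collections import Counter


def find_bursts(timestamps, window, factor=3.0):
    """Return start times of windows whose event count exceeds
    `factor` times the mean count per window (hypothetical heuristic)."""
    if not timestamps:
        return []
    start = min(timestamps)
    # Assign each event (e.g. an occurrence of a term) to a fixed-width window.
    counts = Counter(int((t - start) // window) for t in timestamps)
    n_windows = max(counts) + 1
    mean = len(timestamps) / n_windows
    # A window is flagged as a burst when its count rises sharply above the mean.
    return [start + w * window for w in sorted(counts) if counts[w] > factor * mean]


if __name__ == "__main__":
    # Arrival times of a term: sparse, then a dense cluster between 30 and 37.
    arrivals = [0, 10, 20, 30, 31, 32, 33, 34, 35, 36, 37, 40, 50]
    print(find_bursts(arrivals, window=10))
```

A fixed threshold like this is sensitive to the choice of window width, which is one reason more structured models of bursts are of interest.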
Learning and inferring transportation routines
- Artificial Intelligence
Cited by 170 (18 self)
This paper introduces a hierarchical Markov model that can learn and infer a user’s daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high-level information such as a user’s mode of transportation or her goal. We apply Rao-Blackwellised particle filters for efficient inference both at the low level and at the higher levels of the hierarchy. Significant locations such as goals or locations where the user frequently changes mode of transportation are learned from GPS data logs without requiring any manual labeling. We show how to detect abnormal behaviors (e.g. taking a wrong bus) by concurrently tracking the user’s activities with a trained and a prior model. Experiments show that our model is able to accurately predict the goals of a person and to recognize situations in which the user performs unknown activities.
On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
- SIGKDD'02
, 2002
Cited by 169 (41 self)
... mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point
Web Usage Mining: Discovery and Application of Interesting Patterns from Web Data
, 2000
Cited by 57 (0 self)
Web Usage Mining is the application of data mining techniques to Web clickstream data in order to extract usage patterns. As Web sites continue to grow in size and complexity, the results of Web Usage Mining have become critical for a number of applications such as Web site design, business and marketing decision support, personalization, usability studies, and network traffic analysis. The two major challenges involved in Web Usage Mining are preprocessing the raw data to provide an accurate picture of how a site is being used, and filtering the results of the various data mining algorithms in order to present only the rules and patterns that are potentially interesting. This thesis develops and tests an architecture and algorithms for performing Web Usage Mining. An evidence combination framework referred to as the information filter is developed to compare and combine usage, content, and structure information about a Web site. The information filter automatically identifies the discovered ...
Mining Asynchronous Periodic Patterns in Time Series Data
- Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD)
, 2000
Cited by 50 (8 self)
Periodicity detection in time series data is a challenging problem of great importance in many applications.
Online Novelty Detection on Temporal Sequences
, 2003
Cited by 45 (0 self)
Novelty detection, or anomaly detection, on temporal sequences has increasingly attracted attention from researchers in different areas. In this paper, we present a new framework for online novelty detection on temporal sequences. This framework includes a mechanism for associating each detection result with a confidence value. Based on this framework, we develop a concrete online detection algorithm, by modeling the temporal sequence using an online support vector regression algorithm. Experiments on both synthetic and real world data are performed to demonstrate the promising performance of our proposed detection algorithm.
A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data
- In Proc. of the Eighth ACM SIGKDD, ACM
, 2002
Cited by 22 (1 self)
We are concerned with the issues of outlier detection and change point detection from a data stream. In the area of data mining, there has been increased interest in these issues since the former is related to fraud detection, rare event discovery, etc., while the latter is related to event/trend change detection, activity monitoring, etc. Specifically, it is important to consider the situation where the data source is non-stationary, since the nature of the data source may change over time in real applications. Although in most previous work outlier detection and change point detection have not been related explicitly, this paper presents a unifying framework for dealing with both of them on the basis of the theory of on-line learning of non-stationary time series. In this framework a probabilistic model of the data source is incrementally learned using an on-line discounting learning algorithm, which can track the changing data source adaptively by forgetting the effect of past data gradually. Then the score for any given data point is calculated to measure its deviation from the learned model, with a higher score indicating a high possibility of being an outlier. Further, change points in a data stream are detected by applying this scoring method to a time series of moving-averaged losses for prediction using the learned model. Specifically, we develop efficient algorithms for on-line discounting learning of auto-regression models from time series data, and demonstrate the validity of our framework through simulation and experimental applications to stock market data analysis.
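The idea of discounting learning followed by scoring can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: it learns a single Gaussian rather than an auto-regression model, and the class name `DiscountingGaussian`, the discount rate `r`, the variance floor, and the helper `moving_average` are all hypothetical.

```python
import math


class DiscountingGaussian:
    """Gaussian whose mean and variance are learned on-line with
    exponential forgetting (assumed stand-in for an AR model)."""

    def __init__(self, r=0.05, mean=0.0, var=1.0):
        self.r, self.mean, self.var = r, mean, var

    def score_and_update(self, x):
        # Score first: logarithmic loss of x under the model learned so far.
        score = (0.5 * math.log(2 * math.pi * self.var)
                 + (x - self.mean) ** 2 / (2 * self.var))
        # Then update, discounting past data by a factor (1 - r).
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = max((1 - self.r) * self.var
                       + self.r * (x - self.mean) ** 2, 1e-8)
        return score


def moving_average(values, width):
    """Trailing moving average; early entries average what is available."""
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= width:
            total -= values[i - width]
        out.append(total / min(i + 1, width))
    return out


if __name__ == "__main__":
    quiet = [0.1 if i % 2 else -0.1 for i in range(50)]
    series = quiet + [10.0] + quiet  # one injected outlier at index 50
    model = DiscountingGaussian()
    scores = [model.score_and_update(x) for x in series]
    print("highest outlier score at index", scores.index(max(scores)))
    smoothed = moving_average(scores, width=5)
    print("smoothed loss near the outlier:", smoothed[50])
```

A larger `r` forgets the past faster and tracks a changing source more closely, at the cost of noisier scores; the smoothed losses could in turn be fed to a second scoring stage to look for change points.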
Contour Map Matching for Event Detection in Sensor Networks
- In SIGMOD
, 2006
Cited by 22 (6 self)
Many sensor network applications, such as object tracking and disaster monitoring, require effective techniques for event detection. In this paper, we propose a novel event detection mechanism based on matching the contour maps of in-network sensory data distribution. Our key observation is that events in sensor networks can be abstracted into spatio-temporal patterns of sensory data and that pattern matching can be done efficiently through contour map matching. Therefore, we propose simple SQL extensions to allow users to specify common types of events as patterns in contour maps and study energy-efficient techniques of contour map construction and maintenance for our pattern-based event detection. Our experiments with synthetic workloads derived from a real-world coal mine surveillance application validate the effectiveness and efficiency of our approach.

