Results 11–20 of 316
Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials
 Proc. 2004 SIGMOD, to appear
Abstract

Cited by 83 (0 self)
In this thesis, we investigate the subject of indexing large collections of spatiotemporal trajectories for similarity matching. Our proposed technique is to first mitigate the dimensionality curse problem by approximating each trajectory with a low order polynomial-like curve, and then incorporate a multidimensional index into the reduced space of polynomial coefficients. There are many possible ways to choose the polynomial, including Fourier transforms, splines, nonlinear regressions, etc. Some of these possibilities have indeed been studied before. We hypothesize that one of the best approaches is the polynomial that minimizes the maximum deviation from the true value, which is called the minimax polynomial. Minimax approximation is particularly meaningful for indexing because in a branch-and-bound search (i.e., for finding nearest neighbours), the smaller the maximum deviation, the more pruning opportunities there exist. In general, among all the polynomials of the same degree, the optimal minimax polynomial is very hard to compute. However, it has been shown that the Chebyshev approximation is almost identical to the optimal minimax polynomial, and is easy to compute [32]. Thus, we shall explore how to use
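The coefficient-space reduction the abstract describes can be sketched with NumPy's Chebyshev module. This is an illustrative least-squares Chebyshev fit, not the paper's exact near-minimax construction, and the function names are ours:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_features(values, degree=4):
    """Fit a low-order Chebyshev series to an evenly sampled series and
    return its coefficients as the reduced-dimensionality feature vector."""
    # Map sample positions onto [-1, 1], the natural Chebyshev domain.
    t = np.linspace(-1.0, 1.0, len(values))
    return C.chebfit(t, values, degree)   # least-squares Chebyshev fit

def max_deviation(values, coeffs):
    """Maximum (L-infinity) deviation of the fit from the raw series --
    the quantity that governs pruning power in a branch-and-bound search."""
    t = np.linspace(-1.0, 1.0, len(values))
    return float(np.max(np.abs(values - C.chebval(t, coeffs))))

# Example: a smooth trajectory component compressed to 5 coefficients.
x = np.sin(np.linspace(0.0, np.pi, 100))
feats = chebyshev_features(x, degree=4)
err = max_deviation(x, feats)
```

The index would then be built over the `feats` vectors, with `err` bounding how far the approximation can stray from any data point.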
Similarity search over time series data using wavelets
 In ICDE
, 2002
Abstract

Cited by 82 (0 self)
We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this application is the Haar wavelet. In this work, we observe that a large class of wavelet transformations (not only orthonormal wavelets but also bi-orthonormal wavelets) can be used to support similarity search. This class includes the most popular and most effective wavelets being used in image compression. We present a detailed performance study of the effects of using different wavelets on the performance of similarity search for time-series data. We include several wavelets that outperform both the Haar wavelet and the best known non-wavelet transformations for this application. To ensure our results are usable by an application engineer, we also show how to configure an indexing strategy for the best performing transformations. Finally, we identify classes of data that can be indexed efficiently using these wavelet transformations.
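As a concrete reference point, the orthonormal Haar transform that prior work relied on fits in a few lines (a sketch with our own naming; the paper's broader wavelet class is not shown):

```python
import numpy as np

def haar_transform(x):
    """Full orthonormal Haar decomposition of a length-2^k series.
    Averages go to the front, details behind them, level by level."""
    out = np.asarray(x, dtype=float).copy()
    n = len(out)
    while n > 1:
        half = n // 2
        evens = out[0:n:2].copy()
        odds = out[1:n:2].copy()
        out[:half] = (evens + odds) / np.sqrt(2.0)   # coarse averages
        out[half:n] = (evens - odds) / np.sqrt(2.0)  # detail coefficients
        n = half
    return out

# Orthonormality preserves Euclidean distance, so distance computed on
# only the first k coefficients lower-bounds the true distance -- the
# property that makes index-level pruning safe.
a = np.array([1.0, 3.0, 5.0, 4.0])
b = np.array([2.0, 2.0, 4.0, 5.0])
ta, tb = haar_transform(a), haar_transform(b)
```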
Making Time-series Classification More Accurate Using Learned Constraints
, 2004
Abstract

Cited by 82 (18 self)
It has long been known that Dynamic Time Warping (DTW) is superior to Euclidean distance for classification and clustering of time series. However, until lately, most research has utilized Euclidean distance because it is more efficiently calculated. A recently introduced technique that greatly mitigates DTW's demanding CPU time has sparked a flurry of research activity. However, the technique and its many extensions still only allow DTW to be applied to moderately large datasets. In addition, almost all of the research on DTW has focused exclusively on speeding up its calculation; there has been little work done on improving its accuracy. In this work, we target the accuracy aspect of DTW performance and introduce a new framework that learns arbitrary constraints on the warping path of the DTW calculation. Apart from improving the accuracy of classification, our technique as a side effect speeds up DTW by a wide margin as well. We show the utility of our approach on datasets from diverse domains and demonstrate significant gains in accuracy and efficiency.
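To make the "constraints on the warping path" concrete, here is DTW with the classic fixed Sakoe-Chiba band. The paper learns arbitrary constraints, which generalize this band; the sketch and its names are ours:

```python
import numpy as np

def dtw_distance(a, b, window=None):
    """DTW distance with an optional Sakoe-Chiba band: the warping path
    may only visit cells with |i - j| <= window, which both constrains
    the alignment and shrinks the dynamic-programming table."""
    n, m = len(a), len(b)
    w = max(window if window is not None else max(n, m), abs(n - m))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

# A series and a time-shifted copy: DTW absorbs the shift that
# Euclidean distance would penalize point by point.
x = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 1.0])
```

Tightening `window` trades alignment flexibility for speed, which is why learning where the path is allowed to go can help both accuracy and runtime.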
Identifying similarities, periodicities and bursts for online search queries
 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Abstract

Cited by 80 (4 self)
We present several methods for mining knowledge from the query logs of the MSN search engine. Using the query logs, we build a time series for each query word or phrase (e.g., ‘Thanksgiving’ or ‘Christmas gifts’) where the elements of the time series are the number of times that a query is issued on a day. All of the methods we describe use sequences of this form and can be applied to time series data generally. Our primary goal is the discovery of semantically similar queries and we do so by identifying queries with similar demand patterns. Utilizing the best Fourier coefficients and the energy of the omitted components, we improve upon the state-of-the-art in time-series similarity matching. The extracted sequence features are then organized in an efficient metric tree index structure. We also demonstrate how to efficiently and accurately discover the important periods in a time series. Finally we propose a simple but effective method for identification of bursts (long- or short-term). Using the burst information extracted from a sequence, we are able to efficiently perform ‘query-by-burst’ on the database of time series. We conclude the presentation with the description of a tool that uses the described methods, and serves as an interactive exploratory data discovery tool for the MSN query database.
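The "best Fourier coefficients plus omitted energy" representation can be illustrated as a feature extractor. This sketch shows only the representation, not the paper's distance bound, and the function name is ours:

```python
import numpy as np

def fourier_sketch(x, k=4):
    """Keep the k largest-magnitude Fourier coefficients of a real series
    (with their positions), plus the total energy of the discarded ones,
    which can be used to tighten similarity estimates."""
    X = np.fft.rfft(np.asarray(x, dtype=float))
    order = np.argsort(np.abs(X))[::-1]      # coefficients by magnitude
    keep = order[:k]
    kept = {int(i): X[i] for i in keep}
    omitted_energy = float(np.sum(np.abs(np.delete(X, keep)) ** 2))
    return kept, omitted_energy

# Example: a daily demand pattern dominated by one seasonal cycle.
x = np.sin(np.linspace(0.0, 6.28, 64))
kept, omitted = fourier_sketch(x, k=4)
```

Keeping the *best* (largest) coefficients rather than the first few low frequencies matters for bursty query-demand series, whose energy is not concentrated at the lowest frequencies.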
Capturing Sensor-Generated Time Series with Quality Guarantees
 In ICDE
, 2003
Abstract

Cited by 76 (11 self)
We are interested in capturing time series generated by small wireless electronic sensors. Battery-operated sensors must avoid heavy use of their wireless radio, which is a key cause of energy dissipation. When many sensors transmit, the resources of the recipient of the data are taxed; hence, limiting communication will benefit the recipient as well. In our paper we show how time series generated by sensors can be captured and stored in a database system (archive). Sensors compress time series instead of sending them in raw form. We propose an optimal online algorithm for constructing a piecewise constant approximation (PCA) of a time series which guarantees that the compressed representation satisfies an error bound on the distance. In addition to the capture task, we often want to estimate the values of a time series ahead of time, e.g., to answer real-time queries. To achieve this, sensors may fit predictive models on observed data, sending parameters of these models to the archive. We exploit the interplay between prediction and compression in a unified framework that avoids duplicating effort and leads to reduced communication.
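An online piecewise-constant approximation with a hard per-point error bound has a very short core: grow the current segment while its value spread fits within twice the bound, then emit the midrange as the segment's constant. This is a sketch of that idea with our own naming; consult the paper for its exact algorithm:

```python
def pmc_midrange(values, eps):
    """Online piecewise-constant compression with an L-infinity guarantee:
    every raw point is within eps of its segment's constant value."""
    segments = []          # list of (segment_length, constant_value)
    lo = hi = None
    count = 0
    for v in values:
        if count == 0:
            lo = hi = v
            count = 1
            continue
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo <= 2 * eps:
            # The midrange of [new_lo, new_hi] is still within eps of
            # every point seen so far, so the segment can absorb v.
            lo, hi, count = new_lo, new_hi, count + 1
        else:
            segments.append((count, (lo + hi) / 2.0))
            lo = hi = v
            count = 1
    if count:
        segments.append((count, (lo + hi) / 2.0))
    return segments

# Three stable readings followed by a jump: two segments suffice.
segments = pmc_midrange([1.0, 1.5, 2.0, 8.0, 8.2], eps=1.0)
```

Each decision uses only the running min and max, so a sensor needs O(1) state per series, which is what makes the scheme viable on battery-powered hardware.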
Indexing large human-motion databases
 In Proc. 30th VLDB Conf
, 2004
Abstract

Cited by 64 (6 self)
Data-driven animation has become the industry standard for computer games and many animated movies and special effects. In particular, motion capture data, recorded from live actors, is the most promising approach offered thus far for animating realistic human characters. However, the manipulation of such data for general use and reuse is not yet a solved problem. Many of the existing techniques dealing with editing motion rely on indexing for annotation, segmentation, and reordering of the data. Euclidean distance is inappropriate for solving these indexing problems because of the inherent variability found in human motion. The limitations of Euclidean distance stem from the fact that it is very sensitive to distortions in the time axis. A partial solution to this problem, Dynamic Time Warping (DTW), aligns the time axis
Classifying spatiotemporal object trajectories using unsupervised learning of basis function coefficients
 In VSSN ’05: Proceedings of the third ACM international workshop on Video surveillance & sensor networks
, 2005
Abstract

Cited by 54 (1 self)
This paper proposes a novel technique for clustering and classification of object trajectory-based video motion clips using spatiotemporal functional approximations. A Mahalanobis classifier is then used for the detection of anomalous trajectories. Motion trajectories are considered as time series and modeled using the leading Fourier coefficients obtained by a Discrete Fourier Transform. Trajectory clustering is then carried out in the Fourier coefficient feature space to discover patterns of similar object motions. The coefficients of the basis functions are used as input feature vectors to a Self-Organising Map which can learn similarities between object trajectories in an unsupervised manner. Encoding trajectories in this way leads to efficiency gains over existing approaches that use discrete point-based flow vectors to represent the whole trajectory. Experiments are performed on two different datasets – synthetic and pedestrian object tracking – to demonstrate the effectiveness of our approach. Applications to motion data mining in video surveillance databases are envisaged.
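The feature-extraction step can be sketched by treating a 2-d trajectory as a complex series and keeping its leading DFT coefficients. This shows only the encoding; the Self-Organising Map the paper trains on these vectors is not reproduced here, and the naming and coefficient-selection details are ours:

```python
import numpy as np

def trajectory_dft_features(xs, ys, n_coeffs=3):
    """Encode a 2-d trajectory as z = x + iy and keep the leading DFT
    coefficients as a fixed-length feature vector; these capture the
    coarse shape of the motion regardless of trajectory length."""
    z = np.asarray(xs, dtype=float) + 1j * np.asarray(ys, dtype=float)
    Z = np.fft.fft(z) / len(z)     # normalized so Z[0] is the centroid
    return Z[:n_coeffs]

# Example: a short diagonal-then-flat track reduced to 3 coefficients.
f = trajectory_dft_features([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0])
```

A useful property of this encoding: translating a trajectory changes only the centroid coefficient `Z[0]`, so dropping it (or comparing the remaining coefficients) yields translation-invariant clustering features.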
EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis
Abstract

Cited by 54 (3 self)
The domain name service (DNS) plays an important role in the operation of the Internet, providing a two-way mapping between domain names and their numerical identifiers. Given its fundamental role, it is not surprising that a wide variety of malicious activities involve the domain name service in one way or another. For example, bots resolve DNS names to locate their command and control servers, and spam mails contain URLs that link to domains that resolve to scam servers. Thus, it seems beneficial to monitor the use of the DNS system for signs that indicate that a certain name is used as part of a malicious operation. In this paper, we introduce EXPOSURE, a system that employs large-scale, passive DNS analysis techniques to detect domains that are involved in malicious activity. We use 15 features that we extract from the DNS traffic that allow us to characterize different properties of DNS names and the ways that they are queried. Our experiments with a large, real-world data set consisting of 100 billion DNS requests, and a real-life deployment for two weeks in an ISP show that our approach is scalable and that we are able to automatically identify unknown malicious domains that are misused in a variety of malicious activity (such as for botnet command and control, spamming, and phishing).
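The exact 15 features are defined in the paper; as a flavor of the domain-name-based group, here are two simplified, hypothetical examples of our own (not the paper's definitions):

```python
def name_features(domain):
    """Two illustrative lexical features of a DNS name. Algorithmically
    generated malicious domains often have long, digit-heavy labels,
    which features like these are meant to surface."""
    label = domain.split(".")[0]                 # leftmost label only
    digits = sum(ch.isdigit() for ch in label)
    return {
        "pct_numeric_chars": digits / max(len(label), 1),
        "label_length": len(label),
    }

# Example: a digit-heavy label scores high on the numeric-ratio feature.
feats = name_features("a1b2.example.com")
```

In the full system, such per-name features are combined with time-based and answer-based features and fed to a trained classifier.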
Visually mining and monitoring massive time series
 In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2004
Abstract

Cited by 50 (12 self)
Moments before the launch of every space vehicle, engineering discipline specialists must make a critical go/no-go decision. The cost of a false positive, allowing a launch in spite of a fault, or a false negative, stopping a potentially successful launch, can be measured in the tens of millions of dollars, not including the cost in morale and other more intangible detriments. The Aerospace Corporation is responsible for providing engineering assessments critical to the go/no-go decision for every Department of Defense space vehicle. These assessments are made by constantly monitoring streaming telemetry data in the hours before launch. We will introduce VizTree, a novel time-series visualization tool to aid the Aerospace analysts who must make these engineering assessments. VizTree was developed at the University of California, Riverside and is unique in that the same tool is used for mining archival data and monitoring incoming live telemetry. The use of a single tool for both aspects of the task allows a natural and intuitive transfer of mined knowledge to the monitoring task. Our visualization approach works by transforming the time series into a symbolic representation, and encoding the data in a modified suffix tree in which the frequency and other properties of patterns are mapped onto colors and other visual properties. We demonstrate the utility of our system by comparing it with state-of-the-art batch algorithms on several real and synthetic datasets.
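A symbolic representation of the kind the abstract describes can be sketched as SAX-style discretization: z-normalize, reduce with a piecewise aggregate approximation, then map segment means to letters via Gaussian breakpoints. This is a minimal sketch with our own naming, assuming an alphabet of size 4 and a series length divisible by the segment count:

```python
import numpy as np

def symbolize(series, n_segments=8, breakpoints=(-0.6745, 0.0, 0.6745)):
    """Discretize a series into a word over {a, b, c, d}. The default
    breakpoints split N(0, 1) into four equiprobable regions, so letters
    occur with roughly equal frequency on normalized data."""
    x = np.asarray(series, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)           # z-normalize
    paa = x.reshape(n_segments, -1).mean(axis=1)     # segment means
    letters = "abcd"
    return "".join(letters[np.searchsorted(breakpoints, m)] for m in paa)
```

Words produced this way can be inserted into a suffix tree, where the frequency of each branch (mapped to color and thickness in a tool like VizTree) reveals common and rare telemetry patterns at a glance.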
Mining Motifs in Massive Time Series Databases
 In Proceedings of IEEE International Conference on Data Mining (ICDM’02)
, 2002
Abstract

Cited by 49 (3 self)
The problem of efficiently locating previously known patterns in a time series database (i.e., query by content) has received much attention and may now largely be regarded as a solved problem. However, from a knowledge discovery viewpoint, a more interesting problem is the enumeration of previously unknown, frequently occurring patterns. We call such patterns "motifs", because of their close analogy to their discrete counterparts in computational biology. An efficient motif discovery algorithm for time series would be useful as a tool for summarizing and visualizing massive time series databases. In addition, it could be used as a subroutine in various other data mining tasks, including the discovery of association rules, clustering and classification.
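The simplest definition of the 1-motif is the closest pair of non-overlapping subsequences; a brute-force search makes the problem concrete (the paper's contribution is finding motifs efficiently, which this O(n²) sketch of ours does not attempt):

```python
import numpy as np

def closest_motif_pair(series, m):
    """Brute-force 1-motif discovery: return (distance, i, j) for the
    pair of non-overlapping length-m subsequences with the smallest
    Euclidean distance. Overlapping pairs are skipped because they
    match trivially."""
    x = np.asarray(series, dtype=float)
    n = len(x) - m + 1
    best = (np.inf, -1, -1)
    for i in range(n):
        for j in range(i + m, n):      # j starts past i's subsequence
            d = float(np.linalg.norm(x[i:i + m] - x[j:j + m]))
            if d < best[0]:
                best = (d, i, j)
    return best

# A pattern [0, 5, 0] repeated at positions 0 and 6 is the motif.
best_dist, i, j = closest_motif_pair(
    [0.0, 5.0, 0.0, 1.0, 9.0, 2.0, 0.0, 5.0, 0.0], m=3)
```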