Results 11-20 of 107
An Index-Based Approach for Similarity Search Supporting Time Warping in Large Sequence Databases
In ICDE, 2001
Cited by 40 (2 self)
This paper discusses effective processing of similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for processing similarity search that supports time warping fail to employ multidimensional indexes without false dismissal, since the time warping distance does not satisfy the triangular inequality. They have to scan the entire database and thus suffer from serious performance degradation in large databases. Another method, which employs a suffix tree and does not assume any distance function, also shows poor performance due to the large tree size. In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to improve search performance in large databases without permitting any false dismissal. To attain this goal, we devise a new distance function $D_{tw-lb}$ that consistently unde...
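As a minimal sketch of why sequences of different lengths can still match under time warping, and why the triangular inequality can fail, the following Python computes a textbook dynamic-programming time warping distance. The function name and the choice of absolute differences summed along the warping path are assumptions for illustration; the paper's own distance definition and its lower-bound function are not reproduced here.

```python
def time_warping_distance(s, q):
    """Textbook time warping distance (assumed: sum of absolute differences)."""
    n, m = len(s), len(q)
    inf = float("inf")
    # d[i][j] = cheapest cost of aligning the first i items of s with the first j of q
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            # an element may be matched once, or stretched against several elements
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


# Sequences of different lengths with the same shape have distance 0.
same_shape = time_warping_distance([1, 2, 3], [1, 2, 2, 3])

# Hypothetical sequences chosen to show a triangular-inequality violation.
a, b, c = [0], [1, 2], [2, 3, 3]
direct = time_warping_distance(a, c)
via_b = time_warping_distance(a, b) + time_warping_distance(b, c)
```

With these inputs the direct distance between a and c is 8, while the detour through b costs 3 + 3 = 6, so an index that relies on the triangular inequality could wrongly prune a true match.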
Efficient Searches for Similar Subsequences of Different Lengths in Sequence Databases
In ICDE, 2000
Cited by 39 (4 self)
We propose an indexing technique for fast retrieval of similar subsequences using time warping distances. A time warping distance is a more suitable similarity measure than the Euclidean distance in many applications, where sequences may be of different lengths or different sampling rates. Our indexing technique uses a disk-based suffix tree as an index structure and employs lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To make the index structure compact and thus accelerate the query processing, we convert sequences of continuous values to sequences of discrete values via a categorization method and store only a subset of suffixes whose first values are different from their preceding values. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than sequential scanning.
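The two compaction steps in this abstract can be sketched roughly as follows. The category boundaries, the function names, and the use of simple threshold binning are assumptions for illustration; the abstract does not say which categorization method the authors use.

```python
from bisect import bisect_right


def categorize(values, boundaries):
    """Map each continuous value to the index of its category.

    `boundaries` is a sorted list of hypothetical cut points; a value v gets
    the number of boundaries that are <= v as its category id.
    """
    return [bisect_right(boundaries, v) for v in values]


def sparse_suffix_starts(symbols):
    """Start positions of suffixes whose first symbol differs from its predecessor."""
    return [i for i in range(len(symbols))
            if i == 0 or symbols[i] != symbols[i - 1]]


symbols = categorize([0.1, 0.2, 1.5, 1.7, 0.3], boundaries=[1.0])
starts = sparse_suffix_starts(symbols)
```

Here `symbols` is `[0, 0, 1, 1, 0]` and only the suffixes starting at positions 0, 2, and 4 would be stored, so runs of repeated categories shrink the index.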
Density-Connected Sets and their Application for Trend Detection in Spatial Databases
Proc. 3rd Int. Conf. on Knowledge Discovery and Data Mining, 10-15, Menlo Park, 1997
Cited by 24 (3 self)
Several clustering algorithms have been proposed for class identification in spatial databases such as earth observation databases. The effectiveness of well-known algorithms such as DBSCAN, however, is somewhat limited because they do not fully exploit the richness of the different types of data contained in a spatial database. In this paper, we introduce the concept of density-connected sets and present a significantly generalized version of DBSCAN. The major properties of this algorithm are as follows: (1) any symmetric predicate can be used to define the neighborhood of an object, allowing a natural definition in the case of spatially extended objects such as polygons, and (2) the cardinality function for a set of neighboring objects may take into account the non-spatial attributes of the objects as a means of assigning application-specific weights. Density-connected sets can be used as a basis to discover trends in a spatial database. We define trends in spatial databases and show how to apply the generalized DBSCAN algorithm for the task of discovering such knowledge. To demonstrate the practical impact of our approach, we performed experiments on a geographical information system on Bavaria, which is representative of a broad class of spatial databases.
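The two generalizations named in this abstract, a pluggable symmetric neighborhood predicate and a weighted cardinality, can be sketched as below. This is an illustrative DBSCAN-style loop, not the authors' algorithm: the function and parameter names are assumptions, and the neighborhood query is a brute-force scan rather than a spatial index.

```python
from collections import deque

NOISE = -1


def generalized_dbscan(objects, is_neighbor, weight, min_weight):
    """Cluster `objects`; returns one cluster id per object (NOISE for outliers).

    is_neighbor(a, b): any symmetric, reflexive predicate (assumed by caller).
    weight(obj):       contribution of obj to the weighted cardinality.
    min_weight:        weighted cardinality a neighborhood needs to be "dense".
    """
    n = len(objects)
    labels = {}

    def neighborhood(i):
        return [j for j in range(n) if is_neighbor(objects[i], objects[j])]

    def is_dense(indices):
        return sum(weight(objects[j]) for j in indices) >= min_weight

    cluster = 0
    for i in range(n):
        if i in labels:
            continue
        seeds = neighborhood(i)
        if not is_dense(seeds):
            labels[i] = NOISE  # may later be relabeled as a border object
            continue
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels.get(j) == NOISE:
                labels[j] = cluster  # border object reached from a core object
            if j in labels:
                continue
            labels[j] = cluster
            reachable = neighborhood(j)
            if is_dense(reachable):
                queue.extend(reachable)  # j is a core object: keep expanding
        cluster += 1
    return [labels[i] for i in range(n)]


points = [0, 1, 2, 10, 11, 12, 50]
result = generalized_dbscan(points,
                            is_neighbor=lambda a, b: abs(a - b) <= 1.5,
                            weight=lambda obj: 1,
                            min_weight=2)
```

With a distance predicate and unit weights the sketch behaves like ordinary DBSCAN; swapping in, for example, a polygon-intersection predicate or an attribute-based weight changes the notion of density without touching the loop.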
Time series data mining: Identifying temporal patterns for characterization and prediction of time series events
1999
To buy or not to buy: mining airfare data to minimize ticket purchase price
In Proceedings of KDD’03, 2003
Cited by 21 (1 self)
As product prices become increasingly available on the World Wide Web, consumers attempt to understand how corporations vary these prices over time. However, corporations change prices based on proprietary algorithms and hidden variables (e.g., the number of unsold seats on a flight). Is it possible to develop data mining techniques that will enable consumers to predict price changes under these conditions? This paper reports on a pilot study in the domain of airline ticket prices where we recorded over 12,000 price observations over a 41-day period. When trained on this data, Hamlet, our multi-strategy data mining algorithm, generated a predictive model that saved 341 simulated passengers $198,074 by advising them when to buy and when to postpone ticket purchases. Remarkably, a clairvoyant algorithm with complete knowledge of future prices could save at most $320,572 in our simulation; thus, Hamlet’s savings were 61.8% of optimal. The algorithm’s savings of $198,074 represent an average savings of 23.8% for the 341 passengers for whom savings are possible. Overall, Hamlet saved 4.4% of the ticket price averaged over the entire set of 4,488 simulated passengers. Our pilot study suggests that mining of price data available over the web has the potential to save consumers substantial sums of money per annum.
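The "61.8% of optimal" figure follows directly from the two dollar totals in the abstract, as this short check shows. The variable names are illustrative, and the per-passenger average is a derived figure that the abstract does not state.

```python
# Totals quoted in the abstract.
hamlet_savings = 198_074       # dollars saved by Hamlet
clairvoyant_savings = 320_572  # upper bound with full knowledge of future prices
passengers_with_savings = 341

# Share of the best achievable savings that Hamlet captured.
share_of_optimal = round(100 * hamlet_savings / clairvoyant_savings, 1)

# Derived (not stated in the abstract): mean dollars saved per such passenger.
mean_saving = round(hamlet_savings / passengers_with_savings, 2)

print(f"{share_of_optimal}% of optimal, ${mean_saving} per passenger")
```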
Segment-Based Approach for Subsequence Searches in Sequence Databases
2001
Cited by 20 (0 self)
This paper investigates the subsequence searching problem under time warping in sequence databases. Time warping enables finding sequences with similar changing patterns even when they are of different lengths. Our work is motivated by the observation that subsequence searches slow down quadratically as the total length of data sequences increases. To resolve this problem, we propose the Segment-Based Approach for Subsequence Searches (SBASS), which modifies the similarity measure from time warping to piecewise time warping and limits the number of possible subsequences to be compared with a query sequence. For efficient
Dynamic vp-tree indexing for n-nearest neighbor search given pairwise distances
VLDB Journal, 2000
Mining for Similarities in Aligned Time Series Using Wavelets
1999
Cited by 19 (0 self)
Discovery of non-obvious relationships between time series is an important problem in many domains, such as financial, sensory, and scientific data analysis. We consider data mining in aligned time series, which arise, e.g., in numerous online monitoring applications, and we are interested in finding time series that reflect the same external events. The time series can have different vertical positions, scales, and overall trends, but still show related features at the same locations. The features can be short-term, such as small peaks and turns, or long-term, such as wider mountains and valleys. We propose using a wavelet transformation of a time series to produce a natural set of features for the sequence. Wavelet transformation yields features that describe properties of the sequence both at various locations and at varying time granularities. In the proposed method, these features are processed so that they are insensitive to changes in the vertical position, scaling, and overall tre...
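To illustrate how a wavelet transformation yields features at several locations and granularities, here is a small sketch using the Haar wavelet with plain averaging. The choice of Haar, the unnormalized coefficients, and the function name are assumptions; the abstract does not say which wavelet or which post-processing the authors use.

```python
def haar_details(series):
    """Haar detail coefficients per level, finest granularity first.

    Assumes len(series) is a power of two. Each level halves the resolution:
    details capture local differences, averages feed the next coarser level.
    """
    n = len(series)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in series]
    levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / 2 for a, b in pairs])  # local change at this scale
        approx = [(a + b) / 2 for a, b in pairs]        # smoothed series
    return levels


base = haar_details([1, 3, 5, 7])
shifted = haar_details([11, 13, 15, 17])  # same shape, moved up by 10
```

Both calls return `[[-1.0, -1.0], [-2.0]]`: the detail coefficients ignore the vertical position of the series, which is one of the invariances the abstract asks of its features.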
An Overview of Issues in Developing Industrial Data Mining and Knowledge Discovery Applications
In Proceedings of KDD-96, Menlo, 1996
Cited by 19 (3 self)
This paper surveys the growing number of industrial applications of data mining and knowledge discovery. We look at the existing tools, describe some representative applications, and discuss the major issues and problems for building and deploying successful applications and their adoption by business users. Finally, we examine how to assess the potential of a knowledge discovery application.
Finding Informative Rules in Interval Sequences
Intelligent Data Analysis, 2001
Cited by 19 (2 self)
Observing a binary feature over a period of time yields a sequence of observation intervals. To ease the access to continuous features (like time series), they are often broken down into attributed intervals, such that the attribute describes the series' behaviour within the segment (e.g. increasing, high-value, highly convex, etc.). In both cases, we obtain a sequence of interval data, in which temporal patterns and rules can be identified. A temporal pattern is defined as a set of labeled intervals together with their interval relationships described in terms of Allen's interval logic. In this paper, we consider the evaluation of such rules in order to find the most informative rules. We discuss rule semantics and outline deficiencies of the previously used rule evaluation. We apply the J-measure to rules with a modified semantics in order to better cope with different lengths of the temporal patterns. We also consider the problem of specializing temporal rules by additional attributes of the state intervals.
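For reference, the standard J-measure of a rule "if X then Y" can be computed as sketched below. This is the usual information-theoretic definition with base-2 logarithms; the modified rule semantics proposed in the paper are not reproduced, and the function and parameter names are illustrative.

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """Standard J-measure of the rule X -> Y, in bits.

    p_x:          probability that the rule premise X holds
    p_y:          prior probability of the conclusion Y (must be in (0, 1))
    p_y_given_x:  probability of Y when X holds (rule confidence)
    """
    if not 0.0 < p_y < 1.0:
        raise ValueError("p_y must lie strictly between 0 and 1")

    def term(p, q):
        # p * log2(p / q), with the usual convention 0 * log 0 = 0
        return 0.0 if p == 0.0 else p * math.log2(p / q)

    return p_x * (term(p_y_given_x, p_y) + term(1.0 - p_y_given_x, 1.0 - p_y))


certain_rule = j_measure(p_x=0.5, p_y=0.5, p_y_given_x=1.0)
useless_rule = j_measure(p_x=0.3, p_y=0.4, p_y_given_x=0.4)
```

A rule that makes Y certain half of the time scores 0.5 bits, while a rule whose confidence equals the prior of Y scores 0, since observing X then tells us nothing new.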