Similarity search over time series data using wavelets
- In ICDE, 2002
Cited by 50 (0 self)
We consider the use of wavelet transformations as a dimensionality reduction technique to permit efficient similarity search over high-dimensional time-series data. While numerous transformations have been proposed and studied, the only wavelet that has been shown to be effective for this application is the Haar wavelet. In this work, we observe that a large class of wavelet transformations (not only orthonormal wavelets but also bi-orthonormal wavelets) can be used to support similarity search. This class includes the most popular and most effective wavelets being used in image compression. We present a detailed performance study of the effects of using different wavelets on the performance of similarity search for time-series data. We include several wavelets that outperform both the Haar wavelet and the best known non-wavelet transformations for this application. To ensure our results are usable by an application engineer, we also show how to configure an indexing strategy for the best performing transformations. Finally, we identify classes of data that can be indexed efficiently using these wavelet transformations.
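The abstract above treats a wavelet transform as a dimensionality reduction step for similarity search. A minimal sketch of that idea follows, using the Haar wavelet only because it is the baseline the abstract names. The function names (`haar_transform`, `reduced_distance`) and the choice to keep the first `k` coefficients are illustrative assumptions, not the authors' implementation. Because the transform is orthonormal, the distance computed on the kept coefficients never exceeds the true Euclidean distance, which is the property an index needs to avoid false dismissals.

```python
import math


def haar_transform(x):
    """Orthonormal Haar transform of a sequence whose length is a power of two.

    Output order: overall average first, then detail coefficients from
    coarsest to finest.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = [float(v) for v in x]
    details = []
    while len(coeffs) > 1:
        # Pairwise scaled sums (averages) and differences (details).
        avg = [(coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
               for i in range(0, len(coeffs), 2)]
        det = [(coeffs[i] - coeffs[i + 1]) / math.sqrt(2)
               for i in range(0, len(coeffs), 2)]
        details = det + details  # coarser details go in front
        coeffs = avg
    return coeffs + details


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def reduced_distance(x, y, k):
    """Distance on the first k Haar coefficients (assumed truncation rule).

    This is a lower bound on euclidean(x, y).
    """
    return euclidean(haar_transform(x)[:k], haar_transform(y)[:k])
```

In use, an index would store only the first `k` coefficients of each series, filter candidates with `reduced_distance`, and verify the survivors against the full series.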
Efficient Retrieval of Similar Time Sequences Using DFT
- In Proc. FODO Conference, Kobe, 1998
Cited by 48 (2 self)
We propose an improvement of the known DFT-based indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index, since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. We show analytically that this observation can accelerate the search time of the index by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data.
Keywords: similarity retrieval, time series indexing
1 Introduction
Time sequences constitute a large amount of data stored in computers. Examples include stock prices, exchange rates, weather data and biomedical measurements. We are often interested in similarity queries on time-series data [APWZ95, ALSS95]. For example, we may want to find stocks that behave in approximately the same way; or years when the temperature pat...
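The conjugate-symmetry observation in the abstract above can be illustrated with a short sketch. For a real-valued sequence of length n, DFT coefficient n-f is the complex conjugate of coefficient f, so the difference at each stored coefficient f (other than 0 and n/2) can be counted twice without storing its mirror. The function names and the weighting scheme below are assumptions made for illustration and are not taken from the paper.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT (scaled by 1/sqrt(n)) so Euclidean distance is preserved."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        / math.sqrt(n)
        for f in range(n)
    ]


def euclidean(a, b):
    """Euclidean distance; works for real or complex entries."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def lower_bound_plain(X, Y, k):
    """Distance using only the first k coefficients."""
    return math.sqrt(sum(abs(X[f] - Y[f]) ** 2 for f in range(k)))


def lower_bound_symmetric(X, Y, k):
    """Same k stored coefficients, but each mirrored coefficient counts twice.

    Valid for real input sequences and k <= n // 2 + 1.
    """
    n = len(X)
    if k > n // 2 + 1:
        raise ValueError("k must not exceed n // 2 + 1")
    total = abs(X[0] - Y[0]) ** 2  # coefficient 0 has no mirror
    for f in range(1, k):
        weight = 1 if 2 * f == n else 2  # the n/2 coefficient is its own mirror
        total += weight * abs(X[f] - Y[f]) ** 2
    return math.sqrt(total)
```

Both functions return lower bounds on the true distance; the symmetric one is at least as tight for the same number of stored coefficients, which is what lets the index prune more candidates.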
Using the Fractal Dimension to Cluster Datasets
- In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000
Cited by 30 (4 self)
Clustering is a widely used knowledge discovery technique. It helps uncover structures in data that were not previously known. The clustering of large data sets has received a lot of attention in recent years; however, clustering is still a challenging task, since many published algorithms fail to scale well with the size of the data set and the number of dimensions that describe the points, to find clusters of arbitrary shape, or to deal effectively with the presence of noise. In this paper, we present a new clustering algorithm based on the fractal properties of the data sets. The new algorithm, which we call Fractal Clustering (FC), places points incrementally in the cluster for which the change in the fractal dimension after adding the point is the least. This is a very natural way of clustering points, since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires one scan of the data, is suspendable at will, providing the best answer possible at that point, and is incremental. We show via experiments that FC effectively deals with large data sets, high dimensionality and noise, and is capable of recognizing clusters of arbitrary shape.
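The assignment rule described in the abstract above (place each point in the cluster whose fractal dimension changes least) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a box-counting estimate of the fractal dimension over a fixed set of grid levels, assumes points lie in the unit hypercube, and omits the initialization and noise handling that a full algorithm would need. The names `box_counting_dimension` and `assign` are hypothetical.

```python
import math


def box_counting_dimension(points, levels=(1, 2, 3, 4)):
    """Least-squares slope of log2(occupied cells) against grid level.

    Assumes every coordinate lies in [0, 1]; at each level the unit
    hypercube is split into 2**level cells per axis.
    """
    xs, ys = [], []
    for level in levels:
        cells = 2 ** level
        occupied = {
            tuple(min(int(c * cells), cells - 1) for c in p) for p in points
        }
        xs.append(level)
        ys.append(math.log2(len(occupied)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def assign(point, clusters):
    """Index of the cluster whose estimated dimension changes least."""
    def change(cluster):
        before = box_counting_dimension(cluster)
        after = box_counting_dimension(cluster + [point])
        return abs(after - before)

    return min(range(len(clusters)), key=lambda i: change(clusters[i]))
```

Recomputing the dimension from scratch for every candidate is wasteful; an incremental version would keep the occupied-cell counts per level and update them as points arrive.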
Requirements for Clustering Data Streams
Cited by 25 (0 self)
Scientific and industrial examples of data streams abound in astronomy, telecommunication operations, banking and stock-market applications, e-commerce and other fields. A challenge imposed by continuously arriving data streams is to analyze them and to modify the models that explain them as new data arrives. In this paper, we analyze the requirements needed for clustering data streams. We review some of the latest algorithms in the literature and assess if they meet these requirements.
On Generalized Entropies and Scale-Space
- 1997
Cited by 13 (3 self)
this paper we show that the generalized entropies are such functionals. It should be noted that this behavior is not seen for the number of critical points: although critical points most often disappear when scale is increased, creation of critical points with increasing scale is a generic event [16, 14, 7]. Secondly, generalized entropy is the basis for the theory of Multi-Fractal [11, 18], and it is known that there are very strong algebraic similarities to the fundamental equations of Statistical Mechanics. These are thus well-known functions, and while images are not physical systems in the classical thermodynamic sense, Linear Scale-Space is governed by the Linear Heat Diffusion Equation, and one could thus without great difficulty extend the view of images to be a classical thermodynamical system for which the Linear Heat Diffusion is valid. Such a system is an ideal gas. These interpretations of images will be discussed in detail in this chapter. Finally, as will be demonstrated, the generalized entropies offer practical, mathematically well-founded functions to study scaling behaviors of images for scale-selection and texture analysis. Related to this work is Vehel et al. [29], where images are studied in the multi-fractal setting, focusing on certain dimensions; Brink & Pendock [6] and Brink [5] have used the entropy and the closely related Kullback measure to do local thresholding of images. This article is organized as follows. First, Section 2 gives a brief introduction to Linear Scale-Space and linear entropy. Then, in Section 3 we will discuss the generalized entropies, how they differ from linear entropy, and what their properties are in Scale-Space. Following this, in Section 4 we will discuss a physical interpretation of images both from the...
Pitch Extraction and Fundamental Frequency: History and Current Techniques
- 2003
Cited by 13 (0 self)
Pitch extraction (also called fundamental frequency estimation) has been a popular topic in many fields of research since the age of computers. Yet in the course of some 50 years of study, current techniques are still not to a desired level of accuracy and robustness. When
A User-Friendly Self-Similarity Analysis Tool
- ACM SIGCOMM Computer Communication Review, 2003
Cited by 12 (2 self)
The concepts of self-similarity, fractals, and long-range dependence (LRD) have revolutionized network modeling during the last decade. However, despite all the attention these concepts have received, they remain difficult to use by non-experts. This difficulty can be attributed to the relative complexity of the mathematical basis, the absence of a systematic approach to their application and the absence of publicly available software. In this paper, we introduce SELFIS, a comprehensive tool, to facilitate the evaluation of LRD by practitioners. Our goal is to create a stand-alone public tool that can become a reference point for the community. Our tool integrates most of the required functionality for an in-depth LRD analysis, including several LRD estimators. In addition, SELFIS includes a powerful approach to stress-test the existence of LRD. Using our tool, evidence is presented that the widely used LRD estimators can provide misleading results. It is worth mentioning that 25 researchers have acquired SELFIS within a month of its release, which clearly demonstrates the need for such a tool.
High Performance Discovery in Time Series: Techniques and Case Studies
Cited by 12 (3 self)
This paper proposes efficient methods for solving this problem based on Discrete Fourier Transforms and a three-level time interval hierarchy. Extensive experiments on synthetic data and real-world financial trading data show that our algorithm beats the direct computation approach by several orders of magnitude. It also improves on previous Fourier Transform approaches by allowing the efficient computation of time-delayed correlation over any size sliding window and any time delay. Correlation also lends itself to an efficient grid-based data structure. The result is the first algorithm that we know of to compute correlations over thousands of data streams in real time. The algorithm is incremental, has fixed response time, and can monitor the pairwise correlations of 10,000 streams on a single PC. The algorithm is embarrassingly parallelizable
A Stochastic Model for the Evolution of the Web
- Computer Networks, 2002
Cited by 12 (3 self)
Recently several authors have proposed stochastic models of the growth of the Web graph that give rise to power-law distributions. These models are based on the notion of preferential attachment leading to the "rich get richer" phenomenon. However, these models fail to explain several distributions arising from empirical results, due to the fact that the predicted exponent is not consistent with the data. To address this problem, we extend the evolutionary model of the Web graph by including a non-preferential component, and we view the stochastic process in terms of an urn transfer model. By making this extension, we can now explain a wider variety of empirically discovered power-law distributions provided the exponent is greater than two. These include: the distribution of incoming links, the distribution of outgoing links, the distribution of pages in a Web site and the distribution of visitors to a Web site. A by-product of our results is a formal proof of the convergence of the standard stochastic model (first proposed by Simon).
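The mix of preferential and non-preferential growth described in the abstract above can be explored with a small simulation. The sketch below is a simplified assumption, not the paper's urn transfer model: at each step a new page with one link is created with probability `p_new`; otherwise an existing page gains a link, chosen with weight `count + alpha`, where the hypothetical parameter `alpha` stands in for the non-preferential component (`alpha = 0` gives purely preferential attachment).

```python
import random


def simulate(steps, p_new, alpha, seed=0):
    """Return per-page link counts after the given number of growth steps.

    p_new: probability that a step creates a new page with one link.
    alpha: non-preferential weight added to every page (must exceed -1).
    """
    if alpha <= -1:
        raise ValueError("alpha must be greater than -1")
    rng = random.Random(seed)  # seeded for reproducible runs
    links = [1]                # start with a single page holding one link
    for _ in range(steps):
        if rng.random() < p_new:
            links.append(1)
        else:
            # Preferential part (count) plus non-preferential part (alpha).
            weights = [count + alpha for count in links]
            chosen = rng.choices(range(len(links)), weights=weights)[0]
            links[chosen] += 1
    return links
```

Tabulating how many pages hold each link count, for different values of `alpha`, gives a rough picture of how the non-preferential component shifts the tail of the distribution.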
Physics-based Sound Synthesis and Control: Crushing, Walking and Running by Crumpling Sounds
- Proc. of the Colloq. on Musical Informatics, 2003
Cited by 9 (2 self)
Three types of ecological events (crushing, walking and running) have been considered. Their acoustic properties have been modeled following the physics-based approach. Starting from an existing physically-based impact model, we superimposed on it the dynamic and temporal stochastic characteristics governing crushing events. The resulting model was triggered by control rules realizing typical walking and running time patterns. This bottom-up design strategy was made possible because the sound synthesis and sound control models could be directly connected to each other via a common switchboard of driving and control parameters. The existence of a common interface specification for all the models follows from the application of physics-based modeling, and translates into major advantages when those models are implemented as independent, self-contained blocks and procedures connected together in real time inside a software architecture like pd.

