## LEEWAVE: Level-Wise Distribution of Wavelet Coefficients for Processing kNN Queries over Distributed Streams

### BibTeX

@MISC{Yeh_leewave:level-wise,

author = {Mi-yen Yeh},

title = {LEEWAVE: Level-Wise Distribution of Wavelet Coefficients for Processing kNN Queries over Distributed Streams},

year = {}

}

### Abstract

We present LEEWAVE, a bandwidth-efficient approach to searching range-specified k-nearest neighbors among distributed streams by LEvEl-wise distribution of WAVElet coefficients. To find the k streams most similar to a range-specified reference stream, the relevant wavelet coefficients of the reference stream can be sent to the peer sites to compute the similarities. However, bandwidth is wasted if all the relevant coefficients are sent at once. Instead, we present a level-wise approach that leverages the multi-resolution property of wavelet coefficients. Starting from the top level and moving down one level at a time, the query initiator sends only the coefficients of a single level to a progressively shrinking set of candidates. This raises one difficult challenge: how does the query initiator prune candidates without knowing all the relevant coefficients? To overcome it, we derive and maintain a similarity range for each candidate and gradually tighten the bounds of this range as we move from one level to the next. The increasingly tight similarity ranges enable the query initiator to prune candidates effectively without causing any false dismissal. Extensive experiments with real and synthetic data show that, compared with prior approaches, LEEWAVE uses significantly less bandwidth under a wide range of conditions.
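The level-wise idea in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's derivation: it assumes an orthonormal Haar transform (so squared Euclidean distance is preserved in the coefficient domain), power-of-two stream lengths, one stream per peer, and a simple triangle-inequality bound on the coefficients not yet seen. The names `orthonormal_haar_levels` and `levelwise_knn` are hypothetical.

```python
import math


def orthonormal_haar_levels(values):
    """Group orthonormal Haar coefficients from coarsest to finest level.

    Assumption: len(values) is a power of two.
    Returns [[scaling], [top detail], ..., [finest details]].
    """
    current = [float(v) for v in values]
    n = len(current)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    root2 = math.sqrt(2.0)
    levels = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        details = [(current[i] - current[i + 1]) / root2 for i in pairs]
        current = [(current[i] + current[i + 1]) / root2 for i in pairs]
        levels.insert(0, details)  # finer levels are produced first
    levels.insert(0, current)
    return levels


def levelwise_knn(query, candidates, k):
    """Return (names of the k nearest candidates, query coefficients sent).

    candidates maps a name to a list of values. Each candidate keeps a
    [lower, upper] range on its squared distance to the query; the range
    tightens as one more level of coefficients is compared.
    """
    q_levels = orthonormal_haar_levels(query)
    q_rest = sum(c * c for level in q_levels for c in level)
    state = {}
    for name, values in candidates.items():
        levels = orthonormal_haar_levels(values)
        energy = sum(c * c for level in levels for c in level)
        state[name] = {"levels": levels, "partial": 0.0, "rest": energy}
    alive = set(candidates)
    sent = 0
    for depth, q_level in enumerate(q_levels):
        sent += len(q_level) * len(alive)  # one level to each survivor
        q_rest -= sum(c * c for c in q_level)
        bounds = {}
        for name in alive:
            st = state[name]
            c_level = st["levels"][depth]
            st["partial"] += sum((a - b) ** 2 for a, b in zip(q_level, c_level))
            st["rest"] -= sum(c * c for c in c_level)
            # Unseen part: ||a - b||^2 <= (||a|| + ||b||)^2
            slack = (math.sqrt(max(q_rest, 0.0)) + math.sqrt(max(st["rest"], 0.0))) ** 2
            bounds[name] = (st["partial"], st["partial"] + slack)
        if len(alive) > k:
            kth_upper = sorted(upper for _, upper in bounds.values())[k - 1]
            # A candidate whose lower bound exceeds the k-th smallest upper
            # bound cannot be among the k nearest, so dropping it is safe.
            alive = {name for name in alive if bounds[name][0] <= kth_upper}
    ranked = sorted(alive, key=lambda name: state[name]["partial"])
    return ranked[:k], sent
```

Calling `levelwise_knn(query, {"a": [...], "b": [...]}, k)` returns the surviving names together with a count of query coefficients that would have been transmitted, which can be compared with sending every coefficient to every candidate.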

### Citations

418 | TelegraphCQ: Continuous dataflow processing for an uncertain world
- Chandrasekaran, Cooper, et al.
- 2003
Citation Context ... form of rapidly arriving streams. Examples include data analysis in sensor networks, program trading in financial markets, video surveillance and weather forecasting. In response, many organizations =-=[1, 3, 5, 9, 30, 32]-=- have started developing data stream processing systems (DSPS). Finding k-nearest neighbors (kNN) is one of the most common applications in computing. Processing kNN queries has been one of the most s... |

211 | Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen
- Haar
- 1910
Citation Context ... Figure 2: (a) The error tree for Example 1; (b) The notation of an error tree proposed in [17]. Among different wavelet transformations, the Haar wavelet =-=[16]-=- decomposition is the first and also the most popular one. It is achieved by averaging two adjacent data values of a sequence of data at different time resolutions. Then, only the overall average and ... |
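The pairwise averaging described in this context can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes a power-of-two input length and takes each detail coefficient as half of (left value minus right value), a sign convention the snippet does not state. The function names and the sample data are hypothetical.

```python
def haar_decompose(values):
    """Return [overall average, details from coarsest to finest level].

    Assumptions: len(values) is a power of two, and each detail is
    (left - right) / 2 for a pair of adjacent values.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    current = list(values)
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        current = [(current[i] + current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels go in front
    return current + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose by expanding one level at a time."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        expanded = []
        for avg, diff in zip(current, diffs):
            expanded.extend([avg + diff, avg - diff])
        pos += len(current)
        current = expanded
    return current
```

For the hypothetical sequence `[2, 2, 0, 2, 3, 5, 4, 4]`, `haar_decompose` yields the overall average 2.75 followed by the details -1.25, then 0.5 and 0, then 0, -1, -1 and 0; `haar_reconstruct` recovers the original values from them.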

208 | Adaptive filters for continuous queries over distributed data streams, in
- Olston, Jiang, et al.
- 2003
Citation Context ...o focus on distributed streams. These works include finding recently frequent itemsets [23], top-k monitoring [4], tracking approximate quantiles [11], processing aggregation and thresholding queries =-=[22, 25, 28]-=-, content-based indexing for inner product queries and similarity queries [7], and so forth. To the best of our knowledge, there is no prior work on processing kNN queries among multiple distributed s... |

204 | The Design of the Borealis Stream Processing Engine. CIDR ’05
- Abadi, Ahmad, et al.
Citation Context ... form of rapidly arriving streams. Examples include data analysis in sensor networks, program trading in financial markets, video surveillance and weather forecasting. In response, many organizations =-=[1, 3, 5, 9, 30, 32]-=- have started developing data stream processing systems (DSPS). Finding k-nearest neighbors (kNN) is one of the most common applications in computing. Processing kNN queries has been one of the most s... |

181 | Distributed Top-K Monitoring
- Babcock, Olston
- 2003
Citation Context ... geographically distributed places. Therefore, more and more research works have started to focus on distributed streams. These works include finding recently frequent itemsets [23], top-k monitoring =-=[4]-=-, tracking approximate quantiles [11], processing aggregation and thresholding queries [22, 25, 28], content-based indexing for inner product queries and similarity queries [7], and so forth. To the b... |

175 | StatStream: Statistical Monitoring of Thousands of
- Zhu, Shasha
- 2002
Citation Context ...distributed sites is large, LEEWAVE indeed is a much better solution in bandwidth savings. 5.2 Evaluation with synthetic data The synthetic data were generated by a random walk data model proposed in =-=[33]-=-. For a stream $S_i$, it was generated as follows: $S_i = 100 + \sum_{j=1}^{i} (u_j - 0.5)$, where $u_j$ was randomly picked from [0,1]. We generated 1,000 streams in total, where each stream has 20,000 data points. Fo... |
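A minimal sketch of this random walk model follows, reading the index i as the position of a value within one stream, which is an interpretation rather than something the snippet states. The function name `random_walk_stream` and the `seed` parameter are hypothetical, and Python's `random()` draws from [0, 1) rather than the closed interval [0, 1].

```python
import random


def random_walk_stream(length, seed=None):
    """Generate values 100 + sum_{j<=i}(u_j - 0.5) for i = 1..length."""
    rng = random.Random(seed)  # private generator so runs are repeatable
    value = 100.0
    stream = []
    for _ in range(length):
        value += rng.random() - 0.5  # u_j drawn uniformly from [0, 1)
        stream.append(value)
    return stream
```

Under this reading, a data set of the size mentioned above would be built with `[random_walk_stream(20000, seed=s) for s in range(1000)]`.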

95 | Tributaries and deltas: Efficient and robust aggregation in sensor network streams
- Manjhi, Nath, et al.
Citation Context ...o focus on distributed streams. These works include finding recently frequent itemsets [23], top-k monitoring [4], tracking approximate quantiles [11], processing aggregation and thresholding queries =-=[22, 25, 28]-=-, content-based indexing for inner product queries and similarity queries [7], and so forth. To the best of our knowledge, there is no prior work on processing kNN queries among multiple distributed s... |

94 | STREAM: The Stanford Stream Data Manager
- Arasu, Babcock, et al.
- 2003
Citation Context ... form of rapidly arriving streams. Examples include data analysis in sensor networks, program trading in financial markets, video surveillance and weather forecasting. In response, many organizations =-=[1, 3, 5, 9, 30, 32]-=- have started developing data stream processing systems (DSPS). Finding k-nearest neighbors (kNN) is one of the most common applications in computing. Processing kNN queries has been one of the most s... |

86 | Holistic Aggregates in a Networked World: Distributed Tracking of Approximate Quantiles
- Cormode, Garofalakis, et al.
- 2005
Citation Context ...Therefore, more and more research works have started to focus on distributed streams. These works include finding recently frequent itemsets [23], top-k monitoring [4], tracking approximate quantiles =-=[11]-=-, processing aggregation and thresholding queries [22, 25, 28], content-based indexing for inner product queries and similarity queries [7], and so forth. To the best of our knowledge, there is no pri... |

70 | Efficient elastic burst detection in data streams
- Zhu, Shasha
- 2003
Citation Context ...nvironment. Among various sketches, the wavelet-based one, especially the Haar wavelet summarization, has been widely adopted in many stream-oriented applications due to its efficiency and simplicity =-=[6, 31, 34]-=-. More importantly, the Haar wavelet decomposition provides multiple resolutions in time and frequency domains like all other wavelet decompositions do. In a coarser resolution, there are fewer wavele... |

67 | Finding (recently) frequent items in distributed data streams
- Manjhi, Shkapenyuk, et al.
- 2005
Citation Context ...treams are generated in geographically distributed places. Therefore, more and more research works have started to focus on distributed streams. These works include finding recently frequent itemsets =-=[23]-=-, top-k monitoring [4], tracking approximate quantiles [11], processing aggregation and thresholding queries [22, 25, 28], content-based indexing for inner product queries and similarity queries [7], ... |

62 | Similarity search over time series data using wavelets ICDE’02
- Popivanov, Miller
Citation Context ...ed in Section 4. Section 5 shows the experimental results. Finally, the paper is concluded in Section 6. 2. RELATED WORK Wavelet transform plays an important role in the field of time series analysis =-=[27]-=-. Although there are many other data decomposition methods such as discrete Fourier transform, singular value decomposition, piecewise linear approximation and so on, none of them provides the multipl... |

62 | A geometric approach to monitoring threshold functions over distributed data streams
- Sharfman, Schuster, et al.
- 2006
Citation Context ...o focus on distributed streams. These works include finding recently frequent itemsets [23], top-k monitoring [4], tracking approximate quantiles [11], processing aggregation and thresholding queries =-=[22, 25, 28]-=-, content-based indexing for inner product queries and similarity queries [7], and so forth. To the best of our knowledge, there is no prior work on processing kNN queries among multiple distributed s... |

45 | One-pass wavelet decompositions of data streams
- Gilbert, Kotidis, et al.
Citation Context ...re 3: Extracting complete error subtrees and relevant coefficients from the whole error tree based on a desired time range. proposed. These requirements include minimizing the $L_2$-norm average error =-=[14]-=-, minimizing the maximum absolute/relative error [18], minimizing the weighted $L_p$-norm error [15], and providing a guaranteed accuracy [12], to name a few. Given the retained coefficients of a strea... |

40 | The Cauchy-Schwarz Master Class. An Introduction to the Art
- Steele
- 2004
Citation Context ...uld like to find a substitute for this term which is an overestimate but can be computed with level-wise coefficients. Fortunately, we do find such a substitute. According to the Cauchy-Schwarz inequality =-=[29]-=-, we can find an upper bound of the inner product of two vectors in real space, where this upper bound is the product of the linear square sum of each vector: $\left(\sum_{i=1}^{h} \alpha_i \beta_i\right)^2 \le \sum_{i=1}^{h} \alpha_i^2 \times \sum_{i=1}^{h} \beta_i^2$ ... |
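The practical value of this inequality is that the bound needs only each vector's own sum of squares, so two sites can bound their inner product by exchanging one number each. A small sketch under that reading, with hypothetical helper names:

```python
import math


def inner_product(alpha, beta):
    """Exact inner product; needs both vectors in one place."""
    return sum(a * b for a, b in zip(alpha, beta))


def inner_product_bound(alpha_sq_sum, beta_sq_sum):
    """Cauchy-Schwarz bound on |<alpha, beta>| from the two sums of squares."""
    return math.sqrt(alpha_sq_sum) * math.sqrt(beta_sq_sum)


def sum_of_squares(vector):
    """Quantity each site can compute locally from its own coefficients."""
    return sum(v * v for v in vector)
```

The bound is tight only when one vector is a scalar multiple of the other; otherwise it overestimates, which is what a safe pruning rule requires.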

38 | Swat: Hierarchical stream summarization in large networks
- Bulut, Singh
- 2003
Citation Context ...nvironment. Among various sketches, the wavelet-based one, especially the Haar wavelet summarization, has been widely adopted in many stream-oriented applications due to its efficiency and simplicity =-=[6, 31, 34]-=-. More importantly, the Haar wavelet decomposition provides multiple resolutions in time and frequency domains like all other wavelet decompositions do. In a coarser resolution, there are fewer wavele... |

33 | Haar Wavelets for Efficient Similarity Search of Time-Series: With and Without Time Warping
- Chan, Fu, et al.
- 2003
Citation Context ...efficients Given wavelet coefficients of two streams, we can compute the Euclidean distance directly from the coefficients themselves without doing inverse wavelet transform back to the original data =-=[8]-=-. By reformulating the distance computation proposed in [17], for a given time range T =[ts, te], the distance between two streams Su and Sv can be computed as follows: dst(Su, Sv)| te ts = [∑ where D... |

29 | Iterative incremental clustering of time series
- Lin, Vlachos, et al.
- 2004
Citation Context ...aved computation time while still providing good results when answering queries using approximated images in coarser resolutions. Similarly, when considering clustering of time series, the authors in =-=[20]-=- also leverage the multi-resolution property of wavelets to avoid the local minima problem and save computation time. The coarser representations are used to decide initial cluster centers while the fine... |

28 | Approximate nn queries on streams with guaranteed error/performance bounds. VLDB
- Koudas, Ooi, et al.
- 2004
Citation Context ...ore challenging. It must handle an endlessly growing amount of data with limited resources. Nevertheless, many researchers have started working on various aspects of stream-based kNN query processing =-=[13, 17, 19, 21]-=-. But, these works mainly focus on the case where data streams are collected and processed at a central site. In many real-world applications, however, data streams are usually collected in a decentra... |

22 | One-pass wavelet synopses for maximum-error metrics
- Karras, Mamoulis
- 2005
Citation Context ... coefficients from the whole error tree based on a desired time range. proposed. These requirements include minimizing the $L_2$-norm average error [14], minimizing the maximum absolute/relative error =-=[18]-=-, minimizing the weighted $L_p$-norm error [15], and providing a guaranteed accuracy [12], to name a few. Given the retained coefficients of a stream, we can extract the relevant coefficients within an... |

21 | Wavelet synopsis for data streams: minimizing non-euclidean error
- Guha, Harb
- 2005
Citation Context ...on a desired time range. proposed. These requirements include minimizing the $L_2$-norm average error [14], minimizing the maximum absolute/relative error [18], minimizing the weighted $L_p$-norm error =-=[15]-=-, and providing a guaranteed accuracy [12], to name a few. Given the retained coefficients of a stream, we can extract the relevant coefficients within any time range [ts, te]. As suggested in [17], t... |

16 | Evaluating continuous nearest neighbor queries for streaming time series via pre-fetching
- Gao, Yao, et al.
- 2002
Citation Context ...ore challenging. It must handle an endlessly growing amount of data with limited resources. Nevertheless, many researchers have started working on various aspects of stream-based kNN query processing =-=[13, 17, 19, 21]-=-. But, these works mainly focus on the case where data streams are collected and processed at a central site. In many real-world applications, however, data streams are usually collected in a decentra... |

16 | Challenges and experience in prototyping a multimodal stream analytic and monitoring application on
- Wu, Yu, et al.
- 2007

14 | Fast approximate wavelet tracking on streams
- Cormode, Garofalakis, et al.
- 2006
Citation Context ...equirements include minimizing the $L_2$-norm average error [14], minimizing the maximum absolute/relative error [18], minimizing the weighted $L_p$-norm error [15], and providing a guaranteed accuracy =-=[12]-=-, to name a few. Given the retained coefficients of a stream, we can extract the relevant coefficients within any time range [ts, te]. As suggested in [17], the extraction procedure costs a time compl... |

14 | Efficient k-NN Search on Streaming Data Series
- Liu, Ferhatosmanoglu
- 2003
Citation Context ...ore challenging. It must handle an endlessly growing amount of data with limited resources. Nevertheless, many researchers have started working on various aspects of stream-based kNN query processing =-=[13, 17, 19, 21]-=-. But, these works mainly focus on the case where data streams are collected and processed at a central site. In many real-world applications, however, data streams are usually collected in a decentra... |

9 | Distributed data streams indexing using content-based routing paradigm
- Bulut, Singh, et al.
- 2005
Citation Context ... [23], top-k monitoring [4], tracking approximate quantiles [11], processing aggregation and thresholding queries [22, 25, 28], content-based indexing for inner product queries and similarity queries =-=[7]-=-, and so forth. To the best of our knowledge, there is no prior work on processing kNN queries among multiple distributed streams. In a distributed environment, among various kinds of kNN queries and w... |

6 | Resource-Aware Mining with Variable Granularities in Data Streams; SIAM Int'l Conf. on Data Mining; 2004
- Teng, Chen, et al.
- 2009
Citation Context ...nvironment. Among various sketches, the wavelet-based one, especially the Haar wavelet summarization, has been widely adopted in many stream-oriented applications due to its efficiency and simplicity =-=[6, 31, 34]-=-. More importantly, the Haar wavelet decomposition provides multiple resolutions in time and frequency domains like all other wavelet decompositions do. In a coarser resolution, there are fewer wavele... |

3 | Multi-resolution indexing for shape images
- Chiueh, Ballman, et al.
- 1998
Citation Context ... possible. However, sending the entire query object to all other sites initially still needs too much bandwidth, especially when the size of the query object or the total number of sites is large. In =-=[10]-=-, the concept of multi-resolution indexing is employed to reduce the computation for shape image retrieval. Instead of considering all vantage points of images, the authors computed an approximate dis... |

2 | Distributed processing of similarity queries
- Papadopoulos, Manolopoulos
Citation Context ...and then reports its local kNN streams to Pinit. Finally, Pinit determines the true kNN after it receives results from all the peer sites. This kind of scheme was called Concurrent Processing (CP) in =-=[26]-=-. Unfortunately, concurrent processing is not a good one because it requires large bandwidth (size of the query stream multiplied by (M − 1) plus k candidate objects from each of the M − 1 sites) and ... |

1 | A compact wavelet index for retrieval in image database
- Albanesi, Ferretti, et al.
- 1999
Citation Context ...lution to refine the answers. The multiresolution property of wavelets has also been exploited by many other applications. For example, using Haar wavelet coefficients to index images, the authors in =-=[2]-=- showed that it saved computation time while still providing good results when answering queries using approximated images in coarser resolutions. Similarly, when considering clustering of time series... |

1 | Efficient range-constrained similarity search on wavelet synopses over multiple streams
- Hung, Chen
- 2006

1 | Wavelet-based histograms for selectivity estimation
- Matias, Vitter, et al.
- 1998
Citation Context ...ifference values of each resolution are {5.875, −0.625, −0.25, 0.5, −1, 1.5, 1, −1}. To better illustrate the Haar wavelet decomposition, a widely used data structure called error tree is proposed in =-=[24]-=-. The error tree for Example 1 is shown in Fig. 2(a). This tree is composed of wavelet coefficients as nodes and signs as edges. The root of this tree is the overall average and all the other non-leaf... |