## Faster adaptive set intersections for text searching (2006)


Venue: Experimental Algorithms: 5th International Workshop, WEA 2006, Cala Galdana, Menorca

Citations: 30 (4 self)

### BibTeX

@INPROCEEDINGS{Barbay06fasteradaptive,

author = {Jérémy Barbay and Alejandro López-Ortiz and Tyler Lu},

title = {Faster adaptive set intersections for text searching},

booktitle = {Experimental Algorithms: 5th International Workshop, WEA 2006, Cala Galdana, Menorca},

year = {2006},

pages = {146--157},

publisher = {Springer}

}

### Abstract

The intersection of large ordered sets is a common problem in the evaluation of boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001] by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study from Demaine et al. on comparison-based intersection algorithms. Second, we show that in practice replacing binary search and galloping (one-sided binary) search [4] by interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search: this results in an even better intersection algorithm.
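As a rough illustration of the search primitives the abstract compares, the following Python sketch implements galloping (one-sided binary) search and a basic interpolation search over a sorted list of integers. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `galloping_search` and `interpolation_search`, the `lo` start parameter, and the clamping of the probe position are choices made here for illustration, and the interpolation-search variants tested in the paper are not reproduced.

```python
from bisect import bisect_left


def galloping_search(arr, target, lo=0):
    """Return the smallest index i >= lo with arr[i] >= target, or len(arr).

    Hypothetical helper: doubles the step from `lo` until it overshoots,
    then finishes with a binary search inside the bracketed range.
    """
    n = len(arr)
    if lo >= n:
        return n
    if arr[lo] >= target:
        return lo
    prev, step = lo, 1  # invariant: arr[prev] < target
    while prev + step < n and arr[prev + step] < target:
        prev += step
        step *= 2
    hi = min(prev + step, n)
    return bisect_left(arr, target, prev + 1, hi)


def interpolation_search(arr, target, lo=0):
    """Return the smallest index i >= lo with arr[i] >= target, or len(arr).

    Hypothetical helper for sorted integers: the probe position is estimated
    by linear interpolation between the values at the two ends of the range.
    """
    hi = len(arr) - 1
    if lo > hi or arr[hi] < target:
        return len(arr)
    if arr[lo] >= target:
        return lo
    # Invariant: arr[lo] < target <= arr[hi], so the divisor below is positive.
    while hi - lo > 1:
        pos = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        pos = min(max(pos, lo + 1), hi - 1)  # keep the probe strictly inside
        if arr[pos] < target:
            lo = pos
        else:
            hi = pos
    return hi
```

Both functions return the same kind of result (the rank of `target` at or after `lo`), so under these assumptions either one could be plugged into an intersection loop in place of a plain binary search.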

### Citations

2634 | Modern Information Retrieval - Baeza-Yates, Ribeiro-Neto - 1999 |

667 | Suffix arrays: A new method for on-line string searches - Manber, Myers - 1993 |

67 | A survey of adaptive sorting algorithms - Estivill-Castro, Wood - 1992 |

65 | Adaptive set intersections, unions, and differences - Demaine, López-Ortiz, et al. - 2000 |

Citation context: ...dating back to the algorithm by Hwang and Lin from over three decades ago [13]. In 2000, Demaine et al. improved over this by proposing a faster method for computing the intersection of k sorted sets [7] using an adaptive algorithm. Their algorithm has optimal worst-case behaviour on a much finer analysis than simply worst-case input size. We refer the reader to [7] for the precise details on the ada...

54 | An Almost Optimal Algorithm for Unbounded Searching - Bentley, Yao - 1976 |

Citation context: ...rate and complete the practical study from Demaine et al. on comparison-based intersection algorithms. Second, we show that in practice replacing binary search and galloping (one-sided binary) search [4] by interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search: this results in an even better intersection algor...

44 | A simple algorithm for merging two disjoint linearly ordered sets - Hwang, Lin - 1972 |

Citation context: ...bases as well as boolean queries to a search engine. The worst case complexity of this problem has long been well understood, dating back to the algorithm by Hwang and Lin from over three decades ago [13]. In 2000, Demaine et al. improved over this by proposing a faster method for computing the intersection of k sorted sets [7] using an adaptive algorithm. Their algorithm has optimal worst-case behavi...

39 | Experiments on adaptive set intersections for text retrieval systems - Demaine, López-Ortiz, et al. - 2001 |

Citation context: ...] for the precise details on the adaptive measure used. In a followup study they showed that the adaptive theoretical optimal algorithm is not always best in practice in the context of search engines [8]. In that study, they compared a straightforward implementation of an intersection algorithm, termed SvS, with their adaptive algorithm, termed Adaptive, and showed that on the given data Adaptive is ...

37 | A fast set intersection algorithm for sorted sequences - Baeza-Yates - 2004 |

Citation context: ...tained through their own web crawl. Of those, we focus on two particular ones: SvS and Small Adaptive. SvS is a straightfor... Algorithm 2 (Pseudo-code for SvS): 1: Sort the sets by size (|set[0]| ≤ |set[1]| ≤ ... ≤ |set[k]|). 2: Let the smallest set s[0] be the candidate answer set. 3: for each set s[i], i = 1 ... k do initialize ℓ[k] = 0. 4: for each set s[i], i = 1 ... k do 5: for each element ...

32 | Adaptive intersection and t-threshold problems - Barbay, Kenyon - 2002 |

Citation context: ...dy. Our contributions are threefold. First, we corroborate the practical study from [8] by considering a much larger web crawl and extend their study to include a more recent algorithm, introduced in [3]. The results are similar to those of the original study: the algorithm termed Small Adaptive is the one which performs the best. Second, we study the impact of replacing binary searches and galloping...

21 | Experimental analysis of a fast intersection algorithm for sorted sequences - Baeza-Yates, Salinger - 2005 |

19 | Optimal merging of 2 elements with n elements - Hwang, Lin - 1971 |

11 | An algorithmic and complexity analysis of interpolation search - Gonnet, Rogers, et al. - 1980 |

3 | Optimal merging of 3 elements with N elements - Hwang - 1980 |

2 | Interpolation search for non-independent data - Demaine, Jones, Pătraşcu - 2004 |

Citation context: ...istribution, hence it is only natural to test if this holds using web crawled data. Moreover, recent developments suggest that interpolation search is also a reasonable technique for non-uniform data [6]. Our experiments, which we describe in the next Section, confirm this conjecture. Recall that interpolation search for an element of value e in an array set[i] on the range a to b probes a position a...

1 | Interpolation search–A log log n search - Perl, Itai, et al. - 1978 |
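The SvS pseudo-code quoted in the citation contexts above is cut off after its fifth step, so the following Python sketch should be read as one plausible completion rather than the paper's algorithm. It assumes each set is a sorted list without duplicates, and it introduces a hypothetical `search` parameter (defaulting to `bisect.bisect_left`) that stands in for whichever search routine is used to locate a candidate in the next set; the function name `svs_intersect` is likewise an illustrative choice.

```python
from bisect import bisect_left


def svs_intersect(sets, search=bisect_left):
    """Intersect sorted, duplicate-free lists, starting from the smallest.

    `search(arr, target, lo)` must return the smallest index i >= lo with
    arr[i] >= target, or len(arr) if no such index exists.
    """
    if not sets:
        return []
    ordered = sorted(sets, key=len)   # step 1: sort the sets by size
    candidates = list(ordered[0])     # step 2: smallest set is the candidate answer
    for other in ordered[1:]:
        survivors = []
        pos = 0  # cursor into `other`; it only moves forward
        for element in candidates:
            pos = search(other, element, pos)
            if pos == len(other):
                break                 # no later candidate can occur in `other`
            if other[pos] == element:
                survivors.append(element)
        candidates = survivors
    return candidates


if __name__ == "__main__":
    print(svs_intersect([[1, 3, 5, 7, 9, 11], [3, 4, 5, 9], [0, 3, 9, 10, 12]]))
```

Because the cursor `pos` is carried from one candidate to the next, each search resumes where the previous one stopped. This is the point where, under the assumptions above, a galloping or interpolation search could be passed as `search` instead of the default binary search.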