On-line index maintenance using horizontal partitioning
- In Proceedings of the 18th ACM CIKM, CIKM ’09, 2009
Abstract - Cited by 2 (1 self)
In this paper, we propose a new merge-based index maintenance strategy for Information Retrieval systems. The new model is based on partitioning the inverted index across its terms. We exploit the query log to partition the on-disk inverted index into two types of sub-indexes. Inverted lists of the terms contained in queries that are frequently posed to the Information Retrieval system are kept in one partition, called the frequent-term index, and the remaining inverted lists form another partition, called the infrequent-term index. We use a lazy-merge strategy for maintaining infrequent-term sub-indexes and an active-merge strategy for maintaining frequent-term sub-indexes. The sub-indexes are also similarly split into frequent and infrequent parts. Experimental results show that the proposed method improves both index maintenance performance and query performance compared to the existing merge-based strategies.
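The partitioning idea in this abstract can be illustrated with a small in-memory sketch. It is not the authors' implementation: the names `frequent_terms`, `PartitionedIndex`, `top_k` and `lazy_threshold` are hypothetical, the top-k cutoff is an assumed way to pick frequent terms, and "lazy merge" is modelled simply as deferring the merge until a few batches have accumulated.

```python
from collections import Counter, defaultdict


def frequent_terms(query_log, top_k):
    """Pick the top_k most queried terms (assumed cutoff rule, not from the paper)."""
    counts = Counter(term for query in query_log for term in query.split())
    return {term for term, _ in counts.most_common(top_k)}


class PartitionedIndex:
    """Toy index with a frequent-term and an infrequent-term partition."""

    def __init__(self, frequent, lazy_threshold=3):
        self.frequent = set(frequent)
        self.lazy_threshold = lazy_threshold  # hypothetical tuning knob
        # Plain dicts stand in for the on-disk sub-indexes.
        self.disk = {"frequent": defaultdict(list), "infrequent": defaultdict(list)}
        self.pending = []  # infrequent-term batches not yet merged
        self.merges = {"frequent": 0, "infrequent": 0}

    def flush(self, batch):
        """batch maps term -> list of document ids from the in-memory index."""
        hot = {t: p for t, p in batch.items() if t in self.frequent}
        cold = {t: p for t, p in batch.items() if t not in self.frequent}
        if hot:
            self._merge("frequent", [hot])  # active merge: merge immediately
        if cold:
            self.pending.append(cold)  # lazy merge: defer until enough batches
            if len(self.pending) >= self.lazy_threshold:
                self._merge("infrequent", self.pending)
                self.pending = []

    def _merge(self, part, batches):
        for batch in batches:
            for term, postings in batch.items():
                self.disk[part][term].extend(postings)
        self.merges[part] += 1

    def lookup(self, term):
        part = "frequent" if term in self.frequent else "infrequent"
        result = list(self.disk[part].get(term, []))
        # Infrequent terms may also have postings in unmerged batches.
        for batch in self.pending:
            result.extend(batch.get(term, []))
        return result
```

In this sketch a lookup for a frequent term touches a single merged list, while a lookup for an infrequent term may also have to scan the pending batches. That is the trade-off the sketch is meant to show: fewer merges for rarely queried terms at the cost of slower lookups for them.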
Abstract
, 1104
Abstract
Many data structures support dictionaries, also known as maps or associative arrays, which store and manage a set of key-value pairs. A multimap is a generalization that allows multiple values to be associated with the same key. For example, the inverted file data structure that is used prevalently in the infrastructure supporting search engines is a type of multimap, where words are used as keys and document pointers are used as values. We study the multimap abstract data type and how it can be implemented efficiently online in external memory frameworks, with constant expected I/O performance. The key technique used to achieve our results is a combination of cuckoo hashing using buckets that hold multiple items with a multiqueue implementation to cope with varying numbers of values per key. Our external-memory results are for the standard two-level memory model.
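As a rough illustration of the bucketed cuckoo hashing mentioned above, the following in-memory sketch stores each key in one of two candidate buckets, each holding several keys. It is only an approximation under stated assumptions: the class `CuckooMultimap` and its parameters are hypothetical, a plain Python list per key stands in for the multiqueue, and nothing here models external memory or I/O costs.

```python
import random


class CuckooMultimap:
    """Toy multimap: two hash tables, buckets of up to bucket_size keys."""

    def __init__(self, num_buckets=8, bucket_size=4, max_kicks=32, seed=0):
        self.n = num_buckets
        self.b = bucket_size
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self._new_tables()

    def _new_tables(self):
        # Fresh salts give two new hash functions after every rebuild.
        self.salts = (self.rng.getrandbits(32), self.rng.getrandbits(32))
        self.tables = [[dict() for _ in range(self.n)] for _ in range(2)]

    def _slot(self, which, key):
        return hash((self.salts[which], key)) % self.n

    def _find(self, key):
        # A key can only live in one of its two candidate buckets.
        for which in (0, 1):
            bucket = self.tables[which][self._slot(which, key)]
            if key in bucket:
                return bucket
        return None

    def get(self, key):
        bucket = self._find(key)
        return list(bucket[key]) if bucket is not None else []

    def add(self, key, value):
        bucket = self._find(key)
        if bucket is not None:
            bucket[key].append(value)  # existing key: extend its value list
        else:
            self._place(key, [value])

    def _place(self, key, values):
        which = 0
        for _ in range(self.max_kicks):
            for w in (0, 1):
                bucket = self.tables[w][self._slot(w, key)]
                if len(bucket) < self.b:
                    bucket[key] = values
                    return
            # Both candidate buckets are full: evict a random resident key
            # and try to re-place it, alternating between the two tables.
            bucket = self.tables[which][self._slot(which, key)]
            victim = self.rng.choice(list(bucket))
            victim_values = bucket.pop(victim)
            bucket[key] = values
            key, values = victim, victim_values
            which = 1 - which
        self._grow(key, values)

    def _grow(self, key, values):
        # Too many evictions: double the number of buckets and reinsert all.
        items = [(k, v) for t in self.tables for bkt in t for k, v in bkt.items()]
        items.append((key, values))
        self.n *= 2
        self._new_tables()
        for k, v in items:
            self._place(k, v)
```

Used as an inverted file, words would be the keys and document pointers the values, for example `m.add("cuckoo", 17)` followed by `m.get("cuckoo")`. A lookup inspects at most two buckets regardless of table size, which is the property that makes the scheme attractive when each bucket access corresponds to a block read.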
Retrieval—Search Process
Abstract
Existing query-log-based on-line index maintenance approaches rely on the frequency distribution of terms in a static query log. Although these approaches have proved efficient, in the real world the frequency distribution of terms changes over time. This negatively affects the efficiency of the static query-log-based approaches. To overcome this problem, we propose an index tuning strategy for reorganizing the indexes according to the latest frequency distribution of the terms captured from query logs. Experimental results show that the proposed tuning strategy improves the performance of static query-log-based approaches.
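One way to picture such a tuning step is sketched below. This is an assumption-laden illustration rather than the strategy from the paper: the functions `retune` and `move_lists`, the `top_k` cutoff, and the two-dictionary index layout are all hypothetical.

```python
from collections import Counter


def retune(frequent, recent_queries, top_k):
    """Recompute the frequent-term set from a recent query-log window.

    Returns the new frequent set plus the terms to promote and demote.
    """
    counts = Counter(t for q in recent_queries for t in q.split())
    latest = {t for t, _ in counts.most_common(top_k)}
    promote = latest - frequent  # newly frequent terms
    demote = frequent - latest   # terms that fell out of favour
    return latest, promote, demote


def move_lists(index, promote, demote):
    """Move inverted lists between the two partitions of a toy index.

    index is a dict with "frequent" and "infrequent" sub-dicts (term -> postings).
    """
    for term in promote:
        if term in index["infrequent"]:
            index["frequent"][term] = index["infrequent"].pop(term)
    for term in demote:
        if term in index["frequent"]:
            index["infrequent"][term] = index["frequent"].pop(term)
```

Only the terms whose classification changed are moved, so the cost of a tuning pass in this sketch grows with the amount of drift in the query log rather than with the size of the index.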

