Inverted files for text search engines
ACM Computing Surveys, 2006
Cited by 136 (2 self)
The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.
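As a concrete illustration of a core inverted file, the sketch below builds document-ordered postings lists of (document number, term frequency) pairs and compresses them with d-gaps and variable-byte coding, two representations commonly discussed in this literature. It is a minimal sketch, not the tutorial's implementation; the function names, the whitespace tokenizer, and the choice of variable-byte coding are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to a document-ordered list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():  # naive whitespace tokenizer (assumption)
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(postings.items()) for term, postings in index.items()}


def vbyte_encode(n):
    """Variable-byte code: 7 data bits per byte, high bit set on the final byte."""
    out = []
    while True:
        out.insert(0, n % 128)
        if n < 128:
            break
        n //= 128
    out[-1] += 128  # mark the last byte of this number
    return bytes(out)


def compress_postings(postings):
    """Store gaps between successive doc ids (d-gaps) plus frequencies, vbyte-coded."""
    prev = 0
    out = bytearray()
    for doc_id, tf in postings:
        out += vbyte_encode(doc_id - prev)
        out += vbyte_encode(tf)
        prev = doc_id
    return bytes(out)


def decompress_postings(data):
    """Invert compress_postings: decode the numbers, then turn gaps back into doc ids."""
    numbers, n = [], 0
    for b in data:
        if b < 128:
            n = n * 128 + b
        else:
            numbers.append(n * 128 + (b - 128))
            n = 0
    postings, doc_id = [], 0
    for gap, tf in zip(numbers[0::2], numbers[1::2]):
        doc_id += gap
        postings.append((doc_id, tf))
    return postings


docs = ["the cat sat", "the dog sat on the mat", "a cat and a dog"]
index = build_index(docs)
print(index["the"])                          # [(0, 1), (1, 2)]
print(len(compress_postings(index["the"])))  # 4 bytes
```

Small gaps dominate in long postings lists, which is why gap encoding followed by a byte- or bit-aligned integer code keeps lists compact while still decoding quickly.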
Fast generation of result snippets in web search
In Kraaij et al.
Cited by 21 (4 self)
The presentation of query-biased document snippets as part of the results pages returned by search engines has become an expectation of search engine users. In this paper we explore the algorithms and data structures required as part of a search engine to allow efficient generation of query-biased snippets. We begin by proposing and analysing a document compression method that reduces snippet generation time by 58% over a baseline using the zlib compression library. These experiments reveal that fetching documents from secondary storage dominates the total cost of generating snippets, and so caching documents in RAM is essential for a fast snippet generation process. Using simulation, we examine snippet generation performance for RAM caches of different sizes. Finally, we propose and analyse document reordering and compaction, revealing a scheme that increases the number of document cache hits with only a marginal effect on snippet quality. This scheme effectively doubles the number of documents that can fit in a fixed-size cache.
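The abstract's point that document fetches dominate cost, and that cache size matters, can be explored with a small simulation of the kind it describes. The sketch below measures the hit rate of an in-memory document cache over a stream of document requests. The least-recently-used eviction policy, the class and function names, and the stand-in `fetch` function are all assumptions for illustration; the paper's actual cache policy is not stated here.

```python
from collections import OrderedDict


class LRUDocumentCache:
    """Keep up to `capacity` documents in RAM, evicting the least recently used."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch  # hypothetical function that reads a document from disk
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, doc_id):
        if doc_id in self.store:
            self.hits += 1
            self.store.move_to_end(doc_id)  # mark as most recently used
            return self.store[doc_id]
        self.misses += 1
        doc = self.fetch(doc_id)  # the expensive secondary-storage access
        self.store[doc_id] = doc
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return doc


def hit_rate(requests, capacity):
    """Fraction of document requests served from the cache for a given capacity."""
    cache = LRUDocumentCache(capacity, fetch=lambda d: f"document {d}")
    for doc_id in requests:
        cache.get(doc_id)
    return cache.hits / len(requests)


requests = [1, 2, 1, 3, 1, 2]
for size in (1, 2, 3):
    print(size, hit_rate(requests, size))
```

In a real system the request stream would come from the documents ranked highly for logged queries; a compaction scheme like the one described would raise the effective capacity for the same amount of RAM.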
Techniques for efficient query expansion
Proc. String Processing and Information Retrieval Symp., 2004
Cited by 13 (4 self)
Query expansion is a well-known method for improving average effectiveness in information retrieval. However, the most effective query expansion methods rely on costly retrieval and processing of feedback documents. We explore alternative methods for reducing query-evaluation costs, and propose a new method based on keeping a brief summary of each document in memory. This method allows query expansion to proceed three times faster than previously, while approximating the effectiveness of standard expansion.
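The idea of expanding from in-memory summaries rather than full feedback documents can be sketched as follows. This is a minimal illustration under stated assumptions: `summaries` is a hypothetical dictionary from document id to the terms of that document's brief summary, and candidate terms are scored simply by how many of the top-ranked summaries contain them. The paper's own term-selection function is not given here and is likely more refined.

```python
from collections import Counter


def expand_query(query_terms, ranked_doc_ids, summaries, num_docs=10, num_terms=5):
    """Pseudo-relevance feedback that reads only in-memory document summaries.

    summaries: hypothetical dict doc_id -> list of summary terms (assumption).
    Scoring by summary document frequency is an illustrative stand-in.
    """
    counts = Counter()
    for doc_id in ranked_doc_ids[:num_docs]:
        # Count each term once per summary in the feedback set.
        counts.update(set(summaries.get(doc_id, [])))
    for term in query_terms:
        counts.pop(term, None)  # do not re-add original query terms
    # Highest count first; ties broken alphabetically for a deterministic result.
    best = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:num_terms]
    return list(query_terms) + [term for term, _ in best]


summaries = {
    1: ["jaguar", "car", "engine"],
    2: ["jaguar", "car", "speed"],
    3: ["jaguar", "cat", "jungle"],
}
print(expand_query(["jaguar"], [1, 2, 3], summaries, num_docs=2, num_terms=2))
```

The saving comes from never touching the full feedback documents on disk: the initial ranking supplies document ids, and everything else is a memory lookup.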
Efficient online index maintenance for contiguous inverted lists
Inf. Process. Manage.
Cited by 6 (0 self)
Search engines and other text retrieval systems use high-performance inverted indexes to provide efficient text query evaluation. Algorithms for fast query evaluation and index construction are well known, but relatively little has been published concerning update. In this paper, we experimentally evaluate the two main alternative strategies for index maintenance in the presence of insertions, with the constraint that inverted lists remain contiguous on disk for fast query evaluation. The in-place and re-merge strategies are benchmarked against the baseline of a complete re-build. Our experiments with large volumes of web data show that re-merge is the fastest approach if large buffers are available, but that even a simple implementation of in-place update is suitable when the rate of insertion is low or memory buffer size is limited. We also show that with careful design of implementation aspects such as free-space management, in-place update can be improved by around an order of magnitude over a naïve implementation. Keywords: Text indexing, search engines, index construction, index update. This paper incorporates and extends material from “In-place versus re-build versus re-merge: Index maintenance
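The re-merge strategy can be pictured as a single sequential pass that combines the existing on-disk lists with postings buffered in memory for newly inserted documents, writing each merged list out contiguously. The sketch below models both sides as in-memory dictionaries of sorted document-id lists; the function name and data layout are assumptions, and real implementations stream both inputs from and to disk.

```python
import heapq


def remerge(on_disk, buffer):
    """Merge buffered postings for new documents into the existing index.

    on_disk, buffer: dict term -> sorted list of doc ids (illustrative layout).
    Terms are visited in sorted order, mimicking a sequential pass over a
    lexicographically ordered on-disk index; every merged list is rebuilt
    whole, so it can be written back contiguously.
    """
    merged = {}
    for term in sorted(set(on_disk) | set(buffer)):
        merged[term] = list(heapq.merge(on_disk.get(term, []), buffer.get(term, [])))
    return merged


index = {"cat": [1, 4], "dog": [2]}
new_postings = {"cat": [7], "emu": [8]}
print(remerge(index, new_postings))
```

Because every list is rewritten on each merge, the cost of re-merge is amortised best over large buffers, which agrees with the abstract's finding; in-place update instead appends to a list where free space allows and relocates it otherwise.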
RMIT University at TREC 2005: Terabyte and Robust Track
Proc. Text Retrieval Conf. (TREC), Gaithersburg, MD, November 2005. National Institute of Standards and Technology.
Cited by 4 (2 self)
In this paper we outline our approaches and experiments in both tracks and discuss our results.
Valence Topological Charge-Transfer Indices for Dipole Moments
J. Mol. Struct. (Theochem), 2003
Cited by 2 (1 self)
Valence topological charge-transfer (CT) indices are applied to the calculation of dipole moments. The algebraic and vector semisum CT indices are defined. The combination of CT indices allows the estimation of the dipole moments. The model is generalized for molecules with heteroatoms. The ability of the indices for the description
Efficient Query Expansion with Auxiliary Data Structures
2005
Cited by 2 (0 self)
Query expansion is a well-known method for improving average effectiveness in information retrieval. The most effective query expansion methods rely on retrieving documents that are used as a source of expansion terms, and retrieving those documents is costly. We examine the bottlenecks of a conventional approach and investigate alternative methods aimed at reducing query evaluation time. We propose a new method that draws candidate terms from brief summaries, held in memory, of each document. While approximately maintaining the effectiveness of the conventional approach, this method reduces the time required for query expansion by a factor of five to ten.
Chapter 4 Generation of QSAR Sets Using a Self-Organizing Map
As mentioned in Chapter 2, self-organizing maps (SOMs) are a class of unsupervised neural networks whose characteristic feature is their ability to map nonlinear relations in multi-dimensional datasets onto easily visualizable two-dimensional grids of neurons. SOMs are also referred to as self-organized topological feature maps since the
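The mapping described above can be made concrete with a small self-organizing map. The sketch below is a minimal pure-Python version of the standard Kohonen training loop: each input is assigned to its best-matching neuron, and that neuron and its grid neighbours are pulled toward the input. The Gaussian neighbourhood, the linearly decaying learning rate and radius, and all parameter names are assumptions for illustration, not the settings used in this chapter.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, radius0=None, seed=0):
    """Train a rows x cols SOM on a list of equal-length vectors with entries in [0, 1]."""
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = radius0 or max(rows, cols) / 2
    # One weight vector per neuron on the 2-D grid, initialised randomly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)   # neighbourhood shrinks, with a floor
        for x in rng.sample(data, len(data)):     # visit inputs in a random order
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in input space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


data = [[0.0, 0.0], [0.05, 0.0], [0.0, 0.05], [1.0, 1.0], [0.95, 1.0], [1.0, 0.95]]
som = train_som(data, 2, 2)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

After training, similar inputs land on the same or neighbouring grid cells, which is what makes the resulting two-dimensional map useful for selecting representative compounds when building training and test sets.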
Document Priors for Query Prediction
2005
… the quality of a set of predictions is the area-under-the-curve metric, which measures the quality of the prediction by the difference in average precision between the 25 topics predicted to be worst for a run and the 25 actually worst-performing topics. We use this metric on the TREC 2005 Robust topics and the Aquaint collection.
Results: The figure illustrates the performance of our five techniques using the area-under-the-curve approach. Each line shows the mean average precision for the remaining TREC queries after removing the worst x queries. The best possible prediction for this run is shown as the optimal curve. The area between the optimal curve and each of the other curves shows the gap between that approach and the optimal prediction.
[Figure: MAP of remaining documents (y-axis, 0.15 to 0.25) against remaining 50 − x queries (x-axis, x from 0 to 25), for the optimal curve and the runs as-a-1000, as-k-1000, as-k-50, as-p-1000 and dl-k-1000.]
Conclusions: We explore a novel approach to query difficulty prediction and propose five metrics to determine
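The evaluation described above can be sketched as follows: for each x, remove the x queries judged worst and compute the MAP of those that remain, once using the true per-query average precision (the optimal curve) and once using a predictor's ranking; the gap is the area between the two curves. This is a minimal sketch under assumptions: `predicted_difficulty` is a hypothetical score where higher means harder, and the area is approximated by summing the per-x differences. The exact computation used in the paper is not given in this excerpt.

```python
def remaining_map_curve(ap, order, max_removed):
    """MAP of the queries left after removing the first x queries of `order`, for x = 0..max_removed."""
    curve = []
    for x in range(max_removed + 1):
        remaining = [ap[q] for q in order[x:]]
        curve.append(sum(remaining) / len(remaining))
    return curve


def prediction_gap(ap, predicted_difficulty, max_removed=25):
    """Area between the optimal curve and the curve induced by a difficulty predictor.

    ap: dict query -> average precision of the run.
    predicted_difficulty: hypothetical dict query -> score, higher = predicted harder.
    """
    if max_removed >= len(ap):
        raise ValueError("max_removed must be smaller than the number of queries")
    optimal_order = sorted(ap, key=lambda q: ap[q])                       # truly worst first
    predicted_order = sorted(ap, key=lambda q: -predicted_difficulty[q])  # predicted worst first
    optimal = remaining_map_curve(ap, optimal_order, max_removed)
    predicted = remaining_map_curve(ap, predicted_order, max_removed)
    # Discrete approximation of the area between the two curves; 0 means a perfect prediction.
    return sum(o - p for o, p in zip(optimal, predicted))


ap = {"q1": 0.1, "q2": 0.5, "q3": 0.9}
print(prediction_gap(ap, {"q1": 3, "q2": 2, "q3": 1}, max_removed=2))  # perfect prediction
print(prediction_gap(ap, {"q1": 1, "q2": 2, "q3": 3}, max_removed=2))  # reversed prediction
```

With 50 TREC 2005 Robust topics, `max_removed=25` corresponds to the x-axis range in the figure.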
Efficient Query Evaluation Through Access-Reordering
Reorganising the index of a search engine based on access frequencies can significantly reduce query evaluation time while maintaining search effectiveness. In this paper we extend access-ordering and introduce a variant index organisation technique that we label access-reordering. We show that by access-reordering an inverted index, query evaluation time can be reduced by as much as 62% over the standard approach, while yielding effectiveness results highly similar to those obtained with a conventional index. Keywords: search engines, index organisation, efficiency, access-ordering.
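One plausible reading of reorganising an index by access frequency is to renumber documents so that those most often returned for past queries receive the smallest document numbers, then re-sort every postings list. Query evaluation that processes only a prefix of each list then sees the most-accessed documents first. The sketch below shows that renumbering step only; the access-log format, the function name, and the tie-breaking rule are assumptions, and the paper's access-reordering may differ in detail.

```python
from collections import Counter


def access_reorder(index, access_log):
    """Renumber documents by decreasing access frequency and re-sort every postings list.

    index: dict term -> list of (doc_id, tf) postings.
    access_log: iterable of doc ids returned for past queries (hypothetical format).
    Returns the reordered index and the old-id -> new-id mapping.
    """
    counts = Counter(access_log)
    all_docs = {doc for postings in index.values() for doc, _ in postings}
    # Most frequently accessed documents get the smallest new ids; ties keep original order.
    ranking = sorted(all_docs, key=lambda doc: (-counts[doc], doc))
    new_id = {old: new for new, old in enumerate(ranking)}
    reordered = {
        term: sorted((new_id[doc], tf) for doc, tf in postings)
        for term, postings in index.items()
    }
    return reordered, new_id


index = {"cat": [(0, 1), (1, 2), (2, 1)], "dog": [(1, 1), (2, 3)]}
reordered, mapping = access_reorder(index, access_log=[2, 2, 1])
print(mapping)           # {2: 0, 1: 1, 0: 2}
print(reordered["dog"])  # [(0, 3), (1, 1)]
```

Keeping each list in document-number order after renumbering preserves gap-based compression, which is one reason to remap ids globally rather than reorder postings independently within each list.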

