ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
- In WebDB, 2003
Cited by 86 (3 self)
this paper appears in [15], and updated information is available at http://cis.poly.edu/westlab/odissea/
Static Index Pruning for Information Retrieval Systems
- In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001
Cited by 64 (3 self)
We introduce static index pruning methods that significantly reduce the index size in information retrieval systems. We investigate uniform and term-based methods that each remove selected entries from the index and yet have only a minor effect on retrieval results. In uniform pruning, there is a fixed cutoff threshold, and all index entries whose contribution to relevance scores is bounded above by a given threshold are removed from the index. In term-based pruning, the cutoff threshold is determined for each term, and thus may vary from term to term. We give experimental evidence that for each level of compression, term-based pruning outperforms uniform pruning, under various measures of precision. We present theoretical and experimental evidence that under our term-based pruning scheme, it is possible to prune the index greatly and still get retrieval results that are almost as good as those based on the full index. Topic areas: indexing, compression
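The difference between the two pruning styles described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the index layout, the function names, and in particular the rule that sets each term's cutoff to a fraction of that term's highest score are assumptions made only for illustration.

```python
from typing import Dict, List, Tuple

Posting = Tuple[int, float]        # (doc_id, score contribution); hypothetical layout
Index = Dict[str, List[Posting]]   # term -> posting list


def prune_uniform(index: Index, cutoff: float) -> Index:
    """Uniform pruning: one fixed cutoff for every term.

    Postings whose score contribution is at most `cutoff` are dropped.
    """
    return {
        term: [(doc, score) for doc, score in postings if score > cutoff]
        for term, postings in index.items()
    }


def prune_term_based(index: Index, fraction: float) -> Index:
    """Term-based pruning: the cutoff is chosen separately for each term.

    Assumption for this sketch: the cutoff is `fraction` times the term's
    highest score contribution. The abstract does not specify the rule.
    """
    pruned: Index = {}
    for term, postings in index.items():
        top = max((score for _, score in postings), default=0.0)
        cutoff = fraction * top
        pruned[term] = [(doc, score) for doc, score in postings if score > cutoff]
    return pruned


index = {
    "the": [(1, 0.2), (2, 0.1), (3, 0.3)],
    "pruning": [(1, 2.0), (3, 0.5)],
}
uniform = prune_uniform(index, cutoff=0.3)
term_based = prune_term_based(index, fraction=0.5)
```

In this toy index the uniform cutoff empties the posting list of the low-scoring term "the", while the per-term cutoff keeps that term's strongest postings and trims the weaker posting of "pruning" instead.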
Indri: a language-model based search engine for complex queries
- In Proceedings of the International Conference on Intelligent Analysis, 2005
Cited by 47 (2 self)
Search engines are a critical tool for intelligence analysis. A number of innovations for search have been introduced since research with an emphasis on analyst needs began in the TIPSTER project. For example, the Inquery search engine introduced support for specification of complex queries in a probabilistic inference network framework. Recent research on language modeling has led to the development of Indri, a search engine that combines the best features of inference nets and language modeling in an architecture designed for large-scale applications. In this paper, we describe the Indri system and show how the query language is designed to support modern language technologies. We also present results demonstrating that Indri is both effective and efficient.
Optimized Query Execution in Large Search Engines with Global Page Ordering
- 2003
Cited by 45 (7 self)
Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists.
Tree Pattern Relaxation
- 2002
Cited by 45 (5 self)
Tree patterns are fundamental to querying tree-structured data like XML. Because of the heterogeneity of XML data, it is often more appropriate to permit approximate query matching and return ranked answers, in the spirit of Information Retrieval, than to return only exact answers. In this paper, we study the problem of approximate XML query matching, based on tree pattern relaxations, and devise efficient algorithms to evaluate relaxed tree patterns. We consider weighted tree patterns, where exact and relaxed weights, associated with nodes and edges of the tree pattern, are used to compute the scores of query answers. We are
Efficient Passage Ranking for Document Databases
- ACM Transactions on Information Systems, 1999
Cited by 39 (5 self)
Queries to text collections are resolved by ranking the documents in the collection and returning the highest-scoring documents to the user. An alternative retrieval method is to rank passages, that is, short fragments of documents, a strategy that can improve effectiveness and identify relevant material in documents that are too large for users to consider as a whole. However, ranking of passages can considerably increase retrieval costs. In this paper we explore alternative query evaluation techniques, and develop new techniques for evaluating queries on passages. We show experimentally that, appropriately implemented, effective passage retrieval is practical in limited memory on a desktop machine. Compared to passage ranking with adaptations of current document ranking algorithms, our new "DO-TOS" passage ranking algorithm requires only a fraction of the resources, at the cost of a small loss of effectiveness.
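As a rough illustration of what ranking passages rather than whole documents means, the following Python sketch scores fixed-length word windows by how many query terms they contain. It is an exhaustive baseline, not the "DO-TOS" algorithm from the paper; the window size, the scoring rule, and all names are assumptions.

```python
from typing import Dict, Iterable, List, Tuple


def rank_passages(
    docs: Dict[str, str],
    query_terms: Iterable[str],
    window: int = 50,
    stride: int = 50,
    top_k: int = 3,
) -> List[Tuple[int, str, int, str]]:
    """Return the top_k passages as (score, doc_id, start_word, passage_text).

    A passage is `window` consecutive words; the score is the number of
    words in the passage that are query terms (a deliberately naive rule).
    """
    terms = {t.lower() for t in query_terms}
    scored = []
    for doc_id, text in docs.items():
        words = text.lower().split()
        for start in range(0, len(words), stride):
            passage = words[start:start + window]
            score = sum(1 for w in passage if w in terms)
            if score > 0:
                scored.append((score, doc_id, start, " ".join(passage)))
    # Highest score first; ties broken by document id, then position.
    scored.sort(key=lambda r: (-r[0], r[1], r[2]))
    return scored[:top_k]


docs = {
    "d1": "alpha beta gamma delta epsilon zeta eta theta",
    "d2": "gamma gamma alpha",
}
ranked = rank_passages(docs, ["gamma", "alpha"], window=4, stride=4)
```

Every passage of every document is scored here, which is exactly the extra retrieval cost the abstract warns about.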
Pruned query evaluation using pre-computed impacts
- In SIGIR, 2006
Cited by 31 (0 self)
Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably “good enough” set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, reduce the amount of memory required for accumulators, reduce the amount of data transferred from disk, and at the same time allow performance guarantees in terms of precision and mean average precision. These strong claims are backed by experiments using the TREC Terabyte collection and queries. Categories and Subject Descriptors H.3.1 [Information Storage and Retrieval]: Content analysis and indexing – indexing methods; H.3.2 [Information Storage and Retrieval]:
Approximating Matrix Multiplication for Pattern Recognition Tasks
- In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 1997
Cited by 27 (0 self)
Many pattern recognition tasks, including estimation, classification, and the finding of similar objects, make use of linear models. The fundamental operation in such tasks is the computation of the dot product between a query vector and a large database of instance vectors. Often we are interested primarily in those instance vectors which have high dot products with the query. We present a random sampling based algorithm that enables us to identify, for any given query vector, those instance vectors which have large dot products, while avoiding explicit computation of all dot products. We provide experimental results that demonstrate considerable speedups for text retrieval tasks. 1 Introduction: In pattern recognition tasks, a database of instances to be processed (images, signals, documents, ...) is commonly represented as a set of vectors $x_1, \ldots, x_n$ of numeric feature values. Examples of feature values include the number of times a word occurs in a document, the coordinates...
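One way to sample instances in proportion to their dot products, without computing those dot products, is sketched below in Python. This is a simplified illustration under stated assumptions, not necessarily the paper's exact algorithm: all values are assumed non-negative, the per-coordinate sums are computed up front, and the function name and parameters are hypothetical. A coordinate i is drawn with probability proportional to q_i times the sum of coordinate i over all instances, then an instance j is drawn with probability proportional to its value in that coordinate, so instance j is hit with probability proportional to its dot product with the query.

```python
import random
from collections import Counter
from typing import List, Sequence


def sample_high_dot_products(
    query: Sequence[float],
    instances: List[Sequence[float]],
    num_samples: int,
    seed: int = 0,
) -> Counter:
    """Count how often each instance index is sampled.

    Expected counts are proportional to query . instance, assuming all
    entries are non-negative (an assumption of this sketch).
    """
    rng = random.Random(seed)
    dims = range(len(query))
    # Per-coordinate totals over the database; done once, reusable across queries.
    col_sums = [sum(x[i] for x in instances) for i in dims]
    coord_weights = [query[i] * col_sums[i] for i in dims]
    counts: Counter = Counter()
    if sum(coord_weights) == 0:
        return counts  # every dot product is zero; nothing to sample
    rows = range(len(instances))
    for _ in range(num_samples):
        # Step 1: pick a coordinate, weighted by q_i * column sum.
        i = rng.choices(dims, weights=coord_weights)[0]
        # Step 2: pick an instance, weighted by its value in coordinate i.
        j = rng.choices(rows, weights=[x[i] for x in instances])[0]
        counts[j] += 1
    return counts


query = [1, 0, 2]
instances = [[1, 0, 0], [0, 5, 0], [0, 0, 3], [1, 1, 1]]  # dot products: 1, 0, 6, 3
counts = sample_high_dot_products(query, instances, num_samples=2000)
```

Instances that are sampled often become candidates for exact scoring; an instance whose dot product with the query is zero, such as the second one here, can never be sampled.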
Sampling Search-Engine Results
- 2005
Cited by 25 (3 self)
We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approximate algorithms for several applications, such as:
- Determining the set of categories in a given taxonomy spanned by the search results;
- Finding the range of metadata values associated with the result set in order to enable "multi-faceted search";
- Estimating the size of the result set;
- Data mining associations to the query terms.
We present
Efficient document retrieval in main memory
- In Proc. 30th ACM SIGIR, 2007
Cited by 24 (0 self)
Disk access performance is a major bottleneck in traditional information retrieval systems. Compared to system memory, disk bandwidth is poor, and seek times are worse. We circumvent this problem by considering query evaluation strategies in main memory. We show how new accumulator trimming techniques combined with inverted list skipping can produce extremely high performance retrieval systems without resorting to methods that may harm effectiveness. We evaluate our techniques using Galago, a new retrieval system designed for efficient query processing. Our system achieves a 69% improvement in query throughput over previous methods.

