Building efficient and effective metasearch engines - ACM Computing Surveys, 2002
Cited by 107 (9 self)
Frequently a user's information needs are stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool such as increasing the search coverage of the Web and improving the scalability of the search. In this article, we survey techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents to a given query. The document selection problem is to determine what documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We will also point out some problems that need to be further researched.
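The three problems named in this abstract (database selection, document selection, result merging) can be pictured as stages of one pipeline. The following is a minimal sketch, not a method from the survey: the function names, the term-overlap selection rule, the fixed per-engine cutoff, and the min-max score normalization are all assumptions chosen for illustration.

```python
def select_databases(query_terms, representatives, k):
    """Database selection (assumed rule): rank engines by how many query
    terms appear in their representative term set and keep the top k."""
    def overlap(name):
        return len(set(query_terms) & representatives[name])

    ranked = sorted(representatives, key=lambda name: (-overlap(name), name))
    # Engines sharing no term with the query are never invoked.
    return [name for name in ranked[:k] if overlap(name) > 0]


def merge_results(result_lists):
    """Result merging (assumed rule): min-max normalize each engine's
    scores to [0, 1], keep each document's best score, sort descending."""
    best = {}
    for results in result_lists:
        if not results:
            continue
        scores = [score for _, score in results]
        low, high = min(scores), max(scores)
        span = high - low
        for doc, score in results:
            norm = (score - low) / span if span else 1.0
            best[doc] = max(best.get(doc, 0.0), norm)
    # Ties are broken by document id so the output is deterministic.
    return sorted(best, key=lambda doc: (-best[doc], doc))


def metasearch(query_terms, engines, representatives, k=2, per_engine=10):
    """Run the three stages. `engines` maps an engine name to a callable
    returning a list of (doc_id, score) pairs, best first (hypothetical)."""
    selected = select_databases(query_terms, representatives, k)
    # Document selection (assumed rule): take the top `per_engine` results.
    result_lists = [engines[name](query_terms)[:per_engine] for name in selected]
    return merge_results(result_lists)
```

Each stage is deliberately naive; the techniques surveyed in the article replace these placeholder rules with more careful estimates.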
Context in Web Search - IEEE Data Engineering Bulletin, 2000
Cited by 100 (0 self)
Web search engines generally treat search requests in isolation. The results for a given query are identical, independent of the user, or the context in which the user made the request. Next-generation search engines will make increasing use of context information, either by using explicit or implicit context information from users, or by implementing additional functionality within restricted contexts. Greater use of context in web search may help increase competition and diversity on the web.
SavvySearch: A Meta-Search Engine that Learns which Search Engines to Query - AI Magazine, 1997
Cited by 84 (1 self)
Search engines are among the most successful applications on the Web today. So many search engines have been created that it is difficult for users to know where they are, how to use them and what topics they best address. Meta-search engines reduce the user burden by dispatching queries to multiple search engines in parallel. The SavvySearch meta-search engine is designed to efficiently query other search engines by carefully selecting those search engines likely to return useful results and by responding to fluctuating load demands on the Web. SavvySearch learns to identify which search engines are most appropriate for particular queries, reasons about resource demands and represents an iterative parallel search strategy as a simple plan.
1 The Application: Meta-Search on the Web
Companies, institutions and individuals must have a presence on the Web; each is vying for the attention of millions of people. Not too surprisingly then, the most successful applications on the Web to dat...
Server Selection on the World Wide Web, 2000
Cited by 66 (4 self)
We evaluate server selection methods in a Web environment, modeling a digital library which makes use of existing Web search servers rather than building its own index. The evaluation framework portrays the Web realistically in several ways. Its search servers index real Web documents, are of various sizes, cover different topic areas and employ different retrieval methods. Selection is based on statistics extracted from the results of probe queries submitted to each server. We evaluate published selection methods and a new method for enhancing selection based on expected search server effectiveness. Results show CORI to be the most effective of three published selection methods. CORI selection steadily degrades with fewer probe queries, causing a drop in early precision of as much as 0.05 (one relevant document out of 20). Modifying CORI selection based on an estimation of expected effectiveness disappointingly yields no significant improvement in effectiveness. However, modifying COR...
Merging Results From Isolated Search Engines, 1999
Cited by 38 (3 self)
Two new techniques for merging search results are introduced: Feature Distance ranking algorithms and Reference Statistics. These techniques are compared with other published methods, using TREC effectiveness evaluations based on human relevance judgements and input rankings from 5 different search engines over 5 disjoint document collections. The new techniques are found to be more effective than existing methods in an isolated-server environment such as the World Wide Web. In addition, Feature Distance algorithms are found to be as effective in an isolated-server environment using Reference Statistics as they are in an integrated-server environment.
Supporting Dynamic Interactions among Web-Based Information Sources - IEEE Transactions on Knowledge and Data Engineering, 2000
Web Metasearch: Rank vs. Score Based Rank Aggregation Methods, 2003
Cited by 21 (3 self)
Given a set of rankings, the task of ranking fusion is to combine these lists so as to optimize the performance of the combination. The ranking fusion problem is encountered in many situations; metasearch is a prominent example. It deals with the problem of combining the result lists returned by multiple search engines in response to a given query, where each item in a result list is ordered with respect to a search engine and a relevance score. Several ranking fusion methods have been proposed in the literature. They can be classified based on whether: (i) they rely on the rank; (ii) they rely on the score; and (iii) they require training data or not. Our paper will make the following contributions: (i) we will report experimental results for the Markov chain rank based methods, for which no large experimental tests have yet been made; (ii) while it is believed that the rank based method, named Borda Count, is competitive with score based methods, we will show that this is not true for metasearch; and (iii) we will show that Markov chain based methods compete with score based methods. This is especially important in the context of metasearch as scores are usually not available from the search engines.
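For readers unfamiliar with the Borda Count mentioned in this abstract, the following is a minimal sketch of one common formulation of that rank-based fusion rule. It is not taken from the paper: the function name `borda_count`, the handling of items missing from a list (they split the leftover points evenly), and the alphabetical tie-break are assumptions made for illustration. Each input list is assumed to contain no duplicates.

```python
from collections import defaultdict


def borda_count(rankings):
    """Fuse ranked lists (best item first) using a Borda Count.

    With c distinct items overall, the item at 0-based position p in a
    list earns c - p points from that list. Items a list omits share the
    points that list left unassigned (an assumed convention).
    """
    universe = {item for ranking in rankings for item in ranking}
    c = len(universe)
    scores = defaultdict(float)
    for ranking in rankings:
        for position, item in enumerate(ranking):
            scores[item] += c - position
        missing = universe - set(ranking)
        if missing:
            # Points 1 .. c - len(ranking) were not handed out by this list.
            leftover = sum(range(1, c - len(ranking) + 1))
            share = leftover / len(missing)
            for item in missing:
                scores[item] += share
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(universe, key=lambda item: (-scores[item], item))
```

Only positions are used, never engine scores, which is why a rule of this kind applies even when search engines do not expose relevance scores.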
Towards a Highly-Scalable and Effective Metasearch Engine, 2001
Cited by 20 (2 self)
A metasearch engine is a system that supports unified access to multiple local search engines. Database selection is one of the main challenges in building a large-scale metasearch engine. The problem is to efficiently and accurately determine a small number of potentially useful local search engines to invoke for each user query. In order to enable accurate selection, metadata that reflect the contents of each search engine need to be collected and used. In this paper, we propose a highly scalable and accurate database selection method. This method has several novel features. First, the metadata for representing the contents of all search engines are organized into a single integrated representative. Such a representative yields both computation efficiency and storage efficiency. Second, our selection method is based on a theory for ranking search engines optimally. Experimental results indicate that this new method is very effective. An operational prototype system has been built based on the proposed approach.
Detection of Heterogeneities in a Multiple Text Database Environment - In Proceedings of the Fourth IFCIS International Conference on Cooperative Information Systems, 1999
Cited by 20 (7 self)
As the number of text retrieval systems (search engines) grows rapidly on the World Wide Web, there is an increasing need to build search brokers (metasearch engines) on top of them. Often, the task of building an effective and efficient metasearch engine is hindered by the heterogeneities among the underlying local search engines. In this paper, we first analyze the impact of various heterogeneities on building a metasearch engine. We then present some techniques that can be used to detect the most prominent heterogeneities among multiple search engines. Applications that use the detected heterogeneities to build better metasearch engines will be provided.
Finding the Most Similar Documents across Multiple Text Databases, 1999
Cited by 19 (12 self)
In this paper, we present a methodology for finding the n most similar documents across multiple text databases for any given query and for any positive integer n. This methodology consists of two steps. First, databases are ranked in a certain order. Next, documents are retrieved from the databases according to the order and in a particular way. If the databases containing the n most similar documents for a given query can be ranked ahead of other databases, the methodology will guarantee the retrieval of the n most similar documents for the query. A statistical method is provided to identify databases, each of which is estimated to contain at least one of the n most similar documents. Then, a number of strategies are presented to retrieve documents from the identified databases. Experimental results are given to illustrate the relative performance of different strategies.
1 Introduction
The Internet has become a vast information source in recent years and can be considered as the w...

