Results 1 - 6 of 6
The MetaCrawler Architecture for Resource Aggregation on the Web
- IEEE Expert, 1997
Cited by 155 (1 self)
The MetaCrawler Softbot is a parallel Web search service that has been available at the University of Washington since June of 1995. It provides users with a single interface with which they can query popular general-purpose Web search services, such as Lycos[6] and AltaVista[1], and has some sophisticated features that allow it to obtain results of much higher quality than simply regurgitating the output from each search
HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering
- Proceedings of the Seventh ACM Conference on Hypertext, 1996
Cited by 88 (2 self)
HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link clustering algorithm is based on the semantic information embedded in hyperlink structures and document contents. HyPursuit admits multiple, coexisting cluster hierarchies based on different principles for grouping documents, such as the Library of Congress catalog scheme and automatically created hypertext clusters. HyPursuit's abstraction functions summarize cluster contents to support scalable query processing. The abstraction functions satisfy system resource limitations with controlled information loss. The result of query processing operations on a cluster summary approximates the result of performing the operations on the entire information space. We constructed a prototype system comprising 100 leaf World Wide Web sites and a hierarchy of 42 servers that route queries to the leaf sites. Experience with our system suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies. We are also encouraged by preliminary results on clustering based on both document contents and hyperlink structures.
Fast and effective query refinement
- In Proc. of the 20th Intl. ACM SIGIR Conf. on Research and Development in Information Retrieval, 1997
Cited by 25 (1 self)
Query Refinement is an essential information retrieval tool that interactively recommends new terms related to a particular query. This paper introduces concept recall, an experimental measure of an algorithm's ability to suggest terms humans have judged to be semantically related to an information need. This study uses precision improvement experiments to measure the ability of an algorithm to produce single term query modifications that predict a user's information need as partially encoded by the query. An oracle algorithm produces ideal query modifications, providing a meaningful context for interpreting precision improvement results. This study also introduces RMAP, a fast and practical query refinement algorithm that refines multiple term queries by dynamically combining precomputed suggestions for single term queries. RMAP achieves accuracy comparable to a much slower algorithm, although both RMAP and the slower algorithm lag behind the best possible term suggestions offered by the oracle. We believe RMAP is fast enough to be integrated into present-day Internet search engines: RMAP computes 100 term suggestions for a 160,000 document collection in 15 ms on a low-end PC.
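The idea of combining precomputed single-term suggestions at query time can be illustrated with a minimal sketch. This is not the RMAP algorithm itself, whose scoring the abstract does not specify; it assumes a hypothetical table mapping each term to weighted suggestions and simply sums the weights across the query terms. The table contents, the integer weights, and the function name refine are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical precomputed table: term -> list of (suggested term, weight).
# In a real system this would be built offline from the document collection.
PRECOMPUTED = {
    "jaguar": [("cat", 9), ("car", 8), ("speed", 3)],
    "engine": [("car", 7), ("speed", 5), ("search", 4)],
}


def refine(query_terms, table, k=3):
    """Merge the per-term suggestion lists and return the top-k suggestions.

    Assumption: scores are combined by summing weights; the paper may
    combine them differently.
    """
    scores = defaultdict(int)
    for term in query_terms:
        for suggestion, weight in table.get(term, []):
            if suggestion not in query_terms:  # never suggest a term already in the query
                scores[suggestion] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [suggestion for suggestion, _ in ranked[:k]]


print(refine(["jaguar", "engine"], PRECOMPUTED))  # ['car', 'cat', 'speed']
```

Because the per-term lists are computed ahead of time, the work at query time is limited to a merge over a few short lists, which is consistent with the low latency the abstract reports.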
Content Routing: A Scalable Architecture for Network-Based Information Discovery
- 1996
Cited by 17 (1 self)
This thesis presents a new architecture for information discovery based on a hierarchy of content routers that provide both browsing and search services to end users. Content routers catalog information servers, which may in turn be other content routers. The resulting hierarchy of content routers and leaf servers provides a rich set of services to end users for locating information, including query refinement and query routing. Query refinement helps a user improve a query fragment to describe the user's interests more precisely. Once a query has been refined and describes a manageable result set, query routing automatically forwards the query to relevant servers. These services make use of succinct descriptions of server contents called content labels. A unique contribution of this research is the demonstration of a scalable discovery architecture based on a hierarchical approach to routing.
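Query routing over a hierarchy of content routers can be sketched as a recursive descent that prunes any subtree whose content label does not cover the query. The sketch below is an assumption-laden illustration, not the thesis's implementation: it models a content label as a plain set of terms, treats queries as conjunctions of terms, and uses hypothetical names (Node, route, leaf-a, leaf-b).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A content router or, when it has no children, a leaf information server."""
    name: str
    label: set                                    # content label, modeled here as a set of terms
    children: list = field(default_factory=list)  # empty for a leaf server


def route(node, query_terms):
    """Return the names of leaf servers whose labels cover every query term."""
    if not set(query_terms) <= node.label:
        return []                                 # label rules out this whole subtree
    if not node.children:
        return [node.name]                        # relevant leaf server
    hits = []
    for child in node.children:
        hits.extend(route(child, query_terms))
    return hits


# Hypothetical two-leaf hierarchy; the router's label is the union of its children's.
leaf_a = Node("leaf-a", {"web", "search"})
leaf_b = Node("leaf-b", {"hypertext", "clustering"})
root = Node("root", leaf_a.label | leaf_b.label, [leaf_a, leaf_b])

print(route(root, ["search"]))                # ['leaf-a']
print(route(root, ["search", "clustering"]))  # []
```

The second query passes the root, whose label contains both terms, but reaches no leaf because neither child covers the full conjunction. A summary label can admit queries that no single server below it can answer, so each level has to re-check the query against its own label.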
Query Routing in Large-scale Digital Library Systems
- In Proceedings of the 15th International Conference on Data Engineering (ICDE), IEEE, 1997
Cited by 13 (2 self)
Modern digital libraries require user-friendly and yet responsive access to the rapidly growing, heterogeneous, and distributed collection of information sources. However, the increasing volume and diversity of digital information available online have led to a growing problem that conventional data management systems do not have, namely finding which information sources out of many candidate choices are the most relevant to answer a given user query. We refer to this problem as the query routing problem.
The MetaCrawler Architecture for Resource Aggregation on the Web
- IEEE Expert, 1997
Cited by 1 (0 self)
In this article, we briefly outline the motivation for MetaCrawler and highlight previous work, and then discuss the architecture of MetaCrawler and how it enables MetaCrawler to perform well and to scale and adapt to a dynamic Internet.

