Precision and recall of Gloss estimators for database discovery (1994)

by L Gravano
Venue: In PDIS’94

Results 1 - 10 of 22

Routing indices for peer-to-peer systems

by Arturo Crespo, Hector Garcia-Molina, 2002
Cited by 313 (12 self)
Finding information in a peer-to-peer system currently requires either a costly and vulnerable central index, or flooding the network with queries. In this paper we introduce the concept of Routing Indices (RIs), which allow nodes to forward queries to neighbors that are more likely to have answers. If a node cannot answer a query, it forwards the query to a subset of its neighbors, based on its local RI, rather than by selecting neighbors at random or by flooding the network by forwarding the query to all neighbors. We present three RI schemes: the compound, the hop-count, and the exponential routing indices. We evaluate their performance via simulations, and find that RIs can improve performance by one or two orders of magnitude vs. a flooding-based system, and by up to 100% vs. a random forwarding system. We also discuss the tradeoffs between the different RI schemes and highlight the effects of key design variables on system performance.
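
The forwarding rule in this abstract (send a query to the neighbors most likely to hold answers, rather than to random neighbors or to all of them) can be illustrated with a minimal sketch. It assumes a hypothetical per-neighbor table of document counts by topic and ranks neighbors by the summed count for the query's topics. The class name, method names, and scoring rule are illustrative assumptions; this is not the paper's compound, hop-count, or exponential scheme.

```python
from collections import defaultdict


class RoutingIndex:
    """Toy routing index: per-neighbor document counts by topic (hypothetical layout)."""

    def __init__(self):
        # neighbor id -> topic -> number of documents reachable via that neighbor
        self.counts = defaultdict(lambda: defaultdict(int))

    def add(self, neighbor, topic, n_docs):
        self.counts[neighbor][topic] += n_docs

    def rank_neighbors(self, query_topics, k=1):
        """Return up to k neighbors with the highest summed count for the query topics."""
        scores = {
            neighbor: sum(topics.get(t, 0) for t in query_topics)
            for neighbor, topics in self.counts.items()
        }
        # Drop neighbors with no matching documents, then sort by score (ties by id).
        candidates = [(n, s) for n, s in scores.items() if s > 0]
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        return [n for n, _ in candidates[:k]]


# Illustrative data only; the neighbor ids, topics, and counts are made up.
ri = RoutingIndex()
ri.add("B", "databases", 20)
ri.add("B", "networks", 5)
ri.add("C", "databases", 3)
ri.add("D", "theory", 40)

best = ri.rank_neighbors(["databases", "networks"], k=2)
print(best)
```

With `k` equal to the number of neighbors and no score filter, the same loop would degenerate into flooding, which is the baseline the abstract compares against.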

GlOSS: Text-Source Discovery over the Internet

by Luis Gravano, Hector Garcia-Molina, Anthony Tomasic, 1998
Cited by 177 (15 self)
Abstract not found

Query-Based Sampling of Text Databases

by Jamie Callan, Margaret Connell - ACM Transactions on Information Systems, 1999
Cited by 134 (13 self)
... This paper presents query-based sampling, a new technique for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are created, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic database selection.

Experiences with Selecting Search Engines Using Metasearch

by Daniel Dreilinger, Adele E. Howe - ACM Transactions on Information Systems, 1997
Cited by 86 (3 self)
This article describes and evaluates SavvySearch, a metasearch engine designed to intelligently select and interface with multiple remote search engines. The primary metasearch issue examined is the importance of carefully selecting and ranking remote search engines for user queries. We studied the efficacy of SavvySearch's incrementally acquired metaindex approach to selecting search engines by analyzing the effect of time and experience on performance. We also compared the metaindex approach to the simpler categorical approach and showed how much experience is required to surpass the simple scheme.

Data Structures for Efficient Broker Implementation

by Anthony Tomasic, Luis Gravano, Calvin Lue, Peter Schwarz, Laura Haas - ACM Transactions on Information Systems, 1997
Cited by 14 (4 self)
Abstract not found

Generalizing GlOSS to vector-space databases and broker hierarchies

by Luis Gravano, Hector Garcia-Molina - VLDB’95, Proceedings of 21th International Conference on Very Large Data Bases, 1995
Cited by 14 (0 self)
As large numbers of text databases have become available on the Internet, it is harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query. gGlOSS extends our previous work [1], which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations. *This research was sponsored by the Advanced Research

Towards Context-Based Search Engine Selection

by David B. Leake, Ryan Scherle
Cited by 13 (5 self)
A well-known problem for web search is targeting search on information that satisfies users' information needs. User queries tend to be short, and hence often ambiguous, which can lead to inappropriate results from general-purpose search

Query-driven document partitioning and collection selection

by Fabrizio Silvestri, Domenico Laforenza - in INFOSCALE 2006: Proceedings of the first International Conference on Scalable Information Systems, 2006
Cited by 10 (3 self)
We present a novel strategy to partition a document collection onto several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vectors model. Each document is represented as a list recording the queries for which the document itself is a match, along with their ranks. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The document clusters are then assigned to the underlying IR servers, while the query clusters represent queries that return similar results, and are used for collection selection. We show that this document partition strategy greatly boosts the performance of standard collection selection algorithms, including CORI, w.r.t. a round-robin assignment. Secondly, we show that by performing collection selection by matching the query to the existing query clusters and successively choosing only one server, we reach an average precision-at-5 of up to 1.74 and consistently improve on CORI's precision by between 11% and 15%. As a side result we show a way to select rarely asked-for documents. Separating these documents from the rest of the collection allows the indexer to produce a more compact index containing only relevant documents that are likely to be requested in the future. In our tests, around 52% of the documents (3,128,366) are not returned among the first 100 top-ranked results of any query.
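
The document representation described in this abstract (each document as a list of the queries it matches, with their ranks) can be sketched as follows. This is a minimal illustration under stated assumptions: the query log, the document ids, and the function name `build_query_vectors` are hypothetical, and the co-clustering step the abstract mentions is not shown.

```python
def build_query_vectors(results_by_query, all_docs):
    """Map each document to the (query, rank) pairs for which it was returned."""
    vectors = {doc: [] for doc in all_docs}
    for query, ranked_docs in results_by_query.items():
        for rank, doc in enumerate(ranked_docs, start=1):
            vectors.setdefault(doc, []).append((query, rank))
    return vectors


# Hypothetical query log: query -> top-ranked document ids (made-up data).
log = {
    "jaguar speed": ["d1", "d3"],
    "jaguar car": ["d2", "d1"],
}
docs = ["d1", "d2", "d3", "d4"]

vectors = build_query_vectors(log, docs)
# Documents never returned for any logged query end up with an empty vector.
never_returned = sorted(d for d, v in vectors.items() if not v)
print(vectors["d1"], never_returned)
```

Documents with an empty vector correspond to the rarely asked-for documents that the abstract proposes to separate from the rest of the collection.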

Collection Selection for Managed Distributed Document Databases

by Daryl D'Souza, James A. Thom, Justin Zobel - IN , 2004
Cited by 7 (4 self)
In a distributed document database system, a query is processed by passing it to a set of individual collections and collating the responses. For a system with many such collections, it is attractive to first identify a small subset of collections as likely to hold documents of interest before interrogating only this small subset in more detail. A method for choosing collections that has been widely investigated is the use of a selection index, which captures broad information about each collection and its documents. In this paper, we re-evaluate several techniques for collection selection. We have constructed

Selecting Task-Relevant Sources for Just-in-Time Retrieval

by David B. Leake, Ryan Scherle, Jay Budzik, Kristian Hammond - In Proceedings of the AAAI-99 Workshop on Intelligent Information Systems, Menlo Park, CA, 1999
Cited by 6 (2 self)
"Just-in-time" information systems monitor their users' tasks, anticipate task-based information needs, and proactively provide their users with relevant information. The effectiveness of such systems depends both on their capability to track user tasks and on their ability to retrieve information that satisfies task-based needs. The Watson system (Budzik et al. 1998; Budzik & Hammond 1999) provides a framework for monitoring user tasks and identifying relevant content areas, and uses this information to generate focused queries for general-purpose search engines and for specialized search engines integrated into the system. The proliferation of specialized search engines and information repositories on the Web provides a rich source of additional information pre-focused for a wide range of information needs, potentially enabling just-in-time systems to exploit that focus by querying the most relevant sources. However, putting this into practice depends on having general scalable methods f...