Harvest: A Scalable, Customizable Discovery and Access System (1995) [163 citations — 8 self]

by C. Mic Bowman (Transarc Corp), Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz

Abstract:

Rapid growth in data volume, user base, and data diversity renders Internet-accessible information increasingly difficult to use effectively. In this paper we introduce Harvest, a system that provides a set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet. The system interoperates with Mosaic and with HTTP, FTP, and Gopher information resources. We discuss the design and implementation of each subsystem, and provide measurements indicating that Harvest can significantly reduce server load, network traffic, and space requirements when building indexes, compared with previous systems. We also discuss a half dozen indexes we have built using Harvest, underscoring both the customizability and scalability of the system.

Citations

701 Scale and Performance in Distributed File Systems – Howard, Kazar, et al. - 1988
433 A hierarchical Internet object cache – Chankhunthod, Danzig, et al. - 1996
287 The vocabulary problem in human-system communication – Furnas, Landauer, et al. - 1987
249 Fast text searching allowing errors – Wu, Manber - 1992
188 The Harvest information discovery and access system – Bowman, Danzig, et al. - 1995
182 World-Wide Web: The information universe – Berners-Lee, Cailliau, et al. - 1992
170 Grapevine: An exercise in distributed computing – Birrell, Levin, et al. - 1982
170 Glimpse: a tool to search through entire file systems – Manber, Wu - 1993
164 An evaluation of retrieval effectiveness for a full-text document retrieval system – Blair, Maron
149 RFC 1321 - The MD5 Message-Digest Algorithm – Rivest - 1992
114 Scalable internet resource discovery: Research problems and approaches – Bowman, Danzig, et al. - 1994
113 An Information System for Corporate Users: Wide Area Information Servers, Thinking Machines technical report TMC-99 – Kahle - 1991
104 Archie: An electronic directory service for the Internet – Emtage, Deutsch - 1992
94 Replication and fault-tolerance in the ISIS system – Birman - 1985
93 GENVL and WWWW: Tools for Taming the Web – McBryan - 1994
80 A case for caching file objects inside internetworks – Danzig, Hall, et al. - 1993
67 A Comparison of Internet Resource Discovery Approaches – Schwartz, Emtage, et al. - 1992
54 An analysis of wide-area name server traffic: A study of the internet domain name system – Danzig, Obraczka, et al. - 1994
44 Information retrieval in the world-wide web: Making client-based searching feasible – DeBra, Post - 1994
36 NCSA Mosaic Technical Summary – Andreessen - 1993
34 Essence: A resource discovery system based on semantic file indexing – Hardy, Schwartz - 1993
32 Univers: An attribute-based name server – Bowman, Peterson, et al. - 1990
31 RFC 768: User Datagram Protocol – Postel - 1980
26 Customized information extraction as a basis for resource discovery – Hardy, Schwartz - 1994
25 The Internet Gopher: A Distributed Server Information System – McCahill - 1992
19 Guidelines for Robot Writers – Koster - 1994
18 A File System for Information Management – Bowman, Dharap, et al. - 1994
16 Harvest User’s Manual – Hardy, Schwartz, et al. - 1996
15 Massively replicating services in autonomously managed wide-area internetworks – Danzig, Obraczka, et al. - 1994
13 RFC 1521: MIME (Multipurpose Internet Mail Extensions) part one: Mechanisms for specifying and describing the format of Internet message bodies – Borenstein, Freed - 1993
10 Quorum-oriented multicast protocols for data replication – Golding, Long - 1992
10 Experiences with a survey tool for discovering network time protocol servers – Guyton, Schwartz - 1994
8 Distributed indexing of autonomous internet services – Danzig, Li, Obraczka - 1992
8 Publishing Information on the Internet with Anonymous FTP – Deutsch, Emtage - 1994
8 About the Veronica service – Foster - 1992
6 Integrating complex data access methods into the Mosaic/WWW environment – Chhabra, Hardy, et al. - 1994
6 Semantic file systems – O’Toole - 1991
6 CERN HTTPD public domain full-featured hypertext/proxy server with caching – Luotonen, Frystyk, et al. - 1994
6 A dial-up network of UNIX systems – Nowitz, Lesk - 1978
6 RFC 959: File Transfer Protocol (FTP) – Postel, Reynolds - 1985
6 Architecture of the Whois++ index service – Weider, Fullton, et al. - 1992
3 LaTeX: A Document Preparation System – Lamport - 1986
3 FTP mirroring software. Available from ftp://src.doc.ic.ac.uk/package/mirror.shar – McLoughlin - 1991
3 Traceroute software – Jacobsen - 1988
2 Uniform Resource Locators – Berners-Lee (CERN) - 1993
2 Introduction to ALIWEB – Koster - 1994
2 The WebCrawler – Pinkerton - 1994
2 Content Routing for Distributed Information Servers – O'Toole, Gifford - 1994
2 Harvest protocol and subsystem specifications – Bowman, Danzig, et al. - 1994
1 OLE 2.01 Design Specification – Microsoft OLE2 Design Team - 1993