Harvest: A Scalable, Customizable Discovery and Access System
1995
Cited by 159 (7 self)
Rapid growth in data volume, user base, and data diversity render Internet-accessible information increasingly difficult to use effectively. In this paper we introduce Harvest, a system that provides an integrated set of customizable tools for gathering information from diverse repositories, building topic-specific content indexes, flexibly searching the indexes, widely replicating them, and caching objects as they are retrieved across the Internet. The system interoperates with WWW clients and with HTTP, FTP, Gopher, and NetNews information resources. We discuss the design and implementation of Harvest and its subsystems, give examples of its uses, and provide measurements indicating that Harvest can significantly reduce server load, network traffic, and space requirements when building indexes, compared with previous systems. We also discuss several popular indexes we have built using Harvest, underscoring the customizability and scalability of the system.
Scalable Internet Resource Discovery: Research Problems and Approaches
1994
Cited by 121 (3 self)
Over the past several years, a number of information discovery and access tools have been introduced in the Internet, including Archie, Gopher, Netfind, and WAIS. These tools have become quite popular, and are helping to redefine how people think about wide-area network applications. Yet, they are not well suited to supporting the future information infrastructure, which will be characterized by enormous data volume, rapid growth in the user base, and burgeoning data diversity. In this paper we indicate trends in these three dimensions and survey problems these trends will create for current approaches. We then suggest several promising directions of future resource discovery research, along with some initial results from projects carried out by members of the Internet Research Task Force Research Group on Resource Discovery and Directory Service.
The Efficacy of GlOSS for the Text Database Discovery Problem
1993
Cited by 11 (4 self)
The popularity of information retrieval has led users to a new problem: finding which text databases (out of thousands of candidate choices) are the most relevant to a user. Answering a given query with a list of relevant databases is the text database discovery problem. The first part of this paper presents a practical method for attacking this problem based on estimating the result size of a query and a database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates GlOSS using four different semantics to answer a user's queries. Real users' queries were used in the experiments. We also describe several variations of GlOSS and compare their efficacy. In addition, we analyze the storage cost of our approach to the problem.
1 Introduction
Information vendors such as Dialog and Mead Data Central provide content-indexed access to multiple databases. Dialog for instance has over three hundred databases. In addition, the advent of Archie, WAIS, Wor...
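The result-size estimation idea in the GlOSS abstract can be illustrated with a minimal sketch. It assumes that each database exports only its document count and per-term document frequencies, and that query terms occur independently of one another. Both the independence assumption and every name here (`estimate_result_size`, `rank_databases`, the `summaries` data) are hypothetical choices made for illustration, not necessarily the estimators evaluated in the paper.

```python
from math import prod


def estimate_result_size(db_size, doc_freqs, query_terms):
    """Estimate how many documents in one database contain all query terms.

    Assumes terms occur independently across documents (an illustrative
    assumption, not something stated in the abstract).
    """
    if db_size == 0:
        return 0.0
    # Fraction of documents containing each term; unknown terms count as 0.
    fractions = [doc_freqs.get(term, 0) / db_size for term in query_terms]
    return db_size * prod(fractions)


def rank_databases(summaries, query_terms):
    """Return (name, estimate) pairs, largest estimated result size first."""
    estimates = [
        (name, estimate_result_size(size, freqs, query_terms))
        for name, (size, freqs) in summaries.items()
    ]
    return sorted(estimates, key=lambda pair: pair[1], reverse=True)


# Hypothetical summaries: name -> (document count, term -> document frequency)
summaries = {
    "db_a": (1000, {"internet": 200, "discovery": 50}),
    "db_b": (500, {"internet": 400, "discovery": 100}),
    "db_c": (2000, {"internet": 10}),
}
ranking = rank_databases(summaries, ["internet", "discovery"])
```

Under these assumptions the broker never touches the documents themselves. It stores one small frequency table per database, which is why the storage cost of the summaries is worth analyzing separately.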
Research Problems for Scalable Internet Resource Discovery
1993
Cited by 6 (0 self)
Over the past several years, a number of information discovery and access tools have been introduced in the Internet, including Archie, Gopher, Netfind, and WAIS. These tools have become quite popular, and are helping to redefine how people think about wide area network applications. Yet, they are not well suited to supporting the future information infrastructure, which will be characterized by enormous data volume, rapid growth in the user base, and burgeoning data diversity. In this paper we indicate trends in these three dimensions, and survey problems these trends will create for current approaches. We then suggest several promising directions of future resource discovery research, along with some initial results from projects carried out by members of the Internet Research Task Force Research Group on Resource Discovery and Directory Service.
Vocabulary Problem in Internet Resource Discovery
In Proceedings of the Second International Workshop on Next Generation Information Technologies and Systems, Naharia, 1994
Cited by 4 (2 self)
When searching for information in a retrieval system, people use a variety of terms to describe their information needs. When the terms used in a query differ from those indexed by the system, users fail to obtain the information they want. This is called the vocabulary problem. This problem has been studied and discussed in information retrieval for decades. Recently, Deerwester et al. proposed a new technique based on singular value decomposition and obtained promising results. In this paper, we describe how to apply this technique to Internet resource discovery.
Two-Dimensional Visualization for Internet Resource Discovery
1995
Cited by 1 (1 self)
Traditional information retrieval systems return documents in a list, where documents are sorted according to their publication dates, titles, or similarities to the user query. Users select documents of interest by searching through the returned list. In a distributed environment, documents are returned from more than one information server. A simple document list is not an efficient way to present a large volume of data to users. In this paper, we propose a two-dimensional visualization scheme for Internet resource discovery. Our method displays data in clusters according to their cross-similarities while still retaining the rankings with respect to the user query. In addition, it provides a customized view that arranges data in favor of the user's preferred terms.

