Results 1 - 2 of 2
Essence: A Resource Discovery System Based on Semantic File Indexing
- Proceedings of the USENIX Winter Conference, 1993
Abstract - Cited by 34 (5 self)
Discovering different types of file resources (such as documentation, programs, and images) in the vast amount of data contained within network file systems is useful for both users and system administrators. In this paper we discuss the Essence resource discovery system, which exploits file semantics to index both textual and binary files. By exploiting semantics, Essence extracts keywords that summarize a file, and generates a compact yet representative index. Essence understands nested file structures (such as uuencoded, compressed, "tar" files), and recursively unravels such files to generate summaries for them. These features allow Essence to be used in a number of useful settings, such as anonymous FTP archives. We present measurements of our prototype and compare them to related projects, such as the Wide Area Information Servers (WAIS) system and the MIT Semantic File System (SFS). We demonstrate that Essence can index more data types, generate smaller indexes, and in some case...
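The recursive unraveling of nested files described in the abstract can be illustrated with a minimal Python sketch. This is not the Essence implementation: the function names `is_tar` and `summarize`, the magic-number checks, and the word-frequency keyword heuristic are all assumptions made for illustration, standing in for the type-specific summarizers the paper describes. Only gzip and tar layers are handled; uuencoded input is left out to keep the sketch short.

```python
import gzip
import io
import re
import tarfile
from collections import Counter

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_tar(data: bytes) -> bool:
    # POSIX and GNU tar headers carry "ustar" at offset 257 of the first block.
    return len(data) > 262 and data[257:262] == b"ustar"


def summarize(name: str, data: bytes, top_n: int = 5) -> dict:
    """Map each leaf file to its keywords, unwrapping containers recursively.

    Hypothetical helper: the keyword step is a plain word-frequency count,
    used here only as a placeholder for a real semantic summarizer.
    """
    if data[:2] == GZIP_MAGIC:
        # Strip one compression layer and recurse on the payload.
        inner = name[:-3] if name.endswith(".gz") else name
        return summarize(inner, gzip.decompress(data), top_n)
    if is_tar(data):
        # Recurse into every regular file stored in the archive.
        result = {}
        with tarfile.open(fileobj=io.BytesIO(data)) as archive:
            for member in archive.getmembers():
                if member.isfile():
                    payload = archive.extractfile(member).read()
                    result.update(
                        summarize(f"{name}/{member.name}", payload, top_n)
                    )
        return result
    # Leaf file: keep the most frequent words of four or more letters.
    text = data.decode("utf-8", errors="ignore").lower()
    words = re.findall(r"[a-z]{4,}", text)
    return {name: [word for word, _ in Counter(words).most_common(top_n)]}
```

Calling `summarize("bundle.tar.gz", raw_bytes)` on a gzip-compressed tar archive returns one entry per file inside it, keyed by a path such as `bundle.tar/notes.txt`. The resulting index stores only a few keywords per file rather than the full text, which is the property the abstract refers to as a compact yet representative index.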
Experience with a Semantically Cognizant Internet White Pages Directory Tool
- J. Internetworking: Research and Experience, 1991
Abstract
As wide area networking technology and interconnection improve, an increasingly important problem is allowing users to navigate through the vast array of network accessible resources. In this paper we discuss experience with one technique we have developed in this regard, applied to a specific resource class. We have built a prototype tool that provides a simple Internet "white pages" directory facility. Given the name of a user and a rough description of where the user works (e.g., the company name or city), the tool attempts to locate telephone and electronic mailbox information about that user. Measurements indicate that the scope of the directory is upwards of 1,147,000 users in 1,929 administrative domains, yet the tool does not require the type of global cooperation that many existing or proposed directory services require, namely, running special directory servers at many sites around the Internet. We accomplish this by building an understanding of the semantics of this particular resource discovery application into the algorithms that support searches, allowing the tool to make aggressive use of existing sources of relatively unstructured information. Being able to make use of such information is important in heterogeneous, administratively decentralized environments, where global agreement about highly structured information formats is difficult to achieve. At present, the tool utilizes information from USENET news messages, the Domain Naming System, the Simple Mail Transfer Protocol, and the "finger" protocol, as well as a variety of information about the meaning of and relationships between these information sources. Other sources of resource information (such as the CCITT X.500 directory service) can easily be incorporated into the tool as they become availa...

