Untangling the Web from DNS
, 2004
Cited by 50 (11 self)
The Web relies on the Domain Name System (DNS) to resolve the hostname portion of URLs into IP addresses. This marriage-of-convenience enabled the Web's meteoric rise, but the resulting entanglement is now hindering both infrastructures: the Web is overly constrained by the limitations of DNS, and DNS is unduly burdened by the demands of the Web. There has been much commentary on this sad state-of-affairs, but dissolving the ill-fated union between DNS and the Web requires a new way to resolve Web references. To this end, this paper describes the design and implementation of Semantic Free Referencing (SFR), a reference resolution infrastructure based on distributed hash tables (DHTs).
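The abstract does not describe SFR's actual design, so the following is only a generic illustration of the idea it names: resolving an opaque, semantics-free key to a location through a distributed hash table. It is a minimal single-process sketch using consistent hashing; the names `ToyDHT`, `key_of`, and the example URLs are hypothetical and are not taken from the paper.

```python
import hashlib
from bisect import bisect_left


def key_of(data: bytes) -> int:
    """Map arbitrary bytes to a 160-bit integer key (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class ToyDHT:
    """In-memory stand-in for a DHT: each key is owned by one node on a hash ring."""

    def __init__(self, node_names):
        # Place every node on the ring at the hash of its name.
        self.ring = sorted((key_of(name.encode()), name) for name in node_names)
        self.store = {name: {} for name in node_names}

    def owner(self, key: int) -> str:
        """Return the first node clockwise from the key, wrapping around the ring."""
        ids = [node_id for node_id, _ in self.ring]
        idx = bisect_left(ids, key) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key: int, location: str) -> None:
        """Record (or update) the location that the key currently resolves to."""
        self.store[self.owner(key)][key] = location

    def get(self, key: int):
        """Resolve a key to its stored location, or None if it is unknown."""
        return self.store[self.owner(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
tag = key_of(b"an opaque identifier with no hostname in it")
dht.put(tag, "http://example.org/old/page.html")
dht.put(tag, "http://example.net/new/page.html")  # content moved, key unchanged
print(dht.get(tag))
```

The point of the sketch is that the key carries no hostname, so the target can move without the reference changing; only the mapping stored in the table is updated.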
Buckets: Smart Objects for Digital Libraries
- Communications of the ACM
, 2001
Cited by 15 (10 self)
Since its founding, NASA has been dedicated to the advancement of aeronautics and space science. The NASA Scientific and Technical Information (STI) Program Office plays a key part in helping NASA maintain this important role. The NASA STI Program Office is operated by Langley Research Center, the lead center for NASA's scientific and technical information. The NASA STI Program Office provides access to the NASA STI Database, the largest collection of aeronautical and space science STI in the world. The Program Office is also NASA's institutional mechanism for disseminating the results of its research and development activities. These results are published by NASA in the NASA STI Report Series, which includes the following report types:
Analysis of lexical signatures for improving information persistence on the World Wide Web
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2004
Cited by 10 (0 self)
A lexical signature (LS) consisting of several key words from a Web document is often sufficient information for finding the document later, even if its URL has changed. We conduct a large-scale empirical study of nine methods for generating lexical signatures, including Phelps and Wilensky’s original proposal (PW), seven of our own static variations, and one new dynamic method. We examine their performance on the Web over a 10-month period, and on a TREC data set, evaluating their ability to both (1) uniquely identify the original (possibly modified) document, and (2) locate other relevant documents if the original is lost. Lexical signatures chosen to minimize document frequency (DF) are good at unique identification but poor at finding relevant documents. PW works well on the relatively small TREC data set, but acts almost identically to DF on the Web, which contains billions of documents. Term-frequency-based lexical signatures (TF) are very easy to compute and often perform well, but are highly dependent on the ranking system of the search engine used. The term-frequency inverse-document-frequency- (TFIDF-) based method and hybrid methods (which combine DF with TF or TFIDF) seem to be the most promising candidates among static methods for generating effective lexical signatures. We propose a dynamic LS generator
Analysis of Lexical Signatures for Finding Lost or Related Documents
- Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 11–18
, 2002
Cited by 9 (1 self)
A lexical signature of a web page is often sufficient for finding the page, even if its URL has changed. We conduct a large-scale empirical study of eight methods for generating lexical signatures, including Phelps and Wilensky's [14] original proposal (PW) and seven of our own variations. We examine their performance on the web and on a TREC data set, evaluating their ability both to uniquely identify the original document and to locate other relevant documents if the original is lost. Lexical signatures chosen to minimize document frequency (DF) are good at unique identification but poor at finding relevant documents. PW works well on the relatively small TREC data set, but acts almost identically to DF on the web, which contains billions of documents. Term-frequency-based lexical signatures (TF) are very easy to compute and often perform well, but are highly dependent on the ranking system of the search engine used. In general, the TFIDF-based method and hybrid methods (which combine DF with TF or TFIDF) seem to be the most promising candidates for generating effective lexical signatures.
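To make the TF, DF, and TFIDF selection rules mentioned in this abstract concrete, here is a minimal sketch that picks the top-k terms of a document under each rule. It uses simplified textbook definitions (raw term counts, document frequency over a small in-memory corpus, and `tf * log(N/df)`); the papers' exact formulas, the PW method, and the hybrid variants are not given in the abstract and are not reproduced here. The function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(doc, corpus, k=5, method="tfidf"):
    """Return the k best terms of doc under 'tf', 'df', or 'tfidf' ranking."""
    tf = Counter(tokenize(doc))
    n_docs = len(corpus)
    df = Counter()
    for other in corpus:
        df.update(set(tokenize(other)))  # count each term once per document

    def score(term):
        d = max(df[term], 1)  # guard against terms missing from the corpus
        if method == "tf":
            return tf[term]  # frequent in the document
        if method == "df":
            return -d  # rare in the corpus
        if method == "tfidf":
            return tf[term] * math.log(n_docs / d)
        raise ValueError(f"unknown method: {method}")

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(tf, key=lambda t: (-score(t), t))[:k]


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the zebra ate the zebra grass",
]
for m in ("tf", "df", "tfidf"):
    print(m, lexical_signature(corpus[2], corpus, k=2, method=m))
```

On this toy corpus the TF rule keeps the common word "the", the DF rule ignores how often a term occurs in the document itself, and TFIDF balances the two, which mirrors the trade-off the abstract describes.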
A Digital Library for the Dissemination and Replication of Quantitative Social Science Research: The Virtual Data Center. Social Science Computer Review 19:458–470. Reprint at http://gking.harvard.edu/files/abs/vdcwhitepaper-abs.shtml
- Social Science Computer Review
, 2001
Cited by 5 (0 self)
and reviews, on the topic of redistricting, numerical accuracy, and digital libraries.
Automatic Selection of Nearby Web Servers
- in Proc. of the 1998 SIGMETRICS/Performance Workshop on Internet Server Performance
, 1998
Cited by 4 (0 self)
Performance of global services such as the worldwide web can be improved by physically distributed replicas. Most replicated systems today request users to select manually a nearby replica (typically from a list of sites), yet users often are not willing or able to make an informed choice. This paper describes an approach to automatic replica selection which works without change to existing web clients and web and FTP servers. We describe the assumptions behind our approach and propose several metrics for estimation of network distance. We examine the feasibility behind this approach and compare the time required for replica-selection by each algorithm. Finally, we describe a small change to HTTP that would improve replica selection transparency.
1 Introduction
A strength of the Internet and the world-wide web is its international character. Web links are location transparent; users use the same kinds of names to access pages coming from a server in London as from one in New York. A...
Maintaining information resources
- Proceedings of the Third International Workshop on Next Generation Information Technologies (NGITS’97)
Cited by 1 (0 self)
With the proliferation of the World Wide Web, it has become very important to provide advanced tools for maintaining referential integrity of information resources. The growing tendency toward building increasingly complex Web sites makes it necessary to maintain not only physical files, but also logical resources, or views, which are composed of references to other resources and presentation programs. Our solution to this problem is to design an infrastructure of resource maintenance agents. It includes the Data Agent, which keeps track of files and supports third-party requests to notify them of changes that occur to these files. Another component of the infrastructure is the Repository Agent, which supports change notification requests for logical resources. Prototype implementation of the infrastructure is currently available and is discussed in this paper.
Maintaining Information Resources
- Proceedings of the Third International Workshop on Next Generation Information Technologies (NGITS’97)
, 1997
With the proliferation of the World Wide Web, it has become very important to provide advanced tools for maintaining referential integrity of information resources. The growing tendency toward building increasingly complex Web sites makes it necessary to maintain not only physical files, but also logical resources, or views, which are composed of references to other resources and presentation programs. Our solution to this problem is to design an infrastructure of resource maintenance agents. It includes the Data Agent, which keeps track of files and supports third-party requests to notify them of changes that occur to these files. Another component of the infrastructure is the Repository Agent, which supports change notification requests for logical resources. Prototype implementation of the infrastructure is currently available and is discussed in this paper.
1 Introduction
The World Wide Web is rapidly becoming ubiquitous, and its rapid expansion comes increasingly in the form of ...
Metadata and library mining: Analyzing the usage of a distributed electronic library
, 2000
In this paper, we first propose a suitable architecture to incorporate metadata into a digital library. In reviewing the logging behavior of traditional Web servers, we will discuss the requirements for library mining, which cannot be met by available web mining tools. The main section will discuss the design and implementation of a suitable analyzing tool for digital libraries. We conclude by pointing out various applications for library mining.
Untangling the Web from DNS
The Web relies on the Domain Name System (DNS) to resolve the hostname portion of URLs into IP addresses. This marriage-of-convenience enabled the Web’s meteoric rise, but the resulting entanglement is now hindering both infrastructures: the Web is overly constrained by the limitations of DNS, and DNS is unduly burdened by the demands of the Web. There has been much commentary on this sad state-of-affairs, but dissolving the ill-fated union between DNS and the Web requires a new way to resolve Web references. To this end, this paper describes the design and implementation of Semantic Free Referencing (SFR), a reference resolution infrastructure based on distributed hash tables (DHTs).

