Characterizing the Scalability of a Large Web-Based Shopping System
ACM Transactions on Internet Technology (TOIT), 2001
Cited by 60 (2 self)
This article presents an analysis of five days of workload data from a large Web-based shopping system. The multitier environment of this Web-based shopping system includes Web servers, application servers, database servers, and an assortment of load-balancing and firewall appliances. We characterize user requests and sessions and determine their impact on system performance and scalability. The purpose of our study is to assess scalability and support capacity planning exercises for the multitier system. We find that horizontal scalability is not always an adequate mechanism for supporting increased workloads and that personalization and robots can have a significant impact on system scalability.
Information retrieval on the Web
ACM Computing Surveys, 2000
Cited by 58 (0 self)
In this paper we review studies of the growth of the Internet and technologies that are useful for information search and retrieval on the Web. We present data on the Internet from several different sources, e.g., current as well as projected number of users, hosts, and Web sites. Although numerical figures vary, overall trends cited
The RBSE Spider -- Balancing Effective Search Against Web Load
1994
Cited by 55 (4 self)
The design of a Web spider entails many things, including a concern for reasonable behavior, as well as more technical concerns. The RBSE Spider is a mechanism for exploring World Wide Web structure and indexing useful material thereby discovered. We relate our experience in constructing and operating this spider.
Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web
First International Conference on the World Wide Web, 1994
Cited by 36 (5 self)
Most documents made available on the World-Wide Web can be considered part of an infostructure --- an information resource database with a specifically designed structure. Infostructures often contain a wide variety of information sources, in the form of interlinked documents at distributed sites, which are maintained by a number of different document owners (usually, but not necessarily, the original document authors). Individual documents may also be shared by multiple infostructures. Since it is rarely static, the content of an infostructure is likely to change over time and may vary from the intended structure. Documents may be moved or deleted, referenced information may change, and hypertext links may be broken. As it grows, an infostructure becomes complex and difficult to maintain. Such maintenance currently relies upon the error logs of each server (often never relayed to the document owners), the complaints of users (often not seen by the actual document maintainers), and pe...
Ethical Web Agents
1994
Cited by 36 (1 self)
As the Web continues to evolve, the programs employed in interacting with it will also increase in sophistication. Web agents, programs acting autonomously on some task, are already present in the form of spiders. Agents offer substantial benefits and hazards, and because of this, their development must involve not only attention to technical details, but also the ethical concerns relating to their resulting impact. These ethical concerns will differ for agents employed in the creation of a service and agents acting on behalf of a specific individual. An ethic is proposed that addresses both of these perspectives. The proposal is predicated on the assumption that agents are a reality on the Web, and that there are no reasonable means of preventing their proliferation. 1 -- Introduction The ease of construction and potential Internet-wide impact of autonomous software agents on the World Wide Web [1] has spawned a great deal of discussion and occasional c...
Search Engines and Web Dynamics
2002
Cited by 19 (0 self)
In this paper we study several dimensions of web dynamics in the context of large-scale Internet search engines. Both growth and update dynamics clearly represent big challenges for search engines. We show how the problems arise in all components of a reference search engine model.
Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment
In Proc. of the First Int'l Conference on Knowledge Discovery and Data Mining, 1995
Cited by 17 (9 self)
With huge amounts of information connected to the global information network (Internet), efficient and effective discovery of resource and knowledge from the "global information base" has become an imminent research issue, especially with the advent of the Information SuperHighway. In this article, a multiple layered database (MLDB) approach is proposed to handle the resource and knowledge discovery in global information base. A preliminary experiment using on-line technical reports, a representative subset of the Internet, shows the advantages of such an approach. A multiple layered database is a database formed by generalization and transformation of the information, layer-by-layer, starting from the original information base (treated as layer-0, the primitive layer). Information retrieval, data mining, and data analysis techniques can be used to extract and transform information from a lower layer database to a higher one. Layer-1 and higher layers of an MLDB can be modeled by an e...
Resource and Knowledge Discovery in Global Information Systems: A Scalable Multiple Layered Database Approach
In Proc. of the First Int'l Conference on Knowledge Discovery and Data Mining, 1995
Cited by 15 (11 self)
With huge amounts of information connected to the global information network (Internet), efficient and effective discovery of resource and knowledge from the "global information base" has become an imminent research issue, especially with the advent of the Information SuperHighway. In
The Best Trail Algorithm for Assisted Navigation of Web Sites
In Proc. LA-WEB Conference on Latin American Web Congress, 2003
Cited by 8 (1 self)
We present an algorithm called the Best Trail Algorithm, which helps solve the hypertext navigation problem by automating the construction of memex-like trails through the corpus. The algorithm performs a probabilistic best-first expansion of a set of navigation trees to find relevant and compact trails. We describe the implementation of the algorithm, scoring methods for trails, filtering algorithms and a new metric called potential gain which measures the potential of a page for future navigation opportunities.
Design and selection criteria for a national web archive
In Proc. 10th European Conf. Research and Advanced Technology for Digital Libraries, ECDL, 2006
Cited by 6 (4 self)
Web archives and Digital Libraries are conceptually similar, as they both store and provide access to digital contents. The process of loading documents into a Digital Library usually requires a strong intervention from human experts. However, large collections of documents gathered from the web must be loaded without human intervention. This paper analyzes strategies to select contents for a national web archive and proposes a system architecture to support it.

