Konidaris A, Estimating and Eliminating Redundant Data Transfers Over the Web: A Fragment Based Approach
- In: Proceedings of the Third International Conference on Internet Computing (IC2002)
, 2002
Cited by 8 (2 self)
Redundant data transfers over the Web can mainly be attributed to repeated transfers of unchanged data. Web caches and Web proxies are among the solutions that have been proposed to deal with redundant data transfers. In this paper we focus on the efficient estimation and reduction of redundant data transfers over the Web. We first prove that a vast amount of redundant data is transferred in Web pages that are considered to carry fresh data. We show this by following an approach based on Web page fragmentation and manipulation. Web pages are broken down into fragments, based on specific criteria. We then treat these fragments as independent components of the Web page and study their change patterns both independently and in the context of the whole Web page. After the fragmentation process we propose solutions for dealing with redundant data transfers.
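As a rough illustration of the fragment-based estimate described in this abstract, the sketch below compares two versions of a page that have already been split into fragments and reports the share of bytes that would be re-sent unchanged. The fragmentation criteria themselves are not given here, so the input lists, the use of SHA-256 digests, and the function name `redundant_fraction` are all assumptions made for the example, not the authors' method.

```python
import hashlib


def redundant_fraction(old_fragments, new_fragments):
    """Fraction of bytes in the new page version that already existed,
    fragment for fragment, in the old version (hypothetical helper)."""
    old_digests = {
        hashlib.sha256(f.encode("utf-8")).hexdigest() for f in old_fragments
    }
    total_bytes = 0
    redundant_bytes = 0
    for fragment in new_fragments:
        data = fragment.encode("utf-8")
        total_bytes += len(data)
        # A fragment whose digest was seen before is counted as redundant.
        if hashlib.sha256(data).hexdigest() in old_digests:
            redundant_bytes += len(data)
    return redundant_bytes / total_bytes if total_bytes else 0.0


old_page = ["<header>", "<news>a</news>", "<footer>"]
new_page = ["<header>", "<news>b</news>", "<footer>"]
print(redundant_fraction(old_page, new_page))
```

Here only the news fragment changed, so the header and footer bytes count as redundant even though the page as a whole would be treated as fresh.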
Design and selection criteria for a national web archive
- In Proc. 10th European Conf. Research and Advanced Technology for Digital Libraries, ECDL
, 2006
Cited by 6 (4 self)
Web archives and Digital Libraries are conceptually similar, as they both store and provide access to digital content. The process of loading documents into a Digital Library usually requires a strong intervention from human experts. However, large collections of documents gathered from the web must be loaded without human intervention. This paper analyzes strategies to select contents for a national web archive and proposes a system architecture to support it.
The main name system: an exercise in centralized computing
- Computer Communications Review
, 2005
Cited by 4 (1 self)
(DNS). Dr. Mockapetris deserves considerable credit, because DNS has been an incredibly successful component of the Internet; it hasn’t required major changes despite scaling up many orders of magnitude from its original size. DNS has been recognized as achieving a remarkable balance between scalability (due to its distributed implementation) and decentralized administrative control, in which separate organizations control separate portions of the namespace. So why consider changing it? More to the point, why should CCR publish a paper proposing a radical redesign of the DNS? This paper proposes a “recentralized” replacement for DNS in which, conceptually, the DNS is served from a single database. While the database could be replicated for fault tolerance, in a fundamental sense the distributed nature of the DNS would be abandoned. Heresy! Surely distributed solutions are always better than centralized solutions, correct? The authors challenge this assumption and, in the process, provide a very useful analysis of what benefits and costs accrue from the distributed nature of DNS. The paper shows that many features that would be useful in the DNS are more easily and simply provided in a centralized design; and that, despite our (perhaps aesthetic) preference for distributed solutions, many or most of the benefits of distribution in the DNS are in fact achievable in a centralized design as well (an observation which may be more true today than at the time the DNS was designed). CCR seeks to publish papers like this that challenge accepted wisdom in insightful ways. This paper does that by addressing an important and always timely topic; hopefully it will stimulate more discussion clarifying the strengths and weaknesses of distributed designs like that of the DNS. ACM SIGCOMM Public review written by
Web Proxy Cache Replacement: Do's, Don'ts, and Expectations
- In Proceedings of the 2nd IEEE International Symposium on Network Computing and Applications (NCA-03)
, 2003
Cited by 2 (2 self)
Numerous research efforts have produced a large number of algorithms and mechanisms for web proxy caches. In order to build powerful web proxies and understand their performance, one must be able to appreciate the impact and significance of earlier contributions and how they can be integrated. To do this we employ a cache replacement algorithm, ‘CSP’, which integrates key knowledge from previous work. CSP utilizes the communication Cost to fetch web objects, the objects’ Sizes, their Popularities, an auxiliary cache and a cache admission control algorithm. We study the impact of these components with respect to hit ratio, latency, and bandwidth requirements. Our results show that there are clear performance gains when utilizing the communication cost, the popularity of objects, and the auxiliary cache. In contrast, the size of objects and the admission controller have a negligible performance impact. Our major conclusions going against those in related work are that (i) LRU is preferable to CSP for important parameter values, (ii) hit ratio results tend to be very misleading in predicting the true performance of algorithms, (iii) accounting for the objects’ sizes does not improve latency and/or bandwidth requirements, and (iv) the collaboration of nearby proxies is not very beneficial. In addition to CSP and LRU we study the well-known GDS algorithm, which, although it utilizes similar terms, does so in a different way and cannot be modeled with the CSP algorithm. Based on these results, we chart the problem solution space, identifying which algorithm is preferable and under which conditions. Finally, we develop a dynamic replacement algorithm that continuously utilizes the best algorithm as the problem-parameter values (such as the access skew distributions) change with time.
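For readers unfamiliar with cost- and size-aware replacement, the following is a minimal sketch of a GreedyDual-Size style cache, assuming that "GDS" in this abstract refers to that algorithm. Each cached object gets a priority of inflation value plus cost divided by size, the lowest-priority object is evicted, and the inflation value is raised to the evicted priority. The class and parameter names are hypothetical, and this is not the CSP algorithm, whose details the abstract does not give.

```python
class GreedyDualSizeCache:
    """Illustrative GreedyDual-Size cache; names are assumptions for this sketch."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # the running value usually called L
        self.entries = {}     # key -> (priority, size, cost)

    def access(self, key, size, cost=1.0):
        """Return True on a hit; on a miss, admit the object if it can fit."""
        if key in self.entries:
            _, size, cost = self.entries[key]
            # A hit restores the object's priority relative to the current L.
            self.entries[key] = (self.inflation + cost / size, size, cost)
            return True
        if size > self.capacity:
            return False
        while self.used + size > self.capacity:
            # Evict the lowest-priority object and raise L to its priority.
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            priority, victim_size, _ = self.entries.pop(victim)
            self.inflation = priority
            self.used -= victim_size
        self.entries[key] = (self.inflation + cost / size, size, cost)
        self.used += size
        return False


cache = GreedyDualSizeCache(capacity_bytes=10)
cache.access("a", size=2)
cache.access("b", size=8)
cache.access("c", size=4)  # evicts "b", the object with the lowest cost/size
print(sorted(cache.entries))
```

With a uniform cost of 1.0 the policy favours small objects; passing a measured fetch latency as `cost` would instead favour objects that are expensive to retrieve.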
Performance Evaluation of a Hybrid Run-Time Management Policy for Data Intensive Web Sites
, 2003
Cited by 2 (1 self)
The issues of performance, response efficiency and data consistency are among the most important for data intensive Web sites. In order to deal with these issues we analyze and evaluate a hybrid run-time management policy that may be applied to data intensive Web sites. Our research relies on the performance evaluation of experimental client/server configurations. We propose a hybrid Web site run-time management policy that may apply to different Web site request patterns and data update frequencies. A run-time management policy is viewed as a Web page materialization policy that can adapt to different conditions at run-time. We define a concept that we have named the Compromise Factor (CF) to capture the relationship between current server conditions and the materialization policy. The issue of Web and database data consistency is the driving force behind our approach. In some cases, though, we prove that certain compromises to consistency can be beneficial to Web server performance and at the same time be unnoticeable to users. We first present a general comparative cost model for the hybrid management policy and three other related and popular Web management policies. We then evaluate the performance of all the approaches. The results of our evaluation show that the concept of the CF may be beneficial to Web servers in terms of performance.
Understanding the Object Retrieval Dependence of Web Page Access
- Proceedings of the 10th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS)
, 2002
Cited by 1 (0 self)
In this paper, we propose a chunk-level client's latency dependence model (C-LDM) to describe the effect of network protocol, streaming data transfer mechanism, and web page structure on the latency perceived by a web client. Object retrieval latency is made up of four components: (i) definition – finding the definition of embedded objects in a page, (ii) queuing – waiting for the object request to be dispatched into the network once the object's existence is defined, (iii) connection – setting up the network connection for data transfer, and (iv) chunk transfer – transferring actual data from server to client once the connection is set up. We show that for typical network connectivity, the user's perceived page latency is mainly due to the definition and queuing of embedded object requests inside a web page, which are related to the content structure of a web page and the parallelism width of the browser for object fetching. Such understanding is important because it opens opportunities to improve the retrieval speed of web surfing through the minimization of data chunk dependence.
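The four-part breakdown above can be made concrete with a small bookkeeping sketch. It assumes the four components simply add up for a single object, which the abstract suggests but does not state formally, and the class name, field names, and the `structural_share` helper are hypothetical. The numbers in the usage lines are made up for illustration, not measurements from the paper.

```python
from dataclasses import dataclass


@dataclass
class ObjectLatency:
    """Per-object latency split into the four components named in the abstract."""
    definition: float      # time until the embedded object is discovered
    queuing: float         # time the request waits before being sent
    connection: float      # time to set up the network connection
    chunk_transfer: float  # time to move the data once connected

    def total(self):
        # Assumption: the components are sequential and therefore additive.
        return (self.definition + self.queuing
                + self.connection + self.chunk_transfer)

    def structural_share(self):
        """Share of latency due to definition and queuing."""
        total = self.total()
        return (self.definition + self.queuing) / total if total else 0.0


sample = ObjectLatency(definition=0.25, queuing=0.25,
                       connection=0.125, chunk_transfer=0.375)
print(sample.total(), sample.structural_share())
```

A high `structural_share` would correspond to the case the abstract highlights, where page structure and browser parallelism, rather than raw transfer time, dominate what the user perceives.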
Measured HTTP Performance and Fun Factors
- Proc. of the 17th International Teletraffic Congress
, 2001
Cited by 1 (0 self)
Recent work has emphasized the importance of pure delay components as well as rate components in the user-perceived performance of elastic Internet applications, namely Web browsing. “Fun factors” have been previously introduced to describe the obtained performance with respect to the maximum possible performance on a scale of zero (no fun) to one (maximum fun). In this paper, several options for defining such fun factors using delay and rate components are presented. In order to better understand the influence of delays and transmission rates, two extensive traffic traces have been evaluated (a) to yield information about the way browsers use persistent and parallel HTTP/TCP connections to access Web servers and (b) to serve as an example for quantifying quality of service with fun factors.
Performance Analysis
- in Unstructured Overlays”, in Proc. IEEE ICC
, 2003
Cited by 1 (1 self)
Monitoring and information system (MIS) implementations provide data about available resources and services within a distributed system, or Grid. A comprehensive performance evaluation of an MIS can aid in detecting potential bottlenecks, advise in deployment, and help improve future system development. In this paper, we analyze and compare the performance of three MIS in a quantitative manner: the Globus Toolkit® Monitoring and Discovery Service (MDS2), the European DataGrid Relational Grid Monitoring Architecture (R-GMA), and the Condor project's Hawkeye. We use the NetLogger toolkit to instrument the main service components of each MIS and conduct four sets of experiments to benchmark their scalability with respect to the number of users, the number of resources, and the amount of data collected. Our study provides quantitative measurements comparable across all systems. We also find performance bottlenecks and identify how they relate to the design goals, underlying architectures, and implementation technologies of the corresponding MIS, and we present guidelines for deploying monitoring and information systems in practice.
Ajax-based Report Pages as Incrementally Rendered Views
Cited by 1 (1 self)
While Ajax-based programming enables faster performance and higher interface quality over pure server-side programming, it is demanding and error prone as each action that partially updates the page requires custom, ad-hoc code. The problem is exacerbated by distributed programming between the browser and server, where the developer uses JavaScript to access the page state and Java/SQL for the database. The FORWARD framework simplifies the development of Ajax pages by treating them as rendered views, where the developer declares a view using an extension of SQL and page units, which map to the view and render the data in the browser. Such a declarative approach leads to significantly less code, as the framework automatically solves performance optimization problems that the developer would otherwise hand-code. Since pages are fueled by views, FORWARD leverages years of database research on incremental view maintenance by creating optimization techniques appropriately extended for the needs of pages (nesting, variability, ordering), thereby achieving performance comparable to hand-coded JavaScript/Java applications.
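The idea of incremental view maintenance that this abstract builds on can be sketched independently of FORWARD. The example below keeps a grouped count view up to date from inserted and deleted rows and reports which groups changed, so that only those parts of a page would need re-rendering. It is a generic illustration in Python, not FORWARD's SQL extension or its page units, and `CountByGroupView` and `apply_delta` are hypothetical names.

```python
from collections import Counter


class CountByGroupView:
    """Materialized 'count rows per group' view maintained from deltas."""

    def __init__(self, rows, key):
        self.key = key
        self.counts = Counter(key(row) for row in rows)

    def apply_delta(self, inserted=(), deleted=()):
        """Apply row changes and return the set of groups whose count changed."""
        changed = set()
        for row in inserted:
            group = self.key(row)
            self.counts[group] += 1
            changed.add(group)
        for row in deleted:
            group = self.key(row)
            self.counts[group] -= 1
            if self.counts[group] <= 0:
                del self.counts[group]  # drop empty groups from the view
            changed.add(group)
        return changed


rows = [{"dept": "a"}, {"dept": "a"}, {"dept": "b"}]
view = CountByGroupView(rows, key=lambda row: row["dept"])
dirty = view.apply_delta(inserted=[{"dept": "b"}], deleted=[{"dept": "a"}])
print(sorted(dirty), dict(view.counts))
```

The point of the design is that the cost of an update depends on the size of the delta rather than on the size of the underlying table, which is what makes partial page updates cheap.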
A More Precise Model for Web Retrieval
, 2005
Most research on web retrieval latency is object-level, which we consider insufficient and sometimes inaccurate. In this paper, we propose a fine-grained, operation-level Web Retrieval Dependency Model (WRDM) to capture the web retrieval process more precisely. Our model reveals some new factors in web retrieval that cannot be seen at the object level but are very important to studies in the web retrieval area.

