Web Caching and Zipf-like Distributions: Evidence and Implications
In INFOCOM, 1999
Cited by 715 (2 self)
This paper addresses two unresolved issues about web caching. The first issue is whether web requests from a fixed user community are distributed according to Zipf's law [22]. Several early studies have supported this claim [9], [5], while other recent studies have suggested otherwise [16], [2]. The second issue relates to a number of recent studies on the characteristics of web proxy traces, which have shown that the hit-ratios and temporal locality of the traces exhibit certain asymptotic properties that are uniform across the different sets of the traces [43, [XO], [71, [XO], [XS]. In particular, the question is whether these properties are inherent to web accesses or whether they are simply an artifact of the traces. An answer to these unresolved issues will facilitate both web cache resource planning and cache hierarchy design.
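As a rough illustration of what a Zipf-like request distribution means for caching, the sketch below assigns the document of popularity rank i a request probability proportional to 1/i^alpha and sums the probabilities of the most popular documents. The exponent 0.8, the 10,000-document population, and the function names are illustrative assumptions, not values or code from the paper; the hit ratio is for an idealized cache that always holds the most popular documents and sees independent requests.

```python
def zipf_like_probabilities(n_docs, alpha):
    """Return P(i) proportional to 1 / i**alpha for popularity ranks 1..n_docs."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(probabilities, cache_size):
    """Share of requests served when the cache holds the cache_size most popular documents."""
    return sum(probabilities[:cache_size])


# Assumed parameters, chosen only for illustration.
probs = zipf_like_probabilities(n_docs=10_000, alpha=0.8)
for size in (10, 100, 1000):
    print(size, round(ideal_hit_ratio(probs, size), 3))
```

Growing the cache by a factor of ten adds less and less to the idealized hit ratio, which is the kind of asymptotic behavior the abstract refers to.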
Relationship-Based Clustering and Visualization for High-Dimensional Data Mining
INFORMS Journal on Computing, 2002
Cited by 31 (9 self)
In several real-life data-mining... This paper proposes a relationship-based approach that alleviates both problems, side-stepping the "curse-of-dimensionality" issue by working in a suitable similarity space instead of the original high-dimensional attribute space. This intermediary similarity space can be suitably tailored to satisfy business criteria such as requiring customer clusters to represent comparable amounts of revenue. We apply efficient and scalable graph-partitioning-based clustering techniques in this space. The output from the clustering algorithm is used to re-order the data points so that the resulting permuted similarity matrix can be readily visualized in two dimensions, with clusters showing up as bands. While two-dimensional visualization of a similarity matrix is by itself not novel, its combination with the order-sensitive partitioning of a graph that captures the relevant similarity measure between objects provides three powerful properties: (i) the high-dimensionality of the data does not affect further processing once the similarity space is formed; (ii) it leads to clusters of (approximately) equal importance; and (iii) related clusters show up adjacent to one another, further facilitating the visualization of results. The visualization is very helpful for assessing and improving clustering. For example, actionable recommendations for splitting or merging of clusters can be easily derived, and it also guides the user toward the right number of clusters.
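The re-ordering step can be sketched in a few lines: given a similarity matrix and a cluster label per data point, permute rows and columns so that members of the same cluster sit next to each other. This is only a minimal sketch; the labels are supplied by hand here (the paper obtains them from graph partitioning, which is not shown), and the function name and toy matrix are hypothetical.

```python
def permute_by_cluster(similarity, labels):
    """Reorder rows and columns so that points sharing a cluster label are adjacent."""
    # sorted() is stable, so points keep their original order within a cluster.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    permuted = [[similarity[r][c] for c in order] for r in order]
    return order, permuted


# Toy symmetric similarity matrix; points 0 and 2 are similar, as are 1 and 3.
similarity = [
    [1.0, 0.1, 0.9, 0.2],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.2, 0.8, 0.1, 1.0],
]
labels = [0, 1, 0, 1]  # hypothetical output of some clustering step

order, permuted = permute_by_cluster(similarity, labels)
for row in permuted:
    print(row)
```

After the permutation the high similarities gather in blocks along the diagonal, which is what makes the clusters visible when the matrix is drawn as an image.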
Accelerating Internet Streaming Media Delivery Using Network-Aware Partial Caching
2001
Cited by 18 (1 self)
This paper aims at mitigating such effects by leveraging the availability of client-side caching proxies. We present a novel caching architecture (and associated cache management algorithms) that turn edge caches into accelerators of streaming media delivery. A salient feature of our caching algorithms is that they allow partial caching of streaming media objects and joint delivery of content from caches and origin servers. The caching algorithms we propose are both network-aware and stream-aware; they take into account the popularity of streaming media objects, their bit-rate requirements, and the available bandwidth between clients and servers. Using realistic models of Internet bandwidth (derived from proxy cache logs and measured over real Internet paths), we have conducted extensive simulations to evaluate the performance of various cache management alternatives. Our experiments demonstrate that network-aware caching algorithms can significantly reduce service delay and improve overall stream quality. Also, our experiments show that partial caching is particularly effective when bandwidth variability is not very high.
A Generator of Internet Streaming Media Objects and Workloads
ACM SIGMETRICS Performance Evaluation Review, 2001
Cited by 17 (1 self)
This paper presents a tool called Gismo (Generator of Internet Streaming Media Objects and workloads). Gismo enables the specification of a number of streaming media access characteristics, including object popularity, temporal correlation of requests, seasonal access patterns, user session durations, user interactivity times, and variable bit-rate (VBR) self-similarity and marginal distributions. The embodiment of these characteristics in Gismo enables the generation of realistic and scalable request streams for use in the benchmarking and comparative evaluation of Internet streaming media delivery techniques. To demonstrate the usefulness of Gismo, we present a case study that shows the importance of various workload characteristics in determining the effectiveness of proxy caching and server patching techniques in reducing bandwidth requirements.
Reducing the Disk I/O of Web Proxy Server Caches
In USENIX Annual Technical Conference, 1999
Cited by 16 (1 self)
The dramatic increase of HTTP traffic on the Internet has resulted in widespread use of large caching proxy servers as critical Internet infrastructure components. With continued growth, the demand for larger caches and higher-performance proxies grows as well. The common bottleneck of large caching proxy servers is disk I/O. In this paper we evaluate ways to reduce the amount of required disk I/O. First we compare the file system interactions of two existing web proxy servers, CERN and SQUID. Then we show how design adjustments to the current SQUID cache architecture can dramatically reduce disk I/O. Our findings suggest that two strategies can significantly reduce disk I/O: (1) preserve locality of the HTTP reference stream while translating these references into cache references, and (2) use virtual memory instead of the file system for objects smaller than the system page size. The evaluated techniques reduced disk I/O by 50% to 70%.
Evaluation of ESI and Class-Based Delta Encoding
In Proceedings of WCW, 2003
Cited by 9 (0 self)
The portion of web traffic attributed to dynamic web content is substantial and continues to grow as users expect more personalization and tailored information. Unfortunately, dynamic content is costly to generate. Moreover, traditional web caching schemes are not very effective for dynamically created pages. In this paper we study two acceleration techniques for dynamic content. The first technique is Edge-Side Includes (ESI), and the second is Class-Based Delta Encoding. To evaluate these schemes, we present a model for the construction of dynamic web pages. We use simulation to explore how system, page and algorithm parameters affect the performance of dynamic-content delivery techniques, and we present a detailed comparison of ESI and delta encoding in two representative scenarios.
Distributed Caching with Centralized Control
In Proc. of the Fifth International Web Caching and Content Delivery Workshop, 2000
Cited by 8 (0 self)
The benefits of using caches for reducing traffic in backbone trunk links and for improving web access times are well-known. However, there are some known problems with traditional web caching, namely, maintaining freshness of web objects, balancing load among a number of caches and providing protection against cache failure. This paper investigates in detail the advantages and disadvantages of a distributed architecture of caches which are coordinated through a central controller. In particular, the performance of a set of independent caches is compared against the performance of a set of coordinated distributed caches using extensive simulation. The conclusion is that a distributed architecture of coordinated caches consistently provides a better hit ratio, improves response time, provides better freshness, achieves load balancing, and increases the overall traffic handling capacity of a network while paying a small price in terms of additional control traffic. In particular...
Mining and Using Coverage and Overlap Statistics for Data Integration
2004
Cited by 6 (3 self)
Query processing in the context of integrating autonomous data sources on the Internet has received significant attention of late. In contrast to traditional query processing scenarios, in which each relation is stored in the same primary database and in which completeness of answers is expected by users, data integration scenarios involve handling relations that are stored across multiple and potentially overlapping sources and dealing with conflicting objectives in terms of what coverage of answers users want and how much execution cost they are willing to bear for achieving the desired coverage. Hence, query processing in data integration requires coverage and overlap statistics about these autonomous sources to generate optimal query plans. This dissertation first presents StatMiner, an effective statistics mining approach which automatically generates attribute value hierarchies, discovers frequently accessed query classes, and learns coverage and overlap statistics only with respect to these classes. The dissertation then introduces Multi-R, a multi-objective query optimizer which uses coverage and overlap statistics to support joint optimization of coverage and cost of query plans. The efficiency of StatMiner and the effectiveness of the learned statistics are demonstrated in the context of BibFinder, a publicly available
Factors influencing the origins of colour categories
Laboratory Vrije Universiteit Brussel, 2002
Cited by 6 (2 self)
of the academic degree of Doctor of Science, to be defended in public on Friday 8 March 2002. Acknowledgements I started as a research assistant in the Artificial Intelligence Laboratory in autumn 1996. My first interests were in behavioural robotics and robot ecosystems. As a continuation of my “licentiaats” thesis I started building a camera system to extend the sensory perception of the lab’s robots (Belpaeme and Birk, 1997a,b; Belpaeme, 1998; Birk and Belpaeme, 1998; Birk et al., 1998, 1999; Belpaeme and Birk, 2001). It was around that time that Luc Steels got interested in the origins of language. His early experiments formed the seed for what is now one of the most important paradigms for exploring linguistic interactions with computer simulations. Luc soon wanted more and had plans to implement a language experiment in the real world, for which I delivered the visual perception (Belpaeme et al., 1998; Belpaeme, 1999). This got me interested in visual features, and my research
Evaluation of a Broadcast Scheduling Algorithm
In Proceedings of the 5th East European Conference on Advances in Databases and Information Systems, 2001
Cited by 5 (2 self)
One of the two main approaches to data broadcasting is pull-based data delivery. In this paper, we focus on the problem of scheduling data items to broadcast in such a pull-based environment. Previous work has shown that the Longest Wait First heuristic has the best performance results compared to all other broadcast scheduling algorithms; however, its decision overhead prevents its practical implementation. Observing this fact, we propose an efficient broadcast scheduling algorithm which is based on an approximate version of the Longest Wait First heuristic. We also compare the performance of the proposed algorithm against well-known broadcast scheduling algorithms.
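For orientation, the exact Longest Wait First rule can be sketched as follows: at each broadcast slot, pick the item whose outstanding requests have the largest total waiting time. This is a minimal sketch of the plain heuristic, not the approximate algorithm the paper proposes; the data layout (a dict from item id to request arrival times) and the function name are assumptions made for illustration.

```python
def longest_wait_first(pending, now):
    """Pick the item whose pending requests have the largest total waiting time.

    pending maps an item id to the arrival times of its outstanding requests.
    Returns None when no request is pending.
    """
    best_item, best_wait = None, -1.0
    for item, arrivals in pending.items():
        if not arrivals:
            continue
        total_wait = sum(now - t for t in arrivals)
        if total_wait > best_wait:
            best_item, best_wait = item, total_wait
    return best_item


# Hypothetical request queue at time 6.0.
pending = {"a": [0.0, 4.0], "b": [1.0, 2.0, 3.0], "c": [5.0]}
chosen = longest_wait_first(pending, now=6.0)
print(chosen)
```

Every decision scans all pending requests for all items, which illustrates the decision overhead the abstract mentions as the obstacle to practical use.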

