Opportunistic Data Structures with Applications, 2000
Cited by 142 (11 self)
In this paper we address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy decreases when the input is compressible, and this space reduction is achieved at no significant slowdown in query performance. More precisely, its space occupancy is optimal in an information-content sense because a text $T[1, u]$ is stored using $O(H_k(T)) + o(1)$ bits per input symbol in the worst case, where $H_k(T)$ is the $k$th order empirical entropy of $T$ (the bound holds for any fixed $k$). Given an arbitrary string $P[1, p]$, the opportunistic data structure allows one to search for the $occ$ occurrences of $P$ in $T$ in $O(p + occ \log^{\epsilon} u)$ time (for any fixed $\epsilon > 0$). If data are incompressible we achieve the best space bound currently known [12]; on compressible data our solution improves the succinct suffix array of [12] and the classical suffix tree and suffix array data structures either in space or in query time or both.
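The space bound above is stated in terms of the kth order empirical entropy. As a rough illustration only (not the paper's data structure), the sketch below computes that quantity under the usual definition: the zeroth-order entropy of the symbols that follow each length-k context, weighted by how often the context occurs and divided by the text length. The function names `h0` and `empirical_entropy` are hypothetical and chosen for this example.

```python
from collections import Counter, defaultdict
from math import log2


def h0(counts):
    """Zeroth-order empirical entropy (bits per symbol) of a symbol histogram."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())


def empirical_entropy(text, k):
    """kth-order empirical entropy of text, in bits per input symbol.

    Assumes the standard definition: for every length-k context w, take the
    zeroth-order entropy of the symbols following w, weight it by the number
    of such symbols, and divide the total by len(text).
    """
    if not text:
        return 0.0
    if k == 0:
        return h0(Counter(text))
    followers = defaultdict(Counter)
    for i in range(len(text) - k):
        followers[text[i:i + k]][text[i + k]] += 1
    weighted = sum(sum(c.values()) * h0(c) for c in followers.values())
    return weighted / len(text)
```

For a strictly alternating text such as "abababab", the zeroth-order value is 1 bit per symbol while the first-order value is 0, which is the kind of gap an entropy-bounded index can exploit.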
Traffic Analysis of a Web Proxy Caching Hierarchy - IEEE Network, special issue on Web performance, 2000
Cited by 41 (13 self)
The World-Wide Web (WWW or Web) has experienced phenomenal growth in recent years. This growth of the Web has contributed significantly to the network traffic on the Internet, and motivated much research into improving the performance and scalability of the Web. In recent years, Web proxy caches have been deployed to reduce network traffic and provide better response time for Web accesses. A Web proxy consists of application-level software that accepts document retrieval requests from a set of clients, forwards these requests to appropriate servers if the requested documents are not already present in the proxy's cache, and sends documents back to the clients. Originally, proxies were designed to allow network administrators to control Internet access from within an Intranet [1]. It was recognized, however, that proxies may also serve as repositories for frequently requested documents. This role of proxies has made them very popular. Caching documents at the proxy can save network bandwidth and reduce network latency for document accesses [2].
An Efficient Compression Code for Text Databases
Cited by 21 (6 self)
We present a new compression format for natural language texts, allowing both exact and approximate search without decompression. This new code (called End-Tagged Dense Code) has some advantages with respect to other compression techniques with similar features such as the Tagged Huffman Code of [Moura et al., ACM TOIS 2000]. Our compression method obtains (i) better compression ratios, (ii) a smaller and simpler vocabulary representation, and (iii) a simpler and faster encoding. At the same time, it retains the most interesting features of the method based on the Tagged Huffman Code, i.e., exact search for words and phrases directly on the compressed text using any known sequential pattern matching algorithm, efficient word-based approximate and extended searches without any decoding, and efficient decompression of arbitrary portions of the text. As a side effect, our analytical results give new upper and lower bounds for the redundancy of d-ary Huffman codes.
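The abstract does not spell out the code itself, so the following is only a minimal sketch, assuming the commonly described byte-oriented form of End-Tagged Dense Code: words are ranked by frequency, every codeword is a byte sequence, and only the final byte of a codeword has its high bit set. The helper names (`etdc_encode`, `etdc_decode_stream`, `compress`) are hypothetical.

```python
from collections import Counter


def etdc_encode(rank):
    """Codeword for a 0-based frequency rank.

    Assumption: 128 byte values (high bit set) end a codeword and the other
    128 values (high bit clear) continue it.
    """
    out = [128 + rank % 128]          # final byte carries the end tag
    rank //= 128
    while rank > 0:
        rank -= 1                     # dense: no code space is wasted
        out.append(rank % 128)        # leading bytes have the high bit clear
        rank //= 128
    return bytes(reversed(out))


def etdc_decode_stream(data):
    """Split a compressed byte stream at end-tagged bytes and recover ranks."""
    ranks, acc = [], 0
    for b in data:
        if b >= 128:                  # end tag: codeword is complete
            ranks.append(acc * 128 + (b - 128))
            acc = 0
        else:
            acc = acc * 128 + b + 1
    return ranks


def compress(words):
    """Rank words by decreasing frequency and concatenate their codewords."""
    vocab = [w for w, _ in Counter(words).most_common()]
    rank_of = {w: r for r, w in enumerate(vocab)}
    return vocab, b"".join(etdc_encode(rank_of[w]) for w in words)
```

Because a codeword boundary is marked by the high bit of its last byte, a word can be located by searching the compressed bytes for its codeword and checking that the preceding byte (if any) also carries the end tag, which is what makes search without decompression plausible in this scheme.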
Multicast Video-on-Demand Services - ACM Computer Communication Review, 2002
Cited by 15 (0 self)
The server's storage I/O and network I/O bandwidths are the main bottlenecks of VoD service. Multicast offers an efficient means of distributing a video program to multiple clients, thus greatly improving VoD performance. However, there are many problems to overcome before multicast VoD systems can be developed. This paper critically evaluates and discusses the recent progress in developing multicast VoD systems. We first present the concept and architecture of multicast VoD, and then introduce the techniques used in multicast VoD systems. We also analyze and evaluate problems related to multicast VoD service. Finally, we present open issues on multicast VoD as possible future research directions.
Web Proxy Workload Characterisation And Modelling, 1999
Cited by 10 (2 self)
Understanding WWW traffic characteristics is key to improving the performance and scalability of the Web. In the first part of this thesis, Web proxy workloads from different levels of a caching hierarchy are used to understand how the workload characteristics change across different levels of a caching hierarchy. The main observations of this study are: HTML and image documents account for 95% of the documents seen in the workload; the distribution of transfer sizes of documents is heavy-tailed, with the tails becoming heavier as one moves from the client side to the server side of the network; the popularity profile of documents does not precisely follow the Zipf distribution; one-timers account for approximately 70% of the documents referenced; concentration of references is less at proxy caches than at servers, and concentration of references is higher at lower-level proxies than at higher-level proxies; there appears to be no correlation between document modification rate and document pop...
Observation of Changing Information Sources, 2000
Cited by 8 (0 self)
Many modern information management tasks consist of an observer that must maintain current knowledge of a collection of changing information. The goal of this observer is to maintain acceptably accurate state estimates given limited observation resources, such as bandwidth, time, and storage. Good examples of such "observation problems" are found in any situation where bandwidth is limited and old observations become less useful over time. Two such examples are maintaining a search engine's index of the World Wide Web (WWW) and automated monitoring of multiple sensors. This thesis addresses the general observation problem by (1) devising a formal framework of what it means to be "up-to-date", (2) gathering empirical data about the web that allows us to apply this framework to an important setting, and (3) presenting algorithms for scheduling revisits to optimize formal performance measures. One year's worth of web page observations are analyzed to show how quickly and in what ways web ...
Enhancing Hyperlink Structure for Improving Web Performance, 2003
"... In a Web site, each page v has a probability... ..."
Neutral Networks of Minimum Free Energy RNA Secondary Structures, 2000
Cited by 2 (0 self)
In this thesis the sequence to secondary structure mapping of natural RNA molecules is approached from three different directions. Exhaustive folding of an entire sequence space yields the most highly resolved picture of the landscape, yet if the model is to be close to biological reality (that is, the natural base alphabet is used and the folding algorithm is the biophysically meaningful minimization of the free energy of folding) then this approach is computationally too costly for all but very small sequence spaces. Exhaustive folding data is presented for the sequence space $Q^{16}_{AUGC}$ and the applicability of a random graph model to the neutral networks of this space is discussed. The folding landscape exhibits the features which are known to be characteristic of the RNA sequence to secondary structure mapping: a distinction of common and rare structures, a near isotropic distribution of the sequences which fold into the former, closeness in space of sequenc...
Evolution At Molecular Resolution - Nonlinear Cooperative Phenomena in Biological Systems, 1998
Biological evolution is too complex and too slow for experimental investigation. In order to make evolutionary phenomena accessible to systematic studies one needs (i) to reduce generation times in order to speed up evolution, (ii) to minimize complexity of phenotypes in order to allow for an analysis of genotype-phenotype relations, and (iii) to shorten genotype lengths in order to keep possible diversity below a certain limit. All three conditions are fulfilled, for example, by test-tube experiments on optimization of RNA molecules. Evolution of molecules in the test tube is indeed the simplest and the only currently known realistic system that allows the study of the mechanisms of biological evolution at molecular resolution. Both the experimental approach and the development of theory have reached a point from which systematic studies and global investigations of the rules underlying the dynamics of evolutionary processes are required in order to make progress in

