Results 1 - 10
of
19
Optimal Prefetching via Data Compression
, 1995
"... Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the pr ..."
Abstract
-
Cited by 226 (11 self)
- Add to MetaCart
Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault ratio, with particular applications to large-scale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, you have to be able to predict future data well, and thus good data compressors should be able to predict well for purposes of prefetching. We show for powerful models such as Markov sources and nth order Markov sources that the page fault rates incurred by our prefetching algorithms are optimal in the limit for almost all sequences of page requests.
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
- Machine Learning
, 1996
"... . We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions gene ..."
Abstract
-
Cited by 148 (15 self)
- Add to MetaCart
. We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs, can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
Opportunistic Data Structures with Applications
, 2000
"... In this paper we address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible and this space ..."
Abstract
-
Cited by 142 (11 self)
- Add to MetaCart
In this paper we address the issue of compressing and indexing data. We devise a data structure whose space occupancy is a function of the entropy of the underlying data set. We call the data structure opportunistic since its space occupancy is decreased when the input is compressible and this space reduction is achieved at no significant slowdown in the query performance. More precisely, its space occupancy is optimal in an information-content sense because a text T [1, u] is stored using O(H k (T )) + o(1) bits per input symbol in the worst case, where H k (T ) is the kth order empirical entropy of T (the bound holds for any fixed k). Given an arbitrary string P [1; p], the opportunistic data structure allows to search for the occ occurrences of P in T in O(p + occ log u) time (for any fixed > 0). If data are uncompressible we achieve the best space bound currently known [12]; on compressible data our solution improves the succinct suffix array of [12] and the classical suffix tree and suffix array data structures either in space or in query time or both.
Adaptive Disk Spindown via Optimal Rent-to-Buy in Probabilistic Environments
, 1999
"... In the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used we need to decide when to buy the resource, given that we can rent the resource for $1 per unit time or buy it once and for all for $c. In this paper we study algorithms that make a ..."
Abstract
-
Cited by 71 (4 self)
- Add to MetaCart
In the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used we need to decide when to buy the resource, given that we can rent the resource for $1 per unit time or buy it once and for all for $c. In this paper we study algorithms that make a sequence of single rent-to-buy decisions, using the assumption that the resource use times are independently drawn from an unknown probability distribution. Our study of this rent-to-buy problem is motivated by important systems applications, specifically, problems arising from deciding when to spindown disks to conserve energy in mobile computers [4], [13], [15], thread blocking decisions during lock acquisition in multiprocessor applications [7], and virtual circuit holding times in IP-over-ATM networks [11], [19]. We develop a provably optimal and computationally efficient algorithm for the rent-to-buy problem. Our algorithm uses O ( √ t) time and space, and its expected cost for the tth resource use converges to optimal as O ( √ log t/t), for any bounded probability distribution on the resource use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal. We describe the experimental results for the application of our algorithm to one of the motivating systems problems: the question of when to spindown a disk to save power in a mobile computer. Simulations using disk access traces obtained from an HP workstation environment suggest that our algorithm yields significantly improved power/response time performance over the nonadaptive 2-competitive algorithm which is optimal in the worst-case competitive analysis model.
Integrated Parallel Prefetching and Caching
, 1995
"... Recently there has been a great deal of interest in prefetching from parallel disks, as a technique for enabling serial applications to improve I/O performance. Studies have also shown that for optimal performance, it is important to properly integrate prefetching and caching. In this paper, we stud ..."
Abstract
-
Cited by 63 (5 self)
- Add to MetaCart
Recently there has been a great deal of interest in prefetching from parallel disks, as a technique for enabling serial applications to improve I/O performance. Studies have also shown that for optimal performance, it is important to properly integrate prefetching and caching. In this paper, we study integrated prefetching and caching strategies for multiple disks. We present two algorithms, regular aggressive and reverse aggressive, and show that reverse aggressive is close to optimal. Using trace-driven simulation on a collection of file access traces, we evaluated these algorithms under a variety of data placement alternatives. Our results show that both algorithms can achieve near linear speedup when the load is distributed evenly on the disks, and reverse aggressive performs well even when the placement of blocks on disks distributes the load unevenly. Our simulations also show that, for file system traces, replicating data, even across all of the disks, offers little performance ...
Minimizing Stall Time in Single and Parallel Disk Systems
, 1998
"... We study integrated prefetching and caching problems following the work of Cao et. al. [3] and Kimbrel and Karlin [14]. Cao et. al. and Kimbrel and Karlin gave approximation algorithms for minimizing the total elapsed time in single and parallel disk settings. The total elapsed time is the sum of ..."
Abstract
-
Cited by 29 (0 self)
- Add to MetaCart
We study integrated prefetching and caching problems following the work of Cao et. al. [3] and Kimbrel and Karlin [14]. Cao et. al. and Kimbrel and Karlin gave approximation algorithms for minimizing the total elapsed time in single and parallel disk settings. The total elapsed time is the sum of the processor stall times and the length of the request sequence to be served. We show that an optimum prefetching/caching schedule for a single disk problem can be computed in polynomial time, thereby settling an open question by Kimbrel and Karlin. For the parallel disk problem we give an approximation algorithm for minimizing stall time. Stall time is an important and harder to approximate measure for this problem. All of our algorithms are based on a new approach which involves formulating the prefetching/caching problems as integer programs.
Online Prediction Algorithms for Databases and Operating Systems
, 1995
"... In making online decisions, computer systems are inherently trying to predict future events. Typical decision problems in computer systems translate to three prediction scenarios: predicting what event is going to happen in the future, when a specific event will take place, or how much of something ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
In making online decisions, computer systems are inherently trying to predict future events. Typical decision problems in computer systems translate to three prediction scenarios: predicting what event is going to happen in the future, when a specific event will take place, or how much of something is going to happen. In this thesis, we develop practical algorithms for specific instances of these three prediction scenarios, and prove the goodness of our algorithms via analytical and experimental methods. We study each of the three prediction scenarios via motivating systems problems. The problem of prefetching requires a prediction of which page is going to be next requested by a user. The problem of disk spindown in mobile machines, modeled by the rent-to-buy framework, requires an estimate of when the next disk access is going to happen. Query optimizers choose a database access strategy by predicting or estimating selectivity, i.e., by estimating the size of a query result. We an...
Limits to Branch Prediction
- In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VII
, 1996
"... Branch prediction is an important mechanism in modern microprocessor design. The focus of research in this area has been on designing new branch prediction schemes. In contrast, very few studies address the inherent limit of predictability of program themselves. Programs have an inherent limit of pr ..."
Abstract
-
Cited by 11 (1 self)
- Add to MetaCart
Branch prediction is an important mechanism in modern microprocessor design. The focus of research in this area has been on designing new branch prediction schemes. In contrast, very few studies address the inherent limit of predictability of program themselves. Programs have an inherent limit of predictability due to the randomness of input data. Knowing the limit helps us to evaluate how good a prediction scheme is and how much we can expect to improve its accuracy. In this paper we propose two complementary approaches to estimating the limits of predictability: exact analysis of the program and the use of a universal compression/prediction algorithm, prediction by partial matching (PPM), that has been very successful in the field of data and image compression. We review the algorithmic basis for both some common branch predictors and PPM and show that two-level branch prediction, the best method currently in use, is a simplified version of PPM. To illustrate exact analysis, we use ...
Web-Log Mining for Predictive Web Caching
- IEEE Transactions on Knowledge and Data Engineering
, 2003
"... Abstract—Caching is a well-known strategy for improving the performance of Web-based systems. The heart of a caching system is its page replacement policy, which selects the pages to be replaced in a cache when a request arrives. In this paper, we present a Web-log mining method for caching Web obje ..."
Abstract
-
Cited by 10 (0 self)
- Add to MetaCart
Abstract—Caching is a well-known strategy for improving the performance of Web-based systems. The heart of a caching system is its page replacement policy, which selects the pages to be replaced in a cache when a request arrives. In this paper, we present a Web-log mining method for caching Web objects and use this algorithm to enhance the performance of Web caching systems. In our approach, we develop an n-gram-based prediction algorithm that can predict future Web requests. The prediction model is then used to extend the well-known GDSF caching policy. We empirically show that the system performance is improved using the predictive-caching approach. Index Terms—Web log mining, Web caching, prediction models, classification.
Evaluating Next-Cell Predictors with Extensive Wi-Fi Mobility Data
- IEEE Transactions on Mobile Computing
, 2004
"... Location is an important feature for many applications, and wireless networks can better serve their clients by anticipating client mobility. As a result, many location predictors have been proposed in the literature, though few have been evaluated with empirical evidence. This paper reports on th ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
Location is an important feature for many applications, and wireless networks can better serve their clients by anticipating client mobility. As a result, many location predictors have been proposed in the literature, though few have been evaluated with empirical evidence. This paper reports on the results of the first extensive empirical evaluation of location predictors, using a two-year trace of the mobility patterns of over 6,000 users on Dartmouth's campus-wide Wi-Fi wireless network.

