How to Use Expert Advice
Journal of the Association for Computing Machinery, 1997
Cited by 350 (71 self)
We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
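To make the performance measure concrete, here is a minimal Python sketch of a randomized weighted-majority predictor. It is a generic multiplicative-weights scheme, not necessarily the algorithm analyzed in the paper. The function names and the fixed penalty factor `beta` are assumptions made for illustration. The sketch computes the expected number of mistakes exactly, by summing the per-round probability of a wrong prediction, and compares it with the best expert.

```python
def expected_mistakes(expert_preds, outcomes, beta=0.5):
    """Expected mistakes of a randomized weighted-majority predictor.

    expert_preds[t][i] is expert i's bit at round t; outcomes[t] is the true bit.
    beta (hypothetical parameter) is the factor applied to a wrong expert's weight.
    """
    n = len(expert_preds[0])
    weights = [1.0] * n
    expected = 0.0
    for preds, y in zip(expert_preds, outcomes):
        total = sum(weights)
        # Predict 1 with probability equal to the weight fraction voting for 1.
        p_one = sum(w for w, p in zip(weights, preds) if p == 1) / total
        # Probability that the randomized prediction is wrong this round.
        expected += (1.0 - p_one) if y == 1 else p_one
        # Penalize every expert that was wrong.
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return expected


def best_expert_mistakes(expert_preds, outcomes):
    """Number of mistakes made by the best single expert in hindsight."""
    n = len(expert_preds[0])
    return min(
        sum(1 for preds, y in zip(expert_preds, outcomes) if preds[i] != y)
        for i in range(n)
    )


# Expert 0 is always right, expert 1 is always wrong.
outcomes = [1, 0, 1, 1]
expert_preds = [[1, 0], [0, 1], [1, 0], [1, 0]]
regret = expected_mistakes(expert_preds, outcomes) - best_expert_mistakes(expert_preds, outcomes)
print(regret)
```

The printed quantity is the difference the abstract bounds: the predictor's expected mistakes minus those of the best expert.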
Optimal Prefetching via Data Compression
1995
Cited by 254 (9 self)
Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault ratio, with particular applications to large-scale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, you have to be able to predict future data well, and thus good data compressors should be able to predict well for purposes of prefetching. We show for powerful models such as Markov sources and nth-order Markov sources that the page fault rates incurred by our prefetching algorithms are optimal in the limit for almost all sequences of page requests.
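The link between prediction and prefetching can be illustrated with a deliberately simplified Python sketch: a first-order frequency model that prefetches the page most often seen after the current one. This is not the compression-based prefetcher of the paper. The function name, the one-page prefetch buffer, and the rule that a request counts as a fault whenever the prefetched page is wrong are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def prefetch_fault_rate(requests):
    """Fraction of requests not covered by a one-page, first-order prefetcher."""
    transitions = defaultdict(Counter)  # transitions[p][q] = times q followed p
    faults = 0
    prev = None
    for page in requests:
        predicted = None
        if prev is not None and transitions[prev]:
            # Prefetch the page seen most often after the previous page.
            predicted = transitions[prev].most_common(1)[0][0]
        if predicted != page:
            faults += 1
        if prev is not None:
            transitions[prev][page] += 1  # learn from the observed transition
        prev = page
    return faults / len(requests)


# A perfectly periodic request stream is learned after one pass through the cycle.
print(prefetch_fault_rate(list("abcabcabcabc")))
```

On the periodic stream above, the only faults occur while the model is still seeing each transition for the first time.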
Toward efficient agnostic learning
In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, 1992
Cited by 219 (7 self)
In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
On the learnability of discrete distributions
In The 25th Annual ACM Symposium on Theory of Computing, 1994
Cited by 104 (11 self)
We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled ...
On the Boosting Ability of Top-Down Decision Tree Learning Algorithms
In Proceedings of the Twenty-Eighth Annual ACM Symposium on the Theory of Computing, 1995
Cited by 98 (6 self)
We analyze the performance of top-down algorithms for decision tree learning, such as those employed by the widely used C4.5 and CART software packages. Our main result is a proof that such algorithms are boosting algorithms. By this we mean that if the functions used to label the internal nodes of the decision tree can weakly approximate the unknown target function, then the top-down algorithms we study will amplify this weak advantage to build a tree achieving any desired level of accuracy. The bounds we obtain for this amplification show an interesting dependence on the splitting criterion function $G$ used by the top-down algorithm. More precisely, if the functions used to label the internal nodes have error $1/2 - \gamma$ as approximations to the target function, then for the splitting criteria used by CART and C4.5, trees of size $(1/\epsilon)^{O(1/(\gamma^2 \epsilon^2))}$ and $(1/\epsilon)^{O(\log(1/\epsilon)/\gamma^2)}$ (respectively) suffice to drive the error below $\epsilon$. Thus, small constant advantage over...
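As a rough illustration of what a splitting criterion does in a top-down learner, the Python sketch below scores a candidate split by the drop in an impurity function G. The Gini form is the one commonly associated with CART and the binary entropy with C4.5. The function names, the scaling of Gini so that G(1/2) = 1, and the boolean-mask representation of a split are assumptions made for illustration, not details taken from the paper.

```python
import math


def gini(q):
    """Gini impurity of a node with positive fraction q, scaled so gini(0.5) == 1."""
    return 4.0 * q * (1.0 - q)


def entropy(q):
    """Binary entropy (in bits) of a node with positive fraction q."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)


def split_gain(G, labels, mask):
    """Drop in impurity G when 0/1 labels are split into mask / not-mask children."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    if not left or not right:
        return 0.0  # a split that sends everything one way gains nothing
    n = len(labels)
    frac_pos = lambda ys: sum(ys) / len(ys)
    after = (len(left) / n) * G(frac_pos(left)) + (len(right) / n) * G(frac_pos(right))
    return G(frac_pos(labels)) - after


labels = [1, 1, 0, 0]
print(split_gain(gini, labels, [True, True, False, False]))   # perfectly separating split
print(split_gain(gini, labels, [True, False, True, False]))   # uninformative split
```

A top-down algorithm of this kind would repeatedly pick the leaf and the split with the largest gain. The abstract's bounds describe how large the resulting tree must grow, depending on which G is used.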
Adaptive Disk Spindown via Optimal Rent-to-Buy in Probabilistic Environments
1999
Cited by 88 (4 self)
In the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used we need to decide when to buy the resource, given that we can rent the resource for $1 per unit time or buy it once and for all for $c. In this paper we study algorithms that make a sequence of single rent-to-buy decisions, using the assumption that the resource use times are independently drawn from an unknown probability distribution. Our study of this rent-to-buy problem is motivated by important systems applications, specifically, problems arising from deciding when to spin down disks to conserve energy in mobile computers [4], [13], [15], thread blocking decisions during lock acquisition in multiprocessor applications [7], and virtual circuit holding times in IP-over-ATM networks [11], [19]. We develop a provably optimal and computationally efficient algorithm for the rent-to-buy problem. Our algorithm uses $O(\sqrt{t})$ time and space, and its expected cost for the $t$th resource use converges to optimal as $O(\sqrt{\log t / t})$, for any bounded probability distribution on the resource use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal. We describe the experimental results for the application of our algorithm to one of the motivating systems problems: the question of when to spin down a disk to save power in a mobile computer. Simulations using disk access traces obtained from an HP workstation environment suggest that our algorithm yields significantly improved power/response time performance over the nonadaptive 2-competitive algorithm which is optimal in the worst-case competitive analysis model.
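The rent-to-buy cost model in the abstract can be sketched in a few lines of Python. A threshold strategy rents until time `threshold` and then buys. The sketch picks the threshold with the lowest average cost on past use times by brute force over the observed values. This is a plain empirical-minimization illustration, not the paper's $O(\sqrt{t})$ algorithm, and the function names are assumptions made for illustration.

```python
def cost(threshold, use_time, buy_cost):
    """Cost of renting at 1 per unit time until `threshold`, then buying for `buy_cost`."""
    if use_time <= threshold:
        return use_time              # the use ended before we bought: rent only
    return threshold + buy_cost      # rented up to the threshold, then bought


def best_threshold(past_use_times, buy_cost):
    """Threshold with the lowest average cost on the observed use times."""
    # Thresholds between observed values are never better than the next lower
    # candidate, so it is enough to try 0 (buy at once) and each observed time.
    candidates = [0.0] + sorted(set(past_use_times))

    def average_cost(threshold):
        return sum(cost(threshold, t, buy_cost) for t in past_use_times) / len(past_use_times)

    return min(candidates, key=average_cost)


# Mostly short uses with one long one: rent briefly, then buy.
print(best_threshold([1, 1, 1, 10], buy_cost=3))
```

In the disk application, "renting" roughly corresponds to keeping the disk spinning during an idle period and "buying" to paying the fixed cost of spinning it down and back up.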
Learning distributions by their density levels: A paradigm for learning without a teacher
Journal of Computer and System Sciences, 1997
Cited by 28 (3 self)
We propose a mathematical model for learning the high-density areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of learning task has not been previously addressed in the Computational Learnability literature, we believe that this is a rather basic problem that appears in many practical learning scenarios. From a statistical theory standpoint, our model may be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. From a computational learning angle, what we propose is a new framework of unsupervised concept learning. The examples provided to the learner in our model are not labeled (and are not necessarily all positive or all negative). The only information about their membership is indirectly disclosed to the student through the sampling distribution. We investigate the basic features of the proposed model and provide lower and upper bounds on the sample complexity of such learning tasks. Our main result is that the learnability of a class of distributions in this setting is equivalent to the finiteness of the VC-dimension of the class of the high-density areas of these distributions. One direction of the proof involves a reduction of the density-level learnability to p-concepts learnability, while the sufficiency condition is proved through the introduction of a generic learning algorithm.
Combining protein secondary structure prediction models with ensemble methods of optimal complexity
2004
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants
1997
Cited by 20 (4 self)
There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models provide a probabilistic framework for the discussion of generalization and learning. This survey concentrates on the sample complexity questions in these models; that is, the emphasis is on how many examples should be used for training. Computational complexity considerations are briefly discussed for the basic PAC model. Throughout, the importance of the Vapnik-Chervonenkis dimension is highlighted. Particular attention is devoted to describing how the probabilistic models apply in the context of neural network learning, both for networks with binary-valued output and for networks with real-valued output.
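For a sense of what a sample complexity statement looks like, the Python sketch below evaluates the textbook bound for a consistent learner over a finite hypothesis class: m >= (ln|H| + ln(1/delta)) / epsilon examples suffice for error at most epsilon with probability at least 1 - delta. This finite-class bound is a standard result used here for illustration. It is not quoted from the survey, which emphasizes the Vapnik-Chervonenkis dimension, and the function name is an assumption.

```python
import math


def pac_sample_bound(num_hypotheses, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite hypothesis class."""
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)


# 1024 hypotheses, 10% error, 95% confidence.
print(pac_sample_bound(1024, epsilon=0.1, delta=0.05))
```

The bound grows only logarithmically in the number of hypotheses and in 1/delta, but linearly in 1/epsilon.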