The strength of weak learnability
Machine Learning, 1990
Cited by 554 (22 self)
Abstract. This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent. A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
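The abstract does not describe the conversion method itself. As rough intuition only, the sketch below shows why combining hypotheses that are slightly better than random guessing can lower the error: it computes the error of a majority vote over three hypotheses under the assumption (ours, not the abstract's) that their errors are independent. The function names are hypothetical.

```python
def majority_of_three_error(a):
    """Error of a majority vote over three hypotheses, each wrong with
    probability a. ASSUMPTION: the three errors are independent."""
    # The vote is wrong when at least two of the three are wrong:
    # 3 * a^2 * (1 - a) + a^3 = 3a^2 - 2a^3
    return 3 * a**2 - 2 * a**3


def errors_by_level(a, levels):
    """Apply the three-way vote recursively and record the error per level."""
    errors = [a]
    for _ in range(levels):
        errors.append(majority_of_three_error(errors[-1]))
    return errors


if __name__ == "__main__":
    # Start from a weak hypothesis that is wrong 40% of the time.
    for level, err in enumerate(errors_by_level(0.4, 5)):
        print(level, round(err, 6))
```

For any starting error strictly below 1/2, each level of voting reduces the error under this independence assumption; at exactly 1/2 the vote gains nothing.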
Optimal Prefetching via Data Compression
1995
Cited by 226 (11 self)
Caching and prefetching are important mechanisms for speeding up access time to data on secondary storage. Recent work in competitive online algorithms has uncovered several promising new algorithms for caching. In this paper we apply a form of the competitive philosophy for the first time to the problem of prefetching to develop an optimal universal prefetcher in terms of fault ratio, with particular applications to large-scale databases and hypertext systems. Our prediction algorithms for prefetching are novel in that they are based on data compression techniques that are both theoretically optimal and good in practice. Intuitively, in order to compress data effectively, you have to be able to predict future data well, and thus good data compressors should be able to predict well for purposes of prefetching. We show for powerful models such as Markov sources and nth order Markov sources that the page fault rates incurred by our prefetching algorithms are optimal in the limit for almost all sequences of page requests.
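The link between prediction and prefetching can be illustrated with a toy model. The sketch below is not the paper's algorithm: it is a simple order-k frequency predictor (the kind of context model used inside some compressors) that guesses the next page from the last k requests and counts a fault whenever the guess is wrong. The class and function names are hypothetical, and prefetching exactly one page per step is an assumption made for brevity.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Order-k frequency model: predicts the next page from the last k pages."""

    def __init__(self, order=1):
        self.order = order
        self.counts = defaultdict(Counter)  # context tuple -> next-page counts
        self.history = []

    def predict(self):
        """Return the most frequent successor of the current context, or None."""
        context = tuple(self.history[-self.order:])
        seen = self.counts.get(context)
        if not seen:
            return None
        return seen.most_common(1)[0][0]

    def update(self, page):
        """Record that `page` followed the current context."""
        if len(self.history) >= self.order:
            context = tuple(self.history[-self.order:])
            self.counts[context][page] += 1
        self.history.append(page)


def fault_ratio(requests, order=1):
    """Fraction of requests that the single prefetched page did not match."""
    model = ContextPredictor(order)
    faults = 0
    for page in requests:
        if model.predict() != page:
            faults += 1
        model.update(page)
    return faults / len(requests)


if __name__ == "__main__":
    print(fault_ratio(list("abcabcabcabc")))
```

On a strictly periodic request sequence the model faults only while it is still seeing each context for the first time, which is the intuition behind "good compressors predict well".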
The minimum consistent DFA problem cannot be approximated within any polynomial
Journal of the Association for Computing Machinery, 1993
Cited by 73 (4 self)
Abstract. The minimum consistent DFA problem is that of finding a DFA with as few states as possible that is consistent with a given sample (a finite collection of words, each labeled as to whether the DFA found should accept or reject). Assuming that P ≠ NP, it is shown that for any constant k, no polynomial-time algorithm can be guaranteed to find a consistent DFA with fewer than opt^k states, where opt is the number of states in the minimum state DFA consistent with the sample. This result holds even if the alphabet is of constant size two, and if the algorithm is allowed to produce an NFA, a regular expression, or a regular grammar that is consistent with the sample. A similar nonapproximability result is presented for the problem of finding small consistent linear grammars. For the case of finding minimum consistent DFAs when the alphabet is not of constant size but instead is allowed to vary with the problem specification, the slightly
Learning in the Presence of Finitely or Infinitely Many Irrelevant Attributes
1995
Cited by 46 (8 self)
This paper addresses the problem of learning boolean functions in query and mistake-bound
Mobility-Based Predictive Call Admission Control and Bandwidth Reservation in Wireless Cellular Networks
IEEE INFOCOM, 2001
Cited by 39 (3 self)
This paper presents call admission control and bandwidth reservation schemes in wireless cellular networks that have been developed based on assumptions more realistic than existing proposals. In order to guarantee the handoff dropping probability, we propose to statistically predict user mobility based on the mobility history of users. Our mobility prediction scheme is motivated by computational learning theory, which has shown that prediction is synonymous with data compression. We derive our mobility prediction scheme from data compression techniques that are both theoretically optimal and good in practice. In order to utilize resources more efficiently, we predict not only the cell to which the mobile will hand off but also when the handoff will occur. Based on the mobility prediction, bandwidth is reserved to guarantee some target handoff dropping probability. We also adaptively control the admission threshold to achieve a better balance between guaranteeing handoff dropping probability and maximizing resource utilization. Simulation results show that the proposed schemes meet our design goals and outperform the static-reservation and cell-reservation schemes. Paper submitted to Computer Networks. This paper is based on a paper presented at IEEE Infocom 2001, Anchorage, Alaska, April 2001. Technical subject area: call admission control, bandwidth reservation, mobility prediction. Please address all correspondence to Professor Victor Leung at the above address. This work was supported by a grant from Motorola Canada Ltd., and by the Canadian Natural Sciences and Engineering Research Council under grant CRDPJ 223095.
Compression, Significance and Accuracy
1992
Cited by 39 (5 self)
Inductive Logic Programming (ILP) involves learning relational concepts from examples and background knowledge. To date all ILP learning systems make use of tests inherited from propositional and decision tree learning for evaluating the significance of hypotheses. None of these significance tests take account of the relevance or utility of the background knowledge. In this paper we describe a method, called HP-compression, of evaluating the significance of a hypothesis based on the degree to which it allows compression of the observed data with respect to the background knowledge. This can be measured by comparing the lengths of the input and output tapes of a reference Turing machine which will generate the examples from the hypothesis and a set of derivational proofs. The model extends an earlier approach of Muggleton by allowing for noise. The truth values of noisy instances are switched by making use of correction codes. The utility of compression as a significance measure is evaluated empirically in three independent domains. In particular, the results show that the existence of positive compression distinguishes a larger number of significant clauses than other significance tests. The method is also shown to reliably distinguish artificially introduced noise as incompressible data.
Statistical Queries and Faulty PAC Oracles
In Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory, 1993
Cited by 37 (6 self)
In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is sufficient for learning in the PAC model with malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound for tolerable malicious error in learning monomials of k literals. This is the first such bound which is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution specific algorithms on distributions outside their prescr...
A Markovian extension of Valiant's learning model (Extended Abstract)
In Proceedings of the Thirty-First Symposium on Foundations of Computer Science, 1990
Cited by 28 (0 self)
Formalizing the process of natural induction and justifying its predictive value is not only basic to the phi-
Relevant Examples and Relevant Features: Thoughts from Computational Learning Theory
In AAAI Fall Symposium on 'Relevance', 1994
Cited by 23 (0 self)
In this paper I will attempt to survey some of the results and intuitions developed in the area of computational learning theory. My focus will be on two issues in particular: that some examples may be more relevant than others, and that within an example, some features may be more relevant than others. This survey is by no means even close to comprehensive, and strongly reflects my own personal biases as well as issues brought up by results presented at this workshop. Issues of relevance are fundamental in the theoretical study of machine learning. In particular, questions regarding the meaning of a "relevant" or "informative" example are key motivations for the most popular and most basic theoretical models. Let me begin in the traditional manner of defining the basic models discussed, but do so from the point of view of the motivations from "relevance."
Distinguishing Exceptions from Noise in Non-Monotonic Learning
1996
Cited by 21 (4 self)
It is important for a learning program to have a reliable method of deciding whether to treat errors as noise or to include them as exceptions within a growing first-order theory. We explore the use of an information-theoretic measure to decide this problem within the non-monotonic learning framework defined by Closed-World-Specialisation. The approach adopted uses a model that consists of a reference Turing machine which accepts an encoding of a theory and proofs on its input tape and generates the observed data on the output tape. Within this model, the theory is said to "compress" data if the length of the input tape is shorter than that of the output tape. Data found to be incompressible are deemed to be "noise".

