Separating Distribution-Free And Mistake-Bound Learning Models Over The Boolean Domain
- SIAM J. COMPUT
, 1990
Cited by 21 (2 self)
Two of the most commonly used models in computational learning theory are the distribution-free model, in which examples are chosen from a fixed but arbitrary distribution, and the absolute mistake-bound model, in which examples are presented in an arbitrary order. Over the Boolean domain, it is known that if the learner is allowed unlimited computational resources then any concept class learnable in one model is also learnable in the other. In addition, any polynomial-time learning algorithm for a concept class in the mistake-bound model can be transformed into one that learns the class in the distribution-free model. This paper
Data Filtering and Distribution Modeling Algorithms for Machine Learning
, 1993
Cited by 18 (4 self)
Acknowledgments
1. Introduction
  1.1 Boosting by majority
  1.2 Query By Committee
  1.3 Learning distributions of binary vectors
2. Boosting a weak learning algorithm by majority
  2.1 Introduction
  2.2 The majority-vote game
    2.2.1 Optimality of the weighting scheme
    2.2.2 The representational power of majority gates
  2.3 Boosting a weak learner using a majority vote
    2.3.1 Preliminaries ...
Calculation of the Learning Curve of Bayes Optimal Classification Algorithm for Learning a Perceptron With Noise
- In Computational Learning Theory: Proceedings of the Fourth Annual Workshop
, 1991
Cited by 15 (6 self)
The learning curve of Bayes optimal classification algorithm when learning a perceptron from noisy random training examples is calculated exactly in the limit of large training sample size and large instance space dimension using methods of statistical mechanics. It is shown that under certain assumptions, in this "thermodynamic" limit, the probability of misclassification of Bayes optimal algorithm is less than that of a canonical stochastic learning algorithm, by a factor approaching √2 as the ratio of number of training examples to instance space dimension grows. Exact asymptotic learning curves for both algorithms are derived for particular distributions. In addition, it is shown that the learning performance of Bayes optimal algorithm can be approximated by certain learning algorithms that use a neural net with a layer of hidden units to learn a perceptron. 1 Introduction Extending a line of research initiated by Elizabeth Gardner [Gar88, GD88], exceptional progress has been ...
Apple tasting
- Information and Computation
Cited by 12 (2 self)
In the standard on-line model the learning algorithm tries to minimize the total number of mistakes made in a series of trials. On each trial the learner sees an instance, makes a prediction of its classification, then finds out the correct classification. We define a natural variant of this model ("apple tasting") where
• the classes are interpreted as the good and bad instances,
• the prediction is interpreted as accepting or rejecting the instance, and
• the learner gets feedback only when the instance is accepted.
We use two transformations to relate the apple tasting model to an enhanced standard model where false acceptances are counted separately from false rejections. We apply our results to obtain a good general-purpose apple tasting algorithm as well as nearly optimal apple tasting algorithms for a variety of standard classes, such as conjunctions and disjunctions of n boolean variables. We also present and analyze a simpler transformation useful when the instances are drawn at random rather than selected by an adversary. © 2000 Academic Press
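The apple tasting protocol described in this abstract can be sketched in a few lines of Python. This is only an illustration of the feedback rule (labels are revealed for accepted instances only), not an algorithm from the paper. The names `run_apple_tasting` and `RejectKnownBad` are hypothetical, and the toy learner simply memorizes instances it has found to be bad.

```python
class RejectKnownBad:
    """Hypothetical toy learner: accept everything not yet known to be bad."""

    def __init__(self):
        self.known_bad = set()

    def predict(self, x):
        # True means "accept" (taste the apple).
        return x not in self.known_bad

    def update(self, x, is_good):
        # Called only for accepted instances, when feedback is available.
        if not is_good:
            self.known_bad.add(x)


def run_apple_tasting(learner, instances, labels):
    """Run the protocol and return (false_accepts, false_rejects)."""
    false_accepts = 0
    false_rejects = 0
    for x, is_good in zip(instances, labels):
        if learner.predict(x):
            # Feedback is revealed only when the instance is accepted.
            learner.update(x, is_good)
            if not is_good:
                false_accepts += 1
        elif is_good:
            # A false rejection: counted here, but never shown to the learner.
            false_rejects += 1
    return false_accepts, false_rejects


if __name__ == "__main__":
    xs = [1, 2, 1, 3, 2]
    ys = [True, False, True, True, False]  # instance 2 is the only bad one
    print(run_apple_tasting(RejectKnownBad(), xs, ys))
```

The two counters are kept separate because the abstract relates apple tasting to a model in which false acceptances and false rejections are counted separately.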
Learning Linearly Separable Languages
- In Proceedings of the 17th International Conference on Algorithmic Learning Theory (ALT 2006)
, 2006
Cited by 10 (3 self)
Abstract. This paper presents a novel paradigm for learning languages that consists of mapping strings to an appropriate high-dimensional feature space and learning a separating hyperplane in that space. It initiates the study of the linear separability of automata and languages by examining the rich class of piecewise-testable languages. It introduces a high-dimensional feature map and proves piecewise-testable languages to be linearly separable in that space. The proof makes use of word combinatorial results relating to subsequences. It also shows that the positive definite kernel associated to this embedding can be computed in quadratic time. It examines the use of support vector machines in combination with this kernel to determine a separating hyperplane and the corresponding learning guarantees. It also proves that all languages linearly separable under a regular finite cover embedding, a generalization of the embedding we used, are regular.
Improved Lower Bounds for Learning from Noisy Examples: an Information-Theoretic Approach
- Proceedings of the 11th Annual Conference on Computational Learning Theory
, 1998
Cited by 9 (0 self)
This paper presents a general information-theoretic approach for obtaining lower bounds on the number of examples needed to PAC learn in the presence of noise. This approach deals directly with the fundamental information quantities, avoiding a Bayesian analysis. The technique is applied to several different models, illustrating its generality and power. The resulting bounds add logarithmic factors to (or improve the constants in) previously known lower bounds.
Minimizing Disagreement for Geometric Regions Using Dynamic Programming, with Applications to Machine Learning and Computer Graphics
, 1996
Cited by 7 (2 self)
We demonstrate that the dynamic programming paradigm is an effective tool in the design of efficient algorithms for solving minimum-disagreement problems for convex polygons, star-shaped polygons, unions of axis-parallel boxes and various other classes of geometric regions. In particular, we show that the minimizing disagreement problem for convex k-gons on a sample of size n can be solved in O(n^6 k) time. Together with earlier known results, we obtain algorithms for learning these geometric regions in the agnostic PAC learning model and the PAC model with random classification noise. Furthermore, these algorithms also allow us to track slowly drifting concepts from these geometric regions. Most of these algorithms can be naturally adapted to solve related discrepancy problems that have applications in image compression, geometrical clustering and numerical integration. 1 Introduction 1.1 The Minimum Disagreements Problem For a collection S of n points in R^d, each point being l...
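To make the notion of a minimum-disagreement problem concrete, here is a minimal Python sketch for a deliberately much simpler hypothesis class than the paper's polygons: one-dimensional thresholds h(x) = (x >= t). It uses a sort-and-sweep rather than the dynamic programming the abstract refers to, and the function name `min_disagreement_threshold` is an assumption made for illustration.

```python
def min_disagreement_threshold(points, labels):
    """Return (threshold, disagreements) for hypotheses h(x) = (x >= threshold).

    Illustrative only: a 1-D stand-in for the geometric classes in the abstract.
    """
    pairs = sorted(zip(points, labels))
    n = len(pairs)
    # Threshold at -infinity predicts every point positive,
    # so each negatively labeled point is a disagreement.
    errors = sum(1 for _, y in pairs if not y)
    best = (float("-inf"), errors)
    i = 0
    while i < n:
        x = pairs[i][0]
        # Move every point with value x to the predicted-negative side.
        while i < n and pairs[i][0] == x:
            errors += 1 if pairs[i][1] else -1
            i += 1
        # The new threshold is the next distinct value, or +infinity at the end.
        threshold = pairs[i][0] if i < n else float("inf")
        if errors < best[1]:
            best = (threshold, errors)
    return best


if __name__ == "__main__":
    print(min_disagreement_threshold([1, 2, 3, 4], [False, True, False, True]))
```

Sorting dominates the cost, so this toy case runs in O(n log n) time; the richer region classes in the abstract are what call for dynamic programming.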
Unlabeled compression schemes for maximum classes
- Journal of Machine Learning Research
, 2006
Cited by 5 (0 self)
Abstract. We give a compression scheme for any maximum class of VC dimension d that compresses any sample consistent with a concept in the class to at most d unlabeled points from the domain of the sample.
Shifting: One-Inclusion Mistake Bounds and Sample Compression
- EECS DEPARTMENT, UNIVERSITY OF CALIFORNIA, BERKELEY
, 2007

