Results 1–10 of 50
Boosting a Weak Learning Algorithm By Majority
, 1995
Cited by 425 (15 self)
We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas presented by Schapire in his paper "The strength of weak learnability", and represents an improvement over his results. The analysis of our algorithm provides general upper bounds on the resources required for learning in Valiant's polynomial PAC learning framework, which are the best general upper bounds known today. We show that the number of hypotheses that are combined by our algorithm is the smallest number possible. Other outcomes of our analysis are results regarding the representational power of threshold circuits, the relation between learnability and compression, and a method for parallelizing PAC learning algorithms. We provide extensions of our algorithms to cases in which the conc...
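As a rough illustration of the idea of combining many weak hypotheses, each trained on a different set of examples, the sketch below trains decision stumps on bootstrap resamples and combines them by an unweighted majority vote. This is a simplification, not Freund's actual boost-by-majority algorithm (which adaptively reweights the examples between rounds); all function names here are hypothetical.

```python
import random

def train_stump(sample):
    """Hypothetical weak learner: a decision stump picked from a handful of
    random (feature, threshold, sign) candidates by training accuracy.
    `sample` is a list of (x, y) pairs, x a tuple of floats, y in {-1, +1}."""
    best_acc, best_h = -1.0, None
    for _ in range(20):
        i = random.randrange(len(sample[0][0]))
        t = random.choice(sample)[0][i]
        for s in (-1, 1):
            h = lambda x, i=i, t=t, s=s: s if x[i] >= t else -s
            acc = sum(h(x) == y for x, y in sample) / len(sample)
            if acc > best_acc:
                best_acc, best_h = acc, h
    return best_h

def majority_boost(data, rounds=25):
    """Train each weak hypothesis on a bootstrap resample of the data and
    combine the hypotheses by an unweighted majority vote."""
    hyps = [train_stump([random.choice(data) for _ in data])
            for _ in range(rounds)]
    return lambda x: 1 if sum(h(x) for h in hyps) >= 0 else -1
```

On a problem no single stump can solve exactly (e.g. labels given by the sign of x0 + x1 - 1), the majority vote typically beats each individual stump, which is the qualitative point of the abstract above.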
Kernel matching pursuit
 Machine Learning
, 2002
Cited by 61 (0 self)
Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the least-squares sense. We show how matching pursuit can be extended to use non-squared error loss functions, and how it can be used to build kernel-based solutions to machine-learning problems, while keeping control of the sparsity of the solution. We also derive MDL motivated generalization bounds for this type of algorithm, and compare them to related SVM (Support Vector Machine) bounds. Finally, links to boosting algorithms and RBF training procedures, as well as an extensive experimental comparison with SVMs for classification are given, showing comparable results with typically sparser models.
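The basic least-squares matching pursuit loop described above (before the kernel and non-squared-loss extensions of the paper) can be sketched as follows; the function name and the plain-list vector representation are illustrative choices, not the paper's API.

```python
def matching_pursuit(target, dictionary, n_terms):
    """Greedy matching pursuit: at each step append the dictionary atom most
    correlated with the current residual, with its least-squares coefficient.
    `target` and each atom in `dictionary` are equal-length lists of floats.
    Returns (weights, indices) of the selected atoms, in selection order."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    residual = list(target)
    weights, chosen = [], []
    for _ in range(n_terms):
        # Pick the atom g maximizing the squared normalized correlation
        # <residual, g>^2 / <g, g> with the residual.
        best = max(range(len(dictionary)),
                   key=lambda j: dot(residual, dictionary[j]) ** 2
                                 / dot(dictionary[j], dictionary[j]))
        g = dictionary[best]
        w = dot(residual, g) / dot(g, g)  # least-squares step for this atom
        residual = [r - w * gi for r, gi in zip(residual, g)]
        weights.append(w)
        chosen.append(best)
    return weights, chosen
```

With an orthonormal dictionary this reduces to picking the largest-magnitude coefficients first, which makes the greedy behavior easy to check by hand.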
The scenario approach to robust control design
 IEEE Transactions on Automatic Control
, 2006
Cited by 50 (6 self)
This paper proposes a new probabilistic solution framework for robust control analysis and synthesis problems that can be expressed in the form of minimization of a linear objective subject to convex constraints parameterized by uncertainty terms. This includes the wide class of NP-hard control problems representable by means of parameter-dependent linear matrix inequalities (LMIs). It is shown in this paper that by appropriate sampling of the constraints one obtains a standard convex optimization problem (the scenario problem) whose solution is approximately feasible for the original (usually infinite) set of constraints, i.e., the measure of the set of original constraints that are violated by the scenario solution rapidly decreases to zero as the number of samples is increased. We provide an explicit and efficient bound on the number of samples required to attain a priori specified levels of probabilistic guarantee of robustness. A rich family of control problems which are in general hard to solve in a deterministically robust sense is therefore amenable to polynomial-time solution, if robustness is intended in the proposed risk-adjusted sense.
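The sampling idea can be seen on a toy one-dimensional robust program: minimize x subject to x ≥ δ for (almost) all uncertain δ. Drawing N scenarios replaces the semi-infinite constraint with N linear ones, whose optimum is simply the largest sampled δ; the probability that a fresh uncertainty violates the scenario solution then shrinks as N grows. This toy problem and both function names are illustrative, not from the paper.

```python
import random

def scenario_upper_bound(draw_uncertainty, n_scenarios):
    """Scenario approach for the toy robust program
        min x   s.t.  x >= delta  for (almost) all uncertain delta.
    Sampling N scenarios reduces the semi-infinite constraint to N linear
    constraints; the optimum of the sampled problem is the largest delta."""
    return max(draw_uncertainty() for _ in range(n_scenarios))

def empirical_violation(x, draw_uncertainty, trials=10000):
    """Estimate the probability that a fresh uncertainty violates x >= delta."""
    return sum(draw_uncertainty() > x for _ in range(trials)) / trials
```

For δ uniform on [0, 1], the deterministically robust solution is x = 1, while the scenario solution is slightly smaller and violated with probability roughly 1/(N+1), illustrating the risk-adjusted trade-off.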
Coresets, sparse greedy approximation and the Frank-Wolfe algorithm
 Proceedings of the 19th Annual ACM-SIAM Symposium on Discrete Algorithms
Cited by 31 (1 self)
The problem of maximizing a concave function f(x) in a simplex S can be solved approximately by a simple greedy algorithm. For given k, the algorithm can find a point x(k) on a k-dimensional face of S, such that f(x(k)) ≥ f(x∗) − O(1/k). Here f(x∗) is the maximum value of f in S. This algorithm and analysis were known before, and related to problems of statistics and machine learning, such as boosting, regression, and density mixture estimation. In other work, coming from computational geometry, the existence of ɛ-coresets was shown for the minimum enclosing ball problem, by means of a simple greedy algorithm. Similar greedy algorithms, that are special cases of the Frank-Wolfe algorithm, were described for other enclosure problems. Here these results are tied together, stronger convergence results are reviewed, and several coreset bounds are generalized or strengthened.
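The greedy algorithm in question is easy to state concretely: starting from a vertex of the simplex, each step moves toward the vertex e_i with the largest partial derivative, so after k steps the iterate is a convex combination of at most k+1 vertices, i.e., it lies on a low-dimensional face, with an O(1/k) optimality gap. A minimal sketch (the function name and the standard 2/(t+2) step size are illustrative choices):

```python
def frank_wolfe_simplex(grad, dim, steps):
    """Frank-Wolfe over the probability simplex: each step moves toward the
    vertex e_i maximizing the linearized objective, i.e. the coordinate with
    the largest partial derivative. The iterate stays in the simplex and its
    support grows by at most one coordinate per step."""
    x = [0.0] * dim
    x[0] = 1.0                                     # start at a vertex
    for t in range(1, steps + 1):
        g = grad(x)
        i = max(range(dim), key=lambda j: g[j])    # best vertex e_i
        gamma = 2.0 / (t + 2)                      # standard step size
        x = [(1 - gamma) * xj for xj in x]
        x[i] += gamma
    return x
```

For instance, maximizing the concave f(x) = −Σ_j (x_j − c_j)² for a target c inside the simplex (gradient −2(x − c)) drives f(x(k)) toward its maximum of 0 at the advertised O(1/k) rate.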
Algorithmic luckiness
 Journal of Machine Learning Research
, 2002
Cited by 24 (4 self)
Classical statistical learning theory studies the generalisation performance of machine learning algorithms rather indirectly. One of the main detours is that algorithms are studied in terms of the hypothesis class that they draw their hypotheses from. In this paper, motivated by the luckiness framework of Shawe-Taylor et al. (1998), we study learning algorithms more directly and in a way that allows us to exploit the serendipity of the training sample. The main difference to previous approaches lies in the complexity measure; rather than covering all hypotheses in a given hypothesis space it is only necessary to cover the functions which could have been learned using the fixed learning algorithm. We show how the resulting framework relates to the VC, luckiness and compression frameworks. Finally, we present an application of this framework to the maximum margin algorithm for linear classifiers which results in a bound that exploits the margin, the sparsity of the resultant weight vector, and the degree of clustering of the training data in feature space.
The Set Covering Machine
, 2002
Cited by 24 (7 self)
We extend the classical algorithms of Valiant and Haussler for learning compact conjunctions and disjunctions of Boolean attributes to allow features that are constructed from the data and to allow a tradeoff between accuracy and complexity. The result is a general-purpose learning machine, suitable for practical learning tasks, that we call the set covering machine. We present a version of the set covering machine that uses data-dependent balls for its set of features and compare its performance with the support vector machine. By extending a technique pioneered by Littlestone and Warmuth, we bound its generalization error as a function of the amount of data compression it achieves during training. In experiments with real-world learning tasks, the bound is shown to be extremely tight and to provide an effective guide for model selection.
A compression approach to support vector model selection
 Journal of Machine Learning Research
, 2004
Cited by 21 (5 self)
This report is available in PDF format via anonymous ftp at
Generalisation Error Bounds for Sparse Linear Classifiers
 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
, 2000
Cited by 19 (6 self)
We provide small sample size bounds on the generalisation error of linear classifiers that are sparse in their dual representation given by the expansion coefficients of the weight vector in terms of the training data. These results theoretically justify algorithms like the Support Vector Machine, the Relevance Vector Machine and K-nearest-neighbour. The bounds are a posteriori bounds to be evaluated after learning when the attained level of sparsity is known. In a PAC-Bayesian style prior knowledge about the expected sparsity is incorporated into the bounds. The proofs avoid the use of double sample arguments by taking into account the sparsity that leaves unused training points for the evaluation of classifiers. We furthermore give a PAC-Bayesian bound on the average generalisation error over subsets of parameter space that may pave the way to combining sparsity in the expansion coefficients and margin in a single bound. Finally, reinterpreting a mistake bound for the classical perceptr...
Data Filtering and Distribution Modeling Algorithms for Machine Learning
, 1993
Cited by 18 (4 self)
Front matter and table of contents (page numbers omitted):
Acknowledgments
1. Introduction
   1.1 Boosting by majority
   1.2 Query By Committee
   1.3 Learning distributions of binary vectors
2. Boosting a weak learning algorithm by majority
   2.1 Introduction
   2.2 The majority-vote game
       2.2.1 Optimality of the weighting scheme
       2.2.2 The representational power of majority gates
   2.3 Boosting a weak learner using a majority vote
       2.3.1 Preliminaries ...
PAC-MDL bounds
, 2003
Cited by 18 (0 self)
We point out that a number of standard sample complexity bounds (VC-dimension, PAC-Bayes, and others) are all related to the number of bits required to communicate the labels given the unlabeled data for a natural communication game. Motivated by this observation, we give a general sample complexity bound based on this game that allows us to unify these different bounds in one common framework.