Results 1 - 10 of 15
An introduction to boosting and leveraging
- Advanced Lectures on Machine Learning, LNCS
, 2003
Learning the Kernel with Hyperkernels
, 2003
Cited by 59 (2 self)
This paper addresses the problem of choosing a kernel suitable for estimation with a Support Vector Machine, hence further automating machine learning. This goal is achieved by defining a Reproducing Kernel Hilbert Space on the space of kernels itself. Such a formulation leads to a statistical estimation problem very much akin to the problem of minimizing a regularized risk functional.
The Set Covering Machine
- Journal of Machine Learning Research
, 2002
Cited by 16 (7 self)
We extend the classical algorithms of Valiant and Haussler for learning compact conjunctions and disjunctions of Boolean attributes to allow features that are constructed from the data and to allow a trade-off between accuracy and complexity. The result is a general-purpose learning machine, suitable for practical learning tasks, that we call the set covering machine. We present a version of the set covering machine that uses data-dependent balls for its set of features and compare its performance with the support vector machine. By extending a technique pioneered by Littlestone and Warmuth, we bound its generalization error as a function of the amount of data compression it achieves during training. In experiments with real-world learning tasks, the bound is shown to be extremely tight and to provide an effective guide for model selection.
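As a rough illustration of the idea in this abstract, the sketch below greedily builds a conjunction of data-dependent balls. It is a simplified reading, not the authors' algorithm: the names `train_scm`, `predict`, `penalty`, and `max_features` are hypothetical, every ball is centered on a negative example and outputs "negative" inside, and the usefulness score (negatives covered minus a penalty times positives misclassified) is an assumed form of the accuracy/complexity trade-off.

```python
import math
from itertools import product


def train_scm(X, y, penalty=1.0, max_features=5, eps=1e-9):
    """Greedy sketch: pick balls that cover negatives while sparing positives.

    X is a list of tuples, y a list of booleans (True = positive).
    Returns a list of (center, radius) balls; all names are illustrative.
    """
    pos = [x for x, label in zip(X, y) if label]
    neg = [x for x, label in zip(X, y) if not label]
    # Candidate balls: centered on a negative example, with a radius that
    # stops just short of some positive "border" example.
    candidates = [(c, math.dist(c, b) - eps) for c, b in product(neg, pos)]
    remaining_neg, remaining_pos = set(neg), set(pos)
    chosen = []

    def usefulness(ball):
        c, r = ball
        covered = sum(math.dist(c, x) <= r for x in remaining_neg)
        errors = sum(math.dist(c, x) <= r for x in remaining_pos)
        return covered - penalty * errors

    while remaining_neg and candidates and len(chosen) < max_features:
        best = max(candidates, key=usefulness)
        if usefulness(best) <= 0:
            break  # no remaining ball is worth adding
        chosen.append(best)
        c, r = best
        # Examples inside the chosen ball are settled; drop them.
        remaining_neg = {x for x in remaining_neg if math.dist(c, x) > r}
        remaining_pos = {x for x in remaining_pos if math.dist(c, x) > r}
    return chosen


def predict(balls, x):
    """Conjunction: positive only if x lies outside every chosen ball."""
    return all(math.dist(c, x) > r for c, r in balls)


X = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0), (3.1, 0.0)]
y = [False, False, True, True]
balls = train_scm(X, y)
```

The number of balls returned plays the role of the compressed description of the training set; fewer balls means a more compact classifier, which is the quantity the abstract's generalization bound depends on.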
Controlling sparseness in nonnegative tensor factorization
- In: ECCV
, 2006
Cited by 11 (0 self)
Non-negative tensor factorization (NTF) has recently been proposed as sparse and efficient image representation (Welling and Weber, Patt. Rec. Let., 2001). Until now, sparsity of the tensor factorization has been empirically observed in many cases, but there was no systematic way to control it. In this work, we show that a sparsity measure recently proposed for non-negative matrix factorization (Hoyer, J. Mach. Learn. Res., 2004) applies to NTF and allows precise control over sparseness of the resulting factorization. We devise an algorithm based on sequential conic programming and show improved performance over classical NTF codes on artificial and on real-world data sets.
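For orientation, the sparsity measure from Hoyer (2004) that this abstract refers to is commonly written as (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1) for a vector x with n entries. The sketch below computes that quantity for a single vector; the function name `hoyer_sparseness` is an illustrative choice, and how the measure is applied to the factors of a tensor decomposition is not shown here.

```python
import math


def hoyer_sparseness(x):
    """Sparseness in [0, 1]: 1 for a single nonzero entry,
    0 when all entries have the same magnitude."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two entries")
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if l2 == 0.0:
        raise ValueError("undefined for the zero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)


one_hot = hoyer_sparseness([0.0, 0.0, 3.0, 0.0])
uniform = hoyer_sparseness([1.0, 1.0, 1.0, 1.0])
```

Constraining this value for each factor is one way to read "precise control over sparseness": the measure depends only on the ratio of the L1 and L2 norms, so it is invariant to rescaling the vector.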
Simpler knowledge-based support vector machines
- In ICML
, 2006
Cited by 6 (0 self)
If appropriately used, prior knowledge can significantly improve the predictive accuracy of learning algorithms or reduce the amount of training data needed. In this paper we introduce a simple method to incorporate prior knowledge in support vector machines by modifying the hypothesis space rather than the optimization problem. The optimization problem is amenable to solution by the constrained concave convex procedure, which finds a local optimum. The paper discusses different kinds of prior knowledge and demonstrates the applicability of the approach in some characteristic experiments.
Random subclass bounds
- In Proceedings of the 16th Annual Conference on Computational Learning Theory (COLT)
, 2003
Cited by 5 (1 self)
It has been recently shown that sharp generalization bounds can be obtained when the function class from which the algorithm chooses its hypotheses is “small” in the sense that the Rademacher averages of this function class are small [8, 9]. Seemingly based on different arguments, generalization bounds were obtained in the compression scheme [7], luckiness [13], and algorithmic luckiness [6] frameworks in which the “size” of the function class is not specified a priori. We show that the bounds obtained in all these frameworks follow from the same general principle, namely that coordinate projections of this function subclass evaluated on random samples are “small” with high probability.
PAC-Bayesian generalisation error bounds for Gaussian process classification
- Journal of Machine Learning Research
, 2002
Cited by 5 (0 self)
Approximate Bayesian Gaussian process (GP) classification techniques are powerful nonparametric learning methods, similar in appearance and performance to support vector machines. Based on simple probabilistic models, they render interpretable results and can be embedded in Bayesian frameworks for model selection, feature selection, etc. In this paper, by applying the PAC-Bayesian theorem of McAllester (1999a), we prove distribution-free generalisation error bounds for a wide range of approximate Bayesian GP classification techniques. We also provide a new and much simplified proof for this powerful theorem, making use of the concept of convex duality which is a backbone of many machine learning techniques. We instantiate and test our bounds for two particular GPC techniques, including a recent sparse method which circumvents the unfavourable scaling of standard GP algorithms. As is shown in experiments on a real-world task, the bounds can be very tight for moderate training sample sizes. To the best of our knowledge, these results provide the tightest known distribution-free error bounds for approximate Bayesian GPC methods, giving a strong learning-theoretical justification for the use of these techniques.
Mathematical Aspects of Neural Networks
- European Symposium on Artificial Neural Networks 2003
, 2003
Cited by 5 (4 self)
In this tutorial paper about mathematical aspects of neural networks, we will focus on two directions: on the one hand, we will motivate standard mathematical questions and well-studied theory of classical neural models used in machine learning. On the other hand, we collect some recent theoretical results (as of the beginning of 2003) in the respective areas. Thereby, we follow the dichotomy offered by the overall network structure and restrict ourselves to feedforward networks, recurrent networks, and self-organizing neural systems, respectively.
PAC-Bayesian compression bounds on the prediction error of learning algorithms for classification
- Machine Learning
, 2005
Cited by 3 (1 self)
We consider bounds on the prediction error of classification algorithms based on sample compression. We refine the notion of a compression scheme to distinguish permutation and repetition invariant and non-permutation and repetition invariant compression schemes leading to different prediction error bounds. Also, we extend known results on compression to the case of non-zero empirical risk. We provide bounds on the prediction error of classifiers returned by mistake-driven online learning algorithms by interpreting mistake bounds as bounds on the size of the respective compression scheme of the algorithm. This leads to a bound on the prediction error of perceptron solutions that depends on the margin a support vector machine would achieve on the same training sample. Furthermore, using the property of compression we derive bounds on the average prediction error of kernel classifiers in the PAC-Bayesian framework. These bounds assume a prior measure over the expansion coefficients in the data-dependent kernel expansion and bound the average prediction error uniformly over subsets of the space of expansion coefficients.
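The link between mistake-driven learning and compression mentioned in this abstract can be illustrated with the classical perceptron: its final weight vector depends only on the examples on which it erred, so that sequence of mistakes acts as a compressed description of the training run. The sketch below is a minimal illustration under that reading, not the construction used in the paper; the names `perceptron` and `reconstruct` are hypothetical.

```python
def perceptron(X, y, epochs=100):
    """Run the perceptron; return the weights and the ordered list of
    indices of the examples on which an update (mistake) occurred."""
    w = [0.0] * len(X[0])
    mistakes = []
    for _ in range(epochs):
        clean_pass = True
        for i, (x, label) in enumerate(zip(X, y)):
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            if margin <= 0:  # mistake: update and record the index
                w = [wj + label * xj for wj, xj in zip(w, x)]
                mistakes.append(i)
                clean_pass = False
        if clean_pass:
            break  # converged on the training sample
    return w, mistakes


def reconstruct(X, y, mistakes):
    """Rebuild the weights from the recorded mistake sequence alone."""
    w = [0.0] * len(X[0])
    for i in mistakes:
        w = [wj + y[i] * xj for wj, xj in zip(w, X[i])]
    return w


X = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-0.5, -2.0)]
y = [1, 1, -1, -1]
w, mistakes = perceptron(X, y)
w_rebuilt = reconstruct(X, y, mistakes)
```

Because an example can appear several times in the mistake sequence, and the order of updates is recorded, this sketch corresponds to the case where repetitions and order are kept rather than discarded, which is the kind of distinction the abstract draws between compression schemes.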
Reverse-convex programming for sparse image codes
- In Proc. of Energy Minim. Methods in Comp. Vision and Pattern Recog. (EMMCVPR), volume 3757 of LNCS
, 2005
Cited by 1 (1 self)
Reverse-convex programming (RCP) concerns global optimization of a specific class of non-convex optimization problems. We show that a recently proposed model for sparse non-negative matrix factorization (NMF) belongs to this class. Based on this result, we design two algorithms for sparse NMF that solve sequences of convex second-order cone programs (SOCP). We work out some well-defined modifications of NMF that leave the original model invariant from the optimization viewpoint. They considerably generalize the sparse NMF setting to account for uncertainty in sparseness, for supervised learning, and, by dropping the non-negativity constraint, for sparsity-controlled PCA.
1 Introduction and Related Work
Reverse-convex programming (RCP) is a powerful framework from global optimization which, among others, subsumes d.c. programming [1]. Motivated by a recently proposed model for sparse non-negative matrix factorization [2], we

