In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss regularization from an SV perspective.