## Scaling Kernel-Based Systems to Large Data Sets (2001)

### Download Links

- [www.kernel-machines.org]
- [wwwbrauer.informatik.tu-muenchen.de]
- [tresp.org]
- [wwwbrauer.in.tum.de]
- DBLP

### Other Repositories/Bibliography

Venue: Data Mining and Knowledge Discovery

Citations: 11 (2 self)

### BibTeX

```bibtex
@ARTICLE{Tresp01scalingkernel-based,
  author  = {Volker Tresp},
  title   = {Scaling Kernel-Based Systems to Large Data Sets},
  journal = {Data Mining and Knowledge Discovery},
  year    = {2001},
  volume  = {5}
}
```

### Abstract

In the form of the support vector machine and Gaussian processes, kernel-based systems are currently very popular approaches to supervised learning. Unfortunately, the computational load for training kernel-based systems increases drastically with the size of the training data set, such that these systems are not ideal candidates for applications with large data sets. Nevertheless, research in this direction is very active. In this paper, I review some of the current approaches toward scaling kernel-based systems to large data sets.

Keywords: kernel-based systems, support vector machine, Gaussian processes, committee machines, massive data sets

### Citations

9827 | Statistical Learning Theory - Vapnik - 1998
Citation Context: ...weight w_i for many kernels is zero after training. The remaining kernels with nonzero weights define the support vectors. It can be shown that the solution minimizes a bound on the generalization error (Vapnik, 1998). Vapnik (1998), Scholkopf, Burges and Smola (1999), Cristianini and Shawe-Taylor (2000), and Muller, Mika, Ratsch, Tsuda, an... |

1395 | Spline Models for Observational Data - Wahba - 1990
Citation Context: ...Poggio and Girosi (1990) are essentially identical to GPR. Here, the kernels are Green's functions derived from the appropriate regularization problem. Similarly, smoothing splines are closely related (Wahba, 1990). The relevance vector machine (Tipping, 2000) achieves sparseness by pruning away dimensions of the weight vector using an evidence framework. Finally, the kernel Fisher discriminant is the well-kno... |

1103 | Fast training of support vector machines using sequential minimal optimization - Platt - 1999 |

700 | The strength of weak learnability - Schapire - 1990 |

695 | Networks for Approximation and Learning - Poggio, Girosi - 1990 |

505 | Making large-scale support vector machine learning practical - Joachims - 1999 |

419 | An introduction to kernel-based learning algorithms - Muller, Mika, et al. |

334 | Multivariate Statistical Modelling Based on Generalized Linear Models - FAHRMEIR, TUTZ - 2001 |

229 | The relevance vector machine - Tipping - 2000
Citation Context: ...ical to GPR. Here, the kernels are Green's functions derived from the appropriate regularization problem. Similarly, smoothing splines are closely related (Wahba, 1990). The relevance vector machine (Tipping, 2000) achieves sparseness by pruning away dimensions of the weight vector using an evidence framework. Finally, the kernel Fisher discriminant is the well-known linear Fisher discriminant approach transfo... |

187 | Sparse greedy matrix approximation for machine learning - Smola, Bartlett, et al. - 2000 |

139 | Bayesian classification with Gaussian Processes - Williams, Barber - 1998 |

119 | Introduction to Gaussian processes - MacKay - 1998
Citation Context: ...performance, particularly when M has a large number of small eigenvalues. This approach is due to Skilling and was one of the earliest approaches to speeding up the training of GPR systems (Gibbs and MacKay, 1997). Note that the computational complexity of this approach is quadratic in the size of the training data set and this approach is therefore not well suited for massive data sets. 3.4.2. The Nystrom Metho... |

114 | Sparse greedy gaussian process regression - Smola, Bartlett - 2000 |

80 | Support Vector Machines - Cristianini, Shawe-Taylor - 2000 |

77 | A Bayesian committee machine - Tresp |

76 | Efficient implementation of Gaussian processes - Gibbs, MacKay - 1997
Citation Context: ...t loss in performance, particularly when M has a large number of small eigenvalues. This approach is due to Skilling and was one of the earliest approaches to speeding up the training of GPR systems (Gibbs and MacKay, 1997). Note that the computational complexity of this approach is quadratic in the size of the training data set and this approach is therefore not well suited for massive data sets. 3.4.2. The Nystrom Metho... |
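The quadratic cost mentioned in this context comes from replacing the O(N^3) direct inversion of the kernel matrix with an iterative conjugate-gradient solve, where each iteration needs only one matrix-vector product, i.e. O(N^2) work. A minimal numpy sketch of that idea (the RBF kernel, the noise level, and the toy data are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    # Squared Euclidean distances, then the Gaussian (RBF) kernel.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def conjugate_gradient(matvec, b, tol=1e-8, max_iter=None):
    """Solve A x = b for symmetric positive definite A, given only the
    matrix-vector product; each iteration costs one product, O(N^2)."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter or len(b)):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
# Regularized Gram matrix; the noise term keeps it well conditioned.
K = rbf_kernel(X, X) + 1e-2 * np.eye(200)
# GPR weights alpha with K alpha = y, without forming K^{-1} explicitly.
alpha = conjugate_gradient(lambda v: K @ v, y, max_iter=1000)
```

Because the RBF Gram matrix has a rapidly decaying spectrum, CG typically converges in far fewer than N iterations, which is exactly why the overall cost stays near O(N^2) rather than O(N^3).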

49 | Sparse representation for Gaussian process models - Csato, Opper - 2001 |

32 | An Introduction to Support Vector Machines - Cristianini, Shawe-Taylor - 2000 |

25 | Active support vector machine classification - Mangasarian, Musicant - 2001 |

16 | Towards scalable support vector machines using squashing - Pavlov, Chudova, et al. - 2000
Citation Context: ...er case, the cluster centers are used as training patterns. The idea of preclustering data has been extended in an interesting direction by applying the concept of squashing to training a linear SVM (Pavlov, Chudova, and Smyth, 2000). For training, the SMO algorithm is used. Clustering is performed using a metric derived from the likelihood profile of the data. First, a small percentage of the original training data set are rando... |
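The preclustering idea mentioned in this context, replacing the full training set by a much smaller set of cluster centers and training the SVM on those, can be sketched with a plain Lloyd's-algorithm k-means (the data, k, and iteration count here are illustrative assumptions; the squashing variant described above uses a likelihood-derived metric rather than plain Euclidean k-means):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: the k centers summarize the full data set."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

rng = np.random.default_rng(1)
X_big = rng.normal(size=(5000, 2))   # stand-in for a large data set
centers, labels = kmeans(X_big, k=50)
# A (linear) SVM, e.g. trained with SMO as in the paper, would now be
# fit on the 50 centers instead of the 5000 original points.
```

The SVM training cost then depends on k rather than on the original N, which is the whole point of the reduction.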

16 | Bayesian classification with Gaussian processes - Williams, Barber - 1998 |

10 | Finite-dimensional approximation of Gaussian processes - Ferrari-Trecate, Williams, et al. - 1999 |

7 | Reduced support vector machines. Data Mining Institute - Lee, Mangasarian - 2000
Citation Context: ...omposition. The connection between the decomposition of the Gram matrix in this section and the BCM approximation is discussed in the Appendix. 3.5.3. Reduced Support Vector Machines (RSVM) The RSVM (Lee and Mangasarian, 2000) uses a nonstandard SVM cost function of the form $\frac{1}{2}(w^\top w + b^2) + C\, g^\top g$. If we compare this equation with the original SVM cost function of Equation 6, we notice that the cost term for the weig... |
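The comparison made in this context can be written out side by side. The snippet's "Equation 6" is not reproduced here, so the first line assumes the usual soft-margin SVM primal; the second line is the RSVM cost as given in the snippet:

$$
\begin{aligned}
\text{SVM:}\quad & \tfrac{1}{2}\, w^\top w + C \sum_i \xi_i \\
\text{RSVM:}\quad & \tfrac{1}{2}\,\bigl(w^\top w + b^2\bigr) + C\, g^\top g
\end{aligned}
$$

Under that assumption, the RSVM additionally penalizes the bias $b$ and replaces the linear penalty on the slacks with a squared penalty on the slack vector $g$.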

6 | Lagrangian Support Vector Regression. Data Mining Institute - Mangasarian, Musicant - 2000 |

6 | Fast training of support vector classifiers - Pérez-Cruz, Alarcón-Diana, et al. - 2001 |

5 | The Bayesian committee support vector machine - Schwaighofer, Tresp - 2001
Citation Context: ...the BCM to GPR in Tresp (2000). For GPR, the mean and the covariance of the posterior Gaussian densities are readily computed. Subsequently, the BCM was applied to GGPR (Tresp, 2000b) and to the SVM (Schwaighofer and Tresp, 2001). For both the GGPR and the SVM, the posterior distributions are only approximately Gaussian. In Tresp (2000), it was shown that N_Q, the dimension of f_q, should be at least as large as the effectiv... |
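The BCM mentioned in this context combines the (approximately) Gaussian posteriors of the individual modules at the query outputs f_q into a single prediction. A sketch of the combination rule, assuming each module i supplies a posterior mean m_i and covariance C_i and that K_qq is the prior covariance at the query points (the notation follows the standard BCM formulae rather than the paper's exact symbols):

```python
import numpy as np

def bcm_combine(means, covs, prior_cov):
    """Combine module posteriors N(m_i, C_i) over the query outputs.

    Standard BCM rule: the combined precision is the sum of the module
    precisions minus (M - 1) copies of the prior precision, which would
    otherwise be counted M times.
    """
    M = len(means)
    precision = -(M - 1) * np.linalg.inv(prior_cov)
    weighted_mean = np.zeros_like(means[0])
    for m_i, C_i in zip(means, covs):
        C_i_inv = np.linalg.inv(C_i)
        precision += C_i_inv
        weighted_mean += C_i_inv @ m_i
    cov = np.linalg.inv(precision)
    return cov @ weighted_mean, cov

# Two hypothetical modules that happen to agree; prior covariance K_qq = I.
K_qq = np.eye(2)
m1 = np.array([1.0, 0.0])
mean, cov = bcm_combine([m1, m1],
                        [0.5 * np.eye(2), 0.5 * np.eye(2)], K_qq)
```

With a single module (M = 1) the rule reduces to that module's own posterior, which is a quick sanity check on the formula.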

5 | Scalable kernel systems - Tresp, Schwaighofer - 2001 |

2 | Lagrangian support vector machine. Data Mining Institute - Mangasarian, Musicant - 2000
Citation Context: ...steps is finite. As an example, a data set with 7 million points required only 5 iterations and needed 95 CPU minutes. 3.3.2. Lagrange Support Vector Machine (LSVM) A variation on the ASVM is the LSVM (Mangasarian and Musicant, 2000). It is based on the same reformulation of the optimization problem and leads to the same dual problem. The difference is that the LSVM works directly with the Karush-Kuhn-Tucker necessary and sufficient... |

1 | Scaling support vector machines using boosting algorithm - Pavlov, Mao, et al. - 2000 |

1 | Fast training of support vector classifiers - Perez-Cruz, Alarcon-Diana, et al. - 2001 |

1 | Advances in kernel methods - Scholkopf, Burges, et al. - 1999 |