## Ridge Regression Learning Algorithm in Dual Variables (1998)

Venue: Proceedings of the 15th International Conference on Machine Learning

Citations: 107 (7 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Saunders98ridgeregression,
  author    = {C. Saunders and A. Gammerman and V. Vovk},
  title     = {Ridge Regression Learning Algorithm in Dual Variables},
  booktitle = {Proceedings of the 15th International Conference on Machine Learning},
  year      = {1998},
  pages     = {515--521},
  publisher = {Morgan Kaufmann}
}
```

### Abstract

In this paper we study a dual version of the Ridge Regression procedure. It allows us to perform non-linear regression by constructing a linear regression function in a high dimensional feature space. The feature space representation can result in a large increase in the number of parameters used by the algorithm. In order to combat this "curse of dimensionality", the algorithm allows the use of kernel functions, as used in Support Vector methods. We also discuss a powerful family of kernel functions which is constructed using the ANOVA decomposition method from the kernel corresponding to splines with an infinite number of nodes. This paper introduces a regression estimation algorithm which is a combination of these two elements: the dual version of Ridge Regression is applied to the ANOVA enhancement of the infinite-node splines. Experimental results are then presented (based on the Boston Housing data set) which indicate the performance of this algorithm relative to other algorithms.
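
The core of the dual procedure is that, instead of solving for a weight vector in the (possibly huge) feature space, one solves a $T \times T$ linear system in the kernel matrix. A minimal NumPy sketch of this standard kernel ridge regression form (function names are illustrative, not from the paper):

```python
import numpy as np

def dual_ridge_fit(K, y, a):
    """Dual ridge coefficients: alpha = (K + a*I)^{-1} y,
    where K[s, t] = k(x_s, x_t) is the kernel (Gram) matrix."""
    T = K.shape[0]
    return np.linalg.solve(K + a * np.eye(T), y)

def dual_ridge_predict(alpha, K_cross):
    """Predictions f(x) = sum_t alpha[t] * k(x_t, x);
    K_cross[i, t] = k(x_t, x_test_i)."""
    return K_cross @ alpha

# Illustrative usage with a degree-1 polynomial kernel k(u, v) = u.v + 1
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])          # y = 2x
K = X @ X.T + 1.0
alpha = dual_ridge_fit(K, y, a=1e-3)
preds = dual_ridge_predict(alpha, K)   # in-sample predictions, close to y
```

With $a > 0$ the system is always well-posed even when the feature-space dimension far exceeds $T$, which is exactly how the dual form sidesteps the parameter blow-up mentioned in the abstract.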

### Citations

9946 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...ed positive constant. We now derive a "dual version" for Ridge Regression (RR); since we allow a = 0, this includes Least Squares (LS) as a special case. In this derivation we partially follow Vapnik [8]. We start with re-expressing our problem as: minimize the expression $a\|w\|^2 + \sum_{t=1}^{T} \xi_t^2$ (1) under the constraints $y_t - w \cdot x_t = \xi_t$, $t = 1, \ldots, T$ (2). Introducing Lagrange mult...
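
The snippet breaks off at the Lagrange multipliers; the derivation can be completed along standard lines (a sketch consistent with the snippet, not a verbatim quote from the paper):

```latex
L = a\|w\|^2 + \sum_{t=1}^{T}\xi_t^2
    + \sum_{t=1}^{T}\alpha_t\,(y_t - w\cdot x_t - \xi_t),
\qquad
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \frac{1}{2a}\sum_{t=1}^{T}\alpha_t x_t,
\qquad
\frac{\partial L}{\partial \xi_t} = 0 \;\Rightarrow\; \xi_t = \frac{\alpha_t}{2}.
```

Substituting back into the constraints (2) gives $y = \frac{1}{2a}(K + aI)\alpha$ with $K_{st} = x_s \cdot x_t$, hence $\alpha = 2a\,(K + aI)^{-1}y$ and the prediction $f(x) = w \cdot x = y^{\top}(K + aI)^{-1}k(x)$, where $k(x)_t = x_t \cdot x$; since only dot products appear, a kernel can replace them throughout.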

2765 | Bagging predictors
- Breiman
- 1996
Citation Context: ...$= \sum_{i=1}^{p} K_i(x, y)$. 6 EXPERIMENTAL RESULTS Experiments were conducted on the Boston Housing data set. This is a well-known data set for testing non-linear regression methods; see, e.g., Breiman [1] and Saunders [6]. The data set consists of 506 cases in which 12 continuous variables and 1 binary variable determine the median house price in a certain area of Boston in thousands of dollars. The c...

294 | Some results on Tchebycheffian spline functions
- Kimeldorf, Wahba
- 1971
Citation Context: ...$K_n(x, y) = \prod_{i=1}^{n} k(x_i, y_i)$. One such kernel (to which the ANOVA decomposition is applied here) is the spline kernel with an infinite number of nodes (see Vapnik [8, 10] and Kimeldorf and Wahba [5]). A spline approximation which has an infinite number of nodes can be defined on the interval $(0, a)$, $0 < a < \infty$, as the expansion $f(x) = \int_0^a a(t)(x - t)_+^d \, dt + \sum_{i=0}^{d} a_i x^i$, where $a_i$, ...
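
For the infinite-node spline with $d = 1$ on $(0, 1)$, carrying out the integral $\int_0^1 (x-t)_+(y-t)_+\,dt$ and adding the polynomial part $1 + xy$ gives a closed form; a sketch under that assumption (the paper may use a different interval or normalization):

```python
import numpy as np

def spline_kernel_d1(x, y):
    """Infinite-node spline kernel of order d = 1 on (0, 1):
    K(x, y) = 1 + x*y + x*y*m - (x + y)*m^2/2 + m^3/3, with m = min(x, y)."""
    m = min(x, y)
    return 1.0 + x * y + x * y * m - (x + y) * m**2 / 2.0 + m**3 / 3.0

# Cross-check the closed form against direct numerical integration
x, y = 0.3, 0.7
t = np.linspace(0.0, 1.0, 200_001)
integrand = np.maximum(x - t, 0.0) * np.maximum(y - t, 0.0)
dt = t[1] - t[0]
numeric = 1.0 + x * y + (integrand[:-1] + integrand[1:]).sum() * dt / 2.0
```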

162 | Support vector machines, reproducing kernel Hilbert spaces and their randomized GACV
- Wahba
- 1999
Citation Context: ...at space. Kernel functions themselves can take many forms and particular attention is paid to a family of kernel functions which are constructed using ANOVA decomposition (Vapnik [10]; see also Wahba [11, 12]). There are two major objectives of this paper: 1. To show how to use kernel functions to overcome the curse of dimensionality in the above mentioned algorithms. 2. To demonstrate how ANOVA decomposi...

158 | Support vector regression machines
- Drucker, Burges, et al.
- 1996
Citation Context: ...he test set. Least Squares and Ridge Regression are classical statistical algorithms which have been known for a long time. They have been widely used, and recently some papers such as Drucker et al. [2] have used regression in conjunction with a high-dimensional feature space. That is, the original input vectors are mapped into some feature space, and the algorithms are then used to construct a linea...

150 | Spline models for observational data, volume 59
- Wahba
- 1990
Citation Context: ...apnik's) gives some extra insight: see, e.g., equations (4) and (6). For an excellent survey of connections between Support Vector Machines and the work done in statistics we refer the reader to Wahba [11, 12] and Girosi [4]. 7.2 KRIEGING Formula (8) is well known in the theory of Krieging; in this subsection we will explain the connection for readers who are familiar with Krieging. Consider the Bayesian s...

76 | Learning by transduction, in
- Gammerman, Vovk, et al.
- 1998
Citation Context: ...rge number of parameters. We feel that a very interesting direction of developing the results of this paper would be to combine the dual version of Ridge Regression with the ideas of Gammerman et al. [3] to obtain a measure of confidence for predictions output by our algorithms. We expect that in this case simple closed-form formulas can be obtained. Acknowledgments We thank EPSRC for providing finan...

34 | Support vector regression with ANOVA decomposition kernels
- Stitson, Gammerman, et al.
- 1997
Citation Context: ...duced by ANOVA decomposition, only the order p is considered: $K(x, y) = K_p(x, y)$. An alternative method of using ANOVA decomposition would be to consider order p and all lower orders (as in Stitson [7]), i.e., $K(x, y) = \sum_{i=1}^{p} K_i(x, y)$. 6 EXPERIMENTAL RESULTS Experiments were conducted on the Boston Housing data set. This is a well-known data set for testing non-linear regression methods; see...
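
The order-$p$ ANOVA kernel is the degree-$p$ elementary symmetric polynomial in the per-coordinate values $z_i = k(x_i, y_i)$, so it can be computed without enumerating all $\binom{n}{p}$ index subsets. A sketch using the standard dynamic program (the snippet does not prescribe an algorithm; this recursion is an assumption):

```python
def anova_kernels(x, y, base_kernel, p):
    """Return [K_1, ..., K_p], where K_j(x, y) is the sum over all
    j-element index subsets of the product of base_kernel values."""
    z = [base_kernel(xi, yi) for xi, yi in zip(x, y)]
    e = [1.0] + [0.0] * p          # e[j] = elementary symmetric poly of degree j
    for zi in z:
        for j in range(p, 0, -1):  # descend so each coordinate is used at most once
            e[j] += e[j - 1] * zi
    return e[1:]

# K_p alone (this paper's choice) vs. the sum of all orders up to p (Stitson [7])
dot = lambda a, b: a * b
Ks = anova_kernels([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], dot, p=3)
K_p_only = Ks[-1]    # K_3 = 1*2*3
K_up_to_p = sum(Ks)  # K_1 + K_2 + K_3
```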

10 | Ridge regression in dual variables
- Saunders, Gammerman, et al.
- 1998
Citation Context: ...$(x, y)$. 6 EXPERIMENTAL RESULTS Experiments were conducted on the Boston Housing data set. This is a well-known data set for testing non-linear regression methods; see, e.g., Breiman [1] and Saunders [6]. The data set consists of 506 cases in which 12 continuous variables and 1 binary variable determine the median house price in a certain area of Boston in thousands of dollars. The continuous variabl...

3 | An equivalence between sparse approximations and Support Vector Machines
- Girosi
- 1997
Citation Context: ...red by Mercer's theorem and addressed by Vapnik [9] in his discussion of support vector methods. As an illustration of the idea, an example of a simple kernel function is presented here (see Girosi [4]). Suppose there is a mapping function $\phi$ which maps a two-dimensional vector into 6 dimensions: $\phi : (x_1, x_2) \mapsto (x_1^2, x_2^2, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1x_2, 1)$; then dot products in F t...
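
The example in the snippet can be checked directly: dot products of the 6-dimensional images equal $(x \cdot y + 1)^2$ on the original 2-dimensional inputs, so the kernel is evaluated without ever forming the explicit map. A small verification sketch:

```python
import numpy as np

def phi(x):
    """Explicit feature map from the Girosi example: R^2 -> R^6."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1**2, x2**2, s * x1, s * x2, s * x1 * x2, 1.0])

def k(x, y):
    """Kernel trick: the same dot product computed in the input space."""
    return (np.dot(x, y) + 1.0) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
lhs = np.dot(phi(x), phi(y))   # feature-space dot product
rhs = k(x, y)                  # (1*3 + 2*4 + 1)^2
```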