
## Are loss functions all the same?

Venue: Neural Computation

Citations: 12 (1 self)

### Citations

13233 | Statistical Learning Theory - Vapnik - 1998

Citation Context: ...problem is usually regarded as a computational issue (Vapnik, 1995; Vapnik, 1998; Alon et al., 1993; Cristianini and Shawe-Taylor, 2000). The technical results are usually derived in a form which makes it difficult to evaluate the role played, if any, by different l...

5410 | Convex Analysis - Rockafellar - 1970

Citation Context: ...t = w − y for regression and t = wy for classification. The basic assumption we make is that the mapping t → V(t) is convex for all t ∈ ℝ. This convexity hypothesis has two technical implications (Rockafellar, 1970). 1. A loss function is a Lipschitz function, i.e. for every M > 0 there exists a constant L_M > 0 such that |V(w_1, y) − V(w_2, y)| ≤ L_M |w_1 − w_2| for all w_1, w_2 ∈ [−M, M] and for all y ∈ Y. 2. There exi...
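The Lipschitz implication quoted above can be checked numerically. Below is a minimal sketch (all function names are hypothetical) using three standard convex losses written as functions of the classification margin t = wy, estimating the largest slope on a bounded interval [−M, M]:

```python
import numpy as np

# Three common convex losses as functions of the margin t = y*w.
def square(t):   return (1.0 - t) ** 2
def hinge(t):    return np.maximum(0.0, 1.0 - t)
def logistic(t): return np.log1p(np.exp(-t))

def lipschitz_estimate(V, M=2.0, n=2001):
    """Largest slope between adjacent grid points on [-M, M] --
    a numerical proxy for the local Lipschitz constant L_M."""
    t = np.linspace(-M, M, n)
    return np.max(np.abs(np.diff(V(t)) / np.diff(t)))

# Convexity gives a finite Lipschitz constant on every bounded interval.
for V in (square, hinge, logistic):
    assert np.isfinite(lipschitz_estimate(V))

# The hinge loss is globally 1-Lipschitz; the square loss is not
# (its slope grows linearly with |t|, so L_M grows with M).
assert lipschitz_estimate(hinge) <= 1.0 + 1e-9
assert lipschitz_estimate(square, M=10.0) > 1.0
```

Note that the constant L_M depends on the interval: it is uniform in M only for losses with bounded slope, such as the hinge and logistic losses.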

3479 | The elements of statistical learning - Hastie, Tibshirani, et al. - 2009

1297 | Theory of reproducing kernels - Aronszajn - 1950

Citation Context: ...pproach of regularization is to look for approximate solutions by setting appropriate smoothness constraints on the hypothesis space H. Within this framework, Reproducing Kernel Hilbert Space (RKHS) (Aronszajn, 1950) provides a natural choice for H (Wahba, 1990; Girosi et al., 1995; Evgeniou et al., 2000). In what follows we briefly summarize the properties of RKHSs needed in the next section. A RKHS is a Hilber...
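The defining RKHS properties mentioned in this context can be illustrated numerically. A sketch under stated assumptions (Gaussian kernel, hypothetical variable names): for f = Σ_i c_i K(x_i, ·), the kernel matrix is symmetric positive semidefinite, ‖f‖²_H = cᵀKc, and evaluation functionals are bounded by Cauchy–Schwarz, |f(x)| ≤ ‖f‖_H √K(x, x):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 sigma^2))."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

x = np.array([-1.0, 0.0, 0.5, 2.0])   # kernel centers x_i (arbitrary)
c = np.array([0.3, -1.2, 0.7, 0.1])   # expansion coefficients c_i (arbitrary)
K = gaussian_kernel(x, x)

# A valid RKHS kernel must be symmetric and positive semidefinite.
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-12

# Values of f = sum_i c_i K(x_i, .) at the centers, and the RKHS norm.
f_at_centers = K @ c
norm_sq = c @ K @ c
assert norm_sq >= 0.0

# Boundedness of point evaluation: |f(x)| <= ||f||_H * sqrt(K(x, x)).
assert np.all(np.abs(f_at_centers)
              <= np.sqrt(norm_sq) * np.sqrt(np.diag(K)) + 1e-12)
```

The last inequality is exactly the reproducing-kernel property that makes pointwise evaluation continuous, which is what distinguishes an RKHS from a generic Hilbert space of functions.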

1290 | An Introduction to Support Vector Machines - Cristianini, Shawe-Taylor - 2000

395 | Regularization theory and neural networks architectures - Girosi, Jones, et al. - 1995

Citation Context: ...1 The results obtained throughout the paper, however, hold for any probability measure on Z. 2.2 RKHS and Hypothesis Space: The approximation of f0 from a finite set of data is an ill-posed problem (Girosi et al., 1995; Evgeniou et al., 2000). The treatment of the functional and numerical pathologies due to ill-posedness can be addressed by using regularization theory. The conceptual approach of regularization is t...

366 | Regularization networks and support vector machines - Evgeniou, Pontil, et al. - 2000

Citation Context: ...d throughout the paper, however, hold for any probability measure on Z. 2.2 RKHS and Hypothesis Space: The approximation of f0 from a finite set of data is an ill-posed problem (Girosi et al., 1995; Evgeniou et al., 2000). The treatment of the functional and numerical pathologies due to ill-posedness can be addressed by using regularization theory. The conceptual approach of regularization is to look for approximate ...
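The ill-posedness described in this context, and its regularized cure, can be sketched in a few lines. This is an illustrative example, not the paper's method: kernel ridge regression (Tikhonov regularization in an RKHS), where the coefficients solve (K + λnI)c = y and λ > 0 controls the smoothness constraint:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = np.sort(rng.uniform(-1, 1, n))
y = np.sin(3 * x) + 0.1 * rng.standard_normal(n)   # noisy samples of f0

# Gaussian kernel matrix -- notoriously ill-conditioned for close points.
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)

def fit(lam):
    """Kernel ridge coefficients: c = (K + lam * n * I)^{-1} y."""
    return np.linalg.solve(K + lam * n * np.eye(n), y)

# The unregularized interpolation problem is numerically ill-posed ...
assert np.linalg.cond(K) > 1e6

# ... and regularization restores stability: with lam > 0 the coefficient
# vector (hence the RKHS norm of the solution) stays under control, while
# a near-zero lam lets the coefficients blow up along small eigendirections.
c_reg = fit(1e-3)
c_raw = np.linalg.solve(K + 1e-12 * np.eye(n), y)
assert np.linalg.norm(c_reg) < np.linalg.norm(c_raw)
```

The design point is that λ trades data fit against the smoothness penalty; the "appropriate smoothness constraints" in the quoted passage correspond to bounding the RKHS norm of candidate solutions.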

329 | On the mathematical foundations of learning - Cucker, Smale - 2002

Citation Context: ...does not depend on the data – is the approximation error. In this section we provide a bound on the estimation error for all loss functions through a rather straightforward extension of Theorem C in (Cucker and Smale, 2002b). We let N(ɛ) be the covering number of H_R (which is well defined because H_R is a compact subset of C(X)) and start by proving the following sufficient condition for uniform convergence from which t...

189 | Spline Models for Observational Data, volume 59 - Wahba - 1990

Citation Context: ...te solutions by setting appropriate smoothness constraints on the hypothesis space H. Within this framework, Reproducing Kernel Hilbert Space (RKHS) (Aronszajn, 1950) provides a natural choice for H (Wahba, 1990; Girosi et al., 1995; Evgeniou et al., 2000). In what follows we briefly summarize the properties of RKHSs needed in the next section. A RKHS is a Hilbert space H characterized by a symmetric positiv...

158 | Statistical behavior and consistency of classification methods based on convex risk minimization - Zhang - 2004

Citation Context: ...a consistency property supporting the meaningfulness of the convexity assumption at the basis of our study. This property is related to the problem of the Bayes consistency (Lugosi and Vayatis, 2003; Zhang, 2003). The main outcome of our analysis is that, for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate which is practica...

51 | Best choices for regularization parameters in learning theory: On the bias-variance problem - Cucker, Smale

22 | Statistical properties and adaptive tuning of support vector machines - Lin, Wahba, et al. - 2002

Citation Context: ...mations of the 0 − 1 loss, lead to consistent results. Therefore, our result can be interpreted as a consistency property shared by all convex loss functions. It can be shown that for the hinge loss (Lin et al., 2003) I[f0] = I[fb]. (11) By directly computing f0 for different loss functions (see Hastie et al. (2001), pp. 381, for example) it is easy to prove that this result does not hold for the other loss funct...
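The computation of f0 alluded to in this context can be reproduced directly. A sketch under stated assumptions (binary labels y ∈ {−1, +1}, p = P(y = 1 | x)): the target f0(x) minimizes the conditional expected loss p·V(w) + (1 − p)·V(−w) pointwise, and the three classical losses give three different minimizers:

```python
import numpy as np

def pointwise_minimizer(V, p, grid=np.linspace(-5, 5, 100001)):
    """Grid-search minimizer of the conditional risk p*V(w) + (1-p)*V(-w)."""
    risk = p * V(grid) + (1 - p) * V(-grid)
    return grid[np.argmin(risk)]

square = lambda t: (1 - t) ** 2          # least squares
hinge = lambda t: np.maximum(0, 1 - t)   # SVM
logistic = lambda t: np.log1p(np.exp(-t))

p = 0.8
w_sq = pointwise_minimizer(square, p)    # square loss:   f0 = 2p - 1
w_hi = pointwise_minimizer(hinge, p)     # hinge loss:    f0 = sign(2p - 1)
w_lo = pointwise_minimizer(logistic, p)  # logistic loss: f0 = log(p/(1-p))

assert abs(w_sq - (2 * p - 1)) < 1e-3
assert abs(w_hi - 1.0) < 1e-3
assert abs(w_lo - np.log(p / (1 - p))) < 1e-3
```

Only the hinge loss yields the Bayes rule fb = sign(2p − 1) itself, which is the content of equation (11) above: I[f0] = I[fb] holds for the hinge loss but not for the square or logistic losses, whose minimizers merely share the sign of fb.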

8 | On the Bayes risk consistency of regularized boosting methods - Lugosi, Vayatis - 2003

Citation Context: ...his can be interpreted as a consistency property supporting the meaningfulness of the convexity assumption at the basis of our study. This property is related to the problem of the Bayes consistency (Lugosi and Vayatis, 2003; Zhang, 2003). The main outcome of our analysis is that, for classification, the hinge loss appears to be the loss of choice. Other things being equal, the hinge loss leads to a convergence rate whic...

5 | A note on different covering numbers in learning theory - Pontil

Citation Context: ...ent notion of covering number that depends on the given sample is considered. The relation between these two complexity measures of hypothesis space has been investigated by some authors (Zhou, 2002; Pontil, 2003). In particular, from the results in Pontil (2003) the generalization of our proof to the case of data dependent covering number does not seem straightforward. We are now in a position to generalize ...

2 | Notes on the use of different loss functions - Rosasco, De Vito, et al.

1 | Scale-sensitive dimensions, uniform convergence and learnability - Alon, Ben-David, et al. - 1993

Citation Context: ...problem is usually regarded as a computational issue (Vapnik, 1995; Vapnik, 1998; Alon et al., 1993; Cristianini and Shawe-Taylor, 2000). The technical results are usually derived in a form which makes it difficult to evaluate the role played, if any, by different loss functions. The aim of this pa...