Online Learning with Kernels
, 2003
Cited by 1512 (112 self)
Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds, and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition ...
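To make "stochastic gradient descent within a feature space" concrete, here is a minimal sketch for binary classification, assuming a Gaussian RBF kernel, the hinge loss max(0, 1 - y f(x)) and a fixed learning rate. The hypothesis is kept as a kernel expansion f(x) = Σ α_i k(x_i, x); each step shrinks all coefficients (the gradient of the regulariser λ/2 ||f||²) and, if the margin is violated, adds the current point as a new centre. The class name `OnlineKernelClassifier` and all parameter values are hypothetical illustrations, not the paper's code.

```python
import math


def rbf(x, z, gamma=1.0):
    # Gaussian RBF kernel k(x, z) = exp(-gamma * ||x - z||^2); an assumed choice of kernel
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class OnlineKernelClassifier:
    """Hypothetical sketch: stochastic gradient descent in an RKHS with hinge loss."""

    def __init__(self, eta=0.5, lam=0.01, gamma=1.0):
        self.eta = eta      # learning rate (illustrative value)
        self.lam = lam      # regularisation constant lambda (illustrative value)
        self.gamma = gamma  # RBF kernel width
        self.support = []   # stored inputs x_i
        self.alpha = []     # expansion coefficients alpha_i

    def decision(self, x):
        # f(x) = sum_i alpha_i k(x_i, x)
        return sum(a * rbf(s, x, self.gamma) for a, s in zip(self.alpha, self.support))

    def update(self, x, y):
        # One SGD step on lam/2 ||f||^2 + max(0, 1 - y f(x)), with y in {-1, +1}
        margin = y * self.decision(x)  # evaluated at the current hypothesis f_t
        # The regulariser's gradient shrinks every existing coefficient
        self.alpha = [(1.0 - self.eta * self.lam) * a for a in self.alpha]
        if margin < 1.0:
            # Hinge loss is active: x becomes a new kernel centre with weight eta * y
            self.support.append(x)
            self.alpha.append(self.eta * y)
        return margin
```

Every margin violation adds a stored term, so the expansion grows with the stream; a practical online learner needs some way of bounding the number of stored terms, which this sketch does not attempt.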
A tutorial on support vector regression
, 2004
Cited by 308 (1 self)
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from an SV perspective.
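For orientation, the "quadratic (or convex) programming part" refers to a problem of the following kind. This is the standard ε-SV regression primal as it is commonly written for a linear model f(x) = ⟨w, x⟩ + b; the symbols ℓ (number of training pairs), C (trade-off constant), ε (tube width) and the slack variables ξ_i, ξ_i^* are the usual conventions and are not taken from the abstract itself.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad & \tfrac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{\ell}\left(\xi_i + \xi_i^*\right)\\
\text{subject to}\quad & y_i - \langle w, x_i\rangle - b \le \varepsilon + \xi_i,\\
& \langle w, x_i\rangle + b - y_i \le \varepsilon + \xi_i^*,\\
& \xi_i,\ \xi_i^* \ge 0, \qquad i = 1,\dots,\ell.
\end{aligned}
```

Deviations smaller than ε cost nothing, larger ones are penalised linearly through the slacks, and C > 0 balances this against the flatness term ½||w||².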
Predicting Time Series with Support Vector Machines
, 1997
Cited by 96 (11 self)
Support Vector Machines are used for time series prediction and compared to radial basis function networks. We make use of two different cost functions for Support Vectors: training with (i) an ε-insensitive loss and (ii) Huber's robust loss function, and discuss how to choose the regularization parameters in these models. Two applications are considered: data from (a) a noisy (normal and uniform noise) Mackey-Glass equation and (b) the Santa Fe competition (set D). In both cases Support Vector Machines show excellent performance. In case (b) the Support Vector approach improves the best known result on the benchmark by 29%.

1 Introduction

Support Vector Machines have become a subject of intensive study (see e.g. [3, 14]). They have been applied successfully to classification tasks such as OCR [14, 11] and more recently also to regression [5, 15]. In this contribution we use Support Vector Machines in the field of time series prediction and we find that they show an excellent ...
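The two cost functions named above can be written down directly as functions of the residual r = y - f(x). This is a minimal sketch; the parameter names `eps` and `delta` and their defaults are illustrative assumptions. The Huber form used here (½r² inside, linear outside) is one common parameterisation, and the scaling in the paper may differ.

```python
def eps_insensitive(residual, eps=0.1):
    # Vapnik's epsilon-insensitive loss: zero inside the tube |r| <= eps, linear outside
    return max(0.0, abs(residual) - eps)


def huber(residual, delta=1.0):
    # Huber's robust loss: quadratic for |r| <= delta, linear beyond it,
    # so large residuals (outliers) have bounded influence on the fit
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

Both losses grow only linearly for large residuals, which is what makes them less sensitive to outliers than the squared loss; the ε-insensitive loss additionally ignores small errors, which is what produces sparse Support Vector expansions.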
On a Kernel-based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion
, 1997
Cited by 67 (22 self)
We present a kernel-based framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridge regression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We show connections between the cost function and some properties previously believed to apply only to Support Vector Machines. The optimal solution of all the problems described above can be found by solving a simple quadratic programming problem. The paper closes with a proof of the equivalence between Support Vector kernels and Green's functions of regularization operators.
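Ridge regression is listed above as one of the special cases. For readers who want a concrete instance, here is a minimal sketch of standard kernel ridge regression on scalar inputs: the coefficients solve (K + λI)α = y and predictions are f(x) = Σ α_i k(x_i, x). This is the textbook method, not the paper's general framework; the RBF kernel, the helper names and the value of λ (`lam`) are illustrative assumptions.

```python
import math


def rbf(x, z, gamma=1.0):
    # Gaussian RBF kernel on scalar inputs (an assumed choice of kernel)
    return math.exp(-gamma * (x - z) ** 2)


def solve(a, b):
    # Gaussian elimination with partial pivoting for the linear system a x = b
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def kernel_ridge_fit(xs, ys, lam=1e-3, gamma=1.0):
    # alpha = (K + lam * I)^{-1} y, where K_ij = k(x_i, x_j)
    n = len(xs)
    k = [[rbf(xs[i], xs[j], gamma) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(k, ys)


def kernel_ridge_predict(xs, alpha, x, gamma=1.0):
    # f(x) = sum_i alpha_i k(x_i, x)
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, xs))
```

With a very small λ the fit nearly interpolates the training targets; larger λ trades accuracy on the training points for a smoother function.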
Bayesian Statistics
- in WWW', Computing Science and Statistics
, 1989
Cited by 13 (0 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection, and the second is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion that minimizes the Kullback-Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback-Leibler distance to the true model, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), minimum description length (MDL), and the bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback-Leibler distance through bias reduction. This bias, which is inevitable in model
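Convex hull peeling depth, the second topic above, assigns each point the index of the hull layer on which it is removed: the outermost hull is layer 1, the hull of what remains is layer 2, and so on. Here is a minimal two-dimensional sketch using Andrew's monotone chain for the hulls. The function names are hypothetical, and treating points that lie on a hull edge (but are not vertices) as belonging to the next layer is an assumed convention; definitions in the literature vary.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    # Andrew's monotone chain; returns hull vertices, dropping collinear edge points
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def peeling_depth(points):
    # Repeatedly strip the convex hull; depth 1 is the outermost layer
    remaining = set(points)
    depth = {}
    layer = 1
    while remaining:
        for p in convex_hull(list(remaining)):
            depth[p] = layer
            remaining.discard(p)
        layer += 1
    return depth
```

Each pass removes at least one point, so the loop terminates; the points with the largest depth form the innermost layer, which is what a peeling-based location estimate is built from.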
On Computing Geometric Estimators of Location
, 2001
Cited by 4 (0 self)
Let S be a data set of n points in $\mathbb{R}^d$, and consider a point in $\mathbb{R}^d$ which "best" describes S. Since the term "best" is subjective, there exist several definitions for finding such a point. However, it is generally agreed that such a definition, or estimator of location, should have certain statistical properties which make it robust. Most estimators of location assign a depth value to any point in $\mathbb{R}^d$ and define the estimate to be a point with maximum depth. Here, new results are presented concerning the computational complexity of estimators of location. We prove that in $\mathbb{R}^2$ the computation of simplicial and halfspace depth of a point requires $\Omega(n \log n)$ time, which matches the upper bound complexities of algorithms by Rousseeuw and Ruts. Our lower bounds also apply to two sign tests, that of Hodges and that of Oja and Nyblom. In addition, we propose algorithms which reduce the time complexity of calculating the points with greatest Oja and simplicial depth. Our fastest algorithms use $O(n^3 \log n)$ and $O(n^4)$ time respectively, compared to the algorithms of Rousseeuw and Ruts which use $O(n^5 \log n)$ time. One of our algorithms may also be used to find a point with minimum weighted sum of distances to a set of n lines in $O(n^2)$ time. This point is called the Fermat-Torricelli point of n lines by Roy Barbara, whose algorithm uses $O(n^3)$ time. Finally, we propose a new estimator which arises from the notion of hyperplane depth recently defined by Rousseeuw and Hubert.
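For reference, the simplicial depth of a point p with respect to a planar data set counts the triangles spanned by three data points that contain p. The brute-force sketch below is deliberately the naive $O(n^3)$ method, not the $O(n \log n)$ algorithm of Rousseeuw and Ruts discussed above; it is only meant to make the definition concrete. Counting closed triangles (so boundary points count as contained) is an assumed convention, and the function names are hypothetical.

```python
from itertools import combinations


def orient(a, b, c):
    # Twice the signed area of triangle (a, b, c); sign gives the turn direction
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_closed_triangle(p, a, b, c):
    # p is inside or on the boundary iff the three orientations never disagree in sign
    d1, d2, d3 = orient(a, b, p), orient(b, c, p), orient(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)


def simplicial_depth(p, points):
    # Naive O(n^3) count of data triangles containing p
    return sum(1 for a, b, c in combinations(points, 3)
               if in_closed_triangle(p, a, b, c))
```

Dividing the count by C(n, 3) gives the depth as a fraction; the deepest point under this definition is the simplicial median whose computation the abstract addresses.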
A Practical Procedure for Public Policy Decisions or Contingent Valuation and Demand Revelation -- Without Apology
, 1993
Cited by 2 (0 self)
This paper examines procedures based on expressions of willingness to pay to determine the provision of public goods such as environmental amenities. In addition to classical efficiency and majority political support criteria, some issues related to the practical implementability of such mechanisms are considered, including the incentives agents have to report truthfully and the possibility of sampling a subset of the population of agents. In particular, we suggest a decision procedure that is an alternative to those based on estimating either the mean or median willingness to pay, as proposed in conventional Contingent Valuation studies. We show that this mechanism, which is based on an average of conditional median values of willingness to pay, has many desirable properties. While maintaining incentives for agents to truthfully report their valuation of a public good, the mechanism limits the effects of outliers and broadens the political acceptability of adopted policies. In addition, the mechanism has better statistical properties than conventional Contingent Valuation procedures when only a sample of individuals is queried for their willingness to pay.
Bayesian kernel methods
- LNAI 2600
, 2003
Cited by 2 (0 self)
Bayesian methods allow for a simple and intuitive representation of the function spaces used by kernel methods. This chapter describes the basic principles of Gaussian Processes, their implementation and their connection to other kernel-based Bayesian estimation methods, such as the Relevance Vector Machine.
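As a concrete instance of the Gaussian Process principles mentioned above, here is a minimal sketch of GP regression on scalar inputs: with noise variance σ², the predictive mean is k_*ᵀ(K + σ²I)⁻¹y and the predictive variance of the latent function is k(x_*, x_*) - k_*ᵀ(K + σ²I)⁻¹k_*, both computed through a Cholesky factor. The squared-exponential kernel, its `length` parameter, the `noise` value and all function names are illustrative assumptions, not taken from the chapter.

```python
import math


def rbf(x, z, length=1.0):
    # Squared-exponential covariance (an assumed choice)
    return math.exp(-0.5 * (x - z) ** 2 / length ** 2)


def cholesky(a):
    # Lower-triangular L with L L^T = a (a must be symmetric positive definite)
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def forward(l, b):
    # Solve L x = b
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(l[i][k] * x[k] for k in range(i))) / l[i][i])
    return x


def backward(l, b):
    # Solve L^T x = b
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def gp_posterior(xs, ys, x_star, noise=1e-2, length=1.0):
    # Predictive mean and latent variance at x_star; noise is the variance sigma^2
    n = len(xs)
    k = [[rbf(xs[i], xs[j], length) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    l = cholesky(k)
    alpha = backward(l, forward(l, ys))  # (K + sigma^2 I)^{-1} y
    k_star = [rbf(xi, x_star, length) for xi in xs]
    mean = sum(a * ks for a, ks in zip(alpha, k_star))
    v = forward(l, k_star)
    var = rbf(x_star, x_star, length) - sum(vi * vi for vi in v)
    return mean, var
```

Near the training inputs the variance collapses towards the noise level, while far away it returns to the prior variance k(x_*, x_*) = 1, which is the kind of uncertainty information a Bayesian treatment adds on top of the point prediction.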
M-ESTIMATION OF LINEAR MODELS WITH DEPENDENT ERRORS
, 2008
Cited by 2 (0 self)
We study the asymptotic behavior of M-estimates of regression parameters in multiple linear models where errors are dependent random variables. A Bahadur representation of the M-estimates is derived and a central limit theorem is established. The results are applied to linear models with errors being short-range dependent linear processes, heavy-tailed linear processes and some widely used nonlinear time series.
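The paper concerns the asymptotic theory of M-estimates rather than their computation. For readers unfamiliar with M-estimation, here is a minimal sketch of computing one: a Huber M-estimate of intercept and slope, obtained by iteratively reweighted least squares (IRLS). The tuning constant c = 1.345, the fixed iteration count and all function names are illustrative assumptions; the sketch also assumes residuals are already on a unit scale, whereas a practical implementation would rescale them by a robust scale estimate.

```python
def huber_weight(r, c=1.345):
    # psi(r) / r for Huber's psi function: 1 inside [-c, c], c / |r| outside
    return 1.0 if abs(r) <= c else c / abs(r)


def wls_line(xs, ys, ws):
    # Weighted least-squares fit of y = b0 + b1 * x
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def huber_line(xs, ys, c=1.345, iters=100):
    # IRLS: start from ordinary least squares, then reweight by Huber weights
    b0, b1 = wls_line(xs, ys, [1.0] * len(xs))
    for _ in range(iters):
        res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        ws = [huber_weight(r, c) for r in res]
        b0, b1 = wls_line(xs, ys, ws)
    return b0, b1
```

On data lying on a line with one gross outlier, ordinary least squares is dragged far from the true slope, while the Huber estimate stays close to it because each residual's influence is capped at c.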

