• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Regression estimation with support vector learning machines (1996)

by A J Smola
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 27
Next 10 →

A tutorial on support vector regression

by Alex J. Smola, Bernhard Schölkopf , 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract - Cited by 308 (1 self) - Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.

Support Vector Machines for Classification and Regression

by Steve R. Gunn - UNIVERSITY OF SOUTHAMPTON, TECHNICAL REPORT , 1998
"... The problem of empirical data modelling is germane to many engineering applications. In empirical data modelling a process of induction is used to build up a model of the system, from which it is hoped to deduce responses of the system that have yet to be observed. Ultimately the quantity and qualit ..."
Abstract - Cited by 125 (2 self) - Add to MetaCart
The problem of empirical data modelling is germane to many engineering applications. In empirical data modelling a process of induction is used to build up a model of the system, from which it is hoped to deduce responses of the system that have yet to be observed. Ultimately the quantity and quality of the observations govern the performance of this empirical model. By its observational nature data obtained is finite and sampled; typically this sampling is non-uniform and due to the high dimensional nature of the problem the data will form only a sparse distribution in the input space. Consequently the problem is nearly always ill posed (Poggio et al., 1985) in the sense of Hadamard (Hadamard, 1923). Traditional neural network approaches have suffered difficulties with generalisation, producing models that can overfit the data. This is a consequence of the optimisation algorithms used for parameter selection and the statistical measures used to select the ’best’ model. The foundations of Support Vector Machines (SVM) have been developed by Vapnik (1995) and are gaining popularity due to many attractive features, and promising empirical performance. The formulation embodies the Structural Risk Minimisation (SRM) principle, which has been shown to be superior, (Gunn et al., 1997), to traditional Empirical Risk Minimisation (ERM) principle, employed by conventional neural networks. SRM minimises an upper bound on the expected risk, as opposed to ERM that minimises the error on the training data. It is this difference which equips SVM with a greater ability to generalise, which is the goal in statistical learning. SVMs were developed to solve the classification problem, but recently they have been extended to the domain of regression problems (Vapnik et al., 1997). In the literature the terminology for SVMs can be slightly confusing. The term SVM is typically used to describe classification with support vector methods and support vector regression is used to describe regression with support vector methods. In this report the term SVM will refer to both classification and regression methods, and the terms Support Vector Classification (SVC) and Support Vector Regression (SVR) will be used for specification. This section continues with a brief introduction to the structural risk

Query Learning with Large Margin Classifiers

by Colin Campbell, Nello Cristianini, Alex Smola , 2000
"... The active selection of instances can significantly improve the generalisation performance of a learning machine. Large margin classifiers such as Support Vector Machines classify data using the most informative instances (the support vectors). This makes them natural candidates for instance s ..."
Abstract - Cited by 92 (1 self) - Add to MetaCart
The active selection of instances can significantly improve the generalisation performance of a learning machine. Large margin classifiers such as Support Vector Machines classify data using the most informative instances (the support vectors). This makes them natural candidates for instance selection strategies. In this paper we propose an algorithm for the training of Support Vector Machines using instance selection. We give a theoretical justification for the strategy and experimental results on real and artificial data demonstrating its effectiveness. The technique is most efficient when the dataset can be learnt using few support vectors. 1. Introduction The labour-intensive task of labelling data is a serious bottleneck for many data mining tasks. Often cost or time constraints mean that only a fraction of the available instances can be labeled. For this reason there has been increasing interest in the problem of handling partially labeled datasets. One approach ...

On a Kernel-based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion

by Alex J. Smola, Bernhard Schölkopf , 1997
"... We present a Kernel--based framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridge-regression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connection ..."
Abstract - Cited by 67 (22 self) - Add to MetaCart
We present a Kernel--based framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridge-regression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connections between the cost-function and some properties up to now believed to apply to Support Vector Machines only. The optimal solution of all the problems described above can be found by solving a simple quadratic programming problem. The paper closes with a proof of the equivalence between Support Vector kernels and Greene's functions of regularization operators.

Robust Linear and Support Vector Regression

by Olvi L. Mangasarian, David R. Musicant - IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE , 2000
"... ..."
Abstract - Cited by 19 (1 self) - Add to MetaCart
Abstract not found

Regression Models for Ordinal Data: A Machine Learning Approach

by Ralf Herbrich, Thore Graepel, Klaus Obermayer , 1999
"... In contrast to the standard machine learning tasks of classification and metric regression we investigate the problem of predicting variables of ordinal scale, a setting referred to as ordinal regression. The task of ordinal regression arises frequently in the social sciences and in information retr ..."
Abstract - Cited by 12 (3 self) - Add to MetaCart
In contrast to the standard machine learning tasks of classification and metric regression we investigate the problem of predicting variables of ordinal scale, a setting referred to as ordinal regression. The task of ordinal regression arises frequently in the social sciences and in information retrieval where human preferences play a major role. Also many multi--class problems are really problems of ordinal regression due to an ordering of the classes. Although the problem is rather novel to the Machine Learning Community it has been widely considered in Statistics before. All the statistical methods rely on a probability model of a latent (unobserved) variable and on the condition of stochastic ordering. In this paper we develop a distribution independent formulation of the problem and give uniform bounds for our risk functional. The main difference to classification is the restriction that the mapping of objects to ranks must be transitive and asymmetric. Combining our theoretical framework with results from measurement theory we present an approach that is based on a mapping from objects to scalar utility values and thus guarantees transitivity and asymmetry. Applying the principle of Structural Risk Minimization as employed in Support Vector Machines we derive a new learning algorithm based on large margin rank boundaries for the task of ordinal regression. Our method is easily extended to nonlinear utility functions. We give experimental results for an Information Retrieval task of learning the order of documents with respect to an initial query. Moreover, we show that our algorithm outperforms more naive approaches to ordinal regression such as Support Vector Classification and Support Vector Regression in the case of more than two ranks.

Model induction with support vector machines

by Yonas B. Dibike, Slavco Velickov, Dimitri Solomatine, Michael B. Abbott - Introduction and PWASET VOLUME 26 DECEMBER 2007 ISSN 1307-6884 797 © 2007 WASET.ORG OF WORLD ACADEMY OF SCIENCE, ENGINEERING AND TECHNOLOGY VOLUME 26 DECEMBER 2007 ISSN 1307-6884 applications.”Journal of Computing in Civil Engineering, ASCE , 2001
"... ..."
Abstract - Cited by 8 (0 self) - Add to MetaCart
Abstract not found

Support Vector Machines for Phoneme Classification

by Jesper Salomon , 2001
"... In this thesis, Support Vector Machines (SVMs) are applied to the problem of phoneme classification. Given a sequence of acoustic observations and 40 phoneme targets, the task is to classify each observation to one of these targets. Since this task involves multiple classes, one of the main hurdles ..."
Abstract - Cited by 6 (0 self) - Add to MetaCart
In this thesis, Support Vector Machines (SVMs) are applied to the problem of phoneme classification. Given a sequence of acoustic observations and 40 phoneme targets, the task is to classify each observation to one of these targets. Since this task involves multiple classes, one of the main hurdles SVMs must overcome is to extend the inherently binary SVMs to the multi-class case. To do this, several methods are proposed, and their generalisation abilities are measured. It is found that even though some generalisation is lost in the transition, this can still lead to effective classifiers. In addition, a refinement to the SVMs is made to derive estimated posterior probabilities from classifications. Since almost all speech recognition systems are based on statistical models, this is necessary if SVMs are to be used in a full speech recognition system. The best accuracy found was 71.4%, which is competitive with the best results found in literature.

Mathematical Programming Approaches To Machine Learning And Data Mining

by Paul S. Bradley , 1998
"... Machine learning problems of supervised classification, unsupervised clustering and parsimonious approximation are formulated as mathematical programs. The feature selection problem arising in the supervised classification task is effectively addressed by calculating a separating plane by minimizing ..."
Abstract - Cited by 5 (0 self) - Add to MetaCart
Machine learning problems of supervised classification, unsupervised clustering and parsimonious approximation are formulated as mathematical programs. The feature selection problem arising in the supervised classification task is effectively addressed by calculating a separating plane by minimizing separation error and the number of problem features utilized. The support vector machine approach is formulated using various norms to measure the margin of separation. The clustering problem of assigning m points in n-dimensional real space to k clusters is formulated as minimizing a piecewise-linear concave function over a polyhedral set. This problem is also formulated in a novel fashion by minimizing the sum of squared distances of data points to nearest cluster planes characterizing the k clusters. The problem of obtaining a parsimonious solution to a linear system where the right hand side vector may be corrupted by noise is formulated as minimizing the system residual plus either the number of nonzero elements in the solution vector or the norm of the solution vector. The feature selection problem, the clustering problem and the parsimonious approximation problem can all be stated as the minimization of a concave function over a polyhedral region and are solved by a theoretically justifiable, fast and finite successive linearization algorithm. Numerical tests indicate the utility and efficiency of these formulations on real-world databases. In particular, the feature selection approach via concave minimization computes a separating-plane based classifier that improves upon the generalization ability of a separating plane computed without feature suppression. This approach produces ii classifiers utilizing fewer original problem features than the support vector machin...

Data Mining Via Mathematical Programming And Machine Learning

by David R. Musicant , 2000
"... This work explores solving large-scale data mining problems through the use of mathe- matical programming methods. In particular, algorithms are proposed for the support vector machine (SVM) classification problem, which consists of constructing a separating surface that can discriminate between poi ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
This work explores solving large-scale data mining problems through the use of mathe- matical programming methods. In particular, algorithms are proposed for the support vector machine (SVM) classification problem, which consists of constructing a separating surface that can discriminate between points from one of two classes. An algorithm based on successive overrelaxation (SOR) is presented which can process very large datasets that need not reside in memory. Concepts from generalized SVMs are combined with SOR and with linear programming to find nonlinear separating surfaces. An "active set" strategy is used to generate a fast algorithm that consists of solving a finite num- ber of linear equations of the order of the dimensionality of the original input space at each step. This ASVM active set algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available. An implicit Lagrangian for the dual of an SVM is used to lead to the simple linearly conver- gent Lagrangian SVM (LSVM) algorithm. LSVM requires the inversion at the outset of a single (typically small) matrix, and the full algorithm is given in 11 lines of MATLAB code.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University