Results 1–10 of 22
Large Margin Classification Using the Perceptron Algorithm
 Machine Learning
, 1998
Abstract

Cited by 415 (1 self)
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement, and much more efficient in terms of computation time. We also show that our algorithm can be efficiently used in very high dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximal-margin classifiers on the same problem, while saving significantly on computation time and programming effort. 1 Introduction One of the most influential developments in the theory of machine learning in the last few years is Vapnik's work on supp...
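The Rosenblatt update this paper builds on can be sketched in a few lines. This is a minimal illustration of the classic perceptron only, not the paper's voted or kernelized variants; the toy data and names are made up for the example.

```python
import numpy as np

def perceptron(X, y, epochs=10):
    """Classic Rosenblatt perceptron for labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            # Mistake-driven: update only when the sign disagrees.
            if y_i * (np.dot(w, x_i) + b) <= 0:
                w += y_i * x_i
                b += y_i
    return w, b

# Linearly separable toy data (hypothetical).
X = np.array([[2.0, 1.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
```

The paper's contribution is what to do with the sequence of mistake-driven hypotheses this loop produces (leave-one-out voting), and how the kernel trick applies to the dual form.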
Ultraconservative Online Algorithms for Multiclass Problems
 Journal of Machine Learning Research
, 2001
Abstract

Cited by 249 (23 self)
In this paper we study online classification algorithms for multiclass problems in the mistake bound model. The hypotheses we use maintain one prototype vector per class. Given an input instance, a multiclass hypothesis computes a similarity score between each prototype and the input instance and then sets the predicted label to be the index of the prototype achieving the highest similarity. To design and analyze the learning algorithms in this paper we introduce the notion of ultraconservativeness. Ultraconservative algorithms are algorithms that update only the prototypes attaining similarity scores which are higher than the score of the correct label's prototype. We start by describing a family of additive ultraconservative algorithms where each algorithm in the family updates its prototypes by finding a feasible solution for a set of linear constraints that depend on the instantaneous similarity scores. We then discuss a specific online algorithm that seeks a set of prototypes which have a small norm. The resulting algorithm, which we term MIRA (for Margin Infused Relaxed Algorithm), is ultraconservative as well. We derive mistake bounds for all the algorithms and provide further analysis of MIRA using a generalized notion of the margin for multiclass problems.
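The simplest member of this prototype-per-class family is the multiclass perceptron, which updates only the correct prototype and the single wrongly winning one. A sketch of that one step follows; it is an assumed simplification for illustration, not MIRA itself (MIRA distributes the update and enforces norm constraints).

```python
import numpy as np

def multiclass_perceptron_step(W, x, y):
    """One ultraconservative-style update: only the correct class's
    prototype and the (wrongly) highest-scoring prototype move."""
    scores = W @ x                     # similarity score per class
    y_hat = int(np.argmax(scores))     # predicted label
    if y_hat != y:
        W[y] += x       # pull the correct prototype toward x
        W[y_hat] -= x   # push the offending prototype away
    return W

W = np.zeros((3, 2))                   # 3 classes, 2 features (toy sizes)
W = multiclass_perceptron_step(W, np.array([1.0, 0.0]), y=2)
```

Prototypes for classes whose score did not exceed the correct label's score are left untouched, which is exactly the ultraconservativeness property the abstract defines.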
Controlling the Sensitivity of Support Vector Machines
 Proceedings of the International Joint Conference on AI
, 1999
Abstract

Cited by 68 (4 self)
For many applications it is important to accurately distinguish false negative results from false positives. This is particularly important for medical diagnosis where the correct balance between sensitivity and specificity plays an important role in evaluating the performance of a classifier. In this paper we discuss two schemes for adjusting the sensitivity and specificity of Support Vector Machines and the description of their performance using receiver operating characteristic (ROC) curves. We then illustrate their use on real-life medical diagnostic tasks. 1 Introduction. Since their introduction by Vapnik and co-workers [Vapnik, 1995; Cortes and Vapnik, 1995], Support Vector Machines (SVMs) have been successfully applied to a number of real world problems such as handwritten character and digit recognition [Schölkopf, 1997; Cortes, 1995; LeCun et al., 1995; Vapnik, 1995], face detection [Osuna et al., 1997] and speaker identification [Schmidt, 1996]. They exhibit a r...
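Without reproducing the paper's two specific schemes, the basic mechanism behind an ROC curve for a trained SVM can be sketched: sweeping the decision threshold (equivalently, shifting the bias b) trades sensitivity against specificity, and each threshold yields one (FPR, TPR) point. The scores and labels below are made up for illustration.

```python
import numpy as np

def roc_points(scores, y, thresholds):
    """Sweep the decision threshold over real-valued SVM outputs and
    record (false-positive rate, true-positive rate) pairs."""
    pts = []
    for t in thresholds:
        pred = scores >= t
        tpr = np.mean(pred[y == 1])    # sensitivity
        fpr = np.mean(pred[y == 0])    # 1 - specificity
        pts.append((fpr, tpr))
    return pts

# Hypothetical decision values f(x) = w.x + b and true labels.
scores = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 1, 0, 1, 1])
pts = roc_points(scores, y, thresholds=[-3.0, 0.0, 3.0])
```

A very low threshold classifies everything positive (point (1, 1)); a very high one classifies everything negative (point (0, 0)); intermediate thresholds trace the curve between them.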
From Margin To Sparsity
 In Advances in Neural Information Processing Systems 13
, 2001
Abstract

Cited by 22 (3 self)
We present an improvement of Novikoff's perceptron convergence theorem. Reinterpreting this mistake bound as a margin-dependent sparsity guarantee allows us to give a PAC-style generalisation error bound for the classifier learned by the dual perceptron learning algorithm. The bound value crucially depends on the margin a support vector machine would achieve on the same data set using the same kernel. Ironically, the bound yields better guarantees than are currently available for the support vector solution itself. 1 Introduction In the last few years there has been considerable controversy about the significance of the attained margin, i.e. the smallest real-valued output of a classifier before thresholding, as an indicator of generalisation performance. Results in the VC, PAC and luckiness frameworks seem to indicate that a large margin is a prerequisite for small generalisation error bounds (see [13, 11]). These results caused many researchers to focus on large margin method...
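For context, the classical Novikoff theorem being improved here is usually stated as follows (in standard notation, not necessarily the paper's):

```latex
\text{If } \|x_i\| \le R \text{ for all } i \text{ and there exists a unit
vector } u \text{ with } y_i \langle u, x_i \rangle \ge \gamma > 0,
\text{ then the perceptron makes at most }
k \le \left(\frac{R}{\gamma}\right)^2 \text{ mistakes.}
```

Since each mistake adds one example to the dual expansion, the same quantity $(R/\gamma)^2$ bounds the number of nonzero dual coefficients, which is the sparsity reading the abstract exploits.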
Efficient SVM Regression Training with SMO
, 2001
Abstract

Cited by 14 (2 self)
The sequential minimal optimization algorithm (SMO) has been shown to be an effective method for training support vector machines.
Learning classifiers from distributed, semantically heterogeneous, autonomous data sources
, 2004
Abstract

Cited by 9 (3 self)
Recent advances in computing, communications, and digital storage technologies, together with development of high-throughput data acquisition technologies, have made it possible to gather and store large volumes of data in digital form. These developments have resulted in unprecedented opportunities for large-scale data-driven knowledge acquisition with the potential for fundamental gains in scientific understanding (e.g., characterization of macromolecular structure-function relationships in biology) in many data-rich domains. In such applications,
the data sources of interest are typically physically distributed, semantically heterogeneous and autonomously owned and operated, which makes it impossible to use traditional machine learning algorithms for knowledge acquisition.
However, we observe that most learning algorithms use only certain statistics computed from the data in the process of generating the hypothesis that they output, and we use this observation to design a general strategy for transforming traditional algorithms for learning from data into algorithms for learning from distributed data. The resulting algorithms are provably exact in that the classifiers produced by them are identical to those obtained by the corresponding algorithms in the centralized setting (i.e., when all of the data is available in a central location) and they compare favorably to their centralized counterparts in terms of time and communication complexity.
To deal with the semantic heterogeneity problem, we introduce ontology-extended data sources and define a user perspective consisting of an ontology and a set of interoperation constraints between data source ontologies and the user ontology. We show how these constraints can be used to define mappings and conversion functions needed to answer statistical queries from semantically heterogeneous data viewed from a certain user perspective. This is further used to extend our approach for learning from distributed data into a theoretically sound approach to learning from semantically heterogeneous data.
The work described above contributed to the design and implementation of AirlDM, a collection of data-source-independent machine learning algorithms built by means of sufficient statistics and data source wrappers, and to the design of INDUS, a federated, query-centric system for knowledge acquisition from distributed, semantically heterogeneous, autonomous data sources.
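The sufficient-statistics observation at the heart of this approach can be illustrated concretely. Many classifiers (e.g., Naive Bayes) need only (feature value, label) co-occurrence counts; counts computed locally at each site and then summed are identical to counts computed centrally, so the learned classifier is exact. The sites and records below are hypothetical, and this sketch is not the AirlDM or INDUS implementation.

```python
from collections import Counter

def local_counts(records):
    """Sufficient statistics a site can compute locally:
    (feature_value, label) co-occurrence counts."""
    c = Counter()
    for x, label in records:
        c[(x, label)] += 1
    return c

# Two physically distributed (hypothetical) data sources.
site_a = [("red", 1), ("blue", 0)]
site_b = [("red", 1), ("red", 0)]

# Aggregating per-site statistics equals computing them centrally.
merged = local_counts(site_a) + local_counts(site_b)
central = local_counts(site_a + site_b)
```

Only the (small) count tables cross the network, never the raw records, which is where the favorable communication complexity comes from.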
On the Equality of Kernel AdaTron and Sequential Minimal Optimization in Classification and Regression Tasks and Alike Algorithms for Kernel Machines
 Proc. of ESANN 2003, 11th European Symposium on Artificial Neural Networks
Abstract

Cited by 8 (4 self)
Abstract: The paper presents the equality of a kernel AdaTron (KA) method (originating from a gradient ascent learning approach) and the sequential minimal optimization (SMO) learning algorithm (based on an analytic quadratic programming step) in designing support vector machines (SVMs) having positive definite kernels. The conditions of the equality of the two methods are established. The equality is valid for both the nonlinear classification and the nonlinear regression tasks, and it sheds new light on these seemingly different learning approaches. The paper also introduces other learning techniques related to the two mentioned approaches, such as the non-negative conjugate gradient, the classic Gauss-Seidel (GS) coordinate ascent procedure, and its derivative known as the successive over-relaxation (SOR) algorithm, as viable and usually faster training algorithms for performing nonlinear classification and regression tasks. The convergence theorem for these related iterative algorithms is proven.
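A gradient-ascent view of the bias-free SVM dual, which is the common core the KA/SMO equality rests on, can be sketched as follows. This is an assumed minimal form (coordinate-wise ascent with clipping at zero, fixed learning rate, no bias term, no upper box constraint), not the paper's exact algorithm; the toy data are made up.

```python
import numpy as np

def kernel_adatron(K, y, eta=0.1, epochs=200):
    """Coordinate-wise gradient ascent on the bias-free SVM dual,
    clipping the multipliers at zero. K must be positive definite."""
    alpha = np.zeros(len(y))
    for _ in range(epochs):
        for i in range(len(y)):
            # Gradient of the dual objective w.r.t. alpha_i.
            grad = 1.0 - y[i] * np.sum(alpha * y * K[i])
            alpha[i] = max(0.0, alpha[i] + eta * grad)
    return alpha

# Tiny separable problem with a linear kernel (hypothetical data).
X = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha = kernel_adatron(K, y)
margins = y * (K @ (alpha * y))
```

At the optimum of this toy problem every point attains functional margin 1; the paper's point is that an analytic SMO step over the same objective lands on the same fixed point.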
Iterative Single Data Algorithm for Training Kernel Machines from Huge Data Sets: Theory and Performance
 Support Vector Machines: Theory and Applications, Springer-Verlag, Studies in Fuzziness and Soft Computing
, 2005
Abstract

Cited by 8 (0 self)
The chapter introduces the latest developments and results of the Iterative Single Data Algorithm (ISDA) for solving large-scale support vector machines (SVMs) problems. First, the equality of a Kernel AdaTron (KA) method (originating from a gradient ascent learning approach) and the Sequential Minimal Optimization (SMO) learning algorithm (based on an analytic quadratic programming step for a model without bias term b) in designing SVMs with positive definite kernels is shown for both the nonlinear classification and the nonlinear regression tasks. The chapter also introduces the classic Gauss-Seidel (GS) procedure and its derivative known as the successive over-relaxation (SOR) algorithm as viable (and usually faster) training algorithms. The convergence theorem for these related iterative algorithms is proven. The second part of the chapter presents the effects and the methods of incorporating an explicit bias term b into the ISDA. The algorithms shown here implement the single-training-data-based iteration routine (a.k.a. per-pattern learning). This makes the proposed ISDAs remarkably quick. The final solution in the dual domain is not an approximate one, but the optimal set of dual variables which would have been obtained by using any of the existing and proven QP problem solvers if only they could deal with huge data sets.
A Geometric Approach to Train Support Vector Machines
 in Proc. IEEE Conf. Computer Vision and Pattern Rec
, 2000
Abstract

Cited by 7 (0 self)
Support Vector Machines (SVMs) have shown great potential in numerous visual learning and pattern recognition problems. The optimal decision surface of a SVM is constructed from its support vectors, which are conventionally determined by solving a quadratic programming (QP) problem. However, solving a large optimization problem is challenging since it is computationally intensive and the memory requirement grows with the square of the number of training vectors. In this paper, we propose a geometric method to extract a small superset of support vectors, which we call guard vectors, to construct the optimal decision surface. Specifically, the guard vectors are found by solving a set of linear programming problems. Experimental results on synthetic and real data sets show that the proposed method is more efficient than conventional methods using QPs and requires much less memory. 1 Introduction The Support Vector Machine (SVM) is a novel machine learning algorithm based on statistical learning theory th...
Bias Term b in SVMs Again
 in ESANN'2004 Proceedings, European Symposium on Artificial Neural Networks
, 2004