N.: Recursive Aggregation of Estimators by Mirror Descent Algorithm with averaging. Problems of Information Transmission
Cited by 10 (2 self)
We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under the ℓ1-constraint. It is defined by a stochastic version of the mirror descent algorithm (i.e., of the method which performs gradient descent in the dual space) with an additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of the order √((log M)/t) with an explicit and small constant factor, where M is the dimension of the problem and t stands for the sample size. A similar bound is proved for a more general setting that covers, in particular, the regression model with squared loss.
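The dual-space update with averaging described in this abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not taken from the paper: the weights live on the probability simplex rather than a general ℓ1-ball, the temperature `beta` is held constant, and the averaging is uniform. The names `gibbs_weights`, `mirror_descent_averaged`, and `noisy_linear_grad` are hypothetical.

```python
import math
import random


def gibbs_weights(zeta, beta):
    """Map the dual vector zeta to simplex weights proportional to exp(-zeta_j / beta)."""
    m = min(zeta)  # shift so every exponent is <= 0 (numerical stability)
    w = [math.exp(-(z - m) / beta) for z in zeta]
    total = sum(w)
    return [x / total for x in w]


def mirror_descent_averaged(grad, M, steps, beta):
    """Accumulate stochastic gradients in the dual space and average the primal iterates."""
    zeta = [0.0] * M          # dual variable: running sum of gradients
    theta = [1.0 / M] * M     # primal weights, start at the uniform mixture
    avg = [0.0] * M           # running uniform average of the primal iterates
    for t in range(1, steps + 1):
        g = grad(theta)                            # one stochastic gradient sample
        zeta = [z + gj for z, gj in zip(zeta, g)]  # gradient step in the dual space
        theta = gibbs_weights(zeta, beta)          # map back to the simplex
        avg = [a + (th - a) / t for a, th in zip(avg, theta)]
    return avg


def noisy_linear_grad(costs, sigma, rng):
    """Toy oracle (assumption): linear risk theta . costs observed with Gaussian noise."""
    def grad(theta):
        return [c + rng.gauss(0.0, sigma) for c in costs]
    return grad
```

With a toy oracle whose second coordinate has the smallest expected cost, the averaged weights should concentrate on that coordinate as the number of steps grows. Choosing `beta` on the order of the square root of the number of steps is a common heuristic here, not the paper's tuned constant.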
Support Vector Methods in Learning and Feature Extraction
, 1998
Cited by 9 (1 self)
The last years have witnessed an increasing interest in Support Vector (SV) machines, which use Mercer kernels for efficiently performing computations in high-dimensional spaces. In pattern recognition, the SV algorithm constructs nonlinear decision functions by training a classifier to perform a linear separation in some high-dimensional space which is nonlinearly related to input space. Recently, we have developed a technique for Nonlinear Principal Component Analysis (Kernel PCA) based on the same types of kernels. This way, we can for instance efficiently extract polynomial features of arbitrary order by computing projections onto principal components in the space of all products of n pixels of images. We explain the idea of Mercer kernels and associated feature spaces, and describe connections to the theory of reproducing kernels and to regularization theory, followed by an overview of the above algorithms employing these kernels.
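The remark about working in "the space of all products of n pixels" can be checked numerically. The sketch below assumes the homogeneous polynomial kernel k(x, y) = (x·y)^d, a standard Mercer kernel, and compares it with an explicit dot product over all ordered degree-d products of input entries. The function names are hypothetical, and the abstract's n corresponds to `d` here.

```python
import math
from itertools import product


def poly_kernel(x, y, d):
    """Homogeneous polynomial kernel: (x . y) ** d, computed in input space."""
    return sum(a * b for a, b in zip(x, y)) ** d


def explicit_features(x, d):
    """All ordered products of d entries of x (len(x) ** d features)."""
    return [
        math.prod(x[i] for i in idx)
        for idx in product(range(len(x)), repeat=d)
    ]


def feature_space_dot(x, y, d):
    """Dot product computed explicitly in the high-dimensional feature space."""
    fx = explicit_features(x, d)
    fy = explicit_features(y, d)
    return sum(a * b for a, b in zip(fx, fy))
```

Both routes give the same value, but the kernel needs only one input-space dot product, while the explicit map grows as len(x) ** d. That saving is what makes projections onto principal components in such feature spaces affordable.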
A short introduction to learning with kernels
- In Advanced Lectures on Machine Learning, S. Mendelson
, 2002
Cited by 9 (0 self)
We briefly describe the main ideas of statistical learning theory, support vector
Adaptive model generation: An architecture for the deployment of data mining-based intrusion detection systems
- IN
, 2002
Statistical Learning and Kernel Methods in Bioinformatics
- Artificial Intelligence and Heuristic Methods in Bioinformatics 183, (Eds.) P. Frasconi and R. Shamir, IOS
, 2000
Cited by 7 (0 self)
We briefly describe the main ideas of statistical learning theory, support vector machines, and kernel feature spaces. In addition, we present an overview of applications of kernel methods in bioinformatics.
Mathematical Programming Approaches To Machine Learning And Data Mining
, 1998
Cited by 5 (0 self)
Machine learning problems of supervised classification, unsupervised clustering and parsimonious approximation are formulated as mathematical programs. The feature selection problem arising in the supervised classification task is effectively addressed by calculating a separating plane by minimizing separation error and the number of problem features utilized. The support vector machine approach is formulated using various norms to measure the margin of separation. The clustering problem of assigning m points in n-dimensional real space to k clusters is formulated as minimizing a piecewise-linear concave function over a polyhedral set. This problem is also formulated in a novel fashion by minimizing the sum of squared distances of data points to nearest cluster planes characterizing the k clusters. The problem of obtaining a parsimonious solution to a linear system where the right hand side vector may be corrupted by noise is formulated as minimizing the system residual plus either the number of nonzero elements in the solution vector or the norm of the solution vector. The feature selection problem, the clustering problem and the parsimonious approximation problem can all be stated as the minimization of a concave function over a polyhedral region and are solved by a theoretically justifiable, fast and finite successive linearization algorithm. Numerical tests indicate the utility and efficiency of these formulations on real-world databases. In particular, the feature selection approach via concave minimization computes a separating-plane based classifier that improves upon the generalization ability of a separating plane computed without feature suppression. This approach produces classifiers utilizing fewer original problem features than the support vector machin...
On Design of Optimal Nonlinear Kernel Potential Function for Protein Folding and Protein Design
, 2003
Potential functions are critical for computational studies of protein structure prediction, folding, and sequence design. A class of widely used potentials for coarse grained models of proteins are contact potentials in the form of a weighted linear sum of pairwise contacts. However, these potentials have been shown to be unsuitable choices because they cannot stabilize native proteins against a large number of decoys generated by gapless threading, when the number of native proteins is above 300. We develop an alternative framework for designing protein potential. We describe how finding optimal protein potential can be understood from two geometric viewpoints, and we derive nonlinear potentials using a mixture of Gaussian kernel functions for folding and design. In our experiment we use a training set of 440 protein structures representing a major portion of all known protein structures, and about 14 million structure decoys and sequence decoys obtained by gapless threading. The optimization criterion for obtaining parameters of the potential is to minimize bounds on the generalization error of discriminating protein structures and decoys not used in training. We succeeded in obtaining a nonlinear potential with perfect discrimination of the 440 native structures and native sequences. For the more challenging task of sequence design when decoys are obtained by gapless threading, we show that there is no linear potential with perfect discrimination of all 440 native sequences. Results on an independent test set of 194 proteins also showed that the nonlinear kernel potential performs well, with only 3 structures and 14 sequences misclassified, which compares favorably with the results of 7 structures and 37 sequences misclassified using the optimal linear potential. We conclude that ...
PAC-Bayesian Generic Chaining
- Advances in Neural Information Processing Systems
, 2003
There exist many different generalization error bounds for classification.

