Results 1–10 of 12
Choosing multiple parameters for support vector machines
Machine Learning, 2002
Cited by 300 (16 self)
Abstract. The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVMs) is considered. This is done by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. Usual methods for choosing parameters, based on exhaustive search, become intractable as soon as the number of parameters exceeds two. Some experimental results assess the feasibility of our approach for a large number of parameters (more than 100) and demonstrate an improvement of generalization performance.
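The abstract above describes descending an estimate of the generalization error over the hyperparameter set. The following is a minimal sketch, not the paper's method: it assumes scikit-learn's SVC on synthetic data, and a central finite-difference gradient stands in for the analytic gradients of error bounds that the paper derives. Since the 0-1 validation error is piecewise constant, a coarse step size is used.

```python
# Hedged sketch: tune SVM hyperparameters (C, gamma) by gradient descent on a
# validation-error estimate. Finite differences stand in for the paper's
# analytic gradients; data and step sizes are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
Xtr, Xva, ytr, yva = train_test_split(X, y, random_state=0)

def val_error(log_theta):
    """Validation error as a function of (log C, log gamma)."""
    C, gamma = np.exp(log_theta)
    return 1.0 - SVC(C=C, gamma=gamma).fit(Xtr, ytr).score(Xva, yva)

theta = np.log(np.array([1.0, 0.1]))   # log-space keeps C, gamma positive
lr, eps = 0.5, 0.5                     # coarse eps: 0-1 error is piecewise constant
for _ in range(15):
    grad = np.zeros_like(theta)
    for i in range(theta.size):        # central finite differences per parameter
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (val_error(theta + e) - val_error(theta - e)) / (2 * eps)
    theta = theta - lr * grad

err_final = val_error(theta)
```

Searching in log-space is what makes a plain gradient step safe here: it enforces positivity of C and gamma without constraints.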
Neural network regularization and ensembling using multiobjective evolutionary algorithms
In: Congress on Evolutionary Computation (CEC’04), IEEE, 2004
Cited by 14 (2 self)
Abstract — Regularization is an essential technique to improve generalization of neural networks. Traditionally, regularization is conducted by including an additional term in the cost function of a learning algorithm. One main drawback of these regularization techniques is that a hyperparameter that determines to what extent the regularization influences the learning algorithm must be determined beforehand. This paper addresses the neural network regularization problem from a multiobjective optimization point of view. During the optimization, both the structure and the parameters of the neural network are optimized. Slightly modified versions of two multiobjective optimization algorithms, the dynamic weighted aggregation (DWA) method and the elitist nondominated sorting genetic algorithm (NSGA-II), are used and compared. An evolutionary multiobjective approach to neural network regularization has a number of advantages compared to the traditional methods. First, a number of models spanning a spectrum of model complexity can be obtained in one optimization run instead of only one single solution. Second, an efficient new regularization term can be introduced, which is not applicable to gradient-based learning algorithms. As a natural byproduct of the multiobjective optimization approach to neural network regularization, neural network ensembles can easily be constructed using the obtained networks with different levels of model complexity. Thus, the model complexity of the ensemble can be adjusted by adjusting the weight of each member network in the ensemble. Simulations are carried out on a test function to illustrate the feasibility of the proposed ideas.
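The core of the multiobjective view is that each candidate network is scored on two objectives, such as error and model complexity, and the nondominated (Pareto) set gives the spectrum of regularized models the abstract mentions. The sketch below extracts such a set from hypothetical (error, complexity) pairs; the numbers are made up for illustration, and the NSGA-II/DWA machinery itself is not reproduced.

```python
# Hedged sketch of nondominated selection on two objectives, both minimized:
# (error, complexity). Candidate values are illustrative, not from the paper.
candidates = [(0.10, 50), (0.08, 90), (0.20, 20), (0.07, 200), (0.15, 20)]

def dominates(a, b):
    """a dominates b if it is no worse in every objective and differs in one."""
    return all(x <= y for x, y in zip(a, b)) and a != b

# The Pareto front: candidates no other candidate dominates.
front = [c for c in candidates
         if not any(dominates(other, c) for other in candidates)]
```

Every member of `front` represents a different error/complexity trade-off, which is what allows an ensemble to be assembled from networks of varying complexity in a single run.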
Efficient multiple hyperparameter learning for log-linear models
In NIPS, 2007
Cited by 10 (2 self)
Using multiple regularization hyperparameters is an effective method for managing model complexity in problems where input features have varying amounts of noise. While algorithms for choosing multiple hyperparameters are often used in neural networks and support vector machines, they are not common in structured prediction tasks, such as sequence labeling or parsing. In this paper, we consider the problem of learning regularization hyperparameters for log-linear models, a class of probabilistic models for structured prediction tasks which includes conditional random fields (CRFs). Using an implicit differentiation trick, we derive an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters. In both simulations and the real-world task of computational RNA secondary structure prediction, we find that multiple hyperparameter learning provides a significant boost in accuracy compared to models learned using only a single regularization hyperparameter.
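The implicit-differentiation trick can be sketched on the simplest smooth case, ridge regression, as a stand-in for the paper's log-linear/CRF setting: the trained weights solve a linear system, so differentiating that system in the hyperparameter yields the weight sensitivity without re-training, and the chain rule then gives the hyperparameter gradient of the validation loss. The data, dimensions, and single hyperparameter below are all illustrative assumptions.

```python
# Hedged sketch of implicit differentiation for a hyperparameter gradient,
# on ridge regression rather than the paper's CRFs. The trained weights
# satisfy (Xtr'Xtr + lam*I) w = Xtr'ytr; differentiating this optimality
# system in lam gives dw/dlam, then the chain rule gives dL_val/dlam.
import numpy as np

rng = np.random.default_rng(0)
Xtr, ytr = rng.standard_normal((40, 5)), rng.standard_normal(40)
Xva, yva = rng.standard_normal((20, 5)), rng.standard_normal(20)

def solve(lam):
    A = Xtr.T @ Xtr + lam * np.eye(5)
    return np.linalg.solve(A, Xtr.T @ ytr)

def val_loss(lam):
    r = Xva @ solve(lam) - yva
    return 0.5 * r @ r

lam = 0.7
w = solve(lam)
A = Xtr.T @ Xtr + lam * np.eye(5)
dw = -np.linalg.solve(A, w)          # implicit differentiation: A dw + w = 0
grad = (Xva @ w - yva) @ Xva @ dw    # chain rule through w(lam)

# Finite-difference check of the analytic gradient.
fd = (val_loss(lam + 1e-5) - val_loss(lam - 1e-5)) / 2e-5
```

The efficiency claim rests on reusing the already-factored system `A` from training, so each hyperparameter gradient costs one extra linear solve rather than a full re-fit.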
Optimization of the SVM Kernels using an Empirical Error Minimization Scheme.
In Proc. of the International Workshop on Pattern Recognition with Support Vector Machines, 2002
Cited by 9 (1 self)
We address the problem of optimizing kernel parameters in Support Vector Machine modelling, especially when the number of parameters is greater than one, as in polynomial kernels and KMOD, our newly introduced kernel. The present work is an extended experimental study of the framework proposed by Chapelle et al. for optimizing SVM kernels using an analytic upper bound of the error. However, our optimization scheme minimizes an empirical error estimate using a quasi-Newton technique. The method has been shown to reduce the number of support vectors along the optimization process. In order to assess our contribution, the approach is further used for adapting the KMOD, RBF and polynomial kernels on synthetic data and the NIST digit image database.
Empirical Error based Optimization of SVM Kernels: Application to Digit Image Recognition.
In the 8th IWFHR, Niagara-on-the-Lake, 2002
Cited by 5 (1 self)
We address the problem of optimizing kernel parameters in Support Vector Machine modeling, especially when the number of parameters is greater than one, as in polynomial kernels and KMOD, our newly introduced kernel. The present work is an extended experimental study of the framework proposed by Chapelle et al. for optimizing SVM kernels using an analytic upper bound of the error. However, our optimization scheme minimizes an empirical error estimate using a quasi-Newton optimization method. To assess our method, the approach is further used for adapting the KMOD, RBF and polynomial kernels on synthetic data and the NIST database. The method shows much faster convergence with satisfactory results in comparison with the simple gradient descent method.
Anisotropic Noise Injection for Input Variables Relevance Determination
IEEE Transactions on Neural Networks, 2000
Cited by 5 (1 self)
There are two archetypal ways to control the complexity of a flexible regressor: subset selection and ridge regression. In neural networks jargon, they are respectively known as pruning and weight decay. These techniques may also be adapted to estimate which features of the input space are relevant for predicting the output variables. Relevance is given by a binary indicator for subset selection, and by a continuous rating for ridge regression. This paper ...
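The contrast drawn above, binary relevance from subset selection versus continuous relevance from ridge-style shrinkage, can be shown in a few lines. scikit-learn's SelectKBest and Ridge are assumptions standing in for generic subset selection and ridge regression, and the data is synthetic.

```python
# Hedged sketch of the two relevance notions from the abstract: subset
# selection yields a binary per-feature indicator, ridge regression a
# continuous rating. scikit-learn stands in for the paper's own estimators.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=8, n_informative=3,
                       random_state=0)

# Subset selection: binary relevance indicator per input feature.
mask = SelectKBest(f_regression, k=3).fit(X, y).get_support()

# Ridge regression: continuous relevance rating via coefficient magnitude,
# shrunk toward zero by the weight-decay penalty alpha.
rating = np.abs(Ridge(alpha=1.0).fit(X, y).coef_)
```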
On Optimal Data Split For Generalization Estimation And Model Selection
1999
Cited by 3 (0 self)
In this paper, we address a crucial problem of cross-validation estimators: how to split the data into various sets. The set D of all available data is usually split into two parts: the design set E and the test set F. The test set is exclusively reserved for a final assessment of the model which has been designed on E (using, e.g., optimization and model selection). This usually requires that the design set in turn is split into two parts: training set T and validation set V. The objective of the design/test split is both to obtain a model with high generalization ability and to assess the generalization error reliably. The second split is the training/validation split of the design set. Model parameters are trained on the training data, while the validation set provides an estimator of generalization error used, e.g., to choose between alternative models or optimize a ...
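The two-level split described above can be written out directly; the 60/20/20 sizes below are arbitrary illustrative choices, not the paper's recommendation (finding good split proportions is precisely the question the paper studies).

```python
# Hedged sketch of the D -> (E, F) -> (T, V) split described above.
# Split sizes are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
idx = rng.permutation(100)             # shuffled indices of the full data set D

design, test = idx[:80], idx[80:]      # design set E vs. test set F
train, val = design[:60], design[60:]  # training set T vs. validation set V
# F is touched only once, for the final assessment of the model designed on E;
# V steers model selection while T fits the parameters.
```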
Optimized Combination, Regularization, and Pruning in Parallel Consensual Neural Networks
1998
Optimized combination, regularization, and pruning is proposed for Parallel Consensual Neural Networks (PCNNs), a neural network architecture based on the consensus of a collection of stage neural networks trained on the same input data with different representations. Here, a regularization scheme is presented for the PCNN, and in training a regularized cost function is minimized. The use of this regularization scheme in conjunction with Optimal Brain Damage pruning is suggested both to optimize the architecture of the individual stage networks and to avoid overfitting. Experiments are conducted on a multisource remote sensing and geographic data set consisting of six data sources. The results obtained by the proposed version of the PCNN are compared to other classification approaches such as the original PCNN, single-stage neural networks, and statistical classifiers. In comparison to the originally proposed PCNNs, the use of pruning and regularization not only produces simpler PC...
Evaluation Of Neural Networks Algorithms In Marketing Problems: An Experimental Approach
A literature review corroborates that neural networks are being successfully applied to marketing problems. We further investigate the performance of neural networks, evaluating several neural network algorithms and architectures on two marketing problems (a regression and a classification problem). Findings suggest that there is no single algorithm that outperforms all the others, and that performance and robustness are problem dependent. Therefore, to achieve the best solution one has to try several neural network algorithms and pick the one that provides the best result. Findings also suggest that neural networks are capable of solving marketing problems accurately, but sometimes they are prone to converging to local minima.
Application of Statistical Learning Theory to DNA Microarray Analysis
2001
This thesis focuses on applying Support Vector Machines (SVMs), an algorithm founded in the framework of statistical learning theory, to analyzing DNA microarray data. The first part