Results 1  10
of
749
Working set selection using second order information for training SVM
 Journal of Machine Learning Research
"... Working set selection is an important step in decomposition methods for training support vector machines (SVMs). This paper develops a new technique for working set selection in SMOtype decomposition methods. It uses second order information to achieve fast convergence. Theoretical properties such ..."
Abstract

Cited by 176 (9 self)
 Add to MetaCart
(Show Context)
Working set selection is an important step in decomposition methods for training support vector machines (SVMs). This paper develops a new technique for working set selection in SMOtype decomposition methods. It uses second order information to achieve fast convergence. Theoretical properties such as linear convergence are established. Experiments demonstrate that the proposed method is faster than existing selection methods using first order information.
An interiorpoint method for largescale l1regularized logistic regression
 Journal of Machine Learning Research
, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract

Cited by 167 (5 self)
 Add to MetaCart
(Show Context)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warmstart techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
A Note on Platt's Probabilistic Outputs for Support Vector Machines
, 2003
"... Platt's probabilistic outputs for Support Vector Machines [6] has been popular for applications that require posterior class probabilities. In this note, we propose an improvement which theoretically converges and avoids numerical difficulties. A simpler and readytouse pseudo code is includ ..."
Abstract

Cited by 123 (4 self)
 Add to MetaCart
(Show Context)
Platt's probabilistic outputs for Support Vector Machines [6] has been popular for applications that require posterior class probabilities. In this note, we propose an improvement which theoretically converges and avoids numerical difficulties. A simpler and readytouse pseudo code is included.
Trust region Newton method for largescale logistic regression
 In Proceedings of the 24th International Conference on Machine Learning (ICML
, 2007
"... Largescale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the loglikelihood of the logistic regression model. The proposed method uses only approximate Newton steps in ..."
Abstract

Cited by 69 (12 self)
 Add to MetaCart
(Show Context)
Largescale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the loglikelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also compare it with existing linear SVM implementations. 1
Multitask feature selection
 In the workshop of structural Knowledge Transfer for Machine Learning in the 23rd International Conference on Machine Learning (ICML 2006). Citeseer
, 2006
"... We address joint feature selection across a group of classification or regression tasks. In many multitask learning scenarios, different but related tasks share a large proportion of relevant features. We propose a novel type of joint regularization for the parameters of support vector machines in ..."
Abstract

Cited by 56 (1 self)
 Add to MetaCart
(Show Context)
We address joint feature selection across a group of classification or regression tasks. In many multitask learning scenarios, different but related tasks share a large proportion of relevant features. We propose a novel type of joint regularization for the parameters of support vector machines in order to couple feature selection across tasks. Intuitively, we extend the ℓ1 regularization for singletask estimation to the multitask setting. By penalizing the sum of ℓ2norms of the blocks of coefficients associated with each feature across different tasks, we encourage multiple predictors to have similar parameter sparsity patterns. This approach yields convex, nondifferentiable optimization problems that can be solved efficiently using a simple and scalable extragradient algorithm. We show empirically that our approach outperforms independent ℓ1based feature selection on several datasets. 1.
Classifier technology and the illusion of progress. Statist
 Sci
, 2006
"... Abstract. A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to ..."
Abstract

Cited by 50 (2 self)
 Add to MetaCart
(Show Context)
Abstract. A great many tools have been developed for supervised classification, ranging from early methods such as linear discriminant analysis through to modern developments such as neural networks and support vector machines. A large number of comparative studies have been conducted in attempts to establish the relative superiority of these methods. This paper argues that these comparisons often fail to take into account important aspects of real problems, so that the apparent superiority of more sophisticated methods may be something of an illusion. In particular, simple methods typically yield performance almost as good as more sophisticated methods, to the extent that the difference in performance may be swamped by other sources of uncertainty that generally are not considered in the classical supervised classification paradigm.
Bayesian Treed Gaussian Process Models with an Application to Computer Modeling
 Journal of the American Statistical Association
, 2007
"... This paper explores nonparametric and semiparametric nonstationary modeling methodologies that couple stationary Gaussian processes and (limiting) linear models with treed partitioning. Partitioning is a simple but effective method for dealing with nonstationarity. Mixing between full Gaussian proce ..."
Abstract

Cited by 49 (15 self)
 Add to MetaCart
This paper explores nonparametric and semiparametric nonstationary modeling methodologies that couple stationary Gaussian processes and (limiting) linear models with treed partitioning. Partitioning is a simple but effective method for dealing with nonstationarity. Mixing between full Gaussian processes and simple linear models can yield a more parsimonious spatial model while significantly reducing computational effort. The methodological developments and statistical computing details which make this approach efficient are described in detail. Illustrations of our model are given for both synthetic and real datasets. Key words: recursive partitioning, nonstationary spatial model, nonparametric regression, Bayesian model averaging 1
Cost curves: an improved method for visualizing classifier performance
 Machine Learning
, 2006
"... Abstract. This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing ..."
Abstract

Cited by 49 (7 self)
 Add to MetaCart
(Show Context)
Abstract. This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier’s performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors.
Introducing dendritic cells as a novel immuneinspired algorithm for anomaly detection
 In ICARIS05, LNCS 3627
, 2005
"... Abstract. Dendritic cells are antigen presenting cells that provide a vital link between the innate and adaptive immune system. Research into this family of cells has revealed that they perform the role of coordinating Tcell based immune responses, both reactive and for generating tolerance. We hav ..."
Abstract

Cited by 47 (12 self)
 Add to MetaCart
(Show Context)
Abstract. Dendritic cells are antigen presenting cells that provide a vital link between the innate and adaptive immune system. Research into this family of cells has revealed that they perform the role of coordinating Tcell based immune responses, both reactive and for generating tolerance. We have derived an algorithm based on the functionality of these cells, and have used the signals and differentiation pathways to build a control mechanism for an artificial immune system. We present our algorithmic details in addition to some preliminary results, where the algorithm was applied for the purpose of anomaly detection. We hope that this algorithm will eventually become the key component within a large, distributed immune system, based on sound immunological concepts.
Efficient l1 regularized logistic regression
 In AAAI06
, 2006
"... L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex opti ..."
Abstract

Cited by 47 (4 self)
 Add to MetaCart
(Show Context)
L1 regularized logistic regression is now a workhorse of machine learning: it is widely used for many classification problems, particularly ones with many features. L1 regularized logistic regression requires solving a convex optimization problem. However, standard algorithms for solving convex optimization problems do not scale well enough to handle the large datasets encountered in many practical settings. In this paper, we propose an efficient algorithm for L1 regularized logistic regression. Our algorithm iteratively approximates the objective function by a quadratic approximation at the current point, while maintaining the L1 constraint. In each iteration, it uses the efficient LARS (Least Angle Regression) algorithm to solve the resulting L1 constrained quadratic optimization problem. Our theoretical results show that our algorithm is guaranteed to converge to the global optimum. Our experiments show that our algorithm significantly outperforms standard algorithms for solving convex optimization problems. Moreover, our algorithm outperforms four previously published algorithms that were specifically designed to solve the L1 regularized logistic regression problem.