Results 1  10
of
13,831
Regularization paths for generalized linear models via coordinate descent
, 2009
"... We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, twoclass logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic ..."
Abstract

Cited by 724 (15 self)
 Add to MetaCart
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, twoclass logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the
Additive Logistic Regression: a Statistical View of Boosting
 Annals of Statistics
, 1998
"... Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can often be dramatically improved by sequentially applying them to reweighted versions of the input dat ..."
Abstract

Cited by 1750 (25 self)
 Add to MetaCart
data, and taking a weighted majority vote of the sequence of classifiers thereby produced. We show that this seemingly mysterious phenomenon can be understood in terms of well known statistical principles, namely additive modeling and maximum likelihood. For the twoclass problem, boosting can
Least angle regression
, 2004
"... The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to s ..."
Abstract

Cited by 1326 (37 self)
 Add to MetaCart
implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2000
"... We present a unifying framework for studying the solution of multiclass categorization problems by reducing them to multiple binary problems that are then solved using a marginbased binary learning algorithm. The proposed framework unifies some of the most popular approaches in which each class ..."
Abstract

Cited by 561 (20 self)
 Add to MetaCart
We present a unifying framework for studying the solution of multiclass categorization problems by reducing them to multiple binary problems that are then solved using a marginbased binary learning algorithm. The proposed framework unifies some of the most popular approaches in which each class
Training Linear SVMs in Linear Time
, 2006
"... Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for highdimensional sparse data commonly encountered in applications like text classification, wordsense disambiguation, and drug design. These applications involve a large number of examples n ..."
Abstract

Cited by 549 (6 self)
 Add to MetaCart
as well as a large number of features N, while each example has only s << N nonzero features. This paper presents a CuttingPlane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems. The algorithm
An Efficient Boosting Algorithm for Combining Preferences
, 1999
"... The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting ..."
Abstract

Cited by 727 (18 self)
 Add to MetaCart
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new
Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties
, 2001
"... Variable selection is fundamental to highdimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized ..."
Abstract

Cited by 948 (62 self)
 Add to MetaCart
Variable selection is fundamental to highdimensional statistical modeling, including nonparametric regression. Many approaches in use are stepwise selection procedures, which can be computationally expensive and ignore stochastic errors in the variable selection process. In this article, penalized
Multitask Learning,”
, 1997
"... Abstract. Multitask Learning is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for ..."
Abstract

Cited by 677 (6 self)
 Add to MetaCart
for each task can help other tasks be learned better. This paper reviews prior work on MTL, presents new evidence that MTL in backprop nets discovers task relatedness without the need of supervisory signals, and presents new results for MTL with knearest neighbor and kernel regression. In this paper we
The control of the false discovery rate in multiple testing under dependency
 Annals of Statistics
, 2001
"... Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparab ..."
Abstract

Cited by 1093 (16 self)
 Add to MetaCart
Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than
Online Learning with Kernels
, 2003
"... Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the socalled kernel trick with the large margin idea. There has been little u ..."
Abstract

Cited by 2831 (123 self)
 Add to MetaCart
and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We
Results 1  10
of
13,831