Results 1 - 10 of 169
An interior-point method for large-scale ℓ1-regularized logistic regression
Journal of Machine Learning Research, 2007
Cited by 286 (8 self)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, which uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
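To illustrate why ℓ1 regularization acts as feature selection, here is a minimal sketch in pure Python. It does not implement the interior-point method of the paper; it uses proximal gradient descent (ISTA) instead, chosen only because it fits in a few lines. The function names, the step size, the iteration count, and the toy data are all assumptions made for this example.

```python
import math

def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def avg_loss_gradient(X, y, w):
    """Gradient of (1/m) * sum_i log(1 + exp(-y_i * w.x_i)), labels in {-1, +1}."""
    m, n = len(X), len(w)
    grad = [0.0] * n
    for xi, yi in zip(X, y):
        margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
        coeff = -yi * sigmoid(-margin) / m
        for j in range(n):
            grad[j] += coeff * xi[j]
    return grad

def l1_logreg_ista(X, y, lam, step=1.0, iters=500):
    """Minimize average logistic loss + lam * ||w||_1 (no intercept term).

    Assumption: step is small enough for the data at hand (the toy data
    below has small feature values, so step=1.0 is safe there).
    """
    w = [0.0] * len(X[0])
    for _ in range(iters):
        grad = avg_loss_gradient(X, y, w)
        # Gradient step on the smooth loss, then shrinkage for the l1 term.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy data: only the first feature separates the classes.
X_demo = [[1.0, 0.1, 0.0], [0.9, -0.1, 0.0], [-1.0, 0.1, 0.0], [-0.8, -0.1, 0.0]]
y_demo = [1, 1, -1, -1]
w_demo = l1_logreg_ista(X_demo, y_demo, lam=0.1)
```

On this toy data the shrinkage step keeps the weights of the two uninformative features at exactly zero, while a sufficiently large `lam` zeroes every weight; that is the sparsity effect the abstract relies on.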
Efficient structure learning of Markov networks using L1-regularization
In NIPS, 2006
Cited by 146 (3 self)
Markov networks are used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally effective method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real-world data sets: modeling the joint distribution of pixel values in the MNIST data, and modeling the joint distribution of genetic sequence variations in the human HapMap data. We show that our L1-based method achieves considerably higher generalization performance than the more standard L2-based method (a Gaussian parameter prior) or pure maximum-likelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem.
Combining SVMs with various feature selection strategies
Taiwan University, 2005
Cited by 126 (0 self)
Feature selection is an important issue in many research areas. Reasons for selecting important features include reducing the learning time and improving accuracy. This thesis investigates the performance of combining support vector machines (SVM) with various feature selection strategies. The first part of the thesis mainly describes the existing feature selection methods and our experience using those methods in a competition. The second part studies more feature selection strategies using the SVM.
An efficient earth mover’s distance algorithm for robust histogram comparison
PAMI, 2007
Cited by 93 (5 self)
We propose EMD-L1: a fast and exact algorithm for computing the Earth Mover’s Distance (EMD) between a pair of histograms. The efficiency of the new algorithm enables its application to problems that were previously prohibitive due to high time complexities. The proposed EMD-L1 significantly simplifies the original linear programming formulation of EMD. Exploiting the L1 metric structure, the number of unknown variables in EMD-L1 is reduced to O(N) from O(N^2) of the original EMD for a histogram with N bins. In addition, the number of constraints is reduced by half and the objective function of the linear program is simplified. Formally, without any approximation, we prove that the EMD-L1 formulation is equivalent to the original EMD with an L1 ground distance. To perform the EMD-L1 computation, we propose an efficient tree-based algorithm, Tree-EMD. Tree-EMD exploits the fact that a basic feasible solution of the simplex algorithm-based solver forms a spanning tree when we interpret EMD-L1 as a network flow optimization problem. We empirically show that this new algorithm has average time complexity of O(N^2), which significantly improves the best reported super-cubic complexity of the original EMD. The accuracy of the proposed methods is evaluated by ...
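As a small illustration of how an L1 ground distance simplifies EMD, the sketch below covers only the one-dimensional special case. It is not the Tree-EMD algorithm from the paper. It assumes two histograms with the same number of unit-spaced bins and equal total mass; under those assumptions the EMD equals the sum of absolute differences between the two cumulative histograms. The function name `emd_1d` is made up for this example.

```python
from itertools import accumulate

def emd_1d(hist_a, hist_b):
    """EMD between two 1-D histograms with ground distance |i - j|.

    Assumptions (not from the paper): bins are unit-spaced, both
    histograms have the same length and the same total mass.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(hist_a) - sum(hist_b)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # The mass that must cross the boundary after bin k is the difference
    # of the cumulative sums at k; each crossing costs one unit of distance.
    cum_a = accumulate(hist_a)
    cum_b = accumulate(hist_b)
    return sum(abs(ca - cb) for ca, cb in zip(cum_a, cum_b))

# Moving one unit of mass from bin 0 to bin 2 costs 2.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # prints 2
```

For histograms over two or more dimensions this shortcut no longer applies, which is the setting where a dedicated solver such as the one described in the abstract becomes relevant.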
Statistical challenges with high dimensionality: feature selection in knowledge discovery
2006
Least Squares Linear Discriminant Analysis
Cited by 51 (6 self)
Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. LDA in the binary-class case has been shown to be equivalent to linear regression with the class label as the output. This implies that LDA for binary-class classifications can be formulated as a least squares problem. Previous studies have shown a certain relationship between multivariate linear regression and LDA for the multi-class case. Many of these studies show that multivariate linear regression with a specific class indicator matrix as the output can be applied as a preprocessing step for LDA. However, directly casting LDA as a least squares problem is challenging for the multi-class case. In this paper, a novel formulation for multivariate linear regression is proposed. The equivalence relationship between the proposed least squares formulation and LDA for multi-class classifications is rigorously established under a mild condition, which is shown empirically to hold in many applications involving high-dimensional data. Several LDA extensions based on the equivalence relationship are discussed.
Tracking curved regularized optimization solution paths
In Advances in Neural Information Processing Systems (NIPS*2004), 2004
Cited by 35 (3 self)
Regularization plays a central role in the analysis of modern data, where non-regularized fitting is likely to lead to overfitted models, useless for both prediction and interpretation. We consider the design of incremental algorithms which follow paths of regularized solutions, as the regularization varies. These approaches often result in methods which are both efficient and highly flexible. We suggest a general path-following algorithm based on second-order approximations, prove that under mild conditions it remains “very close” to the path of optimal solutions and illustrate it with examples.
Recent Advances of Large-scale Linear Classification
Cited by 34 (6 self)
Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has been shown to be close to that of nonlinear classifiers such as kernel methods, while training and testing are much faster. Recently, many research works have developed efficient optimization methods to construct linear classifiers and applied them to some large-scale applications. In this paper, we give a comprehensive survey of recent developments in this active research area.