Results 1 - 10
of
11
An interior-point method for large-scale l1-regularized logistic regression
- Journal of Machine Learning Research
, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract
-
Cited by 77 (3 self)
- Add to MetaCart
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
The group Lasso for logistic regression
- Journal of the Royal Statistical Society, Series B
, 2008
"... Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regressi ..."
Abstract
-
Cited by 75 (4 self)
- Add to MetaCart
Summary. The group lasso is an extension of the lasso to do variable selection on (predefined) groups of variables in linear regression models. The estimates have the attractive property of being invariant under groupwise orthogonal reparameterizations. We extend the group lasso to logistic regression models and present an efficient algorithm, that is especially suitable for high dimensional problems, which can also be applied to generalized linear models to solve the corresponding convex optimization problem. The group lasso estimator for logistic regression is shown to be statistically consistent even if the number of predictors is much larger than sample size but with sparse true underlying structure. We further use a two-stage procedure which aims for sparser models than the group lasso, leading to improved prediction performance for some cases. Moreover, owing to the two-stage nature, the estimates can be constructed to be hierarchical. The methods are used on simulated and real data sets about splice site detection in DNA sequences.
Trust region Newton method for large-scale logistic regression
- In Proceedings of the 24th International Conference on Machine Learning (ICML
, 2007
"... Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in ..."
Abstract
-
Cited by 35 (5 self)
- Add to MetaCart
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also compare it with existing linear SVM implementations. 1
Dual averaging methods for regularized stochastic learning and online optimization
- In Advances in Neural Information Processing Systems 23
, 2009
"... We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1-norm for promoting sparsity. We develop extensions of Nes ..."
Abstract
-
Cited by 28 (3 self)
- Add to MetaCart
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1-norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method, that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1-regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
Sparse Online Learning via Truncated Gradient
"... We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to ..."
Abstract
-
Cited by 27 (0 self)
- Add to MetaCart
We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss. This method has several essential properties. First, the degree of sparsity is continuous—a parameter controls the rate of sparsification from no sparsification to total sparsification. Second, the approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular L1-regularization method in the batch setting. We prove small rates of sparsification result in only small additional regret with respect to typical online-learning guarantees. Finally, the approach works well empirically. We apply it to several datasets and find for datasets with large numbers of features, substantial sparsity is discoverable. 1
Online Learning for Group Lasso
"... We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Different from traditional batch-mode group lasso algorithms, which suffer from the inefficiency and poor scalability, our proposed algorithm performs ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Different from traditional batch-mode group lasso algorithms, which suffer from the inefficiency and poor scalability, our proposed algorithm performs in an online mode and scales well: at each iteration one can update the weight vector according to a closed-form solution based on the average of previous subgradients. Therefore, the proposed online algorithm can be very efficient and scalable. This is guaranteed by its low worst-case time complexity and memory cost both in the order of O(d), where d is the number of dimensions. Moreover, in order to achieve more sparsity in both the group level and the individual feature level, we successively extend our online system to efficiently solve a number of variants of sparse group lasso models. We also show that the online system is applicable to other group lasso models, such as the group lasso with overlap and graph lasso. Finally, we demonstrate the merits of our algorithm by experimenting with both synthetic and real-world datasets. 1.
Debellor: A Data Mining Platform with Stream Architecture
"... Abstract. This paper introduces Debellor (www.debellor.org) – an open source extensible data mining platform with stream-based architecture, where all data transfers between elementary algorithms take the form of a stream of samples. Data streaming enables implementation of scalable algorithms, whi ..."
Abstract
- Add to MetaCart
Abstract. This paper introduces Debellor (www.debellor.org) – an open source extensible data mining platform with stream-based architecture, where all data transfers between elementary algorithms take the form of a stream of samples. Data streaming enables implementation of scalable algorithms, which can efficiently process large volumes of data, exceeding available memory. This is very important for data mining research and applications, since the most challenging data mining tasks involve voluminous data, either produced by a data source or generated at some intermediate stage of a complex data processing network. Advantages of data streaming are illustrated by experiments with clustering time series. The experimental results show that even for moderatesize data sets streaming is indispensable for successful execution of algorithms, otherwise the algorithms run hundreds times slower or just crash due to memory shortage. Stream architecture is particularly useful in such application domains as time series analysis, image recognition or mining data streams. It is also the only efficient architecture for implementation of online algorithms. The algorithms currently available on Debellor platform include all classifiers from Rseslib and Weka libraries and all filters from Weka.
Classifying Spend Descriptions with off-the-shelf Learning Components
"... Analyzing spend transactions is essential to organizations for understanding their global procurement. Central to this analysis is the automated classification of these transactions to hierarchical commodity coding systems. Spend classification is challenging due not only to the complexities of the ..."
Abstract
- Add to MetaCart
Analyzing spend transactions is essential to organizations for understanding their global procurement. Central to this analysis is the automated classification of these transactions to hierarchical commodity coding systems. Spend classification is challenging due not only to the complexities of the commodity coding systems but also because of the sparseness and quality of each individual transaction text description and the volume of such transactions in an organization. In this paper, we demonstrate the application of off-the-shelf machine learning tools to address the challenges in spend classification. We have built a system using off-the-shelf SVM, Logistic Regression, and language processing toolkits and describe the effectiveness of these different learning techniques for spend classification. 1
Fast Implementation of ℓ1 Regularized Learning Algorithms Using Gradient Descent Methods ∗
"... With the advent of high-throughput technologies, ℓ1 regularized learning algorithms have attracted much attention recently. Dozens of algorithms have been proposed for fast implementation, using various advanced optimization techniques. In this paper, we demonstrate that ℓ1 regularized learning prob ..."
Abstract
- Add to MetaCart
With the advent of high-throughput technologies, ℓ1 regularized learning algorithms have attracted much attention recently. Dozens of algorithms have been proposed for fast implementation, using various advanced optimization techniques. In this paper, we demonstrate that ℓ1 regularized learning problems can be easily solved by using gradient-descent techniques. The basic idea is to transform a convex optimization problem with a non-differentiable objective function into an unconstrained non-convex problem, upon which, via gradient descent, reaching a globally optimum solution is guaranteed. We present detailed implementation of the algorithm using ℓ1 regularized logistic regression as a particular application. We conduct large-scale experiments to compare the new approach with other stateof-the-art algorithms on eight medium and large-scale problems. We demonstrate that our algorithm, though simple, performs similarly or even better than other advanced algorithms in terms of computational efficiency and memory usage.
Sparse representation of data
"... Abstract. The amount of electronic data available today as well as its dimensionality and complexity increases rapidly in many scientific areas including biology, (bio-)chemistry, medicine, physics and its application fields like robotics, bioinformatics or multimedia technologies. Many of these dat ..."
Abstract
- Add to MetaCart
Abstract. The amount of electronic data available today as well as its dimensionality and complexity increases rapidly in many scientific areas including biology, (bio-)chemistry, medicine, physics and its application fields like robotics, bioinformatics or multimedia technologies. Many of these data sets are very complex but have also a simple inherent structure which allows an appropriate sparse representation and modeling of such data with less or no information loss. Advanced methods are needed to extract these inherent buthiddeninformation. Sparsity can be observed at different levels: sparse representation of data points using e.g. dimensionality reduction for efficient data storage, sparse representation of full data sets using e.g. prototypes to achieve compact models for lifelong learning and sparse models of the underlying data structure using sparse encoding techniques. One main goal is to achieve a human-interpretable representation of the essential information. Sparse representations account for the ubiquitous problem that humans have to deal with ever increasing and inherently unlimited information by means of limited resources such as limited time, memory, or perception abilities. Starting with the seminal paper of Olshausen&Field [40] researchers recognized that sparsity can be used as a fundamental principle to arrive at very efficient information processing models for huge and complex data such as observed e.g. in the visual cortex. Nowadays, sparse models include diverse methods such as relevance learning in prototype based representations, sparse coding neural gas, factor analysis methods, latent semantic indexing, sparse Bayesian networks, relevancevectormachines andother. Thistutorialpaperreviews recent developments in the field. 1

