A fast iterative shrinkage-thresholding algorithm with application to . . .
, 2009
Cited by 138 (3 self)
We consider the class of Iterative Shrinkage-Thresholding Algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods is attractive for its simplicity, but the methods are also known to converge quite slowly. In this paper we present a Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) which preserves the computational simplicity of ISTA, but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA.
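The abstract above does not spell out the iteration, so the following is only a minimal sketch of the commonly published form of FISTA, under stated assumptions: the objective is taken to be 0.5*||Ax - b||^2 + lam*||x||_1, the step size is a constant 1/L with L at least the largest eigenvalue of A^T A, and every name (`fista`, `soft_threshold`, `grad`, `matvec`) is hypothetical and chosen for illustration.

```python
import math

def soft_threshold(v, t):
    # Shrinkage operator: the proximal map of t * ||.||_1, applied componentwise.
    return [max(abs(vi) - t, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]

def matvec(A, x):
    # Dense matrix-vector product for a matrix stored as a list of rows.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def grad(A, b, x):
    # Gradient of 0.5 * ||Ax - b||^2, which is A^T (Ax - b).
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def fista(A, b, lam, L, iters=200):
    # Assumption: L >= largest eigenvalue of A^T A (constant step size 1/L).
    n = len(A[0])
    x = [0.0] * n      # current iterate
    y = x[:]           # extrapolated point
    t = 1.0            # momentum parameter
    for _ in range(iters):
        g = grad(A, b, y)
        # ISTA step (gradient step followed by shrinkage), taken from y.
        x_new = soft_threshold([yi - gi / L for yi, gi in zip(y, g)], lam / L)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Extrapolate using the previous two iterates.
        y = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# Hypothetical toy problem: diagonal A, so the minimizer can be checked by hand.
x_hat = fista([[2.0, 0.0], [0.0, 1.0]], [4.0, 1.0], lam=1.0, L=4.0)
```

Dropping the extrapolation (setting `y = x_new` at each iteration) reduces this sketch to plain ISTA, which is the sense in which the accelerated method keeps the same per-iteration simplicity.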
An interior-point method for large-scale l1-regularized logistic regression
- Journal of Machine Learning Research
, 2007
Cited by 77 (3 self)
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method, that uses a preconditioned conjugate gradient method to compute the search step, can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes, on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method
- SIAM J. Sci. Comput
, 2001
Cited by 65 (10 self)
We describe new algorithms of the locally optimal block preconditioned conjugate gradient (LOBPCG) method for symmetric eigenvalue problems, based on a local optimization of a three-term recurrence, and suggest several other new methods. To be able to compare numerically different methods in the class, with different preconditioners, we propose a common system of model tests, using random preconditioners and initial guesses. As the "ideal" control algorithm, we advocate the standard preconditioned conjugate gradient method for finding an eigenvector as an element of the null-space of the corresponding homogeneous system of linear equations under the assumption that the eigenvalue is known. We recommend that every new preconditioned eigensolver be compared with this "ideal" algorithm on our model test problems in terms of the speed of convergence, costs of every iteration, and memory requirements. We provide such comparison for our LOBPCG method. Numerical results establish that our algorithm is practically as efficient as the "ideal" algorithm when the same preconditioner is used in both methods. We also show numerically that the LOBPCG method provides approximations to first eigenpairs of about the same quality as those by the much more expensive global optimization method on the same generalized block Krylov subspace. We propose a new version of block Davidson's method as a generalization of the LOBPCG method. Finally, direct numerical comparisons with the Jacobi-Davidson method show that our method is more robust and converges almost two times faster.
Successive Overrelaxation for Support Vector Machines
- IEEE Transactions on Neural Networks
, 1998
Cited by 61 (14 self)
Successive overrelaxation (SOR) for symmetric linear complementarity problems and quadratic programs [11, 12, 9] is used to train a support vector machine (SVM) [20, 3] for discriminating between the elements of two massive datasets, each with millions of points. Because SOR handles one point at a time, similar to Platt's sequential minimal optimization (SMO) algorithm [18] which handles two constraints at a time, it can process very large datasets that need not reside in memory. The algorithm converges linearly to a solution. Encouraging numerical results are presented on datasets with up to 10 million points. Such massive discrimination problems cannot be processed by conventional linear or quadratic programming methods, and to our knowledge have not been solved by other methods. 1 Introduction Successive overrelaxation, originally developed for the solution of large systems of linear equations [16, 15] has been successfully applied to mathematical programming problems [4, 11, 12, 1...
Feature Selection via Mathematical Programming
, 1997
Cited by 51 (22 self)
The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible, is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints (LPEC). Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage (OBD) method for reducing neural network complexity. One feature selection algorithm via concave minimization (FSV) reduced cross-validation error on a cancer prognosis database by 35.4% while reducing problem features from 32 to 4. Feature selection is an important problem in machine learning [18, 15, 1...
Solving monotone inclusions via compositions of nonexpansive averaged operators
- Optimization
, 2004
Cited by 36 (14 self)
A unified fixed point theoretic framework is proposed to investigate the asymptotic behavior of algorithms for finding solutions to monotone inclusion problems. The basic iterative scheme under consideration involves nonstationary compositions of perturbed averaged nonexpansive operators. The analysis covers proximal methods for common zero problems as well as various splitting methods for finding a zero of the sum of monotone operators.
Semi-Supervised Support Vector Machines for Unlabeled Data Classification
- Optimization Methods and Software
, 2001
Cited by 36 (3 self)
A concave minimization approach is proposed for classifying unlabeled data based on the following ideas: (i) A small representative percentage (5% to 10%) of the unlabeled data is chosen by a clustering algorithm and given to an expert or oracle to label. (ii) A linear support vector machine is trained using the small labeled sample while simultaneously assigning the remaining bulk of the unlabeled dataset to one of two classes so as to maximize the margin (distance) between the two bounding planes that determine the separating plane midway between them. This latter problem is formulated as a concave minimization problem on a polyhedral set for which a stationary point is quickly obtained by solving a few (5 to 7) linear programs. Such stationary points turn out to be very effective as evidenced by our computational results which show that clustered concave minimization yields: (a) Test set improvement as high as 20.4% over a linear support vector machine trained on a correspondingly sm...
Incremental Subgradient Methods For Nondifferentiable Optimization
, 2001
Cited by 35 (3 self)
We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method. In this paper, we establish the convergence properties of a number of variants of incremental subgradient methods, including some that are stochastic. Based on the analysis and computational experiments, the methods appear very promising and effective for important classes of large problems. A particularly interesting discovery is that by randomizing the order of selection of component functions for iteration, the convergence rate is substantially improved. 1 Research supported by NSF under Grant ACI-9873339.
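The incremental idea described in this abstract can be illustrated with a small sketch. It is not the authors' algorithm or experiment: the toy objective (a sum of absolute deviations, whose minimizer is a median), the 1/(k+1) step-size rule, and the names `incremental_subgradient`, `points`, and `shuffle` are all assumptions made for illustration.

```python
import random

def incremental_subgradient(points, x0=0.0, epochs=2000, shuffle=True, seed=0):
    # Minimizes f(x) = sum_i |x - a_i| over scalar x, one component at a time.
    rng = random.Random(seed)
    x = x0
    order = list(range(len(points)))
    for k in range(epochs):
        step = 1.0 / (k + 1)        # assumed diminishing step size
        if shuffle:
            rng.shuffle(order)      # randomized order of component functions
        for i in order:
            a = points[i]
            # A subgradient of |x - a| at x: the sign of (x - a), with 0 at x == a.
            g = (x > a) - (x < a)
            # The variable is adjusted right after each component is processed.
            x -= step * g
    return x

# Hypothetical data; the median of these values is 3.
x_hat = incremental_subgradient([1.0, 2.0, 3.0, 10.0, 20.0])
```

Passing `shuffle=False` gives a fixed cyclic order, which makes it easy to compare the two selection rules on the same data.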
Weak Sharp Minima In Mathematical Programming
- SIAM Journal on Control and Optimization
, 1993
Cited by 28 (3 self)
The notion of a sharp, or strongly unique, minimum is extended to include the possibility of a nonunique solution set. These minima will be called weak sharp minima. Conditions necessary for the solution set of a minimization problem to be a set of weak sharp minima are developed in both the unconstrained and constrained cases. These conditions are also shown to be sufficient under the appropriate convexity hypotheses. The existence of weak sharp minima is characterized in the cases of linear and quadratic convex programming and for the linear complementarity problem. In particular, we reproduce a result of Mangasarian and Meyer that shows that the solution set of a linear program is always a set of weak sharp minima whenever it is nonempty. Consequences for the convergence theory of algorithms are also examined, especially conditions yielding finite termination. 1. Introduction. Let $f : X \mapsto \overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$. We say that $f$ has a sharp minimum at $\bar{x} \in \mathbb{R}^n$ if $f(x) \ge f(\bar{x})$...
Proximal Minimization Methods with Generalized Bregman Functions
- SIAM JOURNAL ON CONTROL AND OPTIMIZATION
, 1995
Cited by 25 (0 self)
We consider methods for minimizing a convex function $f$ that generate a sequence $\{x^k\}$ by taking $x^{k+1}$ to be an approximate minimizer of $f(x) + D_h(x, x^k)/c_k$, where $c_k > 0$ and $D_h$ is the D-function of a Bregman function $h$. Extensions are made to B-functions that generalize Bregman functions and cover more applications. Convergence is established under criteria amenable to implementation. Applications are made to nonquadratic multiplier methods for nonlinear programs.
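As a worked illustration of the iteration in this abstract, the block below writes out the usual definition of the D-function for a differentiable $h$ and the special case of a quadratic $h$. The abstract itself does not state this definition, so it is an assumption here, and the exact minimizer is shown where the paper allows an approximate one.

```latex
% Assumed standard definition of the D-function (Bregman distance) of h:
D_h(x, y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle .

% Special case h(x) = \tfrac{1}{2}\|x\|^2:
D_h(x, y) = \tfrac{1}{2}\|x\|^2 - \tfrac{1}{2}\|y\|^2 - \langle y,\, x - y \rangle
          = \tfrac{1}{2}\|x - y\|^2 ,

% so the iteration reduces to the classical proximal point step:
x^{k+1} = \arg\min_x \Big\{ f(x) + \tfrac{1}{2 c_k} \|x - x^k\|^2 \Big\}, \qquad c_k > 0 .
```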

