Results 1  10
of
132
Sparse multinomial logistic regression: fast algorithms and generalization bounds
 IEEE Trans. on Pattern Analysis and Machine Intelligence
"... Abstract—Recently developed methods for learning sparse classifiers are among the stateoftheart in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsitypromoting priors encouraging the weight estimates to be either significantly larg ..."
Abstract

Cited by 128 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Recently developed methods for learning sparse classifiers are among the stateoftheart in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsitypromoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learningtheoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a componentwise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in highdimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsitypromoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
A tutorial on MM algorithms
 Amer. Statist
, 2004
"... Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function ..."
Abstract

Cited by 74 (3 self)
 Add to MetaCart
(Show Context)
Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the loglikelihood. Iterative optimization of a surrogate function as exemplified by an EM algorithm does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to part of the standard toolkit of professional statisticians. The current article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation. Key words and phrases: constrained optimization, EM algorithm, majorization, minorization, NewtonRaphson 1 1
Onestep sparse estimates in nonconcave penalized likelihood models
 ANN. STATIST.
, 2008
"... Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective funct ..."
Abstract

Cited by 62 (0 self)
 Add to MetaCart
(Show Context)
Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging, because the objective function is nondifferentiable and nonconcave. In this article, we propose a new unified algorithm based on the local linear approximation (LLA) for maximizing the penalized likelihood for a broad class of concave penalty functions. Convergence and other theoretical properties of the LLA algorithm are established. A distinguished feature of the LLA algorithm is that at each LLA step, the LLA estimator can naturally adopt a sparse representation. Thus, we suggest using the onestep LLA estimator from the LLA algorithm as the final estimates. Statistically, we show that if the regularization parameter is appropriately chosen, the onestep LLA estimates enjoy the oracle properties with good initial estimators. Computationally, the onestep LLA estimation methods dramatically reduce the computational cost in maximizing the nonconcave penalized likelihood. We conduct some Monte Carlo simulation to assess the finite sample performance of the onestep sparse estimation methods. The results are very encouraging.
Variable Selection Using MM Algorithm
 Annals of Statistics
, 2005
"... Variable selection is fundamental to highdimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and ..."
Abstract

Cited by 40 (3 self)
 Add to MetaCart
(Show Context)
Variable selection is fundamental to highdimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and/or nonconcave. This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions. These algorithms operate by perturbing the penalty function slightly to render it differentiable, then optimizing this differentiable function using a minorize–maximize (MM) algorithm. MM algorithms are useful extensions of the wellknown class of EM algorithms, a fact that allows us to analyze the local and global convergence of the proposed algorithm using some of the techniques employed for EM algorithms. In particular, we prove that when our MM algorithms converge, they must converge to a desirable point; we also discuss conditions under which this convergence may be guaranteed. We exploit the Newton–Raphsonlike aspect of these algorithms
On semisupervised classification
 In
, 2005
"... A graphbased prior is proposed for parametric semisupervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of cotraining. An EM algorithm for train ..."
Abstract

Cited by 40 (8 self)
 Add to MetaCart
(Show Context)
A graphbased prior is proposed for parametric semisupervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of cotraining. An EM algorithm for training the classifier automatically adjusts the tradeoff between the contributions of: (a) the labelled data; (b) the unlabelled data; and (c) the cotraining information. Active label query selection is performed using a mutual information based criterion that explicitly uses the unlabelled data and the cotraining information. Encouraging results are presented on public benchmarks and on measured data from single and multiple sensors. 1
Distributed weightedmultidimensional scaling for node localization in sensor networks
 ACM Trans. Sens. Netw
, 2006
"... Accurate, distributed localization algorithms are needed for a wide variety of wireless sensor network applications. This article introduces a scalable, distributed weightedmultidimensional scaling (dwMDS) algorithm that adaptively emphasizes the most accurate range measurements and naturally acc ..."
Abstract

Cited by 39 (0 self)
 Add to MetaCart
(Show Context)
Accurate, distributed localization algorithms are needed for a wide variety of wireless sensor network applications. This article introduces a scalable, distributed weightedmultidimensional scaling (dwMDS) algorithm that adaptively emphasizes the most accurate range measurements and naturally accounts for communication constraints within the sensor network. Each node adaptively chooses a neighborhood of sensors, updates its position estimate by minimizing a local cost function and then passes this update to neighboring sensors. Derived bounds on communication requirements provide insight on the energy efficiency of the proposed distributed method versus a centralized approach. For received signalstrength (RSS) based range measurements, we demonstrate via simulation that location estimates are nearly unbiased with variance close to the CramérRao lower bound. Further, RSS and timeofarrival (TOA) channel measurements are used to demonstrate performance as good as the centralized maximumlikelihood estimator (MLE) in a realworld sensor network.
MM algorithms for generalized BradleyTerry models
 The Annals of Statistics
, 2004
"... The Bradley–Terry model for paired comparisons is a simple and muchstudied means to describe the probabilities of the possible outcomes when individuals are judged against one another in pairs. Among the many studies of the model in the past 75 years, numerous authors have generalized it in several ..."
Abstract

Cited by 33 (2 self)
 Add to MetaCart
The Bradley–Terry model for paired comparisons is a simple and muchstudied means to describe the probabilities of the possible outcomes when individuals are judged against one another in pairs. Among the many studies of the model in the past 75 years, numerous authors have generalized it in several directions, sometimes providing iterative algorithms for obtaining maximum likelihood estimates for the generalizations. Building on a theory of algorithms known by the initials MM, for minorization–maximization, this paper presents a powerful technique for producing iterative maximum likelihood estimation algorithms for a wide class of generalizations of the Bradley–Terry model. While algorithms for problems of this type have tended to be custombuilt in the literature, the techniques in this paper enable their mass production. Simple conditions are stated that guarantee that each algorithm described will produce a sequence that converges to the unique maximum likelihood estimator. Several of the algorithms and convergence results herein are new. 1. Introduction. In
A wideangle view at iterated shrinkage algorithms
 in SPIE (Wavelet XII
, 2007
"... Sparse and redundant representations – an emerging and powerful model for signals – suggests that a data source could be described as a linear combination of few atoms from a prespecified and overcomplete dictionary. This model has drawn a considerable attention in the past decade, due to its appe ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
Sparse and redundant representations – an emerging and powerful model for signals – suggests that a data source could be described as a linear combination of few atoms from a prespecified and overcomplete dictionary. This model has drawn a considerable attention in the past decade, due to its appealing theoretical foundations, and promising practical results it leads to. Many of the applications that use this model are formulated as a mixture of ℓ2ℓp (p ≤ 1) optimization expressions. Iterated Shrinkage algorithms are a new family of highly effective numerical techniques for handling these optimization tasks, surpassing traditional optimization techniques. In this paper we aim to give a broad view of this group of methods, motivate their need, present their derivation, show their comparative performance, and most important of all, discuss their potential in various applications.
Convergent incremental optimization transfer algorithms: Application to tomography
 IEEE Trans. Med. Imag., Submitted
"... Abstract—No convergent ordered subsets (OS) type image reconstruction algorithms for transmission tomography have been proposed to date. In contrast, in emission tomography, there are two known families of convergent OS algorithms: methods that use relaxation parameters (Ahn and Fessler, 2003), and ..."
Abstract

Cited by 24 (10 self)
 Add to MetaCart
(Show Context)
Abstract—No convergent ordered subsets (OS) type image reconstruction algorithms for transmission tomography have been proposed to date. In contrast, in emission tomography, there are two known families of convergent OS algorithms: methods that use relaxation parameters (Ahn and Fessler, 2003), and methods based on the incremental expectation maximization (EM) approach (Hsiao et al., 2002). This paper generalizes the incremental EM approach by introducing a general framework that we call “incremental optimization transfer. ” Like incremental EM methods, the proposed algorithms accelerate convergence speeds and ensure global convergence (to a stationary point) under mild regularity conditions without requiring inconvenient relaxation parameters. The general optimization transfer framework enables the use of a very broad family of nonEM surrogate functions. In particular, this paper provides the first convergent OStype algorithm for transmission tomography. The general approach is applicable to both monoenergetic and polyenergetic transmission scans as well as to other image reconstruction problems. We propose a particular incremental optimization transfer method for (nonconcave) penalizedlikelihood (PL) transmission image reconstruction by using separable paraboloidal surrogates (SPS). Results show that the new “transmission incremental optimization transfer (TRIOT) ” algorithm is faster than nonincremental ordinary SPS and even OSSPS yet is convergent. I.
A fast thresholded Landweber algorithm for waveletregularized multidimensional deconvolution
 IEEE Trans. Image Process
, 2008
"... Abstract—We present a fast variational deconvolution algorithm that minimizes a quadratic data term subject to a regularization on the 1norm of the wavelet coefficients of the solution. Previously available methods have essentially consisted in alternating between a Landweber iteration and a wavele ..."
Abstract

Cited by 24 (6 self)
 Add to MetaCart
(Show Context)
Abstract—We present a fast variational deconvolution algorithm that minimizes a quadratic data term subject to a regularization on the 1norm of the wavelet coefficients of the solution. Previously available methods have essentially consisted in alternating between a Landweber iteration and a waveletdomain softthresholding operation. While having the advantage of simplicity, they are known to converge slowly. By expressing the cost functional in a Shannon wavelet basis, we are able to decompose the problem into a series of subbanddependent minimizations. In particular, this allows for larger (subbanddependent) step sizes and threshold levels than the previous method. This improves the convergence properties of the algorithm significantly. We demonstrate a speedup of one order of magnitude in practical situations. This makes waveletregularized deconvolution more widely accessible, even for applications with a strong limitation on computational complexity. We present promising results in 3D deconvolution microscopy, where the size of typical data sets does not permit more than a few tens of iterations. Index Terms—Deconvolution, fast, fluorescence microscopy, iterative, nonlinear, sparsity, 3D, thresholding, wavelets,