Sparse multinomial logistic regression: Fast algorithms and generalization bounds
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2005
Cited by 67 (1 self)
Recently developed methods for learning sparse classifiers are among the state-of-the-art in supervised learning. These methods learn classifiers that incorporate weighted sums of basis functions with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero. From a learning-theoretic perspective, these methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. This paper presents three contributions related to learning sparse classifiers. First, we introduce a true multiclass formulation based on multinomial logistic regression. Second, by combining a bound optimization approach with a component-wise update procedure, we derive fast exact algorithms for learning sparse multiclass classifiers that scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces. To the best of our knowledge, these are the first algorithms to perform exact multinomial logistic regression with a sparsity-promoting prior. Third, we show how nontrivial generalization bounds can be derived for our classifier in the binary case. Experimental results on standard benchmark data sets attest to the accuracy, sparsity, and efficiency of the proposed methods.
A tutorial on MM algorithms
- Amer. Statist, 2004
Cited by 36 (2 self)
Most problems in frequentist statistics involve optimization of a function such as a likelihood or a sum of squares. EM algorithms are among the most effective algorithms for maximum likelihood estimation because they consistently drive the likelihood uphill by maximizing a simple surrogate function for the loglikelihood. Iterative optimization of a surrogate function as exemplified by an EM algorithm does not necessarily require missing data. Indeed, every EM algorithm is a special case of the more general class of MM optimization algorithms, which typically exploit convexity rather than missing data in majorizing or minorizing an objective function. In our opinion, MM algorithms deserve to be part of the standard toolkit of professional statisticians. The current article explains the principle behind MM algorithms, suggests some methods for constructing them, and discusses some of their attractive features. We include numerous examples throughout the article to illustrate the concepts described. In addition to surveying previous work on MM algorithms, this article introduces some new material on constrained optimization and standard error estimation. Key words and phrases: constrained optimization, EM algorithm, majorization, minorization, Newton-Raphson
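The majorization idea in this abstract can be made concrete with a small sketch. The example below is a standard textbook illustration chosen here, not code from the listed article: it minimizes the sum of absolute deviations (whose minimizer is the sample median) by repeatedly minimizing a quadratic surrogate that lies above the objective and touches it at the current iterate. The function name `mm_median` and the guard `eps` are assumptions made for this illustration.

```python
def mm_median(data, max_iter=200, tol=1e-10, eps=1e-12):
    """Minimize f(theta) = sum(|x - theta|) with a majorize-minimize loop.

    Each |r| is majorized at the current residual r_k by the quadratic
    r**2 / (2 * |r_k|) + |r_k| / 2, so the surrogate is a weighted least
    squares problem whose minimizer is a weighted mean of the data.
    """
    theta = sum(data) / len(data)  # start from the sample mean
    history = [sum(abs(x - theta) for x in data)]
    for _ in range(max_iter):
        # eps guards against division by zero when theta hits a data point
        # (an implementation detail assumed for this sketch).
        weights = [1.0 / max(abs(x - theta), eps) for x in data]
        new_theta = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        history.append(sum(abs(x - new_theta) for x in data))
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta, history


theta, history = mm_median([1.0, 2.0, 3.0, 4.0, 100.0])
print(theta)
```

The returned `history` holds the objective value at each iterate, which makes it easy to check the descent property that MM constructions are designed to provide.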
Distributed Weighted-Multidimensional Scaling for Node Localization in Sensor Networks
- ACM TRANSACTIONS ON SENSOR NETWORKS, 2005
Cited by 24 (0 self)
Accurate, distributed localization algorithms are needed for a wide variety of wireless sensor network applications. This paper introduces a scalable, distributed weighted-multidimensional scaling (dwMDS) algorithm that adaptively emphasizes the most accurate range measurements and naturally accounts for communication constraints within the sensor network. Each node adaptively chooses a neighborhood of sensors, updates its position estimate by minimizing a local cost function and then passes this update to neighboring sensors. Derived bounds on communication requirements provide insight on the energy efficiency of the proposed distributed method versus a centralized approach. For received signal-strength (RSS) based range measurements, we demonstrate via simulation that location estimates are nearly unbiased with variance close to the Cramer-Rao lower bound. Further, RSS and time-of-arrival (TOA) channel measurements are used to demonstrate performance as good as the centralized maximum-likelihood estimator (MLE) in a real-world sensor network.
MM algorithms for generalized Bradley-Terry models
- The Annals of Statistics, 2004
Cited by 23 (1 self)
The Bradley–Terry model for paired comparisons is a simple and much-studied means to describe the probabilities of the possible outcomes when individuals are judged against one another in pairs. Among the many studies of the model in the past 75 years, numerous authors have generalized it in several directions, sometimes providing iterative algorithms for obtaining maximum likelihood estimates for the generalizations. Building on a theory of algorithms known by the initials MM, for minorization–maximization, this paper presents a powerful technique for producing iterative maximum likelihood estimation algorithms for a wide class of generalizations of the Bradley–Terry model. While algorithms for problems of this type have tended to be custom-built in the literature, the techniques in this paper enable their mass production. Simple conditions are stated that guarantee that each algorithm described will produce a sequence that converges to the unique maximum likelihood estimator. Several of the algorithms and convergence results herein are new.
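For the basic (non-generalized) Bradley–Terry model, the MM iteration has a well-known closed form: each skill parameter is set to the player's total wins divided by a sum, over opponents, of the number of comparisons divided by the sum of the two current skills. The sketch below illustrates only that basic case. The function name `bradley_terry_mm`, the `wins` matrix layout, and the normalization to unit sum are assumptions for this illustration, and it presumes every player has both won and lost at least once within a connected set of comparisons.

```python
def bradley_terry_mm(wins, max_iter=1000, tol=1e-12):
    """Fit basic Bradley-Terry skills by minorization-maximization.

    wins[i][j] is the number of times player i beat player j.
    Returns skills gamma normalized to sum to 1.
    """
    m = len(wins)
    total_wins = [sum(row) for row in wins]
    gamma = [1.0 / m] * m
    for _ in range(max_iter):
        new_gamma = []
        for i in range(m):
            # Sum over opponents of (comparisons between i and j)
            # divided by (current skill of i + current skill of j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (gamma[i] + gamma[j])
                for j in range(m)
                if j != i
            )
            new_gamma.append(total_wins[i] / denom)
        scale = sum(new_gamma)
        new_gamma = [g / scale for g in new_gamma]
        change = max(abs(a - b) for a, b in zip(new_gamma, gamma))
        gamma = new_gamma
        if change < tol:
            break
    return gamma


# Two players: player 0 won 3 of 4 games, so its fitted share is 3/4.
print(bradley_terry_mm([[0, 3], [1, 0]]))
```

With more than two players the iteration generally needs several passes. At the fitted values, each player's observed win count should match the expected number of wins implied by the model.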
On semi-supervised classification
- In, 2005
Cited by 20 (5 self)
A graph-based prior is proposed for parametric semi-supervised classification. The prior utilizes both labelled and unlabelled data; it also integrates features from multiple views of a given sample (e.g., multiple sensors), thus implementing a Bayesian form of co-training. An EM algorithm for training the classifier automatically adjusts the tradeoff between the contributions of: (a) the labelled data; (b) the unlabelled data; and (c) the co-training information. Active label query selection is performed using a mutual information based criterion that explicitly uses the unlabelled data and the co-training information. Encouraging results are presented on public benchmarks and on measured data from single and multiple sensors.
Variable Selection Using MM Algorithm
- Annals of Statistics, 2005
Cited by 20 (1 self)
Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and/or nonconcave. This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions. These algorithms operate by perturbing the penalty function slightly to render it differentiable, then optimizing this differentiable function using a minorize–maximize (MM) algorithm. MM algorithms are useful extensions of the well-known class of EM algorithms, a fact that allows us to analyze the local and global convergence of the proposed algorithm using some of the techniques employed for EM algorithms. In particular, we prove that when our MM algorithms converge, they must converge to a desirable point; we also discuss conditions under which this convergence may be guaranteed. We exploit the Newton–Raphson-like aspect of these algorithms
A wide-angle view at iterated shrinkage algorithms
- in SPIE (Wavelet XII, 2007
Cited by 16 (1 self)
Sparse and redundant representations, an emerging and powerful model for signals, suggest that a data source can be described as a linear combination of a few atoms from a pre-specified, over-complete dictionary. This model has drawn considerable attention in the past decade because of its appealing theoretical foundations and the promising practical results it leads to. Many of the applications that use this model are formulated as a mixture of ℓ2-ℓp (p ≤ 1) optimization expressions. Iterated shrinkage algorithms are a new family of highly effective numerical techniques for handling these optimization tasks, surpassing traditional optimization techniques. In this paper we aim to give a broad view of this group of methods, motivate their need, present their derivation, show their comparative performance, and, most important of all, discuss their potential in various applications.
Convergent incremental optimization transfer algorithms: Application to tomography
- IEEE Trans. Med. Imag., Submitted
Cited by 13 (6 self)
No convergent ordered subsets (OS) type image reconstruction algorithms for transmission tomography have been proposed to date. In contrast, in emission tomography, there are two known families of convergent OS algorithms: methods that use relaxation parameters (Ahn and Fessler, 2003), and methods based on the incremental expectation maximization (EM) approach (Hsiao et al., 2002). This paper generalizes the incremental EM approach by introducing a general framework that we call “incremental optimization transfer.” Like incremental EM methods, the proposed algorithms accelerate convergence speeds and ensure global convergence (to a stationary point) under mild regularity conditions without requiring inconvenient relaxation parameters. The general optimization transfer framework enables the use of a very broad family of non-EM surrogate functions. In particular, this paper provides the first convergent OS-type algorithm for transmission tomography. The general approach is applicable to both monoenergetic and polyenergetic transmission scans as well as to other image reconstruction problems. We propose a particular incremental optimization transfer method for (nonconcave) penalized-likelihood (PL) transmission image reconstruction by using separable paraboloidal surrogates (SPS). Results show that the new “transmission incremental optimization transfer (TRIOT)” algorithm is faster than nonincremental ordinary SPS and even OS-SPS yet is convergent.
Active set and EM algorithms for log-concave densities based on complete and censored data
- 2007
Cited by 9 (5 self)
We develop an active set algorithm for the maximum likelihood estimation of a log-concave density based on complete data. Building on this fast algorithm, we introduce an EM algorithm to treat arbitrarily censored data, e.g., right-censored or interval-censored data.
A fast thresholded Landweber algorithm for wavelet-regularized multidimensional deconvolution
- IEEE Trans. Image Process, 2008
Cited by 8 (1 self)
We present a fast variational deconvolution algorithm that minimizes a quadratic data term subject to a regularization on the 1-norm of the wavelet coefficients of the solution. Previously available methods have essentially consisted in alternating between a Landweber iteration and a wavelet-domain soft-thresholding operation. While having the advantage of simplicity, they are known to converge slowly. By expressing the cost functional in a Shannon wavelet basis, we are able to decompose the problem into a series of subband-dependent minimizations. In particular, this allows for larger (subband-dependent) step sizes and threshold levels than the previous method. This improves the convergence properties of the algorithm significantly. We demonstrate a speed-up of one order of magnitude in practical situations. This makes wavelet-regularized deconvolution more widely accessible, even for applications with a strong limitation on computational complexity. We present promising results in 3-D deconvolution microscopy, where the size of typical data sets does not permit more than a few tens of iterations. Index Terms: Deconvolution, fast, fluorescence microscopy, iterative, nonlinear, sparsity, 3-D, thresholding, wavelets.
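The baseline scheme this abstract refers to, a Landweber (gradient) step followed by soft-thresholding, can be sketched in a few lines. The version below is a simplified illustration and not the paper's accelerated subband-dependent algorithm: it works directly on a coefficient vector with a small dense matrix `A` standing in for the blur-plus-synthesis operator, and uses one global step size. The names `soft_threshold` and `thresholded_landweber` and all parameters are assumptions for this sketch. The step size should not exceed the reciprocal of the largest eigenvalue of AᵀA.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t, returning 0 inside [-t, t]."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def thresholded_landweber(A, y, lam, step, n_iter=100):
    """Minimize 0.5 * ||y - A x||^2 + lam * ||x||_1 by alternating a
    Landweber step with coefficient-wise soft-thresholding.

    A is a list of rows, y is a list; both are plain Python lists.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for _ in range(n_iter):
        # Residual y - A x.
        residual = [
            y[i] - sum(A[i][j] * x[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Landweber step: x + step * A^T residual.
        moved = [
            x[j] + step * sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Soft-threshold with level step * lam.
        x = [soft_threshold(v, step * lam) for v in moved]
    return x


# Diagonal toy problem; the largest eigenvalue of A^T A is 4, so step = 0.25.
print(thresholded_landweber([[2.0, 0.0], [0.0, 1.0]], [4.0, 1.0], lam=1.0, step=0.25))
```

In this toy problem the objective separates by coordinate, so the result can be checked by hand: the first coefficient settles at 1.75 and the second is thresholded to zero.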

