Results 1  10
of
129
Consistency of the group lasso and multiple kernel learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2007
"... We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it ..."
Abstract

Cited by 162 (28 self)
 Add to MetaCart
We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for non linear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non adaptive scheme is not satisfied.
Local Rademacher complexities
 Annals of Statistics
, 2002
"... We propose new bounds on the error of learning algorithms in terms of a datadependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a ..."
Abstract

Cited by 106 (18 self)
 Add to MetaCart
We propose new bounds on the error of learning algorithms in terms of a datadependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
Online learning for matrix factorization and sparse coding
"... Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it t ..."
Abstract

Cited by 97 (18 self)
 Add to MetaCart
Sparse coding—that is, modelling data vectors as sparse linear combinations of basis elements—is widely used in machine learning, neuroscience, signal processing, and statistics. This paper focuses on the largescale matrix factorization problem that consists of learning the basis set, adapting it to specific data. Variations of this problem include dictionary learning in signal processing, nonnegative matrix factorization and sparse principal component analysis. In this paper, we propose to address these tasks with a new online optimization algorithm, based on stochastic approximations, which scales up gracefully to large datasets with millions of training samples, and extends naturally to various matrix factorization formulations, making it suitable for a wide range of learning problems. A proof of convergence is presented, along with experiments with natural images and genomic data demonstrating that it leads to stateoftheart performance in terms of speed and optimization for both small and large datasets.
Estimating divergence functionals and the likelihood ratio by penalized convex risk minimization
 In Advances in Neural Information Processing Systems (NIPS
, 2007
"... by convex risk minimization ..."
Treereweighted belief propagation algorithms and approximate ML estimation by pseudomoment matching
 In AISTATS
, 2003
"... In previous work [10], we presented a class of upper bounds on the log partition function of an arbitrary undirected graphical model based on solving a convex variational problem. Here we develop a class of local messagepassing algorithms, which we call treereweighted belief propagation, for ..."
Abstract

Cited by 51 (4 self)
 Add to MetaCart
In previous work [10], we presented a class of upper bounds on the log partition function of an arbitrary undirected graphical model based on solving a convex variational problem. Here we develop a class of local messagepassing algorithms, which we call treereweighted belief propagation, for ef ciently computing the value of these upper bounds, as well as the associated pseudomarginals.
Towards a coherent statistical framework for dense deformable template estimation
 J.R. Statist. Soc.B
, 2006
"... Abstract. The problem of estimating probabilistic deformable template models in the field of computer vision or of probabilistic atlases in the field of computational anatomy has not yet received a coherent statistical formulation and remains a challenge. In this paper, we provide a careful definiti ..."
Abstract

Cited by 47 (7 self)
 Add to MetaCart
Abstract. The problem of estimating probabilistic deformable template models in the field of computer vision or of probabilistic atlases in the field of computational anatomy has not yet received a coherent statistical formulation and remains a challenge. In this paper, we provide a careful definition and analysis of a well defined statistical model based on dense deformable templates for gray level images of deformable objects. We propose a rigorous Bayesian framework for which we can derived an iterative algorithm for the effective estimation of the geometric and photometric parameters of the model in a small sample setting, together with an asymptotic consistency proof. The model is extended to mixtures of finite numbers of such components leading to a fine description of the photometric and geometric variations. We illustrate some of the ideas with images of handwritten digits, and apply the estimated models to classification through maximum likelihood. 1.
Hiroshi Imai and Masao Iri. Polygonal approximations of a curve – formulations and algorithms
 Computational Morphology
, 1988
"... Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace no ..."
Abstract

Cited by 43 (7 self)
 Add to MetaCart
Regularization by the sum of singular values, also referred to as the trace norm, is a popular technique for estimating low rank rectangular matrices. In this paper, we extend some of the consistency results of the Lasso to provide necessary and sufficient conditions for rank consistency of trace norm minimization with the square loss. We also provide an adaptive version that is rank consistent even when the necessary condition for the non adaptive version is not fulfilled. 1.
The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
"... Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by ..."
Abstract

Cited by 41 (12 self)
 Add to MetaCart
Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of onedimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method’s theoretical properties, and show that it works well in many examples.
Learning with Matrix Factorization
, 2004
"... Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or highdimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning ..."
Abstract

Cited by 38 (4 self)
 Add to MetaCart
Matrices that can be factored into a product of two simpler matrices can serve as a useful and often natural model in the analysis of tabulated or highdimensional data. Models based on matrix factorization (Factor Analysis, PCA) have been extensively used in statistical analysis and machine learning for over a century, with many new formulations and models suggested in recent
Managing uncertainty in call centers using Poisson mixtures
 Applied Stochastic Models in Business and Industry
, 2001
"... We model a call center as a queueing model with Poisson arrivals having an unknown varying arrival rate. We show how to compute prediction intervals for the arrival rate, and use the Erlang formula for the waiting time to compute the consequences for the occupancy level of the call center. We compar ..."
Abstract

Cited by 29 (4 self)
 Add to MetaCart
We model a call center as a queueing model with Poisson arrivals having an unknown varying arrival rate. We show how to compute prediction intervals for the arrival rate, and use the Erlang formula for the waiting time to compute the consequences for the occupancy level of the call center. We compare it to the current practice of using a point estimate of the arrival rate (assumed constant) as forecast.