Results 1–10 of 334
Consistency of the group lasso and multiple kernel learning
Journal of Machine Learning Research, 2007
"... We consider the leastsquare regression problem with regularization by a block 1norm, i.e., a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the 1norm where all spaces have dimension one, where it ..."
Abstract

Cited by 281 (34 self)
 Add to MetaCart
We consider the least-squares regression problem with regularization by a block ℓ1-norm, i.e., a sum of Euclidean norms over spaces of dimension larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the ℓ1-norm, where all spaces have dimension one and the problem is commonly referred to as the Lasso. In this paper, we study the asymptotic model consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of the group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, in particular covariance operators, we extend the consistency results to this infinite-dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non-adaptive scheme is not satisfied.
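For readers scanning this entry, the block ℓ1-norm penalty it refers to is the standard group-lasso objective; a generic statement (the grouping G and the regularization weight λ are schematic notation, not copied from the paper):

```latex
\min_{w \in \mathbb{R}^p} \; \frac{1}{2n} \sum_{i=1}^{n} \bigl(y_i - w^\top x_i\bigr)^2
\;+\; \lambda \sum_{g \in \mathcal{G}} \lVert w_g \rVert_2
```

When every group g has size one, each Euclidean norm reduces to an absolute value and the penalty becomes the ordinary ℓ1-norm of the Lasso.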
On feature combination for multiclass object classification
 In ICCV
"... A key ingredient in the design of visual object classification systems is the identification of relevant class specific aspects while being robust to intraclass variations. While this is a necessity in order to generalize beyond a given set of training images, it is also a very difficult problem du ..."
Abstract

Cited by 262 (5 self)
 Add to MetaCart
A key ingredient in the design of visual object classification systems is the identification of relevant class-specific aspects while remaining robust to intra-class variations. While this is necessary in order to generalize beyond a given set of training images, it is also a very difficult problem due to the high variability of visual appearance within each class. In recent years, substantial performance gains on challenging benchmark datasets have been reported in the literature. This progress can be attributed to two developments: the design of highly discriminative and robust image features, and the combination of multiple complementary features based on different aspects such as shape, color, or texture. In this paper we study several models that aim at learning the correct weighting of different features from training data. These include multiple kernel learning as well as simple baseline methods. Furthermore, we derive ensemble methods inspired by Boosting which are easily extendable to several multiclass settings. All methods are thoroughly evaluated on object classification datasets using a multitude of feature descriptors. The key result is that even very simple baseline methods, which are orders of magnitude faster than the learning techniques, are highly competitive with multiple kernel learning. Furthermore, the Boosting-type methods are found to produce consistently better results in all experiments. We provide insight into when combination methods can be expected to work and how the benefit of complementary features can be exploited most efficiently.
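One of the "very simple baseline methods" this abstract alludes to can be as plain as uniformly averaging the per-feature Gram matrices before training a single SVM. A minimal sketch with scikit-learn; the kernel lists and labels are hypothetical placeholders, and the paper's exact baselines may differ:

```python
import numpy as np
from sklearn.svm import SVC

def average_kernel_baseline(K_train_list, y_train, K_test_list):
    """Uniformly average the Gram matrices of several feature channels,
    then train one SVM on the combined (precomputed) kernel."""
    K_train = np.mean(K_train_list, axis=0)  # (n_train, n_train)
    K_test = np.mean(K_test_list, axis=0)    # (n_test, n_train)
    clf = SVC(kernel="precomputed")
    clf.fit(K_train, y_train)
    return clf.predict(K_test)
```

MKL would instead learn non-uniform weights for this sum; the abstract's point is that the uniform baseline is often competitive with that learned weighting.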
Learning the discriminative power-invariance tradeoff
In ICCV, 2007
"... We investigate the problem of learning optimal descriptors for a given classification task. Many handcrafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the tradeoff that ..."
Abstract

Cited by 229 (4 self)
 Add to MetaCart
We investigate the problem of learning optimal descriptors for a given classification task. Many handcrafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the tradeoff that it achieves between discriminative power and invariance. Since this tradeoff must vary from task to task, no single descriptor can be optimal in all situations. Our focus in this paper is on learning the optimal tradeoff for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domain-specific kernel as a combination of base kernels corresponding to base features which achieve different levels of tradeoff (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.). This leads to a convex optimisation problem with a unique global optimum which can be solved efficiently. The method is shown to achieve state-of-the-art performance on the UIUC textures, Oxford flowers, and Caltech 101 datasets.
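Schematically, the learned kernel here is a convex combination of base kernels, each built from features at a different point on the power-invariance tradeoff (generic MKL notation, not necessarily the paper's exact formulation):

```latex
k(x, x') \;=\; \sum_{j=1}^{m} d_j \, k_j(x, x'),
\qquad d_j \ge 0, \quad \sum_{j=1}^{m} d_j = 1
```

Because the weights d and the classifier are found by a single convex program, the unique global optimum claimed in the abstract follows from convexity.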
Learning to Predict Where Humans Look
"... For many applications in graphics, design, and human computer interaction, it is essential to understand where humans look in a scene. Where eye tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottomup com ..."
Abstract

Cited by 206 (4 self)
 Add to MetaCart
(Show Context)
For many applications in graphics, design, and human-computer interaction, it is essential to understand where humans look in a scene. Where eye-tracking devices are not a viable option, models of saliency can be used to predict fixation locations. Most saliency approaches are based on bottom-up computation that does not consider top-down image semantics and often does not match actual eye movements. To address this problem, we collected eye-tracking data from 15 viewers on 1003 images and use this database as training and testing examples to learn a model of saliency based on low-, middle-, and high-level image features. This large database of eye-tracking data is publicly available with this paper.
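At training time, a model of this kind can be read as binary classification of image locations (fixated vs. not) from stacked feature maps. A rough sketch under that reading; the arrays are random placeholders, and LogisticRegression stands in for whatever classifier and features the authors actually use:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder data: one row of image features per sampled pixel,
# labeled 1 if viewers fixated near that pixel and 0 otherwise.
rng = np.random.default_rng(0)
features = rng.random((10_000, 33))
fixated = rng.integers(0, 2, 10_000)

model = LogisticRegression(max_iter=1000).fit(features, fixated)
saliency_scores = model.predict_proba(features)[:, 1]  # per-pixel saliency
```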
Exploring large feature spaces with hierarchical MKL
2008
"... For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or H ..."
Abstract

Cited by 113 (23 self)
 Add to MetaCart
(Show Context)
For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that depends only on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalization by sparsity-inducing norms such as the ℓ1-norm or the block ℓ1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
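The hierarchical penalty behind this DAG-structured selection can be written as a structured block ℓ1-norm; a schematic version, where D(v) denotes the descendants of node v in the DAG and the d_v are fixed positive weights (my notation, not necessarily the paper's):

```latex
\Omega(f) \;=\; \sum_{v \in V} d_v \Bigl( \sum_{w \in D(v)} \lVert f_w \rVert_{\mathcal{H}_w}^{2} \Bigr)^{1/2}
```

Zeroing the norm attached to a node removes all of its descendants at once, so a basis kernel can only be selected if its ancestors are; this structure is what permits selection in polynomial time in the number of selected kernels.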
Multiple kernel learning algorithms
JMLR, 2011
"... In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subs ..."
Abstract

Cited by 109 (1 self)
 Add to MetaCart
In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity, or may use information coming from multiple sources (different representations or different feature subsets). In order to organize and highlight the similarities and differences between them, we give a taxonomy of, and review, several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that although there may not be large differences in terms of accuracy, the algorithms do differ in complexity as given by the number of stored support vectors, in the sparsity of the solution as given by the number of kernels used, and in training time complexity. We see that, overall, using multiple kernels instead of a single one is useful, and we believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination for fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels.
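The linear-versus-nonlinear distinction this survey draws is visible directly at the Gram-matrix level; a small numpy illustration with two toy kernels (the weights and data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))

K_lin = X @ X.T  # linear kernel
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_rbf = np.exp(-0.5 * sq_dists)  # Gaussian kernel

K_sum = 0.3 * K_lin + 0.7 * K_rbf  # linear (weighted-sum) combination
K_prod = K_lin * K_rbf             # one nonlinear option: element-wise product
```

Both combinations remain positive semidefinite (the element-wise product by the Schur product theorem), so either can be handed to a standard kernel machine.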
More efficiency in multiple kernel learning
In ICML, 2007
"... An efficient and general multiple kernel learning (MKL) algorithm has been recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives since it makes the MKL approach tractable for largescale problems, by iteratively using existing support vector machine code. However, i ..."
Abstract

Cited by 92 (5 self)
 Add to MetaCart
(Show Context)
An efficient and general multiple kernel learning (MKL) algorithm was recently proposed by Sonnenburg et al. (2006). This approach has opened new perspectives, since it makes the MKL approach tractable for large-scale problems by iteratively using existing support vector machine code. However, it turns out that this iterative algorithm needs several iterations before converging towards a reasonable solution. In this paper, we address the MKL problem through an adaptive 2-norm regularization formulation. Weights on each kernel matrix are included in the standard SVM empirical risk minimization problem with an ℓ1 constraint to encourage sparsity. We propose an algorithm for solving this problem and provide new insight into MKL algorithms based on block 1-norm regularization by showing that the two approaches are equivalent. Experimental results show that the resulting algorithm converges rapidly and that its efficiency compares favorably to that of other MKL algorithms.
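The equivalence claimed in the last sentences rests on a standard variational identity (with the convention 0/0 = 0):

```latex
\Bigl( \sum_{j=1}^{m} \lVert f_j \rVert \Bigr)^{2}
\;=\; \min_{d_j \ge 0,\; \sum_j d_j \le 1} \; \sum_{j=1}^{m} \frac{\lVert f_j \rVert^{2}}{d_j}
```

so jointly minimizing the d-weighted 2-norm penalty under the ℓ1 constraint on the kernel weights d recovers the squared block 1-norm penalty; the minimum is attained at d_j proportional to the norms themselves.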
More Generality in Efficient Multiple Kernel Learning
"... Recent advances in Multiple Kernel Learning (MKL) have positioned it as an attractive tool for tackling many supervised learning tasks. The development of efficient gradient descent based optimization schemes has made it possible to tackle large scale problems. Simultaneously, MKL based algorithms h ..."
Abstract

Cited by 80 (3 self)
 Add to MetaCart
(Show Context)
Recent advances in Multiple Kernel Learning (MKL) have positioned it as an attractive tool for tackling many supervised learning tasks. The development of efficient gradient-descent-based optimization schemes has made it possible to tackle large-scale problems. Simultaneously, MKL-based algorithms have achieved very good results on challenging real-world applications. Yet, despite their successes, MKL approaches are limited in that they focus on learning a linear combination of given base kernels. In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. This can be achieved while retaining all the efficiency of existing large-scale optimization algorithms. To highlight the advantages of generalized kernel learning, we tackle feature selection problems on benchmark vision and UCI databases. It is demonstrated that the proposed formulation can lead to better results not only as compared to traditional MKL but also as compared to state-of-the-art wrapper and filter methods for feature selection.
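One concrete instance of the "general kernel combinations" mentioned here is a product of per-feature Gaussian kernels, which turns kernel learning into feature selection (a schematic example of a nonlinear, parameterized combination; the paper's exact parametrization may differ):

```latex
k_d(x, x') \;=\; \prod_{j=1}^{m} \exp\bigl(-d_j (x_j - x'_j)^2\bigr)
\;=\; \exp\Bigl(-\sum_{j=1}^{m} d_j (x_j - x'_j)^2\Bigr), \qquad d_j \ge 0
```

Driving d_j to zero makes feature j drop out of the kernel entirely, which is how such a formulation can address the feature selection benchmarks described above.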
Learning and Classification of Malware Behavior
In Fifth Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA '08), 2008
"... Abstract. Malicious software in form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. The diversity and amount of its variants severely undermine the e ectiveness of classical signaturebased detection. Yet variants of malware families ..."
Abstract

Cited by 59 (9 self)
 Add to MetaCart
(Show Context)
Malicious software in the form of Internet worms, computer viruses, and Trojan horses poses a major threat to the security of networked systems. The diversity and volume of its variants severely undermine the effectiveness of classical signature-based detection. Yet variants of malware families share typical behavioral patterns reflecting their origin and purpose. We aim to exploit these shared patterns for the classification of malware and propose a method for learning and discriminating malware behavior. Our method proceeds in three stages: (a) the behavior of collected malware is monitored in a sandbox environment, (b) based on a corpus of malware labeled by an antivirus scanner, a malware behavior classifier is trained using learning techniques, and (c) discriminative features of the behavior models are ranked to explain classification decisions. Experiments with different heterogeneous test data collected over several months using honeypots demonstrate the effectiveness of our method, especially in detecting novel instances of malware families previously not recognized by commercial antivirus software.
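A minimal sketch of stages (b) and (c) under one plausible reading: treat each sandbox behavior report as a bag of operation strings, train a linear classifier on antivirus-assigned family labels, and rank behaviors by learned weight. The report strings and family labels below are hypothetical placeholders, and this scikit-learn pipeline is an illustration rather than the authors' implementation:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Hypothetical sandbox reports: one string of observed operations per sample.
reports = [
    "copy_file create_mutex set_reg_key copy_file",
    "open_url connect_socket create_process",
    "create_mutex set_reg_key copy_file",
    "connect_socket open_url open_url",
]
labels = ["family_A", "family_A", "family_A", "family_B"]  # placeholder labels

vec = CountVectorizer()              # stage (b): bag-of-operations embedding
X = vec.fit_transform(reports)
clf = LinearSVC().fit(X, labels)     # stage (b): linear behavior classifier

# Stage (c): rank discriminative behaviors by absolute SVM weight.
order = np.argsort(np.abs(clf.coef_).ravel())[::-1]
print(np.array(vec.get_feature_names_out())[order][:3])
```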
ARTS: Accurate Recognition of Transcription Starts in Human
Bioinformatics, 2006
"... ..."
(Show Context)