## Learning the discriminative power-invariance trade-off (2007)


Venue: ICCV

Citations: 168 (4 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Varma07learningthe,
  author    = {Manik Varma},
  title     = {Learning the discriminative power-invariance trade-off},
  booktitle = {ICCV},
  year      = {2007}
}
```

### Abstract

We investigate the problem of learning optimal descriptors for a given classification task. Many hand-crafted descriptors have been proposed in the literature for measuring visual similarity. Looking past initial differences, what really distinguishes one descriptor from another is the trade-off that it achieves between discriminative power and invariance. Since this trade-off must vary from task to task, no single descriptor can be optimal in all situations. Our focus, in this paper, is on learning the optimal trade-off for classification given a particular training set and prior constraints. The problem is posed in the kernel learning framework. We learn the optimal, domain-specific kernel as a combination of base kernels corresponding to base features which achieve different levels of trade-off (such as no invariance, rotation invariance, scale invariance, affine invariance, etc.). This leads to a convex optimisation problem with a unique global optimum which can be solved efficiently. The method is shown to achieve state-of-the-art performance on the UIUC textures, Oxford flowers and Caltech 101 datasets.
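
The core idea in the abstract, that the learnt kernel is a non-negative weighted sum of base kernels, one per descriptor, can be sketched in a few lines. This is an illustrative toy, not the paper's code; the matrices and weights below are invented:

```python
import numpy as np

def combine_kernels(base_kernels, weights):
    """Return K_opt = sum_k d_k * K_k for non-negative weights d."""
    assert all(w >= 0 for w in weights), "kernel weights must be non-negative"
    K = np.zeros_like(base_kernels[0], dtype=float)
    for d_k, K_k in zip(weights, base_kernels):
        K += d_k * K_k
    return K

# Two toy 3x3 base kernel (Gram) matrices, e.g. one per descriptor type.
K_shape  = np.array([[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]])
K_colour = np.array([[1.0, 0.3, 0.9], [0.3, 1.0, 0.4], [0.9, 0.4, 1.0]])
K_opt = combine_kernels([K_shape, K_colour], weights=[0.7, 0.3])
print(K_opt[0, 1])  # 0.7*0.8 + 0.3*0.3 = 0.65
```

A non-negative combination of positive semi-definite Gram matrices is itself positive semi-definite, which is what makes the kernel interpretation valid.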

### Citations

5921 | Distinctive image features from scale-invariant keypoints
- Lowe
- 2004
Citation Context: ...om the surrounding vegetation. Shape distances between two images are calculated as the χ² statistic between the normalised frequency histograms of densely sampled, vector quantised SIFT descriptors [29] of the two images. Similarly, colour distances are computed over vocabularies of HSV descriptors and texture over MR8 filter responses [47]. Cue combination fits well within our framework as one can ...

4266 | Convex Optimization
- Boyd, Vandenberghe
- 2004
Citation Context: ...pport vectors, Y is a diagonal matrix with the labels on the diagonal and A_k is the k-th column of A. The dual is convex with a unique global optimum. It is an instance of a Second Order Cone Program [10] and can be solved relatively efficiently by off-the-shelf numerical optimisation packages such as SeDuMi [1]. However, in order to tackle large scale problems involving hundreds of kernels we adopt t...

1291 | A performance evaluation of local descriptors
- Mikolajczyk, Schmid
- 2005
Citation Context: ...es, Oxford flowers and Caltech 101 datasets. 1. Introduction A fundamental problem in visual classification is designing good descriptors and many successful ones have been proposed in the literature [31]. If one looks past the initial dissimilarities, what really distinguishes one descriptor from another is the trade-off that it achieves between discriminative power and invariance. For instance, imag...

1117 | Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories
- Lazebnik, Schmid, et al.
Citation Context: ...4 fixed scales. These are then vector quantised to form a vocabulary of visual words. Images are represented as a bag of words and similarity between two images is given by the spatial pyramid kernel [26]. While AppGray is computed from gray scale images, AppColour is computed from an HSV representation. The two shape features, Shape180 and Shape360, are represented as histograms of oriented gradients...

877 | Gradient-based learning applied to document recognition
- LeCun, Bottou, et al.
- 1998
Citation Context: ...ctly applicable to our problem as they are unsupervised and generally focus on learning invariances without regard to discriminative power. One might also try and learn an optimal descriptor directly [21, 27, 36, 48] for classification. However, our proposed solution has two advantages. First, by combining kernels, we never need to work in combined high dimensional descriptor space with all its associated problem...

592 | Learning the kernel matrix with semidefinite programming
- Lanckriet, Cristianini, et al.
- 2004
Citation Context: ...n a kernel which is optimal for the specified task. Much progress has been made recently in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each lea...

426 | Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study
- Zhang, Marszałek, et al.
- 2007
Citation Context: ...roblem typical of such high dimensional spaces. Second, we are able to combine heterogeneous sources of data, such as shape, colour and texture. The idea of combining descriptors has been explored in [8, 25, 33, 51]. Unfortunately, these methods are not based on learning. In [25, 51] a fixed combination of descriptors is tried with all descriptors being equally weighted all the time. In [8, 33] a brute force sea...

344 | Perturbation analysis of optimization problems
- Bonnans, Shapiro
- 2000
Citation Context: ...e dual of T which is $W(d) = \max_\alpha \; 1^t\alpha + \sigma^t d - \tfrac{1}{2}\sum_k d_k \alpha^t Y K_k Y \alpha$ (11) subject to $0 \le \alpha \le C$, $1^t Y\alpha = 0$ (12). By the principle of strong duality $T(d) = W(d)$. Furthermore, if $\alpha^*$ maximises W, then [7] have shown that W is differentiable if $\alpha^*$ is unique (which it is in our case since all the kernel matrices are strictly positive definite). Finally, as proved in Lemma 2 of [12], W can be differenti...

341 | Choosing Multiple Parameters for Support Vector Machines
- Chapelle, Vapnik, et al.
- 2002
Citation Context: ...y by off-the-shelf numerical optimisation packages such as SeDuMi [1]. However, in order to tackle large scale problems involving hundreds of kernels we adopt the minimax optimisation (7) strategy of [12, 35]. In their method, the primal is reformulated as $\min_d T(d)$ subject to $d \ge 0$ and $Ad \ge p$, where $T(d) = \min_{w,\xi} \tfrac{1}{2} w^t w + C 1^t \xi + \sigma^t d$ (8) subject to $y_i(w^t \phi(x_i) + b) \ge 1 - \xi_i$ (9), $\xi \ge 0$ (10). The strateg...

324 | Shape Matching and Object Recognition Using Low Distortion Correspondence
- Berg, Berg, et al.
- 2005
Citation Context: ...radients are computed at both boundary and texture edges these descriptors represent both local shape and local texture. To evaluate classification performance, we stick to the methodology adopted in [6, 50]. Thus, 15 images are randomly selected from all 102 classes (i.e. including the background) for training and another random 15 for testing. Classification results using each of the base descriptors as ...

301 | Multiple kernel learning, conic duality, and the SMO algorithm
- Bach, Lanckriet, et al.
- 2004
Citation Context: ...n a kernel which is optimal for the specified task. Much progress has been made recently in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each lea...

265 | Boosting image retrieval
- Tieu, Viola
- 2000
Citation Context: ...ances [13, 17, 30, 32, 37, 39, 41] is also relevant to our problem. In addition, boosting has been particularly successful at learning distances and features optimised for classification and related tasks [44]. A recent survey of the state-of-the-art in learning distances can be found in [19]. There has also been a lot of work done on learning invariances in an unsupervised setting, see [22, 43, 49] and re...

261 | One-shot learning of object categories
- Fei-Fei, Fergus, et al.
Citation Context: ...g to its maximal distance from the separating hyperplanes. 4. Experimentation In this section, we apply our method to the UIUC textures [25], Oxford flowers [33] and Caltech 101 object categorisation [16] databases. Since we would like to test how general the technique is, we assume that no prior knowledge is available and that no descriptor is a priori preferable to any other. We therefore set σ_k to ...

250 | Large Scale Multiple Kernel Learning
- Sonnenburg, Rätsch, et al.
- 2006
Citation Context: ...n a kernel which is optimal for the specified task. Much progress has been made recently in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each lea...

249 | On kernel-target alignment
- Cristianini, Shawe-Taylor, et al.
- 2002
Citation Context: ...The optimal descriptor's kernel matrix should have the same structure as the ideal kernel (essentially corresponding to zero intra-class and infinite inter-class distances) in kernel target alignment [15]. Unfortunately, most descriptors don't have such a continuously tunable parameter. It is nevertheless possible to discretely sample the levels of invariance and generate a finite set of base descript...

237 | Robust Object recognition with cortex-like mechanisms
- Serre, Wolf, et al.
- 2007
Citation Context: ... [47] (see Figure 1). Subsequent rotation, scale and similarity invariant descriptors are obtained by taking the maximum response of a basic filter over orientation, scale or both. This is similar to [38] where the maximum response is taken over position to achieve translation invariance. MR filter responses can also be used to derive fractal based bi-Lipschitz (including affine, perspective and non-r...

205 | Fast pose estimation with parameter-sensitive hashing
- Shakhnarovich, Viola, et al.
- 2003
Citation Context: ... are [3, 4, 35, 42] as each learns the optimal kernel for classification as a linear combination of base kernels with positive weights while enforcing sparsity. The body of work on learning distances [13, 17, 30, 32, 37, 39, 41] is also relevant to our problem. In addition, boosting has been particularly successful at learning distances and features optimised for classification and related tasks [44]. A recent survey of the ...

195 | Representing shape with a spatial pyramid kernel
- Bosch, Zisserman, et al.
- 2007
Citation Context: ...roblem typical of such high dimensional spaces. Second, we are able to combine heterogeneous sources of data, such as shape, colour and texture. The idea of combining descriptors has been explored in [8, 25, 33, 51]. Unfortunately, these methods are not based on learning. In [25, 51] a fixed combination of descriptors is tried with all descriptors being equally weighted all the time. In [8, 33] a brute force sea...

168 | Slow feature analysis: Unsupervised learning of invariances
- Wiskott, Sejnowski
Citation Context: ... related tasks [44]. A recent survey of the state-of-the-art in learning distances can be found in [19]. There has also been a lot of work done on learning invariances in an unsupervised setting, see [22, 43, 49] and references within. In this scenario, an object is allowed to transform over time and a representation invariant to such transformations is learnt from the data. These methods are not directly app...

159 | A statistical approach to texture classification from single images. IJCV
- Varma, Zisserman
- 2005
Citation Context: ...ut then take different transforms to derive 7 other base descriptors achieving different levels of the trade-off. The first descriptor is obtained by linearly projecting the patch onto the MR filters [47] (see Figure 1). Subsequent rotation, scale and similarity invariant descriptors are obtained by taking the maximum response of a basic filter over orientation, scale or both. This is similar to [38] ...

138 | Transformation invariance in pattern recognition: Tangent distance and propagation
- Simard, LeCun, et al.
Citation Context: ... are [3, 4, 35, 42] as each learns the optimal kernel for classification as a linear combination of base kernels with positive weights while enforcing sparsity. The body of work on learning distances [13, 17, 30, 32, 37, 39, 41] is also relevant to our problem. In addition, boosting has been particularly successful at learning distances and features optimised for classification and related tasks [44]. A recent survey of the ...

136 | A sparse texture representation using local affine regions
- Lazebnik, Schmid, et al.
- 2005
Citation Context: ...roblem typical of such high dimensional spaces. Second, we are able to combine heterogeneous sources of data, such as shape, colour and texture. The idea of combining descriptors has been explored in [8, 25, 33, 51]. Unfortunately, these methods are not based on learning. In [25, 51] a fixed combination of descriptors is tried with all descriptors being equally weighted all the time. In [8, 33] a brute force sea...

122 | Geometric blur for template matching - Berg, Malik - 2001

105 | Discriminative learning of local image descriptors
- Brown, Hua, et al.
Citation Context: ...ctly applicable to our problem as they are unsupervised and generally focus on learning invariances without regard to discriminative power. One might also try and learn an optimal descriptor directly [21, 27, 36, 48] for classification. However, our proposed solution has two advantages. First, by combining kernels, we never need to work in combined high dimensional descriptor space with all its associated problem...

100 | Learning a Similarity Metric Discriminatively, with Application to Face Verification
- Chopra, Hadsell, et al.
- 2005
Citation Context: ... are [3, 4, 35, 42] as each learns the optimal kernel for classification as a linear combination of base kernels with positive weights while enforcing sparsity. The body of work on learning distances [13, 17, 30, 32, 37, 39, 41] is also relevant to our problem. In addition, boosting has been particularly successful at learning distances and features optimised for classification and related tasks [44]. A recent survey of the ...

97 | Learning from one example through shared densities of transforms
- Miller, Matsakis, et al.
- 2000

91 | Projected gradient methods for linearly constrained problems
- Calamai, Moré
- 1987
Citation Context: ...olver of choice can therefore be used to maximise W and obtain α*. In the second stage, T is minimised by projected gradient descent according to (13). The two stages are repeated until convergence [11] or a maximum number of iterations is reached, at which point the weights d and support vectors α* have been solved for. A novel point x can now be classified as ±1 by determining sign(Σ_i α_i y_i K_opt(x...
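
The two-stage loop described in this context can be sketched as follows. This is a minimal illustration under simplifying assumptions (a bias-free SVM, which drops the 1ᵀYα = 0 equality constraint, plain projected gradient ascent for the inner dual, and a uniform σ_k); the data, step sizes and function names are invented for the example:

```python
import numpy as np

def solve_dual(K, y, C, steps=1000, lr=0.01):
    """Maximise W(a) = 1'a - 0.5 a'(YKY)a subject to 0 <= a <= C
    by projected gradient ascent (bias-free SVM dual)."""
    n = len(y)
    a = np.zeros(n)
    YKY = (y[:, None] * K) * y[None, :]
    for _ in range(steps):
        grad = np.ones(n) - YKY @ a
        a = np.clip(a + lr * grad, 0.0, C)  # project onto the box [0, C]
    return a

def learn_weights(base_kernels, y, C=10.0, sigma=0.1, outer=50, lr=0.05):
    """Alternate two stages: (1) with d fixed, solve the SVM dual for a;
    (2) take a projected gradient step on d, using
    dW/dd_k = sigma_k - 0.5 (a*y)' K_k (a*y), keeping d >= 0."""
    d = np.ones(len(base_kernels)) / len(base_kernels)
    a = np.zeros(len(y))
    for _ in range(outer):
        K = sum(dk * Kk for dk, Kk in zip(d, base_kernels))
        a = solve_dual(K, y, C)
        ay = a * y
        grad = np.array([sigma - 0.5 * ay @ Kk @ ay for Kk in base_kernels])
        d = np.maximum(d - lr * grad, 0.0)  # project onto d >= 0
    return d, a

# Toy problem: kernel 1 is computed on a coordinate that separates the
# classes; kernel 2 is a random (uninformative) Gram matrix.
X = np.array([[0.0], [0.2], [1.0], [1.2]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K1 = np.exp(-(X - X.T) ** 2)
rng = np.random.default_rng(0)
Z = rng.normal(size=(4, 2))
K2 = Z @ Z.T
d, a = learn_weights([K1, K2], y)
print(d)  # learnt non-negative kernel weights
```

In the full method the inner problem would be handed to an off-the-shelf SVM solver rather than the crude ascent loop used here.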

86 | Learning the kernel with hyperkernels
- Ong, Smola, et al.
- 2005
Citation Context: ...e specified task. Much progress has been made recently in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each learns the optimal kernel ...

80 | On affine invariant clustering and automatic cast listing
- Fitzgibbon, Zisserman
- 2002

71 | On the complexity of learning the kernel matrix
- Bousquet, Herrmann
- 2003
Citation Context: ...y in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each learns the optimal kernel for classification as a linear combination of base ...

66 | Kernel design using boosting
- Crammer, Keshet, et al.
- 2003
Citation Context: ...gress has been made recently in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each learns the optimal kernel for classification as a li...

65 | A visual vocabulary for flower classification
- Nilsback, Zisserman
- 2006

62 | Learning texture discrimination masks
- Jain, Karu
- 1996
Citation Context: ...ctly applicable to our problem as they are unsupervised and generally focus on learning invariances without regard to discriminative power. One might also try and learn an optimal descriptor directly [21, 27, 36, 48] for classification. However, our proposed solution has two advantages. First, by combining kernels, we never need to work in combined high dimensional descriptor space with all its associated problem...

61 | More efficiency in multiple kernel learning
- Rakotomamonjy, Bach, et al.
- 2007

46 | Learning convex combinations of continuously parameterized basic kernels
- Argyriou, Micchelli, et al.
Citation Context: ...y in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each learns the optimal kernel for classification as a linear combination of base ...

46 | Multiclass multiple kernel learning - Zien, Ong - 2007

42 | Computing regularization paths for learning multiple kernels
- Bach
- 2004
Citation Context: ...arning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each learns the optimal kernel for classification as a linear combination of base kernels with positive weights while enforcing sparsity. The body of work on learning distances [13, 17, 30, 32, 37, 39...

42 | The optimal distance measure for object detection
- Mahamud, Hebert
- 2003

41 | Learning silhouette features for control of human motion
- Ren, Hodgins, et al.
- 2005

27 | Local ensemble kernel learning for object category recognition
- Lin, Liu, et al.
- 2007
Citation Context: ...to the state-of-the-art, note that [50] combine shape and texture features to obtain 59.08 ± 0.37% and [18] combine colour features in addition to get 60.3 ± 0.70%. Kernel target alignment is used by [28] to combine 8 kernels based on shape, colour, texture and other cues. Their results are 59.80%. In [23], a performance of 57.83% is achieved by combining 12 kernels using the MKL-Block l1 method. In [8...

26 | Learning a Kernel Function for Classification with Small Training Samples
- Hertz, Hillel, et al.
- 2006
Citation Context: ...gress has been made recently in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each learns the optimal kernel for classification as a li...

26 | Support Kernel Machines for Object Recognition
- Kumar, Sminchisescu
Citation Context: ... [18] combine colour features in addition to get 60.3 ± 0.70%. Kernel target alignment is used by [28] to combine 8 kernels based on shape, colour, texture and other cues. Their results are 59.80%. In [23], a performance of 57.83% is achieved by combining 12 kernels using the MKL-Block l1 method. In [8], a brute force search is performed over a validation set to learn the best combination of their 4 ke...

21 | Locally invariant fractal features for statistical texture classification
- Varma, Garg
Citation Context: ...n invariance. MR filter responses can also be used to derive fractal based bi-Lipschitz (including affine, perspective and non-rigid surface deformations) invariant and rotation invariant descriptors [46]. Finally, patches can directly yield rotation invariant descriptors by aligning them according to their dominant orientation. Figure 1. The extended MR8 filter bank. ...
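
The "maximum response over orientations" trick behind MR8-style descriptors, referred to in several contexts above, is simple to illustrate. Shapes and data below are invented for the example:

```python
import numpy as np

# Responses of an oriented filter at 6 orientations and 3 scales over a
# 16x16 patch (orientation, scale, H, W). Collapsing by max over the
# orientation axis gives a response that is unchanged if the patch rotates
# by one of the sampled orientations; max over scale too gives similarity
# invariance.
rng = np.random.default_rng(1)
responses = rng.normal(size=(6, 3, 16, 16))
rot_invariant = responses.max(axis=0)       # -> (3, 16, 16)
sim_invariant = responses.max(axis=(0, 1))  # -> (16, 16)
print(rot_invariant.shape, sim_invariant.shape)
```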

17 | Efficient hyperkernel learning using second-order cone programming
- Tsang, Kwok
- 2006
Citation Context: ...e specified task. Much progress has been made recently in this field and solutions have been proposed based on kernel target alignment [15], multiple kernel learning [3, 24, 35, 42, 52], hyperkernels [34, 45], boosted kernels [14, 20] and other methods [2, 9]. These approaches mainly differ in the cost function that is optimised. Of particular interest are [3, 4, 35, 42] as each learns the optimal kernel ...

14 | Learning viewpoint invariant perceptual representations from cluttered images
- Spratling
- 2005
Citation Context: ... related tasks [44]. A recent survey of the state-of-the-art in learning distances can be found in [19]. There has also been a lot of work done on learning invariances in an unsupervised setting, see [22, 43, 49] and references within. In this scenario, an object is allowed to transform over time and a representation invariant to such transformations is learnt from the data. These methods are not directly app...

8 | Image retrieval and recognition using local distance function
- Frome, Singer, et al.
- 2006
Citation Context: ...n in Table 3 and Figure 7 gives a qualitative feel of the learnt weights. To compare our results to the state-of-the-art, note that [50] combine shape and texture features to obtain 59.08 ± 0.37% and [18] combine colour features in addition to get 60.3 ± 0.70%. Kernel target alignment is used by [28] to combine 8 kernels based on shape, colour, texture and other cues. Their results are 59.80%. In [23],...

7 | Learning distance functions: algorithms and applications
- Hertz
- 2006
Citation Context: ... has been particularly successful at learning distances and features optimised for classification and related tasks [44]. A recent survey of the state-of-the-art in learning distances can be found in [19]. There has also been a lot of work done on learning invariances in an unsupervised setting, see [22, 43, 49] and references within. In this scenario, an object is allowed to transform over time and a...

5 | Invariant operators, small samples, and the bias-variance dilemma
- Shi, Manduchi
- 2004
Citation Context: ...s performed over a validation set to determine the best descriptor weights. Finally, the idea of a trade-off between invariance and discriminative power is well known and is explored theoretically in [40]. However, rather than learning the actual trade-off, their proposed randomised invariants solution is to add noise to the training set features. The noise parameters, corresponding to the trade-off, ...

4 | B.: Fast transformation-invariant component analysis
- Kannan, Jojic, et al.
- 2008
Citation Context: ... related tasks [44]. A recent survey of the state-of-the-art in learning distances can be found in [19]. There has also been a lot of work done on learning invariances in an unsupervised setting, see [22, 43, 49] and references within. In this scenario, an object is allowed to transform over time and a representation invariant to such transformations is learnt from the data. These methods are not directly app...

2 | Optimal filter-bank design for multiple texture discrimination
- Randen, Husoy
- 1997