## On feature combination for multiclass object classification

Venue: ICCV

Citations: 137 (3 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Gehler_onfeature,
  author    = {Peter Gehler and Sebastian Nowozin},
  title     = {On feature combination for multiclass object classification},
  booktitle = {ICCV},
  year      = {}
}
```

### Abstract

A key ingredient in the design of visual object classification systems is the identification of relevant class-specific aspects while being robust to intra-class variations. While this is a necessity in order to generalize beyond a given set of training images, it is also a very difficult problem due to the high variability of visual appearance within each class. In recent years substantial performance gains on challenging benchmark datasets have been reported in the literature. This progress can be attributed to two developments: the design of highly discriminative and robust image features, and the combination of multiple complementary features based on different aspects such as shape, color or texture. In this paper we study several models that aim at learning the correct weighting of different features from training data. These include multiple kernel learning as well as simple baseline methods. Furthermore we derive ensemble methods inspired by Boosting which are easily extendable to several multiclass settings. All methods are thoroughly evaluated on object classification datasets using a multitude of feature descriptors. The key results are that even very simple baseline methods, which are orders of magnitude faster than the learning techniques, are highly competitive with multiple kernel learning. Furthermore, the Boosting-type methods are found to produce consistently better results in all experiments. We provide insight into when combination methods can be expected to work and how the benefit of complementary features can be exploited most efficiently.

### Citations

5238 | Distinctive image features from scale-invariant keypoints
- Lowe
- 2004
Citation Context: ... The oriented histogram Shp360 contains 40 bins, the unoriented Shp180 20 bins, yielding a total of 2 × 4 kernels (L=3). Appearance Descriptor. Appearance information is modeled using SIFT descriptors [12], which are computed on a regular grid on the image with a spacing of 10 pixels and for the four different radii r = 4, 8, 12, 16. The descriptors are subsequently quantized into a vocabulary of visual...
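The quantization step described above assigns each dense SIFT descriptor to its nearest visual word. A minimal sketch of that vector-quantization step, assuming the descriptors and the vocabulary are already computed (the function name and shapes are illustrative, not from the paper's code):

```python
import numpy as np

def quantize_descriptors(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word (Euclidean
    distance) and return a normalized bag-of-words histogram."""
    # pairwise squared distances, shape (num_descriptors, vocab_size)
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

In practice the vocabulary would come from k-means over descriptors sampled from the training images.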

972 | Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories
- Lazebnik, Schmid, et al.
Citation Context: ...best results. 7.1. Image Descriptors In the following we give an overview of the features which were used for the experiments. We compute all but the V1S+ features in a spatial pyramid as proposed in [10]. A pyramid representation consists of several levels, which themselves consist of several cells. The first level 0 of the pyramid is the image itself, and in each subsequent level each cell is split in...

569 | T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
- Ojala, Pietikäinen, et al.
- 2002
Citation Context: ...egion Covariance. We use the covariances of simple per-pixel features described in [19] (tangent-space projected). A pyramid representation yields 3 kernels (L=2). Local Binary Patterns. Ojala et al. [15] argue for local binary pattern (LBP) features, which retain the classification performance of textons while being much faster and simpler to extract. We use histograms of uniform rotation-invariant ...
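The uniform rotation-invariant LBP8,1 code mentioned above thresholds the 8 circular neighbours of a pixel against its centre value. A minimal sketch of the per-pixel code under Ojala et al.'s "riu2" mapping (patterns with at most two 0/1 transitions map to their number of set bits, all others to a single "non-uniform" label); the function name is illustrative:

```python
import numpy as np

def lbp8_riu2(patch):
    """Rotation-invariant uniform LBP code for the centre pixel of a
    3x3 patch (8 neighbours at radius 1)."""
    centre = patch[1, 1]
    # the 8 neighbours taken in circular order around the centre
    neigh = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    bits = (neigh >= centre).astype(int)
    # number of 0->1 / 1->0 transitions around the circle
    transitions = int(np.sum(bits != np.roll(bits, 1)))
    return int(bits.sum()) if transitions <= 2 else 9  # 9 = non-uniform
```

A histogram of these codes over an image (or pyramid cell) gives the LBP feature vector used to build the kernels.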

559 | Learning the Kernel Matrix with Semidefinite Programming
- Lanckriet, Cristianini, et al.
- 2004
Citation Context: ...hese feature combinations is a recent trend in class-level object recognition and image classification. One popular method in computer vision is Multiple Kernel Learning (MKL), originally proposed in [9]. In the application of MKL to object classification, the approach can be seen to linearly combine similarity functions between images such that the combined similarity function yields improved classi...
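The linear combination of similarity functions described above amounts to a weighted sum of precomputed kernel matrices with simplex-constrained weights. A minimal sketch (the weights β would be learned by the MKL solver, not set by hand):

```python
import numpy as np

def combine_kernels(kernels, beta):
    """Return the combined kernel K* = sum_m beta_m * K_m, where the
    weights are nonnegative and sum to one (the MKL simplex constraint)."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= 0) and np.isclose(beta.sum(), 1.0)
    return sum(b * K for b, K in zip(beta, kernels))
```

The combined matrix can then be handed to any standard kernel classifier such as an SVM.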

476 | Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories
- Fei-Fei, Fergus, et al.
- 2004
Citation Context: ...s of performance if each participating feature is individually designed to be discriminative. 7. Experiments: Caltech datasets For the second set of experiments we use the well-known Caltech datasets [5, 7], which are prominent benchmark datasets for object classification. We follow the experimental setup proposed by the designers of the datasets. Performance is measured as the mean prediction rate per c...

281 | Multiple kernel learning, conic duality, and the SMO algorithm
- Bach, Lanckriet, et al.
- 2004
Citation Context: ...s MKL. Its objective is to optimize jointly over a linear combination of kernels k*(x, x′) = ∑_{m=1}^{F} β_m k_m(x, x′) and the parameters α ∈ R^N and b ∈ R of an SVM. MKL was originally introduced in [1]. For efficiency and in order to obtain sparse, interpretable coefficients, it restricts β_m ≥ 0 and imposes the constraint ∑_{m=1}^{F} β_m = 1. Since the scope of this paper is to assess the applicability ...

252 | Caltech-256 object category dataset
- Griffin, Holub, et al.
- 2007
Citation Context: ...s of performance if each participating feature is individually designed to be discriminative. 7. Experiments: Caltech datasets For the second set of experiments we use the well-known Caltech datasets [5, 7], which are prominent benchmark datasets for object classification. We follow the experimental setup proposed by the designers of the datasets. Performance is measured as the mean prediction rate per c...

226 | Large scale multiple kernel learning
- Sonnenburg, Rätsch, et al.
Citation Context: ...∑_{m=1}^{F} β_m = 1, β_m ≥ 0, m = 1, . . . , F, where L(y, t) = max(0, 1 − yt) denotes the Hinge loss. We compare two different algorithms solving this problem for their runtime performance, namely SILP [18]² and SimpleMKL [17]³. The final binary decision function of MKL is of the following form F_MKL(x) = sign(∑_{m=1}^{F} β_m K_m(x)^T α + b). (2) ² Available online: www.shogun-toolbox.org/ ³ Available on...
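The binary MKL decision function in Eq. (2) can be evaluated directly once the solver has produced the kernel weights β, the dual coefficients α and the bias b. A minimal sketch assuming those are given (names are illustrative):

```python
import numpy as np

def mkl_decision(kernel_rows, beta, alpha, b):
    """Evaluate F_MKL(x) = sign( sum_m beta_m * K_m(x)^T alpha + b ).
    kernel_rows[m] holds the kernel values k_m(x, x_i) of the test point
    against all N training points."""
    score = sum(bm * row @ alpha for bm, row in zip(beta, kernel_rows)) + b
    return 1 if score >= 0 else -1
```

Note that the hinge loss L(y, t) = max(0, 1 − yt) enters only during training; at test time only the weighted kernel expansion above is needed.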

175 | Representing shape with a spatial pyramid kernel
- Bosch, Zisserman, et al.
- 2007
Citation Context: ...riefly describe the image features used for the experiments and refer to the corresponding publications for more details. PHOG Shape Descriptor. Shape is modeled using the PHOG descriptor proposed in [3]. The descriptor is a histogram of oriented (Shp360) or unoriented (Shp180) gradients computed on the output of a Canny edge detector. The oriented histogram Shp360 contains 40 bins, the unoriented Sh...
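The core of the PHOG descriptor described above is a magnitude-weighted histogram of gradient orientations restricted to edge pixels. A minimal sketch, assuming the gradient maps and the Canny edge mask are already computed (the function and parameters are illustrative):

```python
import numpy as np

def orientation_histogram(gx, gy, edge_mask, bins=20, signed=False):
    """Magnitude-weighted histogram of gradient orientations over edge
    pixels. signed=True bins over 0..360 degrees (Shp360-style),
    signed=False over 0..180 (Shp180-style)."""
    angles = np.degrees(np.arctan2(gy, gx))      # in (-180, 180]
    rng = 360.0 if signed else 180.0
    angles = angles % rng
    idx = np.minimum((angles[edge_mask] / rng * bins).astype(int), bins - 1)
    mag = np.hypot(gx, gy)[edge_mask]
    return np.bincount(idx, weights=mag, minlength=bins)
```

Computing this histogram in every spatial-pyramid cell and concatenating the cells gives the full PHOG vector.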

150 | Learning the discriminative power-invariance trade-off
- Varma, Ray
- 2007
Citation Context: ...of MKL to object classification, the approach can be seen to linearly combine similarity functions between images such that the combined similarity function yields improved classification performance [8, 11, 20]. In Section 2 we give a general overview of the problem addressed in this paper. Sections 3-5 describe several combination approaches. Experiments are presented in Sections 6 and 7. We conclude w...

146 | Support vector machines for multiclass pattern recognition - Weston, Watkins - 1999

123 | Human detection via classification on Riemannian manifolds
- Tuzel, Porikli, et al.
- 2007
Citation Context: ...nts) and grey image descriptors (128 dims) as well as HSV-SIFT (3*128 = 384 dims), with a total of 4 × 4 kernels (L=3). Region Covariance. We use the covariances of simple per-pixel features described in [19] (tangent-space projected). A pyramid representation yields 3 kernels (L=2). Local Binary Patterns. Ojala et al. [15] argue for local binary pattern (LBP) features, which retain the classification p...

102 | Linear programming boosting via column generation
- Demiriz, Bennet, et al.
- 2002
Citation Context: ... (column spill from Table 1, which compares the multiclass decision functions y(x) = argmax_c ∑_{m=1}^{F} β_m (K_m(x)^T α_{c,m} + b_{c,m}) with β ∈ R^F and y(x) = argmax_c ∑_{m=1}^{F} B_{cm} (K_m(x)^T α_{c,m} + b_{c,m}) with B ∈ R^{C×F}, together with their parameters α, b, the regularization constants C_c / C_m, ν ∈ (0, 1), and the training procedures of [2] and [4]) Table 1. Comparison of multiclass learning approaches to the feature combination problem in image and object classif...

78 | Why is real-world visual object recognition hard? PLoS
- Pinto, Cox, et al.
Citation Context: ... retaining the classification performance of textons while being much faster and simpler to extract. We use histograms of uniform rotation-invariant LBP8,1 features and create 3 kernels (L=2). V1S+. In [16] a population of locally normalized, thresholded Gabor functions spanning a range of orientations and spatial frequencies is derived and advocated as particularly simple features. This generates one ke...

61 | A Visual Vocabulary for Flower Classification
- Nilsback, Zisserman
- 2006
Citation Context: ...nd found that logistic regression yields the best results while assuring good convergence of the algorithm. 6. Experiments: Oxford Flowers In this section we present results on the Oxford Flowers dataset [13]. This dataset consists of flower images depicting 17 different types with 80 images per category. Example images are shown in Figure 1. ⁶ Implemented using liblinear-1.33, a standard solver for linea...

57 | More efficiency in multiple kernel learning
- Rakotomamonjy, Bach, et al.
- 2007
Citation Context: ..., m = 1, . . . , F, where L(y, t) = max(0, 1 − yt) denotes the Hinge loss. We compare two different algorithms solving this problem for their runtime performance, namely SILP [18]² and SimpleMKL [17]³. The final binary decision function of MKL is of the following form F_MKL(x) = sign(∑_{m=1}^{F} β_m K_m(x)^T α + b). (2) ² Available online: www.shogun-toolbox.org/ ³ Available online: mloss.org/softwa...

44 | Automated Flower Classification over a Large Number of Classes
- Nilsback, Zisserman
- 2008
Citation Context: ...(Simple) 85.2 ± 1.5 152 siftint 70.6 ± 1.6 4 LP-β 85.5 ± 3.0 80 siftbdy 59.4 ± 3.3 5 LP-B 85.4 ± 2.4 98 Table 2. Mean accuracy for all methods on the Oxford Flowers dataset using the predefined splits [14]. Also plotted is the total time for model selection, training and testing in seconds. ...worse compared to LP-β, so fitting its C × F instead of F parameters demands more training data. 5.3. Column ...

44 | Multiclass multiple kernel learning
- Zien, Ong
- 2007
Citation Context: ...iers can be trained in parallel. Multiclass MKL. For strongly unbalanced datasets a MKL classifier trained as a multiclass classifier might be preferable over the one-versus-rest setup. The authors of [22] derive such an MC-MKL formulation in which all parameters for all classes are trained jointly. Due to performance issues this approach is infeasible for the experiments presented here.⁴ 5. Metho...

24 | Local ensemble kernel learning for object category recognition
- Lin, Liu, et al.
Citation Context: ...of MKL to object classification, the approach can be seen to linearly combine similarity functions between images such that the combined similarity function yields improved classification performance [8, 11, 20]. In Section 2 we give a general overview of the problem addressed in this paper. Sections 3-5 describe several combination approaches. Experiments are presented in Sections 6 and 7. We conclude w...

23 | Support kernel machines for object recognition
- Kumar, Sminchisescu
- 2007
Citation Context: ...of MKL to object classification, the approach can be seen to linearly combine similarity functions between images such that the combined similarity function yields improved classification performance [8, 11, 20]. In Section 2 we give a general overview of the problem addressed in this paper. Sections 3-5 describe several combination approaches. Experiments are presented in Sections 6 and 7. We conclude w...

18 | Column-generation boosting methods for mixture of kernels
- Bi, Zhang, et al.
- 2004
Citation Context: ... (column spill from Table 1, which compares the multiclass decision functions y(x) = argmax_c ∑_{m=1}^{F} β_m (K_m(x)^T α_{c,m} + b_{c,m}) with β ∈ R^F and y(x) = argmax_c ∑_{m=1}^{F} B_{cm} (K_m(x)^T α_{c,m} + b_{c,m}) with B ∈ R^{C×F}, together with their parameters α, b, the regularization constants C_c / C_m, ν ∈ (0, 1), and the training procedures of [2] and [4]) Table 1. Comparison of multiclass learning approaches to the feature combination problem i...

17 | Let the Kernel Figure it Out: Principled Learning of Pre-processing for Kernel Classifiers
- Gehler, Nowozin
- 2009
Citation Context: ...to build the final descriptor. These vectors are used to generate kernels for each level, which we refer to as pyramid kernels. Analogous to the spatial pyramid, we compute a kernel in the way proposed by [6]. A subwindow is drawn randomly and a histogram of the SIFT features which fall into this subwindow is computed. All such histograms define a new image feature for which a kernel is computed. This process...