Learning mid-level features for recognition (2010)
Citations: 227 (13 self)
Citations
8949 | Distinctive Image Features from Scale-Invariant Keypoints - Lowe - 2004
Citation Context: ...nition architectures. 1. Introduction Finding good image features is critical in modern approaches to category-level image classification. Many methods first extract low-level descriptors (e.g., SIFT [18] or HOG descriptors [5]) at interest point locations, or nodes in a dense grid. This paper considers the problem of combining these local features into a global image representation suited to recognit...
3735 | Histograms of oriented gradients for human detection, CVPR - Dalal, Triggs - 2005
Citation Context: ... Introduction Finding good image features is critical in modern approaches to category-level image classification. Many methods first extract low-level descriptors (e.g., SIFT [18] or HOG descriptors [5]) at interest point locations, or nodes in a dense grid. This paper considers the problem of combining these local features into a global image representation suited to recognition using a common clas...
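As a toy illustration of the low-level descriptor stage mentioned in the snippet above, the following sketch computes a single HOG-like orientation histogram for one patch. The names, bin count, and patch size are our choices; Dalal and Triggs' actual pipeline additionally uses a cell grid, block normalization, and interpolated voting.

```python
import numpy as np

rng = np.random.default_rng(0)
patch = rng.random((16, 16))                     # toy grayscale patch (random stand-in)

# Gradient magnitude and unsigned orientation at each pixel.
gy, gx = np.gradient(patch)
mag = np.hypot(gx, gy)
ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientation folded into [0, pi)

# Magnitude-weighted vote into 9 orientation bins, then L2 normalization.
bins = np.minimum((ang / np.pi * 9).astype(int), 8)
hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=9)
hist /= np.linalg.norm(hist) + 1e-12

print(hist.shape)  # (9,)
```

A real HOG implementation would compute many such histograms over a grid of cells and concatenate them.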
1920 | Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories - Lazebnik, Schmid, et al. - 2006
Citation Context: ... Laboratoire d’Informatique de l’Ecole Normale Supérieure, ENS/INRIA/CNRS UMR 8548. them as mid-level features. Popular examples of mid-level features include bags of features [25], spatial pyramids [12], and the upper units of convolutional networks [13] or deep belief networks [8, 23]. Extracting these mid-level features involves the same sequence of interchangeable modules as identified by Winder ...
1634 | Video Google: A Text Retrieval Approach to Object Matching in Videos - Sivic, Zisserman - 2003
Citation Context: ...o 4WILLOW project-team, Laboratoire d’Informatique de l’Ecole Normale Supérieure, ENS/INRIA/CNRS UMR 8548. them as mid-level features. Popular examples of mid-level features include bags of features [25], spatial pyramids [12], and the upper units of convolutional networks [13] or deep belief networks [8, 23]. Extracting these mid-level features involves the same sequence of interchangeable modules a...
1529 | Gradient-based learning applied to document recognition - LeCun, Bottou, et al. - 1998
955 | Sparse coding with an overcomplete basis set: a strategy employed by V1? Vision Research - Olshausen, Field - 1997
Citation Context: ... the limit when β → ∞). This amounts to coding as in the E-step of the expectation-maximization algorithm to learn a Gaussian mixture model, using codewords of the dictionary as centers. Sparse coding [22] uses a linear combination of a small number of codewords to approximate the xi. Yang et al. [31] have obtained state-of-the-art results by using sparse coding and max pooling: αi = argmin_α L(α, D), ...
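The contrast the snippet draws between hard quantization (the E-step of a Gaussian mixture in the limit) and sparse coding can be sketched with generic components. ISTA is a standard ℓ1 solver used here for illustration, not necessarily the solver used in the paper, and all data is random.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(8, 32))                 # dictionary: 32 codewords in 8 dimensions
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
x = rng.normal(size=8)                       # one toy descriptor

# Hard quantization: one-hot code for the nearest codeword.
hard = np.zeros(32)
hard[np.argmin(np.linalg.norm(D - x[:, None], axis=0))] = 1.0

# Sparse coding: minimize 0.5*||x - D a||^2 + lam*||a||_1 by ISTA
# (iterative soft-thresholding, started from a = 0).
lam = 0.1
step = 1.0 / np.linalg.norm(D.T @ D, 2)      # 1 / Lipschitz constant of the gradient
a = np.zeros(32)
for _ in range(500):
    z = a - step * (D.T @ (D @ a - x))       # gradient step on the quadratic term
    a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold

print(int((hard != 0).sum()), int((a != 0).sum()))
```

The hard code always uses exactly one codeword, while the sparse code combines a small subset of them.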
946 | A Bayesian hierarchical model for learning natural scene categories - Fei-Fei, Perona - 2005
Citation Context: ...nsional SIFT descriptors [18] of 16 × 16 patches. The descriptors are extracted on a dense grid rather than at interest points, as this procedure has been shown to yield superior scene classification [17]. Pooling regions m comprise the cells of 4×4, 2×2 and 1×1 grids (forming a three-level pyramid). We use the SPAMS toolbox [1] to compute sparse codes. 3.1. Interaction Between Modules Here, we perfor...
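The 4×4, 2×2, 1×1 pooling pyramid described in this snippet can be sketched as follows. Random codes stand in for the paper's sparse codes of dense SIFT; the function name, grid size, and code dimension are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10
codes = np.abs(rng.normal(size=(16, 16, K)))  # toy codes on a 16x16 grid of locations

def pyramid_pool(codes, levels=(4, 2, 1), op=np.max):
    """Pool codes over the cells of each grid level and concatenate the results."""
    h, w, k = codes.shape
    parts = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                cell = codes[i*h//n:(i+1)*h//n, j*w//n:(j+1)*w//n]
                parts.append(op(cell.reshape(-1, k), axis=0))
    return np.concatenate(parts)

v = pyramid_pool(codes)
print(v.shape)  # (16 + 4 + 1) cells * K = 210 dimensions
```

The last K entries come from the 1×1 level, i.e. pooling over the whole grid; in the paper the per-cell vectors are additionally weighted before concatenation.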
796 | Reducing the dimensionality of data with neural networks - Hinton, Salakhutdinov - 2006
Citation Context: ...48. them as mid-level features. Popular examples of mid-level features include bags of features [25], spatial pyramids [12], and the upper units of convolutional networks [13] or deep belief networks [8, 23]. Extracting these mid-level features involves the same sequence of interchangeable modules as identified by Winder and Brown for local image descriptors [29]. In this paper, we focus on two types of ...
783 | Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories - Fei-Fei, Fergus, et al. - 2004
Citation Context: ...ral recognition schemes using a single type of descriptors. Bold numbers in parentheses preceding the method description indicate methods reimplemented in this paper. SP: spatial pyramid. Caltech-101 [6] and Scenes datasets [12] as benchmarks. These datasets respectively comprise 101 object categories (plus a "background" category) and fifteen scene categories. Following the usual procedure [12, 31],...
496 | Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification - Yang, Yu, et al. - 2009
Citation Context: ... convolutional nets [13], bag-of-features methods, and HOG descriptors; max pooling is found in convolutional nets [16, 23], HMAX nets [24], and state-of-the-art variants of the spatial pyramid model [31]. The final global vector is formed by concatenating with suitable weights the semi-local vectors obtained for each pooling region. High levels of performance have been reported for specific pairings ...
448 | Expectation propagation for approximate Bayesian inference - Minka - 2001
Citation Context: ...f h over examples from positive and negative classes (henceforth denoted by + and −) to be well separated. We model the distribution of image patches of a given class as a mixture of two distributions [21]: patches are taken from the actual class distribution (foreground) with probability (1 − w), and from a clutter distribution (background) with probability w, with clutter patches being present in bot...
442 | Efficient sparse coding algorithms - Lee, Battle, et al. - 2007
Citation Context: ...dictionary trained by minimizing the average of L(αi, D) over all samples, alternatively over D and the αi. It is well known that the ℓ1 penalty induces sparsity and makes the problem tractable (e.g., [15, 19]). 3. Systematic Evaluation of Unsupervised Mid-Level Features This section offers comprehensive comparisons of unsupervised coding schemes. In all experiments, we use the Method Caltech-101, 30 trainin...
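The alternating minimization described in this snippet (optimize the codes with the dictionary fixed, then the dictionary with the codes fixed) can be sketched with generic components. The ISTA coding step, the MOD-style least-squares dictionary update, and all sizes are our choices for illustration, not necessarily what the paper or [15, 19] use.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 200))               # 200 toy 8-dim descriptors as columns
K, lam = 16, 0.2                            # dictionary size and l1 weight (arbitrary)
D = rng.normal(size=(8, K))
D /= np.linalg.norm(D, axis=0)              # unit-norm atoms

def ista(X, D, lam, iters=100):
    """Approximately solve min_A 0.5*||X - D A||_F^2 + lam*||A||_1."""
    step = 1.0 / np.linalg.norm(D.T @ D, 2)
    A = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(iters):
        Z = A - step * (D.T @ (D @ A - X))
        A = np.sign(Z) * np.maximum(np.abs(Z) - step * lam, 0.0)
    return A

for _ in range(5):                          # alternate between the two subproblems
    A = ista(X, D, lam)                     # coding step: D fixed
    D = X @ A.T @ np.linalg.pinv(A @ A.T)   # dictionary step: least squares, A fixed
    D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # keep atoms bounded

print(round(float((A != 0).mean()), 2))     # fraction of nonzero coefficients
```

The renormalization of atoms prevents the scale ambiguity between D and A that the ℓ1 penalty would otherwise exploit.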
369 | Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations - Lee, Grosse, et al. - 2009
Citation Context: ...s can be plugged into various architectures. For example, average pooling is found in convolutional nets [13], bag-of-features methods, and HOG descriptors; max pooling is found in convolutional nets [16, 23], HMAX nets [24], and state-of-the-art variants of the spatial pyramid model [31]. The final global vector is formed by concatenating with suitable weights the semi-local vectors obtained for each poo...
342 | SVM-KNN: Discriminative nearest neighbor classification for visual category recognition - Zhang, Berg, et al. - 2006
301 | Object categorization by learned universal visual dictionary - Winn, Criminisi, et al. - 2005
Citation Context: ...inative codebooks. Lazebnik and Raginsky [11] incorporate discriminative information by minimizing the loss of mutual information between features and labels during the quantization step. Winn et al. [30] prune a large codebook iteratively by fusing codewords that do not contribute to discrimination. However these methods are optimized for vector quantization. Mairal et al. [20] have proposed an algor...
289 | Object recognition with features inspired by visual cortex - Serre, Wolf, et al. - 2005
Citation Context: ...o various architectures. For example, average pooling is found in convolutional nets [13], bag-of-features methods, and HOG descriptors; max pooling is found in convolutional nets [16, 23], HMAX nets [24], and state-of-the-art variants of the spatial pyramid model [31]. The final global vector is formed by concatenating with suitable weights the semi-local vectors obtained for each pooling region. Hig...
275 | Multiple kernels for object detection - Vedaldi, Gulshan, et al. - 2009
259 | On feature combination for multiclass object classification - Gehler, Nowozin - 2009
Citation Context: ...riptors on the same dataset are shown on Table 2. Note that better performance has been reported with multiple descriptor types (e.g., methods using multiple kernel learning have achieved 77.7% ± 0.3 [7] and 78.0% ± 0.3 [28, 2] on Caltech-101 with 30 training examples), or subcategory learning (83% on Caltech-101 [26]). The coding and pooling module combinations used in [27, 31] are included in our co...
252 | What is the best multi-stage architecture for object recognition? - Jarrett, Kavukcuoglu, et al. - 2009
Citation Context: ...ion is that the superiority of max pooling over average pooling generalizes to many combinations of coding schemes and classifiers. Several authors have already stressed the efficiency of max pooling [10, 31], but they have not given theoretical explanations to their findings. In this section, we study max pooling in more detail theoretically and experimentally. 5.1. A Theoretical Comparison of Pooling Strat...
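One intuition behind the max-vs-average comparison discussed in this snippet can be shown with a toy example (the activation values below are invented, and this is not the paper's formal analysis): the max-pooled response of a codeword is unchanged by adding inactive clutter patches, while the average-pooled response is diluted by them.

```python
import numpy as np

# Activations of one codeword on three "foreground" patches of an image.
act = np.array([1.2, 0.8, 1.0])

for n_clutter in (10, 100, 1000):
    # Append n_clutter inactive (zero) clutter patches to the pooling region.
    z = np.concatenate([act, np.zeros(n_clutter)])
    print(n_clutter, round(float(z.mean()), 4), float(z.max()))
```

As the pooling region grows, the average shrinks toward zero while the max stays at 1.2, which is one reason max pooling can be more robust for sparse features.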
244 | Online dictionary learning for sparse coding - Mairal, Bach, et al. - 2009
Citation Context: ...dictionary trained by minimizing the average of L(αi, D) over all samples, alternatively over D and the αi. It is well known that the ℓ1 penalty induces sparsity and makes the problem tractable (e.g., [15, 19]). 3. Systematic Evaluation of Unsupervised Mid-Level Features This section offers comprehensive comparisons of unsupervised coding schemes. In all experiments, we use the Method Caltech-101, 30 trainin...
215 | Efficient backprop - LeCun, Bottou, et al. - 1998
Citation Context: ... it is constant, we can compute the gradient: ∂αk/∂Dij = bi Ajk − αj Cik (18), A ≜ (Dα^T Dα)^(−1) (19), b ≜ x − Dα (20), C ≜ A Dα^T (21). We train the discriminative dictionary by stochastic gradient descent [4, 14]. Recomputing the sparse decompositions αi at each location of a training image at each iteration is computationally costly. To simplify the computation while remaining closer to global image statisti...
193 | Supervised dictionary learning - Mairal, Ponce, et al. - 2009
Citation Context: ...tion step. Winn et al. [30] prune a large codebook iteratively by fusing codewords that do not contribute to discrimination. However these methods are optimized for vector quantization. Mairal et al. [20] have proposed an algorithm to train discriminative dictionaries for sparse coding, but it requires each encoded vector to be labelled. Instead, the approach we propose is adapted to global image stat...
174 | Discriminative learning of local image descriptors - Brown, Hua, et al.
Citation Context: ... networks [13] or deep belief networks [8, 23]. Extracting these mid-level features involves the same sequence of interchangeable modules as identified by Winder and Brown for local image descriptors [29]. In this paper, we focus on two types of modules: • Coding: Input features are locally transformed into representations that have some desirable properties such as compactness, sparseness (i.e., most...
130 | Sparse feature learning for deep belief networks - Boureau, LeCun - 2007
Citation Context: ...48. them as mid-level features. Popular examples of mid-level features include bags of features [25], spatial pyramids [12], and the upper units of convolutional networks [13] or deep belief networks [8, 23]. Extracting these mid-level features involves the same sequence of interchangeable modules as identified by Winder and Brown for local image descriptors [29]. In this paper, we focus on two types of ...
103 | Fast image search for learned metrics - Jain, Kulis, et al. - 2008
Citation Context: ...y and macrofeatures giving the best performance for sparse coding. Method Caltech 15 tr. Caltech 30 tr. Scenes Boiman et al. [3] Nearest neighbor + spatial correspondence 65.0 ± 1.1 70.4 - Jain et al. [9] Fast image search for learned metrics 61.0 69.6 - Lazebnik et al. [12] (1) SP + hard quantization + kernel SVM 56.4 64.4 ± 0.8 81.4 ± 0.5 van Gemert et al. [27] (2) SP + soft quantization + kernel SV...
82 | Online algorithms and stochastic approximations - Bottou - 1998
Citation Context: ... it is constant, we can compute the gradient: ∂αk/∂Dij = bi Ajk − αj Cik (18), A ≜ (Dα^T Dα)^(−1) (19), b ≜ x − Dα (20), C ≜ A Dα^T (21). We train the discriminative dictionary by stochastic gradient descent [4, 14]. Recomputing the sparse decompositions αi at each location of a training image at each iteration is computationally costly. To simplify the computation while remaining closer to global image statisti...
71 | Supervised learning of quantizer codebooks by information loss minimization - Lazebnik, Raginsky - 2009
Citation Context: ...e classification task. In this section, we introduce a novel supervised method to learn the dictionary. Several authors have proposed methods to obtain discriminative codebooks. Lazebnik and Raginsky [11] incorporate discriminative information by minimizing the loss of mutual information between features and labels during the quantization step. Winn et al. [30] prune a large codebook iteratively by fu...
35 | Learning subcategory relevances for category recognition - Todorovic, Ahuja - 2008
Citation Context: ...criptor types (e.g., methods using multiple kernel learning have achieved 77.7% ± 0.3 [7] and 78.0% ± 0.3 [28, 2] on Caltech-101 with 30 training examples), or subcategory learning (83% on Caltech-101 [26]). The coding and pooling module combinations used in [27, 31] are included in our comparative evaluation (bold numbers in parentheses on Tables 1 and 2). Overall, our results confirm the experimental...
5 | A novel Gaussianized vector representation for natural scene categorization - Zhou, Zhuang, et al. - 2008
Citation Context: ... pooling + linear SVM 67.0 ± 0.5 73.2 ± 0.5 80.3 ± 0.9 Yang et al. [31] (4) SP + sparse codes + max pooling + kernel SVM 60.4 ± 1.0 − 77.7 ± 0.7 Zhang et al. [32] kNN-SVM 59.1 ± 0.6 66.2 ± 0.5 - Zhou et al. [33] SP + Gaussian mixture − − 84.1 ± 0.5 Table 2. Results obtained by several recognition schemes using a single type of descriptors. Bold numbers in parentheses preceding the method description indicate ...