## Tiled convolutional neural networks (2010)

Venue: In NIPS, in press

Citations: 24 - 6 self

### BibTeX

@INPROCEEDINGS{Le10tiledconvolutional,

author = {Quoc V. Le and Jiquan Ngiam and Zhenghao Chen and Daniel Chia and Pang Wei Koh and Andrew Y. Ng},

title = {Tiled convolutional neural networks},

booktitle = {In NIPS, in press},

year = {2010}

}

### Abstract

Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture. In this paper, we consider the problem of learning invariances, rather than relying on hard-coding. We propose tiled convolutional neural networks (Tiled CNNs), which use a regular “tiled” pattern of tied weights that does not require adjacent hidden units to share identical weights, but instead requires only that hidden units k steps away from each other have tied weights. By pooling over neighboring units, this architecture is able to learn complex invariances (such as scale and rotational invariance) beyond translational invariance. Further, it also enjoys much of CNNs’ advantage of having a relatively small number of learned parameters (such as ease of learning and greater scalability). We provide an efficient learning algorithm for Tiled CNNs based on Topographic ICA, and show that learning complex invariant features allows us to achieve highly competitive results for both the NORB and CIFAR-10 datasets.
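The weight-tying pattern described in the abstract can be illustrated with a minimal 1D sketch. This is not the authors' implementation: the function name `tiled_conv1d`, the 1D setting, stride 1, and the absence of padding are assumptions made for illustration. Hidden unit i uses filter i mod k, so units k steps apart share weights, and k = 1 reduces to an ordinary convolutional layer.

```python
import numpy as np

def tiled_conv1d(x, filters):
    """1D tiled convolution sketch (stride 1, no padding).

    x       : 1D input array of length n.
    filters : array of shape (k, width); a hypothetical layout chosen here.
    Hidden unit i uses filters[i % k], so units k steps apart are tied.
    """
    x = np.asarray(x, dtype=float)
    filters = np.asarray(filters, dtype=float)
    k, width = filters.shape
    n_out = x.shape[0] - width + 1
    out = np.empty(n_out)
    for i in range(n_out):
        # Each unit sees only a local window (local receptive field).
        out[i] = filters[i % k] @ x[i:i + width]
    return out

# Example: k = 2 untied filters of width 3 over a length-8 input.
x = np.arange(8.0)
filters = np.array([[1.0, 0.0, -1.0],
                    [0.5, 0.5, 0.5]])
h = tiled_conv1d(x, filters)
```

Even-indexed units in `h` share the first filter and odd-indexed units share the second, so the parameter count grows with k rather than with the input length.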

### Citations

1697 | Independent Component Analysis
- Hyvärinen, Karhunen, et al.
- 2001
Citation Context ... to learn these invariances from unlabeled data, we employ unsupervised pretraining, which has been shown to help performance [5, 6, 7]. In particular, we use a modification of Topographic ICA (TICA) [8], which learns to organize features in a topographical map by pooling together groups [Figure 1: Left: Convolutional Neural Networks with local receptive fields and tied weights. Right: Partially unt...]

997 | Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature
- Olshausen, Field
- 1996
Citation Context ...he learned representation consists of multiple feature maps (Figure 1-Right). This corresponds to training TICA with an overcomplete representation (m > n). When learning overcomplete representations [14], the orthogonality constraint cannot be satisfied exactly, and we instead try to satisfy an approximate orthogonality constraint [15]. Unfortunately, these approximate orthogonality constraints are c...

619 | Liblinear: A library for large linear classification
- Fan, Chang, et al.
- 2008
Citation Context ...hting conditions, and we trained our linear classifier only on data with elevations {2, 4, 6}, azimuths {10, 18, 24} and [Figure 5: Test set accuracy on full and limited training sets] 7 We used an SVM [22] as the linear classifier and determined C by cross-validation over {10^−4, 10^−3, ..., 10^4}. Models were trained with various untied map sizes k ∈ {1, 2, 9, 16, 25} and number of maps l ∈ {4, ...
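The cross-validation grid for C quoted in this context can be written out as a short sketch. Only the grid itself comes from the excerpt; the helper `select_C` and the `validation_score` callable are hypothetical names, and the toy score below merely stands in for training and evaluating a linear SVM at each candidate value.

```python
import math

# Candidate values C = 10^-4, 10^-3, ..., 10^4 (nine values).
C_GRID = [10.0 ** e for e in range(-4, 5)]

def select_C(validation_score, grid=C_GRID):
    """Return the C in `grid` with the highest validation score.

    `validation_score` is a hypothetical callable C -> float, e.g. the mean
    cross-validated accuracy of a linear SVM trained with that C.
    """
    return max(grid, key=validation_score)

# Toy stand-in score that peaks at C = 10; a real run would train and
# evaluate a classifier for each candidate instead.
best_C = select_C(lambda C: -(math.log10(C) - 1.0) ** 2)
```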

508 | A fast learning algorithm for deep belief nets
- Hinton, Osindero, et al.
Citation Context ...e special case of k = 1 corresponding to convolutional networks). In order to learn these invariances from unlabeled data, we employ unsupervised pretraining, which has been shown to help performance [5, 6, 7]. In particular, we use a modification of Topographic ICA (TICA) [8], which learns to organize features in a topographical map by pooling together groups [Figure 1: Left: Convolutional Neural Network...]

384 | Reducing the dimensionality of data with neural networks
- Hinton, Salakhutdinov
- 2006
Citation Context ...that Tiled CNNs learned purely on unsupervised data compare favorably to many state-of-the-art algorithms on NORB. 6.2.2 Supervised finetuning of W Next, we study the effects of supervised finetuning [23] on the models produced by the unsupervised pretraining phase. Supervised finetuning takes place after unsupervised pretraining, but before the supervised training of the classifier. Using softmax reg...

264 | 80 million tiny images: A large dataset for non-parametric object and scene recognition
- Torralba, Fergus, et al.
Citation Context ...tely, the Tiled CNN only requires unlabeled data for training, which can be obtained cheaply. Our preliminary results on networks pretrained using 250000 unlabeled images from the Tiny images dataset [30] show that performance increases as k goes from 1 to 3, flattening out at k = 4. This suggests that when there is sufficient data to avoid overfitting, setting k = p can be a very good choice. In this...

208 | Self-taught learning: Transfer learning from unlabeled data
- Raina, Battle, et al.
- 2007
Citation Context ...e special case of k = 1 corresponding to convolutional networks). In order to learn these invariances from unlabeled data, we employ unsupervised pretraining, which has been shown to help performance [5, 6, 7]. In particular, we use a modification of Topographic ICA (TICA) [8], which learns to organize features in a topographical map by pooling together groups [Figure 1: Left: Convolutional Neural Network...]

205 | Greedy layer-wise training of deep networks
- Bengio, Lamblin, et al.
- 2007
Citation Context ... parameters. This is because TICA’s tractable objective function allows us to monitor convergence easily. In contrast, other unsupervised feature learning algorithms such as RBMs [6] and autoencoders [18] require much more parameter tuning, especially during optimization. 6 Experiments 6.1 Speed-up We first establish that the local receptive fields intrinsic to Tiled CNNs allows us to implement TICA l...

174 | Learning methods for generic object recognition with invariance to pose and lighting
- LeCun, Huang, et al.
- 2004
Citation Context ...oduction Convolutional neural networks (CNNs) [1] have been successfully applied to many recognition tasks. These tasks include digit recognition (MNIST dataset [2]), object recognition (NORB dataset [3]), and natural language processing [4]. CNNs take translated versions of the same basis function, and “pool” over them to build translational invariant features. By sharing the same basis function acr...

172 | Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
- Lee, Grosse, et al.
- 2009
Citation Context ... rotation. We find that this improves classification performance, enabling Tiled CNNs to be competitive with previously published results on the NORB [3] and CIFAR-10 [10] datasets. 2 Tiled CNNs CNNs [1, 11] are based on two key concepts: local receptive fields, and weight-tying. Using local receptive fields means that each unit in the network only “looks” at a small, localized region of the input image....

133 | A unified architecture for natural language processing: Deep neural networks with multitask learning
- Collobert, Weston
- 2008
Citation Context ... (CNNs) [1] have been successfully applied to many recognition tasks. These tasks include digit recognition (MNIST dataset [2]), object recognition (NORB dataset [3]), and natural language processing [4]. CNNs take translated versions of the same basis function, and “pool” over them to build translational invariant features. By sharing the same basis function across different image locations (weight-...

126 | Best practices for convolutional neural networks applied to visual document analysis
- Simard, Steinkraus, et al.
- 2003
Citation Context ... the NORB and CIFAR-10 datasets. 1 Introduction Convolutional neural networks (CNNs) [1] have been successfully applied to many recognition tasks. These tasks include digit recognition (MNIST dataset [2]), object recognition (NORB dataset [3]), and natural language processing [4]. CNNs take translated versions of the same basis function, and “pool” over them to build translational invariant features....

122 | What is the best multi-stage architecture for object recognition
- Jarrett, Kavukcuoglu, et al.
- 2009
Citation Context ...small number of learnable parameters. [Figure 2: Left: TICA network architecture. Right: TICA first layer filters (2D topography, 25 rows of W).] Unfortunately, existing methods for pretraining CNNs [11, 12] are not suitable for untied weights; for example, the CDBN algorithm [11] breaks down without the weight-tying constraints. In the following sections, we discuss a pretraining method for Tiled CNNs b...

116 | Learning multiple layers of features from tiny images
- Krizhevsky
- 2009
Citation Context ...at are robust to both scaling and rotation. We find that this improves classification performance, enabling Tiled CNNs to be competitive with previously published results on the NORB [3] and CIFAR-10 [10] datasets. 2 Tiled CNNs CNNs [1, 11] are based on two key concepts: local receptive fields, and weight-tying. Using local receptive fields means that each unit in the network only “looks” at a small, ...

85 | Scaling learning algorithms towards AI
- Bengio, LeCun
- 2007
Citation Context [Table: ...NNs (without finetuning) (Section 6.2.1): 94.5%; Standard TICA (10x overcomplete): 89.6%; Convolutional Neural Networks [19], [12]: 94.1%, 94.4%; 3D Deep Belief Networks [19]: 93.5%; Support Vector Machines [20]: 88.4%; Deep Boltzmann Machines [21]: 92.8%] with which to learn the weights W of the Tiled CNN. We call this initial phase the unsupervised pretraining phase. After learning a feature representation fr...

78 | Learning deep architectures for AI
- Bengio
- 2009
Citation Context ...put (RGB). 6.3.2 Deep Tiled CNNs We additionally investigate the possibility of training a deep Tiled CNN in a greedy layer-wise fashion, similar to models such as DBNs [6] and stacked autoencoders [26, 18]. We constructed this network by stacking two Tiled CNNs, each with 10 maps and k = 2. The resulting four-layer network has the structure W1 → V1 → W2 → V2, where the weights W1 are local receptive fi...

74 | Learning invariant features through topographic filter maps
- Kavukcuoglu, Ranzato, et al.
- 2009
Citation Context ...altogether with topographic sparse coding, those models are also expensive as they require further work either for inference at prediction time [9, 14] or for learning a decoder unit at training time [17]. We can avoid approximate orthogonalization by using local receptive fields, which are inherently built into Tiled CNNs. With these, the weight matrix W for each simple unit is constrained to be 0 ou...

49 | Why does unsupervised pre-training help deep learning
- Erhan, Bengio, et al.
Citation Context ...e special case of k = 1 corresponding to convolutional networks). In order to learn these invariances from unlabeled data, we employ unsupervised pretraining, which has been shown to help performance [5, 6, 7]. In particular, we use a modification of Topographic ICA (TICA) [8], which learns to organize features in a topographical map by pooling together groups [Figure 1: Left: Convolutional Neural Network...]

46 | Slow feature analysis yields a rich repertoire of complex cell properties
- Berkes, Wiskott
Citation Context ...st results for both the NORB and CIFAR-10 datasets, even with deep networks. More importantly, untying weights allow the networks to learn more complex invariances from unlabeled data. By visualizing [28, 29] the range of optimal stimulus that activate each pooling unit in a Tiled CNN, we found units that were scale and rotationally invariant. 9 We note that a standard CNN is unlikely to be invariant to t...

41 | Wavelets and natural image statistics
- Hurri, Hyvarinen, et al.
- 1997
Citation Context [...s. (Network diagrams in the paper are shown in 1D for clarity.)] of related features. By pooling together local groups of features, it produces representations that are robust to local transformations [9]. We show in this paper how TICA can be efficiently used to pretrain Tiled CNNs through the use of local orthogonality. The resulting Tiled CNNs pretrained with TICA are indeed able to learn invariant...

36 | 3D Object Recognition with Deep Belief Nets
- Nair, Hinton
- 2009
Citation Context [Table: ...n NORB. Algorithm, Accuracy: Tiled CNNs (with finetuning) (Section 6.2.2): 96.1%; Tiled CNNs (without finetuning) (Section 6.2.1): 94.5%; Standard TICA (10x overcomplete): 89.6%; Convolutional Neural Networks [19], [12]: 94.1%, 94.4%; 3D Deep Belief Networks [19]: 93.5%; Support Vector Machines [20]: 88.4%; Deep Boltzmann Machines [21]: 92.8%] with which to learn the weights W of the Tiled CNN. We call this initial ...

33 | Gradient-based learning applied to document recognition
- LeCun, Bottou, et al.
- 1998
Citation Context ...hic ICA, and show that learning complex invariant features allows us to achieve highly competitive results for both the NORB and CIFAR-10 datasets. 1 Introduction Convolutional neural networks (CNNs) [1] have been successfully applied to many recognition tasks. These tasks include digit recognition (MNIST dataset [2]), object recognition (NORB dataset [3]), and natural language processing [4]. CNNs t...

31 | Measuring invariances in deep networks
- Goodfellow, Le, et al.
- 2009
Citation Context ...ing algorithm, which promotes selectivity by optimizing for sparsity. This combination of robustness and selectivity is central to feature invariance, which is in turn essential for recognition tasks [13]. If we choose square and square-root activations for the simple and pooling units in the Tiled CNN, we can view the Tiled CNN as a special case of a TICA network, with the topography of the pooling u...
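The square and square-root activations mentioned in this context can be sketched as a two-stage forward pass. The sketch assumes a simple-unit weight matrix `W` and a fixed, non-negative pooling matrix `V` whose rows select the simple units that each pooling unit sums over; both names, the function `tica_forward`, and the toy sizes are illustrative assumptions rather than details taken from the excerpt.

```python
import numpy as np

def tica_forward(x, W, V):
    """Two-stage forward pass with square and square-root activations.

    x : input vector, shape (n,).
    W : simple-unit weights, shape (m, n) (assumed layout).
    V : fixed non-negative pooling matrix, shape (p, m) (assumed layout).
    """
    simple = (W @ x) ** 2          # simple units: squared linear response
    return np.sqrt(V @ simple)     # pooling units: sqrt of pooled energy

# Toy example: four simple units pooled in two adjacent pairs.
W = np.eye(4)
V = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
x = np.array([3.0, 4.0, 0.0, 5.0])
p = tica_forward(x, W, V)
```

Because the simple units are squared before pooling, the pooled output is unchanged when the sign of the input is flipped, which is one simple instance of the robustness that pooling provides.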

29 | Estimation of non-normalized statistical models using score matching
- Hyvärinen
- 2005
Citation Context ...tely, these approximate orthogonality constraints are computationally expensive and have hyperparameters which need to be extensively tuned. Much of this tuning can be avoided by using score matching [16], but this is computationally even more expensive, and while orthogonalization can be avoided altogether with topographic sparse coding, those models are also expensive as they require further work ei...

27 | On random weights and unsupervised feature learning
- Saxe, Koh, et al.
Citation Context ...size 4x4, and W2 is of size 3x3, i.e., each unit in the third layer “looks” at a 3x3 window of each of the 10 maps in the first layer. These parameters were chosen by an efficient architecture search [27] on the hold-out validation set. The number of maps in the third and fourth layer is also 10. After finetuning, we found that the deep model outperformed all previous models on the validation set, and...

25 | Modeling Pixel Means and Covariances Using Factorized Third-Order Boltzmann Machines
- Ranzato, Hinton
Citation Context [Table: ...g) [10]: 64.8%; RBM (two layers, 10000 units, finetuning both layers) [10]: 60.3%; RBM (two layers, 10000 units, finetuning top layer) [10]: 62.2%; mcRBM (convolutional, trained on two million tiny images) [24]: 71.0%; Local Coordinate Coding (LCC) [25]: 72.3%; Improved Local Coordinate Coding (Improved LCC) [25]: 74.5%] The CIFAR-10 dataset contains 50000 training images and 10000 test images drawn from 10 categ...

24 | Improved local coordinate coding using local tangents
- Yu, Zhang
Citation Context [Table: ...s, finetuning both layers) [10]: 60.3%; RBM (two layers, 10000 units, finetuning top layer) [10]: 62.2%; mcRBM (convolutional, trained on two million tiny images) [24]: 71.0%; Local Coordinate Coding (LCC) [25]: 72.3%; Improved Local Coordinate Coding (Improved LCC) [25]: 74.5%] The CIFAR-10 dataset contains 50000 training images and 10000 test images drawn from 10 categories. 8 A summary of results for is repo...

18 | Efficient learning of deep Boltzmann machines
- Salakhutdinov, Larochelle
- 2010
Citation Context [Table: ....2.1): 94.5%; Standard TICA (10x overcomplete): 89.6%; Convolutional Neural Networks [19], [12]: 94.1%, 94.4%; 3D Deep Belief Networks [19]: 93.5%; Support Vector Machines [20]: 88.4%; Deep Boltzmann Machines [21]: 92.8%] with which to learn the weights W of the Tiled CNN. We call this initial phase the unsupervised pretraining phase. After learning a feature representation from the unlabeled data, we train a l...

9 | Emergence of complex-like cells in a temporal product network with local receptive fields
- Gregor, LeCun
- 2010

4 | Visualizing higher-layer features of a deep network
- Erhan, Bengio, et al.
- 2009
Citation Context ...st results for both the NORB and CIFAR-10 datasets, even with deep networks. More importantly, untying weights allow the networks to learn more complex invariances from unlabeled data. By visualizing [28, 29] the range of optimal stimulus that activate each pooling unit in a Tiled CNN, we found units that were scale and rotationally invariant. 9 We note that a standard CNN is unlikely to be invariant to t...