Measuring Invariances in Deep Networks
Cited by 17 (6 self)
For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms.
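To make the idea of directly measuring invariance concrete, here is a minimal sketch in Python. It is not the metric proposed in the paper: the function `invariance_score`, the `threshold` parameter, the helper `shift`, and the two toy features are all assumptions made for illustration. The score is simply the fraction of transformed inputs on which a feature stays active, counted only over inputs that activate it in the first place.

```python
def invariance_score(feature, inputs, transforms, threshold=0.5):
    """Fraction of (input, transform) pairs on which the feature stays active.

    Hypothetical, simplified score: only inputs whose response exceeds
    `threshold` are counted. Returns None if no input activates the feature.
    """
    kept, total = 0, 0
    for x in inputs:
        if feature(x) <= threshold:
            continue  # the feature does not fire on this input, skip it
        for transform in transforms:
            total += 1
            if feature(transform(x)) > threshold:
                kept += 1
    return kept / total if total else None


def shift(k):
    """Return a transform that circularly shifts a list right by k positions."""
    def apply(x):
        k_mod = k % len(x)
        return list(x) if k_mod == 0 else x[-k_mod:] + x[:-k_mod]
    return apply


def local_feature(x):
    """Toy feature that looks at a single position."""
    return x[0]


def pooled_feature(x):
    """Toy feature that takes the maximum over three neighbouring positions."""
    return max(x[0:3])


inputs = [[1, 0, 0, 0, 0, 0]]
transforms = [shift(1), shift(2), shift(3)]
local_score = invariance_score(local_feature, inputs, transforms)
pooled_score = invariance_score(pooled_feature, inputs, transforms)
```

In this toy setting the pooled feature keeps firing under more of the shifts than the single-position feature, which is the kind of difference such a score is meant to expose.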
Factored 3-Way Restricted Boltzmann Machines For Modeling Natural Images
2010
Cited by 15 (3 self)
Deep belief nets have been successful in modeling handwritten characters, but it has proved more difficult to apply them to real images. The problem lies in the restricted Boltzmann machine (RBM), which is used as a module for learning deep belief nets one layer at a time. The Gaussian-Binary RBMs that have been used to model real-valued data are not a good way to model the covariance structure of natural images. We propose a factored 3-way RBM that uses the states of its hidden units to represent abnormalities in the local covariance structure of an image. This provides a probabilistic framework for the widely used simple/complex cell architecture. Our model learns binary features that work very well for object recognition on the “tiny images” data set. Even better features are obtained by then using standard binary RBMs to learn a deeper model.
Visual Selective Behavior Can Be Triggered by a Feed-Forward Process
2003
Cited by 13 (1 self)
The ventral visual pathway implements object recognition and categorization in a hierarchy of processing areas with neuronal selectivities of increasing complexity. The presence of massive feedback connections within this hierarchy raises the possibility that normal visual processing relies on the use of computational loops. It is not known, however, whether object recognition can be performed at all without such loops (i.e., in a purely feed-forward mode). By analyzing the time course of reaction times in a masked natural scene categorization paradigm, we show that the human visual system can generate selective motor responses based on a single feed-forward pass. We confirm these results using a more constrained letter discrimination task, in which the rapid succession of a target and mask is actually perceived as a distractor. We show that a masked stimulus presented for only 26 msec (and often not consciously perceived) can fully determine the earliest selective motor responses: the neural representations of the stimulus and mask are thus kept separated during a short period corresponding to the feed-forward "sweep." Therefore, feedback loops do not appear to be "mandatory" for visual processing. Rather, we found that such loops allow the masked stimulus to reverberate in the visual system and affect behavior for nearly 150 msec after the feed-forward sweep.
Shared Weights Neural Networks in Image Analysis
1996
Cited by 11 (6 self)
This thesis is concerned with the use of shared weights neural networks in image analysis. This type of neural network has been extensively described in the literature since 1989. It is believed that networks incorporating shared weights are capable of local, shift-invariant feature extraction due to the restrictions placed on their architecture. The first experiments were focused mainly on the neural network architectures as suggested by, amongst others, Le Cun et al. [LBD+89, LBD+90, LJB+89] and Viennet [Vie93]. These architectures are basically back-propagation neural networks. However, they restrain the number of free parameters and introduce the notion of receptive fields, combining local information into more abstract patterns at a higher level. Three of these networks were tested on the problem of handwritten digit recognition and the results were compared to those of methods based on other feature extraction or classification techniques. As an intermezzo, a second order...
Stacks of Convolutional Restricted Boltzmann Machines for Shift-Invariant Feature Learning
Cited by 11 (0 self)
In this paper we present a method for learning class-specific features for recognition. Recently a greedy layerwise procedure was proposed to initialize weights of deep belief networks, by viewing each layer as a separate Restricted Boltzmann Machine (RBM). We develop the Convolutional RBM (C-RBM), a variant of the RBM model in which weights are shared to respect the spatial structure of images. This framework learns a set of features that can generate the images of a specific object class. Our feature extraction model is a four-layer hierarchy of alternating filtering and maximum subsampling. We learn feature parameters of the first and third layers viewing them as separate C-RBMs. The outputs of our feature extraction hierarchy are then fed as input to a discriminative classifier. It is experimentally demonstrated that the extracted features are effective for object detection, using them to obtain performance comparable to the state-of-the-art on handwritten digit recognition and pedestrian detection.
Parallel Environments for Implementing Neural Networks
Neural Computing Survey, 1997
Cited by 10 (1 self)
As artificial neural networks (ANNs) gain popularity in a variety of application domains, it is critical that these models run fast and generate results in real time. Although a number of implementations of neural networks are available on sequential machines, most of these implementations require an inordinate amount of time to train or run ANNs, especially when the ANN models are large. One approach for speeding up the implementation of ANNs is to implement them on parallel machines. This paper surveys the area of parallel environments for the implementations of ANNs, and prescribes desired characteristics to look for in such implementations.
1 Introduction
Although traditional von Neumann computing has been successful in many applications, it has not proved effective in solving a variety of important complex problems. At the same time, it has been observed that human beings solve these problems routinely in real time. Typical problems that fall into this class consist of perception...
A neuromorphic cortical-layer microchip for spike-based event processing vision systems
IEEE Trans. Circuits Syst. I, Reg. Papers, 2006
Cited by 8 (4 self)
We present a neuromorphic cortical-layer processing microchip for address event representation (AER) spike-based processing systems. The microchip computes 2-D convolutions of video information represented in AER format in real time. AER, as opposed to conventional frame-based video representation, describes visual information as a sequence of events or spikes in a way similar to biological brains. This format allows for fast information identification and processing, without waiting to process complete image frames. The neuromorphic cortical-layer processing microchip presented in this paper computes convolutions of programmable kernels over the AER visual input information flow. It not only computes convolutions but also allows for a programmable forgetting rate, which in turn allows for a bio-inspired coincidence detection processing. Kernels are programmable and can be of arbitrary shape and arbitrary size of up to 32 × 32 pixels. The convolution processor operates on a pixel array of size 32 × 32, but can process an input space of up to 128 × 128 pixels. Larger pixel arrays can be directly processed by tiling arrays of chips. The chip receives and generates data in AER format, which is asynchronous and digital. However, its internal operation is based on analog low-current circuit techniques. The paper describes the architecture of the chip and circuits used for the pixels, including calibration techniques to overcome mismatch. Extensive experimental results are provided, describing pixel operation and calibration, convolution processing with and without forgetting, and high-speed recognition experiments like discriminating rotating propellers of different shape rotating at speeds of up to 5000 revolutions per second.
Index Terms: 2-D convolutions, address-event representation (AER), bio-inspired systems, digitally calibrated analog circuits, high-speed signal processing, MOS transistor mismatch, spike-based processing, subthreshold circuits, vision, VLSI mixed-circuit design.
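The event-driven convolution with forgetting described in this abstract can be illustrated with a small software sketch in Python. This is not a model of the chip's circuits: the class `EventConvolver`, its parameters, the linear leak toward zero, and the threshold-and-reset output rule are all assumptions chosen to show the idea. Each incoming event adds the kernel around its address, the stored state leaks between events, and a pixel emits an output event once its state reaches a threshold.

```python
class EventConvolver:
    """Hypothetical software sketch of event-driven 2-D convolution."""

    def __init__(self, width, height, kernel, threshold, forget_rate):
        self.width, self.height = width, height
        self.kernel = kernel            # 2-D list of weights, odd side lengths
        self.threshold = threshold      # state level that triggers an output
        self.forget_rate = forget_rate  # state units lost per unit of time
        self.state = [[0.0] * width for _ in range(height)]
        self.last_time = 0.0

    def _forget(self, now):
        """Leak every pixel state linearly toward zero (assumed leak shape)."""
        leak = self.forget_rate * (now - self.last_time)
        for row in self.state:
            for i, value in enumerate(row):
                if value > 0:
                    row[i] = max(0.0, value - leak)
                else:
                    row[i] = min(0.0, value + leak)
        self.last_time = now

    def process(self, x, y, t):
        """Handle one input event at (x, y, t); return the output events."""
        self._forget(t)
        k_height, k_width = len(self.kernel), len(self.kernel[0])
        outputs = []
        for dy in range(k_height):
            for dx in range(k_width):
                px = x + dx - k_width // 2
                py = y + dy - k_height // 2
                if 0 <= px < self.width and 0 <= py < self.height:
                    self.state[py][px] += self.kernel[dy][dx]
                    if self.state[py][px] >= self.threshold:
                        self.state[py][px] = 0.0  # reset after firing
                        outputs.append((px, py, t))
        return outputs


# Two events close in time coincide and trigger an output event.
fast = EventConvolver(4, 4, [[1.0]], threshold=1.5, forget_rate=1.0)
fast_first = fast.process(1, 1, 0.0)
fast_second = fast.process(1, 1, 0.1)

# The same two events far apart in time do not, because the state leaks away.
slow = EventConvolver(4, 4, [[1.0]], threshold=1.5, forget_rate=1.0)
slow_first = slow.process(1, 1, 0.0)
slow_second = slow.process(1, 1, 5.0)
```

With a nonzero forgetting rate, only events that arrive close together in time accumulate enough state to fire, which is one simple way to read the coincidence detection mentioned above.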
Rotation, Translation, and Scaling Tolerant Recognition of Complex Shapes Using a Hierarchical Self-Organising Neural Network
1997
Cited by 6 (2 self)
A hierarchical neural network model for the identification of arbitrary contour shapes is presented. Tolerance towards translation, rotation and scaling is achieved far more cost-effectively than for a fully connected multi-layer perceptron.
1 Introduction
The classification of complex shapes is considered to be a difficult pattern recognition task which is further complicated when the objects are subject to translation, rotation and scaling. To date, no general solution has been successfully demonstrated. Many specifically engineered systems have been developed to tackle specific problems [Suetens, 1992]. The Neocognitron [Fukushima and Miyake, 1982, Fukushima et al., 1983] is a well known biologically inspired, hierarchical artificial neural network (ANN) system, designed for numeral and character recognition. The supervised model [Fukushima et al., 1983] requires substantial user interaction during feature selection and training. Hence, it cannot efficiently be applied to large and ...
Perceptual interactions in two-word displays: familiarity and similarity effects
Journal of Experimental Psychology: Human Perception and Performance, 1986
Cited by 6 (0 self)
Previous studies have demonstrated the existence of perceptual interactions in the processing of two-word displays such as SAND LANE. When postcued to report one of the two words, subjects often make migration errors, in that the report of the specified word includes a letter of the other word (e.g., LAND or SANE instead of SAND). We find that migrations depend on the abstract, structural similarity of the strings, but not on the physical similarity; on whether the strings are words; and on whether the possible migration responses are words. We also rule out an interpretation of migration errors that attributes them to a guessing strategy. Our findings are interpreted in terms of models in which both strings simultaneously access high-level structural knowledge, that is, knowledge about what sequences of letters fit together to form familiar wholes.
The role of structure and familiarity in visual perception has usually been studied using displays consisting of a single stimulus object. It is generally observed that perception of the components of these objects is more accurate when the objects are coherent wholes than when they are random unstructured arrays. Furthermore, components of coherent objects are perceived better when they occur in these objects than when they are presented alone. For example, perception of a letter is more accurate when it occurs in a word or pseudoword than when it occurs alone or
A Theoretical Analysis of Feature Pooling in Visual Recognition
27th International Conference on Machine Learning, Haifa, Israel, 2010
Cited by 6 (0 self)
Many modern visual recognition algorithms incorporate a step of spatial ‘pooling’, where the outputs of several nearby feature detectors are combined into a local or global ‘bag of features’, in a way that preserves task-related information while removing irrelevant details. Pooling is used to achieve invariance to image transformations, more compact representations, and better robustness to noise and clutter. Several papers have shown that the details of the pooling operation can greatly influence the performance, but studies have so far been purely empirical. In this paper, we show that the reasons underlying the performance of various pooling methods are obscured by several confounding factors, such as the link between the sample cardinality in a spatial pool and the resolution at which low-level features have been extracted. We provide a detailed theoretical analysis of max pooling and average pooling, and give extensive empirical comparisons for object recognition tasks.
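As a small illustration of the two pooling operations named in this abstract, the following Python sketch pools a one-dimensional list of feature responses over non-overlapping windows. The function `pool`, its `size` and `mode` parameters, and the sample responses are assumptions made for the example. It does not reproduce the paper's analysis.

```python
def pool(values, size, mode="max"):
    """Pool a 1-D list of responses over non-overlapping windows of `size`.

    `mode` is either "max" or "average". The last window may be shorter
    if len(values) is not a multiple of `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if mode not in ("max", "average"):
        raise ValueError("mode must be 'max' or 'average'")
    pooled = []
    for start in range(0, len(values), size):
        window = values[start:start + size]
        if mode == "max":
            pooled.append(max(window))
        else:
            pooled.append(sum(window) / len(window))
    return pooled


responses = [0, 8, 0, 0, 2, 2, 0, 0]
shifted = [0, 0, 8, 0, 0, 2, 2, 0]  # same responses moved one position right

max_pooled = pool(responses, 4, mode="max")
avg_pooled = pool(responses, 4, mode="average")
max_pooled_shifted = pool(shifted, 4, mode="max")
avg_pooled_shifted = pool(shifted, 4, mode="average")
```

In this example the one-position shift keeps every response inside its original window, so both pooled outputs are unchanged. That is the sense in which pooling trades spatial detail for tolerance to small transformations.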

