Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
- International Journal of Computer Vision, 2001
Cited by 351 (41 self)
In this paper, we propose a computational model of the recognition of real world scenes that bypasses the segmentation and the processing of individual objects or regions. The procedure is based on a very low dimensional representation of the scene, that we term the Spatial Envelope. We propose a set of perceptual dimensions (naturalness, openness, roughness, expansion, ruggedness) that represent the dominant spatial structure of a scene. Then, we show that these dimensions may be reliably estimated using spectral and coarsely localized information. The model generates a multidimensional space in which scenes sharing membership in semantic categories (e.g., streets, highways, coasts) are projected close together. The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
Classifying Facial Actions
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999
Cited by 201 (18 self)
The Facial Action Coding System (FACS) [23] is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding is presently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facial actions in sequences of images. These techniques include analysis of facial motion through estimation of optical flow; holistic spatial analysis, such as principal component analysis, independent component analysis, local feature analysis, and linear discriminant analysis; and methods based on the outputs of local filters, such as Gabor wavelet representations and local principal components. Performance of these systems is compared to naive and expert human subjects. Best performances were obtained using the Gabor wavelet representation and the independent component representation, both of which achieved 96 percent accuracy for classifying 12 facial actions of the upper and lower face. The results provide converging evidence for the importance of using local filters, high spatial frequencies, and statistical independence for classifying facial actions.
Face recognition by independent component analysis
- IEEE Transactions on Neural Networks, 2002
Cited by 133 (3 self)
A number of current face recognition algorithms use face representations found by unsupervised statistical methods. Typically these methods find a set of basis images and represent faces as a linear combination of those images. Principal component analysis (PCA) is a popular example of such methods. The basis images found by PCA depend only on pairwise relationships between pixels in the image database. In a task such as face recognition, in which important information may be contained in the high-order relationships among pixels, it seems reasonable to expect that better basis images may be found by methods sensitive to these high-order statistics. Independent component analysis (ICA), a generalization of PCA, is one such method. We used a version of ICA derived from the principle of optimal information transfer through sigmoidal neurons. ICA was performed on face images in the FERET database under two different architectures, one which treated the images as random variables and the pixels as outcomes, and a second which treated the pixels as random variables and the images as outcomes. The first architecture found spatially local basis images for the faces. The second architecture produced a factorial face code. Both ICA representations were superior to representations based on PCA for recognizing faces across days and changes in expression. A classifier that combined the two ICA representations gave the best performance. Index Terms: eigenfaces, face recognition, independent component analysis (ICA), principal component analysis (PCA), unsupervised learning.
Non Linear Neurons in the Low Noise Limit: A Factorial Code Maximizes Information Transfer
- 1994
Cited by 130 (17 self)
We investigate the consequences of maximizing information transfer in a simple neural network (one input layer, one output layer), focussing on the case of non linear transfer functions. We assume that both receptive fields (synaptic efficacies) and transfer functions can be adapted to the environment. The main result is that, for bounded and invertible transfer functions, in the case of a vanishing additive output noise, and no input noise, maximization of information (Linsker's infomax principle) leads to a factorial code, and hence to the same solution as required by the redundancy reduction principle of Barlow. We show also that this result is valid for linear, more generally unbounded, transfer functions, provided optimization is performed under an additive constraint, that is, one which can be written as a sum of terms, each one being specific to one output neuron. Finally we study the effect of a non zero input noise. We find that, at first order in the input noise, assumed to be small ...
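The main result stated in this abstract can be motivated with a standard information-theoretic identity. The following is only a sketch under the abstract's assumptions (bounded, invertible transfer functions; vanishing additive output noise; no input noise). The symbols X (input), Y (output vector), and Y_i (output of neuron i) are notation introduced here, not taken from the paper.

```latex
\begin{align}
I(X;Y) &= H(Y) - H(Y \mid X), \\
H(Y)   &= \sum_i H(Y_i) \;-\; D_{\mathrm{KL}}\!\Big(p(y)\,\Big\|\,\prod_i p(y_i)\Big)
        \;\le\; \sum_i H(Y_i).
\end{align}
```

With no input noise, H(Y | X) is set by the output noise alone and does not depend on the network parameters, so maximizing I(X;Y) amounts to maximizing H(Y). Each H(Y_i) is bounded because the outputs are bounded, and the upper bound on H(Y) is reached only when the Kullback-Leibler term vanishes, that is, when p(y) factorizes into the product of its marginals: a factorial code.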
Point-to-point connectivity between neuromorphic chips using address-events
- IEEE Trans. Circuits Syst. II, 2000
Cited by 65 (15 self)
I discuss connectivity between neuromorphic chips, which use the timing of fixed-height, fixed-width pulses to encode information. Address-events, log2(N)-bit packets that uniquely identify one of N neurons, are used to transmit these pulses in real time on a random-access, time-multiplexed communication channel. Activity is assumed to consist of neuronal ensembles: spikes clustered in space and in time. I quantify tradeoffs faced in allocating bandwidth, granting access, and queuing, as well as throughput requirements, and conclude that an arbitered channel design is the best choice. I implement the arbitered channel with a formal design methodology for asynchronous digital VLSI CMOS systems, after introducing the reader to this top-down synthesis technique. Following the evolution of three generations of designs, I show how the overhead of arbitrating, and encoding and decoding, can be reduced in area (from N to √N) by organizing neurons into rows and columns, and reduced in time (from log2(N) to 2) by exploiting locality in the arbiter tree and in the row–column architecture, and clustered activity. Throughput is boosted by pipelining and by reading spikes in parallel. Simple techniques that reduce crosstalk in these mixed analog–digital systems are described.
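The addressing arithmetic behind address-events can be illustrated with a small Python sketch. It only shows how a neuron index might be packed into a log2(N)-bit word split into row and column fields, and how spikes from many neurons could be serialized onto one channel. The function names, the 64 x 64 array size, and the bit layout are assumptions made for illustration; the arbiter, the asynchronous circuits, and the pipelining described in the abstract are not modeled.

```python
def address_bits(n: int) -> int:
    """Bits needed to give each of n items a unique binary address."""
    return max(1, (n - 1).bit_length())


def encode_event(neuron: int, n_cols: int) -> int:
    """Pack a neuron index into a (row, column) address word."""
    row, col = divmod(neuron, n_cols)
    return (row << address_bits(n_cols)) | col


def decode_event(word: int, n_cols: int) -> int:
    """Recover the neuron index from a (row, column) address word."""
    col_bits = address_bits(n_cols)
    row = word >> col_bits
    col = word & ((1 << col_bits) - 1)
    return row * n_cols + col


def multiplex(spikes, n_cols):
    """Serialize (time, neuron) spikes from many neurons onto one channel.

    Events are emitted in time order and only the address is sent: in a
    real-time channel the moment the word appears stands for the spike time.
    """
    return [encode_event(neuron, n_cols) for _, neuron in sorted(spikes)]


if __name__ == "__main__":
    n_neurons, n_cols = 4096, 64      # hypothetical 64 x 64 neuron array
    print("address width:", address_bits(n_neurons))
    spikes = [(0.3, 130), (0.1, 5), (0.2, 4095)]   # made-up (time, neuron) pairs
    words = multiplex(spikes, n_cols)
    print("decoded order:", [decode_event(w, n_cols) for w in words])
```

Splitting the address into row and column fields means that each of the two encoders only has to serve about √N lines rather than N, which is the intuition behind the area reduction mentioned in the abstract.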
Origins of Scaling in Natural Images
- 1997
Cited by 64 (2 self)
One of the most robust qualities of our visual world is the scale invariance of natural images. Not only has scaling been found in different visual environments, but the phenomenon also appears to be calibration independent. This paper proposes a simple property of natural images which explains this robustness: They are collages of regions corresponding to statistically independent "objects". Evidence is provided for these objects having a power-law distribution of sizes within images, from which follows scaling in natural images. It is commonly suggested that scaling instead results from edges, each with power spectrum 1/k². This hypothesis is refuted by example.
Responses of Neurons in Primary and Inferior Temporal Visual Cortices to Natural Scenes
- 1997
Cited by 60 (5 self)
It has been suggested that visual representations are optimised to transmit the maximum information about the images encountered in everyday life (Uttley, 1973; Linsker, 1988; Barlow, 1989). This simple assumption has proven sufficient to account for the characteristics of large monopolar cells in the fly (Srinivasan et al., 1982; Hateren, 1992; Laughlin, 1981), the temporal characteristics of retinal ganglion cells (Dong & Atick, 1995), human spatial frequency thresholds (Atick & Redlich, 1992; Van Hateren, 1993), and the psychophysics of orientation perception for short presentation times (Baddeley & Hancock, 1991). Maximisation of information is a powerful theoretical principle that leads to testable predictions about the firing patterns of neurons. However, to generate specific predictions we must make some assumptions about the nature of the neural code and the type of constraint that limits its information carrying capacity. To appl
Statistical edge detection: learning and evaluating edge cues
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 44 (4 self)
We formulate edge detection as statistical inference. This statistical edge detection is data driven, unlike standard methods for edge detection which are model based. For any set of edge detection filters (implementing local edge cues) we use pre-segmented images to learn the probability distributions of filter responses conditioned on whether they are evaluated on or off an edge. Edge detection is formulated as a discrimination task specified by a likelihood ratio test on the filter responses. This approach emphasizes the necessity of modeling the image background (the off-edges). We represent the conditional probability distributions non-parametrically and learn them on two different datasets of 100 (Sowerby) and 50 (South Florida) images. Multiple edge cues, including chrominance and multiple-scale, are combined by using their joint distributions. Hence this cue combination is optimal in the statistical sense. We evaluate the effectiveness of different visual cues using the Chernoff information and Receiver Operator Characteristic (ROC) curves. This shows that our approach gives quantitatively better results than the Canny edge detector when the image background contains significant clutter. In addition, it enables us to determine the effectiveness of different edge cues and gives quantitative measures for the advantages of multi-level processing, for the use of chrominance, and for the relative effectiveness of different detectors. Furthermore, we show that we can learn these conditional distributions on one dataset and adapt them to the other with only slight degradation of performance without knowing the ground truth on the second dataset. This shows that our results are not purely domain specific. We apply the same approach to the spatial grouping of edge cues and obtain analogies to non-maximal suppression and hysteresis.
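The likelihood ratio test described in this abstract can be illustrated with a minimal Python sketch. It assumes a single scalar filter response per pixel, fixed-width histogram bins as the non-parametric model, and a small smoothing constant to avoid empty bins; these choices, the function names, and the toy numbers are illustrative assumptions, not the authors' implementation or data.

```python
import math


def bin_index(r, n_bins, lo, hi):
    """Map a filter response r in [lo, hi] to a histogram bin."""
    t = (r - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(t * n_bins)))


def learn_histogram(responses, n_bins, lo, hi, eps=1e-3):
    """Smoothed histogram estimate of P(bin) from labeled filter responses."""
    counts = [0] * n_bins
    for r in responses:
        counts[bin_index(r, n_bins, lo, hi)] += 1
    total = len(responses) + eps * n_bins
    return [(c + eps) / total for c in counts]


def log_likelihood_ratio(r, p_on, p_off, lo, hi):
    """log P(r | on-edge) / P(r | off-edge) for one filter response."""
    b = bin_index(r, len(p_on), lo, hi)
    return math.log(p_on[b] / p_off[b])


def is_edge(r, p_on, p_off, lo, hi, threshold=0.0):
    """Label a pixel as edge when the log-likelihood ratio exceeds a threshold."""
    return log_likelihood_ratio(r, p_on, p_off, lo, hi) > threshold


def chernoff_information(p, q, steps=100):
    """Grid-search estimate of -min over 0 < lam < 1 of log sum_i p_i^lam q_i^(1-lam)."""
    best = 0.0
    for i in range(1, steps):
        lam = i / steps
        s = sum(pi ** lam * qi ** (1.0 - lam) for pi, qi in zip(p, q))
        best = max(best, -math.log(s))
    return best


if __name__ == "__main__":
    # Made-up filter responses standing in for samples from pre-segmented images.
    on_edge = [0.7, 0.8, 0.9, 0.6, 0.85]
    off_edge = [0.1, 0.05, 0.2, 0.15, 0.3, 0.1]
    p_on = learn_histogram(on_edge, 4, 0.0, 1.0)
    p_off = learn_histogram(off_edge, 4, 0.0, 1.0)
    print(is_edge(0.8, p_on, p_off, 0.0, 1.0), is_edge(0.1, p_on, p_off, 0.0, 1.0))
    print(chernoff_information(p_on, p_off))
```

A larger Chernoff information between the on-edge and off-edge distributions indicates a cue that separates the two classes more easily, which is how a measure of this kind can rank edge cues. Combining several cues would replace the one-dimensional histogram with a joint histogram over all filter responses.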
A Multi-Layer Sparse Coding Network Learns Contour Coding From Natural Images
- 2002
Cited by 41 (8 self)
An important approach in visual neuroscience considers how the function of the early visual system relates to the statistics of its natural input. Previous studies have shown how many basic properties of the primary visual cortex, such as the receptive fields of simple and complex cells and the spatial organization (topography) of the cells, can be understood as efficient coding of natural images. Here we extend the framework by considering how the responses of complex cells could be sparsely represented by a higher-order neural layer. This leads to contour coding and end-stopped receptive fields. In addition, contour integration could be interpreted as top-down inference in the presented model.

