Results 1 - 10 of 44
A biologically inspired system for action recognition
- In ICCV, 2007
Cited by 71 (4 self)
We present a biologically-motivated system for the recognition of actions from video sequences. The approach builds on recent work on object recognition based on hierarchical feedforward architectures [25, 16, 20] and extends a neurobiological model of motion processing in the visual cortex [10]. The system consists of a hierarchy of spatio-temporal feature detectors of increasing complexity: an input sequence is first analyzed by an array of motion-direction sensitive units which, through a hierarchy of processing stages, lead to position-invariant spatio-temporal feature detectors. We experiment with different types of motion-direction sensitive units as well as different system architectures. As in [16], we find that sparse features in intermediate stages outperform dense ones and that using a simple feature selection approach leads to an efficient system that performs better with far fewer features. We test the approach on different publicly available action datasets, in all cases achieving the highest results reported to date.
Geometric Methods for Feature Extraction and Dimensional Reduction
- In L. Rokach and O. Maimon (Eds.), Data, 2005
Cited by 24 (1 self)
We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, and oriented PCA; and for the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps and spectral clustering. The Nyström method, which links several of the algorithms, is also reviewed. The goal is to provide a self-contained review of the concepts and mathematics underlying these algorithms.
Learning local evidence for shading and reflection
- In Proceedings of the IEEE International Conference on Computer Vision, 2001
Cited by 20 (5 self)
A fundamental, unsolved vision problem is to distinguish image intensity variations caused by surface normal variations from those caused by reflectance changes, i.e., to tell shading from paint. A solution to this problem is necessary for machines to interpret images as people do and could have many applications. The labelling allows us to reconstruct bandpassed images containing only those parts of the input image caused by shading effects, and a separate image containing only those parts caused by reflectance changes. The resulting classifications compare well with human psychophysical performance on a test set of images, and show good results for test photographs.
Face recognition: A hybrid neural network approach
- 1996
Cited by 16 (0 self)
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult (Turk and Pentland, 1991). We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network. The self-organizing map provides a quantization of the image samples into a topological space where inputs that are nearby in the original space are also nearby in the output space, thereby providing dimensionality reduction and invariance to minor changes in the image sample, and the convolutional neural network provides for partial invariance to translation, rotation, scale, and deformation. The convolutional network extracts successively larger features in a hierarchical set of layers. We present results using the Karhunen-Loève transform in place of the self-organizing map, and a multilayer perceptron in place of the convolutional network. The Karhunen-Loève transform performs almost as well (5.3% error versus 3.8%). The multilayer perceptron performs very poorly (40% error versus 3.8%). The method is capable of rapid classification, requires only fast, approximate normalization and preprocessing, and consistently exhibits better classification performance than the eigenfaces approach (Turk and Pentland, 1991) on the database
A supervised classification algorithm for note onset detection
- EURASIP Journal on Applied Signal Processing, 2007
Cited by 9 (1 self)
This paper presents a novel approach to detecting onsets in music audio files. We use a supervised learning algorithm to classify spectrogram frames extracted from digital audio as being onsets or non-onsets. Frames classified as onsets are then treated with a simple peak-picking algorithm based on a moving average. In this paper we present two versions of this approach. The first version uses a single neural network classifier. The second version combines the predictions of several networks trained using different hyperparameters. In the paper we describe the details of the algorithm and summarize the performance of both variants on several datasets. We also examine our choice of hyperparameters by describing results of cross validation experiments done on a custom dataset. We conclude that a supervised learning approach to note onset detection performs well and warrants further investigation.
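As a rough illustration of the peak-picking step mentioned in this abstract, the sketch below subtracts a centered moving average from per-frame onset scores and keeps local maxima that rise above it. The abstract does not specify the rule, so the function names, the window half-width, the threshold, and the local-maximum test are all assumptions made for illustration, not the authors' algorithm.

```python
def moving_average(xs, half_width):
    """Centered moving average; the window is clipped at the sequence edges."""
    out = []
    for i in range(len(xs)):
        lo = max(0, i - half_width)
        hi = min(len(xs), i + half_width + 1)
        out.append(sum(xs[lo:hi]) / (hi - lo))
    return out


def pick_peaks(scores, half_width=2, threshold=0.1):
    """Return indices of frames that are local maxima and exceed the local
    moving average by more than `threshold` (both values are hypothetical)."""
    baseline = moving_average(scores, half_width)
    peaks = []
    for i in range(1, len(scores) - 1):
        is_local_max = scores[i - 1] < scores[i] >= scores[i + 1]
        if is_local_max and scores[i] - baseline[i] > threshold:
            peaks.append(i)
    return peaks


# Hypothetical classifier outputs, one score per spectrogram frame.
scores = [0.0, 0.1, 0.9, 0.2, 0.1, 0.0, 0.1, 0.8, 0.7, 0.1]
print(pick_peaks(scores))  # prints [2, 7]
```

Comparing against a local average rather than a fixed global threshold lets the detector adapt to passages where the classifier's scores are uniformly high or low.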
How the brain might work: A hierarchical and temporal model for learning and recognition
- Stanford University, 2008
A neuromorphic cortical-layer microchip for spike-based event processing vision systems
- IEEE Trans. Circuits Syst. I, Reg. Papers, 2006
Cited by 8 (4 self)
We present a neuromorphic cortical-layer processing microchip for address event representation (AER) spike-based processing systems. The microchip computes 2-D convolutions of video information represented in AER format in real time. AER, as opposed to conventional frame-based video representation, describes visual information as a sequence of events or spikes in a way similar to biological brains. This format allows for fast information identification and processing, without waiting to process complete image frames. The neuromorphic cortical-layer processing microchip presented in this paper computes convolutions of programmable kernels over the AER visual input information flow. It not only computes convolutions but also allows for a programmable forgetting rate, which in turn allows for a bio-inspired coincidence detection processing. Kernels are programmable and can be of arbitrary shape and arbitrary size of up to 32 × 32 pixels. The convolution processor operates on a pixel array of size 32 × 32, but can process an input space of up to 128 × 128 pixels. Larger pixel arrays can be directly processed by tiling arrays of chips. The chip receives and generates data in AER format, which is asynchronous and digital. However, its internal operation is based on analog low-current circuit techniques. The paper describes the architecture of the chip and circuits used for the pixels, including calibration techniques to overcome mismatch. Extensive experimental results are provided, describing pixel operation and calibration, convolution processing with and without forgetting, and high-speed recognition experiments like discriminating rotating propellers of different shape rotating at speeds of up to 5000 revolutions per second.
Index Terms—2-D convolutions, address-event representation (AER), bio-inspired systems, digitally calibrated analog circuits, high-speed signal processing, MOS transistor mismatch, spike-based processing, subthreshold circuits, vision, VLSI mixed-circuit design.
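To make the idea of event-driven convolution with forgetting more concrete, here is a minimal software sketch. It is not a model of the chip's circuits: the function name, the linear leak, the non-negative kernel, and the threshold-and-reset output rule are all assumptions chosen for illustration. Each input event adds the kernel, centered on the event address, to a pixel-state array, and the state decays between events.

```python
def process_events(events, kernel, size, leak_per_unit_time, threshold):
    """Event-driven 2-D convolution sketch (hypothetical, software only).

    events: iterable of (t, x, y) input spikes, sorted by time t.
    kernel: square list of lists with odd side length, non-negative weights.
    Returns a list of (t, x, y) output events.
    """
    state = [[0.0] * size for _ in range(size)]
    half = len(kernel) // 2
    last_t = 0.0
    out = []
    for t, x, y in events:
        # Forgetting: every pixel leaks toward zero in proportion to elapsed time.
        decay = leak_per_unit_time * (t - last_t)
        for row in state:
            for j in range(size):
                row[j] = max(0.0, row[j] - decay)
        last_t = t
        # Add the kernel, centered on the event address, to the pixel array.
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                py, px = y + dy, x + dx
                if 0 <= py < size and 0 <= px < size:
                    state[py][px] += kernel[dy + half][dx + half]
                    # Assumed output rule: emit an event and reset at threshold.
                    if state[py][px] >= threshold:
                        out.append((t, px, py))
                        state[py][px] = 0.0
    return out


ones = [[1.0] * 3 for _ in range(3)]
events = [(0.0, 2, 2), (1.0, 2, 2)]
print(len(process_events(events, ones, 5, 0.0, 2.0)))  # prints 9
print(len(process_events(events, ones, 5, 1.0, 2.0)))  # prints 0
```

In this toy run, two events at the same address drive nine pixels over threshold when nothing is forgotten, but produce no output when the leak erases the first event's contribution before the second arrives, which is the coincidence-detection behavior the abstract attributes to the forgetting rate.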
Global Training of Document Processing Systems using Graph Transformer Networks.
- In Proc. of Computer Vision and Pattern Recognition, 1997
Cited by 7 (3 self)
We propose a new machine learning paradigm called Graph Transformer Networks that extends the applicability of gradient-based learning algorithms to systems composed of modules that take graphs as inputs and produce graphs as output. Training is performed by computing gradients of a global objective function with respect to all the parameters in the system using a kind of back-propagation procedure. A complete check reading system based on these concepts is described. The system uses convolutional neural network character recognizers, combined with global training techniques to provide record accuracy on business and personal checks. It is presently deployed commercially and reads millions of checks per month. 1. Introduction The most common technique for building document processing systems is to partition the task into manageable subtasks, such as field detection, word segmentation, or character recognition, and to build a separate module for each one. Typically, each module is trai...
Receptive fields for vision: from hyperacuity to object recognition
- 1996
Cited by 6 (5 self)
Many of the lower-level areas in the mammalian visual system are organized retinotopically, that is, as maps which preserve to a certain degree the topography of the retina. A unit that is a part of such a retinotopic map normally responds selectively to stimulation in a well-delimited part of the visual field, referred to as its receptive field (RF). Receptive fields are probably the most prominent and ubiquitous computational mechanism employed by biological information processing systems. This paper surveys some of the possible computational reasons behind the ubiquity of RFs, by discussing examples of RF-based solutions to problems in vision, from spatial acuity, through sensory coding, to object recognition.
Signal Detection in a Nonstationary Environment Reformulated as an Adaptive Pattern Classification Problem
- 1998
Cited by 5 (0 self)
The primary purpose of this paper is the improved detection of a nonstationary target signal embedded in a nonstationary background. Accordingly, the first part of the paper is devoted to a detailed exposition of how to deal with the issue of nonstationarity. The material presented here starts with Loève's probabilistic theory of nonstationary processes. From this principled discussion, three important tools emerge: the dynamic spectrum, the Wigner-Ville distribution as an instantaneous estimate of the dynamic spectrum, and the Loève spectrum. Procedures for the estimation of these spectra are described, and their applications are demonstrated using real-life radar data. Time, an essential dimension of learning, appears explicitly in the dynamic spectrum and Wigner-Ville distribution and implicitly in the Loève spectrum. In each case, the one-dimensional time series is transformed into a two-dimensional image where the presence of nonstationarity is displayed in a more visible manner tha...

