AM-FM IMAGE FILTERS
Cited by 5 (5 self)
We introduce a multicomponent invertible AM-FM image transform and use it to define new nonlinear AM-FM filters for performing modulation domain image processing. The key elements of the transform are analysis and synthesis filterbanks based on the steerable image pyramid and perfect reconstruction demodulation algorithms based on analytic differentiation of continuous cubic tensor spline models fit to the unwrapped phase samples of a digital image. We demonstrate spatially and spectrally localized orientation and frequency selective filtering, simple image restoration, and image fusion in the modulation domain. These results are also among the first to demonstrate high-fidelity image reconstructions from computed multicomponent AM-FM models. Index Terms: AM-FM image models, AM-FM image filters, modulation domain signal processing, multicomponent models
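As a rough illustration of what "invertible AM-FM demodulation" means, the sketch below works on a single-component 1-D signal rather than an image. It swaps in a plain FFT-based analytic signal for the steerable pyramid and spline-based phase differentiation described in the abstract, so it is a toy stand-in and not the authors' algorithm. All function names and test-signal parameters are assumptions made for this example.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real 1-D array, built by zeroing negative frequencies."""
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                       # keep DC once
    if n % 2 == 0:
        weights[1:n // 2] = 2.0            # double the positive frequencies
        weights[n // 2] = 1.0              # keep the Nyquist bin once
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def demodulate(x):
    """Return (AM function, unwrapped phase, FM function in radians per sample)."""
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.gradient(phase)         # finite differences, not spline derivatives
    return amplitude, phase, frequency

def remodulate(amplitude, phase):
    """Rebuild the signal from its AM function and phase."""
    return amplitude * np.cos(phase)

# Hypothetical test signal: slow amplitude modulation on a frequency-modulated carrier.
n = np.arange(1024)
true_am = 1.0 + 0.3 * np.cos(2 * np.pi * 2 * n / n.size)
true_phase = 2 * np.pi * 50 * n / n.size + 2.0 * np.sin(2 * np.pi * 3 * n / n.size)
signal = true_am * np.cos(true_phase)

am, phase, fm = demodulate(signal)
rebuilt = remodulate(am, phase)
```

Because the rebuilt signal is just the real part of the analytic signal, reconstruction is exact up to floating-point error here; a modulation domain filter would modify `am` or `phase` before calling `remodulate`.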
TOWARDS CO-CHANNEL SPEAKER SEPARATION BY 2-D DEMODULATION OF SPECTROGRAMS
Cited by 2 (0 self)
This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model based on pitch information for single and multi-speaker content in each region. Our model maps harmonically-related speech content to concentrated entities in a transformed 2-D space, thereby motivating 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of individual speakers, we show through a quantitative evaluation 1) the utility of the model for representing the speech content of a single speaker and 2) its feasibility for speaker separation. For the separation task, we also illustrate benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system. Index Terms: Grating Compression Transform, speaker separation, spectrogram demodulation, 2-D speech analysis
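The claim that harmonic content maps to "concentrated entities" in a transformed 2-D space can be checked on a synthetic patch. The sketch below is an assumption-laden toy, not the paper's model: the patch is a single 2-D sinusoid on a pedestal, standing in for evenly spaced harmonic lines in a local spectrogram region, and the helper names and patch sizes are invented for illustration.

```python
import numpy as np

def harmonic_patch(n_freq, n_time, cycles_freq, cycles_time):
    """Synthetic spectrogram region: a 2-D sinusoidal carrier on a constant pedestal.

    cycles_freq: carrier cycles across the frequency axis (harmonic line density).
    cycles_time: carrier cycles across the time axis (nonzero when the pitch drifts).
    """
    k = np.arange(n_freq)[:, None]
    t = np.arange(n_time)[None, :]
    carrier = np.cos(2 * np.pi * (cycles_freq * k / n_freq + cycles_time * t / n_time))
    return 1.0 + carrier

def dominant_carrier(patch):
    """Locate the strongest non-DC peak of the 2-D Fourier transform of a patch."""
    centered = patch - patch.mean()                  # remove the pedestal (DC term)
    magnitude = np.abs(np.fft.fft2(centered))
    half = magnitude[: patch.shape[0] // 2 + 1]      # real input: keep one half-plane
    row, col = np.unravel_index(np.argmax(half), half.shape)
    return int(row), int(col)

steady = harmonic_patch(64, 32, cycles_freq=8, cycles_time=0)   # constant pitch
rising = harmonic_patch(64, 32, cycles_freq=8, cycles_time=2)   # tilted harmonic lines

steady_peak = dominant_carrier(steady)
rising_peak = dominant_carrier(rising)
```

With constant pitch the energy collapses onto one point on the frequency-modulation axis; tilting the harmonic lines moves that point off the axis, which is the sense in which the 2-D transform encodes pitch dynamics.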
Hierarchical Learning: Theory with Applications in Speech and Vision
2009
FM FILTERS FOR MODULATION DOMAIN IMAGE PROCESSING
Cited by 2 (2 self)
For the first time, we demonstrate modulation domain image filters that achieve perceptually motivated image processing goals by directly manipulating the FM functions in a multi-component AM-FM image model. The action of previous modulation domain filters has been limited to modification of the AM functions based on the values of the AM and FM functions. This is because reconstruction of the modified phase from the filtered frequency modulation vectors was an unsolved problem. Here, we present two new algorithms capable of reconstructing the phase from the processed frequencies, one based on a least squares solution of the discrete Poisson equation with Neumann boundary conditions and one based on cubic tensor product spline integration. New modulation domain FM filters are designed to modify both the orientations and magnitudes of the visually important emergent image frequency vectors. In our most dramatic example, we demonstrate an FM filter that autonomously changes the stripes on the pants in the well-known Barbara image from vertical to horizontal. Index Terms: AM-FM image models, AM-FM image filters, modulation domain signal processing, multicomponent models
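The least squares route to phase reconstruction can be sketched on a small grid. Fitting a phase whose forward differences match a given frequency field in the least squares sense leads to normal equations that form a discrete Poisson problem with Neumann-type boundaries. The code below solves that fit directly with a dense solver. It is a minimal sketch under stated assumptions (tiny grid, dense matrices, hypothetical function names), not the authors' implementation.

```python
import numpy as np

def difference_matrix(rows, cols, axis):
    """Dense forward-difference operator acting on a row-major flattened grid."""
    index = np.arange(rows * cols).reshape(rows, cols)
    if axis == 1:   # horizontal differences: phi[r, c + 1] - phi[r, c]
        start, stop = index[:, :-1].ravel(), index[:, 1:].ravel()
    else:           # vertical differences: phi[r + 1, c] - phi[r, c]
        start, stop = index[:-1, :].ravel(), index[1:, :].ravel()
    d = np.zeros((start.size, rows * cols))
    equation = np.arange(start.size)
    d[equation, start] = -1.0
    d[equation, stop] = 1.0
    return d

def integrate_frequencies(fx, fy, anchor_phase=0.0):
    """Least squares phase from horizontal (fx) and vertical (fy) phase differences.

    fx has shape (rows, cols - 1) and fy has shape (rows - 1, cols). The phase is
    determined only up to a constant, which is fixed by anchor_phase at pixel (0, 0).
    """
    rows, cols = fx.shape[0], fy.shape[1]
    d = np.vstack([difference_matrix(rows, cols, 1), difference_matrix(rows, cols, 0)])
    g = np.concatenate([fx.ravel(), fy.ravel()])
    phi, *_ = np.linalg.lstsq(d, g, rcond=None)
    phi = phi.reshape(rows, cols)
    return phi - phi[0, 0] + anchor_phase

# Hypothetical example: recover a known phase, then apply a trivial "FM filter"
# that doubles every frequency vector before integrating.
rng = np.random.default_rng(0)
true_phase = np.cumsum(rng.uniform(0.1, 0.5, size=(8, 10)), axis=1)
fx, fy = np.diff(true_phase, axis=1), np.diff(true_phase, axis=0)

recovered = integrate_frequencies(fx, fy, anchor_phase=true_phase[0, 0])
doubled = integrate_frequencies(2 * fx, 2 * fy, anchor_phase=2 * true_phase[0, 0])
```

When a filter produces a frequency field that is no longer an exact gradient (for example after rotating the vectors), the same call returns the closest phase in the least squares sense. A practical solver would use a sparse or transform-based method instead of dense matrices.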
Generalization and properties of the neural response
2010
Cited by 1 (1 self)
Hierarchical learning algorithms have enjoyed tremendous growth in recent years, with many new algorithms being proposed and applied to a wide range of applications. However, despite the apparent success of hierarchical algorithms in practice, the theory of hierarchical architectures remains at an early stage. In this thesis we study the theoretical properties of hierarchical algorithms from a mathematical perspective. Our work is based on the framework of hierarchical architectures introduced by Smale et al. in the paper “Mathematics of the Neural Response”, Foundations of Computational Mathematics, 2010. We propose a generalized definition of the neural response and derived kernel that allows us to integrate
Hierarchical Spectro-Temporal Models for Speech Recognition
Problem & Motivation
We seek to explore computational approaches for audition that are inspired by computational visual neuroscience. In particular, we seek to leverage recent progress in building a biologically-faithful hierarchical, feed-forward system for visual object recognition [13,14]. The system, which was designed to closely match the currently known feed-forward path in the ventral stream in visual cortex, processes 2-D images in a feed-forward, hierarchical way to determine the category and identity of a particular object within that image. The system is capable of recognizing the object in the image irrespective of variations in position, scale, and orientation, and in the presence of clutter. Motivated by the success of our architecture for visual object recognition, we propose to explore a similar 2-D hierarchical, feed-forward architecture for auditory object recognition. In particular, we propose to explore whether such a system may be capable of achieving state-of-the-art phonetic recognition (with and without noise). In addition, since it is likely that similar cortical mechanisms are used in both vision and audition, we believe that some of these mechanisms, which are well-known in the vision community, can be used successfully in the auditory domain.
Previous Work
Recent work by a number of auditory neurophysiologists [15,9] indicates that there is a secondary level of auditory analysis in the primary auditory cortex (AI), in which cells in AI analyze and process elements of the underlying input auditory time-frequency “image”. Measurements of the so-called spectro-temporal receptive fields (STRFs) of cells in AI indicate that they can be tuned to different optimal frequencies, have different spectral scales, and also respond to different temporal rates. Several researchers have begun to apply these recent developments in neuroscience to automatic speech recognition. Mesgarani and Shamma [10] have filtered spectrograms of speech signals with spectro-temporal kernels derived from recordings in primary auditory cortex of the ferret. Kleinschmidt et al. [8,7] have borrowed the STRF idea, and extracted localized spectro-temporal patterns by convolving speech spectrograms with Gabor functions. They then applied the resulting features to speech recognition tasks involving noisy spoken digits.
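Convolving a spectrogram with a Gabor function can be sketched as follows. The kernel parameterization, the circular FFT-based convolution, and the synthetic "spectrogram" are all assumptions chosen for a compact example. They are not taken from the cited work, which uses its own kernel definitions and real speech data.

```python
import numpy as np

def gabor_kernel(size, freq_cycles, time_cycles, sigma):
    """Real 2-D Gabor kernel (odd size) tuned to a spectral scale and temporal rate.

    freq_cycles and time_cycles are carrier frequencies in cycles per bin along the
    frequency and time axes; sigma is the Gaussian envelope width in bins.
    """
    half = size // 2
    k, t = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(k ** 2 + t ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2 * np.pi * (freq_cycles * k + time_cycles * t))
    kernel = envelope * carrier
    return kernel - kernel.mean()          # zero mean: no response to flat regions

def filter_spectrogram(spec, kernel):
    """Circular 2-D convolution of a spectrogram with a kernel, computed via the FFT."""
    size = kernel.shape[0]
    padded = np.zeros_like(spec, dtype=float)
    padded[:size, :size] = kernel
    # Shift the kernel so its center sits at the origin of the circular convolution.
    padded = np.roll(padded, (-(size // 2), -(size // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(spec) * np.fft.fft2(padded)))

# Hypothetical input: a spectral ripple (periodic along frequency, constant in time).
k = np.arange(64)[:, None]
spec = 1.0 + np.cos(2 * np.pi * 0.125 * k) * np.ones((1, 64))

matched = gabor_kernel(25, freq_cycles=0.125, time_cycles=0.0, sigma=4.0)
mismatched = gabor_kernel(25, freq_cycles=0.0, time_cycles=0.125, sigma=4.0)

matched_energy = np.sum(filter_spectrogram(spec, matched) ** 2)
mismatched_energy = np.sum(filter_spectrogram(spec, mismatched) ** 2)
```

The kernel tuned to the ripple's spectral scale responds strongly, while the kernel tuned to a temporal rate barely responds. A bank of such kernels over several scales and rates would give one feature map per tuning.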
Generalization and Properties of the Neural Response
2010
Hierarchical learning algorithms have enjoyed tremendous growth in recent years, with many new algorithms being proposed and applied to a wide range of applications. However, despite the apparent success of hierarchical algorithms in practice, the theory of hierarchical architectures remains at an early stage. In this paper we study the theoretical properties of hierarchical algorithms from a mathematical perspective. Our work is based on the framework of hierarchical architectures introduced by Smale et al. in the paper “Mathematics of the Neural Response”, Foundations of Computational Mathematics, 2010. We propose a generalized definition of the neural response and derived kernel that allows us to integrate some of the existing hierarchical algorithms in practice into our framework. We then use this generalized definition to analyze the theoretical properties of hierarchical architectures. Our analysis focuses on three particular aspects of the hierarchy. First, we show that a wide class of architectures suffers from range compression; essentially, the derived kernel becomes increasingly saturated at each layer. Second, we show that the complexity of a linear architecture is constrained by the complexity of the first layer, and in some cases the architecture collapses into a single-layer linear computation. Finally, we characterize the discrimination and invariance properties
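The statement about linear architectures collapsing can be illustrated with ordinary matrix layers. This is a deliberately simplified toy: it uses plain matrix products rather than the neural response and derived kernel of the paper, and the layer sizes and names are assumptions. It shows only the generic linear-algebra fact that a stack of linear maps equals one linear map whose rank cannot exceed that of the first layer.

```python
import numpy as np

def apply_layers(weights, x):
    """Feed x through a stack of purely linear layers, first layer first."""
    for w in weights:
        x = w @ x
    return x

def collapse(weights):
    """Multiply the layer matrices into one equivalent single-layer matrix."""
    total = np.eye(weights[0].shape[1])
    for w in weights:
        total = w @ total
    return total

rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((3, 8)),   # first layer: 8 inputs -> 3 features
    rng.standard_normal((6, 3)),   # second layer
    rng.standard_normal((5, 6)),   # third layer
]
x = rng.standard_normal(8)

single = collapse(layers)
deep_output = apply_layers(layers, x)
shallow_output = single @ x
```

However many linear layers are stacked, `single` has rank at most 3 here, the rank of the first layer, so depth alone adds no expressive power without a nonlinearity between layers.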