AM-FM Demodulation of Spectrograms using Localized 2D Max-Gabor Analysis (2007)

by T Ezzat, J Bouvrie

Results 1 - 7 of 7

AM-FM IMAGE FILTERS

by Chuong T. Nguyen, Joseph P. Havlicek
Cited by 5 (5 self)
We introduce a multicomponent invertible AM-FM image transform and use it to define new nonlinear AM-FM filters for performing modulation domain image processing. The key elements of the transform are analysis and synthesis filterbanks based on the steerable image pyramid and perfect reconstruction demodulation algorithms based on analytic differentiation of continuous cubic tensor spline models fit to the unwrapped phase samples of a digital image. We demonstrate spatially and spectrally localized orientation and frequency selective filtering, simple image restoration, and image fusion in the modulation domain. These results are also among the first to demonstrate high fidelity image reconstructions from computed multicomponent AM-FM models. Index Terms: AM-FM image models, AM-FM image filters, modulation domain signal processing, multicomponent models

Citation Context

...fication, content-based retrieval, and regeneration of occluded and damaged textures [5], as well as for infrared target tracking [6] and in the analysis of (2-D) spectrograms of human speech signals [7]. To date, however, they have been considerably less successful in applications requiring image synthesis in addition to analysis (to the best of our knowledge, reconstruction from a computed AM-FM mo...

TOWARDS CO-CHANNEL SPEAKER SEPARATION BY 2-D DEMODULATION OF SPECTROGRAMS

by Tianyu T. Wang
Cited by 2 (0 self)
This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model based on pitch information for single and multi-speaker content in each region. Our model maps harmonically-related speech content to concentrated entities in a transformed 2-D space, thereby motivating 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of individual speakers, we show through a quantitative evaluation (1) the utility of the model for representing speech content of a single speaker and (2) its feasibility for speaker separation. For the separation task, we also illustrate benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system. Index Terms: Grating Compression Transform, speaker separation, spectrogram demodulation, 2-D speech analysis

Citation Context

...uency regions of a narrowband spectrogram using 2-D Fourier transforms, a representation we refer to as the Grating Compression Transform (GCT). The GCT has been explored by Quatieri [5], Ezzat et al. [6, 7], and Wang and Quatieri [8] primarily for single-speaker analysis and is consistent with physiological modeling studies implicating 2-D analysis of sounds by auditory cortex neurons [9]. Ezzat et al. ...
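
The localized 2-D Fourier analysis described in this entry can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `localized_2d_spectrum`, the 32-by-32 patch size, the Hann window, and the synthetic harmonic "spectrogram" are all assumptions made for the example. It is meant to show how evenly spaced harmonic lines inside a patch map to a concentrated pair of peaks in the 2-D transform.

```python
import numpy as np


def localized_2d_spectrum(spectrogram, f0, t0, size=32):
    """Return the centered 2-D DFT magnitude of one windowed patch.

    spectrogram is indexed as [frequency_bin, time_frame]; (f0, t0) is the
    patch's lower corner. Patch size and window are illustrative choices.
    """
    patch = spectrogram[f0:f0 + size, t0:t0 + size]
    window = np.outer(np.hanning(size), np.hanning(size))
    # Remove the local mean so the DC term does not dominate the transform.
    patch = (patch - patch.mean()) * window
    return np.abs(np.fft.fftshift(np.fft.fft2(patch)))


# Synthetic stand-in for a narrowband spectrogram (an assumption): harmonics
# every 8 frequency bins, constant over time, i.e. a steady pitch.
freq_bins = np.arange(64)[:, None]
spec = 1.0 + np.cos(2 * np.pi * freq_bins / 8.0) * np.ones((1, 64))

magnitude = localized_2d_spectrum(spec, f0=16, t0=16)
peak_row, peak_col = np.unravel_index(np.argmax(magnitude), magnitude.shape)
center = magnitude.shape[0] // 2
# Offsets of the strongest component from the origin of the 2-D spectrum.
row_offset, col_offset = abs(peak_row - center), abs(peak_col - center)
print(row_offset, col_offset)
```

For this synthetic input the strongest component lies 4 bins from the origin along the frequency-modulation axis (32 / 8 cycles per patch) and 0 bins along the time-modulation axis, since the harmonics do not move over time.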

Hierarchical Learning: Theory with Applications in Speech and Vision

by Jacob V. Bouvrie, Tomaso Poggio, 2009
Cited by 2 (1 self)
Abstract not found

Citation Context

...l data, we present an algorithm for noise-robust speech analysis inspired by the early stages of the auditory cortex. Our algorithm is the most recent within a series of localized analysis techniques [36, 35, 34] which begins with an encoding based on sparse Gabor projections and ends with a local 2-D DCT representation. The technique can be thought of as imposing a form of localized smoothing, and we show...

FM FILTERS FOR MODULATION DOMAIN IMAGE PROCESSING

by Chuong T. Nguyen, Patrick A. Campbell, Joseph P. Havlicek
Cited by 2 (2 self)
For the first time, we demonstrate modulation domain image filters that achieve perceptually motivated image processing goals by directly manipulating the FM functions in a multi-component AM-FM image model. The action of previous modulation domain filters has been limited to modification of the AM functions based on the values of the AM and FM functions. This is because reconstruction of the modified phase from the filtered frequency modulation vectors was an unsolved problem. Here, we present two new algorithms capable of reconstructing the phase from the processed frequencies, one based on a least squares solution of the discrete Poisson equation with a Neumann boundary condition and one based on cubic tensor product spline integration. New modulation domain FM filters are designed to modify both the orientations and magnitudes of the visually important emergent image frequency vectors. In our most dramatic example, we demonstrate an FM filter that autonomously changes the stripes on the pants in the well-known Barbara image from vertical to horizontal. Index Terms: AM-FM image models, AM-FM image filters, modulation domain signal processing, multicomponent models

Citation Context

...ction and image enhancement, texture-based stereopsis, fingerprint classification, content-based retrieval, and regeneration of occluded and damaged textures [5], as well as for human speech analysis [6, 7] and recently in infrared [8] and visible [9] target tracking. While modulation domain image models have proven useful in analysis applications, they have found relatively limited use in applications ...
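
The first of the two phase-reconstruction ideas named in this entry's abstract, recovering a phase surface from its frequency (gradient) field in the least-squares sense, can be illustrated on a small grid. The sketch below is not the authors' algorithm: it assumes forward-difference gradients, builds a dense difference matrix (practical only for tiny images), and solves with `numpy.linalg.lstsq`; the function name `integrate_gradient_lstsq` is hypothetical. The normal equations of this system form a discrete Poisson equation whose boundary rows correspond to a Neumann-type condition, and the phase is recoverable only up to an additive constant.

```python
import numpy as np


def integrate_gradient_lstsq(gx, gy):
    """Least-squares phase from forward differences (hypothetical helper).

    gx[i, j] ~ phi[i, j+1] - phi[i, j]   (shape H x (W-1))
    gy[i, j] ~ phi[i+1, j] - phi[i, j]   (shape (H-1) x W)
    Returns a zero-mean phi; the additive constant is not recoverable.
    """
    height, width = gx.shape[0], gy.shape[1]
    index = np.arange(height * width).reshape(height, width)

    # One equation per neighboring pixel pair: phi[b] - phi[a] = g.
    pairs_a = np.concatenate([index[:, :-1].ravel(), index[:-1, :].ravel()])
    pairs_b = np.concatenate([index[:, 1:].ravel(), index[1:, :].ravel()])
    rhs = np.concatenate([gx.ravel(), gy.ravel()])

    system = np.zeros((rhs.size, height * width))
    rows = np.arange(rhs.size)
    system[rows, pairs_a] = -1.0
    system[rows, pairs_b] = 1.0

    # lstsq handles the rank deficiency caused by the unknown constant.
    phi, *_ = np.linalg.lstsq(system, rhs, rcond=None)
    phi = phi.reshape(height, width)
    return phi - phi.mean()


# Usage: differentiate a known smooth phase, then integrate it back.
y, x = np.mgrid[0:6, 0:8]
true_phase = 0.3 * x + 0.1 * y ** 2
recovered = integrate_gradient_lstsq(np.diff(true_phase, axis=1),
                                     np.diff(true_phase, axis=0))
max_error = np.max(np.abs(recovered - (true_phase - true_phase.mean())))
```

With a consistent (unfiltered) gradient field, as here, the recovered phase matches the original up to its mean. When the frequencies have been modified by a filter, the gradient field is generally no longer integrable, and the least-squares solution gives the closest phase surface in the sense of squared difference error.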

Generalization and properties of the neural response

by Andre Yohannes Wibisono, Tomaso Poggio, 2010
Cited by 1 (1 self)
Hierarchical learning algorithms have enjoyed tremendous growth in recent years, with many new algorithms being proposed and applied to a wide range of applications. However, despite the apparent success of hierarchical algorithms in practice, the theory of hierarchical architectures remains at an early stage. In this thesis we study the theoretical properties of hierarchical algorithms from a mathematical perspective. Our work is based on the framework of hierarchical architectures introduced by Smale et al. in the paper “Mathematics of the Neural Response”, Foundations of Computational Mathematics, 2010. We propose a generalized definition of the neural response and derived kernel that allows us to integrate

Citation Context

...rithms have been applied to a wide range of problem domains, including image classification [29, 48, 59], image segmentation [58], action recognition from video sequences [17, 27], speech recognition [9, 15, 38], dimensionality reduction [23, 52], robotics [20, 34], natural language processing [11, 13, 42], language identification [62], and even seizure detection [40, 41], with encouraging results. Hierarchic...

Hierarchical Spectro-Temporal Models for Speech Recognition

by Jake Bouvrie, Tony Ezzat, Tomaso Poggio
We seek to explore computational approaches for audition that are inspired by computational visual neuroscience. In particular, we seek to leverage progress over the past few years in building a biologically faithful hierarchical, feed-forward system for visual object recognition [13,14]. The system, which was designed to closely match the currently known feed-forward path in the ventral stream in visual cortex, processes 2-D images in a feed-forward, hierarchical way to determine the category and identity of a particular object within that image. The system is capable of recognizing the object in the image irrespective of variations in position, scale, and orientation, and in the presence of clutter. Motivated by the success of our architecture for visual object recognition, we propose to explore a similar 2-D hierarchical, feed-forward architecture for auditory object recognition. In particular, we propose to explore whether such a system may be capable of achieving state-of-the-art phonetic recognition (with and without noise). In addition, since it is likely that similar cortical mechanisms are used in both vision and audition, we believe that some of these mechanisms, which are well-known in the vision community, can be used successfully in the auditory domain.
Previous Work. Recent work by a number of auditory neurophysiologists [15,9] indicates that there is a secondary level of auditory analysis in the auditory cortex (AI), in which cells in AI analyze and process elements of the underlying input auditory time-frequency “image”. Measurements of the so-called spectro-temporal receptive fields (STRFs) of cells in AI indicate that they can be tuned to different optimal frequencies, have different spectral scales, and also respond to different temporal rates. Several researchers have begun to apply these recent developments in neuroscience to automatic speech recognition. Mesgarani and Shamma [10] have filtered spectrograms of speech signals with spectro-temporal kernels derived from recordings in primary auditory cortex of the ferret. Kleinschmidt et al. [8,7] have borrowed the STRF idea, and extracted localized spectro-temporal patterns by convolving speech spectrograms with Gabor functions. They then applied the resulting features to speech recognition tasks involving noisy spoken digits.
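
Convolving a spectrogram with Gabor functions, as this entry attributes to Kleinschmidt et al., can be sketched with NumPy alone. The kernel parameterization, the function names `gabor_kernel` and `filter_valid`, and the synthetic grating used in place of a real spectrogram are assumptions for illustration; this is not the feature extraction used in any of the cited work. Because the cosine-phase kernel here is unchanged by a 180-degree rotation, the sliding-window correlation below gives the same result as a convolution.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma):
    """Zero-mean, cosine-phase 2-D Gabor kernel (illustrative parameters)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Coordinate along the direction in which the carrier oscillates.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    return kernel - kernel.mean()


def filter_valid(image, kernel):
    """'Valid'-mode 2-D correlation using sliding windows (NumPy >= 1.20)."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum('ijkl,kl->ij', windows, kernel)


# Synthetic stand-in for a spectrogram (rows: frequency, columns: time):
# a pure modulation along the time axis with a period of 8 frames.
cols = np.arange(64)[None, :]
grating = np.cos(2.0 * np.pi * cols / 8.0) * np.ones((64, 1))

matched = filter_valid(grating, gabor_kernel(17, 8.0, 0.0, 4.0))
orthogonal = filter_valid(grating, gabor_kernel(17, 8.0, np.pi / 2, 4.0))
matched_peak = np.abs(matched).max()
orthogonal_peak = np.abs(orthogonal).max()
```

The kernel tuned to the grating's orientation and wavelength responds far more strongly than the one rotated by 90 degrees, which is the selectivity that makes a bank of such kernels usable as localized spectro-temporal feature detectors.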

Generalization and Properties of the Neural Response

by Jake Bouvrie, Tomaso Poggio, Lorenzo Rosasco, Steve Smale, Andre Wibisono, 2010
Hierarchical learning algorithms have enjoyed tremendous growth in recent years, with many new algorithms being proposed and applied to a wide range of applications. However, despite the apparent success of hierarchical algorithms in practice, the theory of hierarchical architectures remains at an early stage. In this paper we study the theoretical properties of hierarchical algorithms from a mathematical perspective. Our work is based on the framework of hierarchical architectures introduced by Smale et al. in the paper “Mathematics of the Neural Response”, Foundations of Computational Mathematics, 2010. We propose a generalized definition of the neural response and derived kernel that allows us to integrate some of the existing hierarchical algorithms in practice into our framework. We then use this generalized definition to analyze the theoretical properties of hierarchical architectures. Our analysis focuses on three particular aspects of the hierarchy. First, we show that a wide class of architectures suffers from range compression; essentially, the derived kernel becomes increasingly saturated at each layer. Second, we show that the complexity of a linear architecture is constrained by the complexity of the first layer, and in some cases the architecture collapses into a single-layer linear computation. Finally, we characterize the discrimination and invariance properties

Citation Context

...rithms have been applied to a wide range of problem domains, including image classification [29, 48, 59], image segmentation [58], action recognition from video sequences [17, 27], speech recognition [9, 15, 38], dimensionality reduction [23, 52], robotics [20, 34], natural language processing [11, 42], language identification [63], and even seizure detection [40, 41], with encouraging results. Hierarchical ...
