Wavelet-Based Texture Retrieval Using Generalized Gaussian Density and Kullback-Leibler Distance
- IEEE Trans. Image Processing
, 2002
Cited by 101 (4 self)
We present a statistical view of the texture retrieval problem by combining the two related tasks, namely feature extraction (FE) and similarity measurement (SM), into a joint modeling and classification scheme. We show that using a consistent estimator of texture model parameters for the FE step followed by computing the Kullback-Leibler distance (KLD) between estimated models for the SM step is asymptotically optimal in terms of retrieval error probability. The statistical scheme leads to a new wavelet-based texture retrieval method that is based on the accurate modeling of the marginal distribution of wavelet coefficients using generalized Gaussian density (GGD) and on the existence of a closed form for the KLD between GGDs. The proposed method provides greater accuracy and flexibility in capturing texture information, while its simplified form closely resembles existing methods that use energy distribution in the frequency domain to identify textures. Experimental results on a database of 640 texture images indicate that the new method significantly improves retrieval rates, e.g., from 65% to 77%, compared with traditional approaches, while it retains comparable levels of computational complexity.
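The closed-form KLD between two GGDs mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the common zero-mean parameterization p(x; alpha, beta) = beta / (2 alpha Gamma(1/beta)) exp(-(|x|/alpha)^beta); the function name `ggd_kld` and its argument names are hypothetical and do not come from the listing.

```python
import math

def ggd_kld(alpha1, beta1, alpha2, beta2):
    """Kullback-Leibler distance D(p1 || p2) between two zero-mean GGDs.

    alpha is the scale and beta the shape of each density (assumed
    parameterization; beta = 2 gives a Gaussian, beta = 1 a Laplacian).
    """
    # Log-ratio of the two normalizing constants.
    log_norm = (math.log(beta1) - math.log(beta2)
                + math.log(alpha2) - math.log(alpha1)
                + math.lgamma(1.0 / beta2) - math.lgamma(1.0 / beta1))
    # Gamma((beta2 + 1) / beta1) / Gamma(1 / beta1), computed in the log domain.
    gamma_ratio = math.exp(math.lgamma((beta2 + 1.0) / beta1)
                           - math.lgamma(1.0 / beta1))
    return log_norm + (alpha1 / alpha2) ** beta2 * gamma_ratio - 1.0 / beta1
```

For beta1 = beta2 = 2 the expression reduces to the familiar KLD between two zero-mean Gaussians with standard deviations alpha / sqrt(2), which is a convenient sanity check.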
Multiresolution markov models for signal and image processing
- Proceedings of the IEEE
, 2002
Cited by 83 (11 self)
This paper reviews a significant component of the rich field of statistical multiresolution (MR) modeling and processing. These MR methods have found application and permeated the literature of a widely scattered set of disciplines, and one of our principal objectives is to present a single, coherent picture of this framework. A second goal is to describe how this topic fits into the even larger field of MR methods and concepts, in particular making ties to topics such as wavelets and multigrid methods. A third is to provide several alternate viewpoints for this body of work, as the methods and concepts we describe intersect with a number of other fields. The principal focus of our presentation is the class of MR Markov processes defined on pyramidally organized trees. The attractiveness of these models stems from both the very efficient algorithms they admit and their expressive power and broad applicability. We show how a variety of methods and models relate to this framework including models for self-similar and 1/f processes. We also illustrate how these methods have been used in practice. We discuss the construction of MR models on trees and show how questions that arise in this context make contact with wavelets, state space modeling of time series, system and parameter identification, and hidden
Continuous Probabilistic Transform for Voice Conversion
- IEEE Transactions on Speech and Audio Processing
, 1998
Cited by 78 (3 self)
Voice conversion, as considered in this paper, is defined as modifying the speech signal of one speaker (source speaker) so that it sounds as if it had been pronounced by a different speaker (target speaker). Our contribution includes the design of a new methodology for representing the relationship between two sets of spectral envelopes. The proposed method is based on the use of a Gaussian mixture model of the source speaker spectral envelopes. The conversion itself is represented by a continuous parametric function which takes into account the probabilistic classification provided by the mixture model. The parameters of the conversion function are estimated by least squares optimization on the training data. This conversion method is implemented in the context of the HNM (harmonic + noise model) system, which allows high-quality modifications of speech signals. Compared to earlier methods based on vector quantization, the proposed conversion scheme results in a much better match between the converted envelopes and the target envelopes. Evaluation by objective tests and formal listening tests shows that the proposed transform greatly improves the quality and naturalness of the converted speech signals compared with previously proposed conversion methods.
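The idea of a continuous conversion function driven by the mixture model's soft classification can be illustrated with a scalar sketch. This is not the paper's implementation: the per-class affine form, the scalar (rather than vector) features, and all names (`posteriors`, `convert`, `offsets`, `slopes`) are assumptions made for illustration, and the parameters are supplied by hand instead of being fitted by least squares.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, weights, means, variances):
    """Soft classification: probability of each mixture component given x."""
    joint = [w * gaussian_pdf(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(joint)
    return [j / total for j in joint]

def convert(x, weights, means, variances, offsets, slopes):
    """Posterior-weighted sum of per-class affine maps (assumed form).

    Each class i contributes offsets[i] + slopes[i] * (x - means[i]),
    weighted by how strongly x belongs to that class, so the mapping
    varies smoothly instead of jumping between codebook entries.
    """
    post = posteriors(x, weights, means, variances)
    return sum(p * (b + a * (x - m))
               for p, b, a, m in zip(post, offsets, slopes, means))
```

Because the class weights change smoothly with x, the output has no discontinuities at class boundaries, which is the contrast with hard vector-quantization mappings that the abstract draws.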
The effect of correlated variability on the accuracy of a population code
- Neural Computation
, 1999
Cited by 76 (1 self)
We study the impact of correlated neuronal firing rate variability on the accuracy with which an encoded quantity can be extracted from a population of neurons. Contrary to a widespread belief, correlations in the variabilities of neuronal firing rates do not, in general, limit the increase in coding accuracy provided by using large populations of encoding neurons. Furthermore, in some cases, but not all, correlations improve the accuracy of a population code.
Interpreting neuronal population activity by reconstruction: unified framework with application to hippocampal place cells
- J. Neurophysiol
, 1998
Cited by 59 (5 self)
such as the orientation of a line in the visual field or the location of the body in space are coded as activity levels in populations of neurons. Reconstruction or decoding is an inverse problem in which the physical variables are estimated from observed neural activity. Reconstruction is useful first in quantifying how much information about the physical variables is present in the population and, second, in providing insight into how the brain might use distributed representations in solving related computational problems such as visual object recognition and spatial navigation. Two classes of reconstruction methods, namely, probabilistic or Bayesian methods and basis function methods, are discussed. They include important existing methods
Two main goals for reconstruction are approached in this paper. The first goal is technical and is exemplified by the population vector method applied to motor cortical activities during various reaching tasks (Georgopoulos et al. 1986, 1989; Schwartz 1994) and the template matching method applied to disparity selective cells in the visual cortex (Lehky and Sejnowski 1990) and hippocampal place cells during rapid learning of place fields in a novel environment (Wilson and McNaughton 1993). In these examples, reconstruction extracts information from noisy neuronal population activity and transforms it to a
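As an illustration of the probabilistic (Bayesian) class of reconstruction methods named in this entry, the sketch below decodes a one-dimensional position from spike counts. It is a toy example under stated assumptions, not the paper's procedure: independent Poisson firing, Gaussian-shaped tuning curves, a flat prior over a position grid, and the names `tuning` and `decode` are all hypothetical choices.

```python
import math

def tuning(x, center, peak=20.0, width=1.0):
    """Assumed Gaussian place field (spikes/s) with a small baseline rate."""
    return peak * math.exp(-0.5 * ((x - center) / width) ** 2) + 0.1

def decode(counts, centers, grid, window=0.5):
    """Posterior over grid positions given spike counts in one time window.

    Assumes independent Poisson spiking and a flat prior, so the
    log-posterior is sum_i [n_i * log(f_i(x) * T) - f_i(x) * T] + const.
    """
    log_post = []
    for x in grid:
        rates = [tuning(x, c) for c in centers]
        log_post.append(sum(n * math.log(r * window) - r * window
                            for n, r in zip(counts, rates)))
    top = max(log_post)                       # subtract max for stability
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    return [p / total for p in post]

# Usage: cells centred at 0..10, counts roughly as expected at x = 4.
centers = [float(c) for c in range(11)]
grid = [i * 0.1 for i in range(101)]
counts = [round(tuning(4.0, c) * 0.5) for c in centers]
posterior = decode(counts, centers, grid)
estimate = grid[posterior.index(max(posterior))]
```

The position with the largest posterior serves as the reconstructed location; a non-flat prior could be added as an extra log term per grid point.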
A Framework for Speech Source Localization Using Sensor Arrays
, 1995
Cited by 42 (5 self)
Electronically steerable arrays of microphones have a variety of uses in speech data acquisition systems. Applications include teleconferencing, speech recognition and speaker identification, sound capture in adverse environments, and biomedical devices for the hearing impaired. An array of microphones has a number of advantages over a single-microphone system. It may be electronically aimed to provide a high-quality signal from a desired source location while simultaneously attenuating interfering talkers and ambient noise, does not necessitate local placement of transducers or encumber the talker with a hand-held or head-mounted microphone, and does not require physical movement to alter its direction of reception. Additionally, it has capabilities that a single microphone does not, namely automatic detection, localization, and tracking of active talkers in its receptive area. A fundamental requirement of sensor array systems is the ability to locate and track a speech source. An accurate fix on the primary talker, as well as knowledge of any interfering talkers or coherent noise sources, is necessary to effectively steer the array. Source location data may also be used for purposes other than beamforming, e.g., aiming a camera in a video-conferencing system. In addition to high accuracy, the location estimator must be
Computational Auditory Scene Recognition
- In IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing
, 2001
Fundamental Performance Limits in Image Registration
- IEEE Transactions on Image Processing
, 2003
Cited by 31 (8 self)
The task of image registration is fundamental in image processing. It is often a critical preprocessing step for many modern image processing and computer vision tasks, and many algorithms and techniques have been proposed to address the registration problem. Often, the performance of these techniques has been presented using a variety of relative measures comparing different estimators, leaving open the critical question of overall optimality. In this paper, we present the fundamental performance limits for the problem of image registration as derived from the Cramér-Rao inequality. We compare experimental performance of several popular methods with respect to this performance bound, and explain the fundamental tradeoff between variance and bias inherent to the problem of image registration. In particular, we derive and explore the bias of the popular gradient-based estimator, showing how widely used multiscale methods for improving performance can be explained with this bias expression. Finally, we present experimental simulations showing general rule-of-thumb performance limits for gradient-based image registration techniques.
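To make the Cramér-Rao idea concrete, here is a one-dimensional sketch of the bound on the variance of an unbiased shift estimate. It is a simplification under stated assumptions, not the paper's two-dimensional derivation: a known noise-free reference signal, a shifted copy observed in additive white Gaussian noise, and derivatives approximated by central differences. The function name `shift_crb` is hypothetical.

```python
import math

def shift_crb(signal, noise_std):
    """Cramer-Rao lower bound on the variance of an unbiased shift estimate.

    Under the assumed model the Fisher information for the shift is
    sum_k f'(x_k)^2 / sigma^2, so the bound is its reciprocal.
    """
    # Central-difference approximation of the derivative at interior samples.
    grad = [(signal[i + 1] - signal[i - 1]) / 2.0
            for i in range(1, len(signal) - 1)]
    fisher = sum(g * g for g in grad) / noise_std ** 2
    return 1.0 / fisher

# Usage: a sinusoid sampled at 64 points; more gradient energy or less
# noise both tighten the bound.
reference = [math.sin(2.0 * math.pi * k / 32.0) for k in range(64)]
bound = shift_crb(reference, noise_std=0.1)
```

The bound depends on the signal only through its gradient energy, so smooth, low-contrast content can only be registered coarsely at a given noise level.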
A Closed-Form Location Estimator for Use with Room Environment Microphone Arrays
, 1997
Cited by 31 (1 self)
Microphone-array systems can be used to determine the positions of active talkers and can be electronically steered to provide spatially selective speech acquisition. Since it is steered electronically, a microphone array's directivity pattern can be updated rapidly to follow a moving talker or to switch between several alternating or simultaneous talkers. These features make microphone arrays a desirable alternative to single-microphone systems for hands-free speech acquisition, especially those involving multiple or moving sources. Furthermore, the ability of microphone-array systems to determine talker location makes them attractive for use in multimedia teleconferencing systems where the location of the talker can be used not only for steering the directivity of the microphone array, but also for pointing cameras or determining binaural cues for stereo imaging. In microphone-array systems, a directly observable signal characteristic is the time difference of arr

