Signal modeling techniques in speech recognition
- PROCEEDINGS OF THE IEEE
, 1993
Cited by 99 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
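As a rough illustration of the first trend named in this abstract, the sketch below appends a time-derivative (delta) estimate to each static feature frame. The regression-slope estimator, the window half-width `k`, the edge handling, and the function names `delta` and `add_dynamic` are assumptions made for illustration; they are not taken from the paper.

```python
def delta(frames, k=2):
    """Estimate per-frame time derivatives as a least-squares slope over +/-k frames.

    frames: list of equal-length lists of floats (one static feature vector per frame).
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    n = len(frames)
    denom = 2.0 * sum(i * i for i in range(1, k + 1))
    out = []
    for t in range(n):
        d = [0.0] * len(frames[t])
        for i in range(1, k + 1):
            ahead = frames[min(t + i, n - 1)]   # clamp at the last frame
            behind = frames[max(t - i, 0)]      # clamp at the first frame
            for j in range(len(d)):
                d[j] += i * (ahead[j] - behind[j])
        out.append([v / denom for v in d])
    return out


def add_dynamic(frames, k=2):
    """Concatenate each static vector with its delta estimate (heterogeneous set)."""
    return [s + d for s, d in zip(frames, delta(frames, k))]
```

For a feature that grows linearly in time, the interior frames recover the slope exactly, which is a quick sanity check on the estimator.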
Speech Recognition in Noisy Environments
- Ph. D. Dissertation, ECE Department, CMU
, 1996
Cited by 72 (3 self)
Contents: Acknowledgments; Chapter 1 Introduction; 1.1. Thesis goals; 1.2. Dissertation Outline; Chapter 2 The SPHINX-II Recognition System; 2.1. An Overview of the SPHINX-II System; 2.1.1. Signal Processing; 2.1.2. Hidden Markov Models; 2.1.3. Recognition Unit; 2.1.4. Training; 2.1.5. Recognition; 2.2. Experimental Tasks and Corpora ...
Survey of the State of the Art in Human Language Technology
, 1995
Cited by 47 (0 self)
Contents: 1 Spoken Language Input (Ron Cole & Victor Zue, chapter editors); 1.1 Overview (Victor Zue & Ron Cole); 1.2 Speech Recognition (Victor Zue, Ron Cole, & Wayne Ward); 1.3 Signal Representation (Melvyn J. Hunt); 1.4 Robust Speech Recognition (Richard M. Stern); 1.5 HMM Methods in Speech Recognition (Renato De Mori & Fabio Brugnara); 1.6 Language Representation (Salim Roukos); 1.7 Speaker Recognition ...
Bark and ERB Bilinear Transforms
, 1999
Cited by 45 (2 self)
Use of a bilinear conformal map to achieve a frequency warping nearly identical to that of the Bark frequency scale is described. Because the map takes the unit circle to itself, its form is that of the transfer function of a first-order allpass filter. Since it is a first-order map, it preserves the model order of rational systems, making it a valuable frequency warping technique for use in audio filter design. A closed-form weighted-equation-error method is derived which computes the optimal mapping coefficient as a function of sampling rate, and the solution is shown to be generally indistinguishable from the optimal least-squares solution. The optimal Chebyshev mapping is also found to be essentially identical to the optimal least-squares solution. The expression...
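The frequency warping produced by a first-order allpass map can be sketched as follows. The code assumes the map is written as zeta = (z - rho) / (1 - rho * z) with a real coefficient |rho| < 1, and evaluates its phase on the unit circle. The function name `allpass_warp` is hypothetical, and any value of `rho` passed in is arbitrary rather than the optimal coefficient derived in the paper.

```python
import math


def allpass_warp(omega, rho):
    """Warped radian frequency for zeta = (z - rho) / (1 - rho * z) at z = exp(j*omega).

    omega: radian frequency in [0, pi]; rho: real mapping coefficient, |rho| < 1.
    The endpoints 0 and pi map to themselves; rho > 0 stretches low frequencies.
    """
    return omega + 2.0 * math.atan2(rho * math.sin(omega),
                                    1.0 - rho * math.cos(omega))
```

Setting `rho = 0` gives the identity map, and for positive `rho` the warped frequency lies above the input frequency everywhere between the two fixed endpoints.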
A Convex Optimization Approach to the Rational Covariance Extension Problem
- SIAM J. Control Optim
, 1999
Cited by 40 (22 self)
In this paper we present a convex optimization problem for solving the rational covariance extension problem. Given a partial covariance sequence and the desired zeros of the modeling filter, the poles are uniquely determined from the unique minimum of the corresponding optimization problem. In this way we obtain an algorithm for solving the covariance extension problem, as well as a constructive proof of Georgiou's seminal existence result and his conjecture, a stronger version of which we have resolved in [7]. Key words. rational covariance extension, partial stochastic realization, trigonometric moment problem, spectral estimation, speech processing, stochastic modeling. AMS subject classifications. 30ERR 60G35, 62M15, 93A30, 93E0 1.
Computational Auditory Scene Recognition
- In IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing
, 2001
Reducing Audible Spectral Discontinuities
, 2001
Cited by 35 (5 self)
In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we studied the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering consonant contexts that have a similar effect on the surrounding vowels on the basis of the best performing distance measure. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities. Index Terms: Audible discontinuities, context-sensitive diphones, spectral distance measures.
Generalized Stochastic Subdivision
- ACM Transactions on Graphics
, 1987
Cited by 34 (2 self)
This paper describes the basis for techniques such as stochastic subdivision in the theory of random processes and estimation theory. The popular stochastic subdivision construction is then generalized to provide control of the autocorrelation and spectral properties of the synthesized random functions. The generalized construction is suitable for generating a variety of perceptually distinct high-quality random functions, including those with non-fractal spectra and directional or oscillatory characteristics. It is argued that a spectral modeling approach provides a more powerful and somewhat more intuitive perceptual characterization of random processes than does the fractal model. Synthetic textures and terrains are presented as a means of visually evaluating the generalized subdivision technique. Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.7 [Computer Graphics]: Three Dimensional Graphics and Realism
A generalization of blind source separation algorithms for convolutive mixtures based on second-order statistics
- IEEE TRANS. SPEECH AUDIO PROCESSING
, 2005
Cited by 31 (15 self)
In this paper, we present a general broadband approach to blind source separation (BSS) for convolutive mixtures based on second-order statistics. This avoids several known limitations of the conventional narrowband approximation, such as the internal permutation problem. In contrast to traditional narrowband approaches, the new framework simultaneously exploits the nonwhiteness property and nonstationarity property of the source signals. Using a novel matrix formulation, we rigorously derive the corresponding time-domain and frequency-domain broadband algorithms by generalizing a known cost function which inherently allows joint optimization for several time lags of the correlations. Based on the broadband approach, time-domain constraints are obtained which provide a deeper understanding of the internal permutation problem in traditional narrowband frequency-domain BSS. For both the time-domain and the frequency-domain versions, we discuss links to well-known and also to novel algorithms that constitute special cases. Moreover, using the so-called generalized coherence, links between the time-domain and the frequency-domain algorithms can be established, showing that our cost function leads to an update equation with an inherent normalization ensuring a robust adaptation behavior. The concept is applicable to offline, online, and block-online algorithms by introducing a general weighting function allowing for tracking of time-varying real acoustic environments.

