Signal modeling techniques in speech recognition
 Proceedings of the IEEE
, 1993
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in a computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process, so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
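As an illustration of the first trend (mixing absolute spectral features with their time derivatives), the sketch below computes delta coefficients with a common linear-regression formula over a window of ±N frames and appends them to the static features. It is a minimal sketch, not the paper's exact definition: the window half-width `N=2`, the edge handling by frame repetition, and the function names `delta_features` and `heterogeneous_features` are all assumptions for illustration.

```python
def delta_features(frames, N=2):
    """Regression-based time-derivative ("delta") coefficients.

    frames: list of equal-length feature vectors (e.g. cepstra), one per frame.
    N: half-width of the regression window (assumed; 2 is a common choice).
    Edge frames are handled by repeating the first/last frame (assumption).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for n in range(1, N + 1):
            ahead = frames[min(t + n, T - 1)]   # clamp at the last frame
            behind = frames[max(t - n, 0)]      # clamp at the first frame
            for i in range(len(d)):
                d[i] += n * (ahead[i] - behind[i])
        deltas.append([x / denom for x in d])
    return deltas


def heterogeneous_features(frames, N=2):
    """Concatenate absolute (static) features with their deltas, frame by frame."""
    return [list(f) + d for f, d in zip(frames, delta_features(frames, N))]
```

For a feature that rises by one unit per frame, the interior delta is exactly 1.0, so `heterogeneous_features([[0.0], [1.0], [2.0], [3.0], [4.0]])[2]` gives `[2.0, 1.0]`: the static value followed by its slope.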
Speech Recognition in Noisy Environments
 Ph. D. Dissertation, ECE Department, CMU
, 1996
Acknowledgments
Chapter 1 Introduction
  1.1. Thesis goals
  1.2. Dissertation Outline
Chapter 2 The SPHINX-II Recognition System
  2.1. An Overview of the SPHINX-II System
    2.1.1. Signal Processing
    2.1.2. Hidden Markov Models
    2.1.3. Recognition Unit
    2.1.4. Training
    2.1.5. Recognition
  2.2. Experimental Tasks and Corpora ...
Bark and ERB Bilinear Transforms
, 1999
The use of a bilinear conformal map to achieve a frequency warping nearly identical to that of the Bark frequency scale is described. Because the map takes the unit circle to itself, its form is that of the transfer function of a first-order all-pass filter. Since it is a first-order map, it preserves the model order of rational systems, making it a valuable frequency warping technique for use in audio filter design. A closed-form weighted-equation-error method is derived which computes the optimal mapping coefficient as a function of sampling rate, and the solution is shown to be generally indistinguishable from the optimal least-squares solution. The optimal Chebyshev mapping is also found to be essentially identical to the optimal least-squares solution. The expression...
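The frequency mapping induced by a first-order all-pass substitution can be evaluated directly. The sketch below assumes the substitution z⁻¹ → (z⁻¹ − ρ)/(1 − ρz⁻¹) with |ρ| < 1; sign conventions for ρ vary between texts, and the function name `allpass_warp` is hypothetical. It does not compute the Bark-optimal coefficient derived in the paper; ρ is simply a parameter here.

```python
import math


def allpass_warp(omega, rho):
    """Warped frequency for the first-order all-pass substitution
    z^-1 -> (z^-1 - rho) / (1 - rho * z^-1), with |rho| < 1 (assumed convention).

    omega: normalized frequency in radians, 0..pi.
    The map fixes 0 and pi; for rho > 0 its slope at DC is (1 + rho) / (1 - rho) > 1,
    so low frequencies are expanded on the warped axis, qualitatively like Bark.
    """
    return omega + 2.0 * math.atan2(rho * math.sin(omega), 1.0 - rho * math.cos(omega))
```

Because the map is a bijection of [0, π] onto itself, a rational filter designed on the warped axis keeps its order when mapped back, which is the property the abstract highlights.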
Survey of the state of the art in human language technology
 Studies in Natural Language Processing, XII-XIII
, 1997
A Convex Optimization Approach to the Rational Covariance Extension Problem
 SIAM J. Control Optim
, 1999
In this paper we present a convex optimization problem for solving the rational covariance extension problem. Given a partial covariance sequence and the desired zeros of the modeling filter, the poles are uniquely determined from the unique minimum of the corresponding optimization problem. In this way we obtain an algorithm for solving the covariance extension problem, as well as a constructive proof of Georgiou's seminal existence result and of his conjecture, a stronger version of which we have resolved in [7]. Key words: rational covariance extension, partial stochastic realization, trigonometric moment problem, spectral estimation, speech processing, stochastic modeling. AMS subject classifications: 30ERR, 60G35, 62M15, 93A30, 93E0
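The convex optimization itself is beyond a short example, but its best-known special case can be sketched: when all the desired spectral zeros are placed at the origin, the solution of the covariance extension problem is, in the usual formulation, the maximum-entropy (all-pole) filter, which the Levinson-Durbin recursion computes from the partial covariance sequence. The sketch below shows only that special case, not the paper's algorithm; the function names are hypothetical and a positive-definite Toeplitz matrix is assumed.

```python
def levinson_durbin(r):
    """Solve the Toeplitz normal equations for covariances r[0..n].

    Returns (a, err) with a[0] = 1, so A(z) = sum_k a[k] z^-k is the order-n
    whitening filter and err / |A|^2 is the maximum-entropy spectrum.
    Assumes the Toeplitz matrix built from r is positive definite.
    """
    n = len(r) - 1
    a = [1.0] + [0.0] * n
    err = r[0]
    for m in range(1, n + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m                # prediction-error power
    return a, err


def extend_covariance(r, extra):
    """Extend r by `extra` lags using the AR recursion implied by the max-entropy model."""
    a, _ = levinson_durbin(r)
    n = len(r) - 1
    out = list(r)
    for _ in range(extra):
        t = len(out)
        out.append(-sum(a[k] * out[t - k] for k in range(1, n + 1)))
    return out
```

For the partial sequence `[1.0, 0.5, 0.25]` (that of a first-order autoregressive process), the recursion returns `a ≈ [1, -0.5, 0]` with error power 0.75, and the next extended lag is 0.125. Choosing other zero locations gives other valid extensions; that freedom is what the convex formulation parameterizes.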
Computational Auditory Scene Recognition
 In IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing
, 2001
Reducing Audible Spectral Discontinuities
, 2001
In this paper, a common problem in diphone synthesis is discussed, viz., the occurrence of audible discontinuities at diphone boundaries. Informal observations show that spectral mismatch is most likely the cause of this phenomenon. We first set out to find an objective spectral measure for discontinuity. To this end, several spectral distance measures are related to the results of a listening experiment. Then, we study the feasibility of extending the diphone database with context-sensitive diphones to reduce the occurrence of audible discontinuities. The number of additional diphones is limited by clustering, on the basis of the best-performing distance measure, consonant contexts that have a similar effect on the surrounding vowels. A listening experiment has shown that the addition of these context-sensitive diphones significantly reduces the number of audible discontinuities. Index Terms: audible discontinuities, context-sensitive diphones, spectral distance measures.
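To make "objective spectral measure for discontinuity" concrete, the sketch below scores a diphone join by comparing the last frame of the left unit with the first frame of the right unit. The two measures shown (Euclidean cepstral distance excluding c0, and RMS log-spectral distance in dB) are generic examples and not necessarily among those the paper evaluated; the function names are hypothetical.

```python
import math


def cepstral_distance(c1, c2):
    """Euclidean distance between cepstral vectors, ignoring c0 (energy); an assumed choice."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:])))


def log_spectral_distance(p1, p2):
    """RMS difference in dB between two power spectra sampled on the same frequency bins."""
    diffs = [10.0 * math.log10(a / b) for a, b in zip(p1, p2)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def boundary_mismatch(left_frames, right_frames, measure=cepstral_distance):
    """Spectral distance across a join: last frame of the left diphone vs first frame of the right."""
    return measure(left_frames[-1], right_frames[0])
```

In a study like the one described, scores of this kind would be computed for many joins and then correlated with listeners' judgments to select the measure that best predicts audible discontinuities.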
A generalization of blind source separation algorithms for convolutive mixtures based on secondorder statistics
 IEEE Trans. Speech Audio Processing
, 2005
In this paper, we present a general broadband approach to blind source separation (BSS) for convolutive mixtures based on second-order statistics. This avoids several known limitations of the conventional narrowband approximation, such as the internal permutation problem. In contrast to traditional narrowband approaches, the new framework simultaneously exploits the nonwhiteness and nonstationarity properties of the source signals. Using a novel matrix formulation, we rigorously derive the corresponding time-domain and frequency-domain broadband algorithms by generalizing a known cost function which inherently allows joint optimization for several time-lags of the correlations. Based on the broadband approach, time-domain constraints are obtained which provide a deeper understanding of the internal permutation problem in traditional narrowband frequency-domain BSS. For both the time-domain and the frequency-domain versions, we discuss links to well-known and also to novel algorithms that constitute special cases. Moreover, using the so-called generalized coherence, links between the time-domain and the frequency-domain algorithms can be established, showing that our cost function leads to an update equation with an inherent normalization ensuring a robust adaptation behavior. The concept is applicable to off-line, on-line, and block-on-line algorithms by introducing a general weighting function allowing for tracking of time-varying real acoustic environments.
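The cost function family referred to above penalizes correlation between the separated outputs, block by block, so that nonstationarity supplies multiple independent constraints. The sketch below is a deliberately simplified instance for two outputs: it uses only zero-lag (instantaneous) covariances and uniform block weights, whereas the paper's broadband version uses block-diagonal matrices spanning several time-lags and a general weighting function. The per-block term log det(diag R) − log det R is nonnegative by Hadamard's inequality and vanishes only when the outputs are uncorrelated. Function names and the `block_len` parameter are hypothetical.

```python
import math


def _cov2(y1, y2):
    """Zero-lag 2x2 sample covariance (means removed); returns (r11, r12, r22)."""
    n = len(y1)
    m1, m2 = sum(y1) / n, sum(y2) / n
    r11 = sum((u - m1) ** 2 for u in y1) / n
    r12 = sum((u - m1) * (v - m2) for u, v in zip(y1, y2)) / n
    r22 = sum((v - m2) ** 2 for v in y2) / n
    return r11, r12, r22


def sos_cost(y1, y2, block_len):
    """Sum over non-overlapping blocks of log det(diag R) - log det(R).

    Uniform block weights are an assumption. Each term is >= 0 and is 0 only
    when the two outputs are uncorrelated within that block. Assumes R is
    nonsingular in every block.
    """
    total = 0.0
    for s in range(0, len(y1) - block_len + 1, block_len):
        r11, r12, r22 = _cov2(y1[s:s + block_len], y2[s:s + block_len])
        total += math.log(r11 * r22) - math.log(r11 * r22 - r12 * r12)
    return total
```

A separation algorithm would adjust the demixing filters to drive this cost toward zero; with several time-lags included, the same idea also exploits the nonwhiteness of the sources.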
Generalized Stochastic Subdivision
 ACM Transactions on Graphics
, 1987
This paper describes the basis for techniques such as stochastic subdivision in the theory of random processes and estimation theory. The popular stochastic subdivision construction is then generalized to provide control of the autocorrelation and spectral properties of the synthesized random functions. The generalized construction is suitable for generating a variety of perceptually distinct high-quality random functions, including those with non-fractal spectra and directional or oscillatory characteristics. It is argued that a spectral modeling approach provides a more powerful and somewhat more intuitive perceptual characterization of random processes than does the fractal model. Synthetic textures and terrains are presented as a means of visually evaluating the generalized subdivision technique. Categories and Subject Descriptors: I.3.3 [Computer Graphics]: Picture/Image Generation; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism.
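The estimation-theory view of subdivision can be sketched in one dimension: each new midpoint is the linear least-squares estimate from already-generated samples under a chosen autocorrelation, plus Gaussian noise whose variance equals the estimation error. The sketch below assumes an exponential autocorrelation R(τ) = exp(−|τ|/L) and conditions on only the two nearest known neighbours, which keeps the weights in closed form (w = R(d/2)/(R(0)+R(d)), error variance R(0) − 2wR(d/2)) but makes the result only an approximation of the target process; the paper's construction may use a larger neighbourhood. The function name and parameters are hypothetical.

```python
import math
import random


def subdivide(levels, corr_len, seed=0):
    """1-D stochastic subdivision on [0, 1] producing 2**levels + 1 samples.

    Assumed autocorrelation: R(tau) = exp(-|tau| / corr_len), unit variance.
    Each midpoint = least-squares estimate from its two neighbours + noise
    with the matching estimation-error variance.
    """
    rng = random.Random(seed)

    def R(tau):
        return math.exp(-abs(tau) / corr_len)

    n = 2 ** levels
    x = [0.0] * (n + 1)
    # Endpoints drawn jointly Gaussian with correlation R(1).
    x[0] = rng.gauss(0.0, 1.0)
    x[n] = R(1.0) * x[0] + math.sqrt(1.0 - R(1.0) ** 2) * rng.gauss(0.0, 1.0)
    step = n
    while step > 1:
        half = step // 2
        d = step / n                          # spacing between the two known neighbours
        w = R(d / 2) / (R(0.0) + R(d))        # equal weights by symmetry
        var = max(0.0, R(0.0) - 2.0 * w * R(d / 2))
        for i in range(half, n, step):
            x[i] = w * (x[i - half] + x[i + half]) + math.sqrt(var) * rng.gauss(0.0, 1.0)
        step = half
    return x
```

Changing `corr_len` (or swapping in a different autocorrelation, for example an oscillatory one) changes the character of the output, which is the kind of spectral control the abstract contrasts with the purely fractal model.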