Signal modeling techniques in speech recognition
- PROCEEDINGS OF THE IEEE
, 1993
Cited by 99 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
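As an illustration of the first trend (mixing absolute spectral parameters with their time derivatives), the following is a minimal sketch rather than the paper's own formulation. It uses the widely used regression-based delta estimate; the function names, the `half_window` value, and the edge handling (repeating the first and last frame) are assumptions made for this example.

```python
def delta_features(frames, half_window=2):
    """Regression-based time derivative of a sequence of feature vectors.

    frames: list of equal-length lists, one static feature vector per frame.
    half_window: frames used on each side of the current frame (assumed value).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, half_window + 1))
    deltas = []
    for t in range(num_frames):
        delta = [0.0] * len(frames[t])
        for n in range(1, half_window + 1):
            # Clamp indices at the edges by repeating the first/last frame.
            ahead = frames[min(t + n, num_frames - 1)]
            behind = frames[max(t - n, 0)]
            for k in range(len(delta)):
                delta[k] += n * (ahead[k] - behind[k])
        deltas.append([d / denom for d in delta])
    return deltas


def append_deltas(frames, half_window=2):
    """Concatenate each static vector with its delta (a heterogeneous parameter set)."""
    deltas = delta_features(frames, half_window)
    return [static + delta for static, delta in zip(frames, deltas)]
```

For a feature that rises by a constant amount per frame, the interior delta values equal that slope, which is a quick sanity check on the regression weights.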
Multiresolution spectrotemporal analysis of complex sounds
- J Acoust Soc Am
Cited by 17 (1 self)
A computational model of auditory analysis is described that is inspired by psychoacoustical and neurophysiological findings in early and central stages of the auditory system. The model provides a unified multiresolution representation of the spectral and temporal features of sound likely critical in the perception of timbre. Several types of complex stimuli are used to demonstrate the spectrotemporal information extracted and represented by the model. Also outlined are several reconstruction algorithms to resynthesize the sound so as to evaluate the fidelity of the representation and the contribution of different features and cues to the sound percept. Simplified versions of this model's representation have already been used in a variety of applications, as in the assessment of speech intelligibility [Elhilali et al., 2003, Chi et al., 1999] and in explaining the perception of monaural phase sensitivity [Carlyon and Shamma, 2002].
The Use of Distinctive Features for Automatic Speech Recognition
, 1991
Cited by 11 (0 self)
One of the most critical and yet unsolved problems in phonetic recognition is the transformation of the continuous speech signal to a discrete representation for accessing words in the lexicon. In order to find an efficient description of speech for recognition tasks, our research investigates the use of distinctive features. Distinctive features are a small set of linguistic units which have the potential advantage of enabling us to describe contextual and coarticulatory variations in speech more parsimoniously and thus make more effective use of available training data.
Cochlear Models Implemented with Linearized Transconductors
- In IEEE Int. Symp. Circ. and Syst., Atlanta GA
, 1996
Cited by 4 (3 self)
The aim of this work is the efficient implementation of linear continuous-time cochlear models, such as that proposed by Liu [11, 12]. The basic filter element, a transconductance-C integrator with no linearization, is evaluated in terms of dynamic range and power consumption. Linearized transconductors which employ source degeneration via single and multiple diffusors yield no net increase in current noise density, whereas linear range is improved eight and four times, respectively. Experimental results verify this improvement.
Robust auditory-based speech processing using the ALSD
- IEEE Trans. Speech and Audio Proc
, 2002
Cited by 4 (1 self)
Variations on Statistical Phoneme Recognition -- A Hybrid Approach
, 1997
Automatic speech recognition (ASR) is rapidly becoming a mature technology leading to an increasing number of commercial applications. Although great advances have been made in the state of the art of speech recognition over the last 10 years, the holy grail of ASR, namely large-vocabulary, speaker-independent continuous speech recognition with an error rate of less than 1%, still eludes researchers. At the heart of most modern speech recognition systems lies an HMM-based phoneme recognition engine which segments and classifies the incoming acoustic signal into a sequence of phonemes. These phonemes are concatenated to form word models which are processed further to arrive at a transcription of the linguistic message encoded in the speech signal. The final recognition accuracy of the speech recognition system can thus be directly linked to the recognition accuracy of the underlying phoneme recogniser. Two types of features extracted from the speech signal are commonly used for phoneme recognition. These are the supra-segmental knowledge-based features derived from phonetic and phonologic theory, and the widely used frame-based cepstral features. Until now, these features have been used separately by researchers, resulting in the loss of valuable discriminative information.
A Design Framework for Low Power Analog Filter Banks
- IEEE Trans. Circ. and Syst
, 1995
We detail the design of multiresolution analog filter banks, linear models of cochlear function, with power dissipation being a prime engineering constraint. We propose that a reasonable goodness criterion is the information rate through the system per watt of power dissipated. Speech applications requiring filter banks with a wide frequency tuning range, from 20 Hz to 20 kHz, and low power consumption make the transconductance-C integrator in subthreshold CMOS the preferable integrator structure. By way of example, the dynamic range of a lowpass filter is computed and subsequently used to design a filter bank that faithfully models cochlear micro-mechanics. The power consumption of the entire filter bank is computed from analytical expressions and is estimated as 355 nW, at an overall information rate of 68 kbits/sec at the output of the system.
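The goodness criterion proposed in this abstract can be evaluated directly from the two figures it quotes. The short sketch below is only a worked arithmetic example: the function name is hypothetical, and it assumes that 1 kbit means 1000 bits.

```python
def information_rate_per_watt(bits_per_second, watts):
    """Information rate through the system per watt of power dissipated."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return bits_per_second / watts


power_w = 355e-9   # 355 nW, as quoted in the abstract
rate_bps = 68e3    # 68 kbits/sec, assuming 1 kbit = 1000 bits

# Bits per second per watt is the same quantity as bits per joule.
bits_per_joule = information_rate_per_watt(rate_bps, power_w)
print(f"{bits_per_joule:.3e} bits/s per W")  # prints "1.915e+11 bits/s per W"
```

Under these assumptions the figure of merit comes to roughly 1.9 × 10^11 bits per joule.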
A Comparison of Front-Ends for Robust Speech Recognition
, 1998
The zero-crossings with peak amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, the performance of the ZCPA model is further improved by incorporating conventional speech processing techniques into the model output. Spectral and cepstral representations of the ZCPA model output are compared, and the incorporation of dynamic features with several different lengths of time-derivative window is evaluated. Also, comparative evaluations with other front-ends in real-world noisy environments are performed, and they demonstrate the superiority of the ZCPA model.
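The abstract does not spell out how ZCPA features are computed, so the following is only a rough single-channel sketch of the idea as it is commonly described: the interval between successive upward zero crossings gives a frequency estimate, and the peak amplitude within that interval weights the corresponding frequency bin. The function name, the bin edges, and the `log(1 + peak)` compression are assumptions for illustration, and the cochlear filterbank that would precede this step is omitted.

```python
import math


def zcpa_histogram(signal, sample_rate, bin_edges_hz):
    """Accumulate a frequency histogram from one already band-limited channel.

    Each pair of successive upward zero crossings defines an interval whose
    inverse is a frequency estimate. The peak amplitude inside the interval,
    after log compression (an assumed choice), is added to the matching bin.
    """
    histogram = [0.0] * (len(bin_edges_hz) - 1)

    # Indices i where the signal goes from negative at i-1 to non-negative at i.
    crossings = [i for i in range(1, len(signal))
                 if signal[i - 1] < 0.0 <= signal[i]]

    for start, end in zip(crossings, crossings[1:]):
        frequency = sample_rate / (end - start)
        peak = max(signal[start:end])
        for b in range(len(histogram)):
            if bin_edges_hz[b] <= frequency < bin_edges_hz[b + 1]:
                histogram[b] += math.log(1.0 + peak)
                break
    return histogram


# Usage: a 100 Hz tone sampled at 8 kHz should land in the 50-150 Hz bin.
rate = 8000
tone = [math.sin(2.0 * math.pi * 100.0 * i / rate + 0.3) for i in range(800)]
hist = zcpa_histogram(tone, rate, [0.0, 50.0, 150.0, 400.0])
```

A full front-end would run this per filterbank channel and sum the histograms across channels; that step is left out here.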
Practical Considerations for Hardware Implementations of the Auditory Model and Evaluations in Real World Noisy Environments
, 1997
The Zero-Crossings with Peak Amplitudes (ZCPA) model, motivated by the human auditory periphery, was proposed to extract reliable features from speech signals even in noisy environments for robust speech recognition. In this paper, some practical considerations for digital hardware implementations of the ZCPA model are addressed and evaluated for recognition of speech corrupted by several real-world noises as well as white Gaussian noise. The infinite impulse response (IIR) filters which constitute the cochlear filterbank of the ZCPA are replaced by Hamming bandpass filters, whose frequency responses are less similar to biological neural tuning curves. Experimental results demonstrate that the detailed frequency response of the cochlear filters is not critical to the performance. Also, the sensitivity of the model output to variations in microphone gain is investigated, and the ZCPA model is found to be reliable in this respect.
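The abstract does not say how the Hamming bandpass filters were designed. One plausible reading, assumed here, is a windowed-sinc FIR design with a Hamming window; the sketch below shows that technique only. The function names, tap count, and band edges are hypothetical values chosen for the example.

```python
import math


def hamming_bandpass(num_taps, low_hz, high_hz, sample_rate):
    """Windowed-sinc bandpass FIR coefficients using a Hamming window."""
    def sinc(x):
        return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

    f_low = low_hz / sample_rate    # normalized to cycles per sample
    f_high = high_hz / sample_rate
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        m = n - center
        # Ideal bandpass = difference of two ideal lowpass responses.
        ideal = (2.0 * f_high * sinc(2.0 * f_high * m)
                 - 2.0 * f_low * sinc(2.0 * f_low * m))
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def gain_at(taps, freq_hz, sample_rate):
    """Magnitude of the filter's frequency response at a single frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)


# Usage: an 800-1600 Hz channel at an 8 kHz sampling rate (example values).
taps = hamming_bandpass(101, 800.0, 1600.0, 8000.0)
passband_gain = gain_at(taps, 1200.0, 8000.0)
stopband_gain = gain_at(taps, 3000.0, 8000.0)
```

Such a filter has linear phase and a symmetric impulse response, which is one way its frequency response differs from the asymmetric tuning curves an IIR cochlear filter can approximate.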
Development of a Computational Auditory Model
, 1991
This report is a summary of the work which I performed on cochlear modeling within the framework of a two-year cooperation between ESAT-KULeuven and IPO-Eindhoven. This report gives a detailed overview of the development of a computational auditory model and of the obstacles that one can expect on the road towards it. For the casual reader some of the mathematics in it will be painful, but I thought it necessary to include as much detail as possible so that this work can serve as a good technical reference for further development. This report should be considered as a writeup on work in progress. Nevertheless, the chapters on cochlear filterbanks and adaptation have reached a more or less finished form, while on the other hand the chapter on data representation and post-processing leaves many questions unanswered. I hope to be able to continue work on this topic and present a more conclusive report at some point in the future.

