Results 1-10 of 26
Signal modeling techniques in speech recognition
Proceedings of the IEEE, 1993
Cited by 99 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process, so that more sophisticated statistical models of the signal’s spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
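The first trend above, mixing absolute spectral parameters with their time derivatives, is commonly realized by appending regression-based delta coefficients to each static feature vector. The sketch below assumes the widely used linear-regression formula over N neighbouring frames on each side, d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), with the first and last frames repeated at the edges. The function name `delta_features` and the default `window=2` are illustrative choices, not taken from the paper.

```python
def delta_features(frames, window=2):
    """Append regression-based delta (time-derivative) coefficients to each frame.

    frames: list of equal-length lists, one static feature vector per frame.
    window: N, the number of neighbouring frames used on each side.
    Edge handling (repeating the first/last frame) is an assumption of this sketch.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        delta = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            delta.append(acc / denom)
        # Heterogeneous vector: absolute (static) part followed by the dynamic part.
        out.append(list(frames[t]) + delta)
    return out
```

For a linear ramp of static values the interior deltas equal the slope, while the repeated edge frames roughly halve the estimate at the boundaries.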
Robust Feature-Estimation and Objective Quality Assessment for Noisy Speech Recognition using the Credit Card Corpus
1994
Cited by 7 (2 self)
It is well known that the introduction of acoustic background distortion into speech causes recognition algorithms to fail. In order to improve the environmental robustness of speech recognition in adverse conditions, a novel constrained-iterative feature-estimation algorithm, which was previously formulated for speech enhancement, is considered and shown to produce improved feature characterization in a variety of actual noise conditions such as computer fan, large crowd, and voice communications channel noise. In addition, an objective-measure-based MAP estimator is formulated as a means of predicting changes in robust recognition performance at the speech feature extraction stage. The four measures considered are (i) NIST SNR, (ii) Itakura-Saito log-likelihood, (iii) log-area-ratio, and (iv) the weighted-spectral slope measure. A continuous-distribution, monophone-based hidden Markov model recognition algorithm is used for objective-measure-based MAP estimator analysis and reco...
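Of the four objective measures listed above, the log-area-ratio is the easiest to state compactly. A minimal sketch, assuming each frame has already been reduced to P reflection (PARCOR) coefficients k_j with |k_j| < 1, computes d = sqrt((1/P) * sum_j (g_j - g'_j)^2) with g_j = log((1 + k_j) / (1 - k_j)). The function name is hypothetical, and utterance-level scores would typically average these frame distances. Some texts define the area ratio as (1 - k)/(1 + k); that only flips the sign of every g_j and leaves the distance unchanged.

```python
import math


def log_area_ratio_distance(k_clean, k_test):
    """Frame-level log-area-ratio (LAR) distance between two sets of
    reflection (PARCOR) coefficients of the same order P.

    Every coefficient must satisfy |k| < 1 (a stable all-pole model).
    """
    if len(k_clean) != len(k_test):
        raise ValueError("coefficient vectors must have the same order")

    def lar(k):
        # Log of the area ratio between adjacent lossless-tube sections.
        return math.log((1.0 + k) / (1.0 - k))

    p = len(k_clean)
    mean_sq = sum((lar(a) - lar(b)) ** 2 for a, b in zip(k_clean, k_test)) / p
    return math.sqrt(mean_sq)
```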
Morphological Constrained Feature Enhancement with Adaptive Cepstral Compensation (MCE-ACC) for Speech Recognition in Noise and Lombard Effect
1994
Cited by 4 (3 self)
The use of present-day speech recognition techniques in many practical applications has demonstrated the need for improved algorithm formulation under varying acoustical environments. This paper describes a low-vocabulary speech recognition algorithm which provides robust performance in noisy environments, with particular emphasis on characteristics due to the Lombard effect. A neutral and stressed based source generator framework is established to achieve improved speech parameter characterization using a morphological constrained enhancement algorithm and stressed source compensation which is unique for each source generator across a stressed speaking class. The algorithm uses a noise-adaptive boundary detector to obtain a sequence of source generator classes, which is used to direct noise parameter enhancement and stress compensation. This allows the parameter enhancement and stress compensation schemes to adapt to changing speech generator types. A phonetic consistency rule is also em...
Noisy Audio Feature Enhancement Using Audio-Visual Speech Data
Cited by 4 (0 self)
We investigate improving automatic speech recognition (ASR) in noisy conditions by enhancing noisy audio features using visual speech captured from the speaker's face. The enhancement is achieved by applying a linear filter to the concatenated vector of noisy audio and visual features, obtained by mean square error estimation of the clean audio features in a training stage. The performance of the enhanced audio features is evaluated on two ASR tasks: a connected-digits task and speaker-independent, large-vocabulary, continuous speech recognition. In both cases and at sufficiently low signal-to-noise ratios (SNRs), ASR trained on the enhanced audio features significantly outperforms ASR trained on the noisy audio, achieving, for example, a 46% relative reduction in word error rate on the digits task at -3.5 dB SNR. However, the method fails to capture the full visual modality benefit to ASR, as demonstrated by its comparison to discriminant audio-visual feature fusion introduced in previous work.
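The enhancement described above is a linear map from the concatenated noisy-audio and visual vector to the clean audio features, fitted by minimum mean square error on training data. A minimal sketch of that idea, assuming an appended bias term and solving the least-squares normal equations directly in pure Python, is shown below; the helper names `solve`, `fit_av_enhancer` and `enhance` are hypothetical, and a practical system would use many frames, higher-dimensional features and a numerical library.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_av_enhancer(noisy_audio, visual, clean_audio):
    """Least-squares (MMSE) fit of a linear map [audio; visual; 1] -> clean audio.

    Each argument is a list of per-frame feature lists.
    Returns one weight row per clean-audio dimension.
    """
    x = [a + v + [1.0] for a, v in zip(noisy_audio, visual)]
    dim = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(dim)] for i in range(dim)]
    weights = []
    for d in range(len(clean_audio[0])):
        xty = [sum(row[i] * y[d] for row, y in zip(x, clean_audio)) for i in range(dim)]
        weights.append(solve(xtx, xty))
    return weights


def enhance(weights, audio, vis):
    """Apply the fitted filter to one frame of noisy audio and visual features."""
    z = audio + vis + [1.0]
    return [sum(w * zi for w, zi in zip(row, z)) for row in weights]
```

Because the filter sees both streams at once, the visual features can correct audio dimensions in proportion to how well they predict the clean audio on the training data.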
Analog circuit implementation for speech enhancement purposes
38th Asilomar Conference on Signals, Systems and Computers, 2004
Cited by 2 (1 self)
Human speech is the main method for personal communication. However, interfering noise can degrade the intelligibility of speech, eventually resulting in errors. Thus, efficient speech enhancement algorithms are needed, for example, in handheld, battery-powered hearing aids. This paper presents an implementation of a time-domain method for speech enhancement: the Adaptive Gain Equalizer. The implementation is carried out on a printed circuit board using common analog electronic components and evaluated in real time. The proposed solution benefits from high system bandwidth, neither quantizes nor digitizes data, and is likely to consume power more efficiently than many Digital Signal Processor (DSP) based solutions. The evaluation demonstrates the speech enhancement performance of the analog circuit implementation.
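The abstract names the Adaptive Gain Equalizer without describing it. As it is commonly described, the method splits the signal into subbands and, in each one, applies a gain given by the ratio of a fast short-term magnitude average to a slowly rising noise-floor estimate, so that speech bursts are emphasized relative to stationary noise. The sketch below covers a single subband only and is a discrete-time illustration, not a model of the analog circuit; the smoothing constant `alpha`, floor growth rate `beta`, exponent `p` and gain cap `limit` are assumed, illustrative values, and the filter bank and subband recombination are omitted.

```python
def age_subband(x, alpha=0.01, beta=0.001, p=1.0, limit=4.0, eps=1e-12):
    """Single-subband sketch of an adaptive gain equalizer (assumed structure).

    x: list of subband samples. Returns (output samples, per-sample gains).
    """
    a = abs(x[0])        # short-term average of the magnitude
    b = max(a, eps)      # noise-floor estimate, initialised from the first sample
    out, gains = [], []
    for s in x:
        a = (1.0 - alpha) * a + alpha * abs(s)
        # The floor creeps upward slowly while the signal is above it,
        # but drops at once to the short-term level otherwise.
        b = b * (1.0 + beta) if a > b else max(a, eps)
        g = min((a / b) ** p, limit)  # gain above 1 only when the level exceeds the floor
        gains.append(g)
        out.append(g * s)
    return out, gains
```

For a constant-level input the floor tracks the short-term average and the gain stays near 1; a sudden louder segment raises the short-term average faster than the floor, so the gain rises until it reaches `limit`.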
Speech Enhancement Implementations in the Digital, Analog, and Hybrid Domain
Cited by 2 (1 self)
The general quality of speech or important speech parameters such as the intelligibility, clearness or naturalness of speech can be emphasized by signal processing. Such processing for improving speech quality can be found in telecommunication applications, e.g. mobile telephony, internet telephony or personal intercom. By careful selection of the domain for realization, i.e. digital, analog, or hybrid, implementation-specific benefits can be utilized to increase the speech quality or performance. This paper stresses some key characteristics of the three implementation domains with emphasis on speech enhancement applications. A robust, low-complexity speech enhancement algorithm will be highlighted to illustrate the advantages (and disadvantages) of a purely digital, a purely analog, and a hybrid digital-analog implementation.
Information fusion for subband-HMM speaker recognition
In: Proc
Cited by 1 (0 self)
Previous work has demonstrated the performance gains that can be obtained in speaker recognition by applying subband processing, together with hidden Markov modelling and multiple classifier recombination. Two recombination rules have been investigated: the sum of log likelihoods, which corresponds to the optimal Bayes' rule under certain constraints, and multilayer perceptrons (MLP), which are not subject to these constraints. It was found that for two spoken digits in the presence of a single case of narrowband noise, the sum of log likelihoods and MLP achieved comparable performance. In this paper, the previous work is extended by investigating the robustness of the recognition system to different narrowband noise. Two approaches are taken towards this aim. Firstly, narrowband noise is added at different centre frequencies. Secondly, a Bayesian MLP approach is investigated using automatic relevance determination (ARD) on the subband inputs to the MLP. From this it is possible to assess the relative importance of the subbands to recognition performance. Results for the new noise conditions show that the sum of log likelihoods generally does better than the (average) MLP fusion.
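The sum-of-log-likelihoods rule above can be illustrated in a few lines. Summing per-subband log-likelihoods is the same as multiplying the subband likelihoods, which gives the Bayes decision when, for example, the subband observations are conditionally independent given the speaker and all speakers have equal priors. The sketch below assumes each subband classifier has already produced a log-likelihood per candidate speaker; the function name and the score layout are hypothetical.

```python
def fuse_sum_loglik(subband_scores):
    """Sum-of-log-likelihoods fusion across subband classifiers.

    subband_scores: dict mapping subband name -> dict of speaker -> log-likelihood.
    Returns (best speaker, dict of fused scores).
    """
    # Only speakers scored by every subband classifier can be compared fairly.
    speakers = set.intersection(*(set(s) for s in subband_scores.values()))
    fused = {spk: sum(scores[spk] for scores in subband_scores.values())
             for spk in speakers}
    best = max(fused, key=fused.get)
    return best, fused
```

Usage: with scores `{"low": {"A": -10.0, "B": -12.0}, "mid": {"A": -20.0, "B": -15.0}, "high": {"A": -5.0, "B": -6.0}}` the fused totals are -35.0 for A and -33.0 for B, so B is chosen even though A wins in two of the three subbands.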
A Gaussian Mixture Model Spectral Representation for Speech Recognition
Cited by 1 (0 self)
Most modern speech recognition systems use either Mel-frequency cepstral coefficients or perceptual linear prediction as acoustic features. Recently, there has been some interest in alternative speech parameterisations based on formant features. Formants are the resonant frequencies of the vocal tract, which form the characteristic shape of the speech spectrum. However, formants are difficult to estimate reliably and robustly from the speech signal and in some cases may not be clearly present. Rather than estimating the resonant frequencies, formant-like features can be used instead. Formant-like features use the characteristics of the spectral peaks to represent the spectrum. In this work, novel features are developed based on estimating a Gaussian mixture model (GMM) from the speech spectrum. This approach has previously been used successfully in a speech codec. The EM algorithm is used to estimate the parameters of the GMM. The extracted parameters (the means, standard deviations and component weights) can be related to the formant locations, bandwidths and magnitudes. As the features directly represent the linear spectrum, it is possible to apply techniques for vocal tract length normalisation and additive noise
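Fitting a GMM to a spectrum, as described above, can be done by treating the normalised magnitude spectrum as a probability mass over frequency bins and running weighted EM. The sketch below is a minimal one-dimensional version under that assumption: the function name, the initialisation supplied by the caller, the iteration count and the variance floor are all hypothetical choices, and the resulting means, standard deviations and weights play the roles of formant-like locations, bandwidths and magnitudes.

```python
import math


def fit_spectral_gmm(spectrum, means, stds, weights, iters=50, var_floor=1e-2):
    """Fit a 1-D GMM over frequency-bin index to a non-negative spectrum by EM.

    The spectrum is normalised to sum to 1 and used as per-bin weights.
    Returns (means, standard deviations, component weights).
    """
    total = sum(spectrum)
    mass = [s / total for s in spectrum]
    bins = range(len(spectrum))
    means, weights = list(means), list(weights)
    variances = [s * s for s in stds]
    k = len(means)
    for _ in range(iters):
        # E-step: responsibility of each component for each bin (log domain for stability).
        resp = []
        for f in bins:
            logp = [
                math.log(weights[j])
                - 0.5 * math.log(2.0 * math.pi * variances[j])
                - (f - means[j]) ** 2 / (2.0 * variances[j])
                for j in range(k)
            ]
            top = max(logp)
            e = [math.exp(lp - top) for lp in logp]
            z = sum(e)
            resp.append([v / z for v in e])
        # M-step: updates weighted by the spectral mass in each bin.
        for j in range(k):
            nj = max(sum(mass[f] * resp[f][j] for f in bins), 1e-12)
            means[j] = sum(mass[f] * resp[f][j] * f for f in bins) / nj
            var = sum(mass[f] * resp[f][j] * (f - means[j]) ** 2 for f in bins) / nj
            variances[j] = max(var, var_floor)
            weights[j] = nj
    return means, [math.sqrt(v) for v in variances], weights
```

Because the spectral magnitudes act as sample weights, a stronger spectral peak receives a larger component weight, which is what allows the weights to be read as peak magnitudes.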
Efficient high-order hidden Markov modelling
Proceedings of the International Conference on Spoken Language Processing, 1998
Cited by 1 (0 self)
Currently, first-order hidden Markov models (HMMs) form the backbone around which most automatic speech processing applications are built. Their higher-order extensions are known to be more powerful, but, due to their complexity and computational demands, they are seldom used. It is the purpose of this work to advance their application. In this work we unify HMMs of all orders by deriving and proving the ORder rEDucing (ORED) algorithm. This algorithm will reduce any higher-order HMM (also mixed-order) to an equivalent first-order representation. This makes it possible to process any higher-order HMM using known first-order algorithms, thereby
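The abstract does not give the ORED derivation itself. For orientation, the classical construction behind reducing a second-order HMM to first order treats each ordered state pair (i, j) as one composite state, allows the transition (i, j) to (j, k) with probability a_ijk, sets every other transition to zero, and lets the composite state emit with state j's output distribution. The sketch below shows only that standard pair expansion for the transition matrix; it is a swapped-in illustration, not the dissertation's ORED algorithm, and the function name is hypothetical.

```python
def second_to_first_order(a2):
    """Expand a second-order transition tensor into a first-order matrix over state pairs.

    a2[i][j][k] = P(q_t = k | q_{t-2} = i, q_{t-1} = j) for an N-state model.
    Returns (pairs, A), where A[r][c] is the probability of moving from composite
    state pairs[r] = (i, j) to pairs[c] = (j', k); it is zero unless j' == j.
    """
    n = len(a2)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    index = {pair: idx for idx, pair in enumerate(pairs)}
    A = [[0.0] * len(pairs) for _ in pairs]
    for (i, j), row in zip(pairs, A):
        for k in range(n):
            # Only pairs that share the middle state j are reachable.
            row[index[(j, k)]] = a2[i][j][k]
    return pairs, A
```

The expanded model has N^2 states, which is why efficient handling of higher-order models matters in practice; each row still sums to one, so standard first-order algorithms apply unchanged.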
Markov Model Based Phoneme Class Partitioning for Improved Constrained Iterative Speech Enhancement
1995
Cited by 1 (1 self)
Research has shown that degrading acoustic background noise influences speech quality across phoneme classes in a non-uniform manner. This results in variable quality performance of many speech enhancement algorithms in noisy environments. A phoneme classification procedure is proposed which directs single-channel constrained speech enhancement. The procedure performs broad phoneme class partitioning of noisy speech frames using a continuous mixture hidden Markov model recognizer in conjunction with a perceptually motivated cost-based decision process. Once noisy speech frames are identified, iterative speech enhancement based on all-pole parameter estimation with inter- and intra-frame spectral constraints is employed. The phoneme class directed enhancement algorithm is evaluated using TIMIT speech data and shown to result in substantial improvement in objective speech quality over a range of signal-to-noise ratios and individual phoneme classes. Mail All Correspondence To: Prof. John...

