A vector Taylor series approach for environment-independent speech recognition
 Proc. ICASSP-96
, 1996
Cited by 100 (19 self)
In this paper we introduce a new analytical approach to environment compensation for speech recognition. Previous attempts at solving analytically the problem of noisy speech recognition have either used an overly simplified mathematical description of the effects of noise on the statistics of speech or they have relied on the availability of large environment-specific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously recorded or “stereo” recordings of clean and degraded speech. In this work we introduce the use of a Vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel. The VTS approach is computationally efficient. It can be applied either to the incoming speech feature vectors, or to the statistics representing these vectors. In the first case the speech is compensated and then recognized; in the second case HMM statistics are modified using the VTS formulation. Both approaches use only the actual speech segment being recognized to compute the parameters required for environmental compensation. We evaluate the performance of two implementations of VTS algorithms using the CMU SPHINX-II system on the 100-word alphanumeric CENSUS database and on the 1993 5000-word ARPA Wall Street Journal database. Artificial white Gaussian noise is added to both databases. The VTS approaches provide significant improvements in recognition accuracy compared to previous algorithms.
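The abstract above does not state the environment model, so the following is only a minimal sketch under an assumption: that in the log-spectral domain, for one filterbank channel, degraded speech is y = x + h + log(1 + exp(n - x - h)), with clean speech x, additive noise n, and channel h. The function names are hypothetical. The sketch shows how a first-order Taylor expansion around the means turns a Gaussian on x and a Gaussian on n into an approximate mean and variance for y.

```python
import math


def mismatch(x, n, h):
    """Degraded log-spectrum y for clean speech x, noise n, channel h (assumed model)."""
    return x + h + math.log1p(math.exp(n - x - h))


def vts_first_order(mu_x, var_x, mu_n, var_n, h):
    """Approximate mean and variance of y with a first-order Taylor expansion
    around (mu_x, mu_n), treating x and n as independent scalar Gaussians."""
    # dy/dx at the expansion point; dy/dn is then 1 - g.
    g = 1.0 / (1.0 + math.exp(mu_n - mu_x - h))
    mu_y = mismatch(mu_x, mu_n, h)
    var_y = g * g * var_x + (1.0 - g) ** 2 * var_n
    return mu_y, var_y


if __name__ == "__main__":
    # Equal speech and noise levels: both contribute equally to the variance.
    print(vts_first_order(0.0, 1.0, 0.0, 1.0, 0.0))
    # High SNR: the result stays close to the clean-speech statistics.
    print(vts_first_order(10.0, 2.0, -10.0, 3.0, 0.0))
```

In this toy form, the same approximated statistics could be used either to correct incoming features or to shift model parameters, which mirrors the two uses the abstract describes.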
Cepstral compensation by polynomial approximation for environment-independent speech recognition
 In International Conference on Spoken Language Processing
, 1996
Cited by 9 (2 self)
Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One possible solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, it is extremely difficult to do this analytically. Previously attempted analytical solutions to the problem of noisy speech recognition have either used an overly simplified mathematical description of the effects of noise on the statistics of speech, or they have relied on the availability of large environment-specific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously recorded or “stereo” recordings of clean and degraded speech. In this paper we introduce an approximation-based method to compute the effects of the environment on the parameters of the PDF of clean speech. In this work, we perform compensation by Vector Polynomial approximationS (VPS) for the effects of linear filtering and additive noise on the clean speech. We also estimate the parameters of the environment, namely the noise and the channel, by using piecewise-linear approximations of these effects. We evaluate the performance of this method (VPS) using the CMU SPHINX-II system and the 100-word alphanumeric CENSUS database. Performance is evaluated at several SNRs, with artificial white Gaussian noise added to the database. VPS provides improvements of up to 15 percent in relative recognition accuracy.
Cepstral compensation using statistical linearization.
 ESCA-NATO TUTORIAL AND RESEARCH WORKSHOP IN ROBUST SPEECH RECOGNITION USING UNKNOWN COMMUNICATION CHANNELS
, 1997
Cited by 3 (1 self)
Speech recognition systems perform poorly on speech degraded by even simple effects such as linear filtering and additive noise. One solution to this problem is to modify the probability density function (PDF) of clean speech to account for the effects of the degradation. However, even for the case of linear filtering and additive noise, it is extremely difficult to do this analytically. Previously attempted analytical solutions for the problem of noisy speech recognition have either used an overly simplified mathematical description of the effects of noise on the statistics of speech, or they have relied on the availability of large environment-specific adaptation sets. In this paper we present the Vector Polynomial approximationS (VPS) method to compensate for the effects of linear filtering and additive noise on the PDF of clean speech. VPS also estimates the parameters of the environment, namely the noise and the channel, by using statistically linearized approximations of these effects. We evaluate the performance of this method (VPS) using the CMU SPHINX-II system on the alphanumeric CENSUS database corrupted with artificial white Gaussian noise. VPS provides improvements of up to 15 percent in relative recognition accuracy over our previous best algorithm, VTS, while being up to 20 percent more computationally efficient.
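As a rough illustration of what "statistically linearized" can mean, the sketch below fits a line a + b*z to a nonlinearity in the mean-square sense under a Gaussian on z, and compares it with a first-order Taylor line. The choice of nonlinearity, log(1 + exp(z)), is an assumption standing in for the log-domain noise term; the abstract does not give the function, and none of the names below come from the paper.

```python
import math


def softplus(z):
    """log(1 + exp(z)), computed stably for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def gaussian_expectation(func, mu, sigma, steps=4000, width=8.0):
    """E[func(z)] for z ~ N(mu, sigma^2), by the midpoint rule on mu +/- width*sigma."""
    lo = mu - width * sigma
    dz = 2.0 * width * sigma / steps
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * dz
        total += func(z) * math.exp(-0.5 * ((z - mu) / sigma) ** 2) / norm * dz
    return total


def statistical_linearization(mu, sigma):
    """Return (a, b) minimizing E[(softplus(z) - a - b*z)^2] for z ~ N(mu, sigma^2)."""
    mean_f = gaussian_expectation(softplus, mu, sigma)
    cov_fz = gaussian_expectation(
        lambda z: (softplus(z) - mean_f) * (z - mu), mu, sigma)
    b = cov_fz / (sigma * sigma)  # slope = Cov(f(z), z) / Var(z)
    a = mean_f - b * mu           # intercept matches the mean of f(z)
    return a, b


def taylor_linearization(mu):
    """First-order Taylor expansion of softplus around mu, as (intercept, slope)."""
    slope = 1.0 / (1.0 + math.exp(-mu))
    return softplus(mu) - slope * mu, slope


if __name__ == "__main__":
    print(statistical_linearization(3.0, 1.5))
    print(taylor_linearization(3.0))
```

The Taylor line depends only on the expansion point, whereas the statistically fitted line also depends on the spread of z, so the two differ more as the variance grows.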
Modelling, estimating and compensating low-bit rate coding distortion in speech recognition
 IEEE Trans. on SAP
, 2002
Cited by 2 (2 self)
A solution to the problem of speech recognition with signals distorted by low-bit rate coders is presented in this paper. A model for the coding-decoding distortion, an HMM compensation method to include this model, and an EM-based adaptation algorithm to estimate this distortion are proposed here. Medium vocabulary continuous-speech speaker-independent recognition experiments with 8 kbps G.729 (CS-CELP), 13 kbps RPE-LTP (GSM), 5.3 kbps G.723.1, 4.8 kbps FS-1016 and 32 kbps G.726 (ADPCM) coders show that the approach described in this paper is able to dramatically reduce the effect of the coding distortion and, in some cases, gives a word accuracy higher than the baseline system with uncoded speech. Finally, the EM estimation algorithm requires only one adapting utterance and the approach described is certainly
The evolution and popularity of cellular and TCP/IP networks have created the problem of improving the recognition accuracy for speech distorted by low-bit rate coders. The distortion of coding schemes in speech recognizers is difficult to model and is an open problem that cannot be solved by applying conventional noise cancelling techniques [1] such as spectral subtraction [2], cepstral mean subtraction [3] and RASTA
Approaches to Environment Compensation in Automatic Speech Recognition
 Proceedings of the 1995 International Conference in Acoustics ICA'95
, 1995
Cited by 1 (1 self)
This paper describes a series of cepstral-based compensation procedures that render the SPHINX-II continuous speech recognition system more robust with respect to acoustical changes in the environment. The first two algorithms, SNR-based MultivaRiate gAussian-based cepsTral normaliZation (SNR-based RATZ) and STAtistical Reestimation of HMMs (STAR), compensate for environmental degradation based on comparisons of simultaneously recorded data in the training and testing environments (“stereo data”). They differ in that RATZ modifies the incoming feature vectors to a recognition system while STAR modifies the internal representation of speech by the system. We also describe NCDCN, an improved version of codeword-dependent cepstral normalization (CDCN) which does not require stereo training data but nevertheless achieves performance levels comparable to RATZ and other algorithms that require stereo training. Use of these compensation algorithms significantly reduces the error rates for SPHINX-II. The algorithms are tested in a variety of databases and environmental conditions.
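The contrast drawn above between correcting the features and correcting the model can be shown with a deliberately simplified toy. This is not RATZ or STAR: it assumes a single global shift between environments, estimated from paired ("stereo") frames, and every function name is hypothetical. The feature-domain route subtracts the shift from degraded frames; the model-domain route adds it to a clean-speech mean.

```python
def mean_vector(frames):
    """Component-wise mean of a list of equal-length feature vectors."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def estimate_shift(clean_frames, degraded_frames):
    """Average per-dimension difference between paired degraded and clean frames."""
    diffs = [[d - c for c, d in zip(clean, degraded)]
             for clean, degraded in zip(clean_frames, degraded_frames)]
    return mean_vector(diffs)


def compensate_features(degraded_frames, shift):
    """Feature-domain route: move degraded frames back toward the clean space."""
    return [[d - s for d, s in zip(frame, shift)] for frame in degraded_frames]


def compensate_model_mean(clean_mean, shift):
    """Model-domain route: move a clean-speech Gaussian mean toward the degraded space."""
    return [m + s for m, s in zip(clean_mean, shift)]


if __name__ == "__main__":
    clean = [[1.0, 2.0], [3.0, 4.0]]
    degraded = [[1.5, 1.0], [3.5, 3.0]]
    shift = estimate_shift(clean, degraded)
    print(shift)
    print(compensate_features(degraded, shift))
    print(compensate_model_mean(mean_vector(clean), shift))
```

With a single shift the two routes are mirror images of each other; the algorithms in the abstract use richer statistics, so this only illustrates where the correction is applied.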
Continuous Recognition of Large-Vocabulary Telephone-Quality Speech
 Proceedings of the ARPA Workshop on Spoken Language Technology
, 1994
Cited by 1 (1 self)
The problem of speech recognition over telephone lines is growing in importance, as many near-term applications of spoken-language processing are likely to involve telephone speech. This paper describes recent efforts by the CMU speech group to improve the recognition accuracy of telephone-channel speech, particularly in the context of the 1994 ARPA common Hub 2 evaluation of speech over long-distance telephone lines. The greatest amount of work was directed toward determining a training procedure that provides the greatest recognition accuracy when the incoming speech is known to be collected over the telephone. We compare the effectiveness of three training procedures, finding that training using high-quality speech that is band-limited to 8 kHz can achieve results that are as good as those obtained by training on speech of a similar bandwidth collected over actual telephone channels. We also compare the recognition accuracy of the SPHINX-II system using high-quality speech and telephone speech, and we comment on the reasons for differences in system performance.
STRUCTURE-BASED COMPENSATION USING AN IMPROVED STATISTICAL LINEAR APPROXIMATION FOR MANDARIN SPEECH RECOGNITION OVER TELEPHONE
In this paper, a Vector Piecewise Polynomial (VPP) approximation algorithm is proposed for robust speech recognition in telecommunication environments. The method is formulated in a statistical framework in order to perform the optimal compensation of noise effects given the observed noisy speech, a model describing the statistics of the speech recorded in a clean reference environment, and an estimate of the noisy recognition environment. The VPP algorithm is an extension of P. J. Moreno’s Vector Taylor Series (VTS) approximations for dealing with the distortion due to channel effects and background noise. We use a piecewise polynomial, namely two linear polynomials and a quadratic polynomial, to approximate the environment function (f(v)). Moreno replaced f(v) by its vector Taylor series approximation. It is well known that VTS is not precise if the variables (v) are not close to the Taylor expansion points (v0). The VPP algorithm can overcome this defect. In addition, VPP estimates the parameters of the environment by the expectation-maximization (EM) algorithm. Experimental results are presented in the paper on the application of this approach to improving the performance of Mandarin large vocabulary continuous speech recognition (LVCSR) degraded by different transmission channels (such as fixed telephone line and GSM) and background noise. The proposed VPP algorithm is found to converge fast. The method can reduce the average character error rate (CER) by about 12%.
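The point about Taylor expansions losing accuracy away from v0 can be seen in a small scalar sketch. It assumes, purely for illustration, that the environment function is f(v) = log(1 + exp(v)); the abstract does not define f(v), and the breakpoints and quadratic used here are one convenient choice, not the paper's. Two linear pieces (0 and v) cover the tails and a quadratic joins them smoothly in the middle.

```python
import math

# Assumed half-width of the quadratic segment; chosen so the fit equals log(2) at v = 0.
HALF_WIDTH = 4.0 * math.log(2.0)


def softplus(v):
    """Reference nonlinearity log(1 + exp(v)), computed stably."""
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))


def piecewise_approx(v, c=HALF_WIDTH):
    """Two linear pieces joined by a quadratic with matching value and slope at -c and c."""
    if v <= -c:
        return 0.0
    if v >= c:
        return v
    return (v + c) ** 2 / (4.0 * c)


def taylor_approx(v, v0):
    """First-order Taylor expansion of softplus around the point v0."""
    slope = 1.0 / (1.0 + math.exp(-v0))
    return softplus(v0) + slope * (v - v0)


def max_abs_error(approx, lo=-10.0, hi=10.0, steps=2001):
    """Largest absolute deviation from softplus on an even grid over [lo, hi]."""
    worst = 0.0
    for i in range(steps):
        v = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(approx(v) - softplus(v)))
    return worst


if __name__ == "__main__":
    print(max_abs_error(piecewise_approx))
    print(max_abs_error(lambda v: taylor_approx(v, 0.0)))
```

On this grid the piecewise fit stays uniformly close to the reference, while the single Taylor line around v0 = 0 drifts by several units at the ends of the range.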