## Connected Digit Recognition Using Statistical Template Matching (1995)

Venue: Proc. 1995 Europ. Conf. on Speech Communication and Technology

Citations: 10 (9 self)

### BibTeX

@INPROCEEDINGS{Welling95connecteddigit,
  author = {L. Welling and H. Ney and A. Eiden and C. Forbrig},
  title = {Connected Digit Recognition Using Statistical Template Matching},
  booktitle = {Proc. 1995 Europ. Conf. on Speech Communication and Technology},
  year = {1995},
  pages = {1483--1486}
}

### Abstract

In this paper we describe the optimization of 'conventional' template matching techniques for connected digit recognition (TI/NIST connected digit corpus). In particular, we carried out a series of experiments in which we studied various aspects of signal processing, acoustic modeling, mixture densities and linear transforms of the acoustic vector. After all optimization steps, our best string error rate on the TI/NIST connected digit corpus was 1.71% for single densities and 0.74% for mixture densities.

1. INTRODUCTION Over the last five years much progress has been made in connected digit recognition [3, 7, 8, 9]. This paper describes how the systematic optimization of various components of a 'conventional' recognition system leads to high performance comparable with other systems that use much more complicated techniques. Experimental results on the adult corpus of the TI/NIST connected digit corpus are given. The optimization steps presented in this paper are: 1. Several methods f...

### Citations

4050 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ...hitening transform and derivatives.

| | sub/del/ins | WER [%] | SER [%] |
|---|---|---|---|
| derivatives | 108/49/22 | 0.63 | 1.77 |
| whitening | 97/62/13 | 0.60 | 1.71 |

4.4. Linear Discriminant Analysis

Linear discriminant analysis (p. 118 in [1] and p. 445 in [5]) has already been successfully utilized for speech recognition [3, 8]. In our experiments with linear discriminant analysis (LDA) and mixture densities, 3 successive 48-component ...
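The sub/del/ins counts quoted in the excerpt above can be related to the word error rate (WER) with the usual definition WER = (S + D + I) / N. The following is a minimal sketch of that arithmetic. The number of reference digits `NUM_REF_DIGITS` is an assumption, since the excerpt does not state it; the function name is illustrative as well.

```python
def word_error_rate(subs, dels, ins, num_ref_words):
    """Return the word error rate in percent: 100 * (S + D + I) / N."""
    if num_ref_words <= 0:
        raise ValueError("num_ref_words must be positive")
    return 100.0 * (subs + dels + ins) / num_ref_words


# ASSUMPTION: size of the reference transcription in digits. This value is
# not given in the excerpt; it is a commonly quoted size for the adult test
# portion of the corpus and is used here only for illustration.
NUM_REF_DIGITS = 28583

# Error counts (substitutions, deletions, insertions) quoted in the excerpt.
error_counts = {
    "derivatives": (108, 49, 22),
    "whitening": (97, 62, 13),
}

wer_by_system = {
    name: word_error_rate(s, d, i, NUM_REF_DIGITS)
    for name, (s, d, i) in error_counts.items()
}

for name, wer in wer_by_system.items():
    print(f"{name}: WER = {wer:.2f}%")
```

Under the assumed reference size, the computed values round to the WER column of the excerpt. The string error rate (SER) cannot be derived from these counts, because it depends on how the errors are distributed over the digit strings.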

2775 | Introduction to Statistical Pattern Recognition, 2nd edition
- Fukunaga
- 1990
Citation Context ...tputs. Thus some correlations among the components of the acoustic vector remain after a cepstral decorrelation. These correlations can, on average, be removed by a whitening transform (p. 24 in [5]) based on a pooled covariance matrix of the spliced vector. The pooled covariance matrix was calculated as follows: 1. We performed a time alignment without a whitening transform. 2. The time alignme...
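A whitening transform of the kind mentioned in the excerpt above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the pooled covariance is taken to be the within-class covariance pooled over the aligned states, and the transform is built from a Cholesky factor rather than the eigenvector form given in textbooks. All function names and the toy data are hypothetical.

```python
import math


def pooled_covariance(frames, labels):
    """Within-class covariance pooled over all classes (e.g. aligned states)."""
    dim = len(frames[0])
    sums, counts = {}, {}
    for x, s in zip(frames, labels):
        acc = sums.setdefault(s, [0.0] * dim)
        for i in range(dim):
            acc[i] += x[i]
        counts[s] = counts.get(s, 0) + 1
    means = {s: [v / counts[s] for v in sums[s]] for s in sums}
    cov = [[0.0] * dim for _ in range(dim)]
    for x, s in zip(frames, labels):
        d = [x[i] - means[s][i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j]
    n = len(frames)
    return [[c / n for c in row] for row in cov]


def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def whiten(x, low):
    """Solve low * y = x by forward substitution (y = low^-1 x)."""
    y = []
    for i in range(len(x)):
        s = sum(low[i][k] * y[k] for k in range(i))
        y.append((x[i] - s) / low[i][i])
    return y


# Toy 2-dimensional "acoustic vectors" with hypothetical state labels.
frames = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 4.5], [0.0, 0.5], [5.0, 7.0]]
labels = [0, 0, 0, 1, 1, 1]

cov = pooled_covariance(frames, labels)
chol = cholesky(cov)
whitened = [whiten(x, chol) for x in frames]
```

After the transform, the pooled covariance of `whitened` (computed with the same labels) is the identity matrix up to rounding, which is the property a whitening transform is meant to provide.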

799 | Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis, Mermelstein
- 1980
Citation Context ...s are correlated. The covariance matrix of a vector consisting of the filter bank outputs has approximately Toeplitz form. Thus the filter bank outputs are decorrelated by a discrete cosine transform [2]. M = 16 cepstrum coefficients $c_m$ are computed from N = 20 filter bank outputs $f_n$ by

$$c_m = \sum_{n=1}^{N} f_n \cos\left(\frac{\pi m (n - 0.5)}{N}\right), \qquad 0 \le m < M.$$

Figure 1 (axes: frequency response magnitude, Mel(f)): F...
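The cosine transform quoted in the excerpt above, read here as c_m = sum over n = 1..N of f_n cos(pi m (n - 0.5) / N) for 0 <= m < M, can be written out directly. The sketch below is only an illustration of that formula; the function name and the constant test input are assumptions, and no windowing or liftering is implied.

```python
import math


def cepstrum_from_filterbank(filter_outputs, num_ceps=16):
    """Compute cepstrum coefficients c_0 .. c_{M-1} from N filter bank outputs.

    Implements c_m = sum_{n=1}^{N} f_n * cos(pi * m * (n - 0.5) / N),
    with N = len(filter_outputs) and M = num_ceps.
    """
    n_filters = len(filter_outputs)
    return [
        sum(
            filter_outputs[n - 1] * math.cos(math.pi * m * (n - 0.5) / n_filters)
            for n in range(1, n_filters + 1)
        )
        for m in range(num_ceps)
    ]


# Hypothetical input: 20 identical (log) filter bank outputs.
flat_spectrum = [1.0] * 20
ceps = cepstrum_from_filterbank(flat_spectrum, num_ceps=16)
```

For a flat input all the energy ends up in `ceps[0]` and the higher coefficients vanish up to rounding, which is a quick sanity check that the cosine basis is orthogonal to a constant vector.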

41 | High-performance connected digit recognition using maximum mutual information estimation
- Normandin, Cardin, et al.
- 1994
Citation Context ... the TI/NIST connected digit corpus was 1.71% for single densities and 0.74% for mixture densities. 1. INTRODUCTION Over the last five years much progress has been made in connected digit recognition [3, 7, 8, 9]. This paper describes how the systematic optimization of various components of a 'conventional' recognition system leads to high performance comparable with other systems that use much more complicat...

19 | Acoustic Modelling of Phoneme Units for Continuous Speech Recognition. Fifth European Signal Processing Conference
- Ney
- 1990
Citation Context ...word models for 11 English digits including 'oh' and gender-dependent silence models,
- 357 states plus 1 state for silence per gender,
- maximum likelihood training in the Viterbi approximation [4].

3. SIGNAL PROCESSING STEPS We conducted a series of experiments in which we investigated the effect of signal processing steps on the error rate. All experiments in this section were carried out wit...

19 | Improvement in connected digit recognition using linear discriminant analysis and mixture densities
- Haeb-Umbach, Geller, et al.
- 1993
Citation Context ... the TI/NIST connected digit corpus was 1.71% for single densities and 0.74% for mixture densities. 1. INTRODUCTION Over the last five years much progress has been made in connected digit recognition [3, 7, 8, 9]. This paper describes how the systematic optimization of various components of a 'conventional' recognition system leads to high performance comparable with other systems that use much more complicat...

17 | Phonetically sensitive discriminants for improved speech recognition
- Doddington
- 1989
Citation Context ... the TI/NIST connected digit corpus was 1.71% for single densities and 0.74% for mixture densities. 1. INTRODUCTION Over the last five years much progress has been made in connected digit recognition [3, 7, 8, 9]. This paper describes how the systematic optimization of various components of a 'conventional' recognition system leads to high performance comparable with other systems that use much more complicat...

17 | HTK: Hidden Markov Model Toolkit V1.4
- Young
- 1993
Citation Context ...um coefficients $c_m$ are calculated from N = 1024 mel-warped log magnitudes $f_n$:

$$c_m = \sum_{n=0}^{N-1} f_n \cos\left(\frac{\pi m n}{N}\right), \qquad 0 \le m < M.$$

- Method B: This method is based on 20 mel scale triangular filters [6]. We use a mel scale defined by

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700\,\mathrm{Hz}}\right).$$

A filter bank in which each filter has a triangular bandpass frequency response with bandwidth and spacing determined by a constant...
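The mel scale quoted in the excerpt above, Mel(f) = 2595 log10(1 + f / 700 Hz), is easy to evaluate and invert. The sketch below shows the mapping together with one plausible way to place 20 filter centers. Equal spacing on the mel axis and the frequency range `f_min_hz` to `f_max_hz` are assumptions, because the excerpt is cut off before it gives the spacing rule or the bandwidth; the function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Mel(f) = 2595 * log10(1 + f / 700 Hz)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_center_frequencies(num_filters=20, f_min_hz=0.0, f_max_hz=4000.0):
    """Center frequencies in Hz for filters equally spaced on the mel axis.

    ASSUMPTION: equal mel spacing and the default frequency range are
    illustrative choices, not values taken from the paper.
    """
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + (k + 1) * step) for k in range(num_filters)]


centers = filter_center_frequencies()
```

With this definition 1000 Hz maps to roughly 1000 mel, and the centers in `centers` grow further apart in Hz as frequency increases, which is the intended effect of mel warping.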

9 | Improved acoustic modeling with Bayesian learning
- Gauvain, Lee
- 1992

8 | Experiments with linear feature extraction in speech recognition
- Beulen, Welling, et al.
- 1995
Citation Context ...lso studied the effect of an 11-frame window of vectors without derivatives. Again the resulting acoustic vector consisted of 48 components. Such a long window performed best with Gaussian densities [10]. In Table 9 (method B) the results for Gaussian and Laplacian densities are summarized. For comparison, Table 9 also shows the error rates of our baseline system with no LDA. An 11-frame window combi...