## Language Model Estimations And Representations For Real-Time Continuous Speech Recognition (1994)

Venue: Proc. of ICASSP

Citations: 7 (5 self)

### BibTeX

@INPROCEEDINGS{Antoniol94languagemodel,
  author    = {Giuliano Antoniol and Fabio Brugnara and Mauro Cettolo and Marcello Federico},
  title     = {Language Model Estimations And Representations For Real-Time Continuous Speech Recognition},
  booktitle = {Proc. of ICASSP},
  year      = {1994},
  pages     = {588--591}
}

### Abstract

This paper compares different ways of estimating bigram language models and of representing them in a finite state network used by a beam-search based, continuous speech, and speaker independent HMM recognizer. Attention is focused on the n-gram interpolation scheme for which seven models are considered. Among them, the Stacked estimated linear interpolated model favourably compares with the best known ones. Further, two different static representations of the search space are investigated: "linear" and "tree-based". Results show that the latter topology is better suited to the beam-search algorithm. Moreover, this representation can be reduced by a network optimization technique, which allows the dynamic size of the recognition process to be decreased by 60%. Extensive recognition experiments on a 10,000-word dictation task with four speakers are described in which an average word accuracy of 93% is achieved with real-time response. I. INTRODUCTION This paper compares different ways ...

### Citations

2500 | The Design and Analysis of Computer Algorithms
- Aho, Hopcroft, et al.
- 1974
Citation Context: ...ds that share a phoneme. This "back-propagation" of probabilities within trees makes many paths redundant, a fact that can be exploited to reduce network size. The partitioning algorithm described in [3] was successfully used for this purpose. IV. EXPERIMENTS 4.1 System Description. Acoustic modelling uses phonetic transcription of words with 50 context-independent units. Unit HMMs have simple left-t...

682 | Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
- Katz
- 1987
Citation Context: ... distribution if trigrams are computed, or otherwise (e.g. for unigrams) uniformly. The discounting and the redistribution functions are generally combined according to two main schemes: backing-off [7] and interpolation [6]. Backing-off scheme. Bigram probability is computed by choosing the most significant approximation according to the frequency counts: $\Pr(z \mid y) = f'(z \mid y)$ if $c(yz) > 0$ ...
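The back-off rule quoted in this snippet can be sketched in Python. This is a hedged illustration, not the paper's implementation: `make_backoff`, the constant `discount`, and the renormalisation over unseen words stand in for the paper's $f'(z \mid y)$ and back-off weight $K(y)$.

```python
from collections import Counter

def make_backoff(bigrams, unigrams, discount=0.5):
    """Katz-style back-off sketch: use a discounted bigram estimate when
    the bigram was seen, otherwise back off to a scaled unigram (the
    scale plays the role of the back-off weight K(y))."""
    c_bi = Counter(bigrams)            # c(yz), keyed by (y, z)
    c_uni = Counter(unigrams)          # c(y)
    total = sum(c_uni.values())

    def P(z):                          # unigram probability Pr(z)
        return c_uni[z] / total

    def Pr(z, y):
        if c_bi[(y, z)] > 0:           # f'(z|y): absolute-discounted frequency
            return (c_bi[(y, z)] - discount) / c_uni[y]
        seen = [w for w in c_uni if c_bi[(y, w)] > 0]
        freed = discount * len(seen) / c_uni[y]        # mass freed by discounting
        unseen_mass = 1.0 - sum(P(w) for w in seen)    # unigram mass of unseen words
        return freed * P(z) / unseen_mass              # K(y) * Pr(z)
    return Pr
```

By construction the probabilities sum to one over the vocabulary for any history, which is exactly the property the back-off weight $K(y)$ guarantees.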

534 | An inequality and associated maximization technique in statistical estimation for probabilistic functions of markov processes
- Baum
- 1972
Citation Context: ... estimation of this model from a training text $W$ is well known in the literature of LMs and HMMs [6]. In fact, the following Leaving-One-Out (LOO) iterative formula derived from the Baum-Eagon estimator [4] was devised: $\lambda_{n+1}(y) = \frac{1}{|S_y|}\sum_{yz \in S_y} \frac{\lambda_n(y)\Pr(z)}{(1-\lambda_n(y))\,f(z \mid y) + \lambda_n(y)\Pr(z)}$, where $S_y$ is the set of all occurrences of bigrams of type $y$ in $W$ and $f(z \mid y)$ is the relative frequency comp...
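The iterative formula in this snippet is an EM-style fixed-point update for an interpolation weight. A minimal sketch, assuming `f` and `P` are the bigram relative-frequency and unigram distributions; the paper's leaving-one-out variant additionally recomputes $f$ with the current occurrence removed, which this sketch omits.

```python
def update_lambda(lmbda, occurrences, f, P):
    """One step of the iterative re-estimation of the interpolation
    weight lambda(y): average, over every occurrence yz of history y,
    the posterior share of the unigram component in the mixture
    (1 - lambda) * f(z|y) + lambda * P(z)."""
    total = 0.0
    for z in occurrences:              # S_y: tokens observed after y
        mix = (1.0 - lmbda) * f(z) + lmbda * P(z)
        total += lmbda * P(z) / mix    # posterior weight of the unigram part
    return total / len(occurrences)
```

Iterating this update to a fixed point yields the weight actually used when interpolating the two distributions.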

238 | The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression
- Witten, Bell
- 1991
Citation Context: ... In general, the above probability is computed by combining two components: a discounting function and a redistribution function. The first function is related to the zero-frequency estimation problem [13]: that is, a probability for all the bigrams that never occurred in $W$ is computed by discounting the bigram relative frequency $f(z \mid y) = c(yz)/c(y)$. The second function redistributes the zero-frequency p...
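The zero-frequency idea in this passage can be illustrated with a Witten-Bell style estimate, where the probability reserved for unseen successors of $y$ grows with the number of distinct words already seen after $y$. A sketch under that assumption (`zero_frequency` is a hypothetical helper name, not the paper's code):

```python
from collections import Counter, defaultdict

def zero_frequency(bigram_counts):
    """Witten-Bell style zero-frequency estimate: for each history y,
    lambda(y) = t(y) / (t(y) + c(y)), where t(y) is the number of
    distinct words ("new events") seen after y and c(y) its total count."""
    types = defaultdict(set)
    tokens = Counter()
    for (y, z), c in bigram_counts.items():
        types[y].add(z)                # distinct successors of y
        tokens[y] += c                 # total occurrences of y
    return {y: len(types[y]) / (len(types[y]) + tokens[y]) for y in tokens}
```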

177 | On structuring probabilistic dependencies in stochastic language modelling
- Ney, Essen, et al.
- 1994
Citation Context: ... 1 for $c(yz) > 5$. Absolute (or "shift") discounting. A small constant $\beta$ is subtracted from all bigram counts. Both the simplest solution with $\beta = 1$ (S1) [13] and the one proposed by Ney et al. [9] for $0 < \beta < 1$ (S$\beta$) are considered. Linear discounting. Empirical frequencies are discounted in proportion to their value. The Linear Empirical (LE) discounting method was described by Witten and Be...
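Absolute ("shift") discounting as described here subtracts a constant $\beta$ from every observed bigram count, and the freed mass becomes the zero-frequency probability of the history. A minimal sketch (the function name and return convention are illustrative):

```python
from collections import Counter

def absolute_discount(bigram_counts, beta=0.75):
    """Absolute discounting: subtract 0 < beta <= 1 from every observed
    bigram count; the freed mass beta * t(y) / c(y) becomes the
    zero-frequency probability of history y."""
    c_y = Counter()                    # c(y): total count of each history
    t_y = Counter()                    # t(y): distinct successors of each history
    for (y, _), c in bigram_counts.items():
        c_y[y] += c
        t_y[y] += 1
    f = {(y, z): (c - beta) / c_y[y] for (y, z), c in bigram_counts.items()}
    zero = {y: beta * t_y[y] / c_y[y] for y in c_y}
    return f, zero
```

The discounted frequencies plus the zero-frequency mass sum to one for each history, so the freed probability can be redistributed over unseen words.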

62 | Principles of lexical language modeling for speech recognition
- Jelinek, Mercer, et al.
- 1991
Citation Context: ...ams are computed, or otherwise (e.g. for unigrams) uniformly. The discounting and the redistribution functions are generally combined according to two main schemes: backing-off [7] and interpolation [6]. Backing-off scheme. Bigram probability is computed by choosing the most significant approximation according to the frequency counts: $\Pr(z \mid y) = f'(z \mid y)$ if $c(yz) > 0$, $K(y)\Pr(z)$ if $c(yz)$ ...

57 | Improvements in Beam Search for 10000-Word Continuous Speech Recognition
- Ney, Haeb-Umbach, et al.
- 1992
Citation Context: ...ng phonemes of words are shared and each leaf corresponds to a word. Further, computational advantages obtained by integrating this lexicon representation with the beam-search algorithm are well known [10]. Unfortunately, unlike the linear representation, in the lexicon tree the identity of a word is only known at the leaf level: so, to integrate the bigram probability, a duplicate of the whole lexicon...
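The tree-based lexicon described here can be sketched as a prefix tree over phoneme strings: words sharing leading phonemes share arcs, and the word identity appears only at the leaf. A toy illustration (the dictionary form and the `#word` leaf marker are assumptions, not the paper's data structure):

```python
def build_lexicon_tree(lexicon):
    """Prefix-tree lexicon: words sharing leading phonemes share arcs;
    the word identity is attached only at the leaf, which is why a
    static bigram network needs one copy of the tree per predecessor."""
    root = {}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {})   # reuse the arc if it exists
        node["#word"] = word                # identity known only here
    return root

def count_arcs(node):
    """Number of phoneme arcs in the tree (leaf markers excluded)."""
    return sum(1 + count_arcs(v) for k, v in node.items() if k != "#word")
```

For two words sharing a three-phoneme prefix the tree holds 5 arcs against 8 in a linear lexicon; this sharing is what the beam search exploits.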

53 | A one pass decoder design for large vocabulary recognition
- Odell, Valtchev, et al.
- 1994
Citation Context: ...ion justifies the efforts made by many research laboratories to overcome the problem of its memory space requirements. In fact, some labs dynamically build the portion of the currently explored space [10, 11]; others adopt a static linear-tree mixture approach [8]. As a matter of fact, a static representation of the whole search space is attractive mainly for two reasons: first, there is no overhead in bu...

41 | The estimation of powerful language models from small and large corpora
- Placeway, Schwartz, et al.
- 1993
Citation Context: ...frequencies are discounted in proportion to their value. The Linear Empirical (LE) discounting method was described by Witten and Bell [13] and was first employed for LM estimation by Placeway et al. [12]. The basic idea is to make the zero-frequency probability $\lambda(y)$ proportional to the number of "new events" that occurred after context $y$ during the production... (Footnote 3: condition $c(yz) > 0$ in (1) becomes $c(yz) > 1$.)

40 | A baseline of a speaker independent continuous speech recognizer of italian
- Angelini, Brugnara, et al.
- 1993
Citation Context: ... is done, since this allows a time gain without affecting accuracy. Acoustic models were trained with MLE on a set of 2000 sentences belonging to a phonetically rich database under collection at IRST [2]. Neither the sentences nor the speakers in the training set have any relation with the application domain.

| Features | Concierge | AReS | LOB |
|---|---|---|---|
| Content | queries | reports | articles |
| Vocab. size | 907 | 10,261 | 49,615 |

C...

20 | Techniques to achieve an accurate real-time large-vocabulary speech recognition system
- Murveit, Monaco, et al.
- 1994
Citation Context: ...to overcome the problem of its memory space requirements. In fact, some labs dynamically build the portion of the currently explored space [10, 11]; others adopt a static linear-tree mixture approach [8]. As a matter of fact, a static representation of the whole search space is attractive mainly for two reasons: first, there is no overhead in building it during the recognition process; secondly, the ...

8 | Radiological reporting by speech recognition: the A.Re.S system
- Angelini, Antoniol, et al.
- 1994
Citation Context: ... scheme seven bigram LMs in the literature are introduced. Comparisons are performed on text corpora presenting increasing data sparseness and on a 10,000-word speech recognition task from the A.Re.S. [1] (Automatic REporting by Speech) application domain. If better bigram estimates can improve the search engine accuracy, a suitable organization of the search space can improve its speed as well. Be...

2 | Stacked estimation of interpolated ngram language models
- Federico
- 1993
Citation Context: ...ethods actually provided better results. Further, a way to reduce the disadvantage of deleting a cross-validation set was introduced by using a Stacked version of the interpolation model (LG Stacked) [5]. The basic idea of the stacked method is to combine parameters estimated on different random partitions of the training data (into training and cross-validation sets) in order to improve performance. ...
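The stacked idea, as described, can be sketched as averaging parameter estimates over several random train/cross-validation splits rather than committing to a single held-out set. A hedged sketch with a caller-supplied `estimate` function (all names, the 80/20 split, and the averaging are illustrative assumptions):

```python
import random

def stacked_estimate(data, estimate, n_partitions=5, seed=0):
    """Stacked estimation sketch: run `estimate(train, held_out)` on
    several random partitions of the data and average the results,
    instead of wasting one fixed cross-validation set."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_partitions):
        shuffled = list(data)
        rng.shuffle(shuffled)
        cut = int(0.8 * len(shuffled))             # 80/20 split per partition
        values.append(estimate(shuffled[:cut], shuffled[cut:]))
    return sum(values) / len(values)
```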