Automatic stochastic tagging of natural language texts
Computational Linguistics, 1995
Cited by 48 (4 self)
Five language- and tagset-independent stochastic taggers, handling morphological and contextual information, are presented and tested on corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags: a small set containing the eleven main grammatical classes and a large set of grammatical categories common to all languages. Unknown words are tagged using an experimentally proven stochastic hypothesis that links the stochastic behavior of unknown words with that of the less probable known words. A fully automatic training and tagging program has been implemented on an IBM PC-compatible 80386-based computer. Measurements of error rate, response time, and memory requirements show that the taggers' performance is satisfactory even when only a small training text is available. The error rate improves when new texts are used to update the stochastic model parameters.
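The abstract does not give the taggers' equations. As a rough illustration only, the following Python sketch implements a generic bigram HMM tagger with Viterbi decoding plus one possible reading of the unknown-word hypothesis: it assumes that the "less probable known words" are the words seen exactly once in training (hapax legomena), and lets each tag emit an unknown word with probability equal to the share of that tag's tokens that are hapaxes. The names `train` and `viterbi`, the add-one smoothing of transitions, the 1e-3 floor and the toy corpus are all hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(tagged_sents):
    """Collect bigram tag-transition and emission counts from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    word_freq = Counter()
    for sent in tagged_sents:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            word_freq[word] += 1
            prev = tag
    tags = sorted(emit)
    tag_count = {t: sum(emit[t].values()) for t in tags}
    # Assumption: "less probable known words" = hapax legomena (frequency 1).
    hapax = Counter(t for t in tags for w in emit[t] if word_freq[w] == 1)
    # P(unknown | t): share of tag t's tokens that are hapaxes, floored to avoid log(0).
    unk = {t: max(hapax[t], 1e-3) / tag_count[t] for t in tags}
    return trans, emit, tag_count, unk, set(word_freq), tags


def viterbi(words, model):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    trans, emit, tag_count, unk, vocab, tags = model
    if not words:
        return []

    def log_trans(prev, tag):
        # Add-one smoothing over the tag set (an assumption, not from the paper).
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + 1) / (total + len(tags)))

    def log_emit(tag, word):
        if word not in vocab:
            return math.log(unk[tag])
        count = emit[tag][word]
        return math.log(count / tag_count[tag]) if count else -math.inf

    # Each column maps tag -> (best log score, best previous tag).
    columns = [{t: (log_trans("<s>", t) + log_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        prev_col = columns[-1]
        col = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: prev_col[p][0] + log_trans(p, t))
            score = prev_col[best_prev][0] + log_trans(best_prev, t)
            col[t] = (score + log_emit(t, word), best_prev)
        columns.append(col)

    # Trace back from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for col in reversed(columns[1:]):
        tag = col[tag][1]
        path.append(tag)
    return path[::-1]


sents = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train(sents)
print(viterbi(["the", "fox", "sleeps"], model))  # ['DET', 'NOUN', 'VERB'] ("fox" is unknown)
```

Because "fox" never occurs in training, its tag is decided by the hapax-based unknown-word emissions together with the transition probabilities around it.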
Lattice Parsing for Speech Recognition
In Proceedings of 6ème, 1999
Cited by 13 (3 self)
Much work remains to be done on better integrating speech recognition and language processing systems. This paper gives an overview of several strategies for integrating linguistic models into speech understanding systems and investigates several ways of producing sets of hypotheses that include more "semantic" variability than usual language models. The main goal is to show, through actual experiments, that sequential coupling can be achieved efficiently by word-lattice syntactic analyzers that parse the huge number of hypotheses (i.e., possible sentences) contained in the lattice produced by the speech recognizer.
1. Motivations
The past decade has seen significant progress in speech recognition technology: word (recognition) error rates continue to drop by a factor of 2 every two years (Rabiner et al., 1996), and high-performance systems are now becoming available. Several factors have contributed to this rapid progress:
• Generalisati...
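To see why a parser should work on the lattice itself rather than on an explicit list of sentences, it helps to count the hypotheses a lattice encodes. The sketch below is not the paper's parser; it is a standard dynamic-programming path count over a directed acyclic graph, assuming a hypothetical representation in which the lattice is a list of `(start_node, end_node, word)` edges and node ids are integers in topological order. Acoustic scores are ignored, and `count_paths` is an illustrative name.

```python
from collections import defaultdict


def count_paths(edges, start, end):
    """Count start-to-end paths (sentence hypotheses) in a word lattice (a DAG).

    Assumes integer node ids in topological order: every edge (src, dst, word)
    has src < dst. Each parallel edge counts as a separate alternative.
    """
    successors = defaultdict(list)
    nodes = {start, end}
    for src, dst, _word in edges:
        successors[src].append(dst)
        nodes.update((src, dst))
    paths = defaultdict(int)
    paths[start] = 1
    # Visit nodes in topological order so all predecessors are finished first.
    for node in sorted(nodes):
        for dst in successors[node]:
            paths[dst] += paths[node]
    return paths[end]


# Toy lattice: 10 word slots, 3 competing hypotheses per slot.
edges = [(i, i + 1, w) for i in range(10) for w in ("a", "b", "c")]
print(count_paths(edges, 0, 10))  # 59049 (= 3**10) hypotheses from 30 edges
```

The count grows exponentially with lattice length while the number of edges grows only linearly, which is why analyzing the lattice directly, rather than each sentence separately, pays off.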
A Category Based Approach for Recognition of Out-of-Vocabulary Words
In Int. Conf. on Spoken Language Processing, 1996
Cited by 12 (4 self)
In almost all applications of automatic speech recognition, especially in spontaneous speech tasks, the recognizer vocabulary cannot cover all occurring words. There is always a significant number of out-of-vocabulary words, even when the vocabulary is very large. In this paper we present a new approach for integrating out-of-vocabulary words into statistical language models. We use category information for all words in the training corpus to define a function that approximates the out-of-vocabulary word emission probability for each word category. This information is integrated into the language models. Although we use a simple acoustic model for out-of-vocabulary words, we achieve a 6% reduction in word error rate on spontaneous speech data with an out-of-vocabulary rate of about 5%.
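The abstract does not give the exact form of the per-category function. A minimal sketch, under the assumption that a category's OOV emission probability is estimated as the fraction of that category's training tokens falling outside the recognizer vocabulary, might look like this; the helper names, the `<OOV>` token, the category bigram probabilities and the toy data are hypothetical. The second function shows how such a term would enter a class-based bigram model, where the probability of an OOV word after category c' is the sum over categories c of P(c | c') P(<OOV> | c).

```python
from collections import Counter, defaultdict


def category_emissions(tagged_words, vocab):
    """Per-category word emission probabilities with an explicit <OOV> entry.

    tagged_words: (word, category) pairs from a category-annotated training corpus.
    vocab: the recognizer vocabulary; training tokens outside it count as <OOV>.
    Assumption: P(<OOV> | c) = share of category-c tokens that are out of vocabulary.
    """
    counts = defaultdict(Counter)
    for word, cat in tagged_words:
        counts[cat][word if word in vocab else "<OOV>"] += 1
    return {
        cat: {w: n / sum(cnt.values()) for w, n in cnt.items()}
        for cat, cnt in counts.items()
    }


def p_oov_next(prev_cat, cat_bigram, emissions):
    """Class-bigram probability that the next word is OOV:
    sum over categories c of P(c | prev_cat) * P(<OOV> | c)."""
    return sum(
        p * emissions.get(c, {}).get("<OOV>", 0.0)
        for c, p in cat_bigram[prev_cat].items()
    )


tagged = [("the", "DET"), ("cat", "N"), ("a", "DET"), ("zebu", "N"),
          ("the", "DET"), ("dog", "N"), ("okapi", "N")]
vocab = {"the", "a", "cat", "dog"}
emissions = category_emissions(tagged, vocab)
cat_bigram = {"DET": {"N": 0.8, "DET": 0.2}}  # hypothetical P(c | DET)
print(emissions["N"]["<OOV>"])                   # 0.5
print(p_oov_next("DET", cat_bigram, emissions))  # ≈ 0.4
```

In this toy data, nouns carry all of the OOV mass and determiners none, so the model expects an unknown word far more often after a determiner than a uniform OOV probability would suggest.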
Analyzing And Improving Statistical Language Models For Speech Recognition
1994
Cited by 3 (0 self)
A speech recognizer is a device that translates speech into text. Many current speech recognizers contain two components: an acoustic model and a statistical language model. The acoustic model indicates how likely it is that a certain word corresponds to a part of the acoustic signal (i.e., the speech). The statistical language model indicates how likely it is that a certain word will be spoken next, given the words recognized so far. For example, even if the acoustic model cannot decide between the acoustically similar words "peach" and "teach", the statistical language model can indicate that "peach" is more likely if the previously recognized words are "He ate the". Current speech recognizers perform well on constrained tasks, but the goal of continuous, speaker-independent speech recognition in potentially noisy environments with a very large vocabulary has not yet been reached. How can statistical language models be improved so that more complex tasks c...
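The "peach"/"teach" example can be made concrete with a toy decoder that adds the acoustic log-probability to a weighted language-model log-probability. This sketch assumes an add-one-smoothed bigram model, equal acoustic scores for the two words and a tiny invented training corpus; `train_bigram`, `lm_logprob`, `decode_next` and `lm_weight` are hypothetical names, and a bigram conditions only on "the" rather than on the full history "He ate the".

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and bigram contexts from tokenized training sentences."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sent in corpus:
        tokens = ["<s>"] + sent
        vocab.update(sent)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return bigrams, contexts, vocab


def lm_logprob(word, prev, model):
    """Add-one smoothed bigram estimate of log P(word | prev)."""
    bigrams, contexts, vocab = model
    return math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab)))


def decode_next(history, acoustic_logprobs, model, lm_weight=1.0):
    """Choose the candidate maximising log P_acoustic + lm_weight * log P_LM."""
    prev = history[-1] if history else "<s>"
    return max(acoustic_logprobs,
               key=lambda w: acoustic_logprobs[w] + lm_weight * lm_logprob(w, prev, model))


corpus = [["he", "ate", "the", "peach"],
          ["she", "ate", "the", "peach"],
          ["they", "teach", "the", "class"]]
model = train_bigram(corpus)
history = ["he", "ate", "the"]
# Hypothetical acoustic scores: the acoustic model cannot tell the words apart.
tie = {"teach": math.log(0.5), "peach": math.log(0.5)}
print(decode_next(history, tie, model))  # peach
```

With equal acoustic scores the language model decides; with strong enough acoustic evidence for "teach" (for instance 0.9 against 0.1 here) the decision flips, which is the trade-off that `lm_weight` would tune in a real recognizer.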

