Janus: Towards Multilingual Spoken Language Translation, 1995
Cited by 10 (5 self)
In our effort to build spoken language translation systems we have extended our JANUS system to process spontaneous human-human dialogs in a new domain: two people trying to schedule a meeting. Trained on an initial database, JANUS-2 is able to translate English and German spoken input into English, German, Spanish, Japanese, or Korean output. To tackle the difficulty of spontaneous human-human dialogs we improved the JANUS-2 recognizer along its three knowledge sources: acoustic models, dictionary, and language models. We developed a robust translation system which performs semantic rather than syntactic analysis and thus is particularly suited to processing spontaneous speech. We describe repair methods to recover from recognition errors. tes on spontaneo...
Using Partial Morphological Analysis In Language Modeling Estimation For Large Vocabulary Portuguese Speech Recognition
Cited by 6 (2 self)
To achieve an acceptable degree of generalization, current speech recognition systems work with large vocabularies, which, among other effects, result in larger search spaces and consequently lower system performance. For highly inflectional languages, such as Portuguese, a much larger vocabulary is required for the same task coverage, and a much larger text corpus is needed to extract word-based statistics with the same reliability. In this paper we present a new approach using some basic morphological analysis, based on the decomposition of regular verbs into their morphemes (roots and suffixes), applied to a Portuguese large vocabulary continuous speech recognition system. This approach reduces not only the vocabulary size, and therefore the language model perplexity, but also the rate of out-of-vocabulary (OOV) words and the memory requirements. Preliminary results show an improvement of about 20% in recognition speed with a slight degradation in word error rate (WER).
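The effect of splitting regular verbs into roots and suffixes can be illustrated with a minimal Python sketch. It is not the paper's implementation: the suffix list, the `min_root` threshold, and the function names `decompose` and `vocabulary` are all assumptions made for illustration, and the sketch does not check whether a word is actually a verb.

```python
# Hypothetical, deliberately tiny suffix list for first-conjugation verbs.
# A real system would need the full regular-verb paradigm.
SUFFIXES = ["amos", "ando", "ar", "ei", "ou"]


def decompose(word, suffixes=SUFFIXES, min_root=3):
    """Split a word into [root, '+suffix'] if a listed suffix matches,
    otherwise return the word unchanged as a single unit."""
    # Try longer suffixes first so that 'amos' wins over shorter matches.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def vocabulary(words, split):
    """Collect the set of lexical units, with or without decomposition."""
    units = set()
    for word in words:
        units.update(decompose(word) if split else [word])
    return units


verbs = [
    "falar", "falamos", "falando", "falei", "falou",
    "cantar", "cantamos", "cantando", "cantei", "cantou",
]
print(len(vocabulary(verbs, split=False)))  # full word forms
print(len(vocabulary(verbs, split=True)))   # roots plus shared suffixes
```

In this toy example the ten inflected forms collapse into two roots plus five shared suffixes, which is the kind of vocabulary reduction the abstract describes.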
Speech Recognition Of European Languages
- In Proc. of the IEEE ASR Workshop, Snowbird, 1995
Cited by 4 (1 self)
A basic overview is presented of the main ongoing efforts in large vocabulary, continuous speech recognition (LVCSR) for European languages. We address issues in acoustic modeling, lexical representation, and language modeling for several European languages, as well as issues in comparative evaluation.
Detection and Transcription of OOV Words, 1998
Cited by 3 (0 self)
This thesis deals with the problem of out-of-vocabulary (OOV) words in speech recognition. The standard response of speech recognition systems whenever they encounter such OOV words is to (silently) misrecognize them without issuing any warning to the user. To avoid this undesired behaviour, two different strategies are proposed. The first strategy consists in preventing the problem, i.e. the occurrence of OOV words, and this thesis presents two ways of doing that. First, the system vocabulary is optimized using information extracted from other corpora and application domains, so that the expected number of OOV words is minimized. Using this method, the vocabulary coverage was significantly improved, especially for small vocabularies. The second method of reducing the number of OOV words consists of redefining the concept of "word" based on morphological considerations. In particular, compound words are decomposed into their constituent parts, which are used as the lexical recogni...
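The quantity that vocabulary optimization tries to minimize, the OOV rate, can be sketched in a few lines of Python. The selection rule used here (take the most frequent tokens of a single corpus) is a simple stand-in and not the thesis's method, which draws on several corpora and domains; the names `select_vocabulary` and `oov_rate` and the toy sentences are assumptions.

```python
from collections import Counter


def select_vocabulary(corpus_tokens, size):
    """Pick the `size` most frequent tokens as the recognizer vocabulary.
    (Stand-in heuristic; the thesis's actual optimization is not shown here.)"""
    return {word for word, _ in Counter(corpus_tokens).most_common(size)}


def oov_rate(vocab, test_tokens):
    """Fraction of running test tokens that are not in the vocabulary."""
    if not test_tokens:
        return 0.0
    missing = sum(1 for word in test_tokens if word not in vocab)
    return missing / len(test_tokens)


train = "the meeting is on monday the meeting is at noon".split()
test = "the meeting is on friday".split()

small = select_vocabulary(train, 3)   # only the three most frequent words
large = select_vocabulary(train, 7)   # every distinct training word
print(oov_rate(small, test), oov_rate(large, test))
```

Growing the vocabulary lowers the OOV rate on the test sentence but cannot remove words that never occur in the selection corpus, which is why the thesis also considers redefining the lexical unit.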
A Hybrid Approach To Compounds In LVCSR
- In Proc. International Conference on Spoken Language Processing, volume I, 2002
Cited by 3 (2 self)
In several languages compound words form orthographic units, which complicates the task of ensuring good lexical coverage for large vocabulary continuous speech recognition (LVCSR). A common approach to the problem consists of first recognizing the compound constituents, followed by an automatic recompounding process. We describe an accurate compound module, which combines a rule-based approach with statistical pruning. The module is incorporated in a broadcast news recognition task for Dutch and yields an 11% relative decrease in word error rate (WER).
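The recognize-then-recompound idea can be sketched as follows. This is a rough Python illustration under stated assumptions, not the module described in the paper: the "rule" is plain concatenation of adjacent constituents (Dutch linking morphemes are ignored), the "statistical pruning" is a simple count threshold, and `recompound`, `compound_counts`, and `min_count` are hypothetical names.

```python
def recompound(tokens, compound_counts, min_count=2):
    """Greedily merge adjacent token pairs whose concatenation is a
    sufficiently frequent compound in a reference count table."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            # Rule step (assumed): candidate compound is plain concatenation.
            candidate = tokens[i] + tokens[i + 1]
            # Pruning step (assumed): keep only candidates seen often enough.
            if compound_counts.get(candidate, 0) >= min_count:
                merged.append(candidate)
                i += 2
                continue
        merged.append(tokens[i])
        i += 1
    return merged


# Hypothetical counts from a reference text corpus.
counts = {"voetbalwedstrijd": 12}
print(recompound(["de", "voetbal", "wedstrijd", "begint"], counts))
```

With the toy counts above, "voetbal" and "wedstrijd" are merged into "voetbalwedstrijd", while "de" and "begint" are left untouched because no frequent compound covers them.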

