Results 1 -
2 of
2
Rapid language model development for new task domains
- Proc. First International Conference on Language Resources and Evaluation (LREC
, 1998
"... Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further to consider the situation when no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The f ..."
Abstract
-
Cited by 16 (6 self)
- Add to MetaCart
Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further to consider the situation when no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The first technique is based on using a context-free grammar to generate a corpus of word collocations. The second is an adaptation technique based on using out-of-domain corpora to estimate target domain language models. We report results of successfully using these two techniques individually and in combination to build efficient models for a spontaneous speech recognition task in a medium-sized vocabulary domain. 1.
SCREEN: Learning a Flat Syntactic and Semantic . . .
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1997
"... Previous approaches of analyzing spontaneously spoken language often have been based on encoding syntactic and semantic knowledge manually and symbolically. While there has been some progress using statistical or connectionist language models, many current spoken-language systems still use a relativ ..."
Abstract
- Add to MetaCart
Previous approaches of analyzing spontaneously spoken language often have been based on encoding syntactic and semantic knowledge manually and symbolically. While there has been some progress using statistical or connectionist language models, many current spoken-language systems still use a relatively brittle, hand-coded symbolic grammar or symbolic semantic component. In contrast, we describe a so-called screening approach for learning robust processing of spontaneously spoken language. A screening approach is a flat analysis which uses shallow sequences of category representations for analyzing an utterance at various syntactic, semantic and dialog levels. Rather than using a deeply structured symbolic analysis, we use a flat connectionist analysis. This screening approach aims at supporting speech and language processing by using (1) data-driven learning and (2) robustness of connectionist networks. In order to test this approach, we have developed the screen system which is ba...

