Rapid language model development for new task domains
Proc. First International Conference on Language Resources and Evaluation (LREC), 1998
Cited by 16 (6 self)
Abstract
Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further to consider the situation when no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The first technique is based on using a context-free grammar to generate a corpus of word collocations. The second is an adaptation technique based on using out-of-domain corpora to estimate target domain language models. We report results of successfully using these two techniques individually and in combination to build efficient models for a spontaneous speech recognition task in a medium-sized vocabulary domain.
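The first technique (generating a corpus from a context-free grammar) can be illustrated with a minimal Python sketch. It assumes a small, hypothetical, non-recursive grammar; the grammar, the `expand` and `bigram_model` helpers, and the choice of a maximum-likelihood bigram model are illustrative assumptions, not the paper's actual grammar or estimation method. Every sentence the grammar derives is enumerated into a synthetic corpus, and bigram probabilities are then counted from it.

```python
from collections import Counter
from itertools import product

# Hypothetical toy grammar for illustration only; not taken from the paper.
# Keys are non-terminals; any symbol not in GRAMMAR is treated as a word.
GRAMMAR = {
    "S": [["I", "want", "NP"], ["show", "me", "NP"]],
    "NP": [["a", "flight", "PP"], ["the", "fares", "PP"]],
    "PP": [["to", "CITY"]],
    "CITY": [["boston"], ["denver"]],
}


def expand(symbol):
    """Return every word sequence derivable from symbol (non-recursive grammars only)."""
    if symbol not in GRAMMAR:  # terminal word
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Cartesian product of the expansions of each right-hand-side symbol.
        for parts in product(*(expand(s) for s in rhs)):
            results.append([w for part in parts for w in part])
    return results


def bigram_model(corpus):
    """Maximum-likelihood bigram estimates P(w2 | w1), with sentence boundary markers."""
    bigrams, history = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            bigrams[(w1, w2)] += 1
            history[w1] += 1
    return {bg: c / history[bg[0]] for bg, c in bigrams.items()}


corpus = expand("S")
print(len(corpus))  # -> 8
model = bigram_model(corpus)
print(model[("to", "boston")])  # -> 0.5
```

Exhaustive enumeration terminates only for non-recursive grammars; a recursive grammar would instead call for random sampling with a depth limit, and a practical model would also smooth the estimates rather than use raw relative frequencies.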
Automatic induction of language model data for a spoken dialogue system
In Proceedings of SIGDIAL, 2005
Cited by 5 (3 self)
Abstract
When building a new spoken dialogue application, large amounts of domain-specific data are required. This paper addresses the issue of generating in-domain training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. The second stage involves sampling the synthetic corpus towards the goal of obtaining data that would be representative of the statistics of application-specific real user interactions. The sampling methods proposed employ an example-based generation framework, a simulated user model and information extracted from development data. Evaluation is conducted on recognition performance in a restaurant information domain. We show that word error rate can be reduced when limited amounts of real user training data are augmented with synthetic data derived by our methods.
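The second stage, sampling the synthetic corpus towards the statistics of real user data, can be approximated with a simple likelihood-based filter. This Python sketch is a hypothetical stand-in, not the paper's example-based, simulated-user, or development-data method: it scores each synthetic utterance by its average add-one-smoothed unigram log-probability under the development data and keeps the k best. The helper names and the toy restaurant-domain sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def unigram_logprob(dev_sentences):
    """Add-one smoothed unigram log-probability estimated from development data."""
    counts = Counter(w for s in dev_sentences for w in s)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen words
    return lambda w: math.log((counts[w] + 1) / (total + vocab))


def select_synthetic(synthetic, dev_sentences, k):
    """Keep the k synthetic utterances with the highest per-word log-likelihood."""
    logp = unigram_logprob(dev_sentences)

    def score(sent):
        # Average over words so longer utterances are not penalised for length.
        return sum(logp(w) for w in sent) / max(len(sent), 1)

    return sorted(synthetic, key=score, reverse=True)[:k]


# Hypothetical toy data for illustration only.
dev = [["cheap", "chinese", "food"], ["cheap", "italian", "restaurant"]]
synthetic = [
    ["cheap", "chinese", "restaurant"],
    ["expensive", "thai", "cuisine"],
    ["italian", "food"],
]
print(select_synthetic(synthetic, dev, 2))
# -> [['cheap', 'chinese', 'restaurant'], ['italian', 'food']]
```

A unigram score is deliberately crude; the same selection loop could use a higher-order model estimated from development data, at the cost of needing more of that data to estimate it reliably.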
Language Modelling and Spoken Dialogue Systems - the ARISE experience
1999
Cited by 1 (0 self)
Abstract
The aim of this paper is to describe the experience gained in the field of language modelling during the LE-3 ARISE (Automatic Railway Information Systems for Europe) project. All of the techniques presented in this paper relate to Spoken Dialogue Systems, and they address two issues: the limited amount of training material, and the exploitation of the constraints available in a dialogue system. The results obtained may be useful for the future development of similar applications.
Keywords: language modelling, spoken dialogue system, speech recognition

