An Application of Reinforcement Learning to Dialogue Strategy Selection in a Spoken Dialogue System for Email
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 2000
Cited by 47 (7 self)
This paper describes a novel method by which a spoken dialogue system can learn to choose an optimal dialogue strategy from its experience interacting with human users. The method is
Statistical language model adaptation: review and perspectives
- Speech Communication
, 2004
Cited by 35 (0 self)
Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics of the discourse in the training and recognition tasks differ. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate for this mismatch. More generally, an adaptive language model seeks to maintain an adequate representation of the current task domain under changing conditions involving potential variations in vocabulary, syntax, content, and style. This paper presents an overview of the major approaches proposed to address this issue, and offers some perspectives regarding their comparative merits and associated tradeoffs. © 2003 Elsevier B.V. All rights reserved.
Evaluating Spoken Dialogue Agents with PARADISE: Two Case Studies
, 1998
Cited by 32 (3 self)
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating and comparing the performance of spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity. After presenting PARADISE, we illustrate its application to two different spoken dialogue agents. We show how to derive a performance function for each agent and how to generalize results across agents. We then show that once such a performance function has been derived, it can be used both for making predictions about future versions of an agent and as feedback to the agent, so that the agent can learn to optimize its behavior based on its experiences with users over time.
Production Models As A Structural Basis For Automatic Speech Recognition
, 1996
Cited by 21 (1 self)
We postulate in this paper that highly structured speech production models will have much to contribute to the ultimate success of speech recognition in view of the weaknesses of the theoretical foundation underpinning current technology. These weaknesses are analyzed in terms of phonological modeling and of phonetic-interface modeling. We conclude by suggesting that many of the advantages to be gained from interaction between the speech production and speech recognition communities will develop from integrating models from the production community with the probabilistic analysis-by-synthesis strategy currently used by the technology community. RÉSUMÉ In this paper, we propose that speech production models will contribute greatly to the eventual success of automatic recognition models, which are currently limited by the weaknesses of the theoretical basis of current technology. We analyze these weaknesses at the level of phonological models and ...
Generalizing Prosodic Prediction Of Speech Recognition Errors
- In Proceedings of the 6th International Conference on Spoken Language Processing (ICSLP-2000)
, 2000
Cited by 14 (3 self)
Since users of spoken dialogue systems have difficulty correcting system misconceptions, it is important for automatic speech recognition (ASR) systems to know when their best hypothesis is incorrect. We compare results of previous experiments which showed that prosody improves the detection of ASR errors to experiments with a new system and new domain, the W99 conference registration system. Our new results again show that prosodic features can improve prediction of ASR misrecognitions over the use of other standard techniques for ASR rejection.
Lattice Parsing for Speech Recognition
- In Proceedings of 6me
, 1999
Cited by 13 (3 self)
A lot of work remains to be done on better integration of speech recognition and language processing systems. This paper gives an overview of several strategies for integrating linguistic models into speech understanding systems and investigates several ways of producing sets of hypotheses that include more "semantic" variability than usual language models. The main goal is to present, and demonstrate by actual experiments, that sequential coupling may be efficiently achieved by word-lattice syntactic analyzers, which efficiently parse the huge number of hypotheses (i.e. possible sentences) contained in the lattice produced by the speech recognizer. 1. Motivations The past decade has seen significant progress in speech recognition technology: word (recognition) error rates continue to drop by a factor of 2 every two years (Rabiner et al., 1996) and high performance systems are now becoming available. Several factors have contributed to this rapid progress: Generalisati...
Parallel Structure in an Integrated Speech-Recognition Network
- In EuroPar'99
, 1999
Cited by 3 (3 self)
Large-vocabulary continuous-speech recognition (LVCR) speaker-independent systems which integrate cross-word context dependent acoustic models and n-gram language models are difficult to parallelize because of their interwoven structure, large dynamic data structures, and complex object-oriented software design. This paper shows how retrospective decomposition can be achieved if a quantitative analysis is made of dynamic system behaviour. A design which accommodates unforeseen effects and future modifications is presented. 1 Introduction Two varieties of LVCR system exist: a pipelined structure in which components of acoustic matching and language modelling are separated; and an approach which integrates cross-word context dependent acoustic models and n-gram language models into the search. The former has been thought to be more computationally tractable [1], while the latter has delivered a low mean error rate, 8.2% per word in ARPA evaluation, for a 65k vocabulary, tri-gra...
Improvements in Japanese Broadcast News Transcription
- Proc. DARPA Broadcast News Workshop
, 1999
Cited by 3 (1 self)
This paper reports on recent improvements in Japanese broadcast news transcription and topic extraction. We constructed a language model that depends on the readings of words in order to prevent recognition errors caused by context-dependent readings of Japanese characters. We also introduced interjection modeling into the language model. To improve the model's performance for a series of sentences spoken by one speaker, an on-line incremental speaker adaptation was applied. We investigated a method for extracting topic-words from the speech recognition results that was based on a significance measure. This paper also proposes a new formulation for speech recognition/understanding systems, in which the a posteriori probability of a message that the speaker intends to address given an observed acoustic sequence is maximized. We applied the formulation to rescoring the recognition hypotheses. 1. INTRODUCTION We have been developing a large-vocabulary continuous-speech recognition (LVC...

