The Thoughtful Elephant: Strategies for Spoken Dialog Systems
- IEEE Transactions on Speech and Audio Processing, 2000
Cited by 19 (0 self)
In this paper we present technology used in spoken dialog systems for a wide range of applications. They include tasks from the travel domain and automatic switchboards as well as large-scale directory assistance. The overall goal in developing spoken dialog systems is to allow for a natural and flexible dialog flow similar to human-human interaction. This imposes the challenging task of recognizing and interpreting user input, where the user is allowed to choose from an unrestricted vocabulary and an infinite set of possible formulations. We therefore put emphasis on strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: 1) to consider available sources of information as early as possible, and 2) to keep alternative hypotheses and delay the decision for a single option as long as possible. We describe
A voice-controlled automatic telephone switchboard and directory information system
- Speech Communication, 1997
Cited by 17 (5 self)
The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room numbers as well as direct call completion to a desired party. In this paper, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test in our research lab with a 600-entry database. We derive a new maximum-a-posteriori decision rule which incorporates database knowledge and dialogue history as constraints in speech recognition and language understanding. It has improved speech understanding accuracy by 19% (in terms of concept error rate), and reduced attribute substitution errors (e.g. recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach as a combination of state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best algorithm which is also described in this paper. The system conducts a flexible mixed-initiative dialogue rather than using a rigid form-filling scheme, and incorporates database knowledge to optimize the dialogue flow.
Towards an Automated Directory Information System
- Proc. EUROSPEECH, 1997
Cited by 12 (3 self)
This paper describes a design and feasibility study for a large-scale automatic directory information system with a scalable architecture. The current demonstrator, called PADIS-XL, operates in real time and handles a database of a medium-size German city with 130,000 listings. The system uses a new technique of taking a combined decision on the joint probability over multiple dialogue turns, and a dialogue strategy that strives to restrict the search space more and more with every dialogue turn. During the course of the dialogue, the last name of the desired subscriber must be spelled out. The spelling recognizer permits continuous spelling and uses a context-free grammar to parse common spelling expressions. This paper describes the system architecture, our maximum a-posteriori (MAP) decision rule, the spelling grammar, and the dialogue strategy. We give results on the SPEECHDAT and SIETILL databases on recognition of first names by spelling and on jointly deciding on the spelled and the spoken name. In a 35,000-names setup, the joint decision reduced name-recognition errors by 31%.
A Word Graph Based N-Best Search in Continuous Speech Recognition
- 1996
Cited by 9 (2 self)
In this paper, we introduce an efficient algorithm for the exhaustive search of N best sentence hypotheses in a word graph. The search procedure is based on a two-pass algorithm. In the first pass, a word graph is constructed with standard time-synchronous beam search. The actual extraction of N best word sequences from the word graph takes place during the second pass.
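As an illustration of the second pass only, the following is a minimal sketch of extracting the N lowest-cost word sequences from a word graph. It is not the algorithm from the paper: it uses a plain best-first search over partial paths, and the graph layout (a dict mapping a node to `(next_node, word, cost)` arcs), the function name `n_best`, and the toy costs are all assumptions made for this example. Costs are assumed to be non-negative, e.g. negative log probabilities.

```python
import heapq
from itertools import count


def n_best(graph, start, final, n):
    """Return up to n lowest-cost (cost, words) paths from start to final.

    Assumes an acyclic word graph with non-negative arc costs, so complete
    paths are popped from the heap in order of increasing total cost.
    """
    tie = count()  # tie-breaker so the heap never has to compare word lists
    heap = [(0.0, next(tie), start, [])]
    results = []
    while heap and len(results) < n:
        cost, _, node, words = heapq.heappop(heap)
        if node == final:
            results.append((cost, words))
            continue
        # Extend the partial path along every outgoing arc.
        for nxt, word, arc_cost in graph.get(node, []):
            heapq.heappush(heap, (cost + arc_cost, next(tie), nxt, words + [word]))
    return results


# Hypothetical toy word graph: node -> list of (next_node, word, cost).
graph = {
    0: [(1, "call", 1.0), (1, "tall", 2.5)],
    1: [(2, "john", 0.5), (2, "joan", 1.0)],
}
best = n_best(graph, 0, 2, 3)
for cost, words in best:
    print(cost, " ".join(words))
```

The best-first strategy keeps the sketch short, but it stores every partial path explicitly; a practical system would share path prefixes in the graph rather than copy word lists.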
Evaluation Methodologies for Interactive Speech Systems
- In First International Conference on Language Resources and Evaluation (LREC), 1998
Cited by 9 (0 self)
In this paper, several criteria and paradigms are described to measure the performance of spoken language systems. The focus is on the evaluation of natural language understanding components. These evaluations are carried out in the domain of spontaneous human-human interaction as supported by automatic translation systems. They are also applied in the domain of spontaneous human-machine interaction typically used in information retrieval applications. Some system response evaluation paradigms for different applications and domains are discussed in more detail. It is also shown that official performance tests and site-specific evaluation paradigms are complementary in use. 1. Introduction This paper describes and discusses methods and paradigms measuring the performance of a spoken language system for different applications and domains and at different stages of the input processing. The focus is on the evaluation of natural language understanding components. A diagram of a generic spo...
Spoken Language Understanding Within Dialogs Using A Graphical Model Of Task Structure
- Proc. ICSLP 98, 1998
Cited by 9 (5 self)
We describe a procedure for contextual interpretation of spoken sentences within dialogs. Task structure is represented in a graphical form, enabling the interpreter algorithm to be efficient and task-independent. Recognized spoken input may consist either of a single sentence with utterance-verification scores, or of a word lattice with arc weights. A confidence model is used throughout and all inferences are probability-weighted. The interpretation consists of a probability for each class and for each auxiliary information label needed for task completion. Anaphoric references are permitted. 1. INTRODUCTION We are interested in spoken dialog systems in which the caller responds using fluent natural language to the prompt "How may I help you?" (HMIHY). In previous work we have described the speech recognizer [1], automatic acquisition of salient phrase and grammar fragments, and call-type classification [2,3], the dialog manager [4], and the incorporation of utterance verification [...
Context Use to Improve the Speech Understanding Processing
- Proc. SPECOM 2001, 2001
Cited by 2 (1 self)
We developed an environment for speech understanding, based on a stochastic representation of conceptual entities. The conceptual segmentation can be performed with the Viterbi or A* algorithms. The aim of this paper is to propose a model of contextual knowledge to improve the speech understanding process. We will focus on the context provided by the system prompt and the dialogue history. We argue that the context could help the prediction of the conceptual segmentation of the utterance. Results obtained on understanding error rate and running time, with and without contextual knowledge, are discussed.
Belief-Based Nonlinear Rescoring in Thai Speech Understanding
This paper proposes an approach to improve speech understanding based on rescoring of N-best semantic hypotheses. In rescoring, probabilities produced by an understanding component are combined with additional probabilities derived from system beliefs. While a normal rescoring approach is to multiply or linearly interpolate with belief probabilities, this paper shows that probabilities from various sources are better combined using a nonlinear estimator. Using the proposed model together with a dialogue-state dependent semantic model shows a significant improvement when applied to a Thai interactive hotel reservation agent (TIRA), the first spoken dialogue system in the Thai language.
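The two baseline combinations named in this abstract, multiplication and linear interpolation, can be sketched as follows. This is only an illustration of those baselines, not the nonlinear estimator the paper proposes, whose form is not given here. The function name `rescore`, the `weight` parameter, the hypothesis labels, and all probabilities are hypothetical values chosen for the example.

```python
def rescore(hypotheses, belief, weight=0.5, mode="interpolate"):
    """Rerank N-best semantic hypotheses using belief probabilities.

    hypotheses: list of (label, p_understanding) pairs.
    belief:     dict mapping label -> belief probability (0.0 if missing).
    mode:       "multiply" or "interpolate" (linear interpolation).
    """
    rescored = []
    for label, p_understanding in hypotheses:
        p_belief = belief.get(label, 0.0)
        if mode == "multiply":
            score = p_understanding * p_belief
        else:
            # weight = 0.0 keeps the understanding score unchanged.
            score = (1.0 - weight) * p_understanding + weight * p_belief
        rescored.append((label, score))
    # Highest combined score first.
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical N-best list and belief probabilities for a hotel task.
hyps = [("book_room", 0.5), ("cancel_room", 0.4), ("ask_price", 0.1)]
belief = {"book_room": 0.2, "cancel_room": 0.7, "ask_price": 0.1}

top_interp = rescore(hyps, belief)[0][0]
top_mult = rescore(hyps, belief, mode="multiply")[0][0]
print(top_interp, top_mult)
```

In this toy case both baselines promote the second-ranked hypothesis because the belief probabilities strongly favour it.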

