The Thoughtful Elephant: Strategies for Spoken Dialog Systems
- IEEE Transactions on Speech and Audio Processing, 2000
Cited by 19 (0 self)
In this paper we present technology used in spoken dialog systems for applications of a wide range. They include tasks from the travel domain and automatic switchboards as well as large scale directory assistance. The overall goal in developing spoken dialog systems is to allow for a natural and flexible dialog flow similar to human-human interaction. This imposes the challenging task to recognize and interpret user input, where he/she is allowed to choose from an unrestricted vocabulary and an infinite set of possible formulations. We therefore put emphasis on strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: 1) to consider available sources of information as early as possible, and 2) to keep alternative hypotheses and delay the decision for a single option as long as possible. We describe
Automatic Construction of Unique Signatures and Confusable Sets for Natural Language Directory Assistance Application
- In Proc. Eurospeech 2003
Cited by 9 (5 self)
This paper addresses the problem of building natural language based grammars and language models for directory assistance applications that use automatic speech recognition. As input, one is given an electronic version of a standard phone book, and the output is a grammar or language model that will accept all the ways in which one might ask for a particular listing. We focus primarily on the problem of processing listings for businesses and government offices, but our techniques can be used to speech-enable other kinds of large listings (like book titles, catalog entries, etc.). We have applied these techniques to the business listings of a state in the Midwestern United States, and we present highly encouraging recognition results.
Assessment of Dialogue Systems By Means of a New Simulation Technique
- 2002
Cited by 8 (1 self)
In recent years, a question of great interest has been the development of tools and techniques to facilitate the evaluation of dialogue systems. The latter can be evaluated from various points of view, such as recognition and understanding rates, dialogue naturalness and robustness against recognition errors. Evaluation usually requires compiling a large corpus of words and sentences uttered by users, relevant to the application domain the system is designed for. This paper proposes a new technique that makes it possible to reuse such a corpus for the evaluation and to check the performance of the system when different dialogue strategies are used. The technique is based on the automatic generation of conversations between the dialogue system and an additional dialogue system representing the user's interaction with the dialogue system. The technique has been applied to evaluate a dialogue system developed in our lab using two different recognition front-ends and two different dialogue strategies to handle user confirmations. The experiments show that the prompt-dependent recognition front-end achieves better results, but that this front-end is appropriate only if users limit their utterances to those related to the current system prompt. The prompt-independent front-end achieves inferior results, but enables front-end users to utter any permitted utterance at any time, independently of the system prompt. In consequence, this front-end may allow a more natural and comfortable interaction. The experiments also show that the re-prompting confirmation strategy enhances system performance for both recognition front-ends.
Error Detection and Recovery in Spoken Dialogue Systems
- In Proc. Workshop on Spoken Language Understanding for Conversational Systems, 2004
Cited by 5 (2 self)
This paper describes our research on both the detection and subsequent resolution of recognition errors in spoken dialogue systems. The paper consists of two major components. The first half concerns the design of the error detection mechanism for resolving city names in our MERCURY flight reservation system, and an investigation of the behavioral patterns of users in subsequent subdialogues involving keypad entry for disambiguation. An important observation is that, upon a request for keypad entry, users are frequently unresponsive to the extent of waiting for a time-out or hanging up the phone. The second half concerns a pilot experiment investigating the feasibility of replacing the solicitation of a keypad entry with that of a "speak-and-spell" entry. A novelty of our work is the introduction of a speech synthesizer to simulate the user, which facilitates development and evaluation of our proposed strategy. We have
Voice User Interface Design for Automated Directory Assistance
- In Proc. Interspeech, 2005
Cited by 4 (0 self)
This paper focuses on the challenges that one encounters when building an automated system for Directory Assistance (DA) for commercial deployment. The design of an automated DA system needs to take into account constraints and requirements that arise from three distinct aspects of the application, namely, the business drivers, the user needs, and the strengths and weaknesses of voice technologies.
Exploiting the ASR N-Best by Tracking Multiple Dialog State Hypotheses
- In Proc. Interspeech, 2008
Cited by 4 (0 self)
When the top ASR hypothesis is incorrect, often the correct hypothesis is listed as an alternative in the ASR N-Best list. Whereas traditional spoken dialog systems have struggled to exploit this information, this paper argues that a dialog model that tracks a distribution over multiple dialog states can improve dialog accuracy by making use of the entire N-Best list. The key element of the approach is a generative model of the N-Best list given the user’s true hidden action. An evaluation on real dialog data verifies that dialog accuracy rates are improved by making use of the entire N-Best list. Index Terms: dialogue modelling, dialogue management, spoken dialogue systems, confidence score, N-Best list
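The idea of tracking a distribution over dialog states from the whole N-Best list can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's generative model: the observation model (an entry's confidence if the goal is on the list, the unassigned confidence mass spread over unlisted goals otherwise), the function name `update_belief`, and the city values are all hypothetical.

```python
def update_belief(prior, n_best):
    """One Bayesian update of a belief over user goals from an ASR N-best list.

    prior:  dict mapping each candidate goal to its prior probability.
    n_best: list of (hypothesis, confidence) pairs; confidences sum to <= 1.
    Assumed observation model (hypothetical, not taken from the paper):
    the likelihood of the list given a goal is that goal's confidence if it
    is listed, otherwise the unassigned mass shared among unlisted goals.
    """
    listed = dict(n_best)
    leftover = max(0.0, 1.0 - sum(listed.values()))
    unlisted = [g for g in prior if g not in listed]
    off_list = leftover / len(unlisted) if unlisted else 0.0

    unnormalized = {g: p * listed.get(g, off_list) for g, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(prior)  # the observation carries no usable information
    return {g: v / total for g, v in unnormalized.items()}


# Two turns in which the intended city is never the top hypothesis.
prior = {"boston": 0.25, "austin": 0.25, "houston": 0.25, "dallas": 0.25}
turn1 = [("austin", 0.5), ("boston", 0.4)]
turn2 = [("houston", 0.45), ("boston", 0.45)]
belief = update_belief(update_belief(prior, turn1), turn2)
best = max(belief, key=belief.get)
```

In this toy run the top hypothesis is "austin" in the first turn and "houston" in the second, yet the accumulated belief favors "boston", the only goal that appears on both lists. A system that kept only the top hypothesis per turn would not recover it.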
A Turbo-Style Algorithm for Lexical Baseforms Estimation
Cited by 2 (2 self)
In this research, an iterative and unsupervised Turbo-style algorithm is presented and implemented for the task of automatic lexical acquisition. The algorithm makes use of spoken examples of both spellings and words and fuses information from letter and subword recognizers to boost the overall lexical learning performance. The algorithm is tested on a challenging lexicon of restaurant and street names and evaluated in terms of spelling accuracy and letter error rate. Absolute improvements of 7.2% and 3% (15.5% relative improvement) are obtained in the spelling accuracy and the letter error rate respectively following only 2 iterations of the algorithm. Index Terms: Turbo-style, spelling, pronunciation, lexical acquisition
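The two evaluation measures named in the abstract can be computed as follows. This is a minimal sketch assuming the usual definitions (letter error rate as Levenshtein edits divided by reference letters, spelling accuracy as the fraction of exact matches); the abstract does not spell out its scoring procedure, and the example names are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two letter sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference letter
                            curr[j - 1] + 1,      # insertion of a hypothesis letter
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def letter_error_rate(pairs):
    """Total letter edits divided by total reference letters."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    letters = sum(len(ref) for ref, _ in pairs)
    return errors / letters


def spelling_accuracy(pairs):
    """Fraction of hypotheses that match the reference spelling exactly."""
    return sum(ref == hyp for ref, hyp in pairs) / len(pairs)


# Hypothetical (reference, hypothesis) spellings, for illustration only.
pairs = [("marliave", "marliave"), ("beacon", "bacon"), ("tremont", "tremond")]
ler = letter_error_rate(pairs)
acc = spelling_accuracy(pairs)
```

Here two letter errors over 21 reference letters give a letter error rate of 2/21, while only one of the three names is spelled exactly right. The two measures can therefore move quite differently, which is presumably why the abstract reports both.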
Detection of Recognition Errors and Out of the Spelling Dictionary Names in a Spelled Name Recognizer for Spanish
Cited by 1 (1 self)
This paper deals with improved confidence assessment for detecting recognition errors and out-of-dictionary names in a Spanish recognizer of continuously spelled names over the telephone. We present a hypothesis-verification approach for spelled name recognition. We evaluate the system for several dictionaries, obtaining a recognition rate above 90.0% for a 10,000-name dictionary. For confidence scoring, we consider several features obtained from the different recognition stages. The paper investigates the ability of each feature set to detect recognition errors and names outside the spelling dictionary. We use a neural network to combine all the features in order to obtain the best confidence annotation. Using the data collected from 1,000 phone calls, it is shown that 57.9% of incorrectly recognized names and 68.3% of names outside the spelling dictionary are detected at a 5% false rejection rate.
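Figures of the form "X% detected at a 5% false rejection rate" come from fixing an operating point on the confidence score. The sketch below shows one common way to do that, assuming a single scalar confidence per name; it does not reproduce the paper's neural network, and the function names and scores are hypothetical.

```python
import math


def threshold_at_false_rejection(correct_scores, max_false_rejection=0.05):
    """Largest threshold t such that rejecting scores below t discards at
    most the given fraction of correctly recognized names."""
    ordered = sorted(correct_scores)
    # Number of correct items we may reject (small epsilon guards float error).
    allowed = math.floor(max_false_rejection * len(ordered) + 1e-9)
    return ordered[allowed]


def detection_rate(error_scores, threshold):
    """Fraction of erroneous recognitions whose confidence falls below the threshold."""
    return sum(s < threshold for s in error_scores) / len(error_scores)


# Hypothetical confidence scores, for illustration only.
correct_scores = [0.30, 0.60, 0.62, 0.65, 0.68, 0.70, 0.72, 0.75, 0.78, 0.80,
                  0.82, 0.85, 0.88, 0.90, 0.91, 0.93, 0.95, 0.96, 0.98, 0.99]
error_scores = [0.10, 0.20, 0.40, 0.50, 0.58, 0.70, 0.80, 0.90]

threshold = threshold_at_false_rejection(correct_scores, 0.05)
detected = detection_rate(error_scores, threshold)
false_rejection = sum(s < threshold for s in correct_scores) / len(correct_scores)
```

With these made-up scores the threshold lands at 0.60: one of the twenty correct names is rejected (5%), and five of the eight errors are caught. The same procedure applied separately to out-of-dictionary names would give the second kind of detection rate the abstract mentions.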
Grapheme Based Speech Recognition For Large Vocabularies
- In Proc. ICSLP '00, 2000
Cited by 1 (0 self)
Common speech recognition systems use phonetically motivated subword units. To utilize words in these systems, one has to translate the available graphemic word representation into a phonetic one. To reduce this manual effort we propose to build grapheme-based recognition systems. They can be used as speech interfaces for devices that can provide a graphemic representation of words, such as the city names in navigation systems. Results of experiments on a 10,000-word lexicon of German cities are presented.
Knowledge-Combining Methodology for Dialogue Design in Spoken Language Systems
In this paper, we propose a strategy for designing dialogue managers in spoken dialogue systems for a restricted domain. This strategy combines several information sources (intuition, observation and simulation) in order to maximize the adaptation between the system capability and the expectation of the user. These sources are combined by an iterative process consisting of five steps, where different dialogue alternatives are proposed and evaluated sequentially. The evaluation process includes different measures depending on the information required. Several measures are proposed and analyzed in each step. We also describe a user-modeling technique and an approach for designing the confirmation sub-dialogues based on recognition confidence measures. The knowledge-combining methodology is described and applied to a railway information system. In a subjective evaluation, users from the university gave the system a score of 3.9 on a 5-point scale, with an average call duration of 205 seconds. The employees of the railway company were more critical of the system. They gave it a score of 2.1 even though the system resolved more than half of the calls (57.8%), with an average call duration of about three minutes (185 seconds).

