Repetition and its phonetic realizations: Investigating a Swedish database of spontaneous computer-directed speech
- In Proceedings of ICPhS-99, San Francisco. International Congress of Phonetic Sciences, 1999
Cited by 27 (5 self)
This paper is an investigation of repetitive utterances in a Swedish database of spontaneous computer-directed speech. A spoken dialogue system was installed in a public location in downtown Stockholm and spontaneous human-computer interactions with adults and children were recorded [1]. Several acoustic and prosodic features such as duration, shifting of focus and hyperarticulation were examined to see whether repetitions could be distinguished from what the users first said to the system. The present study indicates that adults and children use partly different strategies as they attempt to resolve errors by means of repetition. As repetition occurs, duration is increased and words are often hyperarticulated or contrastively focused. These results could have implications for the development of future spoken dialogue systems with robust error handling.
On the influence of hyperarticulated speech on recognition performance
- In Proceedings of ICSLP-98, Sydney. International Conference on Spoken Language Processing, 1998
Cited by 19 (1 self)
Since we cannot exclude that speech recognizers fail sometimes, it is important to examine how users react to recognition errors. In correction situations, speaking style becomes more accentuated to disambiguate the original mistake. We examine the effect of speaking style in such situations on speech recognition performance. Our results indicate that hyperarticulated effects occur in correction situations and decrease word accuracy significantly.
Speech Technology on Trial: Experiences from the August System
- Natural Language Engineering, 2000
Cited by 17 (8 self)
In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.
Acoustic Models for Hyperarticulated Speech
, 2000
Cited by 12 (0 self)
In spoken dialogue systems, hyperarticulation occurs as an effect of users trying to recover from previous recognition errors. It is commonly observed that real users in particular apply recovery strategies similar to those in human-human interactions. Previous studies have shown that current speech recognizers cannot handle hyperarticulated speech. As an effect of higher word error rates on hyperarticulated speech, humans try to reinforce this speaking style, which results in even more recognition errors. In this paper, we present approaches to building robust acoustic models for hyperarticulated speech. One key point is that the change of acoustic features under hyperarticulation is a phone-dependent effect. The idea is to use the likelihood criterion to decide which phones should be treated separately. This can be done by incorporating dynamic questions about hyperarticulation into the clustering stage. Based on such a phonetic decision tree, we can generate appropriate acoustic models. With this method, we achieved a wo...
Linguistic adaptations in spoken human-computer dialogues -- Empirical studies of user behavior
, 2003
Speech and Speech Recognition during Dictation Corrections
Cited by 2 (2 self)
A natural way to correct errors made while dictating to a computer is to respeak portions of the original sentence. But often spoken corrections are themselves misrecognized, costing the user time and testing their patience. To better understand how users behave while correcting, I created a simulated dictation interface and fooled users into believing they were correcting errors by respeaking. I found that users not only hyperarticulate during corrections, but they do so preemptively before any misrecognition. Depending on the recognizer, hyperarticulation was found to cause relatively minor changes in error rate. The correction of isolated words or phrases was more troublesome, causing substantial recognition problems for an HTK recognizer. Dragon Naturally Speaking, on the other hand, performed slightly better on hyperarticulated speech and only degraded slightly on isolated corrections. Index Terms: speech recognition, error correction, dictation, hyperarticulation, correcting by respeaking
Improvements On Speech Recognition For Fast Talkers
, 1999
The accuracy of a speech recognition (SR) system depends on many factors, such as the presence of background noise, mismatches in microphone and language models, variations in speaker, accent and even speaking rates. In addition to fast speakers, even normal speakers will tend to speak faster when using a speech recognition system in order to get higher throughput. Unfortunately, state-of-the-art SR systems perform significantly worse on fast speech. In this paper, we present our efforts in making our system more robust to fast speech. We propose cepstrum length normalization, applied to the incoming testing utterances, which results in a 13% word error rate reduction on an independent evaluation corpus. Moreover, this improvement is additive to the contribution of Maximum Likelihood Linear Regression (MLLR) adaptation. Together with MLLR, a 23% error rate reduction was achieved.

