Robust Grammatical Analysis for Spoken Dialogue Systems
- Natural Language Engineering, 1997
Cited by 49 (8 self)
We argue that grammatical analysis is a viable alternative to concept spotting for processing spoken input in a practical spoken dialogue system. We discuss the structure of the grammar, and a model for robust parsing which combines linguistic sources of information and statistical sources of information. We discuss test results suggesting that grammatical processing allows fast and accurate processing of spoken input.
A Dutch treatment of an elitist approach to articulatory-acoustic feature classification
- Proc. Eurospeech, 2001
Cited by 16 (4 self)
A novel approach to articulatory-acoustic feature extraction has been developed for enhancing the accuracy of classification associated with place and manner of articulation information. This “elitist” approach is tested on a corpus of spontaneous Dutch using two different systems, one trained on a subset of the same corpus, the other trained on a corpus from a different language (American English). The feature dimensions of voicing and manner of articulation transfer relatively well between the two languages. However, place information transfers less well. Manner-specific training can be used to improve classification of articulatory place information.
Obtaining Phonetic Transcriptions: A Comparison between Expert Listeners and a Continuous Speech Recognizer
- Language and Speech, 2001
Cited by 12 (7 self)
In this article, we address the issue of using a continuous speech recognition tool to obtain phonetic or phonological representations of speech. Two experiments were carried out in which the performance of a continuous speech recognizer (CSR) was compared to the performance of expert listeners in a task of judging whether a number of prespecified phones had been realized in an utterance. In the first experiment, nine expert listeners and the CSR carried out exactly the same task: deciding whether a segment was present or not in 467 cases. In the second experiment, we expanded on the first experiment by focusing on two phonological processes: schwa-deletion and schwa-insertion. The results of these experiments show that significant differences in performance were found between the CSR and the listeners, but also between individual listeners. Although some of these differences appeared to be statistically significant, their magnitude is such that they may very well be acceptable depending on what the transcriptions are needed for. In other words, although the CSR is not infallible, it makes it possible to explore large datasets, which might outweigh the errors introduced by the mistakes the CSR makes. For these reasons, we can conclude that the CSR can be used instead of a listener to carry out this type of task: deciding whether a phone is present or not.
A Syllable, Articulatory-Feature, and Stress-Accent Model of Speech Recognition
- 2002
Cited by 11 (4 self)
Current-generation automatic speech recognition (ASR) systems assume that words are readily decomposable into constituent phonetic components (“phonemes”). A detailed linguistic dissection of state-of-the-art speech recognition systems indicates that the conventional phonemic “beads-on-a-string” approach is of limited utility, particularly with respect to informal, conversational material. The study shows that there is a significant gap between the observed data and the pronunciation models of current ASR systems. It also shows that many important factors affecting recognition performance are not modeled explicitly in these systems.
Experiments With Linear Feature Extraction In Speech Recognition
- In Proc. Europ. Conf. on Speech Communication and Technology, 1995
Cited by 8 (3 self)
In this paper we investigate Linear Discriminant Analysis (LDA) for the TI connected digit recognition task (TI task) and the Wall Street Journal large vocabulary recognition task (WSJ task). In addition to previous variants of LDA implementations, we avoided the explicit incorporation of derivatives in the acoustic vector. Instead, a sliding window without derivatives was used. This large-sized vector was then taken to extract the features by an LDA transformation. Tests for this feature generation were performed both for Laplacian and Gaussian densities.
1. INTRODUCTION It is a well-known fact that the performance of a pattern recognition system depends heavily on the type of observations that are used in the system. Several methods are employed in practice which often consist of two stages: First, the acoustic signal is transformed from the time domain into the frequency domain using a Fourier transformation or the like. Second, the spectral components of the resulting acoustic vector are th...
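The sliding-window idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stack_frames` and `lda_transform`, the window size, the edge padding, and the small ridge term are all assumptions made for illustration, and the LDA shown is the textbook scatter-matrix formulation.

```python
import numpy as np

def stack_frames(frames, context):
    """Concatenate each frame with `context` neighbours on both sides.

    Edge frames are repeated as padding (an assumption for this sketch).
    """
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    n = frames.shape[0]
    return np.hstack([padded[i:i + n] for i in range(2 * context + 1)])

def lda_transform(x, labels, n_out):
    """Return a (dim, n_out) projection that maximises between-class
    scatter relative to within-class scatter (textbook LDA)."""
    mean = x.mean(axis=0)
    dim = x.shape[1]
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        xc = x[labels == c]
        mc = xc.mean(axis=0)
        s_w += (xc - mc).T @ (xc - mc)
        s_b += len(xc) * np.outer(mc - mean, mc - mean)
    s_w += 1e-6 * np.eye(dim)  # small ridge so s_w is invertible (assumption)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_w, s_b))
    order = np.argsort(-eigvals.real)[:n_out]
    return eigvecs[:, order].real
```

A typical use would stack the static frames first, then project the stacked vectors, for example `stack_frames(frames, 2) @ lda_transform(stacked, labels, n_out)`, so that temporal context is captured by the window rather than by explicit derivative features.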
A Spoken Dialogue System For Public Transport Information
- Proc. of the Dept. of Language and Speech, 1996
Cited by 7 (1 self)
In 1995 our department was involved in two projects in the field of continuous speech recognition. The main aim of these two strongly related projects was the development of basic technology that can be used to build advanced telephone-based systems for providing information about public transport. A short description of the work carried out within these projects is provided in the present article.
1. Introduction During the last decade the performance of spoken dialogue systems has improved substantially. At the moment, the quality of these systems seems to be able to support a number of simple practical tasks in small and clearly delimited domains. As a result, much effort is now being spent on developing prototype telephone-based information systems in different countries. These systems are reminiscent of the well-known Air Travel Information System (ATIS) task that has been a focal point in the American ARPA-project. In Europe two MLAP (Multi-Lingual Action Plan) projects concerning p...
Pronunciation Variation Modelling In A Model Of Human Word Recognition
- 2002
Cited by 5 (2 self)
Due to pronunciation variation, many insertions and deletions of phones occur in spontaneous speech. The psycholinguistic model of human speech recognition Shortlist is not well able to deal with phone insertions and deletions and is therefore not well suited for dealing with real-life input. The research presented in this paper explains how Shortlist can benefit from pronunciation variation modelling in dealing with real-life input.
Goal-Directed ASR in a Multimedia Indexing and Searching Environment (MUMIS)
- 2002
Cited by 4 (1 self)
This paper describes the contribution of automatic speech recognition (ASR) within the framework of MUMIS (Multimedia Indexing and Searching Environment). The domain is football commentaries. The initial results of carrying out ASR on Dutch and English football commentaries are presented. We found that overall word error rates are high, but application specific words are recognized reasonably well. The difficulty of the ASR task is greatly increased by the high levels of noise present in the material.
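The abstract does not say how word error rate was computed. As a minimal sketch, assuming the standard definition (word-level edit distance divided by the number of reference words), it could be calculated as follows; the function name `word_error_rate` and the whitespace tokenisation are assumptions for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Because every reference word counts equally, an overall rate computed this way can be high even when a small set of application-specific words is recognized well, which is consistent with the contrast the abstract draws.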
Hidden Model Sequence Models for Automatic Speech Recognition
- 2001
Cited by 4 (0 self)
Most modern automatic speech recognition systems make use of acoustic models based on hidden Markov models. To obtain reasonable recognition performance within a large vocabulary framework, the acoustic models usually include a pronunciation model, together with complex parameter tying schemes. In many cases the pronunciation model operates on a phoneme level and is derived independently of the underlying models. In contrast, this work is aimed at improving pronunciation modelling on a sub-phone level in a combined framework. The modelling of pronunciation variation is assumed to be of special importance for recognition of spontaneous speech.
On automatic phonetic transcription quality: lower word error rates do not guarantee better transcriptions
- 2004
Cited by 3 (2 self)
The first goal of this study was to investigate the effect of changing several properties of a continuous speech recognizer (CSR) on the automatic phonetic transcriptions generated by the same CSR. Our results show that the quality of the automatic transcriptions can be improved by using “short” hidden Markov models (HMMs) and by reducing the amount of contamination in the HMMs. The amount of contamination can be reduced by training the HMMs on the basis of a transcription that better matches the actual pronunciation, e.g., by modeling pronunciation variation or by training HMMs on read speech. Furthermore, we found that context-dependent HMMs should preferably not be trained on baseline transcriptions if there is a mismatch between these baseline transcriptions of the speech material and the realized pronunciation. Finally, we found that by combining the changes in the properties of the CSR, the quality of automatic transcription can be further improved. The second

