Results 1 - 10 of 57
Designing the User Interface for Multimodal Speech and Pen-based Gesture Applications: State-of-the-Art Systems and Future Research Directions
2000
Cited by 102 (14 self)
The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, to be usable by a broader spectrum of the average population, and to function more reliably under realistic and challenging usage conditions. In this paper, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches and the new hybrid symbolic/statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multi-person use. Before this new class of systems can proliferate, toolkits also will be needed to promote software development for both simulated and functioning systems.
Robust Continuous Speech Recognition Using Parallel Model Combination
IEEE Transactions on Speech and Audio Processing, 1996
Cited by 78 (5 self)
This paper addresses the problem of automatic speech recognition in the presence of interfering noise. It focuses on the Parallel Model Combination (PMC) scheme, which has been shown to be a powerful technique for achieving noise robustness. Most experiments reported on PMC to date have been on small, 10-50 word vocabulary systems. Experiments on the Resource Management (RM) database, a 1000 word continuous speech recognition task, reveal compensation requirements not highlighted by the smaller vocabulary tasks. In particular, it is necessary to compensate the dynamic parameters as well as the static parameters to achieve good recognition performance. The database used for these experiments was the RM speaker independent task with either Lynx Helicopter noise or Operation Room noise from the NOISEX-92 database added. The experiments reported here used the HTK RM recogniser developed at CUED, modified to include PMC based compensation for the static, delta and delta-delta parameters. After training on clean speech data, the performance of the recogniser was found to be severely degraded when noise was added to the speech signal at between 10 dB and 18 dB. However, using PMC the performance was restored to a level comparable with that obtained when training directly in the noise corrupted environment.
Understanding Speech Understanding: Towards A Unified Theory Of Speech Perception
1996
Cited by 36 (6 self)
Ever since Helmholtz, the perceptual basis of speech has been associated with the energy distribution across frequency. However, there is now accumulating evidence that speech understanding does not require a detailed spectral portraiture of the signal. As a consequence, a new theoretical perspective, focused on time, is beginning to emerge. This framework emphasizes the temporal evolution of coarse spectral patterns as the primary carrier of information within the speech signal, and provides an efficient and effective means of shielding linguistic information against the potentially hostile forces of the natural soundscape, such as reverberation and background acoustic interference. The auditory system may extract this relational information through computation of the low-frequency modulation spectrum in the auditory cortex, and this representation provides a principled basis for segmentation of the speech signal into syllabic units. Because of the systematic relationship between the syllable and higher-level lexicogrammatical organization, it is possible, in principle, to gain direct access to the lexicon and grammar through such an auditory analysis of speech.
Uncertainty decoding for noise robust speech recognition
In Proc. Interspeech, 2004
Cited by 26 (8 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Predictive Model-Based Compensation Schemes for Robust Speech Recognition
Speech Communication, 1998
Cited by 17 (1 self)
For practical applications, speech recognition systems need to be insensitive to differences between training and test acoustic conditions. Differences in the acoustic environment may result from various sources, such as ambient background noise, channel variations and speaker stress. These differences can dramatically degrade the performance of a speech recognition system. A wide range of techniques have been proposed for achieving noise robustness. This paper considers one particular approach to model-based compensation, predictive model-based compensation, which has been shown to achieve good noise robustness in a wide range of acoustic environments. The characteristic of these schemes is that they combine a speech model with an additive noise model, a channel model and, in the general case, a speaker stress model, to generate a corrupted-speech model. The general theory of these predictive techniques is discussed. Various approximations for rapidly performing the model combination stage have been proposed and are reviewed in this paper. The advantages and the limitations of such a predictive approach to noise robustness are also discussed. In addition, methods for combining predictive schemes with adaptive schemes, which make use of speech data in the new environment, are detailed. This combined approach overcomes some of the limitations of the predictive schemes. (Author note: the author is now at the IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA.)
Non-Linear Transformations Of The Feature Space For Robust Speech Recognition
Proceedings of ICASSP 2002, 2002
Cited by 14 (4 self)
Noise usually produces a non-linear distortion of the feature space considered for Automatic Speech Recognition. This distortion causes a mismatch between the training and recognition conditions which significantly degrades the performance of speech recognizers. In this contribution we analyze the effect of additive noise on cepstral-based representations and we compare several approaches to compensate for this effect. We discuss the importance of the non-linearities introduced by the noise and we propose a method (based on the histogram equalization technique) specifically oriented to the compensation of the non-linear transformation caused by the additive noise. The proposed method has been evaluated using the AURORA-2 database and task. The recognition results show significant improvements with respect to other compensation methods reported in the bibliography and reveal the importance of the non-linear effects of the noise and the utility of the proposed method.
An Architecture and Interaction Techniques for Handling Ambiguity in Recognition-based Input
2001
Perceptually Inspired Signal-processing Strategies for Robust Speech Recognition in Reverberant Environments
1998
Cited by 12 (0 self)
Natural, hands-free interaction with computers is currently one of the great unfulfilled promises of automatic speech recognition (ASR), in part because ASR systems cannot reliably recognize speech under everyday, reverberant conditions that pose no problems for most human listeners. The specific properties of the auditory representation of speech likely contribute to reliable human speech recognition under such conditions. This dissertation explores the use of perceptually inspired signal-processing strategies -- critical-band-like frequency analysis, an emphasis on slow changes in the spectral structure of the speech signal, adaptation, integration of phonetic information over syllabic durations, and use of multiple signal representations for...

