Multimodal Interfaces
Artificial Intelligence Review Journal, special issue, 1994
Cited by 23 (3 self)
In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal of such interfaces is to free human-computer interaction from the limitations and acceptance barriers imposed by rigid operating commands and by keyboards as the only or main I/O device. Instead, we move to involve all available human communication modalities. These human modalities include Speech, Gesture and Pointing,
Recent Advances In JANUS: A Speech Translation System
1993
Cited by 20 (10 self)
We present recent advances from our efforts in increasing coverage, robustness, generality and speed of JANUS, CMU's speech-to-speech translation system. JANUS is a speaker-independent system which translates spoken utterances in English and also in German into one of German, English or Japanese. The system has been designed around the task of conference registration (CR). It was initially built on a speech database of 12 read dialogs, encompassing a vocabulary of around 500 words. We have since been expanding the system along several dimensions to improve speed, robustness and coverage and to move toward spontaneous input.
1. INTRODUCTION
In this paper we describe recent improvements of JANUS, a speech-to-speech translation system. Improvements have been made mainly along the following dimensions: 1.) better context-dependent modeling improves performance in the speech recognition module, 2.) improved language models, smoothing, and word equivalence classes improve coverage ...
JANUS 93: Towards Spontaneous Speech Translation
1994
Cited by 19 (12 self)
We present first results from our efforts toward translation of spontaneously spoken speech. Improvements include increasing coverage, robustness, generality and speed of JANUS, the speech-to-speech translation system of Carnegie Mellon and Karlsruhe University. The recognition and machine translation engines have been upgraded to deal with requirements introduced by spontaneous human-to-human dialogs. To allow for development and evaluation of our system on adequate data, a large database of spontaneous scheduling dialogs is being gathered for English, German and Spanish.
1. OVERVIEW
JANUS [1, 2] was among the early systems to attempt the translation of spoken dialogs. It was initially built on a speech database of 12 read dialogs of the conference registration task, encompassing a vocabulary of around 500 words. It was designed as a speaker-independent system which translates spoken utterances from English and also from German into one of German, English or Japanese. Speech...
Multimodal Human-Computer Interaction
In Proceedings of ISSD'93. (Waseda, 1993
Cited by 9 (2 self)
While human-to-human communication takes advantage of an abundance of information and cues, human-computer interaction is limited to only a few input modalities (usually only keyboard and mouse) and provides little flexibility as to choice of communication modality. In this paper, we present an overview of a family of research projects we are undertaking at Carnegie Mellon and Karlsruhe University to overcome some of these human-computer communication barriers. Multimodal interfaces are to include not only typing, but speech, lip-reading, eye-tracking, face recognition and tracking, and gesture and handwriting recognition. Initial experiments aimed at exploiting the complementary nature of these alternate modalities in interpreting user intent in a user interface are discussed.
Explicit N-Best Formant Features for Segment-Based Speech Recognition
1996
This thesis investigates the use of explicit speech knowledge in computer speech recognition. Speech knowledge is generally expressed in terms of acoustic events occurring near phonetic segment boundaries and the location, shape and dynamics of formant trajectories. This suggests the creation of a segment-based recognition framework and the use of explicit formant features in a flexible integration scheme to ultimately improve the phonetic recognition accuracy. We describe a segmentation algorithm that produces a lattice of segment hypotheses, each with an associated broad phonetic identity. We build a single phonetic segment classifier along with separate vowel/semi-vowel and consonant classifiers based on traditional cepstral features, paying attention to reducing the mismatch between training and deployment conditions. We develop a robust, N-best formant tracking algorithm that generates a list of up to N consistent formant interpretations. The use of the N-best feature paradigm is based on the observation that there are generally only a handful of reasonable interpretations of the given formant information. Instead of finding the best formant interpretation through the use of a global cost function that includes energy maximization and smoothness terms, we delay the selection of the correct formant interpretation until after the segment classification and phonetic search. We use the formant interpretations to extract features for a vowel/semi-vowel segment classifier. The formant trajectories are approximated either by three line segments or by a third-order Legendre polynomial. We show that together with formant amplitude, formant bandwidth, pitch, and segment durations we can produce a classifier of comparable performance to a cepstral-based classifier. We further demonstrate the potential of the N-best classification paradigm and show that a combination of formant and cepstral features further improves the classification accuracy. Finally, the validity of the overall approach (a segment-based framework, separate classifiers for vowels and consonants, and explicit formant features) is verified by phonetic recognition experiments.
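As a rough illustration of the trajectory approximation mentioned in this abstract, the sketch below fits a third-order Legendre expansion to a single formant track by least squares, yielding four coefficients that could serve as segment features. It is not taken from the thesis: the function names, the mapping of frame indices onto [-1, 1], the least-squares criterion, and the sample F2 values are all assumptions made for the example.

```python
def legendre_basis(x):
    """Values of the Legendre polynomials P0..P3 at x in [-1, 1]."""
    return [1.0, x, 0.5 * (3 * x * x - 1), 0.5 * (5 * x ** 3 - 3 * x)]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_legendre3(track_hz):
    """Least-squares fit of a third-order Legendre expansion to one formant track.

    Assumes at least four frames; returns the coefficients of P0..P3.
    """
    n = len(track_hz)
    # Map frame indices onto [-1, 1], the natural domain of Legendre polynomials.
    xs = [2.0 * i / (n - 1) - 1.0 for i in range(n)]
    rows = [legendre_basis(x) for x in xs]
    # Normal equations (B^T B) c = B^T y for the 4 coefficients.
    btb = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    bty = [sum(r[i] * y for r, y in zip(rows, track_hz)) for i in range(4)]
    return solve(btb, bty)


def evaluate(coeffs, x):
    """Evaluate the fitted expansion at x in [-1, 1]."""
    return sum(c * p for c, p in zip(coeffs, legendre_basis(x)))


# Hypothetical F2 track in Hz over nine frames (illustrative values only).
track = [1200.0, 1260.0, 1340.0, 1430.0, 1500.0, 1540.0, 1550.0, 1530.0, 1490.0]
coeffs = fit_legendre3(track)
print(coeffs)
```

In this sketch the first coefficient tracks the overall level of the formant, the second its slope, and the higher ones its curvature, which is one reason a low-order orthogonal expansion is a compact way to describe location, shape and dynamics.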

