Towards Multi-Domain Speech Understanding with Flexible and Dynamic Vocabulary
, 2001
Cited by 14 (3 self)
In developing telephone-based conversational systems, we foresee future systems capable of supporting multiple domains and flexible vocabulary. Users can pursue several topics of interest within a single telephone call, and the system is able to switch transparently among domains within a single dialog. This system is able to detect the presence of any out-of-vocabulary (OOV) words, and automatically hypothesizes the pronunciation, spelling and meaning of each. These can be confirmed with the user and the new words are subsequently incorporated into the recognizer lexicon for future use. This thesis
Data Reprocessing in Signal Understanding Systems
, 1996
Cited by 14 (2 self)
Frank I. Klassner, III (B.S., University of Scranton; M.S. and Ph.D., University of Massachusetts Amherst). September 1996. Directed by: Professor Victor R. Lesser.
Signal understanding systems have the difficult task of interpreting environmental signals: decomposing them and explaining their components in terms of an arbitrary number of instances of perceptual object categories whose properties can interact with one another. This dissertation addresses the problem of designing blackboard-based perceptual systems for interpreting signals from complex environments. A "complex environment" is one that can (1) produce signal-to-noise ratios that vary unpredictably over time, and (2) contain perceptual objects that mutually interfere with each other's signal signatures, or have arbitrary time-dependent behaviors. The traditional design paradigm for perceptual systems assumes that some particular set of ...
Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
- Journal of Speech Communication
, 1995
Cited by 14 (2 self)
In this paper, we describe a reversible letter-to-sound/sound-to-letter generation system based on an approach which combines a rule-based formalism with data-driven techniques. We adopt a probabilistic parsing strategy to provide a hierarchical lexical analysis of a word, including information such as morphology, stress, syllabification, phonemics and graphemics. Long-distance constraints are propagated by enforcing local constraints throughout the hierarchy. Our training and testing corpora are derived from the high-frequency portion of the Brown Corpus (10,000 words), augmented with markers indicating stress and word morphology. We evaluated our performance based on an unseen test set. The percentages of nonparsable words for letter-to-sound and sound-to-letter generation were 6% and 5%, respectively. Of the remaining words, our system achieved a word accuracy of 71.8% and a phoneme accuracy of 92.5% for letter-to-sound generation, and a word accuracy of 55.8% and a letter accuracy of 89.4% for sound-to-letter generation. We also compared our hierarchical approach with an alternative, single-layer approach to demonstrate how the hierarchy provides a parsimonious description for English orthographic-phonological regularities, while simultaneously attaining competitive generation accuracy.
Error Correction Via A Post-Processor For Continuous Speech Recognition
- In Proc. ICASSP
, 1996
Cited by 14 (4 self)
This paper presents a new technique for overcoming several types of speech recognition errors by post-processing the output of a continuous speech recognizer. The post-processor output contains fewer errors, thereby making interpretation by higher-level modules, such as a parser, in a speech understanding system more reliable. The primary advantage to the post-processing approach over existing approaches for overcoming SR errors lies in its ability to introduce options that are not available in the SR module's output. This work provides evidence for the claim that a modern continuous speech recognizer can be used successfully in "black-box" fashion for robustly interpreting spontaneous utterances in a dialogue with a human.
HMM Continuous Speech Recognition Using Predictive LR Parsing
- In IEEE International Conference on Acoustics, Speech and Signal Processing
, 1989
Cited by 11 (0 self)
This paper proposes a new continuous speech recognition method using an efficient parsing mechanism, an LR parser, driving HMM modules directly without any intervening structures such as a phoneme lattice. Accurate and efficient speech parsing is achieved by combining HMM and LR parsing. This method is tested in Japanese phrase recognition experiments. Two grammars are prepared, a general Japanese grammar and a task-specific grammar. The phrase recognition rate with the general grammar is 72% for top candidates and 95% for the five best candidates. With the task-specific grammar, recognition rate is 80% and 99%, respectively. 1 INTRODUCTION There have been many speech recognition systems which use syntactic information to improve recognition accuracy. For example, statistical language modeling such as a bigram or a trigram [1, 2, 3], finite state grammars [4, 5] and context-free grammars [6, 7]. This paper proposes a new method for parsing speech data directly without any intervening structur...
Speech Recognition in Mobile Environments
, 2000
Cited by 10 (0 self)
The growth of cellular telephony combined with recent advances in speech recognition technology results in sizeable potential opportunities for mobile speech recognition applications. Classic robustness techniques that have been previously proposed for speech recognition yield limited improvements of the degradation introduced by idiosyncrasies of the mobile networks. These sources of degradation include distortion introduced by the speech codec as well as artifacts arising from channel errors and discontinuous transmission. In this thesis we focus on characterizing the distortion introduced to the speech signal by the speech codec and we propose methods for reducing the detrimental effect of coding on recognition accuracy. The initial focus of this thesis is on the full rate GSM codec (FRGSM). We propose a method to generate recognition features directly from codec parameters. It is shown in this work that by selectively constructing a cepstral feature vector from the GSM codec para...
Progress in Dynamic Programming Search for LVCSR
- Proceedings of the IEEE
, 1997
Cited by 10 (2 self)
This paper gives an overview of the recent improvements in dynamic programming search for large vocabulary continuous speech recognition: search using lexical trees, time-conditioned search and word graph construction.
A Framework and Toolkit for the Construction of Multimodal Learning Interfaces
, 1998
Cited by 7 (0 self)
Multimodal human-computer interaction, in which the computer accepts input from multiple channels or modalities, is more flexible, natural, and powerful than unimodal interaction with input from a single modality. Many research studies ([Hauptmann89], [Nakagawa94], [Nishimoto94], [Oviatt97b], [Chu97], to name a few) have reported that the combination of human communication means such as speech, gestures, handwriting, eye movement, etc. enjoys strong preference among users. Unfortunately, the development of multimodal applications is difficult and still suffers from a lack of generality, such that a lot of duplicated effort is wasted when implementing different applications sharing some common aspects. The research presented in this dissertation aims to provide a partial solution to the difficult problem of developing multimodal applications by creating a modular, distributed, and customizable infrastructure to facilitate the construction of such applications. This dissertation contribu...
Mapping eye movements to cognitive processes
, 1999
Cited by 6 (0 self)
policies, either expressed or implied, of the NSF or the U.S. government.
Implementation Aspects of Large Vocabulary Recognition Based on Intraword and Interword Phonetic Units
, 1990
Cited by 5 (2 self)
Most large vocabulary speech recognition systems essentially consist of a training algorithm and a recognition structure which is essentially a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the training procedure, it is absolutely essential that the recognition structure be efficient in terms of computation and memory, and accurate in terms of actually determining the best path through the lattice, so that a wide range of training (sub-word unit creation) strategies can be efficiently evaluated in a reasonable time period. We have considered an architecture in which we incorporate several well known procedures (beam search, compiled network, etc.) with some new ideas (stacks of active network nodes, likelihood computation on demand, guided search, etc.) to implement a search procedure which maintains the accuracy of the full search but which can decode a single sentence in about one minute of computing time (about 20 times real time) on a vectorized, concurrent processor. The ways in which we have realized this significant computational reduction are described in this paper.
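The abstract above mentions beam search over a set of active network nodes. As a rough illustration of that general idea only, here is a minimal textbook-style sketch of time-synchronous Viterbi search with beam pruning over a toy two-state model. It is not the paper's implementation: the function names (`prune`, `beam_viterbi`), the model, and all probabilities are hypothetical and chosen purely for illustration.

```python
import math


def prune(active, beam):
    """Keep only hypotheses within `beam` log units of the best one."""
    if not active:
        return active
    best = max(score for score, _ in active.values())
    return {s: v for s, v in active.items() if v[0] >= best - beam}


def beam_viterbi(obs, log_init, log_trans, log_emit, beam):
    """Return (log score, state path) of the best surviving hypothesis,
    or None if every hypothesis dies out."""
    # active maps state -> (log score, best path ending in that state)
    active = {}
    for s, lp in log_init.items():
        e = log_emit[s].get(obs[0], -math.inf)
        if e > -math.inf:
            active[s] = (lp + e, [s])
    active = prune(active, beam)
    if not active:
        return None

    for o in obs[1:]:
        nxt = {}
        # Only states reachable from the active set are scored.
        for s, (score, path) in active.items():
            for t, lp in log_trans.get(s, {}).items():
                e = log_emit[t].get(o, -math.inf)
                if e == -math.inf:
                    continue
                cand = score + lp + e
                if t not in nxt or cand > nxt[t][0]:
                    nxt[t] = (cand, path + [t])
        active = prune(nxt, beam)
        if not active:
            return None
    return max(active.values(), key=lambda v: v[0])


def logs(d):
    return {k: math.log(v) for k, v in d.items()}


# Toy model with made-up probabilities (assumption, not from any paper).
LOG_INIT = logs({"a": 0.6, "b": 0.4})
LOG_TRANS = {"a": logs({"a": 0.7, "b": 0.3}), "b": logs({"a": 0.4, "b": 0.6})}
LOG_EMIT = {"a": logs({"x": 0.9, "y": 0.1}), "b": logs({"x": 0.2, "y": 0.8})}

print(beam_viterbi(list("xxy"), LOG_INIT, LOG_TRANS, LOG_EMIT, beam=1.0))
```

A narrower `beam` discards more hypotheses per frame and so does less work, at the risk of pruning the path that a full search would have found; in this toy case a beam of 1.0 and a very wide beam select the same path.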

