GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
1996
Cited by 40 (9 self)
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. The contamination of the input with errors of a speech recognizer can further exacerbate these problems. Most natural language parsing algorithms are designed to analyze "clean" grammatical input. Because they reject any input which is found to be ungrammatical in even the slightest way, such parsers are unsuitable for parsing spontaneous speech, where completely grammatical input is the exception more than the rule. This thesis describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, that was designed to be robust to two particular types of extra-grammaticality: noise...
Balancing Robustness and Efficiency in Unification-augmented Context-Free Parsers for Large Practical Applications
Robustness in Language and Speech Technology
Cited by 25 (7 self)
Large practical NLP applications require robust analysis components that can effectively handle input that is disfluent or extra-grammatical. The effectiveness and efficiency of any robust parser are a direct function of three main factors: (1) Flexibility: What types of disfluencies and deviations from the grammar can the parser handle? (2) Search: How does the parser search the space of possible interpretations, and what techniques are applied to prune the search space? (3) Parse Selection and Disambiguation: What methods and resources are used to evaluate and rank potential parses and sub-parses, and how does the parser cope with the extreme levels of ambiguity introduced by its flexibility parameters? In this chapter we describe our investigations into how to balance flexibility and efficiency in the context of two different robust parsers, a GLR parser and a left-corner chart parser, both based on a unification-augmented context-free grammar formalism. We demonstrate how the...
Speech Recognition using Neural Networks
1995
Cited by 21 (0 self)
This thesis examines how artificial neural networks can benefit a large-vocabulary, speaker-independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, and they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling and HMMs perform temporal modeling. We argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
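The abstract does not say exactly how the network's outputs feed the HMM. One common arrangement in hybrid systems (an assumption here; the thesis may differ) treats the network outputs as state posteriors P(q|x), divides them by state priors P(q) to obtain scaled likelihoods, and leaves temporal modeling to ordinary Viterbi decoding. The sketch uses hand-set posteriors for a hypothetical two-state left-to-right HMM in place of a trained network.

```python
import math

NEG_INF = float("-inf")

def scaled_log_likelihoods(posteriors, priors):
    """Turn per-frame network posteriors P(q|x_t) into scaled log-likelihoods
    log P(q|x_t) - log P(q), which equal log p(x_t|q) up to a per-frame constant.
    Assumes all posteriors and priors are strictly positive."""
    return [[math.log(p) - math.log(priors[q]) for q, p in enumerate(frame)]
            for frame in posteriors]

def viterbi(emissions, log_trans, log_init):
    """Most likely HMM state path given log emission scores emissions[t][q]."""
    n_states = len(log_init)
    delta = [log_init[q] + emissions[0][q] for q in range(n_states)]
    backptrs = []
    for frame in emissions[1:]:
        new_delta, ptrs = [], []
        for q in range(n_states):
            # Best predecessor state for q at this frame.
            r = max(range(n_states), key=lambda p: delta[p] + log_trans[p][q])
            new_delta.append(delta[r] + log_trans[r][q] + frame[q])
            ptrs.append(r)
        delta = new_delta
        backptrs.append(ptrs)
    state = max(range(n_states), key=lambda q: delta[q])
    best_score = delta[state]
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path, best_score

# Hand-set "network outputs" for 4 frames over a hypothetical 2-state
# left-to-right HMM (state 0 must precede state 1).
posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]]
priors = [0.5, 0.5]
log_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
log_init = [0.0, NEG_INF]

path, score = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
print(path)  # [0, 0, 1, 1]
```

The division of labor matches the abstract's description: the network only scores individual frames, while the HMM transition structure decides how those scores are strung together over time.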
Speech-language Integration in a Multi-lingual Speech Translation System
In Proceedings of the Workshop on Integration of Natural Language and Speech Processing, AAAI, 1994
Cited by 19 (7 self)
In this paper we report on our efforts to combine speech and language processing toward multi-lingual spontaneous speech translation. The ongoing work extends our JANUS system effort toward handling spontaneous spoken discourse and multiple languages. A major objective of this project is to maximize the number of modules, methods and data structures that are language-independent and extensible to other domains. After an overview of the task, databases and system architecture, we focus on how speech decoding and natural language processing modules will be integrated in a large-scale multi-lingual speech-to-speech translation system for spontaneous spoken discourse.
Input Segmentation of Spontaneous Speech in JANUS: a Speech-to-speech Translation System
In Proceedings of ECAI-96, 1996
Cited by 19 (4 self)
JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe how multi-level segmentation of single utterance turns improves translation quality and facilitates accurate translation in our system. We define the basic dialogue units that are handled by our system, and discuss the cues and methods employed by the system in segmenting the input utterance into such units. Utterance segmentation in our system is performed in a multi-level incremental fashion, partly prior to and partly during analysis by the parser. The segmentation relies on a combination of acoustic, lexical, semantic and statistical knowledge sources, which are described in detail in the paper. We also discuss how our system is designed to disambiguate among alternative possible input segmentations.
End-to-end Evaluation in JANUS: a Speech-to-speech Translation System
Proceedings of ECAI Workshop on Dialogue Processing in Spoken Language Systems, 1996
Cited by 13 (8 self)
JANUS is a multi-lingual speech-to-speech translation system designed to facilitate communication between two parties engaged in a spontaneous conversation in a limited domain. In this paper we describe our methodology for evaluating translation performance. Our current focus is on end-to-end evaluations: the evaluation of the translation capabilities of the system as a whole. The main goal of our end-to-end evaluation procedure is to determine translation accuracy on a test set of previously unseen dialogues. Other goals include evaluating the effectiveness of the system in conveying the domain-relevant information and in detecting and dealing appropriately with utterances (or portions of utterances) that are out-of-domain. End-to-end evaluations are performed in order to verify the general coverage of our knowledge sources, guide our development efforts, and track our improvement over time. We discuss our evaluation procedures, the criteria used for assigning scores to translations produced by the system, and the tools developed for performing this task. Our most recent Spanish-to-English performance evaluation results are presented as an example.
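Summary figures such as a percentage of acceptable translations come from tallying per-utterance grades over the test set. The sketch assumes hypothetical grade labels (`perfect`, `ok`, `bad`) and counts the first two as acceptable; the actual JANUS scoring criteria are not spelled out in this abstract.

```python
from collections import Counter

# Hypothetical grade labels; the real JANUS scoring categories are not
# given in this abstract.
ACCEPTABLE_GRADES = {"perfect", "ok"}

def acceptable_rate(grades):
    """Percentage of graded translations whose grade counts as acceptable."""
    if not grades:
        raise ValueError("no graded translations")
    counts = Counter(grades)
    acceptable = sum(counts[g] for g in ACCEPTABLE_GRADES)
    return 100.0 * acceptable / len(grades)

grades = ["perfect", "ok", "bad", "ok", "perfect"]
print(acceptable_rate(grades))  # 80.0
```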
JANUS-III: Speech-to-Speech Translation in Multiple Languages
In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), 1997
Cited by 11 (0 self)
This paper describes JANUS-III, our most recent version of the JANUS speech-to-speech translation system. We present an overview of the system and focus on how the system design facilitates speech translation between multiple languages and allows for easy adaptation to new source and target languages. We also describe our methodology for evaluating end-to-end system performance with a variety of source and target languages. For system development and evaluation, we have experimented with both push-to-talk and cross-talk recording conditions. To date, our system has achieved performance levels of over 80% acceptable translations on transcribed input, and over 70% acceptable translations on speech input recognized with 75-90% word accuracy. Our current research concentrates mainly on enhancing the capabilities of the system to deal with input in broad and general domains.
GLR* - A Robust Parser For Spontaneously Spoken Language
In Proceedings of ESSLLI-96 Workshop on Robust Parsing, 1996
Cited by 10 (1 self)
This paper describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, that was designed to be robust to two particular types of extra-grammaticality: noise in the input, and limited grammar coverage. GLR* attempts to overcome these forms of extra-grammaticality by ignoring the unparsable words and fragments and conducting a search for the maximal subset of the original input that is covered by the grammar. The parser is coupled with a beam search heuristic that limits the combinations of skipped words considered by the parser and ensures that the parser operates within feasible time and space bounds. The developed parsing system includes several tools designed to address the difficulties of parsing spontaneous speech: a statistical disambiguation module, an integrated heuristic for evaluating and ranking the parses produced by the parser, and a parse quality heuristic that allows the parser to self-judge the quality of the parse chosen as best. To evalu...
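The objective GLR* pursues, finding the largest subset of the input words that the grammar covers, can be shown by brute force. This sketch is not GLR* itself, which builds the search into Tomita's GLR algorithm: it tries skip sets in increasing size and tests each remaining word sequence with a CKY recognizer over a tiny hypothetical grammar in Chomsky normal form. The parameter `max_skips` is a crude, hypothetical stand-in for the beam that bounds the skip combinations.

```python
from itertools import combinations

# Hypothetical toy grammar in Chomsky normal form (not the GLR* grammar).
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
LEXICON = {"i": {"NP"}, "want": {"V"}, "a": {"Det"}, "meeting": {"N"}}

def cky_accepts(words, start="S"):
    """True if the toy grammar derives `words` from `start` (CKY recognition)."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals spanning words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(LEXICON.get(w, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for b in table[i][k]:
                    for c in table[k][j]:
                        table[i][j] |= BINARY.get((b, c), set())
    return start in table[0][n]

def max_covered_subset(words, max_skips=3):
    """Longest grammatical subsequence, skipping at most `max_skips` words.
    Fewer skips are tried first, so the first accepted subset is maximal."""
    n = len(words)
    for skips in range(min(max_skips, n) + 1):
        for skipped in combinations(range(n), skips):
            kept = [w for i, w in enumerate(words) if i not in skipped]
            if cky_accepts(kept):
                return kept, [words[i] for i in skipped]
    return None, None

kept, skipped = max_covered_subset("i um want a uh meeting".split())
print(kept, skipped)  # ['i', 'want', 'a', 'meeting'] ['um', 'uh']
```

The number of skip sets grows combinatorially with input length, so exhaustive enumeration like this quickly becomes impractical; that cost is what a beam over skipped-word combinations is meant to contain.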
An Efficient Distribution of Labor in a Two Stage Robust Interpretation Process
1997
Cited by 8 (2 self)
Although Minimum Distance Parsing (MDP) offers a theoretically attractive solution to the problem of extragrammaticality, it is often computationally infeasible in large-scale practical applications. In this paper we present an alternative approach in which the labor is distributed between a more restrictive partial parser and a repair module. Though two-stage approaches have grown in popularity in recent years because of their efficiency, they have done so at the cost of requiring hand-coded repair heuristics (Ehrlich and Hanrieder, 1996; Danieli and Gerbino, 1995). In contrast, our two-stage approach does not require any hand-coded knowledge sources dedicated to repair, making it possible to achieve a similar run-time advantage over MDP without sacrificing domain independence.
1 Introduction
The correct interpretation of spontaneous spoken language poses challenges that continue to fall outside of the reach of state-of-the-art technology. The first essential task of a na...
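MDP looks for the grammatical sentence closest to the input under an edit distance. With a real context-free grammar that search runs inside the parser and, as the abstract notes, often becomes infeasible at scale. This sketch only illustrates the objective, by brute force over a tiny hypothetical language written out as a list of sentences, using word-level Levenshtein distance as the (assumed) distance measure.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance; each insertion, deletion or
    substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # match or substitute
        prev = cur
    return prev[-1]

def minimum_distance_parse(words, language):
    """Return (distance, sentence) for the sentence in `language` closest to `words`."""
    return min((edit_distance(words, s), s) for s in language)

# Hypothetical finite "grammar", listed as the sentences it generates.
LANGUAGE = [
    ["i", "want", "a", "meeting"],
    ["i", "want", "to", "meet"],
    ["a", "meeting", "is", "fine"],
]
print(minimum_distance_parse("i want uh meeting".split(), LANGUAGE))
# (1, ['i', 'want', 'a', 'meeting'])
```

Every candidate sentence must be compared against the input, which is why a two-stage design that parses what it can and repairs only the leftovers can be much cheaper.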
The JANUS Speech Recognizer
In ARPA SLT Workshop, 1995
Cited by 7 (0 self)
JANUS [17] was designed for the translation of spontaneous human-to-human speech. Before the 1994 CSR evaluation, JANUS was run with vocabularies of up to 2500 words. JANUS was also tested on the Conference Registration and the Resource Management tasks. The best error rate on the '89 Resource Management evaluation set was 5.9%. At the June 1994 Verbmobil speech component evaluation [1], JANUS scored best among eight participants on the German appointment scheduling task, a task of spontaneous human-to-human dialogs. In this paper we give a detailed description of the recognition engine of JANUS, focusing on the acoustic modeling and our first run with the WSJ task.
1. ACOUSTIC MODELING IN JANUS
1.1 PREPROCESSING
For the 1994 CSR evaluation we computed 16 mel-scale spectral coefficients from an FFT with a window size of 256 sample points and a window shift (frame rate) of 10 ms. 16 mel spectral coefficients, 16 delta coefficients, and 16 delta-delta coefficients were used to build a 4...
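Stacking 16 static, 16 delta and 16 delta-delta coefficients gives 48 values per frame. The abstract does not state how the deltas were computed, so this sketch assumes the common regression formula over plus or minus 2 frames with edge frames replicated, and applies it twice for the delta-deltas; the input coefficients are synthetic.

```python
def deltas(frames, n=2):
    """Regression deltas over +/- n neighbouring frames, replicating edge frames:
    d_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2), for k = 1..n."""
    num_frames = len(frames)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(num_frames):
        d = []
        for i in range(len(frames[t])):
            acc = 0.0
            for k in range(1, n + 1):
                ahead = frames[min(t + k, num_frames - 1)][i]
                behind = frames[max(t - k, 0)][i]
                acc += k * (ahead - behind)
            d.append(acc / denom)
        out.append(d)
    return out

def stack_features(mel):
    """Concatenate static, delta and delta-delta coefficients for each frame."""
    d1 = deltas(mel)
    d2 = deltas(d1)
    return [list(s) + a + b for s, a, b in zip(mel, d1, d2)]

# 10 frames of 16 synthetic "mel coefficients" that rise by 1.0 per frame.
mel = [[float(t)] * 16 for t in range(10)]
features = stack_features(mel)
print(len(features), len(features[0]))   # 10 48
print(features[5][16], features[5][32])  # 1.0 0.0
```

On a linear ramp the interior deltas equal the slope (1.0) and the delta-deltas vanish, which is a quick sanity check for any delta implementation.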

