A Corpus-based study of repair cues in spontaneous speech
Cited by 70 (1 self)
... this paper, acoustic and prosodic cues to such repairs are identified, based on an analysis of a corpus taken from the ARPA Air Travel Information System database, and methods are proposed for exploiting these cues for repair detection, especially the task of modeling word fragments, and repair correction. The relative contributions of these speech-based cues, as well as other text-based repair cues, are examined in a statistical model of repair site detection that achieves a precision rate of 91% and recall of 86% on a prosodically labeled corpus of repair utterances. (This paper, by Nakatani and Hirschberg, appears in the Journal of the Acoustical Society of America, 95 (3), March 1994, pp. 1603-1616.) PACS numbers: 43.72Ja, 43.70.B, 43.70.Bk, 43.70.Fq
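For readers comparing this result with later work, the two figures quoted above can be folded into a single F-measure. The sketch below is only an illustration: the abstract reports precision and recall alone, so the combined score and the `f_measure` helper name are assumptions added here, not values or code from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F-beta)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the abstract above; F1 itself is derived here,
# not reported by the authors.
precision, recall = 0.91, 0.86
f1 = f_measure(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.884"
```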
Large Vocabulary Continuous Speech Recognition: a Review
- of INCIS Project, Schedule 6 in (Small
, 1996
Cited by 62 (1 self)
This article will discuss the principles and architecture of current LVR systems and identify the key issues affecting their future deployment. To illustrate the various points raised, the Cambridge University HTK system will be described. This is a modern design giving state-of-the-art performance and it is typical of the current generation of recognition systems.
Interactive Translation of Conversational Speech
, 1996
Cited by 53 (7 self)
... discuss their usability and performance. 1.0 Introduction Multilinguality will take on spoken form when information services are to extend beyond national boundaries or across language groups. Database access by speech will need to handle multiple languages to service customers from different language groups. Public service operators (emergency, police, telephone operators and others) frequently receive requests from foreigners unable to speak the national language. Already multilingual spoken language services are growing. Telephone companies in the US (AT&T Language Line), Europe and Japan now offer language translation services over the telephone, provided by human operators. Movies and television broadcasts are routinely translated and delivered either by dubbing, subtitles or multilingual transcripts. With the drive of automating information services, therefore, comes a growing need for automate
GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
, 1996
Cited by 40 (9 self)
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. The contamination of the input with errors of a speech recognizer can further exacerbate these problems. Most natural language parsing algorithms are designed to analyze "clean" grammatical input. Because they reject any input which is found to be ungrammatical in even the slightest way, such parsers are unsuitable for parsing spontaneous speech, where completely grammatical input is the exception more than the rule. This thesis describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, that was designed to be robust to two particular types of extra-grammaticality: noise...
Automatic summarization of open-domain multiparty dialogues in diverse genres
- Computational Linguistics
, 2002
Cited by 30 (0 self)
Automatic summarization of open-domain spoken dialogues is a relatively new research area. This article introduces the task and the challenges involved and motivates and presents an approach for obtaining automatic-extract summaries for human transcripts of multiparty dialogues of four different genres, without any restriction on domain. We address the following issues, which are intrinsic to spoken-dialogue summarization and typically can be ignored when summarizing written text such as news wire data: (1) detection and removal of speech disfluencies; (2) detection and insertion of sentence boundaries; and (3) detection and linking of cross-speaker information units (question-answer pairs). A system evaluation is performed using a corpus of 23 dialogue excerpts with an average duration of about 10 minutes, comprising 80 topical segments and about 47,000 words total. The corpus was manually annotated for relevant text spans by six human annotators. The global evaluation shows that for the two more informal genres, our summarization system using dialogue-specific components significantly outperforms two baselines: (1) a maximum-marginal-relevance ranking algorithm using TF*IDF term weighting, and (2) a LEAD baseline that extracts the first n words from a text.
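The two baselines named in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's implementation: the smoothed IDF formula, the trade-off weight `lam`, the use of the whole text as the MMR query, and all function names are choices made here for the example.

```python
import math
from collections import Counter


def lead_baseline(words, n):
    """LEAD baseline: return the first n words of the text."""
    return words[:n]


def tfidf_vectors(segments):
    """Build one sparse TF*IDF vector (a dict) per tokenized segment."""
    n = len(segments)
    df = Counter(term for seg in segments for term in set(seg))
    # Smoothed IDF; the exact weighting scheme is an assumption.
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(seg).items()} for seg in segments]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def mmr_rank(segments, query, k, lam=0.7):
    """Greedy maximum-marginal-relevance selection of k segment indices."""
    vecs = tfidf_vectors(segments + [query])
    seg_vecs, q_vec = vecs[:-1], vecs[-1]
    selected, candidates = [], list(range(len(segments)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize similarity to anything already selected.
            redundancy = max(
                (cosine(seg_vecs[i], seg_vecs[j]) for j in selected), default=0.0
            )
            return lam * cosine(seg_vecs[i], q_vec) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


segments = [
    ["flight", "to", "boston"],
    ["flight", "to", "boston"],
    ["hotel", "in", "denver"],
]
query = [w for seg in segments for w in seg]  # whole text as query (assumption)
print(mmr_rank(segments, query, k=2))  # prints [0, 2]
print(lead_baseline(query, 4))
```

In this toy run the second segment repeats the first, so the redundancy penalty makes MMR prefer the third segment even though it is less similar to the query.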
Towards Better Language Models For Spontaneous Speech
, 1994
Cited by 29 (2 self)
In our effort to build a speech-to-speech translation system for spontaneous spoken dialogs we have developed several methods to improve the language models of the speech decoder of the system. We attempt to take advantage of natural equivalence word classes, frequently occurring word phrases, and discourse structure. Each of these methods was tested on spontaneous English, German and Spanish human-human dialogs. 1. INTRODUCTION The goal of the JANUS project is multi-lingual machine translation of spontaneously spoken dialogs in a limited domain: two people scheduling a meeting with each other. We are currently working with English, German, and Spanish as source languages and German, English, and Japanese as target languages. Table 1 shows the size of training and test set for the English, German and Spanish Spontaneous Scheduling Task databases (ESST, GSST, SSST) used for all experiments reported in this paper, and the coverage of the dictionary over the test set.
GLR* -- An Efficient Noise-skipping Parsing Algorithm For Context Free Grammars
, 1993
Cited by 28 (6 self)
This chapter describes GLR*, a parser that can parse any input sentence by ignoring unrecognizable parts of the sentence. Using an efficient algorithm, the parser is capable of finding and parsing a maximal subset of the original input that is parsable, and therefore return the parse with fewest skipped words. The parser returns some parse(s) for any input sentence, unless no part of the sentence can be recognized at all. Formally, the problem can be defined in the following way: Given a context-free grammar G and a sentence S, find and parse S', the largest subset of words of S, such that S' ∈ L(G). The algorithm described in this chapter is a modification of the Generalized LR (Tomita) parsing algorithm (Tomita, 1986). The parser accommodates the skipping of words by allowing shift operations to be performed from inactive state nodes of the Graph Structured Stack. A heuristic similar to beam search makes the algorithm computationally tractable. The modified parser, GLR*, h...
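The formal problem stated in this abstract (find the largest subset of words of S that the grammar accepts) can be made concrete with a brute-force sketch. This is deliberately not the GLR* algorithm: it enumerates order-preserving subsequences from longest to shortest and checks each with a plain CYK recognizer, which is exponential in sentence length. The toy grammar, the order-preserving reading of "subset", and all names are assumptions made for illustration.

```python
from itertools import combinations

# Toy grammar in Chomsky normal form (hypothetical, for illustration only).
LEXICAL = {  # terminal -> nonterminals that produce it
    "i": {"NP"}, "flights": {"NP"}, "want": {"V"}, "need": {"V"},
}
BINARY = {  # (B, C) -> nonterminals A with a rule A -> B C
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
}


def cyk_accepts(words, start="S"):
    """Return True if the toy grammar derives the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][l] holds the nonterminals deriving words[i:i+l+1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][0] = set(LEXICAL.get(w, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for b in table[i][split - 1]:
                    for c in table[i + split][length - split - 1]:
                        table[i][length - 1] |= BINARY.get((b, c), set())
    return start in table[0][n - 1]


def max_parsable_subsequence(words):
    """Longest order-preserving subsequence accepted by the grammar, or None."""
    n = len(words)
    for keep in range(n, 0, -1):  # fewest skipped words first
        for idx in combinations(range(n), keep):
            candidate = [words[i] for i in idx]
            if cyk_accepts(candidate):
                return candidate
    return None


print(max_parsable_subsequence("i uh want um flights".split()))
# prints ['i', 'want', 'flights']
```

The exhaustive search is what makes the efficiency claims in the abstract meaningful: the chapter describes avoiding this enumeration by skipping words inside the parser and pruning with a heuristic similar to beam search.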
Speech Repairs, Intonational Boundaries and Discourse Markers: Modeling Speakers
- Department of Computer Science, University of Rochester
, 1997
Cited by 24 (8 self)
Peter Heeman was born October 22, 1963, and much to his dismay his parents had already moved away from Toronto. Instead he was born in London, Ontario, where he grew up on a strawberry farm. He attended the University of Waterloo where he received a Bachelors of Mathematics with a joint degree in Pure Mathematics and Computer Science in the spring of 1987. After working two years for a software engineering company, which supposedly used artificial intelligence techniques to automate COBOL and CICS programming, Peter was ready for a change. What better way to wipe the slate clear than by going to graduate school at the University of Toronto, but not without first spending the summer in Europe. After spending two months in countries where he couldn’t speak the language, Peter became fascinated by language, and so decided to give computational linguistics a try.
Multimodal Interfaces
- Artificial Intelligence Review Journal, special issue
, 1994
Cited by 23 (3 self)
In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal for such interfaces is to free human computer interaction from the limitations and acceptance barriers due to rigid operating commands and keyboards as only/main I/O-device. Instead we move to involve all available human communication modalities. These human modalities include Speech, Gesture and Pointing,
Extensions to Constraint Dependency Parsing for Spoken Language Processing
- COMPUTER SPEECH AND LANGUAGE
, 1995
Cited by 21 (10 self)
A text-based and spoken language processing framework based on the Constraint Dependency Grammar (CDG) developed by Maruyama [24, 25] is discussed. The scope of CDG is expanded to allow for the analysis of sentences containing lexically ambiguous words, to allow feature analysis in constraints, and to efficiently process multiple sentence candidates that are likely to arise in spoken language processing. The benefits of the CDG parsing approach are summarized. Additionally, the development of CDG grammars using our grammar tools and parser is discussed.

