Preliminaries to a Theory of Speech Disfluencies
1994
Cited by 97 (7 self)
This thesis examines disfluencies (e.g., "um", repeated words, and a variety of forms of self-repair) in the spontaneous speech of adult normal speakers of American English. Despite their prevalence, disfluencies have traditionally been viewed as irregular events and have received little attention. The goal of the thesis is to provide evidence that, on the contrary, disfluencies show remarkably regular trends in a number of dimensions. These regularities have consequences for models of human language production; they can also be exploited to improve performance in speech applications. The method includes analysis of over 5000 hand-annotated disfluencies from a database (250,000 words) containing three different styles of spontaneous speech: task-oriented human-computer dialog, task-oriented human-human dialog, and human-human conversation on a prescribed topic. The approach is theory-neutral and strongly data-driven. The annotations correspond to observable characteristics ("features") ...
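To make the first two disfluency types concrete, here is a toy Python tagger. It is a hypothetical illustration, not the thesis's annotation scheme: the filler inventory `FILLED_PAUSES` and the function `tag_disfluencies` are assumptions, and the tagger only catches filled pauses and exact single-word repetitions, not general self-repairs.

```python
# Toy tagger for two disfluency types: filled pauses and immediate word
# repetitions. Hypothetical illustration only; self-repairs that change the
# wording are not detected.
FILLED_PAUSES = {"um", "uh", "er"}  # assumed inventory


def tag_disfluencies(words: list[str]) -> list[tuple[int, str]]:
    """Return (index, kind) pairs; for a repetition, the index marks the first copy."""
    tags = []
    for i, word in enumerate(words):
        lower = word.lower()
        if lower in FILLED_PAUSES:
            tags.append((i, "filled_pause"))
        elif i + 1 < len(words) and lower == words[i + 1].lower():
            tags.append((i, "repetition"))
    return tags


if __name__ == "__main__":
    print(tag_disfluencies("I um I want the the morning flight".split()))
```

Note that `I um I` is not flagged as a repetition because the filled pause intervenes; a fuller annotator would need to skip fillers before comparing words.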
A hierarchical duration model for speech recognition based on the ANGIE framework
In Proc. Eurospeech '97, 1999
Cited by 27 (3 self)
This paper presents a hierarchical duration model applied to enhance speech recognition. The model is based on the novel ANGIE framework, a flexible, unified sublexical representation designed for speech applications. This duration model captures duration phenomena operating at the phonological, phonemic, syllabic and morphological levels. At the core of the modelling scheme is a hierarchical normalization procedure performed on the ANGIE parse structure, from which we derive a robust measure of the rate of speech. The model uses two sets of statistical models: a first set based on relative duration between sublexical units and a second set based on absolute duration that has been normalized with respect to the speaking rate. We have used this paradigm to explore some speech timing phenomena, such as the secondary effects on relative duration due to variations in speaking rate, the characteristics of anomalously slow words, and prepausal lengthening effects. Finally, we successfully demonstrate the utility of durational information for recognition applications. In phonetic recognition, incorporating our model over and above a standard phone duration model yields a relative improvement of up to 7.7%; similarly, in a word spotting task, the figure of merit (FOM) improves from 89.3 to 91.6. © 1999 Elsevier Science B.V. All rights reserved.
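The two kinds of duration statistics (relative durations between sublexical units, and absolute durations normalized for speaking rate) can be sketched at a single level of the tree. This Python sketch is a simplification under stated assumptions: the paper's normalization is hierarchical over the whole ANGIE parse, whereas here one parent with its children is treated in isolation, `mean_duration` is a hypothetical corpus average, and defining speaking rate as observed over expected total duration is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    label: str
    duration: float       # observed duration in seconds
    mean_duration: float  # assumed corpus-average duration for this label


def speaking_rate(units: list[Unit]) -> float:
    """Observed over expected total duration; values above 1 mean slower than average."""
    return sum(u.duration for u in units) / sum(u.mean_duration for u in units)


def relative_durations(units: list[Unit]) -> list[float]:
    """Share of the parent's total duration taken by each child unit (sums to 1)."""
    total = sum(u.duration for u in units)
    return [u.duration / total for u in units]


def rate_normalized_durations(units: list[Unit]) -> list[float]:
    """Absolute durations rescaled as if the unit had been spoken at the average rate."""
    rate = speaking_rate(units)
    return [u.duration / rate for u in units]


if __name__ == "__main__":
    syllable = [Unit("b", 0.12, 0.10), Unit("aa", 0.09, 0.06), Unit("s", 0.09, 0.08)]
    print(round(speaking_rate(syllable), 3))
    print([round(x, 3) for x in relative_durations(syllable)])
    print([round(x, 3) for x in rate_normalized_durations(syllable)])
```

Relative durations are rate-independent by construction, which is why a second, rate-normalized set is needed to model absolute timing.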
Online learning of relaxed CCG grammars for parsing to logical form
In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL-2007), 2007
Cited by 20 (4 self)
We consider the problem of learning to parse sentences to lambda-calculus representations of their underlying semantics and present an algorithm that learns a weighted combinatory categorial grammar (CCG). A key idea is to introduce non-standard CCG combinators that relax certain parts of the grammar (for example, allowing flexible word order or the insertion of lexical items) with learned costs. We also present a new, online algorithm for inducing a weighted CCG. Results for the approach on ATIS data show 86% F-measure in recovering fully correct semantic analyses and 95.9% F-measure by a partial-match criterion, a more than 5% improvement over the 90.3% partial-match figure reported by He and Young (2006).
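The idea of relaxed combinators with costs can be pictured with a toy chart parser. This Python sketch is an assumption-laden illustration, not the paper's learner: categories are plain strings or hypothetical `Complex` functors, `RELAX_COST` is a fixed penalty where the paper learns costs, only one plausible relaxation (a functor taking its argument from the "wrong" side) is shown, and there is no lambda-calculus semantics.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Union


@dataclass(frozen=True)
class Complex:
    """A functor category: result/arg seeks arg to its right, result\\arg to its left."""
    result: Cat
    slash: str  # "/" or "\\"
    arg: Cat


Cat = Union[str, Complex]  # atomic categories are plain strings such as "N" or "NP"

RELAX_COST = 1.0  # hypothetical fixed penalty; the paper learns such costs


def combine(left: Cat, right: Cat) -> Iterator[tuple[Cat, float]]:
    """Yield (category, cost) for each way two adjacent categories can combine."""
    # Standard forward application:  X/Y  Y  =>  X
    if isinstance(left, Complex) and left.slash == "/" and left.arg == right:
        yield left.result, 0.0
    # Standard backward application:  Y  X\Y  =>  X
    if isinstance(right, Complex) and right.slash == "\\" and right.arg == left:
        yield right.result, 0.0
    # Relaxed (non-standard): the functor finds its argument on the other side.
    if isinstance(right, Complex) and right.slash == "/" and right.arg == left:
        yield right.result, RELAX_COST
    if isinstance(left, Complex) and left.slash == "\\" and left.arg == right:
        yield left.result, RELAX_COST


def parse(words: list[str], lexicon: dict[str, list[Cat]]) -> dict[Cat, float]:
    """CKY-style chart parse; returns the cheapest cost of each category spanning all words."""
    n = len(words)
    chart: dict[tuple[int, int], dict[Cat, float]] = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = {cat: 0.0 for cat in lexicon[word]}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell: dict[Cat, float] = {}
            for k in range(i + 1, j):
                for lcat, lcost in chart[(i, k)].items():
                    for rcat, rcost in chart[(k, j)].items():
                        for cat, cost in combine(lcat, rcat):
                            total = lcost + rcost + cost
                            if total < cell.get(cat, float("inf")):
                                cell[cat] = total
            chart[(i, j)] = cell
    return chart[(0, n)]


if __name__ == "__main__":
    to_cat = Complex(Complex("N", "\\", "N"), "/", "NP")  # hypothetical entry for "to"
    lexicon = {"flights": ["N"], "boston": ["NP"], "to": [to_cat]}
    print(parse("flights to boston".split(), lexicon))
    print(parse("boston to flights".split(), lexicon))
```

With this hypothetical lexicon, the standard order parses to `N` at cost 0, while the scrambled order still parses to `N` but only by paying the relaxation penalty twice; learned costs would let a model decide how much such flexibility is worth.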
Rapid language model development for new task domains
Proc. First International Conference on Language Resources and Evaluation (LREC), 1998
Cited by 16 (6 self)
Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further and consider the situation in which no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The first technique uses a context-free grammar to generate a corpus of word collocations. The second is an adaptation technique that uses out-of-domain corpora to estimate target-domain language models. We report results of successfully using these two techniques individually and in combination to build efficient models for a spontaneous speech recognition task in a medium-sized vocabulary domain.
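The two techniques can be combined roughly as follows: sample sentences from a grammar, estimate a bigram model from them, then mix in an out-of-domain model. This Python sketch rests on assumptions: `GRAMMAR` is a hypothetical toy air-travel grammar, productions are sampled uniformly, and linear interpolation is used as one common adaptation method; the paper's actual adaptation technique may differ.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical toy grammar; upper-case keys are nonterminals, everything else is a word.
GRAMMAR = {
    "S": [["show", "me", "FLIGHTS"], ["list", "FLIGHTS"]],
    "FLIGHTS": [["flights", "from", "CITY", "to", "CITY"], ["flights", "to", "CITY"]],
    "CITY": [["boston"], ["denver"], ["dallas"]],
}

BigramModel = Callable[[str, str], float]


def generate(symbol: str, rng: random.Random) -> list[str]:
    """Expand a symbol by choosing one production uniformly at random, recursively."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word
    words: list[str] = []
    for child in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(child, rng))
    return words


def bigram_model(sentences: list[list[str]]) -> BigramModel:
    """Maximum-likelihood estimates of P(b | a) from tokenized sentences."""
    bigrams: Counter = Counter()
    contexts: Counter = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence + ["</s>"]
        for a, b in zip(tokens, tokens[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1

    def prob(a: str, b: str) -> float:
        return bigrams[(a, b)] / contexts[a] if contexts[a] else 0.0

    return prob


def interpolate(p_in: BigramModel, p_out: BigramModel, lam: float) -> BigramModel:
    """Linear interpolation: lam * grammar-derived model + (1 - lam) * out-of-domain model."""
    return lambda a, b: lam * p_in(a, b) + (1 - lam) * p_out(a, b)


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = [generate("S", rng) for _ in range(500)]
    p_grammar = bigram_model(corpus)
    p = interpolate(p_grammar, lambda a, b: 1e-3, 0.9)  # constant stand-in for an out-of-domain model
    print(corpus[0], p("me", "flights"), p("to", "flights"))
```

The grammar-generated model assigns zero probability to anything the grammar cannot produce, which is exactly the gap an out-of-domain component is meant to fill.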
Transparent Combination of Rule-Based and Data-Driven Approaches in a Speech Understanding Architecture
2003
Cited by 11 (4 self)
We describe a domain-independent semantic interpretation architecture suitable for spoken dialogue systems, which uses a decision-list method to effect a transparent combination of rule-based and data-driven approaches. The architecture has been implemented and evaluated in the context of a medium-vocabulary command and control task.
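The abstract does not spell out the decision list, so the following Python sketch is a generic illustration under stated assumptions. A decision list is an ordered sequence of tests in which the first test that fires determines the output; here hand-written and (hypothetically) learned rules share one ranked list, and each decision reports which source produced it. The `Rule` fields, scores, and command outputs are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    score: float                       # decision-list rank; higher is tried first
    test: Callable[[list[str]], bool]  # fires on a tokenized utterance
    output: str                        # interpretation produced when the test fires
    source: str                        # "hand-written" or "learned"


def build_decision_list(hand_rules: list[Rule], learned_rules: list[Rule]) -> list[Rule]:
    """Merge both rule sources into one ranked list; on ties, hand-written rules stay first."""
    return sorted(hand_rules + learned_rules, key=lambda r: r.score, reverse=True)


def interpret(words: list[str], decision_list: list[Rule]) -> Optional[Rule]:
    """Return the first rule whose test fires, so each decision is traceable to its source."""
    for rule in decision_list:
        if rule.test(words):
            return rule
    return None


if __name__ == "__main__":
    hand = [Rule("switch-on-light", 0.95,
                 lambda w: {"switch", "on", "light"} <= set(w), "LIGHT(on)", "hand-written")]
    learned = [Rule("kw:light", 0.6, lambda w: "light" in w, "LIGHT(toggle)", "learned"),
               Rule("kw:volume", 0.7, lambda w: "volume" in w, "VOLUME(up)", "learned")]
    dl = build_decision_list(hand, learned)
    for utterance in ("switch on the light", "the light please", "hello"):
        rule = interpret(utterance.split(), dl)
        print(utterance, "->", (rule.output, rule.source) if rule else None)
```

Because every answer comes from exactly one identifiable rule, it stays transparent whether a given interpretation was produced by the rule-based or the data-driven part.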
Generative and Discriminative Algorithms for Spoken Language Understanding
Cited by 10 (2 self)
Spoken Language Understanding (SLU) for conversational spoken dialog systems (SDS) aims at extracting concepts and their relations from spontaneous speech. Previous approaches to SLU have modeled concept relations as stochastic semantic networks, ranging from generative to discriminative approaches. As the complexity of spoken dialog systems increases, SLU needs to perform understanding based on a richer set of features, such as a priori knowledge, long-distance dependencies, dialog history, and system belief. This paper studies generative and discriminative approaches to modeling sentence segmentation and concept labeling. We evaluate algorithms based on Finite State Transducers (FST) as well as discriminative algorithms based on Support Vector Machine (SVM) sequence classifiers and Conditional Random Fields (CRF). We compare them in terms of concept accuracy, generalization, and robustness to annotation ambiguities. We also show how non-local, non-lexical features (e.g., a priori knowledge) can be modeled with CRFs, the best-performing algorithm across tasks. The evaluation is carried out on two SLU tasks of different complexity, namely the ATIS and MEDIA corpora. Index Terms: spoken language understanding (SLU), conditional random fields (CRF), classifier-based sequence labeling, finite state transducers (FST).
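Concept labeling with a linear-chain CRF ends in a Viterbi search for the best label sequence. The Python sketch below shows only that decoding step, under assumptions: the weights are hand-set and hypothetical (a trained CRF would learn them, typically by maximizing conditional log-likelihood), the BIO-style labels are invented rather than the ATIS or MEDIA label sets, and the `CITIES` gazetteer stands in for a priori knowledge entering as a non-lexical feature.

```python
CITIES = {"boston", "denver", "dallas"}  # hypothetical a priori knowledge (gazetteer)


def features(tokens: list[str], i: int) -> list[str]:
    """Observation features for position i: word, previous word, bias, gazetteer flag."""
    feats = ["w=" + tokens[i], "prev=" + (tokens[i - 1] if i > 0 else "<s>"), "bias"]
    if tokens[i] in CITIES:
        feats.append("in_city_list")  # non-lexical feature from the gazetteer
    return feats


def viterbi(tokens: list[str], labels: list[str],
            weights: dict[tuple[str, str], float],
            transitions: dict[tuple[str, str], float]) -> list[str]:
    """Highest-scoring label sequence under a linear-chain model with log-domain scores."""
    if not tokens:
        return []

    def emit(i: int, label: str) -> float:
        return sum(weights.get((f, label), 0.0) for f in features(tokens, i))

    best = [{y: emit(0, y) for y in labels}]
    backpointers = []
    for i in range(1, len(tokens)):
        scores, pointers = {}, {}
        for y in labels:
            prev, score = max(
                ((p, best[-1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            scores[y] = score + emit(i, y)
            pointers[y] = prev
        best.append(scores)
        backpointers.append(pointers)
    path = [max(best[-1], key=best[-1].get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    labels = ["O", "B-fromloc", "B-toloc"]
    weights = {("bias", "O"): 0.5,
               ("in_city_list", "B-fromloc"): 1.0, ("in_city_list", "B-toloc"): 1.0,
               ("prev=from", "B-fromloc"): 2.0, ("prev=to", "B-toloc"): 2.0}
    transitions = {("B-toloc", "B-toloc"): -2.0, ("B-fromloc", "B-fromloc"): -2.0}
    print(viterbi("flights from denver to boston".split(), labels, weights, transitions))
```

Because features may look anywhere in the observation (here the previous word and a gazetteer lookup), the same decoder accommodates richer, non-lexical knowledge without changing the search.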
Providing Sublexical Constraints For Word Spotting Within The Angie Framework
In Proc. Eurospeech '97
Cited by 7 (2 self)
We describe our recent work in implementing a word-spotting system based on the ANGIE framework and the effects of varying the nature of the sublexical constraints placed upon the word spotter's filler model. ANGIE is a framework for modelling speech in which the morphological and phonological substructures of words are jointly characterized by a context-free grammar and represented in a multi-layered hierarchical structure. In this representation, the upper layers capture syllabification, morphology, and stress, the preterminal layer represents phonemics, and the bottom terminal categories are the phones. ANGIE provides a flexible framework in which we can explore the effects of sublexical constraints within a word-spotting environment. Our experiments with spotting city names in ATIS validate the intuition that increasing the constraints present in the model improves performance, from 85.3 FOM for a phone bigram to 89.3 FOM for a word lexicon. They also empirically strengthen our belief...
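The multi-layered structure described above can be pictured as a tree read off layer by layer. In this Python sketch every node label is hypothetical and illustrative, not ANGIE's actual category inventory; the parse of "boston" (with `+` marking a stressed nucleus) and the one-phoneme-per-branch simplification are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)


def branch(part: str, phoneme: str, phone: str) -> Node:
    """A sub-syllable node over one phoneme (preterminal) over one phone (terminal)."""
    return Node(part, [Node(phoneme, [Node(phone)])])


def layers(root: Node) -> list[list[str]]:
    """Read the parse off layer by layer, from the word node down to the phones."""
    result, level = [], [root]
    while level:
        result.append([node.label for node in level])
        level = [child for node in level for child in node.children]
    return result


# Hypothetical parse of "boston": word, morph, sub-syllable, phoneme, and phone layers.
boston = Node("WORD", [
    Node("SROOT", [branch("ONSET", "/b/", "[b]"), branch("NUC+", "/aa+/", "[aa]"),
                   branch("CODA", "/s/", "[s]")]),
    Node("UROOT", [branch("ONSET", "/t/", "[t]"), branch("NUC", "/ax/", "[ax]"),
                   branch("CODA", "/n/", "[n]")]),
])

if __name__ == "__main__":
    for depth, labels in enumerate(layers(boston)):
        print(depth, labels)
```

In this toy tree the last layer is the bare phone string, and each layer above it adds structure that a word spotter's filler model could, in principle, be constrained by.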
Learning Context-Dependent Mappings from Sentences to Logical Form
Cited by 7 (0 self)
We consider the problem of learning context-dependent mappings from sentences to logical form. The training examples are sequences of sentences annotated with lambda-calculus meaning representations. We develop an algorithm that maintains explicit lambda-calculus representations of salient discourse entities and uses a context-dependent analysis pipeline to recover logical forms. The method uses a hidden-variable variant of the perceptron algorithm to learn a linear model used to select the best analysis. Experiments on context-dependent utterances from the ATIS corpus show that the method recovers fully correct logical forms with 83.7% accuracy.
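A hidden-variable perceptron differs from the standard one in that only the logical form is annotated, not the analysis that produced it. The Python sketch below is a generic version under assumptions: candidate analyses are given as precomputed (feature dict, logical form) pairs, whereas the paper builds them with its context-dependent pipeline (not shown); the feature names and logical-form strings are hypothetical; and refinements such as weight averaging are omitted.

```python
from collections import defaultdict

# A candidate analysis is (feature counts, logical form). Several analyses may
# yield the same logical form; which of them is "right" is never annotated.
Candidate = tuple[dict[str, float], str]


def score(weights: dict[str, float], feats: dict[str, float]) -> float:
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def predict(weights: dict[str, float], candidates: list[Candidate]) -> str:
    """Logical form of the highest-scoring analysis."""
    return max(candidates, key=lambda c: score(weights, c[0]))[1]


def train(examples: list[tuple[list[Candidate], str]], epochs: int = 5) -> dict[str, float]:
    """Hidden-variable perceptron: the derivation behind the gold logical form is latent."""
    weights: dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for candidates, gold in examples:
            pred_feats, pred_lf = max(candidates, key=lambda c: score(weights, c[0]))
            if pred_lf == gold:
                continue
            correct = [c for c in candidates if c[1] == gold]
            if not correct:
                continue  # gold logical form unreachable with these candidates: no update
            # Treat the best-scoring correct analysis as the target derivation.
            target_feats, _ = max(correct, key=lambda c: score(weights, c[0]))
            for f, v in target_feats.items():
                weights[f] += v
            for f, v in pred_feats.items():
                weights[f] -= v
    return dict(weights)


if __name__ == "__main__":
    ex1 = ([({"x": 1.0}, "B"), ({"y": 1.0}, "A"), ({"z": 1.0}, "A")], "A")
    ex2 = ([({"x": 1.0, "z": 1.0}, "C"), ({"y": 1.0}, "D")], "C")
    w = train([ex1, ex2])
    print(w, predict(w, ex1[0]), predict(w, ex2[0]))
```

The key design choice is the target of each update: rather than a single annotated derivation, the learner commits to whichever correct analysis the current model already prefers.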
Estimation of Language Models for New Spoken Language Applications
1996
Cited by 6 (0 self)
Spoken language interfaces can provide natural communication for many database retrieval tasks. The CMU ATIS system provides an example of accessing airline information using spoken natural language queries. However, developing a spoken language application requires a large amount of training data. For example, training data is needed to build a language model that the recognizer can use to reduce its search space. In this paper, we address some issues arising from the small amount of training data available for a new spoken language application.
Robustness Issues in a Data-Driven Spoken Language Understanding System
In HLT/NAACL04 Workshop on Spoken Language Understanding for Conversational Systems, 2004
Cited by 6 (1 self)
Robustness is a key requirement in spoken language understanding (SLU) systems. Human speech is often ungrammatical and ill-formed, and there will frequently be a mismatch between training and test data. This paper discusses robustness and adaptation issues in a statistically based SLU system that is entirely data-driven. To test its robustness, the system was evaluated on data from the Air Travel Information Service (ATIS) domain that had been artificially corrupted with varying levels of additive noise. Although speech recognition performance degraded steadily, the system did not fail catastrophically. Indeed, the end-to-end performance of the complete system degraded significantly more slowly than that of the recognition component alone.
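Corrupting test audio "with varying levels of additive noise" is usually parameterized by signal-to-noise ratio. The abstract does not say which noise was used, so this Python sketch assumes white Gaussian noise scaled to a target SNR in dB; the synthetic 440 Hz tone stands in for real speech, and the recognizer and SLU system that would consume the noisy audio are not shown.

```python
import math
import random


def add_noise(signal: list[float], snr_db: float, rng: random.Random) -> list[float]:
    """Add white Gaussian noise scaled so that signal power / noise power equals snr_db (in dB)."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean: list[float], noisy: list[float]) -> float:
    """Measured SNR in dB, treating (noisy - clean) as the noise."""
    p_signal = sum(x * x for x in clean)
    p_noise = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)


if __name__ == "__main__":
    rng = random.Random(0)
    tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]  # 1 s at 8 kHz
    for level in (20.0, 10.0, 0.0):
        noisy = add_noise(tone, level, rng)
        print(level, round(measured_snr_db(tone, noisy), 6))
```

Sweeping the SNR downward and scoring both the recognizer and the end-to-end system at each level is what allows the two degradation curves to be compared.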

