A Bit of Progress in Language Modeling
2001. Cited by 70 (1 self)
Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction (Church, 1988; Brown et al., 1990; Hull, 1992; Kernighan et al., 1990; Srihari and Baltus, 1992). The most commonly used language models are very simple (e.g. a Katz-smoothed trigram model). There are many improvements over this simple model, however, including caching, clustering, higher-order n-grams, skipping models, and sentence-mixture models, all of which we will describe below. Unfortunately, these more complicated techniques have rarely been examined in combination. It is entirely possible that two techniques that work well separately will not work well together, and, as we will show, even possible that some techniques will work better together than either one does by itself. In this...
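To make "the probability of a sequence of words" concrete, here is a minimal sketch of a trigram model in Python. It is an illustration only: it uses simple recursive linear interpolation with a fixed weight rather than the Katz smoothing the abstract mentions, and the function names (train, prob, sentence_logprob), the weight lam and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def train(sentences):
    """Collect n-gram and history counts for n = 1, 2, 3."""
    counts = {n: Counter() for n in (1, 2, 3)}  # n-gram counts
    hist = {n: Counter() for n in (1, 2, 3)}    # counts of each (n-1)-word history
    for sentence in sentences:
        toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        for i in range(2, len(toks)):
            for n in (1, 2, 3):
                gram = tuple(toks[i - n + 1:i + 1])
                counts[n][gram] += 1
                hist[n][gram[:-1]] += 1
    vocab_size = len(counts[1])  # distinct predicted tokens, including </s>
    return counts, hist, vocab_size


def prob(word, history, model, lam=0.6):
    """P(word | two-word history), interpolating uniform, 1-, 2- and 3-gram estimates.

    lam is a hypothetical fixed weight; real systems tune such weights on held-out data.
    """
    counts, hist, vocab_size = model
    p = 1.0 / vocab_size  # uniform floor so no in-vocabulary word gets probability zero
    for n in (1, 2, 3):
        ctx = tuple(history[3 - n:])  # last n-1 words of the history
        denom = hist[n][ctx]
        if denom:  # skip this order when the history was never seen
            p = lam * counts[n][ctx + (word,)] / denom + (1 - lam) * p
    return p


def sentence_logprob(sentence, model):
    """Chain rule: sum of log P(w_i | w_{i-2}, w_{i-1}) over the sentence."""
    toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log(prob(toks[i], (toks[i - 2], toks[i - 1]), model))
        for i in range(2, len(toks))
    )


model = train(["the cat sat", "the dog sat"])
print(sentence_logprob("the cat sat", model))
print(sentence_logprob("sat cat the", model))
```

On this toy corpus the word order seen in training scores a higher log-probability than the scrambled one, which is the behavior a recognizer relies on when ranking competing hypotheses.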
Getting More Mileage from Web Text Sources for Conversational Speech Language Modeling using Class-Dependent Mixtures
Proc. HLT-NAACL 2003, 2003. Cited by 36 (8 self)
Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
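As a rough illustration of class-dependent interpolation, the Python sketch below mixes two component models with weights chosen by the class of the preceding word, i.e. p(w | h) = sum over sources s of lambda_s(class(h)) * p_s(w | h). The abstract does not say how classes are defined or how weights are estimated, so the component models, the two classes ("filler" and "content"), the weight values and every name here are hypothetical.

```python
def class_dependent_mixture(word, history, components, weights, word_class):
    """Mix component models with weights that depend on the class of the last history word.

    components: list of functions p_s(word, history) -> probability
    weights:    dict mapping a class label to one weight per component (summing to 1)
    word_class: function mapping a word to its class label
    history:    non-empty list of preceding words
    """
    lams = weights[word_class(history[-1])]
    return sum(lam * p(word, history) for lam, p in zip(lams, components))


# Toy component models (assumed for illustration; they ignore the history).
def in_domain(word, history):
    return {"uh": 0.5, "yeah": 0.3, "report": 0.2}.get(word, 0.0)


def web_text(word, history):
    return {"uh": 0.1, "yeah": 0.1, "report": 0.8}.get(word, 0.0)


def toy_class(word):
    return "filler" if word in {"uh", "yeah"} else "content"


# Hypothetical weights: trust the conversational data more after fillers.
toy_weights = {"filler": (0.9, 0.1), "content": (0.4, 0.6)}

components = [in_domain, web_text]
p_after_filler = class_dependent_mixture("report", ["uh"], components, toy_weights, toy_class)
p_after_content = class_dependent_mixture("report", ["the"], components, toy_weights, toy_class)
print(p_after_filler, p_after_content)
```

With a single class this reduces to ordinary linear interpolation; letting the weights vary by class is what allows the web data to contribute more in some contexts than in others.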
Rapid language model development for new task domains
Proc. First International Conference on Language Resources and Evaluation (LREC, 1998. Cited by 16 (6 self)
Data sparseness has been regularly indicted as the primary problem in statistical language modelling. We go one step further to consider the situation when no text data is available for the target domain. We present two techniques for building efficient language models quickly for new domains. The first technique is based on using a context-free grammar to generate a corpus of word collocations. The second is an adaptation technique based on using out-of-domain corpora to estimate target domain language models. We report results of successfully using these two techniques individually and in combination to build efficient models for a spontaneous speech recognition task in a medium-sized vocabulary domain.
Practical Issues in Compiling Typed Unification Grammars for Speech Recognition
In Meeting of the Association for Computational Linguistics, 2001. Cited by 12 (6 self)
Current alternatives for language modeling are statistical techniques based on large amounts of training data, and hand-crafted context-free or finite-state grammars that are difficult to build and maintain. One way to address the problems of the grammar-based approach is to compile recognition grammars from grammars written in a more expressive formalism. While theoretically straight-forward, the compilation process can exceed memory and time bounds, and might not always result in accurate and efficient speech recognition.
Robust Information Extraction From Spoken Language Data
In Proceedings of Eurospeech-99. Cited by 6 (1 self)
In this paper we address the problem of information extraction from speech data, particularly improving robustness to automatic recognition errors. We describe a baseline probabilistic model that uses word-class smoothing in a phrase n-gram language model. The model is adjusted to the error characteristics of a speech recognizer by inserting error tokens in the training data and by using word confidences in decoding to account for possible errors in the recognition output. Experiments show improved performance when training and test conditions are matched.
1. INTRODUCTION Extracting linguistic structure such as proper names, noun phrases, and verb phrases is an important first step in many systems aimed at automatic language understanding. While significant progress has been made on this problem, most of the work has focused on "clean" textual data such as newswire texts, in which cues such as capitalization and punctuation are important for obtaining high accuracy results. However, t...
Normalization of Non-Standard Words: WS '99 Final Report
Hopkins University, 1999. Cited by 5 (0 self)
All areas of language and speech technology must deal, in one way or another, with real text. Real text is messy: many things one finds in text (numbers, abbreviations, dates, currency amounts, acronyms ...) are not standard words in that one cannot find their properties by looking them up in a dictionary or deriving them morphologically from words that are in a dictionary, nor can one find their pronunciation by an application of "letter-to-sound" rules. For many applications, such non-standard words (NSW's) need to be normalized, or in other words converted into standard words. Since the correct normalization of a given token often depends upon both the local context and the type (genre) of text one is dealing with, "text-normalization" is in general a very hard problem. Typical technology for text-normalization mostly involves sets of ad hoc rules tuned to handle one or two genres of text (often newspaper-style text), with the expected result that the techniques, do...
Text normalization with varied data sources for conversational speech language modeling
In Proc. ICASSP, 2002. Cited by 3 (3 self)
Collecting sufficient language model training data for good speech recognition performance in a new domain is often difficult. However, there may be other sources of data that are matched in terms of topic or style, if not both. This paper looks at the use of text normalization tools to make these data more suitable for language model training, in conjunction with mixture models to combine data from different sources. We specifically address the task of recognizing meeting speech, showing a small reduction in word error rate over a baseline language model trained from conversational speech data.
Acoustic Model Clustering Based on Syllable Structure
2002. Cited by 2 (0 self)
Current speech recognition systems perform poorly on conversational speech as compared to read speech, arguably due to the large acoustic variability inherent in conversational speech. Our hypothesis is that there are systematic effects in local context, associated with syllabic structure, that are not being captured in the current acoustic models. Such variation may be modeled using a broader definition of context than in traditional systems which restrict context to be the neighboring phonemes. In this paper, we study the use of word- and syllable-level context conditioning in recognizing conversational speech. We describe a method to extend standard tree-based clustering to incorporate a large number of features, and we report results on the Switchboard task which indicate that syllable structure outperforms pentaphones and incurs less computational cost. It has been hypothesized that previous work in using syllable models for recognition of English was limited because of ignoring the phenomenon of re-syllabification (change of syllable structure at word boundaries), but our analysis shows that accounting for re-syllabification does not impact recognition performance.
Class-dependent Interpolation for Estimating Language Models from Multiple Text Sources
2003. Cited by 1 (0 self)
Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show how training data can be supplemented with text from the web filtered to match the style and/or topic of the target recognition task, but also that it is possible to get bigger performance gains from the data by using class-dependent interpolation of N-grams.
Information Extraction From Broadcast News Speech Data
Proceedings of the DARPA Broadcast News Workshop, February 28-March 3, 1999
In this paper we describe a robust algorithm for information extraction from spoken language data. Our probabilistic algorithm builds on results in language modeling, using class-based smoothing to produce state-of-the-art performance for a wide range of speech error rates. We show that our system performs well with sparse data, as well as with out-of-domain data.
1. INTRODUCTION Extracting linguistic structure such as proper names, noun phrases, and verb phrases is an important first step in many systems aimed at automatic language understanding. While significant progress has been made on this problem, most of the work has focused on "clean" textual data such as newswire texts, where cues such as capitalization and punctuation are important for obtaining high accuracy results. However, there are many data sources where these cues are not reliable, such as in spoken language data or single-case text. Spoken language sources, in particular, pose additional problems because of disflu...

