Speech Recognition And The Frequency Of Recently Used Words: A Modified Markov Model For Natural Language (1988)

by Roland Kuhn
Results 1 - 10 of 19

A Maximum Entropy Approach to Adaptive Statistical Language Modeling

by Ronald Rosenfeld - Computer, Speech and Language, 1996
Cited by 201 (11 self)
An adaptive statistical language model is described, which successfully integrates long distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...

Two decades of statistical language modeling: Where do we go from here

by Ronald Rosenfeld - Proceedings of the IEEE, 2000
Cited by 119 (1 self)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.

1. OUTLINE

Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents.

Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rule-based approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information theoretical approach was developed by [4].

SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.

A Bit of Progress in Language Modeling

by Joshua T. Goodman, 2001
Cited by 70 (1 self)
Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction (Church, 1988; Brown et al., 1990; Hull, 1992; Kernighan et al., 1990; Srihari and Baltus, 1992). The most commonly used language models are very simple (e.g. a Katz-smoothed trigram model). There are many improvements over this simple model, however, including caching, clustering, higher-order n-grams, skipping models, and sentence-mixture models, all of which we will describe below. Unfortunately, these more complicated techniques have rarely been examined in combination. It is entirely possible that two techniques that work well separately will not work well together, and, as we will show, even possible that some techniques will work better together than either one does by itself. In this...
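
Caching, the first improvement named in this abstract, can be illustrated with a minimal sketch. It assumes a unigram cache over a fixed window of recent words, linearly interpolated with a static unigram model. The class name `CachedUnigram`, the window size, and the interpolation weight are hypothetical choices for illustration and are not taken from any of the papers listed here.

```python
from collections import Counter, deque


class CachedUnigram:
    """Static unigram model interpolated with a cache of recent words (illustrative)."""

    def __init__(self, static_probs, cache_size=200, cache_weight=0.1):
        self.static_probs = static_probs        # word -> static probability
        self.cache = deque(maxlen=cache_size)   # most recent words, oldest first
        self.counts = Counter()                 # word counts inside the cache window
        self.cache_weight = cache_weight        # assumed interpolation weight

    def prob(self, word):
        static = self.static_probs.get(word, 0.0)
        if not self.cache:
            # No history yet: fall back to the static model alone.
            return static
        cached = self.counts[word] / len(self.cache)
        return (1 - self.cache_weight) * static + self.cache_weight * cached

    def observe(self, word):
        # When the window is full, the oldest word is about to be evicted by append().
        if len(self.cache) == self.cache.maxlen:
            self.counts[self.cache[0]] -= 1
        self.cache.append(word)
        self.counts[word] += 1
```

A word that was just observed gets a higher probability than the static model alone would give it, and the boost disappears once the word falls out of the window.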

Building Probabilistic Models for Natural Language

by Stanley F. Chen, 1996
Cited by 60 (1 self)
Building models of language is a central task in natural language processing. Traditionally, language has been modeled with manually-constructed grammars that describe which strings are grammatical and which are not; however, with the recent availability of massive amounts of on-line text, statistically-trained models are an attractive alternative. These models are generally probabilistic, yielding a score reflecting sentence frequency instead of a binary grammaticality judgement. Probabilistic models of language are a fundamental tool in speech recognition for resolving acoustically ambiguous utterances. For example, we prefer the transcription forbear to four bear as the former string is far more frequent in English text. Probabilistic models also have application in optical character recognition, handwriting recognition, spelling correction, part-of-speech tagging, and machine translation. In this thesis, we investigate three problems involving the probabilistic modeling of languag...

A Hybrid Approach To Adaptive Statistical Language Modeling

by Ronald Rosenfeld - Proceedings of the ARPA workshop on human language technology, 1994
Cited by 23 (2 self)
We describe our latest attempt at adaptive language modeling. At the heart of our approach is a Maximum Entropy (ME) model which incorporates many knowledge sources in a consistent manner. The other components are a selective unigram cache, a conditional bigram cache, and a conventional static trigram. We describe the knowledge sources used to build such a model with ARPA's official WSJ corpus, and report on perplexity and word error rate results obtained with it. Then, three different adaptation paradigms are discussed, and an additional experiment, based on AP wire data, is used to compare them.

1. OVERVIEW OF ME FRAMEWORK

Using several different probability estimates to arrive at one combined estimate is a general problem that arises in many tasks. The Maximum Entropy (ME) principle has recently been demonstrated as a powerful tool for combining statistical estimates from diverse sources [1, 2, 3]. The ME principle ([4, 5]) proposes the following:

1. Reformulate the different estimates as constraints on the expectation of various functions, to be satisfied by the target (combined) estimate.
2. Among all probability distributions that satisfy these constraints, choose the one that has the highest entropy.

More specifically, for estimating a probability function $P(x)$, each constraint $i$ is associated with a constraint function $f_i(x)$ and a desired expectation $c_i$. The constraint is then written as:

$$E_P f_i \stackrel{\text{def}}{=} \sum_x P(x) f_i(x) = c_i. \qquad (1)$$

Given consistent constraints, a unique ME solution is guaranteed to exist, and to be of the form:

$$P(x) = \prod_i \mu_i^{f_i(x)}, \qquad (2)$$

where the $\mu_i$'s are some unknown constants, to be found. Probability functions of the form (2) are called log-linear, and the family of functions defined by holding the $f_i$'s fixed and varying the $\mu_i$'s is called an exponential family.
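
The two ME steps can be tried out on a toy problem. The sketch below assumes a small finite domain, writes each constant of form (2) as an exponential of a weight, normalizes explicitly, and finds the weights by plain gradient ascent on the dual objective. The function names, learning rate, and step count are hypothetical, and gradient ascent is a stand-in chosen for brevity, not the training procedure used in the paper.

```python
import math


def distribution(domain, features, lambdas):
    """Log-linear distribution P(x) proportional to exp(sum_i lambda_i * f_i(x))."""
    scores = [
        math.exp(sum(lam * f(x) for lam, f in zip(lambdas, features)))
        for x in domain
    ]
    z = sum(scores)  # explicit normalization (an assumption of this sketch)
    return [s / z for s in scores]


def fit_maxent(domain, features, targets, lr=1.0, steps=2000):
    """Adjust the weights until each feature expectation matches its target c_i."""
    lambdas = [0.0] * len(features)
    for _ in range(steps):
        probs = distribution(domain, features, lambdas)
        for i, f in enumerate(features):
            expected = sum(p * f(x) for p, x in zip(probs, domain))
            # Dual gradient: desired expectation minus current model expectation.
            lambdas[i] += lr * (targets[i] - expected)
    return distribution(domain, features, lambdas)
```

With the domain {0, 1, 2, 3}, one indicator feature on {0, 1} with target 0.6, and another on {1, 2} with target 0.5, the fitted distribution satisfies both constraints while leaving everything else as uniform as possible.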

Incorporation of a markov model of language syntax in a text recognition algorithm

by Jonathan J. Hull - In Symposium on Doc ...ment Analysis and Information Retrievals
Cited by 19 (5 self)
The use of a hidden Markov model (HMM) for language syntax to improve the performance of a text recognition algorithm is proposed. Syntactic constraints are described by the transition probabilities between word classes. The confusion between the feature string for a word and the various syntactic classes is also described probabilistically. A modification of the Viterbi algorithm is also proposed that finds a fixed number of sequences of syntactic classes for a given sentence that have the highest probabilities of occurrence, given the feature strings for the words. An experimental application of this approach is demonstrated with a word hypothesization algorithm that produces a number of guesses about the identity of each word in a running text. The use of first and second order transition probabilities is explored. An overall reduction of between 65 and 80 percent in the average number of words that can match a given image is achieved.
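
The idea of returning a fixed number of best class sequences rather than a single one can be sketched as a generic k-best variant of Viterbi decoding over a first-order HMM. This is an illustration under stated assumptions, not the paper's exact modification: the function name, the dictionary-based model layout, and the use of raw probabilities (which would underflow on long sentences) are all hypothetical choices.

```python
import heapq


def k_best_viterbi(observations, states, start, trans, emit, k=2):
    """Return up to k (probability, state path) pairs, most probable first."""
    # best[s] holds up to k (probability, path) pairs for paths ending in state s.
    best = {s: [(start[s] * emit[s][observations[0]], [s])] for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Extend every retained path from every predecessor state r into s.
            candidates = [
                (p * trans[r][s] * emit[s][obs], path + [s])
                for r in states
                for p, path in best[r]
            ]
            new_best[s] = heapq.nlargest(k, candidates, key=lambda c: c[0])
        best = new_best
    finals = [c for s in states for c in best[s]]
    return heapq.nlargest(k, finals, key=lambda c: c[0])
```

Keeping k candidates per state at each position is enough, because any of the k best complete sequences must extend one of the k best partial sequences ending in its previous state. Setting k=1 reduces this to ordinary Viterbi decoding.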

Putting It All Together: Language Model Combination

by Joshua T. Goodman - In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing
Cited by 11 (2 self)
In the past several years, a number of different language modeling improvements over simple trigram models have been found, including caching, higher-order n-grams, skipping, modified Kneser-Ney smoothing, and clustering. While all of these techniques have been studied separately, they have rarely been studied in combination. We find some significant interactions, especially with smoothing techniques. The combination of all techniques leads to up to a 45% perplexity reduction over a Katz-smoothed trigram model with no count cutoffs, the highest such perplexity reduction reported.
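
Perplexity reductions such as the one quoted here are relative changes in a standard quantity: the inverse geometric mean of the probabilities a model assigns to the words of a test text. A minimal sketch follows; the function names are hypothetical, and the per-word probabilities are assumed to come from whatever model is being evaluated.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test text given the model probability of each of its words."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_sum = sum(math.log2(p) for p in word_probs)
    # 2 raised to the average negative log2 probability per word.
    return 2 ** (-log_sum / len(word_probs))


def relative_reduction(baseline, improved):
    """Fractional perplexity reduction of an improved model over a baseline."""
    return (baseline - improved) / baseline
```

For example, a model that assigns probability 0.25 to every word has perplexity 4, and going from a baseline perplexity of 100 to 55 is a 45% reduction.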

How to Wreck a Nice Beach You Sing Calm Incense

by Henry Lieberman, Alexander Faaborg, Waseem Daher, José Espinosa - Proceedings of the 10th international conference on Intelligent user interfaces, 2005
Cited by 10 (3 self)
A principal problem in speech recognition is distinguishing between words and phrases that sound similar but have different meanings. Speech recognition programs produce a list of weighted candidate hypotheses for a given audio segment, and choose the "best" candidate. If the choice is incorrect, the user must invoke a correction interface that displays a list of the hypotheses and choose the desired one. The correction interface is time-consuming, and accounts for much of the frustration of today's dictation systems. Conventional dictation systems prioritize hypotheses based on language models derived from statistical techniques such as n-grams and Hidden Markov Models. We propose a supplementary method for ordering hypotheses based on Commonsense Knowledge. We filter acoustical and word-frequency hypotheses by testing their plausibility with a semantic network derived from 700,000 statements about everyday life. This often filters out possibilities that "don't make sense" from the user's viewpoint, and leads to improved recognition. Reducing the hypothesis space in this way also makes possible streamlined correction interfaces that improve the overall throughput of dictation systems.

Lattice Based Language Models

by Pierre Dupont, Ronald Rosenfeld, 1997
Cited by 10 (1 self)
This paper introduces lattice based language models, a new language modeling paradigm. These models construct multi-dimensional hierarchies of partitions and select the most promising partitions to generate the estimated distributions. We discuss a specific two-dimensional lattice and propose two primary features to measure the usefulness of each node: the training-set history count and the smoothed entropy of its prediction. Smoothing techniques are reviewed and a generalization of the conventional backoff strategy to multiple dimensions is proposed. Preliminary experimental results obtained on the SWITCHBOARD corpus show a 6.5% perplexity reduction over a word trigram model.

Project sponsored by the National Security Agency under Grant No. MDA904-97-10006. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notation hereon.

Current address: Dépt. Math., Université Jean Monnet, 23, rue P. Michelon, 42023 S...

NYU Language Modeling Experiments for the 1995 CSR Evaluation

by Satoshi Sekine, Ralph Grishman - In Proceedings of the ARPA Spoken Language Systems Technology Workshop, 1995
Cited by 9 (0 self)
This paper describes NYU's effort toward improving recognition accuracy for the 1995 ARPA Large Vocabulary Continuous Speech Recognition evaluation. We are trying to develop language models which include longer-range language models and linguistically motivated models. For the system described here, we used as a starting point the scores produced by SRI's acoustic and language models. These are linearly combined with the scores produced by the NYU language models.

1. Introduction

This paper describes NYU's effort toward improving recognition accuracy for the 1995 ARPA Large Vocabulary Continuous Speech Recognition evaluation. Our goal has been to study some longer-range language models and determine whether they can be a useful component of the language models used for speech recognition. This year, we started the project by classifying the recognition errors in terms of their linguistic properties. Then, we considered how some of these errors might be reduced by specifically-targeted lingu...