Two decades of statistical language modeling: Where do we go from here
Proceedings of the IEEE, 2000
Cited by 119 (1 self)
Statistical Language Models estimate the distribution of various natural language phenomena for the purpose of speech recognition and other language technologies. Since the first significant model was proposed in 1980, many attempts have been made to improve the state of the art. We review them here, point to a few promising directions, and argue for a Bayesian approach to integration of linguistic theories with data.
1. OUTLINE
Statistical language modeling (SLM) is the attempt to capture regularities of natural language for the purpose of improving the performance of various natural language applications. By and large, statistical language modeling amounts to estimating the probability distribution of various linguistic units, such as words, sentences, and whole documents. Statistical language modeling is crucial for a large variety of language technology applications. These include speech recognition (where SLM got its start), machine translation, document classification and routing, optical character recognition, information retrieval, handwriting recognition, spelling correction, and many more. In machine translation, for example, purely statistical approaches have been introduced in [1]. But even researchers using rule-based approaches have found it beneficial to introduce some elements of SLM and statistical estimation [2]. In information retrieval, a language modeling approach was recently proposed by [3], and a statistical/information-theoretic approach was developed by [4].
SLM employs statistical estimation techniques using language training data, that is, text. Because of the categorical nature of language, and the large vocabularies people naturally use, statistical techniques must estimate a large number of parameters, and consequently depend critically on the availability of large amounts of training data.
A Bit of Progress in Language Modeling
2001
Cited by 70 (1 self)
Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, optical character recognition, handwriting recognition, machine translation, and spelling correction (Church, 1988; Brown et al., 1990; Hull, 1992; Kernighan et al., 1990; Srihari and Baltus, 1992). The most commonly used language models are very simple (e.g. a Katz-smoothed trigram model). There are many improvements over this simple model, however, including caching, clustering, higher-order n-grams, skipping models, and sentence-mixture models, all of which we will describe below. Unfortunately, these more complicated techniques have rarely been examined in combination. It is entirely possible that two techniques that work well separately will not work well together, and, as we will show, even possible that some techniques will work better together than either one does by itself. In this...
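As a brief aside on what "the probability of a sequence of words" means in the trigram case mentioned above, the standard textbook formulation can be written as follows. This is a sketch of the general idea, not a formula quoted from the paper; the symbols $w_i$ and $m$ are notation assumed here, and the smoothing step (Katz or otherwise) is left out.

```latex
% Chain rule: exact decomposition of the sequence probability
P(w_1, \dots, w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})

% Trigram approximation: condition only on the two preceding words
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-2}, w_{i-1})
```

The techniques the abstract lists (caching, clustering, skipping, sentence mixtures) can be read as different ways of changing or enriching the conditioning context on the right-hand side.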
Assessment of Dialogue Systems By Means of a New Simulation Technique
2002
Cited by 8 (1 self)
In recent years, a question of great interest has been the development of tools and techniques to facilitate the evaluation of dialogue systems. The latter can be evaluated from various points of view, such as recognition and understanding rates, dialogue naturalness and robustness against recognition errors. Evaluation usually requires compiling a large corpus of words and sentences uttered by users, relevant to the application domain the system is designed for. This paper proposes a new technique that makes it possible to reuse such a corpus for the evaluation and to check the performance of the system when different dialogue strategies are used. The technique is based on the automatic generation of conversations between the dialogue system and an additional dialogue system, a user simulator, which interacts with the dialogue system. The technique has been applied to evaluate a dialogue system developed in our lab using two different recognition front-ends and two different dialogue strategies to handle user confirmations. The experiments show that the prompt-dependent recognition front-end achieves better results, but that this front-end is appropriate only if users limit their utterances to those related to the current system prompt. The prompt-independent front-end achieves inferior results, but enables front-end users to utter any permitted utterance at any time, independently of the system prompt. In consequence, this front-end may allow a more natural and comfortable interaction. The experiments also show that the re-prompting confirmation strategy enhances system performance for both recognition front-ends.
A Closer Look at Skip-gram Modelling
Cited by 6 (1 self)
Data sparsity is a large problem in natural language processing that refers to the fact that language is a system of rare events, so varied and complex that even using an extremely large corpus, we can never accurately model all possible strings of words. This paper examines the use of skip-grams (a technique whereby n-grams are still stored to model language, but they allow for tokens to be skipped) to overcome the data sparsity problem. We analyze this by computing all possible skip-grams in a training corpus and measuring how many adjacent (standard) n-grams these cover in test documents. We examine skip-gram modelling using one to four skips with various amounts of training data and test against similar documents as well as documents generated from a machine translation system. In this paper we also determine the amount of extra training data required to achieve skip-gram coverage using standard adjacent tri-grams.
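To make the skip-gram idea in this abstract concrete, here is a minimal Python sketch. It assumes one common reading of the definition: a k-skip-n-gram is an ordered n-token subsequence that skips at most k tokens in total. The abstract does not spell out the exact definition, so the function name `skip_grams` and its parameters are illustrative, not taken from the paper.

```python
from itertools import combinations


def skip_grams(tokens, n, k):
    """Return all k-skip-n-grams of `tokens` as tuples.

    Assumption (not from the abstract): a k-skip-n-gram keeps word order
    and skips at most k tokens in total across the whole n-gram.
    """
    grams = []
    for start in range(len(tokens) - n + 1):
        # With at most k skipped tokens, an n-gram starting at `start`
        # can reach no further than index start + n + k - 1.
        window_end = min(start + n + k, len(tokens))
        # Fix the first token, then pick the other n-1 positions in order.
        for rest in combinations(range(start + 1, window_end), n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    # k = 0 reduces to ordinary adjacent bigrams.
    print(skip_grams(sentence, 2, 0))
    # k = 2 additionally pairs each word with words up to two positions further on.
    print(skip_grams(sentence, 2, 2))
```

With k = 0 the function returns the standard adjacent n-grams, which is the baseline that the coverage comparison in the abstract is made against.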
Combination Of N-Grams And Stochastic Context-Free Grammars For Language Modeling
International conference on computational linguistics (COLIN-A CL, 2000
Cited by 5 (2 self)
This paper describes a hybrid proposal to combine n-grams and Stochastic Context-Free Grammars (SCFGs) for language modeling. A classical n-gram model is used to capture the local relations between words, while a stochastic grammatical model is considered to represent the long-term relations between syntactical structures. In order to define this grammatical model, which will be used on large-vocabulary complex tasks, a category-based SCFG and a probabilistic model of word distribution in the categories have been proposed. Methods for learning these stochastic models for complex tasks are described, and algorithms for computing the word transition probabilities are also presented. Finally, experiments on the Penn Treebank corpus show a 30% improvement in test-set perplexity over classical n-gram models.
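For readers unfamiliar with the metric behind the 30% figure, the following Python sketch shows how test-set perplexity is conventionally computed and how a relative improvement could be expressed. It is not the paper's evaluation code: the function names are hypothetical, the input is assumed to be the probability the model assigned to each test word, and reading "30%" as a relative reduction is an assumption.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test sequence from per-word model probabilities.

    Computed as exp of the negative mean log-probability, so a model that
    assigns probability 1/V to every word has perplexity V.
    """
    if not word_probs or any(p <= 0.0 for p in word_probs):
        raise ValueError("need a non-empty list of strictly positive probabilities")
    mean_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-mean_log_prob)


def relative_reduction(baseline_ppl, new_ppl):
    """Fraction by which `new_ppl` is lower than `baseline_ppl`."""
    return (baseline_ppl - new_ppl) / baseline_ppl


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    print(relative_reduction(100.0, 70.0))
```

Lower perplexity means the model is, on average, less surprised by the test text.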
Testing Dialogue Systems By Means of Automatic Generation of Conversations
2002
Cited by 2 (0 self)
This paper presents a novel technique for testing spoken dialogue systems by means of an automatic generation of conversations. The technique makes it easy to test spoken dialogue systems under a variety of lab-simulated conditions, as it is easy to vary or change the utterance corpus used to check the performance of the system. The technique is based on the use of a module called user simulator whose purpose is to behave as real users do when they interact with dialogue systems. The behaviour of the simulator is decided by means of diverse scenarios that represent the goals of the users. The simulator's aim is to achieve the goals set in the scenarios during the interaction with the dialogue system. We have applied the technique to test a dialogue system developed in our lab. The test has been carried out considering different levels of white and babble noise as well as a VTS noise compensation technique. The results prove that the dialogue system performance is worse under the babble noise conditions. The VTS technique has been effective when dealing with noisy utterances and has led to better experimental results, particularly for the white noise. The technique has made it possible to detect problems in the dialogue strategies employed to handle confirmation turns and recognition errors, suggesting that these strategies must be improved. © 2002 Elsevier Science B.V. All rights reserved.
Quantifying The Contribution Of Language Modeling To Writer-Independent On-Line Handwriting Recognition
Proceedings of IWFHR 7, 2000
Cited by 2 (1 self)
We describe experiments varying the degree of language-model constraint applied to writer-independent on-line handwriting recognition. Six types of models are used, varying statistical components and hard constraints which govern recognition search during the sequencing of characters to form valid texts. Experiments on constrained texts, such as dates and phone numbers, show that although tighter language models cause more inputs to be out-of-domain, they can still eliminate up to 50% of string errors and 75% of character errors compared to using a null language model.
Capturing long distance dependency for language modeling: an empirical study
In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04, 2004
Cited by 2 (0 self)
This paper presents an extensive empirical study on two language modeling techniques, linguistically-motivated word skipping and predictive clustering, both of which are used in capturing long distance word dependencies that are beyond the scope of a word trigram model. We compare the techniques to others that were proposed previously for the same purpose. We evaluate the resulting models on the task of Japanese Kana-Kanji conversion. We show that the two techniques, while simple, outperform existing methods studied in this paper, and lead to language models that perform significantly better than a word trigram model. We also investigate how factors such as training corpus size and genre affect the performance of the models.
Integrating A Context-Dependent Phrase Grammar In The Variable N-Gram Framework
2000
This paper focuses on the learning of multi-word lexical units, or phrases, and how to model them within the variable n-gram framework. We introduce the notion of context-dependent phrases and suggest an algorithm for unsupervised learning of phrases. Also, we propose an approach to integrate a phrase grammar and a variable n-gram without the need to explicitly handle multi-word lexical items. The combined variable n-gram phrase grammar improves recognition accuracy on the Switchboard corpus over both the baseline trigram and a variable n-gram alone.
1. INTRODUCTION
Although words in English are reasonable lexical units for language modeling, there are many cases in which longer lexical units may be more appropriate. Frequently used word sequences, such as "I mean" or "you know", are so common in conversational speech that they may be effectively used by the speaker as a single lexical item. We call these multi-word units "phrases". There are several ways of treating a multi-word sequ...

