Results 1–10 of 37
An Empirical Study of Smoothing Techniques for Language Modeling
, 1998
Abstract

Cited by 1188 (21 self)
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
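The linear-interpolation idea mentioned in this abstract can be illustrated with a minimal sketch (hypothetical toy corpus and a fixed weight `lam`; the paper tunes such weights on held-out data and compares many variants):

```python
from collections import Counter

def interpolated_bigram_prob(w_prev, w, unigrams, bigrams, total, lam=0.7):
    """Jelinek-Mercer-style interpolation: mix the (sparse) bigram ML
    estimate with the (robust) unigram ML estimate. lam is a
    hypothetical fixed weight for illustration only."""
    p_uni = unigrams[w] / total
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

corpus = "the cat sat on the mat".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
p = interpolated_bigram_prob("the", "cat", unigrams, bigrams, len(corpus))
```

Here `p` blends P(cat | the) = 1/2 with P(cat) = 1/6, so even a bigram never seen in training would still receive nonzero probability from the unigram term.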
Estimation of probabilities from sparse data for the language model component of a speech recognizer
 IEEE Transactions on Acoustics, Speech and Signal Processing
, 1987
Abstract

Cited by 790 (2 self)
The description of a novel type of n-gram language model is given. The model offers, via a nonlinear recursive procedure, a computation- and space-efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method has been developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises. Sparseness of data is an inherent property of any real text, and it is a problem that one always encounters while collecting frequency statistics on words and word sequences (m-grams) from a text of finite size. This means that even for a very large data collection, maximum likelihood estimation fails for m-grams that do not occur in the sample. Turing's estimate P_T for the probability of a word (m-gram) which occurred in the sample r times is P_T = r*/N, where r* = (r+1) n_{r+1}/n_r, n_r is the number of m-grams occurring exactly r times, and N is the sample size. We call a procedure of replacing a count r with a modified count r' "discounting", and the ratio r'/r a discount coefficient d_r. When r' = r*, we have Turing's discounting. Let us denote the m-gram w_1, ..., w_m as w_1^m and the number of times it occurred in the sample text as c(w_1^m). Then the maximum likelihood estimate is P_ML(w_1^m) = c(w_1^m)/N.
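Turing's estimate as defined above can be computed directly from frequency-of-frequency counts. A minimal sketch (toy sample; note that the raw formula is undefined whenever n_{r+1} = 0, which is exactly why Katz backs off and later work smooths n_r):

```python
from collections import Counter

def turing_estimate(sample):
    """Turing's estimate P_T = r*/N with r* = (r+1) * n_{r+1} / n_r,
    where n_r = number of distinct items seen exactly r times and
    N = sample size. Returns (per-item estimates, total unseen mass)."""
    counts = Counter(sample)
    n = Counter(counts.values())           # n[r] = how many items have count r
    N = len(sample)
    est = {}
    for w, r in counts.items():
        if n[r + 1] > 0:                   # raw formula only applies here
            r_star = (r + 1) * n[r + 1] / n[r]
            est[w] = r_star / N
    p_unseen = n[1] / N                    # mass reserved for unseen items
    return est, p_unseen

est, p0 = turing_estimate(list("abracadabra"))
```

In "abracadabra", the singletons c and d get discounted estimates of 2/11 each (since n_1 = n_2 = 2), and the same 2/11 of the total mass is reserved for letters never seen.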
Similarity-based approaches to natural language processing
, 1997
Abstract

Cited by 52 (3 self)
Statistical methods for automatically extracting information about associations between words or documents from large collections of text have the potential to have considerable impact in a number of areas, such as information retrieval and natural-language-based user interfaces. However, even huge bodies of text yield highly unreliable estimates of the probability of relatively common events, and, in fact, perfectly reasonable events may not occur in the training data at all. This is known as the sparse data problem. Traditional approaches to the sparse data problem use crude approximations. We propose a different solution: if we are able to organize the data into classes of similar events, then, if information about an event is lacking, we can estimate its behavior from information about similar events. This thesis presents two such similarity-based approaches, where, in general, we measure similarity by the Kullback-Leibler divergence, an information-theoretic quantity. Our first approach is to build soft, hierarchical clusters: soft, because each event belongs to each cluster with some probability; hierarchical, because cluster centroids are iteratively split to model finer distinctions. Our clustering method uses the technique of deterministic annealing.
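The Kullback-Leibler divergence used here as the similarity measure is straightforward to compute for discrete distributions. A minimal sketch (the next-word distributions below are invented for illustration; the thesis works with smoothed co-occurrence estimates, not a raw epsilon guard):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D(p || q) = sum_x p(x) * log(p(x)/q(x)) for dicts over a
    discrete event space. eps guards against log(0); zero-probability
    events in q would otherwise make the divergence infinite."""
    return sum(pi * math.log(pi / max(q.get(x, 0.0), eps))
               for x, pi in p.items() if pi > 0)

# Hypothetical next-word distributions for two similar verbs:
p = {"wine": 0.6, "water": 0.3, "code": 0.1}
q = {"wine": 0.5, "water": 0.4, "code": 0.1}
d = kl_divergence(p, q)
```

D(p || q) is zero exactly when the distributions agree and grows as they diverge, which is what makes it usable as a (asymmetric) measure of how well one event's behavior predicts another's.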
Phrase-table smoothing for statistical machine translation
Abstract

Cited by 42 (6 self)
We discuss different strategies for smoothing the phrase-table in statistical MT, and give results over a range of translation settings. We show that any type of smoothing is a better idea than the relative-frequency estimates that are often used. The best smoothing techniques yield consistent gains of approximately 1% (absolute) according to the BLEU metric.
Good-Turing smoothing without tears
 Journal of Quantitative Linguistics
, 1995
Abstract

Cited by 33 (0 self)
The performance of statistically based techniques for many tasks such as spelling correction, sense disambiguation, and translation is improved if one can estimate a probability for an object of interest which has not been seen before. Good-Turing methods are one means of estimating these probabilities for previously unseen objects. However, the use of Good-Turing methods requires a smoothing step which must smooth in regions of vastly different accuracy. Such smoothers are difficult to use, and may have hindered the use of Good-Turing methods in computational linguistics. This paper presents a method which uses the simplest possible smooth, a straight line, together with a rule for switching from Turing estimates, which are more accurate at low frequencies. We call this method the Simple Good-Turing (SGT) method. Two examples, one from prosody, the other from morphology, are used to illustrate the SGT. While the goal of this research was to provide a simple estimator, the SGT turns out to be the most accurate of several methods applied in a set of Monte Carlo examples which satisfy the assumptions of the Good-Turing methods. The accuracy of the SGT is compared to two other methods for estimating the same probabilities, the Expected Likelihood Estimate (ELE) and two-way cross-validation.
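The "simplest possible smooth, a straight line" can be sketched concretely: fit log(n_r) against log(r) by least squares, then read adjusted counts off the fitted line. A minimal illustration (toy frequency-of-frequencies chosen so n_r = 100/r; the full SGT also averages n_r over gaps in r and applies the switching rule between raw Turing and smoothed estimates):

```python
import math

def fit_log_line(freq_of_freqs):
    """Least-squares fit of log(n_r) = a + b*log(r), where
    freq_of_freqs maps r -> n_r. Returns (intercept a, slope b)."""
    xs = [math.log(r) for r in freq_of_freqs]
    ys = [math.log(n) for n in freq_of_freqs.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def smoothed_r_star(r, a, b):
    """Adjusted count r* = (r+1) * S(r+1)/S(r), with S(r) = exp(a) * r**b
    taken from the fitted line instead of the noisy raw n_r."""
    s = lambda x: math.exp(a) * x ** b
    return (r + 1) * s(r + 1) / s(r)

a, b = fit_log_line({1: 100, 2: 50, 10: 10})   # n_r = 100/r, so slope is -1
r1 = smoothed_r_star(1, a, b)
```

Because S(r) never hits zero, the smoothed r* is defined for every r, avoiding the n_{r+1} = 0 holes that make raw Turing estimates unusable at high frequencies.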
CategoryBased Statistical Language Models
, 1997
Abstract

Cited by 14 (2 self)
The first section, in chapter 3, develops a model for syntactic dependencies based on word-category n-grams. The second section, in chapter 4, extends this model by allowing short-range word relations to be captured through the incorporation of selected word n-grams.
A syllable-synchronous network search algorithm for word decoding in Chinese speech recognition
 IEEE International Conf. on Acoustics, Speech and Signal Processing (ICASSP)
, 1999
Abstract

Cited by 11 (5 self)
The Chinese language is syllabic in nature, with frequent homonyms and a severe word-boundary uncertainty problem. This makes continuous Chinese speech recognition (CSR) difficult. In order to solve these problems, a Chinese syllable-synchronous network search (SSNS) algorithm is proposed. Together with the vocabulary word search tree and the N-gram-based language model, the syllable-synchronous network search algorithm gives a good solution to Chinese syllable-to-word conversion. In addition, this algorithm is a good method for accented Chinese speech recognition. The experimental results show that the SSNS algorithm can achieve good overall continuous Chinese speech recognition system performance.
EASYTALK: A large-vocabulary speaker-independent Chinese dictation machine
 EuroSpeech‘99
, 1999
Abstract

Cited by 10 (8 self)
The EasyTalk application is a large-vocabulary speaker-independent continuous Chinese speech recognition system, i.e. a Chinese dictation machine (CDM), under the WINTEL environment. Addressed in this paper are a number of novel techniques adopted in the CDM engine, which is the basis of EasyTalk, including: the merging-based syllable detection automaton (MBSDA) and the statistical-knowledge-based frame-synchronous search (SKBFSS) algorithms in the acoustic processing stage; the critical-area percentage (CAP) and recognition score gap (RSG) methods for the acceptance/rejection decision; the word search tree (WST), the N-gram, and the syllable-synchronous network search (SSNS) algorithm in the language processing stage; and the embedded multiple model scheme (EMM) and the fuzzy syllable set (FSS) for robustness.
Robust estimation of microbial diversity in theory and in practice
 ISME J
, 2013
Dyna: Extending Datalog For Modern AI
Abstract

Cited by 8 (0 self)
Modern statistical AI systems are quite large and complex; this interferes with research, development, and education. We point out that most of the computation involves database-like queries and updates on complex views of the data. Specifically, recursive queries look up and aggregate relevant or potentially relevant values. If the results of these queries are memoized for reuse, the memos may need to be updated through change propagation. We propose a declarative language, which generalizes Datalog, to support this work in a generic way. Through examples, we show that a broad spectrum of AI algorithms can be concisely captured by writing down systems of equations in our notation. Many strategies could be used to actually solve those systems. Our examples motivate certain extensions to Datalog, which are connected to functional and object-oriented programming paradigms.
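The memoization-plus-change-propagation pattern the abstract describes can be sketched in a few lines (a hypothetical toy API for illustration only; Dyna itself expresses this declaratively over weighted rules, not through explicit method calls):

```python
class MemoTable:
    """Toy sketch: query results are cached, and updating a base fact
    invalidates exactly the cached results that read it."""

    def __init__(self):
        self.facts = {}        # base facts: name -> value
        self.memos = {}        # query name -> cached result
        self.readers = {}      # fact name -> query names that read it

    def set_fact(self, name, value):
        self.facts[name] = value
        for q in self.readers.pop(name, ()):   # change propagation:
            self.memos.pop(q, None)            # drop only the stale memos

    def query(self, qname, compute, reads):
        if qname not in self.memos:            # recompute only on a miss
            self.memos[qname] = compute(self.facts)
            for f in reads:
                self.readers.setdefault(f, set()).add(qname)
        return self.memos[qname]

m = MemoTable()
m.set_fact("count(a)", 1)
m.set_fact("count(b)", 2)
total = m.query("total", lambda f: f["count(a)"] + f["count(b)"],
                ["count(a)", "count(b)"])      # computed once
m.set_fact("count(a)", 10)                     # invalidates "total"
total2 = m.query("total", lambda f: f["count(a)"] + f["count(b)"],
                 ["count(a)", "count(b)"])     # recomputed from new facts
```

The dependency tracking here is manual (`reads` is passed in); a declarative system derives those dependencies from the rules themselves, which is one motivation for generalizing Datalog.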