Results 11 - 20 of 97
Topic-Based Language Models Using EM
In Proceedings of Eurospeech, 1999
Cited by 53 (1 self)
Abstract: In this paper, we propose a novel statistical language model to capture topic-related long-range dependencies. Topics are modeled in a latent variable framework in which we also derive an EM algorithm to perform a topic factor decomposition based on a segmented training corpus. The topic model is combined with a standard language model to be used for online word prediction. Perplexity results indicate an improvement over previously proposed topic models, which unfortunately has not translated into lower word error.
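The latent-variable decomposition described above can be sketched as a small EM loop over a document-level mixture of unigram topic distributions. This is an illustrative pLSA-style sketch, not the paper's exact parameterization; all function and variable names are hypothetical.

```python
import random
from collections import Counter

def em_topic_mixture(docs, num_topics, iters=30, seed=0):
    """EM for a document-level mixture of unigram topics (illustrative sketch).

    docs: list of token lists.  Learns word distributions P(w|t) and
    per-document topic weights P(t|d) by alternating expectation and
    maximization steps.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    # Random positive init of P(w|t); uniform init of P(t|d).
    p_w_t = []
    for _ in range(num_topics):
        weights = [rng.random() + 0.01 for _ in vocab]
        z = sum(weights)
        p_w_t.append({w: x / z for w, x in zip(vocab, weights)})
    p_t_d = [[1.0 / num_topics] * num_topics for _ in docs]

    for _ in range(iters):
        new_wt = [Counter() for _ in range(num_topics)]
        new_td = []
        for d, doc in enumerate(docs):
            td = [0.0] * num_topics
            for w, c in Counter(doc).items():
                # E-step: posterior responsibility of each topic for word w.
                post = [p_t_d[d][t] * p_w_t[t][w] for t in range(num_topics)]
                z = sum(post) or 1.0
                for t in range(num_topics):
                    r = c * post[t] / z
                    new_wt[t][w] += r
                    td[t] += r
            z = sum(td) or 1.0
            new_td.append([x / z for x in td])
        # M-step: renormalize expected counts into distributions.
        for t in range(num_topics):
            z = sum(new_wt[t].values()) or 1.0
            p_w_t[t] = {w: new_wt[t].get(w, 1e-12) / z for w in vocab}
        p_t_d = new_td
    return p_w_t, p_t_d

p_w_t, p_t_d = em_topic_mixture([["cat", "cat", "dog"],
                                 ["stock", "stock", "bond"]], num_topics=2)
```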
A survey of statistical machine translation
2007
Cited by 52 (4 self)
Abstract: Statistical machine translation (SMT) treats the translation of natural language as a machine learning problem. By examining many samples of human-produced translation, SMT algorithms automatically learn how to translate. SMT has made tremendous strides in less than two decades, and many popular techniques have only emerged within the last few years. This survey presents a tutorial overview of state-of-the-art SMT at the beginning of 2007. We begin with the context of the current research, and then move to a formal problem description and an overview of the four main subproblems: translational equivalence modeling, mathematical modeling, parameter estimation, and decoding. Along the way, we present a taxonomy of some different approaches within these areas. We conclude with an overview of evaluation and notes on future directions.
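The mathematical-modeling subproblem the survey names traditionally starts from the noisy-channel decomposition: by Bayes' rule, the best translation is argmax_e P(e) * P(f|e), since P(f) is constant over candidates. A minimal sketch; the candidate strings and probabilities below are hand-assigned for illustration only, not from any trained model.

```python
def noisy_channel_best(source, candidates, lm_prob, tm_prob):
    """Return the candidate e maximizing P(e) * P(f | e).

    argmax_e P(e | f) = argmax_e P(e) * P(f | e), because the
    denominator P(f) does not depend on e.
    """
    return max(candidates, key=lambda e: lm_prob(e) * tm_prob(source, e))

# Toy, hand-assigned probabilities -- assumptions for illustration only.
lm_table = {"the house": 0.6, "house the": 0.05}
tm_table = {("la maison", "the house"): 0.4, ("la maison", "house the"): 0.4}

best = noisy_channel_best(
    "la maison",
    ["the house", "house the"],
    lambda e: lm_table.get(e, 0.0),
    lambda f, e: tm_table.get((f, e), 0.0),
)
```

Here the translation model scores both word orders equally, so the language model breaks the tie toward the fluent one.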
Maximum Entropy Techniques for Exploiting Syntactic, Semantic and Collocational Dependencies in Language Modeling
Cited by 48 (10 self)
Abstract: A new statistical language model is presented which combines collocational dependencies with two important sources of long-range statistical dependence: the syntactic structure and the topic of a sentence. These dependencies or constraints are integrated using the maximum entropy technique. Substantial improvements are demonstrated over a trigram model in both perplexity and speech recognition accuracy on the Switchboard task. A detailed analysis of the performance of this language model is provided in order to characterize the manner in which it performs better than a standard N-gram model. It is shown that topic dependencies are most useful in predicting words which are semantically related by the subject matter of the conversation. Syntactic dependencies on the other hand are found to be most helpful in positions where the best predictors of the following word are not within N-gram range due to an intervening phrase or clause. It is also shown that these two methods ind...
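The maximum entropy combination used above has the log-linear form P(w | h) = exp(sum_i lambda_i f_i(w, h)) / Z(h), where each feature f_i encodes one constraint. A schematic sketch with two invented binary features standing in for the paper's collocational and topic constraints; the feature definitions and weights are made up, not trained.

```python
import math

def maxent_prob(word, history, features, weights, vocab):
    """Log-linear model: P(w | h) = exp(sum_i lambda_i * f_i(w, h)) / Z(h)."""
    def score(w):
        return math.exp(sum(l * f(w, history) for l, f in zip(weights, features)))
    z = sum(score(w) for w in vocab)  # normalizer Z(h) over the vocabulary
    return score(word) / z

# Two invented indicator features (illustrative stand-ins only).
bigram_feat = lambda w, h: 1.0 if h[-1:] == ["stock"] and w == "market" else 0.0
topic_feat = lambda w, h: 1.0 if w in {"market", "shares"} else 0.0

vocab = ["market", "shares", "bark"]
features, weights = [bigram_feat, topic_feat], [2.0, 1.0]
p = maxent_prob("market", ["the", "stock"], features, weights, vocab)
total = sum(maxent_prob(w, ["the", "stock"], features, weights, vocab)
            for w in vocab)
```

Because the two active features both fire for "market" after "the stock", that word absorbs most of the probability mass, while Z(h) keeps the distribution normalized.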
Headline Generation Based on Statistical Translation
In Proceedings of ACL 2000
Cited by 45 (0 self)
Abstract: Extractive summarization techniques cannot generate document summaries shorter than a single sentence, something that is often required. An ideal summarization system would understand each document and generate an appropriate summary directly from the results of that understanding. A more practical approach to this problem results in the use of an approximation: viewing summarization as a problem analogous to statistical machine translation.
A bit of progress in language modeling — extended version
2001
Cited by 43 (1 self)
Abstract: Language modeling is the art of determining the probability of a sequence of words. This is useful in a large variety of areas including speech recognition, ...
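Perplexity, the evaluation metric recurring throughout these abstracts, is the inverse geometric mean of the per-word probabilities a model assigns to test text: PP = 2 ** (-(1/N) * sum_i log2 P(w_i | w_1 .. w_{i-1})). A minimal sketch; the uniform toy model below is only a sanity check, not a real language model.

```python
import math

def perplexity(model_prob, tokens):
    """PP = 2 ** (-(1/N) * sum_i log2 P(w_i | w_1 .. w_{i-1}))."""
    log_sum = sum(math.log2(model_prob(w, tokens[:i]))
                  for i, w in enumerate(tokens))
    return 2.0 ** (-log_sum / len(tokens))

# Sanity check: a uniform model over an 8-word vocabulary should score
# perplexity exactly 8 on any text drawn from that vocabulary.
uniform = lambda w, history: 1.0 / 8.0
pp = perplexity(uniform, ["the"] * 10)
```

Lower perplexity means the model spreads less probability mass over wrong continuations, which is why it is the standard intrinsic comparison between the models in this listing.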
Statistical Parsing With an Automatically-Extracted Tree Adjoining Grammar
2000
Cited by 40 (2 self)
Abstract: We discuss the advantages of lexicalized tree-adjoining grammar as an alternative to lexicalized PCFG for statistical parsing, describing the induction of a probabilistic LTAG model from the Penn Treebank and evaluating its parsing performance. We find that this induction method is an improvement over the EM-based method of [Hwa, 1998], and that the induced model yields results comparable to lexicalized PCFG.
Relating Probabilistic Grammars and Automata
In Proceedings of ACL '99, 1999
Cited by 34 (0 self)
Abstract: Both probabilistic context-free grammars (PCFGs) and shift-reduce probabilistic pushdown automata (PPDAs) have been used for language modeling and maximum likelihood parsing. We investigate the precise relationship between these two formalisms, showing that, while they define the same classes of probabilistic languages, they appear to impose different inductive biases.
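In a PCFG, the probability of a derivation is simply the product of the probabilities of the rules it uses; it is this shared notion of string probability that lets the two formalisms above be compared. A minimal sketch with an invented toy grammar; the rules and probabilities are made up for illustration.

```python
def tree_prob(tree, rule_probs):
    """P(derivation) = product of the probabilities of its rules.

    tree: (lhs, children) where each child is a subtree or a terminal string.
    rule_probs: maps (lhs, rhs_tuple) -> rule probability.
    """
    lhs, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_probs[(lhs, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_prob(child, rule_probs)
    return p

# Toy grammar (invented probabilities): S -> NP VP, NP -> dogs, VP -> bark.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dogs",)): 0.5,
    ("VP", ("bark",)): 0.4,
}
p = tree_prob(("S", [("NP", ["dogs"]), ("VP", ["bark"])]), rules)
```

The probability of a string is then the sum of tree_prob over all of its derivations, which is the quantity both PCFGs and PPDAs define.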
A Classification Approach to Word Prediction
2000
Cited by 34 (8 self)
Abstract: The eventual goal of a language model is to accurately predict the value of a missing word given its context. We present an approach to word prediction that is based on learning a representation for each word as a function of words and linguistic predicates in its context. This approach raises a few new questions that we address. First, in order to learn good word representations it is necessary to use an expressive representation of the context. We present a way that uses external knowledge to generate expressive context representations, along with a learning method capable of handling the large number of features generated this way that can, potentially, contribute to each prediction. Second, since the number of words "competing" for each prediction is large, there is a need to "focus the attention" on a smaller subset of these. We exhibit the contribution of a "focus of attention" mechanism to the performance of the word predictor. Finally, we describe a large scale experimental study in which the approach presented is shown to yield significant improvements in word prediction tasks.
Exploiting Syntactic Structure for Natural Language Modeling
2000
Cited by 29 (0 self)
Abstract: The thesis presents an attempt at using the syntactic structure in natural language for improved language models for speech recognition. The structured language model merges techniques in automatic parsing and language modeling using an original probabilistic parameterization of a shift-reduce parser. A maximum likelihood reestimation procedure belonging to the class of expectation-maximization algorithms is employed for training the model. Experiments on the Wall Street Journal, Switchboard and Broadcast News corpora show improvement in both perplexity and word error rate (word lattice rescoring) over the standard 3-gram language model. The significance of the thesis lies in presenting an original approach to language modeling that uses the hierarchical (syntactic) structure in natural language to improve on current 3-gram modeling techniques for large vocabulary speech recognition.
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
2006
Cited by 27 (8 self)
Abstract: This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a "neighborhood" of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations) ...
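Contrastive estimation, as described above, replaces the usual normalization over all possible strings with normalization over a small neighborhood of perturbed versions of each observed example. A schematic sketch of the objective; the scoring function and the adjacent-transposition neighborhood below are invented for illustration and are not the thesis's actual models.

```python
import math

def contrastive_log_likelihood(examples, score, neighborhood):
    """Sum over x of log( exp(score(x)) / sum_{x' in N(x)} exp(score(x')) ).

    Each observed example must appear in its own neighborhood, so every
    term is a well-defined log-probability over N(x).
    """
    total = 0.0
    for x in examples:
        neigh = neighborhood(x)
        assert x in neigh
        z = sum(math.exp(score(n)) for n in neigh)
        total += score(x) - math.log(z)
    return total

# Invented components: the score rewards 'a' immediately before 'b';
# the neighborhood is the example plus all adjacent transpositions.
def score(seq):
    return sum(1.0 for i in range(len(seq) - 1)
               if seq[i] == "a" and seq[i + 1] == "b")

def neighborhood(seq):
    out = [seq]
    for i in range(len(seq) - 1):
        swapped = list(seq)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        out.append(tuple(swapped))
    return out

ll = contrastive_log_likelihood([("a", "b")], score, neighborhood)
```

Maximizing this objective pushes probability mass from the implicit negative examples in each neighborhood onto the observed example, without ever enumerating the full space of strings.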