### Modeling Syntax for Parsing and Translation

2003

"... Syntactic structure is an important component of natural language utterances, for both form and content. Therefore, a variety of applications can benefit from the integration of syntax into their statistical models of language. In this thesis, two new syntax-based models are presented, along with th ..."

Abstract
- Add to MetaCart

Syntactic structure is an important component of natural language utterances, for both form and content. Therefore, a variety of applications can benefit from the integration of syntax into their statistical models of language. In this thesis, two new syntax-based models are presented, along with their training algorithms: a monolingual generative model of sentence structure, and a model of the relationship between the structure of a sentence in one language and the structure of its translation into another language. After these models are trained and tested on the respective tasks of monolingual parsing and word-level bilingual corpus alignment, they are demonstrated in two additional applications. First, a new statistical parser is automatically induced for a language in which none was available, using a bilingual corpus. Second, a statistical translation system is augmented with syntax-based models. Thus the contributions of this thesis include:

### Exploiting Syntactic Structure for Natural Language Modeling

2000

"... Abstract The thesis presents an attempt at using the syntactic structure in natural language for improved language models for speech recognition. The structured language model merges techniques in automatic parsing and language modeling using an original probabilistic parameterization of a shift-red ..."

Abstract
- Add to MetaCart

The thesis presents an attempt at using the syntactic structure in natural language for improved language models for speech recognition. The structured language model merges techniques in automatic parsing and language modeling using an original probabilistic parameterization of a shift-reduce parser. A maximum likelihood reestimation procedure belonging to the class of expectation-maximization algorithms is employed for training the model. Experiments on the Wall Street Journal, Switchboard and Broadcast News corpora show improvement in both perplexity and word error rate (via word lattice rescoring) over the standard 3-gram language model. The significance of the thesis lies in presenting an original approach to language modeling that uses the hierarchical (syntactic) structure in natural language to improve on current 3-gram modeling techniques for large vocabulary speech recognition.

### Grammatical Bigrams

"... Abstract Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional mo ..."

Abstract
- Add to MetaCart

Unsupervised learning algorithms have been derived for several statistical models of English grammar, but their computational complexity makes applying them to large data sets intractable. This paper presents a probabilistic model of English grammar that is much simpler than conventional models, but which admits an efficient EM training algorithm. The model is based upon grammatical bigrams, i.e., syntactic relationships between pairs of words. We present the results of experiments that quantify the representational adequacy of the grammatical bigram model, its ability to generalize from labelled data, and its ability to induce syntactic structure from large amounts of raw text.

1 Introduction

One of the most significant challenges in learning grammars from raw text is keeping the computational complexity manageable. For example, the EM algorithm for the unsupervised training of Probabilistic Context-Free Grammars, known as the Inside-Outside algorithm, has been found in practice to be "computationally intractable for realistic problems" [1]. Unsupervised learning algorithms have been designed for other grammar models (e.g., [2, 3]). However, to the best of our knowledge, no large-scale experiments have been carried out to test the efficacy of these algorithms; the most likely reason is that their computational complexity, like that of the Inside-Outside algorithm, is impractical. One way to improve the complexity of inference and learning in statistical models is to introduce independence assumptions; however, doing so increases the model's bias. It is natural to wonder how a simpler grammar model (that can be trained efficiently from raw text) would compare with conventional models (which make fewer independence assumptions, but which must be trained from labelled data).
Such a model would be a useful tool in domains where partial accuracy is valuable and large amounts of unlabelled data are available (e.g., Information Retrieval, Information Extraction, etc.).
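To make the grammatical-bigram idea concrete, here is a minimal sketch of EM over word pairs, under the simplifying assumption (mine, not necessarily the paper's exact parameterization) that each word is generated by exactly one other word in its sentence and head choices are independent; the data and function names are hypothetical toy examples.

```python
from collections import defaultdict

def em_step(sentences, pair_prob):
    """One EM pass for a simplified grammatical-bigram model: each word
    is assumed to be generated by one other word (its head) in the same
    sentence, with heads chosen independently of one another."""
    counts = defaultdict(float)
    for words in sentences:
        for i, dep in enumerate(words):
            heads = [w for j, w in enumerate(words) if j != i]
            if not heads:
                continue
            # E-step: posterior over which word is the head of `dep`
            weights = [pair_prob.get((h, dep), 1e-6) for h in heads]
            z = sum(weights)
            for h, w in zip(heads, weights):
                counts[(h, dep)] += w / z
    # M-step: renormalize expected counts per head word
    totals = defaultdict(float)
    for (h, _), c in counts.items():
        totals[h] += c
    return {(h, d): c / totals[h] for (h, d), c in counts.items()}

sentences = [["the", "dog", "barks"], ["the", "cat", "sleeps"],
             ["the", "dog", "sleeps"]]
probs = {}
for _ in range(5):
    probs = em_step(sentences, probs)
```

Because each E-step only scores word pairs within a sentence, an iteration is quadratic in sentence length, which is the kind of cost reduction (relative to cubic Inside-Outside dynamic programming) that motivates the model.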


### Word Prediction Using a Neural Net

"... A neural network model of word prediction based on automatically derived corpus-based term vectors is proposed as a replacement for the standard n-gram model. Initial testing and evaluation show the technique is promising, but more rigorous evaluation techniques are needed. 1 INTRODUCTION If the ti ..."

Abstract
- Add to MetaCart

A neural network model of word prediction based on automatically derived corpus-based term vectors is proposed as a replacement for the standard n-gram model. Initial testing and evaluation show the technique is promising, but more rigorous evaluation techniques are needed.

1 INTRODUCTION

If the title of this paper had been Word Prediction Using a Neural X, most AI researchers could replace the X with the correct word. This is the problem of word prediction, a quintessential problem in natural language processing with obvious applications like speech recognition. One typical way to solve the word prediction problem is via an n-gram model (where typically n=2 or 3 - see for example [Church and Mercer, 1993], [Brown et al., 1992], [Della Pietra et al., 1994], etc.). With this model, the probability of a certain word occurring depends only on the previous word or two. Furthermore, these probabilities are typically estimated by counting the number of times n-grams occur in a large text co...
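The count-based estimation described above can be sketched in a few lines; the corpus here is a hypothetical toy example, and the maximum-likelihood estimate shown omits the smoothing a real system would need.

```python
from collections import Counter

def train_bigram(corpus):
    """Maximum-likelihood bigram estimates from raw counts:
    P(w2 | w1) = count(w1, w2) / count(w1)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence          # sentence-start marker
        unigrams.update(tokens[:-1])         # history counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return {pair: c / unigrams[pair[0]] for pair, c in bigrams.items()}

corpus = [["word", "prediction"], ["word", "vectors"], ["word", "prediction"]]
p = train_bigram(corpus)
# p[("word", "prediction")] == 2/3
```

Any bigram unseen in training gets probability zero under this estimator, which is precisely the sparsity problem that motivates the term-vector approach proposed in the paper.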

### A Goal-Oriented Language Model

"... A "goal-oriented" language model is introduced. This model, or, more properly, family of traditional nth-order markov models, is constrained by the value of a symbol which is to occur some time in the future. This constraint makes it possible to use the model to generate strings which have ..."

Abstract
- Add to MetaCart

A "goal-oriented" language model is introduced. This model, or, more properly, family of traditional nth-order markov models, is constrained by the value of a symbol which is to occur some time in the future. This constraint makes it possible to use the model to generate strings which have a goal in mind, a process which is difficult to achieve using standard markov models. An intended application of the model is as the language generation device of a non-intelligent conversation simulator, which is being developed for the 1999 Loebner contest in artificial intelligence. Another possible application is adaptive text compression.

### Statistical Language Modeling Using Grammatical Information

1995

"... We propose to investigate the use of grammatical information to build improved statistical language models. Until recently, language models were primarily influenced by local lexical constraints. Today, language models often utilize longer range lexical information to aid in their predictions. All o ..."

Abstract
- Add to MetaCart

We propose to investigate the use of grammatical information to build improved statistical language models. Until recently, language models were primarily influenced by local lexical constraints. Today, language models often utilize longer range lexical information to aid in their predictions. All of these language models ignore grammatical considerations other than those induced by the statistics of lexical constraints. We believe that properly incorporating additional grammatical structure will achieve improved language models. We will use link grammar as our grammatical base. Being highly lexical in nature, the link grammar formalism will allow us to integrate more traditional modeling schemes with grammatical ones. An efficient robust link grammar parser will assist in this undertaking. We will initially build finite state-based language models that will utilize relatively simple grammatical information, such as part-of-speech data, along with information sources used by other lang...
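One simple way to fold part-of-speech information into a finite-state model, in the spirit of the proposal above, is a class-based bigram that routes the prediction through tags; this is my own illustrative sketch with hypothetical toy distributions, not the proposal's actual model.

```python
def class_bigram_prob(w_prev, w, tags, p_tag_given_prev, p_word_given_tag):
    """Class-based bigram: P(w | w_prev) = sum over tags t of
    P(t | w_prev) * P(w | t), so words sharing a tag share statistics."""
    return sum(p_tag_given_prev.get((w_prev, t), 0.0)
               * p_word_given_tag.get((t, w), 0.0)
               for t in tags)

tags = ["DET", "NOUN"]
p_tag_given_prev = {("the", "NOUN"): 1.0}          # "the" is always followed by a noun
p_word_given_tag = {("NOUN", "dog"): 0.4, ("NOUN", "cat"): 0.6}
prob = class_bigram_prob("the", "dog", tags, p_tag_given_prev, p_word_given_tag)
# prob == 0.4
```

Because probability mass flows through a small tag set rather than the full vocabulary, such models can assign nonzero probability to word pairs never seen together in training.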

### Chapter 1 k-Valued Link Grammars are Learnable from Strings

"... ABSTRACT. The article is concerned with learning link grammars in the model of Gold. We show that rigid and k-valued link grammars are learnable from strings. In fact, we prove that the languages of link structured lists of words associated to rigid link grammars have finite elasticity and we show a ..."

Abstract
- Add to MetaCart

The article is concerned with learning link grammars in the model of Gold. We show that rigid and k-valued link grammars are learnable from strings. In fact, we prove that the languages of link structured lists of words associated to rigid link grammars have finite elasticity, and we exhibit a learning algorithm. As a standard corollary, this result leads to the learnability of rigid or k-valued link grammars from strings.

### Review of "Statistical language learning" by Eugene Charniak

1993

"... Introduction The $64,000 question in computational linguistics these days is: "What should I read to learn about statistical natural language processing?" I have been asked this question over and over, and each time I have given basically the same reply: there is no text that addresses th ..."

Abstract
- Add to MetaCart

Introduction

The $64,000 question in computational linguistics these days is: "What should I read to learn about statistical natural language processing?" I have been asked this question over and over, and each time I have given basically the same reply: there is no text that addresses this topic directly, and the best one can do is find a good probability-theory textbook and a good information-theory textbook, and supplement those texts with an assortment of conference papers and journal articles. Understanding the disappointment this answer provoked, I was delighted to hear that someone had finally written a book directly addressing this topic. However, after reading Eugene Charniak's Statistical Language Learning, I have very mixed feelings about the impact this book might have on the ever-growing field of statistical NLP. The book begins with a very brief description of the classic artificial intelligence approach to NLP (chapter 1), including morphology, s

### Gibbs-Markov Models

- In Computing Science and Statistics: Proceedings of the 27th Symposium on the Interface. Interface Foundation, 1995

"... In this paper we present a framework for building probabilistic automata parameterized by context-dependent probabilities. Gibbs distributions are used to model state transitions and output generation, and parameter estimation is carried out using an EM algorithm where the M-step uses a generalized ..."

Abstract
- Add to MetaCart

In this paper we present a framework for building probabilistic automata parameterized by context-dependent probabilities. Gibbs distributions are used to model state transitions and output generation, and parameter estimation is carried out using an EM algorithm where the M-step uses a generalized iterative scaling procedure. We discuss relations with certain classes of stochastic feedforward neural networks, a geometric interpretation for parameter estimation, and a simple example of a statistical language model constructed using this methodology.

1. Introduction

Standard statistical approaches to speech and language processing problems use hidden Markov models, or more general probabilistic automata such as stochastic context-free grammars, taking advantage of their well-understood properties and efficient training algorithms. But such models are limited in their ability to incorporate contextual information and long-distance dependencies. Because of the Markov assumption, all predi...
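A Gibbs distribution over transitions is just a log-linear model whose features can inspect arbitrary context; the sketch below shows the general shape with hypothetical states, features, and weights of my own choosing (it illustrates the distribution only, not the generalized-iterative-scaling training step).

```python
import math

def gibbs_transition(state, context, states, weights, features):
    """Gibbs (log-linear) transition distribution:
    P(next | state, context) = exp(sum_i w_i * f_i(state, context, next)) / Z,
    where Z normalizes over all candidate next states."""
    def score(nxt):
        return sum(w * f(state, context, nxt) for w, f in zip(weights, features))
    z = sum(math.exp(score(n)) for n in states)
    return {n: math.exp(score(n)) / z for n in states}

states = ["NOUN", "VERB"]
features = [
    lambda s, c, n: 1.0 if n == s else 0.0,                      # self-transition
    lambda s, c, n: 1.0 if c == "the" and n == "NOUN" else 0.0,  # context feature
]
weights = [0.5, 2.0]
dist = gibbs_transition("NOUN", "the", states, weights, features)
```

Because features may condition on `context` as well as the current state, the same machinery captures dependencies that a plain Markov transition matrix cannot, which is the point the abstract makes about contextual information.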