Pattern-Based Disambiguation for Natural Language Processing
, 2000
Abstract

Cited by 5 (0 self)
A wide range of natural language problems can be viewed as disambiguating between a small set of alternatives based upon the string context surrounding the ambiguity site. In this paper we demonstrate that classification accuracy can be improved by invoking a more descriptive feature set than what is typically used. We present a technique that disambiguates by learning regular expressions describing the string contexts in which the ambiguity sites appear.
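The idea in this abstract, choosing among a small set of alternatives by matching patterns against the string context around the ambiguity site, can be illustrated with a toy sketch. The patterns, word alternatives, and function names below are invented for illustration and are not taken from the paper:

```python
import re

# Toy rules: each regular expression over the left context selects one
# alternative at the ambiguity site. These patterns are invented examples.
RULES = [
    (re.compile(r"\b(is|are|was|were)\s*$"), "there"),  # "... is ___"
    (re.compile(r"\b(of|to|for)\s*$"), "them"),         # "... for ___"
]

def disambiguate(left_context: str, default: str = "their") -> str:
    """Return the alternative whose context pattern fires first."""
    for pattern, label in RULES:
        if pattern.search(left_context):
            return label
    return default
```

A learned system would induce such expressions from labeled contexts rather than hand-writing them, which is the direction the abstract describes.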
Selection Criteria for Word Trigger Pairs in Language Modeling
 In ICGI’96
, 1996
Abstract

Cited by 5 (1 self)
In this paper, we study selection criteria for the use of word trigger pairs in statistical language modeling. A word trigger pair is defined as a long-distance word pair. To select the most significant trigger pairs, we need suitable criteria, which are the topic of this paper. We extend a baseline language model by a single word trigger pair and use the perplexity of this extended language model as the selection criterion. This extension is applied to all possible trigger pairs, the number of which is the square of the vocabulary size. When using a unigram language model as the baseline model, this approach produces the mutual information criterion used in [7, 11]. The more interesting case is to use this criterion for a more powerful model such as a bigram/trigram model with a cache. We study different variants for including word trigger pairs into such a language model. This approach produced better word trigger pairs than the usual mutual information criterion. When used on...
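The unigram-baseline case the abstract mentions reduces to a mutual-information criterion. The sketch below computes only the co-occurrence term of such a criterion over a toy collection of word sets standing in for histories; the paper's full criterion sums over all occurrence/non-occurrence cells, so this is a simplified illustration, not the exact method:

```python
import math

def trigger_score(docs, a, b):
    """Co-occurrence term of a mutual-information-style score for a
    candidate trigger pair (a -> b), estimated over toy 'histories'.
    Simplified sketch; not the paper's exact criterion."""
    n = len(docs)
    p_a = sum(a in d for d in docs) / n
    p_b = sum(b in d for d in docs) / n
    p_ab = sum(a in d and b in d for d in docs) / n
    if p_ab == 0.0:
        return 0.0
    return p_ab * math.log2(p_ab / (p_a * p_b))

docs = [{"doctor", "nurse"}, {"doctor", "nurse"},
        {"doctor", "patient"}, {"car", "road"}]
```

A related pair such as ("doctor", "nurse") scores above an unrelated one, which is the intuition behind ranking all vocabulary-squared candidate pairs by such a criterion.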
Capturing long distance dependency for language modeling: an empirical study
 In Proceedings of the First International Joint Conference on Natural Language Processing (IJCNLP-04)
, 2004
Abstract

Cited by 4 (0 self)
This paper presents an extensive empirical study on two language modeling techniques, linguistically motivated word skipping and predictive clustering, both of which are used to capture long-distance word dependencies that are beyond the scope of a word trigram model. We compare the techniques to others that were proposed previously for the same purpose. We evaluate the resulting models on the task of Japanese Kana-Kanji conversion. We show that the two techniques, while simple, outperform the existing methods studied in this paper and lead to language models that perform significantly better than a word trigram model. We also investigate how factors such as training corpus size and genre affect the performance of the models.
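A crude fixed-distance variant of word skipping can be sketched as follows; note that the technique the abstract studies skips linguistically selected words rather than a fixed number of tokens, so this is only a stand-in for the idea of pairing words across intervening material:

```python
from collections import Counter

def skip_bigrams(tokens, skip=1):
    """Count word pairs separated by exactly `skip` intervening tokens:
    a fixed-distance stand-in for linguistically motivated word skipping."""
    return Counter(zip(tokens, tokens[skip + 1:]))

counts = skip_bigrams("the cat sat on the mat".split(), skip=1)
```

Such counts could back off or interpolate with ordinary bigram counts, letting the model see past a word that carries little predictive information.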
Combining Labeled and Unlabeled Data in Statistical Natural Language Parsing
, 2002
Abstract

Cited by 2 (1 self)
Prof. Aravind Joshi, my dissertation advisor, has been my guide and mentor for the entire time that I spent at Penn. I thank him for all his academic help and personal kindness. The external member on my dissertation committee was Steven Abney, whose suggestions and advice have made the ideas presented here stronger. My dissertation committee members from Penn, Mitch Marcus, Mark Liberman, and Martha Palmer, provided questions whose answers shaped my dissertation proposal into the finished form in front of you. Many thanks to my academic collaborators; the work on prefix probabilities was done with Mark-Jan Nederhof and Giorgio Satta when they visited IRCS in 1998, and the work on subcategorization frame learning was done in collaboration with Daniel Zeman when he visited IRCS in 2000. Thanks to B. Srinivas, whose previous work provided the path to the experimental work in this dissertation. Thanks also to Paola Merlo and Suzanne Stevenson for discussions on their work on verb alternation classes. I also acknowledge the help of Woottiporn Tripasai in the extension of their work presented in this dissertation. Thanks to
Nonuniform Markov Models
, 1996
Abstract

Cited by 2 (0 self)
A statistical language model assigns probability to strings of arbitrary length. Unfortunately, it is not possible to gather reliable statistics on strings of arbitrary length from a finite corpus. Therefore, a statistical language model must decide that each symbol in a string depends on at most a small, finite number of other symbols in the string. In this report we propose a new way to model conditional independence in Markov models. The central feature of our nonuniform Markov model is that it makes predictions of varying lengths using contexts of varying lengths. Experiments on the Wall Street Journal reveal that the nonuniform model performs slightly better than the classic interpolated Markov model.
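The interpolated Markov model this abstract uses as its baseline can be sketched as a fixed-weight mixture of bigram and unigram estimates. The interpolation weight below is arbitrary rather than estimated as in the report, and the class name is invented:

```python
from collections import defaultdict

class InterpolatedMarkov:
    """Minimal sketch of an interpolated (bigram + unigram) Markov model."""
    def __init__(self, lam=0.7):
        self.lam = lam               # weight on the bigram estimate
        self.uni = defaultdict(int)  # unigram counts
        self.bi = defaultdict(int)   # bigram counts
        self.total = 0

    def train(self, tokens):
        prev = None
        for t in tokens:
            self.uni[t] += 1
            self.total += 1
            if prev is not None:
                self.bi[(prev, t)] += 1
            prev = t

    def prob(self, prev, t):
        # For this sketch the bigram context count is approximated by the
        # unigram count of `prev` (exact except at the corpus edge).
        p_uni = self.uni[t] / self.total if self.total else 0.0
        p_bi = self.bi[(prev, t)] / self.uni[prev] if self.uni[prev] else 0.0
        return self.lam * p_bi + (1 - self.lam) * p_uni

model = InterpolatedMarkov(lam=0.7)
model.train("a b a b a c".split())
```

The nonuniform model the abstract proposes differs by letting both the context length and the prediction length vary, whereas this baseline always predicts one symbol from a fixed-order context.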
A Scalable Distributed Syntactic, Semantic, and Lexical Language Model
Abstract

Cited by 2 (0 self)
This paper presents an attempt at building a large-scale distributed composite language model that is formed by seamlessly integrating an n-gram model, a structured language model, and probabilistic latent semantic analysis under a directed Markov random field paradigm to simultaneously account for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm and a follow-up EM algorithm to improve word prediction power on corpora with up to a billion tokens, stored on a supercomputer. The large-scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality, measured by the BLEU score and "readability" of translations, when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
An Information-Theoretic Empirical Analysis of Dependency-Based Feature Types for Word Prediction Models
 University of Maryland, USA
, 1999
Abstract

Cited by 1 (1 self)
Over the years, many proposals have been made to incorporate assorted types of features in language models. However, discrepancies between training sets, evaluation criteria, algorithms, and hardware environments make it difficult to compare the models objectively. In this paper, we take an information-theoretic approach to selecting feature types in a systematic manner. We describe a quantitative analysis of the information gain and the information redundancy for various combinations of feature types inspired by both dependency structure and bigram structure, using a Chinese treebank and taking word prediction as the task. The experiments yield several conclusions on the predictive value of individual feature types and feature-type combinations for word prediction, which are expected to provide guidelines for feature-type selection in language modeling.
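The information gain the abstract measures for a feature type is the entropy of the prediction target minus the conditional entropy of the target given the feature. The sketch below illustrates this on invented toy data, not the Chinese treebank the paper uses:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(features, labels):
    """H(labels) - H(labels | feature) over paired observations."""
    base = entropy(labels)
    n = len(labels)
    by_val = {}
    for f, y in zip(features, labels):
        by_val.setdefault(f, []).append(y)
    cond = sum(len(ys) / n * entropy(ys) for ys in by_val.values())
    return base - cond

perfect = information_gain(["x", "x", "y", "y"], ["v", "v", "n", "n"])  # 1 bit
useless = information_gain(["x", "y", "x", "y"], ["v", "v", "n", "n"])  # 0 bits
```

A feature type that fully determines the next word yields the full label entropy as gain, while an uninformative one yields zero, which is the basis for ranking feature types as the abstract describes.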
Statistical Parsing Algorithms for Lexicalized Tree Adjoining Grammars
Abstract

Cited by 1 (0 self)
The goal of this dissertation is twofold: to develop the theory of probabilistic Tree Adjoining Grammars (TAGs) and to present some practical results in the form of efficient parsing and estimation algorithms for probabilistic TAGs. The overall goal of developing the theory of probabilistic TAGs is to provide a simple, mathematically and linguistically well-formed probabilistic framework for statistical parsing. The practical results in parsing and estimation of probabilistic TAGs are developed with a view towards an increasingly unsupervised approach to the training of statistical parsers and language models. In particular, this proposal contains the following results: an algorithm for determining deficiency in a generative model for probabilistic TAGs; a novel chart-based head-corner parsing algorithm for probabilistic TAGs; a probability model for statistical parsing and a co-training method for training this parser which combines labeled and unlabeled data; and an algorithm for computing prefix probabilities which can be used to predict the word most likely to occur after an initial substring of the input. The proposed work can be summarized in the following points: a separate evaluation of the co-training algorithm on a larger set of labeled and unlabeled data, in addition to the evaluation presented in this proposal; an evaluation of the prefix probability algorithm by comparing it with a trigram language model; and an extension of techniques for learning subcategorization information and verb classes to produce TAG lexicons which can be directly used to improve performance of the co-training algorithm.
Review of "Statistical language learning" by Eugene Charniak
, 1993
Abstract
Introduction The $64,000 question in computational linguistics these days is: "What should I read to learn about statistical natural language processing?" I have been asked this question over and over, and each time I have given basically the same reply: there is no text that addresses this topic directly, and the best one can do is find a good probability-theory textbook and a good information-theory textbook, and supplement those texts with an assortment of conference papers and journal articles. Understanding the disappointment this answer provoked, I was delighted to hear that someone had finally written a book directly addressing this topic. However, after reading Eugene Charniak's Statistical Language Learning, I have very mixed feelings about the impact this book might have on the ever-growing field of statistical NLP. The book begins with a very brief description of the classic artificial intelligence approach to NLP (chapter 1), including morphology, s