Results 1–10 of 58
PCFG Models of Linguistic Tree Representations
 Computational Linguistics
, 1998
Abstract

Cited by 225 (9 self)
This paper points out that the Penn II treebank representations are of the kind predicted to have such an effect, and describes a simple node relabeling transformation that improves a treebank PCFG-based parser's average precision and recall by around 8%, or approximately half of the performance difference between a simple PCFG model and the best broad-coverage parsers available today. This performance variation comes about because any PCFG, and hence the corpus of trees from which the PCFG is induced, embodies independence assumptions about the distribution of words and phrases. The particular independence assumptions implicit in a tree representation can be studied theoretically and investigated empirically by means of a tree transformation/detransformation process.
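The node relabeling idea in this abstract can be illustrated with a small sketch. The following is not the paper's code; it is a hedged example of one well-known transformation of this kind, parent annotation, in which each nonterminal is relabeled with its parent's category so the induced PCFG makes weaker independence assumptions. The tuple-based tree format and the function name are illustrative choices.

```python
# Illustrative sketch of a parent-annotation tree transformation:
# each nonterminal label is augmented with its parent's label
# (e.g. an NP directly under S becomes NP^S). Trees are encoded as
# (label, [children]) for internal nodes and plain strings for words.

def parent_annotate(tree, parent_label=None):
    """Return a copy of `tree` with every nonterminal relabeled as label^parent."""
    if isinstance(tree, str):          # leaf: words are left unchanged
        return tree
    label, children = tree
    new_label = f"{label}^{parent_label}" if parent_label else label
    return (new_label, [parent_annotate(c, label) for c in children])

t = ("S", [("NP", ["dogs"]), ("VP", [("V", ["bark"])])])
print(parent_annotate(t))
# -> ('S', [('NP^S', ['dogs']), ('VP^S', [('V^VP', ['bark'])])])
```

A detransformation step would simply strip everything after the `^` from each label, recovering trees in the original representation for evaluation.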
Parameter learning of logic programs for symbolic-statistical modeling
 Journal of Artificial Intelligence Research
, 2001
Abstract

Cited by 100 (20 self)
We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, a possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs, including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probabilities generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks, which have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.
Parsing Inside-Out
, 1998
Abstract

Cited by 89 (2 self)
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given nonterminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. ...
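The inside probabilities this abstract refers to can be computed with a short dynamic program. The sketch below is not the thesis's code; it is a minimal, hedged example for a PCFG in Chomsky normal form, with an illustrative grammar encoding (binary rules as `{(A, B, C): prob}`, lexical rules as `{(A, word): prob}`).

```python
# Minimal CKY-style computation of inside probabilities for a CNF PCFG:
# beta[(i, j, A)] = probability that nonterminal A derives words[i:j].
from collections import defaultdict

def inside(words, binary, lexical):
    n = len(words)
    beta = defaultdict(float)
    for i, w in enumerate(words):                     # lexical rules A -> w
        for (A, v), p in lexical.items():
            if v == w:
                beta[(i, i + 1, A)] += p
    for span in range(2, n + 1):                      # binary rules A -> B C
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta

binary = {("S", "NP", "VP"): 1.0}
lexical = {("NP", "dogs"): 1.0, ("VP", "bark"): 1.0}
print(inside(["dogs", "bark"], binary, lexical)[(0, 2, "S")])  # -> 1.0
```

Replacing the sum and product here with other semiring operations (max and product for Viterbi probabilities, for instance) gives exactly the kind of generalization the abstract describes.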
Probabilistic Top-Down Parsing and Language Modeling
 Computational Linguistics
, 2004
Abstract

Cited by 75 (1 self)
This paper describes the functioning of a broad-coverage probabilistic top-down parser, and its application to the problem of language modeling for speech recognition. The paper first introduces key notions in language modeling and probabilistic parsing, and briefly reviews some previous approaches to using syntactic structure for language modeling. A lexicalized probabilistic top-down parser is then presented, which performs very well, in terms of both the accuracy of returned parses and the efficiency with which they are found, relative to the best broad-coverage statistical parsers. A new language model that utilizes probabilistic top-down parsing is then outlined, and empirical results show that it improves upon previous work in test corpus perplexity. Interpolation with a trigram model yields an exceptional improvement relative to the improvement observed by other models, demonstrating the degree to which the information captured by our parsing model is orthogonal to that captured by a trigram model. A small recognition experiment also demonstrates the utility of the model.
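The interpolation mentioned in this abstract is, in its simplest form, a linear mixture of two conditional word distributions. The sketch below is a generic illustration, not the paper's model: the component probabilities and the mixture weight `lam` are stand-in values.

```python
# Illustrative linear interpolation of two language models:
# P(w | h) = lam * P_parser(w | h) + (1 - lam) * P_trigram(w | h).
# The inputs are hypothetical per-word probabilities from each model.

def interpolate(p_parser, p_trigram, lam=0.4):
    """Mix a syntax-based and a trigram word probability with weight lam."""
    return lam * p_parser + (1 - lam) * p_trigram

print(interpolate(0.02, 0.01))  # ~0.014
```

Because the mixture is convex, the interpolated model is a proper distribution whenever both components are; the weight is typically tuned on held-out data.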
Recursive Markov chains, stochastic grammars, and monotone systems of nonlinear equations
 In STACS
, 2005
Abstract

Cited by 72 (11 self)
We define Recursive Markov Chains (RMCs), a class of finitely presented denumerable Markov chains, and we study algorithms for their analysis. Informally, an RMC consists of a collection of finite-state Markov chains with the ability to invoke each other in a potentially recursive manner. RMCs offer a natural abstract model for probabilistic programs with procedures. They generalize, in a precise sense, a number of well-studied stochastic models, including Stochastic Context-Free Grammars (SCFGs) and Multi-Type Branching Processes (MTBPs). We focus on algorithms for reachability and termination analysis for RMCs: what is the probability that an RMC started from a given state reaches another target state, or that it terminates? These probabilities are in general irrational, and they arise as (least) fixed point solutions to certain (monotone) systems of nonlinear equations associated with RMCs. We address both the qualitative problem of determining whether the probabilities are 0, 1 or in between, and ...
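The monotone nonlinear systems this abstract describes can be made concrete with a toy example (this grammar and the iteration scheme are illustrative, not taken from the paper). For the stochastic grammar X -> X X with probability p and X -> eps with probability 1 - p, the termination probability is the least nonnegative solution of x = p*x^2 + (1 - p), which equals 1 for p <= 1/2 and (1 - p)/p for p > 1/2. Iterating the right-hand side from 0 converges to that least fixed point from below.

```python
# Naive fixed-point iteration for the termination probability of the toy
# stochastic grammar X -> X X (prob p) | eps (prob 1-p): the least solution
# of x = p*x^2 + (1-p). Starting from 0, the iterates increase monotonically
# toward the least fixed point.

def termination_prob(p, iters=10000):
    x = 0.0
    for _ in range(iters):
        x = p * x * x + (1 - p)
    return x

print(round(termination_prob(0.75), 4))  # -> 0.3333, i.e. (1-0.75)/0.75
print(round(termination_prob(0.25), 4))  # -> 1.0
```

This naive iteration can converge very slowly near critical parameter values, which is one motivation for the Newton-style methods studied for such systems.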
Parameter Estimation in Stochastic Logic Programs
 Machine Learning
, 2000
Abstract

Cited by 71 (4 self)
Stochastic logic programs (SLPs) are logic programs with labelled clauses which define a log-linear distribution over refutations of goals. The log-linear distribution provides, by marginalisation, a distribution over variable bindings, allowing SLPs to compactly represent quite complex distributions. We analyse the fundamental statistical properties of SLPs, addressing issues concerning infinite derivations, 'unnormalised' SLPs and impure SLPs. After detailing existing approaches to parameter estimation for log-linear models and their application to SLPs, we present a new algorithm called failure-adjusted maximisation (FAM). FAM is an instance of the EM algorithm that applies specifically to normalised SLPs and provides a closed form for computing parameter updates within an iterative maximisation approach. We empirically show that FAM works on some small examples and discuss methods for applying it to bigger problems.
Supervised and unsupervised PCFG adaptation to novel domains
, 2003
Abstract

Cited by 45 (0 self)
This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001), for example. Other approaches falling within this framework are more effective. In contrast to the results ...
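One simple instance of the MAP framework this abstract mentions is count mixing: rule probabilities are estimated from in-domain counts combined with out-of-domain counts scaled by a prior weight. The sketch below is a generic illustration of that idea, not the paper's estimator; the counts and the weight `tau` are made-up values.

```python
# Hedged sketch of MAP-style rule-probability estimation via count mixing:
# for one left-hand side, combine in-domain rule counts with out-of-domain
# counts scaled by tau, then normalize. tau = 0 ignores the old domain;
# large tau keeps the estimates close to the out-of-domain model.

def map_rule_probs(in_counts, out_counts, tau=0.5):
    """in_counts/out_counts map each right-hand side to its observed count."""
    rules = set(in_counts) | set(out_counts)
    mixed = {r: in_counts.get(r, 0) + tau * out_counts.get(r, 0) for r in rules}
    total = sum(mixed.values())
    return {r: c / total for r, c in mixed.items()}

probs = map_rule_probs({"NP VP": 6}, {"NP VP": 2, "VP": 2})
print(probs["NP VP"])  # -> 0.875  (i.e. (6 + 0.5*2) / 8)
```

In a full system the weight would itself be tuned, and the prior can be made rule- or nonterminal-specific rather than a single global scalar.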
Statistical Properties of Probabilistic Context-Free Grammars
 Computational Linguistics
, 1999
Abstract

Cited by 41 (0 self)
This article proves a number of useful properties of probabilistic context-free grammars (PCFGs). In this section, we give an introduction to the results and related topics.
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
Abstract

Cited by 30 (8 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations) ...
Spatial random tree grammars for modeling hierarchical structure in images with . . .
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2004
Abstract

Cited by 20 (3 self)
We present a novel probabilistic model for the hierarchical structure of an image and its regions. We call this model spatial random tree grammars (SRTGs). We develop algorithms for the exact computation of likelihood and maximum a posteriori (MAP) estimates and the exact expectation-maximization (EM) updates for model-parameter estimation. We collectively call these algorithms the center-surround algorithm. We use the center-surround algorithm to automatically estimate the maximum likelihood (ML) parameters of SRTGs and classify images based on their likelihood and based on the MAP estimate of the associated hierarchical structure. We apply our method to the task of classifying natural images and demonstrate that the addition of hierarchical structure significantly improves upon the performance of a baseline model that lacks such structure.