Results 1 - 6 of 6
Inducing compact but accurate tree-substitution grammars
In Proc. NAACL, 2009
Abstract

Cited by 15 (2 self)
Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and overfitting. We present a theoretically principled model which solves these problems using a Bayesian nonparametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far outperforms a standard PCFG.
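The substitution operation at the heart of a TSG can be sketched in a few lines. The fragments, labels, and weights below are invented toy values for illustration, not taken from the paper; a derivation rewrites open nonterminal sites with whole fragments, and its score is the product of the fragment weights.

```python
# Toy sketch of a weighted tree-substitution grammar derivation.
# Trees are (label, children) pairs; a childless node whose label is a
# nonterminal is an open substitution site. Fragments and weights are
# hypothetical.

NONTERMINALS = {"S", "NP", "VP", "V"}

def substitute(tree, fragment):
    """Replace the leftmost open site matching the fragment's root label.
    Returns (new_tree, done_flag)."""
    label, children = tree
    if label in NONTERMINALS and not children and label == fragment[0]:
        return fragment, True
    new_children, done = [], False
    for child in children:
        if done:
            new_children.append(child)
        else:
            child, done = substitute(child, fragment)
            new_children.append(child)
    return (label, new_children), done

def leaves(tree):
    label, children = tree
    return [label] if not children else [w for c in children for w in leaves(c)]

# A deep fragment covering a whole intransitive clause (the kind of latent
# structure, e.g. verb subcategorisation, a TSG can memorise as one unit),
# plus a small NP fragment, each with a made-up weight.
f_clause = (("S", [("NP", []),
                   ("VP", [("V", [("sleeps", [])])])]), 0.4)
f_np = (("NP", [("Mary", [])]), 0.5)

tree, prob = ("S", []), 1.0
for fragment, weight in (f_clause, f_np):
    tree, _ = substitute(tree, fragment)
    prob *= weight          # derivation score = product of fragment weights

print(" ".join(leaves(tree)), prob)   # -> Mary sleeps 0.2
```

A plain PCFG is the special case where every fragment is one level deep; the larger `f_clause` fragment is what lets a TSG memorise multi-level constructions directly.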
Inducing Tree-Substitution Grammars
Abstract

Cited by 12 (1 self)
Inducing a grammar from text has proven to be a notoriously challenging learning task despite decades of research. The primary reason for its difficulty is that in order to induce plausible grammars, the underlying model must be capable of representing the intricacies of language while also ensuring that it can be readily learned from data. The majority of existing work on grammar induction has favoured model simplicity (and thus learnability) over representational capacity by using context-free grammars and first-order dependency grammars, which are not sufficiently expressive to model many common linguistic constructions. We propose a novel compromise by inferring a probabilistic tree substitution grammar, a formalism which allows for arbitrarily large tree fragments and thereby better represents complex linguistic structures. To limit the model’s complexity we employ a Bayesian nonparametric prior which biases the model towards a sparse grammar with shallow productions. We demonstrate the model’s efficacy on supervised phrase-structure parsing, where we induce a latent segmentation of the training treebank, and on unsupervised dependency grammar induction. In both cases the model uncovers interesting latent linguistic structures while producing competitive results.
Linguistic and Statistical Extensions of Data Oriented Parsing
, 2006
Abstract
This thesis explores certain linguistic and statistical extensions of Data-Oriented Parsing (DOP). The central idea in DOP is to analyse new input on the basis of a collection of fragment-probability pairs. In its simplest version, Tree-DOP, the fragments used are subparts of simple phrase structure trees. Resolving ambiguity (i.e. selecting the optimal analysis) involves identifying the Most Probable Parse (MPP). Though empirical evaluation has shown state-of-the-art results, the linguistic expressive mechanism of this model is very limited. In addition, the algorithm used to compute the MPP has been shown to suffer from several disadvantages. The aim of the thesis is twofold. In the first part, we seek to explore how the linguistic dimension of DOP can be enhanced. To this end, we investigate how the framework can be applied to representations based on a richer annotation scheme, specifically that of Head-driven Phrase Structure Grammar (HPSG). This investigation culminates in the development of an HPSG-DOP model, which takes maximal advantage of the underlying formalism. The proposed model embodies a number of positive characteristics
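The MPP mentioned in the abstract is found by summing the probabilities of all derivations that yield the same tree, which can disagree with simply taking the single best derivation. The sketch below illustrates this with invented fragment probabilities and derivations; in real DOP these would come from fragment extraction over a treebank.

```python
# Toy illustration of why the Most Probable Parse (MPP) can differ from
# the most probable single derivation. All numbers are hypothetical.
from collections import defaultdict
from math import prod

frag_prob = {"f1": 0.4, "f2": 0.6, "f3": 0.5, "f4": 0.5, "f5": 0.6}

# Each derivation is a bag of fragments plus the parse tree it yields;
# distinct derivations can yield the same tree.
derivations = [
    ("T1", ["f1", "f4"]),        # one large fragment + NP: 0.4 * 0.5 = 0.20
    ("T1", ["f2", "f3", "f4"]),  # same tree from smaller pieces:        0.15
    ("T2", ["f5", "f4"]),        # a competing tree:                     0.30
]

parse_prob = defaultdict(float)
for parse, frags in derivations:
    parse_prob[parse] += prod(frag_prob[f] for f in frags)

best_derivation = max(derivations,
                      key=lambda d: prod(frag_prob[f] for f in d[1]))
mpp = max(parse_prob, key=parse_prob.get)

# The single best derivation yields T2 (0.30), but summing over all
# derivations makes T1 the most probable parse (0.20 + 0.15 = 0.35).
print(best_derivation[0], mpp)   # -> T2 T1
```

This sum over derivations is one source of the computational disadvantages the abstract alludes to: unlike the best single derivation, the exact MPP cannot be found by ordinary Viterbi search.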
1 What is DOP?
, 2007
Abstract
in 1990, it advocates a much more empirical approach to natural language parsing than was common at that time. 1.1 The Problem Scha notes how Chomsky makes a sharp distinction between a formal grammar that defines the set of grammatical sentences in a language and the use of that language in a community, sloppiness and mistakes included. The former is usually referred to by Chomsky as “competence”, and the latter as “performance.” Scha then questions how useful this distinction is for contemporary natural language processing systems. He first points out how linguistic theory and practical NLP systems have converged on context-free grammars as a respectable theoretical model of natural language syntax also having efficient parsing algorithms. Scha then goes on to point out some drawbacks of using context-free grammars to model natural language syntax. First, trying to account for more phenomena by adding rules to a grammar
Many successful models of syntax are based on
Abstract
Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and overfitting. We present a theoretically principled model which solves these problems using a Bayesian nonparametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far outperforms a standard PCFG.