Results 1 - 10
of
32
The CoNLL-2008 Shared Task on Joint Parsing of Syntactic and Semantic Dependencies
"... The Conference on Computational Natural Language Learning is accompanied every year by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2008 the shared task was dedicated to the joint parsing of syntactic and semantic depe ..."
Abstract
-
Cited by 29 (0 self)
- Add to MetaCart
The Conference on Computational Natural Language Learning is accompanied every year by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2008 the shared task was dedicated to the joint parsing of syntactic and semantic dependencies. This shared task not only unifies the shared tasks of the previous four years under a unique dependency-based formalism, but also extends them significantly: this year’s syntactic dependencies include more information such as named-entity boundaries; the semantic dependencies model roles of both verbal and nominal predicates. In this paper, we define the shared task and describe how the data sets were created. Furthermore, we report and analyze the results and describe the approaches of the participating systems.
Syntactic Topic Models
"... We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word ..."
Abstract
-
Cited by 24 (6 self)
- Add to MetaCart
We develop the syntactic topic model (STM), a nonparametric Bayesian model of parsed documents. The STM generates words that are both thematically and syntactically constrained, which combines the semantic insights of topic models with the syntactic information available from parse trees. Each word of a sentence is generated by a distribution that combines document-specific topic weights and parse-tree-specific syntactic transitions. Words are assumed to be generated in an order that respects the parse tree. We derive an approximate posterior inference method based on variational methods for hierarchical Dirichlet processes, and we report qualitative and quantitative results on both synthetic data and hand-parsed documents. 1
An efficient algorithm for easy-first non-directional dependency parsing
- In Proc. of NAACL
, 2010
"... We present a novel deterministic dependency parsing algorithm that attempts to create the easiest arcs in the dependency structure first in a non-directional manner. Traditional deterministic parsing algorithms are based on a shift-reduce framework: they traverse the sentence from left-to-right and, ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
We present a novel deterministic dependency parsing algorithm that attempts to create the easiest arcs in the dependency structure first in a non-directional manner. Traditional deterministic parsing algorithms are based on a shift-reduce framework: they traverse the sentence from left-to-right and, at each step, perform one of a possible set of actions, until a complete tree is built. A drawback of this approach is that it is extremely local: while decisions can be based on complex structures on the left, they can look only at a few words to the right. In contrast, our algorithm builds a dependency tree by iteratively selecting the best pair of neighbours to connect at each parsing step. This allows incorporation of features from already built structures both to the left and to the right of the attachment point. The parser learns both the attachment preferences and the order in which they should be performed. The result is a deterministic, best-first, O(nlogn) parser, which is significantly more accurate than best-first transition based parsers, and nears the performance of globally optimized parsing models. 1
Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
"... Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We show that for three leading unsupervised parsers (Klein and Manning, 2004; Cohen and Smith, 2009; Spitkovsky et al., 2010a ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We show that for three leading unsupervised parsers (Klein and Manning, 2004; Cohen and Smith, 2009; Spitkovsky et al., 2010a), a small set of parameters can be found whose modification yields a significant improvement in standard evaluation measures. These parameters correspond to local cases where no linguistic consensus exists as to the proper gold annotation. Therefore, the standard evaluation does not provide a true indication of algorithm quality. We present a new measure, Neutral Edge Direction (NED), and show that it greatly reduces this undesired phenomenon. 1
Task-oriented Evaluation of Syntactic Parsers and Their Representations
- PROCEEDINGS OF THE 46TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES
, 2008
"... This paper presents a comparative evaluation of several state-of-the-art English parsers based on different frameworks. Our approach is to measure the impact of each parser when it is used as a component of an information extraction system that performs protein-protein interaction (PPI) identificati ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
This paper presents a comparative evaluation of several state-of-the-art English parsers based on different frameworks. Our approach is to measure the impact of each parser when it is used as a component of an information extraction system that performs protein-protein interaction (PPI) identification in biomedical papers. We evaluate eight parsers (based on dependency parsing, phrase structure parsing, or deep parsing) using five different parse representations. We run a PPI system with several combinations of parser and parse representation, and examine their impact on PPI identification accuracy. Our experiments show that the levels of accuracy obtained with these different parsers are similar, but that accuracy improvements vary when the parsers are retrained with domain-specific data.
The Effect of Syntactic Representation on Semantic Role Labeling
, 2008
"... Almost all automatic semantic role labeling (SRL) systems rely on a preliminary parsing step that derives a syntactic structure from the sentence being analyzed. This makes the choice of syntactic representation an essential design decision. In this paper, we study the influence of syntactic represe ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
Almost all automatic semantic role labeling (SRL) systems rely on a preliminary parsing step that derives a syntactic structure from the sentence being analyzed. This makes the choice of syntactic representation an essential design decision. In this paper, we study the influence of syntactic representation on the performance of SRL systems. Specifically, we compare constituent-based and dependencybased representations for SRL of English in the FrameNet paradigm. Contrary to previous claims, our results demonstrate that the systems based on dependencies perform roughly as well as those based on constituents: For the argument classification task, dependencybased systems perform slightly higher on average, while the opposite holds for the argument identification task. This is remarkable because dependency parsers are still in their infancy while constituent parsing is more mature. Furthermore, the results show that dependency-based semantic role classifiers rely less on lexicalized features, which makes them more robust to domain changes and makes them learn more efficiently with respect to the amount of training data.
Treebanks Gone Bad Parser Evaluation and Retraining using a Treebank of Ungrammatical Sentences
"... Abstract This article describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minima ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Abstract This article describes how a treebank of ungrammatical sentences can be created from a treebank of well-formed sentences. The treebank creation procedure involves the automatic introduction of frequently occurring grammatical errors into the sentences in an existing treebank, and the minimal transformation of the original analyses in the treebank so that they describe the newly created ill-formed sentences. Such a treebank can be used to test how well a parser is able to ignore grammatical errors in texts (as people do), and can be used to induce a grammar capable of analysing such sentences. This article demonstrates these two applications using the Penn Treebank. In a robustness evaluation experiment, two state-of-the-art statistical parsers are evaluated on an ungrammatical version of Section 23 of the Wall Street Journal (WSJ) portion of the Penn Treebank. This experiment shows that the performance of both parsers degrades with grammatical noise. A breakdown by error type is provided for both parsers. A second experiment retrains both parsers using an ungrammatical version of WSJ Sections 2-21. This experiment indicates that an ungrammatical treebank is a useful resource in improving parser robustness to grammatical errors, but that the correct combination of grammatical and ungrammatical training data has yet to be determined.
Parser Evaluation and the BNC: Evaluating 4 constituency parsers with 3 metrics
"... We evaluate discriminative parse reranking and parser self-training on a new English test set using four versions of the Charniak parser and a variety of parser evaluation metrics. The new test set consists of 1,000 hand-corrected British National Corpus parse trees. We directly evaluate parser outp ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We evaluate discriminative parse reranking and parser self-training on a new English test set using four versions of the Charniak parser and a variety of parser evaluation metrics. The new test set consists of 1,000 hand-corrected British National Corpus parse trees. We directly evaluate parser output using both the Parseval and the Leaf Ancestor metrics. We also convert the hand-corrected and parser output phrase structure trees to dependency trees using a state-of-the-art functional tag labeller and constituent-to-dependency conversion tool, and then calculate label accuracy, unlabelled attachment and labelled attachment scores over the dependency structures. We find that reranking leads to a performance improvement on the new test set (albeit a modest one). We find that self-training using BNC data leads to significantly better results. However, it is not clear how effective self-training is when the training material comes from the North American News Corpus. 1.
In-domain Relation Discovery with Meta-constraints via Posterior Regularization
"... We present a novel approach to discovering relations and their instantiations from a collection of documents in a single domain. Our approach learns relation types by exploiting meta-constraints that characterize the general qualities of a good relation in any domain. These constraints state that in ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We present a novel approach to discovering relations and their instantiations from a collection of documents in a single domain. Our approach learns relation types by exploiting meta-constraints that characterize the general qualities of a good relation in any domain. These constraints state that instances of a single relation should exhibit regularities at multiple levels of linguistic structure, including lexicography, syntax, and document-level context. We capture these regularities via the structure of our probabilistic model as well as a set of declaratively-specified constraints enforced during posterior inference. Across two domains our approach successfully recovers hidden relation structure, comparable to or outperforming previous state-of-the-art approaches. Furthermore, we find that a small set of constraints is applicable across the domains, and that using domain-specific constraints can further improve performance. 1 1
An English Dependency Treebank à la Tesnière
"... During the last decade, the Computational Linguistics community has shown an increased interest in Dependency Treebanks. Several groups have developed new annotated corpora using dependency representation, while other people have proposed several automatic conversion algorithms to transform availabl ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
During the last decade, the Computational Linguistics community has shown an increased interest in Dependency Treebanks. Several groups have developed new annotated corpora using dependency representation, while other people have proposed several automatic conversion algorithms to transform available Phrase Structure (PS) treebanks into Dependency Structure (DS) notation. Such projects typically refer to Tesnière as the father of dependency syntax, but little attempt has been made to explain how the chosen representation relates to the original work. A careful comparison reveals substantial differences: modern DS annotations discard some relevant features characterizing Tesnière’s model. This paper is presenting our attempt to go back to the roots of dependency theory, and show how it is possible to transform a PS English treebank to a DS notation that is closer to the one proposed by Tesnière, which we will refer to as TDS. We will show how this representation can incorporate all main advantages of modern DS, while avoiding well known problems concerning the choice of heads, and better representing common linguistic phenomena such as coordination. 1

