Linguistically motivated large-scale NLP with C&C and Boxer
In Proceedings of the Demonstrations Session of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-07), 2007
Cited by 20 (1 self)
The statistical modelling of language, together with advances in wide-coverage grammar development, have led to high levels of robustness and efficiency in NLP systems and made linguistically motivated ...
Formalism-independent parser evaluation with CCG and DepBank
In Proceedings of the 45th Annual Meeting of the ACL, 2007
Cited by 19 (7 self)
A key question facing the parsing community is how to compare parsers which use different grammar formalisms and produce different output. Evaluating a parser on the same resource used to create it can lead to non-comparable accuracy scores and an over-optimistic view of parser performance. In this paper we evaluate a CCG parser on DepBank, and demonstrate the difficulties in converting the parser output into DepBank grammatical relations. In addition we present a method for measuring the effectiveness of the conversion, which provides an upper bound on parsing accuracy. The CCG parser obtains an F-score of 81.9% on labelled dependencies, against an upper bound of 84.8%. We compare the CCG parser against the RASP parser, outperforming RASP by over 5% overall and on the majority of dependency types.
Automatically acquiring models of preposition use
Cited by 10 (0 self)
This paper proposes a machine-learning based approach to predict accurately, given a syntactic and semantic context, which preposition is most likely to occur in that context. Each occurrence of a preposition in an English corpus has its context represented by a vector containing 307 features. The vectors are processed by a voted perceptron algorithm to learn associations between contexts and prepositions. In preliminary tests, we can associate contexts and prepositions with a success rate of up to 84.5%.
A System for Large-Scale Acquisition of Verbal, Nominal and Adjectival Subcategorization Frames from Corpora
Cited by 9 (6 self)
This paper describes the first system for large-scale acquisition of subcategorization frames (SCFs) from English corpus data which can be used to acquire comprehensive lexicons for verbs, nouns and adjectives. The system incorporates an extensive rule-based classifier which identifies 168 verbal, 37 adjectival and 31 nominal frames from grammatical relations (GRs) output by a robust parser. The system achieves state-of-the-art performance on all three sets.
Integrating natural language processing with FlyBase curation
In Proceedings of PSB 2007, 2007
Cited by 9 (2 self)
Applying Natural Language Processing techniques to biomedical text as a potential aid to curation has become the focus of intensive research. However, developing integrated systems which address the curators' real-world needs has been studied less rigorously. This paper addresses this question and presents generic tools developed to assist FlyBase curators. We discuss how they have been integrated into the curation workflow and present initial evidence about their effectiveness.
Shift-reduce dependency DAG parsing
In Proc. of COLING, 2008
Cited by 8 (2 self)
Most data-driven dependency parsing approaches assume that sentence structure is represented as trees. Although trees have several desirable properties from both computational and linguistic perspectives, the structure of linguistic phenomena that goes beyond shallow syntax often cannot be fully captured by tree representations. We present a parsing approach that is nearly as simple as current data-driven transition-based dependency parsing frameworks, but outputs directed acyclic graphs (DAGs). We demonstrate the benefits of DAG parsing in two experiments where its advantages over dependency tree parsing can be clearly observed: predicate-argument analysis of English and syntactic analysis of Danish with a representation that includes long-distance dependencies and anaphoric reference links.
Improving Verb Clustering with Automatically Acquired Selectional Preferences
Cited by 7 (6 self)
In previous research in automatic verb classification, syntactic features have proved the most useful features, although manual classifications rely heavily on semantic features. We show, in contrast with previous work, that considerable additional improvement can be obtained by using semantic features in automatic classification: verb selectional preferences acquired from corpus data using a fully unsupervised method. We report these promising results using a new framework for verb clustering which incorporates a recent subcategorization acquisition system, rich syntactic-semantic feature sets, and a variation of spectral clustering which performs particularly well in high dimensional feature space.
Latent variable models of selectional preference
In ACL 2010, 2010
Cited by 7 (0 self)
This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word co-occurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
Challenges in Mapping of Syntactic Representations for Framework-Independent Parser Evaluation
2008
Cited by 5 (3 self)
We explore some of the issues and challenges created by the incompatibility of diverse representation schemes for syntactic parsing. In particular, we examine the problem of output format conversion for evaluation of parsers that use different formalisms. We discuss recent related efforts, and present an evaluation of different parsers that use representations that vary not only in formalisms, but also in depth of syntactic information. We attempt to compare these parsers in a domain widely used for parser evaluation, the Wall Street Journal section of the Penn Treebank, and in the academic biomedical literature, where the use of parsing technologies is expected to contribute in practical applications, such as information extraction and text mining.
Semi-supervised Semantic Role Labeling Using the Latent Words Language Model
In Proceedings of EMNLP-09, 2009
Cited by 5 (0 self)
Semantic Role Labeling (SRL) has proved to be a valuable tool for performing automatic analysis of natural language texts. Currently, however, most systems rely on a large training set, which is manually annotated, an effort that needs to be repeated whenever different languages or a different set of semantic roles is used in a certain application. A possible solution for this problem is semi-supervised learning, where a small set of training examples is automatically expanded using unlabeled texts. We present the Latent Words Language Model, which is a language model that learns word similarities from unlabeled texts. We use these similarities for different semi-supervised SRL methods as additional features or to automatically expand a small training set. We evaluate the methods on the PropBank dataset and find that for small training sizes our best performing system achieves an error reduction of 33.27% F1-measure compared to a state-of-the-art supervised baseline.

