The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages, 2009
Cited by 13 (2 self)
For the 11th straight year, the Conference on Computational Natural Language Learning has been accompanied by a shared task whose purpose is to promote natural language processing applications and evaluate them in a standard setting. In 2009, the shared task was dedicated to the joint parsing of syntactic and semantic dependencies in multiple languages. This shared task combines the shared tasks of the previous five years under a unique dependency-based formalism similar to the 2008 task. In this paper, we define the shared task, describe how the data sets were created and show their quantitative properties, report the results and summarize the approaches of the participating systems.
Projection-based Acquisition of a Temporal Labeller
Cited by 4 (2 self)
We present a cross-lingual projection framework for temporal annotations. Automatically obtained TimeML annotations in the English portion of a parallel corpus are transferred to the German translation along a word alignment. Direct projection augmented with shallow heuristic knowledge outperforms the uninformed baseline by 6.64% F1-measure for events, and by 17.93% for time expressions. Subsequent training of statistical classifiers on the (imperfect) projected annotations significantly boosts precision by up to 31%, to 83.95% and 89.52%, respectively.
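The direct projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `project_labels`, the token-level label scheme (`EVENT`, `TIMEX`), the example sentence pair, and the hand-written alignment are all assumptions made for illustration, and the shallow heuristics and classifier training mentioned in the abstract are not shown.

```python
from typing import List, Optional, Tuple


def project_labels(
    source_labels: List[Optional[str]],
    alignment: List[Tuple[int, int]],
    target_len: int,
) -> List[Optional[str]]:
    """Copy each labelled source token's label to the target tokens aligned with it.

    `alignment` is a list of (source_index, target_index) pairs, as a word
    aligner might produce. Target tokens without a labelled aligned source
    token stay unlabelled (None). If several source tokens map to the same
    target token, the first label encountered is kept (an arbitrary choice
    made for this sketch).
    """
    projected: List[Optional[str]] = [None] * target_len
    for src_idx, tgt_idx in alignment:
        label = source_labels[src_idx]
        if label is not None and projected[tgt_idx] is None:
            projected[tgt_idx] = label
    return projected


# Hypothetical English-German sentence pair with a hand-written alignment.
english = ["Yesterday", "the", "meeting", "started"]
english_labels = ["TIMEX", None, "EVENT", "EVENT"]
german = ["Gestern", "begann", "das", "Treffen"]
alignment = [(0, 0), (1, 2), (2, 3), (3, 1)]

projected = project_labels(english_labels, alignment, len(german))
for token, label in zip(german, projected):
    print(token, label)
```

The projected labels are only as good as the alignment and the source-side annotation, which is why the abstract treats them as imperfect training data for a subsequent classifier.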
Parallel LFG grammars on parallel corpora: A base for practical triangulation, 2008
Cited by 4 (1 self)
This paper presents an approach to annotation projection in a multi-parallel corpus, that is, a collection of translated texts in more than two languages. Existing analysis tools, like the LFG grammars from the ParGram project, are applied to two of the languages in the corpus and the resulting annotation is projected to a third language, taking advantage of the largely parallel character of f-structure. The third language can be a low-resource language. The technique can thus be particularly beneficial for corpus-based (cross-)linguistic research. We discuss a number of ways to realize automatic corpus annotation based on multi-source projection, including direct projection and approaches with an additional generalization step that employs machine learning techniques. We present a series of detailed experiments for a sample annotation task, verb argument identification, using the German and English ParGram grammars for projection to Dutch and maximum entropy models for learning generalizations.
Cross Lingual Syntax Projection for Resource-Poor Languages
"... Language Technologies Institute, ..."
Transferring Coreference Chains through Word Alignment
Cited by 1 (0 self)
This paper investigates the problem of automatically annotating resources with NP coreference information using an English-Romanian parallel corpus, transferring coreference chains through word alignment from the English part to the Romanian part of the corpus. The results show that we can detect Romanian referential expressions and coreference chains with over 80% F-measure. Using our method as a preprocessing step, followed by manual correction, is therefore worthwhile as part of an annotation effort to create a large Romanian corpus with coreference information.
Cross-Lingual Projection of LFG F-Structures: Resource Induction for Polish
Natural language processing has made rapid progress over the last decades. Yet computational linguistic resources and tools are restricted to a handful of languages. It seems unrealistic to develop high-quality resources for all languages using traditional methods. The creation of grammars and syntactic treebanks is an especially expensive process. Various methods aim at overcoming the shortage of NLP resources. One approach, pursued in this paper, targets the induction of linguistic annotations in a cross-linguistic setting: using a bilingual corpus, existing analysis tools are applied to the resource-rich language side of the bitext. The resulting annotations are projected to the second, resource-poor language using automatic word alignments as a bridge. The projection approach for resource induction is built on the assumption that the linguistic analysis of a sentence carries over to its translation in an aligned parallel corpus. While this assumption does not hold uniformly, the projected annotations can be used to train NLP tools for the target language. This has been shown for PoS tagging [1], NP-bracketing [1], dependency analysis [2, 3], word sense disambiguation [4], extraction of semantic roles [5] and temporal labelling [6]. Within the ParGram project [7], grammars for English, French, German, Norwegian, Japanese, Urdu and other languages are written according to the framework of Lexical Functional Grammar, using XLE as a processing platform [8]. Manual development of large-scale LFG grammars is an expensive process that may be sped up by automation techniques. One strand of work that targets the automatic induction of LFG grammars is the induction from existing
Discovery of Ambiguous and Unambiguous Discourse Connectives via Annotation Projection
We present work on tagging German discourse connectives using English training data and a German-English parallel corpus, and report first results towards a more comprehensive approach to annotation projection for explicit discourse relations. Our results show that (i) an approach based on a dictionary of connectives currently has advantages over a simpler approach that uses word alignments without further linguistic information, but also that (ii) bootstrapping a connective dictionary using distribution-based heuristics on aligned bitexts seems to be a feasible and low-effort way of creating such a resource. Our best method achieves an F-measure of 68.7% for the identification of discourse connectives without any German-language training data, which is a large improvement over a nontrivial baseline.

