Tree-based Target Language Modeling
Cited by 4 (3 self)
In this paper we describe an approach to target language modeling which is based on a large treebank. We assume a bag of bags as input for the target language generation component, leaving it up to this component to decide upon word and phrase order. An experiment with Dutch as target language shows that this approach to candidate translation reranking outperforms standard n-gram modeling, when measuring
Adding semantic role annotation to a corpus of written Dutch
- In: Proc. of LAW-07. ACL 2007 workshop. Prague. Czech Republic
Cited by 3 (1 self)
We present an approach to automatic semantic role labeling (SRL) carried out in the context of the Dutch Language Corpus Initiative (D-Coi) project. Adapting earlier research, which has mainly focused on English, to the Dutch situation poses an interesting challenge, especially because there is no semantically annotated Dutch corpus available that can be used as training data. Our automatic SRL approach consists of three steps: bootstrapping from a syntactically annotated corpus by means of a rule-based tagger developed for this purpose, manual correction on the basis of the PropBank guidelines, which have been adapted to Dutch, and training a machine learning system on the manually corrected data.
Removing the Distinction Between a Translation Memory, a Bilingual Dictionary and a Parallel Corpus
Cited by 2 (2 self)
This paper presents a prototype MT system which does not make the distinction between a dictionary, a sub-sentential aligned parallel corpus, and post-edited information (translators' output) like a translation memory. The system is based on the METIS approach (Vandeghinste et al., 2006), and uses an XML-based dictionary format in which not only simple word-to-word translations can be included, but which also contains complex dictionary entries, including discontinuous entries, like idioms and proverbs. The presented prototype is a system that automatically adapts its dictionary and target language corpus depending on the post-edited output as made by the users of the system, and will therefore have a learning curve in its performance.
To Use a Treebank or Not – Which Is Better for Hypernym Extraction?
Cited by 2 (0 self)
We compare two processing methods for a single natural language processing task. One uses a treebank created with a full parser while the other restricts itself to lexical and part-of-speech information. We show that for the task under investigation, automatic extraction of hypernym-hyponym pairs from text, the former does not outperform the latter. We compare the output of the two approaches and look for an explanation for this unexpected result.
Bottom-up transfer in Example-based Machine Translation
Cited by 1 (0 self)
This paper describes the transfer component of a syntax-based Example-based Machine Translation system. The source sentence parse tree is matched in a bottom-up fashion with the source language side of a parallel example treebank, which results in a target forest that is sent to the target language generation component. The results on a 500-sentence test set are compared with a top-down approach to transfer in the same system, with the bottom-up approach yielding much better results.
Effective Measures of Domain Similarity for Parsing
Cited by 1 (1 self)
It is well known that parsing accuracy suffers when a model is applied to out-of-domain data. It is also known that the most beneficial data to parse a given domain is data that matches the domain (Sekine, 1997; Gildea, 2001). Hence, an important task is to select appropriate domains. However, most previous work on domain adaptation relied on the implicit assumption that domains are somehow given. As more and more data becomes available, automatic ways to select data that is beneficial for a new (unknown) target domain are becoming attractive. This paper evaluates various ways to automatically acquire related training data for a given test set. The results show that an unsupervised technique based on topic models is effective – it outperforms random data selection on both languages examined, English and Dutch. Moreover, the technique works better than manually assigned labels gathered from meta-data that is available for English.
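As a rough illustration of selecting training data by topic similarity, the sketch below ranks candidate articles by how close their topic distribution is to that of the target text. The abstract does not say which similarity measure or topic model was used; the Jensen-Shannon divergence, the function names, and the toy distributions here are all assumptions made for illustration only.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two topic distributions.

    Assumed measure: the abstract only says the technique is based on
    topic models, not how distributions are compared.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def select_training_data(target, candidates, k):
    """Return the names of the k candidates closest to the target.

    `target` is a topic distribution; `candidates` maps a (hypothetical)
    article name to its topic distribution.
    """
    ranked = sorted(candidates, key=lambda name: js_divergence(target, candidates[name]))
    return ranked[:k]


# Toy, made-up topic distributions over three topics.
target = [0.7, 0.2, 0.1]
candidates = {
    "news": [0.6, 0.3, 0.1],
    "medical": [0.1, 0.1, 0.8],
    "sports": [0.3, 0.4, 0.3],
}
selected = select_training_data(target, candidates, k=2)
```

In practice the distributions would come from a topic model estimated on the unlabeled articles, and the selected articles would then be used to train the parser.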
LASSY: LARGE SCALE SYNTACTIC ANNOTATION OF WRITTEN DUTCH
Cited by 1 (0 self)
Lassy Small is the Lassy corpus in which the syntactic annotations have been manually verified. This part contains one million words. The composition of the corpus is detailed in deliverable 1.1. The annotations include syntactic dependency annotations, as documented in deliverable 3.5 [5], and the annotation of the part-of-speech and lemma of each token, as documented in [3].

2 Annotation Procedures

Both the annotation guidelines manuals and the various tools we used for annotation were initially developed in the STEVIN D-Coi project. The annotation of part-of-speech and lemma proceeded in the same way as in D-Coi: initial assignment of part-of-speech and lemma by TadPole [2]. These automatically assigned annotations were then checked and corrected by students. The syntactic annotation procedure works in a similar way. The Alpino parser [4] is used to assign initial dependency structures automatically. These automatically assigned annotations were then checked and corrected by students (using an adapted version of TrEd,
Lexico-Semantic Multiword Expression Extraction
This paper describes a fully unsupervised and automated method for the large-scale extraction of multiword expressions (MWEs) from large corpora. The method takes into account the non-compositionality of MWEs; the intuition is that a noun within an MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributional similarity measures), which gives us clusters of semantically related nouns. Next, a number of statistical measures – based on selectional preferences – are developed that formalize the intuition of non-compositionality. The ratio of individual noun preference over cluster preference shows how likely a particular expression is to be an MWE (i.e. whether or not an individual noun accounts for all the preference of a certain cluster). Our approach has been tested on Dutch, and has been both manually and automatically evaluated.
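The ratio of individual noun preference over cluster preference can be illustrated with a minimal sketch. It assumes raw verb-noun co-occurrence counts as a stand-in for the selectional preference measures, which the abstract does not specify; the function name, the English toy counts, and the clusters are hypothetical.

```python
def noun_share_of_cluster(pair_counts, clusters, verb, noun):
    """Share of a cluster's co-occurrences with `verb` that is due to `noun`.

    Assumption: raw counts approximate "preference". A value near 1.0 means
    the noun accounts for almost all of its cluster's preference for the
    verb, which under the stated intuition hints at an MWE.
    """
    cluster = next((c for c in clusters if noun in c), None)
    if cluster is None:
        raise KeyError(f"noun {noun!r} is not in any cluster")
    cluster_total = sum(pair_counts.get((verb, n), 0) for n in cluster)
    if cluster_total == 0:
        return 0.0
    return pair_counts.get((verb, noun), 0) / cluster_total


# Made-up counts: "kick the bucket" is idiomatic, "kick the ball" is not.
pair_counts = {
    ("kick", "bucket"): 8,
    ("kick", "ball"): 6,
    ("kick", "football"): 6,
}
clusters = [{"bucket", "pail", "tub"}, {"ball", "football"}]

bucket_share = noun_share_of_cluster(pair_counts, clusters, "kick", "bucket")
ball_share = noun_share_of_cluster(pair_counts, clusters, "kick", "ball")
```

Here "bucket" carries its whole cluster's co-occurrence with "kick", while "ball" shares it with a semantically similar noun, so only the former would be flagged as a candidate.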
LINGUISTIC COMPLEXITY AND FREQUENCY IN AGRAMMATIC SPEECH PRODUCTION
"... object scrambling, unaccusative verbs Address for correspondence: ..."

