Multilingual corpora with coreferential annotation of person entities (2014)
Venue: In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014)
Citations: 1 (1 self-citation)
Citations
112 | The Tradeoffs Between Open and Traditional Relation Extraction
- Banko, Etzioni
- 2008
Citation Context ...he Brazilian driver”), or a relative pronoun (“who”), among other linguistic units. Different expressions referring to the same discourse entity are in a coreference relation (Recasens and Martí, 2010). Knowing the behavior of this phenomenon allows us, from a linguistic point of view, to better understand how a discourse is organized at the semantico-referential level (Gordon and Hendrick, 1998). From the Natural Language Processing perspective, coreference resolution is a crucial task for applications such as Text Summarization (Steinberger et al., 2007) or Information Extraction (Banko and Etzioni, 2008). Information about people is one of the most common types of knowledge extracted by Relation Extraction systems (Mann, 2002), as the different Web People Search (WePS)¹ workshops show (Artiles et al., 2007). In this regard, the extraction of biographical information may perform better if it is carried out after a coreference resolver has established identity links between mentions of person entities, which are the main arguments of a biographical relation (Suchanek et al., 2006).² In order to take advantage of the benefits of coreference resolution, resources such ...
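The context above argues that biographical relation extraction benefits from running coreference resolution first, because the arguments of a relation are often pronouns or descriptions (“He”, “the Brazilian driver”) rather than names. A minimal Python sketch of that idea follows, assuming a hypothetical pipeline in which each extracted triple stores the mention ID of its subject and a coreference resolver has already assigned mentions to entities. The `Mention` class, `canonical_names`, `resolve_subjects`, and the first-mention naming heuristic are illustrative assumptions, not part of the cited papers or of any specific tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Mention:
    text: str
    entity_id: Optional[int]  # None if the mention is not linked to a person entity


def canonical_names(mentions: Dict[int, Mention]) -> Dict[int, str]:
    """Hypothetical heuristic: name each entity after its first mention (lowest ID)."""
    names: Dict[int, str] = {}
    for mention_id in sorted(mentions):
        m = mentions[mention_id]
        if m.entity_id is not None:
            names.setdefault(m.entity_id, m.text)
    return names


def resolve_subjects(triples: List[Tuple[int, str, str]],
                     mentions: Dict[int, Mention]) -> List[Tuple[str, str, str]]:
    """Rewrite (subject mention ID, relation, object) triples so that the subject
    is the canonical name of its entity instead of the surface mention."""
    names = canonical_names(mentions)
    resolved = []
    for subject_id, relation, obj in triples:
        m = mentions[subject_id]
        subject = names[m.entity_id] if m.entity_id is not None else m.text
        resolved.append((subject, relation, obj))
    return resolved


# Toy document: three mentions already linked to the same entity (ID 1).
mentions = {
    0: Mention("Ayrton Senna", 1),
    1: Mention("He", 1),
    2: Mention("the Brazilian driver", 1),
}
triples = [(1, "born_in", "São Paulo"), (2, "occupation", "racing driver")]
print(resolve_subjects(triples, mentions))
```

Both triples end up attributed to “Ayrton Senna”, whereas without the coreference links they would be attached to “He” and “the Brazilian driver”. Choosing the first mention as the entity name is only a simple heuristic; preferring the first proper-noun mention would be a natural refinement.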
83 | The SemEval-2007 WePS Evaluation: Establishing a benchmark for the Web People Search Task.
- Artiles, Gonzalo, et al.
- 2007
Citation Context ... the behavior of this phenomenon allows us, from a linguistic point of view, to better understand how a discourse is organized at the semantico-referential level (Gordon and Hendrick, 1998). From the Natural Language Processing perspective, coreference resolution is a crucial task for applications such as Text Summarization (Steinberger et al., 2007) or Information Extraction (Banko and Etzioni, 2008). Information about people is one of the most common types of knowledge extracted by Relation Extraction systems (Mann, 2002), as the different Web People Search (WePS)¹ workshops show (Artiles et al., 2007). In this regard, the extraction of biographical information may perform better if it is carried out after a coreference resolver has established identity links between mentions of person entities, which are the main arguments of a biographical relation (Suchanek et al., 2006).² In order to take advantage of the benefits of coreference resolution, resources such as annotated corpora are needed. They are useful both for better understanding the behavior of this phenomenon and for evaluating how coreference resolution systems perform. Taking the above into accoun...
72 | Description of the UMass systems as used for MUC-6
- Fisher, Soderland, et al.
- 1995
Citation Context ...tems work with person entities. Furthermore, the presented resources are freely available in different formats, so they can be enlarged and improved collaboratively.³ Some related work is presented in Section 2. The main properties of the corpora, as well as some distributional statistics, are presented in Section 3. Then, the importance of coreference resolution for Information Extraction is shown in Section 4. Some final remarks are put forward in Section 5. 2. Related Work The interest in having corpora with coreferential annotation was visible in the Message Understanding Conferences (MUC) (Fisher et al., 1995; Chinchor and Hirschman, 1997), which started to develop guidelines and to build corpora for English with this kind of information. Other evaluations, such as the Anaphora Resolution Exercise (ARE), focused their attention on pronominal anaphora resolution and on noun phrase coreference (Orăsan et al., 2008). Previous works such as Mitkov et al. (2000) continued developing annotation tools and corpora. Based on the MUC annotation schemes, Hoste (2005) proposed a coreference annotation scheme for Dutch, followed by the COREA Project (Bouma et al., 2007). Recasens and Martí (2010) def...
65 | MUC-7 Coreference Task Definition (Version 3.0).
- Chinchor, Hirschman
- 1997
Citation Context ... entities. Furthermore, the presented resources are freely available in different formats, so they can be enlarged and improved collaboratively.³ Some related work is presented in Section 2. The main properties of the corpora, as well as some distributional statistics, are presented in Section 3. Then, the importance of coreference resolution for Information Extraction is shown in Section 4. Some final remarks are put forward in Section 5. 2. Related Work The interest in having corpora with coreferential annotation was visible in the Message Understanding Conferences (MUC) (Fisher et al., 1995; Chinchor and Hirschman, 1997), which started to develop guidelines and to build corpora for English with this kind of information. Other evaluations, such as the Anaphora Resolution Exercise (ARE), focused their attention on pronominal anaphora resolution and on noun phrase coreference (Orăsan et al., 2008). Previous works such as Mitkov et al. (2000) continued developing annotation tools and corpora. Based on the MUC annotation schemes, Hoste (2005) proposed a coreference annotation scheme for Dutch, followed by the COREA Project (Bouma et al., 2007). Recasens and Martí (2010) defined new annotation guidelines f...
38 | The representation and processing of coreference in discourse
- Gordon, Hendrick
- 1999
Citation Context ...produced, several concepts are often expressed in many different ways without losing the reference to the same discourse entity. Thus, a person like “Ayrton Senna” may be referred to by a personal pronoun (“He”), a noun phrase (“the Brazilian driver”), or a relative pronoun (“who”), among other linguistic units. Different expressions referring to the same discourse entity are in a coreference relation (Recasens and Martí, 2010). Knowing the behavior of this phenomenon allows us, from a linguistic point of view, to better understand how a discourse is organized at the semantico-referential level (Gordon and Hendrick, 1998). From the Natural Language Processing perspective, coreference resolution is a crucial task for applications such as Text Summarization (Steinberger et al., 2007) or Information Extraction (Banko and Etzioni, 2008). Information about people is one of the most common types of knowledge extracted by Relation Extraction systems (Mann, 2002), as the different Web People Search (WePS)¹ workshops show (Artiles et al., 2007). In this regard, the extraction of biographical information may perform better if it is carried out after a coreference resolver has estab...
37 | FreeLing 3.0: Towards wider multilinguality. - Padró, Stanilovsky - 2012
36 | Optimization issues in machine learning of coreference resolution.
- Hoste
- 2005
Citation Context ...d Work The interest in having corpora with coreferential annotation was visible in the Message Understanding Conferences (MUC) (Fisher et al., 1995; Chinchor and Hirschman, 1997), which started to develop guidelines and to build corpora for English with this kind of information. Other evaluations, such as the Anaphora Resolution Exercise (ARE), focused their attention on pronominal anaphora resolution and on noun phrase coreference (Orăsan et al., 2008). Previous works such as Mitkov et al. (2000) continued developing annotation tools and corpora. Based on the MUC annotation schemes, Hoste (2005) proposed a coreference annotation scheme for Dutch, followed by the COREA Project (Bouma et al., 2007). Recasens and Martí (2010) defined new annotation guidelines for coreference in Spanish and Catalan, excluding some relations previously considered, such as part-whole coreference, bound anaphora, or bridging reference. This work also released corpora with coreferential annotation and inspired the SemEval-2010 Task #1: Coreference Resolution in Multiple Languages (Recasens et al., 2010). Apart from Spanish and Catalan, this evaluation also... [Footnote 3: http://gramatica.usc.es/~marcos/lrec.tar.bz2]
35 | Fine-grained proper noun ontologies for question answering.
- Mann
- 2002
Citation Context ...ntity are in a coreference relation (Recasens and Martí, 2010). Knowing the behavior of this phenomenon allows us, from a linguistic point of view, to better understand how a discourse is organized at the semantico-referential level (Gordon and Hendrick, 1998). From the Natural Language Processing perspective, coreference resolution is a crucial task for applications such as Text Summarization (Steinberger et al., 2007) or Information Extraction (Banko and Etzioni, 2008). Information about people is one of the most common types of knowledge extracted by Relation Extraction systems (Mann, 2002), as the different Web People Search (WePS)¹ workshops show (Artiles et al., 2007). In this regard, the extraction of biographical information may perform better if it is carried out after a coreference resolver has established identity links between mentions of person entities, which are the main arguments of a biographical relation (Suchanek et al., 2006).² In order to take advantage of the benefits of coreference resolution, resources such as annotated corpora are needed. They are useful both for better understanding the behavior of this phenomenon and for e...
28 | LEILA: Learning to Extract Information by Linguistic Analysis
- Suchanek, Ifrim, Weikum
- 2006
Citation Context ...ions such as Text Summarization (Steinberger et al., 2007) or Information Extraction (Banko and Etzioni, 2008). Information about people is one of the most common types of knowledge extracted by Relation Extraction systems (Mann, 2002), as the different Web People Search (WePS)¹ workshops show (Artiles et al., 2007). In this regard, the extraction of biographical information may perform better if it is carried out after a coreference resolver has established identity links between mentions of person entities, which are the main arguments of a biographical relation (Suchanek et al., 2006).² In order to take advantage of the benefits of coreference resolution, resources such as annotated corpora are needed. They are useful both for better understanding the behavior of this phenomenon and for evaluating how coreference resolution systems perform. Taking the above into account, this article presents three freely available corpora annotated with coreference links of person entities in Portuguese (pt), Galician (gl), and Span[ish]... [Footnote 1: http://nlp.uned.es/weps/weps-3] [Footnote 2: In this paper, a mention is every instance of reference to a person, while an entity is the group of all the mentions ref...]
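Footnote 2 above distinguishes mentions (individual references to a person) from entities (the group of mentions that refer to the same person). A minimal Python sketch of that distinction follows, assuming hypothetical mentions stored as inclusive token spans tagged with an entity ID; it groups mentions into entities and counts both. The span representation and the function names are illustrative assumptions, not the corpora's actual format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union

Span = Tuple[int, int]  # (first token index, last token index), inclusive


def group_into_entities(mentions: List[Tuple[Span, int]]) -> Dict[int, List[Span]]:
    """Group (span, entity_id) mentions into entities: entity_id -> spans in text order."""
    entities: Dict[int, List[Span]] = defaultdict(list)
    for span, entity_id in mentions:
        entities[entity_id].append(span)
    return {eid: sorted(spans) for eid, spans in entities.items()}


def mention_entity_counts(entities: Dict[int, List[Span]]) -> Dict[str, Union[int, float]]:
    """Count mentions and entities, and the average number of mentions per entity."""
    n_mentions = sum(len(spans) for spans in entities.values())
    n_entities = len(entities)
    return {
        "mentions": n_mentions,
        "entities": n_entities,
        "mentions_per_entity": n_mentions / n_entities if n_entities else 0.0,
    }


# Toy annotation: entity 1 has three mentions, entity 2 has one.
mentions = [((0, 1), 1), ((6, 6), 1), ((8, 10), 1), ((14, 15), 2)]
print(mention_entity_counts(group_into_entities(mentions)))
```

With the toy data, the sketch reports 4 mentions grouped into 2 entities, that is, 2.0 mentions per entity.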
23 | AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan. Language Resources and Evaluation. - Recasens, Martí - 2010
23 | Two uses of anaphora resolution in summarization.
- Steinberger, Poesio, et al.
- 2007
Citation Context ...erred to by a personal pronoun (“He”), a noun phrase (“the Brazilian driver”), or a relative pronoun (“who”), among other linguistic units. Different expressions referring to the same discourse entity are in a coreference relation (Recasens and Martí, 2010). Knowing the behavior of this phenomenon allows us, from a linguistic point of view, to better understand how a discourse is organized at the semantico-referential level (Gordon and Hendrick, 1998). From the Natural Language Processing perspective, coreference resolution is a crucial task for applications such as Text Summarization (Steinberger et al., 2007) or Information Extraction (Banko and Etzioni, 2008). Information about people is one of the most common types of knowledge extracted by Relation Extraction systems (Mann, 2002), as the different Web People Search (WePS)¹ workshops show (Artiles et al., 2007). In this regard, the extraction of biographical information may perform better if it is carried out after a coreference resolver has established identity links between mentions of person entities, which are the main arguments of a biographical relation (Suchanek et al., 2006).² In order to take advantage of the...
11 | Anaphora Resolution Exercise: an Overview.
- Orăsan, Cristea, et al.
- 2008
Citation Context ...on 3. Then, the importance of coreference resolution for Information Extraction is shown in Section 4. Some final remarks are put forward in Section 5. 2. Related Work The interest in having corpora with coreferential annotation was visible in the Message Understanding Conferences (MUC) (Fisher et al., 1995; Chinchor and Hirschman, 1997), which started to develop guidelines and to build corpora for English with this kind of information. Other evaluations, such as the Anaphora Resolution Exercise (ARE), focused their attention on pronominal anaphora resolution and on noun phrase coreference (Orăsan et al., 2008). Previous works such as Mitkov et al. (2000) continued developing annotation tools and corpora. Based on the MUC annotation schemes, Hoste (2005) proposed a coreference annotation scheme for Dutch, followed by the COREA Project (Bouma et al., 2007). Recasens and Martí (2010) defined new annotation guidelines for coreference in Spanish and Catalan, excluding some relations previously considered, such as part-whole coreference, bound anaphora, or bridging reference. This work also released corpora with coreferential annotation and inspired the SemEval-2010 Task #1: Coreference Resolu...
7 | A Grammatical Formalism Based on Patterns of Part-of-Speech Tags. - Gamallo, González López - 2011
7 | Dependency-based Open Information Extraction. - Gamallo, Garcia, et al. - 2012 |
6 | Summ-it: Um corpus anotado com informações discursivas visando à sumarização automática. [Summ-it: a corpus annotated with discourse information for automatic summarization]
- Collovini, Carbonel, et al.
- 2007
Citation Context ...emEval-2010 Task #1: Coreference Resolution in Multiple Languages (Recasens et al., 2010). Apart from Spanish and Catalan, this evaluation also made available corpora for other languages such as English, Dutch, German, and Italian.⁴ For Portuguese, Collovini et al. (2007) published Summ-it, a Brazilian Portuguese corpus focused on automatic summarization, which followed the MUC guidelines for including coreference annotation. Finally, to the best of our knowledge, there is no corpus for Galician with any kind of coreference information. Due to the lack of resources for Galician and Portuguese, this paper releases coreferentially annotated corpora with similar properties for these languages (and also for Spanish), which allow researchers from different fields to analyze this phenomenon and to evaluate coreference resolution systems. 3. The Corpora The source te...
[Footnote 3: http://gramatica.usc.es/~marcos/lrec.tar.bz2]
Table 1: Size of the corpora in number of documents and tokens per language and text typology.
Language | Text | Documents | Tokens
Portuguese | Journal | 91 | 34k
Portuguese | Wikipedia | 6 | 17k
Galician | Journal | 28 | 17k
Galician | Wikipedia | 29 | 25k
Spanish | Journal | 27 | 18k
Spanish | Wikipedia | 12 | 28k
Total | Journal | 146 | 70k
Total | Wikipedia | 47 | 71k
Total | Journal + Wikipedia | 193 | 141k
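Table 1 reports both per-cell counts and totals. A small Python check follows, assuming that each token count such as “34k” is rounded to the nearest thousand. It confirms that the document totals add up exactly and that the token totals agree with the cell sums within rounding error. The dictionary layout and the function names are hypothetical; only the numbers come from Table 1.

```python
from typing import Dict, Tuple

# (documents, tokens in thousands) per (language, text type), as reported in Table 1
TABLE1: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Portuguese", "Journal"): (91, 34),
    ("Portuguese", "Wikipedia"): (6, 17),
    ("Galician", "Journal"): (28, 17),
    ("Galician", "Wikipedia"): (29, 25),
    ("Spanish", "Journal"): (27, 18),
    ("Spanish", "Wikipedia"): (12, 28),
}
# Reported totals from Table 1: text type (or "Total") -> (documents, k tokens)
REPORTED: Dict[str, Tuple[int, int]] = {
    "Journal": (146, 70),
    "Wikipedia": (47, 71),
    "Total": (193, 141),
}


def sum_cells(table: Dict[Tuple[str, str], Tuple[int, int]],
              text_type: str) -> Tuple[int, int, int]:
    """Sum documents and k tokens over the cells of one text type
    ("Total" = all cells); also return how many cells were summed."""
    cells = [v for (_, t), v in table.items() if text_type == "Total" or t == text_type]
    return sum(d for d, _ in cells), sum(k for _, k in cells), len(cells)


def totals_consistent(table, reported) -> bool:
    for text_type, (docs, ktokens) in reported.items():
        sum_docs, sum_k, n_cells = sum_cells(table, text_type)
        if sum_docs != docs:  # document counts are exact integers
            return False
        # Assumption: every cell and the reported total may each be off by up to 0.5k.
        if abs(sum_k - ktokens) > 0.5 * (n_cells + 1):
            return False
    return True


for text_type in REPORTED:
    print(text_type, sum_cells(TABLE1, text_type)[:2], "reported", REPORTED[text_type])
print("consistent:", totals_consistent(TABLE1, REPORTED))
```

The document sums (146, 47, 193) match the table exactly. The token sums (69k, 70k, 139k) fall 1k to 2k below the reported 70k, 71k, and 141k, which is consistent with each figure having been rounded independently.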
5 | The COREA-project. Manual for the annotation of coreference in Dutch texts.
- Bouma, Daelemans, et al.
- 2007
Citation Context ...nderstanding Conferences (MUC) (Fisher et al., 1995; Chinchor and Hirschman, 1997), which started to develop guidelines and to build corpora for English with this kind of information. Other evaluations, such as the Anaphora Resolution Exercise (ARE), focused their attention on pronominal anaphora resolution and on noun phrase coreference (Orăsan et al., 2008). Previous works such as Mitkov et al. (2000) continued developing annotation tools and corpora. Based on the MUC annotation schemes, Hoste (2005) proposed a coreference annotation scheme for Dutch, followed by the COREA Project (Bouma et al., 2007). Recasens and Martí (2010) defined new annotation guidelines for coreference in Spanish and Catalan, excluding some relations previously considered, such as part-whole coreference, bound anaphora, or bridging reference. This work also released corpora with coreferential annotation and inspired the SemEval-2010 Task #1: Coreference Resolution in Multiple Languages (Recasens et al., 2010). Apart from Spanish and Catalan, this evaluation also... [Footnote 3: http://gramatica.usc.es/~marcos/lrec.tar.bz2]
4 | SemEval-2010 Task 1: Coreference Resolution in Multiple Languages. - Recasens, Màrquez, et al. (including Antònia Martí, Mariona Taulé, Véronique Hoste, Massimo Poesio, and Yannick Versley) - 2010
3 | A resource-based method for named entity extraction and classification.
- Gamallo, Garcia
- 2011
Citation Context ...ted journalistic news and encyclopedic articles (of people) from different Internet sources (taking into account their linguistic variety). These texts were tokenized, lemmatized, and PoS-tagged with FreeLing (Padró and Stanilovsky, 2012). FreeLing was also used to perform Named Entity Recognition (NER) in Spanish, while Galician and Portuguese NER was carried out with other open-source tools (Garcia and Gamallo, 2010; Gamallo and Garcia, 2011; Garcia et al., 2012). Then, DepPattern was used to enrich the corpora with syntactic dependencies (Gamallo and González López, 2011). Finally, the coreferential annotation was manually added by two linguists following the SemEval-2010 Task #1 format (Recasens et al., 2010). 3.1. Annotation guidelines Different expressions referring to the same discourse entity were annotated as coreferent when standing in an identity-of-referent relation. Moreover, predicative, appositive, and parenthetical expressions were also marked, even though these are not considered coreferent expressions by s... [Footnote 4: http://stel.ub.edu/semeval2010-coref/] [Footnote 5: The statistics have been computed using version 0.2 of the corpora. Further revisions might involve variations in these results.]
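The coreferential annotation described above follows the SemEval-2010 Task #1 format. As far as we understand that format, it is a one-token-per-line column layout whose coreference column uses the following marks: “(ID” where a mention opens, “ID)” where it closes, “(ID)” for a one-token mention, “|” between several marks on the same token, and “_” for tokens without a mark. Treat this as an assumption to be checked against the task documentation. Under that assumption, a minimal Python sketch can turn one document's coreference column into a mapping from entity ID to mention spans. The function name and the example tags are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# One coreference mark: "(12" opens, "12)" closes, "(12)" opens and closes on this token
_MARK = re.compile(r"(\()?(\d+)(\))?")


def parse_coref_column(tags: List[str]) -> Dict[int, List[Tuple[int, int]]]:
    """Map entity ID -> sorted list of (first token, last token) mention spans."""
    open_starts: Dict[int, List[int]] = defaultdict(list)  # stack of open positions per ID
    entities: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for i, tag in enumerate(tags):
        if tag == "_":
            continue  # token carries no coreference mark
        for part in tag.split("|"):
            m = _MARK.fullmatch(part)
            if m is None or not (m.group(1) or m.group(3)):
                raise ValueError(f"malformed mark {part!r} at token {i}")
            entity_id = int(m.group(2))
            if m.group(1):  # "(" present: a mention of this entity starts here
                open_starts[entity_id].append(i)
            if m.group(3):  # ")" present: close the most recent open mention
                if not open_starts[entity_id]:
                    raise ValueError(f"mention of {entity_id} closed at token {i} but never opened")
                entities[entity_id].append((open_starts[entity_id].pop(), i))
    unclosed = [eid for eid, stack in open_starts.items() if stack]
    if unclosed:
        raise ValueError(f"unclosed mentions for entities {unclosed}")
    return {eid: sorted(spans) for eid, spans in entities.items()}


# Toy tokens: "Ayrton Senna , the Brazilian driver , won"
tags = ["(1", "1)", "_", "(1", "_", "1)", "_", "_"]
print(parse_coref_column(tags))
```

Keeping a separate stack of open positions per entity ID is what lets the parser handle nested mentions, such as a mention of one person embedded in a longer mention of another.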
1 | Coreference and anaphora: developing annotating tools, annotated resources and annotation strategies. - Mitkov, Evans, Orăsan, et al. - 2000