Results 1 - 10 of 68
Inter-Coder Agreement for Computational Linguistics
- COMPUTATIONAL LINGUISTICS, 2008
Cited by 54 (1 self)
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.
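As an illustration of the kappa-like coefficients named in this abstract, the following is a minimal sketch of Cohen's kappa and Scott's pi for two coders. It assumes the standard textbook definitions, in which both coefficients equal (A_o - A_e) / (1 - A_e) and differ only in how expected agreement A_e is estimated. The function names and the toy labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def observed_agreement(a, b):
    """Fraction of items on which the two coders chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: expected agreement uses each coder's own label distribution."""
    n = len(a)
    a_o = observed_agreement(a, b)
    dist_a, dist_b = Counter(a), Counter(b)
    a_e = sum((dist_a[k] / n) * (dist_b[k] / n) for k in set(a) | set(b))
    # Undefined (division by zero) when a_e == 1, i.e. only one label is ever used.
    return (a_o - a_e) / (1 - a_e)


def scott_pi(a, b):
    """Scott's pi: expected agreement uses one distribution pooled over both coders."""
    n = len(a)
    a_o = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    a_e = sum((count / (2 * n)) ** 2 for count in pooled.values())
    # Same caveat as above when a_e == 1.
    return (a_o - a_e) / (1 - a_e)


# Toy data (invented for illustration): two coders labelling ten items.
coder_1 = ["s1"] * 6 + ["s2"] * 4
coder_2 = ["s1"] * 4 + ["s2"] * 6

print(round(cohen_kappa(coder_1, coder_2), 3))  # 0.615
print(round(scott_pi(coder_1, coder_2), 3))     # 0.6
```

On this toy data the coders agree on 8 of 10 items, yet the two coefficients differ because the coders' individual label distributions (6/4 versus 4/6) are not the same as the pooled distribution (10/10).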
Word sense disambiguation: a survey
- ACM COMPUTING SURVEYS, 2009
Cited by 28 (9 self)
Word sense disambiguation (WSD) is the ability to identify the meaning of words in context in a computational manner. WSD is considered an AI-complete problem, that is, a task whose solution is at least as hard as the most difficult problems in artificial intelligence. We introduce the reader to the motivations for solving the ambiguity of words and provide a description of the task. We overview supervised, unsupervised, and knowledge-based approaches. The assessment of WSD systems is discussed in the context of the Senseval/Semeval campaigns, aiming at the objective evaluation of systems participating in several different disambiguation tasks. Finally, applications, open problems, and future directions are discussed.
Semeval-2007 task-17: English lexical sample, SRL and all words
- In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007
Cited by 25 (2 self)
This paper describes our experience in preparing the data and evaluating the results for three subtasks of SemEval-2007 Task-17: Lexical Sample, Semantic Role Labeling (SRL), and All-Words. We tabulate and analyze the results of participating systems.
Hierarchical Bayesian Domain Adaptation
Cited by 20 (0 self)
Multi-task learning is the problem of maximizing the performance of a system across a number of related tasks. When applied to multiple domains for the same task, it is similar to domain adaptation, but symmetric, rather than limited to improving performance on a target domain. We present a more principled, better performing model for this problem, based on the use of a hierarchical Bayesian prior. Each domain has its own domain-specific parameter for each feature but, rather than a constant prior over these parameters, the model instead links them via a hierarchical Bayesian global prior. This prior encourages the features to have similar weights across domains, unless there is good evidence to the contrary. We show that the method of (Daumé III, 2007), which was presented as a simple “preprocessing step,” is actually equivalent, except our representation explicitly separates hyperparameters which were tied in his work. We demonstrate that allowing different values for these hyperparameters significantly improves performance over both a strong baseline and (Daumé III, 2007) within both a conditional random field sequence model for named entity recognition and a discriminatively trained dependency parser.
Joint Parsing and Named Entity Recognition
Cited by 14 (1 self)
For many language technology applications, such as question answering, the overall system runs several independent processors over the data (such as a named entity recognizer, a coreference system, and a parser). This easily results in inconsistent annotations, which are harmful to the performance of the aggregate system. We begin to address this problem with a joint model of parsing and named entity recognition, based on a discriminative feature-based constituency parser. Our model produces a consistent output, where the named entity spans do not conflict with the phrasal spans of the parse tree. The joint representation also allows the information from each type of annotation to improve performance on the other, and, in experiments with the OntoNotes corpus, we found improvements of up to 1.36% absolute F1 for parsing, and up to 9.0% F1 for named entity recognition.
OntoNotes: A Unified Relational Semantic Representation
Cited by 11 (2 self)
The OntoNotes project is creating a corpus of large-scale, accurate, and integrated annotation of multiple levels of the shallow semantic structure in text. Such rich, integrated annotation covering many levels will allow for richer, cross-level models enabling significantly better automatic semantic analysis. At the same time, it demands a robust, efficient, scalable mechanism for storing and accessing these complex inter-dependent annotations. We describe a relational database representation that captures both the inter- and intra-layer dependencies and provide details of an object-oriented API for efficient, multi-tiered access to this data.
SemEval-2007 Task 07: Coarse-Grained English All-Words Task
Cited by 10 (4 self)
This paper presents the coarse-grained English all-words task at SemEval-2007. We describe our experience in producing a coarse version of the WordNet sense inventory and preparing the sense-tagged corpus for the task. We present the results of participating systems and discuss future directions.
Phrase Detectives - A Web-based Collaborative Annotation Game
- In Proceedings of I-Semantics, 2008
Cited by 7 (1 self)
Annotated corpora of the size needed for modern computational linguistics research cannot be created by small groups of hand annotators. One solution is to exploit collaborative work on the Web, and one way to do this is through games like the ESP game. Applying this methodology, however, requires developing methods for teaching subjects the rules of the game and evaluating their contribution while maintaining the game entertainment. In addition, applying this method to linguistic annotation tasks like anaphoric annotation requires developing methods for presenting text and identifying the components of the text that need to be annotated. In this paper we
Learning for semantic parsing
- IN COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE, 2007
Cited by 7 (1 self)
Semantic parsing is the task of mapping a natural language sentence into a complete, formal meaning representation. Over the past decade, we have developed a number of machine learning methods for inducing semantic parsers by training on a corpus of sentences paired with their meaning representations in a specified formal language. We have demonstrated these methods on the automated construction of natural-language interfaces to databases and robot command languages. This paper reviews our prior work on this topic and discusses directions for future research.
Stanford’s Multi-Pass Sieve Coreference Resolution System at the CoNLL-2011 Shared Task
Cited by 7 (1 self)
This paper details the coreference resolution system submitted by Stanford at the CoNLL-2011 shared task. Our system is a collection of deterministic coreference resolution models that incorporate lexical, syntactic, semantic, and discourse information. All these models use global document-level information by sharing mention attributes, such as gender and number, across mentions in the same cluster. We participated in both the open and closed tracks and submitted results using both predicted and gold mentions. Our system was ranked first in both tracks, with a score of 57.8 in the closed track and 58.3 in the open track.

