Results 1 - 10 of 13,565
Building a Large Annotated Corpus of English: The Penn Treebank
- COMPUTATIONAL LINGUISTICS, 1993
"... There is a growing consensus that significant, rapid progress can be made in both text understanding and spoken language understanding by investigating those phenomena that occur most centrally in naturally occurring unconstrained materials and by attempting to automatically extract information abou ..."
Abstract - Cited by 2740 (10 self)
and comparison of the adequacy of parsing models.
In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989
Building A Large Annotated Corpus of
- 1993
"... In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part-of-spee ..."
Abstract
In this paper, we review our experience with constructing one such large annotated corpus--the Penn Treebank, a corpus consisting of over 4.5 million words of American English. During the first three-year phase of the Penn Treebank Project (1989-1992), this corpus has been annotated for part
Building a large annotated corpus of English: the Penn Treebank
- 1993
"... this paper, we review our experience with constructing one such large annotated corpus---the Penn Treebank, a corpus ..."
Abstract - Cited by 2 (0 self)
this paper, we review our experience with constructing one such large annotated corpus---the Penn Treebank, a corpus
A large annotated corpus for learning natural language inference
"... Understanding entailment and contradiction is fundamental to understanding natural language, and inference about entailment and contradiction is a valuable testing ground for the development of semantic representations. However, machine learning research in this area has been dramatically limi ..."
Abstract - Cited by 1 (0 self)
dramatically limited by the lack of large-scale resources. To address this, we introduce the Stanford Natural Language Inference corpus, a new, freely available collection of labeled sentence pairs, written by humans doing a novel grounded task based on image captioning. At 570K pairs, it is two orders of magnitude
Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English
"... We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although NU ..."
Abstract - Cited by 11 (0 self)
We describe the NUS Corpus of Learner English (NUCLE), a large, fully annotated corpus of learner English that is freely available for research purposes. The goal of the corpus is to provide a large data resource for the development and evaluation of grammatical error correction systems. Although
The Proposition Bank: An Annotated Corpus of Semantic Roles
- Computational Linguistics, 2005
"... The Proposition Bank project takes a practical approach to semantic representation, adding a layer of predicate-argument information, or semantic role labels, to the syntactic structures of the Penn Treebank. The resulting resource can be thought of as shallow, in that it does not represent corefere ..."
Abstract - Cited by 556 (22 self)
coreference, quantification, and many other higher-order phenomena, but also broad, in that it covers every instance of every verb in the corpus and allows representative statistics to be calculated. We discuss the criteria used to define the sets of semantic roles used in the annotation process
LabelMe: A Database and Web-Based Tool for Image Annotation
- 2008
"... We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant sha ..."
Abstract - Cited by 679 (46 self)
We seek to build a large collection of images with ground truth labels to be used for object detection and recognition research. Such data is useful for supervised learning and quantitative evaluation. To achieve this, we developed a web-based tool that allows easy image annotation and instant
Imagenet: A large-scale hierarchical image database
- In CVPR, 2009
"... The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce her ..."
Abstract - Cited by 840 (28 self)
of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image
A Maximum Entropy Model for Part-Of-Speech Tagging
- 1996
"... This paper presents a statistical model which trains from a corpus annotated with Part-Of-Speech tags and assigns them to previously unseen text with state-of-the-art accuracy (96.6%). The model can be classified as a Maximum Entropy model and simultaneously uses many contextual "features" t ..."
Abstract - Cited by 580 (1 self)
This paper presents a statistical model which trains from a corpus annotated with Part-Of-Speech tags and assigns them to previously unseen text with state-of-the-art accuracy (96.6%). The model can be classified as a Maximum Entropy model and simultaneously uses many contextual "
The Berkeley FrameNet Project
- IN PROCEEDINGS OF THE COLING-ACL, 1998
"... FrameNet is a three-year NSF-supported project in corpus-based computational lexicography, now in its second year (NSF IRI-9618838, "Tools for Lexicon Building"). The project's key features are (a) a commitment to corpus evidence for semantic and syntactic generalizations, and (b) the repr ..."
Abstract - Cited by 643 (3 self)
(semantic and syntactic) of several thousand words and phrases, each accompanied by (c) a representative collection of annotated corpus attestations, which jointly exemplify the observed linkings between "frame elements" and their syntactic realizations (e.g. grammatical function, phrase type