Results 1 - 10 of 596
A Maximum-Entropy-Inspired Parser
, 1999
Abstract - Cited by 971 (19 self)
We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5,9,10,15,17] "standard" sections of the Wall Street Journal treebank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innovation is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that lets us successfully test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.
Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging
- Computational Linguistics
, 1995
Abstract - Cited by 924 (8 self)
In this paper, we describe a simple rule-based approach to automated learning of linguistic knowledge. This approach has been shown, for a number of tasks, to capture information in a clearer and more direct fashion without a compromise in performance. We present a detailed case study of this learning method applied to part-of-speech tagging.
A semantic concordance
- Proceedings ARPA Human Language Technology Workshop
, 1993
Abstract - Cited by 321 (4 self)
A semantic concordance is a textual corpus and a lexicon so combined that every substantive word in the text is linked to its appropriate sense in the lexicon. Thus it can be viewed either as a corpus in which words have been tagged syntactically and semantically, or as a lexicon in which example sentences can be found for many definitions. A semantic concordance is being constructed for use in studies of sense resolution in context (semantic disambiguation). The Brown Corpus is the text and WordNet is the lexicon. Semantic tags (pointers to WordNet synsets) are inserted in the text manually using an interface, ConText, that was designed to facilitate the task. Another interface supports searches of the tagged text. Some practical uses for semantic concordances are proposed.
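The word-to-sense linking described above can be illustrated with a small sketch. The token layout and synset identifiers below are hypothetical stand-ins, not the actual ConText tag format:

```python
# Minimal sketch of a semantic concordance: each substantive token carries a
# pointer into the lexicon. Tag layout and synset IDs here are illustrative,
# not the actual ConText format used by the authors.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaggedToken:
    word: str
    pos: str                # syntactic tag
    synset: Optional[str]   # semantic tag: pointer to a WordNet-style synset

sentence = [
    TaggedToken("The", "DT", None),
    TaggedToken("bank", "NN", "bank.n.01"),          # financial-institution sense
    TaggedToken("approved", "VBD", "approve.v.01"),
    TaggedToken("the", "DT", None),
    TaggedToken("loan", "NN", "loan.n.01"),
]

# Viewed as a tagged corpus: which sense was chosen for each word?
senses = {t.word.lower(): t.synset for t in sentence if t.synset}

# Viewed as a lexicon: find example sentences for a given sense.
def examples_for(synset, corpus):
    return [" ".join(t.word for t in sent)
            for sent in corpus
            if any(t.synset == synset for t in sent)]

print(senses["bank"])                         # bank.n.01
print(examples_for("loan.n.01", [sentence]))  # ['The bank approved the loan']
```

The same tagged data answers both queries, which is the dual corpus/lexicon view the abstract describes.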
Some advances in transformation-based part-of-speech tagging
- In Proceedings of the Twelfth National Conference on Artificial Intelligence
, 1994
Abstract - Cited by 294 (1 self)
Most recent research in trainable part-of-speech taggers has explored stochastic tagging. While these taggers obtain high accuracy, linguistic information is captured indirectly, typically in tens of thousands of lexical and contextual probabilities. In (Brill 1992), a trainable rule-based tagger was described that obtained performance comparable to that of stochastic taggers, but captured relevant linguistic information in a small number of simple non-stochastic rules. In this paper, we describe a number of extensions to this rule-based tagger. First, we describe a method for expressing lexical relations in tagging that stochastic taggers are currently unable to express. Next, we show a rule-based approach to tagging unknown words. Finally, we show how the tagger can be extended into a k-best tagger, where multiple tags can be assigned to words in some cases of uncertainty.
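The transformation-based scheme the abstract refers to can be sketched briefly: an initial tagger assigns each word its most frequent tag, then an ordered list of contextual rules patches the errors. The lexicon and rule below are tiny illustrative examples, not Brill's learned rule set:

```python
# Sketch of transformation-based tagging: initial-state tagging by most
# frequent tag, followed by ordered contextual transformation rules.
# Lexicon and rules are illustrative, not learned from a corpus.

lexicon = {"to": "TO", "race": "NN", "the": "DT"}  # most-frequent tags

# Each rule: (from_tag, to_tag, trigger), where the trigger checks context.
rules = [
    # Change NN to VB when the previous tag is TO (e.g. "to race").
    ("NN", "VB", lambda tags, i: i > 0 and tags[i - 1] == "TO"),
]

def tag(words):
    # Initial-state tagger: most frequent tag, defaulting to NN.
    tags = [lexicon.get(w.lower(), "NN") for w in words]
    # Apply the transformations in learned order.
    for old, new, trigger in rules:
        for i, t in enumerate(tags):
            if t == old and trigger(tags, i):
                tags[i] = new
    return tags

print(tag(["to", "race"]))   # ['TO', 'VB']
print(tag(["the", "race"]))  # ['DT', 'NN']
```

A real tagger learns hundreds of such rules by greedily picking, at each step, the transformation that most reduces tagging error on the training corpus.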
Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach
- IN PROCEEDINGS OF THE 34TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 1996
Abstract - Cited by 279 (9 self)
In this paper, we present a new approach for word sense disambiguation (WSD) using an exemplar-based learning algorithm. This approach ...
Improvements In Part-of-Speech Tagging With an Application To German
- In Proceedings of the ACL SIGDAT-Workshop
, 1995
Abstract - Cited by 216 (1 self)
This paper presents a couple of extensions to a basic Markov Model tagger (called TreeTagger) which improve its accuracy when trained on small corpora. The basic tagger was originally developed for English [Schmid, 1994]. The extensions together reduced error rates on a German test corpus by more than a third.
An adapted lesk algorithm for word sense disambiguation using wordnet
- In Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics
, 2002
Abstract - Cited by 210 (4 self)
This is to certify that I have examined this copy of master's thesis by ...
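The snippet above is from a thesis front matter, but the paper itself adapts the classic Lesk gloss-overlap algorithm. A simplified version scores each candidate sense by counting words shared between its dictionary gloss and the surrounding context; the mini sense inventory below is made up for illustration (the paper extends WordNet glosses with the glosses of related synsets):

```python
# Simplified Lesk: pick the sense whose gloss overlaps the context most.
# The sense inventory here is a hypothetical two-sense example; the adapted
# algorithm in the paper draws glosses from WordNet and related synsets.

def simplified_lesk(word, context, glosses):
    """Return the sense of `word` with maximal gloss/context word overlap."""
    context_words = set(w.lower() for w in context)
    best_sense, best_overlap = None, -1
    for sense, gloss in glosses[word].items():
        overlap = len(context_words & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

glosses = {
    "bank": {
        "bank.n.01": "a financial institution that accepts deposits and lends money",
        "bank.n.02": "sloping land beside a body of water such as a river",
    }
}

context = "I deposited money at the bank".split()
print(simplified_lesk("bank", context, glosses))  # bank.n.01
```

Here "money" appears in both the context and the financial gloss, so the financial sense wins; with no overlap, ties fall to whichever sense is examined first.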
Textpresso: An Ontology-Based Information Retrieval and Extraction System for Biological Literature
- PLoS Biol
, 2004
Abstract - Cited by 208 (14 self)
We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database.
Textpresso is a useful curation tool, as well as a search engine for researchers, and can readily be extended to other organism-specific corpora of text. Textpresso can be accessed at
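The category-based sentence search described above can be sketched in a few lines: sentences are marked up with ontology categories, and a query asks for sentences containing a term from every requested category. The category names and terms below are illustrative, not Textpresso's actual ontology:

```python
# Sketch of ontology-based sentence search: mark up each sentence with the
# categories whose terms it contains, then require all query categories to
# co-occur within a sentence. Categories and terms are illustrative.

ontology = {
    "gene": {"lin-12", "glp-1"},
    "association": {"interacts", "binds"},
}

sentences = [
    "lin-12 interacts with glp-1 in the germ line",
    "the germ line proliferates during larval development",
]

def markup(sentence):
    """Return the set of categories whose terms occur in the sentence."""
    words = set(sentence.lower().split())
    return {cat for cat, terms in ontology.items() if words & terms}

def search(categories):
    """Find sentences containing at least one term from every category."""
    return [s for s in sentences if set(categories) <= markup(s)]

print(search(["gene", "association"]))
```

Restricting the co-occurrence to a single sentence, rather than a whole document, is what makes a category query like gene + association a reasonable proxy for a gene-gene interaction statement.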
Learning Subjective Language
- Computational Linguistics
, 1993
Abstract - Cited by 194 (5 self)
Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous natural language processing applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. Clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The features are also examined working together in concert. The features, generated from different data sets using different procedures, exhibit consistency in performance in that they all do better and worse on the same data sets. In addition, this article shows that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and it provides the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion piece recognition (a type of text categorization and genre detection) to demonstrate the utility of the knowledge acquired in this article.