Learning to Extract Keyphrases from Text (1999)
Citations: 39 (4 self)
BibTeX
@MISC{Turney99learningto,
author = {Peter Turney},
title = {Learning to Extract Keyphrases from Text},
year = {1999}
}
Abstract
Many academic journals ask their authors to provide a list of about five to fifteen key words, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a surprisingly wide variety of tasks for which keyphrases are useful, as we discuss in this paper. Recent commercial software, such as Microsoft's Word 97 and Verity's Search 97, includes algorithms that automatically extract keyphrases from documents. In this paper, we approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for this task. T...
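
The supervised framing described in the abstract, in which a document is treated as a set of phrases that are each labeled as a positive or negative example of a keyphrase, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `candidate_phrases` and `labeled_examples`, the limit of three words per phrase, the exact-match labeling rule, and the sample sentence are all assumptions made for illustration. The sketch only builds the labeled examples; it does not train C4.5 or GenEx, and it ignores sentence boundaries and stemming.

```python
import re

def candidate_phrases(text, max_words=3):
    """Return every distinct run of 1..max_words consecutive words, lowercased.

    Assumption: a 'phrase' is any word n-gram; punctuation is ignored.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = set()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases

def labeled_examples(text, author_keyphrases, max_words=3):
    """Map each candidate phrase to True (positive) or False (negative).

    Assumption: a candidate is positive only if it exactly matches one of the
    author-supplied keyphrases after lowercasing.
    """
    gold = {k.lower() for k in author_keyphrases}
    return {p: (p in gold) for p in candidate_phrases(text, max_words)}

# Hypothetical one-sentence document with two author-supplied keyphrases.
doc = "We apply decision tree induction to keyphrase extraction."
examples = labeled_examples(doc, ["decision tree induction", "keyphrase extraction"])
positives = sorted(p for p, is_key in examples.items() if is_key)
```

A classifier would then be trained on feature vectors computed for each candidate, with these True/False values as the target. Most candidates are negative, so the resulting training set is heavily imbalanced.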







