Highlights: Language- and domain-independent automatic indexing terms for abstracting (1995)
| Venue: | Journal of the American Society for Information Science |
| Citations: | 27 - 0 self |
BibTeX
@ARTICLE{Cohen95highlights:language-,
author = {Jonathan D. Cohen},
title = {Highlights: Language- and domain-independent automatic indexing terms for abstracting},
journal = {Journal of the American Society for Information Science},
year = {1995},
pages = {114}
}
Abstract
A method of drawing index terms from text is presented. The approach uses no stop list, stemmer, or other language- and domain-specific component, allowing operation in any language or domain with only trivial modification. The method uses n-gram counts, achieving a function similar to, but more general than, a stemmer. The generated index terms, which the author calls “highlights,” are suitable for identifying the topic for perusal and selection. An extension is also described and demonstrated which selects index terms to represent a subset of documents, distinguishing them from the corpus. Some experimental results are presented, showing operation in English, Spanish, German, Georgian, Russian, and Japanese.
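
The abstract gives no formulas, so the following is only a rough illustration of the general idea of picking index terms from character n-gram counts without a stop list or stemmer. It is not the paper's algorithm. The function names (`char_ngrams`, `ngram_scores`, `highlight_words`), the n-gram length of 5, the add-one smoothing, and the log-ratio score are all assumptions made for this sketch.

```python
from collections import Counter
import math


def char_ngrams(text, n=5):
    """Return overlapping character n-grams of lowercased, whitespace-normalized text."""
    s = " ".join(text.lower().split())
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_scores(doc, background, n=5):
    """Score each n-gram in doc by how over-represented it is relative to background.

    Assumption: a count-weighted log-ratio with add-one smoothing stands in
    for whatever statistic the paper actually uses.
    """
    doc_counts = Counter(char_ngrams(doc, n))
    bg_counts = Counter(char_ngrams(background, n))
    doc_total = sum(doc_counts.values())
    bg_total = sum(bg_counts.values())
    vocab = len(set(doc_counts) | set(bg_counts))
    scores = {}
    for gram, count in doc_counts.items():
        p_doc = count / doc_total
        p_bg = (bg_counts[gram] + 1) / (bg_total + vocab)
        scores[gram] = count * math.log(p_doc / p_bg)
    return scores


def highlight_words(doc, background, n=5, top=5):
    """Rank words in doc by the mean score of the n-grams they cover.

    Words are found by whitespace splitting only; no stop list or stemmer
    is involved. Words too short to contain an n-gram are skipped.
    """
    scores = ngram_scores(doc, background, n)
    word_scores = {}
    for word in set(doc.lower().split()):
        padded = f" {word} "
        grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
        if grams:
            word_scores[word] = sum(scores.get(g, 0.0) for g in grams) / len(grams)
    return sorted(word_scores, key=word_scores.get, reverse=True)[:top]
```

Because inflected forms of a word share most of their character n-grams, scoring at the n-gram level groups related word forms loosely, which is one way to read the abstract's remark that n-gram counts serve a function similar to a stemmer. Whitespace splitting is itself a simplification that would not suit unsegmented scripts such as Japanese.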







