Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis (2002)
| Venue: | In SIGIR 2002 |
| Citations: | 48 - 5 self |
BibTeX
@INPROCEEDINGS{Larkey02improvingstemming,
author = {Leah S. Larkey and Lisa Ballesteros},
title = {Improving Stemming for Arabic Information Retrieval: Light Stemming and Co-occurrence Analysis},
booktitle = {In SIGIR 2002},
year = {2002},
pages = {275--282}
}
Abstract
Arabic, a highly inflected language, requires good stemming for effective information retrieval, yet no standard approach to stemming has emerged. We developed several light stemmers based on heuristics and a statistical stemmer based on co-occurrence for Arabic retrieval. We compared the retrieval effectiveness of our stemmers and of a morphological analyzer on the TREC-2001 data. The best light stemmer was more effective for cross-language retrieval than a morphological stemmer which tried to find the root for each word. A repartitioning process consisting of vowel removal followed by clustering using co-occurrence analysis produced stem classes which were better than no stemming or very light stemming, but still inferior to good light stemming or morphological analysis.
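As a rough illustration of what "light stemming" means in the abstract, the sketch below strips at most one common prefix and one common suffix from a word without attempting to find its root. The affix lists, the minimum stem length, and the name light_stem are assumptions chosen for this example; the abstract does not specify them, and the stemmers evaluated in the paper may differ.

```python
# Minimal light-stemming sketch. The affix lists and MIN_STEM_LEN are
# illustrative assumptions, not the lists used in the paper.

# Conjunction/preposition + definite article combinations, longest first,
# so that "وال" is tried before the bare article "ال".
PREFIXES = ("وال", "بال", "كال", "فال", "ال")

# A few frequent inflectional suffixes, longer ones first.
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")

# Never strip an affix if fewer than this many characters would remain.
MIN_STEM_LEN = 2


def light_stem(word: str) -> str:
    """Strip at most one listed prefix and one listed suffix from word."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    for w in ("الكتاب", "والمعلمون", "المعلمون", "مكتبة"):
        print(w, "->", light_stem(w))
```

With these assumed lists, "والمعلمون" and "المعلمون" reduce to the same stem, which is the conflation effect that helps retrieval. A root-finding morphological analyzer would go further and reduce both to a consonantal root.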







