Towards a universal wordnet by learning from combined evidence (2009)
| Venue: | Proc. CIKM 2009 |
| Citations: | 10 - 6 self |
BibTeX
@INPROCEEDINGS{Melo09towardsa,
author = {Gerard de Melo and Gerhard Weikum},
title = {Towards a universal wordnet by learning from combined evidence},
booktitle = {Proc. CIKM 2009},
year = {2009},
publisher = {ACM}
}
Abstract
Lexical databases are invaluable sources of knowledge about words and their meanings, with numerous applications in areas like NLP, IR, and AI. We propose a methodology for the automatic construction of a large-scale multilingual lexical database where words of many languages are hierarchically organized in terms of their meanings and their semantic relations to other words. This resource is bootstrapped from WordNet, a well-known English-language resource. Our approach extends WordNet with around 1.5 million meaning links for 800,000 words in over 200 languages, drawing on evidence extracted from a variety of resources including existing (monolingual) wordnets, (mostly bilingual) translation dictionaries, and parallel corpora. Graph-based scoring functions and statistical learning techniques are used to iteratively integrate this information and build an output graph. Experiments show that this wordnet has a high level of precision and coverage, and that it can be useful in applied tasks such as cross-lingual text classification.







