Knowledge derived from Wikipedia for computing semantic relatedness
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2007
Cited by 16 (1 self)
Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question of whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine-learning-based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can easily be used for languages other than English by computing semantic relatedness for a German dataset.
Language models for searching in Web corpora
- THE THIRTEENTH TEXT RETRIEVAL CONFERENCE (TREC 2004). NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY. NIST SPECIAL PUBLICATION, 2005
"... We describe our participation in the ..."
Applying Wikipedia’s multilingual knowledge to cross-lingual question answering
- In Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, and Yacine Rezgui, editors, NLDB, volume 4592 of Lecture Notes in Computer Science, 2007
Cited by 6 (1 self)
The application of the multilingual knowledge encoded in Wikipedia to an open-domain Cross-Lingual Question Answering system based on the Inter Lingual Index (ILI) module of EuroWordNet is proposed and evaluated. This strategy overcomes the problems due to ILI’s low coverage of proper nouns (Named Entities). Moreover, as these are open-class words (highly changing), using a community-based, up-to-date resource avoids the tedious maintenance of hand-coded bilingual dictionaries. A study reveals the importance of translating Named Entities in CL-QA and the advantages of relying on Wikipedia over ILI for doing this. Tests on questions from the Cross-Language Evaluation Forum (CLEF) justify our approach (20% of these are correctly answered thanks to Wikipedia’s Multilingual Knowledge).
The Impact of Named Entity Normalization on Information Retrieval for Question Answering
Cited by 3 (1 self)
In the named entity normalization task, a system identifies a canonical unambiguous referent for names like Bush or Alabama. Resolving synonymy and ambiguity of such names can benefit end-to-end information access tasks. We evaluate two entity normalization methods based on Wikipedia in the context of both passage and document retrieval for question answering. We find that even a simple normalization method leads to improvements in early precision, both for document and passage retrieval. Moreover, better normalization results in better retrieval performance.
Summarizing Definition from Wikipedia
Cited by 1 (0 self)
Wikipedia provides a wealth of knowledge, where the first sentence, infobox (and relevant sentences), and even the entire document of a wiki article could be considered as diverse versions of summaries (definitions) of the target topic. We explore how to generate a series of summaries with various lengths based on them. To obtain more reliable associations between sentences, we introduce wiki concepts according to the internal links in Wikipedia. In addition, we develop an extended document concept lattice model to combine wiki concepts and non-textual features such as the outline and infobox. The model can concatenate representative sentences from non-overlapping salient local topics for summary generation. We test our model on our annotated wiki articles, whose topics come from the TREC-QA 2004-2006 evaluations. The results show that the model is effective in summarization and definition QA.
The University of Amsterdam at QA@CLEF 2005
- 2005
Cited by 1 (1 self)
We describe the official runs of our team for the CLEF 2005 question answering track. We took part
ADVANCES IN AUTOMATIC TERMINOLOGY PROCESSING: METHODOLOGY AND APPLICATION IN FOCUS
- 2007
The right of Le An Ha to be identified as author of this work is asserted in accordance with ss.77 and 78 of the Copyright, Designs and Patents Act 1988.
The information and knowledge era in which we are living creates challenges in many fields, and terminology is no exception. The challenges include an exponential growth in the number of specialised documents that are available, in which terms are presented, and in the number of newly introduced concepts and terms, which are already beyond our (manual) capacity. A promising solution to this ‘information overload’ would be to employ automatic or semi-automatic procedures
General Terms Algorithms, Experimentation
In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page, and then we identify candidate links from those similar pages that might be missing on the given page. The main innovation is in the algorithm that we use for identifying similar pages, LTRank, which ranks pages using co-citation and page title information. Both LTRank and the link discovery method are manually evaluated and show acceptable results, especially given the simplicity of the methods and conservativeness of the evaluation criteria.
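The two-step procedure described in this abstract can be illustrated with a minimal sketch. It is not LTRank: the abstract says LTRank combines co-citation with page title information, whereas this sketch ranks similar pages by raw co-citation counts only. The function names, the `top_k` parameter, and the toy link graph are all assumptions made for illustration.

```python
from collections import Counter


def invert_links(outlinks):
    """Build a page -> set of citing pages map from a page -> targets map."""
    inlinks = {}
    for source, targets in outlinks.items():
        for target in targets:
            inlinks.setdefault(target, set()).add(source)
    return inlinks


def similar_pages(page, inlinks, top_k=2):
    """Step 1 (simplified): rank other pages by co-citation with `page`,
    i.e. by how many pages link to both of them."""
    citing = inlinks.get(page, set())
    scores = Counter()
    for other, other_citing in inlinks.items():
        if other == page:
            continue
        shared = len(citing & other_citing)
        if shared:
            scores[other] = shared
    return [p for p, _ in scores.most_common(top_k)]


def candidate_missing_links(page, inlinks, outlinks, top_k=2):
    """Step 2: propose link targets that the similar pages use
    but that `page` itself does not link to yet."""
    existing = outlinks.get(page, set())
    votes = Counter()
    for sim in similar_pages(page, inlinks, top_k):
        for target in outlinks.get(sim, set()):
            if target != page and target not in existing:
                votes[target] += 1
    return [t for t, _ in votes.most_common()]


# Hypothetical toy graph: hub pages H1 and H2 cite A, B and C.
outlinks = {
    "A": {"X"},
    "B": {"X", "Y"},
    "C": {"X", "Y"},
    "H1": {"A", "B", "C"},
    "H2": {"A", "B"},
}
inlinks = invert_links(outlinks)
suggestions = candidate_missing_links("A", inlinks, outlinks)
```

In the toy graph, pages B and C are co-cited with A, and both link to Y while A does not, so Y is the only suggested link for A.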
Finding Similar Sentences across Multiple Languages in Wikipedia
We investigate whether the Wikipedia corpus is amenable to multilingual analysis that aims at generating parallel corpora. We present the results of the application of two simple heuristics for the identification of similar text across multiple languages in Wikipedia. Despite the simplicity of the methods, evaluation carried out on a sample of Wikipedia pages shows encouraging results.

