Results 1 - 4 of 4
Cross-language information retrieval using PARAFAC2
- Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007
Abstract - Cited by 8 (1 self)
Approved for public release; further dissemination unlimited.
Organized by:
Abstract
In cooperation with the InfoPlosion Project of Japan: http://www.infoplosion.nii.ac.jp/info-plosion/ctr.php/m/IndexEng/a/Index / LEGAL NOTICE: Copyright for individual papers in these proceedings resides with the individual authors of each paper. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGIR
Unlimited Release, 2007
"... Sandia is a multiprogram laboratory operated by Sandia Corporation, ..."
The Effects of Language Relatedness on Multilingual Information Retrieval: A Case Study With Indo-European and Semitic Languages
Abstract
We explore the effects of language relatedness within a multilingual information retrieval (IR) framework which can be deployed to virtually any language, focusing specifically on Indo-European versus Semitic languages. The Semitic languages present unique challenges to IR for a number of reasons, so we set out to answer the question of whether cross-language IR for Semitic languages can be boosted by manipulation of the training data (which, in our framework, includes multilingual parallel text, some of which is morphologically analyzed). We attempted three measures to achieve this: first, the inclusion of genetically related (i.e., other Semitic) languages in the training data; second, the inclusion of non-related languages sharing the same script; and third, the inclusion of morphological analysis for Semitic languages. We find that language relatedness is a definite factor in boosting IR precision; script similarity can probably be ruled out as a factor; and morphological analysis can be helpful, but – perhaps paradoxically – not necessarily to the languages which are subjected to morphological analysis.

