Semantic Annotation for Concept-Based Cross-Language Medical Information Retrieval
Cited by 12 (3 self)
We present a framework for concept-based cross-language information retrieval in the medical domain, which is under development in the MUCHMORE project. Our approach is based on using the Unified Medical Language System (UMLS) as the primary source of semantic data. Documents and queries are annotated with multiple layers of linguistic information. Linguistic processing includes part-of-speech tagging, morphological analysis, phrase recognition and the identification of medical terms and semantic relations between them. The paper
Unsupervised Monolingual and Bilingual Word-Sense Disambiguation of Medical Documents using UMLS
- In Natural Language Processing in Biomedicine ACL 2003 Workshop (pp. 9–16). East Stroudsburg, PA: Association for Computational Linguistics
, 2003
Cited by 9 (0 self)
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using UMLS. We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relations between terms given by UMLS, a method which achieves 74% precision, 66% coverage for English and 79% precision, 73% coverage for German on evaluation corpora and over 83% coverage over the whole corpus. The success of this technique for German shows that a lexical resource giving relations between concepts used to index an English document collection can be used for high quality disambiguation in another language.
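The abstract reports precision and coverage without defining them. As a minimal sketch, assuming the usual definitions (coverage is the share of ambiguous instances the system labels at all, precision is the share of labelled instances that are correct), the two figures could be computed as below. The function name, the instance identifiers, and the sense labels are hypothetical and do not come from the paper.

```python
def precision_and_coverage(gold, predicted):
    """Return (precision, coverage) for a disambiguation run.

    gold:      dict mapping instance id -> correct sense label
    predicted: dict mapping instance id -> chosen sense label; instances the
               system left undecided are simply absent (assumed convention)
    """
    # Only instances that appear in the gold standard are scored.
    attempted = [i for i in gold if i in predicted]
    correct = sum(1 for i in attempted if predicted[i] == gold[i])

    coverage = len(attempted) / len(gold) if gold else 0.0
    precision = correct / len(attempted) if attempted else 0.0
    return precision, coverage


# Hypothetical toy data: four ambiguous term instances, three labelled.
gold = {"t1": "sense_a", "t2": "sense_b", "t3": "sense_a", "t4": "sense_c"}
predicted = {"t1": "sense_a", "t2": "sense_a", "t3": "sense_a"}

precision, coverage = precision_and_coverage(gold, predicted)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
```

Under these assumed definitions a system can trade one figure against the other by abstaining on hard instances, which is why the two are reported together.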
Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain
The paper describes evaluation resources for concept-based, cross-lingual information retrieval in the medical domain. All resources were constructed in the context of the MuchMore project and are freely available through the project website. Available resources include: a bilingual, parallel document collection of German and English medical scientific abstracts, a set of queries and corresponding relevance assessments, two manually disambiguated test sets for semantic annotation (sense disambiguation), two evaluation lists for German morphological decomposition of medical terms.

MuchMore

The evaluation resources described in this paper were all constructed in the context of the MuchMore project on concept-based, cross-lingual information retrieval (CLIR). The project provided a framework for integrating and refining existing technologies and developing new approaches to CLIR for the medical domain. For this purpose, the project pursued the following aims:

• Integrated and effective combination of different approaches and heterogeneous resources for cross-lingual information access and management, including performance and user evaluation for realistic information access tasks.
• Automated acquisition of domain-specific linguistic resources and effective use of multilingual concept hierarchies.
• Demonstration of a cross-lingual information access prototype system for the medical domain, that provides access to multilingual information on the basis of a combined use of corpus analysis and (domain-specific) ontologies and thesauri.

The MuchMore Prototype

The MuchMore project developed a prototype cross-lingual document retrieval system that enables users to retrieve documents (in English and/or German) that are relevant to a given query document (in English or German), see e.g. (Sacaleanu et al., 2003). In the current
Unsupervised Monolingual and Bilingual Word-Sense
- In Natural Language Processing in Biomedicine ACL 2003 Workshop (pp. 9–16). East Stroudsburg, PA: Association for Computational Linguistics
, 2003
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using UMLS. We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora.
Evaluation Corpora for Sense Disambiguation in the Medical Domain
, 2002
An important aspect of word sense disambiguation is the evaluation of different methods and parameters. Unfortunately, there is a lack of test sets for evaluation, specifically for languages other than English and even more so for specific domains like medicine. Given that our work focuses on English as well as German text in the medical domain, we had to develop our own evaluation corpora in order to test our disambiguation methods. In this paper we describe the work on developing these corpora, using GermaNet and UMLS as (lexical) semantic resources, next to a description of the annotation tool KiC that we developed for support of the annotation task.
5 Word-Vectors and Search Engines
So far in this book we have discussed symmetric and antisymmetric relationships between particular words in a graph or a hierarchy, described one way to learn symmetric relationships from text, and shown how to use ideas such as similarity measures and transitivity to find the ‘nearest neighbours’ of a particular word. But ideally we should be able to measure the similarity or distance between any pair of words or concepts. To some extent, this is possible in graphs and taxonomies by finding the lengths of paths between concepts, but there are problems with this. First of all, finding shortest paths is often computationally expensive. Secondly, we might not have a reliable taxonomy, and as we’ve seen already, the existence of a short path between two words in a graph doesn’t necessarily mean that they’re very similar, because the links in this short path may have arisen from very different contexts. Thirdly, the meanings of words we encounter in documents and corpora may be very different from those given by a general taxonomy such as WordNet. For example, WordNet 2.0 only gives the fruit and tree meanings for the word apple, in stark contrast with the top 10 pages returned by Google for the query apple, which are all about Apple Computers. Another limitation of our methods so far is that we have focussed our attention purely on individual concepts, mainly single words. Ideally, we should be able to find the similarity between two arbitrary collections of words, and quickly. For this, we need some process for semantic composition: working out how to represent the meaning of a sentence or document based on the meaning of the words it contains.
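To make the path-length idea concrete, here is a minimal sketch that measures the distance between two concepts as the number of links on the shortest path between them, found by breadth-first search. The tiny graph is a hypothetical toy taxonomy invented for illustration; it is not taken from WordNet or from the book, and `path_length` is an assumed helper name.

```python
from collections import deque

# Hypothetical toy taxonomy stored as an undirected adjacency list.
TOY_GRAPH = {
    "entity": ["plant", "artifact"],
    "plant": ["entity", "tree", "fruit"],
    "tree": ["plant", "apple"],
    "fruit": ["plant", "apple"],
    "apple": ["tree", "fruit"],
    "artifact": ["entity", "computer"],
    "computer": ["artifact"],
}


def path_length(graph, start, goal):
    """Return the number of links on the shortest path, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


print(path_length(TOY_GRAPH, "apple", "fruit"))     # 1
print(path_length(TOY_GRAPH, "apple", "computer"))  # 5
```

In this toy graph the only route from apple to computer runs through the root, which mirrors the third problem described above: a taxonomy that lacks the computing sense of apple places the two words far apart however often they occur together in real text.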
Unsupervised Disambiguation for a Multilingual Medical Information System Using UMLS
, 2004
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using the Unified Medical Language System (UMLS). We present both monolingual techniques which rely only on the structure of UMLS, and bilingual techniques which also rely on the availability of parallel corpora. The best results are obtained using relationships between terms given by UMLS, a method which achieves 74% precision, 66% coverage for English and 79% precision, 73% coverage for German on evaluation corpora and over 83% coverage over the whole corpus. The success of this technique for German shows that a lexical resource giving relationships between concepts used to index an English document collection can be used for high quality disambiguation in another language. Document

