Results 1 - 9 of 9
Linguistic annotation for the semantic web
- Annotation for the Semantic Web. IOS, 2003
Cited by 6 (0 self)
Abstract. Establishing the semantic web on a large scale implies the widespread annotation
Automatic processing of multilingual medical terminology: Applications to thesaurus enrichment and cross-language information retrieval
- Artificial Intelligence in Medicine, 33(2), 2005
Cited by 3 (0 self)
We present in this article experiments on Multi-Language Information Extraction and Access in the medical domain. Methods for extracting bilingual lexicons from parallel and comparable corpora are described and their use in Multi-Language Information Access is illustrated. Our experiments show that these automatically extracted bilingual lexicons are accurate enough for semi-automatically enriching mono- or bilingual thesauri (such as UMLS), and that their use in Cross-language Information Retrieval (CLIR) significantly improves the retrieval performance and clearly outperforms existing bilingual lexicon resources (both general lexicons and specialized ones).
Multilingual content processing
- In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC), 2004
Cited by 2 (0 self)
This contribution describes the consequences of a multilingual set-up, as used in internet information gathering, search and content processing, for the architecture and the different components of such a system. First the scenario is briefly outlined; then the existing technology components are reviewed, with a focus on the effects of multilingual content on such components. Finally, the linguistic resources that form the backbone of such a multifunctional and multilingual system are discussed. It is shown that adding multilinguality to an information system has massive consequences for the design of all system components and resources.
1 The Crosslingual Scenario
Information acquisition has become a major challenge in the internet age. The amount of information to be monitored has grown significantly, but the time and resources available to perform this task are still constrained. Tools like personalised electronic news clipping, automatic knowledge mining, or internet monitoring try to match the new requirements. As a result, a scenario needs to be designed which requires massive natural language support
Evaluation Resources for Concept-based Cross-Lingual Information Retrieval in the Medical Domain
The paper describes evaluation resources for concept-based, cross-lingual information retrieval in the medical domain. All resources were constructed in the context of the MuchMore project and are freely available through the project website. Available resources include: a bilingual, parallel document collection of German and English medical scientific abstracts; a set of queries and corresponding relevance assessments; two manually disambiguated test sets for semantic annotation (sense disambiguation); and two evaluation lists for German morphological decomposition of medical terms.
MuchMore
The evaluation resources described in this paper were all constructed in the context of the MuchMore project on concept-based, cross-lingual information retrieval (CLIR). The project provided a framework for integrating and refining existing technologies and developing new approaches to CLIR for the medical domain. For this purpose, the project pursued the following aims:
• Integrated and effective combination of different approaches and heterogeneous resources for cross-lingual information access and management, including performance and user evaluation for realistic information access tasks.
• Automated acquisition of domain-specific linguistic resources and effective use of multilingual concept hierarchies.
• Demonstration of a cross-lingual information access prototype system for the medical domain that provides access to multilingual information on the basis of a combined use of corpus analysis and (domain-specific) ontologies and thesauri.
The MuchMore Prototype
The MuchMore project developed a prototype cross-lingual document retrieval system that enables users to retrieve documents (in English and/or German) that are relevant to a given query document (in English or German); see e.g. (Sacaleanu et al., 2003). In the current
Developing Resources for Swedish Bio-Medical Text Mining
Collection and annotation of corpora in specialized fields such as medicine, particularly for languages less widely spoken than, for instance, English, is an important enterprise for the continuous development and growth of language technology research, for resource development, and for the implementation of practical applications for these languages. In this paper, we describe our ongoing efforts to build a large Swedish medical corpus, the MEDLEX Corpus, how we combine generic named entity and terminology recognition for the detailed annotation of the corpus, and how these annotations are further utilized by an annotation-aware cascaded finite-state parser.
The Interaction Between Automatic Annotation and Query Expansion: a retrieval experiment on a large cultural heritage archive
Abstract. A search system for large audiovisual archives can be improved in two ways: by enriching the annotations or by enriching the query mechanism. Both operations may benefit from a preliminary terminological enrichment of the controlled vocabulary in use, i.e. the thesaurus. In this paper we report on a four-part experiment in which we evaluate the benefits and drawbacks of both aspects: the added value and pitfalls of automatically generated semantic annotations over classically (i.e. manually) assigned keywords, and the added value and pitfalls of query expansion over pure keyword matching. We then investigate the combination of these operations in the following setup. First, we create the baseline for our experiments by querying a set of documents annotated by cataloguers with keywords from the thesaurus. Second, we apply the same querying process to a set of annotations automatically generated from textual resources related to the documents. Third, we apply a querying process enhanced with query expansion functionalities to the first set of manually annotated documents. Finally, we apply the query expansion mechanism to the automatically generated annotations. The results give insight into the interaction between the two approaches.
Testing Concept Indexing in Crosslingual Medical Text Classification
MetaMap is an online application that maps text to UMLS Metathesaurus concepts, which is very useful for interoperability among different languages and systems within the biomedical domain. MetaMap Transfer (MMTx) is a Java program that makes MetaMap available to biomedical researchers in a controlled, configurable environment. Currently there is no Spanish version of MetaMap, which makes it difficult to use the UMLS Metathesaurus to extract concepts from Spanish biomedical texts. Developing a Spanish version of MetaMap would be a huge task, since a great deal of work has gone into supporting the English version over the last sixteen years. Our ongoing research is mainly focused on using biomedical concepts for cross-lingual text classification. In this context, using concepts instead of a bag-of-words representation allows us to approach text classification tasks independently of the language. In this paper we show our experiments on combining automatic translation techniques with the use of biomedical ontologies to produce an English text that can be processed by MMTx in order to extract concepts for text classification.
Building a Spanish MMTx by using Automatic Translation and Biomedical Ontologies
Abstract. The use of domain ontologies is becoming increasingly popular in Medical Natural Language Processing Systems. A wide variety of knowledge bases in multiple languages has been integrated into the Unified Medical Language System (UMLS) to create a huge knowledge source that can be accessed with diverse lexical tools. MetaMap (and its Java version, MMTx) is a tool that extracts medical concepts from free text, but currently no Spanish version exists. Our ongoing research is centered on the application of biomedical concepts to cross-lingual text classification, which makes it necessary to have a Spanish MMTx available. We have combined automatic translation techniques with biomedical ontologies and the existing English MMTx to produce a Spanish version of MMTx. We have evaluated different approaches and applied several types of evaluation according to different concept representations for text classification. Our results show that existing translation tools such as Google Translate produce translations that are highly similar to the original texts in terms of extracted concepts.
Unsupervised Disambiguation for a Multilingual Medical Information System Using UMLS
- 2004
This paper describes techniques for unsupervised word sense disambiguation of English and German medical documents using the Unified Medical Language System (UMLS). We present both monolingual techniques, which rely only on the structure of UMLS, and bilingual techniques, which also rely on the availability of parallel corpora. The best results are obtained using relationships between terms given by UMLS, a method which achieves 74% precision and 66% coverage for English and 79% precision and 73% coverage for German on the evaluation corpora, and over 83% coverage over the whole corpus. The success of this technique for German shows that a lexical resource giving relationships between concepts used to index an English document collection can be used for high-quality disambiguation in another language.

