Results 1 - 6 of 6
Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus
- In Proceedings of the eighth CLIN meeting, 1998
Cited by 24 (4 self)
Within the project Twenty-One, which aims at effective dissemination of information on ecology and sustainable development, a system is developed that supports cross-language information retrieval for any of the four languages Dutch, English, French and German. Knowledge of this application domain is needed to enhance existing translation resources for the purpose of lexical disambiguation. This paper describes an algorithm for the automated acquisition of a translation lexicon from a parallel corpus. What is new about the presented algorithm is the statistical language model used. Because the algorithm is based on a symmetric translation model, it becomes possible to identify one-to-many and many-to-one relations between words of a language pair. We claim that the presented method has two advantages over algorithms that have been published before. Firstly, because the translation model is more powerful, the resulting bilingual lexicon will be more accurate. Secondly, the resulting bilingual lexicon can be used to translate in both directions between a language pair. Different versions of the algorithm were evaluated on the Dutch and English versions of the Agenda 21 corpus, which is a UN document on the application domain of sustainable development.
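The abstract above does not spell out the statistical model, so the following is only a minimal sketch of the general idea of deriving a bi-directional lexicon from a sentence-aligned parallel corpus. It swaps in a plain Dice co-occurrence score, which is symmetric in the two languages, as a stand-in for the paper's translation model. The function names, the threshold and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Toy Dutch-English sentence pairs (illustrative, not from the Agenda 21 corpus).
TOY_CORPUS = [
    ("het milieu", "the environment"),
    ("het water", "the water"),
    ("schoon water", "clean water"),
    ("schoon milieu", "clean environment"),
]


def build_lexicon(pairs, threshold=0.75):
    """Return {(src_word, tgt_word): dice_score} for aligned sentence pairs.

    Dice is used here as a simple symmetric association score; it is an
    assumption of this sketch, not the model described in the abstract.
    """
    src_count, tgt_count, joint_count = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words = set(src.lower().split())
        tgt_words = set(tgt.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        # Every source word co-occurs once with every target word of the pair.
        joint_count.update(product(src_words, tgt_words))

    lexicon = {}
    for (s, t), joint in joint_count.items():
        dice = 2 * joint / (src_count[s] + tgt_count[t])
        if dice >= threshold:
            lexicon[(s, t)] = dice
    return lexicon


def translate(lexicon, word, reverse=False):
    """List translations of `word`, best first; reverse=True goes target to source."""
    hits = [
        (pair[0] if reverse else pair[1], score)
        for pair, score in lexicon.items()
        if (pair[1] if reverse else pair[0]) == word
    ]
    return [w for w, _ in sorted(hits, key=lambda h: h[1], reverse=True)]
```

Because the score does not depend on direction, the same table serves both lookups, and a word may keep several partners above the threshold, which is how one-to-many and many-to-one relations would surface in this simplified setting.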
Language-Based Multimedia Information Retrieval
- In 6th RIAO Conference, 2000
Cited by 9 (5 self)
This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these projects aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE was building on subtitles or captions as the prime language key for disclosing video fragments, OLIVE is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality.
Headline Extraction Based on a Combination of Uni- and Multidocument Summarization Techniques
- 2002
Cited by 8 (1 self)
The TNO system for multi-document summarisation is based on an extraction approach. For headline generation, we chose to extend our system to extract the most informative topical noun phrase. The cluster topic is defined as the most frequent term occurring in the most salient document sentences. The core of our system is a probabilistic model, which estimates the log-odds of salience based on a number of features including sentence position, sentence length, cue phrases and a language model based content score. The parameters of the model were estimated on annotated training data.
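As a rough illustration of the kind of model the abstract describes, the sketch below scores sentences with a linear log-odds function over sentence position, sentence length, cue phrases and a content score. The weights, the cue-phrase list and the feature definitions are hypothetical placeholders; the abstract says the real parameters were estimated on annotated training data, and its content score comes from a language model, which is reduced here to a number passed in by the caller.

```python
import math

# Hypothetical weights; the source estimates its parameters from annotated data.
WEIGHTS = {"bias": -2.0, "position": 2.0, "length": 0.05, "cue": 1.5, "content": 1.0}
CUE_PHRASES = ("in conclusion", "in summary", "we found")  # illustrative list


def salience_log_odds(sentence, index, content_score=0.0):
    """Linear log-odds of salience for the sentence at position `index`."""
    lowered = sentence.lower()
    features = {
        "position": 1.0 / (index + 1),            # earlier sentences score higher
        "length": min(len(lowered.split()), 30),   # token count, capped at 30
        "cue": 1.0 if any(c in lowered for c in CUE_PHRASES) else 0.0,
        "content": content_score,                 # assumed to be supplied externally
    }
    return WEIGHTS["bias"] + sum(WEIGHTS[k] * v for k, v in features.items())


def salience_probability(log_odds):
    """Convert log-odds to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_sentences(sentences):
    """Return sentences ordered from most to least salient."""
    scored = [(salience_log_odds(s, i), s) for i, s in enumerate(sentences)]
    return [s for _, s in sorted(scored, key=lambda p: p[0], reverse=True)]
```

In an extraction-based summariser of this general shape, the top-ranked sentences would form the summary; how the headline noun phrase is then chosen is not covered by this sketch.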
OLIVE: Speech Based Video Retrieval
- 1998
Cited by 8 (5 self)
This paper describes the Olive project, which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the basis for text-based retrieval functionality. The retrieval demonstrator builds on and extends the architecture from the Pop-Eye project, a system applying human language technology on subtitles for the disclosure of video fragments.
1 Introduction
In archives of all kinds, detailed documentation and profiling of the archived material is a prerequisite for efficient and precise access to the data. While in the domain of textual digital libraries advanced methods of information retrieval can support such processes, there are so far no effective methods for automatically profiling, indexing, and retrieving image and video material on the basis of a direct analysis of its visual conten...
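The idea of using time-coded transcript elements as the basis for text retrieval can be sketched with a small inverted index. This is not the Olive architecture; the transcript format (a list of start-time and word tuples), the function names and the proximity window are assumptions chosen to keep the example short.

```python
from collections import defaultdict


def build_index(transcript):
    """Map each word to the start times (seconds) at which it was recognised.

    `transcript` is a list of (start_seconds, word) tuples, a hypothetical
    stand-in for time-coded speech recogniser output.
    """
    index = defaultdict(list)
    for start, word in transcript:
        index[word.lower()].append(start)
    return index


def find_fragments(index, query, window=5.0):
    """Return start times of the first query word where every other query
    word occurs within `window` seconds of it."""
    words = query.lower().split()
    if not words:
        return []
    hits = []
    for start in index.get(words[0], []):
        if all(
            any(abs(t - start) <= window for t in index.get(w, []))
            for w in words[1:]
        ):
            hits.append(start)
    return hits
```

Each returned time can serve as an entry point into the video, so a text query leads directly to the fragment where the words were spoken.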
Integrating Different Strategies for Cross-Language Information Retrieval in the MIETTA Project
Cited by 5 (0 self)
In this paper we describe an integrated approach to cross-language retrieval within the MIETTA project, whose objective is to build a special-purpose search engine in the tourism domain that covers information from a number of geographical regions. MIETTA is designed to enable users to search and retrieve information on the regions covered, preferably in their own language. To provide users with such functionality, the system includes document translation, cross-language query translation, multilingual generation from information extraction templates and document classification. In addition, query expansion is offered to identify
Twente Workshop on Language Technology
- 1998
TWLT is an acronym of Twente Workshop(s) on Language Technology. These workshops on natural language theory and technology are organised by the Parlevink Project, a language theory and technology project of the Department of Computer Science of the University of Twente, Enschede, The Netherlands. For each workshop, proceedings are published containing the papers that were presented. TWLT 14 has been organised together with the German Research Center for Artificial Intelligence, DFKI Saarbrücken, Germany. The idea for this workshop grew out of a longstanding cooperation between the University of Twente, TNO-TPD in Delft and DFKI. This cooperation manifested itself for the first time in the Twenty-One project, which inspired a whole series of other projects, such as Pop-Eye and Olive, but which also led to close contact and exchange with independently established projects such as Mulinex and MIETTA, for which DFKI was responsible. All of these projects had in common that they were funded by the Telematics Application Programme of the European Commission, all, except for Twenty-One, by the Language Engineering Sector. Beyond this formal feature, the projects mentioned also had in common that part of their agenda was and is to explore and prove the use and usefulness of language technology in the area of

