Models of Translational Equivalence among Words
- Computational Linguistics
, 2000
Cited by 121 (2 self)
This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article also shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that reflect knowledge about the model domain combine the best of both the rationalist and empiricist paradigms.
Disambiguation strategies for cross-language information retrieval
- In Proceedings of the third European Conference on Research and Advanced Technology for Digital Libraries (ECDL
, 1999
Cited by 33 (11 self)
Keywords: Cross-Language Information Retrieval, Statistical Machine
NATools – A statistical word aligner workbench. Procesamiento del Lenguaje Natural
- In Proceedings of the Sociedad Española para el Procesamiento del Lenguaje Natural
, 2003
Cited by 17 (12 self)
This document presents the TerminUM project and the work done on its statistical word aligner workbench (NATools). It shows a variety of alignment methods for parallel corpora and discusses the resulting terminological dictionaries and their use: evaluation of sentence translations, construction of a multi-level navigation system for linguistic studies, or statistical translation. Keywords: parallel corpora, word-level alignment.
Combining Clues for Word Alignment
- In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 12–17 April 2003, Budapest. Programme chairs: Copestake A, Hajic J
, 2003
Cited by 16 (0 self)
In this paper, a word alignment approach is presented which is based on a combination of clues. Word alignment clues indicate associations between words and phrases. They can be based on features such as frequency, part-of-speech, phrase type, and the actual wordform strings. Clues can be found by calculating similarity measures or learned from word aligned data. The clue alignment approach...
From the Rosetta Stone to the Information Society: A Survey of parallel text processing
, 2000
Cited by 4 (0 self)
This introductory chapter provides a survey of the processing and use of parallel texts, i.e., texts accompanied by their translation. Throughout the chapter, the various authors' contributions to the book are considered and related to the state of the art in the field. Three themes are addressed, corresponding to the three parts of the book: (i) techniques and methodology for the alignment of parallel texts at various levels such as sentences, clauses or words; (ii) applications of parallel texts in fields such as translation, lexicography, and information retrieval; and (iii) available corpus resources and evaluation of alignment methods.
NatServer: A Client-Server Architecture for building Parallel Corpora applications
Cited by 3 (3 self)
Parallel corpora are important resources for most natural language processing tasks. From common applications, such as machine translation, to usually monolingual tasks, such as paraphrase detection and word sense disambiguation, most researchers use massive parallel corpora. Thus, an efficient way to manage them is very important. This paper presents a client-server architecture for efficiently querying parallel corpora and probabilistic translation dictionaries.
LIHLA: A lexical aligner based on language-independent heuristics
, 2005
Cited by 3 (2 self)
Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense disambiguation, etc. In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of tools (NATools) and language-independent heuristics to find links between single words and multiword units in Brazilian Portuguese, Spanish and English parallel texts. The method has achieved a precision of 92.48% and 84.35% and a recall of 88.32% and 76.39% on Brazilian Portuguese--Spanish and Brazilian Portuguese--English parallel texts, respectively.
Building a Multilingual Parallel Subtitle Corpus
Cited by 2 (0 self)
In this paper on-going work of creating an extensive multilingual parallel corpus of movie subtitles is presented. The corpus currently contains roughly 23,000 pairs of aligned subtitles covering about 2,700 movies in 29 languages. Subtitles mainly consist of transcribed speech, sometimes in a very condensed way. Insertions, deletions and paraphrases are very frequent, which makes them a challenging data set to work with, especially when applying automatic sentence alignment. Standard alignment approaches rely on translation consistency either in terms of length or term translations or a combination of both. In the paper, we show that these approaches are not applicable for subtitles and we propose a new alignment approach based on time overlaps specifically designed for subtitles. In our experiments we obtain a significant improvement of alignment accuracy compared to standard length-based approaches.
Parallel corpora for the Galician language: building and processing of the CLUVI (Linguistic Corpus of the University of Vigo)
Cited by 2 (1 self)
In this paper, we present the methodology developed by the SLI (Computational Linguistics Group of the University of Vigo) for the building and processing of the CLUVI Corpus, showing the TMX-based XML specification designed to encode both morphosyntactic features and translation alignments in parallel corpora, and the solutions adopted for making the CLUVI parallel corpora freely available over the WWW
Evaluating the LIHLA lexical aligner on Spanish, Brazilian Portuguese and Basque parallel texts
, 2005
Cited by 2 (0 self)
Alignment of words and multiword units plays an important role in many natural language processing applications, such as example-based machine translation, transfer rule learning for machine translation, bilingual lexicography, word sense disambiguation, etc. In this paper we describe LIHLA, a lexical aligner which uses bilingual probabilistic lexicons generated by a freely available set of tools (NATools) and language-independent heuristics to find links between single words and multiword units in sentence-aligned parallel texts. The method has achieved a precision of 92.44% and 85.09% and a recall of 91.13% and 64.66% on Brazilian Portuguese--Spanish and Spanish--Basque parallel texts, respectively.

