Results 1 - 10 of 22
Sentence alignment in parallel, comparable, and quasi-comparable corpora
- In Proceedings of LREC, 2004
"... We explore the usability of different bilingual corpora for the purpose of multilingual and cross-lingual natural language processing. The usability of bilingual corpus is evaluated by the lexical alignment score calculated for the bi-lexicon pair distributed in the aligned bilingual sentence pairs. ..."
Cited by 5 (0 self)
We compare and contrast a number of bilingual corpora, ranging from parallel, to comparable, and to non-parallel corpora. We compare different methods of mining parallel sentences and bilingual lexicon from bilingual corpora. These methods make several sentence-level assumptions on the bilingual
Multi-level bootstrapping for extracting parallel sentences from a quasi-comparable corpus
- Proceedings of COLING 2004
"... We propose a completely unsupervised method for mining parallel sentences from quasi-comparable bilingual texts which have very different sizes, and which include both in-topic and off-topic documents. We discuss and analyze different bilingual corpora with various levels of comparability. We propos ..."
Cited by 24 (0 self)
Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM
- Proceedings of EMNLP, 2004
"... We present a method capable of extracting parallel sentences from far more disparate “very-non-parallel corpora” than previous “comparable corpora” methods, by exploiting bootstrapping on top of IBM Model 4 EM. Step 1 of our method, like previous methods, uses similarity measures to find matching ..."
Cited by 29 (1 self)
Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora
"... In this article, we present a simple and effective approach for extracting bilingual lexicon from comparable corpora enhanced with parallel corpora. We make use of structural characteristics of the documents comprising the comparable corpus to extract parallel sentences with a high degree of quality ..."
Cited by 6 (0 self)
An Efficient Framework for Extracting Parallel Sentences from
- IOS Press
"... Automatically building a large bilingual corpus that contains millions of words is always a challenging task. In particular in case of low-resource languages, it is difficult to find an existing parallel corpus which is large enough for building a real statistical machine translation. Howe ..."
Topic Models + Word Alignment = A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus
"... We propose a flexible and effective framework for extracting a bilingual dictionary from comparable corpora. Our approach is based on a novel combination of topic modeling and word alignment techniques. Intuitively, our approach works by converting a comparable document-aligned corpus into a parallel topic-aligned corpus, then learning word alignments using co-occurrence statistics. This topic-aligned corpus is similar in structure to the sentence-aligned corpus frequently used in statistical machine translation, enabling us to exploit advances in word alignment research. Unlike many previous ..."
Cited by 3 (1 self)
Dictionary Acquisition using Parallel Text and Co-occurrence Statistics
"... We present a simple and efficient approach for deriving bilingual dictionaries from sentence-aligned parallel text by extending the notion of co-occurrences to a cross-lingual setting. Dictionaries are evaluated against gold standards and manually; the analysis accounts for frequency and corpus ..."
Cited by 2 (2 self)
Source Language Effect on Translating Korean Honorifics
"... This paper investigates the effect of source language on translations using two variants of a Korean translation corpus. The sentences are translations of a bilingual travel corpus. The original corpus was compiled from Japanese-English parallel sentences collected from more than 300 phrase books in ..."
Toward A Bilingual Legal Term Glossary from Context Profiles
"... We propose an algorithm for the automatic acquisition of a bilingual lexicon in the legal domain. We make use of a parallel corpus of bilingual court judgments, aligned to the sentence level, and analyse the bilingual context profiles to extract corresponding legal terms in both languages. Our metho ..."