CiteSeerX

Results 1 - 10 of 22

Sentence alignment in parallel, comparable, and quasi-comparable corpora

by Percy Cheung, Pascale Fung - In Proceedings of LREC, 2004
"... We explore the usability of different bilingual corpora for the purpose of multilingual and cross-lingual natural language processing. The usability of a bilingual corpus is evaluated by the lexical alignment score calculated for the bi-lexicon pair distributed in the aligned bilingual sentence pairs. ..."
Cited by 5 (0 self)
We compare and contrast a number of bilingual corpora, ranging from parallel, to comparable, and to non-parallel corpora. We compare different methods of mining parallel sentences and bilingual lexicon from bilingual corpora. These methods make several sentence-level assumptions on the bilingual ...

Multi-level bootstrapping for extracting parallel sentences from a quasi-comparable corpus

by Pascale Fung, Percy Cheung - Proceedings of COLING 2004
"... We propose a completely unsupervised method for mining parallel sentences from quasi-comparable bilingual texts which have very different sizes, and which include both in-topic and off-topic documents. We discuss and analyze different bilingual corpora with various levels of comparability. We propos ..."
Cited by 24 (0 self)

Mining Very-Non-Parallel Corpora: Parallel Sentence and Lexicon Extraction via Bootstrapping and EM

by Pascale Fung, Percy Cheung - Proceedings of EMNLP, 2004
"... We present a method capable of extracting parallel sentences from far more disparate “very-non-parallel corpora” than previous “comparable corpora” methods, by exploiting bootstrapping on top of IBM Model 4 EM. Step 1 of our method, like previous methods, uses similarity measures to find matching ..."
Cited by 29 (1 self)

Bilingual Lexicon Extraction from Comparable Corpora Enhanced with Parallel Corpora

by Emmanuel Morin, Emmanuel Prochasson
"... In this article, we present a simple and effective approach for extracting a bilingual lexicon from comparable corpora enhanced with parallel corpora. We make use of structural characteristics of the documents comprising the comparable corpus to extract parallel sentences with a high degree of quality ..."
Cited by 6 (0 self)

An Efficient Framework for Extracting Parallel Sentences from Non-parallel Corpora

by Cuong Hoang, Anh-Cuong Le, Phuong-Thai Nguyen, Son Bao Pham, Tu Bao Ho - IOS Press
"... Automatically building a large bilingual corpus that contains millions of words is always a challenging task. In particular, in the case of low-resource languages, it is difficult to find an existing parallel corpus which is large enough for building a real statistical machine translation. Howe ..."

Topic Models + Word Alignment = A Flexible Framework for Extracting Bilingual Dictionary from Comparable Corpus

by Xiaodong Liu, Kevin Duh, Yuji Matsumoto
"... We propose a flexible and effective framework for extracting a bilingual dictionary from comparable corpora. Our approach is based on a novel combination of topic modeling and word alignment techniques. Intuitively, our approach works by converting a comparable document-aligned corpus into a parallel topic-aligned corpus, then learning word alignments using co-occurrence statistics. This topic-aligned corpus is similar in structure to the sentence-aligned corpus frequently used in statistical machine translation, enabling us to exploit advances in word alignment research. Unlike many previous ..."
Cited by 3 (1 self)

Dictionary Acquisition using Parallel Text and Co-occurrence Statistics

by Chris Biemann, Uwe Quasthoff
"... We present a simple and efficient approach for deriving bilingual dictionaries from sentence-aligned parallel text by extending the notion of co-occurrences to a cross-lingual setting. Dictionaries are evaluated against gold standards and manually; the analysis accounts for frequency and corpus ..."
Cited by 2 (2 self)
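A minimal sketch of the cross-lingual co-occurrence idea summarized above: each aligned sentence pair is treated as one co-occurrence window, and every (source word, target word) pair is scored by how often the two words appear in the same aligned pair. This is not the authors' implementation. The Dice coefficient is an assumed stand-in for whatever significance measure the paper uses, and the function names `cross_lingual_dice` and `best_translations` are hypothetical.

```python
from collections import Counter
from itertools import product


def cross_lingual_dice(pairs, min_count=1):
    """Score (source word, target word) pairs over sentence-aligned text.

    `pairs` is a list of (source_tokens, target_tokens). Each aligned
    sentence pair counts as one co-occurrence window; words are counted
    once per pair. Dice is an assumed stand-in significance measure.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_types, t_types = set(src), set(tgt)
        src_freq.update(s_types)
        tgt_freq.update(t_types)
        # Every source type co-occurs with every target type in the pair.
        joint.update(product(s_types, t_types))
    return {
        (s, t): 2 * c / (src_freq[s] + tgt_freq[t])
        for (s, t), c in joint.items()
        if c >= min_count
    }


def best_translations(scores):
    """Keep the highest-scoring target word per source word (ties: alphabetical)."""
    best = {}
    for (s, t), score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0][1])):
        best.setdefault(s, (t, score))
    return best


if __name__ == "__main__":
    corpus = [
        ("the house".split(), "das haus".split()),
        ("the book".split(), "das buch".split()),
        ("a house".split(), "ein haus".split()),
    ]
    for src, (tgt, score) in sorted(best_translations(cross_lingual_dice(corpus)).items()):
        print(src, tgt, round(score, 3))
```

On this toy corpus each source word is paired with its expected translation (the→das, house→haus, book→buch, a→ein), because those pairs co-occur in every aligned sentence that contains either word. A real dictionary extractor would add frequency thresholds and a stronger significance test to suppress chance co-occurrences on larger corpora.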

Source Language Effect on Translating Korean Honorifics

by Kyonghee Paik, Kiyonori Ohtake, Francis Bond, Kazuhide Yamamoto
"... This paper investigates the effect of source language on translations using two variants of a Korean translation corpus. The sentences are translations of a bilingual travel corpus. The original corpus was compiled from Japanese-English parallel sentences collected from more than 300 phrase books in ..."

Toward A Bilingual Legal Term Glossary from Context Profiles

by Oi Yee Kwong
"... We propose an algorithm for the automatic acquisition of a bilingual lexicon in the legal domain. We make use of a parallel corpus of bilingual court judgments, aligned to the sentence level, and analyse the bilingual context profiles to extract corresponding legal terms in both languages. Our metho ..."

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University