Aligning Sentences In Bilingual Corpora Using Lexical Information
, 1993
Cited by 99 (2 self)
In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.
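To make the idea of lexically informed alignment concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the abstract describes a statistical translation model induced on the fly and a probability-maximizing search, whereas this sketch assumes a small fixed lexicon (`TOY_LEXICON`), an ad hoc overlap score, and an assumed `SKIP_PENALTY`, all of which are hypothetical. It only shows how word identities can drive a dynamic-programming alignment that allows one-to-one matches and skipped sentences.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon (assumption). The paper induces its translation
# model during alignment rather than starting from a fixed dictionary.
TOY_LEXICON: Dict[str, Set[str]] = {
    "house": {"maison"},
    "red": {"rouge"},
    "cat": {"chat"},
    "sleeps": {"dort"},
}

SKIP_PENALTY = -1.0  # assumed cost of leaving one sentence unaligned


def pair_score(src: str, tgt: str) -> float:
    """Count source words that have a lexicon translation in the target."""
    tgt_words = set(tgt.lower().split())
    return float(sum(
        1 for word in src.lower().split()
        if TOY_LEXICON.get(word, set()) & tgt_words
    ))


def align(src_sents: List[str], tgt_sents: List[str]) -> List[Tuple[int, int]]:
    """Return 1-1 aligned (source index, target index) pairs via dynamic programming."""
    n, m = len(src_sents), len(tgt_sents)
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    back = [[(0, 0)] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:  # align source i-1 with target j-1
                score = pair_score(src_sents[i - 1], tgt_sents[j - 1])
                candidates.append((best[i - 1][j - 1] + score, (i - 1, j - 1)))
            if i > 0:  # skip a source sentence
                candidates.append((best[i - 1][j] + SKIP_PENALTY, (i - 1, j)))
            if j > 0:  # skip a target sentence
                candidates.append((best[i][j - 1] + SKIP_PENALTY, (i, j - 1)))
            best[i][j], back[i][j] = max(candidates, key=lambda c: c[0])

    # Walk the back-pointers from the end, keeping only the diagonal moves.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        prev_i, prev_j = back[i][j]
        if prev_i == i - 1 and prev_j == j - 1:
            pairs.append((i - 1, j - 1))
        i, j = prev_i, prev_j
    return pairs[::-1]


source = ["The house is red", "The cat sleeps"]
target = ["Bonjour", "La maison est rouge", "Le chat dort"]
print(align(source, target))
```

In this toy input the extra target sentence is skipped and the remaining sentences are paired by their shared lexicon entries. A pair with no lexical support can still be matched one-to-one when that is cheaper than two skips, which is one reason a real system would use a probabilistic model instead of this overlap count.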
Master’s Thesis Using parallel corpora to create Greek-English
, 2006
This thesis corresponds to 20 weeks of full-time work. The importance of parallel corpora has been appreciated for many years. Since the emergence of statistical translation methods, many projects have worked on automated dictionary extraction from parallel corpora. Many corpus processing systems and tools have been implemented and applied to parallel corpora for most of the popular natural languages. However, few projects have addressed the automated creation of a dictionary for the Greek-English language pair. This thesis project focuses on creating a machine-readable bilingual dictionary from Greek-English parallel corpora that were built manually from documents collected from the Internet. The English corpora contained 196,048 words in total, with 10,450 unique words, while the Greek corpora contained 204,043 words in total, with 18,117 unique words. The parallel corpora were processed with the Uplug system without the use of language-specific information. A sample was drawn from the suggested translations in the resulting dictionary and included in questionnaires sent to Greek-English speakers, who evaluated the quality of the translation pairs. For the sampled translation pairs in the stratum with the higher frequency of occurrence, 67.11% were judged correct. With 50.63% correct translations over the whole sample, the results were promising considering the minimal optimisation of the corpus and the many differences between the two languages. The resulting dictionary could be used as input to software tools that search engines could in turn use for web site searching, or by Multilingual Information Retrieval applications to facilitate web retrieval and act as a bridge between different languages. The dictionary can also be used as a translation tool between Greek and other small languages, with English acting as a pivot language.
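The pivot-language use mentioned at the end of the abstract can be sketched as a composition of two dictionaries. This is only an illustration under stated assumptions: the function name `compose_via_pivot`, the dictionary layout (word mapped to a set of translations), and the Swedish example entries are all hypothetical and are not taken from the thesis or from Uplug.

```python
from typing import Dict, Set


def compose_via_pivot(
    src_to_pivot: Dict[str, Set[str]],
    pivot_to_tgt: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Chain source->pivot and pivot->target entries into source->target."""
    result: Dict[str, Set[str]] = {}
    for src_word, pivot_words in src_to_pivot.items():
        targets: Set[str] = set()
        for pivot_word in pivot_words:
            # Collect every target translation reachable through this pivot word.
            targets |= pivot_to_tgt.get(pivot_word, set())
        if targets:  # drop source words with no path through the pivot
            result[src_word] = targets
    return result


# Hypothetical example data (assumption): Greek -> English -> Swedish.
greek_to_english = {
    "σπίτι": {"house", "home"},
    "γάτα": {"cat"},
    "σκύλος": {"dog"},
}
english_to_swedish = {
    "house": {"hus"},
    "home": {"hem"},
    "cat": {"katt"},
}

greek_to_swedish = compose_via_pivot(greek_to_english, english_to_swedish)
print(greek_to_swedish)
```

Composition keeps every candidate reachable through the pivot, so an ambiguous English word can introduce incorrect pairs. Any practical use would need filtering, for example by the translation frequencies that the extraction step already provides.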

