Results 1 - 2 of 2
Identifying Word Correspondences in Parallel Texts
- In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, 1991
- Cited by 140 (4 self)
... this paper. We wish to distinguish the terms alignment and correspondence. The term alignment will be used when order constraints must be preserved, and the term correspondence will be used when order constraints need not be preserved and crossing dependencies are permitted. We refer to the matching problem at the word level as a correspondence problem because it is important to model crossing dependencies (e.g., sales volume and volume des ventes). In contrast, we refer to the matching problem at the sentence level as an alignment problem because we believe that it is not necessary to model crossing dependencies at the sentence level: they are quite rare and can be ignored (at least for now). Here is an example of our word correspondence program. Given the input English and French sentences:
Char_align: A Program for Aligning Parallel Texts at the Character Level
- In Proceedings of the 31st Annual Conference of the Association for Computational Linguistics, 1993
- Cited by 103 (3 self)
There have been a number of recent papers on aligning parallel texts at the sentence level, e.g., Brown et al. (1991), Gale and Church (to appear), Isabelle (1992), Kay and Rösenschein (to appear), Simard et al. (1992), Warwick-Armstrong and Russell (1990). On clean inputs, such as the Canadian Hansards, these methods have been very successful (at least 96% correct by sentence). Unfortunately, if the input is noisy (due to OCR and/or unknown markup conventions), then these methods tend to break down because the noise can make it difficult to find paragraph boundaries, let alone sentences. This paper describes a new program, char_align, that aligns texts at the character level rather than at the sentence/paragraph level, based on the cognate approach proposed by Simard et al.
1. Introduction
Parallel texts have recently received considerable attention in machine translation (e.g., Brown et al., 1990), bilingual lexicography (e.g., Klavans and Tzoukermann, 1990), and terminology resea...

