Aligning Sentences In Bilingual Corpora Using Lexical Information
, 1993
Abstract - Cited by 99 (2 self)
In this paper, we describe a fast algorithm for aligning sentences with their translations in a bilingual corpus. Existing efficient algorithms ignore word identities and only consider sentence length (Brown et al., 1991b; Gale and Church, 1991). Our algorithm constructs a simple statistical word-to-word translation model on the fly during alignment. We find the alignment that maximizes the probability of generating the corpus with this translation model. We have achieved an error rate of approximately 0.4% on Canadian Hansard data, which is a significant improvement over previous results. The algorithm is language independent.
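The abstract above gives only a high-level description, so the following is a loose sketch of lexically informed sentence alignment, not the paper's algorithm. It swaps in a much simpler technique: dynamic programming over 1-1, 1-0, and 0-1 sentence pairings, scored by word overlap against a fixed toy lexicon. The paper instead learns a probabilistic translation model during alignment. `LEXICON`, `SKIP_PENALTY`, `match_score`, and `align` are all hypothetical names introduced for illustration.

```python
# Hypothetical toy lexicon of (English, French) word pairs.
# Assumption: fixed in advance here; the paper builds its model on the fly.
LEXICON = {
    ("the", "le"), ("the", "la"), ("cat", "chat"),
    ("sleeps", "dort"), ("house", "maison"),
}

SKIP_PENALTY = -1  # assumed cost of leaving a sentence unaligned


def match_score(src, tgt):
    """Count source words with a lexicon translation in the target sentence."""
    tgt_words = set(tgt.lower().split())
    return sum(
        1 for s in src.lower().split()
        if any((s, t) in LEXICON for t in tgt_words)
    )


def align(src_sents, tgt_sents):
    """Return the best-scoring list of (src_index, tgt_index) pairings.

    None in either position marks an unaligned sentence.
    """
    n, m = len(src_sents), len(tgt_sents)
    neg_inf = float("-inf")
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == neg_inf:
                continue
            moves = []
            if i < n and j < m:  # 1-1 pairing
                moves.append((i + 1, j + 1, match_score(src_sents[i], tgt_sents[j])))
            if i < n:  # source sentence left unaligned
                moves.append((i + 1, j, SKIP_PENALTY))
            if j < m:  # target sentence left unaligned
                moves.append((i, j + 1, SKIP_PENALTY))
            for ni, nj, gain in moves:
                if best[i][j] + gain > best[ni][nj]:
                    best[ni][nj] = best[i][j] + gain
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from the end to recover the pairings.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if i - pi == 1 and j - pj == 1:
            pairs.append((pi, pj))
        elif i - pi == 1:
            pairs.append((pi, None))
        else:
            pairs.append((None, pj))
        i, j = pi, pj
    return pairs[::-1]


english = ["The cat sleeps", "Hello there", "The house"]
french = ["Le chat dort", "La maison"]
alignment = align(english, french)
```

With these toy inputs, the middle English sentence has no lexical support in either French sentence, so the highest-scoring path leaves it unaligned rather than forcing a pairing that sentence order alone would suggest.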
Building Probabilistic Models for Natural Language
, 1996
Abstract - Cited by 60 (1 self)
Building models of language is a central task in natural language processing. Traditionally, language has been modeled with manually-constructed grammars that describe which strings are grammatical and which are not; however, with the recent availability of massive amounts of on-line text, statistically-trained models are an attractive alternative. These models are generally probabilistic, yielding a score reflecting sentence frequency instead of a binary grammaticality judgement. Probabilistic models of language are a fundamental tool in speech recognition for resolving acoustically ambiguous utterances. For example, we prefer the transcription forbear to four bear as the former string is far more frequent in English text. Probabilistic models also have application in optical character recognition, handwriting recognition, spelling correction, part-of-speech tagging, and machine translation. In this thesis, we investigate three problems involving the probabilistic modeling of languag...
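To make the "forbear" versus "four bear" example concrete, here is a minimal sketch that scores candidate transcriptions with an add-one-smoothed unigram model. The toy corpus, the function names, and the choice of a unigram model are assumptions for illustration; the thesis's own models are not described in this abstract.

```python
import math
from collections import Counter

# Hypothetical toy corpus; real models are trained on large text collections.
CORPUS = (
    "we must forbear from judging . they forbear to act . "
    "four men saw a bear"
).split()


def train_unigram(tokens):
    """Return a function giving add-one-smoothed log-probabilities of words."""
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # the extra slot reserves mass for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab_size))

    return logprob


def score(candidate, logprob):
    """Sum word log-probabilities; a higher score means a more likely string."""
    return sum(logprob(word) for word in candidate.split())


logprob = train_unigram(CORPUS)
candidates = ["forbear", "four bear"]
best = max(candidates, key=lambda c: score(c, logprob))
```

On this toy corpus the model prefers "forbear". Part of that preference comes from the candidate being a single word, since a unigram score drops with every extra word; richer models condition each word on its context.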
Combining Corpus and Machine-Readable Dictionary Data for Building Bilingual Lexicons
, 1996
Abstract - Cited by 13 (0 self)
This paper describes and discusses some theoretical and practical problems arising from developing a system to combine the structured but incomplete information from machine readable dictionaries (MRDs) with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base, presenting a methodology to integrate information from both sources into a single lexical data structure. The BICORD system (BIlingual CORpus-enhanced Dictionaries) involves linking entries in the Collins English-French and French-English bilingual dictionary with a large English-French and French-English bilingual corpus. We have concentrated on the class of action verbs of movement, building on earlier work on lexical correspondences specific to this verb class between languages (Klavans and Tzoukermann, 1989; Klavans and Tzoukermann, 1990a; Klavans and Tzoukermann, 1990b). We first examine the way prototypical verbs of movement are translated in the Collin...
The BICORD system: combining lexical information from bilingual corpora and machine readable dictionaries
- Proceedings of the 13th Annual Meeting of the Association of Computational Linguistics
, 1990
Abstract - Cited by 11 (1 self)
Our goal is to explore methods for combining structured but incomplete information from dictionaries with the unstructured but more complete information available in corpora for the creation of a bilingual lexical data base. This paper concentrates on the class of action verbs of movement, and builds on earlier work on lexical correspondences between languages and specific to this verb class. The languages we explore here are English and French. We first examine the way prototypical verbs of movement are translated in the Collins-Robert (Collins 1978, henceforth CR) bilingual dictionary. We then analyze the behavior of some of these verbs in a large bilingual corpus. We take advantage of the results of linguistic research on verb types (e.g. Levin, to appear) coupled with data from machine readable dictionaries to motivate corpus-based text analysis for the purpose of establishing lexical correspondences with the full range of associated translations and then attach frequencies to translations. 1. Background. As NLP systems become more robust, large lexicons are required, providing a wide range of information including syntactic, semantic, pragmatic, morphological and phonological. There are difficulties in constructing these large lexicons, first in their design, and then in providing them with the necessary and sufficient data. These problems have recently been the topic of intense research
From the Rosetta Stone to the Information Society: A Survey of Parallel Text Processing
, 2000
Abstract - Cited by 4 (0 self)
This introductory chapter provides a survey of the processing and use of parallel texts, i.e., texts accompanied by their translation. Throughout the chapter, the various authors' contributions to the book are considered and related to the state of the art in the field. Three themes are addressed, corresponding to the three parts of the book: (i) techniques and methodology for the alignment of parallel texts at various levels such as sentences, clauses or words; (ii) applications of parallel texts in fields such as translation, lexicography, and information retrieval; and (iii) available corpus resources and evaluation of alignment methods.
Corpora and Translation: Uses and Future Prospects
, 1993
Abstract
Although corpora have been an object of study for some decades, the nineteen eighties saw an increased interest in their use and construction. With this increased interest and awareness has come an expansion in the application

