A Systematic Comparison of Various Statistical Alignment Models
- Computational Linguistics
, 2003
Cited by 805 (22 self)
this article the problem of finding the word alignment of a bilingual sentence-aligned corpus by using language-independent statistical methods. There is a vast literature on this topic, and many different systems have been suggested to solve this problem. Our work follows and extends the methods introduced by Brown, Della Pietra, Della Pietra, and Mercer (1993) by using refined statistical models for the translation process. The basic idea of this approach is to develop a model of the translation process with the word alignment as a hidden variable of this process, to apply statistical estimation theory to compute the "optimal" model parameters, and to perform alignment search to compute the best word alignment
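To illustrate the idea of treating the word alignment as a hidden variable, the sketch below trains IBM Model 1, the simplest of the Brown et al. (1993) models, with expectation-maximization, then reads off the most probable link for each word. It is a minimal sketch under stated assumptions, not the refined models of this article: the function names `ibm_model1` and `best_alignment` are hypothetical, a `NULL` token is prepended to every English sentence, and translation probabilities start out uniform.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (IBM Model 1).

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    A "NULL" token is prepended to every English side so that a foreign
    word may be generated by no English word (an assumption of this sketch).
    """
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in pairs:
            es = ["NULL"] + es
            for f in fs:
                # E-step: posterior probability that each e generated f;
                # the alignment itself is the hidden variable.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def best_alignment(fs, es, t):
    """For each foreign word, pick the English word (or NULL) maximizing t(f | e)."""
    es = ["NULL"] + es
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]
```

On a toy corpus of the three pairs *das Haus / the house*, *das Buch / the book* and *ein Buch / a book*, ten iterations are enough for *das* to link to *the* and *Haus* to *house*: EM shifts probability mass toward word pairs that co-occur consistently.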
Measures of Distributional Similarity
- In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics
, 1999
Cited by 173 (2 self)
We study distributional similarity measures for the purpose of improving probability estimation for unseen cooccurrences. Our contributions are three-fold: an empirical comparison of a broad range of measures; a classification of similarity functions based on the information that they incorporate; and the introduction of a novel function that is superior at evaluating potential proxy distributions.
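To make "distributional similarity" concrete: each word is represented by a conditional distribution over its contexts (here, hypothetical P(verb | noun) values), and a similarity function scores how close two such distributions are. The sketch below implements two standard measures, the Jensen-Shannon divergence and the L1 distance. It is not the novel function introduced in the paper, and the toy data and function names are made up for illustration.

```python
import math

def _aligned(p, q):
    """Put two sparse distributions (dicts) on a shared support, filling zeros."""
    keys = set(p) | set(q)
    return {k: p.get(k, 0.0) for k in keys}, {k: q.get(k, 0.0) for k in keys}

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

def jensen_shannon(p, q):
    """Symmetric and always finite: compares both distributions to their average."""
    p, q = _aligned(p, q)
    m = {k: 0.5 * (p[k] + q[k]) for k in p}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def l1_distance(p, q):
    """Sum of absolute differences between the two distributions."""
    p, q = _aligned(p, q)
    return sum(abs(p[k] - q[k]) for k in p)

# Hypothetical toy data: P(verb | noun) for three nouns.
dists = {
    "wine": {"drink": 0.6, "pour": 0.3, "buy": 0.1},
    "beer": {"drink": 0.5, "pour": 0.4, "buy": 0.1},
    "car": {"drive": 0.7, "buy": 0.3},
}
```

A word whose distribution is close to that of the target word can then serve as a proxy when estimating the probability of an unseen cooccurrence; with the toy data, *beer* is a much closer proxy for *wine* than *car* is under both measures.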
Relational Learning Techniques for Natural Language Information Extraction
, 1998
Cited by 73 (4 self)
The recent growth of online information available in the form of natural language documents creates a greater need for computing systems with the ability to process those documents to simplify access to the information. One type of processing appropriate for many tasks is information extraction, a type of text skimming that retrieves specific types of information from text. Although information extraction systems have existed for two decades, these systems have generally been built by hand and contain domain-specific information, making them difficult to port to other domains. A few researchers have begun to apply machine learning to information extraction tasks, but most of this work has involved applying learning to pieces of a much larger system. This paper presents a novel rule representation specific to natural language and a learning system, Rapier, which learns information extraction rules. Rapier takes pairs of documents and filled templates indicating the information to be ext...
Learning Parse and Translation Decisions from Examples with Rich Context
, 1997
Cited by 70 (18 self)
We present a knowledge and context-based system for parsing and translating natural language and evaluate it on sentences from the Wall Street Journal. Applying machine learning techniques, the system uses parse action examples acquired under supervision to generate a deterministic shift-reduce parser in the form of a decision structure. It relies heavily on context, as encoded in features which describe the morphological, syntactic, semantic and other aspects of a given parse state.
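The parser described here can be pictured as a loop that, at every parse state, asks a decision procedure for exactly one action, which is what makes it deterministic. In the sketch below, `toy_decide` is a hypothetical hand-written rule set standing in for the learned decision structure. It looks only at the categories of the top two stack elements, whereas the paper's features describe morphological, syntactic, semantic and other aspects of the parse state; all names here are illustrative.

```python
def category(node):
    """Leaves are (word, tag) pairs; internal nodes are (label, [children])."""
    return node[1] if isinstance(node[1], str) else node[0]

def toy_decide(stack, buffer):
    """Hypothetical stand-in for a learned decision structure."""
    top = [category(n) for n in stack[-2:]]
    if top == ["DET", "N"]:
        return ("REDUCE", "NP", 2)
    if top == ["NP", "V"]:
        return ("REDUCE", "S", 2)
    if buffer:
        return ("SHIFT",)
    if len(stack) == 1:
        return ("DONE",)
    raise ValueError(f"no parse action for stack categories {top}")

def shift_reduce_parse(tokens, decide):
    """Deterministic shift-reduce parsing: one action per parse state, no backtracking."""
    stack, buffer = [], list(tokens)
    while True:
        action = decide(stack, buffer)
        if action[0] == "SHIFT":
            stack.append(buffer.pop(0))
        elif action[0] == "REDUCE":
            _, label, arity = action
            children = stack[-arity:]
            del stack[-arity:]
            stack.append((label, children))
        else:  # "DONE": a single tree remains on the stack
            return stack[0]
```

For the tagged input *the/DET dog/N barks/V*, the loop shifts twice, reduces *the dog* to an NP, shifts *barks*, and reduces NP and V to S. Swapping in a better `decide` function changes the parser's behavior without touching the loop, which mirrors the separation between the parse actions and the learned decisions.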
Finding Terminology Translations From Non-Parallel Corpora
, 1997
Cited by 34 (3 self)
this paper, we present an initial algorithm for translating technical terms using a pair of non-parallel corpora. Evaluation results show translation precision of around 30% when only the top candidate is considered. While this precision is lower than that achieved with parallel corpora, we show that the top 20 candidates output by our algorithm allow translators to increase their accuracy by 50.9%. In the following sections, we first describe the pair of non-parallel corpora we use for experiments, and then we introduce the Word Relation Matrix (WoRM), a statistical word feature representation for technical term translation from non-parallel corpora. We evaluate the effectiveness of this feature with two sets of experiments, using English/English and English/Japanese non-parallel corpora.
A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts
- Proc. of the 35th Annual Meeting of the Association for Computational Linguistics
, 1998
Cited by 33 (3 self)
We present an algorithm for bilingual word alignment that extends previous work by treating multi-word candidates on a par with single words and by combining some simple assumptions about the translation process to capture alignments for low-frequency words. Like most other alignment algorithms, it uses co-occurrence statistics as a basis, but it differs in the assumptions it makes about the translation process. The algorithm has been implemented in a modular system that allows the user to experiment with different combinations and variants of these assumptions. We give performance results from two evaluations, which compare well with results reported in the literature.
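The kind of co-occurrence statistic such aligners build on can be illustrated with the Dice coefficient computed over sentence pairs. This sketch is not the authors' algorithm, whose distinguishing assumptions about the translation process are omitted: the Dice coefficient is one common association score, assumed here for illustration, and `candidates` and `dice_scores` are hypothetical names. Setting `max_len` above 1 lets multi-word candidates be scored on a par with single words.

```python
from collections import Counter
from itertools import product

def candidates(tokens, max_len=1):
    """Contiguous word sequences of length 1..max_len, joined with spaces."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }

def dice_scores(pairs, max_len=1):
    """Dice coefficient 2*c(s,t) / (c(s) + c(t)), counting sentence-level co-occurrence."""
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_cands, t_cands = candidates(src, max_len), candidates(tgt, max_len)
        src_freq.update(s_cands)  # each candidate counted once per sentence
        tgt_freq.update(t_cands)
        joint.update(product(s_cands, t_cands))
    return {(s, t): 2 * c / (src_freq[s] + tgt_freq[t]) for (s, t), c in joint.items()}
```

Dice alone cannot separate low-frequency words well: any two words that each occur once, in the same sentence pair, score 1.0, so many candidates tie. This is one reason an aligner needs further assumptions to recover alignments for rare words.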
Automating knowledge acquisition for machine translation
- AI Magazine
, 1997
Cited by 30 (3 self)
How can we write a computer program to translate an English sentence into Japanese? Anyone who has taken a graduate-level course in Artificial Intelligence knows the answer. First, compute the meaning of the English sentence. That is, convert it into logic or your favorite knowledge
Automatic Construction of Weighted String Similarity Measures
Cited by 28 (1 self)
String similarity metrics are used for several purposes in text processing. One task is the extraction of cognates from bilingual text. In this paper, three approaches to the automatic generation of language-dependent string matching functions are presented.
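A weighted string similarity measure can be built from an edit-distance dynamic program in which the substitution cost depends on the character pair. The sketch below shows that structure only. The paper generates its language-dependent costs automatically, whereas the cheap *c*/*k* substitution here is a hypothetical hand-set value chosen for illustration, and the function names are likewise hypothetical.

```python
def weighted_edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Levenshtein-style dynamic program with a pluggable substitution cost."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,  # delete a[i-1]
                d[i][j - 1] + indel_cost,  # insert b[j-1]
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitute
            )
    return d[n][m]

# Hypothetical hand-set costs standing in for automatically generated ones.
CHEAP_PAIRS = {("c", "k"): 0.2, ("k", "c"): 0.2}

def en_de_cost(x, y):
    """Zero for identical characters, cheap for listed pairs, 1.0 otherwise."""
    if x == y:
        return 0.0
    return CHEAP_PAIRS.get((x, y), 1.0)

def unit_cost(x, y):
    """Plain Levenshtein substitution cost."""
    return 0.0 if x == y else 1.0
```

English *culture* and German *Kultur* (lowercased) are then at distance 1.2 under the weighted costs instead of 2.0 under unit costs, so a cognate extractor using the weighted measure would rank the pair as closer.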
Using text processing techniques to automatically enrich a domain ontology
- In Proceedings of the ACM International Conference on Formal Ontology in Information Systems (FOIS)
, 2001
Cited by 28 (2 self)
Though the utility of domain ontologies is now widely acknowledged in an increasing number of domains, several barriers must be overcome before ontologies become practical and useful tools. A critical issue is the task of identifying, defining, and entering the concept definitions. In the case of large and complex application domains, this task can be lengthy, costly, and controversial (since different people may have different points of view about the same concept). To reduce time and cost (and, sometimes, harsh discussions), it is highly advisable to refer to the documents available in the field when constructing or updating an ontology. In this paper we describe OntoLearn, a text-mining tool devised to improve human productivity during the process of ontology construction.
Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus
- In Proceedings of the eighth CLIN meeting
, 1998
Cited by 24 (4 self)
Within the project Twenty-One, which aims at the effective dissemination of information on ecology and sustainable development, a system is being developed that supports cross-language information retrieval for any of the four languages Dutch, English, French and German. Knowledge of this application domain is needed to enhance existing translation resources for the purpose of lexical disambiguation. This paper describes an algorithm for the automated acquisition of a translation lexicon from a parallel corpus. What is new about the presented algorithm is the statistical language model it uses. Because the algorithm is based on a symmetric translation model, it becomes possible to identify one-to-many and many-to-one relations between words of a language pair. We claim that the presented method has two advantages over previously published algorithms. First, because the translation model is more powerful, the resulting bilingual lexicon will be more accurate. Second, the resulting bilingual lexicon can be used to translate in both directions between a language pair. Different versions of the algorithm were evaluated on the Dutch and English versions of the Agenda 21 corpus, a UN document on the application domain of sustainable development.

