Automatic discovery of non-compositional compounds in parallel data (1997)

by I D Melamed
Venue: University
Results 1 - 10 of 44

The Web as a Parallel Corpus

by Philip Resnik, Noah A. Smith - Computational Linguistics, 2003
Cited by 101 (3 self)
Parallel corpora have become an essential resource for work in multilingual natural language processing. In this report, we describe our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale.

Decoding Complexity in Word-Replacement Translation Models

by Kevin Knight - Computational Linguistics, 1999
Cited by 84 (4 self)
This paper looks at decoding complexity.

Statistical Machine Translation

by Yaser Al-Onaizan, Jan Curin, Michael Jahr, Kevin Knight, John Lafferty, Dan Melamed, Franz-Josef Och, David Purdy, Noah A. Smith, David Yarowsky - Final Report, JHU Summer Workshop, 1999
Cited by 67 (9 self)
Automatic translation from one human language to another using computers, better known as machine translation (MT), is a longstanding goal of computer science. In order to be able to perform such a task, the computer must "know" the two languages: synonyms for words and phrases, grammars of the two languages, and semantic or world knowledge. One way to incorporate such knowledge into a computer is to use bilingual experts to hand-craft the necessary information into the computer program. Another is to let the computer learn some of these things automatically by examining large amounts of parallel text: documents which are translations of each other. The Canadian government produces one such resource, for example, in the form of parliamentary proceedings which are recorded in both English and French. Recently, statistical data analysis has been used to gather MT knowledge automatically from parallel bilingual text. Unfortunately, these techniques and tools have not been dissem...

An Empirical Model of Multiword Expression Decomposability

by Timothy Baldwin, Colin Bannard, Takaaki Tanaka, Dominic Widdows - In Proceedings of the ACL-SIGLEX Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, 2003
Cited by 65 (9 self)
This paper presents a construction-inspecific model of multiword expression decomposability based on latent semantic analysis. We use latent semantic analysis to determine the similarity between a multiword expression and its constituent words, and claim that higher similarities indicate greater decomposability. We test the model over English noun-noun compounds and verb-particles, and evaluate its correlation with similarities and hyponymy values in WordNet. Based on mean hyponymy over partitions of data ranked on similarity, we furnish evidence for the calculated similarities being correlated with the semantic relational content of WordNet.

A Statistical Approach to the Semantics of Verb-Particles

by Colin Bannard, Timothy Baldwin, Alex Lascarides - In Proceedings of the ACL-SIGLEX Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, 2003
Cited by 37 (4 self)
This paper describes a distributional approach to the semantics of verb-particle constructions (e.g. put up, make off). We report first on a framework for implementing and evaluating such models. We then go on to report on the implementation of some techniques for using statistical models acquired from corpus data to infer the meaning of verb-particle constructions.

A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts

by Lars Ahrenberg, Mikael Andersson, Magnus Merkel - Proc. of the 35th Annual Meeting of the Association for Computational Linguistics, 1998
Cited by 33 (3 self)
We present an algorithm for bilingual word alignment that extends previous work by treating multi-word candidates on a par with single words, and combining some simple assumptions about the translation process to capture alignments for low frequency words. Like most other alignment algorithms, it uses co-occurrence statistics as a basis, but differs in the assumptions it makes about the translation process. The algorithm has been implemented in a modular system that allows the user to experiment with different combinations and variants of these assumptions. We give performance results from two evaluations, which compare well with results reported in the literature.

Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text

by Philip Resnik - In Third Conference of the Association for Machine Translation in the Americas, 1998
Cited by 29 (2 self)
Parallel corpora are a valuable resource for machine translation, but at present their availability and utility are limited by genre- and domain-specificity, licensing restrictions, and the basic difficulty of locating parallel texts in all but the most dominant of the world's languages. A parallel corpus resource not yet explored is the World Wide Web, which hosts an abundance of pages in parallel translation, offering a potential solution to some of these problems and unique opportunities of its own. This paper presents the necessary first step in that exploration: a method for automatically finding parallel translated documents on the Web. The technique is conceptually simple, fully language independent, and scalable, and preliminary evaluation results indicate that the method may be accurate enough to apply without human intervention. 1 Introduction In recent years large parallel corpora have taken on an important role as resources in machine translation and multilingual natural la...

Multilingual domain modeling in Twenty-One: automatic creation of a bi-directional translation lexicon from a parallel corpus

by Djoerd Hiemstra - In Proceedings of eighth CLIN meeting, 1998
Cited by 24 (4 self)
Within the project Twenty-One, which aims at effective dissemination of information on ecology and sustainable development, a system is developed that supports cross-language information retrieval for any of the four languages Dutch, English, French and German. Knowledge of this application domain is needed to enhance existing translation resources for the purpose of lexical disambiguation. This paper describes an algorithm for the automated acquisition of a translation lexicon from a parallel corpus. What is new about the presented algorithm is the statistical language model used. Because the algorithm is based on a symmetric translation model it becomes possible to identify one-to-many and many-to-one relations between words of a language pair. We claim that the presented method has two advantages over algorithms that have been published before. Firstly, because the translation model is more powerful, the resulting bilingual lexicon will be more accurate. Secondly, the resulting bilingual lexicon can be used to translate in both directions between a language pair. Different versions of the algorithm were evaluated on the Dutch and English versions of the Agenda 21 corpus, which is a UN document on the application domain of sustainable development.

Creating a Parallel Corpus from the "Book of 2000 Tongues"

by Philip Resnik, Mari Broman Olsen, Mona Diab - In Proceedings of the Text Encoding Initiative Tenth Anniversary User Conference, Brown University, 1998
Cited by 16 (0 self)
This paper reports on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a wider number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, and intended uses for the corpus, and we point out relevant aspects and potential limitations of the current draft of the Corpus Encoding Standard with respect to this corpus. 2 Why this text? 2.1 The nature of the text The Bible is a widely available, representative sample of carefully translated texts in a variety of styles in a wide range of languages. These pr...

Combining Clues for Word Alignment

by Jörg Tiedemann - In Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 12–17 April 2003, Budapest. Programme chairs: Copestake A, Hajic J, 2003
Cited by 16 (0 self)
In this paper, a word alignment approach is presented which is based on a combination of clues. Word alignment clues indicate associations between words and phrases. They can be based on features such as frequency, part-of-speech, phrase type, and the actual wordform strings. Clues can be found by calculating similarity measures or learned from word aligned data. The clue alignment approach...