Statistical Machine Translation
- Final Report, JHU Summer Workshop, 1999
Cited by 67 (9 self)
Automatic translation from one human language to another using computers, better known as machine translation (MT), is a longstanding goal of computer science. In order to be able to perform such a task, the computer must "know" the two languages: synonyms for words and phrases, grammars of the two languages, and semantic or world knowledge. One way to incorporate such knowledge into a computer is to use bilingual experts to hand-craft the necessary information into the computer program. Another is to let the computer learn some of these things automatically by examining large amounts of parallel text: documents which are translations of each other. The Canadian government produces one such resource, for example, in the form of parliamentary proceedings which are recorded in both English and French. Recently, statistical data analysis has been used to gather MT knowledge automatically from parallel bilingual text. Unfortunately, these techniques and tools have not been dissem...
Statistical Parsing Algorithms for Lexicalized Tree Adjoining Grammars
Cited by 1 (0 self)
The goal of this dissertation is two-fold: to develop the theory of probabilistic Tree Adjoining Grammars (TAGs) and to present some practical results in the form of efficient parsing and estimation algorithms for probabilistic TAGs. The overall goal of developing the theory of probabilistic TAGs is to provide a simple, mathematically and linguistically well-formed probabilistic framework for statistical parsing. The practical results in parsing and estimation of probabilistic TAGs are developed with a view towards an increasingly unsupervised approach to the training of statistical parsers and language models. In particular, this proposal contains the following results: (1) an algorithm for determining deficiency in a generative model for probabilistic TAGs; (2) a novel chart-based head-corner parsing algorithm for probabilistic TAGs; (3) a probability model for statistical parsing and a co-training method for training this parser that combines labeled and unlabeled data; (4) an algorithm for computing prefix probabilities, which can be used to predict the word most likely to occur after an initial substring of the input. The proposed work can be summarized in the following points: (1) a separate evaluation of the co-training algorithm on a larger set of labeled and unlabeled data, in addition to the evaluation presented in this proposal; (2) an evaluation of the prefix probability algorithm by comparing it with a trigram language model; (3) an extension of techniques in learning subcategorization information and verb classes to produce TAG lexicons which can be directly used to improve performance of the co-training algorithm.

