Results 1 - 10 of 14
Improved Statistical Alignment Models
- In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, 2000
Cited by 341 (9 self)
In this paper, we present and compare various single-word based alignment models for statistical machine translation. We discuss the five IBM alignment models, the Hidden-Markov alignment model, smoothing techniques and various modifications.
Improved Alignment Models for Statistical Machine Translation
- University of Maryland, College Park, MD, 1999
Cited by 205 (38 self)
In this paper, we describe improved alignment models for statistical machine translation. The statistical translation approach uses two types of information: a translation model and a language model. The language model used is a bigram or general m-gram model. The translation model is decomposed into a lexical and an alignment model. We describe two different approaches for statistical translation and present experimental results. The first approach is based on dependencies between single words; the second approach explicitly takes shallow phrase structures into account, using two different alignment levels: a phrase-level alignment between phrases and a word-level alignment between single words. We present results using the Verbmobil task (German-English, 6000-word vocabulary), which is a limited-domain spoken-language task. The experimental tests were performed on both the text transcription and the speech recognizer output.
A Maximum Entropy Approach to Adaptive Statistical Language Modeling
- Computer, Speech and Language, 1996
Cited by 201 (11 self)
An adaptive statistical language model is described, which successfully integrates long-distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...
Distributional Part-of-Speech Tagging
- In Proc. of 7th Conference of the European Chapter of the Association for Computational Linguistics, 1995
Cited by 75 (6 self)
This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. Unlike previous work, the algorithm categorizes word tokens in context instead of word types. The algorithm is evaluated on the Brown Corpus.
An Efficient Method for Determining Bilingual Word Classes
Cited by 40 (7 self)
In statistical natural language processing we always face the problem of sparse data. One way to reduce this problem is to group words into equivalence classes, which is a standard method in statistical language modeling. In this paper we describe a method to determine bilingual word classes suitable for statistical machine translation. We develop an optimization criterion based on a maximum-likelihood approach and describe a clustering algorithm. We will show that the usage of the bilingual word classes we get can improve statistical machine translation.
Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies
- Computational Linguistics, 2003
Lattice Based Language Models
- 1997
Cited by 10 (1 self)
This paper introduces lattice based language models, a new language modeling paradigm. These models construct multi-dimensional hierarchies of partitions and select the most promising partitions to generate the estimated distributions. We discuss a specific two-dimensional lattice and propose two primary features to measure the usefulness of each node: the training-set history count and the smoothed entropy of its prediction. Smoothing techniques are reviewed and a generalization of the conventional backoff strategy to multiple dimensions is proposed. Preliminary experimental results on the SWITCHBOARD corpus show a 6.5% perplexity reduction over a word trigram model. Project sponsored by the National Security Agency under Grant No. MDA904-97-10006. The United States Government is authorized to reproduce and distribute reprints notwithstanding any copyright notation hereon. Current address: Dépt. Math., Université Jean Monnet, 23, rue P. Michelon, 42023 S...
Distributional Information and the Acquisition of Linguistic Categories: A Statistical Approach
- In Proceedings of the Fifteenth Annual Conference of the Cognitive Science Society, 1993
Cited by 10 (2 self)
Distributional information, in the form of simple, locally computed statistics of an input corpus, provides a potential means of establishing initial syntactic categories (noun, verb, etc.). Finch and Chater (1991, 1992) clustered words hierarchically, according to the distribution of local contexts in which they appeared in large, written English corpora, obtaining clusters that corresponded well with the standard syntactic categories. Here, a stronger demonstration of their method is provided, using 'real' data, that to which children are exposed during category acquisition, taken from the CHILDES corpus. For 2.5 million words of adult speech, clustering on syntactic and semantic bases was observed, with a high degree of clear differentiation between syntactic categories. For child data, some noun and verb clusters emerged, with some evidence of other categories, but the data set was too small for reliable trends to emerge. Some initial results investigating the possibility of c...
Statistical Classification Methods for Arabic News Articles
- Arabic Natural Language Processing in ACL2001, 2001
Cited by 10 (0 self)
In this paper, we present experimental results on document clustering and classification achieved on the Arabic NEWSWIRE corpus using statistical methods. Arabic is a highly inflecting language. The methods presented here prove to be very robust and reliable without morphological analysis.
Statistical Language Processing based on Self-Organising Word Classification
- 1994
Cited by 4 (2 self)
An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a type of simulated annealing which employs an average class mutual information metric. Resulting classifications are hierarchical, allowing variable class granularity. Words are represented as structural tags: unique n-bit numbers, the most significant bit-patterns of which incorporate class information. Therefore, access to a structural tag immediately provides access to all classification levels for the corresponding word. The classification system has successfully revealed some of the structure of two natural languages, from the phonemic to the semantic level. The system has been favourably compared, directly and indirectly, with other word classification systems. Class based interpolated language models have been constructed to exploit the extra information supplied by structural...
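As a rough illustration of the "average class mutual information" metric mentioned in the abstract above, the sketch below computes the mutual information between the classes of adjacent words from bigram counts. It uses the standard textbook definition and may differ from the paper's exact formulation; the function name, the toy counts, and the class labels are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def average_class_mutual_information(bigram_counts, word_to_class):
    """Mutual information (in bits) between the classes of adjacent words.

    bigram_counts: dict mapping (word1, word2) -> count (hypothetical input format)
    word_to_class: dict mapping word -> class label (hypothetical input format)
    """
    # Collapse word bigram counts into class bigram counts.
    pair_counts = Counter()
    for (w1, w2), n in bigram_counts.items():
        pair_counts[(word_to_class[w1], word_to_class[w2])] += n

    total = sum(pair_counts.values())

    # Marginal counts for the first and second position of each class pair.
    left, right = Counter(), Counter()
    for (c1, c2), n in pair_counts.items():
        left[c1] += n
        right[c2] += n

    # Sum p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) ) over observed pairs.
    mi = 0.0
    for (c1, c2), n in pair_counts.items():
        p_joint = n / total
        ratio = n * total / (left[c1] * right[c2])
        mi += p_joint * math.log2(ratio)
    return mi


# Toy data, invented for illustration only.
toy_bigrams = {
    ("the", "cat"): 2,
    ("the", "dog"): 2,
    ("cat", "sleeps"): 2,
    ("dog", "sleeps"): 2,
}
toy_classes = {"the": "DET", "cat": "N", "dog": "N", "sleeps": "V"}
one_class = {w: "ALL" for w in toy_classes}

print(average_class_mutual_information(toy_bigrams, toy_classes))
print(average_class_mutual_information(toy_bigrams, one_class))
```

A search procedure such as the simulated annealing the abstract describes would try alternative class assignments and prefer those that raise this score; putting every word in one class gives a score of zero, since the class of a word then says nothing about the class of its neighbour.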

