Results 1–10 of 67
Minimum Error Rate Training in Statistical Machine Translation, 2003
Abstract
Cited by 452 (5 self)
Often, the training procedure for statistical machine translation models is based on maximum likelihood or related criteria. A general problem of this approach is that there is only a loose relation to the final translation quality on unseen text. In this paper, we analyze various training criteria which directly optimize translation quality.
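The core idea, optimizing model weights directly against a translation error metric rather than likelihood, can be sketched in a few lines. Everything below is invented toy data: each source sentence has candidate translations carrying a feature vector and a sentence-level error count, and a simple grid search picks the interpolation weight that minimizes corpus error. Real MERT uses an exact line search over n-best lists; this is only a minimal stand-in.

```python
# Toy sketch in the spirit of minimum error rate training (MERT).
# candidates[s] = list of (features, errors); features = (log_lm, log_tm).
candidates = [
    [((-2.0, -1.0), 3), ((-1.0, -2.5), 1), ((-3.0, -0.5), 5)],
    [((-1.5, -2.0), 0), ((-0.5, -5.0), 2)],
]

def corpus_error(w_lm, w_tm):
    """Decode each sentence with the given weights and sum the errors."""
    total = 0
    for cands in candidates:
        best = max(cands, key=lambda c: w_lm * c[0][0] + w_tm * c[0][1])
        total += best[1]
    return total

def tune_w_lm(w_tm=1.0, grid=None):
    """Pick the w_lm value on a grid that minimizes corpus-level error."""
    grid = grid or [i / 10 for i in range(1, 31)]
    return min(grid, key=lambda w: corpus_error(w, w_tm))

best_w = tune_w_lm()
print(best_w, corpus_error(best_w, 1.0))  # -> 1.6 1
```

Note that likelihood never appears: the objective is the downstream error count itself, which is exactly the mismatch the paper addresses.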
Learning Accurate, Compact, and Interpretable Tree Annotation. In ACL ’06, 2006
Abstract
Cited by 283 (36 self)
We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various categories to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.
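The "split" half of this split-merge cycle can be illustrated with a toy grammar (the rules and probabilities below are invented). Each nonterminal is replaced by two latent subsymbols, and each rule's probability is copied with a small random perturbation so that EM can later break the symmetry. Only left-hand sides are split here; a full implementation would also split right-hand-side occurrences and then merge back the splits that do not pay off in likelihood.

```python
import random

random.seed(0)

# Toy rule table: lhs -> {rhs tuple: probability}.
rules = {"NP": {("DT", "NN"): 0.7, ("NN",): 0.3}}

def split_nonterminal(rules, epsilon=0.01):
    """Split each lhs into two subsymbols with slightly perturbed rules."""
    new_rules = {}
    for lhs, prods in rules.items():
        for sub in (f"{lhs}-0", f"{lhs}-1"):
            noisy = {rhs: p * (1 + random.uniform(-epsilon, epsilon))
                     for rhs, p in prods.items()}
            z = sum(noisy.values())          # renormalize after perturbing
            new_rules[sub] = {rhs: p / z for rhs, p in noisy.items()}
    return new_rules

split = split_nonterminal(rules)
for prods in split.values():
    assert abs(sum(prods.values()) - 1.0) < 1e-9
print(sorted(split))  # -> ['NP-0', 'NP-1']
```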
Fast exact inference with a factored model for natural language parsing. In NIPS, Volume 15, 2003
Abstract
Cited by 200 (7 self)
We present a novel generative model for natural language tree structures in which semantic (lexical dependency) and syntactic (PCFG) structures are scored with separate models. This factorization provides conceptual simplicity, straightforward opportunities for separately improving the component models, and a level of performance comparable to similar, non-factored models. Most importantly, unlike other modern parsing models, the factored model admits an extremely effective A* parsing algorithm, which enables efficient, exact inference.
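The exactness guarantee rests on the standard A* property: items are expanded in order of (cost so far + an admissible estimate of cost to go), so the first time the goal is popped its cost is exact. The toy graph and heuristic below are invented; in the parser, the heuristic role is played by an admissible estimate of a chart item's outside score.

```python
import heapq

# Toy weighted graph and an admissible (never overestimating) heuristic to G.
graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 5)], "B": [("G", 1)], "G": []}
h = {"S": 3, "A": 4, "B": 1, "G": 0}

def a_star(start, goal):
    """Best-first search ordered by cost-so-far plus heuristic."""
    frontier = [(h[start], 0, start)]
    seen = set()
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost                # exact, by admissibility
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in graph[node]:
            heapq.heappush(frontier, (cost + w + h[nbr], cost + w, nbr))
    return None

print(a_star("S", "G"))  # -> 5, the true shortest path S -> B -> G
```

The same mechanism lets the parser ignore most of the chart: items whose optimistic total score cannot beat the best goal are simply never expanded.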
Improved Inference for Unlexicalized Parsing, 2007
Abstract
Cited by 180 (21 self)
We present several improvements to unlexicalized parsing with hierarchically state-split PCFGs. First, we present a novel coarse-to-fine method in which a grammar’s own hierarchical projections are used for incremental pruning, including a method for efficiently computing projections of a grammar without a treebank. In our experiments, hierarchical pruning greatly accelerates parsing with no loss in empirical accuracy. Second, we compare various inference procedures for state-split PCFGs from the standpoint of risk minimization, paying particular attention to their practical tradeoffs. Finally, we present multilingual experiments which show that parsing with hierarchical state-splitting is fast and accurate in multiple languages and domains, even without any language-specific tuning.
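The pruning idea can be sketched concretely: a coarse grammar's posterior for each (span, coarse-symbol) pair decides which refined symbols the fine pass may even consider. The symbol projection, spans, and posterior values below are all invented placeholders for a real coarse-pass chart.

```python
# Projection from fine state-split symbols to their coarse parent symbol.
project = {"NP-0": "NP", "NP-1": "NP", "VP-0": "VP", "VP-1": "VP"}

# Pretend coarse-pass posteriors: P(symbol spans (i, j) | sentence).
coarse_posterior = {((0, 2), "NP"): 0.9, ((0, 2), "VP"): 1e-7,
                    ((2, 5), "VP"): 0.8, ((2, 5), "NP"): 1e-9}

def allowed_fine_items(spans, fine_symbols, threshold=1e-4):
    """Keep a fine item only if its coarse projection survives the threshold."""
    kept = []
    for span in spans:
        for sym in fine_symbols:
            if coarse_posterior.get((span, project[sym]), 0.0) > threshold:
                kept.append((span, sym))
    return kept

items = allowed_fine_items([(0, 2), (2, 5)], ["NP-0", "NP-1", "VP-0", "VP-1"])
print(items)
```

Because the fine symbols are refinements of the coarse ones, a low coarse posterior safely rules out all of its refinements at once, which is what makes hierarchical pruning so effective.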
Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Meeting of the ACL, 2004
Abstract
Cited by 165 (21 self)
This paper describes and evaluates log-linear parsing models for Combinatory Categorial Grammar (CCG). A parallel implementation of the L-BFGS optimisation algorithm is described, which runs on a Beowulf cluster allowing the complete Penn Treebank to be used for estimation. We also develop a new efficient parsing algorithm for CCG which maximises expected recall of dependencies. We compare models which use all CCG derivations, including non-standard derivations, with normal-form models. The performances of the two models are comparable and the results are competitive with existing wide-coverage CCG parsers.
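The objective being maximized here is conditional log-likelihood under a log-linear (maximum entropy) model. The paper estimates this with a parallel L-BFGS implementation over packed charts; the toy version below, with invented binary features and two classes, uses plain gradient ascent purely to illustrate the objective and its gradient (empirical feature counts minus expected feature counts).

```python
import math

data = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 1)]  # (features, label)
w = [[0.0, 0.0], [0.0, 0.0]]                                 # one weight row per class

def log_likelihood(w):
    """Conditional log-likelihood of the toy data under the model."""
    ll = 0.0
    for x, y in data:
        scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
        z = math.log(sum(math.exp(s) for s in scores))
        ll += scores[y] - z
    return ll

def train(w, lr=0.1, steps=500):
    """Gradient ascent: empirical minus expected feature counts."""
    for _ in range(steps):
        grad = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in data:
            scores = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
            z = sum(math.exp(s) for s in scores)
            for c in range(2):
                p = math.exp(scores[c]) / z
                for j in range(2):
                    grad[c][j] += ((c == y) - p) * x[j]
        for c in range(2):
            for j in range(2):
                w[c][j] += lr * grad[c][j]
    return w

before = log_likelihood(w)
w = train(w)
print(before < log_likelihood(w))  # -> True: training raises the objective
```

The full-scale problem has the same shape, except that the expected counts are computed by dynamic programming over all derivations in a packed chart rather than by the explicit sum over classes used here.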
Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 2007
Abstract
Cited by 149 (34 self)
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Minimum Bayes-risk decoding for statistical machine translation. In Proceedings of HLT-NAACL, 2004
Abstract
Cited by 116 (13 self)
We present Minimum Bayes-Risk (MBR) decoding for statistical machine translation. This statistical approach aims to minimize expected loss of translation errors under loss functions that measure translation performance. We describe a hierarchy of loss functions that incorporate different levels of linguistic information from word strings, word-to-word alignments from an MT system, and syntactic structure from parse trees of source and target language sentences. We report the performance of the MBR decoders on a Chinese-to-English translation task. Our results show that MBR decoding can be used to tune statistical MT performance for specific loss functions.
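MBR decoding over an n-best list can be sketched compactly: instead of the single most probable hypothesis, pick the one with the lowest expected loss under the model's posterior. The hypotheses, posteriors, and the crude unigram-overlap loss below are invented stand-ins for real translation hypotheses and the paper's hierarchy of linguistically informed loss functions.

```python
# Invented n-best list: (hypothesis, posterior probability).
nbest = [("the dog ran", 0.4), ("a cat sat", 0.3), ("the cat sat", 0.3)]

def loss(hyp, ref):
    """1 - unigram overlap: a crude stand-in for 1 - BLEU."""
    h, r = set(hyp.split()), set(ref.split())
    return 1.0 - len(h & r) / max(len(h | r), 1)

def mbr_decode(nbest):
    """Return the hypothesis minimizing expected loss against the others."""
    return min(nbest, key=lambda he: sum(p * loss(he[0], e) for e, p in nbest))[0]

# MBR picks "the cat sat" even though "the dog ran" is the MAP hypothesis:
# the two cat hypotheses back each other up under the loss.
print(mbr_decode(nbest))  # -> the cat sat
```

This consensus effect is exactly why MBR can differ from, and improve on, maximum-probability decoding for a given evaluation metric.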
Statistical Techniques for Natural Language Parsing. AI Magazine, 1997
Abstract
Cited by 89 (1 self)
We review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques and also serves as a good warm-up for the main topic, statistical parsing. Here we consider both the simplified case in which the input string is viewed as a string of parts of speech, and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally we anticipate future research directions.
1 Introduction
Syntactic parsing is the process of assigning a "phrase marker" to a sentence, that is, the process that, given a sentence like "The dog ate," produces a structure like that in Figure 1. In this example we adopt the standard abbreviations: np for "noun phrase," vp for "verb phrase," and det for "determiner." It is generally accepted that finding the sort of structure shown in Figure 1 is useful in determining the m...
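The part-of-speech "warm-up" problem the survey describes is classically solved with an HMM and Viterbi decoding: find the tag sequence maximizing the product of transition and emission probabilities. The tag set, probabilities, and example sentence below are invented for illustration.

```python
# Toy HMM: transition P(tag | previous tag) and emission P(word | tag).
trans = {("<s>", "det"): 0.6, ("<s>", "noun"): 0.3, ("<s>", "verb"): 0.1,
         ("det", "noun"): 0.9, ("det", "verb"): 0.1,
         ("noun", "verb"): 0.7, ("noun", "noun"): 0.3,
         ("verb", "det"): 0.5, ("verb", "noun"): 0.5}
emit = {("det", "the"): 0.9, ("noun", "dog"): 0.8, ("verb", "ate"): 0.7,
        ("noun", "ate"): 0.1}
tags = ["det", "noun", "verb"]

def viterbi(words):
    """Return the highest-probability tag sequence for the sentence."""
    best = {"<s>": (1.0, [])}          # tag -> (best prob, best path)
    for w in words:
        nxt = {}
        for tag in tags:
            cands = [(p * trans.get((prev, tag), 0.0) * emit.get((tag, w), 0.0),
                      path + [tag]) for prev, (p, path) in best.items()]
            nxt[tag] = max(cands)
        best = nxt
    return max(best.values())[1]

print(viterbi(["the", "dog", "ate"]))  # -> ['det', 'noun', 'verb']
```

Note that "ate" is ambiguous between noun and verb in the toy lexicon; the transition probabilities resolve it, which is the essence of the statistical approach.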
Parsing Inside-Out, 1998
Abstract
Cited by 82 (2 self)
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given nonterminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. T...
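The inside probability itself is easy to make concrete with a CKY-style chart over a toy PCFG in Chomsky normal form (the grammar below is invented). chart[(i, j, X)] accumulates the total probability that X derives words i..j-1; summing over split points rather than maximizing is precisely what distinguishes the inside score from the Viterbi score, and swapping that sum for a max is one instance of the thesis's semiring generalization.

```python
from collections import defaultdict

# Toy CNF grammar: binary rules and lexical (preterminal) rules.
binary = {("S", ("NP", "VP")): 1.0, ("NP", ("DT", "NN")): 1.0}
lexical = {("DT", "the"): 1.0, ("NN", "dog"): 0.6, ("NN", "cat"): 0.4,
           ("VP", "ate"): 1.0}

def inside(words):
    """Fill the inside chart bottom-up over increasing span widths."""
    n = len(words)
    chart = defaultdict(float)          # (i, j, symbol) -> inside probability
    for i, w in enumerate(words):
        for (sym, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1, sym)] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for (lhs, (b, c)), p in binary.items():
                for k in range(i + 1, j):   # sum over all split points
                    chart[(i, j, lhs)] += p * chart[(i, k, b)] * chart[(k, j, c)]
    return chart

chart = inside(["the", "dog", "ate"])
print(chart[(0, 3, "S")])  # probability of the only parse: 0.6
```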
Probabilistic CFG with Latent Annotations, 2005
Abstract
Cited by 68 (1 self)
This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which nonterminal symbols are augmented with latent variables. Fine-grained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using an EM algorithm. Because exact parsing with a PCFG-LA is NP-hard, several approximations are described and empirically compared. In experiments using the Penn WSJ corpus, our automatically trained model gave a performance of 86.6% (F1, sentences ≤ 40 words), which is comparable to that of an unlexicalized PCFG parser created using extensive manual feature selection.