From Baby Steps to Leapfrog: How “Less is More” in unsupervised dependency parsing
- In NAACL-HLT
Cited by 19 (5 self)
We present three approaches for unsupervised grammar induction that are sensitive to data complexity and apply them to Klein and Manning’s Dependency Model with Valence. The first, Baby Steps, bootstraps itself via iterated learning of increasingly longer sentences and requires no initialization. This method substantially exceeds Klein and Manning’s published scores and achieves 39.4% accuracy on Section 23 (all sentences) of the Wall Street Journal corpus. The second, Less is More, uses a low-complexity subset of the available data: sentences up to length 15. Focusing on fewer but simpler examples trades off quantity against ambiguity; it attains 44.1% accuracy, using the standard linguistically-informed prior and batch training, beating state-of-the-art. Leapfrog, our third heuristic, combines Less is More with Baby Steps by mixing their models of shorter sentences, then rapidly ramping up exposure to the full training set, driving up accuracy to 45.0%. These trends generalize to the Brown corpus; awareness of data complexity may improve other parsing models and unsupervised algorithms.
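The two data-selection ideas in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `train` callback, the function names, and the choice to grow the training set cumulatively (all sentences up to the current length cap) are assumptions made for illustration. Only the "increasingly longer sentences", "no initialization", and "length 15" details come from the abstract.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Model = TypeVar("Model")
Sentence = Sequence[str]


def baby_steps_schedule(
    sentences: List[Sentence],
    train: Callable[[Optional[Model], List[Sentence]], Model],
    max_len: int,
) -> Optional[Model]:
    """Train on sentences of length <= 1, then <= 2, ..., up to max_len.

    `train` is a hypothetical callback: it receives the previous model
    (None at the first step, so no hand-built initializer is needed) and
    the current training subset, and returns the updated model.
    """
    model: Optional[Model] = None
    for cap in range(1, max_len + 1):
        # Assumption: each step sees every sentence up to the current cap.
        subset = [s for s in sentences if len(s) <= cap]
        if not subset:
            continue  # the corpus has no sentences this short
        model = train(model, subset)
    return model


def less_is_more_subset(
    sentences: List[Sentence], cutoff: int = 15
) -> List[Sentence]:
    """Keep only the low-complexity subset: sentences up to `cutoff` tokens."""
    return [s for s in sentences if len(s) <= cutoff]
```

Here `train` would wrap whatever estimation procedure is in use; the sketch fixes only the order in which data is presented, not the model or the learning algorithm.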
Efficient Parsing for Transducer Grammars
Cited by 7 (1 self)
The tree-transducer grammars that arise in current syntactic machine translation systems are large, flat, and highly lexicalized. We address the problem of parsing efficiently with such grammars in three ways. First, we present a pair of grammar transformations that admit an efficient cubic-time CKY-style parsing algorithm despite leaving most of the grammar in n-ary form. Second, we show how the number of intermediate symbols generated by this transformation can be substantially reduced through binarization choices. Finally, we describe a two-pass coarse-to-fine parsing approach that prunes the search space using predictions from a subset of the original grammar. In all, parsing time reduces by 81%. We also describe a coarse-to-fine pruning scheme for forest-based language model reranking that allows a 100-fold increase in beam size while reducing decoding time. The resulting translations improve by 1.3 BLEU.
Consensus training for consensus decoding in machine translation
- In EMNLP, 2009
Cited by 6 (0 self)
We propose a novel objective function for discriminatively tuning log-linear machine translation models. Our objective explicitly optimizes the BLEU score of expected n-gram counts, the same quantities that arise in forest-based consensus and minimum Bayes risk decoding methods. Our continuous objective can be optimized using simple gradient ascent. However, computing critical quantities in the gradient necessitates a novel dynamic program, which we also present here. Assuming BLEU as an evaluation measure, our objective function has two principal advantages over standard max BLEU tuning. First, it specifically optimizes model weights for downstream consensus decoding procedures. An unexpected second benefit is that it reduces overfitting, which can improve test set BLEU scores when using standard Viterbi decoding.
Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
Cited by 5 (2 self)
We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.
Baby Steps: How “Less is More” in unsupervised dependency parsing
- In NIPS: Grammar Induction, Representation of Language and Language Learning, 2009
Cited by 4 (4 self)
We present an empirical study of two very simple approaches to unsupervised grammar induction. Both are based on Klein and Manning’s Dependency Model with Valence. The first, Baby Steps, requires no initialization and bootstraps itself via iterated learning of increasingly longer sentences. This method substantially exceeds Klein and Manning’s published numbers and achieves 39.4% accuracy on Section 23 of the Wall Street Journal corpus, a result that is already competitive with the recent state-of-the-art. The second, Less is More, is based on the observation that there is sometimes a trade-off between the quantity and complexity of training data. Using the standard linguistically-informed prior but training at the “sweet spot” of sentences up to length 15, it attains 44.1% accuracy, beating state-of-the-art. Both results generalize to the Brown corpus and shed light on opportunities in the present state of unsupervised dependency parsing.
Quadratic-Time Dependency Parsing for Machine Translation
Cited by 3 (1 self)
Efficiency is a prime concern in syntactic MT decoding, yet significant developments in statistical parsing with respect to asymptotic efficiency haven’t yet been explored in MT. Recently, McDonald et al. (2005b) formalized dependency parsing as a maximum spanning tree (MST) problem, which can be solved in quadratic time relative to the length of the sentence. They show that MST parsing is almost as accurate as cubic-time dependency parsing in the case of English, and that it is more accurate with free word order languages. This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores. Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
Cube pruning as heuristic search
- In Proceedings of EMNLP, 2009
Cited by 2 (1 self)
Cube pruning is a fast inexact method for generating the items of a beam decoder. In this paper, we show that cube pruning is essentially equivalent to A* search on a specific search space with specific heuristics. We use this insight to develop faster and exact variants of cube pruning.
Simple, Accurate Parsing with an All-Fragments Grammar
Cited by 1 (0 self)
We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
Grammar based statistical MT on Hadoop
, 2009
"... An end-to-end toolkit for large scale PSCFG based MT ..."

