Results 1 – 9 of 9
Type-Based MCMC
Abstract

Cited by 10 (0 self)
Most existing algorithms for learning latent-variable models (such as EM and existing Gibbs samplers) are token-based, meaning that they update the variables associated with one sentence at a time. The incremental nature of these methods makes them susceptible to local optima and slow mixing. In this paper, we introduce a type-based sampler, which updates a block of variables, identified by a type, that spans multiple sentences. We show improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.
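The contrast between token-based and type-based updates can be sketched in toy form. This is not the paper's actual sampler: the sampling callbacks and the data layout are illustrative, and a real type-based sampler would resample the joint configuration of a block rather than assign one shared value.

```python
from collections import defaultdict

def token_based_sweep(tokens, tags, sample_tag):
    # Token-based: resample one token's tag at a time,
    # holding every other assignment fixed.
    for i in range(len(tokens)):
        tags[i] = sample_tag(tokens[i], tags, i)

def type_based_sweep(tokens, tags, sample_tag_for_type):
    # Type-based: group token positions by word type, then move each
    # whole block of same-type tokens in a single joint update.
    positions = defaultdict(list)
    for i, w in enumerate(tokens):
        positions[w].append(i)
    for w, idxs in positions.items():
        new_tag = sample_tag_for_type(w, tags, idxs)
        for i in idxs:
            tags[i] = new_tag
```

The point of the block update is that all occurrences of a type can jump together, escaping configurations where changing any single token alone looks unfavorable.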
Why Synchronous Tree Substitution Grammars?
Abstract

Cited by 9 (6 self)
Synchronous tree substitution grammars are a translation model used in syntax-based machine translation. They are investigated in a formal setting and compared to a competitor that is at least as expressive. The competitor is the extended multi bottom-up tree transducer, which is the bottom-up analogue with one essential additional feature. This model has been investigated in theoretical computer science, but seems widely unknown in natural language processing. The two models are compared with respect to standard algorithms (binarization, regular restriction, composition, application). Particular attention is paid to the complexity of the algorithms.
SCFG Decoding Without Binarization
Abstract

Cited by 5 (0 self)
Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic-time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of Zhang et al. (2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. DeNero et al. (2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic-time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope-3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than with synchronous binarization.
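A minimal sketch of the scope computation, assuming the definition commonly used in this line of work: the scope of a rule's source side is the number of boundary nonterminals plus the number of adjacent nonterminal pairs, and a rule of scope s can be applied in O(n^s) ways. The function name and the symbol encoding are illustrative, not taken from the paper.

```python
def scope(source, is_nonterminal):
    # Count boundary nonterminals plus adjacent nonterminal pairs
    # in the source side of an SCFG rule.
    s = 0
    if source and is_nonterminal(source[0]):
        s += 1  # nonterminal at the left edge
    if source and is_nonterminal(source[-1]):
        s += 1  # nonterminal at the right edge
    for a, b in zip(source, source[1:]):
        if is_nonterminal(a) and is_nonterminal(b):
            s += 1  # adjacent nonterminal pair
    return s
```

Under this definition, pruning a grammar to rules of scope at most 3 keeps decoding cubic without any binarization step.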
Cube pruning as heuristic search
 In Proceedings of EMNLP, 2009
Abstract

Cited by 4 (1 self)
Cube pruning is a fast, inexact method for generating the items of a beam decoder. In this paper, we show that cube pruning is essentially equivalent to A* search on a specific search space with specific heuristics. We use this insight to develop faster and exact variants of cube pruning.
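The lazy best-first enumeration at the core of cube pruning can be sketched for the simplest case: combining two cost-sorted lists (a two-dimensional "cube"), expanding the grid frontier with a heap. With additive, monotonic costs as below, this frontier search is exact; the inexactness the abstract refers to arises in real decoders when non-monotonic scores such as language-model costs are folded in. Names and the cost model are illustrative.

```python
import heapq

def cube_top_k(a, b, k, combine=lambda x, y: x + y):
    # Lazily enumerate the k lowest-cost combinations of two
    # cost-sorted lists, popping grid cells best-first from a heap.
    if not a or not b:
        return []
    heap = [(combine(a[0], b[0]), 0, 0)]
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        cost, i, j = heapq.heappop(heap)
        out.append(cost)
        # Push the two neighbors of the popped cell onto the frontier.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (combine(a[ni], b[nj]), ni, nj))
    return out
```

Viewing the frontier expansion as A* with a heuristic on unexplored cells is the reading the paper exploits.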
Tree Parsing with Synchronous Tree-Adjoining Grammars
, 2011
Abstract

Cited by 2 (1 self)
Restricting the input or the output of a grammar-induced translation to a given set of trees plays an important role in statistical machine translation. The problem for practical systems is to find a compact (and, in particular, finite) representation of said restriction. For the class of synchronous tree-adjoining grammars, partial solutions to this problem have been described, some restricted to the unweighted case, some to the monolingual case. We introduce a formulation of this class of grammars that is effectively closed under input and output restrictions to regular tree languages, i.e., the restricted translations can again be represented by grammars. Moreover, we present an algorithm, inspired by Earley’s algorithm, that constructs these grammars for input and output restriction.
Weight pushing and binarization for fixed-grammar parsing
Abstract

Cited by 1 (0 self)
We apply the idea of weight pushing (Mohri, 1997) to CKY parsing with fixed context-free grammars. Applied after rule binarization, weight pushing takes the weight from the original grammar rule and pushes it down across its binarized pieces, allowing the parser to make better pruning decisions earlier in the parsing process. This process can be viewed as generalizing weight pushing from transducers to hypergraphs. We examine its effect on parsing efficiency with various binarization schemes applied to tree substitution grammars from previous work. We find that weight pushing produces dramatic improvements in efficiency, especially with small amounts of time and with large grammars.
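A much-simplified sketch of the idea, assuming left-branching binarization of a single weighted rule in the max/tropical sense: the whole rule weight is placed on the first binarized piece created, so a pruned parser sees the rule's true cost as early as possible, while later pieces carry the multiplicative identity. Real implementations share virtual nonterminals across rules and push aggregate (e.g., maximum) weights rather than treating each rule independently; all names here are illustrative.

```python
def binarize_with_pushing(lhs, rhs, weight):
    # Left-branching binarization of one weighted CFG rule
    # (assumes len(rhs) >= 2). Virtual nonterminals are named
    # "<lhs>^<i>" purely for illustration.
    rules = []
    prev = rhs[0]
    for i, sym in enumerate(rhs[1:], start=2):
        new_lhs = lhs if i == len(rhs) else f"{lhs}^{i}"
        w = weight if i == 2 else 1.0  # push all weight onto the first piece
        rules.append((new_lhs, (prev, sym), w))
        prev = new_lhs
    return rules
```

Without pushing, the full weight would sit on the last piece, so a beam pruner could keep cheap-looking virtual items that belong to expensive rules.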
Using Categorial Grammar to Label Translation Rules
Abstract
Adding syntactic labels to synchronous context-free translation rules can improve performance, but labeling with phrase-structure constituents, as in GHKM (Galley et al., 2004), excludes potentially useful translation rules. SAMT (Zollmann and Venugopal, 2006) introduces heuristics to create new non-constituent labels, but these heuristics introduce many complex labels and tend to add rarely applicable rules to the translation grammar. We introduce a labeling scheme based on categorial grammar, which allows syntactic labeling of many rules with a minimal, well-motivated label set. We show that our labeling scheme performs comparably to SAMT on an Urdu–English translation task, yet the label set is an order of magnitude smaller, and translation is twice as fast.
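The flavor of categorial labeling can be sketched with slash categories: a span that covers only some of a constituent's children gets the parent label annotated with what it still needs on each side (A/B for "an A missing a B to the right", A\B for "missing a B to the left"). This is an illustrative reconstruction, not the paper's exact scheme, and the function name is invented.

```python
def categorial_label(parent, children, i, j):
    # The span covers children[i:j] of a node labeled `parent`.
    label = parent
    for c in reversed(children[j:]):   # material still needed on the right
        label = f"{label}/{c}"
    for c in children[:i]:             # material still needed on the left
        label = f"{label}\\{c}"
    return label
```

Because labels are built compositionally from a small atomic inventory, many non-constituent spans receive a label without inventing a new symbol per span shape.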
An Extended GHKM Algorithm for Inducing λ-SCFG
, 2013
Abstract
Semantic parsing, which aims at mapping a natural language (NL) sentence into its formal meaning representation (e.g., a logical form), has received increasing attention in recent years. While synchronous context-free grammar (SCFG) augmented with lambda calculus (λ-SCFG) provides an effective mechanism for semantic parsing, learning such λ-SCFG rules remains a challenge because of the difficulty of determining the correspondence between NL sentences and logical forms. To alleviate this structural divergence problem, we extend the GHKM algorithm, a state-of-the-art algorithm for learning synchronous grammars in statistical machine translation, to induce λ-SCFG from pairs of NL sentences and logical forms. By treating logical forms as trees, we reformulate the theory behind GHKM to give formal semantics to the alignment between NL words and logical form tokens. Experiments on the GEOQUERY dataset show that our semantic parser achieves an F-measure of 90.2%, the best result published to date.