## Compiling Comp Ling: Practical weighted dynamic programming and the Dyna language (2005)

Venue: In Advances in Probabilistic and Other Parsing

Citations: 13 (8 self)

### BibTeX

@INPROCEEDINGS{Eisner05compilingcomp,

author = {Jason Eisner and Eric Goldlust and Noah A. Smith},

title = {Compiling Comp Ling: Practical weighted dynamic programming and the Dyna language},

booktitle = {In Advances in Probabilistic and Other Parsing},

year = {2005}

}

### Abstract

Weighted deduction with aggregation is a powerful theoretical formalism that encompasses many NLP algorithms. This paper proposes a declarative specification language, Dyna; gives general agenda-based algorithms for computing weights and gradients; briefly discusses Dyna-to-Dyna program transformations; and shows that a first implementation of a Dyna-to-C++ compiler produces code that is efficient enough for real NLP research, though still several times slower than hand-crafted code.
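To make "agenda-based algorithms for computing weights" concrete, here is a minimal Python sketch of weighted deduction with `+=` aggregation, applied to the inside algorithm for a grammar in Chomsky normal form. It is an illustration under stated assumptions, not the paper's compiler output: the function name `inside_weight`, the dictionary encodings of the rules, and the toy grammar are all hypothetical, and the sketch assumes an acyclic program (no unary rule cycles), so the agenda is guaranteed to empty.

```python
from collections import defaultdict


def inside_weight(words, lexical, binary, start="s"):
    """Total weight of all parses of `words` (hypothetical helper).

    lexical: {(X, word): weight}   for rules X -> word
    binary:  {(X, Y, Z): weight}   for rules X -> Y Z
    Items are triples (nonterminal, i, j) spanning words[i:j].
    """
    n = len(words)
    chart = defaultdict(float)   # settled item values
    agenda = defaultdict(float)  # pending additive updates to item values

    # Axioms: each word licenses its lexical items.
    for i, w in enumerate(words):
        for (x, word), p in lexical.items():
            if word == w:
                agenda[(x, i, i + 1)] += p

    while agenda:
        item, delta = agenda.popitem()
        chart[item] += delta
        y, i, j = item
        # Propagate only the *change* in this item's value, combining it
        # with the current chart values of the other antecedent.
        for (x, left, right), p in binary.items():
            if left == y:   # item is the left child; find right siblings
                for k in range(j + 1, n + 1):
                    sibling = chart.get((right, j, k), 0.0)
                    if sibling:
                        agenda[(x, i, k)] += p * delta * sibling
            if right == y:  # item is the right child; find left siblings
                for h in range(0, i):
                    sibling = chart.get((left, h, i), 0.0)
                    if sibling:
                        agenda[(x, h, j)] += p * delta * sibling

    return chart.get((start, 0, n), 0.0)


# Toy grammar (an assumption for illustration): s -> s s (0.5), s -> "a" (0.5).
# "a a a" has two binary bracketings, each of weight 0.5**5.
print(inside_weight(["a", "a", "a"], {("s", "a"): 0.5}, {("s", "s", "s"): 0.5}))
# prints 0.0625
```

Pushing deltas rather than full values is what lets an item be revisited safely: each pair of antecedent values contributes to the consequent exactly once, whichever of the two is popped first.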

### Citations

2306 | Conditional random fields: probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context ...ith, 2004; Smith et al., 2005), Dyna let us quickly replicate, tweak, and combine useful techniques from the literature. These techniques included unweighted FS morphology, conditional random fields (Lafferty et al., 2001), synchronous parsers (Wu, 1997; Melamed, 2003), lexicalized parsers (Eisner and Satta, 1999), partially supervised training à la (Pereira and Schabes, 1992), and grammar induction (Klein and M... |

953 | Head-Driven Statistical Models for Natural Language Parsing - Collins - 1999 |

682 | Accurate Unlexicalized Parsing - Klein, Manning - 2003 |

652 | An efficient context-free parsing algorithm
- Earley
- 1970
Citation Context ...ent covering the sentence 6. need(Nonterm,J) += constit(_/cons(Nonterm,_),_,J). % Note: underscore matches anything (anonymous wildcard) Figure 2: An Earley parser that recovers inside probabilities (Earley, 1970; Stolcke, 1995). The rule np → det n should be encoded as the axiom rewrite(“np”,cons(“det”,cons(“n”,nil))), a nested term. “np”/Needed is the label of a partial np constituent that is still missing ... |

427 | Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
- Wu
- 1997
Citation Context ...uickly replicate, tweak, and combine useful techniques from the literature. These techniques included unweighted FS morphology, conditional random fields (Lafferty et al., 2001), synchronous parsers (Wu, 1997; Melamed, 2003), lexicalized parsers (Eisner and Satta, 1999), partially supervised training à la (Pereira and Schabes, 1992), and grammar induction (Klein and Manning, 2002). These replication... |

413 | A learning algorithm for continually running fully recurrent neural networks
- Williams, Zipser
- 1989
Citation Context ...tion. To do this, we “unwind” the computation of goal, undoing the value updates while building up the gradient values. The idea is to differentiate an “unrolled” version of the original computation (Williams and Zipser, 1989), in which an item at ... (Footnote 19: More generally, $g(a_i) = \partial goal/\partial a_i = \sum_c \partial goal/\partial c \cdot \partial c/\partial a_i = \sum_c g(c) \cdot \partial c/\partial a_i$ by the chain rule.) 1. for each a, gchart[a] := 0 and gagenda[a] := 0 (* respectively hold ∂g... |

384 | Maximum Likelihood Estimation from Incomplete Data via the EM Algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...new program computes both goal and ∇goal. An optimization algorithm such as conjugate gradient can use this information to tune the axiom weights to maximize goal. An alternative is the EM algorithm (Dempster et al., 1977) for probabilistic generative models such as PCFGs. Luckily the same program serves, since for such models, the E count (expected count) of an item a can be found as a · g(a)/goal. In other words, th... |

326 | Logic programming with stable model semantics as a constraint programming paradigm
- Niemelä
- 1999
Citation Context ...ams outside the semiring. In particular, one can write instances of SAT and other NP-hard constraint satisfaction problems by using cyclic rules with negation over finitely many boolean-valued items (Niemelä, 1998). Here the agenda algorithm can end up flipping values forever between false and true; a more general solver would have to be called in order to find a stable model of a SAT problem’s equations. 14 S... |

272 | Inside-Outside Reestimation from Partially Bracketed Corpora
- Pereira, Schabes
- 1992
Citation Context ...d FS morphology, conditional random fields (Lafferty et al., 2001), synchronous parsers (Wu, 1997; Melamed, 2003), lexicalized parsers (Eisner and Satta, 1999), partially supervised training à la (Pereira and Schabes, 1992), and grammar induction (Klein and Manning, 2002). These replications were easy to write and extend, and to train via §5.2. 7.2 Experiments We compared the current Dyna compiler to handbuilt syste... |

213 | Unfold/fold transformations of logic programs
- Tamaki, Sato
- 1984
Citation Context ...h John Blatz, we are also exploring transformations that can result in asymptotically more efficient computations of goal. Their unweighted versions are well-known in the logic programming community (Tamaki and Sato, 1984; Ramakrishnan, 1991). Folding introduces new intermediate items, perhaps exploiting the distributive law; applications include parsing speedups such as (Eisner and Satta, 1999), as well as well-known... |

170 | Theory of generalized annotated logic programming and its applications
- Kifer, Subrahmanian
- 1992
Citation Context ...ag, 1997) support aggregation (as in Dyna’s +=, log+=, max=, . . . ), although only “stratified” forms of it that exclude unary CFG rule cycles. Ross and Sagiv (1992) (and in a more restricted way, Kifer and Subrahmanian, 1992) come closest to our notion of attaching aggregable values to terms. Among deductive or other database systems, Dyna is perhaps unusual in that its goal is not to support transactional databases or a... |

165 | Principles and implementation of deductive parsing
- Shieber, Schabes, et al.
- 1995
Citation Context ...amic programming as deduction The “parsing as deduction” framework (Pereira and Warren, 1983) is now over 20 years old. It provides an elegant notation for specifying a variety of parsing algorithms (Shieber et al., 1995), including algorithms for probabilistic or other semiring-weighted parsing (Goodman, 1999). In the parsing community, new algorithms are often stated simply as a set of deductive inference rules (Si... |

138 | Parsing as deduction
- Pereira, Warren
- 1983
Citation Context ...mpiles into C++ classes. This system should help the HLT community to experiment more easily with new models and algorithms. 1.1 Dynamic programming as deduction The “parsing as deduction” framework (Pereira and Warren, 1983) is now over 20 years old. It provides an elegant notation for specifying a variety of parsing algorithms (Shieber et al., 1995), including algorithms for probabilistic or other semiring-weighted par... |

126 | Pattern databases
- Culberson, Schaeffer
- 1998
Citation Context ...rmation for finding ∇goal. We note a few other examples. Bounding transformations generate a new program that computes upper or lower bounds on goal, via generic bounding techniques (Prieditis, 1993; Culberson and Schaeffer, 1998). The A* heuristics explored by Klein and Manning (2003a) can be seen as resulting from bounding transformations. With John Blatz, we are also exploring transformations that can result in asymptotica... |

117 | Magic templates: a spellbinding approach to logic programs
- Ramakrishnan
- 1991
Citation Context ...so exploring transformations that can result in asymptotically more efficient computations of goal. Their unweighted versions are well-known in the logic programming community (Tamaki and Sato, 1984; Ramakrishnan, 1991). Folding introduces new intermediate items, perhaps exploiting the distributive law; applications include parsing speedups such as (Eisner and Satta, 1999), as well as well-known techniques for spee... |

117 | Contrastive estimation: Training log-linear models on unlabeled data
- Smith, Eisner
- 2005
Citation Context ...i-Newton methods, and smoothing-parameter tuning on development data. As an object-oriented C++ library, it also facilitates rapid implementation of new estimation techniques (Smith and Eisner, 2004; Smith and Eisner, 2005). 6 Program Transformations Another interest of Dyna is that its high-level specifications can be manipulated by mechanical source-to-source program transformations. This makes it possible to derive n... |

78 | A* Parsing: Fast Exact Viterbi Parse Selection - Klein, Manning - 2003 |

75 | Regular expressions for language engineering - Karttunen, Chanod, et al. - 1996 |

71 | New figures of merit for best-first probabilistic chart parsing - Caraballo, Charniak - 1998 |

64 | Semiring parsing
- Goodman
- 1999
Citation Context ...ow over 20 years old. It provides an elegant notation for specifying a variety of parsing algorithms (Shieber et al., 1995), including algorithms for probabilistic or other semiring-weighted parsing (Goodman, 1999). In the parsing community, new algorithms are often stated simply as a set of deductive inference rules (Sikkel, 1997; Eisner and Satta, 1999). It is also straightforward to specify other NLP algori... |

64 | Monotonic aggregation in deductive databases - Ross, Sagiv - 1991 |

53 | Edge-based best-first chart parsing - Charniak, Goldwater, et al. - 1998 |

49 | Multitext grammars and synchronous parsers
- Melamed
- 2003
Citation Context ...licate, tweak, and combine useful techniques from the literature. These techniques included unweighted FS morphology, conditional random fields (Lafferty et al., 2001), synchronous parsers (Wu, 1997; Melamed, 2003), lexicalized parsers (Eisner and Satta, 1999), partially supervised training à la (Pereira and Schabes, 1992), and grammar induction (Klein and Manning, 2002). These replications were easy to ... |

47 | Bilingual parsing with factored estimation: Using English to parse Korean
- Smith, Smith
- 2004
Citation Context ...rlier work on new algorithms for lexicalized and CCG parsing, syntactic MT, transformational syntax, trainable parameterized FSMs, and finite-state phonology.) In other cases (Smith and Eisner, 2004; Smith and Smith, 2004; Smith et al., 2005), Dyna let us quickly replicate, tweak, and combine useful techniques from the literature. These techniques included unweighted FS morphology, conditional random fields (Lafferty ... |

46 | Parsing Schemata – A Framework for Specification and Analysis of Parsing Algorithms
- Sikkel
- 1997
Citation Context ...95), including algorithms for probabilistic or other semiring-weighted parsing (Goodman, 1999). In the parsing community, new algorithms are often stated simply as a set of deductive inference rules (Sikkel, 1997; Eisner and Satta, 1999). It is also straightforward to specify other NLP algorithms this way. Syntactic MT models, language models, and stack decoders can be easily described using deductive rules. ... |

35 | Automatic Differentiation of Algorithms - Griewank, Corliss (eds.) - 1991 |

32 | Space-efficient inference in dynamic probabilistic networks
- Binder, Murphy, et al.
- 1997
Citation Context ...eclarations that control which items use the agenda or are memoized in the chart. This can be used to support lazy or “on-the-fly” computation (Mohri et al., 1998) and asymptotic space-saving tricks (Binder et al., 1997). 7 Usefulness of the Implementation 7.1 Applications The current Dyna compiler has proved indispensable in our own recent projects, in the sense that we would not have attempted many of them without... |

32 | Parsing with soft and hard constraints on dependency length - Eisner, Smith - 2005 |

29 | Machine discovery of effective admissible heuristics
- Prieditis
- 1993
Citation Context ... gradient transformation for finding ∇goal. We note a few other examples. Bounding transformations generate a new program that computes upper or lower bounds on goal, via generic bounding techniques (Prieditis, 1993; Culberson and Schaeffer, 1998). The A* heuristics explored by Klein and Manning (2003a) can be seen as resulting from bounding transformations. With John Blatz, we are also exploring transformations... |

29 | Annealing techniques for unsupervised statistical language learning - Smith, Eisner - 2004 |

27 | The CORAL deductive system
- Ramakrishnan, Srivastava, et al.
- 1994
Citation Context ...lly declarative logic programming languages (with restrictions or extensions) that are, or could be, implemented using efficient database techniques. Some implemented deductive databases such as CORAL (Ramakrishnan et al., 1994) and LOLA (Zukowski and Freitag, 1997) support aggregation (as in Dyna’s +=, log+=, max=, . . . ), although only “stratified” forms of it that exclude unary CFG rule cycles. Ross and Sagiv (1992) (... |

23 | Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies - Smith, Eisner - 2006 |

21 | Dyna: A declarative language for implementing dynamic programs - Eisner, Goldlust, et al. |

17 | Context-based morphological disambiguation with random fields
- Smith, Smith, et al.
- 2005
Citation Context ...ithms for lexicalized and CCG parsing, syntactic MT, transformational syntax, trainable parameterized FSMs, and finite-state phonology.) In other cases (Smith and Eisner, 2004; Smith and Smith, 2004; Smith et al., 2005), Dyna let us quickly replicate, tweak, and combine useful techniques from the literature. These techniques included unweighted FS morphology, conditional random fields (Lafferty et al., 2001), synch... |

15 | Weighted deductive parsing and Knuth’s algorithm
- Nederhof
- 2003
Citation Context ...s for parameter estimation. Third, regarding weights, the Dyna language is designed to express systems of arbitrary, heterogeneous equations over item values. In previous work such as (Goodman, 1999; Nederhof, 2003), one only specifies the inference rules as unweighted Horn clauses, and then weights are added automatically in a standard way: all values have the same type W, and all rules transform to equations ... |

6 | Magic for Filter Optimization in Dynamic Bottom-up Processing
- Minnen
- 1996
Citation Context ...es for the semiring case, these can be generalized. Our approach may be most closely related to deductive databases, which even in their heyday were apparently ignored by the CL community (except for Minnen, 1996). Deductive database systems permit inference rules that can derive new database facts from old ones. They are essentially declarative logic programming languages (with restrictions or extensions) ... |

2 | Efficient parsing for bilexical CFGs and head automaton grammars
- Eisner, Satta
- 1999
Citation Context ... algorithms for probabilistic or other semiring-weighted parsing (Goodman, 1999). In the parsing community, new algorithms are often stated simply as a set of deductive inference rules (Sikkel, 1997; Eisner and Satta, 1999). It is also straightforward to specify other NLP algorithms this way. Syntactic MT models, language models, and stack decoders can be easily described using deductive rules. So can operations on fin... |

2 | A generative constituent-context model for grammar induction
- Klein, Manning
- 2002
Citation Context ...t al., 2001), synchronous parsers (Wu, 1997; Melamed, 2003), lexicalized parsers (Eisner and Satta, 1999), partially supervised training à la (Pereira and Schabes, 1992), and grammar induction (Klein and Manning, 2002). These replications were easy to write and extend, and to train via §5.2. 7.2 Experiments We compared the current Dyna compiler to handbuilt systems on a variety of parsing tasks. These problems wer... |

1 | Inside-outside (computer program). http://www.cog.brown.edu/~mj/Software.htm - Johnson - 2000 |

1 | A rational design for a weighted FST library. LNCS
- Mohri, Pereira, et al.
- 1998
Citation Context ...t in memory. We are also exploring the introduction of declarations that control which items use the agenda or are memoized in the chart. This can be used to support lazy or “on-the-fly” computation (Mohri et al., 1998) and asymptotic space-saving tricks (Binder et al., 1997). 7 Usefulness of the Implementation 7.1 Applications The current Dyna compiler has proved indispensable in our own recent projects, in the se... |

1 | AMOS: A NL parser implemented as a deductive database
- Specht, Freitag
- 1995
Citation Context ...y use some variant of the unweighted agenda-based algorithm, which is known in that community as “seminaive bottom-up evaluation.” An unweighted parser was implemented in an earlier version of LOLA (Specht and Freitag, 1995). ...slations in memory in a way that resembles hand-designed data structures for the algorithm in question. The compiler has many choices to make here; we ultimately hope to implement feedback-directed... |

1 | An efficient probabilistic CF parsing algorithm that computes prefix probabilities
- Stolcke
- 1995
Citation Context ...he sentence 6. need(Nonterm,J) += constit(_/cons(Nonterm,_),_,J). % Note: underscore matches anything (anonymous wildcard) Figure 2: An Earley parser that recovers inside probabilities (Earley, 1970; Stolcke, 1995). The rule np → det n should be encoded as the axiom rewrite(“np”,cons(“det”,cons(“n”,nil))), a nested term. “np”/Needed is the label of a partial np constituent that is still missing the list of sub... |
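
The nested-term encoding mentioned in the Figure 2 excerpt above, where the rule np → det n becomes the axiom rewrite("np",cons("det",cons("n",nil))), can be illustrated with a short Python sketch. This is only a way of picturing the term structure: the tuple representation and the helper names `cons_list`, `rewrite`, and `show` are assumptions made for this example and are not part of Dyna.

```python
# Terms are modelled as tuples: (functor, arg1, arg2, ...). Strings are atoms.
NIL = ("nil",)  # the empty list, a functor with no arguments


def cons_list(symbols):
    """Encode a Python sequence as a right-nested cons term."""
    term = NIL
    for sym in reversed(symbols):
        term = ("cons", sym, term)
    return term


def rewrite(lhs, rhs):
    """Build the axiom term for the grammar rule lhs -> rhs[0] rhs[1] ..."""
    return ("rewrite", lhs, cons_list(rhs))


def show(term):
    """Render a term in the functor(arg,...) notation used in the excerpt."""
    if isinstance(term, tuple):
        functor, *args = term
        if not args:
            return functor
        return f"{functor}({','.join(show(a) for a in args)})"
    return f'"{term}"'


print(show(rewrite("np", ["det", "n"])))
# prints rewrite("np",cons("det",cons("n",nil)))
```

Because the right-hand side is an ordinary list term, a rule of any length fits the same `rewrite` axiom, and the tail of the list is exactly the part of the rule still to be matched.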