## Advanced Dynamic Programming in Semiring and Hypergraph Frameworks (2008)

Citations: 4 (0 self)

### BibTeX

@MISC{Huang08advanceddynamic,
author = {Liang Huang},
title = {Advanced Dynamic Programming in Semiring and Hypergraph Frameworks},
year = {2008}
}

### Abstract

Dynamic Programming (DP) is an important class of algorithms widely used in many areas of speech and language processing. Recently, there has been a series of work formalizing many instances of DP algorithms under algebraic and graph-theoretic frameworks. This tutorial surveys two such frameworks, namely semirings and directed hypergraphs, and draws connections between them. We formalize two particular types of DP algorithms under each of these frameworks: the Viterbi-style topological algorithms and the Dijkstra-style best-first algorithms. Wherever relevant, we also discuss typical applications of these algorithms in Natural Language Processing.
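The two algorithm styles named in the abstract can be illustrated on a small weighted graph. The following is a minimal sketch, not the tutorial's own code: the `Semiring` class, the function names, and the toy graph are all assumptions made for illustration. The Viterbi-style routine visits vertices in a given topological order and works for any semiring passed in; the Dijkstra-style routine is written only for the tropical (min, +) case and assumes non-negative edge weights.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Edge = Tuple[str, str, float]  # (tail vertex, head vertex, weight)


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for (plus, times, zero, one)."""
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float


# Tropical semiring (min, +): shortest-path costs.
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)
# Inside semiring (+, *): total probability over all paths.
INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def outgoing(vertices: List[str], edges: List[Edge]) -> Dict[str, List[Tuple[str, float]]]:
    """Group edges by their tail vertex."""
    out: Dict[str, List[Tuple[str, float]]] = {v: [] for v in vertices}
    for u, v, w in edges:
        out[u].append((v, w))
    return out


def viterbi(order: List[str], edges: List[Edge], source: str, K: Semiring) -> Dict[str, float]:
    """Topological-order DP; `order` must be a topological order of an acyclic graph."""
    d = {v: K.zero for v in order}
    d[source] = K.one
    out = outgoing(order, edges)
    for u in order:
        for v, w in out[u]:
            # Forward update: d(v) <- d(v) plus (d(u) times w(e)).
            d[v] = K.plus(d[v], K.times(d[u], w))
    return d


def dijkstra(vertices: List[str], edges: List[Edge], source: str) -> Dict[str, float]:
    """Best-first search in the tropical semiring; assumes weights >= 0."""
    d = {v: float("inf") for v in vertices}
    d[source] = 0.0
    out = outgoing(vertices, edges)
    heap = [(0.0, source)]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue  # stale queue entry; u was already fixed
        done.add(u)
        for v, w in out[u]:
            if du + w < d[v]:
                d[v] = du + w
                heapq.heappush(heap, (d[v], v))
    return d


# Toy acyclic graph (an assumption for illustration).
ORDER = ["s", "a", "b", "t"]
EDGES = [("s", "a", 1.0), ("s", "b", 4.0), ("a", "b", 2.0),
         ("a", "t", 6.0), ("b", "t", 1.0)]
```

On this toy graph both routines should agree on the shortest-path costs from `s`, since the graph is acyclic and its weights are non-negative. Swapping `TROPICAL` for `INSIDE` in `viterbi` sums path probabilities instead of minimizing costs, which is the kind of reuse the semiring view is meant to allow.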

### Citations

8530 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1990
Citation Context ... natural order is a total ordering. An important property of semirings when dealing with optimization problems is monotonicity, which justifies the optimal subproblem property in dynamic programming (Cormen et al., 2001) that the computation can be factored (into smaller problems). Definition 6. Let K = (A, ⊕, ⊗, 0, 1) be a semiring, and ≤ a partial ordering over A. We say K is monotonic if for all a, b, c ∈ A (a ≤ ...

8089 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...ring (Table 1), including the forward-backward algorithm (Baum, 1972) and Inside-Outside algorithm (Baker, 1979; Lari and Young, 1990) are widely used for unsupervised training with the EM algorithm (Dempster et al., 1977). For the latter, since NLP is often a pipeline of several modules, where the 1-best solution from one module might not be the best input for the next module, and one prefers to postpone disambiguati...

2611 | Dynamic Programming
- Bellman
- 1957
Citation Context ...scuss typical applications of these algorithms in Natural Language Processing. 1 Introduction Many algorithms in speech and language processing can be viewed as instances of dynamic programming (DP) (Bellman, 1957). The basic idea of DP is to solve a bigger problem by divide-and-conquer, but also to reuse the solutions of overlapping subproblems to avoid recalculation. The simplest such example is a Fibonacci se...

1436 | A note on two problems in connexion with graphs
- Dijkstra
- 1959
Citation Context ...mportant types of DP algorithms (columns) with contrasting order of visiting nodes: the Viterbi style topological-order algorithms (Viterbi, 1967), and the Dijkstra-Knuth style best-first algorithms (Dijkstra, 1959; Knuth, 1977). This survey focuses on optimization problems where one aims to find the best solution of a problem (e.g. shortest path or highest probability derivation) but other problems will also b...

975 | A Formal Basis for the Heuristic Determination of Minimum Cost Paths
- Hart, Nilsson, et al.
- 1968
Citation Context ...y low among all vertices (Eq. 4 does not hold), so normally the direct use of Dijkstra does not bring a speedup over Viterbi. To alleviate this problem, there is a popular technique named A* (Hart et al., 1968) described below. 3.2.1 A* Algorithm for State-Space Search We prioritize the queue using a combination d(v) ⊗ ĥ(v) of the known cost d(v) from the source vertex, and an estimate ĥ(v) of the (fut...

592 | Automatic labeling of semantic roles
- Gildea, Jurafsky
- 2000
Citation Context ...dules, where the 1-best solution from one module might not be the best input for the next module, and one prefers to postpone disambiguation by propagating a k-best list of candidates (Collins, 2000; Gildea and Jurafsky, 2002; Charniak and Johnson, 2005; Huang and Chiang, 2005). The k-best list is also frequently used in discriminative learning to approximate the whole set of candidates which is usually exponentially larg...

512 | An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process
- Baum
- 1972
Citation Context ...d(u) ⊕= d(v) ⊗ w(e) 3 Also known as the Lawler (1976) algorithm in the theory community, but he considers it as part of the folklore. 4 This is not to be confused with the forward-backward algorithm (Baum, 1972). In fact both forward and backward updates here are instances of the forward phase of a forward-backward algorithm. Variant 2. Another popular implementation is memoized recursion (Cormen et al., 20...

375 | Hierarchical phrase-based translation
- Chiang
- 2007
Citation Context ...or k-best derivations (Jiménez and Marzal, 2000; Huang and Chiang, 2005). Applications of this algorithm include k-best parsing (McDonald et al., 2005; Mohri and Roark, 2006) and machine translation (Chiang, 2007). It is also implemented as part of Dyna (Eisner et al., 2005), a generic language for dynamic programming. The k-best extension of the Knuth Algorithm is studied by Huang (2005). A separate problem,...

373 | The estimation of stochastic context-free grammars using the Inside–Outside algorithm. Computer Speech and Language
- Lari, Young
- 1990
Citation Context ...ions have many applications in NLP. For the former, algorithms based on the Inside semiring (Table 1), including the forward-backward algorithm (Baum, 1972) and Inside-Outside algorithm (Baker, 1979; Lari and Young, 1990) are widely used for unsupervised training with the EM algorithm (Dempster et al., 1977). For the latter, since NLP is often a pipeline of several modules, where the 1-best solution from one module m...

324 | On a routing problem
- Bellman
- 1958
Citation Context ...es this problem by a slightly different definition of closedness which does not assume idempotence. His generic single-source algorithm subsumes many classical algorithms like Dijkstra, Bellman-Ford (Bellman, 1958), and Viterbi as specific instances. It remains an open problem how to extend the closedness definition to the case of weight functions in hypergraphs. 6.2 k-best Extensions The straightforward ex...

274 | Combinatorial Optimization: Networks and - Lawler - 1976

268 | Trainable grammars for speech recognition
- Baker
- 1979
Citation Context .... Both extensions have many applications in NLP. For the former, algorithms based on the Inside semiring (Table 1), including the forward-backward algorithm (Baum, 1972) and Inside-Outside algorithm (Baker, 1979; Lari and Young, 1990) are widely used for unsupervised training with the EM algorithm (Dempster et al., 1977). For the latter, since NLP is often a pipeline of several modules, where the 1-best solu...

225 | Online large-margin training of dependency parsers
- McDonald, Crammer, et al.
- 2005
Citation Context ... Johnson, 2005; Huang and Chiang, 2005). The k-best list is also frequently used in discriminative learning to approximate the whole set of candidates which is usually exponentially large (Och, 2003; McDonald et al., 2005). 6.1 Beyond Optimization Problems We know that in optimization problems, the criterion for using dynamic programming is monotonicity (definitions 6 and 16). But in non-optimization problems, since th...

165 | Principles and implementation of deductive parsing
- Shieber, Schabes, et al.
- 1995
Citation Context ...(v) = { 1 if v is a source vertex; ⊕_{D∈D(v)} w(D) otherwise } (6) 4.3 Related Formalisms Hypergraphs are closely related to other formalisms like AND/OR graphs, context-free grammars, and deductive systems (Shieber et al., 1995; Nederhof, 2003). In an AND/OR graph, the OR-nodes correspond to vertices in a hypergraph and the AND-nodes, which link several OR-nodes to another OR-node, correspond to hyperedges. Similarly, in ...

148 | Better k-best Parsing
- Huang, Chiang
- 2005
Citation Context ... not straightforward due to the asymmetry of the head and the tail in a hyperedge and there have been multiple proposals in the literature. Here we follow the recursive definition of derivations in Huang and Chiang (2005). See Section 6 for the alternative notion of hyperpaths. Definition 19. A derivation D of a vertex v in a hypergraph H, its size |D| and its weight w(D) are recursively defined as follows: • If e ∈ ...

111 | An efficient recognition and syntax analysis algorithm for context free languages. University of Illinois
- Kasami
- 1966
Citation Context ...ave been fixed 10: e is ({u1, u2, ⋯, u|e|}, h(e), fe) 11: d(h(e)) ⊕= fe(d(u1), d(u2), ⋯, d(u|e|)) 5.1.1 CKY Algorithm The most widely used algorithm for parsing in NLP, the CKY algorithm (Kasami, 1965), is a specific instance of the Viterbi algorithm for hypergraphs. The CKY algorithm takes a context-free grammar G in Chomsky Normal Form (CNF) and essentially intersects G with a DFA D representing...

103 | Training Tree Transducers - Graehl, Knight, et al.

100 | Directed hypergraphs and applications
- Gallo, Longo, et al.
- 1993
Citation Context ...e first study two types of search spaces (rows): the semiring framework (Mohri, 2002) when the underlying representation is a directed graph as in finite-state machines, and the hypergraph framework (Gallo et al., 1993) when the search space is hierarchically branching as in context-free grammars; then, under each of these frameworks, we study two important types of DP algorithms (columns) with contrasting order of...

78 | A* Parsing: Fast Exact Viterbi Parse Selection - Klein, Manning - 2003

64 | A Generalization of Dijkstra’s Algorithm
- Knuth
- 1977
Citation Context ...f DP algorithms (columns) with contrasting order of visiting nodes: the Viterbi style topological-order algorithms (Viterbi, 1967), and the Dijkstra-Knuth style best-first algorithms (Dijkstra, 1959; Knuth, 1977). This survey focuses on optimization problems where one aims to find the best solution of a problem (e.g. shortest path or highest probability derivation) but other problems will also be discussed. ...

26 | Coarse-to-fine-grained n-best parsing and discriminative reranking
- Charniak, Johnson
- 2005
Citation Context ...ution from one module might not be the best input for the next module, and one prefers to postpone disambiguation by propagating a k-best list of candidates (Collins, 2000; Gildea and Jurafsky, 2002; Charniak and Johnson, 2005; Huang and Chiang, 2005). The k-best list is also frequently used in discriminative learning to approximate the whole set of candidates which is usually exponentially large (Och, 2003; McDonald et al...

24 | Compiling Comp Ling: Weighted Dynamic Programming and the Dyna Language
- Eisner, Goldlust, et al.
- 2005
Citation Context ...and Chiang, 2005). Applications of this algorithm include k-best parsing (McDonald et al., 2005; Mohri and Roark, 2006) and machine translation (Chiang, 2007). It is also implemented as part of Dyna (Eisner et al., 2005), a generic language for dynamic programming. The k-best extension of the Knuth Algorithm is studied by Huang (2005). A separate problem, k-shortest hyperpaths, has been studied by Nielsen et al. (20...

17 | An Efficient Algorithm for the N-Best-Strings Problem
- Mohri, Riley
- 2002
Citation Context ...zy computation method on top of the Viterbi algorithm to efficiently compute the ith-best solution based on the 1st, 2nd, ..., (i − 1)th solutions. A simple k-best Dijkstra algorithm is described in (Mohri and Riley, 2002). For the hypergraph case, the REA algorithm has been adapted for k-best derivations (Jiménez and Marzal, 2000; Huang and Chiang, 2005). Applications of this algorithm include k-best parsing (McDonal...

17 | Finding the k shortest hyperpaths
- Nielsen, Andersen, et al.
- 2005
Citation Context ... allow multiple occurrences of the same vertex in a tail and there is an ordering among the components. We also allow the head vertex to appear in the tail creating a self-loop which is ruled out in (Nielsen et al., 2005). Definition 13. We denote |e| = |T(e)| to be the arity of the hyperedge. If |e| = 0, then fe() ∈ R is a constant (fe is a nullary function) and we call h(e) a source vertex. We define the arity o...

15 | Weighted deductive parsing and Knuth’s algorithm
- Nederhof
- 2003
Citation Context ...on the more popular binary heap case below. For problems that satisfy both acyclicity and superiority, which include many applications in NLP such as HMM tagging, both Dijkstra and Viterbi can apply (Nederhof, 2003). So which one is better in this case? From the above analysis, the complexity O((V + E) log V) of Dijkstra looks inferior to Viterbi’s O(V + E) (due to the overhead for maintaining the priority queue)...

10 | Computation of the n best parse trees for weighted and stochastic context-free grammars - Jiménez, Marzal - 2000

10 | Probabilistic context-free grammar induction based on structural zeros
- Mohri, Roark
- 2006
Citation Context ...graph case, the REA algorithm has been adapted for k-best derivations (Jiménez and Marzal, 2000; Huang and Chiang, 2005). Applications of this algorithm include k-best parsing (McDonald et al., 2005; Mohri and Roark, 2006) and machine translation (Chiang, 2007). It is also implemented as part of Dyna (Eisner et al., 2005), a generic language for dynamic programming. The k-best extension of the Knuth Algorithm is studi...

9 | Computing the k shortest paths: A new algorithm and an experimental comparison - Jiménez, Marzal - 1999

3 | k-best Knuth algorithm and k-best A* parsing. Unpublished manuscript - Huang - 2005