## Non-projective dependency parsing using spanning tree algorithms (2005)

### Download Links

- [ryanmcd.googlepages.com]
- [www.cs.toronto.edu]
- [ryanmcd.com]
- [www.ryanmcd.com]
- [www.seas.upenn.edu]
- [ufal.mff.cuni.cz]
- [www2.denizyuret.com]
- [www.cs.utoronto.ca]
- [acl.ldc.upenn.edu]
- [www.aclweb.org]
- [wing.comp.nus.edu.sg]
- [acl.eldoc.ub.rug.nl]
- [aclweb.org]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Citations: 295 (10 self)

### BibTeX

@INPROCEEDINGS{Mcdonald05non-projectivedependency,

author = {Ryan McDonald and Fernando Pereira and Kiril Ribarov and Jan Hajič},

title = {Non-projective dependency parsing using spanning tree algorithms},

booktitle = {Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing},

year = {2005},

pages = {523--530}

}

### Abstract

We formalize weighted dependency parsing as searching for maximum spanning trees (MSTs) in directed graphs. Using this representation, the parsing algorithm of Eisner (1996) is sufficient for searching over all projective trees in $O(n^3)$ time. More surprisingly, the representation is extended naturally to non-projective parsing using Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) MST algorithm, yielding an $O(n^2)$ parsing algorithm. We evaluate these methods on the Prague Dependency Treebank using online large-margin learning techniques (Crammer et al., 2003; McDonald et al., 2005) and show that MST parsing increases efficiency and accuracy for languages with non-projective dependencies.
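To make the abstract's central idea concrete, here is a minimal Python sketch of the recursive Chu-Liu-Edmonds procedure: pick the best incoming edge for every word, and if that produces a cycle, contract it and recurse. This is an illustration only, not the authors' implementation. It is the naive cubic-time recursion, not the $O(n^2)$ variant the abstract refers to. The function names, the dense `score[h][d]` matrix layout, and the convention that node 0 is an artificial root are all assumptions made for this example; every off-diagonal entry outside column 0 is assumed to be finite.

```python
def find_cycle(head):
    """Return the nodes of one cycle in the head map, or None (node 0 is the root)."""
    n = len(head)
    state = [0] * n  # 0 = unvisited, 1 = on the current path, 2 = finished
    for start in range(1, n):
        path, v = [], start
        while v != 0 and state[v] == 0:
            state[v] = 1
            path.append(v)
            v = head[v]
        if v != 0 and state[v] == 1:  # walked back onto the current path
            return path[path.index(v):]
        for u in path:
            state[u] = 2
    return None


def chu_liu_edmonds(score):
    """Maximum spanning arborescence rooted at node 0.

    score[h][d] is the weight of edge h -> d (hypothetical layout).
    Returns a list `head` with head[0] == -1 and head[d] the parent of d.
    """
    n = len(score)
    head = [-1] * n
    for d in range(1, n):  # greedy step: best incoming edge per word
        head[d] = max((h for h in range(n) if h != d), key=lambda h: score[h][d])
    cycle = find_cycle(head)
    if cycle is None:
        return head

    # Contract the cycle into a single new node with index c.
    in_cycle = set(cycle)
    cycle_score = sum(score[head[v]][v] for v in cycle)
    rest = [v for v in range(n) if v not in in_cycle]  # rest[0] == 0 (root)
    c = len(rest)
    neg = float("-inf")
    new = [[neg] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}  # remember which real cycle node each new edge uses
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            if i != j:
                new[i][j] = score[u][v]
        if u != 0:  # edge from the cycle out to u
            best = max(cycle, key=lambda v: score[v][u])
            new[c][i], leave[i] = score[best][u], best
        # edge from u into the cycle: replaces the cycle edge entering `best`
        best = max(cycle, key=lambda v: score[u][v] - score[head[v]][v])
        new[i][c] = score[u][best] - score[head[best]][best] + cycle_score
        enter[i] = best

    sub = chu_liu_edmonds(new)

    # Expand the contracted node back into the original graph.
    result = [-1] * n
    for v in cycle:
        result[v] = head[v]
    for j in range(1, c):
        result[rest[j]] = leave[j] if sub[j] == c else rest[sub[j]]
    h = sub[c]
    result[enter[h]] = rest[h]  # break the cycle where the outside edge enters
    return result
```

Given a 4×4 score matrix for a root plus three words, `chu_liu_edmonds(score)` returns one head index per word. Non-projective trees need no special handling, because the search places no ordering restriction on edges.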

### Citations

9158 | Introduction to Algorithms
- Cormen, Leiserson, et al.
- 1998
Citation Context ...ing non-projective tree we simply search the entire space of spanning trees with no restrictions. Well-known algorithms exist for the less general case of finding spanning trees in undirected graphs (Cormen et al., 1990). Efficient algorithms for the directed case are less well known, but they exist. We will use here the Chu-Liu-Edmonds algorithm (Chu and Liu, 1965; Edmonds, 1967), sketched in Figure 3 following Geo...

2273 | Building a large annotated corpus of English: The Penn Treebank
- Marcus, Marcinkiewicz, et al.
- 1993
Citation Context ...of the sentence. In English, projective trees are sufficient to analyze most sentence types. In fact, the largest source of English dependency trees is automatically generated from the Penn Treebank (Marcus et al., 1993) and is by convention exclusively projective. However, there are certain examples in which a nonprojective tree is preferable. Consider the sentence John saw a dog yesterday which was a Yorkshire Ter...

466 | Max-margin Markov networks
- Taskar, Guestrin, et al.
- 2004
Citation Context ...dency trees, the loss of a tree is defined to be the number of words with incorrect parents relative to the correct tree. This is closely related to the Hamming loss that is often used for sequences (Taskar et al., 2003). For arbitrary inputs, there are typically exponentially many possible parses and thus exponentially many margin constraints in line 4 of Figure 4. 3.1 Single-best MIRA One solution for the exponent...

272 | Three new probabilistic models for dependency parsing: an exploration
- Eisner
- 1996
Citation Context ...at dependency parsing can be formalized as the search for a maximum spanning tree in a directed graph. This formalization generalizes standard projective parsing models based on the Eisner algorithm (Eisner, 1996) to yield efficient $O(n^2)$ exact parsing methods for nonprojective languages like Czech. Using this spanning tree representation, we extend the work of McDonald et al. (2005) on online large-margin ...

268 | Ultraconservative Online Algorithms for Multiclass Problems
- Crammer, Singer
- 2003
Citation Context ...nce $x_t$ and its correct dependency tree $y_t$. In what follows, dt(x) denotes the set of possible dependency trees for sentence x. The basic idea is to extend the Margin Infused Relaxed Algorithm (MIRA) (Crammer and Singer, 2003; Crammer et al., 2003) to learning with structured outputs, in the present case dependency trees. Figure 4 gives pseudo-code for the MIRA algorithm as presented by McDonald et al. (2005). An online l...

244 | Online large-margin training of dependency parsers
- McDonald, Crammer, et al.
- 2005
Citation Context ...; Edmonds, 1967) MST algorithm, yielding an $O(n^2)$ parsing algorithm. We evaluate these methods on the Prague Dependency Treebank using online large-margin learning techniques (Crammer et al., 2003; McDonald et al., 2005) and show that MST parsing increases efficiency and accuracy for languages with non-projective dependencies. 1 Introduction Dependency parsing has seen a surge of interest lately for applications suc...

244 | Pseudo-projective dependency parsing
- Nivre, Nilsson
- 2005

201 | Optimum branchings
- Edmonds
- 1967
Citation Context ... is sufficient for searching over all projective trees in $O(n^3)$ time. More surprisingly, the representation is extended naturally to non-projective parsing using Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) MST algorithm, yielding an $O(n^2)$ parsing algorithm. We evaluate these methods on the Prague Dependency Treebank using online large-margin learning techniques (Crammer et al., 2003; McDonald et al....

195 | Dependency tree kernels for relation extraction
- Culotta, Sorensen
- 2004

161 | Learning syntactic patterns for automatic hypernym discovery
- Snow, Jurafsky, et al.
- 2005
Citation Context ... for applications such as relation extraction (Culotta and Sorensen, 2004), machine translation (Ding and Palmer, 2005), synonym generation (Shinyama et al., 2002), and lexical resource augmentation (Snow et al., 2004). The primary reasons for using dependency structures instead of more informative lexicalized phrase structures is that they are more efficient to learn and parse while still encoding much of the pre...

148 | Building a Syntactically Annotated Corpus: The Prague Dependency Treebank
- Hajič
- 1998
Citation Context ...ependency parsing, but for that application, k-best MIRA performs as well or better, and is much faster to train. 4 Experiments We performed experiments on the Czech Prague Dependency Treebank (PDT) (Hajič, 1998; Hajič et al., ). We used the predefined training, development and testing split of this data set. Furthermore, we used the automatically generated POS tags that are provided with the data. Czech POS...

141 | Max-margin parsing
- Taskar, Klein, et al.
- 2004
Citation Context ...tored MIRA It is also possible to exploit the structure of the output space and factor the exponential number of margin constraints into a polynomial number of local constraints (Taskar et al., 2003; Taskar et al., 2004). For the directed maximum spanning tree problem, we can factor the output by edges to obtain the following constraints: $\min \lVert w^{(i+1)} - w^{(i)} \rVert$ s.t. $s(l,j) - s(k,j) \geq 1 \;\; \forall (l,j) \in y_t,\ (k,j) \notin y_t$ Thi...

132 | Statistical Dependency Analysis with Support Vector
- Yamada, Matsumoto
- 2003

130 | On the shortest arborescence of a directed graph
- Chu, Liu
- 1965
Citation Context ...hm of Eisner (1996) is sufficient for searching over all projective trees in $O(n^3)$ time. More surprisingly, the representation is extended naturally to non-projective parsing using Chu-Liu-Edmonds (Chu and Liu, 1965; Edmonds, 1967) MST algorithm, yielding an $O(n^2)$ parsing algorithm. We evaluate these methods on the Prague Dependency Treebank using online large-margin learning techniques (Crammer et al., 2003; ...

130 | A statistical parser for Czech
- Collins, Hajič, et al.
- 1999

104 | Deterministic Dependency Parsing of English Text, in Proceedings of Coling 2004
- Nivre, Scholz

84 | Automatic paraphrase acquisition from news articles
- Shinyama, Sekine, et al.
- 2002
Citation Context ...tion Dependency parsing has seen a surge of interest lately for applications such as relation extraction (Culotta and Sorensen, 2004), machine translation (Ding and Palmer, 2005), synonym generation (Shinyama et al., 2002), and lexical resource augmentation (Snow et al., 2004). The primary reasons for using dependency structures instead of more informative lexicalized phrase structures is that they are more efficient ...

73 | Machine translation using probabilistic synchronous dependency insertion grammars
- Ding, Palmer
- 2005
Citation Context ...with non-projective dependencies. 1 Introduction Dependency parsing has seen a surge of interest lately for applications such as relation extraction (Culotta and Sorensen, 2004), machine translation (Ding and Palmer, 2005), synonym generation (Shinyama et al., 2002), and lexical resource augmentation (Snow et al., 2004). The primary reasons for using dependency structures instead of more informative lexicalized phrase...

65 | Online passive aggressive algorithms
- Crammer, Dekel, et al.
- 2006
Citation Context ...nds (Chu and Liu, 1965; Edmonds, 1967) MST algorithm, yielding an $O(n^2)$ parsing algorithm. We evaluate these methods on the Prague Dependency Treebank using online large-margin learning techniques (Crammer et al., 2003; McDonald et al., 2005) and show that MST parsing increases efficiency and accuracy for languages with non-projective dependencies. 1 Introduction Dependency parsing has seen a surge of interest late...

33 | Learning and robust learning of product distributions
- Höffgen
- 1993
Citation Context ...the edges of the dependency tree. One might hope that the method would generalize to include features of larger substructures. Unfortunately, that would make the search for the best tree intractable (Höffgen, 1993). Acknowledgments We thank Lillian Lee for bringing an important missed connection to our attention, and Koby Crammer for his help with learning algorithms. This work has been supported by NSF ITR gr...

18 | A statistical constraint dependency grammar (CDG) parser
- Wang, Harper
- 2004

16 | Finding optimum branchings. Networks
- Tarjan
- 1977

6 | Arborescence optimization problems solvable by Edmonds’ algorithm
- Leonidas
- 2003
Citation Context ...tex and recalculates edge weights going into and out of the cycle. It can be shown that a maximum spanning tree on the contracted graph is equivalent to a maximum spanning tree in the original graph (Leonidas, 2003). Hence the algorithm can recursively call itself on the new graph. Naively, this algorithm runs in $O(n^3)$ time since each recursive call takes $O(n^2)$ to find the highest incoming edge for each wor...

4 | Semantic dependency analysis method for Japanese based on optimum tree search algorithm
- Hirakawa
- 2001

3 | New large margin algorithms for structured prediction
- Crammer, McDonald, et al.
- 2004
Citation Context ...error results in a score increase of at least 1, the entire score difference must be at least the number of errors. For sequences, this form of factorization has been called local lattice preference (Crammer et al., 2004). Let n be the number of nodes in graph $G_x$. Then the number of constraints is $O(n^2)$, since for each node we must maintain n − 1 constraints. The factored constraints are in general more restrictive...

3 | The Prague Dependency Treebank 1.0 CDROM. Linguistics Data Consortium Cat
- Hajič, Hajičová, et al.
- 2001
Citation Context ... in English and Czech. grammatical relations, allowing non-projective dependencies that we need to represent and parse efficiently. A non-projective example from the Czech Prague Dependency Treebank (Hajič et al., 2001) is also shown in Figure 2. Most previous dependency parsing models have focused on projective trees, including the work of Eisner (1996), Collins et al. (1999), Yamada and Matsumoto (2003), Nivre an...

2 | Algorithm for finding the first k shortest arborescences of a digraph
- Hou
- 1996
Citation Context ...values of k are sufficient to achieve the best accuracy for these methods. However, here we stay with a single best tree because k-best extensions to the Chu-Liu-Edmonds algorithm are too inefficient (Hou, 1996). This model is related to the averaged perceptron algorithm of Collins (2002). In that algorithm, the single highest scoring tree (or structure) is used to update the weight vector. However, MIRA ag...
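Two quantities described in the citation contexts above can be written down in a few lines: the loss of a tree (the number of words with incorrect parents) and the edge-factored margin constraints s(l,j) − s(k,j) ≥ 1. The Python below is a sketch under assumptions, not code from the paper. The function names, the head-list encoding (index 0 is an artificial root whose entry is ignored), and the `score[h][d]` matrix layout are hypothetical choices made for illustration.

```python
def attachment_loss(gold_heads, pred_heads):
    """Count words whose predicted parent differs from the gold parent.

    Both arguments are head lists; position 0 is the artificial root
    and is skipped (an assumption of this sketch).
    """
    return sum(1 for g, p in zip(gold_heads[1:], pred_heads[1:]) if g != p)


def violated_edge_constraints(score, gold_heads, margin=1.0):
    """List the factored constraints s(l, j) - s(k, j) >= margin that fail.

    For every word j with gold head l, each alternative head k must
    score at least `margin` lower than l. Returns (l, k, j) triples
    for the constraints that are violated.
    """
    n = len(gold_heads)
    violated = []
    for j in range(1, n):
        l = gold_heads[j]
        for k in range(n):
            if k != j and k != l and score[l][j] - score[k][j] < margin:
                violated.append((l, k, j))
    return violated
```

Because each word contributes at most one constraint per alternative head, the number of checks grows quadratically with sentence length, which matches the $O(n^2)$ constraint count mentioned in the contexts above.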