## On the complexity of non-projective data-driven dependency parsing (2007)

Venue: | In Proc. IWPT |

Citations: | 26 - 0 self |

@INPROCEEDINGS{Mcdonald07onthe,

author = {Ryan McDonald},

title = {On the complexity of non-projective data-driven dependency parsing},

booktitle = {Proc. IWPT},

year = {2007},

pages = {121--132}

}

### Abstract

In this paper we investigate several non-projective parsing algorithms for dependency parsing, providing novel polynomial-time solutions under the assumption that each dependency decision is independent of all the others, called here the edge-factored model. We also investigate algorithms for non-projective parsing that account for non-local information, and present several hardness results. This suggests that it is unlikely that exact non-projective dependency parsing is tractable for any model richer than the edge-factored model.
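
The edge-factored assumption can be made concrete with a small sketch. Under it, a tree's weight is the product of its edge weights, and the Matrix Tree Theorem mentioned in the citation contexts below lets the sum over all non-projective trees be computed as a determinant rather than by enumeration. The Python below is an illustration under stated assumptions, not the paper's implementation: the function names (`determinant`, `partition_function`, `brute_force`), the unlabeled weight matrix `w[head][modifier]`, and the choice of node 0 as the artificial root are all hypothetical, and the Laplacian follows the standard statement of the directed theorem (diagonal entries sum incoming weights, off-diagonal entries are negated weights, row and column 0 are dropped).

```python
from fractions import Fraction
from itertools import product


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions (cubic time)."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for c in range(n):
        # Find a row at or below c with a non-zero entry in column c.
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det  # a row swap flips the sign
        det *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return det


def partition_function(w):
    """Sum of tree weights over all spanning trees rooted at node 0.

    Assumption: w[i][j] is the weight of the edge from head i to modifier j,
    with one edge per ordered pair (no labels). Weights into node 0 are ignored.
    """
    n = len(w) - 1
    # Build the Laplacian with row 0 and column 0 already removed.
    q = [[Fraction(0)] * n for _ in range(n)]
    for j in range(1, n + 1):
        for i in range(n + 1):
            if i == j:
                continue
            q[j - 1][j - 1] += Fraction(w[i][j])  # incoming weight of j
            if i >= 1:
                q[i - 1][j - 1] -= Fraction(w[i][j])
    return determinant(q)


def brute_force(w):
    """Same quantity by enumerating every head assignment (exponential)."""
    n = len(w) - 1
    total = Fraction(0)
    for heads in product(range(n + 1), repeat=n):  # heads[j-1] is the head of j
        if not _all_reach_root(heads):
            continue
        weight = Fraction(1)
        for j, h in enumerate(heads, start=1):
            weight *= Fraction(w[h][j])
        total += weight
    return total


def _all_reach_root(heads):
    """True if following heads from every word ends at node 0 (no cycles)."""
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True
```

For three words with every weight equal to 1, both functions return 16, the number of spanning trees rooted at node 0. The determinant route is cubic in sentence length, whereas the enumeration checks every one of the (n+1)^n head assignments, so `brute_force` is only useful as a sanity check on very short inputs.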

### Citations

8940 | Introduction to Algorithms - Cormen - 2001
Citation Context ...). Thus, if we construct $Q$ for a graph $G_x$, then the determinant of the matrix $Q^0$ is equivalent to $Z_x$. The determinant of an $n \times n$ matrix can be calculated in numerous ways, most of which take $O(n^3)$ (Cormen et al., 1990). The most efficient algorithms for calculating the determinant of a matrix use the fact that the problem is no harder than matrix multiplication (Cormen et al., 1990). Matrix multiplication currentl... |

2483 | Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data - Lafferty, McCallum, et al. - 2001
Citation Context ...ve this problem using the edge expectation algorithm described in Section 3.3 and the argmax algorithm described in Section 3.1. 4.3 Non-Projective Log-Linear Models. Conditional Random Fields (CRFs) (Lafferty et al., 2001) are global discriminative learning algorithms for problems with structured output spaces, such as dependency parsing. For dependency parsing, CRFs would define the conditional probability of a depen... |

728 | Accurate unlexicalized parsing - Klein, Manning - 2003 |

407 | Dependency Syntax: Theory and Practice - Mel'čuk - 1988 |

285 | Non-Projective Dependency Parsing using Spanning Tree Algorithms - McDonald, Pereira, et al. |

264 | Three new probabilistic models for dependency parsing: An exploration - Eisner - 1996
Citation Context ...s tractable for any model assumptions weaker than those made by the edge-factored models. 1.1 Related Work. There has been extensive work on data-driven dependency parsing for both projective parsing (Eisner, 1996; Paskin, 2001; Yamada and Matsumoto, 2003; Nivre and Scholz, 2004; McDonald et al., 2005a) and non-projective parsing systems (Nivre and Nilsson, 2005; Hall and Novák, 2005; McDonald et al., 2005b). ... |

260 | CoNLL-X Shared Task on Multilingual Dependency Parsing - Buchholz, Marsi - 2006
Citation Context ...on-projectivity arises due to long distance dependencies or in languages with flexible word order. For many languages, a significant portion of sentences require a non-projective dependency analysis (Buchholz et al., 2006). Thus, the ability to learn and infer non-projective dependency graphs is an important problem in multilingual language processing. Syntactic dependency parsing has seen a number of new learning and ... |

259 | Ultraconservative Online Algorithms for Multiclass Problems - Crammer, Singer
Citation Context ...h calculation is required. 4.1 Inference Based Learning. Many learning paradigms can be defined as inference-based learning. These include the perceptron (Collins, 2002) and its large-margin variants (Crammer and Singer, 2003; McDonald et al., 2005a). In these settings, a model's parameters are iteratively updated based on the argmax calculation for a single or set of training instances under the current parameter settings... |

240 | Online Large-Margin Training of Dependency Parsers - McDonald, Crammer, et al. |

233 | Memory-Based Dependency Parsing - Nivre, Nilsson - 2004
Citation Context ...data-driven dependency parsing for both projective parsing (Eisner, 1996; Paskin, 2001; Yamada and Matsumoto, 2003; Nivre and Scholz, 2004; McDonald et al., 2005a) and non-projective parsing systems (Nivre and Nilsson, 2005; Hall and Novák, 2005; McDonald et al., 2005b). These approaches can often be classified into two broad categories. In the first category are those methods that employ approximate inference, typicall... |

209 | Éléments de Syntaxe structurale - Tesnière - 1959
Citation Context ...anguage are a simple yet flexible mechanism for encoding words and their syntactic dependencies through directed graphs. These representations have been thoroughly studied in descriptive linguistics (Tesnière, 1959; Hudson, 1984; Sgall et al., 1986; Mel'čuk, 1988) and have been applied in numerous language processing tasks. Figure 1 gives an example dependency graph for the sentence Mr. Tomash will remain as a ... |

192 | Optimum Branchings - Edmonds - 1967
Citation Context ...rgmax$_{T \in \mathcal{T}(G_x)} \prod_{(i,j)^k \in E_T} w_{ij}^k$. McDonald et al. (2005b) showed that this can be solved in $O(n^2)$ for unlabeled parsing using the Chu-Liu-Edmonds algorithm for standard digraphs (Chu and Liu, 1965; Edmonds, 1967). Unlike most exact projective parsing algorithms, which use efficient bottom-up chart parsing algorithms, the Chu-Liu-Edmonds algorithm is greedy in nature. It begins by selecting the single best inc... |

169 | Graph Theory - Tutte - 1984
Citation Context ...anics. We denote this value as $Z_x$, $Z_x = \sum_{T \in \mathcal{T}(G_x)} w(T) = \sum_{T \in \mathcal{T}(G_x)} \prod_{(i,j)^k \in E_T} w_{ij}^k$. To compute this sum it is possible to use the Matrix Tree Theorem for multi-digraphs. Matrix Tree Theorem (Tutte, 1984): Let $G$ be a multi-digraph with nodes $V = \{0, 1, \ldots, n\}$ and edges $E$. Define (Laplacian) matrix $Q$ as a $(n + 1) \times (n + 1)$ matrix indexed from 0 to $n$. For all $i$ and $j$, define: $Q_{jj} = \sum w_{ij}^k$ & $Q_{ij} = \sum$... |

166 | Online Learning of Approximate Dependency Parsing Algorithms - McDonald, Pereira - 2006
Citation Context ...s that it is not a realistic assumption. Non-local information, such as arity (or valency) and neighbouring dependencies, can be crucial to obtaining high parsing accuracies (Klein and Manning, 2002; McDonald and Pereira, 2006). However, in the data-driven parsing setting this can be partially averted by incorporating rich feature representations over the input (McDonald et al., 2005a). The goal of this work is to further... |

124 | On the shortest arborescence of a directed graph - Chu, Liu - 1965
Citation Context ... sentence $x$, $T = \operatorname{argmax}_{T \in \mathcal{T}(G_x)} \prod_{(i,j)^k \in E_T} w_{ij}^k$. McDonald et al. (2005b) showed that this can be solved in $O(n^2)$ for unlabeled parsing using the Chu-Liu-Edmonds algorithm for standard digraphs (Chu and Liu, 1965; Edmonds, 1967). Unlike most exact projective parsing algorithms, which use efficient bottom-up chart parsing algorithms, the Chu-Liu-Edmonds algorithm is greedy in nature. It begins by selecting the ... |

100 | Deterministic dependency parsing of English text - Nivre, Scholz - 2004
Citation Context ...made by the edge-factored models. 1.1 Related Work. There has been extensive work on data-driven dependency parsing for both projective parsing (Eisner, 1996; Paskin, 2001; Yamada and Matsumoto, 2003; Nivre and Scholz, 2004; McDonald et al., 2005a) and non-projective parsing systems (Nivre and Nilsson, 2005; Hall and Novák, 2005; McDonald et al., 2005b). These approaches can often be classified into two broad categories... |

97 | A novel use of statistical parsing to extract information from text - Miller, Fox, et al. - 2000
Citation Context ...ures the compatibility of projective parsing algorithms with many important natural language processing methods that work within a bottom-up chart parsing framework, including information extraction (Miller et al., 2000) and syntax-based machine translation (Wu, 1996). The complexity results given here suggest that polynomial chart-parsing algorithms do not exist for the non-projective case. Otherwise we should be a... |

91 | A polynomial-time algorithm for statistical machine translation - Wu - 1996
Citation Context ...h many important natural language processing methods that work within a bottom-up chart parsing framework, including information extraction (Miller et al., 2000) and syntax-based machine translation (Wu, 1996). The complexity results given here suggest that polynomial chart-parsing algorithms do not exist for the non-projective case. Otherwise we should be able to augment them and move beyond edge-factored... |

52 | Incremental integer linear programming for non-projective dependency parsing - Riedel, Clarke - 2006 |

44 | Pseudo-projectivity: a polynomially parsable non-projective dependency grammar - Kahane, Nasr, et al. - 1998 |

37 | Letting the cat out of the bag: generation for Shake-and-Bake MT - Brew - 1992
Citation Context ... parsing can be related to certain parsing problems defined for phrase structure representations, as for instance immediate dominance CFG parsing (Barton et al., 1987) and shake-and-bake translation (Brew, 1992). Independently of this work, Koo et al. (2007) and Smith and Smith (2007) showed that the Matrix-Tree Theorem can be used to train edge-factored log-linear models of dependency parsing. Both studies ... |

37 | Tractable Bayesian learning of tree belief networks - Meila, Jaakkola - 2000 |

37 | The complexity of recognition of linguistically adequate dependency grammars - Neuhaus, Bröker - 1997 |

34 | Structured Prediction Models via the Matrix-Tree Theorem - Koo, Globerson, et al. - 2007 |

33 | Structure and performance of a dependency language model - Chelba, Engle, et al. - 1997
Citation Context ...y trees, $p(x|n) = \sum_{T \in \mathcal{T}(G_x)} p(x, T|n) = \sum_{T \in \mathcal{T}(G_x)} p(x|T, n)\,p(T|n) = \beta \sum_{T \in \mathcal{T}(G_x)} \prod_{(i,j)^k \in E_T} p^k_{x_i,x_j} = \beta Z_x$. This probability can be used directly as a non-projective syntactic language model (Chelba et al., 1997) or possibly interpolated with a separate n-gram model. 4.4.2 Unsupervised Learning. In unsupervised learning we train our model on a sample of unannotated sentences $\mathcal{X} = \{x_\alpha\}_{\alpha=1}^{|\mathcal{X}|}$. Let $|x_\alpha| = n_\alpha$ a... |

30 | The Meaning of the Sentence in Its Pragmatic Aspects - Sgall, Hajičová, et al. - 1986
Citation Context ...ble mechanism for encoding words and their syntactic dependencies through directed graphs. These representations have been thoroughly studied in descriptive linguistics (Tesnière, 1959; Hudson, 1984; Sgall et al., 1986; Mel'čuk, 1988) and have been applied in numerous language processing tasks. Figure 1 gives an example dependency graph for the sentence Mr. Tomash will remain as a director emeritus, which has been ... |

26 | Probabilistic Models of Nonprojective Dependency Trees - Smith, Smith - 2007 |

23 | Corrective Modeling for Non-Projective Dependency Parsing - Hall, Novák - 2005
Citation Context ...rsing for both projective parsing (Eisner, 1996; Paskin, 2001; Yamada and Matsumoto, 2003; Nivre and Scholz, 2004; McDonald et al., 2005a) and non-projective parsing systems (Nivre and Nilsson, 2005; Hall and Novák, 2005; McDonald et al., 2005b). These approaches can often be classified into two broad categories. In the first category are those methods that employ approximate inference, typically through the use of l... |

20 | Corpus-based induction of syntactic structure: Models of dependency and constituency - Klein, Manning - 2004 |

18 | A statistical constraint dependency grammar (CDG) parser - Wang, Harper - 2004 |

9 | Fast Exact Natural Language Parsing with a Factored Model - Klein, Manning - 2001
Citation Context ...pendency as independent is that it is not a realistic assumption. Non-local information, such as arity (or valency) and neighbouring dependencies, can be crucial to obtaining high parsing accuracies (Klein and Manning, 2002; McDonald and Pereira, 2006). However, in the data-driven parsing setting this can be partially averted by incorporating rich feature representations over the input (McDonald et al., 2005a). The goa... |

5 | The k best spanning arborescences of a network (Networks) - Camerini, Fratta, et al. - 1980
Citation Context ...s no longer a multi-digraph and the Chu-Liu-Edmonds algorithm can be applied directly. The new runtime is $O(|L|n^2)$. As a side note, the k-best argmax problem for digraphs can be solved in $O(kn^2)$ (Camerini et al., 1980). This can also be easily extended to the multi-digraph case for labeled parsing. 3.2 Partition Function. A common step in many learning algorithms is to compute the sum over the weight of all the poss... |

1 | Graph branch algorithm: An optimum tree search method for scored dependency graph with arc co-occurrence constraints - Hirakawa - 2006
Citation Context ...dge-factored assumption, including both approximate methods (McDonald and Pereira, 2006) and exact methods through integer linear programming (Riedel and Clarke, 2006) or branch-and-bound algorithms (Hirakawa, 2006). For grammar-based models there has been limited work on empirical systems for non-projective parsing; notable exceptions include the work of Wang and Harper (2004). Theoretical studies of n... |

1 | Bayes risk minimization in natural language parsing (University of Geneva technical report) - Titov, Henderson - 2006
Citation Context ...$T \in \mathcal{T}(G_x)$ $\sum_{T' \in \mathcal{T}(G_x)} w(T')\,R(T, T')$ where $R$ is a risk function measuring the error between two graphs. Min-risk decoding has been studied for both phrase-structure parsing and dependency parsing (Titov and Henderson, 2006). In that work, as is common with many min-risk decoding schemes, $\mathcal{T}(G_x)$ is not the entire space of parse structures. Instead, this set is usually restricted to a small number of possible trees that ... |