## Online Learning of Approximate Dependency Parsing Algorithms (2006)


Venue: Proc. of EACL

Citations: 161 (9 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Mcdonald06onlinelearning,
  author    = {Ryan McDonald and Fernando Pereira},
  title     = {Online Learning of Approximate Dependency Parsing Algorithms},
  booktitle = {Proc. of EACL},
  year      = {2006},
  pages     = {81--88}
}
```

### Abstract

In this paper we extend the maximum spanning tree (MST) dependency parsing framework of McDonald et al. (2005c) to incorporate higher-order feature representations and allow dependency structures with multiple parents per word. We show that those extensions can make the MST framework computationally intractable, but that the intractability can be circumvented with new approximate parsing algorithms. We conclude with experiments showing that discriminative online learning using those approximate algorithms achieves the best reported parsing accuracy for Czech and Danish.

### Citations

903 | Learning Bayesian networks: The combination of knowledge and statistical data - Heckerman, Geiger, et al. - 1995

Citation Context: ...phs involving multiple parents are well established in the literature (Hudson, 1984). Unfortunately, the problem of finding the dependency structure with highest score in this setting is intractable (Chickering et al., 1994). To create an approximate parsing algorithm for dependency structures with multiple parents, we start with our approximate second-order nonprojective algorithm outlined in Figure 4. We use the non-p...

822 | A Maximum-Entropy-Inspired Parser - Charniak

Citation Context: ...atures to be defined over single attachment decisions. Previous work has shown that conditioning on neighboring decisions can lead to significant improvements in accuracy (Yamada and Matsumoto, 2003; Charniak, 2000). In this paper we extend the MST parsing framework to incorporate higher-order feature representations of bounded-size connected subgraphs. We also present an algorithm for acyclic dependency graphs...

442 | A maximum entropy model for part-of-speech tagging - Ratnaparkhi - 1996

382 | Dependency Syntax: Theory and Practice - Mel’čuk - 1988

271 | Non-projective dependency parsing using spanning tree algorithms - McDonald, Pereira, et al. - 2005

255 | Three New Probabilistic Models for Dependency Parsing: An Exploration - Eisner - 1996

Citation Context: ...le dependency structure. proposed by McDonald et al. (2005c). This formulation leads to efficient parsing algorithms for both projective and non-projective dependency trees with the Eisner algorithm (Eisner, 1996) and the Chu-Liu-Edmonds algorithm (Chu and Liu, 1965; Edmonds, 1967) respectively. The formulation works by defining the score of a dependency tree to be the sum of edge scores, s(x, y) = Σ_{(i,j)∈y} s(i, j), where x = x1 · · · xn is an input sentence and y a dependency...
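
The edge-factored scoring in the context above can be sketched in a few lines: a tree's score is just the sum of its edge scores. This is an illustrative toy, not the paper's implementation; the score table `s` and helper name are assumptions.

```python
# Edge-factored scoring sketch: score of a dependency tree y for
# sentence x is the sum of the scores s(i, j) over its edges (i, j).

def tree_score(edges, s):
    """Score a dependency tree given as a set of (head, modifier) pairs."""
    return sum(s[(i, j)] for (i, j) in edges)

# Toy sentence of two words, with word index 0 as the artificial root.
s = {(0, 1): 2.0, (1, 2): 1.5, (0, 2): 0.5, (2, 1): 0.1}
tree = {(0, 1), (1, 2)}      # root -> w1 -> w2
print(tree_score(tree, s))   # 3.5
```

Because the score factors over edges, finding the best tree reduces to a maximum spanning tree problem, which is what makes the Eisner and Chu-Liu-Edmonds algorithms applicable.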

249 | Ultraconservative online algorithms for multiclass problems - Crammer, Singer - 2003

Citation Context: ...a training set T = {(x_t, y_t)}_{t=1}^{T}, consisting of pairs of a sentence x_t and its correct dependency representation y_t. The algorithm is an extension of the Margin Infused Relaxed Algorithm (MIRA) (Crammer and Singer, 2003) to learning with structured outputs, in the present case dependency structures. Figure 6 gives pseudo-code for the algorithm. An online learning algorithm considers a single training instance for ea...

225 | Online large-margin training of dependency parsers - McDonald, Crammer, et al. - 2005

220 | Memory-based dependency parsing - Nivre, Hall, et al. - 2004

Citation Context: ...nstraint. We can easily motivate this approximation by observing that even in non-projective languages like Czech and Danish, most trees are primarily projective with just a few non-projective edges (Nivre and Nilsson, 2005). Thus, by starting with the highest scoring projective tree, we are typically only a small number of transformations away from the highest scoring non-projective tree. The algorithm is shown in Figu...
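
The approximation described above, starting from the best projective tree and greedily applying head changes that improve the score, can be sketched as a simple hill-climbing loop. This is a hedged illustration under first-order (edge-factored) scoring, not the paper's exact second-order algorithm; all names are assumptions.

```python
# Greedy post-processing sketch: repeatedly apply the single head
# change that most improves the total edge score, while keeping the
# graph a tree (no cycles), until no change helps.

def hill_climb(heads, s, n):
    """heads[m] = current head of word m in 1..n; 0 is the root."""

    def creates_cycle(m, new_head):
        # Walk up from the proposed head; a cycle appears iff we hit m.
        h = new_head
        while h != 0:
            if h == m:
                return True
            h = heads[h]
        return False

    improved = True
    while improved:
        improved = False
        best_gain, best = 0.0, None
        for m in range(1, n + 1):
            for h in range(0, n + 1):
                if h == m or h == heads[m] or creates_cycle(m, h):
                    continue
                gain = s[(h, m)] - s[(heads[m], m)]
                if gain > best_gain:
                    best_gain, best = gain, (m, h)
        if best is not None:
            m, h = best
            heads[m] = h
            improved = True
    return heads
```

Since most trees in Czech and Danish are nearly projective, only a few such transformations are typically needed to reach the highest-scoring non-projective tree.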

177 | Optimum branchings - Edmonds - 1967

Citation Context: ...ormulation leads to efficient parsing algorithms for both projective and non-projective dependency trees with the Eisner algorithm (Eisner, 1996) and the Chu-Liu-Edmonds algorithm (Chu and Liu, 1965; Edmonds, 1967) respectively. The formulation works by defining the score of a dependency tree to be the sum of edge scores, s(x, y) = Σ_{(i,j)∈y} s(i, j), where x = x1 · · · xn is an input sentence and y a dependency...

140 | Incremental parsing with the perceptron algorithm - Collins, Roark - 2004

Citation Context: ...ust even with approximate rather than exact inference in problems such as word alignment (Moore, 2005), sequence analysis (Daumé and Marcu, 2005; McDonald et al., 2005a) and phrase-structure parsing (Collins and Roark, 2004). This robustness to approximations comes from the fact that the online framework sets weights with respect to inference. In other words, the learning method sees common errors due to ...

138 | Max-margin parsing - Taskar, Klein, et al. - 2004

Citation Context: ...he current weight setting. Past work on tree-structured outputs has used constraints for the k-best scoring tree (McDonald et al., 2005b) or even all possible trees by using factored representations (Taskar et al., 2004; McDonald et al., 2005c). However, we have found that a single margin constraint per example leads to much faster training with a negligible degradation in performance. Furthermore, this formulation ...

125 | A statistical parser for Czech - Collins, Ramshaw, et al. - 1999

114 | On the shortest arborescence of a directed graph - Chu, Liu - 1965

Citation Context: ...al. (2005c). This formulation leads to efficient parsing algorithms for both projective and non-projective dependency trees with the Eisner algorithm (Eisner, 1996) and the Chu-Liu-Edmonds algorithm (Chu and Liu, 1965; Edmonds, 1967) respectively. The formulation works by defining the score of a dependency tree to be the sum of edge scores, s(x, y) = Σ_{(i,j)∈y} s(i, j), where x = x1 · · · xn is an input sentence and...

64 | A discriminative framework for bilingual word alignment - Moore - 2005

Citation Context: ...rceptron algorithm for structured outputs (Collins, 2002). Online learning algorithms have been shown to be robust even with approximate rather than exact inference in problems such as word alignment (Moore, 2005), sequence analysis (Daumé and Marcu, 2005; McDonald et al., 2005a) and phrase-structure parsing (Collins and Roark, 2004). This robustness to approximations comes from the fact that the online frame...

38 | Learning as search optimization: Approximate large margin methods for structured prediction - Daumé, Marcu - 2005

Citation Context: ...puts (Collins, 2002). Online learning algorithms have been shown to be robust even with approximate rather than exact inference in problems such as word alignment (Moore, 2005), sequence analysis (Daumé and Marcu, 2005; McDonald et al., 2005a) and phrase-structure parsing (Collins and Roark, 2004). This robustness to approximations comes from the fact that the online framework sets weights with respect to inference...

30 | Flexible text segmentation with structured multilabel classification - McDonald, Crammer, et al. - 2005

23 | Corrective modeling for non-projective dependency parsing - Hall, Novák - 2005

13 | The Danish Dependency Treebank and the DTAG treebank tool - Kromann - 2003

Citation Context: ...dary Parents Kromann (2001) argued for a dependency formalism called Discontinuous Grammar and annotated a large set of Danish sentences using this formalism to create the Danish Dependency Treebank (Kromann, 2003). The formalism allows for a ... [Figure 5: an example dependency tree from the Danish Dependency Treebank: "Han spejder efter og ser elefanterne" / "He looks for and sees elephants"]

9 | Learning Bayesian networks: The combination of knowledge and statistical data - Chickering - 1994

Citation Context: ...phs involving multiple parents are well established in the literature (Hudson, 1984). Unfortunately, the problem of finding the dependency structure with highest score in this setting is intractable (Chickering et al., 1994). To create an approximate parsing algorithm for dependency structures with multiple parents, we start with our approximate second-order nonprojective algorithm outlined in Figure 4. We use the non-p...

3 | The Prague Dependency Treebank 1.0 CDROM. Linguistics Data Consortium Cat - Hajič, Hajicova, et al. - 2001

Citation Context: ...particular, the complete tree metric is improved considerably. 5.2 Czech Results For the Czech data, we used the predefined training, development and testing split of the Prague Dependency Treebank (Hajič et al., 2001), and the automatically generated POS tags supplied with the data, which we reduce to the POS tag set from Collins et al. (1999). On average, 23% of the sentences in the training, development and tes...

2 | Optimality parsing and local cost functions in discontinuous grammars - Kromann - 2003

1 | Corrective modeling for non-projective dependency parsing - Hall, Novák - 2005