Generalized Higher-Order Dependency Parsing with Cube Pruning
2012
Cited by 12 (2 self)
State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation of the third-order model of Koo et al. (2010).
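The core step of cube pruning mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's parser: it only shows the best-first exploration of combinations of two score lists sorted in descending order. The function name `cube_prune` and the `extra` callback (standing in for a non-local feature score that depends on the pair) are hypothetical names introduced for illustration.

```python
import heapq
from typing import Callable, List, Tuple


def cube_prune(
    left: List[float],
    right: List[float],
    k: int,
    extra: Callable[[int, int], float] = lambda i, j: 0.0,
) -> List[Tuple[float, int, int]]:
    """Return up to k (score, i, j) combinations of left[i] and right[j].

    Both input lists are assumed to be sorted in descending order.
    The grid is explored best-first starting from the corner (0, 0).
    """
    if not left or not right or k <= 0:
        return []

    def score(i: int, j: int) -> float:
        # Hypothetical scoring: sum of the parts plus a pair-dependent term.
        return left[i] + right[j] + extra(i, j)

    # heapq is a min-heap, so scores are negated to pop the best first.
    frontier = [(-score(0, 0), 0, 0)]
    seen = {(0, 0)}
    popped: List[Tuple[float, int, int]] = []

    while frontier and len(popped) < k:
        neg, i, j = heapq.heappop(frontier)
        popped.append((-neg, i, j))
        # Push the two grid neighbours of the cell just popped.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(frontier, (-score(ni, nj), ni, nj))

    # With a non-zero extra term the pop order need not be sorted, so re-sort.
    popped.sort(key=lambda t: t[0], reverse=True)
    return popped
```

When `extra` is always zero the scores are monotone over the grid and the result is the exact k-best list. With a pair-dependent `extra` term the search becomes approximate, which is the trade-off the abstract alludes to when it pairs richer features with pruning.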
Data-driven parsing with probabilistic linear context-free rewriting systems
In Proceedings of the 23rd International Conference on Computational Linguistics, 2010
Cited by 11 (5 self)
This paper presents a first efficient implementation of a weighted deductive CYK parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRS), together with context-summary estimates for parse items used to speed up parsing. LCFRS, an extension of CFG, can describe discontinuities both in constituency and dependency structures in a straightforward way and is therefore a natural candidate to be used for data-driven parsing. We evaluate our parser with a grammar extracted from the German NeGra treebank. Our experiments show that data-driven LCFRS parsing is feasible with a reasonable speed and yields output of competitive quality.
Efficient Parsing of Well-Nested Linear Context-Free Rewriting Systems
Cited by 11 (3 self)
The use of well-nested linear context-free rewriting systems has been empirically motivated for modeling of the syntax of languages with discontinuous constituents or relatively free word order. We present a chart-based parsing algorithm that asymptotically improves the known running time upper bound for this class of rewriting systems. Our result is obtained through a linear space construction of a binary normal form for the grammar at hand.
Parsing Mildly Non-Projective Dependency Structures
Cited by 6 (3 self)
We present parsing algorithms for various mildly non-projective dependency formalisms. In particular, algorithms are presented for: all well-nested structures of gap degree at most 1, with the same complexity as the best existing parsers for constituency formalisms of equivalent generative power; all well-nested structures with gap degree bounded by any constant k; and a new class of structures with gap degree up to k that includes some ill-nested structures. The third case includes all the gap degree k structures in a number of dependency treebanks.
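To make the notion of gap degree used in this abstract concrete, here is a minimal sketch that computes it for a single dependency tree. It assumes the standard definition (the number of discontinuities in a node's yield, maximized over all nodes) and a hypothetical input encoding in which `heads[i]` is the head of the word at position `i + 1` and `0` marks the root. It is an illustration only, not the parsing algorithm of the paper.

```python
from typing import Dict, List, Set


def gap_degree(heads: List[int]) -> int:
    """Gap degree of a dependency tree given as a list of head positions.

    heads[i] is the head of word i + 1 (positions are 1-based); 0 is the
    artificial root. The input is assumed to be a well-formed tree.
    """
    n = len(heads)
    children: Dict[int, List[int]] = {i: [] for i in range(n + 1)}
    for dep, head in enumerate(heads, start=1):
        children[head].append(dep)

    def yield_of(node: int) -> Set[int]:
        # A node's yield is the node itself plus everything it dominates.
        result = {node}
        for child in children[node]:
            result |= yield_of(child)
        return result

    worst = 0
    for node in range(1, n + 1):
        positions = sorted(yield_of(node))
        # Each jump of more than one position is one gap in the yield.
        gaps = sum(1 for a, b in zip(positions, positions[1:]) if b - a > 1)
        worst = max(worst, gaps)
    return worst
```

A projective tree has gap degree 0. For example, `gap_degree([3, 0, 2, 2])` is 1, because word 3 dominates word 1 while word 2 lies between them and outside that yield.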
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Cited by 5 (2 self)
Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the fact that both DOP and discontinuity present formidable challenges in terms of computational complexity, the model is reasonably efficient, and surpasses the state of the art in discontinuous parsing.
Optimal Parsing Strategies for Linear Context-Free Rewriting Systems
Cited by 5 (0 self)
Factorization is the operation of transforming a production in a Linear Context-Free Rewriting System (LCFRS) into two simpler productions by factoring out a subset of the nonterminals on the production’s right-hand side. Factorization lowers the rank of a production but may increase its fan-out. We show how to apply factorization in order to minimize the parsing complexity of the resulting grammar, and study the relationship between rank, fan-out, and parsing complexity. We show that it is always possible to obtain optimum parsing complexity with rank two. However, among transformed grammars of rank two, minimum parsing complexity is not always possible with minimum fan-out. Applying our factorization algorithm to LCFRS rules extracted from dependency treebanks allows us to find the most efficient parsing strategy for the syntactic phenomena found in non-projective trees.
Direct parsing of discontinuous constituents in German
In Proceedings of the SPMRL workshop at NAACL HLT 2010, 2010
Cited by 4 (1 self)
Discontinuities occur especially frequently in languages with a relatively free word order, such as German. Generally, due to the long-distance dependencies they induce, they lie beyond the expressivity of Probabilistic CFG, i.e., they cannot be directly reconstructed by a PCFG parser. In this paper, we use a parser for Probabilistic Linear Context-Free Rewriting Systems (PLCFRS), a formalism with high expressivity, to directly parse the German NeGra and TIGER treebanks. In both treebanks, discontinuities are annotated with crossing branches. Based on an evaluation using different metrics, we show that an output quality can be achieved which is comparable to the output quality of PCFG-based systems.
In most constituency treebanks, sentence annotation is restricted to having the shape of trees without crossing branches, and the non-local dependencies induced by the discontinuities are modeled by an additional mechanism. In the Penn Treebank (PTB) (Marcus et al., 1994), e.g., this mechanism is a combination of special labels and empty nodes, establishing implicit additional edges. In the German TüBa-D/Z (Telljohann et al., 2006), additional edges are established by a combination of topological field annotation and special edge labels. As an example, Fig. 1 shows a tree from TüBa-D/Z with the annotation of (1). Note here the edge label ON-MOD on the relative clause which indicates that the subject of the sentence (alle Attribute) is modified.
Optimal rank reduction for linear context-free rewriting systems with fan-out two
In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 2010
Cited by 4 (2 self)
Linear Context-Free Rewriting Systems (LCFRSs) are a grammar formalism capable of modeling discontinuous phrases. Many parsing applications use LCFRSs where the fan-out (a measure of the discontinuity of phrases) does not exceed 2. We present an efficient algorithm for optimal reduction of the length of production right-hand side in LCFRSs with fan-out at most 2. This results in asymptotical running time improvement for known parsing algorithms for this class.
Discontinuity and non-projectivity: Using mildly context-sensitive formalisms for data-driven parsing
In Proceedings of TAG, 2010
Cited by 3 (1 self)
We present a parser for probabilistic Linear Context-Free Rewriting Systems and use it for constituency and dependency treebank parsing. The choice of LCFRS, a formalism with an extended domain of locality, enables us to model discontinuous constituents and non-projective dependencies in a straightforward way. The parsing results show that, firstly, our parser is efficient enough to be used for data-driven parsing and, secondly, its result quality for constituency parsing is comparable to the output quality of other state-of-the-art results, all while yielding structures that display discontinuous dependencies.
An Optimal-Time Binarization Algorithm for Linear Context-Free Rewriting Systems with Fan-Out Two
Cited by 3 (0 self)
Linear context-free rewriting systems (LCFRSs) are grammar formalisms with the capability of modeling discontinuous constituents. Many applications use LCFRSs where the fan-out (a measure of the discontinuity of phrases) is not allowed to be greater than 2. We present an efficient algorithm for transforming LCFRS with fan-out at most 2 into a binary form, whenever this is possible. This results in asymptotical run-time improvement for known parsing algorithms for this class.