## Extending DOP1 with the Insertion Operation (2000)

Citations: 5 (0 self)

### BibTeX

```bibtex
@TECHREPORT{Hoogweg00extendingdop1,
  author      = {Lars Hoogweg},
  title       = {Extending DOP1 with the Insertion Operation},
  institution = {},
  year        = {2000}
}
```

### Abstract

In Data-Oriented Parsing (DOP) an annotated corpus is used as a stochastic grammar. The most probable analysis of a new input sentence is constructed by combining sub-analyses from the corpus in the most probable way. This thesis presents a model in which the DOP1 model developed by Bod is enriched with the insertion operation, yielding a Stochastic Tree Insertion Grammar (TIG) instead of a Stochastic Tree Substitution Grammar. TIG is related to Tree-Adjoining Grammar: since the adjunction permitted in TIG is restricted, TIG can capture the elegance of the analyses found in Tree-Adjoining Grammar without allowing for context-sensitive languages. In addition to presenting the model, the thesis reports on experiments measuring the disambiguation accuracy of the model on the ATIS domain. Furthermore, the thesis shows that the Monte Carlo sampling algorithm used in DOP1 to select the most probable parse from the parse forest does not always sample a unique random derivation. A more efficient, correct algorithm has been developed.

### Citations

2105 | Building a large annotated corpus of English: the Penn Treebank
- Marcus, Santorini
- 1994
Citation Context ...two big advantages of this kind of representation are that it is very simple and that it is the kind of representation that is assumed in readily available annotated corpora such as the Penn Treebank [MSM93]. The representation system is in fact the competence grammar that the system assumes; it defines the set of possible analyses. It is not necessary to define this set very narrowly. Regularities may be ... |

632 | Synchronous tree adjoining grammars
- Shieber, Schabes
- 1990
Citation Context ... SW93a]. 4.1 The Grammar Tree Insertion Grammar (TIG) [SW94, SW93a] is a tree generating system that makes use of tree substitution and tree adjunction. TIG is related to Tree-Adjoining Grammar (TAG) [Jos87]. However, the adjunction permitted in TIG is sufficiently restricted that TIGs only derive context-free languages and TIGs have the same cubic-time worst-case complexity bounds for recognition and parsi... |

503 | Three generative, lexicalised models for statistical parsing - Collins - 1997 |

409 |
Monte Carlo Methods
- Hammersley, Handscomb
- 1975
Citation Context ... Law of Large Numbers, the most often generated parse converges to the most probable parse. Methods that estimate the probability of an event by taking random samples are known as Monte Carlo methods [HH64]. 6.2 DOP1 Sampling Method For simplicity the only operation that will be assumed by the algorithms in this chapter is the substitution operation. The algorithms can straightforwardly be extended to i... |
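The context above describes the Monte Carlo idea: estimate the most probable outcome of a random process by sampling it repeatedly and returning the most frequent result. A minimal illustrative sketch, with a toy distribution standing in for "sampling a random derivation and reading off its parse" (all names and probabilities here are invented for illustration, not taken from the thesis):

```python
import random
from collections import Counter

def most_frequent_outcome(sample_one, n_samples=1000, seed=0):
    """Monte Carlo estimate: draw n_samples outcomes and return the
    one observed most often (converges to the mode by the LLN)."""
    rng = random.Random(seed)
    counts = Counter(sample_one(rng) for _ in range(n_samples))
    return counts.most_common(1)[0][0]

# Toy distribution: outcome "A" has probability 0.6, "B" 0.3, "C" 0.1.
def toy_parse(rng):
    r = rng.random()
    return "A" if r < 0.6 else ("B" if r < 0.9 else "C")

print(most_frequent_outcome(toy_parse))  # "A", the most probable outcome
```

With enough samples the most frequent outcome is the most probable one; the thesis's point is that the sampler feeding this loop must itself draw each derivation with the correct probability.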

201 | Maximum entropy models for natural language ambiguity resolution
- Ratnaparkhi
- 1998
Citation Context ...ously adjoined on a single node. Therefore, a uniform distribution will be assumed for the probabilities of the different ordered derivations belonging to a single derivation. This way maximum entropy [Rat98] is ensured. A more realistic probability distribution could be inferred when a TIG (or TAG) annotated corpus is used, containing derivational information. The probability of an ordered derivation d b... |

153 | Mathematical and Computational Aspects of Lexicalized Grammars - Schabes - 1990 |

128 |
Algorithm schemata and data structures in syntactic processing
- Kay
- 1986
Citation Context ...o after sampling the chart still contains two derivations. 1 In [Bod95] Bod gives a similar algorithm in which subderivations are sampled instead of elementary trees. 2 Visual representation based on [Kay80]. [Figure 6.1: A parse forest for the string abc] 6.3 A New Sampling Method A faster (top-down) sampling algorithm 3 which does not suffer from the problem d... |

120 | Stochastic Lexicalized Tree-Adjoining Grammars - Schabes - 1992 |

115 | Parsing strategies with 'lexicalized' grammars: application to tree adjoining grammars - Schabes - 1988 |

109 |
Enriching Linguistics with Statistics: Performance Models of Natural Language, ILLC Dissertation Series 1995-14, University of Amsterdam (obtainable via anonymous ftp: ftp://ftp.fwi.uva.nl/pub/theory/illc/dissertations/DS-95-14.text.ps.gz)
- Bod
- 1995
Citation Context ... will be discussed and implemented in Chapters 8 and 9. Chapter 2 DOP1 Parts of this chapter are taken from [BS96]. In this chapter the first instantiation of the DOP framework as developed by Bod in [Bod95] is described. This instantiation is called DOP1 and will be specified by indicating the four components as described in Section 1.2. Following the description of the model, some of its shortcomings wi... |

84 | Probabilistic tree-adjoining grammar as a framework for statistical natural language processing - Resnik - 1992 |

82 | Parsing Inside-Out
- Goodman
- 1998
Citation Context ...together with the probability mass it is sampled from. When a parse tree is sampled that has already been sampled before, the probability masses of the samples have to be totaled. After sampling 3 In [Goo98] Goodman gives another top-down sampling algorithm for sampling a random derivation from the parse forest of a (binary branching) PCFG. Mathematically, Goodman's algorithm is equivalent to Bod's. N... |

77 | Tree insertion grammar: A cubic-time parsable formalism that lexicalizes context-free grammar without changing the trees produced. Technical report, Mitsubishi Electric Research Laboratories
- Schabes, Waters
- 1994
Citation Context ...y disambiguation the selection of the most probable parse from this forest is meant. For the generation of a parse forest for an input sentence the Earley-style parsing algorithm that can be found in [SW94] can be used, which parses an input sentence of n words in O(Gn³) time. The algorithm takes as input a set of elementary trees and a sentence and produces a chart of labeled phrases. A labeled phras... |

58 | Efficient algorithms for parsing the DOP model - Goodman - 1996 |

52 |
Taaltheorie en taaltechnologie; competence en performance [language theory and language technology; competence and performance]
- Scha
- 1990
Citation Context ...number of possible analyses, but they do not account for the fact that human language comprehenders usually perceive only one or two of these analyses. A performance model, like Data-Oriented Parsing [Sch90a], on the other hand, should model the input-output properties of actual human perception. In the DOP model this is achieved by statistically enriching a linguistic competence theory. 1.2 Data-Oriented... |

49 | Complexity of Lexical Descriptions and Its Relevance to Partial Parsing - Srinivas - 1997 |

41 | Stochastic Lexicalized ContextFree Grammar - Schabes, Waters - 1993 |

33 |
Parsing with lexicalized tree adjoining grammar
- Schabes, Joshi
- 1991
Citation Context ...agment Lexicalization The lexicalization of grammar formalisms is of interest from a computational perspective, because lexicalized grammars can be parsed much more efficiently than non-lexicalized ones [SJ90]. A grammar is said to be lexicalized if it consists of: a finite set of elementary structures of finite size, each of which contains at least one lexical item, and a finite set of operations for creati... |
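The definition quoted above (a finite set of elementary structures, each containing at least one lexical item) can be checked mechanically. A minimal sketch, assuming a hypothetical encoding of elementary trees as nested tuples whose lowercase leaves are lexical anchors; the encoding and helper names are illustrative, not from the thesis:

```python
# An elementary tree as nested tuples: (label, child, child, ...).
# Leaves are one-element tuples: uppercase labels are substitution
# sites, lowercase labels are lexical items. Illustrative assumption.
def lexical_anchors(tree):
    """Collect the lexical items (lowercase leaves) of a tree."""
    label, *children = tree
    if not children:
        return [label] if label.islower() else []
    return [a for c in children for a in lexical_anchors(c)]

def is_lexicalized(grammar):
    """True iff every elementary tree has at least one lexical anchor."""
    return all(lexical_anchors(t) for t in grammar)

trees = [
    ("NP", ("DET", ("the",)), ("N",)),          # anchored by "the"
    ("S", ("NP",), ("VP", ("V", ("walks",)))),  # anchored by "walks"
]
print(is_lexicalized(trees))  # True: both trees carry a lexical item
```

A tree consisting only of nonterminal nodes, such as `("N",)`, would make `is_lexicalized` return `False`, matching the definition's requirement.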

12 |
Characterizing derivation trees of context free grammars through a generalization of finite automata theory
- Thatcher
- 1971
Citation Context ...a grammar is the set of all paths from root to frontier in the trees generated by the grammar. The path set is a set of strings over Σ ∪ NT ∪ {ε}.) The path sets for CFG and TSG are regular languages [Tha71]. In contrast, just as for TAG, the path sets for TIG are context-free languages. To see this, consider that adjunction makes it possible to embed a sequence of nodes (the spine of the auxiliary tree)... |

10 | Stochastic Lexicalized Tree-Insertion Grammar - Schabes, Waters - 1996 |

8 |
Data-oriented language processing: An overview
- Bod, Scha
- 1996
Citation Context ... Chapter 1 Introduction Parts of this chapter are taken from [BS96]. 1.1 Competence and Performance In current linguistic theory there exists a strong division between competence and performance models of natural language perception. A linguistic competence model aim... |

5 | Two questions about data-oriented parsing - Bod - 1996 |

4 |
Beyond Grammar: An Experience Based Theory of Language. CSLI
- Bod
- 1998
Citation Context ...licity the only operation that will be assumed by the algorithms in this chapter is the substitution operation. The algorithms can straightforwardly be extended to include the insertion operation. In [Bod98] Bod gives the following algorithm 1 for sampling a random derivation from the parse forest: Sampling a random derivation in O(Gn³) time Given a derivation forest of a sentence of n words, consistin... |
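Sampling a random derivation from a parse forest can be sketched top-down: at each node of the forest, choose one derivation alternative weighted by its probability, then recurse into the chosen children. A minimal illustration over a hypothetical and/or forest (node names and probabilities are invented; this shows the general idea, not the thesis's exact algorithm):

```python
import random

# A tiny and/or parse forest: each node maps to a list of derivation
# alternatives, each a (probability, children) pair. Leaf nodes have a
# single alternative with no children. Entirely illustrative data.
FOREST = {
    "S":  [(0.7, ["NP", "VP"]), (0.3, ["S'"])],
    "S'": [(1.0, ["NP", "VP"])],
    "NP": [(1.0, [])],
    "VP": [(1.0, [])],
}

def sample_derivation(node, rng):
    """Pick one alternative at `node`, weighted by its probability,
    and recurse into the chosen children (top-down sampling)."""
    r = rng.random()
    for p, children in FOREST[node]:
        r -= p
        if r <= 0:
            break
    return (node, [sample_derivation(c, rng) for c in children])

print(sample_derivation("S", random.Random(1)))
```

Because exactly one alternative is chosen per visited node, each call returns a single well-defined derivation, which is the property the thesis shows Bod's original sampler can fail to guarantee.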

2 | An optimized algorithm for data-oriented parsing - Sima’an - 1996 |

1 | Disambiguation of super parts of speech (or supertags): Almost parsing - Joshi, Srinivas - 1994 |