## Extending DOP1 with the Insertion Operation (2000)

### Abstract

In Data-Oriented Parsing (DOP), an annotated corpus is used as a stochastic grammar. The most probable analysis of a new input sentence is constructed by combining sub-analyses from the corpus in the most probable way. This thesis presents a model in which the DOP1 model, as developed by Bod, is enriched with the insertion operation, thus yielding a stochastic Tree Insertion Grammar (TIG) instead of a stochastic Tree Substitution Grammar. TIG is related to Tree-Adjoining Grammar. Since the adjunction permitted in TIG is restricted, TIG can embed the elegance of the analyses found in Tree-Adjoining Grammar without allowing for context-sensitive languages. In addition to presenting the model, the thesis reports on experiments that measure the disambiguation accuracy of the model on the ATIS domain. Furthermore, the thesis shows that the Monte Carlo sampling algorithm used in DOP1 to select the most probable parse from the parse forest does not always sample a unique random derivation. A correct and more efficient algorithm has been developed.
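As a rough illustration of why DOP1 selects the most probable parse by sampling rather than simply taking the best derivation, the sketch below uses a hypothetical toy set of derivations. It assumes the standard DOP1 probability model: a derivation's probability is the product of its fragment probabilities, and a parse's probability is the sum over all derivations that yield it. The data, the function names, and the flat list standing in for a parse forest are all assumptions made for illustration. This is not the sampling algorithm from the thesis, and it does not model the insertion operation.

```python
import random
from collections import Counter, defaultdict
from math import prod

# Hypothetical toy data: each entry is one derivation, given as the label of
# the parse tree it yields and the probabilities of the fragments it combines.
DERIVATIONS = [
    ("parse_A", [0.5, 0.4]),  # derivation probability 0.20
    ("parse_A", [0.5, 0.3]),  # derivation probability 0.15
    ("parse_B", [0.5, 0.5]),  # derivation probability 0.25
]

def derivation_probability(fragment_probs):
    """Product of the probabilities of the fragments in one derivation."""
    return prod(fragment_probs)

def parse_probabilities(derivations):
    """Sum derivation probabilities per parse tree."""
    totals = defaultdict(float)
    for parse, fragment_probs in derivations:
        totals[parse] += derivation_probability(fragment_probs)
    return dict(totals)

def most_probable_derivation(derivations):
    """Parse label of the single highest-probability derivation."""
    return max(derivations, key=lambda d: derivation_probability(d[1]))[0]

def sample_most_probable_parse(derivations, n_samples=20000, seed=0):
    """Estimate the most probable parse by sampling derivations.

    Derivations are drawn in proportion to their probability; the parse
    that is generated most often is returned as the estimate.
    """
    rng = random.Random(seed)
    parses = [parse for parse, _ in derivations]
    weights = [derivation_probability(fp) for _, fp in derivations]
    draws = rng.choices(parses, weights=weights, k=n_samples)
    return Counter(draws).most_common(1)[0][0]
```

In this toy data the single best derivation yields `parse_B`, while the summed probability favours `parse_A`, which is the situation that motivates sampling over derivations. A real implementation would sample from the parse forest rather than enumerate derivations explicitly.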

