Unsupervised parsing with U-DOP
In CoNLL, 2006
Cited by 9 (0 self)
We propose a generalization of the supervised DOP model to unsupervised learning. This new model, which we call U-DOP, initially assigns all possible unlabeled binary trees to a set of sentences and next uses all subtrees from (a large subset of) these binary trees to compute the most probable parse trees. We show how U-DOP can be implemented by a PCFG-reduction technique and report competitive results on English (WSJ), German (NEGRA) and Chinese (CTB) data. To the best of our knowledge, this is the first paper that accurately bootstraps structure for Wall Street Journal sentences of up to 40 words, obtaining roughly the same accuracy as a binarized supervised PCFG. We show that previous approaches to unsupervised parsing have shortcomings in that they constrain the lexical context, the structural context, or both.
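The first step described in this abstract, assigning all possible unlabeled binary trees to a sentence, can be illustrated with a minimal Python sketch. This is not the authors' implementation (the abstract states that U-DOP is implemented through a PCFG reduction); the function name `binary_trees` and the nested-tuple tree encoding are assumptions made here for illustration only.

```python
from functools import lru_cache


def binary_trees(words):
    """Return every unlabeled binary tree over `words` as nested tuples.

    Hypothetical helper for illustration: a leaf is the word itself,
    an internal node is a (left, right) pair.
    """
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(i, j):
        # All trees spanning words[i:j]; a one-word span has a single tree.
        if j - i == 1:
            return (words[i],)
        trees = []
        for k in range(i + 1, j):          # every possible split point
            for left in build(i, k):
                for right in build(k, j):
                    trees.append((left, right))
        return tuple(trees)

    return build(0, len(words))


if __name__ == "__main__":
    for tree in binary_trees("the dog barks".split()):
        print(tree)
```

The number of such trees over n words is the Catalan number C(n-1), so explicit enumeration is only feasible for short sentences; this growth is presumably why the abstract mentions using a large subset of the trees.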
Is the End of Supervised Parsing in Sight?
In Proc. of the 45th Meeting of the ACL, 2007
Cited by 6 (0 self)
How far can we get with unsupervised parsing if we make our training corpus several orders of magnitude larger than has hitherto been attempted? We present a new algorithm for unsupervised parsing using an all-subtrees model, termed U-DOP*, which parses directly with packed forests of all binary trees. We train both on Penn’s WSJ data and on the (much larger) NANC corpus, showing that U-DOP* outperforms a treebank-PCFG on the standard WSJ test set. While U-DOP* performs worse than state-of-the-art supervised parsers on hand-annotated sentences, we show that the model outperforms supervised parsers when evaluated as a language model in syntax-based machine translation on Europarl. We argue that supervised parsers miss the fluidity between constituents and non-constituents and that in the field of syntax-based language modeling the end of supervised parsing has come in sight.
Unsupervised Syntax-Based Machine Translation
2007
Cited by 3 (1 self)
We present a new unsupervised syntax-based MT system, termed U-DOT, which uses the unsupervised U-DOP model for learning paired trees, and which computes the most probable target sentence from the relative frequencies of paired subtrees. We test U-DOT on the German-English Europarl corpus, showing that it outperforms the state-of-the-art phrase-based Pharaoh system. We demonstrate that the inclusion of noncontiguous phrases significantly improves the translation accuracy. This paper presents the first translation results with the data-oriented translation (DOT) model on the Europarl corpus, to the best of our knowledge.
Introduction: Phrase-Based vs Syntax-Based Machine Translation
Phrase-based and syntax-based methods in MT have complementary strengths and shortcomings. While phrase-based methods have been highly successful
Towards Unifying Perception and Cognition: The Ubiquity of Trees
Prepublication, 2005
Cited by 1 (0 self)
Is there a single mechanism that underlies all perceptual and cognitive processing? This paper aims to solve a small part of Newell's challenge (A. Newell 1990, Unified Theories of Cognition, Harvard University Press) and proposes a model that unifies three different modalities: language, music and problem-solving. In doing so, we will focus on tree structures. Trees are ubiquitous in modeling high-level perception and cognition and have been used to represent grouping structures in linguistic, musical and visual perception and deductive structures in reasoning, learning and problem solving. We will show that an instantiation of the Data-Oriented Parsing (DOP) framework can accurately predict the correct tree structure for linguistic utterances, musical pieces and physics problems. The key idea of the DOP framework is that new input is analyzed by combining subtrees from a representative corpus of previous trees. While the labeling of the trees and the details of the combination operation may differ across the modalities, we argue that there is one model for predicting the tree that humans come up with. We report on experiments with manually annotated corpora for the three modalities, showing that the best performing model is the one which takes into account subtrees of arbitrary size and which selects the most probable tree from among the shortest derivations of an input.
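The key idea stated in this abstract, analyzing new input by combining subtrees from a corpus of previous trees, rests on extracting subtrees of arbitrary size. The Python sketch below shows one common way to enumerate such subtrees for a single tree, assuming that at each non-word child a subtree either stops (leaving a bare label as a frontier node) or continues. The tree encoding and the names `fragments_at` and `all_fragments` are hypothetical and are not taken from the paper.

```python
from itertools import product


def fragments_at(node):
    """All subtrees (fragments) rooted at `node`.

    Assumed encoding: a tree is (label, child, ...); a child is either a
    word (str) or another tree. A one-element tuple such as ("NP",)
    marks a frontier node where the fragment was cut off.
    """
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # a word is always kept
        else:
            # Either cut below this child or expand it in every possible way.
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]


def all_fragments(tree):
    """Fragments rooted at every internal node of `tree`."""
    frags = fragments_at(tree)
    for child in tree[1:]:
        if not isinstance(child, str):
            frags.extend(all_fragments(child))
    return frags


if __name__ == "__main__":
    tree = ("S", ("NP", "she"), ("VP", ("V", "saw"), ("NP", "him")))
    for frag in all_fragments(tree):
        print(frag)
```

The sketch covers extraction only. Scoring fragments by relative frequency and choosing the most probable tree from among the shortest derivations, as the abstract describes, would be separate steps.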

