Results 1 - 4 of 4
Is the End of Supervised Parsing in Sight?
- Proc. of the 45th Meeting of the ACL, 2007
Cited by 6 (0 self)
How far can we get with unsupervised parsing if we make our training corpus several orders of magnitude larger than has hitherto been attempted? We present a new algorithm for unsupervised parsing using an all-subtrees model, termed U-DOP*, which parses directly with packed forests of all binary trees. We train both on Penn’s WSJ data and on the (much larger) NANC corpus, showing that U-DOP* outperforms a treebank-PCFG on the standard WSJ test set. While U-DOP* performs worse than state-of-the-art supervised parsers on hand-annotated sentences, we show that the model outperforms supervised parsers when evaluated as a language model in syntax-based machine translation on Europarl. We argue that supervised parsers miss the fluidity between constituents and non-constituents and that in the field of syntax-based language modeling the end of supervised parsing has come in sight.
Unsupervised Syntax-Based Machine Translation
2007
Cited by 3 (1 self)
We present a new unsupervised syntax-based MT system, termed U-DOT, which uses the unsupervised U-DOP model for learning paired trees, and which computes the most probable target sentence from the relative frequencies of paired subtrees. We test U-DOT on the German-English Europarl corpus, showing that it outperforms the state-of-the-art phrase-based Pharaoh system. We demonstrate that the inclusion of noncontiguous phrases significantly improves the translation accuracy. This paper presents the first translation results with the data-oriented translation (DOT) model on the Europarl corpus, to the best of our knowledge.
Introduction: Phrase-Based vs Syntax-Based Machine Translation
Phrase-based and syntax-based methods in MT have complementary strengths and shortcomings. While phrase-based methods have been highly successful
Tabulation of Automata for Tree Adjoining Languages
1999
Cited by 1 (0 self)
We try to provide a common framework to clarify the relationships between different automata and their associated tabulation techniques for Tree Adjoining Languages, a subclass of Mildly Context-Sensitive Languages. We have chosen Logic Push-down Automata working with Linear Indexed Grammars as a starting point. Several tabulation techniques for different parsing strategies are proposed and compared with previous approaches.
1 Introduction
The class of Mildly Context-Sensitive Languages (MCSL) is placed between context-free languages and context-sensitive languages. An important subclass in MCSL is that of Tree Adjoining Languages, which can be described by several grammar formalisms which have been shown to be equivalent with respect to their weak generative capacity (Vijay-Shanker and Weir, 1994): Tree Adjoining Grammars (Joshi and Schabes, 1997), Linear Indexed Grammars (Gazdar, 1987), Head Grammars (Pollard, 1984) and Combinatory Categorial Grammars (Steedman, 1986). Several parsi...
From exemplar to grammar: Integrating analogy and probability in language learning
2008
Cited by 1 (1 self)
We present a new model of language learning which is based on the following idea: if a language learner does not know which phrase-structure trees should be assigned to initial sentences, s/he allows (implicitly) for all possible trees and lets linguistic experience decide which is the ‘best’ tree for each sentence. The best tree is obtained by maximizing ‘structural analogy’ between a sentence and previous sentences, which is formalized by the most probable shortest combination of subtrees from all trees of previous sentences. Corpus-based experiments with this model on the Penn Treebank and the Childes database indicate that it can learn both exemplar-based and rule-based aspects of language, ranging from phrasal verbs to auxiliary fronting. By having learned the syntactic structures of sentences, we have also learned the grammar implicit in these structures, which can in turn be used to produce new sentences. We show that our model mimics children’s language development from item-based constructions to abstract constructions, and that the model can simulate some of the errors made by children in producing complex questions.

