Results 1 - 10
of
16
Memory-based dependency parsing
- In Proceedings of CoNLL
, 2004
"... In order to realize the full potential of dependency-based syntactic parsing, it is desirable to allow non-projective dependency structures. We show how a datadriven deterministic dependency parser, in itself restricted to projective structures, can be combined with graph transformation techniques t ..."
Abstract
-
Cited by 153 (32 self)
- Add to MetaCart
In order to realize the full potential of dependency-based syntactic parsing, it is desirable to allow non-projective dependency structures. We show how a datadriven deterministic dependency parser, in itself restricted to projective structures, can be combined with graph transformation techniques to produce non-projective structures. Experiments using data from the Prague Dependency Treebank show that the combined system can handle nonprojective constructions with a precision sufficient to yield a significant improvement in overall parsing accuracy. This leads to the best reported performance for robust non-projective parsing of Czech. 1
Maltparser: A language-independent system for data-driven dependency parsing
- In Proc. of the Fourth Workshop on Treebanks and Linguistic Theories
, 2005
"... ..."
Dependency parsing and domain adaptation with LR models and parser ensembles
- In Proceedings of the Eleventh Conference on Computational Natural Language Learning
, 2007
"... We present a data-driven variant of the LR algorithm for dependency parsing, and extend it with a best-first search for probabilistic generalized LR dependency parsing. Parser actions are determined by a classifier, based on features that represent the current state of the parser. We apply this pars ..."
Abstract
-
Cited by 29 (5 self)
- Add to MetaCart
We present a data-driven variant of the LR algorithm for dependency parsing, and extend it with a best-first search for probabilistic generalized LR dependency parsing. Parser actions are determined by a classifier, based on features that represent the current state of the parser. We apply this parsing framework to both tracks of the CoNLL 2007 shared task, in each case taking advantage of multiple models trained with different learners. In the multilingual track, we train three LR models for each of the ten languages, and combine the analyses obtained with each individual model with a maximum spanning tree voting scheme. In the domain adaptation track, we use two models to parse unlabeled data in the target domain to supplement the labeled out-ofdomain training set, in a scheme similar to one iteration of co-training. 1
Annealing structural bias in multilingual weighted grammar induction
- In Proc. ACL
, 2006
"... We first show how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples (Klein and Manning, 2004). Next, by annealing the free parameter that controls this bias, we achieve further improvements. We then des ..."
Abstract
-
Cited by 26 (7 self)
- Add to MetaCart
We first show how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples (Klein and Manning, 2004). Next, by annealing the free parameter that controls this bias, we achieve further improvements. We then describe an alternative kind of structural bias, toward “broken ” hypotheses consisting of partial structures over segmented sentences, and show a similar pattern of improvement. We relate this approach to contrastive estimation (Smith and Eisner, 2005a), apply the latter to grammar induction in six languages, and show that our new approach improves accuracy by 1–17 % (absolute) over CE (and 8–30% over EM), achieving to our knowledge the best results on this task to date. Our method, structural annealing, is a general technique with broad applicability to hidden-structure discovery problems. 1
Structured prediction models via the matrix-tree theorem
- In EMNLP-CoNLL
, 2007
"... This paper provides an algorithmic framework for learning statistical models involving directed spanning trees, or equivalently non-projective dependency structures. We show how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
This paper provides an algorithmic framework for learning statistical models involving directed spanning trees, or equivalently non-projective dependency structures. We show how partition functions and marginals for directed spanning trees can be computed by an adaptation of Kirchhoff’s Matrix-Tree Theorem. To demonstrate an application of the method, we perform experiments which use the algorithm in training both log-linear and max-margin dependency parsers. The new training methods give improvements in accuracy over perceptron-trained models. 1
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
"... This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likel ..."
Abstract
-
Cited by 20 (7 self)
- Add to MetaCart
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
Single Malt or Blended? A study in Multilingual Parser Optimization
- In Proc. of the Joint Conf. on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL
, 2007
"... We describe a two-stage optimization of the MaltParser system for the ten languages in the multilingual track of the CoNLL 2007 shared task on dependency parsing. The first stage consists in tuning a single-parser system for each language by optimizing parameters of the parsing algorithm, the featur ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
We describe a two-stage optimization of the MaltParser system for the ten languages in the multilingual track of the CoNLL 2007 shared task on dependency parsing. The first stage consists in tuning a single-parser system for each language by optimizing parameters of the parsing algorithm, the feature model, and the learning algorithm. The second stage consists in building an ensemble system that combines six different parsing strategies, extrapolating from the optimal parameters settings for each language. When evaluated on the official test sets, the ensemble system significantly outperforms the single-parser system and achieves the highest average labeled attachment score. 1
Vine parsing and minimum risk reranking for speed and precision
- In CoNLL
, 2006
"... We describe our entry in the CoNLL-X shared task. The system consists of three phases: a probabilistic vine parser (Eisner and N. Smith, 2005) that produces unlabeled dependency trees, a probabilistic relation-labeling model, and a discriminative minimum risk reranker (D. Smith and Eisner, 2006). Th ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We describe our entry in the CoNLL-X shared task. The system consists of three phases: a probabilistic vine parser (Eisner and N. Smith, 2005) that produces unlabeled dependency trees, a probabilistic relation-labeling model, and a discriminative minimum risk reranker (D. Smith and Eisner, 2006). The system is designed for fast training and decoding and for high precision. We describe sources of crosslingual error and ways to ameliorate them. We then provide a detailed error analysis of parses produced for sentences in German (much training data) and Arabic (little training data). 1
Discriminative Learning and Spanning Tree Algorithms for Dependency Parsing
, 2006
"... In this thesis we develop a discriminative learning method for dependency parsing using
online large-margin training combined with spanning tree inference algorithms. We will
show that this method provides state-of-the-art accuracy, is extensible through the feature
set and can be implemented effici ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
In this thesis we develop a discriminative learning method for dependency parsing using
online large-margin training combined with spanning tree inference algorithms. We will
show that this method provides state-of-the-art accuracy, is extensible through the feature
set and can be implemented efficiently. Furthermore, we display the language independent
nature of the method by evaluating it on over a dozen diverse languages as well as show its
practical applicability through integration into a sentence compression system.
We start by presenting an online large-margin learning framework that is a generaliza-
tion of the work of Crammer and Singer [34, 37] to structured outputs, such as sequences
and parse trees. This will lead to the heart of this thesis – discriminative dependency pars-
ing. Here we will formulate dependency parsing in a spanning tree framework, yielding
efficient parsing algorithms for both projective and non-projective tree structures. We will
then extend the parsing algorithm to incorporate features over larger substructures with-
out an increase in computational complexity for the projective case. Unfortunately, the
non-projective problem then becomes NP-hard so we provide structurally motivated ap-
proximate algorithms. Having defined a set of parsing algorithms, we will also define a
rich feature set and train various parsers using the online large-margin learning framework.
We then compare our trained dependency parsers to other state-of-the-art parsers on 14
diverse languages: Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, English, German,
Japanese, Portuguese, Slovene, Spanish, Swedish and Turkish.
Having built an efficient and accurate discriminative dependency parser, this thesis will
then turn to improving and applying the parser. First we will show how additional re-
sources can provide useful features to increase parsing accuracy and to adapt parsers to
new domains. We will also argue that the robustness of discriminative inference-based
learning algorithms lend themselves well to dependency parsing when feature representa-
tions or structural constraints do not allow for tractable parsing algorithms. Finally, we
integrate our parsing models into a state-of-the-art sentence compression system to show
its applicability to a real world problem.
Covariance in Unsupervised Learning of Probabilistic Grammars
"... Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, generalpurpose learn ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, generalpurpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar’s parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data.

