Results 1 - 10 of 35
Posterior regularization for structured latent variable models
- Journal of Machine Learning Research, 2010
"... We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model co ..."
Abstract
-
Cited by 138 (8 self)
- Add to MetaCart
(Show Context)
We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, ...
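As a reading aid (my notation, not quoted from the abstract above): the posterior regularization objective is typically a penalized log-likelihood, where the penalty is the KL divergence from the model posterior to its projection onto a set Q of distributions whose expected constraint features are bounded. A minimal sketch, assuming a model p_θ(x, z) with latent z, constraint features φ, and bounds b:

```latex
% Sketch of the posterior-regularized objective (notation assumed):
% penalized likelihood, with the penalty a KL projection of the
% posterior onto the constraint set Q.
\[
J_Q(\theta) \;=\; \log p_\theta(\mathbf{x})
  \;-\; \min_{q \in Q} \mathrm{KL}\!\left( q(\mathbf{z}) \,\big\|\, p_\theta(\mathbf{z} \mid \mathbf{x}) \right),
\qquad
Q \;=\; \left\{\, q \;:\; \mathbb{E}_q\!\left[ \boldsymbol{\phi}(\mathbf{x}, \mathbf{z}) \right] \le \mathbf{b} \,\right\}.
\]
```

Because the constraints act on posterior expectations rather than on model parameters, the E-step of EM only needs one extra projection, which is what preserves the efficiency of the unconstrained model.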
Generalized expectation criteria for semi-supervised learning of conditional random fields
- In Proc. ACL, pages 870–878, 2008
"... This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distri ..."
Abstract
-
Cited by 108 (11 self)
- Add to MetaCart
This paper presents a semi-supervised training method for linear-chain conditional random fields that makes use of labeled features rather than labeled instances. This is accomplished by using generalized expectation criteria to express a preference for parameter settings in which the model’s distribution on unlabeled data matches a target distribution. We induce target conditional probability distributions of labels given features from both annotated feature occurrences in context and ad hoc feature majority label assignment. The use of generalized expectation criteria allows for a dramatic reduction in annotation time by shifting from traditional instance-labeling to feature-labeling, and the methods presented outperform traditional CRF training and other semi-supervised methods when limited human effort is available.
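As a gloss of the mechanism (notation mine, not the paper's): a generalized expectation criterion adds to the supervised CRF objective a term that pulls the model's label distribution on unlabeled tokens carrying a labeled feature toward the annotator-provided target distribution. A sketch:

```latex
% Sketch of a GE-augmented CRF objective (notation assumed):
% supervised likelihood on labeled set L, plus a divergence term
% matching model expectations to target distributions on unlabeled data.
\[
O(\theta) \;=\; \sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{L}} \log p_\theta(\mathbf{y} \mid \mathbf{x})
  \;-\; \lambda \sum_{j} \mathrm{KL}\!\left( \tilde{p}_j \,\big\|\, \hat{p}_{j,\theta} \right),
\]
```

where p̃_j is the target label distribution for labeled feature j and p̂_{j,θ} is the model's predicted label distribution, averaged over unlabeled tokens exhibiting feature j.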
Phylogenetic Grammar Induction
"... We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multiling ..."
Abstract
-
Cited by 35 (1 self)
- Add to MetaCart
(Show Context)
We present an approach to multilingual grammar induction that exploits a phylogeny-structured model of parameter drift. Our method does not require any translated texts or token-level alignments. Instead, the phylogenetic prior couples languages at a parameter level. Joint induction in the multilingual model substantially outperforms independent learning, with larger gains both from more articulated phylogenies and from increasing numbers of languages. Across eight languages, the multilingual approach gives error reductions over the standard monolingual DMV averaging 21.1% and reaching as high as 39%.
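One way to read "the phylogenetic prior couples languages at a parameter level" (a sketch in my notation; the paper's exact parameterization may differ): each language's grammar parameters are drawn near its parent's in the phylogeny, e.g.

```latex
% Hypothetical drift prior along phylogeny edges (assumed form):
\[
\boldsymbol{\eta}_{\ell} \;\sim\; \mathcal{N}\!\left( \boldsymbol{\eta}_{\mathrm{parent}(\ell)}, \; \sigma^2 I \right),
\]
```

so closely related languages are tied tightly while distant ones are only loosely coupled, consistent with the reported larger gains from more articulated phylogenies.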
Bilingually-constrained (monolingual) shift-reduce parsing
- In EMNLP, 2009
"... Jointly parsing two languages has been shown to improve accuracies on either or both sides. However, its search space is much bigger than the monolingual case, forcing existing approaches to employ complicated modeling and crude approximations. Here we propose a much simpler alternative, bilingually ..."
Abstract
-
Cited by 33 (6 self)
- Add to MetaCart
Jointly parsing two languages has been shown to improve accuracies on either or both sides. However, its search space is much bigger than in the monolingual case, forcing existing approaches to employ complicated modeling and crude approximations. Here we propose a much simpler alternative, bilingually-constrained monolingual parsing, in which a source-language parser learns to exploit reorderings as additional observations, without building the target-side tree. We show specifically how to enhance a shift-reduce dependency parser with alignment features to resolve shift-reduce conflicts. Experiments on the bilingual portion of the Chinese Treebank show that, with just 3 bilingual features, we can improve parsing accuracies by 0.6% (absolute) for both English and Chinese over a state-of-the-art baseline, with negligible (∼6%) efficiency overhead, making it much faster than biparsing.
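To make the idea concrete, here is a minimal sketch (hypothetical code, not the authors' implementation; the function names and the exact feature are my own) of resolving a shift-reduce conflict with a linear model that includes one alignment-derived feature alongside ordinary monolingual ones:

```python
# Hypothetical sketch: break SHIFT/REDUCE ties with a bilingual feature.
# `stack` and `buffer` hold source-side tokens; `align` maps a source
# token to the position of its aligned target-side word (if any).

def monotone_alignment_feature(s_top, b_front, align):
    """Assumed feature: +1 if the aligned target positions keep the
    source order (monotone), -1 if they are reordered, 0 if unaligned."""
    t1, t2 = align.get(s_top), align.get(b_front)
    if t1 is None or t2 is None:
        return 0.0
    return 1.0 if t1 < t2 else -1.0

def score_action(action, stack, buffer, align, weights):
    """Linear score: monolingual word features plus the alignment feature."""
    feats = {
        ("bias", action): 1.0,
        ("stack_word", action, stack[-1]): 1.0,
        ("buffer_word", action, buffer[0]): 1.0,
        ("monotone", action): monotone_alignment_feature(
            stack[-1], buffer[0], align),
    }
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def resolve_conflict(stack, buffer, align, weights):
    """When both actions are legal, take the higher-scoring one."""
    return max(("SHIFT", "REDUCE"),
               key=lambda a: score_action(a, stack, buffer, align, weights))
```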
Alternating projections for learning with expectation constraints
- In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2009
"... We present an objective function for learn-ing with unlabeled data that utilizes auxil-iary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alter-nate interpretation of the posterior reg ..."
Abstract
-
Cited by 30 (5 self)
- Add to MetaCart
(Show Context)
We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternate interpretation of the posterior regularization framework (Graca et al., 2008), maintains uncertainty during optimization unlike constraint-driven learning (Chang et al., 2007), and is more efficient than generalized expectation criteria (Mann & McCallum, 2008). Applications of this framework include minimally supervised learning, semi-supervised learning, and learning with constraints that are more expressive than the underlying model. In experiments, we demonstrate comparable accuracy to generalized expectation criteria for minimally supervised learning, and use expressive structural constraints to guide semi-supervised learning, providing a 3%-6% improvement over state-of-the-art constraint-driven learning.
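A sketch of the alternation described above (notation mine; assuming equality expectation constraints for simplicity): an I-projection of the current model distribution onto the constraint set, followed by an M-projection back onto the model family:

```latex
% Hypothetical sketch of the alternating-projection updates:
% I-projection onto the expectation constraints, then M-projection
% onto the parametric family.
\[
q^{(t)} \;=\; \arg\min_{q \,:\, \mathbb{E}_q[\boldsymbol{\phi}] = \mathbf{b}}
  \mathrm{KL}\!\left( q \,\big\|\, p_{\theta^{(t)}} \right),
\qquad
\theta^{(t+1)} \;=\; \arg\min_{\theta}
  \mathrm{KL}\!\left( q^{(t)} \,\big\|\, p_{\theta} \right).
\]
```

Unlike hard constraint-driven learning, q remains a full distribution at every step, which is how the method "maintains uncertainty during optimization."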
Parser Adaptation and Projection with Quasi-Synchronous Grammar Features
"... We connect two scenarios in structured learning: adapting a parser trained on one corpus to another annotation style, and projecting syntactic annotations from one language to another. We propose quasisynchronous grammar (QG) features for these structured learning tasks. That is, we score a aligned ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
We connect two scenarios in structured learning: adapting a parser trained on one corpus to another annotation style, and projecting syntactic annotations from one language to another. We propose quasi-synchronous grammar (QG) features for these structured learning tasks. That is, we score an aligned pair of source and target trees based on local features of the trees and the alignment. Our quasi-synchronous model assigns positive probability to any alignment of any trees, in contrast to a synchronous grammar, which would insist on some form of structural parallelism. In monolingual dependency parser adaptation, we achieve high accuracy in translating among multiple annotation styles for the same sentence. On the more difficult problem of cross-lingual parser projection, we learn a dependency parser for a target language by using bilingual text, an English parser, and automatic word alignments. Our experiments show that unsupervised QG projection improves on parsers trained using only high-precision projected annotations and far outperforms, by more than 35% absolute dependency accuracy, learning an unsupervised parser from raw target-language text alone. When a few target-language parse trees are available, projection gives a boost equivalent to doubling the number of target-language trees.
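The scoring scheme can be sketched as a log-linear model (my notation, not necessarily the paper's exact form): a target tree t and alignment a are scored against a source tree s by summing local feature vectors,

```latex
% Hypothetical log-linear form of the QG score (notation assumed):
\[
p(t, a \mid s) \;\propto\; \exp\!\Big( \mathbf{w} \cdot
  \sum_{c \,\in\, \mathrm{local}(t,\, a)} \mathbf{f}(c, s) \Big),
\]
```

and because every local configuration c receives some (possibly small) score, any alignment of any trees has positive probability, unlike under a synchronous grammar.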
Learning Better Monolingual Models with Unannotated Bilingual Text
"... This work shows how to improve state-of-the-art monolingual natural language processing models using unannotated bilingual text. We build a multiview learning objective that enforces agreement between monolingual and bilingual models. In our method the first, monolingual view consists of supervised ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
(Show Context)
This work shows how to improve state-of-the-art monolingual natural language processing models using unannotated bilingual text. We build a multi-view learning objective that enforces agreement between monolingual and bilingual models. In our method the first, monolingual view consists of supervised predictors learned separately for each language. The second, bilingual view consists of log-linear predictors learned over both languages on bilingual text. Our training procedure estimates the parameters of the bilingual model using the output of the monolingual model, and we show how to combine the two models to account for dependence between views. For the task of named entity recognition, using bilingual predictors increases F1 by 16.1% absolute over a supervised monolingual model, and retraining on bilingual predictions increases monolingual model F1 by 14.6%. For syntactic parsing, our bilingual predictor increases F1 by 2.1% absolute, and retraining a monolingual model on its output gives an improvement of 2.0%.
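One standard way to write such an agreement term (a sketch in my notation; the paper's objective may differ in detail): fit the bilingual view to match the monolingual views' predictions on unannotated bitext,

```latex
% Hypothetical agreement objective over unannotated bitext B:
\[
\min_{\theta_{\mathrm{bi}}} \;
\sum_{(\mathbf{x}_1,\, \mathbf{x}_2) \in \mathcal{B}}
\mathrm{KL}\!\left(
  p_{\mathrm{mono}}(\mathbf{y} \mid \mathbf{x}_1)
  \,\big\|\,
  p_{\theta_{\mathrm{bi}}}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)
\right),
\]
```

after which the bilingual model's outputs can in turn serve as training targets for retraining the monolingual model, matching the two-stage gains reported above.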
Universal Dependency Annotation for Multilingual Parsing
"... We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than ha ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
(Show Context)
We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of cross-lingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.
Covariance in Unsupervised Learning of Probabilistic Grammars
- In Proc. Journees Francophones de Programmation en Logique avec Contraintes, 2010
"... Abstract Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, generalpur ..."
Abstract
-
Cited by 15 (6 self)
- Add to MetaCart
(Show Context)
Probabilistic grammars offer great flexibility in modeling discrete sequential data like natural language text. Their symbolic component is amenable to inspection by humans, while their probabilistic component helps resolve ambiguity. They also permit the use of well-understood, general-purpose learning algorithms. There has been an increased interest in using probabilistic grammars in the Bayesian setting. To date, most of the literature has focused on using a Dirichlet prior. The Dirichlet prior has several limitations, including that it cannot directly model covariance between the probabilistic grammar's parameters. Yet, various grammar parameters are expected to be correlated because the elements in language they represent share linguistic properties. In this paper, we suggest an alternative to the Dirichlet prior, a family of logistic normal distributions. We derive an inference algorithm for this family of distributions and experiment with the task of dependency grammar induction, demonstrating performance improvements with our priors on a set of six treebanks in different natural languages. Our covariance framework permits soft parameter tying within grammars and across grammars for text in different languages, and we show empirical gains in a novel learning setting using bilingual, non-parallel data.
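The alternative prior is the standard logistic normal construction (this much is standard; notation mine): draw a Gaussian vector and push it through the softmax to obtain a multinomial,

```latex
% Logistic normal prior over a multinomial of grammar parameters:
\[
\boldsymbol{\eta} \;\sim\; \mathcal{N}(\boldsymbol{\mu}, \Sigma),
\qquad
\theta_k \;=\; \frac{\exp(\eta_k)}{\sum_{k'} \exp(\eta_{k'})},
\]
```

so the covariance matrix Σ can encode exactly the correlations between grammar parameters that a Dirichlet prior cannot express.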
Learning tractable word alignment models with complex constraints
- Computational Linguistics, 2010
"... Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Proba ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
Word-level alignment of bilingual text is a critical resource for a growing variety of tasks. Proba ...