Results 1-10 of 16
Parsing with soft and hard constraints on dependency length
In Proceedings of the International Workshop on Parsing Technologies (IWPT), 2005
Abstract

Cited by 33 (5 self)
In lexicalized phrase-structure or dependency parses, a word’s modifiers tend to fall near it in the string. We show that a crude way to use dependency length as a parsing feature can substantially improve parsing speed and accuracy in English and Chinese, with more mixed results on German. We then show similar improvements by imposing hard bounds on dependency length and (additionally) modeling the resulting sequence of parse fragments. This simple “vine grammar” formalism has only finite-state power, but a context-free parameterization with some extra parameters for stringing fragments together. We exhibit a linear-time chart parsing algorithm with a low grammar constant.
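The dependency-length statistic this abstract builds on is simple to compute; a minimal sketch, with an invented toy sentence and arc set (not from the paper):

```python
# Hedged sketch: dependency length is the string distance |head - modifier|
# for each arc; the paper uses it (softly or as a hard bound) as a parsing
# feature. The sentence and arcs below are invented toy data.

def dependency_lengths(arcs):
    """Return |head - modifier| string distance for each (head, modifier) arc."""
    return [abs(h - m) for h, m in arcs]

# Toy parse of "the dog barked loudly" (word indices 0..3);
# each arc is (head index, modifier index).
arcs = [(1, 0),   # dog -> the
        (2, 1),   # barked -> dog
        (2, 3)]   # barked -> loudly
print(dependency_lengths(arcs))       # [1, 1, 1]
print(sum(dependency_lengths(arcs)))  # 3, a total-length feature for the parse
```

A hard bound of the kind the abstract describes would simply reject any arc whose length exceeds a threshold, leaving a sequence of parse fragments to be strung together.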
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
Abstract

Cited by 30 (8 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations) is selected.
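The contrastive estimation objective described above can be sketched in a few lines; the unnormalized log-score function and the neighborhood below are invented toys, not the thesis's models:

```python
# Hedged sketch of the contrastive estimation objective: maximize
# log p(x) - log sum_{x' in N(x)} p(x'), where N(x) is a "neighborhood"
# of implicit negative examples. The toy model and neighborhood are invented.
import math

def contrastive_objective(logp, x, neighborhood):
    """log of p(x) relative to the total mass on the neighborhood N(x)."""
    scores = [logp(x2) for x2 in neighborhood]
    log_z = math.log(sum(math.exp(s) for s in scores))  # naive log-sum-exp
    return logp(x) - log_z

# Toy unnormalized model: penalize adjacent inversions (prefers sorted tuples).
logp = lambda seq: -sum(1 for a, b in zip(seq, seq[1:]) if a > b)
x = (1, 2, 3)
neighborhood = [(1, 2, 3), (2, 1, 3), (3, 2, 1)]  # x plus some transpositions
print(contrastive_objective(logp, x, neighborhood))
```

Raising this objective pushes probability mass toward the observed x and away from its perturbed neighbors, which is what lets the method learn without annotated examples.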
Feature-rich translation by quasi-synchronous lattice parsing
In EMNLP, 2009
Abstract

Cited by 8 (4 self)
We present a machine translation framework that can incorporate arbitrary features of both input and output sentences. The core of the approach is a novel decoder based on lattice parsing with quasi-synchronous grammar (Smith and Eisner, 2006), a syntactic formalism that does not require source and target trees to be isomorphic. Using generic approximate dynamic programming techniques, this decoder can handle “non-local” features. Similar approximate inference techniques support efficient parameter estimation with hidden variables. We use the decoder to conduct controlled experiments on a German-to-English translation task, to compare lexical phrase, syntax, and combined models, and to measure effects of various restrictions on non-isomorphism.
Cube Summing, Approximate Inference with Non-Local Features, and Dynamic Programming without Semirings
Abstract

Cited by 7 (3 self)
We introduce cube summing, a technique that permits dynamic programming algorithms for summing over structures (like the forward and inside algorithms) to be extended with non-local features that violate the classical structural independence assumptions. It is inspired by cube pruning (Chiang, 2007; Huang and Chiang, 2007) in its computation of non-local features dynamically using scored k-best lists, but also maintains additional residual quantities used in calculating approximate marginals. When restricted to local features, cube summing reduces to a novel semiring (k-best+residual) that generalizes many of the semirings of Goodman (1999). When non-local features are included, cube summing does not reduce to any semiring, but is compatible with generic techniques for solving dynamic programming equations.
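The semiring view that cube summing generalizes can be illustrated with a toy forward-style dynamic program whose (plus, times) operations are swapped out; the chain weights below are invented for illustration:

```python
# Hedged sketch: the same DP computes different quantities depending on the
# semiring. Over a chain, weights[i] lists the alternative arc weights at
# step i; 'plus' combines alternatives, 'times' extends the running score.
# The weights are invented toy data.

def forward(weights, plus, times, one):
    """Forward DP over a chain, parameterized by a (plus, times, one) semiring."""
    total = one
    for step in weights:
        step_val = step[0]
        for w in step[1:]:
            step_val = plus(step_val, w)  # combine alternatives at this step
        total = times(total, step_val)    # extend the prefix score
    return total

weights = [[0.5, 0.5], [0.2, 0.3]]
# sum-product semiring -> total probability mass over all paths
print(forward(weights, lambda a, b: a + b, lambda a, b: a * b, 1.0))  # 0.5
# max-product (Viterbi) semiring -> score of the single best path
print(forward(weights, max, lambda a, b: a * b, 1.0))                 # 0.15
```

The abstract's k-best+residual construction replaces these scalar values with a scored k-best list plus a residual mass; with non-local features the 'times' step inspects the lists themselves, which is exactly where the semiring axioms break down.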
Computationally Efficient M-Estimation of Log-Linear Structure Models
, 2007
Abstract

Cited by 6 (2 self)
We describe a new loss function, due to Jeon and Lin (2006), for estimating structured log-linear models on arbitrary features. The loss function can be seen as a (generative) alternative to maximum likelihood estimation with an interesting information-theoretic interpretation, and it is statistically consistent. It is substantially faster than maximum (conditional) likelihood estimation of conditional random fields (Lafferty et al., 2001), by an order of magnitude or more. We compare its performance and training time to an HMM, a CRF, an MEMM, and pseudolikelihood on a shallow parsing task. These experiments help tease apart the contributions of rich features and discriminative training, which are shown to be more than additive.
Exact inference for generative probabilistic non-projective dependency parsing
In Proceedings of EMNLP, 2011
Abstract

Cited by 5 (3 self)
We describe a generative model for non-projective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an inside-outside algorithm that can be used for unsupervised learning of non-projective dependency trees.
Dynamic programming algorithms as products of weighted logic programs
In Proc. of ICLP, 2008
Abstract

Cited by 4 (2 self)
Weighted logic programming, a generalization of bottom-up logic programming, is a successful framework for specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm’s output space, such as a path through a graph or a grammatical derivation, and are given a weighted score, often interpreted as a probability, that depends on the score of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the PRODUCT transformation, which can merge two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a “product of experts.” Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying PRODUCT to simpler weighted logic programs.
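A weighted logic program of the kind this abstract describes can be sketched as a fixpoint over scored rules; single-source shortest path is the standard example, solving reachable(v) from reachable(u) and edge(u, v) under a (min, +) score. The graph below is invented for illustration:

```python
# Hedged sketch of a weighted logic program evaluated by naive fixpoint
# iteration. Rule: reachable(v) = min over u of reachable(u) + edge(u, v),
# with axiom reachable(source) = 0. The toy graph is invented.

def shortest_paths(edges, source):
    """edges maps (u, v) pairs to arc weights; returns proved reachable() scores."""
    dist = {source: 0.0}                # the base axiom and its score
    changed = True
    while changed:                      # iterate rules to a fixpoint
        changed = False
        for (u, v), w in edges.items():
            if u in dist and dist[u] + w < dist.get(v, float("inf")):
                dist[v] = dist[u] + w   # a cheaper proof of reachable(v)
                changed = True
    return dist

edges = {("s", "a"): 1.0, ("a", "t"): 2.0, ("s", "t"): 5.0}
print(shortest_paths(edges, "s"))   # {'s': 0.0, 'a': 1.0, 't': 3.0}
```

The PRODUCT transformation of the abstract would pair items from two such programs, so that a proof in the merged program scores as the product (here, sum, in the log domain) of proofs in the originals.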
Declarative Information Extraction in a Probabilistic Database System
Cited by 2 (0 self)
Products of Weighted Logic Programs
Abstract

Cited by 1 (1 self)
© 2008 Shay B. Cohen, Robert J. Simmons, and Noah A. Smith. Weighted logic programming, a generalization of bottom-up logic programming, is a successful framework for specifying dynamic programming algorithms. In this setting, proofs correspond to the algorithm’s output space, such as a path through a graph or a grammatical derivation, and are given a weighted score, often interpreted as a probability, that depends on the score of the base axioms used in the proof. The desired output is a function over all possible proofs, such as a sum of scores or an optimal score. We describe the PRODUCT transformation, which can merge two weighted logic programs into a new one. The resulting program optimizes a product of proof scores from the original programs, constituting a scoring function known in machine learning as a “product of experts.” Through the addition of intuitive constraining side conditions, we show that several important dynamic programming algorithms can be derived by applying PRODUCT to simpler weighted logic programs. This report is an extended version of [3].