• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Solving the problem of cascading errors: Approximate bayesian inference for linguistic annotation pipelines (2006)

by J R Finkel, C D Manning, A Y Ng
Venue:In Proceedings of EMNLP
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 24
Next 10 →

Improved Inference for Unlexicalized Parsing

by Slav Petrov, Dan Klein , 2007
"... We present several improvements to unlexicalized parsing with hierarchically state-split PCFGs. First, we present a novel coarse-to-fine method in which a grammar’s own hierarchical projections are used for incremental pruning, including a method for efficiently computing projections of a grammar wi ..."
Abstract - Cited by 115 (17 self) - Add to MetaCart
We present several improvements to unlexicalized parsing with hierarchically state-split PCFGs. First, we present a novel coarse-to-fine method in which a grammar’s own hierarchical projections are used for incremental pruning, including a method for efficiently computing projections of a grammar without a treebank. In our experiments, hierarchical pruning greatly accelerates parsing with no loss in empirical accuracy. Second, we compare various inference procedures for state-split PCFGs from the standpoint of risk minimization, paying particular attention to their practical tradeoffs. Finally, we present multilingual experiments which show that parsing with hierarchical state-splitting is fast and accurate in multiple languages and domains, even without any language-specific tuning.

Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text

by Noah Ashton Smith , 2006
"... This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likel ..."
Abstract - Cited by 20 (7 self) - Add to MetaCart
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)

Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction

by Shay B. Cohen, Kevin Gimpel, Noah A. Smith
"... We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal ..."
Abstract - Cited by 17 (4 self) - Add to MetaCart
We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors. 1

Joint Parsing and Named Entity Recognition

by Jenny Rose Finkel, Christopher D. Manning
"... For many language technology applications, such as question answering, the overall system runs several independent processors over the data (such as a named entity recognizer, a coreference system, and a parser). This easily results in inconsistent annotations, which are harmful to the performance o ..."
Abstract - Cited by 14 (1 self) - Add to MetaCart
For many language technology applications, such as question answering, the overall system runs several independent processors over the data (such as a named entity recognizer, a coreference system, and a parser). This easily results in inconsistent annotations, which are harmful to the performance of the aggregate system. We begin to address this problem with a joint model of parsing and named entity recognition, based on a discriminative feature-based constituency parser. Our model produces a consistent output, where the named entity spans do not conflict with the phrasal spans of the parse tree. The joint representation also allows the information from each type of annotation to improve performance on the other, and, in experiments with the OntoNotes corpus, we found improvements of up to 1.36 % absolute F1 for parsing, and up to 9.0 % F1 for named entity recognition. 1

Piecewise pseudolikelihood for efficient CRF training

by Charles Sutton, Andrew Mccallum - In International Conference on Machine Learning (ICML , 2007
"... Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accur ..."
Abstract - Cited by 13 (2 self) - Add to MetaCart
Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, 2005) can have better accuracy but does not scale as well in the variable cardinality. In this paper, we introduce piecewise pseudolikelihood, which retains the computational efficiency of pseudolikelihood but can have much better accuracy. On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and in many cases nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training. 1.

Learning and Inference in WEIGHTED LOGIC WITH APPLICATION TO NATURAL LANGUAGE PROCESSING

by Aron Culotta , 2008
"... ..."
Abstract - Cited by 8 (1 self) - Add to MetaCart
Abstract not found

A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing

by Michael Auli, Adam Lopez
"... Via an oracle experiment, we show that the upper bound on accuracy of a CCG parser is significantly lowered when its search space is pruned using a supertagger, though the supertagger also prunes many bad parses. Inspired by this analysis, we design a single model with both supertagging and parsing ..."
Abstract - Cited by 8 (1 self) - Add to MetaCart
Via an oracle experiment, we show that the upper bound on accuracy of a CCG parser is significantly lowered when its search space is pruned using a supertagger, though the supertagger also prunes many bad parses. Inspired by this analysis, we design a single model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting increase in complexity, we experiment with both belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. On CCGbank we achieve a labelled dependency F-measure of 88.8 % on gold POS tags, and 86.7 % on automatic part-of-speeoch tags, the best reported results for this task. 1

Joint Word Segmentation and POS Tagging using a Single Perceptron

by Yue Zhang, Stephen Clark
"... For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and improve segmentation by utilizing POS information, segmentation and tagging can be performed simultaneously. A challenge for this joint approach is the large combined search space, which makes efficient ..."
Abstract - Cited by 7 (0 self) - Add to MetaCart
For Chinese POS tagging, word segmentation is a preliminary step. To avoid error propagation and improve segmentation by utilizing POS information, segmentation and tagging can be performed simultaneously. A challenge for this joint approach is the large combined search space, which makes efficient decoding very hard. Recent research has explored the integration of segmentation and POS tagging, by decoding under restricted versions of the full combined search space. In this paper, we propose a joint segmentation and POS tagging model that does not impose any hard constraints on the interaction between word and POS information. Fast decoding is achieved by using a novel multiple-beam search algorithm. The system uses a discriminative statistical model, trained using the generalized perceptron algorithm. The joint model gives an error reduction in segmentation accuracy of 14.6 % and an error reduction in tagging accuracy of 12.2%, compared to the traditional pipeline approach. 1

Piecewise Training for Structured Prediction

by Charles Sutton, Andrew Mccallum - MACHINE LEARNING
"... A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training that divides the factors into tractable subgraphs, which we call pi ..."
Abstract - Cited by 7 (0 self) - Add to MetaCart
A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training that divides the factors into tractable subgraphs, which we call pieces, that are trained independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension to piecewise training, called piecewise pseudolikelihood, designed for when variables have large cardinality. On several real-world NLP data sets, piecewise training performs superior to Besag’s pseudolikelihood and sometimes comparably to exact maximum likelihood. In addition, PWPL performs similarly to piecewise and superior to standard pseudolikelihood, but is five to ten times more computationally efficient than batch maximum likelihood training.

The shared logistic normal distribution for grammar induction

by Shay B. Cohen, Noah A. Smith - In Proc. NAACL , 2009
"... We present a shared logistic normal distribution as a Bayesian prior over probabilistic grammar weights. This approach generalizes the similar use of logistic normal distributions [3], enabling soft parameter tying during inference across different multinomials comprising the probabilistic grammar. ..."
Abstract - Cited by 5 (0 self) - Add to MetaCart
We present a shared logistic normal distribution as a Bayesian prior over probabilistic grammar weights. This approach generalizes the similar use of logistic normal distributions [3], enabling soft parameter tying during inference across different multinomials comprising the probabilistic grammar. We show that this model outperforms previous approaches on an unsupervised dependency grammar induction task. 1
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University