Results 1-10 of 72
Corpus-based induction of syntactic structure: Models of dependency and constituency
In Proceedings of the 42nd Annual Meeting of the ACL, 2004
Cited by 170 (8 self)
We present a generative model for the unsupervised learning of dependency structures. We also describe the multiplicative combination of this dependency model with a model of linear constituency. The product model outperforms both components on their respective evaluation metrics, giving the best published figures for unsupervised dependency parsing and unsupervised constituency parsing. We also demonstrate that the combined model works and is robust cross-linguistically, being able to exploit either attachment or distributional regularities that are salient in the data.
Image Parsing: Unifying Segmentation, Detection, and Recognition
2005
Cited by 160 (18 self)
In this paper we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation in a "parsing graph", in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and reconfigures it dynamically using a set of reversible Markov chain jumps. This computational framework integrates two popular inference approaches: generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters.
A fully Bayesian approach to unsupervised part-of-speech tagging
In ACL, 2007
Cited by 115 (0 self)
Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and using a tagging dictionary.
Effective self-training for parsing
In Proc. N. American ACL (NAACL), 2006
Cited by 91 (6 self)
We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
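As a quick consistency check on the figures quoted in this abstract, the arithmetic can be worked out as below. This is only a sketch, and it assumes that "error" means 100 minus the f-score (in percentage points); the abstract does not state that definition.

```latex
\begin{align*}
F_{\text{prev}} &= 92.1 - 1.1 = 91.0 \\
\text{error}_{\text{prev}} &= 100 - 91.0 = 9.0, \qquad
\text{error}_{\text{new}} = 100 - 92.1 = 7.9 \\
\text{relative error reduction} &= \frac{9.0 - 7.9}{9.0} = \frac{1.1}{9.0} \approx 0.12
\end{align*}
```

Under that assumption, the 1.1-point absolute gain and the 12% error reduction agree with each other.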
Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
Cited by 36 (1 self)
Unsupervised grammar induction models tend to employ relatively simple models of syntax when compared to their supervised counterparts. Traditionally, the unsupervised models have been kept simple due to tractability and data sparsity concerns. In this paper, we introduce basic valence frames and lexical information into an unsupervised dependency grammar inducer and show how this additional information can be leveraged via smoothing. Our model produces state-of-the-art results on the task of unsupervised grammar induction, improving over the best previous work by almost 10 percentage points.
Annealing structural bias in multilingual weighted grammar induction
In Proc. ACL, 2006
Cited by 30 (9 self)
We first show how a structural locality bias can improve the accuracy of state-of-the-art dependency grammar induction models trained by EM from unannotated examples (Klein and Manning, 2004). Next, by annealing the free parameter that controls this bias, we achieve further improvements. We then describe an alternative kind of structural bias, toward “broken” hypotheses consisting of partial structures over segmented sentences, and show a similar pattern of improvement. We relate this approach to contrastive estimation (Smith and Eisner, 2005a), apply the latter to grammar induction in six languages, and show that our new approach improves accuracy by 1–17% (absolute) over CE (and 8–30% over EM), achieving to our knowledge the best results on this task to date. Our method, structural annealing, is a general technique with broad applicability to hidden-structure discovery problems.
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
2006
Cited by 27 (8 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
Guiding unsupervised grammar induction using contrastive estimation
In Proc. of IJCAI Workshop on Grammatical Inference Applications, 2005
Cited by 25 (7 self)
We describe a novel training criterion for probabilistic grammar induction models, contrastive estimation [Smith and Eisner, 2005], which can be interpreted as exploiting implicit negative evidence and includes a wide class of likelihood-based objective functions. This criterion is a generalization of the function maximized by the Expectation-Maximization algorithm [Dempster et al., 1977]. CE is a natural fit for log-linear models, which can include arbitrary features but for which EM is computationally difficult. We show that, using the same features, log-linear dependency grammar models trained using CE can drastically outperform EM-trained generative models on the task of matching human linguistic annotations (the MATCHLINGUIST task). The selection of an implicit negative evidence class (a “neighborhood”) appropriate to a given task has strong implications, but with a good neighborhood one can target the objective of grammar induction to a specific application.
Probabilistic models of nonprojective dependency trees
In Proc. EMNLP-CoNLL, 2007
Cited by 25 (9 self)
A notable gap in research on statistical dependency parsing is a proper conditional probability distribution over nonprojective dependency trees for a given sentence. We exploit the Matrix Tree Theorem (Tutte, 1984) to derive an algorithm that efficiently sums the scores of all nonprojective trees in a sentence, permitting the definition of a conditional log-linear model over trees. While discriminative methods, such as those presented in McDonald et al. (2005b), obtain very high accuracy on standard dependency parsing tasks and can be trained and applied without marginalization, “summing trees” permits some alternative techniques of interest. Using the summing algorithm, we present competitive experimental results on four nonprojective languages, for maximum conditional likelihood estimation, minimum Bayes-risk parsing, and hidden variable training.
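The idea of summing the scores of all nonprojective trees with the Matrix Tree Theorem can be illustrated with a small sketch. It is not the authors' implementation. It assumes nonnegative edge scores `w[h][m]` (head `h`, modifier `m`), an artificial root at index 0, a tree score equal to the product of its edge scores, and trees in which the root may have several children. The names `sum_tree_scores` and `brute_force_sum` are hypothetical. The determinant of a Laplacian-style matrix gives the total, and a brute-force enumeration is included only as a cross-check for tiny inputs.

```python
from itertools import product


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def sum_tree_scores(w):
    """Sum over all dependency trees rooted at node 0 of the product of edge scores.

    w is an (n+1) x (n+1) table; w[h][m] scores the edge h -> m.
    Entries w[m][m] and w[h][0] are ignored.
    """
    n = len(w) - 1
    lap = [[0.0] * n for _ in range(n)]
    for m in range(1, n + 1):
        for h in range(n + 1):
            if h == m:
                continue
            lap[m - 1][m - 1] += w[h][m]      # diagonal: total incoming score of m
            if h >= 1:
                lap[h - 1][m - 1] -= w[h][m]  # off-diagonal: negated edge score
    return determinant(lap)


def brute_force_sum(w):
    """Enumerate every head assignment and keep those that form a tree (tiny n only)."""
    n = len(w) - 1
    total = 0.0
    for heads in product(range(n + 1), repeat=n):  # heads[m-1] is the head of word m
        if any(heads[m - 1] == m for m in range(1, n + 1)):
            continue
        is_tree = True
        for m in range(1, n + 1):
            seen, node = set(), m
            while node != 0:  # follow heads upward; a repeat means a cycle
                if node in seen:
                    is_tree = False
                    break
                seen.add(node)
                node = heads[node - 1]
            if not is_tree:
                break
        if not is_tree:
            continue
        score = 1.0
        for m in range(1, n + 1):
            score *= w[heads[m - 1]][m]
        total += score
    return total
```

The determinant costs cubic time in sentence length, whereas the enumeration grows exponentially, which is why the latter serves only as a check here.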
Variational Bayesian grammar induction for natural language
In International Colloquium on Grammatical Inference, 2006
Cited by 24 (0 self)
This paper presents a new grammar induction algorithm for probabilistic context-free grammars (PCFGs). There is an approach to PCFG induction that is based on parameter estimation. Following this approach, we apply the variational Bayes to PCFGs. The variational Bayes (VB) is an approximation of Bayesian learning. It has been empirically shown that VB is less likely to cause overfitting. Moreover, the free energy of VB has been successfully used in model selection. Our algorithm can be seen as a generalization of PCFG induction algorithms proposed before. In the experiments, we empirically show that induced grammars achieve better parsing results than those of other PCFG induction algorithms. Based on the better parsing results, we give examples of recursive grammatical structures found by the proposed algorithm.