Dependency parsing by belief propagation
In Proceedings of EMNLP, 2008
Cited by 65 (7 self)
We formulate dependency parsing as a graphical model with the novel ingredient of global constraints. We show how to apply loopy belief propagation (BP), a simple and effective tool for approximate learning and inference. As a parsing algorithm, BP is both asymptotically and empirically efficient. Even with second-order features or latent variables, which would make exact parsing considerably slower or NP-hard, BP needs only O(n^3) time with a small constant factor. Furthermore, such features significantly improve parse accuracy over exact first-order methods. Incorporating additional features would increase the runtime additively rather than multiplicatively.
Semiring Parsing
Computational Linguistics, 1999
Cited by 64 (1 self)
this paper is that all five of these commonly computed quantities can be described as elements of complete semirings (Kuich 1997). The relationship between grammars and semirings was discovered by Chomsky and Schützenberger (1963), and for parsing with the CKY algorithm, dates back to Teitelbaum (1973). A complete semiring is a set of values over which a multiplicative operator and a commutative additive operator have been defined, and for which infinite summations are defined. For parsing algorithms satisfying certain conditions, the multiplicative and additive operations of any complete semiring can be used in place of × and +, and correct values will be returned. We will give a simple normal form for describing parsers, then precisely define complete semirings, and the conditions for correctness
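The idea of swapping one semiring for another inside a fixed parser can be illustrated with a minimal sketch. It is not the paper's normal form: the `Semiring` class, the `cky` function, the dictionary-based grammar encoding, and the toy grammar are all assumptions made for illustration, and the grammar is assumed to be in Chomsky normal form.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """Hypothetical container for the two operators and their identities."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Three familiar instances: total probability, best derivation, recognition.
INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)


def cky(words, lexical, binary, sr, start="S"):
    """CKY over a CNF grammar, parameterized by a semiring.

    lexical maps (A, word) to a weight; binary maps (A, B, C) to a weight
    for the rule A -> B C. Both encodings are assumptions of this sketch.
    """
    n = len(words)
    chart = defaultdict(lambda: sr.zero)  # (i, k, A) -> semiring value
    for i, w in enumerate(words):
        for (a, word), wt in lexical.items():
            if word == w:
                chart[i, i + 1, a] = sr.plus(chart[i, i + 1, a], wt)
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (a, b, c), wt in binary.items():
                    children = sr.times(chart[i, j, b], chart[j, k, c])
                    value = sr.times(wt, children)
                    chart[i, k, a] = sr.plus(chart[i, k, a], value)
    return chart[0, n, start]


# Toy ambiguous grammar: S -> S S (0.4), S -> "a" (0.6).
LEXICAL = {("S", "a"): 0.6}
BINARY = {("S", "S", "S"): 0.4}
SENTENCE = ["a", "a", "a"]

inside_value = cky(SENTENCE, LEXICAL, BINARY, INSIDE)
viterbi_value = cky(SENTENCE, LEXICAL, BINARY, VITERBI)
recognized = cky(
    SENTENCE,
    {rule: True for rule in LEXICAL},
    {rule: True for rule in BINARY},
    BOOLEAN,
)
```

The parsing loop never changes; only the `Semiring` instance does. With the toy grammar, the three-word sentence has two derivations, so the inside value is the sum of their probabilities and the Viterbi value is the probability of one of them.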
Efficient Algorithms for Parsing the DOP Model
1996
Cited by 58 (4 self)
Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using the optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a duplication of Pereira and Schabes's (1992) experiment on the same data. We show that Bod's results are at least partially due to an extremely fortuitous choice of test data, and partially due to using cleaner data than other researchers.
Edge-Based Best-First Chart Parsing
In Proceedings of the Sixth Workshop on Very Large Corpora, 1998
Cited by 53 (4 self)
Best-first probabilistic chart parsing attempts to parse efficiently by working on edges that are judged 'best' by some probabilistic figure of merit (FOM). Recent work has used probabilistic context-free grammars (PCFGs) to assign probabilities to constituents, and to use these probabilities as the starting point for the FOM. This paper extends this approach to using a probabilistic FOM to judge edges (incomplete constituents), thereby giving a much finer-grained control over parsing effort. We show how this can be accomplished in a particularly simple way using the common idea of binarizing the PCFG. The results obtained are about a factor of twenty improvement over the best prior results; that is, our parser achieves equivalent results using one twentieth the number of edges. Furthermore, we show that this improvement is obtained with parsing precision and recall levels superior to those achieved by exhaustive parsing.
Conditional Structure versus Conditional Estimation in NLP Models
In EMNLP, 2002
Cited by 51 (2 self)
This paper separates conditional parameter estimation, which consistently raises test set accuracy on statistical NLP tasks, from conditional model structures, such as the conditional Markov model used for maximum-entropy tagging, which tend to lower accuracy. Error analysis on part-of-speech tagging shows that the actual tagging errors made by the conditionally structured model derive not only from label bias, but also from other ways in which the independence assumptions of the conditional model structure are unsuited to linguistic sequences. The paper presents new word-sense disambiguation and POS tagging experiments, and integrates apparently conflicting reports from other recent work.
Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction
In Proceedings of NAACL-HLT, 2009
Cited by 47 (6 self)
We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM algorithm for learning a probabilistic grammar based on this family of priors. We then experiment with unsupervised dependency grammar induction and show significant improvements using our model for both monolingual learning and bilingual learning with a non-parallel, multilingual corpus.
Shallow Morphological Analysis in Monolingual Information Retrieval for Dutch, German and Italian
Evaluation of Cross-Language Information Retrieval Systems, CLEF 2001, volume 2406 of Lecture Notes in Computer Science, 2001
Cited by 46 (13 self)
This paper describes the experiments of our team for CLEF 2001, which include both official and post-submission runs. We took part in the monolingual task, for Dutch, German, and Italian. The focus of our experiments was on the effects of morphological analyses such as stemming and compound splitting on retrieval effectiveness. Confirming earlier reports on retrieval in compound-splitting languages such as Dutch and German, we found improvements to be around 25% for German and as much as 69% for Dutch. For Italian, lexicon-based stemming resulted in gains of up to 25%.
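To make "compound splitting" concrete, here is a minimal sketch of a lexicon-based splitter. It is not the method used in the paper: the function name `split_compound`, the `min_len` threshold, the linking-element list, and the tiny lexicon are all hypothetical choices for illustration.

```python
def split_compound(word, lexicon, min_len=3, linkers=("", "s")):
    """Split a compound into lexicon words, trying shorter heads first.

    Returns a list of parts, or an empty list when no full split exists.
    A linking element (for example the German or Dutch "s") may sit
    between two parts and is dropped from the output.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    for cut in range(min_len, len(word) - min_len + 1):
        head = word[:cut]
        if head not in lexicon:
            continue
        rest = word[cut:]
        for linker in linkers:
            if not rest.startswith(linker):
                continue
            tail = split_compound(rest[len(linker):], lexicon, min_len, linkers)
            if tail:
                return [head] + tail
    return []


# Hypothetical toy lexicon mixing Dutch and German entries.
TOY_LEXICON = {"fiets", "band", "arbeit", "amt"}

dutch_parts = split_compound("fietsband", TOY_LEXICON)
german_parts = split_compound("Arbeitsamt", TOY_LEXICON)
unknown_parts = split_compound("xyz", TOY_LEXICON)
```

In a retrieval setting, the parts would typically be indexed alongside or instead of the full compound, so that a query for one part can match documents containing the compound. A word already in the lexicon is returned whole, which is one of several possible design choices.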
Probabilistic Feature Grammars
In Proceedings of the International Workshop on Parsing Technologies, 1997
Cited by 37 (0 self)
We present a new formalism, probabilistic feature grammar (PFG). PFGs combine most of the best properties of several other formalisms, including those of Collins, Magerman, and Charniak, and in experiments have comparable or better performance. PFGs generate features one at a time, probabilistically, conditioning the probabilities of each feature on other features in a local context. Because the conditioning is local, efficient polynomial time parsing algorithms exist for computing inside, outside, and Viterbi parses. PFGs can produce probabilities of strings, making them potentially useful for language modeling. Precision and recall results are comparable to the state of the art with words, and the best reported without words. 1 Introduction Recently, many researchers have worked on statistical parsing techniques which try to capture additional context beyond that of simple probabilistic context-free grammars (PCFGs), including Magerman (1995), Charniak (1996), Collins (1996; 1997), ...
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
2006
Cited by 27 (8 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
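The contrastive estimation objective described in this abstract can be written out as a sketch. The notation is an assumption made here, not taken from the listing: x_i are the observed sentences, y ranges over hidden structures, and N(x_i) is the neighborhood of x_i, which contains x_i itself together with its implicit negative examples.

```latex
\[
\hat{\theta}
  = \arg\max_{\theta} \prod_{i=1}^{n}
    p_{\theta}\bigl(x_i \mid \mathcal{N}(x_i)\bigr)
  = \arg\max_{\theta} \prod_{i=1}^{n}
    \frac{\sum_{y} p_{\theta}(x_i, y)}
         {\sum_{x' \in \mathcal{N}(x_i)} \sum_{y} p_{\theta}(x', y)}
\]
```

Under this reading, taking the neighborhood to be the set of all possible sentences makes the denominator the total mass of the model, so for a normalized joint model the objective reduces to ordinary maximum likelihood estimation.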
Balancing Robustness and Efficiency in Unification-augmented Context-Free Parsers for Large Practical Applications
Robustness in Language and Speech Technology
Cited by 26 (7 self)
Large practical NLP applications require robust analysis components that can effectively handle input that is disfluent or extragrammatical. The effectiveness and efficiency of any robust parser are a direct function of three main factors: (1) Flexibility: what types of disfluencies and deviations from the grammar can the parser handle?; (2) Search: how does the parser search the space of possible interpretations, and what techniques are applied to prune the search space?; and (3) Parse Selection and Disambiguation: what methods and resources are used to evaluate and rank potential parses and subparses, and how does the parser cope with the extreme levels of ambiguity introduced by its flexibility parameters? In this chapter we describe our investigations on how to balance flexibility and efficiency in the context of two different robust parsers, a GLR parser and a left-corner chart parser, both based on a unification-augmented context-free grammar formalism. We demonstrate how the...