## Probabilistic Feature Grammars (1997)

Venue: Proceedings of the International Workshop on Parsing Technologies

Citations: 37 (0 self)

### BibTeX

```bibtex
@inproceedings{Goodman97probabilisticfeature,
  author    = {Joshua Goodman},
  title     = {Probabilistic Feature Grammars},
  booktitle = {Proceedings of the International Workshop on Parsing Technologies},
  year      = {1997},
  pages     = {89--100}
}
```

### Abstract

We present a new formalism, probabilistic feature grammar (PFG). PFGs combine most of the best properties of several other formalisms, including those of Collins, Magerman, and Charniak, and in experiments have comparable or better performance. PFGs generate features one at a time, probabilistically, conditioning the probabilities of each feature on other features in a local context. Because the conditioning is local, efficient polynomial time parsing algorithms exist for computing inside, outside, and Viterbi parses. PFGs can produce probabilities of strings, making them potentially useful for language modeling. Precision and recall results are comparable to the state of the art with words, and the best reported without words.

1 Introduction

Recently, many researchers have worked on statistical parsing techniques which try to capture additional context beyond that of simple probabilistic context-free grammars (PCFGs), including Magerman (1995), Charniak (1996), Collins (1996; 1997), ...

### Citations

503 | Three generative, lexicalised models for statistical parsing - Collins - 1997

437 | A new statistical parser based on bigram lexical dependencies - Collins - 1996

Citation Context: ...to make history-based grammars (Magerman, 1995) more context free, and thus amenable to dynamic programming; as a way to generalize the work of Black et al. (1992); as a way to turn Collins' parser (Collins, 1996) into a generative probabilistic language model; or as an extension of language-modeling techniques to stochastic grammars. The resulting formalism, which is relatively simple and elegant, has most o...

373 | The estimation of stochastic context-free grammars using the Inside–Outside algorithm (Computer Speech and Language) - Lari, Young - 1990

Citation Context: ...different approaches. 5 Parsing The parsing algorithm we use is a simple variation on probabilistic versions of the CKY algorithm for PCFGs, using feature vectors instead of nonterminals (Baker, 1979; Lari and Young, 1990). The parser computes inside probabilities (the sum of probabilities of all parses, i.e. the probability of the sentence) and Viterbi probabilities (the probability of the best parse), and, optionally...

268 | Trainable grammars for speech recognition - Baker - 1979

Citation Context: ...of two very different approaches. 5 Parsing The parsing algorithm we use is a simple variation on probabilistic versions of the CKY algorithm for PCFGs, using feature vectors instead of nonterminals (Baker, 1979; Lari and Young, 1990). The parser computes inside probabilities (the sum of probabilities of all parses, i.e. the probability of the sentence) and Viterbi probabilities (the probability of the best ...
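The CKY variant described in this passage, computing inside and Viterbi probabilities over a chart whose entries are feature vectors rather than atomic nonterminals, can be sketched as follows. The toy grammar, its feature tuples, and all probabilities are invented for illustration and are not taken from the paper:

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form whose "nonterminals" are feature
# tuples (category, number), illustrating parsing with feature vectors in
# place of atomic nonterminals. Grammar and probabilities are invented.
BINARY = {  # A -> B C
    (("S", "sg"), ("NP", "sg"), ("VP", "sg")): 1.0,
    (("NP", "sg"), ("Det", "sg"), ("N", "sg")): 1.0,
}
LEXICAL = {  # A -> word
    (("Det", "sg"), "a"): 1.0,
    (("N", "sg"), "man"): 1.0,
    (("VP", "sg"), "dies"): 1.0,
}

def parse(words, root=("S", "sg")):
    """Return (inside probability, Viterbi probability) for the sentence."""
    n = len(words)
    inside = defaultdict(float)   # sum of probabilities of all parses of a span
    viterbi = defaultdict(float)  # probability of the best parse of a span
    for i, w in enumerate(words):
        for (A, word), p in LEXICAL.items():
            if word == w:
                inside[i, i + 1, A] += p
                viterbi[i, i + 1, A] = max(viterbi[i, i + 1, A], p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for (A, B, C), p in BINARY.items():
                    left, right = inside[i, k, B], inside[k, j, C]
                    if left and right:
                        inside[i, j, A] += p * left * right
                        cand = p * viterbi[i, k, B] * viterbi[k, j, C]
                        viterbi[i, j, A] = max(viterbi[i, j, A], cand)
    return inside[0, n, root], viterbi[0, n, root]

print(parse(["a", "man", "dies"]))  # (1.0, 1.0): one parse, so both agree
```

With only one parse for the sentence, inside and Viterbi probabilities coincide; in an ambiguous grammar the inside value sums over all parses while Viterbi keeps the maximum.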

228 | Tree-bank grammars - Charniak - 1996

Citation Context: ...C(S → NP VP) / C(S), the number of occurrences of S → NP VP divided by the number of occurrences of S. For a reasonably large treebank, probabilities estimated in this way would be reliable enough to be useful (Charniak, 1996). On the other hand, it is not unlikely that we would never have seen any counts at all of C((S, singular, dies) → (NP, singular, man)(VP, singular, dies)) / C((S, singular, dies)), which is the estimat...
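The relative-frequency estimate C(S → NP VP) / C(S) in the passage above can be computed directly from treebank counts. A minimal sketch, where the two toy trees are invented for illustration:

```python
from collections import Counter

# Relative-frequency estimation of rule probabilities from a toy treebank.
# Trees are (label, children...) tuples; both example trees are invented.
trees = [
    ("S", ("NP", ("Det", "a"), ("N", "man")), ("VP", "dies")),
    ("S", ("NP", ("Det", "a"), ("N", "dog")), ("VP", "runs")),
]

rule_counts, lhs_counts = Counter(), Counter()

def count(node):
    """Tally one rule per internal node, recursing into the children."""
    if isinstance(node, str):  # a terminal word
        return
    lhs, children = node[0], node[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rule_counts[(lhs, rhs)] += 1
    lhs_counts[lhs] += 1
    for c in children:
        count(c)

for t in trees:
    count(t)

# P(S -> NP VP) = C(S -> NP VP) / C(S)
p = rule_counts[("S", ("NP", "VP"))] / lhs_counts["S"]
print(p)  # 1.0: both toy trees expand S as NP VP
```

The data-sparsity point follows immediately: once nonterminals carry feature vectors like (S, singular, dies), most such augmented rules occur zero times even in a large treebank, which is why PFG conditions each feature locally instead of estimating whole-rule probabilities.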

192 | Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars - Briscoe, Carroll - 1993

Citation Context: ...area for future research to determine whether we can improve our performance by using decision trees. 4.6 Probabilistic LR Parsing with Unification Grammars Briscoe and Carroll describe a formalism (Briscoe and Carroll, 1993; Carroll and Briscoe, 1992) similar in many ways to the first IBM model. In particular, a context-free covering grammar of a unification grammar is constructed. Some features are captured by the covering...

187 | An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities - Stolcke - 1995

Citation Context: ...left children. Thus, our model can be used to capture a wider variety of grammatical theories, simply by changing the choice of features. 1 Unary branching events are in general difficult to deal with (Stolcke, 1993). We introduce an additional EventProb for unary events, and do not allow more than one unary event in a row. Finally, there are some subtle interesting differences with respect to the distance metric...

155 | Natural Language Parsing as Statistical Pattern Recognition - Magerman - 1994

Citation Context: ...is somewhat inelegant; also, for the probabilities to sum to one, it requires an additional step of normalization, which they appear not to have implemented. In their next model (Black et al., 1992; Magerman, 1994, pp. 46--56), which strongly influenced our model, five attributes are associated with each nonterminal: a syntactic category, a semantic category, a rule, and two lexical heads. The rules in this grammar...

90 | Parsing algorithms and metrics - Goodman - 1996

Citation Context: ...and as part of an integrated model. Furthermore, unlike all but one of the comparable systems, PFGs can compute outside probabilities, which are useful for grammar induction, some parsing algorithms (Goodman, 1996), and, as we will show, pruning (Goodman, 1997). 4.1 Bigram Lexical Dependency Parsing Collins (1996) introduced a parser with extremely good performance. From this parser, we take many of the particular...

74 | A fully statistical approach to natural language interfaces - Miller, Stallard, et al. - 1996

Citation Context: ...ming algorithms, and learned quickly from a treebank. Finally, unlike most other formalisms, PFGs are potentially useful for language modeling or as one part of an integrated statistical system (e.g. Miller et al., 1996) or for use with algorithms requiring outside probabilities. Empirical results are encouraging: our best parser is comparable to those of Magerman (1995) and Collins (1996) when run on the same data...

59 | Statistically-Driven Computer Grammars of English: The IBM/Lancaster Approach (Rodopi: Amsterdam-Atlanta) - Black, Garside, et al. - 1993

Citation Context: ...ints. 4.5 IBM Language Modeling Group Researchers in the IBM Language Modeling Group developed a series of successively more complicated models to integrate statistics with features. The first model (Black, Garside, and Leech, 1993; Black, Lafferty, and Roukos, 1992) essentially tries to convert a unification grammar to a PCFG, by instantiating the values of the features. Due to data sparsity, however, not all features can be i...

54 | Development and evaluation of a broad-coverage probabilistic grammar of English-language computer manuals - Black, Lafferty, et al. - 1992

Citation Context: ...Comparison to Previous Work PFG bears much in common with previous work, but in each case has at least some advantages over previous formalisms. Some other models (Charniak, 1996; Brew, 1995; Collins, 1996; Black, Lafferty, and Roukos, 1992) use probability approximations that do not sum to 1, meaning that they should not be used either for language modeling, e.g. in a speech recognition system, or as part of an integrated model such as...
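The sum-to-1 point above has a mechanical check: a grammar defines a proper probability distribution only if, for every left-hand side, the probabilities of its rules sum to one. A minimal sketch, with an invented toy grammar:

```python
from collections import defaultdict
from math import isclose

# Sanity check for a proper probability model: for each left-hand side,
# rule probabilities must sum to 1, otherwise the grammar does not define
# a distribution over strings. The toy rules here are invented.
rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.7,
    ("NP", ("N",)): 0.3,
}

totals = defaultdict(float)
for (lhs, _rhs), p in rules.items():
    totals[lhs] += p

assert all(isclose(t, 1.0) for t in totals.values())
print("proper")  # every LHS sums to 1
```

A model whose conditional probabilities fail this check can still rank parses, but its "string probabilities" are not probabilities, which is why the passage rules such models out for speech-recognition language modeling.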

40 | Global thresholding and multiple-pass parsing - Goodman - 1997

Citation Context: ..., unlike all but one of the comparable systems, PFGs can compute outside probabilities, which are useful for grammar induction, some parsing algorithms (Goodman, 1996), and, as we will show, pruning (Goodman, 1997). 4.1 Bigram Lexical Dependency Parsing Collins (1996) introduced a parser with extremely good performance. From this parser, we take many of the particular conditioning features that we will use in ...
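The pruning use of outside probabilities mentioned here can be sketched as follows: the product of a constituent's inside and outside probabilities is the total probability of all parses containing it, so constituents whose product is a negligible fraction of the sentence probability can be dropped. The chart values and threshold below are invented for illustration:

```python
# Thresholding sketch: inside(i, j, A) * outside(i, j, A) is the total
# probability of all parses that contain constituent A over span (i, j).
# Prune it when that mass is a tiny fraction of the sentence probability.

def prune(chart, sentence_prob, threshold=1e-4):
    """Keep only constituents whose posterior mass clears the threshold."""
    return {
        item: (inside, outside)
        for item, (inside, outside) in chart.items()
        if inside * outside >= threshold * sentence_prob
    }

# Invented chart entries: (start, end, label) -> (inside, outside).
chart = {
    (0, 2, "NP"): (0.10, 0.30),   # posterior mass 0.03: kept
    (1, 2, "VP"): (0.01, 1e-6),   # posterior mass 1e-8: pruned
}
kept = prune(chart, sentence_prob=0.03)
print(sorted(kept))  # [(0, 2, 'NP')]
```

In a multiple-pass setting the same criterion lets a cheap first parse restrict the chart that a more expensive model must fill.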

32 | Stochastic HPSG - Brew - 1995

Citation Context: ...ation feature). 1 4 Comparison to Previous Work PFG bears much in common with previous work, but in each case has at least some advantages over previous formalisms. Some other models (Charniak, 1996; Brew, 1995; Collins, 1996; Black, Lafferty, and Roukos, 1992) use probability approximations that do not sum to 1, meaning that they should not be used either for language modeling, e.g. in a speech recognition...

18 | Probabilistic normalisation and unpacking of packed parse forests for unification-based grammars - Carroll, Briscoe - 1992

Citation Context: ...o determine whether we can improve our performance by using decision trees. 4.6 Probabilistic LR Parsing with Unification Grammars Briscoe and Carroll describe a formalism (Briscoe and Carroll, 1993; Carroll and Briscoe, 1992) similar in many ways to the first IBM model. In particular, a context-free covering grammar of a unification grammar is constructed. Some features are captured by the covering grammar, while others ...

13 | Towards probabilistic extensions of constraint-based grammars - Eisele - 1994

4 | Statistical decision-tree models for parsing - Magerman - 1995

Citation Context: ...part-of-speech (POS) tags alone as input, we perform significantly better than comparable parsers. 2 Motivation PFG can be regarded in several different ways: as a way to make history-based grammars (Magerman, 1995) more context free, and thus amenable to dynamic programming; as a way to generalize the work of Black et al. (1992); as a way to turn Collins' parser (Collins, 1996) into a generative probabilistic ...

2 | Stochastic attribute-value grammars (available as cmp-lg/9610003) - Abney - 1996