Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
Computational Linguistics, 1993
A generalized CYK algorithm for parsing stochastic CFG
1998
Cited by 58 (11 self)
We present a bottom-up parsing algorithm for stochastic context-free grammars that is able (1) to deal with multiple interpretations of sentences containing compound words; (2) to extract N-most probable parses in O(n³) and compute at the same time all possible parses of any portion of the input sequence with their probabilities; (3) to deal with "out of vocabulary" words. Explicitly extracting all the parse trees associated to a given input sentence depends on the complexity of the grammar, but even in the case where this number is exponential in n, the chart used by the algorithm for the representation is of O(n²) space complexity.
1 Introduction
This article presents CYK+, a bottom-up parsing algorithm for stochastic context-free grammars that is able: 1. to deal with multiple interpretations of sentences containing compound words; 2. to extract N-most probable parses in O(n³) and compute at the same time all possible parses of any portion of the input sequence with their p...
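To make the chart idea in this abstract concrete, here is a minimal sketch of plain probabilistic CYK with backpointers. It is not the CYK+ algorithm of the paper: it assumes a grammar already in Chomsky normal form, keeps only the single best parse per cell, and ignores compound words and out-of-vocabulary handling. The toy grammar and the names `LEXICAL`, `BINARY`, `viterbi_cyk` and `best_tree` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (an assumption, not from the paper).
LEXICAL = {            # word -> [(nonterminal, probability)]
    "she": [("NP", 0.5)],
    "fish": [("NP", 0.5), ("V", 0.4)],
    "eats": [("V", 0.6)],
}
BINARY = [             # (parent, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 1.0),
]

def viterbi_cyk(words):
    """Fill a chart whose cell (i, j) maps each nonterminal to
    (best probability, backpointer) for the span words[i:j]."""
    n = len(words)
    chart = defaultdict(dict)
    # Width-1 spans come straight from the lexicon.
    for i, w in enumerate(words):
        for nt, p in LEXICAL.get(w, []):
            chart[(i, i + 1)][nt] = (p, w)
    # Wider spans combine two adjacent sub-spans: O(n^3) cell visits overall.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right, p in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        prob = p * chart[(i, k)][left][0] * chart[(k, j)][right][0]
                        if prob > chart[(i, j)].get(parent, (0.0, None))[0]:
                            chart[(i, j)][parent] = (prob, (k, left, right))
    return chart

def best_tree(chart, i, j, nt):
    """Follow backpointers to rebuild the best tree for nt over words[i:j]."""
    _, back = chart[(i, j)][nt]
    if isinstance(back, str):          # lexical entry: the backpointer is the word
        return (nt, back)
    k, left, right = back
    return (nt, best_tree(chart, i, k, left), best_tree(chart, k, j, right))

chart = viterbi_cyk("she eats fish".split())
print(best_tree(chart, 0, 3, "S"))
```

The chart has one cell per span (i, j), which is where the O(n²) space bound comes from; for this toy input the S entry over the whole sentence carries probability 1.0 × 0.5 × (1.0 × 0.6 × 0.5) = 0.15.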
Parsing and hypergraphs
In IWPT, 2001
Cited by 57 (3 self)
While symbolic parsers can be viewed as deduction systems, this view is less natural for probabilistic parsers. We present a view of parsing as directed hypergraph analysis which naturally covers both symbolic and probabilistic parsing. We illustrate the approach by showing how a dynamic extension of Dijkstra’s algorithm can be used to construct a probabilistic chart parser with an O(n³) time bound for arbitrary PCFGs, while preserving as much of the flexibility of symbolic chart parsers as allowed by the inherent ordering of probabilistic dependencies.
Practical Unification-based Parsing of Natural Language
1993
Cited by 48 (7 self)
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical
Probabilistic parsing using left corner language models
In Proc. of the 5th Intl. Workshop on Parsing, 1997
Cited by 34 (2 self)
We introduce a novel parser based on a probabilistic version of a left-corner parser. The left-corner strategy is attractive because rule probabilities can be conditioned on both top-down goals and bottom-up derivations. We develop the underlying theory and explain how a grammar can be induced from analyzed data. We show that the left-corner approach provides an advantage over simple top-down probabilistic context-free grammars in parsing the Wall Street Journal using a grammar induced from the Penn Treebank. We also conclude that the Penn Treebank provides a fairly weak testbed due to the flatness of its bracketings and to the obvious overgeneration and undergeneration of its induced grammar.
Parameter estimation for constrained context-free language models
In Proceedings of the Fifth DARPA Workshop on Speech and Natural Language, 1992
Cited by 12 (3 self)
A new language model incorporating both N-gram and context-free ideas is proposed. This constrained context-free model is specified by a stochastic context-free prior distribution with N-gram frequency constraints. The resulting distribution is a Markov random field. Algorithms for sampling from this distribution and estimating the parameters of the model are presented.
An O(n³) Agenda-Based Chart Parser for Arbitrary Probabilistic Context-Free Grammars
2001
Cited by 8 (1 self)
While O(n³) methods for parsing probabilistic context-free grammars (PCFGs) are well known, a tabular parsing framework for arbitrary PCFGs which allows for bottom-up, top-down, and other parsing strategies, has not yet been provided. This paper presents such an algorithm, and shows its correctness and advantages over prior work. The paper finishes by bringing out the connections between the algorithm and work on hypergraphs, which permits us to extend the presented Viterbi (best parse) algorithm to an inside (total probability) algorithm.
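The relationship between a Viterbi (best parse) score and an inside (total probability) score can be illustrated with a small sketch. This is not the agenda-based parser of the paper: it is an ordinary bottom-up CYK loop over a hypothetical, deliberately ambiguous grammar in Chomsky normal form, and the only thing that changes between the two scores is the `combine` function used to merge alternative derivations (`max` for Viterbi, addition for inside). All names here are illustrative assumptions.

```python
import operator

# Hypothetical ambiguous PCFG in Chomsky normal form: X -> X X (0.5) | "a" (0.5).
START = "X"
LEXICAL = {"a": [("X", 0.5)]}
BINARY = [("X", "X", "X", 0.5)]   # (parent, left child, right child, probability)

def cyk_score(words, combine):
    """Score of START over the whole input.

    combine(old, new) merges the scores of alternative derivations of the
    same nonterminal over the same span: max gives the Viterbi score,
    addition gives the inside probability.
    """
    n = len(words)
    if n == 0:
        return 0.0
    chart = {}
    for i, w in enumerate(words):
        cell = chart[(i, i + 1)] = {}
        for nt, p in LEXICAL.get(w, []):
            cell[nt] = combine(cell.get(nt, 0.0), p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = chart[(i, j)] = {}
            for k in range(i + 1, j):
                for parent, left, right, p in BINARY:
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = p * chart[(i, k)][left] * chart[(k, j)][right]
                        cell[parent] = combine(cell.get(parent, 0.0), score)
    return chart[(0, n)].get(START, 0.0)

words = "a a a".split()
viterbi = cyk_score(words, max)            # probability of the single best parse
inside = cyk_score(words, operator.add)    # total probability of all parses
print(viterbi, inside)
```

The input "a a a" has two parses under this grammar, each with probability 0.5⁵, so the inside score is twice the Viterbi score.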
Constrained Stochastic Language Models
Image Models (and Their Speech Model Cousins), 1994
Cited by 6 (0 self)
Stochastic language models incorporating both n-grams and context-free grammars are proposed. A constrained context-free model specified by a stochastic context-free prior distribution with superimposed n-gram frequency constraints is derived and the resulting maximum-entropy distribution is shown to induce a Markov random field with neighborhood structure at the leaves determined by the relative n-gram frequencies. A computationally efficient version, the mixed tree/chain graph model, is derived with identical neighborhood structure. In this model, a word-tree derivation is given by a stochastic context-free prior on trees down to the preterminal (part-of-speech) level and word attachment is made by a nonstationary Markov chain. Using the Penn TreeBank, a comparison of the mixed tree/chain graph model to both the n-gram and context-free models is performed using entropy measures. The model entropy of the mixed tree/chain graph model is shown to reduce the entropy of both the bigram a...
Analyzing And Improving Statistical Language Models For Speech Recognition
1994
Cited by 5 (0 self)
A speech recognizer is a device that translates speech into text. Many current speech recognizers contain two components, an acoustic model and a statistical language model. The acoustic model indicates how likely it is that a certain word corresponds to a part of the acoustic signal (e.g. the speech). The statistical language model indicates how likely it is that a certain word will be spoken next, given the words recognized so far. Even though the acoustic model might for example not be able to decide between the acoustically similar words "peach" and "teach", the statistical language model can indicate that the word "peach" is more likely if the previously recognized words are "He ate the". Current speech recognizers perform well on constrained tasks, but the goal of continuous, speaker independent speech recognition in potentially noisy environments with a very large vocabulary has not been reached so far. How can statistical language models be improved so that more complex tasks c...
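The "peach" versus "teach" example can be sketched with a very small statistical language model. The following is a minimal bigram model with add-one smoothing, trained on a hypothetical four-sentence corpus; it conditions only on the single previous word ("the") rather than on all words recognized so far, and it is not the model studied in the thesis. The corpus and the names `CORPUS`, `train_bigrams` and `bigram_prob` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy training corpus (an assumption, for illustration only).
CORPUS = [
    "he ate the peach",
    "she ate the peach",
    "he ate the apple",
    "they teach the class",
]

def train_bigrams(sentences):
    """Count unigrams and adjacent word pairs, with a sentence-start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

unigrams, bigrams = train_bigrams(CORPUS)
vocab_size = len(unigrams)

# After "He ate the", compare the two acoustically similar candidates.
p_peach = bigram_prob("the", "peach", unigrams, bigrams, vocab_size)
p_teach = bigram_prob("the", "teach", unigrams, bigrams, vocab_size)
print(p_peach > p_teach)
```

On this corpus "the peach" is seen twice and "the teach" never, so the model prefers "peach" (3/14 against 1/14 after smoothing), which is the kind of evidence a recognizer would combine with the acoustic score.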
Document Analysis at DFKI - Part 2: Information Extraction
1995
Cited by 3 (1 self)
Document analysis is responsible for an essential progress in office automation. This paper is part of an overview about the combined research efforts in document analysis at DFKI. Common to all document analysis projects is the global goal of providing a high level electronic representation of documents in terms of iconic, structural, textual, and semantic information. These symbolic document descriptions enable an "intelligent" access to a document database. Currently there are three ongoing document analysis projects at DFKI: INCA, OMEGA, and PASCAL2000/PASCAL+. Although the projects pursue different goals in different application domains, they all share the same problems which have to be resolved with similar techniques. For that reason the activities in these projects are bundled to avoid redundant work. At DFKI we have divided the problem of document analysis into two main tasks, text recognition and information extraction, which themselves are divided into a set of s...