Results 11–20 of 94
Efficient Algorithms for Parsing the DOP Model
, 1996
"... Excellent results have been reported for DataOriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that mus ..."
Abstract

Cited by 58 (4 self)
Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using these optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a duplication of Pereira and Schabes's (1992) experiment on the same data. We show that Bod's results are at least partially due to an extremely fortuitous choice of test data, and partially due to using cleaner data than other researchers.
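The parsing strategy this abstract describes, choosing the tree that maximizes the expected number of correct constituents rather than the most probable tree, can be illustrated with a small dynamic program. This is only a sketch, not the paper's algorithm: the span posteriors, which would normally come from inside-outside computations over the grammar, are supplied directly here, and the `span_score` interface is an assumption.

```python
def max_constituents_tree(n, span_score):
    """Return (best_score, bracketing) over words 0..n-1.

    span_score(i, j) is the posterior probability that span (i, j) is a
    constituent. Among all binary bracketings (each of which must cover
    the full span), we maximize the summed posteriors, i.e. the expected
    number of correct constituents, via a CKY-style dynamic program."""
    best = {}

    def solve(i, j):
        if (i, j) in best:
            return best[(i, j)]
        if j - i == 1:  # single word: the span itself is the only choice
            best[(i, j)] = (span_score(i, j), (i, j))
            return best[(i, j)]
        # Try every split point k and keep the highest-scoring pair.
        score, tree = max(
            (solve(i, k)[0] + solve(k, j)[0],
             (solve(i, k)[1], solve(k, j)[1]))
            for k in range(i + 1, j)
        )
        best[(i, j)] = (score + span_score(i, j), tree)
        return best[(i, j)]

    return solve(0, n)
```

For instance, with high posteriors on spans (0,2) and the single words, the program prefers the left-branching bracketing even if the grammar's single most probable tree were different.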
Applying Co-Training Methods to Statistical Parsing
, 2001
"... We propose a novel CoTraining method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set and a large pool of unlabeled text. The algorithm iteratively labe ..."
Abstract

Cited by 55 (3 self)
We propose a novel Co-Training method for statistical parsing. The algorithm takes as input a small corpus (9695 sentences) annotated with parse trees, a dictionary of possible lexicalized structures for each word in the training set, and a large pool of unlabeled text. The algorithm iteratively labels the entire data set with parse trees. Using empirical results based on parsing the Wall Street Journal corpus, we show that training a statistical parser on the combined labeled and unlabeled data strongly outperforms training only on the labeled data.
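The iterative labeling loop this abstract describes can be outlined as a generic co-training skeleton: two models each label the unlabeled pool and contribute their most confident labels back to the shared training set. This is a sketch under stated assumptions, not the paper's parser-specific algorithm; the `fit`/`confidence`/`predict` model interface is hypothetical.

```python
def co_train(model_a, model_b, labeled, unlabeled, rounds=5, per_round=2):
    """Generic co-training loop: each round, both models retrain on the
    current labeled set, then each moves its per_round most confident
    examples from the pool into the labeled set with predicted labels."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        model_a.fit(labeled)
        model_b.fit(labeled)
        for model in (model_a, model_b):
            # Rank the remaining pool by this model's confidence.
            scored = sorted(pool, key=lambda x: -model.confidence(x))
            for x in scored[:per_round]:
                labeled.append((x, model.predict(x)))
                pool.remove(x)
    return labeled
```

In the paper's setting the two "models" are views with different lexicalized structures; here any objects with the assumed three-method interface will do.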
Bayesian grammar induction for language modeling
 In Proceedings of ACL
, 1995
"... We describe a corpusbased induction algorithm for probabilistic contextfree grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a postpass using the InsideOutside algorithm. We compare the performance of our algorithm to ngram models and the InsideOutside ..."
Abstract

Cited by 52 (1 self)
We describe a corpus-based induction algorithm for probabilistic context-free grammars. The algorithm employs a greedy heuristic search within a Bayesian framework, and a post-pass using the Inside-Outside algorithm. We compare the performance of our algorithm to n-gram models and the Inside-Outside algorithm in three language modeling tasks. In two of the tasks, the training data is generated by a probabilistic context-free grammar, and in both tasks our algorithm outperforms the other techniques. The third task involves naturally occurring data, and in this task our algorithm does not perform as well as n-gram models but vastly outperforms the Inside-Outside algorithm.
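The greedy heuristic search mentioned in this abstract can be sketched in its most generic form: repeatedly apply any candidate move that improves the objective, and stop when no move helps. This is a minimal illustration, assuming a hypothetical move generator and scoring function (in the paper's setting the score would be a Bayesian posterior over grammars); it is not the paper's induction procedure.

```python
def greedy_search(init, moves, score):
    """Hill-climb from init: scan candidate moves, adopt any candidate
    that improves the score, and repeat until a full scan finds no
    improvement. Returns the locally optimal state."""
    current, best = init, score(init)
    improved = True
    while improved:
        improved = False
        for cand in moves(current):
            s = score(cand)
            if s > best:
                current, best, improved = cand, s, True
    return current
```

A post-pass (such as the Inside-Outside re-estimation the abstract mentions) would then refine the parameters of the structure this search settles on.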
Can Subcategorisation Probabilities Help a Statistical Parser?
 In Proceedings of the 6th ACL/SIGDAT Workshop on Very Large Corpora
, 1998
"... Research into the automatic acquisition of lexical information from corpora is starting to produce largescale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type ..."
Abstract

Cited by 45 (5 self)
Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text, which shows that this information can significantly improve parse accuracy.
Head Automata and Bilingual Tiling: Translation with Minimal Representations
, 1996
"... We present a language model consisting of a collection of costed bidirectional finite state automata associated with the head words of phrases. The model is suitable for incremental application of lexical associations in a dynamic programming search for optimal dependency tree derivations. We ..."
Abstract

Cited by 42 (3 self)
We present a language model consisting of a collection of costed bidirectional finite-state automata associated with the head words of phrases. The model is suitable for incremental application of lexical associations in a dynamic programming search for optimal dependency tree derivations.
Stochastic Lexicalized Context-Free Grammar
, 1993
"... Stochastic lexicalized contextfree grammar (SLCFG) is an attractive compromise between the parsing efficiency of stochastic contextfree grammar (SCFG) and the lexical sensitivity of stochastic lexicalized treeadjoining grammar (SLTAG). SLCFG is a restricted form of SLTAG that can only generate ..."
Abstract

Cited by 41 (6 self)
Stochastic lexicalized context-free grammar (SLCFG) is an attractive compromise between the parsing efficiency of stochastic context-free grammar (SCFG) and the lexical sensitivity of stochastic lexicalized tree-adjoining grammar (SLTAG). SLCFG is a restricted form of SLTAG that can only generate context-free languages and can be parsed in cubic time. However, SLCFG retains the lexical sensitivity of SLTAG and is therefore a much better basis for capturing distributional information about words than SCFG.
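The cubic-time parsability claimed here is the same asymptotic bound as CKY recognition of a context-free grammar in Chomsky normal form, and a minimal CKY recognizer makes the O(n^3) loop structure concrete. This sketch handles a plain (non-stochastic, non-lexicalized) CFG, and the toy grammar encoding is an assumption, not anything from the paper.

```python
def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognition for a CNF grammar.

    lexical: {terminal: set of nonterminals that rewrite to it}
    binary:  list of rules (A, B, C) meaning A -> B C
    Runs in O(n^3 * |grammar|): n^2 chart cells, n split points each."""
    n = len(words)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):       # split point
                for a, b, c in binary:
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(a)
    return start in chart[0][n]
```

A stochastic variant would store max (or summed) rule probabilities per nonterminal in each cell instead of a bare set.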
Global Thresholding and Multiple-Pass Parsing
, 1997
"... We present a variation on classic beam thresholding techniques that is up to an order of magnitude faster than the traditional method, at the same performance level. We also present a new thresholding technique, global thresholding, which, combined with the new beam thresholding, gives an addi ..."
Abstract

Cited by 40 (3 self)
We present a variation on classic beam thresholding techniques that is up to an order of magnitude faster than the traditional method, at the same performance level. We also present a new thresholding technique, global thresholding, which, combined with the new beam thresholding, gives an additional factor of two improvement, and a novel technique, multiple-pass parsing, that can be combined with the others to yield yet another 50% improvement. We use a new search algorithm to simultaneously optimize the thresholding parameters of the various algorithms.
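Classic beam thresholding, the baseline this abstract improves on, prunes chart-cell entries whose score falls below a fixed fraction of the best entry in the same cell. A minimal sketch, assuming a cell represented as a nonterminal-to-probability dict (the representation is an assumption, not the paper's):

```python
def beam_prune(cell, beam=1e-3):
    """Keep only entries within a factor of `beam` of the cell's best
    probability; everything else is discarded before further chart
    processing. An empty cell stays empty."""
    if not cell:
        return {}
    best = max(cell.values())
    return {nt: p for nt, p in cell.items() if p >= beam * best}
```

Global thresholding, by contrast, compares items against estimates computed over the whole chart rather than within a single cell, which is what lets it prune items that look locally plausible but cannot participate in any good full parse.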
Statistical Parsing With an Automatically-Extracted Tree Adjoining Grammar
, 2000
"... We discuss the advantages of lexicalized treeadjoining grammar as an alternative to lexicalized PCFG for statistical parsing, describing the induction of a probabilistic LTAG model from the Penn Treebank and evaluating its parsing performance. We find that this induction method is an improvement ov ..."
Abstract

Cited by 40 (2 self)
We discuss the advantages of lexicalized tree-adjoining grammar as an alternative to lexicalized PCFG for statistical parsing, describing the induction of a probabilistic LTAG model from the Penn Treebank and evaluating its parsing performance. We find that this induction method is an improvement over the EM-based method of [Hwa, 1998], and that the induced model yields results comparable to lexicalized PCFG.
An optimized algorithm for Data Oriented Parsing
, 1996
"... This paper presents an optimization of a syntactic disambiguation algorithm for Data Oriented Parsing (DOP) (Bod 93) in particular, and for Stochastic TreeSubstitution Grammars (STSGs) in general. The main advantage of this algorithm on existing alternatives ((Bod 93), (Schabes & Waters 93), (Sima' ..."
Abstract

Cited by 33 (5 self)
This paper presents an optimization of a syntactic disambiguation algorithm for Data Oriented Parsing (DOP) (Bod 93) in particular, and for Stochastic Tree-Substitution Grammars (STSGs) in general. The main advantage of this algorithm over existing alternatives ((Bod 93), (Schabes & Waters 93), (Sima'an et al. 94)) is that its time complexity is linear, rather than quadratic, in grammar size (and cubic in sentence length). It is particularly suitable for natural language STSGs which have many deep elementary trees and a small underlying Context-Free Grammar (CFG). A first implementation of this algorithm is operational and exhibits a substantial speed-up in comparison to the unoptimized version. In addition to presenting the optimized algorithm, the paper reports experiments measuring the disambiguation accuracy, the expected sizes, and the execution times of various DOP models projected from the ATIS domain. Keywords: Corpus-based statistical NLP, syntactic disambiguation...
Probabilistic constraint logic programming
, 1998
"... Lautklassendetektors (Speech enhancement using a sound class detector). Vol.2 (2) 1995: Word Stress. Master thesis by Stefan Rapp (in German) and papers mostly by Grzegorz ..."
Abstract

Cited by 32 (3 self)
 Add to MetaCart
Lautklassendetektors (Speech enhancement using a sound class detector). Vol.2 (2) 1995: Word Stress. Master thesis by Stefan Rapp (in German) and papers mostly by Grzegorz