Results 1 - 7 of 7
Wide coverage parsing with stochastic attribute value grammars
- In Proceedings of the IJCNLP-04 Workshop: Beyond
, 2004
Cited by 56 (5 self)
Stochastic Attribute Value Grammars (SAVG) provide an attractive framework for syntactic analysis, because they allow the combination of linguistic sophistication with a principled treatment of ambiguity. The paper introduces a wide-coverage SAVG for Dutch, known as Alpino, and we show how this SAVG can be efficiently applied, using a beam search algorithm to recover parses from a shared parse forest. Unlike previous approaches, this algorithm does not place strict locality restrictions on the features used for disambiguation. Experimental results for a number of different corpora suggest that the SAVG framework is applicable for realistically sized grammars and corpora.
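The beam search idea in this abstract can be illustrated with a minimal sketch. It is not Alpino's implementation: the forest encoding, the `beam_parses` and `score` names, the toy weights, and the example forest are all assumptions made for illustration. Each forest node keeps only its `beam` best sub-parses, and because `score` sees the whole subtree, it is free to use non-local features.

```python
from itertools import product

# Hypothetical packed forest: node -> list of (label, child_nodes) alternatives.
forest = {
    "S": [("S", ("NP", "VP"))],
    "NP": [("np:I", ())],
    "VP": [("vp_attach_high", ("V", "OBJ", "PP")),
           ("vp_attach_low", ("V", "OBJ_PP"))],
    "V": [("v:saw", ())],
    "OBJ": [("np:man", ())],
    "PP": [("pp:with_telescope", ())],
    "OBJ_PP": [("np_pp", ("OBJ", "PP"))],
}

# Hypothetical feature weights; a real model would learn these.
weights = {"vp_attach_low": 1.0, "vp_attach_high": 0.5}


def score(tree):
    """Score a whole subtree, so features need not be local to one node."""
    label, kids = tree
    return weights.get(label, 0.0) + sum(score(k) for k in kids)


def beam_parses(forest, node, score, beam=2, _memo=None):
    """Return up to `beam` best (score, tree) pairs for `node`."""
    if _memo is None:
        _memo = {}
    if node in _memo:
        return _memo[node]
    candidates = []
    for label, children in forest[node]:
        # Combine only the sub-parses that survived each child's beam.
        options = [beam_parses(forest, c, score, beam, _memo) for c in children]
        for combo in product(*options):
            tree = (label, tuple(t for _, t in combo))
            candidates.append((score(tree), tree))
    candidates.sort(key=lambda st: st[0], reverse=True)
    _memo[node] = candidates[:beam]
    return _memo[node]


best = beam_parses(forest, "S", score, beam=2)
```

A small beam trades exactness for speed: a sub-parse pruned at a lower node cannot be recovered higher up, even if a non-local feature would have favoured it.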
Alpino: Wide-coverage Computational Analysis of Dutch
- In
, 2000
Cited by 55 (10 self)
Alpino is a wide-coverage computational analyzer of Dutch which aims at accurate, full, parsing of unrestricted text. We describe the head-driven lexicalized grammar and the lexical component, which has been derived from existing resources. The grammar produces dependency structures, thus providing a reasonably abstract and theory-neutral level of linguistic representation. An important aspect of wide-coverage parsing is robustness and disambiguation. The dependency relations encoded in the dependency structures have been used to develop and evaluate both hand-coded and statistical disambiguation methods.
Unsupervised Pos-Tagging Improves Parsing Accuracy And Parsing Efficiency
- In Proceedings of IWPT
, 2001
Cited by 7 (0 self)
It is shown that a simple POS-tagger can be used to filter the results of lexical analysis of a wide-coverage computational grammar. The reduction of the number of lexical categories not only greatly improves parsing efficiency, but in our experiments also gave rise to a mild increase in parsing accuracy; in contrast to results reported in earlier work on supervised tagging. The novel aspect of our approach is that the POS-tagger does not require any human-annotated data - but rather uses the parser output obtained on a large training set.
Reinforcing Parser Preferences through Tagging
, 2003
Cited by 6 (0 self)
Lexical ambiguity is an important source of inefficiency for wide-coverage HPSG parsing. In this paper, we propose a lexical analysis filter which removes unlikely lexical categories. The filter is implemented as a straightforward HMM n-gram POS-tagger, which computes the 'a posteriori' probability of each lexical category. A lexical category is removed if a competing lexical category is sufficiently more likely. The novel aspect of our approach is the fact that the tagger is trained on the output of the parser itself; therefore there is no need for hand-annotated material. Use of this filter increases the speed of the parser considerably, and in addition gives rise to an improvement in parsing accuracy.
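The removal criterion described here ("a competing lexical category is sufficiently more likely") can be sketched as a simple ratio test. This is an illustrative assumption, not the paper's code: the `filter_tags` name, the `ratio` threshold, and the example probabilities are hypothetical, and the per-position posteriors are assumed to have been computed already (for an HMM, typically with the forward-backward algorithm).

```python
def filter_tags(posteriors, ratio=10.0):
    """Keep, per word position, only the categories that are competitive.

    `posteriors` is a list with one dict per word, mapping each candidate
    lexical category to its a posteriori probability. A category is dropped
    when the best category at that position is more than `ratio` times as
    likely (the threshold value is an assumption).
    """
    kept = []
    for tag_probs in posteriors:
        best = max(tag_probs.values())
        kept.append({t for t, p in tag_probs.items() if p * ratio >= best})
    return kept


# Hypothetical posteriors for a two-word input.
example = [
    {"verb_with_particle": 0.02, "verb": 0.98},
    {"noun": 0.6, "verb": 0.4},
]
filtered = filter_tags(example)
```

A larger `ratio` removes fewer categories, which keeps the parser safer from tagger errors at the cost of less speed-up.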
Statistical Parsing of Dutch using Maximum Entropy Models with Feature Merging
- In Proceedings of the Natural Language Processing Pacific Rim Symposium
, 2001
Cited by 2 (0 self)
In this project report we describe work in statistical parsing using the maximum entropy technique and the Alpino language analysis system for Dutch. A major difficulty in this domain is the lack of sufficient corpus data available for training. Among other problems, this sparseness of data increases the danger of the model overfitting the training data, making it particularly important that the selection of statistical features upon which to base the model be optimal. To this end we have adapted the notion of feature merging, a means of constructing equivalence classes of statistical features based upon common elements within them. In spite of promising preliminary results, subsequent tests have not enabled us to conclude whether this approach helps the kind of models we are working with.
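One way to picture feature merging is as a mapping from each feature to a coarser key, with counts pooled inside each equivalence class. The sketch below is only an assumption about the general idea: the abstract does not specify the merging criteria, and the `merge_features` name, the tuple-shaped features, and the toy counts are hypothetical.

```python
from collections import Counter


def merge_features(counts, key):
    """Pool feature counts into equivalence classes defined by `key`."""
    merged = Counter()
    for feature, n in counts.items():
        merged[key(feature)] += n
    return merged


# Hypothetical sparse features: (relation, head word, dependent word).
counts = {
    ("obj", "eet", "appel"): 1,
    ("obj", "eet", "peer"): 1,
    ("su", "eet", "kind"): 2,
}

# Merge on the shared elements (relation and head), dropping the dependent.
merged = merge_features(counts, key=lambda f: f[:2])
```

Pooling gives each merged feature more training evidence, which is the motivation the abstract gives for reducing overfitting on sparse data.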
Robust Efficient Parsing for Spoken Dialogue Processing
, 1998
ion (Johnson and Dorre, [39])
- x(A,B,f(A,B),g(A,h(B,i(C)))) => x(A,B,f(_,_),g(_,_))
- parsewithweakening(Cat,P0,P,E0,E) :- weaken(Cat,WeakenedCat), parse(WeakenedCat,P0,P,E0,E), Cat=WeakenedCat.
- Really helps!

Ambiguity Packing
- A parser should not construct all parse trees (exponential)
- Instead, a compact representation of all such parse trees is constructed
  -- grammar [42, 9]
  -- parse forest [76]
  -- packed structures [3]
- Here: for every 'result item' keep track of the lexical entry and references of other result items that were used to create it
- Results in a lexicalized tree substitution grammar
- which generates the input sentence with all its parse trees

Bottom-up Inactive-chart Parser
Item form: [i; X; j]
Axioms:
Goals: [0; S; n]
Inference Rules:
  Scan: [q_i; w_i; q_{i+1}]
  Complete: from [q_k; X_1; q_{k'}] [q_{k'}; X_2; q_{k''}] ... [q_{m'}; X_l; q_m] infer [q_k; X_0; q_m], given X_0 -> X_1 ... X_l

Bottom-up Inactive-chart Parser
Inference Rules: Scan [q_i; w_i; q_{i+...
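The Scan and Complete rules of the bottom-up inactive-chart parser can be sketched as a small recognizer. This is a minimal illustration under stated assumptions, not the parser from the slides: the `recognize` and `spans` names and the toy grammar are hypothetical, positions are plain integers rather than states q_i, and no ambiguity packing is done (the chart only records which items [i; X; j] exist).

```python
def spans(chart, rhs, start):
    """Return every end position reachable by matching `rhs` from `start`."""
    ends = {start}
    for cat in rhs:
        # Extend each partial match with an adjacent inactive item of type `cat`.
        ends = {j for (i, c, j) in chart if c == cat and i in ends}
    return ends


def recognize(words, rules, goal="S"):
    """Return True if the goal item [0; goal; n] can be derived."""
    n = len(words)
    # Scan: one item [i; w_i; i+1] per input word.
    chart = {(i, w, i + 1) for i, w in enumerate(words)}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            for i in range(n):
                # Complete: adjacent items for X_1 ... X_l yield [i; X_0; j].
                for j in spans(chart, rhs, i):
                    if (i, lhs, j) not in chart:
                        chart.add((i, lhs, j))
                        changed = True
    return (0, goal, n) in chart


# Hypothetical toy grammar; lexical entries are ordinary rules over words.
rules = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("Det", ("the",)),
    ("N", ("dog",)),
    ("N", ("cat",)),
    ("V", ("saw",)),
]
```

Because only inactive (completed) items are stored, the chart stays small, at the cost of re-matching right-hand sides on every pass until no new item is added.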
Gertjan van Noord Alfa-informatica & BCN
Abstract
It is shown that a simple POS-tagger can be used to filter the results of lexical analysis of a wide-coverage computational grammar. The reduction of the number of lexical categories not only greatly improves parsing efficiency, but in our experiments also gave rise to a mild increase in parsing accuracy; in contrast to results reported in earlier work on supervised tagging. The novel aspect of our approach is that the POS-tagger does not require any human-annotated data - but rather uses the parser output obtained on a large training set.

1 Introduction
Full parsing of unrestricted texts on the basis of a wide-coverage computational HPSG grammar remains a challenge. In our recent experience in the development of the Alpino system, discussed in section 2, we found that even in the presence of various clever chart parsing and ambiguity packing techniques, lexical ambiguity in particular has an important effect on parsing efficiency. In some cases, a category assigned to a word is obviously wrong for the sentence the word occurs in. For instance, in a lexicalist grammar the two occurrences of called in (1) will be associated with two distinct lexical categories. The entry associated with (1-a) will reflect the requirement that the verb combines syntactically with the particle 'up'. Clearly, this lexical category is irrelevant for the analysis of sentence (1-b), since no such particle occurs in the sentence.

