A probabilistic corpus-driven model for lexical-functional analysis
- Proceedings COLING-ACL'98
, 1998
Cited by 56 (15 self)
We develop a Data-Oriented Parsing (DOP) model based on the syntactic representations of Lexical-Functional Grammar (LFG). We start by summarizing the original DOP model for tree representations and then show how it can be extended with corresponding functional structures. The resulting LFG-DOP model triggers a new, corpus-based notion of grammaticality, and its probability models exhibit interesting behavior with respect to specificity and the interpretation of ill-formed strings.
Parsing with the Shortest Derivation
- Proceedings COLING-2000
, 2000
Cited by 34 (12 self)
Common wisdom has it that the bias of stochastic grammars in favor of shorter derivations of a sentence is harmful and should be redressed. We show that the common wisdom is wrong for stochastic grammars that use elementary trees instead of context-free rules, such as Stochastic Tree-Substitution Grammars used by Data-Oriented Parsing models. For such grammars a non-probabilistic metric based on the shortest derivation outperforms a probabilistic metric on the ATIS and OVIS corpora, while it obtains competitive results on the Wall Street Journal (WSJ) corpus. This paper also contains the first published experiments with DOP on the WSJ.
ABL: Alignment-Based Learning
, 2000
Cited by 29 (1 self)
This paper introduces a new type of grammar learning algorithm, inspired by string edit distance (Wagner and Fischer, 1974). The algorithm takes a corpus of flat sentences as input and returns a corpus of labelled, bracketed sentences. The method works on pairs of unstructured sentences that have one or more words in common. When two sentences are divided into parts that are the same in both sentences and parts that are different, this information is used to find parts that are interchangeable. These parts are taken as possible constituents of the same type. After this alignment learning step, the selection learning step selects the most probable constituents from all possible constituents. This method was used to bootstrap structure on the ATIS corpus (Marcus et al., 1993) and on the OVIS corpus (Bonnema et al., 1997). While the results are encouraging (we obtained up to 89.25% non-crossing brackets precision), this paper will point out some of the shortcomings of our approach and will suggest possible solutions.
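The alignment learning step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses Python's `difflib.SequenceMatcher` as a stand-in for an edit-distance alignment, and the function name `align_hypotheses` and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def align_hypotheses(sentence_a, sentence_b):
    """Return pairs of word spans that differ between two sentences.

    Assumption: SequenceMatcher stands in for the edit-distance
    alignment mentioned in the abstract. The differing spans are the
    candidate (interchangeable) constituents.
    """
    a, b = sentence_a.split(), sentence_b.split()
    opcodes = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    # The method only applies to pairs that share at least one word.
    if not any(tag == "equal" for tag, *_ in opcodes):
        return []
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in opcodes
        if tag != "equal"
    ]


pairs = align_hypotheses(
    "show me flights from Boston",
    "show me fares from Boston",
)
print(pairs)  # [(['flights'], ['fares'])]
```

The sketch stops at hypothesis generation. Choosing among overlapping hypotheses (the selection learning step) would need a scoring model that the abstract does not specify.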
Compacting the Penn Treebank Grammar
, 1998
Cited by 29 (1 self)
Treebanks, such as the Penn Treebank (PTB), offer a simple approach to obtaining a broad coverage grammar: one can simply read the grammar off the parse trees in the treebank. While such a grammar is easy to obtain, a square-root rate of growth of the rule set with corpus size suggests that the derived grammar is far from complete and that much more treebanked text would be required to obtain a complete grammar, if one exists at some limit. However, we offer an alternative explanation in terms of the underspecification of structures within the treebank. This hypothesis is explored by applying an algorithm to compact the derived grammar by eliminating redundant rules - rules whose right hand sides can be parsed by other rules. The size of the resulting compacted grammar, which is significantly less than that of the full treebank grammar, is shown to approach a limit. However, such a compacted grammar does not yield very good performance figures. A version of the compaction algorithm taking rule probabilities into account is proposed, which is argued to be more linguistically motivated. Combined with simple thresholding, this method can be used to give a 58% reduction in grammar size without significant change in parsing performance, and can produce a 69% reduction with some gain in recall, but a loss in precision.
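The notion of a redundant rule (one whose right hand side can be parsed by other rules) can be made concrete with a small sketch. It is an illustration under stated assumptions, not the paper's algorithm: rules are visited greedily in input order, unary and empty rules are ignored, probabilities are not considered, and the names `derives` and `compact` are hypothetical.

```python
from functools import lru_cache


def derives(symbol, seq, rules):
    """True if `symbol` can be rewritten into the symbol sequence `seq`
    using `rules`, a collection of (lhs, rhs_tuple) pairs.
    Assumption: only rules with two or more right-hand-side symbols are
    used, so every recursive call covers a strictly shorter span."""

    @lru_cache(maxsize=None)
    def covers(x, i, j):
        # A symbol trivially covers a one-symbol span equal to itself.
        if j - i == 1 and seq[i] == x:
            return True
        return any(
            covers_seq(rhs, i, j)
            for lhs, rhs in rules
            if lhs == x and len(rhs) >= 2
        )

    @lru_cache(maxsize=None)
    def covers_seq(rhs, i, j):
        # Split seq[i:j] into non-empty parts, one per rhs symbol.
        if len(rhs) == 1:
            return covers(rhs[0], i, j)
        if j - i < len(rhs):
            return False
        return any(
            covers(rhs[0], i, k) and covers_seq(rhs[1:], k, j)
            for k in range(i + 1, j - len(rhs) + 2)
        )

    return covers(symbol, 0, len(seq))


def compact(rules):
    """Greedily drop each rule whose right-hand side is parsable by the
    rules that are still kept."""
    kept = list(rules)
    for rule in list(rules):
        lhs, rhs = rule
        others = [r for r in kept if r != rule]
        if len(rhs) >= 2 and derives(lhs, tuple(rhs), frozenset(others)):
            kept = others
    return kept


grammar = [
    ("NP", ("DT", "NN")),
    ("NP", ("NP", "PP")),
    ("NP", ("DT", "NN", "PP")),  # parsable as NP -> NP PP over NP -> DT NN
]
print(compact(grammar))
```

In this toy grammar the flat rule `NP -> DT NN PP` is dropped because the two remaining rules already cover its right hand side. The result of a greedy pass can depend on the order in which rules are visited.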
A Unified Model of Structural Organization in Language and Music
, 2002
Cited by 23 (6 self)
Is there a general model that can predict the perceived phrase structure in language and music? While it is usually assumed that humans have separate faculties for language and music, this work focuses on the commonalities rather than on the differences between these modalities, aiming at finding a deeper "faculty". Our key idea is that the perceptual system strives for the simplest structure (the "simplicity principle"), but in doing so it is biased by the likelihood of previous structures (the "likelihood principle"). We present a series of data-oriented parsing (DOP) models that combine these two principles and that are tested on the Penn Treebank and the Essen Folksong Collection. Our experiments show that (1) a combination of the two principles outperforms the use of either of them, and (2) exactly the same model with the same parameter setting achieves maximum accuracy for both language and music. We argue that our results suggest an interesting parallel between linguistic and musical structuring.
A Memory-Based Model of Syntactic Analysis: Data-Oriented Parsing
- Journal of Experimental and Theoretical Artificial Intelligence
, 1999
Cited by 17 (5 self)
This paper presents a memory-based model of human syntactic processing: Data-Oriented Parsing. After a brief introduction (section 1), it argues that any account of disambiguation and many other performance phenomena inevitably has an important memory-based component (section 2). It discusses the limitations of probabilistically enhanced competence-grammars, and argues for a more principled memory-based approach (section 3). In sections 4 and 5, one particular memory-based model is described in some detail: a simple instantiation of the "Data-Oriented Parsing" approach ("DOP1"). Section 6 reports on experimentally established properties of this model, and section 7 compares it with other memory-based techniques. Section 8 concludes and points to future work.
Context-Sensitive Spoken Dialogue Processing with the DOP Model
- Natural Language Engineering
, 1999
Cited by 9 (8 self)
We show how the DOP model can be used for fast and robust context-sensitive processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme "Language and Speech Technology". In this paper, we extend the original DOP model to context-sensitive interpretation of spoken input. The system we describe uses the OVIS corpus (which consists of 10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account by dividing up the OVIS corpus into context-dependent subcorpora. Each system question triggers a subcorpus by which the user answer is analyzed and interpreted. Our experiments indicate that the context-sensitive DOP model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input.
Spoken Dialogue Interpretation with the DOP Model
, 1998
Cited by 9 (2 self)
We show how the DOP model can be used for fast and robust processing of spoken input in a practical spoken dialogue system called OVIS. OVIS, Openbaar Vervoer Informatie Systeem ("Public Transport Information System"), is a Dutch spoken language information system which operates over ordinary telephone lines. The prototype system is the immediate goal of the NWO Priority Programme "Language and Speech Technology". In this paper, we extend the original DOP model to context-sensitive interpretation of spoken input. The system we describe uses the OVIS corpus (10,000 trees enriched with compositional semantics) to compute from an input word-graph the best utterance together with its meaning. Dialogue context is taken into account by dividing up the OVIS corpus into context-dependent subcorpora. Each system question triggers a subcorpus by which the user answer is analyzed and interpreted. Our experiments indicate that the context-sensitive DOP model obtains better accuracy than the original model, allowing for fast and robust processing of spoken input.
An Empirical Evaluation of LFG-DOP
- In Proceedings of the 19th International Conference on Computational Linguistics
, 2000
Cited by 8 (2 self)
This paper presents an empirical assessment of the LFG-DOP model introduced by Bod & Kaplan (1998). The parser we describe uses fragments from LFG-annotated sentences to parse new sentences and Monte Carlo techniques to compute the most probable parse. While our main goal is to test Bod & Kaplan's model, we will also test a version of LFG-DOP which treats generalized fragments as previously unseen events. Experiments with the Verbmobil and Homecentre corpora show that our version of LFG-DOP outperforms Bod & Kaplan's model, and that LFG's functional information improves the parse accuracy of tree structures.
An Improved Parser for Data-Oriented Lexical-Functional Analysis
- In Proceedings of the 38th Conference of the Association for Computational Linguistics
Cited by 8 (4 self)
We present an LFG-DOP parser which uses fragments from LFG-annotated sentences to parse new sentences. Experiments with the Verbmobil and Homecentre corpora show that (1) Viterbi n-best search performs about 100 times faster than Monte Carlo search while both achieve the same accuracy; (2) the DOP hypothesis which states that parse accuracy increases with increasing fragment size is confirmed for LFG-DOP; (3) LFG-DOP's relative frequency estimator performs worse than a discounted frequency estimator; and (4) LFG-DOP significantly outperforms Tree-DOP if evaluated on tree structures only.
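As a rough illustration of what a relative frequency estimator computes, the sketch below handles the simpler tree-only case: each fragment's probability is its count divided by the total count of fragments with the same root category. This is an assumption-laden sketch, not the paper's estimator. LFG-DOP defines its competition sets over functional structures as well, the discounted estimator is not shown, and the name `relative_frequencies` and the toy fragments are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def relative_frequencies(fragments):
    """Map each (root, bracketing) fragment to count / total count of
    fragments sharing its root category.
    Assumption: grouping by root label only (tree-only case)."""
    counts = Counter(fragments)
    root_totals = Counter()
    for (root, _), n in counts.items():
        root_totals[root] += n
    return {
        frag: Fraction(n, root_totals[frag[0]])
        for frag, n in counts.items()
    }


toy_fragments = [
    ("NP", "(NP DT NN)"),
    ("NP", "(NP DT NN)"),
    ("NP", "(NP NP PP)"),
    ("S", "(S NP VP)"),
]
probs = relative_frequencies(toy_fragments)
print(probs[("NP", "(NP DT NN)")])  # 2/3
```

Exact fractions are used so that the probabilities for each root category visibly sum to one.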

