Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
- COMPUTATIONAL LINGUISTICS
, 1993
Corpus Annotation for Parser Evaluation
- In Proceedings of the EACL workshop on Linguistically Interpreted Corpora (LINC)
, 1999
Cited by 50 (5 self)
We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.
1 Introduction
The evaluation of individual language-processing components forming part of larger-scale natural language processing (NLP) application systems has recently emerged as an important area of research (see e.g. Rubio, 1998; Gaizauskas, 1998). A syntactic parser is often a component of an NLP system; a reliable technique for comparing and assessing the relative strengths and weaknesses of different parsers (or indeed of different versions of the same parser during development) is therefore a necessity. Current methods for evaluating the accuracy of syntactic parsers are...
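As a rough illustration of how parser output might be scored against a corpus annotated with head-dependent relations, the sketch below compares two sets of relation triples and reports precision, recall, and F1. The triple layout `(relation, head, dependent)`, the relation labels, and the function name `gr_scores` are assumptions made for this example; they are not taken from the annotation scheme itself.

```python
from typing import Set, Tuple

# A grammatical relation as (relation, head, dependent).
# The labels used below are illustrative assumptions, not the scheme's inventory.
GR = Tuple[str, str, str]


def gr_scores(gold: Set[GR], predicted: Set[GR]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted relations against gold ones."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Hypothetical gold annotation and parser output for "Sam chases the cat".
gold = {("subj", "chases", "Sam"), ("obj", "chases", "cat"), ("det", "cat", "the")}
pred = {("subj", "chases", "Sam"), ("obj", "chases", "the"), ("det", "cat", "the")}

precision, recall, f1 = gr_scores(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Scoring over sets of relations means that one wrong attachment costs one relation, rather than affecting every bracket that contains it.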
Practical Unification-based Parsing of Natural Language
, 1993
Cited by 46 (7 self)
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical
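For readers unfamiliar with the operation itself, the following is a minimal sketch of textbook feature-structure unification over nested dictionaries. It is not the algorithm presented in the thesis, whose details are not given in this abstract; it ignores reentrancy (shared substructures), and the feature names `cat`, `agr`, `num`, and `per` are hypothetical.

```python
from typing import Optional, Union

# A feature structure is either an atomic value (str) or a dict of features.
FS = Union[str, dict]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the unification of a and b, or None if they clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are left unmodified
        for feat, val in b.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None  # clash somewhere below this feature
                result[feat] = sub
            else:
                result[feat] = val
        return result
    # Atoms unify only with identical atoms; an atom never unifies with a dict.
    return a if a == b else None


noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
subject_slot = {"agr": {"num": "sg", "per": "3"}}
plural_slot = {"agr": {"num": "pl"}}

merged = unify(noun_phrase, subject_slot)
clash = unify(noun_phrase, plural_slot)
print(merged)
print(clash)
```

Copying on every call keeps the sketch simple but is exactly the kind of cost a practical parser would want to avoid.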
GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
, 1996
Cited by 40 (9 self)
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. The contamination of the input with errors of a speech recognizer can further exacerbate these problems. Most natural language parsing algorithms are designed to analyze "clean" grammatical input. Because they reject any input which is found to be ungrammatical in even the slightest way, such parsers are unsuitable for parsing spontaneous speech, where completely grammatical input is the exception more than the rule. This thesis describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, that was designed to be robust to two particular types of extra-grammaticality: noise...
Parser Evaluation: Using a Grammatical Relation Annotation Scheme
, 2003
Cited by 13 (0 self)
We describe a recently developed corpus annotation scheme for evaluating parsers that avoids some of the shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.
Putting Language Into Language Modeling
- In Proc. of Eurospeech-99
, 1999
Cited by 13 (0 self)
In this paper we describe the statistical Structured Language Model (SLM) that uses grammatical analysis of the hypothesized sentence segment (prefix) to predict the next word. We first describe the operation of a basic, completely lexicalized SLM that builds up partial parses as it proceeds left to right. We then develop a chart parsing algorithm and with its help a method to compute the prediction probabilities $P(w_{i+1} \mid W_i)$. We suggest useful computational shortcuts followed by a method of training SLM parameters from text data. Finally, we introduce more detailed parametrization that involves non-terminal labeling and considerably improves smoothing of SLM statistical parameters. We conclude by presenting certain recognition and perplexity results achieved on standard corpora.
1. INTRODUCTION
In the accepted statistical formulation of the speech recognition problem [1] the recognizer seeks to find the word string $\hat{W} := \arg\max_W P(A \mid W)\,P(W)$ where A denotes the observab...
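The decision rule in the introduction, choosing the word string W that maximizes P(A | W) P(W), can be sketched over a small, fixed list of hypotheses. Everything concrete here is an assumption for illustration: the function name `best_hypothesis`, the two candidate word strings, and the probability values are invented, and a real recognizer would search a far larger space and take P(W) from a language model such as the SLM.

```python
import math
from typing import Dict, Tuple

Hypothesis = Tuple[str, ...]


def best_hypothesis(acoustic_logp: Dict[Hypothesis, float],
                    lm_logp: Dict[Hypothesis, float]) -> Hypothesis:
    """Return arg max over W of log P(A|W) + log P(W) for the listed hypotheses."""
    return max(acoustic_logp, key=lambda w: acoustic_logp[w] + lm_logp[w])


# Invented scores: the second hypothesis fits the audio slightly better,
# but the language model finds it far less likely.
h1 = ("recognize", "speech")
h2 = ("wreck", "a", "nice", "beach")
acoustic = {h1: math.log(0.4), h2: math.log(0.5)}   # log P(A | W)
language = {h1: math.log(0.05), h2: math.log(0.001)}  # log P(W)

winner = best_hypothesis(acoustic, language)
print(" ".join(winner))
```

Working in log probabilities turns the product into a sum and avoids underflow on long word strings.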
Learning unification-based grammars using the Spoken English Corpus
- In Grammatical Inference and Applications
, 1994
Cited by 11 (8 self)
This paper describes a grammar learning system that combines model-based and data-driven learning within a single framework. Our results from learning grammars using the Spoken English Corpus (SEC) suggest that combined model-based and data-driven learning can produce a more plausible grammar than is the case when using either learning style in isolation.
Automatic Extraction of Tagset Mappings from Parallel-Annotated Corpora
, 1995
Cited by 9 (3 self)
Several research projects around the world are building grammatically analysed corpora; that is, collections of text annotated with part-of-speech wordtags and syntax trees. However, projects have used quite different wordtagging and parsing schemes. Developers of corpora adhere to a variety of competing models or theories of grammar and parsing, with the effect of restricting the accessibility of their respective corpora, and the potential for collation into a single fully parsed corpus. In view of this heterogeneity, we have begun to investigate and develop methods of automatically mapping between the annotation schemes of the most widely known corpora, thus assessing their differences and improving their reusability. Annotating a single corpus with the different schemes allows for comparisons and will provide a rich testbed for automatic parsers. Collation of all the included corpora into a single large annotated corpus will provide a more detailed language model to be developed for...
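One simple way to derive a tag mapping from a corpus annotated under two schemes is to count how often each tag of one scheme co-occurs with each tag of the other on the same token, then keep the most frequent partner. The sketch below does only that; it is an assumed baseline, not the method developed in the paper, and the tag names and the function name `extract_mapping` are placeholders.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def extract_mapping(aligned_tags: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, float]]:
    """Map each scheme-A tag to its most frequent scheme-B tag.

    Returns {tag_a: (tag_b, share)}, where share is the fraction of
    tag_a tokens that carry tag_b in the other annotation.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tag_a, tag_b in aligned_tags:
        counts[tag_a][tag_b] += 1
    mapping = {}
    for tag_a, partners in counts.items():
        tag_b, n = partners.most_common(1)[0]
        mapping[tag_a] = (tag_b, n / sum(partners.values()))
    return mapping


# Placeholder tag pairs: one (scheme A tag, scheme B tag) pair per token.
pairs = [("NN1", "NN"), ("NN1", "NN"), ("NN1", "NNP"), ("VVD", "VBD")]
mapping = extract_mapping(pairs)
print(mapping)
```

A share well below 1.0 flags tags where the two schemes draw their distinctions differently and a one-to-one mapping would lose information.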
Learning Unification-Based Natural Language Grammars
, 1994
Cited by 5 (2 self)
Practical text processing systems need wide-coverage grammars. When parsing unrestricted language, such grammars often fail to generate all of the sentences that humans would judge to be grammatical. This problem undermines successful parsing of the text and is known as undergeneration. There are two main ways of dealing with undergeneration: either by sentence correction, or by grammar correction. This thesis concentrates upon automatic grammar correction (or machine learning of grammar) as a solution to the problem of undergeneration. Broadly speaking, grammar correction approaches can be classified as being either data-driven or model-based. Data-driven learners use data-intensive methods to acquire grammar. They typically use grammar formalisms unsuited to the needs of practical text processing and cannot guarantee that the resulting grammar is adequate for subsequent semantic interpretation. That is, data-driven learners acquire grammars that generate strings that humans would jud...
Learning Unification-Based Grammars and the Treatment of Undergeneration
, 1993
Cited by 4 (3 self)
We present a framework for learning plausible unification-based natural language grammars. Our framework uses both model-based and data-driven learning without being committed to any particular configuration of these two learning schemes. We use learning to overcome the problem of undergeneration in natural language grammars. This paper presents work that is still in progress: the model-based learning component has been built but the data-driven learning component has not. Full evaluation of the framework awaits a complete implementation.
1 Introduction
1.1 Undergeneration
An application of learning natural language grammars is the treatment of undergeneration. A grammar undergenerates when it fails to generate some sentence which human informants judge to be grammatical. Undergeneration undermines the successful processing of natural language. Consider the grammar:
S → NP VP
NP → Det N1
VP → V NP
N1 → N0
NP → N1
Sam : NP
chases : V
the : Det
happy : Adj
cat : N0
This grammar ...
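The toy grammar above can be checked mechanically. The sketch below encodes its rules and lexicon and runs a small bottom-up chart recognizer with unary-rule closure. The two test sentences are chosen for this example rather than taken from the paper, and the names `BINARY`, `UNARY`, `LEXICON`, `close_unary`, and `generates` are assumptions of the sketch.

```python
from typing import Dict, Set, Tuple

# Rules and lexicon copied from the grammar in the abstract.
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("Det", "N1"): {"NP"},
    ("V", "NP"): {"VP"},
}
UNARY: Dict[str, Set[str]] = {"N0": {"N1"}, "N1": {"NP"}}
LEXICON: Dict[str, str] = {
    "Sam": "NP", "chases": "V", "the": "Det", "happy": "Adj", "cat": "N0",
}


def close_unary(cats: Set[str]) -> Set[str]:
    """Add every category reachable from cats through unary rules."""
    agenda = list(cats)
    while agenda:
        cat = agenda.pop()
        for parent in UNARY.get(cat, ()):
            if parent not in cats:
                cats.add(parent)
                agenda.append(parent)
    return cats


def generates(sentence: str) -> bool:
    """Return True if the grammar derives the sentence as an S."""
    words = sentence.split()
    n = len(words)
    if n == 0 or any(w not in LEXICON for w in words):
        return False
    chart: Dict[Tuple[int, int], Set[str]] = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = close_unary({LEXICON[word]})
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            cell: Set[str] = set()
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        cell |= BINARY.get((left, right), set())
            chart[(start, end)] = close_unary(cell)
    return "S" in chart[(0, n)]


print(generates("Sam chases the cat"))        # derivable with the rules above
print(generates("Sam chases the happy cat"))  # no rule uses Adj
```

The second sentence fails even though "happy" is in the lexicon, because no rule mentions the category Adj; this matches the definition of undergeneration given in the text.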

