Results 1-10 of 18
Part-of-Speech Tagging and Partial Parsing
Corpus-Based Methods in Language and Speech, 1996
Cited by 85 (0 self)
... m we can carve off next. 'Partial parsing' is a cover term for a range of different techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text by sacrificing completeness of analysis and accepting a low but non-zero error rate.
1 Tagging
The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on ...
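As a rough illustration of the early rule-based design described above (a small exception lexicon, character-pattern rules, and rules that consult neighboring tags), here is a minimal Python sketch. The tag names follow the Brown corpus tagset, but every lexicon entry and rule is a hypothetical toy example, not TAGGIT's actual rule set.

```python
import re

# Hypothetical exception lexicon: early taggers kept only a small lexicon,
# mostly for words the pattern rules would otherwise get wrong.
LEXICON = {
    "the": "AT", "a": "AT", "of": "IN", "in": "IN",
    "is": "BEZ", "was": "BEDZ", "he": "PPS", "news": "NN",
}

# Hypothetical character-pattern rules, tried in order; the first match wins.
PATTERN_RULES = [
    (re.compile(r"^\d+$"), "CD"),    # all digits -> cardinal number
    (re.compile(r".+ly$"), "RB"),    # -ly -> adverb
    (re.compile(r".+ing$"), "VBG"),  # -ing -> present participle
    (re.compile(r".+ed$"), "VBD"),   # -ed -> past tense
    (re.compile(r".+s$"), "NNS"),    # -s -> plural noun
]
DEFAULT_TAG = "NN"


def initial_tag(word: str) -> str:
    """Assign a tag from the exception lexicon, else from the first matching pattern."""
    lower = word.lower()
    if lower in LEXICON:
        return LEXICON[lower]
    for pattern, tag in PATTERN_RULES:
        if pattern.match(lower):
            return tag
    return DEFAULT_TAG


def apply_context_rules(tags: list) -> list:
    """Revise tags using the preceding tag (one hypothetical rule shown)."""
    revised = list(tags)
    for i in range(1, len(revised)):
        # After an article, an -ed form is more plausibly a participle ("the painted door").
        if revised[i] == "VBD" and revised[i - 1] == "AT":
            revised[i] = "VBN"
    return revised


def tag(words: list) -> list:
    """Tag a tokenized sentence: lexicon/pattern pass first, then context rules."""
    return list(zip(words, apply_context_rules([initial_tag(w) for w in words])))


print(tag("the painted door was quickly closed".split()))
```

The lexicon entry for "news" shows why such lexica existed: without it, the -s pattern would tag the word as a plural noun.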
Towards a Uniform Formal Framework for Parsing
Current Issues in Parsing Technology, 1991
Cited by 46 (3 self)
Introduction
Many of the formalisms used to define the syntax of natural (and programming) languages may be located in a continuum that ranges from propositional Horn logic to full first-order Horn logic, possibly with non-Herbrand interpretations. This structural kinship has been remarked before: it led to the development of Prolog [Col-78, Coh-88] and is analyzed in some detail in [PerW-80] for context-free languages and Horn clauses. A notable outcome is the parsing technique known as Earley deduction [PerW-83]. These formalisms play (at least) three roles: descriptive, since they give a finite and organized description of the syntactic structure of the language; analytic, since they can be used to analyze sentences so as to retrieve a syntactic structure (i.e. a representation) from which the meaning can be extracted; and generative, since they can also be used as the specification of the concrete representation of sentences from a more ...
GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
1996
Cited by 40 (9 self)
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. Errors introduced by the speech recognizer can further exacerbate these problems. Most natural language parsing algorithms are designed to analyze "clean" grammatical input. Because they reject any input that is even slightly ungrammatical, such parsers are unsuitable for parsing spontaneous speech, where completely grammatical input is the exception rather than the rule. This thesis describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, which was designed to be robust to two particular types of extra-grammaticality: noise...
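The abstract's truncated last sentence starts to list the kinds of extra-grammaticality GLR* tolerates, beginning with noise. As a rough sketch of the word-skipping idea (an assumption about what that sentence goes on to describe), the Python below searches for a grammatical subsequence of the input that skips as few words as possible. It brute-forces subsets with a plain CKY recognizer rather than building skipping into a Generalized LR parser, and the toy grammar, the `max_skip` limit and the function names are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Set, Tuple

Lexical = Dict[str, Set[str]]        # word -> preterminals A with A -> word
Binary = List[Tuple[str, str, str]]  # rules A -> B C (grammar in Chomsky normal form)


def cky_recognize(lexical: Lexical, binary: Binary, words: Sequence[str], start: str = "S") -> bool:
    """Plain CKY recognizer; a word with no lexical entry makes the input fail."""
    n = len(words)
    if n == 0:
        return False
    by_children: Dict[Tuple[str, str], Set[str]] = {}
    for a, b, c in binary:
        by_children.setdefault((b, c), set()).add(a)
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, set()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for b in chart[i][j]:
                    for c in chart[j][k]:
                        chart[i][k] |= by_children.get((b, c), set())
    return start in chart[0][n]


def parse_with_skipping(lexical: Lexical, binary: Binary, words: Sequence[str],
                        max_skip: int = 2) -> Optional[Tuple[List[str], Tuple[int, ...]]]:
    """Find a parsable subsequence skipping as few words as possible (at most max_skip).

    Returns (kept_words, skipped_positions), or None if no such subsequence exists.
    """
    n = len(words)
    for k in range(min(max_skip, n) + 1):
        for skipped in combinations(range(n), k):
            kept = [w for i, w in enumerate(words) if i not in skipped]
            if cky_recognize(lexical, binary, kept):
                return kept, skipped
    return None


# Hypothetical toy grammar for illustration.
LEXICAL: Lexical = {"i": {"PRON"}, "want": {"V"}, "a": {"DET"}, "flight": {"N"}}
BINARY: Binary = [("S", "PRON", "VP"), ("VP", "V", "NP"), ("NP", "DET", "N")]

print(parse_with_skipping(LEXICAL, BINARY, "i want uh a flight".split()))
# -> (['i', 'want', 'a', 'flight'], (2,))
```

The filler "uh" has no lexical entry, so the full input fails and the first one-word skip that parses drops it. The enumeration grows combinatorially with `max_skip`, which is why a practical robust parser needs to integrate skipping into the parse instead.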
Generalized Left-Corner Parsing
In Sixth Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference, 1993
Cited by 22 (6 self)
We show how techniques known from generalized LR parsing can be applied to left-corner parsing. The resulting parsing algorithm for context-free grammars has some advantages over generalized LR parsing: the sizes and generation times of the parsers are smaller, the produced output is more compact, and the basic parsing technique can more easily be adapted to arbitrary context-free grammars.
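For readers unfamiliar with left-corner parsing, the Python below is a minimal backtracking left-corner recognizer: it recognizes the first symbol of a rule bottom-up (the left corner), predicts the rest of the rule top-down, and prunes with a precomputed left-corner link table. It is not the paper's generalized algorithm, which borrows techniques from generalized LR parsing; this sketch assumes no empty rules and no cyclic unit rules, and the grammar and function names are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]  # nonterminal -> right-hand sides; other symbols are terminals


def left_corner_links(grammar: Grammar) -> Dict[str, Set[str]]:
    """Reflexive-transitive closure of the left-corner relation, per nonterminal."""
    links = {a: {a} for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, rules in grammar.items():
            for rhs in rules:
                new = {rhs[0]} | links.get(rhs[0], set())
                if not new <= links[a]:
                    links[a] |= new
                    changed = True
    return links


def recognize(grammar: Grammar, start: str, words: List[str]) -> bool:
    """Backtracking left-corner recognizer (assumes no empty rules and no unit-rule cycles)."""
    links = left_corner_links(grammar)
    n = len(words)

    def parse(goal: str, i: int) -> Iterator[int]:
        # Bottom-up step: shift the next word, then climb from it toward `goal`.
        if i < n:
            yield from complete(words[i], goal, i + 1)

    def complete(corner: str, goal: str, j: int) -> Iterator[int]:
        # `corner` has been recognized up to position j.
        if corner == goal:
            yield j
        for lhs, rules in grammar.items():
            if lhs not in links.get(goal, set()):
                continue  # lhs can never lead up to goal: prune with the link table
            for rhs in rules:
                if rhs[0] == corner:
                    # Top-down step: predict and recognize the rest of the rule.
                    for k in parse_seq(rhs[1:], j):
                        yield from complete(lhs, goal, k)

    def parse_seq(symbols: Tuple[str, ...], i: int) -> Iterator[int]:
        if not symbols:
            yield i
            return
        for j in parse(symbols[0], i):
            yield from parse_seq(symbols[1:], j)

    return any(j == n for j in parse(start, 0))


GRAMMAR: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("NP", "PP")],  # left-recursive rule
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",)],
    "N": [("dog",), ("man",), ("park",)],
    "V": [("saw",)],
    "P": [("in",)],
}

print(recognize(GRAMMAR, "S", "the dog saw the man in the park".split()))  # True
```

Because recognizing a rule's remaining symbols always starts by shifting a word, the left-recursive rules `NP -> NP PP` and `VP -> VP PP` do not loop forever as they would in a naive top-down recognizer. Backtracking can still take exponential time on highly ambiguous input, which tabular and generalized variants are designed to avoid.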
The intersection of Finite State Automata and Definite Clause Grammars
1995
Cited by 19 (6 self)
Bernard Lang defines parsing as the calculation of the intersection of an FSA (the input) and a CFG. Viewing the input for parsing as an FSA rather than as a string combines well with some approaches in speech understanding systems, in which parsing takes a word lattice as input (rather than a word string). Furthermore, certain techniques for robust parsing can be modelled as finite-state transducers.
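Lang's view can be sketched with a CKY recognizer whose chart is indexed by lattice states instead of string positions: an entry A in `chart[(p, q)]` corresponds to the item (p, A, q), a nonterminal of the intersection of the lattice and the grammar. This minimal Python sketch assumes a plain context-free grammar in Chomsky normal form and an acyclic lattice with topologically numbered states; it does not handle the Definite Clause Grammars the paper is about, and the names and toy lattice are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, str, int]  # (from_state, word, to_state) in an acyclic word lattice


def lattice_cky(lexical: Dict[str, Set[str]], binary: List[Tuple[str, str, str]],
                edges: List[Edge], num_states: int) -> Dict[Tuple[int, int], Set[str]]:
    """CKY over a lattice whose states are numbered 0..num_states-1 so every edge has p < q.

    chart[(p, q)] holds every nonterminal A that derives the words on
    some path from state p to state q, i.e. the item (p, A, q).
    """
    by_children: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for a, b, c in binary:                  # rules A -> B C
        by_children[(b, c)].add(a)
    chart: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for p, word, q in edges:                # rules A -> word, applied to lattice edges
        chart[(p, q)] |= lexical.get(word, set())
    for width in range(2, num_states):      # shorter state spans first
        for p in range(num_states - width):
            q = p + width
            for r in range(p + 1, q):       # any intermediate state
                for b, c in product(chart[(p, r)], chart[(r, q)]):
                    chart[(p, q)] |= by_children.get((b, c), set())
    return chart


# Hypothetical recognizer lattice: "dog"/"dock" and "barks"/"parks" are competing hypotheses.
LEXICAL = {"the": {"Det"}, "dog": {"N"}, "dock": {"N"}, "barks": {"VP"}}
BINARY = [("S", "NP", "VP"), ("NP", "Det", "N")]
EDGES = [(0, "the", 1), (1, "dog", 2), (1, "dock", 2), (2, "barks", 3), (2, "parks", 3)]

chart = lattice_cky(LEXICAL, BINARY, EDGES, num_states=4)
print("S" in chart[(0, 3)])  # True: some path through the lattice is a sentence
```

Edges whose words the grammar cannot use ("parks" here) simply contribute nothing to the intersection. Recovering which path was accepted would require storing back-pointers alongside each item.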
Yet Another Chart-Based Technique for Parsing Ill-Formed Input
The Fourth Conference on Applied Natural Language Processing, 1991
Cited by 12 (0 self)
A new chart-based technique for parsing ill-formed input is proposed. It can process sentences with unknown or misspelled words, omitted words, or extraneous words. Like Mellish's strategy, this generalized parsing strategy is based on an active chart parser and shares the many advantages of Mellish's technique. It relies on purely syntactic knowledge, is independent of any particular grammar, and does not slow down the original parsing operation when the input is well-formed.
The information conveyed by words in sentences
Journal of Psycholinguistic Research, 2003
Cited by 7 (1 self)
A method is presented for calculating the amount of information conveyed to a hearer by a speaker emitting a sentence generated by a probabilistic grammar known to both parties. The method applies the work of Grenander (1967) to the intermediate states of a top-down parser. This allows the uncertainty about structural ambiguity to be calculated at each point in a sentence. Subtracting these values at successive points gives the information conveyed by a word in a sentence. Word-by-word information conveyed is calculated for several small probabilistic grammars, and it is suggested that the number of bits conveyed per word is a determinant of reading times and other measures of cognitive load. KEY WORDS: computational psycholinguistics; entropy reduction.
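The quantity described above can be illustrated by brute force. The paper derives these entropies from the grammar using Grenander's result; the Python sketch below instead assumes a small, hypothetical, explicitly listed set of derivations with probabilities (only possible for a finite language), conditions that distribution on each prefix, and subtracts the entropies at successive positions.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A derivation is (structure label, yield). Brute force needs a finite language.
Derivation = Tuple[str, Tuple[str, ...]]


def prefix_entropy(derivations: Dict[Derivation, float], prefix: Sequence[str]) -> float:
    """Entropy in bits of the derivation distribution, conditioned on the sentence prefix."""
    k = len(prefix)
    consistent = [p for (_, words), p in derivations.items() if words[:k] == tuple(prefix)]
    total = sum(consistent)
    if total == 0:
        raise ValueError("prefix has probability zero under this distribution")
    return -sum((p / total) * math.log2(p / total) for p in consistent if p > 0)


def word_information(derivations: Dict[Derivation, float], sentence: Sequence[str]) -> List[float]:
    """Entropy before word i minus entropy after it, for each word of the sentence."""
    entropies = [prefix_entropy(derivations, sentence[:i]) for i in range(len(sentence) + 1)]
    return [entropies[i - 1] - entropies[i] for i in range(1, len(entropies))]


# Hypothetical toy distribution: "the cat sleeps" has two equally likely structures.
DERIVATIONS: Dict[Derivation, float] = {
    ("d1", ("the", "dog", "barks")): 0.25,
    ("d2", ("the", "dog", "sleeps")): 0.25,
    ("d3", ("the", "cat", "sleeps")): 0.25,
    ("d4", ("the", "cat", "sleeps")): 0.25,
}

print(word_information(DERIVATIONS, ("the", "cat", "sleeps")))  # [0.0, 1.0, 0.0]
```

In the toy data, "cat" removes one bit of uncertainty while "sleeps" conveys none, because the two analyses of "the cat sleeps" remain tied after the last word.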
Prefix Probabilities from Stochastic Tree Adjoining Grammars
In 36th Annual Meeting of the ACL, Proceedings of the Conference, 1998
Cited by 5 (3 self)
Language models for speech recognition typically use a probability model of the form $\Pr(a_n \mid a_1, a_2, \ldots, a_{n-1})$. Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability $\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)$, where $w$ represents all possible terminations of the prefix $a_1 \cdots a_n$. The main result in this paper is an algorithm to compute such prefix probabilities given a stochastic Tree Adjoining Grammar (TAG). The algorithm achieves the required computation in $O(n^6)$ time. The probabilities of subderivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language modelling.
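The language-model form at the start of this abstract follows from prefix probabilities as a ratio of successive prefixes. Writing $\Sigma$ for the terminal alphabet (notation assumed here) and assuming the grammar is consistent, so that its string probabilities sum to one, the relation is:

```latex
\Pr(a_n \mid a_1, \ldots, a_{n-1})
  = \frac{\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_n w)}
         {\sum_{w \in \Sigma^*} \Pr(a_1 \cdots a_{n-1} w)}
```

The denominator is the prefix probability already computed at the previous position, so a left-to-right recognizer needs one new prefix probability per word.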
Combining Labeled and Unlabeled Data in Statistical Natural Language Parsing
2002
Cited by 2 (1 self)
Prof. Aravind Joshi, my dissertation advisor, has been my guide and mentor for the entire time that I spent at Penn. I thank him for all his academic help and personal kindness. The external member of my dissertation committee was Steven Abney, whose suggestions and advice have made the ideas presented here stronger. The members of my dissertation committee from Penn, Mitch Marcus, Mark Liberman and Martha Palmer, provided questions whose answers shaped my dissertation proposal into the finished form in front of you. Many thanks to my academic collaborators: the work on prefix probabilities was done with Mark-Jan Nederhof and Giorgio Satta when they visited IRCS in 1998, and the work on subcategorization frame learning was done in collaboration with Daniel Zeman when he visited IRCS in 2000. Thanks to B. Srinivas, whose previous work provided the path to the experimental work in this dissertation. Thanks also to Paola Merlo and Suzanne Stevenson for discussions on their work on verb alternation classes. I also acknowledge the help of Woottiporn Tripasai in the extension of their work presented in this dissertation. Thanks to ...
Robust Parsing Using Dynamic Programming
2003
Cited by 2 (0 self)
A robust parser for context-free grammars, based on a dynamic programming architecture, is described. We integrate a regional error repair algorithm and a strategy to deal with incomplete sentences, including unknown parts of unknown length. Experimental tests prove the validity of the approach, illustrating its prospects for application in real systems across a variety of situations, as well as the causes underlying the observed computational behavior.

