An Efficient Augmented-Context-Free Parsing Algorithm
Computational Linguistics, 1987
Cited by 65 (3 self)
This paper introduces an efficient on-line parsing algorithm, and focuses on its practical application to natural language interfaces. The algorithm can be viewed as a generalized LR parsing algorithm that can handle arbitrary context-free grammars, including ambiguous grammars. Section 2 describes the algorithm by extending the standard LR parsing algorithm with the idea of a "graph-structured stack". Section 3 describes how to represent parse trees efficiently, so that all possible parse trees (the parse forest) take at most polynomial space as the ambiguity of a sentence grows exponentially. In section 4, several examples are given. Section 5 presents several empirical results of the algorithm's practical performance, including comparison with Earley's algorithm. In section 6, we discuss how to enhance the algorithm to handle augmented context-free grammars rather than pure context-free grammars. Section 7 describes the concept of on-line parsing, taking advantage of left-to-right operation of our parsing algorithm. The on-line parser parses a sentence strictly from left to right, and starts parsing as soon as the user types in the first word, without waiting for the end of line. Benefits of on-line parsing are then discussed. Finally, several versions of the on-line parser have been implemented, and they are mentioned in section 8.
GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language
1996
Cited by 40 (9 self)
The analysis of spoken language is widely considered to be a more challenging task than the analysis of written text. All of the difficulties of written language can generally be found in spoken language as well. Parsing spontaneous speech must, however, also deal with problems such as speech disfluencies, the looser notion of grammaticality, and the lack of clearly marked sentence boundaries. The contamination of the input with errors of a speech recognizer can further exacerbate these problems. Most natural language parsing algorithms are designed to analyze "clean" grammatical input. Because they reject any input which is found to be ungrammatical in even the slightest way, such parsers are unsuitable for parsing spontaneous speech, where completely grammatical input is the exception more than the rule. This thesis describes GLR*, a parsing system based on Tomita's Generalized LR parsing algorithm, that was designed to be robust to two particular types of extra-grammaticality: noise...
Parsing Incomplete Sentences
1988
Cited by 24 (2 self)
An efficient context-free parsing algorithm is presented that can parse sentences with unknown parts of unknown length. It produces in finite form all possible parses (often infinite in number) that could account for the missing parts. The algorithm is a variation on the construction due to Earley. However, its presentation is such that it can readily be adapted to any chart parsing schema (top-down, bottom-up, etc.).
Extensions to Constraint Dependency Parsing for Spoken Language Processing
COMPUTER SPEECH AND LANGUAGE, 1995
Cited by 21 (10 self)
A text-based and spoken language processing framework based on the Constraint Dependency Grammar (CDG) developed by Maruyama [24, 25] is discussed. The scope of CDG is expanded to allow for the analysis of sentences containing lexically ambiguous words, to allow feature analysis in constraints, and to efficiently process multiple sentence candidates that are likely to arise in spoken language processing. The benefits of the CDG parsing approach are summarized. Additionally, the development of CDG grammars using our grammar tools and parser is discussed.
Degraded Text Recognition Using Visual And Linguistic Context
1995
Cited by 20 (2 self)
Recognition of degraded text is a challenging problem. To improve the performance of an OCR system on degraded images of text, postprocessing techniques are critical. The objective of postprocessing is to correct errors or to resolve ambiguities in OCR results by using contextual information. Depending on the extent of context used, there are different levels of postprocessing. In current commercial OCR systems, word-level postprocessing methods, such as dictionary-lookup, have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, in which global contextual information is utilized, is necessary. In most current studies on passage-level postprocessing, linguistic context is the major resource to be exploited. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessin...
Text recognition enhancement with a probabilistic lattice chart parser
In Proceedings of the Second International Conference on Document Analysis and Recognition (ICDAR-93), 1993
Cited by 8 (4 self)
A probabilistic lattice chart parser is proposed for improving the performance of a text recognition technique. Digital images of words are recognized and alternatives for the identity of each are generated. Local word collocation statistics and a probabilistic chart parsing algorithm are used to determine the top N best parses for each sentence using the alternatives provided for the identity of each word by the recognition system. In this paper, an approach in which text recognition and understanding are tightly integrated is discussed. An objective of this approach is to provide the capability to process images of unrestricted English text. A large-scale lexicon, which supports the system, was acquired by training on corpora of over three million words. This paper focuses on the implementation and performance of the probabilistic lattice chart parser. Topic areas: visual text recognition and understanding, natural language parsing and word lattice parsing.
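As a rough illustration of ranking word alternatives with collocation statistics, the sketch below scores every path through a tiny word lattice and keeps the N best. It is not the paper's chart parser: it uses only adjacent-word collocation scores, enumerates paths exhaustively, and all words, scores, and names (`lattice`, `collocation`, `FLOOR`, `n_best`) are invented for the example.

```python
import heapq
from itertools import product

# Hypothetical recognizer output: alternatives and confidences per word image.
lattice = [
    {"please": 0.6, "fleece": 0.4},
    {"fill": 0.5, "fin": 0.5},
    {"in": 0.7, "ill": 0.3},
]

# Hypothetical collocation scores for adjacent word pairs.
collocation = {("please", "fill"): 0.9, ("fill", "in"): 0.8, ("fleece", "fin"): 0.1}
FLOOR = 0.05  # assumed score for word pairs never seen together


def sequence_score(words):
    """Product of recognizer confidences and adjacent-pair collocation scores."""
    score = 1.0
    for position, word in enumerate(words):
        score *= lattice[position][word]
    for pair in zip(words, words[1:]):
        score *= collocation.get(pair, FLOOR)
    return score


def n_best(n):
    """Return the n highest-scoring word sequences through the lattice."""
    paths = product(*(slot.keys() for slot in lattice))
    return heapq.nlargest(n, paths, key=sequence_score)


top = n_best(2)
```

Exhaustive enumeration grows exponentially with sentence length, so it only suits toy lattices; a chart-based or dynamic-programming search would be the natural replacement for longer input.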
PARSEC: A Constraint-based Framework for Spoken Language Understanding
In Proceedings of the International Conference on Spoken Language Processing, 1992
Cited by 6 (6 self)
We have extended Maruyama's [5, 6, 7] constraint dependency grammar (CDG) to process a lattice or graph of sentence hypotheses instead of separate text strings. A post-processor to a speech recognizer producing N-best hypotheses generates the word graph representation, which is then augmented with information required for parsing. We will summarize the CDG parsing algorithm and then describe how the algorithm is extended to process a word graph on a single processor machine.
1 Introduction
The most successful of the current speech recognition systems which process continuous speech for a limited (1000 word) vocabulary are those which utilize hidden Markov models (HMM). Most systems utilizing this approach (e.g., [4, 10]) have reduced recognition errors by incorporating some language information (syntactic and semantic) directly into the HMM to reduce perplexity, but since the goal of these systems is recognition, not understanding, no structural analysis of the utterance is construc...
Integration Of Visual Inter-word Constraints And Linguistic Knowledge In Degraded Text Recognition
In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994
Cited by 5 (4 self)
Degraded text recognition is a difficult task. Given a noisy text image, a word recognizer can be applied to generate several candidates for each word image. High-level knowledge sources can then be used to select a decision from the candidate set for each word image. In this paper, we propose that visual inter-word constraints can be used to facilitate candidate selection. Visual inter-word constraints provide a way to link word images inside the text page, and to interpret them systematically.
Word candidates and scores for word images 1 to 4 (figure text recovered from the page):
1              2            3           4
Please 0.90    fin  0.33    in  0.30    tire 0.80
Fleece 0.05    fill 0.30    In  0.28    toe  0.10
Pierce 0.02    flu  0.21    lo  0.25    lire 0.05
Fierce 0.02    flit 0.10    ill 0.13    the  0.03
Pieces 0.01    till 0.06    Io  0.04    Ike  0.02
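The candidate scores printed with this abstract show why top-choice selection alone is not enough. The sketch below takes the best-scoring candidate per word image and then checks how far down each list a plausible reading sits. The reading "Please fill in the" is an assumption made for illustration, the two garbled scores are read as 0.05 and 0.10 (the values that make each column sum to 1.00), and the function names are hypothetical.

```python
# Candidates and recognizer scores for four word images, as printed with the
# abstract (two garbled scores are read as 0.05 and 0.10).
candidates = [
    {"Please": 0.90, "Fleece": 0.05, "Pierce": 0.02, "Fierce": 0.02, "Pieces": 0.01},
    {"fin": 0.33, "fill": 0.30, "flu": 0.21, "flit": 0.10, "till": 0.06},
    {"in": 0.30, "In": 0.28, "lo": 0.25, "ill": 0.13, "Io": 0.04},
    {"tire": 0.80, "toe": 0.10, "lire": 0.05, "the": 0.03, "Ike": 0.02},
]


def top_choice(slots):
    """Pick the highest-scoring candidate for each word image independently."""
    return [max(slot, key=slot.get) for slot in slots]


def rank_of(slot, word):
    """1-based rank of a word within one candidate set, best score first."""
    ordered = sorted(slot, key=slot.get, reverse=True)
    return ordered.index(word) + 1


baseline = top_choice(candidates)

# Assumed intended reading; not stated in the abstract.
intended = ["Please", "fill", "in", "the"]
ranks = [rank_of(slot, word) for slot, word in zip(candidates, intended)]
```

Under that assumed reading, two of the four words are not the recognizer's first choice, which is the gap that higher-level knowledge sources or inter-word constraints are meant to close.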
Parsing N Best Trees from a Word Lattice
In Advances in Artificial Intelligence. Proceedings of KI-97, number 1303 in LNAI, 1997
Cited by 5 (3 self)
This article describes a probabilistic context free grammar approximation method for unification grammars. In order to produce good results, the method is combined with an N best parsing extension to chart parsing. The first part of the paper introduces the grammar approximation method, while the second part describes details of an efficient N-best packing and unpacking scheme for chart parsing.
1 Introduction
Recently much attention has been paid to the integration of speech and language technology. The concentration on spontaneous speech understanding led to the definition of a robust interface known as the word graph or word lattice between recognition and understanding. Depending on the application, systems are built to provide a shallow stochastic analysis or a deep linguistic analysis of the word lattice. Using a shallow stochastic approach, a rough template-based analysis can be achieved which makes sense in those cases where a fine grained reconstruction of meanings is...
Analyzing And Improving Statistical Language Models For Speech Recognition
1994
Cited by 3 (0 self)
A speech recognizer is a device that translates speech into text. Many current speech recognizers contain two components, an acoustic model and a statistical language model. The acoustic model indicates how likely it is that a certain word corresponds to a part of the acoustic signal (e.g. the speech). The statistical language model indicates how likely it is that a certain word will be spoken next, given the words recognized so far. Even though the acoustic model might for example not be able to decide between the acoustically similar words "peach" and "teach", the statistical language model can indicate that the word "peach" is more likely if the previously recognized words are "He ate the". Current speech recognizers perform well on constrained tasks, but the goal of continuous, speaker independent speech recognition in potentially noisy environments with a very large vocabulary has not been reached so far. How can statistical language models be improved so that more complex tasks c...
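The "peach" versus "teach" example can be made concrete with a minimal sketch of a language model breaking an acoustic tie. Everything here is an assumption for illustration: the toy corpus, the equal acoustic scores, and the names `lm_prob` and `pick_word` are invented, and the model is a bigram with add-one smoothing, conditioning only on the previous word rather than on all words recognized so far.

```python
from collections import Counter

# Toy training corpus (hypothetical, for illustration only).
corpus = [
    "he ate the peach",
    "she ate the peach",
    "he ate the apple",
    "they teach the class",
]

bigrams = Counter()   # counts of (previous word, word) pairs
contexts = Counter()  # counts of each word used as a previous word
vocab = set()
for sentence in corpus:
    words = sentence.split()
    vocab.update(words)
    for prev, word in zip(words, words[1:]):
        bigrams[(prev, word)] += 1
        contexts[prev] += 1


def lm_prob(prev, word):
    """Add-one smoothed bigram probability P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def pick_word(prev, acoustic_scores):
    """Choose the candidate maximizing acoustic score times LM probability."""
    return max(acoustic_scores, key=lambda w: acoustic_scores[w] * lm_prob(prev, w))


# Assumed: the acoustic model cannot separate the two candidates.
acoustic = {"peach": 0.5, "teach": 0.5}
best = pick_word("the", acoustic)
```

With equal acoustic scores, the decision rests entirely on the language model, which prefers "peach" after "the" because that pair occurs in the toy corpus and "the teach" does not.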

