Results 1 - 9 of 9
Partial parsing via finite-state cascades - Natural Language Engineering, 1996
Cited by 261 (4 self)
Finite-state cascades represent an attractive architecture for parsing unrestricted text. Deterministic parsers specified by finite-state cascades are fast and reliable. They can be extended at modest cost to construct parse trees with finite feature structures. Finally, such deterministic parsers do not necessarily involve trading off accuracy against speed: they may in fact be more accurate than exhaustive-search stochastic context-free parsers.

1 Finite-State Cascades

Of current interest in corpus-oriented computational linguistics are techniques for bootstrapping broad-coverage parsers from text corpora. The work described here is a step along the way toward a bootstrapping scheme that involves inducing a tagger from word distributions, a low-level “chunk” parser from a tagged corpus, and lexical dependencies from a chunked corpus. In particular, I describe a chunk parsing technique based on what I will call a finite-state cascade. Though I shall not address the question of inducing such a parser from a corpus, the parsing technique has been implemented and is being used in a project for inducing lexical dependencies from corpora in English and German. The resulting parsers are robust and very fast.

A finite-state cascade consists of a sequence of levels. Phrases at one level are built on phrases at the previous level, and there is no recursion: phrases never contain same-level or higher-level phrases. Two levels of special importance are the level of chunks and the level of simplex clauses [2, 1]. Chunks are the non-recursive cores of “major” phrases, i.e., NP, VP, PP, AP, AdvP. Simplex clauses are clauses in which embedded clauses have been turned into siblings; tail recursion has been replaced with iteration, so to speak. To illustrate, (1) shows a parse tree represented as a sequence of levels.
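The level-by-level organization of a finite-state cascade can be illustrated with a minimal Python sketch. The three-level grammar (NP/VP chunks, then PP, then simplex clause S), the Penn-style tags, and the names `apply_level`, `cascade`, and `bracket` are hypothetical, chosen for illustration; the patterns are not taken from the paper. Each level is a list of regular expressions over the category labels produced by the level below, applied deterministically left to right with longest match, so no phrase ever contains a phrase of its own or a higher level.

```python
import re
from typing import List, Tuple

# A leaf is (tag, word); a phrase is (label, [child nodes]).
Node = Tuple[str, object]
Level = List[Tuple[str, re.Pattern]]

# Hypothetical three-level grammar; each level refers only to categories
# produced by lower levels, so there is no recursion.
LEVELS: List[Level] = [
    # Level 1: chunks over part-of-speech tags
    [("NP", re.compile(r"(DT )?(JJ )*NNS?")),
     ("VP", re.compile(r"(MD )?(VB|VBD|VBZ)"))],
    # Level 2: prepositional phrases over level-1 chunks
    [("PP", re.compile(r"IN NP"))],
    # Level 3: simplex clauses
    [("S", re.compile(r"NP VP( NP)?( PP)*"))],
]


def categories(nodes: List[Node]) -> str:
    """Space-joined category labels: the string a level's patterns match."""
    return " ".join(label for label, _ in nodes)


def apply_level(nodes: List[Node], patterns: Level) -> List[Node]:
    """One deterministic left-to-right, longest-match pass.
    A matched span becomes one phrase node; other nodes pass through."""
    out: List[Node] = []
    i = 0
    while i < len(nodes):
        best_label, best_end = None, i
        for label, regex in patterns:
            # Try the longest span first; only spans longer than the
            # current best are considered.
            for j in range(len(nodes), best_end, -1):
                if regex.fullmatch(categories(nodes[i:j])):
                    best_label, best_end = label, j
                    break
        if best_label is None:
            out.append(nodes[i])
            i += 1
        else:
            out.append((best_label, nodes[i:best_end]))
            i = best_end
    return out


def cascade(tagged: List[Node], levels: List[Level]) -> List[Node]:
    """Run the levels in sequence; each consumes the previous level's output."""
    nodes = list(tagged)
    for patterns in levels:
        nodes = apply_level(nodes, patterns)
    return nodes


def bracket(node: Node) -> str:
    """Render a node as a labelled bracketing."""
    label, body = node
    if isinstance(body, str):
        return body
    return "[" + label + " " + " ".join(bracket(c) for c in body) + "]"


SENTENCE: List[Node] = [
    ("DT", "the"), ("NN", "dog"), ("VBD", "chased"),
    ("DT", "a"), ("JJ", "small"), ("NN", "cat"),
    ("IN", "in"), ("DT", "the"), ("NN", "garden"),
]

for node in cascade(SENTENCE, LEVELS):
    print(bracket(node))
# [S [NP the dog] [VP chased] [NP a small cat] [PP in [NP the garden]]]
```

Nodes that no pattern matches, such as a preposition without a following NP, simply pass through to the next level unchanged rather than causing the parse to fail.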
Part-of-Speech Tagging and Partial Parsing - Corpus-Based Methods in Language and Speech, 1996
Cited by 85 (0 self)
...m we can carve off next. 'Partial parsing' is a cover term for a range of different techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text by sacrificing completeness of analysis and accepting a low but non-zero error rate.

1 Tagging

The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
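To make the description of the earliest rule-based taggers concrete, here is a hypothetical Python sketch in that style: a small exception lexicon, ordered rules keyed on a word's character patterns (suffixes, digits), and a second pass that revises tags using the tags of neighboring words. The lexicon entries, pattern rules, and context rules are invented for illustration and are not TAGGIT's or any other tagger's actual rules.

```python
import re
from typing import List

# Hypothetical exception lexicon: small, as in the early rule-based taggers.
LEXICON = {"the": "DT", "a": "DT", "of": "IN", "in": "IN",
           "is": "VBZ", "can": "MD", "will": "MD"}

# Hypothetical character-pattern rules, tried in order; first match wins.
PATTERN_RULES = [
    (re.compile(r"^\d+([.,]\d+)?$"), "CD"),
    (re.compile(r"ly$"), "RB"),
    (re.compile(r"ing$"), "VBG"),
    (re.compile(r"ed$"), "VBD"),
    (re.compile(r"s$"), "NNS"),
]
DEFAULT_TAG = "NN"


def initial_tag(word: str) -> str:
    """Lexicon lookup first, then character-pattern rules, then a default."""
    w = word.lower()
    if w in LEXICON:
        return LEXICON[w]
    for pattern, rule_tag in PATTERN_RULES:
        if pattern.search(w):
            return rule_tag
    return DEFAULT_TAG


def tag(words: List[str]) -> List[str]:
    """Assign initial tags, then revise them using neighboring tags."""
    tags = [initial_tag(w) for w in words]
    for i in range(len(tags)):
        prev = tags[i - 1] if i > 0 else None
        nxt = tags[i + 1] if i + 1 < len(tags) else None
        # Hypothetical rule: a noun guess right after a modal is a base verb.
        if prev == "MD" and tags[i] == "NN":
            tags[i] = "VB"
        # Hypothetical rule: an -ed/-ing word between a determiner and a
        # noun is used attributively, so retag it as an adjective.
        elif prev == "DT" and tags[i] in ("VBD", "VBG") and nxt in ("NN", "NNS"):
            tags[i] = "JJ"
    return tags


print(tag(["The", "tired", "dog", "can", "run", "quickly"]))
# ['DT', 'JJ', 'NN', 'MD', 'VB', 'RB']
```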
Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax - In Proceedings of the 35th Annual Meeting of the ACL, 1997
Cited by 33 (7 self)
A system for the automatic production of controlled index terms is presented using linguistically motivated techniques. This includes a finite-state part-of-speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.
NLP for Term Variant Extraction: Synergy between Morphology, Lexicon, and Syntax, 1999
Cited by 22 (1 self)
We present a natural language processing (NLP) approach to automatic indexing over a controlled vocabulary that accounts for term variation. The approach combines a part-of-speech tagger, a generator of morphologically related forms, and a shallow transformational parser. The system is applied to the French language; it is trained on newspaper articles and tested on scientific literature. The precision of indexing on terms and variants is 97.2%, only slightly lower than that of indexing without accounting for term variation (99.7%). The recall of indexing on terms and variants (93.4%) is much higher than the recall of indexing on term occurrences only (72.4%). Conflation of term variants increases indexing coverage by up to 30%. The system is a convincing example of the potential synergy between full-fledged morphological analysis and local syntactic analysis. Many details are provided on the implementation of the system. Illustrative examples of syntactic transformations for the French language are given together with the theoretical and empirical methods for their formulation.
Christian Jacquemin and Evelyne Tzoukermann
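The kinds of term variation such a system conflates can be illustrated with a much cruder Python sketch, assuming a hand-written table of derivational families (`FAMILY`) in place of the morphological generator and fixed-window patterns in place of the shallow transformational parser. The tables, the variant labels, the function `find_variants`, and the English example (the system described above targets French) are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical derivational families: every form maps to a shared key.
FAMILY = {
    "information": "inform", "informational": "inform", "inform": "inform",
    "retrieval": "retriev", "retrieve": "retriev", "retrieving": "retriev",
}
# Hypothetical prepositions licensing a permuted "B of A" variant.
PREPOSITIONS = {"of", "for"}


def family(word: str) -> str:
    """Map a word to its derivational family key (itself if unknown)."""
    w = word.lower()
    return FAMILY.get(w, w)


def find_variants(term: Tuple[str, str], tokens: List[str],
                  max_insert: int = 2) -> List[Tuple[int, int, str]]:
    """Return (start, end, kind) spans where the two-word term A B occurs as:
    'exact'        A B with the original word forms,
    'morph'        A B with other members of the same families,
    'insertion'    A ... B with 1..max_insert words in between,
    'permutation'  B <preposition> A."""
    a, b = family(term[0]), family(term[1])
    hits = []
    for i, tok in enumerate(tokens):
        if family(tok) == a:
            for gap in range(max_insert + 1):
                j = i + 1 + gap
                if j < len(tokens) and family(tokens[j]) == b:
                    if gap > 0:
                        kind = "insertion"
                    elif (tok.lower(), tokens[j].lower()) == (term[0].lower(), term[1].lower()):
                        kind = "exact"
                    else:
                        kind = "morph"
                    hits.append((i, j + 1, kind))
                    break
        elif (family(tok) == b and i + 2 < len(tokens)
              and tokens[i + 1].lower() in PREPOSITIONS
              and family(tokens[i + 2]) == a):
            hits.append((i, i + 3, "permutation"))
    return hits


text = ("systems for information retrieval and retrieval of information "
        "or information and document retrieval")
print(find_variants(("information", "retrieval"), text.split()))
# [(2, 4, 'exact'), (5, 8, 'permutation'), (9, 13, 'insertion')]
```

A fixed window this crude also counts the coordination "information and document retrieval" as an insertion; telling such cases apart requires a syntactic component.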
Understanding Natural Language Descriptions of Physical Phenomena, 2004
Cited by 4 (1 self)
The fact that human readers can learn about the physical world from textual descriptions leads to a number of interesting questions about the connections between our conceptual understanding of the physical world and how it is reflected in natural language. This thesis investigates some forms in which information about physical phenomena is typically expressed in natural language and how this knowledge can be used to construct models of the underlying physical processes. Based on an analysis of the representations of physical quantities in natural language and of common, recurring syntactic patterns, we implemented a system that uses Qualitative Process (QP) Theory to guide the semantic interpretation process to capture information about physical phenomena found in natural language text. We have recast QP Theory in terms of frame semantics as FrameNet-compatible representations (QP frames) and use an extendable, controlled subset of English to capture QP-specific information from natural language descriptions. In addition to general background knowledge based on a subset of the Cyc knowledge base and the lexical information supplied by a syntactic parser, the semantics of QP Theory are used in rules that guide the semantic interpretation process and the construction of QP frames. The thesis illustrates that QP Theory, as an established theoretical framework for handling continuous parameters and causation, can provide an essential component of
Retrieval from captioned image databases using natural language processing - In Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM-00), 2000
Cited by 2 (0 self)
At first sight, it might appear that natural language processing should improve the accuracy of information retrieval systems by making available a more detailed analysis of queries and documents. Although past results appear to show that this is not so, if the focus is shifted to short phrases rather than full documents, the situation becomes somewhat different. The ANVIL system uses a natural language technique to obtain high-accuracy retrieval of images that have been annotated with a descriptive textual caption. The natural language techniques also allow additional contextual information to be derived from the relation between the query and the caption, which can help users to understand the overall collection of retrieval results. The techniques have been successfully used in an information retrieval system that forms both a testbed for research and the basis of a commercial system.
Automatic Identification of Causal Relations in Text and Their Use for Improving Precision in Information Retrieval
Cited by 2 (0 self)
This is a reformatted version of the original dissertation, and
Decision Tree-Based Noun Phrase Detection and Classification in Agglutinative Languages, 1999
The current paradigm in parsing has been developed primarily using English, a language that relies on word order to express grammatical function. However, most languages in the world rely much more on NP-marking to express the same functions. We therefore propose a shallow NP parsing technique that makes much more use of NP-marking, and evaluate the technique on Korean, an agglutinating language.

1 Introduction

In this paper, we take a shallow parsing approach to identifying noun phrases and their grammatical relationship to the verb. Unlike more conventional parsers, our techniques rely on local analysis and the use of decision trees which are trained on syntactically annotated corpora.

1.1 Coding Grammatical Function

One of the most important reasons for performing parsing is to determine the functional relationships that exist between constituents in a sentence. Determining these relationships is necessary for many, if not most, applications of parsing, including Machine Tr...
A Natural Language Approach to Multi-Word Term Conflation - Proceedings of the DELOS conference from the European Research Consortium on Information Management (ERCIM), 1997
This paper presents a corpus-based system to expand multi-word index terms using a part-of-speech tagger and a full-fledged derivational morphological system, combined with a shallow parser. The unique contribution of the research is in using these linguistically based tools with filters in order to avoid the problems of semantic degradation typically associated with derivational analysis. The expansion and subsequent conflation of terms increases indexing coverage by up to 30%, with precision of nearly 90% for correct identification of related terms. The system core is language independent and provides a uniform platform on which to build multilingual applications. Language-specific modules have been developed for English and French. The fully implemented system is described with particular attention to the role of derivational morphology and phrasal relations. Results and evaluation will be presented in terms of precision and recall, with an analysis of errors. This paper illustrates how the use of natural language processing tools for tasks to which they are especially suited, such as indexing, has the potential to improve performance in IR. Paper presented at the DELOS Workshop on Cross-Linguistic Information Retrieval, Zurich, 5-7 March 1997. ERCIM.

System Function and Architecture

Three NLP modules are key to the system: morphology, part-of-speech tagging, and surface syntactic analysis (see Figure 1). The emphasis in our research is on the computational linguistic features of the system, with particular attention to the role of the morphological component, and on the use of a toolset to solve the multi-word indexing coverage problem in information retrieval. The system consists of the following procedures: 1. Start with a multi-word term list and a large corpu...

