Results 1 - 10 of 12
Parsing in Two Frameworks: Finite-State and Functional Dependency Grammar
1999
Cited by 9 (0 self)
the novel non-deterministic tokenisation method which was first presented in Tapanainen (1995) and Chanod and Tapanainen (1996a), the formalism for presenting multiword units which was first presented in Tapanainen (1995) and Segond and Tapanainen (1995), the combination of the tokenisation, multiword unit recognition, lexical analysis and syntactic analysis, and the syntactic disambiguation engine which is similar to that in Tapanainen (1997)
Morphosyntactic Disambiguation For Basque Based On The Constraint Grammar Formalism
Cited by 7 (3 self)
This paper presents the development of a surface-based morphosyntactic parsing grammar, as well as the results obtained. It is based on the Constraint Grammar formalism, which we find suitable for our project of disambiguating unrestricted texts. We also describe the main types of morphosyntactic ambiguity that we have identified and the disambiguation rules designed for their treatment. This work is the first step in the computational treatment of Basque syntax.
Keywords: morphosyntactic disambiguation, Constraint Grammar, Basque language
Word Count: 3200
1 Introduction
This paper describes the design of morphosyntactic disambiguation rules as a first step to develop a robust grammar of Basque, conceived as a general basis for different applications; for instance, a lemmatiser/tagger (Aduriz et al., 96) and a syntactic corrector (Gojenola and Sarasola, 94). We have chosen the Constraint Grammar (CG) formalism (Karlsson et al., 95; Voutilainen, 94; Tapanainen a...
Applying the Constraint Grammar Parser of English to the Helsinki Corpus
ICAME Journal, 1995
Cited by 6 (0 self)
The international breakthrough of the ENGCG Parser, or the Constraint Grammar Parser of English (Karlsson et al. 1995), as a system suitable for analysing Present-day English, opened the field for applications capable of dealing with regional, diachronic and other varieties of
Different Issues In The Design Of A Lemmatizer/Tagger For Basque
Cited by 3 (1 self)
This paper presents relevant issues that have been considered in the design of a general purpose lemmatizer/tagger for Basque (EUSLEM). The lemmatizer/tagger is conceived as a basic tool necessary for other linguistic applications. It uses the lexical database and the morphological analyzer previously developed and implemented. Due to the characteristics of the language, the tagset proposed here is structured in four levels, so that each level refines the previous one by adding more detailed information. We will focus on the problems found in designing this tagset and on the strategies for morphological disambiguation that will be used.
1. Introduction
This paper describes the development of a general purpose lemmatizer/tagger for Basque which will lay the foundations for further applications in the field of automatic processing of Basque texts. To carry out this project the following basic tools will be used:
- The Lexical Database for Basque (LDB...
AGFL Grammars for full-text Information Retrieval
1996
Cited by 3 (1 self)
This paper is concerned with the development of grammars suitable for full-text Information Retrieval. It first sets out some of the design criteria which should be taken into account in writing such a grammar. Then the notation of Affix Grammars over a Finite Lattice (AGFL) is described, a simple formalism for the morphosyntactic description of natural languages which has an efficient implementation. It is shown how the AGFL formalism helps in writing grammars with the properties demanded by Information Retrieval applications. Within the AGFL framework, grammars for English and Dutch have been written for Information Retrieval purposes, which are available for use by other research groups. Some properties of those grammars and some experiences with their use are reported.
Keywords: AGFL, Natural Language Processing, Information Retrieval, full-text, grammar-based, robustness, island parsing.
1 Introduction
Ever since the origin of Information Retrieval in the Sixties, the retrieval o...
Condorcet Annual Report
1997
Cited by 3 (2 self)
[Figure 3.1: General framework for the sentence "The mechanism of the wear is discussed in each case." Parse tree not reproduced.]
As can be seen, a sentence consists of a subject (SU) and a Verb Phrase (VP). The subject consists of a noun phrase followed by optional complements and adverbials. In this example, the head of the subject is followed by a postmodifying prepositional phrase (PP). This PP is attached to the C&A node (complements and adverbials) in order to underspecify PP sequences; the same goes for the PP "in each case", which serves as a modifier of the verbal cluster. This general framework should also be returned in cases of ungrammatical sentences, which we will discuss below.
3.3 Robustness
In practical NLP systems such as Condorcet, the need for robustness is evident: when parsing large numbers of real-world te...
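The sentence framework described for Figure 3.1 can be illustrated with a small sketch. It is not the Condorcet implementation: the node labels (S, SU, VP, C&A, PP and so on) come from the excerpt, but the exact nesting, the tuple representation and the helper names `noun_phrase`, `prep_phrase`, `leaves` and `subtrees` are assumptions made here for illustration.

```python
# A node is a tuple (label, child, ...); a leaf word is a plain string.
# The nesting below is an assumed reading of the flattened Figure 3.1.

def noun_phrase(det, noun):
    """Build NP -> N" -> (DT det, N` -> NK -> N noun)."""
    return ("NP", ('N"', ("DT", det), ("N`", ("NK", ("N", noun)))))

def prep_phrase(prep, det, noun):
    """Build PP -> (Prep prep, NP)."""
    return ("PP", ("Prep", prep), noun_phrase(det, noun))

TREE = (
    "S",
    # Subject: a noun phrase plus a C&A node holding the postmodifying PP.
    ("SU",
     noun_phrase("the", "mechanism"),
     ("C&A", prep_phrase("of", "the", "wear"))),
    # Verb phrase: the verbal cluster plus a C&A node holding its modifier PP.
    ("VP",
     ("V`", ("V", "is"), ("V`", ("V", "discussed"))),
     ("C&A", prep_phrase("in", "each", "case"))),
)

def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def subtrees(node, label):
    """Return every subtree whose root carries the given label."""
    if isinstance(node, str):
        return []
    found = [node] if node[0] == label else []
    for child in node[1:]:
        found.extend(subtrees(child, label))
    return found
```

Calling `leaves(TREE)` recovers the words of the example sentence, and `subtrees(TREE, "C&A")` returns the two complement-and-adverbial nodes under which the PPs are attached, which is how the excerpt says PP sequences are left underspecified.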
Text analysis meets corpus linguistics
In Proceedings of the Corpus Linguistics 2003 conference, 2003
Cited by 1 (1 self)
In recent years, there has been rising interest in using evidence derived from automatic syntactic analysis in large-scale corpus studies. Ideally, of course, corpus linguists would prefer to have access to the wealth of structural and featural information provided by a full parser based on a complex grammar
A Two-Stage Model for Robust Parsing
1998
Cited by 1 (0 self)
We discuss a modular strategy for robust parsing that is used in an information retrieval system. A two-stage model is presented, consisting of a parser and a transformational module for reanalyzing robustly parsed sentences. Provisional results are given.
1 Background
This paper studies an approach to robust parsing in the context of a large-scale information retrieval (IR) project, called Condorcet. The Condorcet project is an information retrieval project carried out at the University of Twente, The Netherlands, and funded by the Dutch Technology Foundation (STW), an organization that funds application-oriented technological projects. Its main objective is to build a prototype IR system that will ultimately be capable of processing titles and abstracts of 30,000 documents. The research is mainly concerned with indexing documents within two specific domains, thus producing document representations. The domains covered are mechanical properties of ceramics as a subfield of material...
SDP - Spoken Dialogue Parser
1998
Cited by 1 (0 self)
This report describes work done on part-of-speech tagging and parsing the MapTask Corpus in the "Robust Parsing and Part-of-Speech Tagging of Transcribed Speech Corpora" project, funded by the ESRC (project R000236800). This report concentrates on the implementation of the software developed in the project and the format of the SGML annotation of the parse trees. An overview of the project's aims and results can be found in [McKelvie 98b] and an analysis of the speech disfluencies found while parsing the corpus can be found in [McKelvie 98a].
Efficiency and Robustness in AGFL
1997
In this paper we discuss an efficient and robust parser that is used in an information retrieval system. We present a principle-based approach to robustness and we discuss optimizations for improving the efficiency of top-down backtrack parsers. Preliminary results are presented.
The authors wish to thank Bas van Bakel, Paul Jones and three anonymous referees for valuable comments on earlier versions of this paper.
1 Robustness and Syntax-directedness
2 Introduction
There is a growing demand, coming from natural language based applications like information filtering, full-text information retrieval and automatic translation, for fast and robust parsers for natural languages. Furthermore, grammar formalisms should allow for clear expression of linguistic knowledge in such a way that its formalization is clearly separated from procedural information. In this paper, we discuss an efficient and robust parser that is used in a large-scale information retrieval project. In Section 1, ...

