Customizable Modular Lexicalized Parsing
- In Proc. of the 6th International Workshop on Parsing Technology, IWPT2000, 2000
Abstract - Cited by 11 (9 self)
Different NLP applications have different efficiency constraints (i.e. quality of the results and throughput) that reflect on each core linguistic component. Syntactic processors are basic modules in some NLP applications. A customization that permits performance control of these components enables their reuse in different application scenarios. Throughput has commonly been improved using partial syntactic processors. On the other hand, specialized lexicons are generally employed to improve the quality of the syntactic material produced by specific parsing (sub)processes (e.g. verb argument detection or PP-attachment disambiguation). Building upon the idea of grammar stratification, this paper presents a method to push modularity and lexical sensitivity in parsing, with a view to customizable syntactic analysers. A framework for modular parser design is proposed and its main properties are discussed.
University of Durham: Description of the LOLITA system as Used in MUC-7
- In Proceedings of the MUC-7, 1995
Abstract - Cited by 6 (0 self)
Laboratory for Natural Language Engineering, Department of Computer Science, University of Durham,
Building Thesaurus from Manual Sources and Automatic Scanned Texts
Abstract
Abstract. This paper describes the work done in the TIPS project on the construction of a thesaurus base. The base is built by merging a manually built thesaurus with one automatically extracted from large text corpora. Several manually built thesauri have been semi-formatted so that they can be merged into a consistent common base. The automatic extraction is based on both syntax and statistics. We present in this paper the way the thesauri are built and the results on a scientific corpus in the context of the TIPS project.
Czech Language Processing - PoS Tagging - Jan Hajič
Abstract
In the specification of the Conference aims, the following keywords appear in the LREC materials: availability of language resources, methods for evaluation of resources, comparing different approaches to a given problem, choosing the best solution, etc. To meet these goals, we present here an overview of the state of the art of Czech part-of-speech (PoS) tagging. We concentrate on the data creation and availability problems, then we discuss the results we obtained when using various methods to tag texts written in a highly inflectional language, and finally we conclude with an outline of future perspectives.
1 Natural Language Processing
One of the meanings of the headword “process” speaks about “the analysis (of information) using a computer”. That is exactly what we mean by natural language processing (NLP): an analysis of language information using a computer. However, the computer alone is not good enough. We need an electronic database covering written and spoken language resources. The starting points for NLP are building a structured corpus and annotating the corpus according to the needs of further processing. A corpus is a vast, electronically processed collection of language texts containing a variety of (as explicit as possible) information the corpus might (implicitly) provide. If we look at any NLP conference proceedings from the 80s and 90s, we can see at first sight that the vast majority of frequently processed languages are English, French, German, Italian and Spanish. Why are there so few contributions on the processing of typologically different languages, i.e. Slavic or similar languages? There are many
1 The results described herein have been obtained within various

