Tokenization and proper noun recognition for information retrieval
- In 3rd International Workshop on Natural Language and Information Systems (NLIS 2002), September 2-3, 2002, Aix-en-Provence, 2002
Cited by 8 (5 self)
In this paper we consider a set of natural language processing techniques that can be used to analyze large amounts of texts, focusing on the advanced tokenizer which accounts for a number of complex linguistic phenomena, as well as for pre-tagging tasks such as proper noun recognition. We also show the results of several experiments performed in order to study the impact of the strategy chosen for the recognition of proper nouns.
Morphological and Syntactic Processing for Text Retrieval
- of Lecture Notes in Computer Science, 2004
Cited by 3 (1 self)
This article describes the application of lemmatization and shallow parsing as a linguistically-based alternative to stemming in Text Retrieval, with the aim of managing linguistic variation at both word level and phrase level. Several alternatives for selecting the index terms among the syntactic dependencies detected by the parser are evaluated. Though this article focusses on...
COLE experiments at CLEF 2002 Spanish monolingual track
- Advances in Cross-Language Information Retrieval, volume 2785 of Lecture Notes in Computer Science, 2003
Cited by 1 (1 self)
In this our first participation in CLEF, we have applied Natural Language Processing techniques for single word and multi-word term conflation. We have tested several approaches at different levels of text processing in our experiments: firstly, we have lemmatized the text to avoid inflectional variation; secondly, we have expanded the queries through synonyms according to a fixed threshold of similarity; and thirdly, we have tested a mixed approach based on the employment of productive derivational morphology to solve derivational variation and syntactic dependencies to deal with the syntactic content of the document.
Regional Versus Global Finite-State Error Repair
We focus on the domain of a regional least-cost strategy in order to illustrate the viability of non-global repair models over finite-state architectures. Our interest is justified by the difficulty, shared by all repair proposals, to determine how far to validate. A short validation may fail to gather sufficient information, and in a long one most of the effort can be wasted. The goal is to prove that our approach can provide, in practice, a performance and quality comparable to that attained by global criteria, with a significant saving in time and space. To the best of our knowledge, this is the first discussion of its kind.
COLE experiments at CLEF 2003 - Spanish Monolingual Track
- 2003
In this our second participation in the CLEF Spanish monolingual track, we have continued applying Natural Language Processing techniques for single word and multi-word term conflation. Two different conflation approaches have been tested. The first approach is based on the lemmatization of the text in order to avoid inflectional variation. Our second approach consists of the employment of syntactic dependencies as complex index terms, in an attempt to solve the problems derived from syntactic variation and, in this way, to obtain more precise terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers.

