Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax
In Proceedings of the 35th Annual Meeting of the ACL, 1997
Cited by 33 (7 self)
A system for the automatic production of controlled index terms is presented using linguistically-motivated techniques. This includes a finite-state part of speech tagger, a derivational morphological processor for analysis and generation, and a unification-based shallow-level parser using transformational rules over syntactic patterns. The contribution of this research is the successful combination of parsing over a seed term list coupled with derivational morphology to achieve greater coverage of multi-word terms for indexing and retrieval. Final results are evaluated for precision and recall, and implications for indexing and retrieval are discussed.
Empirical Observation of Term Variations and Principles for their Description
2000
Cited by 23 (0 self)
Contents
1 Introduction
1.1 Do terms vary?
1.2 A Symbolic Framework for the Study of Terminological Variation
2 The Most Common Types of English Two-word Terms
2.1 Adjective Noun (A N)
2.2 Noun Noun (N2 N1)
2.3 Noun Preposition Noun (N1 P N2)
3 Observing and Representing Term Variants
3.1 An Observation of Term Variants
3.2 A Two-level Lexico-syntactic Description of Terms
3.3 Two Families of Grammatical Rules
NLP for Term Variant Extraction: Synergy between Morphology, Lexicon, and Syntax
1999
Cited by 22 (1 self)
We present a natural language processing (NLP) approach to automatic indexing over controlled vocabulary which accounts for term variation. The approach combines a part of speech tagger, a generator of morphologically related forms, and a shallow transformational parser. The system is applied to the French language; it is trained on newspaper articles and tested on scientific literature. Precision rate of indexing on term and variants is 97.2%. It is only slightly lower than indexing without accounting for term variation (99.7%). Recall rate of indexing on term and variants (93.4%) is much higher than recall of indexing on term occurrences only (72.4%). Conflation of term variants increases indexing coverage up to 30%. The system is a convincing example of the potential synergy between full-fledged morphological analysis and local syntactic analysis. Many details are provided on the implementation of the system. Illustrative examples of syntactic transformations for the French language are given together with the theoretical and empirical methods for their formulation. (Christian Jacquemin and Evelyne Tzoukermann)
An Incrementally Extensible Document Retrieval System Based on Linguistic and Logical Principles
Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1992
Cited by 10 (3 self)
Most natural language based document retrieval systems use the syntax structures of constituent phrases of documents as index terms. Many of these systems also attempt to reduce the syntactic variability of natural language by some normalisation procedure applied to these syntax structures. However, the retrieval performance of such systems remains fairly disappointing. Some systems therefore use a meaning representation language to index and retrieve documents. In this paper, a system is presented that uses Horn Clause Logic as meaning representation language, employs advanced techniques from Natural Language Processing to achieve incremental extensibility, and uses methods from Logic Programming to achieve robustness in the face of insufficient data.
What Is The Tree That We See Through The Window: A Linguistic Approach To Windowing And Term Variation
Cited by 9 (4 self)
Windowing techniques play a key role in information retrieval. Previous works have suggested that the quality of access to information relies heavily on the characteristics of the windows. This study provides a linguistic approach to text windowing through an extraction of term variants with the help of a partial parser. The syntactic grounding of the method ensures that words observed within restricted spans are lexically related and that spurious word co-occurrences are ruled out with a good level of confidence. The system is computationally tractable on large corpora and large lists of terms. Illustrative examples of term variations from a large medical corpus are given. An experimental evaluation of the method shows that only a small proportion of co-occurring words are lexically related and motivates the call for natural language parsing techniques in text windowing. 1. INTRODUCTION The notion of text window -- a span of contiguous words within a document -- is crucial for severa...
An Evaluation of Linguistically-motivated Indexing Schemes
In Proceedings of the 22nd BCS-IRSG Colloquium on IR Research, 2000
Cited by 9 (2 self)
In this article, we describe a number of indexing experiments based on indexing terms other than simple keywords. These experiments were conducted as one step in validating a linguistically-motivated indexing model. The problem is important but not new. What is new in this approach is the variety of schemes evaluated. It is important since it should help to overcome not only the well-known problems of bag-of-words representations, but also the difficulties raised by non-linguistic text simplification techniques such as stemming, stop-word deletion, and term selection. Our approach in the selection of terms is based on part-of-speech tagging and shallow parsing. The indexing schemes evaluated vary from simple keywords to nouns, verbs, adverbs, adjectives, adjacent word-pairs, and head-modifier pairs. Our findings apply to Information Retrieval and most related areas.
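As a rough illustration of the simpler schemes named in this abstract, the sketch below derives keyword, noun, and adjacent word-pair index terms from a sentence that has already been tagged. The tag names, the stop-word list, the toy sentence, and the function names are assumptions made for this example, not details taken from the paper. Head-modifier pairs are left out because they would need a parser.

```python
# Illustrative only: tag names, stop words, and the sample sentence are invented.
STOP_WORDS = {"the", "of", "a", "an", "in"}


def keywords(tagged):
    """All lower-cased tokens that are not stop words."""
    return [w.lower() for w, _ in tagged if w.lower() not in STOP_WORDS]


def by_pos(tagged, pos):
    """Lower-cased tokens carrying the given part-of-speech tag."""
    return [w.lower() for w, t in tagged if t == pos]


def adjacent_pairs(tagged):
    """Pairs of neighbouring tokens, skipping any pair that contains a stop word."""
    words = [w.lower() for w, _ in tagged]
    return [
        (a, b)
        for a, b in zip(words, words[1:])
        if a not in STOP_WORDS and b not in STOP_WORDS
    ]


# A hand-tagged toy sentence standing in for real tagger output.
tagged_sentence = [
    ("The", "DET"), ("shallow", "ADJ"), ("parser", "NOUN"),
    ("extracts", "VERB"), ("index", "NOUN"), ("terms", "NOUN"),
]

noun_terms = by_pos(tagged_sentence, "NOUN")
pair_terms = adjacent_pairs(tagged_sentence)
```

Each function yields one candidate indexing scheme over the same tagged input, so the schemes can be compared on equal footing.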
Automatic Indexing: An Approach Using an Index Term Corpus and Combining Linguistic and Statistical Methods
2000
Cited by 6 (0 self)
This thesis discusses the problems and the methods of finding relevant information in large collections of documents. The contribution of this thesis to this problem is to develop better content analysis methods which can be used to describe document content with index terms. Index terms can be used as meta-information that describes documents, and that is used for seeking information. The main point of this thesis is to illustrate the process of developing an automatic indexer which analyses the content of documents by combining evidence from word frequencies and evidence from linguistic analysis provided by a syntactic parser. The indexer weights the expressions of a text according to their estimated importance for describing the content of a given document on the basis of the content analysis. The typical linguistic features of index terms were explored using a linguistically analysed text collection where the index terms are manually marked up. This text collection is referred to as an index term corpus. Specific features of the index terms provided the basis for a linguistic term-weighting scheme, which was then combined with a frequency-based term-weighting scheme. The use of an index term corpus like this as training material is a new method of developing an automatic indexer. The results of the experiments were promising.
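One simple way to picture combining a frequency-based weight with a linguistic weight is a linear interpolation of the two scores. The sketch below assumes exactly that; the abstract does not say how the two schemes are combined, and the function names, the mixing parameter `alpha`, and the toy linguistic scores are all hypothetical.

```python
from collections import Counter


def frequency_weights(tokens):
    """Relative frequency of each token within one document."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def combined_weights(tokens, linguistic, alpha=0.5):
    """Interpolate frequency and linguistic evidence (an assumed scheme).

    `linguistic` maps a term to a score in [0, 1]; missing terms score 0.
    `alpha` is the share given to the frequency evidence.
    """
    freq = frequency_weights(tokens)
    return {
        term: alpha * f + (1 - alpha) * linguistic.get(term, 0.0)
        for term, f in freq.items()
    }


def rank(weights):
    """Terms ordered by descending weight, ties broken alphabetically."""
    return sorted(weights, key=lambda t: (-weights[t], t))


# Toy document and made-up linguistic scores.
doc_tokens = ["index", "term", "index", "corpus"]
linguistic_scores = {"term": 1.0, "index": 0.5}

weights = combined_weights(doc_tokens, linguistic_scores)
ranking = rank(weights)
```

Here a term that is rarer but linguistically favoured can outrank a more frequent one, which is the kind of effect a combined scheme is meant to allow.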
Text Filtering using Linguistically-Motivated Indexing Terms
1999
Cited by 4 (0 self)
In this article, we describe a number of text filtering experiments based on indexing terms other than simple keywords. These experiments were conducted as a first step of validating our phrase-based retrieval model. The problem is important but not new. What is new in this approach is the variety of evaluated linguistically-motivated indexing schemes. This issue applies to Information Retrieval and is important since it should help to overcome well-known difficulties raised by non-linguistic text simplification techniques such as stemming or stop-word deletion. Our approach in the selection of indexing terms was based on a part-of-speech tagger and shallow parsing. The evaluated indexing terms include keywords, nouns, verbs, adverbs, adjectives, adjacent word-pairs and head-modifier pairs. Furthermore, the effect of lemmatizing terms was investigated. Evaluation was done with three utility-based measures and average set precision. We managed to obtain experimental evidence that our phrase-ba...
AUTOMATIC IDENTIFICATION OF CAUSAL RELATIONS IN TEXT AND THEIR USE FOR IMPROVING PRECISION IN INFORMATION RETRIEVAL
Cited by 2 (0 self)
This is a reformatted version of the original dissertation, and
A Natural Language Approach To Multi-Word Term Conflation
Proceedings of the DELOS conference from the European Research Consortium on Information Management (ERCIM), 1997
This paper presents a corpus-based system to expand multi-word index terms using a part-of-speech tagger and a full-fledged derivational morphological system, combined with a shallow parser. The unique contribution of the research is in using these linguistically based tools with filters in order to avoid the problems of semantic degradation typically associated with derivational analysis. The expansion and subsequent conflation of terms increases indexing coverage up to 30%, with precision of nearly 90% for correct identification of related terms. The system core is language independent and provides a uniform platform on which to build multilingual applications. Language specific modules have been developed for English and French. The fully implemented system is described with particular attention to the role of derivational morphology and phrasal relations. Results and evaluation will be presented in terms of precision and recall, with an analysis of errors. This paper illustrates how the use of natural language processing tools for tasks to which they are especially suited, such as indexing, has the potential to improve performance in IR.

Paper presented at the DELOS Workshop on Cross-Linguistic Information Retrieval, Zurich, 5-7 March 1997. ERCIM.

System Function and Architecture

Three NLP modules are key to the system: morphology, part of speech tagging, and surface syntactic analysis (see Figure 1). The emphasis in our research is on the computational linguistic features of the system with particular attention to the role of the morphological component, and on the utilization of a toolset to solve the multi-word indexing coverage problem in information retrieval. The system consists of the following procedures:

1. Start with a multi-word term list and a large corpu...

