Extracting paraphrases from a parallel corpus
- In Proc. of the ACL/EACL, 2001
Cited by 152 (4 self)
While paraphrasing is critical both for interpretation and generation of natural language, current systems use manual or semi-automatic methods to collect paraphrases. We present an unsupervised learning algorithm for identification of paraphrases from a corpus of multiple English translations of the same source text. Our approach yields phrasal and single word lexical paraphrases as well as syntactic paraphrases.
Information Fusion in the Context of Multi-Document Summarization
- In Proceedings of the 37th Annual Meeting of the ACL, 1999
Cited by 107 (16 self)
We present a method to automatically generate a concise summary by identifying and synthesizing similar elements across related text from a set of multiple documents. Our approach is unique in its usage of language generation to reformulate the wording of the summary.
Guessing Morphology from Terms and Corpora
- Proceedings of SIGIR 97
Cited by 38 (2 self)
This study proposes an algorithm for automatically acquiring morphological links between words. This algorithm relies on the concurrent use of a corpus and a list of multi-word terms, and does not require any prior linguistic knowledge. The four steps of the algorithm are (1) single-word truncation, (2) conflation of multi-word terms, (3) classification and filtering, and (4) clustering of conflation classes. At each step a precise evaluation is performed in order to choose the optimal parameters. The final results indicate a clustering of 45% of the classes with a precision of 87%. The derivational knowledge acquired through this method can be used for conceiving a domain-oriented stemmer for scientific and technical corpora. In Proceedings, 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'97), Philadelphia, PA, 27-31 July 1997.
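As a rough illustration of steps (1) and (2) in the abstract above, the sketch below groups single words by a shared prefix and then tests whether two multi-word terms match word by word after truncation. It is not the authors' algorithm: the function names `truncation_classes` and `terms_conflate`, the fixed prefix length of 4, and the word-by-word matching rule are all assumptions made for this example.

```python
from collections import defaultdict


def truncation_classes(words, prefix_len=4):
    """Group words that share their first `prefix_len` characters.

    Hypothetical helper: the fixed prefix length is an assumption,
    and words shorter than the prefix are skipped.
    """
    classes = defaultdict(set)
    for word in words:
        w = word.lower()
        if len(w) >= prefix_len:
            classes[w[:prefix_len]].add(w)
    # Keep only prefixes shared by at least two distinct words.
    return {p: sorted(ws) for p, ws in classes.items() if len(ws) > 1}


def terms_conflate(term_a, term_b, prefix_len=4):
    """Return True if two multi-word terms have the same number of words
    and each pair of aligned words shares the same truncated prefix.

    Assumed matching rule for illustration only.
    """
    a, b = term_a.lower().split(), term_b.lower().split()
    if len(a) != len(b):
        return False
    return all(x[:prefix_len] == y[:prefix_len] for x, y in zip(a, b))


if __name__ == "__main__":
    vocab = ["genetic", "genetically", "gene", "expression", "express", "cell"]
    print(truncation_classes(vocab))
    print(terms_conflate("genetic expression", "gene expressed"))
```

A real system would still need the later steps named in the abstract (classification, filtering, and clustering) to discard accidental prefix matches such as unrelated words that happen to begin with the same letters.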
Extracting Structural Paraphrases from Aligned Monolingual Corpora
- 2003
Cited by 34 (1 self)
We present an approach for automatically learning paraphrases from aligned monolingual corpora. Our algorithm works by generalizing the syntactic paths between corresponding anchors in aligned sentence pairs. Compared to previous work, structural paraphrases generated by our algorithm tend to be much longer on average, and are capable of capturing long-distance dependencies. In addition to a standalone evaluation of our paraphrases, we also describe a question answering application currently under development that could immensely benefit from automatically-learned structural paraphrases.
NLP for Term Variant Extraction: Synergy between Morphology, Lexicon, and Syntax
- 1999
Cited by 22 (1 self)
We present a natural language processing (NLP) approach to automatic indexing over controlled vocabulary which accounts for term variation. The approach combines a part-of-speech tagger, a generator of morphologically related forms, and a shallow transformational parser. The system is applied to the French language; it is trained on newspaper articles and tested on scientific literature. The precision of indexing on terms and variants is 97.2%, only slightly lower than that of indexing without accounting for term variation (99.7%). The recall of indexing on terms and variants (93.4%) is much higher than the recall of indexing on term occurrences only (72.4%). Conflation of term variants increases indexing coverage up to 30%. The system is a convincing example of the potential synergy between full-fledged morphological analysis and local syntactic analysis. Many details are provided on the implementation of the system. Illustrative examples of syntactic transformations for the French language are given together with the theoretical and empirical methods for their formulation. (Christian Jacquemin and Evelyne Tzoukermann)
Term Extraction and Automatic Indexing
- 2003
Cited by 22 (0 self)
This chapter presents a new domain of research and development in Natural Language Processing (NLP) that is concerned with the representation, acquisition, and recognition of terms. Terms are pervasive in scientific and technical documents; their identification is a crucial issue for any application dealing with the analysis, understanding, generation, or translation of such documents. In particular, the ever-growing mass of specialized documentation available on-line, in industrial and governmental archives or in digital libraries, calls for advances in terminology processing for such purposes as information retrieval, cross-language querying, indexing of multimedia documents, translation aids, document routing and summarization, etc. This chapter introduces the basic linguistic characteristics of terms. It presents the main methods in NLP for recognizing or discovering terms and their interrelationships in large corpora. It is divided into three sections: an introduction to the bas...
Effective Use of Natural Language Processing Techniques for Automatic Conflation of Multi-Word Terms: The Role of Derivational Morphology, Part of Speech Tagging, and Shallow Parsing
- In Research and Development in Information Retrieval
Cited by 20 (3 self)
We present a corpus-based system to expand multi-word index terms using a part-of-speech tagger and a full-fledged derivational morphological system, combined with a shallow parser. The system has been applied to French. The unique contribution of the research is in using these linguistically based tools with safety filters in order to avoid the problems of degradation typically associated with derivational analysis and generation. The successful expansion, and thus conflation, of terms increases indexing coverage up to 30% with precision of nearly 90% for correct identification of related terms. The fully implemented system is described with particular attention to the role of derivational morphology and phrasal relations. Results and evaluation are presented in terms of precision and recall, with an analysis and discussion of errors. This paper illustrates how natural language processing tools, when combined effectively for tasks to which they are especially suited, indicates the pote...
Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem?
- 2001
Cited by 20 (0 self)
We seek a knowledge-free method for inducing multiword units from text corpora for use as machine-readable dictionary headwords. We provide two major evaluations of nine existing collocation-finders and illustrate the continuing need for improvement. We use Latent Semantic Analysis to make modest gains in performance, but we show the significant challenges encountered in trying this approach.
REXTOR: A System for Generating Relations from Natural Language
- 2000
Cited by 8 (2 self)
This paper argues that a finite-state language model with a ternary expression representation is currently the most practical and suitable bridge between natural language processing and information retrieval. Despite the theoretical computational inadequacies of finite-state grammars, they are very cost-effective (in time and space requirements) and adequate for practical purposes. The ternary

