Results 1 -
6 of
6
Term Clustering of Syntactic Phrases
- Proceedings of ACM SIGIR-90
, 1990
"... Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combini ..."
Abstract
-
Cited by 56 (5 self)
- Add to MetaCart
Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique.
Methods of Automatic Term Recognition - A Review
, 1996
"... Following the growing interest in "corpus-based" approaches to computational linguistics, a number of studies have recently appeared on the topic of automatic term recognition or extraction. Because a successful term recognition method has to be based on proper insights into the nature of terms, stu ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
Following the growing interest in "corpus-based" approaches to computational linguistics, a number of studies have recently appeared on the topic of automatic term recognition or extraction. Because a successful term recognition method has to be based on proper insights into the nature of terms, studies of automatic term recognition not only contribute to the applications of computational linguistics but also to the theoretical foundation of terminology. Many studies on automatic term recognition treat interesting aspects of terms, but most of them are not well founded and described. This paper tries to give an overview of the principles and methods of automatic term recognition. For that purpose, two major trends are examined, i.e. studies in automatic recognition of significant elements for indexing mainly carried out in information retrieval circles, and current research in automatic term recognition in the field of computational linguistics. Keywords Automatic term recognition, au...
Machine Learning in Automated Text Categorisation
, 1999
"... The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to the early ’60s. Until the late ’80s, the most effective approach to the problem seemed to be that of manually building automatic classifiers by means of knowledgeengineering ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to the early ’60s. Until the late ’80s, the most effective approach to the problem seemed to be that of manually building automatic classifiers by means of knowledgeengineering techniques, i.e. manually defining a set of rules encoding expert knowledge on how to classify documents under a given set of categories. In the ’90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest, prompted by which the machine learning paradigm to automatic classifier construction has emerged and definitely superseded the knowledge-engineering approach. Within the machine learning paradigm, a general inductive process (called the learner) automatically builds a classifier (also called the rule, or the hypothesis) by “learning”, from a set of previously classified documents, the characteristics of one or more categories. The advantages of this approach are a very good effectiveness, a considerable savings in terms of expert manpower, and domain independence. In this survey we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues pertaining to document indexing, classifier construction, and classifier evaluation, will be discussed in detail. A final section will be devoted to the techniques that have specifically been devised for an emerging application such as the automatic classification of Web pages into “Yahoo!-like ” hierarchically structured sets of categories.
Cross Language Retrieval - English / Russian / French
, 1997
"... This paper brings together discussions of three distinct lines of research, each of which has resulted in a working software product for the database production environment: machine aided indexing, a state-of-the-art translation system, and a multilingual search and retrieval system and interface. 1 ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
This paper brings together discussions of three distinct lines of research, each of which has resulted in a working software product for the database production environment: machine aided indexing, a state-of-the-art translation system, and a multilingual search and retrieval system and interface. 1) The MAI - Machine Aided Indexing software developed by Access Innovations,
Guru: Information retrieval for reuse
- Landmark Contributions in Software Reuse and Reverse Engineering. Unicom Seminars Ltd
, 1994
"... Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One of the reasons for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable components. Th ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One of the reasons for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable components. This paper describes a technology for automatically assembling large software libraries that promote software reuse by helping the user locate the components closest to her/his needs. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two steps. First, attributes are automatically extracted from natural language documentation by using a new free-text indexing scheme based on the notions of lexical a nities and quantity of information. Then, a hierarchy for browsing is automatically generated using a clustering technique that draws only on the information provided by the attributes. Thanks to the free-text indexing scheme, tools following this approach can accept free-style natural language queries. This technology has been implemented in the Guru system, which has been applied to construct an organized library of Aix utilities. An experiment was conducted in order to evaluate the retrieval e ectiveness of Guru as compared to InfoExplorer, ahypertext library system for Aix 3 on the IBM RS/6000 series, as well as to two other indexing schemes. We followed the usual evaluation procedure used in information retrieval, based upon recall and precision measures, and determined that our system retrieved more e ectively than the others. 1
List of Figures List of Tables 1 An Information Retrieval Approach for Automatically Constructing Software Libraries
"... Abstract Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One of the reasons for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable compo ..."
Abstract
- Add to MetaCart
Abstract Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One of the reasons for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable components. This paper describes a technology for automatically assembling large software libraries that promote software reuse by helping the user locate the components closest to her/his needs. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two steps. First, attributes are automatically extracted from natural language documentation by using a new indexing scheme based on the notions of lexical affinities and quantity of information. Then, a hierarchy for browsing is automatically generated using a clustering technique that draws only on the information provided by the attributes. Thanks to the free-text indexing scheme, tools following this approach can accept free-style natural language queries. This technology has been implemented in the system, which has been applied to construct an organized library of utilities. An experiment was conducted in order to evaluate the retrieval effectiveness of as compared to a hypertext library system for 3 on the IBM RISC System/6000 series. We followed the usual evaluation procedure used in information retrieval, based upon recall and precision measures, and determined that our system performs 15 % better on a random test set, while being

