An Evolutionary Approach to Constructing Effective Software Reuse Repositories
ACM Transactions on Software Engineering and Methodology, 1997
Cited by 32 (3 self)
This article outlines an approach that avoids these problems by choosing a retrieval method that utilizes minimal repository structure to effectively support the process of finding software components. The approach is demonstrated through a pair of proof-of-concept prototypes: PEEL, a tool to semiautomatically identify reusable components, and CodeFinder, a retrieval system that compensates for the lack of explicit knowledge structures through a spreading activation retrieval process. CodeFinder also allows component representations to be modified while users are searching for information. This mechanism adapts to the changing nature of the information in the repository and incrementally improves the repository while people use it. The combination of these techniques holds potential for designing software repositories that minimize up-front costs, effectively support the search process, and evolve with an organization's changing needs.
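As a rough illustration of what a spreading activation retrieval process can look like, the sketch below passes activation back and forth between query terms and components in a small bipartite network. It is not CodeFinder's actual implementation: the function name `spreading_activation`, the `rounds` and `decay` parameters, and the normalization scheme are all assumptions made for the example.

```python
from collections import defaultdict


def spreading_activation(index, query_terms, rounds=2, decay=0.5):
    """Rank components by activation spread from the query terms.

    index: dict mapping component name -> set of descriptive terms
           (a hypothetical, minimal repository representation).
    Returns a list of (component, activation) pairs, highest first.
    """
    # Invert the index so activation can flow from terms to components.
    term_to_components = defaultdict(set)
    for component, terms in index.items():
        for term in terms:
            term_to_components[term].add(component)

    term_activation = {term: 1.0 for term in query_terms}
    component_activation = defaultdict(float)

    for _ in range(rounds):
        # Terms pass their activation to the components they describe.
        incoming = defaultdict(float)
        for term, activation in term_activation.items():
            components = term_to_components.get(term, ())
            for component in components:
                incoming[component] += activation / len(components)
        for component, activation in incoming.items():
            component_activation[component] += activation

        # Components pass a decayed share back to all of their terms,
        # which lets related terms that were not in the query light up.
        next_terms = defaultdict(float)
        for component, activation in incoming.items():
            for term in index[component]:
                next_terms[term] += decay * activation / len(index[component])
        term_activation = next_terms

    return sorted(component_activation.items(), key=lambda kv: kv[1], reverse=True)


repository = {
    "quicksort": {"sort", "array"},
    "mergesort": {"sort", "list"},
    "linked_list": {"list", "node"},
}
ranked = spreading_activation(repository, ["array"])
```

In this toy repository the query "array" retrieves "quicksort" directly, and "mergesort" receives some activation through the shared term "sort" even though it never mentions "array". This is the sense in which spreading activation can compensate for missing explicit structure.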
Methods of Automatic Term Recognition - A Review
1996
Cited by 24 (1 self)
Following the growing interest in "corpus-based" approaches to computational linguistics, a number of studies have recently appeared on the topic of automatic term recognition or extraction. Because a successful term recognition method has to be based on proper insights into the nature of terms, studies of automatic term recognition not only contribute to the applications of computational linguistics but also to the theoretical foundation of terminology. Many studies on automatic term recognition treat interesting aspects of terms, but most of them are not well founded and described. This paper tries to give an overview of the principles and methods of automatic term recognition. For that purpose, two major trends are examined, i.e. studies in automatic recognition of significant elements for indexing mainly carried out in information retrieval circles, and current research in automatic term recognition in the field of computational linguistics. Keywords Automatic term recognition, au...
Information Access Tools for Software Reuse
Journal of Systems and Software, 1995
Cited by 14 (6 self)
Software reuse has long been touted as an effective means to develop software products. But reuse technologies for software have not lived up to expectations. Among the barriers are high costs of building software repositories and the need for effective tools to help designers locate re-usable software. While many design-for-reuse and software classification efforts have been proposed, these methods are cost-intensive and cannot effectively take advantage of large stores of design artifacts that many development organizations have accumulated. Methods are needed that take advantage of these valuable resources in a cost-effective manner. This paper describes an approach to the design of tools to help software designers build repositories of software components and locate potentially re-usable software in those repositories. The approach is investigated with a retrieval tool, named CodeFinder, which supports the process of retrieving software components when information needs are ill-defi...
Quantitative Portraits of Lexical Elements
This paper clarifies the basic concepts and theoretical perspectives by and from which quantitative “weighting” of lexical elements is defined, and then draws quantitative portraits of a few lexical elements in order to exemplify the relevance of the concepts and perspectives examined.
Re-ranking Documents Based on Query-Independent Document Specificity
The use of query-independent knowledge to improve the ranking of documents in information retrieval has proven very effective in the context of web search. This query-independent knowledge is derived from an analysis of the graph structure of hypertext links between documents. However, there are many cases where explicit hypertext links are absent or sparse, e.g. corporate Intranets. Previous work has sought to induce a graph link structure based on various measures of similarity between documents. After inducing these links, standard link analysis algorithms, e.g. PageRank, can then be applied. In this paper, we propose and examine an alternative approach to derive query-independent knowledge, which is not based on link analysis. Instead, we analyze each document independently and calculate a “specificity” score, based on (i) normalized inverse document frequency, and (ii) term entropies. Two re-ranking strategies, i.e. hard cutoff and soft cutoff, are then discussed to utilize our query-independent “specificity” scores. Experiments on standard TREC test sets show that our re-ranking algorithms produce gains in mean reciprocal rank of about 4%, and 4% to 6% gains in precision at 5 and 10, respectively, when using the collection of TREC disk 4 and queries from TREC 8 ad hoc topics. Empirical tests demonstrate that the entropy-based algorithm produces stable results across (i) retrieval models, (ii) query sets, and (iii) collections.
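The abstract does not give the scoring formulas, so the following is only a minimal sketch of one plausible reading of the IDF-based variant. It assumes a document's specificity is the mean normalized IDF of its distinct terms, and it assumes a "hard cutoff" demotes every document whose specificity falls below a threshold while preserving the original order otherwise. The function names and the threshold value are hypothetical, and the entropy-based score is not shown.

```python
import math


def normalized_idf(docs):
    """Map each term to idf / max_idf, so values lie in [0, 1].

    docs: dict mapping doc id -> list of tokens.
    """
    n = len(docs)
    df = {}
    for tokens in docs.values():
        for term in set(tokens):
            df[term] = df.get(term, 0) + 1
    max_idf = math.log(n)  # IDF of a term that occurs in exactly one document
    if max_idf == 0:
        return {term: 0.0 for term in df}
    return {term: math.log(n / count) / max_idf for term, count in df.items()}


def specificity(docs):
    """Assumed score: mean normalized IDF over a document's distinct terms."""
    nidf = normalized_idf(docs)
    scores = {}
    for doc_id, tokens in docs.items():
        distinct = set(tokens)
        scores[doc_id] = (
            sum(nidf[t] for t in distinct) / len(distinct) if distinct else 0.0
        )
    return scores


def hard_cutoff_rerank(ranking, scores, threshold):
    """Assumed hard cutoff: documents below the threshold move to the end."""
    kept = [d for d in ranking if scores[d] >= threshold]
    demoted = [d for d in ranking if scores[d] < threshold]
    return kept + demoted


collection = {
    "d1": ["the", "of", "the"],
    "d2": ["the", "bm25", "pruning"],
    "d3": ["the", "of", "entropy"],
}
spec = specificity(collection)
reranked = hard_cutoff_rerank(["d1", "d3", "d2"], spec, threshold=0.4)
```

Because the score depends only on the collection and not on any query, it can be computed once at indexing time and reused for every initial ranking.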
Document-Oriented Pruning of the Inverted Index in Information Retrieval Systems
International Conference on Advanced Information Networking and Applications Workshops, 2009
Searching very large collections can be costly in both computation and storage. To reduce this cost, recent research has focused on reducing the size (pruning) of the inverted index. The inverted index represents a table, the rows and columns of which are terms in the lexicon and documents in the collection, respectively. A non-zero entry in the table, known as a posting, indicates that the corresponding document contains the term. Previous research on static index pruning was either (i) posting-oriented, in which less important postings are removed from the table, or (ii) term-oriented, in which less important terms are removed from the table. In this paper, we investigate a new, document-oriented pruning strategy that removes entire columns of the table, i.e. removes less important documents from the collection. Three methods for estimating the importance of a document are proposed. Methods 1 and 2 are dependent on the score function of the retrieval system (e.g. Okapi BM25), while Method 3 is independent of the retrieval system. Experimental results compare the three proposed methods with Carmel et al.’s posting-oriented approach, using both the FT and LA Times collections and using both ordinary and difficult queries. Based on mean average precision and precision at 10, experimental results show that Method 3 generally performs best on the FT collection for pruned indexes down to 35% of the original size. However, for more severe pruning, Carmel et al.’s algorithm is better. For the LA Times collection, the performance of Method 3 and that of Carmel et al. are reversed. This variation in performance across collections has not been previously reported.
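To make the table view concrete, here is a small sketch of an inverted index and of document-oriented pruning as the removal of whole columns. The abstract does not describe the paper's three importance estimators, so the `importance` scores below are supplied by the caller as placeholder values, and the function names and the `keep_fraction` parameter are assumptions for illustration.

```python
def build_inverted_index(docs):
    """Map each term (row) to the set of documents (columns) containing it.

    docs: dict mapping doc id -> list of tokens.
    Each (term, doc id) membership corresponds to one posting.
    """
    index = {}
    for doc_id, tokens in docs.items():
        for term in tokens:
            index.setdefault(term, set()).add(doc_id)
    return index


def prune_documents(index, importance, keep_fraction):
    """Document-oriented pruning: drop the least important documents entirely.

    importance: dict mapping doc id -> score (placeholder values here; the
                paper's actual estimators are not reproduced).
    keep_fraction: share of documents to retain, between 0 and 1.
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    keep_count = max(1, int(len(ranked) * keep_fraction))
    kept = set(ranked[:keep_count])

    pruned = {}
    for term, postings in index.items():
        remaining = postings & kept
        if remaining:  # a term with no remaining postings leaves the lexicon
            pruned[term] = remaining
    return pruned


collection = {
    "d1": ["a", "b"],
    "d2": ["b", "c"],
    "d3": ["c"],
    "d4": ["d"],
}
importance = {"d1": 0.9, "d2": 0.5, "d3": 0.1, "d4": 0.2}
full_index = build_inverted_index(collection)
pruned_index = prune_documents(full_index, importance, keep_fraction=0.5)
```

Unlike posting-oriented pruning, which thins individual entries, this removes every posting of a discarded document at once, and terms that occurred only in discarded documents disappear as a side effect.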

