Results 1 - 10
of
24
Recovering Documentation-to-Source-Code Traceability Links using Latent Semantic Indexing
"... An information retrieval technique, latent semantic indexing, is used to automatically identi traceability links from system documentation to program source code. The results of two experiments to identi links in existing software systems (i.e., the LEDA library, and Albergate) are presented. These ..."
Abstract
-
Cited by 100 (10 self)
- Add to MetaCart
An information retrieval technique, latent semantic indexing, is used to automatically identi traceability links from system documentation to program source code. The results of two experiments to identi links in existing software systems (i.e., the LEDA library, and Albergate) are presented. These results are compared with other similar type experimental results of traceability link identification using different types of information retrieval techniques. The method presented proves to give good results by comparison and additionally it is a low cost, highly flexible method to apply with regards to preprocessing and/or parsing of the source code and documentation.
Static Index Pruning for Information Retrieval Systems
- In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
, 2001
"... We introduce static index pruning methods that significantly reduce the index size in information retrieval systems. We investigate uniform and term-based methods that each remove selected entries from the index and yet have only a minor effect on retrieval results. In uniform pruning, there is a fi ..."
Abstract
-
Cited by 64 (3 self)
- Add to MetaCart
We introduce static index pruning methods that significantly reduce the index size in information retrieval systems. We investigate uniform and term-based methods that each remove selected entries from the index and yet have only a minor effect on retrieval results. In uniform pruning, there is a fixed cutoff threshold, and all index entries whose contribution to relevance scores is bounded above by a given threshold are removed from the index. In term-based pruning, the cutoff threshold is determined for each term, and thus may vary from term to term. We give experimental evidence that for each level of compression, term-based pruning outperforms uniform pruning, under various measures of precision. We present theoretical and experimental evidence that under our term-based pruning scheme, it is possible to prune the index greatly and still get retrieval results that are almost as good as those based on the full index. Topic areas: indexing, compression 1.
An Information Retrieval Approach to Concept Location in Source Code
- In Proceedings of the 11th Working Conference on Reverse Engineering (WCRE 2004
, 2004
"... Concept location identifies parts of a software system that implement a specific concept that originates from the problem or the solution domain. Concept location is a very common software engineering activity that directly supports software maintenance and evolution tasks such as incremental change ..."
Abstract
-
Cited by 57 (9 self)
- Add to MetaCart
Concept location identifies parts of a software system that implement a specific concept that originates from the problem or the solution domain. Concept location is a very common software engineering activity that directly supports software maintenance and evolution tasks such as incremental change and reverse engineering. This paper addresses the problem of concept location using an advanced information retrieval method, Latent Semantic Indexing (LSI). LSI is used to map concepts expressed in natural language by the programmer to the relevant parts of the source code. Results of a case study on NCSA Mosaic are presented and compared with previously published results of other static methods for concept location. 1
Supporting Program Comprehension Using Semantic and Structural Information
, 2001
"... The paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain specific issues (both problem and development domains) of a ..."
Abstract
-
Cited by 50 (13 self)
- Add to MetaCart
The paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain specific issues (both problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents. An advanced information retrieval method, latent semantic indexing, is used to define a semantic similarity measure between software components. Components within a software system are then clustered together using this similarity measure. Simple structural information (.e., file organization) of the software system is then used to assess the semantic cohesion of the clusters and files, with respect to each other. The measures are formally defined for general application. A set of experiments is presented which demonstrates how these measures can assist in the understanding of a nontrivial software system, namely a version of NCSA Mosaic.
Identification of High-Level Concept Clones in Source Code
, 2001
"... Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "reinventing the wheel". ..."
Abstract
-
Cited by 46 (9 self)
- Add to MetaCart
Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "reinventing the wheel".
NORA/HAMMR: Making Deduction-Based Software Component Retrieval Practical
, 1997
"... Deduction-based software component retrieval uses preand postconditions as indexes and search keys and an automated theorem prover (ATP) to check whether a component matches. This idea is very simple but the vast number of arising proof tasks makes a practical implementation very hard. We thus pass ..."
Abstract
-
Cited by 36 (4 self)
- Add to MetaCart
Deduction-based software component retrieval uses preand postconditions as indexes and search keys and an automated theorem prover (ATP) to check whether a component matches. This idea is very simple but the vast number of arising proof tasks makes a practical implementation very hard. We thus pass the components through a chain of filters of increasing deductive power. In this chain, rejection filters based on signature matching and model checking techniques are used to rule out non-matches as early as possible and to prevent the subsequent ATP from "drowning." Hence, intermediate results of reasonable precision are available at (almost) any time of the retrieval process. The final ATP step then works as a confirmation filter to lift the precision of the answer set. We implemented a chain which runs fully automatically and uses MACE for model checking and the automated prover SETHEO as confirmation filter. We evaluated the system over a medium-sized collection of components. The resul...
An Evolutionary Approach to Constructing Effective Software Reuse Repositories
- ACM Transactions on Software Engineering and Methodology
, 1997
"... This article outlines an approach that avoids these problems by choosing a retrieval method that utilizes minimal repository structure to effectively support the process of finding software components. The approach is demonstrated through a pair of proof-ofconcept prototypes: PEEL, a tool to semiaut ..."
Abstract
-
Cited by 32 (3 self)
- Add to MetaCart
This article outlines an approach that avoids these problems by choosing a retrieval method that utilizes minimal repository structure to effectively support the process of finding software components. The approach is demonstrated through a pair of proof-ofconcept prototypes: PEEL, a tool to semiautomatically identify reusable components, and CodeFinder, a retrieval system that compensates for the lack of explicit knowledge structures through a spreading activation retrieval process. CodeFinder also allows component representations to be modified while users are searching for information. This mechanism adapts to the changing nature of the information in the repository and incrementally improves the repository while people use it. The combination of these techniques holds potential for designing software repositories that minimize up-front costs, effectively support the search process, and evolve with an organization's changing needs.
A Connectionist View on Document Classification
- In Proc Australasian Database Conf
, 1995
"... Properly structured software libraries are crucial for the success of software reuse. Specifically, the structure of the software library ought to reflect the functional similarity of the stored software components in order to facilitate the retrieval process. We propose the application of artificia ..."
Abstract
-
Cited by 26 (22 self)
- Add to MetaCart
Properly structured software libraries are crucial for the success of software reuse. Specifically, the structure of the software library ought to reflect the functional similarity of the stored software components in order to facilitate the retrieval process. We propose the application of artificial neural network technology to achieve such a structured library. In more detail, we rely on full-text indexing of the software manual in order to obtain the software representation. This software representation is further used as the input data during the training process of an artificial neural network adhering to the unsupervised learning paradigm. The distinctive feature of this very model is to make the semantic relationship between the stored software components geographically explicit. Thus, the actual user of the software library gets a notion of the semantic relationship between the components in terms of their geographical closeness. 1 Introduction Software reuse is concerned with...
Principled Disambiguation: Discriminating Adjective Senses with . . .
- COMPUTATIONAL LINGUISTICS
, 1995
"... ... In this paper we argue for a linguistically principled approach to disambiguation, in which relevant contextual clues are narrowly defined, in syntactic and semantic terms, and in which only highly reliable clues are exploited. Statistical methods play a definite role in this work, helping to or ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
... In this paper we argue for a linguistically principled approach to disambiguation, in which relevant contextual clues are narrowly defined, in syntactic and semantic terms, and in which only highly reliable clues are exploited. Statistical methods play a definite role in this work, helping to organize and analyze data, but the disambiguation method itself does not employ statistical data or decision criteria. This approach results in improved understanding of the disambiguation problem both in general and on a word-specific basis and leads to broadly applicable and nearly errorless clues to word sense. The approach is illustrated by an experiment discriminating among the senses of adjectives, which have been relatively neglected in work on sense disambiguation. In particular, the paper assesses the potential of nouns for discriminating among the senses of adjectives that modify them. This assessment is based on an empirical study of five of the most frequent ambiguous adjectives in English: hard, light, old, right, and short. About three-quarters of all instances of these adjectives can be disambiguated almost errorlessly by the nouns they modify or by the syntactic constructions in which they occur. Such disambiguation requires only simple rules, which can be automated easily. Furthermore, a small number of semantic attributes supply a compact means of representing the noun clues in a very few rules. Clues other than nouns are required when modified nouns are not useable. The sense of an ambiguous modified noun may be needed to determine the relevant semantic attribute for disambiguation of a target adjective; and other adjectives, verbs, and grammatical constructions all show evidence of high reliability, and sometimes of high applicability, when they stand in specific, ...
Ephemeral Document Clustering for Web Applications
, 2000
"... We revisit document clustering in the context of the Web. Specifically, we investigate on-line ephemeral clustering, whereby the input document set is generated dynamically, typically by search results, and the output clustering hierarchy has a short life span, and is used for interactive browsing ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
We revisit document clustering in the context of the Web. Specifically, we investigate on-line ephemeral clustering, whereby the input document set is generated dynamically, typically by search results, and the output clustering hierarchy has a short life span, and is used for interactive browsing purposes. Ephemeral clustering for interactive use introduces several new challenges. It requires an efficient algorithm, since clustering is performed on-line. It also requires high precision, because users who are not domain experts are less tolerant to errors, and because the resulting hierarchy is fully automatically generated, as opposed to off-line clustering in which the hierarchy is often manually modified. Finally, interactive clustering requires a presentation layer that enables users to effectively browse the hierarchy, including visualization techniques and automatic annotations of the hierarchy. We present new concepts, techniques and algorithms that tailor clustering to...

