Results 1 - 10
of
14
Recovering Documentation-to-Source-Code Traceability Links using Latent Semantic Indexing
"... An information retrieval technique, latent semantic indexing, is used to automatically identi traceability links from system documentation to program source code. The results of two experiments to identi links in existing software systems (i.e., the LEDA library, and Albergate) are presented. These ..."
Abstract
-
Cited by 100 (10 self)
- Add to MetaCart
An information retrieval technique, latent semantic indexing, is used to automatically identi traceability links from system documentation to program source code. The results of two experiments to identi links in existing software systems (i.e., the LEDA library, and Albergate) are presented. These results are compared with other similar type experimental results of traceability link identification using different types of information retrieval techniques. The method presented proves to give good results by comparison and additionally it is a low cost, highly flexible method to apply with regards to preprocessing and/or parsing of the source code and documentation.
Supporting Program Comprehension Using Semantic and Structural Information
, 2001
"... The paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain specific issues (both problem and development domains) of a ..."
Abstract
-
Cited by 50 (13 self)
- Add to MetaCart
The paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain specific issues (both problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents. An advanced information retrieval method, latent semantic indexing, is used to define a semantic similarity measure between software components. Components within a software system are then clustered together using this similarity measure. Simple structural information (.e., file organization) of the software system is then used to assess the semantic cohesion of the clusters and files, with respect to each other. The measures are formally defined for general application. A set of experiments is presented which demonstrates how these measures can assist in the understanding of a nontrivial software system, namely a version of NCSA Mosaic.
Identification of High-Level Concept Clones in Source Code
, 2001
"... Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "reinventing the wheel". ..."
Abstract
-
Cited by 46 (9 self)
- Add to MetaCart
Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "reinventing the wheel".
Clustering Software Artifacts Based on Frequent Common Changes
- In Proc. IWPC
, 2005
"... Changes of software systems are less expensive and less error-prone if they affect only one subsystem. Thus, clusters of artifacts that are frequently changed together are subsystem candidates. We introduce a two-step method for identifying such clusters. First, a model of common changes of software ..."
Abstract
-
Cited by 39 (9 self)
- Add to MetaCart
Changes of software systems are less expensive and less error-prone if they affect only one subsystem. Thus, clusters of artifacts that are frequently changed together are subsystem candidates. We introduce a two-step method for identifying such clusters. First, a model of common changes of software artifacts, called co-change graph, is extracted from the version control repository of the software system. Second, a layout of the co-change graph is computed that reveals clusters of frequently co-changed artifacts. We derive requirements for such layouts, and introduce an energy model for producing layouts that fulfill these requirements. We evaluate the method by applying it to three example systems, and comparing the resulting layouts to authoritative decompositions.
Using Latent Semantic Analysis to Identify Similarities in Source Code to Support Program Understanding
, 2000
"... The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. Latent Semantic Analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and ..."
Abstract
-
Cited by 19 (3 self)
- Add to MetaCart
The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. Latent Semantic Analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflective in their usage. This methodology is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). Here LSA is used as the basis to cluster software components. This clustering is used to assist in the understanding of a nontrivial software system, namely a version of Mosaic. Applying Latent Semantic Analysis to the domain of source code and internal documentation for the support of program understanding is a new application of this method and a departure from the normal application domain of natural language. 1. Introduction The tasks of maintenance and reengineering of an existing software sy...
Using a hypertext model for traceability link conformance analysis
- In Proc. of the 2nd Int. Workshop on Traceability in Emerging Forms of Software Engineering
, 2003
"... A number of techniques for semi-automated traceability link recovery between source code and documentation have recently been proposed to support the reverse engineering and maintenance of legacy systems. This is only the first step in supporting the long term maintainability of such systems. A cruc ..."
Abstract
-
Cited by 16 (1 self)
- Add to MetaCart
A number of techniques for semi-automated traceability link recovery between source code and documentation have recently been proposed to support the reverse engineering and maintenance of legacy systems. This is only the first step in supporting the long term maintainability of such systems. A crucial issue, after recovering traceability links is analyzing their general conformance over time. That is, as a system is changed during evolution the validity and conformance of the links may change. Thus, conformance analysis must be performed to identify possible non-conformance of the links. The paper presents a holistic view of how to combine link recovery with conformance analysis that is facilitated by a formal hypertext model. This hypertext model not only supports complex linking structures (e.g., multi links) but also supports versioning of individual links. Such a model preserves and maintains over time the results of the reverse engineered traceability links. 1
Classifying Software Changes: Clean or Buggy?
, 2008
"... This paper introduces a new technique for predicting latent software bugs, called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes or clean changes. In this manner, change classification ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
This paper introduces a new technique for predicting latent software bugs, called change classification. Change classification uses a machine learning classifier to determine whether a new software change is more similar to prior buggy changes or clean changes. In this manner, change classification predicts the existence of bugs in software changes. The classifier is trained using features (in the machine learning sense) extracted from the revision history of a software project stored in its software configuration management repository. The trained classifier can classify changes as buggy or clean, with a 78 percent accuracy and a 60 percent buggy change recall on average. Change classification has several desirable qualities: 1) The prediction granularity is small (a change to a single file), 2) predictions do not require semantic information about the source code, 3) the technique works for a broad array of project types and programming languages, and 4) predictions can be made immediately upon the completion of a change. Contributions of this paper include a description of the change classification approach, techniques for extracting features from the source code and change histories, a characterization of the performance of change classification across 12 open source projects, and an evaluation of the predictive power of different groups of features.
Recovery of traceability links between software documentation and source code
- International Journal of Software Engineering and Knowledge Engineering
"... An approach for the semi-automated recovery of traceability links between software documentation and source code is presented. The methodology is based on the application of information retrieval techniques to extract and analyze the semantic information from the source code and associated documenta ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
An approach for the semi-automated recovery of traceability links between software documentation and source code is presented. The methodology is based on the application of information retrieval techniques to extract and analyze the semantic information from the source code and associated documentation. A semi-automatic process is defined based on the proposed methodology. The paper advocates the use of latent semantic indexing (LSI) as the supporting information retrieval technique. Two case studies using existing software are presented comparing this approach with others. The case studies show positive results for the proposed approach, especially considering the flexibility of the methods used.
Mining Co-Change Clusters from Version Repositories
, 2005
"... Clusters of software artifacts that are frequently changed together are subsystem candidates, because one of the main goals of software design is to make changes local. The contribution of this paper is a visualization-based method that supports the identification of such clusters. First, we define ..."
Abstract
-
Cited by 8 (4 self)
- Add to MetaCart
Clusters of software artifacts that are frequently changed together are subsystem candidates, because one of the main goals of software design is to make changes local. The contribution of this paper is a visualization-based method that supports the identification of such clusters. First, we define the co-change graph as a simple but powerful model of common changes of software artifacts, and describe how to extract the graph from version control repositories. Second, we introduce an energy model for computing force-directed layouts of co-change graphs. The resulting layouts have a well-defined interpretation in terms of the structure of the visualized graph, and clearly reveal groups of frequently co-changed artifacts. We evaluate our method by comparing the layouts for three example projects with authoritative subsystem decompositions.
Support For Software Maintenance Using Latent Semantic Analysis
, 2000
"... The paper describes the results of applying semantic (versus structural) methods to the problems of software maintenance and program comprehension. Here, the focus is on tools to assist programmer to understand large legacy software systems. The method applied, Latent Semantic Analysis, is a corpus- ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
The paper describes the results of applying semantic (versus structural) methods to the problems of software maintenance and program comprehension. Here, the focus is on tools to assist programmer to understand large legacy software systems. The method applied, Latent Semantic Analysis, is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflective in their usage. This methodology is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). The intent of applying Latent Semantic Analysis to software components is to automatically induce a specific semantic meaning of a given component. Here, LSA is used as the basis to group software components, across files, to assist in program comprehension. This clustering is used in the understanding of a nontrivial software system, namely a version of Mosaic. KEYWORDS Software Maintenance, Program Unders...

