Recovering Documentation-to-Source-Code Traceability Links using Latent Semantic Indexing
Cited by 100 (10 self)
An information retrieval technique, latent semantic indexing, is used to automatically identify traceability links from system documentation to program source code. The results of two experiments to identify links in existing software systems (i.e., the LEDA library, and Albergate) are presented. These results are compared with other similar type experimental results of traceability link identification using different types of information retrieval techniques. The method presented proves to give good results by comparison and additionally it is a low cost, highly flexible method to apply with regards to preprocessing and/or parsing of the source code and documentation.
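The general idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that documentation sections and source files are columns of one term-by-document count matrix, that a rank-k space is obtained from a truncated SVD (approximated here by power iteration so the example needs only the standard library), and that candidate links are ranked by cosine similarity. The artifact names, vocabulary, and counts are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(rows, v):
    return [dot(row, v) for row in rows]

def cosine(x, y):
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

def lsi_document_vectors(term_doc, k, iters=100):
    """Map each column (document) of term_doc to k LSI coordinates.

    Power iteration with deflation on A^T A approximates the top-k right
    singular vectors v_i and singular values s_i. Document j is mapped to
    (s_1 * v_1[j], ..., s_k * v_k[j]). Assumes rank(A) >= k.
    """
    a = term_doc
    a_t = [list(col) for col in zip(*a)]
    n_docs = len(a_t)
    found = []  # (singular value, right singular vector) pairs
    for _ in range(k):
        v = [1.0] * n_docs
        for _ in range(iters):
            # Remove components along singular vectors already found.
            for _, u in found:
                c = dot(v, u)
                v = [x - c * y for x, y in zip(v, u)]
            w = matvec(a_t, matvec(a, v))
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        av = matvec(a, v)
        found.append((math.sqrt(dot(av, av)), v))
    return [[s * v[j] for s, v in found] for j in range(n_docs)]

def rank_links(doc_name, source_names, vectors):
    """Rank source files by cosine similarity to one documentation section."""
    q = vectors[doc_name]
    return sorted(((cosine(q, vectors[s]), s) for s in source_names),
                  reverse=True)

# Hypothetical corpus: rows are terms, columns are artifacts.
ARTIFACTS = ["manual_graphs.txt", "graph.c", "manual_booking.txt", "room.c"]
TERM_DOC = [
    [1, 0, 0, 0],  # graph
    [2, 2, 0, 0],  # node
    [0, 1, 0, 0],  # edge
    [0, 0, 0, 1],  # room
    [0, 0, 1, 1],  # booking
    [0, 0, 1, 0],  # hotel
]

VECTORS = dict(zip(ARTIFACTS, lsi_document_vectors(TERM_DOC, k=2)))

for score, name in rank_links("manual_graphs.txt", ["graph.c", "room.c"], VECTORS):
    print(f"{name}: {score:.3f}")
```

In practice a threshold on the similarity score, or a cut-off on the number of top-ranked candidates, would decide which pairs are reported as links. Both choices are assumptions here, not details taken from the abstract.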
Supporting Program Comprehension Using Semantic and Structural Information
2001
Cited by 50 (13 self)
The paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain specific issues (both problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents. An advanced information retrieval method, latent semantic indexing, is used to define a semantic similarity measure between software components. Components within a software system are then clustered together using this similarity measure. Simple structural information (i.e., file organization) of the software system is then used to assess the semantic cohesion of the clusters and files, with respect to each other. The measures are formally defined for general application. A set of experiments is presented which demonstrates how these measures can assist in the understanding of a nontrivial software system, namely a version of NCSA Mosaic.
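As a rough illustration of assessing the semantic cohesion of a file, the sketch below scores a file by the average pairwise cosine similarity of its components. This is a hypothetical stand-in, not the formal measures defined in the paper. The component vectors are assumed to be coordinates already produced by latent semantic indexing, and the file names and numbers are invented for the example.

```python
import math
from itertools import combinations

def cosine(x, y):
    """Cosine similarity between two component vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def cohesion(component_vectors):
    """Average pairwise similarity of the components grouped in one file.

    A file with a single component is given cohesion 1.0, which is a
    convention chosen for this sketch only.
    """
    pairs = list(combinations(component_vectors, 2))
    if not pairs:
        return 1.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Hypothetical files, each holding the LSI vectors of its functions.
FILES = {
    "net.c": [(1.0, 0.0), (0.9, 0.1)],   # functions about one topic
    "util.c": [(1.0, 0.0), (0.0, 1.0)],  # unrelated helper functions
}

for name, vectors in FILES.items():
    print(f"{name}: cohesion {cohesion(vectors):.3f}")
```

The same function can be applied to the members of a cluster instead of a file, which allows the two groupings to be compared with respect to each other.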
Identification of High-Level Concept Clones in Source Code
2001
Cited by 46 (9 self)
Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "reinventing the wheel".
Mudablue: An automatic categorization system for open source repositories
In Proceedings of the 11th Asia-Pacific Software Engineering Conference (APSEC'04), 2004
Cited by 13 (0 self)
Open Source communities typically use a software repository to archive various software projects with their source code, mailing list discussions, documentation, bug reports, and so forth. For example, SourceForge currently hosts over seventy thousand Open Source software systems. Because of the size of the rich information content, such repositories offer numerous opportunities for sharing information among projects. For example, one would like to know a set of projects that are related or similar to each other, so that the project groups can collaborate and share their work. With thousands of projects in typical repositories, however, manually locating related projects can be difficult. Hence, we propose MUDABlue, a tool that automatically categorizes software systems. MUDABlue has three major aspects: 1) it relies on no other information than the source code, 2) it determines category sets automatically, and 3) it allows a software system to be a member of multiple categories. MUDABlue has a web interface to visualize determined categories, which eases browsing a software repository. We show the effectiveness of MUDABlue’s categorization capability by comparing its generated categories with those of some other existing research tools.
Leveraged quality assessment using information retrieval techniques
In 14th International Conference on Program Comprehension, 2006
Cited by 10 (2 self)
The goal of this research is to apply language processing techniques to extend human judgment into situations where obtaining direct human judgment is impractical due to the volume of information that must be considered. One aspect of this is leveraged quality assessments, which can be used to evaluate third-party coded subsystems, to track quality across the versions of a program, to assess the comprehension effort (and subsequent cost) required to make a change, and to identify parts of a program in need of preventative maintenance. A description of the QALP tool, its output from just under two million lines of code, and an experiment aimed at evaluating the tool’s use in leveraged quality assessment are presented. Statistically significant results from this experiment validate the use of the QALP tool in human leveraged quality assessment.
Interactive exploration of semantic clusters
In Proceedings of VISSOFT 2005 (3rd IEEE International Workshop on Visualizing Software for Understanding and Analysis), 2005
Cited by 4 (3 self)
Using visualization and exploration tools can be of great use for the understanding of a software system when only its source code is available. However, understanding a large software system by visualizing only its lower level artifacts (e.g., classes, methods) and the relations between them does not scale for industrial-size systems. To address the scalability issue, higher level hierarchical abstractions (e.g., package structure, clustered decompositions of the system) should be used together with relations between them that are usually aggregated from the lower level relations. In this paper, we present the concepts behind Softwarenaut, a tool aimed at exploring any kind of hierarchical decompositions of a system, and then we look at a specific exploration of a system. In the experiment, the hierarchical decomposition of the system is the result of applying a semantical clustering to group classes that use similar terms.
Latent Problem Solving Analysis as an explanation of expertise effects in a complex, dynamic task
Latent Problem Solving Analysis (LPSA) is a theory of knowledge representation in complex problem solving that argues that problem spaces can be represented as multidimensional spaces and expertise is the construction of those spaces from immense amounts of experience. The model was applied using a dataset from a longitudinal experiment on control of thermodynamic systems. When the system is trained with expert-level amounts of experience (3 years), it can predict the end of a trial using the first three quarters with an accuracy of .9. If the system is prepared to mimic a novice (6 months) the prediction accuracy falls to .2. If the system is trained with 3 years of practice in an environment with no constraints, performance is similar to the novice baseline.
Recovering management information from source code
IT has become a production means for many organizations and an important element of business strategy. Even though its effective management is a must, reality shows that this area still remains in its infancy. IT management relies profoundly on relevant information which enables risk mitigation or cost control. However, the needed information is either missing or its gathering boils down to daunting tasks. We propose an approach to recovery of management information from the essence of IT: the software’s source code. In this paper we show how to employ source code analysis techniques and recover management information. In our approach we exploit the potential of the concealed data which resides in the source code statements, source comments, and also compiler listings. We show how to depart from the raw sources, extract data, organize it, and eventually utilize it so that the bit level data provides IT executives with support at the portfolio level. Our approach is pragmatic as we rely on real management questions, best practices in software engineering, and also IT market specifics. We enable, for instance, an assessment of the IT-portfolio market value, support for carrying out what-if scenarios, or identification and evaluation of the hidden risks for IT-portfolio maintainability. Our approach was deployed in an industrial setting. The study is based on a real-life IT-portfolio which supports business functions of an organization operating in the financial sector. The IT-portfolio comprises Cobol applications run on a mainframe with the total number of lines of code amounting to over 18 million. The approach we propose is suited for facilitation within a large organization. It provides for a fact-based support for strategic decision making at the portfolio level.
Keywords: IT-portfolio management; management information; source code analysis; lexical analysis; Latent Semantic Indexing; source code comments; compilers; obsolete language constructs; volatility; vendor locks; legacy systems; operational risk; technology risk; risk mitigation;

