Supporting Program Comprehension Using Semantic and Structural Information
, 2001
Cited by 50 (13 self)
The paper focuses on investigating the combined use of semantic and structural information of programs to support the comprehension tasks involved in the maintenance and reengineering of software systems. Here, semantic refers to the domain specific issues (both problem and development domains) of a software system. The other dimension, structural, refers to issues such as the actual syntactic structure of the program along with the control and data flow that it represents. An advanced information retrieval method, latent semantic indexing, is used to define a semantic similarity measure between software components. Components within a software system are then clustered together using this similarity measure. Simple structural information (i.e., file organization) of the software system is then used to assess the semantic cohesion of the clusters and files, with respect to each other. The measures are formally defined for general application. A set of experiments is presented which demonstrates how these measures can assist in the understanding of a nontrivial software system, namely a version of NCSA Mosaic.
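The similarity-then-clustering pipeline described in this abstract can be illustrated with a minimal sketch. Note the assumptions: plain term-count cosine similarity is swapped in for latent semantic indexing (which would first project the vectors into a reduced space), the greedy clustering rule and the threshold are illustrative choices, and the component names and texts are hypothetical, not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(source_text):
    """Count identifier-like tokens in a component's source text."""
    return Counter(re.findall(r"[A-Za-z_]+", source_text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors, threshold):
    """Greedy single pass: a component joins the first cluster whose
    seed (first member) is at least `threshold` similar to it."""
    clusters = []
    for name, vec in vectors.items():
        for members in clusters:
            if cosine_similarity(vec, vectors[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Hypothetical components (name -> flattened identifiers and comments).
components = {
    "url_parse": "parse url host port path url",
    "url_fetch": "fetch url host socket url",
    "img_decode": "decode image pixel color image",
}
vectors = {name: term_vector(text) for name, text in components.items()}
clusters = cluster(vectors, threshold=0.5)
```

Comparing each resulting cluster against the files its members come from is one simple way to approximate the cohesion assessment the abstract mentions.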
Identification of High-Level Concept Clones in Source Code
, 2001
Cited by 46 (9 self)
Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste or they may be "reinventing the wheel".
Finding Components in a Hierarchy of Modules: a Step towards Architectural Understanding
, 1997
Cited by 40 (5 self)
This paper presents a method to view a system as a hierarchy of modules according to information hiding concepts and to identify architectural component candidates in this hierarchy. The result of the method eases the understanding of a system's underlying software architecture. A prototype tool implementing this method was applied to three systems written in C (each over 30 Kloc). For one of these systems, an author of the system created an architectural description. The components generated by our method correspond to those of this architectural description in almost all cases. For the other two systems, most of the components resulting from the method correspond to meaningful system abstractions.

1. Introduction. It is well known that programmer efforts are mostly devoted to maintaining systems [Corb89, Somm92]. A large portion of that maintenance effort is spent in understanding the program and data [Yau80]. Within this context, helping maintainers to understand the legacy systems...
An Intermediate Representation for Integrating Reverse Engineering Analyses
, 1998
Cited by 14 (2 self)
Intermediate representations (IRs) are a key issue both for compilers and for reverse engineering tools, since they enable efficient analyses. Research in the field of compilers has proposed many sophisticated IRs that can be used in the domain of reverse engineering, especially in the case of deep analyses, but reverse engineering also has its own requirements for intermediate representations that are not covered by traditional compiler technology. This paper discusses the requirements of IRs for reverse engineering. It then shows how most of these requirements can be met by extending and integrating existing IRs. These extensions include a generalized AST and a mechanism supporting multiple views on programs. Moreover, the paper shows how these views can be implemented efficiently.
Software architecture recovery for distributed systems
, 1999
Cited by 6 (0 self)
The design and evaluation of appropriate software architectures is key to the effective development, management, evolution, and reuse of software systems. However, current software engineering practice has generally led to architectural designs that are informal, ad hoc, and difficult to analyse and maintain. One consequence is that most existing systems have little or no documented architectural information, and the information that does exist is often an inaccurate representation of the implemented architecture. All too often, architectural information about an unfamiliar system needs to be extracted directly from the implemented software artifacts. This is a very demanding process commonly referred to as architecture recovery. Although architecture recovery can be significantly facilitated with the help of current reverse engineering techniques and tools, many issues remain to be properly addressed, particularly regarding recovery of the runtime abstractions (client, servers, communication protocols, etc.) that are typical to the design of distributed systems. This dissertation presents a static reverse engineering approach, called X-ray, to
An approach for recovering distributed system architectures
- Automated Software Engineering
, 2001
Cited by 5 (0 self)
Reasoning about software systems at the architectural level is key to effective software development, management, evolution and reuse. All too often, though, the lack of appropriate documentation leads to a situation where architectural design information has to be recovered directly from implemented software artifacts. This is a very demanding process, particularly when involving recovery of runtime abstractions (clients, servers, interaction protocols, etc.) that are typical to the design of distributed software systems. This paper presents an exploratory reverse engineering approach, called X-ray, to aid programmers in recovering architectural runtime information from a distributed system’s existing software artifacts. X-ray comprises three domain-based static analysis techniques, namely component module classification, syntactic pattern matching, and structural reachability analysis. These complementary techniques can facilitate the task of identifying a distributed system’s implemented executable components and their potential runtime interconnections. The component module classification technique automatically distinguishes source code modules according to the executable components they implement. The syntactic pattern matching technique in turn helps to recognise specific code fragments that may implement typical component interaction features. Finally, the structural reachability analysis technique aids in the association of those features to the code specific for each executable component. The paper describes and illustrates the main concepts underlying each technique, reports on their implementation as a suite of new and off-the-shelf tools, and,
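As a rough illustration of how pattern matching and reachability analysis could be combined in the way this abstract outlines, the sketch below tags functions whose bodies match an interaction pattern and then attributes those tags to each executable whose entry point can reach them. It is not the X-ray implementation: the regular expressions, the call-graph representation, and all function and component names are hypothetical assumptions.

```python
import re
from collections import deque

# Hypothetical patterns for code fragments that suggest an interaction role.
INTERACTION_PATTERNS = {
    "client": re.compile(r"\bconnect\s*\("),
    "server": re.compile(r"\b(?:listen|accept)\s*\("),
}


def match_features(function_bodies):
    """Map each function to the roles whose pattern occurs in its body."""
    return {
        name: {role for role, pat in INTERACTION_PATTERNS.items() if pat.search(body)}
        for name, body in function_bodies.items()
    }


def reachable(call_graph, entry):
    """Breadth-first search over the call graph starting at `entry`."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        for callee in call_graph.get(queue.popleft(), ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def features_per_component(call_graph, function_bodies, entry_points):
    """Attribute matched roles to each component via reachability."""
    features = match_features(function_bodies)
    result = {}
    for component, entry in entry_points.items():
        roles = set()
        for fn in reachable(call_graph, entry):
            roles |= features.get(fn, set())
        result[component] = roles
    return result


# Hypothetical two-executable system.
call_graph = {"main_a": ["open_link"], "main_b": ["serve"], "serve": ["log"]}
bodies = {
    "open_link": "fd = socket(); connect(fd, addr);",
    "serve": "listen(fd); accept(fd);",
    "log": "printf(msg);",
}
entry_points = {"a.exe": "main_a", "b.exe": "main_b"}
roles = features_per_component(call_graph, bodies, entry_points)
```

A component tagged "client" and another tagged "server" would then be candidates for a potential runtime interconnection, which a maintainer would still need to confirm by inspection.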
Applying File-based Information Flow and ASE Identification to a Legacy System
, 1998
In this paper, we apply two complementary approaches to help understand such a system: looking at information flow through shared files from the perspective of batch files and identifying atomic components. The case study analyzed in this paper, WELTAB, is a system responsible for handling election data in the USA. It was originally written in an extended version of Fortran on a mainframe and has been migrated a few times until it was ported to C on the DOS platform. This system was provided by WorldPath Information Service as part of the reverse engineering demo project, whose aim is to compare and to evaluate different reverse engineering technologies on a common benchmark application.
Applying File-based Information Flow and ASE Identification to a Legacy System
Introduction. Legacy systems composed of several programs which interact through multiple files can often be complex. They become even more difficult to understand when some programs are used many times across the system, when the "copy and modify" style of programming has introduced many clones, and when multiple migrations have accumulated the bad features of each language and environment. In this paper, we apply two complementary approaches to help understand such a system: looking at information flow through shared files from the perspective of batch files and identifying atomic components. The case study analyzed in this paper, WELTAB, is a system responsible for handling election data in the USA. It was originally written in an extended version of Fortran on a mainframe and has been migrated a few times until it was ported to C on the DOS platform. This system was provided by WorldPath Information Service as part of the reverse engineering demo project, whose aim is to compare and
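The idea of tracing information flow through shared files from the perspective of a batch file can be sketched as follows. This is only an assumed reading of the approach: each batch step is modelled as a program with the files it reads and writes, and a flow is recorded whenever a program reads a file last written by an earlier step. The program and file names are hypothetical and do not come from WELTAB.

```python
def file_flows(steps):
    """Derive (producer, file, consumer) flows from an ordered batch script.

    `steps` is a list of (program, files_read, files_written) tuples in
    the order the batch file runs them.
    """
    last_writer = {}  # file name -> program that most recently wrote it
    flows = []
    for program, reads, writes in steps:
        for name in reads:
            if name in last_writer:
                flows.append((last_writer[name], name, program))
        for name in writes:
            last_writer[name] = program
    return flows


# Hypothetical batch script with three programs sharing two files.
steps = [
    ("load", [], ["votes.dat"]),
    ("tally", ["votes.dat"], ["totals.dat"]),
    ("report", ["totals.dat", "votes.dat"], []),
]
flows = file_flows(steps)
```

Files that are read but never written by an earlier step show up as missing flows, which can point to external inputs of the batch job.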

