Pattern matching for clone and concept detection
Journal of Automated Software Engineering, 1996
Cited by 66 (14 self)
A legacy system is an operational, large-scale software system that is maintained beyond its first generation of programmers. It typically represents a massive economic investment and is critical to the mission of the organization it serves. As such systems age, they become increasingly complex and brittle, and hence harder to maintain. They also become even more critical to the survival of their organization because the business rules encoded within the system are seldom documented elsewhere. Our research is concerned with developing a suite of tools to aid the maintainers of legacy systems in recovering the knowledge embodied within the system. The activities, known collectively as “program understanding”, are essential preludes for several key processes, including maintenance and design recovery for reengineering. In this paper we present three pattern-matching techniques: source code metrics, a dynamic programming algorithm for finding the best alignment between two code fragments, and a statistical matching algorithm between abstract code descriptions represented in an abstract language and actual source code. The methods are applied to detect instances of code cloning in several moderately-sized production systems including tcsh, bash, and CLIPS. The programmer’s skill and experience are essential elements of our approach. Selection of particular tools and analysis methods depends on the needs of the particular task to be accomplished. Integration of the tools provides opportunities for synergy, allowing the programmer to select the most appropriate tool for a given task.
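The second technique named in the abstract above, dynamic-programming alignment of two code fragments, can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each fragment is already a list of normalized statement strings, and the function name `align_fragments` and the unit costs are hypothetical choices made for illustration.

```python
def align_fragments(a, b, gap_cost=1, mismatch_cost=1):
    """Align two statement lists; return (distance, aligned pairs).

    Illustrative assumption: statements are compared by string equality.
    A pair (x, None) or (None, y) marks a statement with no counterpart.
    """
    n, m = len(a), len(b)

    def sub_cost(x, y):
        return 0 if x == y else mismatch_cost

    # dist[i][j] = cost of the cheapest alignment of a[:i] with b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + gap_cost
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitute
                dist[i - 1][j] + gap_cost,  # statement only in a
                dist[i][j - 1] + gap_cost,  # statement only in b
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]):
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dist[i][j] == dist[i - 1][j] + gap_cost):
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return dist[n][m], pairs


original = ["x = 0", "while x < n:", "x += 1"]
clone = ["x = 0", "y = 0", "while x < n:", "x += 1"]
distance, alignment = align_fragments(original, clone)
```

A small distance relative to fragment length suggests a clone candidate; here the only difference is the extra statement `y = 0`, which is aligned against a gap.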
Program understanding as constraint satisfaction
Journal of Automated Software Engineering, 1995
Cited by 14 (4 self)
The process of understanding source code in a high-level programming language involves complex computation. Given a piece of legacy code and a library of program plan templates, understanding the code corresponds to building mappings from parts of the source code to particular program plans. These mappings could be used to assist an expert in reverse engineering legacy code, to facilitate software reuse, or to assist in the translation of the source into another programming language. In this paper we present a model of program understanding using constraint satisfaction. Within this model we intelligently compose a partial global picture of the source program code by transforming knowledge about the problem domain and the program itself into sets of constraints. We then systematically study different search algorithms and empirically evaluate their performance. One advantage of the constraint satisfaction model is its generality; many previous attempts in program understanding could now be cast under the same spectrum of heuristics, and thus be readily compared. Another advantage is the improvement in search efficiency using various heuristic techniques in constraint satisfaction. 1. Foreword. Three years have passed since the inception of the idea of applying constraint-based representation and techniques (CSP) to program understanding and design pattern recovery. The ...
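The mapping from source statements to plan components described in this abstract can be sketched as a small constraint satisfaction problem solved by chronological backtracking. Everything below is an illustrative assumption rather than the paper's model: the `Stmt` record, the "counted loop" plan with components `init`, `test`, and `update`, and the specific constraints are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    line: int
    kind: str  # e.g. "assign", "loop", "increment" (hypothetical categories)
    var: str   # the variable the statement defines or tests


def binary(x, y, pred):
    """Constraint between components x and y; vacuously true until both are bound."""
    def check(assignment):
        if x in assignment and y in assignment:
            return pred(assignment[x], assignment[y])
        return True
    return check


def solve(components, domains, constraints, assignment=None):
    """Chronological backtracking: bind components in order, undo on conflict."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(components):
        return dict(assignment)
    comp = components[len(assignment)]
    for stmt in domains[comp]:
        assignment[comp] = stmt
        if all(check(assignment) for check in constraints):
            result = solve(components, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[comp]
    return None


source = [
    Stmt(1, "assign", "total"),
    Stmt(2, "assign", "i"),
    Stmt(3, "loop", "i"),
    Stmt(4, "increment", "total"),
    Stmt(5, "increment", "i"),
]

# Hypothetical "counted loop" plan: each component accepts one statement kind.
components = ["init", "test", "update"]
wanted_kind = {"init": "assign", "test": "loop", "update": "increment"}
domains = {c: [s for s in source if s.kind == wanted_kind[c]] for c in components}

constraints = [
    binary("init", "test", lambda a, b: a.line < b.line),    # init precedes the loop
    binary("test", "update", lambda a, b: a.line < b.line),  # update follows the test
    binary("init", "test", lambda a, b: a.var == b.var),     # same loop variable
    binary("test", "update", lambda a, b: a.var == b.var),
]

match = solve(components, domains, constraints)
matched_lines = {c: s.line for c, s in match.items()}
```

The search first tries the assignment to `total` as the initializer, rejects it when the loop tests `i`, and backtracks. Heuristics such as variable ordering or constraint propagation, which the abstract evaluates empirically, would replace the fixed component order used here.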
Clarity guided belief revision for domain knowledge recovery in legacy systems
In: Proceedings of the 12th International Conference on Software Engineering and Knowledge Engineering, 2000
Cited by 3 (1 self)
Program understanding is the process of acquiring knowledge from a computer program. Although research work utilising knowledge engineering techniques has been undertaken in this field, it is our observation that a thorough application of AI methodology has not been sufficiently explored. In this paper, we present a clarity guided belief revision approach to domain knowledge recovery in legacy software systems. Novel solutions are given to three key AI issues in the context of domain knowledge recovery from source code: knowledge representation, where the concrete semantic network is separated from the abstract semantic network to better accommodate uncertainty reasoning and propagation; uncertainty reasoning, which borrows ideas from confirmation theory and recasts them in the context of semantic network reasoning; and heuristic search, which is designed on the principles of programming psychology. Our approach is lightweight. It can be used stand-alone or as a complement to traditional heavyweight domain knowledge recovery methods.
An Overview of Structural and Specification Driven Candidature Criteria for Reuse Reengineering Processes
1995
Cited by 1 (0 self)
One of the most promising ways to make the population of a repository of reusable assets cost effective and to obtain useful results in a short time is by extracting and reengineering them from existing software. A reuse reengineering process consists of the set of activities for identifying software components implementing abstractions, reengineering them according to a predefined template, associating them with their interface and functional specification, and populating a repository with the reusable assets so obtained. Code scavenging consists of searching existing software systems for source code components that implement software abstractions. We present an overview of code scavenging techniques with reference to the first phase of the RE² project reference paradigm for setting up reuse reengineering processes. Several program representations proposed in the literature for software maintenance and in particular useful for reverse engineering and reengineering are also describ...
Simplicity: A Key Engineering Concept for Program Understanding
One of the most significant problems for existing program comprehension methods is scalability. In this paper, we introduce a new technique to make scalability possible. In particular, we advocate the concept of “simplicity” for program understanding. We first propose a simplified semantic network as the domain knowledge representation; we then introduce a linear, domain-oriented program partitioning method that can partition a huge program into self-contained program modules, so that the recovery of domain knowledge can be carried out within a smaller program space; we also introduce a set of rules for recovering domain knowledge from C code, followed by a theoretical analysis of these algorithms. A case study of the programming-style-based program partitioning method is also given. Finally, comparisons with others' work are made and conclusions are drawn.
Detecting Code Similarity Using Patterns
1995
A key issue in design recovery is to localize patterns of code that may implement a particular plan or algorithm. This paper describes a set of code-to-code and abstract-description-to-code matching techniques. The code-to-code matching uses dynamic programming techniques to localize similar code fragments and is targeted at large software systems (1 MLOC). Patterns are specified either as source code or as a sequence of abstract statements written in a concept language. Markov models are used to compute dissimilarity distances between an abstract description and a code fragment in terms of the probability that a given abstract statement can generate a given code fragment. The abstract-description-to-code matcher is under implementation and early experiments show it is a promising technique. 1 Introduction. One of the key objectives towards the design recovery of a large complex software system is the identification of common patterns in the code. These common patterns may reveal imp...
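The idea of a dissimilarity distance based on generation probabilities can be illustrated with a deliberately simplified sketch. It is not the Markov model the abstract refers to: it assumes each abstract statement generates exactly one code statement, independently of the others, and the names `dissimilarity` and `generation_probs`, along with the probability values, are hypothetical.

```python
import math


def dissimilarity(pattern, fragment, generation_probs):
    """Negative log-probability that `pattern` generates `fragment`.

    Simplifying assumptions (not from the paper): one code statement per
    abstract statement, matched in order, with independent probabilities.
    Returns math.inf when the pattern cannot generate the fragment.
    """
    if len(pattern) != len(fragment):
        return math.inf
    total = 0.0
    for abstract_stmt, code_kind in zip(pattern, fragment):
        p = generation_probs.get(abstract_stmt, {}).get(code_kind, 0.0)
        if p <= 0.0:
            return math.inf
        total -= math.log(p)
    return total


# Hypothetical probabilities that an abstract statement produces a code statement kind.
generation_probs = {
    "initialize": {"assign": 0.9, "call": 0.1},
    "iterate": {"while": 0.5, "for": 0.5},
}

pattern = ["initialize", "iterate"]
close_fragment = ["assign", "for"]
far_fragment = ["call", "assign"]

close_score = dissimilarity(pattern, close_fragment, generation_probs)
far_score = dissimilarity(pattern, far_fragment, generation_probs)
```

Lower scores mean the fragment is a more probable realization of the abstract description; a fragment the pattern cannot generate receives an infinite distance.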
Joint work of:
Redundancy, and Similarity in Software was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

