Results 1 - 10 of 38
A Taxonomy of Obfuscating Transformations
, 1997
Cited by 164 (13 self)
It has become more and more common to distribute software in forms that retain most or all of the information present in the original source code. An important example is Java bytecode. Since such codes are easy to decompile, they increase the risk of malicious reverse engineering attacks. In this paper we review several techniques for technical protection of software secrets. We will argue that automatic code obfuscation is currently the most viable method for preventing reverse engineering. We then describe the design of a code obfuscator, a tool which converts a program into an equivalent one that is more difficult to understand and reverse engineer. The obfuscator is based on the application of code transformations, in many cases similar to those used by compiler optimizers. We describe a large number of such transformations, classify them, and evaluate them with respect to their potency (To what degree is a human reader confused?), resilience (How well are automatic deobfuscati...
The Program Understanding Problem: Analysis and A Heuristic Approach
- In Proceedings of the 18th International Conference on Software Engineering, ICSE-18
Cited by 24 (4 self)
Program understanding is the process of making sense of complex source code. This process has been considered computationally difficult and conceptually complex. So far, no formal complexity results have been presented, and conceptual models differ from one researcher to the next. In this paper we formally prove that program understanding is NP-hard. Furthermore, we show that even a much simpler subproblem remains NP-hard. However, we do not despair at this result, but rather offer an attractive problem-solving model for the program understanding problem. Our model is built on a framework for solving Constraint Satisfaction Problems, or CSPs, which are known to have interesting heuristic solutions. Specifically, we can represent and heuristically address previous and new heuristic approaches to the program understanding problem with both existing and specially designed constraint propagation and search algorithms. 1 Introduction An expert attempts to understand the source code ...
Applying plan recognition algorithms to program understanding
- Journal of Automated Software Engineering
, 1998
Cited by 23 (3 self)
Abstract. Program understanding is often viewed as the task of extracting plans and design goals from program source. As such, it is natural to try to apply standard AI plan recognition techniques to the program understanding problem. Yet program understanding researchers have quietly, but consistently, avoided the use of these plan recognition algorithms. This paper shows that treating program understanding as plan recognition is too simplistic and that traditional AI search algorithms for plan recognition are not suitable, as is, for program understanding. In particular, we show (1) that the program understanding task differs significantly from the typical general plan recognition task along several key dimensions, (2) that the program understanding task has particular properties that make it particularly amenable to constraint satisfaction techniques, and (3) that augmenting AI plan recognition algorithms with these techniques can lead to effective solutions for the program understanding problem.
Recovering software architecture from the names of source files
- In Proc. Working Conf. on Reverse Engineering
, 1999
Cited by 23 (2 self)
We discuss how to extract a useful set of subsystems from a set of source-code file names. This problem is challenging because, in many legacy systems, there are thousands of files with names that are very short and cryptic. At the same time the problem is important because software engineers often find it difficult to understand such systems. We propose a general algorithm to cluster files based on their names, and a set of alternative methods for implementing the algorithm. One of the key tasks is picking candidate words that we will try to identify in file names. We do this by a) iteratively decomposing file names, b) finding common substrings, and c) choosing words in routine names, in an English dictionary or in source code comments. In addition, we investigate generating abbreviations from the candidate words in order to find matches in file names, as well as how to split file names into components given no word markers. To compare and evaluate our approaches, we present two experiments. The first compares the "concepts" found in each file name by each method to the results of manually decomposing file names. The second experiment compares automatically generated subsystems to subsystem examples proposed by experts. We conclude that two methods are most effective: extracting concepts using common substrings, and extracting those concepts that relate to the names of routines in the files.
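The "finding common substrings" step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' algorithm; the function name `common_substrings`, the parameters `min_len` and `min_files`, and the sample file names are all assumptions made for illustration.

```python
from collections import Counter


def common_substrings(names, min_len=3, min_files=2):
    """Count substrings (length >= min_len) shared by at least min_files names.

    Hypothetical helper: a brute-force stand-in for candidate-word discovery.
    """
    counts = Counter()
    for name in names:
        seen = set()  # count each substring at most once per file name
        for i in range(len(name)):
            for j in range(i + min_len, len(name) + 1):
                seen.add(name[i:j])
        counts.update(seen)
    return {s: n for s, n in counts.items() if n >= min_files}


# Hypothetical, cryptic legacy-style file names.
FILES = ["listmgr", "listutil", "dbmgr"]
CANDIDATES = common_substrings(FILES)
```

With these sample names the shared candidates include `list` and `mgr`, which could then serve as labels for grouping files into tentative subsystems.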
A Cooperative Program Understanding Environment
- Journal of Software Maintenance
, 1994
Cited by 20 (5 self)
The large size and high percentage of domain-specific code in most legacy systems make it unlikely that automated understanding tools will be able to completely understand them. Yet automated tools can clearly recognize portions of the design. That suggests exploring environments in which programmer and system work together to understand legacy software. This paper describes such an environment that supports programmer and system cooperating to extract an object-oriented design from legacy software systems. It combines an automated program understanding component that recognizes standard implementations of domain independent plans with a structured notebook that the programmer uses to link object-oriented design primitives to arbitrary source code fragments. This jointly extracted information is used to support conceptual queries about the program's code and design. 1 Introduction The standard goal of most program understanding efforts is a tool that takes source code and extrac...
Flexible Control for Program Recognition
- WORKING CONFERENCE ON REVERSE ENGINEERING
, 1993
Cited by 15 (2 self)
Recognizing commonly used data structures and algorithms is a key activity in reverse engineering. Systems developed to automate this recognition process have been isolated, stand-alone systems, usually targeting a specific task. We are interested in applying recognition to multiple tasks requiring reverse engineering, such as inspecting, maintaining, and reusing software. This requires a flexible, adaptable recognition architecture, since the tasks vary in the amount and accuracy of knowledge available about the program, the requirements on recognition power, and the resources available. We have developed a recognition system based on graph parsing. It has a flexible, adaptable control structure that can accept advice from external agents. Its flexibility arises from using a chart parsing algorithm. We are studying this graph parsing approach to determine what types of advice can enhance its capabilities, performance, and scalability.
From Run-time Behavior to Usage Scenarios: An Interaction-Pattern Mining Approach
- In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2002
Cited by 15 (2 self)
A key challenge facing IT organizations today is their evolution towards adopting e-business practices, which gives rise to the need for reengineering their underlying software systems. Any reengineering effort has to be aware of the functional requirements of the subject system, in order not to violate the integrity of its intended uses. However, as software systems get regularly maintained throughout their lifecycle, the documentation of their requirements often becomes obsolete or gets lost. To address this problem of "software requirements loss", we have developed an interaction-pattern mining method for the recovery of functional requirements as usage scenarios. Our method analyzes traces of the run-time system-user interaction to discover frequently recurring patterns; these patterns correspond to the functionality currently exercised by the system users, represented as usage scenarios. The discovered scenarios provide the basis for reengineering the software system into web-accessible components, each one supporting one of the discovered scenarios. In this paper, we describe IPM2, our interaction-pattern discovery algorithm, illustrate it with a case study from a real application, and give an overview of the reengineering process in the context of which it is employed.
Program understanding as constraint satisfaction
- Journal of Automated Software Engineering
, 1995
Cited by 14 (4 self)
Abstract. The process of understanding a source code in a high-level programming language involves complex computation. Given a piece of legacy code and a library of program plan templates, understanding the code corresponds to building mappings from parts of the source code to particular program plans. These mappings could be used to assist an expert in reverse engineering legacy code, to facilitate software reuse, or to assist in the translation of the source into another programming language. In this paper we present a model of program understanding using constraint satisfaction. Within this model we intelligently compose a partial global picture of the source program code by transforming knowledge about the problem domain and the program itself into sets of constraints. We then systematically study different search algorithms and empirically evaluate their performance. One advantage of the constraint satisfaction model is its generality; many previous attempts in program understanding could now be cast under the same spectrum of heuristics, and thus be readily compared. Another advantage is the improvement in search efficiency using various heuristic techniques in constraint satisfaction. Keywords: 1. Foreword Three years have passed since the inception of the idea of applying constraint-based representation and techniques (CSP) to program understanding and design pattern recovery. The
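The idea of mapping parts of the source code to a program plan under constraints can be illustrated with a minimal backtracking sketch. It is not the paper's model or search algorithm; the statement encoding, the "swap" plan, and the names `match_plan`, `STATEMENTS`, `SWAP_PLAN`, and `SWAP_CONSTRAINTS` are assumptions made for illustration.

```python
def match_plan(components, statements, constraints):
    """Assign one statement index to each plan component by backtracking.

    components:  {component name: required statement kind}
    constraints: list of (component names, predicate over the assignment)
    """
    names = list(components)

    def consistent(assign):
        # Check only constraints whose components are all assigned so far.
        return all(check(assign) for needed, check in constraints
                   if all(n in assign for n in needed))

    def search(i, assign):
        if i == len(names):
            return dict(assign)
        name = names[i]
        for idx, stmt in enumerate(statements):
            if stmt["kind"] != components[name] or idx in assign.values():
                continue
            assign[name] = idx
            if consistent(assign):
                found = search(i + 1, assign)
                if found is not None:
                    return found
            del assign[name]
        return None

    return search(0, {})


# Hypothetical code fragment: print(a); t = a; a = b; b = t
STATEMENTS = [
    {"kind": "call", "target": None, "source": "a"},
    {"kind": "assign", "target": "t", "source": "a"},
    {"kind": "assign", "target": "a", "source": "b"},
    {"kind": "assign", "target": "b", "source": "t"},
]
S = STATEMENTS
SWAP_PLAN = {"save": "assign", "move": "assign", "restore": "assign"}
SWAP_CONSTRAINTS = [
    (("save", "move"), lambda a: a["save"] < a["move"]
        and S[a["save"]]["source"] == S[a["move"]]["target"]),
    (("move", "restore"), lambda a: a["move"] < a["restore"]
        and S[a["move"]]["source"] == S[a["restore"]]["target"]),
    (("save", "restore"), lambda a:
        S[a["save"]]["target"] == S[a["restore"]]["source"]),
]
MAPPING = match_plan(SWAP_PLAN, STATEMENTS, SWAP_CONSTRAINTS)
```

Here each plan component is a variable, the statements are its domain, and the ordering and data-flow predicates are the constraints; smarter variable ordering or constraint propagation would replace the plain backtracking used in this sketch.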
Identifying crosscutting concerns using fan-in analysis
- ACM Transactions on Software Engineering and Methodology
, 2007
Cited by 14 (4 self)
Aspect mining is a reverse engineering process that aims at finding crosscutting concerns in existing systems. This paper proposes an aspect mining approach based on determining methods that are called from many different places, and hence have a high fan-in, which can be seen as a symptom of crosscutting functionality. The approach is semi-automatic, and consists of three steps: metric calculation, method filtering, and call site analysis. Carrying out these steps is an interactive process supported by an Eclipse plug-in called FINT. Fan-in analysis has been applied to three open source Java systems, totaling around 200,000 lines of code. The most interesting concerns identified are discussed in detail, which includes several concerns not previously discussed in the aspect-oriented literature. The results show that a significant number of crosscutting concerns can be recognized using fan-in analysis, and each of the three steps can be supported by tools.
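The metric-calculation and method-filtering steps named in this abstract can be sketched as follows. This is a minimal illustration, not the FINT implementation: the fan-in definition used here (number of distinct callers), the threshold filter, and the sample call edges are assumptions.

```python
from collections import defaultdict


def fan_in(calls):
    """Return the number of distinct callers for each called method."""
    callers = defaultdict(set)
    for caller, callee in calls:
        if caller != callee:  # assumption: direct recursion is not counted
            callers[callee].add(caller)
    return {callee: len(who) for callee, who in callers.items()}


def high_fan_in(calls, threshold):
    """Keep methods whose fan-in meets the threshold, highest first."""
    kept = [(m, n) for m, n in fan_in(calls).items() if n >= threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical call edges as (caller, callee) pairs.
CALLS = [
    ("Order.save", "Logger.log"),
    ("Order.load", "Logger.log"),
    ("User.save", "Logger.log"),
    ("User.save", "Db.write"),
    ("Order.save", "Db.write"),
    ("Order.save", "Order.validate"),
]
CANDIDATES = high_fan_in(CALLS, threshold=2)
```

The surviving candidates would then go to the third step, manual inspection of the call sites, which this sketch does not attempt to automate.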
Program Plan Recognition For Year 2000 Tools
Cited by 11 (8 self)
There are many commercial tools that address various aspects of the Year 2000 problem. None of these tools, however, make any documented use of plan-based techniques for automated concept recovery. This implies a general perception that plan-based techniques are not useful for this problem. This paper argues that this perception is incorrect and these techniques are in fact mature enough to make a significant contribution. In particular, we show representative code fragments illustrating "Year 2000" problems, discuss the problems inherent in recognizing the higher level concepts these fragments implement using pattern-based and rule-based techniques, demonstrate that they can be represented in a programming plan framework, and present some initial experimental evidence that suggests that current algorithms can locate these plans in linear time. Finally, we discuss several ways to integrate plan-based techniques with existing Year 2000 tools. 1991 Computing Reviews Classification System: D.2.2, D.2.3, D.2.7., D.3.4, F.3.1, I.2.2. Keywords and Phrases: Software maintenance, program understanding, plan-based concept recovery, COBOL, Y2K. Note: To appear in Proceedings of the 4th IEEE Working Conference on Reverse Engineering, October, 1997 Note: Work carried out under project SEN-1.1, Software Renovation. 1

