Results 1 - 10
of
56
Identifying Similar Code with Program Dependence Graphs
, 2001
"... We present an approach to identify similar code in programs based on finding similar subgraphs in attributed directed graphs. This approach is used on program dependence graphs and therefore considers not only the syntactic structure of programs but also the data flow within (as an abstraction of th ..."
Abstract
-
Cited by 107 (2 self)
- Add to MetaCart
We present an approach to identify similar code in programs based on finding similar subgraphs in attributed directed graphs. This approach is used on program dependence graphs and therefore considers not only the syntactic structure of programs but also the data flow within (as an abstraction of the semantics). As a result, there is no tradeoff between precision and recall---our approach is very good in both. An evaluation of our prototype implementation shows that our approach is feasible and gives very good results despite the non polynomial complexity of the problem.
Code compression
- In Proc. Conf. on Programming Languages Design and Implementation
, 1997
"... Current research in compiler optimization counts mainly CPU time and perhaps the first cache level or two. This view has been important but is becoming myopic, at least from a system-wide viewpoint, as the ratio of network and disk speeds to CPU speeds grows exponentially. For example, we have seen ..."
Abstract
-
Cited by 80 (11 self)
- Add to MetaCart
Current research in compiler optimization counts mainly CPU time and perhaps the first cache level or two. This view has been important but is becoming myopic, at least from a system-wide viewpoint, as the ratio of network and disk speeds to CPU speeds grows exponentially. For example, we have seen the CPU idle for most of the time during paging, so compressing pages can increase total performance even though the CPU must decompress or interpret the page contents. Another profile shows that many functions are called just once, so reduced paging could pay for their interpretation overhead. This paper describes:. Measurements that show how code compression can save space and total time in some important real-world scenarios.. A compressed executable representation that is roughly the same size as gzipped x86 programs and can be interpreted without decompression. It can also be compiled to high-quality machine code at 2.5 megabytes per second on a 120MHz Pentium processor l A compressed “wire ” representation that must be decompressed before execution but is, for example, roughly 21 % the size of SPARC code when compressing gee.
An Evaluation of Clone Detection Techniques for Identifying Crosscutting Concerns
- IN PROC. OF THE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (ICSM
, 2004
"... Code implementing a crosscutting concern is often spread over many different parts of an application. Identifying such code automatically greatly improves both the maintainability and the evolvability of the application. First of all, it allows a developer to more easily find the places in the co ..."
Abstract
-
Cited by 32 (9 self)
- Add to MetaCart
Code implementing a crosscutting concern is often spread over many different parts of an application. Identifying such code automatically greatly improves both the maintainability and the evolvability of the application. First of all, it allows a developer to more easily find the places in the code that must be changed when the concern changes, and thus makes such changes less time consuming and less prone to errors. Second, it allows a developer to refactor the code, so that it uses modern and more advanced abstraction mechanisms, thereby restoring its modularity. In this paper, we evaluate the suitability of clone detection as a technique for the identification of crosscutting concerns. To that end, we manually identify four specific concerns in an industrial C application, and analyze to what extent clone detection is capable of finding these concerns. We consider our results as a stepping stone toward an automated "concern miner" based on clone detection.
A Survey on Software Clone Detection Research
- SCHOOL OF COMPUTING TR 2007-541, QUEEN’S UNIVERSITY
, 2007
"... Code duplication or copying a code fragment and then reuse by pasting with or without any modifications is a well known code smell in software maintenance. Several studies show that about 5 % to 20 % of a software systems can contain duplicated code, which is basically the results of copying existin ..."
Abstract
-
Cited by 32 (7 self)
- Add to MetaCart
Code duplication or copying a code fragment and then reuse by pasting with or without any modifications is a well known code smell in software maintenance. Several studies show that about 5 % to 20 % of a software systems can contain duplicated code, which is basically the results of copying existing code fragments and using then by pasting with or without minor modifications. One of the major shortcomings of such duplicated fragments is that if a bug is detected in a code fragment, all the other fragments similar to it should be investigated to check the possible existence of the same bug in the similar fragments. Refactoring of the duplicated code is another prime issue in software maintenance although several studies claim that refactoring of certain clones are not desirable and there is a risk of removing them. However, it is also widely agreed that clones should at least be detected. In this paper, we survey the state of the art in clone detection research. First, we describe the clone terms commonly used in the literature along with their corresponding mappings to the commonly used clone types. Second, we provide a review of the existing
Detecting higher-level similarity patterns in programs
- In ESEC/FSE
, 2005
"... Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect the same or similar code fragments in software, so-called simple clones. While the knowledge of simple clones is useful, detecting design-level similarities in softwar ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
Cloning in software systems is known to create problems during software maintenance. Several techniques have been proposed to detect the same or similar code fragments in software, so-called simple clones. While the knowledge of simple clones is useful, detecting design-level similarities in software could ease maintenance even further, and also help us identify reuse opportunities. We observed that recurring patterns of simple clones – so-called structural clones- often indicate the presence of interesting design-level similarities. An example would be patterns of collaborating classes or components. Finding structural clones that signify potentially useful design information requires efficient techniques to analyze the bulk of simple clone data and making non-trivial inferences based on the abstracted information. In this paper, we describe a practical solution to the problem of
Deckard: Scalable and accurate tree-based detection of code clones
- In ICSE
, 2007
"... Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of ..."
Abstract
-
Cited by 29 (3 self)
- Add to MetaCart
Detecting code clones has many software engineering applications. Existing approaches either do not scale to large code bases or are not robust against minor code modifications. In this paper, we present an efficient algorithm for identifying similar subtrees and apply it to tree representations of source code. Our algorithm is based on a novel characterization of subtrees with numerical vectors in the Euclidean space R n and an efficient algorithm to cluster these vectors w.r.t. the Euclidean distance metric. Subtrees with vectors in one cluster are considered similar. We have implemented our tree similarity algorithm as a clone detection tool called DECKARD and evaluated it on large code bases written in C and Java including the Linux kernel and JDK. Our experiments show that DECKARD is both scalable and accurate. It is also language independent, applicable to any language with a formally specified grammar. 1.
Effective, Automatic Procedure Extraction
, 2003
"... Legacy code can often be made more understandable and maintainable by extracting out selected sets of statements to form procedures and replacing the extracted code with procedure calls. Sets of statements that are noncontiguous and/or include non-local jumps (caused by gotos, breaks, continues, etc ..."
Abstract
-
Cited by 27 (2 self)
- Add to MetaCart
Legacy code can often be made more understandable and maintainable by extracting out selected sets of statements to form procedures and replacing the extracted code with procedure calls. Sets of statements that are noncontiguous and/or include non-local jumps (caused by gotos, breaks, continues, etc.) can be difficult to extract, and usually cause previous automatic-extraction algorithms to fail or to produce poor results.
Toward a Taxonomy of Clones in Source Code: A Case Study
, 2003
"... Code cloning --- that is, the gratuitous duplication of source code within a software system --- is an endemic problem in large, industrial systems [9, 7]. While there has been much research into techniques for clone detection and analysis, there has been relatively little empirical study on charact ..."
Abstract
-
Cited by 26 (7 self)
- Add to MetaCart
Code cloning --- that is, the gratuitous duplication of source code within a software system --- is an endemic problem in large, industrial systems [9, 7]. While there has been much research into techniques for clone detection and analysis, there has been relatively little empirical study on characterizing how, where, and why clones occur in industrial software systems. In this paper, we present a preliminary categorization scheme for code clones, and we discuss how we have applied this taxonomy in a case study performed on the file system subsystem of the Linux operating system. Our case study yielded several interesting results, including that cloning is rampant both within particular file system implementations and across different ones, and that as many as 13% of the 4407 functions that are more than six lines long were involved in a clone-pair relationship.
Mining specifications of malicious behavior
- In Proceedings of the 6th joint meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering
, 2007
"... Malware detectors require a specification of malicious behavior. Typically, these specifications are manually constructed by investigating known malware. We present an automatic technique to overcome this laborious manual process. Our technique derives such a specification by comparing the execution ..."
Abstract
-
Cited by 26 (7 self)
- Add to MetaCart
Malware detectors require a specification of malicious behavior. Typically, these specifications are manually constructed by investigating known malware. We present an automatic technique to overcome this laborious manual process. Our technique derives such a specification by comparing the execution behavior of a known malware against the execution behaviors of a set of benign programs. In other words, we mine the malicious behavior present in a known malware that is not present in a set of benign programs. The output of our algorithm can be used by malware detectors to detect malware variants. Since our algorithm provides a succinct description of malicious behavior present in a malware, it can also be used by security analysts for understanding the malware. We have implemented a prototype based on our algorithm and tested it on several malware programs. Experimental results obtained from our prototype indicate that our algorithm is effective in extracting malicious behaviors that can be used to detect malware variants.

