Results 1 - 10
of
15
CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code
- IEEE Transactions on Software Engineering
, 2006
"... Abstract—Recent studies have shown that large software suites contain significant amounts of replicated code. It is assumed that some of this replication is due to copy-and-paste activity and that a significant proportion of bugs in operating systems are due to copypaste errors. Existing static code ..."
Abstract
-
Cited by 45 (0 self)
- Add to MetaCart
Abstract—Recent studies have shown that large software suites contain significant amounts of replicated code. It is assumed that some of this replication is due to copy-and-paste activity and that a significant proportion of bugs in operating systems are due to copypaste errors. Existing static code analyzers are either not scalable to large software suites or do not perform robustly where replicated code is modified with insertions and deletions. Furthermore, the existing tools do not detect copy-paste related bugs. In this paper, we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software suites and detects copy-paste bugs. Specifically, it takes less than 20 minutes for CP-Miner to identify 190,000 copy-pasted segments in Linux and 150,000 in FreeBSD. Moreover, CP-Miner has detected many new bugs in popular operating systems, 49 in Linux and 31 in FreeBSD, most of which have since been confirmed by the corresponding developers and have been rectified in the following releases. In addition, we have found some interesting characteristics of copy-paste in operating system code. Specifically, we analyze the distribution of copy-pasted code by size (number lines of code), granularity (basic blocks and functions), and modification within copypasted code. We also analyze copy-paste across different modules and various software versions. Index Terms—Software analysis, code reuse, code duplication, debugging aids, data mining.
A Survey on Software Clone Detection Research
- SCHOOL OF COMPUTING TR 2007-541, QUEEN’S UNIVERSITY
, 2007
"... Code duplication or copying a code fragment and then reuse by pasting with or without any modifications is a well known code smell in software maintenance. Several studies show that about 5 % to 20 % of a software systems can contain duplicated code, which is basically the results of copying existin ..."
Abstract
-
Cited by 32 (7 self)
- Add to MetaCart
Code duplication or copying a code fragment and then reuse by pasting with or without any modifications is a well known code smell in software maintenance. Several studies show that about 5 % to 20 % of a software systems can contain duplicated code, which is basically the results of copying existing code fragments and using then by pasting with or without minor modifications. One of the major shortcomings of such duplicated fragments is that if a bug is detected in a code fragment, all the other fragments similar to it should be investigated to check the possible existence of the same bug in the similar fragments. Refactoring of the duplicated code is another prime issue in software maintenance although several studies claim that refactoring of certain clones are not desirable and there is a risk of removing them. However, it is also widely agreed that clones should at least be detected. In this paper, we survey the state of the art in clone detection research. First, we describe the clone terms commonly used in the literature along with their corresponding mappings to the commonly used clone types. Second, we provide a review of the existing
Aiding Comprehension of Cloning Through Categorization
, 2004
"... Management of duplicated code in software systems is important in ensuring its graceful evolution. Commonly clone detection tools return large numbers of detected clones with little or no information about them, making clone management impractical and unscalable. We have used a taxonomy of clones to ..."
Abstract
-
Cited by 20 (5 self)
- Add to MetaCart
Management of duplicated code in software systems is important in ensuring its graceful evolution. Commonly clone detection tools return large numbers of detected clones with little or no information about them, making clone management impractical and unscalable. We have used a taxonomy of clones to augment current clone detection tools in order to increase the user comprehension of duplication of code within software systems and filter false positives from the clone set. We support our arguments by means of 2 case studies, where we found that as much as 53% of clones can be grouped to form Function clones or Partial Function clones and we were able to filter out as many as 65% of clones as false positives from the reported clone pairs.
Improved Tool Support for the Investigation of Duplication in Software
, 2005
"... Code duplication is a well documented problem in software systems. There has been considerable research into techniques for detecting duplication in software, and there are several effective tools to perform this task. However, a common problem with such tools is that the result set returned can be ..."
Abstract
-
Cited by 17 (4 self)
- Add to MetaCart
Code duplication is a well documented problem in software systems. There has been considerable research into techniques for detecting duplication in software, and there are several effective tools to perform this task. However, a common problem with such tools is that the result set returned can be too large to handle without complementory tool support. The goal of this paper is to describe the criteria for a complete tool that is designed to aid in the comprehension of cloning within a software system. Furhermore, we present a prototype of such a tool and demonstrate the value of its features through a case study on the Apache httpd web server. For example, in our study we found that a single subsystem comprising only 17% of the system code contained 38.8% of the clones.
Insights into System-Wide Code Duplication
- In WCRE
, 2004
"... Duplication of code is a common phenomenon in the development and maintenance of large software systems. The detection and removal of duplicated code has become a standard activity during the refactoring phases of a software life-cycle. However, code duplication identification tends to produce large ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
Duplication of code is a common phenomenon in the development and maintenance of large software systems. The detection and removal of duplicated code has become a standard activity during the refactoring phases of a software life-cycle. However, code duplication identification tends to produce large amounts of data making the understanding of the duplication situation as a whole difficult. Reengineers can easily lose sight of the forest for the trees. There is a need to support a qualitative analysis of the duplicated code. In this paper we propose a number of visualizations of duplicated source elements that support reengineers in answering questions, e.g., which parts of the system are connected by copied code or which parts of the system are copied the most.
Cloning by Accident: An Empirical Study of Source Code Cloning across Software Systems
- Across Software Systems.International Symposium on Empirical Software Engineering (ISESE’05
, 2005
"... One of the key goals of open source development is the sharing of knowledge, experience, and solutions that pertain to a software system and its problem domain. Source code cloning is one way in which expertise can be reused across systems; cloning is known to have been used in several open source p ..."
Abstract
-
Cited by 12 (1 self)
- Add to MetaCart
One of the key goals of open source development is the sharing of knowledge, experience, and solutions that pertain to a software system and its problem domain. Source code cloning is one way in which expertise can be reused across systems; cloning is known to have been used in several open source projects, such as the SCSI drivers of the Linux kernel [16] . In this paper, we discuss two case studies in which we performed clone detection and analysis on several open source systems within the same domain: we examined nine text editors written in C, and eight X-Windows window managers written in C and C++. To our surprise, we found little evidence of "true" cloning activity, but we did notice a significant number of "accidental" clones --- that is, code fragments that are similar due to the precise protocols they must use when interacting with a given API or set of libraries. We further discuss the nature of "true" versus "accidental" clones, as well as the details of our case studies.
Supporting the Analysis of Clones in Software Systems: A Case Study
- JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE
, 2006
"... ... In this paper we present an in-depth case study of cloning in a large software system that is in wide use, the Apache web server; we provide insights into cloning as it exists in this system, and we demonstrate techniques to manage and make effective use of the large result sets of clone detecti ..."
Abstract
-
Cited by 8 (2 self)
- Add to MetaCart
... In this paper we present an in-depth case study of cloning in a large software system that is in wide use, the Apache web server; we provide insights into cloning as it exists in this system, and we demonstrate techniques to manage and make effective use of the large result sets of clone detection tools. In our case study, we found several interesting types of cloning occurrences, such as "cloning hotspots", where a single subsystem comprising only 17% of the system code contained 38.8% of the clones. We also found several examples of cloning behavior that were beneficial to the development of the system, in particular cloning as a way to add experimental functionality
A Taxonomy of Clones in Source Code: The Re-Engineers Most Wanted List
- In Proceedings of the 2nd International Workshop on Detection of Software Clones (IWDSC’03), 2pp
, 2003
"... Code cloning --- that is, the gratuitous duplication of source code within a software system --- is an endemic problem in large, industrial systems [6, 5]. While there has been much research into techniques for clone detection and analysis, there has been relatively little empirical study on charact ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
Code cloning --- that is, the gratuitous duplication of source code within a software system --- is an endemic problem in large, industrial systems [6, 5]. While there has been much research into techniques for clone detection and analysis, there has been relatively little empirical study on characterizing how, where, and why clones occur in industrial software systems. Our current research is to perform an in-depth analysis of code cloning in real software systems and to build a taxonomy of types of code duplication.
An Empirical Study of Function Clones in Open Source Software
, 2008
"... The new hybrid clone detection tool NICAD combines the strengths and overcomes the limitations of both textbased and AST-based clone detection techniques to yield highly accurate identification of cloned code in software systems. In this paper, we present a first empirical study of function clones i ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
The new hybrid clone detection tool NICAD combines the strengths and overcomes the limitations of both textbased and AST-based clone detection techniques to yield highly accurate identification of cloned code in software systems. In this paper, we present a first empirical study of function clones in open source software using NICAD. We examine more than 15 open source C and Java systems, including the entire Linux Kernel and Apache
“Cloning considered harmful” considered harmful
- IN: THIRTEENTH WORKING CONFERENCE ON REVERSE ENGINEERING (WCRE
, 2006
"... Current literature on the topic of duplicated (cloned) code in software systems often considers duplication harmful to the system quality and the reasons commonly cited for duplicating code often have a negative connotation. While these positions are sometimes correct, during our case studies we hav ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Current literature on the topic of duplicated (cloned) code in software systems often considers duplication harmful to the system quality and the reasons commonly cited for duplicating code often have a negative connotation. While these positions are sometimes correct, during our case studies we have found that this is not universally true, and we have found several situations where code duplication seems to be a reasonable or even beneficial design option. For example, a method of introducing experimental changes to core subsystems is to duplicate the subsystem and introduce changes there in a kind of sandbox testbed. As features mature and become stable within the experimental subsystem, they can then be introduced gradually into the stable code base. In this way risk of introducing instabilities in the stable version is minimized. This paper describes several patterns of cloning that we have encountered in our case studies and discusses the advantages and disadvantages associated with using them.

