A Survey on Software Clone Detection Research
SCHOOL OF COMPUTING TR 2007-541, QUEEN’S UNIVERSITY, 2007
Cited by 32 (7 self)
Code duplication, that is, copying a code fragment and reusing it by pasting with or without modifications, is a well-known code smell in software maintenance. Several studies show that about 5% to 20% of a software system can consist of duplicated code, which is basically the result of copying existing code fragments and using them by pasting with or without minor modifications. One of the major shortcomings of such duplicated fragments is that if a bug is detected in a code fragment, all the other fragments similar to it should be investigated to check for the same bug. Refactoring of the duplicated code is another prime issue in software maintenance, although several studies claim that refactoring of certain clones is not desirable and that there is a risk in removing them. However, it is also widely agreed that clones should at least be detected. In this paper, we survey the state of the art in clone detection research. First, we describe the clone terms commonly used in the literature along with their corresponding mappings to the commonly used clone types. Second, we provide a review of the existing
Improved Tool Support for the Investigation of Duplication in Software
2005
Cited by 17 (4 self)
Code duplication is a well-documented problem in software systems. There has been considerable research into techniques for detecting duplication in software, and there are several effective tools to perform this task. However, a common problem with such tools is that the result set returned can be too large to handle without complementary tool support. The goal of this paper is to describe the criteria for a complete tool that is designed to aid in the comprehension of cloning within a software system. Furthermore, we present a prototype of such a tool and demonstrate the value of its features through a case study on the Apache httpd web server. For example, in our study we found that a single subsystem comprising only 17% of the system code contained 38.8% of the clones.
How developers copy
In Proceedings of International Conference on Program Comprehension (ICPC 2006), 2006
Cited by 9 (3 self)
Copy-paste programming is dangerous as it may lead to hidden dependencies between different parts of the system. Modifying clones is not always straightforward, because we might not know all the places that need modification. This is even more of a problem when several developers need to know how to change the clones. In this paper, we correlate the code clones with the time of the modification and with the developer that performed the modification to detect patterns of how developers copy from one another. We develop a visualization, named Clone Evolution View, to represent the evolution of the duplicated code. We show the relevance of our approach on several large case studies and we distill our experience in the form of interesting copy patterns.
Supporting the Analysis of Clones in Software Systems: A Case Study
JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION: RESEARCH AND PRACTICE, 2006
Cited by 8 (2 self)
... In this paper we present an in-depth case study of cloning in a large software system that is in wide use, the Apache web server; we provide insights into cloning as it exists in this system, and we demonstrate techniques to manage and make effective use of the large result sets of clone detection tools. In our case study, we found several interesting types of cloning occurrences, such as "cloning hotspots", where a single subsystem comprising only 17% of the system code contained 38.8% of the clones. We also found several examples of cloning behavior that were beneficial to the development of the system, in particular cloning as a way to add experimental functionality
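The "cloning hotspot" figure can be read as a concentration ratio: a subsystem's share of the clones divided by its share of the code. The following is a minimal sketch of that arithmetic; the function name `clone_concentration` is hypothetical and not taken from the case study, and only the 17% and 38.8% figures come from the abstract.

```python
def clone_concentration(code_share: float, clone_share: float) -> float:
    """Ratio of a subsystem's clone share to its code share (hypothetical helper).

    A value of 1.0 would mean clones are spread in proportion to code size;
    values above 1.0 indicate that clones are over-represented.
    """
    if code_share <= 0:
        raise ValueError("code_share must be positive")
    return clone_share / code_share


# Figures quoted in the abstract: 17% of the code, 38.8% of the clones.
ratio = clone_concentration(0.17, 0.388)
print(f"{ratio:.2f}")  # prints 2.28
```

Under this reading, the subsystem holds a bit more than twice the clones its size alone would suggest.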
Archeology of code duplication: Recovering duplication chains from small duplication fragments
In Proc. 7th Int’l. Symposium on Symbolic and Numeric Algorithms for Scientific Computing, 2005
Cited by 7 (1 self)
Code duplication is a common problem, and a well-known sign of bad design. As a result, in the last decade, the issue of detecting code duplication has led to various solutions and tools that can automatically find duplicated blocks of code. However, duplicated fragments rarely remain identical after they are copied; they are oftentimes modified here and there. This adaptation usually “scatters” the duplicated code block into a large number of small “islands” of duplication, which, when detected and analyzed separately, hide the real magnitude and impact of the duplicated block. In this paper we propose a novel, automated approach for recovering duplication blocks, by composing small isolated fragments of duplication into larger and more relevant duplication chains. We validate both the efficiency and the scalability of the approach by applying it on several well-known open-source case studies and discussing some relevant findings. By recovering such duplication chains, the maintenance engineer is provided with additional cases of duplication that can lead to relevant refactorings, and which are usually missed by other detection methods.
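The idea of composing small "islands" into chains can be illustrated with a simple greedy linker. This is only a sketch under stated assumptions, not the paper's algorithm: the fragment representation `(start_a, start_b, length)`, the function `build_chains`, and the `max_gap` threshold are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# A fragment is an exact duplicate run: it starts at line start_a in file A,
# at line start_b in file B, and spans `length` lines (assumed representation).
Fragment = Tuple[int, int, int]


def build_chains(fragments: List[Fragment], max_gap: int = 2) -> List[List[Fragment]]:
    """Greedily link fragments separated by at most max_gap lines in both files."""
    chains: List[List[Fragment]] = []
    for frag in sorted(fragments):
        start_a, start_b, _ = frag
        for chain in chains:
            last_a, last_b, last_len = chain[-1]
            # Number of non-matching lines between the end of the previous
            # fragment and the start of this one, measured in each file.
            gap_a = start_a - (last_a + last_len)
            gap_b = start_b - (last_b + last_len)
            if 0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap:
                chain.append(frag)
                break
        else:
            # No existing chain can absorb this fragment: start a new chain.
            chains.append([frag])
    return chains


# Three small islands that sit close together, plus one unrelated fragment.
fragments = [(10, 40, 3), (14, 44, 2), (17, 48, 4), (60, 5, 3)]
chains = build_chains(fragments)
for chain in chains:
    print(chain)
```

With `max_gap=2`, the first three fragments are joined into one chain and the fourth stays alone, which shows how a block that looks like three minor duplicates can be reported as one larger duplication.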
An Information Retrieval Process to Aid in the Analysis of Code Clones
Cited by 3 (2 self)
The advent of new static analysis tools has automated the search for code clones, which are duplicated or similar code fragments in a program. However, clone detection tools can report many clones if the source code that is being searched is large. Programmers may have difficulty comprehending the extensive results from the detection tool, which may inhibit the ability to maintain the identified clones. Latent Semantic Indexing (LSI) is an information retrieval technique that attempts to find relationships in a corpus based on the analysis of the documents in the corpus and the terms in the documents. In this paper, LSI is used to cluster clone sets that have been identified initially by a clone detection tool. The goal of this paper is to detect trends and associations among the clustered clone sets and determine whether they improve comprehension and thus assist in the maintenance of clones. Experimental evaluation of the approach is reported from a sequence of tools that are chained together to perform an analysis of clones detected in the Microsoft Windows NT Kernel source code.
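To make the LSI step more concrete, the sketch below builds a term-document matrix from a few clone sets, reduces it with a truncated SVD, and compares clone sets by cosine similarity in the reduced space. It assumes NumPy is available; the token lists are invented for illustration and are not taken from the Windows NT Kernel, and the functions `lsi_document_vectors` and `cosine` are hypothetical helpers rather than the paper's tool chain. Pairwise similarity stands in here for the clustering step.

```python
import numpy as np


def lsi_document_vectors(docs, k=2):
    """Return one k-dimensional LSI vector per document (rows of the result)."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    index = {term: i for i, term in enumerate(vocab)}

    # Term-document matrix of raw term counts: rows are terms, columns are documents.
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc.split():
            matrix[index[term], j] += 1.0

    # Truncated SVD: keep only the k largest singular values.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    return (s[:k, None] * vt[:k]).T


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


# Each string stands for the identifiers found in one clone set (made-up data).
clone_sets = [
    "lock acquire release irql",
    "lock acquire spin irql",
    "buffer alloc free pool",
    "buffer alloc pool tag",
]
vectors = lsi_document_vectors(clone_sets, k=2)
same_topic = cosine(vectors[0], vectors[1])
different_topic = cosine(vectors[0], vectors[2])
print(round(same_topic, 3), round(abs(different_topic), 3))
```

Clone sets that share vocabulary end up close together in the reduced space, so a similarity threshold or any standard clustering method could then group them for review.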
An Empirical Study of Function Clones in Open Source Software
2008
Cited by 3 (2 self)
The new hybrid clone detection tool NICAD combines the strengths and overcomes the limitations of both text-based and AST-based clone detection techniques to yield highly accurate identification of cloned code in software systems. In this paper, we present a first empirical study of function clones in open source software using NICAD. We examine more than 15 open source C and Java systems, including the entire Linux Kernel and Apache
Fingerprinting Logic Programs
2007
Cited by 1 (0 self)
In this work we present work in progress on functionality duplication detection in logic programs. Eliminating duplicated functionality recently became prominent in the context of refactoring. We describe a quantitative approach that allows one to measure the “similarity” between two predicate definitions. Moreover, we show how to compute a so-called “fingerprint” for every predicate. Fingerprints capture those characteristics of the predicate that are significant when searching for duplicated functionality. Since reasoning on fingerprints is much easier than reasoning on predicate definitions, comparing the fingerprints is a promising direction in automated code duplication in logic programs.
Are Scripting Languages Really Different?
Scripting languages such as Python, Perl, Ruby and PHP are increasingly important in new software systems as web technology becomes a dominant force. These languages are often spoken of as having different properties, in particular with respect to cloning, and the question arises whether the observations made based on traditional languages also apply to them. In this paper we present a first experiment in measuring the cloning properties of open source software systems written in the Python scripting language using the NiCad clone detector. We compare our results for Python with previous observations of C, C#, and Java, and discover that perhaps scripting languages are not so different after all.

