Reading Beside the Lines: Indentation as a Proxy for Complexity Metrics
Cited by 5 (3 self)
Maintainers face the daunting task of wading through a collection of both new and old revisions, trying to ferret out revisions which warrant personal inspection. One can rank revisions by size/lines of code (LOC), but often, due to the distribution of the size of changes, revisions will be of similar size. If we can’t rank revisions by LOC, perhaps we can rank by Halstead’s and McCabe’s complexity metrics? However, these metrics are problematic when applied to code fragments (revisions) written in multiple languages: special parsers are required which may not support the language or dialect used; analysis tools may not understand code fragments. We propose using the statistical moments of indentation as a lightweight, language-independent, revision/diff-friendly metric which proxies classical complexity metrics. We have extensively evaluated our approach against the entire CVS histories of 278 of the most popular and most active SourceForge projects. We found that our results are linearly correlated and rank-correlated with traditional measures of complexity, suggesting that measuring indentation is a cheap and accurate proxy for code complexity of revisions. Thus ranking revisions by the standard deviation and summation of indentation will be very similar to ranking revisions by complexity.
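The indentation measure described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the input is a unified diff, that only added lines are measured, that tabs expand to 8 columns, and that blank lines are ignored; the function names `indent_width` and `indentation_stats` are hypothetical.

```python
import statistics


def indent_width(line, tab_width=8):
    """Count leading whitespace columns, expanding tabs (tab_width is an assumption)."""
    expanded = line.expandtabs(tab_width)
    return len(expanded) - len(expanded.lstrip(" "))


def indentation_stats(diff_text, tab_width=8):
    """Return the sum, mean, and standard deviation of indentation
    over the added lines of a unified diff (assumed input format)."""
    widths = []
    for line in diff_text.splitlines():
        # Keep only added lines; skip the '+++' file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        body = line[1:]
        if not body.strip():
            continue  # ignore blank lines (an assumption of this sketch)
        widths.append(indent_width(body, tab_width))
    if not widths:
        return {"sum": 0, "mean": 0.0, "stdev": 0.0}
    return {
        "sum": sum(widths),
        "mean": statistics.fmean(widths),
        "stdev": statistics.pstdev(widths),  # population standard deviation
    }
```

Revisions could then be ordered by the `stdev` or `sum` value, for example with `sorted(revisions, key=lambda d: indentation_stats(d)["stdev"], reverse=True)`, where `revisions` is a hypothetical list of diff strings.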
From Indentation Shapes to Code Structures
Cited by 1 (1 self)
In a previous study, we showed that indentation was regular across multiple languages and the variance in the level of indentation of a block of revised code is correlated with metrics such as McCabe Cyclomatic complexity. Building on that work the current paper investigates the relationship between the “shape” of the indentation of the revised code block (the “revision”) and the corresponding syntactic structure of the code. We annotated revisions matching these three indentation shapes: “flat” (all lines are equally indented), “slash” (indentation becomes increasingly deep), or “bubble” (indentation increases and then decreases). We then classified the code structure as one of: function definition, loop, expression, comment, etc. We studied thousands of revisions, coming from over 200 software projects, written in a variety of languages. Our study indicates that indentation shape correlates positively with code structure; that is, certain shapes typically correspond to certain code structures. For example, flat shapes commonly correspond to comments while bubble shapes commonly correspond to conditionals and function definitions. These results can form the basis of a tool framework that can analyze code in a language independent way to support browsing targeted to viewing particular code structures such as conditionals or comments.
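The three shapes named in this abstract can be illustrated with a small Python sketch. The abstract describes manual annotation, so the rules below are only one possible reading, not the authors' procedure: "slash" is taken as non-decreasing indentation, "bubble" as a rise to a single peak followed by a fall, and anything else is labelled "other". The function names and the 8-column tab width are assumptions.

```python
def indent_levels(lines, tab_width=8):
    """Leading-whitespace width of each non-blank line (tabs expanded)."""
    levels = []
    for line in lines:
        if not line.strip():
            continue  # blank lines carry no indentation information
        expanded = line.expandtabs(tab_width)
        levels.append(len(expanded) - len(expanded.lstrip(" ")))
    return levels


def classify_shape(lines, tab_width=8):
    """Label a block of lines as 'flat', 'slash', 'bubble', or 'other'."""
    levels = indent_levels(lines, tab_width)
    if not levels:
        return "other"
    if len(set(levels)) == 1:
        return "flat"  # all lines equally indented
    steps = [b - a for a, b in zip(levels, levels[1:])]
    if all(s >= 0 for s in steps):
        return "slash"  # indentation only gets deeper
    peak = levels.index(max(levels))
    rises = peak > 0 and all(s >= 0 for s in steps[:peak])
    falls = all(s <= 0 for s in steps[peak:])
    if rises and falls:
        return "bubble"  # indentation increases, then decreases
    return "other"
```

For instance, a three-line `if` block with an indented body and an unindented closing brace is labelled "bubble" under these rules, while a run of equally indented comment lines is labelled "flat".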
Software Process Extraction and Identification
2006
Industrial software is planned, managed, and created using a variety of approaches: sometimes a formal Software Development Life-cycle (SDLC) model is used, sometimes an approach that is less formal but has clearly identifiable stages is employed, and sometimes little if any discernible process is followed. In this paper, we extract and correlate the software development process with behaviour and data found within a project's source control repository. We do so by analyzing fine grained changes, revisions, and aggregations of revisions, so that we can correlate them with the stage of the software development process that the project was in at the time they were made. To label intervals of revisions we use machine learning and artificial intelligence techniques including N-Nearest Neighbours classifiers, Markov models, Hidden Markov Models, Markov Decision Processes and Partially Observable Markov Decision Processes. These techniques initially learn the stages from annotated data and then classify unknown data. We describe how to pose the problem using these tools and we evaluate their effectiveness on several case studies.
Reading Beside the Lines: Using Indentation to Rank Revisions by Complexity
Maintainers often face the daunting task of wading through a collection of both new and old revisions, trying to ferret out those that warrant detailed inspection. Perhaps the most obvious way to rank revisions is by size in terms of lines of code (LOC); this technique has the advantage of being both simple and fast. However, it is well known that the vast majority of revisions are quite small, and so we would like a way of distinguishing between simple and complex changes of the same size. Classical complexity metrics, such as Halstead’s and McCabe’s, could be used, but they are hard to apply to code fragments written in multiple programming languages. We propose using the statistical moments of indentation as a lightweight, language-independent, revision/diff-friendly proxy for classical complexity metrics. We have evaluated our approach against the entire CVS histories of 278 of the most popular and most active SourceForge projects. We found that our results are linearly correlated and rank-correlated with traditional measures of complexity, suggesting that measuring indentation is a cheap and accurate proxy for code complexity of revisions. Thus ranking revisions by the standard deviation and summation of indentation yields results that are very similar to ranking revisions by complexity.

