Results 1 - 4 of 4
Reading Beside the Lines: Indentation as a Proxy for Complexity Metrics
Abstract - Cited by 5 (3 self)
Maintainers face the daunting task of wading through a collection of both new and old revisions, trying to ferret out revisions which warrant personal inspection. One can rank revisions by size/lines of code (LOC), but often, due to the distribution of the size of changes, revisions will be of similar size. If we can’t rank revisions by LOC, perhaps we can rank by Halstead’s and McCabe’s complexity metrics? However, these metrics are problematic when applied to code fragments (revisions) written in multiple languages: special parsers are required, which may not support the language or dialect used, and analysis tools may not understand code fragments. We propose using the statistical moments of indentation as a lightweight, language-independent, revision/diff-friendly metric that serves as a proxy for classical complexity metrics. We have extensively evaluated our approach against the entire CVS histories of 278 of the most popular and most active SourceForge projects. We found that our results are linearly correlated and rank-correlated with traditional measures of complexity, suggesting that measuring indentation is a cheap and accurate proxy for code complexity of revisions. Thus ranking revisions by the standard deviation and summation of indentation will be very similar to ranking revisions by complexity.
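The abstract does not spell out how the indentation statistics are computed, so the following is only a minimal sketch of the idea. It assumes that indentation is the count of leading whitespace columns per non-blank line, that tabs expand to 8 columns, and that the population standard deviation is used. The names `indent_width` and `indentation_moments` are hypothetical and do not come from the paper.

```python
import statistics


def indent_width(line, tab_size=8):
    """Return the number of leading whitespace columns in a line.

    Tabs are expanded to tab_size columns (an assumed convention).
    """
    expanded = line.expandtabs(tab_size)
    return len(expanded) - len(expanded.lstrip(" "))


def indentation_moments(lines, tab_size=8):
    """Summarise the indentation of the non-blank lines of a code fragment."""
    widths = [indent_width(line, tab_size) for line in lines if line.strip()]
    if not widths:
        return {"sum": 0, "mean": 0.0, "stdev": 0.0}
    return {
        "sum": sum(widths),
        "mean": statistics.fmean(widths),
        # Population standard deviation; the sample form is an equally
        # plausible choice and is not ruled out by the abstract.
        "stdev": statistics.pstdev(widths),
    }


if __name__ == "__main__":
    fragment = ["if x:", "    for y in z:", "        f(y)", "", "done()"]
    print(indentation_moments(fragment))
```

Because the sketch only looks at leading whitespace, it needs no parser and works on partial fragments such as diff hunks, which is the property the abstract emphasises.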
From Indentation Shapes to Code Structures
Abstract - Cited by 1 (1 self)
In a previous study, we showed that indentation was regular across multiple languages and that the variance in the level of indentation of a block of revised code is correlated with metrics such as McCabe cyclomatic complexity. Building on that work, the current paper investigates the relationship between the “shape” of the indentation of the revised code block (the “revision”) and the corresponding syntactic structure of the code. We annotated revisions matching these three indentation shapes: “flat” (all lines are equally indented), “slash” (indentation becomes increasingly deep), or “bubble” (indentation increases and then decreases). We then classified the code structure as one of: function definition, loop, expression, comment, etc. We studied thousands of revisions, coming from over 200 software projects, written in a variety of languages. Our study indicates that indentation shape correlates positively with code structure; that is, certain shapes typically correspond to certain code structures. For example, flat shapes commonly correspond to comments, while bubble shapes commonly correspond to conditionals and function definitions. These results can form the basis of a tool framework that can analyze code in a language-independent way to support browsing targeted to viewing particular code structures such as conditionals or comments.
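The three shapes named in the abstract can be illustrated with a small classifier. This is a sketch under assumed definitions, not the authors' annotation procedure: “slash” is read as non-decreasing indentation with at least one increase, “bubble” as a rise to a single peak followed by a fall, and anything else is labelled `"other"`. The function name `classify_shape` is hypothetical.

```python
def classify_shape(widths):
    """Classify a sequence of per-line indentation widths.

    Returns "flat", "slash", "bubble", or "other". The exact rules are
    assumptions made for illustration only.
    """
    if not widths:
        return "other"
    # Flat: every line has the same indentation.
    if all(w == widths[0] for w in widths):
        return "flat"
    pairs = list(zip(widths, widths[1:]))
    # Slash: indentation never decreases (and is not flat, so it increases).
    if all(b >= a for a, b in pairs):
        return "slash"
    # Bubble: rises to an interior peak, then falls back.
    peak = widths.index(max(widths))
    rising = all(b >= a for a, b in pairs[:peak])
    falling = all(b <= a for a, b in pairs[peak:])
    if rising and falling and 0 < peak < len(widths) - 1:
        return "bubble"
    return "other"


if __name__ == "__main__":
    for sample in ([4, 4, 4], [0, 4, 8], [0, 4, 4, 0], [0, 4, 0, 4]):
        print(sample, classify_shape(sample))
```

The input is a list of indentation widths rather than raw text, so the classifier stays independent of how tabs and spaces are counted.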
Reading Beside the Lines: Using Indentation to Rank Revisions by Complexity
Abstract
Maintainers often face the daunting task of wading through a collection of both new and old revisions, trying to ferret out those that warrant detailed inspection. Perhaps the most obvious way to rank revisions is by size in terms of lines of code (LOC); this technique has the advantage of being both simple and fast. However, it is well known that the vast majority of revisions are quite small, and so we would like a way of distinguishing between simple and complex changes of the same size. Classical complexity metrics, such as Halstead’s and McCabe’s, could be used, but they are hard to apply to code fragments written in multiple programming languages. We propose using the statistical moments of indentation as a lightweight, language-independent, revision/diff-friendly proxy for classical complexity metrics. We have evaluated our approach against the entire CVS histories of 278 of the most popular and most active SourceForge projects. We found that our results are linearly correlated and rank-correlated with traditional measures of complexity, suggesting that measuring indentation is a cheap and accurate proxy for code complexity of revisions. Thus ranking revisions by the standard deviation and summation of indentation yields results that are very similar to ranking revisions by complexity.
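As a rough illustration of ranking same-sized revisions by indentation, the sketch below orders revisions by the standard deviation of their indentation, using the sum as a tie-breaker. How the two statistics should be combined is not stated in the abstract, so this ordering is an assumption, as are the 8-column tab expansion, the revision identifiers, and the names `indent_widths` and `rank_revisions`.

```python
import statistics


def indent_widths(lines, tab_size=8):
    """Leading-whitespace width of each non-blank line (tabs expanded)."""
    widths = []
    for line in lines:
        if line.strip():
            expanded = line.expandtabs(tab_size)
            widths.append(len(expanded) - len(expanded.lstrip(" ")))
    return widths


def rank_revisions(revisions, tab_size=8):
    """Return revision ids ordered from most to least indentation-complex.

    `revisions` maps a revision id to the list of lines it changed.
    Sort key: (standard deviation, sum) of indentation, an assumed combination.
    """
    def key(item):
        widths = indent_widths(item[1], tab_size)
        stdev = statistics.pstdev(widths) if widths else 0.0
        return (stdev, sum(widths))

    ordered = sorted(revisions.items(), key=key, reverse=True)
    return [rev_id for rev_id, _ in ordered]


if __name__ == "__main__":
    # Hypothetical revisions of similar size.
    revisions = {
        "r1": ["a = 1", "b = 2"],
        "r2": ["if a:", "    if b:", "        c()"],
        "r3": ["    x()", "    y()"],
    }
    print(rank_revisions(revisions))
```

Here `r1` and `r3` have the same line count and zero variance, and the indentation sum is what separates them; `r2` ranks first because its nesting varies the most.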

