Quantifying Identifier Quality: An Analysis of Trends (Empirical Software Engineering manuscript)
BibTeX
@MISC{Lawrie_empiricalsoftware,
author = {Dawn Lawrie and Henry Feild and David Binkley},
title = {Quantifying Identifier Quality: An Analysis of Trends},
note = {Empirical Software Engineering manuscript},
year = {}
}
Abstract
Identifiers, which represent the defined concepts in a program, account for almost three quarters of source code by some measures. The makeup of identifiers plays a key role in how well they communicate these concepts. This paper presents an empirical study of identifier quality based on almost 50 million lines of code covering thirty years, four programming languages, and both open and proprietary source. For the purposes of the study, identifier quality is conservatively defined as whether the identifier can be constructed from dictionary words or known abbreviations. Four hypotheses related to identifier quality are tested using linear mixed-effects regression models. For example, the first hypothesis is that modern programs include higher-quality identifiers than older ones; the results support it, indicating that better programming practices are producing higher-quality identifiers. The results also confirm some commonly held beliefs, such as that proprietary code contains more acronyms than open source code.
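To make the quality definition concrete, here is a minimal sketch of one way to check whether an identifier can be built from dictionary words or known abbreviations. It assumes identifiers are first split on underscores, digits, and camelCase boundaries, and that each resulting token must then be segmentable into known words (so "filename" counts as "file" + "name"). The tiny word and abbreviation sets, and the function names `split_identifier`, `can_segment`, `is_quality_identifier`, and `quality_fraction`, are hypothetical illustrations. They are not the study's actual dictionary, splitting algorithm, or tooling.

```python
import re
from functools import lru_cache

# Hypothetical toy vocabularies; the study's real dictionary and
# abbreviation lists are not described in the abstract.
DICTIONARY = {"file", "name", "get", "count", "buffer", "size", "user", "max", "line"}
ABBREVIATIONS = {"str", "ptr", "buf", "len", "num", "cnt", "idx"}
KNOWN = DICTIONARY | ABBREVIATIONS


def split_identifier(identifier: str) -> list[str]:
    """Split on underscores, digits, and camelCase boundaries; lowercase the parts."""
    tokens = []
    for part in re.split(r"[_\d]+", identifier):
        # Uppercase runs not followed by lowercase are acronyms ("HTTP");
        # otherwise take an optional capital plus lowercase letters ("Header").
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", part))
    return [t.lower() for t in tokens if t]


def can_segment(token: str) -> bool:
    """True if the token is a concatenation of known words/abbreviations."""

    @lru_cache(maxsize=None)
    def ok(start: int) -> bool:
        # Dynamic programming over split points, so a greedy prefix match
        # cannot block a valid segmentation later in the token.
        if start == len(token):
            return True
        return any(
            token[start:end] in KNOWN and ok(end)
            for end in range(start + 1, len(token) + 1)
        )

    return ok(0)


def is_quality_identifier(identifier: str) -> bool:
    """An identifier qualifies only if every token it splits into is segmentable."""
    tokens = split_identifier(identifier)
    return bool(tokens) and all(can_segment(t) for t in tokens)


def quality_fraction(identifiers: list[str]) -> float:
    """Share of identifiers that meet the (conservative) quality definition."""
    if not identifiers:
        return 0.0
    return sum(is_quality_identifier(i) for i in identifiers) / len(identifiers)


if __name__ == "__main__":
    sample = ["getFileName", "strbuflen", "xq7z", "tmpVar"]
    for ident in sample:
        print(ident, split_identifier(ident), is_quality_identifier(ident))
    print(quality_fraction(sample))
```

With these toy sets, "getFileName" and "strbuflen" qualify, while "xq7z" and "tmpVar" do not ("tmp" and "var" are absent from the sketch's vocabularies), giving a fraction of 0.5. The definition is conservative in the same sense the abstract describes: an identifier that a human would read easily still fails if any of its parts is missing from the word lists.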
