Results 1 -
9 of
9
What’s in a name? a study of identifiers
- In 14th International Conference on Program Comprehension
, 2006
"... Readers of programs have two main sources of domain information: identifier names and comments. When functions are uncommented, as many are, comprehension is almost exclusively dependent on the identifier names. Assuming that writers of programs want to create quality identifiers (e.g., include rele ..."
Abstract
-
Cited by 18 (6 self)
- Add to MetaCart
Readers of programs have two main sources of domain information: identifier names and comments. When functions are uncommented, as many are, comprehension is almost exclusively dependent on the identifier names. Assuming that writers of programs want to create quality identifiers (e.g., include relevant domain knowledge) how should they go about it? For example, do the initials of a concept name provide enough information to represent the concept? If not, and a longer identifier is needed, is an abbreviation satisfactory or does the concept need to be captured in an identifier that includes full words? Results from a study designed to investigate these questions are reported. The study involved over 100 programmers who were asked to describe twelve different functions. The functions used three different “levels ” of identifiers: single letters, abbreviations, and full words. Responses allow the level of comprehension associated with the different levels to be studied. The functions include standard algorithms studied in computer science courses as well as functions extracted from production code. The results show that full word identifiers lead to the best comprehension; however, in many cases, there is no statistical difference between full words and abbreviations. 1
D.: Syntactic identifier conciseness and consistency
- In: SCAM ’06, IEEE CS Press
, 2006
"... Readers of programs have two main sources of domain information: identifier names and comments. It is therefore important for the identifier names (as well as comments) to communicate clearly the concepts that they are meant to represent. Deißenböck and Pizka recently introduced rules for concise an ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Readers of programs have two main sources of domain information: identifier names and comments. It is therefore important for the identifier names (as well as comments) to communicate clearly the concepts that they are meant to represent. Deißenböck and Pizka recently introduced rules for concise and consistent variable naming. One requirement of their approach is an expert provided mapping from identifiers to concepts. An approach for the concise and consistent naming of variables that does not require any additional information (e.g., a mapping) is presented. Using a pool of 48 million lines of code, experiments with the resulting syntactic rules for concise and consistent naming illustrate that violations of the syntactic pattern exist. Two case studies show that three quarters of the violations uncovered are “real”. That is they would be identified using a concept mapping. Techniques for reducing the number of false positives are also presented. Finally, two related studies show that evolution does not introduce rule violations and that programmers tend to use a rather limited vocabulary.
Research An Empirical Study of Rules for Well-Formed Identifiers
"... Readers of programs have two main sources of domain information: identifier names and comments. In order to efficiently maintain source code, it is important for the identifier names (as well as comments) to communicate clearly the concepts that they represent. Deißenböck and Pizka recently introduc ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Readers of programs have two main sources of domain information: identifier names and comments. In order to efficiently maintain source code, it is important for the identifier names (as well as comments) to communicate clearly the concepts that they represent. Deißenböck and Pizka recently introduced two rules for creating well-formed identifiers: one considers the consistency of identifiers and the other their conciseness. These rules require a mapping from identifiers to the concepts that they represent, which may be costly to develop after the initial release of a system.
Software Fault Prediction using Language Processing
"... Accurate prediction of faulty modules reduces the cost of software development and evolution. Two case studies with a language-processing based fault prediction measure are presented. The measure, refereed to as a QALP score, makes use of techniques from information retrieval to judge software quali ..."
Abstract
- Add to MetaCart
Accurate prediction of faulty modules reduces the cost of software development and evolution. Two case studies with a language-processing based fault prediction measure are presented. The measure, refereed to as a QALP score, makes use of techniques from information retrieval to judge software quality. The QALP score has been shown to correlate with human judgements of software quality. The two case studies consider the measure’s application to fault prediction using two programs (one open source, one proprietary). Linear mixed-effects regression models are used to identify relationships between defects and QALP score. Results, while complex, show that little correlation exists in the first case study, while statistically significant correlations exists in the second. In this second study the QALP score is helpful in predicting faults in modules (files) with its usefulness growing as module size increases.
Empirical Software Engineering manuscript No. (will be inserted by the editor) Quantifying Identifier Quality: An Analysis of Trends
"... The date of receipt and acceptance will be inserted by the editor Abstract Identifiers, which represent the defined concepts in a program, account for, by some measures, almost three quarters of source code. The makeup of identifiers plays a key role in how well they communicate these defined concep ..."
Abstract
- Add to MetaCart
The date of receipt and acceptance will be inserted by the editor Abstract Identifiers, which represent the defined concepts in a program, account for, by some measures, almost three quarters of source code. The makeup of identifiers plays a key role in how well they communicate these defined concepts. An empirical study of identifier quality based on almost 50 million lines of code, covering thirty years, four programming languages, and both open and proprietary source is presented. For the purposes of the study, identifier quality is conservatively defined as the possibility of constructing the identifier out of dictionary words or known abbreviations. Four hypotheses related to identifier quality are considered using linear mixed effect regression models. For example, the first hypothesis is that modern programs include higher quality identifiers than older ones. In this case, the results show that better programming practices are producing higher quality identifies. Results also confirm some commonly held beliefs, such as proprietary code having more acronyms than open source code.
Impact of Limited Memory Resources
"... Since early variable mnemonics were limited to as few as six to eight characters, many early programmers abbreviated concepts in their variable names. The past thirty years has seen a steady increase in permitted name length and, slowly, an increase in the actual length of identifiers. However, in t ..."
Abstract
- Add to MetaCart
Since early variable mnemonics were limited to as few as six to eight characters, many early programmers abbreviated concepts in their variable names. The past thirty years has seen a steady increase in permitted name length and, slowly, an increase in the actual length of identifiers. However, in theory names can be too long. Most obviously, in object-oriented programs, names often involve chaining of method calls and field selectors (e.g., class.firstAssignment().name.trim()). While longer names bring the potential for easier comprehension through more embedded sub-words, there are practical limits to length given limited human memory resources. The central hypothesis studied herein is that names used in modern programs have reached this limit. Statistical models derived from an experiment involving 158 programmers of varying degrees of experience show that longer names extracted from production code take more time to process and reduce correctness in a simple recall activity. This has clear negative implications for any attempt to read, and hence comprehend or manipulate, the source code of modern software. The experiment also evaluates the advantage of identifiers having ties to a programmer’s persistent memory. Combined these results reinforce past proposals advocating the use of limited, consistent, and regular vocabulary in identifier names. In particular, good naming limits length and reduces the need for specialized vocabulary. 1
ToCamelCaseorUnderscore
"... Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is b ..."
Abstract
- Add to MetaCart
Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is better, empirical study forms a more appropriate basis for choosing between them. The central hypothesis considered herein is that identifier style affects the speed and accuracy of manipulating programs. An empirical study of 135 programmers and non-programmers was conducted to better understand the impact of identifier style on code readability. The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style. 1
To CamelCase or Under score
"... Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is b ..."
Abstract
- Add to MetaCart
Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is better, empirical study forms a more appropriate basis for choosing between them. The central hypothesis considered herein is that identifier style affects the speed and accuracy of manipulating programs. An empirical study of 135 programmers and non-programmers was conducted to better understand the impact of identifier style on code readability. The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style. 1
SUMMARY
"... Readers of programs have two main sources of domain information: identifier names and comments. In order to efficiently maintain source code, it is important that the identifier names (as well as comments) communicate clearly the concepts they represent. Deißenböck and Pizka recently introduced two ..."
Abstract
- Add to MetaCart
Readers of programs have two main sources of domain information: identifier names and comments. In order to efficiently maintain source code, it is important that the identifier names (as well as comments) communicate clearly the concepts they represent. Deißenböck and Pizka recently introduced two rules for creating well-formed identifiers: one considers the consistency of identifiers and the other their conciseness. These rules require a mapping from identifiers to the concepts they represent, which may be costly to develop after the initial release of a system. An approach for verifying whether identifiers are well formed without any additional information (e.g., a concept mapping) is developed. Using a pool of 48 million lines of code, experiments with the resulting syntactic rules for well-formed identifiers illustrate that violations of the syntactic pattern exist. Two case studies show that three-quarters of these violations are ‘real’. That is, they could be identified using a concept mapping. Three related studies show that programmers tend to use a rather limited vocabulary, that, contrary to many other aspects of system evolution, maintenance does not introduce additional rule violations, and that open and proprietary sources differ in their percentage

