Results 1 - 10
of
11
Exploring the neighborhood with Dora to expedite software maintenance
- In 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM
, 2007
"... Completing software maintenance and evolution tasks for today’s large, complex software systems can be difficult, often requiring considerable time to understand the system well enough to make correct changes. Despite evidence that successful programmers use program structure as well as identifier n ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
Completing software maintenance and evolution tasks for today’s large, complex software systems can be difficult, often requiring considerable time to understand the system well enough to make correct changes. Despite evidence that successful programmers use program structure as well as identifier names to explore software, most existing program exploration techniques use either structural or lexical identifier information. By using only one type of information, automated tools ignore valuable clues about a developer’s intentions—clues critical to the human program comprehension process. In this paper, we present and evaluate a technique that exploits both program structure and lexical information to help programmers more effectively explore programs. Our approach uses structural information to focus automated program exploration and lexical information to prune irrelevant structure edges from consideration. For the important program exploration step of expanding from a seed, our experimental results demonstrate that an integrated lexical- and structural-based approach is significantly more effective than a state-of-the-art structural program exploration technique.
Automatically capturing source code context of nl-queries for software maintenance and reuse
- In ICSE ’09: Proceedings of the 31st international conference on Software engineering
, 2009
"... As software systems continue to grow and evolve, locating code for maintenance and reuse tasks becomes increasingly difficult. Existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant, and few recomm ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
As software systems continue to grow and evolve, locating code for maintenance and reuse tasks becomes increasingly difficult. Existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant, and few recommend alternative words to help developers reformulate poor queries. In this paper, we present a novel approach that automatically extracts natural language phrases from source code identifiers and categorizes the phrases and search results in a hierarchy. Our contextual search approach allows developers to explore the word usage in a piece of software, helping them to quickly identify relevant program elements for investigation or to quickly recognize alternative words for query reformulation. An empirical evaluation of 22 developers reveals that our contextual search approach significantly outperforms the most closely related technique in terms of effort and effectiveness. 1.
Debugging method names
- In European Conference on Object-Oriented Programming (ECOOP
, 2009
"... Abstract. Meaningful method names are crucial for the readability and maintainability of software. Existing naming conventions focus on syntactic details, leaving programmers with little or no support in assuring meaningful names. In this paper, we show that naming conventions can go much further: w ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
Abstract. Meaningful method names are crucial for the readability and maintainability of software. Existing naming conventions focus on syntactic details, leaving programmers with little or no support in assuring meaningful names. In this paper, we show that naming conventions can go much further: we can mechanically check whether or not a method name and implementation are likely to be good matches for each other. The vast amount of software written in Java defines an implicit convention for pairing names and implementations. We exploit this to extract rules for method names, which are used to identify “naming bugs ” in well-known Java applications. We also present an approach for automatic suggestion of more suitable names in the presence of mismatch between name and implementation. 1
AMAP: Automatically Mining Abbreviation Expansions in Programs to Enhance Software Maintenance Tools
- MSR'08
, 2008
"... When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur with the expanded word, or occur more often in the code. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur with the expanded word, or occur more often in the code. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and therefore may miss meaningful pieces of code or relationships between software artifacts. In this paper, we present an automated approach to mining abbreviation expansions from source code to enhance software maintenance tools that utilize natural language information. Our scoped approach uses contextual information at the method, program, and general software level to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that our scoped approach provides a 57% improvement in accuracy over the current state of the art.
Exploiting the correspondence between micro patterns and class names, in
- Proceedings of the Eighth IEEE International Working Conference on Source Code Analysis and Manipulation, 2008
"... This paper argues that semantic information encoded in natural language identifiers is a largely neglected resource for program analysis. First we show that words in Java class names relate to class properties, expressed using the recently developed micro patterns language. We analyse a large corpus ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
This paper argues that semantic information encoded in natural language identifiers is a largely neglected resource for program analysis. First we show that words in Java class names relate to class properties, expressed using the recently developed micro patterns language. We analyse a large corpus of Java programs to create a database that links common class name words with micro patterns. Finally we report on prototype tools integrated with the Eclipse development environment. These tools use the database to inform programmers of particular problems or optimization opportunities in their code. 1
Mining source code to automatically split identifiers for software analysis
- Mining Software Repositories, International Workshop on
, 2009
"... Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their co ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their constituent words. Unlike natural languages, where space and punctuation are used to delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case, where words are delineated by uppercase letters or non-alphabetic characters. However, programmers also create identifiers by concatenating sequences of words together with no discernible delineation, which poses challenges to automatic identifier splitting. In this paper, we present an algorithm to automatically split identifiers into sequences of words by mining word frequencies in source code. With these word frequencies, our identifier splitter uses a scoring technique to automatically select the most appropriate partitioning for an identifier. In an evaluation of over 8000 identifiers from open source Java programs, our Samurai approach outperforms the existing state of the art techniques. 1.
Metaphors we Program By: Space, Action and Society in Java
"... Abstract. A corpus analysis of the standard Java documentation revealed the range of conceptual metaphors shared by library authors and users of packages such as java.util and java.bean. These metaphors included the expected mental models of internal program behaviour, but also consistent references ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract. A corpus analysis of the standard Java documentation revealed the range of conceptual metaphors shared by library authors and users of packages such as java.util and java.bean. These metaphors included the expected mental models of internal program behaviour, but also consistent references to a spatial image-world with material properties and flows. More surprisingly, program components are metaphorically understood as actors with beliefs and intentions, working together according to social relationships. Rather than mechanical imperative models or mathematical declarative ones, it seems that one of the most widespread bases for conceptual models of programming is of social entities that act as proxies for their developers. This may have significant implications for the design of new programming languages and environments. 1.
Impact of Limited Memory Resources
"... Since early variable mnemonics were limited to as few as six to eight characters, many early programmers abbreviated concepts in their variable names. The past thirty years has seen a steady increase in permitted name length and, slowly, an increase in the actual length of identifiers. However, in t ..."
Abstract
- Add to MetaCart
Since early variable mnemonics were limited to as few as six to eight characters, many early programmers abbreviated concepts in their variable names. The past thirty years has seen a steady increase in permitted name length and, slowly, an increase in the actual length of identifiers. However, in theory names can be too long. Most obviously, in object-oriented programs, names often involve chaining of method calls and field selectors (e.g., class.firstAssignment().name.trim()). While longer names bring the potential for easier comprehension through more embedded sub-words, there are practical limits to length given limited human memory resources. The central hypothesis studied herein is that names used in modern programs have reached this limit. Statistical models derived from an experiment involving 158 programmers of varying degrees of experience show that longer names extracted from production code take more time to process and reduce correctness in a simple recall activity. This has clear negative implications for any attempt to read, and hence comprehend or manipulate, the source code of modern software. The experiment also evaluates the advantage of identifiers having ties to a programmer’s persistent memory. Combined these results reinforce past proposals advocating the use of limited, consistent, and regular vocabulary in identifier names. In particular, good naming limits length and reduces the need for specialized vocabulary. 1
ToCamelCaseorUnderscore
"... Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is b ..."
Abstract
- Add to MetaCart
Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is better, empirical study forms a more appropriate basis for choosing between them. The central hypothesis considered herein is that identifier style affects the speed and accuracy of manipulating programs. An empirical study of 135 programmers and non-programmers was conducted to better understand the impact of identifier style on code readability. The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style. 1
To CamelCase or Under score
"... Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is b ..."
Abstract
- Add to MetaCart
Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is better, empirical study forms a more appropriate basis for choosing between them. The central hypothesis considered herein is that identifier style affects the speed and accuracy of manipulating programs. An empirical study of 135 programmers and non-programmers was conducted to better understand the impact of identifier style on code readability. The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style. 1

