Results 1 - 10
of
27
Exploring the neighborhood with Dora to expedite software maintenance
- In 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE/ACM
, 2007
"... Completing software maintenance and evolution tasks for today’s large, complex software systems can be difficult, often requiring considerable time to understand the system well enough to make correct changes. Despite evidence that successful programmers use program structure as well as identifier n ..."
Abstract
-
Cited by 56 (7 self)
- Add to MetaCart
(Show Context)
Completing software maintenance and evolution tasks for today’s large, complex software systems can be difficult, often requiring considerable time to understand the system well enough to make correct changes. Despite evidence that successful programmers use program structure as well as identifier names to explore software, most existing program exploration techniques use either structural or lexical identifier information. By using only one type of information, automated tools ignore valuable clues about a developer’s intentions—clues critical to the human program comprehension process. In this paper, we present and evaluate a technique that exploits both program structure and lexical information to help programmers more effectively explore programs. Our approach uses structural information to focus automated program exploration and lexical information to prune irrelevant structure edges from consideration. For the important program exploration step of expanding from a seed, our experimental results demonstrate that an integrated lexical- and structural-based approach is significantly more effective than a state-of-the-art structural program exploration technique.
Automatically capturing source code context of nl-queries for software maintenance and reuse
- In ICSE ’09: Proceedings of the 31st international conference on Software engineering
, 2009
"... As software systems continue to grow and evolve, locating code for maintenance and reuse tasks becomes increasingly difficult. Existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant, and few recomm ..."
Abstract
-
Cited by 48 (7 self)
- Add to MetaCart
(Show Context)
As software systems continue to grow and evolve, locating code for maintenance and reuse tasks becomes increasingly difficult. Existing static code search techniques using natural language queries provide little support to help developers determine whether search results are relevant, and few recommend alternative words to help developers reformulate poor queries. In this paper, we present a novel approach that automatically extracts natural language phrases from source code identifiers and categorizes the phrases and search results in a hierarchy. Our contextual search approach allows developers to explore the word usage in a piece of software, helping them to quickly identify relevant program elements for investigation or to quickly recognize alternative words for query reformulation. An empirical evaluation of 22 developers reveals that our contextual search approach significantly outperforms the most closely related technique in terms of effort and effectiveness. 1.
Mining source code to automatically split identifiers for software analysis
- Mining Software Repositories, International Workshop on
, 2009
"... Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their co ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
(Show Context)
Automated software engineering tools (e.g., program search, concern location, code reuse, quality assessment, etc.) increasingly rely on natural language information from comments and identifiers in code. The first step in analyzing words from identifiers requires splitting identifiers into their constituent words. Unlike natural languages, where space and punctuation are used to delineate words, identifiers cannot contain spaces. One common way to split identifiers is to follow programming language naming conventions. For example, Java programmers often use camel case, where words are delineated by uppercase letters or non-alphabetic characters. However, programmers also create identifiers by concatenating sequences of words together with no discernible delineation, which poses challenges to automatic identifier splitting. In this paper, we present an algorithm to automatically split identifiers into sequences of words by mining word frequencies in source code. With these word frequencies, our identifier splitter uses a scoring technique to automatically select the most appropriate partitioning for an identifier. In an evaluation of over 8000 identifiers from open source Java programs, our Samurai approach outperforms the existing state of the art techniques. 1.
Debugging method names
- In European Conference on Object-Oriented Programming (ECOOP
, 2009
"... Abstract. Meaningful method names are crucial for the readability and maintainability of software. Existing naming conventions focus on syntactic details, leaving programmers with little or no support in assuring meaningful names. In this paper, we show that naming conventions can go much further: w ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
(Show Context)
Abstract. Meaningful method names are crucial for the readability and maintainability of software. Existing naming conventions focus on syntactic details, leaving programmers with little or no support in assuring meaningful names. In this paper, we show that naming conventions can go much further: we can mechanically check whether or not a method name and implementation are likely to be good matches for each other. The vast amount of software written in Java defines an implicit convention for pairing names and implementations. We exploit this to extract rules for method names, which are used to identify “naming bugs ” in well-known Java applications. We also present an approach for automatic suggestion of more suitable names in the presence of mismatch between name and implementation. 1
Mining Source Code Repositories at Massive Scale using Language Modeling
- in The 10th Working Conference on Mining Software Repositories
"... Abstract—The tens of thousands of high-quality open source software projects on the Internet raise the exciting possibility of studying software development by finding patterns across truly large source code repositories. This could enable new tools for developing code, encouraging reuse, and naviga ..."
Abstract
-
Cited by 23 (5 self)
- Add to MetaCart
(Show Context)
Abstract—The tens of thousands of high-quality open source software projects on the Internet raise the exciting possibility of studying software development by finding patterns across truly large source code repositories. This could enable new tools for developing code, encouraging reuse, and navigating large projects. In this paper, we build the first giga-token probabilistic language model of source code, based on 352 million lines of Java. This is 100 times the scale of the pioneering work by Hindle et al. The giga-token model is significantly better at the code suggestion task than previous models. More broadly, our approach provides a new “lens ” for analyzing software projects, enabling new complexity metrics based on statistical analysis of large corpora. We call these metrics data-driven complexity metrics. We propose new metrics that measure the complexity of a code module and the topical centrality of a module to a software project. In particular, it is possible to distinguish reusable utility classes from classes that are part of a program’s core logic based solely on general information theoretic criteria. I.
AMAP: Automatically Mining Abbreviation Expansions in Programs to Enhance Software Maintenance Tools
- MSR'08
, 2008
"... When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur with the expanded word, or occur more often in the code. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur with the expanded word, or occur more often in the code. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and therefore may miss meaningful pieces of code or relationships between software artifacts. In this paper, we present an automated approach to mining abbreviation expansions from source code to enhance software maintenance tools that utilize natural language information. Our scoped approach uses contextual information at the method, program, and general software level to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that our scoped approach provides a 57% improvement in accuracy over the current state of the art.
Learning natural coding conventions
- In Symposium on the Foundations of Software Engineering (FSE
, 2014
"... Every programmer has a characteristic style, ranging from pref-erences about identifier naming to preferences about object rela-tionships and design patterns. Coding conventions define a consis-tent syntactic style, fostering readability and hence maintainability. When collaborating, programmers str ..."
Abstract
-
Cited by 11 (4 self)
- Add to MetaCart
(Show Context)
Every programmer has a characteristic style, ranging from pref-erences about identifier naming to preferences about object rela-tionships and design patterns. Coding conventions define a consis-tent syntactic style, fostering readability and hence maintainability. When collaborating, programmers strive to obey a project’s coding conventions. However, one third of reviews of changes contain feedback about coding conventions, indicating that programmers do not always follow them and that project members care deeply about adherence. Unfortunately, programmers are often unaware of coding conventions because inferring them requires a global view, one that aggregates the many local decisions programmers make and identifies emergent consensus on style. We present NATURAL-IZE, a framework that learns the style of a codebase, and suggests revisions to improve stylistic consistency. NATURALIZE builds on recent work in applying statistical natural language processing to source code. We apply NATURALIZE to suggest natural identifier names and formatting conventions. We present four tools focused on ensuring natural code during development and release management, including code review. NATURALIZE achieves 94 % accuracy in its top suggestions for identifier names. We used NATURALIZE to generate 18 patches for 5 open source projects: 14 were accepted.
Exploiting the correspondence between micro patterns and class names, in
- Proceedings of the Eighth IEEE International Working Conference on Source Code Analysis and Manipulation, 2008
"... This paper argues that semantic information encoded in natural language identifiers is a largely neglected resource for program analysis. First we show that words in Java class names relate to class properties, expressed using the recently developed micro patterns language. We analyse a large corpus ..."
Abstract
-
Cited by 9 (1 self)
- Add to MetaCart
(Show Context)
This paper argues that semantic information encoded in natural language identifiers is a largely neglected resource for program analysis. First we show that words in Java class names relate to class properties, expressed using the recently developed micro patterns language. We analyse a large corpus of Java programs to create a database that links common class name words with micro patterns. Finally we report on prototype tools integrated with the Eclipse development environment. These tools use the database to inform programmers of particular problems or optimization opportunities in their code. 1
To CamelCase or Under score
"... Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is b ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
Naming conventions are generally adopted in an effort to improve program comprehension. Two of the most popular conventions are alternatives for composing multi-word identifiers: the use of underscores and the use of camel casing. While most programmers have a personal opinion as to which style is better, empirical study forms a more appropriate basis for choosing between them. The central hypothesis considered herein is that identifier style affects the speed and accuracy of manipulating programs. An empirical study of 135 programmers and non-programmers was conducted to better understand the impact of identifier style on code readability. The experiment builds on past work of others who study how readers of natural language perform such tasks. Results indicate that camel casing leads to higher accuracy among all subjects regardless of training, and those trained in camel casing are able to recognize identifiers in the camel case style faster than identifiers in the underscore style. 1
Metaphors we Program By: Space, Action and Society in Java
"... Abstract. A corpus analysis of the standard Java documentation revealed the range of conceptual metaphors shared by library authors and users of packages such as java.util and java.bean. These metaphors included the expected mental models of internal program behaviour, but also consistent references ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
(Show Context)
Abstract. A corpus analysis of the standard Java documentation revealed the range of conceptual metaphors shared by library authors and users of packages such as java.util and java.bean. These metaphors included the expected mental models of internal program behaviour, but also consistent references to a spatial image-world with material properties and flows. More surprisingly, program components are metaphorically understood as actors with beliefs and intentions, working together according to social relationships. Rather than mechanical imperative models or mathematical declarative ones, it seems that one of the most widespread bases for conceptual models of programming is of social entities that act as proxies for their developers. This may have significant implications for the design of new programming languages and environments. 1.