Codebook: Discovering and Exploiting Relationships in Software Repositories
Abstract
-
Cited by 13 (6 self)
Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help. Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework’s flexibility, we built two applications, Hoozizat and Deep Intellisense. We evaluated these applications with engineers to show effectiveness in addressing multiple inter-team coordination problems.
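The Codebook abstract names a single data structure (a graph of people and artifacts) and a single algorithm (regular language reachability). The following is a minimal sketch of that general technique, not the Codebook implementation: it explores the product of a labeled graph and a small deterministic automaton. The edge labels ("authored", "modifies"), the node names, and the function name are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical graph of people and artifacts: (source, edge label, target).
EDGES = [
    ("alice", "authored", "commit1"),
    ("commit1", "modifies", "parser.c"),
    ("bob", "authored", "commit2"),
    ("commit2", "modifies", "parser.c"),
    ("bob", "authored", "commit3"),
    ("commit3", "modifies", "lexer.c"),
]

# Hypothetical automaton for the label sequence "authored modifies":
# (state, label) -> next state. State 0 is the start, state 2 accepts.
DFA = {(0, "authored"): 1, (1, "modifies"): 2}


def regular_reachability(edges, dfa, start_state, accepting, source):
    """Return the nodes reachable from `source` along a path whose
    sequence of edge labels is accepted by the automaton."""
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))

    # Breadth-first search over (graph node, automaton state) pairs.
    seen = {(source, start_state)}
    queue = deque(seen)
    results = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            results.add(node)
        for label, target in adjacency.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is None:
                continue  # the automaton rejects this label here
            pair = (target, next_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return results


files_touched_by_bob = regular_reachability(EDGES, DFA, 0, {2}, "bob")
```

Changing only the automaton changes the question being asked of the same graph, which is one way to read the abstract's claim that a single algorithm can cover several coordination problems.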
The Past, Present, and Future of Software Evolution
"... Change is an essential characteristic of software development, as software systems must respond to evolving requirements, platforms, and other environmental pressures. In this paper, we discuss the concept of software evolution from several perspectives. We examine how it relates to and differs from ..."
Abstract
-
Cited by 9 (2 self)
Change is an essential characteristic of software development, as software systems must respond to evolving requirements, platforms, and other environmental pressures. In this paper, we discuss the concept of software evolution from several perspectives. We examine how it relates to and differs from software maintenance. We discuss insights about software evolution arising from Lehman’s laws of software evolution and the staged lifecycle model of Bennett and Rajlich. We compare software evolution to other kinds of evolution, from science and social sciences, and we examine the forces that shape change. Finally, we discuss the changing nature of software in general as it relates to evolution, and we propose open challenges and future directions for software evolution research.
Improving API Documentation Using API Usage Information
, 2009
Abstract
-
Cited by 8 (5 self)
Jadeite is a new Javadoc-like API documentation system that takes advantage of multiple users’ aggregate experience to reduce the difficulties that programmers have learning new APIs. Previous studies have shown that programmers often guessed that certain classes or methods should exist, and looked for these in the API. Jadeite’s “placeholders” let users add new “pretend” classes or methods that are displayed in the actual API documentation and can be annotated with the appropriate APIs to use instead. Since studies showed that programmers had difficulty finding the right classes from long lists in documentation, Jadeite takes advantage of usage statistics to display commonly used classes more prominently. Programmers had difficulty discovering how to instantiate objects, so Jadeite uses a large corpus of sample code to automatically identify the most common ways to construct an instance of any given class. An evaluation showed that programmers were about three times faster at performing common tasks with Jadeite than with standard Javadoc.
Identifying cross-cutting concerns using software repository mining
, 2009
Abstract
-
Cited by 3 (1 self)
Cross-cutting concerns are pieces of functionality that have not been captured into a separate module, thereby hindering program comprehension and maintainability. Solving these problems requires first identifying these cross-cutting concerns in pieces of software. Several methods for identification have been proposed, but the option of using software repository mining has largely been left unexplored. That technique can uncover relationships between modules that may not be present in the source code and thereby provide a different perspective on the cross-cutting concerns in a software system. We perform software repository mining on the repositories of two software systems for which the cross-cutting concerns are known: JHotDraw and Tomcat. Based on the results of the evaluation, we make some suggestions for future directions in the area of identifying cross-cutting concerns using software repository mining.
What’s a Typical Commit? A Characterization of Open Source Software Repositories
Abstract
-
Cited by 3 (1 self)
The research examines the version histories of nine open source software systems to uncover trends and characteristics of how developers commit source code to version control systems (e.g., Subversion). The goal is to characterize what a typical or normal commit looks like with respect to the number of files, number of lines, and number of hunks committed together. The results for these three characteristics are presented, and the commits are categorized from extra small to extra large. The findings show that approximately 75% of commits are quite small for the systems examined along all three characteristics. Additionally, the commit messages are examined along with the characteristics. The most common words are extracted from the commit messages and correlated with the size categories of the commits. It is observed that size categories can be indicative of the types of maintenance activities being performed.
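The categorization described in this abstract (files, lines, and hunks per commit, each binned from extra small to extra large) can be sketched as below. The abstract does not state the category boundaries, so every threshold here is a placeholder assumption rather than a value from the paper, and the function names are illustrative.

```python
import bisect

SIZE_LABELS = ["extra small", "small", "medium", "large", "extra large"]

# Hypothetical inclusive upper bounds for the first four categories.
# These numbers are NOT taken from the paper; they only make the sketch run.
HYPOTHETICAL_BOUNDS = {
    "files": [1, 5, 10, 50],
    "lines": [5, 50, 200, 1000],
    "hunks": [1, 5, 20, 100],
}


def categorize(value, bounds):
    """Map a count to a size label using sorted inclusive upper bounds."""
    return SIZE_LABELS[bisect.bisect_left(bounds, value)]


def categorize_commit(commit):
    """Label one commit along each of the three characteristics."""
    return {
        metric: categorize(commit[metric], bounds)
        for metric, bounds in HYPOTHETICAL_BOUNDS.items()
    }


example = categorize_commit({"files": 3, "lines": 40, "hunks": 6})
```

Keeping the three characteristics separate, rather than collapsing them into one score, mirrors the abstract's statement that results are reported along all three.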
From Java to UpgradeJ: An empirical study
Abstract
-
Cited by 2 (0 self)
UpgradeJ is a variant of Java that offers linguistic support for lightweight dynamic software updating (DSU), or hotswapping. UpgradeJ allows co-existing multiple versions of classes and adapts Java’s type system to provide incremental typechecking. This paper provides some preliminary, but encouraging, results of an empirical study into the applicability of UpgradeJ. By analysing how classes in popular, open-source Java applications change from release to release, we are able to estimate the proportion of those changes that could be made dynamically in UpgradeJ. Although these applications were not designed with DSU in mind, we find that many of the changes to classes could be supported by the UpgradeJ DSU model without any significant code rewriting.
Software Intelligence: The Future of Mining Software Engineering Data
Abstract
-
Cited by 2 (1 self)
Mining software engineering data has emerged as a successful research direction over the past decade. In this position paper, we advocate Software Intelligence (SI) as the future of mining software engineering data, within modern software engineering research, practice, and education. We coin the name SI as an inspiration from the Business Intelligence (BI) field, which offers concepts and techniques to improve business decision making by using fact-based support systems. Similarly, SI offers software practitioners (not just developers) up-to-date and pertinent information to support their daily decision-making processes. SI should support decision-making processes throughout the lifetime of a software system, not just during its development phase. The vision of SI has yet to become a reality that would enable software engineering research to have a strong impact on modern software practice. Nevertheless, recent advances in the Mining Software Repositories (MSR) field show great promise and provide strong support for realizing SI in the near future. This position paper summarizes the state of practice and research of SI, and lays out future research directions for mining software engineering data to enable SI.
Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining
- EMPIRICAL SOFTWARE ENGINEERING
, 2011
Abstract
-
Cited by 2 (2 self)
Many software production processes advocate rigorous development testing alongside normal code writing, which implies that both test code and production code should co-evolve. To gain insight into the nature of this co-evolution, this paper proposes three views (realized by a tool called TeMo) that combine information from a software project's versioning system, the size of the various artifacts, and the test coverage reports. We validate these views against two open source projects and one industrial software project, and evaluate our results with the help of log messages, code inspections, and the original developers of the software system. With these views we could recognize different co-evolution scenarios (i.e., synchronous and phased) and make relevant observations for both developers and test engineers.
The Ultimate Debian Database: Consolidating Bazaar Metadata for Quality Assurance and Data Mining
Abstract
-
Cited by 1 (0 self)
require a lot more complex infrastructures than most other FLOSS projects. In the case of community-driven distributions like Debian, the development of such an infrastructure is often not very organized, leading to new data sources being added in an impromptu manner while hackers set up new services that gain acceptance in the community. Mixing and matching data is then harder than it should be, although it is badly needed for Quality Assurance and data mining. Massive refactoring and integration is not a viable solution either, due to the constraints imposed by the bazaar development model. This paper presents the Ultimate Debian Database (UDD), which is the countermeasure adopted by the Debian project to the above “data hell”. UDD gathers data from various data sources into a single, central SQL database, turning Quality Assurance needs that could not be easily implemented before into simple SQL queries. The paper also discusses the customs that have contributed to the data hell, the lessons learnt while designing UDD, and its applications and potential for data mining on FLOSS distributions.
Keywords: open source; distribution; data warehouse; quality assurance; data mining
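To illustrate the idea of turning a Quality Assurance question into a simple SQL query over one central database, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in, and the two tables, their columns, and the sample rows are invented for the example; they are not the actual UDD schema or data.

```python
import sqlite3

# Stand-in database with a hypothetical two-table schema (not the UDD schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packages (name TEXT PRIMARY KEY, maintainer TEXT);
    CREATE TABLE bugs (id INTEGER PRIMARY KEY, package TEXT, status TEXT);
""")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [("foo", "alice"), ("bar", "alice"), ("baz", "bob")],
)
conn.executemany(
    "INSERT INTO bugs VALUES (?, ?, ?)",
    [(1, "foo", "open"), (2, "bar", "open"), (3, "bar", "closed"), (4, "baz", "open")],
)


def open_bug_counts(connection):
    """Count open bugs per maintainer by joining the two data sources."""
    return connection.execute("""
        SELECT p.maintainer, COUNT(*) AS open_bugs
        FROM packages AS p
        JOIN bugs AS b ON b.package = p.name
        WHERE b.status = 'open'
        GROUP BY p.maintainer
        ORDER BY open_bugs DESC, p.maintainer
    """).fetchall()


counts = open_bug_counts(conn)
```

Once package metadata and bug reports sit in the same database, the cross-source question reduces to a single join, which is the kind of simplification the abstract describes.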
Investigating the Evolution of Bad Smells in Object-Oriented Code
Abstract
-
Cited by 1 (0 self)
Software design problems are known and perceived under many different terms such as bad smells, flaws, noncompliance to design principles, violation of heuristics, excessive metric values and antipatterns, signifying the importance of handling them in the construction and maintenance of software. Once a design problem is identified, it can be removed by applying an appropriate refactoring, improving in most cases several aspects of quality such as maintainability, comprehensibility and reusability. This paper, taking advantage of recent advances and tools in the identification of non-trivial bad smells, explores the presence and evolution of such problems by analyzing past versions of code. Several interesting questions can be investigated, such as whether the number of problems increases with the passage of software generations, whether problems vanish over time or only by targeted human intervention, whether bad smells occur in the course of evolution of a module or exist right from the beginning, and whether refactorings targeting smell removal are frequent. In contrast to previous studies that investigate the application of refactorings in the history of a software project, we attempt to study the subject from the point of view of the problems themselves, distinguishing deliberate maintenance activities from the removal of design problems as a side effect of software evolution. Results are discussed for two open-source systems and three bad smells.

