Results 1 - 10
of
23
Feature Location in Source Code: A Taxonomy and Survey
- Journal of Software Maintenance and Evolution: Research and Practice
, 2011
"... Feature location is the activity of identifying an initial location in the source code that implements functionality in a software system. Many feature location techniques have been introduced that automate some or all of this process, and a comprehensive overview of this large body of work would be ..."
Abstract
-
Cited by 80 (14 self)
- Add to MetaCart
Feature location is the activity of identifying an initial location in the source code that implements functionality in a software system. Many feature location techniques have been introduced that automate some or all of this process, and a comprehensive overview of this large body of work would be beneficial to researchers and practitioners. This paper presents a systematic literature survey of feature location techniques. Eighty-nine articles from 25 venues have been reviewed and classified within the taxonomy in order to organize and structure existing work in the field of feature location. The paper also discusses open issues and defines future directions in the field of feature location.
MAPO: Mining and recommending API usage patterns
- In European Conference on Object-Oriented Programming (ECOOP
, 2009
"... Abstract. To improve software productivity, when constructing new software systems, programmers often reuse existing libraries or frameworks by invoking methods provided in their APIs. Those API methods, however, are often complex and not well documented. To get familiar with how those API methods a ..."
Abstract
-
Cited by 73 (10 self)
- Add to MetaCart
(Show Context)
Abstract. To improve software productivity, when constructing new software systems, programmers often reuse existing libraries or frameworks by invoking methods provided in their APIs. Those API methods, however, are often complex and not well documented. To get familiar with how those API methods are used, programmers often exploit a source code search tool to search for code snippets that use the API methods of interest. However, the returned code snippets are often large in number, and the huge number of snippets places a barrier for programmers to locate useful ones. In order to help programmers overcome this barrier, we have developed an API usage mining framework and its supporting tool called MAPO (Mining API usage Pattern from Open source repositories) for mining API usage patterns automatically. A mined pattern describes that in a certain usage scenario, some API methods are frequently called together and their usages follow some sequential rules. MAPO further recommends the mined API usage patterns and their associated code snippets upon programmers ’ requests. Our experimental results show that with these patterns MAPO helps programmers locate useful code snippets more effectively than two state-of-the-art code search tools. To investigate whether MAPO can assist programmers in programming tasks, we further conducted an empirical study. The results show that using MAPO, programmers produce code with fewer bugs when facing relatively complex API usages, comparing with using the two state-of-the-art code search tools. 1
Codebook: Discovering and Exploiting Relationships in Software Repositories
"... Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engine ..."
Abstract
-
Cited by 64 (9 self)
- Add to MetaCart
(Show Context)
Large-scale software engineering requires communication and collaboration to successfully build and ship products. We conducted a survey with Microsoft engineers on inter-team coordination and found that the most impactful problems concerned finding and keeping track of other engineers. Since engineers are connected by their shared work, a tool that discovers connections in their work-related repositories can help. Here we describe the Codebook framework for mining software repositories. It is flexible enough to address all of the problems identified by our survey with a single data structure (graph of people and artifacts) and a single algorithm (regular language reachability). Codebook handles a larger variety of problems than prior work, analyzes more kinds of work artifacts, and can be customized by and for end-users. To evaluate our framework’s flexibility, we built two applications, Hoozizat and Deep Intellisense. We evaluated these applications with engineers to show effectiveness in addressing multiple inter-team coordination problems. Categories and Subject Descriptors:
On the Naturalness of Software
"... Abstract—Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily lif ..."
Abstract
-
Cited by 48 (8 self)
- Add to MetaCart
(Show Context)
Abstract—Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension. We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations—and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether a) code can be usefully modeled by statistical language models and b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse’s completion capability. We conclude the paper by laying out a vision for future research in this area. Keywords-language models; n-gram; nature language processing; code completion; code suggestion I.
Debugadvisor: A recommender system for debugging
- In Proceedings of ESEC/FSE ’09
, 2009
"... In large software development projects, when a programmer is assigned a bug to fix, she typically spends a lot of time searching (in an ad-hoc manner) for instances from the past where similar bugs have been debugged, analyzed and resolved. Systematic search tools that allow the programmer to expres ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
(Show Context)
In large software development projects, when a programmer is assigned a bug to fix, she typically spends a lot of time searching (in an ad-hoc manner) for instances from the past where similar bugs have been debugged, analyzed and resolved. Systematic search tools that allow the programmer to express the context of the current bug, and search through diverse data repositories associated with large projects can greatly improve the productivity of debugging. This paper presents the design, implementation and experience from such a search tool called DebugAdvisor. The context of a bug includes all the information a programmer has about the bug, including natural language text, textual rendering of core dumps, debugger output etc. Our key insight is to allow the programmer to collate this entire context as a query to search for related information. Thus, DebugAdvisor allows the programmer to search using a fat query, which could be kilobytes of structured and unstructured data describing the contextual information for the current bug. Information retrieval in the presence of fat queries and variegated data repositories, all of which contain a mix of structured and unstructured data is a challenging problem. We present novel ideas to solve this problem. We have deployed DebugAdvisor to over 100 users inside Microsoft. In addition to standard metrics such as precision and recall, we present extensive qualitative and quantitative feedback from our users.
Using Data Fusion and Web Mining to Support Feature Location in Software
- in Proceedings of 18th IEEE International Conference on Program Comprehension (ICPC'10
, 2010
"... Abstract—Data fusion is the process of integrating multiple sources of information such that their combination yields better results than if the data sources are used individually. This paper applies the idea of data fusion to feature location, the process of identifying the source code that impleme ..."
Abstract
-
Cited by 21 (10 self)
- Add to MetaCart
(Show Context)
Abstract—Data fusion is the process of integrating multiple sources of information such that their combination yields better results than if the data sources are used individually. This paper applies the idea of data fusion to feature location, the process of identifying the source code that implements specific functionality in software. A data fusion model for feature location is presented which defines new feature location techniques based on combining information from textual, dynamic, and web mining analyses applied to software. A novel contribution of the proposed model is the use of advanced web mining algorithms to analyze execution information during feature location. The results of an extensive evaluation indicate that the new feature location techniques based on web mining improve the effectiveness of existing approaches by as much as 62%. Keywords-feature location; data fusion; information retrieval; dynamic analysis; web mining I.
Automated API Property Inference Techniques
- IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
, 2012
"... Frameworks and libraries offer reusable and customizable functionality through Application Programming Interfaces (APIs). Correctly using large and sophisticated APIs can represent a challenge due to hidden assumptions and requirements. Numerous approaches have been developed to infer properties of ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
Frameworks and libraries offer reusable and customizable functionality through Application Programming Interfaces (APIs). Correctly using large and sophisticated APIs can represent a challenge due to hidden assumptions and requirements. Numerous approaches have been developed to infer properties of APIs, intended to guide their use by developers. With each approach come new definitions of API properties, new techniques for inferring these properties, and new ways to assess their correctness and usefulness. This paper provides a comprehensive survey of over a decade of research on automated property inference for APIs. Our survey provides a synthesis of this complex technical field along different dimensions of analysis: properties inferred, mining techniques, and empirical results. In particular, we derive a classification and organization of over 60 techniques into five different categories based on the type of API property inferred: unordered usage patterns, sequential usage patterns, behavioral specifications, migration mappings, and general information.
Exemplar: A source code search engine for finding highly relevant applications
- IEEE Transactions on Software Engineering
, 2011
"... Abstract—A fundamental problem of finding software applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and low-level implementation details of applications. To reduce this mismatch we created an approac ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
(Show Context)
Abstract—A fundamental problem of finding software applications that are highly relevant to development tasks is the mismatch between the high-level intent reflected in the descriptions of these tasks and low-level implementation details of applications. To reduce this mismatch we created an approach called Exemplar (EXEcutable exaMPLes ARchive) for finding highly relevant software projects from large archives of applications. After a programmer enters a natural-language query that contains high-level concepts (e.g., MIME, data sets), Exemplar retrieves applications that implement these concepts. Exemplar ranks applications in three ways. First, we consider the descriptions of applications. Second, we examine the Application Programming Interface (API) calls used by applcations. Third, we analyze the dataflow among those API calls. We performed two case studies (with professional and student developers) to evaluate how these three rankings contribute to the quality of the search results from Exemplar. The results of our studies show that the combined ranking of application descriptions and API documents yields the most-relevant search results. We released Exemplar and our case study data to the public. Index Terms—Source code search engines, information retrieval, concept location, open source software, mining software repositories, software reuse. 1
Automatically Recommending Triage Decisions for Pragmatic Reuse Tasks
"... Abstract—Planning a complex software modification task imposes a high cognitive burden on developers, who must juggle navigating the software, understanding what they see with respect to their task, and deciding how their task should be performed given what they have discovered. Pragmatic reuse task ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
(Show Context)
Abstract—Planning a complex software modification task imposes a high cognitive burden on developers, who must juggle navigating the software, understanding what they see with respect to their task, and deciding how their task should be performed given what they have discovered. Pragmatic reuse tasks, where source code is reused in a white-box fashion, is an example of a complex and error-prone modification task: the developer must plan out which portions of a system to reuse, extract the code, and integrate it into their own system. In this paper we present a recommendation system that automates some aspects of the planning process undertaken by developers during pragmatic reuse tasks. In a retroactive evaluation, we demonstrate that our technique was able to provide the correct recommendation 64 % of the time and was incorrect 25 % of the time. Our case study suggests that developer investigative behaviour is positively influenced by the use of the recommendation system. Keywords—pragmatic software reuse tasks; triage decisions; recommendation systems; structural relevance; cost; source code analysis; retroactive evaluation. I.
Understanding Interaction Differences between Newcomer and Expert Programmers
"... Newcomer and expert programmers often interact with development artifacts differently. Ideally, software development tools should support these different styles of work. In this paper, we describe our investigations into the interaction difference between newcomers and experts, regarding two propert ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
(Show Context)
Newcomer and expert programmers often interact with development artifacts differently. Ideally, software development tools should support these different styles of work. In this paper, we describe our investigations into the interaction difference between newcomers and experts, regarding two properties that characterize repetition of programmer interaction: temporal locality and interaction coupling recurrence. We describe our approach, research questions and planned methodology.