Results 1 - 10 of 17
Semi-Automatically Extracting FAQs to Improve Accessibility of Software Development Knowledge
- in "ICSE - 34th International Conference on Software Engineering
, 2012
"... HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L’archive ouverte p ..."
Abstract
-
Cited by 13 (0 self)
Relating Requirements to Implementation via Topic Analysis: Do Topics Extracted from Requirements Make Sense to Managers and Developers?
"... Abstract—Large organizations like Microsoft tend to rely on formal requirements documentation in order to specify and design the software products that they develop. These documents are meant to be tightly coupled with the actual implementation of the features they describe. In this paper we evaluat ..."
Abstract
-
Cited by 6 (3 self)
Abstract—Large organizations like Microsoft tend to rely on formal requirements documentation in order to specify and design the software products that they develop. These documents are meant to be tightly coupled with the actual implementation of the features they describe. In this paper we evaluate the value of high-level topic-based requirements traceability in the version control system, using Latent Dirichlet Allocation (LDA). We evaluate LDA topics with practitioners and check whether the topics and trends extracted match the perception that Program Managers and Developers have of the effort put into addressing certain topics. We found that effort extracted from version control that was relevant to a topic often matched the perception of the managers and developers of what occurred at the time. Furthermore, we found evidence that many of the identified topics made sense to practitioners and matched their perception of what occurred, but for some topics practitioners had difficulty interpreting and labelling them. In summary, we investigate the high-level traceability of requirements topics to version control commits via topic analysis and validate with the actual stakeholders the relevance of these topics extracted from requirements. Keywords: latent Dirichlet allocation (LDA); requirements; version control; traceability; topics; requirements engineering
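As a rough illustration of the kind of topic analysis this abstract describes (not the authors' pipeline; the gensim calls, the topic count of 20, and the shape of the document and commit inputs are assumptions), one can fit LDA on tokenized requirements documents and then score commits against the learned topics to approximate per-topic effort:

# Sketch: fit LDA on requirements documents, then estimate per-topic effort
# by scoring version-control commit messages against the learned topics.
from collections import defaultdict
from gensim import corpora, models

def fit_requirements_lda(tokenized_requirements, num_topics=20):
    # tokenized_requirements: one list of tokens per requirements document
    dictionary = corpora.Dictionary(tokenized_requirements)
    bows = [dictionary.doc2bow(doc) for doc in tokenized_requirements]
    lda = models.LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                          passes=10, random_state=1)
    return lda, dictionary

def effort_per_topic(lda, dictionary, tokenized_commits):
    # Sum each commit's topic memberships; larger totals suggest more effort
    # was spent on work related to that requirements topic.
    effort = defaultdict(float)
    for tokens in tokenized_commits:
        for topic_id, weight in lda.get_document_topics(dictionary.doc2bow(tokens)):
            effort[topic_id] += weight
    return dict(effort)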
A contextual approach towards more accurate duplicate bug report detection
- In Proc. of the 10th MSR, 2013
"... Abstract—Bug-tracking and issue-tracking systems tend to be populated with bugs, issues, or tickets written by a wide variety of bug reporters, with different levels of training and knowledge about the system being discussed. Many bug reporters lack the skills, vocabulary, knowledge, or time to effi ..."
Abstract
-
Cited by 6 (1 self)
Abstract—Bug-tracking and issue-tracking systems tend to be populated with bugs, issues, or tickets written by a wide variety of bug reporters, with different levels of training and knowledge about the system being discussed. Many bug reporters lack the skills, vocabulary, knowledge, or time to efficiently search the issue tracker for similar issues. As a result, issue trackers are often full of duplicate issues and bugs, and bug triaging is time consuming and error prone. Many researchers have approached the bug-deduplication problem using off-the-shelf information-retrieval tools, such as BM25F used by Sun et al. In our work, we extend the state of the art by investigating how contextual information, relying on our prior knowledge of software quality, software architecture, and system-development (LDA) topics, can be exploited to improve bug deduplication. We demonstrate the effectiveness of our contextual bug-deduplication method on the bug repository of the Android ecosystem. Based on this experience, we conclude that researchers should not ignore the context of software engineering when using IR tools for deduplication. Index Terms—duplicate bug reports; triaging; textual similarity; contextual information; machine learning; information retrieval; deduplication
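A minimal sketch of the general idea of combining textual similarity with a contextual signal (plain BM25 via the rank_bm25 package plus a single hand-picked component feature; the 0.8/0.2 weights and field names are assumptions, and this is not the paper's BM25F-based method):

# Sketch: rank candidate duplicates by textual similarity combined with a
# simple contextual feature, then return the top candidates for triaging.
from rank_bm25 import BM25Okapi

def rank_duplicates(new_report, existing_reports, text_w=0.8, ctx_w=0.2, top_k=10):
    # existing_reports: dicts with 'tokens' (word list) and 'component' fields
    bm25 = BM25Okapi([r["tokens"] for r in existing_reports])
    text_scores = bm25.get_scores(new_report["tokens"])
    scored = []
    for report, text_score in zip(existing_reports, text_scores):
        # Contextual signal: reports filed against the same component are more
        # likely to be duplicates of each other.
        same_component = 1.0 if report["component"] == new_report["component"] else 0.0
        scored.append((text_w * text_score + ctx_w * same_component, report))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [report for _, report in scored[:top_k]]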
NL-based Query Refinement and Contextualized Code Search Results: A User Study
"... Abstract—As software systems continue to grow and evolve, lo-cating code for software maintenance tasks becomes increasingly difficult. Source code search tools match a developer’s keyword-style or natural language query with comments and identifiers in the source code to identify relevant methods t ..."
Abstract
-
Cited by 3 (0 self)
Abstract—As software systems continue to grow and evolve, locating code for software maintenance tasks becomes increasingly difficult. Source code search tools match a developer's keyword-style or natural language query with comments and identifiers in the source code to identify relevant methods that may need to be changed or understood to complete the maintenance task. In this search process, the developer faces a number of challenges: (1) formulating a query, (2) determining if the results are relevant, and (3) if the results are not relevant, reformulating the query. In this paper, we present an NL-based results view for searching source code for maintenance that helps address these challenges by integrating multiple feedback mechanisms into the search results view: prevalence of the query words in the result set, results grouped by NL-based information as well as a result list, and suggested alternative query words. Our search technique is implemented as an Eclipse plug-in, CONQUER, and has been empirically validated by 18 Java developers. Our results show that users prefer CONQUER over a state-of-the-art search technique, requesting customization of the interface in future query reformulation techniques. Index Terms—feature location, source code search, software maintenance
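The baseline matching step such tools build on can be sketched as TF-IDF retrieval over each method's identifiers and comments (a generic illustration with scikit-learn, not CONQUER's implementation; the data layout is assumed):

# Sketch: rank methods for a free-form query by TF-IDF cosine similarity over
# the text of their split identifiers and comments.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def search_methods(query, methods, top_k=10):
    # methods: dicts with 'name' and 'text' (split identifiers plus comments)
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform([m["text"] for m in methods])
    similarities = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = sorted(zip(similarities, [m["name"] for m in methods]), reverse=True)
    return [(name, float(score)) for score, name in ranked[:top_k]]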
A New Family of Software Anti-Patterns: Linguistic Anti-Patterns
"... Abstract—Recent and past studies have shown that poor source code lexicon negatively affects software understandability, maintainability, and, overall, quality. Besides a poor usage of lexicon and documentation, sometimes a software artifact description is misleading with respect to its implementati ..."
Abstract
-
Cited by 2 (1 self)
Abstract—Recent and past studies have shown that poor source code lexicon negatively affects software understandability, maintainability, and, overall, quality. Besides a poor usage of lexicon and documentation, sometimes a software artifact description is misleading with respect to its implementation. Consequently, developers will spend more time and effort when understanding these software artifacts, or even make wrong assumptions when they use them. This paper introduces the definition of software linguistic antipatterns, and defines a family of them, i.e., those related to inconsistencies (i) between method signatures, documentation, and behavior and (ii) between attribute names, types, and comments. Whereas “design” antipatterns represent recurring, poor design choices, linguistic antipatterns represent recurring, poor naming and commenting choices. The paper provides a first catalogue of one family of linguistic antipatterns, showing real examples of such antipatterns and explaining what kind of misunderstanding they can cause. Also, the paper proposes a detector prototype for Java programs called LAPD (Linguistic Anti-Pattern Detector), and reports a study investigating the presence of linguistic antipatterns in four Java software projects. Keywords: Software antipatterns, Source code lexicon, Textual analysis of software artifacts.
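A toy version of the detection idea, limited to two naming/typing inconsistencies and a regular expression over Java method signatures (this is only an illustration of the concept, not how LAPD works):

# Sketch: flag two naming inconsistencies in Java method signatures:
# a "get" method declared void, and a predicate-style "is"/"has" method
# whose return type is not boolean.
import re

# Matches e.g. "public static boolean isEmpty(" and captures return type and name.
SIGNATURE = re.compile(r"(?:public|protected|private)?\s*(?:static\s+)?"
                       r"(?P<ret>[\w<>\[\]]+)\s+(?P<name>\w+)\s*\(")

def lint_signatures(java_source):
    findings = []
    for match in SIGNATURE.finditer(java_source):
        ret, name = match.group("ret"), match.group("name")
        if re.match(r"get[A-Z]", name) and ret == "void":
            findings.append(f"{name}: 'get' method that does not return anything")
        if re.match(r"(?:is|has)[A-Z]", name) and ret not in ("boolean", "Boolean"):
            findings.append(f"{name}: predicate-style name with return type '{ret}'")
    return findings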
The MSR Cookbook: Mining a Decade of Research
"... community has grown significantly since the first MSR workshop was held in 2004. As the community continues to broaden its scope and deepens its expertise, it is worthwhile to reflect on the best practices that our community has developed over the past decade of research. We identify these best prac ..."
Abstract
-
Cited by 2 (0 self)
The MSR research community has grown significantly since the first MSR workshop was held in 2004. As the community continues to broaden its scope and deepen its expertise, it is worthwhile to reflect on the best practices that our community has developed over the past decade of research. We identify these best practices by surveying past MSR conferences and workshops. To that end, we review all 117 full papers published in the MSR proceedings between 2004 and 2012. We extract 268 comments from these papers, and categorize them using a grounded theory methodology. From this evaluation, four high-level themes were identified: data acquisition and preparation, synthesis, analysis, and sharing/replication. Within each theme we identify several common recommendations, and also examine how these recommendations have evolved over the past decade. In an effort to make this survey a living artifact, we also provide a public forum that contains the extracted recommendations in the hopes that the MSR community can engage in a continuing discussion on our evolving best practices.
Engineering Big Data Solutions
- in Proc. of FOSE’14, 2014
"... Structured and unstructured data in operational support tools have long been prevalent in software engineering. Simi-lar data is now becoming widely available in other domains. Software systems that utilize such operational data (OD) to help with software design and maintenance activities are increa ..."
Abstract
-
Cited by 2 (1 self)
Structured and unstructured data in operational support tools have long been prevalent in software engineering. Similar data is now becoming widely available in other domains. Software systems that utilize such operational data (OD) to help with software design and maintenance activities are increasingly being built despite the difficulties of drawing valid conclusions from disparate and low-quality data and the continuing evolution of operational support tools. This paper proposes systematizing approaches to the engineering of OD-based systems. To prioritize and structure research areas we consider historic developments, such as the big data hype; synthesize defining features of OD, such as confounded measures and unobserved context; and discuss emerging new applications, such as diverse and large OD collections and extremely short development intervals. To sustain the credibility of OD-based systems, more research will be needed to investigate effective existing approaches and to synthesize novel, OD-specific engineering principles.
Using Heuristics to Estimate an Appropriate Number of Latent Topics in Source Code Analysis
"... Latent Dirichlet Allocation (LDA) is a data clustering algorithm that performs especially well for text documents. In natural-language applications it automatically finds groups of related words (called “latent topics”) and clusters the documents into sets that are about the same “topic”. LDA has al ..."
Abstract
-
Cited by 1 (1 self)
Latent Dirichlet Allocation (LDA) is a data clustering algorithm that performs especially well for text documents. In natural-language applications it automatically finds groups of related words (called “latent topics”) and clusters the documents into sets that are about the same “topic”. LDA has also been applied to source code, where the documents are natural source code units such as methods or classes, and the words are the keywords, operators, and programmer-defined names in the code. Determining a topic count that most appropriately describes a set of source code documents is an open problem. We address this empirically by constructing clusterings with different numbers of topics for a large number of software systems, and then using a pair of measures based on source code locality and topic model similarity to assess how well the topic structure identifies related source code units. Results suggest that the topic count required can be closely approximated using the number of software code fragments in the system. We extend these results to recommend appropriate topic counts for arbitrary software systems based on an analysis of a set of open source systems.
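A sketch of how such a size-based heuristic would be applied in practice (gensim assumed; the square-root mapping from corpus size to topic count below is only a placeholder, not the calibrated heuristic reported in the paper):

# Sketch: derive a topic count from the number of source code units, then fit
# LDA over those units (methods or classes) treated as documents.
import math
from gensim import corpora, models

def fit_code_lda(tokenized_units):
    # tokenized_units: one token list per source code unit, containing its
    # identifiers, keywords, and operators.
    num_topics = max(2, int(math.sqrt(len(tokenized_units))))  # placeholder heuristic
    dictionary = corpora.Dictionary(tokenized_units)
    bows = [dictionary.doc2bow(unit) for unit in tokenized_units]
    return models.LdaModel(bows, id2word=dictionary, num_topics=num_topics,
                           random_state=1)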
Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data
"... ..."
(Show Context)
Semi-automatically Extracting FAQs to Improve Accessibility of Software Development Knowledge
- in "ICSE - 34th International Conference on Software Engineering", 2012 (author manuscript). DOI: 10.1109/ICSE.2012.6227139
"... Abstract—Frequently asked questions (FAQs) are a popular way to document software development knowledge. As creating such documents is expensive, this paper presents an approach for automatically extracting FAQs from sources of software development discussion, such as mailing lists and Internet foru ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract—Frequently asked questions (FAQs) are a popular way to document software development knowledge. As creating such documents is expensive, this paper presents an approach for automatically extracting FAQs from sources of software development discussion, such as mailing lists and Internet forums, by combining techniques of text mining and natural language processing. We apply the approach to popular mailing lists and carry out a survey among software developers to show that it is able to extract high-quality FAQs that may be further improved by experts.
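One simple way to realize the "group similar questions" step such an approach needs is TF-IDF similarity with greedy grouping (an illustration only, not the authors' method; the 0.6 threshold and input format are assumptions):

# Sketch: group near-duplicate questions from a mailing-list dump and emit one
# representative per group as an FAQ candidate.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def faq_candidates(questions, threshold=0.6):
    # questions: list of question strings mined from mailing-list threads
    vectorizer = TfidfVectorizer(stop_words="english")
    similarities = cosine_similarity(vectorizer.fit_transform(questions))
    assigned, groups = set(), []
    for i, question in enumerate(questions):
        if i in assigned:
            continue
        members = [j for j in range(len(questions)) if similarities[i, j] >= threshold]
        assigned.update(members)
        groups.append((len(members), question))
    # Questions that recur in many variants are the strongest FAQ candidates.
    groups.sort(reverse=True)
    return [question for count, question in groups if count > 1]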