Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace
In Automated Software Engineering (ASE 2007), 2007
Cited by 9 (0 self)
The paper presents a semi-automated technique for feature location in source code. The technique combines information from two different sources: an execution trace on the one hand, and the comments and identifiers from the source code on the other. Users execute a single partial scenario that exercises the desired feature, and all executed methods are identified from the collected trace. The source code is indexed using Latent Semantic Indexing, an Information Retrieval method, which allows users to write queries relevant to the desired feature and rank all the executed methods by their textual similarity to the query. Two case studies on open source software (JEdit and Eclipse) indicate that the new technique has high accuracy, comparable with previously published approaches, and that it is easy to use, as it considerably simplifies the dynamic analysis.
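The filtering-then-ranking idea in this abstract can be shown in a few lines. This is a minimal sketch under stated assumptions, not the authors' tool. It replaces Latent Semantic Indexing with a plain bag-of-words vector space model, using cosine similarity over raw term counts with no SVD. The method names, method texts, and trace below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split comments and identifiers into lowercase words (camelCase aware)."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", text)]


def cosine(a, b):
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_executed_methods(trace, method_text, query):
    """Keep only methods that appear in the trace, then rank them by similarity to the query."""
    executed = set(trace)  # dynamic filter: methods the scenario never ran are dropped
    q = Counter(tokenize(query))
    scores = {m: cosine(Counter(tokenize(method_text[m])), q)
              for m in executed if m in method_text}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical method texts (identifiers + comments) and a hypothetical trace.
METHOD_TEXT = {
    "Buffer.save": "saveBuffer writes the buffer to disk // save file",
    "Buffer.autosave": "autoSaveTimer periodically saves a backup",
    "View.repaint": "repaint the text area",
    "Search.find": "find next occurrence of search string",
}
TRACE = ["View.repaint", "Buffer.save", "Buffer.autosave", "View.repaint"]

ranked = rank_executed_methods(TRACE, METHOD_TEXT, "save file to disk")
print(ranked[0][0])  # Buffer.save
```

`Search.find` never enters the ranking because the scenario did not execute it. This is the narrowing step that the trace contributes before any textual scoring.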
Mining Business Topics in Source Code using Latent Dirichlet Allocation
Cited by 7 (0 self)
One of the difficulties in maintaining a large software system is the absence of documented business domain topics and of a correlation between these domain topics and the source code. Without such a correlation, people without any prior knowledge of the application would find it hard to comprehend the functionality of the system. Latent Dirichlet Allocation (LDA), a statistical model, has emerged as a popular technique for discovering topics in large text corpora. However, its applicability to extracting business domain topics from source code has not been explored so far. This paper investigates LDA in the context of comprehending large software systems and proposes a human-assisted approach based on LDA for extracting domain topics from source code. This method has been applied to a number of open source and proprietary systems. Preliminary results indicate that LDA is able to identify some of the domain topics and is a satisfactory starting point for further manual refinement of topics.
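For readers unfamiliar with LDA, the sketch below shows a standard collapsed Gibbs sampler run over identifier words. It is an illustration only. The paper's actual tooling, preprocessing, and hyperparameters are not given here, so `alpha`, `beta`, the iteration count, and the toy "documents" are all assumptions. The documents are word lists standing in for source files.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, top_n=3, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the top_n words of each topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    v_size = len(vocab)
    ndk = [[0] * num_topics for _ in docs]               # topic counts per document
    nkw = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    nk = [0] * num_topics                                # total words per topic
    z = []                                               # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)  # random initial topic
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z = t | rest) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + v_size * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [sorted(vocab, key=lambda w: -nkw[k][w])[:top_n] for k in range(num_topics)]


# Hypothetical "documents": identifier words extracted from four source files.
docs = [
    "invoice payment tax invoice total".split(),
    "payment invoice tax refund".split(),
    "shipment carrier tracking shipment".split(),
    "carrier tracking delivery shipment".split(),
]
topics = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(topics):
    print(k, words)
```

In the human-assisted setting the abstract describes, lists like `topics` would be a starting point for manual naming and refinement rather than a final result.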
Assigning Bug Reports using a Vocabulary-Based Expertise Model of Developers
2009
Cited by 6 (1 self)
For popular software systems, the number of bug reports submitted daily is high. Triaging these incoming reports is a time-consuming task. Part of bug triage is assigning a report to a developer with the appropriate expertise. In this paper, we present an approach to automatically suggest developers who have the appropriate expertise for handling a bug report. We model developer expertise using the vocabulary found in their source code contributions and compare this vocabulary to the vocabulary of bug reports. We evaluate our approach by comparing the suggested experts to the persons who eventually worked on the bug. Using eight years of Eclipse development as a case study, we achieve 33.6% top-1 precision and 71.0% top-10 recall.
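The following is a minimal sketch of a vocabulary-based expertise model. It assumes that each developer is represented by the term counts of the identifiers in code they contributed, and that a bug report is matched by cosine similarity. The paper's exact term weighting may differ, and the developers and code snippets are hypothetical.

```python
import math
import re
from collections import Counter


def words(text):
    """Lowercase word tokens; camelCase identifiers are split into parts."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", text)]


def expertise_model(contributions):
    """Map each developer to the term frequencies of all code they contributed."""
    model = {}
    for dev, code in contributions:
        model.setdefault(dev, Counter()).update(words(code))
    return model


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def suggest_developers(model, bug_report, k=3):
    """Return the k developers whose vocabulary is closest to the report's."""
    report = Counter(words(bug_report))
    ranked = sorted(model, key=lambda dev: cosine(model[dev], report), reverse=True)
    return ranked[:k]


# Hypothetical (developer, contributed code) pairs.
contributions = [
    ("alice", "void parseXmlDocument(XmlReader reader) { parseElement(reader); }"),
    ("bob", "void drawWidget(Canvas canvas) { canvas.fillRect(bounds); }"),
    ("alice", "XmlNode parseElement(XmlReader reader)"),
    ("carol", "void paintCanvas(Canvas canvas, Color color)"),
]
model = expertise_model(contributions)
suggested = suggest_developers(model, "NullPointerException when parsing a malformed XML element", k=2)
print(suggested[0])  # alice
```

Splitting identifiers such as `parseXmlDocument` into words is what lets code vocabulary meet the natural-language vocabulary of a report.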
Consistent Layout for Thematic Software Maps
2008
Cited by 5 (3 self)
Software visualizations can provide a concise overview of a complex software system. Unfortunately, since software has no physical shape, there is no “natural” mapping of software to a two-dimensional space. As a consequence, most visualizations tend to use a layout in which position and distance have no meaning, and the layout typically diverges from one visualization to another. We propose a consistent layout for software maps in which the position of a software artifact reflects its vocabulary, and distance corresponds to similarity of vocabulary. We use Latent Semantic Indexing (LSI) to map software artifacts to a vector space, and then use Multidimensional Scaling (MDS) to map this vector space down to two dimensions. The resulting consistent layout allows us to develop a variety of thematic software maps that express very different aspects of software while making it easy to compare them. The approach is especially suitable for comparing views of evolving software, since the vocabulary of software artifacts tends to be stable over time.
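The second stage of this pipeline reduces high-dimensional vocabulary vectors to a 2-D layout. It can be illustrated with classical (Torgerson) multidimensional scaling. This sketch assumes that the pairwise dissimilarities between artifacts are already computed (for example, from their LSI vectors) and are roughly Euclidean. It omits the LSI step, and the paper may use a different MDS variant.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Embed n items in `dims` dimensions so Euclidean distances approximate `dist`."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means, since dist is symmetric
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J D^2 J, the Gram matrix of the centered points.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary starting vector
        for _ in range(iters):  # power iteration for the dominant eigenvector
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-12:
            break  # no remaining positive eigenvalue: the remaining axes stay at 0
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical artifacts whose dissimilarities happen to be 2-D Euclidean.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[math.dist(p, q) for q in points] for p in points]
layout = classical_mds(dist)
print(layout)
```

The layout depends only on pairwise distances. As a result, the recovered coordinates may be rotated or mirrored relative to the originals, while all distances are preserved.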
A Summary of the International Standard Date and Time Notation, http://www.cl.cam.ac.uk/mgk25/iso-time.html
Cited by 5 (0 self)
Software Cartography: thematic software visualization with consistent layout
Recommending library methods: An evaluation of the vector space model (VSM) and latent semantic indexing (LSI)
In 8th International Conference on Software Reuse, 2006
Cited by 4 (1 self)
The development and maintenance of a reuse repository requires significant investment, planning and managerial support. To minimise risk and ensure a healthy return on investment, reusable components should be accessible, reliable and of high quality. In this paper we concentrate on accessibility; we describe a technique that enables a developer to effectively and conveniently make use of large-scale libraries. Unlike most previous solutions to component retrieval, our tool, RASCAL, is a proactive component recommender. RASCAL recommends a set of task-relevant reusable components to a developer. Recommendations are produced using Collaborative Filtering (CF). We compare and contrast CF effectiveness when using two information retrieval techniques, namely the Vector Space Model (VSM) and Latent Semantic Indexing (LSI). We validate our technique on real-world examples and find the overall results encouraging; notably, RASCAL can produce reasonably good recommendations when they are most valuable, i.e., at an early stage in code development.
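The following is a minimal sketch of collaborative-filtering recommendation in the vector space model. It assumes that each class acts as a "user" whose vector holds call counts of library methods. Methods are then recommended from the k most similar classes. The class and method names are hypothetical, and the sketch does not cover the LSI variant that the paper also evaluates.

```python
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(a[m] * b.get(m, 0) for m in a)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recommend(usage, active, k=2, top_n=3):
    """usage: class name -> {library method: call count}. Recommend methods for `active`."""
    target = usage[active]
    neighbours = sorted(
        ((cosine(target, calls), name) for name, calls in usage.items() if name != active),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, name in neighbours:
        for method in usage[name]:
            if method not in target:  # only suggest methods the developer has not used yet
                scores[method] += sim  # similarity-weighted vote
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [m for m, s in ranked[:top_n] if s > 0]


# Hypothetical usage data mined from existing classes.
usage = {
    "ReportWriter": {"FileWriter.write": 3, "FileWriter.close": 1, "BufferedWriter.newLine": 2},
    "LogWriter": {"FileWriter.write": 5, "FileWriter.close": 1, "FileWriter.flush": 2},
    "ImageLoader": {"ImageIO.read": 1, "File.exists": 1},
    "NewExporter": {"FileWriter.write": 1},  # class currently under development
}
recs = recommend(usage, "NewExporter")
print(recs)  # ['FileWriter.close', 'FileWriter.flush', 'BufferedWriter.newLine']
```

Even with a single call in `NewExporter`, the sketch already produces suggestions. This loosely mirrors the abstract's point about recommendations being valuable early in development.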
A Mutation / Injection-based Automatic Framework for Evaluating Clone Detection Tools
In Mutation’09, 2009
Cited by 4 (3 self)
In recent years many methods and tools for software clone detection have been proposed. While some work has been done on assessing and comparing the performance of these tools, very little empirical evaluation has been done. In particular, accuracy measures such as precision and recall have only been roughly estimated, due both to problems in creating a validated clone benchmark against which tools can be compared, and to the manual effort required to hand-check large numbers of candidate clones. In this paper we propose an automated method for empirically evaluating clone detection tools that leverages mutation-based techniques to overcome these limitations by automatically synthesizing large numbers of known clones based on an editing theory of clone creation. Our framework is effective in measuring the recall and precision of clone detection tools for various types of fine-grained clones in real systems without manual intervention.
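The core loop of such a framework has three steps: mutate a fragment to create a clone of known type, inject it, then check whether a detector reports it. The sketch below illustrates this loop. Several parts are hypothetical: the single mutation operator (consistent identifier renaming, which yields a Type-2 clone), the `RESERVED` word list, and the `file:start-end` location strings. Only recall is shown.

```python
import re

# Hypothetical list of names the operator leaves untouched (keywords/builtins).
RESERVED = {"int", "for", "return", "if", "else", "while", "in", "range", "def"}


def rename_identifiers(fragment, prefix="v"):
    """Mutation operator: consistently rename identifiers (yields a Type-2 clone)."""
    mapping = {}

    def swap(match):
        name = match.group(0)
        if name in RESERVED:
            return name
        mapping.setdefault(name, f"{prefix}{len(mapping)}")  # same name -> same new name
        return mapping[name]

    return re.sub(r"[A-Za-z_]\w*", swap, fragment)


def recall(injected_pairs, reported_pairs):
    """Fraction of synthesized clone pairs that the detector reported (order-insensitive)."""
    reported = {frozenset(p) for p in reported_pairs}
    found = sum(1 for p in injected_pairs if frozenset(p) in reported)
    return found / len(injected_pairs) if injected_pairs else 0.0


mutated = rename_identifiers("total = total + price * qty")
print(mutated)  # v0 = v0 + v1 * v2

# Hypothetical injected clone pairs and a hypothetical detector report.
injected = [("a.py:10-20", "b.py:30-40"), ("c.py:1-9", "c.py:50-58")]
reported = [("b.py:30-40", "a.py:10-20"), ("d.py:1-5", "e.py:1-5")]
print(recall(injected, reported))  # 0.5
```

Because every injected pair is known by construction, recall can be computed without anyone hand-checking candidate clones.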
A unified meta-model for concept-based reverse engineering
In Proceedings of the 3rd International Workshop on Metamodels, Schemas, Grammars and Ontologies (ATEM’06), 2006
Cited by 3 (1 self)
While programming is modeling reality, reverse engineering is concerned with recovering that reality from the code. Parts of this reality can be formalized as concepts and relations among them. As previous research suggests, the identification of these concepts is a key issue in automating program analysis. Their central role requires advanced reverse engineering tasks to treat them as first-class citizens. In this paper we unify the classical, structure-based reverse engineering meta-models with a meta-model describing concepts and their relations. Our unified meta-model establishes an explicit mapping between concepts and their implementations in a program. Instances of the meta-model are built semi-automatically by analyzing the program’s identifiers. Using this model allows us to raise the abstraction level by viewing the program from the perspective of the concepts it implements. This enables a higher degree of automation in the reverse engineering endeavor.
How Programs Represent Reality (and how they don’t)
2006
Cited by 3 (1 self)
Programming is modeling reality. Most of the time, the mapping between source code and real-world concepts is captured implicitly in the names of identifiers. Making these mappings explicit enables us to regard programs from a conceptual perspective and thereby to detect semantic defects such as (logical) redundancies in the implementation of concepts and improper naming of program entities. We present real-world examples of these problems found in the Java standard library and establish a formal framework that allows their concise classification. Based on this framework, we present our method for recovering the mappings between the code and the real-world concepts expressed as ontologies. These explicit mappings enable semi-automatic identification of the discussed defect classes.
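One simple way to surface the naming defects mentioned above takes three steps. First, split identifiers into words. Next, map the words to ontology concepts. Finally, flag concepts that appear under more than one name. This is a toy sketch with a hypothetical ontology, not the authors' formal framework.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: word -> concept it denotes.
ONTOLOGY = {
    "customer": "Customer", "client": "Customer",
    "remove": "Delete", "delete": "Delete", "erase": "Delete",
    "order": "Order",
}


def split_identifier(name):
    """Split camelCase / snake_case identifiers into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)]


def concept_synonyms(identifiers, ontology=ONTOLOGY):
    """Return concepts that are named with more than one word across the identifiers."""
    words_per_concept = defaultdict(set)
    for ident in identifiers:
        for word in split_identifier(ident):
            if word in ontology:
                words_per_concept[ontology[word]].add(word)
    return {c: sorted(ws) for c, ws in words_per_concept.items() if len(ws) > 1}


identifiers = ["removeCustomer", "deleteOrder", "client_id", "orderTotal"]
synonyms = concept_synonyms(identifiers)
print(synonyms)  # {'Delete': ['delete', 'remove'], 'Customer': ['client', 'customer']}
```

`Order` is not reported because it is always spelled the same way. Only concepts with competing names are flagged for review.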
Automatic Labeling of Software Components and their Evolution using Log-Likelihood Ratio of Word Frequencies in Source Code
Cited by 3 (3 self)
As more and more open-source software components become available on the internet, we need automatic ways to label and compare them. For example, a developer who searches for reusable software must be able to quickly gain an understanding of retrieved components. This understanding cannot be gained at the level of source code due to the semantic gap between source code and the domain model. In this paper we present a lexical approach that uses the log-likelihood ratios of word frequencies to automatically provide labels for software components. We present a prototype implementation of our labeling/comparison algorithm and provide examples of its application. In particular, we apply the approach to detect trends in the evolution of a software system.
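The labeling step can be illustrated with Dunning's log-likelihood ratio (G²). This statistic scores how surprising a word's frequency in one component is compared with a reference corpus. The sketch below is minimal and assumes both inputs are already tokenized word lists. Keeping only over-represented words and returning the top three are illustrative choices, not necessarily the paper's settings.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's G^2 for a word seen a times among c component words and b times among d reference words."""
    e1 = c * (a + b) / (c + d)  # expected count in the component
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def label_component(component_words, reference_words, top_n=3):
    """Return the top_n words most characteristic of the component."""
    comp, ref = Counter(component_words), Counter(reference_words)
    c, d = sum(comp.values()), sum(ref.values())
    if c == 0 or d == 0:
        return []
    scored = []
    for word, a in comp.items():
        b = ref[word]
        if a / c > b / d:  # keep only words over-represented in the component
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [word for _, word in scored[:top_n]]


# Hypothetical word lists for one component and for the rest of the system.
component = "parse token token lexer token parse the the".split()
reference = "the the the draw window the parse render window".split()
labels = label_component(component, reference)
print(labels)  # ['token', 'lexer', 'parse']
```

Running the same scoring on successive versions of a component would be one way to track how its labels shift over time. This relates to the evolution analysis the abstract mentions.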

