Sourcerer: A search engine for open source code supporting structure-based search
- In Proc. Int’l Conf. Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’06), 2006
Cited by 20 (1 self)
Sourcerer is a search engine for open source code that extracts fine-grained structural information from the code. This information is used both to implement a basic notion of code rank and to enable search forms that go beyond conventional keyword-based searches. Sourcerer supports two types of searches: (1) for implementations and their uses, and (2) for program structures. Several schemes for ranking code search results were compared. Results are reported for 1,555 open source Java projects, corresponding to 254 thousand classes and 17 million LOC. Of the schemes compared, the one that produced the best search results combined (a) the standard TF-IDF technique over the Fully Qualified Names (FQNs) of code entities, (b) a “boosting” factor for terms found toward the right-most end of FQNs, and (c) a graph-rank algorithm that identifies popular classes.
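The three-part ranking scheme described above can be sketched roughly as follows. This is a minimal illustration, not Sourcerer's implementation: the camelCase splitting, the linear right-most boost, the smoothed IDF, the weighted-sum composition, and the parameters `boost` and `beta` are all assumptions, and the `popularity` scores are taken as given rather than computed by a graph-rank algorithm.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")

def split_segment(segment):
    """Split one camelCase/PascalCase segment into lowercase words."""
    return [w.lower() for w in WORD.findall(segment)]

def weighted_terms(fqn, boost=1.0):
    """Term weights for one FQN; terms in right-most segments get a larger weight."""
    segments = fqn.split(".")
    n = len(segments)
    weights = Counter()
    for i, seg in enumerate(segments):
        # Assumed linear boost: left-most segment gets 1.0, right-most gets 1.0 + boost.
        factor = 1.0 + boost * (i / (n - 1) if n > 1 else 1.0)
        for term in split_segment(seg):
            weights[term] += factor
    return weights

def rank(query, fqns, popularity, boost=1.0, beta=0.5):
    """Order FQNs by boosted TF-IDF text score plus beta * popularity (assumed composition)."""
    docs = {f: weighted_terms(f, boost) for f in fqns}
    n_docs = len(docs)
    df = Counter(t for w in docs.values() for t in w)  # documents containing each term
    q_terms = [t for word in query.split() for t in split_segment(word)]
    scores = {}
    for f, w in docs.items():
        # Smoothed IDF so that a term present in every document still counts.
        text = sum(w[t] * math.log(1 + n_docs / df[t]) for t in q_terms if t in w)
        scores[f] = text + beta * popularity.get(f, 0.0)
    return sorted(scores, key=scores.get, reverse=True)
```

With the FQNs `org.blank.util.Helper` and `org.util.StringUtils.isBlank` and no popularity data, the query `blank` ranks `isBlank` first because the matching term sits in the right-most segment; a large enough popularity score for `Helper` reverses that order.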
Combining Probabilistic Ranking and Latent Semantic Indexing for Feature Identification
- in Proceedings of 14th IEEE International Conference on Program Comprehension (ICPC'06), 2006
Cited by 19 (4 self)
The paper recasts the problem of feature location in source code as a decision-making problem in the presence of uncertainty. The main contribution is the combination of two existing techniques for feature location in source code. Both techniques produce a set of ranked facts from the software as a solution to the feature identification problem. One technique is based on a scenario-based probabilistic ranking of events observed while executing a program under given scenarios. The other is defined as an information retrieval task, based on Latent Semantic Indexing of the source code. We show the viability and effectiveness of the combined technique with two case studies. The first case study replicates feature identification in Mozilla, which allows us to compare the results directly with previously published data. The other case study is a bug location problem in Mozilla. The results show that the combined technique improves feature identification significantly with respect to each technique used independently.
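Combining two rankers can be illustrated with a small sketch. Here each technique is reduced to a map from method name to score; the min-max normalization, the affine weighting with a hypothetical parameter `lam`, and scoring a method that one technique did not rank as 0 are assumptions for illustration, not the paper's exact formulation.

```python
def normalize(scores):
    """Min-max normalize a {method: score} map into [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {m: 1.0 for m in scores}
    return {m: (s - lo) / (hi - lo) for m, s in scores.items()}

def combine(spr, lsi, lam=0.5):
    """Affine combination of two rankers' scores (assumed scheme).

    spr: scores from the scenario-based probabilistic ranking.
    lsi: scores from the LSI-based information retrieval ranking.
    A method missing from one ranker contributes 0 for that ranker.
    """
    a, b = normalize(spr), normalize(lsi)
    methods = set(a) | set(b)
    combined = {m: lam * a.get(m, 0.0) + (1 - lam) * b.get(m, 0.0) for m in methods}
    # Highest combined score first; ties broken by method name for determinism.
    return sorted(combined, key=lambda m: (-combined[m], m))
```

For example, with `spr = {"a": 0.9, "b": 0.1, "c": 0.5}` and `lsi = {"a": 0.2, "b": 0.8, "c": 0.7}`, neither ranker alone puts `c` first, but the equal-weight combination does, which is the kind of effect the combined technique relies on.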
Static Techniques for Concept Location in Object-Oriented Code
- in Proceedings of 13th IEEE International Workshop on Program Comprehension (IWPC'05), 2005
Cited by 18 (7 self)
Concept location in source code is the process that identifies where a software system implements a specific concept. While it is well accepted that concept location is essential for the maintenance of complex procedural code, such as code written in C, it is much less obvious whether it is also needed for the maintenance of object-oriented code. After all, object-oriented code is structured into classes, and well-designed classes already implement concepts, so the issue seems to reduce to selecting the appropriate class. The objective of our work is to determine whether concept location techniques are still needed (they are) and whether object-oriented structuring facilitates concept location (it does not). This paper focuses on static concept location techniques that share common prerequisites and search the source code using regular expression matching, static program dependencies, or information retrieval. The paper analyses these techniques to compare their respective strengths and weaknesses.
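The simplest of the three static techniques, regular expression matching, amounts to a grep over the source. The sketch below is only an illustration of that idea: `grep_concept` is a hypothetical helper, and the in-memory `files` map stands in for walking a real source tree.

```python
import re

def grep_concept(pattern, files):
    """Return (file, line_no, line) for every source line matching a regex.

    files: hypothetical {file name: source text} map standing in for a file system walk.
    Matching is case-insensitive so 'discount' also finds 'applyDiscount'.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, text in files.items():
        for no, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((name, no, line.strip()))
    return hits
```

The developer still has to judge each hit; the quality of the result depends heavily on guessing the right pattern, which is one of the weaknesses such comparisons weigh against dependency-based and information retrieval-based search.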
Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval
- IEEE Trans. Software Eng., 2007
Cited by 15 (7 self)
Abstract—This paper recasts the problem of feature location in source code as a decision-making problem in the presence of uncertainty. The solution to the problem is formulated as a combination of the opinions of different experts. The experts in this work are two existing techniques for feature location: a scenario-based probabilistic ranking of events and an information retrieval-based technique that uses latent semantic indexing. The combination of these two experts is empirically evaluated through several case studies, which use the source code of the Mozilla Web browser and the Eclipse integrated development environment. The results show that the combination of experts significantly improves the effectiveness of feature location when compared to each of the experts used independently. Index Terms—program understanding, feature identification, concept location, dynamic and static analyses, information retrieval, Latent Semantic Indexing, scenario-based probabilistic ranking, open source software.
Exploring the neighborhood with Dora to expedite software maintenance
- In 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE/ACM, 2007
Cited by 13 (3 self)
Completing software maintenance and evolution tasks for today’s large, complex software systems can be difficult, often requiring considerable time to understand the system well enough to make correct changes. Despite evidence that successful programmers use program structure as well as identifier names to explore software, most existing program exploration techniques use either structural or lexical identifier information. By using only one type of information, automated tools ignore valuable clues about a developer’s intentions—clues critical to the human program comprehension process. In this paper, we present and evaluate a technique that exploits both program structure and lexical information to help programmers more effectively explore programs. Our approach uses structural information to focus automated program exploration and lexical information to prune irrelevant structure edges from consideration. For the important program exploration step of expanding from a seed, our experimental results demonstrate that an integrated lexical- and structural-based approach is significantly more effective than a state-of-the-art structural program exploration technique.
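The idea of using structure to focus exploration and lexical information to prune edges can be sketched as one expansion step from a seed method. This is not Dora's relevance model: the plain cosine similarity over identifier-term counts, the fixed `threshold`, and the helper names are assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")

def terms(text):
    """Lowercase word counts, with camelCase identifiers split into parts."""
    return Counter(w.lower() for w in WORD.findall(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_seed(seed, call_graph, bodies, query, threshold=0.2):
    """Follow structural edges from `seed`; keep only lexically relevant neighbors.

    call_graph: {method: [callees]}; bodies: {method: source text}.
    Returns (method, score) pairs, most relevant first.
    """
    q = terms(query)
    kept = []
    for callee in call_graph.get(seed, []):
        score = cosine(terms(bodies.get(callee, "")), q)
        if score >= threshold:  # assumed pruning rule
            kept.append((callee, score))
    return sorted(kept, key=lambda p: -p[1])
```

Starting from a hypothetical `saveFile` seed that calls `writeBytes` and `logDebug`, the query `save file` keeps `writeBytes` (its body mentions `file`) and prunes the logging call, even though both are structural neighbors.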
Leveraged quality assessment using information retrieval techniques
- In 14th International Conference on Program Comprehension, 2006
Cited by 10 (2 self)
The goal of this research is to apply language processing techniques to extend human judgment into situations where obtaining direct human judgment is impractical due to the volume of information that must be considered. One aspect of this is leveraged quality assessment, which can be used to evaluate subsystems coded by third parties, to track quality across the versions of a program, to assess the comprehension effort (and subsequent cost) required to make a change, and to identify parts of a program in need of preventative maintenance. A description of the QALP tool, its output from just under two million lines of code, and an experiment aimed at evaluating the tool’s use in leveraged quality assessment are presented. Statistically significant results from this experiment validate the use of the QALP tool in human leveraged quality assessment.
Feature Location via Information Retrieval based Filtering of a Single Scenario Execution Trace
- in Automated Software Engineering (ASE), 2007
Cited by 9 (0 self)
The paper presents a semi-automated technique for feature location in source code. The technique combines information from two different sources: an execution trace on the one hand, and the comments and identifiers from the source code on the other. Users execute a single partial scenario that exercises the desired feature, and all executed methods are identified from the collected trace. The source code is indexed using Latent Semantic Indexing, an information retrieval method, which allows users to write queries relevant to the desired feature and to rank all the executed methods by their textual similarity to the query. Two case studies on open source software (jEdit and Eclipse) indicate that the new technique has high accuracy, comparable with previously published approaches, and that it is easy to use because it considerably simplifies the dynamic analysis.
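The filtering step can be sketched as follows: collect the executed methods from a trace, then rank only those by textual similarity to the query. To stay dependency-free this sketch substitutes TF-IDF cosine similarity for Latent Semantic Indexing (no SVD is performed), and the `ENTER Class.method` trace format and the `corpus` map of method text are assumptions.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")

def tokens(text):
    return [w.lower() for w in WORD.findall(text)]

def executed_methods(trace_lines):
    """Distinct methods from trace events of the assumed form 'ENTER Class.method'."""
    return {line.split()[1] for line in trace_lines if line.startswith("ENTER ")}

def rank_trace(trace_lines, corpus, query):
    """Rank executed methods by TF-IDF cosine similarity to the query (LSI stand-in).

    corpus: {method: identifiers and comments}; IDF is computed over the whole corpus.
    """
    docs = {m: Counter(tokens(t)) for m, t in corpus.items()}
    n = len(docs)
    df = Counter(t for c in docs.values() for t in c)
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}

    def vec(counts):
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cos(a, b):
        dot = sum(v * b.get(t, 0.0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(Counter(tokens(query)))
    # Only methods seen in the trace are candidates; the rest of the corpus is filtered out.
    executed = executed_methods(trace_lines) & set(docs)
    scores = {m: cos(vec(docs[m]), q) for m in executed}
    return sorted(scores.items(), key=lambda p: (-p[1], p[0]))
```

Methods that never ran in the scenario are excluded regardless of how well their text matches, which is what keeps the ranked list short compared with searching the whole code base.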
Topology analysis of software dependencies
- ACM Transactions on Software Engineering and Methodology
Cited by 9 (3 self)
Before performing a modification task, a developer usually has to investigate the source code of a system to understand how to carry out the task. Discovering the code relevant to a change task is costly because it is a human activity whose success depends on a large number of unpredictable factors, such as intuition and luck. Although studies have shown that effective developers tend to explore a program by following structural dependencies, no methodology is available to guide their navigation through the thousands of dependency paths found in a nontrivial program. We describe a technique to automatically propose and rank program elements that are potentially interesting to a developer investigating source code. Our technique is based on an analysis of the topology of structural dependencies in a program. It takes as input a set of program elements of interest to a developer and produces a fuzzy set describing other elements of potential interest. Empirical evaluation of our technique indicates that it can help developers quickly select program elements worthy of investigation while avoiding less interesting ones.
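The input/output shape described above (a set of elements of interest in, a fuzzy set of suggestions out) can be illustrated with a deliberately simplified scoring rule. The membership used here, the share of a candidate's neighbors that are already in the interest set, is a hypothetical stand-in loosely inspired by the idea of reinforcement; the paper's actual topology metrics are not reproduced.

```python
def suggest(interest, deps):
    """Propose related elements as a fuzzy set {element: membership in [0, 1]}.

    deps: undirected adjacency, element -> set of neighbors (both directions listed).
    Membership is the fraction of a candidate's neighbors already in `interest`
    (a simplified, assumed scoring rule).
    """
    interest = set(interest)
    candidates = set()
    for e in interest:
        candidates |= deps.get(e, set())
    candidates -= interest
    fuzzy = {}
    for c in candidates:
        nbrs = deps.get(c, set())
        if nbrs:
            fuzzy[c] = len(nbrs & interest) / len(nbrs)
    # Highest membership first; ties broken by name.
    return dict(sorted(fuzzy.items(), key=lambda p: (-p[1], p[0])))
```

An element connected only to code the developer is already looking at gets membership 1.0, while one that is mostly connected elsewhere gets a lower value and can be deprioritized.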
Software Architecture Reconstruction: a Process-Oriented Taxonomy
, 2009
Cited by 9 (0 self)
To maintain and understand large applications, it is important to know their architecture. The first problem is that, unlike classes and packages, architecture is not explicitly represented in the code. The second problem is that successful applications evolve over time, so their architecture inevitably drifts. Reconstructing the architecture and checking whether it is still valid is therefore an important aid. While there is a plethora of approaches and techniques supporting architecture reconstruction, there is no comprehensive state-of-the-art survey of software architecture reconstruction, and it is often difficult to compare the approaches. This article presents a state-of-the-art survey of software architecture reconstruction approaches.
Source Code Exploration with Google
- Proc. of the 22nd Int’l Conf. on Softw. Maint., 2006
Cited by 8 (2 self)
The paper presents a new approach to source code exploration, which results from integrating the Google Desktop Search (GDS) engine into the Eclipse development environment. The resulting search engine, named Google Eclipse Search (GES), provides improved searching in Eclipse software projects. The paper advocates a component-based approach that allows us to develop strong tools supporting various maintenance tasks by leveraging the strengths of existing frameworks and components. The development effort for such tools is reduced, while the customization and flexibility needed to fully support user needs are maintained. GES allows developers to search software projects in a manner similar to searching the Internet or their own desktops. The proposed approach takes advantage of the power of GDS for quick and accurate searching and of Eclipse’s extensibility. The paper discusses usage scenarios, advantages, limitations, and possible extensions of the proposed tandem.

