Recovering software architecture from the names of source files (1999)
| Venue: | In Proc. Working Conf. on Reverse Engineering |
| Citations: | 23 - 2 self |
BibTeX
@INPROCEEDINGS{Anquetil99recoveringsoftware,
author = {Nicolas Anquetil and Timothy C. Lethbridge},
title = {Recovering software architecture from the names of source files},
booktitle = {Proc. Working Conf. on Reverse Engineering},
year = {1999},
pages = {201--221}
}
Abstract
We discuss how to extract a useful set of subsystems from a set of source-code file names. This problem is challenging because, in many legacy systems, there are thousands of files with names that are very short and cryptic. At the same time, the problem is important because software engineers often find it difficult to understand such systems. We propose a general algorithm to cluster files based on their names, and a set of alternative methods for implementing the algorithm. One of the key tasks is picking candidate words that we will try to identify in file names. We do this by a) iteratively decomposing file names, b) finding common substrings, and c) choosing words in routine names, in an English dictionary, or in source code comments. In addition, we investigate generating abbreviations from the candidate words in order to find matches in file names, as well as how to split file names into components when there are no word markers. To compare and evaluate our approaches, we present two experiments. The first compares the "concepts" found in each file name by each method to the results of manually decomposing file names. The second experiment compares automatically generated subsystems to subsystem examples proposed by experts. We conclude that two methods are most effective: extracting concepts using common substrings, and extracting those concepts that relate to the names of routines in the files.
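
As a rough illustration of the "common substrings" idea mentioned in the abstract, the sketch below groups file names by the substrings they share. It is not the authors' algorithm: the function names, the minimum substring length, the two-file threshold, and the sample file names are all assumptions chosen for the example, and the paper's dictionary, routine-name, and abbreviation steps are left out.

```python
from collections import defaultdict


def common_substrings(names, min_len=3, min_files=2):
    """Map each substring of length >= min_len to the set of names containing it,
    keeping only substrings found in at least min_files names."""
    seen_in = defaultdict(set)
    for name in names:
        for start in range(len(name)):
            for end in range(start + min_len, len(name) + 1):
                seen_in[name[start:end]].add(name)
    return {s: files for s, files in seen_in.items() if len(files) >= min_files}


def maximal_only(candidates):
    """Drop a substring when a longer candidate contains it and covers
    exactly the same files (e.g. "lis" is redundant next to "list")."""
    kept = {}
    for s, files in candidates.items():
        dominated = any(
            s != t and s in t and files == other
            for t, other in candidates.items()
        )
        if not dominated:
            kept[s] = files
    return kept


def cluster_by_concept(names, min_len=3):
    """Return {candidate concept: sorted list of file names containing it}."""
    concepts = maximal_only(common_substrings(names, min_len))
    return {concept: sorted(files) for concept, files in concepts.items()}


# Hypothetical file-name stems, invented for this example.
stems = ["listmgr", "listutil", "dbopen", "dbclose", "dbutil"]
clusters = cluster_by_concept(stems)
```

Note that a file can land in more than one group here, and that lowering `min_len` picks up very short shared fragments such as a two-letter prefix, which hints at why short, cryptic names make the problem hard.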







