## An approach to classify software maintenance requests (2002)

Venue: In Proc., International Conference on Software Maintenance (ICSM

Citations: 21 (1 self)

### BibTeX

@INPROCEEDINGS{Lucca02anapproach,

author = {G. A. Di Lucca and M. Di Penta and S. Gradara},

title = {An approach to classify software maintenance requests},

booktitle = {In Proc., International Conference on Software Maintenance (ICSM},

year = {2002},

pages = {93--102},

publisher = {IEEE Computer Society}

}

### Abstract

When a software system that is critical for an organization exhibits a problem during operation, it must be fixed quickly to avoid serious economic losses. The problem is therefore reported to the organization in charge of maintenance, and it should be dispatched correctly and quickly to the right maintenance team. We propose to automatically classify incoming maintenance requests (also called tickets), routing them to specialized maintenance teams. The final goal is to develop a router, working around the clock, that dispatches incoming tickets without human intervention and with the lowest misclassification error, measured with respect to a given routing policy. A set of 6000 maintenance tickets from a large, multi-site software system, spanning about two years of in-field operation, was used to compare and assess the accuracy of different classification approaches (i.e., Vector Space model, Bayesian model, support vectors, classification trees and k-nearest neighbor classification). The application and the tickets were divided into eight areas and pre-classified by human experts. Preliminary results were encouraging: up to 84% of the incoming tickets were correctly classified.
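
As an illustration of the Vector Space approach named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes tf-idf term weighting, cosine similarity, and a nearest-training-ticket decision rule; the toy tickets, the class labels and the function names (`build_idf`, `tfidf_vector`, `cosine`, `route`) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy tickets; the 6000 tickets and eight areas of the
# paper are not reproduced here.
TRAINING = [
    ("database connection timeout on login", "database"),
    ("query returns wrong rows from database table", "database"),
    ("printer driver fails to print invoice", "printing"),
    ("invoice print layout is broken", "printing"),
]


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption, no stemming)."""
    return text.lower().split()


def build_idf(documents):
    """Inverse document frequency log(N / df) for every corpus term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse tf-idf vector; terms outside the vocabulary are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(ticket, training=TRAINING):
    """Return the class of the training ticket most similar to `ticket`."""
    docs = [tokenize(text) for text, _ in training]
    idf = build_idf(docs)
    vectors = [tfidf_vector(tokens, idf) for tokens in docs]
    query = tfidf_vector(tokenize(ticket), idf)
    scores = [cosine(query, vec) for vec in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return training[best][1]


if __name__ == "__main__":
    print(route("cannot print invoice"))
    print(route("database timeout error"))
```

A ticket that shares no term with the training set scores zero against every class, so a real router would need an explicit fallback (for example, a human dispatcher) rather than the arbitrary first class this sketch returns.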

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998
Citation context: ... and queries as vectors; documents are ranked against queries by computing a distance function between the corresponding vectors. Support Vector Machine, a new learning method introduced by V. Vapnik [18], finds the optimal linear hyperplane such that the expected classification error for unseen test samples is minimized. Classification And Regression Trees is a non-parametric technique that can select...

3909 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation context: ...e likely to find a linear hyperplane in the high dimensional feature space. 2.4 Classification And Regression Trees The Classification and Regression Trees (CART) method, introduced by Breiman et al. [4], is a binary recursive method that yields a class of models called tree-based models. CART is an alternative to linear and additive models for regression and classification problems. It is interestin...

1697 | Text Categorization with Support Vector Machines: Learning with Many Relevant Features
- Joachims
- 1998
Citation context: ...atic learning algorithms for text categorization in terms of learning speed and classification accuracy; in particular, they highlighted SVM advantages in terms of accuracy and performance. Joachims [11] performed a set of binary text classification experiments using the SVM and comparing it with other classification techniques. 8. Conclusions We proposed an approach, inspired by intelligent active a...

1527 | Term-weighting approaches in automatic text retrieval
- Salton, Buckley
Citation context: ...ccurs in the ticket $D_i$, or 0 otherwise; in other cases more complex measures are constructed according to the frequency of the terms in the documents. We use a well known IR metric called tf-idf [16]. According to this metric, the $j$-th element $d_{i,j}$ is derived from the term frequency $tf_{i,j}$ of the $j$-th term in the ticket $D_i$ and the inverse document frequency $idf_j$ of the term over the entire se...

504 | Inductive learning algorithms and representations for text categorization
- Dumais, Platt, et al.
- 1998
Citation context: ...ed a new machine learning algorithm, CDM, designed specifically for text categorization, and having better performance than Bayesian classification and decision tree learners. In 1998, Dumais et al. [8] compared different automatic learning algorithms for text categorization in terms of learning speed and classification accuracy; in particular, they highlighted SVM advantages in terms of accuracy an...

267 | A comparison of two learning algorithms for text categorization
- Lewis, Ringuette
- 1994
Citation context: ...quence case of feedback lems related to classify and extract information from text. Literature reports some relevant research in the area of automated text categorization: in 1994 Lewis and Ringuette [12] presented empirical results about the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets, finding that both algorithms achieve reasonable...

228 | The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression
- Witten, Bell
- 1991
Citation context: ...ever, using the simple term frequency would turn the product $\prod_k \Pr(w_k \mid D_i)$ to zero whenever any word $w_k$ is not present in the ticket $D_i$. This problem, known as the zero-frequency problem [19], can be avoided using different approaches (for more details see [6, 15]). 2.2. Vector Space IR model Vector Space IR models map each incoming ticket and each query (i.e., tickets in the training set...

134 | Identifying reasons for software changes using historic databases
- Mockus, Votta
- 2000
Citation context: ...ng inside each request not independent each one from another, i.e., modeling [expression lost in extraction]. Classification using SVM, k-nearest neighbor and CART confirmed that, as also stated by Votta and Mockus in [14], a request can be successfully classified using few, discriminating words. However, our approach differs from that of [14], in that we could not apply a keyword-based criterion: even if we experience...

72 | Ranking algorithms
- Harman
- 1992
Citation context: ... using different approaches (for more details see [6, 15]). 2.2. Vector Space IR model Vector Space IR models map each incoming ticket and each query (i.e., tickets in the training set) onto a vector [10]. In our case, each element of the vector corresponds to a word (or term) in a vocabulary extracted from the tickets themselves. If $|V|$ is the size of the vocabulary, then the vector $[d_{i,1}, \ldots, d_{i,|V|}]$...

47 | Introduction to Probability and Mathematical Statistics
- Bain, Engelhardt
- 1992
Citation context: ... the ranking scores as the probability that a ticket $D_i$ is related to a class of tickets (see Section 4) (that is the query $Q$): $\Pr(D_i \mid Q)$. Applying Bayes' rule [3], the above conditioned probability can be transformed into: $\Pr(D_i \mid Q) = \frac{\Pr(Q \mid D_i)\,\Pr(D_i)}{\Pr(Q)}$. For a given class of tickets, $\Pr(Q)$ is a constant and we can further simplify the m...

31 | Cross-validatory Choice and Assessment of Statistical Predictions (with discussion
- Stone
- 1976
Citation context: ...rformances obtained with the availability of maintainer's feedback. 5.1. Comparison of the Classification Methods In order to compare the performance of the different methods, a 10-fold cross-validation [17] for the five models was performed on the entire pool of tickets (6000). Percentages of correct classification (Table 1) and paired t-test results (Table 2) show how Probabilistic model, k-nearest nei...

28 | Spoken Dialogues with Computers
- Mori
- 1998
Citation context: ...all tickets have the same probability. Therefore, for a given class of tickets $Q$, all incoming tickets $D_i$ are ranked by the conditioned probabilities $\Pr(Q \mid D_i)$. Under quite general assumptions [6, 15] and considering a unigram approximation, that is, all words $w_k$ in the ticket are independent, the following expression is obtained: $\Pr(Q \mid D_i) \approx \prod_k \Pr(w_k \mid D_i)$ ...

22 | On smoothing techniques for bigram-based natural language modelling
- Ney, Essen
- 1991
Citation context: ...all tickets have the same probability. Therefore, for a given class of tickets $Q$, all incoming tickets $D_i$ are ranked by the conditioned probabilities $\Pr(Q \mid D_i)$. Under quite general assumptions [6, 15] and considering a unigram approximation, that is, all words $w_k$ in the ticket are independent, the following expression is obtained: $\Pr(Q \mid D_i) \approx \prod_k \Pr(w_k \mid D_i)$ ...

11 | CDM: An Approach to Learning in Text Categorization
- Goldberg
- 1996
Citation context: ...out the performance of a Bayesian classifier and a decision tree learning algorithm on two text categorization data sets, finding that both algorithms achieve reasonable performance. In 1996 Goldberg [9] empirically tested a new machine learning algorithm, CDM, designed specifically for text categorization, and having better performance than Bayesian classification and decision tree learners. In 199...

10 | std 1219: Standard for Software maintenance
- IEEE
- 1998
Citation context: ...7. Related Work Several similarities exist between this paper and the work of Mockus and Votta [14]. In [14] maintenance requests were classified according to maintenance categories as in IEEE Std. 1219 [1], then rated on a fault-severity scale using a frequency-based method. On the other hand, we adopted a different classification schema, where the severity rate is actually imposed by customers, while the ...

6 | Modeling software maintenance requests: A case study
- Burch, Kung
- 1997
Citation context: ...ication schema, where the severity rate is actually imposed by customers, while the matching of the ticket with the maintenance team is under the responsibility of the maintenance company. Burch and Kung [5] investigated the changes in maintenance requests during the lifetime of a large software application, by modeling the requests themselves. The necessity of an automatic dispatcher for maintenan...

5 | Modeling web maintenance centers through queue models
- Penta, Casazza, et al.
- 2001
Citation context: ...vention and the expertise of each team). The automatic dispatcher may also be integrated in a bug-reporting tool like Bug-Buddy or in an environment such as that described in [7], where maintenance requests for a widely used commercial or open-source product are posted via the Internet. A comparison of the different classification methods highlights that a Probabilistic model perfo...

2 | A queue theory-based approach to staff software maintenance centers
- Antoniol, Casazza, et al.
- 2001
Citation context: ...maintenance process. Moreover, it may be considered as a part of a queueing network composed of multiple, parallel queues, one for each type of request (thereby making the model proposed in [2] more sophisticated, where a single-queue model was adopted to dispatch requests to maintenance teams independently of the type of intervention and the expertise of each team). The automat...
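
The probabilistic model and the zero-frequency problem mentioned in the citation contexts above can be illustrated with a small sketch. It is not the paper's implementation: it assumes the standard class-conditional unigram formulation with uniform class priors, and it swaps in add-one (Laplace) smoothing as one simple remedy for zero frequencies, whereas the paper defers to [6, 15] for the smoothing techniques it actually uses. The toy tickets and the names `train`, `log_score` and `classify` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy tickets, not data from the paper.
TRAINING = [
    ("database connection timeout on login", "database"),
    ("query returns wrong rows from database table", "database"),
    ("printer driver fails to print invoice", "printing"),
    ("invoice print layout is broken", "printing"),
]


def train(training):
    """Collect per-class word counts and the shared vocabulary."""
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in training:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, vocabulary


def log_score(tokens, counts, vocab_size):
    """Log of the unigram product with add-one smoothing.

    Without the +1, a single word unseen in a class would drive the
    whole product to zero (the zero-frequency problem).
    """
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in tokens
    )


def classify(ticket, word_counts, vocabulary):
    """Pick the class with the highest smoothed unigram score.

    Class priors are assumed uniform, so they are left out.
    """
    tokens = [w for w in ticket.lower().split() if w in vocabulary]
    return max(
        word_counts,
        key=lambda label: log_score(
            tokens, word_counts[label], len(vocabulary)
        ),
    )


if __name__ == "__main__":
    counts, vocab = train(TRAINING)
    print(classify("print invoice", counts, vocab))
    print(classify("database login printer", counts, vocab))
```

The second call mixes words from both classes: with raw term frequencies every class would score zero, while the smoothed scores still allow a ranking.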