Results 1 - 10
of
111
Statistical Comparisons of Classifiers over Multiple Data Sets
, 2006
"... While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but igno ..."
Abstract
-
Cited by 120 (0 self)
- Add to MetaCart
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
The Prediction of Faulty Classes Using Object-Oriented Design Metrics
, 1999
"... Contemporary evidence suggests that most field faults in software applications are found in a smafi percentage of the software's components. This means that if these faulty software components can be detected early in the development project's life cycle, mitigating actions can be taken, such as a ..."
Abstract
-
Cited by 35 (2 self)
- Add to MetaCart
Contemporary evidence suggests that most field faults in software applications are found in a smafi percentage of the software's components. This means that if these faulty software components can be detected early in the development project's life cycle, mitigating actions can be taken, such as a redesign. For object-oriented applications, prediction models using design metrics can be used to identify faulty classes early on. In this paper we report on a study that used object-oriented design metrics to construct such prediction models. The study used data collected from one version of a commercial Java application for constructing a prediction model. The model was then validated on a subsequent release of the same application. Our results indicate that the prediction model has a high accuracy. Furthermore, we found that an export coupling metric had the strongest association with faultproneness, indicating a structural feature that may be symptomatic of a class with a high probability of latent faults.
Machine Learning Techniques for the Computer Security Domain of Anomaly Detection
, 2000
"... : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xv 1 ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xv 1
An Internally Replicated Quasi-Experimental Comparison of Checklist and Perspective-based Reading of Code Documents
, 1999
"... The basic premise of software inspections is that they detect and remove defects before they propagate to subsequent development phases where their detection and correction cost escalates. To exploit their full potential, software inspections must call for a close and strict examination of the ins ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
The basic premise of software inspections is that they detect and remove defects before they propagate to subsequent development phases where their detection and correction cost escalates. To exploit their full potential, software inspections must call for a close and strict examination of the inspected artefact.
Time series data mining: Identifying temporal patterns for characterization and prediction of time series events
- Marquette University
, 1999
"... This work is dedicated to my wife, Christine, our son, Christopher, and his brother, who will arrive shortly. Acknowledgment I would like to thank Dr. Xin Feng for the encouragement, support, and direction he has provided during the past three years. His insightful suggestions, enthusiastic endorsem ..."
Abstract
-
Cited by 19 (6 self)
- Add to MetaCart
This work is dedicated to my wife, Christine, our son, Christopher, and his brother, who will arrive shortly. Acknowledgment I would like to thank Dr. Xin Feng for the encouragement, support, and direction he has provided during the past three years. His insightful suggestions, enthusiastic endorsement, and shrewd proverbs have made the completion of this research possible. They provide an example to emulate. I owe a debt of gratitude to my committee members, Drs. Naveen Bansal, Ronald Brown, George Corliss, and James Heinen, who each have helped me to expand the breadth of my research by providing me insights into their areas of expertise. I am grateful to Marquette University for its financial support of this research, and the faculty of the Electrical and Computer Engineering Department for providing a rigorous and stimulating environment that exemplifies cura personalis. I thank Mark Palmer for many interesting, insightful, and thought provoking
An effective hybrid algorithm for university course timetabling
, 2006
"... The university course timetabling problem is an optimisation problem in which a set of events has to be scheduled in timeslots and located in suitable rooms. Recently, a set of benchmark instances was introduced and used for an ‘International Timetabling Competition’ to which 24 algorithms were subm ..."
Abstract
-
Cited by 18 (5 self)
- Add to MetaCart
The university course timetabling problem is an optimisation problem in which a set of events has to be scheduled in timeslots and located in suitable rooms. Recently, a set of benchmark instances was introduced and used for an ‘International Timetabling Competition’ to which 24 algorithms were submitted by various research groups active in the field of timetabling. We describe and analyse a hybrid metaheuristic algorithm which was developed under the very same rules and deadlines imposed by the competition and outperformed the official winner. It combines various construction heuristics, tabu search, variable neighbourhood descent and simulated annealing. Due to the complexity of developing hybrid metaheuristics, we strongly relied on an experimental methodology for configuring the algorithms as well as for choosing proper parameter settings. In particular, we used racing procedures that allow an automatic or semi-automatic configuration of algorithms with a good save in time. Our successful example shows that the systematic design of hybrid algorithms through an experimental methodology leads to high performing algorithms for hard combinatorial optimisation problems.
Conditional random people: Tracking humans with crfs and grid filters
- In IEEE Conference on Computer Vision and Pattern Recognition (CVPR
, 2006
"... We describe a state-space tracking approach based on a Conditional Random Field (CRF) model, where the observation potentials are learned from data. We find functions that embed both state and observation into a space where similarity corresponds to L1 distance, and define an observation potential b ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
We describe a state-space tracking approach based on a Conditional Random Field (CRF) model, where the observation potentials are learned from data. We find functions that embed both state and observation into a space where similarity corresponds to L1 distance, and define an observation potential based on distance in this space. This potential is extremely fast to compute and in conjunction with a grid-filtering framework can be used to reduce a continuous state estimation problem to a discrete one. We show how a state temporal prior in the grid-filter can be computed in a manner similar to a sparse HMM, resulting in real-time system performance. The resulting system is used for human pose tracking in video sequences. 1
Effective pre-retrieval query performance prediction using similarity and variability evidence
, 2008
"... Query performance prediction aims to estimate the quality of answers that a search system will return in response to a particular query. In this paper we propose a new family of pre-retrieval predictors based on information at both the collection and document level. Pre-retrieval predictors are im ..."
Abstract
-
Cited by 16 (2 self)
- Add to MetaCart
Query performance prediction aims to estimate the quality of answers that a search system will return in response to a particular query. In this paper we propose a new family of pre-retrieval predictors based on information at both the collection and document level. Pre-retrieval predictors are important because they can be calculated from information that is available at indexing time; they are therefore more efficient than predictors that incorporate information obtained from actual search results. Experimental evaluation of our approach shows that the new predictors give more consistent performance than previously proposed pre-retrieval methods across a variety of data types and search tasks.
An information-theoretic approach to detecting changes in multi-dimensional data streams
- In Proc. Symp. on the Interface of Statistics, Computing Science, and Applications
, 2006
"... Abstract An important problem in processing large data streams is detecting changes in the underly-ing distribution that generates the data. The challenge in designing change detection schemes is making them general, scalable, and statistically sound. In this paper, we take a general,information-the ..."
Abstract
-
Cited by 13 (1 self)
- Add to MetaCart
Abstract An important problem in processing large data streams is detecting changes in the underly-ing distribution that generates the data. The challenge in designing change detection schemes is making them general, scalable, and statistically sound. In this paper, we take a general,information-theoretic approach to the change detection problem, which works for multidimensional as well as categorical data. We use relative entropy, also called the Kullback-Leiblerdistance, to measure the difference between two given distributions. The KL-distance is known to be related to the optimal error in determining whether the two distributions are the sameand draws on fundamental results in hypothesis testing. The KL-distance also generalizes traditional distance measures in statistics, and has invariance properties that make it ideally suitedfor comparing distributions. Our scheme is general; it is nonparametric and requires no assumptions on the underlyingdistributions. It employs a statistical inference procedure based on the theory of bootstrapping, which allows us to determine whether our measurements are statistically significant. The schemeis also quite flexible from a practical perspective; it can be implemented using any spatial partitioning scheme that scales well with dimensionality. In addition to providing change detections,our method generalizes Kulldorff's spatial scan statistic, allowing us to quantitatively identify specific regions in space where large changes have occurred.We provide a detailed experimental study that demonstrates the generality and efficiency of our approach with different kinds of multidimensional datasets, both synthetic and real. 1 Introduction We are collecting and storing data in unprecedented quantities and varieties--streams, images, audio, text, metadata descriptions, and even simple numbers. Over time, these data streams change as the underlying processes that generate them change. Some changes are spurious and pertain to glitches in the data. Some are genuine, caused by changes in the underlying distributions. Some changes are gradual and some are more precipitous. We would like to detect changes in a variety of settings:
A Systematic Review of the Application and Empirical Investigation of Search-based Test-Case Generation
, 2008
"... Metaheuristic search techniques have been extensively used to automate the process of generating test cases and thus providing solutions for a more cost-effective testing process. This approach to test automation, often coined as “Search-based Software Testing” (SBST), has been used for a wide varie ..."
Abstract
-
Cited by 12 (4 self)
- Add to MetaCart
Metaheuristic search techniques have been extensively used to automate the process of generating test cases and thus providing solutions for a more cost-effective testing process. This approach to test automation, often coined as “Search-based Software Testing” (SBST), has been used for a wide variety of test case generation purposes. Since SBST techniques are heuristic by nature, they must be empirically investigated in terms of how costly and effective they are at reaching their test objectives and whether they scale up to realistic development artifacts. However, approaches to empirically study SBST techniques have shown wide variation in the literature. This paper presents the results of a systematic, comprehensive review that aims at characterizing how empirical studies have been designed to investigate SBST cost-effectiveness and what empirical evidence is available in the literature regarding SBST cost-effectiveness and scalability. We also provide a framework that drives the data collection process of this systematic review and can be the starting point of guidelines on how SBST techniques can be empirically assessed. The intent is to aid future researchers doing empirical studies in SBST by providing an unbiased view of the body of empirical evidence and by guiding them in performing well designed and executed empirical studies references.

