Results 1–10 of 68
A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms
Artificial Intelligence Review, 1997
Abstract

Cited by 144 (0 self)
Many lazy learning algorithms are derivatives of the k-nearest neighbor (kNN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that kNN's performance is highly sensitive to the definition of its distance function. Many kNN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have been neither categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends in our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrate three advantages over other methods: they require less preprocessing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms on tasks where some features are useful but less important than others.
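The distance parameterization this abstract describes can be illustrated with a minimal sketch (the function name and toy data are hypothetical, not the paper's): a kNN classifier whose Euclidean distance is scaled per feature, so a zero weight removes a feature entirely (feature selection) while intermediate weights merely de-emphasize it (continuous weighting).

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, weights, k=3):
    """Classify x by majority vote among its k nearest neighbours
    under a feature-weighted Euclidean distance."""
    diffs = X_train - x                                 # (n, d) differences
    dist = np.sqrt(((diffs ** 2) * weights).sum(axis=1))  # weighted distances
    nearest = np.argsort(dist)[:k]                      # k closest instances
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return int(labels[np.argmax(counts)])

# Toy task: feature 0 separates the classes, feature 1 is pure noise.
X = np.array([[0.0, 5.0], [0.1, -3.0], [1.0, 4.0], [0.9, -5.0]])
y = np.array([0, 0, 1, 1])
print(weighted_knn_predict(X, y, np.array([0.95, 5.0]),
                           weights=np.array([1.0, 0.0])))  # noise ignored -> 1
```

With the noisy feature weighted to zero the query is classified by the informative feature alone; with the weights reversed, the noise dominates and the prediction flips.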
The analysis of decomposition methods for support vector machines
IEEE Transactions on Neural Networks, 1999
Abstract

Cited by 131 (20 self)
Abstract. The decomposition method is currently one of the major methods for solving support vector machines. An important issue in this method is the selection of working sets. In this paper, through the design of decomposition methods for bound-constrained SVM formulations, we demonstrate that working set selection is not a trivial task. Then, from experimental analysis, we propose a simple selection of the working set which leads to faster convergence in difficult cases. Numerical experiments on different types of problems are conducted to demonstrate the viability of the proposed method.
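The paper's own selection rule is not reproduced here, but the flavour of working set selection can be sketched with the maximal-violating-pair heuristic that later became standard in decomposition solvers: pick the pair of dual variables that most violates the KKT optimality conditions. The function name and toy numbers below are illustrative assumptions only.

```python
import numpy as np

def max_violating_pair(grad, alpha, y, C):
    """Select a two-variable working set for SVM decomposition: the
    pair (i, j) that most violates dual KKT optimality. `grad` is the
    gradient of the dual objective at the current `alpha`."""
    # Variables still free to move up / down without leaving [0, C].
    up = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
    down = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
    vals = -y * grad
    i = int(np.argmax(np.where(up, vals, -np.inf)))   # largest violator
    j = int(np.argmin(np.where(down, vals, np.inf)))  # smallest counterpart
    return i, j

alpha = np.array([0.0, 0.0, 0.5, 1.0])
y = np.array([1, -1, 1, -1])
grad = np.array([-1.0, -0.2, 0.1, 0.3])
print(max_violating_pair(grad, alpha, y, C=1.0))  # -> (0, 1)
```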
Incorporating Diversity in Active Learning with Support Vector Machines
In ICML, 2003
Abstract

Cited by 104 (0 self)
In many real-world applications, active selection of training examples can significantly reduce the number of labelled training examples needed to learn a classification function. Different strategies in the field of support vector machines have been proposed that iteratively select a single new example from a set of unlabelled examples, query the corresponding class label, and then retrain the current classifier. However, to reduce the computational time for training, it may be necessary to select batches of new training examples instead of single examples. Strategies for single examples can be extended straightforwardly to select batches by choosing the h > 1 examples with the highest values of the individual selection criterion. We present a new approach that is especially designed to construct batches and incorporates a diversity measure. It has low computational requirements, making it feasible for large-scale problems with several thousands of examples. Experimental results indicate that this approach attains a given level of generalization accuracy with fewer labelled examples.
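A minimal sketch of the batch idea (the scoring function, weighting, and names are hypothetical, not the paper's exact criterion): greedily build a batch of h examples, trading off uncertainty, measured as closeness to the decision boundary, against redundancy with examples already chosen, measured by cosine similarity.

```python
import numpy as np

def select_batch(scores, X_pool, h, lam=0.5):
    """Greedy diverse-batch selection. `scores` are signed decision
    values for the unlabelled pool; low |score| means high uncertainty.
    `lam` trades off uncertainty against diversity."""
    uncertainty = -np.abs(scores)                 # higher = nearer boundary
    Xn = X_pool / (np.linalg.norm(X_pool, axis=1, keepdims=True) + 1e-12)
    chosen = [int(np.argmax(uncertainty))]        # seed with most uncertain
    while len(chosen) < h:
        # Redundancy = highest cosine similarity to the current batch.
        sim = np.abs(Xn @ Xn[chosen].T).max(axis=1)
        crit = lam * uncertainty - (1 - lam) * sim
        crit[chosen] = -np.inf                    # never re-pick an example
        chosen.append(int(np.argmax(crit)))
    return chosen

scores = np.array([0.1, 0.15, 2.0, -0.05])        # toy SVM outputs
X_pool = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 0.01]])
print(select_batch(scores, X_pool, h=2))  # -> [3, 1]
```

Example 0 is nearly a duplicate of the already-selected example 3, so despite its higher uncertainty the diversity term passes over it in favour of example 1.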
A Comparison of Dynamic and Non-Dynamic Rough Set Methods for Extracting Laws from Decision Tables
1998
Abstract

Cited by 65 (6 self)
We report results of experiments on several data sets, in particular: Monk's problems data (see [58]), medical data (lymphography, breast cancer, primary tumor; see [30]) and StatLog's data (see [32]). We compare standard methods for extracting laws from decision tables (see [43], [52]), based on rough sets (see [42]) and Boolean reasoning (see [8]), with the method based on dynamic reducts and dynamic rules (see [3], [4], [5], [6]). We also compare the results of computer experiments on those data sets obtained by applying our rough-set-based system with the results obtained on the same data sets with the help of several data analysis systems known from the literature.
A Comparative Study of Cost-Sensitive Boosting Algorithms
In Proceedings of the 17th International Conference on Machine Learning
Abstract

Cited by 56 (2 self)
This paper describes a study of different adaptations of boosting algorithms for cost-sensitive classification. The purpose of the study is to improve our understanding of the behavior of various cost-sensitive boosting algorithms and of how variations in the boosting procedure affect misclassification cost and high-cost error. We find that boosting can be simplified for cost-sensitive classification. A new variant, which excludes a factor used in ordinary boosting, performs best at minimizing high-cost errors and almost always performs better than AdaBoost. We also find that cost-sensitive boosting seeks to minimize high-cost errors rather than cost, and that a minimum expected cost criterion, applied during classification, greatly enhances the performance of all cost-sensitive adaptations of boosting algorithms. We show a strong correlation between an algorithm's producing small model sizes and its success in reducing high-cost errors. For a recently proposed method, AdaCost, ...
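The minimum expected cost criterion mentioned in the abstract is simple to state as code (the cost matrix below is a made-up example, not from the paper): given class-membership probabilities and a misclassification cost matrix, predict the class whose expected cost is smallest rather than the most probable class.

```python
import numpy as np

def min_expected_cost_class(probs, cost):
    """Pick the class minimizing expected misclassification cost.
    probs[j]   = estimated P(true class = j)
    cost[i, j] = cost of predicting class i when the truth is j."""
    expected = cost @ probs            # expected cost of each prediction
    return int(np.argmin(expected))

# Hypothetical 2-class example: a false negative (predicting class 0
# when the truth is class 1) costs ten times the reverse error.
cost = np.array([[0.0, 10.0],
                 [1.0,  0.0]])
print(min_expected_cost_class(np.array([0.85, 0.15]), cost))  # -> 1
```

Even though class 0 has 85% probability, the high false-negative cost makes class 1 the cheaper prediction in expectation, which is exactly how the criterion changes the classifier's decisions.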
Pasting small votes for classification in large databases and online
Machine Learning, 1999
Abstract

Cited by 43 (0 self)
Abstract. Many databases have grown to the point where they cannot fit into the fast memory of even large-memory machines, to say nothing of current workstations. If we want to use these databases to construct predictions of various characteristics, then, since the usual methods require that all data be held in fast memory, various workarounds have to be used. This paper studies one such class of methods, which gives accuracy comparable to what could have been obtained if all data could have been held in core, and which is computationally fast. The procedure takes small pieces of the data, grows a predictor on each small piece, and then pastes these predictors together. A version is given that scales up to terabyte data sets. The methods are also applicable to online learning. Keywords: combining, database, votes, pasting
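The grow-and-paste procedure can be sketched in a few lines. This is a simplified majority-vote version with a hypothetical base learner, not the paper's exact sampling schemes: train a cheap predictor on each small random bite of the data, then predict by voting the pieces together.

```python
import random

def paste_votes(data, n_pieces, piece_size, train, rng):
    """Grow one predictor per small random piece of the data, then
    'paste' the predictors together by majority vote (binary labels)."""
    predictors = [train(rng.sample(data, piece_size)) for _ in range(n_pieces)]
    def predict(x):
        votes = sum(p(x) for p in predictors)
        return 1 if votes * 2 >= len(predictors) else 0
    return predict

# Hypothetical base learner: a 1-D stump split midway between class means.
def train_stump(piece):
    pos = [v for v, label in piece if label == 1]
    neg = [v for v, label in piece if label == 0]
    if not pos:
        return lambda x: 0                 # one-sided piece: constant vote
    if not neg:
        return lambda x: 1
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x, t=thr: 1 if x >= t else 0

data = [(i / 100.0, 1 if i >= 50 else 0) for i in range(100)]
predict = paste_votes(data, n_pieces=25, piece_size=8,
                      train=train_stump, rng=random.Random(0))
print(predict(0.9), predict(0.1))  # -> 1 0
```

Each stump sees only 8 of the 100 examples, yet the pasted vote recovers the true decision boundary, which is the accuracy-from-small-pieces effect the abstract describes.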
On predictive distributions and Bayesian networks
Statistics and Computing, 2000
Abstract

Cited by 39 (30 self)
In this paper we are interested in discrete prediction problems in a decision-theoretic setting, where the ...
Speeding Up Back-Propagation Using Multiobjective Evolutionary Algorithms
2003
Abstract

Cited by 27 (4 self)
The aim of this paper is to present an optimization algorithm comprising a multiobjective evolutionary algorithm and a gradient-based local search; in the rest of the paper this is referred to as the memetic Pareto artificial neural network (MPANN) algorithm for training ANNs. The evolutionary approach is used to simultaneously train the network and optimize its architecture. The result is a set of networks, each of which attempts to optimize both the training error and the architecture. We also present a self-adaptive version with lower computational cost. We show empirically that the proposed method is capable of reducing the training time compared to gradient-based techniques.
Boosting with Structural Sparsity
Abstract

Cited by 25 (2 self)
We derive generalizations of AdaBoost and related gradient-based coordinate descent methods that incorporate sparsity-promoting penalties on the norm of the predictor being learned. The end result is a family of coordinate descent algorithms that integrate forward feature induction and backward pruning through regularization, and give an automatic stopping criterion for feature induction. We study penalties based on the ℓ1, ℓ2, and ℓ∞ norms of the predictor and introduce mixed-norm penalties that build upon the initial penalties. The mixed-norm regularizers facilitate structural sparsity in parameter space, which is a useful property in multiclass prediction and other related tasks. We report empirical results that demonstrate the power of our approach in building accurate and structurally sparse models.
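The back-pruning effect of an ℓ1 penalty can be seen in a small sketch. As a simplifying assumption this uses squared loss with cyclic coordinate descent rather than the boosting losses studied in the paper; the point is the soft-thresholding step, which sets coefficients exactly to zero and thereby removes induced features.

```python
import numpy as np

def soft_threshold(z, lam):
    """l1 proximal step: shrink toward zero, snapping small values to exactly 0."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def l1_coordinate_descent(X, y, lam, n_sweeps=50):
    """Cyclic coordinate descent for (1/2n)||y - Xw||^2 + lam*||w||_1.
    Coefficients the penalty cannot justify stay exactly at zero: the
    back-pruning effect the abstract describes."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]     # residual excluding feature j
            z = X[:, j] @ r / n
            w[j] = soft_threshold(z, lam) / col_sq[j]
    return w

# Toy design: y depends on feature 0 only; features are orthogonal.
X = np.array([[1.0, 0, 0], [0, 1, 0], [-1.0, 0, 0], [0, 0, 1]])
y = np.array([2.0, 0.0, -2.0, 0.0])
print(l1_coordinate_descent(X, y, lam=0.1))  # noise coefficients are exactly 0
```

The relevant coefficient survives (shrunk from 2.0 to 1.8 by the penalty) while the two irrelevant coefficients are driven identically to zero rather than to small nonzero values.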
The Consistency of Empirical Comparisons of Regression and Analogy-based Software Project Cost Prediction
Abstract

Cited by 21 (2 self)
OBJECTIVE – to determine the consistency within and between results in empirical studies of software engineering cost estimation. We focus on regression and analogy techniques, as these are commonly used. METHOD – we conducted an exhaustive search using predefined inclusion and exclusion criteria and identified 67 journal papers and 104 conference papers. From this sample we identified 11 journal papers and 9 conference papers that used both methods. RESULTS – our analysis found that about 25% of studies were internally inconclusive. We also found that there is approximately equal evidence in favour of, and against, analogy-based methods. CONCLUSIONS – we confirm the lack of consistency in the findings and argue that this inconsistent pattern across 20 different studies comparing regression and analogy is somewhat disturbing. It suggests that we need to ask more detailed questions than simply: “What is the best prediction system?”