Results 1-10 of 142
A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-three Old and New Classification Algorithms
2000
Cited by 221 (8 self)
Twenty-two decision tree, nine statistical, and two neural network algorithms are compared on thirty-two datasets in terms of classification accuracy, training time, and (in the case of trees) number of leaves. Classification accuracy is measured by mean error rate and mean rank of error rate. Both criteria place a statistical, spline-based algorithm called Polyclass at the top, although it is not statistically significantly different from twenty other algorithms. Another statistical algorithm, logistic regression, is second with respect to the two accuracy criteria. The most accurate decision tree algorithm is Quest with linear splits, which ranks fourth and fifth on the two criteria, respectively. Although spline-based statistical algorithms tend to have good accuracy, they also require relatively long training times. Polyclass, for example, is third last in terms of median training time. It often requires hours of training compared to seconds for other algorithms. The Quest and logistic regression algor...
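The two accuracy criteria can be made concrete with a short sketch. It assumes a hypothetical table of per-dataset error rates; the numbers in the usage example are made up and are not from the study. Algorithms are ranked within each dataset, with rank 1 for the lowest error, and tied algorithms share the average of the ranks they span. This tie rule is an assumption and is not necessarily the one the authors used. The function names are illustrative.

```python
from typing import Dict, List, Tuple


def ranks_with_ties(values: List[float]) -> List[float]:
    """Rank values in ascending order (1 = lowest error); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def accuracy_summary(errors: Dict[str, List[float]]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (mean error rate, mean rank of error rate) for each algorithm.

    `errors` maps a (hypothetical) algorithm name to its error rates,
    one per dataset, with every list in the same dataset order.
    """
    algs = sorted(errors)
    n_datasets = len(errors[algs[0]])
    mean_error = {a: sum(errors[a]) / n_datasets for a in algs}
    rank_sum = {a: 0.0 for a in algs}
    for d in range(n_datasets):
        column = [errors[a][d] for a in algs]
        for a, r in zip(algs, ranks_with_ties(column)):
            rank_sum[a] += r
    mean_rank = {a: rank_sum[a] / n_datasets for a in algs}
    return mean_error, mean_rank


if __name__ == "__main__":
    # Made-up error rates for three algorithms on two datasets.
    table = {"A": [0.10, 0.20], "B": [0.20, 0.20], "C": [0.30, 0.10]}
    mean_error, mean_rank = accuracy_summary(table)
    print(mean_rank)  # {'A': 1.75, 'B': 2.25, 'C': 2.0}
```

Mean rank is less sensitive than mean error to a few datasets on which all algorithms do badly. This is one reason to report both criteria.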
Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines
 In Advances in Neural Information Processing Systems 15
2002
Cited by 81 (7 self)
We derive multiplicative updates for solving the nonnegative quadratic programming problem in support vector machines (SVMs). The updates have a simple closed form, and we prove that they converge monotonically to the solution of the maximum margin hyperplane. The updates optimize the traditionally proposed objective function for SVMs. They do not involve any heuristics such as choosing a learning rate or deciding which variables to update at each iteration. They can be used to adjust all the quadratic programming variables in parallel with a guarantee of improvement at each iteration. We analyze the asymptotic convergence of the updates and show that the coefficients of non-support vectors decay geometrically to zero at a rate that depends on their margins. In practice, the updates converge very rapidly to good classifiers.
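Below is a minimal NumPy sketch of such an update. It assumes the standard multiplicative form for nonnegative quadratic programming, which minimizes F(v) = 1/2 vᵀAv + bᵀv subject to v ≥ 0. Here A⁺ holds the positive entries of A, and A⁻ holds the magnitudes of its negative entries. Each step multiplies v_i by (-b_i + sqrt(b_i² + 4(A⁺v)_i(A⁻v)_i)) / (2(A⁺v)_i).

To keep the SVM example small, it uses a hard-margin SVM with no bias term, so A_ij = y_i y_j K(x_i, x_j) and b_i = -1. This simplification is an assumption. The helper names `nqp_multiplicative` and `svm_dual_no_bias` are hypothetical.

```python
import numpy as np


def nqp_multiplicative(A: np.ndarray, b: np.ndarray, v0: np.ndarray, iters: int = 1000) -> np.ndarray:
    """Minimize 0.5 * v^T A v + b^T v subject to v >= 0 with multiplicative updates.

    Every v_i is rescaled by a closed-form non-negative factor, so all
    variables are updated in parallel and no learning rate is needed.
    """
    A_pos = np.maximum(A, 0.0)   # positive entries of A
    A_neg = np.maximum(-A, 0.0)  # magnitudes of the negative entries of A
    v = np.asarray(v0, dtype=float).copy()
    for _ in range(iters):
        a = A_pos @ v
        c = A_neg @ v
        numer = -b + np.sqrt(b * b + 4.0 * a * c)
        # Guard against a_i == 0, which can only happen once some v_j are exactly zero.
        factor = np.divide(numer, 2.0 * a, out=np.zeros_like(a), where=a > 0)
        v = v * factor
    return v


def svm_dual_no_bias(X: np.ndarray, y: np.ndarray, iters: int = 1000) -> np.ndarray:
    """Dual coefficients of a linear-kernel, zero-bias, hard-margin SVM (illustrative)."""
    K = X @ X.T                  # linear kernel
    A = np.outer(y, y) * K       # A_ij = y_i y_j K(x_i, x_j)
    b = -np.ones(len(y))
    return nqp_multiplicative(A, b, np.ones(len(y)), iters)


if __name__ == "__main__":
    X = np.array([[2.0, 0.0], [-2.0, 0.0]])
    y = np.array([1.0, -1.0])
    alpha = svm_dual_no_bias(X, y)
    w = (alpha * y) @ X          # primal weight vector
    print(y * (X @ w))           # both support vectors sit on the margin: [1. 1.]
```

The update needs only matrix-vector products and elementwise operations. This is what makes the parallel, learning-rate-free behaviour described above possible.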
Mathematical Programming for Data Mining: Formulations and Challenges
 INFORMS Journal on Computing
1998
Cited by 61 (0 self)
This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems. Keywords: data analysis, data mining, mathematical programming methods, challenges for massive data sets, classification, clustering, prediction, optimization. To appear: INFORMS Journal on Computing, special issue on Data Mining, A. Basu and B. Golden (guest editors). Also appears as Mathematical Programming Technical Report 9801, Computer Sciences Department, University of Wi...
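As one illustration of casting a data mining task as an optimization problem, clustering can be stated as minimizing the k-means objective. That objective is the sum, over all points, of the squared distance to the nearest centre. This example was chosen for illustration and is not necessarily a formulation from the paper. The sketch uses Lloyd's alternating heuristic, which never increases the objective. The function names are hypothetical.

```python
import numpy as np


def kmeans_objective(X: np.ndarray, C: np.ndarray) -> float:
    """Sum over points of the squared Euclidean distance to the nearest centre."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def lloyd(X: np.ndarray, C0: np.ndarray, iters: int = 20):
    """Alternate assignment and centre updates; return centres and objective history."""
    C = np.asarray(C0, dtype=float).copy()
    history = [kmeans_objective(X, C)]
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        # Update step: each centre moves to the mean of its members;
        # a centre with no members keeps its previous position.
        for j in range(len(C)):
            members = X[assign == j]
            if len(members) > 0:
                C[j] = members.mean(axis=0)
        history.append(kmeans_objective(X, C))
    return C, history


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
    C, history = lloyd(X, X[[0, 2]])
    print(C)            # centres at (0, 0.5) and (10, 10.5)
    print(history[-1])  # 1.0
```

Lloyd's method only finds a local minimum of this nonconvex problem. Exact mathematical programming formulations of clustering are one way to address that limitation.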
Automatic Parameter Selection by Minimizing Estimated Error
 In Proceedings of the Twelfth International Conference on Machine Learning
1995
Cited by 59 (3 self)
We address the problem of finding the parameter settings that will result in optimal performance of a given learning algorithm using a particular dataset as training data. We describe a "wrapper" method, considering determination of the best parameters as a discrete function optimization problem. The method uses best-first search and cross-validation to wrap around the basic induction algorithm: the search explores the space of parameter values, running the basic algorithm many times on training and holdout sets produced by cross-validation to get an estimate of the expected error of each parameter setting. Thus, the final selected parameter settings are tuned for the specific induction algorithm and dataset being studied. We report experiments with this method on 33 datasets selected from the UCI and StatLog collections using C4.5 as the basic induction algorithm. At a 90% confidence level, our method improves the performance of C4.5 on nine domains, degrades performance on one, and is...
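The wrapper loop can be sketched as a best-first search over a discrete parameter grid, with each setting scored by k-fold cross-validated error. The sketch makes several assumptions. Search states are index tuples into each parameter's value list, neighbours differ by one step in one parameter, and the search stops after `max_stale` consecutive non-improving expansions. The learner interface `fit_predict(X_train, y_train, X_test, params)` is hypothetical, and so is the toy learner in the usage example. None of this is the paper's code.

```python
import heapq
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, object]


def cv_error(fit_predict: Callable, X: Sequence, y: Sequence, params: Params, k: int = 5) -> float:
    """k-fold cross-validated error rate of `fit_predict` under one parameter setting."""
    n = len(X)
    mistakes = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test], params)
        mistakes += sum(p != y[i] for p, i in zip(preds, test))
    return mistakes / n


def best_first_search(grid: Dict[str, List], evaluate: Callable[[Params], float],
                      max_stale: int = 5) -> Tuple[Params, float]:
    """Best-first search over a discrete grid, minimizing the estimated error."""
    names = sorted(grid)

    def to_params(state: tuple) -> Params:
        return {name: grid[name][i] for name, i in zip(names, state)}

    cache: Dict[tuple, float] = {}

    def score(state: tuple) -> float:
        # Each setting is evaluated (e.g. cross-validated) at most once.
        if state not in cache:
            cache[state] = evaluate(to_params(state))
        return cache[state]

    start = tuple(0 for _ in names)
    frontier = [(score(start), start)]
    expanded = set()
    best_state, best_err, stale = start, float("inf"), 0
    while frontier and stale < max_stale:
        err, state = heapq.heappop(frontier)  # most promising unexpanded setting
        if state in expanded:
            continue
        expanded.add(state)
        if err < best_err:
            best_state, best_err, stale = state, err, 0
        else:
            stale += 1
        for d, name in enumerate(names):
            for step in (-1, 1):
                j = state[d] + step
                if 0 <= j < len(grid[name]):
                    nb = state[:d] + (j,) + state[d + 1:]
                    if nb not in expanded:
                        heapq.heappush(frontier, (score(nb), nb))
    return to_params(best_state), best_err


if __name__ == "__main__":
    # Hypothetical learner that ignores the data and always predicts params["c"].
    def constant_learner(X_train, y_train, X_test, params):
        return [params["c"]] * len(X_test)

    X = list(range(10))
    y = [0] * 7 + [1] * 3
    print(best_first_search({"c": [0, 1]}, lambda p: cv_error(constant_learner, X, y, p)))
    # ({'c': 0}, 0.3)
```

Caching scores matters because each evaluation reruns the induction algorithm k times. Allowing a few non-improving expansions lets the search step past small plateaus in the error estimate.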
Instance-Based Learning: Nearest Neighbour with Generalisation
1995
Cited by 46 (0 self)
Instance-based learning is a machine learning method that classifies new examples by comparing them to those already seen and stored in memory. There are two types of instance-based learning: nearest neighbour and case-based reasoning. Of these two methods, nearest neighbour fell into disfavour during the 1980s but has recently regained popularity due to its simplicity and ease of implementation. Nearest neighbour learning is not without problems. It is difficult to define a distance function that works well for both discrete and continuous attributes. Noise and irrelevant attributes also pose problems. Finally, the specificity bias adopted by instance-based learning, while often an advantage, can over-represent small rules at the expense of more general concepts, leading to a marked decrease in classification performance in some domains. Generalised exemplars offer a solution. Examples that share the same class are grouped together, and so represent large rules more fully. This reduces the rol...
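Generalised exemplars can be sketched as axis-parallel hyperrectangles, in the spirit of nested generalised exemplar learning. This is a simplified illustration, not the algorithm of the thesis. It makes three assumptions:

- Attributes are numeric.
- Distance is the Euclidean distance to the nearest face of a box, and zero inside it.
- A training example stretches the nearest box when that box already has the example's class; there is no check for overlap with boxes of other classes.

The names `Box`, `train` and `classify` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Box:
    """An axis-parallel hyperrectangle that generalises same-class examples."""
    lo: List[float]
    hi: List[float]
    label: str

    def distance(self, x: Sequence[float]) -> float:
        # Per-attribute gap to the box; zero when x lies within [lo, hi].
        gaps = (max(l - v, 0.0, v - h) for v, l, h in zip(x, self.lo, self.hi))
        return math.sqrt(sum(g * g for g in gaps))

    def extend(self, x: Sequence[float]) -> None:
        """Grow the box just enough to contain x."""
        self.lo = [min(l, v) for l, v in zip(self.lo, x)]
        self.hi = [max(h, v) for h, v in zip(self.hi, x)]


def train(examples) -> List[Box]:
    """Build exemplars from (attribute tuple, label) pairs in a single pass."""
    boxes: List[Box] = []
    for x, label in examples:
        nearest = min(boxes, key=lambda b: b.distance(x), default=None)
        if nearest is not None and nearest.label == label:
            nearest.extend(x)                           # generalise the exemplar
        else:
            boxes.append(Box(list(x), list(x), label))  # new point exemplar
    return boxes


def classify(boxes: List[Box], x: Sequence[float]) -> str:
    """Predict the label of the nearest exemplar."""
    return min(boxes, key=lambda b: b.distance(x)).label


if __name__ == "__main__":
    data = [((0.0, 0.0), "a"), ((1.0, 1.0), "a"), ((10.0, 10.0), "b"), ((11.0, 9.0), "b")]
    boxes = train(data)
    print(len(boxes), classify(boxes, (0.5, 0.5)), classify(boxes, (9.0, 9.0)))  # 2 a b
```

Four examples collapse into two exemplars. This is the sense in which grouping same-class examples lets larger rules be represented more fully than individual stored points would.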
Symbolic Interpretation of Artificial Neural Networks
1996
Cited by 44 (1 self)
Hybrid Intelligent Systems that combine knowledge-based and artificial neural network systems typically have four phases: domain knowledge representation, mapping of this knowledge into an initial connectionist architecture, network training, and rule extraction. The final phase is important because it can give a trained connectionist architecture explanatory power and validate its output decisions. Moreover, it can be used to refine and maintain the initial knowledge acquired from domain experts. In this paper, we present three rule extraction techniques. The first technique extracts a set of binary rules from any type of neural network. The other two techniques are specific to feedforward networks with a single hidden layer of sigmoidal units. The second technique extracts partial rules that represent the most important embedded knowledge with an adjustable level of detail, while the third technique provides a more comprehensive and universal approach. A rule eval...
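Here is a small black-box sketch of extracting binary rules from a network. It is a generic technique chosen for illustration, not one of the paper's three methods. It treats the trained network as an opaque function of binary inputs and enumerates every input vector, so it only scales to a handful of inputs. For each positive example, it then greedily drops antecedents as long as every input the shortened rule covers is still classified as positive. The function names and the example unit are hypothetical.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Tuple

Rule = Dict[int, int]  # input index -> required bit


def covers(rule: Rule, x: Tuple[int, ...]) -> bool:
    """True if input vector x satisfies every antecedent of the rule."""
    return all(x[i] == bit for i, bit in rule.items())


def extract_rules(net: Callable[[Tuple[int, ...]], bool], n_inputs: int) -> List[Rule]:
    """Return rules whose disjunction matches exactly the inputs the network labels positive."""
    inputs = list(product((0, 1), repeat=n_inputs))
    rules: List[Rule] = []
    for x in inputs:
        if not net(x) or any(covers(r, x) for r in rules):
            continue
        rule: Rule = dict(enumerate(x))  # start from the full minterm
        for i in range(n_inputs):
            trial = {k: v for k, v in rule.items() if k != i}
            # Keep the shorter rule only if it never covers a negative input.
            if all(net(z) for z in inputs if covers(trial, z)):
                rule = trial
        rules.append(rule)
    return rules


if __name__ == "__main__":
    # Hypothetical trained unit: a sigmoid neuron computing x0 AND x1 (x2 is irrelevant).
    def and_unit(x):
        return 1.0 / (1.0 + math.exp(-(4 * x[0] + 4 * x[1] - 6))) > 0.5

    print(extract_rules(and_unit, 3))  # [{0: 1, 1: 1}]
```

Because the network is only queried, the same code works for any architecture. The cost is exponential in the number of inputs, which is why network-specific techniques such as those for single-hidden-layer sigmoidal networks are attractive.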
An efficient fuzzy classifier with feature selection based on fuzzy entropy
 IEEE Transactions on Systems, Man, and Cybernetics
2001
Cited by 29 (0 self)
This paper presents an efficient fuzzy classifier with the ability to select features based on a fuzzy entropy measure. Fuzzy entropy is employed to evaluate the information of pattern distribution in the pattern space. With this information, we can partition the pattern space into non-overlapping decision regions for pattern classification. Since the decision regions do not overlap, both the complexity and the computational load of the classifier are reduced, and thus the training time and classification time are extremely short. Although the decision regions are partitioned into non-overlapping subspaces, we can achieve good classification performance since the decision regions can be correctly determined via our proposed fuzzy entropy measure. In addition, we also investigate the use of fuzzy entropy to select relevant features. The feature selection procedure not only reduces the dimensionality of a problem but also discards noise-corrupted, redundant, and unimportant features. Finally, we apply the proposed classifier to the Iris database and the Wisconsin breast cancer database to evaluate its classification performance. Both sets of results show that the proposed classifier works well for pattern classification. Index Terms: feature selection, fuzzy classifier, fuzzy entropy.
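This sketch shows how a fuzzy entropy can score a fuzzy region and, by summing over regions, a feature. It assumes one common definition, which is not taken from the abstract:

- The match degree of class j is the membership mass of class-j samples divided by the total membership mass in the region.
- The fuzzy entropy is the Shannon entropy (base 2) of those match degrees.

The triangular membership functions and the helper names are also illustrative assumptions. Lower entropy means a purer region, so a feature whose regions sum to a lower entropy separates the classes better.

```python
import math
from typing import Callable, Dict, List, Sequence


def triangular(a: float, b: float, c: float) -> Callable[[float], float]:
    """Triangular membership function: 0 outside (a, c), peak of 1 at b."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


def fuzzy_entropy(values: Sequence[float], labels: Sequence[str],
                  mu: Callable[[float], float]) -> float:
    """Entropy of the class match degrees inside one fuzzy region."""
    mass: Dict[str, float] = {}
    for v, lab in zip(values, labels):
        mass[lab] = mass.get(lab, 0.0) + mu(v)
    total = sum(mass.values())
    if total == 0.0:
        return 0.0  # no sample falls in this region
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def feature_score(values: Sequence[float], labels: Sequence[str],
                  regions: List[Callable[[float], float]]) -> float:
    """Sum of fuzzy entropies over a feature's fuzzy regions (lower is better)."""
    return sum(fuzzy_entropy(values, labels, mu) for mu in regions)


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    regions = [triangular(0, 1, 2), triangular(2, 3, 4)]
    f0 = [0.5, 1.0, 3.0, 3.5]  # separates the classes
    f1 = [1.0, 3.0, 1.0, 3.0]  # mixes the classes in every region
    # Feature f0 scores lower than f1, so it would be kept first.
    print(feature_score(f0, labels, regions), feature_score(f1, labels, regions))
```

Ranking features by this score and discarding those with high entropy is one simple way to turn the measure into a feature selection procedure.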
Weighted clustering ensembles
 In Proceedings of The 6th SIAM International Conference on Data Mining
2006
Cited by 29 (7 self)
Cluster ensembles offer a solution to challenges inherent to clustering that arise from its ill-posed nature. Cluster ensembles can provide robust and stable solutions by leveraging the consensus across multiple clustering results, while averaging out emergent spurious structures that arise due to the various biases to which each participating algorithm is tuned. In this paper, we address the problem of combining multiple weighted clusters that belong to different subspaces of the input space. We leverage the diversity of the input clusterings in order to generate a consensus partition that is superior to the participating ones. Since we are dealing with weighted clusters, our consensus function makes use of the weight vectors associated with the clusters. The experimental results show that our ensemble technique is capable of producing a partition that is as good as or better than the best individual clustering.
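A simple way to build a consensus from several clusterings is a weighted co-association matrix. Entry (i, j) is the weighted fraction of input clusterings that put points i and j in the same cluster, and points linked by more than half of the total weight are merged. This generic ensemble technique is offered only as an illustration. The paper's consensus function works with per-cluster weight vectors over subspaces, which this sketch does not model. The per-clustering weights, the 0.5 threshold, and the function names are assumptions.

```python
from typing import List, Optional, Sequence


def co_association(clusterings: Sequence[Sequence[int]],
                   weights: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Weighted fraction of clusterings that place each pair of points together."""
    n = len(clusterings[0])
    w = list(weights) if weights is not None else [1.0] * len(clusterings)
    total = sum(w)
    C = [[0.0] * n for _ in range(n)]
    for labels, wk in zip(clusterings, w):
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    C[i][j] += wk / total
    return C


def consensus(C: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Connected components of the graph whose edges have co-association > threshold."""
    n = len(C)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] > threshold:
                parent[find(i)] = find(j)
    # Relabel components 0, 1, 2, ... in order of first appearance.
    ids: dict = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


if __name__ == "__main__":
    runs = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 0, 0, 0]]
    print(consensus(co_association(runs)))                     # [0, 0, 1, 1]
    print(consensus(co_association(runs, weights=[1, 1, 4])))  # [0, 1, 1, 1]
```

The second call shows how weighting changes the outcome. Giving the third clustering most of the weight lets its grouping of points 1 to 3 dominate the consensus.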
Metastable memory in an artificial immune network
 In Artificial Immune Systems: Proceedings of ICARIS 2003
2003
Cited by 25 (3 self)
This paper describes an artificial immune system algorithm that implements a fairly close analogue of the memory mechanism proposed by Jerne (1), usually known as the Immune Network Theory. The algorithm demonstrates the ability of these types of network to produce metastable structures representing populated regions of the antigen space. The networks produced retain their structure indefinitely and capture inherent structure within the sets of antigens used to train them. Results from running the algorithm on a variety of data sets are presented and shown to be stable over long time periods and wide ranges of parameters. The potential of the algorithm as a tool for multivariate data analysis is also explored.