## Learning Classifiers from Imbalanced Data Based on Biased Minimax Probability Machine (2004)

Citations: 16 (1 self)

### BibTeX

```bibtex
@MISC{Huang04learningclassifiers,
  author = {Kaizhu Huang and Haiqin Yang and Irwin King and Michael R. Lyu},
  title = {Learning Classifiers from Imbalanced Data Based on Biased Minimax Probability Machine},
  year = {2004}
}
```


### Abstract

We consider the problem of binary classification on imbalanced data, in which nearly all the instances are labelled as one class, while far fewer instances are labelled as the other, usually more important, class. Traditional machine learning methods that seek accurate performance over the full range of instances are not suitable for this problem, since they tend to classify all the data into the majority class, which is usually the less important class. Moreover, some current methods try to utilize intermediate factors, e.g., the distribution of the training set, the decision thresholds, or the cost matrices, to influence the bias of the classification. However, it remains uncertain whether these methods can improve performance in a systematic way. In this paper, we propose a novel model named the Biased Minimax Probability Machine. Unlike previous methods, this model directly controls the worst-case real accuracy of classifying the future data in order to build biased classifiers. Hence, it provides a rigorous treatment of imbalanced data. Experimental results comparing the novel model with three competitive methods, i.e., the Naive Bayesian classifier, the k-Nearest Neighbor method, and the decision tree method C4.5, demonstrate the superiority of our model.
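The imbalance problem described in the abstract can be made concrete with a short sketch (all labels and class sizes below are hypothetical, not taken from the paper): a trivial classifier that always predicts the majority class attains high overall accuracy while being useless on the minority class.

```python
# Sketch: overall accuracy hides minority-class failure on imbalanced data.
# Labels and class sizes below are hypothetical, not from the paper.

def per_class_accuracy(y_true, y_pred, cls):
    """Accuracy restricted to instances whose true label is `cls`."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == cls]
    return sum(t == p for t, p in pairs) / len(pairs)

# 95 majority-class (0) instances, 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # trivial classifier: always predict the majority class

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall)                                # 0.95 -- looks strong
print(per_class_accuracy(y_true, y_pred, 1))  # 0.0 -- minority class entirely missed
```

This is exactly why the paper argues for controlling the worst-case accuracy of each class rather than the overall accuracy.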

### Citations

796 | Neuro-Dynamic Programming. Athena Scientific
- Bertsekas, Tsitsiklis
- 1996
Citation Context ...e data points) steps if the initial point is suitably assigned [1]. (In practice, we can always add a small positive amount to the diagonal elements of these two matrices to make them positive definite.) In each step, the computational cost to calculate the conjugate gradient is O(n²). Thus this method has a worst-case O(n³) time complexity. Adding the time cost to estimate x̄, ȳ, Σ_x, Σ_y, the total...

468 | The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30:1145
- Bradley
- 1997
Citation Context ...Operating Characteristic (ROC) analysis. [Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04)] The MC criterion [2] minimizes the cost measured by Cost = F_p · C_Fp + F_n · C_Fn, where F_p is the number of false positives, C_Fp is the cost of a false positive, F_n is the number of false negatives, and C_Fn is the cost of a f...
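The MC criterion quoted in this excerpt reduces to a one-line computation; here is a minimal sketch, where the cost values `c_fp` and `c_fn` are hypothetical placeholders (the surrounding contexts note that real misclassification costs are generally unknown):

```python
# Sketch of the MC (minimum cost) criterion: Cost = Fp*C_Fp + Fn*C_Fn.
# The cost values are hypothetical placeholders.

def mc_cost(fp, fn, c_fp=1.0, c_fn=10.0):
    """Total misclassification cost; a false negative is costlier here."""
    return fp * c_fp + fn * c_fn

print(mc_cost(fp=8, fn=2))  # 8*1.0 + 2*10.0 = 28.0
```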

330 | Measuring the accuracy of diagnostic systems
- Swets
- 1988
Citation Context ...positive and false-positive probability) [4], is a linear form. ROC analysis, which originated in signal detection theory, has been introduced to evaluate performance in learning from imbalanced data [12] [9]. This criterion plots a so-called ROC curve to visualize the tradeoff between the false-positive rate and the true-positive rate and leaves the task of selecting a specific tradeoff to the...
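The ROC curve described in this excerpt can be sketched by sweeping a decision threshold over classifier scores and recording (false-positive rate, true-positive rate) pairs; the scores and labels below are hypothetical:

```python
# Sketch: trace ROC points by sweeping a decision threshold.
# Scores and labels are hypothetical.

def roc_point(y_true, scores, threshold):
    """Return (FPR, TPR) at the given decision threshold."""
    y_pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y_true))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y_true))
    pos = sum(y_true)
    neg = len(y_true) - pos
    return fp / neg, tp / pos

y_true = [0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7]
for th in (0.3, 0.5, 0.9):
    print(th, roc_point(y_true, scores, th))
```

Each printed pair is one point on the ROC curve; choosing the operating point, as the excerpt notes, is left to the reader.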

293 | Arcing classifiers
- Breiman
- 1998
Citation Context ...troduce noise. According to [10], one open question is whether simply varying the skewness of the data distribution can improve predictive performance systematically. Furthermore, Breiman et al. [3] establish the connection among the distribution of the training data, the prior probability of each class, the costs of misclassification of each class, and the setup of the decision threshold. Cha...

169 | Addressing the curse of imbalanced training sets: One-sided sampling
- Kubat, Matwin
- 1997
Citation Context ...ce they tend to classify all the data into the majority class, which is usually the less important class. To cope with imbalanced datasets, there are several types of methods, such as the methods of sampling [7], the methods of moving the decision thresholds [9][10], and the methods of adjusting the cost matrices [9]. The first school of methods aims to reduce the data imbalance by “downsampling” (removing) in...

116 | Machine learning for the detection of oil spills in satellite radar images
- Kubat, Holte, Matwin
- 1998
Citation Context ... false negative. However, the cost of misclassification is generally unknown in real cases, which restricts the usage of this measure. The MGM criterion maximizes the geometric mean of the accuracy [6], but it has a nonlinear form, which is not easy to optimize automatically. Comparatively, MS, maximizing the sum of the accuracy on the positive class and the negative class (or maximizing th...
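The MGM and MS criteria contrasted in this excerpt can be written down directly; the per-class accuracy values below are hypothetical:

```python
# Sketch of the two criteria from the excerpt: MGM is the geometric mean
# of the per-class accuracies (nonlinear), MS is their sum (linear).
# The accuracy values are hypothetical.
import math

def mgm(acc_pos, acc_neg):
    return math.sqrt(acc_pos * acc_neg)  # nonlinear in the accuracies

def ms(acc_pos, acc_neg):
    return acc_pos + acc_neg             # linear, hence easier to optimize

print(mgm(0.9, 0.4))  # ~0.6
print(ms(0.9, 0.4))   # ~1.3
```

The linearity of MS is what makes it tractable as an optimization target, which is the point the excerpt is making.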

71 | A robust minimax approach to classification
- Lanckriet, Ghaoui, et al.
Citation Context ...y changing thresholds or adjusting the weight for each class lacks a systematic foundation, in the same sense as the sampling method. In this paper, based on extending the Minimax Probability Machine (MPM) [8], a competitive model compared with a state-of-the-art classifier, the Support Vector Machine, we propose a novel model named Biased Minimax Probability Machine (BMPM) to handle the tasks of learning f...

20 | Arcing classifiers - Breiman - 1998

17 | Improved rooftop detection in aerial images with machine learning
- Maloof, Langley, et al.
- 2003
Citation Context ...ive and false-positive probability) [4], is a linear form. ROC analysis, which originated in signal detection theory, has been introduced to evaluate performance in learning from imbalanced data [12] [9]. This criterion plots a so-called ROC curve to visualize the tradeoff between the false-positive rate and the true-positive rate and leaves the task of selecting a specific tradeoff to the rea...

11 | Increasing sensitivity of preterm birth by changing rule strengths
- Grzymala-Busse, Goodwin, et al.
- 2003
Citation Context ...ly optimized. Comparatively, MS, maximizing the sum of the accuracy on the positive class and the negative class (or maximizing the difference between the true-positive and false-positive probability) [4], is a linear form. ROC analysis, which originated in signal detection theory, has been introduced to evaluate performance in learning from imbalanced data [12] [9]. This criterion plots a so-called R...

8 | Nonlinear Programming. Athena Scientific - Bertsekas - 1999

6 | Biased minimax probability machine for medical diagnosis
- Huang, Yang, et al.
- 2004
Citation Context ...ginal space. It is easy to verify that a kernelization procedure similar to [8] can be applied to BMPM as well. To save space, we omit the kernelization in this paper and refer interested readers to [8, 5]. 3 BMPM for Imbalanced Learning In this section, we first review four standard imbalanced learning criteria, which are widely used in the previous literature. We then, based on two of them, apply BMPM t...

3 | Fractional programming. In Nonconvex Optimization and Its Applications
- Schaible
- 1995
Citation Context ...M optimization problem is changed to:

max_{a ≠ 0} [1 − κ(β₀) √(aᵀ Σ_y a)] / √(aᵀ Σ_x a)   s.t.   aᵀ(x̄ − ȳ) = 1.   (14)

Further, the above formulation (14) can be written as the so-called Fractional Programming (FP) problem [11],

max_{a ≠ 0} f(a) / g(a)   s.t.   a ∈ A = {a | aᵀ(x̄ − ȳ) = 1},   (15)

where f(a) = 1 − κ(β₀) √(aᵀ Σ_y a) and g(a) = √(aᵀ Σ_x a). In the following, we propose Lemma 2 to show that this FP problem is solvable. Lemma 2 The Fr...

1 | Learning from imbalanced data sets - Provost - 2000

1 | A robust minimax approach to classification - Lanckriet, Ghaoui, et al.