• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Learning classifiers from imbalanced data based on biased minimax probability machine (2004)

by K Huang, H Yang, I King, K Lyu
Venue:In CVPR
Add To MetaCart

Tools

Sorted by:
Results 1 - 7 of 7

Exploratory Under-Sampling for Class-Imbalance Learning

by Xu-ying Liu, Jianxin Wu, Zhi-hua Zhou
"... Under-sampling is a class-imbalance learning method which uses only a subset of major class examples and thus is very efficient. The main deficiency is that many major class examples are ignored. We propose two algorithms to overcome the deficiency. EasyEnsemble samples several subsets from the majo ..."
Abstract - Cited by 17 (0 self) - Add to MetaCart
Under-sampling is a class-imbalance learning method which uses only a subset of major class examples and thus is very efficient. The main deficiency is that many major class examples are ignored. We propose two algorithms to overcome the deficiency. EasyEnsemble samples several subsets from the major class, trains a learner using each of them, and combines the outputs of those learners. BalanceCascade is similar toEasyEnsemble except that it removes correctly classified major class examples of trained learners from further consideration. Experiments show that both of the proposed algorithms have better AUC scores than many existing class-imbalance learning methods. Moreover, they have approximately the same training time as that of under-sampling, which trains significantly faster than other methods. 1

Linear asymmetric classifier for cascade detectors

by Jianxin Wu, Matthew D. Mullin, James M. Rehg - Proceedings of the 22nd International Conference on Machine Learning , 2005
"... The detection of faces in images is fundamentally a rare event detection problem. Cascade classifiers provide an efficient computational solution, by leveraging the asymmetry in the distribution of faces vs. non-faces. Training a cascade classifier in turn requires a solution for the following subpr ..."
Abstract - Cited by 11 (3 self) - Add to MetaCart
The detection of faces in images is fundamentally a rare event detection problem. Cascade classifiers provide an efficient computational solution, by leveraging the asymmetry in the distribution of faces vs. non-faces. Training a cascade classifier in turn requires a solution for the following subproblems: Design a classifier for each node in the cascade with very high detection rate but only moderate false positive rate. While there are a few strategies in the literature for indirectly addressing this asymmetric node learning goal, none of them are based on a satisfactory theoretical framework. We present a mathematical characterization of the node-learning problem and describe an effective closed form approximation to the optimal solution, which we call the Linear Asymmetric Classifier (LAC). We first use AdaBoost or AsymBoost to select features, and use LAC to learn a linear discriminant function to achieve the node learning goal. Experimental results on face detection show that LAC can improve the detection performance in comparison to standard methods. We also show that Fisher Discriminant Analysis on the features selected by AdaBoost yields better performance than AdaBoost itself. 1.

Handling imbalanced datasets: A review

by Sotiris Kotsiantis, Dimitris Kanellopoulos, Panayiotis Pintelas
"... Abstract. Learning classifiers from imbalanced or skewed datasets is an important topic, arising very often in practice in classification problems. In such problems, almost all the instances are labelled as one class, while far fewer instances are labelled as the other class, usually the more import ..."
Abstract - Cited by 5 (0 self) - Add to MetaCart
Abstract. Learning classifiers from imbalanced or skewed datasets is an important topic, arising very often in practice in classification problems. In such problems, almost all the instances are labelled as one class, while far fewer instances are labelled as the other class, usually the more important class. It is obvious that traditional classifiers seeking an accurate performance over a full range of instances are not suitable to deal with imbalanced learning tasks, since they tend to classify all the data into the majority class, which is usually the less important class. This paper describes various techniques for handling imbalance dataset problems. Of course, a single article cannot be a complete review of all the methods and algorithms, yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored. 1

Graph-based Rare Category Detection

by Jingrui He, Yan Liu, Richard Lawrence
"... Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays key roles in real applications such as financial fraud detection, network intrusion detection, astronomy, spam image detection, etc. In this p ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
Rare category detection is the task of identifying examples from rare classes in an unlabeled data set. It is an open challenge in machine learning and plays key roles in real applications such as financial fraud detection, network intrusion detection, astronomy, spam image detection, etc. In this paper, we develop a new graph-based method for rare category detection named GRADE. It makes use of the global similarity matrix motivated by the manifold ranking algorithm, which results in more compact clusters for the minority classes; by selecting examples from the regions where the density changes the most, it eliminates the assumption that the majority classes and the minority classes are separable. Furthermore, when detailed information about the data set is not available, we develop a modified version of GRADE named GRADE-LI, which only needs an upper bound on the proportion of all the minority classes as input. Besides working with data with features, both GRADE and GRADE-LI can also work with graph data, which can not be processed by existing rare category detection methods. Experimental results on both synthetic and real data sets demonstrate the effectiveness of the GRADE and GRADE-LI algorithms. 1.

Joint Cascade Optimization Using a Product of Boosted Classifiers

by Leonidas Lefakis, François Fleuret
"... The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the i ..."
Abstract - Cited by 1 (0 self) - Add to MetaCart
The standard strategy for efficient object detection consists of building a cascade composed of several binary classifiers. The detection process takes the form of a lazy evaluation of the conjunction of the responses of these classifiers, and concentrates the computation on difficult parts of the image which cannot be trivially rejected. We introduce a novel algorithm to construct jointly the classifiers of such a cascade, which interprets the response of a classifier as the probability of a positive prediction, and the overall response of the cascade as the probability that all the predictions are positive. From this noisy-AND model, we derive a consistent loss and a Boosting procedure to optimize that global probability on the training set. Such a joint learning allows the individual predictors to focus on a more restricted modeling problem, and improves the performance compared to a standard cascade. We demonstrate the efficiency of this approach on face and pedestrian detection with standard data-sets and comparisons with reference baselines. 1

Co-Selection of Features and Instances for Unsupervised Rare Category Analysis

by Jingrui He, Jaime Carbonell
"... Rare category analysis is of key importance both in theory and in practice. Previous research work focuses on supervised ..."
Abstract - Add to MetaCart
Rare category analysis is of key importance both in theory and in practice. Previous research work focuses on supervised

Prior-Free Rare Category Detection

by Jingrui He, Jaime Carbonell
"... Rare category detection is an open challenge in machine learning. It plays the central role in applications such as detecting new financial fraud patterns, detecting new network malware, and scientific discovery. In such cases rare categories are hidden among huge volumes of normal data and observat ..."
Abstract - Add to MetaCart
Rare category detection is an open challenge in machine learning. It plays the central role in applications such as detecting new financial fraud patterns, detecting new network malware, and scientific discovery. In such cases rare categories are hidden among huge volumes of normal data and observations. In this paper, we propose a new method for rare category detection named SEDER, which requires no prior information about the data set. It implicitly performs semiparametric density estimation using specially designed exponentially families, and then picks the examples for labeling where the neighborhood density changes the most. SEDER can work in the cases where the data is not separable. Its unique feature over all existing methods lies in its prior-free nature, i.e. it does not require any prior
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University