Supervised versus Unsupervised Binary-Learning by Feedforward Neural Networks (2001)

by N Japkowicz
Venue: Machine Learning

Results 1 - 6 of 6

Editorial: Special Issue on Learning from Imbalanced Data Sets

by Nitesh V. Chawla, Nathalie Japkowicz - SIGKDD Explorations, 2004
Abstract - Cited by 60 (1 self)
The class imbalance problem is one of the (relatively) new problems that emerged when machine learning matured from an embryonic science to an applied technology, amply used in the worlds of business, industry and scientific research.

Boosting for learning multiple classes with imbalanced class distribution

by Yanmin Sun - In 2006 IEEE International Conference on Data Mining (accepted), Hong Kong, 2006
Abstract - Cited by 13 (1 self)
Classification of data with imbalanced class distribution has posed a significant drawback of the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. This learning difficulty attracts a lot of research interests. Most efforts concentrate on bi-class problems. However, bi-class is not the only scenario where the class imbalance problem prevails. Reported solutions for bi-class applications are not applicable to multi-class problems. In this paper, we develop a cost-sensitive boosting algorithm to improve the classification performance of imbalanced data involving multiple classes. One barrier of applying the cost-sensitive boosting algorithm to the imbalanced data is that the cost matrix is often unavailable for a problem domain. To solve this problem, we apply Genetic Algorithm to search the optimum cost setup of each class. Empirical tests show that the proposed cost-sensitive boosting algorithm improves the classification performances of imbalanced data sets significantly.

A Kernel-Based Two-Class Classifier for Imbalanced Data Sets

by Xia Hong, Sheng Chen, Chris J. Harris
Abstract - Cited by 3 (0 self)
Many kernel classifier construction algorithms adopt classification accuracy as performance metrics in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new regularized orthogonal weighted least squares parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formula in searching model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm. Index Terms—Forward selection, imbalanced data sets, kernel classifier, leave-one-out (LOO) cross validation, receiver operating characteristics (ROCs).

Assessing invariance properties of evaluation measures

by Marina Sokolova - In: Proceedings of the Workshop on Testing of Deployable Learning and Decision Systems, the 19th Neural Information Processing Systems Conference (NIPS), 2006
Abstract - Cited by 1 (1 self)
Evaluation of classifier performance employs variety of measures. In this work we focus on methods that evaluate how well a classifier identifies classes. We consider the effect of transformations of the confusion matrix on ten well-known and recently introduced classification measures. We analyze the measure’s ability to retain its value under changes in a confusion matrix. This study emphasizes the importance of using appropriate measures in particular learning settings. We discuss benefits from the use of the invariant and non-invariant measures with respect to strong/weak characteristics of data classes.

Clustering using an Autoassociator: A case study in Network Event Correlation

by Reuben Smith, Nathalie Japkowicz
Abstract - Cited by 1 (1 self)
An autoassociator is a feedforward neural network that has the same number of input and output units. The goal of the autoassociator is very simple; to reconstruct its input at the output layer. Despite their simplicity, autoassociators have previously been shown to be quite successful on the task of Novelty Detection applied to industrial and military domains. The purpose of this paper is to test their utility on the more general task of clustering. In particular, we apply a clustering version of the autoassociator to the domain of Network Event Correlation. The results suggest that autoassociators are indeed useful as clustering systems. They were able to successfully correlate similar types of network alerts and have the added advantage of being fast once trained, a crucial feature when used for Network Event Correlation.

General Examination

by Sangyun Hahn, Mari Ostendorf
Abstract
People spend many hours in meetings during their working lives. The growing need for help in keeping records in meetings and searching through them has been recognized, and several groups around the world are working on a meeting browser or a summarization tool. In this research, we propose the development of a classification system that uses machine learning techniques to segment and detect meeting acts, which are high-level interactions among meeting participants as a group (e.g. negotiation, reporting, discussion, planning). As in other data-driven tasks, this requires a large amount of data, but labeling data can be costly, time-consuming and error-prone. To address this problem, semi-supervised learning techniques are often applied, in which a small amount of data are labeled and is used to train a classifier together with a large body of unlabeled data. In this study, we propose to use and extend a novel semi-supervised learning algorithm, the contrast classifier approach, which exploits the contrast between the distributions of labeled and unlabeled data. We will also present our research plan to investigate the impact of different labeling mechanisms on the performance of existing and proposed semi-supervised learning techniques, especially in the presence of imbalanced class distribution.