Discriminative Models for Information Retrieval
- SIGIR '04, 2004
Cited by 66 (1 self)
Discriminative models have been preferred over generative models in many machine learning problems in the recent past owing to some of their attractive theoretical properties. In this paper, we explore the applicability of discriminative classifiers for IR. We have compared the performance of two popular discriminative models, namely the maximum entropy model and support vector machines, with that of language modeling, the state-of-the-art generative model for IR. Our experiments on ad-hoc retrieval indicate that although maximum entropy is significantly worse than language models, support vector machines are on par with language models. We argue that the main reason to prefer SVMs over language models is their ability to learn arbitrary features automatically, as demonstrated by our experiments on the home-page finding task of TREC-10.
Editorial: Special Issue on Learning from Imbalanced Data Sets
- SIGKDD Explorations, 2004
Cited by 60 (1 self)
The class imbalance problem is one of the (relatively) new problems that emerged when machine learning matured from an embryonic science to an applied technology, amply used in the worlds of business, industry and scientific research.
Issues in mining imbalanced data sets - a review paper
- In Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, 2005
Cited by 8 (0 self)
This paper traces some of the recent progress in the field of learning from imbalanced data. It reviews approaches adopted for this problem, identifies challenges, and points out future directions in this relatively new field.
Signal + context = better classification
- In Proceedings of the International Conference on Music Information Retrieval, 2007
Cited by 4 (1 self)
Typical signal-based approaches to extract musical descriptions from audio only have limited precision. A possible explanation is that they do not exploit context, which provides important cues in human cognitive processing of music: e.g. electric guitar is unlikely in 1930s music, children's choirs rarely perform heavy metal, etc. We propose an architecture to train a large set of binary classifiers simultaneously, for many different musical metadata (genre, instrument, mood, etc.), in such a way that correlation between metadata is used to reinforce each individual classifier. The system is iterative: it uses classification decisions it made on some classification problems as new features for new, harder problems; and hybrid: it uses a signal classifier based on timbre similarity to bootstrap symbolic inference with decision trees. While further work is needed, the approach seems to outperform signal-only algorithms by 5% precision on average, and sometimes up to 15% for traditionally difficult problems such as cultural and subjective categories.
Generative Oversampling for Mining Imbalanced Datasets
Cited by 2 (0 self)
One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, desired class distribution in the training data is achieved. Many resampling techniques have been proposed in the past, and the relationship between resampling and cost-sensitive learning has been well studied. Surprisingly, however, few resampling techniques attempt to create new, artificial data points which generalize the known, labeled data. In this paper, we introduce an easily implementable resampling technique (generative oversampling) which creates new data points by learning from available training data. Empirically, we demonstrate that generative oversampling outperforms other well-known resampling methods on several datasets in the example domain of text classification.
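The idea of creating new minority-class points from a model learned on the training data can be illustrated with a minimal sketch. The abstract does not say which distribution is learned, so the smoothed multinomial over words used here, the function name generative_oversample, and the toy documents are all assumptions chosen for the text-classification setting, not the authors' implementation.

```python
import random
from collections import Counter


def generative_oversample(minority_docs, n_new, doc_len, seed=0):
    """Draw n_new synthetic bag-of-words documents for the minority class.

    Assumption: the minority class is modeled as a single multinomial
    over its vocabulary, with add-one (Laplace) smoothing.
    """
    rng = random.Random(seed)

    # Learn word counts from the available minority-class documents.
    counts = Counter()
    for doc in minority_docs:
        counts.update(doc)

    vocab = sorted(counts)
    weights = [counts[word] + 1 for word in vocab]  # add-one smoothing

    # Sample each synthetic document word by word from the learned model.
    return [rng.choices(vocab, weights=weights, k=doc_len) for _ in range(n_new)]


# Hypothetical minority-class documents, already tokenized.
minority = [["fraud", "wire", "urgent"], ["fraud", "account", "urgent"]]
synthetic = generative_oversample(minority, n_new=3, doc_len=4)
```

The synthetic documents would be appended to the minority class until the desired class distribution is reached; unlike plain duplication, each one is a new draw from the fitted model.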
Under-Sampling Approaches for Improving Prediction of the Minority Class in an Imbalanced Dataset
Cited by 1 (0 self)
The training data is the most important factor in classification accuracy. However, data in real-world applications often have an imbalanced class distribution; that is, most of the data belong to the majority class and few belong to the minority class. In this case, if all the data are used for training, the classifier tends to predict that most incoming data belong to the majority class. Hence, it is important to select suitable training data for classification in the imbalanced class distribution problem. In this paper, we propose cluster-based under-sampling approaches that select representative data as training data to improve classification accuracy for the minority class. The experimental results show that our cluster-based under-sampling approaches outperform the under-sampling techniques of previous studies.
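A rough sketch of what selecting representative majority-class data through clustering might look like follows. The abstract does not describe the selection rule, so this version makes its own assumptions: it clusters only the majority class with a plain k-means and keeps the member nearest each centroid. The names kmeans and cluster_undersample and the toy points are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an emptied cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


def cluster_undersample(majority, n_keep, seed=0):
    """Keep at most n_keep majority points, one representative per cluster."""
    centroids, clusters = kmeans(majority, n_keep, seed=seed)
    kept = []
    for centroid, members in zip(centroids, clusters):
        if members:
            kept.append(min(members, key=lambda p: squared_distance(p, centroid)))
    return kept


# Hypothetical 2-D data: six majority points, two minority points.
majority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
minority = [(2.5, 2.5), (2.6, 2.4)]
kept = cluster_undersample(majority, n_keep=len(minority))
balanced_training_set = kept + minority
```

Choosing one representative per cluster, rather than sampling at random, is meant to keep the reduced majority set spread across the regions the original majority class occupied.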
On the Classification of Imbalanced Datasets
- C.V. KrishnaVeni, ISSN: 0976-8491 (Online) | ISSN: 2229-4333 (Print)
The classification of imbalanced data sets has received considerable attention in recent research. In this paper, we present an overview of the problem of imbalanced data sets, explain the most commonly used techniques such as sampling and cost-sensitive learning, present some evaluation metrics used on imbalanced data sets, and quote some interesting points drawn from popular and recent research papers on the imbalanced classification problem. This paper does not cover all available research solutions, but tries to give a clear picture of the imbalanced data set classification problem and a brief review of existing solutions. Here, we consider the binary classification problem on imbalanced data sets.
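The abstract does not name the evaluation metrics it covers. As an illustration of why plain accuracy is usually supplemented on imbalanced binary problems, the sketch below computes accuracy, precision, recall, and F1 for a classifier that always predicts the majority class. The choice of these four metrics, the function names, and the 5/95 class split are assumptions made for the example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data: 5 minority (1) and 95 majority (0) examples.
y_true = [1] * 5 + [0] * 95
always_majority = [0] * 100
scores = summarize(y_true, always_majority)
```

Here the trivial classifier reaches high accuracy while its minority-class recall and F1 are zero, which is the gap that imbalance-aware metrics are meant to expose.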

