Results 1 - 9 of 9
Issues in mining imbalanced data sets - a review paper
- in Proceedings of the Sixteenth Midwest Artificial Intelligence and Cognitive Science Conference, 2005
Abstract
Cited by 8 (0 self)
This paper traces some of the recent progress in the field of learning from imbalanced data. It reviews the approaches adopted for this problem, identifies challenges, and points out future directions in this relatively new field.
Discovery of latent structures: Experience with the CoIL challenge 2000 data set
- Journal of Systems Science and Complexity, 2008
Abstract
Cited by 3 (1 self)
The authors present a case study to demonstrate the possibility of discovering complex and interesting latent structures using hierarchical latent class (HLC) models. A similar effort was made earlier by Zhang (2002), but that study involved only small applications with 4 or 5 observed variables and no more than 2 latent variables, owing to the lack of efficient learning algorithms. Significant progress has since been made in algorithmic research, and it is now possible to learn HLC models with dozens of observed variables. This allows us to demonstrate the benefits of HLC models more convincingly than before. The authors have successfully analyzed the CoIL Challenge 2000 data set using HLC models. The model obtained consists of 22 latent variables, and its structure is intuitively appealing. It is exciting that such a large and meaningful latent structure can be inferred automatically from data. Key words: Bayesian networks, case study, latent structure discovery, learning.
Efficient Model Evaluation in the Search-Based Approach to Latent Structure Discovery
Abstract
Cited by 3 (2 self)
Latent tree (LT) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are hidden. We are interested in learning LT models through systematic search. A key problem here is how to efficiently evaluate candidate models during search. The problem is difficult because there is a large number of candidate models, the candidate models contain latent variables, and some of those latent variables are foreign to the current model. A variety of ideas for attacking the problem have emerged from the literature. In this paper we observe that the ideas can be grouped into two distinct approaches. The first is based on data completion, while the second is based on what we call maximum restricted likelihood. We investigate and compare the two approaches in the framework of EAST, a newly developed search procedure for learning LT models.
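To make the definition above concrete, the sketch below computes the probability of one record of observed leaf values under a given LT model by summing out every latent internal node in a single upward pass. It is only an illustration of what "leaves observed, internal nodes hidden" means for likelihood computation; it is not the data-completion or maximum-restricted-likelihood method the paper studies. The dictionary-based tree and CPT layout and all function names are hypothetical, and the parameters are assumed to be given. In practice they have to be estimated (typically with EM), which is part of why evaluating many candidate models is expensive.

```python
import math


def upward_message(node, children, cpt, evidence):
    """Return msg with msg[s] = P(observed leaves below node | node = s).

    children: dict mapping an internal node to its list of child nodes
    cpt:      dict mapping each non-root node to a table cpt[node][parent_state][state]
    evidence: dict mapping each leaf to its observed state index
    """
    kids = children.get(node, [])
    if not kids:
        # Observed leaf: indicator vector that selects the observed value.
        n_states = len(cpt[node][0])
        return [1.0 if s == evidence[node] else 0.0 for s in range(n_states)]
    # Latent internal node: each child's CPT has one row per state of this node.
    n_states = len(cpt[kids[0]])
    msg = [1.0] * n_states
    for child in kids:
        child_msg = upward_message(child, children, cpt, evidence)
        for s in range(n_states):
            # Sum out the child's state given that this node is in state s.
            msg[s] *= sum(p * m for p, m in zip(cpt[child][s], child_msg))
    return msg


def evidence_probability(root, prior, children, cpt, evidence):
    """P(observed leaf values), marginalizing over all latent states."""
    msg = upward_message(root, children, cpt, evidence)
    return sum(p * m for p, m in zip(prior, msg))


def log_likelihood(root, prior, children, cpt, records):
    """Log-likelihood of a data set of records over the observed leaves."""
    return sum(
        math.log(evidence_probability(root, prior, children, cpt, r)) for r in records
    )


if __name__ == "__main__":
    # Toy model: one binary latent root H with two binary observed leaves A and B.
    children = {"H": ["A", "B"]}
    prior = [0.6, 0.4]
    cpt = {"A": [[0.9, 0.1], [0.2, 0.8]], "B": [[0.7, 0.3], [0.1, 0.9]]}
    print(evidence_probability("H", prior, children, cpt, {"A": 0, "B": 1}))
```

For the toy model the printed value is 0.6·0.9·0.3 + 0.4·0.2·0.9 = 0.234 (up to floating-point rounding). The recursion cost is linear in the number of nodes, so a single evaluation is cheap once parameters are known; the expensive part in model search is fitting parameters for each candidate.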
Bayesian Networks with Imprecise Probabilities: Theory and Application to Classification, 2010
Abstract
Cited by 2 (1 self)
Bayesian networks are powerful probabilistic graphical models for modelling uncertainty. Classification is an important application: some of the most widely used classifiers are based on Bayesian networks. Bayesian networks are precise models: exact numeric values should be provided for their quantification. This requirement is sometimes too narrow; in such cases, sets of distributions, rather than single distributions, can provide a more realistic description. Bayesian networks can be generalized to cope with sets of distributions, which leads to a novel class of imprecise probabilistic graphical models called credal networks. In particular, classifiers based on Bayesian networks generalize to so-called credal classifiers. Unlike Bayesian classifiers, which always return the single class that maximizes the posterior class probability, a credal classifier may be unable to single out one class. In other words, if the available information is not sufficient, credal classifiers allow for indecision between two or more classes, thus providing a less informative but more robust conclusion than Bayesian classifiers.
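The indecision described above can be illustrated with a small decision rule. Credal classifiers differ in the criterion they use to compare classes; the sketch below assumes interval dominance, one of several criteria from imprecise-probability decision theory, and assumes the lower and upper posterior probability of each class have already been computed from the credal network (the hard part, not shown). The function name and input format are hypothetical.

```python
def interval_dominance(bounds):
    """Return the sorted list of classes that are not interval-dominated.

    bounds: dict mapping each class label to (lower, upper) posterior probability.
    Class c2 is dominated when some other class c1 has lower(c1) > upper(c2).
    """
    for label, (lower, upper) in bounds.items():
        if not 0.0 <= lower <= upper <= 1.0:
            raise ValueError(f"invalid bounds for class {label!r}: {(lower, upper)}")
    # A class survives iff its upper bound reaches the largest lower bound.
    # Including the class itself in the max is harmless, since its lower <= upper.
    best_lower = max(lower for lower, _ in bounds.values())
    return sorted(label for label, (_, upper) in bounds.items() if upper >= best_lower)
```

With bounds such as `{"a": (0.5, 0.7), "b": (0.2, 0.4), "c": (0.1, 0.3)}` the rule returns the single class `["a"]`; with overlapping intervals such as `{"a": (0.35, 0.55), "b": (0.3, 0.5), "c": (0.05, 0.2)}` it returns `["a", "b"]`, i.e. an indeterminate answer. When every interval collapses to a point (a precise Bayesian posterior), the rule reduces to picking the most probable class, apart from exact ties.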
On Feature Selection, Bias-Variance, and Bagging
Abstract
Cited by 1 (0 self)
We examine the mechanism by which feature selection improves the accuracy of supervised learning. An empirical bias/variance analysis as feature selection progresses indicates that the most accurate feature set corresponds to the best bias-variance trade-off point for the learning algorithm. Often, this is not the point separating relevant from irrelevant features, but the point where increasing variance outweighs the gains from adding more (weakly) relevant features. In other words, feature selection can be viewed as a variance reduction method that trades off the benefit of decreased variance (from the reduction in dimensionality) against the harm of increased bias (from eliminating some of the relevant features). If a variance reduction method such as bagging is used, more (weakly) relevant features can be exploited and the most accurate feature set is usually larger. In many cases, the best performance is obtained by using all available features.
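The trade-off described above has a simple closed form in the squared-loss regression setting, shown here only as an illustration; the paper's analysis concerns classification accuracy, for which several different 0-1 loss bias/variance decompositions exist, and the symbols below are introduced for this sketch. Assume y = f(x) + ε with noise variance σ², and a learned predictor \hat f. The second identity gives the variance of an average of B predictors (as in bagging) that each have variance s² and pairwise correlation ρ, which shows why averaging lowers variance unless the models are perfectly correlated.

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(f(x) - \mathbb{E}[\hat f(x)]\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big[\hat f(x)\big]}_{\text{variance}}
  + \sigma^2

\operatorname{Var}\Big[\frac{1}{B}\sum_{b=1}^{B} \hat f_b(x)\Big]
  = \frac{1}{B^2}\big(B s^2 + B(B-1)\rho s^2\big)
  = \rho s^2 + \frac{1-\rho}{B}\, s^2
```

Dropping features tends to shrink the variance term at the cost of bias; averaging shrinks the variance term by a different route, which is consistent with the observation that a bagged learner can profit from a larger feature set.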
Quartet-Based Learning of Hierarchical Latent Class Models:
Abstract
Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are hidden. The most efficient algorithm currently available for learning HLC models can deal with only a few dozen observed variables.
Universiteit Leiden
Abstract
to obtain the degree of Doctor at Leiden University, on the authority of the Rector Magnificus prof.mr. P.F. van der Heijden, by decision of the Doctorate Board, to be defended on Tuesday 19 January 2010 at 16:15 by Petrus Wilhelmus Henricus van der Putten, born in Eindhoven in 1971.
Composition of the doctoral committee
Classification of Yeast Cells from Image Features to Evaluate Pathogen Conditions
Abstract
Morphometrics, obtained from images through image analysis, may reveal differences between classes of objects present in the images. We have performed an image-features-based classification of the pathogenic yeast Cryptococcus neoformans. Building and analyzing image collections of the yeast under different environmental or genetic conditions may help to diagnose a new “unseen” situation. Diagnosis here means that the relevant information can be retrieved from the image collection each time a new “sample” is presented. The basidiomycetous yeast Cryptococcus neoformans can cause infections such as meningitis or pneumonia. The presence of an extracellular capsule is known to be related to virulence. This paper reports on our approach to developing classifiers that detect potentially more or less virulent cells in a sample, i.e. an image, using a range of features derived from the shape or density distribution. The classifier can then be used to automate screening and to annotate existing image collections. In addition, we present our methods for creating samples, collecting images, preprocessing images, identifying “yeast cells” and extracting features from the images. We compare various expertise-based and fully automated methods of feature selection, benchmark a range of classification algorithms, and illustrate successful application to this particular domain.
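The abstract does not list the paper's actual feature set, so the following is only a hypothetical sketch of the kind of per-cell features it describes, "derived from the shape or density distribution". It assumes each cell has already been segmented into a binary mask aligned with a grey-level image, and computes the area, an elongation measure from the second-order moments of the pixel coordinates, and the mean and standard deviation of the grey values inside the cell. The function name and feature names are illustrative only.

```python
import math


def cell_features(mask, image):
    """Compute a few hypothetical shape and density features for one segmented cell.

    mask:  2-D list of 0/1 values marking the cell's pixels (segmentation assumed done)
    image: 2-D list of grey values with the same shape as mask
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(coords)
    if area == 0:
        raise ValueError("empty mask")
    # Centroid of the cell's pixel coordinates.
    mr = sum(r for r, _ in coords) / area
    mc = sum(c for _, c in coords) / area
    # Second-order central moments (covariance matrix of the pixel coordinates).
    srr = sum((r - mr) ** 2 for r, _ in coords) / area
    scc = sum((c - mc) ** 2 for _, c in coords) / area
    src = sum((r - mr) * (c - mc) for r, c in coords) / area
    # Eigenvalues of the 2x2 covariance matrix: spread along the principal axes.
    half_trace = (srr + scc) / 2
    disc = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    lam_major, lam_minor = half_trace + disc, half_trace - disc
    # Elongation: 1.0 for an isotropic blob, larger for elongated cells.
    elongation = math.sqrt(lam_major / lam_minor) if lam_minor > 0 else float("inf")
    # Density features: statistics of the grey values inside the cell.
    values = [image[r][c] for r, c in coords]
    mean = sum(values) / area
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / area)
    return {
        "area": area,
        "elongation": elongation,
        "mean_intensity": mean,
        "std_intensity": std,
    }
```

A vector of such features per cell could then be fed to any of the benchmarked classification algorithms. A square or cross-shaped mask gives an elongation of 1.0, while a 2 x 4 rectangle gives the square root of 5, reflecting its 1.25 : 0.25 variance ratio along the two axes.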

