Results 1–10 of 17
An introduction to kernel-based learning algorithms
 IEEE Transactions on Neural Networks
, 2001
Abstract

Cited by 373 (48 self)
This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and
An Exact Probability Metric for Decision Tree Splitting
 Machine Learning
, 1997
Abstract

Cited by 32 (3 self)
ID3's information gain heuristic is well-known to be biased towards multi-valued attributes. This bias is only partially compensated by the gain ratio used in C4.5. Several alternatives have been proposed, notably orthogonality and Beta. Gain ratio and orthogonality are strongly correlated, and all of the metrics share a common bias towards splits with one or more small expected values, under circumstances where the split likely occurred by chance. Both classical and Bayesian statistics lead to the multiple hypergeometric distribution as the posterior probability of the null hypothesis. Both gain and the chi-squared significance test are shown to arise in asymptotic approximations to the hypergeometric, revealing similar criteria for admissibility and showing the nature of their biases. Previous failures to find admissible stopping rules are traced to coupling these biased approximations with one another or with arbitrary thresholds; problems which are overcome by the hypergeometric. Em...
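The bias described in this abstract is easy to reproduce. The following sketch (toy data and helper names are my own, not the paper's) shows information gain scoring a many-valued attribute as highly as a genuinely informative binary one, with C4.5's gain ratio only partially compensating:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """ID3 gain: reduction in label entropy from splitting on an attribute."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """C4.5 gain ratio: gain normalized by the split's own entropy."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info else 0.0

labels = ['a', 'a', 'b', 'b']
binary = [0, 0, 1, 1]      # genuinely informative two-valued attribute
unique_id = [1, 2, 3, 4]   # many-valued attribute: "perfect" gain by chance

print(information_gain(binary, labels))     # 1.0
print(information_gain(unique_id, labels))  # also 1.0 -> the multi-valued bias
print(gain_ratio(binary, labels))           # 1.0
print(gain_ratio(unique_id, labels))        # 0.5 -> only partial compensation
```

Gain rates the unique-ID attribute as highly as the informative one even though its split is pure by chance, which is exactly the small-expected-value bias the paper traces to asymptotic approximations of the hypergeometric.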
Virtual screens for ligands of orphan G protein-coupled receptors
 J. Chem. Inf. Model. 2005
Abstract

Cited by 11 (0 self)
The Time Complexity of Decision Tree Induction
, 1995
Abstract

Cited by 6 (1 self)
Various factors affecting decision tree learning time are explored. The factors which consistently affect accuracy are those which directly or indirectly (as in the handling of continuous attributes) allow a greater number and variety of potential trees to be explored. Other factors, such as pruning and choice of heuristics, generally have little effect on accuracy, but significantly affect learning time. We prove that the time complexity of induction and postprocessing is exponential in tree height in the worst case and, under fairly general conditions, in the average case. This puts a premium on designs which tend to produce shallower trees (e.g., multiway rather than binary splits, and heuristics which prefer more balanced splits). Simple pruning is linear in tree height, contrasted to the exponential growth of more complex operations. The key factor influencing whether simple pruning will suffice is that the split selection and pruning heuristics should be the same and unbiased. ...
Invariant Operators, Small Samples, and the Bias-Variance Dilemma
 In CVPR
, 2004
Abstract

Cited by 5 (4 self)
Invariant features or operators are often used to shield the recognition process from the effect of "nuisance" parameters, such as rotations, foreshortening, or illumination changes. From an information-theoretic point of view, imposing invariance results in reduced (rather than improved) system performance. However, in the case of small training samples, the situation is reversed, and invariant operators may reduce the misclassification rate. We propose an analysis of this interesting behavior based on the bias-variance dilemma, and present experimental results confirming our theoretical expectations. In addition, we introduce the concept of "randomized invariants" for training, which can be used to mitigate the effect of small sample size.
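The contrast between an invariant operator and training-time "randomized invariants" can be sketched in a toy NumPy example (the data and function names below are my own invention; the paper's experiments concern image recognition):

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate(points, theta):
    """Apply a 2-D rotation, standing in for the 'nuisance' transformation."""
    c, s = np.cos(theta), np.sin(theta)
    return points @ np.array([[c, -s], [s, c]]).T

# Two sample points that differ in both radius and angle.  A rotation-
# invariant operator (here, the norm) keeps only the radius, discarding
# the angular information -- the information loss the abstract refers to.
x = np.array([[1.0, 0.0], [0.0, 1.2]])
invariant_features = np.linalg.norm(x, axis=1)  # rotation-invariant: [1.0, 1.2]

# The 'randomized invariants' alternative: keep the raw (non-invariant)
# features, but augment the small training sample with randomly rotated
# copies so the classifier sees the nuisance variation directly.
augmented = np.vstack([rotate(x, rng.uniform(0, 2 * np.pi)) for _ in range(10)])
print(augmented.shape)  # (20, 2): 10 rotated copies of the 2 points
```

The invariant operator commits to discarding angle before learning; the randomized-invariant augmentation leaves that trade-off to the classifier, which is the mechanism the abstract proposes for mitigating small-sample effects.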
Beyond Brain Blobs: Machine Learning Classifiers as Instruments for Analyzing Functional Magnetic Resonance Imaging Data
, 1998
Abstract

Cited by 5 (1 self)
The thesis put forth in this dissertation is that machine learning classifiers can be used as instruments for decoding variables of interest from functional magnetic resonance imaging (fMRI) data. There are two main goals in decoding:
• Showing that the variable of interest can be predicted from the data in a statistically reliable manner (i.e. there’s enough information present).
• Shedding light on how the data encode the information needed to predict, taking into account what the classifier used can learn and any criteria by which the data are filtered (e.g. how voxels and time points used are chosen).
Chapter 2 considers the issues that arise when using traditional linear classifiers and several different voxel selection techniques to strive towards these
Small Sample Inference for Generalization Error in Classification Using the CUD Bound
Abstract

Cited by 2 (2 self)
Confidence measures for the generalization error are crucial when small training samples are used to construct classifiers. A common approach is to estimate the generalization error by resampling and then assume the resampled estimator follows a known distribution to form a confidence set [Kohavi 1995, Martin 1996, Yang 2006]. Alternatively, one might bootstrap the resampled estimator of the generalization error to form a confidence set. Unfortunately, these methods do not reliably provide sets of the desired confidence. The poor performance appears to be due to the lack of smoothness of the generalization error as a function of the learned classifier. This results in a non-normal distribution of the estimated generalization error. We construct a confidence set for the generalization error by use of a smooth upper bound on the deviation between the resampled estimate and generalization error. The confidence set is formed by bootstrapping this upper bound. In cases in which the approximation class for the classifier can be represented as a parametric additive model, we provide a computationally efficient algorithm. This method exhibits superior performance across a series of test and simulated data sets.
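As a point of reference for the critique above, the "bootstrap the resampled estimator" baseline might look like the following minimal percentile-bootstrap sketch (this is the generic approach the abstract argues against, not the paper's CUD bound; the data and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_ci(per_example_errors, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap interval for the mean error.

    Resamples the per-example 0/1 losses with replacement and takes
    quantiles of the resampled means as the confidence limits.
    """
    errs = np.asarray(per_example_errors, dtype=float)
    n = len(errs)
    boot_means = np.array([
        errs[rng.integers(0, n, size=n)].mean() for _ in range(n_boot)
    ])
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# 0/1 losses of a classifier on a small held-out sample (made-up data)
losses = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
lo, hi = bootstrap_ci(losses)
print(round(lo, 2), round(hi, 2))
```

With so few examples the interval is wide, and because the estimated error is non-smooth in the learned classifier its distribution need not be normal, which is the failure mode motivating the smooth upper bound that the paper bootstraps instead.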
Evaluating Data Mining Models: A Pattern Language
, 2002
Abstract

Cited by 1 (0 self)
This paper extracts and documents patterns that identify recurring solutions for the problem of evaluation of data mining models. The five patterns presented in this paper are organized as a pattern language. The patterns differ in their context of application and how they solve the evaluation problem, especially when only limited amounts of data are available. Another contribution of this paper is the introduction of a new pattern section called "Force Resolution Map". We believe that Force Resolution Maps not only illuminate these data mining patterns, but are also generally useful in explicating any patterns.
Reliability of Prognostic Models in Medicine: Current Issues and Available Results
Abstract
This paper deals with the problem of the general usefulness of results coming from prognostic models in medicine. It reports a collection of available theoretical results on the sample size of the studies, taken from Computational Learning Theory and basic statistics. It is concluded that a general guideline for publishing Data Mining results in medicine can be derived; moreover, it is argued that only a few studies published in the literature followed a procedure that may really enable the reader to judge if the results obtained possess a general validity.
XCS and GALE: a Comparative Study of Two Learning Classifier
 In Proceedings of the 4th International Workshop on Learning Classifier Systems (IWLCS-2001)
, 2001
Abstract
This paper compares the learning performance, in terms of prediction accuracy, of two genetic-based machine learning systems (GBML), XCS and GALE, with six well-known learning algorithms coming from instance-based learning, decision tree induction, rule learning, statistical modeling and support vector machines. The experiments, performed on several datasets, show the suitability of the genetic-based learning classifier systems for classification tasks. Both XCS and GALE achieved significantly better results than IB1 and Naive Bayes. Moreover, no method significantly outperformed XCS and GALE.