Results 1–7 of 7
General and Efficient Multisplitting of Numerical Attributes
, 1999
"... . Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the wellbehavedness of an evaluation function, ..."
Abstract

Cited by 51 (7 self)
 Add to MetaCart
Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multipartition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the noncumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations that is required to produce partitions which are optimal wit...
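The boundary-point idea in this abstract can be sketched in a few lines. This is an illustrative simplification, not the paper's algorithm: it assumes each attribute value carries a single class label and returns the midpoints between adjacent distinct values whose labels differ; the function name is invented here.

```python
def boundary_points(values, labels):
    """Return candidate cut points where the class label changes.

    Simplified sketch: assumes one label per distinct value; midpoints
    between adjacent distinct values with differing labels are the only
    cut points that need to be evaluated.
    """
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if v1 != v2 and c1 != c2:
            cuts.append((v1 + v2) / 2)
    return cuts
```

For values 1, 2, 3, 4 with labels a, a, b, b this yields the single candidate 2.5 instead of the three midpoints a naive scan would evaluate.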
On the Qualitative Behavior of Impurity-Based Splitting Rules I: The Minima-Free Property
 Machine Learning
, 1997
"... We show that all strictly convex \ impurity measures lead to splits at boundary points, and furthermore show that certain rational splitting rules, notably the information gain ratio, also have this property. A slightly weaker result is shown to hold for impurity measures that are only convex \, suc ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
(Show Context)
We show that all strictly convex ∩ impurity measures lead to splits at boundary points, and furthermore show that certain rational splitting rules, notably the information gain ratio, also have this property. A slightly weaker result is shown to hold for impurity measures that are only convex ∩, such as Inaccuracy.
PRISMA: Improving Risk Estimation with Parallel Logistic Regression Trees
"... Abstract. Logistic regression is a very powerful method to estimate models with binary response variables. With the previously suggested combination of treebased approaches with local, piecewise valid logistic regression models in the nodes, interactions between the covariates are directly conveye ..."
Abstract
 Add to MetaCart
(Show Context)
Logistic regression is a very powerful method for estimating models with binary response variables. With the previously suggested combination of tree-based approaches with local, piecewise valid logistic regression models in the nodes, interactions between the covariates are directly conveyed by the tree and can be interpreted more easily. We show that restricting the partitioning of the feature space to the single best attribute limits the overall estimation accuracy. Here we suggest Parallel RecursIve Search at Multiple Attributes (PRISMA) and demonstrate how the method can significantly improve risk estimation models in heart surgery and successfully perform a benchmark on three UCI data sets.
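A toy illustration of the trees-with-local-models idea the abstract builds on (not PRISMA itself): split once on one covariate and fit a local model in each leaf. To stay self-contained, each leaf here holds only an intercept-only logistic model (a smoothed log-odds of the event rate); a real implementation would fit full logistic regressions per node. All names, the threshold, and the smoothing constants are assumptions.

```python
import math

def fit_stump(x, y, threshold):
    """Split at `threshold` and fit an intercept-only logit per leaf."""
    leaves = {}
    for side in (False, True):
        ys = [yi for xi, yi in zip(x, y) if (xi > threshold) == side]
        # Smoothed event rate avoids log(0) on pure or empty leaves.
        p = (sum(ys) + 0.5) / (len(ys) + 1.0)
        leaves[side] = math.log(p / (1 - p))  # leaf's intercept (log-odds)
    return leaves

def predict_proba(leaves, xi, threshold):
    """Route an observation to its leaf and invert the logit."""
    logit = leaves[xi > threshold]
    return 1 / (1 + math.exp(-logit))
```

Fitting on x = [0, 0, 1, 1], y = [0, 0, 1, 1] with threshold 0.5 gives a high predicted probability on the right leaf and a low one on the left, as expected.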
Clinical screening;
, 2004
"... Evaluation of radiological features for breast tumour classification in clinical screening with machine learning methods ..."
Abstract
 Add to MetaCart
(Show Context)
Evaluation of radiological features for breast tumour classification in clinical screening with machine learning methods
On Temporal Validity Analysis of Association Rules
Proc EUROMISE Int Joint Meeting, Prague 2004
"... Association rule mining [1] is a prominent datamining method used in many domains. Despite the fact that most large datasets are collected over longer time spans, the considered systems are in most cases assumed stationary, which leads to complete ignorance of temporal effects. In this contribution ..."
Abstract
 Add to MetaCart
(Show Context)
Association rule mining [1] is a prominent data-mining method used in many domains. Despite the fact that most large datasets are collected over long time spans, the considered systems are in most cases assumed stationary, which leads to complete ignorance of temporal effects. In this contribution we present statistical and discretization techniques for partitioning the data recording time into intervals in which the considered association rules remain homogeneous with respect to their support and confidence. In contrast with previous work, where the considered time intervals are fixed, here they are determined in a data-driven manner, which introduces a problem of optimal time granularity [2]. Furthermore, we demonstrate the applicability of risk-adjusted quality assessment in the medical domain, specifically as it relates to heart surgery. For example, in comparison with the Euroscore risk system [3], the outcome prediction models for duration of intensive care or mortality can be significantly enhanced. Interesting pattern changes can be identified and attributed to systematic and organizational modifications of the considered system.
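The homogeneity check described here rests on computing a rule's support and confidence separately per time interval. A minimal sketch, assuming transactions are item sets with timestamps and that the intervals are given up front (the paper derives them from the data); the rule, field layout, and function names are illustrative.

```python
def rule_stats(transactions, antecedent, consequent):
    """Support and confidence of antecedent -> consequent in one batch."""
    n = len(transactions)
    n_a = sum(antecedent <= t for t in transactions)          # A holds
    n_ac = sum((antecedent | consequent) <= t for t in transactions)
    support = n_ac / n if n else 0.0
    confidence = n_ac / n_a if n_a else 0.0
    return support, confidence

def stats_per_interval(timed_transactions, intervals, antecedent, consequent):
    """Evaluate the rule separately in each [lo, hi) time interval."""
    out = []
    for lo, hi in intervals:
        batch = [items for ts, items in timed_transactions if lo <= ts < hi]
        out.append(rule_stats(batch, antecedent, consequent))
    return out
```

A jump in either statistic between adjacent intervals then signals that the rule's temporal validity boundary has been crossed.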
Well-Behaved Evaluation Functions for Numerical Attributes
, 1997
"... The class of wellbehaved evaluation functions simplifies and makes efficient the handling of numerical attributes; for them it suffices to concentrate on the boundary points in searching for the optimal partition. This holds always for binary partitions and also for multisplits if only the function ..."
Abstract
 Add to MetaCart
(Show Context)
The class of well-behaved evaluation functions simplifies and makes efficient the handling of numerical attributes; for them it suffices to concentrate on the boundary points in searching for the optimal partition. This always holds for binary partitions, and also for multisplits provided that the function is cumulative in addition to being well-behaved. A large portion of the most important attribute evaluation functions are well-behaved. This paper surveys the class of well-behaved functions. As a case study, we examine the properties of C4.5's attribute evaluation functions. Our empirical experiments show that a very simple cumulative rectification to the poor bias of information gain significantly outperforms gain ratio.
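For reference, the two evaluation functions compared in this abstract can be computed for one candidate partition as below. This is a generic textbook sketch (entropy in bits), not C4.5's code and not the paper's cumulative rectification.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a label sequence."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum(k / n * math.log2(k / n) for k in counts.values())

def information_gain(parent, parts):
    """Entropy reduction achieved by splitting `parent` into `parts`."""
    n = len(parent)
    return entropy(parent) - sum(len(p) / n * entropy(p) for p in parts)

def gain_ratio(parent, parts):
    """C4.5-style normalization: gain divided by the split information,
    i.e. the entropy of the partition sizes themselves."""
    split_labels = [i for i, p in enumerate(parts) for _ in p]
    si = entropy(split_labels)
    return information_gain(parent, parts) / si if si else 0.0
```

The normalization is what gives gain ratio its bias against many-way splits; the paper's point is that a cumulative correction to information gain can do better still.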