Results 1 - 10 of 25
Use of Contextual Information for Feature Ranking and Discretization
1997
Cited by 42 (7 self)
Deriving classification rules or decision trees from examples is an important problem. When there are too many features, discarding weak features before the derivation process is highly desirable. When there are numeric features, they need to be discretized for the rule generation. We present a new approach to these problems. Traditional techniques make use of feature merits based on either the information theoretic or statistical correlation between each feature and the class. We instead assign merits to features by finding each feature's "obligation" to the class discrimination in the context of other features. The merits are then used to rank the features, select a feature subset, and discretize the numeric variables. Experience with benchmark example sets demonstrates that the new approach is a powerful alternative to the traditional methods. This paper concludes by posing some new technical issues that arise from this approach.
Naive Bayesian classifier within ILP-R
Department of Computer Science, Katholieke Universiteit Leuven, 1995
Cited by 22 (1 self)
When dealing with classification problems, current ILP systems often lag behind state-of-the-art attributional learners. Part of the blame can be ascribed to a much larger hypothesis space which, therefore, cannot be as thoroughly explored. However, sometimes it is due to the fact that ILP systems do not take into account the probabilistic aspects of hypotheses when classifying unseen examples. This paper proposes just that. We developed a naive Bayesian classifier within our ILP-R first order learner. The learner itself uses a clever RELIEF-based heuristic which is able to detect strong dependencies within the literal space when such dependencies exist. We conducted a series of experiments on artificial and real-world data sets. The results show that the combination of ILP-R together with the naive Bayesian classifier sometimes significantly improves the classification of unseen instances as measured by both classification accuracy and average information score.
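For orientation, the naive Bayesian classification step mentioned in this abstract can be sketched for plain attribute-value data. This is only an illustrative propositional version with Laplace smoothing, not the first-order classifier built into ILP-R; the function names `train_nb` and `predict_nb` and the toy weather data are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Collect the counts needed by a categorical naive Bayes classifier."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values = defaultdict(set)            # feature index -> distinct values seen
    for row, c in zip(rows, labels):
        for f, v in enumerate(row):
            value_counts[(f, c)][v] += 1
            values[f].add(v)
    return class_counts, value_counts, values

def predict_nb(model, row):
    """Return the class with the highest smoothed log posterior score."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())

    def log_score(c):
        s = math.log(class_counts[c] / total)  # log prior
        for f, v in enumerate(row):
            # Laplace-smoothed estimate of P(value | class).
            num = value_counts[(f, c)][v] + 1
            den = class_counts[c] + len(values[f])
            s += math.log(num / den)
        return s

    return max(class_counts, key=log_score)

# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_nb(rows, labels)
```

Working in log space avoids numeric underflow when many attributes are multiplied together.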
An adaptation of Relief for attribute estimation in regression
1997
Cited by 12 (1 self)
Heuristic measures for estimating the quality of attributes mostly assume the independence of attributes, so in domains with strong dependencies between attributes their performance is poor. Relief and its extension ReliefF are capable of correctly estimating the quality of attributes in classification problems with strong dependencies between attributes. By exploiting local information provided by different contexts they provide a global view. We present the analysis of ReliefF which led us to its adaptation to regression (continuous class) problems. The experiments on artificial and real-world data sets show that Regressional ReliefF correctly estimates the quality of attributes in various conditions, and can be used for non-myopic learning of the regression trees. Regressional ReliefF and ReliefF provide a unified view on estimating the attribute quality in regression and classification.
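To make the idea of context-sensitive attribute estimation concrete, here is a minimal sketch of the basic two-class Relief update (nearest hit versus nearest miss). It is not ReliefF or the regressional variant described in the abstract; the function name `relief`, the Manhattan distance, the assumption that features are already scaled to [0, 1], and the XOR toy data are all choices made for this illustration.

```python
import random
from itertools import product

def relief(X, y, n_iter=100, seed=0):
    """Basic Relief weights for numeric features assumed to lie in [0, 1]."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(ai - bi) for ai, bi in zip(a, b))

    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest same-class instance
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class instance
        for f in range(d):
            # Reward features that separate the miss, penalize those that separate the hit.
            w[f] += (abs(X[i][f] - X[m][f]) - abs(X[i][f] - X[h][f])) / n_iter
    return w

# Toy data: class = x0 XOR x1, while x2 is irrelevant.
X = [list(bits) for bits in product([0, 1], repeat=3)]
y = [x[0] ^ x[1] for x in X]
weights = relief(X, y)
```

On this XOR data neither relevant feature is informative on its own, yet the irrelevant feature ends up with a lower weight than both, because every nearest hit differs from the sampled instance only in that feature.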
Learning By Discovering Concept Hierarchies
Artificial Intelligence, 1999
Cited by 11 (4 self)
We present a new machine learning method that, given a set of training examples, induces a definition of the target concept in terms of a hierarchy of intermediate concepts and their definitions. This effectively decomposes the problem into smaller, less complex problems. The method is inspired by the Boolean function decomposition approach to the design of switching circuits. To cope with the high time complexity of finding an optimal decomposition, we propose a suboptimal heuristic algorithm. The method, implemented in the program HINT (Hierarchy INduction Tool), is experimentally evaluated using a set of artificial and real-world learning problems. In particular, the evaluation addresses the generalization property of decomposition and its capability to discover meaningful hierarchies. The experiments show that HINT performs well in both respects. Keywords: Function decomposition, Machine learning, Concept hierarchies, Concept discovery, Constructive induction, Generalization
Analysing and Improving the Diagnosis of Ischaemic Heart Disease with Machine Learning
1999
Cited by 10 (6 self)
Ischaemic heart disease is one of the world's most important causes of mortality, so improvements and rationalization of diagnostic procedures would be very useful. The four diagnostic levels consist of evaluation of signs and symptoms of the disease and ECG (electrocardiogram) at rest, sequential ECG testing during the controlled exercise, myocardial scintigraphy, and finally coronary angiography (which is considered to be the reference method).
Decomposition of heterogeneous classification problems
Intelligent Data Analysis, 1998
Cited by 9 (0 self)
In some classification problems the feature space is heterogeneous in that the best features on which to base the classification are different in different parts of the feature space. In some other problems the classes can be divided into subsets such that distinguishing one subset of classes from another and classifying examples within the subsets require very different decision rules, involving different sets of features. In such heterogeneous problems, many modeling techniques (including decision trees, rules, and neural networks) evaluate the performance of alternative decision rules by averaging over the entire problem space, and are prone to generating a model that is suboptimal in any of the regions or subproblems. Better overall models can be obtained by splitting the problem appropriately and modeling each subproblem separately. This paper presents a new measure to determine the degree of dissimilarity between the decision surfaces of two given problems, and suggests a way to search for a strategic splitting of the feature space that identifies regions with different characteristics. We illustrate the concept using a multiplexor problem, and apply the method to a DNA classification problem.
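The multiplexor problem mentioned above is a convenient illustration of a heterogeneous feature space. The abstract does not say which multiplexor was used, so the sketch below assumes the common 6-bit version (two address bits selecting one of four data bits); the function names are hypothetical.

```python
from itertools import product

def multiplexer6(bits):
    """Assumed 6-bit multiplexer: the two address bits select one of four data bits."""
    a1, a0, *data = bits
    return data[2 * a1 + a0]

def relevant_data_bits(a1, a0):
    """Indices of data bits whose value can change the output for a fixed address."""
    relevant = set()
    for data in product([0, 1], repeat=4):
        for k in range(4):
            flipped = list(data)
            flipped[k] ^= 1
            if multiplexer6((a1, a0, *data)) != multiplexer6((a1, a0, *flipped)):
                relevant.add(k)
    return relevant

# For each address region, find which data bits matter there.
regions = {addr: relevant_data_bits(*addr) for addr in product([0, 1], repeat=2)}
```

Each address region depends on a different single data bit, so a rule that is ideal in one region is useless in the others; this is the kind of structure that motivates splitting the problem and modeling each subproblem separately.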
An Application of Machine Learning in the Diagnosis of Ischaemic Heart Disease
1997
Cited by 8 (5 self)
Ischaemic heart disease is one of the world's most important causes of mortality, so improvements and rationalization of diagnostic procedures would be very useful. The four diagnostic levels consist of evaluation of signs and symptoms of the disease and ECG (electrocardiogram) at rest, sequential ECG testing during the controlled exercise, myocardial scintigraphy and finally coronary angiography. The diagnostic process is stepwise and the results are interpreted hierarchically, i.e. the next step is necessary only if the results of the former are inconclusive. Because suggestibility is possible, the results of each step are interpreted individually and only the results of the highest step are valid. On the other hand, Machine Learning methods may be capable of objective interpretation of all available results for the same patient and in this way increase the diagnostic accuracy of each step. We conducted many experiments with four learning algorithms and different variat...
Attribute Selection for Modeling
1997
Cited by 7 (0 self)
Modelling a target attribute by other attributes in the data is perhaps the most traditional data mining task. When there are many attributes in the data, one needs to know which of the attribute(s) are relevant for modelling the target, either as a group or the one feature that is most appropriate to select within the model construction process in progress. There are many approaches for selecting the attribute(s) in machine learning. We examine various important concepts and approaches that are used for this purpose and contrast their strengths. Discretization of numeric attributes is also discussed, since its use is prevalent in many modelling techniques. Keywords: attribute quality measures, impurity function, discretization, classification, regression
Anytime learning of decision trees
Journal of Machine Learning Research
Cited by 6 (3 self)
The majority of existing algorithms for learning decision trees are greedy: a tree is induced top-down, making locally optimal decisions at each node. In most cases, however, the constructed tree is not globally optimal. Even the few non-greedy learners cannot learn good trees when the concept is difficult. Furthermore, they require a fixed amount of time and are not able to generate a better tree if additional time is available. We introduce a framework for anytime induction of decision trees that overcomes these problems by trading computation speed for better tree quality. Our proposed family of algorithms employs a novel strategy for evaluating candidate splits. A biased sampling of the space of consistent trees rooted at an attribute is used to estimate the size of the minimal tree under that attribute, and an attribute with the smallest expected tree is selected. We present two types of anytime induction algorithms: a contract algorithm that determines the sample size on the basis of a pre-given allocation of time, and an interruptible algorithm that starts with a greedy tree and continuously improves subtrees by additional sampling. Experimental results indicate that, for several hard concepts, our proposed approach exhibits good anytime behavior and yields significantly better decision trees when more time is available.
A selective sampling approach to active feature selection
Artificial Intelligence 159(1-2), 2004
Cited by 5 (0 self)
Feature selection, as a preprocessing step to machine learning, has been very effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. Traditional feature selection methods resort to random sampling in dealing with data sets with a huge number of instances. In this paper, we introduce the concept of active feature selection, and investigate a selective sampling approach to active feature selection in a filter model setting. We present a formalism of selective sampling based on data variance, and apply it to a widely used feature selection algorithm Relief. Further, we show how it realizes active feature selection and reduces the required number of training instances to achieve time savings without performance deterioration. We design objective evaluation measures of performance, conduct extensive experiments using both synthetic and benchmark data sets, and observe consistent and significant improvement. We suggest some further work based on our study and experiments.

