Results 1–10 of 30
Efficient Learning of Selective Bayesian Network Classifiers
, 1995
"... In this paper, we present a computationally efficient method for inducing selective Bayesian network classifiers. Our approach is to use informationtheoretic metrics to efficiently select a subset of attributes from which to learn the classifier. We explore three conditional, informationtheoretic ..."
Abstract

Cited by 54 (4 self)
In this paper, we present a computationally efficient method for inducing selective Bayesian network classifiers. Our approach is to use information-theoretic metrics to efficiently select a subset of attributes from which to learn the classifier. We explore three conditional, information-theoretic metrics that are extensions of metrics used extensively in decision tree learning, namely Quinlan's gain and gain ratio metrics and Mantaras's distance metric. We experimentally show that the algorithms based on the gain ratio and distance metrics learn selective Bayesian networks that have predictive accuracies as good as or better than those learned by existing selective Bayesian network induction approaches (K2-AS), but at a significantly lower computational cost. We prove that the subset-selection phase of these information-based algorithms has polynomial complexity, as compared to the worst-case exponential time complexity of the corresponding phase in K2-AS. We also compare the performance o...
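The gain and gain ratio metrics this abstract builds on can be sketched as follows. This is an illustrative Python rendering of Quinlan's standard definitions, not the paper's implementation; the function names and data layout are our own:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, labels):
    """Quinlan's information gain of splitting on `attr`.
    `examples` is a list of dicts; `labels` is the parallel class list."""
    n = len(labels)
    by_value = {}
    for ex, y in zip(examples, labels):
        by_value.setdefault(ex[attr], []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in by_value.values())
    return entropy(labels) - remainder

def gain_ratio(examples, attr, labels):
    """Gain normalised by the split information (entropy of the attribute
    values), which penalises many-valued attributes."""
    n = len(labels)
    split_info = entropy([ex[attr] for ex in examples])
    return gain(examples, attr, labels) / split_info if split_info > 0 else 0.0
```

Attribute selection in this style then amounts to ranking attributes by such a metric and keeping a top-scoring subset before learning the network.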
Naive Bayes and Exemplar-Based Approaches to Word Sense Disambiguation Revisited
 Proceedings of the 14th European Conference on Artificial Intelligence, ECAI
, 2000
"... Abstract. This paper describes an experimental comparison between two standard supervised learning methods, namely Naive Bayes and Exemplar–based classification, on the Word Sense Disambiguation (WSD) problem. The aim of the work is twofold. Firstly, it attempts to contribute to clarify some confusi ..."
Abstract

Cited by 27 (3 self)
Abstract. This paper describes an experimental comparison between two standard supervised learning methods, namely Naive Bayes and Exemplar-based classification, on the Word Sense Disambiguation (WSD) problem. The aim of the work is twofold. Firstly, it attempts to clarify some confusing information about the comparison between the two methods appearing in the related literature. In doing so, several directions have been explored, including testing several modifications of the basic learning algorithms and varying the feature space. Secondly, an improvement of both algorithms is proposed in order to deal with large attribute sets. This modification, which basically consists in using only the positive information appearing in the examples, greatly improves the efficiency of the methods with no loss in accuracy. The experiments have been performed on the largest sense-tagged corpus available, containing the most frequent and ambiguous English words. Results show that the Exemplar-based approach to WSD is generally superior to the Bayesian approach, especially when a specific metric for dealing with symbolic attributes is used.
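A minimal Naive Bayes word-sense classifier over bag-of-context-word features can be sketched as follows. This is an illustrative Python sketch, not the paper's exact setup or its "positive information" variant; the class name and data layout are our own:

```python
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    """Naive Bayes over context words with Laplace smoothing (sketch)."""

    def fit(self, contexts, senses):
        self.senses = set(senses)
        self.prior = Counter(senses)            # sense -> training count
        self.word_counts = defaultdict(Counter)  # sense -> word -> count
        self.vocab = set()
        for words, s in zip(contexts, senses):
            for w in words:
                self.word_counts[s][w] += 1
                self.vocab.add(w)
        self.n = len(senses)
        return self

    def predict(self, words):
        def log_score(s):
            total = sum(self.word_counts[s].values())
            logp = math.log(self.prior[s] / self.n)
            for w in words:  # scores only words present in the example
                logp += math.log((self.word_counts[s][w] + 1) /
                                 (total + len(self.vocab)))
            return logp
        return max(self.senses, key=log_score)
```

For example, trained on tiny contexts for an ambiguous word like "bank", the classifier picks the sense whose context distribution best matches the test context.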
Parcel: Feature Subset Selection in Variable Cost Domains
, 1998
"... The vast majority of classification systems are designed with a single set of features, and optimised to a single specified cost. However, in examples such as medical and financial risk modelling, costs are known to vary subsequent to system design. In this paper, we present a design method for feat ..."
Abstract

Cited by 26 (1 self)
The vast majority of classification systems are designed with a single set of features, and optimised to a single specified cost. However, in examples such as medical and financial risk modelling, costs are known to vary subsequent to system design. In this paper, we present a design method for feature selection in the presence of varying costs. Starting from the Wilcoxon nonparametric statistic for the performance of a classification system, we introduce a concept called the maximum realisable receiver operating characteristic (MRROC), and prove a related theorem. A novel criterion for feature selection, based on the area under the MRROC curve, is then introduced. This leads to a framework which we call Parcel, which has the flexibility to use different combinations of features at different operating points on the resulting MRROC curve. Empirical support for each stage in our approach is provided by experiments on real-world problems, with Parcel achieving superior results.
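The Wilcoxon statistic the abstract starts from equals the area under the ROC curve: the probability that a randomly drawn positive example outscores a randomly drawn negative one. A minimal illustrative sketch (ours, not Parcel's code):

```python
def auc_wilcoxon(pos_scores, neg_scores):
    """Area under the ROC curve as the normalised Wilcoxon-Mann-Whitney
    statistic: fraction of positive/negative pairs where the positive
    scores higher (ties count as one half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

Because this area summarises performance across all operating points, a feature subset chosen to maximise it remains useful when the misclassification costs change after design time, which is the motivation behind the MRROC criterion.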
A Decision Tree Plug-In for DataEngine
 Proc. 2nd Data Analysis Symposium
, 1998
"... : Inducing decision trees with a topdown algorithm is a wellknown and widely used method to construct classifiers from a set of sample cases. In this paper we review this technique in order to demonstrate its relative simplicity and its power to produce comprehensible results. Since the task to in ..."
Abstract

Cited by 10 (1 self)
Inducing decision trees with a top-down algorithm is a well-known and widely used method for constructing classifiers from a set of sample cases. In this paper we review this technique in order to demonstrate its relative simplicity and its power to produce comprehensible results. Since the task of inducing a classifier from data turns up frequently in applications (e.g. credit assessment, disease detection, etc.), no commercial data analysis tool can do without offering a decision tree induction module. Nevertheless, the well-known data analysis program DataEngine™ has until recently lacked such a module. This drawback is now removed by a plug-in consisting of a set of user-defined function blocks, which we implemented and describe in this paper. 1 Introduction. Decision trees are a well-known type of classifier. Classifiers, in turn, are programs which automatically classify a case or an object, i.e. assign it according to its features to one of several given classes. ...
On the well-behavedness of important attribute evaluation functions
 In G. Grahne (Ed.), Proceedings of the Sixth Scandinavian Conference on Artificial Intelligence (pp. 95-106). Frontiers in Artificial Intelligence and Applications (Vol
, 1997
"... Abstract. The class of wellbehaved evaluation functions simplifies and makes efficient the handling of numerical attributes; for them it suffices to concentrate on the boundary points in searching for the optimal partition. This holds always for binary partitions and also for multisplits if only th ..."
Abstract

Cited by 8 (2 self)
Abstract. The class of well-behaved evaluation functions simplifies and makes efficient the handling of numerical attributes: for them it suffices to concentrate on the boundary points when searching for the optimal partition. This always holds for binary partitions, and also for multi-splits provided the function is cumulative in addition to being well-behaved. The class of well-behaved evaluation functions is a proper superclass of the convex evaluation functions; thus, a large proportion of the most important attribute evaluation functions are well-behaved. This paper explores the extent and boundaries of well-behaved functions. In particular, we examine C4.5's default attribute evaluation function, gain ratio, which has been known to have problems with numerical attributes. We show that gain ratio is not convex, but is still well-behaved with respect to binary partitioning. However, it cannot handle higher-arity partitioning well. Our empirical experiments show that a very simple cumulative rectification of the poor bias of information gain significantly outperforms gain ratio.
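The boundary-point property the abstract describes can be made concrete for binary splits: with a well-behaved function such as information gain, only thresholds where the class label changes in the value-sorted order need to be evaluated. An illustrative Python sketch (our names, not the paper's code):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_binary_split(values, labels):
    """Best binary threshold on a numeric attribute by information gain,
    evaluating only boundary points (positions in the sorted order where
    the class changes), as well-behavedness licenses."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best = (None, -1.0)  # (threshold, gain)
    for i in range(1, n):
        (v0, y0), (v1, y1) = pairs[i - 1], pairs[i]
        if y0 == y1 or v0 == v1:
            continue  # not a boundary point: safe to skip
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        g = base - (len(left) / n) * entropy(left) \
                 - (len(right) / n) * entropy(right)
        if g > best[1]:
            best = ((v0 + v1) / 2, g)
    return best
```

The pay-off is that a numeric attribute with many distinct values but few class changes needs only a handful of candidate thresholds instead of one per value.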
GD: A Measure based on Information Theory for Attribute Selection
 Proc. of the 6th Ibero-American Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence
, 1998
"... In many Machine Learning problems, the induction algorithms have to deal with attributes that are not relevant to the denition of the class. The irrelevant or redundant attributes do not aect the ideal Bayesian classier because the addition of new attributes never ..."
Abstract

Cited by 7 (2 self)
In many Machine Learning problems, the induction algorithms have to deal with attributes that are not relevant to the definition of the class. The irrelevant or redundant attributes do not affect the ideal Bayesian classifier because the addition of new attributes never ...
Redescription Mining: Algorithms and Applications in Bioinformatics
, 2007
"... Scientific data mining purports to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences ..."
Abstract

Cited by 5 (0 self)
Scientific data mining purports to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences into data-driven endeavors. In particular, scientists are now faced with an overload of vocabularies for describing domain entities. All of these vocabularies offer alternative and mostly complementary (sometimes, even contradictory) ways to organize information, and each vocabulary provides a different perspective into the problem being studied. To further knowledge discovery, computational scientists need tools to help uniformly reason across vocabularies, integrate multiple forms of characterizing datasets, and situate knowledge gained from one study in terms of others. This dissertation defines a new pattern class called redescriptions that provides high-level capabilities for reasoning across domain vocabularies. A redescription is a shift of vocabulary, or a different way of communicating the same information; redescription mining finds concerted sets of objects that can be defined in (at least) two ways using given descriptors. We present ...
Reasoning and learning in probabilistic and possibilistic networks: An overview
 Machine Learning: ECML-95, Proceedings of the 8th European Conference on Machine Learning
, 1995
"... Abstract. Inference networks, probabilistic as well as possibilistic, are popular techniques to make reasoning in complex domains feasible. Since constructing such networks by hand can be tedious and time consuming, a large part of recent research has been devoted to learning them from data. In this ..."
Abstract

Cited by 4 (1 self)
Abstract. Inference networks, probabilistic as well as possibilistic, are popular techniques for making reasoning in complex domains feasible. Since constructing such networks by hand can be tedious and time-consuming, a large part of recent research has been devoted to learning them from data. In this paper we review probabilistic and possibilistic networks and discuss the basic ideas used in learning algorithms for these types of networks. With an application in the automotive industry we demonstrate that the considered methods are not only of theoretical importance, but also relevant in practice.
Prediction by Categorical Features: Generalization Properties and Application to Feature Ranking
"... Abstract. We describe and analyze a new approach for feature ranking in the presence of categorical features with a large number of possible values. It is shown that popular ranking criteria, such as the Gini index and the misclassification error, can be interpreted as the training error of a predic ..."
Abstract

Cited by 2 (1 self)
Abstract. We describe and analyze a new approach for feature ranking in the presence of categorical features with a large number of possible values. It is shown that popular ranking criteria, such as the Gini index and the misclassification error, can be interpreted as the training error of a predictor that is deduced from the training set. It is then argued that using the generalization error is a more adequate ranking criterion. We propose a modification of the Gini index criterion, based on a robust estimation of the generalization error of a predictor associated with the Gini index. The properties of this new estimator are analyzed, showing that for most training sets, it produces an accurate estimation of the true generalization error. We then address the question of finding the optimal predictor that is based on a single categorical feature. It is shown that the predictor associated with the misclassification error criterion has the minimal expected generalization error. We bound the bias of this predictor with respect to the generalization error of the Bayes optimal predictor, and analyze its concentration properties.
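The training-error interpretation the abstract mentions can be sketched concretely: per feature value, the Gini index is the expected training error of a predictor that outputs each class with its empirical cell probability, while the misclassification error is the training error of the majority-vote predictor. An illustrative Python sketch (our names, not the paper's code):

```python
from collections import Counter, defaultdict

def _cells(feature_values, labels):
    """Group labels by the value the categorical feature takes."""
    cells = defaultdict(list)
    for v, y in zip(feature_values, labels):
        cells[v].append(y)
    return cells

def gini_training_error(feature_values, labels):
    """Weighted Gini impurity = expected training error of the
    probabilistic predictor: within a cell, P(error) = 1 - sum_y p(y)^2."""
    n = len(labels)
    err = 0.0
    for cell in _cells(feature_values, labels).values():
        m = len(cell)
        err += (m / n) * (1.0 - sum((c / m) ** 2
                                    for c in Counter(cell).values()))
    return err

def misclassification_error(feature_values, labels):
    """Training error of the majority-vote predictor per feature value."""
    n = len(labels)
    return sum((len(c) / n) * (1 - Counter(c).most_common(1)[0][1] / len(c))
               for c in _cells(feature_values, labels).values())
```

A feature with many rare values drives both training errors toward zero even when it carries no signal, which is why the abstract argues for estimating the generalization error instead.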
Explaining similarity in CBR
 In ECCBR 2004 Workshop Proceedings
, 2004
"... Abstract. A desired capability of automatic problem solvers is to explain their results. Such explanations should justify that the solution proposed by the problem solver arises from the known domain knowledge. In this paper we discuss how the explanations can be used in CBR methods in order to ju ..."
Abstract

Cited by 2 (0 self)
Abstract. A desired capability of automatic problem solvers is to explain their results. Such explanations should justify that the solution proposed by the problem solver arises from the known domain knowledge. In this paper we discuss how explanations can be used in CBR methods to justify the results in classification tasks and also for solving new problems.