Rule Induction with CN2: Some Recent Improvements
1991
Cited by 291 (2 self)
The CN2 algorithm induces an ordered list of classification rules from examples using entropy as its search heuristic. In this short paper, we describe two improvements to this algorithm. Firstly, we present the use of the Laplacian error estimate as an alternative evaluation function and secondly, we show how unordered as well as ordered rules can be generated. We experimentally demonstrate significantly improved performance resulting from these changes, thus enhancing the usefulness of CN2 as an inductive tool. Comparisons with Quinlan's C4.5 are also made.
Keywords: learning, rule induction, CN2, Laplace, noise
1 Introduction
Rule induction from examples has established itself as a basic component of many machine learning systems, and has been the first ML technology to deliver commercially successful applications (e.g., the systems GASOIL [Slocombe et al., 1986] and BMT [Hayes-Michie, 1990], and applications in process control [Leech, 1986]). The continuing development of inductive techniques is t...
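The abstract names the Laplacian error estimate as CN2's alternative evaluation function. A minimal Python sketch, assuming the commonly cited form (n_c + 1) / (n + k), where n is the number of examples a rule covers, n_c the number of those in the rule's predicted class and k the number of classes. The function names and the two example rules are hypothetical. The sketch contrasts that score with the entropy of the covered class distribution:

```python
from __future__ import annotations

import math


def laplace_accuracy(n_covered: int, n_correct: int, num_classes: int) -> float:
    """Laplace-corrected accuracy of a rule: (n_c + 1) / (n + k).

    n_covered   -- number of training examples the rule covers (n)
    n_correct   -- covered examples of the rule's predicted class (n_c)
    num_classes -- number of classes in the domain (k)
    """
    if num_classes < 1 or not 0 <= n_correct <= n_covered:
        raise ValueError("invalid counts")
    return (n_correct + 1) / (n_covered + num_classes)


def entropy(class_counts: list[int]) -> float:
    """Entropy (in bits) of the class distribution among covered examples."""
    total = sum(class_counts)
    return sum(-(c / total) * math.log2(c / total) for c in class_counts if c > 0)


# Hypothetical rules in a two-class problem: a tiny "pure" rule and a broader one.
tiny_rule = [1, 0]    # covers 1 example, 1 of them in the predicted class
broad_rule = [48, 2]  # covers 50 examples, 48 of them in the predicted class

print(entropy(tiny_rule), laplace_accuracy(1, 1, 2))     # 0.0 and ~0.667
print(entropy(broad_rule), laplace_accuracy(50, 48, 2))  # ~0.242 and ~0.942
```

Entropy rates the one-example rule as perfect, while the Laplace estimate prefers the broader rule. This is one way to see why the estimate is less inclined to reward rules fitted to a handful of possibly noisy examples.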
Error Reduction through Learning Multiple Descriptions
1996
Cited by 114 (3 self)
Learning multiple descriptions for each class in the data has been shown to reduce generalization error but the amount of error reduction varies greatly from domain to domain. This paper presents a novel empirical analysis that helps to understand this variation. Our hypothesis is that the amount of error reduction is linked to the "degree to which the descriptions for a class make errors in a correlated manner." We present a precise and novel definition for this notion and use twenty-nine data sets to show that the amount of observed error reduction is negatively correlated with the degree to which the descriptions make errors in a correlated manner. We empirically show that it is possible to learn descriptions that make less correlated errors in domains in which many ties in the search evaluation measure (e.g., information gain) are experienced during learning. The paper also presents results that help to understand when and why multiple descriptions are a help (irrelevant attribute...
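The paper gives its own precise definition of correlated errors, which is not reproduced here. As a rough illustration only, the sketch below assumes a simple pairwise proxy: for each pair of learned descriptions, the fraction of test examples on which both predict the same wrong class, averaged over all pairs. Function and variable names, and the toy predictions, are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def pairwise_correlated_error(predictions: list[list[str]], truth: list[str]) -> float:
    """Average over all model pairs of the fraction of examples on which
    both models predict the same wrong class (an illustrative proxy only)."""
    if len(predictions) < 2:
        raise ValueError("need at least two models")
    n = len(truth)
    scores = []
    for a, b in combinations(predictions, 2):
        same_error = sum(1 for pa, pb, t in zip(a, b, truth) if pa == pb and pa != t)
        scores.append(same_error / n)
    return sum(scores) / len(scores)


truth = ["x", "x", "y", "y"]
m1 = ["x", "y", "y", "x"]  # wrong on examples 1 and 3
m2 = ["x", "y", "y", "y"]  # wrong on example 1, with the same wrong label as m1
m3 = ["y", "x", "x", "y"]  # wrong on examples 0 and 2, shared with no one
print(pairwise_correlated_error([m1, m2, m3], truth))  # 1/12, about 0.083
```

Under this proxy, an ensemble whose members repeat each other's mistakes scores high, which is the situation the abstract associates with little error reduction.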
The role of Occam’s Razor in knowledge discovery
Data Mining and Knowledge Discovery, 1999
Cited by 70 (1 self)
Many KDD systems incorporate an implicit or explicit preference for simpler models, but this use of “Occam’s razor” has been strongly criticized by several authors (e.g., Schaffer, 1993; Webb, 1996). This controversy arises partly because Occam’s razor has been interpreted in two quite different ways. The first interpretation (simplicity is a goal in itself) is essentially correct, but is at heart a preference for more comprehensible models. The second interpretation (simplicity leads to greater accuracy) is much more problematic. A critical review of the theoretical arguments for and against it shows that it is unfounded as a universal principle, and demonstrably false. A review of empirical evidence shows that it also fails as a practical heuristic. This article argues that its continued use in KDD risks causing significant opportunities to be missed, and that it should therefore be restricted to the comparatively few applications where it is appropriate. The article proposes and reviews the use of domain constraints as an alternative for avoiding overfitting, and examines possible methods for handling the accuracy–comprehensibility trade-off.
Selecting a Classification Method by Cross-Validation
Machine Learning, 1993
Cited by 57 (0 self)
If we lack relevant problem-specific knowledge, cross-validation methods may be used to select a classification method empirically. We examine this idea here to show in what senses cross-validation does and does not solve the selection problem. As illustrated empirically, cross-validation may lead to higher average performance than application of any single classification strategy and it also cuts the risk of poor performance. On the other hand, cross-validation is no more or less a form of bias than simpler strategies and applying it appropriately ultimately depends in the same way on prior knowledge. In fact, cross-validation may be seen as a way of applying partial information about the applicability of alternative classification strategies.
Keywords: Cross-validation, classification, decision trees, neural networks.
1 Introduction
Machine learning researchers and statisticians have produced a host of approaches to the problem of classification including methods for inducing rul...
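Cross-validated selection, as examined in the abstract, amounts to estimating each candidate method's accuracy by k-fold cross-validation and keeping the best. A minimal sketch, assuming two deliberately simple hypothetical candidates (majority class, and 1-nearest-neighbour on one numeric attribute), a deterministic fold assignment (every k-th example) and made-up toy data; none of these choices come from the paper.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function (train_x, train_y, query_x) -> predicted label.
Classifier = Callable[[Sequence[float], Sequence[str], float], str]


def majority(train_x: Sequence[float], train_y: Sequence[str], x: float) -> str:
    """Ignore the input and predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def nearest_neighbour(train_x: Sequence[float], train_y: Sequence[str], x: float) -> str:
    """1-nearest-neighbour on a single numeric attribute."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def cv_accuracy(clf: Classifier, xs: Sequence[float], ys: Sequence[str], k: int) -> float:
    """k-fold cross-validated accuracy; fold f holds out every k-th example from f."""
    n = len(xs)
    correct = 0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        correct += sum(clf(train_x, train_y, xs[i]) == ys[i] for i in held_out)
    return correct / n


def select_by_cv(candidates: dict[str, Classifier], xs, ys, k: int = 5):
    """Return the candidate with the highest cross-validated accuracy, plus all scores."""
    scores = {name: cv_accuracy(clf, xs, ys, k) for name, clf in candidates.items()}
    return max(scores, key=scores.get), scores


xs = [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5]
ys = ["a"] * 5 + ["b"] * 5
best, scores = select_by_cv({"majority": majority, "1-NN": nearest_neighbour}, xs, ys)
print(best, scores)  # 1-NN {'majority': 0.5, '1-NN': 1.0}
```

The selection rule itself (highest mean fold accuracy) is a fixed preference, which illustrates the abstract's point that cross-validation is no less a bias than committing to a single method.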
Inductive and Bayesian learning in medical diagnosis
Applied Artificial Intelligence, 1993
Cited by 56 (9 self)
Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: Assistant, a system for inductive learning of decision trees, and the naive Bayesian classifier. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared, and the interpretation of the knowledge and the explanation ability of the classification process of each system are discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to the naive Bayesian classifier are briefly described: dealing with continuous attributes, and discovering the dependencies among attributes.
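The naive Bayesian classifier compared against Assistant scores each class by its prior probability times the product of per-attribute conditional probabilities, assuming the attributes are independent given the class. A minimal sketch for categorical attributes, assuming add-one (Laplace) smoothing of the frequency counts; the class name, attribute values and training rows are made up for illustration and are not the paper's medical data.

```python
from __future__ import annotations

import math
from collections import Counter


class NaiveBayes:
    """Naive Bayesian classifier for categorical attributes with add-one smoothing."""

    def fit(self, rows: list[tuple[str, ...]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # values[a] holds the distinct values observed for attribute a
        self.values = [{r[a] for r in rows} for a in range(len(rows[0]))]
        # counts[(a, v, c)] = number of training rows of class c with value v for attribute a
        self.counts = Counter()
        for row, c in zip(rows, labels):
            for a, v in enumerate(row):
                self.counts[(a, v, c)] += 1
        return self

    def log_score(self, row: tuple[str, ...], c: str) -> float:
        """log P(c) + sum over attributes of log P(value | c), all smoothed."""
        score = math.log((self.class_counts[c] + 1) / (self.n + len(self.classes)))
        for a, v in enumerate(row):
            num = self.counts[(a, v, c)] + 1
            den = self.class_counts[c] + len(self.values[a])
            score += math.log(num / den)
        return score

    def predict(self, row: tuple[str, ...]) -> str:
        return max(self.classes, key=lambda c: self.log_score(row, c))


rows = [("yes", "high"), ("yes", "low"), ("no", "low"), ("no", "low")]
labels = ["sick", "sick", "healthy", "healthy"]
nb = NaiveBayes().fit(rows, labels)
print(nb.predict(("yes", "high")), nb.predict(("no", "low")))  # sick healthy
```

Because the score is a sum of per-attribute log terms, each attribute's contribution can be listed separately, which is one reason naive Bayes decisions are comparatively easy to explain.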
Occam's Two Razors: The Sharp and the Blunt
In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, 1998
Cited by 23 (3 self)
Occam's razor has been the subject of much controversy. This paper argues that this is partly because it has been interpreted in two quite different ways, the first of which (simplicity is a goal in itself) is essentially correct, while the second (simplicity leads to greater accuracy) is not. The paper reviews the large variety of theoretical arguments and empirical evidence for and against the "second razor," and concludes that the balance is strongly against it. In particular, it builds on the case made by Schaffer (1993) and Webb (1996) by considering additional theoretical arguments and recent empirical evidence that the second razor fails in most domains. A version of the first razor more appropriate to KDD is proposed, and we argue that continuing to apply the second razor risks causing significant opportunities to be missed.
1 Occam's Two Razors
William of Occam's famous razor states that "Nunquam ponenda est pluralitas sin necesitate," which, approximately translated, means "En...
Knowledge Discovery In Databases: An Attribute-Oriented Rough Set Approach
1995
Cited by 23 (0 self)
Knowledge Discovery in Databases (KDD) is an active research area with the promise of a high payoff in many business and scientific applications. The grand challenge of knowledge discovery in databases is to automatically process large quantities of raw data, identify the most significant and meaningful patterns, and present this knowledge in an appropriate form for achieving the user's goal. Knowledge discovery systems face challenging problems from real-world databases, which tend to be very large, redundant, noisy and dynamic. Each of these problems has been addressed to some extent within machine learning, but few, if any, systems address them all. Collectively handling these problems while producing useful knowledge efficiently and effectively is the main focus of this thesis. In this thesis, we develop an attribute-oriented rough set approach for knowledge discovery in databases. The method adopts the artificial intelligence "learning from examples" paradigm combined with rough...
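The thesis builds on rough set theory, whose basic construction approximates a target set of objects using blocks of objects that are indiscernible on chosen attributes. A minimal sketch of the standard lower and upper approximations; this is not the thesis's attribute-oriented algorithm, and the table, attribute names and target set are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def indiscernibility_classes(table: dict[int, dict[str, str]], attrs: list[str]) -> list[set[int]]:
    """Group objects that have identical values on every attribute in `attrs`."""
    groups: dict[tuple[str, ...], set[int]] = defaultdict(set)
    for obj_id, row in table.items():
        groups[tuple(row[a] for a in attrs)].add(obj_id)
    return list(groups.values())


def approximations(table, attrs, target: set[int]) -> tuple[set[int], set[int]]:
    """Return the (lower, upper) rough-set approximations of `target`."""
    lower: set[int] = set()
    upper: set[int] = set()
    for block in indiscernibility_classes(table, attrs):
        if block <= target:  # every object in the block belongs to the target
            lower |= block
        if block & target:   # at least one object in the block belongs to the target
            upper |= block
    return lower, upper


table = {
    1: {"size": "big", "color": "red"},
    2: {"size": "big", "color": "red"},
    3: {"size": "small", "color": "red"},
    4: {"size": "small", "color": "blue"},
}
lower, upper = approximations(table, ["size", "color"], {1, 3})
print(sorted(lower), sorted(upper))  # [3] [1, 2, 3]
```

Objects in the upper but not the lower approximation (here 1 and 2) form the boundary region: the chosen attributes cannot decide whether they belong to the target.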
Knowledge Acquisition via Knowledge Integration
In Current Trends in AI, B. Wielenga et al. (eds.), IOS, 1990
Cited by 21 (4 self)
In this paper we are concerned with the problem of acquiring knowledge by integration. Our aim is to construct an integrated knowledge base from several separate sources. The need to merge knowledge bases can arise, for example, when knowledge bases are acquired independently from interactions with several domain experts. As the opinions of different domain experts may differ, the knowledge bases constructed in this way will normally differ too. A similar problem can also arise whenever separate knowledge bases are generated by learning algorithms. The objective of integration is to construct one system that exploits all the available knowledge and performs well. The aim of this paper is to discuss the methodology of knowledge integration, describe the implemented system (INTEG.3), and present some concrete results which demonstrate the advantages of this method.
1. Introduction
The areas of knowledge acquisition (KA) and machine learning (ML) have until recently exist...
On learning multiple descriptions of a concept
Proceedings of Tools with Artificial Intelligence, pp. 476–483, 1994
Cited by 9 (1 self)
In sparse data environments, greater classification accuracy can be achieved by learning several concept descriptions of the data and combining their classifications. Stochastic search is a general tool which can be used to generate many good concept descriptions (rule sets) for each class in the data. Bayesian probability theory offers an optimal strategy for combining classifications of the individual concept descriptions, and here we use an approximation of that theory. This strategy is most useful when additional data is difficult to obtain and every increase in classification accuracy is important. The primary result of this paper is that multiple concept descriptions are particularly helpful in "flat" hypothesis spaces in which there are many equally good ways to grow a rule, each having similar gain. Another result is experimental evidence that learning multiple rule sets yields more accurate classifications than learning multiple rules for some domains. To demonstrate these behaviors, we learn multiple concept descriptions by adapting HYDRA, a noise-tolerant relational learning algorithm.
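The abstract combines classifications from several rule sets using an approximation of Bayesian combination. A generic sketch of that idea, assuming each rule set supplies a class-probability estimate for the query and a log-likelihood of the training data, with a uniform prior over rule sets so that posterior weights are proportional to likelihoods. This is plain Bayesian model averaging for illustration, not the specific approximation used with HYDRA; all names and numbers are hypothetical.

```python
from __future__ import annotations

import math


def posterior_weights(log_likelihoods: list[float]) -> list[float]:
    """Normalise exp(log-likelihood) into weights, assuming a uniform prior.
    Subtracting the maximum first keeps exp() from underflowing."""
    m = max(log_likelihoods)
    unnorm = [math.exp(ll - m) for ll in log_likelihoods]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def bayesian_combine(distributions: list[dict[str, float]], log_likelihoods: list[float]) -> str:
    """Weight each rule set's class distribution by its posterior and pick the top class."""
    weights = posterior_weights(log_likelihoods)
    combined: dict[str, float] = {}
    for w, dist in zip(weights, distributions):
        for cls, p in dist.items():
            combined[cls] = combined.get(cls, 0.0) + w * p
    return max(combined, key=combined.get)


dists = [{"+": 0.9, "-": 0.1}, {"+": 0.2, "-": 0.8}]
print(bayesian_combine(dists, [-10.0, -12.0]))  # prints + (first rule set fits the data better)
print(bayesian_combine(dists, [-10.0, -8.0]))   # prints - (now the second one does)
```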
A Comparison of Methods for Learning and Combining Evidence From Multiple Models
1995
Cited by 7 (1 self)
Most previous work on multiple models has been done on a few domains. We present a comparison of three ways of learning multiple models on 29 data sets from the UCI repository. The methods are bagging, k-fold partition learning and stochastic search. By using 29 data sets of various kinds (artificial data sets, artificial data sets with noise, molecular-biology data sets and real-world noisy data sets), we are able to draw robust experimental conclusions about the kinds of data sets for which each learning method works best. We also compare four evidence combination methods (Uniform Voting, Bayesian Combination, Distribution Summation and Likelihood Combination) and characterize the kinds of data sets for which each method works best.
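Two of the evidence combination methods named above can be sketched directly, alongside the bootstrap resampling that bagging relies on. A minimal sketch, assuming Uniform Voting means one vote per model for its most probable class and Distribution Summation means adding the models' class-probability vectors; Bayesian Combination and Likelihood Combination are not shown, and the function names and toy distributions are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter


def bootstrap_sample(data: list, rng: random.Random) -> list:
    """Bagging: draw len(data) examples uniformly at random, with replacement."""
    return [rng.choice(data) for _ in data]


def uniform_voting(distributions: list[dict[str, float]]) -> str:
    """Each model casts one vote for its most probable class."""
    votes = Counter(max(d, key=d.get) for d in distributions)
    return votes.most_common(1)[0][0]


def distribution_summation(distributions: list[dict[str, float]]) -> str:
    """Add up the models' class-probability vectors and take the largest total."""
    totals: Counter = Counter()
    for d in distributions:
        totals.update(d)  # Counter.update adds values for mapping input
    return max(totals, key=totals.get)


dists = [{"a": 0.51, "b": 0.49}, {"a": 0.51, "b": 0.49}, {"a": 0.0, "b": 1.0}]
print(uniform_voting(dists))          # a (two of three votes)
print(distribution_summation(dists))  # b (totals: a = 1.02, b = 1.98)
```

The two rules can disagree: under summation a single confident model outweighs two barely decided ones, whereas under voting it does not.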

