Results 1-10 of 41
Knowledge acquisition via incremental conceptual clustering
Machine Learning, 1987
Cited by 649 (6 self)
Conceptual clustering is an important way of summarizing and explaining data. However, the recent formulation of this paradigm has allowed little exploration of conceptual clustering as a means of improving performance. Furthermore, previous work in conceptual clustering has not explicitly dealt with constraints imposed by real-world environments. This article presents COBWEB, a conceptual clustering system that organizes data so as to maximize inference ability. Additionally, COBWEB is incremental and computationally economical, and thus can be flexibly applied in a variety of domains.
Unifying instance-based and rule-based induction
Machine Learning 24, 1996
Cited by 87 (6 self)
Several well-developed approaches to inductive learning now exist, but each has specific limitations that are hard to overcome. Multistrategy learning attempts to tackle this problem by combining multiple methods in one algorithm. This article describes a unification of two widely used empirical approaches: rule induction and instance-based learning. In the new algorithm, instances are treated as maximally specific rules, and classification is performed using a best-match strategy. Rules are learned by gradually generalizing instances until no improvement in apparent accuracy is obtained. Theoretical analysis shows this approach to be efficient. It is implemented in the RISE 3.1 system. In an extensive empirical study, RISE consistently achieves higher accuracies than state-of-the-art representatives of both its parent approaches (PEBLS and CN2), as well as a decision tree learner (C4.5). Lesion studies show that each of RISE’s components is essential to this performance. Most significantly, in 14 of the 30 domains studied, RISE is more accurate than the best of PEBLS and CN2, showing that a significant synergy can be obtained by combining multiple empirical methods.
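The idea of treating instances as maximally specific rules and classifying by best match can be sketched roughly as follows. This is a simplified illustration only: the rule representation, the mismatch-count distance, and all names here are assumptions, not the actual RISE 3.1 implementation, and the accuracy-driven loop that decides when to keep a generalization is omitted.

```python
def rule_distance(rule, example):
    """Count the rule's remaining conditions that the example violates.

    A condition of None means the condition has been dropped (matches anything).
    This mismatch count is an assumed, simplified distance.
    """
    return sum(
        1
        for cond, val in zip(rule["conds"], example)
        if cond is not None and cond != val
    )


def generalize(rule, example):
    """Minimally generalize a rule to cover an example by dropping violated conditions."""
    conds = tuple(c if c == v else None for c, v in zip(rule["conds"], example))
    return {"conds": conds, "label": rule["label"]}


def classify(rules, example):
    """Best-match classification: return the label of the nearest rule."""
    return min(rules, key=lambda r: rule_distance(r, example))["label"]


# Hypothetical toy data: each training instance starts out as its own rule.
instances = [
    (("sunny", "hot", "high"), "no"),
    (("rainy", "cool", "normal"), "yes"),
]
rules = [{"conds": attrs, "label": label} for attrs, label in instances]

prediction = classify(rules, ("sunny", "hot", "normal"))
wider_rule = generalize(rules[0], ("sunny", "cool", "high"))
```

In this sketch an ungeneralized rule behaves like a stored instance, while a rule with dropped conditions covers a wider region, which is the sense in which one representation spans both parent approaches.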
A Theory of Learning Classification Rules
1992
Cited by 80 (6 self)
The main contributions of this thesis are a Bayesian theory of learning classification rules, the unification and comparison of this theory with some previous theories of learning, and two extensive applications of the theory to the problems of learning class probability trees and bounding error when learning logical rules. The thesis is motivated by considering some current research issues in machine learning such as bias, overfitting and search, and considering the requirements placed on a learning system when it is used for knowledge acquisition. Basic Bayesian decision theory relevant to the problem of learning classification rules is reviewed, then a Bayesian framework for such learning is presented. The framework has three components: the hypothesis space, the learning protocol, and criteria for successful learning. Several learning protocols are analysed in detail: queries, logical, noisy, uncertain and positive-only examples. The analysis is done by interpreting a protocol as a...
Inductive Policy: The Pragmatics of Bias Selection
Machine Learning, 1995
Cited by 42 (10 self)
This paper extends the currently accepted model of inductive bias by identifying six categories of bias and separates inductive bias from the policy for its selection (the inductive policy). We analyze existing "bias selection" systems, examining the similarities and differences in their inductive policies, and identify three techniques useful for building inductive policies. We then present a framework for representing and automatically selecting a wide variety of biases and describe experiments with an instantiation of the framework addressing various pragmatic tradeoffs of time, space, accuracy, and the cost of errors. The experiments show that a common framework can be used to implement policies for a variety of different types of bias selection, such as parameter selection, term selection, and example selection, using similar techniques. The experiments also show that different tradeoffs can be made by the implementation of different policies; for example, from the same data different rule sets can be learned based on different tradeoffs of accuracy versus the cost of erroneous predictions.
Towards a Better Understanding of Memory-Based Reasoning Systems
In Proceedings of the Eleventh International Machine Learning Conference, 1994
Cited by 34 (4 self)
We quantify both experimentally and analytically the performance of memory-based reasoning (MBR) algorithms. To start gaining insight into the capabilities of MBR algorithms, we compare an MBR algorithm using a value difference metric to a popular Bayesian classifier. These two approaches are similar in that they both make certain independence assumptions about the data. However, whereas MBR uses specific cases to perform classification, Bayesian methods summarize the data probabilistically. We demonstrate that a particular MBR system called Pebls works comparatively well on a wide range of domains using both real and artificial data. With respect to the artificial data, we consider distributions where the concept classes are separated by functional discriminants, as well as time-series data generated by Markov models of varying complexity. Finally, we show formally that Pebls can learn (in the limit) natural concept classes that the Bayesian classifier cannot learn, and that it will at...
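As a rough illustration of the value difference metric mentioned in this abstract, the sketch below estimates P(class | value) for a symbolic feature from counts and measures the distance between two values as the summed absolute difference of those conditional probabilities. The function names, the toy data, and the exponent k = 1 are assumptions made for illustration; they are not taken from the paper or from Pebls itself.

```python
from collections import Counter, defaultdict


def class_conditionals(values, labels):
    """Estimate P(class | value) for one symbolic feature from co-occurrence counts."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    classes = sorted(set(labels))
    return {
        v: {c: cnt[c] / sum(cnt.values()) for c in classes}
        for v, cnt in counts.items()
    }


def value_difference(probs, v1, v2, k=1):
    """Distance between two feature values: sum over classes of |P(c|v1) - P(c|v2)|**k."""
    return sum(abs(probs[v1][c] - probs[v2][c]) ** k for c in probs[v1])


# Hypothetical toy data: one symbolic feature and a binary class.
colors = ["red", "red", "blue", "blue", "green", "green"]
labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

probs = class_conditionals(colors, labels)
d_red_blue = value_difference(probs, "red", "blue")
d_red_green = value_difference(probs, "red", "green")
```

Values that predict the classes in the same proportions end up close together, so the metric depends on class statistics rather than on the symbols themselves.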
Myths and Legends of the Baldwin Effect
1996
Cited by 21 (0 self)
This position paper argues that the Baldwin effect is widely misunderstood by the evolutionary computation community. The misunderstandings appear to fall into two general categories. Firstly, it is commonly believed that the Baldwin effect is concerned with the synergy that results when there is an evolving population of learning individuals. This is only half
How to Shift Bias: Lessons from the Baldwin Effect
1996
Cited by 20 (3 self)
An inductive learning algorithm takes a set of data as input and generates a hypothesis as output. A set of data is typically consistent with an infinite number of hypotheses; therefore, there must be factors other than the data that determine the output of the learning algorithm. In machine learning, these other factors are called the bias of the learner. Classical learning algorithms have a fixed bias, implicit in their design. Recently developed learning algorithms dynamically adjust their bias as they search for a hypothesis. Algorithms that shift bias in this manner are not as well understood as classical algorithms. In this paper, we show that the Baldwin effect has implications for the design and analysis of bias shifting algorithms. The Baldwin effect was proposed in 1896, to explain how phenomena that might appear to require Lamarckian evolution (inheritance of acquired characteristics) can arise from purely Darwinian evolution. Hinton and Nowlan presented a computational model of the Baldwin effect in 1987. We explore a variation on their model, which we constructed explicitly to illustrate the lessons that the Baldwin effect has for research in bias shifting algorithms. The main lesson is that it appears that a good strategy for shift of bias in a learning algorithm is to begin with a weak bias and gradually shift to a strong bias.
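For orientation, the fitness evaluation in Hinton and Nowlan's 1987 model, as it is commonly described, can be sketched as below. This illustrates the original model, not the variation explored in the paper. The all-ones target, the constant and function names, and the exact counting of remaining trials are assumptions for illustration.

```python
import random

GENOME_LENGTH = 20   # loci per individual
MAX_TRIALS = 1000    # learning trials per lifetime


def random_genome(rng):
    """Draw alleles: '1' correct, '0' incorrect, '?' learnable (25% / 25% / 50%)."""
    return rng.choices("10?", weights=[1, 1, 2], k=GENOME_LENGTH)


def fitness(genome, rng):
    """Return 1 + 19 * n / MAX_TRIALS, with n the trials left once the target is found.

    The target is assumed to be the all-ones configuration. A genome with any
    fixed '0' can never reach it and gets the baseline fitness of 1.
    """
    if "0" in genome:
        return 1.0
    plastic = genome.count("?")
    for trial in range(MAX_TRIALS):
        # Each learning trial guesses every plastic locus independently at random.
        if all(rng.random() < 0.5 for _ in range(plastic)):
            return 1.0 + 19.0 * (MAX_TRIALS - trial) / MAX_TRIALS
    return 1.0


rng = random.Random(0)
population = [random_genome(rng) for _ in range(10)]
scores = [fitness(g, rng) for g in population]
```

Genomes with fewer learnable loci tend to find the target sooner and score higher, which is how learning reshapes the landscape that purely Darwinian selection then climbs. A full simulation would add fitness-proportional selection and crossover, which this sketch leaves out.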
Global Data Analysis and the Fragmentation Problem in Decision Tree Induction
In 9th European Conference on Machine Learning, 1997
Cited by 18 (4 self)
We investigate an inherent limitation of top-down decision tree induction in which the continuous partitioning of the instance space progressively lessens the statistical support of every partial (i.e. disjunctive) hypothesis, known as the fragmentation problem. We show, both theoretically and empirically, how the fragmentation problem adversely affects predictive accuracy as variation r (a measure of concept difficulty) increases. Applying feature-construction techniques at every tree node, which we implement on a decision tree inducer DALI, is proved to only partially solve the fragmentation problem. Our study illustrates how a more robust solution must also assess the value of each partial hypothesis by recurring to all available training data, an approach we name global data analysis, which decision tree induction alone is unable to accomplish. The value of global data analysis is evaluated by comparing modified versions of C4.5rules with C4.5trees and DALI, on both artificial and real-world domains. Empirical results suggest the importance of combining both feature construction and global data analysis to solve the fragmentation problem.
Principled Constructive Induction
1991
Cited by 16 (0 self)
A framework for the construction of new features for hard classification tasks is discussed. The approach brings together ideas from the fields of machine learning, computational geometry, and pattern recognition. Two heuristics for evaluation of newly constructed features are proposed, and their statistical significance verified. Finally, it is shown how the proposed framework can be used to combine techniques for selection of representative examples with techniques for construction of new features, in order to solve difficult problems in learning from examples. 1. Introduction. The problem of new terms, also known as the constructive induction problem, has long been considered a source of difficulty in machine learning (Dietterich, 1982). Simple classifiers using only the primitive features of description have limited learning capabilities. For example: (i) Single-layered neural networks can realize only those class dichotomies, where the classes are linearly separable in the featur...
Seer: Maximum Likelihood Regression for Learning-Speed Curves
University of Illinois at
1995
Cited by 10 (0 self)
The research presented here focuses on modeling machine-learning performance. The thesis introduces Seer, a system that generates empirical observations of classification-learning performance and then uses those observations to create statistical models. The models can be used to predict the number of training examples needed to achieve a desired level and the maximum accuracy possible given an unlimited number of training examples. Seer advances the state of the art with 1) models that embody the best constraints for classification learning and most useful parameters, 2) algorithms that efficiently find maximum-likelihood models, and 3) a demonstration on real-world data from three domains of a practicable application of such modeling. The first part of the thesis gives an overview of the requirements for a good maximum-likelihood model of classification-learning performance. Next, reasonable design choices for such models are explored. Selection among such models is a task of nonlinear programming, but by exploiting appropriate problem constraints, the task is reduced to a nonlinear regression task that can be solved with an efficient iterative algorithm. The latter part of the thesis describes almost 100 experiments in the domains of soybean disease, heart disease, and audiological problems. The tests show that Seer is excellent at characterizing learning performance and that it seems to be as good as possible at predicting learning