Unifying Instance-Based and Rule-Based Induction
- Machine Learning, 1996
Cited by 77 (6 self)
Several well-developed approaches to inductive learning now exist, but each has specific limitations that are hard to overcome. Multi-strategy learning attempts to tackle this problem by combining multiple methods in one algorithm. This article describes a unification of two widely-used empirical approaches: rule induction and instance-based learning. In the new algorithm, instances are treated as maximally specific rules, and classification is performed using a best-match strategy. Rules are learned by gradually generalizing instances until no improvement in apparent accuracy is obtained. Theoretical analysis shows this approach to be efficient. It is implemented in the RISE 3.1 system. In an extensive empirical study, RISE consistently achieves higher accuracies than state-of-the-art representatives of both its parent approaches (PEBLS and CN2), as well as a decision tree learner (C4.5). Lesion studies show that each of RISE's components is essential to this performance. Most signi...
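The core idea in this abstract (an instance is a maximally specific rule, rules are generalized step by step, and classification uses the best-matching rule) can be illustrated with a minimal sketch. It is not the RISE 3.1 implementation: it assumes purely numeric attributes, represents a rule as one interval per attribute, and omits the accuracy-driven learning loop. The names `instance_to_rule`, `distance`, `generalize`, and `classify` are hypothetical.

```python
# Simplified illustration only; names and representation are assumptions,
# not taken from the RISE system itself.

def instance_to_rule(x, label):
    """Treat an instance as a maximally specific rule: each interval is a point."""
    return {"bounds": [(v, v) for v in x], "label": label}

def distance(rule, x):
    """Squared distance from x to the rule; zero when x lies inside every interval."""
    total = 0.0
    for (lo, hi), v in zip(rule["bounds"], x):
        if v < lo:
            total += (lo - v) ** 2
        elif v > hi:
            total += (v - hi) ** 2
    return total

def generalize(rule, x):
    """Minimally widen each interval so that the rule also covers x."""
    bounds = [(min(lo, v), max(hi, v)) for (lo, hi), v in zip(rule["bounds"], x)]
    return {"bounds": bounds, "label": rule["label"]}

def classify(rules, x):
    """Best-match strategy: predict the label of the nearest rule."""
    return min(rules, key=lambda r: distance(r, x))["label"]

pos = generalize(instance_to_rule((1, 1), "pos"), (3, 2))
neg = instance_to_rule((10, 10), "neg")
print(classify([pos, neg], (2, 1.5)))
```

In a full learner, a generalization step like this would be kept only if it does not lower apparent accuracy on the training set, as the abstract describes.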
Meta-learning by landmarking various learning algorithms
- In Proceedings of the 17th International Conference on Machine Learning (ICML 2000), 2000
Cited by 53 (6 self)
Landmarking is a novel approach to describing tasks in meta-learning. Previous approaches to meta-learning mostly considered only statistics-inspired measures of the data as a source for the definition of meta-attributes. Contrary to such approaches, landmarking tries to determine the location of a specific learning problem in the space of all learning problems by directly measuring the performance of some simple and efficient learning algorithms themselves. In the experiments reported we show how such a use of landmark values can help to distinguish between areas of the learning space favouring different learners. Experiments, both with artificial and real-world databases, show that landmarking selects, with a moderate but reasonable level of success, the best performing of a set of learning algorithms.
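As a rough illustration of what a landmark value is, the sketch below describes a dataset by the accuracy of two cheap learners instead of by statistical summaries. The choice of landmarkers (a majority-class predictor and a leave-one-out 1-nearest-neighbour classifier) and the function names are assumptions made for this example, not the set used in the paper.

```python
from collections import Counter

def majority_accuracy(y):
    """Accuracy of always predicting the most frequent class."""
    return Counter(y).most_common(1)[0][1] / len(y)

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, x in enumerate(X):
        # Squared distance and label of every other instance.
        others = [
            (sum((a - b) ** 2 for a, b in zip(x, X[j])), y[j])
            for j in range(len(X)) if j != i
        ]
        nearest_label = min(others, key=lambda t: t[0])[1]
        correct += nearest_label == y[i]
    return correct / len(X)

def landmarks(X, y):
    """Meta-attributes of a task: the performance of simple learners on it."""
    return {"majority": majority_accuracy(y), "one_nn": loo_1nn_accuracy(X, y)}

X = [(0,), (1,), (10,), (11,)]
y = ["a", "a", "b", "b"]
print(landmarks(X, y))
```

A meta-learner would then take vectors like this one, computed for many datasets, as input for predicting which full learning algorithm is likely to perform best.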
An Extensible Meta-Learning Approach for Scalable and Accurate Inductive Learning
- 1996
Cited by 42 (8 self)
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a meta-learning approach to integrating the results of mul...
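The partition, learn, and combine pattern mentioned in this abstract can be sketched as follows. This is only a minimal illustration: the thesis proposes meta-learning to integrate the classifiers, whereas the sketch substitutes plain majority voting as the combiner, uses a 1-nearest-neighbour learner as a stand-in base algorithm, and all function names are hypothetical.

```python
from collections import Counter

def partition(data, k):
    """Split the data into k disjoint subsets (round-robin assignment)."""
    return [data[i::k] for i in range(k)]

def train_1nn(subset):
    """Stand-in base learner: a 1-nearest-neighbour classifier over one subset."""
    def predict(x):
        nearest = min(
            subset,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
        )
        return nearest[1]
    return predict

def combine_by_vote(classifiers, x):
    """Simple combiner (an assumption here): majority vote over base predictions."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
classifiers = [train_1nn(s) for s in partition(data, 3)]
print(combine_by_vote(classifiers, (0.2, 0.2)))
```

Because each base classifier sees only its own subset, no step requires the full data set in memory at once, which is the motivation given in the abstract.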
For Every Generalization Action, Is There Really An Equal And Opposite Reaction? Analysis of the Conservation Law for Generalization Performance
- Proceedings of the Twelfth International Conference on Machine Learning, 1995
Cited by 38 (0 self)
The "Conservation Law for Generalization Performance" [Schaffer, 1994] states that for any learning algorithm and bias, "generalization is a zero-sum enterprise." In this paper we study the law and show that while the law is true, the manner in which the Conservation Law adds up generalization performance over all target concepts, without regard to the probability with which each concept occurs, is relevant only in a uniformly random universe. We then introduce a more meaningful measure of generalization, expected generalization performance. Unlike the Conservation Law's measure of generalization performance (which is, in essence, defined to be zero), expected generalization performance is conserved only when certain symmetric properties hold in our universe. There is no reason to believe, a priori, that such symmetries exist; learning algorithms may well exhibit non-zero (expected) generalization performance.
Local Cascade Generalization
- 1998
Cited by 34 (1 self)
In a previous work we have presented Cascade Generalization, a new general method for merging classifiers. The basic idea of Cascade Generalization is to sequentially run the set of classifiers, at each step performing an extension of the original data by the insertion of new attributes. The new attributes are derived from the probability class distribution given by a base classifier. This constructive step extends the representational language for the high level classifiers, relaxing their bias. In this paper we extend this work by applying Cascade locally. At each iteration of a divide and conquer algorithm, a reconstruction of the instance space occurs by the addition of new attributes. Each new attribute represents the probability that an example belongs to a class given by a base classifier. We have implemented three Local Generalization Algorithms. The first merges a linear discriminant with a decision tree, the second merges a naive Bayes with a decision tree, and the third mer...
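The constructive step described in this abstract, extending each example with the class probabilities produced by a base classifier, can be sketched as below. The base classifier is a hypothetical stand-in (a softmax over negative distances to class centroids), not one of the learners used in the paper, and the function names are assumptions.

```python
import math

def centroid_probs(train_X, train_y, x):
    """Hypothetical base classifier: class probabilities from centroid distances."""
    classes = sorted(set(train_y))
    scores = []
    for c in classes:
        rows = [r for r, label in zip(train_X, train_y) if label == c]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        # A closer centroid gives a larger score.
        scores.append(math.exp(-math.dist(x, centroid)))
    total = sum(scores)
    return [s / total for s in scores]

def cascade_extend(train_X, train_y, X):
    """Append one new attribute per class: the base classifier's probability."""
    return [list(x) + centroid_probs(train_X, train_y, x) for x in X]

train_X = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["a", "a", "b", "b"]
extended = cascade_extend(train_X, train_y, train_X)
print(len(extended[0]))
```

A higher-level learner such as a decision tree would then be trained on the extended rows. In the local variant the abstract describes, this extension is repeated at each iteration of the divide-and-conquer algorithm rather than once for the whole data set.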
Simplifying Decision Trees: A Survey
- 1996
Cited by 32 (5 self)
Induced decision trees are an extensively-researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification and summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree i...
A New Supervised Learning Algorithm for Word Sense Disambiguation
- In Proceedings of the Fourteenth National Conference on Artificial Intelligence
Cited by 24 (12 self)
The Naive Mix is a new supervised learning algorithm that is based on a sequential method for selecting probabilistic models. The usual objective of model selection is to find a single model that adequately characterizes the data in a training sample. However, during model selection a sequence of models is generated that consists of the best-fitting model at each level of model complexity. The Naive Mix utilizes this sequence of models to define a probabilistic model which is then used as a probabilistic classifier to perform word-sense disambiguation. The models in this sequence are restricted to the class of decomposable log-linear models. This class of models offers a number of computational advantages. Experiments disambiguating twelve different words show that a Naive Mix formulated with a forward sequential search and Akaike's Information Criterion rivals established supervised learning algorithms such as decision trees (C4.5), rule induction (CN2) and nearest-neighbor classif...
Creating and Exploiting Coverage and Diversity
- In Working Notes of the AAAI-96 Workshop on Integrating Multiple Learned Models, 1996
Cited by 21 (0 self)
In this paper, we illustrate that increasing coverage through diversity is not enough to ensure increased prediction accuracy: if the integration method does not utilize the coverage, then no benefit arises from integrating multiple models. We compare four criteria for selecting base-level classifiers and demonstrate that informed selection can lead to more accurate meta-level classifiers than random selection. In addition, we illustrate empirically that straightforward integration methods fail to utilize the diversity of the base-level classifiers. Introduction: A fundamental step in forming a classifier that integrates multiple models is selecting the base-level classifiers. Ideally, the classifiers should be diverse; each should work well on different parts of the given dataset, as no benefit arises from combining the predictions of a set of classifiers that all classify the same portion of the data correctly. A related objective is to maximize the coverage of the data, which is the ...

