Results 1 - 10
of
25
Wrappers for feature subset selection
- ARTIFICIAL INTELLIGENCE
, 1997
"... In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a ..."
Abstract
-
Cited by 775 (3 self)
- Add to MetaCart
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and
Improving generalization with active learning
- Machine Learning
, 1994
"... Abstract. Active learning differs from "learning from examples " in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, g ..."
Abstract
-
Cited by 334 (1 self)
- Add to MetaCart
Abstract. Active learning differs from "learning from examples " in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers "useful. " We test our implementation, called an SGnetwork, on three domains and observe significant improvement in generalization.
The Extraction of Refined Rules from Knowledge-Based Neural Networks
- Machine Learning
, 1993
"... Neural networks, despite their empirically-proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge in some form must be inserted into a neural network. Second, the network must be refined. Third, knowledge mus ..."
Abstract
-
Cited by 176 (4 self)
- Add to MetaCart
Neural networks, despite their empirically-proven abilities, have been little used for the refinement of existing knowledge because this task requires a three-step process. First, knowledge in some form must be inserted into a neural network. Second, the network must be refined. Third, knowledge must be extracted from the network. We have previously described a method for the first step of this process. Standard neural learning techniques can accomplish the second step. In this paper, we propose and empirically evaluate a method for the final, and possibly most difficult, step. This method efficiently extracts symbolic rules from trained neural networks. The four major results of empirical tests of this method are that the extracted rules: (1) closely reproduce (and can even exceed) the accuracy of the network from which they are extracted; (2) are superior to the rules produced by methods that directly refine symbolic rules; (3) are superior to those produced by previous techniques fo...
The neural basis of cognitive development: A constructivist manifesto
- Behavioral and Brain Sciences
, 1997
"... Quartz, S. & Sejnowski, T.J. (1997). The neural basis of cognitive development: A constructivist manifesto. ..."
Abstract
-
Cited by 106 (0 self)
- Add to MetaCart
Quartz, S. & Sejnowski, T.J. (1997). The neural basis of cognitive development: A constructivist manifesto.
Symbolic and neural learning algorithms: an experimental comparison
- Machine Learning
, 1991
"... Abstract Despite the fact that many symbolic and neural network (connectionist) learning algorithms address the same problem of learning from classified examples, very little is known regarding their comparative strengths and weaknesses. Experiments comparing the ID3 symbolic learning algorithm with ..."
Abstract
-
Cited by 95 (7 self)
- Add to MetaCart
Abstract Despite the fact that many symbolic and neural network (connectionist) learning algorithms address the same problem of learning from classified examples, very little is known regarding their comparative strengths and weaknesses. Experiments comparing the ID3 symbolic learning algorithm with the perception and backpropagation neural learning algorithms have been performed using five large, real-world data sets. Overall, backpropagation performs slightly better than the other two algorithms in terms of classification accuracy on new examples, but takes much longer to train. Experimental results suggest that backpropagation can work significantly better on data sets containing numerical data. Also analyzed empirically are the effects of (1) the amount of training data, (2) imperfect training examples, and (3) the encoding of the desired outputs. Backpropagation occasionally outperforms the other two systems when given relatively small amounts of training data. It is slightly more accurate than ID3 when examples are noisy or incompletely specified. Finally, backpropagation more effectively utilizes a "distributed " output encoding.
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
"... In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are stu ..."
Abstract
-
Cited by 94 (6 self)
- Add to MetaCart
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Induction of Oblique Decision Trees
- Journal of Artificial Intelligence Research
, 1993
"... This paper introduces a randomized technique for partitioning examples using oblique hyperplanes. Standard decision tree techniques, such as ID3 and its descendants, partition a set of points with axis-parallel hyperplanes. Our method, by contrast, attempts to find hyperplanes at any orientation. Th ..."
Abstract
-
Cited by 80 (7 self)
- Add to MetaCart
This paper introduces a randomized technique for partitioning examples using oblique hyperplanes. Standard decision tree techniques, such as ID3 and its descendants, partition a set of points with axis-parallel hyperplanes. Our method, by contrast, attempts to find hyperplanes at any orientation. The purpose of this more general technique is to find smaller but equally accurate decision trees than those created by other methods. We have tested our algorithm on both real and simulated data, and found that in some cases it produces surprisingly small trees without losing predictive accuracy. Small trees allow us, in turn, to obtain simple qualitative descriptions of each problem domain. 1 Introduction Decision trees have been used successfully for many different decision making and classification tasks. A number of standard techniques have been developed in the machine learning community, most notably Quinlan's ID3 (1986) and Breiman et al.'s CART (1984). Since the introduction of thes...
The tradeoffs of large scale learning
- IN: ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 20
, 2008
"... This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approx ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation–estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in non-trivial ways.
Refining Symbolic Knowledge Using Neural Networks
- In Proceedings of the International Workshop on Multistrategy Learning
, 1991
"... This paper uses a special notation for specifying locations in a DNA sequence. The idea is to number locations with respect to a fixed, biologically-meaningful, reference point. Negative numbers indicate sites preceding the reference point (by biological convention, this appears on the left) while p ..."
Abstract
-
Cited by 31 (0 self)
- Add to MetaCart
This paper uses a special notation for specifying locations in a DNA sequence. The idea is to number locations with respect to a fixed, biologically-meaningful, reference point. Negative numbers indicate sites preceding the reference point (by biological convention, this appears on the left) while positive numbers indicate sites following the reference point. (Zero is not used.) Figure 9 illustrates this numbering scheme. Rules use this referencing scheme by stating a position with respect to the reference location, denoted by `@', and the giving a subsequence in the positive direction. For example @-4`GGT' refers to the three nucleotide long sequence in Figure 9 that begins at position-4 and ends at position-2. In addition to this notation for specifying locations a DNA sequence, Table 3 specifies a standard coding scheme for referring to any possible combination of nucleotides using a single letter (IUB Nomenclature Committee, 1985). This scheme is compatible with the codes used by the EMBL, GenBank, and PIR data libraries, three major collections of data for molecular biology. 4.2 Promoter recognition

