Results 1 - 10
of
100
Supervised and Unsupervised Discretization of Continuous Features
- MACHINE LEARNING: PROCEEDINGS OF THE TWELFTH INTERNATIONAL CONFERENCE
, 1995
"... Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised dis ..."
Abstract
-
Cited by 335 (9 self)
- Add to MetaCart
Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the methods, and conduct an empirical evaluation of several methods. We compare binning, an unsupervised discretization method, to entropy-based and purity-based methods, which are supervised algorithms. We found that the performance of the Naive-Bayes algorithm significantly improved when features were discretized using an entropy-based method. In fact, over the 16 tested datasets, the discretized version of Naive-Bayes slightly outperformed C4.5 on average. We also show that in some cases, the performance of the C4.5 induction algorithm significantly improved if features were discretized in advance; in our experiments, the performance never significantly degraded, an interesting phenomenon considering the fact that C4.5 is capable of locally discretizing features.
Distributional Clustering of Words for Text Classification
, 1998
"... This paper describes the application of Distributional Clustering [20] to document classification. This approach clusters words into groups based on the distribution of class labels associated with each word. Thus, unlike some other unsupervised dimensionalityreduction techniques, such as Latent Sem ..."
Abstract
-
Cited by 197 (1 self)
- Add to MetaCart
This paper describes the application of Distributional Clustering [20] to document classification. This approach clusters words into groups based on the distribution of class labels associated with each word. Thus, unlike some other unsupervised dimensionalityreduction techniques, such as Latent Semantic Indexing, we are able to compress the feature space much more aggressively, while still maintaining high document classification accuracy. Experimental results obtained on three real-world data sets show that we can reduce the feature dimensionality by three orders of magnitude and lose only 2% accuracy---significantly better than Latent Semantic Indexing [6], class-based clustering [1], feature selection by mutual information [23], or Markov-blanket-based feature selection [13]. We also show that less aggressive clustering sometimes results in improved classification accuracy over classification without clustering. 1 Introduction The popularity of the Internet has caused an exponent...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
- Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract
-
Cited by 121 (1 self)
- Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
RainForest - a Framework for Fast Decision Tree Construction of Large Datasets
- In VLDB
, 1998
"... Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework fo ..."
Abstract
-
Cited by 85 (8 self)
- Add to MetaCart
Classification of large datasets is an important data mining problem. Many classification algorithms have been proposed in the literature, but studies have shown that so far no algorithm uniformly outperforms all other algorithms in terms of quality. In this paper, we present a unifying framework for decision tree classifiers that separates the scalability aspects of algorithms for constructing a decision tree from the central features that determine the quality of the tree. This generic algorithm is easy to instantiate with specific algorithms from the literature (including C4.5, CART,
Use of Contextual Information for Feature Ranking and Discretization
, 1997
"... Deriving classification rules or decision trees from examples is an important problem. When there are too many features, discarding weak features before the derivation process is highly desirable. When there are numeric features, they need to be discretized for the rule generation. We present a ne ..."
Abstract
-
Cited by 42 (7 self)
- Add to MetaCart
Deriving classification rules or decision trees from examples is an important problem. When there are too many features, discarding weak features before the derivation process is highly desirable. When there are numeric features, they need to be discretized for the rule generation. We present a new approach to these problems. Traditional techniques make use of feature merits based on either the information theoretic or statistical correlation between each feature and the class. We instead assign merits to features by finding each feature's "obligation" to the class discrimination in the context of other features. The merits are then used to rank the features, select a feature subset, and to discretize the numeric variables. Experience with benchmark example sets demonstrates that the new approachisapowerful alternative to the traditional methods. This paper concludes by posing some new technical issues that arise from this approach.
Symbolic Representation of Neural Networks
- IEEE Computer
, 1996
"... Although backpropagation neural networks generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions cannot be explained as those of decision trees. In many applications, more often than not, explicit knowled ..."
Abstract
-
Cited by 39 (9 self)
- Add to MetaCart
Although backpropagation neural networks generally predict better than decision trees do for pattern classification problems, they are often regarded as black boxes, i.e., their predictions cannot be explained as those of decision trees. In many applications, more often than not, explicit knowledge is needed by human experts. This work derives symbolic representations from a neural network to make epxlicit each prediction of the network. An algorithm is proposed and implemented to extract symbolic rules from neural networks.
Compression-Based Discretization of Continuous Attributes
- Proceedings of the 12th International Conference on Machine Learning
, 1995
"... Discretization of continuous attributes into ordered discrete attributes can be beneficial even for propositional induction algorithms that are capable of handling continuous attributes directly. Benefits include possibly large improvements in induction time, smaller sizes of induced trees or rule s ..."
Abstract
-
Cited by 38 (0 self)
- Add to MetaCart
Discretization of continuous attributes into ordered discrete attributes can be beneficial even for propositional induction algorithms that are capable of handling continuous attributes directly. Benefits include possibly large improvements in induction time, smaller sizes of induced trees or rule sets, and even improved predictive accuracy. We define a global evaluation measure for discretizations based on the so-called Minimum Description Length (MDL) principle from information theory. Furthermore we describe the efficient algorithmic usage of this measure in the MDL-Disc algorithm. The new method solves some problems of alternative local measures used for discretization. Empirical results in a few natural domains and extensive experiments in an artificial domain show that MDL-Disc scales up well to large learning problems involving noise. 1 Financial support for the Austrian Research Institute for Artificial Intelligence is provided by the Austrian Federal Ministry of Science and...
Selecting Examples for Partial Memory Learning
- Machine Learning
, 2000
"... . This paper describes a method for selecting training examples for a partial memory learning system. The method selects extreme examples that lie at the boundaries of concept descriptions and uses these examples with new training examples to induce new concept descriptions. Forgetting mechanisms al ..."
Abstract
-
Cited by 33 (4 self)
- Add to MetaCart
. This paper describes a method for selecting training examples for a partial memory learning system. The method selects extreme examples that lie at the boundaries of concept descriptions and uses these examples with new training examples to induce new concept descriptions. Forgetting mechanisms also may be active to remove examples from partial memory that are irrelevant or outdated for the learning task. Using an implementation of the method, we conducted a lesion study and a direct comparison to examine the effects of partial memory learning on predictive accuracy and on the number of training examples maintained during learning. These experiments involved the STAGGER Concepts, a synthetic problem, and two real-world problems: a blasting cap detection problem and a computer intrusion detection problem. Experimental results suggest that the partial memory learner notably reduced memory requirements at the slight expense of predictive accuracy, and tracked concept drift as well as ot...
Symbolic Interpretation of Artificial Neural Networks
, 1996
"... Hybrid Intelligent Systems that combine knowledge based and artificial neural network systems typically have four phases involving domain knowledge representation, mapping of this knowledge into an initial connectionist architecture, network training and rule extraction respectively. The final phase ..."
Abstract
-
Cited by 31 (1 self)
- Add to MetaCart
Hybrid Intelligent Systems that combine knowledge based and artificial neural network systems typically have four phases involving domain knowledge representation, mapping of this knowledge into an initial connectionist architecture, network training and rule extraction respectively. The final phase is important because it can provide a trained connectionist architecture with explanation power and validate its output decisions. Moreover, it can be used to refine and maintain the initial knowledge acquired from domain experts. In this paper, we present three rule extraction techniques. The first technique extracts a set of binary rules from any type of neural network. The other two techniques are specific to feedforward networks with a single hidden layer of sigmoidal units. Technique 2 extracts partial rules that represent the most important embedded knowledge with an adjustable level of detail, while the third technique provides a more comprehensive and universal approach. A rule eval...
Extracting Rules from Neural Networks by Pruning and Hidden-Unit Splitting
, 1994
"... An algorithm for extracting rules from a standard three-layer feedforward neural network is proposed. The trained network is first pruned not only to remove redundant connections in the network, but more importantly, to detect the relevant inputs. The algorithm generates rules from the pruned net ..."
Abstract
-
Cited by 28 (7 self)
- Add to MetaCart
An algorithm for extracting rules from a standard three-layer feedforward neural network is proposed. The trained network is first pruned not only to remove redundant connections in the network, but more importantly, to detect the relevant inputs. The algorithm generates rules from the pruned network by considering only a small number of activation values at the hidden units. If the number of inputs connected to a hidden unit is sufficiently small, then rules that describe how each of its activation values is obtained can be readily generated. Otherwise, the hidden unit will be split and treated as output units, with each output unit corresponding to an activation value. A hidden layer is inserted and a new subnetwork is formed, trained, and pruned. This process is repeated until every hidden unit in the network has a relatively small number of input units connected to it. Examples on how the proposed algorithm works are shown using real-world data arising from molecular bio...

