Results 1 - 10
of
61
SLIQ: A Fast Scalable Classifier for Data Mining
, 1996
"... . Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This pap ..."
Abstract
-
Cited by 173 (8 self)
- Add to MetaCart
. Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ 1 , a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive, and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an ...
A Guide to the Literature on Learning Probabilistic Networks From Data
, 1996
"... This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the ..."
Abstract
-
Cited by 156 (0 self)
- Add to MetaCart
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples. Keywords--- Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery. I. Introduction Probabilistic networks or probabilistic gra...
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
- Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract
-
Cited by 122 (1 self)
- Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
Learning classification trees
- Statistics and Computing
, 1992
"... Algorithms for learning cIassification trees have had successes in ar-tificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statis-tics. This iutroduces Bayesian techniques for splitting, smoothing, and tree averaging. T ..."
Abstract
-
Cited by 112 (8 self)
- Add to MetaCart
Algorithms for learning cIassification trees have had successes in ar-tificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statis-tics. This iutroduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to QuinIan’s information gain, while smoothing and averaging replace pruning. Comparative ex-periments with reimplementations of a minimum encoding approach, Quinlan’s C4 (1987) and Breiman et aL’s CART (1984) show the full Bayesian algorithm produces more accurate predictions than versions
Making Use of Population Information in Evolutionary Artificial Neural Networks
, 1998
"... This paper is concerned with the simultaneous evolution of artificial neural network (ANN) architectures and weights. The current practice in evolving ANNs is to choose the best ANN in the last generation as the final result. This paper proposes a different approach to form the final result by combi ..."
Abstract
-
Cited by 65 (22 self)
- Add to MetaCart
This paper is concerned with the simultaneous evolution of artificial neural network (ANN) architectures and weights. The current practice in evolving ANNs is to choose the best ANN in the last generation as the final result. This paper proposes a different approach to form the final result by combining all the individuals in the last generation in order to make best use of all the information contained in the whole population. This approach regards a population of ANNs as an ensemble and uses a combination method to integrate them. Although there has been some work on integrating ANN modules [2], [3], little has been done in evolutionary learning to make best use of its population information. Four linear combination methods have been investigated in this paper to illustrate our ideas. Three real world data sets have been used in our experimental studies, which show that the recursive least square (RLS) algorithm always produces an integrated system that outperforms the best individua...
PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning
, 1998
"... Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to gene ..."
Abstract
-
Cited by 56 (4 self)
- Add to MetaCart
Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. These classifiers first build a decision tree and then prune subtrees from the decision tree in a subsequent pruning phase to improve accuracy and prevent "overfitting". Generating the decision tree in two distinct phases could result in a substantial amount of wasted effort since an entire subtree constructed in the first phase may later be pruned in the next phase. In this paper, we propose PUBLIC, an improved decision tree classifier that integrates the second "pruning" phase with the initial "building" phase. In PUBLIC, a node is not expanded during the building phase, if it is determined that it will be pruned during the subsequent pruning phase. In order to ma...
The practical implementation of Bayesian model selection
- Institute of Mathematical Statistics
, 2001
"... In principle, the Bayesian approach to model selection is straightforward. Prior probability distributions are used to describe the uncertainty surrounding all unknowns. After observing the data, the posterior distribution provides a coherent post data summary of the remaining uncertainty which is r ..."
Abstract
-
Cited by 48 (2 self)
- Add to MetaCart
In principle, the Bayesian approach to model selection is straightforward. Prior probability distributions are used to describe the uncertainty surrounding all unknowns. After observing the data, the posterior distribution provides a coherent post data summary of the remaining uncertainty which is relevant for model selection. However, the practical implementation of this approach often requires carefully tailored priors and novel posterior calculation methods. In this article, we illustrate some of the fundamental practical issues that arise for two different model selection problems: the variable selection problem for the linear model and the CART model selection problem.
Mathematical Programming for Data Mining: Formulations and Challenges
- INFORMS Journal on Computing
, 1998
"... This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research ch ..."
Abstract
-
Cited by 40 (0 self)
- Add to MetaCart
This paper is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems. keywords: data analysis, data mining, mathematical programming methods, challenges for massive data sets, classification, clustering, prediction, optimization. To appear: INFORMS: Journal of Compting, special issue on Data Mining, A. Basu and B. Golden (guest editors). Also appears as Mathematical Programming Technical Report 98-01, Computer Sciences Department, University of Wi...
Compression-Based Discretization of Continuous Attributes
- Proceedings of the 12th International Conference on Machine Learning
, 1995
"... Discretization of continuous attributes into ordered discrete attributes can be beneficial even for propositional induction algorithms that are capable of handling continuous attributes directly. Benefits include possibly large improvements in induction time, smaller sizes of induced trees or rule s ..."
Abstract
-
Cited by 38 (0 self)
- Add to MetaCart
Discretization of continuous attributes into ordered discrete attributes can be beneficial even for propositional induction algorithms that are capable of handling continuous attributes directly. Benefits include possibly large improvements in induction time, smaller sizes of induced trees or rule sets, and even improved predictive accuracy. We define a global evaluation measure for discretizations based on the so-called Minimum Description Length (MDL) principle from information theory. Furthermore we describe the efficient algorithmic usage of this measure in the MDL-Disc algorithm. The new method solves some problems of alternative local measures used for discretization. Empirical results in a few natural domains and extensive experiments in an artificial domain show that MDL-Disc scales up well to large learning problems involving noise. 1 Financial support for the Austrian Research Institute for Artificial Intelligence is provided by the Austrian Federal Ministry of Science and...
On Pruning and Averaging Decision Trees
- In Proceedings of the Twelfth International Conference on Machine Learning
, 1995
"... Pruning a decision tree is considered by some researchers to be the most important part of tree building in noisy domains. While, there are many approaches to pruning, an alternative approach of averaging over decision trees has not received as much attention. We perform an empirical comparison of p ..."
Abstract
-
Cited by 36 (0 self)
- Add to MetaCart
Pruning a decision tree is considered by some researchers to be the most important part of tree building in noisy domains. While, there are many approaches to pruning, an alternative approach of averaging over decision trees has not received as much attention. We perform an empirical comparison of pruning with the approach of averaging over decision trees. For this comparison we use a computationally efficient method of averaging, namely averaging over the extended fanned set of a tree. Since there are a wide range of approaches to pruning, we compare tree averaging with a traditional pruning approach, along with an optimal pruning approach.

