Results 1 - 4 of 4
Decision Trees: More Theoretical Justification For Practical Algorithms
"... We study impurity-based decision tree algorithms such as CART, C4.5, etc., so as to better understand their theoretical underpinnings. We consider ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
We study impurity-based decision tree algorithms such as CART and C4.5 in order to better understand their theoretical underpinnings. We consider ...
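As background on what "impurity-based" means here, the following is a minimal sketch (not taken from the paper) of the Gini and entropy criteria that CART- and C4.5-style algorithms use to score candidate splits; the helper names are illustrative only.

from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum of p_k * log2(p_k) over class proportions p_k."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def impurity_reduction(parent, left, right, measure=gini):
    """Weighted decrease in impurity achieved by a binary split."""
    n = len(parent)
    return (measure(parent)
            - (len(left) / n) * measure(left)
            - (len(right) / n) * measure(right))

# Example: the split that separates the two classes gives the largest reduction.
parent = ["a", "a", "a", "b", "b", "b"]
print(impurity_reduction(parent, ["a", "a", "a"], ["b", "b", "b"]))                # 0.5
print(impurity_reduction(parent, ["a", "a", "b"], ["a", "b", "b"], measure=entropy))

A split is chosen to maximize this impurity reduction; C4.5 additionally normalizes the entropy-based gain by the entropy of the split itself (gain ratio), which the sketch omits.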
An Approach to Automation Selection of Decision Tree based on Training Data Set - D. Saravana Kumar
"... In Data mining applications, very large training data sets with several million records are common. Decision trees are very much powerful and excellent technique for both classification and prediction problems. Many decision tree construction algorithms have been proposed to develop and handle large ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
(Show Context)
In data mining applications, very large training data sets with several million records are common. Decision trees are a powerful and widely used technique for both classification and prediction problems. Many decision tree construction algorithms have been proposed to handle large or small training data; some are best suited to large data sets and some to small ones, and each works best under its own criteria. Classical decision tree algorithms classify categorical and continuous attributes well, but they handle only smaller data sets efficiently and consume more time on large data sets. Supervised Learning In Quest (SLIQ) and Scalable Parallelizable Induction of Decision Trees (SPRINT) handle very large data sets, but SLIQ requires that the class labels be available in main memory beforehand, whereas SPRINT is best suited for large data sets and removes these memory restrictions. This research work deals with the automatic selection of a decision tree algorithm based on the size of the training data set. The proposed system first estimates the training data set size using a mathematical measure and checks that size against the available memory space; if sufficient memory is available, tree construction proceeds. After classifying the data, the accuracy of the resulting classifier is estimated. The main advantages of the proposed method are that the system takes less time and avoids memory problems.
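The size-based selection step described in the abstract can be pictured with a small sketch; the function names, the byte-count size measure, and the memory comparison below are assumptions made for illustration, not the paper's actual procedure.

def estimate_training_set_bytes(n_records, n_attributes, bytes_per_value=8):
    """A rough size measure: records x attributes x bytes per stored value."""
    return n_records * n_attributes * bytes_per_value

def select_decision_tree_algorithm(n_records, n_attributes, available_memory_bytes):
    """Pick an algorithm family by whether the data, or only its class labels, fit in memory."""
    data_bytes = estimate_training_set_bytes(n_records, n_attributes)
    label_bytes = estimate_training_set_bytes(n_records, 1)
    if data_bytes <= available_memory_bytes:
        return "in-memory tree construction (classical CART/C4.5-style)"
    if label_bytes <= available_memory_bytes:
        return "SLIQ (keeps class labels resident in main memory)"
    return "SPRINT (no main-memory restriction)"

# Example: 5 million records with 20 attributes against 256 MB of available memory.
print(select_decision_tree_algorithm(5_000_000, 20, 256 * 1024 ** 2))   # SLIQ branch

In practice the "mathematical measure" and the memory check would come from the system itself; the point of the sketch is only the branching on whether the whole data set, or merely its class labels, fits in the available memory.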
Techniques in Data Mining: Decision Trees Classification and Constraint-based Itemsets Mining, 2001
"... Classification and Association Rules Mining are two important data mining techniques. These two techniques are complements of each other. Decision trees classification is a supervised learning that requires a training dataset to develop a classifier, while itemsets mining is an unsupervised learnin ..."
Abstract
- Add to MetaCart
Classification and Association Rules Mining are two important data mining techniques, and they complement each other. Decision tree classification is a supervised learning technique that requires a training data set to develop a classifier, while itemsets mining is an unsupervised learning technique that requires no a priori knowledge. Both are essential to practical applications. In this thesis, we aim at improving these two techniques for large databases. Classification has been widely used to assist decision-making processes in various applications. Among the techniques for classification, decision trees have attracted the most attention recently due to their conceptual simplicity and accuracy. In the first half of this thesis, we investigate several strategies to speed up the process of building decision trees under the database-oriented constraint that main memory is limited and usually much smaller than the data set. Our methods for building decision trees are all based on pre-sorting. We pay particular attention to the problem of how to minimize I/O operations under this limited memory space. Our study shows that by emphasizing different aspects, such as the order of hashing, the allocation of memory buffers, the amount of disk space, and the tradeoff between I/O and CPU costs, we can obtain schemes with different performance characteristics, so they can meet the requirements of different applications.
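As a rough illustration of what "based on pre-sorting" means, here is a simplified in-memory sketch (an assumption for illustration, not one of the thesis's actual schemes): each numeric attribute is sorted once up front, and candidate thresholds are then evaluated by a single ordered scan with running class counts, so no re-sorting is needed when splits are evaluated.

def build_attribute_lists(records, labels):
    """Sort each numeric attribute once; later split evaluation scans these lists in order."""
    n_attrs = len(records[0])
    return [
        sorted(((row[a], labels[i]) for i, row in enumerate(records)), key=lambda t: t[0])
        for a in range(n_attrs)
    ]

def best_threshold(attr_list):
    """Scan one pre-sorted list, tracking class counts to score each cut point with Gini impurity."""
    total = len(attr_list)
    pos_total = sum(label for _, label in attr_list)     # assumes binary labels 0/1
    best_gini, best_value = float("inf"), None
    pos_left = 0
    for i, (value, label) in enumerate(attr_list[:-1], start=1):
        pos_left += label
        p_left, p_right = pos_left / i, (pos_total - pos_left) / (total - i)
        g = ((i / total) * 2 * p_left * (1 - p_left)
             + ((total - i) / total) * 2 * p_right * (1 - p_right))
        if g < best_gini:
            best_gini, best_value = g, value
    return best_gini, best_value   # split as: attribute <= best_value

records = [(2.0, 1.5), (1.0, 3.0), (3.0, 0.5), (4.0, 2.5)]
labels = [0, 0, 1, 1]
print([best_threshold(lst) for lst in build_attribute_lists(records, labels)])

The thesis's setting differs in that the attribute lists are disk-resident, so the interesting questions concern hashing order, buffer allocation, and the I/O versus CPU tradeoff rather than the scan itself.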
Prediction of Online Vehicle Insurance System using Bayes Classifier – A Proposed Approach
"... A classification technique (or classifier) is a systematic approach used in building classification models from an input data set. Some examples include decision tree classifier, rule based classifiers, neural networks, support vector machines and naïve Bayes classifiers. Each technique employs a le ..."
Abstract
- Add to MetaCart
(Show Context)
A classification technique (or classifier) is a systematic approach to building classification models from an input data set. Examples include decision tree classifiers, rule-based classifiers, neural networks, support vector machines and naïve Bayes classifiers. Each technique employs a learning algorithm to identify the model that best fits the relationship between the attribute set and the class label of the input data. The model generated by the learning algorithm should both fit the input data well and correctly predict the class labels of records it has never seen before. Therefore, a key objective of the learning algorithm is to build models that accurately predict the class labels of previously unknown records. In this work, prediction is carried out for the entire test set, and the accuracy is found to be good compared with that of other classifiers, namely the decision tree classifier.
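For concreteness, here is a minimal sketch of the kind of naïve Bayes classifier named above, over categorical attributes; the toy attributes, data, and smoothing choice are invented for illustration and are not taken from the paper.

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(class) and P(attribute value | class) with simple Laplace-style smoothing."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # key: (attribute index, class) -> Counter of values
    for row, label in zip(rows, labels):
        for a, value in enumerate(row):
            value_counts[(a, label)][value] += 1

    def predict(row):
        """Return the class maximizing prior * product of per-attribute likelihoods."""
        best_label, best_score = None, float("-inf")
        for label, c_count in class_counts.items():
            score = c_count / len(labels)              # prior P(class)
            for a, value in enumerate(row):
                counts = value_counts[(a, label)]
                score *= (counts[value] + alpha) / (c_count + alpha * len(counts))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict

# Toy usage: predict whether a customer buys a policy from two categorical attributes.
rows = [("young", "urban"), ("young", "rural"), ("old", "urban"), ("old", "rural")]
labels = ["no", "no", "yes", "yes"]
predict = train_naive_bayes(rows, labels)
print(predict(("old", "urban")))   # expected: "yes"

Prediction picks the class that maximizes the class prior times the product of per-attribute likelihoods, which is the standard naïve Bayes decision rule.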