Results 1 - 10 of 10
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
Cited by 122 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, the directions the work has taken and the current state of the art.
Keywords: classification, tree-structured classifiers, data compaction
1. Introduction
Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects, e.g., Human Genome...
Decision Tree Induction Based on Efficient Tree Restructuring
Machine Learning, 1996
Cited by 110 (5 self)
The ability to restructure a decision tree efficiently enables a variety of approaches to decision tree induction that would otherwise be prohibitively expensive. Two such approaches are described here, one being incremental tree induction (ITI), and the other being non-incremental tree induction using a measure of tree quality instead of test quality (DMTI). These approaches and several variants offer new computational and classifier characteristics that lend themselves to particular applications.
Keywords: decision tree, incremental induction, direct metric, binary test, example incorporation, missing value, tree transposition, installed test, virtual pruning, update cost.
1. Introduction
Decision tree induction offers a highly practical method for generalizing from examples whose class membership is known. The most common approach to inducing a decision tree is to partition the labelled examples recursively until a stopping criterion is met. The partition is defined by selectin...
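The recursive partitioning described above can be sketched in a few lines. This is a generic top-down induction sketch, not ITI or DMTI. The test-selection heuristic (information gain, as in ID3), the tuple-based tree encoding, and the names `build_tree` and `entropy` are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(examples, labels, attributes):
    """Top-down induction by recursive partitioning (hypothetical ID3-style sketch).

    `examples` are dicts mapping attribute name -> categorical value.
    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    """
    # Stopping criterion 1: the partition is pure.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping criterion 2: no tests left; predict the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    def remaining_entropy(attr):
        # Weighted entropy of the blocks produced by testing `attr`.
        blocks = {}
        for ex, y in zip(examples, labels):
            blocks.setdefault(ex[attr], []).append(y)
        return sum(len(b) / len(labels) * entropy(b) for b in blocks.values())

    # Lowest remaining entropy == highest information gain; ties go to the
    # attribute listed first.
    best = min(attributes, key=remaining_entropy)
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({ex[best] for ex in examples}):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        branches[value] = build_tree([examples[i] for i in idx],
                                     [labels[i] for i in idx], rest)
    return (best, branches)

X = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
     {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
y = ["play", "play", "play", "stay"]
print(build_tree(X, y, ["outlook", "windy"]))
# ('outlook', {'rain': ('windy', {'no': 'play', 'yes': 'stay'}), 'sunny': 'play'})
```

Each recursive call sees only the examples that reached its branch, so partitioning stops as soon as a block is pure or no attribute remains to test.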
BPF+: Exploiting Global Data-flow Optimization in a Generalized Packet Filter Architecture
In SIGCOMM, 1999
Cited by 53 (0 self)
A packet filter is a programmable selection criterion for classifying or selecting packets from a packet stream in a generic, reusable fashion. Previous work on packet filters falls roughly into two categories, namely those efforts that investigate flexible and extensible filter abstractions but sacrifice performance, and those that focus on low-level, optimized filtering representations but sacrifice flexibility. Applications like network monitoring and intrusion detection, however, require both high-level expressiveness and raw performance. In this paper, we propose a fully general packet filter framework that affords both a high degree of flexibility and good performance. In our framework, a packet filter is expressed in a high-level language that is compiled into a highly efficient native implementation. The optimization phase of the compiler uses a flowgraph set relation called edge dominators and the novel application of an optimization technique that we call "redundant predicate...
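The idea of removing predicates whose outcome is already known can be illustrated on a simplified filter. This sketch does not reproduce BPF+'s algorithm, which works on general flowgraphs using edge dominators. Instead, it assumes a tree-shaped filter, where a node's ancestors are exactly the tests that dominate it. The predicate names, the tuple encoding, and the functions `eliminate_redundant` and `run` are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple, Union

# Hypothetical encoding: a filter node is either a verdict ("accept"/"reject")
# or (predicate name, subtree if true, subtree if false).
Filter = Union[str, Tuple[str, "Filter", "Filter"]]

def eliminate_redundant(node: Filter, known: Optional[Dict[str, bool]] = None) -> Filter:
    """Remove tests whose outcome is already fixed by an ancestor test.

    In a tree-shaped filter the ancestors of a node are exactly the tests
    that dominate it, so their outcomes are known whenever the node is reached.
    """
    known = {} if known is None else known
    if isinstance(node, str):
        return node
    pred, if_true, if_false = node
    if pred in known:
        # Outcome already decided on this path: drop the test, keep one branch.
        return eliminate_redundant(if_true if known[pred] else if_false, known)
    return (pred,
            eliminate_redundant(if_true, {**known, pred: True}),
            eliminate_redundant(if_false, {**known, pred: False}))

def run(node: Filter, predicates: Dict[str, Callable[[dict], bool]], packet: dict) -> bool:
    """Evaluate a filter on one packet; True means the packet is accepted."""
    while not isinstance(node, str):
        pred, if_true, if_false = node
        node = if_true if predicates[pred](packet) else if_false
    return node == "accept"

predicates = {
    "is_ip": lambda p: p["ethertype"] == 0x0800,
    "is_tcp": lambda p: p["proto"] == 6,
    "dport_80": lambda p: p["dport"] == 80,
}
# The second is_ip test is redundant: it can only be reached after is_ip succeeded.
naive = ("is_ip", ("is_tcp", ("is_ip", ("dport_80", "accept", "reject"), "reject"), "reject"), "reject")
optimized = eliminate_redundant(naive)
print(optimized)  # ('is_ip', ('is_tcp', ('dport_80', 'accept', 'reject'), 'reject'), 'reject')
```

The optimized filter accepts exactly the same packets while evaluating one predicate fewer on the accepting path. Handling shared subgraphs, where a node has several predecessors, is what requires dominator information.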
Simplifying Decision Trees: A Survey
1996
Cited by 32 (5 self)
Induced decision trees are an extensively researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy, and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification, and we summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree i...
Adjoint Rewriting
1995
Cited by 25 (11 self)
This thesis concerns rewriting in the typed λ-calculus. Traditional categorical models of the typed λ-calculus use concepts such as functor, adjunction and algebra to model type constructors and their associated introduction and elimination rules, with the natural categorical equations inherent in these structures providing an equational theory for λ-terms. One then seeks a rewrite relation which, by transforming terms into canonical forms, provides a decision procedure for this equational theory. Unfortunately, except for the simplest calculi, the rewrite relations which have been proposed either generate the full equational theory but contain no decision procedure, or contain a decision procedure but only for a subtheory of that required. Our proposal is to unify the semantics and reduction theory of the typed λ-calculus by generalising the notion of model from categorical structures based on term equality to categorical structures based on term reduction. This is accomplished via...
Finding Small Equivalent Decision Trees is Hard
International Journal of Foundations of Computer Science, 1999
Cited by 9 (1 self)
Two decision trees are called equivalent if they represent the same function, i.e., they yield the same result for every possible input. We prove that, given a decision tree and a number, the problem of deciding whether there is an equivalent decision tree of size at most that number is NP-complete. As a consequence, finding a decision tree of minimal size that is equivalent to a given decision tree is NP-hard.
Decision Trees: Equivalence and Propositional Operations
Utrecht University, 1998
Cited by 3 (3 self)
For the well-known concept of decision trees as it is used for inductive inference, we study the natural concept of equivalence: two decision trees are equivalent if and only if they represent the same hypothesis. We present a simple, efficient algorithm to establish whether two decision trees are equivalent or not. The complexity of this algorithm is bounded by the product of the sizes of both decision trees. The hypothesis represented by a decision tree is essentially a Boolean function, just like a proposition. Although every Boolean function can be represented in this way, we show that disjunctions and conjunctions of decision trees cannot be represented efficiently as decision trees, and that simply shaped propositions may require exponential size when represented as decision trees.
1 Introduction
The problem of inductive inference, or induction for short, in machine learning ([7]) can be described as follows. Roughly speaking, a number of observations, each having an outcome, has to ...
Floey, an Intermediate Language for Optimizing Compilers
University of Calgary, 2008
In modern optimizing compilers, the linear, human-readable text representation of a program is first transformed into an abstract syntax tree that represents the structure of that program. The abstract syntax tree is then transformed into an intermediate representation (IR), on which compiler optimizations are performed. The optimized IR is sent to the code generator and finally translated into assembly or machine code. Research on IRs has focused on how they can be designed to facilitate compiler optimizations or more effective code generation on specific architectures. This thesis presents a mid-level intermediate language called Floey. In a Floey program, control flowgraphs are separated into different tree-like structures called control expressions, which are connected to one another by entries. A machine-independent optimization called the reduction algorithm is implemented on Floey. By comparing the reduction algorithm to various conventional optimizations, we argue that Floey not only facilitates compiler optimization design but also provides a cleaner and more uniform perspective on compiler optimizations in general.
Static Analyses of Cryptographic Protocols
2009
Most protocol analyses only address security properties. However, other properties are important and can increase our understanding of protocols, as well as aid in the deployment and compilation of implementations. We investigate such analyses. Unfortunately, existing high-level protocol implementation languages do not accept programs that match the style used by the protocol design community. These languages are designed to implement protocol roles independently, not whole protocols. Therefore, a different program must be written for each role. We define a language, WPPL, that avoids this problem. It avoids the need to create a new tool-chain, however, by compiling protocol descriptions into an existing, standard role-based protocol implementation language. Next, we investigate two families of analyses. The first reveals the implicit design decisions of the protocol designer and enables fault-tolerance in implementations. The second characterizes the infinite space of all messages a protocol role could accept and enables scalability by determining the session state necessary to support concurrency. Our entire work is formalized in a mechanical proof checker, the Coq proof assistant, to ensure its theoretical reliability. Our implementations are automatically extracted from the formal Coq theory, so they are guaranteed to implement the theory.
Improved Decision Tree Induction Algorithm with Feature Selection, Cross Validation, Model Complexity and Reduced Error Pruning
Data mining is the process of finding new patterns. Classification is the technique of generalizing known structure to apply to new data. Classification using a decision tree is performed by routing an example from the root node until it arrives at a leaf node. A decision tree is used to model the classification process, and decision trees can handle both continuous and categorical data. In this work, ID3, C4.5 and C5.0 are compared. Among these classifiers, C5.0 gives more accurate and efficient output with comparatively high speed. C5.0 also needs less memory to store its ruleset, as it generates a smaller decision tree. Because the proposed system uses C5.0 as its base classifier, it supports high accuracy, good speed and low memory usage. The classification process here has...
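The following sketch routes an example from the root to a leaf using both continuous and categorical tests. The `Node` class is a hypothetical encoding, not C5.0's data structure. A continuous attribute is tested against a threshold (a binary split), while a categorical attribute has one branch per value. This is similar to the tests used by C4.5-family learners.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

@dataclass
class Node:
    """Hypothetical tree node: a leaf (label set), a threshold test, or a categorical test."""
    label: Optional[str] = None              # set only on leaves
    attribute: Optional[str] = None          # attribute tested at an internal node
    threshold: Optional[float] = None        # continuous test: value <= threshold goes left
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    children: Dict[str, "Node"] = field(default_factory=dict)  # categorical: one branch per value

def classify(node: Node, example: Dict[str, Union[str, float]]) -> str:
    """Route an example from the root until a leaf is reached; return that leaf's label."""
    while node.label is None:
        value = example[node.attribute]
        if node.threshold is not None:
            node = node.left if value <= node.threshold else node.right
        else:
            node = node.children[value]
    return node.label

tree = Node(attribute="humidity", threshold=75.0,
            left=Node(attribute="outlook",
                      children={"sunny": Node(label="play"), "rain": Node(label="stay")}),
            right=Node(label="stay"))
print(classify(tree, {"humidity": 70.0, "outlook": "sunny"}))  # play
print(classify(tree, {"humidity": 80.0, "outlook": "sunny"}))  # stay
```

Classification costs one test per level, so its time is bounded by the depth of the tree. This is one reason why the smaller trees mentioned above also classify faster.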

