Results 1 - 10 of 19
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
Data Mining and Knowledge Discovery, 1997
Cited by 122 (1 self)
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
Keywords: classification, tree-structured classifiers, data compaction
1. Introduction
Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
The New Jersey Machine-Code Toolkit
In Proceedings of the 1995 USENIX Technical Conference, 1995
Cited by 48 (8 self)
The New Jersey Machine-Code Toolkit helps programmers write applications that process machine code. Applications that use the toolkit are written at an assembly-language level of abstraction, but they recognize and emit binary. Guided by a short instruction-set specification, the toolkit generates all the bit-manipulating code. The toolkit's specification language uses four concepts: fields and tokens describe parts of instructions, patterns describe binary encodings of instructions or groups of instructions, and constructors map between the assembly-language and binary levels. These concepts are suitable for describing both CISC and RISC machines; we have written specifications for the MIPS R3000, SPARC, and Intel 486 instruction sets. We have used the toolkit to help write two applications: a debugger and a linker. The toolkit generates efficient code; for example, the linker emits binary up to 15% faster than it emits assembly language, making it 1.7-2 times faster to produce an a....
Dynamic Hashing Schemes
ACM Computing Surveys, 1988
Cited by 41 (1 self)
A new type of dynamic file access called dynamic hashing has recently emerged. It promises the flexibility of handling dynamic files while preserving the fast access times expected from hashing. Such a fast, dynamic file access scheme is needed to support modern database systems. This paper surveys dynamic hashing schemes and examines
Efficient Tabling Mechanisms for Logic Programs
Proceedings of the 12th International Conference on Logic Programming, 1995
Cited by 40 (7 self)
The use of tabling in logic programming allows bottom-up evaluation to be incorporated in a top-down framework, combining advantages that accrue from both. At the engine level, tabling also introduces complications not present in pure top-down evaluation, due to the need for subgoals and answers to access tables during clause resolution. This paper describes the design, implementation, and experimental evaluation of data structures and algorithms for high-performance table access. Our approach uses tries as the basis for tables. Tries provide complete discrimination for terms, and permit a lookup and possible insertion to be performed in a single pass through a term. A novel technique of substitution factoring is also proposed. When substitution factoring is used in conjunction with tries, the access cost for answers is proportional to the size of the answer substitution, rather than to the size of the answer itself. As a special case, the access cost of answers to a datalog query is pr...
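The single-pass lookup-or-insert idea mentioned in this abstract can be illustrated with a small sketch. This is not the paper's engine-level implementation: the names `TrieNode`, `flatten`, and `lookup_or_insert` are hypothetical, terms are modelled as nested Python tuples, and variables and substitution factoring are left out entirely.

```python
class TrieNode:
    """One trie node; children are keyed by a symbol such as 'f/2'."""
    __slots__ = ("children", "is_entry")

    def __init__(self):
        self.children = {}
        self.is_entry = False  # True if a stored term ends at this node


def flatten(term):
    """Preorder symbol sequence of a term given as a nested tuple.

    ('p', 'a', ('f', 'b')) stands for p(a, f(b)); each symbol carries its
    arity so that the sequence identifies the term unambiguously.
    """
    if isinstance(term, tuple):
        symbols = [f"{term[0]}/{len(term) - 1}"]
        for arg in term[1:]:
            symbols.extend(flatten(arg))
        return symbols
    return [f"{term}/0"]


def lookup_or_insert(root, term):
    """Walk the trie once, creating missing nodes along the way.

    Returns (node, was_new): was_new is False when the term was already
    in the table, so lookup and insertion share a single traversal.
    """
    node = root
    for symbol in flatten(term):
        child = node.children.get(symbol)
        if child is None:
            child = TrieNode()
            node.children[symbol] = child
        node = child
    was_new = not node.is_entry
    node.is_entry = True
    return node, was_new
```

Calling `lookup_or_insert` twice with the same term reports `was_new` as `True` the first time and `False` the second, without a separate membership check.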
Efficient Access Mechanisms For Tabled Logic Programs
1999
Cited by 28 (13 self)
This article describes the design, implementation, and experimental evaluation of data structures and algorithms for high-performance table access. Our approach uses tries as the basis for tables. Tries, a variant of discrimination nets, provide complete discrimination for terms, and permit a lookup and possible insertion to be performed in a single pass through a term. In addition, a novel technique of substitution factoring is proposed. When substitution factoring is used, the access cost for answers is proportional to the size of the answer substitution, rather than to the size of the answer itself. Answer tries can be implemented both as interpreted structures and as compiled WAM-like code. When they are compiled, the speed of computing substitutions through answer tries is competitive with the speed of unit facts compiled or asserted as WAM code. Because answer tries can also be created an order of magnitude more quickly than asserted code, they form a promising alternative for representing certain types of dynamic code, even in Prolog systems without tabling.
Address correspondence to I.V. Ramakrishnan, D.S. Warren, Dept. of Computer Science, State University of New York at Stony Brook, Stony Brook, NY 11794-4400, U.S.A., email: {ram,warren}@cs.sunysb.edu; P. Rao, Bellcore, 445 South Street, Morristown, NJ 07960-6438, U.S.A., e-mail: prasadr@bellcore.com; K. Sagonas, Dept. of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001, Heverlee, Belgium, email:
Burst Tries: A Fast, Efficient Data Structure for String Keys
ACM Transactions on Information Systems, 2002
Cited by 21 (10 self)
Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it requires no more memory than a binary tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.
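A rough sketch of the structure described in this abstract may help: trie nodes near the root route on leading characters, and small containers hold the remaining suffixes until they grow too large and are "burst" into a new trie node. The container type (a sorted list), the `BURST_LIMIT` threshold, and all function names below are assumptions chosen for illustration; they are not taken from the paper.

```python
import bisect

BURST_LIMIT = 4  # hypothetical threshold; a container bursts once it holds more keys


class Container:
    """Leaf bucket holding sorted suffixes of the keys routed to it."""

    def __init__(self):
        self.keys = []


class Node:
    """Access-trie node; each child is either a Node or a Container."""

    def __init__(self):
        self.children = {}
        self.ends_here = False  # a key ends exactly at this node


def insert(node, key):
    depth = 0
    while True:
        if depth == len(key):
            node.ends_here = True
            return
        ch = key[depth]
        child = node.children.get(ch)
        if child is None:
            child = node.children[ch] = Container()
        if isinstance(child, Node):
            node, depth = child, depth + 1
            continue
        suffix = key[depth + 1:]
        i = bisect.bisect_left(child.keys, suffix)
        if i == len(child.keys) or child.keys[i] != suffix:
            child.keys.insert(i, suffix)
        if len(child.keys) > BURST_LIMIT:
            node.children[ch] = burst(child)
        return


def burst(container):
    """Replace a full container by a trie node with smaller containers."""
    new_node = Node()
    for suffix in container.keys:
        insert(new_node, suffix)
    return new_node


def contains(node, key):
    depth = 0
    while True:
        if depth == len(key):
            return node.ends_here
        child = node.children.get(key[depth])
        if child is None:
            return False
        if isinstance(child, Node):
            node, depth = child, depth + 1
            continue
        suffix = key[depth + 1:]
        i = bisect.bisect_left(child.keys, suffix)
        return i < len(child.keys) and child.keys[i] == suffix


def iter_sorted(node, prefix=""):
    """Yield all stored keys in lexicographic order."""
    if node.ends_here:
        yield prefix
    for ch in sorted(node.children):
        child = node.children[ch]
        if isinstance(child, Node):
            yield from iter_sorted(child, prefix + ch)
        else:
            for suffix in child.keys:
                yield prefix + ch + suffix
```

Because containers stay small and the trie levels are visited in character order, `iter_sorted` returns keys in sorted order without a separate sort over the whole set.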
Unification Factoring for Efficient Execution of Logic Programs
In Proc. of the 22nd Symp. on Principles of Programming Languages. ACM, 1995
Cited by 16 (10 self)
The efficiency of resolution-based logic programming languages, such as Prolog, depends critically on selecting and executing sets of applicable clause heads to resolve against subgoals. Traditional approaches to this problem have focused on using indexing to determine the smallest possible applicable set. Despite their usefulness, these approaches ignore the non-determinism inherent in many programming languages to the extent that they do not attempt to optimize execution after the applicable set has been determined. Unification factoring seeks to rectify this omission by regarding the indexing and unification phases of clause resolution as a single process. This paper formalizes that process through the construction of factoring automata. A polynomial-time algorithm is given for constructing optimal factoring automata which preserve the clause selection strategy of Prolog. More generally, when the clause selection strategy is not fixed, constructing an optimal automaton is shown to b...
Principles and Practice of Unification Factoring
ACM Transactions on Programming Languages and Systems, 1996
Cited by 13 (3 self)
Devices]: Models of Computation---automata; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems---pattern matching
General Terms: Algorithms, Languages, Theory
Additional Key Words and Phrases: Indexing, logic programming, trie minimization, unification
1. INTRODUCTION
In logic programming languages, such as Prolog, a predicate is defined by a sequence of Horn clauses. A clause becomes applicable for resolution if its head unifies with a selected goal; and each applicable clause is invoked in textual order. Unification of a clause head with a goal involves two basic types of operations: elementary match operations and substitution operations for variables in the two
A preliminary version of this article was presented at POPL '95. This work was supported in part by NSF grants CCR-9102159, CCR-9102989, CCR-9404921, CDA-9303181, CDA-9504275, INT-9314412, and ONR grant 400X116YIP01. Authors' addresses: S. Dawson, Computer Science Laboratory, SRI In...
IP Address Lookup in Hardware for High-Speed Routing
1998
Cited by 10 (0 self)
This paper presents a way of doing IP packet classification at high speed, a critical operation for high-capacity routers. Two hardware designs are presented, both based on simple, standard logic. One goal is that the designs should be inexpensive and simple enough that it is feasible to replicate them on each input port on a router. The first design is for unicast forwarding, based on destination addresses, and does IP address lookup through a longest prefix match operation. The second design is for identifier lookup, used for multicast addresses and for packet flows. Both designs use standard memory and simple programmable logic, and are capable of one lookup per memory cycle. With standard memory technology, this corresponds to a rate of more than 10 Gb/s per port. In this paper we present the designs, analyse their performance and cost, and discuss how they can be used in a high-capacity router.
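The longest prefix match operation named in this abstract can be shown with a plain software binary trie. This is only an illustration of the operation itself, not the hardware designs the paper presents; it handles IPv4 only, and the names `PrefixNode`, `add_route`, and `longest_prefix_match` are hypothetical.

```python
import ipaddress


class PrefixNode:
    """Binary trie node: one child per address bit, optional next hop."""
    __slots__ = ("zero", "one", "next_hop")

    def __init__(self):
        self.zero = None
        self.one = None
        self.next_hop = None


def add_route(root, cidr, next_hop):
    """Store next_hop at the node reached by the prefix bits of cidr."""
    net = ipaddress.ip_network(cidr)
    bits = int(net.network_address)
    node = root
    for i in range(net.prefixlen):
        bit = (bits >> (31 - i)) & 1
        if bit:
            if node.one is None:
                node.one = PrefixNode()
            node = node.one
        else:
            if node.zero is None:
                node.zero = PrefixNode()
            node = node.zero
    node.next_hop = next_hop


def longest_prefix_match(root, addr):
    """Follow the address bits; remember the last next hop seen on the path."""
    bits = int(ipaddress.ip_address(addr))
    node = root
    best = root.next_hop  # a default route (/0) lives at the root
    for i in range(32):
        bit = (bits >> (31 - i)) & 1
        node = node.one if bit else node.zero
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best
```

With routes for `10.0.0.0/8` and `10.1.0.0/16` installed, an address such as `10.1.2.3` matches both prefixes and the lookup returns the next hop of the longer one.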
When Do Match-Compilation Heuristics Matter?
2000
Cited by 7 (1 self)
Modern, statically typed, functional languages define functions by pattern matching. Although pattern matching is defined in terms of sequential checking of a value against one pattern after another, real implementations translate patterns into automata that can test a value against many patterns at once. Decision trees are popular automata.

