A Survey of Methods for Scaling Up Inductive Algorithms
Data Mining and Knowledge Discovery, 1999
Cited by 74 (10 self)
One of the defining challenges for the KDD research community is to enable inductive learning algorithms to mine very large databases. This paper summarizes, categorizes, and compares existing work on scaling up inductive algorithms. We concentrate on algorithms that build decision trees and rule sets, in order to provide focus and specific details; the issues and techniques generalize to other types of data mining. We begin with a discussion of important issues related to scaling up. We highlight similarities among scaling techniques by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent techniques, drawing on specific examples from published papers. Finally, we use the preceding analysis to suggest how to proceed when dealing with a large problem, and where to focus future research.
Keywords: scaling up, inductive learning, decision trees, rule learning
1. Introduction
The knowledge discovery and data...
Multiple Comparisons in Induction Algorithms
Machine Learning, 1998
Cited by 67 (9 self)
David Jensen and Paul R. Cohen
Experimental Knowledge Systems Laboratory, Department of Computer Science, Box 34610 LGRC, University of Massachusetts, Amherst, MA 01003-4610, 413-545-3613
A single mechanism is responsible for three pathologies of induction algorithms: attribute selection errors, overfitting, and oversearching. In each pathology, induction algorithms compare multiple items based on scores from an evaluation function and select the item with the maximum score. We call this a multiple comparison procedure (MCP). We analyze the statistical properties of MCPs and show how failure to adjust for these properties leads to the pathologies. We also discuss approaches that can control pathological behavior, including Bonferroni adjustment, randomization testing, and cross-validation.
Keywords: inductive learning, overfitting, oversearching, attribute selection, hypothesis testing, parameter estimation
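The effect described in this abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes the compared items are scored by independent significance tests at a common level `alpha` (a simplification), and the function names are hypothetical. It shows how the chance of at least one spurious "winner" grows with the number of items compared, and how a Bonferroni adjustment pulls it back under `alpha`.

```python
def prob_any_false_positive(alpha: float, n: int) -> float:
    """Chance that at least one of n independent tests at level alpha
    passes purely by luck (assumption: the tests are independent)."""
    return 1.0 - (1.0 - alpha) ** n


def bonferroni_alpha(alpha: float, n: int) -> float:
    """Per-test level that keeps the overall error rate at or below alpha."""
    return alpha / n


if __name__ == "__main__":
    alpha = 0.05
    for n in (1, 5, 20, 100):
        unadjusted = prob_any_false_positive(alpha, n)
        adjusted = prob_any_false_positive(bonferroni_alpha(alpha, n), n)
        print(f"n={n:3d}  unadjusted={unadjusted:.3f}  bonferroni={adjusted:.3f}")
```

Selecting the maximum of many scores without such an adjustment is what makes the selected score look better than it is.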
Eliciting Knowledge and Transferring It Effectively to a Knowledge-Based System
IEEE Transactions on Knowledge and Data Engineering, 1993
Cited by 32 (10 self)
The knowledge acquisition bottleneck impeding the development of expert systems is being alleviated by the development of computer-based knowledge acquisition tools. These work directly with experts to elicit knowledge, and structure it appropriately to operate as a decision support tool within an expert system. However, the elicitation of expert knowledge and its effective transfer to a useful knowledge-based system is complex and involves a diversity of activities. This paper illustrates the complete development of a decision support system using knowledge acquisition tools. The example is simple enough to be completely analyzed but exhibits enough real-world characteristics to give significant insights into the processes and problems of knowledge engineering.
1 Introduction
Knowledge acquisition for expert system development has come to be termed knowledge engineering, following Feigenbaum's (1980) use of the term to describe the reduction of a large body of knowledge to a precise...
Simplifying Decision Trees: A Survey
1996
Cited by 32 (5 self)
Induced decision trees are an extensively-researched solution to classification tasks. For many practical tasks, the trees produced by tree-generation algorithms are not comprehensible to users due to their size and complexity. Although many tree induction algorithms have been shown to produce simpler, more comprehensible trees (or data structures derived from trees) with good classification accuracy, tree simplification has usually been of secondary concern relative to accuracy and no attempt has been made to survey the literature from the perspective of simplification. We present a framework that organizes the approaches to tree simplification and summarize and critique the approaches within this framework. The purpose of this survey is to provide researchers and practitioners with a concise overview of tree-simplification approaches and insight into their relative capabilities. In our final discussion, we briefly describe some empirical findings and discuss the application of tree i...
Knowledge Acquisition Tools based on Personal Construct Psychology
1993
Cited by 23 (0 self)
Knowledge acquisition research supports the generation of knowledge-based systems through the development of principles, techniques, methodologies and tools. What differentiates knowledge-based system development from conventional system development is the emphasis on in-depth understanding and formalization of the relations between the conceptual structures underlying expert performance and the computational structures capable of emulating that performance. Personal construct psychology is a theory of individual and group psychological and social processes that has been used extensively in knowledge acquisition research to model the cognitive processes of human experts. The psychology takes a constructivist position appropriate to the modeling of human knowledge processes but develops this through the characterization of human conceptual structures in axiomatic terms that translate directly to computational form. In particular, there is a close correspondence between the intensional lo...
The Use of Simulated Experts in Evaluating Knowledge Acquisition
University of Calgary, 1995
Cited by 20 (12 self)
Evaluation of knowledge acquisition methods remains an important goal; however, evaluation of actual knowledge acquisition is difficult because of the unavailability of experts for adequately controlled studies. This paper proposes the use of simulated experts, i.e., other knowledge based systems, as sources of expertise in assessing knowledge acquisition tools. A simulated expert is not as creative or wise as a human expert, but it readily allows for controlled experiments. This method has been used to assess a knowledge acquisition methodology, Ripple Down Rules (RDR), at various levels of expertise, and shows that redundancy is not a major problem with RDR.
Introduction
Evaluation of knowledge acquisition (KA) methods remains an important goal. Many KA methods have been proposed and many tools have been developed. However, the critical issue for any developer of knowledge based systems (KBS) is to select the best KA technique for the task in hand. This means that papers describing m...
A Situated Classification Solution of a Resource Allocation Task Represented in a Visual Language
International Journal of Human-Computer Studies
Cited by 19 (9 self)
The Sisyphus room allocation problem solving example has been solved using a situated classification approach. A solution was developed from the protocol provided in terms of three heuristic classification systems, one classifying people, another rooms, and another tasks on an agenda of recommended room allocations. The domain ontology, problem data, problem-solving method, and domain-specific classification rules have each been represented in a visual language. These knowledge structures compile to statements in a term subsumption knowledge representation language, and are loaded and run in a knowledge representation server to solve the problem. The user interface has been designed to provide support for human intervention in under-determined and over-determined situations, allowing advantage to be taken of the additional choices available in the first case, and a compromise solution to be developed in the second.
1 INTRODUCTION
The Sisyphus room allocation problem is a resource all...
Comparing Conceptual Structures: Consensus, Conflict, Correspondence and Contrast
1989
Cited by 18 (1 self)
One problem of eliciting knowledge from several experts is that experts may share only parts of their terminologies and conceptual systems. Experts may use the same term for different concepts, use different terms for the same concept, use the same term for the same concept, or use different terms and have different concepts. Moreover, clients who use an expert system have even less likelihood of sharing terms and concepts with the experts who produced it. This paper outlines a methodology for eliciting and recognizing such individual differences. It can be used to focus discussion between experts on those differences between them which require resolution, enabling them to classify them in terms of differing terminologies, levels of abstraction, disagreements, and so on. The methodology promotes the full exploration of the conceptual framework of a domain of expertise by encouraging experts to operate in a "brain-storming" mode as a group, using differing viewpoints to develop a rich f...
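The four cases listed in this abstract form a two-by-two grid over "same term?" and "same concept?". The sketch below is a hypothetical illustration rather than the paper's method: pairing the four cases with the four names in the title is an assumption made here, and `classify_pair` is an invented helper name.

```python
def classify_pair(same_term: bool, same_concept: bool) -> str:
    """Label how two experts relate on one item.

    Assumption: the four labels from the title map onto the four
    term/concept combinations as coded below.
    """
    if same_term and same_concept:
        return "consensus"       # same term, same concept
    if same_term and not same_concept:
        return "conflict"        # same term, different concepts
    if not same_term and same_concept:
        return "correspondence"  # different terms, same concept
    return "contrast"            # different terms, different concepts


if __name__ == "__main__":
    for same_term in (True, False):
        for same_concept in (True, False):
            label = classify_pair(same_term, same_concept)
            print(f"same_term={same_term!s:5}  same_concept={same_concept!s:5}  -> {label}")
```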
Transforming Rules and Trees into Comprehensible Knowledge Structures
1996
Cited by 18 (3 self)
The problem of transforming the knowledge bases of expert systems using induced rules or decision trees into comprehensible knowledge structures is addressed. A knowledge structure is developed that generalizes and subsumes production rules, decision trees, and rules with exceptions. It gives rise to a natural complexity measure that allows them to be understood, analyzed and compared on a uniform basis. The structure is a directed acyclic graph with the semantics that nodes are premises, some of which have attached conclusions, and the arcs are inheritance links with disjunctive multiple inheritance. A detailed example is given of the generation of a range of such structures of equivalent performance for a simple problem, and the complexity measure of a particular structure is shown to relate to its perceived complexity. The simplest structures are generated by an algorithm that factors common sub-premises from the premises of rules. A more complex example of a chess dataset is used t...
Class Library Implementation of an Open Architecture Knowledge Support System
1994
Cited by 16 (9 self)
Object-oriented class libraries offer the potential for individual researchers to manage the large bodies of code generated in the experimental development of complex interactive systems. This article analyzes the structure of such a class library that supports the rapid prototyping of a wide range of systems including collaborative networking, shared documents, hypermedia, machine learning, knowledge acquisition and knowledge representation, and various combinations of these technologies. The overall systems architecture is presented in terms of a heterogeneous collection of systems providing a wide range of application functionalities. Examples are given of group writing, multimedia and knowledge-based systems which are based on combining these functionalities. The detailed design issues of the knowledge representation server component of the system are analyzed in terms of requirements, current state-of-the-art, and the underlying theoretical principles that lead to an effective obj...

