Results 11 - 20 of 62
Learning Symbolic Rules Using Artificial Neural Networks
Proceedings of the Tenth International Conference on Machine Learning, 1993
Cited by 40 (6 self)
A distinct advantage of symbolic learning algorithms over artificial neural networks is that typically the concept representations they form are more easily understood by humans. One approach to understanding the representations formed by neural networks is to extract symbolic rules from trained networks. In this paper we describe and investigate an approach for extracting rules from networks that uses (1) the NofM extraction algorithm, and (2) the network training method of soft weight-sharing. Previously, the NofM algorithm had been successfully applied only to knowledge-based neural networks. Our experiments demonstrate that our extracted rules generalize better than rules learned using the C4.5 system. In addition to being accurate, our extracted rules are also reasonably comprehensible.
1 INTRODUCTION Artificial neural networks (ANNs) have been successfully applied to real-world problems as varied as steering a motor vehicle (Pomerleau, 1991) and learning to pronounce English tex...
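For readers unfamiliar with the rule form mentioned in this abstract, an N-of-M rule fires when at least N of its M antecedents hold. The Python sketch below only evaluates such a rule; it is not the NofM extraction algorithm from the paper, and the helper `n_of_m`, the rule, and the feature names are hypothetical.

```python
from typing import Mapping, Sequence


def n_of_m(n: int, antecedents: Sequence[str], example: Mapping[str, bool]) -> bool:
    """Return True if at least n of the named antecedents are true in example.

    Antecedents missing from the example are treated as false (an assumption
    of this sketch).
    """
    satisfied = sum(1 for name in antecedents if example.get(name, False))
    return satisfied >= n


# Hypothetical rule: "positive if 2 of {a, b, c} are true".
rule_antecedents = ["a", "b", "c"]
print(n_of_m(2, rule_antecedents, {"a": True, "b": False, "c": True}))   # True
print(n_of_m(2, rule_antecedents, {"a": True, "b": False, "c": False}))  # False
```

A single N-of-M rule stands in for many conjunctive rules (here, every pair drawn from three antecedents), which is one reason this form can stay compact.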
StatLog: Comparison of Classification Algorithms on Large Real-World Problems
1995
Cited by 37 (0 self)
This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from: symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (back-propagation, radial basis functions). Twelve datasets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the dataset investigated. We therefore developed a set of dataset descriptors to help decide which algorithms are suited to particular datasets. For example, datasets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (> 38%) tend to favor symbolic learning algorithms. We suggest how classification algorith...
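The dataset descriptors named in this abstract (skew, kurtosis, share of binary/categorical attributes) can be computed along the following lines. This is a minimal sketch, not the StatLog procedure: it assumes moment-based skewness and non-excess kurtosis (a normal distribution scores 3), averages them over the numeric attributes, and treats the quoted values 1, 7, and 38% as lower bounds. All function names are hypothetical.

```python
from statistics import fmean


def _central_moment(values, k):
    """k-th central moment of a sequence of numbers."""
    mu = fmean(values)
    return fmean([(v - mu) ** k for v in values])


def skewness(values):
    # Undefined (division by zero) for a constant column.
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def kurtosis(values):
    # Non-excess kurtosis: a normal distribution gives 3 (assumption).
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2


def describe(numeric_columns, n_categorical):
    """Summarise a dataset; needs at least one numeric column."""
    n_attributes = len(numeric_columns) + n_categorical
    return {
        "skew": fmean([abs(skewness(c)) for c in numeric_columns]),
        "kurtosis": fmean([kurtosis(c) for c in numeric_columns]),
        "categorical_fraction": n_categorical / n_attributes,
    }


def favors_symbolic(d, skew_min=1.0, kurtosis_min=7.0, categorical_min=0.38):
    """Apply the three thresholds as strict lower bounds (assumption)."""
    return (
        d["skew"] > skew_min
        and d["kurtosis"] > kurtosis_min
        and d["categorical_fraction"] > categorical_min
    )


# Hypothetical dataset: one heavily skewed numeric column, one categorical.
summary = describe([[0] * 9 + [10]], n_categorical=1)
print(summary, favors_symbolic(summary))
```

The heuristic is only a hint about which algorithm family to try first; the abstract itself stresses that the best algorithm depends on the dataset.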
A Comparison of ID3 and Backpropagation for English Text-to-Speech Mapping
1995
Cited by 36 (6 self)
The performance of the error backpropagation (BP) and ID3 learning algorithms was compared on the task of mapping English text to phonemes and stresses. Under the distributed output code developed by Sejnowski and Rosenberg, it is shown that BP consistently outperforms ID3 on this task by several percentage points. Three hypotheses explaining this difference were explored: (a) ID3 is overfitting the training data, (b) BP is able to share hidden units across several output units and hence can learn the output units better, and (c) BP captures statistical information that ID3 does not. We conclude that only hypothesis (c) is correct. By augmenting ID3 with a simple statistical learning procedure, the performance of BP can be approached but not matched. More complex statistical procedures can improve the performance of both BP and ID3 substantially. A study of the residual errors suggests that there is still substantial room for improvement in learning methods for text-to-speech mapping.
Linear Machine Decision Trees
1991
Cited by 34 (1 self)
This article presents an algorithm for inducing multiclass decision trees with multivariate tests at internal decision nodes. Each test is constructed by training a linear machine and eliminating variables in a controlled manner. Empirical results demonstrate that the algorithm builds small accurate trees across a variety of tasks.
1 Introduction One of the fundamental research problems in machine learning is how to learn from examples. From a sequence or set of training examples, each labeled with its correct class name, a machine learns by forming or selecting a generalization of the training examples. This process, also known as supervised learning, is useful for real classification tasks, e.g. disease diagnosis, and for problem solving tasks in which control decisions depend on classification, e.g. rule applicability. The ability to generalize is fundamental to intelligence because it allows one to reason in accordance with predictions that are often correct. This article focuse...
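As background for the multivariate tests described in this abstract: a linear machine is a set of linear discriminant functions, one per class, and an instance is assigned to the class whose discriminant value is largest. The Python sketch below shows only that multiway test at a single node; it omits the training and variable-elimination steps of the paper, and the weights and function name are hypothetical.

```python
from typing import Sequence


def linear_machine_branch(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Return the index i maximising g_i(x) = w_i . [x, 1].

    Each row of weights holds one discriminant; its last entry is the bias.
    Ties go to the lowest index.
    """
    augmented = list(x) + [1.0]
    scores = [sum(w * v for w, v in zip(w_i, augmented)) for w_i in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical node with three branches over two features.
node_weights = [
    [1.0, 0.0, 0.0],    # branch 0 prefers a large first feature
    [0.0, 1.0, 0.0],    # branch 1 prefers a large second feature
    [-1.0, -1.0, 1.0],  # branch 2 prefers both features small
]
print(linear_machine_branch(node_weights, (2.0, 0.5)))    # 0
print(linear_machine_branch(node_weights, (0.1, 3.0)))    # 1
print(linear_machine_branch(node_weights, (-1.0, -1.0)))  # 2
```

Setting a feature's weight to zero in every discriminant removes that feature from the test, which is the effect variable elimination aims for.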
Generalization and Decision Tree Induction: Efficient Classification In Data Mining
In Proc. of 1997 Int. Workshop Research Issues on Data Engineering (RIDE'97), 1997
Cited by 26 (4 self)
Efficiency and scalability are fundamental issues concerning data mining in large databases. Although classification has been studied extensively, few of the known methods take serious consideration of efficient induction in large databases and the analysis of data at multiple abstraction levels. This paper addresses the efficiency and scalability issues by proposing a data classification method which integrates attribute-oriented induction, relevance analysis, and the induction of decision trees. Such an integration leads to efficient, high-quality, multiple-level classification of large amounts of data, the relaxation of the requirement of perfect training sets, and the elegant handling of continuous and noisy data.
Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach
Journal of Artificial Intelligence Research, 1995
Cited by 26 (0 self)
Theory revision integrates inductive learning and background knowledge by combining training examples with a coarse domain theory to produce a more accurate theory. There are two challenges that theory revision and other theory-guided systems face. First, a representation language appropriate for the initial theory may be inappropriate for an improved theory. While the original representation may concisely express the initial theory, a more accurate theory forced to use that same representation may be bulky, cumbersome, and difficult to reach. Second, a theory structure suitable for a coarse domain theory may be insufficient for a fine-tuned theory. Systems that produce only small, local changes to a theory have limited value for accomplishing complex structural alterations that may be required. Consequently, advanced theory-guided learning systems require flexible representation and flexible structure. An analysis of various theory revision systems and theory-guided learning systems ...
Transferring Previously Learned Back-Propagation Neural Networks To New Learning Tasks
1993
An Anytime Approach To Connectionist Theory Refinement: Refining The Topologies Of Knowledge-Based Neural Networks
1995
Cited by 18 (3 self)
Many scientific and industrial problems can be better understood by learning from samples of the task at hand. For this reason, the machine learning and statistics communities devote considerable research effort to generating inductive-learning algorithms that try to learn the true "concept" of a task from a set of its examples. Oftentimes, however, one has additional resources readily available, but largely unused, that can improve the concept that these learning algorithms generate. These resources include available computer cycles, as well as prior knowledge describing what is currently known about the domain. Effective utilization of available computer time is important since for most domains an expert is willing to wait for weeks, or even months, if a learning system can produce an improved concept. Using prior knowledge is important since it can contain information not present in the current set of training examples. In this thesis, I present three "anytime" approaches to connec...
A Survey of Methods for Scaling Up Inductive Learning Algorithms
1997
Cited by 15 (1 self)
Each year, one of the explicit challenges for the KDD research community is to develop methods that facilitate the use of inductive learning algorithms for mining very large databases. By collecting, categorizing, and summarizing past work on scaling up inductive learning algorithms, this paper serves to establish a common ground for researchers addressing the challenge. We begin with a discussion of important, but often tacit, issues related to scaling up learning algorithms. We highlight similarities among methods by categorizing them into three main approaches. For each approach, we then describe, compare, and contrast the different constituent methods, drawing on specific examples from the published literature. Finally, we use the preceding analysis to suggest how one should proceed when dealing with a large problem, and where future research efforts should be focused.
Primary contact: Foster Provost NYNEX Science and Technology, 400 Westchester Avenue, White Plains, NY 10604 em...
An Efficient Way To Learn English Grapheme-To-Phoneme Rules Automatically
Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1993
Cited by 15 (0 self)
We present an efficient way to learn automatically grapheme-to-phoneme mapping rules for English by using Kohonen's concept of Dynamically Expanding Context. This method constructs rules that are most general in the sense of an explicitly defined specificity hierarchy. As the hierarchy, we have used the amount of expanding context around the symbol to be transformed, weighted towards the right. To apply this concept to English text-to-speech mapping, we have used the 20,008-word corpus provided in the public domain by Sejnowski and Rosenberg, that was also used in the NETTALK experiments. Phoneme-level mapping accuracies of 91 per cent with data not used in training demonstrate that the Dynamically Expanding Context is able to capture quite efficiently the context-dependent relationships in the corpus.
1 INTRODUCTION The problem addressed in this paper is automatic learning of grapheme-to-phoneme mapping rules. We present an efficient way to learn these for English by using Kohonen's c...

