Constructing Deterministic Finite-State Automata in Recurrent Neural Networks
- Journal of the ACM
, 1996
Cited by 66 (15 self)
Recurrent neural networks that are trained to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contributes to this instability. We prove that a simple algorithm can construct second-order recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e. the constructed network correctly classifies strings of arbitrary length. The algorithm is based on encoding strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with n states and m input alphabet symbols, the constructive algorithm genera...
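A minimal sketch of the kind of direct weight encoding the abstract above describes. The specific scheme used here (weights of +H and -H, a bias of -H/2, the value H = 8, and reading acceptance from the most active state neuron rather than from a dedicated output neuron) is an assumption made for illustration and is not taken from the abstract; the names encode_dfa, run and accepts are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def encode_dfa(n_states, n_symbols, delta, H=8.0):
    """Program a second-order network from a DFA transition table.

    delta maps (state j, symbol k) -> next state i.
    W[i][j][k] multiplies state neuron j with input k and feeds neuron i.
    The +H / -H / -H/2 scheme is an assumption of this sketch.
    """
    W = [[[0.0] * n_symbols for _ in range(n_states)] for _ in range(n_states)]
    for (j, k), i in delta.items():
        W[i][j][k] = H           # switch the target state neuron on
        if i != j:
            W[j][j][k] = -H      # switch the source state neuron off
    bias = [-H / 2.0] * n_states
    return W, bias

def run(W, bias, string, start=0):
    """Feed symbol indices through the network; return final activations."""
    n = len(bias)
    S = [1.0 if i == start else 0.0 for i in range(n)]
    for k in string:
        # One-hot input: only the weights for the current symbol k contribute.
        S = [sigmoid(bias[i] + sum(W[i][j][k] * S[j] for j in range(n)))
             for i in range(n)]
    return S

def accepts(W, bias, string, accepting, start=0):
    """Accept if the most active state neuron encodes an accepting state."""
    S = run(W, bias, string, start)
    return S.index(max(S)) in accepting

# Example DFA: state 0 = even number of 1s (accepting), state 1 = odd.
parity = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
W, bias = encode_dfa(2, 2, parity)
```

With a large enough H the sigmoid saturates near 0 or 1 at every step, so the state encoding does not drift on long inputs; with a small H the activations can collapse toward intermediate values, which is the instability the abstract refers to.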
Hybrid Neural Systems
, 2000
Cited by 34 (9 self)
This chapter provides an introduction to the field of hybrid neural systems. Hybrid neural systems are computational systems which are based mainly on artificial neural networks but also allow a symbolic interpretation, or interaction with symbolic components. In this overview, we will describe recent results of hybrid neural systems. We will give a brief overview of the main methods used, outline the work that is presented here, and provide additional references. We will also highlight some important general issues and trends.
Constructive Learning of Recurrent Neural Networks: Limitations of Recurrent Cascade Correlation and a Simple Solution
, 1993
Cited by 27 (9 self)
It is often difficult to predict the optimal neural network size for a particular application. Constructive or destructive methods that add or subtract neurons, layers, connections, etc. might offer a solution to this problem. We prove that one method, Recurrent Cascade Correlation, due to its topology, has fundamental limitations in representation and thus in its learning capabilities. With monotone (i.e. sigmoid) and hard-threshold activation functions, it cannot represent certain finite-state automata. We give a "preliminary" approach for getting around these limitations by devising a simple constructive training method that adds neurons during training while still preserving the powerful fully-recurrent structure. We illustrate this approach by simulations which learn many examples of regular grammars that the Recurrent Cascade Correlation method is unable to learn. 1 Introduction Choosing the architecture of a neural network for a particular problem usually requires some prior k...
Rule Revision with Recurrent Neural Networks
, 1996
Cited by 20 (9 self)
Recurrent neural networks readily process, recognize and generate temporal sequences. By encoding grammatical strings as temporal sequences, recurrent neural networks can be trained to behave like deterministic sequential finite-state automata. Algorithms have been developed for extracting grammatical rules from trained networks. Using a simple method for inserting prior knowledge (or rules) into recurrent neural networks, we show that recurrent neural networks are able to perform rule revision. Rule revision is performed by comparing the inserted rules with the rules in the finite-state automata extracted from trained networks. The results from training a recurrent neural network to recognize a known non-trivial, randomly generated regular grammar show that not only do the networks preserve correct rules but that they are able to correct through training inserted rules which were initially incorrect. (By incorrect, we mean that the rules were not the ones in the randomly generated gra...
A Formal Definition of Intelligence Based on an Intensional Variant of Algorithmic Complexity
- In Proceedings of the International Symposium of Engineering of Intelligent Systems (EIS'98)
, 1998
Cited by 20 (10 self)
Machine Due to the current technology of the computers we can use, we have chosen an extremely abridged emulation of the machine that will effectively run the programs, instead of more proper languages, like λ-calculus (or LISP). We have adapted the "toy RISC" machine of [Hernández & Hernández 1993] with two remarkable features inherited from its object-oriented coding in C++: it is easily tunable for our needs, and it is efficient. We have made it even more reduced, removing any operand in the instruction set, even for the loop operations. We have only three registers which are AX (the accumulator), BX and CX. The operations Q b we have used for our experiment are in Table 1: LOOPTOP Decrements CX. If it is not equal to the first element jump to the program top.
An Anytime Approach To Connectionist Theory Refinement: Refining The Topologies Of Knowledge-Based Neural Networks
, 1995
Cited by 18 (3 self)
Many scientific and industrial problems can be better understood by learning from samples of the task at hand. For this reason, the machine learning and statistics communities devote considerable research effort to generating inductive-learning algorithms that try to learn the true "concept" of a task from a set of its examples. Often, however, one has additional resources readily available, but largely unused, that can improve the concept that these learning algorithms generate. These resources include available computer cycles, as well as prior knowledge describing what is currently known about the domain. Effective utilization of available computer time is important since for most domains an expert is willing to wait for weeks, or even months, if a learning system can produce an improved concept. Using prior knowledge is important since it can contain information not present in the current set of training examples. In this thesis, I present three "anytime" approaches to connec...
The Neural Network Pushdown Automaton: Model, Stack and Learning Simulations
, 1993
Cited by 16 (2 self)
In order for neural networks to learn complex languages or grammars, they must have sufficient computational power or resources to recognize or generate such languages. Though many approaches to effectively utilizing the computational power of neural networks have been discussed, an obvious one is to couple a recurrent neural network with an external stack memory, in effect creating a neural network pushdown automaton (NNPDA). This NNPDA generalizes the concept of a recurrent network so that the network becomes a more complex computing structure. This paper discusses an NNPDA in detail: its construction, how it can be trained, and how useful symbolic information can be extracted from the trained network. To effectively couple the external stack to the neural network, an optimization method is developed which uses an error function that connects the learning of the state automaton of the neural network to the learning of the operation of the external stack: push, pop, and no-operation. To minimize the error function using gradient descent learning, an analog stack is designed such that the action and storage of information in the stack are continuous. One interpretation of a continuous stack is the probabilistic storage of and action on data. After training on sample strings of an unknown source grammar, a quantization procedure extracts from the analog stack and neural network a discrete pushdown automaton (PDA). Simulations show that in learning deterministic context-free grammars (the balanced parenthesis language, 1^n 0^n, and the deterministic palindrome), the extracted PDA is correct in the sense that it can correctly recognize unseen strings of arbitrary length. In addition, the extracted PDAs can be shown to be identical or equivalent to the PDAs of the source grammars which were used to generate the training strings.
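One way to picture a stack whose "action and storage of information" are continuous is sketched below. This is an illustrative assumption, not the paper's actual design: each symbol is stored with a real-valued thickness, a push adds a slice of the given thickness, a pop removes a given total thickness from the top, and a read returns how much of each symbol lies within a fixed depth of the top. The class name AnalogStack and its methods are hypothetical.

```python
class AnalogStack:
    """A stack whose entries carry a continuous thickness (assumed design)."""

    EPS = 1e-12  # tolerance for treating a thickness as zero

    def __init__(self):
        self.items = []  # [symbol, thickness] pairs, bottom of stack first

    def push(self, symbol, amount):
        """Add a slice of `symbol` with thickness `amount` on top."""
        if amount > self.EPS:
            self.items.append([symbol, amount])

    def pop(self, amount):
        """Remove a total thickness of `amount` from the top downward."""
        while amount > self.EPS and self.items:
            top = self.items[-1]
            take = min(top[1], amount)
            top[1] -= take
            amount -= take
            if top[1] <= self.EPS:
                self.items.pop()

    def read(self, depth=1.0):
        """Return {symbol: thickness} found within `depth` of the top."""
        seen = {}
        remaining = depth
        for symbol, thickness in reversed(self.items):
            if remaining <= self.EPS:
                break
            take = min(thickness, remaining)
            seen[symbol] = seen.get(symbol, 0.0) + take
            remaining -= take
        return seen

    def act(self, action, symbol):
        """Signed action: positive pushes, negative pops, zero is a no-op."""
        if action > 0:
            self.push(symbol, action)
        elif action < 0:
            self.pop(-action)
```

Because the pushed and popped amounts are real numbers, the stack reading changes smoothly with the action value, which is the property needed for gradient-based training. Rounding the action to -1, 0 or +1 recovers an ordinary discrete stack.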
Pruning Recurrent Neural Networks for Improved Generalization Performance
, 1994
Cited by 15 (6 self)
Determining the architecture of a neural network is an important issue for any learning task. For recurrent neural networks no general methods exist that permit the estimation of the number of layers of hidden neurons, the size of layers or the number of weights. We present a simple pruning heuristic which significantly improves the generalization performance of trained recurrent networks. We illustrate this heuristic by training a fully recurrent neural network on positive and negative strings of a regular grammar. We also show that if rules are extracted from networks trained to recognize these strings, the rules extracted after pruning are more consistent with the rules to be learned. This performance improvement is obtained by pruning and retraining the networks. Simulations are shown for training and pruning a recurrent neural net on strings generated by two regular grammars, a randomly-generated 10-state grammar and an 8-state triple parity grammar. Further simulations indicate ...
A Framework for Programming Embedded Systems: Initial Design and Results
, 1998
Cited by 14 (2 self)
This paper describes CES, a prototype of a new programming language for robots and other embedded systems equipped with sensors and actuators. CES contains two new ideas, currently not found in other programming languages: support of computing with uncertain information, and support of adaptation and teaching as a means of programming. These innovations facilitate the rapid development of software for embedded systems, as demonstrated by a mobile robot application.
Knowledge Extraction from Transducer Neural Networks
- Journal of Applied Intelligence
, 2000
Cited by 13 (5 self)
Previously neural networks have shown interesting performance results for tasks such as classification, but they still suffer from an insufficient focus on the structure of the knowledge represented therein. In this paper, we analyze various knowledge extraction techniques in detail and we develop new transducer extraction techniques for the interpretation of recurrent neural network learning. First, we provide an overview of different possibilities to express structured knowledge using neural networks. Then, we analyze a type of recurrent network rigorously, applying a broad range of different techniques. We argue that analysis techniques, such as weight analysis using Hinton diagrams, hierarchical cluster analysis, and principal component analysis may be useful for providing certain views on the underlying knowledge. However, we demonstrate that these techniques are too static and too low-level for interpreting recurrent network classifications. The contribution of this paper is a particularly broad analysis of knowledge extraction techniques. Furthermore, we propose dynamic learning analysis and transducer extraction as two new dynamic interpretation techniques. Dynamic learning analysis provides a better understanding of how the network learns, while transducer extraction provides a better understanding of what the network represents.

