## Constructing Deterministic Finite-State Automata in Recurrent Neural Networks (1996)


Venue: Journal of the ACM

Citations: 70 (16 self)

### BibTeX

@ARTICLE{Omlin96constructingdeterministic,
  author  = {Christian W. Omlin and C. Lee Giles},
  title   = {Constructing Deterministic Finite-State Automata in Recurrent Neural Networks},
  journal = {Journal of the ACM},
  year    = {1996},
  volume  = {43},
  pages   = {937--972}
}


### Abstract

Recurrent neural networks that are trained to behave like deterministic finite-state automata (DFAs) can show deteriorating performance when tested on long strings. This deteriorating performance can be attributed to the instability of the internal representation of the learned DFA states. The use of a sigmoidal discriminant function together with the recurrent structure contributes to this instability. We prove that a simple algorithm can construct second-order recurrent neural networks with a sparse interconnection topology and sigmoidal discriminant function such that the internal DFA state representations are stable, i.e., the constructed network correctly classifies strings of arbitrary length. The algorithm is based on encoding strengths of weights directly into the neural network. We derive a relationship between the weight strength and the number of DFA states for robust string classification. For a DFA with n states and m input alphabet symbols, the constructive algorithm genera...
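The construction described in the abstract can be illustrated with a short sketch. This is a hedged reconstruction of the general idea, not the paper's exact algorithm: the weight values +H/−H and the bias −H/2 are plausible choices for a second-order network whose state neurons track one-hot DFA states; the paper itself derives the precise relationship between H and the number of states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_dfa(n_states, n_symbols, delta, H=8.0):
    """Encode DFA transitions into second-order weights W[j, i, k].

    Sketch (assumed values, not verbatim from the paper): for each
    transition delta[(i, k)] = j, weight W[j, i, k] = +H drives the
    target state neuron high, W[i, i, k] = -H drives the source
    neuron low, and a bias of -H/2 keeps all other neurons near zero.
    """
    W = np.zeros((n_states, n_states, n_symbols))
    b = np.full(n_states, -H / 2.0)
    for (i, k), j in delta.items():
        W[j, i, k] = H
        if i != j:
            W[i, i, k] = -H
    return W, b

def run(W, b, start, string):
    """Drive the network with a symbol sequence; the state vector
    stays close to a one-hot encoding of the current DFA state."""
    S = np.zeros(W.shape[0])
    S[start] = 1.0
    for k in string:
        S = sigmoid(W[:, :, k] @ S + b)
    return S

# Illustrative parity DFA over {0, 1}: state 0 = even number of 1s.
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
W, b = encode_dfa(2, 2, delta)
```

With a sufficiently large H the state neurons saturate near 0 and 1, so the one-hot representation remains stable even for long input strings, which is the point of the paper's stability analysis.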

### Citations

3824 | Introduction to Automata Theory, Languages, and Computation, Second Edition - HOPCROFT, MOTWANI, et al.

Citation Context: ...will compare the DFA encoding algorithm with other methods proposed in the literature. 2 FINITE STATE AUTOMATA Regular languages represent the smallest class of formal languages in the Chomsky hierarchy [16]. Regular languages are generated by regular grammars. A regular grammar G is a quadruple G = ⟨S, N, T, P⟩ where S is the start symbol, N and T are non-terminal and terminal symbols, respectively, and...
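The regular-grammar definition in this context can be made concrete with a small recognizer. The grammar below is a hypothetical illustration (not from the paper): a right-linear grammar with productions S → 0S | 1S | 1 generating binary strings that end in 1.

```python
# Hypothetical example grammar G = <S, N, T, P>, right-linear form:
# every production is a terminal prefix followed by at most one
# non-terminal, which is what makes the generated language regular.
P = {
    'S': ['0S', '1S', '1'],  # S -> 0S | 1S | 1
}

def derives(sym, s, P):
    """Check whether non-terminal `sym` derives terminal string `s`
    under the right-linear productions in P."""
    for rhs in P[sym]:
        if rhs[-1].isupper():  # terminal prefix + non-terminal
            pre, nt = rhs[:-1], rhs[-1]
            if s.startswith(pre) and derives(nt, s[len(pre):], P):
                return True
        elif s == rhs:         # all-terminal production ends the derivation
            return True
    return False
```

Because each production consumes at least one terminal before recursing, the search terminates, mirroring the fact that a right-linear grammar corresponds directly to a DFA reading one symbol per transition.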

3603 | Neural Networks: A Comprehensive Foundation - Haykin - 1998

Citation Context: ...n, the foundations necessary for the universal approximation theories of neural networks, the interpretation of neural network outputs as a posteriori probability estimates, etc. For more details see [14]. Stability of an internal DFA state representation implies that the output of the sigmoidal state neurons assigned to DFA states saturate at high gain; a constructed discrete-time network thus has st...

1536 | Finding structure in time - Elman - 1990

Citation Context: ...the potential of greatly increasing the versatility of neural network implementations. 1.2 Background Recurrent neural networks can be trained to behave like deterministic finite-state automata (DFAs) [4, 6, 9, 11, 25, 26, 31]. The dynamical nature of recurrent networks can cause the internal representation of learned DFA states to deteriorate for long strings [32]; therefore, it can be difficult to make predictions about...

668 | Fractals Everywhere - Barnsley - 1988

Citation Context: ...ion h(x; H) = 1/(1 + e^{H(1−2nx)/2}) since this special form of the discriminant will occur throughout the remainder of this paper. First, we define the concept of fixed points of a function [2]: Definition 5.3.1 Let f : X → X be a mapping on a metric space (X, d). A point x_f ∈ X such that f(x_f) = x_f is called a fixed point of the mapping. We are interested in a particular kind of fixed...
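The fixed-point definition in this context can be explored numerically. This is a minimal sketch using illustrative values H = 8 and n = 1 (assumptions, not taken from the paper): iterating h from different starting points converges to the stable low and high fixed points whose existence underpins the stability argument.

```python
import math

def h(x, H, n):
    """Sigmoidal discriminant from the citation context:
    h(x; H) = 1 / (1 + exp(H * (1 - 2*n*x) / 2))."""
    return 1.0 / (1.0 + math.exp(H * (1.0 - 2.0 * n * x) / 2.0))

def fixed_point(x0, H, n, iters=200):
    """Approximate a stable fixed point x_f with h(x_f) = x_f
    by repeated application of h (fixed-point iteration)."""
    x = x0
    for _ in range(iters):
        x = h(x, H, n)
    return x

# Illustrative gain and fan-in (assumed): H = 8, n = 1.
lo = fixed_point(0.1, 8.0, 1)  # converges to the low stable fixed point
hi = fixed_point(0.9, 8.0, 1)  # converges to the high stable fixed point
```

For high enough gain H, the low and high fixed points sit close to 0 and 1, which is why saturated state neurons can represent DFA states stably.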

648 | Analog VLSI and Neural Systems - Mead - 1989

Citation Context: ...for example see the recent special issue on dynamically-driven recurrent neural networks [10]). For enhanced performance, some of these neural network algorithms are mapped directly into VLSI designs [20, 28]. Neural networks readily enhance their performance by having a priori knowledge about the problem to be solved encoded or used in the neural network [8, 27]. This work discusses how a priori finite s...

608 | Neural networks and the bias/variance dilemma - Geman, Bienenstock, et al. - 1992

453 | Computation: Finite and Infinite Machines - Minsky - 1967

210 | The induction of dynamical recognizers - Pollack - 1991

Citation Context: ...the potential of greatly increasing the versatility of neural network implementations. 1.2 Background Recurrent neural networks can be trained to behave like deterministic finite-state automata (DFAs) [4, 6, 9, 11, 25, 26, 31]. The dynamical nature of recurrent networks can cause the internal representation of learned DFA states to deteriorate for long strings [32]; therefore, it can be difficult to make predictions about...

172 | Learning and extracting finite state automata with second-order recurrent neural networks - Giles, Miller, et al. - 1992

Citation Context: ...the potential of greatly increasing the versatility of neural network implementations. 1.2 Background Recurrent neural networks can be trained to behave like deterministic finite-state automata (DFAs) [4, 6, 9, 11, 25, 26, 31]. The dynamical nature of recurrent networks can cause the internal representation of learned DFA states to deteriorate for long strings [32]; therefore, it can be difficult to make predictions about...

103 | The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction - Casey - 1996

Citation Context: ...DFA state representation becomes unstable with increasing string length due to the network's dynamical nature and the sigmoidal discriminant function. This phenomenon has also been observed by others [3, 29, 32]. We encoded a randomly generated, minimized 100-state DFA with alphabet Σ = {0, 1} into a recurrent network with 101 state neurons. The graph in figure 5 shows the generalization performance of...

93 | Neural networks and the bias/variance dilemma (Neural Computation) - Geman, Bienenstock, et al. - 1992

90 | Graded State Machines: The representation of temporal contingencies in simple recurrent networks - Servan-Schreiber, Cleeremans, et al. - 1991

82 | Induction of finite-state languages using second-order recurrent networks - Watrous, Kuhn - 1992

54 | A Framework for Combining Symbolic and Neural Learning - Shavlik - 1994

Citation Context: ...hms are mapped directly into VLSI designs [20, 28]. Neural networks readily enhance their performance by having a priori knowledge about the problem to be solved encoded or used in the neural network [8, 27]. This work discusses how a priori finite state automata rules can be encoded into a recurrent neural network with sigmoid activation neurons in such a way that arbitrarily long string sequences are alw...

45 | Learning finite state machines with self-clustering recurrent networks - Zeng, Goodman, et al. - 1993

Citation Context: ...ministic finite-state automata (DFAs) [4, 6, 9, 11, 25, 26, 31]. The dynamical nature of recurrent networks can cause the internal representation of learned DFA states to deteriorate for long strings [32]; therefore, it can be difficult to make predictions about the generalization performance of trained recurrent networks. Recently, we have developed a simple method for encoding partial DFAs (state tr...

39 | Refinement of approximately correct domain theories by knowledge-based neural networks - Shavlik, Noordewier, et al. - 1990

Citation Context: ...e training, programming as few weights as possible is desirable because it leaves the network with many unbiased adaptable weights. This is important when a network is used for domain theory revision [19, 27, 30], where the prior knowledge is not only incomplete, but may also be incorrect [13, 22]. Methods for constructing DFAs in recurrent networks where neurons have hard-limiting discriminant functions have...

37 | Efficient simulation of finite automata by neural nets - Alon, Dewdney, et al. - 1991

Citation Context: ...or knowledge is not only incomplete, but may also be incorrect [13, 22]. Methods for constructing DFAs in recurrent networks where neurons have hard-limiting discriminant functions have been proposed [1, 18, 21]. This paper is concerned with neural network implementations of DFAs where continuous sigmoidal discriminant functions are used. Stability of an internal DFA state representation implies that the out...

36 | Representation of finite state automata in recurrent radial basis function networks - Frasconi, Gori, et al. - 1996

Citation Context: ...ucted network and the given DFA are not identical for an arbitrary distribution of the randomly initialized weights in the interval [−W, W]. 5.11 Comparison with other Methods Different methods [1, 7, 5, 18, 21] for encoding DFAs with n states and m input symbols in recurrent networks are summarized in table 1. The methods differ in the choice of the discriminant function (hard-limiting, sigmoidal, radial ba...

36 | Bounds on the complexity of recurrent neural network implementations of finite state machines - Horne, Hush - 1996

Citation Context: ...or knowledge is not only incomplete, but may also be incorrect [13, 22]. Methods for constructing DFAs in recurrent networks where neurons have hard-limiting discriminant functions have been proposed [1, 18, 21]. This paper is concerned with neural network implementations of DFAs where continuous sigmoidal discriminant functions are used. Stability of an internal DFA state representation implies that the out...

35 | Training second-order recurrent neural networks using hints - Omlin, Giles - 1992

Citation Context: ...redictions about the generalization performance of trained recurrent networks. Recently, we have developed a simple method for encoding partial DFAs (state transitions) into recurrent neural networks [12, 24]. The goal was to demonstrate that prior knowledge can decrease the learning time significantly compared to learning without any prior knowledge. The training time improvement was 'proportional' to th...

34 | Using knowledge-based neural networks to improve algorithms: Refining the Chou-Fasman algorithm for protein folding - Maclin, Shavlik - 1993

Citation Context: ...e training, programming as few weights as possible is desirable because it leaves the network with many unbiased adaptable weights. This is important when a network is used for domain theory revision [19, 27, 30], where the prior knowledge is not only incomplete, but may also be incorrect [13, 22]. Methods for constructing DFAs in recurrent networks where neurons have hard-limiting discriminant functions have...

33 | Unified integration of explicit rules and learning by example in recurrent networks - Frasconi, Gori, et al. - 1995

Citation Context: ...re continuous sigmoidal discriminant functions are used. Our method is an alternative to an algorithm for constructing DFAs in recurrent networks with first-order weights proposed by Frasconi et al. [4, 5, 6]. A short introduction to finite-state automata will be followed by a review of the method by Frasconi et al. We will prove that our method can implement any deterministic finite-state automaton in se...

24 | A unified approach for integrating explicit knowledge and learning by example in recurrent networks - Frasconi, Gori, et al. - 1991

23 | Inserting rules into recurrent neural networks - Giles, Omlin - 1992

Citation Context: ...redictions about the generalization performance of trained recurrent networks. Recently, we have developed a simple method for encoding partial DFAs (state transitions) into recurrent neural networks [12, 24]. The goal was to demonstrate that prior knowledge can decrease the learning time significantly compared to learning without any prior knowledge. The training time improvement was 'proportional' to th...

21 | Stable encoding of large finite-state automata in recurrent neural networks with sigmoid discriminants - Omlin, Giles - 1996

Citation Context: ...stable for values of the weight strength H which are considerably smaller than the value predicted by the theory. The question of how H scales with network size is important. The empirical results in [23] indicate that H ≈ 6 for randomly generated DFAs independent of the size of the DFA. However, for DFAs where there exists one or several states q_i with a large number of states q_j for which δ(q_j,...

19 | Rule revision with recurrent neural networks - Omlin, Giles - 1996

Citation Context: ...ork with many unbiased adaptable weights. This is important when a network is used for domain theory revision [19, 27, 30], where the prior knowledge is not only incomplete, but may also be incorrect [13, 22]. Methods for constructing DFAs in recurrent networks where neurons have hard-limiting discriminant functions have been proposed [1, 18, 21]. This paper is concerned with neural network implementation...

17 | Dynamic recurrent neural networks: theory and applications - Giles, Kuhn, et al. - 1994

Citation Context: ...speech processing, plant control, adaptive signal processing, time series prediction, engine diagnostics etc. (for example see the recent special issue on dynamically-driven recurrent neural networks [10]). For enhanced performance, some of these neural network algorithms are mapped directly into VLSI designs [20, 28]. Neural networks readily enhance their performance by having a priori knowledge abou...

14 | Second-Order Recurrent Neural Networks for Grammatical Inference - Giles, Chen, et al.

12 | Refining algorithms with knowledge-based neural networks: Improving the Chou-Fasman algorithm for protein folding - Maclin, Shavlik - 1992

Citation Context: ...e training, programming as few weights as possible is desirable because it leaves the network with many unbiased adaptable weights. This is important when a network is used for domain theory revision [15, 21], where the prior knowledge is not only incomplete, but may also be incorrect [10, 17]. Methods for constructing DFA's in recurrent networks where neurons have hard-limiting discriminant functions hav...

10 | Rule refinement with recurrent neural networks - Giles, Omlin - 1993

Citation Context: ...ork with many unbiased adaptable weights. This is important when a network is used for domain theory revision [19, 27, 30], where the prior knowledge is not only incomplete, but may also be incorrect [13, 22]. Methods for constructing DFAs in recurrent networks where neurons have hard-limiting discriminant functions have been proposed [1, 18, 21]. This paper is concerned with neural network implementation...

10 | Neural Information Processing and VLSI - Sheu - 1995

Citation Context: ...for example see the recent special issue on dynamically-driven recurrent neural networks [10]). For enhanced performance, some of these neural network algorithms are mapped directly into VLSI designs [20, 28]. Neural networks readily enhance their performance by having a priori knowledge about the problem to be solved encoded or used in the neural network [8, 27]. This work discusses how a priori finite s...

9 | Injecting nondeterministic finite state automata into recurrent neural networks - Frasconi, Gori, et al. - 1992

Citation Context: ...t for a special case of discrete-time recurrent networks. Our method is an alternative to an algorithm for constructing DFAs in recurrent networks with first-order weights proposed by Frasconi et al. [6, 7]. A short introduction to finite-state automata will be followed by a review of the method by Frasconi et al. We will prove that our method can implement any deterministic finite-state automaton in se...

8 | Fixed points in two-neuron discrete time recurrent networks: Stability and bifurcation considerations - Tino, Horne, et al. - 1995

Citation Context: ...DFA state representation becomes unstable with increasing string length due to the network's dynamical nature and the sigmoidal discriminant function. This phenomenon has also been observed by others [3, 29, 32]. We encoded a randomly generated, minimized 100-state DFA with alphabet Σ = {0, 1} into a recurrent network with 101 state neurons. The graph in figure 5 shows the generalization performance of...

8 | Combining symbolic and neural - Shavlik - 1994

7 | at high gain in discrete time recurrent networks - Hirsch - 1994

Citation Context: ...own stability result asserts that for a broad class of discrete-time networks where all output neurons are either self-inhibiting or self-exciting, outputs at stable fixed points saturate at high gain [15]. Our proof of stability of an internal DFA state representation establishes such a result for a special case of discrete-time recurrent networks. Our method is an alternative to an algorithm for cons...

7 | Special issue on dynamic recurrent neural networks - Giles, Kuhn, et al. - 1994

5 | Computation: Finite and Infinite Machines, ch - Minsky - 1967

Citation Context: ...or knowledge is not only incomplete, but may also be incorrect [13, 22]. Methods for constructing DFAs in recurrent networks where neurons have hard-limiting discriminant functions have been proposed [1, 18, 21]. This paper is concerned with neural network implementations of DFAs where continuous sigmoidal discriminant functions are used. Stability of an internal DFA state representation implies that the out...

4 | Convergent activation dynamics in continuous-time neural networks - Hirsch - 1989

Citation Context: ...eurons assigned to DFA states saturate at high gain; a constructed discrete-time network thus has stable periodic orbits. A saturation result has previously been proven for continuous-time networks [14]; for sufficiently high gain, the output along a stable limit cycle is saturated almost all the time. There is no known analog of this for stable periodic orbits of discrete-time networks. The only kn...

2 | Rule checking with recurrent neural networks - Omlin, Giles - 1993

Citation Context: ...network with many unbiased adaptable weights. This is important when a network is used for domain theory revision [15, 21], where the prior knowledge is not only incomplete, but may also be incorrect [10, 17]. Methods for constructing DFA's in recurrent networks where neurons have hard-limiting discriminant functions have been proposed [1, 14, 16]. This paper is concerned with neural network implementatio...

2 | Recurrent neural networks and prior knowledge for sequence processing: A constrained nondeterministic approach - Frasconi, Gori, Soda - 1995

Citation Context: ...t for a special case of discrete-time recurrent networks. Our method is an alternative to an algorithm for constructing DFAs in recurrent networks with first-order weights proposed by Frasconi et al. [7, 5]. A short introduction to finite-state automata will be followed by a review of the method by Frasconi et al. We will prove that our method can implement any deterministic finite-state automaton in se...

1 | Finite-State Automata in Neural Networks - GILES, OMLIN - 1993