## Rule Extraction from Recurrent Neural Networks: a Taxonomy and Review (2005)

### Download Links

- [www.ida.his.se]
- [web.cecs.pdx.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Neural Computation

Citations: 24 (3 self)

### BibTeX

```bibtex
@ARTICLE{Jacobsson05ruleextraction,
  author  = {Henrik Jacobsson},
  title   = {Rule Extraction from Recurrent Neural Networks: a Taxonomy and Review},
  journal = {Neural Computation},
  year    = {2005},
  volume  = {17},
  pages   = {1223--1263}
}
```

### Abstract

In this paper, the progress of this development is reviewed and analysed in detail. In order to structure the survey and to evaluate the techniques, a taxonomy specifically designed for this purpose has been developed. Moreover, important open research issues are identified that, if addressed properly, could give the field a significant push forward.

### Citations

3836 | Introduction to automata theory, languages, and computation - Hopcroft, Motwani, et al.

Citation Context: ...e actually two different models for how an FSM can be described; Mealy (as above) or Moore machines that, although they are quite different from each other, are computationally equivalent (Hopcroft & Ullman 1979). Moore machines generate outputs based only on the current state and Mealy machines on the transitions between states, i.e. the output function, γ, is for a Moore machine γ : Q → Y and for a Mealy m...
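The Moore/Mealy distinction quoted in this context can be made concrete with a minimal sketch. The two machines below (states `A`/`B`, inputs `a`/`b`, outputs `c`/`d`) are invented for illustration and are not the machines from the paper's Figure 2:

```python
# Moore: output depends on the current state only (gamma: Q -> Y).
# Mealy: output depends on the transition taken (gamma: Q x Sigma -> Y).

MOORE = {
    "delta": {("A", "a"): "B", ("A", "b"): "A",
              ("B", "a"): "A", ("B", "b"): "B"},
    "gamma": {"A": "c", "B": "d"},                # one output per state
}

MEALY = {
    "delta": {("A", "a"): "B", ("A", "b"): "A",
              ("B", "a"): "A", ("B", "b"): "B"},
    "gamma": {("A", "a"): "c", ("A", "b"): "c",   # one output per transition
              ("B", "a"): "d", ("B", "b"): "c"},
}

def run_moore(m, state, inputs):
    out = []
    for x in inputs:
        state = m["delta"][(state, x)]
        out.append(m["gamma"][state])    # output read from the new state
    return "".join(out)

def run_mealy(m, state, inputs):
    out = []
    for x in inputs:
        out.append(m["gamma"][(state, x)])  # output read from the edge taken
        state = m["delta"][(state, x)]
    return "".join(out)
```

The two machine types share the same transition structure here but emit different output strings for the same input; as the context notes, the formalisms are nonetheless computationally equivalent.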

3242 | The self-organizing map - Kohonen - 1990

Citation Context: ...sistent transitions was introduced. This algorithm could, however, fail under certain circumstances so that the extraction of a DFA could not be guaranteed. A star topology self-organising map (SOM, Kohonen 1995) was used to quantise the state space. Tiňo & Šajda (1995) were the first to extract Mealy instead of Moore machines and also the first who did not confine the output to binary accept/reject decis...

1902 | The Structure of Scientific Revolutions - Kuhn - 1962

Citation Context: ...n stem from more sophisticated analysis tools and measuring devices producing qualitatively new data conflicting with existing models (anomalies) that eventually may result in scientific revolutions (Kuhn 1962). Today we have deep, but partially conflicting, theories of what the RNNs will be able to do in practice (i.e. the Turing machine equivalence vs. the difficulty to acquire correct behaviour through ...

1542 | Finding structure in time - Elman - 1990

Citation Context: ...far from being models of the nervous system. In the early nineties, the research on recurrent neural networks was revived. When Elman introduced his, quite well known, simple recurrent network (SRN) (Elman 1990), the connection between finite state machines and neural networks was again there from the start. In his paper, the internal activations of the networks were compared to the states of a finite sta...

1300 | Data clustering: A review - Jain, Murty, et al. - 1999

716 | A logical calculus of the ideas immanent in nervous activity - McCulloch, Pitts - 1943

360 | Probabilistic logics and the synthesis of reliable organisms from unreliable components - Neumann - 1956

253 | Learning long-term dependencies with gradient descent is difficult - Bengio, Simard, et al. - 1994

244 | Long short-term memory - Hochreiter, Schmidhuber - 1997

234 | A survey and critique of techniques for extracting rules from trained artificial neural networks - Andrews, Diederich, et al. - 1995

196 | Extracting refined rules from knowledge-based neural networks - Towell, Shavlik - 1993

172 | Learning and extracting finite state automata with second-order recurrent neural networks - Giles, Miller, et al. - 1992

170 | Probabilistic Automata - Rabin - 1963

160 | Introduction to probabilistic automata - Paz - 1971

Citation Context: ...inistic Mealy machine, see Figure 2 for examples. Moreover, the machines can be stochastic as well if transition probabilities are also encoded in the machine (cf. stochastic sequential machines (Paz 1971) and probabilistic automata (Rabin 1963)). [Figure 2: Examples of (non-equivalent) different finite state machine types w...]
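The stochastic variant mentioned in this context can be sketched directly: each (state, input) pair maps to a probability distribution over (next state, output) pairs. The states, symbols, and probabilities below are invented for illustration:

```python
import random

# A stochastic Mealy-style machine.  Each (state, input) key maps to a
# list of ((next_state, output), probability) pairs summing to 1.
STOCHASTIC = {
    ("A", "a"): [(("B", "c"), 0.9), (("A", "d"), 0.1)],
    ("A", "b"): [(("A", "c"), 1.0)],
    ("B", "a"): [(("A", "d"), 1.0)],
    ("B", "b"): [(("B", "c"), 0.5), (("A", "d"), 0.5)],
}

def step(machine, state, symbol, rng=random):
    # Sample one transition according to the encoded probabilities.
    choices = machine[(state, symbol)]
    r, acc = rng.random(), 0.0
    for (next_state, output), p in choices:
        acc += p
        if r < acc:
            return next_state, output
    return choices[-1][0]  # guard against floating-point round-off
```

Setting every distribution to a single pair with probability 1.0 recovers the deterministic Mealy machine as a special case.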

158 | On the computational power of neural nets - Siegelmann, Sontag - 1995

149 | Finite state automata and simple recurrent networks - Cleeremans, Servan-Schreiber, et al. - 1989

Citation Context: ...ave been used on RNNs is: Hinton diagrams (e.g. Hinton 1990, Niklasson & Bodén 1997), Hierarchical Cluster Analysis (e.g. Cleeremans, McClelland & Servan-Schreiber 1989, Elman 1990, Servan-Schreiber, Cleeremans & McClelland 1989, Sharkey & Jackson 1995, Bullinaria 1997), simple state space plots (e.g. Giles & Omlin 1993, Zeng, Goodman & Smyth 1993, Gori, Maggini & Soda 1994, Niklasson & Bodén 1997, Tonkes, Blair & Wiles 1998...

129 | Mathematical Classification and Clustering - Mirkin - 1996

Citation Context: ...ring (Cechin et al. 2003). That makes eight different techniques, not counting small variations in implementations. Although just a fraction of existing clustering techniques have at all been tested (Mirkin 1996, Jain, Murty & Flynn 1999), it is clear that a multitude of existing clustering techniques has been used to solve the quantisation problem. But what is most striking about this multitude of various te...

117 | Mapping part-whole hierarchies into connectionist networks - Hinton - 1990

Citation Context: ...nd a survey on this issue should definitely be written as well. A brief (and most probably inconclusive) list of examples of other analysis tools that have been used on RNNs is: Hinton diagrams (e.g. Hinton 1990, Niklasson & Bodén 1997), Hierarchical Cluster Analysis (e.g. Cleeremans, McClelland & Servan-Schreiber 1989, Elman 1990, Servan-Schreiber, Cleeremans & McClelland 1989, Sharkey & Jackson 1995, Bulli...

103 | The dynamics of discrete-time computation, with application to recurrent neural networks and finite state machine extraction - Casey - 1996

Citation Context: ... and traditional (discrete) computational devices (Crutchfield & Young 1990, Servan-Schreiber, Cleeremans & McClelland 1991, Crutchfield 1994, Kolen 1994, Horne & Hush 1994, Siegelmann & Sontag 1995, Casey 1996, Tiňo, Horne, Giles & Collingwood 1998, Blank & 24 co-authors 1999, Omlin & Giles 2000, Sima & Orponen 2003, Hammer & Tiňo 2003, Tiňo & Hammer 2003). These papers cover a wide spectrum of highly inte...

90 | Graded State Machines: The representation of temporal contingencies in simple recurrent networks - Servan-Schreiber, Cleeremans, et al. - 1991

87 | Extracting tree-structured representations of trained networks - Craven, Shavlik - 1996

87 | Back-propagation Is Sensitive to Initial Conditions - Kolen, Pollack

84 | Computation at the onset of chaos - Crutchfield, Young

74 | The truth will come to light: Directions and challenges in extracting the knowledge embedded within trained artificial neural networks - Tickle, Andrews, et al. - 1998

Citation Context: ... been demonstrated to work on the state-of-the-art RNNs operating on the most challenging domains. 7.5 Relation to the ADT taxonomy. Although the ADT taxonomy (Andrews et al. 1995, Tickle et al. 1997, Tickle et al. 1998) has not been used explicitly to classify the techniques in the taxonomy of this paper, it can be highly useful as a basis for discussion of the techniques. Some important points will be evident when...

73 | Using Sampling and Queries to Extract Rules from Trained Neural Networks - Craven, Shavlik - 1994

70 | Constructing deterministic finite-state automata in recurrent neural networks - Omlin, Giles - 1996

67 | Dynamic cell structure learns perfectly topology preserving map - Bruske, Sommer - 1995

61 | Extraction of rules from discrete-time recurrent neural networks - Omlin, Giles - 1996

58 | Adaptive nonlinear system identification with Echo State Networks - Jaeger - 2003

Citation Context: ...the domain of FSM generated languages (Forcada & Carrasco 2001), 2. have a clearly defined input, state and output, i.e. less or randomly structured RNNs may be problematic (e.g. echo state networks (Jaeger 2003)), 3. have a fully observable state, otherwise unobserved state nodes or noise in the observation process would disturb the extraction process since the state space would not be reliably quantised, 4...

58 | Encoding sequential structures in simple recurrent networks - Servan-Schreiber, Cleeremans, et al. - 1988

58 | Learning complex, extended sequences using the principle of history compression - Schmidhuber - 1992

56 | LSTM recurrent networks learn simple context-free and context-sensitive languages - Gers, Schmidhuber - 2001

49 | An incremental approach to developing intelligent neural network controllers for robots - Meeden - 1996

Citation Context: ... 1998, Tonkes & Wiles 1999, Rodriguez, Wiles & Elman 1999, Rodriguez 1999, Tabor & Tanenhaus 1999, Linåker & Jacobsson 2001), activation values plotted over time (e.g. Husbands, Harvey & Cliff 1995, Meeden 1996, Ziemke & Thieme 2002), iterated maps (e.g. Wiles & Elman 1995), vector flow fields (e.g. Rodriguez et al. 1999, Rodriguez 1999), external behaviour analysis of RNN-controlled autonomous robotic cont...

47 | Noisy Time Series Prediction Using Recurrent Neural Networks - Giles, Lawrence, et al. - 2001

45 | Natural language grammatical inference with recurrent neural networks - Lawrence, Giles, et al.

45 | Learning finite state machines with self-clustering recurrent networks - Zeng, Goodman, et al. - 1993

39 | Circle in the round: state space attractors for evolved sighted robots (Robotics and Autonomous Systems, forthcoming) - Husbands, Harvey, et al.

39 | Learning to count without a counter: A case study of dynamics and activation landscapes in recurrent networks - Wiles, Elman - 1995

Citation Context: ...odriguez, Wiles & Elman 1999, Rodriguez 1999, Tabor & Tanenhaus 1999), activation values plotted over time (e.g. Husbands, Harvey & Cliff 1995, Meeden 1996, Ziemke & Thieme 2002), iterated maps (e.g. Wiles & Elman 1995), vector flow fields (e.g. Rodriguez et al. 1999, Rodriguez 1999), external behaviour analysis of RNN-controlled autonomous robotic controllers (e.g. Husbands et al. 1995, Meeden 1996), weight space ...

38 | The calculi of emergence - Crutchfield - 1994

Citation Context: ...lishing the connection between (analogue) RNNs (or other dynamical systems) and traditional (discrete) computational devices (Crutchfield & Young 1990, Servan-Schreiber, Cleeremans & McClelland 1991, Crutchfield 1994, Kolen 1994, Horne & Hush 1994, Siegelmann & Sontag 1995, Casey 1996, Tiňo, Horne, Giles & Collingwood 1998, Blank & 24 co-authors 1999, Omlin & Giles 2000, Sima & Orponen 2003, Hammer & Tiňo 2003, T...

38 | Dynamic Construction Of Finite-State Automata From Examples Using Hill-Climbing - Tomita - 1982

Citation Context: ...inary (accept/reject) decision. Quantisation: A simple vector quantiser, details unclear. State generation: Sampling on a test set. Network(s): First-order RNNs. Domain(s): Regular binary languages (Tomita 1982). Table 6: The sampling-based DFA extractor proposed originally in Fanelli (1993). DFM extraction, SOM, sampling on domain (Tiňo & Šajda 1995). Rule type: Mealy DFM with multiple output symbols. Qua...

37 | Extracting and learning an unknown grammar with recurrent neural networks - Giles, Miller, et al. - 1992

36 | Representation of finite state automata in recurrent radial basis function networks - Frasconi, Gori, et al. - 1996

Citation Context: ...& Soda (1998), and a similar SOM-based approach in Blanco et al. (2000). A summary of these approaches is given in Table 3. DFA extraction, vector quantiser, breadth-first search (Zeng et al. 1993, Frasconi et al. 1996, Gori et al. 1998). Rule type: Moore DFA with binary (accept/reject) output. Quantisation: k-means. State generation: Breadth-first search. Network(s): Second-order RNNs (Zeng et al. 1993), Recurrent ...
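The recipe summarised in this entry (quantise the continuous RNN state space with k-means, then generate macrostates and transitions by breadth-first search) can be sketched roughly as follows. `rnn_step`, `CENTROIDS`, and the alphabet are all invented stand-ins for a trained network and its fitted k-means codebook, not anything from the cited papers:

```python
from collections import deque

# Toy stand-in for a trained RNN's continuous state update.
def rnn_step(state, symbol):
    x = 1.0 if symbol == "a" else -1.0
    return ((state[0] * 0.5 + x) % 2.0, (state[1] + x) % 2.0)

# Pretend these centroids came out of k-means over observed RNN states.
CENTROIDS = [(0.0, 0.0), (0.0, 1.5), (1.5, 0.0), (1.5, 1.5)]

def quantise(state):
    # Map a continuous state to the index of its nearest centroid.
    return min(range(len(CENTROIDS)),
               key=lambda i: sum((s - c) ** 2
                                 for s, c in zip(state, CENTROIDS[i])))

def extract_dfa(initial_state, alphabet=("a", "b")):
    # Breadth-first search over macrostates: each macrostate is
    # represented by the first continuous state that entered it, and
    # the search is pruned on reentering an already-visited macrostate.
    start = quantise(initial_state)
    representative = {start: initial_state}
    transitions = {}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for sym in alphabet:
            nxt_cont = rnn_step(representative[q], sym)
            nxt = quantise(nxt_cont)
            transitions[(q, sym)] = nxt
            if nxt not in representative:   # unseen macrostate: explore it
                representative[nxt] = nxt_cont
                queue.append(nxt)
    return start, transitions
```

The pruning step is what guarantees termination and a deterministic result: since at most one continuous representative is kept per macrostate, the extracted transition table is a function of (macrostate, symbol).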

36 | An experimental comparison of recurrent neural networks - Horne, Giles - 1995

36 | Bounds on the complexity of recurrent neural network implementations of finite state machines - Horne, Hush - 1996

35 | Training second-order recurrent neural networks using hints - Omlin, Giles - 1992

34 | Extraction, insertion and refinement of symbolic rules in dynamically driven recurrent neural networks - Giles, Omlin - 1993

Citation Context: ...d the intrinsic learning problem for a specific domain. Such studies are conducted in virtually all papers applying RNNs on a domain, and in some cases more systematic studies are presented (Miller & Giles 1993, Horne & Giles 1995, Alquézar, Sanfeliu & Sainz 1997). But even something as simple as evaluating the performance of an RNN on a specific domain has some intrinsic problems since implicit aspects of ...

33 | Dynamical models of sentence processing - Tabor, Tanenhaus - 1999

33 | Markovian architectural bias of recurrent neural networks - Tiňo, Čerňanský, et al. - 2004

Citation Context: ...chfield 1994, Kolen 1994, Horne & Hush 1994, Siegelmann & Sontag 1995, Casey 1996, Tiňo, Horne, Giles & Collingwood 1998, Blank & 24 co-authors 1999, Omlin & Giles 2000, Sima & Orponen 2003, Hammer & Tiňo 2003, Tiňo & Hammer 2003). These papers cover a wide spectrum of highly interesting and important theoretical insights, but in this paper we will not dwell on these theoretical issues. First of all becaus...

32 | Analysis of dynamical recognizers - Blair, Pollack - 1997

32 | A unified gradient-descent/clustering architecture for finite state machine induction - Das, Mozer - 1994

Citation Context: ...crostate, see Figures 3 and 4. The first encountered microstate of each macrostate was then used to induce new states. This guaranteed the extraction of a deterministic machine since any state drift (Das & Mozer 1994, Das & Mozer 1998) was avoided as the search was pruned when reentering already visited partitions. The extracted automaton was then minimised using a standard minimisation algorithm for DFAs (Hopcro...
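The final minimisation step mentioned in this context is standard textbook material. As a sketch, here is the simpler Moore-style partition refinement (not Hopcroft's O(n log n) variant): start from the accepting/non-accepting split and refine until transitions agree within every block. The example DFA in the test is invented:

```python
def minimise(states, alphabet, delta, accepting):
    # Moore-style partition refinement for a complete, deterministic FA.
    # partition maps each state to its current block label.
    partition = {q: (q in accepting) for q in states}
    while True:
        # A state's signature: its own block plus the blocks its
        # transitions lead to, one per input symbol.
        signature = {q: (partition[q],
                         tuple(partition[delta[(q, a)]] for a in alphabet))
                     for q in states}
        blocks = {}
        for q in states:
            blocks.setdefault(signature[q], []).append(q)
        new_partition = {}
        for i, members in enumerate(blocks.values()):
            for q in members:
                new_partition[q] = i
        # Stop when refinement no longer splits any block.
        if len(set(new_partition.values())) == len(set(partition.values())):
            return new_partition
        partition = new_partition
```

States sharing a block label in the returned partition are equivalent and can be merged into a single state of the minimal DFA.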