## GENES IV: A Bit-Serial Processing Element for a Multi-Model Neural-Network Accelerator (1995)

Venue: Journal of VLSI Signal Processing

Citations: 10 (7 self)

### BibTeX

```bibtex
@ARTICLE{Ienne95genesiv,
  author  = {Paolo Ienne and Marc A. Viredaz},
  title   = {GENES IV: A Bit-Serial Processing Element for a Multi-Model Neural-Network Accelerator},
  journal = {Journal of VLSI Signal Processing},
  year    = {1995},
  volume  = {9},
  pages   = {345--356}
}
```

### Abstract

Reprinted from Luigi Dadda and Benjamin Wah, editors, Proceedings of the International Conference on Application-Specific Array Processors, pages 345--356, Venice, Italy, October 1993. Euromicro, IEEE, IEEE Computer Society Press. Copyright © 1993 by IEEE. A systolic array of dedicated processing elements (PEs) is presented as the heart of a multi-model neural-network accelerator. The instruction set of the PEs allows the implementation of several widely used neural models, including multi-layer Perceptrons with the backpropagation learning rule and Kohonen feature maps. Each PE holds one element of the synaptic weight matrix. An instantaneous swapping mechanism for the weight matrix allows the implementation of neural networks larger than the physical PE array. A systolically flowing instruction accompanies each input vector propagating through the array. This avoids the need to empty and refill the array when its operating mode changes. Both the GENES IV chip, cont...
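As a rough illustration of the dataflow described in the abstract, a weight-stationary matrix-vector product can be simulated sequentially. This is an illustrative sketch only: the function name and structure are assumptions, and the real GENES IV PEs operate bit-serially and concurrently rather than in this nested loop.

```python
def systolic_mv(W, x):
    """Toy simulation of a weight-stationary PE array: PE (i, j) holds W[i][j];
    input element x[j] flows past column j, and each PE adds W[i][j] * x[j]
    into the partial sum accumulating along row i."""
    n_rows, n_cols = len(W), len(W[0])
    p = [0] * n_rows                 # one partial sum per row of PEs
    for j in range(n_cols):          # x[j] enters column j
        for i in range(n_rows):      # every PE in that column fires
            p[i] += W[i][j] * x[j]
    return p
```

The point of the weight-stationary arrangement is that only the input vector and the partial sums move; the (potentially swapped) weight matrix stays resident in the PEs.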

### Citations

1930 | Introduction to the Theory of Neural Computation
- Hertz, Krogh, et al.
- 1991
Citation Context: ...works algorithms This section briefly introduces the algorithms addressed by the architecture presented in this paper. A more detailed description can be found in a classic introductory book, such as [6]. The typical artificial neuron model represents a device with n inputs and a single output. The output y_i of the i-th neuron of the network is computed as: y_i = σ(p_i) = σ(∑_{j=1}^{n} W_{i,j} · ...
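The neuron model in the excerpt above can be sketched directly. The excerpt leaves σ generic; taking it to be the logistic sigmoid is an assumption here, as is the function name.

```python
import math

def neuron_output(weights, x):
    """Output y_i = sigma(p_i) of one neuron, with p_i = sum_j W[i][j] * x[j].

    sigma is assumed to be the logistic sigmoid 1 / (1 + exp(-p)); the paper's
    sigma is an unspecified activation function.
    """
    p = sum(w * xj for w, xj in zip(weights, x))   # weighted sum p_i
    return 1.0 / (1.0 + math.exp(-p))              # squashing non-linearity
```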

48 | The future of time series: learning and understanding
- Gershenfeld, Weigend
- 1994
Citation Context: ...tions to real-life problems tend to become competitive with well-established traditional techniques. Recent experiences (such as the Santa Fe Institute's competition on Time Series Prediction and Analysis [5]) show the maturity of some NN algorithms and the importance of their intrinsic non-linearity contrasted to classical linear approaches. Unfortunately, real applications tend to require large networks...

25 | Experimental Determination of Precision Requirements for BackPropagation Training
- Asanović, Morgan
- 1991
Citation Context: ...lute minimum boundaries that, by themselves, cannot guarantee convergence [13]. In the absence of strong analytical grounds, simulation usually provides the designer with the required information. In [2] and [7] simulations are performed to determine the needs of typical backpropagation applications. These have been completed by simulations of the Kohonen model in the application that prompted the de...

17 | Bit-Serial Multipliers and Squarers
- Ienne, Viredaz
- 1994
Citation Context: ...cycles in operations like euclidean, where the accumulation is performed on the longest data word of the system. An improved multiplication scheme was therefore developed for two's complement numbers [8]. A 17-bit multiplier has been implemented in the PE. Apart from presenting only a combinational delay, this design has the additional advantage of indefinitely sign-extending the output as long as th...
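The sign-extension property mentioned in the excerpt, namely that an LSB-first two's-complement bit stream can be extended indefinitely by repeating its sign bit without changing its value, can be checked with a small sketch. The helper names are mine; this models only the number representation, not the multiplier hardware.

```python
def to_bits_lsb_first(value, width):
    """Two's-complement bits of `value` in a `width`-bit word, LSB first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits_lsb_first(bits, extend_to):
    """Rebuild the integer after sign-extending the stream to `extend_to` bits
    by repeating the last (sign) bit, as a bit-serial datapath would."""
    bits = bits + [bits[-1]] * (extend_to - len(bits))
    value = sum(b << i for i, b in enumerate(bits[:-1]))
    return value - (bits[-1] << (len(bits) - 1))   # MSB carries weight -2^(n-1)
```

Because repeating the sign bit leaves the value unchanged, a downstream accumulator can keep clocking in bits for as long as its own word length requires.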

16 | Back propagation simulations using limited precision calculations
- Holt, Baker
- 1991
Citation Context: ...imum boundaries that, by themselves, cannot guarantee convergence [13]. In the absence of strong analytical grounds, simulation usually provides the designer with the required information. In [2] and [7] simulations are performed to determine the needs of typical backpropagation applications. These have been completed by simulations of the Kohonen model in the application that prompted the developmen...

14 | The Ring Array Processor: A Multiprocessing Peripheral for Connectionist Applications
- Morgan, Beck, et al.
- 1992
Citation Context: ...ely compete with larger general-purpose systems. The performance is also expected to be comparable or superior to most other NN-dedicated systems (e.g., the RAP, 45 MCUPS on a backpropagation problem [11]) or VLSI accelerators (Lneuro 1.0, 32 MCUPS [10]). The CNAPS Server/512 [1] has a higher peak performance of 1,460 MCUPS (at a lower precision) thanks to a higher clock rate, parallel PE communicati...

13 | Quantization Effects in Digitally Behaving Circuit Implementations of Kohonen Networks
- Thiran, Peiris, et al.
- 1994
Citation Context: ... two's complement fixed point. As for the required number of bits for each element, theoretical analysis usually leads to absolute minimum boundaries that, by themselves, cannot guarantee convergence [13]. In the absence of strong analytical grounds, simulation usually provides the designer with the required information. In [2] and [7] simulations are performed to determine the needs of typical backpr...
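The precision simulations mentioned in the excerpts above boil down to running the learning algorithm with weights forced onto a two's-complement fixed-point grid. A minimal sketch of such a quantizer (names and the round-then-saturate policy are assumptions, not the papers' exact procedure):

```python
def quantize(value, frac_bits, word_bits):
    """Round `value` to a fixed-point grid with `frac_bits` fractional bits
    and saturate to a two's-complement `word_bits`-bit word."""
    scale = 1 << frac_bits
    q = round(value * scale)                            # nearest grid point
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    q = max(lo, min(hi, q))                             # saturate the word
    return q / scale
```

Re-running training with every weight update passed through such a function is what lets a simulation decide how many bits a hardware PE actually needs.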

12 | Achieving Supercomputer Performance for Neural Net Simulation with an Array
- Müller, Bäumle, et al.
- 1992
Citation Context: ...at a utilization rate U ≥ 95% can easily be sustained if the problem data are already in the DSP memory. These values compare well to those reported for supercomputers (e.g., 130 MCUPS on the NEC SX-3 [12]). The results show that, all differences taken into account (such as the fixed or floating point), neuro-computers can efficiently and cost-effectively compete with larger general-purpose systems. Th...

5 | Fast multipliers for two's-complement numbers in serial form
- Dadda
- 1985
Citation Context: ...in the pipelined nature of the array (requiring an even more complex control logic). Typical serial-serial multipliers have some cycles of latency between the first input bit and the first output one [4]. This would again lead to additional delay cycles in operations like euclidean, where the accumulation is performed on the longest data word of the system. An improved multiplication scheme was there...

4 | Une Implantation Systolique des Algorithmes Connexionnistes [A Systolic Implementation of Connectionist Algorithms]
- Blayo
- 1990
Citation Context: ...ecurrent network. The latter network is described by equation (1) with the input x of one iteration being set to the output y of the previous one. 3: Array architecture It has been shown by Blayo [3] that some common connectionist algorithms can be decomposed into simple operations that can be conveniently implemented on a 2-D mesh of PEs. This has been demonstrated with the implementation of a rec...

4 | MANTRA I: An SIMD processor array for neural computation
- Viredaz
- 1993
Citation Context: ...pdate operations, the product cannot be determined with the 17-bit multiplier and it is therefore assumed that it always dominates over the additive term a. 5: System architecture The MANTRA I system [14] is shown in figure 6. The computational heart is a GENES IV array of 40 × 40 PEs. The sequencing of the systolic array is performed by a TMS320C40 digital signal processor (DSP) from Texas Instr...

3 | Réseaux de neurones compétitifs de grandes dimensions pour l'auto-organisation : analyse, synthèse et implantation sur circuits systoliques [Large-scale competitive neural networks for self-organization: analysis, synthesis, and implementation on systolic circuits]
- Lehmann
- 1993
Citation Context: ...ctionist algorithms can be decomposed into simple operations that can be conveniently implemented on a 2-D mesh of PEs. This has been demonstrated with the implementation of a recurrent network in VLSI [9]. The GENES IV PE extends the basic scheme to include on-chip learning and to address more connectionist models. Figure 1 (a) shows the basic architecture, together with all the operands: (1) the matr...

2 | Lneuro 1.0: A piece of hardware LEGO for building neural network systems
- Mauduit, Duranton, et al.
- 1992
Citation Context: ...The performance is also expected to be comparable or superior to most other NN-dedicated systems (e.g., the RAP, 45 MCUPS on a backpropagation problem [11]) or VLSI accelerators (Lneuro 1.0, 32 MCUPS [10]). The CNAPS Server/512 [1] has a higher peak performance of 1,460 MCUPS (at a lower precision) thanks to a higher clock rate, parallel PE communication, and a more advanced VLSI technology. Its PE o...