## Digital Neurochip Design (1991)

Venue: | In K. Wojtek Przytula and Viktor K. Prasanna, editors, Digital Parallel Implementations of Neural Networks |

Citations: | 2 - 0 self |

### BibTeX

@INPROCEEDINGS{Burr91digitalneurochip,
    author = {James B. Burr},
    title = {Digital Neurochip Design},
    booktitle = {Digital Parallel Implementations of Neural Networks},
    editor = {K. Wojtek Przytula and Viktor K. Prasanna},
    year = {1991},
    pages = {223--281},
    publisher = {Prentice Hall}
}

### Abstract

Introduction: This chapter describes a methodology for designing digital VLSI neurochips that emphasizes area, power, and performance estimation to facilitate architectural exploration in the early stages of design. It first discusses some key aspects of mapping neural net algorithms onto VLSI architectures. It then introduces a set of circuit-level building blocks commonly used in constructing digital nets. It discusses how to estimate chip area, performance, and power consumption in architectures constructed from these blocks, showing how to include technology scaling rules in the estimation process. It concludes with a detailed discussion of a CMOS implementation of a digital Boltzmann machine.

2 Mapping algorithms to architectures: An algorithm is a set of tasks to be applied to data in a specified order to transform inputs and internal state to desired outputs. An architecture is a set of resources and interconnections. Mapping algorithms to architectures ...

### Citations

4433 |
Computer Architecture: A Quantitative Approach, 4th ed
- Hennessy, Patterson
- 2007
Citation Context ...hniques can be used to advantage in designing digital neural networks. This section is not intended to be a thorough presentation of digital logic design, as there are many excellent sources for this [48, 64, 67, 27]. Rather, we assume a familiarity with the basics and seek to highlight specific structures which are especially useful in designing digital neural network arithmetic elements, memory, and noise sourc...

474 | A learning algorithm for Boltzmann machines
- Ackley, Hinton, et al.
- 1985
Citation Context ...Figure 13: Stanford Boltzmann Engine block diagram. 6.1 Boltzmann machines: Boltzmann machines [28, 1, 6, 29, 53] are a special class of neural networks whose learning algorithm can be shown to minimize a global energy measure using only local information. The network contains recurrent connections (feedback), w...

455 |
Introduction to VLSI Systems
- Mead, Conway
- 1980
Citation Context ...hniques can be used to advantage in designing digital neural networks. This section is not intended to be a thorough presentation of digital logic design, as there are many excellent sources for this [48, 64, 67, 27]. Rather, we assume a familiarity with the basics and seek to highlight specific structures which are especially useful in designing digital neural network arithmetic elements, memory, and noise sourc...

283 |
SPICE2: a computer program to simulate semiconductor circuits
- Nagel
- 1975
Citation Context ...C. Transistor level schematics were described using net [63]. VLSI layout, extraction, and design rule checking were done with magic [50]. Detailed timing simulation and analysis were done with spice [49]. Functional simulation, coarse timing, critical path analysis, and power measurements were done using our own version of rsim [63, 10]. 6.3 Multichip networks: The chip is designed to be tiled in a re...

199 |
Signed-digit number representations for fast parallel arithmetic
- Avizienis
Citation Context ...path is 3 xors in series. Carry propagation is an expensive operation in digital arithmetic. Several families of arithmetic have been developed to reduce the impact of carry propagation. Signed digit [9] and various redundant binary methods [24] have been proposed. "4:2" arithmetic based on 4:2 adders [60] interfaces cleanly to standard two's complement, implements an efficient, compact accumulator [...

150 |
A mean field theory learning algorithm for neural networks
- Peterson, Anderson
- 1987
Citation Context ... schedule. Annealing is necessary due to the recurrent nature of the connections, and to avoid local minima. There are two distinct types of Boltzmann machines: "stochastic" [28, 1], and "mean field" [52] or "deterministic" [29]. The neurons in stochastic Boltzmann machines are binary valued stochastic decision elements. The neurons in deterministic Boltzmann machines are deterministic and multivalued...

96 |
High-speed CMOS circuit technique
- Yuan, Svensson
Citation Context ...MOS; 1980 2phase [48] Nonoverlapping 2-phase clocking; 1983 NORA [22] Race-free dynamic CMOS; 1987 TSPC-1 [35] True single-phase clocking, version 1; 1988 SSPC [46] Safe single-phase clocking; 1989 TSPC-2 [69] True single-phase clocking, version 2. Table 3: Clocking styles. ... is compact, and low power. Yano et al.'s multiplier paper [68] has a good explanation of the style as well as detailed schematics of CPL...

69 |
Boltzmann machines: constraint satisfaction networks that learn. Carnegie Mellon University technical report
- Hinton, Sejnowski, et al.
- 1984
Citation Context ...Figure 13: Stanford Boltzmann Engine block diagram. 6.1 Boltzmann machines: Boltzmann machines [28, 1, 6, 29, 53] are a special class of neural networks whose learning algorithm can be shown to minimize a global energy measure using only local information. The network contains recurrent connections (feedback), w...

69 |
Deterministic Boltzmann learning performs steepest descent in weightspace
- Hinton
- 1989
Citation Context ...Figure 13: Stanford Boltzmann Engine block diagram. 6.1 Boltzmann machines: Boltzmann machines [28, 1, 6, 29, 53] are a special class of neural networks whose learning algorithm can be shown to minimize a global energy measure using only local information. The network contains recurrent connections (feedback), w...

64 |
Simple formulas for two and three dimensional capacitance, IEEE Trans. on Electron Devices
- Sakurai, Tamaru
- 1983
Citation Context ...nce. Table 5 summarizes the equations we use to compute propagation delay. The symbology of these equations follows the development in Hodges and Jackson [30]. Capacitance formulas were obtained from [55]. The effective resistance should be increased to reflect velocity saturation, especially in short channel devices. Velocity saturation occurs at around 4V in 2.0µ CMOS. T...

63 |
Introduction to Arithmetic for Digital Systems Designers
- Waser, Flynn
- 1982
Citation Context ...hniques can be used to advantage in designing digital neural networks. This section is not intended to be a thorough presentation of digital logic design, as there are many excellent sources for this [48, 64, 67, 27]. Rather, we assume a familiarity with the basics and seek to highlight specific structures which are especially useful in designing digital neural network arithmetic elements, memory, and noise sourc...

56 |
A VLSI architecture for High-Performance, Low-Cost, On-chip Learning
- Hammerstrom
- 1990
Citation Context ...mented are better suited to signal processing applications than to the other DARPA applications because they feature a relatively small synaptic store for each processor. The Adaptive Solutions CNAPS [23], for example, has 64 processors and 4K 8-bit weights per processor. Each processor can execute 25 million CPS, resulting in about 10^4 CPS/C. Each neural processor in our Boltzmann Engine (see Sectio...

52 |
High-speed compact circuits with CMOS
- Krambeck, Lee, et al.
Citation Context ...to wirelength. Massively parallel architectures need to be very careful about the number of long range connections. Table rows (year, what, who, description): - Static - Fully static CMOS; 1982 Domino [39] Domino logic; 1987 DCVSL [16] Differential cascode voltage switch logic; 1987 DPTL [51] Differential pass transistor logic; 1990 CPL [68] Complementary pass transistor logic; 1991 L-DPTL [40] Latched dif...

48 |
Principles of CMOS VLSI Design: A Systems Perspective, second edition
- Weste, Eshraghian
- 1994

41 |
An Electrically Trainable Artificial Neural Network (ETANN) with 10240 "Floating Gate" Synapses
- Holler, Tam, et al.
- 1989
Citation Context ...(12) MOSIS; 2T DRAM 13×31 403 200 6(36) MOSIS; 3T DRAM 24×24 576 192 9(54) MOSIS; 6T SRAM 32×36 1152 192 18(108) MOSIS; 7T USC 40×60 2400 342 37 [41]; 12T ETANN 83×97 8036 671 125 [31]; 42T Mitsubishi 200×200 40000 952 625 [8]; 200T ECLNN 160×320 51200 256 700 [4]; 1T MIT CCD 12×10 120 120 2(12) MOSIS; 3T MIT CAM - - - - MIT; 30T Andreas 80×80 6400 213 100 JHU; 30T Bo...

39 |
Explorations of the mean field theory learning algorithm
- Peterson, Hartman
- 1989

27 |
A comparison of CMOS circuit techniques: Differential cascode voltage switch logic versus conventional logic
- Chu, Pulfrey
- 1987
Citation Context ...llel architectures need to be very careful about the number of long range connections. Table rows (year, what, who, description): - Static - Fully static CMOS; 1982 Domino [39] Domino logic; 1987 DCVSL [16] Differential cascode voltage switch logic; 1987 DPTL [51] Differential pass transistor logic; 1990 CPL [68] Complementary pass transistor logic; 1991 L-DPTL [40] Latched differential pass transistor log...

27 |
Analysis and Design of Digital Integrated Circuits
- Hodges, Jackson
- 1983
Citation Context ...stance charging or discharging a node capacitance. Table 5 summarizes the equations we use to compute propagation delay. The symbology of these equations follows the development in Hodges and Jackson [30]. Capacitance formulas were obtained from [55]. The effective resistance should be increased to reflect velocity saturation, especially in short channel devices. Velocity saturation occurs at around 4...

26 |
A VLSI-efficient technique for generating multiple uncorrelated noise sources and its application to stochastic neural networks, Circuits and Systems
- Alspector, Gannett, et al.
- 1991
Citation Context ...urces is a hard problem. Joshua Alspector and colleagues have developed an area-efficient technique which they have implemented on their ECLNN chip [4]. They give this subject a thorough treatment in [5]. Our Boltzmann Engine has a limited requirement for noise, which we implement using a shift register with feedback xors tapped off strategic locations. Each such structure generates an uncorrelated p...
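
The shift register with XOR feedback described in this context is commonly known as a linear feedback shift register (LFSR). The following is a minimal Python sketch of the general idea, not the Boltzmann Engine's circuit: the 4-bit width, the tap positions `(0, 1)`, and the function names `lfsr_step`, `period`, and `bit_stream` are all illustrative assumptions. A practical noise source would use a much longer register.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one shift.

    taps are the bit positions XORed together to form the feedback bit
    (hypothetical choice here, not taken from the chapter).
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    # Shift right by one and feed the XOR result into the top bit.
    return (state >> 1) | (feedback << (width - 1))


def period(seed, taps, width):
    """Count the shifts needed for the register to return to its seed."""
    state, steps = lfsr_step(seed, taps, width), 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        steps += 1
    return steps


def bit_stream(seed, taps, width, n):
    """Collect n pseudo-random bits from the low bit of the register."""
    bits, state = [], seed
    for _ in range(n):
        bits.append(state & 1)
        state = lfsr_step(state, taps, width)
    return bits


if __name__ == "__main__":
    print(period(1, (0, 1), 4))
    print(bit_stream(1, (0, 1), 4, 15))
```

With these assumed taps the 4-bit register visits every nonzero state before repeating, which is the property that makes such a structure usable as a pseudo-random bit source.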

26 |
Performance of a stochastic learning microchip
- Alspector, Gupta, et al.
- 1989
Citation Context ... incrementally over many cycles. Stochastic techniques have been applied successfully to Hopfield networks [36], competitive learning networks [17], and Boltzmann machines [6, 4]. Figure 12 shows the two basic operations: multiplication by ANDing pulse streams (P(AB) = P(A)P(B)), and addition by ORing (P(A + B) = P(A) + P(B) − P(AB)). The P(AB) term in the probabil...
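
The two pulse-stream operations quoted in this context can be checked numerically. The sketch below is an illustration only, assuming the two streams are statistically independent; the helper names `pulse_stream` and `estimate`, the stream length, and the probabilities 0.3 and 0.5 are arbitrary choices and do not come from the chapter.

```python
import random


def pulse_stream(p, n, rng):
    """Return n random bits, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def estimate(stream):
    """Estimate the encoded probability as the fraction of 1 bits."""
    return sum(stream) / len(stream)


rng = random.Random(0)          # fixed seed so the run is repeatable
n = 100_000                     # assumed stream length
pa, pb = 0.3, 0.5               # assumed input probabilities

a = pulse_stream(pa, n, rng)
b = pulse_stream(pb, n, rng)

# AND of independent streams: P(AB) = P(A) P(B)
product = [x & y for x, y in zip(a, b)]
# OR of independent streams: P(A + B) = P(A) + P(B) - P(AB)
summed = [x | y for x, y in zip(a, b)]

print("AND estimate:", estimate(product), "expected:", pa * pb)
print("OR estimate: ", estimate(summed), "expected:", pa + pb - pa * pb)
```

The OR result falls short of a true sum by the P(AB) term, which is why ORing only approximates addition when both input probabilities are small.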

24 |
Elimination of Process-Dependent Clock Skew in CMOS VLSI
- Shoji
- 1986
Citation Context ...lk) globally, locally, or not at all. Global clkL generation has the highest performance but the narrowest margin due to clock skew within the clock distribution networks. The techniques described in [61] can minimize the impact of process-dependent clock skew on global clock drivers. Local clkL generation has wider margins but generally lower performance. Clock distribution at the chip level is extre...

23 |
Ultra low power CMOS technology
- Burr, Peterson
- 1991
Citation Context ...greater logic depths, logic energy dominates. We have found the optimum logic depth for a 32×32 bit multiplier to be just about equal to the propagation delay through a 4:2 adder [13]. At this logic depth, the area penalty is 37%. We have obtained similar results for both array and tree multipliers ranging in size from 4×4 bits to 256×256 bits, and various types of adder...

18 | n-p-CMOS: A racefree dynamic CMOS technique for pipelined logic structures
- Goncalves, Man
- 1982
Citation Context ... pass-transistor logic (CPL) offers modest performance, ... Table 3 rows (year, what, who, description): 1973 C²MOS [62] Clocked CMOS; 1980 2phase [48] Nonoverlapping 2-phase clocking; 1983 NORA [22] Race-free dynamic CMOS; 1987 TSPC-1 [35] True single-phase clocking, version 1; 1988 SSPC [46] Safe single-phase clocking; 1989 TSPC-2 [69] True single-phase clocking, version 2. Table 3: Clocking styles...

18 |
A Digital Neural Network Architecture for VLSI
- Tomlinson, Walker, et al.
Citation Context ...en the case in neural network learning, where the network evolves incrementally over many cycles. Stochastic techniques have been applied successfully to Hopfield networks [36], competitive learning networks [17], and Boltzmann machines [6, 4]. Figure 12 shows the two basic operations: multiplication by ANDing pulse streams (P(AB) = P(A)P(B)), and addition by ORing (P(A...

18 |
Design and Clocking of VLSI Multipliers
- Santoro
- 1989
Citation Context ...ously at the point of use and are only distributed locally over the iterative structure. Using this technique, Mark Santoro demonstrated 400MHz operation of a 4:2 adder-based multiplier in 0.8µ CMOS [56, 57, 58]. He built the local clock driver out of a 4:2 adder so it would track temperature and process variations. 3.4 Concurrency: Concurrency is a widely used technique for increasing performance through par...

17 |
A pipelined 64x64 bit iterative multiplier
- Santoro, Horowitz
- 1989
Citation Context ...ously at the point of use and are only distributed locally over the iterative structure. Using this technique, Mark Santoro demonstrated 400MHz operation of a 4:2 adder-based multiplier in 0.8µ CMOS [56, 57, 58]. He built the local clock driver out of a 4:2 adder so it would track temperature and process variations. 3.4 Concurrency: Concurrency is a widely used technique for increasing performance through par...

17 |
A 3.8-ns CMOS 16x16-b Multiplier using Complementary Pass-Transistor Logic
- Yano
- 1990
Citation Context ...ar what who description; - Static - Fully static CMOS; 1982 Domino [39] Domino logic; 1987 DCVSL [16] Differential cascode voltage switch logic; 1987 DPTL [51] Differential pass transistor logic; 1990 CPL [68] Complementary pass transistor logic; 1991 L-DPTL [40] Latched differential pass transistor logic. Table 2: Logic design styles. 4 Building blocks: A number of basic circuits and circuit techniques can b...

13 |
Deterministic boltzmann learning in networks with asymmetric connectivity
- Galland, Hinton
- 1989
Citation Context ...e" unit, which supplies a "1" to all the other units. Connections between the units are usually symmetric (w_ij = w_ji), although this constraint can be relaxed if the weights are permitted to decay [21, 3]. Network response to an input vector is found by initializing the hidden and output activations to random values and then annealing the network according to a temperature schedule. Annealing is neces...

12 |
A True Single Phase Clock Dynamic CMOS Circuit Technique
- Yuan, Karlsson, et al.
- 1987
Citation Context ...st performance, ... Table 3 rows (year, what, who, description): 1973 C²MOS [62] Clocked CMOS; 1980 2phase [48] Nonoverlapping 2-phase clocking; 1983 NORA [22] Race-free dynamic CMOS; 1987 TSPC-1 [35] True single-phase clocking, version 1; 1988 SSPC [46] Safe single-phase clocking; 1989 TSPC-2 [69] True single-phase clocking, version 2. Table 3: Clocking styles. ... is compact, and low power. Yano et al'...

11 |
Clocked CMOS Calculator Circuitry
- Suzuki, Odagawa, et al.
- 1973
Citation Context ...high power, however, since every output toggles on every cycle. Complementary pass-transistor logic (CPL) offers modest performance, ... Table 3 rows (year, what, who, description): 1973 C²MOS [62] Clocked CMOS; 1980 2phase [48] Nonoverlapping 2-phase clocking; 1983 NORA [22] Race-free dynamic CMOS; 1987 TSPC-1 [35] True single-phase clocking, version 1; 1988 SSPC [46] Safe single-phase clocking; 19...

8 | Energy considerations in multichip-module based multiprocessors
- Burr, Peterson
- 1991
Citation Context ...is fully on due to the increase in V_t/V_dd. Leakage, on the other hand, becomes very important if the threshold voltage is reduced. We have discussed energy optimization at low voltage in [13] and [12]. The key element in our power estimation technique is obtaining an accurate estimate of circuit activity a. Results reported from event driven simulation suggest a is around 10%. That is, around 10% ...
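
A switched-capacitance power estimate of the kind this context describes can be sketched with the standard dynamic-power relation P = a · C · V_dd² · f. This is a hedged illustration, not the chapter's spreadsheet: the function name `dynamic_power`, the convention that `activity` is the fraction of total capacitance charged and discharged per cycle, and the example values (1 nF, 5 V, 20 MHz) are all assumptions. Short circuit current and leakage are ignored.

```python
def dynamic_power(c_total, vdd, f_clk, activity=0.10):
    """Estimate dynamic power in watts.

    c_total  -- total node capacitance in farads (assumed example input)
    vdd      -- supply voltage in volts
    f_clk    -- clock frequency in hertz
    activity -- fraction of c_total switched per cycle; 0.10 follows the
                "around 10%" figure quoted above, everything else is assumed
    """
    return activity * c_total * vdd ** 2 * f_clk


if __name__ == "__main__":
    # Hypothetical chip: 1 nF total capacitance, 5 V supply, 20 MHz clock.
    watts = dynamic_power(1e-9, 5.0, 20e6)
    print(f"estimated dynamic power: {watts * 1e3:.1f} mW")
```

Because the estimate scales linearly with the activity factor, an error in a translates directly into the same relative error in power, which is consistent with the context calling it the key element of the technique.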

8 |
A pipelined 64x64b iterative array multiplier, in Digest of Technical Papers
- Santoro, Horowitz
- 1988
Citation Context ...ously at the point of use and are only distributed locally over the iterative structure. Using this technique, Mark Santoro demonstrated 400MHz operation of a 4:2 adder-based multiplier in 0.8µ CMOS [56, 57, 58]. He built the local clock driver out of a 4:2 adder so it would track temperature and process variations. 3.4 Concurrency: Concurrency is a widely used technique for increasing performance through par...

8 |
4-2 carry-save adder implementation using send circuits
- Shen, Weinberger
- 1978
Citation Context ...milies of arithmetic have been developed to reduce the impact of carry propagation. Signed digit [9] and various redundant binary methods [24] have been proposed. "4:2" arithmetic based on 4:2 adders [60] interfaces cleanly to standard two's complement, implements an efficient, compact accumulator [44, 43, 42, 56, 57, 58], and has optimal logic depth [13]. Although 4:2 adders can be implemented using ...

6 |
Learning of stable states in stochastic asymmetric networks
- Allen, Alspector
- 1989
Citation Context ...e" unit, which supplies a "1" to all the other units. Connections between the units are usually symmetric (w_ij = w_ji), although this constraint can be relaxed if the weights are permitted to decay [21, 3]. Network response to an input vector is found by initializing the hidden and output activations to random values and then annealing the network according to a temperature schedule. Annealing is neces...

6 |
CLC - A cascadeable learning chip
- Alspector
- 1990
Citation Context ...6T SRAM 32×36 1152 192 18(108) MOSIS; 7T USC 40×60 2400 342 37 [41]; 12T ETANN 83×97 8036 671 125 [31]; 42T Mitsubishi 200×200 40000 952 625 [8]; 200T ECLNN 160×320 51200 256 700 [4]; 1T MIT CCD 12×10 120 120 2(12) MOSIS; 3T MIT CAM - - - - MIT; 30T Andreas 80×80 6400 213 100 JHU; 30T Bourke 106×113 11978 399 187 U.Sydney; 30T M.Pert 150×180 27000 900 422 U.Sydney ...

6 | Stochastic Computers
- Gaines
Citation Context ...e the energy required to transmit data with a dynamic range of N is proportional to N rather than to log(N) as in standard digital encoding. Good discussions of stochastic computation can be found in [19, 47, 20]. 5 Area, power, and performance estimation: We have developed a simple area, performance, and power estimation technique which we use to construct spreadsheets in the early stages of architectural exp...

6 |
VLSI Image Processors using Analog Programmable Synapses and Neurons
- Lee, Sheu
- 1990
Citation Context ... commercial; 1T DRAM 8×16 128 128 2(12) MOSIS; 2T DRAM 13×31 403 200 6(36) MOSIS; 3T DRAM 24×24 576 192 9(54) MOSIS; 6T SRAM 32×36 1152 192 18(108) MOSIS; 7T USC 40×60 2400 342 37 [41]; 12T ETANN 83×97 8036 671 125 [31]; 42T Mitsubishi 200×200 40000 952 625 [8]; 200T ECLNN 160×320 51200 256 700 [4]; 1T MIT CCD 12×10 120 120 2(12) MOSIS; 3T MIT CAM - - - - MIT; 30T And...

4 |
A 16-level/cell dynamic memory
- Aoki, Nakagome, Horiguchi, Ikenaga, Shimohigashi
- 1987
Citation Context ...have the highest density. Six-transistor (6T) static memories (SRAMs) consume ... Table rows (fets, type, dimensions, area, area/T, rel. area, who): 1T FLASH 8×8 64 64 1(6) commercial; 1T DRAM 8×8 64 64 1(2) [7]: 4 bit/cell; 1T DRAM 8×8 64 64 1(6) commercial; 4T SRAM 12×20 240 60 4(24) commercial; 1T DRAM 8×16 128 128 2(12) MOSIS; 2T DRAM 13×31 403 200 6(36) MOSIS; 3T DRAM 24×24 576 192 9...

4 |
A self-learning neural network chip with 125 neurons and 10K self-organization synapses
- Arima, Mashiko, Maeda, Kondoh, Kayano, et al.
- 1990
Citation Context ...IS; 3T DRAM 24×24 576 192 9(54) MOSIS; 6T SRAM 32×36 1152 192 18(108) MOSIS; 7T USC 40×60 2400 342 37 [41]; 12T ETANN 83×97 8036 671 125 [31]; 42T Mitsubishi 200×200 40000 952 625 [8]; 200T ECLNN 160×320 51200 256 700 [4]; 1T MIT CCD 12×10 120 120 2(12) MOSIS; 3T MIT CAM - - - - MIT; 30T Andreas 80×80 6400 213 100 JHU; 30T Bourke 106×113 11978 399 187 U.Sydney; 30T M...

4 | System-wide energy optimization in the MCM environment
- Burr, Burnham, et al.
- 1991
Citation Context ... reduced by velocity saturation. 5.3 Power estimation: We estimate power by estimating the capacitance switched on each clock cycle. We ignore short circuit current and DC leakage. In [11], we showed that short circuit current can be optimized out of the system, and becomes negligible at low voltage because the current at the switching threshold of a gate is only a few percent of the c...

4 |
Leakage studies in high-density dynamic MOS memory devices
- Chatterjee, Taylor, et al.
- 1979
Citation Context ...ging the bitlines to at least a threshold drop below V_dd. This is much higher energy than SRAM, which only has to swing the bitlines 100mV or so. DRAM cells must be refreshed due to leakage current [14]. Normally the refresh ... [figure panels: 4T sram, poly pullups; 4T sram, pfet pullups; 6T sram; dual ported 6T sram] ...

4 |
TInMANN: The integer Markovian artificial neural network
- Van den Bout, Miller
- 1989
Citation Context ...ng, where the network evolves incrementally over many cycles. Stochastic techniques have been applied successfully to Hopfield networks [36], competitive learning networks [17], and Boltzmann machines [6, 4]. Figure 12 shows the two basic operations: multiplication by ANDing pulse streams (P(AB) = P(A)P(B)), and addition by ORing (P(A + B) = P(A) + P(B) − P(AB)). ...

4 |
Uncertainty as a foundation of computational power in neural networks
- Gaines
- 1987
Citation Context ...e the energy required to transmit data with a dynamic range of N is proportional to N rather than to log(N) as in standard digital encoding. Good discussions of stochastic computation can be found in [19, 47, 20]. 5 Area, power, and performance estimation: We have developed a simple area, performance, and power estimation technique which we use to construct spreadsheets in the early stages of architectural exp...

4 |
Multilevel random-access memory using one transistor per cell
- Heald, Hodges
- 1976
Citation Context ...much closer to being proportional to n due to devices. Multi-level storage: Several researchers have reported techniques to store multiple levels in a single DRAM cell. The first were Heald and Hodges [26] in 1976. More recently, Aoki, Horiguchi, and colleagues [7, 33] reported 16 levels per cell in 1987. More levels might be achievable in neural networks since errors will always be small. Weight decay...

4 |
Stochastic and deterministic averaging processors
- Mars, Poppelbaum
- 1981
Citation Context ...e the energy required to transmit data with a dynamic range of N is proportional to N rather than to log(N) as in standard digital encoding. Good discussions of stochastic computation can be found in [19, 47, 20]. 5 Area, power, and performance estimation: We have developed a simple area, performance, and power estimation technique which we use to construct spreadsheets in the early stages of architectural exp...

3 |
The Block Z transform and applications to digital signal processing using distributed arithmetic and the Modified Fermat Number transform
- Li
- 1988
Citation Context ...] and various redundant binary methods [24] have been proposed. "4:2" arithmetic based on 4:2 adders [60] interfaces cleanly to standard two's complement, implements an efficient, compact accumulator [44, 43, 42, 56, 57, 58], and has optimal logic depth [13]. Although 4:2 adders can be implemented using two full adders we discovered a "direct logic" implementation [44] that reduces the number of xors in series from four ...
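
The two-full-adder construction of a 4:2 adder mentioned in this context can be sketched as follows. This shows only the baseline construction, with two XORs per full adder and therefore four in series; it is not the "direct logic" implementation of [44], whose details are not given here. The names `full_adder` and `compressor_4_2` are illustrative.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(a, b, c, d, cin):
    """4:2 adder built from two cascaded full adders.

    Returns (sum, carry, cout); carry and cout both have weight 2.
    cout depends only on a, b, c, so it does not ripple with cin.
    """
    s1, cout = full_adder(a, b, c)
    s, carry = full_adder(s1, d, cin)
    return s, carry, cout


# Exhaustively check that the five input bits are preserved in value.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
print("4:2 adder verified for all 32 input combinations")
```

Because `cout` is independent of `cin`, a row of these cells adds four operands without a carry chain running along the row, which is the property that makes 4:2 arithmetic attractive for accumulators.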

3 |
An 80 MHz Multiply Accumulator
- Li, Burr
- 1987
Citation Context ...] and various redundant binary methods [24] have been proposed. "4:2" arithmetic based on 4:2 adders [60] interfaces cleanly to standard two's complement, implements an efficient, compact accumulator [44, 43, 42, 56, 57, 58], and has optimal logic depth [13]. Although 4:2 adders can be implemented using two full adders we discovered a "direct logic" implementation [44] that reduces the number of xors in series from four ...

3 |
User's guide to NET, PRESIM, and RNL/NL
- Terman
- 1982
Citation Context ...y stages of architectural exploration and feasibility analysis in an area-limited design. 5.2 Performance estimation: We estimate performance using a simple RC timing model based on the RSIM simulator [63], in which transistors are calibrated to have an effective resistance charging or discharging a node capacitance. Table 5 summarizes the equations we use to compute propagation delay. The symbology of...

2 |
Masayoshi Ohkawa, Akane Aizaki, Yasushi Okuyama, Isao Sasaki
- Aizaki
- 1990
Citation Context ...- [37] 16M DRAM 10ns 2M 100M 50 - -; M5M44C256 1M DRAM 60ns 128K 8M 63 300mW 38nJ; [45] 512K DRAM 12ns 64K 80M 1250 1.7W 21nJ; [54] 4M RDRAM 2ns 512K 500M 953 - -; [25] 4M SRAM 23ns 512K 43M 84 350mW 8nJ; [2] 4M SRAM 15ns 512K 67M 131 650mW 10nJ; [59] 4M SRAM 9ns 512K 111M 217 970mW 9nJ; Paradigm91 1M SRAM 17ns 128K 59M 449 - -. Table 1: Maximum pattern presentation rates at full capacity for various memory ...

2 |
Advanced simulation and development techniques
- Burr
- 1988
Citation Context ...c [50]. Detailed timing simulation and analysis were done with spice [49]. Functional simulation, coarse timing, critical path analysis, and power measurements were done using our own version of rsim [63, 10]. 6.3 Multichip networks: The chip is designed to be tiled in a regular array with nearest neighbor connections to implement multichip networks. The sigmoid units are deactivated in all chips except th...