Results 1 - 10
of
53
A High-Performance Microarchitecture with Hardware-Programmable Functional Units
- in Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Through a coupling of compile-time analysis routines and hardware synthesis tools, we automatically configure a given set of t ..."
Abstract
-
Cited by 171 (1 self)
- Add to MetaCart
This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Through a coupling of compile-time analysis routines and hardware synthesis tools, we automatically configure a given set of the hardware-programmable functional units (PFUs) and thus augment the base instruction set architecture so that it better meets the instruction set needs of each application. We refer to this new class of general-purpose computers as PRogrammable Instruction Set Computers (PRISC). Although similar in concept, the PRISC approach differs from dynamically programmable microcode because in PRISC we define entirely-new primitive datapath operations. In this paper, we concentrate on the microarchitectural design of the simplest form of PRISC---a RISC microprocessor with a single PFU that only evaluates combinational functions. We briefly discuss the operating system and the programming language co...
Authentication in the Taos Operating System
- ACM Transactions on Computer Systems
, 1994
"... this paper we do not describe any formal notations or rules for propositional connectives. Instead, we use English keywords, like "if" and "then", and informal reasoning. 4 \Delta E. Wobber et al. --- Conjunctions of principals. We write A B for the conjunction of A and B. If both A says S and B s ..."
Abstract
-
Cited by 161 (11 self)
- Add to MetaCart
this paper we do not describe any formal notations or rules for propositional connectives. Instead, we use English keywords, like "if" and "then", and informal reasoning. 4 \Delta E. Wobber et al. --- Conjunctions of principals. We write A B for the conjunction of A and B. If both A says S and B says S then (A B) says S as well. --- Principals quoting principals. We write B j A for B quoting A. If B says A says
Programmable Active Memories: Reconfigurable Systems Come of Age
- IEEE Transactions on VLSI Systems
, 1996
"... Programmable Active Memories (PAM) are a novel form of universal reconfigurable hardware co-processor. Based on Field-Programmable Gate Array (FPGA) technology, a PAM is a virtual machine, controlled by a standard microprocessor, which can be dynamically and indefinitely reconfigured into a large nu ..."
Abstract
-
Cited by 123 (5 self)
- Add to MetaCart
Programmable Active Memories (PAM) are a novel form of universal reconfigurable hardware co-processor. Based on Field-Programmable Gate Array (FPGA) technology, a PAM is a virtual machine, controlled by a standard microprocessor, which can be dynamically and indefinitely reconfigured into a large number of application-specific circuits. PAMs offer a new mixture of hardware performance and software versatility. We review the important architectural features of PAMs, through the example of DECPeRLe-1, an experimental device built in 1992. PAM programming is presented, in contrast to classical gate-array and full custom circuit design. Our emphasis is on large, code-generated synchronous systems descriptions
Separating agreement from execution for byzantine fault tolerant services
- IN PROC. SOSP
, 2003
"... We describe a new architecture for Byzantine fault tolerant state machine replication that separates agreement that orders requests from execution that processes requests. This separation yields two fundamental and practically significant advantages over previous architectures. First, it reduces rep ..."
Abstract
-
Cited by 110 (15 self)
- Add to MetaCart
We describe a new architecture for Byzantine fault tolerant state machine replication that separates agreement that orders requests from execution that processes requests. This separation yields two fundamental and practically significant advantages over previous architectures. First, it reduces replication costs because the new architecture can tolerate faults in up to half of the state machine replicas that execute requests. Previous systems can tolerate faults in at most a third of the combined agreement/state machine replicas. Second, separating agreement from execution allows a general privacy firewall architecture to protect confidentiality through replication. In contrast, replication in previous systems hurts confidentiality because exploiting the weakest replica can be su#cient to compromise the system. We have constructed a prototype and evaluated it running both microbenchmarks and an NFS server. Overall, we find that the architecture adds modest latencies to unreplicated systems and that its performance is competitive with existing Byzantine fault tolerant systems.
Programmable Active Memories: a Performance Assessment
- Research on Integrated Systems: Proceedings of the 1993 Symposium
, 1993
"... We present some quantitative performance measurements for the computing power of Programmable Active Memories (PAM), as introduced by [BRV 89]. Based on Programmable Gate Array (PGA) technology, the PAM is a universal hardware co-processor closely coupled to a standard host computer. The PAM can spe ..."
Abstract
-
Cited by 101 (6 self)
- Add to MetaCart
We present some quantitative performance measurements for the computing power of Programmable Active Memories (PAM), as introduced by [BRV 89]. Based on Programmable Gate Array (PGA) technology, the PAM is a universal hardware co-processor closely coupled to a standard host computer. The PAM can speed up many critical software applications running on the host, by executing part of the computations through a specific hardware PAM design. The performance measurements presented are based on two PAM architectures and ten specific applications, drawn from arithmetics, algebra, geometry, physics, biology, audio and video. Each of these PAM designs proves as fast as any reported hardware or super-computer for the corresponding application. In cases where we could bring some genuine algorithmic innovation into the design process, the PAM has proved an order of magnitude faster than any previously existing system (see [SBV 91] and [S 92]). 1 PAM concept Like any RAM memory module, a PAM is att...
Preserving Privacy in a Network of Mobile Computers
"... Even as wireless networks create the potential for access to information from mobile platforms, they pose aproblem for privacy. In order to retrieve messages, users must periodically poll the network. The information that the user must give to the network could potentially be used totrack that user. ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
Even as wireless networks create the potential for access to information from mobile platforms, they pose aproblem for privacy. In order to retrieve messages, users must periodically poll the network. The information that the user must give to the network could potentially be used totrack that user. However, the movements of the user can also be used to hide the user's location if the protocols for sending and retrieving messages are carefully designed. We have developed a replicated memory service which allows users to read from memory without revealing which memory locations they are reading. Unlike previous protocols, our protocol is e cient in its use of computation and bandwidth. In this paper, we will show how this protocol can be usedinconjunction with existing privacy preserving protocols to allow a user of a mobile computer to maintain privacy despite active attacks.
Improving Functional Density Through Run-Time Circuit Reconfiguration
, 1997
"... orting a C compiler to the DISC processor. Justin Diether assisted in the design, hand-layout, and testing of many partially reconfigured circuits. I would also like to thank Paul Graham for his generous assistance and support of our many mutual activities, classes, and projects at BYU. Other gradua ..."
Abstract
-
Cited by 42 (2 self)
- Add to MetaCart
orting a C compiler to the DISC processor. Justin Diether assisted in the design, hand-layout, and testing of many partially reconfigured circuits. I would also like to thank Paul Graham for his generous assistance and support of our many mutual activities, classes, and projects at BYU. Other graduate students assisting me with this work include Russel Peterson, Mike Rencher, Richard Ross, and Peter Bellows. My advisor, Brad Hutchings, provided essential assistance and encouragement in all of the projects, ideas, and results presented within this work. My decision to complete this degree and write this dissertation was influenced largely by his advice and positive encouragement. Brent Nelson and other faculty members within the Electrical and Computer Engineering department at BYU have provided critical feedback on a wide variety of topics relating to this work. I would also like to acknowledge the insight and assistance of many collaborators researching closely related subjects. For
An Energy-Efficient Reconfigurable Public-Key Cryptography Processor
- IEEE Journal of Solid-State Circuits
, 2001
"... The ever-increasing demand for security in portable energy-constrained environments that lack a coherent security architecture has resulted in the need to provide energy-efficient algorithm-agile cryptographic hardware. Domain-specific reconfigurability is utilized to provide the required flexibilit ..."
Abstract
-
Cited by 38 (0 self)
- Add to MetaCart
The ever-increasing demand for security in portable energy-constrained environments that lack a coherent security architecture has resulted in the need to provide energy-efficient algorithm-agile cryptographic hardware. Domain-specific reconfigurability is utilized to provide the required flexibility, without incurring the high overhead costs associated with generic reprogrammable logic. The resulting implementation is capable of performing an entire suite of cryptographic primitives over the integers modulo , binary Galois Fields and nonsupersingular elliptic curves over GF(2 ), with fully programmable moduli, field polynomials and curve parameters ranging in size from 8 to 1024 bits. The resulting processor consumes a maximum of 75 mW when operating at a clock rate of 50 MHz and a 2-V supply voltage. In ultralow-power mode (3 MHz at 0.7 V) the processor consumes at most 525 W. Measured performance and energy efficiency indicate a comparable level of performance to previously reported dedicated hardware implementations, while providing all of the flexibility of a software-based implementation. In addition, the processor is two to three orders of magnitude more energy efficient than optimized software and reprogrammable logic-based implementations.
PAM-Blox: High performance FPGA design for adaptive computing
- Computing, IEEE Symposium on FPGAs for Custom Computing Machines
, 1998
"... PAM-Blox are object-oriented circuit generators on top of the PCI Pamette design environment, PamDC. High- performance FPGA design for adaptive computing is simplified by using a hierarchy of optimized hardware objects described in C++. PAM-Blox consist of two major layers of abstraction. First, Pam ..."
Abstract
-
Cited by 37 (7 self)
- Add to MetaCart
PAM-Blox are object-oriented circuit generators on top of the PCI Pamette design environment, PamDC. High- performance FPGA design for adaptive computing is simplified by using a hierarchy of optimized hardware objects described in C++. PAM-Blox consist of two major layers of abstraction. First, PamBlox are parameterizable simple elements such as counters and adders. Automatic placement of carry chains and flexible shapes are supported. PaModules are more complex elements possibly instantiating PamBlox. PaModules have fixed shapes and are usually optimized for a specific data-width. Examples for PaModules are multipliers, Coordinate Rotations (CORDICs), and special arithmetic units for encryption. The key difference of our approach to most other design tools for FPGAs is that the designer has total control over placement at each level of the design hierarchy, which is the key to high-performance FPGA design. Second, the object interface was chosen carefully to encourage code-reuse and simplify code-sharing between designers. PAM-Blox are intended to be part of an open library that allows design sharing between members of the adaptive computing community. 1.
Montgomery Modular Exponentiation on Reconfigurable Hardware
, 1999
"... It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize ..."
Abstract
-
Cited by 31 (3 self)
- Add to MetaCart
It is widely recognized that security issues will play a crucial role in the majority of future computer and communication systems. Central tools for achieving system security are cryptographic algorithms. For performance as well as for physical security reasons, it is often advantageous to realize cryptographic algorithms in hardware. In order to overcome the well-known drawback of reduced flexibility that is associated with traditional ASIC solutions, this contribution proposes arithmetic architectures which are optimized for modern field programmable gate arrays (FPGAs). The proposed architectures perform modular exponentiation with very long integers. This operation is at the heart of many practical public-key algorithms such as RSA and discrete logarithm schemes. We combine the Montgomery modular multiplication algorithm with a new systolic array design, which is capable of processing a variable number of bits per array cell. The designs are flexible, allowing any choice of operan...

