Results 1 - 10 of 11
A Micropipelined ARM
1993
Cited by 53 (13 self)
An asynchronous implementation of the ARM microprocessor is described. The design is based on Sutherland's Micropipelines, and allows considerable internal asynchronous concurrency. The rationale for the work is presented, the organisation of the chip described, and the characteristics of the chip described. The design displays unusual properties such as nondeterministic (but bounded) prefetch depth beyond a branch instruction. This work demonstrates the feasibility of building complex asynchronous systems and gives an indication of the costs and benefits of the Micropipeline approach.
Keyword Codes: C.1.1; B.1.1; B.7.1
Keywords: Processor Architectures, Single Data Stream Architectures; Control Structures and Microprogramming, Control Design Styles; Integrated Circuits, Types and Design Styles
1. INTRODUCTION
The power dissipation of high-performance CMOS VLSI microprocessors is becoming an increasing problem. Even when battery power and portability are not an issue the 20 to 30...
The Design and Evaluation of an Asynchronous Microprocessor
In Proc. International Conf. Computer Design (ICCD). IEEE Computer, 1994
Cited by 17 (2 self)
AMULET1 is a fully asynchronous implementation of the ARM microprocessor which was designed at Manchester University between 1991 and 1993. First silicon arrived in April 1994 and was found to be functional, demonstrating that asynchronous design of complex circuits is feasible with present day CAD tools. This paper presents the motivation for the work, some of the design choices which were made, the problems which were encountered during the development of the design and the characteristics of the device itself. The future potential for asynchronous circuits is also discussed.
1: Introduction
The growth in demand for high performance portable computing equipment has led to a resurgence of interest in asynchronous logic design techniques. In order to investigate the power saving potential of asynchronous approaches to CMOS design, a self-timed implementation of the ARM microprocessor [1] has been developed as a commercially realistic technology demonstrator. The methodology applied ...
A CMOS VLSI Implementation of an Asynchronous ALU
1993
Cited by 15 (3 self)
A CMOS self-timed ALU has been developed as part of an asynchronous implementation of the ARM microprocessor. This unit exploits the data dependency inherent in many arithmetic operations to enable a small, simple ALU to deliver a mean performance comparable with that of a more sophisticated synchronous one with consequent reductions in both silicon area and electrical power consumption. The self-timed nature of the unit means that the majority of operations complete quickly whilst allowing rare 'worst-case' operations to take longer, maintaining a high average throughput. This paper
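The data dependency mentioned in this abstract can be illustrated with a small sketch. It assumes a plain ripple-carry adder whose completion time tracks the longest carry chain; that model, the function names, and the sample sizes are illustrative assumptions, not the circuit described in the paper.

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Longest run of bit positions a single carry travels through in a + b.

    Assumed model: a carry is generated where both operand bits are 1 and
    keeps rippling while exactly one operand bit is 1.
    """
    longest = run = 0
    carry = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Generate: carry-out is 1 regardless of carry-in, so a new chain starts.
            carry, run = 1, 1
        elif (x ^ y) and carry:
            # Propagate: the incoming carry ripples one position further.
            run += 1
        else:
            # Kill (or nothing to propagate): the chain ends here.
            carry, run = 0, 0
        longest = max(longest, run)
    return longest


def mean_chain(samples: int = 1000, width: int = 32, seed: int = 0) -> float:
    """Mean longest carry chain over uniformly random operands (hypothetical workload)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        a = rng.getrandbits(width)
        b = rng.getrandbits(width)
        total += longest_carry_chain(a, b, width)
    return total / samples


if __name__ == "__main__":
    print("worst case:", longest_carry_chain(0xFFFFFFFF, 1))
    print("mean over random operands:", mean_chain())
```

Under this assumed model, the worst case spans the full word width, while random operands typically produce much shorter chains, which is the kind of gap a self-timed unit can exploit.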
Register Locking in an Asynchronous Microprocessor
IEEE International Conference on Computer Design: VLSI in Computers & Processors, 1992
Cited by 11 (5 self)
A high performance register bank is a central component of a RISC processor. A novel register bank design has been developed, as an integral part of a self-timed implementation of a commercial RISC microprocessor, to address the problem of register interlocking in an asynchronous micropipelined execution unit. The challenge in an asynchronous design is to maintain coherent register operation while allowing concurrent read and write accesses with arbitrary timing. The solution presented here includes a novel arbiter-free locking mechanism which enables efficient read operations in the presence of multiple pending write operations.
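As a rough behavioural sketch of the interlocking problem this abstract describes, the following model queues the destination registers of pending writes and stalls any read of a register that still has a write outstanding. The class name `LockFifo`, its methods, and the default depth are hypothetical; the sketch models behaviour only and does not reproduce the arbiter-free circuit of the paper.

```python
from collections import deque


class LockFifo:
    """Behavioural model (assumed, not from the paper) of pending-write locks."""

    def __init__(self, depth: int = 4):
        self.depth = depth        # maximum number of outstanding writes (assumption)
        self.pending = deque()    # destination registers, oldest first

    def lock(self, reg: int) -> bool:
        """Record a pending write to `reg`; returns False if the queue is full."""
        if len(self.pending) >= self.depth:
            return False
        self.pending.append(reg)
        return True

    def is_locked(self, reg: int) -> bool:
        """A read of `reg` must wait while any pending write targets it."""
        return reg in self.pending

    def write_back(self) -> int:
        """Complete the oldest pending write and release its lock."""
        return self.pending.popleft()


if __name__ == "__main__":
    locks = LockFifo()
    locks.lock(3)               # first instruction will write r3
    locks.lock(5)               # second instruction will write r5
    print(locks.is_locked(3))   # a read of r3 must stall
    print(locks.is_locked(7))   # a read of r7 may proceed
    locks.write_back()          # result for r3 arrives
    print(locks.is_locked(3))   # r3 is readable again
```

Releasing locks strictly in issue order is a simplifying assumption made here so that reads never need to know which pending write finishes first.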
Superscalar Performance in a Multithreaded Microprocessor
1993
Cited by 4 (0 self)
Multithreaded processors, having hardware support for the concurrent execution of fine-grained threaded computations, are noted for their latency tolerance and low-cost synchronization. Multithreading is a technique for improving the utilization of processing elements (PEs) in parallel processing systems, thereby reducing cost/performance ratios. With increasing integrated circuit densities it is becoming feasible to integrate several PEs onto a single die, and further diminish the physical dimensions of parallel systems. However, by eliminating the artificial on-chip PE boundaries and sharing expensive resources in a more tightly coupled multithreaded architecture, even greater performance can be achieved from similar hardware. A multithreaded processor architecture (Concurro) was designed for possible microprocessor implementation with the objective of multiple instruction issues per cycle (sustained superscalar performance) by means of multithreading. This thesis considers the tra...
Design and VLSI Implementation of an Address Generation Coprocessor
1995
Cited by 4 (1 self)
Most applications of general purpose VLSI processors are developed using high level languages. In these languages, information is generally handled in a structured form. Compilers generate a considerable amount of code to navigate through the data structures and considerable processing time is spent performing address calculations required to access the data structures. In this paper, an alternative to software address generation, a hardware memory reconfiguring unit or an address generation coprocessor is presented. To demonstrate the VLSI feasibility of the designed device, it is implemented in VLSI using the Octtool suite of tools. The tools used and the implementation procedure are described. VLSI design aspects such as regularity, modularity, scalability, etc. are discussed. It is observed that this chip is suitable for fault-tolerant design incorporating some redundancy and for wafer-scale integration. The performance of the device is evaluated using assembly language programs th...
Specification and Verification of Pipelining in the ARM2 RISC Microprocessor
ACM Transactions on Design Automation of Electronic Systems, 1997
Cited by 3 (0 self)
Abstract State Machines (ASMs) provide a sound mathematical basis for the specification and verification of systems. An application of the ASM methodology to the verification of a pipelined microprocessor (an ARM2 implementation) is described. Both the sequential execution model and final pipelined model are formalized using ASMs. A series of intermediate models are introduced that gradually expose the complications of pipelining. The first intermediate model is proven equivalent to the sequential model in the absence of structural, control, and data hazards. In the following steps, these simplifying assumptions are lifted one by one, and the original proof is refined to establish the equivalence of each intermediate model with the sequential model, leading ultimately to a full proof of equivalence of the sequential and pipelined models.
Categories and Subject Descriptors: B.5.2 [Hardware]: Register transfer level implementation - Design Aids; C.1.1 [Computer Systems Organization]: Processor ...
Strategies For The Modelling And Simulation Of Asynchronous Computer Architectures
1995
Cited by 2 (1 self)
Preface
Acknowledgements
1 Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Structure of the Thesis
1.3.1 Related Publications
2 The Quest for High Performance
2.1 Introduction
2.2 Bit and Instruction Level Parallelism
2.3 Reduced Instruction Set Computers
2.4 The Limits of Sequential Computation
2.5 Parallel Computer Architectures
2.5.1 SIMD
2.5.2 MIMD
2.5.2.1 Shared Memory MIMD Architectures
2.5.2.2 Distributed M...

