Automated Bus Generation for Multiprocessor SoC Design, 2003
Cited by 16 (0 self)
The performance of a system, especially a multiprocessor system, heavily depends upon the efficiency of its bus architecture. This paper presents a methodology to generate a custom bus system for a multiprocessor System-on-a-Chip (SoC). Our bus synthesis tool (BusSyn) uses this methodology to generate five different bus systems as examples: Bi-FIFO Bus Architecture (BFBA), Global Bus Architecture Version I (GBAVI), Global Bus Architecture Version III (GBAVIII), Hybrid Bus Architecture (Hybrid) and Split Bus Architecture (SplitBA). We verify and evaluate the performance of each bus system in the context of two applications: an Orthogonal Frequency Division Multiplexing (OFDM) wireless transmitter and an MPEG2 decoder. This methodology gives the designer a great benefit in fast design space exploration of bus architectures across a variety of performance-impacting factors such as bus types, processor types and software programming style. In this paper, we show that BusSyn can generate buses that achieve superior performance when compared to a simple General Global Bus Architecture (GGBA) (e.g., 16.44% performance improvement in the case of the OFDM transmitter) or when compared to the CoreConnect Bus Architecture (CCBA) (e.g., 15.54% performance improvement in the case of the MPEG2 decoder). In addition, the bus architecture generated by BusSyn is designed in a matter of seconds instead of the weeks needed for the hand design of a custom bus system.
Integrated intra- and inter-task cache analysis for preemptive multi-tasking real-time systems
- In Proceedings of the 8th International Workshop, SCOPES 2004, Lecture Notes in Computer Science, LNCS 3199, 2004
Cited by 11 (0 self)
In this paper, we propose a timing analysis approach for preemptive multi-tasking real-time systems with caches. The approach focuses on the cache reload overhead caused by preemptions. The Worst Case Response Time (WCRT) of each task is estimated by incorporating cache reload overhead. After acquiring the WCRT of each task, we can further analyze the schedulability of the system. Four sets of applications are used to exhibit the performance of our approach. The experimental results show that our approach can reduce the estimate of WCRT by up to 44% over prior state-of-the-art.
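The idea of folding cache reload overhead into a WCRT estimate and then checking schedulability can be sketched with the standard fixed-priority response-time recurrence. This is a minimal illustration, not the analysis from the paper: the function name `wcrt`, the single constant `reload_cost` charged per preemption, and the integer time units are all assumptions made for the example.

```python
def wcrt(tasks, i, reload_cost, deadline=None, max_iter=1000):
    """Estimate the worst-case response time of task i.

    tasks: list of (wcet, period) tuples in integer time units,
           sorted from highest to lowest priority (assumed layout).
    reload_cost: hypothetical constant cache reload overhead charged
                 for every preemption by a higher-priority task.
    Returns the response time, or None if it exceeds the deadline
    (the period by default) or does not converge.
    """
    c_i, t_i = tasks[i]
    limit = t_i if deadline is None else deadline
    r = c_i
    for _ in range(max_iter):
        # Each release of a higher-priority task j within the window r
        # costs its execution time plus one cache reload.
        interference = sum(
            ((r + t_j - 1) // t_j) * (c_j + reload_cost)
            for c_j, t_j in tasks[:i]
        )
        r_next = c_i + interference
        if r_next > limit:
            return None  # task i would miss its deadline
        if r_next == r:
            return r  # fixed point reached
        r = r_next
    return None


def schedulable(tasks, reload_cost):
    """A task set passes if every task has a bounded WCRT within its period."""
    return all(wcrt(tasks, i, reload_cost) is not None for i in range(len(tasks)))
```

For example, with `tasks = [(1, 4), (2, 6), (3, 12)]`, calling `schedulable(tasks, 0)` and `schedulable(tasks, 1)` shows how a per-preemption reload cost can turn a schedulable task set into an unschedulable one. A tighter reload estimate therefore translates directly into a tighter WCRT.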
The System-on-a-Chip Lock Cache, 2004
Cited by 10 (3 self)
CONTENTS
DEDICATION
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
SUMMARY
I INTRODUCTION
1.1 Problem Statement
1.2 Thesis Contributions
1.3 Thesis Organization and Roadmap
II BACKGROUND AND PREVIOUS WORK
2.1 Locking Schemes
2.1.1 Hardware Instructions for Locking
2.1.2 Traditional Spin-Lock
Hardware Support for Real-Time Embedded Multiprocessor System-on-a-Chip Memory Management
- Proceedings of the Tenth International Symposium on Hardware/Software Codesign (CODES'02), 2002
Cited by 10 (2 self)
The aggressive evolution of the semiconductor industry -- smaller process geometries, higher densities, and greater chip complexity -- has provided design engineers the means to create complex, high-performance Systems-on-a-Chip (SoC) designs. Such SoC designs typically have more than one processor and a large memory, all on the same chip. Dealing with global on-chip memory allocation/deallocation in a dynamic yet deterministic way is an important issue for the upcoming billion-transistor multiprocessor SoC designs. To achieve this, we propose a memory management hierarchy we call Two-Level Memory Management. To implement this memory management scheme -- which presents a paradigm shift in the way designers look at on-chip dynamic memory allocation -- we present a System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the operating system management of memory allocated to a particular on-chip Processing Element). In this way, processing elements (heterogeneous or non-heterogeneous hardware or software) in an SoC can request and be granted portions of the global memory in a fast and deterministic time (for an example of a four processing element SoC, the dynamic memory allocation of the global on-chip memory takes sixteen cycles per allocation/deallocation in the worst case). In this paper, we show how to modify an existing Real-Time Operating System (RTOS) to support the newly proposed SoCDMMU. Our example shows that a multiprocessor SoC that uses the SoCDMMU achieves a 440% overall speedup of the application transition time over a fully shared memory that does not use the SoCDMMU.
Timing Analysis for Preemptive Multi-tasking Real-Time Systems
- Proceedings of Design, Automation and Test in Europe (DATE’04), 2004
Cited by 6 (2 self)
In this paper, we propose an approach to estimate the Worst Case Response Time (WCRT) of tasks in a preemptive multi-tasking single-processor real-time system with a set-associative cache. The approach focuses on analyzing the cache reload overhead caused by preemptions. We combine inter-task cache eviction behavior analysis and path analysis of the preempted task to reduce, in our analysis, the estimate of the number of cache lines that can possibly be evicted by the preempting task (thus requiring a reload by the preempted task). A mobile robot application that contains three tasks is used to test our approach. The experimental results show that our approach can tighten the WCRT estimate by up to 73% over prior state-of-the-art.
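To make the notion of inter-task cache eviction concrete, the sketch below counts, per cache set, how many lines of a preempted task a preempting task could displace in a set-associative cache. It is an illustrative estimate only and does not reproduce the paper's combined eviction and path analysis: the function names, the per-set cap `min(a, b, ways)`, and the representation of each task as a list of accessed byte addresses are assumptions made for this example.

```python
from collections import Counter


def lines_per_set(addresses, line_size, num_sets):
    """Count the distinct memory blocks a task maps into each cache set."""
    blocks = {addr // line_size for addr in addresses}
    return Counter(block % num_sets for block in blocks)


def overlap_lines(preempted, preempting, line_size, num_sets, ways):
    """Estimate how many cache lines of the preempted task can be evicted.

    For each cache set used by both tasks, the count is capped by the
    number of blocks either task maps there and by the associativity
    (hypothetical simplification for illustration).
    """
    a = lines_per_set(preempted, line_size, num_sets)
    b = lines_per_set(preempting, line_size, num_sets)
    shared_sets = a.keys() & b.keys()
    return sum(min(a[s], b[s], ways) for s in shared_sets)


# Example: 16-byte lines, 4 sets, 2-way set-associative cache.
preempted_task = [0, 64, 128, 16]      # three blocks in set 0, one in set 1
preempting_task = [192, 256, 320, 32]  # three blocks in set 0, one in set 2
evictable = overlap_lines(preempted_task, preempting_task, 16, 4, 2)
```

Only set 0 is shared in this example, so the estimate is limited by the associativity rather than by the total footprint of either task. Multiplying such a count by a per-line miss penalty gives one possible reload cost to plug into a response-time calculation.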
DX-Gt: Memory Management and Crossbar Switch Generator for Multiprocessor System-on-a-Chip, 2003
Cited by 4 (2 self)
As the number of transistors on a single chip increases rapidly, there is a productivity gap between the increasing number of available transistors and the design time. One solution to reduce this productivity gap is to increase the use of Intellectual Property (IP) cores. However, an IP core should be customized/configured before being used in a system different than the one for which it was designed. Thus, to reconfigure the IP core, either an engineer must spend significant effort altering the core by hand or else an enhanced CAD tool (IP generator) can automatically configure and customize the core according to the customer specifications.
Dynamic Memory Management for Embedded Real-Time Multiprocessor System on a Chip, 2003
Cited by 2 (0 self)
this memory management scheme -- which presents a shift in the way designers look at on-chip dynamic memory allocation -- we present the System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the management of memory allocated to a particular on-chip Processing Element, e.g., an operating system's management of memory allocated to a particular processor). In this way, processing elements (heterogeneous or non-heterogeneous hardware or software) in an SoC can request and be granted portions of the global memory in a fast and deterministic time. A new tool is introduced to generate a custom optimized version of the SoCDMMU. Also, a real-time operating system is modified to support the newly proposed SoCDMMU. We show an example where a shared-memory multiprocessor SoC that employs the Two-Level Memory Management and uses the SoCDMMU has an overall average speedup in application transition time as well as normal execution time.
A novel deadlock avoidance algorithm and its hardware implementation
- CODES+ISSS, 2004
Design Space Exploration of Multiprocessor Systems with Multicontext Reconfigurable Coprocessors
- In Proceedings of Engineering of Reconfigurable Systems and Algorithms, ERSA’07, 2007
Cited by 1 (0 self)
Future high-performance computing systems may consist of multiple processors and reconfigurable logic co-processors. As indicated by industry trends, such co-processors will be integrated on existing motherboards without any glue logic. It is likely that such hybrid computing machines will be a breakthrough for various high-performance applications. As a result, it has become essential to investigate the system architectures of such machines. This paper describes a full-system simulation approach to model and evaluate hybrid computing systems made up of multiple processors and coarse-grained reconfigurable logic co-processors. We develop a full-system simulator for such hybrid machines by extending an existing full-system simulator to have device models for multicontext coarse-grained reconfigurable logic co-processors. The proposed full-system simulator is able to execute an unmodified multiprocessor operating system and a multiband filtering application. Using this full-system simulation approach, we have investigated the tradeoffs among various system architectures.
Automated Generation of Round-robin Arbitration and Crossbar Switch Logic, 2003
Cited by 1 (0 self)
“In his heart a man plans his course, but the LORD determines his steps....” – Proverbs 16:9
To my parents
ACKNOWLEDGMENTS
During my Ph.D. study, there have been many people at Georgia Tech to whom I am thankful. First of all, I would like to express enormous appreciation to my adviser, Dr. Vincent J. Mooney III, from the bottom of my heart. In addition to the enthusiasm and professionalism he dedicates to all members of our Codesign group, Dr. Mooney has supported and encouraged me in developing my thesis. In our regular weekly meetings, he has listened to my ideas patiently, and we have brainstormed through short question-and-answer sessions. He has also helped me improve my writing with logical reasoning and has corrected my English pronunciation. His technical acumen, integrity and concern for all members of Codesign are remarkable

