Synthesis of Hardware Models in C with Pointers and Complex Data Structures
- IEEE TRANSACTIONS ON VLSI SYSTEMS
, 2001
Cited by 23 (0 self)
One of the greatest challenges in a C/C++-based design methodology is efficiently mapping C/C++ models into hardware. Many networking and multimedia applications implemented in hardware or mixed hardware/software systems now use complex data structures stored in multiple memories, so many C/C++ features that were originally designed for software applications are now making their way into hardware. Such features include dynamic memory allocation and pointers for managing data. We present a solution for efficiently mapping arbitrary C code with pointers and malloc/free into hardware. Our solution, which fits current memory management methodologies, instantiates an application-specific hardware memory allocator coupled with a memory architecture. Our work also supports the resolution of pointers without restriction on the data structures. We present an implementation based on the SUIF framework along with case studies such as the realization of a video filter and an ATM segmentation engine.
Hardware Support for Real-Time Embedded Multiprocessor System-on-a-Chip Memory Management
- PROCEEDINGS OF THE TENTH INTERNATIONAL SYMPOSIUM ON HARDWARE/SOFTWARE CODESIGN (CODES'02)
, 2002
Cited by 10 (2 self)
The aggressive evolution of the semiconductor industry -- smaller process geometries, higher densities, and greater chip complexity -- has provided design engineers the means to create complex, high-performance Systems-on-a-Chip (SoC) designs. Such SoC designs typically have more than one processor and huge memory, all on the same chip. Dealing with global on-chip memory allocation/de-allocation in a dynamic yet deterministic way is an important issue for the upcoming billion-transistor multiprocessor SoC designs. To achieve this, we propose a memory management hierarchy we call Two-Level Memory Management. To implement this memory management scheme -- which presents a paradigm shift in the way designers look at on-chip dynamic memory allocation -- we present a System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the operating system management of memory allocated to a particular on-chip Processing Element). In this way, processing elements (heterogeneous or non-heterogeneous hardware or software) in an SoC can request and be granted portions of the global memory in fast, deterministic time (for an example SoC with four processing elements, dynamic allocation of the global on-chip memory takes sixteen cycles per allocation/deallocation in the worst case). In this paper, we show how to modify an existing Real-Time Operating System (RTOS) to support the proposed SoCDMMU. Our example shows that a multiprocessor SoC that utilizes the SoCDMMU has a 440% overall speedup in application transition time over fully shared memory that does not utilize the SoCDMMU.
DMMX: Dynamic memory management extensions
- ICCD Workshop on Hardware Support for Objects and Microarchitectures for Java
, 2002
Cited by 6 (1 self)
Automatic Dynamic Memory Management (ADMM) allows programmers to be more productive and increases system reliability and functionality. However, ADMM algorithms are known to be slow and non-deterministic. It is well known that object-oriented applications tend to be dynamic-memory intensive. Therefore, programmers must decide whether the benefits of ADMM outweigh its shortcomings. In many object-oriented real-time and embedded systems, programmers agree that the shortcomings are too severe for ADMM to be used in their applications. These programmers, while using Java or C++ as the development language, therefore decide to allocate memory either statically or on the stack. In this paper, we present the design of an application-specific instruction extension called the Dynamic Memory Management eXtension (DMMX) that would allow automatic dynamic memory management to be done in hardware. Our high-performance scheme allows both allocation and garbage collection to be done in a predictable fashion. Allocation is done through a modified buddy system that allows constant-time object creation. The garbage collection algorithm is mark-sweep, where the sweeping phase can be accomplished in constant time. This hardware scheme would greatly improve the speed and predictability of ADMM. Additionally, our proposed scheme is an add-on approach, which allows easy integration into any CPU, hardware-implemented Java Virtual Machine (JVM), or Processor in Memory (PIM). Index terms: hardware support for garbage collection, real-time garbage collector, mark-sweep garbage collector, application specific instruction extension, object-oriented programming
DX-Gt: Memory Management and Crossbar Switch Generator for Multiprocessor System-on-a-Chip
, 2003
Cited by 4 (2 self)
As the number of transistors on a single chip increases rapidly, there is a productivity gap between the increasing number of available transistors and the design time. One solution to reduce this productivity gap is to increase the use of Intellectual Property (IP) cores. However, an IP core should be customized/configured before being used in a system different than the one for which it was designed. Thus, to reconfigure the IP core, either an engineer must spend significant effort altering the core by hand or else an enhanced CAD tool (IP generator) can automatically configure and customize the core according to the customer specifications.
A Hardware Implementation of Realloc Function
- Proceedings of WVLSI'99 IEEE Annual Workshop on VLSI
, 1999
Cited by 3 (1 self)
The memory-intensive nature of object-oriented languages such as C++ and Java has created the need for high-performance dynamic memory management. Object-oriented applications often generate higher memory intensity in the heap region. Thus, a high-performance memory manager is needed to cope with such applications. As today’s VLSI technology advances, it becomes more and more attractive to map basic software algorithms such as malloc(), free(), and realloc() into hardware. This paper presents a hardware design of the realloc function that fully utilizes the advantage of combinational logic. There are two steps needed to complete a reallocation process: (a) try to reallocate on the original memory block and (b) if (a) fails, allocate another memory block and copy the contents of the original block to this new location. In our scheme, (a) can be done in constant time. For (b), the allocation of the new memory block and the deallocation of the original block are done in constant time. The hardware complexity of the proposed scheme (i.e., X-unit, RS-unit, and ESG-unit) is O(n), where n represents the size of the bit-map. Index Terms: hardware algorithms, VLSI systems, dynamic memory management algorithms, expand(), realloc()
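The two-step reallocation described in this abstract can be illustrated with a small software model. The sketch below is only an assumption-laden simulation, not the paper's hardware design: the class name BitmapHeap, its methods, and the first-fit search are all hypothetical, and the linear scan used here is not constant time as the hardware units are reported to be. It shows step (a), growing on the original block when the adjacent bit-map entries are free, and step (b), allocating a new block, copying, and freeing the original.

```python
class BitmapHeap:
    """Toy heap with one bit-map flag per fixed-size block (True = in use).

    Hypothetical software model; names and layout are illustrative only.
    """

    def __init__(self, num_blocks):
        self.used = [False] * num_blocks  # the bit-map
        self.data = [None] * num_blocks   # contents of each block
        self.size = {}                    # start index -> length in blocks

    def alloc(self, n):
        """First-fit search for n contiguous free blocks; start index or None."""
        if n < 1:
            raise ValueError("n must be at least 1")
        run = 0
        for i, bit in enumerate(self.used):
            run = 0 if bit else run + 1
            if run == n:
                start = i - n + 1
                for j in range(start, i + 1):
                    self.used[j] = True
                self.size[start] = n
                return start
        return None

    def free(self, start):
        """Release the allocation that begins at start."""
        n = self.size.pop(start)
        for j in range(start, start + n):
            self.used[j] = False
            self.data[j] = None

    def realloc(self, start, n):
        """Resize the allocation at start to n blocks; new start or None."""
        if n < 1:
            raise ValueError("n must be at least 1")
        old = self.size[start]
        if n <= old:
            # Shrinking always succeeds on the original block.
            for j in range(start + n, start + old):
                self.used[j] = False
                self.data[j] = None
            self.size[start] = n
            return start
        # Step (a): try to grow on the original block.
        end = start + n
        if end <= len(self.used) and not any(self.used[start + old:end]):
            for j in range(start + old, end):
                self.used[j] = True
            self.size[start] = n
            return start
        # Step (b): allocate elsewhere, copy the contents, free the original.
        new = self.alloc(n)
        if new is None:
            return None  # the original allocation is left untouched
        self.data[new:new + old] = self.data[start:start + old]
        self.free(start)
        return new
```

In this model, a caller that receives None from realloc still owns the original block, which mirrors the usual contract of the C library realloc().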
Efficient dynamic heap allocation of scratch-pad memory
- In ISMM ’08: Proceedings of the 7th international symposium on Memory management
, 2008
Cited by 3 (0 self)
An increasing number of processor architectures support scratch-pad memory – software-managed on-chip memory. Scratch-pad memory provides low-latency data storage, like on-chip caches, but under explicit software control. The simple design and predictable nature of scratch-pad memories have seen them incorporated into a number of embedded and real-time system processors. They are also employed by multi-core architectures to isolate processor core local data and act as low-latency inter-core shared memory. Managing scratch-pad memory by hand is time consuming, error prone and potentially wasteful; tools that automatically manage this memory are essential for its use by general purpose software. While there has been promising work in compile-time allocation of scratch-pad memory, there will always be applications that require run-time allocation. Modern dynamic memory management
Dynamic Memory Management for Embedded Real-Time Multiprocessor System on a Chip
, 2003
Cited by 2 (0 self)
this memory management scheme -- which presents a shift in the way designers look at on-chip dynamic memory allocation -- we present the System-on-a-Chip Dynamic Memory Management Unit (SoCDMMU) for allocation of the global on-chip memory, which we refer to as Level Two memory management (Level One is the management of memory allocated to a particular on-chip Processing Element, e.g., an operating system's management of memory allocated to a particular processor). In this way, processing elements (heterogeneous or non-heterogeneous hardware or software) in an SoC can request and be granted portions of the global memory in fast, deterministic time. A new tool is introduced to generate a custom optimized version of the SoCDMMU. Also, a real-time operating system is modified to support the proposed SoCDMMU. We show an example where a shared-memory multiprocessor SoC that employs the Two-Level Memory Management and utilizes the SoCDMMU has an overall average speedup in application transition time as well as normal execution time
A High-Performance Hardware-Efficient Memory Allocation Technique and Design
- PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTER DESIGN
, 1999
Cited by 2 (0 self)
This paper presents a hardware-efficient memory allocation (EMA) technique designed to eliminate both the internal and external fragmentation that appear in the buddy system. EMA can allocate a free memory block of any size in any part of memory. A hardware implementation of EMA is introduced, but only part of its circuitry is shown in the paper due to space limitations. Simulation results show that EMA utilizes memory space more efficiently than previously known techniques.
A Page-based Hybrid (Software-Hardware) Dynamic Memory Allocator
Cited by 1 (0 self)
Modern programming languages often include complex mechanisms for dynamic memory allocation and garbage collection. These features drive the need for more efficient implementations of memory management functions, both in terms of memory usage and execution performance. In this paper, we introduce a software and hardware codesign to improve the speed of the software allocator used in FreeBSD systems. The hardware complexity of our design is independent of the dynamic memory size, thus making the allocator suitable for any memory size. Our design improves the performance of memory management intensive benchmarks by as much as 43%. To our knowledge, this is the first work of this kind, introducing a “hybrid memory allocator”.
Custom Microcoded Dynamic Memory Management for Distributed On-Chip Memory Organizations
Multi-Processor Systems-on-Chip (MPSoCs) have attracted significant attention since they are recognized as a scalable paradigm to interconnect and organize a high number of cores. Current multi-core embedded systems exhibit increased levels of dynamic behavior, leading to unexpected memory footprint variations unknown at design time. Dynamic Memory Management (DMM) is a promising solution for such types of dynamic systems. Although some efficient dynamic memory managers have been proposed for conventional bus-based MPSoC platforms, there are no DMM solutions that address the constraints and opportunities arising from the physical distribution of multiple memory nodes across the platform. In this work, we address the problem of providing customized microcoded DMM on MPSoC platforms with distributed memory organization. Customization is enabled at the application and platform levels. Results show that customized microcoded DMM can serve approximately 7× more allocation requests compared to pure distributed memory platforms and perform 25% faster than the corresponding high-level implementation in C language.

