Fine-Grain Distributed Shared Memory on Clusters of Workstations
, 1997
Cited by 30 (8 self)
Shared memory, one of the most popular models for programming parallel platforms, is becoming ubiquitous both in low-end workstations and high-end servers. With the advent of low-latency networking hardware, clusters of workstations strive to offer the same processing power as high-end servers for a fraction of the cost. In such environments, shared memory has been limited to page-based systems that control access to shared memory using the memory's page protection to implement shared memory coherence protocols. Unfortunately, false sharing and fragmentation problems force such systems to resort to weak consistency shared memory models that complicate the shared memory programming model.
Fine-Grain Protocol Execution Mechanisms & Scheduling Policies on SMP Clusters
, 1998
Cited by 3 (2 self)
Symmetric multiprocessor (SMP) clusters are emerging as the cost-effective medium- to large-scale parallel computers of choice, exploiting the superior cost-performance of SMP desktops and servers. These machines implement communication among SMP nodes by sending/receiving messages through an interconnection network. Many applications and systems use a variety of software protocols to coordinate this communication. As such, protocol performance can significantly impact communication time and overall system performance. This thesis proposes and evaluates techniques to improve fine-grain software protocol performance. Rather than provide embedded network interface processors, some systems schedule and execute the protocol code on the SMP processors to reduce hardware complexity and cost. This thesis evaluates when it is beneficial to dedicate one or more processors in every SMP to always execute the protocol code. Results from simulating a fine-grain software distributed shared memory (D...
On-chip COMA cache-coherence protocol for microgrids of microthreaded cores
- In Proc. Workshop on Highly Parallel Processing on a Chip (HPPC
, 2007
Cited by 3 (3 self)
This paper describes an on-chip COMA cache coherency protocol to support the microthread model of concurrent program composition. The model gives a sound basis for building multi-core computers as it captures concurrency, abstracts communication and identifies resources, such as processor groups explicitly and where mapping and scheduling is performed dynamically. The result is a model where binary compatibility is guaranteed over arbitrary numbers of cores and where backward binary compatibility is also assured. We present the design of a memory system with relaxed synchronisation and consistency constraints that matches the characteristics of this model. We exploit an on-chip COMA organisation, which provides a flexible and transparent partitioning between processors and memory. This paper describes the coherency protocol and consistency model and describes work undertaken on the validation of the model and the development of a co-simulator to the Microgrid CMP emulator.
A General Model of Concurrency and its Implementation as Many-core Dynamic RISC Processors
Cited by 2 (2 self)
This paper presents a concurrent execution model and its micro-architecture based on in-order RISC processors, which schedules instructions from large pools of contextualised threads. The model admits a strategy for programming chip multiprocessors using parallelising compilers based on existing languages. The model is supported in the ISA by a number of instructions to create and manage abstract concurrency. The paper estimates the cost of supporting these instructions in silicon. The model and its implementation use dynamic parameterisation of concurrency creation, where a single instruction captures asynchronous remote function execution, mutual exclusion and the execution of a general concurrent loop structure and all associated communication. Concurrent loops may be dependent or independent, bounded or unbounded and may be nested arbitrarily. Hierarchical concurrency allows compilers to restructure and parallelise sequential code to meet the strict constraints on the model, which provide its freedom from deadlock and locality of communication. Communication is implicit in both the model and micro-architecture, due to the dynamic distribution of concurrency. The result is location-independent binary code that may execute on any number of processors. Simulation and analysis of the micro-architecture indicate that the model is a strong candidate for the exploitation of many-core processors. The results show near-linear speedup over two orders of magnitude of processor scaling, good energy efficiency and tolerance to large latencies in asynchronous operations. This holds for independent threads as well as for reductions.
Evaluating CMPs and their Memory Architecture
Many-core processor architectures require scalable solutions that reflect the locality and power constraints of future generations of technology. This paper presents a CMP architecture that supports automatic mapping and dynamic scheduling of threads, leaving the binary code devoid of any explicit communication. The thrust of this approach is to produce binary code that is divorced from implementation parameters yet still gives good performance over future generations of CMPs. A key component of this abstract processor architecture is the memory system. This paper evaluates the memory architectures, which must maintain performance across a range of targets.

