Results 11 - 20 of 4,465
The SGI Origin: A ccNUMA Highly Scalable Server
In Proceedings of the 24th International Symposium on Computer Architecture (ISCA ’97), 1997
"... The SGI Origin 2000 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor designed and manufactured by Silicon Graphics, Inc. The Origin system was designed from the ground up as a multiprocessor capable of scaling to both small and large processor counts without any bandwidth, laten ..."
Cited by 497 (0 self)
Cache Equalizer: A Placement Mechanism for Chip Multiprocessor Distributed Shared Caches
"... This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by large asymmetry in cache sets ’ usages. CE decouples the physical locations of cache blocks from their addresses for the sake of reducing misse ..."
Cited by 2 (0 self)
C-AMTE: A Location Mechanism for Flexible Cache Management in Chip Multiprocessors
2009
"... This paper describes Constrained Associative-Mapping-of-Tracking-Entries (C-AMTE), a scalable mechanism to facilitate flexible and efficient distributed cache management in large-scale chip multiprocessors (CMPs). C-AMTE enables fast locating of cache blocks in CMP cache schemes that employ one-to-o ..."
Cited by 1 (1 self)
Efficient Cache Coherence Protocol in Tiled Chip Multiprocessors
"... Abstract — Although directory-based cache coher-ence protocols are the best choice when designing large-scale chip multiprocessors (CMPs), they in-troduce indirection to access directory information, which negatively impacts performance. In this work, we present DiCo-CMP, a cache coherence protocol ..."
Sparcle: An Evolutionary Processor Design for Large-Scale Multiprocessors
IEEE Micro, 1993
"... Sparcle is a processor chip developed jointly by MIT, LSI Logic, and SUN Microsystems, by evolving an existing RISC architecture towards a processor suited for large-scale multiprocessors. Sparcle supports three multiprocessor mechanisms: fast context switching, fast, user-level message handling, a ..."
Cited by 112 (21 self)
The Stanford FLASH multiprocessor
In Proceedings of the 21st International Symposium on Computer Architecture, 1994
"... The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine’s global memory, a port to the interconnection n ..."
Cited by 349 (20 self)
Address Remapping for Static NUCA in NoC-based Degradable Chip-Multiprocessors
"... Abstract—Large scale Chip-Multiprocessors (CMPs) generally employ Network-on-Chip (NoC) to connect the last level cache (LLC), which is generally organized as distributed NUCA (non-uniform cache access) arrays for scalability and efficiency. On the other hand, aggressive technology scaling induces s ..."
Cited by 1 (0 self)
Predictive Analysis of a Hydrodynamics Application on Large-Scale CMP Clusters
Computer Science - Research and Development
"... Abstract We present the development of a predictive performance model for the high-performance computing code Hydra, a hydrodynamics benchmark developed and maintained by the United Kingdom Atomic Weapons Establishment (AWE). The developed model elucidates the parallel computation of Hydra, with whi ..."
Abstract
- Add to MetaCart
, with which it is possible to predict its run-time and scaling performance on varying large-scale chip multiprocessor (CMP) clusters. A key feature of the model is its granularity; with the model we are able to separate the contributing costs, including computation, point-topoint communications, collectives
Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors
Journal of Parallel and Distributed Computing, 1991
"... The large latency of memory accesses is a major obstacle in obtaining high processor utilization in large scale shared-memory multiprocessors. Although the provision of coherent caches in many recent machines has alleviated the problem somewhat, cache misses still occur frequently enough that they s ..."
Cited by 302 (18 self)
Disco: Running commodity operating systems on scalable multiprocessors
ACM Transactions on Computer Systems, 1997
"... In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort. Our approach brings back an idea popular in the 1970s, virtual machine monitors. We use virtual machines to run multiple c ..."
Cited by 253 (10 self)