Results 1-10 of 198
Shasta: A Low Overhead, Software-Only Approach for Supporting Fine-Grain Shared Memory
"... This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granulari ..."
Cited by 236 (5 self)
Hierarchical Image Caching for Accelerated Walkthroughs of Complex Environments
, 1996
"... We present a new method that utilizes path coherence to accelerate walkthroughs of geometrically complex static scenes. As a preprocessing step, our method constructs a BSP-tree that hierarchically partitions the geometric primitives in the scene. In the course of a walkthrough, images of nodes at v ..."
Cited by 184 (10 self)
The Alpha 21364 Network Architecture
- IEEE Micro
"... The Alpha 21364 processor provides a high-performance, highly scalable, and highly reliable network architecture. The router runs at 1.2 GHz and routes packets at a peak bandwidth of 22.4 GB/s. The network architecture scales up to a 128-processor configuration, which can support up to four terabytes ..."
Cited by 93 (0 self)
terabytes of distributed Rambus memory and hundreds of terabytes of disk storage. The distributed Rambus memory is kept coherent via a scalable, directory-based, cache coherence scheme. The network also provides a variety of reliability features, such as per-flit ECC. These features make the 21364 network
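The snippet names a directory-based coherence scheme without describing it. As a rough illustration of why a directory scales better than broadcast, here is a minimal sketch of a generic full-map directory: each block's home entry records its sharers or exclusive owner, so a write sends invalidations only to nodes that actually hold a copy. The states (U/S/E), the `Directory` and `DirEntry` classes, and the message names are hypothetical; this is not the 21364's actual protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class DirEntry:
    state: str = "U"                       # hypothetical: U = uncached, S = shared, E = exclusive
    sharers: Set[int] = field(default_factory=set)
    owner: Optional[int] = None            # node holding the block exclusively


class Directory:
    """Home-node directory: one entry per memory block, recording which
    nodes hold a copy so coherence messages go only to those nodes."""

    def __init__(self) -> None:
        self.entries: Dict[int, DirEntry] = {}
        self.messages: List[Tuple[str, int, int]] = []  # (kind, target node, block)

    def _entry(self, block: int) -> DirEntry:
        return self.entries.setdefault(block, DirEntry())

    def read(self, node: int, block: int) -> None:
        e = self._entry(block)
        if e.state == "E":
            if e.owner == node:
                return                     # owner already holds the block
            # Ask the exclusive owner to write back and keep a shared copy.
            self.messages.append(("downgrade", e.owner, block))
            e.sharers = {e.owner}
            e.owner = None
        e.state = "S"
        e.sharers.add(node)

    def write(self, node: int, block: int) -> None:
        e = self._entry(block)
        holders = set(e.sharers)
        if e.owner is not None:
            holders.add(e.owner)
        # Invalidate only the nodes that hold a copy; no broadcast.
        for other in sorted(holders - {node}):
            self.messages.append(("invalidate", other, block))
        e.state, e.owner, e.sharers = "E", node, set()


d = Directory()
d.read(0, 0x40)
d.read(1, 0x40)
d.write(2, 0x40)   # invalidates nodes 0 and 1 only
d.read(0, 0x40)    # downgrades node 2, which keeps a shared copy
print(d.messages)
```

The message count per write grows with the number of actual sharers rather than with the machine size, which is the property that lets a directory scheme scale to large processor counts.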
Automatic Software Cache Coherence through Vectorization
- In Proceedings of the 1992 International Conference on Supercomputing
, 1992
"... Access latency in large-scale shared-memory multiprocessors is a concern since most (if not all) memory is one or more hops away through an interconnection network. Providing processors with one or more levels of cache is an accepted way to reduce the average access latency; however, in a multipro ..."
Cited by 17 (1 self)
multiprocessor, cached values must be kept coherent for the multiprocessor to support the abstraction of a shared global memory. There is no generally accepted hardware solution to provide cache coherence for large-scale shared-memory multiprocessors. Software coherence strategies offer scalability with current
Verification Techniques for Cache Coherence Protocols
, 1997
"... ion and Specification Using FSMs Although there is a variety of ways to specify a protocol model, we are interested in methodologies that employ finite state machines (FSMs) to form protocol models. Because cache protocols are essentially composed of component processes such as memory and cache cont ..."
Cited by 43 (0 self)
finite state machine FSM_c and the protocol machine is composed of all FSM_c's. Inputs to these machines are processor-generated events and messages for maintaining data consistency. In general, the protocol models are abstracted representations. They are often kept simple to make
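The snippet models each cache controller as a finite state machine FSM_c and the protocol as the composition of all of them, driven by processor events. A minimal sketch of that idea, assuming a hypothetical three-state MSI controller (the snippet names no states) and using explicit-state breadth-first reachability over the composed machine to check a single-writer invariant. The names `step`, `coherent`, and `explore` are illustrative, and the coherence messages are folded into each transition rather than modeled as separate queued messages.

```python
from collections import deque
from itertools import product

# Hypothetical MSI states for one cache controller (FSM_c).
INVALID, SHARED, MODIFIED = "I", "S", "M"
EVENTS = ("read", "write", "evict")   # processor-generated events


def step(states, cpu, event):
    """Apply one processor event at cache `cpu`, plus the coherence actions
    it implies at the other caches; return the new global state tuple."""
    s = list(states)
    if event == "read" and s[cpu] == INVALID:
        # Read miss: any modified copy elsewhere is downgraded to shared.
        s = [SHARED if x == MODIFIED else x for x in s]
        s[cpu] = SHARED
    elif event == "write":
        # Write: every other copy is invalidated.
        s = [INVALID] * len(s)
        s[cpu] = MODIFIED
    elif event == "evict":
        s[cpu] = INVALID
    return tuple(s)


def coherent(states):
    """Single-writer invariant: a modified copy excludes all other copies."""
    modified = states.count(MODIFIED)
    valid = sum(1 for x in states if x != INVALID)
    return modified == 0 or (modified == 1 and valid == 1)


def explore(n_caches):
    """BFS over the composed protocol machine; returns (violation or None, reached)."""
    start = (INVALID,) * n_caches
    seen = {start}
    frontier = deque([start])
    while frontier:
        states = frontier.popleft()
        if not coherent(states):
            return states, seen        # counterexample global state
        for cpu, event in product(range(n_caches), EVENTS):
            nxt = step(states, cpu, event)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return None, seen


if __name__ == "__main__":
    bad, reached = explore(3)
    print("violation:", bad, "| reachable global states:", len(reached))
```

Because the global state space is the product of the per-controller machines, it grows exponentially with the number of caches, which is one reason such protocol models are kept abstract and small.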
Coherent Parallel Hashing
, 2011
"... (a) The flower image is a 3820 × 3820 image (14.5 million pixels) and contains 3.7 million non–white pixels. The coordinates of these pixels are shown as colors in (b). We store the image in a hash table under a 0.99 load factor: the hash table contains only 3.73 million entries. These are used as key ..."
Cited by 9 (1 self)
.5 ms. The visible structures are due to preserved coherence. This translates to faster access as neighboring threads perform similar operations and access nearby memory. (e) Neighboring keys are kept together during probing, thereby improving the coherence of memory accesses of neighboring threads
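The snippet says neighboring keys are kept together during probing but does not give the hash functions used. A minimal sketch of one simple way to get that behavior, assuming open addressing where every key shares the same sequence of probe offsets: at probe step i, keys k and k + 1 examine slots (k + o_i) mod m and (k + 1 + o_i) mod m, which are adjacent. Pixel coordinates are linearized row-major so horizontally adjacent pixels become adjacent keys. `CoherentTable`, `pixel_key`, and the random offsets are hypothetical; this is not the paper's exact scheme, and it runs serially rather than on parallel threads.

```python
import random


def pixel_key(x, y, width):
    """Row-major linearization: horizontally adjacent pixels get adjacent keys."""
    return y * width + x


class CoherentTable:
    """Open addressing with one probe-offset sequence shared by all keys.
    Insert-only (no deletion), so lookups may stop at the first empty slot."""

    def __init__(self, capacity, max_probes=64, seed=0):
        rng = random.Random(seed)
        self.m = capacity
        # First probe is the key itself; later probes add shared random offsets.
        self.offsets = [0] + [rng.randrange(capacity) for _ in range(max_probes - 1)]
        self.keys = [None] * capacity
        self.values = [None] * capacity

    def _slots(self, key):
        return ((key + off) % self.m for off in self.offsets)

    def insert(self, key, value):
        for slot in self._slots(key):
            if self.keys[slot] is None or self.keys[slot] == key:
                self.keys[slot], self.values[slot] = key, value
                return slot
        raise RuntimeError("probe sequence exhausted; raise max_probes or capacity")

    def lookup(self, key):
        for slot in self._slots(key):
            if self.keys[slot] is None:
                return None
            if self.keys[slot] == key:
                return self.values[slot]
        return None


WIDTH = 16                              # hypothetical image width
table = CoherentTable(capacity=101)
for x in range(3, 9):                   # a short run of non-white pixels on row 2
    table.insert(pixel_key(x, 2, WIDTH), (x, 2))
```

In this sketch the run of pixels occupies a contiguous block of slots, so threads handling neighboring pixels would touch nearby memory, which is the access-coherence effect the snippet describes.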
Enhancements to Directional Coherence Maps
- University of West Bohemia
, 2001
"... Directional Coherence Maps as proposed by Guo in '98 are a very efficient acceleration technique for ray tracing based global illumination renderers. It vastly reduces the number of pixels which have to be computed exactly by identifying regions which are suitable for interpolation. By using ..."
Cited by 1 (0 self)
K2: A Mobile Operating System for Heterogeneous Coherence Domains
- In Proc. ACM ASPLOS
, 2014
"... Mobile System-on-Chips (SoC) that incorporate heterogeneous coherence domains promise high energy efficiency to a wide range of mobile applications, yet are difficult to program. To exploit the architecture, a desirable, yet missing capability is to replicate operating system (OS) services over mult ..."
Cited by 14 (1 self)
with its two kernels running on top of the two coherence domains of OMAP4. The two kernels have independent instances of core OS services, such as page allocator and interrupt management, as coordinated by K2; the two kernels share most extended OS services, such as device drivers, whose state is kept
Scheduling to Reduce Memory Coherence Overhead on Coarse-Grain Multiprocessors
, 1995
"... Some Distributed Shared Memory (DSM) and Cache-Only Memory Architecture (COMA) multiprocessors keep processes near the data they reference by transparently replicating remote data in the processes' local memories. This automatic replication of data can impose substantial memory system overhead ..."
on an application since all replicated data must be kept coherent. We examine the effect of task scheduling on data replication and memory system overhead due to coherency requirements. We show that simple policies using programmer hints can reduce memory coherence overhead in our workload applications.
Vantage: Scalable and Efficient Fine-Grain Cache Partitioning
- In Proc. of the International Symposium on Computer Architecture (ISCA), SJ
, 2011
"... Shared caches are pervasive in chip multiprocessors (CMPs). In particular, CMPs almost always feature a large, fully shared last-level cache (LLC) to mitigate the high latency, high energy, and limited bandwidth of main memory. A shared LLC has several advantages over multiple, private LLCs: i ..."
Cited by 35 (11 self)
: it increases cache utilization, accelerates intercore communication (which happens through the cache instead of main memory), and reduces the cost of coherence (because only non-fully-shared caches must be kept coherent). Unfortunately, these advantages come at a significant cost. When multiple applications