Results 1 - 10
of
30,388
A Performance Study on Bounteous Transfer in Multiprocessor Sectored Caches
- in Proc. of 11th Annual International Symposium on High Performance Computing Systems
, 1997
"... In a sectored cache, a cache line is divided into several subblocks. Each subblock is a basic coherence unit. In this way partial block invalidation can be done on cache coherent multiprocessors. Bounteous transfer is a scheme in sectored caches in which a subblock holder supplies extra subblocks ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
In a sectored cache, a cache line is divided into several subblocks. Each subblock is a basic coherence unit. In this way partial block invalidation can be done on cache coherent multiprocessors. Bounteous transfer is a scheme in sectored caches in which a subblock holder supplies extra
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
- ACM Transactions on Computer Systems
, 1991
"... Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become marke ..."
Abstract
-
Cited by 567 (32 self)
- Add to MetaCart
-accessible ag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a
Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
, 1990
"... ..."
A high-performance, portable implementation of the MPI message passing interface standard
- Parallel Computing
, 1996
"... MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we d ..."
Abstract
-
Cited by 888 (67 self)
- Add to MetaCart
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we
Informed Prefetching and Caching
- In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles
, 1995
"... The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I/O-int ..."
Abstract
-
Cited by 404 (10 self)
- Add to MetaCart
The underutilization of disk parallelism and file cache buffers by traditional file systems induces I/O stall time that degrades the performance of modern microprocessor-based systems. In this paper, we present aggressive mechanisms that tailor file system resource management to the needs of I
Simultaneous Multithreading: Maximizing On-Chip Parallelism
, 1995
"... This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar’s multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide s ..."
Abstract
-
Cited by 802 (48 self)
- Add to MetaCart
multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. We evaluate several cache configurations made possible by this type of organization and evaluate tradeoffs between them. We also show that simultaneous multithreading
The SPLASH-2 programs: Characterization and methodological considerations
- INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 1995
"... The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental propertie ..."
Abstract
-
Cited by 1399 (12 self)
- Add to MetaCart
The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental
A Survey of active network Research
- IEEE Communications
, 1997
"... Active networks are a novel approach to network architecture in which the switches of the network perform customized computations on the messages flowing through them. This approach is motivated by both lead user applications, which perform user-driven computation at nodes within the network today, ..."
Abstract
-
Cited by 542 (29 self)
- Add to MetaCart
Active networks are a novel approach to network architecture in which the switches of the network perform customized computations on the messages flowing through them. This approach is motivated by both lead user applications, which perform user-driven computation at nodes within the network today
U-Net: A User-Level Network Interface for Parallel and Distributed Computing
- In Fifteenth ACM Symposium on Operating System Principles
, 1995
"... The U-Net communication architecture provides processes with a virtual view of a network interface to enable userlevel access to high-speed communication devices. The architecture, implemented on standard workstations using offthe-shelf ATM communication hardware, removes the kernel from the communi ..."
Abstract
-
Cited by 596 (17 self)
- Add to MetaCart
, as well as novel abstractions like Active Messages can be implemented efficiently. A U-Net prototype on an 8-node ATM cluster of standard workstations offers 65 microseconds round-trip latency and 15 Mbytes/sec bandwidth. It achieves TCP performance at maximum network bandwidth and demonstrates
Results 1 - 10
of
30,388