Results 1 - 10
of
60,946
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
- ACM Transactions on Computer Systems
, 1991
"... Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become marke ..."
Abstract
-
Cited by 567 (32 self)
- Add to MetaCart
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible ag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than
Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
- In Proceedings of the 17th Annual International Symposium on Computer Architecture
, 1990
"... Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the f ..."
Abstract
-
Cited by 735 (18 self)
- Add to MetaCart
Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory
The Case for a Single-Chip Multiprocessor
- IEEE Computer
, 1996
"... Advances in IC processing allow for more microprocessor design options. The increasing gate density and cost of wires in advanced integrated circuit technologies require that we look for new ways to use their capabilities effectively. This paper shows that in advanced technologies it is possible to ..."
Abstract
-
Cited by 433 (6 self)
- Add to MetaCart
to implement a single-chip multiproces-sor in the same area as a wide issue superscalar processor. We find that for applications with little parallelism the performance of the two microarchitectures is comparable. For applications with large amounts of parallelism at both the fine and coarse grained levels
Introduction to redundant arrays of inexpensive disks
- Proceedings of the IEEE COMPCON
, 1989
"... Abstract Increasmg performance of CPUs and memorres wrll be squandered lf not matched by a sunrlm peformance ourease m II0 Whde the capactty of Smgle Large Expenstve D&T (SLED) has grown rapuily, the performance rmprovement of SLED has been modest Redundant Arrays of Inexpensive Disks (RAID), ba ..."
Abstract
-
Cited by 846 (55 self)
- Add to MetaCart
Abstract Increasmg performance of CPUs and memorres wrll be squandered lf not matched by a sunrlm peformance ourease m II0 Whde the capactty of Smgle Large Expenstve D&T (SLED) has grown rapuily, the performance rmprovement of SLED has been modest Redundant Arrays of Inexpensive Disks (RAID), based on the magnetic duk technology developed for personal computers, offers an attractive alternattve IO SLED, promtang onprovements of an or&r of mogm&e m pctformance, rehabdlty, power consumption, and scalalnlrty Thu paper rntroducesfivc levels of RAIDS, grvmg rheu relative costlpetfotmance, and compares RAID to an IBM 3380 and a Fupisu Super Eagle 1 Background: Rlsrng CPU and Memory Performance The users of computers are currently enJoymg unprecedented growth m the speed of computers Gordon Bell said that between 1974 and 1984. smgle chip computers improved m performance by 40 % per year, about twice the rate of mmlcomputers [Bell 841 In the followmg year B111 Joy
Sorting networks and their applications
, 1968
"... To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing ..."
Abstract
-
Cited by 660 (0 self)
- Add to MetaCart
To achieve high throughput rates today's computers perform several operations simultaneously. Not only are I/O operations performed concurrently with computing, but also, in multiprocessors, several computing
The Stanford FLASH multiprocessor
- In Proceedings of the 21st International Symposium on Computer Architecture
, 1994
"... The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine’s global memory, a port to the interconnection n ..."
Abstract
-
Cited by 349 (20 self)
- Add to MetaCart
The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine’s global memory, a port to the interconnection
Transactional Memory: Architectural Support for Lock-Free Data Structures
"... A shared data structure is lock-free if its operations do not require mutual exclusion. If one process is interrupted in the middle of an operation, other processes will not be prevented from operating on that object. In highly concurrent systems, lock-free data structures avoid common problems asso ..."
Abstract
-
Cited by 1006 (24 self)
- Add to MetaCart
associated with conventional locking techniques, including priority inversion, convoying, and difficulty of avoiding deadlock. This paper introduces transactional memory, a new multiprocessor architecture intended to make lock-free synchronization as efficient (and easy to use) as conventional techniques
Results 1 - 10
of
60,946