
Synchronization and communication in the T3E multiprocessor (1996)

by Steven L Scott
Venue: In ASPLOS VII

Results 1 - 10 of 93

Titanium: A High-Performance Java Dialect

by Kathy Yelick, Luigi Semenzato, Geoff Pike, Carleton Miyamoto, Ben Liblit, Arvind Krishnamurthy, Paul Hilfinger, Susan Graham, David Gay, Phil Colella, Alex Aiken - In ACM, 1998
Cited by 192 (27 self)
Titanium is a language and system for high-performance parallel scientific computing. Titanium uses Java as its base, thereby leveraging the advantages of that language and allowing us to focus ...

The Landscape of Parallel Computing Research: A View from Berkeley

by Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, Katherine A. Yelick, Jim Demmel - TECHNICAL REPORT, UC BERKELEY, 2006
Cited by 187 (13 self)

The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus

by Steven L. Scott, et al., 1996
Cited by 111 (4 self)
This paper describes the interconnection network used in the Cray T3E multiprocessor. The network is a bidirectional 3D torus with fully adaptive routing, optimized virtual channel assignments, integrated barrier synchronization support and considerable fault tolerance. The routers are built with LSI’s 500K ASIC technology with custom transmitters/receivers driving low-voltage differential signals at 375 MHz, for a link data payload capacity of approximately 500 MB/s.

Effects of communication latency, overhead, and bandwidth in a cluster architecture

by Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson - In Proceedings of the 24th Annual International Symposium on Computer Architecture, 1997
Cited by 98 (5 self)
This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines results in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence to both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.

Vector Microprocessors

by Krste Asanovic - In Hot Chips VII, 1998
Cited by 62 (4 self)
Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor John Wawrzynek, Chair.
Most previous research into vector architectures has concentrated on supercomputing applications and small enhancements to existing vector supercomputer implementations. This thesis expands the body of vector research by examining designs appropriate for single-chip full-custom vector microprocessor implementations targeting a much broader range of applications. I present the design, implementation, and evaluation of T0 (Torrent-0): the first single-chip vector microprocessor. T0 is a compact but highly parallel processor that can sustain over 24 operations per cycle while issuing only a single 32-bit instruction per cycle. T0 demonstrates that vector architectures are well suited to full-custom VLSI implementation and that they perform well on many multimedia and human-machine interface tasks. The remainder of the thesis contains ...

Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems

by Andrea Carol Arpaci-Dusseau - ACM TRANSACTIONS ON COMPUTER SYSTEMS, 1998
Cited by 44 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...

LoPC: Modeling Contention in Parallel Algorithms

by Matthew I. Frank, Anant Agarwal, Mary K. Vernon, 1997
Cited by 41 (9 self)
Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the LoPC model, which is inspired by the LogP model but accounts for contention for message processing resources in parallel algorithms on a multiprocessor or network of workstations. LoPC takes the L, o, and P parameters directly from the LogP model and uses them to predict the cost of contention, C.

Adaptive History-Based Memory Schedulers

by Ibrahim Hur, Calvin Lin
Cited by 34 (2 self)
As memory performance becomes increasingly important to overall system performance, the need to carefully schedule memory operations also increases. This paper presents a new approach to memory scheduling that considers the history of recently scheduled operations. This history-based approach provides two conceptual advantages: (1) it allows the scheduler to better reason about the delays associated with its scheduling decisions, and (2) it allows the scheduler to select operations so that they match the program's mixture of Reads and Writes, thereby avoiding certain bottlenecks within the memory controller. We evaluate our solution using a cycle-accurate simulator for the recently announced IBM Power5. When compared with an in-order scheduler, our solution achieves IPC improvements of 10.9% on the NAS benchmarks and 63% on the data-intensive Stream benchmarks. Using microbenchmarks, we illustrate the growing importance of memory scheduling in the context of CMP's, hardware controlled prefetching, and faster CPU speeds.

Fine-Grain Distributed Shared Memory on Clusters of Workstations

by Ioannis T. Schoinas, 1997
Cited by 30 (8 self)
Shared memory, one of the most popular models for programming parallel platforms, is becoming ubiquitous both in low-end workstations and high-end servers. With the advent of low-latency networking hardware, clusters of workstations strive to offer the same processing power as high-end servers for a fraction of the cost. In such environments, shared memory has been limited to page-based systems that control access to shared memory using the memory's page protection to implement shared memory coherence protocols. Unfortunately, false sharing and fragmentation problems force such systems to resort to weak consistency shared memory models that complicate the shared memory programming model.

High performance virtual machines (HPVM): Clusters with supercomputing APIs and performance

by Andrew Chien, Scott Pakin, Mario Lauria, Matt Buchanan, Kay Hane, Louis Giannini, Jane Prusakova - In Proceedings of the 8th SIAM Conference on Parallel Processing for Scientific Computing, 1997
Cited by 29 (4 self)
The HPVM project provides software which enables high-performance computing on clusters of PCs and workstations using standard supercomputing APIs such as MPI, SHMEM Put/Get, and Global Arrays. HPVMs—High-Performance Virtual Machines—are surprisingly competitive with MPP systems, such as the IBM SP2 and Cray T3D. The Illinois HPVM achieves impressive low-level communication performance across the cluster: one-way latencies of around 11 µsec and bandwidths > 50 MBytes/sec—even for small packets (< 256 bytes). Performance at higher levels, such as MPI, is expected to be approximately 17 µsec latency and also > 50 MByte/sec bandwidth.