Using Prediction to Accelerate Coherence Protocols, 1998
Cited by 52 (4 self)
Most large shared-memory multiprocessors use directory protocols to keep per-processor caches coherent. Some memory references in such systems, however, suffer long latencies for misses to remotely cached blocks. To ameliorate this latency, researchers have augmented standard coherence protocols with optimizations for specific sharing patterns, such as read-modify-write, producer-consumer, and migratory sharing. This paper seeks to replace these directed solutions with general prediction logic that monitors coherence activity and triggers appropriate coherence actions. This paper takes the first step toward using general prediction to accelerate coherence protocols by developing and evaluating the Cosmos coherence message predictor. Cosmos predicts the source and type of the next coherence message for a cache block using logic that is an extension of Yeh and Patt's two-level PAp branch predictor. For five scientific applications running on 16 processors, Cosmos has prediction accuracie...
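The two-level prediction idea named in the abstract above can be illustrated with a minimal sketch. This is not the Cosmos design itself: the class name `TwoLevelMessagePredictor`, the `depth` parameter, the message encoding as `(sender, type)` tuples, and the last-outcome update rule are all assumptions made for illustration. The first level keeps a short per-block history of recent coherence messages; the second level maps each observed history to the message that followed it most recently.

```python
from collections import defaultdict, deque


class TwoLevelMessagePredictor:
    """Hypothetical per-block, two-level predictor of (sender, type) messages."""

    def __init__(self, depth=2):
        self.depth = depth
        # Level 1: per-block history of the last `depth` messages.
        self.history = defaultdict(lambda: deque(maxlen=depth))
        # Level 2: per-block table from a full history to the message
        # that followed that history the last time it was seen.
        self.patterns = defaultdict(dict)

    def predict(self, block):
        """Return the predicted next message for `block`, or None."""
        key = tuple(self.history[block])
        return self.patterns[block].get(key)

    def update(self, block, message):
        """Record the message that actually arrived for `block`."""
        hist = self.history[block]
        if len(hist) == self.depth:
            self.patterns[block][tuple(hist)] = message
        hist.append(message)


def count_hits(predictor, block, messages):
    """Replay `messages` for one block and count correct predictions."""
    hits = 0
    for msg in messages:
        if predictor.predict(block) == msg:
            hits += 1
        predictor.update(block, msg)
    return hits
```

On a strictly alternating message stream the sketch needs a few messages to fill its history and pattern table, after which every prediction matches.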
Volume Leases for Consistency in Large-Scale Systems, 1999
Cited by 41 (0 self)
This article introduces volume leases as a mechanism for providing server-driven cache consistency for large-scale, geographically distributed networks. Volume leases retain the good performance, fault tolerance, and server scalability of the semantically weaker client-driven protocols that are now used on the web. Volume leases are a variation of object leases, which were originally designed for distributed file systems. However, whereas traditional object leases amortize overheads over long lease periods, volume leases exploit spatial locality to amortize overheads across multiple objects in a volume. This approach allows systems to maintain good write performance even in the presence of failures. Using trace-driven simulation, we compare three volume lease algorithms against four existing cache consistency algorithms and show that our new algorithms provide strong consistency while maintaining scalability and fault tolerance. For a trace-based workload of web accesses, we find that volumes can reduce message traffic at servers by 40% compared to a standard lease algorithm, and that volumes can considerably reduce the peak load at servers when popular objects are modified.
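The amortization described in the abstract above can be sketched on the client side. This is a simplified illustration, not the article's algorithms: it assumes a cached object is readable only while the client holds unexpired leases on both the object and its enclosing volume, with short volume leases and long object leases. The class `LeaseClient` and all method names are hypothetical.

```python
class LeaseClient:
    """Hypothetical client-side lease table for a volume-lease scheme."""

    def __init__(self):
        self.volume_expiry = {}  # volume id -> lease expiry time
        self.object_expiry = {}  # (volume id, object id) -> lease expiry time

    def grant_volume(self, volume, now, duration):
        """Record a (typically short) lease on a whole volume."""
        self.volume_expiry[volume] = now + duration

    def grant_object(self, volume, obj, now, duration):
        """Record a (typically long) lease on one object in a volume."""
        self.object_expiry[(volume, obj)] = now + duration

    def can_read_cached(self, volume, obj, now):
        """A cached copy is usable only if both leases are still valid."""
        never = float("-inf")
        return (self.volume_expiry.get(volume, never) > now
                and self.object_expiry.get((volume, obj), never) > now)
```

Under these assumptions, one volume renewal revalidates every object in the volume whose own lease is still live, which is where the savings across multiple objects would come from.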
Better Operating System Features for Faster Network Servers. In Proc. Workshop on Internet Server Performance, 1998
Cited by 40 (5 self)
Widely-used operating systems provide inadequate support for large-scale Internet server applications. Their algorithms and interfaces fail to efficiently support either event-driven or multi-threaded servers. They provide poor control over the scheduling and management of machine resources, making it difficult to provide robust and controlled service. We propose new UNIX interfaces to improve scalability, and to provide fine-grained scheduling and resource management.
1 Introduction
The performance of Internet server applications on a general purpose operating system is often dismayingly lower than what one would expect from the underlying hardware. Internet servers also suffer from other undesirable properties such as poor scalability, unfair resource allocation, susceptibility to livelock under excess load, instability under denial of service attacks, and inability to prioritize handling of requests. The cause of these problems is a fundamental mismatch between the original design ...
Verifying Systems with Replicated Components in Murφ, 1997
Cited by 40 (3 self)
An extension to the Murphi verifier is presented to verify systems with replicated identical components. Although most systems are finite-state in nature, many of them are also designed to be scalable, so that a description gives a family of systems, each member of which has a different number of replicated components. It is therefore desirable to be able to verify the entire family of systems, independent of the exact number of replicated components. The verification is performed by explicit state enumeration in an abstract state space where states do not record the exact numbers of components. We provide an extension to the existing Murphi language, by which a designer can easily specify a system in its concrete form. Through a new datatype, called RepetitiveID, a designer can suggest the use of this abstraction to verify a family of systems. First of all, Murphi automatically checks the soundness of this abstraction. Then it automatically translates the system description to an abstract ...
Protocol-based data-race detection. In Proceedings of the SIGMETRICS symposium on Parallel and distributed tools, 1998
Cited by 35 (0 self)
Distributed Shared-Memory (DSM) computers, which partition physical memory among a collection of workstation-like computing nodes, are now a common way to implement parallel machines. Recently, there has been much interest in DSM machines that use software, instead of hardware, to implement coherence protocols to manage data replication and cache coherence. Software offers many advantages, not the least of which is the possibility of adding significant functionality, such as race detection, to a protocol. This paper describes a new, transparent, protocol-based technique for automatically detecting data races on-the-fly. An implementation of this approach in a DSM system running on a Thinking Machines CM-5 found data races in two of a set of five shared-memory benchmarks. Monitored applications had slowdowns ranging from 0–3 on 32 nodes.
Fine-Grain Distributed Shared Memory on Clusters of Workstations, 1997
Cited by 30 (8 self)
Shared memory, one of the most popular models for programming parallel platforms, is becoming ubiquitous both in low-end workstations and high-end servers. With the advent of low-latency networking hardware, clusters of workstations strive to offer the same processing power as high-end servers for a fraction of the cost. In such environments, shared memory has been limited to page-based systems that control access to shared memory using the memory's page protection to implement shared memory coherence protocols. Unfortunately, false sharing and fragmentation problems force such systems to resort to weak consistency shared memory models that complicate the shared memory programming model.
Modeling Web Interactions, 2003
Cited by 29 (3 self)
Programmers confront a minefield when they design interactive Web programs. Web interactions take place via Web browsers. With browsers, consumers can whimsically navigate among the various stages of a dialog and can thus confuse the most sophisticated corporate Web sites. In turn, Web services can fault in frustrating and inexplicable ways. The quickening transition from Web scripts to Web services lends these problems immediacy.
Experience with a Language for Writing Coherence Protocols, 1997
Cited by 24 (7 self)
In this paper we describe our experience with Teapot [7], a domain-specific language for addressing the cache coherence problem. The cache coherence problem arises when parallel and distributed computing systems make local replicas of shared data for reasons of scalability and performance.
CACHET: An Adaptive Cache Coherence Protocol for Distributed Shared-Memory Systems. In International Conference on Supercomputing, 1999
Cited by 21 (5 self)
An adaptive cache coherence protocol changes its actions to address changing program behaviors. We present an adaptive protocol called Cachet for distributed shared-memory systems. Cachet is a seamless integration of several micro-protocols, each of which has been optimized for a particular memory access pattern. Cachet embodies both intra-protocol and inter-protocol adaptivity, and exploits adaptivity to achieve high performance under changing memory access patterns. Cachet is presented in the context of a mechanism-oriented memory model, Commit-Reconcile & Fences (CRF), which is a generalization of sequential consistency and other weaker memory models in use today. A protocol to implement CRF is automatically a correct implementation of any memory model whose programs can be expressed as CRF programs.
1 Introduction
Shared-memory programs have various access patterns, and empirical evidence suggests that no fixed cache coherence protocol works well for all access patterns [1, 4, 5, 12...
A Domain-Specific Language For Video Device Drivers: From Design To Implementation. In Conference on Domain Specific Languages, 1997
Cited by 20 (9 self)
Domain-specific languages (DSL) have many potential advantages in terms of software engineering ranging from increased productivity to the application of formal methods. Although they have been used in practice for decades, there has been little study of methodology or implementation tools for the DSL approach. In this paper we present our DSL approach and its application to a realistic application: video display device drivers. The presentation focuses on the validation of our proposed framework for domain-specific languages, which provides automatic generation of efficient implementations of DSL programs (see SSR'97, ACM Symposium on Software Reuse). Additionally, we describe an example of a complete DSL for video display adaptors and the benefits of the DSL approach in this application. This demonstrates some of the generally claimed benefits of using DSLs: increased productivity, higher-level abstraction, and easier verification.

