Scheduling Multithreaded Computations by Work Stealing
Cited by 316 (32 self)
This paper studies the problem of efficiently scheduling fully strict (i.e., well-structured) multithreaded computations on parallel computers. A popular and practical method of scheduling this kind of dynamic MIMD-style computation is "work stealing," in which processors needing work steal computational threads from other processors. In this paper, we give the first provably good work-stealing scheduler for multithreaded computations with dependencies. Specifically,
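The work-stealing idea described in this abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the paper's scheduler or its analysis: the function name `run_work_stealing`, the binary task tree, and the round-based stepping are all hypothetical choices made for illustration. Each worker owns a deque, works at its bottom end, and an idle worker tries to steal from the top end of a randomly chosen victim.

```python
import random
from collections import deque


def run_work_stealing(root_size, num_workers=4, seed=0):
    """Simulate work stealing on a binary task tree (illustrative only).

    A task is an integer n. If n <= 1 it is a leaf (one unit of work);
    otherwise it spawns two children, n // 2 and n - n // 2.
    Returns (leaves_executed, steals, rounds).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(num_workers)]
    deques[0].append(root_size)  # all work starts on worker 0
    leaves = steals = rounds = 0

    while any(deques):
        rounds += 1
        for w in range(num_workers):
            if deques[w]:
                # The owner pops from the bottom (right end) of its own deque.
                task = deques[w].pop()
                if task <= 1:
                    leaves += 1
                else:
                    deques[w].append(task - task // 2)
                    deques[w].append(task // 2)
            else:
                # An idle worker picks a random victim and steals from the
                # top (left end), i.e. the oldest task the victim holds.
                victim = rng.randrange(num_workers)
                if victim != w and deques[victim]:
                    deques[w].append(deques[victim].popleft())
                    steals += 1
    return leaves, steals, rounds
```

With a single worker, `run_work_stealing(64, num_workers=1)` executes all 64 leaves with no steals in 127 rounds, one round per task in the tree. With more workers the leaf count stays the same while idle workers pick up tasks by stealing.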
Commit-Reconcile & Fences (CRF): A New Memory Model for Architects and Compiler Writers
In Proceedings of the 26th International Symposium on Computer Architecture, 1999
Cited by 36 (6 self)
We present a new mechanism-oriented memory model called Commit-Reconcile & Fences (CRF) and define it using algebraic rules. Many existing memory models can be described as restricted versions of CRF. The model has been designed so that it is both easy for architects to implement, and stable enough to serve as a target machine interface for compilers of high-level languages. The CRF model exposes a semantic notion of caches (saches), and decomposes load and store instructions into finer-grain operations. We sketch how to integrate CRF into modern microprocessors and outline an adaptive coherence protocol to implement CRF in distributed shared-memory systems. CRF offers an upward compatible way to design next generation computer systems.
1. Loads and Stores: The CISC of Nineties
Caching and instruction reordering are ubiquitous features of modern computer systems and are necessary to achieve higher performance. For uniprocessor configurations, these features are mostly transparent and...
The Weakest Reasonable Memory Model
1998
Cited by 23 (3 self)
A memory model is some description of how memory behaves in a parallel computer system. While there is consensus that sequential consistency [Lamport 1979] is the strongest memory model, nobody seems to have tried to identify the weakest memory model. This thesis concerns itself with precisely this problem. We cannot hope to identify the weakest memory model unless we specify a minimal set of properties we want it to obey. In this thesis, we identify five such properties: completeness, monotonicity, constructibility, nondeterminism confinement, and classicality. Constructibility is especially interesting, because a nonconstructible model cannot be implemented exactly, and hence every implementation necessarily supports a stronger model. One nonconstructible model is, for example, dag consistency [Blumofe et al. 1996a]. We argue (with some caveats) that if one wants the five properties, then location consistency is the weakest reasonable memory model. In location consistency, every memo...
Transparent information dissemination
In Proc. Middleware, 2004
Cited by 21 (11 self)
This paper describes Transparent Replication through Invalidation and Prefetching (TRIP), a self-tuning data replication middleware system that enables transparent replication of large-scale information dissemination services. The TRIP middleware is a key building block for constructing information dissemination services, a class of services where updates occur at an origin server and reads occur at a number of replicas; examples of information dissemination services include content distribution networks such as Akamai [1] and IBM’s Sport and Event replication system [2]. Furthermore, the TRIP middleware can be used to build key parts of general applications that distribute content such as file systems, distributed databases, and publish-subscribe systems. Our data replication middleware supports transparent replication by providing two crucial properties: (1) sequential consistency to avoid introducing anomalous behavior to increasingly complex services and (2) self-tuning transmission of updates to maximize performance and availability given available system resources. Our analysis of simulations and our evaluation of a prototype support the hypothesis that it is feasible to provide transparent replication for dissemination services. For example, in simulations, our system’s performance is a factor of three to four faster than a demand-based middleware system for a wide range of configurations.
Portable High-Performance Programs
1999
Cited by 16 (0 self)
right notice and this permission notice are preserved on all copies.
Memory model = instruction reordering + store atomicity
In ACM IEEE International Symposium on Computer Architecture, 2006
Cited by 15 (0 self)
We present a novel framework for defining memory models in terms of two properties: thread-local Instruction Reordering axioms and Store Atomicity, which describes inter-thread communication via memory. Most memory models have the store atomicity property, and it is this property that is enforced by cache coherence protocols. A memory model with Store Atomicity is serializable; there is a unique global interleaving of all operations which respects the reordering rules. Our framework uses partially ordered execution graphs; one graph represents many instruction interleavings with identical behaviors. The major contribution of this framework is a procedure for enumerating program behaviors in any memory model with Store Atomicity. Using this framework, we show that address aliasing speculation introduces new program behaviors; we argue that these new behaviors should be permitted by the memory model specification. We also show how to extend our model to capture the behavior of non-atomic memory models such as SPARC® TSO.
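The split between thread-local reordering and atomic stores can be made concrete with a brute-force sketch. This is not the paper's procedure, which works on partially ordered execution graphs. It is a hypothetical enumeration over a two-thread litmus test, with all names (`T0`, `T1`, `interleavings`, `run`, `local_orders`, `outcomes`) and the single reordering rule assumed for illustration. Stores are atomic here because every operation executes against one shared memory.

```python
# Each op is ("st", addr, value) or ("ld", addr, register_name).
T0 = [("st", "x", 1), ("ld", "y", "r0")]
T1 = [("st", "y", 1), ("ld", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of sequences a and b that keeps each one's order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute ops against a single shared memory, so each store is atomic."""
    mem, regs = {"x": 0, "y": 0}, {}
    for kind, addr, arg in schedule:
        if kind == "st":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return (regs["r0"], regs["r1"])


def local_orders(thread, allow_store_load_reorder):
    """Thread-local orders permitted by the assumed reordering rule."""
    orders = [list(thread)]
    if allow_store_load_reorder:
        st, ld = thread
        if st[1] != ld[1]:  # only swap when the addresses differ
            orders.append([ld, st])
    return orders


def outcomes(allow_store_load_reorder):
    """Collect every (r0, r1) result reachable under the chosen rule."""
    results = set()
    for o0 in local_orders(T0, allow_store_load_reorder):
        for o1 in local_orders(T1, allow_store_load_reorder):
            for schedule in interleavings(o0, o1):
                results.add(run(schedule))
    return results
```

With program order preserved, `outcomes(False)` yields `{(0, 1), (1, 0), (1, 1)}`. Allowing a store and a later load to a different address to swap, `outcomes(True)` additionally yields `(0, 0)`, a behavior that comes from the reordering rule alone while stores remain atomic.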
Memory models for open-nested transactions
In MSPC ’06: Proceedings of the 2006 workshop on Memory system performance and correctness, 2006
Cited by 11 (3 self)
Open nesting provides a loophole in the strict model of atomic transactions. Moss and Hosking suggested adapting open nesting for transactional memory, and Moss and a group at Stanford have proposed hardware schemes to support open nesting. Since these researchers have described their schemes using only operational definitions, however, the semantics of these systems have not been specified in an implementation-independent way. This paper offers a framework for defining and exploring the memory semantics of open nesting in a transactional-memory setting. Our framework allows us to define the traditional model of serializability and two new transactional-memory models, race freedom and prefix race freedom. The weakest of these memory models, prefix race freedom, closely resembles the Stanford open-nesting model. We prove that these three memory models are equivalent for transactional-memory systems that support only closed nesting, as long as aborted transactions are “ignored.” We prove that for systems that support open nesting, however, the models of serializability, race freedom, and prefix race freedom are distinct. We show that the Stanford TM system implements a model at least as strong as prefix race freedom and strictly weaker than race freedom. Thus, their model compromises serializability, the property traditionally used to reason about the correctness of transactions.
Depot: Cloud storage with minimal trust
Cited by 7 (0 self)
We describe the design, implementation, and evaluation of Depot, a cloud storage system that minimizes trust assumptions. Depot assumes less than any prior system about the correct operation of participating hosts (Depot tolerates Byzantine failures, including malicious or buggy behavior, by any number of clients or servers), yet provides safety and availability guarantees (on consistency, staleness, durability, and recovery) that are useful. The key to safeguarding safety without sacrificing availability (and vice versa) in this environment is to join forks: participants (clients and servers) that observe inconsistent behaviors by other participants can join their forked view into a single view that is consistent with what each individually observed. Our experimental evaluation suggests that the costs of protecting the system are modest. Depot adds a few hundred bytes of metadata to each update and each stored object, and requires hashing and signing each update.
Design and Implementation of a Multi-purpose Cluster System Network Interface Unit
1999
Cited by 3 (2 self)
Today, the interface between a high speed network and a high performance computation node is the least mature hardware technology in scalable general purpose cluster computing. Currently, the one-interface-fits-all philosophy prevails. This approach performs poorly in some cases because of the complexity of modern memory hierarchy and the wide range of communication sizes and patterns. Today's message passing NIUs are also unable to utilize the best data transfer and coordination mechanisms due to poor integration into the computation node's memory hierarchy. These shortcomings unnecessarily constrain the performance of cluster systems. Our thesis is that a cluster system NIU should support multiple communication interfaces layered on a virtual message queue substrate in order to streamline data movement both within each node and between nodes. The NIU should be tightly integrated into the computation node's memory hierarchy via the cache-coherent snoopy system bus so as to gain...

