Results 1 - 8 of 8
Limits of instruction-level parallelism, 1991
Cited by 339 (7 self)
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There are two other research laboratories located in Palo Alto, the Network Systems
Shasta: A Low Overhead, Software-Only Approach . . . . - In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, 1996
Cited by 207 (5 self)
This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granularity. In addition, the system allows the coherence granularity to vary across different shared data structures in a single application. Shasta implements the shared address space by transparently rewriting the application executable to intercept loads and stores. For each shared load or store, the inserted code checks to see if the data is available locally and communicates with other processors if necessary. The system uses numerous techniques to reduce the run-time overhead of these checks. Since Shasta is implemented entirely in software, it also provides tremendous flexibility in supporting different types of cache coherence protocols. We have implemented an efficient cache co...
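The inline check described in this abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not Shasta's implementation (which rewrites the application executable): the `SharedMemory` class, the two state names, and the `fetch_block` callback are all hypothetical, and the per-region `block_size` stands in for a coherence granularity that varies across data structures.

```python
INVALID, VALID = "invalid", "valid"  # hypothetical per-block states


class SharedMemory:
    """Toy software-DSM node: every shared access runs a state check first."""

    def __init__(self, fetch_block):
        # fetch_block(block_start, size) -> {addr: value}; a hypothetical
        # stand-in for communicating with the processor that holds the data.
        self.fetch_block = fetch_block
        self.regions = []   # (start, end, block_size) per shared region
        self.state = {}     # block start address -> INVALID / VALID
        self.data = {}      # local copy: address -> value
        self.misses = 0     # number of checks that needed communication

    def add_region(self, start, size, block_size):
        """Register a shared region with its own coherence granularity."""
        self.regions.append((start, start + size, block_size))

    def _block_of(self, addr):
        for start, end, block_size in self.regions:
            if start <= addr < end:
                offset = (addr - start) // block_size * block_size
                return start + offset, block_size
        raise ValueError(f"address {addr} is not in a shared region")

    def _check(self, addr):
        """The inserted check: fetch the enclosing block if not held locally."""
        block, size = self._block_of(addr)
        if self.state.get(block, INVALID) != VALID:
            self.misses += 1
            self.data.update(self.fetch_block(block, size))
            self.state[block] = VALID

    def load(self, addr):
        self._check(addr)
        return self.data[addr]

    def store(self, addr, value):
        # A real protocol would also invalidate or update remote copies.
        self._check(addr)
        self.data[addr] = value
```

With a 64-byte granularity on one region and an 8-byte granularity on another, two nearby loads in the first region cost one miss, while loads 10 bytes apart in the second region cost two.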
Efficient Procedure Mapping using Cache Line Coloring - In Proceedings of the SIGPLAN'97 Conference on Programming Language Design and Implementation, 1997
Cited by 67 (12 self)
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this
Memory Consistency Models for Shared-Memory Multiprocessors - WRL Research Report, 1995
Cited by 61 (1 self)
The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems. This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher level abstraction to the programmer. We show that with only a few types of information supplied by the
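The effect of relaxing ordering constraints can be seen on a classic two-processor litmus test, sketched below. Everything here is an illustrative assumption rather than a model taken from the report: each processor stores 1 to one variable and then loads the other, and the "relaxed" case simply lets a load be performed ahead of the earlier store from the same processor.

```python
def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global order of operations against a single memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for tid, kind, var in schedule:
        if kind == "st":
            mem[var] = 1
        else:
            regs[tid] = mem[var]
    return regs[0], regs[1]


def outcomes(allow_store_load_reorder):
    """Set of (r0, r1) results for: P0: x=1; r0=y   P1: y=1; r1=x."""
    t0 = [(0, "st", "x"), (0, "ld", "y")]
    t1 = [(1, "st", "y"), (1, "ld", "x")]
    # Under the strict model only program order is allowed; the relaxed
    # variant (an assumption of this sketch) also permits load-before-store.
    orders0 = [t0, t0[::-1]] if allow_store_load_reorder else [t0]
    orders1 = [t1, t1[::-1]] if allow_store_load_reorder else [t1]
    results = set()
    for a in orders0:
        for b in orders1:
            for schedule in interleavings(a, b):
                results.add(run(schedule))
    return results
```

When program order is enforced, the result (0, 0) never appears; once loads may bypass earlier stores, it becomes possible. This is the kind of behavior a programmer has to reason about under a relaxed model.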
Design and Performance of the Shasta Distributed Shared Memory Protocol - Western Research Laboratory, Digital Equipment Corporation, 1997
Cited by 12 (1 self)
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There are two other research laboratories located in Palo Alto, the Network Systems
Optimization in Permutation Spaces, 1996
Cited by 1 (0 self)
Many optimization problems find a natural mapping in permutation spaces, where dedicated algorithms can be used during the optimization process. Unfortunately, some of the best and most effective techniques currently used can only be applied to vector (Cartesian) spaces, where a concept of distance between different objects can be easily defined. Examples of such techniques range from the simplest steepest-descent hill climbers and the more sophisticated conjugate gradient methods used in continuous spaces to dynamic hill climbers and genetic algorithms (GAs) used in many large combinatorial problems. This paper describes a general method that allows the best optimization techniques used in vector spaces to be applied to all order-based problems whose domain is a permutation space. It will also be shown how this method can be applied to a real-world problem, the optimal placement of interconnected cells (modules) on a chip, in order to minimize the total length of their connections. For this p...
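The abstract does not say how the mapping between vector spaces and permutation spaces is built. One well-known way to bridge the two, shown here purely as an illustration and not as the paper's method, is random-key encoding: a real-valued vector is decoded into a permutation by sorting its components, so any vector-space search can move freely while always yielding a valid ordering. The one-row placement cost, the Gaussian step, and all names below are assumptions of this sketch.

```python
import random


def decode(keys):
    """Random-key decoding: the permutation is the argsort of the vector."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def wire_length(order, nets):
    """Total length of two-pin nets when cells sit in one row in `order`."""
    slot = {cell: pos for pos, cell in enumerate(order)}
    return sum(abs(slot[a] - slot[b]) for a, b in nets)


def hill_climb(n_cells, nets, steps=2000, sigma=0.3, seed=0):
    """Plain hill climbing in vector space, scored in permutation space."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(n_cells)]
    best = wire_length(decode(keys), nets)
    for _ in range(steps):
        # Perturb the real-valued vector; decoding keeps the result valid.
        trial = [k + rng.gauss(0.0, sigma) for k in keys]
        cost = wire_length(decode(trial), nets)
        if cost <= best:
            keys, best = trial, cost
    return decode(keys), best
```

The design point is that the search operator never has to repair an invalid permutation, because validity is guaranteed by the decoding step.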
Piecewise Linear Models for Rsim, 1993
Cited by 1 (0 self)
Rsim is a switch-level simulator which can simulate large digital MOS integrated circuits with speedups of over 3 orders of magnitude over SPICE. Unfortunately, Rsim's simple switched-resistor model renders it incapable of simulating certain CMOS and most BiCMOS and ECL digital circuits. We observe that the switched-resistor model is just one particular piecewise linear model and that Rsim's simulation framework can accommodate more elaborate piecewise linear models. The resulting simulator, Mom, combines the efficiency of switch-level simulation with the ability to simulate a wider variety of circuits. We demonstrate Mom's efficiency and flexibility on a variety of circuits.
This research was supported in part by DARPA contract N00039-91-C-1038.
Digital Western Research Laboratory, 250 University Avenue, Palo Alto, California 94301 USA
1 Introduction
The high cost of semiconductor processing makes it desirable to verify the correctness of a large custom digital integr...
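The observation that a switched resistor is one particular piecewise linear model can be made concrete with a small sketch. This is an illustrative interpretation only, not Mom's device model: the `PiecewiseLinear` class, its current-versus-voltage breakpoint form, and the `switched_resistor` helper are all hypothetical.

```python
from bisect import bisect_right


class PiecewiseLinear:
    """Current as a piecewise linear function of voltage, from breakpoints."""

    def __init__(self, points):
        self.points = sorted(points)          # list of (voltage, current)
        if len(self.points) < 2:
            raise ValueError("need at least two breakpoints")
        self.voltages = [v for v, _ in self.points]

    def current(self, v):
        # Pick the segment containing v; the end segments extrapolate.
        k = bisect_right(self.voltages, v) - 1
        k = min(max(k, 0), len(self.points) - 2)
        (v0, i0), (v1, i1) = self.points[k], self.points[k + 1]
        return i0 + (i1 - i0) * (v - v0) / (v1 - v0)


def switched_resistor(r_on, is_on):
    """A switch-level device as the simplest case: one segment through 0."""
    conductance = 1.0 / r_on if is_on else 0.0
    return PiecewiseLinear([(0.0, 0.0), (1.0, conductance)])
```

A device with a threshold, which a single resistor cannot express, only needs one more breakpoint, for example `PiecewiseLinear([(0.0, 0.0), (1.0, 0.0), (3.0, 2e-3)])`.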
Optimizations and Placements with the Genetic Workbench, 1996
The Genetic Workbench (GWB) is a software system built to investigate evolutionary or non-standard algorithms applied to difficult combinatorial problems. The user can experiment with various techniques, operators, parameters, and strategies, and compare the results. In particular, the optimal placement of connected components or modules on a plane has been considered, but some of the strategies implemented in the GWB can be applied to other permutation-based problems as well. Techniques which generate the best results have also been compared with one of the best commercial tools available, TimberWolf ver. 7, which uses a special simulated annealing algorithm, to highlight the strengths and weaknesses of the different methods. Most of the strategies used in the GWB can be classified as evolutionary or rely on some implementation of a genetic algorithm; this is the reason why the qualifier genetic has been used to name the system. For the placement problem in part...
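Genetic operators for permutation-based problems must produce children that are still valid permutations. The sketch below shows a simplified order-preserving crossover in the spirit of the classic order crossover (OX); it is an illustration only, and the abstract does not say whether the GWB includes this operator. The function name and the explicit cut points `i` and `j` are assumptions.

```python
def order_crossover(parent1, parent2, i, j):
    """Keep parent1[i:j] in place; fill the rest in parent2's relative order.

    Simplified variant: the remaining slots are filled left to right,
    whereas classic OX starts filling after the second cut point.
    """
    n = len(parent1)
    child = [None] * n
    child[i:j] = parent1[i:j]
    kept = set(parent1[i:j])
    # Genes not inherited from parent1, in the order parent2 lists them.
    fill = iter(g for g in parent2 if g not in kept)
    for k in range(n):
        if child[k] is None:
            child[k] = next(fill)
    return child
```

Because every gene is taken exactly once, either from the kept slice or from the filtered remainder, the child is always a permutation and needs no repair step.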

