Results 1 - 9 of 9
Shasta: A Low Overhead, Software-Only Approach . . . .
- IN PROCEEDINGS OF THE SEVENTH INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS
, 1996
Cited by 207 (5 self)
This paper describes Shasta, a system that supports a shared address space in software on clusters of computers with physically distributed memory. A unique aspect of Shasta compared to most other software distributed shared memory systems is that shared data can be kept coherent at a fine granularity. In addition, the system allows the coherence granularity to vary across different shared data structures in a single application. Shasta implements the shared address space by transparently rewriting the application executable to intercept loads and stores. For each shared load or store, the inserted code checks to see if the data is available locally and communicates with other processors if necessary. The system uses numerous techniques to reduce the run-time overhead of these checks. Since Shasta is implemented entirely in software, it also provides tremendous flexibility in supporting different types of cache coherence protocols. We have implemented an efficient cache co...
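The access check described in this abstract can be illustrated with a small simulation. This is only a sketch of the general idea (check a per-line state before each shared load or store, and fetch on a miss); it is not Shasta's actual inserted code or protocol. The names `Node`, `State`, `LINE_SIZE`, and `_fetch` are hypothetical, and a plain dictionary stands in for memory held by other processors.

```python
from enum import Enum


class State(Enum):
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2


LINE_SIZE = 64  # bytes per coherence line; an illustrative value, not from the paper


class Node:
    """One processor's view of the shared address space (hypothetical model)."""

    def __init__(self, home_memory):
        self.home = home_memory  # dict address -> value; stands in for remote memory
        self.local = {}          # locally available copies
        self.state = {}          # line number -> State
        self.misses = 0          # how often the check had to communicate

    def _line(self, addr):
        return addr // LINE_SIZE

    def _fetch(self, addr, want):
        # Stand-in for the messages exchanged with other processors on a miss.
        self.misses += 1
        line = self._line(addr)
        base = line * LINE_SIZE
        for a in range(base, base + LINE_SIZE):
            if a in self.home:
                self.local[a] = self.home[a]
        self.state[line] = want

    def load(self, addr):
        # Check inserted before a shared load: is the line available locally?
        if self.state.get(self._line(addr), State.INVALID) is State.INVALID:
            self._fetch(addr, State.SHARED)
        return self.local.get(addr, 0)

    def store(self, addr, value):
        # Check inserted before a shared store: is the line held exclusively?
        if self.state.get(self._line(addr), State.INVALID) is not State.EXCLUSIVE:
            self._fetch(addr, State.EXCLUSIVE)
        self.local[addr] = value
```

In this model, two loads that fall in the same line cost one miss, and the first store to a line held only in the shared state costs one more. Changing `LINE_SIZE` changes how much data each miss brings in, which is the granularity trade-off the abstract refers to.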
Systems for Late Code Modification
- WRL Research Report 91/5
, 1991
Cited by 87 (5 self)
Modifying code after the compiler has generated it can be useful for both optimization and instrumentation. This paper compares the code modification systems of Mahler and pixie, and describes two new systems we have built that are hybrids of the two. This paper covers material presented at the CODE '91 International Workshop on Code Generation, Schloss Dagstuhl, Germany, May 20-24, 1991. 1. Introduction Late code modification is the process of modifying the output of a compiler after the compiler has generated it. The reasons one might want to do this fall into two categories: optimization and instrumentation. Some forms of optimization must be performed on assembly-level or machine-level code. The oldest is peephole optimization [11], which acts to tidy up code that a compiler has generated; it has since been generalized to include transformations on more machine-independent code [2,3]. Reordering of code to avoid pipeline stalls [4,7,18] is most often done after the code is gene...
Efficient Procedure Mapping using Cache Line Coloring
- IN PROCEEDINGS OF THE SIGPLAN'97 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 1997
Cited by 67 (12 self)
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this
Memory Consistency Models for Shared-Memory Multiprocessors
- WRL RESEARCH REPORT
, 1995
Cited by 61 (1 self)
The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems. This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher level abstraction to the programmer. We show that with only a few types of information supplied by the
Design and Performance of the Shasta Distributed Shared Memory Protocol
- Western Research Laboratory, Digital Equipment Corporation
, 1997
Cited by 12 (1 self)
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There are two other research laboratories located in Palo Alto, the Network Systems
Optimization in Permutation Spaces
, 1996
Cited by 1 (0 self)
Many optimization problems find a natural mapping in permutation spaces, where dedicated algorithms can be used during the optimization process. Unfortunately, some of the best and most effective techniques currently in use can only be applied to vector (Cartesian) spaces, where a concept of distance between different objects can be easily defined. Examples of such techniques range from the simplest steepest-descent hill climbers and the more sophisticated conjugate gradient methods used in continuous spaces to dynamic hill climbers and genetic algorithms (GAs) used in many large combinatorial problems. This paper describes a general method that allows the best optimization techniques used in vector spaces to be applied to all order-based problems whose domain is a permutation space. It will also be shown how this method can be applied to a real-world problem, the optimal placement of interconnected cells (modules) on a chip, in order to minimize the total length of their connections. For this p...
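The truncated abstract does not say how the paper's method works, so the sketch below does not reproduce it. It shows a different, widely used way to let a vector-space optimizer work on permutations: random-key encoding, where a real-valued vector is decoded into a permutation by sorting its indices. The placement cost (cells in a single row, cost equal to total connection length) is a simplified stand-in for the chip placement problem, and `decode`, `wire_length`, and `hill_climb` are hypothetical names.

```python
import random


def decode(keys):
    """Map a real vector to a permutation: indices sorted by key value."""
    return sorted(range(len(keys)), key=lambda i: keys[i])


def wire_length(order, nets):
    """Total connection length when cell order[k] sits in slot k of one row."""
    slot = {cell: k for k, cell in enumerate(order)}
    return sum(abs(slot[a] - slot[b]) for a, b in nets)


def hill_climb(n, nets, steps=2000, sigma=0.3, seed=0):
    """Hill climber in vector space; each trial vector is decoded to a permutation."""
    rng = random.Random(seed)
    keys = [rng.random() for _ in range(n)]
    best = wire_length(decode(keys), nets)
    for _ in range(steps):
        # Perturb the real-valued keys, not the permutation itself.
        trial = [k + rng.gauss(0, sigma) for k in keys]
        cost = wire_length(decode(trial), nets)
        if cost <= best:
            keys, best = trial, cost
    return decode(keys), best
```

For example, `decode([0.3, 0.1, 0.2])` gives `[1, 2, 0]`. Calling `hill_climb(6, [(i, i + 1) for i in range(5)])` searches for a row ordering of six chained cells. The climber only ever sees real vectors, so any vector-space optimizer could be substituted for it.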
WRL Research Report 91/6
, 1991
In their quest to produce tools for the production of uniform graphical user interfaces, almost all designers of toolkits for the X window system have overlooked an important capability. The best way to improve many programs is not to replace text interfaces based on command line flags with graphical buttons, but to provide programs with a simple way to draw pictures. This report describes a graphics server, ezd, that sits between an application program and the X server and allows both existing and new programs easy access to structured graphics. Programs may draw, edit, and sense user events in terms of application-defined graphical objects. When run on workstations with 10 MIPS or faster processors, interactive response is excellent, indicating that ezd's simple structured graphics drawing model can be widely applied. The enthusiastic response of ezd's initial users and the variety of uses to which they have put it suggest that there is a tremendous pent-up urge to draw with progr...
Smart Code, Stupid Memory: A Fast X Server for a Dumb Color Frame Buffer
, 1989
Processor speeds are improving faster than memory access speeds. The current generation of RISC processors can perform many 2-dimensional graphics operations at memory bandwidth speeds, rendering specialized hardware unnecessary. This paper describes the DECStation 3100 color frame buffer hardware and several of the graphics algorithms used in the X server implementation. Measured performance numbers are presented and compared to memory bandwidth speed, and possible frame buffer improvements are discussed. 1. Introduction Color workstations have typically had a screen size of 1024 columns by 768 to 1024 rows, 8 bits per pixel, and a 1 to 3 MIPS processor. Except at the very low end, color workstations are traditionally equipped with special-purpose graphics accelerators in order to achieve reasonable performance. The DECStation 3100 supports a 1024x864x8 color display, but uses no special...
Optimizations and Placements with the Genetic Workbench
, 1996
The Genetic Workbench (GWB) is a software system built with the intent of investigating evolutionary or non-standard algorithms applied to difficult combinatorial problems. The user can experiment with various techniques, operators, parameters, and strategies and compare the results. In particular, the optimal placement of connected components or modules on a plane has been considered, but some of the strategies implemented in the GWB can be applied to other permutation-based problems as well. The techniques that generate the best results have also been compared with one of the best commercial tools available, TimberWolf ver. 7, which uses a special simulated annealing algorithm, to highlight the strengths and weaknesses of the different methods. Most of the strategies used in the GWB can be classified as evolutionary or rely on some implementation of a genetic algorithm; this is why the qualifier "genetic" has been used to name the system. For the placement problem in part...

