Results 1 - 10 of 21
A High-Performance Microarchitecture with Hardware-Programmable Functional Units
- In Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994
Cited by 171 (1 self)
This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Through a coupling of compile-time analysis routines and hardware synthesis tools, we automatically configure a given set of hardware-programmable functional units (PFUs) and thus augment the base instruction set architecture so that it better meets the instruction set needs of each application. We refer to this new class of general-purpose computers as PRogrammable Instruction Set Computers (PRISC). Although similar in concept, the PRISC approach differs from dynamically programmable microcode because in PRISC we define entirely new primitive datapath operations. In this paper, we concentrate on the microarchitectural design of the simplest form of PRISC---a RISC microprocessor with a single PFU that only evaluates combinational functions. We briefly discuss the operating system and the programming language co...
Design issues and tradeoffs for write buffers
- In Proceedings of the Third IEEE Symposium on High Performance Computer Architecture, 1997
Cited by 39 (3 self)
Processors with write-through caches typically require a write buffer to hide the write latency to the next level of the memory hierarchy and to reduce write traffic. A write buffer can cause processor stalls when it is full, when it contends with a cache miss for access to the next level of the hierarchy, and when it contains the freshest copy of data needed by a load. This paper uses instruction-level simulation of SPEC92 benchmarks to investigate how different write buffer depths, retirement policies, and load-hazard policies affect these three types of write-buffer stalls. Deeper buffers with adequate headroom, lazier retirement policies, and the ability to read data directly from the write buffer combine to substantially reduce write-buffer-induced stalls.
Inducing Heuristics To Decide Whether To Schedule
- In Proceedings of the ACM SIGPLAN ’04 Conference on Programming Language Design and Implementation, 2004
Cited by 36 (8 self)
Instruction scheduling is a compiler optimization that can improve program speed, sometimes by 10% or more---but it can also be expensive. Furthermore, time spent optimizing is more important in a Java just-in-time (JIT) compiler than in a traditional one because a JIT compiles code at run time, adding to the running time of the program. We found that, on any given block of code, instruction scheduling often does not produce significant benefit and sometimes degrades speed. Thus, we hoped that we could focus scheduling effort on those blocks that benefit from it. Using
Isolating Failure-Inducing Thread Schedules
- In International Symposium on Software Testing and Analysis, 2002
Cited by 31 (1 self)
Consider a multi-threaded application that occasionally fails due to non-determinism. Using the DEJAVU capture/replay tool, it is possible to record the thread schedule and replay the application in a deterministic way. By systematically narrowing down the difference between a thread schedule that makes the program pass and another schedule that makes the program fail, the Delta Debugging approach can pinpoint the error location automatically -- namely, the location(s) where a thread switch causes the program to fail. In a case study, Delta Debugging isolated the failure-inducing schedule difference from 3.8 billion differences in only 50 tests.
Automatic Removal of Array Memory Leaks in Java
- In Proceedings of the International Conference on Compiler Construction (CC ’00), March-April 2000
Cited by 20 (0 self)
Current garbage collection (GC) techniques do not (and in general cannot) collect all the garbage that a program produces. This may lead to a performance slowdown and to programs running out of memory space. In this paper, we present a practical algorithm for statically detecting memory leaks occurring in arrays of objects in a garbage collected environment. No previous algorithm exists. The algorithm is conservative, i.e., it never detects a leak on a piece of memory that is subsequently used by the program, although it may fail to identify some leaks. The presence of the detected leaks is exposed to the garbage collector, thus allowing GC to collect more storage. We have instrumented the Java virtual machine to measure the effect of memory leaks in arrays. Our initial experiments indicate that this problem occurs in many Java applications. Our measurements of heap size show improvement on some example programs.
Method-specific dynamic compilation using logistic regression
- of ACM SIGPLAN Conferences on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'06), 2006
Cited by 18 (2 self)
Determining the best set of optimizations to apply to a program has been a long-standing problem for compiler writers. To reduce
Estimating the Impact of Heap Liveness Information on Space Consumption In Java
- In ISMM ’02: Proceedings of the 3rd International Symposium on Memory Management, 2002
Cited by 14 (2 self)
We study the potential impact of different kinds of liveness information on the space consumption of a program in a garbage-collected environment, specifically for Java. The idea is to measure the time difference between the actual time an object is collected by the garbage collector (GC) and the potential earliest time an object could be collected assuming liveness information were available. We focus on the following kinds of liveness information: (i) stack reference liveness (local reference variable liveness in Java), (ii) global reference liveness (static reference variable liveness in Java), (iii) heap reference liveness (instance reference variable liveness or array reference liveness in Java), and (iv) any combination of (i)-(iii). We also provide some insights on the kind of interface between a compiler and GC that could achieve these potential savings. The Java
A Toolkit for Modelling and Simulation of Data Grids with Integration of Data Storage, Replication and Analysis
- 2005
Cited by 12 (5 self)
Data Grids are an emerging technology for managing large amounts of distributed data. This technology is highly anticipated by scientific communities, such as those in astronomy, protein simulation, and high energy physics. This is because experiments in these fields generate massive amounts of data that need to be shared and analysed. Since it is not possible to test many different usage scenarios on real Data Grid testbeds, it is easier to use simulation as a means of studying complex scenarios.
Automatic Tuning of Inlining Heuristics
- In ACM/IEEE Conference on Supercomputing, 2005
Cited by 11 (0 self)
Inlining improves the performance of programs by reducing the overhead of method invocation and increasing the opportunities for compiler optimization. Incorrect inlining decisions, however, can degrade both the running and compilation time of a program. This is especially important for a dynamically compiled language such as Java. Therefore, the heuristics that control inlining must be carefully tuned to achieve a good balance between these two costs and so reduce total execution time. This paper develops a genetic-algorithm-based approach to automatically tune a dynamic compiler’s internal inlining heuristic. We evaluate our technique within the Jikes RVM [1] compiler and show a 17% average reduction in total execution time on the SPECjvm98 benchmark suite on a Pentium-4. When applied to the DaCapo benchmark suite, our approach reduces total execution time by 37%, outperforming all existing techniques.
Mostly Concurrent Garbage Collection Revisited
- In Proceedings of the 18th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Oct 2003), ACM, 2003
Cited by 8 (0 self)
The mostly concurrent garbage collection was presented in the seminal paper of Boehm et al. With the deployment of Java as a portable, secure and concurrent programming language, the mostly concurrent garbage collector turned out to be an excellent solution for Java's garbage collection task. The use of this collector is reported for several modern production Java Virtual Machines and it has been investigated further in academia.

