• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Procedure merging with instruction caches (0)

by Scott Mcfarling
Venue:In PLDI
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 37
Next 10 →

Optimally Profiling and Tracing Programs

by Thomas Ball, James R. Larus, Thomas Ball, James R. Larus - ACM Transactions on Programming Languages and Systems , 1994
"... copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others ..."
Abstract - Cited by 256 (17 self) - Add to MetaCart
copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications

Cache-Conscious Data Placement

by Brad Calder, Chandra Krintz, Simmi John, Todd Austin - in Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems , 1998
"... As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the vir ..."
Abstract - Cited by 131 (3 self) - Add to MetaCart
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been used to improve instruction cache performance by mapping code with temporal locality to different cache blocks in the virtual address space eliminating cache conflicts. These code placement techniques can be applied directly to the problem of placing data for improved data cache performance. In this paper we present a general framework for Cache Conscious Data Placement. This is a compiler directed approach that creates an address placement for the stack (local variables), global variables, heap objects, and constants in order to reduce data cache misses. The placement of data objects is guided by a temporal relationship graph between objects generated via profiling. Our results show that profile driven data placement significantly reduces the data miss rate by 24% on average. 1 Introduction Much effort has b...

The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages

by Craig Chambers, John Hennessy, Mark Linton , 1992
"... Object-oriented programming languages promise to improve programmer productivity by supporting abstract data types, inheritance, and message passing directly within the language. Unfortunately, traditional implementations of object-oriented language features, particularly message passing, have been ..."
Abstract - Cited by 120 (15 self) - Add to MetaCart
Object-oriented programming languages promise to improve programmer productivity by supporting abstract data types, inheritance, and message passing directly within the language. Unfortunately, traditional implementations of object-oriented language features, particularly message passing, have been much slower than traditional implementations of their non-object-oriented counterparts: the fastest existing implementation of Smalltalk-80 runs at only a tenth the speed of an optimizing C implementation. The dearth of suitable implementation technology has forced most object-oriented languages to be designed as hybrids with traditional non-object-oriented languages, complicating the languages and making programs harder to extend and reuse. This dissertation describes a collection of implementation techniques that can improve the run-time performance of object-oriented languages, in hopes of reducing the need for hybrid languages and encouraging wider spread of purely object-oriented langu...

ADAPTIVE OPTIMIZATION FOR SELF: RECONCILING HIGH PERFORMANCE WITH EXPLORATORY PROGRAMMING

by Urs Hölzle , 1994
"... Object-oriented programming languages confer many benefits, including abstraction, which lets the programmer hide the details of an object’s implementation from the object’s clients. Unfortunately, crossing abstraction boundaries often incurs a substantial run-time overhead in the form of frequent p ..."
Abstract - Cited by 95 (6 self) - Add to MetaCart
Object-oriented programming languages confer many benefits, including abstraction, which lets the programmer hide the details of an object’s implementation from the object’s clients. Unfortunately, crossing abstraction boundaries often incurs a substantial run-time overhead in the form of frequent procedure calls. Thus, pervasive use of abstraction, while desirable from a design standpoint, may be impractical when it leads to inefficient programs. Aggressive compiler optimizations can reduce the overhead of abstraction. However, the long compilation times introduced by optimizing compilers delay the programming environment‘s responses to changes in the program. Furthermore, optimization also conflicts with source-level debugging. Thus, programmers are caught on the horns of two dilemmas: they have to choose between abstraction and efficiency, and between responsive programming environments and efficiency. This dissertation shows how to reconcile these seemingly contradictory goals by performing optimizations lazily. Four new techniques work together to achieve high performance and high responsiveness: • Type feedback achieves high performance by allowing the compiler to inline message sends based on information extracted from the runtime system. On average, programs run 1.5 times faster than the previous SELF system; compared to a commercial Smalltalk implementation, two medium-sized benchmarks run about three times faster. This level of performance is obtained with a compiler that is both simpler and faster than previous SELF compilers. • Adaptive optimization achieves high responsiveness without sacrificing performance by using a fast nonoptimizing compiler to generate initial code while automatically recompiling heavily used parts of the program with an optimizing compiler. On a previous-generation workstation like the SPARCstation-2, fewer than 200 pauses exceeded 200 ms during a 50-minute interaction, and 21 pauses exceeded one second. On a currentgeneration workstation, only 13 pauses exceed 400 ms. • Dynamic deoptimization shields the programmer from the complexity of debugging optimized code by transparently recreating non-optimized code as needed. No matter whether a program is optimized or not, it can always be stopped, inspected, and single-stepped. Compared to previous approaches, deoptimization allows more debugging while placing fewer restrictions on the optimizations that can be performed. • Polymorphic inline caching generates type-case sequences on-the-fly to speed up messages sent from the same call site to several different types of object. More significantly, they collect concrete type information for the optimizing compiler. With better performance yet good interactive behavior, these techniques make exploratory programming possible both for pure object-oriented languages and for application domains requiring higher ultimate performance, reconciling exploratory programming, ubiquitous abstraction, and high performance.

Simple and Effective Link-Time Optimization of Modula-3 Programs

by Mary F. Fernández , 1994
"... ..."
Abstract - Cited by 75 (3 self) - Add to MetaCart
Abstract not found

Optimizing Instruction Cache Performance for Operating System Intensive Workloads

by Josep Torrellas, Chun Xia, Russell Daigle - IEEE Transactions on Computers , 1995
"... High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though ther ..."
Abstract - Cited by 61 (11 self) - Add to MetaCart
High instruction cache hit rates are key to high performance. One known technique to improve the hit rate of caches is to use an optimizing compiler to minimize cache interference via an improved layout of the code. This technique, however, has been applied to application code only, even though there is evidence that the operating system often uses the cache heavily and with less uniform patterns than applications. Therefore, it is unknown how well existing optimizations perform for systems code and whether better optimizations can be found. We address this problem in this paper. This paper characterizes in detail the locality patterns of the operating system code and shows that there is substantial locality. Unfortunately, caches are not able to extract much of it: rarely-executed special-case code disrupts spatial locality, loops with few iterations that call routines make loop locality hard to exploit, and plenty of loop-less code hampers temporal locality. As a result, interference...

SPAID: Software Prefetching in Pointer- and Call-Intensive Environments

by Mikko H. Lipasti, William J. Schmidt, Steven R. Kunkel, Robert R. Roediger - In Proceedings of the 28th annual international symposium on Microarchitecture , 1995
"... Software prefetching, typically in the context of numericor loop-intensive benchmarks, has been proposed as one remedy for the performance bottleneck imposed on computer systems by the cost of servicing cache misses. This paper proposes a new heuristic--SPAID--for utilizing prefetch instructions in ..."
Abstract - Cited by 58 (3 self) - Add to MetaCart
Software prefetching, typically in the context of numericor loop-intensive benchmarks, has been proposed as one remedy for the performance bottleneck imposed on computer systems by the cost of servicing cache misses. This paper proposes a new heuristic--SPAID--for utilizing prefetch instructions in pointer- and call-intensive environments. We use trace-driven cache simulation of a number of pointer- and call-intensive benchmarks to evaluate the benefits and implementation trade-offs of SPAID. Our results indicate that a significant proportion of the cost of data cache misses can be eliminated or reduced with SPAID without unduly increasing memory traffic. 1. Introduction It is well known that processor clock speeds are increasing exponentially over time, while memory speeds are not increasing nearly as rapidly [RD94]. The computing industry has reached the point where system performance is dominated by the cost of servicing cache misses. To address this problem, several instruction s...

PLTO: A Link-Time Optimizer for the Intel IA-32 Architecture

by Benjamin Schwarz, Saumya Debray, Gregory Andrews, Matthew Legendre - In Proc. 2001 Workshop on Binary Translation (WBT-2001 , 2001
"... tool we have developed for the Intel IA-32 architecture. A number of characteristics of this architecture complicate the task of link-time optimization. These include a large number of op-codes and addressing modes, which increases the complexity of program analysis; variable-length instructions, wh ..."
Abstract - Cited by 50 (15 self) - Add to MetaCart
tool we have developed for the Intel IA-32 architecture. A number of characteristics of this architecture complicate the task of link-time optimization. These include a large number of op-codes and addressing modes, which increases the complexity of program analysis; variable-length instructions, which complicates disassembly of machine code; a paucity of available registers, which limits the extent of some optimizations; and a reliance on using memory locations for holding values and for parameter passing, which complicates program analysis and optimization. We describe how PLTO addresses these problems and the resulting performance improvements it is able to achieve.

FIAT: A Framework for Interprocedural Analysis and Transformation

by Mary W. Hall, John Mellor-Crummey, Alan Carle, René Rodríguez , 1995
"... Modern architectures with deep memory hierarchies or parallehsm require the use of increasingly sophisticated code analysis and optimization to achieve maximum performance for large, scientific programs. In such ..."
Abstract - Cited by 48 (7 self) - Add to MetaCart
Modern architectures with deep memory hierarchies or parallehsm require the use of increasingly sophisticated code analysis and optimization to achieve maximum performance for large, scientific programs. In such

Obtaining Sequential Efficiency for Concurrent Object-Oriented Languages

by John Plevyak, Xingbin Zhang, Andrew A. Chien - In Proceedings of the ACM Symposium on the Principles of Programming Languages , 1995
"... Concurrent object-oriented programming (COOP) languages focus the abstraction and encapsulation power of abstract data types on the problem of concurrency control. In particular, pure fine-grained concurrent object-oriented languages (as opposed to hybrid or data parallel) provides the programmer wi ..."
Abstract - Cited by 47 (15 self) - Add to MetaCart
Concurrent object-oriented programming (COOP) languages focus the abstraction and encapsulation power of abstract data types on the problem of concurrency control. In particular, pure fine-grained concurrent object-oriented languages (as opposed to hybrid or data parallel) provides the programmer with a simple, uniform, and flexible model while exposing maximum concurrency. While such languages promise to greatly reduce the complexity of large-scale concurrent programming, the popularity of these languages has been hampered by efficiency which is often many orders of magnitude less than that of comparable sequential code. We present a sufficient set of techniques which enables the efficiency of fine-grained concurrent object-oriented languages to equal that of traditional sequential languages (like C) when the required data is available. These techniques are empirically validated by the application to a COOP implementation of the Livermore Loops. 1 Introduction The increasing use of ...
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University