Results 1 - 10
of
41
Blocking and Array Contraction Across Arbitrarily Nested Loops Using Affine Partitioning
, 2001
"... Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and statement reordering. Algorithms based on affine parti ..."
Abstract
-
Cited by 60 (1 self)
- Add to MetaCart
Applicable to arbitrary sequences and nests of loops, affine partitioning is a program transformation framework that unifies many previously proposed loop transformations, including unimodular transforms, fusion, fission, reindexing, scaling and statement reordering. Algorithms based on affine partitioning have been shown to be effective for parallelization and communication minimization. This paper presents algorithms that improve data locality using affine partitioning. Blocking and array contraction are two important optimizations that have been shown to be useful for data locality. Blocking creates a set of inner loops so that data brought into the faster levels of the memory hierarchy can be reused. Array contraction reduces an array to a scalar variable and thereby reduces the number of memory operations executed and the memory footprint. Loop transforms are often necessary to make blocking and array contraction possible.
Finding effective optimization phase sequences
- In Proceedings of the 2003 ACM SIGPLAN conference on Language, Compiler, and Tool for Embedded Systems
, 2003
"... It has long been known that a single ordering of optimization phases will not produce the best code for every application. This phase ordering problem can be more severe when generating code for embedded systems due to the need to meet conflicting constraints on time, code size, and power consumptio ..."
Abstract
-
Cited by 45 (11 self)
- Add to MetaCart
It has long been known that a single ordering of optimization phases will not produce the best code for every application. This phase ordering problem can be more severe when generating code for embedded systems due to the need to meet conflicting constraints on time, code size, and power consumption. Given that many embedded application developers are willing to spend time tuning an application, we believe a viable approach is to allow the developer to steer the process of optimizing a function. In this paper, we describe support in VISTA, an interactive compilation system, for finding effective sequences of optimization phases. VISTA provides the user with dynamic and static performance information that can be used during an interactive compilation session to gauge the progress of improving the code. In addition, VISTA provides support for automatically using performance information to select the best optimization sequence among several attempted. One such feature is the use of a genetic algorithm to search for the most efficient sequence based on specified fitness criteria. We hav e included a number of experimental results that evaluate the effectiveness of using a genetic algorithm in VISTA to find effective optimization phase sequences.
Revisiting the sequential programming model for multi-core
- In Proceedings of the 40th Annual ACM/IEEE International Symposium on Microarchitecture
, 2007
"... Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development due to rewriting legacy code, training of the programmer, increased debugging of the program, and efforts to avoid race ..."
Abstract
-
Cited by 33 (4 self)
- Add to MetaCart
Single-threaded programming is already considered a complicated task. The move to multi-threaded programming only increases the complexity and cost involved in software development due to rewriting legacy code, training of the programmer, increased debugging of the program, and efforts to avoid race conditions, deadlocks, and other problems associated with parallel programming. To address these costs, other approaches, such as automatic thread extraction, have been explored. Unfortunately, the amount of parallelism that has been automatically extracted is generally insufficient to keep many cores busy. This paper argues that this lack of parallelism is not an intrinsic limitation of the sequential programming model, but rather occurs for two reasons. First, there exists no framework for automatic thread extraction that brings together key existing state-of-the-art compiler and hardware techniques. This paper shows that such a framework can yield scalable parallelization on several SPEC CINT2000 benchmarks. Second, existing sequential programming languages force programmers to define a single legal program outcome, rather than allowing for a range of legal outcomes. This paper shows that natural extensions to the sequential programming model enable parallelization for the remainder of the SPEC CINT2000 suite. Our experience demonstrates that, by changing only 60 source code lines, all of the C benchmarks in the SPEC CINT2000 suite were parallelizable by automatic thread extraction. This process, constrained by the limits of modern optimizing compilers, yielded a speedup of 454 % on these applications. 1
Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs
- in ACM SIGSOFT Symposium on the Foundations of Software Engineering
, 2003
"... This paper proposes a pointer alias analysis for automatic error detection. State-of-the-art pointer alias analyses are either too slow or too imprecise for finding errors in real-life programs. We propose a hybrid pointer analysis that tracks actively manipulated pointers held in local variables an ..."
Abstract
-
Cited by 29 (3 self)
- Add to MetaCart
This paper proposes a pointer alias analysis for automatic error detection. State-of-the-art pointer alias analyses are either too slow or too imprecise for finding errors in real-life programs. We propose a hybrid pointer analysis that tracks actively manipulated pointers held in local variables and parameters accurately with path and context sensitivity and handles pointers stored in recursive data structures less precisely but efficiently. We make the unsound assumption that pointers passed into a procedure, in parameters, global variables, and locations reached by applying simple access paths to parameters and global variables, are all distinct from each other and from any other locations. This assumption matches the semantics of many functions, reduces spurious aliases and speeds up the analysis. We present a program representation...
Exposing speculative thread parallelism in SPEC2000
- In Proc. of PPoPP’05
, 2005
"... As increasing the performance of single-threaded processors becomes increasingly difficult, consumer desktop processors are moving toward multi-core designs. One way to enhance the performance of chip multiprocessors that has received considerable attention is the use of thread-level speculation (TL ..."
Abstract
-
Cited by 24 (0 self)
- Add to MetaCart
As increasing the performance of single-threaded processors becomes increasingly difficult, consumer desktop processors are moving toward multi-core designs. One way to enhance the performance of chip multiprocessors that has received considerable attention is the use of thread-level speculation (TLS). As a case study, we manually parallelized several of the SPEC CPU2000 floating point and integer applications using TLS. The use of manual parallelization enabled us to apply techniques and programmer expertise that are beyond the current capabilities of automated parallelizers. With the experience gained from this, we provide insight into ways to aggressively apply TLS to parallelize applications for high performance. This information can help guide future advanced TLS compiler design. For each application, we discuss how and where parallelism was located within the application, the impediments to extracting this parallelism using TLS, and the code transformations that were required to overcome these impediments. We also generalize these experiences to a discussion of common hindrances to TLS parallelization, and describe methods of programming that help expose application parallelism to TLS systems. These guidelines can assist developers of uniprocessor programs to create applications that can easily port to TLS systems and yield good performance. By using manual parallelization on SPEC2000, we provide guidance on where thread-level parallelism exists in these well known benchmarks, what limits its extraction, how to reduce these limitations and what performance can be expected on these applications from a chip multiprocessor system with TLS.
Static Single Information Form
- Master's thesis, Massachussets Institute of Technology
, 1999
"... This paper presents a new intermediate format called Static Single Information (SSI) form. SSI form generalizes the traditional concept of a variable de nition to include all information de nition points, or points where the analysis may obtain information about the value in a variable. Informatio ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
This paper presents a new intermediate format called Static Single Information (SSI) form. SSI form generalizes the traditional concept of a variable de nition to include all information de nition points, or points where the analysis may obtain information about the value in a variable. Information de nition points include conditional branches as well as assignments. Because SSI form provides a new name for each variable at each information de nition point, it provides excellent support for both predicated analyses, which exploit information gained from conditionals, and backwards dataow analyses.
Resolution, Optimization, and Encoding of Pointer Variables for the Behavioral Synthesis from C
, 2001
"... As designers may model mixed hardware--software systems using a subset of or ++, we present SpC, a solution to synthesize and optimize hardware models with pointers. In hardware, a pointer is not only the address of data in memory, but it may also reference data mapped to registers, ports, or wires. ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
As designers may model mixed hardware--software systems using a subset of or ++, we present SpC, a solution to synthesize and optimize hardware models with pointers. In hardware, a pointer is not only the address of data in memory, but it may also reference data mapped to registers, ports, or wires. Pointer analysis is used to find the set of locations each pointer may reference in a program at compile time. In this paper, we address the problem of synthesizing and optimizing pointers to multiple variables or array elements. The value of the pointers are encoded and branching statements are used to dynamically access data referenced by pointers. A heuristic is used to efficiently encode the values of the pointers. Compiler techniques are also used to reduce storage before loads and stores. An implementation using the SUIF framework (Wilson et al., 1994; SUIF Compiler Framework) is presented, followed by some case studies and experimental results.
Using Visualization To Understand The Behavior Of Computer Systems
, 2001
"... As computer systems continue to grow rapidly in both complexity and scale, developers need tools to help them understand the behavior and performance of these systems. While information visualization is a promising technique, most existing computer systems visualizations have focused on very specifi ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
As computer systems continue to grow rapidly in both complexity and scale, developers need tools to help them understand the behavior and performance of these systems. While information visualization is a promising technique, most existing computer systems visualizations have focused on very specific problems and data sources, limiting their applicability. This dissertation introduces Rivet, a general-purpose environment for the development of computer systems visualizations. Rivet can be used for both real-time and post-mortem analyses of data from a wide variety of sources. The modular architecture of Rivet enables sophisticated visualizations to be assembled using simple building blocks representing the data, the visual representations, and the mappings between them. The implementation of Rivet enables the rapid prototyping of visualizations through a scripting language interface while still providing high-performance graphics and data management. The effectiveness of Rivet as a tool for computer systems analysis is demonstrated through a collection of case studies. Visualizations created using Rivet have been used to display: (a) line-by-line execution data from the SUIF Explorer interactive parallelizing compiler, enabling programmers to maximize the parallel speedups of their applications; (b) detailed memory system utilization data from the FlashPoint memory profiler, providing insights on both sequential and parallel program bottlenecks; (c) the behavior of applications running on superscalar processors, allowing developers to take full advantage of these complex CPUs; and (d) the real-time performance of computer systems and clusters, drawing attention to interesting or anomalous behavior. In addition to these focused examples, Rivet has been also used in co...
An empirical study of static program slice size
- ACM Transactions on Software Engineering and Methodology (TOSEM
, 2007
"... Abstract This paper presents results from a study of all slices from 43 programs, ranging up to 136,000 lines of code in size. The study investigates the effect of five aspects that affect slice size. Three slicing algorithms are used to study two algorithmic aspects: calling-context treatment and s ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
Abstract This paper presents results from a study of all slices from 43 programs, ranging up to 136,000 lines of code in size. The study investigates the effect of five aspects that affect slice size. Three slicing algorithms are used to study two algorithmic aspects: calling-context treatment and slice granularity. The remaining three aspects affect the upstream dependencies considered by the slicer. These include collapsing structure fields, removal of dead code, and the influence of points-to analysis. The results show that, for the most precise slicer, the average slice contains just under one third of the program. Furthermore, ignoring calling context causes a 50 % increase in slice size and (coarse grained) function-level slices, while 33 % larger than the corresponding statement-level slice, may be useful predictors of the (finer grained) statement-level slice size. Finally, the upstream analyses have an order of magnitude less influence on slice size. 1 Introduction A static program slice is a semantically meaningful portion of a program that captures a subset of the program's computation [80]. A slice is computed from a `slicing criterion ' (a program point and variable of interest). Two 1
An Empirical Study of Static Program Slice Size
"... This article presents results from a study of all slices from 43 programs, ranging up to 136,000 lines of code in size. The study investigates the effect of five aspects that affect slice size. Three slicing algorithms are used to study two algorithmic aspects: calling-context treatment and slice gr ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
This article presents results from a study of all slices from 43 programs, ranging up to 136,000 lines of code in size. The study investigates the effect of five aspects that affect slice size. Three slicing algorithms are used to study two algorithmic aspects: calling-context treatment and slice granularity. The remaining three aspects affect the upstream dependencies considered by the slicer. These include collapsing structure fields, removal of dead code, and the influence of points-to analysis. The results show that for the most precise slicer, the average slice contains just under one-third of the program. Furthermore, ignoring calling context causes a 50 % increase in slice size, and while (coarse-grained) function-level slices are 33 % larger than corresponding statement-level slices, they may be useful predictors of the (finer-grained) statement-level slice size. Finally, upstream analyses have an order of magnitude less influence on slice size.

