Results 1 – 8 of 8
Statistical Models for Automatic Performance Tuning
 In Proceedings of the 2001 International Conference on Computational Science (ICCS 2001), 2001
Abstract

Cited by 19 (5 self)
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have emerged in response, and they typically operate by (1) generating a large number of possible implementations of a subroutine, and (2) selecting the fastest implementation by an exhaustive, empirical search. This paper presents quantitative data that motivates the development of such a search-based system, and discusses two problems which arise in the context of search. First, we develop a heuristic for stopping an exhaustive compile-time search early if a near-optimal implementation is found. Second, we show how to construct runtime decision rules, based on runtime inputs, for selecting from among a subset of the best implementations.
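The early-stopping idea in this abstract can be sketched compactly. The sketch below is illustrative only, not the paper's actual statistical criterion: it uses a crude empirical-fraction rule (stop once few observed candidates land near the current best), and the function name, `benchmark` callback, and the `eps`/`alpha` thresholds are all hypothetical.

```python
def search_with_early_stopping(candidates, benchmark, eps=0.05, alpha=0.1):
    """Exhaustive search over candidate implementations with early stopping.

    Simplified stand-in for the paper's stopping heuristic: after each
    measurement, estimate from the empirical distribution of observed
    speeds how often a candidate lands within (1 - eps) of the current
    best; stop once that fraction drops below alpha, on the assumption
    that little headroom remains above the best seen so far.
    """
    best, best_speed = None, float("-inf")
    speeds = []
    for cand in candidates:
        speed = benchmark(cand)  # higher is better (e.g. MFLOPS)
        speeds.append(speed)
        if speed > best_speed:
            best, best_speed = cand, speed
        # fraction of observed candidates near the current best
        threshold = (1 - eps) * best_speed
        p_near_best = sum(s > threshold for s in speeds) / len(speeds)
        if len(speeds) >= 10 and p_near_best < alpha:
            break  # near-optimal implementations appear rare; stop searching
    return best, best_speed
```

In a real tuner the `benchmark` call would compile and time a generated code variant; here it is just a callback supplied by the caller.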
Visualizing industrial CT volume data for nondestructive testing applications
 In Proceedings of the 14th IEEE Visualization Conference (VIS ’03), 2003
Statistical Modeling of Feedback Data in an Automatic Tuning System
 In MICRO-33: Third ACM Workshop on Feedback-Directed Dynamic Optimization, 2000
Abstract

Cited by 4 (0 self)
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have been developed in response which typically operate, at compile-time, by (1) generating a large number of possible implementations of a subroutine, and (2) selecting a fast implementation by an exhaustive, empirical search. In this paper, we show how statistical modeling of the performance feedback data collected during the search phase can be used in two novel and important ways. First, we develop a heuristic for stopping an exhaustive compile-time search early if a near-optimal implementation is found. Second, we show how to construct runtime decision rules, based on runtime inputs, for selecting from among a subset of the best implementations. We apply our methods to actual performance data collected by the PHiPAC tuning system for matrix multiply on a variety of hardware and compiler platforms.
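The "runtime decision rules" mentioned here select among a few top implementations based on runtime inputs such as matrix dimensions. A minimal sketch, assuming the feedback data maps input sizes to per-implementation timings; the nearest-measured-size rule below is an illustrative stand-in for the paper's statistical models, and the implementation names are hypothetical:

```python
def build_decision_rule(feedback):
    """Construct a runtime selection rule from search-phase feedback data.

    `feedback` maps an input size n to a dict {impl_name: seconds}
    measured during tuning.  The returned rule picks, for a new size,
    the implementation that was fastest at the nearest measured size.
    """
    # fastest implementation at each measured size
    winners = {n: min(times, key=times.get) for n, times in feedback.items()}
    sizes = sorted(winners)

    def rule(n):
        nearest = min(sizes, key=lambda s: abs(s - n))
        return winners[nearest]

    return rule
```

For matrix multiply, for example, a small-blocked kernel might win at small sizes while a cache-blocked one wins at large sizes; the rule then dispatches on the problem size seen at runtime.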
Multiscale morphological volume segmentation and visualization
 In APVIS ’07: 6th International Asia-Pacific Symposium on Visualization, Feb. 2007
Abstract

Cited by 2 (0 self)
This paper presents a multiscale morphology approach to the volume segmentation and visualization problem. The basis of the approach is applying morphological operations with spherical structuring elements at various sizes to create a representation of the volume data that encodes structural information at multiple scales. Through an interactive user interface, the user can effectively segment and visualize a specific feature of interest using a fast, region growing method with this multiscale data representation. A graph representing the segmented feature is created to facilitate interactive visual inspection and refinement of the feature. We have introduced a new volume data visualization technique based on interactive segmentation rather than the traditional transfer-function-based classification approach. This new technique offers the user greater power in isolating and examining volumetric features of interest.
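The fast region-growing step at the core of this approach can be illustrated with a plain flood fill on a voxel grid. This is a simplified sketch, not the authors' method: the multiscale morphological representation is omitted, and the intensity-tolerance criterion and all names are assumptions for the example.

```python
import numpy as np
from collections import deque

def region_grow(volume, seed, tol):
    """Grow a region from `seed` over 6-connected voxels whose intensity
    is within `tol` of the seed intensity; returns a boolean mask."""
    target = float(volume[seed])
    grown = np.zeros(volume.shape, dtype=bool)
    grown[seed] = True
    queue = deque([seed])
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in neighbors:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < volume.shape[0] and 0 <= ny < volume.shape[1]
                    and 0 <= nx < volume.shape[2]
                    and not grown[nz, ny, nx]
                    and abs(float(volume[nz, ny, nx]) - target) <= tol):
                grown[nz, ny, nx] = True
                queue.append((nz, ny, nx))
    return grown
```

In the paper's setting the growth criterion would consult the multiscale morphological representation rather than raw intensities, which is what makes the segmentation robust to fine-scale noise.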
On Statistical Models in Automatic Tuning
Abstract
Achieving peak performance from library subroutines usually requires extensive, machine-dependent tuning by hand. Automatic tuning systems have emerged in response, and they typically operate, at compile-time, by (1) generating a large number of possible implementations of a subroutine, and (2) selecting a fast implementation by an exhaustive, empirical search. This paper applies statistical techniques to exploit the large amount of performance data collected during the search. First, we develop a heuristic for stopping an exhaustive compile-time search early if a near-optimal implementation is found. Second, we show how to construct runtime decision rules, based on runtime inputs, for selecting from among a subset of the best implementations. We apply our methods to actual performance data collected by the PHiPAC tuning system for matrix multiply on a variety of hardware platforms.
Author Retrospective for Optimizing Matrix Multiply using PHiPAC: a Portable High-Performance ANSI C Coding Methodology
Abstract
PHiPAC was an early attempt to improve software performance by searching in a large design space of possible implementations to find the best one. At the time, in the early 1990s, the most efficient numerical linear algebra li
A role for Pareto optimality in mining performance data
Abstract
Improvements in performance modeling and identification of computational regimes within software libraries is a critical first step in developing software libraries that are truly agile with respect to the application as well as to the hardware. It is shown here that Pareto ranking, a concept from multi-objective optimization, can be an effective tool for mining large performance datasets. The approach is illustrated using software performance data gathered using both the public domain LAPACK library and an asynchronous communication library based on the IBM LAPI active-message library.
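Pareto ranking over a performance dataset keeps exactly the configurations for which no other configuration is at least as good on every objective and strictly better on one. A minimal sketch of that filtering step, assuming each data point is a tuple of objectives to be minimized (e.g. run time and memory use); this is the general technique, not the authors' code:

```python
def pareto_front(points):
    """Return indices of the Pareto-optimal points.

    Each point is a tuple of objectives to minimize.  A point p is
    dominated if some other point q satisfies q <= p in every
    objective and q != p; non-dominated points form the front.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] <= p[k] for k in range(len(p))) and q != p
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            front.append(i)
    return front
```

Applied to mined library timings, the front isolates the distinct computational regimes (fast-but-hungry vs. slow-but-lean variants) that a single best-overall ranking would hide.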
An Automatically-Tuned Sorting Library, Eran Bida and Sivan Toledo
Abstract
We present atsl, an automatically-tuned sorting library. Atsl generates an in-core sorting routine optimized to the target machine for a specific data type. Atsl finds a high-performance sorting routine by searching an algorithmic space that we have defined. The search space includes basic sorting algorithms and automatically-generated compositions of sorting algorithms. Performance measurements are used both for ranking candidate algorithms and for characterizing the behavior of candidates in specific settings (ranges of array sizes). These characterizations allow atsl to generate hybrid algorithms that intelligently exploit the strengths of particular algorithms, such as high speed at specific input-size ranges. Many sorting algorithms can be tuned using numeric parameters. Atsl searches these parameter spaces to find values that yield high performance on the target machine. The building blocks from which atsl synthesizes sorting algorithms include adaptations of many of the most effective hand-tuned sorting routines, including several that are tuned for cache efficiency. An extensive experimental evaluation shows that atsl generates high-performance codes that are well tuned for the target machine and data type. The experiments were conducted on six different machines, of several architectures, and with three different compilers. The algorithms that are generated are fast; in particular, they beat the hand-tuned building blocks and the compiler’s C++ built-in sorting routine. The algorithms that atsl generates on different machines and using different compilers are different from each other.
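The kind of composed algorithm and numeric-parameter search described in this abstract can be illustrated with a classic hybrid: mergesort that falls back to insertion sort below a cutoff, with the cutoff chosen by timing candidates on sample data. This is a toy sketch of the idea, not atsl itself; the function names, candidate cutoffs, and sample-data setup are all assumptions for the example.

```python
import random
import time

def insertion_sort(a):
    """In-place insertion sort; fast for short arrays."""
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def hybrid_sort(a, cutoff):
    """Mergesort that switches to insertion sort below `cutoff` --
    one point in the space of algorithm compositions."""
    if len(a) <= cutoff:
        return insertion_sort(list(a))
    mid = len(a) // 2
    left = hybrid_sort(a[:mid], cutoff)
    right = hybrid_sort(a[mid:], cutoff)
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def tune_cutoff(n=2000, candidates=(8, 16, 32, 64)):
    """Pick the cutoff with the best measured time on random data --
    a toy version of searching a numeric parameter space."""
    data = [random.random() for _ in range(n)]
    best, best_t = candidates[0], float("inf")
    for c in candidates:
        t0 = time.perf_counter()
        hybrid_sort(list(data), c)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best, best_t = c, elapsed
    return best
```

A real tuner would additionally vary the choice of building blocks per input-size range and per data type, which is where the machine-specific hybrids come from.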