Analysis of Benchmark Characteristics and Benchmark Performance Prediction
- ACM Transactions on Computer Systems
, 1992
Cited by 99 (4 self)
Standard benchmarking provides the run times for given programs on given machines, but fails to provide insight as to why those results were obtained (either in terms of machine or program characteristics), and fails to provide run times for that program on some other machine, or some other programs on that machine. We have developed a machine-independent model of program execution to characterize both machine performance and program execution. By merging these machine and program characterizations, we can estimate execution time for arbitrary machine/program combinations. Our technique allows us to identify those operations, either on the machine or in the programs, which dominate the benchmark results. This information helps designers in improving the performance of future machines, and users in tuning their applications to better utilize the performance of existing machines. Here we apply our methodology to characterize benchmarks and predict their execution times. We present extensi...
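The idea of merging a machine characterization with a program characterization can be illustrated with a minimal sketch. It assumes, purely for illustration, that a program is summarized by counts of abstract operations and a machine by a time per operation, so the estimate is a sum of count times cost. The function names, operation names, and numbers are hypothetical and are not taken from the paper.

```python
def predict_time(op_counts, op_times):
    """Estimate run time as the sum of (operation count * time per operation).

    op_counts: program characterization, {operation name: dynamic count}
    op_times:  machine characterization, {operation name: seconds per operation}
    Both dictionaries are illustrative assumptions, not the paper's data model.
    """
    missing = set(op_counts) - set(op_times)
    if missing:
        raise KeyError(f"machine has no timing for: {sorted(missing)}")
    return sum(count * op_times[op] for op, count in op_counts.items())


def dominant_operations(op_counts, op_times, top=3):
    """Return the operations contributing most to the estimate, as fractions."""
    total = predict_time(op_counts, op_times)
    contributions = {op: count * op_times[op] for op, count in op_counts.items()}
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    return [(op, t / total) for op, t in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical characterizations of one program and one machine.
    program = {"add": 10, "load": 4}
    machine = {"add": 1.0, "load": 5.0}
    print(predict_time(program, machine))
    print(dominant_operations(program, machine))
```

Swapping in a different `machine` dictionary gives an estimate for the same program on another machine, which is the kind of cross-machine prediction the abstract describes.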
Supercomputer Performance Evaluation and the Perfect Benchmarks
- In Proceedings of the 1990 ACM International Conference on Supercomputing
, 1990
Cited by 64 (0 self)
In the past three years, the Perfect Benchmark™ Suite has evolved from a supercomputer performance evaluation plan, presented by Kuck and Sameh at the 1987 International Conference on Supercomputing, to a vigorous international activity. This paper surveys the current state of supercomputer performance evaluation with particular focus on the methodology adopted by the Perfect effort. While there has been considerable success in achieving the goals of the plan, some issues remain unresolved, and new questions have surfaced.
1 Introduction
During the four decades since the invention of the transistor, performance increases in computers have been attributable, in large part, to increases in hardware speed, averaging an order of magnitude every seven years. In recent years, the progress of hardware technology has begun to slow as certain fundamental limits (i.e., the speed of light and the width of the atom) have been approached. In an effort to sustain increases in the peak speed of ne...
The Case for Application-Specific Benchmarking
- In Workshop on Hot Topics in Operating Systems
, 1999
Cited by 40 (6 self)
Most performance analysis today uses either microbenchmarks or standard macrobenchmarks (e.g., SPEC, LADDIS, the Andrew benchmark). However, the results of such benchmarks provide little information to indicate how well a particular system will handle a particular application. Such results are, at best, useless and, at worst, misleading. In this paper, we argue for an application-directed approach to benchmarking, using performance metrics that reflect the expected behavior of a particular application across a range of hardware or software platforms. We present three different approaches to application-specific measurement, one using vectors that characterize both the underlying system and an application, one using trace-driven techniques, and a hybrid approach. We argue that such techniques should become the new standard.
Automatic Accurate Time-Bound Analysis for High-Level Languages
- In Proceedings of the ACM SIGPLAN 1998 Workshop on Languages, Compilers, and Tools for Embedded Systems, volume 1474 of Lecture Notes in Computer Science
, 1998
Cited by 36 (9 self)
This paper describes a general approach for automatic and accurate time-bound analysis. The approach consists of transformations for building time-bound functions in the presence of partially known input structures, symbolic evaluation of the time-bound function based on input parameters, optimizations to make the overall analysis efficient as well as accurate, and measurements of primitive parameters, all at the source-language level. We have implemented this approach and performed a number of experiments for analyzing Scheme programs. The measured worst-case times are closely bounded by the calculated bounds.
1 Introduction
Analysis of program running time is important for real-time systems, interactive environments, compiler optimizations, performance evaluation, and many other computer applications. It has been extensively studied in many fields of computer science: algorithms [20, 12, 13, 41], programming languages [38, 21, 30, 33], and systems [35, 28, 32, 31]. It is particularl...
Hierarchical Tiling for Improved Superscalar Performance
- In International Parallel Processing Symposium
, 1995
Cited by 36 (6 self)
It takes more than a good algorithm to achieve high performance: inner-loop performance and data locality are also important. Tiling is a well-known method for parallelization and for improving data locality. However, tiling has the potential of being even more beneficial. At the finest granularity, it can be used to guide register allocation and instruction scheduling; at the coarsest level, it can help manage magnetic storage media. It also can be useful in overlapping data movement with computation, for instance by prefetching data from archival storage, disks and main memory into cache and registers, or by choreographing data movement between processors. Hierarchical tiling is a framework for applying both known tiling methods and new techniques to an expanded set of uses. It eases the burden on several compiler phases that are traditionally treated separately, such as scalar replacement, register allocation, generation of message passing calls, and storage mapping. By explicitly ...
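As a point of reference for the tiling idea, the following is a minimal sketch of ordinary single-level loop tiling applied to matrix multiplication. It is not the hierarchical tiling framework the abstract describes, which applies tiles at several levels from registers to storage. The function names and the `tile` parameter are hypothetical, and pure Python is used only to show the loop structure, not to demonstrate a speedup.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, with the iteration space cut into tile x tile x tile blocks.

    Each block touches only a small part of a, b, and c, which is what lets
    that working set stay in a fast level of the memory hierarchy.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):          # loops over tiles
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # loops within one tile; min() handles ragged edges
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
    print(matmul_tiled(a, b, tile=2) == matmul_naive(a, b))
```

A hierarchical scheme would nest this pattern, choosing one tile size per level of the memory hierarchy. That extension is left out of the sketch.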
Micro Benchmark Analysis of the KSR1
- In Supercomputing '93
, 1993
Cited by 34 (2 self)
A new approach, micro benchmarks, has recently been developed. Using this technique, we have analyzed the KSR1, and in particular the "ALLCACHE" memory architecture and ring interconnection. We have been able to elucidate many facets of memory performance. The technique has enabled us to identify and characterize parts of the memory design not described by Kendall Square Research. Our results show that a miss in the local cache can incur a penalty ranging from 7.5 microseconds to 500 microseconds (when a dirty "page" in the local cache must be evicted). The programmer must be very careful in placement and accessing of data to obtain maximum performance from the KSR1; the data presented here will help in understanding the performance actually obtained.
1. Introduction
The KSR1 from Kendall Square Research is a novel new parallel computer. It is the first commercial machine embodying a scalable all cache form of shared memory architecture. In addition, there are a number of other inter...
A New Approach to I/O Performance Evaluation - Self-Scaling I/O Benchmarks, Predicted I/O Performance
, 1993
Cited by 34 (2 self)
Current I/O benchmarks suffer from several chronic problems: they quickly become obsolete, they do not stress the I/O system, and they do not help in understanding I/O system performance. We propose a new approach to I/O performance analysis. First, we propose a self-scaling benchmark that dynamically adjusts aspects of its workload according to the performance characteristic of the system being measured. By doing so, the benchmark automatically scales across current and future systems. The evaluation aids in understanding system performance by reporting how performance varies according to each of five workload parameters. Second, we propose predicted performance, a technique for using the results from the self-scaling evaluation to quickly estimate the performance for workloads that have not been measured. We show that this technique yields reasonably accurate performance estimates and argue that this method gives a far more accurate comparative performance evaluation than tradition...
Structural Prediction Models for High-Performance Distributed Applications
- Proceedings of the Cluster Computing Conference (CCC '97)
, 1997
Cited by 27 (10 self)
We present a structural performance model that uses application profiles and component models to predict an application's performance on a set of distributed resources. We decompose application performance in accordance with the structure of the application: that is, into interacting component models that correspond to component tasks. Then, using the application profile and available information as guides, we select models for each component appropriately. As a proof of concept, we have implemented this approach for two distributed applications, a master-slave genetic algorithm code and a red-black stencil successive over-relaxation code. Our predictions are within 10% of actual time.
Context
Clusters of distributed machines have become a common platform for high performance applications, but remain a challenging environment in which to achieve good performance. One reason for this is the difficulty of predicting an application's execution time in this variable setting, where only mi...
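To make the notion of composing component models concrete, here is a minimal sketch for a master-slave structure. It assumes, as an illustration only, that the total time is the master's distribution time, plus the slowest slave's compute time (slaves run in parallel), plus the master's collection time. The function name, the per-slave models, and all numbers are hypothetical. The paper's actual component models are not reproduced here.

```python
def predict_master_slave(scatter_time, slave_models, work_per_slave, gather_time):
    """Compose component models into one structural estimate.

    scatter_time:   estimated time for the master to distribute work
    slave_models:   one callable per slave, mapping work units to seconds
    work_per_slave: work units assigned to each slave
    gather_time:    estimated time for the master to collect results

    Assumed structure: scatter, then all slaves in parallel, then gather,
    so the slowest slave determines the parallel phase.
    """
    if len(slave_models) != len(work_per_slave):
        raise ValueError("need exactly one work assignment per slave model")
    slave_times = [model(work) for model, work in zip(slave_models, work_per_slave)]
    return scatter_time + max(slave_times) + gather_time


if __name__ == "__main__":
    # Hypothetical slaves: one processes 10 units/s, the other 5 units/s.
    models = [lambda w: w / 10, lambda w: w / 5]
    print(predict_master_slave(1.0, models, [100, 100], 0.5))
```

Replacing a single callable changes the model for that component without touching the overall structure, which mirrors the abstract's point about selecting a model for each component separately.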
A Software Architecture for User Transparent Parallel Image Processing on MIMD Computers
- Parallel Computing
, 2001
Cited by 25 (15 self)
This paper describes a software architecture that allows image processing researchers to develop parallel applications in a transparent manner. The architecture's main component is an extensive library of low-level image processing operations that can be run on distributed-memory MIMD-style parallel hardware. Since the library has an application programming interface identical to that of an existing sequential image library, all parallelism is completely hidden from the user. In this paper we give an overview of all architecture components, and show how issues related to automatic parallelization and optimization are dealt with by the application of domain-specific performance models. Results obtained for a realistic application indicate that model-based optimization of a wide range of imaging software indeed is possible.

