MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research
Computer Architecture Letters, 2002
Cited by 174 (14 self)
Computer architects must determine how to most effectively use finite computational resources when running simulations to evaluate new architectural ideas. To facilitate efficient simulations with a range of benchmark programs, we have developed the MinneSPEC input set for the SPEC CPU 2000 benchmark suite. This new workload allows computer architects to obtain simulation results in a reasonable time using existing simulators. While the MinneSPEC workload is derived from the standard SPEC CPU 2000 workload, it is a valid benchmark suite in and of itself for simulation-based research. MinneSPEC also may be used to run large numbers of simulations to find "sweet spots" in the evaluation parameter space. This small number of promising design points subsequently may be investigated in more detail with the full SPEC reference workload. In the process of developing the MinneSPEC datasets, we quantify its differences in terms of function-level execution patterns, instruction mixes, and memory behaviors compared to the SPEC programs when executed with the reference inputs. We find that for some programs, the MinneSPEC profiles match the SPEC reference dataset program behavior very closely. For other programs, however, the MinneSPEC inputs produce significantly different program behavior. The MinneSPEC workload has been recognized by SPEC and is distributed with Version 1.2 and higher of the SPEC CPU 2000 benchmark suite.
Statistically rigorous Java performance evaluation
In Proceedings of the ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2007
Cited by 23 (3 self)
Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects. There exists a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times. This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
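As a rough illustration of what statistically rigorous reporting can look like, the sketch below computes a confidence interval for the mean execution time over repeated runs instead of reporting a single best or average number. It is not the JavaStats tool; the function names are hypothetical, and the normal approximation is an assumption that only holds for a reasonably large number of measurements (a Student's t quantile would replace it for small samples).

```python
import math
import statistics


def mean_confidence_interval(samples, confidence=0.95):
    """Return (mean, low, high) for the mean of the measurements.

    Assumption: enough samples (roughly 30 or more) for the normal
    approximation to hold; with fewer samples a Student's t quantile
    should be used instead of z.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    stderr = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * stderr, mean + z * stderr


def intervals_overlap(a, b):
    """True when two (mean, low, high) intervals overlap."""
    return a[1] <= b[2] and b[1] <= a[2]
```

If the intervals of two alternatives overlap, the measurements alone do not support claiming that one is faster than the other.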
Techniques for Obtaining High Performance in Java Programs
ACM Computing Surveys, 1999
Cited by 18 (1 self)
This survey describes research directions in techniques to improve the performance of programs written in the Java programming language. The standard technique for Java execution is interpretation. A Java interpreter dynamically executes Java bytecodes, which comprise the instruction set of the Java Virtual Machine (JVM). Execution-time performance of Java programs can be improved through compilation. Various types of Java compilers have been proposed including Just-In-Time (JIT) compilers that compile bytecodes into native processor instructions on the fly; direct compilers that directly translate the Java source code into the target processor's native language; and bytecode-to-source translators that generate either native code or an intermediate language, such as C, from the bytecodes. Some techniques, including bytecode optimization and executing Java programs in parallel, attempt to improve Java runtime performance while maintaining Java's portability. Another alternative f...
CDx: A Family of Real-time Java Benchmarks
Cited by 12 (6 self)
Java is becoming a viable platform for hard real-time computing. There are production and research real-time Java VMs, as well as applications in both the military and civil sectors. Technological advances and increased adoption of Real-time Java contrast significantly with the lack of real-time benchmarks. The few benchmarks that exist are either low-level synthetic micro-benchmarks, or benchmarks used internally by companies, making it difficult to independently verify and repeat reported results. This paper presents the CDx (Collision Detector) benchmark suite, an open source application benchmark suite that targets different hard and soft real-time virtual machines. CDx is, at its core, a real-time benchmark with a single periodic task, which implements aircraft collision detection based on simulated radar frames. The benchmark can be configured to use different sets of real-time features and comes with a number of workloads. We describe the architecture of the benchmark and characterize the workload based on input parameters.
Versatility and VersaBench: A New Metric and a Benchmark Suite for Flexible Architectures
2004
Cited by 11 (1 self)
For the last several decades, computer architecture research has largely benefited from, and continues to be driven by, ad-hoc benchmarking. Often the benchmarks are selected to represent workloads that architects believe should run on the computational platforms they design. For example, benchmark suites such as SPEC, Winstone, and MediaBench, which represent workstation, desktop and media workloads respectively, have influenced computer architecture innovation for the last decade. Recently, advances in VLSI technology have created an increasing interest within the computer architecture community to build a new kind of processor that is more flexible than extant general purpose processors. Such new processor architectures must efficiently support a broad class of applications including graphics, networking, and signal processing in addition to the traditional desktop workloads. Thus, given the new focus on flexibility demands, a new benchmark suite and new metrics are necessary to accurately reflect the goals of the architecture community. This paper thus proposes (i) VersaBench as a new benchmark suite, and (ii) a new Versatility measure to characterize architectural flexibility, or in other words, the ability of the architecture to effectively execute a wide array of workloads. The benchmark suite is composed of applications drawn from several domains including desktop, server, stream, and bit-level processing. The Versatility measure is a single scalar metric inspired by the SPEC paradigm. It normalizes processor performance on each benchmark by that of the highest-performing machine for that application. This paper reports the measured versatility for several existing processors, as well as for some new and emerging research processors. The benchmark suite is freely distributed, and we are actively cataloging and sharing results for various reference processors.
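The normalization step described in this abstract can be sketched in a few lines. The abstract only states that each benchmark score is normalized by the best machine for that benchmark and that the result is a single scalar in the SPEC style; combining the normalized scores with a geometric mean is an assumption made here for illustration, and the function name and data layout are hypothetical.

```python
from statistics import geometric_mean


def versatility(perf):
    """Compute one scalar per machine from per-benchmark scores.

    perf maps machine -> {benchmark: score}, where a higher score is
    better. Each score is divided by the best score any machine achieved
    on that benchmark; the normalized scores are then combined with a
    geometric mean (an assumption, not taken from the paper).
    """
    benchmarks = list(next(iter(perf.values())).keys())
    # Best observed score per benchmark across all machines.
    best = {b: max(scores[b] for scores in perf.values()) for b in benchmarks}
    return {
        machine: geometric_mean([scores[b] / best[b] for b in benchmarks])
        for machine, scores in perf.items()
    }
```

Under this sketch, a machine that is the best on every benchmark scores exactly 1.0, and every other machine scores below 1.0.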
More on Finding a Single Number to Indicate Overall Performance of a Benchmark Suite
ACM Computer Architecture News, 2004
Cited by 8 (0 self)
The topic of finding a single number to summarize overall performance over a benchmark suite continues to be a difficult issue 14 years after Smith’s paper [1]. While significant insight into the problem has been provided by Smith [1], Hennessy and Patterson [2], Cragon [3], etc., the research community still seems to be unclear on the correct mean to use for different performance metrics. How should metrics obtained from individual benchmarks be aggregated to present a summary of the performance over the entire suite? What are valid central tendency measures over the whole benchmark suite for speedup, CPI, IPC, MIPS, MFLOPS, cache miss rates, cache hit rates, branch misprediction rates, etc.? Arithmetic mean has been touted to be appropriate for
FFT Benchmarking for Digital Signal Processing Technologies
In 17th IMEKO World Congress, 2003
Cited by 7 (0 self)
An appropriate choice of computing devices for digital signal processing applications requires characterizing and comparing various technologies, so that the best component in terms of cost and performance can be used in a given system design. In this paper, a benchmark strategy is presented to measure the performance of various types of digital signal processing devices. Although different metrics can be used as performance indexes, Fast Fourier Transform (FFT) computation time and Real-Time Bandwidth (RTBW) have proved to be excellent and complete performance parameters. Moreover, a new index, measuring the architectural efficiency in computing FFT, is introduced and explained. Both parameters can be used to compare several digital signal processing technologies, thus guiding designers in optimal component selection.
Java Performance Evaluation through Rigorous Replay Compilation
In ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2008
Cited by 6 (0 self)
A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, nondeterminism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicates the Java performance benchmarking process. Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling nondeterminism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine. This paper argues that, in contrast to current practice, which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.
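The matched-pair idea can be illustrated with a short sketch: each compilation plan yields one measurement before and one after the change, and the confidence interval is computed on the per-plan differences. This is a generic paired analysis, not code from the paper; the function name and the `critical` parameter are hypothetical. By default it uses a normal critical value, which is an assumption suited to many pairs; with few plans, a Student's t critical value should be passed in instead.

```python
import math
import statistics


def matched_pair_interval(before, after, confidence=0.95, critical=None):
    """Return (mean_diff, low, high) for the paired differences.

    before[i] and after[i] must come from the same compilation plan.
    critical overrides the default normal quantile, for example with a
    Student's t value when only a few pairs are available.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    # Pairing removes the variation shared by both runs of a plan.
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.fmean(diffs)
    stderr = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if critical is None:
        critical = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = critical * stderr
    return mean_diff, mean_diff - half_width, mean_diff + half_width
```

If the resulting interval excludes zero, the measured change is statistically significant at the chosen confidence level. Because plan-to-plan variation cancels within each pair, the interval is typically narrower than one computed from unpaired measurements.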
Techniques for Accurate, Accelerated Processor Simulation: An Analysis of Reduced Inputs and Sampling
2002
Cited by 4 (1 self)
Detailed execution-driven simulation is an important tool for computer architecture research. It is desirable to drive these simulations with standard benchmark programs that are commonly used to evaluate existing computer systems, such as the SPEC2000 suite. Unfortunately, simulating these benchmark programs to completion using full-detail, cycle-accurate simulation on the designated reference input sets results in intractably long simulation durations. This study evaluates and compares two techniques for combating long simulation times: reduced inputs and sampling. Our objective is to assess the ability of each to reduce simulation running times, while simultaneously minimizing the difference in the results generated by using these techniques relative to the results generated by simulating the benchmark programs to completion using the reference inputs. With the reduced input technique, new input sets are carefully generated by hand to produce run-time characteristics of the benchmark programs that are comparable to the overall characteristics produced when the programs are run with the standard inputs.
TPTS: A Novel Framework for Very Fast Manycore Processor Architecture Simulation
Cited by 3 (1 self)
The slow speed of conventional execution-driven architecture simulators is a serious impediment to obtaining desirable research productivity. This paper proposes and evaluates a fast manycore processor simulation framework called Two-Phase Trace-driven Simulation (TPTS), which splits detailed timing simulation into a trace generation phase and a trace simulation phase. Much of the simulation overhead caused by uninteresting architectural events is only incurred once during the trace generation phase and can be omitted in the repeated trace-driven simulations. We design and implement tsim, an event-driven manycore processor simulator, based on the proposed TPTS framework, that models the memory hierarchy, interconnect, and coherence protocol in detail. By applying aggressive event filtering, tsim achieves an impressive simulation speed of 146 MIPS when running 16-thread parallel applications.

