Results 1 - 10
of
23
Measuring program similarity: Experiments with SPEC CPU benchmark suites
- in Proceedings of the 2005 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS’05
, 2005
"... It is essential that a subset of benchmark programs used to evaluate an architectural enhancement, is well distributed within the target workload space rather than clustered in specific areas. Past efforts for identifying subsets have primarily relied on using microarchitecturedependent metrics of p ..."
Abstract
-
Cited by 50 (8 self)
- Add to MetaCart
It is essential that a subset of benchmark programs used to evaluate an architectural enhancement, is well distributed within the target workload space rather than clustered in specific areas. Past efforts for identifying subsets have primarily relied on using microarchitecturedependent metrics of program performance, such as cycles per instruction and cache miss-rate. The shortcoming of this technique is that the results could be biased by the idiosyncrasies of the chosen configurations. The objective of this paper is to present a methodology to measure similarity of programs based on their inherent microarchitecture-independent characteristics which will make the results applicable to any microarchitecture. We apply our methodology to the SPEC CPU2000 benchmark suite and demonstrate that a subset of 8 programs can be used to effectively represent the entire suite. We validate the usefulness of this subset by using it to estimate the average IPC and L1 data cache miss-rate of the entire suite. The average IPC of 8-way and 16-way issue superscalar processor configurations could be estimated with 3.9 % and 4.4 % error respectively. This methodology is applicable not only to find subsets from a benchmark suite, but also to identify programs for a benchmark suite from a list of potential candidates. Studying the four generations of SPEC CPU benchmark suites, we find that other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained the same. 1.
Efficiently exploring architectural design spaces via predictive modeling
- in Proc. of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems
, 2006
"... Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simul ..."
Abstract
-
Cited by 40 (4 self)
- Add to MetaCart
Architects use cycle-by-cycle simulation to evaluate design choices and understand tradeoffs and interactions among design parameters. Efficiently exploring exponential-size design spaces with many interacting parameters remains an open problem: the sheer number of experiments renders detailed simulation intractable. We attack this problem via an automated approach that builds accurate, confident predictive design-space models. We simulate sampled points, using the results to teach our models the function describing relationships among design parameters. The models produce highly accurate performance estimates for other points in the space, can be queried to predict performance impacts of architectural changes, and are very fast compared to simulation, enabling efficient discovery of tradeoffs among parameters in different regions. We validate our approach via sensitivity studies on memory hierarchy and CPU design spaces: our models generally predict IPC with only 1-2% error and reduce required simulation by two orders of magnitude. We also show the efficacy of our technique for exploring chip multiprocessor (CMP) design spaces: when trained on a 1 % sample drawn from a CMP design space with 250K points and up to 55× performance swings among different system configurations, our models predict performance with only 4-5 % error on average. Our approach combines with techniques to reduce time per simulation, achieving net time savings of three-four orders of magnitude.
Performance prediction based on inherent program similarity
- In PACT
, 2006
"... A key challenge in benchmarking is to predict the performance of an application of interest on a number of platforms in order to determine which platform yields the best performance. This paper proposes an approach for doing this. We measure a number of microarchitecture-independent characteristics ..."
Abstract
-
Cited by 20 (5 self)
- Add to MetaCart
A key challenge in benchmarking is to predict the performance of an application of interest on a number of platforms in order to determine which platform yields the best performance. This paper proposes an approach for doing this. We measure a number of microarchitecture-independent characteristics from the application of interest, and relate these characteristics to the characteristics of the programs from a previously profiled benchmark suite. Based on the similarity of the application of interest with programs in the benchmark suite, we make a performance prediction of the application of interest. We propose and evaluate three approaches (normalization, principal components analysis and genetic algorithm) to transform the raw data set of microarchitecture-independent characteristics into a benchmark space in which the relative distance is a measure for the relative performance differences. We evaluate our approach using all of the SPEC CPU2000 benchmarks and real hardware performance numbers from the SPEC website. Our framework estimates per-benchmark machine ranks with a 0.89 average and a 0.80 worst case rank correlation coefficient.
Exploiting program microarchitecture independent characteristics and phase behavior for reduced benchmark suite simulation
- In IISWC
, 2005
"... Abstract — Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to complete. Simulating the full execution of the whole benchmark suite for one architecture configuration can take months. To addres ..."
Abstract
-
Cited by 16 (7 self)
- Add to MetaCart
Abstract — Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry standard benchmark can take weeks to complete. Simulating the full execution of the whole benchmark suite for one architecture configuration can take months. To address this issue researchers have examined using targetted sampling based on phase behavior to significantly reduce the simulation time of each program in the benchmark suite. However, even with this sampling approach, simulating the full benchmark suite across a large range of architecture designs can take days to weeks to complete. The goal of this paper is to further reduce simulation time for architecture design space exploration. We reduce simulation time by finding similarity between benchmarks and program inputs at the level of samples (100M instructions of execution). This allows us to use a representative sample of execution from one benchmark to accurately represent a sample of execution of other benchmarks and inputs. The end result of our analyis is a small number of sample points of execution. These are selected across the whole benchmark suite in order to accurately represent the complete simulation of the whole benchmark suite for design space exploration. We show that this provides approximately the same accuracy as the SimPoint sampling approach while reducing the number of simulated instructions by a factor of 1.5. I.
Measuring Benchmark Similarity Using Inherent Program Characteristics,” Laboratory of Computer Architecture
, 2006
"... Abstract—This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Abstract—This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution of four generations of SPEC CPU benchmark suites. Using the proposed methodology, we find a representative subset of programs from three popular benchmark suites—SPEC CPU2000, MediaBench, and MiBench. We show that this subset of representative programs can be effectively used to estimate the average benchmark suite IPC, L1 data cache miss-rates, and speedup on 11 machines with different ISAs and microarchitectures—this enables one to save simulation time with little loss in accuracy. From our study of the similarity between the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged. Index Terms—Measurement techniques, modeling techniques, performance of systems, performance attributes. æ 1
Midatasets: Creating the conditions for a more realistic evaluation of iterative optimization
- In Proceedings of the International Conference on High Performance Embedded Architectures & Compilers (HiPEAC
, 2007
"... Abstract. Iterative optimization has become a popular technique to obtain improvements over the default settings in a compiler for performance-critical applications, such as embedded applications. An implicit assumption, however, is that the best configuration found for any arbitrary data set will w ..."
Abstract
-
Cited by 11 (7 self)
- Add to MetaCart
Abstract. Iterative optimization has become a popular technique to obtain improvements over the default settings in a compiler for performance-critical applications, such as embedded applications. An implicit assumption, however, is that the best configuration found for any arbitrary data set will work well with other data sets that a program uses. In this article, we evaluate that assumption based on 20 data sets per benchmark of the MiBench suite. We find that, though a majority of programs exhibit stable performance across data sets, the variability can significantly increase with many optimizations. However, for the best optimization configurations, we find that this variability is in fact small. Furthermore, we show that it is possible to find a compromise optimization configuration across data sets which is often within 5 % of the best possible configuration for most data sets, and that the iterative process can converge in less than 20 iterations (for a population of 200 optimization configurations). All these conclusions have significant and positive implications for the practical utilization of iterative optimization. 1
Experiments with Subsetting Benchmark Suites
"... Benchmarks are one of the most popular tools to compare the performance of computing systems. Benchmark suites typically contain multiple benchmark programs with more or less the same properties. Hence the suite contains redundancy, which increases the cost of executing or simulating the benchmark s ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Benchmarks are one of the most popular tools to compare the performance of computing systems. Benchmark suites typically contain multiple benchmark programs with more or less the same properties. Hence the suite contains redundancy, which increases the cost of executing or simulating the benchmark suite without adding value. To limit simulation time, researchers frequently subset benchmark suites. However, correctly identifying a representative subset is of paramount importance to perform a trustworthy evaluation. This paper shows that subsetting a benchmark suite in such a way that representativeness of the suite is maintained is non-trivial. We show that a small randomly selected subset is not representative of the full benchmark suite. We discuss algorithms to subset the SPEC CPU 2000 benchmark suite and show that they provide more representative subsets than randomly selected subsets. However, the algorithms evaluated in this paper do not always compute representative subsets: the algorithms produce bad results for some subset sizes. In this sense, these algorithms are unreliable, as it remains necessary to validate the benchmark suite subset. We find one subsetting algorithm that is reliable. It is, however, uncertain whether this algorithm is also reliable under other circumstances.
Accurate statistical approaches for generating representative workload compositions
- In Proceedings of the 2005 IEEE International Symposium on Workload Characterization
, 2005
"... Abstract — Composing a representative workload is a crucial step during the design process of a microprocessor. The workload should be composed in such a way that it is representative for the target domain of application and yet, the amount of redundancy in the workload should be minimized as much a ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Abstract — Composing a representative workload is a crucial step during the design process of a microprocessor. The workload should be composed in such a way that it is representative for the target domain of application and yet, the amount of redundancy in the workload should be minimized as much as possible in order not to overly increase the total simulation time. As a result, there is an important trade-off that needs to be made between workload representativeness and simulation accuracy versus simulation speed. Previous work used statistical data analysis techniques to identify representative benchmarks and corresponding inputs, also called a subset, from a large set of potential benchmarks and inputs. These methodologies measure a number of program characteristics on which Principal Components Analysis (PCA) is applied before identifying distinct program behaviors among the benchmarks using cluster analysis. In this paper we propose Independent Components Analysis (ICA) as a better alternative to PCA as it does not assume that the original data set has a Gaussian distribution, which allows ICA to better find the important axes in the workload space. Our experimental results using SPEC CPU2000 benchmarks show that ICA significantly outperforms PCA in that ICA achieves smaller benchmark subsets that are more accurate than those found by PCA. I.
Efficient Architectural Design Space Exploration via Predictive Modeling
"... Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable. We attack this via an automated approach that builds accurate predictive models. We simulate ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation intractable. We attack this via an automated approach that builds accurate predictive models. We simulate sampled points, using results to teach our models the function describing relationships among design parameters. The models can be queried and are very fast, enabling efficient design tradeoff discovery. We validate our approach via two uniprocessor sensitivity studies, predicting IPC with only 1-2 % error. In an experimental study using the approach, training on 1 % of a 250Kpoint CMP design space allows our models to predict performance with only 4-5 % error. Our predictive modeling combines well with techniques that reduce the time taken by each simulation experiment, achieving net time savings of three-four orders of magnitude.
Comparing benchmarks using key microarchitectureindependent characteristics. Accepted for
- IEEE International Symposium on Workload Characterization (IISWC-2006), October 2006
, 2006
"... Abstract — Understanding the behavior of emerging workloads is important for designing next generation microprocessors. For addressing this issue, computer architects and performance analysts build benchmark suites of new application domains and compare the behavioral characteristics of these benchm ..."
Abstract
-
Cited by 6 (4 self)
- Add to MetaCart
Abstract — Understanding the behavior of emerging workloads is important for designing next generation microprocessors. For addressing this issue, computer architects and performance analysts build benchmark suites of new application domains and compare the behavioral characteristics of these benchmark suites against well-known benchmark suites. Current practice typically compares workloads based on microarchitecture-dependent characteristics generated from running these workloads on real hardware. There is one pitfall though with comparing benchmarks using microarchitecture-dependent characteristics, namely that completely different inherent program behavior may yield similar microarchitecture-dependent behavior. This paper proposes a methodology for characterizing benchmarks based on microarchitecture-independent characteristics. This methodology minimizes the number of inherent program characteristics that need to be measured by exploiting correlation between program characteristics. In fact, we reduce our 47dimensional space to an 8-dimensional space without compromising the methodology’s ability to compare benchmarks. The important benefits of this methodology are that (i) only a limited number of microarchitecture-independent characteristics need to be measured, and (ii) the resulting workload characterization is easy to interpret. Using this methodology we compare 122 benchmarks from 6 recently proposed benchmark suites. We conclude that some benchmarks in emerging benchmark suites are indeed similar to benchmarks from well-known benchmark suites as suggested through a microarchitecture-dependent characterization. However, other benchmarks are dissimilar based on a microarchitecture-independent characterization although a microarchitecture-dependent characterization suggests the opposite to be true. I.

