Architecture independent performance characterization and benchmarking for scientific applications
- IN INTERNATIONAL SYMPOSIUM ON MODELING, ANALYSIS AND SIMULATION OF COMPUTER TELECOMMUNICATIONS SYSTEMS
, 2004
Cited by 15 (4 self)
A simple, tunable, synthetic benchmark whose performance is directly related to that of applications would be of great benefit to the scientific computing community. In this paper, we present a novel approach to developing such a benchmark. The initial focus of this project is on the data access performance of scientific applications. First, a hardware-independent characterization of code performance in terms of address streams is developed. The parameters chosen to characterize a single address stream are related to regularity, size, spatial locality, and temporal locality. These parameters are then used to implement a synthetic benchmark program that mimics the performance of a corresponding code. To test the validity of our approach, we performed experiments using five test kernels on six different platforms. The performance of most of our test kernels can be approximated by a single synthetic address stream. However, in some cases, overlapping two address streams is necessary to achieve a good approximation.
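As an illustration of the idea in this abstract, the sketch below generates a single synthetic address stream from four tunable knobs. The function name, the parameter names (`working_set`, `stride`, `reuse_prob`, `regularity`), and the way each knob maps to size, spatial locality, temporal locality, and regularity are assumptions made for this example; the abstract does not give the paper's actual parameter definitions.

```python
import random


def synthetic_address_stream(length, working_set, stride, reuse_prob,
                             regularity, seed=0):
    """Generate a list of addresses in [0, working_set).

    All parameters are hypothetical stand-ins for the abstract's categories:
      working_set -> size
      stride      -> spatial locality (small stride = neighbouring addresses)
      reuse_prob  -> temporal locality (chance of revisiting a recent address)
      regularity  -> chance that a fresh access follows the fixed stride
    """
    rng = random.Random(seed)
    addresses = []
    current = 0
    for _ in range(length):
        if addresses and rng.random() < reuse_prob:
            # Temporal locality: revisit one of the last few addresses.
            current = rng.choice(addresses[-8:])
        elif rng.random() < regularity:
            # Regular access: advance by the fixed stride, wrapping around.
            current = (current + stride) % working_set
        else:
            # Irregular access: jump anywhere in the working set.
            current = rng.randrange(working_set)
        addresses.append(current)
    return addresses


if __name__ == "__main__":
    # A fully regular stream with no reuse simply walks the working set.
    print(synthetic_address_stream(6, working_set=16, stride=4,
                                   reuse_prob=0.0, regularity=1.0))
```

A real benchmark would replay such a stream against an array and time the accesses; the generator alone only shows how a few parameters can describe an access pattern independently of the hardware.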
HBench:Java: An Application-Specific Benchmarking Framework for Java Virtual Machines
- ACM Java Grande
, 2000
Cited by 15 (3 self)
Java applications represent a broad class of programs, ranging from programs running on embedded products to high-performance server applications. Standard Java benchmarks ignore this fact and assume a fixed workload. When an actual application’s behavior differs from that included in a standard benchmark, the benchmark results are useless, if not misleading. In this paper, we present HBench:Java, an application-specific benchmarking framework, based on the concept that a system's performance must be measured in the context of the application of interest. HBench:Java employs a methodology that uses vectors to characterize the application and the underlying JVM and carefully combines the two vectors to form a single metric that reflects a specific application’s performance on a particular JVM such that the performance of multiple JVMs can be realistically compared. Our performance results demonstrate HBench:Java’s superiority over traditional benchmarking approaches in predicting real application performance and its ability to pinpoint performance problems.
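The vector-based methodology can be sketched as follows. The abstract says only that the application vector and the JVM vector are combined into a single metric; the dot product used here is an assumption, and the primitive names and all numbers are hypothetical values chosen for illustration.

```python
def predicted_time(app_vector, system_vector):
    """Combine an application vector with a system (JVM) vector.

    app_vector:    primitive name -> how often the application uses it
    system_vector: primitive name -> measured cost of one use on a given JVM
    The combination is assumed to be a dot product over shared primitives.
    """
    missing = set(app_vector) - set(system_vector)
    if missing:
        raise KeyError(f"no measurement for primitives: {sorted(missing)}")
    return sum(count * system_vector[name]
               for name, count in app_vector.items())


if __name__ == "__main__":
    # Hypothetical application profile (operation counts).
    app = {"object_alloc": 1000, "method_call": 50000, "array_copy": 200}
    # Hypothetical per-operation costs (microseconds) on two JVMs.
    jvm_a = {"object_alloc": 0.5, "method_call": 0.25, "array_copy": 4.0}
    jvm_b = {"object_alloc": 0.25, "method_call": 0.5, "array_copy": 2.0}
    for name, jvm in (("JVM A", jvm_a), ("JVM B", jvm_b)):
        print(name, predicted_time(app, jvm))
```

In this made-up example JVM B is faster on two of the three primitives, yet JVM A wins for this application because method calls dominate its profile. That is the kind of application-specific ranking a fixed-workload benchmark would miss.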
Workload Design: Selecting Representative Program-Input Pairs
- Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
, 2002
Cited by 14 (1 self)
Having a representative workload of the target domain of a microprocessor is extremely important throughout its design. The composition of a workload involves two issues: (i) which benchmarks to select and (ii) which input data sets to select per benchmark. Unfortunately, it is impossible to select a huge number of benchmarks and respective input sets due to the large instruction counts per benchmark and due to limitations on the available simulation time. In this paper, we use statistical data analysis techniques such as principal components analysis (PCA) and cluster analysis to efficiently explore the workload space. Within this workload space, different input data sets for a given benchmark can be displayed, a distance can be measured between program-input pairs that gives us an idea about their mutual behavioral differences, and representative input data sets can be selected for the given benchmark. This methodology is validated by showing that program-input pairs that are close to each other in this workload space indeed exhibit similar behavior. The final goal is to select a limited set of representative benchmark-input pairs that span the complete workload space. Besides workload composition, there are a number of other possible applications, namely gaining insight into the impact of input data sets on program behavior and profile-guided compiler optimizations.
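The distance-based selection step can be illustrated with a deliberately simplified sketch. It skips the PCA and cluster-analysis stages named in the abstract and instead z-scores each characteristic, measures Euclidean distance between program-input pairs, and picks the medoid (the pair with the smallest total distance to the others) as the representative. The characteristic values and function names are hypothetical.

```python
import math


def zscore_columns(rows):
    """Normalize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in rows]


def medoid(points):
    """Return the index of the point with the smallest summed distance to all others."""
    totals = [sum(math.dist(p, q) for q in points) for p in points]
    return totals.index(min(totals))


if __name__ == "__main__":
    # Hypothetical characteristics (e.g. branch fraction, load fraction)
    # for four inputs of one benchmark.
    inputs = [[0.10, 0.30], [0.12, 0.32], [0.11, 0.31], [0.40, 0.10]]
    points = zscore_columns(inputs)
    print("representative input index:", medoid(points))
    print("distance between input 0 and input 3:", math.dist(points[0], points[3]))
```

With many correlated characteristics, reducing the dimensionality first (as PCA does in the paper) keeps redundant metrics from dominating the distance; the sketch omits that step for brevity.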
Experiment Management Support for Performance Tuning
- PROCEEDINGS OF THE SC’97 CONFERENCE
, 1997
Cited by 14 (2 self)
The development of a high-performance parallel system or application is an evolutionary process -- both the code and the environment go through many changes during a program's lifetime -- and at each change, a key question for developers is: how and how much did the performance change? No existing performance tool provides the necessary functionality to answer this question. This paper reports on the design and preliminary implementation of a tool which views each execution as a scientific experiment and provides the functionality to answer questions about a program's performance which span more than a single execution or environment. We report results of using our tool with an actual performance tuning study and with a scientific application run in changing environments. Our goal is to use historic program performance data to develop techniques for parallel program performance diagnosis.
Measuring Benchmark Similarity Using Inherent Program Characteristics
- Laboratory of Computer Architecture
, 2006
Cited by 13 (2 self)
This paper proposes a methodology for measuring the similarity between programs based on their inherent microarchitecture-independent characteristics, and demonstrates two applications for it: 1) finding a representative subset of programs from benchmark suites and 2) studying the evolution of four generations of SPEC CPU benchmark suites. Using the proposed methodology, we find a representative subset of programs from three popular benchmark suites: SPEC CPU2000, MediaBench, and MiBench. We show that this subset of representative programs can be effectively used to estimate the average benchmark suite IPC, L1 data cache miss-rates, and speedup on 11 machines with different ISAs and microarchitectures; this enables one to save simulation time with little loss in accuracy. From our study of the similarity between the four generations of SPEC CPU benchmark suites, we find that, other than a dramatic increase in the dynamic instruction count and increasingly poor temporal data locality, the inherent program characteristics have more or less remained unchanged. Index Terms: Measurement techniques, modeling techniques, performance of systems, performance attributes.
Open CORBA Benchmarking
- In International Symposium on Performance Evaluation of Computer and Telecommunication Systems
, 2001
Cited by 12 (3 self)
We present two benchmark suites for CORBA brokers, targeted at the broker vendor and user audiences respectively. The vendor suite is a result of several benchmarking projects with industrial partners, and covers the entire functionality of a CORBA broker. The user suite is simplified to give an overview of the basic factors influencing broker performance, and is complemented with an approach for tailoring the results to a specific system and mode of operation without a prohibitive loss of precision.
A Decompositional Approach to Computer System Performance Evaluation
, 1997
Cited by 12 (2 self)
Contents
1 Introduction
2 Decomposing the Performance of the Operating System Kernel
  2.1 Related Work: Benchmarking Operating Systems
  2.2 Microbenchmark Tools: Revising lmbench into hbench-OS
    2.2.1 Timing Methodology
    2.2.2 Statistical Methodology
    2.2.3 Increased Parameterization
    2.2.4 Context Switch Latency
    2.2.5 Memory Bandwidths
    2.2.6 New Output Format
  2.3 Case Study: A Performance Decomposition for NetBSD on the Intel x86 Platform
    2.3.1 Bulk Data Transfer
    2.3.2 Process Creation
    2.3.3 Signal Handler Installation
3 Extending the Performance Decomposition to User Applications
  3.2 Case Study: Developing Tools
  3.3 Case Study: The Apache Web Server
    3.3.1 Step 1: Decomposing Apache's Internal Structure
    3.3.2 Step 2: Connecting the Application and Operating System Hierarchies
  3.4 Related Work: Understanding Application Performance
4 Distilling the Detail: Performance at the OS-Application Abstraction Boundary
  4.2 Analysis of Met
Expressing Meaningful Processing Requirements among Heterogeneous Nodes in an Active Network
- in Proc. of the Second International Workshop on Software and Performance
, 2000
Cited by 11 (2 self)
Active Network technology envisions deployment of virtual execution environments within network elements, such as switches and routers. As a result, nonhomogeneous processing can be applied to network traffic associated with services, flows, or even individual packets. To use such a technology safely and efficiently, individual nodes must provide mechanisms to enforce resource limits. To provide effective enforcement mechanisms, each node must have a meaningful understanding of the resource requirements for specific network traffic. In Active Network nodes, resource requirements typically come in three categories: bandwidth, memory, and processing. Well-accepted metrics exist for expressing bandwidth (bits per second) and memory (bytes) in units independent of the capabilities of particular nodes. Unfortunately, no well-accepted metric exists for expressing processing (i.e., CPU time) requirements in a platform-independent form. This paper investigates a method to express the CPU time r...
Performance Forecasting: Towards a Methodology for Characterizing Large Computational Applications
- In Proc. of the Int'l Conf. on Parallel Processing
, 1998
Cited by 10 (4 self)
We present a methodology that can identify and formulate performance characteristics of a computational application and uncover program performance trends on very large, future computer architectures and problem sizes. Based on this methodology, we present "performance forecast diagrams" that predict the scalability of a large seismology application suite on a terabyte data set. We find that the applications scale well up to a large number of processors, given an interconnection network similar to that of the SGI/Cray Origin architecture. However, we find that if we increase the computation-to-communication speed ratio by a factor of 100, the different applications of the seismic suite start exhibiting architectural "sweet spots", at which the communication overhead starts to dominate computation time.
Workload-Specific File System Benchmarks
, 2001
Cited by 9 (0 self)
A fundamental problem with the current generation of file system benchmarks is that they fail to take into account the fact that a file system’s performance can vary depending on the workload running on it. Many benchmarks attempt to reduce file system performance to a single number, producing a simplistic one-dimensional ordering of the systems being tested. Although this may be useful for marketing literature, the performance of file systems in the real world is more complicated. Different workloads place different demands on the file system, and can result in different behavior from the underlying system. A file system that provides superior performance for a web server may have inferior performance when running a software development workload. In this dissertation I demonstrate that the “one size fits all” approach of current file system benchmarks does not accurately predict the performance of different workloads on different file systems. I then present a new benchmarking methodology

