Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures
In International Parallel and Distributed Processing Symposium, 2002
Cited by 74 (9 self)
This paper examines the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the Message Passing Interface (MPI) and by using hardware counters on the microprocessor, we observe each application's inherent behavioral characteristics: point-to-point and collective communication, and floating-point operations. Furthermore, we explore the sensitivities of these characteristics to both problem size and number of processors. Our analysis reveals several striking similarities across our diverse set of applications including the use of collective operations, especially those collectives with very small data payloads. We also highlight a trend of novel applications parting with regimented, static communication patterns in favor of dynamically evolving patterns, as evidenced by our experiments on applications that use implicit linear solvers and adaptive mesh refinement. Overall, our study contributes a better understanding of the requirements of current and emerging paradigms of scientific computing in terms of their computation and communication demands.
Predicting Indirect Branches via Data Compression
1998
Cited by 22 (2 self)
Branch prediction is a key mechanism used to achieve high performance on multiple issue, deeply pipelined processors. By predicting the branch outcome at the instruction fetch stage of the pipeline, superscalar processors are better able to exploit Instruction Level Parallelism (ILP) by providing a larger window of instructions. However, when a branch is mispredicted, instructions from the mispredicted path must be discarded. Therefore, branch prediction accuracy is critical to achieve high performance. Existing branch prediction schemes can accurately predict the direction of conditional branches, but they have difficulty predicting the correct targets of indirect branches. Indirect branches occur frequently in Object-Oriented Languages (OOL), as well as in Dynamically-Linked Libraries (DLLs), two programming environments rapidly increasing in popularity. In addition, certain language constructs such as multi-way control transfers (e.g., switches), and architectural features such as 6...
Execution Characteristics of Multimedia Applications on a Pentium II Processor
2000
Cited by 10 (3 self)
With the widespread use of 3D graphics, animation, speech recognition, and other media applications, general-purpose processors are increasingly spending their cycles on video and audio processing. However, the characteristics of media applications when executed on general-purpose processors are not well understood. Such knowledge is extremely important in guiding the design of future microprocessors and the development of media applications. In this paper we characterize the performance of multimedia applications on an Intel Pentium II processor-based system. Six different commercial multimedia applications belonging to the 3D graphics, streaming video, or streaming audio categories are executed on an Intel Pentium II processor and their performance is measured. Architectural data pertaining to the utilization of various hardware resources on the chip are collected using on-chip performance monitoring counters. Multimedia applications are seen to have fewer branch instructions than SPECint benchmarks,...
Indirect Branch Prediction using Data Compression Techniques
Journal of Instruction Level Parallelism, 1999
Cited by 6 (0 self)
Branch prediction is a key mechanism used to achieve high performance on multiple issue, deeply pipelined processors. By predicting the branch outcome at the instruction fetch stage of a pipeline, superscalar processors become able to exploit Instruction Level Parallelism (ILP) by providing a larger window of instructions. However, when a branch is mispredicted, instructions from the mispredicted path must be discarded. Therefore, branch prediction accuracy is critical to achieve high performance. Existing branch prediction schemes can accurately predict the direction of conditional branches, but have difficulties predicting the correct targets of indirect branches. Indirect branches occur frequently in Object-Oriented Languages (OOL), as well as in Dynamically-Linked Libraries (DLLs), two programming environments rapidly increasing in popularity. In addition, certain language constructs such as multi-way control transfers (e.g., switches), and architectural features such as ...
Improving the Accuracy of Indirect Branch Prediction via Branch Classification
1998
Cited by 2 (0 self)
Providing accurate branch prediction is critical to effectively exploit superscalar execution. While most modern processors employ speculative execution to overcome the branch hazard problem, some number of instructions will have to be discarded when a branch misprediction occurs. Even though existing branch prediction schemes can accurately predict the direction of conditional branches, they still have difficulty predicting the correct targets of indirect branches. This type of branch occurs more frequently in languages used in Object-Oriented Programming (OOP), as well as in Dynamically-Linked Libraries (DLLs), two programming environments rapidly increasing in popularity. In this paper, we investigate the performance of several predictors used to predict the targets of indirect branches. We present indirect branch classification as a mechanism to characterize the behavior of indirect branches. We then propose hybrid predictors utilizing static and profile-guided branch c...
Exploring Performance Limits to Future Instruction-Level-Parallel Processors
1998
In this paper, we examine the relative importance of memory latency, memory bandwidth, and branch predictability on the performance of future processors. We develop and validate a sampling-based simulation methodology that allows us to simulate a large number of design points. Our methodology ensures that the entire execution profile of the application is captured while limiting the errors induced by sampling to less than 2%. We extend our simulation results by fitting the data to analytic expressions of filters. Using the insight gained from these expressions, our simulation data, and known technological trends, we develop an understanding of the factors that will limit the performance of future-generation processors. From our examination, we conclude the following. The amount of instruction-level parallelism exploited by an application changes the relative importance of performance bottlenecks. In systems with less capacity to exploit instruction-level parallelism, memory l...

