Using the SimOS Machine Simulator to Study Complex Computer Systems
ACM Transactions on Modeling and Computer Simulation, 1997
Cited by 144 (5 self)
This paper identifies two challenges that machine simulators such as SimOS must overcome in order to effectively analyze large, complex workloads: handling long workload execution times and collecting data effectively. To study long-running workloads, SimOS includes multiple interchangeable simulation models for each hardware component. By selecting the appropriate combination of simulation models, the user can explicitly control the tradeoff between simulation speed and simulation detail. To handle the large amount of low-level data generated by the hardware simulation models, SimOS contains flexible annotation and event classification mechanisms that map the data back to concepts meaningful to the user. SimOS has been extensively used to study new computer hardware designs, to analyze application performance, and to study operating systems. We include two case studies that demonstrate how a low-level machine simulator such as SimOS can be used to study large and complex workloads.
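A minimal Python sketch of the interchangeable-model idea, assuming a toy interface in which each CPU model only reports how many cycles a batch of instructions took. `CpuModel`, `FastCpu`, `DetailedCpu`, `Machine` and the miss-penalty numbers are hypothetical, not SimOS's API; a real simulator must also hand architectural state from one model to the next when it switches.

```python
from typing import Protocol


class CpuModel(Protocol):
    """Hypothetical interface that every interchangeable CPU model implements."""

    def run(self, instructions: int) -> int:
        """Simulate `instructions` instructions and return the cycles they took."""
        ...


class FastCpu:
    """Fast, low-detail model: assumes every instruction takes exactly one cycle."""

    def run(self, instructions: int) -> int:
        return instructions


class DetailedCpu:
    """Slower, higher-detail model: adds an assumed cache-miss penalty every
    `miss_every` instructions (a real model would simulate the cache itself)."""

    def __init__(self, miss_every: int = 50, miss_penalty: int = 20) -> None:
        self.miss_every = miss_every
        self.miss_penalty = miss_penalty

    def run(self, instructions: int) -> int:
        misses = instructions // self.miss_every
        return instructions + misses * self.miss_penalty


class Machine:
    """Simulated machine whose CPU model can be swapped while the workload runs."""

    def __init__(self, cpu: CpuModel) -> None:
        self.cpu = cpu
        self.cycles = 0

    def switch_cpu(self, cpu: CpuModel) -> None:
        self.cpu = cpu

    def execute(self, instructions: int) -> None:
        self.cycles += self.cpu.run(instructions)


# Fast-forward through an uninteresting phase, then switch to the detailed model.
machine = Machine(FastCpu())
machine.execute(1_000_000)          # e.g. booting the operating system
machine.switch_cpu(DetailedCpu())
machine.execute(1_000)              # region of interest
print(machine.cycles)               # 1001400
```

Spending the fast model on the boot phase and the detailed model only on the region of interest is the speed/detail tradeoff the abstract describes.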
Scalable Performance Analysis: The Pablo Performance Analysis Environment
In Proceedings of the Scalable Parallel Libraries Conference, 1993
Cited by 137 (19 self)
Developers of application codes for massively parallel computer systems face daunting performance tuning and optimization problems that must be solved if massively parallel systems are to fulfill their promise. Recording and analyzing the dynamics of application program, system software, and hardware interactions is the key to understanding and the prerequisite to performance tuning, but this instrumentation and analysis must not unduly perturb program execution. Pablo is a performance analysis environment designed to provide unobtrusive performance data capture, analysis, and presentation across a wide variety of scalable parallel systems. Current efforts include dynamic statistical clustering to reduce the volume of data that must be captured and complete performance data immersion via head-mounted displays.
1 Introduction
As computational science becomes an equal partner to theory and experiment, there is growing consensus that massively parallel systems are the only technically an...
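Pablo's abstract mentions dynamic statistical clustering to reduce the volume of captured data but does not name an algorithm. The sketch below substitutes plain k-means (an assumption, not necessarily Pablo's method): processors with similar metric vectors are grouped, and only one representative per group would then need to be traced in full. The function names and sample metrics are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iters: int = 10) -> List[int]:
    """Plain k-means. Centroids are seeded with the first k points so the
    result is deterministic; returns a cluster label for each point."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def representatives(metrics: Dict[int, Vector], k: int) -> Dict[int, List[int]]:
    """Group processors with similar metric vectors; keep one representative
    per group, mapped to every processor it stands in for."""
    ids = sorted(metrics)
    labels = kmeans([metrics[i] for i in ids], k)
    groups: Dict[int, List[int]] = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in zip(ids, labels) if lab == c]
        groups[members[0]] = members
    return groups


# Hypothetical per-processor metrics: (compute seconds, message MB).
metrics = {0: (1.0, 0.1), 1: (1.1, 0.1), 2: (5.0, 2.0), 3: (5.2, 2.1), 4: (1.0, 0.2)}
print(representatives(metrics, k=2))  # {0: [0, 1, 4], 2: [2, 3]}
```

Here five processors collapse to two representatives, which is the kind of data reduction the abstract targets; choosing k and re-clustering as behavior changes are left open.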
Meta-level Programming with CodA
In Proceedings of ECOOP'95, LNCS 952, 1995
Cited by 74 (1 self)
Meta-levels are complex pieces of software with diverse demands in both the computation and interaction domains. Common techniques that express behaviour only in code fail to clearly assign responsibility for a particular behaviour's definition or to support the reuse or integration of existing behaviour descriptions. The technique of fine-grained decomposition of meta-level behaviour into objects, and their subsequent composition into object models, provides a framework for creating, reusing and integrating complex object behaviours. Using such a framework, we show that users can develop and integrate quite different object models while retaining a high degree of abstraction and fostering meta-level component reuse.
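To illustrate fine-grained decomposition of a meta-level, the sketch below assembles a meta-object from separate send and execution components, then swaps in a logging send component without touching the other parts or the base-level object. All class names are hypothetical Python stand-ins, not CodA's Smalltalk classes.

```python
from typing import Any, List, Optional


class Send:
    """Default send component: deliver the message to the receiver at once."""

    def send(self, receiver: "MetaObject", selector: str, args: tuple) -> Any:
        return receiver.receive(selector, args)


class LoggingSend(Send):
    """Replacement send component: records each selector, then delegates."""

    def __init__(self, log: List[str]) -> None:
        self.log = log

    def send(self, receiver: "MetaObject", selector: str, args: tuple) -> Any:
        self.log.append(selector)
        return super().send(receiver, selector, args)


class Execution:
    """Default execution component: look up the method on the base object and call it."""

    def execute(self, base: Any, selector: str, args: tuple) -> Any:
        return getattr(base, selector)(*args)


class MetaObject:
    """A meta-level composed of separate components; each can be replaced
    without changing the others or the base-level object."""

    def __init__(self, base: Any, send: Optional[Send] = None,
                 execution: Optional[Execution] = None) -> None:
        self.base = base
        self.send_component = send if send is not None else Send()
        self.execution = execution if execution is not None else Execution()

    def receive(self, selector: str, args: tuple) -> Any:
        return self.execution.execute(self.base, selector, args)

    def message(self, selector: str, *args: Any) -> Any:
        """Entry point for a base-level message to this object."""
        return self.send_component.send(self, selector, args)


class Counter:
    """Ordinary base-level object; it knows nothing about its meta-level."""

    def __init__(self) -> None:
        self.n = 0

    def add(self, k: int) -> int:
        self.n += k
        return self.n


log: List[str] = []
counter = MetaObject(Counter(), send=LoggingSend(log))
counter.message("add", 2)
print(counter.message("add", 3), log)   # 5 ['add', 'add']
```

Because each behaviour lives in its own component, a new object model is a new combination of components rather than a rewrite of one monolithic meta-level.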
VAMPIR: Visualization and Analysis of MPI Resources
Supercomputer, 1996
Cited by 67 (4 self)
Performance analysis is most often based on detailed knowledge of program behavior. One way to obtain this information is tracing. Based on the research tool PARvis, the visualization environment VAMPIR was developed at KFA and now supports the new message-passing standard MPI. VAMPIR translates a given trace file into a variety of graphical views, e.g., state diagrams, activity charts, time-line displays, and statistics. Moreover, it supports an animation mode that can help to locate performance bottlenecks, and it provides flexible filter operations to reduce the amount of information displayed. The most interesting part of VAMPIR is its powerful zooming feature, which allows users to identify problems at any level of detail.
1 Introduction
On massively parallel computer systems, performance analysis and debugging can become an extremely complicated process. Over the years, experience has shown that user-friendly tools supporting this process are extremely helpful and can d...
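At its simplest, a statistics view restricted to a zoom window sums how long each process spent in each state inside that window. This sketch assumes a made-up trace format of `(timestamp, process, state)` records; VAMPIR's actual trace format and algorithms are not described in the abstract, and `state_profile` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical trace record: (timestamp, process id, state). A process stays in
# a state until its next record; `end` closes the last interval of each process.
Event = Tuple[float, int, str]


def state_profile(trace: List[Event], t0: float, t1: float,
                  end: float) -> Dict[str, float]:
    """Time spent in each state, summed over processes and clipped to [t0, t1)."""
    by_proc: Dict[int, List[Tuple[float, str]]] = defaultdict(list)
    for ts, pid, state in sorted(trace):
        by_proc[pid].append((ts, state))
    totals: Dict[str, float] = defaultdict(float)
    for events in by_proc.values():
        for i, (start, state) in enumerate(events):
            stop = events[i + 1][0] if i + 1 < len(events) else end
            overlap = min(stop, t1) - max(start, t0)   # zoom: clip to the window
            if overlap > 0:
                totals[state] += overlap
    return dict(totals)


trace = [(0.0, 0, "compute"), (4.0, 0, "MPI_Send"), (5.0, 0, "compute"),
         (0.0, 1, "compute"), (3.0, 1, "MPI_Recv"), (5.0, 1, "compute")]
print(state_profile(trace, 0.0, 10.0, end=10.0))
# {'compute': 17.0, 'MPI_Send': 1.0, 'MPI_Recv': 2.0}
print(state_profile(trace, 2.0, 6.0, end=10.0))
# {'compute': 5.0, 'MPI_Send': 1.0, 'MPI_Recv': 2.0}
```

Narrowing the window makes communication a larger share of the total, which is how zooming can expose a bottleneck hidden in whole-run statistics.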
Falcon: On-line Monitoring for Steering Parallel Programs
In Ninth International Conference on Parallel and Distributed Computing and Systems (PDCS'97), 1998
Cited by 51 (13 self)
Advances in high performance computing, communications, and user interfaces enable developers to construct increasingly interactive high performance applications. The Falcon system presented in this paper supports such interactivity by providing runtime libraries, tools, and user interfaces that permit the on-line monitoring and steering of large-scale parallel codes. The principal aspects of Falcon described in this paper are its abstractions and tools for capture and analysis of application-specific program information, performed on-line, with controlled latencies and scalable to parallel machines of substantial size. In addition, Falcon provides support for the on-line graphical display of monitoring information, and it allows programs to be steered during their execution, by human users or algorithmically. This paper presents our basic research motivation, outlines the Falcon system's functionality, and includes a detailed evaluation of its performance characteristics in light of i...
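On-line monitoring with algorithmic steering can be sketched as a computation that emits application-specific sensor records and lets a policy adjust a parameter between iterations. Everything here (`Monitor`, `run`, the `damping` parameter and the halving policy) is hypothetical, and it runs in a single thread for clarity, whereas Falcon monitors parallel codes concurrently with controlled latency.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class Monitor:
    """Collects application-specific sensor records while the program runs."""
    records: List[Record] = field(default_factory=list)

    def sensor(self, **values: float) -> Record:
        record = dict(values)
        self.records.append(record)
        return record


def run(steps: int, params: Dict[str, float], monitor: Monitor,
        steer: Callable[[Record, Dict[str, float]], None]) -> float:
    """Toy iterative solver: each step shrinks `error` by the steerable
    `damping` factor, emits a sensor record, and lets `steer` adjust params."""
    error = 1.0
    for step in range(steps):
        error *= params["damping"]                      # stand-in for real computation
        record = monitor.sensor(step=step, error=error)
        steer(record, params)                           # steering between steps
    return error


def damp_harder_if_slow(record: Record, params: Dict[str, float]) -> None:
    """Algorithmic steering policy (hypothetical): halve damping while error > 0.5."""
    if record["error"] > 0.5:
        params["damping"] /= 2


params = {"damping": 0.9}
monitor = Monitor()
final = run(3, params, monitor, damp_harder_if_slow)
print(params["damping"], len(monitor.records))   # 0.45 3
```

A human steering the run would replace `damp_harder_if_slow` with changes made from a user interface; the program itself is unaware of which kind of agent is steering it.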
dQUOB: Managing Large Data Flows Using Dynamic Embedded Queries
2000
Cited by 38 (9 self)
The dQUOB system satisfies clients' need for specific information from high-volume data streams. The data streams we speak of are the flows of data that arise during large-scale visualizations, video streaming to large numbers of distributed users, and high-volume business transactions. We introduce the notion of conceptualizing a data stream as a set of relational database tables so that a scientist can request information with an SQL-like query. Transformation or computation that often needs to be performed on the data en route can be conceptualized as computation performed on consecutive views of the data, with computation associated with each view. The dQUOB system moves the query code into the data stream as a quoblet, that is, as compiled code. The relational database data model has the significant advantage of presenting opportunities for efficient reoptimization of queries and sets of queries. Using examples from global atmospheric modeling, we illustrate the usefulness of the dQUOB system. We carry the examples through the experiments to establish the viability of the approach for high performance computing with a baseline benchmark. We define a cost metric of end-to-end latency that can be used to determine realistic cases where optimization should be applied. Finally, we show that end-to-end latency can be controlled through a probability, assigned to each query, that the query will evaluate to true.
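How a query's probability of evaluating to true affects its cost can be illustrated with a short-circuiting conjunctive filter applied to each record of a stream. The sketch below uses the textbook rule of ordering independent predicates by ascending `cost / (1 - p_true)`, which minimizes expected per-record cost; this is a swapped-in standard technique, not dQUOB's documented optimizer, and all names, costs and probabilities are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Sequence

Record = Dict[str, float]


@dataclass(frozen=True)
class Predicate:
    name: str
    test: Callable[[Record], bool]
    cost: float    # assumed cost of one evaluation
    p_true: float  # assumed probability the predicate evaluates to true


def expected_cost(order: Sequence[Predicate]) -> float:
    """Expected per-record cost when a conjunction short-circuits in this order."""
    total, reach = 0.0, 1.0
    for pred in order:
        total += reach * pred.cost   # paid only if all earlier predicates passed
        reach *= pred.p_true
    return total


def optimize(preds: Iterable[Predicate]) -> List[Predicate]:
    """Order predicates by ascending cost / (1 - p_true), the classic rank for
    independent filters; a predicate that always passes goes last."""
    def rank(p: Predicate) -> float:
        return p.cost / (1 - p.p_true) if p.p_true < 1 else float("inf")
    return sorted(preds, key=rank)


def quoblet(stream: Iterable[Record], order: Sequence[Predicate]) -> Iterator[Record]:
    """Forward only the records that satisfy every predicate."""
    for rec in stream:
        if all(pred.test(rec) for pred in order):
            yield rec


hot = Predicate("hot", lambda r: r["temp"] > 300, cost=1.0, p_true=0.9)
south = Predicate("south", lambda r: r["lat"] < 0, cost=2.0, p_true=0.1)
plan = optimize([hot, south])
print([p.name for p in plan], f"{expected_cost(plan):.2f}",
      f"{expected_cost([hot, south]):.2f}")   # ['south', 'hot'] 2.10 2.80
```

Running the more selective predicate first wins here even though it is twice as expensive, because it discards 90% of records before the second test is paid for.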
Analyzing the Behavior and Performance of Parallel Programs
Univ. of Wisconsin-Madison, UW CS Tech. Rep., 1993
Cited by 37 (5 self)
An analytical performance model for parallel programs can provide qualitative insight as well as efficient quantitative evaluation and prediction of parallel program performance. While stochastic models for parallel programs can represent execution time variance due to communication and resource contention delays, a qualitative assessment of previous models shows that the stochastic assumption makes it extremely difficult to compute synchronization costs and overall execution times. This thesis first re-evaluates the need for the stochastic assumption by examining the influence of non-deterministic communication and resource contention delays on execution times in parallel programs. An analytical model of program behavior, combined with detailed program measurements, provides compelling evidence that in shared-memory programs on current systems as well as programs with similar granularity on foreseeable future systems, such delays introduce extremely low variance into the execution tim...
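Given the low variance the thesis reports, a fork-join program's execution time can be estimated deterministically from task times alone. This minimal sketch assumes barrier-separated phases and a fixed synchronization cost; the function names and numbers are illustrative and are not the thesis's model.

```python
import statistics
from typing import List, Sequence


def phase_time(task_times: Sequence[float], sync_cost: float) -> float:
    """One fork-join phase ends at a barrier, so it lasts as long as the
    slowest processor plus a fixed synchronization cost."""
    return max(task_times) + sync_cost


def program_time(phases: Sequence[Sequence[float]], sync_cost: float) -> float:
    """Deterministic model: total time is the sum of the phase times."""
    return sum(phase_time(p, sync_cost) for p in phases)


def coefficient_of_variation(samples: List[float]) -> float:
    """Spread of repeated measurements relative to their mean; a small value
    suggests a deterministic model with mean task times is adequate."""
    return statistics.pstdev(samples) / statistics.mean(samples)


# Hypothetical per-processor task times (seconds) for two phases.
phases = [[1.0, 2.0, 1.5], [3.0, 2.5]]
print(program_time(phases, sync_cost=0.5))  # 6.0
```

If measured task times varied widely, the expected maximum in each phase would exceed the maximum of the means, and a stochastic model would be needed; low variance is what makes the simple `max` adequate.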
TAU: A Portable Parallel Program Analysis Environment for pC++
1994
Cited by 37 (7 self)
The realization of parallel language systems that offer high-level programming paradigms to reduce the complexity of application development, scalable runtime mechanisms to support variable size problem sets, and portable compiler platforms to provide access to multiple parallel architectures, places additional demands on the tools for program development and analysis. The need for integration of these tools into a comprehensive programming environment is even more pronounced and will require more sophisticated use of the language system technology (i.e., compiler and runtime system). Furthermore, the environment requirements of high-level support for the programmer, large-scale applications, and portable access to diverse machines also apply to the program analysis tools. In this paper, we discuss τ (TAU, Tuning and Analysis Utilities), a first prototype for an integrated and portable program analysis environment for pC++, a parallel object-oriented language system. τ is integrated w...
Requirements for Data-Parallel Programming Environments
1994
Cited by 26 (9 self)
... this paper is to convey an understanding of the tools and strategies that will be needed to adequately support efficient, machine-independent data-parallel programming. To achieve our goal, we will examine the requirements for such tools and describe promising implementation strategies for meeting these requirements.
An Empirical Performance Evaluation of Scalable Scientific Applications
In Proceedings of the 2002 ACM/IEEE Conference on Supercomputing, 2002
Cited by 23 (1 self)
We investigate the scalability, architectural requirements, and performance characteristics of eight scalable scientific applications. Our analysis is driven by empirical measurements using statistical and tracing instrumentation for both communication and computation. Based on these measurements, we refine our analysis into precise explanations of the factors that influence performance and scalability for each application; we distill these factors into common traits and overall recommendations for both users and designers of scalable platforms. Our experiments demonstrate that some traits, such as improvements in the scaling and performance of MPI's collective operations, will benefit most applications. We also find specific characteristics of some applications that limit performance. For example, one application's intensive use of a 64-bit floating-point divide instruction, which has high latency and is not pipelined on the POWER3, limits the performance of the application's primary computation.
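Why a long-latency, non-pipelined divide can dominate a loop follows from simple arithmetic: without pipelining, each divide waits for the previous one to finish, while a pipelined unit overlaps them. The latency value and function names below are illustrative assumptions, not POWER3 figures.

```python
def unpipelined_cycles(n: int, latency: int) -> int:
    """Each divide must finish before the next one can start."""
    return n * latency


def pipelined_cycles(n: int, latency: int, issue_interval: int = 1) -> int:
    """A new operation can start every `issue_interval` cycles; the last one
    still needs the full latency to complete."""
    return 0 if n == 0 else (n - 1) * issue_interval + latency


# Illustrative numbers only, not POWER3 specifications.
n, latency = 1_000, 18
print(unpipelined_cycles(n, latency), pipelined_cycles(n, latency))  # 18000 1017
```

With these assumed numbers the non-pipelined unit is roughly 18 times slower for a divide-heavy loop, which is why such an instruction can bound an application's primary computation.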

