Results 1–10 of 15
Parallel program performance prediction using deterministic task graph analysis
 ACM Trans. Comput. Syst.
, 2004
Abstract

Cited by 28 (3 self)
In this paper, we consider analytical techniques for predicting detailed performance characteristics of a single shared-memory parallel program for a particular input. Analytical models for parallel programs have been successful at providing simple qualitative insights and bounds on program scalability, but have been less successful in practice at providing detailed insights and metrics for program performance (leaving these to measurement or simulation). We develop a conceptually simple modeling technique called deterministic task graph analysis that provides detailed performance prediction for shared-memory programs with arbitrary task graphs, a wide variety of task scheduling policies, and significant communication and resource contention. Unlike many previous models, which are stochastic, our model assumes deterministic task execution times (while retaining stochastic models for communication and resource contention). This assumption is supported by a previous study of the influence of nondeterministic delays in parallel programs. We evaluate our model in three ways. First, an experimental evaluation shows that our analysis technique is accurate and efficient for a variety of shared-memory programs, including programs with large and/or complex task graphs, sophisticated task scheduling, highly nonuniform task …
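As a rough illustration of what a deterministic task-graph evaluation computes, the sketch below propagates fixed task durations through a DAG under a simple FIFO list-scheduling policy on identical processors. This is a minimal sketch, far simpler than the scheduling policies and contention models the paper handles; all names and the example graph are illustrative, not taken from the paper.

```python
def simulate_task_graph(durations, deps, num_procs):
    """Deterministic task-graph evaluation: given fixed task durations,
    a predecessor map, and num_procs identical processors, compute each
    task's start/finish time under FIFO list scheduling."""
    succ = {t: [] for t in durations}
    indeg = {t: len(deps.get(t, ())) for t in durations}
    for t, preds in deps.items():
        for p in preds:
            succ[p].append(t)
    ready = [t for t in durations if indeg[t] == 0]  # no predecessors
    free_at = [0.0] * num_procs                      # next free time per CPU
    finish = {}
    while ready:
        t = ready.pop(0)                             # FIFO ready queue
        p = min(range(num_procs), key=free_at.__getitem__)
        # A task starts when its processor is free and all predecessors done.
        start = max([free_at[p]] + [finish[d] for d in deps.get(t, ())])
        finish[t] = start + durations[t]
        free_at[p] = finish[t]
        for s in succ[t]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return max(finish.values())

# Diamond graph A -> {B, C} -> D on 2 processors: makespan is the
# critical path A (1) + C (3) + D (1) = 5.
makespan = simulate_task_graph({"A": 1.0, "B": 2.0, "C": 3.0, "D": 1.0},
                               {"B": ["A"], "C": ["A"], "D": ["B", "C"]}, 2)
print(makespan)  # 5.0
```

Because task times are deterministic, a single pass like this yields exact per-task timings for a given schedule, which is the property the paper exploits.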
A simulator for parallel applications with dynamically varying compute node allocation
 In Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS'06), IEEE Press, 2006
Abstract

Cited by 3 (2 self)
Dynamically allocating computing nodes to parallel applications is a promising technique for improving the utilization of cluster resources. We introduce the concept of dynamic efficiency, which expresses resource utilization efficiency as a function of time. We propose a simulation framework which enables predicting the dynamic efficiency of a parallel application. It relies on the DPS parallelization framework, to which we add direct execution simulation capabilities. The high-level flow graph description of DPS applications enables the accurate simulation of parallel applications without needing to modify the application code. Thanks to partial direct execution, simulation times and memory requirements may be reduced. In simulations under partial direct execution, the application's parallel behavior is simulated thanks to direct execution, and the duration of individual operations is obtained from a performance prediction model or from prior measurements. We verify the accuracy of our simulator by comparing the effective running time and dynamic efficiency of parallel program executions with the running time and dynamic efficiency predicted by the simulator. These comparisons are performed for an LU factorization application under different parallelization and dynamic node allocation strategies.
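The partial-direct-execution idea described above can be sketched in a few lines: the real code may still be run to reproduce control flow, but simulated time is charged from a duration table. This is an illustrative sketch only; the class, method, and operation names are made up and are not the DPS API.

```python
# Hypothetical duration table, e.g. from prior measurements or a
# performance prediction model (names are illustrative, not from DPS).
measured = {"compute_block": 0.50, "exchange_border": 0.05}

class PartialDirectSim:
    """Advance a virtual clock using predicted operation durations
    instead of timing real executions (partial direct execution)."""
    def __init__(self, durations):
        self.durations = durations
        self.clock = 0.0

    def run_op(self, name, direct_fn=None):
        if direct_fn is not None:
            direct_fn()  # direct execution reproduces the control flow...
        self.clock += self.durations[name]  # ...but time comes from the model

sim = PartialDirectSim(measured)
for _ in range(4):               # four iterations of a hypothetical solver loop
    sim.run_op("compute_block")
    sim.run_op("exchange_border")
print(round(sim.clock, 2))       # 4 * (0.50 + 0.05) = 2.2
```

The payoff is that long operations cost nothing to "execute", so simulated runs are much cheaper than real ones while preserving the application's parallel structure.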
A New Performance Evaluation Approach for System Level Design Space Exploration
, 2002
Abstract

Cited by 3 (0 self)
Application-specific systems have potential for customization of the design with a view to achieving a better cost-performance-power tradeoff. Such customization requires extensive design space exploration. In this paper, we introduce a performance evaluation methodology for system-level design exploration that is much faster than traditional cycle-accurate simulation. The tradeoff is between accuracy and simulation speed. The methodology is based on probabilistic modeling of system components customized with application behavior. Performance numbers are generated by simulating these models. We have implemented our models using SystemC and validated them for uniprocessor as well as multiprocessor systems against various benchmarks.
Predicting execution times of parallel-independent programs using Pearson distributions
 Parallel Computing
, 2005
Abstract

Cited by 2 (2 self)
Predicting the execution time of parallel programs involves computing the maximum or minimum of the execution times of the tasks involved in the parallel computation. We present a method to accurately compute the distribution of the largest (Max) and the smallest (Min) execution time of the composite of a number of parallel programming tasks, each having an independent, stochastic, arbitrary workload. The Max function applies to the general case where the composite task completes when its longest constituent task terminates. The Min function applies when the completion of its shortest task terminates the whole parallel process, such as in a parallel searching program. Both the Min and Max density functions of a constituent task are characterized in terms of a Pearson distribution. Due to its accuracy, the presented method is of particular interest when the performance of time-critical parallel applications must be derived. Both prediction methods are tested against three well-known distributions. Furthermore, the Max prediction method is also tested against a number of measured real-life data-parallel programs with different degrees of parallelism. The results show excellent accuracy of better than 1% with very few exceptions in extreme situations.
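The core fact behind the Max distribution is that for independent task times the Max CDF is the product of the per-task CDFs, so its raw moments can be obtained by integration; the paper then matches such moments to a Pearson-family density. The sketch below only does the numerical moment computation (not the Pearson fit) and checks it on a case with a known answer; grid and example are ours, not the paper's.

```python
import numpy as np

def max_moments(cdfs, t):
    """First four raw moments of the Max of independent task times.
    The composite finishes when its slowest task does, so
    F_max(t) = prod_i F_i(t), and for a nonnegative variable
    E[T^k] = integral of k * t^(k-1) * (1 - F_max(t)) dt."""
    F = np.prod([cdf(t) for cdf in cdfs], axis=0)
    dt = t[1] - t[0]
    return [float(np.sum(k * t**(k - 1) * (1.0 - F)) * dt)
            for k in range(1, 5)]

# Sanity check: for two i.i.d. exponential(1) tasks, E[Max] = 1 + 1/2 = 1.5.
t = np.linspace(0.0, 40.0, 400_001)
expo = lambda x: 1.0 - np.exp(-x)
m1, m2, m3, m4 = max_moments([expo, expo], t)
print(round(m1, 3))   # 1.5
```

With the four moments in hand, a Pearson-family member matching them would serve as the closed-form approximation of the Max density, which is the step the paper contributes.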
Reliability Analysis of Hierarchical Systems Using Statistical Moments
, 2006
Abstract

Cited by 2 (1 self)
In many practical engineering circumstances, systems reliability analysis is complicated by the fact that the failure time distributions of the constituent subsystems cannot be accurately modeled by standard distributions. In this paper, we present a low-cost, compositional approach based on the use of the first four statistical moments to characterize the failure time distributions of the constituent components, subsystems, and top-level system. The approach is based on the use of Pearson distributions as an intermediate analytical vehicle, in terms of which the constituent failure time distributions are approximated. The analysis technique is presented for k-out-of-n systems with identical subsystems, series systems with different subsystems, and systems exploiting standby redundancy. The moment-in-moment-out approach allows for the analysis of systems with arbitrary hierarchy and arbitrary (unimodal) failure time distributions, provided the subsystems are independent such that the resulting failure time can be expressed in terms of sums or order statistics. The technique consistently exhibits very good accuracy (on average, much less than 1 percent error) at very modest computing cost. Index Terms: Failure time analysis, hierarchical systems, order statistics, Pearson approximation, redundant systems.
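For the standby-redundancy case mentioned above, the system lifetime is the sum of independent unit lifetimes, and the "moment-in-moment-out" step is just the standard formula for raw moments of a sum of independent variables. A minimal sketch, with an exponential example that is ours rather than the paper's:

```python
from math import comb

def sum_moments(mx, my):
    """Raw moments of orders 1..4 of X + Y for independent X, Y, given
    their raw moments: E[(X+Y)^k] = sum_j C(k,j) E[X^j] E[Y^(k-j)].
    This is the moment-in-moment-out step for standby redundancy,
    where the system failure time is a sum of unit lifetimes."""
    mx = [1.0] + mx   # prepend order-0 moment E[X^0] = 1
    my = [1.0] + my
    return [sum(comb(k, j) * mx[j] * my[k - j] for j in range(k + 1))
            for k in range(1, 5)]

# Two exponential(1) units in cold standby: the system lifetime is
# Erlang(2, 1), whose k-th raw moment is (k+1)!.
m_exp = [1.0, 2.0, 6.0, 24.0]      # raw moments of exponential(1): k!
m_sys = sum_moments(m_exp, m_exp)
print(m_sys)                       # [2.0, 6.0, 24.0, 120.0]
```

Feeding the resulting four moments into a Pearson approximation at each level of the hierarchy is what makes the approach compositional: only moments, never full distributions, cross subsystem boundaries.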
A simulator for adaptive parallel applications
Abstract
Dynamically allocating computing nodes to parallel applications is a promising technique for improving the utilization of cluster resources. Detailed simulations can help identify allocation strategies and problem decomposition parameters that increase the efficiency of parallel applications. We describe a simulation framework supporting dynamic node allocation which, given a simple cluster model, predicts the running time of parallel applications taking CPU and network sharing into account. Simulations can be carried out without needing to modify the application code. Thanks to partial direct execution, simulation times and memory requirements are reduced. In partial direct execution simulations, the application's parallel behavior is retrieved via direct execution, and the duration of individual operations is obtained from a performance prediction model or from prior measurements. Simulations may then vary cluster model parameters, operation durations, and problem decomposition parameters to analyze their impact on application performance and identify the limiting factors. We implemented the proposed techniques by adding direct execution simulation capabilities to the Dynamic Parallel Schedules parallelization framework. We introduce the concept of dynamic efficiency to express resource utilization efficiency as a function of time. We verify the accuracy of our simulator by comparing the effective running time and dynamic efficiency of parallel program executions with the running time …
A Statistical Approach to Contention Modeling for High-Level Heterogeneous Multiprocessor Simulation
, 2007
Abstract
Single-chip systems featuring multiple heterogeneous processors and a variety of communication and memory architectures have emerged to satisfy the demand for networking, handheld computing, and other custom devices. The complex interactions between applications, schedulers, and processor resources, along with the resulting contention delays for shared buses and memories, are the chief obstacle to raising the modeling abstraction level above the clock cycle. Without raising the simulation abstraction level, multiprocessor simulations are slow to build and execute, severely limiting the number of design iterations that can be considered and thus restricting the design space that can be explored. This work introduces a new level of design which we call the Stochastic Contention Level (SCL). Instead of considering shared resource accesses at clock-cycle granularity, SCL simulations operate on blocks that are thousands to millions of clock cycles long, summarizing the stochastic behavior of large groups of shared resource accesses throughout the time period. The SCL idea is enabled by three key contributions: the merging of an analytical stochastic model with discrete event simulation, the parameterization of shared resource access patterns used for contention modeling, and the method for predicting when …
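The block-level idea can be sketched as follows: instead of simulating every bus access, a coarse block charges one aggregate contention delay derived from an analytical queueing model. The M/M/1 wait formula and all the numbers here are our illustrative stand-ins, not the paper's actual contention model or parameters.

```python
def mm1_wait(lam, mu):
    """Mean queueing delay per access from an M/M/1 model: W_q = rho / (mu - lam).
    A stand-in for the analytical stochastic contention model."""
    rho = lam / mu
    assert rho < 1.0, "shared resource is saturated"
    return rho / (mu - lam)

def advance_block(now, n_accesses, block_cycles, access_cycles, other_rate):
    """Advance the simulation clock by one coarse block, summarizing all of
    the block's shared-resource accesses with one aggregate delay."""
    own_rate = n_accesses / block_cycles      # this processor's access rate
    lam = own_rate + other_rate               # total traffic on the shared bus
    delay = n_accesses * mm1_wait(lam, 1.0 / access_cycles)
    return now + block_cycles + delay

# Block of 100k cycles with 5k accesses of 4 cycles each, plus competing
# traffic of 0.05 accesses/cycle from other processors (made-up numbers).
t = advance_block(0.0, 5_000, 100_000, 4, 0.05)
print(round(t, 1))   # 113333.3
```

One event per block instead of one per access is what buys the orders-of-magnitude speedup over cycle-accurate simulation, at the cost of statistical rather than exact contention timing.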
More Accurate Semantics Defining Constraint Combination for Software Systems Having Client-Server Relationships
Abstract
In this paper we present a new method of combining multiple precedence constraints for a single task to support software systems having client-server relationships. In these types of software systems, the combination of the precedence constraints must support the dynamic job structure of the system. Existing formalisms do not offer this support while accurately representing the desired behavior of the software. Additionally, the application of existing formalisms to software systems having periodic clients is ill-defined. The new semantics presented here can be applied at the task graph level of description, support the underlying dynamic job structure of these systems, accurately represent the desired behavior of these software systems (including those with periodic workloads), and map easily to commercial-off-the-shelf, asynchronous, event-driven, priority-based scheduling environments. Key Words: task graph; job graph; precedence constraints; scheduling; constraint combination; client …
A Survey and Analysis of Existing Constraint Combination Formalisms and Their Application to Software Systems Having Client-Server Relationships
Abstract
Task graph representations of complex software systems often require the specification of multiple constraints upon the execution of a single task. In this paper, existing formalisms associated with this type of specification and their underlying execution models are discussed. These formalisms are analyzed at different structural levels of the software system to identify their limitations in describing systems having client-server relationships. In these types of software systems, the combination of the precedence constraints must support the dynamic job structure of the system. Existing formalisms do not offer this support while accurately representing the desired behavior of the software. Key Words: precedence constraints; software scheduling; …
under the guidance of
Abstract
I hereby express my sincere thanks and gratitude towards my guide, Prof. R. K. Joshi, for his constant help, encouragement and inspiration. Meetings with him have been a constant source of ideas and gave me motivation for this project.