Results 1 - 10 of 20,252

Table 1: Assumed execution times of each block in SIMD and SPMD modes for the flow analysis tree of Fig. 2.

in Determining the Execution Time Distribution for a Data Parallel Program in a Heterogeneous Computing Environment
by Yan Alexander Li, John K. Antonio, Howard Jay Siegel, Min Tan, Daniel W. Watson 1997
"... In PAGE 28: ... Only blk f is assumed to contain inter-PE communications, thus it executes faster in SIMD mode than in SPMD mode. Execution times of each block are listed in Table1 . Simplifying assumptions are made for the \overhead quot; operations listed; the framework developed here can readily support distinct values for each overhead operation in each mode.... ..."

Table 2: Actual and approximate expected values for execution time of the whole program in SIMD, SPMD, and according to the mixed-mode scheme of Fig. 6.

in Determining the Execution Time Distribution for a Data Parallel Program in a Heterogeneous Computing Environment
by Yan Alexander Li, John K. Antonio, Howard Jay Siegel, Min Tan, Daniel W. Watson 1997
"... In PAGE 29: ... 6: Mixed-mode execution for the numerical example. rst row of Table2 . These values were computed after evaluating the density function for each case considered.... In PAGE 31: ...Table2 , note that the actual expected values (computed using the proposed ap- proach) are signi cantly di erent from the corresponding approximate expected values (com- puted using the average value approach). Based on the average value approach, the mixed- mode case has an expected execution time that is signi cantly better than both the SIMD and SPMD executions of the entire program.... ..."

Table 1 gives a rough idea of the size of these codes. Comments are not counted in the number of lines.

in Parallelization of Finite Element Codes With Automatic Placement of Communications
by Laurent Hascoët
"... In PAGE 7: ... Table1 : Sizes of application examples 3 The SPMD parallel execution model As we saw in section 1, the SPMD parallel execution model requires two major operations: Mesh Partitioning and SPMD Program Generation. As shown on gure 1, these operations are independent.... ..."

Table 3: Optimization Results: Dynamic Counts

in Compiler Optimizations for Eliminating Barrier Synchronization
by Chau-wen Tseng 1995
"... In PAGE 10: ...2 Dynamic Measures To better determine the performance impact we look at dynamic measures. Table3 presents the dynamic count of parallel SPMD regions, barriers, and counters during program execution. The first two columns present information on SPMD regions.... ..."
Cited by 77
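
To give the flavor of the transformation behind these counts (a toy sketch, not the paper's compiler output; the thread layout and workload are invented): where only one thread consumes another's result, a full all-thread barrier can be replaced by point-to-point synchronization, with an Event below standing in for the paper's counters.

import threading

N_THREADS = 4
data = [0] * N_THREADS
ready = threading.Event()

def worker(tid):
    data[tid] = tid * tid          # "produce" phase of the SPMD region
    if tid == 0:
        ready.set()                # producer signals; no global barrier
    elif tid == 1:
        ready.wait()               # only the consumer waits
        print("thread 1 sees data[0] =", data[0])
    # threads 2 and 3 continue with no synchronization at all

threads = [threading.Thread(target=worker, args=(t,)) for t in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()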

Table 2. Overhead Compensation Results for NAS Benchmarks on Linux Cluster - Parallel

in Overhead compensation in performance profiling
by Allen D. Malony, Sameer S. Shende 2004
"... In PAGE 8: ... 4.3 Parallel Experiments Table2 reports the results for parallel execution of the six NAS benchmarks on the Linux Xeon cluster. All of the applications execute as SPMD programs using MPI message passing for parallelization across 16 processors.... ..."
Cited by 2
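
The serial core of overhead compensation can be sketched in a few lines (hypothetical routine names and numbers; the paper's parallel case additionally models overhead that propagates through message waits):

# Estimate the per-event cost of the instrumentation itself, then
# subtract (events * unit_overhead) from each measured inclusive time.
unit_overhead_us = 0.8  # assumed measured cost of one probe, microseconds

profile = {             # routine -> (measured inclusive time us, probe events)
    "solve":    (9_200.0, 2_000),
    "exchange": (3_100.0, 1_500),
}

for routine, (measured, events) in profile.items():
    compensated = measured - events * unit_overhead_us
    print(f"{routine:8s} measured={measured:8.1f}us "
          f"compensated={compensated:8.1f}us")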

Table 1: Implementation complexity of programming models using HAMSTER (columns: Programming model, No. of lines, No. of API calls, Lines/call, Platform)

in SMiLE: an integrated, multi-paradigm software infrastructure for SCI-based clusters
by Martin Schulz, Jie Tao, Carsten Trinitis, Wolfgang Karl

Table 2. What DAME provides with respect to ideal SPMD machine features.

in DAME: An Environment for Preserving Efficiency of Data Parallel Computations on Distributed Systems
by M. Colajanni, M. Cermele
"... In PAGE 5: ... In order to preserve the ideal SPMD style without loosing efficiency, DAME provides the programmer with five supports for virtual topology (VTS), data distribution (DDS), data management (DMS), interprocess communication (ICS) and workload reconfiguration (WRS). As Table2 shows, each support contributes to achieve some characteristics of the virtual SPMD machine. 3.... ..."

Table 1: Overhead per checkpoint (in seconds) for the SPMD applications.

in System-Level versus User-Defined Checkpointing
by Luis Moura Silva, João Gabriel Silva
"... In PAGE 4: ... In this study, we have implemented system-level transparent and user-defined checkpointing in the same system and we used the same parallel machine and the same benchmarks to allow a direct comparison. Table1 compares the overhead per checkpointing of the two main schemes: STC and UDC. Both schemes use a non- blocking technique to perform the checkpoints concurrently with the application.... ..."

Table 2 Results for steady-state performance without any reconfiguration.

in .I..
by J. E. Moreira, V. K. Naik
"... In PAGE 15: ... The notation P, -+ P, denotes reconfiguration from P, (source) to P, (target) processors. Table2 compares the steady-state time per iteration for the DRMS and SPMD versions of the application when executing on the same number of processors. The column drms lists the time for the DRMS version (t,,,,), and the spmd column lists the time for the SPMD version (tspM,).... In PAGE 15: ... first iteration after a reconfiguration, on the new set of processors. Because of page and cache misses, we expect this first iteration to execute more slowly than when the application is in steady state on a fixed number of processors (compare to Table2 ). This penalty to reachieve steady-state performance is also a form of reconfiguration cost, paid by the application.... In PAGE 18: ... CHOLESKY only has one distributed data structure; however, its efficiency is less than that of JACOBI because its dataset is smaller and because the computation of slices of data to be transferred is more elaborate in the presence of arbitrary distribution. Analyzing columns drms and spmd of Table2 , we observe that in one case the steady-state performance of the DRMS version was better than that of the corresponding SPMD version. The DRMS version of APPLU is 1% faster than the corresponding SPMD version.... ..."

Table 2: Results for steady-state performance.

in Dynamic Resource Management on Distributed Systems Using Reconfigurable Applications
by José E. Moreira, Vijay K. Naik 1997
"... In PAGE 21: ... The notation P1 ! P2 denotes reconfiguration from P1 (source) to P2 (target) processors. Table2 compares the steady-state time per iteration for the DRMS and SPMD versions of the application when executing on the same number of processors. The column drms lists the time for the DRMS version (tDRMS), and the spmd column lists the time for the SPMD version (tSPMD).... In PAGE 22: ... The column labeled First lists the execution time of the first iteration after a reconfiguration, on the new set of processors. Because of page and cache misses, we expect this first iteration to execute more slowly than when the application is in steady state on a fixed number of processors (compare to Table2 ). This penalty to reachieve steady-state performance is also a form of reconfiguration cost, paid by the application.... In PAGE 25: ... CHOLESKY only has one distributed data structure; however, its efficiency is less than that of JACOBI because its data set is smaller and because the computation of slices of data to be transferred is more elaborate in the presence of arbitrary distribution. Analyzing columns drms and spmd of Table2 , we observe that in one case the steady-stateperformance of the DRMS version was better than that of the corresponding SPMD version. The DRMS version of APPLU is 1% faster than the corresponding SPMD version.... ..."
Cited by 15
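
For the simple block-distributed case, the "computation of slices of data to be transferred" during a P1 → P2 reconfiguration reduces to intersecting old and new block ranges (a sketch with arbitrary example sizes; the arbitrary distributions the excerpt mentions are harder):

def blocks(n, p):
    """Half-open index ranges of a block distribution of n items over p procs."""
    base, rem = divmod(n, p)
    out, start = [], 0
    for i in range(p):
        size = base + (1 if i < rem else 0)
        out.append((start, start + size))
        start += size
    return out

def transfer_slices(n, p_src, p_dst):
    """Ranges that must move when redistributing from p_src to p_dst procs."""
    src, dst = blocks(n, p_src), blocks(n, p_dst)
    moves = []
    for s, (s0, s1) in enumerate(src):
        for d, (d0, d1) in enumerate(dst):
            lo, hi = max(s0, d0), min(s1, d1)
            if lo < hi and s != d:  # overlapping range that changes owner
                moves.append((s, d, (lo, hi)))
    return moves

# reconfigure a 16-element array from 4 to 3 processors
for s, d, (lo, hi) in transfer_slices(16, 4, 3):
    print(f"proc {s} -> proc {d}: indices [{lo}, {hi})")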