CiteSeerX

Results 1 - 10 of 38,023

Table 3.1: Stencil to compute the gradient vector

in Experiments In Minimizing Numerical Diffusion Across A Material Boundary
by Christian Aalburg, Lu Shiyong, Zhang Ge, Neetu Gupta 1996

Table 2: Grid spacing for each stencil compute the derivative. To compute f_x(u(x_j; t)), use the formula f_x(u(x_j; t)) = 1

in A Hybrid Adaptive ENO Scheme
by Robert Bruce Bauer, Dr. Bauer

Table 4. The execution times in rounded milliseconds for 10 iterations of a simple 5-point stencil computation for various sized square mesh sizes. Four different data distributions and communication structures are compared using FM/Myrinet.

in Implementing Data-Parallel Programs on Commodity Clusters
by P. J. Hatcher, R. D. Russell, S. Kumaran, M. J. Quinn
"... In PAGE 15: ... Figure 2 shows the results from the next set of experiments, using Illinois Fast Messages for communication over Myrinet. (The actual data values are given in Table 4.) In these experiments, since Fast Messages are being used, we never create PVM processes, but rather use multithreading to exploit the multiprocessor nodes.... ..."

Table 1 summarizes the performance measurements comparing iterative and cache oblivious implementations of different variants of the Lax-Wendroff stencil computation. The speedup is the ratio of the runtime of the iterative version and the runtime of the corresponding cache oblivious version. A more detailed performance analysis appears in Section 6 below.

in Software Engineering Aspects of Cache Oblivious Stencil Computations
by Volker Strumpen, Matteo Frigo 2006
"... In PAGE 3: ... Table 1: Runtimes and speedups of cache oblivious Lax-Wendroff codes versus the iterative programs for N = 10,000,000 space points and T = 100 time steps. The results in Table 1 were generated on an Apple Power Mac G5 with 2 GHz PowerPC 970 processors and an IBM Power5 system with 1.65 GHz processors.... In PAGE 3: ... For both classes, we implemented programs with two different storage schemes, toggle arrays and boundary passing, as discussed in Section 2. Table 1 shows that for each of the two processor architectures, the speedups are roughly the same for the four different programs, and amount to about 2.... In PAGE 27: ... Table 2 shows the runtimes and speedups of our optimized cache oblivious program compared to the naive, iterative Lax-Wendroff codes on a 970 processor. These numbers should be compared with those in Table 1 of Section 1. We find that the runtimes of the cache oblivious version are essentially unchanged while those of the naive, iterative version increase by a factor of 2.... ..."

Table 1: Timings for mesh pyramid computation assuming storage rather than recomputation of all areas and lengths needed in stencil weight computations. The size field counts the total vertices (N). Face counts are generally twice as large. All times are given in seconds on an SGI R10k O2 @ 175 MHz.

in Multiresolution Signal Processing for Meshes
by Igor Guskov, Wim Sweldens, Peter Schröder

Table 2: Average frame rate using different resolutions and shadow volume methods: For high resolutions the computation of the shadow mask in the alpha or screen buffer was slightly faster than using the stencil buffer. Computing the shadow mask at half the screen resolution improved the performance for higher window resolutions.

in Shadow Volumes Revisited
by Stefan Roettger, Alexander Irion, Thomas Ertl 2002
"... In PAGE 6: ... In particular, the minimum frame rate is increased by almost a factor of four. 5 Results Table 2 shows the average frame rate achieved by our algorithms during an animation of the scene shown in Figure 6. The test was performed on an AMD 800MHz PC with a NVIDIA GeForce 2 MX graphics card.... ..."
Cited by 1

Table 4: Performance comparison before and after applying stencil factoring.

in Experiences Tuning SMG98 - a Semicoarsening Multigrid Benchmark based on the hypre Library
by Guohua Jin, John Mellor-Crummey
"... In PAGE 8: ...4 and 2.13 on MIPS R12000 and EV67 as shown in Table 4, where orig represents the original code and opt represents the optimized code. The overall performance is lower because this kernel for handling stride-adjacent triplets only applies to 72% of FLOPS in the residual stencil computation.... ..."

Table 4: Performance for the Heat equation and WaveToy stencils. X1E and Itanium2 experiments use 256^3 grids. The Opteron uses a 128^3. Cell uses the largest grid that would fit within the local stores. The (n steps) versions denote a temporally blocked version where n time steps are computed.

in Scientific computing kernels on the cell processor
by Samuel Williams, John Shalf, Leonid Oliker, Shoaib Kamil, Parry Husbands, Katherine Yelick 2007
"... In PAGE 10: ... 8.5 Stencil Kernel Results The performance for the heattut and WaveToy stencil kernels is shown in Table 4. Results show that as the number of time steps increases, a corresponding decrease in the grid size is required due to the limited memory footprint of the local store.... In PAGE 10: ... 8.6 Performance Comparison Table 4 presents a performance comparison of the stencil computations across our evaluated set of leading processors. Note that stencil performance has been optimized for the cache-based platforms as described in [17].... ..."
Cited by 2

Table 9: Stencils for fifth order approximations

in A Hybrid Adaptive ENO Scheme
by Robert Bruce Bauer, Dr. Bauer
