Results 1 - 10 of 251,040
Table 2. Performance for multi-record pages
"... In PAGE 6: ... 5.2 Multi-record pages Table2 shows the parameter setting of the final experi- ment for multi-record pages. The training pages needed for multi-record page extraction are comparablyless than those for singular pages since each training page contains several records where variations can occur.... ..."
Table 2: Slice statistics
"... In PAGE 25: ...Speculative Precomputation: Exploring the Use of Multithreading for Latency 25 register is overwritten in one thread before a child thread has read it. Fortunately, as shown in Table2 , the number of live-in values that must be copied is very small. Table 2: Slice statistics ... In PAGE 53: ... As mentioned earlier, although dynamic scheduling has better load balance, co-located parts of the pictures may not be decoded by the same processor when using dynamic scheduling. This scheduling scheme incurs more bus transactions, as shown in Table2 , with the result that the overall speed using dynamic scheduling is slower. Compared to dual-processor systems, processors with Hyper-Threading Technology have the advantage of sharing the second-level cache between two logical processors.... In PAGE 53: ...6 1.8 Speed-up Dual- processor Hyper- Threading Technology 1 thread 2 threads 3 threads (c) Figure 7: Performance of; (a) our video encoder; (b) our video decoder; and (c) our watermarking detection with software configurations Table2 : The numbers of front-side bus (FSB) data activities per second between static scheduling and dynamic scheduling on a dual-processor system Event Static scheduling Dynamic scheduling FSB_data_activity 8,604,511 12,486,051 Table 3: The numbers of FSB data activities per second between static scheduling and dynamic scheduling on a processor with Hyper-Threading Technology Event Static scheduling Dynamic scheduling FSB_data_activity 8,474,022 ... In PAGE 61: ...Hyper-Threading Technology: Impact on Compute-Intensive Workloads 61 Table2 : Counter data and performance results Exactly which resources lie idle, however, is not clear. The fraction of floating-point instructions (FP%) gives one indication of the per-stream instruction mix.... ..."
Table 2: Alignment evaluation. Evaluation performed on the token-level alignment of the parallel text.
"... In PAGE 6: ... In step 9 the final automatic alignment was performed using ITrix and then (step 10) the test set was evaluated against the gold standard. The evaluation of the final ITrix run was then compared to a baseline run (where only sta- tistical data were used) and to one where no training data had been utilised (see Table2 Alignment evaluation). However, to arrive at a usable term collection, the output from the word alignment needed to be verified and this was performed in step 11.... In PAGE 8: ... The test set was then used as a gold standard in the evaluation of the output of the automatic alignment. Table2 presents evaluation figures on recall, precision, and F-measure on three different configurations of the alignment. The first alignment was a baseline version where only statistical data were used as input resources.... In PAGE 9: ... The quality of the translation correspondences in MeSH was generally so high that we did not need to use it for training the ITools suite; the automatic alignment performed well without training. The third and final run used training data from the interactive sessions and these results are also shown in Table2 . As can be seen the train- ing sessions did substantially increase the performance.... ..."
Table 4: Performance with loops sliced, best execution times.
"... In PAGE 5: ... The more the parallelism is throttled, the less space it uses and the more time it takes to complete the execution in general. This is evidenced in Table4 where the slice sizes are chosen that favor execution time over the space, and in Table 5 where the space is favored over the time. In the tables, slice sizes are speci#0Ced in terms of the number of worker processes that are used to execute the parallel loops.... In PAGE 5: ... For instance, slice size 10,20 indicates that the innermost parallel loop would be split up between 10 worker processes and the outer, second level parallel loop is split up between 20 worker processes. Table4 shows that the processor utilizations is approxi- mately equal to that of the unconstrained case. Good pro- cessor utilization implies that the parallelism has not been throttled to the extent where the processor is sitting idle when it should not.... In PAGE 6: ... Since the best speedups and the best space savings have been extracted, the next logical step is to combine the two and try for good execution times with low space utilization. Table 6 shows the combination of the chunking size from Table 3 and the slice parameters from Table4 that favors the execution time over the space saving. For the problem sizes used in this paper, the space saving seems to give a good trade o#0B with respect to the time taken to solve the problem.... ..."
Table 2. Performance of the designs based on Corollary 1 (from low-level simulation): TI DSP (600 MHz), DK1-based design (50 MHz), our design (120 MHz)
2003
"... In PAGE 9: ... We also compared estimates from Section 3 against actual values based on implemented designs to test the accuracy of the performance estimation. We observed that the energy estimates (See Table2 ) were within 10% of the simulation results. The the average power dissipation of the designs on Virtex-II included the quiescent power of 150 mW (from XPower).... In PAGE 9: ... For the DSP, we chose the block size b,0 lt;b lt;min(n, 16) so as to minimize the energy dissipation. As seen from the results in Table2 , our FPGA implementations perform LU decomposition faster using less energy. While we used the high performance DSP processor, TI also provides low power devices, namely the TMS320VC55xx series.... ..."
Cited by 5
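For reference, a minimal unblocked LU decomposition (Doolittle form, no pivoting) showing the computation the excerpt's FPGA and DSP designs perform; the block size b from the excerpt would tile these loops, which this sketch does not attempt, and the 4x4 input matrix is invented:

/* In-place LU decomposition without pivoting: on return the strict lower
 * triangle holds L (unit diagonal implied) and the upper triangle holds U.
 * A blocked variant would process the k loop in tiles of size b. */
#include <stdio.h>

#define N 4

static void lu_decompose(double a[N][N]) {
    for (int k = 0; k < N; k++) {
        for (int i = k + 1; i < N; i++) {
            a[i][k] /= a[k][k];                 /* multiplier l_ik        */
            for (int j = k + 1; j < N; j++)
                a[i][j] -= a[i][k] * a[k][j];   /* trailing-matrix update */
        }
    }
}

int main(void) {
    double a[N][N] = {
        { 4, 3, 2, 1 },
        { 3, 4, 3, 2 },
        { 2, 3, 4, 3 },
        { 1, 2, 3, 4 },
    };
    lu_decompose(a);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) printf("%8.3f ", a[i][j]);
        printf("\n");
    }
    return 0;
}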
Table 5: Performance with loops sliced, best occupancy.
"... In PAGE 5: ... The more the parallelism is throttled, the less space it uses and the more time it takes to complete the execution in general. This is evidenced in Table 4 where the slice sizes are chosen that favor execution time over the space, and in Table5 where the space is favored over the time. In the tables, slice sizes are speci#0Ced in terms of the number of worker processes that are used to execute the parallel loops.... In PAGE 6: ...3#25 of the space. Table5 shows the results for those cases where proces- sor utilization remains fairly high but the space utilized is the lowest. In this case, the speedups drop even more with the range of -9.... In PAGE 6: ...6#25. Table 7 shows the combination of the chunking size from Table 3 and the slice parameters from Table5 , that favors the space saving over the execution time. This table shows greater savings in the matching store space utilization.... ..."
Table 1. The benchmark applications used for the performance evaluation of parallel Mermaid.
1998
"... In PAGE 5: ... The benchmark applications used for the performance evaluation of parallel Mermaid. marks can be found in Table1 . Of these three benchmarks, none requires execution-driven simulation (i.... ..."
Cited by 2
Table 4: Effect of received-packet batching on performance
1987
"... In PAGE 19: ...3BSD Kernel TCP Packet FilterPup/BSP Kbytes/Sec Figure 15: Relative performance of VMTP for bulk data transfer The packet-filter based implementation measured in table 3 uses received-packet batching. Table4 (and figure 16) shows that batching improves throughput by about 75% over identical code that reads just one packet per system call; the difference cannot be entirely due to decreased system call overhead, but may reflect reductions in context switching and dropped packets.... In PAGE 34: ...List of Tables Table 1: Cost of sending packets 16 Table 2: Relative performance of VMTP for small messages 16 Table 3: Relative performance of VMTP for bulk data transfer 17 Table4 : Effect of received-packet batching on performance 17 Table 5: Effect of user-level demultiplexing on performance 18 Table 6: Relative performance of stream protocol implementations 19 Table 7: Relative performance of Telnet 19 Table 8: Per-packet cost of user-level demultiplexing 20 Table 9: Per-packet cost of user-level demultiplexing with received-packet 21 batching Table 10: Cost of interpreting packet filters 21... ..."
Cited by 214
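The excerpt credits much of the throughput gain to reading several packets per system call. A modern Linux analogue of received-packet batching (not the 1987 4.3BSD packet-filter code) uses recvmmsg() to drain up to a fixed batch of UDP datagrams in one call; the port number, batch size, and buffer size below are arbitrary choices:

/* Sketch of received-packet batching with Linux recvmmsg(): one system
 * call can return up to BATCH datagrams, cutting per-packet system-call
 * overhead in the same spirit as the excerpt. */
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define BATCH 8
#define BUFSZ 1500

int main(void) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);           /* arbitrary example port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        perror("bind");
        return 1;
    }

    char bufs[BATCH][BUFSZ];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];
    memset(msgs, 0, sizeof msgs);
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* Block until at least one datagram arrives, then return as many as
     * are already queued, up to BATCH, in a single system call. */
    int n = recvmmsg(fd, msgs, BATCH, MSG_WAITFORONE, NULL);
    if (n < 0) {
        perror("recvmmsg");
        return 1;
    }
    for (int i = 0; i < n; i++)
        printf("packet %d: %u bytes\n", i, msgs[i].msg_len);

    close(fd);
    return 0;
}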