Results 1 - 10 of 2,625

Table 3: Performance of the lock-based pipeline on ADI and Hydro for three different sizes. Note the sequential times are the same as for the one-way pipeline results.

in Efficient Support for Pipelining in Distributed Shared Memory Systems
by Karthik Balasubramanian, David K. Lowenthal 1999
Cited by 5

Table 3: A lock serialization effect in Volrend.

in Monitoring Shared Virtual Memory Performance on a Myrinet-based PC Cluster
by Cheng Liao, Dongming Jiang, Liviu Iftode, Margaret Martonosi, Douglas W. Clark 1998
"... In PAGE 6: ... Even though the lock-based synchronization is more compli- cated in Cholesky and Volrend, statistics generated by the perfor- mance monitor can still demonstrate that lock serialization is the main reason for high lock overheads. Table3 gives one piece of the statistics data that exhibits the serialization on a lock in Vol- rend. Let us first focus on the first three rows in Table 3.... In PAGE 6: ... Table 3 gives one piece of the statistics data that exhibits the serialization on a lock in Vol- rend. Let us first focus on the first three rows in Table3 . When Node 1 is interrupted by Node 4 for a lock acquire at time x + 9:0, Node 1 has to wait for another 99 s (108:0 ? 9:0) to catch the lock.... ..."
Cited by 7

Table 4: The difference in single-keyword read query response time between the lock-based and timestamp-based implementations of the Batch approach when an insert of a document of different size proceeds simultaneously.

in Efficient Real-Time Index Updates in Text Retrieval Systems
by Tzi-cker Chiueh, Lan Huang 1998
Cited by 16

Table 2: Basic Application Benchmark Characteristics. Instructions exclude the operating system idle loop. Update silent stores are indicated. Temporally silent stores are those captured with MESTI. IPC is across all processors. All lock-based data structures in the SPLASH-2 applications are padded to minimize coherence conflicts.

in Invalidate Protocol
by Kevin M. Lepak, Mikko H. Lipasti
"... In PAGE 5: ...D cache for stale storage capacities of 32KB and 128KB (for benchmarks described in Table2 and a machine configuration similar to Table 1, details in [20])1. We see that both stale storage capacities are effective at capturing useful temporally silent pairs across all benchmarks.... In PAGE 9: ...Application Benchmark Results We now compare the techniques with application benchmarks. Basic workload properties are shown in Table2 . We simulate three SPLASH-2 codes [34] and four commercial workloads with execution-driven simulation and measure performance using accepted statistical methods required for non-deterministic work- loads [1].... ..."

Table 2. Semantic locks for Map describe the read locks that are taken when executing operations, as well as the lock-based conflict detection that is done by writes at commit time. For example, the containsKey, get, put, and remove operations take a lock for the key that was passed as an argument to these methods. When a transaction containing put or remove operations commits, it aborts other transactions that hold locks on the keys it is adding to or removing from the Map, as well as transactions that have read the size of the Map if it is growing or shrinking.

in Transactional collection classes
by Brian D. Carlstrom, Austen Mcdonald, Michael Carbin, Christos Kozyrakis, Kunle Olukotun
"... In PAGE 6: ...similar elements of abstract state. Table2 shows the conditions under which locks are taken during different operations. Read operations lock abstract state throughout the transaction.... ..."

Table 2: Results in Network of Workstations (in seconds). Columns: Appl., Size, Seq., One Proc., Two Proc., Four Proc., Eight Proc.

in JIAJIA: An SVM System Based on a New Cache Coherence Protocol
by Weiwu Hu, Weisong Shi, Zhimin Tang, Zhiyu Zhou, M. Rasit Eskicioglu 1999
"... In PAGE 17: ... In the testing, all libraries and applications are compiled by gcc with the -O2 optimization option3. Table2 and Table 3 present the results in network of SPARCstations and SP2 respectively. Execution time of sequential programs4 and parallel programs with one, two, four, and eight processors are given.... In PAGE 21: ... In this way, JIAJIA imposes little additional overhead on the single processor performance, because all coherence related actions are taken at the time of synchronization in JIAJIA apos;s lock-based cache coherence protocol. Test result of Table2 and Table 3 strongly validate our intention. With the exception of SOR, all results show that single processor performance of JIAJIA is similar to the sequential performance, and better than the single processor performance of CVM.... In PAGE 21: ...imilar to that of parallel program in one processor (27.38 seconds for 1024 1024 and 117.73 seconds for 2048 2048). The results in Table2 and Table 3 also show that CVM has considerable single processor system overhead for some applications. 5.... In PAGE 28: ... The disappointing performance of CVM comes from its system overhead. It can be seen from Table2 and Table 3 that, the single processor performance of CVM is much worse than that of sequential program which, as has been indicated, is obtained from parallel program of CVM (the parallel program of JIAJIA is also ported from that of CVM). To nd the reason for CVM apos;s bad single processor performance of TSP, we rst reduce CVM apos;s reserved shared memory from 4K pages to 256 pages and obtain a little improvement.... ..."
Cited by 36

Table 3: Results in IBM SP2 (in seconds). Columns: Appl., Size, Seq., One Proc., Two Proc., Four Proc., Eight Proc.

in JIAJIA: An SVM System Based on a New Cache Coherence Protocol
by Weiwu Hu, Weisong Shi, Zhimin Tang, Zhiyu Zhou, M. Rasit Eskicioglu 1999
"... In PAGE 17: ... In the testing, all libraries and applications are compiled by gcc with the -O2 optimization option3. Table 2 and Table3 present the results in network of SPARCstations and SP2 respectively. Execution time of sequential programs4 and parallel programs with one, two, four, and eight processors are given.... In PAGE 21: ... In this way, JIAJIA imposes little additional overhead on the single processor performance, because all coherence related actions are taken at the time of synchronization in JIAJIA apos;s lock-based cache coherence protocol. Test result of Table 2 and Table3 strongly validate our intention. With the exception of SOR, all results show that single processor performance of JIAJIA is similar to the sequential performance, and better than the single processor performance of CVM.... In PAGE 21: ...imilar to that of parallel program in one processor (27.38 seconds for 1024 1024 and 117.73 seconds for 2048 2048). The results in Table 2 and Table3 also show that CVM has considerable single processor system overhead for some applications. 5.... In PAGE 28: ... The disappointing performance of CVM comes from its system overhead. It can be seen from Table 2 and Table3 that, the single processor performance of CVM is much worse than that of sequential program which, as has been indicated, is obtained from parallel program of CVM (the parallel program of JIAJIA is also ported from that of CVM). To nd the reason for CVM apos;s bad single processor performance of TSP, we rst reduce CVM apos;s reserved shared memory from 4K pages to 256 pages and obtain a little improvement.... ..."
Cited by 36

Table 2: Message Costs of Shared Memory Operations. Columns: Protocols, Access Miss, Lock, Unlock, Barrier.

in Where Does the Time Go in Software DSM Systems: Experiences with JIAJIA?
by Weisong Shi, Weiwu Hu, Zhimin Tang
"... In PAGE 7: ... The lock-based protocol has least coherence related overhead for ordinary read or write operations. Table2 shows the number of messages sent on an ordinary access miss, a lock, an unlock, or a barrier in the lazy release protocol and the lock-based protocol. The zero message count in access miss of the lock-based protocol represents the write miss on an RO page.... In PAGE 7: ... The zero message count in access miss of the lock-based protocol represents the write miss on an RO page. It can be seen from Table2 that, compared to the lazy release protocol, our protocol has less message cost on both ordinary accesses or lock, but requires to write di s back to home of associated pages on a release or a barrier. Besides, the lock-based protocol is free from the overhead of maintaining the directory.... ..."

Table 6: The number of synchronization operations of different scheduling algorithms in a metacomputing environment. Columns: Apps., Static, SS, BSS, GSS, FS, TSS, SSS, AFS, AAFS, ABS.

in unknown title
by unknown authors
"... In PAGE 12: ... Fig- ure 4 illustrates the execution time of different schemes under metacomputing environ- ment. The loop allocation overhead is listed as the number of synchronization operations in Table6 . The number of remote getpages, which reflects the effect of remote data com- munication, is shown in Table 7.... In PAGE 13: ... BGBABFBABEBA BTD2CPD0DDD7CXD7 Figure 4 illustrates the execution time of different schemes under metacomputing en- vironment. Similar to the analysis in dedicated environment, the loop allocation overhead is listed as the number of synchronization operations in Table6 . The number of remote getpages, which reflects the effect of remote data communication, is shown in Table 7.... In PAGE 13: ... The CBDDD2 overhead listed in Table 8 and Table 9 shows the effects of load imbalance. Similar to the dedicated environment, though SS promises the perfect load balance among all the nodes, SS remains the worst scheme for all applications except MM because of the large loop allocation overhead, as shown in Table6 . Furthermore, Table 7 shows the inherent drawback of traditional central queue based scheduling algorithm , i.... In PAGE 14: ... The performance of these 5 scheduling schemes is acceptable. However, as discussed in the last subsection, due to the large extra overhead resulting from loop alloca- tion and corresponding potential of violating processor affinity associated with BSS, GSS, FS, TSS, and SSS scheduling schemes, as listed in Table6 and Table 7, the performance of these 5 schemes remains unacceptable with respect to Static scheduling scheme in meta- computing environment for three fine grain kernels, which is illustrated in Figure 4. Among these five scheduling schemes, SSS is the worst due to the large chunk size allocated in the first phase, which results in large amount of waiting time at synchronization points.... In PAGE 14: ... For all five applications but TC, the performance of AFS is improved significantly com- pared with the former 6 dynamic scheduling schemes, as shown in Figure 4. Though the number of synchronization operations of AFS increases about one order of magnitude, as shown in Table6 , the number of remote getpage reduces about one order of magnitude, as listed in Table 7, which amortizes the effects of loop allocation overhead. Surprisingly, the number of remote getpage operations of AFS in TC leads to the contrary conclusion, i.... In PAGE 14: ...ynchronization time increases from 212.96 seconds in AFS to 541.22 seconds in AAFS. Other results are presented in Figure 4, Table6 and Table 7, and Table 9. Our results give the contrary conclusion to that presented in [23] where AAFS was better than AFS for all iSince the algorithm of FS and SSS is very similar except the value of chunk size in allocation phase, we compare it with FS here.... In PAGE 15: ... The ABS scheduling scheme achieves the best performance among all of the dynamic scheduling schemes, as shown in Figure 4, and is superior to Static for all 5 kernels except TC. From Table6 and Table 7, we can find that the great reduction of the number of getpages, an obvious result of the exploitation of processor affinity, amortizes the negative effects of the increasing of synchronization overhead. Table 8 and Table 9 show this case.... In PAGE 15: ... There is only 8% performance gap between them. Compared with AFS, the synchronization overhead resulting from loop allocation in ABS reduces about one order of magnitude or more, as shown in Table6 . 
Furthermore, the number of getpages reduces significantly because of the reduction of synchronization operations and the lock- based cache coherence protocol adopted in JIAJIA.... ..."
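
As a concrete point of reference for the schemes compared above, here is a minimal sketch of guided self-scheduling (GSS), where each grab takes ceil(remaining/P) iterations so chunks shrink toward the end of the loop; the affinity-based variants (AFS, AAFS, ABS) from the excerpt are not reproduced here.

    // Minimal GSS sketch: every grab is one synchronization operation of the kind
    // Table 6 counts; chunk sizes decay from remaining/P down to single iterations.
    import java.util.concurrent.atomic.AtomicInteger;

    class GssSketch {
        private final AtomicInteger next = new AtomicInteger(0);
        private final int total, workers;

        GssSketch(int totalIterations, int workers) {
            this.total = totalIterations;
            this.workers = workers;
        }

        /** Returns [start, end) of the next chunk, or null when the loop is exhausted. */
        int[] grabChunk() {
            while (true) {
                int start = next.get();
                int remaining = total - start;
                if (remaining <= 0) return null;
                int size = (remaining + workers - 1) / workers;   // ceil(remaining / P)
                if (next.compareAndSet(start, start + size)) {
                    return new int[] { start, start + size };
                }                                                  // raced with another worker; retry
            }
        }
    }

With 1000 iterations and 8 workers the first chunk is 125 iterations and the last ones are single iterations, so the number of grabs falls well below pure self-scheduling's one per iteration but stays above static scheduling's one per worker, which is the balance-versus-overhead trade-off the excerpt discusses.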

Table 3: Hierarchical Locking Rules. For example, to read a single record, a transaction would obtain IS locks on the database, relation, and page, followed by an S lock on the specific tuple. If a transaction wanted to read all or most tuples on a page, then it could obtain IS locks on the database and relation, followed by an S lock on the entire page. By following this uniform protocol, potential conflicts between transactions that ultimately obtain S and/or X locks at different granularities can be detected. A useful extension to hierarchical locking is known as lock escalation. Lock escalation allows the DBMS to automatically adjust the granularity at which transactions obtain locks based on their behavior. If the system detects that a transaction is obtaining locks on a large percentage of the granules that make up a larger granule, it can attempt to grant the transaction a lock on the larger granule so that no additional locks will be required for subsequent accesses to other objects in that granule.

in Concurrency Control and Recovery
by Michael Franklin 1997
Cited by 4
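
A compact sketch of the multi-granularity rule this excerpt describes, using the standard IS/IX/S/SIX/X compatibility matrix; lock ownership, upgrades, and escalation are omitted, and the class names are illustrative rather than taken from the text.

    // Sketch of hierarchical (intention) locking: a reader takes IS locks down the
    // hierarchy (database -> relation -> page) and an S lock on the leaf it reads;
    // conflicts are detected with the usual compatibility matrix.
    import java.util.EnumMap;
    import java.util.Map;

    enum LockMode { IS, IX, S, SIX, X }

    class Granule {
        private final Map<LockMode, Integer> held = new EnumMap<>(LockMode.class);

        private static boolean compatible(LockMode a, LockMode b) {
            switch (a) {
                case IS:  return b != LockMode.X;
                case IX:  return b == LockMode.IS || b == LockMode.IX;
                case S:   return b == LockMode.IS || b == LockMode.S;
                case SIX: return b == LockMode.IS;
                default:  return false;                // X conflicts with everything
            }
        }

        /** Grant mode if it is compatible with every lock currently held on this granule
         *  (ownership and upgrades are ignored in this sketch). */
        synchronized boolean tryLock(LockMode mode) {
            for (LockMode other : held.keySet())
                if (held.get(other) > 0 && !compatible(mode, other)) return false;
            held.merge(mode, 1, Integer::sum);
            return true;
        }
    }

To read one tuple, a transaction would call tryLock(IS) on the database, relation, and page granules and then tryLock(S) on the tuple, mirroring the protocol in the excerpt.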