Improving effective bandwidth through compiler enhancement of global cache reuse (2001)
| Venue: | Proceedings of International Parallel and Distributed Processing Symposium |
| Citations: | 62 - 17 self |
BibTeX
@INPROCEEDINGS{Ding01improvingeffective,
author = {Chen Ding},
title = {Improving effective bandwidth through compiler enhancement of global cache reuse},
booktitle = {Proceedings of International Parallel and Distributed Processing Symposium},
year = {2001}
}
Abstract
While CPU speed has improved by a factor of 6400 over the past twenty years, memory bandwidth has increased by a factor of only 139 during the same period. Consequently, on modern machines the limited data supply simply cannot keep a CPU busy, and applications often utilize only a few percent of peak CPU performance. The hardware solution, which provides layers of high-bandwidth data caches, is not effective for large and complex applications, primarily for two reasons: far-separated data reuse and large-stride data access. The first causes unnecessary repeated transfers, and the second transfers useless data. Both waste memory bandwidth. This dissertation pursues a software remedy. It investigates the potential for compiler optimizations to alter program behavior and reduce its memory bandwidth consumption. To this end, this research has studied a two-step transformation strategy: first fuse computations on the same data and then group data used by the same computation. Existing techniques such as loop blocking can be viewed as an application of this strategy within a single loop nest. In order to carry out this strategy
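
The two-step strategy named in the abstract (fuse computations on the same data, then group data used by the same computation) can be illustrated with a small hand-written sketch. This is not the compiler algorithm from the work itself. The function names, the arrays `a` and `b`, and the interleaved array `ab` are hypothetical, chosen only for illustration. Python lists do not model cache lines, so the example shows the change in access pattern and checks that results are preserved; it does not demonstrate a speedup.

```python
def separate_loops(a, b):
    """Baseline: two passes over `a`, so each a[i] is reused far apart in time."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = a[i] * 2.0
    return c, d


def fused_loop(a, b):
    """Step 1 (fusion): both computations that use a[i] share a single pass."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        x = a[i]          # loaded once, used by both computations
        c[i] = x + b[i]
        d[i] = x * 2.0
    return c, d


def fused_and_grouped(a, b):
    """Step 2 (grouping): a[i] and b[i] are used together, so store them adjacently."""
    n = len(a)
    ab = [0.0] * (2 * n)  # hypothetical interleaved layout: a0, b0, a1, b1, ...
    ab[0::2] = a
    ab[1::2] = b
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        x = ab[2 * i]
        c[i] = x + ab[2 * i + 1]
        d[i] = x * 2.0
    return c, d
```

All three functions compute the same `c` and `d`. In a compiled language with contiguous arrays, the fused version would touch each element of `a` once instead of twice, and the grouped version would place the operands of each iteration next to each other in memory.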







