## How GPUs can improve the quality of magnetic resonance imaging (2007)

Venue: | The First Workshop on General Purpose Processing on Graphics Processing Units |

Citations: | 9 - 3 self |

### BibTeX

@INPROCEEDINGS{Stone07howgpus,

author = {Sam S. Stone and Haoran Yi and Justin P. Haldar and Wen-mei W. Hwu and Bradley P. Sutton and Zhi-pei Liang},

title = {How GPUs can improve the quality of magnetic resonance imaging},

booktitle = {The First Workshop on General Purpose Processing on Graphics Processing Units},

year = {2007}

}

### Abstract

In magnetic resonance imaging (MRI), non-Cartesian scan trajectories are advantageous in a wide variety of emerging applications. Advanced reconstruction algorithms that operate directly on non-Cartesian scan data using optimality criteria such as least-squares (LS) can produce significantly better images than conventional algorithms that apply a fast Fourier transform (FFT) after interpolating the scan data onto a Cartesian grid. However, advanced LS reconstructions require significantly more computation than conventional reconstructions based on the FFT. For example, one LS algorithm requires nearly six hours to reconstruct a single three-dimensional image on a modern CPU. Our work demonstrates that this advanced reconstruction can be performed quickly and efficiently on a modern GPU, with the reconstruction of a 64³ 3D image requiring just three minutes, an acceptable latency for key applications. This paper describes how the reconstruction algorithm leverages the resources of the GeForce 8800 GTX (G80) to achieve over 150 GFLOPS in performance. We find that the combination of tiling the data and storing the data in the G80’s constant memory dramatically reduces the algorithm’s required bandwidth to off-chip memory. The G80’s special functional units provide substantial acceleration for the trigonometric computations in the algorithm’s inner loops. Finally, experiment-driven code transformations increase the reconstruction’s performance by as much as 60% to 80%.
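The two optimizations named in the abstract (tiling the scan data into a small, fast memory, and fast trigonometric units for the inner loop) can be illustrated with a small CPU-side sketch. This is plain Python, not the authors' CUDA kernel. It assumes the inner loop accumulates complex exponentials over the scan samples, one sum per image point; the function names `adjoint_sum_direct` and `adjoint_sum_tiled`, the 1D geometry, and the sample values are hypothetical.

```python
import cmath
import math


def adjoint_sum_direct(k, d, x):
    """Evaluate s(x_n) = sum_m d_m * exp(i*2*pi*k_m*x_n) for every x_n (1D)."""
    return [
        sum(dm * cmath.exp(2j * math.pi * km * xn) for km, dm in zip(k, d))
        for xn in x
    ]


def adjoint_sum_tiled(k, d, x, tile_size):
    """Same sum, accumulated one tile of scan samples at a time.

    Each tile stands in for a chunk small enough to sit in a fast on-chip
    memory (the G80's constant memory in the paper). Every output point
    reuses the staged tile, so the scan data is fetched once per tile
    rather than once per output point.
    """
    out = [0j] * len(x)
    for start in range(0, len(k), tile_size):
        k_tile = k[start:start + tile_size]  # samples staged for this pass
        d_tile = d[start:start + tile_size]
        for n, xn in enumerate(x):  # assumption: one GPU thread per n
            acc = 0j
            for km, dm in zip(k_tile, d_tile):
                angle = 2.0 * math.pi * km * xn
                # The sin/cos pair is the trigonometric work of the inner loop.
                acc += dm * complex(math.cos(angle), math.sin(angle))
            out[n] += acc
    return out


# Hypothetical non-uniform sample positions and data values.
k = [-0.37, -0.11, 0.05, 0.26, 0.48]
d = [1 + 0j, 0.5 - 0.2j, 2j, -1 + 1j, 0.3 + 0j]
x = [0.0, 1.0, 2.0, 3.0]

direct = adjoint_sum_direct(k, d, x)
tiled = adjoint_sum_tiled(k, d, x, tile_size=2)
max_err = max(abs(a - b) for a, b in zip(direct, tiled))
```

Addition is the only operation that crosses tile boundaries, so the tiled and untiled results agree up to floating-point rounding. `tile_size` only controls how much scan data must be resident at once.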

### Citations

466 | Accelerated volume rendering and tomographic reconstruction using texture mapping hardware
- Cabral, Cam, et al.
- 1994
Citation Context ... VI. RELATED WORK Medical imaging was one of the first GPGPU applications, with computed tomography (CT) reconstruction achieving a speedup of two orders of magnitude on the SGI RealityEngine in 1994 =-=[28]-=-. A wide variety of CT reconstruction algorithms have since been accelerated on the GPU [3], [29], [30], [31]. In [31] the GPU is used to accelerate Simultaneous Algebraic Reconstruction Technique (SA... |

181 | The OpenGL Graphics System: A Specification, Version 1.2.1. Available at: http://www.opengl.org
- Segal, Akeley
Citation Context ... [3], [5]. Furthermore, general-purpose applications targeting the G80 are developed using ANSI C with simple extensions, rather than the cumbersome graphics application programming interfaces (APIs) =-=[6]-=-, [7] and high-level languages layered on top of graphics APIs [8], [9], [10] that have been used in the past. Magnetic resonance imaging (MRI) is one application that can benefit greatly from these i... |

84 | Selection of a convolution function for Fourier inversion using gridding
- Jackson, Meyer, et al.
- 1991
Citation Context ...onstruct non-Cartesian (non-uniformly sampled) data. In the most common approach, gridding, the data is first interpolated onto a uniform Cartesian grid and then reconstructed in one step via the FFT =-=[16]-=-, [17]. However, gridding satisfies no optimality criterion, and has limited ability to deal with important imaging scenarios, such as parallel imaging [18]. In short, non-Cartesian scan trajectories ... |
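The two gridding steps described in this context (interpolate onto a Cartesian grid, then transform in one step) can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the method of the cited paper: a two-point linear kernel stands in for the convolution functions that paper studies, density compensation and normalization are omitted, and a naive inverse DFT stands in for the FFT because the Python standard library has none. All function names are hypothetical.

```python
import cmath
import math


def grid_linear(k, d, n_grid):
    """Spread non-uniform samples onto n_grid integer frequencies.

    Grid index j corresponds to frequency j - n_grid // 2. Each sample is
    split between its two neighbouring grid points with linear weights
    (assumption: a stand-in for a proper gridding kernel).
    """
    grid = [0j] * n_grid
    half = n_grid // 2
    for km, dm in zip(k, d):
        pos = km + half  # sample position in grid-index units
        lo = math.floor(pos)
        frac = pos - lo
        for idx, w in ((lo, 1.0 - frac), (lo + 1, frac)):
            if 0 <= idx < n_grid and w > 0.0:
                grid[idx] += w * dm
    return grid


def inverse_dft(grid):
    """Naive O(N^2) unnormalized inverse DFT; an FFT would replace this."""
    n_grid = len(grid)
    half = n_grid // 2
    return [
        sum(g * cmath.exp(2j * math.pi * (j - half) * n / n_grid)
            for j, g in enumerate(grid))
        for n in range(n_grid)
    ]


def gridding_reconstruct(k, d, n_grid):
    """Grid the samples, then transform the gridded data in one step."""
    return inverse_dft(grid_linear(k, d, n_grid))


# Samples that already lie on the grid are reproduced exactly.
k_on = [-2.0, 0.0, 1.0]
d_on = [1 + 0j, 2 + 0j, 1j]
image_on = gridding_reconstruct(k_on, d_on, n_grid=8)

# An off-grid sample is split between its two neighbours, which is the
# interpolation step that makes gridding approximate.
grid_off = grid_linear([0.5], [1 + 0j], n_grid=8)
```

For on-grid samples the result equals the direct sum of complex exponentials. For off-grid samples the interpolation introduces the approximation error that the context attributes to gridding.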

83 | Nonuniform fast Fourier transforms using Min-Max interpolation
- Fessler, Sutton
Citation Context ...duces inaccuracies and produces sub-optimal images. By contrast, optimal image reconstructions can be performed using advanced algorithms that perform the reconstruction iteratively [19], [20], [21], =-=[22]-=-, [23], [24]. These iterative algorithms require substantially more computation than algorithms based on gridding. A class of iterative algorithms leverages the observations of Wajer et al. in [25] to... |

78 | Accelerator: using data parallelism to program GPUs for general-purpose uses
- Tarditi, Puri, et al.
- 2006
Citation Context ...eveloped using ANSI C with simple extensions, rather than the cumbersome graphics application programming interfaces (APIs) [6], [7] and high-level languages layered on top of graphics APIs [8], [9], =-=[10]-=- that have been used in the past. Magnetic resonance imaging (MRI) is one application that can benefit greatly from these increases in computational resources and advancements in architecture and prog... |

71 | SENSE: sensitivity encoding for fast MRI
- Pruessmann
- 1999
Citation Context ... then reconstructed in one step via the FFT [16], [17]. However, gridding satisfies no optimality criterion, and has limited ability to deal with important imaging scenarios, such as parallel imaging =-=[18]-=-. In short, non-Cartesian scan trajectories are often superior to Cartesian scan trajectories in terms of the quality of the data obtained during the scan. Nonuniformly sampled data can be interpolate... |

64 | Image Formation by Induced Local Interactions: Examples Employing Nuclear Magnetic Resonance. Nature 242(5394)
- Lauterbur
- 1973
Citation Context ...timal reconstruction from data collected with more general sampling trajectories. However, non-Cartesian Fourier sampling is becoming increasingly common in MRI. For example, trajectories with radial =-=[12]-=-, spiral [13], stochastic [14], and randomly-perturbed [15] sampling patterns can be superior to Cartesian trajectories in terms of imaging speed, hardware requirements, and sensitivity to artifacts c... |

31 | The gridding method for image reconstruction by Fourier transformation
- Schomberg, Timmer
- 1995
Citation Context ...ct non-Cartesian (non-uniformly sampled) data. In the most common approach, gridding, the data is first interpolated onto a uniform Cartesian grid and then reconstructed in one step via the FFT [16], =-=[17]-=-. However, gridding satisfies no optimality criterion, and has limited ability to deal with important imaging scenarios, such as parallel imaging [18]. In short, non-Cartesian scan trajectories are of... |

31 | Advances in sensitivity encoding with arbitrary k-space trajectories. Magnetic Resonance in Medicine
- Pruessmann, Weiger, et al.
- 2001
Citation Context ...is technique introduces inaccuracies and produces sub-optimal images. By contrast, optimal image reconstructions can be performed using advanced algorithms that perform the reconstruction iteratively =-=[19]-=-, [20], [21], [22], [23], [24]. These iterative algorithms require substantially more computation than algorithms based on gridding. A class of iterative algorithms leverages the observations of Wajer... |

30 | Rapid 3-D cone-beam reconstruction with the simultaneous algebraic reconstruction technique (SART) using 2-D texture mapping hardware
- Mueller, Yagel
- 2000
Citation Context ...nstruction achieving a speedup of two orders of magnitude on the SGI RealityEngine in 1994 [28]. A wide variety of CT reconstruction algorithms have since been accelerated on the GPU [3], [29], [30], =-=[31]-=-. In [31] the GPU is used to accelerate Simultaneous Algebraic Reconstruction Technique (SART), an algorithm that increases the quality of image reconstruction relative to the conventional filtered ba... |

19 | NVIDIA CUDA software and GPU parallel computing architecture. Microprocessor Forum
- Nickolls
- 2007
Citation Context ...DIA’s Compute Unified Device Architecture (CUDA) and the microarchitectural features of the G80 that are most relevant to accelerating MRI reconstruction. A more complete description is found in [2], =-=[11]-=-. From the application developer’s perspective, the CUDA programming model consists of ANSI C supported by several keywords and constructs. CUDA treats the GPU as a coprocessor that executes dataparal... |

19 | iterative image reconstruction for MRI in the presence of field inhomogeneities
- Sutton, Noll, et al.
- 2003
Citation Context ... introduces inaccuracies and produces sub-optimal images. By contrast, optimal image reconstructions can be performed using advanced algorithms that perform the reconstruction iteratively [19], [20], =-=[21]-=-, [22], [23], [24]. These iterative algorithms require substantially more computation than algorithms based on gridding. A class of iterative algorithms leverages the observations of Wajer et al. in [... |

18 | Toeplitz-based iterative image reconstruction for MRI with correction for magnetic field inhomogeneity
- Fessler, Lee, et al.
- 2005
Citation Context ...hnique introduces inaccuracies and produces sub-optimal images. By contrast, optimal image reconstructions can be performed using advanced algorithms that perform the reconstruction iteratively [19], =-=[20]-=-, [21], [22], [23], [24]. These iterative algorithms require substantially more computation than algorithms based on gridding. A class of iterative algorithms leverages the observations of Wajer et al... |

17 | Fourier volume rendering on the GPU using a split-stream FFT
- Jansen, Rymon-Lipinski, et al.
- 2004
Citation Context ...n this area has focused on accelerating the fast Fourier transform (FFT), which is a key component of many MRI reconstruction algorithms. Speedups on the order of 2x-9x have been achieved [32], [33], =-=[34]-=-. VII. CONCLUSIONS AND FUTURE WORK The computational resources, architectural features, and programmability of the GeForce 8800 GTX reduce the time for an optimal reconstruction of non-uniform MRI sca... |

14 | Non-Cartesian MRI Scan Time Reduction through Sparse Sampling
- Wajer
- 2001
Citation Context ...], [22], [23], [24]. These iterative algorithms require substantially more computation than algorithms based on gridding. A class of iterative algorithms leverages the observations of Wajer et al. in =-=[25]-=- to remove all approximations from the reconstruction while simultaneously improving the reconstruction’s speed. These advanced algorithms have been impractical for large-scale problems due to computat... |

11 | Exploring graphics processor performance for general purpose applications
- Trancoso, Charalambous
- 2005
Citation Context ...flow path. As the SPMD programming model has been used on massively parallel supercomputers in the past, it is natural to expect that many high-performance applications will perform well on the GPU [3], =-=[5]-=-. Furthermore, general-purpose applications targeting the G80 are developed using ANSI C with simple extensions, rather than the cumbersome graphics application programming interfaces (APIs) [6], [7] ... |

11 | High-speed spiral-scan echo planar NMR imaging
- Ahn, Kim, et al.
- 1986
Citation Context ...ruction from data collected with more general sampling trajectories. However, non-Cartesian Fourier sampling is becoming increasingly common in MRI. For example, trajectories with radial [12], spiral =-=[13]-=-, stochastic [14], and randomly-perturbed [15] sampling patterns can be superior to Cartesian trajectories in terms of imaging speed, hardware requirements, and sensitivity to artifacts caused by non-... |

11 | Acceleration of fluoroCT reconstruction for a mobile C-arm on GPU and FPGA hardware: a simulation study
- Xue, Cheryauka, et al.
- 2006
Citation Context ...hy (CT) reconstruction achieving a speedup of two orders of magnitude on the SGI RealityEngine in 1994 [28]. A wide variety of CT reconstruction algorithms have since been accelerated on the GPU [3], =-=[29]-=-, [30], [31]. In [31] the GPU is used to accelerate Simultaneous Algebraic Reconstruction Technique (SART), an algorithm that increases the quality of image reconstruction relative to the conventional... |

9 | Compressed sensing for rapid MR imaging
- Lustig, Lee, et al.
- 2005
Citation Context ...sampling trajectories. However, non-Cartesian Fourier sampling is becoming increasingly common in MRI. For example, trajectories with radial [12], spiral [13], stochastic [14], and randomly-perturbed =-=[15]-=- sampling patterns can be superior to Cartesian trajectories in terms of imaging speed, hardware requirements, and sensitivity to artifacts caused by non-ideal experimental conditions. A variety of te... |

9 | MR image reconstruction using the GPU
- Schiwietz, Chang, et al.
- 2006
Citation Context ...arch in this area has focused on accelerating the fast Fourier transform (FFT), which is a key component of many MRI reconstruction algorithms. Speedups on the order of 2x-9x have been achieved [32], =-=[33]-=-, [34]. VII. CONCLUSIONS AND FUTURE WORK The computational resources, architectural features, and programmability of the GeForce 8800 GTX reduce the time for an optimal reconstruction of non-uniform M... |

8 | Rapid emission tomography reconstruction
- Möller, T
- 2003
Citation Context ...) reconstruction achieving a speedup of two orders of magnitude on the SGI RealityEngine in 1994 [28]. A wide variety of CT reconstruction algorithms have since been accelerated on the GPU [3], [29], =-=[30]-=-, [31]. In [31] the GPU is used to accelerate Simultaneous Algebraic Reconstruction Technique (SART), an algorithm that increases the quality of image reconstruction relative to the conventional filte... |

6 | Why do commodity graphics hardware boards (GPUs) work so well for acceleration of computed tomography
- Mueller, Xu, et al.
Citation Context ...g, fast context switching, and high memory bandwidth to tolerate ever-increasing latencies to main memory by overlapping long-latency loads in stalled threads with useful computation in other threads =-=[3]-=-. Fig. 1. Theoretical peak GFLOPS on modern GPUs and CPUs. Each core of the Core2 processors can retire four single-precision, multiply-accumulate operations ... |

5 | Specification v0.2
- Buck, Brook
- 2003
Citation Context ...are developed using ANSI C with simple extensions, rather than the cumbersome graphics application programming interfaces (APIs) [6], [7] and high-level languages layered on top of graphics APIs [8], =-=[9]-=-, [10] that have been used in the past. Magnetic resonance imaging (MRI) is one application that can benefit greatly from these increases in computational resources and advancements in architecture an... |

5 | Frequency resolved single-shot MR imaging using stochastic k-space trajectories. Magn Reson Med 1996;35:569
- Scheffler, Hennig
Citation Context ... collected with more general sampling trajectories. However, non-Cartesian Fourier sampling is becoming increasingly common in MRI. For example, trajectories with radial [12], spiral [13], stochastic =-=[14]-=-, and randomly-perturbed [15] sampling patterns can be superior to Cartesian trajectories in terms of imaging speed, hardware requirements, and sensitivity to artifacts caused by non-ideal experimenta... |

2 | Iterative reconstruction methods for non-cartesian MRI
- Fessler, Noll
- 2007
Citation Context ... $F^H F \rho = F^H d$ (1), $Q(x_n) = \sum_{m=1}^{M} |\phi(k_m)|^2 e^{i 2\pi k_m \cdot x_n}$ (2), $[F^H d]_n = \sum_{m=1}^{M} \phi^*(k_m)\, d(k_m)\, e^{i 2\pi k_m \cdot x_n}$ (3). These precomputations are usually approximated using a gridding-type technique [25], [20], =-=[26]-=-. However, accurately approximating these precomputations becomes computationally difficult, particularly as the dimensionality of the problem increases beyond the typical 2D images to which this tech... |
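The precomputations in this context can be evaluated exactly, without a gridding approximation. The sketch below takes equations (2) and (3) to be Q(x_n) = Σ_m |φ(k_m)|² e^{i2πk_m·x_n} and [F^H d]_n = Σ_m φ*(k_m) d(k_m) e^{i2πk_m·x_n}, both summed over the M scan samples. It is a brute-force reference in plain Python rather than the paper's GPU implementation; the helper names and the two-sample data set are hypothetical.

```python
import cmath
import math


def dot(a, b):
    """Dot product of two coordinate tuples of equal length."""
    return sum(ai * bi for ai, bi in zip(a, b))


def compute_q(phi, k, x):
    """Q(x_n) = sum_m |phi(k_m)|^2 * exp(i*2*pi*k_m . x_n)."""
    return [
        sum(abs(p) ** 2 * cmath.exp(2j * math.pi * dot(km, xn))
            for p, km in zip(phi, k))
        for xn in x
    ]


def compute_fhd(phi, d, k, x):
    """[F^H d]_n = sum_m conj(phi(k_m)) * d(k_m) * exp(i*2*pi*k_m . x_n)."""
    return [
        sum(p.conjugate() * dm * cmath.exp(2j * math.pi * dot(km, xn))
            for p, dm, km in zip(phi, d, k))
        for xn in x
    ]


# Hypothetical data: two scan samples and two image points in 3D.
phi = [1 + 0j, 0.5j]
d = [2 + 0j, 4 + 0j]
k = [(0.0, 0.0, 0.0), (0.25, 0.0, 0.0)]
x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

q = compute_q(phi, k, x)
fhd = compute_fhd(phi, d, k, x)
```

Each output point costs one pass over all M samples, so the total work grows with the product of the image size and the sample count. That cost is what makes the exact evaluation expensive as dimensionality increases.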

2 | Medical Image Reconstruction with the FFT. GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation
- Sumanaweera, Liu
- 2005
Citation Context .... Research in this area has focused on accelerating the fast Fourier transform (FFT), which is a key component of many MRI reconstruction algorithms. Speedups on the order of 2x-9x have been achieved =-=[32]-=-, [33], [34]. VII. CONCLUSIONS AND FUTURE WORK The computational resources, architectural features, and programmability of the GeForce 8800 GTX reduce the time for an optimal reconstruction of non-uni... |

1 | High-resolution diffusion MRI
- Haldar, Liang
Citation Context ...racies and produces sub-optimal images. By contrast, optimal image reconstructions can be performed using advanced algorithms that perform the reconstruction iteratively [19], [20], [21], [22], [23], =-=[24]-=-. These iterative algorithms require substantially more computation than algorithms based on gridding. A class of iterative algorithms leverages the observations of Wajer et al. in [25] to remove all ... |

1 | MR image reconstruction using
- Schiwietz, Chang, et al.
- 2006
Citation Context ...arch in this area has focused on accelerating the fast Fourier transform (FFT), which is a key component of many MRI reconstruction algorithms. Speedups on the order of 2x-9x have been achieved [32], =-=[33]-=-, [34]. VII. CONCLUSIONS AND FUTURE WORK The computational resources, architectural features, and programmability of the GeForce 8800 GTX reduce the time for an optimal reconstruction of non-uniform M... |