Abstract
A real-time implementation of self-calibrating GRAPPA Operator Gridding (SC-GROG) for radial acquisitions is presented. SC-GROG is a parallel-imaging based, parameter-free gridding algorithm in which coil sensitivity profiles are used to calculate gridding weights. SC-GROG’s weight-set calculation and image reconstruction steps are decoupled into two distinct processes, implemented in C++ and parallelized. This decoupling allows the weights to be updated adaptively in the background while image reconstruction threads use the most recent gridding weights to grid and reconstruct images. All possible combinations of 2D gridding weights (Gx)^m(Gy)^n are evaluated for m, n = {−0.5, −0.4, …, 0, 0.1, …, 0.5} and stored in a look-up-table (LUT). Consequently, the per-sample 2D weights calculation during gridding is eliminated from the reconstruction process and replaced by a simple LUT access. In practice, up to 34x faster reconstruction than conventional (parallelized) SC-GROG is achieved. On a 32-coil dataset of size 128×64, reconstruction performance is 14.5 fps, while the data acquisition is 6.6 fps.
Keywords: GROG, radial trajectories, real-time MRI, parallel MRI
Introduction
Even though non-Cartesian trajectories have been shown to be beneficial in some clinical applications for acquiring MR data (1–3), Cartesian trajectories have gained more widespread use in the clinic as the image reconstruction process is less demanding in terms of computation and execution time. Typically, non-Cartesian data is mapped onto a Cartesian grid, a process known as gridding, prior to image reconstruction. Several methods have been proposed to perform the gridding, such as convolution gridding (4), URS/BURS (5, 6), and some iterative methods such as INNG (7) and DING (8). All these methods are computationally demanding and the reconstruction quality depends on several different parameters.
GRAPPA Operator Gridding (GROG) (9, 10) is a gridding algorithm that uses coil sensitivity profiles to calculate k-space shifting weights in the kx and ky directions to shift non-Cartesian samples to their corresponding Cartesian grid locations. In GROG, density compensation is straightforward. If more than one non-Cartesian sample is mapped to the same Cartesian grid location, averaging is performed to calculate the final Cartesian sample value. The GROG algorithm requires an additional Cartesian dataset to use as calibration data to calculate the weights, as in GRAPPA (11). A self-calibrating version of GROG (SC-GROG) is presented in (12). SC-GROG uses the non-Cartesian data points themselves as a calibration dataset and provides a parameter-free, parallel-imaging based, robust, self-calibrating gridding algorithm that can be applied to grid both undersampled and fully sampled datasets. The quality of (SC)-GROG images is on par with conventional gridding (10, 12).
In this work, we present the first real-time implementation of the SC-GROG algorithm for radial acquisitions. SC-GROG’s weight-set calculation and image reconstruction steps are decoupled into two separate processes, modified for better performance and then parallelized for a general-purpose architecture. Decoupling makes it possible to continuously update the kx and ky weights for each slice in the background, while the reconstruction threads use the most recent weights for gridding, as in (13, 14). Additionally, the per-sample 2D shifting weights (Gx)^m(Gy)^n that are normally calculated during gridding are evaluated for all possible “non-Cartesian to Cartesian” distance combinations in the x and y directions (m, n = {−0.5, −0.4, …, 0, 0.1, …, 0.5}). The results are stored in a look-up-table (LUT) by the weight calculation process. In this way, the per-sample 2D weight calculation is replaced by a simple LUT access, providing increased reconstruction performance.
Methods
Basic Algorithm
SC-GROG uses an (Nc × Nc) weight-set to perform gridding, where Nc represents the number of receiver coils used during the acquisition. Let G1 designate the unitary weight-set (for unitary shifts in k-space, Δk) in one dimension. The weights for smaller shifts can be calculated by:
$$G_m = (G_1)^m \tag{1}$$
where −1 < m < 1. Gridding of a 2D k-space sample s(kx, ky) can be generalized using the properties of the GRAPPA operator (9):
$$s(k_x + \delta_x,\; k_y + \delta_y) = (G_x)^{\delta_x} (G_y)^{\delta_y}\, s(k_x, k_y) \tag{2}$$
where δx and δy are the distances of this non-Cartesian sample to its nearest Cartesian grid location in the kx and ky directions, Gx and Gy are the unitary weights in the kx and ky directions, and s(kx + δx, ky + δy) represents the gridded signal at the Cartesian location (kx + δx, ky + δy). A schematic of this operation is shown in Fig. 1. If multiple non-Cartesian samples are mapped to the same Cartesian location, the results are averaged. Therefore, no Density Compensation Function is required.
FIG. 1.
GROG gridding for two samples S1 and S2 of a projection Pi. The gridding weights Gx and Gy are applied to the samples S1 and S2 according to the kx and ky distances to their closest Cartesian grid neighbors, and their contributions to these Cartesian grid points are calculated as in Eq. 2.
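The averaging step that replaces density compensation can be sketched as follows. This is a minimal illustration for a single coil, not the RT-GROG source: the class name `AveragingGrid` and its methods are hypothetical, and the samples passed to `add()` are assumed to have already been shifted to their Cartesian location.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical single-coil Cartesian grid: every shifted sample that lands
// on a cell is accumulated, and the cell value is the mean of those samples.
class AveragingGrid {
public:
    AveragingGrid(std::size_t nx, std::size_t ny)
        : nx_(nx), sum_(nx * ny), count_(nx * ny, 0) {}

    // Add one already-shifted sample to cell (ix, iy).
    void add(std::size_t ix, std::size_t iy, cfloat shifted) {
        assert(ix < nx_ && iy * nx_ + ix < sum_.size());
        sum_[iy * nx_ + ix] += shifted;
        ++count_[iy * nx_ + ix];
    }

    // Mean of all samples mapped to (ix, iy); zero if the cell was never hit.
    cfloat value(std::size_t ix, std::size_t iy) const {
        const std::size_t i = iy * nx_ + ix;
        assert(i < sum_.size());
        if (count_[i] == 0) return cfloat(0.0f, 0.0f);
        return sum_[i] / static_cast<float>(count_[i]);
    }

private:
    std::size_t nx_;
    std::vector<cfloat> sum_;
    std::vector<unsigned> count_;
};
```

Keeping a per-cell count alongside the running sum is what makes a separate density compensation function unnecessary in this sketch: densely sampled regions near the k-space center are normalized by the number of contributing samples.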
Calculation of Gx and Gy is not straightforward. First, an angular weight-set Gθi must be calculated for each projection (i = 1, …,Np where Np is the number of projections). Gθ represents the unitary shift operator of size (Nc × Nc) along the given radial ray: s(θ, r+1) = Gθ s(θ, r). Therefore,
$$G_\theta = s(\theta, r+1)\; \mathrm{pinv}\big(s(\theta, r)\big) \tag{3}$$
where pinv represents the pseudo-inverse operator and s(θ, r) represents the multi-coil data of projection θ at position r. The set of angular weights can be decoupled into two distinct shifting weights in the kx and ky directions. This operation is performed by noticing that Gθ = (Gx)^m (Gy)^n, where m and n represent the shifts in the kx and ky directions between consecutive samples along the given projection:
$$G_{\theta_i} = (G_x)^{m_i} (G_y)^{n_i}, \qquad i = 1, \ldots, N_p \tag{4}$$
Equation 4 can be linearized by taking the matrix logarithm of each side:
$$\ln(G_{\theta_i}) = m_i \ln(G_x) + n_i \ln(G_y) \tag{5}$$
After linearization, Eq. 5 can be reorganized as a matrix equation and solved for ln(Gx) and ln(Gy). Subsequently, Gx (and Gy) is calculated by taking the matrix exponentials, Gx = exp(ln(Gx)) (and Gy = exp(ln(Gy)), respectively).
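The calibration chain described above (a per-projection operator, the linearized system, then exponentiation) can be illustrated with a deliberately reduced toy. The sketch below assumes Nc = 1, so every weight-set collapses to a complex scalar; a real SC-GROG calibration needs several coils and Nc × Nc matrices. The function names, the `Operators` struct, and the choice m = cos θ, n = sin θ for a unit radial step are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Scalar analogue of G_theta = s(theta, r+1) pinv(s(theta, r)):
// for Nc = 1 the pseudo-inverse reduces to a least-squares ratio.
cd estimateRayOperator(const std::vector<cd>& ray) {
    assert(ray.size() >= 2);
    cd num(0.0, 0.0);
    double den = 0.0;
    for (std::size_t r = 0; r + 1 < ray.size(); ++r) {
        num += ray[r + 1] * std::conj(ray[r]);
        den += std::norm(ray[r]);  // |s(r)|^2
    }
    assert(den > 0.0);
    return num / den;
}

struct Operators { cd gx, gy; };  // hypothetical result holder

// Solve ln(G_theta_i) = m_i ln(Gx) + n_i ln(Gy) over all projections with
// the 2x2 normal equations, then exponentiate to obtain Gx and Gy.
Operators solveGxGy(const std::vector<cd>& gTheta,
                    const std::vector<double>& m,
                    const std::vector<double>& n) {
    assert(gTheta.size() == m.size() && m.size() == n.size());
    double a = 0.0, b = 0.0, c = 0.0;  // sums of m*m, m*n, n*n
    cd bx(0.0, 0.0), by(0.0, 0.0);     // sums of m*ln(G), n*ln(G)
    for (std::size_t i = 0; i < gTheta.size(); ++i) {
        const cd l = std::log(gTheta[i]);  // principal branch
        a += m[i] * m[i];
        b += m[i] * n[i];
        c += n[i] * n[i];
        bx += m[i] * l;
        by += n[i] * l;
    }
    const double det = a * c - b * b;
    assert(std::fabs(det) > 1e-12);  // needs at least two distinct angles
    const cd lx = (c * bx - b * by) / det;
    const cd ly = (a * by - b * bx) / det;
    return Operators{std::exp(lx), std::exp(ly)};
}
```

Because every projection contributes one equation, using all projections over-determines the two unknowns and averages out noise. In the matrix case the same normal equations would be applied to each entry of the logarithm matrices, assuming Gx and Gy commute.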
Implementation
RT-GROG has been implemented in C++ and parallelized using Pthreads (http://pasc.org) and OpenMP (http://openmp.org). The weight-set calculation and reconstruction processes are decoupled so that they execute asynchronously in parallel. Separate C++ classes are dedicated to the weights calculation (GrogWsCalculator) and image reconstruction (GrogRegridder) processes. For simple usage, another C++ class, Grog, encapsulates the weights calculation and image reconstruction objects. When an instance of Grog is created, the weights calculation and image reconstruction object instances are automatically created, and all threads are initiated. Raw data from acquired frames are fed to the instance of Grog, and the reconstructed image is obtained as the output of a method, recon().
The weight-set calculation thread continuously updates the weights in the background to track changes in the coil sensitivities, while the reconstruction threads grid non-Cartesian samples using the latest weights and reconstruct images in real-time. A total of Nc reconstruction threads are initially created by the Pthreads library, and the same threads with the same allocated memory variables are used during the whole execution to prevent continuous context switching and thus to avoid a performance penalty. The weight-set calculation process uses the OpenMP library so that the number of threads devoted to the weight-set calculation can be changed dynamically. This number may be decreased after an initial set of weights has been calculated, thereby devoting more resources to image reconstruction.
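One way to hand the most recent weights from a background calculator to the reconstruction threads is sketched below. It is not the RT-GROG code: it swaps in C++11 `std::mutex` and `std::shared_ptr` for the Pthreads primitives named above, and the names `WeightLut` and `LatestWeights` are hypothetical.

```cpp
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical immutable snapshot of one computed weight set / LUT.
struct WeightLut {
    int version;               // increases with every background update
    std::vector<float> data;   // placeholder for the actual weights
};

// Shared slot holding the newest weights. The calculator thread calls
// publish(); each reconstruction thread calls snapshot() once per frame.
class LatestWeights {
public:
    void publish(std::shared_ptr<const WeightLut> lut) {
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = std::move(lut);
    }

    // Returns the newest published weights (null before the first publish).
    // The returned pointer keeps that version alive while a frame is being
    // gridded, even if a newer version is published in the meantime.
    std::shared_ptr<const WeightLut> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const WeightLut> current_;
};
```

The lock is held only for a pointer copy, so reconstruction threads are never blocked for the duration of a weight calculation. A frame that started with an older snapshot finishes with it, which matches the idea of gridding with "the latest weights" available at the time.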
A block diagram of the RT-GROG implementation is shown in Fig. 2. The upper dotted rectangle describes the weights calculation process, and the lower rectangle represents the reconstruction process. Every acquired frame is fed to the integration window of the weights calculation process to obtain the calibration dataset. Following the calculation of the local shifting weights Gθ, Gx and Gy are determined. Subsequently, the LUT is computed. The reconstruction process uses the most recent LUT to perform gridding. The final image is obtained by combining the per-coil images using sum-of-squares.
FIG. 2.
Block diagram of RT-GROG is represented. Dotted rectangles represent asynchronously executed parallel regions (upper dotted rectangle: weights calculation process, lower dotted rectangle: reconstruction process).
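The final sum-of-squares coil combination can be sketched as follows. The function name `sumOfSquares` and the flat per-coil image layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Combine per-coil complex images (one flat pixel array per coil) into a
// single magnitude image: out[i] = sqrt( sum over coils of |img_c[i]|^2 ).
std::vector<float> sumOfSquares(
    const std::vector<std::vector<std::complex<float>>>& coilImages) {
    assert(!coilImages.empty());
    const std::size_t numPixels = coilImages.front().size();
    std::vector<float> out(numPixels, 0.0f);
    for (const auto& img : coilImages) {
        assert(img.size() == numPixels);
        for (std::size_t i = 0; i < numPixels; ++i) {
            out[i] += std::norm(img[i]);  // squared magnitude
        }
    }
    for (float& v : out) v = std::sqrt(v);
    return out;
}
```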
RT-GROG was computed in parallel using 8 dual-core AMD Opteron 8220 processors (2.8GHz) on Linux-2.6.16.46-0.12-smp. Pseudo-inverse operations and calculation of the eigenvectors were realized using the AMD Core Math Library (ACML) (http://developer.amd.com/acml). For the matrix operations, the ATLAS library was used (http://math-atlas.sourceforge.net). The FFTW library (http://fftw.org) was used for Fast Fourier Transformations. The code was compiled with GCC (http://gcc.gnu.org) version 4.2.2. Matrix power, matrix logarithm and matrix exponential operations were implemented assuming that Gx and Gy are diagonalizable matrices. Let M be a diagonalizable square matrix, and V the matrix of eigenvectors of M. M can be projected into another space where its projection M′ will be a diagonal matrix:
$$M' = V^{-1} M V \tag{6}$$
Thus,
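$$\ln(M) = V \ln(M')\, V^{-1}, \qquad \exp(M) = V \exp(M')\, V^{-1}, \qquad M^p = V (M')^p\, V^{-1} \tag{7}$$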
As M′ is a diagonal matrix, the calculations in Eq. 7 are realized by replacing every number on the diagonal of M′ by its logarithm, its exponential, or its power, respectively. All projections are used to calculate Gx and Gy to achieve the best accuracy. We assume a step size of 0.1 (or 1/10 of a unit in k-space) for distance measurements between a non-Cartesian sample and its closest Cartesian grid neighbor. For example, a distance of 0.123 is rounded to 0.1, while 0.167 is rounded to 0.2. This assumption makes it possible to pre-calculate all possible combinations of per-sample 2D weights (Gx)^m(Gy)^n for m, n = {−0.5, −0.4, …, 0, 0.1, …, 0.5} and to have the weight-set calculation thread store them in a look-up table (LUT) Gxy. In this way, Eq. 2 is simplified to
$$s(k_x + \delta_x,\; k_y + \delta_y) = G_{xy}(\delta_x, \delta_y)\, s(k_x, k_y) \tag{8}$$
which eliminates the per-sample (Gx)^δx (Gy)^δy operations during gridding. Reconstruction threads only calculate the distance (δx, δy) of each sample to its closest Cartesian grid neighbor, and retrieve the appropriate pre-calculated 2D weight-set from the LUT. Consequently, reconstruction performance is greatly improved.
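The LUT construction and lookup described above can be sketched as follows. This is a toy under stated assumptions, not the RT-GROG code: real symmetric positive-definite 2 × 2 matrices stand in for the complex Nc × Nc weight-sets so that the eigendecomposition has a closed form (the paper uses ACML for the general case), and all names (`Mat2`, `matPow`, `shiftIndex`, `buildLut`, `lookup`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat2 = std::array<double, 4>;  // row-major 2x2 stand-in for a weight-set

Mat2 mul(const Mat2& a, const Mat2& b) {
    return Mat2{{a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
                 a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]}};
}

// M^p via eigendecomposition: M = V diag(l1, l2) V^T for symmetric M, so
// M^p = V diag(l1^p, l2^p) V^T. Only the diagonal entries are powered.
Mat2 matPow(const Mat2& m, double p) {
    assert(m[1] == m[2]);  // toy restriction: symmetric input
    const double mean = 0.5 * (m[0] + m[3]);
    const double diff = 0.5 * (m[0] - m[3]);
    const double r = std::sqrt(diff * diff + m[1] * m[1]);
    const double l1 = mean + r, l2 = mean - r;  // eigenvalues
    assert(l2 > 0.0);      // toy restriction: positive definite
    // The eigenvector of l1 is (cos phi, sin phi).
    const double phi = 0.5 * std::atan2(2.0 * m[1], m[0] - m[3]);
    const double c = std::cos(phi), s = std::sin(phi);
    const double p1 = std::pow(l1, p), p2 = std::pow(l2, p);
    return Mat2{{c * c * p1 + s * s * p2, c * s * (p1 - p2),
                 c * s * (p1 - p2), s * s * p1 + c * c * p2}};
}

constexpr int kStepsPerUnit = 10;            // step size 0.1
constexpr int kLutSide = kStepsPerUnit + 1;  // shifts -0.5 ... 0.5: 11 values

// Round a distance in [-0.5, 0.5] to the nearest step; return index 0..10.
int shiftIndex(double delta) {
    assert(delta >= -0.5 && delta <= 0.5);
    return static_cast<int>(std::lround(delta * kStepsPerUnit)) + kStepsPerUnit / 2;
}

// Pre-compute (Gx)^m (Gy)^n for every quantized (m, n) pair.
std::vector<Mat2> buildLut(const Mat2& gx, const Mat2& gy) {
    std::vector<Mat2> lut(static_cast<std::size_t>(kLutSide) * kLutSide);
    for (int i = 0; i < kLutSide; ++i) {
        for (int j = 0; j < kLutSide; ++j) {
            const double m = (i - kStepsPerUnit / 2) / static_cast<double>(kStepsPerUnit);
            const double n = (j - kStepsPerUnit / 2) / static_cast<double>(kStepsPerUnit);
            lut[static_cast<std::size_t>(i) * kLutSide + j] =
                mul(matPow(gx, m), matPow(gy, n));
        }
    }
    return lut;
}

// Per-sample work during gridding: two roundings and one table access.
const Mat2& lookup(const std::vector<Mat2>& lut, double dx, double dy) {
    return lut[static_cast<std::size_t>(shiftIndex(dx)) * kLutSide + shiftIndex(dy)];
}
```

With this layout a distance pair such as (0.123, 0.167) is served by the entry pre-computed for (0.1, 0.2), so the reconstruction threads never evaluate a matrix power themselves.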
Data Acquisition
For validation, several phantom datasets were acquired using a short, wide bore Siemens Magnetom Espree 1.5T (Siemens Medical Solutions, Erlangen, Germany). The sequence used was 2D radial TrueFisp with flip angle = 45 degrees, TR = 3.76 ms, FOV = 340 mm. A total of 6–15 receiver coils were used to acquire 50–400 projections with 256 samples. Images were reconstructed in real-time and results were analyzed in terms of performance and image quality.
Following in-vitro validation, several real-time cardiac images of a healthy subject were acquired using a Siemens Magnetom Avanto 1.5T (Siemens Medical Solutions, Erlangen, Germany) using the same sequence with fewer projections (64 and 96) and fewer samples (128 and 192) using 32 receiver coils (Invivo Corporation). The human subject protocol (NCT00720460) was approved by the NHLBI Institutional Review Board. All subjects consented to participate in writing. The TR was 2.36ms for 128 sample acquisitions, and 2.86ms for 192 sample acquisitions.
RESULTS
Image Quality
Phantom images were reconstructed using 6–15 coil datasets with 50–400 projections. Figure 3.a–c shows images reconstructed from data acquired with 15 coils with Np = 256, 128, 64, from left to right.
FIG. 3.
RT-GROG reconstructed images are presented. Upper images represent phantom images reconstructed from a 15 coil dataset with a. 256 projections, b. 128 projections, and c. 64 projections (256 samples on each). Lower image: d. cardiac cine image series reconstructed in real time using a 32 coil dataset (64 projections, 128 samples). Acquisition time: 151 ms. Reconstruction time: 69 ms.
Free-breathing non-gated short axis cardiac data were acquired with 32 coils (192×96, 128×64 and 192×64). Figure 3.d represents real-time reconstructed cine images of a beating heart from a 32 coil dataset on short axis (64 projections, 128 samples). These images show that a fixed set of SC-GROG weights may be used to grid multiple consecutive acquisitions.
Performance
The performance of the image reconstruction and weight-set calculation (in seconds) on 12 coil phantom data with different numbers of projections is shown in Fig. 4. With the 32 coil cardiac dataset, the reconstruction was up to 3 times faster than data acquisition. For acquisitions of 128×96 (192×96, 128×64, respectively), the reconstruction rate was 13.2 fps (10.0 fps, 14.5 fps, respectively), while the data acquisition rate was 4.4 fps (3.6 fps, 6.6 fps, respectively).
FIG. 4.
Image reconstruction and weight-set calculation performances in seconds for 12 coil phantom data and different number of projections (number of samples = 256).
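The acquisition rates quoted in this section are consistent with a frame time equal to the number of projections multiplied by TR. This relation is an assumption made here for illustration (it ignores any per-frame overhead); the TR values are those given under Data Acquisition:

```latex
\begin{align*}
T_{\mathrm{acq}} &= N_p \cdot TR \\
128 \times 64:  \quad T_{\mathrm{acq}} &= 64 \times 2.36\ \mathrm{ms} = 151.0\ \mathrm{ms}
  \;\Rightarrow\; 1 / T_{\mathrm{acq}} \approx 6.6\ \mathrm{fps} \\
128 \times 96:  \quad T_{\mathrm{acq}} &= 96 \times 2.36\ \mathrm{ms} = 226.6\ \mathrm{ms}
  \;\Rightarrow\; 1 / T_{\mathrm{acq}} \approx 4.4\ \mathrm{fps} \\
192 \times 96:  \quad T_{\mathrm{acq}} &= 96 \times 2.86\ \mathrm{ms} = 274.6\ \mathrm{ms}
  \;\Rightarrow\; 1 / T_{\mathrm{acq}} \approx 3.6\ \mathrm{fps}
\end{align*}
```

The same relation reproduces the phantom acquisition times with TR = 3.76 ms: 50 projections give 188 ms and 400 projections give 1504 ms.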
DISCUSSION
When reconstructing images from a 12 coil phantom dataset with RT-GROG, image reconstruction times were between 52–83 ms (12.0–19.2 fps), while the data acquisition times were between 188–1504 ms (0.6–5 fps). Reconstruction times of 69–101 ms (10.0–14.5 fps) were achieved with a 32 coil cardiac dataset, while the data acquisition times were 151–274 ms (3.6–6.6 fps). The weight-set update times were between 321–2431 ms for the phantom dataset and 294–646 ms for the cardiac dataset. RT-GROG reconstruction was faster than the data acquisition, even with smaller matrix sizes, such as 128×32 (18 fps reconstruction, 13 fps acquisition). However, because SC-GROG does not fill in sub-sampled k-space data, reconstructed images suffered in terms of quality. View sharing could be used, but it is not preferred because it introduces blurring due to motion. Further improved image quality on undersampled datasets may be obtained by using receiver coil sensitivity profiles to estimate missing non-Cartesian data samples prior or subsequent to gridding (15,16).
MR image reconstruction implicitly assumes that signal encoding employs perfectly linear gradients. In practice, gradient non-linearities, spatially varying inhomogeneities and time-varying eddy-currents cause deviations from this ideal. Furthermore, MR gradient coils have independent errors and delays that can cause deviations from the expected k-space trajectory. In general, it is quite difficult to be certain of the particular k-space location for a sample and to ascribe a unique value to it. Prior experience with non-Cartesian MR reconstruction suggests that the k-space location must be provided with significantly better precision than 1 k-space sample, but there is little to be gained by using full precision. The effects of different step sizes/precisions on image quality and performance are shown in Fig. 5. The full precision reference image of a phantom is reconstructed using conventional SC-GROG. Subsequently, the same frame is reconstructed using RT-GROG with different step sizes, and Root-Mean-Square-Error (RMSE) values are calculated for each case. The full precision image reconstruction is 39 times slower than RT-GROG reconstruction. In theory, changes in the step size should not affect RT-GROG reconstruction performance, as only one LUT access per sample is required. However, with small step sizes, the weights calculation process uses shared system resources more aggressively, and consequently slows down the reconstruction. Let t be the LUT computation time for a given step size s. Increasing the precision by a factor of n implies an n² times bigger LUT, so theoretically the LUT computation time increases to n²t. In practice, the overall slow-down in LUT calculations with increased precision is larger than the theoretical value due to increased system load.
Our results indicate that a step size of 0.1 provides the best compromise between image quality and computational efficiency on our current hardware: reconstruction performance stabilizes (~0.100s for step sizes ≥0.1), and images reconstructed with a step size of 0.1 are virtually identical to full precision images. Additionally, LUT computation time is negligible (22ms) when compared to SC-GROG weights calculation time (2.390s). On a 32-coil dataset of size 128×64, SC-GROG weights are calculated in 294ms, the LUT with a step size of 0.1 is computed in 70ms, and the LUT with a step size of 0.05 in 285ms. To further investigate our step size choice, frames were reconstructed using conventional SC-GROG (with full precision) and the discretized LUT version with a step size of 0.1. The signal difference between these images was less than 40% of the standard deviation of the noise of the full precision image, supporting the assertion that artifacts from using a step size of 0.1 are below the noise floor and essentially negligible. Note that smaller step sizes should be preferred on faster computer hardware to improve image quality.
FIG. 5.
The effects of different step sizes on image quality and computation times are represented. 15 coil phantom data with 256 projections (256 samples) is reconstructed using full-precision SC-GROG as a reference image (leftmost image). Subsequently, the same frame is reconstructed using different step sizes, and the difference images are calculated to compute Root-Mean-Square-Error (RMSE) values for each case.
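How the LUT grows with precision can be made concrete with a short sketch. The helper names are hypothetical, and the 8 bytes per matrix entry (single-precision complex) is an assumption; the text does not state the storage precision used by RT-GROG.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Number of LUT entries when shifts in [-0.5, 0.5] are quantized with the
// given step in both kx and ky: (1/step + 1) values per axis.
std::size_t lutEntries(double step) {
    assert(step > 0.0 && step <= 1.0);
    const std::size_t perAxis =
        static_cast<std::size_t>(std::lround(1.0 / step)) + 1;
    return perAxis * perAxis;
}

// Memory footprint if every entry is an Nc x Nc matrix.
// bytesPerValue = 8 assumes single-precision complex storage.
std::size_t lutBytes(double step, std::size_t nc, std::size_t bytesPerValue = 8) {
    return lutEntries(step) * nc * nc * bytesPerValue;
}
```

A step size of 0.1 gives 11 × 11 = 121 entries and a step size of 0.05 gives 21 × 21 = 441, a ratio of about 3.6 that approaches n² = 4 as the step shrinks. The reported LUT times for the 32-coil dataset (70 ms and 285 ms) differ by a factor of about 4.1, in line with the observation that the measured slow-down exceeds the theoretical one.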
A reconstruction performance comparison between conventional SC-GROG and RT-GROG is given in Table 1. A 34-fold performance increase is observed with RT-GROG for the reconstruction of 32 coil, 192×96 data over conventional, parallelized SC-GROG. A substantial comparison of the (SC-)GROG and conventional gridding approaches for non-Cartesian MRI reconstruction is given in the journal papers on GROG (10,12). With the GROG algorithm, gridding of a non-Cartesian sample requires Nc weighting coefficients. Convolution gridding uses a 2D kernel of variable size, M×N. Therefore, GROG is computationally more efficient than convolution gridding for Nc ≤ M×N. If Nc > M×N, the convolution gridding method will map the non-Cartesian samples onto the Cartesian grid faster than GROG. However, considering the additional calculations required by convolution gridding (e.g. non-trivial density compensation with some trajectories), we believe that GROG’s reconstruction performance will be better than (or at least on par with) the convolution gridding method in a clinical environment where the number of receiver coils is generally around 8–12. Parameter-free reconstruction is beneficial in the clinic as it eliminates imaging errors due to parameterization. Therefore, RT-GROG should be preferable to conventional gridding even if it falls behind in terms of performance when high numbers of receiver coils are used.
Table 1.
Reconstruction performance comparison (in seconds) between conventional parallelized SC-GROG and RT-GROG on 32 coil datasets
| Matrix size | 128 × 64 | 128 × 96 | 192 × 64 | 192 × 96 |
|---|---|---|---|---|
| SC-GROG | 1.412 | 1.972 | 2.285 | 3.4525 |
| RT-GROG | 0.069 | 0.076 | 0.101 | 0.123 |
LUT computation increased the weights calculation time by 70ms (on average) on the 32 coil cardiac dataset. Considering the reconstruction performance gain (up to 34x on the cardiac dataset) and the fact that one fixed set of SC-GROG coefficients can grid multiple consecutive acquisitions, this additional 70ms is considered to be reasonable.
Our results emphasize 32-coil acquisitions on a 16 core workstation. It may be difficult to provide such a workstation in a basic clinical environment. However, 8–12 coil acquisitions provide decent image quality, imply faster reconstruction performance, and can be reconstructed in real-time on a modern quad-core workstation. If implemented on a GPU, implementation costs can be further decreased. Scanner manufacturers are now providing multi-core reconstruction computers. Thus, we believe that RT-GROG can be employed for clinical purposes with minimum effort. Additionally, OpenMP parallelism of the weight-set calculation thread may permit real-time synchronous SC-GROG reconstruction (a new set of weights per acquisition without delays) if enough CPU resources are dedicated to the weight-set calculation process. Our implementation can also be applied to constant-angular-velocity and reordered constant-linear-velocity spiral trajectories (12).
CONCLUSION
A highly parallel, low-latency, real-time SC-GROG implementation for radial acquisitions was developed and demonstrated. The weight-set calculation for SC-GROG and the image reconstruction steps are separated into two individual processes, and implemented in parallel to run asynchronously in real-time on a general purpose architecture. Following the calculation of the two 1D gridding weights Gx and Gy, all possible combinations of per-sample 2D weights (Gx)^m (Gy)^n are pre-calculated for m, n = {−0.5, −0.4, …, 0, 0.1, …, 0.5} and stored in a look-up-table (LUT) by the weight-set calculation process. Consequently, the per-sample 2D weight calculation during gridding is eliminated from the reconstruction process and replaced by a simple LUT access. In this way, reconstruction performance is improved to better suit real-time requirements. RT-GROG is auto-calibrated, parameter-free, and highly parallel to overcome slice plane changes during real-time MRI. It is possible to dynamically change the number of threads associated with the weights calculation for improved auto-adaptation. An up to 34-fold reconstruction performance increase was obtained over conventional SC-GROG when reconstructing a 32 coil dataset.
References
- 1. Glover GH, Pauly JM. Projection reconstruction techniques for reduction of motion effects in MRI. Magn Reson Med. 1992;28:275–289. doi: 10.1002/mrm.1910280209.
- 2. Meyer CH, Hu BS, Nishimura DG, Macovski A. Fast spiral coronary artery imaging. Magn Reson Med. 1992;28:202–213. doi: 10.1002/mrm.1910280204.
- 3. Pipe JG. Motion correction with PROPELLER MRI: application to head motion and free-breathing cardiac imaging. Magn Reson Med. 1999;42:963–969. doi: 10.1002/(sici)1522-2594(199911)42:5<963::aid-mrm17>3.0.co;2-l.
- 4. Jackson JI, Meyer CH, Nishimura DG, Macovski A. Selection of a convolution function for fourier inversion using gridding [computerised tomography application]. IEEE Trans Med Imaging. 1991;10:473–478. doi: 10.1109/42.97598.
- 5. Rosenfeld D. An optimal and efficient new gridding algorithm using singular value decomposition. Magn Reson Med. 1998;40:14–23. doi: 10.1002/mrm.1910400103.
- 6. Rosenfeld D. New approach to gridding using regularization and estimation theory. Magn Reson Med. 2002;48:193–202. doi: 10.1002/mrm.10132.
- 7. Moriguchi H, Duerk JL. Iterative next-neighbor regridding (INNG): improved reconstruction from nonuniformly sampled k-space data using rescaled matrices. Magn Reson Med. 2004;51:343–352. doi: 10.1002/mrm.10692.
- 8. Gabr RE, Aksit P, Bottomley PA, Youssef ABM, Kadah YM. Deconvolution-interpolation gridding (DING): accurate reconstruction for arbitrary k-space trajectories. Magn Reson Med. 2006;56:1182–1191. doi: 10.1002/mrm.21095.
- 9. Griswold MA, Blaimer M, Breuer F, Heidemann RM, Mueller M, Jakob PM. Parallel magnetic resonance imaging using the GRAPPA operator formalism. Magn Reson Med. 2005;54:1553–1556. doi: 10.1002/mrm.20722.
- 10. Seiberlich N, Breuer FA, Blaimer M, Barkauskas K, Jakob PM, Griswold MA. Non-Cartesian data reconstruction using GRAPPA operator gridding (GROG). Magn Reson Med. 2007;58:1257–1265. doi: 10.1002/mrm.21435.
- 11. Griswold MA, Jakob PM, Heidemann RM, Nittka M, Jellus V, Wang K, Kiefer B, Haase A. Generalized autocalibrating partially parallel acquisitions (GRAPPA). Magn Reson Med. 2002;47:1202–1210. doi: 10.1002/mrm.10171.
- 12. Seiberlich N, Breuer F, Blaimer M, Jakob P, Griswold M. Self-calibrating GRAPPA operator gridding for radial and spiral trajectories. Magn Reson Med. 2008;59(4):930–935. doi: 10.1002/mrm.21565.
- 13. Guttman MA, Kellman P, Dick AJ, Lederman RJ, McVeigh ER. Realtime accelerated interactive MRI with adaptive TSENSE and UNFOLD. Magn Reson Med. 2003;50:315–321. doi: 10.1002/mrm.10504.
- 14. Saybasili H, Kellman P, Griswold MA, Derbyshire JA, Guttman MA. HTGRAPPA: Real-time B1-weighted image domain TGRAPPA reconstruction. Magn Reson Med. 2009;61:1425–1433. doi: 10.1002/mrm.21922.
- 15. Seiberlich N, Breuer F, Heidemann R, Blaimer M, Griswold M, Jakob P. Reconstruction of undersampled non-Cartesian datasets using pseudo-Cartesian GRAPPA in conjunction with GROG. Magn Reson Med. 2008;59:1127–1137. doi: 10.1002/mrm.21602.
- 16. Seiberlich N, Breuer FA, Ehses P, Moriguchi H, Blaimer M, Jakob PM, Griswold MA. Using the GRAPPA operator and the generalized sampling theorem to reconstruct undersampled non-Cartesian data. Magn Reson Med. 2009;61:705–715. doi: 10.1002/mrm.21891.





