GPU-based fast gamma index calculation

Xuejun Gu; Xun Jia; Steve B Jiang

doi:10.1088/0031-9155/56/5/014

Author manuscript; available in PMC: 2012 Mar 7.

Published in final edited form as: Phys Med Biol. 2011 Feb 11;56(5):1431–1441. doi: 10.1088/0031-9155/56/5/014

PMCID: PMC3156145 NIHMSID: NIHMS308333 PMID: 21317484

Abstract

The γ-index dose comparison tool has been widely used to compare dose distributions in cancer radiotherapy. Accurate calculation of the γ-index requires an exhaustive search for the closest Euclidean distance in the high-resolution dose-distance space, which is a computationally intensive task for 3D dose distributions. In this work, we combine a geometric method (Ju et al. Med Phys 35 879-87, 2008) with a radial pre-sorting technique (Wendling et al. Med Phys 34 1647-54, 2007) and implement them on computer graphics processing units (GPUs). The developed GPU-based γ-index computational tool is evaluated on eight pairs of IMRT dose distributions. The γ-index calculations finish within a few seconds for all 3D test cases on a single NVIDIA Tesla C1060 card, achieving a 45-75× speedup compared to CPU computations conducted on an Intel Xeon 2.27 GHz processor. We further investigate the effect of various factors on both CPU and GPU computation time. Pre-sorting voxels by their dose difference values speeds up the GPU calculation by about 2.7-5.5 times. For n-dimensional dose distributions, the γ-index calculation time on CPU is proportional to the summation of γⁿ over all voxels, while that on GPU is approximately proportional to the same summation but is also affected by how γⁿ is distributed among voxels. We found that increasing the resolution of the dose distributions leads to a quadratic increase in computation time on CPU and a less-than-quadratic increase on GPU. The values of the dose difference (DD) and distance-to-agreement (DTA) criteria also affect the γ-index calculation time.

1. Introduction

The γ-index concept introduced by Low et al. (Low et al., 1998) has been widely used to compare two dose distributions in cancer radiotherapy. The original γ-index calculation algorithm of Low et al. (Low et al., 1998) has since been improved for better accuracy and/or efficiency (Depuydt et al., 2002; Bakai et al., 2003; Stock et al., 2005; Jiang et al., 2006; Spezi and Lewis, 2006). However, because these modified algorithms still involve computationally intensive tasks such as dose grid interpolation and exhaustive search, comparing two 3D dose distributions of clinically relevant sizes is very time-consuming (e.g., many minutes). On the other hand, comparison of 3D dose distributions has become necessary in patient-specific QA for recently developed, more sophisticated treatment modalities such as volumetric modulated arc therapy (VMAT) (Teke et al., 2010). Therefore, there is a clinical need to significantly speed up 3D γ-index computations.

In more recent years, much effort has been invested in developing fast and/or accurate γ-index calculation algorithms. Wendling et al. (Wendling et al., 2007) sped up the exhaustive search by pre-sorting the evaluation dose points with respect to their spatial distances to a reference dose point and performing interpolation on the fly within a search region of fixed radius. This fixed search region leads to an overestimation of γ-index values in cases where the dose difference values are very large inside the search region and drop sharply just beyond its boundary. This algorithm also relies on a fine dose interpolation to secure accuracy. The geometric interpretation of γ-index evaluation proposed by Ju et al. (Ju et al., 2008) implies a linear interpolation by calculating the distance from a reference point to the subdivided simplexes formed by evaluation dose points in the search region. Thus, high accuracy and efficiency can be achieved without interpolating the dose grid to a fine resolution. However, searching for the closest distance over all subdivided simplexes is still time-consuming. Later, Chen et al. (Chen et al., 2009) reported a method based on a fast Euclidean distance transform (EDT) of quantized n-dimensional dose distributions. Fast γ-index evaluation can be achieved with a complexity of O(Nⁿ M), where N is the size of the dose distribution in each dimension, n is the number of dimensions, and M is the number of quantized values of the dose distribution. This method introduces discretization errors when quantizing the dose distributions. It also requires M times more memory than the original search-based algorithm, so a full 3D application of the EDT method is limited by its memory requirement. The complexity of the search-based algorithm is O(Nⁿ N_s), where N_s represents the number of exhaustive search steps. For cases where γ is not very large, so that N_s is much smaller than M, the EDT method loses its efficiency advantage. Recently, Yuan and Chen (Yuan and Chen, 2010) proposed a technique that uses a k-d tree for nearest neighbor searching. The searching time for a dose distribution of Nⁿ voxels can be reduced to (Nⁿ)^(1/k), where 2 < k < 3 for 2D and 3D dose distributions. However, this method requires interpolating the dose grid to secure accuracy. Moreover, in certain cases, the overhead of constructing the k-d tree is longer than the γ-index calculation time itself.

One approach to achieving fast γ-index evaluation is to implement an accurate algorithm on a graphics processing unit (GPU) platform. The GPU was originally designed for graphics rendering. It has recently been introduced into the radiotherapy community to accelerate computational tasks such as DTS reconstruction, CBCT reconstruction, rigid and deformable image registration, dose calculation, and treatment plan optimization (Li et al., 2005; Sharp et al., 2007; Yan et al., 2007; Li et al., 2008; Samant et al., 2008; Jacques et al., 2008; Hissoiny et al., 2009; Gu et al., 2009; Men et al., 2009; Jia et al., 2010b; Gu et al., 2010; Jia et al., 2010a; Men et al., 2010a; Men et al., 2010b). The GPU is especially well-suited for problems that can be expressed as data-parallel computations (NVIDIA, 2010). The γ-index calculation belongs to this category, because the evaluation of each reference point is completely independent of the others. Instead of implementing the memory-demanding EDT method or the k-d tree method with its large overhead, we combine the accuracy of the geometric interpretation technique (Ju et al., 2008) with the efficiency of the pre-sorting technique (Wendling et al., 2007). We also revise this combined algorithm to make it GPU-friendly and then implement it on GPU to achieve both accuracy and high efficiency.

The remainder of this paper is organized as follows. In Section 2, we discuss the modified γ-index algorithm and its implementation on GPU. Section 3 presents the evaluation of our GPU-based algorithm using eight pairs of 3D IMRT dose distributions. We first study the speedup factor achieved by the GPU implementation. Then, we study the effects of dose difference sorting and of the γ-index values on the computation time. We also investigate how the dose distribution resolution and the dose difference (DD) and distance-to-agreement (DTA) criteria affect the computation time. Conclusions are given in Section 4.

2. Methods and Materials

2.1 A modified γ-index algorithm

The γ-index is the minimum Euclidean distance in normalized dose-distance space (Low et al., 1998):

$γ (r_{r}) = \min \{ Γ (r_{r}, r_{e}) \},$

with

$\begin{matrix} Γ (r_{r}, r_{e}) = | {\tilde{r}}_{r} - {\tilde{r}}_{e} |, \quad \forall \{ r_{e} \}, \\ {\tilde{r}}_{r} = \left( \frac{r_{r}}{Δ d}, \frac{D_{r} (r_{r})}{Δ D} \right), \\ {\tilde{r}}_{e} = \left( \frac{r_{e}}{Δ d}, \frac{D_{e} (r_{e})}{Δ D} \right) . \end{matrix} \qquad (1)$

Here, D_r(r_r) is the reference dose distribution at position r_r and D_e(r_e) is the evaluated dose distribution at position r_e. ΔD and Δd are the DD criterion and the DTA criterion, respectively. Using the geometric method (Ju et al., 2008), an accurate Γ can be obtained by calculating the distance from the reference point r̃_r to the continuous evaluation surface formed by the discrete evaluation points r̃_e. The minimum Γ value is then found by an exhaustive search accelerated with the pre-sorting algorithm (Wendling et al., 2007). Algorithm A1 illustrates the CPU implementation of the combined pre-sorting and geometric γ-index algorithm.
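As a point of reference for Eq. (1), the following is a minimal Python sketch of the plain exhaustive search on a 1D profile. It evaluates Γ only at the evaluation grid points, so it omits both the geometric (simplex) distance and the pre-sorting described in this paper. The function name `gamma_brute_force` and the dose profiles are hypothetical and chosen only for illustration.

```python
import math

def gamma_brute_force(ref_dose, eval_dose, spacing_mm, dd_crit, dta_mm):
    """Gamma value at every reference point of a 1D profile.

    Direct form of Eq. (1): for each reference point, take the minimum
    normalized distance to every evaluation grid point (no interpolation).
    """
    gammas = []
    for i, d_ref in enumerate(ref_dose):
        best = math.inf
        for j, d_eval in enumerate(eval_dose):
            dist = (i - j) * spacing_mm / dta_mm   # spatial term, in units of the DTA criterion
            diff = (d_ref - d_eval) / dd_crit      # dose term, in units of the DD criterion
            best = min(best, math.hypot(dist, diff))
        gammas.append(best)
    return gammas

# Hypothetical profiles; doses are fractions of the prescription dose.
reference = [0.10, 0.50, 1.00, 0.50, 0.10]
evaluated = [0.10, 0.53, 1.00, 0.50, 0.10]
gamma = gamma_brute_force(reference, evaluated, spacing_mm=1.0, dd_crit=0.03, dta_mm=3.0)
```

In this toy profile the second reference point differs from the co-located evaluation point by exactly the 3% criterion, yet its γ value is smaller than 1 because an evaluation point 2 mm away carries the same dose. The cost of this form is one inner loop over all evaluation points per reference point, which is what the pre-sorting and GPU parallelization are meant to reduce.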

Algorithm A1: A modified γ-index calculation algorithm implemented on CPU

1. Calculate the maximum DD: max(DD(r_r)) = max(D_r(r_r) − D_e(r_r)), ∀{r_r};
2. Calculate the geometric distance set L_n, which defines the maximum search range for each reference point:

   $L_{n} = \sqrt{{(i Δ x)}^{2} + {(j Δ y)}^{2} + {(k Δ z)}^{2}} / Δ d,$

   with $| i (\text{or } j, k) | \leq \frac{max (D D (r_{r}))}{Δ D} \frac{Δ d}{Δ x (\text{or } Δ y, Δ z)}$ and $| L_{n} | < \frac{max (D D (r_{r}))}{Δ D}$, where i, j, k are discretized coordinates in the x, y, z directions and Δx, Δy, Δz are the resolutions in the x, y, z directions;
3. Sort the geometric distance set {n, L_n} in ascending order of L_n, where n runs over the voxels inside the search range;
4. For each reference dose point:
   a. Set γ(r_r) = DD(r_r)⁄∆D;
   b. For n = 1:N (N is the length of {L_n})
      i. For j = 1:nS (nS is the number of simplexes in one voxel)
         - Calculate the Euclidean distance Γ(r̃_r, S_j) from the reference point r̃_r to a k-simplex S_j:

           $Γ ({\tilde{r}}_{r}, S_{j}) = \begin{cases} \min | {\tilde{r}}_{r} - \sum_{i = 1}^{k + 1} ω_{i} ν_{i} |, & \text{if all } ω_{i} > 0 \\ \min_{S_{i} \in \partial S_{j}} Γ ({\tilde{r}}_{r}, S_{i}), & \text{otherwise} \end{cases};$
         - If Γ(r̃_r, S_j) < γ(r_r): γ(r_r) = Γ(r̃_r, S_j);
         End For
      ii. If γ(r_r) < L_n break;
      End For
   End For

Similar to Wendling et al. (Wendling et al., 2007), in Steps 2 and 3 of Algorithm A1 we establish a sorted table of the normalized geometric distances {L_n} of all the voxels in the maximum search range. However, instead of using a manually selected search range as in (Wendling et al., 2007), we choose the search radius as max(DD(r_r)) ∆d⁄∆D, which avoids overestimating γ-index values.
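The pre-sorted search with early termination can be sketched in a few lines of Python for a 1D profile. This is an illustration under stated assumptions rather than the implementation used in this work: Γ is evaluated at grid points instead of on simplexes, the absolute dose difference is used when computing the maximum DD, and the names `sorted_offsets` and `gamma_presorted` are hypothetical.

```python
import math

def sorted_offsets(max_gamma, spacing_mm, dta_mm):
    """Voxel offsets inside the maximum search range, sorted by normalized distance L_n."""
    reach = int(max_gamma * dta_mm / spacing_mm)
    table = [(abs(k) * spacing_mm / dta_mm, k) for k in range(-reach, reach + 1)]
    table.sort()
    return table

def gamma_presorted(ref_dose, eval_dose, spacing_mm, dd_crit, dta_mm):
    """Gamma per reference point, plus the number of evaluation points visited."""
    # Assumption: absolute dose difference, so the search radius is never negative.
    max_dd = max(abs(r - e) for r, e in zip(ref_dose, eval_dose)) / dd_crit
    table = sorted_offsets(max_dd, spacing_mm, dta_mm)
    gammas, visited = [], 0
    for i, d_ref in enumerate(ref_dose):
        best = abs(d_ref - eval_dose[i]) / dd_crit   # initial gamma = DD / (DD criterion)
        for l_n, k in table:
            if best < l_n:
                break                                # farther voxels cannot lower the minimum
            j = i + k
            if 0 <= j < len(eval_dose):
                visited += 1
                diff = (d_ref - eval_dose[j]) / dd_crit
                best = min(best, math.hypot(l_n, diff))
        gammas.append(best)
    return gammas, visited
```

Because every skipped voxel has a normalized distance larger than the current minimum, the result matches the exhaustive search over grid points while visiting far fewer evaluation points wherever the two distributions agree well.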

The Γ(r̃_r, S_j) in Step 4 of Algorithm A1 is obtained when {ω₁,⋯,ω_k} = (V^TV)⁻¹V^TP and $ω_{k + 1} = 1 - \sum_{i = 1}^{k} ω_{i}$, where P and V are K × 1 and K × k matrices of the form $P = \begin{bmatrix} c_{1} ({\tilde{r}}_{r}) - c_{1} (ν_{k + 1}) \\ ⋮ \\ c_{K} ({\tilde{r}}_{r}) - c_{K} (ν_{k + 1}) \end{bmatrix}$, $V = \begin{bmatrix} c_{1} (ν_{1}) - c_{1} (ν_{k + 1}) & \dots & c_{1} (ν_{k}) - c_{1} (ν_{k + 1}) \\ ⋮ & ⋮ & ⋮ \\ c_{K} (ν_{1}) - c_{K} (ν_{k + 1}) & \dots & c_{K} (ν_{k}) - c_{K} (ν_{k + 1}) \end{bmatrix}$. Here, for a k-dimensional dose distribution, K = k + 1, c_j(q) is the jth coordinate of point q, and {ν₁,⋯,ν_k₊₁} are the vertices of a k-simplex. For the calculation of Γ(r̃_r, S_j), we follow the computational acceleration techniques presented by Ju et al. (Ju et al., 2008), where the computation is conducted recursively over the set of simplexes and the recursion is limited to the subset of simplexes whose corresponding weights ω_i are negative. Detailed information regarding the Γ(r̃_r, S_j) calculation can be found in (Ju et al., 2008).
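For the simplest case, a 1D dose distribution (k = 1, K = 2), the simplex is a line segment in the normalized dose-distance plane and the weight formula reduces to a single dot-product ratio. The Python sketch below illustrates that special case only; the function name `distance_to_segment` is hypothetical, and a non-strict comparison (ω ≥ 0) is used so that projections landing exactly on a vertex are handled by the first branch.

```python
import math

def distance_to_segment(p, v1, v2):
    """Distance from point p to the 1-simplex (segment) with vertices v1 and v2.

    With v2 as the anchor vertex, V = v1 - v2 and P = p - v2, so the
    least-squares weight is w1 = (V.P) / (V.V) and w2 = 1 - w1.
    """
    V = (v1[0] - v2[0], v1[1] - v2[1])
    P = (p[0] - v2[0], p[1] - v2[1])
    vv = V[0] * V[0] + V[1] * V[1]
    if vv == 0.0:
        return math.hypot(P[0], P[1])        # degenerate segment: both vertices coincide
    w1 = (V[0] * P[0] + V[1] * P[1]) / vv
    w2 = 1.0 - w1
    if w1 >= 0.0 and w2 >= 0.0:
        # The projection falls inside the segment: distance to the projected point.
        foot = (w1 * v1[0] + w2 * v2[0], w1 * v1[1] + w2 * v2[1])
        return math.hypot(p[0] - foot[0], p[1] - foot[1])
    # Otherwise the closest point lies on the boundary, i.e. one of the two vertices.
    return min(math.hypot(p[0] - v1[0], p[1] - v1[1]),
               math.hypot(p[0] - v2[0], p[1] - v2[1]))
```

In higher dimensions the same two branches apply, with the boundary of a triangle being its three edges and the boundary of a tetrahedron its four faces, which is where the recursion described above comes from.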

2.2 GPU implementation

In this work, we implement the γ-index algorithm (Algorithm A1) on GPU using the Compute Unified Device Architecture (CUDA) programming environment. In Step 4 of Algorithm A1, the minimum Γ value for each reference point is searched within a radius of (DD(r_r)⁄∆D)∆d around that point. On CPU, Step 4 is repeated for all reference points sequentially. On GPU, this step can be parallelized over a large number of reference points and executed simultaneously using multiple threads. A key point of the GPU implementation is to ensure that all threads in the same batch (strictly speaking, a warp in CUDA terminology) perform similar numbers of arithmetic operations. If some threads in a warp require a much longer execution time, the other threads in this warp finish first and then sit idle until the slower threads finish, which wastes computational power. Directly mapping the CPU version of the γ-index algorithm (Algorithm A1) onto the GPU therefore cannot guarantee that all threads in a warp have similar computational burdens, and consequently cannot achieve the maximum speedup. As noted above, the upper bound of the search range for each reference point is (DD(r_r)⁄∆D)∆d. The computational task for each reference point is then approximately proportional to the dose difference DD(r_r): the larger the DD(r_r), the more evaluation dose points are involved and the longer the computation time. We therefore pre-sort the voxels according to DD(r_r) (for convenience we call this DD sorting) and perform the γ-index calculation on GPU in the pre-sorted voxel order. This DD-sorting procedure, along with the pre-sorting of the geometric distance set {n, L_n}, can be parallelized using the recently developed Thrust library functions (Hoberock et al., 2010), which can sort one or more arrays of millions of elements in under a second. The complete GPU-based γ-index algorithm is as follows:

Algorithm A2: A modified γ-index calculation algorithm implemented on GPU

1. Transfer the dose distribution data from CPU to GPU;
2. CUDA Kernel 1: calculate in parallel the dose difference DD(r_r) = D_r(r_r) − D_e(r_r), ∀ {r_r};
3. Sort in parallel the {Voxel Index, DD(r_r)} array pair in ascending order of DD(r_r) using the Thrust parallel sorting function and obtain max(DD(r_r));
4. CUDA Kernel 2: calculate in parallel the geometric distance set {L_n};
5. Sort in parallel the geometric distance set {n, L_n} in ascending order of L_n using the Thrust parallel sorting function;
6. CUDA Kernel 3: calculate in parallel the γ-index values using the algorithm illustrated in Step 4 of Algorithm A1;
7. Sort {Voxel Index, γ} back to the original voxel index order;
8. Transfer the γ-index data from GPU to CPU.

We would like to point out that Step 4-b-i of Algorithm A1 uses a recursive algorithm that computes Γ(r̃_r, S_j) only in a subset of simplexes. This involves many IF conditions and thus creates a branching issue in a GPU implementation. To avoid this problem, in Step 6 (Kernel 3) of Algorithm A2, we calculate Γ(r̃_r, S_j) in all simplexes.
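The motivation for DD sorting can be illustrated with a simple cost model, written here in Python rather than CUDA. The model assumes that a warp of 32 threads takes as long as its slowest thread and ignores memory effects and scheduling. The per-voxel workloads are synthetic numbers drawn from a heavy-tailed distribution, not measurements from this work, and `warp_cost` is a hypothetical helper.

```python
import random

WARP_SIZE = 32  # number of threads that execute in lockstep in one CUDA warp

def warp_cost(work_per_thread, warp_size=WARP_SIZE):
    """Total cost when every warp takes as long as its slowest thread."""
    total = 0
    for start in range(0, len(work_per_thread), warp_size):
        total += max(work_per_thread[start:start + warp_size])
    return total

random.seed(0)
# Synthetic per-voxel workloads (search steps), heavy-tailed on purpose.
work = [int(random.expovariate(1.0) ** 3 * 10) + 1 for _ in range(WARP_SIZE * 1000)]

unsorted_cost = warp_cost(work)          # voxels processed in their original order
sorted_cost = warp_cost(sorted(work))    # voxels grouped by similar workload
```

In this model, grouping voxels with similar workloads into the same warp minimizes the sum of per-warp maxima, which is the effect that sorting by DD(r_r) aims for, since the workload of a voxel grows with its dose difference.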

2.3 Experimental data sets

We tested our GPU implementation on eight IMRT dose-distribution pairs (4 lung cases (L1-L4) and 4 head-and-neck cases (H1-H4)), which were generated using a Monte Carlo dose engine called MCSIM (Ma et al., 2002) as well as an in-house developed pencil beam algorithm (Gu et al., 2009). The Monte Carlo dose calculation results were treated as the reference dose distributions, while the results obtained from the pencil beam algorithm were used as the evaluation dose distributions. All doses were originally calculated with a voxel size of 4.0 mm × 4.0 mm × 2.5 mm, normalized to the prescription dose, and interpolated to various resolution levels for comparison studies. CPU computation was conducted on a 4-core Intel Xeon 2.27 GHz processor. GPU computation was performed on a single NVIDIA Tesla C1060 card, which has 240 processor cores (1.3 GHz) and 4 GB of device memory. We would like to point out that our GPU implementation did not affect the calculation accuracy; in all scenarios, the γ-index values calculated on GPU agree with those calculated on CPU to within ∼10^-6.

In the following sections, we present results under various conditions. For the CPU implementation based on Algorithm A1, we divide the total computation time T^C into two parts, i.e., $T^{C} = T_{p}^{C} + T_{γ}^{C}$, where $T_{p}^{C}$ is the data processing time (Steps 1, 2, and 3 of Algorithm A1) and $T_{γ}^{C}$ is the γ-index calculation time (Step 4 of Algorithm A1). For the GPU implementation based on Algorithm A2, we split the total computation time T^G into three parts, i.e., $T^{G} = T_{t}^{G} + T_{p}^{G} + T_{γ}^{G}$, where $T_{t}^{G}$ is the data transfer time between CPU and GPU (Steps 1 and 8 of Algorithm A2), $T_{p}^{G}$ is the data processing time (Steps 2-5 and Step 7 of Algorithm A2), and $T_{γ}^{G}$ is the γ-index calculation time (Step 6 of Algorithm A2).

3. Experimental Results and Discussion

3.1 Speedup of GPU vs. CPU

For this study, we set the resolution of the dose distributions to 256 × 256 × 144 (or 160, 206) and use 3% for the DD criterion and 3 mm for the DTA criterion. Table 1 lists the computation time for the CPU implementation (Algorithm A1) and the GPU implementation (Algorithm A2). We present two speedup factors in Table 1, with and without the CPU-GPU data transfer time, i.e., T^C/T^G and $T^{C} / (T^{G} - T_{t}^{G})$. These two speedup factors are quite similar (they differ by roughly 2% to 10% across the cases), indicating that the data transfer time in the GPU calculation is small compared to the γ-index computation time $T_{γ}^{G}$. We can also see that the data processing time in both the CPU and GPU implementations is relatively insignificant compared to the γ-index calculation time (Step 4 of Algorithm A1 and Step 6 of Algorithm A2). Overall, the GPU implementation achieves about 45∼75× speedup compared to the CPU implementation.

Table 1.

Calculation time of γ-index for the CPU and GPU implementations for 8 IMRT dose distribution pairs.

Case	Voxel number	CPU $T_{p}^{C}$ (sec)	CPU $T_{γ}^{C}$ (sec)	CPU $T^{C}$ (sec)	GPU $T_{t}^{G}$ (sec)	GPU $T_{p}^{G}$ (sec)	GPU $T_{γ}^{G}$ (sec)	GPU $T^{G}$ (sec)	Speedup $T^{C} / (T^{G} - T_{t}^{G})$	Speedup $T^{C} / T^{G}$
L1	256×256×206	0.33	64.93	65.26	0.07	0.18	1.18	1.43	47.99	45.64
L2	256×256×160	0.24	65.64	65.89	0.06	0.15	1.13	1.34	51.47	49.16
L3	256×256×160	0.28	101.46	101.74	0.06	0.14	1.68	1.88	55.90	54.12
L4	256×256×160	0.25	30.10	30.35	0.06	0.14	0.43	0.63	53.25	48.17
H1	256×256×144	0.49	47.73	48.22	0.05	0.11	0.91	1.07	47.27	45.07
H2	256×256×144	0.45	242.23	242.68	0.05	0.12	3.13	3.30	74.67	73.54
H3	256×256×144	0.24	116.14	116.38	0.05	0.12	2.10	2.27	52.42	51.27
H4	256×256×144	0.22	107.61	107.86	0.05	0.12	1.75	1.92	57.66	56.16
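The two speedup factors reported in Table 1 follow directly from the timing decomposition of Section 2.3. As a quick check, the short Python sketch below recomputes them for two rows of the table; the function name `speedup_factors` is hypothetical, and the timings are copied from Table 1.

```python
def speedup_factors(t_cpu, t_transfer, t_process, t_gamma):
    """Return (T^C / T^G, T^C / (T^G - T_t^G)) for one test case."""
    t_gpu = t_transfer + t_process + t_gamma
    return t_cpu / t_gpu, t_cpu / (t_gpu - t_transfer)

# (T^C, T_t^G, T_p^G, T_gamma^G) in seconds, taken from two rows of Table 1.
rows = {
    "256x256x206, T^C = 65.26 s": (65.26, 0.07, 0.18, 1.18),
    "256x256x144, T^C = 242.68 s": (242.68, 0.05, 0.12, 3.13),
}

for label, timings in rows.items():
    with_transfer, without_transfer = speedup_factors(*timings)
    print(f"{label}: {with_transfer:.2f}x with transfer, {without_transfer:.2f}x without")
```

The gap between the two factors is largest for the cases with the shortest GPU time, because the transfer time is nearly constant for a given grid size.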

3.2 The effect of DD sorting on computation time $T_{γ}^{G}$

As mentioned in Section 2.2, introducing DD sorting (Step 3 in Algorithm A2) better balances the computational tasks among CUDA threads and consequently reduces computation time. Table 2 shows the effect of DD sorting on $T_{γ}^{G}$. The speedup achieved by DD sorting is around 2.7-5.5 times.

Table 2.

Speedup achieved in GPU computation by sorting voxels based on the dose difference values.

Case	$T_{γ}^{G}$ (Non-DD sorting) (sec)	$T_{γ}^{G}$ (DD sorting) (sec)	Speedup factor achieved by DD sorting
L1	4.37	1.18	3.70
L2	4.62	1.13	4.09
L3	6.71	1.68	3.99
L4	1.18	0.43	2.74
H1	4.99	0.91	5.48
H2	14.31	3.13	4.57
H3	9.01	2.10	4.29
H4	12.54	1.75	7.17


3.3 The effect of γ-index values on computation time $T_{γ}^{C}$ and $T_{γ}^{G}$

From Table 1, we see that both $T_{γ}^{C}$ and $T_{γ}^{G}$ change significantly from case to case. Taking cases L3 and L4 as an example, they have the same number of voxels, but their γ-index calculation times differ by a factor of about 3 to 4. We know that the computation time t for each reference point is proportional to the number of searched voxels N_s, i.e., t ∝ N_s. The relationship between N_s and the search length L can be expressed as N_s ∝ Lⁿ, where n is the dimension of the dose distributions. For all the test cases in this paper, n = 3. On the other hand, from Step 4-b-ii of Algorithm A1, we can deduce that the search length L is proportional to the γ value at each reference point, i.e., L ∝ γ. Thus, the computation time t for each reference point is proportional to γⁿ: t ∝ γⁿ. In Figure 1(a), we plot $T_{γ}^{C}$ and $T_{γ}^{G}$ versus the summation of γ³ over all voxels (Σ γ³) for each of the 8 test cases. We see that both $T_{γ}^{C}$ and $T_{γ}^{G}$ increase monotonically with the value of Σ γ³. To further illustrate our point, we choose one set of patient data (H2) and shift the evaluation dose distribution (normalized to the prescription dose) by -10%, -9%, …, up to 10%, in steps of 1%, inside the region enclosed by the 10% iso-dose line. Figure 1(b) shows the γ-index calculation times $T_{γ}^{C}$ and $T_{γ}^{G}$ with respect to Σ γ³. We can see that $T_{γ}^{C}$ versus Σ γ³ can be fitted with a straight line (dashed line in Figure 1(b)), indicating that $T_{γ}^{C}$ strictly follows the rule $T_{γ}^{C}$ ∝ Σ γⁿ. However, the data points for $T_{γ}^{G}$ are much more scattered. This is because the GPU computation time is a function not only of Σ γ³ but also of the distribution of γⁿ, which determines the variation of thread computation time within a warp.
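The relation N_s ∝ Lⁿ for n = 3 can be checked numerically by counting the voxels inside a sphere of growing radius. The Python sketch below assumes an isotropic grid with the radius expressed in voxel units; `voxels_within` is a hypothetical helper, not part of the code developed in this work.

```python
def voxels_within(radius_voxels):
    """Count the grid points (i, j, k) with i^2 + j^2 + k^2 <= radius^2."""
    r = int(radius_voxels)
    r2 = radius_voxels * radius_voxels
    count = 0
    for i in range(-r, r + 1):
        for j in range(-r, r + 1):
            for k in range(-r, r + 1):
                if i * i + j * j + k * k <= r2:
                    count += 1
    return count

# Doubling the search radius should multiply the voxel count by roughly 2**3 = 8.
counts = {radius: voxels_within(radius) for radius in (5, 10, 20)}
ratio_5_to_10 = counts[10] / counts[5]
ratio_10_to_20 = counts[20] / counts[10]
```

Since the search radius of a reference point is set by its γ value, a point whose γ is twice as large costs roughly eight times as many search steps, which is why Σ γ³ is the natural predictor of the total time.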

Figure 1. (a) GPU and CPU computation time for the eight test cases vs. the summation of γ³ values over all evaluated points. (b) GPU and CPU computation time for case H2 with various dose shifts applied to the evaluation dose distribution vs. the summation of γ³ values over all evaluated points. For convenience, the CPU computation time is scaled down by a factor of 60.0 so that it can be shown on the same vertical axis as the GPU computation time.

3.4 The effect of dose distribution resolution on the computation time $T_{γ}^{C}$ and $T_{γ}^{G}$

We choose case H2 to test the effect of the dose distribution resolution on the computation times $T_{γ}^{C}$ and $T_{γ}^{G}$. We interpolate the dose distributions to various resolution levels, including 128 × 128 × 72, 128 × 128 × 144, 256 × 256 × 72, 256 × 256 × 144, and 512 × 512 × 72. Figure 2 shows how $T_{γ}^{C}$ and $T_{γ}^{G}$ change with resolution. As indicated by the power trend lines (dashed lines in Figure 2), $T_{γ}^{C}$ increases approximately as N^1.90 while $T_{γ}^{G}$ increases approximately as N^1.75 when the resolution of the dose distribution increases N times. As illustrated in Algorithm A1, the CPU-based γ-index calculation consists of two loops. The outer loop runs over all the reference dose points and the inner loop is an exhaustive search in a limited region around each reference dose point. The computation time of the outer loop increases linearly with the resolution of the dose distributions. The inner loop computation time is proportional to the number of voxels involved. For the geometric method, the number of voxels involved in a fixed region also increases linearly with the resolution. Together, these two effects lead to an approximately quadratic increase of computation time (∝ N^1.90) for a linear change of resolution. For the GPU Algorithm A2, the computation time increases as N^1.75 in this test case. This slight difference might be due to the fact that memory access time can be hidden by a large number of arithmetic operations in GPU computation.
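The two-loop argument for the near-quadratic CPU scaling can be illustrated by counting operations instead of measuring time. The Python sketch below is a 1D simplification under stated assumptions: a profile of fixed physical length, a fixed physical search radius, and one operation per visited voxel. The names and the default lengths are hypothetical.

```python
import math

def search_operations(num_voxels, length_mm=100, radius_mm=5):
    """Outer loop over all voxels times inner loop over a fixed physical search window."""
    half_width = (radius_mm * num_voxels) // length_mm   # window half-width in voxels
    return num_voxels * (2 * half_width + 1)

def scaling_exponent(n_small, n_large):
    """Slope of log(operations) versus log(number of voxels)."""
    ratio = search_operations(n_large) / search_operations(n_small)
    return math.log(ratio) / math.log(n_large / n_small)

# Doubling the number of voxels roughly quadruples the operation count.
exponent = scaling_exponent(1000, 2000)
```

The exponent comes out slightly below 2 because of the constant term in the window size, which is in line with the measured N^1.90 trend on CPU, although the measured value also reflects effects that this count ignores.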

Figure 2. CPU and GPU computation time as functions of dose distribution resolution. Again, for convenience, the CPU computation time is scaled down by a factor of 60.0 so that it can be shown on the same axis as the GPU computation time.

3.5 The effect of DD and DTA criteria on computation time $T_{γ}^{C}$ and $T_{γ}^{G}$

In this study, we choose case H2, fix the resolution to 256 × 256 × 144, and then vary the DD and DTA criteria. Table 3 lists the computation times obtained with the different criteria. There are three interesting phenomena: 1) when we increase the DD criterion value and fix the DTA criterion value, the computation time decreases; 2) when we fix the DD criterion value and increase the DTA criterion value, the computation time increases; 3) when we increase both the DD criterion value and the DTA criterion value proportionally, for example, from 1%, 1 mm to 2%, 2 mm, or 3%, 3 mm, the computation time does not change. As mentioned in Section 3.3, the γ-index calculation time for each reference dose point follows t ∝ γⁿ. For phenomenon 1), when we increase the DD criterion value and fix the DTA criterion value, the γ-index value decreases. Consequently, the number of required search steps decreases, and so does the computation time. For phenomenon 2), when we fix the DD criterion value but increase the DTA criterion value by a factor of k, the γ-index values decrease by a factor of k′, with k′ < k. This decreases the computation time by a factor of (k′)ⁿ. However, when the DTA criterion value increases by a factor of k, the resolution in the normalized dose-distance space also increases by a factor of kⁿ (a given normalized distance now spans kⁿ times as many voxels), which increases the computation time by a factor of kⁿ. The net change of the computation time should be (k/k′)ⁿ. Since k′ < k, the overall computation time increases. For phenomenon 3), when we increase both the DTA and DD criteria values simultaneously by a factor of k, the γ-index values decrease by a factor of k′ = k. The rate of change of the computation time is then (k/k)ⁿ = 1, so the computation time does not change. From Table 3, we can also see that changing the DD and DTA criteria values has little effect on the speedup factor achieved with the GPU implementation, which stays between about 65× and 85×.
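Phenomenon 3) can be reproduced on a toy 1D profile: when both criteria are scaled by the same factor k, γ shrinks by k while the search radius measured in voxels, γ·Δd/Δx, stays the same. The Python sketch below evaluates Γ at grid points only, and its profiles and the helper `gamma_at` are hypothetical.

```python
import math

def gamma_at(index, ref_dose, eval_dose, spacing_mm, dd_crit, dta_mm):
    """Grid-point gamma (Eq. 1 without interpolation) at one reference point."""
    return min(
        math.hypot((index - j) * spacing_mm / dta_mm, (ref_dose[index] - d) / dd_crit)
        for j, d in enumerate(eval_dose)
    )

# Hypothetical profiles; doses are fractions of the prescription dose.
reference = [0.10, 0.50, 1.00, 0.50, 0.10]
evaluated = [0.10, 0.53, 1.00, 0.50, 0.10]
spacing_mm = 1.0

gammas, radii_in_voxels = [], []
for k in (1, 2, 3):                       # criteria of 1%/1 mm, 2%/2 mm, 3%/3 mm
    dd_crit, dta_mm = 0.01 * k, 1.0 * k
    g = gamma_at(1, reference, evaluated, spacing_mm, dd_crit, dta_mm)
    gammas.append(g)
    radii_in_voxels.append(g * dta_mm / spacing_mm)   # search radius in voxel units
```

Because the number of voxels that must be visited is set by the search radius in voxel units, an unchanged radius means an unchanged amount of work, which matches the constant computation time observed along the diagonal of Table 3.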

Table 3.

CPU and GPU computation time for various DD and DTA criteria values.

DD criterion	DTA criterion (mm)	$T_{γ}^{G}$ (sec)	$T_{γ}^{C}$ (sec)	Speedup factor $T_{γ}^{C} / T_{γ}^{G}$
1%	1	3.13	240.37	76.80
1%	2	11.33	863.33	76.20
1%	3	23.91	1545.09	64.62
2%	1	0.97	74.27	76.57
2%	2	3.13	265.82	84.93
2%	3	6.53	486.68	74.53
3%	1	0.55	40.09	72.89
3%	2	1.55	119.32	76.98
3%	3	3.13	245.34	78.38

4. Conclusions

In this paper, we implemented a modified γ-index algorithm on GPU. We evaluated our GPU implementation on eight pairs of IMRT dose distributions. Overall, when using a single Tesla C1060 GPU card for GPU computation and an Intel Xeon 2.27 GHz processor (one core) for CPU computation, our GPU implementation achieved about 45∼75× speedup compared to the CPU implementation and finished the γ-index calculation within a few seconds. We also studied the effects of various factors on the calculation time on both CPU and GPU for our implementation of the modified γ-index algorithm. We found that the pre-sorting procedure based on the dose difference speeds up the GPU calculation by about 2.7∼5.5 times. The CPU computation time is proportional to the summation of γⁿ over all voxels, where n is the number of dimensions of the dose distributions. The GPU computation time is approximately proportional to the summation of γⁿ over all voxels, but is also affected by the variation of γⁿ among different voxels. We also found that increasing the resolution of the dose distribution leads to a quadratic increase of computation time on CPU and a less-than-quadratic increase on GPU. We observed that both CPU and GPU computation times decrease when the DD criterion value is increased with the DTA criterion value fixed, increase when the DTA criterion value is increased with the DD criterion value fixed, and do not vary when both DD and DTA criterion values change proportionally. Both the CPU and GPU codes developed in this work for γ-index dose evaluation are in the public domain and available upon request.

Acknowledgments

This work is supported in part by the University of California Lab Fees Research Program, by Varian Medical Systems, Inc., and by an NIH/NCI grant 1F32 CA154045-01. We would like to thank NVIDIA for providing GPU cards for this project.

References

Bakai A, Alber M, Nusslin F. A revision of the gamma-evaluation concept for the comparison of dose distributions. Phys Med Biol. 2003;48:3543–53. doi: 10.1088/0031-9155/48/21/006. [DOI] [PubMed] [Google Scholar]
Chen ML, Lu WG, Chen Q, Ruchala K, Olivera G. Efficient gamma index calculation using fast Euclidean distance transform. Phys Med Biol. 2009;54:2037–47. doi: 10.1088/0031-9155/54/7/012. [DOI] [PubMed] [Google Scholar]
Depuydt T, Van Esch A, Huyskens DP. A quantitative evaluation of IMRT dose distributions: refinement and clinical assessment of the gamma evaluation. Radiotherapy and Oncology. 2002;62:309–19. doi: 10.1016/s0167-8140(01)00497-2. [DOI] [PubMed] [Google Scholar]
Gu XJ, Choi DJ, Men CH, Pan H, Majumdar A, Jiang SB. GPU-based ultra-fast dose calculation using a finite size pencil beam model. Phys Med Biol. 2009;54:6287–97. doi: 10.1088/0031-9155/54/20/017. [DOI] [PMC free article] [PubMed] [Google Scholar]
Gu XJ, Pan H, Liang Y, Castillo R, Yang DS, Choi DJ, Castillo E, Majumdar A, Guerrero T, Jiang SB. Implementation and evaluation of various demons deformable image registration algorithms on a GPU. Phys Med Biol. 2010;55:207–19. doi: 10.1088/0031-9155/55/1/012. [DOI] [PMC free article] [PubMed] [Google Scholar]
Hissoiny S, Ozell B, Despres P. Fast convolution-superposition dose calculation on graphics hardware. Medical Physics. 2009;36:1998–2005. doi: 10.1118/1.3120286. [DOI] [PubMed] [Google Scholar]
Hoberock J, Bell N. Thrust: A Parallel Template Library. 2010. [Google Scholar]
Jacques R, Taylor R, Wong J, McNutt T. Towards Real-Time Radiation Therapy: GPU Accelerated Superposition/Convolution. High-Perfornance MICCAI Workshop 2008 [Google Scholar]
Jia X, Gu XJ, Sempau J, Choi D, Majumdar A, Jiang SB. Development of a GPU-based Monte Carlo dose calculation code for coupled electron-photon transport. Phys Med Biol. 2010a;55:3077–86. doi: 10.1088/0031-9155/55/11/006. [DOI] [PubMed] [Google Scholar]
Jia X, Lou YF, Li RJ, Song WY, Jiang SB. GPU-based fast cone beam CT reconstruction from undersampled and noisy projection data via total variation. Medical Physics. 2010b;37:1757–60. doi: 10.1118/1.3371691. [DOI] [PubMed] [Google Scholar]
Jiang SB, Sharp GC, Neicu T, Berbeco RI, Flampouri S, Bortfeld T. On dose distribution comparison. Phys Med Biol. 2006;51:759–76. doi: 10.1088/0031-9155/51/4/001. [DOI] [PubMed] [Google Scholar]
Ju T, Simpson T, Deasy JO, Low DA. Geometric interpretation of the gamma dose distribution comparison technique: Interpolation-free calculation. Medical Physics. 2008;35:879–87. doi: 10.1118/1.2836952. [DOI] [PubMed] [Google Scholar]
Li G, Xie HC, Ning H, Capala J, Arora BC, Coleman CN, Camphausen K, Miller RW. A novel 3D volumetric voxel registration technique for volume-view-guided image registration of multiple imaging modalities. International Journal of Radiation Oncology Biology Physics. 2005;63:261–73. doi: 10.1016/j.ijrobp.2005.05.008. [DOI] [PubMed] [Google Scholar]
Li G, Xie HC, Ning H, Citrin D, Capala J, Maass-Moreno R, Guion P, Arora B, Coleman N, Camphausen K, Miller RW. Accuracy of 3D volumetric image registration based on CT, MR and PET/CT phantom experiments. Journal of Applied Clinical Medical Physics. 2008;9:17–36. doi: 10.1120/jacmp.v9i4.2781. [DOI] [PMC free article] [PubMed] [Google Scholar]
Low DA, Harms WB, Mutic S, Purdy JA. A technique for the quantitative evaluation of dose distributions. Medical Physics. 1998;25:656–61. doi: 10.1118/1.598248. [DOI] [PubMed] [Google Scholar]
Ma CM, Li JS, Pawlicki T, Jiang SB, Deng J, Lee MC, Koumrian T, Luxton M, Brain S. A Monte Carlo dose calculation tool for radiotherapy treatment planning. Phys Med Biol. 2002;47:1671. doi: 10.1088/0031-9155/47/10/305. [DOI] [PubMed] [Google Scholar]
Men CH, Gu XJ, Choi DJ, Majumdar A, Zheng ZY, Mueller K, Jiang SB. GPU-based ultrafast IMRT plan optimization. Phys Med Biol. 2009;54:6565–73. doi: 10.1088/0031-9155/54/21/008. [DOI] [PubMed] [Google Scholar]
Men CH, Jia X, Jiang SB. GPU-based ultra-fast direct aperture optimization for online adaptive radiation therapy. Phys Med Biol. 2010a;55:4309–19. doi: 10.1088/0031-9155/55/15/008. [DOI] [PubMed] [Google Scholar]
Men CH, Jia X, Jiang SB, Romeijn HE. Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT) Medical Physics. 2010b;37:5787–91. doi: 10.1118/1.3491675. [DOI] [PubMed] [Google Scholar]
NVIDIA. NVIDIA CUDA Compute Unified Device Architecture, Programming Guide version 3.2. ed NVIDIA. 2010. [Google Scholar]
Samant SS, Xia JY, Muyan-Ozcelilk P, Owens JD. High performance computing for deformable image registration: Towards a new paradigm in adaptive radiotherapy. Medical Physics. 2008;35:3546–53. doi: 10.1118/1.2948318. [DOI] [PubMed] [Google Scholar]
Sharp GC, Kandasamy N, Singh H, Folkert M. GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration. Phys Med Biol. 2007;52:5771–83. doi: 10.1088/0031-9155/52/19/003.
Spezi E, Lewis DG. Gamma histograms for radiotherapy plan evaluation. Radiotherapy and Oncology. 2006;79:224–30. doi: 10.1016/j.radonc.2006.03.020.
Stock M, Kroupa B, Georg D. Interpretation and evaluation of the gamma index and the gamma index angle for the verification of IMRT hybrid plans. Phys Med Biol. 2005;50:399–411. doi: 10.1088/0031-9155/50/3/001.
Teke T, Bergman AM, Kwa W, Gill B, Duzenli C, Popescu IA. Monte Carlo based, patient-specific RapidArc QA using Linac log files. Medical Physics. 2010;37:116–23. doi: 10.1118/1.3266821.
Wendling M, Zijp LJ, McDermott LN, Smit EJ, Sonke JJ, Mijnheer BJ, Van Herk M. A fast algorithm for gamma evaluation in 3D. Medical Physics. 2007;34:1647–54. doi: 10.1118/1.2721657.
Yan H, Ren L, Godfrey DJ, Yin FF. Accelerating reconstruction of reference digital tomosynthesis using graphics hardware. Medical Physics. 2007;34:3768–76. doi: 10.1118/1.2779945.
Yuan JK, Chen WM. A gamma dose distribution evaluation technique using the k-d tree for nearest neighbor searching. Medical Physics. 2010;37:4868–73. doi: 10.1118/1.3480964.
