. 2011 Oct 28;2(11):3207–3222. doi: 10.1364/BOE.2.003207

Table 2.

Comparison of Single-Threaded CPU and Parallelized GPU Reconstruction Times for the Digimouse Phantom with 174080 Elements

Task                                    CPU          GPU          Speed-up (CPU/GPU)

Assembly of the system matrices
  Compute element contributions
    Mesh level 1                        n.a.¹        0.04 ms
    Mesh level 2                        n.a.¹        0.10 ms
    Mesh level 3                        n.a.¹        0.66 ms
  Convert to CRS format
    Mesh level 1                        n.a.¹        0.72 ms
    Mesh level 2                        n.a.¹        2.06 ms
    Mesh level 3                        n.a.¹        13.17 ms
  Total
    Mesh level 1                        0.63 ms      0.76 ms      0.83
    Mesh level 2                        6.37 ms      2.16 ms      2.95
    Mesh level 3                        121.97 ms    13.83 ms     8.82

Solution of the linear systems
  PBCG without multigrid
    Forward solution (Vx, Vm)           10.12 s      736.64 ms    13.73
    Adjoint solution (Wx, Wm)           8.84 s       633.92 ms    13.94
  PBCG with multigrid
    Forward solution (Vx, Vm)           5.45 s       461.60 ms    11.81
    Adjoint solution (Wx, Wm)           6.02 s       508.29 ms    11.85

Computation of measurements
  DVm                                   496.40 ms    6.53 ms      76.01

Assembly of the sensitivity matrix
  assembleSensitivity                   16.68 s      1.03 s       16.23

Solution of the Gauß-Newton system
  $(S_k^* S_k + \alpha_k I)^{-1}$       10.87 s      331.92 ms    32.75

Total reconstruction time
  Without multigrid                     6.39 min     27.39 s      14.00
  With multigrid                        5.41 min     24.94 s      13.01
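
¹ On the CPU, the system matrices are assembled in one step. Separate timings for computing the element contributions and for converting to CRS format are therefore not available.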
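The speed-up column is the ratio of CPU time to GPU time, with both times expressed in the same unit. The minimal Python sketch below recomputes a few entries from the tabulated values. Because the tabulated times are themselves rounded, a recomputed ratio can differ from the table in the last digit (for example, the forward PBCG row gives about 13.74 rather than 13.73).

```python
from __future__ import annotations

# Conversion factors from the units used in the table to seconds.
TO_SECONDS = {"ms": 1e-3, "s": 1.0, "min": 60.0}


def to_seconds(value: float, unit: str) -> float:
    """Convert a time given in ms, s or min to seconds."""
    return value * TO_SECONDS[unit]


def speedup(cpu: tuple[float, str], gpu: tuple[float, str]) -> float:
    """Speed-up = CPU time / GPU time, computed in a common unit."""
    return to_seconds(*cpu) / to_seconds(*gpu)


# A few rows copied from Table 2 as (CPU time, GPU time) pairs.
rows = {
    "Assembly, mesh level 3": ((121.97, "ms"), (13.83, "ms")),
    "PBCG forward, no multigrid": ((10.12, "s"), (736.64, "ms")),
    "Gauss-Newton system": ((10.87, "s"), (331.92, "ms")),
    "Total, without multigrid": ((6.39, "min"), (27.39, "s")),
}

for task, (cpu, gpu) in rows.items():
    print(f"{task}: {speedup(cpu, gpu):.2f}")
```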
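On the GPU, matrix assembly is split into two steps: computing element contributions and converting them to CRS (compressed row storage) format. The table does not describe how either step is implemented. The sketch below is a generic, serial illustration of what the conversion computes, not the authors' GPU code. It assumes the element contributions arrive as (row, col, value) triplets, where entries shared by neighbouring elements appear more than once and must be summed. The function name `coo_to_crs` is hypothetical.

```python
from collections import defaultdict


def coo_to_crs(n_rows, triplets):
    """Convert (row, col, value) triplets to CRS arrays, summing duplicates.

    Returns (row_ptr, col_idx, values): the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], located in the columns
    col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    # Accumulate duplicate (row, col) contributions, one dict per row.
    rows = [defaultdict(float) for _ in range(n_rows)]
    for r, c, v in triplets:
        rows[r][c] += v

    row_ptr = [0]
    col_idx = []
    values = []
    for row in rows:
        for c in sorted(row):  # keep column indices ascending within a row
            col_idx.append(c)
            values.append(row[c])
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


# Two 1D linear elements sharing node 1, each with local matrix [[1, -1], [-1, 1]].
triplets = []
for a, b in [(0, 1), (1, 2)]:
    triplets += [(a, a, 1.0), (a, b, -1.0), (b, a, -1.0), (b, b, 1.0)]

row_ptr, col_idx, values = coo_to_crs(3, triplets)
print(row_ptr)  # [0, 2, 5, 7]
print(col_idx)  # [0, 1, 0, 1, 2, 1, 2]
print(values)   # [1.0, -1.0, -1.0, 2.0, -1.0, -1.0, 1.0]
```

A parallel GPU version would typically replace the per-row dictionaries with a parallel sort by (row, col) followed by a segmented reduction. The result is the same three CRS arrays.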
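The table does not spell out PBCG. The sketch below assumes it denotes a preconditioned biconjugate gradient method and implements that technique for a small, dense, real, nonsymmetric system in pure Python. The Jacobi (diagonal) preconditioner, the tolerance, the function name `pbicg` and the test matrix are all illustrative assumptions. The multigrid variant in the table is not modelled.

```python
def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matvec_t(A, x):
    """Compute A^T x for a dense matrix stored as a list of rows."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pbicg(A, b, tol=1e-10, max_iter=100):
    """Jacobi-preconditioned BiCG (hypothetical helper) for real nonsymmetric A."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # M^{-1} = M^{-T} for diagonal M
    x = [0.0] * n
    r = b[:]      # residual b - A x for the initial guess x = 0
    r_t = r[:]    # shadow residual, initialised to r
    p = p_t = None
    rho_old = None
    b_norm = dot(b, b) ** 0.5
    for _ in range(max_iter):
        z = [d * ri for d, ri in zip(inv_diag, r)]
        z_t = [d * ri for d, ri in zip(inv_diag, r_t)]
        rho = dot(z, r_t)
        if rho == 0.0:
            raise ArithmeticError("BiCG breakdown (rho = 0)")
        if p is None:
            p, p_t = z[:], z_t[:]
        else:
            beta = rho / rho_old
            p = [zi + beta * pi for zi, pi in zip(z, p)]
            p_t = [zi + beta * pi for zi, pi in zip(z_t, p_t)]
        q = matvec(A, p)        # A p
        q_t = matvec_t(A, p_t)  # A^T p~
        alpha = rho / dot(p_t, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        r_t = [ri - alpha * qi for ri, qi in zip(r_t, q_t)]
        rho_old = rho
        if dot(r, r) ** 0.5 <= tol * b_norm:  # relative residual test
            break
    return x


A = [[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]]
b = [1.0, 2.0, 3.0]
print([round(v, 6) for v in pbicg(A, b)])  # [0.22, 0.12, 0.96]
```

Each iteration needs one product with $A$ and one with $A^T$. In a GPU implementation, these two sparse matrix-vector products are typically the dominant cost per iteration.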
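The operator $(S_k^* S_k + \alpha_k I)^{-1}$ in the Gauß-Newton row matches one common form of a Tikhonov-regularized (Levenberg-Marquardt-type) Gauß-Newton step. As a sketch under assumptions, let $F$ be the forward operator, $y$ the measured data and $x_k$ the current iterate (none of these symbols is defined in the table), and let $S_k$ be the sensitivity matrix at $x_k$. The update then reads:

$$
\delta x_k = \left(S_k^* S_k + \alpha_k I\right)^{-1} S_k^* \bigl(y - F(x_k)\bigr),
\qquad
x_{k+1} = x_k + \delta x_k .
$$

In practice the inverse is not formed explicitly. Instead, the linear system $(S_k^* S_k + \alpha_k I)\,\delta x_k = S_k^* (y - F(x_k))$ is solved.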