2011 Jun 1;2(7):1781–1793. doi: 10.1364/BOE.2.001781

Table 1. CUDA implementation versus C-based sequential implementation.

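| Image size | Frames | CPU/GPU | Phase extraction (ms) | Residue identification (ms) | Branch cut placement (ms) | Unwrap (ms) | Total (ms) |
|---|---|---|---|---|---|---|---|
| 1024 × 1024 | 1 | CPU | 317.42 | 43.42 | 6.74 | 89.32 | 460.7 |
| 1024 × 1024 | 1 | GPU | 5.05 | 0.58 | 1.125 | 10.014 | 24.55 |
| 1024 × 1024 | 1 | Speedup factor | 62.86 | 74.19 | 5.99 | 8.92 | 18.77 |
| 1024 × 1024 | 10 | CPU | 3174.2 | 434.2 | 67.4 | 893.2 | 4607.4 |
| 1024 × 1024 | 10 | GPU | 40.486 | 5.55 | 1.128 | 45.285 | 111.1 |
| 1024 × 1024 | 10 | Speedup factor | 78.4 | 78.19 | 59.71 | 19.72 | 41.47 |
| 512 × 512 | 1 | CPU | 71 | 11 | 5 | 16 | 105 |
| 512 × 512 | 1 | GPU | 2.18 | 0.2 | 0.02 | 1.87 | 8 |
| 512 × 512 | 1 | Speedup factor | 32.61 | 55.84 | 250 | 8.55 | 13.13 |
| 512 × 512 | 10 | CPU | 710 | 110 | 50 | 160 | 1050 |
| 512 × 512 | 10 | GPU | 11.57 | 1.4 | 0.02 | 6.722 | 26 |
| 512 × 512 | 10 | Speedup factor | 61.37 | 78.57 | 2500 | 23.8 | 40.38 |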
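
The speedup factors in Table 1 appear to be the ratio of CPU time to GPU time for each stage. The following minimal Python sketch recomputes them for the 1024 × 1024, single-frame case under that assumption. The names `STAGES`, `cpu_ms`, `gpu_ms`, and `speedup` are illustrative and do not come from the paper, and because the published timings are rounded, a recomputed ratio can differ slightly from the printed one (for example, in the residue identification column).

```python
# Timings in milliseconds, copied from Table 1 (1024 x 1024, 1 frame).
STAGES = [
    "phase extraction",
    "residue identification",
    "branch cut placement",
    "unwrap",
    "total",
]
cpu_ms = [317.42, 43.42, 6.74, 89.32, 460.7]
gpu_ms = [5.05, 0.58, 1.125, 10.014, 24.55]


def speedup(cpu_time_ms: float, gpu_time_ms: float) -> float:
    """Assumed definition: speedup factor = CPU time / GPU time."""
    if gpu_time_ms <= 0:
        raise ValueError("GPU time must be positive")
    return cpu_time_ms / gpu_time_ms


# Map each stage name to its recomputed speedup factor.
speedups = {
    stage: speedup(cpu, gpu)
    for stage, cpu, gpu in zip(STAGES, cpu_ms, gpu_ms)
}

for stage, factor in speedups.items():
    print(f"{stage:>24}: {factor:.2f}x")
```

The same function can be applied to the other three row groups by substituting their CPU and GPU timings.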