Fig. 3.
Benchmark of the execution times of all CUDA kernels for a B-scan with a size of 2048 × 1000. H2D: host memory to device memory transfer. D2H: device memory to host memory transfer. InputCast: data casting from the 16-bit unsigned integer type to the 32-bit floating-point type. OutputCast: data casting from the 32-bit floating-point type to the 8-bit unsigned integer type. DC Removal: DC background removal of an OCT B-scan. FFT: fast Fourier transform using a cuFFT handle. Modulus: calculation of the magnitude of the FFT results. Flipping: B-scan transposition using a cuBLAS handle. CubicInterp: cubic interpolation for resampling. Error bars show the standard deviation of each kernel's execution time.