Fig. 4.
Speed increase for cubic spline interpolation and Loupas' algorithm as a function of a) the number of track pulses (assuming 52 push locations) and b) the number of push locations (assuming 80 track locations). As in Fig. 3b, the speed increase for interpolation plateaus at approximately 41× for the CUDA code relative to the C++ code. Similarly, the speed increase for Loupas' algorithm plateaus at 27×. Additionally, 37% of the computation time associated with Loupas' algorithm is devoted to copying the displacement estimates from GPU memory to CPU memory.