eLife. 2016 Nov 15;5:e18722. doi: 10.7554/eLife.18722

Appendix 1—figure 1. Computational flow in difference calculation kernel.

The kernel is launched with ceil(P/P0) thread blocks of N threads each, where P is the total number of projections and P0 is the number of reference slices handled by one thread block. The workflow of a thread block in each iteration i is divided into two stages. In stage A, N pixels from each of the P0 reference slices are fetched through texture memory, interpolated, and stored in shared memory. This data is then reused exhaustively in stage B, where groups of threads compute the differences to the corresponding components of the translated images. Individual threads within a group work on different image components, n, of each reference slice, p. Collectively, the threads cover the N components of each reference slice, for a total of N×P0 components per iteration i. The partial results are accumulated in shared memory through atomic reduction operations. All image components are covered as i goes from 1 to ceil(C/N), where C is the total number of Fourier components. Before the kernel exits, a reduced sum of differences for each pair of orientation and translation is written to global memory.
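The loop structure described in this caption can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the authors' kernel: the function name `difference_kernel_sketch` and its arguments are hypothetical, the difference metric is assumed to be a sum of squared absolute differences, and texture fetching, interpolation, and parallel atomic reductions are replaced by plain sequential Python.

```python
import math


def difference_kernel_sketch(refs, imgs, P0, N):
    """Sequential emulation of the two-stage tiling (hypothetical helper).

    refs: P reference slices, each a list of C components.
    imgs: translated images, each a list of C components.
    Returns out[p][t], the summed squared difference (assumed metric)
    between reference slice p and translated image t.
    """
    P, T, C = len(refs), len(imgs), len(refs[0])
    n_blocks = math.ceil(P / P0)  # one "thread block" per group of P0 slices
    n_iters = math.ceil(C / N)    # iterations i needed to cover all C components
    out = [[0.0] * T for _ in range(P)]

    for block in range(n_blocks):
        p_begin = block * P0
        p_end = min(p_begin + P0, P)
        # Per-block accumulator, standing in for the shared-memory reduction.
        acc = [[0.0] * T for _ in range(p_end - p_begin)]

        for i in range(n_iters):
            c_begin = i * N
            c_end = min(c_begin + N, C)

            # Stage A: stage up to N components of each of the block's slices
            # (the real kernel fetches and interpolates via texture memory).
            shared = [refs[p][c_begin:c_end] for p in range(p_begin, p_end)]

            # Stage B: reuse the staged tile for every translated image.
            for t in range(T):
                for local_p, tile in enumerate(shared):
                    for n, ref_val in enumerate(tile):
                        d = ref_val - imgs[t][c_begin + n]
                        acc[local_p][t] += abs(d) ** 2

        # Write the reduced sums back to the "global" result.
        for local_p in range(p_end - p_begin):
            out[p_begin + local_p] = acc[local_p]
    return out
```

The point of the tiling is that each staged tile of reference data is read once in stage A and then compared against every translation in stage B, which is the reuse the caption refers to.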

DOI: http://dx.doi.org/10.7554/eLife.18722.012