Given NCPU available processors and Nmidpt mid-points, we assume Nmidpt ≫ NCPU.
-
Step 1
All atoms are sorted by their z-coordinates, either with the regular (sequential) Quicksort method on the master processor or with a parallel Quicksort method across all processors.
-
Step 2
Many mid-points may share the same z-coordinate. Count the number of distinct z-coordinates among the mid-points (say p in total) and store the number of mid-points at each distinct z-coordinate in an array named counts, for example counts = {$k_1, k_2, \ldots, k_p$}, such that
$$\sum_{i=1}^{p} k_i = N_{\text{midpt}}.$$
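The counting in Step 2 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `count_z_levels` is hypothetical, and it assumes the z-coordinates are already sorted (Step 1) and are grid values that can be compared exactly.

```python
from itertools import groupby


def count_z_levels(z_sorted):
    """Return (levels, counts) for an already-sorted list of z-coordinates.

    levels[i] is the i-th distinct z-coordinate and counts[i] is the
    number of mid-points that share it (k_i in the text).
    """
    levels, counts = [], []
    # groupby merges runs of equal neighbours, so sorted input is required
    for z, group in groupby(z_sorted):
        levels.append(z)
        counts.append(sum(1 for _ in group))
    return levels, counts


# Toy data (made up for illustration): 8 mid-points on 4 distinct z-levels
z_sorted = [0, 0, 1, 1, 1, 3, 4, 4]
levels, counts = count_z_levels(z_sorted)

# The counts must add up to the total number of mid-points
assert sum(counts) == len(z_sorted)
```

For floating-point coordinates that are only approximately equal, the values would need to be rounded or binned first; that case is not covered by this sketch.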
-
Step 3
To balance the workload, each processor should receive as nearly equal a number of mid-points as possible. To this end, first find the smallest number $q_1$ satisfying
$$\sum_{i=1}^{q_1} k_i \ge \frac{N_{\text{midpt}}}{N_{\text{CPU}}}.$$
Then all mid-points whose z-coordinates belong to the subset {$k_1, k_2, \ldots, k_{q_1}, k_{q_1+1}$} are given to the 1st processor for the subsequent parallel computation (here $k_i$ also labels the i-th distinct z-coordinate). Notice that all mid-points with z-coordinates equal to $k_{q_1+1}$ serve as the “extended right boundary grids”, especially in the case of $k_{q_1+1} = k_{q_1} + 1$. Results are computed on them because they may affect the calculations on the remaining points. However, these results are not collected by the master processor for the final assembly.
Similarly, starting from $k_{q_1+1}$, find the smallest number $q_2$ satisfying
$$\sum_{i=q_1+1}^{q_2} k_i \ge \frac{N_{\text{midpt}}}{N_{\text{CPU}}}.$$
Then all mid-points whose z-coordinates belong to the subset {$k_{q_1}, k_{q_1+1}, \ldots, k_{q_2}, k_{q_2+1}$} are given to the 2nd processor for the subsequent parallel computation. All mid-points with z-coordinates equal to $k_{q_1}$ or $k_{q_2+1}$ serve as the “extended left/right boundary grids”, respectively, and the results obtained on them are not collected.
Repeat the same procedure until every mid-point has been assigned to a processor. Some processors may be left over without any mid-points. In that case, these processors are marked “idle” and take no part in the parallel computation of the next step.
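The greedy assignment of Step 3 might look as follows. This is a minimal sketch under stated assumptions: the target load per processor is taken to be the average Nmidpt/NCPU, every entry of `counts` is positive, and the names `partition_levels` and `with_boundary_levels` are hypothetical. Blocks are given as 0-based, inclusive index ranges into the array of distinct z-levels.

```python
def partition_levels(counts, n_cpu):
    """Split the z-levels into contiguous blocks, one per busy processor.

    A block is closed as soon as its number of mid-points reaches the
    average load total / n_cpu (assumed threshold). The comparison is
    done in integers to avoid floating-point rounding.
    """
    total = sum(counts)
    blocks, start, running = [], 0, 0
    for i, k in enumerate(counts):
        running += k
        if running * n_cpu >= total:
            blocks.append((start, i))
            start, running = i + 1, 0
    if start < len(counts):
        # leftover levels that never reached the threshold
        blocks.append((start, len(counts) - 1))
    return blocks


def with_boundary_levels(blocks, n_levels):
    """Add one extended boundary level on each side where one exists."""
    return [(max(first - 1, 0), min(last + 1, n_levels - 1))
            for first, last in blocks]


# Toy data (made up): 4 distinct z-levels, 8 mid-points, 3 processors
counts = [2, 3, 1, 2]
n_cpu = 3
owned = partition_levels(counts, n_cpu)
extended = with_boundary_levels(owned, len(counts))
n_idle = n_cpu - len(owned)  # processors left without any mid-points
```

In this toy case the first processor owns levels 0 to 1 and also sees level 2 as its extended right boundary, the second owns levels 2 to 3 and sees level 1 as its extended left boundary, and the third processor stays idle.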
-
Step 4
Each non-idle processor independently performs the surface-construction calculations on its own mid-points. The results are then sent back to the master processor for the final assembly.
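The scatter and collect pattern of Step 4 can be illustrated with the sketch below. It is not the authors' code: a thread pool from the Python standard library stands in for the separate processors, the function names are hypothetical, and the per-level "calculation" is a placeholder (the number of mid-points on a level plus its neighbouring levels) chosen only because it depends on the extended boundary levels.

```python
from concurrent.futures import ThreadPoolExecutor


def work_on_block(counts, owned, extended):
    """Hypothetical job of one processor.

    The worker sees its owned levels plus the extended boundary levels,
    but returns results only for the levels it owns.
    """
    first, last = owned
    lo, hi = extended
    local = counts[lo:hi + 1]  # owned levels + boundary levels
    results = {}
    for i in range(first, last + 1):
        j = i - lo  # position of level i inside the local slice
        # placeholder computation that needs the neighbouring levels
        results[i] = sum(local[max(j - 1, 0):j + 2])
    return results


def run_all(counts, owned_blocks, extended_blocks):
    """Run every block concurrently, then assemble on the 'master'."""
    n = len(owned_blocks)
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(work_on_block, [counts] * n,
                                 owned_blocks, extended_blocks))
    assembled = {}
    for part in partials:
        assembled.update(part)  # boundary-level results were never returned
    return [assembled[i] for i in range(len(counts))]


# Toy data (made up): two busy processors with one boundary level each
counts = [2, 3, 1, 2]
result = run_all(counts, [(0, 1), (2, 3)], [(0, 2), (1, 3)])
```

Because each worker also sees its boundary levels, the assembled result matches what a serial pass over all levels would give, even though no worker returns values for levels it does not own.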