2022 May 31;7(3):e00028-22. doi: 10.1128/msystems.00028-22

TABLE 2.

Speedups for unweighted UniFrac on the 113k data set across several architectures^a

| Platform | RAM (GB) | Runtime (h) | Speedup | GPU speedup | No. of chunks |
|---|---|---|---|---|---|
| Original CPU Xeon Gold 6242 | 5.5 | 498 | | | 36 |
| CPU Mobile i7-8850H | Not collected | 10 | 50× | | 12 |
| CPU Xeon Gold 6242 | 148 | 3 | 166× | | 1 |
| GPU Mobile GTX 1050 Max-Q | 3.6 | 3 | 166× | | 36 |
| GPU T4 | 38 | 0.68 | 730× | 4.4× | 4 |
| GPU RTX2080TI | 27 | 0.32 | 1,560× | 9.4× | 6 |
| GPU V100 PCIE 32GB | 75 | 0.22 | 2,260× | 13.6× | 2 |
| GPU RTX3090 | 51 | 0.19 | 2,600× | 15.8× | 3 |
^a Speedup is relative to the performance of Striped UniFrac from McDonald et al. (10) on the same data. In all cases, all available compute resources for an architecture were used. Peak resident memory is reported for each run; however, the maximum memory used for processing depends on how many chunks are processed at one time. The largest memory use comes from creating the distance matrix, which scales as N^2 in the number of samples N (not shown) and is effectively invariant to the architecture.
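The arithmetic behind the speedup columns and the quadratic memory term can be checked with a short sketch. It assumes that "Speedup" is the 498 h baseline runtime divided by each platform's runtime, and that "GPU speedup" uses the 3 h Xeon Gold 6242 run as its reference; the second point is an inference from the numbers, not something the table states. The names `speedup`, `dense_matrix_bytes`, `BASELINE_HOURS`, and `CPU_REF_HOURS` are hypothetical, and the 8-byte element size is an assumption.

```python
# Runtimes in hours, copied from Table 2.
BASELINE_HOURS = 498.0  # original Striped UniFrac on the Xeon Gold 6242
CPU_REF_HOURS = 3.0     # ASSUMED reference for the "GPU speedup" column

RUNTIME_HOURS = {
    "CPU Mobile i7-8850H": 10.0,
    "CPU Xeon Gold 6242": 3.0,
    "GPU Mobile GTX 1050 Max-Q": 3.0,
    "GPU T4": 0.68,
    "GPU RTX2080TI": 0.32,
    "GPU V100 PCIE 32GB": 0.22,
    "GPU RTX3090": 0.19,
}


def speedup(reference_hours: float, runtime_hours: float) -> float:
    """Return how many times faster a run is than the reference run."""
    if runtime_hours <= 0:
        raise ValueError("runtime must be positive")
    return reference_hours / runtime_hours


def dense_matrix_bytes(n_samples: int, bytes_per_value: int = 8) -> int:
    """Bytes for a full N x N distance matrix; grows quadratically in N.

    bytes_per_value=8 (double precision) is an assumption, not a value
    taken from the table.
    """
    return n_samples * n_samples * bytes_per_value


if __name__ == "__main__":
    for platform, hours in RUNTIME_HOURS.items():
        overall = speedup(BASELINE_HOURS, hours)
        vs_cpu = speedup(CPU_REF_HOURS, hours)
        print(f"{platform:<28} {overall:8.0f}x overall  {vs_cpu:5.1f}x vs. 3 h CPU run")
```

Under these assumptions the recomputed ratios agree with the reported columns to within roughly 1%, a gap consistent with the runtimes themselves being rounded. Doubling `n_samples` in `dense_matrix_bytes` quadruples the result, which is why the distance matrix dominates memory regardless of architecture.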