2022 May 31;7(3):e00028-22. doi: 10.1128/msystems.00028-22

TABLE 2.

Speedups for unweighted UniFrac on the 113k data set across several architectures^a

Platform	RAM (GB)	Runtime (h)	Speedup	GPU speedup	No. of chunks
Original CPU Xeon Gold 6242	5.5	498	1×		36
CPU Mobile i7-8850H	Not collected	10	50×		12
CPU Xeon Gold 6242	148	3	166×	1×	1
GPU Mobile GTX 1050 Max-Q	3.6	3	166×	1×	36
GPU T4	38	0.68	730×	4.4×	4
GPU RTX2080TI	27	0.32	1,560×	9.4×	6
GPU V100 PCIE 32GB	75	0.22	2,260×	13.6×	2
GPU RTX3090	51	0.19	2,600×	15.8×	3

^a Speedup is relative to the runtime of Striped UniFrac from McDonald et al. (10) on the same data. In all cases, all available compute resources for an architecture were utilized. Peak resident memory is reported for each run; however, the maximum memory used for processing depends on how many chunks are processed at one time. The largest memory use comes from creating the distance matrix (not shown), which scales as N² in the number of samples N and is effectively invariant to the architecture.
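The arithmetic behind the two speedup columns and the N² memory note can be reproduced with a minimal Python sketch. The runtimes are copied from the table. Several things here are assumptions and are not stated in the source: that the "GPU speedup" column uses the 3 h optimized CPU run as its baseline (inferred from its 1× entry), that N is about 113,000 (read off the "113k" label), and that the matrix is stored densely at 8 or 4 bytes per value. The function names `speedup` and `dense_matrix_gb` are hypothetical helpers, not part of any UniFrac package.

```python
# Runtimes in hours, copied from Table 2.
runtimes_h = {
    "Original CPU Xeon Gold 6242": 498,
    "CPU Mobile i7-8850H": 10,
    "CPU Xeon Gold 6242": 3,
    "GPU Mobile GTX 1050 Max-Q": 3,
    "GPU T4": 0.68,
    "GPU RTX2080TI": 0.32,
    "GPU V100 PCIE 32GB": 0.22,
    "GPU RTX3090": 0.19,
}


def speedup(baseline_h, runtime_h):
    """Ratio of baseline runtime to the runtime being compared."""
    return baseline_h / runtime_h


def dense_matrix_gb(n_samples, bytes_per_value=8):
    """Size of a dense N x N matrix in GB (1 GB = 1e9 bytes).

    Assumption: full square storage; a condensed (upper-triangle)
    layout would need roughly half of this.
    """
    return n_samples ** 2 * bytes_per_value / 1e9


# "Speedup" column: relative to the original Striped UniFrac run.
baseline = runtimes_h["Original CPU Xeon Gold 6242"]
speedups = {name: speedup(baseline, h) for name, h in runtimes_h.items()}

# "GPU speedup" column: assumed to be relative to the optimized CPU run.
gpu_baseline = runtimes_h["CPU Xeon Gold 6242"]
gpu_speedups = {
    name: speedup(gpu_baseline, h)
    for name, h in runtimes_h.items()
    if name.startswith("GPU")
}

if __name__ == "__main__":
    for name, s in speedups.items():
        g = gpu_speedups.get(name)
        gpu_text = f"{g:.1f}x" if g is not None else "-"
        print(f"{name:30s} {s:8.0f}x  {gpu_text}")

    n = 113_000  # assumed sample count, taken from the "113k" label
    print(f"dense float64 matrix: {dense_matrix_gb(n, 8):.1f} GB")
    print(f"dense float32 matrix: {dense_matrix_gb(n, 4):.1f} GB")
```

Because the published runtimes are rounded, the computed ratios agree with the table's speedups only to about two or three significant figures. The memory estimate illustrates why the distance matrix dominates regardless of platform: its size depends only on N, not on the number of chunks or the device.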