2022 Jul 5;13:3863. doi: 10.1038/s41467-022-31502-1

Table 1.

Main characteristics of the datasets used to construct the WIS, UNITN, and UHGG reference sets.

	N	WIS	UNITN	UHGG
Samples	Samples	51,052	9428	Did not generate new assemblies from samples
	Body sites	1: Gut (100%)	5: Gut (85%), oral (8.5%), skin (5.4%), vagina (1%), and maternal milk (0.1%)	NA
	Countries	2: Israel (90%) and USA (10%)	31: USA (15%), China (14%), Israel (10%), Sweden (6%), and Denmark (6%)*	NA
	Age	Adults (99%) and children (<1%)	Adults (81%) and children (19%)	NA
	Gender	Female (61%) and male (39%)	Not specified	NA
Assemblies	Assemblies from samples before filtration criteria	483,192	345,654	0
	Assemblies from samples after filtration criteria^#	142,912 (30%)	154,723 (45%)	0 (0%)
	External assemblies after filtration criteria^#	98,206 (88% from Pasolli et al.⁷)	80,990	286,997 (48% from Pasolli et al.⁷)
	Total assemblies used	241,118 (36% from Pasolli et al.⁷)	154,723	286,997 (48% from Pasolli et al.⁷)
Clusters	Species	3594	4930	4644
	Genera	2365	2640	Not specified
	Families	627	778	Not specified

*Five most represented countries.

^#Each set of assemblies went through different filtration criteria (“Methods”).
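
The percentages in the "Assemblies" rows can be reproduced from the counts in the table. The following is a minimal sketch of that arithmetic, not part of the original analysis; the helper name `retained_percent` and the dictionary layout are illustrative assumptions, and only the numbers are taken from Table 1.

```python
# Counts copied from Table 1 (assemblies generated from samples).
# The variable and function names are hypothetical, chosen for illustration.
before_filtration = {"WIS": 483_192, "UNITN": 345_654}
after_filtration = {"WIS": 142_912, "UNITN": 154_723}

# External assemblies added to the WIS set after filtration (Table 1).
wis_external = 98_206


def retained_percent(kept: int, total: int) -> int:
    """Share of assemblies that pass filtration, rounded to a whole percent."""
    return round(100 * kept / total)


for name, total in before_filtration.items():
    kept = after_filtration[name]
    print(f"{name}: {kept:,} of {total:,} retained ({retained_percent(kept, total)}%)")

# The WIS total is the sum of its filtered sample-derived and external assemblies.
wis_total = after_filtration["WIS"] + wis_external
print(f"WIS total assemblies used: {wis_total:,}")
```

Under these assumptions the retained shares round to 30% (WIS) and 45% (UNITN), and the WIS total comes to 241,118, matching the values reported in the table.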