2022 Feb 15;12:2517. doi: 10.1038/s41598-022-06547-3

Table 1.

Datasets collected and purpose.

	Dataset name	Purpose	# of variants
Model building	Clinvitae training	Training	8,496
Model building	Clinvitae probability threshold tuning (PTT)	Tuning the probability threshold for classification	4,247
Model validation	Clinvitae test	Comparison between different ML methods and the pathogenicity score in ref. 19	1,415
Model validation	Clinvitae Validation	Testing classification by the selected ML method, in comparison with the pathogenicity score and the Bayesian score	161,744
Model validation	ICR639	Testing classification and prioritization by the selected ML method on a real dataset, in comparison with the pathogenicity score, the Bayesian score, CADD and VVP	18,046