2020 Dec 15;22(12):e20756. doi: 10.2196/20756

Table 4.

Features of data sets used for development and validation of artificial intelligence models.

Features		Studies (N=82), n
Data sources^a
	Public databases	52
	Clinical settings	24
	Government sources	9
	Literature	6
	News websites	2
	Participants	1
Data types^b
	Radiology image	35
	Biological data	23
	Epidemiological data	15
	Clinical data	11
	Laboratory data	8
	Demographic data	5
	Guidelines	1
	News articles	1
Data set size^c
	<1000	26
	1000-9999	16
	≥10,000	8
Type of validation^d,e
	Train-test split	25
	K-fold cross-validation	18
	External validation	11
Proportion of training set (%)^f
	≤25	3
	26-50	2
	51-75	16
	>75	28
Proportion of validation set (%)^g
	≤25	8
	26-50	3
	51-75	0
	>75	0
Proportion of test set (%)^h
	≤25	35
	26-50	10
	51-75	3
	>75	1

^aNumbers do not add up to 82 because several studies collected their data from more than one data source.

^bNumbers do not add up to 82 because several studies collected more than one type of data.

^cData set size was reported in 50 studies.

^dType of validation was reported in 53 studies.

^eNumbers do not add up to 53 because 1 study used two different types of validation.

^fProportion of the training set was reported in 49 studies.

^gProportion of the validation set was reported in 11 studies.

^hProportion of the test set was reported in 49 studies.