2020 Dec 15;22(12):e20756. doi: 10.2196/20756

Table 4.

Features of data sets used for development and validation of artificial intelligence models.

Features                                  Studies (N=82), n

Data sources [a]
    Public databases                      52
    Clinical settings                     24
    Government sources                    9
    Literature                            6
    News websites                         2
    Participants                          1

Data types [b]
    Radiology image                       35
    Biological data                       23
    Epidemiological data                  15
    Clinical data                         11
    Laboratory data                       8
    Demographic data                      5
    Guidelines                            1
    News articles                         1

Data set size [c]
    <1000                                 26
    1000-9999                             16
    ≥10,000                               8

Type of validation [d,e]
    Train-test split                      25
    K-fold cross-validation               18
    External validation                   11

Proportion of training set (%) [f]
    ≤25                                   3
    26-50                                 2
    51-75                                 16
    >75                                   28

Proportion of validation set (%) [g]
    ≤25                                   8
    26-50                                 3
    51-75                                 0
    >75                                   0

Proportion of test set (%) [h]
    ≤25                                   35
    26-50                                 10
    51-75                                 3
    >75                                   1

[a] Counts sum to more than 82 because several studies collected their data from more than one data source.
[b] Counts sum to more than 82 because several studies collected more than one type of data.
[c] Data set size was reported in 50 studies.
[d] Type of validation was reported in 53 studies.
[e] Counts sum to 54 rather than 53 because 1 study used two different types of validation.
[f] Proportion of the training set was reported in 49 studies.
[g] Proportion of the validation set was reported in 11 studies.
[h] Proportion of the test set was reported in 49 studies.
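
The footnotes can be checked against the counts in the table with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the names `table`, `reported`, and `sums` are assumptions of this example, not part of the original article. For data sources and data types the table states no reporting denominator, so the sketch assumes all 82 studies as the baseline.

```python
# Counts transcribed from Table 4; the layout and names are assumptions of this sketch.
TOTAL_STUDIES = 82

table = {
    "Data sources": [52, 24, 9, 6, 2, 1],
    "Data types": [35, 23, 15, 11, 8, 5, 1, 1],
    "Data set size": [26, 16, 8],
    "Type of validation": [25, 18, 11],
    "Proportion of training set": [3, 2, 16, 28],
    "Proportion of validation set": [8, 3, 0, 0],
    "Proportion of test set": [35, 10, 3, 1],
}

# Number of studies reporting each feature, taken from footnotes c, d, f, g, and h.
reported = {
    "Data set size": 50,
    "Type of validation": 53,
    "Proportion of training set": 49,
    "Proportion of validation set": 11,
    "Proportion of test set": 49,
}

# Sum the per-category counts for every feature.
sums = {feature: sum(counts) for feature, counts in table.items()}

for feature, total in sums.items():
    # Assumption: features without a footnoted denominator use all 82 studies.
    base = reported.get(feature, TOTAL_STUDIES)
    print(f"{feature}: sum={total}, studies={base}, excess={total - base}")
```

A positive excess marks the rows where studies were counted in more than one category (data sources, data types, and type of validation); an excess of zero means the categories partition the reporting studies exactly.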