2021 Sep 28;10(9):giab055. doi: 10.1093/gigascience/giab055

Figure 3:

Sample selection bias: three examples. On the right are graphs giving conditional independence relations [40]. Y is the lesion volume to be predicted (i.e., the output). M are the imaging parameters, e.g., contrast agent dosage. X is the image, and depends on both Y and M (in this toy example X is computed from Y and M, with additive noise ϵ). S indicates whether data are selected to enter the source dataset (orange points) or not (blue points). The symbol ⊥⊥ denotes independence between variables. Preferentially selecting samples results in a dataset shift (middle and bottom rows). Depending on whether Y ⊥⊥ S | X holds, the conditional distribution of Y | X (here, lesion volume given the image) estimated on the selected data may or may not be biased.
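The effect described in the caption can be reproduced with a short simulation. The sketch below is illustrative only and is not the code behind the figure: the additive model X = Y + M + ϵ, the Gaussian distributions, the noise scale, and the three selection rules are all assumptions chosen for simplicity. It fits a least-squares line for Y given X on each selected subset and compares it with the fit on all samples.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy variables named as in the caption (distributions are assumptions).
y = rng.normal(size=n)                # Y: lesion volume (output)
m = rng.normal(size=n)                # M: imaging parameter
eps = rng.normal(scale=0.5, size=n)   # additive noise
x = y + m + eps                       # ASSUMPTION: additive toy model for the image X


def fit_y_given_x(selected):
    """Least-squares line for Y given X on the selected samples: (slope, intercept)."""
    slope, intercept = np.polyfit(x[selected], y[selected], deg=1)
    return slope, intercept


# Three hypothetical selection rules S, expressed as boolean masks.
s_uniform = rng.random(n) < 0.5   # S independent of X, Y and M
s_on_x = x > 0                    # S depends on X only: Y independent of S given X
s_on_y = y > 0                    # S depends on Y directly

fit_all = fit_y_given_x(np.ones(n, dtype=bool))
fit_uniform = fit_y_given_x(s_uniform)
fit_on_x = fit_y_given_x(s_on_x)
fit_on_y = fit_y_given_x(s_on_y)

for name, fit in [("all samples", fit_all), ("uniform", fit_uniform),
                  ("select on X", fit_on_x), ("select on Y", fit_on_y)]:
    print(f"{name:12s} slope={fit[0]:.3f} intercept={fit[1]:.3f}")
```

Under these assumptions, selecting on X shifts the marginal distribution of X but leaves the fitted relation between Y and X essentially unchanged, whereas selecting on Y changes both the slope and the intercept, so a model trained on that subset would be biased on the full population.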