Table 1.
Contamination type | Cause (the type of co-multiplexed samples) | Possible somatic variant calling artefacts | Prevalence of given contamination type in affected datasets | Suitable post-sequencing filtering options |
---|---|---|---|---|
a) Contaminant germline variants in a tumour sample | Any samples from other individuals | False positive somatic variants in the form of germline variation from other individuals | The most likely contamination type to occur; Contamination targets are expected to be more affected in copy number loss regions* | A variant filter based on an appropriate germline variant database or a relevant panel of normal samples; A filter based on PC-AF values (if a more discriminative solution is necessary) |
b) Contaminant somatic variants in a tumour sample | Other tumour samples | False positive “recurrent” somatic variants in the form of somatic variation from other tumour samples – whether from other individual(s) or the same individual | Expected to be relevant in tumour sample pools enriched** for specific somatic variants; Contamination targets are expected to be more affected in copy number loss regions* | A filter based on PC-AF values (non-discriminative filtering might lead to false negatives of high importance) |
c) Contaminant germline variants in a control sample | Any samples from other individuals | False negatives/missed somatic variant calls – only concerning somatic variants that also occur as germline variants | Dependent on the occurrence of important variants as both germline and somatic in a given project’s setting | Review of calls not classified as somatic, adjustment of the variant caller parameters |
d) Contaminant somatic variants in a control sample | Any tumour samples | False negatives/missed somatic variant calls – concerning all somatic variants | Elevated relevancy when matched samples are co-multiplexed; Prevalence dependent on the enrichment** of potential contaminant variants in a given sample pool; Consequences dependent on variant caller’s tendency to reject a somatic variant candidate due to evidence of its presence in the matched control | Review of calls not classified as somatic, adjustment of the variant caller parameters |
*Copy number loss regions of high-purity tumour samples will be especially affected.
**The enrichment increases with a given variant’s recurrence, as well as with the purity of the tumour samples that carry the variant.
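
As an illustration of the filtering option listed for contamination type (a), the following is a minimal sketch of a lookup-based filter that removes somatic calls matching a set of known germline variants, such as one compiled from a germline variant database or a panel of normal samples. The `Variant` type, the function name `filter_known_germline`, and the example records are all hypothetical and are not taken from any specific tool or dataset.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record: chromosome, position, alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def filter_known_germline(
    calls: Iterable[Variant], known_germline: Set[Variant]
) -> Tuple[List[Variant], List[Variant]]:
    """Split somatic calls into (kept, removed).

    A call is removed when the exact same variant is present in the
    set of known germline variants.
    """
    kept: List[Variant] = []
    removed: List[Variant] = []
    for call in calls:
        if call in known_germline:
            removed.append(call)
        else:
            kept.append(call)
    return kept, removed


# Illustrative (made-up) data: two candidate somatic calls, one of which
# is also present in the known germline set.
somatic_calls = [
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 2000, "C", "T"),
]
germline_set = {Variant("chr1", 1000, "A", "G")}

kept_calls, removed_calls = filter_known_germline(somatic_calls, germline_set)
```

A lookup of this kind is non-discriminative: it removes every matching call regardless of its allele fraction, which is why the table lists PC-AF-based filtering for cases where a more discriminative solution is necessary.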