Table 3. Characteristics of the three CL reference libraries used in this work.
Reference library | Total number of variants | Cancer cell lines | Mean variants per CL | Number of genes covered | Variant calling software | SNP MAF filtering |
---|---|---|---|---|---|---|
COSMIC CLP | 760E5 | 1024 | 7.4E5 | 20965 | Caveman (13), Pindel (14) | > 0.0 (all)* |
CCLE | 140E5 | 904 | 1.5E5 | 1651 | MuTect (15) | ≥ 0.05 |
CellMiner | 0.68E5 | 60 | 0.01E5 | > 20 k | GATK (16) | None |
The absolute and the average number of variants differ by orders of magnitude because different technologies and algorithms were used for sequencing and variant calling. Moreover, the number of genes covered varies strongly. SNPs, which are required for SNP-based identification, have been mostly or completely excluded in two of the three sets. For the COSMIC CLP, two different methods (Caveman and Pindel) were used to call small variants and indels.
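
As a rough illustration of the "orders of magnitude" statement, the spread can be estimated from the values reported in Table 3. The following is a minimal sketch and not part of the original analysis; the dictionary layout and the helper name `orders_of_magnitude` are assumptions made for this example, and the numbers are copied from the table as printed.

```python
import math

# Values copied from Table 3: total number of variants and the reported
# mean number of variants per cell line (CL) for each reference library.
libraries = {
    "COSMIC CLP": {"total_variants": 760e5, "mean_per_cl": 7.4e5},
    "CCLE": {"total_variants": 140e5, "mean_per_cl": 1.5e5},
    "CellMiner": {"total_variants": 0.68e5, "mean_per_cl": 0.01e5},
}


def orders_of_magnitude(values):
    """Hypothetical helper: log10 of the ratio between the largest and smallest value."""
    return math.log10(max(values) / min(values))


total_spread = orders_of_magnitude(
    [lib["total_variants"] for lib in libraries.values()]
)
mean_spread = orders_of_magnitude(
    [lib["mean_per_cl"] for lib in libraries.values()]
)

print(f"Total variants span about {total_spread:.1f} orders of magnitude")
print(f"Mean variants per CL span about {mean_spread:.1f} orders of magnitude")
```

With the tabulated values, both quantities differ by roughly three orders of magnitude between the largest library (COSMIC CLP) and the smallest (CellMiner).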