[Preprint]. 2024 Aug 19:2024.05.24.595788. [Version 3] doi: 10.1101/2024.05.24.595788

Table 1.

Overview of the 5,734 WGS samples analyzed in this study. The "Unmapped reads after mapping to GRCh38" column shows how many reads remained per sample, on average, after removing reads that mapped to the GRCh38 human genome. The "Unmapped reads after mapping to GRCh38+CHM13" column shows the number of reads remaining after a second pass further removed reads matching the CHM13 genome. The two Kraken columns show the average numbers of these remaining reads that Kraken identified as either human or vector.

| Cancer type | Total # samples | Average read count (millions) | Unmapped reads after mapping to GRCh38 (avg, thousands) | Unmapped reads after mapping to GRCh38+CHM13 (avg, thousands) | Kraken-identified human reads (avg, thousands) | Kraken-identified vector reads (avg, thousands) |
|---|---|---|---|---|---|---|
| BLCA | 288 | 284 | 5,983 (2.10%) | 4,829 (1.70%) | 199.5 (0.07%) | 14 (0.00%) |
| BRCA | 245 | 690 | 1,738 (0.25%) | 1,532 (0.22%) | 35.0 (0.01%) | 418 (0.06%) |
| CESC | 130 | 375 | 3,347 (0.89%) | 2,675 (0.71%) | 113.9 (0.03%) | 83 (0.02%) |
| COAD | 262 | 354 | 5,195 (1.47%) | 4,496 (1.27%) | 183.9 (0.05%) | 745 (0.21%) |
| DLBC | 14 | 926 | 381 (0.04%) | 381 (0.04%) | 0.1 (0.00%) | 363 (0.04%) |
| ESCA | 115 | 349 | 3,546 (1.02%) | 2,620 (0.75%) | 88.2 (0.03%) | 584 (0.17%) |
| GBM | 117 | 777 | 898 (0.12%) | 758 (0.10%) | 108.6 (0.01%) | 28 (0.00%) |
| HNSC | 335 | 388 | 5,334 (1.38%) | 4,348 (1.12%) | 235.4 (0.06%) | 279 (0.07%) |
| KICH | 100 | 869 | 454 (0.05%) | 453 (0.05%) | 0.2 (0.00%) | 436 (0.05%) |
| KIRC | 87 | 705 | 1,054 (0.15%) | 1,054 (0.15%) | 0.1 (0.00%) | 400 (0.06%) |
| KIRP | 77 | 866 | 340 (0.04%) | 340 (0.04%) | 0.2 (0.00%) | 324 (0.04%) |
| LAML | 110 | 742 | 3,165 (0.43%) | 2,205 (0.30%) | 264.0 (0.04%) | 899 (0.12%) |
| LGG | 185 | 444 | 3,546 (0.80%) | 2,804 (0.63%) | 98.6 (0.02%) | 2 (0.00%) |
| LIHC | 108 | 905 | 391 (0.04%) | 391 (0.04%) | 0.2 (0.00%) | 368 (0.04%) |
| LUAD | 577 | 473 | 4,808 (1.02%) | 4,185 (0.88%) | 126.3 (0.03%) | 1,964 (0.42%) |
| LUSC | 100 | 906 | 194 (0.02%) | 194 (0.02%) | 0.1 (0.00%) | 2 (0.00%) |
| OV | 121 | 747 | 454 (0.06%) | 454 (0.06%) | 0.3 (0.00%) | 346 (0.05%) |
| PRAD | 272 | 304 | 6,680 (2.20%) | 5,152 (1.69%) | 305.7 (0.10%) | 15 (0.01%) |
| READ | 120 | 283 | 6,620 (2.34%) | 5,704 (2.02%) | 230.2 (0.08%) | 969 (0.34%) |
| SARC | 82 | 733 | 181 (0.02%) | 181 (0.02%) | 0.1 (0.00%) | 159 (0.02%) |
| SKCM | 320 | 311 | 3,826 (1.23%) | 3,134 (1.01%) | 116.7 (0.04%) | 632 (0.20%) |
| STAD | 299 | 368 | 5,292 (1.44%) | 3,877 (1.05%) | 284.7 (0.08%) | 28 (0.01%) |
| THCA | 1,248 | 784 | 676 (0.09%) | 527 (0.07%) | 20.6 (0.00%) | 43 (0.01%) |
| UCEC | 320 | 383 | 4,590 (1.20%) | 3,811 (1.00%) | 290.7 (0.08%) | 461 (0.12%) |
| UVM | 102 | 156 | 3,599 (2.31%) | 2,548 (1.64%) | 103.1 (0.07%) | 2 (0.00%) |
| Total | 5,734 | 3,099,077 | 18,420,684 (0.59%) | 14,998,914 (0.48%) | 713,538 (0.02%) | 2,444,357 (0.08%) |
| Average | 229 | 540 | 3,213 (0.59%) | 2,616 (0.48%) | 124 (0.02%) | 426 (0.08%) |
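
The percentages in Table 1 appear to express each read count as a share of the total read count in the same row. That reading is an assumption, since the table itself does not state the denominator, but it is consistent with the reported figures. The sketch below checks it against the "Total" row and also derives the share of GRCh38-unmapped reads removed by the CHM13 pass, which the table does not report. The function name `percent_of_reads` and the variable names are illustrative, not taken from the study.

```python
def percent_of_reads(reads_thousands: float, total_reads_millions: float) -> float:
    """Return a read count (in thousands) as a percentage of total reads (in millions)."""
    if total_reads_millions <= 0:
        raise ValueError("total_reads_millions must be positive")
    # 1 million = 1,000 thousand, so express the denominator in thousands as well.
    return 100.0 * reads_thousands / (total_reads_millions * 1000.0)


# Values copied from the "Total" row of Table 1.
total_reads_millions = 3_099_077
unmapped_grch38_thousands = 18_420_684
unmapped_grch38_chm13_thousands = 14_998_914

# Assumed denominator: the total read count in the same row.
pct_after_grch38 = percent_of_reads(unmapped_grch38_thousands, total_reads_millions)
pct_after_chm13 = percent_of_reads(unmapped_grch38_chm13_thousands, total_reads_millions)

# Derived figure (not in the table): share of GRCh38-unmapped reads
# that the second pass against CHM13 removed.
removed_by_chm13 = (
    100.0
    * (unmapped_grch38_thousands - unmapped_grch38_chm13_thousands)
    / unmapped_grch38_thousands
)

print(f"after GRCh38:        {pct_after_grch38:.2f}%")
print(f"after GRCh38+CHM13:  {pct_after_chm13:.2f}%")
print(f"removed by CHM13:    {removed_by_chm13:.1f}%")
```

Per-row percentages computed this way can differ from the printed ones in the last digit, because the table's average read counts are themselves rounded.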