Nat Genet. Author manuscript; available in PMC: 2019 Jun 1.
Published in final edited form as: Nat Genet. 2018 Nov 5;50(12):1735–1743. doi: 10.1038/s41588-018-0257-y

Table 1 | The cancer sequence data used to develop machine learning models included a variety of different tumor subtypes, sequencing approaches and manual review calls

                                            Number of variants
Category                                    Training set  Hold-out test set   Total
Malignancy
  Leukemia (n = 243)                               5,815              2,877   8,692
  Lymphoma (n = 23)                                1,263                628   1,891
  Breast (n = 135)                                 8,986              4,320  13,306
  Small-cell lung (n = 18)                         9,177              4,601  13,778
  Glioblastoma (n = 17)                              844                412   1,256
  Melanoma (n = 1)                                   185                100     285
  Colorectal (n = 1)                                 842                419   1,261
  Gastrointestinal stromal (n = 1)                    70                 31     101
  Malignant peripheral nerve sheath (n = 1)          288                142     430
  Total                                           27,470             13,530  41,000
Sequencing methods
  Capture sequencing                               9,479              4,755  14,234
  Exome sequencing                                 9,367              4,677  14,044
  Genome sequencing                                8,624              4,098  12,722
Variant calls
  Somatic                                         12,266              6,115  18,381
  Ambiguous                                        7,189              3,454  10,643
  Fail                                             5,909              2,945   8,854
  Germline                                         2,106              1,016   3,122

The number of cases for each malignancy is given in parentheses.
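As a quick arithmetic check on the counts, the short Python sketch below encodes each row as (training, hold-out, total). It confirms that every row adds up and that the malignancy, sequencing-method and variant-call groupings each sum to 27,470 training and 13,530 hold-out variants. This is consistent with the three groupings being alternative breakdowns of the same 41,000 variants. The dictionary and function names are illustrative, not from the paper. The sketch assumes a lymphoma training count of 1,263 and a genome-sequencing training count of 8,624, the values implied by the row and column totals. Under these counts the hold-out set is 33% of all variants, roughly a 2:1 training/hold-out split.

```python
"""Consistency check for the variant counts in Table 1 (illustrative sketch)."""

# Each row is (training, hold_out, total); names are illustrative only.
# Assumption: lymphoma training = 1,263 and genome sequencing training = 8,624,
# the values implied by the row and column totals.
MALIGNANCY = {
    "Leukemia": (5815, 2877, 8692),
    "Lymphoma": (1263, 628, 1891),
    "Breast": (8986, 4320, 13306),
    "Small-cell lung": (9177, 4601, 13778),
    "Glioblastoma": (844, 412, 1256),
    "Melanoma": (185, 100, 285),
    "Colorectal": (842, 419, 1261),
    "Gastrointestinal stromal": (70, 31, 101),
    "Malignant peripheral nerve sheath": (288, 142, 430),
}
SEQUENCING_METHODS = {
    "Capture sequencing": (9479, 4755, 14234),
    "Exome sequencing": (9367, 4677, 14044),
    "Genome sequencing": (8624, 4098, 12722),
}
VARIANT_CALLS = {
    "Somatic": (12266, 6115, 18381),
    "Ambiguous": (7189, 3454, 10643),
    "Fail": (5909, 2945, 8854),
    "Germline": (2106, 1016, 3122),
}
GROUPINGS = {
    "Malignancy": MALIGNANCY,
    "Sequencing methods": SEQUENCING_METHODS,
    "Variant calls": VARIANT_CALLS,
}


def inconsistent_rows(group):
    """Return names of rows whose training + hold-out count differs from the total."""
    return [name for name, (train, hold, total) in group.items() if train + hold != total]


def column_totals(group):
    """Sum the training, hold-out and total columns over one grouping."""
    return tuple(sum(row[i] for row in group.values()) for i in range(3))


def holdout_fraction(train, hold):
    """Fraction of variants assigned to the hold-out test set."""
    return hold / (train + hold)


if __name__ == "__main__":
    for label, group in GROUPINGS.items():
        train, hold, total = column_totals(group)
        print(
            f"{label}: training={train:,} hold-out={hold:,} total={total:,} "
            f"hold-out fraction={holdout_fraction(train, hold):.3f} "
            f"inconsistent rows={inconsistent_rows(group)}"
        )
```

Running the script prints one summary line per grouping; an empty list of inconsistent rows means every row's training and hold-out counts add up to its total.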