Author manuscript; available in PMC: 2021 Jul 13.
Published in final edited form as: J Biomed Inform. 2020 Sep 9;110:103564. doi: 10.1016/j.jbi.2020.103564

Table 2.

List of data partitions, the primary cancer sites assigned to each group, and the number of training, validation (Val), and testing samples for each primary site code. Note that the partitioning was produced by the algorithm described in Section 3.3; thus, the groups may not correspond to the cancer topography defined in the ICD-O-3 coding manual.

Group Site Train Val Test Group Site Train Val Test
0 C00 642 62 73 6 C40 799 103 141
0 C01 3,377 362 538 6 C41 1,393 144 240
0 C02 2,491 248 383 7 C42 68,274 7,584 9,881
0 C03 503 55 104 8 C44 44,535 4,864 10,625
0 C04 959 97 94 9 C47 227 17 17
0 C05 810 84 142 9 C48 1,574 156 225
0 C06 1,004 115 140 9 C49 5,324 538 881
0 C07 1,585 161 264 10 C50 144,230 16,089 24,746
0 C08 445 43 71 11 C51 4,881 541 630
0 C09 4,332 495 650 11 C52 1,005 104 117
0 C10 800 86 159 11 C53 6,362 713 919
0 C11 1,065 102 144 12 C54 22,083 2,428 3,747
0 C12 466 44 26 12 C55 626 57 110
0 C13 580 51 88 12 C56 9,080 990 1,333
0 C14 610 65 83 12 C57 1,229 128 281
0 C15 6,257 673 1,043 12 C58 37 4 1
1 C16 12,374 1,443 2,367 13 C60 843 104 119
1 C17 4,241 480 655 14 C61 54,136 5,979 11,878
2 C18 44,198 4,949 7,275 15 C62 2,651 307 417
3 C19 4,544 469 689 15 C63 149 18 21
3 C20 13,392 1,498 2,346 15 C64 16,033 1,686 2,731
3 C21 2,856 349 494 15 C65 1,781 197 272
3 C22 6,350 734 1,135 15 C66 1,228 136 177
3 C23 1,439 161 240 16 C67 27,165 3,028 3,902
3 C24 2,044 219 347 16 C68 770 110 128
4 C25 13,652 1,532 2,608 16 C69 867 123 112
4 C26 1,097 134 217 16 C70 3,460 353 661
4 C30 700 74 89 17 C71 9,210 1,023 1,593
4 C31 576 76 111 17 C72 1,332 159 215
4 C32 6,049 730 726 17 C73 16,866 1,940 2,447
4 C33 116 16 7 17 C74 534 59 39
5 C34 94,089 10,590 16,276 17 C75 2,296 244 437
6 C37 344 35 46 17 C76 419 44 52
6 C38 1,954 222 322 18 C77 36,900 4,191 5,196
6 C39 4 0 7 19 C80 8,428 943 1,412
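
For readers who want to work with these counts programmatically, the following is a minimal sketch of how per-group sample totals could be tallied from rows of Table 2. It assumes each row has been split into a single "group site train val test" record with comma thousands separators; the names `ROWS`, `parse_row`, and `group_totals` are hypothetical and are not part of the original work. Only four rows are copied from the table, for illustration.

```python
# Illustrative sketch only: four rows copied from Table 2, one record per line
# in the order group, site, train, val, test (comma thousands separators assumed).
ROWS = """\
1 C16 12,374 1,443 2,367
1 C17 4,241 480 655
2 C18 44,198 4,949 7,275
10 C50 144,230 16,089 24,746
"""


def parse_row(line):
    """Parse one 'group site train val test' record into typed fields."""
    group, site, train, val, test = line.split()
    counts = [int(x.replace(",", "")) for x in (train, val, test)]
    return (int(group), site, counts[0], counts[1], counts[2])


def group_totals(text):
    """Sum the train/val/test counts over all sites in each group."""
    totals = {}
    for line in text.strip().splitlines():
        group, _site, train, val, test = parse_row(line)
        acc = totals.setdefault(group, [0, 0, 0])
        acc[0] += train
        acc[1] += val
        acc[2] += test
    return totals


if __name__ == "__main__":
    for group, (train, val, test) in sorted(group_totals(ROWS).items()):
        print(f"group {group}: train={train} val={val} test={test}")
```

Summing per group in this way makes the imbalance between groups easy to inspect, for example single-site groups such as C50 against multi-site groups made up of many small sites.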