2023 Dec 18;3:1274599. doi: 10.3389/fbinf.2023.1274599

TABLE 1.

Dataset statistics. The “Interaction” column gives the number of unique {CDR3α, CDR3β, peptide} triplets, and the “CDR3αβ” column gives the number of unique {CDR3α, CDR3β} pairs. The “In duplication” rows of the “Unique count” column give the overlap, i.e., the number of unique entries shared between the training and test sets. The “Pos. rate” column gives the fraction of positive examples among the binary labels.

| Dataset | Unique count | CDR3αβ | Peptide | Interaction | Pos. rate |
|---|---|---|---|---|---|
| McPAS | In training | 3,181 | 316 | 23,363 | 0.1665 |
| McPAS | In test | 833 | 190 | 4,729 | 0.1512 |
| - | In duplication between training and test | 132 | 171 | 0 | N/A |
| VDJdb-without10x | In training | 2,902 | 175 | 19,526 | 0.1670 |
| VDJdb-without10x | In test | 689 | 120 | 4,010 | 0.1504 |
| - | In duplication between training and test | 111 | 111 | 0 | N/A |
| Combined data dataset (A) | In training | 23,299 | 478 | 119,046 | 0.1400 |
| Recent data test set (B) | In test | 33,183 | 838 | 33,360 | 0.1667 |
| COVID-19 dataset (C) | In test | 1,676 | 1,265 | 2,120,140 | 1.887 × 10⁻⁵ |
| - | In duplication between (A) and (B) | 18 | 44 | 0 | N/A |
| - | In duplication between (A) and (C) | 1 | 0 | 0 | N/A |
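
The quantities defined in the caption can be illustrated with a minimal Python sketch. It is not the authors' code: the `Record` type, the function names, and the toy sequences are hypothetical, and the positive rate is assumed to be computed over all records of a split. The sketch counts unique CDR3αβ pairs, peptides, and interaction triplets per split, and the overlap of each between two splits.

```python
from typing import Dict, Iterable, NamedTuple


class Record(NamedTuple):
    """One labeled example (hypothetical layout, not the paper's schema)."""
    cdr3a: str
    cdr3b: str
    peptide: str
    label: int  # 1 = positive (binding), 0 = negative


def unique_counts(records: Iterable[Record]) -> Dict[str, float]:
    """Per-split statistics, following the column definitions in the caption."""
    recs = list(records)
    return {
        "cdr3ab": len({(r.cdr3a, r.cdr3b) for r in recs}),
        "peptide": len({r.peptide for r in recs}),
        "interaction": len({(r.cdr3a, r.cdr3b, r.peptide) for r in recs}),
        # Assumption: positive rate = positives / all records in the split.
        "pos_rate": sum(r.label for r in recs) / len(recs),
    }


def overlap_counts(a: Iterable[Record], b: Iterable[Record]) -> Dict[str, int]:
    """Number of unique items shared between two splits (set intersections)."""
    a, b = list(a), list(b)
    return {
        "cdr3ab": len({(r.cdr3a, r.cdr3b) for r in a}
                      & {(r.cdr3a, r.cdr3b) for r in b}),
        "peptide": len({r.peptide for r in a} & {r.peptide for r in b}),
        "interaction": len({(r.cdr3a, r.cdr3b, r.peptide) for r in a}
                           & {(r.cdr3a, r.cdr3b, r.peptide) for r in b}),
    }


# Toy data with made-up sequences, for illustration only.
train = [
    Record("CAVSDL", "CASSLG", "GILGFVFTL", 1),
    Record("CAVSDL", "CASSLG", "NLVPMVATV", 0),
    Record("CAGQTG", "CASSPR", "GILGFVFTL", 0),
    Record("CAGQTG", "CASSPR", "NLVPMVATV", 0),
]
test = [
    Record("CAVSDL", "CASSLG", "ELAGIGILTV", 0),
    Record("CALSEA", "CASSQD", "GILGFVFTL", 1),
]

train_stats = unique_counts(train)
test_stats = unique_counts(test)
shared = overlap_counts(train, test)
print(train_stats, test_stats, shared)
```

In this toy example one CDR3αβ pair and one peptide occur in both splits, yet no complete triplet is shared. The table shows the same pattern: the interaction overlap is 0 in every duplication row, while the CDR3αβ and peptide overlaps are mostly nonzero.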