Table 1b.
Levenshtein distances of in-silico generated antibody sequences from their closest training antibody sequences in the control and training datasets.
| Dataset | L-dist | ScFv | VH | VL | HCDR1 | HCDR2 | HCDR3 | LCDR1 | LCDR2 | LCDR3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 71 283 sequences in control dataset | # of 0 L-dist | 0 | 231 | 0 | 78 851 | 56 156 | 9031 | 79 854 | 96 170 | 67 842 |
| | Mean ± std | 24.8 ± 5.1 | 11.7 ± 4.5 | 7.0 ± 2.1 | 0.3 ± 0.6 | 0.9 ± 1.2 | 4.1 ± 2.4 | 0.2 ± 0.5 | 0.1 ± 0.2 | 0.4 ± 0.6 |
| | Range (min–max) | 4–46 | 0–31 | 1–16 | 0–4 | 0–8 | 0–13 | 0–3 | 0–2 | 0–3 |
| 31 416 sequences in training dataset | # of 0 L-dist | 9 | 1184 | 1464 | 74 609 | 48 517 | 11 169 | 78 326 | 91 163 | 68 081 |
| | Mean ± std | 22.7 ± 6.2 | 11.1 ± 5.0 | 5.4 ± 2.3 | 0.3 ± 0.6 | 0.9 ± 1.1 | 4.1 ± 2.6 | 0.2 ± 0.5 | 0.1 ± 0.3 | 0.4 ± 0.6 |
| | Range (min–max) | 0–46 | 0–31 | 0–15 | 0–4 | 0–6 | 0–14 | 0–3 | 0–4 | 0–3 |
L-dist stands for Levenshtein distance.
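
The statistics in the table (count of zero distances, mean ± std, and range) could be computed along the following lines. This is a minimal sketch and not the authors' pipeline: the function names `levenshtein`, `closest_distance`, and `summarize` are hypothetical, the toy sequences are invented for illustration, and the use of the population standard deviation is an assumption, since the source does not say which estimator was used.

```python
import statistics


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-residue insertions, deletions, or
    substitutions needed to turn sequence a into sequence b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_distance(query: str, references: list[str]) -> int:
    """Distance from a generated sequence to its nearest reference sequence."""
    return min(levenshtein(query, ref) for ref in references)


def summarize(distances: list[int]) -> dict:
    """Per-region summary in the same shape as one column of the table."""
    return {
        "zero_count": sum(d == 0 for d in distances),
        "mean": statistics.mean(distances),
        "std": statistics.pstdev(distances),  # assumption: population std
        "min": min(distances),
        "max": max(distances),
    }


# Toy example with made-up CDR-like strings (not data from the study).
references = ["GFTFSSYA", "GYTFTSYW"]
generated = ["GFTFSSYA", "GFTFSDYA", "GYSFTSYW"]
distances = [closest_distance(seq, references) for seq in generated]
summary = summarize(distances)
```

A brute-force nearest-neighbour search like this scales with the product of the two dataset sizes, so for tens of thousands of sequences a compiled Levenshtein implementation or a pre-filtering step would likely be needed in practice.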