Table 2.
Statistical analysis of 454 pyrosequencing-induced errors for five plasmid antibodies.
Antibody | Length (nt) | NSeq | NIden | Unnormalized RMSMut (nt) | Unnormalized RMSIns (nt) | Unnormalized RMSDel (nt) | Normalized RMSMut (nt per 100 nt) | Normalized RMSIns (nt per 100 nt) | Normalized RMSDel (nt per 100 nt) |
---|---|---|---|---|---|---|---|---|---|
NO ERROR CORRECTION | |||||||||
VRC01 | 363 | 47542 | 289 | 5.0 | 2.2 | 1.7 | 1.38 | 0.61 | 0.47 |
VRC03 | 390 | 53734 | 12309 | 4.6 | 4.1 | 0.9 | 1.18 | 1.05 | 0.23 |
VRC-PG04cog | 369 | 43718 | 21281 | 5.0 | 1.7 | 0.5 | 1.36 | 0.46 | 0.14 |
gVRC-H3d74 | 381 | 53147 | 19843 | 6.3 | 1.9 | 1.1 | 1.65 | 0.50 | 0.29 |
gVRC-H6d74 | 399 | 13639 | 1013 | 6.4 | 2.8 | 1.6 | 1.60 | 0.70 | 0.40 |
WITH ERROR CORRECTION | |||||||||
VRC01 | 363 | 47542 | 334 | 5.8 | 1.9 | 1.2 | 1.60 | 0.52 | 0.33 |
VRC03 | 390 | 53734 | 12948 | 4.7 | 4.0 | 0.9 | 1.21 | 1.03 | 0.23 |
VRC-PG04cog | 369 | 43718 | 22021 | 5.1 | 1.6 | 0.4 | 1.38 | 0.43 | 0.11 |
gVRC-H3d74 | 381 | 53147 | 23097 | 6.5 | 1.7 | 0.9 | 1.71 | 0.45 | 0.24 |
gVRC-H6d74 | 399 | 13639 | 1033 | 6.6 | 2.7 | 1.5 | 1.65 | 0.68 | 0.38 |
Columns include antibody name, nucleotide-sequence length of antibody heavy-chain variable domain, number of 454 pyrosequencing-determined heavy-chain variable-domain sequences for this antibody, number of sequences 100% identical to the sequenced antibody heavy-chain variable-domain, root-mean-square (RMS) fluctuation of 454 pyrosequencing-induced mutations, insertions, and deletions with respect to the input antibody sequence, and their values after normalization by a length of 100 nucleotides.
As shown in the percent sequence identity matrix in Table 1 and the divergence/identity plots in Figure 1, only five antibodies can be distinguished from the others using a single sequence-identity cutoff. After the 454 pyrosequencing-determined heavy-chain variable-domain sequences were mapped onto the 10 plasmid antibodies, a single cutoff of 75% sequence identity was applied to extract the sequences corresponding to VRC01, VRC03, VRC-PG04cog, gVRC-H3d74, and gVRC-H6d74.
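The extraction step can be illustrated with a minimal Python sketch. It assumes that percent identities between each read and each plasmid antibody have already been computed by an alignment step not shown here. The function name `assign_reads`, the best-match rule, the inclusive `>=` comparison, and the example identity values are all hypothetical choices made for illustration; the text specifies only the 75% cutoff.

```python
def assign_reads(identities, cutoff=75.0):
    """Group reads by their best-matching antibody, keeping only reads
    whose best percent identity reaches the cutoff.

    identities: dict mapping read ID -> {antibody name: percent identity}.
    The best-match rule and the inclusive comparison are assumptions.
    """
    groups = {}
    for read_id, per_antibody in identities.items():
        # Pick the antibody with the highest identity for this read.
        best_antibody, best_identity = max(
            per_antibody.items(), key=lambda item: item[1]
        )
        if best_identity >= cutoff:
            groups.setdefault(best_antibody, []).append(read_id)
    return groups


# Hypothetical identity values, for illustration only.
example = {
    "read1": {"VRC01": 98.1, "VRC03": 70.2},
    "read2": {"VRC01": 60.0, "VRC03": 64.5},  # below cutoff, discarded
    "read3": {"VRC01": 68.0, "VRC03": 91.3},
}
groups = assign_reads(example)
print(groups)
```

Reads whose best match falls below the cutoff are left unassigned, which mirrors the idea that only sequences clearly attributable to one antibody enter the error statistics.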
RMSMut, RMSIns, and RMSDel were calculated using the formula:
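$$\mathrm{RMS}_X = \sqrt{\frac{1}{N_{\mathrm{Seq}}} \sum_{i=1}^{N_{\mathrm{Seq}}} \left( X_i - \langle X \rangle \right)^2},$$
where $X$ denotes the type of sequencing error being characterized: mutation (Mut), insertion (Ins), or deletion (Del); $X_i$ denotes the number of errors of that type in the $i$-th sequence; $\langle X \rangle$ denotes the averaged sequencing error; and $N_{\mathrm{Seq}}$ (NSeq) denotes the total number of sequences within a given antibody group.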
The RMS values were normalized using RMSnormalized = RMSunnormalized / Length × 100, where Length is the nucleotide-sequence length of the antibody heavy-chain variable domain, to take into account the difference in sequence length.
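As a sanity check on the normalization, the following Python sketch recomputes the normalized values for VRC01 (no error correction) from the unnormalized values and the sequence length given in the table. The function `rms_fluctuation` is an assumption: it interprets the RMS fluctuation as the root-mean-square deviation of per-sequence error counts from their mean, divided by NSeq. Both function names are hypothetical and are not taken from the original analysis.

```python
import math


def rms_fluctuation(error_counts):
    """RMS deviation of per-sequence error counts around their mean.

    Assumed interpretation: sqrt(sum((x_i - mean)^2) / NSeq).
    """
    n_seq = len(error_counts)
    mean = sum(error_counts) / n_seq
    return math.sqrt(sum((x - mean) ** 2 for x in error_counts) / n_seq)


def normalize_per_100_nt(rms, length_nt):
    """Rescale an RMS value to a sequence length of 100 nucleotides."""
    return rms / length_nt * 100


# VRC01, no error correction: length 363 nt; RMSMut 5.0, RMSIns 2.2, RMSDel 1.7.
for label, rms in [("RMSMut", 5.0), ("RMSIns", 2.2), ("RMSDel", 1.7)]:
    print(label, round(normalize_per_100_nt(rms, 363), 2))
# RMSMut 1.38
# RMSIns 0.61
# RMSDel 0.47
```

The printed values match the normalized columns of the VRC01 row, which supports reading the denominator in the normalization formula as the sequence length in nucleotides.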