2012 Sep 11;3:315. doi: 10.3389/fmicb.2012.00315

Table 2.

Statistical analysis of 454 pyrosequencing-induced errors for five plasmid antibodies.

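| Antibody | Length (nt) | NSeq | NIden | RMSMut, unnormalized (nt) | RMSIns, unnormalized (nt) | RMSDel, unnormalized (nt) | RMSMut, normalized (per 100 nt) | RMSIns, normalized (per 100 nt) | RMSDel, normalized (per 100 nt) |
|---|---|---|---|---|---|---|---|---|---|
| NO ERROR CORRECTION | | | | | | | | | |
| VRC01 | 363 | 47542 | 289 | 5.0 | 2.2 | 1.7 | 1.38 | 0.61 | 0.47 |
| VRC03 | 390 | 53734 | 12309 | 4.6 | 4.1 | 0.9 | 1.18 | 1.05 | 0.23 |
| VRC-PG04cog | 369 | 43718 | 21281 | 5.0 | 1.7 | 0.5 | 1.36 | 0.46 | 0.14 |
| gVRC-H3d74 | 381 | 53147 | 19843 | 6.3 | 1.9 | 1.1 | 1.65 | 0.50 | 0.29 |
| gVRC-H6d74 | 399 | 13639 | 1013 | 6.4 | 2.8 | 1.6 | 1.60 | 0.70 | 0.40 |
| WITH ERROR CORRECTION | | | | | | | | | |
| VRC01 | 363 | 47542 | 334 | 5.8 | 1.9 | 1.2 | 1.60 | 0.52 | 0.33 |
| VRC03 | 390 | 53734 | 12948 | 4.7 | 4.0 | 0.9 | 1.21 | 1.03 | 0.23 |
| VRC-PG04cog | 369 | 43718 | 22021 | 5.1 | 1.6 | 0.4 | 1.38 | 0.43 | 0.11 |
| gVRC-H3d74 | 381 | 53147 | 23097 | 6.5 | 1.7 | 0.9 | 1.71 | 0.45 | 0.24 |
| gVRC-H6d74 | 399 | 13639 | 1033 | 6.6 | 2.7 | 1.5 | 1.65 | 0.68 | 0.38 |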

Columns include antibody name, nucleotide-sequence length of antibody heavy-chain variable domain, number of 454 pyrosequencing-determined heavy-chain variable-domain sequences for this antibody, number of sequences 100% identical to the sequenced antibody heavy-chain variable-domain, root-mean-square (RMS) fluctuation of 454 pyrosequencing-induced mutations, insertions, and deletions with respect to the input antibody sequence, and their values after normalization by a length of 100 nucleotides.

As shown in the percent sequence identity matrix in Table 1 and the divergence/identity plots in Figure 1, only five antibodies can be distinguished from the others using a single sequence identity cutoff. After mapping the 454 pyrosequencing-determined heavy-chain variable-domain sequences onto 10 plasmid antibodies, a single cutoff of 75% was applied to extract the sequences corresponding to each of VRC01, VRC03, VRC-PG04cog, gVRC-H3d74, and gVRC-H6d74.
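The extraction step can be illustrated with a minimal sketch. It assumes that each read has already been aligned to its reference antibody, that identity is the percentage of aligned columns that match (with gaps counted as mismatches), and that reads at or above the cutoff are kept. The function names, the identity definition, and the inclusive comparison are illustrative assumptions, not the procedure used in the study.

```python
IDENTITY_CUTOFF = 75.0  # percent; the single cutoff stated in the text


def percent_identity(aligned_read, aligned_ref):
    """Percent of aligned columns that match (assumed definition).

    Both strings must come from a pairwise alignment and have equal
    length; '-' marks a gap and is counted as a mismatch.
    """
    if len(aligned_read) != len(aligned_ref) or not aligned_ref:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_read, aligned_ref))
    return 100.0 * matches / len(aligned_ref)


def extract_group(aligned_pairs, cutoff=IDENTITY_CUTOFF):
    """Keep reads whose identity to the reference is at or above the cutoff."""
    return [read for read, ref in aligned_pairs
            if percent_identity(read, ref) >= cutoff]


# Hypothetical toy alignments against one 8-nt reference
reference = "ACGTACGT"
pairs = [
    ("ACGTACGT", reference),  # identical read
    ("ACGTAC-T", reference),  # one deletion
    ("AAAAACGT", reference),  # three mismatches
]
kept = extract_group(pairs)
```

In this toy input the third read falls below the cutoff and is dropped, while the first two are retained.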

RMSMut, RMSIns, and RMSDel were calculated using the formula:

$$\mathrm{RMS}_X = \sqrt{\frac{\sum_{i}^{N_{\mathrm{Seq}}} \left(X_i - \bar{X}\right)^2}{N_{\mathrm{Seq}}}},$$

where X denotes the type of sequencing error being characterized: mutation (Mut), insertion (Ins), or deletion (Del); X̄ denotes the average sequencing error; and NSeq denotes the total number of sequences within a given antibody group.
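As a rough illustration of the RMS calculation, the sketch below computes the fluctuation for one error type within one antibody group. It assumes that each value is the number of errors of that type observed in a single sequence, which the footnote does not state explicitly. The function name and the input counts are hypothetical.

```python
import math


def rms_fluctuation(error_counts):
    """RMS deviation of per-sequence error counts about their mean.

    Divides by the total number of sequences (NSeq), as in the formula,
    rather than by NSeq - 1.
    """
    n_seq = len(error_counts)
    if n_seq == 0:
        raise ValueError("at least one sequence is required")
    mean = sum(error_counts) / n_seq
    squared_deviations = sum((x - mean) ** 2 for x in error_counts)
    return math.sqrt(squared_deviations / n_seq)


# Hypothetical mutation counts for five reads in one antibody group
mut_counts = [0, 2, 4, 4, 0]
rms_mut = rms_fluctuation(mut_counts)
```

The same function would be applied separately to the insertion and deletion counts to obtain RMSIns and RMSDel.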

The RMS values were normalized to account for differences in sequence length, using RMSnormalized = (RMSunnormalized / length) × 100, where length is the nucleotide length of the antibody heavy-chain variable domain.
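The normalization can be checked against the table with a short sketch. The function name is hypothetical, and the inputs are the VRC01 and VRC03 values from the rows without error correction.

```python
def normalize_rms(rms_unnormalized, length_nt, per_nt=100):
    """Rescale an RMS value to a common length of `per_nt` nucleotides."""
    return rms_unnormalized / length_nt * per_nt


# VRC01, no error correction: RMSMut = 5.0 nt over 363 nt
vrc01_mut = round(normalize_rms(5.0, 363), 2)

# VRC03, no error correction: RMSIns = 4.1 nt over 390 nt
vrc03_ins = round(normalize_rms(4.1, 390), 2)
```

Rounded to two decimals, these give 1.38 and 1.05, matching the normalized columns of the table.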