Nucleic Acids Research. 2019 Nov 16;47(22):11978–11980. doi: 10.1093/nar/gkz1106

Robustness by intrinsically disordered C-termini and translational readthrough

April Snofrid Kleppe, Erich Bornberg-Bauer
PMCID: PMC7145639  PMID: 31733061

Nucleic Acids Research, 2019, 46(19), 10184–10194, https://doi.org/10.1093/nar/gky778

Post publication, we discovered a programming error in the script used to calculate intrinsic disorder. The script was supposed to extract the ratio of disordered residues for each individual gene from the output of Iupred. Iupred is a published software program that infers how disordered each residue (amino acid) of a given protein is and writes the result to a text file. From this text file, one can extract the ratio of disordered residues, that is, the fraction of amino acids in the given protein that are disordered. However, the ratio of disordered residues for each gene was appended to a list, and the mean value of that list was reported for the sequence in question, instead of the individual value of the sequence. The lines of code where this mishap occurs are shown further down. Essentially, the error was committed by confusing ‘ratio of disordered residues’ with ‘mean disorder’ during programming. Because the list kept growing as the loop ran, the value reported for each sequence was a running mean that was affected by the values added to the list before it.
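As a rough illustration of the extraction step described above, the following minimal sketch computes a ratio of disordered residues from Iupred-style text output. The column layout (position, residue, score), the `#` comment prefix, the 0.5 score threshold and the function names are assumptions made for this example; they are not taken from the authors' script.

```python
def parse_iupred_scores(text):
    """Return the per-residue disorder scores from Iupred-style text output."""
    scores = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and header/comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Assumed column layout: position, residue, disorder score.
        scores.append(float(fields[2]))
    return scores


def disordered_ratio(scores, threshold=0.5):
    """Fraction of residues whose score exceeds the (assumed) threshold."""
    if not scores:
        raise ValueError("no residue scores found")
    return sum(score > threshold for score in scores) / len(scores)


# Hypothetical four-residue example, not real Iupred output.
example_output = """\
# hypothetical Iupred-style output
1 M 0.10
2 K 0.40
3 S 0.60
4 P 0.90
"""

scores = parse_iupred_scores(example_output)
ratio = disordered_ratio(scores)
print(ratio)  # two of the four residues exceed the threshold
```

Under the same assumptions, passing only the last 30 scores (`scores[-30:]`) to `disordered_ratio` would give the ratio for the C-terminus.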

The lines of incorrect code in question:

disorder_list.append(seq_disorder_fraction[0])

#append ratio of disordered residues to a list

disorder_mean = np.mean(disorder_list)

#disorder mean was taken from previous list

out_list.append((seq_id, disorder_mean, anchor_mean))

# out_list was written to a text file

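Lines of corrected code (the corrected line, set in bold in the original notice, is the call to out_list.append, which now reports seq_disorder_fraction[0] instead of disorder_mean):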

disorder_list.append(seq_disorder_fraction[0])

#append ratio of disordered residues to a list

disorder_mean = np.mean(disorder_list)

#disorder mean was taken from previous list

out_list.append((seq_id, seq_disorder_fraction[0], anchor_mean))

# out_list was written to a text file
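To make the effect of the two versions concrete, here is a small self-contained sketch that runs both side by side. The sequence identifiers and ratios are hypothetical, `statistics.mean` stands in for `np.mean` to avoid a third-party dependency, and the `anchor_mean` field is left out; only the variable names `disorder_list` and `disorder_mean` follow the lines quoted above.

```python
from statistics import mean

# Hypothetical per-sequence ratios of disordered residues (not real data).
per_sequence = [("geneA", 0.10), ("geneB", 0.90), ("geneC", 0.20)]

disorder_list = []
buggy_out = []      # what the incorrect script reported
corrected_out = []  # what the corrected script reports

for seq_id, seq_disorder_fraction in per_sequence:
    disorder_list.append(seq_disorder_fraction)
    # Running mean over every sequence processed so far.
    disorder_mean = mean(disorder_list)
    # Incorrect version: the running mean is reported for this sequence.
    buggy_out.append((seq_id, disorder_mean))
    # Corrected version: the sequence's own value is reported.
    corrected_out.append((seq_id, seq_disorder_fraction))

for (seq_id, wrong), (_, right) in zip(buggy_out, corrected_out):
    print(f"{seq_id}: reported {wrong:.2f}, actual {right:.2f}")
```

In this toy input, geneB has a ratio of 0.90 but is reported as roughly 0.50 by the incorrect version, because its value is averaged with that of geneA. Only the first sequence processed is reported correctly.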

Several weeks after the submission and acceptance of the published paper, the same pipeline was applied to new data sets, and extreme anomalies were observed. We investigated the raw files generated by Iupred, took single samplings of individual genes and compared them to the data given by the script. The data differed to such an extent that something had to be incorrect in the script. We did not discover this programming flaw previously because, when we compared our data with experimentally verified disorder (as reported in the initial submission), these leaky proteins were indeed disordered at the C-termini. It was an unfortunate coincidence that the proteins that were experimentally verified also showed disorder in our data set. Otherwise we might have caught the programming flaw earlier.

After finding this programming flaw, we re-examined the sets with regard to disorder. When comparing the disordered C-termini between the sets using the correct Iupred values, we find significantly more disordered C-termini in the leaky set. The difference is less dramatic than we initially reported, but it remains significant. We believe this difference is relevant, as disordered C-termini can explain the seemingly high tolerance, and thereby prevalence, of translational readthrough.
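The corrected text further down reports a one-sided Mann Whitney rank test for this comparison. The notice does not say which implementation was used, so the following is only a sketch of how such a test can be computed, using a normal approximation without tie or continuity correction. The function names and the two small data sets are hypothetical and do not reproduce the reported P-value.

```python
import math


def mann_whitney_u(x, y):
    """U statistic for x: number of (x, y) pairs with x > y, ties count 0.5."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def one_sided_p_greater(x, y):
    """P-value for 'x tends to be larger than y' (normal approximation,
    no tie or continuity correction; an assumption of this sketch)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Upper-tail probability of the standard normal distribution.
    return u, 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical C-terminal ratios of disordered residues (not real data).
leaky = [0.9, 0.8, 1.0, 0.7, 0.6]
non_leaky = [0.5, 0.4, 0.65, 0.3, 0.2]

u_stat, p_value = one_sided_p_greater(leaky, non_leaky)
print(u_stat, p_value)
```

For real analyses, a library routine such as `scipy.stats.mannwhitneyu` with `alternative="greater"` handles ties and small samples more carefully than this approximation.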

The following corrections were made to the published article:

ABSTRACT

Old: Our main finding is that proteins undergoing TR are highly expressed and have intrinsically disordered C-termini.

New: Our main finding is that proteins undergoing TR are highly expressed and have a higher proportion of intrinsically disordered C-termini.

RESULTS AND DISCUSSION

Protein characteristics

Error prone proteins have highly disordered C-termini

Old: However, we did find the non-leaky set to be significantly different from the leaky and semi-leaky set with respect to disorder distribution (Figure 2 and Supplementary Table S6): leaky and semi-leaky proteins have an overall wider distribution and lower ratio of disordered residues than non-leaky proteins.

New: We did not find the distributions of disordered residues to differ between the sets (see Figure 2).

Old: Moreover, we analysed the last 30 amino acids of the peptide chains separately and found a significant difference between sets. The majority of genes belonging to the non-leaky set have a ratio of disordered residues <0.5, whereas the vast majority of genes belonging to the leaky and semi-leaky set have a ratio of disordered residues >0.5 (Figure 3). We found five of our proteins to be curated in the DisProt database (67) (Supplementary Table S1).

New: Moreover, we analysed the last 30 amino acids of the peptide chains separately. All sets have a high frequency of disordered C-termini (see Figure 3 and Supplemental Figure S6), but the leaky set has a significantly higher proportion than the non-leaky set (Mann Whitney one-sided rank test, P-value 0.03). We found five of our proteins to be curated in the DisProt database (67) (see Table S1).

FIGURE 2 CAPTION

Old: Ratio of disordered residues of full protein sequences. The Y-axis displays density of sequences and the X-axes display ratio of disordered residues. The colours display what set the proteins belong too. (A) The leaky set is intermediate between the non-leaky and semi-leaky sets. The leaky set (red) is mostly overlapping with the semi-leaky set (blue), but also overlapping with the non-leaky set (green). (B) The leaky and semi-leaky sets are clustered as one (purple), whereas non-leaky is maintained unaltered (green).

New: Ratio of disordered residues of full protein sequences. The Y-axis displays density of sequences and the X-axes display ratio of disordered residues. The colours display what set the proteins belong too. A: The leaky set (red) is mostly overlapping with the semi-leaky set (blue), but also overlapping with the non-leaky set (green). B: The leaky and semi-leaky sets are clustered as one (purple), whereas non-leaky is maintained unaltered (green). The leaky and semi-leaky sets have a significantly higher proportion of disordered residues (Mann Whitney test, P-value 0.005, U-value 966).

FIGURE 3 CAPTION

Old: Ratio of disordered residues of last 30 amino acids of protein sequences in sets. The Y-axis displays density of sequences and the X-axis displays ratio of disordered residues. Leaky and semi-leaky sets are clustered as one (purple), whereas non-leaky is maintained unaltered (green).

New: Ratio of disordered residues of last 30 amino acids of protein sequences in sets. The Y-axis displays density of sequences and the X-axis displays ratio of disordered residues. Leaky and semi-leaky sets are clustered as one (purple), whereas non-leaky is maintained unaltered (green). Many proteins of both error prone and non-leaky set have intrinsically disordered C-termini, but the C-termini of error-prone proteins are more disordered.

FIGURE S6

Supplementary Figure S6 has been replaced.

TABLE S6

The following P-values have been corrected:

Sets                     Variable             Z-value  New P-value  Old P-value
Leaky vs Semi-leaky      disorder CDS           1.065        0.287        0.078
Leaky vs Non-leaky       disorder CDS          −0.751        0.453        0.0
Semi-leaky vs Non-leaky  disorder CDS          −1.225        0.221        0.0
Leaky vs Semi-leaky      disorder C-termini     1.055        0.291        0.847
Leaky vs Non-leaky       disorder C-termini     1.893        0.058        0.0
Semi-leaky vs Non-leaky  disorder C-termini    −0.192        0.848        0.0
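The notice does not state how the P-values were derived from the Z-values. As a consistency check, and assuming a two-sided test against the standard normal distribution, the new P-values listed above can be recomputed from the Z-values as follows. The function name `two_sided_p` is introduced here for illustration only.

```python
import math


def two_sided_p(z):
    """Two-sided P-value for a Z-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# (sets, variable, Z-value, new P-value) as listed in the corrected Table S6.
rows = [
    ("Leaky vs Semi-leaky", "disorder CDS", 1.065, 0.287),
    ("Leaky vs Non-leaky", "disorder CDS", -0.751, 0.453),
    ("Semi-leaky vs Non-leaky", "disorder CDS", -1.225, 0.221),
    ("Leaky vs Semi-leaky", "disorder C-termini", 1.055, 0.291),
    ("Leaky vs Non-leaky", "disorder C-termini", 1.893, 0.058),
    ("Semi-leaky vs Non-leaky", "disorder C-termini", -0.192, 0.848),
]

for sets, variable, z, reported in rows:
    computed = two_sided_p(z)
    print(f"{sets:24s} {variable:20s} computed {computed:.3f}, reported {reported}")
```

Under this assumption, each recomputed value agrees with the reported new P-value to within 0.001.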

TABLE S7

The following R-squared and P-value have been corrected:

     Variables                               R-squared  P-value  R-squared*  P-value*
Old  GC UTR and disorder CDS                 0.0052     0.0321   0.0348      0.0022
New  GC UTR and disorder CDS                 0.0071     0.0094   0.0364      0.002
Old  GC UTR and disorder C-termini           0.0078     0.0058   0.0379      0.0013
New  GC UTR and disorder C-termini           0.007      0.0105   0.0367      0.0019
Old  Gene expression and disorder CDS        0.0088     0.0029   ins         ins
New  Gene expression and disorder CDS        ins        ins      0.0182      0.0464
Old  Gene expression and disorder C-termini  0.0237     0.0      ins         ins
New  Gene expression and disorder C-termini  ins        ins      ins         ins
Old  length and disorder CDS                 0.005      0.0363   0.0515      0.0001
New  length and disorder CDS                 ins        ins      0.0892      0.0
Old  length and disorder C-termini           0.0185     0.0      0.07        0.0
New  length and disorder C-termini           ins        ins      0.0571      0.0001

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
