PLOS One. 2022 Apr 18;17(4):e0266929. doi: 10.1371/journal.pone.0266929

idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R

William M McFadden 1,¤, Judith L Yanowitz 1,2,*
Editor: Eugene A Permyakov3
PMCID: PMC9015136  PMID: 35436286

Abstract

Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are proteins or protein domains that do not have a single native structure; rather, they are a class of flexible peptides that can rapidly adopt multiple conformations. IDPs are quite abundant, and their dynamic characteristics provide unique advantages for various biological processes. The field of “unstructured biology” has emerged, in part, because of numerous computational studies that identified the unique characteristics of IDPs and IDRs. The package ‘idpr’, short for Intrinsically Disordered Proteins in R, implements several R functions that match the established characteristics of IDPs to protein sequences of interest. These include calculations of residue composition, charge-hydropathy relationships, and predictions of intrinsic disorder. Additionally, idpr integrates several amino acid substitution matrices and calculators to supplement IDP-based workflows. Overall, idpr aims to integrate tools for the computational analysis of IDPs within R, facilitating the analysis of these important, yet under-characterized, proteins. The idpr package can be downloaded from Bioconductor (https://bioconductor.org/packages/idpr/).

Introduction

Intrinsically disordered proteins (IDPs) are proteins that lack a single, rigid structure under native conditions [1–4], challenging the long-held paradigm that structure leads to function. In addition to their roles in typical cellular processes, IDPs have been implicated in human diseases such as neurodegenerative disorders and various cancers [5–7]. IDPs contain one or more intrinsically disordered regions (IDRs), which are regions of proteins composed of thirty or more disordered residues. Bioinformatic studies have shown that one-third to one-half of the proteins in eukaryotic proteomes are predicted to be IDPs [8–11]. Further, viral proteomes appear to be enriched in IDPs; the most disordered proteome observed belongs to the Avian carcinoma virus, with an average disorder content of over 77% [9].

Due to their apparent abundance and relevance, research interest in IDPs has been increasing [12]. Accordingly, there are many computational tools that predict intrinsic disorder within a protein sequence [13–15]. These tools exploit known differences between disordered and ordered proteins, such as the distinct compositional profile, evolutionary rate, and biochemical properties of IDPs and IDRs compared to proteins or protein regions with a compact, ordered structure [16–19]. Because IDPs have reduced secondary and tertiary structure [1], the primary structure is the principal source of computational information for IDPs. Thus, most IDP prediction tools rely on the protein’s amino acid sequence, commonly represented as a string of single-letter codes [13–15]. While several R packages analyze protein characteristics based on the amino acid sequence alone, to our knowledge, there has not been a package focused on the unique features of IDPs and IDRs.

The R package that we created borrows its acronym from “IDPR”, or Intrinsically Disordered Protein Regions; idpr stands for “Intrinsically Disordered Proteins in R”. The goal of this R package is to integrate tools for IDP analysis, including amino acid composition, charge, and hydropathy, within the R platform. Additional IDP analysis is supported by several IDP-specific amino acid substitution matrices [20–22] and by access to the IUPred2A suite of disorder predictions [23, 24] through its REST API. The idpr package can be found at https://bioconductor.org/packages/idpr/.

idpr aims to balance a workflow that automatically generates key visualizations for users of any skill level with a workflow that allows dynamic input and custom output for more experienced users. The ggplot2 package [25] is used to generate the visualizations, allowing users to apply ggplot2 themes and aesthetics for further customization. Additionally, idpr graphing functions give users the option to return calculations as values for downstream analysis. Overall, idpr aims to integrate multiple tools for the computational analysis of intrinsically disordered proteins within R.

Methods

A. Implementation

idpr is implemented as an open-source R [26] / Bioconductor [27] package under an LGPL-3 license. For integration with other packages, idpr functions accept protein sequences as character strings, as vectors of individual amino acids, and as XString objects from the Biostrings package [28]. Alternatively, functions can analyze sequences directly from .fasta files. The substitution matrices in this package can be used with other R packages for sequence analysis and multiple sequence alignment. Package dependencies include R version 4.1.3, Biostrings [28], jsonlite [29], and several tidyverse packages [30], including ggplot2 [25].
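
The input formats listed above can be illustrated with a minimal base-R sketch. The helper names `as_residues()` and `read_fasta_residues()` are hypothetical, introduced here for illustration rather than taken from idpr; the FASTA reader assumes a file holding a single record, and Biostrings XString objects are assumed to be handled by coercing them with `as.character()`.

```r
# Hypothetical helpers for illustration only (not part of idpr's API).

# Normalize a protein sequence into an upper-case vector of single residues.
# Accepts one string ("MKTA..."), a character vector of residues, or an
# object with an as.character() method (e.g. a Biostrings AAString).
as_residues <- function(x) {
  x <- as.character(x)
  if (length(x) == 1L) {
    # Split a single string into individual letters
    x <- strsplit(x, split = "", fixed = TRUE)[[1]]
  }
  toupper(x)
}

# Read a single-record FASTA file: drop header lines starting with ">",
# join the remaining sequence lines, then split into residues.
read_fasta_residues <- function(path) {
  lines <- readLines(path, warn = FALSE)
  seq_lines <- trimws(lines[!startsWith(lines, ">")])
  as_residues(paste(seq_lines, collapse = ""))
}

# A string and a vector of letters give the same result
as_residues("mkta")                 # "M" "K" "T" "A"
as_residues(c("m", "k", "t", "a"))  # "M" "K" "T" "A"
```

Normalizing every input to one representation early keeps downstream calculations independent of how the sequence was supplied.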

idpr is well documented, with detailed user manuals and function descriptions generated with roxygen2 [31]. The package also includes six vignettes (long-form documentation) that relate the theory of IDPs to the functionality of the package. idpr can be installed through the BiocManager package manager [32] from Bioconductor release ≥3.13 (bioconductor.org) with the following line of R code: BiocManager::install("idpr"). idpr version 1.5.11 was used for this publication, and the workflow can be found in the supplementary materials or at dx.doi.org/10.17504/protocols.io.kqdg3p241l25/v1.

B. ‘idprofile’

To quickly generate the idpr profile for a protein of interest, the amino acid sequence and a UniProt ID are used to create multiple plots with a single command. idprofile() serves as a wrapper function for the key graphing tools within idpr. These plots are the Charge-Hydropathy Plot, Local Charge Plot, Local Scaled Hydropathy Plot, Structural Tendency Plot, Compositional Profile Plot, IUPred Plot, and FoldIndex Plot (discussed below; Fig 1). If a UniProt ID is not included, the IUPred plot is skipped. Please refer to the supplementary workflow and package documentation for details on using idprofile and other idpr functions.

Fig 1. The idprofile of α-Synuclein, generated by idpr, returns IDP characteristics.

(A) Charge-hydropathy plot of α-Synuclein (αSyn) predicts a collapsed protein. Methods are described in [16]. Mean scaled hydropathy was calculated with the Kyte and Doolittle measurement of hydropathy [33], scaled to Arg = 0.0 and Ile = 1.0. Mean net charge was calculated with IPC_protein pKa values [34]. The boundary is <Charge> = ±(2.785<Hydropathy> − 1.151), as described previously [18]. Proteins are considered insoluble when <Hydropathy> ≥ 0.7. (B) Structural tendency plot shows αSyn is enriched in disorder-promoting residues. Disorder-promoting residues (P, E, S, Q, K, A, and G) in green; order-promoting residues (M, N, V, H, L, F, Y, I, W, and C) in purple; disorder‐neutral residues (D, T, and R) in pink [35]. (C) Local Charge Plot shows an acidic C-terminus. The local charge is the average over a 9-residue sliding window, calculated with the IPC_protein pKa values [34]. (D) Local Hydropathy Plot shows a C-terminus deficient in hydrophobic residues. The local hydropathy is the average over a 9-residue sliding window, calculated with the scaled Kyte and Doolittle measurement of hydropathy [33]. (E) IUPred2 predicts a C-terminal IDR in αSyn [23, 24]. Residues with a score of 0.0–0.5 are predicted to be ordered; residues with a score of 0.5–1.0 are predicted to be disordered. (F) FoldIndex predicts a C-terminal IDR in αSyn [36]. Residues with a score from 0.0 to +1.0 are predicted to be ordered; residues with a score from −1.0 to 0.0 are predicted to be disordered.

C. Charge and hydropathy

It has been shown that both high net charge and low hydropathy are characteristics of intrinsically disordered proteins [16]. Extended IDPs occupy a distinct area of a plot of mean net charge against mean scaled hydropathy [16], so the Charge-Hydropathy Plot can distinguish compact from extended proteins under native conditions (Fig 1A). One cannot, however, make a general rule about where IDPs fall on the spectrum from collapsed to extended proteins, because IDPs can have the characteristics of either [16, 37].
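
The charge-hydropathy boundary can be applied to a single protein in a few lines of R. This is a sketch rather than idpr's code: `charge_hydropathy_side()` is a hypothetical helper that takes an already computed mean scaled hydropathy and mean net charge, and it assumes the boundary is applied symmetrically to signed charge, so a protein is called extended when |<Charge>| > 2.785<Hydropathy> − 1.151.

```r
# Hypothetical helper (not idpr's API). Boundary from [16], applied to the
# absolute mean net charge (ASSUMPTION: symmetric in the sign of the charge):
# proteins above 2.785 * <Hydropathy> - 1.151 fall on the "extended" side.
charge_hydropathy_side <- function(mean_hydropathy, mean_net_charge) {
  boundary <- 2.785 * mean_hydropathy - 1.151
  ifelse(abs(mean_net_charge) > boundary, "extended", "compact")
}

# Illustrative inputs chosen only to land on each side of the line
charge_hydropathy_side(mean_hydropathy = c(0.35, 0.50),
                       mean_net_charge = c(-0.10, 0.00))
# "extended" "compact"
```

Because the cut is linear, proteins close to the line can switch sides with modest changes in pKa set or hydropathy scale, so borderline calls warrant caution.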

Protein charges are calculated using the Henderson-Hasselbalch equation [38] with the IPC_protein pKa values [34] by default, although 15 additional pKa data sets are included in idpr as alternatives. The Kyte and Doolittle measurement of hydropathy [33] is used, scaled so that Arg has a hydropathy of 0.0 and Ile has a hydropathy of 1.0. Local charge and local hydropathy are calculated using a sliding window to identify regions with distinctive chemistry (Fig 1C and 1D). The sliding window is 9 residues by default but can be set to any odd number. The resulting figures are similar to those that can be obtained with the ProtScale tool from ExPASy [39].
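
The charge and hydropathy calculations described above can be sketched in base R. The function names are hypothetical, the sketch assumes standard one-letter codes, and the pKa values are our reading of the IPC_protein set [34] rather than numbers copied from idpr, so they should be checked against the pKa tables bundled with the package. Whether the free termini are counted is also a modelling choice; this sketch adds them to the whole-protein net charge and leaves them out of local charge.

```r
# Kyte-Doolittle hydropathy [33], rescaled so that Arg = 0 and Ile = 1
kd <- c(I = 4.5, V = 4.2, L = 3.8, F = 2.8, C = 2.5, M = 1.9, A = 1.8,
        G = -0.4, T = -0.7, S = -0.8, W = -0.9, Y = -1.3, P = -1.6,
        H = -3.2, E = -3.5, Q = -3.5, D = -3.5, N = -3.5, K = -3.9, R = -4.5)
scaled_kd <- (kd + 4.5) / 9

# ASSUMPTION: values believed to match the IPC_protein pKa set [34];
# verify against the pKa tables shipped with idpr before real use.
pka <- c(C = 7.555, D = 3.872, E = 4.412, H = 5.637, K = 9.052,
         R = 11.84, Y = 10.85, Nterm = 9.094, Cterm = 2.869)

# Henderson-Hasselbalch fractional side-chain charge at a given pH:
# basic groups +1 / (1 + 10^(pH - pKa)); acidic groups -1 / (1 + 10^(pKa - pH))
residue_charge <- function(res, pH = 7.0) {
  q <- numeric(length(res))
  basic <- res %in% c("H", "K", "R")
  acidic <- res %in% c("C", "D", "E", "Y")
  q[basic] <- 1 / (1 + 10^(pH - pka[res[basic]]))
  q[acidic] <- -1 / (1 + 10^(pka[res[acidic]] - pH))
  q
}

# Whole-protein net charge, optionally adding the free N- and C-termini
net_charge <- function(res, pH = 7.0, include_termini = TRUE) {
  q <- sum(residue_charge(res, pH))
  if (include_termini) {
    q <- q + 1 / (1 + 10^(pH - pka[["Nterm"]])) -
      1 / (1 + 10^(pka[["Cterm"]] - pH))
  }
  q
}

# Sliding-window mean; an odd window centres each value on one residue,
# giving length(values) - window + 1 values
sliding_mean <- function(values, window = 9) {
  stopifnot(window %% 2 == 1, length(values) >= window)
  vapply(seq_len(length(values) - window + 1),
         function(i) mean(values[i:(i + window - 1)]),
         numeric(1))
}

res <- strsplit("MKTAYIAKQRQISFVKSHFSRQ", "")[[1]]  # arbitrary test sequence
mean(scaled_kd[res])                  # mean scaled hydropathy
net_charge(res) / length(res)         # mean net charge
sliding_mean(residue_charge(res))     # local charge, 9-residue window
sliding_mean(unname(scaled_kd[res]))  # local scaled hydropathy, 9-residue window
```

Non-standard letters (e.g. X) return NA from `scaled_kd[res]`, so sequences should be cleaned first.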

D. Structural tendency

IDPs as a class tend to have a different amino acid composition, and therefore a distinct overall chemistry, from that of ordered proteins [40]. The chemistry of each residue influences its tendency to favor an extended or a compact structure. Residues enriched in the amino acid sequences of IDPs are typically charged, flexible, hydrophilic, or small, whereas order-promoting residues, found in structured proteins, tend to be hydrophobic, aromatic, aliphatic, or able to form disulfide bonds. There are also disorder-neutral residues [18, 35]. The default values, described previously [35], are disorder-promoting residues: P, E, S, Q, K, A, and G; order-promoting residues: M, N, V, H, L, F, Y, I, W, and C; and disorder‐neutral residues: D, T, and R. These are represented by the structural tendency plot (Fig 1B). Other definitions of order- and disorder-promoting residues have been published [11], so users can opt to specify residue definitions manually.
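
The structural tendency summary reduces to assigning each residue to one of the three default groups and counting. The R sketch below uses the default groups from [35] listed above; `structural_tendency()` and `tendency_fractions()` are hypothetical helpers, and passing a different `groups` list mimics the option to supply custom residue definitions.

```r
# Default residue groups from [35]; any other named list can be supplied
tendency_groups <- list(
  disorder_promoting = c("P", "E", "S", "Q", "K", "A", "G"),
  order_promoting    = c("M", "N", "V", "H", "L", "F", "Y", "I", "W", "C"),
  disorder_neutral   = c("D", "T", "R")
)

# Label each residue with its group; letters outside every group become NA
structural_tendency <- function(res, groups = tendency_groups) {
  out <- rep(NA_character_, length(res))
  for (g in names(groups)) out[res %in% groups[[g]]] <- g
  out
}

# Fraction of the whole sequence that falls in each group
tendency_fractions <- function(res, groups = tendency_groups) {
  labels <- factor(structural_tendency(res, groups), levels = names(groups))
  setNames(as.numeric(table(labels)) / length(res), names(groups))
}

tendency_fractions(strsplit("PEKAGMDT", "")[[1]])
# disorder_promoting 0.625, order_promoting 0.125, disorder_neutral 0.25
```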

E. Disorder predictions

FoldIndex uses the relationship between charge and hydropathy described above to identify unstructured regions of amino acid sequences [16, 36]. Since it was described in 2005, this method has been incorporated into many other prediction programs. Using a sliding window of 51 residues, windows with a negative score (<0) are predicted to be disordered and windows with a positive score (>0) are predicted to be ordered [36]. Calculations use the charge and hydropathy functions within idpr, with IPC_protein pKa values [34] at pH 7.0 and the scaled Kyte and Doolittle measurement of hydropathy [33].
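
FoldIndex can be written as 2.785<H> − |<R>| − 1.151 for each window [36], which is the charge-hydropathy boundary rearranged so that a negative value places the window on the disordered side. The sketch below assumes scaled Kyte and Doolittle hydropathy for <H> and Henderson-Hasselbalch side-chain charges at pH 7.0 for <R>, with pKa values that we believe match the IPC_protein set but that are not copied from idpr. `fold_index()` is a hypothetical name, and idpr's own output may differ in details such as how window positions are reported.

```r
# Scaled Kyte-Doolittle hydropathy (Arg = 0, Ile = 1) [33]
kd <- c(I = 4.5, V = 4.2, L = 3.8, F = 2.8, C = 2.5, M = 1.9, A = 1.8,
        G = -0.4, T = -0.7, S = -0.8, W = -0.9, Y = -1.3, P = -1.6,
        H = -3.2, E = -3.5, Q = -3.5, D = -3.5, N = -3.5, K = -3.9, R = -4.5)
scaled_kd <- (kd + 4.5) / 9

# ASSUMPTION: side-chain pKa values believed to match IPC_protein [34]
pka <- c(C = 7.555, D = 3.872, E = 4.412, H = 5.637, K = 9.052,
         R = 11.84, Y = 10.85)

# Henderson-Hasselbalch fractional charge of each residue (termini ignored)
residue_charge <- function(res, pH = 7.0) {
  q <- numeric(length(res))
  basic <- res %in% c("H", "K", "R")
  acidic <- res %in% c("C", "D", "E", "Y")
  q[basic] <- 1 / (1 + 10^(pH - pka[res[basic]]))
  q[acidic] <- -1 / (1 + 10^(pka[res[acidic]] - pH))
  q
}

# FoldIndex [36] per window: 2.785 * <H> - |<R>| - 1.151.
# Negative windows are called disordered, others ordered.
fold_index <- function(res, window = 51, pH = 7.0) {
  stopifnot(window %% 2 == 1, length(res) >= window)
  h <- unname(scaled_kd[res])
  q <- residue_charge(res, pH)
  starts <- seq_len(length(res) - window + 1)
  score <- vapply(starts, function(i) {
    idx <- i:(i + window - 1)
    2.785 * mean(h[idx]) - abs(mean(q[idx])) - 1.151
  }, numeric(1))
  data.frame(position = starts + (window - 1) / 2,  # residue at window centre
             score = score,
             prediction = ifelse(score < 0, "disordered", "ordered"))
}

# A poly-Glu stretch is highly charged and hydrophilic, so every window is negative
head(fold_index(rep("E", 60)), 3)
```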

The IUPred2 algorithm calculates a score of intrinsic disorder based on a model that estimates the energy of each residue’s interactions [23]. The structure of a folded protein arises from a network of intramolecular interactions between amino acids. In IDPs, the lack of structure reflects increased interactions of the amino acids with the surrounding environment; the resulting reduction in intramolecular interactions leaves the IDP without stable secondary and tertiary structure [41]. IUPred2 predictions are made on a scale of 0.0–1.0, with 0.5 as the dividing line between order and disorder: scores >0.5 predict a disordered region and scores <0.5 predict an ordered region [23, 24, 41] (Fig 1E). An additional prediction of intermolecular protein-protein interactions is performed with the ANCHOR2 program (Fig 2A), and a prediction of redox-sensitive disorder is performed with IUPred2A Redox (Fig 2B) [23, 24, 41]. A UniProt ID and an internet connection are required to access the IUPred2A REST API. Visit the IUPred2A website (https://iupred2a.elte.hu/) for terms of use, references, and additional information.
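
Per-residue IUPred2 scores, once retrieved, can be reduced to candidate IDRs by finding runs above the 0.5 cutoff. The sketch below assumes the scores are already in a complete numeric vector (it does not call the IUPred2A REST API) and borrows the thirty-residue IDR definition from the Introduction as a minimum length; `disordered_regions()` is a hypothetical helper, not an idpr function.

```r
# Hypothetical helper (not an idpr function): find runs of residues whose
# disorder score exceeds `threshold`, keeping runs of at least `min_length`.
disordered_regions <- function(scores, threshold = 0.5, min_length = 30) {
  runs <- rle(scores > threshold)      # run-length encode the TRUE/FALSE calls
  ends <- cumsum(runs$lengths)         # last residue of each run
  starts <- ends - runs$lengths + 1    # first residue of each run
  keep <- runs$values & runs$lengths >= min_length
  data.frame(start = starts[keep], end = ends[keep], length = runs$lengths[keep])
}

# Simulated scores (not real IUPred2 output): a 35-residue disordered run and
# a 10-residue run that is too short to count as an IDR
scores <- c(rep(0.2, 10), rep(0.8, 35), rep(0.3, 5), rep(0.9, 10))
disordered_regions(scores)
# one region: start 11, end 45, length 35
```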

Fig 2. Disorder predictions for p53 domains recapitulate environmental sensitivities.

(A) IUPred2A predicts multiple IDRs that promote protein-protein interactions in p53 [23, 24]. Residues with an IUPred2 score (green and purple line) of 0.0–0.5 are predicted to be ordered and residues with a score of 0.5–1.0 are predicted to be disordered. Residues with an ANCHOR2 score (red line) greater than 0.5 are predicted to be IDRs and protein-binding domains. (B) IUPred2A Redox predicts several oxidation-sensitive regions in p53 [23, 24]. Redox-plus (reducing environment) predictions are shown in blue, and Redox-minus (oxidizing environment) predictions are shown in purple. Regions predicted as “Redox Sensitive” are highlighted in light green. Residues with an IUPred score of 0.0–0.5 are predicted to be ordered and residues with a score of 0.5–1.0 are predicted to be disordered. (C) Sequence map of structural tendency for each residue highlights the composition of p53 domains. N-terminal Domain (NTD) annotated by the black bar, DNA-Binding Domain (DBD) annotated by the red bar, C-terminal Domain (CTD) annotated by the grey bar. Conserved Cys residues (C124, C135, C141, C176, C182, C229, C238, C242, C275, and C277) are annotated [42]. Disorder-promoting residues (P, E, S, Q, K, A, and G) highlighted in green; order-promoting residues (M, N, V, H, L, F, Y, I, W, and C) in purple; disorder‐neutral residues (D, T, and R) in pink [35].

F. Visualizing discrete values

As mentioned above, specific amino acid residues are preferentially enriched in unstructured or ordered regions [35]. To visualize the location of assigned residue characteristics in the context of the amino acid sequence, idpr contains a way to visualize discrete values with a ‘sequenceMap’ (Fig 2C, S1B Fig in S1 File). This is not part of the idprofile function but is included within the package for additional investigation. The values visualized can be results from idpr or from any other source. This function can also visualize continuous values.

G. Substitution matrices for analyzing IDPs

Because IDPs and IDRs are under less constraint to maintain a specific 3D structure, they tend to evolve faster than ordered proteins [17, 43]. As a result, IDPs accept point mutations at different rates than ordered proteins [21].

Currently, PAM and BLOSUM are the most widely used amino acid substitution matrices [44, 45], and they are integrated into many web-based tools, including NCBI-BLAST+ and EMBOSS [46, 47]. However, customization of the matrices is often desired but is not possible with these online programs. BLOSUM and PAM matrices can both be used with alignment programs in R when loaded via the Biostrings package [28]. For the analysis of IDPs, however, PAM and BLOSUM matrices are not ideal, since they are derived from ordered proteins or favor residue substitutions common among structured proteins [20–22]. To circumvent this pitfall, the IDP-derived substitution matrices EDSSMat [20], Disorder [21], and DUNMat [22] have been incorporated into idpr for use in alignments.
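
To show where a substitution matrix enters an alignment, the sketch below computes a Needleman-Wunsch global alignment score with a linear gap penalty in base R. The 4 × 4 matrix `toy` is a hypothetical stand-in, not EDSSMat, Disorder, or DUNMat; a real analysis would substitute one of those matrices (with amino acid letters as row and column names), or pass it to an established aligner such as `Biostrings::pairwiseAlignment()`, which accepts a `substitutionMatrix` argument.

```r
# Global (Needleman-Wunsch) alignment score with a linear gap penalty.
# `subst` is any substitution matrix whose row and column names are residues.
nw_score <- function(a, b, subst, gap = -8) {
  a <- strsplit(a, "")[[1]]
  b <- strsplit(b, "")[[1]]
  n <- length(a)
  m <- length(b)
  dp <- matrix(0, nrow = n + 1, ncol = m + 1)
  dp[, 1] <- gap * (0:n)   # leading gaps in b
  dp[1, ] <- gap * (0:m)   # leading gaps in a
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      dp[i + 1, j + 1] <- max(
        dp[i, j] + subst[a[i], b[j]],  # align a[i] with b[j]
        dp[i, j + 1] + gap,            # a[i] against a gap
        dp[i + 1, j] + gap             # b[j] against a gap
      )
    }
  }
  dp[n + 1, m + 1]
}

# HYPOTHETICAL 4 x 4 matrix for illustration only; a real analysis would use
# an IDP-derived matrix (EDSSMat, Disorder, or DUNMat) covering all residues
aa <- c("A", "E", "K", "P")
toy <- matrix(-1, nrow = 4, ncol = 4, dimnames = list(aa, aa))
diag(toy) <- 4
toy["E", "K"] <- 1
toy["K", "E"] <- 1

nw_score("AEKP", "AKKP", toy)  # 13: three identities (4 each) plus E/K (1)
```

Swapping `toy` for another matrix changes which substitutions are rewarded, which is why matrices derived from disordered regions can yield different alignments of IDRs than PAM or BLOSUM.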

Results and discussion

A. Example 1 - α-Synuclein

To highlight the use of the idpr package, α-Synuclein (αSyn; UniProt ID: P37840) is used in an example analysis. αSyn is an IDP, experimentally validated using various methods [48–52]. This protein has been extensively studied and is heavily implicated in Parkinson’s Disease pathology [53, 54]. The idprofile of αSyn returns IDP characteristics (Fig 1). The Charge-Hydropathy plot shows that αSyn appears to be a collapsed protein rather than an extended IDP (Fig 1A). This is in line with previously reported data showing that regions of αSyn are shielded from the cytoplasm under native conditions [49]. The structural tendency plot shows that αSyn is enriched in disorder-promoting residues, mostly Glu, Lys, Ala, and Gly (Fig 1B). Interestingly, αSyn lacks Cys and Trp, the two most order-promoting residues [35], as well as Arg, a positively charged, disorder-neutral residue. The local charge of the protein is mostly neutral, apart from a negatively charged C-terminal region (Fig 1C). In conjunction with this local charge, the C-terminal region is deficient in hydrophobic residues, as shown by the local scaled hydropathy (Fig 1D). Indeed, residues 104–140 of αSyn have been reported to be more extended than the N-terminal portion of the protein [51]. The Charge-Hydropathy plot of αSyn residues 104–140 returns an extended IDR, while residues 1–103 return a collapsed protein with a more neutral charge (S1A Fig in S1 File). This is in line with the IUPred2 and FoldIndex predictions of intrinsic disorder for αSyn, which show the C-terminal region predicted as disordered (Fig 1E and 1F, S1B Fig in S1 File). There are known point mutations in αSyn that are associated with familial Parkinson’s Disease: A30P, E46K, H50Q, G51D, and A53T [55–57]. While these mutations are located in the more compact region of the protein, most occur at disorder-promoting residues, the exception being H50Q (S1B Fig in S1 File).
Overall, the idprofile is useful for identifying biochemical features related to IDRs within a protein of interest.
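
The classification of the familial mutation sites can be reproduced mechanically by parsing each mutation and looking up the wild-type residue in the default structural tendency groups [35]. This R sketch uses only the mutation list given above; `tendency_of()` is a hypothetical helper, and the result restates group membership only, not any structural prediction.

```r
# Default residue groups from [35]
disorder_promoting <- c("P", "E", "S", "Q", "K", "A", "G")
order_promoting    <- c("M", "N", "V", "H", "L", "F", "Y", "I", "W", "C")
disorder_neutral   <- c("D", "T", "R")

# Hypothetical helper: map residues to their default tendency group
tendency_of <- function(aa) {
  ifelse(aa %in% disorder_promoting, "disorder-promoting",
  ifelse(aa %in% order_promoting, "order-promoting",
  ifelse(aa %in% disorder_neutral, "disorder-neutral", NA_character_)))
}

# Familial Parkinson's disease mutations in alpha-synuclein listed in the text
mutations <- c("A30P", "E46K", "H50Q", "G51D", "A53T")

# Split each mutation, e.g. "A30P", into wild type, position, and mutant
parts <- regmatches(mutations, regexec("^([A-Z])([0-9]+)([A-Z])$", mutations))
wt <- data.frame(
  mutation  = mutations,
  wild_type = vapply(parts, function(p) p[2], character(1)),
  position  = vapply(parts, function(p) as.integer(p[3]), integer(1)),
  mutant    = vapply(parts, function(p) p[4], character(1)),
  stringsAsFactors = FALSE
)
wt$wild_type_tendency <- tendency_of(wt$wild_type)
wt
```

Only the wild-type His of H50Q falls in the order-promoting group; the other four wild-type residues are disorder-promoting, matching the observation above.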

B. Example 2 - p53

Another well-characterized IDP is the cellular tumor antigen p53 (UniProt ID: P04637). p53 has been studied extensively because it is mutated in over 50% of human cancers [58, 59]. It is an experimentally validated IDP [60–62] that acts as a protein hub, interacting with many different partners [3, 63]. The idprofile of p53 shows the characteristics of a protein with several IDRs (S2 Fig in S1 File). The C-terminal domain (CTD) of p53 has been studied intensively because it can reversibly form various secondary structures depending on its binding partner [3, 60, 63]. For example, residues 377–388 gain an α-helical structure when interacting with S100 calcium-binding protein B, while residues 379–387 in the same region form a β-strand when interacting with Sirtuin [3, 60, 63]. To predict such regions, ANCHOR2 scores, produced by IUPred2A, identify disordered protein-binding regions that may gain structure upon binding [23, 24]. For p53, ANCHOR2 predicts binding in multiple IDRs, including the CTD, recapitulating the known disorder-to-order transition of this domain (Fig 2A).

There are several evolutionarily conserved cysteines within p53, most of which are within the central DNA-binding domain (DBD) [42, 64]. Further, p53 has reported roles in redox regulation [42, 65, 66]. To this point, IUPred2A contains a context-dependent disorder predictor that distinguishes between reducing (plus) and oxidizing (minus) environments and can be used to predict redox-sensitive IDRs that may undergo induced folding [23, 24]. IUPred2A Redox predicts that p53 has multiple regions of redox sensitivity in the DBD (Fig 2B). This is consistent with the known impact of redox conditions on the structure, and the subsequent function, of the p53 DBD [64, 65, 67, 68]. The structural tendency of each residue in p53 is highlighted in a sequence map with domains [60] and conserved Cys residues [42] annotated (Fig 2C). This p53 analysis exemplifies functions within idpr that are not run automatically by the idprofile wrapper function.

C. Example 3 - mouse GCNA

The germ cell nuclear acidic protein (GCNA) is required for male fertility and has roles in repairing DNA-protein crosslinks [69–71]. GCNA has orthologs from single-celled protists to mammals. In most species, the N-terminal half of GCNA is disordered and the C-terminal half contains a SprT-like metalloprotease domain, a zinc finger, and an HMG box. While there appear to be occasional losses of the protease, HMG box, or zinc finger, all GCNA orthologs contain the IDR [70]. The IDR of GCNA is enriched in acidic residues, which contribute to its disordered nature. Interestingly, mouse GCNA lacks all of the structured domains and was previously predicted by IUPred to be entirely disordered [70, 72].

The idprofile of mouse GCNA (UniProt ID: A0A1D9BZF0) resembles that of an unstructured protein (Fig 3A). There is a long stretch of acidic residues, and glutamic acid (E) is the most abundant residue in the amino acid sequence (Fig 3B and 3C). Further, there is a significant enrichment of disorder-promoting residues in mouse GCNA (Fig 3B), consistent with the previously reported finding that the amino acid composition of GCNA is similar to that of DisProt, a database of intrinsically disordered proteins [70, 73]. There are very few hydrophobic residues, and the protein has an average scaled hydropathy of 0.348 (Fig 3D). Both the extreme acidity of GCNA and the enrichment of soluble, disorder-promoting residues contribute to the entire peptide being predicted as disordered from N- to C-terminus (Fig 3E and 3F). This replicates previously reported predictions that mouse GCNA is disordered [70].

Fig 3. The idprofile of mouse GCNA shows prediction of an entirely disordered IDP.

(A) Charge-hydropathy plot of GCNA predicts a disordered protein. Methods are described in [16]. Mean scaled hydropathy was calculated with the Kyte and Doolittle measurement of hydropathy [33], scaled to Arg = 0.0 and Ile = 1.0. Mean net charge was calculated with IPC_protein pKa values [34]. The boundary is <Charge> = ±(2.785<Hydropathy> − 1.151), as described previously [18]. Proteins are considered insoluble when <Hydropathy> ≥ 0.7. (B) Structural tendency plot shows GCNA is enriched in disorder-promoting residues. Disorder-promoting residues (P, E, S, Q, K, A, and G) in green; order-promoting residues (M, N, V, H, L, F, Y, I, W, and C) in purple; disorder‐neutral residues (D, T, and R) in pink [35]. (C) Local Charge Plot shows an acidic C-terminus. The local charge is the average over a 7-residue sliding window, calculated with the IPC_protein pKa values [34]. (D) Local Hydropathy Plot shows a C-terminus deficient in hydrophobic residues. The local hydropathy is the average over a 9-residue sliding window, calculated with the scaled Kyte and Doolittle measurement of hydropathy [33]. (E) IUPred2 predicts that GCNA is disordered along its entire length [23, 24]. Residues with a score of 0.0–0.5 are predicted to be ordered; residues with a score of 0.5–1.0 are predicted to be disordered. (F) FoldIndex predicts that GCNA is disordered along its entire length [36]. Residues with a score from 0.0 to +1.0 are predicted to be ordered; residues with a score from −1.0 to 0.0 are predicted to be disordered.

Conclusion

We have created an integrated R package that combines disorder prediction tools, hydropathy, and amino acid composition to facilitate the characterization of IDPs. Charge repulsion and hydrophobic deficiency are hallmark characteristics of an IDP or IDR [18]. The idpr package contains distinct, customizable methods for calculating charge and hydropathy for a protein sequence of interest. The output is a visually accessible, graphical readout of critical parameters for IDP analysis. We have validated the use of this tool with α-Synuclein, p53, and GCNA.

A significant portion of the eukaryotic proteome is thought to contain IDRs, but our understanding of these domains is still limited. In some cases, these domains serve as bridges between two structured domains [74]. In others, like p53, the IDR attains different structures with different protein partners [3, 75]. In yet others, the IDRs support liquid-liquid phase separation [76]. In most cases, the role of the IDR is unknown. By providing an integrative tool for characterizing these domains, we envision idpr as a platform for finding commonalities between IDPs and for subdividing these protein families.

Supporting information

S1 File. This contains a list of abbreviations and S1 and S2 Figs.

(DOCX)

S2 File. The code used to generate all graphics presented in this manuscript.

(PDF)

Acknowledgments

We would like to acknowledge the support of Dr. Michael Buszczak during the development of the package. Additionally, we would like to thank Dr. Miguel Brieño-Enríquez and the members of the Brieño-Enríquez and Yanowitz labs for feedback during package development. For essential guidance on building the package, we would like to acknowledge the book R Packages by Hadley Wickham (O’Reilly) ©2015 Hadley Wickham, ISBN: 978-1-491-91059-7.

Data Availability

idpr is available for download at: https://doi.org/10.18129/B9.bioc.idpr.

Funding Statement

This work was funded by the National Institutes of Health, grant #R01GM127569 to Judith Yanowitz. https://public.era.nih.gov/commonsplus/. The funders had no role, and will have no role, in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Dunker AK, Lawson JD, Brown CJ, Williams RM, Romero P, Oh JS, et al. Intrinsically disordered protein. Journal of Molecular Graphics and Modelling. 2001;19(11):26–59. doi: 10.1016/s1093-3263(00)00138-8 [DOI] [PubMed] [Google Scholar]
  • 2.Tompa P. Intrinsically unstructured proteins. Trends in biochemical sciences. 2002;27(10):527–33. doi: 10.1016/s0968-0004(02)02169-2 [DOI] [PubMed] [Google Scholar]
  • 3.Uversky VN. Intrinsically disordered proteins from A to Z. The International Journal of Biochemistry & Cell Biology. 2011;43(8):1090–103. doi: 10.1016/j.biocel.2011.04.001 [DOI] [PubMed] [Google Scholar]
  • 4.Wright PE, Dyson HJ. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. Journal of Molecular Biology. 1999;293(22):321–31. doi: 10.1006/jmbi.1999.3110 [DOI] [PubMed] [Google Scholar]
  • 5.Uversky VN, Oldfield CJ, Midic U, Xie H, Xue B, Vucetic S, et al. Unfoldomics of human diseases: linking protein intrinsic disorder with diseases. BMC Genomics. 2009;10(11):S7. doi: 10.1186/1471-2164-10-S1-S7 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Uversky VN, Oldfield CJ, Dunker AK. Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu Rev Biophys. 2008;37:215–46. doi: 10.1146/annurev.biophys.37.032807.125924 [DOI] [PubMed] [Google Scholar]
  • 7.Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN, Dunker AK. Intrinsic Disorder and Functional Proteomics. Biophysical Journal. 2007;92(55):1439–56. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Pancsa R, Tompa P. Structural disorder in eukaryotes. PLoS One. 2012;7(44):e34687–e. doi: 10.1371/journal.pone.0034687 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Xue B, Dunker AK, Uversky VN. Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life. Journal of Biomolecular Structure and Dynamics. 2012;30(22):137–49. doi: 10.1080/07391102.2012.675145 [DOI] [PubMed] [Google Scholar]
  • 10.Kulkarni P, Uversky VN. Intrinsically Disordered Proteins: The Dark Horse of the Dark Proteome. PROTEOMICS. 2018;18(21–22):1800061. doi: 10.1002/pmic.201800061 [DOI] [PubMed] [Google Scholar]
  • 11.Niklas KJ, Dunker AK, Yruela I. The evolutionary origins of cell type diversification and the role of intrinsically disordered proteins. Journal of Experimental Botany. 2018;69(77):1437–46. doi: 10.1093/jxb/erx493 [DOI] [PubMed] [Google Scholar]
  • 12.Kurgan L, Radivojac P, Sussman JL, Dunker AK. On the Importance of Computational Biology and Bioinformatics to the Origins and Rapid Progression of the Intrinsically Disordered Proteins Field. Biocomputing 2020. 2019:149–58. [Google Scholar]
  • 13.Li J, Feng Y, Wang X, Li J, Liu W, Rong L, et al. An Overview of Predictors for Intrinsically Disordered Proteins over 2010–2014. Int J Mol Sci. 2015;16(10):23446–62. doi: 10.3390/ijms161023446 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Meng F, Uversky V, Kurgan L. Computational prediction of intrinsic disorder in proteins. Current protocols in protein science. 2017;88(11):2.16.01–14. doi: 10.1002/cpps.28 [DOI] [PubMed] [Google Scholar]
  • 15.Liu Y, Wang X, Liu B. A comprehensive review and comparison of existing computational methods for intrinsically disordered protein and region prediction. Briefings in Bioinformatics. 2017;20(11):330–46. [DOI] [PubMed] [Google Scholar]
  • 16.Uversky VN, Gillespie JR, Fink AL. Why are “natively unfolded” proteins unstructured under physiologic conditions? Proteins: Structure, Function, and Bioinformatics. 2000;41(33):415–27. doi: [DOI] [PubMed] [Google Scholar]
  • 17.Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, et al. Evolutionary rate heterogeneity in proteins with long disordered regions. Journal of molecular evolution. 2002;55(11):104. doi: 10.1007/s00239-001-2309-6 [DOI] [PubMed] [Google Scholar]
  • 18.Uversky VN. Intrinsically Disordered Proteins and Their “Mysterious” (Meta)Physics. Frontiers in Physics. 2019;7(10). [Google Scholar]
  • 19.Forcelloni S, Giansanti A. Evolutionary Forces and Codon Bias in Different Flavors of Intrinsic Disorder in the Human Proteome. Journal of Molecular Evolution. 2020;88(22):164–78. doi: 10.1007/s00239-019-09921-4 [DOI] [PubMed] [Google Scholar]
  • 20.Trivedi R, Nagarajaram HA. Amino acid substitution scoring matrices specific to intrinsically disordered regions in proteins. Scientific Reports. 2019;911):16380. doi: 10.1038/s41598-019-52532-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Brown CJ, Johnson AK, Daughdrill GW. Comparing Models of Evolution for Ordered and Disordered Proteins. Molecular Biology and Evolution. 2009;27[33]:609–21. doi: 10.1093/molbev/msp277 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Radivojac P, Obradovic Z, Brown CJ, Dunker AK. Improving sequence alignments for intrinsically disordered proteins. Biocomputing 2002: World Scientific; 2001. p. 589–600. [PubMed] [Google Scholar]
  • 23.Mészáros B, Erdős G, Dosztányi Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic acids research. 2018;46(W1):W329–W37. doi: 10.1093/nar/gky384 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Erdős G, Dosztányi Z. Analyzing Protein Disorder with IUPred2A. Current Protocols in Bioinformatics. 2020;70(11):e99. doi: 10.1002/cpbi.99 [DOI] [PubMed] [Google Scholar]
  • 25.Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer-Verlag; 2016. [Google Scholar]
  • 26.R Core Team. R: A language and environment for statistical computing. 3.6.1 ed. Vienna, Austria: R Foundation for Statistical Computing,; 2019. [Google Scholar]
  • 27.Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome biology. 2004;5(10):R80. doi: 10.1186/gb-2004-5-10-r80 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Pagès H, Aboyoun P, Gentleman R, DebRoy S. Biostrings: Efficient manipulation of biological strings. R package version 2.60.1. 2021. [Google Scholar]
  • 29.Ooms J. The jsonlite package: A practical and consistent mapping between json data and r objects. arXiv [Preprint] arXiv:14032805. 2014. [Google Scholar]
  • 30.Wickham H, Averick M, Bryan J, Chang W, McGowan LDA, François R, et al. Welcome to the Tidyverse. Journal of open source software. 2019;4(43):1686. [Google Scholar]
  • 31.Wickham H, Danenberg P, Csárdi G, Eugster M. roxygen2: In-Line Documentation for R. R Package version 7.1.1. 2020. [Google Scholar]
  • 32.Morgan M. BiocManager: access the Bioconductor project package repository. R package version 1.30.16. 2021. [Google Scholar]
  • 33.Kyte J, Doolittle RF. A simple method for displaying the hydropathic character of a protein. Journal of molecular biology. 1982;157(1):105–32. doi: 10.1016/0022-2836(82)90515-0
  • 34.Kozlowski LP. IPC–Isoelectric Point Calculator. Biology Direct. 2016;11(1):55. doi: 10.1186/s13062-016-0159-9
  • 35.Uversky VN. Unusual biophysics of intrinsically disordered proteins. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics. 2013;1834(5):932–51. doi: 10.1016/j.bbapap.2012.12.008
  • 36.Prilusky J, Felder CE, Zeev-Ben-Mordehai T, Rydberg EH, Man O, Beckmann JS, et al. FoldIndex©: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics. 2005;21(16):3435–8. doi: 10.1093/bioinformatics/bti537
  • 37.Uversky VN. Paradoxes and wonders of intrinsic disorder: Complexity of simplicity. Intrinsically Disordered Proteins. 2016;4(1):e1135015. doi: 10.1080/21690707.2015.1135015
  • 38.Po HN, Senozan N. The Henderson-Hasselbalch equation: its history and limitations. Journal of Chemical Education. 2001;78(11):1499.
  • 39.Gasteiger E, Hoogland C, Gattiker A, Wilkins MR, Appel RD, Bairoch A. Protein identification and analysis tools on the ExPASy server. The proteomics protocols handbook: Springer; 2005. p. 571–607.
  • 40.Theillet F-X, Kalmar L, Tompa P, Han K-H, Selenko P, Dunker AK, et al. The alphabet of intrinsic disorder: I. Act like a Pro: On the abundance and roles of proline residues in intrinsically disordered proteins. Intrinsically disordered proteins. 2013;1(1):e24360.
  • 41.Dosztányi Z. Prediction of protein disorder based on IUPred. Protein Sci. 2018;27(1):331–40. doi: 10.1002/pro.3334
  • 42.Sun Y, Oberley LW. Redox regulation of transcriptional activators. Free Radical Biology and Medicine. 1996;21(3):335–48. doi: 10.1016/0891-5849(96)00109-8
  • 43.Franzosa EA, Xia Y. Structural Determinants of Protein Evolution Are Context-Sensitive at the Residue Level. Molecular Biology and Evolution. 2009;26(10):2387–95. doi: 10.1093/molbev/msp146
  • 44.Dayhoff M, Schwartz R, Orcutt B. 22 a model of evolutionary change in proteins. Atlas of protein sequence and structure. 5: National Biomedical Research Foundation Silver Spring MD; 1978. p. 345–52.
  • 45.Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 1992;89(22):10915–9. doi: 10.1073/pnas.89.22.10915
  • 46.Johnson M, Zaretskaya I, Raytselis Y, Merezhuk Y, McGinnis S, Madden TL. NCBI BLAST: a better web interface. Nucleic Acids Research. 2008;36(suppl_2):W5–W9. doi: 10.1093/nar/gkn201
  • 47.Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, et al. The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic acids research. 2019;47(W1):W636–W41. doi: 10.1093/nar/gkz268
  • 48.Weinreb PH, Zhen W, Poon AW, Conway KA, Lansbury PT. NACP, a protein implicated in Alzheimer’s disease and learning, is natively unfolded. Biochemistry. 1996;35(43):13709–15. doi: 10.1021/bi961799n
  • 49.Theillet F-X, Binolfi A, Bekei B, Martorana A, Rose HM, Stuiver M, et al. Structural disorder of monomeric α-synuclein persists in mammalian cells. Nature. 2016;530(7588):45–50. doi: 10.1038/nature16531
  • 50.Fauvet B, Mbefo MK, Fares M-B, Desobry C, Michael S, Ardah MT, et al. α-Synuclein in central nervous system and from erythrocytes, mammalian cells, and Escherichia coli exists predominantly as disordered monomer. J Biol Chem. 2012;287(19):15345–64. doi: 10.1074/jbc.M111.318949
  • 51.Eliezer D, Kutluay E, Bussell R Jr, Browne G. Conformational properties of α-synuclein in its free and lipid-associated states. Journal of molecular biology. 2001;307(4):1061–73. doi: 10.1006/jmbi.2001.4538
  • 52.Kim D-H, Lee J, Mok KH, Lee JH, Han K-H. Salient Features of Monomeric Alpha-Synuclein Revealed by NMR Spectroscopy. Biomolecules. 2020;10(3):428. doi: 10.3390/biom10030428
  • 53.Lücking CB, Brice A. Alpha-synuclein and Parkinson’s disease. Cellular and Molecular Life Sciences CMLS. 2000;57(13):1894–908. doi: 10.1007/PL00000671
  • 54.Xu L, Pu J. Alpha-Synuclein in Parkinson’s Disease: From Pathogenetic Dysfunction to Potential Clinical Application. Parkinson’s Disease. 2016;2016:1720621. doi: 10.1155/2016/1720621
  • 55.Kasten M, Klein C. The many faces of alpha‐synuclein mutations. Movement Disorders. 2013;28(6):697–701. doi: 10.1002/mds.25499
  • 56.Flagmeier P, Meisl G, Vendruscolo M, Knowles TPJ, Dobson CM, Buell AK, et al. Mutations associated with familial Parkinson’s disease alter the initiation and amplification steps of α-synuclein aggregation. Proceedings of the National Academy of Sciences. 2016;113(37):10328. doi: 10.1073/pnas.1604645113
  • 57.Fujioka S, Ogaki K, Tacik PM, Uitti RJ, Ross OA, Wszolek ZK. Update on novel familial forms of Parkinson’s disease and multiple system atrophy. Parkinsonism & Related Disorders. 2014;20:S29–S34. doi: 10.1016/S1353-8020(13)70010-5
  • 58.Hollstein M, Sidransky D, Vogelstein B, Harris CC. p53 mutations in human cancers. Science. 1991;253(5015):49–53. doi: 10.1126/science.1905840
  • 59.Surget S, Khoury MP, Bourdon J-C. Uncovering the role of p53 splice variants in human malignancy: a clinical perspective. OncoTargets and therapy. 2014;7:57.
  • 60.Kannan S, Lane DP, Verma CS. Long range recognition and selection in IDPs: the interactions of the C-terminus of p53. Scientific reports. 2016;6:23750. doi: 10.1038/srep23750
  • 61.Vise PD, Baral B, Latos AJ, Daughdrill GW. NMR chemical shift and relaxation measurements provide evidence for the coupled folding and binding of the p53 transactivation domain. Nucleic acids research. 2005;33(7):2061–77. doi: 10.1093/nar/gki336
  • 62.Wells M, Tidow H, Rutherford TJ, Markwick P, Jensen MR, Mylonas E, et al. Structure of tumor suppressor p53 and its intrinsically disordered N-terminal transactivation domain. Proc Natl Acad Sci U S A. 2008;105(15):5762–7. doi: 10.1073/pnas.0801353105
  • 63.Sullivan KD, Galbraith MD, Andrysik Z, Espinosa JM. Mechanisms of transcriptional regulation by p53. Cell Death & Differentiation. 2018;25(1):133–43.
  • 64.Buzek J, Latonen L, Kurki S, Peltonen K, Laiho M. Redox state of tumor suppressor p53 regulates its sequence-specific DNA binding in DNA-damaged cells by cysteine 277. Nucleic acids research. 2002;30(11):2340–8. doi: 10.1093/nar/30.11.2340
  • 65.Eriksson SE, Ceder S, Bykov VJN, Wiman KG. p53 as a hub in cellular redox regulation and therapeutic target in cancer. Journal of Molecular Cell Biology. 2019;11(4):330–41. doi: 10.1093/jmcb/mjz005
  • 66.Maillet A, Pervaiz S. Redox Regulation of p53, Redox Effectors Regulated by p53: A Subtle Balance. Antioxidants & Redox Signaling. 2011;16(11):1285–94.
  • 67.Hainaut P, Milner J. Redox Modulation of p53 Conformation and Sequence-specific DNA Binding in Vitro. Cancer Research. 1993;53(19):4469.
  • 68.Rainwater R, Parks D, Anderson ME, Tegtmeyer P, Mann K. Role of cysteine residues in regulation of p53 function. Molecular and Cellular Biology. 1995;15(7):3892. doi: 10.1128/MCB.15.7.3892
  • 69.Bhargava V, Goldstein CD, Russell L, Xu L, Ahmed M, Li W, et al. GCNA Preserves Genome Integrity and Fertility Across Species. Developmental Cell. 2020;52(1):38–52.e10. doi: 10.1016/j.devcel.2019.11.007
  • 70.Carmell MA, Dokshin GA, Skaletsky H, Hu Y-C, van Wolfswinkel JC, Igarashi KJ, et al. A widely employed germ cell marker is an ancient disordered protein with reproductive functions in diverse eukaryotes. eLife. 2016;5:e19993. doi: 10.7554/eLife.19993
  • 71.Borgermann N, Ackermann L, Schwertman P, Hendriks IA, Thijssen K, Liu JC, et al. SUMOylation promotes protective responses to DNA-protein crosslinks. The EMBO Journal. 2019;38(8):e101496. doi: 10.15252/embj.2019101496
  • 72.Dosztányi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics. 2005;21(16):3433–4. doi: 10.1093/bioinformatics/bti541
  • 73.Quaglia F, Mészáros B, Salladini E, Hatos A, Pancsa R, Chemes LB, et al. DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation. Nucleic Acids Research. 2021;50(D1):D480–D7.
  • 74.Basile W, Salvatore M, Bassot C, Elofsson A. Why do eukaryotic proteins contain more intrinsically disordered regions? PLOS Computational Biology. 2019;15(7):e1007186. doi: 10.1371/journal.pcbi.1007186
  • 75.Warren C, Shechter D. Fly Fishing for Histones: Catch and Release by Histone Chaperone Intrinsically Disordered Regions and Acidic Stretches. J Mol Biol. 2017;429(16):2401–26. doi: 10.1016/j.jmb.2017.06.005
  • 76.McCarty J, Delaney KT, Danielsen SPO, Fredrickson GH, Shea J-E. Complete Phase Diagram for Liquid–Liquid Phase Separation of Intrinsically Disordered Proteins. The Journal of Physical Chemistry Letters. 2019;10(8):1644–52. doi: 10.1021/acs.jpclett.9b00099

Decision Letter 0

Eugene A Permyakov

7 Jan 2022

PONE-D-21-38609

idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R

PLOS ONE

Dear Dr. Yanowitz,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

ACADEMIC EDITOR: Please try to improve your manuscript according to the reviewers' criticism. It would be good to add some more examples of successful predictions carried out by your software package.

==============================

Please submit your revised manuscript by Feb 17 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Eugene A. Permyakov, Ph.D., Dr.Sci.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. To comply with PLOS ONE submissions requirements, please provide the Protocols.io DOI in the Methods section of the manuscript using this format: “The protocol described in this peer-reviewed article is published on protocols.io, https://dx.doi.org/10.17504/protocols.io[........] and is included for printing as supporting information file 1 with this article.” Please also provide the Protocols.io DOI in the “Protocol DOI” field of the submission form (via “Edit Submission”). For more information, please see our submission guidelines:  https://journals.plos.org/plosone/s/submission-guidelines#loc-guidelines-for-specific-study-types.

3. We note you have not provided a Protocols.io PDF version of your protocol. As noted in our submission requirements, please upload a Protocols.io PDF version of your protocol as a Supporting Information file and name the file ‘S1 file’. Please update your Supporting Information Captions if necessary. If you have not yet uploaded your protocol to Protocols.io, you are welcome to use the Protocols.io customer service code ‘PLOS2021.’ When using this customer code while submitting to Protocols.io, please make reference to your PLOS ONE submission, including your PLOS ONE manuscript number. With this customer code, Protocols.io editorial staff will import and format your protocol at no charge. For more information, please see our submission guidelines: https://journals.plos.org/plosone/s/submission-guidelines#loc-guidelines-for-specific-study-types.

4. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections does not match.

When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

5. We noticed some minor occurrences of overlapping text with the following previous publication(s), which need to be addressed:

- https://rdrr.io/bioc/idpr/f/inst/doc/idpr-vignette.Rmd

In your revision ensure you cite all your sources (including your own works), and quote or rephrase any duplicated text outside the methods section. Further consideration is dependent on these concerns being addressed.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does the manuscript report a protocol which is of utility to the research community and adds value to the published literature?

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the protocol been described in sufficient detail?

Descriptions of methods and reagents contained in the step-by-step protocol should be reported in sufficient detail for another researcher to reproduce all experiments and analyses. The protocol should describe the appropriate controls, sample sizes and replication needed to ensure that the data are robust and reproducible.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Does the protocol describe a validated method?

The manuscript must demonstrate that the protocol achieves its intended purpose: either by containing appropriate validation data, or referencing at least one original research article in which the protocol was used to generate data.

Reviewer #1: No

Reviewer #2: Yes

**********

4. If the manuscript contains new data, have the authors made this data fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: N/A

**********

5. Is the article presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please highlight any specific errors that need correcting in the box below.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript PONE-D-21-38609 presents “idpr”, a software package aimed at the analysis of sequence properties of Intrinsically Disordered Proteins (IDPs) and unfolded protein regions, written by using the programming language R (one of the most common for statistical computing, data mining, and graphical representation). Following the general practice/tradition of reports of software packages, the manuscript illustrates the main features of “idpr” in a succinct way, includes test cases for two well-known IDPs (alpha-synuclein and p53), and refers to the program at the internet address https://bioconductor.org/packages/idpr/ for further info; a workflow including instructions for the installation is also provided as Supporting Information.

This work is interesting and provides a useful tool, because IDPs are currently widely investigated; as explained in the text, they are increasingly recognized as ubiquitously present in all organisms and heavily involved in many pathologies. The language used is concise and direct, and the presentation is clear enough. Overall, it can be improved as described hereafter.

Major points:

- The two test cases presented both involve IDPs that possess large ordered regions. Alpha-synuclein in its functional conformation has 66% of its sequence in helical conformation, and such secondary structure is even folded in a sort of tertiary hairpin structure. Similarly, p53 has a high structural plasticity, but 63% of the sequence adopts a (labile) secondary and tertiary structure. A test case for an entirely unfolded IDP is lacking, and must be added to fully validate the software/protocol presented.

Any fully unstructured IDP can be chosen for this aim. For instance, NUPR1 (UniProt ID: O60356) is a small protein of 82 residues with 0% secondary/tertiary structure (in spite of the wrong AlphaFold prediction shown on the UniProt page). This protein remains disordered even in molecular complexes and is thus considered a model for a “perfect” IDP. Peaks in the hydropathy plot of NUPR1 nicely identify hot spots for binding to molecular partners that include other (folded/unfolded) proteins, peptides, DNA, inorganic polymers, and various drugs; thus it constitutes an interesting and very easy test case. Other proteins, off the top of my head, may include p21, prothymosin alpha, and perhaps even some unstructured proteins of SARS-CoV-1/2.

Minor points:

- Abstract: please consider mentioning https://bioconductor.org/packages/idpr/ already there or, alternatively, as soon as possible.

- Line 27: “these proteins have been implicated in several human diseases such as Parkinson’s Disease, Alzheimer’s Disease, and various cancers (5-7)”. Given the enormous progress of the research in this field, it is disappointing to see that the only papers cited are 10-15 years old. Please consider substituting or complementing them with updated references.

- Line 46: “the R package idpr stands for a few things: “Intrinsically Disordered Proteins in R” and “IDp PRofiles””. This seems to contradict the Abstract, where a single meaning of the acronym is given. Please correct either the Abstract or, more easily, the text, e.g. “stands for “Intrinsically Disordered Proteins in R”, although other acronyms such as “IDp PRofiles” are possible”.

- Line 64: “fasta files”. Please call it either FASTA (the correct name for the format) or .fasta (its file extension).

- Line 67: “tidverse packages”. Typo, it is “tidyverse”.

- Line 77: “If a UniPot ID is not included, the IUPred plot is skipped”. What happens if one wants to investigate a sequence that does not have a UniProt ID, e.g. a new mutant of a known sequence? If this is already possible, it should be explained in the text; otherwise, it should be addressed in a future release of “idpr”.

- Line 81: “Method described in (16). Mean Scaled Hydropathy calculated with The Kyte and Doolittle measurement of hydropathy”. It should be “Methods are described...”, and “the Kyte and Doolittle” (lowercase “the”).

- Line 98: “extreme net charge and deficiency in hydropathy are characteristics of intrinsic disorder (16)”. It should be specified “intrinsic disorder in proteins” (or polypeptides).

- Line 109: “ The resulting figure is similar to ProtScale from ExPASy (38)”. It should be a bit expanded, e.g. “similar to the one that can be obtained by using the ProtScale tool from ExPASy”.

- Line 116: “Order promoting residues, meaning those enriched in structured proteins, tend to be aliphatic, hydrophobic, aromatic, or can form tertiary structures”. It should be “those more frequent in structured proteins” (structured proteins are enriched in these residues, not the other way around). Also, it should be “prone to form secondary/tertiary structures” (any residue “can form” such structures, with a few exceptions such as prolines in secondary structures; the difference is again in the frequency).

- Line 117: “Disorder neutral residues”. It should be “Disorder-neutral residues”, the hyphen is crucial. Please also delete the space in “disorder- neutral residues”, three lines below.

- Line 139: “Residues with an IUPred2 long score”. The term “long” is unclear.

- Line 162: “This function can also visualize continuous values”. I cannot understand why it is important to specify this, or imagine an example of a use of such continuous values.

- Line 181: “aSyn is an IDP, experimentally determined using various methods”. Please use “investigated”, “validated”, “identified”, etc., instead of “determined”.

- Line 189: “Although the protein is deficient in Arg, the local charges of the protein are mostly neutral, apart from...”. I do not see the point of using “Although”.

- Line 201: “identifying biochemical features related to IDPs within a protein of interest”. Either “IDRs” was meant instead of “IDPs”, or the phrase is not clear.

- Discussion. This looks more like a Conclusion to me; maybe it can be slightly enlarged.

- Acknowledgments: It would be nice to state explicitly the names of Dr. M. Buszczak and Dr. M. Brieno-Enriquez.

Reviewer #2: The manuscript describes a relatively simple but useful tool for profiling and analysis of intrinsically disordered proteins. The tool is available in R and provides quick analysis and visualization of several relatively basic sequence properties of IDPs. These tools should be useful for someone who needs to do a quick analysis of these sequence-based properties. However, I have some reservations about how useful this tool will be. These properties are quite basic and the targeted users are probably non-experts; yet the tool requires some proficiency in R and scripting to use. It would appear to be much more useful to have a web-based interface for the targeted users. I also recommend that the authors build interfaces to other IDP analysis tools besides IUPred, such as Rohit Pappu's CIDER and others.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Apr 18;17(4):e0266929. doi: 10.1371/journal.pone.0266929.r002

Author response to Decision Letter 0


28 Mar 2022

Response to Reviewers

We thank both reviewers for their time and thoughtful comments. They have helped to improve the manuscript and the functionality of the tool itself. Please find below our detailed responses to each point.

Reviewer #1: The manuscript PONE-D-21-38609 presents “idpr”, a software package aimed at the analysis of sequence properties of Intrinsically Disordered Proteins (IDPs) and unfolded protein regions, written by using the programming language R (one of the most common for statistical computing, data mining, and graphical representation). Following the general practice/tradition of reports of software packages, the manuscript illustrates the main features of “idpr” in a succinct way, includes test cases for two well-known IDPs (alpha-synuclein and p53), and refers to the program at the internet address https://bioconductor.org/packages/idpr/ for further info; a workflow including instructions for the installation is also provided as Supporting Information.

This work is interesting and provides a useful tool, because IDPs are currently widely investigated; as explained in the text, they are increasingly recognized as ubiquitously present in all organisms and heavily involved in many pathologies. The language used is concise and direct, and the presentation is clear enough. Overall, it can be improved as described hereafter.

Major points:

- The two test cases presented both involve IDPs that possess large ordered regions. Alpha-synuclein in its functional conformation has 66% of its sequence in helical conformation, and such secondary structure is even folded in a sort of tertiary hairpin structure. Similarly, p53 has a high structural plasticity, but 63% of the sequence adopts a (labile) secondary and tertiary structure. A test case for an entirely unfolded IDP is lacking, and must be added to fully validate the software/protocol presented.

Any fully unstructured IDP can be chosen for this aim. For instance, NUPR1 (UniProt ID: O60356) is a small protein of 82 residues with 0% secondary/tertiary structure (in spite of the wrong AlphaFold prediction shown on the UniProt page). This protein remains disordered even in molecular complexes and is thus considered a model for a “perfect” IDP. Peaks in the hydropathy plot of NUPR1 nicely identify hot spots for binding to molecular partners that include other (folded/unfolded) proteins, peptides, DNA, inorganic polymers, and various drugs; thus it constitutes an interesting and very easy test case. Other proteins, off the top of my head, may include p21, prothymosin alpha, and perhaps even some unstructured proteins of SARS-CoV-1/2.

We thank the reviewer for their thoughtful comments and their consideration of our manuscript. Your main critique is an excellent point; thank you for this suggestion. We have added a section describing a protein that our lab is currently investigating. As described in the updated manuscript, the Germ Cell Nuclear Acidic protein (GCNA) from Mus musculus has previously been reported as, and is predicted to be, entirely disordered. This protein also showcases the relationship between charge and hydropathy previously discussed in the manuscript.

Minor points:

- Abstract: please consider mentioning https://bioconductor.org/packages/idpr/ already there or, alternatively, as soon as possible.

We have added the URL to the abstract.

- Line 27: “these proteins have been implicated in several human diseases such as Parkinson’s Disease, Alzheimer’s Disease, and various cancers (5-7)”. Given the enormous progress of the research in this field, it is disappointing to see that the only papers cited are 10-15 years old. Please consider substituting or complementing them with updated references.

We have added reference Zbinden, A., Pérez-Berlanga, M., De Rossi, P., and Polymenidou, M. (2020) Phase Separation and Neurodegenerative Diseases: A Disturbance in the Force, Developmental Cell 55, 45-68.

- Line 46: “the R package idpr stands for a few things: “Intrinsically Disordered Proteins in R” and “IDp PRofiles””. This seems to contradict the Abstract, where a single meaning of the acronym is given. Please correct either the Abstract or, more easily, the text, e.g. “stands for “Intrinsically Disordered Proteins in R”, although other acronyms such as “IDp PRofiles” are possible”.

Addressed by removing multiple acronyms.

- Line 64: “fasta files”. Please call it either FASTA (the correct name for the format) or .fasta (its file extension).

Addressed.

- Line 67: “tidverse packages”. Typo, it is “tidyverse”.

This is not a typo: several idpr package dependencies, namely ggplot2 (>= 3.3.0), magrittr (>= 1.5), dplyr (>= 0.8.5), and plyr (>= 1.8.6), are tidyverse packages (https://www.tidyverse.org/packages/). idpr does not require the user to have the entire tidyverse installed.

- Line 77: “If a UniPot ID is not included, the IUPred plot is skipped”. What happens if one wants to investigate a sequence that does not have a UniProt ID, e.g. a new mutant of a known sequence? If this is already possible, it should be explained in the text; otherwise, it should be addressed in a future release of “idpr”.

The IUPred2A web interface allows users to submit any custom sequence; however, this cannot be done from R. The REST API of the IUPred2A server requires a UniProt ID to fetch a prediction, so IUPred2A predictions within idpr are limited to proteins that have a UniProt ID. Within the idpr package, the help pages of the iupred(), iupredAnchor(), and iupredRedox() functions and the idpr user manual contain a URL directing users to the IUPred2A website. Additionally, during revisions we implemented a second method to predict intrinsically disordered regions, FoldIndex, which does not require a UniProt ID and predicts disorder entirely within R.
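To illustrate why FoldIndex needs no UniProt ID: as published by Prilusky et al. (2005), it scores each sliding window of the sequence as FoldIndex = 2.785<H> - |<R>| - 1.151, where <H> is the mean Kyte-Doolittle hydropathy rescaled to the range 0 to 1 and <R> is the mean net charge; negative scores suggest disorder. The base-R sketch below illustrates that published formula only. It is not idpr's implementation: the function name `fold_index_sketch` is hypothetical, and the 51-residue default window and the simple charge model (Lys/Arg +1, Asp/Glu -1, His neutral) are assumptions.

```r
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical helper (not part of idpr): FoldIndex score for every full window
fold_index_sketch <- function(sequence, window = 51) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  stopifnot(all(residues %in% names(kd)))
  if (length(residues) < window) stop("sequence is shorter than the window")
  # Rescale hydropathy from [-4.5, 4.5] to [0, 1]
  h <- (kd[residues] + 4.5) / 9
  # Assumed charge model: +1 for K/R, -1 for D/E, 0 otherwise (His neutral)
  q <- ifelse(residues %in% c("K", "R"), 1,
              ifelse(residues %in% c("D", "E"), -1, 0))
  n_windows <- length(residues) - window + 1
  scores <- vapply(seq_len(n_windows), function(i) {
    idx <- i:(i + window - 1)
    2.785 * mean(h[idx]) - abs(mean(q[idx])) - 1.151
  }, numeric(1))
  # Report each window's score at the window's central residue
  centre <- seq_len(n_windows) + (window - 1) %/% 2
  data.frame(Position = centre, AA = residues[centre], FoldIndex = scores)
}
```

Because the input is just a character string, the same call works for a novel mutant that has no UniProt entry, e.g. `fold_index_sketch(my_mutant_sequence)` where `my_mutant_sequence` is a hypothetical variable holding the sequence.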

- Line 81: “Method described in (16). Mean Scaled Hydropathy calculated with The Kyte and Doolittle measurement of hydropathy”. It should be “Methods are described...”, and “the Kyte and Doolittle” (lowercase “the”).

Addressed.

- Line 98: “extreme net charge and deficiency in hydropathy are characteristics of intrinsic disorder (16)”. It should be specified “intrinsic disorder in proteins” (or polypeptides).

Addressed.

- Line 109: “The resulting figure is similar to ProtScale from ExPASy (38)”. It should be expanded a bit, e.g. “similar to the one that can be obtained by using the ProtScale tool from ExPASy”.

Addressed.

- Line 116: “Order promoting residues, meaning those enriched in structured proteins, tend to be aliphatic, hydrophobic, aromatic, or can form tertiary structures”. It should be “those more frequent in structured proteins” (structured proteins are enriched in these residues, not the other way around). Also, it should be “prone to form secondary/tertiary structures” (any residue “can form” such structures, with a few exceptions such as proline in secondary structures; the difference, again, lies in the frequency).

Addressed by adding “sequences of structured proteins” and by changing “tertiary structures” to “disulfide bonds”.

- Line 117: “Disorder neutral residues”. It should be “Disorder-neutral residues”, the hyphen is crucial. Please also delete the space in “disorder- neutral residues”, three lines below.

Addressed.

- Line 139: “Residues with an IUPred2 long score”. The term “long” is unclear.

Addressed by removing “long”.

- Line 162: “This function can also visualize continuous values”. I cannot understand why it is important to specify this, or imagine an example of a use of such continuous values.

This is based on the arguments of ggplot2 aesthetics, which require different color palettes and theme arguments depending on whether discrete or continuous variables are visualized. Compare the sequence map in Figure 2C, which visualizes discrete amino acid labels, with Figure S1B, which visualizes continuous IUPred2 predictions of intrinsic disorder.

- Line 181: “aSyn is an IDP, experimentally determined using various methods”. Please use “investigated”, “validated”, “identified”, etc., instead of “determined”. Addressed.

- Line 189: “Although the protein is deficient in Arg, the local charges of the protein are mostly neutral, apart from...”. I do not see the point of using “Although”. Addressed.

- Line 201: “identifying biochemical features related to IDPs within a protein of interest”. Either it was meant “IDRs” instead of “IDPs”, or it is not clear. Addressed.

- Discussion. This looks more like a Conclusion to me; maybe it can be slightly enlarged.

We have modified the manuscript headings to reflect this change as we agree this section represents our conclusion. Along with the resulting plots for each of the test cases presented, our results section includes discussion of how previous reports are reflected in these results. Thus, we have changed the “Results” section to “Results and Discussion” and changed the “Discussion” section to “Conclusions”.

- Acknowledgments: It would be nice to state explicitly the names of Dr. M. Buszczak and Dr. M. Brieno-Enriquez.

Addressed.

Reviewer #2: The manuscript describes a relatively simple but useful tool for profiling and analysis of intrinsically disordered proteins. The tool is available in R and provides quick analysis and visualization of several relatively basic sequence properties of IDPs. These tools should be useful for someone who needs to do a quick analysis of these sequence-based properties. However, I have some reservations about how useful this tool will be. These properties are quite basic and the targeted users are probably non-experts; yet the tool requires some proficiency in R and scripting to use. It would appear to be much more useful to have a web-based interface for the targeted users. I also recommend that the authors build interfaces to other IDP analysis tools besides IUPred, such as Rohit Pappu's CIDER and others.

We thank the reviewer for their comments and consideration of our manuscript. We have added another prediction method to the idpr package, FoldIndex. It is now publicly available, documented within the idpr package, and described throughout the updated manuscript. We thank you for suggesting that we add another prediction method, as this strengthens the utility of the package.

The goal of idpr is to simplify these calculations for R users of any skill level. We thank you for raising this concern, and we have tried to clarify this point in the manuscript. To this end, the package has extensive documentation, with a 45-page reference manual and six additional vignettes (R’s long-form documentation), to guide inexperienced users. To simplify quick analyses, the idprofile() function serves as a wrapper for key methods within the package and can return six plots with a single line of code, provided the user has the sequence .fasta file and the idpr package installed:

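```{r}
# "path.fasta" and "ID" are placeholders: the path to the FASTA file and,
# optionally, the protein's UniProt accession (without it, IUPred plots are skipped)
idpr::idprofile(sequence = "path.fasta", uniprotAccession = "ID")
```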

Experienced users, however, will not need this wrapper and have many options beyond the default settings. All graphing functions in idpr can be set to return results as a data frame containing the amino acid sequence in one column and the calculated values in a second column. This enables users to perform downstream analyses, run statistical tests, or create their own custom plots. Thus, idpr is both a calculator for IDP characteristics and a tool to quickly visualize these values within R. Many web-based prediction tools exist for users without coding experience; what is novel about idpr is that it is the first such package for the R programming language.
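As a rough illustration of the two-column output described above (residues in one column, calculated values in the other) and of the downstream analysis it enables, the base-R sketch below computes a windowed local net charge and then filters the resulting data frame. It does not call idpr and does not reproduce idpr's own charge calculations; the function name `local_charge_sketch`, the 9-residue default window, the truncated windows at the termini, and the simple charge model (Lys/Arg +1, Asp/Glu -1, His neutral) are all assumptions made for illustration.

```r
# Hypothetical helper (not part of idpr): per-residue local net charge,
# averaged over a centred sliding window and returned as a two-column data frame
local_charge_sketch <- function(sequence, window = 9) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  # Assumed charge model: +1 for K/R, -1 for D/E, 0 otherwise (His neutral)
  charge <- ifelse(residues %in% c("K", "R"), 1,
                   ifelse(residues %in% c("D", "E"), -1, 0))
  half <- (window - 1) %/% 2
  n <- length(residues)
  values <- vapply(seq_len(n), function(i) {
    # The window is truncated at the N- and C-termini
    idx <- max(1, i - half):min(n, i + half)
    mean(charge[idx])
  }, numeric(1))
  data.frame(AA = residues, LocalCharge = values)
}

# Downstream analysis on the returned data frame: keep net-negative positions
charge_df <- local_charge_sketch("KKKKKGGGGGDDDDD", window = 3)
acidic <- charge_df[charge_df$LocalCharge < 0, ]
```

Returning a plain data frame rather than a plot is what lets the result be filtered, tested statistically, or passed to a custom plotting call.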

Attachment

Submitted filename: Response_to_Reviewers.docx

Decision Letter 1

Eugene A Permyakov

30 Mar 2022

idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R

PONE-D-21-38609R1

Dear Dr. Yanowitz,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Eugene A. Permyakov, Ph.D., Dr.Sci.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Eugene A Permyakov

8 Apr 2022

PONE-D-21-38609R1

idpr: A package for profiling and analyzing Intrinsically Disordered Proteins in R

Dear Dr. Yanowitz:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof. Eugene A. Permyakov

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. This contains a list of abbreviations and S1 and S2 Figs.

    (DOCX)

    S2 File. The code used to generate all graphics presented in this manuscript.

    (PDF)

    Attachment

    Submitted filename: Response_to_Reviewers.docx

    Data Availability Statement

    idpr is available for download at: https://doi.org/10.18129/B9.bioc.idpr.


    Articles from PLoS ONE are provided here courtesy of PLOS
