Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions (IDRs) are now recognised as major determinants in cellular regulation. This white paper presents a roadmap for future e-infrastructure developments in the field of IDP research within the ELIXIR framework. The goal of these developments is to drive the creation of high-quality tools and resources to support the identification, analysis and functional characterisation of IDPs. The roadmap is the result of a workshop titled “An intrinsically disordered protein user community proposal for ELIXIR” held at the University of Padua. The workshop, and further consultation with the members of the wider IDP community, identified the key priority areas for the roadmap including the development of standards for data annotation, storage and dissemination; integration of IDP data into the ELIXIR Core Data Resources; and the creation of benchmarking criteria for IDP-related software. Here, we discuss these areas of priority, how they can be implemented in cooperation with the ELIXIR platforms, and their connections to existing ELIXIR Communities and international consortia. The article provides a preliminary blueprint for an IDP Community in ELIXIR and is an appeal to identify and involve new stakeholders.
Keywords: ELIXIR, intrinsically disordered proteins, protein-protein interactions, protein function, databases, community standards, protein dynamics, cellular regulation
Introduction
Intrinsically disordered regions (IDRs), protein segments that lack persistent secondary or tertiary structure ( Chouard, 2011; Dyson & Wright, 2005; Tompa, 2011), are predicted to cover almost a third of the residues in eukaryotic proteomes ( Pancsa & Tompa, 2012; Xue et al., 2012). IDRs play a central role in cell regulation and contribute significantly to the cellular complexity of higher eukaryotes ( Dunker et al., 2008; Dyson & Wright, 2005; Forman-Kay & Mittag, 2013; Gouw et al., 2018; Mitrea & Kriwacki, 2016; Schad et al., 2018; Tompa, 2005; Van Roey et al., 2014). They represent a major source of protein diversity and versatility on the level of organisms and during evolution ( Babu et al., 2012; Buljan et al., 2012; Davey et al., 2015; Light et al., 2013; Weatheritt & Gibson, 2012). In the human proteome, IDRs are expected to contain up to one hundred thousand interaction interfaces and a million sites of post-translational modification ( Tompa et al., 2014). However, to date, only a small fraction of these functional modules have been characterised ( Gouw et al., 2018; Schad et al., 2018).
IDR-mediated interactions are commonly found in dynamic and transient complexes that underlie enzyme inhibition, signal transduction and liquid-liquid phase transition ( Borgia et al., 2018; Ivarsson & Jemth, 2019; Kriwacki et al., 1996; Martin & Mittag, 2018; Mitrea & Kriwacki, 2016; Olsen et al., 2017; Scott & Pawson, 2009; Wright & Dyson, 2015). IDR-mediated interactions are also major determinants of protein regulation, and are so far known to affect: (i) the post-translational modification state of a protein by acting as docking sites to recruit modifying enzymes; (ii) the cellular half-life of a protein by recruiting E3 ubiquitin ligases resulting in ubiquitin-dependent proteasomal degradation of the protein; and (iii) the localisation of a protein by acting as signals that target proteins to specific subcellular locations ( Beltrao et al., 2012; Davey & Morgan, 2016; Gouw et al., 2018; Guharoy et al., 2016; Iakoucheva et al., 2004; Mészáros et al., 2017; Van Roey et al., 2014). Many functions of IDRs are directly associated with their structural attributes, without directly contributing to binding events that result in complex formation; for example, IDRs can act as entropic springs, flexible linkers or spacers ( Tompa, 2005; van der Lee et al., 2014), or fly-casting regions to capture binding partners ( Shoemaker et al., 2000). Furthermore, IDRs are subject to extensive pre- and post-translational regulation to modulate protein function in response to cellular stimuli ( Bah & Forman-Kay, 2016; Csizmok & Forman-Kay, 2018; Van Roey et al., 2013; Van Roey et al., 2012; Weatheritt et al., 2012).
As a result of the fundamental regulatory functions performed by IDRs, and the cell-state conditionality of these regulatory processes, IDRs encode many of the critical steps of a protein life-cycle from the ribosome to the proteasome. Consequently, IDRs play a key role in many human diseases, including cancer (p53), Alzheimer’s disease (Aβ, Tau) and Parkinson’s disease (α-synuclein) ( Shigemitsu & Hiroaki, 2018; Uversky et al., 2008), and human IDR interfaces are often mimicked by pathogens to hijack host pathways and deregulate the cell ( Davey et al., 2011; Dyson & Wright, 2018; Kruse et al., 2019; Via et al., 2015; Xue et al., 2012). Given their therapeutic relevance, IDP-mediated interactions are now seen as potential drug targets ( Corbi-Verge & Kim, 2016). Therefore, a better understanding of their structure and function will help to develop new strategies to fight human diseases.
IDP research spans several experimental fields studying protein structure and function including structural biology, biophysics, biochemistry, cell biology, proteomics, comparative genomics, systems biology, synthetic biology and pharmacology ( Figure 1A) ( Blikstad & Ivarsson, 2015; Corbi-Verge & Kim, 2016; Felli & Pierattelli, 2015; Forman-Kay & Mittag, 2013; Plitzko et al., 2017). On a structural level, IDRs do not adopt a single, stable, highly populated structure. Instead, they are structurally heterogeneous and continuously sample a wide ensemble of conformations, with preferentially sampled intramolecular contacts driving local transient secondary structure and compaction of conformations ( Davey, 2019; Dyson & Wright, 2005; Forman-Kay & Mittag, 2013; Holehouse & Pappu, 2018). Therefore, IDRs are more easily described by probabilistic models than by the intuitive visual representations used for the structures of folded protein regions. Classical methods for structural characterisation of a protein, such as X-ray crystallography, are unable to capture the dynamic structures of IDRs. Instead, the structural aspects of IDRs are studied by a range of biophysical methods including nuclear magnetic resonance (NMR), small-angle X-ray scattering (SAXS), circular dichroism (CD) and Förster resonance energy transfer (FRET) ( Felli & Pierattelli, 2015; Fuertes et al., 2017; Holmstrom et al., 2018; Plitzko et al., 2017; Tompa, 2011; Tolchard et al., 2018). The results of these complementary methods can be integrated to build a model of IDR structure and dynamics.
Figure 1.
( A) Key experimental methods used across the three major research focuses of the IDP field: structure, interactome and function. ( B) The growth in structural IDP data curated by the DisProt database ( Piovesan et al., 2017). DisProt was established over a decade ago in the USA and, after years of inactivity, was recently brought to Europe and completely re-annotated, which explains the lag in the curation. A huge amount of IDP/IDR data in the IDP literature remains uncurated and the vast majority of IDP regions remain to be characterised ( Pancsa & Tompa, 2012; Xue et al., 2012). ( C) The growth in functional IDP data curated by the Eukaryotic Linear Motif (ELM) resource ( Gouw et al., 2018). Similar to the structural IDR data, a huge amount of functional IDP/IDR data in the IDP literature remains uncurated and the vast majority of functional modules in IDPs remain to be characterised ( Tompa et al., 2014).
The functional aspects of IDPs are studied by a battery of well-established cell biology and biophysical approaches ( Gibson et al., 2015) ( Figure 1). Many of the approaches were developed for protein interaction elucidation, for example, by coupling mutagenesis to affinity-purification. However, interactions mediated by IDRs are often defined by their low affinity and cell-state dependent conditionality, two properties that are not always amenable to the available experimental protein interaction detection methods ( Gibson et al., 2015; Van Roey et al., 2014). Therefore, there is a growing number of approaches that have been specifically designed to characterise low affinity IDR-mediated interactions, such as peptide arrays, proteomic phage display (ProP-PD) and peptides attached to Microspheres with Ratiometric Barcode Lanthanide Encoding (MRBLE-pep) ( Blikstad & Ivarsson, 2015; Davey et al., 2017; Nguyen et al., 2017; Volkmer, 2009). As IDRs often adopt a dominant conformation in their bound state, many IDR-containing interfaces can be studied in complex with the structured binding partners using structural approaches including NMR spectroscopy and X-ray crystallography ( Bonetti et al., 2018; Fuxreiter, 2018; Iešmantavicius et al., 2014; Schad et al., 2018). Distinct IDR functionalities are also studied using isolation, deletion or mutagenesis of a functional module combined with bespoke assays to characterise phase separation, protein localisation, stability and post-translational modification ( Gibson et al., 2015; Gouw et al., 2018; Martin & Mittag, 2018).
The computational IDP field tackles several distinct research tasks including: (i) the prediction of IDRs ( Cilia et al., 2014; Dosztányi et al., 2005), (ii) the identification of functional modules in IDRs ( Dosztányi et al., 2009; Edwards & Palopoli, 2015; Krystkowiak & Davey, 2017; Neduva & Russell, 2006), (iii) docking of peptides that are intrinsically disordered in their unbound state ( Raveh et al., 2011; Trabuco et al., 2012), (iv) force field development and molecular simulations for IDR structure ( Best, 2017; Chong et al., 2017; Huang & MacKerell, 2018; Stanley et al., 2015), (v) in silico inhibitor design and development for IDRs ( Baggett & Nath, 2018; Santofimia-Castaño et al., 2019; Yu et al., 2016) and (vi) the processing of the data produced by experimental structural and functional analyses of IDRs ( Bernadó et al., 2007; Franke et al., 2017; Nodet et al., 2009; Ozenne et al., 2012) ( Table 1). The members of the European IDP community are major contributors across these computational IDP fields ( Figure 2).
Table 1. Representative core software of the IDP field developed and hosted in Europe.
Name | Description |
---|---|
IUPred | Prediction of intrinsically disordered regions. URL: https://iupred2a.elte.hu ( Dosztányi et al., 2005) |
ESpritz | Prediction of intrinsically disordered regions. URL: http://protein.bio.unipd.it/espritz/ ( Walsh et al., 2012) |
FoldIndex | Prediction of intrinsically disordered regions. URL: https://fold.weizmann.ac.il/fldbin/findex ( Prilusky et al., 2005) |
MobiDB-lite | Prediction of intrinsically disordered regions. URL: http://protein.bio.unipd.it/mobidblite/ ( Necci et al., 2017) |
DynaMine | Prediction of protein backbone dynamics. URL: https://dynamine.ibsquare.be ( Cilia et al., 2014) |
DiLiMot | Prediction of functional modules in intrinsically disordered regions. URL: http://dilimot.russelllab.org/ ( Neduva & Russell, 2006) |
SLiMSearch | Prediction of functional modules in intrinsically disordered regions. URL: http://slim.ucd.ie/slimsearch/ ( Krystkowiak & Davey, 2017) |
ANCHOR | Prediction of functional modules in intrinsically disordered regions. URL: https://iupred2a.elte.hu ( Dosztányi et al., 2009) |
PepSite 2 | Prediction of IDP binding sites. URL: http://pepsite2.russelllab.org/ ( Trabuco et al., 2012) |
FlexPepDock | Docking of IDPs to their ordered binding partner. URL: http://flexpepdock.furmanlab.cs.huji.ac.il/ ( Raveh et al., 2011) |
δ2D | Secondary structure propensity from NMR data. URL: http://www-mvsoftware.ch.cam.ac.uk/ ( Camilloni et al., 2012) |
ncSPC | Structural propensity calculator from NMR data. URL: http://linuxnmr02.chem.rug.nl/ncSPC ( Tamiola & Mulder, 2012) |
Flexible-Meccano | Generation of ensemble descriptions of IDPs. URL: http://www.ibs.fr/research/scientific-output/software/flexible-meccano/ ( Ozenne et al., 2012) |
ATSAS | A data analysis suite for SAXS data including the EOM tool. URL: https://www.embl-hamburg.de/biosaxs/ ( Bernadó et al., 2007; Franke et al., 2017) |
GROMACS | Molecular Dynamics software applicable to IDPs. URL: http://www.gromacs.org/ ( Hess et al., 2008) |
PLUMED2 | Library for free energy calculations. URL: https://www.plumed.org/ ( Tribello et al., 2014) |
Figure 2. Representative core software ( Table 1, marked with a diamond) and resources ( Table 2) for the IDP field developed and hosted in Europe.
Europe hosts many of the key IDP resources ( Table 2). DisProt is the largest database of manually curated experimental data on IDP structure ( Piovesan et al., 2017; Sickmeier et al., 2007). MobiDB and D2P2 are central resources of pre-computed structural prediction of IDPs ( Piovesan et al., 2018; Oates et al., 2013). ELM is a manually curated database of binding regions residing in IDPs ( Gouw et al., 2018). SASBDB and PCDDB are central resources for Small-angle X-ray Scattering (SAXS) and Circular Dichroism (CD) data ( Valentini et al., 2015; Whitmore et al., 2017). PED3 is a database of conformational ensembles from Nuclear Magnetic Resonance (NMR) and SAXS data, and Molecular Dynamics (MD) simulations ( Varadi et al., 2014). It is important to note that the vast majority of the computational frameworks developed by the IDP field had to be created from scratch given the unique structural and functional aspects of IDRs. For example, the analysis of IDP structure drew predominantly from polymer biophysics rather than the existing structural biology of folded proteins ( Holehouse & Pappu, 2018; Milles et al., 2018; Schuler et al., 2016).
Table 2. Representative core resources of the IDP field developed and hosted in Europe.
Name | Description |
---|---|
DisProt | Curated database of experimentally validated intrinsically disordered regions. URL: http://www.disprot.org ( Piovesan et al., 2017) |
MobiDB | Database of predicted intrinsically disordered regions. URL: http://mobidb.bio.unipd.it ( Piovesan et al., 2018) |
D2P2 | Database of predicted intrinsically disordered regions. URL: http://d2p2.pro/ ( Oates et al., 2013) |
PED3 | Database of IDP conformational ensembles. URL: http://pedb.vib.be/ ( Varadi et al., 2014) |
CheZOD | Database of structural propensities from NMR chemical shift data. URL: http://www.protein-nmr.org/ ( Nielsen & Mulder, 2016) |
SASBDB | Repository for Small-angle X-ray Scattering (SAXS) data. URL: https://www.sasbdb.org/ ( Valentini et al., 2015) |
PCDDB | Repository for Circular Dichroism (CD) data. URL: http://pcddb.cryst.bbk.ac.uk/ ( Whitmore et al., 2017) |
ELM | Curated database of Short, Linear Motifs. URL: http://elm.eu.org/ ( Gouw et al., 2018) |
DIBS | Curated database of intrinsically disordered binding regions. URL: http://dibs.enzim.ttk.mta.hu/ ( Schad et al., 2018) |
Switches.ELM | Curated conditional interactomics database for IDPs. URL: http://switches.elm.eu.org ( Van Roey et al., 2013) |
The ELIXIR IDP Community has grown out of the NGP-NET COST Action (Non-globular proteins: From sequence to structure, function and application in molecular physiopathology), a scientific cooperation funded in 2015 under a Horizon 2020 EU Framework Programme. The NGP-NET community spanned 30 different countries, plus EMBL-EBI and EMBL Heidelberg. NGP-NET held a series of thematic workshops on IDPs to drive the development of computational resources and community standards. A strategic workshop titled “Intrinsically Disordered Proteins in Core Data Resources” was organised by NGP-NET at the EBI campus, Hinxton, UK on June 1–2, 2017 to discuss the integration of IDP-related data and computational resources into the ELIXIR framework. A major outcome of this workshop was the recognition that IDP annotations are significantly underrepresented in the ELIXIR Core Data Resources (CDRs) ( Durinx et al., 2016). These initial insights evolved into a comprehensive plan to develop the IDP field and integrate key IDP resources and tools into the CDRs. That plan formed the basis for the ELIXIR IDP Community proposal that was submitted and presented at the ELIXIR Head of Nodes meeting in Basel, Switzerland on September 13th, 2017. After completion of the ELIXIR Community application process, including the creation of this white paper, the IDP Community proposal was accepted by the ELIXIR Head of Nodes on May 14th, 2019.
In parallel with these developments, the IDP research community has developed closer integration with ELIXIR activities. A follow-up strategic meeting, “The 2nd Workshop on Intrinsically Disordered Proteins in Core Data Resources”, was held in Prague on March 13–14, 2019 and brought together data producers, database developers and a representative of the ELIXIR Data Platform to discuss the integration of IDP data into the CDRs; and a member of the IDP research community attended the ELIXIR Interoperability Platform Face To Face Meeting on 1–2 April 2019. Furthermore, several actions have already taken place within the ELIXIR framework to tackle time-sensitive priorities including two implementation studies, “Implementation study for the integration of ELIXIR-IIB in ELIXIR Data Curation activities” and “Integration and standardisation of intrinsically disordered protein data”, focussing on interoperability. An additional, fundamental development associated with these implementation studies was the initialisation of the HUPO (Human Proteome Organisation) Proteomics Standards Initiative (PSI) Intrinsic Disorder workgroup.
The goal of the IDP Community in ELIXIR is to support the development of standards, tools and resources to accelerate the identification, analysis and functional characterisation of intrinsically disordered regions. Here, we introduce the major areas of priority for the development of an e-infrastructure that will allow the community to realise these goals while supporting the needs of the generators, users and consumers of IDP data.
Identification of community challenges
A strategic workshop titled “An intrinsically disordered protein user community proposal for ELIXIR” was held on October 31st, 2018 at the University of Padua, Italy. The meeting was attended by participants from 13 ELIXIR nodes: Belgium, Cyprus, Czech Republic, EBI, Germany, Hungary, Ireland, Israel, Italy, Netherlands, Spain, Switzerland, and the United Kingdom. ELIXIR members in both Greece and Sweden were also interested, but could not attend. The ELIXIR Hub team was represented by John Hancock, the ELIXIR Communities and Services Coordinator and representatives from the Interoperability, Tools and Training ELIXIR platforms. Members of the 3DBioInfo and Proteomics ELIXIR Communities, the Instruct-ERIC structural biology research infrastructure, and the ELIXIR CDRs of high relevance to the IDP research, namely PDBe ( Mir et al., 2018), IMEx/IntAct ( Orchard et al., 2014) and UniProt ( UniProt Consortium, 2019), were also present. During the meeting, the following topics were identified as key areas of priority for the ELIXIR IDP Community.
Area 1 – Standards and exchange formats for IDP data
The IDP field has not yet established official standards to allow consistent storage and dissemination of data. Hence, a key responsibility of the IDP Community is the definition of guidelines and standards to improve the reproducibility, interpretation, and dissemination of experimental data. The development of novel standards would result in a shift from the current “organic” interoperability of the community, where each group defines their own formats and creates a range of parsers to read the formats developed by other groups, to standardised data dissemination accessible to the larger biological community. For functional data describing protein interactions, molecular interaction interchange standards based on the guidelines of the HUPO-PSI-MI molecular standards ( Sivade Dumousseau et al., 2018) can be applied. However, for structural data, no established solution exists. In the simplest cases, experimental evidence for IDRs can be mapped to protein sequences as sequence features. Structural descriptions such as IDR conformational ensembles (e.g. based on experimental NMR, SAXS or MD simulation data) or secondary structural propensities (e.g. based on NMR data) will require more descriptive standards, especially if they are probabilistic. Only a subset of the required experimental and functional ontologies is already available and most IDP data cannot be described by pre-existing terminology. IDPs will require novel entries to controlled vocabularies describing specific experimental methods and protein structural concepts.
The first step of this process was undertaken at a workshop for the ELIXIR implementation study: “Integration and standardisation of intrinsically disordered protein data” held at the University of Padua, October 29–30th, 2018. The meeting proposed a first draft Minimum Information About Disorder Experiments (MIADE) standard defining the data required for reporting an IDP experiment. A further key outcome of the meeting was the establishment of the HUPO-PSI intrinsic disorder (HUPO-PSI ID) workgroup to drive the development of the required standards, storage and dissemination formats and controlled vocabularies for the community. Upon completion, the HUPO-PSI ID recommendations will be adopted by the key community resources promoting effortless integration of IDP data into the ELIXIR CDRs. An important goal of the roadmap will be the initiation of collaborations with the experimental communities for each of the distinct structural and functional methods used in IDP research to develop: (i) experimental method specific standards and (ii) workflows with minimal required experimental detail for the characterisation of IDPs. These developments would require extensive collaboration with the data generators and the experimental method specific data repositories including SASBDB ( Valentini et al., 2015), PCDDB ( Whitmore et al., 2017) and the BMRB ( Ulrich et al., 2008) to allow data produced by IDP researchers to be stored and disseminated in the most descriptive and efficient way possible. This will simplify the integration of the results of these analyses into the community resources. The recent development of the PDB-Dev repository of non-atomistic or part-atomistic structural data can provide a prototype for the deposition of experimental IDP data ( Vallat et al., 2018; Peng et al., 2019).
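As an illustration of the simplest case described above, where experimental evidence for an IDR is mapped to a protein sequence as a sequence feature, the following minimal Python sketch shows what a record and a basic completeness check might look like. The field names, the `DisorderAnnotation` class and the `validate` function are hypothetical and chosen for illustration only; they are not the MIADE standard or any HUPO-PSI ID format, which were still at draft stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisorderAnnotation:
    """One experimentally supported disordered region mapped onto a sequence.

    All field names are illustrative assumptions, not a published schema.
    """
    accession: str   # protein identifier (e.g. an accession from a sequence database)
    start: int       # first residue of the region, 1-based and inclusive
    end: int         # last residue of the region, 1-based and inclusive
    method: str      # experimental method; a controlled-vocabulary term in practice
    reference: str   # pointer to the publication or dataset reporting the experiment


def validate(annotation, sequence_length):
    """Return a list of problems; an empty list means the record passes these checks."""
    problems = []
    if not annotation.accession:
        problems.append("missing accession")
    if annotation.start < 1 or annotation.end > sequence_length:
        problems.append("region outside sequence")
    if annotation.start > annotation.end:
        problems.append("start after end")
    if not annotation.method:
        problems.append("missing method")
    if not annotation.reference:
        problems.append("missing reference")
    return problems
```

A record that fails such a check could be returned to the submitter before it reaches a curator. Probabilistic descriptions such as ensembles or per-residue propensities would need a richer representation than this single-interval feature.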
Area 2 – Automated and community-driven curation
The vast majority of the experimental data describing IDPs, the functional modules encoding their function, the regulatory mechanisms conditionally controlling that function, and their dysregulation in disease, is isolated in the text of research and review articles and in poorly formatted supplementary tables. This hampers the integration of IDP information with data such as protein function, modification, splice variants and disease-causing single-nucleotide polymorphisms (SNPs). Consequently, data created at great expense is significantly underutilised. Annotation of bona fide IDPs, especially for their function, is currently a labour-intensive process. The IDP community has already been experimenting with crowdsourced curation, a topic that is of interest to the ELIXIR Data Platform and is included within Task 3 of the Data Platform in the 2019–2023 ELIXIR Programme.
DisProt is a successful example, leveraging the collective expertise of around 40 researchers from a dozen different IDP labs in as many countries. The rate and accuracy of curation can be improved by integrating the available ELIXIR e-infrastructure into the curation process through partial annotation of articles and through automatic selection, classification and prioritisation of the relevant articles for curation. For example, the annotation of IDP-mediated interactions directly into the IMEx consortium annotation portal would provide a pre-built environment for such an endeavour. Automatic triage and pre-compiling of data based on Europe PMC data would also greatly boost productivity, allowing the community to cope with the increasing amount and complexity of data being published ( Britan et al., 2018). The large number of articles describing the structure and function of IDPs has highlighted the need to exploit text-mining approaches in order to eventually leverage automatic annotation from the literature. ELIXIR would facilitate integration with existing text-mining frameworks. Finally, a future goal of the community is the early capture of IDP data pre-publication directly from the data producers to reduce the need for manual curation. The coordinated experimental and computational fields within the ELIXIR IDP Community can provide a single contact point to lobby journals to require data deposition prior to publication.
Area 3 – Integration with ELIXIR Core Data Resources
IDP annotations are significantly underrepresented in the ELIXIR CDRs and a key goal of the ELIXIR IDP community is to facilitate the integration of IDP data and services into these resources. The CDRs do not currently annotate or import experimental data describing IDR structure and function despite their high abundance and functional importance. Recently, InterPro, PDBe, and UniProt adopted IDP predictions from sequence retrieved from MobiDB, as part of an ELIXIR Implementation Study “Integration and standardisation of intrinsically disordered protein data”. This work can be used as a blueprint for the successful integration of additional IDP data into further CDRs. It would be advantageous both for the IDP Community and the CDRs to comprehensively integrate the available IDP data into these resources. Interoperability can be guaranteed by the implementation of IDP specific ontologies to describe IDP specific features and the availability of standards-compliant RESTful APIs, enabling cross-linking and programmatic access to IDP resources. Many of the key IDP resources already host RESTful APIs and utilise persistent identifiers from CDRs. However, these APIs are currently not fully interoperable with each other without bespoke adaptors, though this will be tackled by Priority Area 1 “Standards and exchange formats for IDP data”. The development of curation guidelines and standards in line with the requirements of the CDRs, or the adoption of CDR guidelines, will streamline the integration process. The IDP resources that annotate functional modules and their interactions are currently not members of the International Molecular Exchange (IMEx) consortium ( Orchard et al., 2012). A key step on the roadmap is joint curation efforts between the ELIXIR IDP Community and the IMEx consortium of IDP-mediated interactions, thereby reducing duplication of effort.
Area 4 – Standardisation, benchmarking and indexing of computational tools
The IDP research field is currently developing best practices for scientific input/output (i/o) file formats, data analysis pipelines and benchmarking of scientific tools. These steps are being taken to raise the quality and accessibility of software developed by the community, with the aim of producing more accurate, faster, more stable and more user-friendly software implementations. The development and adoption of the storage and dissemination formats in Priority Area 1 will help standardise the i/o file formats of IDP tools. However, several use cases will not be covered by these formats (e.g. standardised i/o formats for residue-specific IDR scoring) and additional effort will be required to formalise the output formats in these areas. Containerised software, such as that available via BioContainers, and package managers, such as BioConda, are rarely used in IDP research. However, their adoption would make IDP tools more accessible to the wider community and would simplify their benchmarking.
The field of IDP research could also benefit from the development of reusable experimental workflows, such as those implemented with the Common Workflow Language, for commonly used IDP analysis pipelines built on top of ELIXIR CDRs and Deposition Databases. These workflows could then be managed using the Galaxy workflow manager platform. Benchmarking remains an issue for the community as the lack of common benchmarking datasets has hampered the systematic assessment of IDP tools. As a result, many publications have claimed superior performance on biased datasets, furthering the need for standardisation. Due to the availability of high-quality manually curated IDP data from DisProt, the community can provide gold standard blind datasets to run periodic benchmarking assessments of IDR prediction tools. This approach has recently been successfully applied by the Critical Assessment of Intrinsic Disorder (CAID) initiative. A similar platform for the comparison of methods predicting functional modules within IDPs or aligning homologous IDRs would also benefit the community and drive advances in these developing fields.
The development of Open Source software to benchmark IDP analysis methods across a wide range of performance metrics will simplify and standardise the assessment of IDP tools. OpenEBench can be used as an information hub to distribute reference datasets, to run comparative assessments and to publish benchmarking results. From a technical point of view, the adoption of BioContainers and/or Galaxy, and the standardisation of input and output formats would streamline the assessment process. Benchmarking results covering both scientific and technical aspects of the available IDP analysis tools can be hosted at OpenEBench to simplify method selection by users. Finally, the addition of these computational tools for the analysis of IDPs to the bio.tools registry would improve the visibility of these tools.
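To make the idea of assessing a predictor across several performance metrics concrete, the sketch below scores a binary per-residue disorder prediction against a curated reference using three metrics commonly applied to binary classifiers: balanced accuracy, F1 and the Matthews correlation coefficient (MCC). The function names and the 0/1 label encoding are assumptions made for illustration; this is not the protocol of CAID or of any existing benchmarking platform.

```python
import math


def confusion_counts(reference, predicted):
    """Count true/false positives and negatives for 0/1 per-residue labels."""
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction must cover the same residues")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    return tp, fp, tn, fn


def residue_metrics(reference, predicted):
    """Return balanced accuracy, F1 and MCC; undefined ratios are reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(reference, predicted)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

Reporting several metrics side by side is useful here because disordered residues are usually the minority class, so plain accuracy can look high for a predictor that labels everything as ordered.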
Area 5 – Development of a centralised knowledge-base for IDP data
Computational IDP researchers based in Europe develop many of the tools and resources that underlie the global IDP e-infrastructure. However, these assets are currently spread over numerous institutes and universities across Europe. The development of an umbrella resource, DisProtCentral (founded June 2017), consolidating the European IDP resources and tools through a single portal will improve the accessibility of these resources for the wider biological community. The DisProtCentral consortium will provide a central hub to access high-quality curation, annotation and predictions of structural and functional information on IDPs, in addition to providing stable identifiers/URIs to describe regions of disorder within specific proteins, enabling cross-referencing and linking between resources. The consortium will include all the stakeholders in the IDP field and provide a centralised repository for the protein disorder-related databases and tools. This will future-proof these key IDP resources against issues arising from the loss of funding or change of group focus.
This initiative to produce such a centralised resource draws attention to the need for a fit-for-purpose interoperability tooling adaptor that translates across the metadata annotations of individual resources. This will allow pre-existing data from distinct resources to be integrated via normalisation functionalities such as ontology cross-referencing for data previously mapped to different ontologies/vocabularies, or ontology term mapping for those that have not been mapped to any standard terminologies. Ideally, newly generated data should be produced FAIR-at-source (Findable, Accessible, Interoperable, and Reusable) ( Wilkinson et al., 2016). DisProtCentral will play a vital role in encouraging IDP data generators to follow the standardised set of best practices: identifying recommended ontologies that are universally adopted by those data generators, defining a common schema markup strategy, and validating persistent identifiers for data to be deposited into the resource. A wider goal of the DisProtCentral resource is to provide a single point of contact to promote discourse and collaboration to ensure that the needs of the IDP data generators, users and consumers are all being met by the computational IDP researchers. An important aspect of this effort will be the development of training material for the wider biological communities describing the best practices for IDP analyses.
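The ontology term mapping step mentioned above can be pictured as a small adaptor that translates resource-specific labels into a shared vocabulary and sets aside anything it cannot map. The sketch below is a minimal illustration under stated assumptions: the `TERM_MAP` table, the `COMMON:` identifiers and the record layout are all hypothetical placeholders, not terms from any existing ontology or fields from any IDP resource.

```python
# Hypothetical lookup from resource-specific labels to shared vocabulary terms.
# The "COMMON:" identifiers are placeholders, not real ontology accessions.
TERM_MAP = {
    "disorder": "COMMON:disordered_region",
    "idr": "COMMON:disordered_region",
    "flexible linker": "COMMON:flexible_linker",
    "linker": "COMMON:flexible_linker",
}


def normalise(records, term_map=TERM_MAP):
    """Split records into those mapped to a shared term and those left unmapped.

    Each record is a dict with at least a free-text "label" key (an assumption).
    Mapped records are copies with an added "term" key; originals are not modified.
    """
    mapped, unmapped = [], []
    for record in records:
        key = record["label"].strip().lower()
        if key in term_map:
            mapped.append({**record, "term": term_map[key]})
        else:
            unmapped.append(record)
    return mapped, unmapped
```

Keeping the unmapped records separate, rather than dropping them, gives curators a worklist of labels that still need a vocabulary term.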
Alignment with ELIXIR Activities
The community challenges identified are already well-aligned with on-going ELIXIR platform goals and activities. In particular:
Data Platform: The task proposed in Priority Area 3 “Integration with ELIXIR Core Data Resources” aligns with the goals of the Data Platform. The community is already in the process of integrating IDP annotation into the ELIXIR CDRs as part of two ELIXIR implementation studies, “Implementation study for the integration of ELIXIR-IIB in ELIXIR Data Curation activities” and “Integration and standardisation of intrinsically disordered protein data”. The recent integration of MobiDB into InterPro, PDBe and UniProt can be taken as a blueprint for the further integration of IDP information into ELIXIR CDRs. Furthermore, initial discussions have taken place to plan the integration of additional sources of IDP data. A separate task, which can benefit from ongoing Data Platform activities, is the development of an IDP curation framework as described in Priority Area 2 “Automated and community-driven curation”. This aligns with ELIXIR Data Platform Task 3 “Scalable curation” of the 2019–2023 ELIXIR Programme.
The distributed community curation for IDP resources will benefit from stronger interaction with Europe PMC, with the SciLite framework providing literature mining. This is especially important where data is scattered throughout the extensive corpus of biological literature and not properly indexed. Together with an automated curation triage step that selects relevant papers, this can boost the productivity of community curation while ensuring high-quality annotation of the IDP literature. Initial efforts have started with the design of a dedicated curation-support prototype, which has been used by DisProt curators since 2018. The service capitalises on the neXtA5 platform (Mottin et al., 2017), designed by the Text Mining group of the Swiss Institute of Bioinformatics. The demonstrator (http://candy.hesge.ch/disprotGUI/) ranks articles using a scoring function that prioritises articles with a high density of IDP-related concepts. The curation-support tool will make it possible to build a high-quality curated benchmark, with the aim of evolving the current triage system into a more accurate binary classifier that can not only select relevant papers but also highlight short passages in full-text papers that are likely to support expert curation.
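The idea of ranking articles by the density of IDP-related concepts can be illustrated with a minimal Python sketch. This is not the neXtA5 scoring function, whose details are not given here; the concept list, the tokenisation, and the definition of density as concept mentions per token are simplifying assumptions made for illustration.

```python
import re

# Illustrative concept list (an assumption); a real system would rely on
# curated vocabularies rather than a handful of hard-coded phrases.
IDP_CONCEPTS = (
    "intrinsically disordered",
    "disordered region",
    "unstructured",
    "molten globule",
)


def concept_density(text, concepts=IDP_CONCEPTS):
    """Return concept mentions per token, or 0.0 for text without tokens."""
    lowered = text.lower()
    tokens = re.findall(r"\w+", lowered)
    if not tokens:
        return 0.0
    # Count non-overlapping occurrences of each concept phrase.
    mentions = sum(lowered.count(concept) for concept in concepts)
    return mentions / len(tokens)


def rank_articles(articles):
    """Sort article identifiers by decreasing concept density.

    `articles` maps an identifier to the text to be scored (e.g. an abstract).
    """
    return sorted(
        articles,
        key=lambda identifier: concept_density(articles[identifier]),
        reverse=True,
    )
```

A ranking of this kind only orders candidate papers for a curator; turning it into a binary classifier would additionally require a decision threshold or a trained model evaluated against a curated benchmark.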
Tools Platform: The tasks proposed in Priority Area 4 “Standardisation, benchmarking and indexing of computational tools” fall under the objectives of the Tools Platform. A major goal of Priority Area 4 is to increase productivity by developing reusable experimental workflows for commonly used IDP analysis pipelines built on top of ELIXIR CDRs. The Tools Platform can advise the ELIXIR IDP Community on the development of such workflows using the Common Workflow Language in collaboration with the ELIXIR Interoperability Platform. The IDP analysis tool benchmarking by the CAID initiative is complementary to the OpenEBench benchmarking platform. OpenEBench represents an information hub for the distribution of reference datasets, the application of metrics and comparative assessments, and the distribution of benchmarking results. CAID can also drive prototype development for software containers and reusable experimental workflows for the community. The outcomes could then be generalised for other applications. The bio.tools service registry can serve as a comprehensive census of the available software for IDP analysis, and the BioContainers and Bioconda platforms can assist in generating software containers and registering them across different technologies, facilitating access to and use of software by the IDP communities and beyond.
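To make the notion of applying metrics to a reference dataset concrete, the following Python sketch scores a per-residue disorder prediction against a reference annotation. The metrics actually used by CAID or OpenEBench are not specified in this section, so the choice of precision, recall and the Matthews correlation coefficient, as well as the string encoding (`"1"` for a disordered residue, `"0"` for an ordered one), are assumptions for illustration.

```python
import math


def confusion_counts(reference, predicted):
    """Count per-residue outcomes; '1' marks disorder and '0' marks order."""
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction must cover the same residues")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == "1" and p == "1")
    fp = sum(1 for r, p in pairs if r == "0" and p == "1")
    fn = sum(1 for r, p in pairs if r == "1" and p == "0")
    tn = sum(1 for r, p in pairs if r == "0" and p == "0")
    return tp, fp, fn, tn


def evaluate(reference, predicted):
    """Return precision, recall and MCC for one protein."""
    tp, fp, fn, tn = confusion_counts(reference, predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}
```

Running every predictor through the same function on the same reference set is what makes the resulting scores comparable, which is the role a shared benchmarking platform plays at larger scale.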
Interoperability Platform: Several of the identified priorities will benefit from the input of the Interoperability Platform. The main interaction with the platform will be to develop FAIR-compliant standards and guidelines for reporting, data exchange and retrieval of data on IDP structure and function (Wilkinson et al., 2016). The recent development of a HUPO PSI-ID workgroup and a draft MIADE standard are key steps towards this goal. IDP resources themselves will benefit from becoming FAIR and from adhering to, and contributing to the further development of, standards, controlled vocabularies and ontologies. These advances are of paramount importance to improve the dissemination of IDP data. FAIR-compliant data will also aid the FAIRification of analysis tools, where reusable computational workflows can make programmatic calls to retrieve reusable data from an integrated centralised source. The Interoperability Platform service framework has already defined the key activities to support the “FAIRification” of data, which can be applied to IDP data. The effort to make IDP data FAIR can exploit the key services offered by the Interoperability Platform, such as those identified as recommended practices through the ELIXIR Recommended Interoperability Resources (RIRs) and the platform’s mission-critical initiatives (i.e., Bioschemas and Common Workflow Language). As this is an emerging community with new data resources in development, deploying these recommended practices in new IDP databases will highlight the IDP Community as a use case example in the mission to make data FAIR-at-source. This work can be supported by the Interoperability Platform under its remit to provide interoperability support for ELIXIR Communities.
Compute Platform: Work being carried out in the Compute Platform can be leveraged in two main ways. Community curation requires authentication for a wide range of participants, which can benefit from the federated ELIXIR AAI approach for curator login. This could also facilitate the attribution of credit to curators. A second element would be to use the ELIXIR distributed computing infrastructure including identity and access management, data integration and container deployment across a range of appropriate compute endpoints as required, both for updating large-scale databases and to run the CAID experiment. A range of new Compute Platform activities are underway to enable an integrated approach to the deployment of relevant workflows and containers. The IDP Community can take advantage of these activities to deploy IDP-related workflows and containers.
Training Platform: The IDP Community is focussed on the development of the next generation of IDP researchers. Extensive training has been funded and carried out by the COST Action NGP-NET and additional training schools are planned as part of the MSCA-RISE-funded IDPfun project. The Italian ELIXIR node, which has been actively providing training, will organise a yearly training course on IDP resources. The input of the ELIXIR Training Platform will be indispensable to facilitate the development of a core set of training materials, their dissemination and their inclusion in ELIXIR training courses. These materials will comprise introductory and advanced lessons on database access, in silico analysis of IDPs (e.g. IDP prediction, IDP docking, MD simulations and IDP-specific primary sequence analysis techniques), and computational processing of experimental data (e.g. NMR spectra, phage display). There will also be a requirement for training material related to IDP interoperability, to train experimentalists, software developers, database developers and curators in topics such as IDP standards, IDP controlled vocabularies and Common Workflow Language (CWL) for IDPs. This aligns with the Interoperability Platform’s training and outreach task in the 2019–2023 ELIXIR Programme to collaborate with Training Platform members to provide Interoperability training. In collaboration with the Training Platform, the IDP Community will be able to adopt and implement a wide range of best practices and guidelines available through the ELIXIR Training Toolkit, which aims to provide a comprehensive reference resource for developing training capacity, rolling out new training programmes, and expanding existing ones. All materials and activities developed for IDP training will be shared through TeSS for further dissemination. Additionally, train-the-trainer activities will be planned to increase the pool of IDP trainers and build additional training capacity.
Alignment with other communities
Links to ELIXIR Communities
Structural Bioinformatics (3DBioInfo): ELIXIR 3DBioInfo is a community of structural bioinformaticians with the remit of continuing the development of the e-infrastructure for the storage, visualisation, analysis, annotation, and prediction of the structure of biological macromolecules and complexes. The probabilistic approach to protein structure employed by the IDP researchers is highly complementary to the work of the ELIXIR 3DBioInfo community and together, a complete structural description of proteins can be achieved. With the two communities addressing different challenges, there are numerous opportunities for synergistic connections between them, particularly in the development of structural ontologies and standards, structure boundary definition for structural studies, methods to predict IDR-globular domain interactions and whole protein structure prediction tools.
Proteomics: The ELIXIR Proteomics Community aspires to improve the research on proteoforms including protein forms caused by post-translational modifications (PTMs) and sequence variants. To this end, proteoform-centric annotations of proteomes are needed. This includes information on the co-occurrences of IDRs with PTMs and non-constitutive exons, and if possible, the functional outcomes of these conditional changes to the protein. Close collaboration is desirable between the communities regarding data interoperability to promote the use of IDP and proteomics data in data analysis workflows. IDP research also applies proteomics-related technologies such as ion-mobility mass spectrometry and cross-linking mass spectrometry. In these cases, close collaboration with the proteomics community within the HUPO-PSI Mass Spectrometry workgroup will support the standardisation efforts of the IDP community.
Rare Diseases: The overarching goal of the ELIXIR Rare Diseases Community is to create a sustainable, reusable, and interoperable infrastructure that will enable researchers to discover, access, and analyse rare disease data. A key aspect of the analysis of rare disease data is the elucidation of the mechanism(s) by which genetic change(s) result in a diseased state. While the interpretation of the effect of disease mutations altering globular domains is well established, IDPs remain challenging in this context. Interaction and collaboration with the ELIXIR IDP Community will provide invaluable information to aid in the understanding of disease-causing variations, truncations, and/or chimeric oncogene(s) related to IDRs. The development and dissemination of standards for IDR structural and functional data, as proposed in Priority Area 1, are important aspects of the collaboration between the ELIXIR IDP and Rare Diseases Communities.
Galaxy: Galaxy is a web-based e-infrastructure for computational biomedical research. It allows users with minimal computational proficiency to run and share data analysis workflows. This promotes reproducibility and simplifies sharing of data and results. Currently, there is limited use of Galaxy by the ELIXIR IDP community. However, one of the priority areas of the roadmap, “Standardisation, benchmarking and indexing of computational tools”, is the development of reusable experimental workflows for commonly used IDP analysis pipelines built on top of ELIXIR CDRs. A collaboration with the ELIXIR Galaxy Community would be highly beneficial for such an endeavour, particularly in the development of IDP pipelines using the ELIXIR scientific workflow platform.
Links to non-ELIXIR communities
Instruct-ERIC: Instruct-ERIC is a European research infrastructure for structural biology that makes high-end technologies and methods available to European researchers. Similarly to the proposed ELIXIR 3DBioInfo Community, the interdependency of the research performed by Instruct-ERIC and the IDP researchers, covering the structured and unstructured parts of the proteome, allows for extensive synergies. Instruct-ERIC coordinates access to facilities which permit the structural analysis of IDPs including several centres specialised in NMR and biophysical techniques. Furthermore, Instruct-ERIC has developed a set of high-quality training courses in structural biology. Given the complementary teaching focus of the ELIXIR IDP Community and the Instruct-ERIC initiative, future training schools with contributions from both sources would be of great benefit to any participant.
The Dark Proteome Initiative: The Dark Proteome Initiative is a US consortium of experimentalists with the goal of fostering collaborations amongst IDP researchers and lobbying for funding to address open and biologically important questions in the IDP field. The Dark Proteome Initiative is a ready-made community of world-class generators of IDP data utilising a wide range of experimental approaches, giving the ELIXIR IDP Community a single entity to consult for guidance on the needs of experimental data generators. The Dark Proteome Initiative will also be able to provide invaluable advice on the detailed definitions and requirements for the development of ontologies and standards.
PLUMED consortium: The PLUMED consortium has been recently established as an open community including PLUMED developers, contributors and users to help to establish reproducibility, open access data, and harmonisation of the protocols for molecular dynamics simulations, free energy calculations and other simulations that can be run with the PLUMED software (PLUMED consortium, 2019).
Conclusions
Recent years have seen a rapid growth of interest in IDPs. This has coincided with significant advances in the in vivo, in vitro and in silico methods to study the structure and function of IDPs. Unfortunately, the basic requirements for the organisation and dissemination of the tools and data produced by the field have not advanced in step with these developments. The field is on the cusp of an era where the high-throughput characterisation of the extensive intrinsically disordered regions (IDRs) of proteins is possible. This roadmap will provide the foundation that supports this data explosion and provide a solid platform for the future biological research of IDPs.
Data availability
No data are associated with this article.
Funding Statement
A strategic meeting to identify the key priority areas for the roadmap was funded by ELIXIR, the research infrastructure for life-science data. BW was funded by a grant from the BBSRC [BB/P024092/1]. AB, SO and PLM were supported by the National Eye Institute (NEI), National Human Genome Research Institute (NHGRI), National Heart, Lung, and Blood Institute (NHLBI), National Institute of Allergy and Infectious Diseases (NIAID), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of General Medical Sciences (NIGMS), and National Institute of Mental Health (NIMH) of the National Institutes of Health under Award Number [U24HG007822] (the content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health), by European Molecular Biology Laboratory (EMBL) and by the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI. AG was supported by the European Molecular Biology Laboratory (EMBL) and the Wellcome Trust [104948]. AE was supported by a Vetenskapsrådet grant [2016-03798]. EP was supported by Danmarks Grundforskningsfond [DNRF125] and a Carlsberg Foundation Distinguished Fellowship [CF18-0314]. JLS was supported by the Israel I-CORE Project: Integrated Structural Cell Biology and by Instruct-ERIC. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement [778247] and COST Action BM1405 NGP-net.
[version 1; peer review: 2 approved]
References
- Babu MM, Kriwacki RW, Pappu RV: Structural biology. Versatility from protein disorder. Science. 2012;337(6101):1460–1461. 10.1126/science.1228775
- Baggett DW, Nath A: The Rational Discovery of a Tau Aggregation Inhibitor. Biochemistry. 2018;57(42):6099–6107. 10.1021/acs.biochem.8b00581
- Bah A, Forman-Kay JD: Modulation of Intrinsically Disordered Protein Function by Post-translational Modifications. J Biol Chem. 2016;291(13):6696–6705. 10.1074/jbc.R115.695056
- Beltrao P, Albanèse V, Kenner LR, et al.: Systematic functional prioritization of protein posttranslational modifications. Cell. 2012;150(2):413–425. 10.1016/j.cell.2012.05.036
- Bernadó P, Mylonas E, Petoukhov MV, et al.: Structural characterization of flexible proteins using small-angle X-ray scattering. J Am Chem Soc. 2007;129(17):5656–5664. 10.1021/ja069124n
- Best RB: Computational and theoretical advances in studies of intrinsically disordered proteins. Curr Opin Struct Biol. 2017;42:147–154. 10.1016/j.sbi.2017.01.006
- Blikstad C, Ivarsson Y: High-throughput methods for identification of protein-protein interactions involving short linear motifs. Cell Commun Signal. 2015;13:38. 10.1186/s12964-015-0116-8
- Bonetti D, Troilo F, Brunori M, et al.: How Robust Is the Mechanism of Folding-Upon-Binding for an Intrinsically Disordered Protein? Biophys J. 2018;114(8):1889–1894. 10.1016/j.bpj.2018.03.017
- Borgia A, Borgia MB, Bugge K, et al.: Extreme disorder in an ultrahigh-affinity protein complex. Nature. 2018;555(7694):61–66. 10.1038/nature25762
- Britan A, Cusin I, Hinard V, et al.: Accelerating annotation of articles via automated approaches: evaluation of the neXtA5 curation-support tool by neXtProt. Database (Oxford). 2018;2018. 10.1093/database/bay129
- Buljan M, Chalancon G, Eustermann S, et al.: Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol Cell. 2012;46(6):871–883. 10.1016/j.molcel.2012.05.039
- Camilloni C, De Simone A, Vranken WF, et al.: Determination of secondary structure populations in disordered states of proteins using nuclear magnetic resonance chemical shifts. Biochemistry. 2012;51(11):2224–2231. 10.1021/bi3001825
- Chong SH, Chatterjee P, Ham S: Computer Simulations of Intrinsically Disordered Proteins. Annu Rev Phys Chem. 2017;68:117–134. 10.1146/annurev-physchem-052516-050843
- Chouard T: Structural biology: Breaking the protein rules. Nature. 2011;471(7337):151–153. 10.1038/471151a
- Cilia E, Pancsa R, Tompa P, et al.: The DynaMine webserver: predicting protein dynamics from sequence. Nucleic Acids Res. 2014;42(Web Server issue):W264–70. 10.1093/nar/gku270
- Corbi-Verge C, Kim PM: Motif mediated protein-protein interactions as drug targets. Cell Commun Signal. 2016;14:8. 10.1186/s12964-016-0131-4
- Csizmok V, Forman-Kay JD: Complex regulatory mechanisms mediated by the interplay of multiple post-translational modifications. Curr Opin Struct Biol. 2018;48:58–67. 10.1016/j.sbi.2017.10.013
- Davey NE: The functional importance of structure in unstructured protein regions. Curr Opin Struct Biol. 2019;56:155–163. 10.1016/j.sbi.2019.03.009
- Davey NE, Cyert MS, Moses AM: Short linear motifs - ex nihilo evolution of protein regulation. Cell Commun Signal. 2015;13:43. 10.1186/s12964-015-0120-z
- Davey NE, Morgan DO: Building a Regulatory Network with Short Linear Sequence Motifs: Lessons from the Degrons of the Anaphase-Promoting Complex. Mol Cell. 2016;64(1):12–23. 10.1016/j.molcel.2016.09.006
- Davey NE, Seo MH, Yadav VK, et al.: Discovery of short linear motif-mediated interactions through phage display of intrinsically disordered regions of the human proteome. FEBS J. 2017;284(3):485–498. 10.1111/febs.13995
- Davey NE, Travé G, Gibson TJ: How viruses hijack cell regulation. Trends Biochem Sci. 2011;36(3):159–169. 10.1016/j.tibs.2010.10.002
- Dosztányi Z, Csizmók V, Tompa P, et al.: The pairwise energy content estimated from amino acid composition discriminates between folded and intrinsically unstructured proteins. J Mol Biol. 2005;347(4):827–839. 10.1016/j.jmb.2005.01.071
- Dosztányi Z, Mészáros B, Simon I: ANCHOR: web server for predicting protein binding regions in disordered proteins. Bioinformatics. 2009;25(20):2745–2746. 10.1093/bioinformatics/btp518
- Dunker AK, Silman I, Uversky VN, et al.: Function and structure of inherently disordered proteins. Curr Opin Struct Biol. 2008;18(6):756–64. 10.1016/j.sbi.2008.10.002
- Durinx C, McEntyre J, Appel R, et al.: Identifying ELIXIR core data resources [version 2; peer review: 2 approved]. F1000Res. 2016;5: pii: ELIXIR-2422. 10.12688/f1000research.9656.2
- Dyson HJ, Wright PE: How do intrinsically disordered viral proteins hijack the cell? Biochemistry. 2018;57(28):4045–4046. 10.1021/acs.biochem.8b00622
- Dyson HJ, Wright PE: Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol. 2005;6(3):197–208. 10.1038/nrm1589
- Edwards RJ, Palopoli N: Computational prediction of short linear motifs from protein sequences. Methods Mol Biol. 2015;1268:89–141. 10.1007/978-1-4939-2285-7_6
- Felli IC, Pierattelli R: Intrinsically Disordered Proteins Studied by NMR Spectroscopy. Springer. 2015. 10.1007/978-3-319-20164-1
- Forman-Kay JD, Mittag T: From sequence and forces to structure, function, and evolution of intrinsically disordered proteins. Structure. 2013;21(9):1492–1499. 10.1016/j.str.2013.08.001
- Franke D, Petoukhov MV, Konarev PV, et al.: ATSAS 2.8: a comprehensive data analysis suite for small-angle scattering from macromolecular solutions. J Appl Crystallogr. 2017;50(Pt 4):1212–1225. 10.1107/S1600576717007786
- Fuertes G, Banterle N, Ruff KM, et al.: Decoupling of size and shape fluctuations in heteropolymeric sequences reconciles discrepancies in SAXS vs. FRET measurements. Proc Natl Acad Sci U S A. 2017;114(31):E6342–E6351. 10.1073/pnas.1704692114
- Fuxreiter M: Fold or not to fold upon binding - does it really matter? Curr Opin Struct Biol. 2018;54:19–25. 10.1016/j.sbi.2018.09.008
- Gibson TJ, Dinkel H, Van Roey K, et al.: Experimental detection of short regulatory motifs in eukaryotic proteins: tips for good practice as well as for bad. Cell Commun Signal. 2015;13:42. 10.1186/s12964-015-0121-y
- Gouw M, Michael S, Sámano-Sánchez H, et al.: The eukaryotic linear motif resource - 2018 update. Nucleic Acids Res. 2018;46(D1):D428–D434. 10.1093/nar/gkx1077
- Guharoy M, Bhowmick P, Sallam M, et al.: Tripartite degrons confer diversity and specificity on regulated protein degradation in the ubiquitin-proteasome system. Nat Commun. 2016;7:10239. 10.1038/ncomms10239
- Hess B, Kutzner C, van der Spoel D, et al.: GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation. J Chem Theory Comput. 2008;4(3):435–47. 10.1021/ct700301q
- Holehouse AS, Pappu RV: Collapse Transitions of Proteins and the Interplay Among Backbone, Sidechain, and Solvent Interactions. Annu Rev Biophys. 2018;47:19–39. 10.1146/annurev-biophys-070317-032838
- Holmstrom ED, Holla A, Zheng W, et al.: Accurate Transfer Efficiencies, Distance Distributions, and Ensembles of Unfolded and Intrinsically Disordered Proteins From Single-Molecule FRET. Methods Enzymol. 2018;611:287–325. 10.1016/bs.mie.2018.09.030
- Huang J, MacKerell AD, Jr: Force field development and simulations of intrinsically disordered proteins. Curr Opin Struct Biol. 2018;48:40–48. 10.1016/j.sbi.2017.10.008
- Iakoucheva LM, Radivojac P, Brown CJ, et al.: The importance of intrinsic disorder for protein phosphorylation. Nucleic Acids Res. 2004;32(3):1037–1049. 10.1093/nar/gkh253
- Iešmantavicius V, Dogan J, Jemth P, et al.: Helical propensity in an intrinsically disordered protein accelerates ligand binding. Angew Chem Int Ed Engl. 2014;53(6):1548–1551. 10.1002/anie.201307712
- Ivarsson Y, Jemth P: Affinity and specificity of motif-based protein-protein interactions. Curr Opin Struct Biol. 2019;54:26–33. 10.1016/j.sbi.2018.09.009
- Kriwacki RW, Hengst L, Tennant L, et al.: Structural studies of p21Waf1/Cip1/Sdi1 in the free and Cdk2-bound state: conformational disorder mediates binding diversity. Proc Natl Acad Sci U S A. 1996;93(21):11504–11509. 10.1073/pnas.93.21.11504
- Kruse T, Biedenkopf N, Hertz EPT, et al.: The Ebola Virus Nucleoprotein Recruits the Host PP2A-B56 Phosphatase to Activate Transcriptional Support Activity of VP30. Mol Cell. 2019;69(1):136–145.e6. 10.1016/j.molcel.2017.11.034
- Krystkowiak I, Davey NE: SLiMSearch: a framework for proteome-wide discovery and annotation of functional modules in intrinsically disordered regions. Nucleic Acids Res. 2017;45(W1):W464–W469. 10.1093/nar/gkx238
- Light S, Sagit R, Sachenkova O, et al.: Protein expansion is primarily due to indels in intrinsically disordered regions. Mol Biol Evol. 2013;30(12):2645–2653. 10.1093/molbev/mst157
- Martin EW, Mittag T: Relationship of Sequence and Phase Separation in Protein Low-Complexity Regions. Biochemistry. 2018;57(17):2478–2487. 10.1021/acs.biochem.8b00008
- Mészáros B, Kumar M, Gibson TJ, et al.: Degrons in cancer. Sci Signal. 2017;10(470): pii: eaak9982. 10.1126/scisignal.aak9982
- Milles S, Salvi N, Blackledge M, et al.: Characterization of intrinsically disordered proteins and their dynamic complexes: From in vitro to cell-like environments. Prog Nucl Magn Reson Spectrosc. 2018;109:79–100. 10.1016/j.pnmrs.2018.07.001
- Mir S, Alhroub Y, Anyango S, et al.: PDBe: towards reusable data delivery infrastructure at protein data bank in Europe. Nucleic Acids Res. 2018;46(D1):D486–D492. 10.1093/nar/gkx1070
- Mitrea DM, Kriwacki RW: Phase separation in biology; functional organization of a higher order. Cell Commun Signal. 2016;14:1. 10.1186/s12964-015-0125-7
- Mottin L, Pasche E, Gobeill J, et al. : Triage by ranking to support the curation of protein interactions. Database (Oxford). 2017;2017. 10.1093/database/bax040 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Necci M, Piovesan D, Dosztányi Z, et al. : MobiDB-lite: fast and highly specific consensus prediction of intrinsic disorder in proteins. Bioinformatics. 2017;33(9):1402–1404. 10.1093/bioinformatics/btx015 [DOI] [PubMed] [Google Scholar]
- Neduva V, Russell RB: DILIMOT: discovery of linear motifs in proteins. Nucleic Acids Res. 2006;34(Web Server issue):W350–5. 10.1093/nar/gkl159 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nguyen HQ, Baxter BC, Brower K, et al. : Programmable Microfluidic Synthesis of Over One Thousand Uniquely Identifiable Spectral Codes. Adv Opt Mater. 2017;5(3): pii: 1600548. 10.1002/adom.201600548 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nielsen JT, Mulder FAA: There is Diversity in Disorder-"In all Chaos there is a Cosmos, in all Disorder a Secret Order". Front Mol Biosci. 2016;3:4. 10.3389/fmolb.2016.00004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Nodet G, Salmon L, Ozenne V, et al. : Quantitative description of backbone conformational sampling of unfolded proteins at amino acid resolution from NMR residual dipolar couplings. J Am Chem Soc. 2009;131(49):17908–17918. 10.1021/ja9069024 [DOI] [PubMed] [Google Scholar]
- Oates ME, Romero P, Ishida T, et al. : D 2P 2: database of disordered protein predictions. Nucleic Acids Res. 2013;41(Database issue):D508–16. 10.1093/nar/gks1226 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Olsen JG, Teilum K, Kragelund BB: Behaviour of intrinsically disordered proteins in protein-protein complexes with an emphasis on fuzziness. Cell Mol Life Sci. 2017;74(17):3175–3183. 10.1007/s00018-017-2560-7 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Orchard S, Ammari M, Aranda B, et al. : The MIntAct project--IntAct as a common curation platform for 11 molecular interaction databases. Nucleic Acids Res. 2014;42(Database issue):D358–63. 10.1093/nar/gkt1115 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Orchard S, Kerrien S, Abbani S, et al. : Protein interaction data curation: the International Molecular Exchange (IMEx) consortium. Nat Methods. 2012;9(4):345–350. 10.1038/nmeth.1931 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ozenne V, Bauer F, Salmon L, et al. : Flexible-meccano: a tool for the generation of explicit ensemble descriptions of intrinsically disordered proteins and their associated experimental observables. Bioinformatics. 2012;28(11):1463–1470. 10.1093/bioinformatics/bts172 [DOI] [PubMed] [Google Scholar]
- Pancsa R, Tompa P: Structural disorder in eukaryotes. PLoS One. 2012;7(4):e34687. 10.1371/journal.pone.0034687 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Peng Y, Cao S, Kiselar J, et al. : A Metastable Contact and Structural Disorder in the Estrogen Receptor Transactivation Domain. Structure. 2019;27(2):229–240.e4. 10.1016/j.str.2018.10.026 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Piovesan D, Tabaro F, Mičetić I, et al. : DisProt 7.0: a major update of the database of disordered proteins. Nucleic Acids Res. 2017;45(D1):D219–D227. 10.1093/nar/gkw1056 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Piovesan D, Tabaro F, Paladin L, et al. : MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins. Nucleic Acids Res. 2018;46(D1):D471–D476. 10.1093/nar/gkx1071 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Plitzko JM, Schuler B, Selenko P: Structural Biology outside the box-inside the cell. Curr Opin Struct Biol. 2017;46:110–121. doi: 10.1016/j.sbi.2017.06.007
- Prilusky J, Felder CE, Zeev-Ben-Mordehai T, et al.: FoldIndex©: a simple tool to predict whether a given protein sequence is intrinsically unfolded. Bioinformatics. 2005;21(16):3435–3438. doi: 10.1093/bioinformatics/bti537
- Raveh B, London N, Zimmerman L, et al.: Rosetta FlexPepDock ab-initio: simultaneous folding, docking and refinement of peptides onto their receptors. PLoS One. 2011;6(4):e18934. doi: 10.1371/journal.pone.0018934
- Santofimia-Castaño P, Xia Y, Lan W, et al.: Ligand-based design identifies a potent NUPR1 inhibitor exerting anticancer activity via necroptosis. J Clin Invest. 2019;129(6):2500–2513. doi: 10.1172/JCI127223
- Schad E, Fichó E, Pancsa R, et al.: DIBS: a repository of disordered binding sites mediating interactions with ordered proteins. Bioinformatics. 2018;34(3):535–537. doi: 10.1093/bioinformatics/btx640
- Schuler B, Soranno A, Hofmann H, et al.: Single-Molecule FRET Spectroscopy and the Polymer Physics of Unfolded and Intrinsically Disordered Proteins. Annu Rev Biophys. 2016;45:207–231. doi: 10.1146/annurev-biophys-062215-010915
- Scott JD, Pawson T: Cell signaling in space and time: where proteins come together and when they’re apart. Science. 2009;326(5957):1220–1224. doi: 10.1126/science.1175668
- Shigemitsu Y, Hiroaki H: Common molecular pathogenesis of disease-related intrinsically disordered proteins revealed by NMR analysis. J Biochem. 2018;163(1):11–18. doi: 10.1093/jb/mvx056
- Shoemaker BA, Portman JJ, Wolynes PG: Speeding molecular recognition by using the folding funnel: the fly-casting mechanism. Proc Natl Acad Sci U S A. 2000;97(16):8868–8873. doi: 10.1073/pnas.160259697
- Sickmeier M, Hamilton JA, LeGall T, et al.: DisProt: the Database of Disordered Proteins. Nucleic Acids Res. 2007;35(Database issue):D786–93. doi: 10.1093/nar/gkl893
- Sivade Dumousseau M, Alonso-López D, Ammari M, et al.: Encompassing new use cases - level 3.0 of the HUPO-PSI format for molecular interactions. BMC Bioinformatics. 2018;19(1):134. doi: 10.1186/s12859-018-2118-1
- Stanley N, Esteban-Martín S, De Fabritiis G: Progress in studying intrinsically disordered proteins with atomistic simulations. Prog Biophys Mol Biol. 2015;119(1):47–52. doi: 10.1016/j.pbiomolbio.2015.03.003
- Tamiola K, Mulder FA: Using NMR chemical shifts to calculate the propensity for structural order and disorder in proteins. Biochem Soc Trans. 2012;40(5):1014–1020. doi: 10.1042/BST20120171
- Tolchard J, Walpole SJ, Miles AJ, et al.: The intrinsically disordered Tarp protein from chlamydia binds actin with a partially preformed helix. Sci Rep. 2018;8(1):1960. doi: 10.1038/s41598-018-20290-8
- Tompa P: The interplay between structure and function in intrinsically unstructured proteins. FEBS Lett. 2005;579(15):3346–3354. doi: 10.1016/j.febslet.2005.03.072
- Tompa P: Unstructural biology coming of age. Curr Opin Struct Biol. 2011;21(3):419–425. doi: 10.1016/j.sbi.2011.03.012
- Tompa P, Davey NE, Gibson TJ, et al.: A million peptide motifs for the molecular biologist. Mol Cell. 2014;55(2):161–169. doi: 10.1016/j.molcel.2014.05.032
- Trabuco LG, Lise S, Petsalaki E, et al.: PepSite: prediction of peptide-binding sites from protein surfaces. Nucleic Acids Res. 2012;40(Web Server issue):W423–7. doi: 10.1093/nar/gks398
- Tribello GA, Bonomi M, Branduardi D, et al.: PLUMED2: New feathers for an old bird. Comp Phys Comm. 2014;185(2):604–613. doi: 10.1016/j.cpc.2013.09.018
- Ulrich EL, Akutsu H, Doreleijers JF, et al.: BioMagResBank. Nucleic Acids Res. 2008;36(Database issue):D402–8. doi: 10.1093/nar/gkm957
- UniProt Consortium: UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019;47(D1):D506–D515. doi: 10.1093/nar/gky1049
- Uversky VN, Oldfield CJ, Dunker AK: Intrinsically disordered proteins in human diseases: introducing the D2 concept. Annu Rev Biophys. 2008;37:215–246. doi: 10.1146/annurev.biophys.37.032807.125924
- Valentini E, Kikhney AG, Previtali G, et al.: SASBDB, a repository for biological small-angle scattering data. Nucleic Acids Res. 2015;43(Database issue):D357–63. doi: 10.1093/nar/gku1047
- Vallat B, Webb B, Westbrook JD, et al.: Development of a Prototype System for Archiving Integrative/Hybrid Structure Models of Biological Macromolecules. Structure. 2018;26(6):894–904.e2. doi: 10.1016/j.str.2018.03.011
- van der Lee R, Buljan M, Lang B, et al.: Classification of intrinsically disordered regions and proteins. Chem Rev. 2014;114(13):6589–6631. doi: 10.1021/cr400525m
- Van Roey K, Dinkel H, Weatheritt RJ, et al.: The switches.ELM resource: a compendium of conditional regulatory interaction interfaces. Sci Signal. 2013;6(269):rs7. doi: 10.1126/scisignal.2003345
- Van Roey K, Gibson TJ, Davey NE: Motif switches: decision-making in cell regulation. Curr Opin Struct Biol. 2012;22(3):378–385. doi: 10.1016/j.sbi.2012.03.004
- Van Roey K, Uyar B, Weatheritt RJ, et al.: Short linear motifs: ubiquitous and functionally diverse protein interaction modules directing cell regulation. Chem Rev. 2014;114(13):6733–6778. doi: 10.1021/cr400585q
- Varadi M, Kosol S, Lebrun P, et al.: pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res. 2014;42(Database issue):D326–35. doi: 10.1093/nar/gkt960
- Via A, Uyar B, Brun C, et al.: How pathogens use linear motifs to perturb host cell networks. Trends Biochem Sci. 2015;40(1):36–48. doi: 10.1016/j.tibs.2014.11.001
- Volkmer R: Synthesis and application of peptide arrays: quo vadis SPOT technology. Chembiochem. 2009;10(9):1431–1442. doi: 10.1002/cbic.200900078
- Walsh I, Martin AJ, Di Domenico T, et al.: ESpritz: accurate and fast prediction of protein disorder. Bioinformatics. 2012;28(4):503–509. doi: 10.1093/bioinformatics/btr682
- Weatheritt RJ, Davey NE, Gibson TJ: Linear motifs confer functional diversity onto splice variants. Nucleic Acids Res. 2012;40(15):7123–7131. doi: 10.1093/nar/gks442
- Weatheritt RJ, Gibson TJ: Linear motifs: lost in (pre)translation. Trends Biochem Sci. 2012;37(8):333–341. doi: 10.1016/j.tibs.2012.05.001
- Whitmore L, Miles AJ, Mavridis L, et al.: PCDDB: new developments at the Protein Circular Dichroism Data Bank. Nucleic Acids Res. 2017;45(D1):D303–D307. doi: 10.1093/nar/gkw796
- Wilkinson MD, Dumontier M, Aalbersberg IJ, et al.: The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018. doi: 10.1038/sdata.2016.18
- Wright PE, Dyson HJ: Intrinsically disordered proteins in cellular signalling and regulation. Nat Rev Mol Cell Biol. 2015;16(1):18–29. doi: 10.1038/nrm3920
- Xue B, Dunker AK, Uversky VN: Orderly order in protein intrinsic disorder distribution: disorder in 3500 proteomes from viruses and the three domains of life. J Biomol Struct Dyn. 2012;30(2):137–149. doi: 10.1080/07391102.2012.675145
- Xue B, Mizianty MJ, Kurgan L, et al.: Protein intrinsic disorder as a flexible armor and a weapon of HIV-1. Cell Mol Life Sci. 2012;69(8):1211–1259. doi: 10.1007/s00018-011-0859-3
- Yu C, Niu X, Jin F, et al.: Structure-based Inhibitor Design for the Intrinsically Disordered Protein c-Myc. Sci Rep. 2016;6:22298. doi: 10.1038/srep22298