Nucleic Acids Research. 2019 Nov 12;48(D1):D269–D276. doi: 10.1093/nar/gkz975

DisProt: intrinsic protein disorder annotation in 2020

András Hatos 1, Borbála Hajdu-Soltész 2, Alexander M Monzon 1, Nicolas Palopoli 3, Lucía Álvarez 4, Burcu Aykac-Fas 5, Claudio Bassot 6, Guillermo I Benítez 3, Martina Bevilacqua 1, Anastasia Chasapi 7, Lucia Chemes 4,8, Norman E Davey 9, Radoslav Davidović 10, A Keith Dunker 11, Arne Elofsson 6, Julien Gobeill 12, Nicolás S González Foutel 4, Govindarajan Sudha 6, Mainak Guharoy 13,14, Tamas Horvath 15, Valentin Iglesias 16, Andrey V Kajava 17,18, Orsolya P Kovacs 15, John Lamb 6, Matteo Lambrughi 5, Tamas Lazar 13,14, Jeremy Y Leclercq 17, Emanuela Leonardi 19,20, Sandra Macedo-Ribeiro 21, Mauricio Macossay-Castillo 13,14, Emiliano Maiani 5, José A Manso 21, Cristina Marino-Buslje 22, Elizabeth Martínez-Pérez 22, Bálint Mészáros 2, Ivan Mičetić 1, Giovanni Minervini 1, Nikoletta Murvai 15, Marco Necci 1, Christos A Ouzounis 7, Mátyás Pajkos 2, Lisanna Paladin 1, Rita Pancsa 15, Elena Papaleo 5,23, Gustavo Parisi 3, Emilie Pasche 12, Pedro J Barbosa Pereira 21, Vasilis J Promponas 24, Jordi Pujols 16, Federica Quaglia 1, Patrick Ruch 12, Marco Salvatore 6, Eva Schad 15, Beata Szabo 15, Tamás Szaniszló 2, Stella Tamana 24, Agnes Tantos 15, Nevena Veljkovic 10, Salvador Ventura 16, Wim Vranken 13,14,25, Zsuzsanna Dosztányi 2, Peter Tompa 13,14,15, Silvio C E Tosatto 1,26, Damiano Piovesan 1
PMCID: PMC7145575  PMID: 31713636

Abstract

The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the ‘dark’ proteome.

INTRODUCTION

About 20 years ago, the concept of the intrinsic structural disorder of proteins came into being (1,2). Since then, the field has reached adulthood, with the concept of protein disorder gaining wide acceptance in the community. Intrinsically disordered proteins/regions (IDPs/IDRs) are now often being referred to without a citation, the term having become as common as the ‘globular’ structure of a protein, or the ‘active site’ of an enzyme. Yet, the field is still accelerating and has not reached its climax, as signaled by several recent breakthroughs and high-impact stories (3,4).

For example, it was recently recognized by ‘omics’ data analyses that about half of eukaryotic proteins are ‘dark’, in the sense that we have no information on their 3D structure (5), which poses a serious bottleneck in their functional characterization and annotation. Similarly, only 45% of the residues of all human proteins are covered by multiple sequence alignment-based Pfam-A protein family annotations (6). These values suggest that we have only a vague notion about the structure and function of the majority of proteins in our databases. As a significant fraction of the dark proteome and non-Pfam annotated proteins and protein regions are intrinsically disordered (the concepts having become almost synonymous), our best approach for illuminating the dark proteome is to predict disorder from sequence, and experimentally characterize the underlying structural ensembles (7).

The prediction of protein disorder from sequence was on the menu of the Critical Assessment of Protein Structure Prediction (CASP), a community-wide experiment of predicting protein structures from sequence (8), for many years. A new initiative, the Critical Assessment of Intrinsic protein Disorder (CAID), has now reached maturity and will be reintegrated into the CASP programme, with a clearer IDP perspective. New annotations in DisProt have already been used to provide a blind evaluation of disorder predictors (9).

Several recent breakthroughs have also signaled the vitality of the field. An unsettled question with IDPs/IDRs is whether their structural disorder persists in the crowded interior of cells. Whereas diverse indirect evidence indicates that this is the case (10), only in-cell NMR currently seems able to address this issue directly. For example, it was recently applied to study the Parkinson's disease protein α-synuclein (DisProt DP00070), once suggested to have a folded, oligomeric structure in cells (11). In-cell NMR has clearly shown that α-synuclein preserves its disordered, monomeric state in non-neuronal and neuronal cells alike (12).

Another aspect of the functionality of IDPs is that they often mediate protein-protein interactions, mostly by folding upon partner binding (13), but sometimes by preserving their structural disorder (fuzziness) in the bound state (14). This was recently shown to occur in the extremely tight (picomolar) interaction between two human IDPs, histone H1 (DisProt DP01156) and its nuclear chaperone, prothymosin-α (DisProt DP01677). These proteins associate while retaining their highly dynamic, fully disordered state (15). Functional regulation of another type may also arise from structural disorder, via the entropic force generated by the structural ensemble of an IDP/IDR. In the enzyme UDP-α-D-glucose-6-dehydrogenase (UGDH, DisProt DP02338), the C-terminal disordered tail has such a role, fine-tuning the energy landscape of the protein and stabilizing a sub-state that has a high affinity for an allosteric inhibitor (16,17).

It is without doubt that we cannot afford to ignore this intrinsically disordered, yet functionally important part of the proteome. Not only does structural disorder play an exquisite role in cellular signaling and regulation (18), it is also often implicated in disease (19,20). Consequently, IDPs also represent important drug targets: a largely unexplored frontier in developing molecular medicine is the rational design of drugs against IDPs (21,22).

Due to these challenges, it is important to update and upgrade DisProt, the primary database of protein disorder. Whereas predicted disorder features are available in MobiDB (18), which has recently been integrated in UniProtKB (23), the crux of understanding protein disorder is the availability of manually curated, experimentally verified disorder annotations. The previous release of the database, DisProt 7 (24), held ∼800 entries of IDPs/IDRs. Other databases, like IDEAL (22), ELM (25), DIBS (26) and MFIB (27), also include curated disorder information, but they differ in scope, capturing specific functional aspects or protein classes, and their overlap with DisProt is minimal (28). To reflect the above-noted breakthroughs and the recent explosion of the related liquid-liquid phase separation (LLPS) field (29), we present a significant update and upgrade of the DisProt database, which is now at version 8. DisProt 8 holds almost twice as many entries as DisProt 7, including the majority of those available in the aforementioned databases.

DisProt has been completely redesigned with an extended and updated functional classification scheme that relies on functional/structural aspects of annotated regions and incorporates a novel functional class ‘biological condensation’. Annotation concepts have been formalized in a new Disorder Ontology (DO), which is maintained by the entire DisProt community.

DisProt 8 also has many novel features that make it easier to search. The graphical interface has been redesigned and a new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature.

Lastly, we made significant improvements to the annotation interface used by DisProt curators to populate the database. It is now easier to use and supports curators’ work by enabling text-mining technologies, integrating third-party information on the fly and implementing several validation checks.

In recent work, specific sequence features have been associated with different disorder ‘flavours’ and mapped on a large scale (30). This information has been used to improve protein function prediction from sequence (31). We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the ‘dark’ proteome.

PROGRESS AND NEW FEATURES

Database structure and implementation

The way disorder information is represented in the literature is inherently complex. Articles describe functional and structural aspects, where IDPs are strictly connected to dynamic behavior. DisProt tries to capture as much biological knowledge as possible while at the same time providing simple and clear annotations. The idea is to optimize user experience and improve data exchange with other major annotation resources.

Database records

The major change compared to the previous release is the new annotation paradigm. In DisProt 7, experimental methods represented the annotation core of a DisProt region and function terms were used as attributes. Now the core of an annotation is the functional/structural aspect of a region and the experimental method is an attribute representing the quality of the annotation. Both functional/structural aspects and the type of evidence are encoded in a controlled vocabulary, in line with other core data resources (e.g. UniProtKB). In the new DisProt region format, a ‘statement’ field has been introduced to track the literature text supporting the evidence. When the text is too long or complicated, a curator statement is provided instead. All ‘statements’ are available from the website and could be used to train text-mining algorithms and to highlight sentence-based annotations on abstracts and full text articles. A new ‘obsolete’ field has been introduced in order to track regions which have been excluded from the current release. It also records the reason for obsolescence, usually a change in the reference sequence due to UniProtKB updates or a curator error.

At present, functional terms can be associated with a subset of disordered residues, i.e. with a region shorter than the one for which disorder has been experimentally evaluated. For example, a paper describing a folding upon binding event can provide two DisProt records, one region spanning the folding residues and another showing the interacting ones. All regions now have a region identifier field, which is unique and stable, i.e. it is never reused and becomes obsolete if the reference sequence changes. Functional and structural vocabulary terms along with experimental methods have been encoded in a new Disorder Ontology (DO).
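The record layout described above can be pictured with a small data structure. The following is only a minimal sketch: the class name, field names, identifier strings and example values are all hypothetical and are not taken from the actual DisProt schema. It simply shows an ontology term as the core of the annotation, the detection method as an attribute, and the ‘statement’ and ‘obsolete’ information carried alongside.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionRecord:
    """Illustrative region record; all field names are hypothetical."""
    region_id: str      # unique and stable, never reused
    start: int          # first residue, 1-based and inclusive
    end: int            # last residue, inclusive
    term_id: str        # Disorder Ontology term: the core of the annotation
    method_id: str      # detection method: an attribute qualifying the evidence
    statement: str      # supporting sentence from the article, or a curator statement
    obsolete_reason: Optional[str] = None  # set only when the region is excluded

    @property
    def length(self) -> int:
        return self.end - self.start + 1

    @property
    def is_obsolete(self) -> bool:
        return self.obsolete_reason is not None


# Toy example: a longer disordered region and a shorter functional region
# nested inside it. Every value below is an invented placeholder.
disordered = RegionRecord("toy-r1", 1, 60, "DO:00001", "DO:00120", "Placeholder sentence.")
functional = RegionRecord("toy-r2", 20, 45, "DO:00040", "DO:00120", "Placeholder sentence.")
retired = RegionRecord("toy-r3", 1, 8, "DO:00001", "DO:00130", "Placeholder sentence.",
                       obsolete_reason="reference sequence changed")
```

Making the record immutable (`frozen=True`) mirrors the idea that a region identifier is never reused: a changed region would be a new record, and the old one would be kept with an obsolescence reason.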

Disorder ontology

In order to describe the different functional aspects of IDPs and the experimental methods used to characterize them, an annotation scheme was introduced in DisProt 7. A more formalized version of the disorder ontology was implemented in DisProt 8, to move towards a descriptive, interoperable and collaborative ontology of IDPs. This is the first release of the Disorder Ontology in the Open Biomedical Ontologies (OBO) and Web Ontology Language (OWL) formats (32,33). Besides improving the ability to reuse and share the ontology, these formats allow definition of label attributes such as ‘xref’ (cross-references to external databases or ontologies) and ‘synonym EXACT’ (alternative names). They also support assignment of relationships among terms (including for example ‘disjoint_from’ to mark terms that should not be linked together).

An identifier was assigned to each term in the ontology. It gives each label an 8-character accession code (e.g. ‘DO:00001’), with the string ‘DO:’ to indicate the disorder ontology and five numeric characters to indicate the term unambiguously. Mirroring the Gene Ontology, accession numbers are assigned incrementally and there is no relationship between accession codes and the ontology topology.
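The accession format described above is simple enough to check mechanically. The sketch below is an illustration only: the function names are hypothetical, and the assumption that numbering starts at 1 and that the next code is "highest existing plus one" is a guess at what incremental assignment means in practice.

```python
import re

# 'DO:' followed by exactly five ASCII digits, eight characters in total.
DO_ACCESSION = re.compile(r"DO:[0-9]{5}")


def is_do_accession(code: str) -> bool:
    """Return True if the whole string is a well-formed DO accession."""
    return DO_ACCESSION.fullmatch(code) is not None


def next_accession(existing) -> str:
    """Return the next incremental accession after those already assigned.

    The number carries no information about the term's place in the hierarchy.
    """
    highest = max((int(code[3:]) for code in existing), default=0)
    return f"DO:{highest + 1:05d}"
```

For instance, `is_do_accession("DO:00001")` holds, while a four-digit code such as `"DO:0001"` is rejected.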

We have reviewed the terms and organization of the whole ontology, paying particular attention to the ‘Function’ category. We made some straightforward changes; for example, we split ‘Fatty acylation (myristoylation and palmitylation)’ into a renamed parent class ‘Fatty acylation’ and its new child terms ‘Myristoylation’ and ‘Palmitoylation’. A new functional term was also introduced to annotate different phenomena related to ‘Biological condensation’ (DO:00040). It describes proteins that undergo phase separation from a solution, e.g. to form a dynamic liquid droplet (DO:00041, ‘liquid–liquid phase separation’) or a hydrogel (DO:00042). It also includes cellular protein condensates (DO:00045 and DO:00046 describe ‘granule’ and ‘cellular puncta’, respectively), regardless of their existence in physiological or pathological states (as in ‘Amyloid’, DO:00046). This class provides an initial scheme to annotate the relevant but still scarce information available about protein condensates, and we expect this subset of the hierarchy to be modified (possibly by forming its own sub-ontology) as the field matures.

The distinction between structural states and dynamic events, like disorder-to-order transitions, has been made clearer. Previously ‘Structural state’ terms were part of the ‘structural transition’ category and ‘disorder’ was only used implicitly. Now, a new ‘structural state’ category has been created and it includes ‘disorder’, ‘order’, ‘pre-molten globule’ and ‘molten globule’ terms. In the future, structural states will be annotated in conjunction with the corresponding environmental conditions affecting the conformation (pH, post-translational modifications (PTMs), temperature, etc.).

All experimental methods are now encoded under the ‘detection method’ branch. There is some overlap with other ontologies, but it is incomplete, and the definition of the same experiment often differs slightly. For example, in DisProt the term ‘crystallography’ includes ‘missing electron density’ as a child, whereas in other ontologies ‘crystallography’ always indicates methods for structure determination. A new ‘electron cryomicroscopy’ (DO:00128) term has also been introduced in DisProt 8.

The Disorder Ontology (version 0.1.0) is maintained by the DisProt consortium and is available to be adopted by other databases for general use. In the future, it will be made available also from third party dedicated repositories.

Curation process and updates

DisProt data is provided by a community effort and annotations are collected through a web interface, which has been improved drastically compared to the previous version in terms of field validation, autocompletion and Named Entity Recognition (NER). In particular, curators can use a dedicated service from the neXtA5 literature triage infrastructure (34) to rank relevant literature starting from a gene name. Conversely, when curators start from an article, the DisProt interface exploits the SciLite software through the EuropePMC API (35) to automatically retrieve biological entities and identifiers in the manuscript.

The annotation interface implements the concept of ownership and user privileges. DisProt distinguishes two types of users, curators and reviewers. Curators can edit only entries that they have created, while reviewers can modify all entries. Before release, the reviewers check all annotations to ensure high quality of the data. Curators are experts in the field and trained to meet DisProt annotation standards. As a community database, DisProt looks for new curators. Curator candidates are enrolled upon an evaluation of the curriculum and curation skills.

Access to the annotation interface is restricted to registered curators and provided through Google Authentication (based on the OAuth 2.0 protocol) or the ELIXIR authentication and authorization infrastructure system (36). In the past, the DisProt interface had been kept open for limited time slots. Now the new DisProt interface is always open and new releases will be delivered more frequently, i.e. every six months.

DisProt versioning has been improved. A numeric identifier indicates the version of the database entry, e.g. ‘8.0’, and a ‘<year>_<month>’ code indicates the version (timestamp) of the annotated data, e.g. ‘2019_09’.
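A data timestamp in the ‘<year>_<month>’ form can be parsed and compared with a few lines of code. This is a sketch under assumptions: the names are hypothetical, and the requirement of a four-digit year and a zero-padded two-digit month is inferred only from the single example ‘2019_09’.

```python
import re
from typing import NamedTuple


class DataRelease(NamedTuple):
    year: int
    month: int


# Assumed shape: four-digit year, underscore, zero-padded month (01-12).
_RELEASE = re.compile(r"([0-9]{4})_(0[1-9]|1[0-2])")


def parse_release(code: str) -> DataRelease:
    """Parse a '<year>_<month>' code such as '2019_09'."""
    match = _RELEASE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a <year>_<month> release code: {code!r}")
    return DataRelease(int(match.group(1)), int(match.group(2)))
```

Because `DataRelease` is a tuple of (year, month), releases compare chronologically with the ordinary `<` and `>` operators.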

Database content

Since the last release, the numbers of both proteins and regions have almost doubled. DisProt 8 contains 1556 proteins and 3511 sequence segments annotated as disordered, which cover 19.7% of the residues. These numbers become 1390 proteins, 3041 regions and 18.7% disorder content when ambiguous evidence is not considered. Previous annotations have been fixed and updated. Regions shorter than ten residues are no longer allowed, and existing short regions were marked as obsolete, since the majority are flexible loops annotated from X-ray experiments that do not represent disorder-related functional sites. Regions ending outside the sequence, regions with a start index of zero instead of one, and entries for which the reference sequence in UniProtKB changed were corrected and, when necessary, new records were created manually.
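The kinds of problems listed above (short regions, regions ending outside the sequence, zero-based start indices) lend themselves to automatic checks. The function below is a minimal sketch of such checks, not the validation code used by DisProt; the function name, argument names and messages are hypothetical.

```python
from typing import List


def region_problems(start: int, end: int, sequence_length: int,
                    min_length: int = 10) -> List[str]:
    """Return a list of problems for a region given as 1-based inclusive bounds."""
    problems = []
    if start < 1:
        problems.append("start index must be 1 or greater")
    if end > sequence_length:
        problems.append("region ends outside the sequence")
    if end < start:
        problems.append("end precedes start")
    elif end - start + 1 < min_length:
        problems.append(f"region shorter than {min_length} residues")
    return problems
```

An empty list means the region passes all checks; a region can fail several at once, for example one that starts at zero and also runs past the end of the sequence.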

Figure 1 shows the distribution of regions based on their length and experimental detection method. Compared to the previous version, the distribution shape has not changed. Secondary methods, which include all ‘detection methods’ terms except ‘missing electron density’ (DO:00130) and ‘nuclear magnetic resonance’ (DO:00120), dominate the experiments used to identify longer (>100 residues) regions.

Figure 1. Distribution of region length. Regions shorter than 100 residues (left) are binned in groups of 10 residues. Regions longer than 100 (right) are binned in 100 residues. The tick labels indicate the lower bound which is included. Gray bars refer to the previous release (DisProt 7).

The statistics on annotation data for the main branches of the disorder ontology are reported in Figure 2. Only terms one node away from the ontology root are considered and more specific annotations are propagated following the ‘true path rule’, i.e. following the ontology hierarchy, so that parent terms account for children counts.
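The ‘true path rule’ mentioned above can be illustrated with a short sketch: every annotation to a term also counts for all of that term's ancestors. The function names and the toy hierarchy are illustrative assumptions; in particular, placing ‘nuclear magnetic resonance’ directly under ‘detection method’ is a simplification made for the example.

```python
from typing import Dict, Iterable, Set


def ancestors(term: str, parents: Dict[str, Iterable[str]]) -> Set[str]:
    """Collect every ancestor of a term by walking child-to-parent links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations: Dict[str, Set[str]],
              parents: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each term to the proteins annotated with it or with any descendant."""
    by_term: Dict[str, Set[str]] = {}
    for protein, terms in annotations.items():
        for term in terms:
            for t in {term} | ancestors(term, parents):
                by_term.setdefault(t, set()).add(protein)
    return by_term


# Toy hierarchy (child -> parents) and toy annotations (protein -> direct terms).
toy_parents = {
    "missing electron density": ["crystallography"],
    "crystallography": ["detection method"],
    "nuclear magnetic resonance": ["detection method"],
}
toy_annotations = {
    "P1": {"missing electron density"},
    "P2": {"nuclear magnetic resonance"},
    "P3": {"missing electron density", "nuclear magnetic resonance"},
}
toy_counts = propagate(toy_annotations, toy_parents)
```

Storing proteins in sets ensures that a protein reaching the same parent through several child terms is counted only once for that parent.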

Figure 2. Distribution of disorder annotation terms. Terms belong to the Disorder Ontology and only those one node away from the ontology root are shown. Annotation counts for child terms are propagated to parents up to the root. The dark segments correspond to proteins (left) or residues (right) for which more than one piece of evidence is available. Different ontology aspects (namespaces) are grouped and have different colors.

Different ontology aspects (‘namespace’ field in DisProt records) are shown with different colors. The ‘structural state’ terms, in red, show that the majority of region records in DisProt are annotated as disordered. Only five proteins are annotated with the ‘order’ term. In the future, curators will be encouraged to also track information about order, in particular when relevant for structural transitions. Transitions mainly cover folding events (‘disorder to order’; 365 proteins and 36 200 residues) rather than the reverse. The majority of interaction partner annotations refer to protein and nucleic acid binding. Binding residues are, however, overestimated, since hard constraints in the database schema of the previous DisProt version made it impossible to narrow region boundaries to the actual interacting positions. Binding positions will become more precise in the future. The new term introduced in DisProt 8, ‘Biological condensation’ (DO:00040), has been assigned to a total of 20 proteins, 29 regions and 2610 residues. The new ‘electron cryomicroscopy’ (DO:00128) term, which is a child of ‘crystallography’, covers 34 proteins, 67 regions and 4726 residues.

Darker segments in Figure 2 indicate the fraction of proteins (left plot) and residues (right plot) for which more than one piece of experimental evidence is available. The distribution of ‘Detection methods’ terms is shown at the bottom in orange. The ‘proteins’ and ‘residues’ distributions have a similar shape. ‘Crystallography’, which is a parent of ‘missing electron density’, covers fewer residues than ‘spectrometry’ and ‘optical analysis’, indicating that regions identified with crystallographic techniques are shorter on average. Moreover, ‘crystallography’ has fewer residues covered by multiple pieces of experimental evidence than other techniques. In general, disorder annotation is well supported, with 44.4% of disordered proteins and 43.2% of the disordered residues backed by two or more literature references.

DisProt website

The DisProt website has been completely redesigned, improving the user experience, visualization and functionalities. Additionally, a big effort was made to develop the DisProt Application Programming Interface (API) to enable users to retrieve a single entry or a region and to perform advanced searches via RESTful endpoints (URLs). The new API and distribution formats are extensively documented in the help page.

Entry page

The entry page is composed of three main sections. At the top, general information about the protein is provided, including name, DisProt ID, organism, sequence length, and MobiDB and UniProtKB accession numbers. At the top right, it is possible to select the DisProt version and hide/show ambiguous/obsolete evidence. A download dropdown button allows saving the whole entry data in JSON or TSV (tab-separated) format, or the corresponding sequence in FASTA format.

A new dynamic feature viewer allows users to visualize DisProt evidence mapped onto the sequence. The feature viewer shows two tracks by default, DisProt consensus and domains, the latter including Pfam (37) and Gene3D (38) annotation. DisProt consensus is generated by merging region annotations following the hierarchy of the ontology terms. In the last step, when merging the four main ontology branches, priority is given to ‘interaction partner’, ‘structural transition’, ‘structural state’ and ‘disorder function’, in that order.
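The final merging step can be sketched as a per-residue priority rule: where regions from different branches overlap, the branch that comes first in the stated order wins. This is only one plausible reading of the description, not the actual DisProt implementation; the function name and the per-residue approach are assumptions, and the earlier step of merging within each branch along the term hierarchy is not modelled.

```python
from itertools import groupby

# Branch priority for the last merging step, highest first (order from the text).
PRIORITY = ["interaction partner", "structural transition",
            "structural state", "disorder function"]


def consensus(regions, sequence_length):
    """Merge (start, end, branch) regions, 1-based inclusive, into a consensus.

    Returns a list of non-overlapping (start, end, branch) segments.
    """
    rank = {branch: i for i, branch in enumerate(PRIORITY)}
    per_residue = [None] * sequence_length  # index 0 holds residue 1
    for start, end, branch in regions:
        for pos in range(start, end + 1):
            current = per_residue[pos - 1]
            if current is None or rank[branch] < rank[current]:
                per_residue[pos - 1] = branch

    # Collapse runs of identical labels into segments, skipping unannotated runs.
    segments = []
    pos = 1
    for label, run in groupby(per_residue):
        run_length = len(list(run))
        if label is not None:
            segments.append((pos, pos + run_length - 1, label))
        pos += run_length
    return segments


toy_regions = [(1, 30, "structural state"), (11, 20, "interaction partner")]
toy_consensus = consensus(toy_regions, sequence_length=40)
```

In the toy example, the ‘interaction partner’ region overrides the overlapping part of the ‘structural state’ region, which is therefore split into two flanking segments.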

The feature viewer can be expanded to see sub-tracks, and it is possible to zoom in and out of specific regions, customize the view and download a high quality image. Region tooltips are activated on mouse over and provide detailed information about the corresponding annotation.

Region details are also provided at the bottom of the page, organized in a dynamic list of boxes. A search box, which supports regular expressions, allows users to filter the list of regions. The filter is also applied to the feature and sequence viewers (right) in real time. For example, by typing ‘nuclear magnetic resonance’ it is possible to select only region evidence from NMR experiments.

Browsing and searching data

DisProt implements both a database search and a BLAST search (39), both available from the ‘browse’ page. The database search allows users to compose a query against several fields, which can be combined to meet multiple criteria. All search fields accept regular expressions, and ‘Free text’ searches against the entire database content. For example, searching for ‘p53’ in ‘free text’ and ‘homo | mus’ in ‘organism’ will return all human and mouse proteins with the ‘p53’ string somewhere in the corresponding database records (protein name, annotation reference title, etc.). Query results are displayed in the table below the search box. Table columns are customizable and the result can be downloaded in JSON, TSV or FASTA format.
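The way several regular-expression fields combine can be sketched as follows. This is an illustration, not the DisProt search engine: the function names, the record keys, the case-insensitive matching and the toy records are all assumptions. The organism pattern is written without spaces around the ‘|’, since how the real interface treats such whitespace is not specified.

```python
import re


def field_text(record: dict, field: str) -> str:
    """Text to match for a field; 'free_text' spans every value in the record."""
    if field == "free_text":
        return " ".join(str(value) for value in record.values())
    return str(record.get(field, ""))


def search(records, **field_patterns):
    """Return the records for which every field pattern matches (logical AND)."""
    compiled = {field: re.compile(pattern, re.IGNORECASE)
                for field, pattern in field_patterns.items()}
    return [record for record in records
            if all(regex.search(field_text(record, field))
                   for field, regex in compiled.items())]


# Invented toy records, not actual database content.
toy_records = [
    {"name": "Cellular tumor antigen p53", "organism": "Homo sapiens"},
    {"name": "Cellular tumor antigen p53", "organism": "Mus musculus"},
    {"name": "Cellular tumor antigen p53", "organism": "Danio rerio"},
    {"name": "Alpha-synuclein", "organism": "Homo sapiens"},
]
toy_hits = search(toy_records, free_text="p53", organism="homo|mus")
```

Here `toy_hits` contains only the human and mouse p53 records: the zebrafish record fails the organism pattern and the α-synuclein record fails the free-text pattern.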

DisProt API

DisProt provides programmatic access through a REpresentational State Transfer (RESTful) web service API. A single entry or piece of evidence can be retrieved by using DisProt or UniProtKB identifiers. Additionally, a text search against the entire database can be performed by specifying query fields (name, organism, etc.) directly as URL parameters in the HTTP request. JSON, TSV and FASTA formats are supported.
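Passing query fields as URL parameters might look like the sketch below. Only the base URL (https://disprot.org) comes from this article; the endpoint path `/api/search`, the `format` parameter name and the helper function are hypothetical placeholders, so the API help page should be consulted for the real routes. The code only builds a URL and performs no network request.

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "https://disprot.org"
SEARCH_PATH = "/api/search"          # hypothetical path, for illustration only
FORMATS = {"json", "tsv", "fasta"}   # formats named in the text


def build_search_url(fmt: str = "json", **fields: str) -> str:
    """Build a search URL with query fields encoded as URL parameters."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = dict(sorted(fields.items()))  # stable parameter order
    params["format"] = fmt                 # hypothetical parameter name
    return urljoin(BASE_URL, SEARCH_PATH) + "?" + urlencode(params)


example_url = build_search_url(name="p53", organism="homo")
```

`urlencode` percent-encodes characters that are special in URLs, so a regular expression such as `homo|mus` can be passed as a field value without manual escaping.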

CONCLUSIONS AND FUTURE WORK

In the previous release, DisProt disorder annotations were polished and major errors were fixed, but the number of newly annotated proteins was limited. In DisProt 8, disorder annotations doubled and a robust infrastructure has been put in place to leverage and accelerate the annotation process. The database format has been improved to be flexible enough to capture essential information from the literature while keeping disorder representation simple and clear. A new disorder ontology has been formalized with the aim of improving maintenance and data exchange with core data resources. The new ontology is versioned and provides a hierarchy to facilitate term traversal. Article sentences stating experimental evidence for disorder are now captured, providing a corpus for the implementation of new text-mining models. New protein examples are used as ground truth to evaluate prediction methods, as in the Critical Assessment of Intrinsic protein Disorder (CAID). The long-term sustainability of DisProt is guaranteed by its centrality in several initiatives involving large communities of bioinformaticians working on disorder, such as the IDPfun Marie Curie RISE and the ELIXIR IDP User Community.

ACKNOWLEDGEMENTS

DisProt is a service of the Italian ELIXIR node. Part of this work was done in the context of an ELIXIR Implementation Study linked to the ELIXIR Data platform.

FUNDING

Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT) of Argentina [PICT-2015/3367, PICT-2017/1924]; Ministry of Education, Science and Technological Development of the Republic of Serbia [ON173001]; Vetenskapsrådet [2016-03798]; Hungarian National Research, Development, and Innovation Office (NKFIH) [FK-128133]; Italian Ministry of Health Young Investigator Grant [GR-2011-02347754]; Ministerio de Economía y Competitividad (MINECO) [BIO2016-78310-R]; ICREA (ICREA-Academia 2015); Fundação para a Ciência e a Tecnologia (FCT, Portugal); European Regional Development Fund [POCI-01-0145-FEDER-031173, POCI-01-0145-FEDER-029221]; Mexican National Council of Science and Technology (CONACYT) [215503]; Elixir-GR, Action ‘Reinforcement of the Research and Innovation Infrastructure’, Operational Programme ‘Competitiveness, Entrepreneurship and Innovation’ [NSRF 2014-2020], co-financed by Greece and the European Union (European Regional Development Fund); Hungarian Academy of Sciences [PREMIUM-2017-48]; Carlsberg Distinguished Fellowship [CF18-0314]; Danmarks Grundforskningsfond [DNRF125]; National Research, Development and Innovation Office [K-125340]; Research Foundation Flanders (FWO) [G.0328.16N]; Hungarian Academy of Sciences [LP2014-18]; OTKA [K108798 and K124670]. This project has received funding from the European Union's Horizon 2020 research and innovation programme [778247]. Funding for open access charge: European Union’s Horizon 2020 research and innovation programme [778247].

Conflict of interest statement. None declared.

REFERENCES

1. Romero P., Obradovic Z., Kissinger C.R., Villafranca J.E., Garner E., Guilliot S., Dunker A.K. Thousands of proteins likely to have long disordered regions. Pac. Symp. Biocomput. 1998; 1998:437–448.
2. Wright P.E., Dyson H.J. Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol. 1999; 293:321–331.
3. van der Lee R., Buljan M., Lang B., Weatheritt R.J., Daughdrill G.W., Dunker A.K., Fuxreiter M., Gough J., Gsponer J., Jones D.T. et al. Classification of intrinsically disordered regions and proteins. Chem. Rev. 2014; 114:6589–6631.
4. Davey N.E. The functional importance of structure in unstructured protein regions. Curr. Opin. Struct. Biol. 2019; 56:155–163.
5. Perdigão N., Heinrich J., Stolte C., Sabir K.S., Buckley M.J., Tabor B., Signal B., Gloss B.S., Hammang C.J., Rost B. et al. Unexpected features of the dark proteome. Proc. Natl. Acad. Sci. U.S.A. 2015; 112:15898–15903.
6. Mistry J., Coggill P., Eberhardt R.Y., Deiana A., Giansanti A., Finn R.D., Bateman A., Punta M. The challenge of increasing Pfam coverage of the human proteome. Database. 2013; 2013:bat023.
7. Bhowmick A., Brookes D.H., Yost S.R., Dyson H.J., Forman-Kay J.D., Gunter D., Head-Gordon M., Hura G.L., Pande V.S., Wemmer D.E. et al. Finding our way in the dark proteome. J. Am. Chem. Soc. 2016; 138:9730–9742.
8. Monastyrskyy B., Kryshtafovych A., Moult J., Tramontano A., Fidelis K. Assessment of protein disorder region predictions in CASP10. Proteins. 2014; 82(Suppl. 2):127–137.
9. Necci M., Piovesan D., Dosztanyi Z., Tompa P., Tosatto S.C.E. A comprehensive assessment of long intrinsic protein disorder from the DisProt database. Bioinformatics. 2017; 34:445–452.
10. Tompa P. The interplay between structure and function in intrinsically unstructured proteins. FEBS Lett. 2005; 579:3346–3354.
11. Bartels T., Choi J.G., Selkoe D.J. α-Synuclein occurs physiologically as a helically folded tetramer that resists aggregation. Nature. 2011; 477:107–110.
12. Theillet F.-X., Binolfi A., Bekei B., Martorana A., Rose H.M., Stuiver M., Verzini S., Lorenz D., van Rossum M., Goldfarb D. et al. Structural disorder of monomeric α-synuclein persists in mammalian cells. Nature. 2016; 530:45–50.
13. Yang J., Gao M., Xiong J., Su Z., Huang Y. Features of molecular recognition of intrinsically disordered proteins via coupled folding and binding. Protein Sci. 2019; 28:1952–1965.
14. Pricer R., Gestwicki J.E., Mapp A.K. From fuzzy to function: the new frontier of protein-protein interactions. Acc. Chem. Res. 2017; 50:584–589.
15. Borgia A., Borgia M.B., Bugge K., Kissling V.M., Heidarsson P.O., Fernandes C.B., Sottini A., Soranno A., Buholzer K.J., Nettels D. et al. Extreme disorder in an ultrahigh-affinity protein complex. Nature. 2018; 555:61–66.
16. Keul N.D., Oruganty K., Schaper Bergman E.T., Beattie N.R., McDonald W.E., Kadirvelraj R., Gross M.L., Phillips R.S., Harvey S.C., Wood Z.A. The entropic force generated by intrinsically disordered segments tunes protein function. Nature. 2018; 563:584–588.
17. Egger S., Chaikuad A., Kavanagh K.L., Oppermann U., Nidetzky B. Structure and mechanism of human UDP-glucose 6-dehydrogenase. J. Biol. Chem. 2011; 286:23877–23887.
18. Piovesan D., Tabaro F., Paladin L., Necci M., Micetic I., Camilloni C., Davey N., Dosztányi Z., Mészáros B., Monzon A.M. et al. MobiDB 3.0: more annotations for intrinsic disorder, conformational diversity and interactions in proteins. Nucleic Acids Res. 2018; 46:D471–D476.
19. Mészáros B., Zeke A., Reményi A., Simon I., Dosztányi Z. Systematic analysis of somatic mutations driving cancer: uncovering functional protein regions in disease development. Biol. Direct. 2016; 11:23.
20. Babu M.M. The contribution of intrinsically disordered regions to protein function, cellular complexity, and human disease. Biochem. Soc. Trans. 2016; 44:1185–1200.
21. Ruan H., Sun Q., Zhang W., Liu Y., Lai L. Targeting intrinsically disordered proteins at the edge of chaos. Drug Discov. Today. 2019; 24:217–227.
22. Hu G., Wu Z., Wang K., Uversky V.N., Kurgan L. Untapped potential of disordered proteins in current druggable human proteome. Curr. Drug Targets. 2016; 17:1198–1205.
23. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 2019; 47:D506–D515.
24. Piovesan D., Tabaro F., Mičetić I., Necci M., Quaglia F., Oldfield C.J., Aspromonte M.C., Davey N.E., Davidović R., Dosztányi Z. et al. DisProt 7.0: a major update of the database of disordered proteins. Nucleic Acids Res. 2017; 45:D1123–D1124.
25. Gouw M., Michael S., Sámano-Sánchez H., Kumar M., Zeke A., Lang B., Bely B., Chemes L.B., Davey N.E., Deng Z. et al. The eukaryotic linear motif resource – 2018 update. Nucleic Acids Res. 2018; 46:D428–D434.
26. Schad E., Fichó E., Pancsa R., Simon I., Dosztányi Z., Mészáros B. DIBS: a repository of disordered binding sites mediating interactions with ordered proteins. Bioinformatics. 2018; 34:535–537.
27. Fichó E., Reményi I., Simon I., Mészáros B. MFIB: a repository of protein complexes with mutual folding induced by binding. Bioinformatics. 2017; 33:3682–3684.
  • 28. Necci M., Piovesan D., Tosatto S.C.E.. Where differences resemble: sequence-feature analysis in curated databases of intrinsically disordered proteins. Database. 2018; 2018: [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Shin Y., Brangwynne C.P.. Liquid phase condensation in cell physiology and disease. Science. 2017; 357:eaaf4382. [DOI] [PubMed] [Google Scholar]
  • 30. Necci M., Piovesan D., Tosatto S.C.E.. Large-scale analysis of intrinsic disorder flavors and associated functions in the protein sequence universe. Protein Sci. 2016; 25:2164–2174. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Piovesan D., Tosatto S.C.E. INGA 2.0: improving protein function prediction for the dark proteome. Nucleic Acids Res. 2019; 47:W373–W378. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Smith B., Ashburner M., Rosse C., Bard J., Bug W., Ceusters W., Goldberg L.J., Eilbeck K., Ireland A., Mungall C.J. et al.. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat. Biotechnol. 2007; 25:1251–1255. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Smith M.K., Welty C., McGuinness D.L.. 2004; OWL Web Ontology Language Overview.
  • 34. Mottin L., Gobeill J., Pasche E., Michel P.-A., Cusin I., Gaudet P., Ruch P.. neXtA5: accelerating annotation of articles via automated approaches in neXtProt. Database. 2016; 2016:doi:10.1093/database/bay127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Europe PMC consortium. Europe PMC: a full-text literature database for the life sciences and platform for innovation. Nucleic Acids Res. 2015; 43:D1042–D1048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Linden M., Prochazka M., Lappalainen I., Bucik D., Vyskocil P., Kuba M., Silén S., Belmann P., Sczyrba A., Newhouse S. et al.. Common ELIXIR Service for Researcher Authentication and Authorisation [version 1; peer review: 3 approved, 1 approved with reservations]. F1000Research. 2018; 7:1199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. El-Gebali S., Mistry J., Bateman A., Eddy S.R., Luciani A., Potter S.C., Qureshi M., Richardson L.J., Salazar G.A., Smart A. et al.. The Pfam protein families database in 2019. Nucleic Acids Res. 2019; 47:D427–D432. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Lewis T.E., Sillitoe I., Dawson N., Lam S.D., Clarke T., Lee D., Orengo C., Lees J.. Gene3D: Extensive prediction of globular domains in proteins. Nucleic Acids Res. 2018; 46:D435–D439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J.. Basic local alignment search tool. J. Mol. Biol. 1990; 215:403–410. [DOI] [PubMed] [Google Scholar]

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press.
