Table 1. Summary of data enrichment collaborations
| Enriched data | Enrichment process | Main outcomes |
|---|---|---|
| Mapping to Rfam | • Data incorporated into the PDBe database and displayed on relevant PDBe entry pages. • Data integrated into the PDBe REST API, search, and visualization via web components. • Links to the Rfam resource on PDBe entry pages. • Rfam data resource was updated where missing mapping information was identified. | • Assignment of over 5000 RNA chains to 98 Rfam families in >1500 structures in the PDB. • Improved findability of RNA structures. • Easier identification of RNA function in PDB entries. |
| Identification of cofactors | • PDBe worked with the Thornton group at EMBL-EBI to set up a process to identify cofactor and cofactor-like molecules in the PDB. • Cofactors grouped by class. • The process verifies that such molecules are bound to enzymes associated with the correct cofactor class. • Data made available via PDBe REST API, on PDBe entry pages and search. | • Identified 417 unique cofactor-like small molecules. • Annotation of over 78 000 PDB entries containing enzymes, of which 12 000 contain cofactor or cofactor-like molecules, representing over 1500 different Enzyme Commission numbers. |
| Preliminary Pfam domain assignments | • Collaboration with Pfam team to implement provisional domain-assignment process at PDBe. • Pfam domains assigned in newly released PDB entries, in advance of Pfam database release. • PDBe integrates this data, enabling Pfam domain assignments for PDB entries immediately after release. | • As of September 2019, over 16 000 PDB entries had Pfam domains assigned which are not in the official latest Pfam release (V32.0, September 2018). • This includes over 3100 distinct Pfam domains. |
| Preliminary CATH domain assignments (CATHb) | • CATHb data released by CATH team provides preliminary CATH structural domain assignment for PDB structures on a weekly basis. • PDBe integrates this data, enabling structure-domain assignments for PDB entries immediately after release. | • As of August 2019, around 30 000 new entries have CATH domains assigned which are not in the official full CATH release (V4.2, September 2017). |
| Standardized information on crystallographic cell dimensions (NIGGLI) | • Standardization of cell dimensions using Niggli reduction (27). • Standardized cell dimensions made available through PDBe search API. | • Standardized cell dimensions in PDBe's search used by Phaser (28). |
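
The cofactor verification step summarized in Table 1 (a ligand is accepted only if it is bound to an enzyme associated with the matching cofactor class) can be illustrated with a minimal sketch. This is not the PDBe or Thornton group implementation. The lookup tables, the `BoundLigand` type, the `is_verified_cofactor` function, and the entry identifiers are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Illustrative lookup tables only; these are NOT the PDBe cofactor data.
# Chemical component ID -> cofactor class (hypothetical class labels).
COFACTOR_CLASS = {
    "NAD": "NAD",
    "FAD": "FAD",
    "PLP": "PLP",
}

# EC number -> cofactor classes expected for that enzyme (toy subset).
EC_COFACTOR_CLASSES = {
    "1.1.1.1": {"NAD"},  # alcohol dehydrogenase
    "2.6.1.1": {"PLP"},  # aspartate transaminase
}


@dataclass(frozen=True)
class BoundLigand:
    """A ligand observed bound to an enzyme chain (hypothetical record)."""
    entry_id: str    # placeholder identifier, not a real PDB ID
    ligand_id: str   # chemical component ID of the bound molecule
    ec_number: str   # EC number annotated on the enzyme


def is_verified_cofactor(ligand: BoundLigand) -> bool:
    """Return True only if the ligand belongs to a cofactor class that is
    associated with the EC number of the enzyme it is bound to."""
    cofactor_class = COFACTOR_CLASS.get(ligand.ligand_id)
    if cofactor_class is None:
        return False  # not a known cofactor or cofactor-like molecule
    expected = EC_COFACTOR_CLASSES.get(ligand.ec_number, set())
    return cofactor_class in expected


examples = [
    BoundLigand("entry_a", "NAD", "1.1.1.1"),  # class matches the enzyme
    BoundLigand("entry_b", "NAD", "2.6.1.1"),  # known cofactor, wrong enzyme class
    BoundLigand("entry_c", "GOL", "1.1.1.1"),  # glycerol is not in the cofactor table
]

verified = [lig.entry_id for lig in examples if is_verified_cofactor(lig)]
```

Here only `entry_a` passes, because the other two records either pair a cofactor with an enzyme of a different class or involve a molecule absent from the cofactor table.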