Nucleic Acids Res. 2021 May 17;49(13):e78. doi: 10.1093/nar/gkab356

© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Figure 1. Workflow for predicting binding positions in protein domains. (A) For a given protein domain (represented as an HMM), instances of the domain are found across all human proteins and aggregated to construct per-position features. The zf-C2H2 domain (PF00096) is shown as an example. (B) For instances of the domain, features for each position are calculated at the DNA base, protein amino acid or whole-domain level. The figure illustrates a few example features calculated at each level. DNA-level features (left) include population allele frequencies and evolutionary conservation. Protein amino acid-level features (middle) include amino acid identity, information derived from predicted structure (e.g., secondary structure and solvent accessibility), amino acid conservation across orthologs, and physicochemical properties of the amino acids. Domain-level features (right) include the HMM emission probabilities and the predominant amino acid at each position. (C) Features from the three levels are aggregated across instances for each protein domain position. (D) Using these features and a set of known DNA, RNA, ion, peptide and small-molecule binding positions within domains, we train a heterogeneous ensemble of classifiers to identify positions binding each ligand type. (E) Results from these classifiers are combined, and each final model outputs per-domain-position binding scores for one of the ligand types. (Figure generated using biorender.com.)
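
The aggregation in panel (C) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout, the field names (`position`, `amino_acid`, `conservation`), the toy values, and the choice of mean and most-common-value as aggregation functions are all assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical toy data: each record is one residue from one domain instance,
# already mapped to an HMM match-state position. All values are made up.
instances = [
    {"position": 1, "amino_acid": "C", "conservation": 0.9},
    {"position": 1, "amino_acid": "C", "conservation": 0.7},
    {"position": 1, "amino_acid": "S", "conservation": 0.5},
    {"position": 2, "amino_acid": "H", "conservation": 0.5},
    {"position": 2, "amino_acid": "H", "conservation": 1.0},
]


def aggregate_by_position(records):
    """Collapse per-instance residue records into one feature dict per position.

    Assumed aggregations (not taken from the paper): the mean of a numeric
    feature and the most frequent amino acid across instances.
    """
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["position"]].append(rec)

    features = {}
    for pos, recs in sorted(grouped.items()):
        aa_counts = Counter(r["amino_acid"] for r in recs)
        features[pos] = {
            "n_instances": len(recs),
            "mean_conservation": mean(r["conservation"] for r in recs),
            "predominant_aa": aa_counts.most_common(1)[0][0],
        }
    return features


position_features = aggregate_by_position(instances)
for pos, feats in position_features.items():
    print(pos, feats)
```

Each resulting per-position feature dictionary corresponds to one row of the kind of table a classifier in panel (D) would be trained on, with the known binding annotation for that position as the label.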