Figure 1:
Pfam code agreement as a proxy for predicted protein structure quality. A query peptide sequence is submitted to I-TASSER to predict its 3D structure (colored blue). Metrics describing the predicted structure (“model”) are extracted for downstream analysis. The model is compared with empirically determined protein crystal structures in the PDB using TM-align, and the closest structural homologue is identified (“reference”; colored red). Metrics describing the alignment are also extracted. Pfam codes are assigned to the primary peptide sequences of the model and reference structures using InterProScan (lower right). At least one Pfam code shared between the query and reference peptides (“PFAM match”) indicates a greater likelihood of structural similarity between the model and the reference, and models with such a match are labeled “high-confidence.” The ability of each extracted metric (“Feature”) to predict the high-confidence category (“Factor”) is assessed, and a random forest (RF) classifier is then trained to identify the factor using all available features.
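
The labeling rule described in the caption can be sketched as a set intersection over Pfam accessions. This is a minimal illustration, not the authors' implementation: the function name `is_high_confidence` and the example accessions are assumptions chosen for demonstration, and the inputs are assumed to be Pfam accessions already parsed from InterProScan output.

```python
from typing import Iterable


def is_high_confidence(query_pfam: Iterable[str], reference_pfam: Iterable[str]) -> bool:
    """Return True if the query and reference share at least one Pfam code.

    Hypothetical helper: both arguments are assumed to be collections of
    Pfam accessions (e.g. "PF00069") assigned to each peptide sequence.
    """
    # A "PFAM match" is any non-empty intersection of the two code sets.
    return bool(set(query_pfam) & set(reference_pfam))


# Illustrative inputs only; these are not data from the study.
shared = is_high_confidence(["PF00069", "PF00018"], ["PF00069"])
disjoint = is_high_confidence(["PF00069"], ["PF07714"])
unannotated = is_high_confidence([], ["PF00069"])
```

Under this sketch, a query with no assigned Pfam codes can never be labeled high-confidence, because the intersection is empty; whether the study treats unannotated sequences this way is not stated in the caption.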