Zhang et al. 10.1073/pnas.0509379103.
Fig. 5. Schematic representation of the H-bond potential, E_HB. An H-bond is counted when all four atom-pair distances fall within a square well, which eliminates the need for angle calculations and increases computational efficiency. The indicated distances are as follows: d1 is between the donor nitrogen and acceptor oxygen; d2 is between the donor nitrogen and acceptor carbonyl carbon; d3 is between the donor hydrogen and acceptor oxygen; and d4 is between the donor hydrogen and acceptor carbonyl carbon. The interaction energies for specific distance parameters are given in the accompanying file, tabulated in the format d1 d2 d3 d4 E.
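The distance-only test described in the Fig. 5 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the well bounds in `EXAMPLE_WELLS` are hypothetical placeholders, and the real bounds and energies are in the accompanying parameter file, which is not reproduced here.

```python
from math import dist

def hbond_distances(donor_n, donor_h, acceptor_o, acceptor_c):
    """Return (d1, d2, d3, d4) as defined in the Fig. 5 caption."""
    return (
        dist(donor_n, acceptor_o),  # d1: donor N to acceptor O
        dist(donor_n, acceptor_c),  # d2: donor N to acceptor carbonyl C
        dist(donor_h, acceptor_o),  # d3: donor H to acceptor O
        dist(donor_h, acceptor_c),  # d4: donor H to acceptor carbonyl C
    )

def in_square_well(distances, wells):
    """True when every distance lies inside its own [lo, hi] interval."""
    return all(lo <= d <= hi for d, (lo, hi) in zip(distances, wells))

# Placeholder bounds in Å for d1..d4, chosen only for illustration.
EXAMPLE_WELLS = [(2.5, 3.5), (3.5, 4.5), (1.5, 2.5), (2.5, 3.5)]

# Made-up collinear coordinates (Å) so the distances are easy to check.
example_d = hbond_distances(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (4.0, 0.0, 0.0)
)
example_is_hbond = in_square_well(example_d, EXAMPLE_WELLS)
```

Because only four distances and eight comparisons are needed, no angle has to be computed, which is the efficiency gain the caption refers to.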
Fig. 6. Histogram of occurrence of loops, helices, β-strands, and coils at the tails (N and C termini) found in a representative set of 1,489 proteins in the PDB.
Fig. 7. Illustration of the geometry of H-bond donor and receptor residues in the CAS model. The labeled vectors are the two unit Cα-Cα vectors, the unit bisector vector, the unit normal vector, and the vector pointing from the donor Cα atom to the receptor Cα atom.
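One way to build the bisector and normal vectors named in the Fig. 7 caption from three consecutive Cα positions is sketched below. The construction and sign conventions are assumptions made for illustration (the bisector is taken as the normalized sum of the two unit vectors pointing from the central Cα to its neighbors, and the normal as their normalized cross product); the source does not state the convention it uses, and `local_frame` is a hypothetical name.

```python
from math import sqrt

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def unit(v):
    """Scale v to unit length (fails for a zero vector)."""
    norm = sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )

def local_frame(ca_prev, ca, ca_next):
    """Return (bisector, normal) unit vectors at the central Cα.

    Assumed convention: u1 and u2 point from the central Cα to its
    neighbors. Collinear neighbors give a zero vector and are not handled.
    """
    u1 = unit(sub(ca_prev, ca))
    u2 = unit(sub(ca_next, ca))
    return unit(add(u1, u2)), unit(cross(u1, u2))

# Made-up coordinates: neighbors on the x and y axes, central Cα at the origin.
bisector, normal = local_frame((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

With these inputs the bisector lies along the diagonal of the xy plane and the normal along z, so the two are perpendicular by construction.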
Fig. 8. Histogram of the TM-scores of the best structural alignments to the PDB template library for 158 distinct compact FJC conformations of 200-residue chains. The mean and median are 0.2963 and 0.2956, respectively.
Fig. 9. Superposition of the TASSER model (thick backbone) with 1fjgl (thin backbone). This is the only case in which the TASSER model has an rmsd > 6.5 Å, which results from misorientation of the N-terminal tail.
Fig. 10. Representative example, 1at0_, of TASSER modeling improvement. (Left) Superposition of the homopolypeptide structure template (thick backbone) and the native (thin backbone). (Right) Superposition of the TASSER model (thick backbone) and the native (thin backbone).
Fig. 11. An example, 1nkws, where TASSER generates a model having a higher rmsd to native than the initial structural alignment because of misorientation of the dangling tail. The TASSER model and the target are indicated by thick and thin backbones, respectively.
Fig. 12. Detection of substructures resembling enzyme active sites. Relative frequency distributions of the drmsd of selected AFTs (identified by their associated EC numbers, left) to the best sequence-independent hit detected in 750 sticky homopolypeptides (magenta boxes) or 750 native structures functionally unrelated to the specified AFT (blue boxes). The statistics represented in the box-and-whisker plots are as follows: 10th percentile (whisker, left), 25th percentile (box, left), median (thick line), 75th percentile (box, right), 90th percentile (whisker, right) and outliers (circles). Only the top 10 AFTs are shown. They are ranked by increasing median drmsd of the best hit in sticky homopolypeptides. Restrictive (pointed red line) and permissive AFT cutoffs (pointed green line) are plotted as references to assess the significance of a sequence-dependent match to an AFT.
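The box-and-whisker statistics listed in the Fig. 12 caption can be computed as in the sketch below. It assumes linear interpolation between data points (the `inclusive` method of Python's `statistics.quantiles`); the source does not say which percentile convention was used, and the input values are made-up numbers, not drmsd data.

```python
from statistics import quantiles

def box_whisker_summary(values):
    """Return the caption's five percentiles plus the points outside the whiskers."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(values, n=100, method="inclusive")
    summary = {
        "p10": cuts[9],      # left whisker
        "p25": cuts[24],     # left edge of box
        "median": cuts[49],  # thick line
        "p75": cuts[74],     # right edge of box
        "p90": cuts[89],     # right whisker
    }
    # Values beyond either whisker are drawn as outliers (circles).
    outliers = [v for v in values if v < summary["p10"] or v > summary["p90"]]
    return summary, outliers

# Made-up input: the integers 0..100, so each percentile equals its own rank.
example_summary, example_outliers = box_whisker_summary(list(range(101)))
```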
Table 1. TASSER modeling of the 10 proteins in the PDB150 set with the worst matches to structures in the 15,000-member, compact homopolypeptide library
| Homopolypeptide templates selected by TM-ALIGN | | | | | Final models by TASSER | | |
ID* | Model_A† | Lch‡ | Coverage§ | TM-score_A¶ | R_A_ali‖ (Å) | TM-score** | R_all†† (Å) | R_ali‡‡ (Å) |
1at0_ | b10R_70 | 142 | 0.62 | 0.371 | 4.46 | 0.874 | 2.05 | 2.13 |
1fjgl | b49R_85 | 125 | 0.70 | 0.371 | 5.01 | 0.502 | 7.33 | 5.61 |
1gqva | ab23R_30 | 135 | 0.64 | 0.369 | 5.00 | 0.763 | 4.0 | 2.65 |
1i1ja | b50R_19 | 106 | 0.67 | 0.364 | 4.68 | 0.518 | 5.96 | 3.92 |
1khia | ab9R_6 | 147 | 0.63 | 0.362 | 5.07 | 0.591 | 6.13 | 5.03 |
1knma | b14R_77 | 129 | 0.66 | 0.371 | 4.64 | 0.512 | 5.11 | 4.69 |
1mi8A | b11R_42 | 141 | 0.62 | 0.365 | 4.43 | 0.605 | 5.22 | 3.62 |
1nkws | a6R_77 | 113 | 0.60 | 0.363 | 4.37 | 0.480 | 6.14 | 6.27 |
1urk_ | a46R_1 | 130 | 0.64 | 0.372 | 4.95 | 0.538 | 5.73 | 4.79 |
2ila_ | ab33R_71 | 145 | 0.66 | 0.357 | 5.25 | 0.775 | 3.49 | 2.80 |
<...> | | 131 | 0.64 | 0.366 | 4.79 | 0.616 | 5.11 | 4.15 |
*Ten PDB proteins in the PDB150 set that have the worst match (lowest TM-score) to the homopolypeptide model.
†Templates found by TM-ALIGN in the 15,000-member compact, homopolypeptide structure library.
‡Size of the PDB target.
§Coverage of the TM-ALIGN structural alignments of the homopolypeptide model to the PDB target structure.
¶TM-score of the TM-ALIGN alignment of the homopolypeptide model to the PDB target structure.
‖rmsd of the structurally aligned regions of the homopolypeptide model compared with the target PDB structure.
**TM-score of the first models built by TASSER (ranked by cluster density) started from homopolypeptide templates.
††rmsd to native of the TASSER models over all residues of the target protein.
‡‡rmsd to native of the TASSER models over the same structurally aligned region as the compact, homopolypeptide template.
Table 2. H-bond parameters calculated from 100 high-resolution PDB structures
bba 0.45 | cca | ra | ppa | qqa | cca0 0.4 | bba0 0.815 | bra0 1.56 Å |
bbb 0.25 | ccb 0.4 | rb | ppb 0.35 | qqb 0.35 |