Zhang et al. 10.1073/pnas.0509379103.

Supporting Information

Files in this Data Supplement:

Supporting Figure 5
Supporting Material and Methods
Supporting Figure 6
Supporting Figure 7
Supporting Figure 8
Supporting Table 1
Supporting Figure 9
Supporting Figure 10
Supporting Figure 11
Supporting Figure 12
Supporting Table 2




Supporting Figure 5

Fig. 5. Schematic representation of the H-bond potential, E_HB. An H-bond is counted when the distances between all four atom pairs fall within a square well; this eliminates the need for angle calculations and increases computational efficiency. The distances are defined as follows: d1, donor nitrogen to acceptor oxygen; d2, donor nitrogen to acceptor carbonyl carbon; d3, donor hydrogen to acceptor oxygen; and d4, donor hydrogen to acceptor carbonyl carbon. The interaction energies for specific distance parameters are given in the accompanying file, tabulated in the format d1 d2 d3 d4 E.
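The angle-free counting rule described for Fig. 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the lower and upper well bounds and the single energy value are hypothetical placeholders (the real distance parameters and energies are in the accompanying file and are not reproduced here), and the function name hbond_energy is likewise illustrative.

```python
import math

# Hypothetical square-well bounds (lower, upper) in Å for the four distances in Fig. 5.
# These numbers are illustrative placeholders, NOT the parameters from the accompanying file.
HYPOTHETICAL_WELL = {
    "d1": (2.6, 3.2),  # donor N ... acceptor O
    "d2": (3.4, 4.2),  # donor N ... acceptor carbonyl C
    "d3": (1.6, 2.4),  # donor H ... acceptor O
    "d4": (2.5, 3.3),  # donor H ... acceptor carbonyl C
}
HYPOTHETICAL_EHB = -1.0  # hypothetical energy when all four distances fall inside the well


def hbond_energy(n, h, o, c, well=HYPOTHETICAL_WELL, e_hb=HYPOTHETICAL_EHB):
    """Return e_hb if all four donor/acceptor distances lie in the square well, else 0.0.

    n, h: donor nitrogen and hydrogen coordinates; o, c: acceptor oxygen and
    carbonyl carbon coordinates, each an (x, y, z) tuple. No angles are computed.
    """
    d = {
        "d1": math.dist(n, o),
        "d2": math.dist(n, c),
        "d3": math.dist(h, o),
        "d4": math.dist(h, c),
    }
    inside = all(lo <= d[key] <= hi for key, (lo, hi) in well.items())
    return e_hb if inside else 0.0


# Toy, collinear N-H ... O=C arrangement (coordinates in Å).
print(hbond_energy((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0), (4.1, 0.0, 0.0)))  # -1.0
```

Because the test reduces to four distance comparisons, it avoids trigonometric calls entirely, which is the efficiency argument made in the caption.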





Supporting Figure 6

Fig. 6. Histogram of the occurrence of loops, helices, β-strands, and coils at the tails (N and C termini) in a representative set of 1,489 proteins from the PDB.





Supporting Figure 7

Fig. 7. Illustration of the geometry of H-bond donor and acceptor residues in the CAS model, showing the two unit Cα-Cα vectors, the unit bisector vector, the unit normal vector, and the vector pointing from the donor Cα atom to the acceptor Cα atom.
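A minimal sketch of how the vectors in Fig. 7 can be built from three consecutive Cα positions, assuming the common construction in which the bisector is the normalized difference of the two adjacent unit Cα-Cα vectors and the normal is their normalized cross product. The sign convention of the bisector and all function names are assumptions; the CAS model may orient these vectors differently.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _unit(v):
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def local_vectors(ca_prev, ca, ca_next):
    """Unit bisector b and unit normal c at residue i from Cα(i-1), Cα(i), Cα(i+1).

    Assumed construction: b = unit(u1 - u2), c = unit(u1 x u2), where u1 and u2 are
    the unit Cα(i-1)->Cα(i) and Cα(i)->Cα(i+1) vectors. The sign of b is a convention.
    Undefined (division by zero) when the three Cα atoms are collinear.
    """
    u1 = _unit(_sub(ca, ca_prev))
    u2 = _unit(_sub(ca_next, ca))
    b = _unit(_sub(u1, u2))
    c = _unit(_cross(u1, u2))
    return b, c


def donor_to_acceptor(ca_donor, ca_acceptor):
    """Vector pointing from the donor Cα atom to the acceptor Cα atom."""
    return _sub(ca_acceptor, ca_donor)
```

For a donor residue i and an acceptor residue j, quantities such as the dot products of their bisector vectors, of their normal vectors, and the length of the donor-to-acceptor vector can then be compared against geometric cutoffs; which cutoffs apply is not specified by this sketch.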





Supporting Figure 8

Fig. 8. Histogram of the TM-score of the best structural alignment to the PDB template library for 158 distinct, compact FJC conformations of 200-residue chains. The mean and median are 0.2963 and 0.2956, respectively.
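For reference, the TM-score used in Fig. 8 and Table 1 is commonly defined as TM-score = max[(1/L) Σ_i 1/(1 + (d_i/d0)²)], with d0 = 1.24(L − 15)^(1/3) − 1.8 Å, where L is the target length, d_i is the distance between the ith pair of aligned residues, and the maximum is taken over superpositions. The sketch below only evaluates the sum for one given set of aligned-pair distances; the search over superpositions (and, for TM-ALIGN, over alignments) is not attempted, and the 0.5 Å floor on d0 is an assumed convention.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (Å), floored at 0.5 Å (assumed convention)."""
    if target_length <= 15:
        raise ValueError("this sketch only handles targets longer than 15 residues")
    return max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score_fixed(aligned_distances, target_length):
    """TM-score for ONE fixed superposition (no optimization).

    aligned_distances: distances (Å) between aligned residue pairs after superposition.
    Unaligned target residues contribute 0, so the sum is divided by the full target length.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances) / target_length
```

Because each pair contributes at most 1 and the sum is normalized by the full target length, a perfect superposition of only half the residues of a 200-residue target scores 0.5, which is one way to read the values near 0.3 in Fig. 8.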





Supporting Figure 9

Fig. 9. Superposition of the TASSER model (thick backbone) on 1fjgl (thin backbone). This is the only case in which TASSER generates a model with rmsd > 6.5 Å, owing to misorientation of the N-terminal tail.
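Figs. 9 and 11, and the R_all and R_ali columns of Table 1, distinguish rmsd over all residues from rmsd over a structurally aligned region. A minimal sketch of why a misoriented tail affects the two differently, assuming the coordinates have already been optimally superposed (the fitting step, e.g. the Kabsch algorithm, is omitted) and using toy coordinates rather than any structure from this study:

```python
import math


def rmsd(model, native, indices=None):
    """rmsd (Å) between two ALREADY-superposed coordinate lists; no fitting is done.

    indices: optional subset of residue positions (e.g. a structurally aligned region);
    by default all residues are used.
    """
    idx = list(range(len(native))) if indices is None else list(indices)
    if not idx:
        raise ValueError("need at least one residue")
    total = sum(math.dist(model[i], native[i]) ** 2 for i in idx)
    return math.sqrt(total / len(idx))


# Toy example: a 10-residue chain whose core matches the native exactly
# but whose two N-terminal residues are displaced by 10 Å.
native = [(3.8 * i, 0.0, 0.0) for i in range(10)]
model = [(x + 10.0, y, z) if i < 2 else (x, y, z) for i, (x, y, z) in enumerate(native)]
r_all = rmsd(model, native)                  # sqrt(2 * 10**2 / 10) = sqrt(20), about 4.47 Å
r_core = rmsd(model, native, range(2, 10))   # 0.0 Å
```

Two displaced residues out of ten are enough to push the all-residue rmsd above 4 Å while the core rmsd stays at zero, the same qualitative effect attributed to the tail in the caption.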





Supporting Figure 10

Fig. 10. A representative example (1at0_) of improvement by TASSER modeling. (Left) Superposition of the homopolypeptide structure template (thick backbone) on the native structure (thin backbone). (Right) Superposition of the TASSER model (thick backbone) on the native structure (thin backbone).





Supporting Figure 11

Fig. 11. An example (1nkws) in which TASSER generates a model with a higher rmsd to native than the initial structural alignment, because of misorientation of the dangling tail. The TASSER model and the target are shown as thick and thin backbones, respectively.





Supporting Figure 12

Fig. 12. Detection of substructures resembling enzyme active sites. Relative frequency distributions of the drmsd between selected AFTs (identified by their associated EC numbers, shown at left) and the best sequence-independent hit detected in 750 sticky homopolypeptides (magenta boxes) or in 750 native structures functionally unrelated to the specified AFT (blue boxes). The box-and-whisker plots show the 10th percentile (left whisker), 25th percentile (left edge of box), median (thick line), 75th percentile (right edge of box), 90th percentile (right whisker), and outliers (circles). Only the top 10 AFTs are shown, ranked by increasing median drmsd of the best hit in sticky homopolypeptides. Restrictive (dotted red line) and permissive (dotted green line) AFT cutoffs are plotted as references for assessing the significance of a sequence-dependent match to an AFT.
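Unlike rmsd, drmsd compares internal interatomic distances rather than coordinates, so no superposition is needed. A minimal sketch, assuming the common definition as the root-mean-square difference of all pairwise distances between two equally sized, correspondingly ordered atom sets; which atoms of an AFT and of a candidate hit are compared, and any weighting, are assumptions not specified here.

```python
import itertools
import math


def drmsd(coords_a, coords_b):
    """Distance rmsd (Å) between two equally sized, correspondingly ordered atom sets.

    Root-mean-square difference of all pairwise intra-set distances; invariant to
    rotation and translation of either set, so no superposition is required.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equally sized sets of at least two atoms")
    pairs = list(itertools.combinations(range(len(coords_a)), 2))
    total = 0.0
    for i, j in pairs:
        da = math.dist(coords_a[i], coords_a[j])
        db = math.dist(coords_b[i], coords_b[j])
        total += (da - db) ** 2
    return math.sqrt(total / len(pairs))
```

A rigidly moved copy of a motif therefore has drmsd close to zero, while any internal distortion raises it; that is what makes drmsd convenient for sequence-independent matching of small active-site templates.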





Table 1. TASSER modeling of the 10 proteins in the PDB150 set with the worst matches to structures in the 15,000-member compact homopolypeptide library

        Homopolypeptide templates selected by TM-ALIGN          Final models by TASSER
ID*     Model_A†   Lch‡   Coverage§   TM-score_A¶   R_A_ali||   TM-score**   R_all††   R_ali‡‡
1at0_   b10R_70    142    0.62        0.371         4.46        0.874        2.05      2.13
1fjgl   b49R_85    125    0.70        0.371         5.01        0.502        7.33      5.61
1gqva   ab23R_30   135    0.64        0.369         5.00        0.763        4.0       2.65
1i1ja   b50R_19    106    0.67        0.364         4.68        0.518        5.96      3.92
1khia   ab9R_6     147    0.63        0.362         5.07        0.591        6.13      5.03
1knma   b14R_77    129    0.66        0.371         4.64        0.512        5.11      4.69
1mi8A   b11R_42    141    0.62        0.365         4.43        0.605        5.22      3.62
1nkws   a6R_77     113    0.60        0.363         4.37        0.480        6.14      6.27
1urk_   a46R_1     130    0.64        0.372         4.95        0.538        5.73      4.79
2ila_   ab33R_71   145    0.66        0.357         5.25        0.775        3.49      2.80
<...>              131    0.64        0.366         4.79        0.616        5.11      4.15

*Ten proteins in the PDB150 set with the worst match (lowest TM-score) to the homopolypeptide models.

†Templates found by TM-ALIGN in the 15,000-member compact homopolypeptide structure library.

‡Length (number of residues) of the PDB target.

§Coverage of the TM-ALIGN structural alignment of the homopolypeptide model to the PDB target structure.

¶TM-score of the TM-ALIGN alignment of the homopolypeptide model to the PDB target structure.

||rmsd (Å) of the structurally aligned region of the homopolypeptide model compared with the target PDB structure.

**TM-score of the first model built by TASSER (ranked by cluster density), starting from the homopolypeptide template.

††rmsd (Å) to native of the TASSER model over all residues of the target protein.

‡‡rmsd (Å) to native of the TASSER model over the same structurally aligned region as the compact homopolypeptide template.



Table 2. H-bond parameters calculated from 100 high-resolution PDB structures

Parameter   Value
bba         0.45
cca         0.1
ra          6.03 Å
ppa         0
qqa         0
cca0        0.4
bba0        0.815
bra0        1.56 Å
bbb         0.25
ccb         0.4
rb          6.15 Å
ppb         0.35
qqb         0.35