2017 Mar 20;34(6):1291–1306. doi: 10.1093/molbev/msx095

Fig. 1.

The Potts model is predictive of higher-order sequence statistics. For each subsequence length from 2 to 14, 500 combinations of positions are chosen at random among the 36 PI-associated positions, and the frequency of every subsequence observed at those positions is determined by counting its occurrences in the MSA. (A) Pearson R² between the 200 most probable observed subsequence frequencies (marginals) and the corresponding predictions of the Potts (blue) and independent (gray) models, as a function of subsequence length. The dashed line marks perfect correlation, R² = 1. (B) Second-order and (C) 14th-order observed marginals compared with the predictions of both models. Panels (B) and (C) show the observed frequencies at the 500 randomly chosen combinations of 2 and 14 positions, respectively, among the 36 PI-associated sites, with ∼2500 and 5600 subsequence frequencies >0.01 visible, respectively.