Figure - PMC


2018 Jan 9;114(1):21–31. doi: 10.1016/j.bpj.2017.10.028


© 2017 Biophysical Society.


Subsequence frequency predictions. (A) Predicted subsequence frequencies for a set of seven positions known to be important for kinase activity, compared to the data set frequencies. The Potts distribution (top) models the observed distribution well, in contrast to the independent model (bottom). (B) Average correlation between observed and predicted frequencies of the top 20 subsequences, across large samples of subsequences of varying length, for the Potts model (blue) and the independent model (red, dotted). Circles show the means, and error bars show the range from the first to the third quartile (25–75% of sets of positions). The dashed line (black) estimates the correlation expected from finite sampling alone. It was computed by comparing the subsequence frequencies of a finite synthetic MSA of 8149 sequences to those of a large MSA of $4 \times 10^{6}$ sequences generated from a Potts model fitted to that synthetic MSA. Both the trend and the range of the correlations expected from the sample size (8149) are consistent with the correlation between the observed frequencies and those predicted by the Potts model. To see this figure in color, go online.
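The comparison in panel (B) can be illustrated with a minimal sketch. It rests on several assumptions that the caption does not confirm: "correlation" is taken to be the Pearson coefficient, model frequencies are estimated by counting subsequences in an MSA sampled from the model, and the "top 20" are the 20 most frequent subsequences in the observed data. The function names (`subsequence_frequencies`, `pearson`, `top_k_correlation`) are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from math import sqrt


def subsequence_frequencies(msa, positions):
    """Relative frequency of each subsequence found at the given columns.

    msa: list of equal-length aligned sequences (strings).
    positions: iterable of 0-based column indices defining the subsequence.
    """
    counts = Counter("".join(seq[p] for p in positions) for seq in msa)
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def top_k_correlation(observed_msa, model_msa, positions, k=20):
    """Correlate observed and model frequencies over the k most frequent
    observed subsequences (assumed reading of "top 20" in the caption)."""
    observed = subsequence_frequencies(observed_msa, positions)
    predicted = subsequence_frequencies(model_msa, positions)
    top = sorted(observed, key=observed.get, reverse=True)[:k]
    # Subsequences never sampled from the model get frequency 0.
    return pearson([observed[s] for s in top],
                   [predicted.get(s, 0.0) for s in top])
```

Under the same assumptions, the finite-sampling baseline (dashed line) would correspond to calling `top_k_correlation` with a small synthetic MSA as `observed_msa` and a much larger MSA drawn from the model fitted to it as `model_msa`, then averaging over many position sets of each length.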