2008 Dec 16;37(3):815–824. doi: 10.1093/nar/gkn981

Figure 5.

Selecting an optimal proportion of pseudocounts using the MDL principle. For n = 500 and the observed frequencies f listed in Table 3, we apply pseudocounts as implied by the BLOSUM-62 substitution matrix. We use Equation (5) to compute the change in the description length of the data, when compared to the description length of the data at Inline graphic, for α between 0 and 0.1. The dot-dashed curve (in red) shows the increase in the description length of the data. The dashed curve (in blue) shows the decrease in the description length of the model. The total decrease in the description length, shown by the solid curve (in black), is maximized at Inline graphic, which corresponds to 19.4 pseudocounts.
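
The selection step described in the caption can be sketched as a grid search: evaluate the net decrease in description length (model decrease minus data increase) over α in [0, 0.1] and keep the maximizer. The sketch below is illustrative only. Equation (5) is not reproduced here, so `toy_data_increase` and `toy_model_decrease` are hypothetical stand-in curves with invented constants, and `best_alpha` is an assumed helper name; none of these reproduce the values plotted in Figure 5.

```python
import math


def toy_data_increase(alpha):
    # Hypothetical stand-in for the increase in the description length
    # of the data (the red dot-dashed curve); NOT Equation (5).
    return 400.0 * alpha ** 2


def toy_model_decrease(alpha):
    # Hypothetical stand-in for the decrease in the description length
    # of the model (the blue dashed curve); NOT Equation (5).
    return 6.0 * math.sqrt(alpha)


def best_alpha(data_increase, model_decrease, lo=0.0, hi=0.1, steps=1000):
    """Grid-search alpha in [lo, hi] for the largest net decrease.

    Net decrease = model_decrease(alpha) - data_increase(alpha),
    which plays the role of the solid black curve in the figure.
    Returns (alpha, net_decrease) at the best grid point.
    """
    best_a = lo
    best_gain = model_decrease(lo) - data_increase(lo)
    for i in range(1, steps + 1):
        a = lo + (hi - lo) * i / steps
        gain = model_decrease(a) - data_increase(a)
        if gain > best_gain:
            best_a, best_gain = a, gain
    return best_a, best_gain


if __name__ == "__main__":
    alpha_star, gain_star = best_alpha(toy_data_increase, toy_model_decrease)
    print(f"alpha* = {alpha_star:.4f}, net decrease = {gain_star:.3f}")
```

With the paper's actual expressions substituted for the two toy curves, the same search would locate the maximum of the solid curve. A grid is used here rather than a derivative-based optimizer because the range of α is small and one-dimensional.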
