Figure 8.
For a larger multiple sequence alignment, the mean-field approximation to the log posterior (bottom row) converges much more quickly than the pair marginal estimate, even though the indel model includes neighbour-dependent terms. This is because column marginals can be estimated more reliably than pair marginals, and because allowing crossovers in the DAG yields a higher effective sample size (see Figure 5). Results are shown for the simulated dataset described later in the main text, using the TKF92 indel model [17]. In this case the true posterior probability cannot be computed analytically, but the log likelihood (conditional on specific values of the other unknown parameters) is known. Since the log likelihood is expected to be linearly related to the log posterior, convergence can be gauged approximately by assessing the fit to the relationship y = x + k (overlaid in red). Here k, the approximate normalising constant, is chosen to match the distribution to which the mean-field approximation converges; in this case k = −9420.
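
As a rough illustration of the convergence check described in this caption, the fit to y = x + k can be summarised numerically. The sketch below is not the procedure used to produce the figure. It assumes that x is the known log likelihood and y the estimated log posterior for the same sampled alignments, fixes the slope at 1, takes k as the mean offset, and reports the root-mean-square residual. The function name `fit_offset` and the input values are hypothetical.

```python
import math


def fit_offset(log_lik, log_post_est):
    """Fit y = x + k with the slope fixed at 1.

    log_lik      -- x values (assumed: known log likelihoods)
    log_post_est -- y values (assumed: estimated log posteriors)
    Returns (k, rms), where k is the mean of y - x and rms is the
    root-mean-square residual around the line y = x + k.
    """
    if len(log_lik) != len(log_post_est) or not log_lik:
        raise ValueError("inputs must be non-empty and of equal length")
    # With the slope fixed at 1, the least-squares offset is the mean of y - x.
    diffs = [y - x for x, y in zip(log_lik, log_post_est)]
    k = sum(diffs) / len(diffs)
    # A small residual suggests the estimates lie close to the line.
    rms = math.sqrt(sum((d - k) ** 2 for d in diffs) / len(diffs))
    return k, rms


# Toy values for illustration only, not data from the paper.
k, rms = fit_offset([-3.0, -5.0, -8.0], [-13.0, -15.0, -18.0])
print(k, rms)
```

An RMS residual that shrinks as sampling proceeds would be consistent with the estimate settling onto the line. Note that in the figure k is chosen from the distribution to which the mean-field approximation converges rather than refitted at each stage, so the offset here is only a stand-in for that choice.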