2022 Jan 31;23:39. doi: 10.1186/s13059-022-02610-4

Fig. 2.

Schematic representation of AGAMEMNON’s quantification engine. Each black line indicates a microbial genome. In this example, most reads align unambiguously to a single genome (shown as short green lines), while 6 reads map to multiple genomes (rounded red, turquoise, purple, orange, gray, and yellow boxes). Each EM step consists of K iterations (default K = 10). In the first iteration of the first EM step, each multi-mapping read is split equally among all the genomes it aligns to. For example, the turquoise read, which maps to three genomes (G2, G3, and G4), is assigned a base coverage/probability of 0.33 in each (shown by the same opacity of color in EM Step, first iteration). During EM, read assignments are resolved iteratively by reassigning the reads according to the abundance of the genomes/strains observed in the previous iteration. In each iteration, the quantification of each strain, estimated from the current read assignment, is used as the prior for multi-mapping read assignment in the subsequent iteration. After each EM step (i.e., K iterations), a set-cover step is applied to resolve special multi-mapping cases that EM cannot resolve, called “multi-mapping islands.” These are groups of highly similar, low-abundance strains for which all reads are multi-mapped, so EM has no basis for prioritizing one strain over another. EM therefore reports the whole group of strains with small abundances even though only a few of them exist in the sample of interest, which introduces false positives. The EM step and the set-cover step alternate until set-cover can no longer remove any genome, at which point the EM process iterates until termination. In the last step of the procedure, all genomes with abundance values lower than a predefined cutoff are removed. In the figure’s example, the process starts with six genomes (G1–G6).
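The read reassignment described in the caption can be illustrated with a minimal sketch. This is not AGAMEMNON’s implementation: the names `em_step`, `read_hits`, and `n_iter` are hypothetical, and abundance is assumed here to be a plain sum of fractional read counts, ignoring genome length, alignment scores, and any convergence test, none of which the caption specifies.

```python
def em_step(read_hits, n_iter=10):
    """Run one EM step of n_iter iterations (hypothetical sketch).

    read_hits: list where each entry is the list of genome ids a read aligns to.
    Returns a dict mapping genome id -> estimated abundance in (fractional) reads.
    """
    genomes = sorted({g for hits in read_hits for g in hits})

    # First iteration: every read is split equally among its genomes,
    # e.g. a read hitting G2, G3 and G4 contributes 1/3 to each.
    abundance = {g: 0.0 for g in genomes}
    for hits in read_hits:
        for g in hits:
            abundance[g] += 1.0 / len(hits)

    # Remaining iterations: the abundances from the previous iteration act
    # as the prior used to split each read among the genomes it hits.
    for _ in range(n_iter - 1):
        updated = {g: 0.0 for g in genomes}
        for hits in read_hits:
            total = sum(abundance[g] for g in hits)
            if total == 0.0:
                continue  # guard against numerical underflow
            for g in hits:
                updated[g] += abundance[g] / total
        abundance = updated
    return abundance
```

With three reads unique to G1, one read unique to G2, and one read hitting both, the shared read starts at 0.5/0.5 and then drifts toward G1 in later iterations, because G1 has more uniquely supported reads.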
Throughout the iterations of the first EM step, the read probabilities change, but all six genomes remain in the quantification process. When the first EM step is over, the model continues with the first set-cover step. In the set-cover step, only the genomes in which all reads are multi-mapped are taken into consideration (i.e., G4, G5, and G6). The set-cover process keeps only genome G4 and removes genomes G5 and G6, aiming for the minimum number of strains that explains all multi-mapping reads. In the second EM step (not shown in the figure), only genomes G1–G4 participate in the process. In this particular example, the set-cover step is never called again, because no multi-mapping islands are left in the reference. Thus, the EM process iterates until termination. Finally, after the whole EM process is done, the heuristic removal step removes the genomes whose abundance is equal to or less than 2 reads; in this example, genome G1 is therefore also removed before the final quantification results are reported.
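The set-cover and cutoff steps in this example can be sketched as follows. This is an illustrative greedy approximation under stated assumptions, not AGAMEMNON’s actual procedure: the function names, the choice to cover only reads that align exclusively to island genomes, the alphabetical tie-breaking, and the read-to-genome mapping in `example_hits` are all invented for illustration (the caption does not list which read maps where, apart from the turquoise read).

```python
def find_island_genomes(read_hits):
    """Return genomes that have no uniquely mapped read (island candidates)."""
    all_genomes = {g for hits in read_hits.values() for g in hits}
    uniquely_supported = {
        next(iter(hits)) for hits in read_hits.values() if len(hits) == 1
    }
    return all_genomes - uniquely_supported


def greedy_set_cover(read_hits, candidates):
    """Greedily pick few candidate genomes that explain the island reads.

    Assumption: only reads aligning exclusively to candidate genomes must be
    covered; reads that also hit a uniquely supported genome are left to EM.
    """
    to_explain = {r for r, hits in read_hits.items() if hits and hits <= candidates}
    kept = set()
    while to_explain:
        # Pick the candidate covering the most still-unexplained reads;
        # sorting makes tie-breaking deterministic.
        best = max(
            sorted(candidates - kept),
            key=lambda g: sum(1 for r in to_explain if g in read_hits[r]),
        )
        kept.add(best)
        to_explain = {r for r in to_explain if best not in read_hits[r]}
    return kept


def apply_cutoff(abundance, min_reads=2.0):
    """Drop genomes whose abundance is equal to or less than min_reads."""
    return {g: a for g, a in abundance.items() if a > min_reads}


# Made-up read mapping loosely mirroring the figure (read id -> genomes hit).
example_hits = {
    "u1": {"G1"}, "u2": {"G2"}, "u3": {"G2"}, "u4": {"G3"},
    "turquoise": {"G2", "G3", "G4"},
    "red": {"G4", "G5"}, "purple": {"G4", "G6"},
    "orange": {"G4", "G5", "G6"}, "gray": {"G4", "G5"}, "yellow": {"G4", "G6"},
}
island = find_island_genomes(example_hits)
kept = greedy_set_cover(example_hits, island)
removed = island - kept
```

In this made-up mapping, G4 alone covers every read confined to the island, so the greedy pass keeps G4 and drops G5 and G6, matching the outcome described in the caption. Greedy selection does not guarantee the true minimum cover in general; it is used here only because it is short and easy to follow.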