2020 May 18;37(10):2838–2856. doi: 10.1093/molbev/msaa122

Fig. 4.

Benchmarking three different algorithms for BGC detection. (a) Proportional Venn diagram of distinct and overlapping BGC genes of interest detected by SMURF, antiSMASH, and CO-OCCUR. SMURF and antiSMASH use pHMMs to identify clustered genes of interest, whereas CO-OCCUR uses linkage-based criteria (see Materials and Methods). For each algorithm or combination of algorithms, the number of clustered genes detected (unparenthesized) and the number of those clustered genes in the fuNOG category secondary metabolism biosynthesis, transport, and catabolism (parenthesized) are indicated. (b) Complementary recovery of the cercosporin BGC using antiSMASH and CO-OCCUR. Shading of genes in the Cercospora zeae-maydis cercosporin BGC (MIBiG ID BGC0001541; recovered clusterID Cerzm1_BGC0001541_h92 in supplementary table SG, Supplementary Material online) indicates genes identified by antiSMASH (blue), CO-OCCUR (yellow), or both algorithms (green). Gene names are as in de Jonge et al. (2018), and genes required for cercosporin biosynthesis (Chen et al. 2007; Newman and Townsend 2016; de Jonge et al. 2018) are indicated with an asterisk. (c) Gene recovery and discovery in clusters homologous to known BGCs. Scatterplots show the percent of genes recovered (top) or discovered (bottom) by antiSMASH versus CO-OCCUR at each locus homologous to a MIBiG BGC (search criteria: minimum three-gene cutoff; minimum of 75% of genes in the locus similar to MIBiG BGC genes). Percent recovery is defined as the number of BlastP-identified genes that fall within an algorithm-identified cluster, divided by the size of the BlastP-identified BGC, multiplied by 100. Percent discovery is defined as the number of genes identified by the cluster algorithm but not identified in the BlastP search, divided by the size of the BlastP-identified BGC, multiplied by 100. The dotted reference line marks y = x.
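The percent recovery and percent discovery definitions in panel (c) can be read as simple set operations on gene identifiers. The following is a minimal sketch of that arithmetic only, not the authors' pipeline; the function names, the gene IDs, and the representation of each locus as a set of strings are assumptions made for illustration.

```python
def percent_recovery(blastp_genes, algorithm_genes):
    """Share of the BlastP-identified BGC that the algorithm's cluster also contains."""
    blastp = set(blastp_genes)
    if not blastp:
        raise ValueError("the BlastP-identified BGC must contain at least one gene")
    recovered = blastp & set(algorithm_genes)
    return 100.0 * len(recovered) / len(blastp)


def percent_discovery(blastp_genes, algorithm_genes):
    """Genes found only by the algorithm, relative to the size of the BlastP-identified BGC."""
    blastp = set(blastp_genes)
    if not blastp:
        raise ValueError("the BlastP-identified BGC must contain at least one gene")
    discovered = set(algorithm_genes) - blastp
    return 100.0 * len(discovered) / len(blastp)


# Hypothetical locus: four genes matched by BlastP, five genes in the algorithm's cluster.
blastp_locus = {"g1", "g2", "g3", "g4"}
algorithm_cluster = {"g2", "g3", "g4", "g5", "g6"}

recovery = percent_recovery(blastp_locus, algorithm_cluster)    # 3 of 4 genes -> 75.0
discovery = percent_discovery(blastp_locus, algorithm_cluster)  # 2 extra genes / 4 -> 50.0
```

Because both quantities share the same denominator (the size of the BlastP-identified BGC), percent discovery can exceed 100 when an algorithm adds more new genes than the BlastP search found, whereas percent recovery is bounded by 100.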