2022 Mar 21;13(2):e03297-21. doi: 10.1128/mbio.03297-21

FIG 2.

Streptomyces AZL proteins are found in diverse uncharacterized biosynthetic gene clusters. (A) Schematic depicting the workflow for identification of HTH_42 homologs in uncharacterized Streptomyces BGCs. Homologs were identified through the presence of the catalytic motif (red text in sequence alignment). The amino acid numbering is in relation to S. sahachiroi AlkZ. The corresponding Streptomyces genomes were input into antiSMASH, from which genomic distances between YQL/AZL and the nearest BGC as well as homologous clusters were extracted. (B) Violin plot showing the distribution of distances of YQL (n = 167) and AZL (n = 154) genes to the nearest BGC (in kbp; ±100 kb). The dotted line at 0 kb represents the 5′ (+)/3′ (−) termini of the nearest BGC. Thick and thin dashed lines within the plot represent the median and upper/lower quartiles, respectively. The chi-square significance (P) value between YQL and AZL data is less than 0.0001. (C) Frequency of various types of BGCs in which AZL genes were found (n = 68 clusters identified). The y axis denotes the natural product/scaffold type to which that cluster is most homologous. Black bars represent known DNA alkylators or DNA-interacting metabolites, and hashed bars represent potential DNA-damaging metabolites. Lowercase letters to the right of the bars correspond to structures shown in panel D. (D) Representative compounds corresponding to BGC types in panel C. Potential reactive sites are colored red. LL-D4919α1 and hedamycin structures are shown in Fig. 3. (E and F) Nearest-neighbor analysis of AZL (E) and YQL (F). (E) Nearest genes to AZL proteins found inside and outside clusters, shown as the ratio of GO terms inside and outside and grouped by function (blue, metabolic; green, cell signaling and function; orange, genome maintenance). (F) Representative example from Streptomyces griseoviridis of nearest-neighbor analysis for YQL proteins. Genes are colored according to function as in panel E (gray, unknown/hypothetical gene). These neighboring genes are invariant across all YQL proteins, with the exception of the outermost genes, in which only one instance of variance was observed.
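
The distance measure plotted in panel B can be illustrated with a minimal sketch. This is not the authors' pipeline. The function name, the coordinate tuples, and the sign convention (negative when the gene lies before the cluster start, positive when it lies after the cluster end, zero when it overlaps the cluster) are assumptions made for illustration and may not match the convention used in the figure. Only the ±100 kb window is taken from the caption.

```python
from typing import List, Optional, Tuple

# (start, end) coordinates in base pairs on one contig, with start <= end.
Interval = Tuple[int, int]


def signed_distance_to_nearest_bgc(
    gene: Interval,
    clusters: List[Interval],
    window_bp: int = 100_000,  # the +/-100 kb window stated in the caption
) -> Optional[int]:
    """Return the signed distance (bp) from a gene to its nearest cluster.

    Hypothetical helper: the sign convention is an assumption, not the
    paper's definition. Returns None if no cluster lies within the window.
    """
    g_start, g_end = gene
    best: Optional[int] = None
    for c_start, c_end in clusters:
        if g_end < c_start:
            d = g_end - c_start   # gene lies before the cluster: negative
        elif g_start > c_end:
            d = g_start - c_end   # gene lies after the cluster: positive
        else:
            d = 0                 # gene overlaps or sits inside the cluster
        if best is None or abs(d) < abs(best):
            best = d
    if best is None or abs(best) > window_bp:
        return None
    return best
```

Applied to every YQL and AZL gene, such a function would yield the two distance distributions that a violin plot summarizes. In practice the cluster boundaries would be taken from the antiSMASH output rather than hard-coded tuples.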