Fig. 2.
Correlation between the mol% G+C content of mobile dfrA (red circles) and sul (green squares) genes and that of their host genome. Large open circles/squares denote representatives of clusters of redundant sequences (identity >90 %), and dfrA genes from clade 1 and clade 2 in Fig. 1 are marked with an additional corona. A 0.75 % jitter has been applied to both x- and y-axis values for visualization purposes. The red line shows the linear regression for representative dfrA gene values. The Pearson R² coefficient is superimposed. Vertical background bars in (a) designate DfrA sequences harboured by mobile genetic elements (MGEs) identified in E. coli and K. pneumoniae isolates, which are heavily overrepresented in the dataset. Sequences from clusters with more than 100 sequences (represented by dfrA12, dfrA5 and dfrA1) are shown with specific markers and highlighted by horizontal background bars. The number of MGEs identified as harbouring dfrA genes, before and after filtering by DfrA sequence identity (>90 %), is shown in (b).