2020 Dec 4;11:6217. doi: 10.1038/s41467-020-19940-1

Fig. 4. A simple genome evolution model can generate GCNs that capture key topological features of the real GCN.

a Schematic diagram of the genome evolution model. At each time step t, the genome of a species i (shown in red), chosen at random with probability proportional to k_i^h, is updated by one of three events: gene loss, gene gain, or horizontal gene transfer (HGT), with corresponding rates q_gl, q_gg, and q_HGT. The parameter h ≥ 0 represents the selection pressure, and h = 0 corresponds to the neutral model. The three rates satisfy q_gl + q_gg + q_HGT = 1. During HGT, a gene a is randomly selected from a randomly chosen donor species and transferred to the genome of species i. During gene loss, a randomly selected gene a in the genome of species i is removed. During gene gain, a new gene is added to the genome of species i. The initial GCN is a random bipartite graph consisting of 500 species and 200 genes with connection probability 0.8. The total number of evolution time steps is 5 × 10^5. b–e The incidence matrix of the final GCN (with nestedness value NODF = 0.703) and its functional distance, species degree, and gene degree distributions for h = 2, q_HGT = 0.795, q_gg = 0.005, and q_gl = 0.2 (see Supplementary Figs. 12, 13 for these topological features with other model parameters). f The nestedness (quantified by NODF) of the final GCN as a function of the HGT rate for selection pressures h = 0, 2, 4. g The Kullback–Leibler (KL) divergence between the normalized gene degree distribution P(k̃_gene) of the real GCN and that of the simulated GCNs, calculated for the selection pressures and HGT rates shown in f. Here the normalized gene degree is k̃_gene = k_gene / k_gene^max.
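
The update rule in panel a can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the caption does not state: k_i is taken to be the number of genes currently in the genome of species i, the donor is drawn uniformly from all species, a gained gene is a brand-new gene not present in any other genome, and events that cannot change the genome (loss from an empty genome, transfer of a gene the recipient already has) are counted as steps with no effect. The function name simulate_gcn and its parameter names are hypothetical.

```python
import numpy as np

def simulate_gcn(n_species=500, n_genes=200, p_connect=0.8, steps=500_000,
                 h=2.0, q_gl=0.2, q_gg=0.005, q_hgt=0.795, seed=0):
    """Return a boolean species-by-gene incidence matrix after `steps` events."""
    rng = np.random.default_rng(seed)

    # Initial GCN: random bipartite graph with connection probability p_connect.
    initial = rng.random((n_species, n_genes)) < p_connect
    genomes = [set(np.flatnonzero(row).tolist()) for row in initial]
    degree = initial.sum(axis=1).astype(float)  # k_i (assumed: genome size)
    next_gene = n_genes                         # label of the next new gene

    # Pre-draw the event type of every step: 0 = loss, 1 = gain, 2 = HGT.
    events = rng.choice(3, size=steps, p=[q_gl, q_gg, q_hgt])

    def pick(genome):
        """Draw one gene uniformly at random from a non-empty genome."""
        pool = tuple(genome)
        return pool[rng.integers(len(pool))]

    for event in events:
        # Choose species i with probability proportional to k_i^h.
        weights = degree ** h
        total = weights.sum()
        if total == 0:  # every genome is empty and h > 0: nothing to select
            break
        i = rng.choice(n_species, p=weights / total)

        if event == 0 and genomes[i]:      # gene loss
            genomes[i].remove(pick(genomes[i]))
        elif event == 1:                   # gene gain (assumed: brand-new gene)
            genomes[i].add(next_gene)
            next_gene += 1
        elif event == 2:                   # HGT from a uniformly chosen donor
            donor = rng.integers(n_species)
            if genomes[donor]:
                genomes[i].add(pick(genomes[donor]))
        degree[i] = len(genomes[i])

    # Assemble the incidence matrix; columns may be empty if a gene was lost
    # from every genome.
    matrix = np.zeros((n_species, next_gene), dtype=bool)
    for i, genome in enumerate(genomes):
        matrix[i, np.array(sorted(genome), dtype=int)] = True
    return matrix
```

Calling simulate_gcn() with its defaults uses the parameter values quoted for panels b–e; a smaller call such as simulate_gcn(n_species=50, n_genes=20, steps=5000) is quicker for experimentation. Row sums of the returned matrix give the species degrees and column sums give the gene degrees.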