Table 4.
Summary of pangene clusters obtained for datasets ACK2 and rice3 and the corresponding orthogroups in Ensembl Plants. Core clusters contain genes from all analyzed genomes; in rice, shell clusters contain genes from two species. BUSCO completeness percentages for core sets are shown in square brackets. Clusters with multiple copies have several genes from the same species. gDNA segments are shell clusters that bring together a gene model and a matching genomic segment from the underlying WGA. Column ‘Match Compara’ shows the number of pangene clusters that contain the same genes as the corresponding Compara orthogroups. The last column shows the number of pangene clusters whose sequences share an InterPro domain (the number in square brackets is for core clusters only).
Cluster set | Dataset | Core clusters [%BUSCO] | Multiple copies | Shell clusters | gDNA segments | Match Compara | Share InterPro domains
---|---|---|---|---|---|---|---
Compara orthogroups | ACK2 | 20,192 [90.6] | 161 | | | | [18,259]
minimap2 clusters | ACK2 | 20,647 [94.1] | 731 | | | 18,245 | [18,792]
GSAlign clusters | ACK2 | 16,476 [74.9] | 454 | | | 14,181 | [14,817]
Compara orthogroups | rice3 | 13,020 [65.6] | 219 | 6386 | | | 16,766 [11,571]
minimap2 clusters | rice3 | 22,880 [85.2] | 3360 | 7825 | 6521 | 18,281 | 23,062 [19,239]
GSAlign clusters | rice3 | 20,399 [84.6] | 2885 | 9730 | 6103 | 17,103 | 22,834 [17,135]
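
The cluster categories defined in the caption can be illustrated with a minimal Python sketch. It is not the pipeline used to build the table: the function name `classify_cluster`, the genome and gene identifiers, and the `singleton` label for clusters restricted to one genome are all assumptions made for illustration. It only shows how occupancy (number of genomes represented) separates core from shell clusters, and how multiple copies are detected, for a hypothetical three-genome dataset.

```python
from collections import Counter


def classify_cluster(genes, n_genomes):
    """Classify one cluster from a list of (genome, gene_id) pairs.

    Returns (label, has_multiple_copies). Labels follow the caption:
    'core' when every genome is represented, 'shell' when more than one
    but not all genomes are represented. 'singleton' is an assumed label
    for clusters limited to one genome; it does not appear in the table.
    """
    copies_per_genome = Counter(genome for genome, _ in genes)
    occupancy = len(copies_per_genome)
    if occupancy == n_genomes:
        label = "core"
    elif occupancy > 1:
        label = "shell"
    else:
        label = "singleton"
    # A cluster has multiple copies if any genome contributes >1 gene
    has_multiple_copies = any(n > 1 for n in copies_per_genome.values())
    return label, has_multiple_copies


# Hypothetical toy clusters (identifiers are made up)
clusters = {
    "c1": [("genomeA", "g1"), ("genomeB", "g2"), ("genomeC", "g3")],
    "c2": [("genomeA", "g4"), ("genomeA", "g5"),
           ("genomeB", "g6"), ("genomeC", "g7")],
    "c3": [("genomeA", "g8"), ("genomeC", "g9")],
}

summary = Counter()
for name, genes in clusters.items():
    label, multi = classify_cluster(genes, n_genomes=3)
    summary[label] += 1
    if multi:
        summary["multiple copies"] += 1

print(dict(summary))
```

In this toy input, `c1` and `c2` are core clusters, `c2` also counts as a cluster with multiple copies, and `c3` is a shell cluster because it holds genes from two of the three genomes.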