Table 1.
| Category | Viral feature | Data type | Reason for inclusion |
|---|---|---|---|
| Host-driven | Mean phylogenetic distance between hosts | Continuous | Capturing phylogenetic and ecological distances between each virus’ known hosts and each mammal in our study. |
| | Mean ecological distance between hosts | Continuous | Capturing phylogenetic and ecological distances between each virus’ known hosts and each mammal in our study. |
| | Maximum phylogenetic breadth<sup>4</sup> | Continuous | Greater phylogenetic breadth indicates more generalist potential of the virus. |
| Virus genome & capsid | RNA | Binary | RNA viruses mutate/adapt faster<sup>65</sup> and are generally deactivated quickly when exposed to the environment. |
| | Retro-transcribing | Binary | Retroviruses are generally very conserved<sup>66</sup>, and have to enter the nucleus<sup>67</sup> and insert into the genome. These additional steps may require specificity and limit host range. |
| | Negative sense/positive sense | Binary | Sense affects the replication cycle and the range of host enzymes needed. |
| | Circular/linear | Binary | Whether the genome is circular or linear affects how host enzymes are enlisted for replication and translation<sup>68</sup>. |
| | Monopartite/segmented | Binary | Segmented viruses can undergo recombination if two strains of the same virus infect a cell<sup>69</sup>. This can lead to host range changes of segments of the genome. |
| | Enveloped | Binary | Envelopes are derived from the host cell membrane, so they can affect host-specific immune activation. Enveloped viruses deactivate rapidly in the external environment (often requiring direct transfer). The envelope will change upon infection of a new host<sup>70</sup>. |
| | GC-content | Continuous | High GC content usually leads to higher thermo-stability of the genome<sup>71</sup>. |
| | Genome size | Continuous | Genome size is indicative of many aspects of the virus such as complexity, DNA/RNA, and replication type. Replication site is linked to RNA/DNA genome – if a virus has a DNA stage it must replicate in the nucleus and overcome additional cell barriers. |
| Virus replication, release, and cell entry | Cytoplasm | Binary | |
| | Release | Categorical | Affects rate of virus production, cell life-span, and means of presentation to the immune system<sup>72</sup>. |
| | Cell entry | Categorical | Availability of receptors influences potential host range. |
| Transmission routes | 8 main transmission routes | Binary for each route | Route(s) of transmission are affected by the structure/stability of the virus and the nature of interaction between potential hosts. |
We trained a suite of models for each mammalian species with two or more known viruses (n = 699). Each model comprised the features described below (response variable = 1 if the virus is known to associate with the focal mammalian species, 0 otherwise – the Methods section provides further details). Full descriptions of these features, their sources, and their justifications are listed in Supplementary Note 2.
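
The response variable and the "two or more known viruses" inclusion rule can be illustrated with a minimal sketch. It is not the pipeline used in the study: the association pairs, the virus and mammal names, and the helper functions `known_viruses_by_mammal`, `eligible_mammals`, and `response_vector` are all hypothetical and introduced here for illustration only. The sketch builds one 0/1 response vector per focal mammal and leaves out the feature matrix and the model fitting.

```python
from collections import defaultdict

# Hypothetical toy (virus, mammal) associations; NOT data from the study.
associations = [
    ("virus_A", "mammal_1"),
    ("virus_B", "mammal_1"),
    ("virus_B", "mammal_2"),
    ("virus_C", "mammal_3"),
    ("virus_A", "mammal_3"),
]


def known_viruses_by_mammal(pairs):
    """Group the known viruses of each mammal into a set."""
    by_mammal = defaultdict(set)
    for virus, mammal in pairs:
        by_mammal[mammal].add(virus)
    return dict(by_mammal)


def eligible_mammals(by_mammal, min_viruses=2):
    """Keep only mammals with at least `min_viruses` known viruses."""
    return sorted(m for m, viruses in by_mammal.items() if len(viruses) >= min_viruses)


def response_vector(all_viruses, by_mammal, focal):
    """Return 1 for viruses known to associate with the focal mammal, 0 otherwise."""
    known = by_mammal.get(focal, set())
    return {virus: int(virus in known) for virus in all_viruses}


by_mammal = known_viruses_by_mammal(associations)
all_viruses = sorted({virus for virus, _ in associations})
focal_species = eligible_mammals(by_mammal)

# One response vector per focal mammal; each would be paired with the
# per-virus feature matrix when fitting that mammal's models.
responses = {m: response_vector(all_viruses, by_mammal, m) for m in focal_species}

for mammal, y in responses.items():
    print(mammal, y)
```

In this toy example, `mammal_2` has only one known virus and so gets no model, while every virus in the data set receives a label for each remaining focal mammal.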