Fig 1. Evolution of Catarrhini-infecting CMVs.
(A) A maximum-likelihood tree of the full-length amino acid sequence of alkaline nuclease (encoded by the core gene UL98) illustrates the phylogenetic relationships among primate CMVs (CCMV, Chimpanzee cytomegalovirus; DrCMV, Drill monkey cytomegalovirus; BaCMV, Chacma baboon cytomegalovirus; SCMV, Simian cytomegalovirus; CyCMV, Cynomolgus macaque cytomegalovirus; RhCMV, Rhesus macaque cytomegalovirus; SaHV4, Squirrel monkey cytomegalovirus; AoHV1, Owl monkey cytomegalovirus). Murine CMV (murid herpesvirus 1) was used as the outgroup, and the tree was constructed using RAxML (version 8.2.12) [19]. Asterisks denote viruses that were included in the analysis of selective patterns of Catarrhini-infecting CMVs. (B) Whole-genome alignment of four representative primate CMVs obtained with progressiveMauve. Each genome is laid out in a horizontal track, with annotated coding regions shown as boxes (white: core genes; gray: non-core genes) and repetitive elements shown as orange boxes. The colored similarity plot generated by progressiveMauve is also shown: each colored block delimits a genome region that aligns to part of another genome (presumably homologous and free from internal rearrangements) and thus represents a locally collinear block. Within each block, a similarity profile is plotted with a height proportional to the average level of conservation in that region. White areas correspond to regions that could not be aligned. A similarity profile plotted below the track indicates an alignment to the reverse strand of the genome. The locations of genes belonging to the US22, US12, RL11, and US6 families are shown. (C) Phylogenetic relationships for large gene families. The protein sequences of family homologs were retrieved as described in the Materials and Methods. Phylogenetic trees were constructed using RAxML with 1000 bootstrap replicates (bootstrap values are reported at nodes). Orthologous gene groups, shown in red on the tree and highlighted by gray shading, were inferred on the basis of the tree topology and of bootstrap values > 90. Magenta asterisks denote genes that are frequently deleted or mutated in clinical isolates [16]. (D) Analysis of selective patterns. In the upper panels, dN/dS is compared among genes showing different levels of sequence conservation and distinct growth phenotypes. Growth phenotypes in human fibroblasts were taken from a previous study [11] that merged data from two systematic analyses of gene disruption [18, 20]. Statistical significance was assessed by Kruskal-Wallis tests followed by Nemenyi post hoc tests (results are reported in the figure). In the lower panels, genes are grouped by function. Functional categories were derived from a previous annotation effort that combined multiple information sources [11]. p values were derived from Wilcoxon rank-sum tests with FDR correction.
