2021 Feb 25;11:4586. doi: 10.1038/s41598-021-84070-7

Figure 1.

Illustration of EBV sequence variation. (A) The EBV genome is about 170 kbp long and contains 83 genes, for a total of 4392 amino acid residues. As an example, we focus on the BNRF1 gene and on two amino acid changes: Glu → Ala and Ser → Leu. For each sample, we know the genomic variants across the whole genome, as illustrated by the colored nucleotides. Using this nucleotide information and a reference genome, we can compute the amino acid changes. (B) We compare each individual (ID) to the reference and encode each amino acid as 1 if that individual has a non-synonymous change and 0 if not. This process yields a binary matrix, with individuals as rows and amino acids as columns. In our example, individual 2 has the amino acid change Glu → Ala and individual 3 has the amino acid change Ser → Leu. (C) To transform the data into outcomes for the G2G analysis, we can either use the amino acid matrix as it is (EBV amino acids dataset) or remove all amino acid columns whose variant appears in more than 1 individual and then pool the remaining amino acids per gene (1 = variant present; EBV genes dataset).
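
The encoding in panels (B) and (C) can be sketched in a few lines of Python. This is a minimal illustration on toy data, not the pipeline used in the study: the residue positions, the second gene (BZLF1) with its Ala → Val change, and the function names `encode_amino_acids` and `pool_rare_variants_per_gene` are all assumptions made for the example. The threshold `max_carriers=1` mirrors the caption's rule of dropping columns whose variant appears in more than 1 individual.

```python
# Toy reference: residue at each (gene, position) site. Positions are hypothetical.
reference = {
    ("BNRF1", 1): "Glu",
    ("BNRF1", 2): "Ser",
    ("BZLF1", 1): "Ala",  # assumed extra site, added to show the filtering step
}

# Toy per-individual residues at the same sites (illustrative values only).
samples = {
    "ID1": {("BNRF1", 1): "Glu", ("BNRF1", 2): "Ser", ("BZLF1", 1): "Ala"},
    "ID2": {("BNRF1", 1): "Ala", ("BNRF1", 2): "Ser", ("BZLF1", 1): "Val"},
    "ID3": {("BNRF1", 1): "Glu", ("BNRF1", 2): "Leu", ("BZLF1", 1): "Val"},
}


def encode_amino_acids(reference, samples):
    """Panel (B): 1 if the residue differs from the reference, else 0."""
    sites = sorted(reference)
    matrix = {
        ind: [int(residues[site] != reference[site]) for site in sites]
        for ind, residues in samples.items()
    }
    return sites, matrix


def pool_rare_variants_per_gene(sites, matrix, max_carriers=1):
    """Panel (C): drop columns carried by more than `max_carriers`
    individuals, then set a gene to 1 if any remaining column is 1."""
    carriers = [sum(row[j] for row in matrix.values()) for j in range(len(sites))]
    kept = [j for j, n in enumerate(carriers) if n <= max_carriers]
    genes = sorted({gene for gene, _ in sites})
    pooled = {
        ind: [
            int(any(row[j] for j in kept if sites[j][0] == gene))
            for gene in genes
        ]
        for ind, row in matrix.items()
    }
    return genes, pooled


sites, aa_matrix = encode_amino_acids(reference, samples)
genes, gene_matrix = pool_rare_variants_per_gene(sites, aa_matrix)

for ind in samples:
    print(ind, aa_matrix[ind], gene_matrix[ind])
```

In this toy example, the BZLF1 change is carried by two individuals, so its column is removed before pooling and only the two BNRF1 changes contribute to the gene-level matrix.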