Fig. 4.
Very strong positive selection for sequons with Thr and Ser in gp120 of HIV and other retroviruses results from three factors: an increased conditional probability that Asn, Thr, and Ser will be present in sequons rather than elsewhere in gp120, amino acid composition bias, and changes in AT content. (A) Density of sequons per 500 aa in gp120 versus capsid proteins and enzymes (negative controls) for various retroviruses, which are abbreviated and color-coded as follows: HIV-1 strains (A–O) are marked with lowercase blue letters, HIV-2 strains with uppercase red letters, and other lentiviruses with green numbers. Very strong selection for sequons in gp120 of all of the retroviruses dwarfs the relatively modest selection for sequons in host-secreted and membrane proteins (marked with a red plus sign). (B) An important mechanism for positive selection for sequons in gp120 of retroviruses, an increased conditional probability that Asn, Thr, and Ser will be present in sequons rather than elsewhere in gp120, is shown by plotting the counted density of sequons in gp120 versus the density calculated (expected value) from the Asn, Thr, Ser, and Pro content of gp120. (C) Amino acid composition bias, which increases the number of sequons in gp120, is shown by plotting the calculated sequon density of gp120 versus that of capsid proteins and enzymes. Although Asn, Ser, and Thr are relatively increased in gp120 versus retroviral capsid proteins and enzymes, Pro is decreased (Fig. S4). (D) AT content of the gp120 coding sequence versus the rest of the coding sequence of each retrovirus shows that there is moderate positive selection for AT in the gp120s of all retroviruses examined. Fig. S5 shows that there is no change with time in the sequon densities of gp120 of HIV strains A1, B, C, and D.
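
The comparison in panel B, counted versus calculated sequon density, can be illustrated with a minimal sketch. The caption does not give the formula used for the expected value, so the code below makes two assumptions: a sequon is taken to be Asn-X-Ser/Thr with X not Pro, and residues are treated as occurring independently at their overall frequencies in the protein. The function names are hypothetical and chosen only for this illustration.

```python
import re


def count_sequons(seq: str) -> int:
    """Count N-X-[S/T] motifs (X != P), including overlapping ones."""
    # A zero-width lookahead lets the regex engine report overlapping matches.
    return len(re.findall(r"(?=N[^P][ST])", seq.upper()))


def observed_density(seq: str, per: int = 500) -> float:
    """Counted sequons per `per` residues."""
    if len(seq) < 3:
        return 0.0
    return count_sequons(seq) / len(seq) * per


def expected_density(seq: str, per: int = 500) -> float:
    """Sequons per `per` residues expected from composition alone.

    Assumption (not stated in the caption): positions are independent, so
    P(sequon at a site) = f_N * (1 - f_P) * (f_S + f_T).
    """
    seq = seq.upper()
    n = len(seq)
    if n < 3:
        return 0.0
    f_n = seq.count("N") / n
    f_p = seq.count("P") / n
    f_st = (seq.count("S") + seq.count("T")) / n
    p_site = f_n * (1.0 - f_p) * f_st
    windows = n - 2  # number of three-residue windows in the sequence
    return p_site * windows / n * per
```

A ratio of `observed_density(seq)` to `expected_density(seq)` above 1 would indicate that Asn, Ser, and Thr occur in sequons more often than their overall frequencies predict, which is the effect panel B is described as showing.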