Length = median length of the nucleotide (nt) sequences. HXB2 coords = reference nucleotide coordinates in the HXB2 genome (GenBank accession K03455). Year type: sequences are annotated with the year of sample collection or, in some cases, the date of HIV diagnosis. N = total sample size, including both old and new sequences. Incid = number of sequences in the ‘incident’ subset (sequences from the most recent year). Subtype classifications were taken from the original data sources when available, or otherwise generated de novo with SCUEAL [60].