Abstract
Background
The classification of HIV-1 strains into subtypes and Circulating Recombinant Forms (CRFs) has helped in tracking the course of the HIV pandemic. In Senegal, which is located at the tip of West Africa, CRF02_AG predominates in the general population and in Female Sex Workers (FSWs). In contrast, 40% of Men having Sex with Men (MSM) in Senegal are infected with subtype C. In this study we analyzed the geographical origins and introduction dates of HIV-1 C in Senegal in order to better understand the evolutionary history of this subtype, which predominates today in the MSM population.
Methodology/Principal Findings
We used a combination of phylogenetic analyses and a Bayesian coalescent-based approach to study the phylogenetic relationships in pol of 56 subtype C isolates from Senegal with 3,025 subtype C strains that were sampled worldwide. Our analysis shows a strongly supported cluster which contains all subtype C strains that circulate among MSM in Senegal. The MSM cluster and other strains from Senegal are widely dispersed among the different subclusters of African HIV-1 C strains, suggesting multiple introductions of subtype C into Senegal from many different southern and east African countries. More detailed analyses show that HIV-1 C strains from MSM are more closely related to those from southern Africa. The MRCA of subtype C in the MSM population in Senegal is estimated to date from the early 80's.
Conclusions/Significance
Our evolutionary reconstructions suggest that multiple subtype C viruses with a common ancestor originating in the early 1970s entered Senegal. There was only one efficient spread in the MSM population, which most likely resulted from a single introduction, underlining the importance of high-risk behavior in the spread of viruses.
Introduction
HIV-1 group M, which predominates in the global HIV/AIDS epidemic, can be further subdivided into subtypes (A–D, F–H, J, K), sub-subtypes (A1 to A4, F1 and F2), circulating recombinant forms (CRF01 to CRF51) and numerous unique recombinant forms (URFs) (www.hiv.lanl.gov). This genetic diversity has an impact on almost all aspects of the management of this infection, ranging from identification and monitoring of infected persons to treatment efficacy and vaccine design [1]–[3]. The classification of HIV strains has also helped in tracking the course of the HIV pandemic [4]. Numerous molecular epidemiological studies showed a heterogeneous geographic distribution of the different HIV-1 M subtypes and CRFs. The initial diversification of group M most likely occurred within or near the Democratic Republic of Congo (DRC) [5], [6], where the highest diversity of group M strains has been observed and the earliest cases of HIV-1 infection (1959 and 1960) have been documented in Kinshasa, the capital city [7]. Different HIV variants then spread across the world, and the epidemics in the different continents and countries are the result of different founder effects. Today, subtype C accounts for 50% of all infections [8]. The majority of subtype C infections are found in southern Africa, where they represent almost 100% of circulating HIV-1 strains. Subtype C also predominates in India, Ethiopia and southern China, and has entered East Africa, Brazil, and many European countries. With increasing mobility and human migration, HIV-1 variants inevitably intermix in different parts of the world, and the distribution of the different HIV-1 variants is a dynamic process.
In Senegal, which is located at the tip of West Africa, both AIDS viruses, HIV-1 and HIV-2, co-circulate. HIV-2 was first described in Senegal, but as in other West African countries, the prevalence of HIV-2 remained low and is decreasing [9], [10]. Today HIV-1 predominates. Since the description of the first HIV-1 AIDS case in 1986, HIV-1 seroprevalence has remained below 1% in the general population but can reach up to 20% in population groups with high-risk behavior such as female sex workers (FSWs) or men having sex with men (MSM) [11]. Several studies showed that CRF02_AG predominates in Senegal, representing 50–70% of circulating strains in the general population and FSWs, but in contrast to surrounding West African countries, a wide diversity of other HIV-1 variants co-circulate; subtypes A1, A3, B, D, F, G, H, CRF01, CRF06, CRF09, CRF11, CRF45 and HIV-1 group O have all been documented [10], [12]–[14]. As mentioned above, the distribution of HIV-1 subtypes/CRFs can differ between geographic origins and between population groups. Our recent studies showed that 40% of MSM in Senegal are infected with subtype C, which is in strong contrast with 4% to 10% in the general population and FSWs [10], [12]–[15]. The factors associated with the rapid spread of subtype C and its predominance in the global epidemic are not entirely known, but in certain regions where it has been introduced, subtype C has overtaken other HIV-1 variants [16]. The high prevalence and the rapid spread of subtype C among MSM thus need particular attention: because more than 90% of MSM report having sex with women [17], this could also lead to an increase over time of subtype C in the general population.
Using a combination of phylogenetic analyses and a Bayesian coalescent-based approach, we studied the phylogenetic relationships of subtype C isolates from Senegal with other subtype C strains that were sampled worldwide, in order to define the origin and onset of the subtype C epidemic in MSM in Senegal.
Results
Origin of subtype C sequences in Senegal
Among the HIV-1 subtype C pol sequences that were downloaded, we first eliminated all sequences that were not confirmed as subtype C by the REGA subtyping tool (i.e. intersubtype recombinants) and kept only one isolate per patient. The final dataset includes a total of 3,081 sequences spanning a 1,011 bp fragment in pol between positions 2,253 and 3,263 on the HXB2 genome, including 56 strains from Senegal, of which 24 are from MSM and 18 were newly sequenced (Table 1 and Table S1). Sequences were included from 4 different continents and 61 countries: Africa (22 countries), the Americas (7 countries), Asia (9 countries) and Europe (23 countries) (Table 2). The majority (67.73%) of the sequences are from Africa, and more precisely from southern Africa (55.14%): mainly South Africa (22.36%) and Zambia (20.55%) and, to a lesser extent, Botswana (4.32%), Mozambique (3.18%), Malawi (2.30%), Swaziland (1.53%), and Zimbabwe (0.91%). Subtype C sequences from Asia are predominantly from India (355 of a total of 380 sequences) and those from the Americas are mainly from southern Brazil (253 of a total of 299 sequences). Subtype C sequences from Europe represent 10.22% of the dataset and were collected from 23 different countries, without a single country or area that predominates in the dataset.
Table 1. HIV-1 subtype C strains from Senegal included in this study.
Strain identification | Accession Number | Year of isolation | Population group | Reference |
90SN-90SE364 | AY713416 | 1990 | general population | [53] |
98SN-66HPD | AJ583722 | 1998 | general population | [54] |
99SN-159HALD | AJ583716 | 1999 | general population | [54] |
99SN-142HPD | AJ583715 | 1999 | general population | [54] |
98SN-39HALD | AJ287005 | 1998 | general population | [55] |
99SN-86HPD | AJ583739 | 1999 | general population | [54] |
04SN-MS003 | FM210753 | 2004 | MSM | [15] |
04SN-MS883 | FM210752 | 2004 | MSM | [15] |
04SN-MS855 | FM210749 | 2004 | MSM | [15] |
04SN-MS835 | FM210745 | 2004 | MSM | [15] |
04SN-MS821 | FM210741 | 2004 | MSM | [15] |
04SN-MS816 | FM210740 | 2004 | MSM | [15] |
04SN-MS779 | FM210737 | 2004 | MSM | [15] |
04SN-MS700 | FM210736 | 2004 | MSM | [15] |
04SN-MS540 | FM210726 | 2004 | MSM | [15] |
04SN-MS522 | FM210725 | 2004 | MSM | [15] |
04SN-MS492 | FM210723 | 2004 | MSM | [15] |
04SN-MS048 | FM210722 | 2004 | MSM | [15] |
04SN-MS481 | FM210718 | 2004 | MSM | [15] |
04SN-MS477 | FM210717 | 2004 | MSM | [15] |
04SN-MS475 | FM210716 | 2004 | MSM | [15] |
04SN-MS448 | FM210712 | 2004 | MSM | [15] |
04SN-MS422 | FM210709 | 2004 | MSM | [15] |
04SN-MS245 | FM210699 | 2004 | MSM | [15] |
04SN-MS029 | FM210691 | 2004 | MSM | [15] |
04SN-MS015 | FM210689 | 2004 | MSM | [15] |
04SN-MS011 | FM210687 | 2004 | MSM | [15] |
04SN-MS010 | FM210686 | 2004 | MSM | [15] |
04SN-MS007 | FM210685 | 2004 | MSM | [15] |
04SN-MS002 | FM210684 | 2004 | MSM | [15] |
03SN-980HALD | FN599776 | 2003 | general population | [14] |
03SN-965HALD | FN599773 | 2003 | general population | [14] |
02SN-510HALD | FN599737 | 2002 | general population | [14] |
99SN-67HDP | FN599718 | 1999 | general population | [14] |
09SN-SNA3-366 | HM002544 | 2009 | not known | unpublished |
08SN-SNA3-220 | HM002517 | 2008 | not known | unpublished |
08SN-SNA3-191 | HM002515 | 2008 | not known | unpublished |
07SN-SNA3-107 | HM002507 | 2007 | not known | unpublished |
02SN-260HALD | HE588158 | 2002 | general population | this study |
03SN-154HALD | HE588157 | 2003 | general population | this study |
03SN-321HALD | HE588156 | 2003 | general population | this study |
03SN-L065 | HE588149 | 2003 | general population | this study |
06SN-463HALD | HE588155 | 2006 | general population | this study |
07SN-2658HALD | HE588150 | 2007 | general population | this study |
07SN-2909HALD | HE588151 | 2007 | general population | this study |
07SN-2911HALD | HE588152 | 2007 | general population | this study |
07SN-2936HALD | HE588153 | 2007 | general population | this study |
07SN-3076HALD | HE588154 | 2007 | general population | this study |
00SN-102HALD | HE588159 | 2000 | general population | this study |
97SN-1119 | HE588162 | 1997 | general population | this study |
02SN-478HALD | HE588163 | 2002 | general population | this study |
97SN-14Fann | HE588165 | 1997 | general population | this study |
97SN-25Fann | HE588164 | 1997 | general population | this study |
96SN-1083 | HE588166 | 1996 | general population | this study |
97SN-1186 | HE588161 | 1997 | general population | this study |
97SN-1189 | HE588160 | 1997 | general population | this study |
Table 2. Numbers of HIV-1 subtype C strains from different countries that were included in this study.
Continent | Country | Number | % |
Africa | (subtotal) | 2087 | 67.73 |
Africa | Botswana | 133 | 4.32 |
Africa | Burundi | 91 | 2.95 |
Africa | Democratic Republic of Congo | 19 | 0.62 |
Africa | Djibouti | 1 | 0.03 |
Africa | Equatorial Guinea | 1 | 0.03 |
Africa | Eritrea | 2 | 0.06 |
Africa | Ethiopia | 99 | 3.21 |
Africa | Gabon | 1 | 0.03 |
Africa | Kenya | 4 | 0.13 |
Africa | Malawi | 71 | 2.30 |
Africa | Mali | 1 | 0.03 |
Africa | Mozambique | 98 | 3.18 |
Africa | Niger | 4 | 0.13 |
Africa | Senegal | 56 | 1.82 |
Africa | Somalia | 1 | 0.03 |
Africa | South Africa | 689 | 22.36 |
Africa | Sudan | 10 | 0.32 |
Africa | Swaziland | 47 | 1.53 |
Africa | Tanzania | 82 | 2.66 |
Africa | Uganda | 16 | 0.52 |
Africa | Zambia | 633 | 20.55 |
Africa | Zimbabwe | 28 | 0.91 |
America | (subtotal) | 299 | 9.71 |
America | Argentina | 8 | 0.26 |
America | Brazil | 253 | 8.21 |
America | Cuba | 25 | 0.81 |
America | Honduras | 1 | 0.03 |
America | United States of America | 9 | 0.29 |
America | Uruguay | 2 | 0.06 |
America | Venezuela | 1 | 0.03 |
Asia | (subtotal) | 380 | 12.33 |
Asia | China | 7 | 0.23 |
Asia | India | 355 | 11.52 |
Asia | Israel | 5 | 0.16 |
Asia | Myanmar | 1 | 0.03 |
Asia | Philippines | 1 | 0.03 |
Asia | Russia | 1 | 0.03 |
Asia | South Korea | 2 | 0.06 |
Asia | Taiwan | 1 | 0.03 |
Asia | Yemen | 7 | 0.23 |
Europe | (subtotal) | 315 | 10.22 |
Europe | Austria | 3 | 0.10 |
Europe | Belgium | 35 | 1.14 |
Europe | Cyprus | 8 | 0.26 |
Europe | Czech Republic | 11 | 0.36 |
Europe | Denmark | 21 | 0.68 |
Europe | Finland | 6 | 0.19 |
Europe | France | 7 | 0.23 |
Europe | Georgia | 1 | 0.03 |
Europe | Germany | 7 | 0.23 |
Europe | Greece | 3 | 0.10 |
Europe | Italy | 22 | 0.71 |
Europe | Luxembourg | 3 | 0.10 |
Europe | Norway | 16 | 0.52 |
Europe | Poland | 2 | 0.06 |
Europe | Portugal | 28 | 0.91 |
Europe | Romania | 35 | 1.14 |
Europe | Slovakia | 1 | 0.03 |
Europe | Spain | 26 | 0.84 |
Europe | Sweden | 64 | 2.08 |
Europe | Switzerland | 2 | 0.06 |
Europe | The Netherlands | 8 | 0.26 |
Europe | Ukraine | 3 | 0.10 |
Europe | United Kingdom | 3 | 0.10 |
Total | | 3081 | |
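The continent-level shares in Table 2 follow directly from the sequence counts. The following is a minimal Python sketch of that arithmetic, using only the continent subtotals reported in the table; the variable names are illustrative and are not taken from the study.

```python
# Continent subtotals as reported in Table 2.
counts = {"Africa": 2087, "America": 299, "Asia": 380, "Europe": 315}

# Size of the final dataset (3,081 sequences).
total = sum(counts.values())

# Share of each continent, as a percentage of the whole dataset.
shares = {name: 100.0 * n / total for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} sequences, {pct:.2f}% of {total}")
```

Recomputed values can differ from the published percentages in the last digit, depending on whether figures are rounded or truncated.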
The maximum likelihood (PhyML) tree of the 3,081 subtype C sequences is shown in Figure 1. The strains from Senegal are highlighted in red, those from southern Africa (South Africa, Zambia, Zimbabwe, Malawi, Mozambique, Botswana, and Swaziland) in orange and those from the other African countries, which are predominantly from East Africa, in yellow. Strains from Asia, the Americas, and Europe are highlighted in green, purple and blue respectively. The sequences from Senegal are interspersed with the other African strains, but one significant cluster (98.9% aLRT support), which comprised all sequences obtained from MSM from Senegal, was identified. The phylogenetic tree also shows separate clades for subtype C strains from southern Africa and one clade from eastern Africa (cluster B, 75.9% aLRT support), each of which contains sequences from Senegal. The tree shows two other major clusters, one for the majority of South American strains (cluster A, purple) and one for the Asian strains (cluster C, green), each apparently resulting from a separate single introduction, but no strain from Senegal was observed in these clusters. The clusters from South America and Asia are supported by aLRT values of 72.7% and 82.3%, respectively. No significant cluster of European subtype C strains was observed; they are all interspersed with strains from different geographic origins, mainly in Africa, Asia and South America. In order to exclude the possibility of artifactual phylogenetic clustering due to drug-induced convergent evolution, especially for the clades from Senegal, the phylogenetic tree analysis was repeated on an alignment from which 43 codon positions (i.e. 129 nt, ∼12.7% of the full alignment) known to be associated with major resistance mutations were removed. This analysis shows the same subtype C clusters (Figure S1).
The above analysis showed that subtype C was introduced into Senegal on multiple occasions. Figure 2 shows in more detail the subtype C sequences that are most closely related to those observed in Senegal. As described in Materials and Methods, only sequences that branched with one or more sequences from Senegal up to the second ancestral node in the phylogenetic tree of the 3,081 sequences were used for this subtree. In addition to the 56 sequences from Senegal, 121 other subtype C sequences were included (Table S2); together these 177 sequences represent 5.7% of the total alignment. Figure 2 shows the tree obtained by PhyML with strains colored according to their geographic origin (the same tree with strain names is available in Figure S2). HIV-1 strains from Zambia are represented by a separate color in this tree because strains from this country are frequently present. The majority of the subtype C strains from Senegal, as well as the MSM cluster (node C), fall in clusters (aLRT >85%) that consist mainly of strains from Zambia and other southern African countries (for example nodes A, E and F). Nevertheless, some strains from Senegal are related to subtype C from east African countries (mostly Ethiopia: node D). Although the exact country of origin of the most recent common ancestor of the MSM strains remains uncertain, it was most likely in southern Africa. The first ancestral node to the MSM cluster (node B) suggests an origin in Zambia, but this node is supported by only 83.7% aLRT and 11% bootstrap values. The next ancestral node (node A), supported by an aLRT value of 94.7% and a bootstrap value of 49%, contains mainly strains from Zambia but also strains from other southern African countries. The Bayesian phylogenetic tree analysis performed with MrBayes shows similar results (Figure S3).
Dating the subtype C epidemic in Senegal and MSM population
We used a Bayesian MCMC approach implemented in BEAST v1.6.1 to estimate the dates of the most recent common ancestors (MRCAs) for the subtype C sequences from Senegal in the general population and for the subtype C epidemic in the MSM population. We used the Bayesian skyride population growth model combined with three molecular clock models: strict, relaxed uncorrelated lognormal, and relaxed uncorrelated exponential. Moreover, we used four different priors on the average substitution rate among branches, with varying levels of informativeness. Figure 3 shows the resulting estimates of the MRCA dates for the different models and priors used. More details are provided in Table S3, including substitution rate estimates.
Bayes factors (BF) indicate that the relaxed exponential model has a small advantage (BF in the 3 to 5 range) over the relaxed lognormal model, which in turn is slightly better (BF in the 3 to 6 range) than the strict molecular clock. However, the relaxed exponential model becomes non-informative when non-informative or poorly informative priors on the substitution rate are used (U[0,1] and N[2.5×10⁻³, 10×10⁻⁴], see Materials and Methods): spurious peaks appear, leading to very large 95% Highest Posterior Density (HPD) intervals (up to ∼400 years) and unrealistic estimates. Except in these two cases, the results with all models and priors are quite consistent. As expected, more informative priors gave narrower 95% HPD intervals. Nevertheless, the median date estimates of the MRCAs of subtype C in the general population of Senegal and in the MSM cluster are similar across all models and priors. For the MSM population, they indicate a likely epidemic origin in the early 80's. The MRCA for the subtype C strains that entered the general population on multiple occasions (i.e. heterosexual or mother-to-child transmission) is estimated in the early 70's.
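As an illustration of how such a model comparison works, a Bayes factor between two clock models is the ratio of their marginal likelihoods. The sketch below is a minimal Python example under stated assumptions: the log marginal likelihood values are hypothetical placeholders, not results from this study, and the text does not say whether the reported BF values are on the raw or the logarithmic scale.

```python
import math


def bayes_factor(log_ml_a: float, log_ml_b: float) -> float:
    """Bayes factor of model A over model B, from natural-log marginal likelihoods."""
    return math.exp(log_ml_a - log_ml_b)


# Hypothetical log marginal likelihoods, for illustration only.
log_ml = {
    "strict": -9003.0,
    "relaxed_lognormal": -9001.5,
    "relaxed_exponential": -9000.0,
}

bf_exp_vs_logn = bayes_factor(log_ml["relaxed_exponential"], log_ml["relaxed_lognormal"])
bf_logn_vs_strict = bayes_factor(log_ml["relaxed_lognormal"], log_ml["strict"])

print(f"relaxed exponential vs relaxed lognormal: BF = {bf_exp_vs_logn:.2f}")
print(f"relaxed lognormal vs strict: BF = {bf_logn_vs_strict:.2f}")
```

With these placeholder values both Bayes factors are exp(1.5), about 4.5, which would count as only a small advantage for the better model in each pair.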
To illustrate in more detail the MRCA of the subtype C strains in the MSM population and their relation to the other HIV-1 C strains from Senegal, the maximum clade credibility (MCC) tree with time scale obtained from BEAST is shown in Figure 4. We see the same MSM cluster as in the phylogeny of Figure 2 (see also Figure S2 and S3), and the early 70's and 80's dates for the MRCAs of general and MSM population respectively.
We verified whether the presence of drug resistance mutations could have an impact on MRCA dates and substitution rate estimates. The calculations were therefore repeated with the three molecular clock models and the four priors on an alignment from which 43 codon positions known to be associated with major resistance mutations were removed. This analysis showed no significant difference compared to the results obtained with the complete alignment (Table S3 for details on the estimates and Figure S4 for the MCC tree with time scale).
Finally, our reconstruction of the demographic history of HIV-1 C in Senegal identified an initial slow growth phase until the end of the 70's, followed by a period of quick exponential-like growth until the end of the 90's, when the epidemic growth became slower (Figure 5).
Discussion
In this study we analyzed the geographical origins and introduction dates of HIV-1 subtype C in Senegal in order to better understand the evolutionary history of this subtype which predominates today in the MSM population [15]. Our evolutionary reconstructions suggest that multiple subtype C viruses with a common ancestor originating in the early 1970s entered the country, followed by a sharp growth of the effective number of infections over the next decade.
This analysis of more than 3,000 globally collected reference sequences most likely provides an adequate representation of global subtype C diversity, and also provides additional information on the subtype C epidemic in other continents. The phylogenetic tree analysis showed several major clusters of subtype C sequences, mainly related to the continent of origin (Asia, South America or Africa), with the exception of Europe. Interestingly, among the African strains, a separate cluster of strains derived from patients living in east African countries was observed [18], whereas subtype C strains from Europe do not form a separate cluster and are interspersed among the different continents and major clusters. Our data also confirm the previously reported link of the subtype C epidemic in Brazil with east Africa [19]–[22].
Our analyses with various methods (PhyML, MrBayes and BEAST) showed a strongly supported cluster which contained all subtype C strains that circulate among MSM in Senegal. The MSM cluster and other strains from Senegal are widely dispersed among the different subclusters of African strains, suggesting multiple introductions of subtype C into Senegal from many different southern and also eastern African countries. More detailed analyses showed that the majority of the HIV-1 C strains from Senegal, including those circulating among MSM, are more closely related to strains from southern African countries, mainly Zambia. The cluster of subtype C strains derived from the MSM population also includes strains from HIV-1 infected men from Senegal who were not identified as MSM. Homosexuality is illegal in Senegal, and male-to-male sex is condemned by political and religious authorities and by the general population. Most MSM therefore keep their sexual life secret, including from their own family, and more than 90% of MSM reported also having sex with women [17]. Thus, these additional strains in the MSM cluster are most likely from individuals with male-to-male sex activities. Subtype C in MSM may have originated directly from southern Africa, but it is also possible that the ancestor of this subtype C cluster had already circulated for a certain period in the general population in Senegal before it was introduced into the MSM group.
The wide diversity and multiple introductions of subtype C are also consistent with the distribution of the HIV-1 variants in the general population in Senegal. Several studies showed that in addition to CRF02_AG, many other HIV-1 subtypes and CRFs are also present in the country, reflecting multiple introductions [10], [12]–[14]. This is most likely related to the important trading activity and travel links of the country with many other African countries [23], [24]. Our estimates suggest that the MRCA of the subtype C strains that entered Senegal dates from the early 1970s, about 10–15 years before the description of the first HIV-1 AIDS case in the country or of the first HIV-1 subtype C strain in Senegal in 1988 [25]. The MRCA date estimate of subtype C in Senegal is relatively close to those estimated in other African countries, such as 1966 for subtype C in Ethiopia [26], the beginning of the 70's for Zimbabwe [27] or the late 60's for Malawi [28]. As expected, we found that the MRCA of subtype C in Senegal is not specific to the country, because multiple introductions occurred, and our MRCA date estimate most likely corresponds to that of subtype C strains circulating outside Senegal. In contrast to southern African countries, subtype C did not become the predominant strain in Senegal and spread efficiently only in the MSM population, underlining the importance of high-risk behavior in the spread of viruses [29]. The MRCA of subtype C in the MSM population is estimated in the early 80's and is the result of a single introduction. This estimate coincides with the period when the HIV-1 C epidemic in Senegal started a quick exponential-like growth phase that lasted nearly 15 years according to the Bayesian skyride analysis.
Our study also showed that analyzing alignments with or without the codons that are associated with drug resistance did not have a significant impact on phylogenetic clustering or on MRCA date and substitution rate estimates. Among the different molecular clock models used, Bayes factors favored the relaxed exponential molecular clock over the more frequently used relaxed lognormal molecular clock. However, the very large HPD intervals and convergence problems of the exponential model with poorly informative priors, together with the almost identical results of both models with informative priors, probably explain the preferential use of the relaxed lognormal molecular clock model for HIV.
Previous studies suggest that subtype C could spread more efficiently due to the predominance of CCR5 variants or a stronger predisposition for localization in the female genital mucosa than other subtypes, which may facilitate both vertical and heterosexual transmission [30]–[33]. Increase of subtype C could also have implications on treatment because other subtype C specific mutations have been documented and commercial drug resistance assays cannot correctly test subtype C infections [2], [34]–[36]. A cross-sectional study of women in Kenya indicated that women infected with subtype C had a higher viral load and lower CD4 counts than those infected with subtypes A and D, which could also have an impact on pathogenesis and transmission [37]. Therefore, it is important to continue to monitor HIV-1 subtype/CRF distribution among different population groups in Senegal. However, in order to be able to compare trends over time, such studies should be organized in a standardized way. For example, WHO proposed standardized protocols for surveillance of drug resistance mutations in recently infected individuals [38]. These studies can be combined with subtype/CRF characterization.
Because MSM reported also having sex with women, they could potentially serve as a bridge between high-risk men and low-risk women. This sexual mixing pattern might contribute in the future to an increase of subtype C in the general population. An increase from 4% to almost 10% between 2000 and 2010 has already been observed among the general population in Senegal, and subtype C sequences obtained in 2011 from HIV-1 C infected women cluster within the clade of strains from the MSM population (Coumba Toure Kane, unpublished results). Understanding the origins and dispersal patterns of HIV-1 clades at regional and country levels is useful to improve the characterization and control of HIV spread. Continuous monitoring of HIV variants seems necessary to adapt treatment and vaccine strategies so that they remain efficient against local and contemporary circulating HIV variants.
Materials and Methods
Nucleotide sequence dataset
In order to increase the number of sequences and to cover a wide geographic range, we used the pol region for our analysis. Pol sequences are widely studied because the pol gene products are the target of antiretroviral drugs. A total of 56 subtype C pol gene sequences from Senegal were used in this study. Thirty-eight were obtained from the Los Alamos HIV sequence database (www.hiv.lanl.gov) from previously published reports, and eighteen were newly characterized from ongoing molecular epidemiology and/or drug resistance studies, mainly in Dakar, the capital city of Senegal (Table 1). We downloaded only sequences that were at least 1,000 nucleotides in length and that spanned the genomic region covering protease and the majority of RT in pol, between positions 2,253 and 3,263 on the HXB2 genome. Sequences were from blood samples collected between 1990 and 2009. In addition, all available subtype C sequences spanning the same genomic region, and for which country of origin and sampling year were known, were also downloaded from the Los Alamos HIV database (www.hiv.lanl.gov). We then submitted all the sequences to the REGA subtyping tool v.2 to confirm subtype assignments and to eliminate possible intersubtype recombinants [39], [40]. We selected one sequence per individual when sequential sequences were available or when sequences were epidemiologically linked by direct donor–recipient transmission.
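The length and one-sequence-per-individual criteria described above can be expressed as a simple filter. The following Python sketch is only an illustration under stated assumptions: the `SeqRecord` fields, the accession and patient identifiers, and the rule of keeping the earliest sampled sequence per individual are all hypothetical, since the text states only that one sequence per individual was selected.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeqRecord:
    """Hypothetical record layout; field names are not from the study."""
    accession: str
    patient_id: str
    country: str
    year: int
    sequence: str


def select_sequences(records, min_length=1000):
    """Keep records with at least min_length nucleotides, one per patient.

    When a patient has several sequences, the earliest sampled one is kept.
    This tie-break rule is an assumption made for the example.
    """
    chosen = {}
    for rec in records:
        # Ignore alignment gap characters when measuring sequence length.
        if len(rec.sequence.replace("-", "")) < min_length:
            continue
        best = chosen.get(rec.patient_id)
        if best is None or rec.year < best.year:
            chosen[rec.patient_id] = rec
    return sorted(chosen.values(), key=lambda r: r.accession)


# Toy input with made-up identifiers.
records = [
    SeqRecord("ACC001", "P1", "SN", 2004, "A" * 1011),
    SeqRecord("ACC002", "P1", "SN", 2006, "A" * 1011),  # same patient, later sample
    SeqRecord("ACC003", "P2", "ZM", 2001, "A" * 800),   # too short
    SeqRecord("ACC004", "P3", "ZA", 1999, "A" * 1005),
]

kept = select_sequences(records)
print([r.accession for r in kept])
```

Subtype confirmation with the REGA tool is a separate step and is not modeled here.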
HIV-1 pol sequencing
The 18 new HIV-1 pol sequences were obtained with an in-house technique as previously described [41]. Briefly, RNA was extracted using the QIAamp Viral RNA extraction kit (Qiagen SA, Courtaboeuf, France) and processed for reverse transcription polymerase chain reaction (RT-PCR) with the integrase-specific primer IN3 5′-TCTATBCCATCTAAAAATAGTACTTTCCTGATTCC-3′ using the Expand reverse transcriptase (Roche Diagnostics, Meylan, France) according to the manufacturer's instructions. The resulting cDNA served as template in the subsequent nested PCR reaction, during which a 1,865 base pair fragment, corresponding to the protease and the first 440 amino acids of the reverse transcriptase region of the pol gene, was amplified with previously described primers and cycling conditions using the Expand Long Template PCR system (Roche Diagnostics, Meylan, France). The amplified HIV-1 nucleic acid fragments were purified using the Geneclean Turbo Kit (Q-Biogen, MPbiomedicals, France) and directly sequenced with primers encompassing the pol region using BigDye Terminator version 3.1 (Applied Biosystems, Courtaboeuf, France) according to the manufacturer's instructions. Electrophoresis and data collection were done on an Applied Biosystems 3130XL Genetic Analyzer. The sequenced fragments from both strands were assembled using Seqman II from the DNAstar package v5.08 (Lasergene, Madison, WI, USA).
Sequence alignment and phylogenetic tree analysis
The 18 newly obtained sequences were added to the alignment of subtype C sequences downloaded from the Los Alamos HIV database using the L-INS-i method from MAFFT [42], [43], and the alignment was then manually edited with MEGA5 [44]. The HXB2 subtype B prototype strain was used as outgroup. In order to study potential bias due to drug-induced convergent evolution, all our analyses were also repeated on an alignment from which we removed 43 codon positions known to be associated with major resistance mutations according to the WHO list of 2009 [45]. The following positions were excluded for protease (23, 24, 30, 32, 46, 47, 48, 50, 53, 54, 73, 76, 82, 83, 84, 85, 88, 90) and RT (41, 65, 67, 69, 70, 74, 75, 77, 100, 101, 103, 106, 115, 116, 151, 179, 181, 184, 188, 190, 210, 215, 219, 225, 230), leaving 882 nt in the final alignment. Both complete (1,011 nt) and restricted (882 nt) sequence alignments are available from the authors upon request. Maximum likelihood phylogenies were inferred using the GTR+I+Γ4 nucleotide substitution model, as recommended in [46], implemented in PhyML v3.0 [47]. The SPR option was selected to search the tree space and aLRT SH-like branch supports were used to assess confidence in topology [48]. The phylogenetic tree was drawn with FIGTREE (tree.bio.ed.ac.uk/software/figtree/).
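The codon list above translates into alignment columns as follows. This is a minimal Python sketch, not the authors' procedure: it assumes that the alignment has no insertions relative to HXB2, that column 0 is the first nucleotide of protease (HXB2 position 2,253), and that protease is 99 codons long so that RT codon 1 starts at column 297. The function names are illustrative.

```python
# Excluded codon positions, as listed in the text.
PROTEASE_CODONS = [23, 24, 30, 32, 46, 47, 48, 50, 53, 54,
                   73, 76, 82, 83, 84, 85, 88, 90]
RT_CODONS = [41, 65, 67, 69, 70, 74, 75, 77, 100, 101, 103, 106, 115, 116,
             151, 179, 181, 184, 188, 190, 210, 215, 219, 225, 230]

# Assumption: protease spans 99 codons and starts at alignment column 0.
PROTEASE_LENGTH_CODONS = 99


def columns_to_remove():
    """Return the sorted 0-based alignment columns covered by the excluded codons."""
    cols = []
    for codon in PROTEASE_CODONS:
        start = (codon - 1) * 3
        cols.extend(range(start, start + 3))
    for codon in RT_CODONS:
        # RT codons are numbered from 1 again, right after the protease codons.
        start = (PROTEASE_LENGTH_CODONS + codon - 1) * 3
        cols.extend(range(start, start + 3))
    return sorted(cols)


def mask_sequence(aligned_seq, cols):
    """Return aligned_seq with the given columns deleted."""
    drop = set(cols)
    return "".join(base for i, base in enumerate(aligned_seq) if i not in drop)


cols = columns_to_remove()
example = "A" * 1011  # placeholder for one row of the 1,011 nt alignment
restricted = mask_sequence(example, cols)
print(len(cols), len(restricted))
```

Under these assumptions the 43 codons cover 129 columns, and removing them from a 1,011 nt row leaves 882 nt, matching the alignment lengths given in the text.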
In order to better determine and visualize the relationship of the subtype C sequences from Senegal to those from other geographic areas, another phylogenetic analysis was performed with fewer sequences. For this subtree, we collected from the large phylogenetic tree described above all descendant sequences of nodes that are first- or second-level ancestors of at least one sequence from Senegal (i.e., all Senegalese sequences plus their sisters and close relatives). A phylogeny was then inferred using the same method and options as described above, but in addition to aLRT we ran a non-parametric bootstrap with 100 replicates to obtain a second assessment of branch supports. A phylogeny of this subset of sequences was also inferred using MrBayes v3.1 [49] with the same substitution model as for the maximum likelihood tree, and with a chain length of 5×10⁷ generations and a tree sampling frequency of one tree every 1×10⁴ generations. A burn-in of 2,000 sampled trees (i.e. ∼40%) was selected. By the end of the run, the average standard deviation of split frequencies was below 0.01 and the potential scale reduction factor of every parameter was in the range [0.999, 1.001], except for the parameter pinvar, which was at 1.002, indicating convergence of the Markov chains (see MrBayes manual).
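The subtree selection rule described above (keep every leaf that descends from a first- or second-level ancestor of a Senegalese sequence) can be sketched on a toy tree. This Python example is illustrative only: the `Node` class, the function names and the strain labels are hypothetical, and the original analysis was run on the PhyML tree of 3,081 sequences, not with this code.

```python
class Node:
    """Minimal rooted-tree node with a parent pointer (hypothetical helper)."""

    def __init__(self, name="", children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def leaves(node):
    """Return all leaf nodes below node (or node itself if it is a leaf)."""
    if not node.children:
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def select_close_relatives(root, is_focal, levels=2):
    """Names of all leaves under the levels-th ancestor of each focal leaf."""
    selected = set()
    for leaf in leaves(root):
        if not is_focal(leaf.name):
            continue
        ancestor = leaf
        for _ in range(levels):
            if ancestor.parent is None:
                break  # reached the root before climbing all levels
            ancestor = ancestor.parent
        # Descendants of the second-level ancestor include those of the first.
        selected.update(l.name for l in leaves(ancestor))
    return selected


# Toy tree: (((SN1, ZM1), ZM2), (ZA1, (ET1, ET2)))
tree = Node(children=[
    Node(children=[Node(children=[Node("SN1"), Node("ZM1")]), Node("ZM2")]),
    Node(children=[Node("ZA1"), Node(children=[Node("ET1"), Node("ET2")])]),
])

subset = select_close_relatives(tree, lambda name: name.startswith("SN"))
print(sorted(subset))
```

In the toy tree, the only focal leaf is SN1; its first-level ancestor contributes ZM1 and its second-level ancestor adds ZM2, while the other clade is left out.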
Dating the introduction of subtype C in Senegal and MSM population
Estimates of the substitution rate and dates of the most recent common ancestor (MRCA) of subtype C in Senegal and in the sub-epidemic in MSM were obtained using BEAST v1.6.1 [50]. The 56 pol gene subtype C sequences from Senegal were analyzed under a GTR+I+Γ4 substitution process (as for the phylogenetic analyses). We used three different molecular clock models (strict clock, relaxed uncorrelated exponential and relaxed uncorrelated lognormal) [51] as implemented in BEAST, with a Bayesian skyride tree prior as a coalescent demographic model with time-aware smoothing [52]. For the parameters of each molecular clock model (ucld.mean, uced.mean and clock.rate for the relaxed lognormal, relaxed exponential and strict molecular clock respectively) we tested a total of four different priors: one non-informative prior based on a uniform distribution (between 0.0 and 1.0) and three priors with varying information levels based on a normal distribution with a mean of 2.5×10⁻³ (based on estimates from a previous study [27] in the same genomic region and as estimated by Path-O-Gen: tree.bio.ed.ac.uk/software/pathogen/) and standard deviations of 10×10⁻⁴, 7.5×10⁻⁴, and 5.0×10⁻⁴, respectively. For the ucld.stdev parameter (representing the variability of the rates among branches for the relaxed lognormal molecular clock) we used a prior based on an exponential distribution with a mean of 0.1 (personal communication with A. Drummond). MCMC simulations were run for 2.5×10⁸ chain steps with sub-sampling every 2.5×10⁵ steps. Convergence of the chains was inspected using Tracer v.1.5. For each tested prior and for each parameter, effective sample size (ESS) values were always above 300. The Bayes factor was calculated to compare molecular clock models, using the marginal likelihood as implemented in Tracer v.1.5. The Maximum Clade Credibility (MCC) tree with time scale was obtained with TreeAnnotator v1.6.1 with a burn-in of the first hundred trees.
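To give a feel for how informative the three normal priors are, and for the sampling scheme, the following Python sketch computes the central 95% range of each prior and the number of sampled states implied by the chain settings. It uses only the Python standard library and the numbers quoted above; the variable and function names are illustrative, and the exact number of logged states may differ by one depending on whether the initial state is recorded.

```python
from statistics import NormalDist

PRIOR_MEAN = 2.5e-3                  # mean of the normal priors on the clock rate
PRIOR_SDS = [10e-4, 7.5e-4, 5.0e-4]  # the three standard deviations tested


def central_interval(mean, sd, mass=0.95):
    """Central interval of a normal distribution containing the given probability mass."""
    dist = NormalDist(mu=mean, sigma=sd)
    tail = (1.0 - mass) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


for sd in PRIOR_SDS:
    low, high = central_interval(PRIOR_MEAN, sd)
    print(f"sd = {sd:.1e}: 95% of prior mass in [{low:.2e}, {high:.2e}]")

CHAIN_STEPS = 250_000_000  # 2.5×10⁸ MCMC steps
SAMPLE_EVERY = 250_000     # one sample every 2.5×10⁵ steps
BURN_IN_TREES = 100        # trees discarded before building the MCC tree

n_samples = CHAIN_STEPS // SAMPLE_EVERY
burn_in_fraction = BURN_IN_TREES / n_samples
print(f"{n_samples} sampled states, burn-in fraction {burn_in_fraction:.0%}")
```

With these settings the chain yields about 1,000 sampled trees, so discarding the first hundred corresponds to a burn-in of roughly 10%.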
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: MJ was supported by a PhD grant from the Région Languedoc-Roussillon and from the University of Montpellier 2, France. Nafissatou Leye has a PhD grant from S.C.A.C. (Service de Coopération et d'Action Culturelle) of the French Embassy in Senegal. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Thomson MM, Pérez-Alvarez L, Nájera R. Molecular epidemiology of HIV-1 genetic forms and its significance for vaccine development and therapy. Lancet Infect Dis. 2002;2:461–71. doi: 10.1016/s1473-3099(02)00343-2. Review.
- 2. Peeters M, Aghokeng AF, Delaporte E. Genetic diversity among human immunodeficiency virus-1 non-B subtypes in viral load and drug resistance assays. Clin Microbiol Infect. 2010;16:1525–31. doi: 10.1111/j.1469-0691.2010.03300.x. Review.
- 3. Gamble LJ, Matthews QL. Current progress in the development of a prophylactic vaccine for HIV-1. Drug Des Devel Ther. 2010;5:9–26. doi: 10.2147/DDDT.S6959. Review.
- 4. Tebit DM, Arts EJ. Tracking a century of global expansion and evolution of HIV to drive understanding and to combat disease. Lancet Infect Dis. 2011;11:45–56. doi: 10.1016/S1473-3099(10)70186-9. Review.
- 5. Vidal N, Peeters M, Mulanga-Kabeya C, Nzilambi N, Robertson D, et al. Unprecedented degree of human immunodeficiency virus type 1 (HIV-1) group M genetic diversity in the Democratic Republic of Congo suggests that the HIV-1 pandemic originated in Central Africa. J Virol. 2000;74:10498–507. doi: 10.1128/jvi.74.22.10498-10507.2000.
- 6. Rambaut A, Robertson DL, Pybus OG, Peeters M, Holmes EC. Human immunodeficiency virus. Phylogeny and the origin of HIV-1. Nature. 2001;410:1047–8. doi: 10.1038/35074179.
- 7. Worobey M, Gemmel M, Teuwen DE, Haselkorn T, Kunstman K, et al. Direct evidence of extensive diversity of HIV-1 in Kinshasa by 1960. Nature. 2008;455:661–4. doi: 10.1038/nature07390.
- 8. Hemelaar J, Gouws E, Ghys PD, Osmanov S, WHO-UNAIDS Network for HIV Isolation and Characterisation. Global trends in molecular epidemiology of HIV-1 during 2000–2007. AIDS. 2011;25:679–89. doi: 10.1097/QAD.0b013e328342ff93.
- 9. Barin F, M'Boup S, Denis F, Kanki P, Allan JS, et al. Serological evidence for virus related to simian T-lymphotropic retrovirus III in residents of west Africa. Lancet. 1985;2:1387–9. doi: 10.1016/s0140-6736(85)92556-5.
- 10. Hamel DJ, Sankalé JL, Eisen G, Meloni ST, Mullins C, et al. Twenty years of prospective molecular epidemiology in Senegal: changes in HIV diversity. AIDS Res Hum Retroviruses. 2007;23:1189–96. doi: 10.1089/aid.2007.0037.
- 11. UNAIDS website. Available: www.unaids.org/en/regionscountries/countries/senegal/. Accessed 2011 Aug 23.
- 12. Toure-Kane C, Montavon C, Faye MA, Gueye PM, Sow PS, et al. Identification of all HIV type 1 group M subtypes in Senegal, a country with low and stable seroprevalence. AIDS Res Hum Retroviruses. 2000;16:603–9. doi: 10.1089/088922200309025.
- 13. Ayouba A, Lien TT, Nouhin J, Vergne L, Aghokeng AF, et al. Low prevalence of HIV type 1 drug resistance mutations in untreated, recently infected patients from Burkina Faso, Côte d'Ivoire, Senegal, Thailand, and Vietnam: the ANRS 12134 study. AIDS Res Hum Retroviruses. 2009;25:1193–6. doi: 10.1089/aid.2009.0142.
- 14. Diop-Ndiaye H, Toure-Kane C, Leye N, Ngom-Gueye NF, Montavon C, et al. Antiretroviral drug resistance mutations in antiretroviral-naive patients from Senegal. AIDS Res Hum Retroviruses. 2010;26:1133–8. doi: 10.1089/aid.2009.0295.
- 15. Ndiaye HD, Toure-Kane C, Vidal N, Niama FR, Niang-Diallo PA, et al. Surprisingly high prevalence of subtype C and specific HIV-1 subtype/CRF distribution in men having sex with men in Senegal. J Acquir Immune Defic Syndr. 2009;52:249–52. doi: 10.1097/QAI.0b013e3181af70a4.
- 16. Soares EA, Martinez AM, Souza TM, Santos AF, Da Hora V, et al. HIV-1 subtype C dissemination in southern Brazil. AIDS. 2005;19(Suppl 4):S81. doi: 10.1097/01.aids.0000191497.00928.e4.
- 17. Wade AS, Kane CT, Diallo PAN, Diop AK, Gueye K, et al. HIV infection and sexually transmitted infections among men who have sex with men in Senegal. AIDS. 2005;19:2133–2140. doi: 10.1097/01.aids.0000194128.97640.07.
- 18. Thomson MM, Fernández-García A. Phylogenetic structure in African HIV-1 subtype C revealed by selective sequential pruning. Virology. 2011;415:30–8. doi: 10.1016/j.virol.2011.03.021.
- 19. Fontella R, Soares MA, Schrago CG. On the origin of HIV-1 subtype C in South America. AIDS. 2008;22:2001–11. doi: 10.1097/QAD.0b013e3283108f69.
- 20. Bello G, Passaes CP, Guimarães ML, Lorete RS, Matos Almeida SE, et al. Origin and evolutionary history of HIV-1 subtype C in Brazil. AIDS. 2008;22:1993–2000. doi: 10.1097/QAD.0b013e328315e0aa.
- 21. de Oliveira T, Pillay D, Gifford RJ, UK Collaborative Group on HIV Drug Resistance. The HIV-1 subtype C epidemic in South America is linked to the United Kingdom. PLoS One. 2010;5(2):e9311. doi: 10.1371/journal.pone.0009311.
- 22. Véras NM, Gray RR, Brígido LF, Rodrigues R, Salemi M. High-resolution phylogenetics and phylogeography of human immunodeficiency virus type 1 subtype C epidemic in South America. J Gen Virol. 2011;92:1698–709. doi: 10.1099/vir.0.028951-0.
- 23. Kane F, Alary M, Ndoye I, Coll AM, M'boup S, et al. Temporary expatriation is related to HIV-1 infection in rural Senegal. AIDS. 1993;9:1261–5. doi: 10.1097/00002030-199309000-00017.
- 24. Kanki PJ, Peeters M, Gueye-Ndiaye A. Virology of HIV-1 and HIV-2: implications for Africa. AIDS. 1997;11(Suppl B):S33–4.
- 25. Kanki PJ, Hamel DJ, Sankalé JL, Hsieh C, Thior I, et al. Human immunodeficiency virus type 1 subtypes differ in disease progression. J Infect Dis. 1999;179:68–73. doi: 10.1086/314557.
- 26. Tully DC, Wood C. Chronology and evolution of the HIV-1 subtype C epidemic in Ethiopia. AIDS. 2010;24:1577–82. doi: 10.1097/QAD.0b013e32833999e1.
- 27. Dalai SC, de Oliveira T, Harkins GW, Kassaye SG, Lint J, et al. Evolution and molecular epidemiology of subtype C HIV-1 in Zimbabwe. AIDS. 2009;23:2523–32. doi: 10.1097/QAD.0b013e3283320ef3.
- 28. Travers SA, Clewley JP, Glynn JR, Fine PE, Crampin AC, et al. Timing and reconstruction of the most recent common ancestor of the subtype C clade of human immunodeficiency virus type 1. J Virol. 2004;78:10501–6. doi: 10.1128/JVI.78.19.10501-10506.2004.
- 29. McDaid LM, Hart GJ. Sexual risk behaviour for transmission of HIV in men who have sex with men: recent findings and potential interventions. Curr Opin HIV AIDS. 2010;5:311–5. doi: 10.1097/COH.0b013e32833a0b86. Review.
- 30. Abraha A, Nankya IL, Gibson R, Demers K, Tebit DM, et al. CCR5- and CXCR4-tropic subtype C human immunodeficiency virus type 1 isolates have a lower level of pathogenic fitness than other dominant group M subtypes: implications for the epidemic. J Virol. 2009;83:5592–5605. doi: 10.1128/JVI.02051-08.
- 31. Ball SC, Abraha A, Collins KR, Marozsan AJ, Baird H, et al. Comparing the ex vivo fitness of CCR5-tropic human immunodeficiency virus type 1 isolates of subtypes B and C. J Virol. 2003;77:1021–38. doi: 10.1128/JVI.77.2.1021-1038.2003.
- 32. Renjifo B, Gilbert P, Chaplin B, Msamanga G, Mwakagile D, et al. Preferential in-utero transmission of HIV-1 subtype C as compared to HIV-1 subtype A or D. AIDS. 2004;18:1629–1636. doi: 10.1097/01.aids.0000131392.68597.34.
- 33. John-Stewart GC, Nduati RW, Rousseau CM, Mbori-Ngacha DA, Richardson BA, et al. Subtype C is associated with increased vaginal shedding of HIV-1. J Infect Dis. 2005;192:492–496. doi: 10.1086/431514.
- 34. Martinez-Cajas JL, Pai NP, Klein MB, Wainberg MA. Differences in resistance mutations among HIV-1 non-subtype B infections: A systematic review of evidence (1996–2008). J Int AIDS Soc. 2009;12:11. doi: 10.1186/1758-2652-12-11.
- 35. Vergne L, Snoeck J, Aghokeng A, Maes B, Valea D, et al. Genotypic drug resistance interpretation algorithms display high levels of discordance when applied to non-B strains from HIV-1 naive and treated patients. FEMS Immunol Med Microbiol. 2006;46:53–62. doi: 10.1111/j.1574-695X.2005.00011.x.
- 36. Snoeck J, Kantor R, Shafer RW, Van Laethem K, Deforche K, et al. Discordances between interpretation algorithms for genotypic resistance to protease and reverse transcriptase inhibitors of human immunodeficiency virus are subtype dependent. Antimicrob Agents Chemother. 2006;50:694–701. doi: 10.1128/AAC.50.2.694-701.2006.
- 37. Neilson JR, John GC, Carr JK, Lewis P, Kreiss JK, et al. Subtypes of human immunodeficiency virus type 1 and disease stage among women in Nairobi, Kenya. J Virol. 1999;73:4393–4403. doi: 10.1128/jvi.73.5.4393-4403.1999.
- 38. Bennett DE, Bertagnolio S, Sutherland D, Gilks CF. The World Health Organization's global strategy for prevention and assessment of HIV drug resistance. Antivir Ther. 2008;13(Suppl 2):1–13.
- 39. de Oliveira T, Deforche K, Cassol S, Salminen M, Paraskevis D, et al. An automated genotyping system for analysis of HIV-1 and other microbial sequences. Bioinformatics. 2005;21:3797–800. doi: 10.1093/bioinformatics/bti607.
- 40. Alcantara LC, Cassol S, Libin P, Deforche K, Pybus OG, et al. Standardized framework for accurate, high-throughput genotyping of recombinant and non-recombinant viral sequences. Nucleic Acids Res. 2009;37:634–42. doi: 10.1093/nar/gkp455.
- 41. Vergne L, Diagbouga S, Kouanfack C, Aghokeng A, Butel C, et al. HIV-1 drug-resistance mutations among newly diagnosed patients before scaling-up programmes in Burkina Faso and Cameroon. Antivir Ther. 2006;11:575–9.
- 42. Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059–3066. doi: 10.1093/nar/gkf436.
- 43. Katoh K, Kuma K, Toh H, Miyata T. MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–8. doi: 10.1093/nar/gki198.
- 44. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol. 2011;28:2731–9. doi: 10.1093/molbev/msr121.
- 45. Bennett DE, Camacho RJ, Otelea D, Kuritzkes DR, Fleury H, et al. Drug resistance mutations for surveillance of transmitted HIV-1 drug-resistance: 2009 update. PLoS One. 2009:e4724. doi: 10.1371/journal.pone.0004724.
- 46. Posada D, Crandall KA. Selecting models of nucleotide substitution: an application to human immunodeficiency virus 1 (HIV-1). Mol Biol Evol. 2001;18:897–906. doi: 10.1093/oxfordjournals.molbev.a003890.
- 47. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59:307–21. doi: 10.1093/sysbio/syq010.
- 48. Anisimova M, Gascuel O. Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst Biol. 2006;55:539–52. doi: 10.1080/10635150600755453.
- 49. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. doi: 10.1093/bioinformatics/btg180.
- 50. Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214. doi: 10.1186/1471-2148-7-214.
- 51. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4:e88. doi: 10.1371/journal.pbio.0040088.
- 52. Minin VN, Bloomquist EW, Suchard MA. Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol. 2008;25:1459–71. doi: 10.1093/molbev/msn090.
- 53. Brown BK, Darden JM, Tovanabutra S, Oblander T, Frost J, et al. Biologic and genetic characterization of a panel of 60 human immunodeficiency virus type 1 isolates, representing clades A, B, C, D, CRF01_AE, and CRF02_AG, for the development and assessment of candidate vaccines. J Virol. 2005;79:6089–101. doi: 10.1128/JVI.79.10.6089-6101.2005.
- 54. Vergne L, Kane CT, Laurent C, Diakhaté N, Gueye NF, et al. Low rate of genotypic HIV-1 drug-resistant strains in the Senegalese government initiative of access to antiretroviral therapy. AIDS. 2003;17(Suppl 3):S31–8. doi: 10.1097/00002030-200317003-00005.
- 55. Vergne L, Peeters M, Mpoudi-Ngole E, Bourgeois A, Liegeois F, et al. Genetic diversity of protease and reverse transcriptase sequences in non-subtype-B human immunodeficiency virus type 1 strains: evidence of many minor drug resistance mutations in treatment-naïve patients. J Clin Microbiol. 2000;38:3919–25. doi: 10.1128/jcm.38.11.3919-3925.2000.