PLOS ONE. 2023 Dec 21;18(12):e0296053. doi: 10.1371/journal.pone.0296053

Distribution pattern, molecular transmission networks, and phylodynamic of hepatitis C virus in China

Jingrong Ye 1,#, Yanming Sun 1,#, Jia Li 1,#, Xinli Lu 2,#, Minna Zheng 3,#, Lifeng Liu 4, Fengting Yu 5, Shufang He 1, Conghui Xu 1, Xianlong Ren 1, Juan Wang 1, Jing Chen 1, Yuhua Ruan 6, Yi Feng 6, Yiming Shao 6,*,#, Hui Xing 6,*,#, Hongyan Lu 1,*,#
Editor: Jason T Blackard 7
PMCID: PMC10734925  PMID: 38128044

Abstract

In China, few molecular epidemiological data on hepatitis C virus (HCV) are available, and all previous studies were limited by small sample sizes or specific population characteristics. Here, we characterize the epidemic history and transmission dynamics of HCV strains in China. We included HCV sequences from individuals belonging to three HCV surveillance programs: 1) patients diagnosed with HIV infection at the Beijing HIV laboratory network, most of whom were people who inject drugs and former paid blood donors, 2) men who have sex with men, and 3) the general population. We also used publicly available HCV sequences sampled in China. In total, we obtained 1,603 Ns5b and 865 C/E2 sequences from 1,811 individuals. The most common HCV strains were subtypes 1b (29.1%), 3b (25.5%) and 3a (15.1%). In transmission network analysis, factors independently associated with clustering included region (OR: 0.37, 95% CI: 0.19–0.71), infection subtype (OR: 0.23, 95% CI: 0.1–0.52), and sampling period (OR: 0.43, 95% CI: 0.27–0.68). The history of the major HCV subtypes was complex and coincided with some important sociomedical events in China. Of note, five of the eight major HCV subtypes (1a, 1b, 2a, 3a, and 3b), which constituted 81.8% of the HCV strains genotyped in our study, showed a tendency towards decline in effective population size during the past decade, which is a good omen for the goal of eliminating HCV in China by 2030.

Introduction

Hepatitis C virus (HCV) infection is a major public health threat in China. The most recent estimate of the national prevalence of HCV infection is 0.7%, representing approximately 10 million people [1]. In 2020 alone, 194,066 individuals were newly diagnosed with HCV infection [2]. Moreover, an estimated 34,198 people died of cirrhosis attributed to HCV infection in 2017 [3]. Encouraged by the curative effect of new all-oral antivirals, which offer a short treatment duration, more manageable side effects, and improved sustained virologic response (SVR), the WHO introduced ambitious global targets in 2016 to eliminate HCV infection by 2030 [4,5]. To achieve these goals, China needs to develop national policies based on up-to-date and reliable epidemiological data. Previous national or quasi-national studies have determined the HCV genotype distribution in China. However, all these studies were limited by small sample sizes, sampling of specific populations (mostly people who inject drugs [PWID] and former paid blood donors, whereas men who have sex with men [MSM] were seldom considered) and restricted geographic sampling [6–11]. Previous studies have also reconstructed the evolutionary history of HCV lineages in China and successfully linked the time scale of HCV evolution to unique historical events and past sociomedical conditions in China, such as the "Cultural Revolution" and "Encouraged Plasma Campaign" [12,13]. However, the outcomes of these theoretical studies have been limited by a relatively narrow span of sampling time.

Rapidly evolving RNA viruses, such as HIV and HCV, contain measurable footprints in their genomes, which can be used to infer molecular transmission networks. Thus, by using nucleotide sequences, HIV transmission networks that link people who are infected with genetically similar isolates can be constructed, whereby linked people are presumed to have a direct or indirect epidemiologic connection and usually represent a “hotspot” of HIV transmission [14–16]. Over the last two decades, many clustering methods have been developed to define HIV transmission networks within a population. Broadly speaking, these methods can be grouped into two categories: methods that cluster directly on sequence variation via pairwise genetic distance measures, and methods that interpret this variation in the context of subtrees in a phylogeny. Phylogenetic analysis can carry a high computational burden, especially for large sequence datasets, whereas genetic distances can be computed rapidly. Therefore, recent network analyses have favoured the generally faster and parameter-rich distance-based methods [17,18].

These network analyses contribute significantly to our understanding of HIV epidemiology, for example, by providing information about HIV epidemics, by identifying transmission linkages, and by elucidating differences in transmission within and between populations [14–16]. HCV also evolves rapidly and shares the same routes of transmission as HIV; however, HCV transmission networks have never been characterized in China.

In this study, we aimed to update the genotype distribution, infer the molecular transmission networks and reconstruct the epidemic history of HCV in China using a substantially more comprehensive dataset and metadata than previous works.

Methods

Sampling strategy

We designed a cross-sectional study to make full use of all available HCV genotyping data in China. The study population consisted of four separate groups of HCV-infected individuals (Fig 1 and S1 Table). The first group consisted of patients diagnosed with HIV infection at the Beijing HIV laboratory network (BHLN) from 1999 through 2017. The BHLN, established in 1986, is a collaborative network of laboratories involved in HIV diagnosis. It was authorized by the Beijing Municipal Commission of Health and includes a central reference laboratory in the Beijing Center for Disease Prevention and Control (CDC), four additional HIV reference laboratories (DiTan, YouAn, Peking Union Medical College, and People’s Liberation Army of China [PLA] General Hospital), and approximately 280 HIV screening laboratories [19,20].

Fig 1. Study profile.

BHLN = Beijing HIV laboratory network. LANL = Los Alamos National Laboratory. TDR = Transmitted drug resistance. MSM = Men who have sex with men. PWID = People who inject drugs.

To obtain as many sequences as possible at low cost, we adopted a cost-effective sampling strategy that focused on the two populations in the BHLN most seriously affected by HCV infection: PWID and former paid blood donors. We acknowledge that we obtained the RNA first and designed the sampling strategy afterwards. The RNA remnants came from China’s national HIV transmitted drug resistance surveillance programme conducted by the BHLN. This programme randomly selected approximately 40% of individuals with newly diagnosed HIV infection between 1999 and 2017 from the national HIV epidemic database maintained by the BHLN. In total, we obtained 9,059 RNA samples from heterosexual individuals (2,190), MSM (6,136), PWID (539), and former paid blood donors (194) [19,20]. For most of these samples, HCV antibody test records were not available (the exceptions were four heterosexuals and two volunteer blood donors), and, most importantly, the remaining plasma was not sufficient for another HCV antibody test. We therefore devised cost-effective inclusion criteria. According to the literature, the national prevalence of HCV co-infection in people living with HIV is approximately 4.0%, 6.4%, 82.4%, and 92.9% for heterosexuals, MSM, PWID, and former paid blood donors, respectively [6]. Because HCV prevalence is far higher among PWID and former paid blood donors, focusing on these two groups yields more HCV sequences at a lower cost. We therefore included all samples from PWID and former paid blood donors in the BHLN in our analysis. The four heterosexuals and two volunteer blood donors who were HCV antibody positive were also included.
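The cost argument above can be made concrete with a little arithmetic. The sketch below multiplies the sample counts by the literature co-infection prevalences quoted in the text. Treating those prevalences as the chance that a given sample is HCV positive is an assumption made for illustration, and the function name `expected_yield` is hypothetical.

```python
# Sample counts and literature co-infection prevalences, both quoted in the text.
samples = {"heterosexual": 2190, "MSM": 6136, "PWID": 539, "former paid blood donor": 194}
prevalence = {"heterosexual": 0.040, "MSM": 0.064, "PWID": 0.824, "former paid blood donor": 0.929}

def expected_yield(groups):
    """Return (samples tested, expected HCV-positive samples, positives per sample tested)."""
    tested = sum(samples[g] for g in groups)
    positives = sum(samples[g] * prevalence[g] for g in groups)
    return tested, positives, positives / tested

# Assumption: prevalence approximates the per-sample probability of HCV positivity.
targeted = expected_yield(["PWID", "former paid blood donor"])
remaining = expected_yield(["heterosexual", "MSM"])
print(targeted)
print(remaining)
```

Under these assumptions, roughly 85% of the 733 targeted samples would be expected to be HCV positive, against roughly 6% of the 8,326 remaining samples, which is the sense in which the targeted strategy is cheaper per sequence obtained.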

The second group consisted of individuals who visited the health examination outpatient service of the Beijing CDC in 1999 and had HCV antibody-positive records. We treated these individuals as an approximation of the general population.

The third group consisted of individuals with HCV antibody-positive records from an MSM cohort. In this cohort, we conducted seven consecutive cross-sectional surveys of MSM from 2015 through 2021. The purpose of these surveys was to track trends in the prevalence of HIV, HCV and syphilis in this population [21,22].

The fourth group consisted of publicly available sequences from the Los Alamos National Laboratory (LANL) HCV sequence database. We retrieved all sequences that were sampled in China, had information on the province of origin and sampling year, and covered the same genomic regions (data available as of Dec 1, 2021). The sampling years and locations were confirmed by reference to the original literature.

Patient inclusion and data collection

We extracted baseline data on individuals from the national HIV epidemic database or LANL HCV database, including demographic and population characteristics and CD4 cell count. For geographic location, we grouped individuals into 25 provinces according to Hukou, a basic system of household registration in China. It officially identifies a person as a resident of an area and includes identifying information such as name, parents, spouse, and date of birth.

Genotypic analysis

We performed population-based sequencing of the HCV Ns5b and C/E2 regions in all specimens using in-house methods [8]. These sequences correspond to nucleotides 8,400 to 9,100 and 927 to 2,040, respectively, in the H77 genome. We inferred HCV genotype by automated genotyping in context-based modelling for expeditious typing (COMET)-HCV [23], followed by maximum likelihood (ML) phylogenetic analysis of the sequences [24]. An ML tree was used to confirm the results of COMET.

Transmission network analysis

To construct the transmission network, we followed the protocol outlined by Wertheim et al. [14–16]. We aligned HCV sequences in a pairwise fashion and then calculated Tamura-Nei 93 (TN93) distances for all sequence pairs using the HyPhy package [25]. We used the TN93 genetic distance because it can be computed rapidly and is the most complex genetic distance that can be represented by a closed-form solution [17,18]. We performed stepwise transmission network analysis using a series of genetic distance thresholds (0.005–0.045 substitutions/site, in increments of 0.0025 substitutions/site) [26]. We selected 0.01 substitutions/site for the Ns5b dataset and 0.0325 substitutions/site for the C/E2 dataset, because these thresholds identified the maximum number of clusters in the respective transmission networks (S1 Fig). The degree (connectivity) of each individual was defined as the number of links (edges in the transmission network) to other individuals. Clusters were defined as connected components of the network comprising two or more nodes. We used Cytoscape (3.8.0) to visualize the networks.
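The threshold scan described above can be sketched as follows. This is a minimal illustration, not the HyPhy or Wertheim et al. implementation: the pairwise distances are made-up toy values standing in for TN93 output, the function names are hypothetical, and breaking ties in favour of the smallest threshold is an assumption.

```python
def clusters_at(distances, n_nodes, threshold):
    """Connected components with >= 2 nodes when pairs with distance <= threshold are linked."""
    parent = list(range(n_nodes))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for (i, j), d in distances.items():
        if d <= threshold:
            parent[find(i)] = find(j)  # merge the two components

    members = {}
    for node in range(n_nodes):
        members.setdefault(find(node), []).append(node)
    return [m for m in members.values() if len(m) >= 2]

def best_threshold(distances, n_nodes, thresholds):
    """Threshold giving the most clusters; ties go to the smallest threshold (assumption)."""
    return max(thresholds, key=lambda t: (len(clusters_at(distances, n_nodes, t)), -t))

# Hypothetical distances (substitutions/site) among six sequences; unlisted pairs are distant.
toy_distances = {(0, 1): 0.004, (2, 3): 0.008, (4, 5): 0.009, (1, 2): 0.02, (3, 4): 0.03}
# 0.005 to 0.045 in steps of 0.0025, as in the text.
thresholds = [round(0.005 + 0.0025 * k, 4) for k in range(17)]
chosen = best_threshold(toy_distances, 6, thresholds)
print(chosen, clusters_at(toy_distances, 6, chosen))
```

In the toy data, a tight threshold leaves most sequences unlinked and a loose one merges everything into a single component, so the cluster count peaks in between. That peak is the criterion the text uses to pick the threshold for each genomic region.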

Phylogenetic analysis

We aligned sequences using BioEdit and manually corrected the alignment according to the encoded reading frame. If several sequences from the same patient were available in the dataset, we retained only the oldest sequence. Sequences on long branches were rechecked for genotype, and those found to be misclassified were removed. These steps help to minimize the possibility of duplicate patient sampling. We reconstructed ML phylogenetic trees from the datasets using the GTR+CAT nucleotide substitution model in FastTree 2.1 [24]. We examined the temporal signal using root-to-tip regression in TempEst v1.5.3 [27]. Sequences whose sampling year was incongruent with their genetic divergence were excluded from the Bayesian analysis. We estimated time-calibrated phylogenies from time-stamped sequence data using the Bayesian software package BEAST (version 1.10.4) [28]. We performed Bayesian evolutionary analysis only for the eight main HCV subtypes (1a, 1b, 2a, 3a, 3b, 6a, 6n, and 6xa) because their datasets contained at least 10 dated sequences. We used the HKY nucleotide substitution model with codon partitions [29] and the Bayesian SkyGrid tree prior [30] with an uncorrelated relaxed clock with a lognormal distribution [31,32].
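The idea behind the root-to-tip check can be illustrated with an ordinary least-squares fit of divergence on sampling year. This is a simplified sketch rather than TempEst itself, which also searches for the best-fitting root. The function name and the toy data are hypothetical.

```python
def root_to_tip_regression(years, divergences):
    """Least-squares fit of root-to-tip divergence (substitutions/site) on sampling year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, divergences))
    slope = sxy / sxx                   # crude clock rate, substitutions/site/year
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope      # x-intercept: year at which divergence is zero
    residuals = [y - (intercept + slope * x) for x, y in zip(years, divergences)]
    return slope, root_date, residuals

# Hypothetical clock-like data: 0.001 substitutions/site/year from a root in 1950.
years = [1995, 2000, 2005, 2010, 2015]
divergences = [0.001 * (year - 1950) for year in years]
slope, root_date, residuals = root_to_tip_regression(years, divergences)
print(slope, root_date)
```

A positive slope indicates temporal signal, and sequences with large residuals are the ones whose sampling year is incongruent with their divergence. The residual cut-off used for exclusion is not stated in the text, so none is assumed here.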

For each dataset, at least three independent Markov chain Monte Carlo (MCMC) chains were run for 50 million generations, with states sampled every 1,000 generations. Multiple chains were run to increase the effective sample size (ESS). Log files were combined using LogCombiner (v.1.10.4), with 10% of posterior samples discarded as burn-in, to ensure sufficient convergence (ESS ≥ 200). MCMC mixing was diagnosed by visual trace inspection and calculation of the ESS in Tracer (v.1.7.2) [33]. The ESS of a parameter sampled by MCMC is the number of effectively independent draws from the posterior distribution to which the Markov chain is equivalent. Maximum clade credibility trees were summarized using TreeAnnotator after discarding 10% as burn-in (S1 File, the protocol for Bayesian estimation of past population dynamics using the Skygrid coalescent model).
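The ESS definition above can be illustrated with a simple autocorrelation-based estimator, ESS = N / (1 + 2 Σ ρ_k). This is a generic textbook sketch and is not claimed to be the exact calculation Tracer performs. Truncating the sum at the first non-positive autocorrelation is an assumption, and the traces are simulated toy chains rather than BEAST output.

```python
import random

def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum of positive-lag autocorrelations)."""
    n = len(trace)
    mean = sum(trace) / n
    centred = [x - mean for x in trace]
    var = sum(c * c for c in centred) / n
    if var == 0:
        raise ValueError("constant trace: ESS is undefined")
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centred[i] * centred[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:        # assumption: stop at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

def ar1_chain(n, phi, seed):
    """Toy autocorrelated trace: x[t] = phi * x[t-1] + standard normal noise."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain

well_mixed = ar1_chain(2000, 0.0, seed=1)   # independent draws
sticky = ar1_chain(2000, 0.9, seed=1)       # strongly autocorrelated draws
print(round(effective_sample_size(well_mixed)), round(effective_sample_size(sticky)))
```

The strongly autocorrelated chain has a much smaller ESS than the independent one for the same number of samples, which is why combining several independent runs is a common way to reach a target such as ESS ≥ 200.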

Ethical issues

All analyses were performed on de-identified datasets to protect the participants’ anonymity. The research ethics committee at the Beijing CDC approved this study, and all the methods in this study were performed in accordance with approved guidelines. By law (Law of the People’s Republic of China on the Prevention and Treatment of Infectious Diseases, and Regulations on AIDS Prevention and Treatment), consent was not required, as these data were collected and analysed in the course of routine public health surveillance.

Statistical analysis

Four sampling phases were established: 1994–2003, 2004–2008, 2009–2013, and 2014–2020. The earliest (1994–2003) and most recent (2014–2020) phases encompassed more years to account for the relatively fewer data available in these years. We compared categorical data with the χ² test and continuous data with one-way ANOVA, wherever appropriate. We analysed the variables for clustering using univariable and multivariable logistic regression. Variables considered were region, HCV subtype, population characteristics, and sampling phase. We analysed all variables separately and entered those associated (P<0.1) with the outcome into the multivariable model. We present the results as odds ratios (ORs) with 95% confidence intervals (CIs). We performed all analyses using R (version 4.1.1; R Foundation, Vienna, Austria). We used listwise deletion to handle missing data.
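For readers unfamiliar with how an OR and its 95% CI are read, the sketch below computes an unadjusted OR from a 2×2 table using the standard log-odds (Woolf) interval. It is not the authors' R analysis, which used logistic regression. The counts are hypothetical and the function name is an assumption.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Unadjusted odds ratio with a Wald interval on the log scale.

    a, b: clustered / not clustered in the group of interest
    c, d: clustered / not clustered in the reference group
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 20 of 100 clustered in one group versus 40 of 100 in the reference group.
estimate, lower, upper = odds_ratio_ci(20, 80, 40, 60)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

An OR below 1 with an interval that excludes 1, as in this toy case, indicates lower odds of clustering than in the reference group. A multivariable model additionally adjusts each OR for the other variables, which this sketch does not do.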

Results

Study population

Our study population comprised four cohorts (Fig 1 and S1 Table). First, we included 756 individuals newly diagnosed with HIV from the national epidemiology database of China. The BHLN is officially authorized to participate in maintaining this database. Second, we included 50 individuals who visited the health examination outpatient service of the Beijing Center for Disease Prevention and Control (CDC) and had HCV antibody-positive records. Third, we included 19 individuals with HCV antibody-positive records from an MSM cohort that consisted of 4,200 people recruited between 2015 and 2021.

From the above three cohorts, we included 825 individuals in our analysis. Amplification and sequencing of both the Ns5b and C/E2 fragments were successful for 342 (40.6%) individuals. For an additional 126 patients, sequences were obtained for either the Ns5b fragment alone (n = 64) or the C/E2 fragment alone (n = 62). Hence, we were able to perform HCV genotyping and phylogenetic analysis for 468 (56.7%) individuals based on the availability of sequence data. The prevalence of viraemic HCV infection in PWID, former paid blood donors, and MSM was 60.7% (327 of 539), 40.7% (79 of 194), and 0.05% (2 of 4,200), respectively. The majority of the participants were men (80.1%). Han, Uyghur and Yi ethnicities accounted for 43.0%, 38.4% and 10.3%, respectively. The median age was 32 years (interquartile range [IQR] 26–39). CD4 counts were available only for individuals with HIV/HCV co-infection, and the overall median baseline CD4 count was 336 cells per μL (IQR 240–461).

Fourth, we included all Ns5b and C/E2 sequences sampled in China with known sampling provinces and sampling years available in the LANL HCV sequence database. After rigorous phylogenetic analysis, we obtained both HCV Ns5b and C/E2 sequence fragments from 322 individuals and either of the fragments from 1,021 individuals (S2 File, accession numbers).

Thus, we included 1,603 Ns5b and 865 C/E2 sequences from 1,811 individuals from 25 provinces of China in the final analysis (Fig 1 and S1 Table). The transmission risk groups were predominantly PWID (77.1%), followed by the general population (16.5%), former paid blood donors (5.7%), heterosexuals (0.3%), MSM (0.1%), and volunteer blood donors (0.1%) (Table 1).

Table 1. Baseline characteristic by sampling phase.

| Characteristic | 1994–2003 | 2004–2008 | 2009–2013 | 2014–2020 | Total |
| --- | --- | --- | --- | --- | --- |
| Sex | | | | | |
| Men | 123(85.4) | 271(80.7) | 69(78.4) | 48(68.6) | 511(80.1) |
| Women | 21(14.6) | 65(19.3) | 19(21.6) | 22(31.4) | 127(19.9) |
| Age at diagnosis (years) a | 27(23–31) | 32(26–37) | 34(28–39) | 44(33–72) | 32(26–39) |
| CD4 counts (cells per μL) b | 213(119–306) | 294(206–422) | 369(286–506) | 356(299–405) | 336(240–461) |
| Ethnicity | | | | | |
| Han | 71(75.5) | 86(36.1) | 47(53.4) | 9(12) | 213(43) |
| Uyghur | 23(24.5) | 124(52.1) | 25(28.4) | 18(24) | 190(38.4) |
| Yi | 0(0) | 22(9.2) | 16(18.2) | 13(17.3) | 51(10.3) |
| Li | 0(0) | 0(0) | 0(0) | 34(45.3) | 34(6.9) |
| Other minority | 0(0) | 6(2.5) | 0(0) | 1(1.3) | 7(1.4) |
| Population characteristic c | | | | | |
| Heterosexual | 2(1) | 0(0) | 2(0.3) | 0(0) | 4(0.3) |
| MSM | 0(0) | 0(0) | 0(0) | 2(0.7) | 2(0.1) |
| PWID | 153(77.3) | 180(75.9) | 442(67.7) | 288(99.3) | 1063(77.1) |
| Former paid blood donor | 4(2) | 57(24.1) | 18(2.8) | 0(0) | 79(5.7) |
| General population | 37(18.7) | 0(0) | 191(29.2) | 0(0) | 228(16.5) |
| Volunteer blood donor | 2(1) | 0(0) | 0(0) | 0(0) | 2(0.1) |
| Region | | | | | |
| North | 13(6) | 40(11.5) | 18(2) | 2(0.6) | 73(4) |
| Northeast | 2(0.9) | 7(2) | 2(0.2) | 0(0) | 11(0.6) |
| East | 0(0) | 5(1.4) | 367(39.8) | 0(0) | 372(20.5) |
| Central South | 2(0.9) | 37(10.7) | 281(30.4) | 43(13.3) | 363(20) |
| Southwest | 96(44.2) | 128(36.9) | 226(24.5) | 257(79.3) | 707(39) |
| Northwest | 54(24.9) | 128(36.9) | 27(2.9) | 20(6.2) | 229(12.6) |
| Unknown | 50(23) | 2(0.6) | 2(0.2) | 2(0.6) | 56(3.1) |
| Genotype and subtype | | | | | |
| 1a | 2(0.9) | 2(0.6) | 38(4.1) | 22(6.8) | 64(3.5) |
| 1b | 80(36.9) | 121(34.9) | 291(31.5) | 35(10.8) | 527(29.1) |
| 2a | 18(8.3) | 18(5.2) | 85(9.2) | 0(0) | 121(6.7) |
| 3a | 43(19.8) | 62(17.9) | 104(11.3) | 64(19.8) | 273(15.1) |
| 3b | 48(22.1) | 99(28.5) | 213(23.1) | 102(31.5) | 462(25.5) |
| 6a | 8(3.7) | 10(2.9) | 91(9.9) | 26(8) | 135(7.5) |
| 6n | 11(5.1) | 18(5.2) | 50(5.4) | 40(12.3) | 119(6.6) |
| 6xa | 7(3.2) | 12(3.5) | 36(3.9) | 13(4) | 68(3.8) |
| Other | 0(0) | 5(1.4) | 15(1.6) | 22(6.8) | 42(2.3) |

Data are n (%)

North = Beijing, Hebei, Shanxi, Inner Mongolia

Northeast = Liaoning, Heilongjiang

East = Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong

Central South = Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan

Southwest = Chongqing, Sichuan, Guizhou, Yunnan

Northwest = Shaanxi, Qinghai, Xinjiang

a Data for n = 429

b Data for n = 125

c Data for n = 1378

d Data for n = 1755

MSM = Men who have sex with men

PWID = People who inject drugs

Other = 6e, 6g, 6l, 6w, and 6v.

Phylogenetic analysis

We performed phylogenetic analysis using the merged Ns5b and C/E2 sequence datasets, which consisted of 1,603 and 865 sequences, respectively. The phylogenetic tree confirmed the genotype assignment by COMET-HCV, and the genotype determinations from the Ns5b and C/E2 fragments were consistent (S2 Fig). All isolates in our study belonged to four genotypes (1, 2, 3, and 6) and 13 subtypes (1a, 1b, 2a, 3a, 3b, 6a, 6e, 6g, 6l, 6n, 6v, 6w, and 6xa). The prevalence of genotypes 1, 2, 3, and 6 was 32.6%, 6.7%, 40.6%, and 20.1%, respectively. The most common HCV subtypes in order of decreasing frequency were 1b (29.1%), 3b (25.5%), 3a (15.1%), 6a (7.5%), 2a (6.7%), 6n (6.6%), 6xa (3.8%), and 1a (3.5%). Additional clades, including subtypes 6e, 6g, 6l, 6w, and 6v, were each present in fewer than 1.0% of individuals. HCV genotype patterns differed between population groups. In most groups, subtype 1b was the most prevalent (Tables 1 and 2). Table 1 presents the temporal trends for the eight major subtypes. There was a decreasing trend for subtype 1b and a stable trend for 3a and 3b. Table 2 illustrates the geographical distribution of HCV subtypes in China.
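The genotype prevalences quoted above follow directly from the subtype totals in Table 1. The short check below aggregates the counts by genotype. The counts are taken from the Total column of Table 1, the function name is hypothetical, and assigning the "Other" row to genotype 6 follows the table footnote (6e, 6g, 6l, 6w, and 6v).

```python
# Subtype totals from the Total column of Table 1.
subtype_counts = {
    "1a": 64, "1b": 527, "2a": 121, "3a": 273, "3b": 462,
    "6a": 135, "6n": 119, "6xa": 68,
    "other (6e, 6g, 6l, 6w, 6v)": 42,
}

def genotype_percentages(counts):
    """Sum subtype counts by genotype and express each as a percentage of all individuals."""
    total = sum(counts.values())
    by_genotype = {}
    for subtype, n in counts.items():
        # The "other" row contains only genotype 6 subtypes, per the table footnote.
        genotype = "6" if subtype.startswith("other") else subtype[0]
        by_genotype[genotype] = by_genotype.get(genotype, 0) + n
    return {g: round(100 * n / total, 1) for g, n in by_genotype.items()}

percentages = genotype_percentages(subtype_counts)
print(sum(subtype_counts.values()), percentages)
```

The counts sum to the 1,811 individuals in the final analysis and reproduce the reported 32.6%, 6.7%, 40.6%, and 20.1% for genotypes 1, 2, 3, and 6.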

Table 2. HCV genotype and subtype assignment by selected characteristics.

| Characteristic | 1a | 1b | 2a | 3a | 3b | 6a | 6n | 6xa | Other | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Sex a | | | | | | | | | | |
| Men | 7(1.4) | 162(31.7) | 17(3.3) | 106(20.7) | 140(27.4) | 11(2.2) | 26(5.1) | 29(5.7) | 13(2.5) | 511(100) |
| Women | 1(0.8) | 45(35.4) | 11(8.7) | 17(13.4) | 22(17.3) | 15(11.8) | 3(2.4) | 1(0.8) | 12(9.4) | 127(100) |
| Ethnicity b | | | | | | | | | | |
| Han | 3(1.4) | 101(47.4) | 35(16.4) | 21(9.9) | 38(17.8) | 11(5.2) | 2(0.9) | 1(0.5) | 1(0.5) | 213(100) |
| Uyghur | 2(1.1) | 89(46.8) | 0(0) | 57(30.0) | 40(21.1) | 2(1.1) | 0(0) | 0(0) | 0(0) | 190(100) |
| Yi | 1(2) | 3(5.9) | 0(0) | 14(27.5) | 11(21.6) | 0(0) | 0(0) | 22(43.1) | 0(0) | 51(100) |
| Li | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 12(35.3) | 0(0) | 0(0) | 22(64.7) | 34(100) |
| Other minority | 0(0) | 3(42.9) | 0(0) | 2(28.6) | 2(28.6) | 0(0) | 0(0) | 0(0) | 0(0) | 7(100) |
| Population characteristic c | | | | | | | | | | |
| Heterosexual | 0(0) | 3(75) | 1(25) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 4(100) |
| MSM | 0(0) | 2(100) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 0(0) | 2(100) |
| PWID | 49(4.6) | 202(19) | 6(0.6) | 241(22.7) | 339(31.9) | 58(5.5) | 96(9) | 65(6.1) | 7(0.7) | 1063(100) |
| Former paid blood donor | 0(0) | 52(65.8) | 16(20.3) | 3(3.8) | 6(7.6) | 1(1.3) | 1(1.3) | 0(0) | 0(0) | 79(100) |
| General population | 0(0) | 169(74.1) | 37(16.2) | 5(2.2) | 8(3.5) | 7(3.1) | 2(0.9) | 0(0) | 0(0) | 228(100) |
| Volunteer blood donor | 0(0) | 1(50) | 0(0) | 0(0) | 1(50) | 0(0) | 0(0) | 0(0) | 0(0) | 2(100) |
| Region d | | | | | | | | | | |
| North | 0(0) | 33(45.2) | 4(5.5) | 9(12.3) | 21(28.8) | 5(6.8) | 0(0) | 1(1.4) | 0(0) | 73(100) |
| Northeast | 0(0) | 4(36.4) | 1(9.1) | 1(9.1) | 4(36.4) | 1(9.1) | 0(0) | 0(0) | 0(0) | 11(100) |
| East | 3(0.8) | 187(50.3) | 30(8.1) | 49(13.2) | 46(12.4) | 37(9.9) | 13(3.5) | 1(0.3) | 6(1.6) | 372(100) |
| Central South | 15(4.1) | 105(28.9) | 65(17.9) | 14(3.9) | 53(14.6) | 68(18.7) | 10(2.8) | 3(0.8) | 30(8.3) | 363(100) |
| Southwest | 44(6.2) | 70(9.9) | 6(0.8) | 123(17.4) | 281(39.7) | 20(2.8) | 95(13.4) | 63(8.9) | 5(0.7) | 707(100) |
| Northwest | 2(0.9) | 98(42.8) | 1(0.4) | 74(32.3) | 51(22.3) | 2(0.9) | 1(0.4) | 0(0) | 0(0) | 229(100) |
| Co-infection e | | | | | | | | | | |
| HCV single infection | 0(0) | 25(34.2) | 13(17.8) | 0(0) | 0(0) | 13(17.8) | 0(0) | 0(0) | 22(30.1) | 73(100) |
| HCV/HIV co-infection | 34(4.2) | 219(26.9) | 22(2.7) | 177(21.7) | 227(27.9) | 51(6.3) | 43(5.3) | 37(4.5) | 5(0.6) | 815(100) |

Data are n (%)

a Data for n = 638

b Data for n = 493

c Data for n = 1378

d Data for n = 1755

e Data for n = 886

North = Beijing, Hebei, Shanxi, Inner Mongolia

Northeast = Liaoning, Heilongjiang

East = Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong

Central South = Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan

Southwest = Chongqing, Sichuan, Guizhou, Yunnan

Northwest = Shaanxi, Qinghai, Xinjiang

MSM = Men who have sex with men

PWID = People who inject drugs

Other = 6e, 6g, 6l, 6w, and 6v.

Network inference

Using the Ns5b sequences (n = 1,603), we built an HCV transmission network representing 25 provinces of China. The network contained 111 connected components with ≥2 nodes (clusters), comprising 530 nodes (individual sequences) and 2,194 edges (undirected, potential links). The average degree (number of edges per node) was 4.1. Cluster size ranged from 2 to 84 sequences (median: 3, interquartile range: 2–3) (Fig 2). In multivariable logistic analyses, being in a cluster was significantly associated with region (OR: 0.37, 95% CI: 0.19–0.71), subtype (OR: 0.23, 95% CI: 0.1–0.52), and sampling period (OR: 0.43, 95% CI: 0.27–0.68) (S2 Table).
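The summary figures for the Ns5b network can be reproduced from the reported node and edge counts. The arithmetic below is a reader's check, not part of the authors' analysis, and the variable names are hypothetical. It shows that the reported 4.1 corresponds to edges divided by clustered nodes. Under the usual graph-theoretic definition, in which every edge adds to the degree of two nodes, the mean degree of clustered nodes would be twice that value. Which definition was intended is not stated in the text.

```python
# Counts reported for the Ns5b network.
sequences = 1603          # all Ns5b sequences analysed
clustered_nodes = 530     # sequences that fall in a cluster
edges = 2194              # undirected links between clustered sequences

clustered_fraction = clustered_nodes / sequences   # share of sequences in any cluster
edges_per_node = edges / clustered_nodes           # matches the reported 4.1
mean_degree = 2 * edges / clustered_nodes          # each edge touches two nodes

print(round(100 * clustered_fraction, 1), round(edges_per_node, 1), round(mean_degree, 1))
```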

Fig 2. HCV molecular transmission network in China.

Clusters with ≥2 cases (i.e., nodes) are depicted. Links (i.e., edges) indicate a genetic distance ≤0.01 substitutions/site for Ns5b and ≤0.0325 substitutions/site for C/E2. Shape indicates population group: diamond, heterosexual; ellipse, people who inject drugs; rectangle, former blood donors; hexagon, unknown; triangle, general population; parallelogram, volunteer blood donors. Color indicates sampling region: red, North; orange, Northeast; yellow, East; green, Central South; blue, Southwest; purple, Northwest. North = Beijing, Hebei, Shanxi, Inner Mongolia; Northeast = Liaoning, Heilongjiang; East = Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong; Central South = Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan; Southwest = Chongqing, Sichuan, Guizhou, Yunnan; Northwest = Shaanxi, Qinghai, Xinjiang.

We repeated the same network inference procedure for the 865 C/E2 sequences. Although this dataset is smaller, we observed a similar pattern in the transmission network inferred from the C/E2 sequences (Fig 2 and S3 Table).

Phylodynamic analyses and inference of divergence date

We performed Bayesian SkyGrid plot (BSP) analysis for 19 datasets: 1) eight Ns5b datasets (1a, 1b, 2a, 3a, 3b, 6a, 6n, and 6xa), 2) eight C/E2 datasets (1a, 1b, 2a, 3a, 3b, 6a, 6n, and 6xa), and 3) three Ns5b+C/E2 concatenated datasets (1b, 3a, and 3b). Table 3 and S3 Fig show the Time to the Most Recent Common Ancestor (TMRCA) for the eight major HCV subtypes. Among them, subtypes 1a and 6n were the oldest, and subtype 6xa was the youngest. The TMRCA dates for subtypes 1b, 2a, 3a, and 3b were in the same range, approximately 80 years ago. The BSPs shown in Fig 3 and S4 Fig depict the estimated change in the effective number of infected individuals over time. Of the eight major subtypes, the epidemic history of 1b was one of the most complicated in our datasets: it showed an "M-shape" or "Roller Coaster" curve that consisted of two major epidemic waves. The first wave began circa 1910 and ended circa 1985, with a peak circa 1957. The rising phase of this wave coincides with the introduction of modern medicine in China (probably through the reuse and inadequate sterilization of glass and metal syringes). The declining phase coincides with two major social and political events in China: the "Great Leap Forward" (1958–1960) and the "Cultural Revolution" (1966–1976). The second wave seemed to be sparked by the increase in PWID in the mid-1980s and was enhanced by the “Encouraged Plasma Campaign” (1993–2000). This escalating trend was abruptly reversed in approximately 2000, when the Chinese government outlawed the use of paid blood donors. After that, despite a small rebound between 2005 and 2010, the 1b epidemic entered a downward trend from 2010 until the present. The other seven major subtypes have similar but relatively simple BSP curves. Of note, five major subtypes (1a, 1b, 2a, 3a, and 3b) exhibited a declining trend from 2010 until the present, whereas three subtypes (6a, 6n and 6xa) showed an increasing or stable trend.

We repeated the same phylodynamic procedure using the C/E2 sequence datasets, and the TMRCA and BSP results were roughly consistent with those of Ns5b except for subtype 1a (Fig 3 and S4 Fig). The BSPs for the concatenated datasets have narrower credible intervals but cover a shorter time scale (Fig 3).

Table 3. The TMRCA of HCV in China.

| Subtype | Ns5b: number | Ns5b: TMRCA a | C/E2: number | C/E2: TMRCA a | C/E2+Ns5b: number | C/E2+Ns5b: TMRCA a |
| --- | --- | --- | --- | --- | --- | --- |
| 1a | 58 | 1899.6(1616.1–1981.5) | 16 | 2007.4(1960–2013.1) | | |
| 1b | 458 | 1915.2(1871.8–1942) | 282 | 1920.4(1897.4–1939.3) | 128 | 1897.3(1844.9–1935.4) |
| 2a | 95 | 1977.9(1963–1987.8) | 33 | 1800.2(190–1975.9) | | |
| 3a | 216 | 1948.9(1910.3–1974) | 143 | 1959.9(1947.9–1970.3) | 75 | 1955.4(1935.9–1971) |
| 3b | 399 | 1952.2(1918.8–1976.2) | 188 | 1962.9(1949.9–1974.7) | 69 | 1975.9(1961.7–1985.8) |
| 6a | 112 | 1964.6(1938.2–1982.2) | 44 | 1972.3(1954.1–1984.5) | | |
| 6n | 80 | 1967.8(1925–1993.5) | 50 | 1932.3(1836.5–1973.7) | | |
| 6xa | 58 | 1972.9(1935.5–1990.5) | 42 | 1976(1957.6–1988.5) | | |

TMRCA = Time to the Most Recent Common Ancestor.

a Data are TMRCA (the 95% highest posterior density [HPD] interval).

Fig 3. The past population dynamics of HCV visualized using the Skygrid model.

The shaded portion is the 95% Bayesian credibility interval, and the solid line is the posterior median.

Discussion

Here, we report a large dataset on HCV molecular epidemiology in China, based on demographic and clinical data and HCV sequences from 1,811 patients from 25 provinces sampled between 1994 and 2020. These data show that the HCV epidemic in China exhibits some degree of genetic diversity [34–41], consisting of four HCV genotypes comprising 13 subtypes. Consistent with previous studies, the most prevalent HCV variant was subtype 1b, followed by 3b and 3a [6–11]. These subtypes are responsible for the majority of HCV cases globally [34–41]. Of note, five of the eight major epidemic subtypes, together accounting for 81.8% of HCV strains in our study, showed a declining tendency in effective population size during the past decade. In the HCV transmission network analysis, 33.1% of patients grouped into 111 molecularly defined HCV transmission clusters.

Nakano, et al. [12] also reconstructed the population genetic history of HCV 1b in China and found that both groups of 1b grew at a rapid exponential rate during the "Cultural Revolution" of 1966–1976. They further attribute this rapid growth to the introduction of a million nonprofessional health-care providers ("barefoot doctors"). Barefoot doctors were healthcare providers who underwent basic medical training and worked in rural villages in China. The barefoot doctors system was developed and institutionalized in 1965 and broke down in the 1980s. Barefoot doctors included farmers, folk healers, rural healthcare providers and recent middle or secondary school graduates who received minimal basic medical and paramedical education.

Contrary to Nakano’s finding, we observed a declining trend for HCV 1b in the effective population size during the "Cultural Revolution", and we suggest that attributing the increasing trend only to the introduction of "barefoot doctors" during the "Cultural Revolution" is oversimplified. Indeed, the impacts of large historical events such as the "Cultural Revolution" on the epidemic dynamics of HCV are complicated. On the one hand, the closure of medical schools and specialist hospital departments led to the introduction of "barefoot doctors" into the medical system, which may have caused an increase in HCV infection. On the other hand, nearly all professional medical staff had to stop working and were dispersed across the countryside during that period, which led to a sharp decline in the total amount of medical activity, including unsafe injections. We believe that the latter was the real determinant for the declining trend in the "Cultural Revolution" period.

Pybus et al. [42] showed that genotype 6 infections worldwide descended from a common ancestor that existed approximately 1,100 to 1,350 years ago. How stable endemic transmission of HCV genotype 6 could be maintained for such a long period has long fascinated scientists. Because the introduction and early transmission of HCV genotype 6 occurred so many years ago, we can only speculate. We suggest that traditional tattooing, which once prevailed in some ethnic minority populations of Yunnan Province, was responsible [43]. We further suggest that Yunnan is the epidemic centre of HCV genotype 6 in China, as it is for HIV [44–46]. Yunnan is located in southwestern China, bordering Myanmar, Laos, and Vietnam. Sixteen ethnic minorities inhabit the border area, many of whom used to practise traditional tattooing. Geographic proximity and close cultural ties have linked populations in Yunnan and Southeast Asian countries for many years. It is plausible that HCV genotype 6 was introduced to China from Southeast Asia and maintained through traditional tattooing until modern times, when this custom lost its popularity.

To our knowledge, this is the largest study of its type thus far and involves the longest time period. Through this informative dataset, we conducted a national HCV molecular epidemiology study with broad representativeness and accurate phylogenetic reconstruction.

This analysis also has limitations. First, since approximately two-thirds of the sequences were from publicly available databases, most of the baseline characteristics of the patients were not available, which prevented us from including these variables in the transmission cluster analysis and from making a more detailed investigation of the risk factors driving HCV epidemics in China. Second, because we used a cost-effective sampling method, participants with HIV/HCV co-infection and PWID were overrepresented in our study. Hence, the findings might not be fully representative of the whole population in China. Third, the number of recently sampled sequences was relatively small (S5 Fig). Therefore, the small rebound observed between 2005 and 2010 in our study more likely reflects sampling bias or analytical artefacts (e.g., the distribution of samples over time or a lack of chain convergence) than a real trend in the data.

Fourth, we discarded 152 (6.5%) sequences from the original dataset because of suspected quality problems; such filtering could reshape a dataset with no temporal signal into one that strongly supports phylogenetic molecular clock analysis. The results from the original, unfiltered dataset are listed in S4 Table.

In summary, this national study of 1,811 patient HCV sequences describes the most recent data on HCV genotype distribution in China. The most common HCV strain was found to be 1b, followed by 3b and 3a. Phylodynamic analysis revealed a complex scenario that was most likely driven by a combination of social, demographic, and medical factors over both recent and historical timescales. Crucially, BSP analysis showed a declining trend up to the present for 81.8% of the HCV strains in our study, which is a good omen for the goal of eliminating HCV by 2030.

Supporting information

S1 Fig. Number of transmission clusters as a function of the TN93 distance.

The threshold that was selected is highlighted in red.

(DOCX)

S2 Fig. The maximum likelihood (ML) phylogenetic tree based on Ns5b and C/E2 gene.

1,603 Ns5b and 865 C/E2 sequences from China were analyzed with HCV reference strains (NC_004102, D90208, AB047639, JN714194, JQ065709, HQ639936, DQ278894) as an out-group using FastTree 2.1.

(DOCX)

S3 Fig. The TMRCA of HCV in China.

TMRCA = Time to the Most Recent Common Ancestor. The solid line indicates the 95% highest posterior density [HPD] interval for TMRCA.

(DOCX)

S4 Fig. The past population dynamics of HCV (1a, 2a, 6a, 6n, and 6xa) visualized using the Skygrid model.

The shaded portion is the 95% Bayesian credibility interval, and the solid line is the posterior median.

(DOCX)

S5 Fig. The distribution of sampling year for HCV sequences in China.

(DOCX)

S1 Table. Descriptions of the participating cohort. TDR = Transmitted drug resistance.

BHLN = Beijing HIV laboratory network; PWID = People who inject drugs; MSM = Men who have sex with men; NA = not available; LANL = Los Alamos National Laboratory.

(DOCX)

S2 Table. Demographic and clinical factors associated with clustering based on Ns5b gene.

North = Beijing, Hebei, Shanxi, Inner Mongolia; Northeast = Liaoning, Heilongjiang; East = Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong; Central South = Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan; Southwest = Chongqing, Sichuan, Guizhou, Yunnan; Northwest = Shaanxi, Qinghai, Sinkiang; MSM = men who have sex with men; PWID = people who inject drugs; NA = not available; OR = odds ratio; a: Data are n (%); b: Univariable logistic regression analysis; c: Multivariable logistic regression analysis; d: Data for n = 1552; e: Data for n = 1175; Other = 6e, 6g, 6l, 6w, and 6v.

(DOCX)

S3 Table. Demographic and clinical factors associated with clustering based on C/E2 gene.

North = Beijing, Hebei, Shanxi, Inner Mongolia; Northeast = Liaoning, Heilongjiang; East = Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong; Central South = Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan; Southwest = Chongqing, Sichuan, Guizhou, Yunnan; Northwest = Shaanxi, Qinghai, Sinkiang; MSM = men who have sex with men; PWID = people who inject drugs; NA = not available; OR = odds ratio; a: Data are n (%); b: Univariable logistic regression analysis; c: Multivariable logistic regression analysis; d: Data for n = 849; e: Data for n = 846; Other = 6e, 6g, 6l, 6w, and 6v.

(DOCX)

S4 Table. The TMRCA of HCV in China inferred from original dataset.

TMRCA = Time to the Most Recent Common Ancestor. a: Data are TMRCA (the 95% highest posterior density [HPD] interval).

(DOCX)

S1 File. The protocol for Bayesian estimation of past population dynamics using the Skygrid coalescent model.

(DOCX)

S2 File. Accession numbers.

(DOCX)

Acknowledgments

We thank the study participants and the staff at the collaborating clinical sites and laboratories. We thank the local health workers of the BHLN, who spent numerous hours and great effort in obtaining, verifying, and cleaning the data used in this study. We thank Dr. Xiang He from Guangdong Institute of Public Health for useful comments on drafts of the manuscript.

Data Availability

The minimal anonymized data sets necessary to replicate the study findings have been uploaded to the Open Science Framework (OSF), DOI: https://doi.org/10.17605/OSF.IO/NKD8Y. HCV sequences have been submitted to GenBank (see the Supporting Information file for accession numbers).

Funding Statement

This work was supported by China Capital's Funds for Health Improvement and Research (2022-1G-3011) to Jingrong Ye, Beijing Municipal Science & Technology Commission (D161100000416002), Beijing High-Level Public Health Doctor Cultivation Project (Academic Leader-01-04) to Hongyan Lu, and the Cultivation Fund of Beijing Center for Disease Prevention and Control (2019-BJYJ-13) to Yanming Sun. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. The Polaris Observatory HCV Collaborators. Global prevalence and genotype distribution of hepatitis C virus infection in 2015: a modeling study. Lancet Gastroenterol Hepatol. 2017; 2, 161–176.
  • 2. National Health Commission of the People’s Republic of China. An overview of the epidemic situation of legal infectious diseases in China in 2020. 2021; http://www.nhc.gov.cn/jkj/s3578/202103/f1a448b7df7d4760976fea6d55834966.shtml.
  • 3. GBD 2017 Cirrhosis Collaborators. The global, regional, and national burden of cirrhosis by cause in 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. Lancet Gastroenterol Hepatol. 2020; 5, 245–266. doi: 10.1016/S2468-1253(19)30349-8
  • 4. Assembly WHOS-NWH. Draft Global Health Sector Strategies Viral Hepatitis 2016–2021, 2016. http://www.who.int/hepatitis/strategy2016-2021/Draft_global_health_sector_strategy_viral_hepatitis_13nov.pdf (accessed Jun 1, 2020).
  • 5. WHO. Combating Hepatitis B and C to Reach Elimination by 2030. Geneva: World Health Organization, 2016. https://www.who.int/hepatitis/publications/hep-elimination-by-2030-brief/en/ (accessed Jun 1, 2020).
  • 6. Shang Hong, Zhong Ping, Liu Jing, et al. High Prevalence and Genetic Diversity of HCV among HIV-1 Infected People from Various High-Risk Groups in China. PLoS ONE. 2010; 5, e10631. doi: 10.1371/journal.pone.0010631
  • 7. Fu Y, Wang Y, Xia W, et al. New trends of HCV infection in China revealed by genetic analysis of viral sequences determined from first-time volunteer blood donors. J Viral Hepat. 2011; 18, 42–52. doi: 10.1111/j.1365-2893.2010.01280.x
  • 8. Zhang Chiyu, Wu Nana, Liu Jun, et al. HCV Subtype Characterization among Injection Drug Users: Implication for a Crucial Role of Zhenjiang in HCV Transmission in China. PLoS ONE. 2011; 6, e16817. doi: 10.1371/journal.pone.0016817
  • 9. Tian Di, Li Lin, Liu Yongjian, Li Hanping, Xu Xiaoyuan, Li Jingyun. Different HCV Genotype Distributions of HIV-Infected Individuals in Henan and Guangxi, China. PLoS ONE. 2012; 7, e50343. doi: 10.1371/journal.pone.0050343
  • 10. Lu Ling, Wang Min, Xia Wenjie, et al. Migration patterns of hepatitis C virus in China characterized for five major subtypes based on samples from 411 volunteer blood donors from 17 provinces and municipalities. J Virol. 2014; 88, 7120–9. doi: 10.1128/JVI.00414-14
  • 11. Zhou S, Cella E, Zhou W, et al. Population dynamics of hepatitis C virus subtypes in injecting drug users on methadone maintenance treatment in China associated with economic and health reform. J Viral Hepat. 2017; 24, 551–560. doi: 10.1111/jvh.12677
  • 12. Nakano Tatsunori, Lu Ling, He Yunshao, Fu Yongshui, Robertson Betty H, Pybus Oliver G. Population genetic history of hepatitis C virus 1b infection in China. J Gen Virol. 2006; 87, 73–82. doi: 10.1099/vir.0.81360-0
  • 13. Lu Ling, Tong Wangxia, Gu Lin, et al. The Current Hepatitis C Virus Prevalence in China May Have Resulted Mainly from an Officially Encouraged Plasma Campaign in the 1990s: a Coalescence Inference with Genetic Sequences. J Virol. 2013; 87, 12041–50. doi: 10.1128/JVI.01773-13
  • 14. Wertheim Joel O, Leigh Brown Andrew J, Hepler N Lance, et al. The Global Transmission Network of HIV-1. J Infect Dis. 2014; 209, 1642–1652.
  • 15. Jacka Brendan, Applegate Tanya, Krajden Mel, et al. Phylogenetic Clustering of Hepatitis C Virus Among People Who Inject Drugs in Vancouver, Canada. Hepatology. 2014; 60, 1571–1580. doi: 10.1002/hep.27310
  • 16. Bartlet S R, Wertheim J O, Bull R A, et al. A molecular transmission network of recent hepatitis C infection in people with and without HIV: Implications for targeted treatment strategies. J Viral Hepat. 2017; 24, 404–411. doi: 10.1111/jvh.12652
  • 17. Hassan AS, Pybus OG, Sanders EJ, et al. Defining HIV-1 transmission clusters based on sequence data. AIDS. 2017; 31, 1211–1222. doi: 10.1097/QAD.0000000000001470
  • 18. Poon Art F. Y. Impacts and shortcomings of genetic clustering methods for infectious disease outbreaks. Virus Evol. 2016; 2, vew031. doi: 10.1093/ve/vew031
  • 19. Ye Jingrong, Hao Mingqiang, Xing Hui, et al. Transmitted HIV drug resistance among individuals with newly diagnosed HIV infection: a multicenter observational study. AIDS. 2020; 34, 609–619. doi: 10.1097/QAD.0000000000002468
  • 20. Ye Jingrong, Hao Mingqiang, Xing Hui, et al. Characterization of subtypes and transmitted drug resistance strains of HIV among Beijing residents between 2001-2016. PLoS ONE. 2020; 26, e0230779. doi: 10.1371/journal.pone.0230779
  • 21. Ma Xiaoyan, Zhang Qiyun, He Xiong, et al. Trends in prevalence of HIV, syphilis, hepatitis C, hepatitis B, and sexual risk behavior among men who have sex with men. The results of 3 consecutive respondent-driven sampling surveys in Beijing, 2004 through 2006. J Acquir Immune Defic Syndr. 2007; 45, 581–7.
  • 22. Chen Qiang, Sun Yanming, Sun Weidong, et al. Trends of HIV incidence and prevalence among men who have sex with men in Beijing, China: Nine consecutive cross-sectional surveys, 2008–2016. PLoS ONE. 2018; 13, e0201953. doi: 10.1371/journal.pone.0201953
  • 23. Struck D, Lawyer G, Ternes AM, Schmit JC, Bercoff DP. COMET: adaptive context-based modeling for ultrafast HIV-1 subtype identification. Nucleic Acids Res. 2014; 42, e144. doi: 10.1093/nar/gku739
  • 24. Price MN, Dehal PS, Arkin AP. FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010; 5, e9490. doi: 10.1371/journal.pone.0009490
  • 25. Kosakovsky Pond S.L., Frost S. D. W. and Muse S.V. HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005; 21, 676–679. doi: 10.1093/bioinformatics/bti079
  • 26. Wertheim Joel O., Kosakovsky Pond Sergei L., Forgione Lisa A., et al. Social and Genetic Networks of HIV-1 Transmission in New York City. PLoS Pathog. 2017; 13, e1006000. doi: 10.1371/journal.ppat.1006000
  • 27. Rambaut Andrew, Lam Tommy T, Carvalho Luiz Max, Pybus Oliver G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016; 2, vew007.
  • 28. Suchard Marc A, Lemey Philippe, Baele Guy, Ayres Daniel L, Drummond Alexei J, Rambaut Andrew. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018; 4, vey016. doi: 10.1093/ve/vey016
  • 29. Shapiro Beth, Rambaut Andrew, Drummond Alexei J. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol. 2006; 23, 7–9. doi: 10.1093/molbev/msj021
  • 30. Hill Verity, Baele Guy. Bayesian estimation of past population dynamics in BEAST 1.10 using the SkyGrid coalescent model. Mol Biol Evol. 2019; 36, 2620–2628. doi: 10.1093/molbev/msz172
  • 31. Drummond Alexei J, Ho Simon Y W, Phillips Matthew J, Rambaut Andrew. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006; 4, e88. doi: 10.1371/journal.pbio.0040088
  • 32. Gill Mandev S, Lemey Philippe, Faria Nuno R, Rambaut Andrew, Shapiro Beth, Suchard Marc A. Improving Bayesian population dynamics inference: a coalescent based model for multiple loci. Mol Biol Evol. 2013; 30, 713–724. doi: 10.1093/molbev/mss265
  • 33. Rambaut Andrew, Drummond Alexei J, Xie Dong, Baele Guy, Suchard Marc A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018; 67, 901–904.
  • 34. Le Ngoc Chau, Thanh Thanh Tran Thi, Lan Phuong Tran Thi, et al. Differential prevalence and geographic distribution of hepatitis C virus genotypes in acute and chronic hepatitis C patients in Vietnam. PLoS ONE. 2019; 14, e0212734. doi: 10.1371/journal.pone.0212734
  • 35. Bouacida Lobna, Suin Vanessa, Hutse Veronik, et al. Distribution of HCV genotypes in Belgium from 2008 to 2015. PLoS ONE. 2018; 13, e0207584. doi: 10.1371/journal.pone.0207584
  • 36. Palladino Claudia, Ezeonwumelu Ifeanyi Jude, Marcelino Rute, et al. Epidemic history of hepatitis C virus genotypes and subtypes in Portugal. Sci Rep. 2018; 16, 12266. doi: 10.1038/s41598-018-30528-0
  • 37. Shier Medhat K, Iles James C, El-Wetidy Mohammad S, Ali Hebatallah H, Al Qattan Mohammad M. Molecular characterization and epidemic history of hepatitis C virus using core sequences of isolates from Central Province, Saudi Arabia. PLoS ONE. 2017; 12, e0184163. doi: 10.1371/journal.pone.0184163
  • 38. Nouhin Janin, Iwamoto Momoko, Prak Sophearot, et al. Molecular epidemiology of hepatitis C virus in Cambodia during 2016–2017. Sci Rep. 2019; 9, 7314. doi: 10.1038/s41598-019-43785-4
  • 39. Petruzziello Arnolfo, Sabatino Rocco, Loquercio Giovanna, et al. Nine year distribution pattern of hepatitis C virus (HCV) genotypes in Southern Italy. PLoS ONE. 2019; 14, e0212033. doi: 10.1371/journal.pone.0212033
  • 40. McNaughton Anna L, Cameron Iain Dugald, Wignall-Fleming Elizabeth B, et al. Spatiotemporal reconstruction of the introduction of hepatitis C virus into Scotland and its subsequent regional transmission. J Virol. 2015; 89, 11223–11232.
  • 41. Gower Erin, Estes Chris, Blach Sarah, Razavi-Shearer Kathryn, Razavi Homie. Global epidemiology and genotype distribution of the hepatitis C virus infection. J Hepatol. 2014; 61, S45–S57. doi: 10.1016/j.jhep.2014.07.027
  • 42. Pybus Oliver G., Barnes Eleanor, Taggart Rachel, et al. Genetic History of Hepatitis C Virus in East Asia. J Virol. 2009; 83, 1071–82. doi: 10.1128/JVI.01501-08
  • 43. Liu Jun. Survey on the status of the traditional tattoo and its static protection in Yunnan minorities. Proceedings of the 2011 annual meeting of the professional committee of ethnic museum of China association of museums, Xining Qinghai, 2011.8.1, 329–344.
  • 44. Meng Zhefeng, Xin Ruolei, Zhong Ping, et al. A new migration map of HIV-1 CRF07_BC in China: analysis of sequences from 12 provinces over a decade. PLoS ONE. 2012; 7, e52373. doi: 10.1371/journal.pone.0052373
  • 45. Feng Yi, He Xiang, Hsi Jenny H, et al. The rapidly expanding CRF01_AE epidemic in China is driven by multiple lineages of HIV-1 viruses introduced in the 1990s. AIDS. 2013; 27, 1793–802. doi: 10.1097/QAD.0b013e328360db2d
  • 46. Li Zhe, He Xiang, Wang Zhe, et al. Tracing the origin and history of HIV-1 subtype B’ epidemic by near full-length genome analyses. AIDS. 2012; 26, 877–84. doi: 10.1097/QAD.0b013e328351430d

Decision Letter 0

Jason T Blackard

27 Feb 2023

PONE-D-22-30314

Distribution pattern, phylodynamic and molecular transmission networks of hepatitis C virus in China

PLOS ONE

Dear Dr. Lu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 13 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jason T. Blackard, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information.

Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”).

For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research

3. We note that Figure (2) in your submission contains copyrighted images. All PLOS content is published under the Creative Commons Attribution License (CC BY 4.0), which means that the manuscript, images, and Supporting Information files will be freely available online, and any third party is permitted to access, download, copy, distribute, and use these materials in any way, even commercially, with proper attribution. For more information, see our copyright guidelines: http://journals.plos.org/plosone/s/licenses-and-copyright.

We require you to either (1) present written permission from the copyright holder to publish these figures specifically under the CC BY 4.0 license, or (2) remove the figures from your submission:

1. You may seek permission from the original copyright holder of Figure (2) to publish the content specifically under the CC BY 4.0 license. 

We recommend that you contact the original copyright holder with the Content Permission Form (http://journals.plos.org/plosone/s/file?id=7c09/content-permission-form.pdf) and the following text:

“I request permission for the open-access journal PLOS ONE to publish XXX under the Creative Commons Attribution License (CCAL) CC BY 4.0 (http://creativecommons.org/licenses/by/4.0/). Please be aware that this license allows unrestricted use and distribution, even commercially, by third parties. Please reply and provide explicit written permission to publish XXX under a CC BY license and complete the attached form.”

Please upload the completed Content Permission Form or other proof of granted permissions as an "Other" file with your submission. 

In the figure caption of the copyrighted figure, please include the following text: “Reprinted from [ref] under a CC BY license, with permission from [name of publisher], original copyright [original copyright year].”

2. If you are unable to obtain permission from the original copyright holder to publish these figures under the CC BY 4.0 license or if the copyright holder’s requirements are incompatible with the CC BY 4.0 license, please either i) remove the figure or ii) supply a replacement figure that complies with the CC BY 4.0 license. Please check copyright information on all replacement figures and update the figure caption with source information. If applicable, please specify in the figure caption text when a figure is similar but not identical to the original image and is therefore for illustrative purposes only.

Additional Editor Comments (if provided):

This is a large cross-sectional study of HCV transmission networks in 4 surveillance populations in China.

Overall, the methods and results are well described.

In addition to addressing the comments raised by the two reviewers, the revisions below would strengthen the manuscript further.

  • The authors mention duplicate sequences and repeated sampling of the same individuals but never state how often these two situations actually occurred here.

  • Table 3 can be simplified by rounding to the nearest whole year.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This paper reports a fairly conventional comparative study of HCV sequences that were collected in China. Like many previous studies, it uses BEAST to reconstruct changes in coalescence rates (i.e., effective number of infections) over time with time-stamped sequences. They also use pairwise genetic distances to cluster sequences into connected components and then evaluate statistical associations of this outcome with individual-level variables such as transmission risk. The authors distinguish their study from previous work by the number of sequences, and the addition of clinical sequences with metadata such as mode of transmission. Overall the manuscript is quite well written.
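The clustering step summarized above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the names `p_distance` and `cluster` and the toy sequences are hypothetical, and a plain proportion-of-differences distance stands in for the TN93 distance used in the manuscript. Sequences whose pairwise distance falls at or below a threshold are linked, and the connected components of the resulting graph are reported as clusters.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    This is a stand-in for TN93, which additionally corrects for unequal
    base frequencies and transition/transversion rates.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def cluster(seqs, threshold):
    """Link sequences with distance <= threshold and return the
    connected components that contain at least two sequences."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) >= 2]


# Hypothetical toy alignment; real input would be aligned NS5B or C/E2 sequences.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACGTACGTTT",
    "s4": "TTTTGGGGCC",
}

# Sweeping the threshold shows how sensitive cluster membership is to it.
for t in (0.05, 0.10, 0.20):
    print(t, [sorted(c) for c in cluster(toy, t)])
```

Repeating the sweep over a finer grid of thresholds gives the kind of curve shown in S1 Fig (number of clusters as a function of distance).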

First, I have some concerns about how the BEAST analyses were carried out. It is generally considered good practice to assess whether there is adequate evidence of a molecular clock in your data by root-to-tip regression (e.g., TempEst) before attempting an analysis in BEAST. This may impact some of the HCV subtypes with smaller numbers of sequences.
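As an illustration of the root-to-tip check, the sketch below fits an ordinary least-squares line of root-to-tip divergence against sampling date. It assumes the distances have already been extracted from a rooted tree; the function name `root_to_tip_regression` and the numbers are hypothetical, and dedicated tools also choose the root position, which this sketch does not attempt. A positive slope estimates the substitution rate, the x-intercept estimates the root date, and R² indicates how clock-like the data are.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, r_squared, root_date). root_date is the
    x-intercept and is None when the slope is not positive, that is,
    when there is no usable temporal signal.
    """
    n = len(dates)
    if n < 3 or n != len(distances):
        raise ValueError("need at least three (date, distance) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    root_date = -intercept / slope if slope > 0 else None
    return slope, intercept, r_squared, root_date


# Hypothetical example: sampling years and root-to-tip distances
# (substitutions per site) for four tips.
years = [2000, 2005, 2010, 2015]
divergence = [0.010, 0.015, 0.020, 0.025]
slope, intercept, r2, root = root_to_tip_regression(years, divergence)
print(f"rate={slope:.4g} subs/site/year, R^2={r2:.3f}, root date={root:.1f}")
```

A subtype whose fit gives a slope near zero or a very low R² would be a candidate for exclusion from the clock analysis, or at least for cautious interpretation.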

The results would also be more convincing if the authors had run replicate chains for each dataset to demonstrate convergence. Furthermore, wouldn't it be possible to analyze both NS5B and C/E2 regions jointly? There is limited recombination in HCV, so combining the sequences should lend more statistical power to estimate TMRCA etc. The sequences can be concatenated, adding stretches of N's where one region is not covered in a given sample. On the other hand, adding the LANL data severely reduces the proportion of samples for which both regions are covered.
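The concatenation suggested above can be sketched as follows, assuming each region has already been aligned to a fixed length and is keyed by sample identifier. The function name `concatenate_regions`, the sample identifiers, and the toy sequences are hypothetical. Samples missing one region receive a run of N's of that region's alignment length, so every concatenated sequence has the same length.

```python
def concatenate_regions(ns5b, ce2, ns5b_len, ce2_len):
    """Join two per-sample alignments into one, padding absent regions with N.

    ns5b and ce2 map sample IDs to aligned sequences of length ns5b_len
    and ce2_len respectively. The result covers the union of sample IDs.
    """
    joined = {}
    for sample in sorted(set(ns5b) | set(ce2)):
        left = ns5b.get(sample, "N" * ns5b_len)
        right = ce2.get(sample, "N" * ce2_len)
        if len(left) != ns5b_len or len(right) != ce2_len:
            raise ValueError(f"unexpected alignment length for {sample}")
        joined[sample] = left + right
    return joined


# Hypothetical toy data: p1 has both regions, p2 and p3 have only one each.
ns5b = {"p1": "ACGTAC", "p2": "ACGTAT"}
ce2 = {"p1": "GGCC", "p3": "GGCT"}
combined = concatenate_regions(ns5b, ce2, ns5b_len=6, ce2_len=4)
for sample, seq in combined.items():
    print(sample, seq)

# Fraction of samples covered by both regions, relevant to the caveat
# about the LANL data reducing joint coverage.
both = len(set(ns5b) & set(ce2)) / len(combined)
print(f"covered by both regions: {both:.0%}")
```

Reporting the joint-coverage fraction alongside the concatenated alignment makes it easy to judge whether the combined analysis gains enough power to be worthwhile.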

The authors should include a figure summarizing the distribution of sample collection dates over time as a histogram or density plot - this could even be incorporated into the Bayesian skyline plots. In addition, the skyline plots should be moved into the main document, ideally displayed with the same time (horizontal) axis. Moreover, the authors should consider combining skylines into a single plot.

Please report the configuration of BEAST analyses (e.g., prior hyperparameters) to a degree that would enable others to reproduce the analysis. Please report specific criteria used to assess convergence, such as effective sample size.

Specific comments:

- page 3, line 56 abstract and elsewhere, "65.2% of HCV strains" - it is not clear what the authors mean by strains. Are they referring to the different HCV subtypes / genotypes analyzed independently in BEAST?

- page 4, lines 87-88: "However, the outcomes of these theoretical studies have been limited by a relatively narrow span of sampling time." What makes these past studies theoretical? Nakano et al. (2006) and Lu et al. (2013) report phylogenetic analyses of HCV sequence data, so they seem just as empirical as the present study.

- page 5, line 94, please clarify what you mean by "epidemiologic connection"

- page 5, lines 95-96: "Over the last decade, a simplified genetic distance-based method has increasingly been used to define HIV transmission networks within a population." The authors should indicate which method they are talking about (there are several), and cite the relevant literature.

- page 5, line 105: "[...] using our unique dataset." Every dataset is unique. It would be more appropriate to state "[...] using a substantially more comprehensive dataset and metadata than previous work." or something along those lines.

- page 6, line 111: please define BHLN, CDC and PLA at first use.

- page 6, line 114: do the authors mean "reference laboratory" when they write "confirmatory laboratory"? I have not seen this term used as a noun in the literature - I can only find the phrase "confirmatory laboratory testing".

- page 6, line 119 and elsewhere: perhaps use "cost-effective" rather than "economical"?

- page 6, lines 123-124: please cite a reference

- page 6, line 130: "Accepting the reality" - this is awkward phrasing

- page 6, line 131: "we devised unique economic [...]" - is it really necessary to assert that these inclusion criteria are unique?

- page 7, it would be helpful to refer to Figure 1 (data collection flowchart) somewhere here, and/or Table S1

- lines 169-170: presumably building the ML tree was to confirm COMET geno/subtyping - please clarify

- lines 173-174: how many duplicate sequences? It is not necessary to discard these for a molecular clock analysis (i.e., BEAST), and in fact removing duplicate sequences can bias the analysis.

- line 177: what do you mean by "miscued"? Do you mean "misclassified"?

- lines 188-189: the authors report generating maximum clade credibility trees, but these do not appear anywhere in the manuscript or supplementary materials.

- lines 195: what is the rationale for using these TN93 distance thresholds? How sensitive are your results to varying these thresholds?

- lines 219-222: minor point - it is a bit unusual to report both confidence intervals and P-values, where the CI is presently the recommended method for assessing statistical significance.

- lines 261-262: please display the phylogenetic tree as a supplementary figure, preferably with tips coloured by COMET results

- line 272: "Table 2 presents the temporal trends for these eight major subtypes." Table 2 does not appear to contain any temporal information - I think the authors meant to refer to Table 1?

- Please consider using a set of choropleths (e.g., https://www.esri.com/arcgis-blog/wp-content/uploads/2020/02/redmap.png) to summarize the distribution of HCV subtypes in China (one choropleth per subtype) instead of pie charts. I think they would be much easier to interpret.

- Table 3 might be more effectively presented as a series of histograms or density plots (e.g., ridgeplots)

- line 287: regarding "modern medicine", it is most likely the reuse and inadequate sterilization of glass and metal syringes, is it not?

- lines 293-294: the "small rebound between 2005 and 2010" is more likely due to sampling biases (e.g., distribution of samples in time, lack of convergence in chain) than a real trend in the data.

- lines 298-300: it is not sufficient to claim that the BEAST analyses of C/E2 sequences "were consistent with that of NS5B" without showing any data or reporting any quantitative results. It would not be difficult, for example, to plot the skylines for both genome regions together for a given subtype.

- lines 310-312: Please summarize odds ratios and CIs from univariate analyses here. The reader should not have to dig into the supplementary materials for this information.

- lines 314-316: "Although the available dataset is relatively smaller, we observed a similar pattern in the transmission network inferred using C/E2 sequences." This is really inadequate. If you are not going to show these results (e.g., sizes and composition of largest components, network graph), then you need to justify this conclusion with quantitative results, i.e., a statistical comparison of the two networks.

- lines 321-322: "These data show that the HCV epidemic in China exhibits great genetic diversity." This claim needs to be justified. How much more diverse are the HCV sequences (with respect to number of different genotypes and subtypes) in China compared to other regions? Ideally you should adjust for differences in sample sizes.

- line 331 and line 349: It is unconventional to give the full name of the first authors when referencing previous work. Usually one would just write "Nakano et al.", for example.

- lines 334-344: This section really needs supporting references to the peer-reviewed literature.

- line 347: typo, should be "Asia".

- line 575: "All code is shared openly for review." Where? Please provide a URL.

- lines 576-577: "HCV sequences have been submitted to Genbank." Are the accession numbers available? Please provide them.

- The legend for Figure 1 is very incomplete, as is Figure 2.

- lines 609-610: the highest posterior density (Bayesian) is not equivalent to a confidence interval (frequentist).

- Table 3, why are some tMRCA estimates not available ("NA")?

- Figure 3 axis label, "Cultural Revolution", not "Culture Revolution"

- Supplementary methods, text on Hukou system is not referenced in main text.

Reviewer #2: This is an interesting article characterizing HCV phylogenetic analysis in China. Overall, the manuscript would benefit from more background/detail as outlined below.

Line 247: The authors note the median baseline CD4 count was 336. Was this only among people living with HIV? This should be clarified.

Line 253: How did the authors arrive at 1024 and 1811? It doesn't seem like this number is possible given the samples available based on the text in the manuscript. It becomes evident when looking at figure 1. Better characterization of the number of samples obtained from the LANL database would be helpful.

Figure 1: How are the authors able to define the "effective sample size"? It would be helpful to define this, explicitly in the methods.

Line 334-338: The authors should unpack terms like barefoot doctors and the importance of the Cultural Revolution on HCV spread more for those who aren't familiar with this literature. A discussion of these factors and what is known about them thus far in the literature might fit nicely in the introduction.

Line 353: How did the authors arrive at 200 years? Again, better description in the method section of how these estimates are made would be beneficial.

Line 359: What is the link between HIV, HCV, and Yunnan? It seems plausible that HCV could be spread by traditional tattooing, but this seems unlikely for HIV. Is HIV thought to have originated in China in this area? Was this conclusion arrived at through phylogenetic analysis? Again, more thorough explanation would be useful.

Minor points:

- Please use person first language – e.g. people who inject drugs rather than IDU

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Art Poon

Reviewer #2: No

**********

PLoS One. 2023 Dec 21;18(12):e0296053. doi: 10.1371/journal.pone.0296053.r002

Author response to Decision Letter 0


26 Jun 2023

Dear Editor and Reviewers:

We are very grateful for the opportunity to revise our manuscript, and we appreciate your positive and constructive comments and suggestions. We have studied the reviewers' comments carefully and have tried our best to revise the manuscript accordingly.

Our point-by-point responses to the reviewers' questions and suggestions, and the corresponding revisions, follow.

Thank you again to the editor and reviewers for their hard work.

Reviewer #1: This paper reports a fairly conventional comparative study of HCV sequences that were collected in China. Like many previous studies, it uses BEAST to reconstruct changes in coalescence rates (i.e., effective number of infections) over time with time-stamped sequences. They also use pairwise genetic distances to cluster sequences into connected components and then evaluate statistical associations of this outcome with individual-level variables such as transmission risk. The authors distinguish their study from previous work by the number of sequences, and the addition of clinical sequences with metadata such as mode of transmission. Overall the manuscript is quite well written.

Dear Professor Poon:

Response: Thank you for your positive comments on our manuscript.

First, I have some concerns about how the BEAST analyses were carried out. It is generally considered good practice to assess whether there is adequate evidence of a molecular clock in your data by root-to-tip regression (e.g., TempEST) before attempting an analysis in BEAST. This may impact some of the HCV subtypes with smaller numbers of sequences.

Response: Thank you for this suggestion. We assessed all sequences using TempEst and found 146 sequences with quality problems. After excluding these 146 sequences, we reanalyzed all the data and obtained completely different results. We regret not having known about TempEst earlier.
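The root-to-tip check discussed above can be illustrated with a minimal sketch. It assumes the root-to-tip divergences have already been extracted from a rooted tree; the sampling years, divergences, and the outlier cutoff `k` are invented for illustration and are not the authors' data or TempEst's own criteria.

```python
from statistics import median

def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2

def flag_outliers(xs, ys, slope, intercept, k=3.0):
    """Indices of tips whose absolute residual exceeds k times the median
    absolute residual (a hypothetical cutoff chosen for this sketch)."""
    residuals = [abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)]
    cutoff = k * median(residuals)
    return [i for i, r in enumerate(residuals) if r > cutoff]

# Invented example: sampling year and root-to-tip divergence (subs/site) per tip.
years = [1995, 2000, 2005, 2010, 2015, 2020, 2012]
divergence = [0.045, 0.051, 0.054, 0.061, 0.064, 0.071, 0.120]

# First pass: fit all tips and flag sequences far from the regression line.
slope, intercept, _ = fit_line(years, divergence)
outliers = flag_outliers(years, divergence, slope, intercept)

# Second pass: refit without the flagged tips.
kept = [i for i in range(len(years)) if i not in outliers]
rate, icpt, r2 = fit_line([years[i] for i in kept], [divergence[i] for i in kept])
root_year = -icpt / rate  # x-intercept: the date at which divergence is zero

print(outliers, round(rate, 5), round(root_year, 1), round(r2, 3))
```

A positive slope with a reasonable fit suggests temporal signal; the slope approximates the evolutionary rate and the x-intercept approximates the date of the root. Tips far from the line are candidates for inspection rather than automatic exclusion.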

The results would also be more convincing if the authors had run replicate chains for each dataset to demonstrate convergence. Furthermore, wouldn't it be possible to analyze both NS5B and C/E2 regions jointly? There is limited recombination in HCV, so combining the sequences should lend more statistical power to estimate TMRCA etc. The sequences can be concatenated, adding stretches of N's where one region is not covered in a given sample. On the other hand, adding the LANL data severely reduces the proportion of samples for which both regions are covered.

Response: We ran replicate chains for each dataset and combined the results using LogCombiner. The ESS values improved considerably. The Skygrid results for the concatenated datasets have narrower confidence limits but span a shorter time scale. As expected, few patients from LANL have sequences covering both the Ns5b and C/E2 regions. Furthermore, even when both Ns5b and C/E2 sequences were available for one patient, we could not be sure that they came from the same patient. Therefore, we constructed concatenated datasets only from the Ns5b and C/E2 sequences generated in our study. We obtained three concatenated datasets, for subtypes 1b, 3a, and 3b, but not for the other subtypes (1a, 2a, 6a, 6n, and 6xa), because the number of sequences was too small.

The authors should include a figure summarizing the distribution of sample collection dates over time as a histogram or density plot - this could even be incorporated into the Bayesian skyline plots. In addition, the skyline plots should be moved into the main document, ideally displayed with the same time (horizontal) axis. Moreover, the authors should consider combining skylines into a single plot.

Response: As you suggested, we included a histogram that summarizes the distribution of sampling years. However, we did not incorporate the histogram into the Bayesian Skygrid plots because the plots were already too busy; the histogram is provided in the supplementary materials. The Skygrid plots were moved into the main document and are displayed with the same time axis.

Please report the configuration of BEAST analyses (e.g., prior hyperparameters) to a degree that would enable others to reproduce the analysis. Please report specific criteria used to assess convergence, such as effective sample size.

Response: We provided an SOP for the BEAST analysis in the supplementary methods section, which should enable anyone to reproduce our analysis. We also provided the criteria for assessing convergence in the methods section.

Specific comments:

- page 3, line 56 abstract and elsewhere, "65.2% of HCV strains" - it is not clear what the authors mean by strains. Are they referring to the different HCV subtypes / genotypes analyzed independently in BEAST?

Response: 65.2% was updated to 81.8%. In this study, we reconstructed the past population dynamics of eight HCV subtypes. The Skygrid results showed that five of them (1a, 1b, 2a, 3a, and 3b), which constituted 81.8% (1447 of 1769) of the HCV strains genotyped, have a declining trend.

- page 4, lines 87-88: "However, the outcomes of these theoretical studies have been limited by a relatively narrow span of sampling time." What makes these past studies theoretical? Nakano et al. (2006) and Lu et al. (2013) report phylogenetic analyses of HCV sequence data, so they seem just as empirical as the present study.

Response: The studies by Nakano et al. and Lu et al. are classics, and I learned a lot from them. However, even the more recent of the two was conducted ten years ago. Phylodynamic methods have progressed since then, and the molecular epidemiological data on HCV in China need updating. We provide a table (Table 1) summarizing the characteristics of these two studies and ours.

Table 1. Characteristics of the studies by Nakano et al., Lu et al., and ours.

Nakano et al. Lu et al. Ours

Gene E1 and Ns5b E1 and Ns5b Ns5b and C/E2

Sampling region 9 cities# 1 city (Guangzhou) 25 provinces*

Sampling period 2002 2009-2011 1994-2020

Number of sequences

E1 89 417

Ns5b 92 423 406

C/E2 397

Number of reference sequences

E1 72

Ns5b 61 1197

C/E2 468

Subtype 1b 1b, 2a, 3a, 3b, 6a 1a, 1b, 2a, 3a, 3b, 6a, 6n, and 6xa

Model Bayesian skyline plots in BEAST 1.2. Bayesian skyline plots in BEAST 1.6.1. Bayesian Skygrid in BEAST 1.10.4.

#Shenyang, Beijing, Hohhot, Shanghai, Zhengzhou, Guangzhou, Shenzhen, Foshan, Kunming.

*North=Beijing, Hebei, Shanxi, Inner Mongolia; Northeast=Liaoning, Heilongjiang; East=Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong; Central South=Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan; Southwest=Chongqing, Sichuan, Guizhou, Yunnan; Northwest=Shaanxi, Qinghai, Xinjiang.

- page 5, line 94, please clarify what you mean by "epidemiologic connection"

Response: In this article, “epidemiologic connection” means “epidemiologically related”. For example, two individuals with similar viruses are likely to have a direct transmission relationship or to have been infected from a common source. A cluster of individuals with genetically similar infections may represent an outbreak linked through a succession of recent transmission events.

- page 5, lines 95-96: "Over the last decade, a simplified genetic distance-based method has increasingly been used to define HIV transmission networks within a population." The authors should indicate which method they are talking about (there are several), and cite the relevant literature.

Response: To date, as far as I know, there is still no clear consensus on how transmission clusters should be defined. Over the last two decades, many clustering methods have been developed to define HIV transmission networks within a population. Broadly speaking, these methods fall into two categories: methods that cluster directly on sequence variation via pairwise genetic distance measures, and methods that interpret this variation in the context of subtrees in a phylogeny. Phylogenetic analysis can carry a high computational burden, especially for large sequence datasets, whereas genetic distances can be computed rapidly. Recent network analyses have therefore favoured the generally faster and parameter-rich distance-based methods. We also chose the pairwise genetic distance method.

- page 5, line 105: "[...] using our unique dataset." Every dataset is unique. It would be more appropriate to state "[...] using a substantially more comprehensive dataset and metadata than previous work." or something along those lines.

Response: We agree. We removed the word “unique” from the manuscript.

- page 6, line 111: please define BHLN, CDC and PLA at first use.

Response: We now give the full names of these three abbreviations where they first appear in the manuscript.

- page 6, line 114: do the authors mean "reference laboratory" when they write "confirmatory laboratory"? I have not seen this term used as a noun in the literature - I can only find the phrase "confirmatory laboratory testing".

Response: Yes, confirmatory laboratories are reference laboratories.

- page 6, line 119 and elsewhere: perhaps use "cost-effective" rather than "economical"?

Response: We now use “cost-effective” in the revision.

- page 6, lines 123-124: please cite a reference

Response: We cited the reference.

- page 6, line 130: "Accepting the reality" - this is awkward phrasing

Response: We removed this phrase.

- page 6, line 131: "we devised unique economic [...]" - is it really necessary to assert that these inclusion criteria are unique?

Response: We removed the word “unique” in the new revision.

- page 7, it would be helpful to refer to Figure 1 (data collection flowchart) somewhere here, and/or Table S1

Response: We now refer to Fig 1 and Table S1 here.

- lines 169-170: presumably building the ML tree was to confirm COMET geno/subtyping - please clarify

Response: Yes, we built the ML tree to confirm the result from COMET.

- lines 173-174: how many duplicate sequences? It is not necessary to discard these for a molecular clock analysis (i.e., BEAST), and in fact removing duplicate sequences can bias the analysis.

Response: As the datasets were not too large, we realized that there is no need to discard duplicate sequences. In the new revision, we restored the sequences that were discarded, as you suggested.

- line 177: what do you mean by "miscued"? Do you mean "misclassified"?

Response: Yes, “miscued” was meant to be “misclassified”. We corrected it.

- lines 188-189: the authors report generating maximum clade credibility trees, but these do not appear anywhere in the manuscript or supplementary materials.

Response: In the new revision, we provided the maximum clade credibility (MCC) trees in the supplementary materials (Fig S2).

- lines 195: what is the rationale for using these TN93 distance thresholds? How sensitive are your results to varying these thresholds?

Response: First, the TN93 genetic distance was used because it can be computed rapidly and is the most complex genetic distance that can be represented by a closed-form solution. Second, it is easy to grasp. Third, it is very popular: as far as I know, more than 80% of the articles on HIV and HCV molecular transmission networks published during the past decade used the TN93 model. A sensitivity analysis showed that the conclusions of the transmission network analysis did not change under different TN93 thresholds.
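To make the threshold question concrete, the following is a minimal sketch of distance-threshold clustering and of a threshold sensitivity check. It assumes the pairwise TN93 distances have already been computed by another tool; the sequence IDs, distances, and thresholds are hypothetical and are not the values used in the manuscript.

```python
def cluster_at_threshold(ids, distances, threshold):
    """Link sequences whose pairwise distance is <= threshold and return the
    connected components with at least two members (largest first)."""
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    return sorted(clusters, key=lambda g: (-len(g), g))

# Hypothetical pairwise distances (substitutions/site) between five sequences.
ids = ["s1", "s2", "s3", "s4", "s5"]
distances = {
    ("s1", "s2"): 0.008, ("s1", "s3"): 0.020, ("s2", "s3"): 0.014,
    ("s1", "s4"): 0.060, ("s2", "s4"): 0.055, ("s3", "s4"): 0.050,
    ("s4", "s5"): 0.012, ("s1", "s5"): 0.070, ("s2", "s5"): 0.065,
    ("s3", "s5"): 0.058,
}

# Sensitivity check: how does the clustered fraction change with the threshold?
for threshold in (0.010, 0.015, 0.030):  # hypothetical thresholds
    clusters = cluster_at_threshold(ids, distances, threshold)
    clustered = sum(len(c) for c in clusters)
    print(threshold, clusters, f"{clustered}/{len(ids)} clustered")
```

Repeating the downstream analysis (for example, the regression on clustering status) at each threshold and comparing the estimates is one way to report how sensitive the conclusions are to this choice.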

- lines 219-222: minor point - it is a bit unusual to report both confidence intervals and P-values, where CI is presently the recommended method for assessing statistical significance.

Response: We removed the P-values in the new revision.

- lines 261-262: please display the phylogenetic tree as a supplementary figure, preferably with tips coloured by COMET results.

Response: We provided the phylogenetic trees as a supplementary figure (Figure S2).

- line 272: "Table 2 presents the temporal trends for these eight major subtypes." Table 2 does not appear to contain any temporal information - I think the authors meant to refer to Table 1?

Response: Yes, it should refer to Table 1. We corrected it.

- Please consider using a set of choropleths (e.g., https://www.esri.com/arcgis-blog/wp-content/uploads/2020/02/redmap.png) to summarize the distribution of HCV subtypes in China (one choropleth per subtype) instead of pie charts. I think they would be much easier to interpret.

Response: Including Figure 2 in our manuscript caused us some trouble.

The editor thought that Figure 2 might contain copyrighted images and required us either 1) to present written permission from the copyright holder to publish the figure specifically under the CC BY 4.0 license, or 2) to remove the figure from the submission. We chose to remove it. We had included this map to characterize the geographical distribution of HCV subtypes in China. Under the present circumstances, we think it is better to present this information in a table.

I have long heard about ArcGIS, but I have not had the opportunity to use it because I cannot afford it.

- Table 3 might be more effectively presented as a series of histograms or density plots (e.g., ridgeplots)

Response: We presented Table 3 as a series of histograms, and we also kept the table in the manuscript.

- line 287: regarding "modern medicine", it is most likely the reuse and inadequate sterilization of glass and metal syringes, is it not?

Response: Yes, we agree. We added this point to the manuscript.

- lines 293-294: the "small rebound between 2005 and 2010" is more likely due to sampling biases (e.g., distribution of samples in time, lack of convergence in chain) than a real trend in the data.

Response: We agree. We included this point in the limitations section.

- lines 298-300: it is not sufficient to claim that the BEAST analyses of C/E2 sequences "were consistent with that of NS5B" without showing any data or reporting any quantitative results. It would not be difficult, for example, to plot the skylines for both genome regions together for a given subtype.

Response: We have plotted the Skygrid results for both genome regions for the eight main subtypes (1a, 1b, 2a, 3a, 3b, 6a, 6n, and 6xa). We also plotted the Skygrid results for the concatenated regions for 1b, 3a, and 3b, but not for the other five subtypes (1a, 2a, 6a, 6n, and 6xa), because the number of sequences was too small.

- lines 310-312: Please summarize odds ratios and CIs from univariate analyses here. The reader should not have to dig into the supplementary materials for this information.

Response: We summarized OR and CIs here.

- lines 314-316: "Although the available dataset is relatively smaller, we observed a similar pattern in the transmission network inferred using C/E2 sequences." This is really inadequate. If you are not going to show these results (e.g., sizes and composition of largest components, network graph), then you need to justify this conclusion with quantitative results, i.e., a statistical comparison of the two networks.

Response: We did the transmission network analysis in parallel using Ns5b and C/E2 and showed these results together in the new revision.

- lines 321-322: "These data show that the HCV epidemic in China exhibits great genetic diversity." This claim needs to be justified. How much more diverse are the HCV sequences (with respect to number of different genotypes and subtypes) in China compared to other regions? Ideally you should adjust for differences in sample sizes.

Response: We adjusted this claim. As far as I know, most studies of this kind include about 500 sequences. We therefore analyzed more than three times as many sequences as other studies.

- line 331 and line 349: It is unconventional to give the full name of the first authors when referencing previous work. Usually one would just write "Nakano et al.", for example.

Response: We made the adjustment.

- lines 334-344: This section really needs supporting references to the peer-reviewed literature.

Response: The peer-reviewed literature on the relationship between HCV epidemiology and the “Cultural Revolution” or the “Great Leap Forward” is limited.

We searched PubMed for HCV together with “Cultural Revolution”, “Great Leap Forward”, or “Encouraged Plasma Campaign” and found only two articles. We cited these two articles as supporting references.

- line 347: typo, should be "Asia".

Response: We corrected it.

- line 575: "All code is shared openly for review." Where? Please provide a URL.

Response: We provided a URL in the second revision (DOI 10.17605/OSF.IO/NKD8Y).

- lines 576-577: "HCV sequences have been submitted to Genbank." Are the accession numbers available? Please provide them.

Response: We provided the accession numbers of the sequences from the LANL database in the supplementary materials.

- The legend for Figure 1 is very incomplete, as is Figure 2.

Response: I apologize for my carelessness. In the new revision, we have tried our best to make the legends as complete as possible.

- lines 609-610: the highest posterior density (Bayesian) is not equivalent to a confidence interval (frequentist).

Response: Thank you for your suggestion. We corrected it.

- Table 3, why are some tMRCA estimates not available ("NA")?

Response: We provided the missing tMRCA in the new revision.

- Figure 3 axis label, "Cultural Revolution", not "Culture Revolution"

Response: Thank you for your suggestion. We corrected it.

- Supplementary methods, text on Hukou system is not referenced in main text.

Response: Thank you for your suggestion. As the Hukou system is not mentioned in the discussion section, we have removed it from the supplementary methods.

Reviewer #2: This is an interesting article characterizing HCV phylogenetic analysis in China. Overall, the manuscript would benefit from more background/detail as outlined below.

Dear Professor:

Response: Thank you for your positive comments on our manuscript.

Line 247: The authors note the median baseline CD4 count was 336. Was this only among people living with HIV? This should be clarified.

Response: Yes, the CD4 count was only available for individuals with HIV/HCV co-infection.

We clarified this in the manuscript. This study is a secondary product of a multi-center HIV molecular epidemiology study in China.

Line 253: How did the authors arrive at 1024 and 1811? It doesn't seem like this number is possible given the samples available based on the text in the manuscript. It becomes evident when looking at figure 1. Better characterization of the number of samples obtained from the LANL database would be helpful.

Response: I realize that the numbers of sequences in the text were confusing, and the flowchart also failed to give clear information. We included 1,197 Ns5b and 468 C/E2 sequences from 1,343 individuals from the LANL database; Ns5b is over-represented. Of these individuals, 322 have both Ns5b and C/E2 sequences and 1,021 have only one of the two fragments (875 with Ns5b only and 146 with C/E2 only). We ourselves provided 406 Ns5b and 397 C/E2 sequences from 468 individuals for this analysis; 335 individuals have both Ns5b and C/E2 sequences, and 133 have only one of the two fragments. Together we obtained 1,603 Ns5b and 865 C/E2 sequences from 1,811 individuals. In the pooled dataset, 657 individuals have both regions and 1,154 have only one.

In the new revision, we excluded 3 problematic C/E2 sequences from LANL.

Therefore, the number 1,024 became 1,021, while 1,811 remained unchanged.

There are many numbers in the manuscript. We have tried to present them clearly and concisely.
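The counts in this response can be cross-checked by inclusion-exclusion. The sketch below uses only the figures stated above and assumes, as the reported totals imply, at most one sequence per region per individual.

```python
# Counts as reported in the response, by data source.
lanl = {"ns5b": 1197, "ce2": 468, "individuals": 1343, "both": 322}
ours = {"ns5b": 406, "ce2": 397, "individuals": 468, "both": 335}

def single_region_counts(src):
    """Individuals with only Ns5b and only C/E2 (inclusion-exclusion),
    assuming one sequence per region per individual."""
    return src["ns5b"] - src["both"], src["ce2"] - src["both"]

lanl_ns5b_only, lanl_ce2_only = single_region_counts(lanl)
ours_ns5b_only, ours_ce2_only = single_region_counts(ours)

# Pooled totals across both sources.
pooled_ns5b = lanl["ns5b"] + ours["ns5b"]
pooled_ce2 = lanl["ce2"] + ours["ce2"]
pooled_individuals = lanl["individuals"] + ours["individuals"]
pooled_both = lanl["both"] + ours["both"]
pooled_single = pooled_individuals - pooled_both

print(lanl_ns5b_only, lanl_ce2_only, ours_ns5b_only + ours_ce2_only)
print(pooled_ns5b, pooled_ce2, pooled_individuals, pooled_both, pooled_single)
```

Under that assumption every reported figure is internally consistent, including the 1,021 LANL individuals with a single fragment and the pooled total of 1,811 individuals.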

Figure 1: How are the authors able to define the "effective sample size"? It would be helpful to define this, explicitly in the methods.

Response: The effective sample size (ESS) of a parameter sampled by MCMC (as in BEAST) is the number of effectively independent draws from the posterior distribution to which the Markov chain is equivalent. We defined the ESS in the methods section.
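As an illustration of this definition, the sketch below estimates the ESS of a trace as N / (1 + 2 Σ ρ_k), truncating the sum at the first non-positive autocorrelation. This is one common textbook estimator and is an assumption of this example; it is not necessarily the estimator implemented in Tracer or BEAST, and the simulated traces are invented.

```python
import random

def effective_sample_size(samples):
    """Estimate ESS = N / (1 + 2 * sum of positive-lag autocorrelations),
    stopping at the first non-positive autocorrelation."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:  # truncate once the correlation has decayed to noise
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)

random.seed(1)
n = 2000

# Independent draws: ESS should be close to the number of samples.
iid_trace = [random.gauss(0, 1) for _ in range(n)]

# Strongly autocorrelated AR(1) trace, mimicking a slowly mixing chain.
ar_trace = [0.0]
for _ in range(n - 1):
    ar_trace.append(0.9 * ar_trace[-1] + random.gauss(0, 1))

ess_iid = effective_sample_size(iid_trace)
ess_ar = effective_sample_size(ar_trace)
print(round(ess_iid), round(ess_ar))
```

The autocorrelated trace yields a far smaller ESS than the independent one despite having the same length, which is why chain length alone is not a convergence criterion.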

Line 334-338: The authors should unpack terms like barefoot doctors and the importance of the Cultural Revolution on HCV spread more for those who aren't familiar with this literature. A discussion of these factors and what is known about them thus far in the literature might fit nicely in the introduction.

Response: Barefoot doctors were healthcare providers who underwent basic medical training and worked in rural villages in China. The barefoot doctor system was developed and institutionalized in 1965 and broke down in the 1980s. Barefoot doctors included farmers, folk healers, rural healthcare providers, and recent middle or secondary school graduates who received minimal basic medical and paramedical education. The name comes from farmers in southern China, who often worked barefoot in the rice paddies while also serving as medical practitioners. Major social and political events may deeply influence the transmission of infectious disease. The Cultural Revolution was the largest social and political event in China during the past century, and it damaged China’s healthcare system. During the revolution, nearly all professional medical staff had to stop working and were dispersed across the countryside.

Line 353: How did the authors arrive at 200 years? Again, better description in the method section of how these estimates are made would be beneficial.

Response: I am sorry for giving the impression that we time-travelled 200 years into the past. Following the suggestion of Professor Poon, we assessed the quality of the sequences using TempEst before running the analysis in BEAST. We excluded 146 problematic sequences and repeated the BEAST analysis. This time, most tMRCA estimates fell within the past 100 years, except for subtype 1a.

Our study was inspired by two classic articles in HCV molecular epidemiology. The first is “Genetic history of hepatitis C virus in East Asia” by Pybus et al., published in J Virol (2009;83:1071-82).

Pybus et al. revealed a >1,000-year-long development of genotype 6 in Asia, characterized by substantial phylogeographic structure and two distinct phases of epidemic history, before and during the 20th century.

The second is “Colonial history and contemporary transmission shape the genetic diversity of hepatitis C virus genotype 2 in Amsterdam” by Markov et al., published in J Virol (2012;86:7677-87).

Markov et al. detected multiple HCV-2 movements from present-day Ghana/Benin to the Caribbean during the peak years of the slave trade (1700 to 1850) and extensive transfer of HCV-2 among the Netherlands and its former colonies Indonesia and Surinam over the last 150 years. The latter coincides with the bidirectional migration of Javanese workers between Indonesia and Surinam and subsequent immigration to the Netherlands.

Therefore, it is not surprising that the tMRCA of some HCV subtypes (1a) dates back 200 years.

These HCV sequences contain information about the rate of sequence evolution and consequently such data sets can be used to directly infer molecular phylogenies on a natural time-scale of months, years, or millennia.

In the past two decades, Bayesian methods have become so popular that using them to reconstruct the epidemic history of RNA viruses such as HIV, HCV, Ebola, and Zika is now commonplace. Some high-quality studies of this kind have been published in Science and Nature.

In the revision, we included more description of, and references for, this method in our manuscript.

Line 359: What is the link between HIV, HCV, and Yunnan? It seems plausible that HCV could be spread by traditional tattooing, but this seems unlikely for HIV. Is HIV thought to have originated in China in this area? Was this conclusion arrived at through phylogenetic analysis? Again, more thorough explanation would be useful.

Response: HIV subtype B and CRF07_BC are thought to have originated in Yunnan province, and yes, this conclusion was reached through phylogenetic and molecular clock analyses. HCV and HIV share routes of transmission, and many people with HIV are co-infected with HCV, especially people who inject drugs and former paid blood donors. The epidemic histories of the two viruses are therefore intertwined.

The origin and evolutionary history of the three main HIV subtypes (B, CRF01_AE, and CRF07_BC), which are responsible for approximately 85% of infections in China, have been well characterized.

Li et al. showed that the subtype B epidemics among former blood donors and heterosexuals in inland China most likely originated from a single founding subtype B strain that had been circulating among people who inject drugs in Yunnan province. Yunnan province played a pivotal role in bridging the preexisting subtype B epidemics in south-east Asia with the subsequent epidemic among former paid blood donors and heterosexuals in inland China. (AIDS 2012, 26:877-84.)

Meng et al. demonstrated that CRF07_BC originated in 1993 among people who inject drugs in Yunnan province and then spread to Guangxi (eastern neighbor to Yunnan) in 1994, to Xinjiang (northwest) in 1995, and to Sichuan (northern neighbor to Yunnan) in 1996. (PLoS ONE 7(12): e52373.)

Feng et al. identified seven distinct phylogenetic clusters of CRF01_AE in China. Molecular clock analysis indicated that all CRF01_AE clusters were introduced from Southeast Asia in the 1990s, coinciding with the peak of Thailand’s HIV epidemic and the initiation of China’s free overseas travel policy for its citizens, which started with Thailand as the first destination country. (AIDS 2013, 27:1793-1802.)

We included the above three references to support the view that Yunnan was the early epicenter of HIV in China.

Minor points:

- Please use person first language – e.g. people who inject drugs rather than IDU

Response: We used person-first language in the new revision.

We sincerely appreciate the hard work of the editor and reviewers. We acknowledge that it was difficult to incorporate every comment, and we hope that the revision is acceptable. Once again, thank you very much for your comments and suggestions.

Attachment

Submitted filename: HCVPloscomment3.pdf

Decision Letter 1

Jason T Blackard

2 Oct 2023

PONE-D-22-30314R1

Distribution pattern, molecular transmission networks, and phylodynamic of hepatitis C virus in China

PLOS ONE

Dear Dr. Lu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please address the concerns raised by Reviewer #1 in the revised manuscript prior to its acceptance.

Please submit your revised manuscript by Nov 16 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Jason T. Blackard, PhD

Academic Editor

PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments:

Please address the concerns raised by Reviewer #1 in the revised manuscript prior to its acceptance.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: (No Response)

Reviewer #2: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: (No Response)

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors were diligent in making revisions to their manuscript in response to reviews.

* "The sequences whose sampling year is incongruent with genetic divergence were excluded for Bayesian analysis."

I apologize for not making myself clearer in my previous review - my intention was to recommend that the authors should use TempEst to evaluate each data set for evidence of a molecular clock, and then to *discard the entire alignment for a given HCV subtype* if it did not meet this criterion. Individually discarding a substantial number of sequences based on a clock-based criterion is dangerous, because one can reshape a data set that has no temporal signal into one that strongly supports a clock. Rather, I think this treatment is associated with identifying sequences with incorrect dates, or from external evidence of systematic and substantial sequencing errors, for example.

As a compromise:

1. Please provide your exact quantitative criteria for filtering sequences at this step.

2. I am somewhat concerned that the results were quite sensitive to filtering sequences, i.e., estimates of TMRCA. The most transparent thing to do here would be to present both sets of results: (1) the original results without filtering sequences, and (2) results having filtered sequences under clearly reported criteria. The original results can be moved to Supplementary Materials to minimize revisions to the main text.
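The check the reviewer describes, a regression of root-to-tip distance on sampling date for a whole subtype alignment, can be sketched as follows. This is a minimal illustration in Python, not the TempEst implementation and not the authors' procedure. It assumes the tree is already rooted and that root-to-tip distances have been exported; the function name `root_to_tip_regression` and all example values are hypothetical. A positive slope with a reasonable fit is usually read as evidence of temporal signal: the slope estimates the substitution rate and the x-intercept estimates the date of the root.

```python
from statistics import mean


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    dates: sampling dates in decimal years, one per sequence.
    distances: root-to-tip genetic distances (substitutions/site).
    """
    if len(dates) != len(distances) or len(dates) < 3:
        raise ValueError("need at least three (date, distance) pairs")

    mx, my = mean(dates), mean(distances)
    sxx = sum((x - mx) ** 2 for x in dates)
    syy = sum((y - my) ** 2 for y in distances)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical")

    slope = sxy / sxx                  # estimated rate (subs/site/year)
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    # Residual of each tip from the fitted line; large values mark
    # candidate outliers that deserve individual inspection.
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, distances)]
    # The x-intercept is only interpretable as a root date if slope > 0.
    tmrca = -intercept / slope if slope > 0 else None
    return {"rate": slope, "r_squared": r_squared,
            "tmrca": tmrca, "residuals": residuals}


# Hypothetical example values, for illustration only.
example_dates = [1998.5, 2003.0, 2007.2, 2011.8, 2015.4]
example_distances = [0.041, 0.047, 0.049, 0.056, 0.058]
fit = root_to_tip_regression(example_dates, example_distances)
print(fit["rate"], fit["r_squared"], fit["tmrca"])
```

The residuals are returned so that outliers can be examined one at a time for dating or quality problems, which is different from removing sequences until the regression looks clock-like.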

* "For each dataset, three independent Markov chain Monte Carlo (MCMC) chains were run for 100 million generations with states sampled every 10,000 generations. Log files were combined using Logcombiner to ensure sufficient convergence [...]"

More miscommunication here. The comment in my previous review was: "The results would also be more convincing if the authors had run replicate chains for each dataset to demonstrate convergence." I should have clarified that the replicate chain samples should be compared for evidence of convergence, *e.g.* overlapping posterior traces, before combining samples. It does seem that the authors did something like this, however, given the following line: "The convergence of MCMC chains was checked using Tracer (version 1.7.2)" If this is correct, then please provide the specific criteria you used to determine convergence - even visual assessment is okay. (On reflection, I suppose that one might demonstrate convergence if combining chains increases ESS.)
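For readers unfamiliar with the effective sample size (ESS) mentioned here, the following Python sketch shows one simple way to estimate it from a single MCMC trace. It is an illustration under stated assumptions, not Tracer's algorithm: the estimator divides the number of samples by an autocorrelation time that is summed until the first non-positive autocorrelation, and the function name `effective_sample_size` is hypothetical.

```python
def effective_sample_size(samples):
    """Rough ESS estimate: n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is
    not positive, a simple and commonly used stopping rule.
    """
    n = len(samples)
    if n < 4:
        raise ValueError("need at least four samples")

    m = sum(samples) / n
    centered = [x - m for x in samples]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)  # a constant trace has no autocorrelation to estimate

    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        acf = sum(centered[i] * centered[i + lag]
                  for i in range(n - lag)) / (n * var)
        if acf <= 0:
            break
        tau += 2.0 * acf
    return n / tau


# Hypothetical traces: a slowly drifting trace has a much lower ESS
# than a rapidly alternating one of the same length.
drifting = [float(i) for i in range(100)]
alternating = [1.0, -1.0] * 50
print(effective_sample_size(drifting), effective_sample_size(alternating))
```

Under this sketch, pooling the post-burn-in samples of replicate chains raises the estimated ESS only if the chains are sampling the same distribution, which is why the comparison between chains comes before the combination.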

* the Open Science Framework URL does not work as written. Please use https://doi.org/10.17605/OSF.IO/NKD8Y

AP

Reviewer #2: The authors' revisions are appreciated and I feel they have sufficiently addressed my questions and comments.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Art Poon

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Dec 21;18(12):e0296053. doi: 10.1371/journal.pone.0296053.r004

Author response to Decision Letter 1


11 Nov 2023

PONE-D-22-30314R1

Distribution pattern, molecular transmission networks, and phylodynamic of hepatitis C virus in China

Dear Editor and Reviewers:

We are very grateful for the opportunity to revise our manuscript, and we appreciate your positive and constructive comments and suggestions. We have studied Dr. Poon's comments carefully and have tried our best to revise the manuscript accordingly. Our point-by-point responses to his questions and suggestions follow.

Thank you again for the hard work of the editor and reviewers.

Dear Dr. Lu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please address the concerns raised by Reviewer #1 in the revised manuscript prior to its acceptance.

Reviewer #1: The authors were diligent in making revisions to their manuscript in response to reviews.

Response: Thank you for your positive comments. I have learned a great deal from them.

* "The sequences whose sampling year is incongruent with genetic divergence were excluded for Bayesian analysis."

I apologize for not making myself clearer in my previous review - my intention was to recommend that the authors should use TempEst to evaluate each data set for evidence of a molecular clock, and then to *discard the entire alignment for a given HCV subtype* if it did not meet this criterion. Individually discarding a substantial number of sequences based on a clock-based criterion is dangerous, because one can reshape a data set that has no temporal signal into one that strongly supports a clock. Rather, I think this treatment is associated with identifying sequences with incorrect dates, or from external evidence of systematic and substantial sequencing errors, for example.

As a compromise:

1. Please provide your exact quantitative criteria for filtering sequences at this step.

Response:

I agree with your concern about discarding a substantial number of sequences based on a clock-based criterion.

Our study is inevitably subject to sampling bias.

We acknowledge this in the limitations section.

In HIV molecular epidemiology studies, researchers routinely need to discard some sequences, either to reduce the computational load or to avoid potential bias that may arise from over-sampling a particular location.

In discarding sequences, we followed the article by Andrew Rambaut et al. entitled “Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen),” published in Virus Evol 2016;2(1):vew007.

The quantitative criterion for filtering sequences was their position in the root-to-tip plot.

Outliers were carefully investigated and discarded if they were suspected of having quality problems.

In total, we discarded 152 (6.5%) of the 2,342 sequences in the original datasets.

The problems identified in these outlier sequences were 1) low sequencing quality, 2) errors in sequence assembly, and 3) alignment errors in part of the sequence.

Most of the 152 discarded sequences (88.2%) had been retrieved from GenBank.

The TMRCA estimates from the original datasets had wider intervals and lower ESS values, suggesting that including problematic sequences could bias the inference.

2. I am somewhat concerned that the results were quite sensitive to filtering sequences, i.e., estimates of TMRCA. The most transparent thing to do here would be to present both sets of results: (1) the original results without filtering sequences, and (2) results having filtered sequences under clearly reported criteria. The original results can be moved to Supplementary Materials to minimize revisions to the main text.

Response:

We provided the original results without filtering sequences in the Supplementary Materials.

* "For each dataset, three independent Markov chain Monte Carlo (MCMC) chains were run for 100 million generations with states sampled every 10,000 generations. Log files were combined using Logcombiner to ensure sufficient convergence [...]"

More miscommunication here. The comment in my previous review was: "The results would also be more convincing if the authors had run replicate chains for each dataset to demonstrate convergence." I should have clarified that the replicate chain samples should be compared for evidence of convergence, *e.g.* overlapping posterior traces, before combining samples. It does seem that the authors did something like this, however, given the following line: "The convergence of MCMC chains was checked using Tracer (version 1.7.2)" If this is correct, then please provide the specific criteria you used to determine convergence - even visual assessment is okay. (On reflection, I suppose that one might demonstrate convergence if combining chains increases ESS.)

Response:

The biggest difficulty we faced was the large number of datasets in our analysis (8 NS5B + 8 C/E2 + 3 combined).

Moreover, to maintain the integrity of the original data, we did not down-sample sequences.

Therefore, the computational load was large.

We ran multiple MCMC chains and combined the log files to increase the ESS.

Fortunately, only six of the 19 datasets had ESS values below 200 in Bayesian inference.

We only had to run multiple chains for these six datasets.

Before combining these results, we compared the chains for evidence of convergence, e.g., statistics such as the joint, prior, and likelihood.

We are sorry that we did not fully understand what is meant by “overlapping posterior traces.”

We consulted at least nine classic papers on Bayesian evolutionary analysis (listed below), and none of them provided a direct answer.

Would you kindly recommend some publications concerning “overlapping posterior traces”?

Your kindness and help are greatly appreciated.

1. Nuno R Faria, et al. HIV epidemiology. The early spread and epidemic ignition of HIV-1 in human populations. Science 2014;346(6205):56-61.

2. Claudia Palladino, et al. Epidemic history of hepatitis C virus genotypes and subtypes in Portugal. Sci Rep 2018;8(1):12266.

3. Medhat K Shier, et al. Molecular characterization and epidemic history of hepatitis C virus using core sequences of isolates from Central Province, Saudi Arabia. PLoS One 2017;12(9):e0184163.

4. Oliver G. Pybus, Eleanor Barnes, Rachel Taggart, et al. Genetic History of Hepatitis C Virus in East Asia. J Virol 2009;83:1071-82.

5. Anna L McNaughton, Iain Dugald Cameron, Elizabeth B Wignall-Fleming, et al. Spatiotemporal reconstruction of the introduction of hepatitis C virus into Scotland and its subsequent regional transmission. J Virol 2015;89:11223–11232.

6. K Hoshino, et al. Phylogenetic and phylodynamic analyses of hepatitis C virus subtype 1a in Okinawa, Japan. J Viral Hepat 2018;25(8):976-985.

7. S Zhou, E Cella, W Zhou, et al. Population dynamics of hepatitis C virus subtypes in injecting drug users on methadone maintenance treatment in China associated with economic and health reform. J Viral Hepat 2017; 551-560.

8. Anna L McNaughton, Iain Dugald Cameron, Elizabeth B Wignall-Fleming, et al. Spatiotemporal reconstruction of the introduction of hepatitis C virus into Scotland and its subsequent regional transmission. J Virol 2015;89:11223–11232.

9. Peter V Markov, et al. Colonial history and contemporary transmission shape the genetic diversity of hepatitis C virus genotype 2 in Amsterdam. J Virol 2012;86(14):7677-87.

* the Open Science Framework URL does not work as written. Please use https://doi.org/10.17605/OSF.IO/NKD8Y.

Response: We have used this URL in the revised manuscript.

We acknowledge that we may not have fully addressed all of your comments.

We hope that these responses are acceptable.

Thank you very much.

Yours sincerely,

Hongyan Lu

Attachment

Submitted filename: HCV plos second comment.docx

Decision Letter 2

Jason T Blackard

6 Dec 2023

Distribution pattern, molecular transmission networks, and phylodynamic of hepatitis C virus in China

PONE-D-22-30314R2

Dear Dr. Lu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Jason T. Blackard, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

None

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I am pleased to recommend the revised manuscript for acceptance.

Regarding the authors' question about "overlapping posterior traces", this assessment is often visual (opening the replicate log files in Tracer, selecting all posterior traces in a combined plot and visually inspecting the resulting combined plots to determine whether the traces overlap). A more reproducible approach would be to use a convergence diagnostic like the Gelman-Rubin convergence diagnostic, which is available in the R package coda. There is no need to change anything in your manuscript, I am just trying to answer your question.
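The Gelman-Rubin diagnostic the reviewer mentions can be sketched in a few lines. The Python below computes the basic potential scale reduction factor for one parameter from several equal-length chains; it is an illustrative sketch, not the coda implementation, which may apply additional corrections so that values need not match exactly. The function name `gelman_rubin` and the example chains are hypothetical. Values close to 1 are conventionally taken to indicate that the replicate chains agree.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post-burn-in samples,
    one list per independent MCMC run.
    """
    m = len(chains)
    if m < 2:
        raise ValueError("need at least two chains")
    n = len(chains[0])
    if n < 2 or any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length of at least two")

    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])  # W: mean within-chain variance
    between_over_n = variance(chain_means)        # B/n: variance of chain means
    if within == 0:
        raise ValueError("within-chain variance is zero")

    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * within + between_over_n
    return sqrt(var_hat / within)


# Hypothetical chains: two that agree, and two that sit in different regions.
agreeing = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
disagreeing = [[0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0]]
print(gelman_rubin(agreeing), gelman_rubin(disagreeing))
```

In practice the function would be applied to each logged parameter (for example the posterior, likelihood, and TMRCA columns) after removing burn-in from every replicate log file.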

AP

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Art Poon

**********

Acceptance letter

Jason T Blackard

12 Dec 2023

PONE-D-22-30314R2

PLOS ONE

Dear Dr. Lu,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Jason T. Blackard

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. Number of transmission clusters as a function of the TN93 distance.

    The threshold that was selected is highlighted in red.

    (DOCX)

    S2 Fig. The maximum likelihood (ML) phylogenetic tree based on Ns5b and C/E2 gene.

    1,603 Ns5b and 865 C/E2 sequences from China were analyzed with HCV reference strains (NC_004102, D90208, AB047639, JN714194, JQ065709, HQ639936, DQ278894) as an out-group using Fasttree 2.1.

    (DOCX)

    S3 Fig. The TMRCA of HCV in China.

    TMRCA = Time to the Most Recent Common Ancestor. The solid line indicates the 95% highest posterior density [HPD] interval for TMRCA.

    (DOCX)

    S4 Fig. The past population dynamics of HCV (1a, 2a, 6a, 6n, and 6xa) visualized using the Skygrid model.

    The shaded portion is the 95% Bayesian credibility interval, and the solid line is the posterior median.

    (DOCX)

    S5 Fig. The distribution of sampling year for HCV sequences in China.

    (DOCX)

    S1 Table. Descriptions of the participating cohort.

    TDR = Transmitted drug resistance; BHLN = Beijing HIV laboratory network; PWID = People who inject drugs; MSM = Men who have sex with men; NA = not available; LANL = Los Alamos National Laboratory.

    (DOCX)

    S2 Table. Demographic and clinical factors associated with clustering based on Ns5b gene.

    North = Beijing, Hebei, Shanxi, Inner Mongolia; Northeast = Liaoning, Heilongjiang; East = Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong; Central South = Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan; Southwest = Chongqing, Sichuan, Guizhou, Yunnan; Northwest = Shaanxi, Qinghai, Sinkiang; MSM = men who have sex with men; PWID = people who inject drugs; NA = not available; OR = odds ratio; Other = 6e, 6g, 6l, 6w, and 6v. a: data are n (%); b: univariable logistic regression analysis; c: multivariable logistic regression analysis; d: data for n = 1552; e: data for n = 1175.

    (DOCX)

    S3 Table. Demographic and clinical factors associated with clustering based on C/E2 gene.

    North = Beijing, Hebei, Shanxi, Inner Mongolia; Northeast = Liaoning, Heilongjiang; East = Shanghai, Jiangsu, Zhejiang, Anhui, Jiangxi, Shandong; Central South = Henan, Hubei, Hunan, Guangdong, Guangxi, Hainan; Southwest = Chongqing, Sichuan, Guizhou, Yunnan; Northwest = Shaanxi, Qinghai, Sinkiang; MSM = men who have sex with men; PWID = people who inject drugs; NA = not available; OR = odds ratio; Other = 6e, 6g, 6l, 6w, and 6v. a: data are n (%); b: univariable logistic regression analysis; c: multivariable logistic regression analysis; d: data for n = 849; e: data for n = 846.

    (DOCX)

    S4 Table. The TMRCA of HCV in China inferred from original dataset.

    TMRCA = Time to the Most Recent Common Ancestor. aData are TMRCA (the 95% highest posterior density [HPD] interval).

    (DOCX)

    S1 File. The protocol for Bayesian estimation of past population dynamics using the Skygrid coalescent model.

    (DOCX)

    S2 File. Accession numbers.

    (DOCX)

    Attachment

    Submitted filename: HCVPloscomment3.pdf

    Attachment

    Submitted filename: HCV plos second comment.docx

    Data Availability Statement

    The minimal anonymized data sets necessary to replicate study findings have been uploaded in Open Science Framework (OSF) DOI: https://doi.org/10.17605/OSF.IO/NKD8Y. HCV sequences have been submitted to GenBank (See the Supporting Information file for accession numbers).


    Articles from PLOS ONE are provided here courtesy of PLOS
