Abstract
Non-CODIS STRs, which show high polymorphism and allele frequency differences among ethnically and geographically distinct populations, play a crucial role in population genetics, molecular anthropology, and human forensics. In this work, 332 unrelated individuals from Sichuan Province (237 Tibetan individuals and 95 Yi individuals) are genotyped for the first time with 21 non-CODIS autosomal STRs, and their phylogenetic relationships with 26 previously investigated populations (9,444 individuals) are subsequently explored. In the Sichuan Tibetan and Yi, the combined power of discrimination (CPD) values are 0.9999999999999999999 and 0.9999999999999999993, and the combined power of exclusion (CPE) values are 0.999997 and 0.999999, respectively. Analysis of molecular variance (AMOVA), principal component analysis (PCA), multidimensional scaling (MDS) plots and phylogenetic analysis demonstrate that the Sichuan Tibetan have a close genetic relationship with the Tibet Tibetan, and that the Sichuan Yi have a genetic affinity with the Yunnan Bai group. Furthermore, significant genetic differences exist widely between Chinese minorities (most prominently the Tibetan and Kazakh) and Han groups, whereas the Han populations distributed across Northern and Southern China show no population stratification and instead form a homogeneous group. These results suggest that the 21 STRs are highly polymorphic and informative in the Sichuan Tibetan and Yi and are suitable for population genetics and forensic applications.
Introduction
Short tandem repeats (STRs) are DNA sequences composed of tandemly repeated short sequence motifs (2–6 bp), such as (ATCT)n1–3. Over the past decades, comprehensive characterization of human STRs has provided insights into anthropology, archaeology, human forensics and population genetics4,5. Recently, non-Combined DNA Index System short tandem repeats (non-CODIS STRs) have become attractive for genetic applications such as population stratification analysis, regional population structure studies, forensic individual identification and paternity testing6–12. Combined with existing commercial CODIS STR amplification systems, non-CODIS STRs play an indispensable complementary role in forensic applications: resolving missing person cases, identifying victims, and solving parentage testing cases that involve mutation. The genetic diversity of non-CODIS STRs in major ethnic groups in East Asia, especially in China, has been explored7,8,13–17. However, the genetic architecture of geographically and linguistically distinct populations in Sichuan Province, the 5th largest and 3rd most populous province in China, remains uncharacterized.
Sichuan consists of two geographically distinct parts: the eastern part lies mostly within the fertile Sichuan Basin, and the western part consists of numerous mountain ranges. Han Chinese (the majority of the province’s population) mainly reside in the eastern portion, while significant minorities of Yi, Tibetan and Qiang people reside in the western portion, which is affected by inclement weather and natural disasters. The Yi population, also known as the Lolo, is the seventh largest of the 55 officially recognized ethnic minority groups in China and the largest ethnic minority group in Sichuan18,19. The population history of the Yi remains controversial18–20. The prevailing view is that the Yi migrated from southeastern Tibet through Sichuan into Yunnan Province and share a common ancestor with the Tibetan, Nakhi and Qiang peoples. The Tibetan population, which has resided mainly on the Qinghai-Tibetan Plateau for hundreds of generations, shows genetic adaptations underlying a distinct combination of phenotypes at high altitude (>4000 m)21–23. Chengdu, the capital of Sichuan Province, is home to a large community of Tibetans, with 30,000 permanent Tibetan residents and a floating Tibetan population of up to 200,000. In our previous study24, we investigated the genetic polymorphism of non-CODIS STRs in the Sichuan Han population. However, until now no genetic diversity data on non-CODIS STRs have been available for the Tibetan and Yi populations of Sichuan Province.
As a continuation of our previous study24, the present study characterizes the genetic diversity of 21 non-CODIS STRs in the Tibetan population (237 individuals) and the Yi population (95 individuals) from Sichuan Province. Furthermore, genetic data from 9,444 previously investigated individuals6–17,24–37 in 26 populations are used to investigate the genetic substructure of Sichuan and Chinese populations using analysis of molecular variance (AMOVA), principal component analysis (PCA), multidimensional scaling (MDS) plots and phylogenetic analysis.
Results
Genetic parameters of the 21 non-CODIS STRs
Population genetic structure in East Asia is complex, especially in China, whose population comprises 56 officially recognized ethnic groups distributed widely across 34 administrative divisions. Different ethnic groups are believed to have either distinct origins or common ancestors. A clear understanding of the genetic variation and forensic characteristics of different ethnic populations is indispensable in forensic applications, especially in paternity testing and individual identification. In the present study, a total of 332 unrelated individuals residing in Sichuan Province are genotyped using a multiplex assay amplifying 21 non-CODIS autosomal STR loci (AGCU 21 + 1 System). The detailed genotypes of the 237 Tibetan individuals and 95 Yi individuals are presented in Supplementary Tables S1 and S2, respectively. Forensic parameters including observed heterozygosity (Ho), expected heterozygosity (He), polymorphism information content (PIC), power of discrimination (PD), power of exclusion (PE) and typical paternity index (TPI) for each locus in the two ethnic groups are shown in Table 1. No significant deviations from Hardy–Weinberg equilibrium (HWE) are observed for any of the 21 non-CODIS STRs in either ethnic group after Bonferroni correction (p > 0.0024). No significant linkage disequilibrium between pairwise STR loci (378 pairwise groups) is observed after Bonferroni correction (p > 0.0002), with the exception of the pair D11S4463 and D5S2500 in the Tibetan population and the pair D10S1248 and D6S1017 in the Yi population (Supplementary Tables S3 and S4).
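The HWE threshold quoted above is consistent with a Bonferroni correction that divides a nominal significance level by the number of loci tested. The following minimal sketch illustrates the arithmetic; the nominal level of 0.05 is an assumption (the text does not state it), and the function name is illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test significance threshold after Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


ALPHA = 0.05   # assumed nominal significance level (not stated in the text)
N_LOCI = 21    # number of non-CODIS STR loci tested for HWE

hwe_threshold = bonferroni_threshold(ALPHA, N_LOCI)
print(f"HWE threshold for {N_LOCI} loci: {hwe_threshold:.4f}")
```

Under this assumption the corrected threshold rounds to the 0.0024 reported for the HWE tests.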
Table 1.
Locus | Tibetan TPI | Tibetan PD | Tibetan PIC | Tibetan PE | Tibetan Ho | Tibetan He | Tibetan p | Yi TPI | Yi PD | Yi PIC | Yi PE | Yi Ho | Yi He | Yi p |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
D6S474 | 1.4277 | 0.8488 | 0.6296 | 0.3549 | 0.6498 | 0.6864 | 0.2243 | 1.9000 | 0.8472 | 0.6436 | 0.4875 | 0.7368 | 0.6965 | 0.3920 |
D12ATA63 | 1.8810 | 0.8746 | 0.6704 | 0.4831 | 0.7342 | 0.7147 | 0.5060 | 2.6389 | 0.8700 | 0.6978 | 0.6187 | 0.8105 | 0.7458 | 0.1474 |
D22S1045 | 1.8516 | 0.8904 | 0.7090 | 0.4761 | 0.7300 | 0.7533 | 0.4050 | 2.2619 | 0.8742 | 0.7037 | 0.5606 | 0.7790 | 0.7520 | 0.5429 |
D10S1248 | 2.1161 | 0.8984 | 0.7217 | 0.5335 | 0.7637 | 0.7599 | 0.8900 | 2.0652 | 0.9268 | 0.7982 | 0.5234 | 0.7579 | 0.8260 | 0.0802 |
D1S1677 | 1.3022 | 0.8208 | 0.5909 | 0.3106 | 0.6160 | 0.6457 | 0.3393 | 1.1875 | 0.8086 | 0.5699 | 0.2664 | 0.5790 | 0.6339 | 0.2665 |
D11S4463 | 2.5213 | 0.9195 | 0.7570 | 0.6022 | 0.8017 | 0.7906 | 0.6760 | 1.8269 | 0.9086 | 0.7315 | 0.4701 | 0.7263 | 0.7727 | 0.2806 |
D1S1627 | 1.2474 | 0.7318 | 0.5084 | 0.2899 | 0.5992 | 0.5905 | 0.7854 | 1.4844 | 0.8472 | 0.6341 | 0.3736 | 0.6632 | 0.6805 | 0.7173 |
D3S4529 | 1.7955 | 0.8866 | 0.6929 | 0.4623 | 0.7215 | 0.7401 | 0.5134 | 2.1591 | 0.8889 | 0.7151 | 0.5418 | 0.7684 | 0.7616 | 0.8755 |
D2S441 | 2.1944 | 0.8999 | 0.7176 | 0.5484 | 0.7722 | 0.7573 | 0.5925 | 1.9792 | 0.9090 | 0.7492 | 0.5053 | 0.7474 | 0.7850 | 0.3725 |
D6S1017 | 1.8231 | 0.8984 | 0.7121 | 0.4692 | 0.7257 | 0.7548 | 0.2988 | 2.1591 | 0.8665 | 0.6783 | 0.5418 | 0.7684 | 0.7301 | 0.4002 |
D4S2408 | 1.8231 | 0.8828 | 0.6849 | 0.4692 | 0.7257 | 0.7333 | 0.7913 | 1.8269 | 0.8691 | 0.6716 | 0.4701 | 0.7263 | 0.7244 | 0.9671 |
D19S433 | 2.6333 | 0.9452 | 0.8039 | 0.6180 | 0.8101 | 0.8275 | 0.4798 | 1.8269 | 0.9394 | 0.7838 | 0.4701 | 0.7263 | 0.8116 | 0.0336 |
D17S1301 | 1.3466 | 0.8283 | 0.6013 | 0.3267 | 0.6287 | 0.6436 | 0.6327 | 2.0652 | 0.8758 | 0.7007 | 0.5234 | 0.7579 | 0.7414 | 0.7127 |
D1GATA113 | 1.3941 | 0.8186 | 0.5895 | 0.3435 | 0.6414 | 0.6561 | 0.6323 | 1.6964 | 0.8031 | 0.5946 | 0.4364 | 0.7053 | 0.6598 | 0.3492 |
D18S853 | 1.6233 | 0.8388 | 0.6226 | 0.4160 | 0.6920 | 0.6690 | 0.4518 | 2.1591 | 0.8869 | 0.6971 | 0.5418 | 0.7684 | 0.7367 | 0.4823 |
D20S482 | 1.6014 | 0.8598 | 0.6571 | 0.4096 | 0.6878 | 0.7003 | 0.6741 | 1.6964 | 0.8660 | 0.6748 | 0.4364 | 0.7053 | 0.7175 | 0.7917 |
D14S1434 | 1.4277 | 0.8414 | 0.6179 | 0.3549 | 0.6498 | 0.6609 | 0.7180 | 1.3971 | 0.8366 | 0.6172 | 0.3445 | 0.6421 | 0.6672 | 0.6042 |
D9S1122 | 1.6458 | 0.8440 | 0.6412 | 0.4224 | 0.6962 | 0.6913 | 0.8704 | 1.8269 | 0.8421 | 0.6425 | 0.4701 | 0.7263 | 0.6967 | 0.5298 |
D2S1776 | 1.9750 | 0.9204 | 0.7447 | 0.5044 | 0.7468 | 0.7779 | 0.2495 | 1.4844 | 0.9146 | 0.7433 | 0.3736 | 0.6632 | 0.7797 | 0.0061 |
D10S1435 | 1.9113 | 0.8996 | 0.7137 | 0.4901 | 0.7384 | 0.7544 | 0.5680 | 2.3750 | 0.8767 | 0.7045 | 0.5797 | 0.7895 | 0.7509 | 0.3844 |
D5S2500 | 1.6690 | 0.8749 | 0.6722 | 0.4289 | 0.7004 | 0.7229 | 0.4386 | 1.6379 | 0.8709 | 0.6676 | 0.4202 | 0.6947 | 0.7225 | 0.5452 |
TPI: typical paternity index; PD: power of discrimination; PIC: polymorphism information content; PE: power of exclusion; Ho: observed heterozygosity; He: expected heterozygosity; p: p value of the Hardy–Weinberg equilibrium test.
In the Sichuan Tibetan population, a total of 183 alleles are identified, with frequencies varying from 0.0021 to 0.5401 (Supplementary Table S5). D19S433 shows the largest number of alleles (15), while only 6 alleles are detected at D1S1627 (Supplementary Figure S1). The TPI ranges from 1.2474 at D1S1627 to 2.6333 at D19S433. The observed heterozygosity ranges from 0.5992 (D1S1627) to 0.8101 (D19S433) with an average of 0.7258. The three loci with the highest PD are D19S433, D2S1776, and D11S4463, and the combined power of discrimination (CPD) value is 0.9999999999999999999. The loci with the highest and lowest PE are D19S433 (0.6180) and D1S1627 (0.2899), respectively, and the combined power of exclusion (CPE) value is 0.999997.
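The combined values can be related to the per-locus values in Table 1 through the conventional formulas CPD = 1 − ∏(1 − PD_i) and CPE = 1 − ∏(1 − PE_i), which treat the loci as independent. The sketch below assumes these formulas (the text does not spell them out) and applies them to the Tibetan columns of Table 1. Arbitrary-precision decimals are used because double-precision floats cannot resolve a complement near 1e-19; the function name is illustrative.

```python
from decimal import Decimal, getcontext

getcontext().prec = 40  # enough digits to resolve a complement near 1e-19


def combined_power(per_locus_values):
    """Combine per-locus powers as 1 - prod(1 - x_i), assuming independent loci."""
    remaining = Decimal(1)
    for value in per_locus_values:
        remaining *= Decimal(1) - Decimal(value)
    return Decimal(1) - remaining


# Per-locus PD and PE for the Sichuan Tibetan, copied from Table 1 (row order).
TIBETAN_PD = ["0.8488", "0.8746", "0.8904", "0.8984", "0.8208", "0.9195", "0.7318",
              "0.8866", "0.8999", "0.8984", "0.8828", "0.9452", "0.8283", "0.8186",
              "0.8388", "0.8598", "0.8414", "0.8440", "0.9204", "0.8996", "0.8749"]
TIBETAN_PE = ["0.3549", "0.4831", "0.4761", "0.5335", "0.3106", "0.6022", "0.2899",
              "0.4623", "0.5484", "0.4692", "0.4692", "0.6180", "0.3267", "0.3435",
              "0.4160", "0.4096", "0.3549", "0.4224", "0.5044", "0.4901", "0.4289"]

cpd = combined_power(TIBETAN_PD)
cpe = combined_power(TIBETAN_PE)
print("CPD:", cpd)
print("CPE:", cpe)

# With ordinary floats a difference of 1e-19 from 1 is lost entirely.
float_cannot_resolve = (1.0 - 1e-19) == 1.0
```

Passing the values as strings keeps the four-decimal table entries exact; the results should agree with the reported CPD and CPE up to the rounding of the per-locus inputs.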
In the Sichuan Yi population, a total of 149 alleles are observed, with allele frequencies spanning from 0.0053 to 0.5053 (Supplementary Table S6). As shown in Supplementary Figure S2, only 5 alleles are observed at four loci (D3S4529, D6S1017, D4S2408, and D4S2408), while D10S1248 and D19S433 show the most variation, with 10 alleles each. The TPI ranges from 1.1875 at D1S1677 to 2.6389 at D12ATA63. The observed heterozygosity ranges from 0.5790 (D1S1677) to 0.8105 (D12ATA63) with an average of 0.7258. The CPD and CPE in the Yi population are 0.9999999999999999993 and 0.999999, respectively.
Population pairwise differences
To explore the genetic similarities and differences between the two investigated populations (Sichuan Tibetan and Sichuan Yi) and 26 previously studied populations, we first calculated Fst and corresponding p values at each locus via analysis of molecular variance. For the Sichuan Tibetan (Supplementary Table S7), Hainan Li show significant genetic differences at five loci, followed by Guangdong Han, Hunan Han and Zhejiang Han at three loci; Aksu Uyghur, Fujian She, Inner Mongolia Mongolian, Northern Han, Yunnan Yi and Inner Mongolia Russian at two loci; and Qinghai Salar, Hubei Tujia, Ningxia Han, Guanzhong Han, Huadong Han, Beijing Han, Shandong Han, Liaoning Han, the two Henan Han groups, Xinjiang Xibe, Sichuan Han and Sichuan Yi at one locus. No significant genetic differences are observed between the Sichuan Tibetan and Lhasa Tibetan, Yunnan Bai or Ili Kazakh. The Fst and corresponding p values between the Sichuan Yi and the reference populations are listed in Supplementary Table S8. Fujian She and Zhejiang Han show genetic differences from the Sichuan Yi at two loci, followed by Qinghai Salar, Inner Mongolia Mongolian, Northern Han, Guanzhong Han, Yunnan Yi, Guangdong Han, Hunan Han, Beijing Han, Inner Mongolia Russian, Shandong Han, Henan Han1 and Xinjiang Xibe at one locus. No difference is identified with the other populations.
Principal component analysis
PCA was used to dissect the major factors accounting for the total variance among the 28 Chinese populations, using the Multivariate Statistical Package version 3.22 (MVSP) and SPSS software. Figure 1 presents the PCA results for the investigated Sichuan Tibetan, Sichuan Yi, and 26 Chinese reference populations; the first three components account for 42.159% of the total variance. PC1 explains 18.17% of the variance and clearly separates Qinghai Salar, Inner Mongolia Russian, Yunnan Yi and Fujian She from the other 24 populations. PC2 accounts for 14.598% and shows a clear distinction between six populations (Hainan Li, Lhasa Tibetan, Sichuan Tibetan, Fujian She, Ili Kazakh and Qinghai Salar) and the others, as shown in Fig. 1A and Supplementary Figure S3. PC3 (9.391%) separates Fujian She, Lhasa Tibetan, Hainan Li, Gansu Yugu, Sichuan Tibetan, Inner Mongolia Mongolian, Inner Mongolia Russian and Xinjiang Xibe from the other Chinese populations (Fig. 1B and Supplementary Figure S3). Next, a PCA based on the allele frequency variation of the 21 non-CODIS STRs, computed with the different formula implemented in SPSS, is shown in Supplementary Figure S4; its first two principal components account for 97.299% of the total variation. The population distribution pattern is consistent with the results obtained using the MVSP software and shows a clear separation between minorities and Han Chinese populations.
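As an illustration of how per-component variance proportions of this kind arise, the sketch below runs a PCA on a population-by-allele frequency matrix via singular value decomposition. It is not a reproduction of the MVSP or SPSS computations, whose centering and scaling options are not described here. The input matrix is randomly generated placeholder data, the function name is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def pca_explained_variance(freq_matrix):
    """PCA of a (populations x alleles) matrix; returns scores and variance ratios."""
    x = np.asarray(freq_matrix, dtype=float)
    centered = x - x.mean(axis=0)            # center each allele-frequency column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    variances = s ** 2                        # proportional to variance per component
    ratios = variances / variances.sum()
    scores = u * s                            # population coordinates on each component
    return scores, ratios


# Placeholder data: 28 populations, 5 hypothetical loci with 8 alleles each.
rng = np.random.default_rng(0)
toy_freqs = rng.dirichlet(np.ones(8), size=(28, 5)).reshape(28, 40)

scores, ratios = pca_explained_variance(toy_freqs)
print("Variance explained by PC1-PC3 (toy data):", ratios[:3].round(3))

# The three proportions reported in the text add up to the stated cumulative value.
reported_total = 18.17 + 14.598 + 9.391
```

Plotting the first two or three columns of `scores` gives a scatter of populations comparable in kind to Fig. 1, although the toy data carry no real structure.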
Multidimensional scaling analysis
Subsequently, to evaluate the proportion of genetic heterogeneity and homogeneity attributable to Chinese population stratification, we calculated Nei’s genetic distances for all 378 population pairs (Supplementary Table S9 and Fig. 2). The largest Nei’s standard genetic distance is observed between Henan Han1 and Fujian She, and relatively small genetic distances are identified between Han populations from different administrative divisions (0.0013 for Shandong Han vs. Henan Han2, and 0.0018 for both Beijing Han vs. Shandong Han and Beijing Han vs. Henan Han2). Of the two populations studied here, the Sichuan Yi are relatively distant from Fujian She (0.0552) and relatively close to Shandong Han (0.0216), with a mean distance of 0.0316 ± 0.0095. Likewise, the Sichuan Tibetan are distant from Fujian She (0.0577) and close to Lhasa Tibetan (0.0209) and Shandong Han (0.0208), with a mean distance of 0.0322 ± 0.0105.
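For orientation, Nei’s standard genetic distance between two populations is D = −ln(J_XY / √(J_X · J_Y)), where J_X and J_Y are the locus-averaged sums of squared allele frequencies within each population and J_XY is the locus-averaged sum of cross-products. The sketch below implements this definition on hypothetical allele frequencies (the two toy populations and their values are invented for illustration) and also confirms that 28 populations yield 378 pairs.

```python
import math


def nei_standard_distance(pop_x, pop_y):
    """Nei's standard distance; each population is a list of {allele: frequency} per locus."""
    if len(pop_x) != len(pop_y):
        raise ValueError("both populations must cover the same loci")
    j_x = j_y = j_xy = 0.0
    for freq_x, freq_y in zip(pop_x, pop_y):
        alleles = set(freq_x) | set(freq_y)
        j_x += sum(freq_x.get(a, 0.0) ** 2 for a in alleles)
        j_y += sum(freq_y.get(a, 0.0) ** 2 for a in alleles)
        j_xy += sum(freq_x.get(a, 0.0) * freq_y.get(a, 0.0) for a in alleles)
    # Dividing each sum by the number of loci cancels in the ratio below.
    return -math.log(j_xy / math.sqrt(j_x * j_y))


# Hypothetical allele frequencies at two loci (not data from this study).
pop_a = [{"10": 0.5, "11": 0.5}, {"7": 0.2, "8": 0.8}]
pop_b = [{"10": 0.4, "11": 0.6}, {"7": 0.3, "8": 0.6, "9": 0.1}]

d_ab = nei_standard_distance(pop_a, pop_b)
d_ba = nei_standard_distance(pop_b, pop_a)
d_aa = nei_standard_distance(pop_a, pop_a)

n_pairs = math.comb(28, 2)  # number of distinct population pairs among 28 groups
print(f"D(a, b) = {d_ab:.4f}; pairs among 28 populations = {n_pairs}")
```

The distance is symmetric and zero for identical frequency profiles, which is why a single triangular matrix suffices for the downstream MDS and tree building.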
We next constructed a multidimensional scaling plot based on the Nei’s genetic distance matrix to explore the population genetic structure among our 9,776 individuals. As shown in Fig. 3, all Han Chinese populations are tightly grouped together and located at the center of the MDS plot, with the exception of one Han population sampled from Henan Province. Several minority groups, including Yunnan Bai, Xinjiang Xibe, Hubei Tujia and Aksu Uyghur, are intermingled with these Han Chinese groups. As expected, the other 13 minority ethnic populations are scattered across the MDS plot: Hainan Li is located in the top left corner; Sichuan Yi and Inner Mongolia Mongolian in the top right corner; Fujian She, Yunnan Yi, Inner Mongolia Mongolian and Qinghai Salar in the lower left quarter; and Ili Kazakh, Gansu Yugu, Lhasa Tibetan and Sichuan Tibetan are clustered together in the bottom right corner. In line with our expectation, the newly investigated Sichuan Tibetan lie close to Lhasa Tibetan; the Sichuan Yi, however, show a subtle separation from Yunnan Yi.
Phylogenetic relationship reconstruction
To further assess the affiliations of Chinese populations, we next explored the phylogenetic relationships between Han Chinese populations and minority ethnic groups. The relationships among the 28 groups are depicted using a Neighbor-Joining tree based on the Nei’s genetic distance matrix. As presented in Fig. 4, the Chinese Han populations are substantially differentiated from most minority ethnic groups, most notably the Chinese Muslim groups, the high-altitude Tibetan populations, and the She. Three main clusters can be clearly identified in the phylogenetic tree, and the corresponding population compositions and distributions are congruent with the findings of the MDS and PCA. Notably, the Sichuan Tibetan show genetic affinity with Lhasa Tibetan: the two first group together and then cluster with the Gansu Yugu population. The Sichuan Yi first cluster with Inner Mongolia Mongolian and then group with the branch consisting of Liaoning Han and Yunnan Bai.
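To make the grouping order in such a tree more concrete, the sketch below performs the first step of the Neighbor-Joining algorithm on a small distance matrix: it computes the Q criterion, Q(i, j) = (n − 2)·d(i, j) − Σd(i, ·) − Σd(j, ·), selects the pair with the lowest Q, and derives the two branch lengths to the new internal node. The five-taxon matrix is a hypothetical textbook-style example, not data from this study, and the software used for Fig. 4 may differ in implementation details such as tie-breaking.

```python
def nj_first_join(dist):
    """Return the first pair joined by Neighbor-Joining, its Q value and branch lengths."""
    n = len(dist)
    if n < 3:
        raise ValueError("Neighbor-Joining needs at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, float("inf")
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
            if q < best_q:
                best_pair, best_q = (i, j), q
    i, j = best_pair
    # Branch lengths from the two joined taxa to the new internal node.
    len_i = 0.5 * dist[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return best_pair, best_q, (len_i, len_j)


# Hypothetical symmetric distance matrix for five taxa (illustration only).
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

pair, q_value, branches = nj_first_join(toy_dist)
print("first pair joined:", pair, "Q =", q_value, "branch lengths:", branches)
```

Repeating this step on the reduced matrix until three nodes remain yields the full tree; note that the pair with the smallest raw distance (taxa 3 and 4 here) is not necessarily the first to be joined.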
Discussion
China is currently populated by over 1.3 billion people who belong to at least 56 officially recognized, linguistically and ethnically distinct groups. Genetic studies of Chinese populations from different minority ethnic groups are of great interest because of China’s complex demographics, large population size and complex geography. Additionally, population geneticists and medical geneticists have shown a resurgence of interest in the evolution, origin and demographic history of modern humans, prompted by the successful analysis of ancient nuclear and mitochondrial DNA sequences of Neanderthals and Denisovans38–40 and of their effects on disease susceptibility in anatomically modern humans and present-day populations41. Previous anthropological and genetic studies have provided evidence that the peopling of China is characterized by different ancestral origins, as traced by maternal lineages (mitochondrial genome) and paternal genetic signatures (Y chromosome)42–48. Despite large-scale efforts to investigate patterns of natural selection, estimate individual ancestry and infer the evolutionary history of Chinese populations based on SNPs, InDels, and CODIS STR loci21–23,49,50, the genetic variation of non-CODIS STRs in the Sichuan Yi and Tibetan populations has remained unexplored. In this study, a total of 237 Chinese Tibetan individuals and 95 Yi individuals from Southwest China are genotyped using the AGCU 21 + 1 PCR amplification kit. In addition, the population differentiation analyses include 9,444 individuals in 26 groups, from 23 distinct administrative divisions and 14 ethnic groups, that were genotyped using the same kit in previous studies. The final data set consists of genotypes at 21 non-CODIS STR loci in 9,776 individuals from 28 Chinese populations. We have analyzed the genetic variation and population structure of these populations via analysis of molecular variance, PCA, MDS and phylogenetic analyses.
Forensic features of non-CODIS STRs in Tibetan and Yi
Recently, a large number of commercial kits that include the 13 overlapping CODIS STRs, such as the GlobalFiler™ STR kit51 (Thermo Fisher Scientific, Carlsbad, USA), the Huaxia™ Platinum PCR Amplification kit52 (Thermo Fisher Scientific, Carlsbad, USA) and the PowerPlex® Fusion kit53 (Promega, USA), have been widely used in forensic human identification, paternity testing, and DNA database construction in criminal investigations or missing persons cases. Because these CODIS STR amplification systems share many loci, they offer limited gains in forensic efficiency when used as complements to one another in complex forensic cases. In contrast, simultaneously testing the 21 non-CODIS STRs included in the AGCU 21 + 1 kit can minimize adventitious matches, increase discrimination power and facilitate data sharing in cases involving mutation, degraded samples and deficiency paternity testing. Several measures of genetic diversity (observed heterozygosity, expected heterozygosity) and forensic statistical indexes of the 21 non-CODIS STRs (PD, PE, PIC and so on) are relatively high in the Sichuan Yi and Sichuan Tibetan populations in the present study. Most previous genetic studies based on sets of SNPs or STRs located on the sex chromosomes show similar results18,19,54,55. Some researchers, however, reported insufficient combined power of discrimination (0.99999999995713) and combined power of exclusion (0.97746) in the Yi ethnicity56. By contrast, the CPEs in the newly studied populations and the previously studied Han group24 are over 0.99999, and the CPDs are larger than 0.9999999999999999999.
Our findings in these two populations, in combination with the previously studied Sichuan Han group24, demonstrate that the twenty-one non-CODIS STRs included in the AGCU 21 + 1 PCR amplification kit are highly discriminative and informative in diverse ethnic populations residing in Sichuan Province, West China, and should be used as a complementary tool in complicated cases (parentage identification involving mutation, historical human skeletal remains, missing persons investigation and disaster victim identification). In addition, these loci can be integrated into new panels designed for massively parallel sequencing platforms, such as the Ion S5 XL and Illumina MiSeq sequencers. This study also provides the first batch of genetic diversity data for the 21 non-CODIS STRs in these two ethnic groups and enriches the Chinese non-CODIS STR reference databases.
Intra- and inter-population structure
The results of this comprehensive population comparison reveal significant genetic differences between Han Chinese populations and some minority ethnic groups, most prominently the Tibetan and She populations. In addition, our phylogenetic reconstruction and MDS analyses indicate that the Han Chinese population is homogeneous when assessed with autosomal genetic markers, in contrast to the picture obtained from sex-chromosome markers57. Among Han Chinese populations, we observe no significant differences between populations separated by the geographic boundary (the Yangtze River) that has been identified using Y-STRs and high-density SNP panels57. A slight North-South gradient can be vaguely discerned, but it does not amount to a significant North-South genetic distinction. The genetic similarities and differences identified among Chinese populations are a valuable resource for accurately identifying disease risk genes in genome-wide association studies, avoiding spurious associations, and detecting more ethnicity-specific ancestry informative markers for forensic ancestry inference.
The Tibetan and Yi populations belong to the Tibeto-Burman-speaking subfamily of the Sino-Tibetan languages, whereas the previously investigated Southwest Chinese Han population belongs to another subfamily (Chinese). The Tibetan, as a highly representative group, are genetically adapted to extreme hypoxia and have been the subject of multidisciplinary genetic studies. Our results reveal that the two Tibetan populations from different geographic settings (high altitude: Tibet; low altitude: Sichuan) have a strong genetic affinity with each other but are genetically distant from other populations. These features are consistent with previous findings from genetic studies based on high-throughput genotyping data and genome sequence data22,50,58. The Yi population, as we expected, is relatively distant genetically from the Tibetan population residing in Sichuan. Moreover, these two Tibeto-Burman-speaking populations are relatively distinct genetically from our previously investigated Sichuan Han population24 although all three groups are geographically close, which is in accordance with their different ethnic origins and cultural backgrounds. Unexpectedly, the Sichuan Yi and Yunnan Yi are genetically distant from each other, which may reflect cultural differences among the three subgroups of the Yi (Ni, Lolo, and other). In the future, large-scale population genetic history studies covering different administrative divisions and based on different high-density genetic marker sets (even whole-genome sequences of archaic or present-day human DNA) will be needed to investigate and elucidate the origin and migration of the Sino-Tibetan-speaking ethnic groups.
Conclusions
In summary, we sampled 332 individuals from two minority ethnic groups to assess the genetic variation of 21 non-CODIS STR loci and combined these samples with 9,444 previously investigated individuals from 26 Chinese populations to explore Chinese population structure. Our results demonstrate that this panel of STRs is highly informative and polymorphic in the Sichuan Tibetan and Yi populations and can be widely used as a tool for personal identification and parentage testing in forensics. Additionally, the estimates of genetic differentiation (Fst and p values, and Nei’s genetic distance) suggest that the Sichuan Tibetan and Sichuan Yi populations are closely related to the Lhasa Tibetan and Mongolian populations, respectively, but are relatively isolated from other ethnic groups, especially the Han Chinese populations. The results obtained from PCA, MDS and phylogenetic analyses also demonstrate that genetic differences between Han Chinese and minorities exist widely and that Han Chinese populations from different geographical divisions are homogeneous.
Methods
Ethical standards
This study was approved by the institutional review boards of Sichuan University. All participants signed informed consent statements prior to participation. Human blood samples were collected upon approval of the Ethics Committee at the Institute of Forensic Medicine, Sichuan University. All the experimental procedures and the methods for each procedure were carried out in accordance with the approved guidelines of the Institute of Forensic Medicine, Sichuan University.
Sample preparation
Blood samples were collected from 237 unrelated Tibetan individuals (120 males and 117 females) recruited from Chengdu and 95 unrelated Yi individuals (55 males and 40 females) recruited from Liangshan Yi Autonomous Prefecture, Sichuan Province. All individuals were required to be indigenous inhabitants or to have ancestors who had resided in the corresponding sample collection region for at least three generations.
Human genomic DNA was extracted using PureLink Genomic DNA Mini Kit (Thermo Fisher Scientific) according to the manufacturer’s protocol. The quantity of the DNA template was determined using Quantifiler Human DNA Quantification Kit on a 7500 Real-time PCR System (Thermo Fisher Scientific). DNA samples were normalized to 1.0 ng/μL and stored at −20 °C until amplification.
PCR amplification and genotyping
PCR amplification was performed following the manufacturer’s protocol on a ProFlex 96-well PCR System (Thermo Fisher Scientific). The PCR system was a 25 μL reaction volume containing 10 μL of Reaction Mix, 5 μL of Primers 21 + 1, 0.75 μL C-Taq and 1.0 ng of template DNA. The thermal cycling conditions consisted of an initial step at 95 °C for 2 min; followed by 30 cycles of 94 °C for 30 s, 60 °C for 60 s, and 65 °C for 90 s; and a final extension at 60 °C for 60 min.
Amplification products were separated and detected on an Applied Biosystems 3130 Genetic Analyzer following the manufacturer’s recommendations. One microliter of PCR product or allelic ladder was added to a mixture containing 9.5 μL of deionized Hi-Di formamide and 0.5 μL of AGCU Marker Size-500 (AGCU ScienTech Incorporation). The mixture was injected at 1.2 kV for 16 s and electrophoresed at 13 kV for 1550 s with a run temperature of 60 °C. Allele allocation was carried out with GeneMapper ID 3.20 analysis software using the allelic ladder and the set of bins and panels provided with the kit.
Population studies
To evaluate the forensic efficiency of this non-CODIS STR panel in the Sichuan Tibetan and Yi, genotype data from 332 unrelated individuals were analyzed. The observed heterozygosity (Ho) and expected heterozygosity (He) were estimated, and the exact tests of Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were performed, using Arlequin 3.5.2.259. Allelic frequencies and forensic parameters, including the polymorphism information content (PIC), power of discrimination (PD) and power of exclusion (PE), were calculated using the modified PowerStat V12 spreadsheet (Promega)60.
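Two of the per-locus parameters can be recovered from the observed heterozygosity alone under the conventional definitions TPI = 1 / (2H) and PE = h²(1 − 2hH²), where h is the observed heterozygosity and H = 1 − h is the observed homozygosity. The sketch below assumes these definitions (they are not spelled out in the text) and checks them against the Tibetan D6S474 row of Table 1; the function names are illustrative.

```python
def typical_paternity_index(h_obs: float) -> float:
    """TPI = 1 / (2H), with H the observed homozygosity (assumed conventional formula)."""
    homozygosity = 1.0 - h_obs
    if homozygosity <= 0.0:
        raise ValueError("TPI is undefined when no homozygotes are observed")
    return 1.0 / (2.0 * homozygosity)


def power_of_exclusion(h_obs: float) -> float:
    """PE = h^2 * (1 - 2 * h * H^2), with H = 1 - h (assumed conventional formula)."""
    homozygosity = 1.0 - h_obs
    return h_obs ** 2 * (1.0 - 2.0 * h_obs * homozygosity ** 2)


# Observed heterozygosity of D6S474 in the Sichuan Tibetan (Table 1).
HO_D6S474_TIBETAN = 0.6498

tpi = typical_paternity_index(HO_D6S474_TIBETAN)
pe = power_of_exclusion(HO_D6S474_TIBETAN)
print(f"TPI = {tpi:.4f}, PE = {pe:.4f}")
```

Small differences in the last decimal place relative to Table 1 are expected, because the tabulated Ho is itself rounded to four decimals.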
Additionally, 26 previously investigated populations6–17,24–37 comprising 9,444 individuals were retrieved to investigate the genetic substructure of Sichuan and Chinese populations. These reference populations include Salar (n = 120) from Qinghai Province, Tibetan (n = 104) from Lhasa in the Tibet Autonomous Region, Tujia (n = 107) from Hubei Province, Bai (n = 106) from Yunnan Province, Han (n = 202) from Ningxia Hui Autonomous Region, Uyghur (n = 502) from the Aksu region in Xinjiang Uyghur Autonomous Region, Kazakh (n = 114) from the Ili region in Xinjiang Uyghur Autonomous Region, She (n = 154) from Fujian Province, Mongolian (n = 523) from Inner Mongolia Autonomous Region, Han (n = 220) from North China, Han (n = 275) from the Guanzhong region in Shaanxi Province, Yi (n = 110) from Yunnan Province, Li (n = 504) from Hainan Province, Han (n = 506) from Guangdong Province, Han (n = 501) from Hunan Province, Han (n = 225) from the Huadong region of China, Han (n = 459) from Beijing Municipality, Han (n = 481) from Zhejiang Province, Russian (n = 114) from Inner Mongolia Autonomous Region, Han (n = 1030) from Shandong Province, Han (n = 207) from Liaoning Province, Han1 (n = 1136) from Henan Province, Yugu (n = 180) from Gansu Province, Xibe (n = 226) from Xinjiang Uyghur Autonomous Region, Han2 (n = 970) from Henan Province and Han (n = 368) from Sichuan Province. Population comparison was conducted using principal component analysis (PCA), multidimensional scaling (MDS) plots and phylogenetic analysis. PCA was conducted using the Multivariate Statistical Package version 3.2261 and the SPSS software62 (version 21.0, SPSS Inc, Chicago, IL) based on the allele frequency distribution of the 21 non-CODIS STRs in the 28 Chinese populations. Nei’s genetic distance was calculated using Phylip-3.695. The MDS was performed in SPSS and a Neighbor-Joining tree was constructed in Mega 7.063 based on the Nei’s genetic distance matrix64.
Quality control
All experiments were conducted at the Forensic Genetics Laboratory of the Institute of Forensic Medicine, Sichuan University, which is an accredited laboratory (ISO 17025), and has been accredited by the China National Accreditation Service for Conformity Assessment (CNAS). We strictly followed the recommendations of Chinese National Standards and Scientific Working Group on DNA Analysis Methods (SWGDAM)65. Control DNA 9947A (AGCU ScienTech Incorporation) and ddH2O were used as positive and negative controls respectively for each batch of genotyping.
Acknowledgements
This work was supported by grants from National Key R&D Program of China (2016YFC0800703) and from the National Natural Science Foundation of China (No. 81330073 and No. 81571854) and the Fundamental Research Funds for the Central University (2012017yjsy187). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Contributions
G.H. and Z.W. wrote the manuscript, X.Z. and X.C. collected the samples, G.H., Z.W., M.W. and J.L. conducted the experiment and analyzed the results, Z.W. modified the manuscript, and Y.H. conceived the experiment. All authors have reviewed the manuscript.
Competing Interests
The authors declare no competing interests.
Footnotes
Guanglin He and Zheng Wang contributed equally to this work.
Electronic supplementary material
Supplementary information accompanies this paper at 10.1038/s41598-018-24291-5.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Ellegren H. Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 2004;5:435–445. doi: 10.1038/nrg1348.
- 2. Jobling MA, Gill P. Encoded evidence: DNA in forensic analysis. Nat. Rev. Genet. 2004;5:739–751. doi: 10.1038/nrg1455.
- 3. Kayser M, de Knijff P. Improving human forensics through advances in genetics, genomics and molecular biology. Nat. Rev. Genet. 2011;12:179–192. doi: 10.1038/nrg2952.
- 4. Sudmant PH, et al. Diversity of human copy number variation and multicopy genes. Science. 2010;330:641–646. doi: 10.1126/science.1197005.
- 5. Sudmant PH, et al. Global diversity, population stratification, and selection of human copy-number variation. Science. 2015;349:aab3761. doi: 10.1126/science.aab3761.
- 6. Qian Q, et al. Genetic polymorphisms of 39 STR loci in Shandong Han population (In Chinese). Chin. J. Forensic Med. 2015;30:513–516.
- 7. Xiao N, Yu W, Zhou S. Genetic polymorphisms 21 Non-CODIS STR loci in Liaoning Han population (In Chinese). J. Clin. Transfus Lab. Med. 2015;17:394–397.
- 8. Zhang YD, et al. Genetic variability and phylogenetic analysis of Han population from Guanzhong region of China based on 21 non-CODIS STR loci. Sci. Rep. 2015;5:8872. doi: 10.1038/srep08872.
- 9. Zhang Z, Ma L, Sun H, Yang X, Luo J. Genetic diversity of 21 Non-CODIS STR loci in Gansu Yugu population (In Chinese). J. Foren. Med. 2015;31:394–395.
- 10. Guo Y, et al. Genetic polymorphic investigation of 21 autosomal short tandem repeat loci in the Chinese Li ethnic group. Forensic Sci. Int. Genet. 2016;24:e17–e18. doi: 10.1016/j.fsigen.2016.07.001.
- 11. Lu H, Qiu P, Liu C, Du W, Chen L. Genetic polymorphism of 21 non-CODIS STR loci for Guangdong (Southern China) Han population. Forensic Sci. Int. Genet. 2017;27:180–181. doi: 10.1016/j.fsigen.2016.11.005.
- 12. Shen C, et al. Forensic effectiveness and population differentiations study of AGCU 21 + 1 fluorescence multiplex in Chinese Henan Han population. Forensic Sci. Int. Genet. 2017;28:e18–e21. doi: 10.1016/j.fsigen.2017.01.013.
- 13. Song F, Luo HB, Hou YP. Population data of 21 non-CODIS STR loci in the Chinese Uygur ethnic minority. Forensic Sci. Int. Genet. 2014;13:e1–e2. doi: 10.1016/j.fsigen.2014.04.007.
- 14. Yuan JY, et al. Genetic profile characterization and population study of 21 autosomal STR in Chinese Kazak ethnic minority group. Electrophoresis. 2014;35:503–510. doi: 10.1002/elps.201300398.
- 15. Yuan L, et al. Population genetics analysis of 38 STR loci in the She population from Fujian Province of China. Leg. Med. (Tokyo) 2014;16:314–318. doi: 10.1016/j.legalmed.2014.05.008.
- 16. Zha L, et al. Genetic polymorphism of 21 non-CODIS STR loci in the Chinese Mongolian ethnic minority. Forensic Sci. Int. Genet. 2014;9:e32–33. doi: 10.1016/j.fsigen.2013.08.010.
- 17. Guo J, et al. Genetic polymorphism of 21 non-CODIS STR loci for Han population in Hunan Province, China. Forensic Sci. Int. Genet. 2015;17:81–82. doi: 10.1016/j.fsigen.2015.03.016.
- 18. He G, et al. Genetic polymorphism investigation of the Chinese Yi minority using PowerPlex® Y23 STR amplification system. Int. J. Legal Med. 2017;131:663–666. doi: 10.1007/s00414-017-1537-2.
- 19. He G, et al. Forensic characteristics and phylogenetic analyses of the Chinese Yi population via 19 X-chromosomal STR loci. Int. J. Legal Med. 2017;131:1243–1246. doi: 10.1007/s00414-017-1563-0.
- 20. Yao HB, et al. The genetic admixture in Tibetan-Yi Corridor. Am. J. Phys. Anthropol. 2017. doi: 10.1002/ajpa.23291.
- 21. Kang L, et al. Y-chromosome O3 haplogroup diversity in Sino-Tibetan populations reveals two migration routes into the eastern Himalayas. Ann. Hum. Genet. 2012;76:92–99. doi: 10.1111/j.1469-1809.2011.00690.x.
- 22. Lorenzo FR, et al. A genetic mechanism for Tibetan high-altitude adaptation. Nat. Genet. 2014;46:951–956. doi: 10.1038/ng.3067.
- 23. Xing J, et al. Genomic analysis of natural selection and phenotypic variation in high-altitude Mongolians. PLoS Genet. 2013;9:e1003634. doi: 10.1371/journal.pgen.1003634.
- 24. He G, Wang Z, Wang M, Hou Y. Genetic Diversity and Phylogenetic Differentiation of Southwestern Chinese Han: a comprehensive and comparative analysis on 21 non-CODIS STRs. Sci. Rep. 2017;7:13730. doi: 10.1038/s41598-017-13190-w.
- 25. Zhu BF, et al. Population genetics and forensic efficiency of twenty-one novel microsatellite loci of Chinese Yi ethnic group. Electrophoresis. 2013;34:3345–3351. doi: 10.1002/elps.201300362.
- 26. Shao WB, Zhang SH, Li L. Genetic Polymorphisms of 21 Non CODIS STR Loci (In Chinese). Fa Yi Xue Za Zhi. 2011;27:36–38.
- 27. Guo J, et al. Polymorphism analysis and evaluation of 21 non CODIS STR loci in the Beijing Han population (In Chinese). Foren. Sci. Technol. 2010;6:13–15.
- 28. Wu W, et al. Genetic polymorphisms of 21 Non-CODIS STR loci in Zhejiang Han population (In Chinese). Foren. Sci. Technol. 2010;3:19–22.
- 29. Wang HD, et al. Allelic diversity distributions of 21 new autosomal short tandem repeat loci in Chinese Ningxia Han population. Forensic Sci. Int. Genet. 2013;7:e78–79. doi: 10.1016/j.fsigen.2012.11.012.
- 30. Meng HT, et al. Genetic diversities of 20 novel autosomal STRs in Chinese Xibe ethnic group and its genetic relationships with neighboring populations. Gene. 2015;557:222–228. doi: 10.1016/j.gene.2014.12.037.
- 31. Yuan L, Ge J, Lu D, Yang X. Population data of 21 non-CODIS STR loci in Han population of northern China. Int. J. Legal Med. 2012;126:659–664. doi: 10.1007/s00414-011-0664-4.
- 32. Zhu BF, et al. Genetic diversities of 21 non-CODIS autosomal STRs of a Chinese Tibetan ethnic minority group in Lhasa. Int. J. Legal Med. 2011;125:581–585. doi: 10.1007/s00414-010-0519-4.
- 33. Liu Y, et al. Genetic polymorphisms of 39 autosomal STR loci in Henan Han population (In Chinese). J. Foren. Med. 2014;30:217–220.
- 34. Wang HD, et al. Allelic frequency distributions of 21 non-combined DNA index system STR loci in a Russian ethnic minority group from Inner Mongolia, China. J. Zhejiang Univ. Sci. B. 2013;14:533–540. doi: 10.1631/jzus.B1200262.
- 35. Shen CM, et al. Allelic polymorphic investigation of 21 autosomal short tandem repeat loci in a Chinese Bai ethnic group. Leg. Med. (Tokyo) 2013;15:109–113. doi: 10.1016/j.legalmed.2012.08.012.
- 36. Teng Y, et al. Genetic variation of new 21 autosomal short tandem repeat loci in a Chinese Salar ethnic group. Mol. Biol. Rep. 2012;39:1465–1470. doi: 10.1007/s11033-011-0883-2.
- 37. Yuan GL, et al. Genetic data provided by 21 autosomal STR loci from Chinese Tujia ethnic group. Mol. Biol. Rep. 2012;39:10265–10271. doi: 10.1007/s11033-012-1903-6.
- 38. Meyer M, et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature. 2014;505:403–406. doi: 10.1038/nature12788.
- 39. Sawyer S, et al. Nuclear and mitochondrial DNA sequences from two Denisovan individuals. Proc. Natl. Acad. Sci. USA. 2015;112:15696–15700. doi: 10.1073/pnas.1506646112.
- 40. Slon V, et al. Neandertal and Denisovan DNA from Pleistocene sediments. Science. 2017;356:605–608. doi: 10.1126/science.aam9695.
- 41. Vernot B, Akey JM. Resurrecting surviving Neandertal lineages from modern human genomes. Science. 2014;343:1017–1021. doi: 10.1126/science.1245938.
- 42. Chen J, et al. Genetic structure of the Han Chinese population revealed by genome-wide SNP variation. Am. J. Hum. Genet. 2009;85:775–785. doi: 10.1016/j.ajhg.2009.10.016.
- 43. Li H, Cho K, Kidd JR, Kidd KK. Genetic landscape of Eurasia and “admixture” in Uyghurs. Am. J. Hum. Genet. 2009;85:934–937; author reply 937–939. doi: 10.1016/j.ajhg.2009.10.024.
- 44. Wen B, et al. Analyses of genetic structure of Tibeto-Burman populations reveals sex-biased admixture in southern Tibeto-Burmans. Am. J. Hum. Genet. 2004;74:856–865. doi: 10.1086/386292.
- 45. Wei LH, et al. Genetic trail for the early migrations of Aisin Gioro, the imperial house of the Qing dynasty. J. Hum. Genet. 2017;62:407–411. doi: 10.1038/jhg.2016.142.
- 46. Wen B, et al. Genetic evidence supports demic diffusion of Han culture. Nature. 2004;431:302–305. doi: 10.1038/nature02878.
- 47. Li H, et al. Mitochondrial DNA diversity and population differentiation in southern East Asia. Am. J. Phys. Anthropol. 2007;134:481–488. doi: 10.1002/ajpa.20690.
- 48. Qin Z, et al. A mitochondrial revelation of early human migrations to the Tibetan Plateau before and after the last glacial maximum. Am. J. Phys. Anthropol. 2010;143:555–569. doi: 10.1002/ajpa.21350.
- 49. Sun K, Ye Y, Luo T, Hou Y. Multi-InDel Analysis for Ancestry Inference of Sub-Populations in China. Sci. Rep. 2016;6:39797. doi: 10.1038/srep39797.
- 50. Simonson TS, et al. Genetic evidence for high-altitude adaptation in Tibet. Science. 2010;329:72–75. doi: 10.1126/science.1189406.
- 51. Hennessy LK, et al. Developmental validation of the GlobalFiler® express kit, a 24-marker STR assay, on the RapidHIT® System. Forensic Sci. Int. Genet. 2014;13:247–258. doi: 10.1016/j.fsigen.2014.08.011.
- 52. Wang M, et al. Genetic characteristics and phylogenetic analysis of three Chinese ethnic groups using the Huaxia Platinum System. Sci. Rep. 2018;8:2429. doi: 10.1038/s41598-018-20871-7.
- 53. Oostdik K, et al. Developmental validation of the PowerPlex® Fusion System for analysis of casework and reference samples: A 24-locus multiplex for new database standards. Forensic Sci. Int. Genet. 2014;12:69–76. doi: 10.1016/j.fsigen.2014.04.013.
- 54. He G, et al. X-chromosomal STR-based genetic structure of Sichuan Tibetan minority ethnicity group and its relationships to various groups. Int. J. Legal Med. 2017. doi: 10.1007/s00414-017-1672-9.
- 55. He G, et al. Genetic polymorphisms for 19 X-STR loci of Sichuan Han ethnicity and its comparison with Chinese populations. Leg. Med. (Tokyo) 2017;29:6–12. doi: 10.1016/j.legalmed.2017.09.001.
- 56. Zhang YD, et al. Forensic evaluation and population genetic study of 30 insertion/deletion polymorphisms in a Chinese Yi group. Electrophoresis. 2015;36:1196–1201. doi: 10.1002/elps.201500003.
- 57. Wang Z, Du W, He G, Liu J, Hou Y. Forensic characteristics and phylogenetic analysis of Hubei Han population in central China using 17 Y-STR loci. Forensic Sci. Int. Genet. 2017;29:e4–e8. doi: 10.1016/j.fsigen.2017.04.013.
- 58. He G, et al. Genetic variation and forensic characterization of highland Tibetan ethnicity reveled by autosomal STR markers. Int. J. Legal Med. 2018. doi: 10.1007/s00414-017-1765-5.
- 59. Excoffier L, Lischer HE. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 2010;10:564–567. doi: 10.1111/j.1755-0998.2010.02847.x.
- 60. Zhao F, Wu X, Cai G, Xu C. The application of Modified-Powerstates software in forensic biostatistics (In Chinese). Chin. J. Forensic Med. 2003;18:297–298.
- 61. Kovach WL. MVSP-A MultiVariate Statistical Package for Windows, ver. 3.1. Kovach Computing Services, Pentraeth, Wales, U.K. 2007.
- 62. Hansen J. Using SPSS for Windows and Macintosh: Analyzing and Understanding Data. Amer. Statistician. 2005;59:113–113. doi: 10.1198/tas.2005.s139.
- 63. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol. Biol. Evol. 2016;33:1870–1874. doi: 10.1093/molbev/msw054.
- 64. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 1987;4:406–425. doi: 10.1093/oxfordjournals.molbev.a040454.
- 65. Scientific Working Group on DNA Analysis Methods (SWGDAM). SWGDAM Interpretation Guidelines for Autosomal STR Typing by Forensic DNA Testing Laboratories. 2010.