Abstract
Understanding genetic variation between populations is important because it affects the portability of human genome-wide analytical methods. We compared genetic variation and substructure between Malawians and other African and non-African HapMap populations. Allele frequencies and adjacent linkage disequilibrium (LD) were measured for 617,715 single nucleotide polymorphisms (SNPs) across subject genomes. Allele frequencies in the Malawian population (N = 226) were highly correlated with allele frequencies in HapMap populations of African ancestry (AFA, N = 376), namely Yoruban in Ibadan, Nigeria (Spearman’s r2 = 0.97), Luhya in Webuye, Kenya (r2 = 0.97), African Americans in the southwest United States (r2 = 0.94), and Maasai in Kinyawa, Kenya (r2 = 0.91). This correlation was much lower between Malawians and other ancestry populations (r2 < 0.52). LD correlations between Malawians and HapMap populations were strongest for the populations of African ancestry (AFA r2 > 0.82, other ancestries r2 < 0.57). Principal components analyses revealed little population substructure within our Malawi sample but provided clear distinction between Malawians, AFA populations, and two European populations. Four SNPs within the lactase gene (LCT) had substantially different allele frequencies between the Malawi population and Maasai in Kinyawa, Kenya (rs3769013, rs730005, rs3769012, rs2304370; p values < 1×10−33).
Keywords: genome variation, African ancestry, population genetics, population substructure, LCT, lactase, HapMap
Introduction
The International HapMap Project has offered an extraordinary amount of information on common genetic variation across the human genome, leading to publicly available data including more than one million single nucleotide polymorphisms (SNPs) genotyped in populations across the world (1). Currently, genetic data from 11 populations are included: Han Chinese in Beijing, China (CHB), Japanese in Tokyo, Japan (JPT), Utah residents with Northern and Western European ancestry from the CEPH collection (CEU), Yoruba in Ibadan, Nigeria (YRI), African ancestry in Southwest USA (ASW), Chinese in Metropolitan Denver, Colorado (CHD), Gujarati Indians in Houston, Texas (GIH), Luhya in Webuye, Kenya (LWK), Mexican ancestry in Los Angeles, California (MEX), Maasai in Kinyawa, Kenya (MKK), and Toscans in Italy (TSI). HapMap data allow researchers to characterize haplotype patterns and allele frequencies of SNPs in the HapMap populations and to compare such patterns to those observed in other populations. This exercise has helped researchers to understand global genetic diversity more fully and has facilitated a greater understanding of the genetic etiology of disease.
A number of studies have described transferability or “portability” of tagSNPs in the HapMap populations to tagSNPs in other populations. Most of these studies have focused on specific ENCODE regions (2–4), candidate genes (5, 6), or collections of SNPs genotyped across one to three chromosomes (7–10). No such studies have included Malawian individuals. The goal of this work was to compare genetic variation among HapMap populations of African ancestry (AFA) to a population from Blantyre, Malawi. Information on 617,715 SNPs across 22 autosomal chromosomes of the human genome is described. To our knowledge, this study is the first to incorporate Malawians into the population genetics forum, and it adds an assessment of genetic variability within Malawi in relation to self-reported ethnicity. The findings from this study add to our understanding of genomic variation across the African continent as well as within one urban area of Malawi.
Materials and Methods
Malawi Study Population
The participants involved in this work are a subset of a larger previously conducted cohort study of malaria and HIV in pregnancy (11, 12). The prospective cohort was conducted from 2000 to 2004 and included 3,825 consenting pregnant women admitted to Queen Elizabeth Central Hospital (12). HIV testing was performed at delivery and patients were followed up at 6 and 12 weeks post delivery. A total of 1,157 women tested positive for HIV, 884 of whom delivered at Queen Elizabeth Central Hospital resulting in 807 singleton live births. At delivery, 751 infants were tested for HIV, identifying 65 HIV positive infants at birth. Of the 668 HIV negative infants, 179 were lost to follow-up. A total of 507 infants were tested for HIV at 6 or 12 weeks, resulting in 89 additional HIV positive infants. A subset of the cohort (N = 246) consisting of all HIV positive (at birth, 6 weeks, or 12 weeks) infants and an equal proportion of HIV negative (at all visits) infants of HIV positive mothers was selected. The HIV negative infants were obtained from a random sample of the HIV-exposed negative infants and had a similar distribution across time of enrollment as the cases. Both positive and negative infants were required to have quality DNA samples available. This subset was originally used in a genome wide association study (GWAS) to assess the association between SNPs across infant genomes and susceptibility to maternal HIV infection (24). The genome-wide SNP data were applied to this work in order to evaluate the external generalizability of our findings.
Self-reported ethnicity was available for the final Malawi dataset of 226 subjects after quality control (see below), which included the following groups: Ngoni (N = 62), Lomwe (N = 58), Yao (N = 33), Chewa (N = 18), Tumbuka (N = 15), Mang’anja/Nyanja (N = 15), Sena (N = 12), Khokhola (N = 4), Chipeta (N = 1), Likoma (N = 1), Nkhonde (N = 1), Ntcheu (N = 1), Portuguese (N = 1), Tonga (N = 1), and 3 missing. A categorical variable was created corresponding to the first seven groups listed and groups with a frequency less than 12 were combined into one category (Other, N=15). The Nyanja and Mang’anja are different names for the same ethnic group (13). The study population included from our data collection will be referred to as individuals of various self-reported ethnicity in Blantyre, Malawi (BMW).
Genotyping
Genotyping was completed for 114 HIV-exposed, infected infants and 132 exposed, uninfected infants. Additional cases were not available because DNA samples were of poor quality or had a very low concentration of DNA. The genotyping was performed at Duke University, using Illumina’s HumanHap650Y Genotyping BeadChip version 3 (Illumina, Inc., San Diego, CA). This BeadChip enabled whole-genome genotyping of over 655,000 tagSNPs derived from the International HapMap Project (14) and over 100,000 tag SNPs selected based on the Yoruban Nigerian HapMap Population. The 650Y BeadChip v3 utilized information in dbSNP up to version 126.
Quality Control
For the Malawi study population, quality control for genotyping error was performed at Duke University as previously described (15). Briefly, all samples were brought into a BeadStudio data file using an Illumina cluster file and clustering of samples was evaluated in order to determine random clustering of SNPs. Samples with very low call rates (<95%) were excluded. Subsequent reclustering of undeleted SNPs and additional exclusion by call rate was performed (15). SNPs with a Het Excess value from −1.0 to −0.1 or from 0.1 to 1.0 were evaluated to determine if raw and normalized data indicated clean calls for the genotypes (15).
Statistical quality control measures were performed at UNC Chapel Hill. Individuals missing more than 10% of marker data and SNPs with a genotyping rate less than or equal to 10% were excluded from analyses. Related individuals were identified by first estimating identity by descent (IBD). A small number of individuals with estimated genome-wide IBD values > 0.05 were removed (N = 5). All statistical quality control measures were performed in PLINK version 1.05 (16). After completing quality control, a total of 226 BMW subjects were included in subsequent data analyses.
Data Management and Integration of HapMap Populations
All HapMap populations from Phase 3 were included in this study. HapMap data was downloaded in PLINK format from the International HapMap Project website (http://hapmap.ncbi.nlm.nih.gov/). The samples from the 11 HapMap populations were genotyped on approximately 1.5 million SNPs using the Illumina Human1M and Affymetrix Genome-Wide Human SNP Array 6.0 platforms. For this study, populations were processed through identical statistical quality control as completed for the Malawi population with the exception of IBD estimation. In order to ensure the populations included only unrelated individuals, offspring of populations with duos or trios (ASW, CEU, MEX, MKK, YRI) were removed. HapMap populations were then merged, either individually or jointly, with the Malawi population. Due to strand differences between the Malawi population and the HapMap populations, some Malawi strands were flipped and the files were remerged. The genotype data from HapMap and our sample was based on the same dbSNP build 126.
Statistical Analysis
Four analytic methods were used to compare genetic variation between populations: 1) the correlation of allele frequencies across the genome, 2) the correlation of adjacent SNP-SNP linkage disequilibrium (LD) across the genome, 3) allelic-based chi-square tests to evaluate the association between population and SNPs, and 4) principal component (PC) analysis to evaluate population substructure. Each HapMap population was individually compared to the Malawi population and, within Malawi, the self-reported ethnic groups were compared to one another. SNPs included on Illumina’s HumanHap650Y Genotyping BeadChip were selected based on allele frequencies and LD patterns in the CEU and YRI HapMap samples. To reduce the potential for bias due to the SNP selection strategy on the HumanHap650Y panel, when comparing different populations we excluded SNPs with relatively rare minor allele frequencies (MAF) in our calculations. Specifically, in all analyses, comparisons were only made between SNPs with MAF > 0.05 in each included population sample.
Comparison of Allele Frequencies and Adjacent SNP-SNP Linkage Disequilibrium
Allele frequencies were computed for each population using PLINK version 1.05 (16). Each allele frequency file was formatted and imported into STATA version 10 (17). In order to compare allele frequencies across populations, a generic coded allele was set for each SNP, using alphabetical hierarchy (A < C < T < G). For example, if the alleles were A and G for a particular SNP, the coded allele would be designated as A and the non-coded allele as G, regardless of the observed allele frequency in a population. Spearman’s correlation coefficient was computed for the allele frequencies of the coded alleles between each pair of populations (i.e. BMW vs. ASW, BMW vs. YRI, etc.) using all SNPs with MAF >0.05 in both populations. Adjacent linkage disequilibrium (LD) was estimated by calculating the standard r2 value for all pairs of adjacent SNPs using PLINK version 1.05 (16). For each pair of populations, Spearman’s correlation coefficient between populations was calculated using all adjacent SNP pair r2 estimates.
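The coding and correlation steps described above can be illustrated with a minimal Python sketch. It is not the PLINK/STATA workflow used in the study: the function names, the input layout (a mapping from SNP identifier to the two alleles and the frequency of the first allele), and the toy values are all hypothetical. The allele ordering follows the hierarchy stated in the text, and Spearman’s coefficient is computed as Pearson’s correlation on average ranks.

```python
from typing import Dict, List, Sequence, Tuple

# Allele hierarchy exactly as stated in the text (A < C < T < G).
ALLELE_ORDER = {"A": 0, "C": 1, "T": 2, "G": 3}

# Hypothetical input layout: SNP id -> (allele1, allele2, frequency of allele1).
FreqTable = Dict[str, Tuple[str, str, float]]


def coded_allele(a1: str, a2: str) -> str:
    """Return the allele that ranks first in the hierarchy."""
    return min(a1, a2, key=ALLELE_ORDER.__getitem__)


def coded_frequency(a1: str, a2: str, freq_a1: float) -> float:
    """Frequency of the coded allele, whichever allele the file listed first."""
    return freq_a1 if coded_allele(a1, a2) == a1 else 1.0 - freq_a1


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(average_ranks(x), average_ranks(y))


def compare_populations(pop1: FreqTable, pop2: FreqTable,
                        maf_min: float = 0.05) -> Tuple[float, int]:
    """Spearman correlation of coded-allele frequencies over shared SNPs
    whose minor allele frequency exceeds maf_min in both populations."""
    xs, ys = [], []
    for snp in sorted(pop1.keys() & pop2.keys()):
        f1 = coded_frequency(*pop1[snp])
        f2 = coded_frequency(*pop2[snp])
        if min(f1, 1 - f1) > maf_min and min(f2, 1 - f2) > maf_min:
            xs.append(f1)
            ys.append(f2)
    return spearman(xs, ys), len(xs)


# Toy data (invented for illustration only).
pop_a = {"rs1": ("A", "G", 0.30), "rs2": ("C", "T", 0.60),
         "rs3": ("G", "A", 0.20), "rs4": ("A", "C", 0.02)}
pop_b = {"rs1": ("G", "A", 0.65), "rs2": ("C", "T", 0.55),
         "rs3": ("A", "G", 0.75), "rs4": ("A", "C", 0.40)}
rho, n_snps = compare_populations(pop_a, pop_b)
```

Fixing the coded allele by a rule that does not depend on the data means the same allele is tracked in every population, even when one file lists the alleles in the opposite order (as for rs1 and rs3 in the toy data).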
Allelic-based (1 df) chi-square tests were systematically computed to identify SNPs with significantly different allele frequencies between the Malawi population and each HapMap population containing subjects of African ancestry. Specifically, we defined a dichotomous outcome variable, with value of ‘1’ assigned to the BMW population and value of ‘2’ assigned to the comparison population (ASW, LWK, MKK, or YRI). Two measures of the genomic inflation factor, based on the median and mean chi-square statistic over all SNP comparisons, were computed for each genome-wide association analysis using PLINK version 1.05 (16). Because chi-square statistics are affected by sample size under the alternative hypothesis, analyses were also performed using a random selection of 42 individuals from each HapMap comparison group to facilitate more direct comparison of the results across the HapMap populations. The value of 42 was used as it represented the smallest group (ASW, N = 42).
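As an illustration of the test and the two summary measures described above, the following Python sketch computes a 1-df allelic chi-square statistic from a 2×2 table of allele counts, its p value, and median- and mean-based summaries over many SNPs. It is a sketch under stated assumptions rather than the PLINK implementation: the function names are hypothetical, no continuity correction is applied, and the median-based inflation factor is assumed to follow the conventional definition (median statistic divided by the median of a 1-df chi-square distribution, about 0.455).

```python
import math
from statistics import mean, median
from typing import Sequence, Tuple

# Median of a 1-df chi-square distribution (assumed scaling constant).
CHI2_1DF_MEDIAN = 0.4549364


def allelic_chi_square(a_pop1: int, b_pop1: int,
                       a_pop2: int, b_pop2: int) -> float:
    """Pearson chi-square (1 df) for allele counts of alleles a and b
    in two populations; each individual contributes two alleles."""
    n = a_pop1 + b_pop1 + a_pop2 + b_pop2
    row1 = a_pop1 + b_pop1
    row2 = a_pop2 + b_pop2
    col_a = a_pop1 + a_pop2
    col_b = b_pop1 + b_pop2
    if 0 in (row1, row2, col_a, col_b):
        return 0.0  # monomorphic SNP or empty group: no information
    cross = a_pop1 * b_pop2 - b_pop1 * a_pop2
    return n * cross ** 2 / (row1 * row2 * col_a * col_b)


def p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def inflation_summary(chi2_values: Sequence[float]) -> Tuple[float, float]:
    """Return (median-based inflation factor, mean chi-square)."""
    return median(chi2_values) / CHI2_1DF_MEDIAN, mean(chi2_values)
```

Under the null hypothesis of no frequency difference both summaries are expected to be close to 1, so values well above 1 indicate genome-wide differences in allele frequency between the two groups.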
Population Substructure
Population substructure was evaluated using PC analyses for: (1) the Malawi population, (2) the Malawi population combined with the HapMap populations of African (ASW, LWK, MKK, YRI) ancestry, and (3) the Malawi population combined with the HapMap populations of African and European (CEU, TSI) ancestry. The PC analyses were conducted using EIGENSOFT version 2.0 (18). SNP inclusion in the PC analysis was restricted to autosomal SNPs that had MAF > 0.05 and observed genotype frequencies consistent with Hardy-Weinberg equilibrium expected proportions (p > 0.001) in each participating individual population. Strict SNP pruning based on pair-wise SNP-SNP LD was conducted to identify a subset of independent SNPs for inclusion in PC analysis. Specifically, we calculated pair-wise SNP-SNP LD, measured by r2, between all SNP pairs within 500 kb in the BMW sample using PLINK. A custom computer program was used to select the largest number of SNPs from each chromosome such that each selected SNP had no other selected SNPs within 500 kb that were in LD with it (defined by r2 > 0.01). Based on these selection rules, we identified (1) 23,612, (2) 18,481 and (3) 16,912 SNPs for use in the three PC analyses, respectively.
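The custom selection program is not described further, so the following Python sketch shows only one simple way to satisfy the stated constraint: a greedy pass along a chromosome that keeps a SNP when no already-selected SNP within 500 kb has r2 > 0.01 with it. A greedy pass is not guaranteed to yield the largest possible subset, and the function name, the input layout, and the treatment of missing r2 pairs as "not in LD" are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List


def prune_chromosome(positions: Dict[str, int],
                     r2: Dict[FrozenSet[str], float],
                     window: int = 500_000,
                     r2_max: float = 0.01) -> List[str]:
    """Greedy LD pruning for one chromosome.

    positions: SNP id -> base-pair position.
    r2: unordered SNP pair -> pairwise r2 (pairs absent from the
        mapping are assumed to have r2 = 0).
    """
    selected: List[str] = []
    for snp in sorted(positions, key=positions.__getitem__):
        conflict = any(
            abs(positions[snp] - positions[kept]) <= window
            and r2.get(frozenset((snp, kept)), 0.0) > r2_max
            for kept in selected
        )
        if not conflict:
            selected.append(snp)
    return selected


# Toy example (invented positions and r2 values).
toy_positions = {"s1": 100_000, "s2": 200_000, "s3": 300_000, "s4": 900_000}
toy_r2 = {
    frozenset(("s1", "s2")): 0.5,
    frozenset(("s2", "s3")): 0.2,
    frozenset(("s1", "s3")): 0.005,
    frozenset(("s3", "s4")): 0.9,
}
kept = prune_chromosome(toy_positions, toy_r2)
```

In the toy example s2 is dropped because it is in LD with s1, while s4 is kept despite its high r2 with s3 because the two are more than 500 kb apart.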
Finally, we performed global ancestry estimation using the software ADMIXTURE on our combined sample of 7 African and European populations using the same 16,912 SNPs included in the PC analyses (19). ADMIXTURE uses a maximum likelihood approach to model the probabilities of the observed genotype data using ancestry proportions and population allele frequencies. Similar to the program STRUCTURE, ADMIXTURE requires the user specification of the number of postulated ancestral populations that preceded the observed populations included in the study sample. For this study, we considered K = 2, 3 and 5 ancestral populations.
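For orientation, the likelihood underlying this model is usually written in the form below. The notation is introduced here for illustration and does not appear in the original text: g_ij is the number of copies of a reference allele (0, 1 or 2) carried by individual i at SNP j, q_ik is the proportion of individual i's ancestry attributed to ancestral population k, and f_kj is the frequency of the reference allele at SNP j in ancestral population k.

```latex
\ell(Q, F) \;=\; \sum_{i=1}^{N} \sum_{j=1}^{M}
  \left\{ g_{ij} \,\ln\!\left[ \sum_{k=1}^{K} q_{ik} f_{kj} \right]
        + (2 - g_{ij}) \,\ln\!\left[ 1 - \sum_{k=1}^{K} q_{ik} f_{kj} \right] \right\},
\qquad
\sum_{k=1}^{K} q_{ik} = 1, \quad q_{ik} \ge 0, \quad 0 \le f_{kj} \le 1 .
```

In this study M = 16,912 SNPs and K takes the values 2, 3 and 5; the population summaries reported for the ancestry analysis are means, standard deviations and ranges of the estimated q_ik within each sampled population.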
Results
Quality Control
The HapMap data available at the time of this study contained approximately 1.5 million SNPs for 1,115 individuals of 11 unique ethnic groups. The Malawi data included 112 males and 114 females, with a genotyping call rate of 99.975%. Following quality control, the combined HapMap and Malawi dataset included 1,150 individuals, 602 of whom were of African ancestry, and 633,763 SNPs, 617,715 of which were on autosomal chromosomes and incorporated in the analyses.
Comparison of Allele Frequencies across Populations
The allele frequencies of the Malawi population were highly correlated with the allele frequencies of the HapMap AFA populations. Among the AFA subgroups, the allele frequencies of the Yoruban of Ibadan, Nigeria and the Luhya in Webuye, Kenya were most strongly correlated with those of the Malawians (Table 1, Figure S1). Interestingly, allele frequencies of the Malawi population were more closely correlated with allele frequencies of individuals of African ancestry in the Southwest USA than they were with those of the Maasai in Kinyawa, Kenya (Table 1, Figure S1). Much lower correlations in allele frequencies were observed between the Malawi population and HapMap populations of other ancestry (r2 < 0.52).
Table 1.
Correlation of allele frequency across populations1
Ancestry2 | Population | BMW | YRI | LWK | MKK | ASW | CEU | TSI | CHB | CHD | GIH | JPT | MEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AFA | BMW | 1 | | | | | | | | | | | |
AFA | YRI | 0.972 | 1 | | | | | | | | | | |
AFA | LWK | 0.971 | 0.960 | 1 | | | | | | | | | |
AFA | MKK | 0.913 | 0.906 | 0.932 | 1 | | | | | | | | |
AFA | ASW | 0.942 | 0.947 | 0.937 | 0.917 | 1 | | | | | | | |
EUA | CEU | 0.475 | 0.474 | 0.498 | 0.618 | 0.622 | 1 | | | | | | |
EUA | TSI | 0.486 | 0.485 | 0.510 | 0.634 | 0.628 | 0.967 | 1 | | | | | |
ASA | CHB | 0.418 | 0.417 | 0.433 | 0.500 | 0.498 | 0.607 | 0.602 | 1 | | | | |
ASA | CHD | 0.415 | 0.415 | 0.430 | 0.496 | 0.495 | 0.602 | 0.597 | 0.976 | 1 | | | |
ASA | GIH | 0.511 | 0.510 | 0.532 | 0.632 | 0.628 | 0.850 | 0.848 | 0.712 | 0.709 | 1 | | |
ASA | JPT | 0.415 | 0.415 | 0.430 | 0.497 | 0.495 | 0.603 | 0.598 | 0.959 | 0.952 | 0.709 | 1 | |
MXA | MEX | 0.490 | 0.490 | 0.508 | 0.603 | 0.609 | 0.834 | 0.826 | 0.735 | 0.727 | 0.811 | 0.733 | 1 |
Spearman’s correlation coefficients for allele frequencies, MAF > 0.05
Abbreviations: ASW, African ancestry in Southwest USA; BMW, Individuals of various self-reported ancestry in Blantyre, Malawi; CEU, Utah residents with Northern and Western European ancestry from the CEPH collection; CHB, Han Chinese in Beijing, China; CHD, Chinese in Metropolitan Denver, Colorado; GIH, Gujarati Indians in Houston, Texas; JPT, Japanese in Tokyo, Japan; LWK, Luhya in Webuye, Kenya; MEX, Mexican ancestry in Los Angeles, California; MKK, Maasai in Kinyawa, Kenya; TSI, Toscans in Italy; YRI, Yoruba in Ibadan, Nigeria.
A total of 14 self-reported ethnic groups comprised the BMW group. The genotyping was completed for infants of the mother-infant pairs, so the ethnic groups reflect self-reported maternal ethnicity. Data on paternal ethnic group were not available. Allele frequencies were highly correlated across all ethnic groups in the Malawian study population (Table 2). The greatest correlation in allele frequency was observed for Ngoni and Lomwe, which together represented the majority of infants in the dataset (27% and 26%, respectively). The smallest correlation was observed between the Sena and Mang’anja/Nyanja ethnic groups (Table 2). All SNPs summarized were restricted to having a MAF > 0.05. This resulted in approximately 569,373 SNPs compared by population. This number was slightly different for each comparison, as the number of SNPs with a MAF > 0.05 varied by population.
Table 2.
Correlation of allele frequency across Malawi ethnic groups1
Population | Ngoni | Lomwe | Yao | Chewa | Tumbuka | Nyanja/Mang'anja | Sena | Other |
---|---|---|---|---|---|---|---|---|
Ngoni | 1 | | | | | | | |
Lomwe | 0.969 | 1 | | | | | | |
Yao | 0.958 | 0.956 | 1 | | | | | |
Chewa | 0.937 | 0.935 | 0.925 | 1 | | | | |
Tumbuka | 0.928 | 0.927 | 0.916 | 0.897 | 1 | | | |
Nyanja/Mang'anja | 0.919 | 0.918 | 0.907 | 0.888 | 0.880 | 1 | | |
Sena | 0.914 | 0.913 | 0.903 | 0.883 | 0.875 | 0.867 | 1 | |
Other | 0.928 | 0.927 | 0.916 | 0.897 | 0.889 | 0.879 | 0.876 | 1 |
Spearman’s correlation coefficients for allele frequencies, MAF > 0.05
Similar to allele frequencies, adjacent SNP-SNP LD was highly correlated across populations of African ancestry (Table 3). A lower correlation in adjacent LD was observed between the Malawi population and other ancestry HapMap populations (r2 < 0.57, Table 3, Figure S2). The average LD between pairs of adjacent SNPs in the Malawi population was similar to that observed in the other African ancestry HapMap populations and substantially lower than the average LD observed between adjacent SNPs in the other ancestry HapMap populations (Table S1).
Table 3.
Correlation of adjacent linkage disequilibrium across populations1
Ancestry2 | Population | BMW | YRI | LWK | MKK | ASW | CEU | TSI | CHB | CHD | GIH | JPT | MEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AFA | BMW | 1 | | | | | | | | | | | |
AFA | YRI | 0.896 | 1 | | | | | | | | | | |
AFA | LWK | 0.887 | 0.857 | 1 | | | | | | | | | |
AFA | MKK | 0.826 | 0.805 | 0.836 | 1 | | | | | | | | |
AFA | ASW | 0.822 | 0.819 | 0.802 | 0.805 | 1 | | | | | | | |
EUA | CEU | 0.543 | 0.533 | 0.555 | 0.666 | 0.634 | 1 | | | | | | |
EUA | TSI | 0.550 | 0.540 | 0.562 | 0.676 | 0.636 | 0.937 | 1 | | | | | |
ASA | CHB | 0.523 | 0.514 | 0.529 | 0.604 | 0.569 | 0.700 | 0.695 | 1 | | | | |
ASA | CHD | 0.520 | 0.511 | 0.526 | 0.600 | 0.564 | 0.694 | 0.689 | 0.945 | 1 | | | |
ASA | GIH | 0.564 | 0.553 | 0.574 | 0.673 | 0.634 | 0.852 | 0.848 | 0.768 | 0.761 | 1 | | |
ASA | JPT | 0.517 | 0.508 | 0.523 | 0.597 | 0.562 | 0.692 | 0.687 | 0.932 | 0.925 | 0.761 | 1 | |
MXA | MEX | 0.551 | 0.542 | 0.560 | 0.652 | 0.623 | 0.829 | 0.820 | 0.760 | 0.753 | 0.810 | 0.755 | 1 |
Spearman’s correlation coefficients, MAF>0.05. Linkage disequilibrium measured in each population using adjacent marker r2.
Abbreviations: ASW, African ancestry in Southwest USA; BMW, Individuals of various self-reported ancestry in Blantyre, Malawi; CEU, Utah residents with Northern and Western European ancestry from the CEPH collection; CHB, Han Chinese in Beijing, China; CHD, Chinese in Metropolitan Denver, Colorado; GIH, Gujarati Indians in Houston, Texas; JPT, Japanese in Tokyo, Japan; LWK, Luhya in Webuye, Kenya; MEX, Mexican ancestry in Los Angeles, California; MKK, Maasai in Kinyawa, Kenya; TSI, Toscans in Italy; YRI, Yoruba in Ibadan, Nigeria.
SNPs Associated with Population Membership
Allelic-based (1-df) chi-square tests revealed that the allele frequencies of many SNPs were significantly different between the Malawians and all four HapMap populations comprised of individuals of African ancestry. The greatest differences in observed allele frequencies existed between the Malawians and the Maasai in Kinyawa, Kenya (BMW vs. MKK genomic inflation factor (GIF) or median χ2 = 9.21, mean χ2 = 9.16; random sample of 42 GIF = 4.31, mean χ2 = 4.49). The Malawians and the individuals of African ancestry in the Southwest United States (BMW vs. ASW) also demonstrated strong differences in allele frequencies (GIF = 2.54, mean χ2 = 2.72, for 42 total individuals in the ASW group). Smaller, though still highly significant, differences in allele frequencies were observed between the Malawi population and the Luhya in Webuye, Kenya (BMW vs. LWK) (GIF = 2.22, mean χ2 = 2.25; sample of 42 GIF = 1.77, mean χ2 = 1.78) and between the Malawi population and the Yorubans in Ibadan, Nigeria (BMW vs. YRI) (GIF = 2.59, mean χ2 = 2.62; sample of 42 GIF = 1.78, mean χ2 = 1.81).
The top SNPs associated with population membership were investigated for functional significance, for each BMW vs. population comparison. Depending on the comparison of interest, many SNPs reached genome-wide statistical significance, having Bonferroni corrected p values < 0.05 (Figure 1, Table 4). For the comparison of allele frequencies between BMW and MKK, over 100 SNPs had a Bonferroni corrected p value < 1×10−23. The most significant SNPs (the top 16) were found on chromosome 2. Of the 30 most significant SNPs within genes, 4 were located within the Lactase gene (LCT) (frequencies of the 3 most significant SNPs are displayed in Figure 2 and Table 4). This gene is involved in production of the lactase enzyme, essential for the digestion of lactose, and has clinical implications for lactose intolerance (20).
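The relationship between the unadjusted and Bonferroni corrected p values can be made concrete with a short Python sketch. The function names are hypothetical, and the number of tests per comparison is not stated in the text; it is inferred here from the rounded UNADJ and BONF values reported for the top BMW vs. MKK and BMW vs. YRI SNPs, so it should be read as approximate.

```python
def bonferroni(p_unadjusted: float, n_tests: float) -> float:
    """Bonferroni corrected p value, capped at 1."""
    return min(1.0, p_unadjusted * n_tests)


def implied_number_of_tests(p_unadjusted: float, p_bonferroni: float) -> float:
    """Number of tests implied by a reported pair of p values
    (only meaningful when the corrected value was not capped at 1)."""
    return p_bonferroni / p_unadjusted


# Reported values for the top SNP in two comparisons (rounded in the source).
n_mkk = implied_number_of_tests(5.31e-68, 2.64e-62)  # BMW vs. MKK, rs6430594
n_yri = implied_number_of_tests(9.26e-28, 4.68e-22)  # BMW vs. YRI, rs2190687
```

Both ratios come out at roughly half a million tests, and the small difference between comparisons is consistent with the number of SNPs passing the MAF > 0.05 filter varying by population pair.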
Figure 1.
SNP Associations with population membership. Individuals in Blantyre, Malawi (BMW) were compared to each African ancestry HapMap population: Individuals of African ancestry in the Southwest USA (ASW), Luhya in Webuye, Kenya (LWK), Maasai in Kinyawa, Kenya (MKK), and Yoruban of Ibadan, Nigeria (YRI).
Table 4.
Top 10 Outlier SNPs for each comparison of BMW vs. HapMap populations of African ancestry1
Comparison | CHR | SNP | POS | MAF1 | MAF2 | UNADJ | BONF | Gene |
---|---|---|---|---|---|---|---|---|
BMW vs. MKK | 2 | rs6430594 | 136435643 | 0.69 | 0.08 | 5.31E-68 | 2.64E-62 | Aspartyl-tRNA synthetase (DARS) |
BMW vs. MKK | 2 | rs12472293 | 136364547 | 0.11 | 0.70 | 7.30E-61 | 3.63E-55 | NA |
BMW vs. MKK | 2 | rs309143 | 136430648 | 0.78 | 0.18 | 4.70E-58 | 2.34E-52 | Aspartyl-tRNA synthetase (DARS) |
BMW vs. MKK | 2 | rs3769013 | 136272652 | 0.81 | 0.23 | 8.33E-54 | 4.14E-48 | Lactase (LCT) |
BMW vs. MKK | 2 | rs730005 | 136299164 | 0.71 | 0.16 | 2.13E-51 | 1.06E-45 | Lactase (LCT) |
BMW vs. MKK | 2 | rs3769012 | 136272950 | 0.71 | 0.16 | 2.73E-50 | 1.36E-44 | Lactase (LCT) |
BMW vs. MKK | 2 | rs961360 | 136110128 | 0.70 | 0.17 | 2.81E-48 | 1.40E-42 | R3H domain containing 1 (R3HDM1) |
BMW vs. MKK | 2 | rs6430585 | 136223397 | 0.15 | 0.70 | 1.35E-47 | 6.72E-42 | UBX domain protein 4 (UBXN4) |
BMW vs. MKK | 2 | rs3806502 | 136004743 | 0.73 | 0.21 | 6.98E-46 | 3.47E-40 | R3H domain containing 1 (R3HDM1) |
BMW vs. MKK | 2 | rs2305248 | 135644782 | 0.72 | 0.21 | 9.44E-44 | 4.69E-38 | RAB3 GTPase activating protein subunit 1 (catalytic) (RAB3GAP1) |
BMW vs. YRI | 19 | rs2190687 | 14765415 | 0.50 | 0.06 | 9.26E-28 | 4.68E-22 | NA |
BMW vs. YRI | 2 | rs6733349 | 231976556 | 0.44 | 0.06 | 2.48E-22 | 1.25E-16 | NA |
BMW vs. YRI | 7 | rs1717725 | 38071558 | 0.05 | 0.30 | 1.50E-18 | 7.59E-13 | NA |
BMW vs. YRI | 7 | rs6944302 | 79942827 | 0.49 | 0.17 | 5.20E-18 | 2.62E-12 | Guanine nucleotide binding protein, alpha transducing 3 (GNAT3) |
BMW vs. YRI | 7 | rs12700014 | 18930601 | 0.54 | 0.23 | 1.19E-15 | 6.00E-10 | Histone deacetylase 9 (HDAC9) |
BMW vs. YRI | 9 | rs3739821 | 129742298 | 0.32 | 0.08 | 8.61E-15 | 4.35E-09 | Family with sequence similarity 102, member A (FAM102A) |
BMW vs. YRI | 21 | rs494619 | 18347077 | 0.35 | 0.07 | 1.84E-14 | 9.27E-09 | NA |
BMW vs. YRI | 6 | rs2301220 | 33146744 | 0.26 | 0.57 | 6.68E-14 | 3.38E-08 | Major histocompatibility complex, class II, DP alpha 1 (HLA-DPA1) |
BMW vs. YRI | 7 | rs10216027 | 79968467 | 0.30 | 0.08 | 2.12E-13 | 1.07E-07 | Guanine nucleotide binding protein, alpha transducing 3 (GNAT3) |
BMW vs. YRI | 6 | rs6457713 | 33185754 | 0.33 | 0.63 | 3.33E-13 | 1.68E-07 | NA |
BMW vs. LWK | 2 | rs6733349 | 231976556 | 0.44 | 0.05 | 3.22E-19 | 1.62E-13 | NA |
BMW vs. LWK | 16 | rs1017228 | 21971218 | 0.31 | 0.06 | 2.16E-17 | 1.08E-11 | Chromosome 16 open reading frame 52 (C16orf52) |
BMW vs. LWK | 7 | rs11772387 | 48019075 | 0.29 | 0.06 | 9.83E-15 | 4.95E-09 | Sad1 and UNC84 domain containing 1 (SUNC1) |
BMW vs. LWK | 1 | rs2236906 | 208038108 | 0.58 | 0.27 | 7.48E-13 | 3.77E-07 | Interferon regulatory factor 6 (IRF6) |
BMW vs. LWK | 1 | rs4304614 | 107256813 | 0.25 | 0.55 | 1.08E-12 | 5.45E-07 | NA |
BMW vs. LWK | 2 | rs3789106 | 111437355 | 0.28 | 0.07 | 1.54E-12 | 7.77E-07 | Acyl-Coenzyme A oxidase-like (ACOXL) |
BMW vs. LWK | 6 | rs1572438 | 803970 | 0.16 | 0.42 | 3.71E-12 | 1.87E-06 | NA |
BMW vs. LWK | 7 | rs1915960 | 48011003 | 0.07 | 0.27 | 9.55E-12 | 4.80E-06 | Sad1 and UNC84 domain containing 1 (SUNC1) |
BMW vs. LWK | 7 | rs10248243 | 47478639 | 0.05 | 0.24 | 1.09E-11 | 5.49E-06 | Tensin 3 (TNS3) |
BMW vs. LWK | 7 | rs983186 | 27155184 | 0.06 | 0.25 | 1.36E-11 | 6.84E-06 | LOC100133311, similar to hCG1644697; no known function |
BMW vs. ASW | 1 | rs12030126 | 234879762 | 0.05 | 0.39 | 4.84E-20 | 2.43E-14 | NA |
BMW vs. ASW | 9 | rs7020021 | 132242790 | 0.48 | 0.10 | 7.63E-19 | 3.83E-13 | Hemicentin 2 (HMCN2) |
BMW vs. ASW | 23 | rs226711 | 98339530 | 0.08 | 0.49 | 9.45E-19 | 4.74E-13 | NA |
BMW vs. ASW | 2 | rs282268 | 224628420 | 0.06 | 0.39 | 5.16E-18 | 2.59E-12 | NA |
BMW vs. ASW | 9 | rs7045276 | 191644 | 0.07 | 0.39 | 3.87E-17 | 1.94E-11 | NA |
BMW vs. ASW | 2 | rs282273 | 224631266 | 0.05 | 0.35 | 5.75E-17 | 2.89E-11 | NA |
BMW vs. ASW | 2 | rs2577284 | 224637656 | 0.07 | 0.39 | 2.56E-16 | 1.28E-10 | NA |
BMW vs. ASW | 9 | rs3739821 | 129742298 | 0.42 | 0.08 | 3.32E-16 | 1.66E-10 | Family with sequence similarity 102, member A (FAM102A) |
BMW vs. ASW | 1 | rs6586395 | 232715442 | 0.10 | 0.45 | 4.77E-16 | 2.39E-10 | NA |
BMW vs. ASW | 8 | rs7003117 | 115759827 | 0.08 | 0.39 | 6.30E-16 | 3.16E-10 | NA |
Abbreviations: ASW, African ancestry in Southwest USA; BMW, Individuals of various self-reported ancestry in Blantyre, Malawi; LWK, Luhya in Webuye, Kenya; MKK, Maasai in Kinyawa, Kenya; YRI, Yoruba in Ibadan, Nigeria. NA if SNP not located within a gene.
Figure 2.
Lactase gene SNP frequencies by African ancestry population
Fewer SNPs were statistically significantly different between BMW and LWK. Three SNPs had a Bonferroni corrected p ≤ 1×10−7, 2 of which were located within genes (Table 4). For BMW vs. YRI, 8 SNPs had a Bonferroni corrected p ≤ 1×10−7. Three SNPs with Bonferroni corrected p ≤ 1×10−4 (1 shown in Table 4 with Bonferroni corrected p ≤ 1×10−7) were within the major histocompatibility complex, class II, DP alpha 1 (HLA-DPA1) gene. This gene is involved in many immunological functions, including interaction with HIV-1 (20). The comparison between BMW and ASW resulted in 69 SNPs with a Bonferroni corrected p ≤ 1×10−7, most of which were not located within genes (66.7%).
Population Substructure
Three separate PC analyses were performed to evaluate population substructure: 1) for the Malawi population by itself, 2) for the HapMap populations of African ancestry (ASW, LWK, MKK, YRI) combined with the Malawi population, and 3) for the HapMap populations of African ancestry, the Malawi population, and two HapMap populations of European ancestry (ASW, LWK, MKK, YRI, BMW, CEU, TSI). The eigenvalues (EVs) for the first 10 PCs for each analysis are reported in the supplementary material (Table S2). PC analysis for the Malawi population revealed little evidence for population substructure (Figure 3) and we were unable to detect any genetic variation across self-reported ethnicity. The largest EV, corresponding to PC1, was only 1.21 and the next nine largest EVs decreased very slowly (Table S2).
Figure 3.
No evidence of population substructure in Malawi population: Component 1 vs. 2. Analyses performed in EIGENSOFT software using 23,612 SNPs.
PC analyses using the HapMap African ancestry populations and the Malawi population revealed clear distinction among the different populations of African ancestry (Figure 4). Overall, the BMW, LWK and YRI samples showed tight within-population clustering of PC values while the ASW and MKK populations showed relatively strong dispersion. PC1 (corresponding EV = 11.62) values were generally ordered MKK > ASW > LWK > YRI > BMW, with considerable overlap between ASW and both MKK and LWK (Figure 4). PC1 provided distinction between BMW and all HapMap populations of African ancestry with the exception of YRI. Two BMW subjects overlapped with LWK. PC2 (corresponding EV = 2.96) provided two very distinct clusters, YRI and ASW in one cluster and BMW, LWK and MKK in the other cluster (Figure 4). The next eight PCs (corresponding EVs ranging from 2.75–2.13) were all driven by diversity within the MKK sample and provided little distinction between the other African samples (data not shown).
Figure 4.
Separation of BMW and African ancestry HapMap populations: Component 1 vs. 2. Analyses performed in EIGENSOFT software using 18,481 SNPs.
PC analyses including the Malawi population and the HapMap African and European ancestry populations revealed clear separation across all 7 populations (Figure S3 and Figure S4). PC1 (corresponding EV = 74.78) presented three distinct clusters, CEU and TSI in one cluster, ASW and MKK in the intermediate cluster, and LWK, YRI, and BMW in the third cluster. PC2 (corresponding EV = 6.42) showed separation of LWK and MKK from ASW, YRI, and BMW as well as separation of MKK from CEU and TSI (Figure S3). The next eight PCs (EVs ranging from 2.48–1.95) were largely driven by the variability in MKK samples (data not shown); however, PC4 showed clear separation between BMW and YRI samples (Figure S4).
Finally, model-based ancestry estimation results using the program ADMIXTURE for our sample that included all 7 African and European populations are summarized in Table 5 (for K = 2, 3 and 5 postulated ancestral populations) and graphically presented in a triangle plot for K = 3 postulated ancestral populations in Figure S5. For K = 2, individual (data not shown) and summary measures for each population suggest a strong clustering of the European populations in one ancestry population and three African populations (LWK, BMW and YRI) in the other ancestry population. ASW and MKK were largely indistinguishable and had proportions for both putative ancestry populations that were between the European and other African populations. When expanding the number of ancestral populations to K = 3, we noted complete separation between ASW and MKK, with MKK dominating one ancestral population, the Europeans dominating another ancestral population and two African populations (BMW and YRI) defining the extremes of the third ancestral population. ASW is positioned between the European and two African (BMW and YRI) populations while LWK is clearly defined by its own cluster positioned between MKK and the two African populations (BMW and YRI). The resulting triangle plot (Figure S5) looks similar to the PC plot of PC1 vs PC2 (Figure S3). Expanding the number of postulated ancestral populations to K = 5 provided clear separation between all populations and evidence for separation between individual members of MKK. Interestingly, ASW subjects were estimated to be a mixture of the three postulated ancestries most closely linked to YRI, BMW and CEU/TSI, respectively, even after effectively separating YRI from BMW.
Table 5.
Admixture analyses for clusters of size K = 2, 3 and 5 with reported means, (standard deviations) and [ranges] by ancestral population.
Ancestral population | CEU (N = 109) | TSI (N = 77) | ASW (N = 42) | YRI (N = 108) | BMW (N = 226) | LWK (N = 83) | MKK (N = 143)
---|---|---|---|---|---|---|---
K = 2 | | | | | | |
1, mean (SD) | 0.014 (0.010) | 0.028 (0.008) | 0.733 (0.088) | 0.930 (0.008) | 0.959 (0.014) | 0.900 (0.011) | 0.714 (0.034)
1, [range] | [0.000,0.057] | [0.010,0.049] | [0.457,0.906] | [0.909,0.948] | [0.865,0.984] | [0.869,0.926] | [0.617,0.834]
2, mean (SD) | 0.986 (0.010) | 0.972 (0.008) | 0.267 (0.088) | 0.070 (0.008) | 0.041 (0.014) | 0.100 (0.011) | 0.286 (0.034)
2, [range] | [0.943,1.000] | [0.951,0.990] | [0.094,0.543] | [0.052,0.091] | [0.016,0.135] | [0.074,0.131] | [0.166,0.383]
K = 3 | | | | | | |
1, mean (SD) | 0.012 (0.013) | 0.050 (0.013) | 0.098 (0.024) | 0.095 (0.023) | 0.070 (0.021) | 0.249 (0.037) | 0.678 (0.124)
1, [range] | [0.000,0.056] | [0.007,0.075] | [0.047,0.143] | [0.037,0.151] | [0.019,0.142] | [0.177,0.321] | [0.323,0.953]
2, mean (SD) | 0.017 (0.010) | 0.006 (0.008) | 0.663 (0.081) | 0.858 (0.017) | 0.905 (0.019) | 0.721 (0.033) | 0.242 (0.107)
2, [range] | [0.000,0.039] | [0.000,0.032] | [0.411,0.831] | [0.814,0.896] | [0.834,0.947] | [0.655,0.785] | [0.035,0.605]
3, mean (SD) | 0.971 (0.013) | 0.945 (0.010) | 0.239 (0.089) | 0.047 (0.011) | 0.025 (0.016) | 0.030 (0.010) | 0.080 (0.034)
3, [range] | [0.919,0.997] | [0.924,0.967] | [0.068,0.518] | [0.022,0.075] | [0.000,0.131] | [0.010,0.057] | [0.000,0.206]
K = 5 | | | | | | |
1, mean (SD) | 0.010 (0.011) | 0.006 (0.010) | 0.442 (0.062) | 0.667 (0.045) | 0.132 (0.044) | 0.224 (0.044) | 0.089 (0.050)
1, [range] | [0.000,0.047] | [0.000,0.038] | [0.301,0.600] | [0.549,0.776] | [0.000,0.278] | [0.112,0.329] | [0.000,0.194]
2, mean (SD) | 0.007 (0.011) | 0.036 (0.016) | 0.055 (0.029) | 0.042 (0.023) | 0.043 (0.021) | 0.202 (0.037) | 0.601 (0.181)
2, [range] | [0.000,0.046] | [0.003,0.075] | [0.000,0.127] | [0.000,0.106] | [0.000,0.010] | [0.133,0.290] | [0.000,1.000]
3, mean (SD) | 0.012 (0.013) | 0.003 (0.007) | 0.256 (0.048) | 0.248 (0.042) | 0.774 (0.045) | 0.495 (0.050) | 0.106 (0.097)
3, [range] | [0.000,0.045] | [0.000,0.030] | [0.136,0.382] | [0.157,0.357] | [0.626,0.918] | [0.395,0.631] | [0.000,0.499]
4, mean (SD) | 0.963 (0.012) | 0.938 (0.10) | 0.219 (0.091) | 0.019 (0.010) | 0.021 (0.015) | 0.019 (0.010) | 0.066 (0.035)
4, [range] | [0.912,0.989] | [0.917,0.957] | [0.042,0.503] | [0.000,0.044] | [0.000,0.125] | [0.001,0.047] | [0.000,0.189]
5, mean (SD) | 0.008 (0.009) | 0.017 (0.012) | 0.028 (0.016) | 0.025 (0.014) | 0.031 (0.014) | 0.060 (0.013) | 0.138 (0.160)
5, [range] | [0.000,0.035] | [0.000,0.043] | [0.000,0.063] | [0.000,0.066] | [0.000,0.080] | [0.027,0.096] | [0.000,1.000]
Abbreviations: ASW, African ancestry in Southwest USA; BMW, Individuals of various self-reported ancestry in Blantyre, Malawi; CEU, Utah residents with Northern and Western European ancestry from the CEPH collection; LWK, Luhya in Webuye, Kenya; MKK, Maasai in Kinyawa, Kenya; TSI, Toscans in Italy; YRI, Yoruba in Ibadan, Nigeria.
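The mean proportions in Table 5 can be checked for internal consistency, since the ancestry fractions of each sample should sum to approximately 1 for every K. The sketch below transcribes the reported means and also reports which postulated ancestral population has the largest mean share in each sample. The names `TABLE5_MEANS`, `column_sums`, and `dominant_component` are hypothetical helpers introduced for illustration; the rounding tolerance of 0.002 is an assumption.

```python
# Mean ancestry proportions transcribed from Table 5.
# Columns follow the table order: CEU, TSI, ASW, YRI, BMW, LWK, MKK.
POPULATIONS = ["CEU", "TSI", "ASW", "YRI", "BMW", "LWK", "MKK"]

TABLE5_MEANS = {
    2: [
        [0.014, 0.028, 0.733, 0.930, 0.959, 0.900, 0.714],
        [0.986, 0.972, 0.267, 0.070, 0.041, 0.100, 0.286],
    ],
    3: [
        [0.012, 0.050, 0.098, 0.095, 0.070, 0.249, 0.678],
        [0.017, 0.006, 0.663, 0.858, 0.905, 0.721, 0.242],
        [0.971, 0.945, 0.239, 0.047, 0.025, 0.030, 0.080],
    ],
    5: [
        [0.010, 0.006, 0.442, 0.667, 0.132, 0.224, 0.089],
        [0.007, 0.036, 0.055, 0.042, 0.043, 0.202, 0.601],
        [0.012, 0.003, 0.256, 0.248, 0.774, 0.495, 0.106],
        [0.963, 0.938, 0.219, 0.019, 0.021, 0.019, 0.066],
        [0.008, 0.017, 0.028, 0.025, 0.031, 0.060, 0.138],
    ],
}


def column_sums(rows):
    """Sum of mean ancestry proportions for each sample (one value per column)."""
    return [sum(col) for col in zip(*rows)]


def dominant_component(rows):
    """1-based index of the ancestral population with the largest mean share, per sample."""
    return [max(range(len(col)), key=lambda i: col[i]) + 1 for col in zip(*rows)]


for k, rows in TABLE5_MEANS.items():
    sums = column_sums(rows)
    # Allow for rounding of the published three-decimal means.
    assert all(abs(s - 1.0) <= 0.002 for s in sums), (k, sums)
    print(f"K = {k}:", dict(zip(POPULATIONS, dominant_component(rows))))
```

At K = 5 the dominant component differs between YRI and BMW, which matches the separation of these two samples described in the text.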
Discussion
The aim of this work was to compare genetic variation in HapMap populations of African ancestry (AFA) with that in a population from Blantyre, Malawi. We also present some results contrasting Malawians with HapMap populations of other ancestry. HapMap populations were first compared to Malawians based on observed allele frequencies and adjacent SNP-SNP linkage disequilibrium (LD) r2 values across the autosomal genome. Allele frequency correlations were strong between the Malawi population (BMW) and the HapMap populations of African ancestry, with Spearman’s r2 > 0.90. However, the Malawi population appeared to be more closely related to the individuals of African ancestry in the Southwest United States (ASW) than to the Maasai in Kinyawa, Kenya (MKK). This was unexpected, as we had anticipated that the European ancestry in ASW subjects would dilute the correlation of allele frequencies between the Malawi and ASW samples. The stronger contrast in allele frequencies between BMW and MKK than between BMW and the other AFA populations likely reflects the variable regional ancestry within the African continent. The Maasai are classified as a Nilotic population and speak Maa, a Nilo-Saharan language (21). Thus, they may have stronger ancestral roots in North-Eastern Africa, whereas the BMW, YRI, and LWK claim origins closer to West-Central Africa. Results from the association analyses were consistent with the allele frequency comparisons by population, showing that the Malawi population is most similar to the YRI and LWK populations, less similar to the ASW, and least similar to the MKK. Local LD patterns, as measured by r2 for all adjacent SNP pairs, followed a pattern similar to that of the allele frequencies, with BMW having SNP-SNP LD values most similar to those of YRI and LWK and less similar to those of ASW and MKK.
Still, the differences in the correlation of allele frequencies and pair-wise SNP-SNP LD between BMW and MKK or ASW are considerably smaller than the differences between any sample of African ancestry and any HapMap sample of non-African ancestry. The differences in allele frequencies and SNP-SNP LD values between the Malawi population and the HapMap populations of other ancestry (i.e. CEU, TSI, JPT, etc.) were striking. These findings illustrate both intra-continental and inter-continental genetic diversity and suggest that care must be taken when generalizing the findings of a genetic study, e.g. the results of a drug clinical trial.
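The two summary measures discussed above can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the software used in this study: genotypes are assumed to be coded as 0/1/2 counts of one allele (with `None` for missing calls), SNP columns are assumed to be ordered by position, and LD is approximated by the squared correlation of unphased genotypes, which is one common estimate of r2. All function names are hypothetical.

```python
import math


def allele_frequencies(genotypes):
    """Frequency of the counted allele per SNP from rows of 0/1/2 genotypes (None = missing)."""
    freqs = []
    for snp in zip(*genotypes):
        called = [g for g in snp if g is not None]
        freqs.append(sum(called) / (2 * len(called)))
    return freqs


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; both inputs must vary (non-zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_r2(freqs_a, freqs_b):
    """Squared Spearman correlation between two populations' allele frequencies."""
    return pearson(average_ranks(freqs_a), average_ranks(freqs_b)) ** 2


def adjacent_ld_r2(genotypes):
    """Squared genotype correlation for each pair of neighboring polymorphic SNPs."""
    cols = [list(c) for c in zip(*genotypes)]
    return [pearson(cols[j], cols[j + 1]) ** 2 for j in range(len(cols) - 1)]


# Toy allele frequencies for four SNPs in two hypothetical populations.
pop_a = [0.10, 0.20, 0.30, 0.40]
pop_b = [0.15, 0.10, 0.35, 0.50]
print(spearman_r2(pop_a, pop_b))
```

Comparing two populations then amounts to computing `spearman_r2` on their per-SNP allele frequencies and on their per-pair `adjacent_ld_r2` values, restricted to SNPs genotyped in both samples.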
A number of relative outlier SNPs identified by the comparison of allele frequencies across African ancestry populations were of interest with regard to function. These included three SNPs, identified in the comparison of BMW to MKK, that are located in the lactase (LCT) gene. This gene encodes a protein that is integral to the plasma membrane, has both phlorizin hydrolase activity and lactase activity, and has clinical relevance to lactose intolerance (20). We speculate that the differences between BMW and MKK in the LCT gene may be evidence of recent selective pressure. The Maasai are a pastoralist community and rely heavily on the consumption of milk as part of their daily diet. The SNPs discussed here were located within an intron of LCT, and their functional significance was undetermined.
Our final analyses assessed population substructure using PC and model-based admixture analyses. The Malawi sample itself was found to be reasonably homogeneous, as shown by the high correlation in allele frequencies across self-reported ethnicities and by the lack of apparent population substructure. The generalizability of these findings to the rest of Malawi is unclear, as this population was ascertained in Blantyre, the second largest city. Intermarriage or multiple ethnicities per household may be more common in urban areas than in rural areas of Malawi. It is also important to note that ethnicity was reported by mothers in the study, whereas genotyping was conducted in the infants. Thus, maternal ethnicity served as a surrogate for infant ethnicity in this study. Because no separation was observed, we would not expect information on the ethnicity of the father or any other family members to alter the finding that this group was genetically very homogeneous. It should also be noted that accounting for infant HIV status did not alter our conclusions (data not shown).
The PC and model-based admixture analyses showed that the Malawi population was genetically distinct from the other African ancestry populations. In fact, all 5 African ancestry populations (BMW, ASW, LWK, MKK and YRI) could be distinguished by three PCs. Interestingly, our results suggest that the Malawi population from Blantyre is possibly more similar genetically to the Yoruba in Ibadan, Nigeria than to the Luhya in Webuye, Kenya, despite the substantially larger geographical distance between Malawi and Nigeria. The subjects from Blantyre are clearly more similar genetically to both the Yoruba and the Luhya than to the Maasai in Kinyawa, Kenya. These findings would appear to violate Malécot’s Isolation-by-Distance model, which predicts that genetic similarity between populations will decrease exponentially as the geographic distance between them increases (22). However, consideration of migratory history and the corresponding ancestral populations may explain our observations. It is believed that most tribes in Malawi are descendants of the mass Bantu migration from West-Central Africa between the 10th and 15th centuries and have predominantly Niger-Kordofanian ancestry (23). In contrast, the Maasai are thought to have migrated down from the Nile region in Egypt in the middle of the 15th century and are considered to be of Nilo-Saharan origin (23). The ancestral origins of the Luhya tribes are disputed: their own oral history suggests they migrated from Egypt, while historians generally believe the Luhya tribes migrated from West-Central Africa alongside other Bantu tribes.
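To make the distance argument concrete, the sketch below computes great-circle distances between the three sampling sites and evaluates a simplified exponential decay of similarity with distance, a·exp(−b·d). The city coordinates are approximate values assumed for illustration and are not taken from the study, and the decay parameters `a` and `b` are arbitrary placeholders rather than estimates; only the ordering of the two distances matters here.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Approximate coordinates (latitude, longitude in degrees).
# Assumed values for illustration only; not reported in the study.
SITES = {
    "Blantyre, Malawi": (-15.79, 35.01),
    "Ibadan, Nigeria": (7.38, 3.95),
    "Webuye, Kenya": (0.62, 34.77),
}


def great_circle_km(site_a, site_b):
    """Haversine distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, site_a)
    lat2, lon2 = map(math.radians, site_b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def expected_similarity(distance_km, a=1.0, b=0.001):
    """Simplified isolation-by-distance decay a * exp(-b * d); a and b are placeholders."""
    return a * math.exp(-b * distance_km)


blantyre = SITES["Blantyre, Malawi"]
for name in ("Ibadan, Nigeria", "Webuye, Kenya"):
    d = great_circle_km(blantyre, SITES[name])
    print(f"Blantyre to {name}: {d:.0f} km, "
          f"relative expected similarity {expected_similarity(d):.3f}")
```

Under distance alone, any decreasing function of distance would predict that BMW resembles LWK more than YRI, which is the opposite of the ordering suggested by the PC and admixture results.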
Although considerable effort was made to apply stringent quality control, there was potential for batch effects in genotype calling between the Malawi and HapMap samples and among the HapMap samples themselves. It is conceivable that such batch effects contributed to the differences in allele frequencies observed between the populations. Determining whether this was the case would require additional genotyping of the Malawi population simultaneously with HapMap samples, replication of these findings in other Malawi populations, or both.
This study showed that the Malawi population in Blantyre does not exhibit strong genetic variability by self-reported ethnicity. We also showed that, although highly correlated with regard to allele frequencies and adjacent SNP-SNP LD, the Malawi population and the populations of African ancestry in the International HapMap Project are genetically distinct. Furthermore, we determined that allele frequencies and LD are not strongly correlated between Malawi and the HapMap populations of non-African ancestry. The discordance in genetic variation observed both within and across continental lines highlights the need for researchers to consider ancestry and to account for population stratification. It also suggests that such differences should be taken into account when predicting drug and vaccine efficacy for patients across the African continent. Future work may refine these results, for example through projects to sequence specific regions of interest in BMW subjects followed by a comparison of sequence variation by population.
Supplementary Material
Acknowledgements
Genotyping of the Malawi population was completed at Duke University in collaboration with David Goldstein and Kevin Shianna. This work was supported by the NIAID Center for HIV/AIDS Vaccine Initiative (CHAVI) grant AI067854. The corresponding author also received support from the Centers for Disease Control and Prevention Dissertation Award (PAR 231-07, 2008).
References
- 1. International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299–1320. doi: 10.1038/nature04226.
- 2. Montpetit A, Nelis M, Laflamme P, Magi R, Ke X, Remm M, et al. An evaluation of the performance of tag SNPs derived from HapMap in a Caucasian population. PLoS Genet. 2006 Mar;2(3):e27. doi: 10.1371/journal.pgen.0020027.
- 3. Marvelle AF, Lange LA, Qin L, Wang Y, Lange EM, Adair LS, et al. Comparison of ENCODE region SNPs between Cebu Filipino and Asian HapMap samples. J Hum Genet. 2007;52(9):729–737. doi: 10.1007/s10038-007-0175-9.
- 4. Arnold JC, Singh KK, Spector SA, Sawyer MH. Undiagnosed respiratory viruses in children. Pediatrics. 2008 Mar;121(3):e631–e637. doi: 10.1542/peds.2006-3073.
- 5. Ribas G, Gonzalez-Neira A, Salas A, Milne RL, Vega A, Carracedo B, et al. Evaluating HapMap SNP data transferability in a large-scale genotyping project involving 175 cancer-associated genes. Hum Genet. 2006 Feb;118(6):669–679. doi: 10.1007/s00439-005-0094-9.
- 6. Mueller JC, Lohmussaar E, Magi R, Remm M, Bettecken T, Lichtner P, et al. Linkage disequilibrium patterns and tagSNP transferability among European populations. Am J Hum Genet. 2005 Mar;76(3):387–398. doi: 10.1086/427925.
- 7. Willer CJ, Scott LJ, Bonnycastle LL, Jackson AU, Chines P, Pruim R, et al. Tag SNP selection for Finnish individuals based on the CEPH Utah HapMap database. Genet Epidemiol. 2006 Feb;30(2):180–190. doi: 10.1002/gepi.20131.
- 8. Smith EM, Wang X, Littrell J, Eckert J, Cole R, Kissebah AH, et al. Comparison of linkage disequilibrium patterns between the HapMap CEPH samples and a family-based cohort of Northern European descent. Genomics. 2006 Oct;88(4):407–414. doi: 10.1016/j.ygeno.2006.04.004.
- 9. Hu C, Jia W, Zhang W, Wang C, Zhang R, Wang J, et al. An evaluation of the performance of HapMap SNP data in a Shanghai Chinese population: analyses of allele frequency, linkage disequilibrium pattern and tagging SNPs transferability on chromosome 1q21-q25. BMC Genet. 2008;9:19. doi: 10.1186/1471-2156-9-19.
- 10. Evans DM, Cardon LR. A comparison of linkage disequilibrium patterns and estimated population recombination rates across multiple populations. Am J Hum Genet. 2005 Apr;76(4):681–687. doi: 10.1086/429274.
- 11. Mwapasa V, Rogerson SJ, Kwiek JJ, Wilson PE, Milner D, Molyneux ME, et al. Maternal syphilis infection is associated with increased risk of mother-to-child transmission of HIV in Malawi. Aids. 2006 Sep 11;20(14):1869–1877. doi: 10.1097/01.aids.0000244206.41500.27.
- 12. Mwapasa V, Rogerson SJ, Molyneux ME, Abrams ET, Kamwendo DD, Lema VM, et al. The effect of Plasmodium falciparum malaria on peripheral and placental HIV-1 RNA concentrations in pregnant Malawian women. Aids. 2004 Apr 30;18(7):1051–1059. doi: 10.1097/00002030-200404300-00014.
- 13. Kaspin D. The politics of ethnicity in Malawi's democratic transition. The Journal of Modern African Studies. 1995;33(4):595–620.
- 14. Thorisson GA, Smith AV, Krishnan L, Stein LD. The International HapMap Project Web site. Genome Res. 2005 Nov;15(11):1592–1593. doi: 10.1101/gr.4413105.
- 15. Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, Weale M, et al. A whole-genome association study of major determinants for host control of HIV-1. Science. 2007 Aug 17;317(5840):944–947. doi: 10.1126/science.1143767.
- 16. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007 Sep;81(3):559–575. doi: 10.1086/519795.
- 17. StataCorp. Stata Statistical Software: Release 10. College Station, TX: StataCorp LP; 2007.
- 18. Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006 Dec;2(12):e190. doi: 10.1371/journal.pgen.0020190.
- 19. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009 Sep;19(9):1655–1664. doi: 10.1101/gr.094052.109.
- 20. Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2009 Jan;37(Database issue):D5–D15. doi: 10.1093/nar/gkn741.
- 21. NHGRI. NHGRI Sample Repository for Human Genetic Research. Camden: The Coriell Institute for Medical Research; 2009. Available from: http://ccr.coriell.org/sections/collections/NHGRI/?SsId=11.
- 22. Harpending HC, Ward RH. Chemical systematics and human populations. In: Biochemical Aspects of Evolutionary Biology. Chicago: University of Chicago Press; 1981. pp. 213–246.
- 23. Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, et al. The genetic structure and history of Africans and African Americans. Science. 2009 May 22;324(5930):1035–1044. doi: 10.1126/science.1172257.
- 24. Joubert BR, Lange EM, Franceschini N, Mwapasa V, North KE, Meshnick SR, et al. A whole genome association study of mother-to-child transmission of HIV in Malawi. Genome Medicine. 2010 Mar 1;2(17). doi: 10.1186/gm138. Available from: http://genomemedicine.com/content/2/3/17.