Author manuscript; available in PMC: 2023 Feb 24.
Published in final edited form as: Int J Obes (Lond). 2021 Feb 26;45(5):1017–1029. doi: 10.1038/s41366-021-00761-1

Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong female-specific effect on body mass and fat mass indexes

Marilia O Scliar 1,2, Hanaisa P Sant’Anna 1,3, Meddly L Santolalla 1,4, Thiago P Leal 1,5, Nathalia M Araújo 1,6, Isabela Alvim 1,3, Victor Borda 1,7, Wagner C S Magalhães 1,8, Mateus H Gouveia 1,6,9, Ricardo Lyra 1, Moara Machado 1, Lucas Michelin 1, Maíra R Rodrigues 1,10, Gilderlanio S Araújo 11, Fernanda S G Kehdy 1,12, Camila Zolini 1,13,14, Sérgio V Peixoto 6,15, Marcelo R Luizon 1, Francisco Lobo 1, Michel S Naslavsky 2, Guilherme L Yamamoto 2,16, Yeda A O Duarte 17,18, Matthew E B Hansen 19, Shane A Norris 20, Robert H Gilman 4,21, Heinner Guio 22, Ann W Hsing 23, Sam M Mbulaiteye 24, James Mensah 25, Julie Dutil 26, Meredith Yeager 27, Edward Yeboah 25, Sarah A Tishkoff 19, Ananyo Choudhury 28, Michele Ramsay 28,29, Maria Rita Passos-Bueno 2, Mayana Zatz 2, Timothy D O’Connor 30,31,32, Alexandre C Pereira 33, Mauricio L Barreto 34,35, Maria Fernanda Lima-Costa 6, Bernardo L Horta 36, Eduardo Tarazona-Santos 1,14,37,38
PMCID: PMC9952852  NIHMSID: NIHMS1864828  PMID: 33633342

Abstract

Background/objectives

Admixed populations are a resource to study the global genetic architecture of complex phenotypes, which is critical, considering that non-European populations are severely underrepresented in genomic studies. Here, we study the genetic architecture of BMI in children, young adults, and elderly individuals from the admixed population of Brazil.

Subjects/methods

Leveraging admixture in Brazilians, whose chromosomes are mosaics of fragments of Native American, European, and African origins, we used genome-wide data to perform admixture mapping/fine-mapping of body mass index (BMI) in three Brazilian population-based cohorts from Northeast (Salvador), Southeast (Bambuí), and South (Pelotas).

Results

We found significant associations with African-associated alleles in children from Salvador (PALD1 and ZMIZ1 genes) and in young adults from Pelotas (NOD2 and MTUS2 genes). More importantly, in Pelotas, rs114066381, mapped to a potential regulatory region, is significantly associated with BMI only in females (p = 2.76e−06). This variant is rare in Europeans but has a frequency of ~3% in West Africa, and it has a strong female-specific effect (95% CI: 2.32–5.65 kg/m² per A allele). We confirmed this sex-specific association and its strong effect using an adjusted fat mass index in the same Pelotas cohort, and replicated it for BMI in another Brazilian cohort from São Paulo (Southeast Brazil). A meta-analysis confirmed the significant association. Remarkably, while the frequency of the rs114066381-A allele ranges from 0.8 to 2.1% in the studied populations, it reaches ~9% among women with morbid obesity from Pelotas, São Paulo, and Bambuí. The effect size of rs114066381 is at least five times higher than those of the FTO SNPs rs9939609 and rs1558902, already emblematic for their large effects.

Conclusions

We identified six candidate SNPs associated with BMI. rs114066381 stands out for its high effect that was replicated and its high frequency in women with morbid obesity. We demonstrate how admixed populations are a source of new relevant phenotype-associated genetic variants.

Introduction

Overweight and obesity are risk factors for noncommunicable diseases, which are responsible for 63% of deaths worldwide [1] and 72% in Brazil [2]. Interindividual differences in BMI result from the effects of multiple genetic variants, environmental factors, and their interactions [3, 4]. Most of the heritability of BMI, estimated at ~40%, is attributable to as-yet unknown genetic factors [5, 6]. Indeed, a meta-analysis of genome-wide association studies (GWAS) of BMI estimated that 97 loci explain only ~2.7% of its variance [3]. The GWAS Catalog [7] and the DANCE web tool [8] report 389 SNPs associated with BMI, with a mean effect size of 0.054 kg/m² (Fig. S1). Thus, the genetic architecture of BMI is characterized by a large number of loci with small effect sizes [5].

Our knowledge of the genetic architecture of complex phenotypes is biased [9] because only 22% of individuals included in GWAS are non-Europeans/non-US whites, 2.4% are from Africa, and 1.3% are from Latin America [10]. The BMI meta-GWAS by Locke et al. [3] included only 5% of individuals of non-European ancestry among 339,226 individuals. Thus, extending GWAS-based strategies to non-European populations is critical to discover differences in the genetic architecture of BMI among populations. This is especially important for phenotypes such as obesity, whose prevalence is higher in US African Americans, Hispanics, and Native Americans than in European Americans [11, 12], and, in Brazil, higher in black women than in white women [13].

Few studies consider the influence of age- and sex-associated genetic factors on BMI. Although intraindividual BMI measurements at different ages are highly correlated, some genetic variants have age-dependent effects [14–16]. For example, a meta-analysis of 14 GWAS [17] found that variants near the PRKD1, TNNI3K, SEC16B, and CADM2 genes had larger effects on BMI during adolescence/young adulthood than later in life, while a variant near SH2B1 showed the opposite trend. Regarding sex, variants in SEC16B and ZFP64 have been reported to have stronger effects in women ([3] and see “Discussion”).

Here we study the genetic architecture of BMI in the admixed population of Brazil, the largest and most populous Latin American country, with more than 200 million inhabitants. Brazilians are the product of about 500 years of admixture between Africans, Europeans, and Native Americans [18] and are therefore suitable for admixture mapping. This method uses an admixed population to map genomic regions where local ancestry from a specific parental population is associated with the phenotype of interest. Because admixture mapping performs fewer statistical tests than a classical GWAS, it has higher statistical power. Thus, for medium-sized studies (more feasible in the resource-limited settings where many non-European populations live), admixture mapping improves the power to detect an association compared with a GWAS that includes only a few thousand individuals. So far, admixture mapping has identified seven loci associated with BMI on chromosomes 2 (2p23.3), 3 (3q29), 5 (5q13.3 and 5q14), 15 (15q26), and X (Xq25, Xq13.1) [11, 19, 20], but these studies were restricted to US African American populations.

We performed admixture mapping (followed by fine-mapping) of BMI using data on ~2.3 million SNPs for three Brazilian population-based cohorts from the Northeast (Salvador), Southeast (Bambuí), and South (Pelotas), with distinct admixture and socio-demographic backgrounds, studied by the EPIGEN-Brazil Initiative (https://epigen.grude.ufmg.br, [18]). Salvador has 51% African ancestry, while Pelotas and Bambuí have predominantly European ancestry (76% and 79%, respectively) (Table S1 and Fig. 1). Because these cohorts include individuals at three different stages of life (children, young adults, and older adults), we performed three separate admixture mapping analyses to avoid confounding by age. Additionally, we performed a replication analysis by testing the association of 216 BMI GWAS Catalog hits with BMI in our three Brazilian cohorts.

Fig. 1. Admixture in the Brazilian cohorts, BMI distributions, and admixture mapping (AM) Manhattan plots with significant peaks.


A, B Manhattan plots showing AM peaks from linear regressions using PCAdmix local ancestry inferences. Consensus significant AM peaks for PCAdmix and RFMix local ancestry inferences are indicated on each plot. A Manhattan plot showing the AM results for African (left) and European (right) ancestry in the Salvador cohort. African ancestry AM shows two positive significant peaks, 10q22.1 (β = 0.36, p value = 3.21e−05) and 10q22.3 (β = 0.36, p value = 7.87e−05). European ancestry AM shows one negatively associated peak, 10q22.3 (β = −0.36, p value = 2.92e−05). B Manhattan plot showing the AM results for African (left) and European (right) ancestry in the Pelotas cohort. One peak at 16q12.1 (β = −0.80, p value = 4.30e−06) was associated with African ancestry, and two peaks, 13q12.3 (β = −0.95, p value = 1.84e−05) and 20p12.1–2 (β = −1.05, p value = 1.79e−06), were associated with European ancestry in females. Results are presented as −log10(p value) for the given ancestry in each window of 100 SNPs along the genome. The black line in each Manhattan plot corresponds to the genome-wide p value threshold estimated for the given ancestry and dataset (Table S5). The regression coefficients (β) and p values for all peaks correspond to the lead window, i.e., the genomic window with the most significant p value in the linear regression. C Brazilian regions and continental individual ancestry bar plots for each cohort. D Histogram of BMI Z-scores adjusted for sex and age according to WHO guidelines in Salvador (top), and histograms of BMI in the Bambuí (center) and Pelotas (bottom) cohorts.

Materials and methods

Study populations and genotyping

The Salvador-SCAALA cohort comprises 1445 children aged 4–11 years in 2005, when BMI was measured. Salvador (Fig. 1) is a city of 2.8 million inhabitants in Northeast Brazil [21]. This population is part of an earlier observational study and represents the population without sanitation in Salvador.

The 1982 Pelotas birth cohort study was conducted in Pelotas, a city of 340,000 inhabitants in southern Brazil (Fig. 1). Throughout 1982, 99.2% of all births in the city were enrolled; of these, the 5914 liveborn infants whose families lived in the urban area constituted the original cohort [22]. BMI was measured in 2004/2005, when individuals were 23 years old. The 2012–2013 follow-up measured participants’ body fat, lean, and bone mineral masses with a full-body scan by dual-energy X-ray absorptiometry (DXA; GE Lunar Prodigy densitometer). We calculated the fat mass index by dividing the adjusted fat mass (kg) by squared height (m).
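Both BMI and the fat mass index are ratios to squared height. The Python sketch below shows the arithmetic, assuming weight and (already adjusted) fat mass in kilograms and height in meters; it does not perform the fat mass adjustment itself, which is not described in this section, and the function names are illustrative.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by squared height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def fat_mass_index(fat_mass_kg: float, height_m: float) -> float:
    """Fat mass index: DXA fat mass (kg) divided by squared height (m).

    The input is assumed to be the already-adjusted fat mass; the
    adjustment procedure is not reproduced here.
    """
    if height_m <= 0:
        raise ValueError("height must be positive")
    return fat_mass_kg / height_m ** 2


print(round(bmi(70.0, 1.75), 2))             # 22.86
print(round(fat_mass_index(18.0, 1.60), 2))  # 7.03
```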

The Bambuí cohort study of ageing is ongoing in Bambuí, a city of ~15,000 inhabitants in Minas Gerais State, Southeast Brazil (Fig. 1). The cohort comprised all residents aged 60 years and over in January 1997. Of 1742 eligible residents, 1606 constituted the original cohort [23–25]. BMI was measured in 1997, when individuals were between 60 and 93 years old.

The EPIGEN-Brazil initiative genotyped individuals from the Salvador, Pelotas, and Bambuí cohorts using the Illumina (San Diego, CA, USA) HumanOmni2.5–8v1 and HumanOmni5–4v1 arrays. We used the consensus working datasets described in Kehdy et al. [18], which underwent extensive quality control of SNPs and samples, as detailed in that study and briefly explained in the Supplementary Material. In all three cohorts, measurements were taken by trained research staff, and BMI was calculated as weight (kg) divided by squared height (m). Potential confounders are sex, age, and socioeconomic status (SES) (Table S1). In total, genotype, BMI, and covariate data were available for 1222, 3628, and 1342 individuals from the Salvador, Pelotas, and Bambuí cohorts, respectively. The EPIGEN protocol was approved by Brazil’s national research ethics committee (CONEP, resolution number 15895, Brasília). Informed consent was obtained from all subjects. In addition, we used the African, European, and Native American individual ancestries estimated for the same set of individuals in Kehdy et al. [18] with the software ADMIXTURE [26] (Fig. 1 and Table S1).

Kinship coefficients

Relatives were identified and removed from the Salvador (63 individuals) and Pelotas (83 individuals) cohorts using a network-based approach that aims to remove the smallest possible number of individuals [18]. In the Bambuí cohort, 516 (36%) individuals have relatives in the cohort; thus, rather than removing them, we encoded family membership as a categorical variable.
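The exact network-based procedure is described in [18] and is not reproduced here. As a hypothetical illustration of the idea, the sketch below treats related pairs (kinship above some threshold, whose value is an assumption) as edges of a graph and greedily removes the individual with the most remaining relatives until no related pair is left. A greedy rule like this keeps the number of removals small but does not guarantee the true minimum.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Set, Tuple


def prune_relatives(related_pairs: Iterable[Tuple[Hashable, Hashable]]) -> Set[Hashable]:
    """Greedily remove individuals until no related pair remains.

    related_pairs: (id_a, id_b) pairs judged related (threshold is assumed).
    Returns the set of removed individual IDs.
    """
    graph = defaultdict(set)
    for a, b in related_pairs:
        if a != b:  # ignore self-pairs
            graph[a].add(b)
            graph[b].add(a)

    removed = set()
    while any(graph.values()):  # at least one related pair left
        # Individual with the most remaining relatives; sorting the IDs
        # first makes tie-breaking reproducible.
        node = max(sorted(graph), key=lambda ind: len(graph[ind]))
        for relative in graph.pop(node):
            graph[relative].discard(node)
        removed.add(node)
    return removed


print(prune_relatives([("A", "B"), ("A", "C")]))  # {'A'}
```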

Phasing and local ancestry inference by PCAdmix and RFMix software

We phased our datasets using SHAPEIT2 [27], as detailed in ref. [18]. Because inferences of continental local ancestry along chromosomes based on genome-wide data are more uncertain than inferences of individual or population ancestry, we used two methods for local ancestry inference, implemented in PCAdmix [28] and RFMix [29]. PCAdmix inferences were performed as in ref. [18]. For RFMix inferences, we used Europeans, Africans, and Native Americans as parental populations, with fixed parameters as described in the Supplementary Methods. For both PCAdmix and RFMix results, we considered only windows whose ancestry was inferred with a posterior probability > 0.90.
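To make the posterior filter concrete, the following sketch converts per-haplotype local ancestry calls for one window into the 0/1/2 copy count used as the admixture mapping predictor. It assumes each haplotype call is an (ancestry label, posterior probability) pair and that a window is set to missing when either haplotype fails the > 0.90 cut-off; how the original pipeline handles partially confident windows is not stated here, and the labels "AFR"/"EUR" are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

# One haplotype's call for a window: (ancestry label, posterior probability).
HapCall = Tuple[str, float]


def ancestry_dosage(hap1: HapCall, hap2: HapCall, ancestry: str,
                    min_posterior: float = 0.90) -> Optional[int]:
    """Copies (0, 1 or 2) of `ancestry` in one window, or None if either
    haplotype's call does not exceed the posterior-probability cut-off."""
    calls = (hap1, hap2)
    if any(posterior <= min_posterior for _, posterior in calls):
        return None  # treat the window as missing for this individual
    return sum(label == ancestry for label, _ in calls)


def dosage_track(hap1_calls: Sequence[HapCall], hap2_calls: Sequence[HapCall],
                 ancestry: str) -> List[Optional[int]]:
    """Apply `ancestry_dosage` window by window along one chromosome."""
    return [ancestry_dosage(h1, h2, ancestry)
            for h1, h2 in zip(hap1_calls, hap2_calls)]


track = dosage_track([("AFR", 0.98), ("EUR", 0.85)],
                     [("AFR", 0.95), ("EUR", 0.99)], "AFR")
print(track)  # [2, None]
```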

Admixture mapping

We tested the association between BMI and each local ancestry (African, European, and Native American) across the genome using linear regression models adjusted for age (Salvador and Bambuí; all Pelotas participants were the same age at measurement), sex, SES, and genome-wide African ancestry. We used an additive model that considers the number of inferred African, European, or Native American ancestry copies (0, 1, or 2) carried by an individual in each window. Because we found an association between individual African ancestry and BMI in females in Pelotas (Supplementary Methods, Tables S2–S4), we performed a sex-stratified analysis in this cohort. For Salvador and Pelotas we used ordinary linear regression [30], whereas for Bambuí we used robust variance estimators to account for family structure [31].
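A minimal NumPy sketch of the per-window test is shown below, assuming the covariates (age, sex, SES, genome-wide African ancestry) are already numeric columns. It fits ordinary least squares of BMI on the ancestry copy count plus covariates and, when family identifiers are supplied, replaces the classical variance with a cluster-robust (sandwich) estimator as one common way to account for relatedness. It uses a normal approximation for the p value and omits small-sample cluster corrections, so it should not be expected to match the software used in [30, 31] exactly.

```python
import math

import numpy as np


def admixture_mapping_test(bmi, dosage, covariates, clusters=None):
    """Test one window: regress BMI on local-ancestry copies (0/1/2) + covariates.

    bmi, dosage: 1-D arrays of length n; covariates: n x m array.
    clusters: optional family IDs; if given, a cluster-robust (sandwich)
    variance replaces the classical OLS variance.
    Returns (beta, se, p) for the ancestry-dosage term; p is two-sided and
    uses a normal approximation to the test statistic.
    """
    y = np.asarray(bmi, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                      # intercept
        np.asarray(dosage, dtype=float),      # ancestry copies in this window
        np.asarray(covariates, dtype=float),  # e.g. age, sex, SES, global ancestry
    ])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    coef = xtx_inv @ (X.T @ y)
    resid = y - X @ coef

    if clusters is None:
        sigma2 = resid @ resid / (n - k)      # classical residual variance
        cov = sigma2 * xtx_inv
    else:
        clusters = np.asarray(clusters)
        meat = np.zeros((k, k))
        for g in np.unique(clusters):
            mask = clusters == g
            score = X[mask].T @ resid[mask]   # summed score within one family
            meat += np.outer(score, score)
        cov = xtx_inv @ meat @ xtx_inv        # sandwich estimator

    beta = float(coef[1])
    se = math.sqrt(cov[1, 1])
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    return beta, se, p
```

The regression coefficient returned here has the same interpretation as in Table 1: the change in BMI (kg/m²) per additional copy of the tested ancestry.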

To establish a significance threshold for admixture mapping that accounts both for multiple testing and for linkage disequilibrium (LD) due to admixture, we estimated the effective number of tests (ENT) for each chromosome for each individual [32], and obtained an equivalent Bonferroni significance threshold by dividing 0.05 by the ENT (Table S5). We conservatively used the same genome-wide thresholds to identify admixture mapping hits on the X chromosome. We compared the significant admixture mapping peaks of each chromosome obtained using PCAdmix and RFMix local ancestry inferences (using the p value thresholds from Table S5 multiplied by 10), and considered as consensus significant peaks those mapped to the same chromosomal region (not merely the same chromosome band) by both inferences. Only these consensus significant admixture mapping peaks were followed up by fine-mapping.
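The sketch below illustrates the two threshold-related steps under stated assumptions: the genome-wide threshold is taken as 0.05 divided by the ENT summed over chromosomes, and two peaks are treated as the same chromosomal region when their coordinate intervals overlap on the same chromosome. The per-chromosome ENT values come from the method in [32] and are not estimated here (the values in the example are hypothetical), and the overlap rule is a stand-in for the comparison actually performed.

```python
from typing import Iterable, List, Tuple

# A peak as (chromosome, start, end) with half-open base-pair coordinates.
Peak = Tuple[str, int, int]


def genome_wide_threshold(ent_per_chromosome: Iterable[float],
                          alpha: float = 0.05) -> float:
    """Bonferroni-style threshold: alpha divided by the total effective
    number of tests (assumed here to be the sum over chromosomes)."""
    total_ent = sum(ent_per_chromosome)
    return alpha / total_ent


def consensus_peaks(pcadmix: Iterable[Peak], rfmix: Iterable[Peak]) -> List[Peak]:
    """PCAdmix peaks that overlap at least one RFMix peak on the same chromosome."""
    rfmix = list(rfmix)
    return [(chrom, start, end) for chrom, start, end in pcadmix
            if any(c == chrom and start < e and s < end for c, s, e in rfmix)]


threshold = genome_wide_threshold([50.0] * 22)  # hypothetical ENT values
relaxed = 10 * threshold                         # looser cut-off for the comparison step
```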

Imputation, fine-mapping, and annotation

Fine-mapping was performed using both genotyped and imputed SNPs. We imputed our dataset in a ±1 Mb region centered on the most significant window of each admixture mapping hit (based on PCAdmix). To do so, we used IMPUTE2 [33] with a reference panel that merged the public 1000G reference panel with 270 EPIGEN individuals (90 from each cohort) genotyped for 4.3 million SNPs, and retained only SNPs imputed with an info score > 0.9 [34].

Genotyped and imputed SNPs were tested for association with BMI using the same linear regression models used for admixture mapping. We excluded SNPs with minor allele frequency < 0.005 from these analyses. We considered an association significant if its p value was less than or equal to that of the corresponding admixture mapping peak, and suggestive if its p value was higher than that of the peak but within one unit of −log10(p value). Fine-mapping results were plotted using the LocusZoom tool [35] and annotated using ANNOVAR [36]. We estimated LD statistics (r², [37]) on phased data using Haploview [38]. A flowchart summarizing the study design is shown in Fig. S12.
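The SNP filters, the significance rule, and the r² statistic can be written down compactly. In this hypothetical sketch, `p_peak` is the p value of the lead window of the corresponding admixture mapping peak, and r² is computed from phased haplotypes coded 0/1 as r² = D²/[pA(1 − pA)pB(1 − pB)] with D = pAB − pA·pB; Haploview's own implementation may differ in details such as the handling of missing data.

```python
import math
from typing import Sequence


def passes_qc(info_score: float, maf: float,
              min_info: float = 0.9, min_maf: float = 0.005) -> bool:
    """Keep SNPs imputed with info > 0.9 and with MAF of at least 0.005."""
    return info_score > min_info and maf >= min_maf


def classify_snp(p_snp: float, p_peak: float) -> str:
    """Compare a SNP's p value with that of its admixture mapping peak."""
    if p_snp <= p_peak:
        return "significant"
    # suggestive: weaker than the peak by at most one unit of -log10(p)
    if -math.log10(p_snp) >= -math.log10(p_peak) - 1:
        return "suggestive"
    return "not significant"


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")  # undefined if monomorphic
```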

Replication cohorts and meta-analysis

We tested the association of rs114066381-A with BMI for replication in four other cohorts: (1) 651 unrelated females from São Paulo, Brazil, from the SABE (Health, Wellbeing, and Aging) study [39–41], with whole-genome sequencing data at a mean target coverage of 30x; linear regression was adjusted for age, education level, SES, and individual African ancestry proportion; (2) 1082 women from Puerto Rico (547 non-cancer controls and 535 breast cancer cases) genotyped on the Affymetrix Axiom UK Biobank array, with rs114066381 imputed with IMPUTE2 using 1000G samples as reference; linear regression was adjusted for age, education level, individual African ancestry proportion, and breast cancer diagnosis; (3) 1103 adult women (age ≥18 years) from Nigeria, Cameroon, Sudan, Ethiopia, Kenya, Tanzania, and Botswana [42], genotyped on either the Illumina 5M-Omni array or the Illumina 1M-Duo BeadChip array; for individuals typed on the latter, rs114066381 was imputed using Minimac, based on a reference panel of 180 whole-genome sequences from eastern and southern Africa together with the African populations from 1000G; association tests used a linear mixed model in which age was modeled as a fixed effect and the kinship matrix was used for the random effects term; (4) 859 women from Soweto, South Africa [43], genotyped on the 2.3M H3Africa array, with rs114066381 imputed using the African reference panel at the Sanger Imputation Service; linear regression was adjusted for age and SES. We used the R package metafor [44] to perform the meta-analysis with a random-effects model and the Hedges estimator of between-study variance.
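For readers unfamiliar with the Hedges estimator, the sketch below implements random-effects pooling with that between-study variance, following the textbook formula τ² = max(0, Σ(yᵢ − ȳ)²/(k − 1) − (1/k)Σvᵢ), where ȳ is the unweighted mean of the study effects and vᵢ = SEᵢ². It is meant only to show the mechanics, assuming per-study betas and standard errors as inputs; it is not metafor's code, and it has not been checked against Fig. 4.

```python
import math
from typing import Sequence, Tuple


def random_effects_he(effects: Sequence[float],
                      std_errors: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects meta-analysis with the Hedges (HE) tau^2 estimator.

    Returns (pooled_effect, pooled_se, tau2).
    """
    k = len(effects)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching SEs")
    variances = [se ** 2 for se in std_errors]
    mean_y = sum(effects) / k  # unweighted mean of study effects
    tau2 = sum((y - mean_y) ** 2 for y in effects) / (k - 1) - sum(variances) / k
    tau2 = max(0.0, tau2)      # truncate negative estimates at zero
    weights = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se, tau2
```

When τ² is truncated to zero, the weights reduce to inverse-variance weights and the result coincides with a fixed-effect analysis.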

Statistical power estimation

Power estimation was performed separately for each EPIGEN cohort, according to its specific BMI distribution (Table S1). SNPs associated with BMI in the GWAS Catalog were extracted using the keyword “body mass index”, and those with p values > 9e−05 were filtered out. Of the 2205 SNPs reported in 61 published studies, we kept the 216 SNPs that were genotyped or imputed in our database and whose effect size was unambiguously associated with a specific allele and reported as a regression coefficient in kg/m² from cross-sectional studies. We calculated statistical power using these effect sizes (regression coefficients), together with the allele frequencies and numbers of individuals of each EPIGEN cohort (Table S1). The type I error rate was set at α = 0.00023. All power estimates were calculated with QUANTO v1.2.4 [45], assuming an additive genetic model and independent individuals.
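QUANTO's internal calculations are not reproduced here. As an approximate illustration, the sketch below computes power for a 1-df additive test of a quantitative trait from the effect size (kg/m²), allele frequency, sample size, and trait standard deviation, using the noncentrality |β|·√(2p(1 − p)N)/σ and a two-sided normal approximation. The default α = 0.00023 matches the value above (close to 0.05/216); treating σ as the total BMI standard deviation is an assumption that is reasonable only for small effects, and the SD in the example call is hypothetical.

```python
from statistics import NormalDist


def additive_power(beta: float, maf: float, n: int, sd_trait: float,
                   alpha: float = 0.00023) -> float:
    """Approximate power of a two-sided 1-df additive test for a quantitative trait.

    Noncentrality: |beta| * sqrt(2 * maf * (1 - maf) * n) / sd_trait.
    This is a normal approximation, not QUANTO's exact computation.
    """
    std_normal = NormalDist()
    ncp = abs(beta) * (2 * maf * (1 - maf) * n) ** 0.5 / sd_trait
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # probability that the test statistic falls beyond either critical value
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


# Mean GWAS Catalog effect (0.054 kg/m^2), Pelotas-sized sample, hypothetical SD.
print(round(additive_power(0.054, 0.3, 3628, 4.5), 4))
```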

Replication analysis of previous GWAS hits

To test the association of previous BMI GWAS hits, we applied the fine-mapping regression model to each of the 216 selected SNPs in the three Brazilian cohorts. p values were adjusted for 216 independent tests using the Benjamini–Hochberg correction [46].
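The Benjamini–Hochberg adjustment is a step-up procedure over the sorted p values. The sketch below returns adjusted p values in the original input order; it is a generic implementation intended to agree with standard ones such as R's `p.adjust(method = "BH")`, not the specific software used here.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```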

Genomic in silico analyses

The search for candidate regulatory SNPs was performed using the HaploReg v4.1 database (http://archive.broadinstitute.org/mammals/haploreg/haploreg.php, [47]), Ensembl (https://grch37.ensembl.org/, [48]), and RegulomeDB (http://www.regulomedb.org/, [49]). ChIP-seq data from [50] were accessed through the HaploReg v4.1 database.

Results

Admixture mapping and fine-mapping

We performed an admixture mapping analysis for the three continental ancestries (African, European, and Native American) in the three cohorts using an additive model considering the number of inferred African, European, or Native American ancestry copies (0, 1, or 2) carried by an individual for each chromosome fragment.

Table 1 shows the five consensus significant admixture mapping peaks found in Salvador and Pelotas. The distributions of BMI for each allele of African or European ancestry at the five peaks are shown in Supplementary Figs. S2–S4. No consensus significant peak was found in older adults from Bambuí.

Table 1.

Admixture mapping peaks obtained both with RFMix and PCAdmix local ancestry inferences.

Cohort Genomic region (length in base pairs) Local ancestry (sub-dataset) Regression coefficientᵃ p valueᵃ Associated region (length in base pairs)ᵇ
Children from Salvador 10q22.1 (4,300,000) African 0.36 3.21e–05 445,166
10q22.3 (4,400,000) African 0.36 7.87e–05 255,990
European −0.36 2.92e–05 577,546
Young adults from Pelotas 16q12.1 (5,600,000) African −0.80 4.30e–06 600,961
13q12.3 (3,300,000) European (female) −0.95 1.84e–05 118,909
20p12.1–2 (8,700,000) European (female) −1.05 1.79e–06 3,481,320

Regression coefficients, p values, and lengths of the associated windows from the linear regressions using PCAdmix local ancestry inferences.

a

Regression coefficients and p values of the lead genomic window of 100 SNPs, using PCAdmix inference. The lead window is the genomic window with the most significant p value in the admixture mapping among the windows below the significance cut-off. Regression coefficients are the change in BMI (kg/m²) for each additional copy of a specific ancestry.

b

Length of the continuous genomic region including significant admixture mapping results. Each window of PCAdmix inference has 100 SNPs.

Fine-mapping on Salvador children

The high African ancestry (51%) of children from Salvador allowed us to identify two genomic regions where this ancestry is positively associated with BMI, and within these regions we identified three significant variants (Tables S8 and S9 and Fig. S5): within 10q22.1, rs1334909357-CTTT in an intron of the PALD1 gene, and within 10q22.3, the linked SNPs rs79947827-A and rs141274185-T (LD: r² = 0.86) in the ZMIZ1 gene, which encodes a protein that regulates the activity of many transcription factors [51]. Other SNPs in ZMIZ1 are associated with 19 complex disorders, and this gene is among the 21 human genes most associated with complex phenotypes, including not only BMI-related phenotypes such as height and sitting height ratio, but also psychiatric disorders, breast cancer, and autoimmune diseases (http://gilderlanio.pythonanywhere.com/home, Fig. S6).

Fine-mapping in young adults from Pelotas

While the low non-European admixture in Pelotas reduces the power to detect non-European-associated variants, this is compensated by the cohort’s larger size (n = 3628) relative to the Salvador cohort. Also, because Pelotas is a birth cohort, all individuals have the same age, which limits age-related nongenetic variance in BMI. For the entire Pelotas cohort, we identified one genomic region, 16q12.1, where African ancestry is negatively associated with BMI, and within this region we found one significant SNP, rs76416629-G, 2 kb upstream of the NOD2 gene (Tables S6 and S7 and Fig. S5).

Furthermore, we identified a genomic region, 13q12.3, in which European ancestry is associated with lower BMI in females, and within this region, two significant SNPs that are not in LD with each other (r² < 0.001): rs113214936-G, in an intron of MTUS2, a gene previously associated with obesity-related traits [52], and rs114066381-A. Our most striking result is the association of rs114066381-A with a strong effect on BMI in females (beta = 3.99 ± 0.84 kg/m² per allele, 95% CI: 2.32–5.65, p = 2.76 × 10⁻⁶, Table 2 and Fig. 2). This SNP is carried by 31 unrelated females (all heterozygous) who have a mean BMI of 27.99 kg/m², which is higher than the mean BMI of females in the cohort (23.61 kg/m², p = 0.0008, Fig. 3). These 31 females have a mean African ancestry of 35%, compared with 16% among all unrelated females. Remarkably, the BMI of the 25 males carrying the rs114066381-A allele (mean: 23.72 kg/m²) does not differ from that of the general population (mean: 23.81 kg/m², p = 0.5397, Fig. 3).

Table 2.

Association of rs114066381-A with body mass index and fat mass index in unrelated females of the Pelotas cohort and replications.

Phenotype Cohort N Age (years) Frequency Regression coefficient (kg/m² per A allele; kg fat/m² for fat mass index) SE p value Power
Fat mass index Pelotas 1417 30 0.008 2.21 0.84 9.00e–03
Body mass index Pelotas 1795 23 0.008 3.99 0.84 2.76e–06
Bambuí 516 60–93 0.010 2.93 2.65 0.26 39%
Bambuí (all femalesᵃ) 821 60–93 0.009 2.64 2.33 0.25 53%
São Paulo 651 59–99 0.019 3.34 1.12 3.26e–03 89%
Bambuí + São Paulo 1173 59–99 0.015 3.55 0.94 1.91e–04 99%
Salvador 664 4–11 0.021 −0.24 0.38 0.53 9%
Puerto Rico 1082 21–89 0.005 1.30 1.92 0.50 13%
African populationsᵇ (all femalesᵃ) 1103 17–97 0.013 −0.57 0.80 0.47 9%
Soweto 859 39–60 0.020 2.39 7.63 0.75 69%
a

Using genetic relationship matrix.

b

Nigeria, Cameroon, Sudan, Ethiopia, Kenya, Tanzania, and Botswana populations.

Fig. 2. LocusZoom plot of the fine-mapping of consensus significant admixture mapping peak in young adults from Pelotas at 13q12.3 associated with European ancestry in females performed using both genotyped and imputed SNPs ±1 Mb from target region (lead windows).


The SNP with the lowest p value is color-coded in purple and labeled. The LD between this SNP and the remaining nearby SNPs is color-coded according to r² values based on Africans from the 1000 Genomes Project (Color figure online).

Fig. 3. Body mass index (BMI) in female and male adults from the Pelotas cohort, according to their genotypes at SNP rs114066381.


The increase of BMI associated with the rs114066381-A is observed in females (p value = 0.0008), but not in males (p value = 0.5397).

rs114066381 is 2 kb from a CTCF-binding site [48], but RNA-seq data provide no evidence of transcriptional regulation [50]. In addition, this genomic region contains binding sites for the histone-interacting proteins KAP1 and SETDB1, as reported by ChIP-seq data (HaploReg v4.1, [47], [50], Fig. S7). However, there is no evidence in the literature that the region acts as an enhancer in vivo. This genomic region is primate-specific, being absent from the genomes of other vertebrates (Supplementary Methods, UCSC Genome Browser 2013, [53], Figs. S8 and S9). The derived allele A is very rare in Europeans but has frequencies of ~3% in West Africans (Table S6).

We confirmed the female-specific association of rs114066381 using the fat mass index (a more direct measure of adiposity), measured by DXA 7 years after BMI was measured in the same individuals (beta = 2.21 kg of fat/m² per allele, 95% CI: 0.55–3.88, p = 9 × 10⁻³, Table 2). With 89% power, we replicated the association in older adult females from São Paulo, Brazil (SABE cohort, [40]), and in the merged dataset from São Paulo and Bambuí (power = 99%, Table 2). We did not observe a significant association in the other cohorts tested (Table 2). However, a meta-analysis synthesizing the seven effect sizes showed a positive association between the rs114066381-A allele and BMI in females (Fig. 4), both when all effects were combined and when only effects from admixed populations were considered. Remarkably, while the frequency of the rs114066381-A allele is 1.1% in Pelotas and São Paulo, it increases to 1.2% in women with overweight and 1.98% in women with obesity, and reaches 9% among women with morbid obesity. The same pattern is observed in Bambuí and Soweto, but not in Puerto Rico, where the frequency of the rs114066381-A allele is higher only in women with obesity (Table S10). The BMI distributions of Salvador and rural Africa differ markedly from those of the other populations: in Pelotas, Bambuí, São Paulo, Puerto Rico, and Soweto, 27% of individuals have BMI greater than 25 kg/m², whereas in Salvador and rural Africa 11% of individuals fall in this category (and only 0.2% are morbidly obese in both cohorts, compared with more than 1% in the other populations). Moreover, 23% of the individuals from rural Africa are underweight (BMI < 18.5 kg/m²).
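Tabulating allele frequency by BMI class is straightforward once genotypes are coded as 0/1/2 copies of the A allele. The sketch below assumes WHO-style cut-offs (underweight < 18.5, overweight 25 to < 30, obesity 30 to < 40, and morbid obesity ≥ 40 kg/m²) used as mutually exclusive classes; the exact category definitions behind Table S10 are not given in this section, and the example data are made up.

```python
from collections import defaultdict
from typing import Dict, Sequence


def bmi_category(bmi: float) -> str:
    """Assumed WHO-style classes; 'morbid obesity' taken as BMI >= 40 kg/m^2."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal"
    if bmi < 30:
        return "overweight"
    if bmi < 40:
        return "obesity"
    return "morbid obesity"


def allele_frequency_by_category(bmis: Sequence[float],
                                 dosages: Sequence[int]) -> Dict[str, float]:
    """Frequency of the counted allele per BMI class from 0/1/2 dosages."""
    allele_counts = defaultdict(int)
    chromosomes = defaultdict(int)
    for bmi, dose in zip(bmis, dosages):
        cat = bmi_category(bmi)
        allele_counts[cat] += dose
        chromosomes[cat] += 2  # two chromosomes per diploid individual
    return {cat: allele_counts[cat] / chromosomes[cat] for cat in chromosomes}


print(allele_frequency_by_category([22.0, 23.0, 41.0, 42.0], [0, 0, 1, 0]))
# {'normal': 0.0, 'morbid obesity': 0.25}
```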

Fig. 4. Forest plots from the meta-analysis synthesizing association results between rs114066381 and BMI from seven populations.


Effect size [95% confidence interval (CI)] in each individual study, subgroups of African populations and admixed populations, and combining all populations.

Power estimation and replication of previous GWAS hits

We estimated the statistical power to detect associations for 216 BMI GWAS Catalog hits in the three EPIGEN cohorts, conditioning on the effect sizes reported in kg/m² in the GWAS Catalog (taken as population parameters), the BMI distribution, the number of individuals studied, and the allele frequencies in each EPIGEN cohort. These 216 GWAS Catalog hits were selected because, given the high heterogeneity of the data stored in the GWAS Catalog, their effect sizes (linear regression coefficients) were unambiguously associated with a specific allele and consistently reported in kg/m². Of the 216 hits, 189 were identified in adults, 4 in children, and 23 in both, in individuals of predominantly European ancestry.

Based on the mean statistical power of the 216 GWAS Catalog hits, and assuming that these SNPs (and their regression coefficients) are part of the genetic architecture of BMI in the Brazilian cohorts, we would expect 24 of the 216 SNPs to be associated in Salvador children (average power = 11%), but we observed no replications; 22 SNPs in Pelotas young adults (average power = 10%), where we observed 20; and 15 SNPs in Bambuí older adults (average power = 7%), where we observed 8 (Fig. S10 and Table S11). Specifically, in Pelotas, we confirmed the association for all six FTO SNPs included in our analysis (rs9930333, rs62033400, rs8050136, rs3751812, rs1558902, and rs9939609).
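The expected counts follow from linearity of expectation: the expected number of replications is the sum of the per-SNP powers, which equals 216 times the average power. The short check below uses only the rounded average powers reported above (not the per-SNP powers in Table S11), and reproduces the stated expectations after rounding.

```python
from typing import Dict, Sequence


def expected_replications(powers: Sequence[float]) -> float:
    """Expected number of replicated hits: the sum of per-SNP powers."""
    return sum(powers)


# Approximation using only the rounded average powers reported in the text.
reported_mean_power: Dict[str, float] = {"Salvador": 0.11, "Pelotas": 0.10, "Bambuí": 0.07}
expected = {cohort: round(expected_replications([power] * 216))
            for cohort, power in reported_mean_power.items()}
print(expected)  # {'Salvador': 24, 'Pelotas': 22, 'Bambuí': 15}
```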

Discussion

Leveraging admixture in Brazil, we used genome-wide data from three population-based cohorts to find loci associated with BMI through admixture mapping followed by fine-mapping. We acknowledge that a limitation of our study is its relatively small sample size compared with GWAS standards in European or US populations; however, this limitation is shared by several studies, given the difficulty of achieving larger sample sizes in admixed populations of low- and middle-income countries. Moreover, the present study has important characteristics absent from most studies. First, it relies on population-based cohorts, which better capture the phenotypic variation of populations but are rarely considered in genetic studies [54, 55]. Second, it is one of the few studies to explore the genetic architecture of BMI at three different life stages: children, young adults, and older adults. Third, in the context of the under-representation of non-European populations in GWAS, we analyzed South American populations with African and Native American ancestry. Because none of the six new candidate BMI-associated SNPs reported in this study (Tables S7 and S9) is in LD with the 389 previous GWAS Catalog hits for BMI (r² < 0.022), we conclude that our results expand the catalog of SNPs of the global genetic architecture of BMI.

Factors supporting the female-specific effect of rs114066381 on adiposity

First, we replicated the female-specific association in an independent admixed cohort from São Paulo and confirmed it by a meta-analysis. Second, while our initial female-specific associations with BMI and fat mass index are based on imputed genotypes, the replication in the São Paulo cohort is based on whole-genome sequencing data. Third, in the discovery cohort, we observed a strong association not only with BMI but also with fat mass index measured by DXA, a more direct measure of adiposity collected 7 years after BMI in the same individuals. Fourth, rs114066381 is located in a potential regulatory region, which makes a biological role for this genomic region plausible. With respect to the pattern of LD, rs114066381 has r² = 0.5 with three SNPs that are in complete LD with each other (r² = 1) and map to nearby regulatory regions (sensu HaploReg v4.1, [47], Fig. S11). In Bambuí, the r² between rs114066381 and the same three SNPs ranges from 0.3 to 0.7 (Fig. S11). rs114066381 is ~300 kb from rs7335631, which is associated with “Fat distribution in HIV” [56], but there is no LD (r² < 0.001) between these two SNPs in any of our three Brazilian populations. Thus, our results suggest a specific role for rs114066381.

The discovery effect size for rs114066381 (beta = 3.99 ± 0.847 kg/m2) is one of the largest observed for BMI in either sex. The effect sizes suggested by the meta-analyses (Fig. 4) are also large relative to the distribution of BMI effect sizes. According to the GWAS Catalog (October 2019), estimated betas for BMI hits range from 0.013 to 4.119 kg/m2, with a mean of 0.054 kg/m2.
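To put the discovery estimate in perspective, the sketch below derives a Wald z statistic, a two-sided p-value and an approximate 95% confidence interval from beta and its uncertainty. It assumes that "± 0.847" denotes the standard error and uses a simple normal approximation; the study's reported p-values come from its own regression models and may differ. The helper name is hypothetical.

```python
from math import erfc, sqrt


def summarize_effect(beta, se, z_crit=1.96):
    """Wald z, two-sided normal p-value, and approximate 95% CI for an estimate."""
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))
    return z, p, (beta - z_crit * se, beta + z_crit * se)


# Discovery estimate from the text, assuming 0.847 is the standard error.
z, p, (lo, hi) = summarize_effect(3.99, 0.847)
print(f"z = {z:.2f}, p = {p:.1e}, 95% CI = ({lo:.2f}, {hi:.2f}) kg/m^2")
```

Even the lower bound of this approximate interval (about 2.3 kg/m2) lies far above the catalog mean of 0.054 kg/m2.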

The female-specific association of rs114066381 should be seen in the following context: of the 833 hits in the GWAS Catalog (October 2019) associated with BMI with beta reported in kg/m2, regardless of sex, 229 are female-specific (beta range: 0.009–0.484; mean: 0.025) and 134 are male-specific (beta range: 0.013–0.095; mean: 0.025). Although mean effect sizes are similar in men and women, the distribution of effect sizes in women has a tail of higher beta values, suggesting that large effects are more common in women than in men. Our finding exemplifies this pattern: in adult females of Southern Brazil, rs114066381 alone explains a portion of BMI variance (r2 range across the Pelotas, Bambuí, and São Paulo cohorts: 0.008–0.044) similar to that explained by the entire set of 97 GWAS hits recently reported [3]. We can also speculate that rs114066381 could be an example of a thrifty genotype [57, 58] associated with energy storage in females and with pregnancy (but see [59] for a counterpoint to the thrifty genotype hypothesis).
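Why a single variant can rival a large set of common hits follows from the standard approximation for the variance explained by one additive SNP, 2p(1 - p)beta^2 / sigma^2, which assumes Hardy-Weinberg equilibrium (p is the allele frequency and sigma the trait SD). The sketch below is illustrative only: the allele frequencies, effect sizes and SD are hypothetical and are not estimates for rs114066381; the r2 values quoted in the text were obtained from the cohorts themselves.

```python
def variance_explained(maf, beta, trait_sd):
    """Proportion of trait variance explained by one additive SNP.

    Assumes Hardy-Weinberg equilibrium, so genotype-dosage variance is 2p(1-p).
    """
    if not 0.0 <= maf <= 1.0 or trait_sd <= 0:
        raise ValueError("maf must be in [0, 1] and trait_sd must be positive")
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_sd ** 2


# Hypothetical: a rare allele (2%) with a large effect (4 kg/m^2), BMI SD 5 kg/m^2.
print(round(variance_explained(0.02, 4.0, 5.0), 4))  # 0.0251
# Hypothetical: a common allele (30%) with a typical small effect (0.1 kg/m^2).
print(round(variance_explained(0.30, 0.1, 5.0), 6))  # 0.000168
```

Because the effect enters squared, a large effect can more than compensate for a low allele frequency, which is how one variant can approach the variance explained by dozens of small-effect hits.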

Replication of other GWAS hits

We replicated 28 of the 216 associations reported for SNPs in previous GWAS, which were mostly performed in adults of European ancestry. The Pelotas cohort presented not only the highest replication rate (20/216) (Table S11) but also very good concordance between the observed (20) and expected (22) numbers of replications. This is consistent with: (1) the larger size of the Pelotas cohort; (2) the relatively lower socioeconomic status (SES) of the Salvador cohort, which adds a layer of complexity to the definition of the genetic architecture of BMI relative to GWAS in predominantly European populations with a different socioeconomic background; and (3) the age dependence of the genetic architecture of BMI: most GWAS of BMI were performed in adults, and BMI was measured in young adults in Pelotas, whereas it was measured in children in Salvador and in older adults in Bambuí. Lasky-Su et al. [54] showed that age-dependent effects can be an important and underappreciated cause of non-replication. These results exemplify how differences in age, SES, and ancestry contribute to differences in the genetic architecture of BMI in particular, and of complex traits in general.
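The comparison between observed (20) and expected (22) replications in Pelotas depends on an estimate of how many hits the cohort had power to detect. This section does not describe how the expected number was obtained; a common approach, sketched here under that assumption, is to sum each SNP's approximate power given its reported effect size, allele frequency, the replication sample size and a significance threshold. All function names, SNP values, the trait SD and the nominal alpha = 0.05 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def replication_power(beta, maf, n, trait_sd, alpha=0.05):
    """Approximate power of a two-sided additive-model test for one SNP."""
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    # Non-centrality: expected Wald z given the true effect and sample size.
    ncp = beta * sqrt(2.0 * maf * (1.0 - maf) * n) / trait_sd
    return std.cdf(ncp - z_crit) + std.cdf(-ncp - z_crit)


def expected_replications(snps, n, trait_sd, alpha=0.05):
    """Expected number of replicated hits = sum of per-SNP power."""
    return sum(replication_power(b, p, n, trait_sd, alpha) for b, p in snps)


# Hypothetical (beta in kg/m^2, effect-allele frequency) pairs for reported hits.
snps = [(0.08, 0.40), (0.05, 0.25), (0.15, 0.10)]
for n in (1000, 3000):
    print(n, round(expected_replications(snps, n, trait_sd=4.5), 2))
```

Under this model, a larger cohort raises every SNP's power and therefore the expected count, in line with point (1) above; effects that differ by age or SES would make the catalog betas overstate or understate the true power.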

In conclusion, we performed three admixture mapping/fine-mapping analyses of BMI and tested the association of GWAS Catalog hits in three Brazilian population-based cohorts. We report six candidate BMI-associated SNPs linked to African or European ancestry. More importantly, our admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a female-specific effect on BMI, which seems to be particularly important for the development of morbid obesity. Altogether, our results show that South American admixed populations, as well as other populations worldwide [60–62], are a source of novel non-European associated variants with considerable effect sizes that may explain an important portion of the current "missing heritability" in non-European populations. This statement is reinforced by the observation that ~25% of the variants discovered in GWASs of BMI were identified by studies with Latin Americans, although these represent only 11% of such studies, which underscores the importance of increasing the number and size of studies in these populations.

Code availability

The bioinformatics pipelines used in this study are available in the EPIGEN-Brazil Project Scientific Workflow (http://www.ldgh.com.br/scientificworkflow, [34]).

Supplementary Material

Support methods

Acknowledgements

For analyses, we used the Sagarana cluster (from Centro de Laboratórios Multiusuários do Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais). We thank Miguel Ortega for help in using Sagarana, Ms. Evelyn Tay at the University of Ghana Medical School (Accra, Ghana) for managing the study, and Ms. Marcelle Bartholomeu and Ms. Àlex Teixeira for technical support. The EPIGEN-Brazil Initiative is funded by the Brazilian Ministry of Health (Department of Science and Technology of the Secretaria de Ciência, Tecnologia e Insumos Estratégicos) through Financiadora de Estudos e Projetos. The EPIGEN-Brazil investigators received funding from the Brazilian Ministry of Education (CAPES Agency), the Brazilian National Research Council (CNPq), the Minas Gerais State Agency for Support of Research (FAPEMIG), Rede Mineira de Genômica Populacional e Medicina de Precisão (FAPEMIG-RED-00314–16), a TWAS-CNPq Full PhD fellowship, and the São Paulo Research Foundation (FAPESP; grant 2019/19998–8).

Footnotes

Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41366-021-00761-1.

Compliance with ethical standards

Conflict of interest The authors declare that they have no conflict of interest.

References
