Abstract
Background and aims
Polymorphisms in the first intron of FTO have been robustly replicated for associations with obesity. In the Sorbs, a Slavic population resident in Germany, the strongest effect on body mass index (BMI) was found for a variant in the third intron of FTO (rs17818902). Since this may indicate population-specific effects of FTO variants, we initiated studies testing FTO for signatures of selection in vertebrate species and human populations.
Methods
First, we analyzed the coding region of 35 vertebrate FTO orthologs with Phylogenetic Analysis by Maximum Likelihood (PAML, ω = dN/dS) to screen for signatures of selection among species. Second, we investigated human population (Europeans/CEU, Yoruba/YRI, Chinese/CHB, Japanese/JPT, Sorbs) SNP data for footprints of selection using DnaSP version 4.5 and the Haplotter/PhaseII. Finally, using ConSite we compared transcription factor (TF) binding sites at sequences harbouring FTO SNPs in intron three.
Results
PAML analyses revealed strong conservation in the coding region of FTO (ω<1). Sliding-window results from population genetic analyses provided highly significant (p<0.001) signatures of balancing selection specifically in the third intron (e.g. Tajima’s D in Sorbs = 2.77). We observed several alterations in TF binding sites, e.g. a TCF3 binding site introduced by the rs17818902 minor allele.
Conclusion
Population genetic analysis revealed signatures of balancing selection at the FTO locus with a prominent signal in intron three, a genomic region with strong association with BMI in the Sorbs. Our data support the hypothesis that genes associated with obesity may have been under evolutionary selective pressure.
Introduction
Obesity is a complex disease with an estimated heritability of 40–70% [1,2]. The existence of genetic factors is well supported by a number of polymorphisms identified in recent genome-wide association studies [3]. Single nucleotide polymorphisms (SNPs) in the fat mass and obesity-associated gene (FTO) locus are among the most prominent factors associated with obesity measures such as body mass index (BMI). The associations between FTO variants and BMI have been robustly replicated in populations with different ethnic backgrounds [4–7]. The human FTO gene maps to 16q12.2 and encodes a 2-oxoglutarate-dependent nucleic acid demethylase [8]. The SNP rs9939609, which represents a cluster of variants strongly associated with BMI and overweight, is located in the first intron [5]. Whereas these associations were initially shown in cohorts of European origin [5], they could not be replicated in an African sample or in Han Chinese [9,10]. On the other hand, in the Sorbs, a population of Slavic origin residing in Eastern Germany, the strongest association with BMI was found for SNPs mapping to intron three, in addition to the association signal in the first intron [11]. These findings indicate specific effects of FTO alleles in the Sorbs and raise the question of whether FTO has been subject to natural selection and, if so, whether it shows population-specific patterns of selection.
Considering the “thrifty genotype” hypothesis, which states that an evolutionarily advantageous increased capacity to store energy may result in obesity and type 2 diabetes (T2D) in Western-lifestyle societies [12], genes associated with obesity and T2D have become attractive targets of evolutionary studies. Until recently, there was no strong evidence for consistent patterns of selection at loci associated with T2D that would conclusively confirm the thrifty genotype hypothesis. More recently, a locus-by-locus study showed that 14 loci associated with T2D and, to a lesser extent, obesity appear to have undergone selection in European, African and East Asian populations; however, no evidence of positive selection was found when all T2D loci were analyzed together [13].
Since comprehensive data regarding FTO evolution are sparse, we initiated studies searching for signatures of selection in vertebrates and human populations. To test for the conservation of the protein-coding sequence at the inter-species level, we calculated the ratio of non-synonymous to synonymous base exchanges (ω = dN/dS) of the coding region in 35 vertebrate FTO orthologs with Phylogenetic Analysis by Maximum Likelihood (PAML). Furthermore, we investigated SNP data for footprints of selection using DnaSP version 4.5 and the Haplotter/PhaseII in the human populations of Yoruba from Ibadan, Nigeria (YRI), Utah residents with Northern and Western European ancestry (CEU), East Asians (ASN), Han Chinese from Beijing, China (CHB), and Japanese from Tokyo, Japan (JPT). Finally, we examined in silico the possible impact of FTO SNPs in intron 3 on transcription factor (TF) binding sites.
Materials and Methods
All studies were approved by the ethics committee of the University of Leipzig. All subjects gave written informed consent.
Phylogenetic Analyses by Maximum Likelihood (PAML)
PAML [14] provides the program CODEML, which estimates the level of gene conservation by calculating the dN/dS ratio ω (dN: non-synonymous substitution rate, dS: synonymous substitution rate). In the present study, ω was calculated in PAML version 4.1 [15]. The coding sequences of 35 vertebrate FTO orthologs were extracted from the Ensembl (http://www.ensembl.org) and NCBI (http://www.ncbi.nlm.nih.gov) databases. Species and accession numbers are provided in S1 Table. Subsequently, all coding sequences were aligned with ClustalW [16], a widely used progressive alignment method, within MEGA 6 [17]. A phylogenetic tree was constructed from the aligned coding sequences with the Neighbor-Joining (NJ) algorithm in MEGA 6. Evolutionary distances were computed with the Jukes-Cantor model, which was the best-fitting model predicted by jModelTest 2.1.5 [18]. 1,000 bootstrap replicates were performed to infer the bootstrap consensus phylogenetic tree. The initial input for the PAML analysis is displayed in Fig. 1.
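As an illustration of the distance measure named above, the following minimal sketch computes the Jukes-Cantor distance between two aligned sequences, d = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites. It is not the MEGA implementation; the function name and the choice to skip gap or ambiguous columns are assumptions made for this example.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Columns containing anything other than A, C, G or T in either
    sequence are skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    # p-distance: proportion of compared sites that differ
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("p-distance >= 0.75: distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For small p the corrected distance is only slightly larger than p itself; the correction grows as sequences diverge and multiple substitutions at the same site become more likely.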
Because the power to detect positive selection is reduced when the rates across sites are averaged, several tests were adopted according to recommendations for real data analyses [19]. The models tested include the one-ratio model (M0), free-ratio [20], nearly neutral (M1a), positive selection (M2a) [21], discrete (M3) [22], beta (M7), and beta&ω (M8) [23]. Likelihood ratio tests (LRT; 2Δl = 2(l1 − l0), where l1 and l0 are the log likelihoods of the alternative and the null model, respectively) were conducted for each pair of nested models [24] to identify which model fits the data better. The LRTs and nested models are briefly introduced as follows. The M0 model is a plain model in which the same dN/dS ratio is assumed for all branches in the phylogeny [20]. The free-ratio model is the most general model, in which an independent dN/dS ratio is assumed for every branch [20]. The first LRT compares the M0 model with the free-ratio model to test whether the dN/dS ratio ω differs among lineages [20]. The second LRT compares the M0 and M3 models to test whether ω varies among sites. In the discrete model M3, three site classes are estimated over a general discrete distribution, each with an independently estimated ω, which also allows for sites with ω > 1. For each site class the proportion p is given [22]. The third LRT compares the M1a and M2a models. M1a postulates a class of sites with 0 < ω < 1 and a second class of sites with ω = 1, whereas the M2a model adds a third class of sites with ω > 1 [25,24]. The fourth LRT compares the M7 and M8 models and has more power to detect positively selected sites, as both models allow for sites with 0 < ω < 1 [23]. M7 assumes a beta distribution for ω over sites, which is limited to the interval (0, 1) [23]. M8 adds another site class with ω estimated from the data set, which allows for sites with ω > 1 [23].
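The LRT described above can be sketched in a few lines. The example below computes 2Δl from two log likelihoods and the corresponding χ2 p-value, using the closed-form upper tail of the χ2 distribution that holds for even degrees of freedom (all comparisons in this study have even d.f.). This is an illustrative sketch, not part of PAML; the function names are assumptions.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2*delta_l, p-value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)


# Log likelihoods of M1a (null) and M2a (alternative) as reported in Table 1.
stat, p_value = likelihood_ratio_test(-1378.39, -1378.09, df=2)
print(f"2*delta_l = {stat:.2f}, p = {p_value:.2f}")  # 2*delta_l = 0.60, p = 0.74
```

Because the log likelihoods in Table 1 are rounded to two decimals, statistics recomputed from them can differ slightly from the tabulated 2Δl values.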
SNPs selection and population genetic measures
Population genetic measures were calculated with DnaSP version 4.50.0.3 [26], Sweep [27], and the iHS (integrated haplotype score) tool available at http://hgdp.uchicago.edu/. In the Sorbs cohort, a self-contained ethnic group in Eastern Germany that has been described in detail elsewhere [11], closely related subjects (identity by descent/IBD > 0.05, calculated with PLINK v1.01 [28]) were removed from the analyses. Further inclusion criteria were: minor allele frequency (MAF) > 0.01, Hardy-Weinberg equilibrium (HWE) p-value > 0.0001, missingness per SNP < 0.05, and missingness per sample < 0.07. According to these criteria, genotype data for 307 SNPs within the FTO locus (positions ranging from 52013417 to 52997859 bp, NCBI build 35/hg17) in 34 individuals were extracted from an available dataset obtained by genotyping DNA from about 1000 Sorbs with 500K/6.0 Affymetrix GeneChip arrays (S2 Table).
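The per-SNP inclusion criteria above can be expressed as a simple filter. The sketch below is a hypothetical illustration rather than the PLINK workflow used in the study: genotypes are assumed to be coded as minor-allele counts (0, 1, 2) with None for missing calls, and the HWE p-value is assumed to be supplied by an external test.

```python
from typing import Optional, Sequence, Tuple


def snp_summary(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (minor allele frequency, missing rate) for one SNP."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    if not called:
        return 0.0, missing_rate
    allele_freq = sum(called) / (2.0 * len(called))
    return min(allele_freq, 1.0 - allele_freq), missing_rate


def passes_snp_qc(
    maf: float,
    hwe_p: float,
    missing_rate: float,
    min_maf: float = 0.01,       # MAF > 0.01
    min_hwe_p: float = 0.0001,   # HWE p-value > 0.0001
    max_missing: float = 0.05,   # missingness per SNP < 0.05
) -> bool:
    """Apply the per-SNP thresholds stated in the text."""
    return maf > min_maf and hwe_p > min_hwe_p and missing_rate < max_missing
```

A per-sample filter (missingness per sample < 0.07) would follow the same pattern, applied across SNPs for each individual.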
Additionally, data of the HapMap populations (YRI, CEU, ASN, CHB, JPT) were downloaded from http://www.hapmap.org/ and filtered for the same SNPs genotyped in the Sorbs (S2 Table). As the CEU and YRI comprise parent/child trios, analyses were performed without SNP data of the children. Haplotype reconstruction in all populations was performed with PHASE version 2.1 [29,30].
DnaSP provides population genetic measures such as Tajima’s D [31], Fu and Li’s D*, and Fu and Li’s F* [32], all of which detect deviations from the frequency distribution of common or rare alleles expected under neutral evolution. The iHS compares the extent of linkage disequilibrium (LD) surrounding a selected allele with that surrounding the background allele at the same position. Suggestive evidence for natural selection is defined as iHS < −1.5 or > 1.5, and strong evidence as iHS < −2 or > 2 [33]. The fixation index (Fst) is a measure of the extent of variation in allele frequency between populations [34]. Population differentiation increased by local adaptation may result in larger Fst values [35].
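To make the first of these statistics concrete, the following minimal sketch computes Tajima’s D from phased biallelic haplotypes, following the standard formulation of [31]: the difference between the mean number of pairwise differences (π) and the number of segregating sites scaled by a1 (S/a1), divided by its estimated standard deviation. It is not the DnaSP implementation; the 0/1 string encoding and the function name are assumptions made for this example.

```python
import math
from typing import Sequence


def tajimas_d(haplotypes: Sequence[str]) -> float:
    """Tajima's D for haplotypes given as equal-length strings of '0'/'1'."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("at least four haplotypes are needed")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must have equal length")

    seg_sites = 0
    pi = 0.0
    for site in range(length):
        k = sum(h[site] == "1" for h in haplotypes)
        if 0 < k < n:  # site is polymorphic in the sample
            seg_sites += 1
            # contribution of this site to the mean pairwise difference
            pi += 2.0 * k * (n - k) / (n * (n - 1))
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

An excess of intermediate-frequency alleles raises π relative to S/a1 and gives positive D, whereas an excess of rare alleles gives negative D. For example, `tajimas_d(["10", "10", "01", "01"])` is positive, while four haplotypes that each carry one private singleton yield a negative value.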
For the HapMap populations, standardized iHS and Fst values were also provided by the Haplotter (http://hg-wen.uchicago.edu/selection/haplotter.htm) [36,37].
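The iHS thresholds and the idea behind Fst can be illustrated with a short sketch. The Fst function below uses the textbook heterozygosity-based definition, Fst = (HT − HS)/HT, for one biallelic SNP in two equally weighted populations; this is an assumption chosen for illustration and not necessarily the estimator implemented in Haplotter or Sweep. All function names are hypothetical.

```python
def classify_ihs(ihs: float) -> str:
    """Label an iHS value using the thresholds stated in the text."""
    magnitude = abs(ihs)
    if magnitude > 2.0:
        return "strong"
    if magnitude > 1.5:
        return "suggestive"
    return "none"


def fst_two_populations(p1: float, p2: float) -> float:
    """Heterozygosity-based Fst for one biallelic SNP.

    p1 and p2 are the frequencies of the same allele in two populations,
    which are given equal weight (an assumption made for this sketch).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)           # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                                  # allele fixed in both populations
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total
```

Identical allele frequencies give Fst = 0, and fixation of alternative alleles in the two populations gives Fst = 1; values near zero therefore indicate little differentiation at that SNP.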
Transcription factor binding sites
To uncover alleles that may change binding sites in intron three, sequences surrounding eight obesity-associated SNPs (20 bp up- and downstream) within FTO intron three were downloaded from the UCSC genome browser (http://genome.ucsc.edu/). Transcription factor binding probabilities of sequences carrying either the major or the minor allele were compared using ConSite [38]. Sequences included in the analysis are listed in S4 Table. We restricted the analysis to vertebrate transcription factors, as it is well acknowledged that functional transcription factor binding sites are conserved among closely related species, whereas substitutions occur mostly at nonfunctional positions as the evolutionary distance between species increases [39]. ConSite incorporates transcription factor binding profiles from the JASPAR [40] and ConSite datasets together with phylogenetic footprinting algorithms, which add constraints that further improve prediction. Compared with other single-sequence analyses, ConSite reduces the noise level by ∼85% [38]. The program has been applied in several studies [41–43] and has been validated in functional experiments both in vitro and in vivo [42].
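The preparation of allele-specific input sequences can be sketched as follows: given a reference sequence and the position of a SNP, the function returns one window carrying the major allele and one carrying the minor allele, each with up to 20 bp of flanking sequence on either side. This is a hypothetical helper for illustration only; it does not reproduce the UCSC download or any ConSite functionality, and the toy sequence in the usage note is invented.

```python
from typing import Tuple


def allele_windows(
    sequence: str, snp_index: int, major: str, minor: str, flank: int = 20
) -> Tuple[str, str]:
    """Return (major-allele window, minor-allele window) around a SNP.

    snp_index is the 0-based position of the SNP in `sequence`; windows
    are truncated if the SNP lies closer than `flank` bases to an end.
    """
    if not 0 <= snp_index < len(sequence):
        raise ValueError("SNP index outside the sequence")
    if len(major) != 1 or len(minor) != 1:
        raise ValueError("only single-nucleotide alleles are supported")
    if sequence[snp_index].upper() not in (major.upper(), minor.upper()):
        raise ValueError("reference base matches neither allele")
    start = max(0, snp_index - flank)
    end = min(len(sequence), snp_index + flank + 1)
    left = sequence[start:snp_index]
    right = sequence[snp_index + 1:end]
    return left + major + right, left + minor + right
```

With a toy 41 bp sequence such as `"A" * 20 + "C" + "G" * 20` and a C/T SNP at index 20, the two returned windows differ only at the central base; each window can then be submitted separately to a binding-site predictor.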
Results
Phylogenetic Analyses by Maximum Likelihood (PAML)
PAML analysis revealed that the coding region of FTO is highly conserved among all studied species (average ω = 0.1616). The LRT statistic for the lineage-specificity test (M0 vs. free-ratio) was 2Δl = 63.74. Compared with a χ2 distribution with d.f. = 66, the difference between the two models was not significant, indicating that ω does not differ among lineages. This suggests no differences in the direction and magnitude of selection acting on the FTO coding region of each species [44]. The second LRT compared the M0 and M3 models. The significant result (2Δl = 193.47, d.f. = 4) favored the M3 model. Thus, ω varied among sites rather than having a constant value, i.e. the ratio of non-synonymous to synonymous substitution rates fluctuated along the FTO coding region. The last two LRTs were non-significant, so the null models (M1a and M7) could not be rejected [45]. In summary, positive selection cannot be inferred for any of the sites in the coding sequence. All data are summarized in Table 1.
Table 1. PAML analysis of FTO.
Model | Parameter estimates | lnL | LRT models | 2Δl | d.f. | p-value |
---|---|---|---|---|---|---|
M0 | ω0 = 0.1616 | −1462.43 | M0 vs. free-ratio | 63.74 | 66 | 0.58 |
free-ratio | variable | −1430.56 | | | | |
M3 | p0 = 0.40765, p1 = 0.39476, p2 = 0.19759, ω0 = 0.00264, ω1 = 0.12330, ω2 = 0.90116 | −1365.70 | M0 vs. M3 | 193.47 | 4 | < 0.001 |
M1a | p0 = 0.79894, p1 = 0.20106, ω0 = 0.05695, ω1 = 1.00000 | −1378.39 | M1a vs. M2a | 0.61 | 2 | 0.74 |
M2a | p0 = 0.79860, p1 = 0.16914, p2 = 0.03227, ω0 = 0.06053, ω1 = 1.00000, ω2 = 1.90635 | −1378.09 | | | | |
M7 | p = 0.21277, q = 0.75616 | −1367.96 | M7 vs. M8 | 3.71 | 2 | 0.22 |
M8 | p0 = 0.94790, p = 0.26173, q = 1.37577 (p1 = 0.05210), ω = 1.63117 | −1366.10 | | | | |
All values computed under default settings. lnL = log likelihood, LRT = likelihood ratio test to detect positive selection, d.f. = degrees of freedom.
Population genetic measures
The analyses with DnaSP provided strong evidence for non-neutral evolution of the FTO locus. Across the whole gene locus (1 Mb), Tajima’s D deviated significantly from the value expected under neutrality (summarized in Table 2). The sliding-window analyses further supported these findings (Fig. 2). Interestingly, Tajima’s D seemed to be slightly higher in the third intron than in the first intron. Furthermore, the values in the third intron were more consistent across the studied populations than those in the first intron, which showed decreased Tajima’s D in Asian populations (Japanese and Chinese; Table 2 and Fig. 2). In line with Tajima’s D, Fu and Li’s D* and Fu and Li’s F* tests also showed significant deviations from neutrality in the investigated populations (Table 2).
Table 2. Results of the DnaSP Analyses.
Data | n | SNPs | Tajima’s D | p | Fu and Li’s D* | p | Fu and Li’s F* | p |
---|---|---|---|---|---|---|---|---|
FTO | ||||||||
CEU | 120 | 155 | 2.8228 | ** | 2.4349 | ** | 3.1269 | ** |
CHB | 90 | 152 | 2.173 | * | 2.0845 | ** | 2.5467 | ** |
JPT | 90 | 152 | 2.3878 | * | 1.7632 | ** | 2.4324 | ** |
Sorbs | 68 | 155 | 2.605 | * | 2.1099 | ** | 2.7689 | ** |
YRI | 120 | 147 | 2.8541 | ** | 2.5895 | ** | 3.2467 | ** |
Intron 1 | ||||||||
CEU | 120 | 31 | 2.76787 | ** | 1.30288 | # | 2.26319 | ** |
CHB | 90 | 30 | 1.8609 | # | 1.91378 | ** | 2.24036 | ** |
JPT | 90 | 30 | 1.90727 | # | 1.91378 | ** | 2.28686 | ** |
Sorbs | 68 | 31 | 2.48058 | * | 1.32095 | # | 2.09295 | ** |
YRI | 120 | 27 | 2.67329 | * | 1.88863 | ** | 2.63200 | ** |
Intron 3 | ||||||||
CEU | 120 | 12 | 3.378 | ** | 1.4607 | # | 2.5645 | ** |
CHB | 90 | 11 | 2.7264 | ** | 1.4261 | # | 2.2333 | ** |
JPT | 90 | 11 | 3.0817 | ** | 1.4261 | # | 2.3809 | ** |
Sorbs | 68 | 12 | 2.7653 | ** | 1.4762 | # | 2.275 | ** |
YRI | 120 | 12 | 3.4569 | *** | 1.4607 | # | 2.5984 | ** |
n = number of haplotypes
* p < 0.05
** p < 0.02
*** p < 0.001
# 0.1 > p > 0.05
FTO: 52,297,274–52,696,065 bp; rs1421091 - rs2689269 on Human May 2004 (NCBI35/hg17), Intron 1: 52,326,794–52,401,034 bp; rs7203521 - rs6499646, Intron 3: 52,421,901–52,434,067 bp; rs7204916 - rs7205213
From the publicly available data, the Haplotter showed top iHS scores > 2 in the CEU in the FTO region, e.g. for rs7193144 and rs8050136 (S3 Table). All SNPs in the third intron had rather low iHS values. These results were not significant according to the published map of recent positive selection in the human genome [36]. The unstandardized iHS values calculated with the iHS tool supported the publicly available Haplotter data (Table 3 and S3 Table). It is noteworthy that in the Sorbs, the iHS for SNPs in the third intron (rs17818902 and rs17818920) was about 2.5 times higher than in the CEU sample (1.468 vs. 0.590). Notably, the strength of association with BMI correlated positively with the unstandardized iHS (Table 3). The iHS values indicate that no long haplotype was observed for variants in FTO. The Fst values in the pairwise population comparisons were close to zero across variants, indicating no significant population differences (Table 3).
Table 3. Population genetic measures on unstandardized iHS and Fst.
SNP | p (BMI) | Intron | iHS Sorbs | iHS CEU | iHS CHB | iHS JPT | iHS YRI | Fst CEU vs. Sorbs | Fst CHB vs. Sorbs | Fst JPT vs. Sorbs | Fst YRI vs. Sorbs | p/beta |
---|---|---|---|---|---|---|---|---|---|---|---|---|
rs1861869 | 0.01465 | 1 | −0.294 | −0.766 | 0.212 | −0.839 | 0.261 | 0.0040 | 0.1283 | 0.0919 | 0.0081 | 0.068/−0.501 |
rs1861868 | 0.00938 | 1 | −0.869 | −1.029 | −0.049 | −1.005 | −0.310 | 0.0017 | 0.0912 | 0.0592 | 0.0558 | |
rs9940700 | 0.07348 | 1 | 0.000 | 0.539 | −0.618 | −0.401 | −0.693 | 0.0021 | 0.4219 | 0.3889 | 0.2070 | |
rs9939973 | 0.005447 | 1 | −0.667 | −1.411 | −0.865 | −0.664 | −0.429 | 0.0059 | 0.0751 | 0.0543 | 0.0062 | |
rs9940128 | 0.01583 | 1 | −0.667 | −1.411 | −0.866 | −0.665 | −0.429 | 0.0059 | 0.0751 | 0.0543 | 0.0062 | |
rs9922047 | 0.01035 | 1 | −0.400 | 0.220 | −0.876 | −1.006 | −1.171 | 0.0063 | 0.0034 | 0.0034 | 0.0781 | |
rs16952522 | 0.1976 | 1 | n. a. | n. a. | −1.994 | −1.746 | n. a. | 0.0020 | 0.0016 | 0.0150 | 0.0055 | |
rs17817288 | 0.006795 | 1 | 0.274 | −0.500 | 0.294 | 0.534 | −0.537 | 0.0060 | 0.0067 | 0.0064 | 0.0135 | |
rs1477196 | 0.02183 | 1 | −1.197 | −1.057 | −1.701 | −1.891 | n. a. | 0.0045 | 0.0053 | 0.0053 | 0.1362 | |
rs1121980 | 0.004003 | 1 | 0.529 | 1.179 | 0.197 | −0.109 | 0.370 | 0.0059 | 0.0751 | 0.0543 | 0.0059 | |
rs7193144 | 0.003387 | 1 | 0.826 | 1.550 | 0.727 | 0.649 | 0.740 | 0.0053 | 0.1005 | 0.0664 | 0.0063 | |
rs16945088 | 0.5796 | 1 | −1.989 | 0.220 | −1.380 | −1.338 | −0.914 | 0.0061 | 0.0064 | 0.0062 | 0.0924 | |
rs8050136 | 0.003092 | 1 | 0.747 | 1.361 | 0.741 | 0.667 | 0.280 | 0.0047 | 0.1005 | 0.0664 | 0.0032 | |
rs9939609 | 0.008727 | 1 | 0.375 | 1.361 | 0.411 | 0.088 | −0.019 | 0.0047 | 0.1005 | 0.0664 | 0.0049 | |
rs9930506 | 0.02465 | 1 | 0.748 | 1.220 | 0.741 | 0.667 | −0.133 | 0.0059 | 0.0751 | 0.0482 | 0.0789 | |
rs11075994 | 0.9537 | 2 | 0.912 | 0.772 | 0.149 | 0.493 | 0.842 | 0.0043 | 0.0041 | 0.0201 | 0.0332 | 0.041/−0.959 |
rs1421090 | 0.1254 | 2 | −0.598 | −0.309 | −0.916 | 0.103 | −0.480 | 0.0132 | 0.1470 | 0.1807 | 0.0276 | |
rs9972717 | 0.7354 | 2 | −1.106 | −1.648 | n. a. | n. a. | n. a. | 0.0050 | 0.0761 | 0.0522 | 0.1063 | |
rs10852522 | 0.001739 | 2 | 0.146 | 0.199 | 0.111 | 0.138 | 0.896 | 0.0054 | 0.0456 | 0.0037 | 0.0053 | |
rs10521308 | 0.7355 | 3 | n. a. | n. a. | −0.943 | −1.439 | −1.072 | 0.0174 | 0.0387 | 0.0387 | 0.0061 | 0.009/0.881 |
rs17818902 | 0.000632 | 3 | 1.468 | 0.590 | 1.101 | 0.915 | 0.888 | 0.0154 | 0.0011 | 0.0015 | 0.0598 | |
rs17818920 | 0.0007213 | 3 | 1.468 | 0.590 | 1.101 | 0.915 | 0.888 | 0.0154 | 0.0011 | 0.0015 | 0.0598 | |
rs8053367 | 0.0009839 | 3 | −0.760 | −0.348 | −0.427 | −0.348 | 0.080 | 0.0029 | 0.0062 | 0.0006 | 0.0029 | |
rs8053740 | 0.001025 | 3 | −0.760 | −0.348 | −0.427 | −0.348 | 0.080 | 0.0029 | 0.0062 | 0.0006 | 0.0029 | |
rs7203051 | 0.001116 | 3 | −0.760 | −0.348 | −0.427 | −0.348 | 0.138 | 0.0029 | 0.0062 | 0.0006 | 0.0038 | |
rs7205009 | 0.001319 | 3 | −0.760 | −0.348 | −0.427 | −0.348 | 0.073 | 0.0029 | 0.0062 | 0.0006 | 0.0029 | |
rs7205213 | 0.001292 | 3 | −0.871 | −0.348 | −0.460 | −0.399 | 0.073 | 0.0009 | 0.0067 | 0.0011 | 0.0044 |
p = p-value for association to BMI in the Sorbs; iHS = integrated Haplotype score; CEU = Central Europeans; CHB = Han Chinese from Beijing; JPT = Japanese from Tokyo; YRI = Yoruba from Ibadan; n. a. = not available
Transcription factor binding sites
To elucidate the potential functional mechanisms underlying the strong association of variants in FTO’s third intron with BMI in the Sorbs, we investigated in silico the impact of the SNPs on predicted transcription factor binding sites (S5 Table). As shown in S5 Table, the minor alleles of rs17818902, which shows the top association signal with BMI in the Sorbs [11], and of rs8053740 were predicted to create novel binding sites for the transcription factors TCF3 and SOX17, respectively.
Further, at rs8053367 the presence of the minor allele led to binding of multiple transcription factors, namely FREAC-2, HNF-3beta, HFH-1, HFH-2 (S5 and S6 Tables), while the binding site of transcription factor Irf-1 seemed to be significantly compromised. Finally, minor alleles of rs17818920 and rs7205213 led to alterations in binding sites for HLF, SOX17 and HNF-3beta (S5 and S6 Tables).
Discussion
Polymorphisms in the FTO gene have been shown to be associated with obesity in different ethnic groups of European and other ancestries [3–5,7,11,46]. Whereas associations with SNPs in the first intron have been robustly replicated, in the Sorbs the strongest effects on BMI were found for variants in the third intron [11]. To address the specific associations of FTO variants in the Sorbs, we aimed to test the gene for signatures of selection in vertebrates and particularly in human populations.
PAML analyses of the FTO coding sequence from 35 vertebrates consistently yielded ω < 1. Under the NearlyNeutral (M1a) model, most sites in the coding sequence (∼80%) were under strong purifying selection, while the remainder (∼20%) evolved neutrally, with synonymous substitutions occurring at a much higher rate than non-synonymous ones, thus suggesting strong gene conservation. This underlines the biological importance of the gene, as functionally relevant genes are expected to be highly conserved and thus subject to purifying selection [47]. The finding that FTO is subject to purifying selection is consistent with the results of Ohashi et al., who studied the genetic architecture of FTO polymorphisms in oceanic populations [48]. However, given that purifying selection is an important evolutionary means of maintaining the optimized form of a gene, it cannot be excluded that FTO variants were positively selected in the past, when the ability to store energy was beneficial.
It is of note that the PAML analysis examined only the coding sequence, whereas most of the obesity-associated SNPs, such as rs9939609 and rs17818920, map to intronic regions. Therefore, test statistics such as iHS and Fst, which do not depend on coding regions, are indispensable in evolutionary analyses. It has been stated before that, at least in the oceanic populations, FTO does not seem to comply with the thrifty genotype hypothesis [48]. The analyses in these populations considered only polymorphisms in the first intron. In the context of our present data, further studies systematically targeting the FTO locus in populations of different ethnic backgrounds will be necessary. As we show in the present study, neither iHS nor Fst values indicate positive selection for any allele from the third intron in individual groups or groups of populations (see S1 Fig.). Thus, consistent with studies in oceanic populations, our data would not support the thrifty genotype hypothesis. In contrast, other population genetic measures such as Tajima’s D, Fu and Li’s D*, and Fu and Li’s F* suggest a significant signature of balancing selection in FTO. The detection of balancing selection might be explained by the fact that Tajima’s D considers the allele frequencies at the sites themselves but, unlike iHS, does not take into account the surrounding regions through LD [49,50]. This is interesting when considering that polymorphisms in the third intron showed the strongest association with BMI in the Sorbs from Germany [11]. Remarkably, in the first intron, Tajima’s D is rather low in Asian populations when compared with European Caucasians, which might at least in part explain ethnic specificity in the genotype-phenotype associations with SNPs in this gene region.
However, it has to be noted that Tajima’s D was consistent across the studied populations in the third intron, which does not seem to support a population-specific pattern of selection for the Sorbs. Rather, the specific association of FTO variants in the third intron with BMI in the Sorbs is more likely to be explained by specific environmental factors interacting with the genetic background in the Sorbs.
Given that the strongest effect on BMI in the Sorbs was found for rs17818902 in the third intron, we also investigated its potential impact on transcription factor binding sites. In silico analyses using publicly available transcription factor databases suggested that the minor allele of rs17818902 creates a novel predicted binding site for TCF3 and that of rs8053740 one for SOX17. TCF3 acts as a transcriptional regulator involved in the initiation of neuronal differentiation [51,52], whereas SOX17 is an important player in the regulation of embryonic development and in the determination of cell fate [53]. However, the causal functional variant remains to be discovered. Thus, studies on pathways downstream of TCF3 may pave the way for a better understanding of the mechanism underlying associations of FTO with obesity. Nevertheless, it has to be acknowledged that a recent study strongly suggested that noncoding regions in the first intron of FTO, which show enhancer activity, interact directly with the promoter of the homeobox gene IRX3, thus regulating IRX3 expression [54]. However, a clear link between functional variants in the first intron and obesity remains vague. Moreover, the loss-of-function experiments on IRX3 were conducted in human cerebellum [55]. In contrast, there is strong evidence for the role of FTO in the complex pathophysiology of obesity (systematically reviewed in [56]). For example, it has been shown that the highest expression of FTO is in the brain region controlling food intake [8] and that hypothalamic-specific manipulation of Fto affects food intake in rats [57].
In conclusion, population genetic analyses revealed signatures of balancing selection at the FTO locus with a prominent signal in the third intron, a genomic region with strong association with BMI in the Sorbs. Our data provide some evidence supporting evolutionary selective pressure on genes associated with obesity.
Supporting Information
Acknowledgments
We thank all those who participated in the studies. We thank Stefano Berto (University of Leipzig) for his helpful comments and suggestions in regard to evolutionary analyses.
Data Availability
All relevant data are within the paper and its Supporting Information files.
Funding Statement
This project was supported by grants from the Boehringer Ingelheim Foundation to P.K., from the IFB AdiposityDiseases (ADI-K50D and ADI-K7-45 to Y.B. and ADI-K60E to P.K.), and from the Collaborative Research Center granted by the DFG (CRC 1052; B03). The work was further funded by the German Research Foundation (BO 3147/4-1 to Y.B.). IFB AdiposityDiseases is supported by the Federal Ministry of Education and Research (BMBF), Germany, FKZ: 01EO1001. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1. Stunkard AJ, Foch TT, Hrubec Z (1986) A twin study of human obesity. JAMA 256 (1): 51–54. Available: 10.1001/jama.1986.03380010055024. [DOI] [PubMed] [Google Scholar]
- 2. Maes HH, Neale MC, Eaves LJ (1997) Genetic and environmental factors in relative body weight and human adiposity. Behav Genet 27 (4): 325–351. [DOI] [PubMed] [Google Scholar]
- 3. Speliotes EK, Willer CJ, Berndt SI, Monda KL, Thorleifsson G et al. (2010) Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat Genet 42 (11): 937–948. 10.1038/ng.686 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Dina C, Meyre D, Gallina S, Durand E, Korner A et al. (2007) Variation in FTO contributes to childhood obesity and severe adult obesity. Nat Genet 39 (6): 724–726. [DOI] [PubMed] [Google Scholar]
- 5. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM et al. (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316 (5826): 889–894. 10.1126/science.1141634 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Yajnik CS, Janipalli CS, Bhaskar S, Kulkarni SR, Freathy RM et al. (2009) FTO gene variants are strongly associated with type 2 diabetes in South Asian Indians. Diabetologia 52 (2): 247–252. 10.1007/s00125-008-1186-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Hotta K, Nakata Y, Matsuo T, Kamohara S, Kotani K et al. (2008) Variations in the FTO gene are associated with severe obesity in the Japanese. J Hum Genet 53 (6): 546–553. 10.1007/s10038-008-0283-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Gerken T, Girard CA, Tung YL, Webby CJ, Saudek V et al. (2007) The obesity-associated FTO gene encodes a 2-oxoglutarate-dependent nucleic acid demethylase. Science 318 (5855): 1469–1472. 10.1126/science.1151710 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Hennig BJ, Fulford AJ, Sirugo G, Rayco-Solon P, Hattersley AT et al. (2009) FTO gene variation and measures of body mass in an African population. BMC Med Genet 10: 21 10.1186/1471-2350-10-21 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Li H, Wu Y, Loos, Ruth J F, Hu FB, Liu Y et al. (2008) Variants in the fat mass- and obesity-associated (FTO) gene are not associated with obesity in a Chinese Han population. Diabetes 57 (1): 264–268. [DOI] [PubMed] [Google Scholar]
- 11. Tonjes A, Zeggini E, Kovacs P, Bottcher Y, Schleinitz D et al. (2010) Association of FTO variants with BMI and fat mass in the self-contained population of Sorbs in Germany. Eur J Hum Genet 18 (1): 104–110. 10.1038/ejhg.2009.107 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. NEEL JV (1962) Diabetes mellitus: a “thrifty” genotype rendered detrimental by “progress”. Am J Hum Genet 14: 353–362. [PMC free article] [PubMed] [Google Scholar]
- 13. Ayub Q, Moutsianas L, Chen Y, Panoutsopoulou K, Colonna V et al. (2014) Revisiting the thrifty gene hypothesis via 65 loci associated with susceptibility to type 2 diabetes. Am J Hum Genet 94 (2): 176–185. 10.1016/j.ajhg.2013.12.010
- 14. Yang Z (1997) PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci 13 (5): 555–556.
- 15. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24 (8): 1586–1591.
- 16. Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22 (22): 4673–4680.
- 17. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30 (12): 2725–2729. 10.1093/molbev/mst197
- 18. Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nat Methods 9 (8): 772. 10.1038/nmeth.2109
- 19. Anisimova M, Bielawski JP, Yang Z (2001) Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol 18 (8): 1585–1592.
- 20. Yang Z (1998) Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol 15 (5): 568–573.
- 21. Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148 (3): 929–936.
- 22. Yang Z, Swanson WJ, Vacquier VD (2000) Maximum-likelihood analysis of molecular adaptation in abalone sperm lysin reveals variable selective pressures among lineages and sites. Mol Biol Evol 17 (10): 1446–1455.
- 23. Yang Z, Nielsen R, Goldman N, Pedersen AM (2000) Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155 (1): 431–449.
- 24. Yang Z, Wong WSW, Nielsen R (2005) Bayes empirical Bayes inference of amino acid sites under positive selection. Mol Biol Evol 22 (4): 1107–1118.
- 25. Wong WSW, Yang Z, Goldman N, Nielsen R (2004) Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168 (2): 1041–1051. 10.1534/genetics.104.031153
- 26. Rozas J, Rozas R (1995) DnaSP, DNA sequence polymorphism: an interactive program for estimating population genetics parameters from DNA sequence data. Comput Appl Biosci 11 (6): 621–625.
- 27. Sabeti PC, Reich DE, Higgins JM, Levine HZP, Richter DJ et al. (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature 419 (6909): 832–837.
- 28. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81 (3): 559–575. 10.1086/519795
- 29. Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68 (4): 978–989. 10.1086/319501
- 30. Stephens M, Scheet P (2005) Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. Am J Hum Genet 76 (3): 449–462. 10.1086/428594
- 31. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123 (3): 585–595.
- 32. Fu YX, Li WH (1993) Statistical tests of neutrality of mutations. Genetics 133 (3): 693–709.
- 33. Southam L, Soranzo N, Montgomery SB, Frayling TM, McCarthy MI et al. (2009) Is the thrifty genotype hypothesis supported by evidence based on confirmed type 2 diabetes- and obesity-susceptibility variants? Diabetologia 52 (9): 1846–1851. 10.1007/s00125-009-1419-3
- 34. Lewontin RC, Krakauer J (1973) Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74 (1): 175–195.
- 35. Biswas S, Akey JM (2006) Genomic insights into positive selection. Trends Genet 22 (8): 437–446.
- 36. Voight BF, Kudaravalli S, Wen X, Pritchard JK (2006) A map of recent positive selection in the human genome. PLoS Biol 4 (3): e72. 10.1371/journal.pbio.0040072
- 37. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL et al. (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449 (7164): 851–861. 10.1038/nature06258
- 38. Sandelin A, Wasserman WW, Lenhard B (2004) ConSite: web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res 32 (Web Server issue): W249–52. 10.1093/nar/gkh372
- 39. Moses AM, Chiang DY, Pollard DA, Iyer VN, Eisen MB (2004) MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol 5 (12): R98. 10.1186/gb-2004-5-12-r98
- 40. Sandelin A, Alkema W, Engstrom P, Wasserman WW, Lenhard B (2004) JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res 32 (Database issue): D91–4. 10.1093/nar/gkh012
- 41. Kulzer JR, Stitzel ML, Morken MA, Huyghe JR, Fuchsberger C et al. (2014) A Common Functional Regulatory Variant at a Type 2 Diabetes Locus Upregulates ARAP1 Expression in the Pancreatic Beta Cell. Am J Hum Genet 94 (2): 186–197. 10.1016/j.ajhg.2013.12.011
- 42. Melas PA, Lennartsson A, Vakifahmetoglu-Norberg H, Wei Y, Aberg E et al. (2013) Allele-specific programming of Npy and epigenetic effects of physical activity in a genetic model of depression. Transl Psychiatry 3: e255. 10.1038/tp.2013.31
- 43. Friese RS, Altshuler AE, Zhang K, Miramontes-Gonzalez JP, Hightower CM et al. (2013) MicroRNA-22 and promoter motif polymorphisms at the Chga locus in genetic hypertension: functional and therapeutic implications for gene expression and the pathogenesis of hypertension. Hum Mol Genet 22 (18): 3624–3640. 10.1093/hmg/ddt213
- 44. Yang Z, Bielawski JP (2000) Statistical methods for detecting molecular adaptation. Trends Ecol Evol 15 (12): 496–503.
- 45. Metzger KJ, Thomas MA (2010) Evidence of positive selection at codon sites localized in extracellular domains of mammalian CC motif chemokine receptor proteins. BMC Evol Biol 10: 139. 10.1186/1471-2148-10-139
- 46. Chang Y, Liu P, Lee W, Chang T, Jiang Y et al. (2008) Common variation in the fat mass and obesity-associated (FTO) gene confers risk of obesity and modulates BMI in the Chinese population. Diabetes 57 (8): 2245–2252. 10.2337/db08-0377
- 47. Yang Z, Nielsen R (2008) Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol Biol Evol 25 (3): 568–579. 10.1093/molbev/msm284
- 48. Ohashi J, Naka I, Kimura R, Natsuhara K, Yamauchi T et al. (2007) FTO polymorphisms in oceanic populations. J Hum Genet 52 (12): 1031–1035.
- 49. Ma Y, Zhang H, Zhang Q, Ding X (2014) Identification of Selection Footprints on the X Chromosome in Pig. PLoS ONE 9 (4): e94911. 10.1371/journal.pone.0094911
- 50. Pybus M, Dall’Olio GM, Luisi P, Uzkudun M, Carreno-Torres A et al. (2014) 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res 42 (Database issue): D903–9. 10.1093/nar/gkt1188
- 51. Bain G, Maandag EC, Izon DJ, Amsen D, Kruisbeek AM et al. (1994) E2A proteins are required for proper B cell development and initiation of immunoglobulin gene rearrangements. Cell 79 (5): 885–892.
- 52. Yi F, Pereira L, Hoffman JA, Shy BR, Yuen CM et al. (2011) Opposing effects of Tcf3 and Tcf1 control Wnt stimulation of embryonic stem cell self-renewal. Nat Cell Biol 13 (7): 762–770. 10.1038/ncb2283
- 53. Niakan KK, Ji H, Maehr R, Vokes SA, Rodolfa KT et al. (2010) Sox17 promotes differentiation in mouse embryonic stem cells by directly regulating extraembryonic gene expression and indirectly antagonizing self-renewal. Genes Dev 24 (3): 312–326. 10.1101/gad.1833510
- 54. Smemo S, Tena JJ, Kim K, Gamazon ER, Sakabe NJ et al. (2014) Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature 507 (7492): 371–375. 10.1038/nature13138
- 55. Gorkin DU, Ren B (2014) Genetics: Closing the distance on obesity culprits. Nature 507 (7492): 309–310. 10.1038/nature13212
- 56. Loos RJF, Yeo GSH (2014) The bigger picture of FTO--the first GWAS-identified obesity gene. Nat Rev Endocrinol 10 (1): 51–61. 10.1038/nrendo.2013.227
- 57. Tung YL, Ayuso E, Shan X, Bosch F, O’Rahilly S et al. (2010) Hypothalamic-specific manipulation of Fto, the ortholog of the human obesity gene FTO, affects food intake in rats. PLoS ONE 5 (1): e8771. 10.1371/journal.pone.0008771
Data Availability Statement
All relevant data are within the paper and its Supporting Information files.