Gene-based genome-wide association study identified 19p13.3 for lean body mass

Shu Ran; Lei Zhang; Lu Liu; An-Ping Feng; Yu-Fang Pei; Lei Zhang; Ying-Ying Han; Yong Lin; Xiao Li; Wei-Wen Kong; Xin-Yi You; Wen Zhao; Qing Tian; Hui Shen; Yong-Hong Zhang; Hong-Wen Deng

doi:10.1038/srep45025

Sci Rep. 2017 Mar 21;7:45025.

PMCID: PMC5359571 PMID: 28322352

Abstract

Lean body mass (LBM) is a complex trait relevant to human health. To identify genomic loci underlying LBM, we performed a gene-based genome-wide association study of lean mass index (LMI) in 1000 unrelated Caucasian subjects and replicated the findings in 2283 unrelated Caucasian subjects. Gene-based association analyses highlighted significant associations of three genes, UQCR, TCF3 and MBD3, at a single locus, 19p13.3 (discovery p = 6.10 × 10⁻⁵, 1.65 × 10⁻⁴ and 1.10 × 10⁻⁴; replication p = 2.21 × 10⁻³, 1.84 × 10⁻³ and 6.95 × 10⁻³; combined p = 2.26 × 10⁻⁶, 4.86 × 10⁻⁶ and 1.15 × 10⁻⁵, respectively). These results, together with the known functional relevance of the three genes to LMI, suggest that the 19p13.3 region containing UQCR, TCF3 and MBD3 is a novel locus underlying lean mass variation.

Muscle tissue, as reflected by lean body mass (LBM), is relevant to human health. Low LBM may be associated with a range of health problems, such as sarcopenia, obesity and increased mortality¹,². LBM is under genetic control, with heritability above 50%³,⁴. Previous genome-wide association (GWA) studies have identified novel single nucleotide polymorphisms (SNPs) and genes associated with LBM⁵,⁶,⁷. However, the vast majority of genes underlying LBM remain to be identified. LBM can be measured accurately by dual-energy X-ray absorptiometry (DXA). Lean mass index (LMI) is frequently used to predict sarcopenia⁸,⁹.

A genomic region may show allelic heterogeneity, i.e., multiple variants in the region jointly affect the phenotype. In the presence of allelic heterogeneity, gene-based association tests can improve the statistical power and robustness of genetic association analysis by integrating multiple SNP signals into a single statistic¹⁰,¹¹. A variety of statistical methods have been developed for gene-based testing, such as the Versatile Gene-based Association Study (VEGAS)¹². VEGAS combines the p-values of multiple SNPs within a gene region into a gene-based score and accounts for linkage disequilibrium (LD) in a simulation-based test, drawing SNP test statistics from a multivariate normal distribution whose correlation matrix is given by the LD between the SNPs. VEGAS is computationally efficient because the number of simulations is adaptive. Its performance is similar to that of other statistical methods¹², and it can be superior under certain conditions because it takes LD from large population reference panels when performing the simulations. Because LD comes from a reference panel, VEGAS requires only SNP-level summary statistics rather than raw genotype and phenotype data, which makes it suitable for summary results from large-scale meta-analyses.
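The simulation idea behind a VEGAS-style gene test can be sketched in pure Python. This is a minimal illustration, not the VEGAS software: the function names are hypothetical, SNP p-values are assumed to be two-sided, and the adaptive schedule (10³, then 10⁴, then 10⁵ simulations) is an assumption chosen to keep the example fast. The gene statistic is the sum of the SNPs' 1-df chi-square values, and its null distribution is simulated by drawing correlated normal vectors with the LD matrix as their correlation matrix.

```python
import math
import random
from statistics import NormalDist


def p_to_chisq(p):
    """Convert a two-sided SNP p-value into a 1-df chi-square statistic (z**2)."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def vegas_like_pvalue(snp_pvalues, ld, seed=0):
    """Gene-level p-value: compare the observed sum of 1-df chi-squares with sums
    of squared correlated normals simulated using `ld` as the correlation matrix."""
    observed = sum(p_to_chisq(p) for p in snp_pvalues)
    low = cholesky(ld)
    m = len(snp_pvalues)
    rng = random.Random(seed)
    # Assumed adaptive schedule: run more simulations only while p stays small.
    for n_sims, keep_going_below in ((1_000, 0.1), (10_000, 0.001), (100_000, 0.0)):
        hits = 0
        for _ in range(n_sims):
            u = [rng.gauss(0.0, 1.0) for _ in range(m)]
            # z = L u has covariance L L^T = ld
            z = [sum(low[i][k] * u[k] for k in range(i + 1)) for i in range(m)]
            if sum(zi * zi for zi in z) >= observed:
                hits += 1
        p_gene = (hits + 1) / (n_sims + 1)  # add-one estimate, never exactly 0
        if p_gene >= keep_going_below:
            break
    return p_gene
```

For example, `vegas_like_pvalue([0.01, 0.01], [[1.0, 0.9], [0.9, 1.0]])` returns a larger gene-level p-value than the same two SNPs would give under independence (identity LD matrix), because strongly correlated SNPs carry less independent evidence.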

In this study, we report a gene-based GWAS of LMI to identify genetic loci underlying variation in LBM. The discovery sample included 1000 unrelated Caucasian subjects genotyped with the Affymetrix 500K genotyping array. The replication sample included 2283 unrelated Caucasian subjects genotyped with the Affymetrix SNP6.0 genotyping array. Genotypes in both samples were imputed with the 1000 Genomes Project sequencing reference panel.

Materials and Methods

Ethics Statement

Study participants were recruited from the cities of Omaha and Kansas City and their neighboring areas. The study was approved by the institutional review boards of Creighton University and the University of Missouri-Kansas City. All participants provided written informed consent before entering the study. The methods were carried out in accordance with the approved study protocol.

Subjects

Discovery sample

The discovery sample consisted of 1,000 unrelated Caucasian subjects of European ancestry, of whom 501 were male and 499 were female. The sample was randomly selected from a large-scale cohort of over 6000 subjects. The inclusion and exclusion criteria have been described in our previous publication¹³.

Replication sample

The replication sample consisted of 2283 unrelated subjects of European ancestry. There were 556 male subjects and 1727 female subjects. All subjects were healthy individuals recruited from the Midwestern United States. There was no overlap between the subjects of the discovery and the replication cohorts.

Phenotyping

All subjects completed a structured questionnaire covering lifestyle, medical history, family information, anthropometric variables and other items. Lean body mass and fat body mass (FBM) were measured with a Hologic QDR 4500W DXA scanner (Hologic Inc., Bedford, MA, USA). Weight was measured in light clothing on a calibrated balance-beam scale. Height was measured with a calibrated stadiometer. Lean mass index (LMI, kg/m²) was calculated as lean mass (kg) divided by the square of height (m)¹⁴.
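The LMI definition amounts to a one-line calculation. The sketch below assumes lean mass is already in kilograms and height in metres (DXA software often reports mass in grams, which would need dividing by 1000 first); the function name and the example subject are hypothetical.

```python
def lean_mass_index(lean_mass_kg, height_m):
    """LMI in kg/m^2: lean mass (kg) divided by squared height (m)."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return lean_mass_kg / height_m ** 2


# Hypothetical subject: 52.0 kg of DXA lean mass, 1.70 m tall.
print(round(lean_mass_index(52.0, 1.70), 2))  # 17.99
```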

Genotyping and quality control

Genomic DNA was extracted from peripheral blood leukocytes using a commercial isolation kit (Gentra Systems, Minneapolis, MN, USA). Genotyping was performed as described in our previous publication¹⁵. Briefly, the discovery cohort was genotyped with the Affymetrix Mapping 250 K Nsp and Affymetrix Mapping 250 K Sty arrays at the Vanderbilt Microarray Shared Resource at Vanderbilt University Medical Center (Nashville, TN, USA) using the standard protocol recommended by the manufacturer (Affymetrix, Inc., Santa Clara, CA, USA). The Caucasian replication cohort was genotyped using the Affymetrix SNP 6.0 arrays by the standard protocol of the manufacturer.

We followed a strict quality control (QC) procedure. Samples with a call rate of at least 95% were included. We discarded SNPs that deviated from Hardy-Weinberg equilibrium (p < 0.0001) and those with a minor allele frequency (MAF) below 0.01. After QC, 379,319 SNPs remained in the discovery sample.
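The QC thresholds above can be expressed as simple filters. This is a sketch under stated assumptions: genotypes are coded 0/1/2 (copies of the alternative allele) with `None` for missing calls, the Hardy-Weinberg test is a 1-df chi-square goodness-of-fit test (the text does not say which HWE test was used; exact tests are also common), and all function names are hypothetical.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing calls; missing genotypes are coded as None."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded 0/1/2 = copies of the alternative allele."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def hwe_chisq_p(genotypes):
    """Hardy-Weinberg p-value from a 1-df chi-square goodness-of-fit test."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    observed = [called.count(0), called.count(1), called.count(2)]
    q = (observed[1] + 2 * observed[2]) / (2 * n)  # alternative-allele frequency
    expected = [n * (1 - q) ** 2, 2 * n * q * (1 - q), n * q ** 2]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return math.erfc(math.sqrt(stat / 2))  # P(chi-square with 1 df > stat)


def passes_sample_qc(sample_genotypes, min_call_rate=0.95):
    """Keep a sample only if its call rate is at least 95%."""
    return call_rate(sample_genotypes) >= min_call_rate


def passes_snp_qc(genotypes, min_maf=0.01, min_hwe_p=1e-4):
    """Drop a SNP if HWE p < 1e-4 or MAF < 0.01 (thresholds from the text)."""
    return (minor_allele_frequency(genotypes) >= min_maf
            and hwe_chisq_p(genotypes) >= min_hwe_p)
```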

Genotype imputation

Genotype imputation was applied to both the discovery and replication samples, with 1000 Genomes Project sequence variants (August 2010 release) as the reference panel. The reference panel included 283 individuals of European ancestry.

Details of the genotype imputation process have been described previously¹⁶. Briefly, strand orientation between the reference panel and the test sample was checked before imputation, and inconsistencies were resolved either by flipping the SNP to the reverse strand in the test sample or by removing the SNP from the test sample. Imputation was performed with MINIMAC¹⁷. Imputed SNPs were retained only if they met the following QC criteria: imputation r² > 0.5 and MAF > 0.01. SNPs failing these criteria were excluded from subsequent association analyses.
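The strand check and the post-imputation filter can be sketched as follows. The helper names are hypothetical. Dropping strand-ambiguous A/T and C/G SNPs is an added assumption, not something the text states; it reflects the fact that their strand cannot be resolved from alleles alone.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def align_to_reference(test_alleles, ref_alleles):
    """Return test alleles on the reference strand, or None to drop the SNP."""
    ref = set(ref_alleles)
    # Assumption: A/T and C/G SNPs cannot be strand-checked, so they are dropped.
    if ref in ({"A", "T"}, {"C", "G"}):
        return None
    if set(test_alleles) == ref:
        return tuple(test_alleles)
    flipped = tuple(COMPLEMENT.get(a) for a in test_alleles)  # reverse strand
    if set(flipped) == ref:
        return flipped
    return None  # still inconsistent: remove the SNP from the test sample


def passes_imputation_qc(r2, maf, min_r2=0.5, min_maf=0.01):
    """Imputed-SNP filter from the text: r^2 > 0.5 and MAF > 0.01."""
    return r2 > min_r2 and maf > min_maf
```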

SNP-based association analyses

We used VEGAS for gene-based association analyses, which takes individual SNP association p-values as input. Therefore, we first performed SNP-based association analyses to generate association signals. We corrected for population stratification with a principal component-based approach¹⁸. In both samples, covariates including gender, age, age², fat body mass (FBM) and the first five principal components¹⁹,²⁰ derived from genome-wide genotype data were screened for significance by stepwise linear regression, as implemented in the R function stepAIC (MASS package). Raw LMI values were adjusted for the significant covariates (age, gender and FBM), and the residuals were normalized by the inverse quantiles of the standard normal distribution.
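The normalization step can be illustrated with a rank-based inverse normal transform of the covariate-adjusted residuals. This is a sketch under stated assumptions: the residuals are taken as already computed (the stepAIC covariate screening and the regression are not reproduced), ties receive their average rank, and the rank offset `c = 0.5` is an assumed choice (Blom's variant uses `c = 3/8`). The function name is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.5):
    """Map values to normal scores Phi^-1((rank - c) / (n - 2c + 1)); ties share
    their average rank. c = 0.5 is an assumed offset (Blom's uses c = 3/8)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the tie block
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]
```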

Genetic associations between genotyped and/or imputed SNPs and the normalized phenotype were tested under an additive model of inheritance with MACH2QTL²¹,²², which regresses the phenotype on allele dosage in a linear regression model.
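An additive dosage test of this kind reduces to a simple linear regression of the normalized phenotype on allele dosage (0 to 2). The sketch below is not MACH2QTL itself: the function name is hypothetical, covariates are assumed to have been regressed out already, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable for samples in the thousands.

```python
import math
from statistics import NormalDist, fmean


def additive_dosage_test(dosages, phenotypes):
    """Regress phenotype on allele dosage; return (beta, se, z, two-sided p)."""
    n = len(dosages)
    mx, my = fmean(dosages), fmean(phenotypes)
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    beta = sxy / sxx                      # per-allele effect
    alpha = my - beta * mx                # intercept
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)   # standard error of beta
    z = beta / se
    p = 2 * NormalDist().cdf(-abs(z))     # normal approximation (assumed)
    return beta, se, z, p
```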

Linkage disequilibrium (LD), measured as r², was calculated with Haploview²³. To examine potential confounding caused by population stratification, we estimated the genomic control inflation factor (λ)²⁴.
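The genomic control factor is conventionally the median observed 1-df chi-square statistic divided by its expected null median (about 0.4549). A minimal sketch, assuming two-sided SNP p-values in (0, 1]; the function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.4549.
CHISQ1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control_lambda(pvalues):
    """lambda = median observed 1-df chi-square / expected median under the null."""
    nd = NormalDist()
    chisq = [nd.inv_cdf(1 - p / 2) ** 2 for p in pvalues]  # p must be in (0, 1]
    return median(chisq) / CHISQ1_MEDIAN
```

Values of λ close to 1 indicate little inflation from population stratification; values well above 1 suggest residual confounding.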

Gene-based test using VEGAS

Gene-based association tests were performed to identify genes associated with the phenotype. We used the VEGAS approach for this analysis, which requires LD structure information and individual SNP p-values as input¹². The SNP inclusion criteria were the same as those for the individual SNP tests; specifically, we required SNP imputation accuracy r² > 0.5 and MAF > 0.01.
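The LD input for a VEGAS-style test is a SNP-by-SNP correlation matrix. The sketch below approximates it by the Pearson correlation of reference-panel genotype dosages. This is an assumption: Haploview and VEGAS derive LD from phased haplotypes. It also applies the same r² and MAF inclusion thresholds as the SNP tests. The dictionary keys `r2` and `maf` and both function names are hypothetical.

```python
import math


def ld_correlation_matrix(ref_dosages):
    """Pairwise Pearson correlation r between SNPs from reference dosages.

    ref_dosages: one list per SNP, each giving dosages (0..2) for the same
    reference individuals. Monomorphic SNPs must be removed first (zero variance).
    """
    m = len(ref_dosages)
    centered = []
    for snp in ref_dosages:
        mean = sum(snp) / len(snp)
        centered.append([d - mean for d in snp])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(m)] for i in range(m)]


def select_gene_snps(snps, min_r2=0.5, min_maf=0.01):
    """Keep SNPs meeting the same imputation r^2 and MAF thresholds as the SNP tests."""
    return [s for s in snps if s["r2"] > min_r2 and s["maf"] > min_maf]
```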

Meta-analysis

Significant genes identified in the discovery sample were tested for replication in the replication sample. The gene-based association signals from the two samples were then combined with Fisher's method. Specifically, Fisher's statistic was calculated as

$$X = -2\left(\ln p_1 + \ln p_2\right)$$

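where p₁ and p₂ are the two gene-level p-values. Under the null hypothesis of no association, and with the two samples independent, this statistic approximately follows a chi-square distribution with 4 degrees of freedom. Note that Fisher's method is valid regardless of whether the effect directions in the two samples are consistent; consequently, a significant combined p-value does not by itself show that the association replicates in the same direction.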
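For an even number of degrees of freedom, the chi-square tail probability has a closed form, so Fisher's combined p-value needs no statistics library. For two p-values, it reduces to q(1 − ln q) with q = p₁p₂. The sketch below uses a hypothetical function name. Applied to the discovery and replication p-values in the Abstract, it gives values that agree with the reported combined p-values to within rounding of the inputs.

```python
import math


def fisher_combined_p(pvalues):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under the null.

    Uses the closed-form chi-square survival function for even df:
    P(X > x) = exp(-x/2) * sum_{j<k} (x/2)^j / j!
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # X / 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


# UQCR, TCF3 and MBD3: (discovery p, replication p) from the Abstract.
for p1, p2 in [(6.10e-5, 2.21e-3), (1.65e-4, 1.84e-3), (1.10e-4, 6.95e-3)]:
    print(f"{fisher_combined_p([p1, p2]):.2e}")
```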

Gene-level effect direction evaluation

To evaluate whether the gene-level effect directions in the two samples are consistent, we examined the effects of all individual SNPs within the gene and proposed an overall gene-level measure of effect direction, whose sign we then compared between the two samples. To do so, we oriented all SNP effects in the discovery sample to be positive, so that a gene-level z-score, z_discovery, is obtained by combining the z_i⁺ over the SNPs in the gene, where z_i⁺ is the positive z-score of the i-th SNP in the discovery sample. If a reported SNP z-score is negative, the reference allele and the alternative allele are swapped so that the z-score becomes positive. For example, if the reference and alternative alleles are A and G, and the reported z-score of allele A is −1.0, then allele G, whose z-score is 1.0, becomes the reference allele.

Once all reference alleles have been determined from the discovery sample, we calculate a gene-level z-score for the replication sample, z_replication, by combining in the same way the z_i, where z_i is the z-score of the reference allele at the i-th SNP in the replication sample. If z_replication is positive, it has the same direction as z_discovery, and we declare the gene's effects in the two samples to be consistent in direction; otherwise, they are opposite in direction.
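The allele re-orientation and sign comparison can be sketched as follows. Assumptions, all hypothetical: each SNP is stored as `(ref_allele, alt_allele, z)` with `z` referring to the first allele; only SNPs present in both samples are used; and the gene-level score is the Stouffer-style sum of z-scores divided by √m. Because only the sign of z_replication is used, any positive rescaling of the sum gives the same decision.

```python
import math


def orient_discovery(discovery):
    """Swap alleles so every discovery z-score is non-negative.

    discovery: {snp_id: (ref_allele, alt_allele, z)} -> {snp_id: (ref_allele, |z|)}
    """
    return {snp: (ref, z) if z >= 0 else (alt, -z)
            for snp, (ref, alt, z) in discovery.items()}


def z_for_allele(entry, allele):
    """z-score of `allele` at a SNP stored as (ref_allele, alt_allele, z_of_ref)."""
    ref, alt, z = entry
    if allele == ref:
        return z
    if allele == alt:
        return -z
    raise ValueError("allele not observed at this SNP")


def gene_direction_consistent(discovery, replication):
    """Return (consistent, z_discovery, z_replication) for one gene."""
    oriented = orient_discovery(discovery)
    snps = [s for s in oriented if s in replication]
    if not snps:
        raise ValueError("no SNPs shared between the two samples")
    scale = math.sqrt(len(snps))  # assumed Stouffer-style scaling
    z_disc = sum(oriented[s][1] for s in snps) / scale
    z_rep = sum(z_for_allele(replication[s], oriented[s][0]) for s in snps) / scale
    return z_rep > 0, z_disc, z_rep
```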

Results

The basic characteristics of the subjects used in both discovery and replication samples are summarized in Table 1.