Abstract
Genome-wide association studies (GWAS) have identified numerous genetic variants in the human genome associated with diseases and traits. Nevertheless, for most loci the causative variant is still unknown. Analysis of expression quantitative trait loci (eQTL) in disease-relevant tissues is an excellent approach to correlate genetic association with gene expression. Since the liver is the primary site of gene transcription for two pathways relevant to age-related macular degeneration (AMD), namely the complement system and cholesterol metabolism, we explored the contribution of AMD-associated variants to the modulation of liver gene expression. We extracted publicly available data and computed the largest eQTL data set for liver tissue to date. Genotypes and expression data from all studies underwent rigorous quality control. Subsequently, Matrix eQTL was used to identify significant local eQTL. In total, liver samples from 588 individuals revealed 202,489 significant eQTL variants affecting 1,959 genes (Q-Value < 0.001). In addition, a further 101 independent eQTL signals were identified in 93 of the 1,959 eQTL genes. Importantly, our results independently reinforce the notion that high density lipoprotein metabolism plays a role in AMD pathogenesis. Taken together, our study generated a first comprehensive map reflecting the genetic regulatory landscape of gene expression in liver.
Introduction
Large genome-wide association studies (GWAS) have led to the identification of risk-associated variants with genome-wide significance for a multitude of diseases1. The very first successful GWAS identified an association between the complement factor H (CFH) locus on chromosome 1q31.3 and late stage age-related macular degeneration (AMD), the most common cause of blindness in industrialized countries2. The International AMD Genomics Consortium (IAMDGC) recently reported the most up-to-date list of genetic associations with 52 independent variants in 34 loci involved in AMD risk greatly extending our understanding of the genetic architecture of this blinding disease3. As one result, non-synonymous variants in five genomic loci point towards an involvement of the complement cascade as part of the innate immunity system4–6, implicating genes such as complement component 2 (C2), 3 (C3), 4 (C4), 9 (C9) as well as complement factor H (CFH), I (CFI), and B (CFB) in AMD pathology.
In addition, four AMD-associated loci harbour genes involved in high density lipoprotein (HDL) metabolism7–9. So far, the functional variants in the potential HDL-metabolism genes have not been unambiguously identified, mainly due to extensive linkage disequilibrium between the strongest associated variants and other correlated variants, which regularly offers multiple plausible genes as disease-associated candidates. Although statistical methods can help to further reduce the number of candidate variants10, most of the signals associated with AMD are localized within non-coding regions of the genome3. These regions, however, may harbour sequences directly linked to gene expression such as 5′ untranslated regions or intronic sequences. On the other hand, non-coding regions are often intergenic but can nevertheless have an effect, for example by recruiting transcription factors, which in turn can influence expression of nearby genes11. In general, such loci potentially harbour regulatory sequences in cis or trans to the gene regulated by the associated genetic variant.
Correlating the allele count at a variant locus and the expression of nearby genes in a given tissue can bridge the gap between the observed genetic association and understanding the mechanisms responsible for disease risk by defining an expression quantitative trait locus (eQTL)12. In recent years, thousands of eQTL were identified in multiple tissues by genome- and transcriptome-wide approaches13. Disease-associated genetic markers that represent a significant eQTL for a nearby gene can thus easily be identified. For AMD, so far only a single eQTL (rs79037040) affecting the expression of the tumor necrosis factor receptor superfamily, member 10a (TNFRSF10A) in white blood cells was reported to be associated with disease risk14. The lack of additional eQTL involved in AMD pathology can possibly be attributed to the observation that many eQTL studies are greatly underpowered15,16. In addition, although around 50% of known eQTL are common to several tissues13, many eQTL are likely to be specific for a given tissue or cell type.
The primary site of disease in AMD is the retinal tissue complex consisting of the retinal pigment epithelium (RPE), Bruch’s membrane and the choriocapillaris. The function of the liver is fundamentally different from that of the retina; thus the liver likely will react differently to environmental influences than retinal tissue. Furthermore, eQTL in liver might behave differently in retinal cells. However, it is challenging to sample a large number of human retinae and, as a consequence, no eQTL data from one of these cell types have been reported to date. Thus, we aimed at performing eQTL analysis in a surrogate tissue which expresses several genes of interest in loci associated with AMD, with the assumption that a polymorphism could have similar effects on gene expression in the surrogate tissue as in the retina. We selected liver as surrogate tissue since it is the main tissue for expression of genes of the complement system and of HDL metabolism. Moreover, gene products (e.g. proteins) of complement and of HDL metabolism expressed by the liver are frequently secreted into circulation where they exert various biological activities, and could consequently influence AMD through their systemic effects in the choriocapillaris. With this rationale we anticipated that investigating eQTL of these genes in liver could reveal important mechanistic insights into the association of these loci with AMD.
Several previous studies have published eQTL from liver tissue using different genotyping and expression profiling platforms17–20. Raw or curated data files of these studies are publicly available. In the present study, we have jointly analysed the data from the four independent liver eQTL resources by state-of-the-art methods, subsequent to rigorous quality control. In addition, the results were compared to published GWAS data for AMD risk variants. We show that a common, AMD associated deletion of the complement factor H related 1 and 3 genes (CFHR1/3) results in a markedly reduced expression of both genes in the liver. Furthermore, we show that two AMD risk variants are significant eQTL in liver affecting the expression of two genes involved in HDL metabolism.
Results
Data preparation
The main objective of this study was to identify significant cis-eQTL in liver tissue as part of our long-term goal to understand the functional consequences of genetic variants associated with complex diseases such as AMD. To this end, publicly available individual datasets were merged, although each used distinct platforms to call genotypes and to measure gene expression (Table 1). Consequently, stringent quality control measures were applied to compile a data set of high quality genotypes and gene expression values comparable across studies. Altogether, the study comprised 6,256,941 imputed variants and expression values of 24,123 genes in 588 samples of European descent.
Table 1.
Study | Schadt et al.18 | Schroeder et al.19 | Innocenti et al.17 | GTEx Start/Mida | Meta-analysis | Mega-analysis |
---|---|---|---|---|---|---|
Sample size before/after QC | 178/178 | 149/149 | 208/178 | 97/83 | 588 | 588 |
Origin of liver tissue | Post-mortem tissue and resections from donor livers | Normal tissue resected during surgery for liver cancer | Post mortem tissue and resections from donor livers | Post mortem tissue | — | — |
Transcriptome profiling platform | Agilent Custom 44k | Illumina Human WG-6v2.0 | Agilent 4 × 44 k | RNA-seq (Illumina HiSeq2000) | — | — |
Probes/genes before QC | 40,638 | 48,701 | 45,015 | 56,318 | — | — |
Genes after QC | 24,123 | 24,123 | 24,123 | 24,123 | 24,123 | 24,123 |
Genotyping platform | Affymetrix 500k; Illumina 650 Y | Illumina HumanHap300 | Illumina 610 Quad | Illumina Omni 5 M/2.5 Ma | — | — |
Variants before QC | 449,699 | 318,237 | 620,901 | 2,526,494/2,378,075a | — | — |
Variants after QC | 383,719 | 296,718 | 545,886 | 2,389,798/2,119,410a | — | — |
Variants merged before imputationb | 861,575 | 861,575 | 861,575 | 861,575 | 861,575 | 861,575 |
Variants after imputation and QC | 6,256,941 | 6,256,941 | 6,256,941 | 6,256,941 | 6,256,941 | 6,256,941 |
eQTL variants (Q-Value < 1 × 10−3) | 29,546 | 71,423 | 52,565 | 19,802 | 101,148 | 202,489 |
eQTL variants (Q-Value < 1 × 10−3, unique) | 27,689 | 69,292 | 49,594 | 16,953 | 95,257 | 183,872 |
eQTL genes (Q-Value < 1 × 10−3, unique) | 363 | 913 | 670 | 387 | 1,313 | 1,959 |
Overlapping eQTL genes with meta-analysis (Q-Value < 1 × 10−3) | 215 (59.23%) | 491 (53.78%) | 408 (60.9%) | 149 (38.5%) | 1,313 (100%) | 1,260 (64.32%) |
Overlapping eQTL genes with mega-analysis (Q-Value < 1 × 10−3) | 288 (79.34%) | 688 (75.36%) | 537 (80.15%) | 207 (53.49%) | 1,260 (95.96%) | 1,959 (100%) |
Independent signals (P-Value < 1 × 10−6) | — | — | — | — | — | 2,060 |
QC = quality control; aOmni 2.5 M for the first data release (GTEx start) and Omni 5 M for the mid-point release (GTEx mid). bAfter quality control the genotype files of the four studies were merged into a single file and variants, which did not overlap in-between datasets, were assigned missing. We only kept variants which were genotyped in at least 100 samples.
eQTL Analysis
First, we performed eQTL calculations for each of the four studies individually13,17–19. Local eQTL were calculated by including all variants on the same chromosome that are located within 1,000,000 base pairs (1 Mbp) up- or downstream of the transcription start site or polyadenylation site of a gene locus, respectively. Next, mixed effects models were used to perform a meta-analysis by including the effect sizes and standard errors obtained from each study separately. In order to account for multiple testing, we controlled the false discovery rate (FDR) to be smaller than 0.001 (ref. 21). At this threshold, 101,148 eQTL variants and 1,313 genes differentially regulated by the eQTL were identified (Table 1).
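The pooling of per-study effect sizes and standard errors described above can be illustrated with a minimal sketch. The text only states that mixed effects models were used; the DerSimonian-Laird random-effects estimator below is an assumption chosen for illustration, the function name `random_effects_meta` is hypothetical, and the P-value uses a normal approximation.

```python
import math

def random_effects_meta(betas, ses):
    """Pool per-study eQTL slopes with a DerSimonian-Laird random-effects model.

    Assumption: this estimator stands in for the unspecified mixed effects
    model of the text. Returns (pooled beta, pooled SE, two-sided P-value).
    """
    # Fixed-effect (inverse-variance) weights and pooled estimate
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw

    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's sampling variance
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    sws = sum(w_star)
    beta = sum(wi * b for wi, b in zip(w_star, betas)) / sws
    se = math.sqrt(1.0 / sws)

    # Two-sided P-value from a normal approximation of beta / se
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p

# Hypothetical slopes and standard errors for one variant-gene pair in four studies
beta, se, p = random_effects_meta([0.52, 0.47, 0.60, 0.41], [0.10, 0.08, 0.09, 0.15])
print(beta, se, p)
```

When the studies disagree, the estimated between-study variance inflates the pooled standard error, which is one reason a meta-analysis can be more conservative than a joint analysis of the pooled samples.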
As meta-analysing data can result in a loss of statistical power22–24, we additionally performed a mega-analysis by directly estimating eQTL in the entire dataset comprising all four studies. The mega-analysis yielded 202,489 statistically significant eQTL variants affecting the expression of 1,959 genes while controlling the FDR to be less than 0.001 (Fig. 1, Table 1 and Supplementary Table S1). Compared to the results from the meta-analysis, the mega-analysis provided a two-fold increase in the number of eQTL variants and a 1.5-fold increase in the number of differentially regulated genes. Of note, however, both mega- and meta-analysis discovered more significant results than any of the four individual studies alone (Table 1). Only 38.5 to 60.9% of the significant single study eQTL genes could be replicated in the meta-analysis. The GTEx study had the lowest replication rate, possibly due to its relatively small sample size (N = 83). The overlap of single study results and the mega-analysis is on average 19 percentage points higher (53.5 to 80.15%) than the overlap observed in the meta-analysis. As the mega-analysis reproduced 95.96% of the meta-analysis eQTL and detected many signals beyond, we decided to rely on the data of the mega-analysis for further calculations although this may represent a slight overestimation of eQTL derived from the available data set.
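The replication rates quoted above follow directly from the gene counts in Table 1. The short calculation below reproduces them; the dictionary and function names are illustrative only, and all counts are taken from Table 1.

```python
# eQTL gene counts per study and their overlap with the combined analyses (Table 1)
eqtl_genes = {"Schadt": 363, "Schroeder": 913, "Innocenti": 670, "GTEx": 387}
overlap_meta = {"Schadt": 215, "Schroeder": 491, "Innocenti": 408, "GTEx": 149}
overlap_mega = {"Schadt": 288, "Schroeder": 688, "Innocenti": 537, "GTEx": 207}

def replication_rates(overlap, totals):
    """Percentage of each study's eQTL genes that are recovered in a combined analysis."""
    return {study: 100.0 * overlap[study] / totals[study] for study in totals}

meta_rates = replication_rates(overlap_meta, eqtl_genes)
mega_rates = replication_rates(overlap_mega, eqtl_genes)

# Average difference between mega- and meta-analysis overlap, in percentage points
mean_gain = (sum(mega_rates.values()) - sum(meta_rates.values())) / len(eqtl_genes)

for study in eqtl_genes:
    print(study, round(meta_rates[study], 2), round(mega_rates[study], 2))
print("mean gain (percentage points):", round(mean_gain, 1))
```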
We next aimed to identify independent eQTL variants (independent hits) within a significant eQTL. Consequently, the eQTL analysis was repeated for each significant eQTL gene after additionally adjusting the linear regression model for the most significant variant identified for the eQTL gene. The procedure was reiterated until no additional significant variants were identified. In this analysis, a variant was regarded a significant independent eQTL for a given gene if the P-value associated with the regression slope was lower than 1 × 10−6. With this approach, we detected an additional 101 independent eQTL variants in 93 out of 1,959 liver eQTL genes (Fig. 1, Supplementary Tables S2 and S3). Of note, our analysis could not replicate the AMD associated eQTL rs79037040 previously shown to influence the expression of TNFRSF10A in blood14. This is because neither this variant nor any variant in linkage disequilibrium (R > 0.4) with rs79037040 could be reliably imputed into the dataset.
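The iterative conditioning procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the study: covariates other than the already selected variants are ignored, conditioning is implemented by residualising expression and genotypes on each selected variant, and the slope P-value uses a normal approximation instead of the t distribution. All function names are hypothetical.

```python
import math

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residualize(values, on):
    """Remove from `values` its projection onto `on` (both already centered)."""
    denom = dot(on, on)
    if denom == 0:
        return list(values)
    b = dot(values, on) / denom
    return [v - b * o for v, o in zip(values, on)]

def slope_p_value(y, g, n_conditioned):
    """Two-sided P-value of the slope of y on g (normal approximation)."""
    sxx = dot(g, g)
    if sxx < 1e-12:  # genotype fully explained by the variants already selected
        return 1.0
    beta = dot(y, g) / sxx
    rss = sum((yi - beta * gi) ** 2 for yi, gi in zip(y, g))
    dof = len(y) - 2 - n_conditioned
    se = math.sqrt(rss / dof / sxx)
    if se == 0:
        return 0.0
    return math.erfc(abs(beta / se) / math.sqrt(2.0))

def independent_signals(expression, genotypes, threshold=1e-6):
    """Select the top variant, condition on it, and repeat until nothing passes."""
    y = center(expression)
    remaining = {name: center(g) for name, g in genotypes.items()}
    hits = []
    while remaining:
        pvals = {name: slope_p_value(y, g, len(hits)) for name, g in remaining.items()}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            break
        hits.append(best)
        lead = remaining.pop(best)
        # Conditioning step: regress the lead variant out of everything else
        y = residualize(y, lead)
        remaining = {name: residualize(g, lead) for name, g in remaining.items()}
    return hits

# Synthetic example: two independent signals plus a perfect proxy of the first
n = 180
g1 = [float(i % 3) for i in range(n)]
g2 = [float((i // 3) % 3) for i in range(n)]
noise = [((2 * i) % 13 - 6) / 10 for i in range(n)]
expr = [a + 0.5 * b + e for a, b, e in zip(g1, g2, noise)]
print(independent_signals(expr, {"g1": g1, "g2": g2, "proxy_of_g1": list(g1)}))
```

In this toy data the proxy variant carries no information once the first signal is conditioned on, so it is not reported as an additional independent hit.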
Characterization of eQTL-variants
We further localized all independent eQTL hits with regard to the transcription start site (TSS) of the affected gene (Fig. 2). We observed that the most significant eQTL variants were close to a respective TSS. Overall, 1,599 out of 2,060 (1,959 + 101) independent eQTL variants were within 100,000 base pairs of a nearest TSS, well in agreement with other studies16,25–27.
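As a small worked example of the localisation step, the snippet below computes a signed distance between a variant and a TSS and the share of variants within 100,000 base pairs. The sign convention (variant position minus TSS coordinate, without regard to strand) is an assumption inferred from the distances listed in Table 2; the text does not state it explicitly. The TSS coordinate in the example is back-calculated from the Table 2 row for rs10922109 and CFHR3 and was not looked up independently.

```python
def distance_to_tss(variant_pos, tss_pos):
    """Signed distance in base pairs; negative if the variant lies before the TSS coordinate."""
    return variant_pos - tss_pos

def fraction_within(distances, max_distance=100_000):
    """Share of variants whose absolute distance to the TSS is at most max_distance."""
    near = sum(1 for d in distances if abs(d) <= max_distance)
    return near / len(distances)

# rs10922109 and CFHR3 (Table 2); TSS implied by position minus reported distance
variant_pos = 196_704_632
implied_tss = variant_pos - (-39_292)
print(distance_to_tss(variant_pos, implied_tss))

# Proportion of independent eQTL variants within 100 kb of a TSS, as reported in the text
print(round(100 * 1599 / 2060, 1))
```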
We then evaluated the RegulomeDB28 scores of eQTL variants (Fig. 3A and Supplementary Table S4). As expected, eQTL variants (N = 183,872) were enriched in RegulomeDB classes one to four (P-values < 6.82 × 10−09), which represent variants with likely regulatory properties while categories 5 and higher show minimal to no functional relevance. In addition, eQTL variants with the smallest P-values and additional secondary signals (independent hits, N = 2,040) revealed an even stronger enrichment in classes one to four compared to controls and compared to all eQTL variants (P-values from 1.72 × 10−04 to 8.27 × 10−11).
To further characterise each eQTL signal for its most severe functional consequence relative to a known gene structure, we applied Ensembl VEP29,30 (Fig. 3B, Supplementary Table S5). Control variants were predominantly located upstream (49.22%) and downstream (49.09%) of known gene structures. Another 1.63% of the control variants were found in introns of genes. Less than 0.1% of the control variants were assigned to functional categories such as missense or untranslated transcript region (UTR). Interestingly, the proportion of intronic variants was significantly larger in both the mega-analysis variants (19.72%, P < 1.00 × 10−150) and the independent hit variants (29.17%, P < 1.00 × 10−150) (Fig. 3B, Supplementary Table S5). Additionally, other predicted categories like UTR or coding region variants occurred more often (P-values < 1.72 × 10−07).
Taken together, our findings indicate that significant liver eQTL variants are more often localized within known gene structures and are likely regulatory variants as they are found within regions of transcription factor binding and open chromatin. In addition, the most significant variants are also the most likely functional variant in each eQTL. This is supported by findings that the most significant eQTL variants (i) show an increased level of enrichment in all relevant RegulomeDB score categories compared to all eQTL variants and (ii) are enriched within known gene structures such as introns or coding exons.
Liver eQTL in AMD
Finally, we investigated whether any of the 52 independent AMD associated variants reported by Fritsche et al.3 coincides with the established liver eQTL. Out of 52 independent tag variants, only 31 variants had an allele frequency >5% and could be reliably imputed into our dataset. Remarkably, 8 of these 31 variants significantly affect 15 unique eQTL-genes (Q-Value < 0.05, Table 2).
Table 2.
IH* | dbSNP ID | CHR | Position [hg19] | Gene ID (ENSG) | Gene Symbol | P-Value | Q-Value | Effect Size** | SE | Non-risk allele | Risk allele | Frequency of risk allele | Distance to TSS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1.2 | rs570618 | chr1 | 196,657,064 | ENSG00000244414 | CFHR1 | 2.15E-12 | 4.34E-10 | 0.711 | 0.099 | G | T | 0.360307 | −131822 |
1.1 | rs10922109 | chr1 | 196,704,632 | ENSG00000134365 | CFHR4 | 3.29E-24 | 1.66E-21 | 1.118 | 0.105 | A | C | 0.554124 | −114738 |
1.1 | rs10922109 | chr1 | 196,704,632 | ENSG00000244414 | CFHR1 | 7.56E-24 | 2.54E-21 | 0.992 | 0.094 | A | C | 0.554124 | −84254 |
1.1 | rs10922109 | chr1 | 196,704,632 | ENSG00000116785 | CFHR3 | 8.38E-17 | 2.11E-14 | 0.923 | 0.107 | A | C | 0.554124 | −39292 |
1.1 | rs10922109 | chr1 | 196,704,632 | ENSG00000143278 | F13B | 0.0002 | 0.012 | 0.216 | 0.057 | A | C | 0.554124 | −303688 |
1.1 | rs10922109 | chr1 | 196,704,632 | ENSG00000000971 | CFH | 0.0004 | 0.025 | 0.338 | 0.095 | A | C | 0.554124 | 83625 |
1.6 | rs61818925 | chr1 | 196,815,450 | ENSG00000116785 | CFHR3 | 1.38E-08 | 1.55E-06 | 0.649 | 0.113 | G | T | 0.417647 | 71526 |
1.6 | rs61818925 | chr1 | 196,815,450 | ENSG00000244414 | CFHR1 | 5.97E-05 | 0.006 | 0.416 | 0.103 | G | T | 0.417647 | 26564 |
1.6 | rs61818925 | chr1 | 196,815,450 | ENSG00000134389 | CFHR5 | 0.0001 | 0.011 | −0.371 | 0.096 | G | T | 0.417647 | −131216 |
11 | rs7803454 | chr7 | 99,991,548 | ENSG00000121716 | PILRB | 5.67E-27 | 5.72E-24 | 0.251 | 0.022 | C | T | 0.188567 | 57812 |
11 | rs7803454 | chr7 | 99,991,548 | ENSG00000085514 | PILRA | 6.16E-11 | 1.04E-08 | 0.372 | 0.056 | C | T | 0.188567 | 26396 |
23.1 | rs2043085 | chr15 | 58,680,954 | ENSG00000128918 | ALDH1A2 | 0.0002 | 0.016 | 0.207 | 0.056 | T | C | 0.667257 | 435333 |
23.2 | rs2070895 | chr15 | 58,723,939 | ENSG00000166035 | LIPC | 5.45E-09 | 6.88E-07 | 0.561 | 0.095 | A | G | 0.80531 | 21172 |
23.2 | rs2070895 | chr15 | 58,723,939 | ENSG00000137845 | ADAM10 | 0.0003 | 0.021 | −0.217 | 0.06 | A | G | 0.80531 | −163463 |
24.2 | rs17231506 | chr16 | 56,994,528 | ENSG00000087237 | CETP | 8.48E-05 | 0.008 | −0.216 | 0.055 | C | T | 0.327434 | −1233 |
27 | rs6565597 | chr17 | 79,526,821 | ENSG00000182612 | TSPAN10 | 1.70E-09 | 2.46E-07 | −0.526 | 0.086 | C | T | 0.383459 | −77375 |
27 | rs6565597 | chr17 | 79,526,821 | ENSG00000184009 | ACTG1 | 0.0002 | 0.016 | 0.312 | 0.084 | C | T | 0.383459 | 49825 |
27 | rs6565597 | chr17 | 79,526,821 | ENSG00000141552 | ANAPC11 | 0.0006 | 0.036 | −0.171 | 0.05 | C | T | 0.383459 | −321844 |
CHR: chromosome; TSS: transcription start site; SE: standard error of the effect size.
*IH: independent hit according to Fritsche et al.3.
**Effect size (beta) of a single AMD risk increasing allele.
Within the complement factor H (CFH) locus, several AMD associated variants appear to influence expression of CFH and CFH related genes (CFHR). In particular, the independent hit variant rs10922109 (independent hit 1.1 in ref. 3) tags a common deletion of CFHR1/CFHR3. Since the deletion of both genes is protective against AMD, the risk increasing allele results in elevated expression of the two genes (Table 2).
Notably, two genes involved in HDL metabolism, cholesteryl ester transfer protein (CETP) and hepatic lipase (LIPC), were both significantly regulated by AMD associated variants (Table 2). Specifically, rs17231506 is highly correlated to rs3764261 (R² > 0.99), a variant that results in markedly increased HDL levels in blood31. According to our eQTL data, rs17231506 reduces the expression of CETP, in line with the observation that CETP deficiency or pharmacological inhibition leads to elevated serum HDL. Further, our eQTL data showed that rs2070895 (−250 G > A) increases the expression of LIPC and would be expected to be associated with decreased HDL levels in blood32.
Finally, we identified additional AMD associated variants that potentially act as eQTL in liver. The AMD risk increasing allele of rs7803454 increases the expression of the paired immunoglobulin like type 2 receptor alpha (PILRA) and beta (PILRB) genes. The resulting proteins are known to function as antagonists within the Tyrosine-protein phosphatase non-receptor type 6 (PTPN6) pathway33 and have been implicated in both AMD and Alzheimer’s disease risk34. Interestingly, we did not detect any eQTL within the strongest AMD associated locus located on chromosome 10q26 (ARMS2/HTRA1).
Discussion
In this study, we have combined the genotypes and expression data of four previously published independent studies to further our understanding of the regulatory networks in liver tissue. Each individual study intended to identify new liver specific eQTL in order to elucidate the contribution of regulatory mechanisms on different diseases or traits. For example, Schadt et al.18 were the first to explore eQTL in liver tissue and correlated their results to genome-wide association studies of seven different diseases. AMD was not among them. Innocenti et al.17 and Schroeder et al.19 followed a similar approach but concentrated on the reproducibility of eQTL, while the latter group additionally focused on genes involved in drug response pathways. GTEx analysed eQTL in 44 human tissues and aimed to explore the interplay of gene regulation across tissues. By merging these resources this is to our knowledge the largest study on liver eQTL to date and promises to provide novel insight into the role of genetic variation on gene expression in liver tissue. Combining several studies while jointly analysing the data has drastically increased the power to detect novel eQTL across the genome. The replication rates of eQTL detected in individual studies can be as low as 38.5% (Table 1), even with a stringent FDR threshold of 0.1%. An approach known as mega-analysis has further improved the power of our study to detect novel eQTL. This also revealed a higher replication rate of eQTL identified by individual studies. Although the gain in power attributable to a mega-analysis can depend on the type of study23, the mega-analysis approach allowed us to identify additional, independent signals in 5% of the significant eQTL.
Mapping identified eQTL-variants against known gene structures such as introns, coding or non-coding exons revealed that a large proportion of the identified eQTL variants is highly enriched in intronic and coding regions of genes, in line with previous results13,16, although such an enrichment may be specific for certain tissues35. Similarly, we have observed a strong enrichment of eQTL variants in RegulomeDB classes one to four representing known eQTL and expected regulatory variants. Since many eQTL are shared between tissues20, an enrichment in RegulomeDB class 1 (representing known eQTL) is not surprising. Nevertheless, we also observe a strong enrichment of eQTL variants in RegulomeDB classes two to four, representing variants in experimentally determined regulatory epigenetic elements. Importantly, hypothetical regulatory variants in RegulomeDB class 5 (characterized by either transcription factor binding or a peak of DNase hypersensitivity) are not enriched in the identified liver eQTL variants, greatly increasing confidence in the robustness of our results. Alternatively, variants in RegulomeDB class 5 could be variants with weaker regulatory effects and thus, our study might be underpowered to identify significant eQTL variants that are characterized by mapping to a weak epigenetic mark.
Strikingly, the observed enrichment in gene structures was more pronounced in the independent hits which represented the most significantly associated variants and, in addition, the most significantly associated secondary signals. This strengthens the notion that the variant showing the smallest P-value of association or correlation in a locus is a priori the most likely one to be the true causative mutation36. Alternatively, it is also possible that the functional allele of the variant with the smallest P-value is rather tagging several haplotypes that affect gene expression in the same orientation37. Therefore, in case a defined eQTL is of major interest, such a locus has to be dissected further by statistical means to identify all independent haplotypes carrying functional alleles10.
While the central nervous system and the retina are expressing complement genes, the liver is nevertheless the primary site of synthesis for circulatory complement proteins38. In addition, the liver plays a key role in lipid metabolism39, besides the complement cascade another pathway implicated in AMD pathology by epidemiological and genetic studies. We therefore investigated whether any of the top hits of a recent GWAS for AMD3 are regulatory variants influencing gene expression in liver.
One of the most significant association signals for AMD resides within the CFH locus on chromosome 1 and represents a compound signal of two protective haplotypes tagged by the protective allele of the top variant37. One protective haplotype harbors a common deletion of the CFH-related genes 1 and 3 (CFHR1/3)40. The heterozygous deletion of both genes results in reduced levels of CFHR1/3 proteins in serum, while a homozygous deletion results in a complete absence of CFHR1/341,42. In line with this, we found that the AMD risk increasing allele of rs6677604 is correlated to increased expression of both genes while the protective allele of rs6677604 (in strong linkage disequilibrium with the CFHR1/3 deletion) is correlated with decreased expression. In addition, the protective allele reduces the expression of other CFHR genes as well as the expression of the CFH gene. Since CFH and CFH-related genes share high sequence identity with each other, the expression values of the individual gene may not be distinguishable from the related gene by currently used high-throughput methods43–45. Indeed, we found that the gene expression values of CFH and CFH-related genes (CFHR1-5) are correlated in liver samples (R² between 0.1 and 0.5).
One important result of our study is that two AMD-associated signals near LIPC and CETP are significant eQTL, strongly implicating HDL metabolism and serum lipid levels in AMD pathogenesis. We observed that the AMD risk increasing allele of rs17231506 reduces CETP expression, likely resulting in elevated HDL levels in serum46. This is in line with the observation that HDL levels are elevated in AMD patients compared to controls7–9. Further, the risk increasing allele of rs2070895 near LIPC results in increased expression of LIPC, which is generally associated with reduced serum HDL levels47. A study by Burgess and Smith48 also observed an AMD associated variant next to LIPC (rs261342) to be associated with decreased HDL serum levels. This variant is in high linkage disequilibrium with rs2070895 (R² = 0.84) which was shown in our study to cause elevated LIPC expression in liver. Burgess and Smith48 in addition demonstrated that the AMD risk associated variant rs261342 predominantly results in reduced LDL and increased HDL levels. Of note, CETP and LIPC genes are key regulators of HDL remodelling which might be essential for efficient delivery of lipids (e.g. fatty acids, carotenoids) into the retina and efflux of excess lipids out of the retina. Importantly, CETP and LIPC variants have been shown to have additive effects on cardiovascular risk, with low CETP activity variants combined with low LIPC activity variants increasing the risk49. Cardiovascular risk could therefore add additional pressure to select specific variant gene combinations in the aged AMD population that were protected from cardiovascular death. A similar line of thought emerged from another recent study, which found that a genetic score based on genome-wide significant variants for elevated HDL serum levels was higher in AMD patients, strongly suggesting that AMD patients have more alleles that increase HDL than controls50, in line with other studies51,52.
Other confounding variables such as exercise, drugs or alcohol consumption or the occurrence of AMD in study participants potentially influence our eQTL analysis. However, the individuals in the study were largely below 60 years of age (404 out of 588) and thus AMD associated impairment such as an overly sedentary lifestyle should play a minor role in confounding our analysis. Furthermore, this study included a diverse and large set of individuals across multiple studies, which should reduce the effect of confounding environmental factors, especially since AMD associated factors are not likely to significantly influence confounders such as alcohol consumption53,54 or treatment with different, liver-metabolized drugs.
Conclusions
We present the currently most comprehensive eQTL analysis for liver tissue and report that 1,959 out of 24,123 investigated genes have at least one significant eQTL in liver. Significant eQTL variants are more frequently found within gene boundaries and are more enriched in RegulomeDB classes representing likely regulatory variants. Several of these liver eQTL overlap with genetic variants strongly associated with AMD at genome-wide significance. These findings underscore the validity of the eQTL approach to identify disease-associated functional variants and provide further confirmation that HDL metabolism is strongly involved in AMD aetiology. Nevertheless, it should be emphasized that further replication of our results in disease relevant tissues such as retina or RPE or other functional validation studies are warranted. Specifically, this could further validate our notion that HDL metabolism is, in addition to the complement cascade, a major pathway in AMD disease development.
Methods
Genotype data
The genotypes of the four studies were retrieved from the respective databases (Table 1). Genotype quality control was performed for each study separately and, in addition, jointly after imputation. Since some studies reported only the zygosity of their samples at each variant (e.g. homozygosity: AA or BB; heterozygosity: AB), we first matched the reported alleles of each variant to the respective alleles in the 1000 Genomes reference dataset using the Biomart30,55 online database (http://grch37.ensembl.org/biomart/). Multi-allelic variants were excluded to avoid potential ambiguity. Next, for each study we extracted the genotypes of all samples at 30,000 randomly chosen variants from all autosomes. We also included the genotypes of all samples from the 1000 Genomes Project Phase 3 (release 20130502)56 at the same variants and performed a PCA with the snpgdsPCA function of the SNPRelate57 package in R58. Since the haplotype structure can greatly vary between populations, we only included individuals clustering next to the European (EUR) reference individuals in the eQTL analyses (Supplementary Fig. S1). We then compared the reference allele in the datasets to the reference allele in the European 1000 Genomes samples. Alleles were flipped when given on the opposite strand. We excluded variants whose reference allele frequency differed by more than 10% from the reference allele frequency of the 1000 Genomes European samples. Furthermore, we excluded variants that (1) were not on autosomes, (2) had a minor allele frequency (MAF) < 0.05, or (3) deviated significantly from Hardy-Weinberg equilibrium59 (HWE, P < 1 × 10−6), as determined with the respective functions in VCFtools60.
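The per-variant filters described above can be summarised in a short sketch. Several points are assumptions made for illustration: the 10% criterion is read as an absolute difference of 0.10 in reference allele frequency, and Hardy-Weinberg equilibrium is tested with a one-degree-of-freedom chi-square approximation, whereas VCFtools applies an exact test. The function names `flip_strand`, `hwe_chi2_p` and `passes_genotype_qc` are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def flip_strand(allele):
    """Return the allele as read on the opposite strand."""
    return COMPLEMENT[allele]

def hwe_chi2_p(n_ref_hom, n_het, n_alt_hom):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    Assumption: a chi-square approximation replaces the exact test of VCFtools.
    """
    n = n_ref_hom + n_het + n_alt_hom
    p = (2 * n_ref_hom + n_het) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic variant, no deviation can be tested
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with one degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def passes_genotype_qc(n_ref_hom, n_het, n_alt_hom, ref_freq_1000g,
                       max_freq_diff=0.10, min_maf=0.05, min_hwe_p=1e-6):
    """Apply the MAF, reference-panel frequency and HWE filters to one variant."""
    n = n_ref_hom + n_het + n_alt_hom
    ref_freq = (2 * n_ref_hom + n_het) / (2 * n)
    maf = min(ref_freq, 1.0 - ref_freq)
    if maf < min_maf:
        return False
    if abs(ref_freq - ref_freq_1000g) > max_freq_diff:
        return False
    return hwe_chi2_p(n_ref_hom, n_het, n_alt_hom) >= min_hwe_p

# Hypothetical genotype counts for one variant in one study
print(passes_genotype_qc(60, 90, 28, ref_freq_1000g=0.57))
```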
The individual genotype data sets were merged into a single VCF file. Variants which were not present in an individual study or were not genotyped in at least 100 samples were assigned missing in the respective individuals. Phasing and imputation were performed on the merged data, as accuracy of both algorithms increases with increasing sample sizes61. Phasing was performed with SHAPEIT2 and standard settings by supplying the imputed genotypes from the 1000 Genomes Phase 3 reference panel62. The same reference panel was used to conduct a whole genome imputation with IMPUTE263 at standard settings. Next, VCFtools was used to remove variants with a minor allele frequency < 5% and variants which showed evidence for a significant deviation from Hardy-Weinberg equilibrium (P < 1 × 10−6). In addition, variants with an IMPUTE2 info score smaller than 0.4, considered to be of low quality64, were removed. Finally, the reference allele frequency of each study was compared against the reference allele frequency of all other studies (Supplementary Fig. S2). Variants whose reference allele frequency differed by more than 15% between studies were excluded.
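The two post-imputation filters that are specific to this step, the info score threshold and the comparison of reference allele frequencies between studies, could be expressed as in the sketch below. It assumes that the 15% criterion refers to an absolute frequency difference of 0.15 between any pair of studies; the function name and the example values are hypothetical.

```python
from itertools import combinations

def keep_after_imputation(info_score, study_ref_freqs, min_info=0.4, max_freq_diff=0.15):
    """Keep a variant if it is well imputed and its frequency is consistent across studies."""
    if info_score < min_info:
        return False
    # Compare the reference allele frequency of every study against every other study
    return all(abs(a - b) <= max_freq_diff for a, b in combinations(study_ref_freqs, 2))

# Hypothetical variant: info score and reference allele frequency in the four studies
print(keep_after_imputation(0.92, [0.30, 0.32, 0.35, 0.28]))
```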
Specifics for each data set were as follows:
The GTEx data were retrieved through dbGaP65 (https://www.ncbi.nlm.nih.gov/gap, accession: phs000424.v6.p1). The positions of the variants were already reported based on the final hg19 build and thus, no additional lift-over was required.
Innocenti et al.17 genotype information was retrieved from the GEO database66 (accession code: GSE26105). The genotyping had been performed by the authors on an Illumina 610 Quad chip and the genotypes were encoded by each individual’s zygosity status (homozygosity: AA, BB; or heterozygosity: AB). The hg19 coordinates as well as the respective alleles of the variants were retrieved from Ensembl by querying the Biomart online database with the respective dbSNP identifier.
The genotype information from Schroeder et al.19 was retrieved from the GEO database (accession: GSE39036). The samples had been genotyped by the authors on an Illumina HumanHap300 chip and the genotypes were also encoded according to the individual zygosity status. The hg19 coordinates and alleles were retrieved from Ensembl as specified above.
The genotypes from the Schadt et al.18 study were retrieved from the Synapse database (accession: syn89614). The samples had been genotyped on either the Affymetrix 500k or the Illumina 650 Y genotyping chip. The genotype file included hg17 positions of each variant, a unique dbSNP identifier and both alleles of each individual. We initially removed variants without dbSNP identifiers and then used the program liftover67 from the UCSC Genome Browser (https://genome.ucsc.edu/util.html) to retrieve the hg19 coordinates of each variant.
Gene expression data
The present study included gene expression data from four independent studies. Three studies profiled gene expression with microarray platforms (Table 1), while one study used high-throughput transcriptome sequencing (RNA-Seq) for data generation. First, we remapped array probes to an in silico mRNA reference database based on Ensembl gene annotation30 with the help of the ReAnnotator pipeline68. Only exome-matching probes showing fewer than five mismatches were retained in the data set. Probes mapping to multiple genes or overlapping with common variants (according to dbSNP release 142) were removed from the analysis69. Probes measuring the expression of the same gene were merged by calculating the mean of all probes within a gene, weighted by the variance of the respective probe over all samples. Hence, probes with a higher variance contributed more to the overall transcript levels than probes with little variation across samples.
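The variance-weighted merging of probes can be illustrated with a short sketch. It assumes that the weights are directly proportional to the sample variance of each probe, which is one plausible reading of the description; the fallback to an unweighted mean for constant probes and the function name are hypothetical additions.

```python
from statistics import variance

def merge_probes(probe_values):
    """Collapse several probes of one gene into a single profile.

    probe_values: list of probes, each a list of expression values
    with one entry per sample (same sample order for all probes).
    """
    weights = [variance(values) for values in probe_values]
    total = sum(weights)
    if total == 0:
        # Assumption: if no probe varies, fall back to a plain mean.
        weights = [1.0] * len(probe_values)
        total = float(len(probe_values))
    n_samples = len(probe_values[0])
    return [
        sum(w * values[s] for w, values in zip(weights, probe_values)) / total
        for s in range(n_samples)
    ]
```

With this weighting, a probe that is flat across all samples receives a weight of zero and does not dilute the signal of a variable probe.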
For each data set, we performed basic expression normalization and quality control. Briefly, the available expression values were log2-transformed and a PCA was performed with the prcomp function in R to detect potential outlier samples within the dataset. We merged replicate samples by taking the mean of all replicate values.
The expression data of the four studies were merged and missing expression values were imputed using the K-Nearest-Neighbour70 method provided by the impute.knn function of the impute Bioconductor package71 in R. Genes that were included in one study but could not be imputed into the other studies were removed. Differences between all individuals were evaluated by conducting a PCA on the gene expression data (Supplementary Fig. S3A–C). In addition, the expression values for each individual were plotted as a boxplot (Supplementary Fig. S3D–F). Due to substantial differences between datasets, we applied further normalisation steps. Initially, we performed a quantile normalisation with the normalize.quantiles function of the R package preprocessCore72,73. Since quantile normalisation alone was not sufficient to normalise all studies, we adopted an empirical batch correction method called ComBat, using the ComBat function from the sva package in R74. By supplying known batch effects to the function (i.e. the study labels), ComBat standardises the data gene-wise and then applies an empirical batch effect correction (Supplementary Fig. S3C and F). The batch corrected expression values were used for the eQTL analyses, as no obvious bias of the single studies was noticeable.
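The principle of quantile normalisation can be shown with a minimal sketch. It is not the preprocessCore implementation: ties are broken by position rather than averaged, and the function name and toy values are hypothetical.

```python
def quantile_normalize(samples):
    """Force all samples onto the same empirical distribution.

    samples: list of per-sample lists with the same genes in the
    same order. Returns a new list of normalised per-sample lists.
    """
    n_genes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the r-th smallest value across samples.
    rank_means = [
        sum(s[r] for s in sorted_samples) / len(samples)
        for r in range(n_genes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_genes), key=lambda g: s[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After this step every sample has identical quantiles, which removes global intensity differences but not gene-specific study effects; the latter is the reason a batch correction such as ComBat was applied afterwards.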
Methods specific to the individual studies were as follows:
Firstly, GTEx expression values (release GTEx-V6p) were downloaded from the GTEx Portal (http://www.gtexportal.org/home/). The levels of transcript expression were encoded as “reads per kilobase of transcript per million mapped reads” (RPKM). We added 0.001 to all RPKM values so that the data, including values of zero, could be log2-transformed.
Secondly, the expression data from Innocenti et al.17 were retrieved from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/, accession: GSE25935). The expression values were already background subtracted and transformed to the log2 scale.
Thirdly, Schadt et al. (2008) provided a curated version of their data in the Synapse database (https://www.synapse.org/, accession: syn89614). As this study used an Agilent Custom 44k array, probe sequences were not openly available. In addition, not all samples had values for both genotype and gene expression data. The authors supplied an annotation file which links probe IDs to Ensembl and RefSeq75 identifiers. Expression values of probes were only used if they were unambiguously linked to a single Ensembl or RefSeq identifier. Furthermore, RefSeq identifiers were converted to Ensembl gene identifiers with the help of the Ensembl Biomart tool55. A Shapiro–Wilk test76 indicated that raw values larger than 2 or smaller than −2 were likely outliers. Thus, all of these were set to missing.
Finally, expression values from Schroeder et al.19 were retrieved from the GEO database (accession: GSE32504) as quantile normalized data. To retrieve probe sequences of the Illumina Human WG-6v2.0 chip for probe remapping, the illuminaHumanv2.db R package77 was used.
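Two of the study-specific steps above, the log2 transformation of GTEx RPKM values with a pseudocount and the masking of extreme values in the Schadt et al. data, can be sketched as follows. The function names are hypothetical, and missing values are represented by None purely for illustration.

```python
import math

PSEUDOCOUNT = 0.001  # value added to every RPKM before the log2 transform

def log2_rpkm(rpkm):
    """log2-transform an RPKM value; the pseudocount keeps zeros finite."""
    return math.log2(rpkm + PSEUDOCOUNT)

def mask_outliers(values, limit=2.0):
    """Set values larger than +limit or smaller than -limit to missing."""
    return [v if abs(v) <= limit else None for v in values]
```

Without the pseudocount, genes with an RPKM of zero would yield an undefined logarithm; with it, they map to a finite floor of log2(0.001).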
eQTL analysis
Linear regression analysis between gene expression values and imputed allele dosages was performed with Matrix eQTL78. Age, gender and the first five principal components of the genotype PCA were included in the models as covariates. We exclusively calculated local eQTL (variant-gene distance less than one million base pairs) due to limited power to perform distant eQTL analyses15.
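The regression model for a single variant-gene pair can be illustrated with a minimal sketch. It is not Matrix eQTL, which evaluates all pairs through large matrix operations and also reports test statistics; here only the slope is estimated by ordinary least squares. The function names are hypothetical, and the distance is measured to a single gene coordinate as a simplifying assumption.

```python
def solve_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def eqtl_slope(dosages, expression, covariates):
    """Effect of allele dosage on expression, adjusted for covariates.

    covariates: one list per sample (e.g. age, sex, genotype PCs).
    """
    X = [[1.0, d, *c] for d, c in zip(dosages, covariates)]
    return solve_least_squares(X, expression)[1]

def is_local(variant_chrom, variant_pos, gene_chrom, gene_pos, window=1_000_000):
    """True if variant and gene lie on the same chromosome within the window."""
    return variant_chrom == gene_chrom and abs(variant_pos - gene_pos) < window
```

The same function could be reused for a conditional analysis by appending the dosage of an already identified lead variant to the covariates of each sample.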
Two approaches were adopted to jointly analyse eQTL. First, a classic meta-analysis was applied to the individual study results. The effect size (slope) and the standard error of the effect size were estimated with Matrix eQTL for each study separately. A random effects model implemented in the function MiMa79 was then applied to estimate the joint effect sizes and standard errors as well as the joint P-Values. The second approach (mega-analysis) estimated local eQTL directly from the merged genotype and expression data. This approach also allowed us to search for additional independent eQTL for a gene by adjusting the linear regression model for the most significant eQTL variant of this gene. To account for multiple testing, the false discovery rate (FDR) was controlled at 0.001. Thus, eQTL with joint Q-Values smaller than 0.001 were considered statistically significant.
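The meta-analysis step can be illustrated with a short sketch that combines per-study slopes in a random effects model and then adjusts the resulting P-values for multiple testing. Two assumptions are made loudly here: the between-study variance is estimated with the DerSimonian-Laird method, and the Q-values are computed with the Benjamini-Hochberg procedure; the estimator used by MiMa and the exact Q-value method in the study may differ. All function names are hypothetical.

```python
import math

def random_effects_meta(betas, ses):
    """Joint effect, standard error and two-sided P-value.

    Assumption: DerSimonian-Laird estimate of the between-study variance.
    """
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    beta_fixed = sum(wi * b for wi, b in zip(w, betas)) / sw
    q = sum(wi * (b - beta_fixed) ** 2 for wi, b in zip(w, betas))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(betas) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]
    beta = sum(wi * b for wi, b in zip(w_star, betas)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # normal approximation
    return beta, se, p

def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```

When the studies agree, the between-study variance is estimated as zero and the result equals a fixed effects estimate; heterogeneous slopes inflate the joint standard error and therefore the P-value.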
Functional annotation of eQTL variants
A control set of variants was generated by randomly choosing around 200,000 genetic variants within 1 Mbp of a gene locus (defined by the transcription start and stop site of each gene). A RegulomeDB score (www.regulomedb.org/) was then assigned to each control and eQTL variant. The score denotes the confidence that a certain variant is important for transcription factor binding or chromatin accessibility and thus gene regulation. Variants in classes one to four are deemed very likely regulatory variants, while variants in classes five to seven are less likely to influence gene expression. In addition, the Ensembl Variant Effect Predictor (VEP, www.ensembl.org/vep) was used to assign each eQTL variant to a functional consequence relative to known gene structures. The program predicted the most severe consequence per gene within a range of 1 Mbp up and downstream of each variant. For eQTL variants, only predicted consequences affecting the associated eQTL gene were evaluated. For the control variants, a single random consequence for a nearby gene was chosen.
Ethics approval and consent to participate
This study used data from four public datasets. For further specifics on the respective ethics approvals, we refer to the publications of the individual studies.
Data availability statement
All data are available in public databases as detailed in the methods section.
Acknowledgements
TS was an awardee of the Roche Internships for Scientific Exchange (RiSE) Programme. The work has been supported in part by institutional funds (TG77) of the Institute of Human Genetics Regensburg and by a grant from the Helmut Ecker Foundation (Ingolstadt, Germany) to BHFW (No. 05/17).
Author Contributions
T.S. carried out the analysis and contributed to writing the manuscript. F.G. participated in study design, supervising the analysis and writing the initial manuscript draft. J.G. and S.N. participated in supervising the analysis and contributed to the interpretation of results. D.S.-C., C.M. and S.F. contributed to generation and interpretation of data. E.N. and B.H.F.W. participated in study design, coordination of the study, and finalizing the manuscript. All authors have read and approved the manuscript.
Competing Interests
F.G. and B.H.F.W. declare no competing interests. T.S., J.G., S.N., D.S.-C., C.M., S.F. and E.N. are current or former employees of F. Hoffmann-La Roche Ltd. (Basel, Switzerland). Funding bodies had no influence on data analysis, interpretation or presentation of the results.
Footnotes
Tobias Strunz, Felix Grassmann and Bernhard H. F. Weber contributed equally to this work.
Electronic supplementary material
Supplementary information accompanies this paper at 10.1038/s41598-018-24219-z.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
- 1. Welter D, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42:D1001–6. doi: 10.1093/nar/gkt1229.
- 2. Klein RJ, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–9. doi: 10.1126/science.1109557.
- 3. Fritsche LG, et al. A large genome-wide association study of age-related macular degeneration highlights contributions of rare and common variants. Nat. Genet. 2016;48:134–43. doi: 10.1038/ng.3448.
- 4. Grassmann F, Fauser S, Weber BHF. The genetics of age-related macular degeneration (AMD) – Novel targets for designing treatment options? Eur. J. Pharm. Biopharm. 2015;95:194–202. doi: 10.1016/j.ejpb.2015.04.039.
- 5. Weber BHF, et al. The role of the complement system in age-related macular degeneration. Dtsch. Arztebl. Int. 2014;111:133–8. doi: 10.3238/arztebl.2014.0133.
- 6. Grassmann F, et al. Multiallelic copy number variation in the complement component 4A (C4A) gene is associated with late-stage age-related macular degeneration (AMD). J. Neuroinflammation. 2016;13:81. doi: 10.1186/s12974-016-0548-0.
- 7. Paun CC, et al. Genetic Variants and Systemic Complement Activation Levels Are Associated With Serum Lipoprotein Levels in Age-Related Macular Degeneration. Invest. Ophthalmol. Vis. Sci. 2015;56:7766. doi: 10.1167/iovs.15-17035.
- 8. Cougnard-Grégoire A, et al. Elevated high-density lipoprotein cholesterol and age-related macular degeneration: the Alienor study. PLoS One. 2014;9:e90973. doi: 10.1371/journal.pone.0090973.
- 9. Klein R, et al. Lipids, lipid genes, and incident age-related macular degeneration: the three continent age-related macular degeneration consortium. Am. J. Ophthalmol. 2014;158:513–24.e3. doi: 10.1016/j.ajo.2014.05.027.
- 10. Grassmann F, Heid IM, Weber BHF. Recombinant Haplotypes Narrow the ARMS2/HTRA1 Association Signal for Age-Related Macular Degeneration. Genetics. 2017;205:919–24.
- 11. Gutierrez-Arcelus M, et al. Tissue-Specific Effects of Genetic and Epigenetic Variation on Gene Regulation and Splicing. PLoS Genet. 2015;11:e1004958. doi: 10.1371/journal.pgen.1004958.
- 12. Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M. Mapping complex disease traits with global gene expression. Nat. Rev. Genet. 2009;10:184–94. doi: 10.1038/nrg2537.
- 13. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–60.
- 14. Arakawa S, et al. Genome-wide association study identifies two susceptibility loci for exudative age-related macular degeneration in the Japanese population. Nat. Genet. 2011;43:1001–4.
- 15. Wright FA, et al. Heritability and genomics of gene expression in peripheral blood. Nat. Genet. 2014;46:430–437. doi: 10.1038/ng.2951.
- 16. Kim Y, et al. A meta-analysis of gene expression quantitative trait loci in brain. Transl. Psychiatry. 2014;4:e459. doi: 10.1038/tp.2014.96.
- 17. Innocenti F, et al. Identification, replication, and functional fine-mapping of expression quantitative trait loci in primary human liver tissue. PLoS Genet. 2011;7:e1002078. doi: 10.1371/journal.pgen.1002078.
- 18. Schadt EE, et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol. 2008;6:1020–1032. doi: 10.1371/journal.pbio.0060107.
- 19. Schröder A, et al. Genomics of ADME gene expression: mapping expression quantitative trait loci relevant for absorption, distribution, metabolism and excretion of drugs in human liver. Pharmacogenomics J. 2013;13:12–20. doi: 10.1038/tpj.2011.44.
- 20. Aguet F, et al. Local genetic effects on gene expression across 44 human tissues. bioRxiv (Cold Spring Harbor Labs Journals). 2016. doi: 10.1101/074450.
- 21. Benjamini Y, Hochberg Y. On the Adaptive Control of the False Discovery Rate in Multiple Testing With Independent Statistics. J. Educ. Behav. Stat. 2000;25:60–83. doi: 10.3102/10769986025001060.
- 22. Crowder M. Meta-analysis and Combining Information in Genetics and Genomics edited by Rudy Guerra, Darlene R. Goldstein. Int. Stat. Rev. 2011;79:134–135.
- 23. Lin DY, Zeng D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet. Epidemiol. 2009;34:60–6. doi: 10.1002/gepi.20435.
- 24. Shrier I, Platt RW, Steele RJ. Mega-trials vs. meta-analysis: Precision vs. heterogeneity? Contemp. Clin. Trials. 2007;28:324–328. doi: 10.1016/j.cct.2006.11.007.
- 25. Schramm K, et al. Mapping the Genetic Architecture of Gene Regulation in Whole Blood. PLoS One. 2014;9:e93844. doi: 10.1371/journal.pone.0093844.
- 26. Stranger BE, et al. Patterns of Cis regulatory variation in diverse human populations. PLoS Genet. 2012;8:e1002639. doi: 10.1371/journal.pgen.1002639.
- 27. Stranger BE, et al. Population genomics of human gene expression. Nat. Genet. 2007;39:1217–1224. doi: 10.1038/ng2142.
- 28. Boyle AP, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22:1790–7. doi: 10.1101/gr.137323.112.
- 29. McLaren W, et al. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010;26:2069–70. doi: 10.1093/bioinformatics/btq330.
- 30. Yates A, et al. Ensembl 2016. Nucleic Acids Res. 2016;44:D710–D716. doi: 10.1093/nar/gkv1157.
- 31. Global Lipids Genetics Consortium, et al. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 2013;45:1274–83.
- 32. Zhao S, Xie X, Nie S. The −250G-->A polymorphism in the human hepatic lipase gene promoter affects blood lipids in Chinese. Clin. Chim. Acta. 2006;365:149–52. doi: 10.1016/j.cca.2005.08.013.
- 33. Mousseau DD, Banville D, L’Abbé D, Bouchard P, Shen SH. PILRalpha, a novel immunoreceptor tyrosine-based inhibitory motif-bearing protein, recruits SHP-1 upon tyrosine phosphorylation and is paired with the truncated counterpart PILRbeta. J. Biol. Chem. 2000;275:4467–74. doi: 10.1074/jbc.275.6.4467.
- 34. Logue MW, et al. Search for age-related macular degeneration risk variants in Alzheimer disease genes and pathways. Neurobiol. Aging. 2014;35:1510.e7–18. doi: 10.1016/j.neurobiolaging.2013.12.007.
- 35. Narahara M, et al. Large-scale East-Asian eQTL mapping reveals novel candidate genes for LD mapping and the genomic landscape of transcriptional effects of sequence variants. PLoS One. 2014;9:e100924. doi: 10.1371/journal.pone.0100924.
- 36. Maller JB, et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 2012;44:1294–301. doi: 10.1038/ng.2435.
- 37. Grassmann F, Fritsche LG, Keilhauer CN, Heid IM, Weber BHF. Modelling the genetic risk in age-related macular degeneration. PLoS One. 2012;7:e37979. doi: 10.1371/journal.pone.0037979.
- 38. Barnum SR. Complement Biosynthesis in the Central Nervous System. Crit. Rev. Oral Biol. Med. 1995;6:132–146. doi: 10.1177/10454411950060020301.
- 39. Nguyen P, et al. Liver lipid metabolism. J. Anim. Physiol. Anim. Nutr. (Berl). 2008;92:272–83. doi: 10.1111/j.1439-0396.2007.00752.x.
- 40. Spencer KL, et al. Deletion of CFHR3 and CFHR1 genes in age-related macular degeneration. Hum. Mol. Genet. 2008;17:971–7. doi: 10.1093/hmg/ddm369.
- 41. Pouw RB, et al. Complement Factor H-Related Protein 3 Serum Levels Are Low Compared to Factor H and Mainly Determined by Gene Copy Number Variation in CFHR3. PLoS One. 2016;11:e0152164. doi: 10.1371/journal.pone.0152164.
- 42. Schäfer N, et al. Complement Regulator FHR-3 Is Elevated either Locally or Systemically in a Selection of Autoimmune Diseases. Front. Immunol. 2016;7.
- 43. Zipfel PF, et al. Factor H family proteins: on complement, microbes and human diseases. Biochem. Soc. Trans. 2002;30:971–978. doi: 10.1042/bst0300971.
- 44. Zhang P, et al. A novel, multiplexed targeted mass spectrometry assay for quantification of complement factor H (CFH) variants and CFH-related proteins 1–5 in human plasma. Proteomics. 2017;17:1600237. doi: 10.1002/pmic.201600237.
- 45. Hughes AE, et al. Sequence and Expression of Complement Factor H Gene Cluster Variants and Their Roles in Age-Related Macular Degeneration Risk. Investig. Ophthalmology Vis. Sci. 2016;57:2763. doi: 10.1167/iovs.15-18744.
- 46. Mabuchi H, Nohara A, Inazu A. Cholesteryl Ester Transfer Protein (CETP) Deficiency and CETP Inhibitors. Mol. Cells. 2014;37:777–784. doi: 10.14348/molcells.2014.0265.
- 47. Nong Z, et al. Hepatic lipase expression in macrophages contributes to atherosclerosis in apoE-deficient and LCAT-transgenic mice. J. Clin. Invest. 2003;112:367–378. doi: 10.1172/JCI200316484.
- 48. Burgess S, Davey Smith G. Mendelian Randomization Implicates High-Density Lipoprotein Cholesterol–Associated Mechanisms in Etiology of Age-Related Macular Degeneration. Ophthalmology. 2017;124:1165–1174.
- 49. van Acker BaC, et al. High HDL cholesterol does not protect against coronary artery disease when associated with combined cholesteryl ester transfer protein and hepatic lipase gene variants. Atherosclerosis. 2008;200:161–7. doi: 10.1016/j.atherosclerosis.2007.11.019.
- 50. Grassmann F, et al. Genetic pleiotropy between age-related macular degeneration (AMD) and sixteen complex diseases and traits. Genome Med. 2017;9:29.
- 51. Burgess S, Davey Smith G. Mendelian Randomization Implicates High-Density Lipoprotein Cholesterol-Associated Mechanisms in Etiology of Age-Related Macular Degeneration. Ophthalmology. 2017;124:1165–1174. doi: 10.1016/j.ophtha.2017.03.042.
- 52. Fan Q, et al. HDL-cholesterol levels and risk of age-related macular degeneration: a multiethnic genetic study using Mendelian randomization. Int. J. Epidemiol. 2017;46:1891–1902. doi: 10.1093/ije/dyx189.
- 53. Adams MKM, et al. 20/20–Alcohol and age-related macular degeneration: the Melbourne Collaborative Cohort Study. Am. J. Epidemiol. 2012;176:289–98. doi: 10.1093/aje/kws004.
- 54. Clarke T-K, et al. Genome-wide association study of alcohol consumption and genetic overlap with other health-related traits in UK Biobank (N = 112 117). Mol. Psychiatry. 2017;22:1376–1384. doi: 10.1038/mp.2017.153.
- 55. Gentleman RC, et al. BioMart – biological queries made easy. Genome Biol. 2004;5:R80. doi: 10.1186/gb-2004-5-10-r80.
- 56. Abecasis GR, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65. doi: 10.1038/nature11632.
- 57. Zheng X, et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics. 2012;28:3326–8. doi: 10.1093/bioinformatics/bts606.
- 58. R Core Team. R: A language and environment for statistical computing. 2015.
- 59. Wigginton JE, et al. A note on exact tests of Hardy-Weinberg equilibrium. Am. J. Hum. Genet. 2005;76:887–93. doi: 10.1086/429864.
- 60. Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 61. Williams AL, et al. Phasing of many thousands of genotyped samples. Am. J. Hum. Genet. 2012;91:238–51. doi: 10.1016/j.ajhg.2012.06.013.
- 62. Delaneau O, Marchini J, Zagury J-F. A linear complexity phasing method for thousands of genomes. Nat. Methods. 2011;9:179–181. doi: 10.1038/nmeth.1785.
- 63. Howie B, Marchini J, Stephens M. Genotype imputation with thousands of genomes. G3 (Bethesda). 2011;1:457–70. doi: 10.1534/g3.111.001198.
- 64. Zheng H-F, et al. Performance of Genotype Imputation for Low Frequency and Rare Variants from the 1000 Genomes. PLoS One. 2015;10:e0116487. doi: 10.1371/journal.pone.0116487.
- 65. Tryka KA, et al. NCBI’s Database of Genotypes and Phenotypes: dbGaP. Nucleic Acids Res. 2014;42:D975–9. doi: 10.1093/nar/gkt1211.
- 66. Barrett T, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013;41:D991–5. doi: 10.1093/nar/gks1193.
- 67. Rosenbloom KR, et al. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res. 2015;43:D670–81. doi: 10.1093/nar/gku1177.
- 68. Arloth J, Bader DM, Röh S, Altmann A. Re-Annotator: Annotation pipeline for microarray probe sequences. PLoS One. 2015;10:e0139516. doi: 10.1371/journal.pone.0139516.
- 69. Ramasamy A, et al. Resolving the polymorphism-in-probe problem is critical for correct interpretation of expression QTL studies. Nucleic Acids Res. 2013;41:e88. doi: 10.1093/nar/gkt069.
- 70. Hastie T, Tibshirani R, Sherlock G. Imputing missing data for gene expression arrays. Tech. Report, Div. Biostat. Stanford Univ. 1999:1–9.
- 71. Hastie T, Tibshirani R, Narasimhan B, Chu G. impute: Imputation for microarray data. 2016.
- 72. Bolstad BM. preprocessCore: A collection of pre-processing functions. 2016.
- 73. Bolstad BM, Irizarry R, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics. 2003;19:185–193. doi: 10.1093/bioinformatics/19.2.185.
- 74. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–27. doi: 10.1093/biostatistics/kxj037.
- 75. O’Leary NA, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2016;44:D733–45. doi: 10.1093/nar/gkv1189.
- 76. Shapiro SS, Wilk MB. An analysis of variance test for normality (complete samples). Biometrika. 1965;52:591–611. doi: 10.1093/biomet/52.3-4.591.
- 77. Dunning M, Lynch A, Eldridge M. illuminaHumanv2.db: Illumina HumanWG6v2 annotation data (chip illuminaHumanv2). 2015.
- 78. Shabalin AA. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28:1353–1358. doi: 10.1093/bioinformatics/bts163.
- 79. Viechtbauer W. Conducting Meta-Analyses in R with the metafor Package. J. Stat. Softw. 2010;36:1–48. doi: 10.18637/jss.v036.i03.