Proceedings of the National Academy of Sciences of the United States of America. 2017 Feb 13;114(9):2301–2306. doi: 10.1073/pnas.1621192114

Genetic regulatory signatures underlying islet gene expression and type 2 diabetes

Arushi Varshney a,1, Laura J Scott b,1, Ryan P Welch b,1, Michael R Erdos c,1, Peter S Chines c, Narisu Narisu c, Ricardo D’O Albanus d, Peter Orchard d, Brooke N Wolford d, Romy Kursawe e, Swarooparani Vadlamudi f, Maren E Cannon f, John P Didion c, John Hensley d, Anthony Kirilusha c; NISC Comparative Sequencing Program g, Lori L Bonnycastle c, D Leland Taylor c,h, Richard Watanabe i,j, Karen L Mohlke f, Michael Boehnke b,1, Francis S Collins c,1,3, Stephen C J Parker a,d,1,3, Michael L Stitzel e,1
PMCID: PMC5338551  PMID: 28193859

Significance

The majority of genetic variants associated with type 2 diabetes (T2D) are located outside of genes in noncoding regions that may regulate gene expression in disease-relevant tissues, like pancreatic islets. Here, we present the largest integrated analysis to date of high-resolution, high-throughput human islet molecular profiling data to characterize the genome (DNA), epigenome (DNA packaging), and transcriptome (gene expression). We find that T2D genetic variants are enriched in regions of the genome where the transcription factor Regulatory Factor X (RFX) is predicted to bind in an islet-specific manner. Genetic variants that increase T2D risk are predicted to disrupt RFX binding, providing a molecular mechanism that could explain how the genome influences the epigenome, modulating gene expression and ultimately T2D risk.

Keywords: chromatin, diabetes, eQTL, epigenome, footprint

Abstract

Genome-wide association studies (GWAS) have identified >100 independent SNPs that modulate the risk of type 2 diabetes (T2D) and related traits. However, the pathogenic mechanisms of most of these SNPs remain elusive. Here, we examined genomic, epigenomic, and transcriptomic profiles in human pancreatic islets to understand the links between genetic variation, chromatin landscape, and gene expression in the context of T2D. We first integrated genome and transcriptome variation across 112 islet samples to produce dense cis-expression quantitative trait loci (cis-eQTL) maps. Additional integration with chromatin-state maps for islets and other diverse tissue types revealed that cis-eQTLs for islet-specific genes are specifically and significantly enriched in islet stretch enhancers. High-resolution chromatin accessibility profiling using assay for transposase-accessible chromatin sequencing (ATAC-seq) in two islet samples enabled us to identify specific transcription factor (TF) footprints embedded in active regulatory elements, which are highly enriched for islet cis-eQTLs. Aggregate allelic bias signatures in TF footprints enabled us to reconstruct TF binding affinities de novo from genetic data, supporting the high quality of the TF footprint predictions. Interestingly, we found that T2D GWAS loci were strikingly and specifically enriched in islet Regulatory Factor X (RFX) footprints. Remarkably, within and across independent loci, T2D risk alleles that overlap with RFX footprints uniformly disrupt the RFX motifs at high-information content positions. Together, these results suggest that common regulatory variation has shaped islet TF footprints and the transcriptome and that a confluent RFX regulatory grammar plays a significant role in the genetic component of T2D predisposition.


Type 2 diabetes (T2D) is a complex disease characterized by pancreatic islet dysfunction and insulin resistance in peripheral tissues; >90% of T2D SNPs identified through genome-wide association studies (GWASs) reside in non-protein-coding regions and are likely to perturb gene expression rather than alter protein function (1). Consistent with this, we and others recently showed that T2D GWAS SNPs are significantly enriched in enhancer elements that are specific to pancreatic islets (2–4). The critical next steps to translate these islet enhancer T2D genetic associations into mechanistic biological knowledge are (i) identifying the putative functional SNP(s) among all of those in tight linkage disequilibrium (LD), (ii) localizing their target gene(s), and (iii) understanding the direction of effect (increased or decreased target gene expression) conferred by the risk allele. Two recent studies analyzed genome variation and gene expression variation across human islet samples to identify cis-expression quantitative trait loci (cis-eQTLs) that linked T2D GWAS SNPs to target genes (5, 6). However, the transcription factors (TFs) that mediate islet cis-eQTLs remain poorly understood; they represent important links to upstream pathways that will help untangle the regulatory complexity of T2D.

Results

Integrated Analysis of Islet Transcriptome and Epigenome Data.

To build links between SNP effects on regulatory element use and gene expression in islets, we performed strand-specific mRNA sequencing (mRNA-seq) of 31 pancreatic islet tissue samples (Table S1) to an average depth of 100 million paired-end reads. In parallel, we analyzed unstranded mRNA-seq data for 81 islet samples from a previous study (5). We subjected both datasets to the same quality control and processing. We additionally completed dense genotyping of 31 islet samples and downloaded genotypes for 81 previously described islet samples (5). Phasing and imputation yielded a final set of 6,060,203 autosomal SNPs present in both datasets with an overall minor allele count >10. To identify SNPs within 1 Mb of a gene's most upstream transcription start site (TSS) that affect its expression, we performed separate cis-eQTL analyses for the two sets of islet samples and combined the results via meta-analysis. We identified 3,964 unique autosomal cis-eQTL lead SNPs for 3,993 genes at a 5% false discovery rate (FDR).
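The text does not specify the meta-analysis model. As a minimal sketch, assuming a sample-size-weighted Z-score (Stouffer) combination of the two cohorts followed by Benjamini–Hochberg FDR control, the combination step could look like this; the function names and example Z-scores are hypothetical, and the 31/81 sample sizes are used only as weights.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def stouffer_weighted_z(z_scores, sample_sizes):
    """Combine signed per-cohort Z-scores, weighting each cohort by sqrt(n)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided P value for a standard-normal Z-score."""
    return 2.0 * _STD_NORMAL.cdf(-abs(z))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


if __name__ == "__main__":
    # Hypothetical Z-scores for three SNP-gene pairs in the two cohorts.
    cohort_z = [(3.1, 4.2), (0.5, -0.2), (2.0, 2.5)]
    combined = [stouffer_weighted_z(z, [31, 81]) for z in cohort_z]
    p_vals = [two_sided_p(z) for z in combined]
    for z, p, q in zip(combined, p_vals, benjamini_hochberg(p_vals)):
        print(f"Z={z:.2f}  P={p:.2e}  q={q:.2e}  FDR<5%: {q < 0.05}")
```

Weighting by the square root of sample size lets the larger cohort contribute more while keeping the combined statistic standard normal under the null.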

Table S1.

Characteristics of islets sequenced in this study

UNOS no. Pool Tag Isolation no. Distributor no. Sex Age, y Race RIN Weight, kg Height, in. BMI Cause of death CMV ABO HLA.A HLA.B HLA.C HLA.DR IEQ Viability Purity Tube label Sample ID
TH5160 5 4 H166 OD22246 M 52 C 8.6 96 74 27.2 CVA/SAH + O+ 60,000 87 85 J4 TH5160
TJ1029 1 9 OD22967 F 52 C 8.2 63 64 23.8 ICH + A 10,000 E1 TJ1029
UBT342 3 6 H202 NDRI M 52 C 7.3 88 70 28.1 MVA O+ 10,000 88 85 G6 UBT342
UBZ379 3 1 HR225 JDRF M 57 C 7.4 93 71 28 CVA + O 2, — 7,51 4,11 20,000 96 90 G1 UBZ379
UFK467 2 9 HP029 ICR M 16 C 8.5 83.9 70 26.5 BHT A+ 3,24 49,51 7, N/A 13,15 20,000 93 70 F5 UFK467
UG3360 4 10 H225 ICR F 37 C 7.4 24 + 20,000 97 80 I6 UG3360
UGA076 2 5 HP1775 ICR M 60 C 8.4 90.9 71 27.9 CVH A1 26,68 35,44 4,15 20,000 90 90 F1 UGA076
UGQ298 5 8 HU644 ICR M 56 C 8 100 77 26.1 15,000 80 90 J8 UGQ298
UJG192 4 8 HU656 ICR F 52 C 8.6 77.3 62 31.1 ICH + O 20,000 94 75 I4 UJG192
U.K.2184 4 2 VAHP58 ICR M 44 7.4 24.5 18,000 80 80 H6 U.K.2184
ULI102 3 10 HR2344 JDRF M 54 C 8 98 66 30 ICH A+ 20,000 96 90 H2 ULI102
ULM047 4 7 CITH016 ICR M 56 C 8.3 72.72 70 23 CVA A 20,000 91 78 I3 ULM047
VA2103 2 2 H561 ICR F 42 C 8.3 84.8 70 26.8 CVA B 2,31 27,60 4,13 12,000 99 60 E6 VA2103
VBL403 1 1 HR244 JDRF F 51 C 7.5 70 67 24.2 ICH O+ 20,000 81 70 D1 VBL403
VBS118 5 1 H565 ICR F 42 C 6.5 107.3 67 37 CVD O 20,000 99 80 J1 VBS118
VD3037 4 11 JDRF M 47 C 6.9 132 76 35.4 BHT + B+ 20,000 96 80 I7 VD3037
VDH041 4 4 HR238 JDRF F 59 C 7.3 63.8 64 24.1 ICH + B+ 20,000 98 90 H8 VDH041
VDN110 4 1 CITH024 ICR F 28 C 7.9 92 67 31.8 + O 20,000 95 88 H5 VDN110
VDZ067 5 11 OD29517 NDRI F 26 C 6.6 65 67 22.4 ICH/brain tumor + B+ 2,28 35,50 7,11 20,000 85 90 K3 VDZ067
VEM410 4 9 HU687 ICR M 36 C 8.5 113.6 73 33 BHT/suicide A 20,000 90 70 I5 VEM410
VEY129 5 9 H129 ICR M 40 C 7.5 69 69 22.5 Suicide: drug intoxication B 23,68 44,51 13, 20,000 99 90 K1 VEY129
VFM167 2 12 HP75 ICR F 46 C 7.4 67 62 27.2 CVD A 3, 7,35 CW4,7 4,17 20,000 80 95 F8 VFM167
VLO352 2 6 OD31254 NDRI F 61 C 7.9 57 62 22.2 SAH AB 3,24 27,35 11,16 10,000 80 95 F2 VLO352
WHY272 3 4 H338 NDRI M 32 C 6.5 69.5 66 24.7 GSWH/HT O 20,000 99 93 G4 WHY272
XEC010 1 3 HP1938 NDRI F 47 C 8.5 68.2 68 22.9 CVA/ICH + O− 40,000 D3 XEC010
XFS109 1 5 H363 OD35122 F 35 C 7.5 79.5 64 30 BHT A 20,000 97 95 D5 XFS109
XJY097 3 9 HP1963 OD36229 F 57 C 6.8 63 61 26.2 CVA + A 50,000 90 80 H1 XJY097
YHN107 4 3 HP2003 OD38345 M 51 C 7.7 90.9 70 28.8 HT A 50,000 91 90 H7 YHN107
ZEN060 1 8 NDRI OD40157 M 55 C 7 81 69 26.4 CVA O+ 25,000 D8 ZEN060
ZFX185 2 10 NDRI OD40479 F 21 C 7 59 67 20.4 BHT A 17,000 F6 ZFX185
ZHE289 3 8 NDRI OD40843 M 40 C 6.9 115 73 36.4 BHT + O− 40,000 95 90 G8 ZHE289

Data not reported for a given sample are marked with —. BHT, blunt head trauma; BMI, body mass index; C, Caucasian; CMV, cytomegalovirus serostatus; CVA, cerebrovascular accident; CVD, cardiovascular disease; CVH, cerebrovascular hemorrhage; F, female; GSWH, gunshot wound to the head; HT, head trauma; ICH, intracerebral hemorrhage; IEQ, islet equivalents; M, male; MVA, motor vehicle accident; N/A, not applicable; RIN, RNA integrity number; SAH, subarachnoid hemorrhage; UNOS, United Network for Organ Sharing.

Next, we integrated chromatin immunoprecipitation followed by sequencing (ChIP-seq) data for five histone modifications across islets (2, 7) and 30 diverse tissues with publicly available datasets (Table S2) (8–10) using ChromHMM (9). This analysis produced 13 unique and recurrent chromatin states (Fig. 1A and Fig. S1), including promoter, enhancer, transcribed, and repressed regions. To identify specific regulatory element sites within these chromatin states, we profiled open chromatin in two islets using the assay for transposase-accessible chromatin sequencing (ATAC-seq) (11) (Fig. 1A and Table S1). Our high-depth ATAC-seq data (>1.4 billion reads for both islets) allowed us to identify TF DNA footprints using the CENTIPEDE algorithm (12). We assigned a regulatory state and TF footprint status to every islet cis-eQTL based on the annotation of SNPs with r2 > 0.8 with the lead SNP (Fig. 1B). We used iterative conditional analyses (7) to identify 28 T2D and related quantitative trait GWAS SNPs that could be islet cis-eQTL signals (Fig. 1C and Datasets S1 and S2). Given the modest cis-eQTL signals at most of these loci, conditional analysis in larger islet sample sets will likely change this list.
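The LD-proxy annotation rule (r2 > 0.8 with the lead SNP) can be sketched as below. This is an illustration under assumptions: a single chromosome, half-open intervals, precomputed r2 values, and a hypothetical union-of-hits rule, because the text does not say how conflicting annotations among proxies were resolved.

```python
R2_THRESHOLD = 0.8  # proxies must have r^2 > 0.8 with the lead SNP


def overlapping_labels(position, intervals):
    """Labels of half-open (start, end, label) intervals containing position."""
    return {label for start, end, label in intervals if start <= position < end}


def annotate_lead_snp(lead_pos, proxy_r2, chromatin_states, footprints):
    """
    Annotate one lead cis-eQTL SNP through itself and its LD proxies.

    lead_pos: position of the lead SNP (one chromosome assumed).
    proxy_r2: {position: r^2 with the lead SNP} for nearby SNPs.
    chromatin_states / footprints: lists of (start, end, label).
    Returns (chromatin states, footprint motifs) hit by the lead SNP or any
    proxy; this hypothetical rule simply reports the union of hits.
    """
    positions = {lead_pos}
    positions |= {pos for pos, r2 in proxy_r2.items() if r2 > R2_THRESHOLD}
    states, motifs = set(), set()
    for pos in positions:
        states |= overlapping_labels(pos, chromatin_states)
        motifs |= overlapping_labels(pos, footprints)
    return states, motifs


if __name__ == "__main__":
    states = [(0, 120, "Active TSS"), (120, 200, "Active enhancer"), (200, 400, "Quiescent")]
    footprints = [(145, 160, "RFX"), (290, 310, "CTCF")]
    # The SNP at 300 has r^2 = 0.5, so its CTCF footprint is not counted.
    print(annotate_lead_snp(100, {150: 0.9, 300: 0.5}, states, footprints))
```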

Table S2.

NSC and RSC scores (normalized and relative strand cross-correlation coefficients) for H3K27ac and H3K4me3 datasets used in this study

Cell type H3K27ac (NSC) H3K4me3 (NSC) H3K27ac (RSC) H3K4me3 (RSC) QC status used to learn ChromHMM model
Islets 2 2.3 2.21 1.33 Passed
GM12878 1.43 1.24 1.28 0.85 Passed
H1 1.1 1.59 1.13 1.1 Passed
HepG2 1.81 2.42 1.35 1.3 Passed
HMEC 1.56 1.97 1.27 1.19 Passed
HSMM 1.73 2.7 1.34 1.35 Passed
Huvec 1.76 2 1.49 1.15 Passed
K562 1.67 1.3 1.54 1.07 Passed
NHEK 1.73 1.64 1.3 1.07 Passed
NHLF 1.74 1.92 1.42 1.22 Passed
Adipose 1.15 1.6 0.98 1.16 Passed
Anterior caudate 1.2 1.14 1.18 1.09 Passed
CD34-PB 1.5 1.69 1.47 1.07 Passed
Colonic mucosa 1.28 1.72 1.11 1.14 Passed
ES-HUES6 1.1 1.78 1.13 1.3 Passed
Liver 1.38 1.3 1.45 0.96 Passed
Mid-frontal lobe 1.14 1.1 0.88 1.01 Passed
Rectal mucosa 1.59 1.36 1.52 1.11 Passed
Rectal smooth muscle 1.31 1.25 1.28 1.1 Passed
Skeletal muscle 1.37 1.24 1.29 1.07 Passed
Stomach smooth muscle 1.11 1.51 0.83 1.16 Passed
hASC-t1 2.18 2.36 1.4 1.24 Passed
hASC-t2 1.94 2.36 1.36 1.23 Passed
hASC-t3 1.84 2.42 1.41 1.27 Passed
hASC-t4 1.7 2.44 1.31 1.26 Passed
Cingulate gyrus 1.1 1.12 0.61 1.06 Failed
Duodenum mucosa 1.13 1.24 0.6 1.02 Failed
ES-HUES64 1.04 1.88 0.74 1.25 Failed
Hippocampus middle 1.29 1.07 1.32 0.89 Failed
Inferior temporal lobe 1.13 1.14 0.45 1.04 Failed
Substantia nigra 1.07 1.07 0.8 0.86 Failed
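Table S2 does not describe how NSC and RSC were computed. Assuming the ENCODE strand cross-correlation definitions (as implemented in tools such as phantompeakqualtools), both scores could be derived from a cross-correlation profile as in this sketch; the helper names and toy values are hypothetical, and no pass/fail thresholds are applied because the table does not state them.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strand_cross_correlation(plus, minus, shift):
    """Correlate plus-strand read-start counts with minus-strand counts
    shifted `shift` bp downstream (1-bp bins on one chromosome)."""
    n = len(plus) - shift
    return pearson(plus[:n], minus[shift:shift + n])


def nsc_rsc(cc_by_shift, fragment_shift, read_length):
    """
    NSC = CC(fragment length) / min(CC)
    RSC = (CC(fragment length) - min(CC)) / (CC(read length) - min(CC))
    cc_by_shift: {shift: cross-correlation}; must contain both shifts.
    """
    cc_min = min(cc_by_shift.values())
    cc_fragment = cc_by_shift[fragment_shift]
    cc_phantom = cc_by_shift[read_length]  # "phantom" peak at the read length
    nsc = cc_fragment / cc_min
    rsc = (cc_fragment - cc_min) / (cc_phantom - cc_min)
    return nsc, rsc


if __name__ == "__main__":
    # Toy coverage: minus-strand starts sit 2 bp downstream of plus-strand ones.
    plus = [0, 1, 0, 0, 0, 2, 0, 0, 0, 0]
    minus = [0, 0, 0, 1, 0, 0, 0, 2, 0, 0]
    print(strand_cross_correlation(plus, minus, 2))  # ~1.0, perfectly aligned
    # Hypothetical cross-correlation values at four shifts.
    print(nsc_rsc({0: 0.2, 36: 0.3, 150: 0.4, 500: 0.1}, fragment_shift=150, read_length=36))
```

Both scores compare the fragment-length peak against the background minimum; RSC additionally normalizes by the read-length "phantom" peak, so it drops when mappability artifacts dominate the true signal.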

Fig. 1.

Integrated genomic, epigenomic, and transcriptomic analyses of human pancreatic islets. (A) Overview of the molecular profiling data types used in this study. Integrative molecular profiling (open chromatin, ATAC-seq; chromatin states; RNA-seq) highlights islet-specific signatures at the KCNK17 locus. (B) Strength of association (y axis) by chromosomal location (x axis) for significant islet cis-eQTLs, colored by chromatin-state annotation (as in A); diamonds indicate SNPs overlapping ATAC-seq footprints. An interactive version of this plot is available at theparkerlab.org/tools/isleteqtl/. (C) Strength of islet cis-eQTL association by chromosomal position for T2D and related-trait GWAS SNPs after conditional analysis, which identifies variants likely independent of stronger cis-eQTL signals for the same gene; annotated as in B. The plot includes all GWAS SNP–gene pairs with FDR < 0.05 in the original cis-eQTL analysis. The dotted red line marks the P value threshold for FDR < 0.05 in the conditional analysis. (D) Islet cis-eQTL associated with KCNK17 expression, highlighted for comparison with the molecular profiling tracks in A. (E) Normalized KCNK17 expression in islet samples vs. cis-eQTL risk allele dosage. (F) Functional validation of the KCNK17 cis-eQTL at its promoter region. The haplotype carrying alleles associated with T2D risk and increased KCNK17 expression (rs10947804-C, rs12663159-A, rs146060240-G, and rs34247110-A) shows higher transcriptional activity than the haplotype carrying nonrisk alleles. The cloned region is indicated at the top of A. Relative luciferase activity is given as mean ± SD of four to five independent clones per haplotype, normalized to empty vector. Significance was evaluated using a two-sided t test.

Fig. S1.

Thirteen-chromatin-state model built with ChromHMM (9) from histone modification ChIP-seq data for 33 cell types (Table S2). (A) Overlap enrichment of each of our 13 chromatin states with the states reported by Roadmap Epigenomics (8), shown for 18 cell types. (B) Renaming of the 13 generated states (Original State) according to the Roadmap Epigenomics overlap enrichments in A (New State). (C) State numbers, histone mark emission probabilities, state names, and percentage genomic coverage of each chromatin state in human islets.

As an example, T2D GWAS index SNP rs1535500 occurs at the KCNK16 locus, and its risk allele results in an alanine-to-glutamate substitution at residue 277. This change was implicated in increasing KCNK16 basal channel activity and cell surface localization when tested in a mouse model (13). Our analysis revealed that rs1535500 is not associated with KCNK16 expression (Fig. S2). Interestingly, the rs1535500 risk allele is associated with increased expression of the neighboring potassium channel gene KCNK17 (Fig. 1 D and E). rs1535500 is in high LD (r2 > 0.95) with four SNPs (rs10947804, rs12663159, rs146060240, and rs34247110) that are located in an islet promoter chromatin state, and all but rs34247110 are located in an ATAC-seq peak (Fig. 1A). Motivated by this overlap with islet regulatory annotations, we cloned two versions of the 473-bp DNA sequence surrounding these SNPs: one containing the T2D risk alleles for all four SNPs (risk haplotype) and the other containing the nonrisk alleles (nonrisk haplotype). We performed luciferase reporter assays in the mouse insulinoma (MIN6) beta cell line to test the transcriptional activity of these two clones. Both clones exhibited promoter activity, but the T2D risk haplotype showed significantly greater (P = 0.03) transcriptional activity than the nonrisk haplotype (Fig. 1F). This result suggests that one or more of these T2D risk variants cause increased regulatory activity in islets. These findings highlight a complex functional genetic architecture in which a single haplotype carries both regulatory activity linked to one gene (KCNK17) and coding variation in another (KCNK16). Together, these results illustrate how integrated analyses help to identify potential causal SNPs associated with islet expression and T2D risk. To enable easy, in-depth exploration of our results, we created an interactive islet cis-eQTL and chromatin-state browser (theparkerlab.org/tools/isleteqtl/).

Fig. S2.

(A) LocusZoom plot showing that a T2D GWAS SNP (rs1535500/chr6:39284050, hg19, purple; other variants in LD colored according to r2) is not associated with KCNK16 expression in islets. (B) Plot for normalized KCNK16 expression and rs1535500 risk allele dosage from mRNA-seq and genotyping data in islet samples.

Common and Islet-Specific Gene cis-eQTLs Are Enriched in Different Chromatin States.

To understand the regulatory architecture of islet cis-eQTLs, we measured their co-occurrence with different classes of chromatin states across diverse tissues, including stretch enhancers, defined as enhancer chromatin states ≥3 kb long. These segments tend to mark cell identity regions and have been shown to harbor tissue-specific GWAS SNPs (2, 14). We calculated genome-wide enrichment for cis-eQTL overlaps with these features while controlling for minor allele frequency, distance to TSS, and the number of SNPs in LD (15). cis-eQTLs were enriched in active chromatin states, such as promoter and genic enhancer states in islets, whereas inactive states, such as polycomb repressed, were depleted for such overlaps across multiple tissues (Fig. S3). Reasoning that this common enrichment pattern across diverse tissues may be largely driven by cis-eQTLs of commonly expressed genes, we sought to classify cis-eQTLs by the islet expression specificity of their associated genes. To measure gene expression specificity in islets, we analyzed RNA-seq data from 16 additional tissues from the Illumina Human Body Map 2.0 project. We used an information theory approach to define the islet expression specificity index (iESI) (Fig. S4) (7). iESI values near zero represent lowly and/or ubiquitously expressed genes, whereas values near one represent genes that are highly and specifically expressed in islets. We divided genes into quintiles of ascending iESI (Fig. S4), assigned each cis-eQTL to the iESI quintile of its gene, and measured the enrichment of each set in chromatin annotations. Interestingly, although cis-eQTLs in all iESI quintiles were similarly enriched in islet promoter states, cis-eQTL enrichment in active and stretch enhancer states increased with iESI (Fig. S5). For example, the cis-eQTL for KCNA6 (Fig. S6A), a gene expressed in islets with high specificity (iESI = 0.78), overlapped islet-specific enhancer states (Fig. S6B); this cis-eQTL does not overlap a known T2D GWAS locus. When we restricted the enrichment analysis to ATAC-seq peaks in islet stretch enhancer states, the trend of increasing enrichment across iESI quintiles was stronger (Fig. S5). These results indicate a strong link between active regulatory chromatin architecture and the genetic control of cell-specific gene expression.
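The exact iESI formula comes from reference (7) and is not reproduced here. The sketch below is a hypothetical stand-in that uses Shannon entropy to capture the same intuition: a score near one when expression is confined to islets, and near zero when it is uniform across tissues or concentrated elsewhere. Unlike the iESI as described, it does not down-weight genes with low but nonzero absolute expression. The quintile helper mirrors the binning step; all names are illustrative.

```python
import math


def entropy_specificity(expression, target):
    """
    Hypothetical entropy-based specificity of `target`, in [0, 1].

    expression: {tissue: non-negative expression, e.g. FPKM}; needs at
    least two tissues. Scores 1 when the gene is expressed only in
    `target`, and 0 when expression is uniform or absent from `target`.
    """
    total = sum(expression.values())
    if total == 0:
        return 0.0  # unexpressed genes get the minimum score
    probs = [v / total for v in expression.values() if v > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    # Concentration: 1 for single-tissue expression, 0 for uniform expression.
    concentration = 1.0 - entropy / math.log2(len(expression))
    # Weight by the target's share so genes specific to another tissue score low.
    return concentration * expression[target] / total


def quintile_bins(scores):
    """Assign genes to quintiles 1..5 by ascending score."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {gene: 1 + (5 * i) // n for i, gene in enumerate(ranked)}


if __name__ == "__main__":
    tissues = ["islet", "liver", "muscle", "brain"]
    genes = {
        "islet_only": [50, 0, 0, 0],
        "housekeeping": [20, 20, 20, 20],
        "liver_only": [0, 40, 0, 0],
        "islet_enriched": [30, 5, 5, 5],
    }
    scores = {g: entropy_specificity(dict(zip(tissues, v)), "islet") for g, v in genes.items()}
    print(scores)
    print(quintile_bins(scores))
```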

Fig. S3.

Fold enrichment of islet eQTLs in chromatin states across cells/tissues.

Fig. S4.

iESI. (A) Heat map showing mean FPKM for genes expressed in different tissues when binned by iESI quintiles. (B) Scatterplots showing FPKM for genes expressed in different tissues vs. the iESI. (C) Distribution of iESI by quintile of expression.

Fig. S5.

Enrichment of islet cis-eQTLs binned into quintiles by target gene iESI in islet active TSS and stretch enhancer chromatin states (red) and consensus islet intersect ATAC-seq peaks (present in both islet samples) in these states (blue). *P < 0.05 from GREGOR analysis.

Fig. S6.

Common and islet-specific gene eQTLs are enriched in different chromatin states. (A) LocusZoom plot of an islet cis-eQTL in the KCNA6 locus. (B) The cis-eQTL for KCNA6, which is in the top quintile of the iESI (iESI 5), overlaps an islet-specific enhancer state. (C) Active enhancer clustering (y axis) across cell types (x axis) reveals cell-specific enhancer regions. Cluster 13 is islet-specific. (D) Degree of overlap of enhancer clusters with stretch enhancers from four cell types. Islet stretch enhancers show the strongest overlap with islet-specific enhancer cluster 13, whereas GM12878 stretch enhancers show the strongest overlap with GM12878-specific enhancer cluster 1. The Jaccard statistic was normalized per column, so that values range from zero (no overlap) to one (maximum observed overlap). (E) Enrichment of islet eQTLs across enhancer clusters reveals that the full set of eQTLs (column 1) is enriched across multiple enhancer clusters, whereas eQTLs for islet-specific genes (iESI quintile 5; column 5) are enriched in the islet-specific enhancer cluster 13. Gray bars indicate enrichments that are not significant after Bonferroni correction.

To further identify and dissect regulatory regions critical for islet-specific gene expression, we sought to distinguish between shared and tissue-specific enhancer chromatin states. We performed k-means clustering of active enhancer chromatin states across 31 cells/tissues. This method segregated enhancer regions by their activity across diverse tissues; for example, cluster 13 is islet-specific, whereas cluster 3 is liver-specific (Fig. S6C). We compared these enhancer clusters with stretch enhancer annotations across tissues and found that tissue-specific clusters, such as the islet-specific cluster 13, indeed displayed high enrichment for islet stretch enhancers (Fig. S6D). Likewise, tissue-specific enhancer clusters for other tissues were enriched for the corresponding tissues’ stretch enhancers (Fig. S6D). Next, we asked whether islet cis-eQTLs were enriched in specific enhancer clusters and observed enrichment in multiple clusters (Fig. S6E). We then stratified the cis-eQTLs by iESI quintile and repeated this analysis. Notably, islet cis-eQTLs for genes in iESI quintile 5 showed significant enrichment only in the islet-specific enhancer cluster 13 (P = 1.2 × 10⁻⁸, fold enrichment = 1.91) (Fig. S6E). Together, these results show that islet tissue-specific genetic regulatory architecture is enriched in islet-specific enhancers and stretch enhancers.
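A minimal k-means sketch for this clustering step follows. It assumes a hypothetical encoding of one row per enhancer region and one 0/1 column per tissue (1 = active enhancer state); the text does not give the encoding, distance, or number of clusters (the figures refer to at least 13), so `kmeans` and its parameters are illustrative only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, max_iter=100, seed=0):
    """
    Lloyd's k-means. rows: equal-length numeric vectors, e.g. one row per
    enhancer region and one 0/1 column per tissue.
    Returns (cluster label per row, centroids).
    """
    rng = random.Random(seed)
    centroids = [list(row) for row in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each region joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # leave the centroid in place if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Columns: islet, liver, GM12878 (hypothetical binary activity calls).
    regions = [[1, 0, 0], [1, 0, 0], [1, 0, 1], [0, 1, 0], [0, 1, 0], [1, 1, 1]]
    labels, centroids = kmeans(regions, k=3)
    print(labels)
```

Clusters whose centroid is high in a single tissue column would correspond to tissue-specific enhancer sets such as the islet-specific cluster described above.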

Islet Expression Quantitative Trait Loci Are Enriched in Islet ATAC-Seq Peaks and DNA Footprints.

Chromatin-state maps identify regulatory regions, such as promoters and enhancers, but lack the resolution to pinpoint the specific sites that a TF may bind and regulate. To refine the link between genetic variation, TF binding sites, and gene expression, we used the high-resolution ATAC-seq data to identify putative in vivo TF binding sites with CENTIPEDE as previously described (7, 12). This approach detected high-quality footprints for many TFs, including the ubiquitously expressed CCCTC-binding factor (CTCF) and Regulatory Factor X (RFX) (Fig. 2 A and B). Notably, we detected RFX footprints in islet stretch enhancers near the islet-specific (iESI = 0.94) TF RFX6 (Fig. 2A), suggesting an autoregulatory mechanism that, together with recent studies (3, 16), may indicate that RFX6 is an islet core transcriptional regulatory gene. Comparing ATAC-seq profiles from islets with those from skeletal muscle tissue (7), adipose tissue (17), and a lymphoblastoid cell line (GM12878) (11), we found that islet ATAC-seq peaks occurred preferentially in islet promoter and enhancer chromatin states (Fig. S7). Islet cis-eQTLs were highly enriched in multiple TF footprint motifs but not in nonfootprint motifs (Fig. 2C and Dataset S3). These results suggest a strong link between SNPs at TF binding sites in relevant tissues and gene regulation.
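CENTIPEDE (12) fits a probabilistic model that combines motif matches with the local pattern of ATAC-seq cuts; it is not reimplemented here. The simpler sketch below shows only the aggregate cut profile that underlies density plots such as Fig. 2B, assuming Tn5 insertion positions have already been derived from read ends; the function name and toy data are hypothetical.

```python
from collections import Counter


def aggregate_cut_profile(insertions, motif_sites, flank=100):
    """
    Sum ATAC-seq Tn5 insertion counts at each offset around motif centers.

    insertions: Counter {position: count} of insertion sites (one chromosome).
    motif_sites: list of (center, strand) with strand "+" or "-".
    Returns a list of length 2 * flank + 1; index i is offset i - flank.
    """
    profile = [0] * (2 * flank + 1)
    for center, strand in motif_sites:
        for offset in range(-flank, flank + 1):
            # Flip minus-strand sites so offsets follow motif orientation.
            pos = center + offset if strand == "+" else center - offset
            profile[offset + flank] += insertions.get(pos, 0)
    return profile


if __name__ == "__main__":
    # Toy data: cuts pile up on both sides of the motif while the bound
    # center stays protected, the "footprint" shape seen at bound sites.
    cuts = Counter({97: 4, 98: 6, 100: 1, 102: 6, 103: 4})
    print(aggregate_cut_profile(cuts, [(100, "+"), (100, "-")], flank=3))
    # -> [8, 12, 0, 2, 0, 12, 8]
```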

Fig. 2.

Nucleotide resolution islet ATAC-seq profiling nominates regulatory mechanisms. (A) RFX6 locus with expression (RNA-seq), chromatin states, open chromatin (ATAC-seq), and footprints for CTCF and RFX in islets. (B) Density plots indicating normalized sequence coverage of ATAC-seq from two human islet samples at sites overlapping CTCF (motif = CTCF_known2) and RFX (motif = RFX2_4) motifs. (C) Log2 fold enrichment of islet cis-eQTLs in TF footprint motifs compared with their enrichment in TF nonfootprint motifs. TFs for which footprint and nonfootprint motifs overlap four or more eQTL SNPs are shown. Blue shows significant enrichment in footprints only (Bonferroni corrected P < 0.05). No significant enrichment was observed in any TF nonfootprint motif. (D) Reconstruction of CTCF (motif = CTCF_known2) and RFX (motif = RFX2_4) motifs using ATAC-seq TF footprint allelic bias data. Row 1: original motif PWM. Row 2: PWM genetically reconstructed using the overrepresented alleles (and extent of overrepresentation) for SNPs with significant ATAC-seq allelic bias. Row 3: count of nucleotides in SNPs with significant allelic bias. Row 4: PWM reconstructed using the count of nucleotides for heterozygous SNPs in the TF footprint. Row 5: count of nucleotides in heterozygous SNPs in the TF footprint.

Fig. S7.

Enrichment of islet, muscle, GM12878, and adipose ATAC-seq peaks (columns) in chromatin states across diverse tissues (y axis). Consensus (islet intersection) and individual (islets 1 and 2) islet ATAC-seq peaks show enrichment for active chromatin states in islets, which is more pronounced at TSS-distal (>5 kb from TSS) regions. Muscle (column 4), GM12878 (column 5), and adipose (column 6) ATAC-seq peak calls show similar trends with chromatin states from matched tissues. Note that TSS-distal ATAC-seq peaks from the islet intersect dataset overlap islet active enhancers more than any other chromatin state in islets. Note also that the level of islet enhancer overlap is larger than enhancer overlap in any other tissue.

To detect motif occurrences that could be altered by nonreference alleles, we developed a personalized, phased, SNP-aware genome motif scanning procedure (SI Materials and Methods). This method allowed us to identify motif instances even when multiple nonreference alleles occur within a few base pairs of each other. In both islet samples, islet cis-eQTLs were significantly enriched in the set of TF footprint motifs identified only by this haplotype phase-aware approach, that is, motifs missed even by a single SNP-aware scan (Fig. S8). Given the informative chromatin accessibility allelic analyses in recent studies (18, 19), we next asked whether we could recreate known TF position weight matrices (PWMs) (Fig. 2D, row 1) from the allele-specific bias at heterozygous SNPs within TF footprint motifs. We identified every heterozygous site in a given TF footprint motif, calculated the allelic bias in ATAC-seq signal at these positions, and retained all SNPs with significant bias (Fig. 2D, row 3 and SI Materials and Methods). We then genetically reconstructed a PWM using the degree of allelic bias for the overrepresented alleles (Fig. 2D, row 2). This allelic bias-based PWM closely matched the canonical PWM for the corresponding TF (Fig. 2D, row 1), providing an in vivo verification of the cognate PWM. The difference in PWM score between the two alleles was larger for allelic bias SNPs than for matched 1000 Genomes Project (1000G) SNPs occurring in the same motif (Fig. S9).
To further verify that the allelic bias-based genetically reconstructed PWMs were not simply reflecting the allelic composition of SNPs in the motifs, we constructed PWMs using the allele count for all TF footprint heterozygous SNPs observed at each position (where each observed SNP contributed two alleles) and found that the resulting PWMs had little information and little similarity to the cognate motifs used to scan across the genome (Fig. 2D, rows 4 and 5). Collectively, these results reinforce the potential of ATAC-seq and allelic footprinting analyses to identify relevant and potentially causal TF binding changes in the genetic control of gene expression.
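The allelic-bias PWM reconstruction can be sketched as follows. Assumptions, none taken from the SI: an exact two-sided binomial test against a 50:50 expectation at α = 0.05 defines significant bias, and each significant SNP adds its excess read fraction (fraction of reads carrying the overrepresented allele minus 0.5) to that allele at its motif position. Function names and example counts are hypothetical.

```python
import math

BASES = "ACGT"


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test (sums outcomes no more likely than k).
    Uses floating-point powers, so it suits modest read depths."""
    probs = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for rounding
    return min(1.0, sum(q for q in probs if q <= cutoff))


def reconstruct_pwm(het_sites, motif_length, alpha=0.05):
    """
    het_sites: (motif_offset, allele1, reads1, allele2, reads2) for each
    heterozygous SNP inside a footprint motif, in motif orientation.
    Keeps SNPs with significant allelic imbalance and sums, per motif
    position, the excess read fraction of the overrepresented allele.
    Returns a PWM as a list of {base: frequency}; positions without
    biased SNPs are left uniform.
    """
    weights = [{b: 0.0 for b in BASES} for _ in range(motif_length)]
    for offset, a1, c1, a2, c2 in het_sites:
        n = c1 + c2
        if n == 0 or binomial_two_sided_p(c1, n) >= alpha:
            continue  # no significant allelic bias at this SNP
        winner, top = (a1, c1) if c1 > c2 else (a2, c2)
        weights[offset][winner] += top / n - 0.5  # extent of overrepresentation
    pwm = []
    for column in weights:
        total = sum(column.values())
        pwm.append({b: (w / total if total else 0.25) for b, w in column.items()})
    return pwm


def information_content(column):
    """Information content in bits of one PWM column (uniform background)."""
    return 2.0 + sum(f * math.log2(f) for f in column.values() if f > 0)


if __name__ == "__main__":
    sites = [(0, "A", 18, "G", 2), (0, "A", 20, "C", 0), (1, "T", 9, "C", 11)]
    pwm = reconstruct_pwm(sites, motif_length=2)
    for i, col in enumerate(pwm):
        print(i, col, round(information_content(col), 2))
```

In the toy input, position 0 is rebuilt as a fully informative "A" column, while position 1 (a 9:11 split, not significant) stays uniform with zero information, the same contrast the row 2 versus rows 4 and 5 comparison in Fig. 2D relies on.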

Fig. S8.

Enrichment of islet cis-eQTLs (5% FDR) in ATAC-seq TF footprints that are only detected using phased SNP-aware scans (Materials and Methods). *P < 0.05 from GREGOR analysis.

Fig. S9.

SNPs that show allelic bias in ATAC-seq data (ab; blue box plot) exhibit larger effects on the predicted TF binding site motifs than randomly sampled 1000G SNPs (1000G; red box plot) overlapping the same footprint in islet 1. The y axis shows the absolute value of the delta score [delta = −log10(FIMO P value of alternate sequence) − (−log10(FIMO P value of reference sequence))]. P values of the comparisons were determined by the Wilcoxon rank-sum test. (A) Footprint motif = RFX2_4. (B) Footprint motif = CTCF_known2.
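The value of phase-aware scanning is easiest to see with two nearby SNPs that only complete a motif together. In the hypothetical sketch below, each alternate allele alone leaves the best motif score unchanged, whereas a haplotype carrying both creates a full match; if the two alleles sat on different haplotypes, neither haplotype would contain the motif. The delta here is a difference of log-odds scores, a simplified stand-in for the FIMO P value-based delta defined in Fig. S9; all names and sequences are illustrative.

```python
import math

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_pwm(freq_pwm, pseudocount=0.01, background=0.25):
    """Convert base-frequency columns to log2-odds vs. a uniform background."""
    scale = 1.0 + 4 * pseudocount
    return [
        {b: math.log2((col[b] + pseudocount) / scale / background) for b in BASES}
        for col in freq_pwm
    ]


def best_motif_score(seq, lo_pwm):
    """Highest log-odds score over every window on both strands."""
    width = len(lo_pwm)
    best = -math.inf
    for strand_seq in (seq, seq.translate(_COMPLEMENT)[::-1]):
        for i in range(len(strand_seq) - width + 1):
            score = sum(lo_pwm[j][strand_seq[i + j]] for j in range(width))
            best = max(best, score)
    return best


def apply_haplotype(ref_seq, window_start, phased_alts):
    """Build one haplotype by substituting the alternate alleles it carries.
    phased_alts: {genomic position: alt base} for SNPs on this haplotype."""
    seq = list(ref_seq)
    for pos, alt in phased_alts.items():
        if 0 <= pos - window_start < len(seq):
            seq[pos - window_start] = alt
    return "".join(seq)


def delta_score(ref_seq, alt_seq, lo_pwm):
    """Alternate minus reference best score (stand-in for a FIMO-based delta)."""
    return best_motif_score(alt_seq, lo_pwm) - best_motif_score(ref_seq, lo_pwm)


if __name__ == "__main__":
    # Hypothetical motif with consensus CACG.
    freq = [{b: (0.97 if b == c else 0.01) for b in BASES} for c in "CACG"]
    pwm = log_odds_pwm(freq)
    ref = "TTTTCTTGTTTT"  # window starting at position 100
    both = apply_haplotype(ref, 100, {105: "A", 106: "C"})  # same haplotype
    single = apply_haplotype(ref, 100, {105: "A"})
    print(delta_score(ref, single, pwm), delta_score(ref, both, pwm))
```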

T2D GWAS Loci Are Enriched in RFX Footprints, and T2D Risk Alleles Disrupt the Motifs at Independent Locations.

Given the strong enrichment for islet cis-eQTLs in diverse TF footprints, we next sought to identify T2D GWAS SNPs that could regulate gene expression by modulating TF binding. We found that T2D-associated SNPs were significantly enriched in islet RFX TF footprints (Fig. 3A and Dataset S4). In contrast, we did not see significant enrichment of T2D-associated SNPs in islet nonfootprint RFX TF motifs or in GM12878 TF footprints (Fig. 3A). The RFX family of TFs recognizes X-box motifs and has highly evolutionarily conserved DNA binding domains (20), which may explain why similar motifs from many RFX family members are enriched. A recent study found enrichment of T2D GWAS SNPs in islet FOXA2 ChIP-seq peaks (21). We observed enrichment of T2D-associated SNPs in islet FOX TF footprints, although none passed the Bonferroni threshold of 2.5 × 10⁻⁵ (Dataset S4).
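GREGOR (15) is a dedicated tool, and the sketch below does not reproduce it. It only illustrates the general idea of a matched-background enrichment test: each index SNP is paired with randomly drawn control SNPs from the same bin of matching covariates (here a hypothetical key combining MAF, distance to TSS, and number of LD partners), ignoring the LD proxies of each SNP. As arithmetic only, a Bonferroni threshold of 2.5 × 10⁻⁵ at α = 0.05 corresponds to 0.05/2,000; the number of tests is not stated in this section.

```python
import math
import random
from collections import defaultdict


def matched_enrichment(index_snps, match_key, annotated, n_draws=1000, seed=0):
    """
    Simplified matched-background enrichment test (not GREGOR itself).

    index_snps: list of GWAS index SNP ids.
    match_key: {snp_id: key} for index and candidate control SNPs; the key
        bins covariates such as (MAF bin, TSS-distance bin, LD-count bin).
    annotated: set of SNP ids overlapping the annotation (e.g. footprints).
    Returns (observed, mean expected, fold enrichment, empirical P).
    """
    index_set = set(index_snps)
    pools = defaultdict(list)
    for snp, key in match_key.items():
        if snp not in index_set:
            pools[key].append(snp)
    observed = sum(snp in annotated for snp in index_snps)
    rng = random.Random(seed)
    null_counts = []
    for _ in range(n_draws):
        # One matched control per index SNP; IndexError if a bin is empty.
        draw = [rng.choice(pools[match_key[snp]]) for snp in index_snps]
        null_counts.append(sum(snp in annotated for snp in draw))
    expected = sum(null_counts) / n_draws
    fold = observed / expected if expected > 0 else math.inf
    p_emp = (1 + sum(c >= observed for c in null_counts)) / (1 + n_draws)
    return observed, expected, fold, p_emp


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests
```

The empirical P value adds one to numerator and denominator so it can never be exactly zero with a finite number of draws.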

Fig. 3.

T2D GWAS enrichment at islet footprints reveals confluent RFX motif disruption. (A) T2D GWAS SNPs are significantly enriched in RFX motifs in islet footprints but not in control motifs or in footprints from a non-disease-relevant cell type (GM12878). TF motifs whose footprints overlap four or more T2D GWAS SNPs are shown. The red line indicates the Bonferroni multiple testing threshold. (B) T2D-associated SNPs that overlap high-information-content (>1 bit) positions in RFX motifs. The highest scoring RFX footprint is reported for each T2D GWAS SNP. Act. Enh., active enhancer; Act. TSS, active TSS; Wk. Transc., weak transcribed. *Chromatin-state annotation overlapping the SNP. Because RFX motifs in C are organized by alignment to the longest RFX3_1 motif, motifs overlapping rs10947804 and rs1716165 correspond to the reverse complement sequence. Therefore, risk and nonrisk alleles are also reported as reverse complement relative to the plus strand sequence. (C) Alignment of the highest scoring RFX footprint at each SNP; the boxes indicate the SNP overlap positions. Note that, in every case, the risk allele disrupts the motif.

Studies of autoimmune disease have found that disease-associated variants often occur near, but not within, TF motifs (22). We therefore asked whether T2D-associated SNPs were enriched in regions flanking RFX footprint motifs (n = 22). Regions flanking RFX footprint motifs were enriched for T2D-associated SNPs, and the enrichment decreased with increasing distance from the footprint motifs (Fig. S10). The flanking enrichment was lower than the enrichment within the RFX TF footprints themselves. In contrast, we did not see enrichment of T2D-associated SNPs in nonfootprint RFX TF motifs or in the regions flanking them (Fig. S10).
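The flanking-region analysis can be sketched by tiling windows at increasing distance from merged motif intervals and counting SNPs per distance bin; those counts would then feed an enrichment test. Bin width, number of bins, and function names are hypothetical, since the text does not give the window sizes, and this sketch does not exclude windows that run into neighboring motifs.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def flanking_windows(motifs, bin_width, n_bins):
    """
    Flanking windows at increasing distance from merged motif intervals.
    Bin k covers [start - (k+1)w, start - kw) and [end + kw, end + (k+1)w).
    Returns {k: list of (start, end)} for k = 0 .. n_bins - 1.
    """
    bins = {k: [] for k in range(n_bins)}
    for start, end in merge_intervals(motifs):
        for k in range(n_bins):
            bins[k].append((start - (k + 1) * bin_width, start - k * bin_width))
            bins[k].append((end + k * bin_width, end + (k + 1) * bin_width))
    return bins


def count_snps_in(snp_positions, windows):
    """Number of SNP positions inside any window (each SNP counted once)."""
    return sum(any(a <= p < b for a, b in windows) for p in snp_positions)


if __name__ == "__main__":
    windows = flanking_windows([(100, 110), (105, 118)], bin_width=10, n_bins=3)
    for k, wins in windows.items():
        print(k, wins, count_snps_in([95, 112, 125, 140], wins))
```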

Fig. S10.

Enrichment for T2D GWAS SNPs in regions flanking merged RFX footprint (red) and nonfootprint (blue) motifs.

We next assessed the potential effects of the risk and nonrisk alleles for nine T2D-associated SNPs at five independent loci on RFX TF binding (Fig. 3B). For each SNP, the nonrisk allele was the highest probability nucleotide in the RFX PWM, and thus, the risk allele was predicted to disrupt the motif (Fig. 3 B and C, black boxes). At two of five loci, the T2D GWAS risk alleles were associated with significantly increased gene expression in our conditional eQTL analysis: KCNK17 (KCNK16 locus) (Fig. 1 B, C, and E) and ABCB9 (PITPNM2 locus) (Fig. 1C). Other loci might not have been detectable as cis-eQTLs because of state-specific regulation or small effect sizes. The observation that T2D risk alleles at multiple loci confluently disrupt RFX footprint motifs provides a hypothesis that could explain the mechanism of a subset of T2D-associated variants.
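The disruption criterion used here (a SNP at a >1 bit motif position where the nonrisk allele is the most probable base) can be written as a small classifier. This sketch assumes frequency-based PWM columns, a uniform background for information content, and plus-strand alleles that are complemented for minus-strand motif matches, in line with the reverse-complement note in Fig. 3; the function and label names are hypothetical.

```python
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def column_information(column):
    """Information content (bits) of one PWM column of base frequencies."""
    return 2.0 + sum(f * math.log2(f) for f in column.values() if f > 0)


def classify_allele_effect(column, risk, nonrisk, strand="+", min_bits=1.0):
    """
    Classify how a SNP's alleles relate to one PWM position.

    column: {base: frequency} at the motif position overlapping the SNP.
    risk, nonrisk: alleles on the plus strand of the genome.
    strand: orientation of the motif match; minus-strand matches are
        compared using complemented alleles.
    """
    if strand == "-":
        risk, nonrisk = COMPLEMENT[risk], COMPLEMENT[nonrisk]
    if column_information(column) <= min_bits:
        return "low-information position"
    top = max(column, key=column.get)
    if top == nonrisk:
        return "risk allele disrupts motif"
    if top == risk:
        return "risk allele strengthens motif"
    return "neither allele is the preferred base"


if __name__ == "__main__":
    col = {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05}  # about 1.15 bits
    print(classify_allele_effect(col, risk="T", nonrisk="C"))
    print(classify_allele_effect(col, risk="A", nonrisk="G", strand="-"))
```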

Discussion

We have integrated genome, epigenome, and transcriptome variation and created maps to better understand the genetic control of islet gene expression. Comparison of these maps with T2D GWAS SNPs has helped identify potential disease mechanisms. For example, the risk allele of the coding SNP rs1535500 has been implicated in increasing KCNK16 activity and cell surface localization in a mouse model (13). Risk alleles of other SNPs in high LD with rs1535500 are associated with increased expression of the neighboring potassium channel gene KCNK17, which is not present in the mouse genome. KCNK16 and KCNK17 are two-pore domain “background” K+ channels and members of the TWIK-related alkaline pH-activated K+ channel family (23, 24). Both genes are expressed in islets with high specificity (KCNK16 iESI = 0.98; KCNK17 iESI = 0.76). KCNK16 has been implicated in regulating electrical excitability and glucose-stimulated insulin secretion (GSIS) (13). The T2D risk haplotype at this locus may therefore have multiple effects that collectively disrupt islet K+ signaling and GSIS by simultaneously overactivating KCNK16 and overexpressing KCNK17.

We find that T2D GWAS-associated SNPs are significantly enriched in RFX TF footprint motifs and that T2D risk alleles consistently disrupt islet RFX footprint motifs, including at the KCNK17 locus. Lizio et al. (25) found that knockdown of RFX6 results in increased expression of KCNK17, which is consistent with the T2D risk allele disrupting TF binding and increasing target gene expression. At other T2D GWAS loci, such as the MPHOSPH9 locus (index SNP rs1727313), two or three T2D GWAS SNPs in high LD are each predicted to have risk alleles that coordinately disrupt independent RFX footprint motifs (Fig. 3 B and C). We and others (2, 26, 27) previously described the presence of multiple SNPs in enhancers at individual GWAS loci. Our results extend this concept to include multiple confluent disruptions of similar TF motifs in the same locus. Collectively, these results indicate that T2D risk may, in part, be propagated through genetic modulation of RFX binding in islets. Our study shortlists only a subset of T2D-associated variants as candidates; these should be functionally dissected in vivo.

Among the RFX TFs, RFX6 is expressed in islets with high specificity (iESI = 0.94) (Fig. S11) and is involved in pancreatic progenitor specification, endocrine cell differentiation, maintenance of beta cell functional identity, and control of glucose homeostasis (28–30). Beta cell-specific deletion of RFX6 results in impaired insulin secretion (31, 32). Individuals who are heterozygous for a frameshift mutation in RFX6 have increased 2-h glucose levels (33). Importantly, rare autosomal recessive mutations that alter DNA-contacting amino acids in the DNA binding domain of RFX6 result in Mitchell–Riley syndrome, which is characterized by neonatal diabetes (29). Although RFX6 was not in our motif library, a recent report found its motif to be highly similar to the other RFX family motifs (25), consistent with the expectation for highly conserved DNA binding domains (20). Our findings could represent a connection between rare coding variation in the islet master TF RFX6 (30, 31) and common noncoding variation in multiple target sites for this TF. The impact of these variants mirrors the expected physiological effect: coding variants result in neonatal diabetes, whereas noncoding variants are associated with later-onset T2D. This study implicates impaired RFX-dependent transcriptional responses in genetic susceptibility to T2D and nominates mechanistic hypotheses about the molecular genetic pathogenesis of this complex disease. Functional validation of this hypothesis at the reported loci could improve our understanding of T2D mechanisms. Given that most other GWAS SNPs are also noncoding, this approach could be used to identify relationships between other master TFs and their multiple target sites.

Fig. S11.


RFX gene expression (FPKM) across islets and 16 Illumina Body Map 2.0 tissues. The iESI quintile for each RFX gene is labeled in the islet columns. RFX6 has the highest iESI (0.94) among all RFX TF genes.

Materials and Methods

A detailed description of computational and experimental analyses is provided in SI Materials and Methods. Briefly, we conducted high-depth, strand-specific mRNA-seq and dense genotyping in human islets followed by cis-eQTL analysis. We integrated the cis-eQTL maps with chromatin-state annotations generated from ChIP-seq datasets for different histone modifications across diverse cell types. We profiled open chromatin in two islet samples using ATAC-seq and carried out TF footprinting using a library of motifs.

SI Materials and Methods

Islet Procurement and Processing.

We procured the islet samples used in this study from the Integrated Islet Distribution Program, the National Disease Research Interchange (NDRI), or ProdoLabs. Table S1 contains demographic and other reported information for the 31 islet samples [age = 45.3 ± 11.9 y; 52% male; body mass index (BMI) = 27.2 ± 4.3 kg/m²] that passed mRNA-seq and genotype quality control (QC) steps (see below) and for the two samples used for ATAC-seq. Islets were shipped overnight from the distribution centers. On receipt, we prewarmed islets to 37 °C in shipping media for 1–2 h before harvest; ∼2,500–5,000 islet equivalents (IEQs) from each organ donor were harvested for RNA isolation. We transferred 500–1,000 IEQs to tissue culture-treated flasks and cultured them as described in ref. 34; genomic DNA isolated from the islet explant cultures was used for genotyping.

SNP Genotyping, Sample, and Genotype QC.

Genomic DNA was genotyped at the Genetic Resources Core Facility of the Johns Hopkins Institute of Genetic Medicine on the HumanOmni2.5–4v1_H BeadChip Array (Illumina); the minimum call rate was 97.14%. We mapped the Illumina array probe sequences to the hg19 genome assembly using BWA. We excluded SNPs with ambiguous probe alignments, SNPs whose probes contained a 1000 Genomes (1000G) phase 1 variant with minor allele frequency ≥1% within 7 bp of the probe 3′ end, and SNPs with call rates <95%. All alleles were oriented relative to the reference.

We identified no individuals with greater than or equal to third-degree relatedness using KING (35). We performed principal components analysis (PCA) using PLINK 1.9 (www.cog-genomics.org/plink2/general_usage) on 60,714 SNPs with minor allele count (MAC) > 5 and r² < 0.2 after excluding SNPs from regions of high LD; 33 self-reported Caucasian samples and 1 sample of unknown ethnicity clustered together by PCA. One sample self-reported to be of Caucasian ancestry did not cluster with the others and was excluded from eQTL analyses.

Fadista et al. (5), Lund University Diabetes Centre, Department of Clinical Sciences, Skåne University Hospital Malmö, Lund University, Malmö, Sweden, provided genotypes of their 89 islet samples from the Illumina HumanOmniExpress 12v1C BeadChip. We processed the probes and genotypes as described above. We identified no individuals with greater than or equal to third-degree relatedness. We performed PCA as described above using 86,502 LD-pruned SNPs. All 89 samples clustered together in the PCA.

RNA Isolation, mRNA-Seq Library Preparation, and mRNA Sequencing.

We extracted and purified total RNA from 2,000–3,000 IEQs using TRIzol (Life Technologies). RNA quality was confirmed with a Bioanalyzer 2100 (Agilent); samples with RNA integrity number (RIN) > 6.5 were prepared for mRNA sequencing. We added External RNA Control Consortium (ERCC) spike-in controls (Life Technologies) to 1 μg total RNA. We generated PolyA+ stranded mRNA-sequencing (RNA-seq) libraries for each islet using the TruSeq Stranded mRNA Kit according to the manufacturer’s protocol (Illumina). Each islet RNA-seq library was barcoded, pooled into 12-sample batches, and sequenced over multiple lanes of a HiSeq 2000 to obtain an average depth of 100 million 2 × 101-bp sequences.

mRNA-Seq Processing and QC.

We retained RNA-seq reads passing the Illumina chastity filter and mapped reads to a reference sequence composed of ERCC control fragments and all chromosomes and contigs from hg19, excluding alternate haplotypes, replacing the mitochondrial sequence (chrM) with the Cambridge Reference Sequence and masking the pseudoautosomal region on chromosome Y. We aligned reads using STAR (version 2.3.1y) (36) with default parameters and a splice junction catalog based on Gencode v19. Nonuniquely mapping reads and read pairs with unpaired alignments were discarded. Duplicate read pairs (i.e., those mapping to the same coordinates) were retained.

RNA-seq QC was performed at the level of read groups (i.e., a library on a lane) using QoRTs (37). We inspected the comprehensive set of QC metrics generated by QoRTs for outlying libraries, lanes, and sequencing runs. We used 92 ERCC RNA spike-in controls and in-house scripts to assess library quality and batch effects and check the accuracy of the strand-specific protocol. We also performed a PCA on the matrix of expression data. The QoRTs and PCA processes revealed two outlying sample libraries. One sample showed extreme 3′ bias in gene body coverage, and the other showed low gene diversity and was a strong outlier by PCA. In addition, we excluded one sample that was reportedly Caucasian but an outlier in the genotype PCA (see above). These three libraries were removed, leaving 31 islet samples for analysis.

To confirm sample identity and check for contamination, we compared SNP chip genotypes with RNA-seq alignments in annotated exonic regions using verifyBamID (38) and the maxDepth 100 option to avoid having highly expressed genes bias the estimate of contamination. No sample showed contamination > 0.78%.

We aligned the nonstrand-specific RNA-seq reads from the work by Fadista et al. (5) with the same version of STAR to the same hybrid reference genome. Again, we discarded nonuniquely mapped reads and read pairs with unpaired alignment, and we retained duplicate pairs. We performed QC using QoRTs and PCA of the expression data as described above and identified one outlier library. In addition, comparison of SNP chip genotypes with RNA-seq alignments with verifyBamID identified two swapped samples and five samples that had greater than 2% estimated contamination in the RNA-Seq sample. We removed all 8 samples, leaving 81 samples to be analyzed.

Expression Quantification.

To study regulatory variation, we performed analyses at the gene level. Definitions for all transcriptome features were based on GENCODE v19, which annotates a total of 57,820 genes, including 20,345 protein-coding genes, 13,870 long noncoding RNAs, and 14,206 pseudogenes. We ignored pseudogenes in all downstream analyses. We counted fragments mapping to genes using htseq-count v0.5.4 (39) (www-huber.embl.de/users/anders/HTSeq/doc/count.html) and calculated fragments per kilobase of transcript per million mapped reads (FPKM) values for each gene.
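As an illustration of the FPKM normalization described above, the following minimal sketch converts per-gene fragment counts (such as those produced by htseq-count) into FPKM values. It assumes that gene lengths (for example, the summed length of merged exons) and the library size used as the per-million denominator are supplied by the caller; the function name, gene name, and numbers are hypothetical.

```python
def fpkm(counts, lengths_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    counts: dict gene -> fragment count (e.g., parsed from htseq-count output)
    lengths_bp: dict gene -> gene length in bp (assumed: merged exon length)
    total_mapped_fragments: library size used as the per-million denominator
    """
    per_million = total_mapped_fragments / 1e6
    return {
        gene: (count / (lengths_bp[gene] / 1e3)) / per_million
        for gene, count in counts.items()
    }


# Usage with illustrative values: 500 fragments on a 2-kb gene, 10 million fragments in the library.
print(fpkm({"gene_a": 500}, {"gene_a": 2000}, 10_000_000))  # {'gene_a': 25.0}
```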

We processed data from the work by Fadista et al. (5) in the same way, except that gene counts were computed in a nonstrand-specific manner, consistent with their RNA-seq libraries.

Imputation.

We excluded SNPs with MAC < 1, Hardy–Weinberg equilibrium P value < 10⁻⁶, absolute alternate allele frequency difference > 0.2 compared with the 1000G EUR sample, and A/T or C/G SNPs with minor allele frequency (MAF) > 0.2. This procedure left 2,057,703 autosomal SNPs for subsequent imputation. We performed autosomal SNP imputation using a two-step strategy (40) with the haplotypes from 1000G phase3 v5 (41) as the reference panel. To improve phasing quality given the small number of islet samples, we prephased our islet samples together with the 2,504 reference panel samples using ShapeIT, version 2 (42). We then imputed genotypes with Minimac2 (43). We retained 8,377,422 imputed variants with MAC ≥ 1 and r² ≥ 0.3.
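The four pre-imputation exclusion rules above can be expressed as a single predicate. This is a minimal sketch that assumes per-SNP summary statistics, including a precomputed Hardy–Weinberg P value, are already available; the `Snp` record and its field names are hypothetical.

```python
from dataclasses import dataclass

# Strand-ambiguous allele pairs: A/T and C/G read the same on either strand.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class Snp:
    """Hypothetical per-SNP summary record used only for this sketch."""
    ref: str
    alt: str
    mac: int              # minor allele count in the study samples
    alt_freq: float       # alternate allele frequency in the study samples
    eur_alt_freq: float   # alternate allele frequency in 1000G EUR
    hwe_p: float          # Hardy-Weinberg equilibrium P value (precomputed)


def passes_preimputation_qc(snp: Snp) -> bool:
    if snp.mac < 1:                                  # monomorphic in the samples
        return False
    if snp.hwe_p < 1e-6:                             # Hardy-Weinberg violation
        return False
    if abs(snp.alt_freq - snp.eur_alt_freq) > 0.2:   # frequency mismatch vs. 1000G EUR
        return False
    maf = min(snp.alt_freq, 1 - snp.alt_freq)
    if frozenset((snp.ref, snp.alt)) in AMBIGUOUS_PAIRS and maf > 0.2:
        return False                                 # strand hard to resolve from frequency
    return True


print(passes_preimputation_qc(Snp("A", "G", 12, 0.30, 0.32, 0.5)))  # True
print(passes_preimputation_qc(Snp("A", "T", 12, 0.30, 0.32, 0.5)))  # False
```

The MAF rule for A/T and C/G SNPs reflects that, as allele frequencies approach 0.5, comparison with the reference panel can no longer reveal a strand flip.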

For the 81 islet samples from the work by Fadista et al. (5), we filtered SNPs and prephased and imputed genotypes as described above. We used 692,118 SNPs for imputation. We retained 9,758,857 imputed variants with MAC ≥ 1 and r² ≥ 0.3.

cis-eQTL Meta-Analysis.

We performed separate cis-eQTL analyses for our islets (n = 31) and the islets from the work by Fadista et al. (5) (n = 81) and combined the results by meta-analysis. We performed PCA (using the procedure described in SI Materials and Methods, SNP Genotyping, Sample, and Genotype QC) separately on these two sets of samples. We considered for analysis 6,060,203 SNPs that were present in both studies and had a combined MAC ≥ 10. We tested SNPs within 1 Mb of the most upstream TSS of each gene using Matrix eQTL (44). We included in the analysis 19,360 genes present in both sets of samples [of 26,845 genes present in our islets and 19,650 present in the islets from the work by Fadista et al. (5)]. To generate the gene expression value $Y_{ij}$ for individual $i$ and gene $j$, we inverse-normalized the FPKM values of each gene $j$ across individuals. We then performed factor analysis via PEER (45, 46) on the inverse-normalized FPKM values, specifying from 1 to 60 factors to optimize the detection of cis-eQTLs (below) and including age, sex, the top two genotype-based principal components, and, for our islet samples only, experimental batch as covariates in the model; we then inverse-normalized the resulting residuals. We used the linear regression model with an additive genetic effect:

$$Y_{ij} = \alpha + \beta_{js} G_{is} + \varepsilon_{ij},$$

where $\alpha$ is the intercept, $G_{is}$ is the imputed allele count for SNP $s$ for individual $i$, $\beta_{js}$ is the regression coefficient of the imputed allele count for SNP $s$ on the transformed expression $Y_{ij}$ of gene $j$, and $\varepsilon_{ij}$ is a normally distributed error term with mean zero and variance $\sigma^2$.
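The two expression-modeling steps above, the rank-based inverse normal transform and the single-SNP additive model, can be sketched as follows. This is not the Matrix eQTL implementation; the mapping of ranks to quantiles as (rank − 0.5)/n and the averaging of tied ranks are assumptions, since the text does not specify them, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def inverse_normalize(values):
    """Rank-based inverse normal transform (assumed offset: (rank - 0.5) / n)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):          # tied values share their average rank
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]


def fit_additive_model(dosages, y):
    """OLS fit of Y_ij = alpha + beta_js * G_is + eps_ij for one SNP-gene pair.

    dosages: imputed allele counts G_is; y: transformed expression Y_ij.
    Returns (alpha, beta, standard error of beta, t statistic with n - 2 df).
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(y) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    sxy = sum((g - mean_g) * (v - mean_y) for g, v in zip(dosages, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_g
    rss = sum((v - alpha - beta * g) ** 2 for g, v in zip(dosages, y))
    se = math.sqrt(rss / (n - 2) / sxx)          # residual variance / sum of squares of G
    return alpha, beta, se, beta / se
```

For example, `fit_additive_model([0, 1, 2, 0, 1, 2], [1.1, 1.4, 2.0, 0.9, 1.6, 2.0])` recovers an intercept of 1.0 and a slope of 0.5 per allele with a standard error of 0.05.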

We used FDR (47) to account for multiple testing and considered associations with FDR ≤ 5% significant. We expect that removing technical and biological variation via PEER will increase power to detect cis-eQTLs (7). For each study, we report results using the number of PEER factors that maximized the number of eQTLs on chromosome 20 at FDR ≤ 5%: 30 for our islets and 32 for the islets from the work by Fadista et al. (5).
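To make the FDR step and the PEER-factor selection concrete, the sketch below uses Benjamini–Hochberg adjusted P values as a stand-in. These equal Storey q values when the proportion of true null hypotheses (π0) is fixed at 1, so they are a conservative approximation of the q value approach of ref. 47. The dictionary of per-k chromosome 20 P values is a hypothetical input.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values (Storey q values with pi0 fixed at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset                        # 1-based rank in ascending P order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues, fdr=0.05):
    return sum(q <= fdr for q in bh_adjust(pvalues))


def choose_peer_factors(chr20_pvalues_by_k, fdr=0.05):
    """Pick the number of PEER factors k that maximizes chromosome 20 discoveries.

    chr20_pvalues_by_k: hypothetical dict k -> list of chromosome 20 eQTL P values.
    """
    return max(chr20_pvalues_by_k,
               key=lambda k: count_significant(chr20_pvalues_by_k[k], fdr))
```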

For each SNP–gene pair, we combined the results from our islet samples with those from the work by Fadista et al. (5) using a sample-size-weighted meta-analysis (48) and report P values based on this analysis. In addition, we performed a fixed-effects inverse variance-weighted meta-analysis (48) and report eQTL effect sizes from this analysis. We do not report P values from the latter analysis because we found that they were consistently inflated. We present results for SNPs present in both studies (MAC ≥ 1) and with MAC ≥ 10 in the combined study.
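The two meta-analysis schemes can be sketched as follows. The sample-size-weighted approach converts each study's two-sided P value into a Z score signed by the effect direction and weights it by the square root of the sample size; the inverse variance-weighted approach pools effect sizes using their standard errors. This is a minimal sketch of these standard formulas, not the software of ref. 48, and the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_size_weighted(betas, pvalues, sample_sizes):
    """Combine per-study two-sided P values into a signed Z score weighted by sqrt(N).

    P values must lie in (0, 1]; extremely small P values would need a
    log-scale conversion to avoid rounding 1 - p/2 to 1.
    """
    weighted_sum = 0.0
    for beta, p, n in zip(betas, pvalues, sample_sizes):
        z = STD_NORMAL.inv_cdf(1 - p / 2)        # |Z| for a two-sided P value
        weighted_sum += math.copysign(z, beta) * math.sqrt(n)
    z_meta = weighted_sum / math.sqrt(sum(sample_sizes))
    return z_meta, 2 * STD_NORMAL.cdf(-abs(z_meta))


def inverse_variance_weighted(betas, standard_errors):
    """Fixed-effects pooled effect size and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1 / sum(weights))


# Usage with the study sizes from the text (31 and 81 islets); betas, P values,
# and standard errors here are illustrative only.
z, p = sample_size_weighted([0.30, 0.25], [0.01, 1e-4], [31, 81])
beta, se = inverse_variance_weighted([0.30, 0.25], [0.12, 0.06])
```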

Gene-Based cis-eQTLs for GWAS Variants for T2D and Related Traits.

We compiled a list of 225 SNPs with P value < 5 × 10⁻⁸ in GWAS (GWAS SNPs) for T2D, fasting glucose, fasting glucose adjusted for BMI, fasting insulin, fasting insulin adjusted for BMI, 2-h glucose, 2-h glucose adjusted for BMI, and fasting proinsulin from the National Human Genome Research Institute (NHGRI) GWAS catalog (49) and manually curated the literature to create a comprehensive list that was up to date as of May 2014. Of these 225 GWAS SNPs, 214 were tested in our cis-eQTL analysis, for a total of 3,995 GWAS SNP–gene pairs. To identify GWAS variant cis-eQTLs that may be independent of other, stronger cis-eQTLs for the same gene, we performed iterative conditional analysis on each of the 3,995 GWAS SNP–gene pairs. For each GWAS SNP–gene pair and study, we used the linear regression model with an additive genetic effect:

$$Y_{ij} = \alpha + \beta_j^{\mathrm{GWAS}} G_i^{\mathrm{GWAS}} + \beta_{js} G_{is} + \varepsilon_{ij},$$

where $G_i^{\mathrm{GWAS}}$ is the imputed allele count for the GWAS SNP for individual $i$, $\beta_j^{\mathrm{GWAS}}$ is the regression coefficient of the imputed allele count for the GWAS SNP, and $G_{is}$ denotes the imputed allele counts for the set of SNPs $s$ within 1 Mb of the most upstream TSS that have been added to the model by the procedure below. We combined the results from our samples and the islets from the work by Fadista et al. (5) by meta-analysis as described above. If at least one SNP had a meta-analysis P value < 1.2 × 10⁻⁴ (the P value threshold corresponding to gene-based cis-eQTLs at FDR < 5%), we retained the SNP with the most significant P value in the model and repeated the procedure until no added SNP had a P value < 1.2 × 10⁻⁴. This procedure corresponds to stepwise forward selection of SNPs within 1 Mb of the most upstream TSS based on the meta-analysis results at each step (with a stopping threshold of P = 1.2 × 10⁻⁴). The conditional P value for a given GWAS SNP is the P value for $\beta_j^{\mathrm{GWAS}}$ in the final model. We considered conditional associations with FDR ≤ 5%, computed over the 3,995 GWAS SNP–gene pairs, significant.
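The stepwise forward selection can be sketched independently of the regression machinery by passing in a function that returns the conditional meta-analysis P value. The callback `conditional_p` is hypothetical; in practice it would refit the model above in each study and combine the results by meta-analysis.

```python
def forward_select(candidates, conditional_p, threshold=1.2e-4):
    """Stepwise forward selection of conditioning SNPs for one GWAS SNP-gene pair.

    candidates: SNP IDs within 1 Mb of the gene's most upstream TSS
        (the GWAS SNP itself is assumed to be excluded).
    conditional_p: hypothetical callback (snp, selected) -> meta-analysis P value
        for `snp` in a model that already contains the GWAS SNP and `selected`.
    Returns the conditioning SNPs in the order in which they were added.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        pvals = {snp: conditional_p(snp, tuple(selected)) for snp in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:     # stop: no remaining SNP passes the threshold
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

The conditional P value for the GWAS SNP would then be read from the model that contains the GWAS SNP plus the returned SNPs.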

Functional Validation of eQTL Variant Activity and Direction of Effect.

We maintained the MIN6 mouse insulinoma beta cell line (50) as previously described (51). We amplified a 473-bp genomic region containing rs10947804, rs12663159, rs146060240, and rs34247110 from human DNA (primers: 5′-GCCAGGTAAGCCAGGTA-3′ and 5′-GAGTGCGGTTTCCAGAAGTC-3′) and cloned it into the pGL4.10 promoterless vector (Promega) as previously described (51). The region was cloned in the forward orientation with respect to KCNK17 transcription and includes the promoter, 5′-UTR, and the first 34 codons of KCNK17. We performed site-directed mutagenesis with the QuikChange Lightning Site-Directed Mutagenesis Kit (Agilent) to change the KCNK17 start codon from ATG to AGG to prevent interference with translation and function of the luciferase protein. We amplified the KCNK17-increasing and -decreasing haplotypes. The haplotype of alleles associated with higher KCNK17 expression (risk haplotype) includes rs10947804-C, rs12663159-A, rs146060240-G, and rs34247110-A. The haplotype associated with lower KCNK17 expression (nonrisk haplotype) includes rs10947804-T, rs12663159-C, rs146060240-deletion, and rs34247110-G. We performed luciferase assays as previously described (51). We plated 200,000 cells per well in a 24-well plate and transfected after 24 h. We cotransfected 250 ng haplotype plasmid and 80 ng Renilla plasmid in duplicate wells using Lipofectamine LTX (Life Technologies) and assayed luciferase activity 48 h posttransfection. For the KCNK17-increasing haplotype, we transfected four independent clones, and for the KCNK17-decreasing haplotype, we transfected five independent clones. Each independent clone was transfected in duplicate. Quantified luciferase activity was normalized to empty vector, and we tested for difference in luciferase activity between haplotypes using a two-sided t test. We observed the same transcriptional effect in three separate experiments.

Analysis of Islet-Specific Expression.

We used an information theory approach similar to that in our previous study (7) to score genes based on islet expression level and specificity relative to the panel of 16 diverse Illumina Human Body Map 2.0 tissues. We first calculated expression (x) in FPKM values for all Gencode v19 genes across a representative islet sample and each of 16 tissues in the Body Map 2.0 data. We calculated the relative expression of each gene (g) in islets compared with all 17 tissues (t) as p:

$$p_{g,\mathrm{islet}} = \frac{x_{g,\mathrm{islet}}}{\sum_{t=1}^{17} x_{g,t}}.$$

We next calculated the entropy for expression of each gene across all 17 tissues as H:

$$H_g = -\sum_{t=1}^{17} p_{g,t} \log_2(p_{g,t}).$$

We defined islet tissue expression specificity (Q) for each gene as

$$Q_{g,\mathrm{islet}} = H_g - \log_2(p_{g,\mathrm{islet}}).$$

To aid in interpretability, we divided Q for each gene by the maximum observed Q and subtracted this value from one; we refer to this new score as the iESI:

$$\mathrm{iESI}_g = 1 - \frac{Q_{g,\mathrm{islet}}}{Q_{\max,\mathrm{islet}}}.$$

iESI scores near zero represent lowly and/or ubiquitously expressed genes, and scores near one represent genes that are highly and specifically expressed in islets. Based on the iESI score, we divided genes, and subsequently the associated eQTL variants, into quintiles as shown in Figs. S4, S5, and S6E. The division by quintile provided an average of 618 lead eQTL variants in each bin, which we then used to compute enrichments in genomic features. In our previous analysis of skeletal muscle, we used a decile approach because we detected lead eQTLs in >90% of testable genes (7); here, we detected fewer eQTLs (3,964) and therefore used a quintile approach. To depict a higher-resolution partitioning of genes based on the iESI score, we used deciles in the interactive eQTL browser (theparkerlab.org/tools/isleteqtl/).
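The iESI computation and quintile assignment described above can be sketched in a few lines. The sketch assumes a dictionary mapping each gene to 17 FPKM values (islet first, then the 16 Body Map tissues). Skipping genes with zero islet expression, for which log2(p) is undefined, is an assumption of this sketch rather than a documented step, and quintiles are assigned by rank.

```python
import math


def iesi_scores(expression):
    """expression: dict gene -> 17 FPKM values (islet first, then 16 Body Map tissues)."""
    q_scores = {}
    for gene, values in expression.items():
        total = sum(values)
        if values[0] <= 0:                # assumption: skip genes not expressed in islets
            continue
        p = [v / total for v in values]
        entropy = -sum(pt * math.log2(pt) for pt in p if pt > 0)   # H_g
        q_scores[gene] = entropy - math.log2(p[0])                # Q_g,islet
    q_max = max(q_scores.values())
    return {gene: 1 - q / q_max for gene, q in q_scores.items()}


def quintiles(scores):
    """Assign genes to iESI quintiles 1 (lowest) to 5 (highest) by rank."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return {gene: 5 * i // n + 1 for i, gene in enumerate(ranked)}
```

A gene expressed only in islets has zero entropy and p = 1, so Q = 0 and iESI = 1; a gene with the lowest relative islet expression attains Q_max and iESI = 0.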

Chromatin-State Analyses.

We collected cell/tissue ChIP-seq reads for H3K27ac, H3K27me3, H3K36me3, H3K4me1, and H3K4me3 and input from a diverse set of publicly available data (2, 8–10). Collectively, these data represent 31 cells/tissues (shown in Fig. S6C) as well as eight additional human and rodent datasets included for other ongoing projects. We performed read mapping and integrative chromatin-state analyses in a manner similar to that of our previous reports (2, 7) and followed the quality control procedures reported by the Roadmap Epigenomics Study (8). Briefly, we mapped reads using BWA (version 0.5.8c), removed duplicates using samtools, and filtered for a mapping quality score of at least 30. To assess the quality of each dataset, we performed strand cross-correlation analysis using phantompeakqualtools (v2.0; code.google.com/p/phantompeakqualtools) (8). Following Roadmap Epigenomics practice, we selected the cells/tissues used by ChromHMM to learn chromatin states by performing QC, for each tissue, on the two marks with the most well-defined peaks, H3K27ac and H3K4me3. We required both marks within a tissue/cell type to have a normalized strand cross-correlation (NSC) > 0.8 and a relative strand cross-correlation (RSC) > 1.1. Islets and 32 other cell/tissue types (of 39 in total) passed these criteria (Table S2). The failed samples are consistent with the Roadmap Epigenomics Study analyses: the five brain tissues and ES-HUES64 did not pass these criteria. To represent datasets with different sequencing depths more uniformly, we randomly subsampled each dataset containing >20 million mapped reads to a depth of 20 million. Chromatin states were learned jointly from the 33 cells/tissues that passed QC by applying the ChromHMM (version 1.10) hidden Markov model algorithm at 200-bp resolution to five chromatin marks and input (9, 52, 53).
We ran ChromHMM with a range of possible states and selected a 13-state model, because it most accurately captured information from higher-state models and provided sufficient resolution to identify biologically meaningful patterns in a reproducible way. We have used this state selection procedure in previous analyses (2, 7). To assign biological function names to our states that are consistent with previously published states, we performed enrichment analyses in ChromHMM comparing our states with the states reported by Roadmap Epigenomics (in their “extended” 18-state model) (8) for 18 matched cells/tissues (Fig. S1). We assigned the name of the Roadmap state that was most strongly enriched in each of our states. We then applied our chromatin-state model to obtain chromatin-state segmentations for six cell/tissue types that were not used to learn the model using ChromHMM MakeSegmentation.

Clustering by Enhancer States Across Tissues.

To identify patterns of active enhancer chromatin-state calls across cells and tissues, we performed k-means clustering on 200-bp genomic windows in which the ChromHMM posterior probability for active enhancer state 1 or 2 was greater than 0.95 in at least one cell/tissue type used in this study. We chose the number of clusters by plotting the within-group sum of squares against the number of clusters for a range of k and selected k = 60, which corresponded to the “elbow” in the plot. We performed k-means clustering using the Hartigan–Wong algorithm with 10,000 iterations and 50 random starts.

Overlap of Enhancer Clusters with Stretch Enhancers.

We called stretch enhancers for all cells/tissues in our chromatin-state segmentations as in our previous work (2, 7) by merging adjacent enhancer states (Active Enhancers 1 and 2, Weak Enhancer, and Genic Enhancer) in a given tissue and identifying contiguous regions ≥3 kb. We quantified the overlap between each enhancer cluster and stretch enhancers for islets, liver, H1, and GM12878 using the Jaccard statistic. In Fig. S6C, we normalized the Jaccard statistic within each column, such that the maximum is set to one.
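The stretch-enhancer definition and the Jaccard overlap can be sketched as follows. The state label strings are assumptions (the text names the states but not their exact labels in the segmentation files), and the pairwise Jaccard loop is quadratic; a genome-scale analysis would use an interval tool such as bedtools jaccard instead.

```python
# Assumed state labels; the actual segmentation files may name these differently.
ENHANCER_STATES = {"Active Enhancer 1", "Active Enhancer 2", "Weak Enhancer", "Genic Enhancer"}


def call_stretch_enhancers(segments, min_length=3000):
    """segments: coordinate-sorted (chrom, start, end, state) tuples for one tissue."""
    runs = []
    for chrom, start, end, state in segments:
        if state not in ENHANCER_STATES:
            continue
        if runs and runs[-1][0] == chrom and runs[-1][2] == start:
            runs[-1][2] = end               # extend the current run of adjacent enhancer segments
        else:
            runs.append([chrom, start, end])
    return [tuple(run) for run in runs if run[2] - run[1] >= min_length]


def jaccard(a, b):
    """Base-pair Jaccard statistic for two sets of internally non-overlapping intervals."""
    intersection = sum(
        max(0, min(ea, eb) - max(sa, sb))
        for ca, sa, ea in a
        for cb, sb, eb in b
        if ca == cb
    )
    covered = sum(e - s for _, s, e in a) + sum(e - s for _, s, e in b)
    return intersection / (covered - intersection)   # intersection over union
```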

Enrichment of Genetic Variants in Genomic Features.

We calculated the enrichment of lead islet cis-eQTL or lead T2D GWAS SNPs [including SNPs in r² ≥ 0.8 with the lead SNP (SNPs in LD)] in features, such as chromatin states, stretch enhancers, enhancer clusters, or TF footprint or nonfootprint motifs, using GREGOR (15). TF nonfootprint motifs (shown in Figs. 2C and 3 A and B) are defined as TF motifs that are not called as footprints in either of the islet samples. For eQTL enrichment, we included the lead cis-eQTL SNP for genes significant at a given FDR threshold. The enrichment trends were consistent across different FDR thresholds (5, 1, and 0.1%), with more stringent sets having slightly more pronounced trends. We report here the results for the FDR ≤ 5% set. For T2D GWAS SNP enrichment, we aimed to use independent T2D association signals (i.e., reported lead T2D SNPs that were not in LD with each other). We sorted the list of lead GWAS SNPs (defined in SI Materials and Methods, Gene-Based cis-eQTLs for GWAS Variants for T2D and Related Traits) by P value of association with T2D and sequentially removed SNPs with r² > 0.2 with a higher-ranked SNP.
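The selection of independent T2D signals can be sketched as greedy pruning by P value. The LD lookup `r2` is a hypothetical callback (for example, backed by a reference panel), and comparing each SNP only against SNPs that have already been retained is an assumption about how "sequentially removed" was applied.

```python
def select_independent(lead_snps, r2, max_r2=0.2):
    """Greedy selection of independent association signals.

    lead_snps: list of (snp_id, p_value) pairs
    r2: hypothetical callback (snp_a, snp_b) -> LD r^2 from a reference panel
    A SNP is dropped if r^2 > max_r2 with any higher-ranked SNP already retained.
    """
    kept = []
    for snp, _ in sorted(lead_snps, key=lambda pair: pair[1]):
        if all(r2(snp, other) <= max_r2 for other in kept):
            kept.append(snp)
    return kept
```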

For each input SNP, ∼500 control SNPs were selected that matched the input SNP for MAF, distance to the gene, and number of SNPs in r² ≥ 0.8. Fold enrichment is calculated as the number of loci at which the index SNP (or an SNP in LD with it) overlaps the feature divided by the mean number of loci at which the matched control SNPs (or SNPs in LD with them) overlap the same feature. This process accounts for feature length, because longer features will have more overlap by chance with control SNP sets. We used the following parameters in GREGOR: r² threshold (for inclusion of SNPs in LD with the lead eQTL or T2D GWAS SNP) = 0.8, LD window size = 1 Mb, and minimum neighbor number = 500. For both eQTL and GWAS SNP enrichment of TF footprint and nonfootprint motifs, we report results only for features with four or more SNP overlaps, to avoid artifacts caused by low overlap counts.
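The fold-enrichment ratio can be sketched from precomputed overlap indicators. This is a simplified illustration of the observed-over-expected calculation described above, not GREGOR itself; the input layout (one boolean per locus for the index SNP, and one boolean per matched control SNP) is an assumption.

```python
def fold_enrichment(index_overlaps, control_overlaps):
    """Observed / expected number of loci overlapping one feature.

    index_overlaps: one bool per input locus: does the index SNP, or any SNP in
        r^2 >= 0.8 with it, overlap the feature?
    control_overlaps: one list per input locus with a bool for each matched
        control SNP (~500 per locus), using the same LD-expanded definition.
    """
    observed = sum(index_overlaps)
    # Expected overlapping loci: sum over loci of the fraction of matched
    # controls that overlap, i.e., the mean count over control sets.
    expected = sum(sum(controls) / len(controls) for controls in control_overlaps)
    return observed / expected
```

For example, three of four loci overlapping a feature, when each locus's controls overlap it 25% of the time (an expected 1.0 locus), gives a fold enrichment of 3.0.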

Open Chromatin Profiling (ATAC-Seq).

We profiled chromatin accessibility in islets from two human organ donor samples (Table S1), which were genotyped using methods identical to the other samples (see above), using ATAC-seq; ∼50–100 IEQs from each sample were transposed in triplicate following the methods in ref. 11. ATAC-seq replicates were barcoded and sequenced 2 × 125 bp on a HiSeq 2000 to combined total depths of >831 million reads for islet 1 and >585 million reads for islet 2.

For each library, we performed read alignment, duplicate removal, and filtering as described in our previous study (7). We next pooled all replicates for each sample and called peaks using MACS2 (https://github.com/taoliu/MACS), version 2.1.0, with flags “-g hs --nomodel --shift -100 --extsize 200 -B --broad --keep-dup all,” retaining all peaks that satisfied a 5% FDR.

Haplotype-Aware PWM Scans.

To detect potential transcription factor binding sites (TFBSs) in a haplotype-aware manner, we generated personalized diploid genomes from the phased, imputed genotypes for each of the two islet samples using vcf2diploid [v0.2.6a (54)]. We scanned each haplotype using the find individual motif occurrences (FIMO) tool with PWMs from a library that we previously described (7). We ran FIMO using the observed nucleotide frequencies from the hg19 reference [40.9% guanine-cytosine (GC) content] and the default P value cutoff (1 × 10⁻⁴). We converted the resulting hits to reference coordinates using chainSwap and liftOver with -minMatch=0.1 and merged the results from the two haplotypes into a single set of results per motif per sample. For example, for islet 1, this procedure produced a total of 2.16 billion motif matches from our motif database. Of these motif matches, 610,544 (0.0283%) are not detected by a motif-scanning procedure that is not SNP-aware.
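Why scanning both haplotypes matters can be illustrated with a toy log-odds scorer that uses the background base composition implied by the 40.9% GC content above. This is a simplified stand-in for FIMO, not its algorithm: it applies a score threshold rather than FIMO's P value threshold, scans only the forward strand, and assumes SNP-only haplotypes that share reference coordinates (indels would require the chainSwap/liftOver step described above). PWM probabilities are assumed to include pseudocounts so that none is zero.

```python
import math

GC = 0.409  # hg19 GC content quoted in the text
BACKGROUND = {"A": (1 - GC) / 2, "T": (1 - GC) / 2, "C": GC / 2, "G": GC / 2}


def log_odds_score(pwm, seq):
    """Log2 likelihood ratio of `seq` under a PWM (list of per-position dicts) vs. background."""
    return sum(math.log2(col[base] / BACKGROUND[base]) for col, base in zip(pwm, seq))


def haplotype_hits(pwm, hap1, hap2, threshold):
    """Start positions where either haplotype scores at or above `threshold`."""
    width = len(pwm)
    hits = set()
    for hap in (hap1, hap2):
        for start in range(len(hap) - width + 1):
            if log_odds_score(pwm, hap[start:start + width]) >= threshold:
                hits.add(start)
    return sorted(hits)


# Usage: a toy two-position "AG" motif; a G>T SNP on haplotype 2 removes the match there.
toy_pwm = [{"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},
           {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01}]
hits = haplotype_hits(toy_pwm, "CCAGCC", "CCATCC", threshold=3.0)  # [2]
```

Scanning only the haplotype carrying T (as a reference-only scan would if T were the reference base) finds no match, whereas the haplotype-aware scan recovers the site.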

ATAC-Seq Footprints.

We used CENTIPEDE (12) to call footprints in the islet ATAC-seq data. Briefly, for each PWM scan result, we built a matrix encoding the number of transposase (Tnp) Tn5 integration events in a region ±100 bp from each motif occurrence. To increase the amount of information given as input to the algorithm, we split the ATAC-seq signal into three categories based on fragment length: 36–149, 150–324, and 325–400 bp. We considered a motif occurrence bound if the CENTIPEDE posterior probability was ≥0.99 and its coordinates were fully contained within an ATAC-seq peak.
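The input preparation and the final bound/unbound rule can be sketched as follows; CENTIPEDE itself is not reimplemented here. Anchoring the ±100-bp window at the motif center, ignoring motif strand, and discarding fragments outside 36–400 bp are assumptions of this sketch, and the three per-class rows would then be concatenated into one input row per motif occurrence.

```python
def fragment_length_class(length):
    """Assign an ATAC-seq fragment to one of the three length categories."""
    if 36 <= length <= 149:
        return 0
    if 150 <= length <= 324:
        return 1
    if 325 <= length <= 400:
        return 2
    return None  # assumption: fragments outside 36-400 bp are not used


def cut_matrix(cut_sites, motif_center, flank=100):
    """Tn5 integration counts at offsets -flank..+flank around a motif, one row per class.

    cut_sites: list of (position, fragment_length) integration events.
    """
    width = 2 * flank + 1
    rows = [[0] * width for _ in range(3)]
    for pos, frag_len in cut_sites:
        cls = fragment_length_class(frag_len)
        offset = pos - motif_center + flank
        if cls is not None and 0 <= offset < width:
            rows[cls][offset] += 1
    return rows


def is_footprint(posterior, motif_start, motif_end, peaks):
    """Bound if the posterior is >= 0.99 and the motif lies fully inside a peak.

    peaks: list of (start, end) intervals on the motif's chromosome.
    """
    if posterior < 0.99:
        return False
    return any(ps <= motif_start and motif_end <= pe for ps, pe in peaks)
```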

Genetic Reconstruction of PWMs Using ATAC-Seq Footprint Allelic Bias Data.

Previous studies have identified signatures of allelic bias in chromatin accessibility data at TF footprints (18, 19). Motivated by these observations, we used the heterozygous genotype calls from our islet ATAC-seq samples and the alleles observed in the reads to quantify allelic bias in regions of open chromatin (ATAC-seq TF footprints). To reduce reference allele mapping bias in our mapped ATAC-seq reads, we used the WASP mapping pipeline and duplicate removal tool (55) (downloaded from GitHub on February 19, 2016). To avoid double counting alleles covered by both reads of a pair, which occurs when the fragment is short, we clipped overlapping read pairs using the ClipOverlap function of BamUtil. We included properly paired and mapped reads with mapping quality ≥30 and base quality ≥20. We restricted our analyses to the set of heterozygous SNP calls within each sample (see above for genotype information). For each SNP, we counted the number of reads containing each allele. Because we did not have sufficient statistical power to call allelic bias at SNPs with low coverage, we included only SNPs with ≥20× coverage, which also reduces the multiple testing burden. To help protect against mapping artifacts, we excluded SNPs with ≤2× coverage for either allele.

We used a two-tailed binomial test that accounted for reference allele bias to evaluate the significance of allelic bias at each SNP in each sample. We estimated the allelic bias expected under the null for each sample and reference–alternate allele pair as previously described (7). Briefly, for each sample and each reference–alternate allele pair (e.g., AG and GA are separate reference–alternate allele pairs), we calculated the expected fraction of reference alleles (fracRef) as the sum of the reference allele counts divided by the sum of the total allele counts for SNPs of that reference–alternate allele pair. To prevent SNPs of high coverage from biasing fracRef, we down-sampled SNPs with coverage in the top quartile to 30× coverage and used the down-sampled reference allele and total counts. To prevent SNPs of low coverage from biasing the mean fracRef, we used only SNPs with a total read coverage ≥ 30. We used the observed sample- and allele pair-specific fracRef as the true fracRef under the null hypothesis of no allele-specific expression (ASE) in the binomial test. We did not test SNPs in regions blacklisted by the ENCODE Consortium because of poor mappability (wgEncodeDacMapabilityConsensusExcludable.bed and wgEncodeDukeMapabilityRegionsExcludable.bed). We performed the binomial test using R’s binom.test and multiple testing correction using the “qvalue” command in Bioconductor’s qvalue R package (version 2.2.2; https://github.com/jdstorey/qvalue) (47). We considered SNPs with q value < 0.05 as having significant allelic bias.
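The coverage filters, the null fracRef estimate, and the exact binomial test can be sketched as follows. Two simplifications are assumptions of this sketch: high-coverage SNPs are scaled proportionally to 30 reads instead of being randomly down-sampled, and the two-sided P value uses the minimum-likelihood definition (summing the probabilities of all outcomes no more likely than the observed one), which is the approach R's binom.test takes rather than a reproduction of its code. The q value step is not shown.

```python
import math
from statistics import quantiles


def passes_coverage_filters(ref_count, alt_count):
    """At least 20 total reads and more than 2 reads supporting each allele."""
    return ref_count + alt_count >= 20 and min(ref_count, alt_count) > 2


def expected_frac_ref(snps, min_coverage=30, capped_coverage=30):
    """Null reference-allele fraction for one sample and one ref-alt allele pair.

    snps: list of (ref_count, alt_count). SNPs above the 75th coverage percentile are
    scaled to `capped_coverage` reads, a deterministic stand-in for random down-sampling.
    """
    coverages = [r + a for r, a in snps]
    q75 = quantiles(coverages, n=4)[2]
    ref_total = read_total = 0.0
    for (ref, _alt), cov in zip(snps, coverages):
        if cov < min_coverage:
            continue
        if cov > q75:
            ref, cov = ref * capped_coverage / cov, capped_coverage
        ref_total += ref
        read_total += cov
    return ref_total / read_total


def binom_pmf(k, n, p):
    """Binomial probability computed in log space to avoid overflow at high coverage."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_two_sided(k, n, p):
    """Exact two-sided P value: total probability of outcomes no more likely than k."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)                 # tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))
```

For a SNP with `ref` reference reads out of `n` total, the test would be `binom_two_sided(ref, n, frac_ref)`, where `frac_ref` is the value from `expected_frac_ref` for that sample and allele pair.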

For each motif, we reconstructed the PWM using variants with significant ATAC-seq footprint allelic bias. To create the PWM for each motif, we took all significant allelic bias SNPs at position j with overrepresented nucleotide i and summed their absolute allelic deviations from the adjusted expected fracRef (sample- and allele pair-specific, as calculated above). The resulting value for nucleotide i at position j therefore reflects both the number of allelic-biased SNPs favoring nucleotide i at position j and the magnitude of their imbalance toward nucleotide i. We summed the matrices for the two islet samples and used the result to create a PWM, so that the genetically reconstructed motifs represent the combined data from both samples.
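The deviation-weighted reconstruction can be sketched as follows. This is a hypothetical illustration: it assumes SNP positions and alleles are already expressed on the motif strand, that the deviation is measured on the observed reference fraction, and that the summed matrix is turned into a PWM by normalizing each position to sum to 1 (the text does not state the normalization). `BiasedSNP` and the function names are illustrative only.

```python
from dataclasses import dataclass

BASES = "ACGT"


@dataclass
class BiasedSNP:
    """Hypothetical record for a significant allelic bias SNP in a footprint."""
    position: int             # 0-based position within the motif (motif strand)
    ref: str
    alt: str
    ref_count: int
    alt_count: int
    expected_frac_ref: float  # sample- and allele pair-specific fracRef


def accumulate(matrix, snp):
    """Add |observed - expected| reference fraction to the overrepresented base."""
    observed = snp.ref_count / (snp.ref_count + snp.alt_count)
    deviation = observed - snp.expected_frac_ref
    favored = snp.ref if deviation > 0 else snp.alt
    matrix[snp.position][BASES.index(favored)] += abs(deviation)


def reconstruct_pwm(samples, motif_length):
    """Sum deviation matrices over samples, then normalize each position."""
    matrix = [[0.0] * len(BASES) for _ in range(motif_length)]
    for snps in samples:  # e.g. one list of SNPs per islet sample
        for snp in snps:
            accumulate(matrix, snp)
    pwm = []
    for row in matrix:
        total = sum(row)
        # Positions with no biased SNPs get a uniform column (an assumption).
        pwm.append([v / total for v in row] if total > 0 else [0.25] * len(BASES))
    return matrix, pwm


if __name__ == "__main__":
    islet_1 = [BiasedSNP(0, "A", "G", 30, 10, 0.5)]  # A favored by 0.25
    islet_2 = [BiasedSNP(0, "C", "A", 5, 15, 0.5)]   # A (alt) favored by 0.25
    matrix, pwm = reconstruct_pwm([islet_1, islet_2], motif_length=2)
    print(matrix[0], pwm[0], pwm[1])
```

Weighting by the size of the deviation, rather than counting SNPs, lets strongly imbalanced SNPs contribute more than marginally significant ones, which matches the description of the matrix above.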

As a control, at each motif, we also reconstructed the PWM by summing the counts of nucleotide i at position j for all SNPs (biased and unbiased) in the motif.

Effect of ATAC-Seq Footprint SNPs with Allelic Bias on Predicted TFBS Strength for CTCF and RFX Motifs.

Given that we were able to reconstruct PWMs from the ATAC-seq allelic bias results, we asked whether the two alleles of SNPs with significant allelic bias differ more in their PWM scores than the alleles of randomly chosen SNPs within the same footprints. We calculated PWM scores for the reference and alternate allele versions of each sequence using the FIMO tool as described above. For each SNP, we used the FIMO P values to calculate a SNP effect score (delta) as follows: delta = −log10(P value of alternate sequence) − (−log10(P value of reference sequence)). We then calculated the delta score for all allelic bias SNPs overlapping a TF footprint for the CTCF_known2 and RFX2_4 motifs. We constructed a null set of SNPs by choosing a random set of 1000G SNPs with matching MAF and TSS distance that also overlap the same footprints. We compared the distributions of absolute delta scores with a Wilcoxon rank sum test. These results are shown in Fig. S9.
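The delta score and the comparison of absolute delta distributions can be sketched as below. The rank sum test here is a hand-written normal approximation (Mann–Whitney U with midranks for ties, no tie or continuity correction), swapped in for illustration; R's `wilcox.test` defaults differ (exact P values for small tie-free samples, continuity correction otherwise), so P values may not match exactly. Function names and the example P values are hypothetical.

```python
import math


def delta_score(p_ref, p_alt):
    """SNP effect score from FIMO P values: -log10(P_alt) - (-log10(P_ref))."""
    return -math.log10(p_alt) - (-math.log10(p_ref))


def _midranks(values):
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (U statistic for x, P value).
    """
    n1, n2 = len(x), len(y)
    ranks = _midranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical (P_ref, P_alt) pairs for biased SNPs and matched null SNPs.
    biased = [abs(delta_score(r, a)) for r, a in [(1e-5, 1e-2), (1e-4, 1e-6), (1e-6, 1e-3)]]
    null = [abs(delta_score(r, a)) for r, a in [(1e-4, 2e-4), (1e-5, 1e-5), (3e-5, 1e-5)]]
    print(rank_sum_test(biased, null))
```

Comparing absolute deltas tests whether biased SNPs change motif scores more, regardless of which allele is favored.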

T2D GWAS Loci Overlap with RFX Footprints.

We performed enrichment analysis for overlap of T2D-associated SNPs (the T2D GWAS SNP and SNPs in linkage disequilibrium with it at r2 ≥ 0.8) with TF footprint and nonfootprint motifs as described above. To help ensure binding specificity, we selected TF motifs with fewer than 100,000 footprint occurrences genome-wide in either of the islet samples or GM12878; 1,995 of 2,870 TF motifs passed these criteria. In Fig. 3B, we show T2D-associated SNPs that occur at high information content (>1) positions in their respective RFX PWMs. For each T2D-associated SNP shown, we used our phased genotype calls to determine the T2D-associated SNP risk allele (given the T2D GWAS SNP risk allele). Because of motif similarity, multiple RFX footprints can be called at the same SNP; we report the motif from the highest scoring PWM. In Fig. 3C, we used TOMTOM (56) to align the different RFX motifs, using the longest motif (RFX3_1) as the seed.
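Selecting high information content positions can be illustrated with a short sketch. It assumes information content is measured in bits (log2) relative to a uniform 0.25 background, so the maximum per position is 2 bits; the text gives only the >1 threshold and does not say whether pseudocounts or a nonuniform background were used. The function names are hypothetical.

```python
import math


def information_content(column, background=0.25):
    """Relative entropy (bits) of one PWM column against a uniform background."""
    return sum(p * math.log2(p / background) for p in column if p > 0)


def high_information_positions(pwm, threshold=1.0):
    """0-based motif positions whose information content exceeds `threshold`."""
    return [j for j, column in enumerate(pwm) if information_content(column) > threshold]


if __name__ == "__main__":
    toy_pwm = [
        [1.0, 0.0, 0.0, 0.0],      # fully specified base: 2 bits
        [0.25, 0.25, 0.25, 0.25],  # uninformative: 0 bits
        [0.5, 0.5, 0.0, 0.0],      # two equally likely bases: exactly 1 bit
    ]
    print(high_information_positions(toy_pwm))  # [0]
```

Because the threshold is strict (>1), a position split evenly between two bases is not counted; a SNP would qualify only if it falls at a position more constrained than that.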

NISC Comparative Sequencing Program Authors.

NISC Comparative Sequencing Program Authors are Beatrice B. Barnabas, Gerard G. Bouffard, Shelise Y. Brooks, Holly Coleman, Lyudmila Dekhtyar, Xiaobin Guan, Joel Han, Shi-ling Ho, Richelle Legaspi, Quino L. Maduro, Catherine A. Masiello, Jennifer C. McDowell, Casandra Montemayor, James C. Mullikin, Morgan Park, Nancy L. Riebow, Jessica Rosarda, Karen Schandler, Brian Schmidt, Christina Sison, Raymond Smith, Sirintorn Stantripop, James W. Thomas, Pamela J. Thomas, Meghana Vemulapalli, and Alice C. Young.

Supplementary Material

Supplementary File
pnas.1621192114.sd01.xls (17.5KB, xls)
Supplementary File
pnas.1621192114.sd02.xls (21.5KB, xls)
Supplementary File
pnas.1621192114.sd03.xlsx (210.9KB, xlsx)
Supplementary File
pnas.1621192114.sd04.xlsx (103.4KB, xlsx)

Acknowledgments

We thank additional members of our laboratories and Finland–United States Investigation of NIDDM Genetics (FUSION) Study investigators for helpful comments on and critiques of the study and manuscript. This study was supported by National Institute of Diabetes and Digestive and Kidney Diseases Grants F31HL127984 (to M.E.C.), U01DK062370 (to M.B.), ZIAHG000024 (to F.S.C.), R00DK099240 (to S.C.J.P.), 5R00DK092251 (to M.L.S.), and R01DK093757, U01DK105561, and R01DK072193 (to K.L.M.) and American Diabetes Association Pathway to Stop Diabetes Grant 1-14-INI-07 (to S.C.J.P.). This research was supported, in part, by the Intramural Research Program of the National Human Genome Research Institute, NIH.

Footnotes

The authors declare no conflict of interest.

Data deposition: The data reported in this paper have been deposited in the dbGaP (accession no. phs001188.v1.p1; FUSION Tissue Biopsy Study—Islet Expression and Regulation by RNAseq and ATACseq).

A complete list of the NISC Comparative Sequencing Program authors can be found in SI Materials and Methods.

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1621192114/-/DCSupplemental.

References

  • 1. Mohlke KL, Boehnke M. Recent advances in understanding the genetic architecture of type 2 diabetes. Hum Mol Genet. 2015;24(R1):R85–R92. doi: 10.1093/hmg/ddv264.
  • 2. Parker SCJ, et al.; NISC Comparative Sequencing Program; National Institutes of Health Intramural Sequencing Center Comparative Sequencing Program Authors; NISC Comparative Sequencing Program Authors. Chromatin stretch enhancer states drive cell-specific gene regulation and harbor human disease risk variants. Proc Natl Acad Sci USA. 2013;110(44):17921–17926. doi: 10.1073/pnas.1317023110.
  • 3. Pasquali L, et al. Pancreatic islet enhancer clusters enriched in type 2 diabetes risk-associated variants. Nat Genet. 2014;46(2):136–143. doi: 10.1038/ng.2870.
  • 4. Trynka G, et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet. 2013;45(2):124–130. doi: 10.1038/ng.2504.
  • 5. Fadista J, et al. Global genomic and transcriptomic analysis of human pancreatic islets reveals novel genes influencing glucose metabolism. Proc Natl Acad Sci USA. 2014;111(38):13924–13929. doi: 10.1073/pnas.1402665111.
  • 6. van de Bunt M, et al. Transcript expression data from human islets links regulatory signals from genome-wide association studies for type 2 diabetes and glycemic traits to their downstream effectors. PLoS Genet. 2015;11(12):e1005694. doi: 10.1371/journal.pgen.1005694.
  • 7. Scott LJ, et al. The genetic regulatory signature of type 2 diabetes in human skeletal muscle. Nat Commun. 2016;7:11764. doi: 10.1038/ncomms11764.
  • 8. Kundaje A, et al.; Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–330. doi: 10.1038/nature14248.
  • 9. Ernst J, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473(7345):43–49. doi: 10.1038/nature09906.
  • 10. Mikkelsen TS, et al. Comparative epigenomic analysis of murine and human adipogenesis. Cell. 2010;143(1):156–169. doi: 10.1016/j.cell.2010.09.006.
  • 11. Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods. 2013;10(12):1213–1218. doi: 10.1038/nmeth.2688.
  • 12. Pique-Regi R, et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome Res. 2011;21(3):447–455. doi: 10.1101/gr.112623.110.
  • 13. Vierra NC, et al. Type 2 diabetes-associated K+ channel TALK-1 modulates β-cell electrical excitability, second-phase insulin secretion, and glucose homeostasis. Diabetes. 2015;64(11):3818–3828. doi: 10.2337/db15-0280.
  • 14. Quang DX, Erdos MR, Parker SCJ, Collins FS. Motif signatures in stretch enhancers are enriched for disease-associated genetic variants. Epigenetics Chromatin. 2015;8(1):23. doi: 10.1186/s13072-015-0015-7.
  • 15. Schmidt EM, et al. GREGOR: Evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics. 2015;31(16):2601–2606. doi: 10.1093/bioinformatics/btv201.
  • 16. Saint-André V, et al. Models of human core transcriptional regulatory circuitries. Genome Res. 2016;26(3):385–396. doi: 10.1101/gr.197590.115.
  • 17. Allum F, et al.; Multiple Tissue Human Expression Resource Consortium. Characterization of functional methylomes by next-generation capture sequencing identifies novel disease-associated variants. Nat Commun. 2015;6:7211. doi: 10.1038/ncomms8211.
  • 18. Moyerbrailean GA, et al. Which genetics variants in DNase-Seq footprints are more likely to alter binding? PLoS Genet. 2016;12(2):e1005875. doi: 10.1371/journal.pgen.1005875.
  • 19. Maurano MT, et al. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat Genet. 2015;47(12):1393–1401. doi: 10.1038/ng.3432.
  • 20. Aftab S, Semenec L, Chu JS-C, Chen N. Identification and characterization of novel human tissue-specific RFX transcription factors. BMC Evol Biol. 2008;8(1):226. doi: 10.1186/1471-2148-8-226.
  • 21. Gaulton KJ, et al.; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium. Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci. Nat Genet. 2015;47(12):1415–1425. doi: 10.1038/ng.3437.
  • 22. Farh KK-H, et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature. 2015;518(7539):337–343. doi: 10.1038/nature13835.
  • 23. Girard C, et al. Genomic and functional characteristics of novel human pancreatic 2P domain K(+) channels. Biochem Biophys Res Commun. 2001;282(1):249–256. doi: 10.1006/bbrc.2001.4562.
  • 24. Lotshaw DP. Biophysical, pharmacological, and functional characteristics of cloned and native mammalian two-pore domain K+ channels. Cell Biochem Biophys. 2007;47(2):209–256. doi: 10.1007/s12013-007-0007-8.
  • 25. Lizio M, et al.; FANTOM consortium. Mapping mammalian cell-type-specific transcriptional regulatory networks using KD-CAGE and ChIP-seq data in the TC-YIK cell line. Front Genet. 2015;6:331. doi: 10.3389/fgene.2015.00331.
  • 26. Corradin O, et al. Combinatorial effects of multiple enhancer variants in linkage disequilibrium dictate levels of gene expression to confer susceptibility to common traits. Genome Res. 2014;24(1):1–13. doi: 10.1101/gr.164079.113.
  • 27. Guo C, et al. Coordinated regulatory variation associated with gestational hyperglycaemia regulates expression of the novel hexokinase HKDC1. Nat Commun. 2015;6:6069. doi: 10.1038/ncomms7069.
  • 28. Zhu Z, et al. Genome editing of lineage determinants in human pluripotent stem cells reveals mechanisms of pancreatic development and diabetes. Cell Stem Cell. 2016;18(6):755–768. doi: 10.1016/j.stem.2016.03.015.
  • 29. Smith SB, et al. Rfx6 directs islet formation and insulin production in mice and humans. Nature. 2010;463(7282):775–780. doi: 10.1038/nature08748.
  • 30. Soyer J, et al. Rfx6 is an Ngn3-dependent winged helix transcription factor required for pancreatic islet cell development. Development. 2010;137(2):203–212. doi: 10.1242/dev.041673.
  • 31. Piccand J, et al. Rfx6 maintains the functional identity of adult pancreatic β cells. Cell Reports. 2014;9(6):2219–2232. doi: 10.1016/j.celrep.2014.11.033.
  • 32. Chandra V, et al. RFX6 regulates insulin secretion by modulating Ca2+ homeostasis in human β cells. Cell Reports. 2014;9(6):2206–2218. doi: 10.1016/j.celrep.2014.11.010.
  • 33. Huopio H, et al. Clinical, genetic, and biochemical characteristics of early-onset diabetes in the Finnish population. J Clin Endocrinol Metab. 2016;101(8):3018–3026. doi: 10.1210/jc.2015-4296.
  • 34. Gershengorn MC, et al. Epithelial-to-mesenchymal transition generates proliferative human islet precursor cells. Science. 2004;306(5705):2261–2264. doi: 10.1126/science.1101968.
  • 35. Manichaikul A, et al. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–2873. doi: 10.1093/bioinformatics/btq559.
  • 36. Dobin A, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29(1):15–21. doi: 10.1093/bioinformatics/bts635.
  • 37. Hartley SW, Mullikin JC. QoRTs: A comprehensive toolset for quality control and data processing of RNA-Seq experiments. BMC Bioinformatics. 2015;16(1):224. doi: 10.1186/s12859-015-0670-5.
  • 38. Jun G, et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. Am J Hum Genet. 2012;91(5):839–848. doi: 10.1016/j.ajhg.2012.09.004.
  • 39. Anders S, Pyl PT, Huber W. HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31(2):166–169. doi: 10.1093/bioinformatics/btu638.
  • 40. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet. 2012;44(8):955–959. doi: 10.1038/ng.2354.
  • 41. Auton A, et al.; 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393.
  • 42. Delaneau O, Zagury J-F, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods. 2013;10(1):5–6. doi: 10.1038/nmeth.2307.
  • 43. Fuchsberger C, Abecasis GR, Hinds DA. minimac2: Faster genotype imputation. Bioinformatics. 2015;31(5):782–784. doi: 10.1093/bioinformatics/btu704.
  • 44. Shabalin AA. Matrix eQTL: Ultra fast eQTL analysis via large matrix operations. Bioinformatics. 2012;28(10):1353–1358. doi: 10.1093/bioinformatics/bts163.
  • 45. Stegle O, Parts L, Durbin R, Winn J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLOS Comput Biol. 2010;6(5):e1000770. doi: 10.1371/journal.pcbi.1000770.
  • 46. Stegle O, Parts L, Piipari M, Winn J, Durbin R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat Protoc. 2012;7(3):500–507. doi: 10.1038/nprot.2011.457.
  • 47. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003;100(16):9440–9445. doi: 10.1073/pnas.1530509100.
  • 48. Willer CJ, Li Y, Abecasis GR. METAL: Fast and efficient meta-analysis of genomewide association scans. Bioinformatics. 2010;26(17):2190–2191. doi: 10.1093/bioinformatics/btq340.
  • 49. Welter D, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014;42(Database issue):D1001–D1006. doi: 10.1093/nar/gkt1229.
  • 50. Miyazaki J, et al. Establishment of a pancreatic β cell line that retains glucose-inducible insulin secretion: Special reference to expression of glucose transporter isoforms. Endocrinology. 1990;127(1):126–132. doi: 10.1210/endo-127-1-126.
  • 51. Kulzer JR, et al. A common functional regulatory variant at a type 2 diabetes locus upregulates ARAP1 expression in the pancreatic beta cell. Am J Hum Genet. 2014;94(2):186–197. doi: 10.1016/j.ajhg.2013.12.011.
  • 52. Ernst J, Kellis M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat Biotechnol. 2010;28(8):817–825. doi: 10.1038/nbt.1662.
  • 53. Ernst J, Kellis M. ChromHMM: Automating chromatin-state discovery and characterization. Nat Methods. 2012;9(3):215–216. doi: 10.1038/nmeth.1906.
  • 54. Rozowsky J, et al. AlleleSeq: Analysis of allele-specific expression and binding in a network framework. Mol Syst Biol. 2011;7(1):522. doi: 10.1038/msb.2011.54.
  • 55. van de Geijn B, McVicker G, Gilad Y, Pritchard JK. WASP: Allele-specific software for robust molecular quantitative trait locus discovery. Nat Methods. 2015;12(11):1061–1063. doi: 10.1038/nmeth.3582.
  • 56. Gupta S, Stamatoyannopoulos JA, Bailey TL, Noble WS. Quantifying similarity between motifs. Genome Biol. 2007;8(2):R24. doi: 10.1186/gb-2007-8-2-r24.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences