Author manuscript; available in PMC: 2014 Jun 1.
Published in final edited form as: Arterioscler Thromb Vasc Biol. 2013 Mar 28;33(6):1418–1426. doi: 10.1161/ATVBAHA.112.301169

Gene Expression Signatures of Coronary Heart Disease

Roby Joehanes 1,2, Saixia Ying 2, Tianxiao Huan 1, Andrew D Johnson 1, Nalini Raghavachari 3, Richard Wang 3, Poching Liu 3, Kimberly A Woodhouse 3, Shurjo K Sen 4, Kahraman Tanriverdi 5, Paul Courchesne 1, Jane E Freedman 5, Christopher J O'Donnell 1,6, Daniel Levy 1,*, Peter J Munson 2,*
PMCID: PMC3684247  NIHMSID: NIHMS468335  PMID: 23539218

Abstract

Objective

To identify transcriptomic biomarkers of coronary heart disease (CHD) in 188 CHD cases and 188 age- and sex-matched controls who were participants in the Framingham Heart Study.

Approach and results

A total of 35 genes were differentially expressed in CHD cases vs. controls at FDR<0.5 including GZMB, TMEM56 and GUK1. Cluster analysis revealed three gene clusters associated with CHD, two linked to increased erythrocyte production and a third to reduced natural killer (NK) and T cell activity in CHD cases. Exon-level results corroborated and extended the gene-level results. Alternative splicing analysis suggested that GUK1 and 38 other genes were differentially spliced in CHD cases vs. controls. Gene ontology analysis linked ubiquitination and T-cell-related pathways with CHD.

Conclusion

Two bioinformatically defined groups of genes show consistent associations with CHD. Our findings are consistent with the hypotheses that hematopoiesis is up-regulated in CHD, possibly reflecting a compensatory mechanism, and that innate immune activity is disrupted in CHD or altered by its treatment. Transcriptomic signatures may be useful in identifying pathways associated with CHD and point toward novel therapeutic targets for its treatment and prevention.

Keywords: Gene expression, coronary heart disease, myocardial infarction, coronary artery disease, transcriptomics, biomarkers

Introduction

Coronary heart disease (CHD) is a leading cause of death in the United States1 and the world2. In the United States alone, more than 16.3 million adults have CHD and an estimated 935,000 heart attacks occur each year3. The estimated direct cost of CHD in 2010 was $272.5 billion, and by 2030 it is projected to reach $818 billion4. Hence, innovative diagnostic strategies are needed to improve prevention and treatment of CHD and to limit its growing burden.

Recent research has indicated that CHD is influenced by environmental and genetic factors5,6. Most genetic effects are modest in size and the vast majority of the heritability of CHD remains unexplained7. Many of the known CHD susceptibility loci come from genome-wide association studies (GWAS) of common single nucleotide polymorphisms (SNPs)8–12. Another approach to identifying CHD biomarkers is the study of gene expression signatures of CHD13–15; such studies have yielded promising results, but to date only a few genes are common across studies. More thorough gene expression studies of CHD are needed for new discoveries and to corroborate previous findings16.

The Framingham Heart Study (FHS), a large observational study that has contributed to the identification and elucidation of risk factors for CHD17, recently launched the Systems Approach to Biomarker Research in Cardiovascular Disease (SABRe CVD) Initiative, a large scale population-based study that seeks to discover, validate, and characterize biomarkers of atherosclerotic CVD and its major risk factors18. SABRe resources include gene- and exon-level expression, which can be used to understand biological underpinnings of multiple complex traits and diseases.

As a part of SABRe, this investigation sought to identify and characterize gene expression signatures of CHD. To do so, we used an efficient CHD case-control design and assessed gene expression from RNA derived from whole blood using a commercial microarray. Analysis was conducted at the gene and exon level, and we conducted a validation study using quantitative real-time polymerase chain reaction (qPCR). A search for alternative splicing signatures was performed. Hierarchical cluster analysis was performed to investigate pathways associated with CHD, and results were annotated with gene ontology (GO) and gene-set enrichment analyses (GSEA). Across these analyses, we highlighted common pathways and genes. It is hoped that the identification of gene expression signatures of CHD will facilitate personalized approaches to the diagnosis, prevention, and treatment of this disease.

Materials and methods

The details of the materials and methods used herein are available in a separate online supplement.

Results

Study sample characteristics

The clinical characteristics of the study sample are provided in Table 1. We investigated the extended pedigree structure of the participants and found that three extended families were heavily represented in the study sample, contributing a total of 36 participants (19 cases, 17 controls). To account for the possible bias resulting from inclusion of participants from the same extended families, we adjusted for pedigree in all the subsequent analyses, with the exception of the clustering analysis.

Table 1.

Clinical Characteristics of the Study Participants

| Characteristic | Controls (N=188) | Cases (N=188) |
|---|---|---|
| Age, in years* | 71.0 ± 8.0 | 71.0 ± 7.9 |
| Male (percent) | 74.5% | 74.5% |
| Myocardial Infarction (percent) | N/A | 51.6% |
| Coronary artery bypass surgery (percent) | N/A | 22.9% |
| Percutaneous transluminal coronary angioplasty (percent) | N/A | 25.5% |
| Total cholesterol-HDL ratio, in mg / dL* | 3.5 ± 1.0 | 3.3 ± 1.1 |
| Current cigarette smoking (percent) | 1% | 8% |
| Hypertension (percent) | 69.4% | 83.5% |
| Diabetes (percent) | 20.2% | 27.1% |
| Total cholesterol, in mg / dL* | 173.8 ± 31.0 | 150.7 ± 33.4 |

* Mean ± standard deviation

Technical data adjustment

Ten technical covariates fulfilled our selection criteria (Supplementary Table I). These technical covariates were used to adjust the data in all analyses.

Gene level differential expression

Gene-level results for differential expression in CHD cases vs. controls are reported in Table 2. At FDR<0.5, 35 genes were differentially expressed, 26 of which were up-regulated in CHD. Gene ontology (GO) enrichment analysis of the top 35 differentially expressed genes or of the 26 up-regulated genes did not reveal any significantly enriched gene categories. GO analysis of the 9 down-regulated genes detected a weak association with “induction of apoptosis by intracellular signals” (p=6.7E-4). Investigation using the Novartis GeneAtlas revealed that a large number of the up-regulated genes are specific to CD71+ early erythroid cells (TSTA3, CHPT1, FIS1, GABARAPL2, GUK1, GYPB).
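The FDR cutoff used above can be illustrated with a short sketch. It assumes the Benjamini-Hochberg step-up procedure, which is an assumption on our part: the main text does not state which FDR method was applied (the details are in the online supplement). The function name `bh_fdr` and the toy p-values are hypothetical.

```python
def bh_fdr(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (hypothetical, not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.60]
toy_fdr = bh_fdr(toy_p)
# Keep the tests whose adjusted value falls below the FDR<0.5 cutoff.
selected = [p for p, q in zip(toy_p, toy_fdr) if q < 0.5]
```

A cutoff of FDR<0.5 is permissive: under this procedure, up to about half of the selected genes are expected to be false discoveries, which is why the downstream cluster, exon-level, and qPCR analyses matter for interpretation.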

Table 2.

Differentially Expressed Genes in Coronary Heart Disease with FDR < 0.5.

| Transcript cluster ID | Gene Symbol | Ch. | Description | FC1* | P value, Model 1 | FDR, Model 1 | FC2* | P value, Model 2 | FDR, Model 2 |
|---|---|---|---|---|---|---|---|---|---|
| 2340695 | SGIP1 | 1 | SH3-domain GRB2-like (endophilin) interacting protein 1 | 1.06 | 8.1E-6 | 0.13 | 1.07 | 7.4E-7 | 0.01 |
| 3558375 | GZMB | 14 | granzyme B (granzyme 2, cytotoxic T-lymphocyte-associated serine esterase 1) | −1.14 | 1.4E-5 | 0.13 | −1.12 | 3.5E-4 | 0.39 |
| 2786322 | SLC7A11 | 4 | solute carrier family 7, (cationic amino acid transporter, y+ system) member 11 | 1.09 | 2.8E-5 | 0.14 | 1.09 | 5.0E-5 | 0.37 |
| 3279410 | FAM188A | 10 | family with sequence similarity 188, member A | 1.07 | 3.2E-5 | 0.14 | 1.06 | 4.1E-4 | 0.39 |
| 3748449 | CCDC144B/CCDC144A | 17 | coiled-coil domain containing 144B/144A | 1.26 | 3.8E-5 | 0.14 | 1.24 | 1.6E-4 | 0.39 |
| 3424442 | TMTC2 | 12 | transmembrane and tetratricopeptide repeat containing 2 | 1.07 | 4.7E-5 | 0.14 | 1.07 | 2.2E-4 | 0.39 |
| 2939069 | SERPINB6 | 6 | serpin peptidase inhibitor, clade B (ovalbumin), member 6 | −1.06 | 1.5E-4 | 0.38 | −1.05 | 2.4E-3 | 0.56 |
| 3416019 | PRR13 | 12 | proline rich 13 | 1.11 | 1.8E-4 | 0.41 | 1.12 | 1.8E-4 | 0.39 |
| 2347732 | TMEM56 | 1 | transmembrane protein 56 | 1.12 | 2.3E-4 | 0.45 | 1.12 | 4.5E-4 | 0.39 |
| 3892918 | C20orf20 | 20 | chromosome 20 open reading frame 20 | 1.08 | 2.9E-4 | 0.46 | 1.07 | 2.5E-3 | 0.56 |
| 3063856 | GATS | 7 | GATS, stromal antigen 3 opposite strand | −1.05 | 3.1E-4 | 0.46 | −1.05 | 1.2E-3 | 0.54 |
| 2459924 | ABCB10 | 1 | ATP-binding cassette, sub-family B (MDR/TAP), member 10 | 1.09 | 3.5E-4 | 0.46 | 1.09 | 5.1E-4 | 0.39 |
| 3430776 | ISCU | 12 | iron-sulfur cluster scaffold homolog (E. coli) | 1.08 | 3.6E-4 | 0.46 | 1.09 | 6.3E-5 | 0.37 |
| 4008078 | PAGE1 | X | P antigen family, member 1 (prostate associated) | 1.08 | 4.0E-4 | 0.46 | 1.09 | 2.4E-4 | 0.39 |
| 3087501 | ZDHHC2 | 8 | zinc finger, DHHC-type containing 2 | 1.10 | 4.0E-4 | 0.46 | 1.11 | 1.5E-4 | 0.39 |
| 3634509 | CIB2 | 15 | calcium and integrin binding family member 2 | −1.07 | 4.1E-4 | 0.46 | −1.07 | 1.0E-3 | 0.53 |
| 2596975 | CRYGB | 2 | crystallin, gamma B | −1.08 | 4.4E-4 | 0.47 | −1.08 | 9.6E-4 | 0.52 |
| 3142381 | FABP4 | 8 | fatty acid binding protein 4, adipocyte | 1.09 | 4.7E-4 | 0.47 | 1.09 | 1.2E-3 | 0.54 |
| 3064591 | FIS1 | 7 | fission 1 (mitochondrial outer membrane) homolog (S. cerevisiae) | 1.11 | 5.5E-4 | 0.48 | 1.12 | 5.1E-4 | 0.39 |
| 2881860 | CCDC69 | 5 | coiled-coil domain containing 69 | −1.05 | 5.5E-4 | 0.48 | −1.04 | 4.8E-3 | 0.61 |
| 2787958 | GYPB | 4 | glycophorin B (MNS blood group) | 1.12 | 5.8E-4 | 0.48 | 1.13 | 3.9E-4 | 0.39 |
| 3513995 | DLEU2 | 13 | deleted in lymphocytic leukemia 2 (non-protein coding) | 1.09 | 6.6E-4 | 0.48 | 1.07 | 9.1E-3 | 0.62 |
| 2896848 | RBM24 | 6 | RNA binding motif protein 24 | 1.05 | 6.9E-4 | 0.48 | 1.05 | 4.8E-4 | 0.39 |
| 3642654 | HBM | 16 | hemoglobin, mu | 1.11 | 7.6E-4 | 0.48 | 1.12 | 7.4E-4 | 0.47 |
| 3157660 | TSTA3 | 8 | tissue specific transplantation antigen P35B | 1.11 | 7.7E-4 | 0.48 | 1.12 | 5.5E-4 | 0.39 |
| 2413423 | TMEM48 | 1 | transmembrane protein 48 | 1.06 | 8.0E-4 | 0.48 | 1.07 | 8.0E-4 | 0.47 |
| 3669059 | GABARAPL2 | 16 | GABA(A) receptor-associated protein-like 2 | 1.08 | 8.0E-4 | 0.48 | 1.09 | 1.7E-4 | 0.39 |
| 3784344 | MAPRE2 | 18 | microtubule-associated protein, RP/EB family, member 2 | −1.05 | 8.1E-4 | 0.48 | −1.05 | 1.9E-3 | 0.54 |
| 3391816 | USP28 | 11 | ubiquitin specific peptidase 28 | −1.08 | 8.1E-4 | 0.48 | −1.07 | 2.1E-3 | 0.54 |
| 3556556 | DAD1 | 14 | defender against cell death 1 | 1.05 | 8.4E-4 | 0.48 | 1.04 | 6.7E-3 | 0.62 |
| 3150844 | SNTB1 | 8 | syntrophin, beta 1 (dystrophin-associated protein A1, 59kDa, basic component 1) | 1.04 | 8.5E-4 | 0.48 | 1.04 | 2.5E-3 | 0.56 |
| 3688112 | STX1B | 16 | syntaxin 1B | 1.07 | 8.6E-4 | 0.48 | 1.08 | 1.8E-4 | 0.39 |
| 2358171 | PRPF3 | 1 | PRP3 pre-mRNA processing factor 3 homolog (S. cerevisiae) | −1.05 | 8.8E-4 | 0.48 | −1.04 | 3.6E-3 | 0.59 |
| 2383859 | GUK1 | 1 | guanylate kinase 1 | 1.12 | 9.1E-4 | 0.48 | 1.13 | 5.6E-4 | 0.39 |
| 3428671 | CHPT1 | 12 | choline phosphotransferase 1 | 1.09 | 9.7E-4 | 0.49 | 1.10 | 2.6E-4 | 0.39 |
* FCn: fold change in microarray expression values in model n. Positive numbers indicate overexpression in cases.

** Model 1 is case-control with technical covariates. Model 2 is case-control with all technical covariates as in Model 1, and the following clinical covariates: hypertension and diabetes diagnoses, smoking status, and total cholesterol-HDL ratio.

Validated by qPCR experiment

To better understand the biological themes within the data, the selection cutoff was relaxed to p<0.01, yielding 269 genes (Supplementary Table II). GO analysis of the 164 up-regulated genes revealed significant enrichment with hemoglobin complex genes (3 of 8, p=5.3E-5). A large number were specific to CD71+ early erythroid cells. Analysis of the 105 down-regulated genes using GSEA did not show strong GO enrichment, but did show significant overlap with the caspase cascade (p=8E-5) and extrinsic apoptosis (p=2E-5) genes (FASLG, GZMB, KLRD1, CASP8, RIPK1 and KIR2DL5A).
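Enrichment statements such as "3 of 8" hemoglobin complex genes among 164 up-regulated genes are commonly assessed with a hypergeometric (one-sided Fisher) test. The sketch below is only an illustration of that calculation: the text does not say how the GO tool computed its p-values, and the background of 17,873 genes is borrowed from the Figure 1 caption as an assumed annotation universe, so the result need not match the reported value. The function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, category_size, n_selected, n_background):
    """P(X >= k): chance of drawing at least k category genes among the
    selected genes when sampling without replacement from the background."""
    total = comb(n_background, n_selected)
    upper = min(category_size, n_selected)
    favourable = sum(
        comb(category_size, x)
        * comb(n_background - category_size, n_selected - x)
        for x in range(k, upper + 1)
    )
    # Python integers are arbitrary precision, so the ratio is computed exactly
    # before being converted to a float.
    return favourable / total


# 3 of 8 category genes found among 164 selected genes; the background size is
# an assumption taken from the Figure 1 caption.
p_enrich = hypergeom_upper_tail(3, 8, 164, 17873)
```

With only eight genes in the category, a handful of hits is enough to give a small p-value, which is one reason enrichment results for very small categories are usually read alongside the list of genes involved.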

Adjusting for CHD risk factors (Model 2) attenuated the p-values of the top genes, but did not materially alter their ranking. All genes were differentially expressed between cases and controls by 1.4 fold or less, consistent with the small effect sizes found in previous studies13,14. Among 478 genes that were reported to be associated with CHD in prior studies13,14,19–21, 59 had p-values <0.05 in our results (Supplementary Tables III and IV).
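The signed fold changes reported in Table 2 (for example, 1.12 and −1.14) appear to follow the common convention in which a log2 case/control difference is converted to a ratio of magnitude at least 1, with the sign giving the direction. The sketch below assumes that convention; the function name and the toy inputs are hypothetical.

```python
import math


def signed_fold_change(log2_ratio):
    """Map log2(case / control) to a signed fold change with magnitude >= 1."""
    ratio = 2.0 ** abs(log2_ratio)
    # Positive: higher in cases. Negative: higher in controls.
    return ratio if log2_ratio >= 0 else -ratio


# Toy inputs chosen to mirror magnitudes seen in Table 2.
up = signed_fold_change(math.log2(1.12))     # cases 12% above controls
down = signed_fold_change(-math.log2(1.14))  # controls 14% above cases
```

Under this convention no value can fall strictly between −1 and 1, so a fold change of 1.4 corresponds to a log2 difference of roughly 0.49.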

Cluster analysis

We next analyzed genes in groups using hierarchical cluster analysis, which allows aggregation of smaller effect sizes. This analysis should identify clusters of coherently expressed genes, and by averaging over clusters, allows detection of weaker associations with CHD.

Hierarchical cluster analysis divided all genes into 21 clusters, and for each a representative "metagene" was computed by averaging all genes across the cluster (Supplementary Table V). Paired case-control analysis of the 21 metagenes showed that one was significantly down-regulated (cluster 15, fold change [FC]=−1.06, p=0.003) and two were significantly up-regulated (cluster 21, FC=1.07, p=0.012, and cluster 6, FC=1.02, p=0.038). These three clusters were each significantly enriched with top genes identified in the gene-level results (Bonferroni-adjusted enrichment p<1E-5). The heatmap of these clusters can be found in Supplementary Figure I. The summary of differential expression t-statistic can be found in Figure 1. The list of genes in these three clusters can be found in Supplementary Table VI.
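The metagene construction and the paired comparison can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a plain paired t-statistic on matched case-control pairs and omits the technical-covariate adjustment described in the supplement. All names and the toy expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def metagene(expr_by_gene, cluster_genes):
    """Average expression over the genes of one cluster, sample by sample."""
    n_samples = len(expr_by_gene[cluster_genes[0]])
    return [
        mean(expr_by_gene[g][s] for g in cluster_genes)
        for s in range(n_samples)
    ]


def paired_t(case_values, control_values):
    """Mean within-pair difference divided by its standard error."""
    diffs = [c - k for c, k in zip(case_values, control_values)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Toy log2 expression: samples 0-2 are cases, samples 3-5 their matched controls.
toy_expr = {
    "geneA": [8.0, 8.2, 8.4, 7.9, 8.0, 8.1],
    "geneB": [6.0, 6.4, 6.2, 5.9, 6.2, 6.1],
}
toy_meta = metagene(toy_expr, ["geneA", "geneB"])
toy_t = paired_t(toy_meta[:3], toy_meta[3:])
```

Averaging across a cluster reduces gene-specific noise, so a small but coherent shift shared by many genes can reach significance even when few members do so individually.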

Figure 1.

Differential Expression t-Statistic (Effect Divided by Standard Error) vs. Cluster Number. Hierarchical clustering divided the 17,873 genes into 21 categories according to expression profiles across the 376 samples. Significantly differentially expressed genes (p<0.01, outside of dashed limit lines) are displayed as open circles. Average and 95% confidence limits for the average t-statistic are indicated with green diamonds. Clusters 21 and 15 show averages strongly displaced from zero, with a large proportion of members which are individually significant. Positive numbers are over-expressed in CHD cases. The "metagene" (see Methods) for these clusters and for Cluster 6 are strongly significant, even after adjustment for multiple testing (Table 3). Comparison of genes in each cluster to the Novartis human tissue compendium22 shows that Cluster 15 is highly enriched in genes specific to CD56+ NK cells, CD4+ and CD8+ T cells. Cluster 21 contains over 30% of genes specific to CD71+ Early Erythrocytes with smaller numbers specific to CD105+ endothelial cells and Bone Marrow. Cluster 6 contains a high proportion of genes specific to CD71+ cells. Cluster 20 contains many platelet-specific genes24, with some genes also specific to Cardiac Myocytes.

Many of these clusters comprise genes that are specific to particular cell sub-populations according to the Novartis Human Tissue Compendium22. The 32 genes in cluster 15 are nearly all specific to CD56+ NK cells, but with some expression in peripheral blood CD4+ T cells and CD8+ T cells. Overall down-regulation of genes in this cluster by 6% identified an immune component to CHD. The gene GZMB (cytotoxic T-lymphocyte-associated serine esterase) in this cluster was also the most strongly significant (p<0.0001) individual down-regulated gene in our study. The protein product of this gene, granzyme B, has been associated with atherosclerosis23 and with increased plaque instability and risk of rupture, by promoting apoptosis of macrophage foam cells23. Many of the 126 genes of cluster 21 are specifically expressed in CD71+ early erythroid cells. As a group, this cluster was significantly up-regulated in CHD cases, on average by 7%. Cluster 6, with 436 genes, showed a smaller (2%), but significant average up-regulated expression in cases vs. controls. Like cluster 21, cluster 6 includes genes specifically expressed by CD71+ early erythroid cells, but also genes highly expressed in CD105+ endothelial cells as well as other non-specifically expressed genes. GO enrichment analysis of genes in cluster 6 pointed toward ubiquitination (p<2.0E-4) and apoptosis (p=2.46E-4) pathways. GO enrichment analyses of Clusters 15, 21, and 6 are found in Supplementary Table VII.

Our findings from differential gene expression and cluster analysis suggest an increase in early erythroid cell expression and an overall reduction of NK- and T-cells in CHD cases.

qPCR validation of differential expression

Of the 93 genes assayed by qPCR (Supplementary Table VIII), 52 show positive, significant Spearman correlation coefficients (>0.5). Of these, 50 show directionally consistent fold changes vs. the microarray data. Nine genes, including four of the top differentially expressed genes (SGIP1, CCDC144A, GYPE and DLEU2), were detected by PCR in fewer than 11 samples, and therefore were excluded from this analysis. Differential gene expression analysis using qPCR corroborated fold changes measured by microarray (Figure 2). Many of these 50 genes exhibited larger fold changes in qPCR data compared with microarray. This effective "compression" of fold-change by microarray is well-documented24. P-values from qPCR were generally weaker than those determined from microarray, in part, because fewer samples were available for qPCR. Nevertheless, 35 of the 52 were significantly differentially expressed in cases (p-value <0.05). Eight of these (TSTA3, HBM, GUK1, ZDHHC2, GZMB, GYPB, GABARAPL2, and TMEM56) were among the top gene-level results from microarray analyses (Supplementary Table VIII).
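The two checks described above, a per-gene Spearman correlation between platforms and a test of directional consistency of fold changes, can be sketched as follows. This is a minimal illustration, not the validation pipeline itself; the function names and the toy measurements are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def same_direction(log2_fc_array, log2_fc_qpcr):
    """True when both platforms agree on the sign of the fold change."""
    return (log2_fc_array > 0) == (log2_fc_qpcr > 0)


# Toy measurements of one gene in five samples on each platform.
toy_array = [7.1, 7.4, 7.2, 7.9, 7.6]
toy_qpcr = [1.0, 1.5, 1.9, 3.2, 2.1]
rho = spearman(toy_array, toy_qpcr)
passes = rho > 0.5 and same_direction(0.16, 0.34)
```

A rank-based correlation suits this comparison because microarray and qPCR signals are on different scales and, as noted above, microarrays tend to compress fold changes.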

Figure 2.

Validation of Microarray Fold Change by RT-qPCR. Ninety-three genes were submitted for validation of all subjects. Of these, 52 showed positive Spearman correlation coefficients that were above 0.5. Base 2 logarithmic fold changes (Case / Control) were computed for both technical covariate adjusted microarray data (Model 1) and RT-qPCR data. Fifty of 52 show fold changes with the same sign in both methods, although the qPCR data show changes of larger magnitude (above the line of identity in first quadrant). The sole down-regulated gene (GZMB) shows a downward 1.14 fold change by microarray but a downward 1.27 change by qPCR. The largest upward fold change of 1.17 occurs in SLPI by microarray, but is found to be 1.4 when measured by qPCR.

Exon level differential expression

Exon level analysis (n=209,699) confirmed and extended the gene-level differential expression results (Supplementary Table IX). Two genes contained exons that were differentially expressed at FDR <0.1 (ABCC5 and MUC21); at FDR <0.5, 2,199 distinct genes contained significant exons, although for some only a single significant exon was identified. Twenty-nine of the top 35 genes in our gene-level results (Table 2) were also among these 2,199 genes, with 23 of them represented by at least two exons (Supplementary Table X). As with gene-level results, adjusting for CHD risk factors attenuated the p-values of the top exons, but did not materially alter their ranking (Model 1 vs. Model 2, Supplementary Table IX). A subset of 470 genes was represented by two or more significant exons. Of these, 83 genes fell into clusters 6 and 21, and were exclusively up-regulated. Eleven genes fell into cluster 15 and were exclusively down-regulated. Of these, 3 were previously identified at the gene level (Supplementary Table II), and 8 were found only at the exon level (A2M, CTSW, FCRL6, KLRF1, PRF1, TARP and TGFBR3).

The results of exon-level GO enrichment analyses are summarized in Supplementary Table XI. GO enrichment of up-regulated genes pointed to protein K48-linked ubiquitination (7 of 34 genes; p=1.2E-6), while down-regulated genes pointed to leukocyte apoptosis and base-excision repair (4 of 40 and 4 of 33 genes, p=4.2E-4 and 2.0E-4, respectively).

Differential alternative splicing

Because we found many more differentially expressed genes at the exon level than at the gene level, we hypothesized that some of the additional genes actually display differential usage of splice variants. Accordingly, we performed differential alternative splicing analysis that yielded 6 genes at FDR <0.05 and 40 genes at FDR <0.5 (Supplementary Tables XII and XIII). The most significant differentially alternatively spliced gene, GUK1, is in cluster 21 and among the top 35 differentially expressed genes. For GUK1, the reduced differential expression at the first exon (probeset 2383862) may be the result of utilization of an alternative transcription start site, as suggested in the RefSeq collection of transcript isoforms25.
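One widely used way to separate differential splicing from differential expression on exon arrays is a splicing index: the exon signal is normalized by the signal of its gene, and the normalized value is compared between groups. The sketch below illustrates that idea only. Whether the analysis reported here used this statistic is not stated in the text, so the approach, the function names, and the toy values are all assumptions.

```python
from statistics import mean


def splicing_index(exon_log2, gene_log2):
    """Exon signal relative to its gene, per sample (both on a log2 scale)."""
    return [e - g for e, g in zip(exon_log2, gene_log2)]


def case_control_difference(values, is_case):
    """Mean of the values in cases minus the mean in controls."""
    cases = [v for v, c in zip(values, is_case) if c]
    controls = [v for v, c in zip(values, is_case) if not c]
    return mean(cases) - mean(controls)


# Toy data: the gene is up by 0.2 log2 units in cases, but its first exon is not.
is_case = [True, True, False, False]
gene_log2 = [8.2, 8.2, 8.0, 8.0]
first_exon_log2 = [6.0, 6.0, 6.0, 6.0]

index = splicing_index(first_exon_log2, gene_log2)
index_shift = case_control_difference(index, is_case)
```

In this toy case the index shifts by about −0.2 in cases: the exon does not follow the rest of the gene, which is the pattern described for the first GUK1 probeset.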

Validation of differential alternative splicing

Two examples of alternative splicing detected by the exon array, in which evidence for alternative exon usage coincided with that given in the RefSeq database, were studied by RNA-seq: GUK1, the top ranked alternatively spliced gene, and TMEM56, the third ranked gene. GUK1 has 3 alternative start sites, but the Affymetrix Exon array targeted only one of them (first probeset, 2383862, Figure 3, Supplementary Figure II). This probeset showed a strong signal for differential alternative splicing (p<4E-9). RNA-seq data showed that all three start sites are used. Viewing the individual gapped reads (Supplementary Figure II) in the neighborhood of the first two start sites, we see that the exon targeted by the probeset is more commonly used, but a small fraction of reads map to the earlier start site, confirming the possibility for alternate start site usage. This RNA-seq information provides support for the potential differential alternative splicing between CHD cases and controls in our study.

Figure 3.

Schematic of known alternative transcripts for GUK1 gene and TMEM56 gene.

a) GUK1 has five RefSeq identified transcripts (NM- numbers identify each), having three alternate start sites (1a, 1b, 1c), and alternately used exons 2 and 8. In addition, a distinct translation start site (coding regions are shown by taller boxes) appears in exon 1b, and an alternate translation termination site appears in exon 9. Thus, alternate splicing may change the sequence of the protein product. Affymetrix core probesets (grey triangles) were available for exons 1b, 2 through 9, but not for exons 1a or 1c. Microarray evidence for alternative splicing was strong for exon 1b. The alternate usage of exon 1b was confirmed in a separate RNA-seq study (see Supplementary Figure II for full details).

b) TMEM56 has three documented RefSeq transcripts with alternate first exons 1a, 1b and 1c. Exons 2 through 6 are common to all transcripts, and comprise the main part of the coding sequence. The third transcript skips exon 7, but includes exons 8 and 9, having an extended coding region. Again, alternate splicing can potentially change the sequence of the protein product. Affymetrix core probesets (grey triangles) were not available for exons 1a or 1c. The strongest evidence for alternative splicing came from the probeset at exon 1b. Alternative usage of this exon was again confirmed in a separate RNA-seq study (see Supplementary Figure III for full details).

The second example, TMEM56, also has three start sites, and again the first probeset, 2347744, which gives rise to the alternative splice signal, corresponds to the second start site (Figure 3, Supplementary Figure III). Unfortunately, the Affymetrix Exon Array does not provide probes for the first or third start sites. However, viewing the gapped reads for all 16 subjects reveals alternate usage of the second and third start sites, with 4 gapped reads mapping to the second site and 25 reads mapping to the third (Supplementary Figure III). The overall expression level is too low to conclude whether the utilization of each start site differs among individuals, but both sites (exons) are utilized. This provides further support for alternative usage of start sites of TMEM56 and our evidence that alternative splicing of this gene differs in CHD cases vs. controls.

Discussion

We identified and characterized transcriptomic signatures and pathways (Tables 3 and 4) associated with CHD based on gene expression analysis of whole blood. Our study is one of several attempts to identify transcriptomic biomarkers of CHD, which may be helpful in understanding the pathobiology of CHD, in identifying robust risk markers of individuals at high risk for CHD, and in prioritizing novel therapeutic targets for treating the disease.

Table 3.

Genes that were Detected across Multiple Analyses that pass the qPCR validation.

Transcript cluster ID | Gene Symbol | Description | Gene level* | Exon level | A.S.§ | Cluster | FC1# | Lit.**
2383859 GUK1 guanylate kinase 1 YY Y Y 21 1.12
2347732 TMEM56 transmembrane protein 56 YY Y Y 1.12
2787958 GYPB glycophorin B (MNS blood group) YY Y 21 1.12
3087501 ZDHHC2 zinc finger, DHHC-type containing 2 YY Y 21 1.1
3157660 TSTA3 tissue specific transplantation antigen P35B YY Y 6 1.11
3642654 HBM hemoglobin, mu YY Y 21 1.11 Y
3669059 GABARAPL2 GABA(A) receptor-associated protein-like 2 YY Y 21 1.08
3558375 GZMB granzyme B (granzyme 2, cytotoxic T-lymphocyte-associated serine esterase 1) YY Y 15 −1.14
2700828 SIAH2 seven in absentia homolog 2 (Drosophila) Y Y Y 6 1.05
3676165 HAGH hydroxyacylglutathione hydrolase Y Y Y 21 1.1
2451463 ADIPOR1 adiponectin receptor 1 Y Y 21 1.07
2737840 CISD2 CDGSH iron sulfur domain 2 Y Y 21 1.11
3090006 SLC25A37 solute carrier family 25, member 37 Y Y 21 1.06 Y
3657253 AHSP alpha hemoglobin stabilizing protein Y Y 21 1.17
3727712 PCTP phosphatidylcholine transfer protein Y Y 6 1.07
3759077 SLC25A39 solute carrier family 25, member 39 Y Y 21 1.11
3814978 CDC34 cell division cycle 34 homolog (S. cerevisiae) Y Y 21 1.09
3890597 RBM38 RNA binding motif protein 38 Y Y 21 1.09
2403301 RPA2 replication protein A2, 32kDa Y Y −1.05
3862108 CLC Charcot-Leyden crystal protein Y Y 1.14 Y
3886765 PI3 peptidase inhibitor 3, skin-derived Y Y 1.14
3907190 SLPI secretory leukocyte peptidase inhibitor Y Y 1.17
3360417 HBD hemoglobin, delta Y Y 21 1.07
3591327 CCNDBP1 cyclin D-type binding-protein 1 Y Y 6 1.05
2504328 GYPC glycophorin C (Gerbich blood group) Y 21 1.07
3142967 CA1 carbonic anhydrase I Y 21 1.12
3261255 DPCD deleted in primary ciliary dyskinesia homolog (mouse) Y 6 1.08
3642687 HBQ1 hemoglobin, theta 1 Y 6 1.05
3759006 SLC4A1 solute carrier family 4, anion exchanger, member 1 (erythrocyte membrane protein band 3, Diego blood group) Y 21 1.11
4009849 ALAS2 aminolevulinate, delta-, synthase 2 Y 21 1.11 Y
* "Y" if p < 0.01 in gene-level analysis model 1, "YY" if also FDR<0.5

Exon level: "Y" if at least one of the exons is at FDR<0.5 in exon-level analysis model 1

§ "Y" if alternatively spliced at FDR<0.5

Cluster #: Numbers 6, 15, and 21 indicate the cluster membership

# Fold change in model 1. A negative sign indicates overexpression in controls.

** Validated in the literature13,14,19–21.

Table 4.

Summary of Pathways Detected across Multiple Analyses

Pathway type Gene-level analysis Exon-level analysis Clustering analysis
Ubiquitination-related Y* Y Y
T cell-related Y
NK cell-related Y
B cell-related Y
Apoptosis-related Y Y
Immune system-related Y
Inflammatory-related Y
Red blood cell-related Y
* Only when intersected with genes previously associated in the literature

The pathways were detected using Gene Ontology or Gene Set Enrichment analyses, at cutoff level of p<0.001. The pathways were then grouped based on the keywords in the first column.

Although many of the effect sizes of differentially expressed genes were small, as found in previous studies14,21, three key findings emerged. First, many up-regulated genes fall into clusters strongly associated with CD71+ early erythrocyte cells. Second, a cluster of genes associated with NK-cells is down-regulated in CHD. Third, our results suggest that alternative splicing may play a role in CHD.

Many genes that were up-regulated in CHD cases fall into cluster 21 or cluster 6, both strongly associated with CD71+ early erythrocyte cells. Association of elevated reticulocyte count with pulmonary and cardiac disease has been reported26 and attributed to hypoxia-mediated erythropoietin production. Cluster 6 genes are also associated with the ubiquitination pathway (Supplementary Table VI). Gene SIAH2, in cluster 6, controls the ubiquitination of prolyl hydroxylase (PHD) 1 and PHD3 during mild hypoxic conditions, and is one regulator of hypoxia-inducible factor 1-alpha (HIF1-α)27. HIF1-α mediates hypoxia-induced up-regulation of GATA1 (also in cluster 6), which itself is essential to erythropoiesis28. Thus, CHD, or the ischemia associated with it, may up-regulate hematopoiesis as a compensatory mechanism.

Cluster 15 contains many NK- and T-cell specific genes such as GZMB and is down-regulated in CHD. Paradoxically, plasma levels of granzyme B have been seen to increase after myocardial infarction29,30. In any case, our observation may signal that perturbations of the innate immune system are involved in the pathogenesis of CHD or possibly are a result of its treatment. Other clusters also show interesting expression changes in cases vs. controls, but do not reach the same degree of statistical significance. For example, the 61 genes in cluster 20 include many markers of platelets and are consistently, but not significantly, down-regulated in CHD cases.

Our top gene findings suggest potentially important roles of ubiquitination and apoptosis pathways in CHD. SLC7A11 has been associated with ubiquitination31, may have widespread regulatory roles32,33, and may play a role in cell apoptosis34. SERPINB6, GUK1, HAGH, and USP28 also are associated with ubiquitination and may have widespread regulatory roles32,33. ZDHHC2 also may have pro-apoptotic activity35. To follow up on the differential expression analyses, qPCR validation was conducted and it confirmed several of the top genes from the microarray results, including TMEM56, ZDHHC2, and GUK1. In general, qPCR results were highly correlated with microarray expression and revealed greater fold changes, but weaker p-values than on microarray, partly due to the reduced sample size.

Our results of differential expression confirmed only a relatively small number of the genes reported in prior coronary disease expression studies13,14, including HBM, TMEM48, and PDGFD (Supplementary Table III). Of 14 genes reported to discriminate coronary artery stenosis from controls by Wingrove et al.14, we could confirm only one, and only with modest significance (p=0.04). Differences in phenotypes, in the biological samples (peripheral blood mononuclear cells vs. whole blood), and in the measurement platforms may account for the low degree of confirmation. For example, one gene, CSF2RA, was not measured on our platform. For 11 others, the direction of the effect we observed was zero or opposite to that previously reported. The gene association we did confirm, HK3, is specifically expressed in CD33+ myeloid cells and CD14+ monocytes22. It is partially transcriptionally regulated by hypoxia inducible factor signaling36 and thus may respond to the same signals that gave rise to up-regulation of the reticulocyte-specific cluster 21 and cluster 6 genes, although HK3 itself did not fall into those clusters. Three other genes reported by Wingrove (HBM, TFDP1, and KCTD15) had p-values<0.01 and the same effect direction in our results.

Exon-level results identified many of the same genes that emerged from gene level analyses and also pointed to apoptosis37–39 and ubiquitination40 in CHD. Apoptosis in the heart can be activated by cytokines, thereby increasing oxidative stress and DNA damage39. Ubiquitination is involved in many cellular processes. One of the systems that regulates ubiquitination is the ubiquitin-proteasome system (UPS)41. UPS also maintains dynamic equilibrium of proteins and prevents accumulation of damaged and misfolded proteins that may contribute to CHD40.

Our analysis identified 40 genes, including GUK1 and TMEM56, that may be alternatively spliced in relation to CHD status. Since alternative splicing analysis is prone to false positives42, especially when there are many predicted splice sites in a gene43, it is difficult to ascertain alternative splice events from statistical inference based on microarray data alone. To confirm and validate our results, we exploited the high-resolution RNA-seq technique, which can recognize individual splice-sites from the alignment of fragmentary sequences ("reads"). Validation of the alternative splicing hypothesis for two top genes was achieved. Alternative splicing of GUK1 produces 5 transcripts leading to 3 possible protein isoforms. Our analysis could distinguish the first transcript variant that produces only isoform b, from the second, third or fourth variants that produce isoforms b, c or a, respectively. Thus, the observed alternative splicing could potentially have functional significance. Similarly, TMEM56 was found to be differentially alternatively spliced in CHD cases vs. controls. Here we could distinguish between the second transcript variant of TMEM56 and a third, TMEM56-RWDD3, which represents naturally occurring read-through transcription between TMEM56 and the neighboring RWDD3 gene. The functional significance of the distinct protein products of these two variants is unclear; however, TMEM56 was recently reported to have significant differential allelic distribution and differential expression in a GWAS of hypertension44. These authors also reported that the gene has a cis-eQTL at SNP rs11165334. The specific genes and differential alternative splicing patterns that we identified in CHD cases vs. controls will require external confirmation.

Study strengths and weaknesses

To our knowledge, this study is the largest CHD gene expression study to date using an exon-specific microarray. We used an efficient case-control experimental design with meticulous ascertainment of cases and careful accounting for technical covariates affecting expression. Since differential gene expression signals for CHD are weak, careful experimental design and quality control are important to improve the detection of differentially expressed genes. To reduce laboratory effects, we located each case and its corresponding control within the same batch of 96 samples. While a few of the genes associated with CHD in previous studies were confirmed, the large majority were not. Inability to replicate gene associations with CHD may be due to multiple factors including differences in the biological tissue sampled, RNA processing methods, microarray design, experimental design, and phenotypes.

Our study analyzed prevalent CHD and not incident events. Thus the observed gene signatures may be downstream of the disease process and not causally linked to its occurrence. Similarly, lipid, hypertension, and other treatment effects on expression may be powerful, and not fully accounted for by multivariate adjustment. Whole blood, the RNA source used in this experiment, comprises many different blood cell types, each of which may have a unique signature with respect to CHD. Such admixture of disparate signatures may obscure the detection of signals from any single cell type. Adjustment for CHD risk factors in our analyses (Model 2) diminished the significance of many of the gene signals, suggesting that these signals may be partially mediated by established risk factors. Yet, the inclusion of CHD risk factors into the model did not significantly alter the rank of the top genes, suggesting that other factors, in conjunction with these risk factors, may drive the expression signatures we identified. Far larger sample sizes and collection of samples before the occurrence of CHD are needed to corroborate our findings and identify additional transcriptomic signals.

Our results are consistent with the hypothesis that CHD exhibits complex patterns of gene expression changes that are influenced by both environmental and CHD risk factors45. Because of the existence of such complex patterns, elucidation of the molecular mechanisms of CHD will require an integrative, systems approach. Such an approach is discussed in our companion paper46.

Conclusions

We provide the results of complementary approaches that identified several transcriptomic signatures of CHD. We used hierarchical cluster analysis and found convergence of results with those from single gene differential expression. Additionally, our results suggest that alternative splicing differs between CHD cases and controls, and we validated splice variation via RNA-seq in a separate study. Although several genes from prior studies were confirmed, the majority did not replicate. This underscores the need for further studies with larger sample sizes, careful experimental designs, and rigorous accounting for CHD risk factors and post-CHD treatment interventions. Such studies may contribute toward our understanding of CHD causal mechanisms and ultimately toward new approaches to the treatment and prevention of this disease.

Material and methods

Study participants

Of the 5,124 participants recruited into the FHS offspring cohort1, 225 individuals who attended the 8th offspring cohort examination cycle had a diagnosis of atherosclerotic CVD, including 193 with CHD (myocardial infarction [MI], coronary artery bypass surgery [CABG], or percutaneous transluminal coronary angioplasty [PTCA]) and 32 with atherothrombotic stroke. These 225 CVD cases were selected along with 225 controls who were matched to cases on sex, age, and, when possible, treatment with lipid-lowering medication. After quality control procedures (described below), 188 CHD and 29 stroke case-control pairs remained. Principal component analysis (PCA) revealed that the gene signatures of atherothrombotic stroke were different from those of CHD; for this reason, this investigation focused on the 188 CHD case-control pairs. Protocols for participant examinations and collection of genetic materials were approved by the Boston Medical Center Institutional Review Board. All participants gave informed consent.

RNA preparation

Fasting peripheral whole blood samples from cases and controls were collected in PAXgene™ tubes (PreAnalytiX, Hombrechtikon, Switzerland) during FHS offspring cohort examination 8 (2005–2008). These samples were processed for use in both microarray and quantitative real-time polymerase chain reaction (qPCR).

RNA isolation

Fasting peripheral whole blood samples (2.5 ml) from cases and controls were collected in PAXgene™ tubes (PreAnalytiX, Hombrechtikon, Switzerland) during the FHS offspring cohort examination 8 (2005–2008), incubated at room temperature for 4 h for complete lysis of blood cells, and then stored at −80°C. Total RNA was isolated from frozen PAXgene blood tubes by Asuragen, Inc., according to the company’s standard operating procedures for automated isolation of RNA from 96 samples in a single batch on a KingFisher® 96 robot. Briefly, tubes were allowed to thaw at room temperature. After centrifugation and washing to collect white blood cell pellets, cells were lysed in guanidinium-containing buffer. Organic extraction was performed prior to adding binding buffer and magnetic beads in preparation for the KingFisher run. The purity and quantity of total RNA samples were determined by absorbance readings at 260 and 280 nm using a NanoDrop ND-1000 UV spectrophotometer. The integrity of total RNA was qualified by Agilent Bioanalyzer 2100 microfluidic electrophoresis, using the Nano Assay and the Caliper LabChip system.

Preparation of cDNA from RNA

RNA samples (50 ng) were amplified using the WT-Ovation Pico RNA Amplification System (NuGEN, San Carlos, CA) as recommended by the manufacturer, in an automated manner using the GeneChip Array Station (GCAS). In brief, first-strand cDNA was prepared using a unique first-strand DNA/RNA chimeric primer mix and reverse transcriptase. In the second step, double-stranded cDNA with a DNA/RNA heteroduplex was generated, which served as the substrate for single primer isothermal amplification (SPIA), a linear isothermal DNA amplification process developed by NuGEN. In the third step, amplified DNA along with RNA was treated with RNase H to degrade the RNA in the DNA/RNA heteroduplex at the 5´ end of the first cDNA strand, which then served as the initiation site for the next round of cDNA synthesis. The process of SPIA DNA/RNA primer binding, DNA replication, strand displacement and RNA cleavage was repeated, resulting in rapid accumulation of microgram amounts of SPIA cDNA. An aliquot of the SPIA cDNA was used for qPCR analysis.

Microarray platform annotation

Annotations of each probeset on the Human Exon 1.0 ST Array (Affymetrix, Inc., Santa Clara, CA) chip were obtained from the manufacturer2, in the files HuEx-1_0-st-v2.na31.hg19.transcript.csv.zip and HuEx-1_0-st-v2.na31.hg19.probeset.csv.zip. Only "core" level probesets and transcript clusters that mapped to a RefSeq transcript were used in the analysis. In total, 287,329 probesets representing 17,873 distinct genes were available for analysis, based on NetAffx version 31 (ref 2).

Microarray processing

Three micrograms of the SPIA cDNA were processed with the WT-Ovation Exon Module in GCAS to produce sense strand ST-cDNA following the manufacturer’s (NuGEN) procedure. Five micrograms of ST-cDNA were fragmented and labeled with the FL-Ovation™ cDNA Biotin Module using a proprietary two-step fragmentation and labeling process. The first step is a combined chemical and enzymatic fragmentation process that yields single-stranded cDNA products in the 50- to 100-base range. In the second step, this fragmented product is labeled via enzymatic attachment of a biotin-labeled nucleotide to the 3′-hydroxyl end of the fragmented cDNA generated in the first step. Hybridization, washing and laser scanning of the microarrays were performed according to the manufacturer’s protocol. Hybridization was performed at 45°C overnight, followed by washing and staining using the FS450 fluidics station. Scanning was carried out using the Affymetrix 7G GCS3000 scanner.

Case-control study design and assay layout

Following the selection of CHD cases and paired controls and evaluation of the RNA quality of the available samples, a statistical design was developed that randomized the location of each pair to one of 5 batches of 90 samples each. The pairs were located in adjacent positions within the 96-well racks in a pattern that minimized possible row, column, and edge effects. The laboratory personnel were blinded to the identity of cases and controls in an effort to further minimize potential bias. Quality control pools occupied the remaining 6 positions in each rack.
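The batch layout described above can be illustrated with a minimal sketch. It assumes 225 pairs split evenly over 5 batches (45 pairs, i.e. 90 samples, per batch, with 6 rack positions left for quality control pools). The function name, the seed, and the use of simple shuffling are illustrative assumptions; the statistical design actually used also balanced row, column, and edge effects, which this sketch does not attempt.

```python
import random

def assign_pairs_to_batches(n_pairs=225, n_batches=5, seed=0):
    """Randomly assign case-control pairs to batches, keeping pair members adjacent.

    Hypothetical helper: returns {batch: [(pair_id, role), ...]} with 90 sample
    positions per batch; the 6 remaining rack positions are left for QC pools.
    """
    rng = random.Random(seed)
    pair_ids = list(range(n_pairs))
    rng.shuffle(pair_ids)                      # random order of pairs
    per_batch = n_pairs // n_batches           # 45 pairs -> 90 samples
    layout = {}
    for b in range(n_batches):
        positions = []
        for pid in pair_ids[b * per_batch:(b + 1) * per_batch]:
            members = [(pid, "case"), (pid, "control")]
            rng.shuffle(members)               # randomize which member comes first
            positions.extend(members)          # pair members occupy adjacent positions
        layout[b] = positions
    return layout

layout = assign_pairs_to_batches()
```

Keeping both members of a pair in the same batch means that batch effects cancel in the within-pair difference, which is the quantity the paired analysis uses.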

Expression data quality assessment

The robust multi-chip average (RMA) method3 was applied to normalize expression values for the 450 samples, using the Affymetrix Power Tools (APT)4 version 1.12.0. The following quality control (QC) metrics were used to determine the quality of the hybridized samples: all_probeset_mean, all_probeset_rle_mean, pm_mean, and pos_vs_neg_auc. Three samples and their pairs were excluded due to outlying values for all_probeset_rle_mean. Chromosome Y-linked gene expression was used to check for agreement with reported sex. Agreement of expression levels at 395 genes with a panel of the 395 most associated expression single nucleotide polymorphisms (eSNPs) was used to further confirm the identity of each sample. Five additional samples failed this test and were removed along with their pairs. The remaining 434 samples with satisfactory results constituted the study samples and were again normalized with RMA, retaining only core-level probesets.

Technical covariate adjustment

The APT program4 provides a list of 75 QC parameters intended to gauge the quality of each chip. RNA quality index (RNA integrity number, RIN), RNA quantity, and RNA yield values were provided by the laboratory. In addition, as RNA was isolated in a separate laboratory using an automated method, RNA isolation batch and hybridization batch were recorded for each sample.

To identify influential technical covariates, the principal components (PCs) were extracted on centered, but unscaled gene-level RMA data. Each PC was regressed on each technical covariate, provided by either the APT program or the laboratory, to determine the percentage variance of that PC explained. The variance was summed over all PCs to determine the percentage of total variance explained by that covariate. A multivariate stepwise regression was used to select, in descending order, the most influential covariates. The stepwise regression terminated when no remaining covariate could explain more than 1% of the total variance. The analysis was repeated on the exon-level data, and a "best" minimal set of covariates was determined that could explain the largest share of variance for both levels. The RMA normalized data were then adjusted for this set of technical covariates, using the "lmer" function of R5, where batch was designated as a random effect and the other factors were treated as fixed effects.

Ten technical covariates fulfilled our selection criteria. Together, they explained 42% of total variation in gene expression (Supplementary Table 1). These technical covariates were used to adjust the data in all analyses.
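The variance screening step for a single covariate can be sketched as follows. This is a minimal illustration, assuming a numeric, non-constant covariate and a samples-by-genes matrix. The function name is hypothetical, NumPy is used in place of the R tooling named in the text, and the stepwise selection and mixed-model adjustment are not shown. Categorical covariates such as batch would need dummy coding and a multiple regression per principal component.

```python
import numpy as np

def variance_explained_by_covariate(expr, covariate):
    """Fraction of total expression variance explained by one numeric covariate.

    expr: samples x genes matrix (e.g., RMA values); covariate: length-n vector.
    """
    centered = expr - expr.mean(axis=0)               # center, do not scale
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                                    # PC scores, one column per PC
    pc_var = s ** 2                                   # proportional to variance of each PC
    x = covariate - covariate.mean()
    explained = 0.0
    for k in range(scores.shape[1]):
        if pc_var[k] <= 1e-12 * pc_var[0]:            # skip numerically empty PCs
            continue
        r = np.dot(x, scores[:, k]) / np.sqrt(np.dot(x, x) * pc_var[k])
        explained += (r ** 2) * pc_var[k]             # R^2 of the PC times PC variance
    return explained / pc_var.sum()
```

Weighting each component's R² by that component's variance is what turns per-component fits into a share of total variance, so a covariate that explains a minor component well still contributes little overall.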

Microarray platform

The Human Exon 1.0 ST Array (Affymetrix, Inc., Santa Clara, CA) was used to measure RNA expression levels. This array consists of over 6 million probes grouped into about 1.2 million probesets, targeted to the majority of known exons in the human genome. Both gene-level (transcript cluster level) and exon-level (probeset-level) analyses were conducted. Each CHD case and control pair was located in adjacent positions to minimize possible row, column, and edge effects.

Quantitative real-time polymerase chain reaction (qPCR) processing

Based on statistical analysis of the case-control study, 93 genes were selected for analysis by qPCR in all 225 atherosclerotic CVD cases and 225 matched controls. Since several statistical models were used, genes were selected if they met any one of several criteria: 1) p-value <0.0005 in any analysis, 2) p-value <0.001 and a fold change consistently >1.1 within the three case-control subgroups of MI, CABG, and PTCA, or 3) previous identification in the literature. In total, 114 genes were nominated, and availability of adequate PCR primers determined the final selection for the 93 positions in the assay. Together with 3 housekeeping genes (GAPDH, ACTB, and B2M), the 93 candidate genes filled the design capacity of 96 genes for one Fluidigm DynamicArray (Fluidigm, San Francisco, CA). GAPDH, ACTB, and B2M were chosen based on previous analyses showing high correlation of these genes in circulating cells from 1,850 participants from the FHS offspring cohort6.

Prior to the PCR reactions, the SPIA cDNA samples were pre-amplified using TaqMan PreAmp Master Mix (Applied Biosystems, Foster City, CA). SPIA cDNA samples were diluted 1:5. The PreAmp Master Mix and 0.2× TaqMan assays were added to each cDNA sample, and the pre-amplification protocol was as follows: 95°C for 10 min, then 14 cycles of 95°C for 15 sec and 60°C for 4 min. The pre-amplified cDNA samples were kept overnight at 4°C or stored at −20°C for further analysis.

The qPCR reactions were performed with a high-throughput RT-PCR instrument (BioMark; Fluidigm, San Francisco, CA). Preamplified cDNA samples were mixed with TaqMan Universal Master Mix (Applied Biosystems, Foster City, CA) and Sample Loading Reagent (Fluidigm, San Francisco, CA) and pipetted into the sample inlets of the DynamicArray 96.96 chips (Fluidigm, San Francisco, CA). TaqMan Gene Expression Assays (Applied Biosystems, Foster City, CA) were diluted 1:2 with Assay Loading Reagent (Fluidigm, San Francisco, CA) and then pipetted into the assay inlets of the DynamicArray 96.96. The DynamicArray was placed into the IFC Controller HX to distribute the assays and samples into the reaction wells of the chip through microfluidic delivery. All qPCR reactions, which were performed in the BioMark Real-Time PCR system, used the following protocol: 10 min at 95°C, followed by 30 cycles of 15 sec at 95°C and 1 min at 60°C.

Statistical models

A standard paired t-test was applied to the pre-defined case-control pairs. To account for family structure within the cohort, the R package "pedigreemm"7 was used, after modification to allow for only one observation per participant. Since the “lmer” function used by “pedigreemm” does not provide p-values, the p-values were computed from the z-statistics under an assumption of normality. The p-values were then adjusted to control the false discovery rate (FDR)8. Details regarding statistical models and software modifications can be found in the Supplementary Materials. In addition to a model that accounted for age and sex through matched pairing (Model 1), a second model incorporated additional risk factors for CHD, including hypertension, diabetes, cigarette smoking, and the total cholesterol/HDL ratio (Model 2). Detected-above-background (DABG) filtering was performed on exon-level data. The filter removed 74,106 probesets, leaving 209,699 exon probesets for analysis.
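The conversion from z-statistics to p-values and the subsequent FDR adjustment can be sketched as follows. This is a minimal illustration in plain Python rather than the R code used in the study. The function names and the example effect sizes are made up, and the adjustment shown is the standard Benjamini-Hochberg step-up procedure.

```python
import math

def two_sided_p_from_z(z):
    """Two-sided p-value for a z-statistic under a standard normal reference."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Example: effect estimates and standard errors are made-up numbers.
effects = [0.30, -0.05, 0.12, -0.40]
std_errors = [0.10, 0.10, 0.10, 0.10]
z_scores = [b / se for b, se in zip(effects, std_errors)]
p_values = [two_sided_p_from_z(z) for z in z_scores]
fdr = benjamini_hochberg(p_values)
```

Because the normal approximation yields continuous p-values, the ranking that the FDR step depends on is not distorted by ties among coarse, simulation-based p-values.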

Cluster analysis

To further evaluate the patterns of expression within the data, a hierarchical clustering of all 17,873 genes was performed using Ward's method9, after mean-centering the data for each gene and after adjusting for technical covariates. The genes were then partitioned into 21 clusters by cutting the dendrogram at the corresponding points. The clusters included between 2 and 3,689 genes. The two largest clusters consisted of genes showing almost no pattern of expression, i.e., were nearly constant in expression level. For each cluster, a representative "metagene" was computed by taking the average expression of genes within the cluster across participants. These metagenes were analyzed with the paired t-test described above. Theoretically, this approach should allow for better detection of weaker effects involving an entire cluster or pathway, as it averages over many genes and thus reduces the effect of noise in any single gene. As the clustering is performed without regard to case-control status, the test is a valid comparison of case versus control for each metagene. P-values were adjusted to give FDR values, as above, but required a less severe adjustment since only 21 clusters were considered. Cluster metagenes were selected if FDR was <0.5. Clusters were also tested for enrichment of differentially expressed genes (at p<0.01). Enrichment was tested with Fisher's exact test, and clusters were identified when Bonferroni-corrected p-value was <0.0001, adjusting for testing 21 clusters.
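The clustering and metagene steps can be sketched as follows. This is a minimal illustration using SciPy, not the software used in the study. The function names are hypothetical, and the input is assumed to be a genes-by-participants matrix that has already been adjusted for technical covariates. For the full set of 17,873 genes the pairwise distance matrix is large, so memory use should be checked before running it at that scale.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import ttest_rel

def cluster_metagenes(expr, n_clusters):
    """expr: genes x participants matrix, already adjusted for technical covariates."""
    centered = expr - expr.mean(axis=1, keepdims=True)      # mean-center each gene
    tree = linkage(centered, method="ward")                  # Ward's hierarchical clustering
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Metagene: average over the genes in a cluster, one value per participant.
    metagenes = {c: centered[labels == c].mean(axis=0) for c in np.unique(labels)}
    return labels, metagenes

def paired_test(metagene, case_idx, control_idx):
    """Paired t-test; case_idx[i] and control_idx[i] index the two members of pair i."""
    return ttest_rel(metagene[case_idx], metagene[control_idx])
```

Mean-centering shifts each metagene by a constant across participants, so it leaves the within-pair differences, and therefore the paired t-test, unchanged.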

Analysis of qPCR data

The qPCR cycle threshold (CT) values were normalized with two of the housekeeping genes, GAPDH and B2M, to yield delta CT values. The third housekeeping gene, ACTB, did not correlate with the other two and was excluded. Delta CT values greater than 25 were considered to be missing. Thirteen case-control pairs with more than 30 missing values were excluded from analysis. The correlation between the delta CT values and the corresponding microarray expression values was computed, without regard to case-control status, using Spearman’s method. The data were then analyzed using paired t-tests. For validating differential expression, genes with two-tailed p-value <0.05 (Model 1), Spearman’s correlation >0.5, and with agreement on effect sign were considered to be validated.
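The normalization and validation rules above can be sketched as follows. This is a minimal illustration. The text does not state how the two housekeeping genes were combined, so the simple average used here is an assumption, and the function names are hypothetical.

```python
def delta_ct(ct_gene, ct_gapdh, ct_b2m, max_delta=25.0):
    """Delta CT relative to the mean of two housekeeping genes; None if missing."""
    if ct_gene is None or ct_gapdh is None or ct_b2m is None:
        return None
    value = ct_gene - (ct_gapdh + ct_b2m) / 2.0   # assumption: simple average of the two references
    return None if value > max_delta else value   # values above 25 are treated as missing

def is_validated(p_value, spearman_rho, same_sign):
    """Validation rule from the text: p < 0.05, correlation > 0.5, matching effect sign."""
    return p_value < 0.05 and spearman_rho > 0.5 and same_sign
```

For example, `delta_ct(28.0, 20.0, 22.0)` gives 7.0, whereas `delta_ct(50.0, 20.0, 22.0)` is treated as missing because the difference of 29 exceeds the cutoff.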

Alternative Splicing Analysis and Validation

Affymetrix exon-level probeset analysis was used as input to the alternative splicing ExonANOVA model10. This model calculates the significance of alternative splicing based on the agreement of fold-change estimates for the exons in a particular gene11. The model calculates the exon-by-case interaction effect, a measure of how far a particular exon's fold-change estimate diverges from that of the full gene. The interaction effect divided by its standard error is termed the "splicing statistic". Absolute values of the interaction effect greater than 3 are considered highly significant, once the significance for the overall gene has been determined by the model.
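The splicing statistic defined above can be sketched as follows. This is a minimal illustration that takes the interaction effects and their standard errors as given; it does not fit the ExonANOVA model. The function name is hypothetical, and the threshold of 3 is assumed here to apply to the standardized statistic, which the text leaves somewhat ambiguous.

```python
def splicing_statistics(interaction_effects, standard_errors, threshold=3.0):
    """Return (statistic, flagged) per exon, where statistic = effect / standard error."""
    results = []
    for effect, se in zip(interaction_effects, standard_errors):
        stat = effect / se
        results.append((stat, abs(stat) > threshold))
    return results

# Made-up numbers for three exons of one gene.
example = splicing_statistics([0.05, -0.40, 0.10], [0.10, 0.10, 0.10])
```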

Validation of alternative splicing via RNA-Seq was attempted in sixteen new samples collected as part of a separate study of unrelated men of European descent between the ages of 52 and 66 years. These subjects were participants in the ClinSeq project™ 12 and were selected for RNA-Seq. This study was approved by the NHGRI Institutional Review Board and all subjects gave written informed consent for whole genome sequencing. From each subject, 2.5 ml of peripheral blood was drawn into PAXgene RNA tubes (PreAnalytiX, Switzerland) during visits to the NIH Clinical Center during 2007 and 2008. RNA extraction was done using the PAXgene Blood RNA Kit IVD (Qiagen, CA). RNA-Seq libraries were constructed after extracting poly(A) RNA from one microgram of total RNA and sequenced on Illumina GAIIx sequencers. Two lanes of 76 bp reads were generated for each library, corresponding to a median sequencing depth of ~60 million reads per subject. Reads were mapped to the human genome using the TopHat RNA-Seq aligner13 using default settings, and the output BAM files were loaded into a custom session on the UCSC Genome Browser for data viewing.

Gene ontology (GO) enrichment analysis

We performed GO enrichment analysis using GOrilla14 on statistically significant genes from each of the analyses. This method determines whether the number of differentially expressed genes having a particular GO assignment is significantly greater than would be expected by chance, given the total number of genes, the total number of genes having that assignment, and the overall number of statistically significant genes. We ran GOrilla on two unranked lists and used genes annotated in the NetAffx core-level annotation version 31 (ref 2) as the background set of genes. GO categories with p-values <0.001 were considered significant. Additionally, since co-regulated genes are typically consistently up- or down-regulated, we also performed GO analysis on up- and down-regulated genes separately to investigate functional characteristics of genes in each group.
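The enrichment test described above corresponds to a hypergeometric tail probability, which can be sketched as follows. This is a minimal illustration in plain Python and is not GOrilla's implementation. The function name and the example counts are hypothetical, apart from the background of 17,873 genes mentioned in the text.

```python
from math import comb

def hypergeom_enrichment_p(N, B, n, b):
    """P(X >= b) when drawing n genes from N, of which B carry the GO term.

    N: background genes; B: background genes with the term;
    n: significant genes; b: significant genes with the term.
    """
    upper = min(n, B)
    total = comb(N, n)
    tail = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return tail / total

# Made-up example: 5 of 35 significant genes carry a term annotated to 200 of 17,873 genes.
p_example = hypergeom_enrichment_p(17873, 200, 35, 5)
```

The four arguments are the same four quantities the text lists, which is why the choice of background gene set directly affects the resulting p-values.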

Gene set enrichment analysis (GSEA)

The Molecular Signatures Database v3.0 (MSigDB)15 was used to compute the significance of overlap of our gene lists with curated gene sets, including GO gene sets. Further, this tool allows viewing of the expression profiles of the Novartis GeneAtlas (Human Tissue Compendium, ref 32)16.

Software

All statistical analyses were performed both at the exon/probeset level and at the gene/transcript cluster level using R version 2.15.1 or JMP version 10 (SAS, Cary, NC) and the MSCL Analyst's Toolbox17. The false discovery rate (FDR) was calculated with Benjamini and Hochberg’s method8.

Details on methods and software

Mixed model method and software

“pedigreemm”7 and “lmer”18 are well-known linear mixed model packages in R. The pedigreemm package uses the lmer package in the back end; other than pedigreemm's adjustments for correlation structure, the two packages are essentially the same. Both output only the effect size, standard error, and t-value, but not the p-value. The current suggestion is to use the “mcmcsamp” method available in the lmer package. However, mcmcsamp is computationally very expensive and produces p-values at low resolution, which can destabilize the computation of the false discovery rate (FDR). Hence, we relied on the asymptotic result that, as the number of samples approaches infinity, the effect size divided by its standard error (the z-score) is normally distributed with mean 0 and variance 1 under the null hypothesis.

The pedigreemm package does not allow only one observation per sample, so a modification is required to remove this limitation; it involves only the simple removal of a test. A larger modification is needed to allow the newer, beta version of lmer to be used in conjunction with pedigreemm. The newer lmer is still being finalized and, in our case, provides a better convergence figure (i.e., deviance number) than the lmer that is currently available on the Comprehensive R Archive Network (CRAN), the standard repository of R packages. The patch involved renaming variables, using new data structures, and updating internal calls for the new lmer. We used the lmer beta Subversion Release 1784, and the corresponding patch is available at the following URL: http://r-forge.r-project.org/tracker/index.php?func=detail&aid=1928&group_id=60&atid=300

Significance.

Physicians and public health experts have long sought accessible (blood-based) biomarkers of cardiovascular disease. Such biomarkers would be useful in improving the accuracy of risk scores and might become essential in the clinic setting for distinguishing innocuous chest pain from imminent myocardial infarction. Our study, based on one of the largest community-based cohorts, shows promise that there indeed are transcriptomic profiles in blood that help distinguish coronary heart disease (CHD) cases from controls, although little consistency was seen between our results and earlier published studies. Our study also shows that the problem of identifying specific biomarkers is complex and that no single gene is likely to be diagnostic in isolation. We have shown that sets of genes related to two cellular components of blood are likely to differ in CHD cases, suggesting the need for further study of the underlying mechanisms. Integrating genetic data may also provide a path towards identifying biomarker signatures of CHD.

Acknowledgements

D. L. and P. J. M. designed, directed, and supervised the experiment. D. L. was responsible for funding of the project. R. J. and P. J. M. drafted the manuscript. P. J. M., D. L., H. T., A. D. J., C. J. O., J. E. F., and K. T. revised and edited the manuscript. R. J., S. Y., and P. J. M. performed the statistical analysis. N. R., P. L., R. W., and K. A. W. collected the microarray data. J. E. F. and K. T. collected the RT-qPCR data. S. S. collected the RNA-Seq data. P. C. organized the experiment material and data exchange. All authors have read and approved the final version of the manuscript.

Sources of Funding

From the Center for Population Studies and the Intramural Research Programs of the National Heart, Lung, and Blood Institute, and of the Center for Information Technology of the National Institutes of Health, and from Boston University School of Medicine. The Framingham Heart Study is supported by National Heart, Lung, and Blood Institute contract N01-HC-25195.

Disclosures

No conflicts of interest, financial or otherwise, are declared by the authors.

Data availability

All microarray data used herein are available online in dbGaP (http://www.ncbi.nlm.nih.gov/gap) accession number phs000007.

References

  • 1. Keenan NL, Shaw KM. Coronary heart disease and stroke deaths - United States, 2006. MMWR Surveill Summ. 2011;60(Suppl):62–66.
  • 2. Lozano R, Naghavi M, Foreman K, et al. Global and regional mortality from 235 causes of death for 20 age groups in 1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet. 2012;380:2095–2128. doi: 10.1016/S0140-6736(12)61728-0.
  • 3. Roger VL, Go AS, Lloyd-Jones DM, et al. Heart disease and stroke statistics--2012 update: a report from the American Heart Association. Circulation. 2012;125:e2–e220. doi: 10.1161/CIR.0b013e31823ac046.
  • 4. Heidenreich PA, Trogdon JG, Khavjou OA, et al. Forecasting the future of cardiovascular disease in the United States: a policy statement from the American Heart Association. Circulation. 2011;123:933–944. doi: 10.1161/CIR.0b013e31820a55f5.
  • 5. Zeller T, Blankenberg S, Diemert P. Genomewide Association Studies in Cardiovascular Disease--An Update 2011. Clinical Chemistry. 2011;58:92–103. doi: 10.1373/clinchem.2011.170431.
  • 6. Kullo IJ, Ding K. Mechanisms of disease: The genetic basis of coronary heart disease. Nat Clin Pract Cardiovasc Med. 2007;4:558–569. doi: 10.1038/ncpcardio0982.
  • 7. Preuss M, König IR, Thompson JR, et al. Design of the Coronary ARtery DIsease Genome-Wide Replication And Meta-Analysis (CARDIoGRAM) Study: A Genome-wide association meta-analysis involving more than 22 000 cases and 60 000 controls. Circ Cardiovasc Genet. 2010;3:475–483. doi: 10.1161/CIRCGENETICS.109.899443.
  • 8. Cupples LA, Arruda HT, Benjamin EJ, et al. The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports. BMC Med. Genet. 2007;8(Suppl 1):S1. doi: 10.1186/1471-2350-8-S1-S1.
  • 9. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678. doi: 10.1038/nature05911.
  • 10. Ripatti S, Tikkanen E, Orho-Melander M, et al. A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses. Lancet. 2010;376:1393–1400. doi: 10.1016/S0140-6736(10)61267-6.
  • 11. Coronary Artery Disease (C4D) Genetics Consortium. A genome-wide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nat. Genet. 2011;43:339–344. doi: 10.1038/ng.782.
  • 12. Schunkert H, König IR, Kathiresan S, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 2011;43:333–338. doi: 10.1038/ng.784.
  • 13. Sinnaeve PR, Donahue MP, Grass P, Seo D, Vonderscher J, Chibout S-D, Kraus WE, Sketch M, Jr, Nelson C, Ginsburg GS, Goldschmidt-Clermont PJ, Granger CB. Gene expression patterns in peripheral blood correlate with the extent of coronary artery disease. PLoS ONE. 2009;4:e7037. doi: 10.1371/journal.pone.0007037.
  • 14. Wingrove JA, Daniels SE, Sehnert AJ, Tingley W, Elashoff MR, Rosenberg S, Buellesfeld L, Grube E, Newby LK, Ginsburg GS, Kraus WE. Correlation of peripheral-blood gene expression with the extent of coronary artery stenosis. Circ Cardiovasc Genet. 2008;1:31–38. doi: 10.1161/CIRCGENETICS.108.782730.
  • 15. Szmit S, Jank M, Maciejewski H, Grabowski M, Glowczynska R, Majewska A, Filipiak KJ, Motyl T, Opolski G. Gene expression profiling in peripheral blood nuclear cells in patients with refractory ischaemic end-stage heart failure. J. Appl. Genet. 2010;51:353–368. doi: 10.1007/BF03208866.
  • 16. Ioannidis JPA. Why Most Published Research Findings Are False. PLoS Medicine. 2005;2:e124. doi: 10.1371/journal.pmed.0020124.
  • 17. Kannel WB, Dawber TR, Kagan A, Revotskie N, Stokes J. Factors of risk in the development of coronary heart disease---six-year follow up experience: The Framingham Heart Study. Ann. Intern. Med. 1961;55:33–50. doi: 10.7326/0003-4819-55-1-33.
  • 18. Joehanes R, Johnson AD, Barb JJ, Raghavachari N, Liu P, Woodhouse KA, O’Donnell CJ, Munson PJ, Levy D. Gene expression analysis of whole blood, peripheral blood mononuclear cells, and lymphoblastoid cell lines from the Framingham Heart Study. Physiol. Genomics. 2012;44:59–75. doi: 10.1152/physiolgenomics.00130.2011.
  • 19. Rosenberg S, Elashoff MR, Beineke P, et al. Multicenter validation of the diagnostic accuracy of a blood-based gene expression test for assessing obstructive coronary artery disease in nondiabetic patients. Ann. Intern. Med. 2010;153:425–434. doi: 10.7326/0003-4819-153-7-201010050-00005.
  • 20. Elashoff MR, Wingrove JA, Beineke P, et al. Development of a Blood-based Gene Expression Algorithm for Assessment of Obstructive Coronary Artery Disease in Non-Diabetic Patients. BMC Med Genomics. 2011;4:26. doi: 10.1186/1755-8794-4-26.
  • 21. Taurino C, Miller WH, McBride MW, McClure JD, Khanin R, Moreno MU, Dymott JA, Delles C, Dominiczak AF. Gene expression profiling in whole blood of patients with coronary artery disease. Clin. Sci. 2010;119:335–343. doi: 10.1042/CS20100043.
  • 22. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. U.S.A. 2004;101:6062–6067. doi: 10.1073/pnas.0400782101.
  • 23. Skjelland M, Michelsen AE, Krohg-Sørensen K, Tennøe B, Dahl A, Bakke S, Brosstad F, Damås JK, Russell D, Halvorsen B, Aukrust P. Plasma levels of granzyme B are increased in patients with lipid-rich carotid plaques as determined by echogenicity. Atherosclerosis. 2007;195:e142–e146. doi: 10.1016/j.atherosclerosis.2007.05.001.
  • 24. Raghavachari N, Xu X, Harris A, Villagra J, Logun C, Barb J, Solomon MA, Suffredini AF, Danner RL, Kato G, Munson PJ, Morris SM, Jr, Gladwin MT. Amplified expression profiling of platelet transcriptome reveals changes in arginine metabolic pathways in patients with sickle cell disease. Circulation. 2007;115:1551–1562. doi: 10.1161/CIRCULATIONAHA.106.658641.
  • 25. Pruitt KD, Tatusova T, Brown GR, Maglott DR. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Research. 2011;40:D130–D135. doi: 10.1093/nar/gkr1079.
  • 26. Kendall RG, Mellors I, Hardy J, Mardle B. Patients with pulmonary and cardiac disease show an elevated proportion of immature reticulocytes. Clin Lab Haematol. 2001;23:27–31. doi: 10.1046/j.1365-2257.2001.00353.x.
  • 27. Nakayama K, Frew IJ, Hagensen M, Skals M, Habelhah H, Bhoumik A, Kadoya T, Erdjument-Bromage H, Tempst P, Frappell PB, Bowtell DD, Ronai Z. Siah2 regulates stability of prolyl-hydroxylases, controls HIF1alpha abundance, and modulates physiological responses to hypoxia. Cell. 2004;117:941–952. doi: 10.1016/j.cell.2004.06.001.
  • 28. Zhang F-L, Shen G-M, Liu X-L, Wang F, Zhao Y-Z, Zhang J-W. Hypoxia-inducible factor 1-mediated human GATA1 induction promotes erythroid differentiation under hypoxic conditions. Journal of Cellular and Molecular Medicine. 2012;16:1889–1899. doi: 10.1111/j.1582-4934.2011.01484.x.
  • 29.Kondo H, Hojo Y, Tsuru R, Nishimura Y, Shimizu H, Takahashi N, Hirose M, Ikemoto T, Ohya K-I, Katsuki T, Yashiro T, Shimada K. Elevation of plasma granzyme B levels after acute myocardial infarction. Circ. J. 2009;73:503–507. doi: 10.1253/circj.cj-08-0668. [DOI] [PubMed] [Google Scholar]
  • 30.Tsuru R, Kondo H, Hojo Y, Gama M, Mizuno O, Katsuki T, Shimada K, Kikuchi M, Yashiro T. Increased granzyme B production from peripheral blood mononuclear cells in patients with acute coronary syndrome. Heart. 2008;94:305–310. doi: 10.1136/hrt.2006.110023. [DOI] [PubMed] [Google Scholar]
  • 31.Lee KA, Hammerle LP, Andrews PS, Stokes MP, Mustelin T, Silva JC, Black RA, Doedens JR. Ubiquitin Ligase Substrate Identification through Quantitative Proteomics at Both the Protein and Peptide Levels. Journal of Biological Chemistry. 2011;286:41530–41538. doi: 10.1074/jbc.M111.248856. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Wagner SA, Beli P, Weinert BT, Nielsen ML, Cox J, Mann M, Choudhary C. A proteome-wide, quantitative survey of in vivo ubiquitylation sites reveals widespread regulatory roles. Mol. Cell Proteomics. 2011;10:M111.013284. doi: 10.1074/mcp.M111.013284.
  • 33.Danielsen JMR, Sylvestersen KB, Bekker-Jensen S, Szklarczyk D, Poulsen JW, Horn H, Jensen LJ, Mailand N, Nielsen ML. Mass spectrometric analysis of lysine ubiquitylation reveals promiscuity at site level. Mol. Cell Proteomics. 2011;10:M110.003590. doi: 10.1074/mcp.M110.003590.
  • 34.Liu X-X, Li X-J, Zhang B, Liang Y-J, Zhou C-X, Cao D-X, He M, Chen G-Q, He J-R, Zhao Q. MicroRNA-26b is underexpressed in human breast cancer and induces cell apoptosis by targeting SLC7A11. FEBS Lett. 2011;585:1363–1367. doi: 10.1016/j.febslet.2011.04.018.
  • 35.Li B, Cong F, Tan CP, Wang SX, Goff SP. Aph2, a protein with a zf-DHHC motif, interacts with c-Abl and has pro-apoptotic activity. J. Biol. Chem. 2002;277:28870–28876. doi: 10.1074/jbc.M202388200.
  • 36.Wyatt E, Wu R, Rabeh W, Park H-W, Ghanefar M, Ardehali H. Regulation and cytoprotective role of hexokinase III. PLoS ONE. 2010;5:e13823. doi: 10.1371/journal.pone.0013823.
  • 37.Gill C, Mestril R, Samali A. Losing heart: the role of apoptosis in heart disease--a novel therapeutic target? FASEB J. 2002;16:135–146. doi: 10.1096/fj.01-0629com.
  • 38.Reeve JLV, Duffy AM, O’Brien T, Samali A. Don’t lose heart--therapeutic value of apoptosis prevention in the treatment of cardiovascular disease. J. Cell. Mol. Med. 2005;9:609–622. doi: 10.1111/j.1582-4934.2005.tb00492.x.
  • 39.Lee Y, Gustafsson AB. Role of apoptosis in cardiovascular disease. Apoptosis. 2009;14:536–548. doi: 10.1007/s10495-008-0302-x.
  • 40.Powell SR, Herrmann J, Lerman A, Patterson C, Wang X. The ubiquitin-proteasome system and cardiovascular disease. Prog Mol Biol Transl Sci. 2012;109:295–346. doi: 10.1016/B978-0-12-397863-9.00009-2.
  • 41.Powell SR. The ubiquitin-proteasome system in cardiac physiology and pathology. Am. J. Physiol. Heart Circ. Physiol. 2006;291:H1–H19. doi: 10.1152/ajpheart.00062.2006.
  • 42.Bemmo A, Benovoy D, Kwan T, Gaffney DJ, Jensen RV, Majewski J. Gene expression and isoform variation analysis using Affymetrix Exon Arrays. BMC Genomics. 2008;9:529. doi: 10.1186/1471-2164-9-529.
  • 43.Laajala E, Aittokallio T, Lahesmaa R, Elo LL. Probe-level estimation improves the detection of differential splicing in Affymetrix exon array studies. Genome Biology. 2009;10:R77. doi: 10.1186/gb-2009-10-7-r77.
  • 44.Yang H-C, Liang Y-J, Chen J-W, et al. Identification of IGF1, SLC4A4, WWOX, and SFMBT1 as Hypertension Susceptibility Genes in Han Chinese with a Genome-Wide Gene-Based Association Study. PLoS ONE. 2012;7:e32907.
  • 45.Libby P, Ridker PM, Hansson GK. Progress and challenges in translating the biology of atherosclerosis. Nature. 2011;473:317–325. doi: 10.1038/nature10146.
  • 46.Huan T, Bin Z, Zhi W, et al. A Systems Biology Framework Identifies Molecular Underpinnings of Coronary Heart Disease. Arterioscler Thromb Vasc Biol. 2012. doi: 10.1161/ATVBAHA.112.300112.

Bibliography

  • 1.Feinleib M, Kannel WB, Garrison RJ, et al. The Framingham Offspring Study. Design and preliminary data. Prev Med. 1975;4(4):518–525. doi: 10.1016/0091-7435(75)90037-7.
  • 2.Affymetrix. Transcript assignment for NetAffx(TM) Annotations. 2006. Available at: http://www.affymetrix.com/support/technical/byproduct.affx?product=huexon-st.
  • 3.Irizarry RA, Hobbs B, Collin F, et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003;4(2):249–264. doi: 10.1093/biostatistics/4.2.249.
  • 4.Affymetrix. Affymetrix Power Tools. Available at: http://www.affymetrix.com/partners_programs/programs/developer/tools/powertools.affx.
  • 5.R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: 2010. Available at: http://www.R-project.org.
  • 6.Freedman JE, Larson MG, Tanriverdi K, et al. Relation of platelet and leukocyte inflammatory transcripts to body mass index in the Framingham heart study. Circulation. 2010;122(2):119–129. doi: 10.1161/CIRCULATIONAHA.109.928192.
  • 7.Vazquez AI, Bates DM, Rosa GJM, et al. Technical note: an R package for fitting generalized linear mixed models in animal breeding. J. Anim. Sci. 2010;88(2):497–504. doi: 10.2527/jas.2009-1952.
  • 8.Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B. 1995;57(1):289–300.
  • 9.Murtagh F. Multidimensional Clustering Algorithms. Physica-Verlag; 1985.
  • 10.Cline MS, Blume J, Cawley S, et al. ANOSVA: a statistical method for detecting splice variation from expression data. Bioinformatics. 2005;21(Suppl 1):i107–i115. doi: 10.1093/bioinformatics/bti1010.
  • 11.Raghavachari N, Xu X, Harris A, et al. Amplified expression profiling of platelet transcriptome reveals changes in arginine metabolic pathways in patients with sickle cell disease. Circulation. 2007;115(12):1551–1562. doi: 10.1161/CIRCULATIONAHA.106.658641.
  • 12.Biesecker LG, Mullikin JC, Facio FM, et al. The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine. Genome Research. 2009;19(9):1665–1674. doi: 10.1101/gr.092841.109.
  • 13.Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105–1111. doi: 10.1093/bioinformatics/btp120.
  • 14.Eden E, Navon R, Steinfeld I, et al. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009;10:48. doi: 10.1186/1471-2105-10-48.
  • 15.Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 2005;102(43):15545–15550. doi: 10.1073/pnas.0506580102.
  • 16.Su AI, Wiltshire T, Batalov S, et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. U.S.A. 2004;101(16):6062–6067. doi: 10.1073/pnas.0400782101.
  • 17.Barb J, Munson PJ. The MSCL Analyst’s Toolbox. Available at: http://abs.cit.nih.gov/MSCLtoolbox/.
  • 18.Pinheiro J, Bates D. Mixed-effects Models in S and S-PLUS.
