
This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2025 Aug 21:2025.08.15.670514. [Version 1] doi: 10.1101/2025.08.15.670514

SEAHORSE: A Serendipity Engine Assaying Heterogeneous Omics-Related Sampling Experiments

Adam Quackenbush 1,2,*, Jaya Kolluri 3,4,*, Rohan Biju 1,5, Saron Nhong 6, Derrick K DeConti 6, John Quackenbush 6,7, Enakshi Saha 6,8
PMCID: PMC12393347  PMID: 40894779

Abstract

Large-scale, open-access data sets such as the Genotype Tissue Expression Project (GTEx) and The Cancer Genome Atlas (TCGA) include multi-omic data on large numbers of samples along with extensive clinical and phenotypic information. These datasets provide a unique opportunity to discover correlations among clinical and genomic data features that can lead to testable hypotheses and new discoveries. SEAHORSE (http://seahorse.networkmedicine.org/) is a web-based database and search tool for exploratory data analysis in which we have pre-computed statistical associations between available data elements. An easy-to-use user interface allows users to explore significant associations using tabulated summary statistics, data visualizations, and functional enrichment analyses (using RNA-seq data) for identified sets of genes. We describe the motivation and construction of SEAHORSE and demonstrate its utility by documenting several surprising association patterns observed across multiple tissues in GTEx and multiple different cancer types in TCGA.

Introduction

Scientific inquiry is generally defined by the application of the scientific method, which involves formulating hypotheses and testing them through careful experimentation. While many scientists favor formulating incremental, testable hypotheses based on their existing knowledge of a system, this approach tends to reinforce our current understanding and so can fall short in uncovering previously unexplored aspects of the data. If we are interested in expanding our understanding, a more holistic statement of the scientific method would have us begin with observations of natural phenomena and from those develop testable hypotheses. Indeed, with the advent of genome-wide screening methods and other large-scale data-intensive studies, there is growing recognition of the value of open-ended hypothesis-seeking studies [1–3] in which one often seeks novel relationships that do not overlap with existing knowledge [4]. In practice, however, even “discovery-driven” genomic studies often default either to looking for correlations between genomic measurements (such as correlations between gene expression patterns across samples) or to using statistical analyses to identify new genomic associations that distinguish phenotypic states (such as finding differentially expressed genes across conditions or associations of genetic variants with traits). While these approaches have proven extremely useful in extending our understanding of health and disease, there are clearly many associations within these data sets that remain unexplored, often because no one has thought to look.

The Johari Window (Figure 1) is a rubric for characterizing an individual’s awareness of the state of knowledge that they possess [5]. It has been used in a broad range of applications, most notably in assessing risk, and generally classifies the universe of facts into four groups. The first, the “known knowns,” are the things that we are aware of and about which we have information. The core of our scientific knowledge is built on the foundation of these known facts, as they represent our current state of “truth” and its supporting evidence. The second group are the “known unknowns,” and these are the raw material for application of the scientific method, as this is where we generally develop and test hypotheses to bridge gaps in our existing knowledge. The third group are the “unknown knowns,” associations that have already been discovered, and might be documented in the literature, but of which we are not presently aware. The fourth and final group are the “unknown unknowns,” things about which we have no knowledge and which are outside our current sphere of understanding. Because of their nature, both the “unknown knowns” and “unknown unknowns” are classes of facts that we generally do not think to search for. However, these two classes, and the latter in particular, represent areas of inquiry in which we can make serendipitous discoveries and develop new, testable hypotheses. In biological science, the emergence and expansion of large-scale cohort studies that include extensive clinical phenotypic and multi-omic data on sizable groups of individuals provide unique opportunities to systematically explore the space of “unknown unknowns,” uncovering knowledge that could lead to unique insights and hypotheses that can be validated in other data sets.

Figure 1.

The Johari Window was first developed in psychology to classify what we and others know about ourselves. It has subsequently been generalized for application in many areas, including risk assessment, where the most dangerous elements are those things that we simply do not recognize to be unknown to us (either because we are blind to them or because they are hidden from us). In science, the “Known Knowns” encapsulate our current understanding, and the “Known Unknowns” are the areas we explore through hypothesis-testing experiments. The “Unknown Unknowns” are where new, unexpected hypotheses can be developed.

While clinical and phenotypic data have been widely used to characterize cohorts for comparison in correlation-based studies [6–8] or as covariates in statistical comparisons (such as those used to identify differential patterns of gene expression or gene regulation) [9–13], associations between many combinations of these variables remain unexplored. Many large-scale studies, such as Trans-Omics for Precision Medicine (TOPMed), All of Us, the Million Veteran Program, and the UK Biobank, have one or more omic data types and extensive phenotypic information for each research subject, potentially hiding important but unexpected associations. Here we concentrate on the Genotype Tissue Expression Project (GTEx) and The Cancer Genome Atlas (TCGA) as initial examples of what one can recover using such data resources as a discovery engine.

The GTEx v8 release [14–16] has data on 948 individual donors, including RNA-seq expression from samples representing 43 unique tissues in the body (a reduced subset of the 54 presented in GTEx; see Methods) as well as whole-genome sequencing data from which single nucleotide polymorphisms (SNPs) can be inferred. GTEx also provides a fairly extensive collection of information about the samples and the individuals from whom they were collected, including 189 health-related variables such as height, weight, sex, race, age, type of death, smoking status, and other relevant information (Supplemental Table S1) [17]. The availability of such comprehensive data has already allowed many important analyses, including investigations into phenotypic associations with gene expression and the biological interaction networks that drive transcription [18–20], as well as trait association studies that include expression quantitative trait locus (eQTL) and other analyses [21–24]. The Cancer Genome Atlas (TCGA) [25] has collected extensive molecular and phenotypic data on more than 20,000 primary cancer and matched normal samples representing 33 organ sites and many annotated cancer types and subtypes. These data include whole-genome sequencing, whole-transcriptome profiling, and in many cases additional data such as miRNA expression, DNA methylation, proteomics, or other data. Many individuals who are represented in TCGA have molecular data on matched tumor and adjacent normal samples, and some have data on tumors and distant metastases. Associated with each sample in TCGA are extensive clinical and phenotypic data on the disease, and there are also some limited data on treatment.

The abundance of extensive phenotypic and multi-omic data in each of these studies, together with the availability of large-scale computing resources that enable computation, storage, and access to precomputed correlations between variables, presents a unique opportunity to identify unexpected associations in these studies: observations that might lead to new hypotheses and ultimately to new discoveries. To this end, we created SEAHORSE (Serendipity Engine Assaying Heterogeneous Omics-Related Sampling Experiments; https://seahorse.networkmedicine.org/), a resource that allows users to find significant associations between genes and clinical variables, paving the way for serendipitous discoveries. SEAHORSE is implemented as a web-based tool for exploratory data analysis, available through an intuitive user interface, designed to allow users to identify both expected and unexpected associations within large data sets. Drawing on large-scale genomic profiling projects that have collected multi-omic data sets on well-annotated populations for whom clinical and other demographic data are available, SEAHORSE allows users to ask and answer questions about individual variables or about cohorts identified by selecting multiple demographic and other parameters. For example, one could ask “What phenotypic variables correlate with subject age?” Or, “What gene expression levels are correlated with subject BMI in different tissues?” Or “Which genes are correlated with the expression of HER2 in the transverse colon?” SEAHORSE allows these open-ended questions to be quickly explored, thereby facilitating the development of new hypotheses that can then be tested and validated using other data resources or follow-up studies in the laboratory.

User Interface: A Webtool to Facilitate Unexpected Discoveries

SEAHORSE is designed to allow users to explore correlations between parameters measured in large data sets, including correlations that are potentially unexpected. Users access the data through a web query interface (https://seahorse.networkmedicine.org/) that allows them to explore pre-computed correlations and associations within the data (Figure 2). For example, selecting GTEx as the data source, the web portal provides summary statistics and displays that include distributions of samples by tissue, summaries of subject age, sex, race, and other variables. Correlations in the data can then be explored using the left-hand query selector, which provides a scrollable list of phenotype parameters, a list of the tissues sampled, and a gene query box. Phenotype parameters (listed for GTEx in Supplemental Table S1) are grouped together; selecting one of these presents a plot of the distribution for that parameter as a histogram followed by plots representing those phenotypic variables that are significantly correlated with it.

Figure 2.

Graphical overview of the workflow.

Pre-Calculating Correlations in GTEx

As a first implementation of SEAHORSE (Figure 2), we chose to use data from the 948 genotyped individuals for whom clinical phenotypic data and RNA-seq data were available from the Genotype-Tissue Expression (GTEx) project version 8 [26, 27] on the GTEx portal (https://gtexportal.org/home/); RNA-seq data were downloaded as Transcripts per Million (TPM) and used without further processing.

The GTEx consortium also collected phenotypic data on the individuals they profiled, and these were downloaded from dbGaP (https://dbgap.ncbi.nlm.nih.gov/, under study accession phs000424.v8.p2). Phenotypic data included 189 parameters reported for each individual (see Supplemental Table S1): one is the subjectID; 23 parameters are represented as continuous variables (including height, weight, and age); 124 (including sex, smoking status, and drinking status) are dichotomous categorical variables; seven (including race, blood type, and various disease status measures) are nominal categorical variables; and the remaining 34 are descriptive comments or provide units used for various measurements. Excluding the subjectID and the 34 comments/units leaves 154 phenotypic parameters that we used in our association analysis.

We considered possible correlations between all 11,781 (= 154 × 153 / 2) pairs of phenotypic variables and, using the appropriate statistical test for each pair of variable types (listed in Table 1), calculated the p-value corresponding to the measure of association between those variables. We also tested for associations between the 154 phenotypic parameters and the expression level of each gene in each tissue using linear regression, ANOVA, or a t-test as appropriate (Table 1) and ranked these 247,544,281 associations by p-value. Finally, within each tissue, we calculated the Pearson correlation between the expression levels of each pair of genes and reported 415,360,884 significant gene-gene correlations with sufficient r² values (cutoff of 0.7). The input data as well as the resulting correlation values were stored in R dataframes indexed by the various parameters that were tested. All analyses were implemented in the R statistical programming language and, to ensure reproducibility, sample code is available as an executable Jupyter notebook entitled “Uncovering Associations among Genes and Phenotypes with SEAHORSE” in Netbooks [28] (v2.4.1; https://netbooks.networkmedicine.org). Besides computing measures of association, we also performed functional enrichment analysis on the correlated sets of genes within each tissue, ranked by their strength of association with particular phenotypes. Complete results are available through the SEAHORSE Dashboard; functional enrichment results for correlations of height with tissue-specific gene expression in GTEx and for age with tumor-type-specific gene expression in TCGA are available as text files in the Harvard Dataverse (https://dataverse.harvard.edu/dataverse/SEAHORSE).
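The bookkeeping in this paragraph can be checked with a short sketch. The SEAHORSE analysis itself is written in R; the Python below is only an illustration, and the helper names (`pearson_r`, `strong_gene_pairs`), the toy expression values, and the choice to treat the 0.7 cutoff as inclusive are assumptions rather than details taken from the SEAHORSE code.

```python
from itertools import combinations
from math import comb, sqrt

# Counts quoted in the text: 189 reported parameters, minus the subject ID
# and the 34 comment/unit fields.
N_REPORTED, N_SUBJECT_ID, N_COMMENT_FIELDS = 189, 1, 34
n_phenotypes = N_REPORTED - N_SUBJECT_ID - N_COMMENT_FIELDS  # 154
n_phenotype_pairs = comb(n_phenotypes, 2)                    # 154 * 153 / 2


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def strong_gene_pairs(expression, r2_cutoff=0.7):
    """Return {(gene1, gene2): r} for gene pairs whose r^2 meets the cutoff.

    `expression` maps a gene name to its expression values across the
    samples of a single tissue (hypothetical layout).
    """
    kept = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson_r(expression[g1], expression[g2])
        if r * r >= r2_cutoff:  # inclusive cutoff is an assumption
            kept[(g1, g2)] = r
    return kept


# Made-up toy values for three hypothetical genes in one tissue.
toy_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}
strong = strong_gene_pairs(toy_expression)
```

With these toy values only the GENE_A/GENE_B pair passes the filter. Thresholding on r² rather than r keeps strongly negative correlations as well as strongly positive ones.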

Table 1.

Statistical tests used for measuring associations between variables within the GTEx dataset.

| | Clinical: Dichotomous | Clinical: Nominal | Clinical: Continuous | Gene Expression |
|---|---|---|---|---|
| Clinical: Dichotomous | Chi-squared / Cramer’s V | Chi-squared / Cramer’s V | t-test | t-test |
| Clinical: Nominal | | Chi-squared / Cramer’s V | ANOVA | ANOVA |
| Clinical: Continuous | | | Pearson Correlation | Pearson Correlation |
| Gene Expression | | | | Pearson Correlation |
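Table 1 amounts to a symmetric lookup from a pair of variable types to a test. A minimal sketch of that lookup follows, assuming hypothetical type labels and a hypothetical `choose_test` helper; it is a Python illustration rather than the authors' R code, and it returns only the name of the test without running it.

```python
DICHOTOMOUS = "dichotomous"
NOMINAL = "nominal"
CONTINUOUS = "continuous"
EXPRESSION = "expression"

CHI2 = "Chi-squared / Cramer's V"
T_TEST = "t-test"
ANOVA = "ANOVA"
PEARSON = "Pearson correlation"

# Keys are unordered pairs, so (a, b) and (b, a) resolve to the same entry.
# A pair of identical types collapses to a one-element frozenset.
TEST_FOR_PAIR = {
    frozenset([DICHOTOMOUS]): CHI2,
    frozenset([DICHOTOMOUS, NOMINAL]): CHI2,
    frozenset([DICHOTOMOUS, CONTINUOUS]): T_TEST,
    frozenset([DICHOTOMOUS, EXPRESSION]): T_TEST,
    frozenset([NOMINAL]): CHI2,
    frozenset([NOMINAL, CONTINUOUS]): ANOVA,
    frozenset([NOMINAL, EXPRESSION]): ANOVA,
    frozenset([CONTINUOUS]): PEARSON,
    frozenset([CONTINUOUS, EXPRESSION]): PEARSON,
    frozenset([EXPRESSION]): PEARSON,
}


def choose_test(type_a, type_b):
    """Name of the Table 1 test for two variable types, in either order."""
    return TEST_FOR_PAIR[frozenset((type_a, type_b))]
```

For example, `choose_test("expression", "nominal")` returns `"ANOVA"`. The text mentions linear regression for phenotype-expression associations where the table lists Pearson correlation; for a single continuous predictor the two give the same p-value, so the sketch follows the table.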

Example Analysis on GTEx: Unexpected Correlations with Human Height

The motivation for developing SEAHORSE was to create a resource that would allow the discovery of unexpected associations in large public datasets that might subsequently be validated through independent corroborating lines of evidence, including examination of data from other studies. As an example, we examined each of the forty-three tissues profiled in 948 individuals by the GTEx project and calculated the correlation between height and gene expression in each tissue. We then ranked the genes based on Pearson correlation with the phenotype and performed pre-ranked Gene Set Enrichment Analysis (GSEA) using the R package “fgsea” (version 1.20.0) [29], testing for significant associations with KEGG pathways [30], GO Biological Process terms [31], and Reactome pathways [32]. For each annotation source, we ranked significant terms by the number of tissues in which they appeared as significant. We then analyzed these catalogs to look for trends among the significant functional annotation terms.
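The final ranking step, ordering terms by the number of tissues in which they are significant, can be sketched as follows. This is a Python illustration, not the authors' R code: the input layout (tissue, then term, then p-value), the function name, and the toy p-values are all made up, and the default threshold of 0.1 is taken from the significance level quoted in the results that follow.

```python
from collections import Counter


def rank_terms_by_tissue_count(enrichment, alpha=0.1):
    """Rank annotation terms by how many tissues show p < alpha.

    `enrichment` maps tissue -> {term: p-value} (hypothetical layout).
    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter()
    for terms in enrichment.values():
        for term, p_value in terms.items():
            if p_value < alpha:
                counts[term] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy p-values, invented purely to exercise the function.
toy_enrichment = {
    "Lung": {"Pathways in cancer": 0.01, "Cardiac muscle contraction": 0.2},
    "Liver": {"Pathways in cancer": 0.05, "Cardiac muscle contraction": 0.03},
    "Thyroid": {
        "Pathways in cancer": 0.5,
        "Cardiac muscle contraction": 0.04,
        "Focal adhesion": 0.09,
    },
}
ranked_terms = rank_terms_by_tissue_count(toy_enrichment)
```

In this toy input two terms are each significant in two tissues and one term in a single tissue, so the ranking lists the two tied terms first.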

Upon reviewing the results from the KEGG functional enrichment analysis, the first surprising result that stands out is the large number of disease-related pathways found to be significantly associated (p-value < 0.1) with height. The sixth item on the list, “Pathways in Cancer,” was found to be significantly correlated across 22 tissues. Although this was surprising to us, there is a growing body of evidence indicating a link between height and different types of cancer [33, 34]. A 2019 review by Giovannucci of large-population studies found consistently increased risk with height of cancers of the nervous system, thyroid, breast, lung, colon, rectum, prostate, ovary, testes, cervix, endometrium, and skin, as well as lymphoma, multiple myeloma, and leukemia [35]. Although Giovannucci summarizes potential factors contributing to the increased risk, including the number of cells or cell divisions, or metabolic differences, the mechanisms are unclear. However, the associations seen in SEAHORSE suggest that there are underlying changes in the gene expression levels of cancer-related pathways in taller people that may contribute to an elevated risk of developing certain kinds of cancers.

As can be seen in Table 2, there are several other disease-related processes whose gene expression levels are correlated with height. The KEGG pathway “cardiac muscle contraction” was significantly correlated (p-value < 0.1) with height in nineteen tissues. There are a number of cardiac diseases that have been linked to height [36]. Congestive heart failure, coronary artery disease, and aortic valve calcification have all been reported in some studies to be less prevalent with increasing height [36–38]. However, the best-established correlation is with atrial fibrillation, which has been observed to be more common in taller individuals [39]. Atrial fibrillation is a common cardiac arrhythmia or irregular heartbeat that affects the atria, the upper chambers of the heart. In atrial fibrillation, the electrical signals that coordinate the contractions of the atria become disorganized, causing them to quiver or fibrillate rather than contract properly; this irregular electrical activity disrupts the normal rhythm of the heart. Nearly as prevalent, with significant associations in seventeen tissues, is a correlation with “arrhythmogenic right ventricular cardiomyopathy (ARVC),” a genetic heart condition characterized by the replacement of normal heart muscle with fatty or fibrous tissue in the right ventricle, which can lead to arrhythmias and potentially life-threatening complications. Although we could not find any studies reporting an observed change in prevalence or severity of ARVC with height, the data from SEAHORSE suggest that there might be a shared mechanism between this and other height-associated heart diseases that could be tested in a population-based study.

Other interesting associations of disease-related patterns of gene expression that change with height (also found in Table 2) have support in the literature. For example, it has been reported that shorter people have a higher risk of Parkinson’s disease [40–42] (possibly due to lower nigral neuron density [40]) and of Alzheimer’s disease [43–47], both of which may also be linked [48, 49] to the increased risk of greater BMI [50] and of Type I and Type II diabetes associated with shorter stature.

Pre-Calculating Correlations in TCGA

SEAHORSE also contains exploratory analysis of data from The Cancer Genome Atlas (TCGA), which contains information on more than 20,000 tumor and normal samples across 33 different cancer types. Several omics data types, including gene expression, mutation, methylation, and copy number variation, are recorded for the samples. In the current version of SEAHORSE, we have included exploratory analysis of only the gene expression and phenotypic data from primary tumor samples. (Analysis of omic data types in addition to gene expression will be included in future releases of SEAHORSE, as will the analysis of normal adjacent samples.) A full list of genomic and clinical measurements collected by TCGA is available on the TCGA website [51].

The gene expression and phenotypic data for 33 cancer types were downloaded from recount3 (http://rna.recount.bio/) [52], a resource that contains uniformly processed RNA-seq data from studies, including TCGA. After filtering lowly expressed genes, measures of association were computed between pairs of gene expression values, between gene expression and phenotypes, and between pairs of phenotypes, separately for each cancer type using the same approach described for GTEx (Table 1), followed by pathway enrichment analysis using KEGG, GO Biological Processes, and Reactome pathways.

Example Analysis on TCGA: Unexpected Correlations with Age

Cancer is often characterized as a disease of aging, in large part because most adult cancers are predominantly diagnosed among individuals aged 60 and above [53]. Although prognosis is usually worse among older individuals, for many cancer types, including lung, pancreatic, and breast cancer, tumors are often detected at more advanced stages in younger individuals, and disease in younger individuals often has poorer outcomes than in older individuals [54]. This indicates that age-dependent alterations in cancer-associated processes may play a role in disease risk, development, and progression.

We examined age-associated patterns of gene expression across all tissues reported in SEAHORSE using KEGG pathway-based functional enrichment analysis. Not surprisingly, we found that cancer-related pathways involving cell adhesion and cell proliferation, including focal adhesion, ECM (extracellular matrix) receptor interaction, and the WNT signaling pathway, were among those most frequently found to have a significant association with age, each having been identified in more than 15 tissues. “Pathways in cancer,” which contains most proto-oncogenes and tumor suppressor genes, was found to be significantly correlated with age among tumor samples across 18 different cancer types.

In this regard, a greater correlation between age and genes associated with tumor progression across multiple different cancer types is particularly interesting, as it might point towards common biological mechanisms that explain why, in many tissues, tumor incidence and prognosis are so strongly age dependent. It may be that the altered function of these critical pathways facilitates tumor growth and immune evasion or catalyzes tumor-specific metabolic processes. Further, it may be that these altered cellular processes, rather than co-morbidities alone, contribute to overall poorer outcomes with age. A deeper investigation into these cell proliferation-related pathways might also help identify novel age-specific therapeutic mechanisms.

Conclusions and Future Development

SEAHORSE is an open-source, interactive database that contains pre-computed summary statistics, measures of associations, and graphical summaries, of data from two relatively large multi-omic studies on populations for which there is extensive phenotypic data, GTEx and TCGA. The SEAHORSE website is designed to allow extensive exploratory data analysis, allowing users to uncover unexpected associations within complex biological data—the “unknown unknowns” that may provide hypotheses that can be further tested and validated either through exploration of other public data sets or through direct experimentation. However, we have also found SEAHORSE to be a convenient resource for validating findings from other focused studies in which we find that particular biological processes distinguish between phenotypes or are correlated with some parameter such as age or biological sex.

We recognize that there is an inherent multiple testing problem in making the large number of comparisons required to build the SEAHORSE database. But if we treat SEAHORSE in the spirit in which it was constructed, as a discovery tool, then observations arising from these comparisons can serve as the basis for formulating hypotheses that can be tested specifically in independent data sources. One can also use SEAHORSE to explicitly test associations found in other data analyses by making a specific query about the association between some set of parameters that are measured in the cohorts and represented in the database.
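To give a sense of the scale of the multiple testing problem, the sketch below computes a Bonferroni threshold for the 11,781 phenotype-phenotype tests and applies a Benjamini-Hochberg adjustment to a short list of p-values. The paper does not state which correction, if any, SEAHORSE applies, so both procedures are used here purely as familiar examples; the function name, the 0.05 level, and the input p-values are assumptions for illustration (Python, whereas the authors' code is in R).

```python
N_PHENOTYPE_PAIRS = 11_781          # 154 * 153 / 2, as reported in the text
ALPHA = 0.05                        # conventional level, assumed here

# Per-test threshold that keeps the family-wise error rate at ALPHA.
bonferroni_threshold = ALPHA / N_PHENOTYPE_PAIRS


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only; not taken from SEAHORSE.
example_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

An association taken from a discovery screen of this kind and re-tested in an independent data set faces a far smaller correction, which is the workflow the paragraph above describes.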

The current release of SEAHORSE contains summary statistics, visualizations, and pathway enrichment analyses using RNA-seq data from GTEx and TCGA, each of which profiled an extensive repertoire of tissues. We demonstrated the efficacy of SEAHORSE by highlighting several intriguing associations between phenotypic traits and biological processes. This includes some that one might expect to find, such as the age-based association of the expression of cell proliferation and other cancer pathways in multiple tumors, or the links between BMI and multiple disorders. Other associations that we found were, at least to us, unexpected, including the significantly high correlation between height and multiple disease-related pathways that we found across multiple tissues. Both scenarios argue for the value of pre-computing correlations in the growing number of large-cohort studies that have collected multi-omic data.

Given the value proposition represented in SEAHORSE, we plan to expand the resource to include other large genomic databases, including TOPMed, AACR GENIE, the UK Biobank, and other human transcriptomics datasets that are available in recount3, as well as updating the analyses to the latest release of GTEx. Where possible, we also plan to include other omics data types in our correlation analyses, including methylation, mutation, and copy number variation, thereby expanding the exploratory analysis to involve interactions across multiple omics modalities and their association with phenotypes.

Integrating network analysis tools including PANDA [55], DRAGON [7], and BONOBO [6] into SEAHORSE’s analytical toolkit will further enhance its efficacy in enabling the discovery of the “unknown unknowns” associated with gene regulatory processes through hypothesis generation and testing.

Supplementary Material

Supplement 1
media-1.xlsx (43.5KB, xlsx)

Figure 3.

Examples of exploratory queries possible in SEAHORSE regarding phenotypes (clinical variables) in the GTEx dataset: Selecting a phenotypic variable such as “Age” and a specific tissue such as “Adipose subcutaneous” produces a display of (A) the sampling distribution of the selected phenotypic variable and four different analyses: (B) a ranked list of other phenotypes; (C) a ranked list of library metadata variables with which age is most strongly associated, along with boxplots (for categorical phenotypes) and scatterplots (for continuous phenotypes); (D) a ranked list of genes most strongly associated with age in the selected tissue along with scatterplot of age and expression levels of each gene; (E) a ranked list of biological pathways with which age is most strongly associated within the selected tissue, along with a rugplot for each pathway and a button that enables the user to list all other tissues where age is strongly associated with the corresponding biological pathway.

Figure 4.

Examples of exploratory queries possible in SEAHORSE regarding specific genes in the GTEx dataset: Selecting a gene such as “TP53” and a specific tissue such as “Breast” produces a display of (A) the sampling distribution of the expression levels of the selected gene in the given tissue and three different analyses: (B) a ranked list of the associations between the expression level of the selected gene in the selected tissue and all phenotypes (clinical variables); (C) a ranked list of correlations between the expression level of the selected gene in the selected tissue and each of the library metadata variables, along with boxplots (for categorical phenotypes) and scatterplots (for continuous phenotypes); (D) a ranked list of genes most strongly correlated with the expression level of the selected gene in the selected tissue, along with scatterplots showing the joint distribution of the selected gene and all other genes.

Acknowledgements

ES and JQ were supported by grants from the National Institutes of Health R35CA220523 and R01HG011393; DD and JQ were supported by grant U24CA231846.

Data Availability

SEAHORSE is available as a web-based query tool (https://seahorse.networkmedicine.org/), and a Jupyter notebook titled “Uncovering Associations among Genes and Phenotypes with SEAHORSE” is available through Netbooks v2.4.1 (https://netbooks.networkmedicine.org). Full results for functional enrichment analysis using height as a query for correlated gene expression in GTEx tissues and age for correlated gene expression in TCGA tumors are available through the Harvard Dataverse (https://dataverse.harvard.edu/dataverse/SEAHORSE). All software code for calculating the correlations is written in R and is available under the MIT License at https://github.com/Enakshi-Saha/netZooR/tree/SEAHORSE.

References

  • 1. Kitsios GD, Zintzaras E. Genome-wide association studies: hypothesis-“free” or “engaged”? Translational research : the journal of laboratory and clinical medicine. 2009;154(4):161–4. Epub 2009/09/22. doi: 10.1016/j.trsl.2009.07.001.
  • 2. Yanai I, Lercher M. A hypothesis is a liability. Genome Biol. 2020;21(1):231. Epub 2020/09/05. doi: 10.1186/s13059-020-02133-w.
  • 3. Felin T, Koenderink J, Krueger JI, Noble D, Ellis GFR. The data-hypothesis relationship. Genome Biol. 2021;22(1):57. Epub 2021/02/12. doi: 10.1186/s13059-021-02276-4.
  • 4. Reshef DN, Reshef YA, Finucane HK, Grossman SR, McVean G, Turnbaugh PJ, Lander ES, Mitzenmacher M, Sabeti PC. Detecting novel associations in large data sets. Science. 2011;334(6062):1518–24. Epub 2011/12/17. doi: 10.1126/science.1205438.
  • 5. Luft J, Ingham H. The Johari Window, A Graphic Model of Interpersonal Awareness. Proceedings of the Western Training Laboratory in Group Development, Los Angeles: UCLA; 1955.
  • 6. Saha E, Fanfani V, Mandros P, Ben Guebila M, Fischer J, Shutta KH, DeMeo DL, Lopes Ramos CM, Quackenbush J. Bayesian inference of sample-specific coexpression networks. Genome Res. 2024. Epub 20240812. doi: 10.1101/gr.279117.124.
  • 7. Shutta KH, Weighill D, Burkholz R, Guebila MB, DeMeo DL, Zacharias HU, Quackenbush J, Altenbuchinger M. DRAGON: Determining Regulatory Associations using Graphical models on multi-Omic Networks. Nucleic Acids Res. 2023;51(3):e15. doi: 10.1093/nar/gkac1157.
  • 8. Micheletti S, Schlauch D, Quackenbush J, Ben Guebila M. Higher-order correction of persistent batch effects in correlation networks. Bioinformatics. 2024. Epub 20240903. doi: 10.1093/bioinformatics/btae531.
  • 9. Lopes-Ramos CM, Shutta KH, Ryu MH, Huang Y, Saha E, Ziniti J, Chase R, Hobbs BD, Yun JH, Castaldi P, Hersh CP, Glass K, Silverman EK, Quackenbush J, DeMeo DL. Sex-biased Regulation of Extracellular Matrix Genes in COPD. Am J Respir Cell Mol Biol. 2024. Epub 20240805. doi: 10.1165/rcmb.2024-0226OC.
  • 10. Lopes-Ramos CM, Kuijjer ML, Ogino S, Fuchs CS, DeMeo DL, Glass K, Quackenbush J. Gene Regulatory Network Analysis Identifies Sex-Linked Differences in Colon Cancer Drug Metabolism. Cancer Res. 2018;78(19):5538–47. Epub 2018/10/03. doi: 10.1158/0008-5472.CAN-18-0454.
  • 11. Morrow JD, Cho MH, Hersh CP, Pinto-Plata V, Celli B, Marchetti N, Criner G, Bueno R, Washko G, Glass K, Choi AMK, Quackenbush J, Silverman EK, DeMeo DL. DNA methylation profiling in human lung tissue identifies genes associated with COPD. Epigenetics. 2016;11(10):730–9. Epub 2016/08/27. doi: 10.1080/15592294.2016.1226451.
  • 12. Glass K, Quackenbush J, Silverman EK, Celli B, Rennard SI, Yuan GC, DeMeo DL. Sexually-dimorphic targeting of functionally-related genes in COPD. BMC Syst Biol. 2014;8:118. Epub 2014/11/29. doi: 10.1186/s12918-014-0118-y.
  • 13. Saha E, Ben Guebila M, Fanfani V, Fischer J, Shutta KH, Mandros P, DeMeo DL, Quackenbush J, Lopes-Ramos CM. Gene regulatory networks reveal sex difference in lung adenocarcinoma. Biology of sex differences. 2024;15(1):62. Epub 20240806. doi: 10.1186/s13293-024-00634-y.
  • 14.GTEx Consortium. Genetic effects on gene expression across human tissues. Nature. 2017;550(7675):204–13. Epub 2017/10/13. doi: 10.1038/nature24277. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348(6235):648–60. Epub 2015/05/09. doi: 10.1126/science.1262110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580–5. Epub 2013/05/30. doi: 10.1038/ng.2653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.GTEx. Data dictionary for data table pht002743.v8.p2. Available from: https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs000424/phs000424.v8.p2/pheno_variable_summaries/phs000424.v8.pht002743.v8.GTEx_Sample_Attributes.data_dict.xml.
  • 18.Lopes-Ramos CM, Chen CY, Kuijjer ML, Paulson JN, Sonawane AR, Fagny M, Platig J, Glass K, Quackenbush J, DeMeo DL. Sex Differences in Gene Expression and Regulatory Networks across 29 Human Tissues. Cell Rep. 2020;31(12):107795. Epub 2020/06/25. doi: 10.1016/j.celrep.2020.107795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Sonawane AR, Platig J, Fagny M, Chen CY, Paulson JN, Lopes-Ramos CM, DeMeo DL, Quackenbush J, Glass K, Kuijjer ML. Understanding Tissue-Specific Gene Regulation. Cell Rep. 2017;21(4):1077–88. Epub 2017/10/27. doi: 10.1016/j.celrep.2017.10.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Lopes-Ramos CM, Paulson JN, Chen CY, Kuijjer ML, Fagny M, Platig J, Sonawane AR, DeMeo DL, Quackenbush J, Glass K. Regulatory network changes between cell lines and their tissues of origin. BMC Genomics. 2017;18(1):723. Epub 2017/09/14. doi: 10.1186/s12864-017-4111-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Fagny M, Paulson JN, Kuijjer ML, Sonawane AR, Chen CY, Lopes-Ramos CM, Glass K, Quackenbush J, Platig J. Exploring regulation in tissues with eQTL networks. Proc Natl Acad Sci U S A. 2017;114(37):E7841–E50. Epub 2017/08/31. doi: 10.1073/pnas.1707375114. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Bahcall OG. Human genetics: GTEx pilot quantifies eQTL variation across tissues and individuals. Nat Rev Genet. 2015;16(7):375. Epub 2015/06/17. doi: 10.1038/nrg3969. [DOI] [PubMed] [Google Scholar]
  • 23.Fagny M, Platig J, Kuijjer ML, Lin X, Quackenbush J. Nongenic cancer-risk SNPs affect oncogenes, tumour-suppressor genes, and immune function. Br J Cancer. 2020;122(4):569–77. Epub 2019/12/07. doi: 10.1038/s41416-019-0614-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Barry JD, Fagny M, Paulson JN, Aerts H, Platig J, Quackenbush J. Histopathological Image QTL Discovery of Immune Infiltration Variants. iScience. 2018;5:80–9. Epub 2018/09/22. doi: 10.1016/j.isci.2018.07.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Gonda DD, Cheung VJ, Muller KA, Goyal A, Carter BS, Chen CC. The Cancer Genome Atlas expression profiles of low-grade gliomas. Neurosurgical focus. 2013;34(2):E8. Epub 2013/02/05. doi: 10.3171/2012.12.FOCUS12351. [DOI] [PubMed] [Google Scholar]
  • 26.Carithers LJ, Moore HM. The Genotype-Tissue Expression (GTEx) Project. Biopreserv Biobank. 2015;13(5):307–8. Epub 2015/10/21. doi: 10.1089/bio.2015.29031.hmm. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N, Foster B, Moser M, Karasik E, Gillard B, Ramsey K, Sullivan S, Bridge J, Magazine H, Syron J, Fleming J, Siminoff L, Traino H, Mosavel M, Barker L, Jewell S, Rohrer D, Maxim D, Filkins D, Harbach P, Cortadillo E, Berghuis B, Turner L, Hudson E, Feenstra K, Sobin L, Robb J, Branton P, Korzeniewski G, Shive C, Tabor D, Qi L, Groch K, Nampally S, Buia S, Zimmerman A, Smith A, Burges R, Robinson K, Valentino K, Bradbury D, Cosentino M, Diaz-Mayoral N, Kennedy M, Engel T, Williams P, Erickson K, Ardlie K, Winckler W, Getz G, DeLuca D, MacArthur D, Kellis M, Thomson A, Young T, Gelfand E, Donovan M, Meng Y, Grant G, Mash D, Marcus Y, Basile M, Liu J, Zhu J, Tu Z, Cox NJ, Nicolae DL, Gamazon ER, Im HK, Konkashbaev A, Pritchard J, Stevens M, Flutre T, Wen X, Dermitzakis ET, Lappalainen T, Guigo R, Monlong J, Sammeth M, Koller D, Battle A, Mostafavi S, McCarthy M, Rivas M, Maller J, Rusyn I, Nobel A, Wright F, Shabalin A, Feolo M, Sharopova N, Sturcke A, Paschal J, Anderson JM, Wilder EL, Derr LK, Green ED, Struewing JP, Temple G, Volpi S, Boyer JT, Thomson EJ, Guyer MS, Ng C, Abdallah A, Colantuoni D, Insel TR, Koester SE, Little AR, Bender PK, Lehner T, Yao Y, Compton CC, Vaught JB, Sawyer S, Lockhart NC, Demchok J, Moore HF. The Genotype-Tissue Expression (GTEx) project. Nature Genetics. 2013;45:580–5. doi: 10.1038/ng.2653. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Ben Guebila M, Weighill D, Lopes-Ramos CM, Burkholz R, Pop RT, Palepu K, Shapoval M, Fagny M, Schlauch D, Glass K, Altenbuchinger M, Kuijjer ML, Platig J, Quackenbush J. An online notebook resource for reproducible inference, analysis and publication of gene regulatory networks. Nat Methods. 2022;19(5):511–3. doi: 10.1038/s41592-022-01479-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A. 2005;102(43):15545–50. Epub 2005/10/04. doi: 10.1073/pnas.0506580102. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Kanehisa M. The KEGG database. Novartis Found Symp. 2002;247:91–101; discussion 101–3, 119–28, 244–52. [PubMed] [Google Scholar]
  • 31.Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32 Database issue:D258–61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Matthews L, Gopinath G, Gillespie M, Caudy M, Croft D, de Bono B, Garapati P, Hemish J, Hermjakob H, Jassal B, Kanapin A, Lewis S, Mahajan S, May B, Schmidt E, Vastrik I, Wu G, Birney E, Stein L, D’Eustachio P. Reactome knowledgebase of human biological pathways and processes. Nucleic Acids Res. 2009;37(Database issue):D619–22. Epub 2008/11/05. doi: 10.1093/nar/gkn863. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Choi YJ, Lee DH, Han KD, Yoon H, Shin CM, Park YS, Kim N. Adult height in relation to risk of cancer in a cohort of 22,809,722 Korean adults. Br J Cancer. 2019;120(6):668–74. Epub 2019/02/20. doi: 10.1038/s41416-018-0371-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Green J, Cairns BJ, Casabonne D, Wright FL, Reeves G, Beral V, Million Women Study collaborators. Height and cancer incidence in the Million Women Study: prospective cohort, and meta-analysis of prospective studies of height and total cancer risk. Lancet Oncol. 2011;12(8):785–94. Epub 2011/07/26. doi: 10.1016/S1470-2045(11)70154-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Giovannucci E. A growing link-what is the role of height in cancer risk? Br J Cancer. 2019;120(6):575–6. Epub 2019/02/20. doi: 10.1038/s41416-018-0370-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Rosenbush SW, Parker JM. Height and heart disease. Rev Cardiovasc Med. 2014;15(2):102–8. doi: 10.3909/ricm0678. [DOI] [PubMed] [Google Scholar]
  • 37.Emerging Risk Factors Collaboration. Adult height and the risk of cause-specific death and vascular morbidity in 1 million people: individual participant meta-analysis. Int J Epidemiol. 2012;41(5):1419–33. Epub 20120723. doi: 10.1093/ije/dys086. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Marouli E, Del Greco MF, Astley CM, Yang J, Ahmad S, Berndt SI, Caulfield MJ, Evangelou E, McKnight B, Medina-Gomez C, van Vliet-Ostaptchouk JV, Warren HR, Zhu Z, Hirschhorn JN, Loos RJF, Kutalik Z, Deloukas P. Mendelian randomisation analyses find pulmonary factors mediate the effect of height on coronary artery disease. Commun Biol. 2019;2:119. Epub 20190327. doi: 10.1038/s42003-019-0361-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Levin MG, Judy R, Gill D, Vujkovic M, Verma SS, Bradford Y, Regeneron Genetics Center, Ritchie MD, Hyman MC, Nazarian S, Rader DJ, Voight BF, Damrauer SM. Genetics of height and risk of atrial fibrillation: A Mendelian randomization study. PLoS medicine. 2020;17(10):e1003288. Epub 2020/10/09. doi: 10.1371/journal.pmed.1003288. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Saari L, Backman EA, Wahlsten P, Gardberg M, Kaasinen V. Height and nigral neuron density in Parkinson’s disease. BMC neurology. 2022;22(1):254. Epub 2022/07/13. doi: 10.1186/s12883-022-02775-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Osler M, Okholm GT, Villumsen M, Rozing MP, Jorgensen TSH. Associations of Young Adult Intelligence, Education, Height, and Body Mass Index with Subsequent Risk of Parkinson’s Disease and Survival: A Danish Cohort Study. J Parkinsons Dis. 2022;12(3):1035–43. Epub 2022/02/12. doi: 10.3233/JPD-213102. [DOI] [PubMed] [Google Scholar]
  • 42.Ragonese P, D’Amelio M, Callari G, Aiello F, Morgante L, Savettieri G. Height as a potential indicator of early life events predicting Parkinson’s disease: a case-control study. Movement disorders : official journal of the Movement Disorder Society. 2007;22(15):2263–7. Epub 2007/09/14. doi: 10.1002/mds.21728. [DOI] [PubMed] [Google Scholar]
  • 43.Guo J, Song S. Associations of height loss with cognitive decline and incident dementia in adults aged 50 years and older. J Gerontol A Biol Sci Med Sci. 2023. Epub 2023/02/09. doi: 10.1093/gerona/glad054. [DOI] [PubMed] [Google Scholar]
  • 44.Russ TC, Kivimaki M, Starr JM, Stamatakis E, Batty GD. Height in relation to dementia death: individual participant meta-analysis of 18 UK prospective cohort studies. Br J Psychiatry. 2014;205(5):348–54. Epub 2014/11/05. doi: 10.1192/bjp.bp.113.142984. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Petot GJ, Vega U, Traore F, Fritsch T, Debanne SM, Friedland RP, Lerner AJ. Height and Alzheimer’s disease: findings from a case-control study. Journal of Alzheimer’s disease : JAD. 2007;11(3):337–41. Epub 2007/09/14. doi: 10.3233/jad-2007-11310. [DOI] [PubMed] [Google Scholar]
  • 46.Beeri MS, Davidson M, Silverman JM, Noy S, Schmeidler J, Goldbourt U. Relationship between body height and dementia. Am J Geriatr Psychiatry. 2005;13(2):116–23. Epub 2005/02/11. doi: 10.1176/appi.ajgp.13.2.116. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Larsson SC, Traylor M, Burgess S, Markus HS. Genetically-Predicted Adult Height and Alzheimer’s Disease. Journal of Alzheimer’s disease : JAD. 2017;60(2):691–8. Epub 2017/09/05. doi: 10.3233/JAD-170528. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Harris MA, Brett CE, Deary IJ, Starr JM. Associations among height, body mass index and intelligence from age 11 to age 78 years. BMC Geriatr. 2016;16(1):167. Epub 2016/09/30. doi: 10.1186/s12877-016-0340-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.West RK, Ravona-Springer R, Heymann A, Schmeidler J, Leroith D, Koifman K, Guerrero-Berroa E, Preiss R, Hoffman H, Silverman JM, Beeri MS. Shorter adult height is associated with poorer cognitive performance in elderly men with type II diabetes. Journal of Alzheimer’s disease : JAD. 2015;44(3):927–35. Epub 2014/11/07. doi: 10.3233/JAD-142049. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Bosy-Westphal A, Plachta-Danielzik S, Dorhofer RP, Muller MJ. Short stature and obesity: positive association in adults but inverse association in children and adolescents. Br J Nutr. 2009;102(3):453–61. Epub 2009/03/03. doi: 10.1017/S0007114508190304. [DOI] [PubMed] [Google Scholar]
  • 51.TCGA. Data Types Collected by TCGA. Available from: https://www.cancer.gov/ccg/research/genome-sequencing/tcga/using-tcga-data/types.
  • 52.Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, Imada EL, Zhang D, Joseph L, Leek JT, Jaffe AE, Nellore A, Collado-Torres L, Hansen KD, Langmead B. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol. 2021;22(1):323. Epub 20211129. doi: 10.1186/s13059-021-02533-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Age and Cancer Risk 2021. Available from: https://www.cancer.gov/about-cancer/causes-prevention/risk/age.
  • 54.Cai S, Zuo W, Lu X, Gou Z, Zhou Y, Liu P, Pan Y, Chen S. The Prognostic Impact of Age at Diagnosis Upon Breast Cancer of Different Immunohistochemical Subtypes: A Surveillance, Epidemiology, and End Results (SEER) Population-Based Analysis. Frontiers in oncology. 2020;10:1729. Epub 20200923. doi: 10.3389/fonc.2020.01729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing messages between biological networks to refine predicted interactions. PLoS One. 2013;8(5):e64832. Epub 2013/06/07. doi: 10.1371/journal.pone.0064832. [DOI] [PMC free article] [PubMed] [Google Scholar]

Supplementary Materials

Supplement 1
media-1.xlsx (43.5KB, xlsx)

Data Availability Statement

SEAHORSE is available as a web-based query tool (https://seahorse.networkmedicine.org/), and a Jupyter notebook titled “Uncovering Associations among Genes and Phenotypes with SEAHORSE” is available through Netbooks v2.4.1 (https://netbooks.networkmedicine.org). Full results of the functional enrichment analyses, using height as a query for correlated gene expression in GTEx tissues and age as a query for correlated gene expression in TCGA tumors, are available through the Harvard Dataverse (https://dataverse.harvard.edu/dataverse/SEAHORSE). All software code for calculating the correlations is written in R and is available under the MIT License at https://github.com/Enakshi-Saha/netZooR/tree/SEAHORSE.


Articles from bioRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
