Abstract
Human complex traits and common diseases show tissue- and cell-type- specificity. Recently, single-cell RNA sequencing (scRNA-seq) technology has successfully depicted cellular heterogeneity in human tissue, providing an unprecedented opportunity to understand the context-specific expression of complex trait-associated genes in human tissue-cell types (TCs). Here, we present the first web-based application to quickly assess the cell-type-specificity of genes, named Web-based Cell-type Specific Enrichment Analysis of Genes (WebCSEA, available at https://bioinfo.uth.edu/webcsea/). Specifically, we curated a total of 111 scRNA-seq panels of human tissues and 1,355 TCs from 61 different general tissues across 11 human organ systems. We adapted our previous decoding tissue-specificity (deTS) algorithm to measure the enrichment for each tissue-cell type (TC). To overcome the potential bias from the number of signature genes between different TCs, we further developed a permutation-based method that accurately estimates the TC-specificity of a given inquiry gene list. WebCSEA also provides an interactive heatmap that displays the cell-type specificity across 1355 human TCs, and other interactive and static visualizations of cell-type specificity by human organ system, developmental stage, and top-ranked tissues and cell types. In short, WebCSEA is a one-click application that provides a comprehensive exploration of the TC-specificity of genes among human major TC map.
Graphical Abstract
INTRODUCTION
When we prioritize a list of risk genes from complex trait and common disease studies, we always consider the context in which these genes might manifest their impact on human tissue-cell types. With the rapid development of single-cell technology, millions of cellular transcriptomes have been generated by multiple international consortia [e.g. Human Cell Atlas (Fetal) (1), Tabula Sapiens (2) and Human Cell Landscape (3–5)] to characterize the cellular heterogeneity in human major tissues and organ systems among different developmental stages. To harness these large-scale heterogeneous data, there is a pressing need for advanced computational methods and web services that could characterize the signature genes of human cell types in different contexts and, reversely, assess the cell types for which a particular gene set may be enriched. To the best of our knowledge, there are a few databases that provide comprehensive cell marker curation, including PanglaoDB (6), CellMarker (7), CellKb (8) and GeneMarkeR (9). However, these methods and tools were designed mainly for curating the marker genes for each cell type, and some of these databases either require a commercial license or have been out of date for several years. Among them, only PanglaoDB provides a reverse search for the cell type enriched by input genes with limited characters. The Specific Expression Analysis (SEA) tool only provided enrichment analysis by using RNA-seq or microarray data from mouse and human central nervous system (http://genetics.wustl.edu/jdlab/csea-tool-2/). Currently, no online web server systematically provides comprehensive human TCs by leveraging scRNA-seq data. Thus, novel methods and tools are in demand to explore tissue-cell-type- (TC-) and context-specificity of gene lists derived from diverse scenarios, e.g. genetic studies of complex disease, differentially expressed genes from bulk high-throughput profiling, previous tissue-specific curation, functional pathway/terms, among others. To fill this gap, we have published a series of works in decoding tissue- and cell-type- specificity of genes, including algorithm development (10,11) as well as databases curation for the large-scale well-annotated transcriptome data at both tissues and cell type level (12–15).
In this work, we aim to provide a user-friendly interactive platform for a broad set of users or investigators to explore the cellular context of any given gene list, like the gene-set enrichment analysis for Gene Ontology and the KEGG pathway (Figure 1). The accurate assessment of context-specificity requires us to tackle the following challenges: (i) comprehensive collection of human tissue-cell type panels that covers the major human organ systems, tissues, cell types and different developmental stages; (ii) we need to make our cell-type specificity enrichment analysis (CSEA) results comparable among different tissue cell types, considering potential biases due to different lengths of signature genes among tissue-cell types; (iii) we need to consider the extent of reliability for our WebCSEA results. We benchmarked our tool with a couple of known tissue- and cell-type-specific signature genes to evaluate the performance.
MATERIALS AND METHODS
Human scRNA-seq tissue panel collection and ontology annotation
Human bodies are composed of 11 major organ systems, ∼100 organs/tissues and >100 unique cell types or thousands of sub-cell types (16). To unveil the cellular heterogeneity of these diverse human tissues, we harnessed human organ transcriptome profiling recently generated by leading consortiums such as Human Cell Atlas (1), Human Cell Landscape (5), and Tabula Sapiens (2,12), and a few representative studies (3,4,17–22). Overall, we collected >5.5 million cells from 111 tissue panels and 1355 tissue-cell types (TCs), which belong to 61 general tissues from different developmental stages across all 11 human organ systems (16) (Figure 2A–C). Given the diverse cell types between the central nervous system and sensory nervous system, we split them into two different systems for visualization. Hence, we have grouped the 111 tissue panels into 12 organ systems in our WebCSEA platform (Figure 2A). Since we adapted original cell type annotations from each panel in our enrichment analysis, the same tissue-cell type (TC) might be potentially annotated to different cell type names in different studies. Therefore, we created general traceable ontology IDs for each tissue and cell type to make TCs from different studies comparable. We adapted our previous strategy used in CSEA-DB to annotate the tissues using the Uberon anatomy ontology (23) and cell type Ontology ID with Cell Ontology (https://www.ebi.ac.uk/ols/ontologies/cl, accessed 2/15/2022) (24,25). Details of the process are described in the Methods section of our previous work (13).
We filtered out lowly expressed genes and used our in-house t-statistic-based method to identify the tissue-cell-type signature genes (11). Notably, there are 25 tissues with at least two panels from different developmental stages or studies, including those tissues targeted by disease with high prevalence, such as lung (6 panels), pancreas (6 panels), liver (5 panels), heart (4 panels) and eye (4 panels). Due to the sampling issue, single-cell RNA-seq data may miss some cell types of the tissue. These ‘redundant’ tissues in our curation provide an opportunity for both validating the findings and comprehensive observation.
Application of cell type-specific enrichment analysis
We modified our previously developed ‘tissue-specific enrichment analysis’ and applied it to explore the cell type-specificity within each tissue. Details of methods could be found in the Method part of our previous work (11,13). Briefly, we used the log2 (CPM + 1) normalized single-cell transcriptome matrix to calculate the cell type-specific expression within the cell types (number of cells ≥ 30) in each tissue. Then, we built a linear regression model to measure the TC-specificity of i gene in j cell type by using the coefficient (t-statistics) of each gene. We defined the top 5% t-statistic score genes in focal cell type as the cell-type-specific genes. Lastly, Fisher's exact test was conducted to assess whether the trait-associated gene (TAG) set from each trait is overrepresented with the cell type-specific genes, where the P-value indicates the significance of this cell-type specificity enrichment analysis (CSEA).
Permutation-based cell-type specificity test using genetic TAGs
To address the common biased problem raised by different lengths of signature genes and input gene lists during the enrichment analysis, we developed a permutation-based method to adjust the P-value from overrepresentation analysis. One straightforward approach will be sampling the random gene sets purely from all genes expressed in each TC and calculating CSEA for them thousands and millions of times to obtain the permutated P-values. However, this method will require a lot of computational resources and surfer from the inflation of Type I error rate since the null hypothesis that TC-specific enrichment is no better than random sampling results is relatively loose and easy to reject and, therefore, inappropriate. Instead, we decided to define a more biologically meaningful null hypothesis for assessing the tissue and cell-type specificity, given the nature that trait-associated gene sets (TAGs) from human complex traits and diseases have ‘moderate’ tissue- and cell-type- specificity among different human tissue-cell types (26). Intuitively, if one gene list has a CSEA raw P-value ranked as among the top of the P-values from CSEA for large-scale random TAGs, then we reject the null hypothesis that this gene list has no enrichment with this tissue cell type.
Our curation for TAGs has two major components, European Ancestry genome-wide association studies (GWAS) from our long-term curation of GWAS catalog described in TSEA-DB and CSEA-DB (13,15), and rare-variants gene-based association phenotypes from UK biobank (27).
For GWAS TAGs, we used Multi-marker Analysis of GenoMic Annotation (MAGMA v1.07) (28) to calculate the gene-level P-values. Briefly, we used all SNPs within the 50 kb upstream and 35 kb downstream window of each gene region and then calculated the mean χ2 statistic for these SNPs to obtain gene-based P-values. Lastly, we accounted for the effects of the gene length, SNP density, and local linkage disequilibrium structure (1000 Genome Project Phase 3 European population reference panel). We used a dynamic gene-based P-value threshold to generate the TAGs for >5600 GWAS from our previous curation and GWAS catalog accessed by 25 November 2021 (13,29). Consistent with the input gene length, we further limited the number of genes in TAGs from 20 to 2000. In the end, we collected 17 807 GWAS TAGs.
For the TAGs from rare-variants gene-based associations, we obtained Hail matrix table format (https://github.com/hail-is/hail/releases/tag/0.2.13) results from Genebass (gene-biobank association summary statistics) database, which contains associations for over 3700 phenotypes from 281 850 individuals in the UK Biobank (27). Considering different scenarios of causal variant effects, they conducted the kernel association test (SKAT test) (30), SKAT optimal unified test (SKAT-O test) (31), and burden test (30) to assess three types of rare variants (loss of function (LoF), missense and synonymous) in each gene. We used the normal significant 1 × 10–5P-value thresholds for the burden test, SKAT test, and SKAT-O test. After filtering the gene length ranges from 20 to 2000, we collected 1856 TAGs from rare-variants gene-based associations.
We conducted >26 million times of CSEA for 19 663 TAGs over 1355 TCs as precalculated permutations for backend analysis (Figure 1). To assess the distribution of these permutated P-values, we converted these P-values to Z score by the cumulative distribution function. As shown in Figures 2D and E, the largest and smallest Z scores in the organ systems are the lymphatic system (2.81) and skeletal system (0.93), respectively. Similarly, the longest and shortest mean panel lengths are the lymphatic system (1513.45) and skeletal system (764.38), respectively. Thus, we observed a positive correlation between the mean Z score by organ system and the panel signature length by organ system, confirming the aforementioned bias from the signature length in each TC. Therefore, within each TC, the relative rank among these 19 663 TAG P-values was used to depict the TC-specificity.
We followed the formulas below for each inquiry gene list and conduct the CSEA across 1355 TCs to get the permutated P-values by ranking among the P-values in 19 663 TAGs in each TCs. Then, we used Fisher's method to combine an overall P-value from the permutated P-values by TAGs and TCs.
(1) |
(2) |
(3) |
Specifically, i indicates the ith TC, where Praw(i) is the hypergeometric test of inquiry gene list and TC-specific genes by our previous method decoding tissue-specificity (deTS); the cumulative factor is an empirical parameter used to adjust the TAGs length difference. Here we used the proportion of TAGs in a total of 19 663 TAGs with at least the length of input gene list; P#TAG(i) represents the vector of permutated CSEA P-values for 19 663 TAGs; Then, the Praw(i) is further ranked overall the 1355 CSEA P#TC(i) in ith TC. Lastly, we used Fisher's method to combine P-values from Pperm(i) and PTC(i) in ith TC into one chi-squared distribution statistic (X2). The sum of two independent chi-squared values follows a chi-squared distribution with 2*2 degrees of freedom.
Web server specifications
Input
WebCSEA only requires a gene symbol list as the input, which could take up to 3 min for accomplishing the job for 1000 genes (Figure 3A). A job identifier is provided for each job for the user to retrieve and share the results. After the user input the gene list, the system will automatically count the genes mapped to our background reference gene symbols curated from GRCh38 Ensembl version 99. We provide three basic function buttons to submit a job, reset input, and use an example gene list, respectively. To get a meaningful cell-type-specific analysis, we suggest the users input a gene list with a length ranging from 20 to 2000.
Output
We provide the user an instant overview of top TC-specificity results of a query gene list by jitter plots the enrichment results for both raw P-value and combined P-value among the 1355 TCs by five categories of stratification, including adult organ system, organ system, fetal organ system, top 20 tissues, and top 20 general cell types from up to bottom (Figure 3B) in both interactive and static plots. The interactive jitter plot function allows the user to explore the specific tissue-cell-type-specificity freely for both raw and combined P-values. Specifically, we provided several input options to display the cell-type-specificity enrichment results, including organ systems, adult and fetal developmental stages, top 20 tissues, and top 20 general cell types. When the cursor is placed at each dot, the dots in that specific column will all be highlighted and the detailed tissue cell type information, as well as the –log10(P) in that specific category, will be displayed in the floating window. The user could further download the corresponding plot in JPEG, SVG and PDF format. The users could further highlight the tissue-cell type of interest on those downloaded files. For the static jitter plots, we annotated the top 5 TCs in each category of stratification. We highlighted the different components from each category by color. We set two different thresholds to indicate the significance. The red dashed line is Bonferroni-corrected significance (P = 3.69 × 10–5) by 1355 tissue-cell types. The solid grey line is the nominal significance (P = 1 × 10–3). The top 3 enriched TCs in combined P-value were listed with their intersected genes between the input gene list and the corresponding cell type gene signatures (Figure 3C). Lastly, we used an interactive P-value heatmap to display the enrichment result in the overview model or tissue panel model (Figure 3D). The user could easily browse and filter the detailed P-values of the top-ranked tissue panels or tissue of their interest. More importantly, the user could click the heatmap to obtain the intersected genes of input genes and selected TCs. Lastly, we also provide the download link for the full results and the Github repository to reproduce or customize the visualization.
Web service design
The web server runs on a Linux server equipped with CentOS 7 and Apache (version 2.4). The server has four CPUs [Intel(R) Xeon(R) E5-2637 v3] with 128GB memory and an 8TB hard disk to support computational tasks. Our WebCSEA follows the Model-View-Controller (MVC) web design framework. The model is the main calculation function on the backend implemented in the R programming language utilizing the deTS R package. The calculation function takes the input data submitted and generates the result files. The views are the frontend interfaces that were implemented using HTML5, CS and JavaScript. In WebCSEA, the view pages are designed to guide users to input their data properly, show job running status, display feedback including practical errors, and present results when the job finishes successfully. The controller leveraged Ajax technology to validate the inputs, submit jobs, check running status, retrieve result data from the server-side, and update the information in views without refreshing the web page by utilizing the jQuery library. PHP was also used as an auxiliary function for data display. The MVC design provides a clear separation of logic, making the testability frictionless and allowing us to extend functions and add new features easily in the future. We designed a queue strategy to organize the submitted jobs to avoid server overloading or task jams. Each user can submit as many jobs as necessary. WebCSEA allows a maximum of 5 jobs running in parallel, and other jobs will be automatically added to a queue.
Benchmark examples and context of application
We provided extensive examples and applications in the ‘Tutorial’ pages. There, we evaluated the performance of our WebCSEA by using a few ground truth gene lists from cell-type signature genes, disease risk genes with unknown onset tissues and cell-type-specific pathways. As shown in Figure 4A, we obtained the 110 hepatocyte-specific expressed genes from the cell-type signature genes database PanglaoDB (6). We could observe the top four dots were all annotated as hepatocytes from four different scRNA-seq panels, suggesting the high accuracy and consistency of our WebCSEA. Moreover, this also indicated our WebCSEA could help to validate cell type signature genes. Figure 4B shows 119 eye-diseases related genes from the RetNet (https://sph.uth.edu/retnet/). We could see a peak in the Sensory system only, aligning with the disease context. In Figure 4C, we used the 22 gold standard type 2 diabetes GWAS risk genes from T2D Knowledge Portal (https://t2d.hugeamp.org/). The top 3 significant enriched TCs are small intestine enterocyte, pancreas pancreatic beta cell, and fetal pancreas islet endocrine cells, respectively. These findings indicate that WebCSEA could identify the disease-relevant tissue and cell types by using a limited number of gold standard genes. Next, we raised a few scenarios in which the users could use the WebCSEA to explore the TC-specificity of focal gene lists, including differentially expressed genes (DEG) from Bulk RNA-seq, tissue-marker genes, and developmental stage-specific genes. In Figure 4D, we conducted the CSEA for the 190 human homologues derived from the DEG analysis of pancreatic ductal adenocarcinoma mouse with CD73 inhibitor AB680 treatment between controls (32). We identified two major cell types (endothelial cell and stromal cell) enriched in the pancreas, which shed light on exploring the drug mechanism at the cellular level. As shown in Figure 4E, we adapted the 104 signature genes of TCGA lung adenocarcinoma (LUAD) from tissue-specific gene database in cancer database (TissGDB) (33). We successfully identified the tissue marker genes were mainly contributed by Alveolar Type 2 and Alveolar Type 1 cells in multiple tissue panels. Figure 4F shows the CSEA result of 471 Kidney developmental genes derived from the previous study (34). We identified epithelial progenitor and metanephric cells highlighted in the fetal stage, which could help validate the developmental stage of this gene list. In addition, WebCSEA accurately explores TC-specific gene ontology pathways. We identified neurons that were enriched with Excitatory neuron-specific pathway (Figures 3B, C) curated from the previous studies (35,36).
CONCLUSION
WebCSEA (Web-based Cell-type Specific Enrichment Analysis of Genes) provides a gene set enrichment query among a systematic collection of human tissue-cell type expression signatures with manual ontology curation. We leverage our deTS algorithm over 1355 human tissue-cell types and generate an enrichment P-value to assess their specificity for each query gene list. We further calculate a permutation-test P-value with ∼20 000 previously calculated P-values derived from TAGs of diverse human diseases and traits, which provides us a natural reference for the goodness of enrichment adjusted with the length difference in signature genes of different cell types. We provide extensive visualization functions such as interactive heatmap, and interactive and static jitter plots to display the cell-type specificity across 1355 human TCs by human organ system, developmental stage, and top-ranked tissues and cell types. We also provide intersected genes of the most enriched tissue-cell types as well as any tissue-cell types of interest by clicking the interactive heatmap. Furthermore, all the results and visualization in WebCSEA could be downloaded and reproduced. Users can filter, prioritize and customize the visualization of CSEA results with the download materials and script on Github. Lastly, we validated the WebCSEA performance using gold-standard gene lists and obtained high accuracy and consistency. In summary, WebCSEA platform provides a quick, accurate, and comprehensive assessment for exploring and validating the context- and TC-specificity of focal gene list freely. Moreover, WebCSEA results might shed light on identifying unknown pleiotropic effects of genes on some ‘unexpected’ TCs, which potentially help to explain the comorbidities of complex diseases that have shared genetic components and disease genes (37). Just like the gene-set enrichment analysis, we expect WebCSEA to serve the broad users in both research and clinical community to explore the biological context of human genes.
There are a few limitations that could be improved upon in the future. Although we covered 61 human tissues across fetal, neonatal, and adult stages, many tissues and contexts still have not been included. With the exponential growth of single-cell research, we expect a more comprehensive collection of tissues to be generated and available in our WebCSEA. We are aware that the raw P-value will become more significant along with the increasing length of the input gene list and the incidence of false-positive results. Hence, we utilized the weak tissue-specific TAG permutations to account for the ‘inflated’ P-values in TCs with long signature genes. However, this correction might be too stringent when the input gene list is small (∼20 genes). Thus, we suggest that users take both raw P-value and permutated P-value into consideration to depict the TC-specificity. Lastly, the significant enrichment result from WebCSEA could only be interpreted as an association among thousands of TCs. We expect these preliminary results could inspire and support the users to conduct follow-up studies to further uncover the cellular context of the given gene list.
DATA AVAILABILITY
All the data generated or analyzed in this study are available from the authors, upon request. The enrichment method deTS has been deposited in CRAN repository and all the plots from the WebCSEA could be reproduced with the code on Github repertoire https://github.com/davidroad/WebCSEA.
ACKNOWLEDGEMENTS
The authors thank Rui Chen, Sangbae Kim, Kaitlyn Xiong, and Meng Wang from Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine for giving valuable suggestions. We would like to thank all the members of Bioinformatics and Systems Medicine Laboratory (BSML) for helpful discussion.
Contributor Information
Yulin Dai, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Ruifeng Hu, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Center for Advanced Parkinson Research, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Genomics and Bioinformatics Hub, Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA.
Andi Liu, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Kyung Serk Cho, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Astrid Marilyn Manuel, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Xiaoyang Li, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Xianjun Dong, Center for Advanced Parkinson Research, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA; Genomics and Bioinformatics Hub, Department of Neurology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA.
Peilin Jia, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.
Zhongming Zhao, Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA; MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX 77030, USA.
FUNDING
National Institutes of Health [R01LM012806, R01DE030122, R01DE029818 to Z.Z.]; Cancer Prevention and Research Institute of Texas [CPRIT RP180734 and RP210045]; A.L. is supported by a training fellowship from the Gulf Coast Consortia on Training in Precision Environmental Health Sciences (TPEHS) Training Grant [T32ES027801]; A.M.M. is supported by a training fellowship from the Gulf Coast Consortia, on the National Library of Medicine (NLM) Training Program in Biomedical Informatics & Data Science [T15LM007093]. Funding for open access charge: National Institutes of Health [R01LM012806].
Conflict of interest statement. None declared.
REFERENCES
- 1. Regev A., Teichmann S.A., Lander E.S., Amit I., Benoist C., Birney E., Bodenmiller B., Campbell P., Carninci P., Clatworthy M. et al. The human cell atlas. Elife. 2017; 6:e27041. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Jones R.C., Karkanias J., Krasnow M.A., Pisco A.O., Quake S.R., Salzman J., Yosef N., Bulthaup B., Brown P., Harper W. et al. The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans. Science. 2022; 376:eabl4896. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Madissoon E., Wilbrey-Clark A., Miragaia R.J., Saeb-Parsy K., Mahbubani K.T., Georgakopoulos N., Harding P., Polanski K., Huang N., Nowicki-Osuch K. et al. scRNA-seq assessment of the human lung, spleen, and esophagus tissue stability after cold preservation. Genome Biol. 2019; 21:1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Vieira Braga F.A., Kar G., Berg M., Carpaij O.A., Polanski K., Simon L.M., Brouwer S., Gomes T., Hesse L., Jiang J. et al. A cellular census of human lungs identifies novel cell states in health and in asthma. Nat Med. 2019; 25:1153–1163. [DOI] [PubMed] [Google Scholar]
- 5. Han X., Zhou Z., Fei L., Sun H., Wang R., Chen Y., Chen H., Wang J., Tang H., Ge W. et al. Construction of a human cell landscape at single-cell level. Nature. 2020; 581:303–309. [DOI] [PubMed] [Google Scholar]
- 6. Franzén O., Gan L.M., Bjorkegren J.L.M. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database (Oxford). 2019; 2019:baz046. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Zhang X., Lan Y., Xu J., Quan F., Zhao E., Deng C., Luo T., Xu L., Liao G., Yan M. et al. CellMarker: a manually curated resource of cell markers in human and mouse. Nucleic Acids Res. 2019; 47:D721–D728. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Patil A., Patil A. CellKb immune: a manually curated database of mammalian hematopoietic marker gene sets for rapid cell type identification. 2022; bioRxiv doi:13 February 2022, preprint: not peer reviewed 10.1101/2020.12.01.389890. [DOI]
- 9. Paisley B.M., Liu Y. GeneMarkeR: a database and user interface for scRNA-seq marker genes. Front Genet. 2021; 12:763431. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Pei G., Yan F., Simon L.M., Dai Y., Jia P., Zhao Z. deCS: a tool for systematic cell type annotations of single-cell RNA sequencing data among human tissues. Genomics, Proteomics & Bioinformatics. 2022; 10.1016/j.gpb.2022.04.001. [DOI] [PubMed] [Google Scholar]
- 11. Pei G., Dai Y., Zhao Z., Jia P. deTS: tissue-specific enrichment analysis to decode tissue specificity. Bioinformatics. 2019; 35:3842–3845. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Battle A., Brown C.D., Engelhardt B.E., Montgomery S.B. Genetic effects on gene expression across human tissues. Nature. 2017; 550:204–213. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Dai Y., Hu R., Manuel A.M., Liu A., Jia P., Zhao Z. CSEA-DB: an omnibus for human complex trait and cell type associations. Nucleic Acids Res. 2021; 49:D862–D870. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Davis C.A., Hitz B.C., Sloan C.A., Chan E.T., Davidson J.M., Gabdank I., Hilton J.A., Jain K., Baymuradov U.K., Narayanan A.K. et al. The encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 2018; 46:D794–D801. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Jia P., Dai Y., Hu R., Pei G., Manuel A.M., Zhao Z. TSEA-DB: a trait-tissue association map for human complex traits and diseases. Nucleic Acids Res. 2020; 48:D1022–D1030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16. Tortora G.J., Derrickson B.H. Introduction to the Human Body. 2017; John Wiley & Sons. [Google Scholar]
- 17. Enge M., Arda H.E., Mignardi M., Beausang J., Bottino R., Kim S.K., Quake S.R. Single-cell analysis of human pancreas reveals transcriptional signatures of aging and somatic mutation patterns. Cell. 2017; 171:321–330. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Lake B.B., Chen S., Sos B.C., Fan J., Kaeser G.E., Yung Y.C., Duong T.E., Gao D., Chun J., Kharchenko P.V. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 2018; 36:70–80. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. MacParland S.A., Liu J.C., Ma X.Z., Innes B.T., Bartczak A.M., Gage B.K., Manuel J., Khuu N., Echeverri J., Linares I. et al. Single cell RNA sequencing of human liver reveals distinct intrahepatic macrophage populations. Nat. Commun. 2018; 9:4383. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Segerstolpe A., Palasantza A., Eliasson P., Andersson E.M., Andreasson A.C., Sun X., Picelli S., Sabirsh A., Clausen M., Bjursell M.K. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 2016; 24:593–607. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. van Zyl T., Yan W., McAdams A., Peng Y.R., Shekhar K., Regev A., Juric D., Sanes J.R. Cell atlas of aqueous humor outflow pathways in eyes of humans and four model species provides insight into glaucoma pathogenesis. Proc. Natl. Acad. Sci. U.S.A. 2020; 117:10339–10349. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. Voigt A.P., Mulfaul K., Mullin N.K., Flamme-Wiese M.J., Giacalone J.C., Stone E.M., Tucker B.A., Scheetz T.E., Mullins R.F. Single-cell transcriptomics of the human retinal pigment epithelium and choroid in health and macular degeneration. Proc. Natl. Acad. Sci. U.S.A. 2019; 116:24100–24107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Mungall C.J., Torniai C., Gkoutos G.V., Lewis S.E., Haendel M.A. Uberon, an integrative multi-species anatomy ontology. Genome Biol. 2012; 13:R5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Bard J., Rhee S.Y., Ashburner M. An ontology for cell types. Genome Biol. 2005; 6:R21. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Meehan T.F., Masci A.M., Abdulla A., Cowell L.G., Blake J.A., Mungall C.J., Diehl A.D. Logical development of the cell ontology. BMC Bioinformatics. 2011; 12:6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Watanabe K., Umicevic Mirkov M., de Leeuw C.A., van den Heuvel M.P., Posthuma D. Genetic mapping of cell type specificity for complex traits. Nat. Commun. 2019; 10:3222. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Karczewski K.J., Solomonson M., Chao K.R., Goodrich J.K., Tiao G., Lu W., Riley-Gillis B.M., Tsai E.A., Kim H.I., Zheng X. et al. Systematic single-variant and gene-based association testing of 3,700 phenotypes in 281,850 UK biobank exomes. 2021; medRxiv doi:23 June 2021, preprint: not peer reviewed 10.1101/2021.06.19.21259117. [DOI] [PMC free article] [PubMed]
- 28. de Leeuw C.A., Mooij J.M., Heskes T., Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 2015; 11:e1004219. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Buniello A., MacArthur J.A.L., Cerezo M., Harris L.W., Hayhurst J., Malangone C., McMahon A., Morales J., Mountjoy E., Sollis E. et al. The NHGRI-EBI GWAS catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019; 47:D1005–D1012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Wu M.C., Lee S., Cai T., Li Y., Boehnke M., Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 2011; 89:82–93. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Lee S., Emond M.J., Bamshad M.J., Barnes K.C., Rieder M.J., Nickerson D.A.Team, N.G.E.S.P.-E.L.P. Team, N.G.E.S.P.-E.L.P. Christiani D.C., Wurfel M.M., Lin X Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 2012; 91:224–237. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32. Singh K., Faraoni E.Y., Dai Y., Chandra V., Vucic E., Mills T., Pruski M., Clark T., Van Buren G., Thosani N.C. et al. Pancreatic cancer ductal cell of origin drives CD73-dependent generation of immunosuppressive adenosine. 2021; bioRxiv doi:30 November 2021, preprint: not peer reviewed 10.1101/2021.11.29.470415. [DOI]
- 33. Kim P., Park A., Han G., Sun H., Jia P., Zhao Z. TissGDB: tissue-specific gene database in cancer. Nucleic Acids Res. 2018; 46:D1031–D1038. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 34. Wang P., Chen Y., Yong J., Cui Y., Wang R., Wen L., Qiao J., Tang F. Dissecting the global dynamic molecular profiles of human fetal kidney development by single-cell RNA sequencing. Cell Rep. 2018; 24:3554–3567. [DOI] [PubMed] [Google Scholar]
- 35. Ruano D., Abecasis G.R., Glaser B., Lips E.S., Cornelisse L.N., de Jong A.P., Evans D.M., Davey Smith G., Timpson N.J., Smit A.B. et al. Functional gene group analysis reveals a role of synaptic heterotrimeric g proteins in cognitive ability. Am. J. Hum. Genet. 2010; 86:113–125. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Lips E.S., Cornelisse L.N., Toonen R.F., Min J.L., Hultman C.M.International Schizophrenia, Consortium International Schizophrenia, Consortium Holmans P.A., O’Donovan M.C., Purcell S.M., Smit A.B. et al. Functional gene group analysis identifies synaptic gene groups as risk factor for schizophrenia. Mol. Psychiatry. 2012; 17:996–1006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37. Shikov A.E., Skitchenko R.K., Predeus A.V., Barbitoff Y.A. Phenome-wide functional dissection of pleiotropic effects highlights key molecular pathways for human complex traits. Sci. Rep. 2020; 10:1037. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Data Availability Statement
All the data generated or analyzed in this study are available from the authors, upon request. The enrichment method deTS has been deposited in CRAN repository and all the plots from the WebCSEA could be reproduced with the code on Github repertoire https://github.com/davidroad/WebCSEA.