eLife. 2024 Jan 10;13:e84613. doi: 10.7554/eLife.84613

Affected cell types for hundreds of Mendelian diseases revealed by analysis of human and mouse single-cell data

Idan Hekselman 1, Assaf Vital 1, Maya Ziv-Agam 1, Lior Kerber 1, Ido Yairi 1, Esti Yeger-Lotem 1,2
Editors: Bogdan Pasaniuc3, Detlef Weigel4
PMCID: PMC10830129  PMID: 38197427

Abstract

Mendelian diseases tend to manifest clinically in certain tissues, yet their affected cell types typically remain elusive. Single-cell expression studies showed that overexpression of disease-associated genes may point to the affected cell types. Here, we developed a method that infers disease-affected cell types from the preferential expression of disease-associated genes in cell types (PrEDiCT). We applied PrEDiCT to single-cell expression data of six human tissues, to infer the cell types affected in Mendelian diseases. Overall, we inferred the likely affected cell types for 328 diseases. We corroborated our findings by literature text-mining, expert validation, and recapitulation in mouse corresponding tissues. Based on these findings, we explored characteristics of disease-affected cell types, showed that diseases manifesting in multiple tissues tend to affect similar cell types, and highlighted cases where gene functions could be used to refine inference. Together, these findings expand the molecular understanding of disease mechanisms and cellular vulnerability.

Research organism: Human, Mouse

Introduction

Hereditary diseases affect 6% of the world's population and typically lack a cure. Identification of their genetic and molecular basis is challenging, limiting their diagnosis and treatment (Ferreira, 2019). Knowledge of tissues and cell types that manifest with pathophysiological changes in patients was shown to facilitate the understanding of disease mechanisms (Hekselman and Yeger-Lotem, 2020). For example, transcriptomic analysis of muscle tissues helped to genetically diagnose patients with rare muscle disorders (Cummings et al., 2017). Likewise, identification of a rare cell type affected by cystic fibrosis opened new avenues for treatment (Montoro et al., 2018; Plasschaert et al., 2018). However, whereas affected tissues may be evident for many Mendelian diseases, the exact cell types that manifest with pathophysiological changes within affected tissues are often unknown.

Owing to massive single-cell profiling of mammalian tissues (Jones et al., 2022), investigation of diseases in cellular contexts has become feasible. In particular, it was shown that genes whose aberration leads to disease (i.e., disease genes) tend to be expressed preferentially in cell types that express pathology (denoted disease-affected cell types). For instance, 21/29 mouse homologs of human disease genes associated with nephrotic syndrome were shown to be upregulated in podocytes, the disease-relevant renal cell type (Park et al., 2018). In an inspiring manner, cell-type-specific expression of the cystic fibrosis gene CFTR in ionocytes of human and mouse airways revealed their role in cystic fibrosis (Montoro et al., 2018; Plasschaert et al., 2018). Likewise, the preferential expression of disease genes for 18 Mendelian muscle disorders revealed disease-affected muscle cell types (Eraslan et al., 2022). Thus, preferential expression of disease genes may serve as an indicator of disease-affected cell types, and has the potential to shed light on the cellular mechanisms that underlie Mendelian diseases.

Preferential expression of disease- and trait-associated genes has been used to illuminate cell types affected by complex traits in previously published studies (Dai et al., 2021; Eraslan et al., 2022; Jagadeesh et al., 2021; Kim-Hellmuth et al., 2020; Rouhana et al., 2021; Zhang et al., 2022). Rouhana et al., 2021, presented the ECLIPSER method, which tested whether the expression of genes mapped to trait-associated loci was enriched in specific cell types of a given tissue, when compared to a background set of genes associated with unrelated traits. The CSEA-DB repository applied a cell-type-specific enrichment analysis (CSEA): They assessed cell-type specificity of genes using t-statistics, and then tested whether the top 5% cell-type-specific genes were overrepresented in trait-associated genes (Dai et al., 2021). Eraslan et al., 2022, associated traits to relevant cell types by defining cell-type-specific gene modules and assessing their overlap with trait-associated genes. Zhang et al., 2022, introduced a polygenic single-cell disease relevance score (scDRS), which identified cells exhibiting excess expression of trait-associated genes. Interestingly, expression of trait-associated genes was weighted by trait-association and inversely by technical noise level in single-cell expression data, and the resulting polygenic scDRS score was compared to the empirical distribution of control gene sets and all cells. Jagadeesh et al., 2022, presented sc-linker, which constructed continuous-valued gene sets that were differentially expressed in a cell type from healthy samples versus other cell types or versus the same cell type from disease samples, and then linked them to trait-related SNPs. Additional studies compared the differential expression of genes in cell types of diseased versus healthy tissues (Mathys et al., 2019; Segerstolpe et al., 2016) and showed that many trait-associated genes were cell-type specific (Smillie et al., 2019). Guan et al., 2021, defined cell-type-specific gene interaction modules, which were overlapped with disease-associated genes. Lastly, the SC2disease database compared gene expression differences between cell types of diseased and healthy samples, among cell types of diseased samples, and among cell types of samples with varying degrees of disease severity (Zhao et al., 2021). However, these efforts rarely validated or corroborated the inferred associations extensively.

Here, we developed a method that infers disease-affected cell types from the Preferential Expression of Disease genes in Cell Types (PrEDiCT). The summarized preferential expression of disease genes was compared to empirical distribution of control gene sets in the same cell type, allowing for the inference of associations between diseases and likely affected cell types. We applied PrEDiCT to 1,140 Mendelian diseases that manifest in six human tissues and inferred likely affected cell types for 328 diseases. We corroborated our findings by text-mining of PubMed records, expert curation, and by their recapitulation in mice. The resulting scheme and large-scale resource allowed us to show that diseases manifesting in multiple tissues tend to affect similar cell types, to refine inference of disease-affected cell types by focusing on gene functions, and to explore characteristics of disease-affected cell types.

Results

Preferential expression of disease genes indicates disease-affected cell types

To identify the cell types affected by Mendelian diseases with known disease genes, we developed the PrEDiCT scoring scheme (Figure 1A). Below we describe the PrEDiCT workflow, including data acquisition, PrEDiCT score calculation, and validation.

Figure 1. Overview of PrEDiCT calculation and assessment.

(A) The PrEDiCT workflow. In step 1, we analyzed single-cell expression data from six human tissues and 129 cell types, and associated 1,140 Mendelian diseases with their affected tissues. In step 2, we calculated the preferential expression of disease genes in cell types of disease-affected tissues, used their median to produce the PrEDiCT score per disease and cell type, and assessed significance of each score. In step 3, we validated likely disease–cell-type associations (i.e., PrEDiCT ≥1, FDR <0.1) via literature text-mining, expert curation, and analysis of mouse single-cell expression data. (B) The distribution of PrEDiCT scores in human (median –0.25±0.93). (C) The preferential expression of genes causal for primary ciliary dyskinesia (PCD) and the PrEDiCT scores of PCD in lung cell types. Preferential expression values and the percentage of cells expressing a gene are indicated by the color and the size of each circle, respectively. The resulting PrEDiCT score is displayed on the right, colored by the score value. Bold outline marks likely disease-affected cell types. (D) PrEDiCT scores of Heinz body anemias across human bone marrow cell types depicted as described in panel C. (E) PrEDiCT scores of mitochondrial complex deficiencies across human skeletal muscle cell types depicted as described in panel C. Mitochondrial complex deficiencies were likely to affect slow and fast muscle cells, except for mitochondrial complex II deficiency whose PrEDiCT scores were highest yet insignificant in these cell types (FDR = 0.61 and 0.23, respectively). (F) False-positive (red) and false-negative (orange) rates of disease–cell-type associations (y-axis) across FDR thresholds (x-axis). Rates were estimated based on literature-supported pairs. The dashed line marks the FDR cutoff for likely associations. 
(G) The overlap between likely disease–cell-type associations (total of 489, left circle) and literature-supported associations (total of 229, right circle) out of all 34,249 possible associations. The overlap of 41 associations was significant (p<E-15, Fisher’s exact test), supporting the validity of likely associations.

Figure 1—source data 1. The Source Data file contains data used to generate Figure 1B-G.

Figure 1—figure supplement 1. Disease-affected tissues.

Bar plot of the number of diseases per affected tissue that were either associated with any likely affected cell type (red) or not (orange).
Figure 1—figure supplement 1—source data 1. The Source Data file contains the number of diseases with and without likely affected cell types per tissue.
Figure 1—figure supplement 2. The PrEDiCT scheme.

To calculate PrEDiCT scores, we first calculated the preferential expression of disease genes in each cell type relative to other cell types of the disease-affected tissue. Next, we set the PrEDiCT score of a disease in each cell type to the median preferential expression of disease genes, and applied permutation tests to assess PrEDiCT score statistical significance. Cell types with significant PrEDiCT scores ≥1 were considered as likely affected.
Figure 1—figure supplement 3. Distribution of diseases by the number of disease genes.

We considered pairs of diseases and affected tissues (briefly ‘Disease-tissue pairs’), and plotted their distribution by the number of disease genes that were expressed in that tissue. Pairs that were associated with a likely affected cell type appear in blue, otherwise in turquoise.
Figure 1—figure supplement 3—source data 1. The Source Data file contains the numbers of diseases per number of disease-associated genes, with and without likely affected cell types.
Figure 1—figure supplement 4. PrEDiCT scheme assessment using expert-curated associations.

(A) False-positive and false-negative rates (y-axes; colored red and orange, respectively) across varying FDR thresholds (x-axis) of expert-curated disease–cell-type associations. (B) The overlap between the number of likely disease–cell-type associations (total of 9; left circle) and the number of expert-curated associations (total of 6; right circle). Likely disease–cell-type associations were enriched for verified expert-curated pairs (5/9, 56%) relative to their frequency among the expert-analyzed pairs (6/60, 10%; p=1.3E-4, Fisher’s exact test).
Figure 1—figure supplement 4—source data 1. The Source Data file contains data used to generate Figure 1—figure supplement 4A, B.

Data of annotated human single-cell transcriptomes were obtained from Tabula Sapiens (Jones et al., 2022). We focused on tissues with two or more samples with ≥800 sequenced cells that were also sequenced in mice [(Tabula Muris, 2018); Methods]. The tissues were bone marrow, lung, skeletal muscle, spleen, tongue, and trachea, which together comprised 129 cell types. Next, we associated these tissues with the Mendelian diseases that affect them, based on their phenotypic abnormality annotations in the Human Phenotype Ontology (HPO) database [(Köhler et al., 2021); Methods]. To assess the reliability of HPO annotations, we compared them to manually curated annotations that were available in the ODiseA database [(Hekselman et al., 2022); Methods]. This comparison confirmed 490 of 649 (76%) annotations available for these diseases in ODiseA, supporting the use of HPO as an indicator of the disease-affected tissue. We then gathered the respective disease genes from the Online Mendelian Inheritance in Man (OMIM) database (Amberger et al., 2019). Overall, 1140 diseases and 2434 disease genes were associated with at least one affected tissue, with the majority associated with skeletal muscle (Figure 1—figure supplement 1 and Supplementary file 1).

To calculate PrEDiCT scores, we computed the preferential expression of disease genes in each cell type relative to other cell types of the disease-affected tissue (Methods). Next, we set the PrEDiCT score of a disease in each cell type to the median preferential expression of the respective disease genes. Statistical significance of the score was determined using a permutation test (Methods; Figure 1—figure supplement 2). In total, we analyzed 34,249 disease–cell-type associations (Supplementary file 2). PrEDiCT score distribution is shown in Figure 1B.
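
The scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the exact preferential-expression measure and the construction of control gene sets are specified in the Methods, so a z-score across the cell types of a tissue and uniformly sampled control sets are used here purely as assumptions. All gene names, cell types, and expression values are hypothetical toy data.

```python
import random
import statistics


def preferential_expression(expr, gene, cell_type):
    """Z-score of a gene's expression in one cell type versus all cell types of the tissue.

    ASSUMPTION: a z-score stands in for the preferential-expression measure
    that the paper defines in its Methods.
    """
    values = list(expr[gene].values())
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0  # uniform expression across cell types means no preference
    return (expr[gene][cell_type] - statistics.mean(values)) / sd


def predict_score(expr, genes, cell_type):
    """Median preferential expression of a gene set in one cell type."""
    return statistics.median(
        [preferential_expression(expr, g, cell_type) for g in genes]
    )


def permutation_p_value(expr, disease_genes, cell_type, n_perm=1000, seed=0):
    """Empirical p-value against random control gene sets of the same size.

    ASSUMPTION: control sets are drawn uniformly from all genes in `expr`.
    """
    rng = random.Random(seed)
    observed = predict_score(expr, disease_genes, cell_type)
    all_genes = sorted(expr)
    hits = sum(
        predict_score(expr, rng.sample(all_genes, len(disease_genes)), cell_type)
        >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Toy tissue: 50 background genes with random expression and 3 hypothetical
# disease genes that are strongly expressed in ciliated cells.
cell_types = ["ciliated", "basal", "goblet", "macrophage"]
toy_rng = random.Random(1)
expr = {f"bg{i}": {ct: toy_rng.random() for ct in cell_types} for i in range(50)}
for gene in ("DG1", "DG2", "DG3"):
    expr[gene] = {"ciliated": 5.0, "basal": 0.5, "goblet": 0.4, "macrophage": 0.1}

score, p_value = permutation_p_value(expr, ["DG1", "DG2", "DG3"], "ciliated")
print(f"PrEDiCT-like score = {score:.2f}, permutation p = {p_value:.4f}")
```

Using the median rather than the mean keeps the score from being driven by a single highly expressed disease gene, which matters for diseases such as PCD that have dozens of associated genes.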

We identified 489 (1.4%) ‘likely’ associations (PrEDiCT ≥ 1, FDR <0.1), covering 328/1140 diseases and 102/129 cell types. Although most likely associations involved diseases with a single disease gene, the fraction of diseases with likely associations was higher for diseases with multiple disease genes (Figure 1—figure supplement 3). As proof-of-concept, primary ciliary dyskinesia (PCD), which has 31 disease genes expressed in lung and is characterized by damaged ciliary machinery in lung ciliated cells (Leigh et al., 2019), was indeed likely associated with lung ciliated cells (Figure 1C). Additionally, Heinz body anemias, which are characterized by accumulation of inclusion bodies in erythrocytes (Herman et al., 2023), were indeed likely associated with bone marrow erythrocytes (Figure 1D). Lastly, compatible with the high demand for mitochondrial activity in muscle cells, four out of five mitochondrial complex deficiencies were likely associated with slow and fast muscle cells (Figure 1E). ‘Mitochondrial complex II deficiency’ also scored highest in slow and fast muscle cells, yet its PrEDiCT scores were insignificant, potentially due to the stringency of our statistical analysis (Figure 1E). As a negative control, gracile bone dysplasia, which might manifest with ankyloglossia (also known as ‘tongue-tie’), is not expected to affect cell types of tongue tissue, and indeed no likely affected cell type was detected (Supplementary file 2). Likewise, platelet-type bleeding disorder, which affects bone-marrow–derived platelets, had no likely affected cell type in bone marrow, since platelets and their precursor cells (megakaryocytes) were missing from that tissue.
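
The two-part criterion for a ‘likely’ association (PrEDiCT ≥ 1 and FDR < 0.1) can be sketched as follows. The multiple-testing procedure is described in the Methods; the Benjamini-Hochberg adjustment used below is an assumption, and the records are made-up values chosen to show a high-scoring but insignificant cell type being excluded, as happened for mitochondrial complex II deficiency.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def likely_associations(records, min_score=1.0, max_fdr=0.1):
    """Keep disease/cell-type pairs that pass both the score and the FDR cutoff."""
    fdr = benjamini_hochberg([r["p"] for r in records])
    return [
        {**r, "fdr": q}
        for r, q in zip(records, fdr)
        if r["score"] >= min_score and q < max_fdr
    ]


# Hypothetical scores and permutation p-values for one disease in one tissue.
records = [
    {"cell_type": "ciliated", "score": 2.4, "p": 0.001},
    {"cell_type": "basal", "score": 0.3, "p": 0.40},
    {"cell_type": "goblet", "score": 1.2, "p": 0.30},  # high score, not significant
    {"cell_type": "macrophage", "score": -0.8, "p": 0.90},
]
likely = likely_associations(records)
print([r["cell_type"] for r in likely])
```

In this toy example only the ciliated cell type passes both cutoffs; the goblet cell type has a score above 1 but fails the FDR cutoff.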

To assess at a larger scale whether likely associations indicate disease-affected cell types, we turned to literature text-mining and to expert annotations. For text-mining, we postulated that diseases and their affected cell types would co-appear in the literature more frequently than expected by chance. To estimate co-appearance, we extracted PubMed records mentioning a disease or a cell type in our dataset and tested the significance of the co-appearance of all disease–cell-type pairs (Methods). Records were extracted using the Biopython package, which retrieves up to 9,999 records per term (Cock et al., 2009). We identified 229 ‘literature-supported’ pairs that co-appeared in the literature more often than expected by chance (adjusted p<0.001, Chi-squared test; Supplementary file 3). PCD, for example, co-appeared significantly with lung ciliated cells, as well as with respiratory goblet, mucous, and basal cells (the latter are precursors of the other three). Based on literature-supported pairs, we estimated the false-positive and false-negative rates. The false-positive rates for likely associations were low (Methods; Figure 1F). Likely disease–cell-type associations were enriched for literature-supported pairs (41/489, 8.4%) relative to all disease–cell-type associations (229/34,249, 0.7%; p<E-15, Fisher’s exact test; Figure 1G). Lastly, we repeated the above analyses using expert-curated annotations of disease-affected cell types, showing a low false-positive rate (Figure 1—figure supplement 4A) and a higher enrichment for expert-curated associations (p=1.3E-4, Fisher’s exact test; Figure 1—figure supplement 4B). Altogether, these results support likely associations as indicators of disease-affected cell types.
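
The enrichment tests reported above can be reproduced from the counts given in the text. The sketch below computes the one-sided tail of Fisher's exact test from the hypergeometric distribution; the function name is illustrative, and the choice of a one-sided test is an assumption, since the sidedness is not stated in this section.

```python
from math import comb


def fisher_enrichment_p(overlap, n_likely, n_supported, n_total):
    """One-sided Fisher's exact test: P(X >= overlap) under the hypergeometric null.

    n_total     -- all disease/cell-type pairs tested
    n_likely    -- pairs called 'likely' by the scoring scheme
    n_supported -- pairs supported by the reference (literature or experts)
    overlap     -- pairs that are both likely and supported
    """
    upper = min(n_likely, n_supported)
    tail = sum(
        comb(n_supported, k) * comb(n_total - n_supported, n_likely - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(n_total, n_likely)


# Literature-supported pairs: 41 of 489 likely pairs, 229 supported, 34,249 in total.
p_literature = fisher_enrichment_p(41, 489, 229, 34249)

# Expert-curated pairs: 5 of 9 likely pairs, 6 curated, 60 analyzed in total.
p_expert = fisher_enrichment_p(5, 9, 6, 60)

print(f"literature: p = {p_literature:.2e}")
print(f"expert:     p = {p_expert:.2e}")  # about 1.3e-04, in line with the reported value
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow for the very small literature p-value.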

Disease-affected cell types are recapitulated in mouse

To further assess whether PrEDiCT scores indicate disease-affected cell types, we tested whether matching cell types were likely affected in mice. For this, we downloaded mouse single-cell transcriptomes for the six tissues from Tabula Muris, 2018. These data consisted of 46 annotated cell types and some unannotated subsets (Figure 2A).

Figure 2. Recapitulation of disease-affected cell types in mouse.

(A) The number of human cell types annotated by Tabula Sapiens [(Jones et al., 2022); red], and the number of mouse cell types annotated by [Tabula Muris, 2018; grey] and this study (blue). (B) The distribution of PrEDiCT scores in mouse. (C) The preferential expression of mouse orthologs of PCD disease genes and the PrEDiCT scores of PCD in mouse lung cell types. Preferential expression values and the percentage of cells expressing a gene are indicated by the color and the size of each circle, respectively. The resulting PrEDiCT score is indicated on the right colored by the score value. Bold outline marks likely disease-affected cell types. (D) PrEDiCT scores of Heinz body anemias across mouse bone marrow cell types depicted as described in panel C. (E) PrEDiCT scores of mitochondrial complex deficiencies across mouse skeletal muscle cell types depicted as described in panel C. Mitochondrial complex deficiencies were likely associated with striated muscle cells, except for mitochondrial complex II deficiency whose PrEDiCT scores were highest yet insignificant in these cell types (FDR = 0.65). (F) The correlation between PrEDiCT scores in human (X-axis) and mouse (Y-axis) cell types. Each dot represents a distinct pair. PrEDiCT scores of non-matching cell types did not correlate (left; r=−0.02, Spearman correlation), in contrast to PrEDiCT scores of matching cell types (right; r=0.38, p<E-15). (G) The cell types affected by the same disease in human and mouse tended to match each other (green) more than expected by chance (grey) according to 1,000 repeats in a permutation test. Error bars represent the standard deviation of the number of randomly matching cell types between the species. Adjusted **p<0.01 and ***p<0.001, permutation test.

Figure 2—source data 1. The Source Data file contains data used to generate Figure 2A-G.

Figure 2—figure supplement 1. The fraction of likely disease–cell-type associations in human that were recapitulated in mouse.

The comparison included likely associations for diseases with expressed mouse ortholog(s), and cell types with any matching cell type in the mouse corresponding tissue. The fraction of likely disease–cell-type associations that were recapitulated was similar between diseases with a single disease gene (37/62, 60%) and those with multiple disease genes (83/129, 64%; p=0.63, Fisher’s exact test).
Figure 2—figure supplement 1—source data 1. The Source Data file contains the numbers of diseases with a single, or multiple, disease-associated genes that were or were not recapitulated using mouse expression data.

To improve the annotation of cell types, we reanalyzed the single-cell mouse transcriptomes (Methods). Altogether, we obtained 97 cell clusters across all six tissues (Supplementary file 4). Sixteen clusters overlapped considerably with cell types annotated by Tabula Muris and were thus annotated similarly. We annotated the 81 remaining clusters by careful examination of the expression of known cell-type marker genes (Methods), thereby improving cell type annotation per tissue (Figure 2A). For instance, the number of annotated cell types in skeletal muscle increased from six to 19. Newly annotated cell types included clinically relevant cell types, such as basement-membrane residing fibroblasts that are a main source of different collagens and are essential for skeletal muscle physiology (Kivirikko et al., 1995; Zou et al., 2008). Next, we aimed to identify similar cell types between human and mouse. This was based on expression of orthologous marker genes and the matchSCore2 package [Methods; (Mereu et al., 2020)]. As expected, a large variety of human cell types had a matching mouse cell type (82/129) and vice versa (70/97; Supplementary file 5). The impact of disease genes on the matching was negligible, as shown by repeating the identification of similar cell types upon excluding disease genes from the list of marker genes (Methods).

Next, we tested whether diseases affected matching cell types between human and mouse. For this, we calculated PrEDiCT scores in mouse cell types, based on expression of mouse orthologs of human disease genes. The distribution of PrEDiCT scores was similar between mouse and human, and included 380/24,638 (1.5%) likely disease–cell-type associations (PrEDiCT ≥ 1, FDR < 0.1; Figure 2B; Supplementary file 6). Compatible with our previous proof-of-concept cases in human, PCD likely affected mouse lung ciliated epithelial cells which match human lung ciliated cells (Figure 2C), and Heinz body anemias likely affected mouse bone marrow reticulocytes and erythroblasts which match human erythrocytes (Figure 2D). Likewise, four mitochondrial complex deficiencies likely affected mouse striated muscle cells, which matched human slow and fast muscle cells (Figure 2E). Similar to the results in human, ‘mitochondrial complex II deficiency’ scored highest in these cell types, yet the scores were insignificant (Figure 2E). We compared the PrEDiCT scores of all cell-type pairs of the corresponding tissues between the species. Whereas PrEDiCT scores of non-matching cell types did not correlate (r=−0.02, Spearman correlation), PrEDiCT scores of matching types were modestly correlated (r=0.38, p<E-15, Spearman correlation; Figure 2F). Of the 328 diseases with likely affected cell types in humans, 97 diseases (30%) affected matching cell types in mice, a fraction that was larger than expected by chance (adjusted p<0.01, permutation test; Figure 2G and Supplementary file 6; Methods). This enrichment, and the observation that commonly affected cell types were not biased toward specific cell types, support the validity and generality of the PrEDiCT scheme.
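
The permutation test for cross-species recapitulation can be sketched as below. The null model used in the paper is described in the Methods; here, as an assumption, each disease's likely affected mouse cell types are replaced by the same number of randomly drawn mouse cell types, without restricting the draw to the corresponding tissue. The cell-type names, the match table, and the hit lists are illustrative toy data.

```python
import random


def n_recapitulated(human_hits, mouse_hits, matches):
    """Count diseases with at least one matching human/mouse affected cell-type pair."""
    return sum(
        any((h, m) in matches for h in human_hits[d] for m in mouse_hits.get(d, ()))
        for d in human_hits
    )


def recapitulation_p_value(human_hits, mouse_hits, matches, mouse_cell_types,
                           n_perm=1000, seed=0):
    """Empirical p-value for the observed number of recapitulated diseases."""
    rng = random.Random(seed)
    observed = n_recapitulated(human_hits, mouse_hits, matches)
    exceed = 0
    for _ in range(n_perm):
        # Null: same number of affected mouse cell types per disease, chosen at random.
        shuffled = {
            d: rng.sample(mouse_cell_types, len(cts)) for d, cts in mouse_hits.items()
        }
        if n_recapitulated(human_hits, shuffled, matches) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


mouse_cell_types = [
    "ciliated epithelial cell", "erythroblast", "reticulocyte", "striated muscle cell",
    "B cell", "T cell", "macrophage", "endothelial cell", "fibroblast", "neutrophil",
]
matches = {
    ("lung ciliated cell", "ciliated epithelial cell"),
    ("erythrocyte", "erythroblast"),
    ("erythrocyte", "reticulocyte"),
    ("slow muscle cell", "striated muscle cell"),
    ("fast muscle cell", "striated muscle cell"),
}
human_hits = {
    "PCD": ["lung ciliated cell"],
    "Heinz body anemia": ["erythrocyte"],
    "Mitochondrial complex I deficiency": ["slow muscle cell", "fast muscle cell"],
}
mouse_hits = {
    "PCD": ["ciliated epithelial cell"],
    "Heinz body anemia": ["reticulocyte", "erythroblast"],
    "Mitochondrial complex I deficiency": ["striated muscle cell"],
}

observed, p_value = recapitulation_p_value(human_hits, mouse_hits, matches, mouse_cell_types)
print(f"recapitulated diseases = {observed}, permutation p = {p_value:.4f}")
```

Keeping the number of affected cell types per disease fixed in each permutation means that diseases with many likely affected cell types do not inflate the observed overlap relative to the null.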

Diseases with multiple inflicted tissues affect similar cell types

Most diseases with a likely affected cell type manifested clinically in two or more tissues, and were denoted multi-tissue diseases (168/328, 51%; Figure 3A). We suspected that these diseases affect a cell type that is similar between the disease-manifesting tissues. To test this hypothesis, we identified matching cell types between tissues, and then examined whether cell types affected by the same disease were enriched for matching cell types. Matching cell types were identified using matchSCore2 package [Methods; (Mereu et al., 2020)]. Overall, we identified 840 pairs of matching cell types between tissues, of which 52% were immunocytes, including macrophages, neutrophils, T cells, B cells, as well as endothelia and fibroblasts. Next, we examined whether the cell types likely affected by the same disease were enriched for matching cell types. We found that 18% (30/168) of the diseases likely affected at least one pair of matching cell types, a fraction higher than expected by chance (adjusted p<0.01, permutation test; Figure 3B and Figure 3—figure supplement 1; Methods). For instance, chronic granulomatous disease (CGD) results in splenomegaly and pneumonia due to impaired phagocytes (i.e., neutrophils, macrophages, and monocytes; Anjani et al., 2020; Leiding and Holland, 1993). These cell types were indeed the likely affected cell types for CGD in both spleen and lung (Figure 3C). Also, CGD likely affected bone-marrow phagocytes, consistent with bone-marrow transplant being a curative treatment for this disease (Leiding and Holland, 1993).

Figure 3. Multi-tissue diseases tend to affect similar cell types in those tissues.

(A) The numbers of diseases with likely affected cell type across tissues (Y-axis) grouped by the number of affected tissues (X-axis). (B) The number of diseases that likely affect at least one pair of matching cell types between tissues (green). This number was higher than expected by chance (dark and light grey correspond to selecting cell types at random from the first tissue or the second one, respectively) according to 1,000 repeats in a permutation test. Only pairs of tissues with ≥2 shared diseases are shown. Error bars represent the standard deviation of the number of randomly matching cell types between the tissues. Shown are maximal adjusted p-value for each pairwise randomization: **p<0.01, ***p<0.001. (C) PrEDiCT scores of cell types affected by chronic granulomatous disease (CGD) in spleen, lung, and bone marrow. Bold outline marks likely affected cell types. Likely affected cell types that were matching among the tissues were connected by green lines.

Figure 3—source data 1. The Source Data file contains data used to generate Figure 3A-C.

Figure 3—figure supplement 1. Cell-type similarity among human tissues.

Circos plot representing similarity between cell types among human tissues. Width and color of lines that connect each cell-type pair indicate the fraction of diseases likely affecting each cell type and whether the cell types match (red: matching, grey: non-matching; Supplementary file 5), respectively.
Figure 3—figure supplement 1—source data 1. The Source Data file contains data used to generate the circos plot that represents the similarity between cell types of distinct human tissues.

Cellular context prediction refined using gene functions

So far, we assumed that the genes underlying a disease work through a single cellular context. However, the same disease phenotype might arise from mechanisms with distinct cellular contexts. For example, the bone-marrow disease hyper-IgM immunodeficiency could arise from mutations in CD40 gene, affecting B cells, or mutations in CD40 ligand (CD40LG), affecting CD4 T cells (Yazdani et al., 2019). In such cases the PrEDiCT scheme might fail to infer the affected cell types. Indeed, only CD4 T cells were predicted as likely affected by hyper-IgM immunodeficiency (PrEDiCT = 1.6, FDR <0.048). We hypothesized that the cellular context of such ‘multi-cellular diseases’ could be revealed by separately applying the PrEDiCT scheme to subsets of disease genes with distinct functions.

We first focused on diseases that alter intercellular communication. We identified seven diseases in our dataset that were caused by mutations in genes encoding ligands and their receptors. Next, we applied the PrEDiCT scheme separately to ligands and to receptors (Methods). As expected, hyper-IgM immunodeficiency was found to likely affect naïve B cells through CD40 (receptor; PrEDiCT = 2.9, FDR = 0.06), and CD4αβ T cells through CD40LG (ligand; PrEDiCT = 3.7, FDR <0.001; Figure 4A). Another example is autosomal recessive limb-girdle muscular dystrophy, which is caused by mutations in laminin-α2 (LAMA2) or its receptor α-dystroglycan (DAG1). By applying the PrEDiCT scheme without subdividing the disease genes, we could not infer any likely affected cell type. Yet, by separately applying the PrEDiCT scheme to LAMA2 (ligand) and DAG1 (receptor), we inferred mesenchymal stem cells and satellite stem cells as the likely affected cell types, respectively (PrEDiCT = 3.8 and 3.7, FDR <0.05; Figure 4B). In support of the prediction for DAG1, disruption of DAG1 in satellite stem cells has been associated with the defective muscle regeneration seen in diseased patients (Cohn et al., 2002; Servián-Morilla et al., 2020). In support of the prediction for LAMA2, the disease was attenuated in ligand-knockout [Col6a1(-/-)] model mice by supplying wild-type mesenchymal-stem-cell derived fibroblasts. This treatment not only restored the collagen VI of the tissue, but also rescued defects in satellite stem cells (Urciuolo et al., 2013).

Figure 4. Refining cell-type inference using gene functions.

(A) PrEDiCT scores of cell types likely affected by hyper-IgM immunodeficiency in bone marrow, calculated separately for ligand- or receptor-encoding disease genes. (B) PrEDiCT scores of cell types likely affected by autosomal recessive limb-girdle muscular dystrophy in skeletal muscle, calculated separately for ligand- and receptor-encoding disease genes. (C) PrEDiCT scores of cell types likely affected by heritable cancers in lung and trachea, calculated separately for oncogenes and tumor suppressor genes. Likely affected cell types that matched between the tissues were connected by a green line. Bold outline marks likely-affected cell types.

Figure 4—source data 1. The Source Data file contains data used to generate Figure 4A-C.

Next, we focused on heritable cancers, where disease genes could be divided by their function into either oncogenes or tumor suppressor genes (TSGs). We retrieved from the Cancer Gene Census (Sondka et al., 2018) heritable cancers that manifest in any of the tissues in our dataset (Methods; Supplementary file 7). We first applied the PrEDiCT scheme jointly to all cancer-associated genes of the same tissue, and then applied it separately to oncogenes or TSGs, which allowed us to infer distinct cell types for oncogenes and TSGs. For example, in airway cancers (lung and trachea), the joint application of the PrEDiCT scheme inferred both basal cells and intermediate monocytes as likely affected. The separate application inferred basal cells as likely affected by oncogenes, and intermediate and non-classical monocytes as likely affected by TSGs (Figure 4C).

Characteristics of disease-affected cell types

Are certain cell types more likely to be affected by Mendelian diseases? To answer this, we determined the ‘susceptibility’ of each cell type as the percentage of diseases affecting it out of all diseases that affect its tissue (Supplementary file 8). Most susceptible were capillary endothelial cells of the tongue (Susceptibility = 25%; Figure 5A), which were affected by diseases that result in morphologic aberrations of the tongue (e.g., telangiectasia and macroglossia). Next, we asked whether cell-type susceptibility was correlated with cell-type prevalence (i.e., the proportion of its cells out of all cells of its tissue; Supplementary file 8). The two measures did not correlate (r=−0.02, Pearson correlation; Figure 5A). For example, fast and slow muscle cells were the most susceptible cell types in skeletal muscle, while each accounting for less than 1% of the cells in that tissue.
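
The two measures compared above can be written out in a short sketch. Susceptibility and prevalence follow the definitions in the text; the counts in the toy table are hypothetical and are not taken from Supplementary file 8.

```python
from math import sqrt


def susceptibility(n_affecting, n_tissue_diseases):
    """Percentage of the diseases affecting a tissue that likely affect a given cell type."""
    return 100.0 * n_affecting / n_tissue_diseases


def prevalence(n_cells, n_tissue_cells):
    """Percentage of a tissue's cells that belong to a given cell type."""
    return 100.0 * n_cells / n_tissue_cells


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical counts per cell type:
# (name, diseases affecting it, diseases of its tissue, cells of this type, cells in tissue)
toy = [
    ("capillary endothelial cell", 5, 20, 300, 10000),
    ("fast muscle cell", 40, 400, 80, 10000),
    ("macrophage", 3, 150, 2500, 10000),
    ("fibroblast", 6, 400, 4000, 10000),
]
sus = [susceptibility(a, d) for _, a, d, _, _ in toy]
prev = [prevalence(c, t) for _, _, _, c, t in toy]
r = pearson_r(prev, sus)
print(f"susceptibility (%): {sus}")
print(f"Pearson r between prevalence and susceptibility: {r:.2f}")
```

Normalizing susceptibility by the number of diseases per tissue makes cell types from tissues with very different disease counts, such as skeletal muscle and tongue, comparable.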

Figure 5. Characteristics of disease-affected cell types.

(A) Cell-type susceptibility (Y-axis) did not correlate with cell-type prevalence (X-axis). The blue line represents linear correlation (r=−0.02, Pearson correlation). (B) Cell-type susceptibility varied between cell classes (p=1.5E-3, ANOVA test). Among the cell classes with many cell types shared among tissues, immunocytes and epithelia were the least susceptible, and endothelia were the most susceptible compared to other cell types (adjusted p<0.05, Mann-Whitney U test; Methods).

Figure 5—source data 1. The Source Data file contains data used to generate Figure 5A, B.

Lastly, we assessed the susceptibility of cell classes (e.g., capillary endothelial cells were classified as endothelia; Supplementary file 8). The most common cell classes were immunocytes, epithelia and endothelia covering 61, 21, and 17 cell types, respectively, in over four tissues. Cell classes had varying susceptibility (Figure 5B; p=1.5E-3, ANOVA). Among the common cell classes, immunocytes and epithelia were the least susceptible, and endothelia were the most susceptible compared to other cell types (adjusted p<0.05, Mann-Whitney U test; Methods).

Discussion

Here, we presented the PrEDiCT scheme for identifying disease-affected cell types based on the cell-type preferential expression of Mendelian disease genes. Preferential expression of disease genes was previously explored in tissue contexts and was shown to characterize disease-affected tissues (Barshir et al., 2018; Barshir et al., 2014; Hekselman and Yeger-Lotem, 2020; Lage et al., 2008; Sonawane et al., 2017). Recently, cell-type preferential expression has been used to highlight potentially-affected cell types for Mendelian diseases and complex traits, often in combination with cell-type regulatory information and enrichment analyses (Dai et al., 2021; Eraslan et al., 2022; Jagadeesh et al., 2021; Kim-Hellmuth et al., 2020; Zhao et al., 2021; Rouhana et al., 2021; Zhang et al., 2022). However, large-scale in-silico validation was rarely conducted (Montoro et al., 2018; Plasschaert et al., 2018). Here, in contrast, we corroborated likely disease-affected cell types by literature text-mining (Figure 1), expert curation (Figure 1—figure supplement 4), and recapitulation in mouse (Figure 2).

The use of PrEDiCT scheme to identify affected cell types has limitations. Aberrations in disease genes that are preferentially expressed in a cell type do not necessarily lead to disease phenotypes in that cell type, leading to erroneous annotation of disease-affected cell types. For example, metabolic–myopathy-associated genes were upregulated in adipocytes of both muscle and breast, yet only muscle adipocytes showed myopathy phenotypes (Eraslan et al., 2022). To reduce the risk for erroneous annotations, we applied PrEDiCT only to cell types of disease-affected tissues (Figure 1—figure supplement 1). To enhance the robustness of likely associations, PrEDiCT scores included the cell-type preferential expression of all disease genes, similarly to Zhang et al., 2022. For PCD, the PrEDiCT score included 31 genes, and indeed pointed to known disease-affected cell types (Figure 1C). Additionally, similarly to other RNA-based schemes, PrEDiCT is oblivious to post-translational regulation, and, since most available single-cell transcriptomic datasets do not contain full-length gene reads, PrEDiCT is also oblivious to cell-type preferential expression of alternatively spliced transcripts. Lastly, preferential expression is just one of several mechanisms that lead to tissue-selective disease manifestations (Hekselman and Yeger-Lotem, 2020). Nevertheless, by applying PrEDiCT to 1,140 diseases and single cell transcriptomes of six distinct tissues, we revealed affected cell types for 29% of the Mendelian diseases in our dataset (Supplementary file 2). Interestingly, this fits with previous observations that about 30% of Mendelian diseases manifest clinically in a tissue that overexpresses disease genes (Barshir et al., 2018; Barshir et al., 2014; Hekselman and Yeger-Lotem, 2020; Lage et al., 2008).

We supported likely disease–cell-type associations by three lines of evidence. The first was text-mining of literature for co-appearance of diseases and cell-types. Text-mining enabled large-scale in-silico assessment, yet co-appearance could also reflect negative and/or speculative results. Our second line of evidence was expert curation. This analysis, although on a smaller scale, provided additional support for the relevance of likely associations (Figure 1—figure supplement 4). The third line of evidence came from recapitulation of results using mouse single-cell data. Yet, since the patterns of variation across genes tend to be similar, mouse single-cell data did not provide statistically independent information. This could lead to more false-positive associations for diseases with a single disease gene. However, the fraction of associations that were recapitulated in mouse did not differ between diseases with a single or multiple disease genes, supporting this line of evidence (Figure 2—figure supplement 1). Another caveat in the comparison between human and mouse was that gene expression data drove both PrEDiCT calculation and human-mouse cell-type matching. This caveat too had limited impact, since matched cell-types were almost identical upon excluding disease genes. Notably, it would be intriguing to integrate data obtained from human and mouse to increase discovery power in future applications. Altogether, the three lines of evidence provided complementary support for likely associations.

Overall, 328 diseases affected 102/129 cell types. Interestingly, there was no correlation between the prevalence of a cell type and its likelihood to be affected (Figure 5A). In particular, endothelia were more likely to be affected by diseases than other prevalent cell classes, whereas immunocytes and epithelia were the least likely to be affected among the prevalent cell classes (Figure 5B). This suggests that these less susceptible cell types are either more resilient than other cell types, or, alternatively, that their impairment is lethal to the organism. In this context, a cross-tissue study of 20 human tissues showed that immunocytes had similar expression signatures across tissues, in accordance with their common functions, whereas endothelia had tissue-specific expression signatures that reflect their tissue-specialized roles (Jones et al., 2022). Hence, it seems that germline impairment of immunocytes is more likely lethal, whereas the tissue-specialization of endothelia limits the impact of their germline impairment and thus facilitates overall survival. Yet, our analyses focused on diseases and cell types from six specific tissues and thus were limited in scope. The generalizability of our observations therefore awaits analysis of larger sets of diseases and cell types. In the future, once single-cell technologies offer comprehensive coverage of expressed genes per cell, it will also be intriguing to assess disease heterogeneity across cells within a cell type.

Our expansive resource of diseases and likely affected cell types could be used to interrogate disease etiologies. For example, we showed that mitochondrial diseases tend to affect muscle cells, in accordance with their energetic demands (Figure 1E). Additionally, we showed that diseases afflicting multiple tissues likely affect similar cell types among those tissues, thereby providing a mechanistic explanation for this phenomenon (Figure 3). Lastly, we demonstrated that the inference of likely affected cell types could be refined by applying the PrEDiCT scheme separately to subsets of disease genes with distinct functions. For instance, by separately analyzing oncogenes and TSGs of heritable lung and trachea cancers, we found that oncogenes likely affect basal cells, and TSGs likely affect monocytes (Figure 4C). This could suggest that tissue-constructive cells are more susceptible to oncogene mutations, whereas protective cells, such as monocytes, are more susceptible to TSG mutations. The latter is consistent with the function of monocytes in the elimination of malignant transformation of cells in different tissues (Robinson et al., 2021). Another interesting subset of diseases comprises those that impair intercellular communication. A recent study explored the cell types affected by monogenic muscular disorders (Eraslan et al., 2022). Consistent with their results, we found that autosomal recessive limb-girdle muscular dystrophy disrupts intercellular communication among muscle cell types, via mutations in the DAG1 receptor (Figure 4B). Yet, by applying PrEDiCT separately to the ligand of DAG1, LAMA2, we also highlighted the involvement of mesenchymal stem cells in the disease (Urciuolo et al., 2013). By exploring disease genes in appropriate cellular context, we enhanced the mechanistic understanding of disease emergence.

The associations between diseases and affected cell types, though supported by literature and recapitulated in mice, remain putative. Experimental testing could be performed in human cell lines, or in mouse models, in light of the many shared cell types between human and mouse (Figure 2). These validation experiments have a huge potential to open new directions in disease research and accelerate cell-directed gene therapy.

Methods

Human single-cell transcriptomics analysis

Single-cell transcriptomes were downloaded from Tabula Sapiens (Jones et al., 2022). We focused on tissues that consisted of ≥2 samples with ≥800 cells and were sequenced in both human and mouse using microfluidic droplet-based 3’-end technology. These tissues included bone marrow, lung, skeletal muscle, spleen, tongue, and trachea. Analysis was done using the Seurat package v4.0.5 (Hao et al., 2021). Per tissue, gene expression levels were normalized cell-wise using the NormalizeData function. Henceforth, we considered only genes with normalized counts ≥ 0.05 in ≥10% of cells of at least one cell type (Supplementary file 9).
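
The gene filter described above can be illustrated with a minimal sketch in plain Python. This is not the Seurat-based code used in the study; the function name `passes_expression_filter` and the input layout (a mapping from cell type to the normalized counts of one gene across the cells of that type) are assumptions made for illustration only.

```python
def passes_expression_filter(counts_by_cell_type, min_count=0.05, min_fraction=0.10):
    """Return True if a gene has normalized counts >= min_count in at least
    min_fraction of the cells of at least one cell type.

    counts_by_cell_type: hypothetical mapping, cell type -> list of the
    normalized counts of this gene in each cell of that type.
    """
    for counts in counts_by_cell_type.values():
        if not counts:
            continue  # skip cell types without cells
        expressing = sum(1 for value in counts if value >= min_count)
        if expressing / len(counts) >= min_fraction:
            return True
    return False
```

In this sketch a gene is retained as soon as a single cell type satisfies both thresholds, which mirrors the "at least one cell type" wording of the filter.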

Mouse single-cell transcriptomics analysis

Single-cell transcriptomes were downloaded from Tabula Muris (Tabula Muris Consortium, 2018). To improve the annotation of mouse cells from Tabula Muris, we reanalyzed the transcriptomic profiles of each tissue. First, we selected 2,000 variably expressed genes using the FindVariableFeatures function in Seurat. The minimum and maximum average normalized expression of genes across cells were set to 0.05 and 3, respectively (mean.cutoff=c(0.05,3)). We scaled and centered the expression values of the variably expressed genes using the ScaleData function, while correcting for the different samples (vars.to.regress=‘mouse.id’). Then, we projected their expression on all significant principal components (PCs; p<0.001, JackStraw test) ordered by their explained variance.

Next, we applied a two-phase clustering process. We clustered cells using the Seurat FindNeighbors and FindClusters functions based on all the top significant PCs. To resolve over-clustering of cells, we hierarchically ordered cell clusters using the BuildClusterTree function. Then, we tested whether cells from different splits in the tree were distinguishable, according to the out-of-bag error of a random forest classifier that was trained on variably expressed genes. Indistinguishable cell clusters (p≥0.05) were merged. To estimate sample-based differences between related clusters, we repeated the hierarchical ordering of merged cell clusters. Terminal splits of cell clusters that included uneven numbers of cells from different samples (adjusted p<0.001, Chi-square) were merged. Such differences were observed only in tongue and lung tissues. Gene expression normalization and average calculations were applied as described for human tissues. Henceforth, we considered only genes with normalized counts ≥ 0.05 in ≥10% of cells of at least one cell type (Supplementary file 10).

Annotations of mouse cell clusters

We compared the clusters obtained above to the cell type annotations of Tabula Muris. A cluster where a similar annotation was common to >90% of its cells, and where >90% of cells with that annotation were within that cluster, was annotated according to Tabula Muris. All other clusters were manually annotated based on highly expressed marker genes (Z score ≥2). We manually searched PubMed for evidence that any of these markers indicates a known cell type (including cell identity and/or function), preferably in the context of the relevant tissue. A cluster was annotated if at least two of its markers indicated the same cell type. To comply with other studies, cell types were named as in Cell Ontology (Diehl et al., 2016). Cell type annotations, relevant marker genes and supporting literature appear in Supplementary file 4.
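
The reciprocal >90% rule for transferring a Tabula Muris annotation to a cluster can be sketched as follows. The function `annotate_cluster` and its inputs are hypothetical and only illustrate the rule; clusters that return `None` here would proceed to the manual, marker-based annotation described above.

```python
from collections import Counter


def annotate_cluster(cluster_labels, all_labels, threshold=0.90):
    """Transfer a reference annotation to a cluster under a reciprocal rule.

    cluster_labels: reference annotations of the cells in one cluster.
    all_labels: reference annotations of all cells in the tissue.
    Returns the label if it covers > threshold of the cluster AND
    > threshold of all cells carrying that label lie in the cluster;
    otherwise returns None.
    """
    if not cluster_labels:
        return None
    label, n_in_cluster = Counter(cluster_labels).most_common(1)[0]
    n_with_label = sum(1 for other in all_labels if other == label)
    share_of_cluster = n_in_cluster / len(cluster_labels)
    share_of_label = n_in_cluster / n_with_label
    if share_of_cluster > threshold and share_of_label > threshold:
        return label
    return None
```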

Annotation of disease-affected tissues

Disease data were retrieved from OMIM (Amberger et al., 2019) and included phenotypes with a known molecular basis and phenotypic series. Each disease was associated with its disease genes and their mouse orthologs according to OMIM, and was associated with its affected tissue according to HPO phenotypic abnormality annotations (Köhler et al., 2021). We focused on diseases that were cataloged by HPO as having a main phenotypic abnormality in blood and blood-forming tissues (HP:0001871), lungs (HP:0002088), musculature (HP:0003011), spleen (HP:0001743), tongue (HP:0000157), and trachea (HP:0002778), in accordance with the six tissues that we analyzed. Phenotypic abnormalities that HPO categorized under each of these main terms were also included. We assessed the reliability of associations by comparing them to manually curated associations of diseases and their affected tissues from the ODiseA database (Hekselman et al., 2022). For this, we downloaded from ODiseA all the diseases affecting blood and bone marrow, lung, and trachea. ODiseA annotations indicating that a tissue is unaffected by a disease were excluded from further analysis.

PrEDiCT score calculation and significance assessment

For each disease, we calculated its cell-type PrEDiCT scores in all cell types of the disease-affected tissue(s). The calculation was divided into three steps: (i) calculating the cell-type preferential expression of each disease gene; (ii) calculating the cell-type PrEDiCT score; and (iii) assessing the statistical significance of the cell-type PrEDiCT score. The three steps are detailed below.

Step (i): The cell-type preferential expression of a gene was set to the Z-score of its average expression in that cell type relative to cell types of the same tissue (Equation 1). Given the sparsity of single-cell data, per tissue, only genes whose average expression in any cell type exceeded the median average expression across genes and cell types were retained. Preferential expression values for human and mouse are available in Supplementary files 11 and 12, respectively.

$$P_{gc} = \frac{e_{gc} - \operatorname{average}(e_g)}{\operatorname{SD}(e_g)} \tag{1}$$

$P_{gc}$ denotes the preferential expression of gene $g$ in cell type $c$. $e_{gc}$ denotes the expression level $e$ of gene $g$ in cell type $c$. Average expression and standard deviation (SD) were calculated across all cell types of an affected tissue.
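
As an illustration of Equation 1, the Z-score of one gene across the cell types of a tissue can be computed as in the sketch below. The function name and input layout are assumptions, as is the use of the sample standard deviation, since the text does not state which SD estimator was used.

```python
from statistics import mean, stdev


def preferential_expression(avg_expression):
    """Z-score of a gene's average expression across cell types (Equation 1).

    avg_expression: hypothetical mapping, cell type -> average expression
    of one gene in that cell type, for all cell types of one tissue
    (at least two cell types are required).
    """
    values = list(avg_expression.values())
    mu = mean(values)
    sd = stdev(values)  # assumption: sample SD; the estimator is not specified
    if sd == 0:
        # Uniform expression: no cell type is preferential.
        return {cell_type: 0.0 for cell_type in avg_expression}
    return {cell_type: (value - mu) / sd
            for cell_type, value in avg_expression.items()}
```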

Step (ii): The PrEDiCT score of disease D in cell type c was set to the median cell-type preferential expression of its disease genes (g1 to gn; Equation 2).

$$\mathrm{PrEDiCT}_D(c) = \operatorname{median}\left(P_{g_1 c}, P_{g_2 c}, \ldots, P_{g_n c}\right) \tag{2}$$

Step (iii): To assess whether a certain PrEDiCT score was significantly higher than expected by chance for a gene set of the same size, we applied a permutation test. Given a disease D with n disease-associated genes, we selected at random n genes expressed in any cell type, and computed the PrEDiCT score for this random gene set in each cell type of the disease-affected tissue (referred to as ‘random score’). We repeated this procedure 1,000 times, resulting in 1,000 random scores per disease and cell type. The p-value of the PrEDiCT score of disease D in cell type c, $\mathrm{PrEDiCT}_D(c)$, was set to the fraction of random scores in c that were at least as high as $\mathrm{PrEDiCT}_D(c)$. P-values were then adjusted for multiple hypothesis testing per disease using the Benjamini-Hochberg procedure. The distributions of PrEDiCT scores were similar between tissues (Supplementary file 13).
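
Steps (ii) and (iii) can be sketched together in plain Python. This is a simplified illustration rather than the published implementation (which is available in the GitHub repository cited under Data availability); the function names, the nested-dictionary layout of `pref` (gene -> cell type -> preferential expression), and the fixed random seed are assumptions.

```python
import random
from statistics import median


def predict_score(pref, genes, cell_type):
    """Median preferential expression of `genes` in `cell_type` (Equation 2)."""
    return median(pref[g][cell_type] for g in genes)


def permutation_p_values(pref, disease_genes, cell_types, n_perm=1000, seed=0):
    """Fraction of random gene sets whose score is at least the observed one."""
    rng = random.Random(seed)
    background = sorted(pref)  # all retained genes of the tissue
    observed = {c: predict_score(pref, disease_genes, c) for c in cell_types}
    hits = {c: 0 for c in cell_types}
    for _ in range(n_perm):
        sample = rng.sample(background, len(disease_genes))
        for c in cell_types:
            if predict_score(pref, sample, c) >= observed[c]:
                hits[c] += 1
    return {c: hits[c] / n_perm for c in cell_types}


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjustment of a mapping key -> p-value."""
    items = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(items)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        key, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[key] = running_min
    return adjusted
```

Calling `benjamini_hochberg(permutation_p_values(...))` once per disease corresponds to the per-disease adjustment across cell types described above.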

Text-mining of PubMed records

We searched PubMed for records containing names of the diseases in our disease dataset or names of human cell types in the disease-affected tissues. For this, we used the eSearch function in the Biopython package (Cock et al., 2009). The maximum number of records retrieved was set to the maximum that Biopython supports (retmax = 9999). Then, per tissue, we intersected the list of records of each tissue-affecting disease d with the list of records of each tissue cell type c, to identify records that mention both. Diseases with fewer than three records that mentioned them together with a specific cell type were excluded. Cell types that were not mentioned with any disease were excluded. Next, per tissue, we determined whether disease–cell-type pairs significantly co-appeared by applying a Fisher’s exact test. Pairs with Z-scores higher than expected (adjusted p<0.001, Bonferroni correction) were determined ‘literature-supported’.
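
The co-appearance test can be illustrated with a self-contained one-sided Fisher's exact test on sets of record identifiers. This sketch does not query PubMed; the function names are hypothetical, and the definition of the record universe (all records retrieved for the tissue) is an assumption, since the text does not specify the background set.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of the top-left
    cell of the 2x2 table [[a, b], [c, d]]."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: probability of observing >= a in the top-left cell.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def co_appearance_p_value(disease_records, cell_type_records, universe):
    """Test whether records of a disease and of a cell type overlap more
    than expected. All arguments are sets of record identifiers."""
    both = len(disease_records & cell_type_records)
    disease_only = len(disease_records - cell_type_records)
    cell_only = len(cell_type_records - disease_records)
    neither = len(universe) - both - disease_only - cell_only
    return fisher_exact_greater(both, disease_only, cell_only, neither)
```

A Bonferroni correction, as used above, would then multiply each p-value by the number of tested pairs in the tissue and cap the result at 1.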

Expert curation and assessment of disease-affected cell types

We assessed whether PrEDiCT score value is indicative of true associations by manual curation of a subset of associations with ranging PrEDiCT score values. Per tissue, we sorted all pairs of diseases and cell-types by their PrEDiCT scores, and then selected two pairs from percentile 0%, 25%, 50%, 75%, and 100%, resulting in a total of 10 pairs per tissue. Each pair was manually reviewed by a medical student to determine whether the cell type presents pathophysiological phenotypes in diseased patients. By using expert knowledge, literature, and OMIM records, each pair was designated as either affected, unaffected, or undetermined.

Determining matching cell types between tissues

For each pair of tissues, all cell types were compared to each other. For this, each cell type was associated with marker genes, namely genes with Z score ≥2. Markers of mouse cell types were converted to their human orthologous genes according to the Mouse Genome Informatics database (Bult et al., 2019). Next, we estimated the similarity between each pair of cell types using the matchSCore2 package, which compared the two marker lists by using the Jaccard index (Mereu et al., 2020). Cell types were determined as matching if their Jaccard index was ≥0.05 and in the top 10th percentile. To test the impact of disease genes on the matching of human and mouse cell types, we repeated the matching upon excluding disease genes from the list of marker genes. Most (109/122, 89%) of the cell-type pairs that matched originally also matched upon excluding disease genes.
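
The matching criterion can be sketched in plain Python without the matchSCore2 package. The function names are hypothetical, and the reading of "top 10th percentile" as the top 10% of all pairwise Jaccard indices between the two tissues is an assumption.

```python
def jaccard(markers_a, markers_b):
    """Jaccard index of two marker-gene collections."""
    a, b = set(markers_a), set(markers_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matching_pairs(markers_1, markers_2, min_jaccard=0.05, top_fraction=0.10):
    """Return cell-type pairs whose Jaccard index is >= min_jaccard and
    within the top `top_fraction` of all pairwise indices.

    markers_1, markers_2: mapping cell type -> marker genes, one per tissue.
    """
    scores = {(c1, c2): jaccard(m1, m2)
              for c1, m1 in markers_1.items()
              for c2, m2 in markers_2.items()}
    if not scores:
        return {}
    ranked = sorted(scores.values(), reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    cutoff = ranked[n_top - 1]
    return {pair: score for pair, score in scores.items()
            if score >= min_jaccard and score >= cutoff}
```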

Permutation tests for similarity between likely affected cell types

For each pair of tissues, we denoted one as the reference tissue (Tr) and the other as the test tissue (Tt). Success was determined, per disease, if any of the cell types in Tt were matching to any of the likely affected cell types in Tr. We tested the null hypothesis that the number of diseases for which success was determined (num_s) was not higher than expected by chance. For this, we carried out a permutation test. Per disease, we randomly selected cell types equal in number to the likely affected cell types in Tr and repeated the success test; specifically, we checked if the randomly selected cell types in Tr were matching to any of the likely affected cell types in Tt. We repeated this for all diseases with likely affected cell types in Tr, and recorded the number of randomly successful diseases (num_r). We repeated this procedure 1000 times. Significance was calculated as the fraction of cases where num_r≥num_s. p-Values were adjusted for multiple comparisons by Benjamini-Hochberg correction.
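
One possible reading of this permutation test is sketched below. The data layout (per-disease sets of likely affected cell types in each tissue, and a set of matching cell-type pairs) and the function names are assumptions; success is interpreted here as a likely affected cell type in Tr matching a likely affected cell type in Tt.

```python
import random


def is_success(ref_types, test_types, matches):
    """True if any reference cell type matches any test cell type."""
    return any((r, t) in matches for r in ref_types for t in test_types)


def similarity_p_value(affected_ref, affected_test, ref_cell_types, matches,
                       n_perm=1000, seed=0):
    """Return (num_s, p-value) for one ordered pair of tissues.

    affected_ref / affected_test: disease -> set of likely affected cell
    types in the reference / test tissue.
    ref_cell_types: list of all cell types of the reference tissue.
    matches: set of (reference cell type, test cell type) matching pairs.
    """
    rng = random.Random(seed)
    diseases = [d for d in affected_ref if affected_ref[d]]
    num_s = sum(is_success(affected_ref[d], affected_test.get(d, ()), matches)
                for d in diseases)
    at_least = 0
    for _ in range(n_perm):
        num_r = 0
        for d in diseases:
            # Draw as many random cell types as the disease affects in Tr.
            random_types = rng.sample(ref_cell_types, len(affected_ref[d]))
            num_r += is_success(random_types, affected_test.get(d, ()), matches)
        if num_r >= num_s:
            at_least += 1
    return num_s, at_least / n_perm
```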

Analysis of likely affected cell classes

Susceptibility of a given cell type was set to the percentage of diseases affecting that cell type out of all diseases that affect that same tissue (Supplementary file 8). Cell types were grouped into one of six cell classes: fibroblasts, stem cells, myocytes, endothelia, epithelia and immunocytes, or were grouped as ‘other’. Then, we applied an analysis of variance (ANOVA) test to the proportions of diseases that likely affected each cell class using the aov function in R v4.1.1. To further determine whether specific cell classes were more susceptible than others, we compared the cell-type susceptibility of prevalent cell classes (>15 cell types across ≥4 tissues each) to all other cell types, by using the Mann-Whitney U test. p-Values were adjusted for multiple comparisons by Benjamini-Hochberg correction.
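
The susceptibility measure, and the Pearson correlation used to compare it with cell-type prevalence, can be sketched as follows. The function names and input layout are hypothetical; counting every disease of the tissue in the denominator, including diseases without an inferred cell type, is an assumption.

```python
from math import sqrt


def susceptibility(affected_by_disease, cell_types):
    """Percentage of a tissue's diseases that likely affect each cell type.

    affected_by_disease: disease -> set of likely affected cell types,
    for all diseases that affect the tissue.
    """
    n_diseases = len(affected_by_disease)
    return {
        c: 100.0 * sum(c in affected
                       for affected in affected_by_disease.values()) / n_diseases
        for c in cell_types
    }


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```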

Ligands- and receptors-associated diseases

We extracted a list of 1625 pairs of ligands and their corresponding receptors from Jin et al., 2021. We retrieved all diseases with disease genes that included both a ligand and its receptor. We filtered this set to include diseases where both the ligand and its receptor were preferentially expressed in distinct cell types.

Heritable cancers

Data of heritable cancers were downloaded from the Cancer Gene Census of the Catalogue of Somatic Mutations in Cancer (COSMIC; Sondka et al., 2018). Specifically, we downloaded germline tumor type, associated genes, and role in cancer (oncogenes or tumor suppressor genes) from tiers 1 and 2. We manually annotated germline tumor types to the six tissues included in our study (Supplementary file 7).

Acknowledgements

This study was funded by the Israel Science Foundation [317/19 to E.Y.-L] and [401/22 to E.Y.-L].

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Esti Yeger-Lotem, Email: estiyl@bgu.ac.il.

Bogdan Pasaniuc, University of California, Los Angeles, United States.

Detlef Weigel, Max Planck Institute for Biology Tübingen, Germany.

Funding Information

This paper was supported by the following grants:

  • Israel Science Foundation 317/19 to Esti Yeger-Lotem.

  • Israel Science Foundation 401/22 to Esti Yeger-Lotem.

Additional information

Competing interests

No competing interests declared.

Author contributions

Formal analysis, Visualization, Methodology, Writing – original draft.

Methodology.

Data curation.

Data curation.

Data curation, Formal analysis.

Conceptualization, Supervision, Writing – original draft.

Additional files

Supplementary file 1. Diseases, disease genes, and likely affected cell types of each tissue.
elife-84613-supp1.xlsx (324KB, xlsx)
Supplementary file 2. Diseases and their PrEDiCT scores across human cell types per tissue.
elife-84613-supp2.xlsx (1.8MB, xlsx)
Supplementary file 3. Names of disease and cell type co-appearance in PubMed records per tissue.
elife-84613-supp3.xlsx (201.6KB, xlsx)
Supplementary file 4. Mouse cell clusters annotations.
elife-84613-supp4.xlsx (904.7KB, xlsx)
Supplementary file 5. Matching cell types between human and mouse tissues.
elife-84613-supp5.xlsx (79.3KB, xlsx)
Supplementary file 6. Diseases and their PrEDiCT scores across mouse cell types per tissue.
elife-84613-supp6.xlsx (1.3MB, xlsx)
Supplementary file 7. Tissues affected by heritable cancers.
elife-84613-supp7.xlsx (12.6KB, xlsx)
Supplementary file 8. Prevalence, susceptibility and classes of cell types.
elife-84613-supp8.xlsx (17.4KB, xlsx)
Supplementary file 9. The percentage of cells that express a gene per cell type in human tissues.
elife-84613-supp9.xlsx (44.9MB, xlsx)
Supplementary file 10. The percentage of cells that express a gene per cell type in mouse tissues.
elife-84613-supp10.xlsx (14.8MB, xlsx)
Supplementary file 11. The preferential expression of genes in cell types of human tissues.
elife-84613-supp11.xlsx (23.4MB, xlsx)
Supplementary file 12. The preferential expression of genes in cell types of mouse tissues.
elife-84613-supp12.xlsx (12.3MB, xlsx)
Supplementary file 13. Summary of PrEDiCT scores.
elife-84613-supp13.xlsx (11.7KB, xlsx)
MDAR checklist

Data availability

All data generated or analyzed during this study are included in the manuscript and in the supporting files. The Source Data files contain data used to generate all figures. Additional data and code to reproduce the analysis are available at GitHub https://github.com/hekselman/PrEDiCT (copy archived at Hekselman, 2024) and Dryad https://doi.org/10.5061/dryad.9w0vt4bm7.

The following dataset was generated:

Hekselman I, Yeger-Lotem E. 2024. Affected cell types for hundreds of Mendelian diseases revealed by analysis of human and mouse single-cell data. Dryad.

The following previously published datasets were used:

The Tabula Sapiens Consortium. Jones RC, Karkanias J, Krasnow MA. 2022. Tabula Sapiens. NCBI Gene Expression Omnibus. GSE201333

Tabula Muris Consortium 2018. Tabula Muris: Transcriptomic characterization of 20 organs and tissues from Mus musculus at single cell resolution. NCBI Gene Expression Omnibus. GSE109774

References

  1. Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Research. 2019;47:D1038–D1043. doi: 10.1093/nar/gky1151.
  2. Anjani G, Vignesh P, Joshi V, Shandilya JK, Bhattarai D, Sharma J, Rawat A. Recent advances in chronic granulomatous disease. Genes & Diseases. 2020;7:84–92. doi: 10.1016/j.gendis.2019.07.010.
  3. Barshir R, Shwartz O, Smoly IY, Yeger-Lotem E. Comparative analysis of human tissue interactomes reveals factors leading to tissue-specific manifestation of hereditary diseases. PLOS Computational Biology. 2014;10:e1003632. doi: 10.1371/journal.pcbi.1003632.
  4. Barshir R, Hekselman I, Shemesh N, Sharon M, Novack L, Yeger-Lotem E. Role of duplicate genes in determining the tissue-selectivity of hereditary diseases. PLOS Genetics. 2018;14:e1007327. doi: 10.1371/journal.pgen.1007327.
  5. Bult CJ, Blake JA, Smith CL, Kadin JA, Richardson JE, Mouse Genome Database G. Mouse genome database (MGD) 2019. Nucleic Acids Research. 2019;47:D801–D806. doi: 10.1093/nar/gky1056.
  6. Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. 2009;25:1422–1423. doi: 10.1093/bioinformatics/btp163.
  7. Cohn RD, Henry MD, Michele DE, Barresi R, Saito F, Moore SA, Flanagan JD, Skwarchuk MW, Robbins ME, Mendell JR, Williamson RA, Campbell KP. Disruption of DAG1 in differentiated skeletal muscle reveals a role for dystroglycan in muscle regeneration. Cell. 2002;110:639–648. doi: 10.1016/s0092-8674(02)00907-8.
  8. Cummings BB, Marshall JL, Tukiainen T, Lek M, Donkervoort S, Foley AR, Bolduc V, Waddell LB, Sandaradura SA, O’Grady GL, Estrella E, Reddy HM, Zhao F, Weisburd B, Karczewski KJ, O’Donnell-Luria AH, Birnbaum D, Sarkozy A, Hu Y, Gonorazky H, Claeys K, Joshi H, Bournazos A, Oates EC, Ghaoui R, Davis MR, Laing NG, Topf A, Genotype-Tissue Expression Consortium. Kang PB, Beggs AH, North KN, Straub V, Dowling JJ, Muntoni F, Clarke NF, Cooper ST, Bönnemann CG, MacArthur DG. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Science Translational Medicine. 2017;9:eaal5209. doi: 10.1126/scitranslmed.aal5209.
  9. Dai Y, Hu R, Manuel AM, Liu A, Jia P, Zhao Z. CSEA-DB: an omnibus for human complex trait and cell type associations. Nucleic Acids Research. 2021;49:D862–D870. doi: 10.1093/nar/gkaa1064.
  10. Diehl AD, Meehan TF, Bradford YM, Brush MH, Dahdul WM, Dougall DS, He Y, Osumi-Sutherland D, Ruttenberg A, Sarntivijai S, Van Slyke CE, Vasilevsky NA, Haendel MA, Blake JA, Mungall CJ. The Cell Ontology 2016: enhanced content, modularization, and ontology interoperability. Journal of Biomedical Semantics. 2016;7:44. doi: 10.1186/s13326-016-0088-7.
  11. Eraslan G, Drokhlyansky E, Anand S, Fiskin E, Subramanian A, Slyper M, Wang J, Van Wittenberghe N, Rouhana JM, Waldman J, Ashenberg O, Lek M, Dionne D, Win TS, Cuoco MS, Kuksenko O, Tsankov AM, Branton PA, Marshall JL, Greka A, Getz G, Segrè AV, Aguet F, Rozenblatt-Rosen O, Ardlie KG, Regev A. Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function. Science. 2022;376:eabl4290. doi: 10.1126/science.abl4290.
  12. Ferreira CR. The burden of rare diseases. American Journal of Medical Genetics. Part A. 2019;179:885–892. doi: 10.1002/ajmg.a.61124.
  13. Guan J, Lin Y, Wang Y, Gao J, Ji G. An analytical method for the identification of cell type-specific disease gene modules. Journal of Translational Medicine. 2021;19:20. doi: 10.1186/s12967-020-02690-5.
  14. Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, Lee MJ, Wilk AJ, Darby C, Zager M, Hoffman P, Stoeckius M, Papalexi E, Mimitou EP, Jain J, Srivastava A, Stuart T, Fleming LM, Yeung B, Rogers AJ, McElrath JM, Blish CA, Gottardo R, Smibert P, Satija R. Integrated analysis of multimodal single-cell data. Cell. 2021;184:3573–3587. doi: 10.1016/j.cell.2021.04.048.
  15. Hekselman I, Yeger-Lotem E. Mechanisms of tissue and cell-type specificity in heritable traits and diseases. Nature Reviews. Genetics. 2020;21:137–150. doi: 10.1038/s41576-019-0200-9.
  16. Hekselman I, Kerber L, Ziv M, Gruber G, Yeger-Lotem E. The Organ-Disease Annotations (ODiseA) Database of Hereditary Diseases and Inflicted Tissues. Journal of Molecular Biology. 2022;434:167619. doi: 10.1016/j.jmb.2022.167619.
  17. Hekselman I. Predict. swh:1:rev:71b591e1f4a413f347e5bfc453a411edd5aeb514. Software Heritage. 2024. https://archive.softwareheritage.org/swh:1:dir:dfed1277bd7649514be1994ee581757284c4fd70;origin=https://github.com/hekselman/PrEDiCT;visit=swh:1:snp:d5189c6ac53099ec298f1110ee5f0b006dffa18e;anchor=swh:1:rev:71b591e1f4a413f347e5bfc453a411edd5aeb514
  18. Herman TF, Killeen RB, Javaid MU. In: StatPearls. Herman TF, Killeen RB, editors. Treasure Island (FL) Ineligible Companies; 2023. Heinz body.
  19. Jagadeesh KA, Dey KK, Montoro DT, Mohan R, Gazal S, Engreitz JM, Xavier RJ, Price AL, Regev A. Identifying Disease-Critical Cell Types and Cellular Processes across the Human Body by Integration of Single-Cell Profiles and Human Genetics. bioRxiv. 2021. doi: 10.1101/2021.03.19.436212.
  20. Jagadeesh KA, Dey KK, Montoro DT, Mohan R, Gazal S, Engreitz JM, Xavier RJ, Price AL, Regev A. Identifying disease-critical cell types and cellular processes by integrating single-cell RNA-sequencing and human genetics. Nature Genetics. 2022;54:1479–1492. doi: 10.1038/s41588-022-01187-9.
  21. Jin S, Guerrero-Juarez CF, Zhang L, Chang I, Ramos R, Kuan C-H, Myung P, Plikus MV, Nie Q. Inference and analysis of cell-cell communication using CellChat. Nature Communications. 2021;12:1088. doi: 10.1038/s41467-021-21246-9.
  22. Jones RC, Karkanias J, Krasnow MA, Pisco AO, Quake SR, Salzman J, Yosef N, Bulthaup B, Brown P, Harper W, Hemenez M, Ponnusamy R, Salehi A, Sanagavarapu BA, Spallino E, Aaron KA, Concepcion W, Gardner JM, Kelly B, Neidlinger N, Wang Z, Crasta S, Kolluru S, Morri M, Pisco AO, Tan SY, Travaglini KJ, Xu C, Alcántara-Hernández M, Almanzar N, Antony J, Beyersdorf B, Burhan D, Calcuttawala K, Carter MM, Chan CKF, Chang CA, Chang S, Colville A, Crasta S, Culver RN, Cvijović I, D’Amato G, Ezran C, Galdos FX, Gillich A, Goodyer WR, Hang Y, Hayashi A, Houshdaran S, Huang X, Irwin JC, Jang S, Juanico JV, Kershner AM, Kim S, Kiss B, Kolluru S, Kong W, Kumar ME, Kuo AH, Leylek R, Li B, Loeb GB, Lu W-J, Mantri S, Markovic M, McAlpine PL, de Morree A, Morri M, Mrouj K, Mukherjee S, Muser T, Neuhöfer P, Nguyen TD, Perez K, Phansalkar R, Pisco AO, Puluca N, Qi Z, Rao P, Raquer-McKay H, Schaum N, Scott B, Seddighzadeh B, Segal J, Sen S, Sikandar S, Spencer SP, Steffes LC, Subramaniam VR, Swarup A, Swift M, Travaglini KJ, Van Treuren W, Trimm E, Veizades S, Vijayakumar S, Vo KC, Vorperian SK, Wang W, Weinstein HNW, Winkler J, Wu TTH, Xie J, Yung AR, Zhang Y, Detweiler AM, Mekonen H, Neff NF, Sit RV, Tan M, Yan J, Bean GR, Charu V, Forgó E, Martin BA, Ozawa MG, Silva O, Tan SY, Toland A, Vemuri VNP, Afik S, Awayan K, Botvinnik OB, Byrne A, Chen M, Dehghannasiri R, Detweiler AM, Gayoso A, Granados AA, Li Q, Mahmoudabadi G, McGeever A, de Morree A, Olivieri JE, Park M, Pisco AO, Ravikumar N, Salzman J, Stanley G, Swift M, Tan M, Tan W, Tarashansky AJ, Vanheusden R, Vorperian SK, Wang P, Wang S, Xing G, Xu C, Yosef N, Alcántara-Hernández M, Antony J, Chan CKF, Chang CA, Colville A, Crasta S, Culver R, Dethlefsen L, Ezran C, Gillich A, Hang Y, Ho P-Y, Irwin JC, Jang S, Kershner AM, Kong W, Kumar ME, Kuo AH, Leylek R, Liu S, Loeb GB, Lu W-J, Maltzman JS, Metzger RJ, de Morree A, Neuhöfer P, Perez K, Phansalkar R, Qi Z, Rao P, Raquer-McKay H, Sasagawa K, Scott B, Sinha R, Song H, 
Spencer SP, Swarup A, Swift M, Travaglini KJ, Trimm E, Veizades S, Vijayakumar S, Wang B, Wang W, Winkler J, Xie J, Yung AR, Artandi SE, Beachy PA, Clarke MF, Giudice LC, Huang FW, Huang KC, Idoyaga J, Kim SK, Krasnow M, Kuo CS, Nguyen P, Quake SR, Rando TA, Red-Horse K, Reiter J, Relman DA, Sonnenburg JL, Wang B, Wu A, Wu SM, Wyss-Coray T, Tabula Sapiens Consortium* The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans. Science. 2022;376:eabl4896. doi: 10.1126/science.abl4896. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Kim-Hellmuth S, Aguet F, Oliva M, Muñoz-Aguirre M, Kasela S, Wucher V, Castel SE, Hamel AR, Viñuela A, Roberts AL, Mangul S, Wen X, Wang G, Barbeira AN, Garrido-Martín D, Nadel BB, Zou Y, Bonazzola R, Quan J, Brown A, Martinez-Perez A, Soria JM, GTEx Consortium. Getz G, Dermitzakis ET, Small KS, Stephens M, Xi HS, Im HK, Guigó R, Segrè AV, Stranger BE, Ardlie KG, Lappalainen T. Cell type-specific genetic regulation of gene expression across human tissues. Science. 2020;369:eaaz8528. doi: 10.1126/science.aaz8528. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Kivirikko S, Saarela J, Myers JC, Autio-Harmainen H, Pihlajaniemi T. Distribution of type XV collagen transcripts in human tissue and their production by muscle cells and fibroblasts. The American Journal of Pathology. 1995;147:1500–1509. [PMC free article] [PubMed] [Google Scholar]
  25. Köhler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, Danis D, Balagura G, Baynam G, Brower AM, Callahan TJ, Chute CG, Est JL, Galer PD, Ganesan S, Griese M, Haimel M, Pazmandi J, Hanauer M, Harris NL, Hartnett MJ, Hastreiter M, Hauck F, He Y, Jeske T, Kearney H, Kindle G, Klein C, Knoflach K, Krause R, Lagorce D, McMurry JA, Miller JA, Munoz-Torres MC, Peters RL, Rapp CK, Rath AM, Rind SA, Rosenberg AZ, Segal MM, Seidel MG, Smedley D, Talmy T, Thomas Y, Wiafe SA, Xian J, Yüksel Z, Helbig I, Mungall CJ, Haendel MA, Robinson PN. The Human Phenotype Ontology in 2021. Nucleic Acids Research. 2021;49:D1207–D1217. doi: 10.1093/nar/gkaa1043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Lage K, Hansen NT, Karlberg EO, Eklund AC, Roque FS, Donahoe PK, Szallasi Z, Jensen TS, Brunak S. A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. PNAS. 2008;105:20870–20875. doi: 10.1073/pnas.0810772105. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. Leiding JW, Holland SM. Chronic Granulomatous Disease. GeneReviews; 1993. [PubMed] [Google Scholar]
  28. Leigh MW, Horani A, Kinghorn B, O’Connor MG, Zariwala MA, Knowles MR. Primary Ciliary Dyskinesia (PCD): A genetic disorder of motile cilia. Translational Science of Rare Diseases. 2019;4:51–75. doi: 10.3233/TRD-190036. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Mathys H, Davila-Velderrain J, Peng Z, Gao F, Mohammadi S, Young JZ, Menon M, He L, Abdurrob F, Jiang X, Martorell AJ, Ransohoff RM, Hafler BP, Bennett DA, Kellis M, Tsai L-H. Single-cell transcriptomic analysis of Alzheimer’s disease. Nature. 2019;570:332–337. doi: 10.1038/s41586-019-1195-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Mereu E, Lafzi A, Moutinho C, Ziegenhain C, McCarthy DJ, Álvarez-Varela A, Batlle E, Grün D, Lau JK, Boutet SC, Sanada C, Ooi A, Jones RC, Kaihara K, Brampton C, Talaga Y, Sasagawa Y, Tanaka K, Hayashi T, Braeuning C, Fischer C, Sauer S, Trefzer T, Conrad C, Adiconis X, Nguyen LT, Regev A, Levin JZ, Parekh S, Janjic A, Wange LE, Bagnoli JW, Enard W, Gut M, Sandberg R, Nikaido I, Gut I, Stegle O, Heyn H. Benchmarking single-cell RNA-sequencing protocols for cell atlas projects. Nature Biotechnology. 2020;38:747–755. doi: 10.1038/s41587-020-0469-4. [DOI] [PubMed] [Google Scholar]
  31. Montoro DT, Haber AL, Biton M, Vinarsky V, Lin B, Birket SE, Yuan F, Chen S, Leung HM, Villoria J, Rogel N, Burgin G, Tsankov AM, Waghray A, Slyper M, Waldman J, Nguyen L, Dionne D, Rozenblatt-Rosen O, Tata PR, Mou H, Shivaraju M, Bihler H, Mense M, Tearney GJ, Rowe SM, Engelhardt JF, Regev A, Rajagopal J. A revised airway epithelial hierarchy includes CFTR-expressing ionocytes. Nature. 2018;560:319–324. doi: 10.1038/s41586-018-0393-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Park J, Shrestha R, Qiu C, Kondo A, Huang S, Werth M, Li M, Barasch J, Suszták K. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science. 2018;360:758–763. doi: 10.1126/science.aar2131. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Plasschaert LW, Žilionis R, Choo-Wing R, Savova V, Knehr J, Roma G, Klein AM, Jaffe AB. A single-cell atlas of the airway epithelium reveals the CFTR-rich pulmonary ionocyte. Nature. 2018;560:377–381. doi: 10.1038/s41586-018-0394-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Robinson A, Han CZ, Glass CK, Pollard JW. Monocyte Regulation in Homeostasis and Malignancy. Trends in Immunology. 2021;42:104–119. doi: 10.1016/j.it.2020.12.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Rouhana JM, Wang J, Eraslan G, Anand S, Hamel AR, Cole B, Regev A, Aguet F, Ardlie KG, Segrè AV. ECLIPSER: Identifying Causal Cell Types and Genes for Complex Traits through Single Cell Enrichment of e/sQTL-Mapped Genes in GWAS Loci. bioRxiv. 2021 doi: 10.1101/2021.11.24.469720. [DOI]
  36. Segerstolpe Å, Palasantza A, Eliasson P, Andersson E-M, Andréasson A-C, Sun X, Picelli S, Sabirsh A, Clausen M, Bjursell MK, Smith DM, Kasper M, Ämmälä C, Sandberg R. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metabolism. 2016;24:593–607. doi: 10.1016/j.cmet.2016.08.020. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Servián-Morilla E, Cabrera-Serrano M, Johnson K, Pandey A, Ito A, Rivas E, Chamova T, Muelas N, Mongini T, Nafissi S, Claeys KG, Grewal RP, Takeuchi M, Hao H, Bönnemann C, Lopes Abath Neto O, Medne L, Brandsema J, Töpf A, Taneva A, Vilchez JJ, Tournev I, Haltiwanger RS, Takeuchi H, Jafar-Nejad H, Straub V, Paradas C. POGLUT1 biallelic mutations cause myopathy with reduced satellite cells, α-dystroglycan hypoglycosylation and a distinctive radiological pattern. Acta Neuropathologica. 2020;139:565–582. doi: 10.1007/s00401-019-02117-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Smillie CS, Biton M, Ordovas-Montanes J, Sullivan KM, Burgin G, Graham DB, Herbst RH, Rogel N, Slyper M, Waldman J, Sud M, Andrews E, Velonias G, Haber AL, Jagadeesh K, Vickovic S, Yao J, Stevens C, Dionne D, Nguyen LT, Villani AC, Hofree M, Creasey EA, Huang H, Rozenblatt-Rosen O, Garber JJ, Khalili H, Desch AN, Daly MJ, Ananthakrishnan AN, Shalek AK, Xavier RJ, Regev A. Intra- and Inter-cellular Rewiring of the Human Colon during Ulcerative Colitis. Cell. 2019;178:714–730. doi: 10.1016/j.cell.2019.06.029. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Sonawane AR, Platig J, Fagny M, Chen C-Y, Paulson JN, Lopes-Ramos CM, DeMeo DL, Quackenbush J, Glass K, Kuijjer ML. Understanding Tissue-Specific Gene Regulation. Cell Reports. 2017;21:1077–1088. doi: 10.1016/j.celrep.2017.10.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Sondka Z, Bamford S, Cole CG, Ward SA, Dunham I, Forbes SA. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nature Reviews. Cancer. 2018;18:696–705. doi: 10.1038/s41568-018-0060-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Tabula Muris Consortium. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature. 2018;562:367–372. doi: 10.1038/s41586-018-0590-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Urciuolo A, Quarta M, Morbidoni V, Gattazzo F, Molon S, Grumati P, Montemurro F, Tedesco FS, Blaauw B, Cossu G, Vozzi G, Rando TA, Bonaldo P. Collagen VI regulates satellite cell self-renewal and muscle regeneration. Nature Communications. 2013;4:1964. doi: 10.1038/ncomms2964. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Yazdani R, Fekrvand S, Shahkarami S, Azizi G, Moazzami B, Abolhassani H, Aghamohammadi A. The hyper IgM syndromes: Epidemiology, pathogenesis, clinical manifestations, diagnosis and management. Clinical Immunology. 2019;198:19–30. doi: 10.1016/j.clim.2018.11.007. [DOI] [PubMed] [Google Scholar]
  44. Zhang MJ, Hou K, Dey KK, Sakaue S, Jagadeesh KA, Weinand K, Taychameekiatchai A, Rao P, Pisco AO, Zou J, Wang B, Gandal M, Raychaudhuri S, Pasaniuc B, Price AL. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nature Genetics. 2022;54:1572–1580. doi: 10.1038/s41588-022-01167-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Zhao T, Lyu S, Lu G, Juan L, Zeng X, Wei Z, Hao J, Peng J. SC2disease: a manually curated database of single-cell transcriptome for human diseases. Nucleic Acids Research. 2021;49:D1413–D1419. doi: 10.1093/nar/gkaa838. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Zou Y, Zhang R-Z, Sabatelli P, Chu M-L, Bönnemann CG. Muscle interstitial fibroblasts are the main source of collagen VI synthesis in skeletal muscle: implications for congenital muscular dystrophy types Ullrich and Bethlem. Journal of Neuropathology and Experimental Neurology. 2008;67:144–154. doi: 10.1097/nen.0b013e3181634ef7. [DOI] [PubMed] [Google Scholar]

Editor's evaluation

Bogdan Pasaniuc 1

The study presents analyses linking cell-types to monogenic disorders using over-expression of known disease-associated genes in single-cell data to identify disease-affected cell types for 328 Mendelian diseases. Overall, this important study combines multiple data analyses to quantify the connection between cell types and human disorders. Compelling analyses using stringent and rigorous statistical methodologies support the conclusions of this study.

Decision letter

Editor: Bogdan Pasaniuc1

Our editorial process produces two outputs: i) public reviews designed to be posted alongside the preprint for the benefit of readers; ii) feedback on the manuscript for the authors, including requests for revisions, shown below. We also include an acceptance summary that explains what the editors found interesting or important about the work.

Decision letter after peer review:

Thank you for submitting your article "Affected cell types for hundreds of Mendelian diseases revealed by analysis of human and mouse single-cell data" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, one of whom is a member of our Board of Reviewing Editors, and the evaluation has been overseen by Molly Przeworski as the Senior Editor. The reviewers have opted to remain anonymous.

The reviewers have discussed their reviews with one another, and the Reviewing Editor has drafted this to help you prepare a revised submission.

Essential revisions:

1. All reviewers commented on the lack of rigor in the approach for determining statistical significance (Rev1#1, Rev2#1, Rev3#2). The criteria for statistical significance need to be justified and investigated to provide statistical stringency for determining the set of significant associations.

2. Investigation of null/negative control genes would strengthen the conclusions of this work (Rev3#1, Rev2#4). For example, some Mendelian diseases arise through coding mutations that do not alter expression in any cell type. Such genes could serve as negative controls, showing that the method does not identify any disease-affected cell type for them.

3. Improved textual contextualization (and caveats) of proposed work to better showcase the proposed results would significantly strengthen the rigor (Rev1#2, Rev3#1,4,5,6).

Reviewer #1 (Recommendations for the authors):

The manuscript by Hekselman et al. presents analyses linking cell types to monogenic disorders using over-expression of monogenic disease genes as the signal. The manuscript analyses data from 6 tissues (bone marrow, lung, muscle, spleen, tongue and trachea) together with ~1,000 rare diseases from OMIM (with ~2,000 associated genes) to identify the cell types of interest for a given disease. The signal used by the approach is the expression of OMIM genes in a particular cell type relative to their expression in the tissue of interest; this identifies cell-type-disease pairs that are then investigated through literature review and recapitulated using mouse expression data. A potentially interesting finding is that disease genes manifesting in multiple tissues seem to affect the same cell types. Overall, this important study combines multiple data analyses to quantify the connection between cell types and human disorders. However, whereas some of the analyses are compelling, the statistical analyses are incomplete as they don't provide a full treatment of type I error.

I have two main critiques that reduce my enthusiasm for this work.

1. The statistical framework to identify cell types for a given disease (gene) is likely not calibrated to type I error (page 5 and methods). The approach notes that disease-cell-type pairs with a PrEDiCT score > 2 (top 5%) of all pairs are "considered significant". I could not find any justification for why this is a well-calibrated test that yields significant associations. Under a model where no true cell-type associations are present in the data, this approach will still incorrectly flag the top 5% as being significant. In fact, Figure 1B appears consistent with PrEDiCT scores drawn from a normal distribution centered on 0, potentially consistent with most (if not all) associations being null. The approach needs a proper, well-calibrated statistical test to declare significant associations.

2. Functional validation of the disease-cell-type associations is limited. Associations are investigated using a literature search in PubMed focused on co-occurrence of disease and cell types; this is not validation, as many of those co-appearances reflect negative and/or speculative results. The second line of evidence that these associations are true comes from recapitulation of results using mouse data; these results could be over-interpreted, as human and mouse cell-type data are matched based on expression itself, thus creating an expected loop: mouse cell-type expression matched to human cell-type expression identifies similar expression-driven associations.

Reviewer #2 (Recommendations for the authors):

Comments:

1. The PrEDiCT score threshold for inferring a disease-cell type pair as statistically significant seems unlikely to be sufficiently stringent. The threshold for inferring statistical significance must be rigorously justified.

First, for diseases with only 1 associated gene, z>2 is a lax threshold, as 2.5% of disease-cell type pairs would be expected to be significant by chance. The distribution of the number of associated genes per disease is important and must be discussed in the main text with an associated Supp Table/Supp Figure (how many of the 1113 diseases have only 1 associated gene, how many have exactly 2 associated genes, etc.).

Second, from Table S3, it seems that the total number of disease-cell type pairs tested is 3952 + 5624 + 12996 + 2568 + 780 + 540 = 26460 (which is less than 1113*129 = 143577, because only cell types in disease-associated tissues were tested). The number 26460 is important and must be reported in the main text. Given that 2.5% * 26460 = 662, in the extreme case that all diseases have only 1 associated gene we would expect 662 significant PrEDiCT scores by chance, such that many of the 753 significant PrEDiCT scores that are reported could be false positives.

Third, under the null hypothesis of no disease-affected cell types, for diseases with only 1 associated gene, the PrEDiCT scores may have more large values than a normal distribution, because there could be many genes with patterns of preferential expression in a particular cell type. On the other hand, for diseases with x associated genes with x>1, the PrEDiCT scores may have fewer large values than a normal distribution, because the median of x independent z-scores has a <2.5% chance of being >2.

The best solution would be to assess statistical significance via empirical comparison with PrEDiCT scores for non-disease-associated control genes (for diseases with only 1 associated gene), or empirical comparison with PrEDiCT scores for the median of x non-disease-associated control genes (for diseases with x associated genes with x>1). I recommend that this approach should be used. The resulting P-values can then be evaluated for statistical significance using Bonferroni (probably too conservative) or FDR. It is likely that the method has higher power for larger values of x, such that FDR could be stratified by the value of x. Note that the main point here is variation across genes: unless disease-associated gene(s) are very significantly different from non-disease-associated control genes, then no significant disease-affected cell type should be inferred.

An alternative, which I have much less enthusiasm for but which is plausible, would be to state throughout that the PrEDiCT scores identify candidate disease-affected cell types, and use excess overlap with disease-cell type pairs from literature co-appearance to assign an FDR to the candidate disease-affected cell types (perhaps at different PrEDiCT score thresholds), e.g., FDR = 100% / (fold excess overlap).
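The arithmetic in this comment can be checked with a short sketch. It is illustrative only and not part of the review: it sums the per-tissue pair counts quoted from Table S3, computes the number of pairs expected to pass at the 2.5% rate quoted above, and evaluates the chance that the median of x independent standard-normal z-scores exceeds 2. The function names are hypothetical, and only odd x is handled so that the median is a single order statistic. Note that the exact one-sided normal tail beyond z = 2 is about 2.3%, slightly below the rounded 2.5% figure.

```python
from math import comb, erfc, sqrt

# Per-tissue counts of tested disease-cell type pairs, as quoted from Table S3.
pairs_per_tissue = [3952, 5624, 12996, 2568, 780, 540]
total_pairs = sum(pairs_per_tissue)

# Expected chance hits at the quoted 2.5% rate (integer arithmetic avoids
# floating-point noise in the percentage).
expected_by_chance = total_pairs * 25 / 1000


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * erfc(z / sqrt(2))


def median_tail(x, z=2.0):
    """P(median of x independent standard-normal scores > z), odd x only.

    For odd x the median exceeds z exactly when at least (x + 1) // 2
    of the x scores exceed z, which is a binomial tail probability.
    """
    if x % 2 == 0:
        raise ValueError("this sketch handles odd x only")
    p = upper_tail(z)
    k_min = (x + 1) // 2
    return sum(comb(x, k) * p**k * (1 - p) ** (x - k) for k in range(k_min, x + 1))


print(total_pairs)         # 26460
print(expected_by_chance)  # 661.5, quoted as 662 in the comment
for x in (1, 3, 5):
    print(x, median_tail(x))
```

Under these assumptions the tail probability shrinks quickly as x grows, which is the point made in the third paragraph: a fixed z > 2 cutoff is not equally stringent for diseases with different numbers of associated genes.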

2. The numbers in Figure 1E are not consistent with the numbers in the main text.

First, the main text (p.5) states that 753 disease-cell type pairs have significant literature co-appearance, whereas Figure 1E states that 654+99=753 disease-cell type pairs have significant PrEDiCT scores and 448+99=547 pairs have significant literature co-appearance. I am guessing that 753 in the main text is a typo and should be 547.

Second, the main text (p.6) states that 714 diseases had disease-affected cell types inferred (I believe this is based on the PrEDiCT score, but the wording of the text is confusing and could be improved), whereas Figure 1E states that 654+99=753 disease-cell type pairs have significant PrEDiCT scores. I’m guessing that 753>714 because some diseases have >1 disease-affected cell types inferred, but this should be stated explicitly.

Third, the main text (p.6) states that 18% of disease-cell type pairs with significant PrEDiCT scores have significant literature co-appearance. However, based on the numbers in Figure 1E, 99/753 = 13%, which is different from 18%.

Fourth, the main text (p.6) states that 6% of all disease-cell type pairs have significant literature co-appearance. However, based on 547 from Figure 1E and the total of 26460 disease-cell type pairs tested (see Comment 1), 547/26460 = 2%, which is different from 6%.

In addition to fixing the discrepancies, it may be good to expand the explanations, as this is really the most important part of the paper.

3. The PCD example (Figure 1C, Figure 2D) is a great example. However, I suggest to increase the number of specific Mendelian diseases highlighted in the main Figures (prior to delving into diseases with multiple affected tissues, or other special categories of diseases) from 1 to at least 4, e.g., via a separate main Figure highlighting 4 Mendelian diseases (with human or human+mouse results).

4. I expect that for some Mendelian diseases, coding mutations to disease-associated gene(s) affect protein product but do not affect expression in any cell type. It would be interesting to include some well-studied Mendelian diseases in this category as negative controls, for which failure to implicate any disease-affected cell type is the correct answer.

Reviewer #3 (Recommendations for the authors):

I really do believe the authors should provide context of their method with a larger set of previously published work. Additionally, the thresholding concerns and lack of a clear null comparison make it difficult to assess the robustness of the method and the analyses.

eLife. 2024 Jan 10;13:e84613. doi: 10.7554/eLife.84613.sa2

Author response


Essential revisions:

1. All reviewers commented on the lack of rigor in the approach for determining statistical significance (Rev1#1, Rev2#1, Rev3#2). The criteria for statistical significance need to be justified and investigated to provide statistical stringency for determining the set of significant associations.

We changed the approach for determining the statistical significance of PrEDiCT scores to be based on permutation tests, as suggested by the reviewers (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1—figure supplement 2). We assessed the stringency of the approach using literature text-mining and expert curation (Results, page 7, 2nd paragraph; Methods, page 22, ‘Text-mining of PubMed records’, and page 23, ‘Expert curation and assessment of disease-affected cell types’; Figure 1F,G, and Figure 1—figure supplement 4).

2. Investigation of null/negative control genes would strengthen the conclusions of this work (Rev3#1, Rev2#4). For example, some Mendelian diseases arise through coding mutations that do not alter expression in any cell type. Such genes could serve as negative controls, showing that the method does not identify any disease-affected cell type for them.

We added negative control cases to the Results, page 7, 1st paragraph.

3. Improved textual contextualization (and caveats) of proposed work to better showcase the proposed results would significantly strengthen the rigor (Rev1#2, Rev3#1,4,5,6).

We improved textual contextualization in the Introduction (from the end of page 2 to page 3) and throughout the manuscript. We extended the discussion of caveats in the Discussion (page 16, 2nd paragraph and page 17, 2nd paragraph).

Reviewer #1 (Recommendations for the authors):

The manuscript by Hekselman et al. presents analyses linking cell types to monogenic disorders using over-expression of monogenic disease genes as the signal. The manuscript analyses data from 6 tissues (bone marrow, lung, muscle, spleen, tongue and trachea) together with ~1,000 rare diseases from OMIM (with ~2,000 associated genes) to identify the cell types of interest for a given disease. The signal used by the approach is the expression of OMIM genes in a particular cell type relative to their expression in the tissue of interest; this identifies cell-type-disease pairs that are then investigated through literature review and recapitulated using mouse expression data. A potentially interesting finding is that disease genes manifesting in multiple tissues seem to affect the same cell types. Overall, this important study combines multiple data analyses to quantify the connection between cell types and human disorders. However, whereas some of the analyses are compelling, the statistical analyses are incomplete as they don't provide a full treatment of type I error.

I have two main critiques that reduce my enthusiasm for this work.

1. The statistical framework to identify cell types for a given disease (gene) is likely not calibrated to type I error (page 5 and methods). The approach notes that disease-cell-type pairs with a PrEDiCT score > 2 (top 5%) of all pairs are "considered significant". I could not find any justification for why this is a well-calibrated test that yields significant associations. Under a model where no true cell-type associations are present in the data, this approach will still incorrectly flag the top 5% as being significant. In fact, Figure 1B appears consistent with PrEDiCT scores drawn from a normal distribution centered on 0, potentially consistent with most (if not all) associations being null. The approach needs a proper, well-calibrated statistical test to declare significant associations.

Following this comment, we revised the procedure for declaring significant disease–cell-type associations. Instead of considering disease–cell-type pairs with PrEDiCT score > 2 (top 5%) as significant, we used permutation testing to determine statistical significance (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1—figure supplement 2). Specifically, given a disease D with n disease-associated genes, we randomly selected n genes expressed in any cell type, and computed the PrEDiCT score for this random gene set in each cell type of the disease-affected tissue (referred to as ‘random score’). We repeated this procedure 1,000 times, resulting in 1,000 random scores per disease and cell type. The p-value of the PrEDiCT score of disease D in cell type c was set to the fraction of random scores in c that were at least as high as the original PrEDiCT score of D in c. P-values were adjusted for multiple hypothesis testing per disease using the Benjamini-Hochberg procedure. To increase stringency, we treated only statistically significant disease–cell-type pairs with PrEDiCT score ≥ 1 as 'likely affected'. We estimated type I error using literature text-mining and expert curation (Figure 1F and Figure 1—figure supplement 4A). False-positive rates of the revised procedure were low in both (0.01 and 0.07, respectively).
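The permutation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: `pref_expr` is a hypothetical mapping from each gene to its preferential-expression z-scores across the cell types of one tissue, the score of a gene set in a cell type is taken as the median of its genes' z-scores (as the reviewers describe it), the random background is every gene in `pref_expr`, and the significance cutoff `alpha=0.05` is assumed because it is not stated here.

```python
import random
from statistics import median


def predict_scores(genes, pref_expr, n_cell_types):
    """Median preferential-expression z-score of a gene set in each cell type."""
    return [median(pref_expr[g][c] for g in genes) for c in range(n_cell_types)]


def permutation_p_values(disease_genes, pref_expr, n_perm=1000, seed=0):
    """Observed scores and empirical p-values, one per cell type."""
    rng = random.Random(seed)
    background = sorted(pref_expr)  # assumption: all genes expressed in the tissue
    n_cell_types = len(pref_expr[background[0]])
    observed = predict_scores(disease_genes, pref_expr, n_cell_types)
    at_least_as_high = [0] * n_cell_types
    for _ in range(n_perm):
        random_set = rng.sample(background, len(disease_genes))
        random_scores = predict_scores(random_set, pref_expr, n_cell_types)
        for c in range(n_cell_types):
            if random_scores[c] >= observed[c]:
                at_least_as_high[c] += 1
    return observed, [count / n_perm for count in at_least_as_high]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def likely_affected(disease_genes, pref_expr, alpha=0.05, min_score=1.0, **kwargs):
    """Indices of cell types that pass both the adjusted p-value and score cutoffs."""
    observed, p_values = permutation_p_values(disease_genes, pref_expr, **kwargs)
    adjusted = benjamini_hochberg(p_values)
    return [c for c in range(len(observed))
            if adjusted[c] <= alpha and observed[c] >= min_score]
```

Calling `likely_affected(genes_of_D, pref_expr)` once per disease applies the Benjamini-Hochberg correction across the cell types of that disease, in line with the per-disease adjustment described above. With 1,000 permutations the smallest nonzero p-value is 0.001, so a p-value of 0 here means only that no random set matched the observed score.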

2. Functional validation of the disease-cell-type associations is limited. Associations are investigated using a literature search in PubMed focused on co-occurrence of disease and cell types; this is not validation, as many of those co-appearances reflect negative and/or speculative results. The second line of evidence that these associations are true comes from recapitulation of results using mouse data; these results could be over-interpreted, as human and mouse cell-type data are matched based on expression itself, thus creating an expected loop: mouse cell-type expression matched to human cell-type expression identifies similar expression-driven associations.

We agree with the reviewer that functional validation using a literature search in PubMed is limited. To strengthen the reliability of likely disease–cell-type associations, we assigned an expert to curate associations with different PrEDiCT scores (ten associations per tissue, for six tissues; Methods, page 23, ‘Expert curation and assessment of disease-affected cell types’). Next, we used the curated associations to estimate false-positive and false-negative rates. This analysis showed that the revised scheme had a low false-positive rate and that likely associations were enriched in expert-verified pairs (p=1.3E-4, Fisher’s exact test; Figure 1—figure supplement 4).

With respect to the recapitulation of results using mouse data, we assessed the impact of 'reuse' of expression data to drive associations. To avoid reusing disease genes in both the PrEDiCT scheme and the matching between human and mouse cell types, we matched cell types based on the expression of non-disease genes alone. We found that most of the originally matched cell-type pairs (109/122, 89%) were also matched upon excluding disease genes. Hence, the impact of 'reuse' of expression data seems minor. We acknowledge this caveat and mention the observation described herein in Discussion, page 17, 2nd half of the 2nd paragraph; Results, page 8, last sentence; and Methods, page 23, ‘Determining matching cell types between tissues’.

Reviewer #2 (Recommendations for the authors):

Comments:

1. The PrEDiCT score threshold for inferring a disease-cell type pair as statistically significant seems unlikely to be sufficiently stringent. The threshold for inferring statistical significance must be rigorously justified.

First, for diseases with only 1 associated gene, z>2 is a lax threshold, as 2.5% of disease-cell type pairs would be expected to be significant by chance. The distribution of the number of associated genes per disease is important and must be discussed in the main text with an associated Supp Table/Supp Figure (how many of the 1113 diseases have only 1 associated gene, how many have exactly 2 associated genes, etc.).

Second, from Table S3, it seems that the total number of disease-cell type pairs tested is 3952 + 5624 + 12996 + 2568 + 780 + 540 = 26460 (which is less than 1113*129 = 143577, because only cell types in disease-associated tissues were tested). The number 26460 is important and must be reported in the main text. Given that 2.5% * 26460 = 662, in the extreme case that all diseases have only 1 associated gene we would expect 662 significant PrEDiCT scores by chance, such that many of the 753 significant PrEDiCT scores that are reported could be false positives.

Third, under the null hypothesis of no disease-affected cell types, for diseases with only 1 associated gene, the PrEDiCT scores may have more large values than a normal distribution, because there could be many genes with patterns of preferential expression in a particular cell type. On the other hand, for diseases with x associated genes with x>1, the PrEDiCT scores may have fewer large values than a normal distribution, because the median of x independent z-scores has a <2.5% chance of being >2.

The best solution would be to assess statistical significance via empirical comparison with PrEDiCT scores for non-disease-associated control genes (for diseases with only 1 associated gene), or empirical comparison with PrEDiCT scores for the median of x non-disease-associated control genes (for diseases with x associated genes with x>1). I recommend that this approach should be used. The resulting P-values can then be evaluated for statistical significance using Bonferroni (probably too conservative) or FDR. It is likely that the method has higher power for larger values of x, such that FDR could be stratified by the value of x. Note that the main point here is variation across genes: unless disease-associated gene(s) are very significantly different from non-disease-associated control genes, then no significant disease-affected cell type should be inferred.

An alternative, which I have much less enthusiasm for but which is plausible, would be to state throughout that the PrEDiCT scores identify candidate disease-affected cell types, and use excess overlap with disease-cell type pairs from literature co-appearance to assign an FDR to the candidate disease-affected cell types (perhaps at different PrEDiCT score thresholds), e.g., FDR = 100% / (fold excess overlap).

We thank the reviewer for these suggestions and adopted the ‘best solution’ that the reviewer suggested above. Specifically, we empirically compared the original cell-type PrEDiCT scores of a disease with PrEDiCT scores computed for randomly selected sets of non-disease-associated control genes. The resulting P-values were then adjusted using Benjamini-Hochberg procedure (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1—figure supplement 2).

Per the reviewer’s requests, we added a supplementary figure that shows the distribution of diseases by the number of associated genes, and the same distribution for the diseases with likely associations (Figure 1—figure supplement 3). We report the revised total number of disease–cell-type pairs tested in Results, page 6, end of 1st paragraph.

2. The numbers in Figure 1E are not consistent with the numbers in the main text.

First, the main text (p.5) states that 753 disease-cell type pairs have significant literature co-appearance, whereas Figure 1E states that 654+99=753 disease-cell type pairs have significant PrEDiCT scores and 448+99=547 pairs have significant literature co-appearance. I am guessing that 753 in the main text is a typo and should be 547.

Second, the main text (p.6) states that 714 diseases had disease-affected cell types inferred (I believe this is based on the PrEDiCT score, but the wording of the text is confusing and could be improved), whereas Figure 1E states that 654+99=753 disease-cell type pairs have significant PrEDiCT scores. I’m guessing that 753>714 because some diseases have >1 disease-affected cell types inferred, but this should be stated explicitly.

Third, the main text (p.6) states that 18% of disease-cell type pairs with significant PrEDiCT scores have significant literature co-appearance. However, based on the numbers in Figure 1E, 99/753 = 13%, which is different from 18%.

Fourth, the main text (p.6) states that 6% of all disease-cell type pairs have significant literature co-appearance. However, based on 547 from Figure 1E and the total of 26460 disease-cell type pairs tested (see Comment 1), 547/26460 = 2%, which is different from 6%.

In addition to fixing the discrepancies, it may be good to expand the explanations, as this is really the most important part of the paper.

We are grateful to the reviewer for noting these issues. We recalculated all numbers after redoing the entire analysis, which included revising the disease-tissue dataset, recomputing PrEDiCT scores and their statistical significance, and rerunning the Biopython package to identify disease–cell-type pairs with significant literature co-appearance. Consequently, the results have changed, and we revised all relevant numbers (also summarized in new Supplementary File 1A). For the first and second points, the revised number of disease–cell-type pairs with significant PrEDiCT scores, denoted 'likely affected associations', was 489. The number of disease–cell-type pairs with significant literature co-appearance (denoted 'literature-supported pairs') was revised to 229. Importantly, the current version of Biopython retrieves only up to 9,999 records per term, rather than the previously available threshold of 99,999 records. For instance, Biopython retrieved 9,999 records for each of the terms erythrocytes and β-thalassemia, of which 32 records overlapped (the overlap remained statistically significant). We revised the wording to explicitly state the revised numbers in Results, page 7, last sentence; and in Figure 1G (also see figure legend).
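To illustrate why an overlap of 32 records between two capped sets of 9,999 records can still be significant, the sketch below computes a one-sided hypergeometric tail probability. It is only an illustration: the choice of a hypergeometric test, the helper names, and above all the universe size `ASSUMED_UNIVERSE` (a hypothetical total number of PubMed records) are assumptions of this sketch, and the test actually used is the one described in the Methods.

```python
import math


def log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hypergeom_upper_tail(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """P(X >= overlap) when size_b records are drawn at random from a universe
    that contains size_a records of interest."""
    lower = max(overlap, 0, size_a + size_b - universe)
    upper = min(size_a, size_b)
    log_denominator = log_choose(universe, size_b)
    log_terms = [
        log_choose(size_a, k) + log_choose(universe - size_a, size_b - k) - log_denominator
        for k in range(lower, upper + 1)
    ]
    if not log_terms:
        return 0.0
    # Sum in log space to avoid underflow of the individual terms.
    peak = max(log_terms)
    return min(1.0, math.exp(peak) * sum(math.exp(t - peak) for t in log_terms))


ASSUMED_UNIVERSE = 36_000_000  # hypothetical number of PubMed records, not from the source
p_overlap = hypergeom_upper_tail(32, 9_999, 9_999, ASSUMED_UNIVERSE)
print(f"{p_overlap:.3g}")
```

Under this assumed universe, the overlap expected by chance is about 9,999 × 9,999 / 36,000,000, or roughly 3 records, so an observed overlap of 32 lies far in the upper tail.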

For the third and fourth points, we recalculated the different fractions. The fraction of likely affected disease–cell-type pairs that are literature-supported was updated to 41/489 (8.4%). The fraction of literature-supported disease–cell-type pairs out of all disease–cell-type pairs was updated to 229/34,249 (0.7%, now in Supplementary File 2A; the total number of disease–cell-type pairs grew because of a miscalculation in the original analysis). We updated the text and expanded the explanations in Results, page 8, 1st paragraph; and in Figure 1G.
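As a quick check of the arithmetic above, the two fractions and their ratio can be recomputed from the reported counts. The variable names are illustrative, and the fold excess is derived here from those counts rather than quoted from the response.

```python
likely_affected_pairs = 489      # pairs with significant PrEDiCT scores
likely_and_literature = 41       # likely affected pairs that are also literature-supported
literature_pairs = 229           # all literature-supported pairs
all_pairs = 34_249               # all disease-cell-type pairs tested

frac_among_likely = likely_and_literature / likely_affected_pairs
frac_among_all = literature_pairs / all_pairs
fold_excess = frac_among_likely / frac_among_all

print(f"{frac_among_likely:.1%}")  # 8.4%
print(f"{frac_among_all:.1%}")     # 0.7%
print(f"{fold_excess:.1f}")        # 12.5
```

In other words, literature support is roughly 12.5 times more frequent among the likely affected pairs than among all tested pairs.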

3. The PCD example (Figure 1C, Figure 2D) is a great example. However, I suggest to increase the number of specific Mendelian diseases highlighted in the main Figures (prior to delving into diseases with multiple affected tissues, or other special categories of diseases) from 1 to at least 4, e.g., via a separate main Figure highlighting 4 Mendelian diseases (with human or human+mouse results).

Per the reviewer’s suggestion, we highlight five additional Mendelian diseases (Results, page 7, 1st paragraph; Figure 1D and 1E), all of which were recapitulated using mouse data (Results, page 9, 1st paragraph; Figure 2D and 2E).

4. I expect that for some Mendelian diseases, coding mutations to disease-associated gene(s) affect protein product but do not affect expression in any cell type. It would be interesting to include some well-studied Mendelian diseases in this category as negative controls, for which failure to implicate any disease-affected cell type is the correct answer.

We thank the reviewer for this suggestion. We included two negative-control cases, for which failure to implicate any disease-affected cell type is the correct answer, in Results, page 7, 1st paragraph.

Reviewer #3 (Recommendations for the authors):

I really do believe the authors should provide context of their method with a larger set of previously published work. Additionally, the thresholding concerns and lack of a clear null comparison make it difficult to assess the robustness of the method and the analyses.

We provide context for our method by providing a detailed description of a larger set of previously published work (Introduction, page 2-3). Additionally, we revised the statistical analysis and clarified the null comparison of the results (Results, page 6, 1st paragraph; Methods, page 21-22, ‘PrEDiCT score calculation and significance assessment’; Figure 1—figure supplement 2). We enhanced the assessments of the method performance and robustness by using several lines of external evidence (Results, page 7, 2nd paragraph; Methods, page 22, ‘Text-mining of PubMed records’, and page 23, ‘Expert curation and assessment of disease-affected cell types’; Figure 1F,G, and Figure 1—figure supplement 4).

Associated Data

    Data Citations

    1. Hekselman I, Yeger-Lotem E. 2024. Affected cell types for hundreds of Mendelian diseases revealed by analysis of human and mouse single-cell data. Dryad.
    2. The Tabula Sapiens Consortium. Jones RC, Karkanias J, Krasnow MA. 2022. Tabula Sapiens. NCBI Gene Expression Omnibus. GSE201333
    3. Tabula Muris Consortium 2018. Tabula Muris: Transcriptomic characterization of 20 organs and tissues from Mus musculus at single cell resolution. NCBI Gene Expression Omnibus. GSE109774

    Supplementary Materials

    Figure 1—source data 1. The Source Data file contains data used to generate Figure 1B-G.
    elife-84613-fig1-data1.xlsx (1,003.2KB, xlsx)
    Figure 1—figure supplement 1—source data 1. The Source Data file contains the number of diseases with and without likely affected cell types per tissue.
    Figure 1—figure supplement 3—source data 1. The Source Data file contains the numbers of diseases per number of disease-associated genes, with and without likely affected cell types.
    Figure 1—figure supplement 4—source data 1. The Source Data file contains data used to generate Figure 1—figure supplement 4A, B.
    Figure 2—source data 1. The Source Data file contains data used to generate Figure 2A-G.
    Figure 2—figure supplement 1—source data 1. The Source Data file contains the numbers of diseases with a single, or multiple, disease-associated genes that were or were not recapitulated using mouse expression data.
    Figure 3—source data 1. The Source Data file contains data used to generate Figure 3A-C.
    elife-84613-fig3-data1.xlsx (154.5KB, xlsx)
    Figure 3—figure supplement 1—source data 1. The Source Data file contains data used to generate the circos plot that represents the similarity between cell types of distinct human tissues.
    Figure 4—source data 1. The Source Data file contains data used to generate Figure 4A-C.
    Figure 5—source data 1. The Source Data file contains data used to generate Figure 5A, B.
    Supplementary file 1. Diseases, disease genes, and likely affected cell types of each tissue.
    elife-84613-supp1.xlsx (324KB, xlsx)
    Supplementary file 2. Diseases and their PrEDiCT scores across human cell types per tissue.
    elife-84613-supp2.xlsx (1.8MB, xlsx)
    Supplementary file 3. Names of disease and cell type co-appearance in PubMed records per tissue.
    elife-84613-supp3.xlsx (201.6KB, xlsx)
    Supplementary file 4. Mouse cell clusters annotations.
    elife-84613-supp4.xlsx (904.7KB, xlsx)
    Supplementary file 5. Matching cell types between human and mouse tissues.
    elife-84613-supp5.xlsx (79.3KB, xlsx)
    Supplementary file 6. Diseases and their PrEDiCT scores across mouse cell types per tissue.
    elife-84613-supp6.xlsx (1.3MB, xlsx)
    Supplementary file 7. Tissues affected by heritable cancers.
    elife-84613-supp7.xlsx (12.6KB, xlsx)
    Supplementary file 8. Prevalence, susceptibility and classes of cell types.
    elife-84613-supp8.xlsx (17.4KB, xlsx)
    Supplementary file 9. The percentage of cells that express a gene per cell type in human tissues.
    elife-84613-supp9.xlsx (44.9MB, xlsx)
    Supplementary file 10. The percentage of cells that express a gene per cell type in mouse tissues.
    elife-84613-supp10.xlsx (14.8MB, xlsx)
    Supplementary file 11. The preferential expression of genes in cell types of human tissues.
    elife-84613-supp11.xlsx (23.4MB, xlsx)
    Supplementary file 12. The preferential expression of genes in cell types of mouse tissues.
    elife-84613-supp12.xlsx (12.3MB, xlsx)
    Supplementary file 13. Summary of PrEDiCT scores.
    elife-84613-supp13.xlsx (11.7KB, xlsx)
    MDAR checklist

    Data Availability Statement

    All data generated or analyzed during this study are included in the manuscript and in the supporting files. The Source Data files contain data used to generate all figures. Additional data and code to reproduce the analysis are available at GitHub https://github.com/hekselman/PrEDiCT (copy archived at Hekselman, 2024) and Dryad https://doi.org/10.5061/dryad.9w0vt4bm7.

    The following dataset was generated:

    Hekselman I, Yeger-Lotem E. 2024. Affected cell types for hundreds of Mendelian diseases revealed by analysis of human and mouse single-cell data. Dryad.

    The following previously published datasets were used:

    The Tabula Sapiens Consortium. Jones RC, Karkanias J, Krasnow MA. 2022. Tabula Sapiens. NCBI Gene Expression Omnibus. GSE201333

    Tabula Muris Consortium 2018. Tabula Muris: Transcriptomic characterization of 20 organs and tissues from Mus musculus at single cell resolution. NCBI Gene Expression Omnibus. GSE109774


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
