Abstract
Tissue specificity of gene expression sheds light on the tissue-selective manifestation of hereditary disease despite the same DNA across all tissues. The evolutionary path of such tissue specificity provides essential information about the tissue-specific function of genes and the validity of animal models of disease. With recent improvements in sequencing technology, more and more large-scale transcriptomics studies have been conducted among different species across multiple tissues. In this study, we exploit existing transcriptomics resources of humans, cynomolgus macaques, rats, mice, and dogs across 13 tissues. We find that although tissue specificity of homologous gene expression is largely well conserved across species, a total of 380 genes shift or are in the process of shifting their tissue specificity. The tissue-specificity-shifting genes are less conserved than those preserving their tissue specificity or housekeeping genes. Interestingly, tissue-specificity-shifting genes tend to be less conserved at the third codon positions, likely due to their relaxed synonymous codon usage bias. Moreover, compared with genes, cassette exons are more likely to shift their tissue specificity of splicing across the five species.
Keywords: alternative splicing, codon usage bias, divergence, gene expression, phylogenetic comparative analyses, tissue specificity
1. Introduction
It is imperative to quantify and compare gene expression across tissues since this is critical to understanding gene function as well as its role in the tissue-selective manifestation of hereditary disease. Usually two major gene categories are studied in depth: housekeeping genes and tissue-specific genes. The former are of utmost importance to the basic maintenance of cell function, whereas tissue-specific genes are also invaluable as many diseases exhibit tissue specificity (Winter et al., 2004; Barshir et al., 2014; Koh et al., 2014; Kitsak et al., 2016).
A systematic analysis showed that disease genes and protein complexes indeed tend to be overexpressed in the tissues where defects cause pathology (Lage et al., 2008), highlighting the importance of the tissue-specificity studies of gene expression. In addition, understanding the tissue-specific pattern of gene expression is essential to elucidate the molecular mechanisms of tissue development and the tissue-specific transcriptional regulation.
With recent advances in sequencing technology, large-scale transcriptomics studies have made it feasible to perform phylogenetic comparative analyses of gene expression across species (Rohlfs and Nielsen, 2015; Roux et al., 2015; Shafer, 2019). Expression-level analyses help to understand how genomic variations, especially those in regulatory elements, translate into phenotypic variations across different species (Meireles-Filho and Stark, 2009; Wittkopp and Kalay, 2011). The evolutionary path of gene expression can then be related to morphological, physiological, and developmental characteristics of individual species.
Alternative splicing complicates the transcriptomes of higher eukaryotes. In humans, up to 95% of multiexon genes undergo alternative splicing to encode protein isoforms with different functions (Pan et al., 2008). Notably, about 15% of human hereditary diseases and cancers are associated with alternative splicing events (Marquez et al., 2012; Cui et al., 2017). The knowledge of splicing diversification helps to further explain the phenotypic differences among different species (Barbosa-Morais et al., 2012).
In this study, we combine the study of tissue specificity with the phylogenetic comparative analyses of gene expression. We focus on the evolutionary changes in the tissue specificity of gene expression. The divergent selection on the tissue specificity can help us to better understand the impact of tissue-specific expression on the tissue-specific disease pathology as well as the validity of disease animal models.
Recently Naqvi et al. (2019) investigated the evolutionary path of sex differences in gene expression by examining transcriptomes of 13 tissues in male and female humans, mice, rats, dogs, and cynomolgus macaques. For the four nonhuman species, three females and three males of each species were sampled. Another 740 human RNA-seq data sets were extracted from the Genotype-Tissue Expression (GTEx) Consortium (https://gtexportal.org/home/) (GTEx Consortium, 2013). Utilizing such a unique and valuable resource in transcriptomics, we conduct a comparative genome-wide multitissue multispecies RNA study to explore the diversification of tissue specificity in gene expression or alternative splicing.
2. Methods
2.1. Data acquisition
RNA-seq data of the four nonhuman species (cynomolgus macaque, mouse, rat, and dog) were downloaded from the GEO database (GSE125483) (Naqvi et al., 2019). A total of 13 tissues (adipose, adrenal gland, brain, colon, heart, liver, lung, muscle, pituitary, skin, spleen, testis, and thyroid) were profiled in three male and three female samples of each species. RNA-seq data for human was acquired from the GTEx Consortium (https://gtexportal.org/), following the sample selection from the original publication (Naqvi et al., 2019). An IRB is not required to analyze sequence data from GTEx (http://www.gtexportal.org/home/faq#irb). The selected tissues were described in GTEx as: adipose—visceral (omentum), adrenal gland, brain—cortex, colon—transverse, heart—left ventricle, liver, lung, muscle—skeletal, pituitary, skin (not sun-exposed), spleen, testis, and thyroid.
The homologous gene information was obtained from the Ensembl database Version 96 (https://uswest.ensembl.org/). The PhastCons conservation scores were downloaded from UCSC Genome Browser (https://genome.ucsc.edu/).
2.2. RNA-seq data processing
For nonhuman mammals, short sequence reads were mapped to the reference genomes (cynomolgus macaque to macFas5; mouse to mm10; rat to rn6; dog to canFam3) using STAR (Dobin et al., 2013) v2.6.0 with the parameters specified as: --outFilterMultimapNmax 50 --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.15 --outSAMtype BAM SortedByCoordinate. The BAM output files from STAR were then fed to Salmon (Patro et al., 2017), a tool for fast transcript quantification from RNA-seq data. Salmon's alignment-based mode was used with the default parameters and the respective reference genomes to quantify TPM (transcripts per million) values for transcripts. Salmon output files were then processed using our customized scripts to obtain aggregated gene-level TPM values. For human RNA-seq data, we downloaded the preprocessed TPM values directly from GTEx.
Our read mapping was not significantly impacted by duplicated genes. We examined the mapping statistics from all nonhuman samples: there were on average 85.9% uniquely mapped reads and 6.7% multimapped reads (all mapping statistics were listed in Supplementary Table S1). In the downstream quantification step, Salmon handled the multimapped reads based on the structure of the uniquely mapped reads to provide a transcript-level quantification. The final gene-level TPM values were then aggregated. Thus, the gene expression quantification was accurate without strong effect caused by paralogous genes.
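The aggregation step can be illustrated with a minimal sketch. The customized scripts themselves are not shown in the text, so summing transcript-level TPM values within each gene is an assumption here, and the function name and identifiers below are hypothetical.

```python
from collections import defaultdict


def aggregate_gene_tpm(transcript_tpm, transcript_to_gene):
    """Sum transcript-level TPM values into gene-level TPM values.

    Assumption: gene-level TPM is the sum over the gene's transcripts.
    """
    gene_tpm = defaultdict(float)
    for transcript_id, tpm in transcript_tpm.items():
        gene_id = transcript_to_gene.get(transcript_id)
        if gene_id is None:
            continue  # transcripts without a gene annotation are skipped
        gene_tpm[gene_id] += tpm
    return dict(gene_tpm)


# Hypothetical identifiers, for illustration only.
example = aggregate_gene_tpm(
    {"tx1": 3.0, "tx2": 1.5, "tx3": 7.0},
    {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"},
)
```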
2.3. Tissue-specificity quantification
The tissue-specificity measurement τ was calculated as follows:
$$\tau = \frac{\sum_{i=1}^{n} \left(1 - \hat{x}_i\right)}{n - 1},$$
where $\hat{x}_i = x_i / \max_{1 \le j \le n} x_j$, $n$ = the number of tissues, and $x_i$ is the gene expression in tissue $i$. If [...], [...]. τ varies between 0 and 1, where 0 is for ubiquitously expressed genes and 1 for tissue-specific highly expressed genes.
To apply the phylogenetic ANOVA model to formally test the divergence of tissue specificity, we combined the τ-based tissue-specificity measure with a matrix consisting of expression profiles of all individuals. Specifically, to associate τ with each tissue $t$, we defined $\varphi_t$ as $\varphi_t = \tau \cdot \hat{x}_t$. Thus, $\varphi_t = \tau$ for the tissue with the highest expression and $\varphi_t < \tau$ for all other tissues.
For each gene, we created a matrix
$$\begin{pmatrix} \varphi_{1,1} & \cdots & \varphi_{1,n} \\ \vdots & \ddots & \vdots \\ \varphi_{m,1} & \cdots & \varphi_{m,n} \end{pmatrix},$$
where rows were for the $m$ individuals and columns were for the $n$ tissues. A gene was defined as tissue-specific if the max column-average was >0.8.
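The definitions in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: expression values are used as given (whether any transformation was applied beforehand is not stated here), an all-zero profile is treated as non-specific, and the function names are hypothetical.

```python
def tau(expression):
    """Tissue-specificity index for one expression profile across tissues."""
    n = len(expression)
    peak = max(expression)
    if n < 2 or peak <= 0:
        return 0.0  # assumption: undefined profiles are treated as non-specific
    return sum(1.0 - x / peak for x in expression) / (n - 1)


def phi(expression):
    """Per-tissue score: tau scaled by expression relative to the peak tissue."""
    peak = max(expression)
    if peak <= 0:
        return [0.0] * len(expression)
    t = tau(expression)
    return [t * x / peak for x in expression]


def is_tissue_specific(phi_matrix, cutoff=0.8):
    """phi_matrix: rows are individuals, columns are tissues."""
    n_individuals = len(phi_matrix)
    column_means = [sum(column) / n_individuals for column in zip(*phi_matrix)]
    return max(column_means) > cutoff
```

A gene expressed in a single tissue gives τ = 1 and a φ profile that is 1 in that tissue and 0 elsewhere, whereas a uniformly expressed gene gives τ = 0.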
2.4. Identification of tissue-specific shifting pattern with the expression variance and evolution model
The expression variance and evolution (EVE) model (Rohlfs and Nielsen, 2015) can be used to detect expression divergence/diversity and branch-specific shifts from expression profiles. In this study, we extended its usage to the detection of tissue-specificity shifting among species with the matrix described earlier. We supplied the model with φ measures instead of expression data to examine the shifting of tissue specificity. For example, for each of the human brain-specific genes, the φ matrix was constructed and fed into the EVE model to inspect whether any human brain-specific gene experienced a significant tissue-specificity shift in other species.
The likelihood ratio test (LRT) statistic produced by the EVE model was assessed by the chi-square test. To validate whether the chi-squared (df = 1) distribution was appropriate to approximate the null distribution of the LRT statistic, we generated the null distribution based on parametric bootstrap through EVE. We randomly selected 100 genes and generated the quantile-quantile plots. As shown in Supplementary Figure S1a, the null distributions were similar to the chi-squared (df = 1) distribution. The distribution of the p values based on the Kolmogorov–Smirnov tests is shown in Supplementary Figure S1b. A total of 94 p values were >0.05 and only one was <0.01. Thus, the EVE model is generally applicable for the detection of tissue-specificity shifting.
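As a small worked illustration of the chi-square assessment, the upper-tail probability of a chi-squared (df = 1) variable has a closed form through the complementary error function. The sketch below only converts an LRT statistic into a p value; it does not reproduce the EVE model, and the function name is hypothetical.

```python
import math


def chi2_df1_pvalue(lrt_statistic):
    """Upper-tail probability of a chi-squared (df = 1) variable."""
    if lrt_statistic <= 0:
        return 1.0
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(lrt_statistic / 2.0))
```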
2.5. Synonymous codon usage distribution
The coding sequences (CDSs) of human genes were downloaded from the Ensembl Biomart (https://www.ensembl.org/biomart/). The codon usage for a gene of interest was averaged over all transcripts of that gene. A series of chi-square tests were performed to test for potentially biased usage of synonymous codons.
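A goodness-of-fit statistic of this kind can be sketched as follows. The null distribution used in the study is not spelled out in this section, so equal usage of all synonymous codons is assumed here; the function name and the example counts are hypothetical.

```python
def codon_usage_chi_square(codon_counts):
    """Goodness-of-fit statistic against equal usage of synonymous codons.

    codon_counts maps each synonymous codon of one amino acid to its count.
    Returns (statistic, degrees_of_freedom).
    """
    total = sum(codon_counts.values())
    k = len(codon_counts)
    if total == 0 or k < 2:
        raise ValueError("need at least two synonymous codons with observations")
    expected = total / k  # assumption: uniform usage under the null
    statistic = sum((obs - expected) ** 2 / expected for obs in codon_counts.values())
    return statistic, k - 1


# Hypothetical counts for the two glutamate codons.
statistic, df = codon_usage_chi_square({"GAA": 30, "GAG": 10})
```

The statistic would then be compared with a chi-squared distribution with `df` degrees of freedom to obtain a p value.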
2.6. Alternative splicing of cassette exons
Cassette exons for each species were obtained from gene annotation GTF files in Ensembl (release 97). The splicing ratio of a cassette exon was calculated as 0.5 × inclusive junction read counts/(0.5 × inclusive junction read counts + exclusive junction read counts). Junction reads from individuals of the same species were aggregated together. We required that the total number of exclusive junction reads or the total number of inclusive junction reads be >20 for a splicing ratio to be considered valid. Only valid splicing ratios were included in the τ calculation.
To identify homologous cassette exons, we utilized BLASTn (Altschul et al., 1990) to align cassette exons from homologous genes of different species. We built a BLAST database from human cassette exon sequences, and then blasted nonhuman cassette exon sequences against the database. Cassette exon pairs with ≥50% positions aligned and belonging to homologous gene pairs were considered homologous cassette exons. A similar strategy for identifying homologous exons was applied in Sakuma et al. (2015).
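The splicing ratio and its validity filter translate directly into a short function. This is a minimal sketch of the formula stated above; the function and parameter names are hypothetical, and the read counts are assumed to be already aggregated across individuals of one species.

```python
def splicing_ratio(inclusive_reads, exclusive_reads, min_reads=20):
    """Return the cassette-exon splicing ratio, or None if coverage is too low."""
    # Valid only if either junction class has more than min_reads reads.
    if inclusive_reads <= min_reads and exclusive_reads <= min_reads:
        return None
    weighted_inclusive = 0.5 * inclusive_reads
    return weighted_inclusive / (weighted_inclusive + exclusive_reads)
```

With 40 inclusive and 20 exclusive junction reads the ratio is 0.5, whereas 10 reads of each kind yields no valid ratio.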
The number of homologous cassette exons between two species varied from 120 to 5004 (mean 1058, median 396). Since we used human sequences as anchor sequences, species pairs involving humans and other well-annotated species had more homologous cassette exons. Only 44 cassette exons had one-to-one homologous cassette exons across all the five species. In the future, a more sophisticated model can be developed to further incorporate evolutionary distances into the identification of homologous cassette exons.
3. Results and Discussion
3.1. Tissue-specificity landscape of gene expression in five mammalian species
To measure the tissue specificity in gene expression, we calculated the Tau (τ) value, which has been reported as the most robust metric for tissue specificity (Kryuchkova-Mostacci and Robinson-Rechavi, 2017). τ ranges from 0 to 1, where 0 is for ubiquitously expressed genes and 1 for highly tissue-specifically expressed genes. Tissue-specific genes are those with average τ ≥ 0.8 among individuals of the same species. Based on our analysis of the five mammalian species, the number of tissue-specific genes ranged from 4170 to 8194.
The numbers were slightly higher than those reported in previous studies of humans, mice, and rats, which were obtained from expression-fold-change-based methods (Liao and Zhang, 2006; Yu et al., 2014; Uhlén et al., 2016). Interestingly, the number in cynomolgus macaques was more than 10-fold greater than that in a previous study (Huh et al., 2012) (4170 vs. 175). The discrepancy might be caused by different analysis criteria, different numbers of considered tissues, as well as differences in sequencing technologies. To the best of our knowledge, there is no prior research examining transcriptome-wide tissue-specific gene expression in dogs.
The overall tissue-specificity pattern among the five species was similar: the highest number of tissue-specific genes was in the testis, followed by the brain, skin, and liver, with relatively few in other tissues (Fig. 1a). If only genes with one-to-one homologs across all five species were considered, the total number of tissue-specific genes as well as their tissue distribution were still consistent across the five species. The testis and brain were the two tissues with the largest numbers of tissue-specific genes (Fig. 1b).
FIG. 1.
Number of tissue-specific genes identified across 13 tissues in five species. (a) All genes were considered for each species. (b) Only genes with one-to-one homologous genes across all five species were considered.
We confined our studies to one-to-one homologous protein-coding genes since our main focus is the evolutionary path of the tissue specificity of gene expression. Such genes made up 15%–82% (median: 49%) of all tissue-specific genes from individual tissues. Moreover, one-to-many/many-to-many genes made up only 0–17% (median: 2%) of all tissue-specific genes for each tissue. The remaining large number of genes lacked homologous counterparts in at least one of the five species.
3.2. Evolutionary patterns of tissue specificity across five mammalian species
To understand the evolutionary path of a gene's tissue specificity, we examined whether the gene was enriched in the same tissue of different species. Thus, we counted the number of genes that were tissue specific in the same tissue of two species (i.e., tissue specificity is maintained) and genes that were tissue specific in different tissues of two species (i.e., tissue specificity shifts). Among protein-coding genes with one-to-one orthologs across all five species, 3522 genes maintained their tissue specificity in at least two species, whereas 898 genes shifted their tissue specificity from one tissue to another tissue across different species based on the τ measurements.
Figure 2 shows the pairwise tissue comparison among the five species. Each tissue pair corresponded to a circle and each circle was divided into 10 sections representing the 10 pairwise comparisons between the five species for this tissue pair. Note that the shift from tissue A to tissue B was different from the shift from tissue B to A. The shifting direction here was from a row tissue to a column tissue. The majority of tissue specificities were conserved (diagonal circles). The shifting of tissue specificities was also observed (off-diagonal circles). Many such shifting events happened from the testis to the brain as well as from the brain to the testis since these two tissues had the largest numbers of tissue-specific genes (Fig. 2a).
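The pairwise counting described above can be sketched as follows. The sketch assumes that each species has already been reduced to a mapping from gene to the tissue in which it is tissue specific; the function name, gene identifiers, and tissue labels are hypothetical.

```python
def compare_specificity(specific_tissue_a, specific_tissue_b):
    """Split shared tissue-specific genes into maintained and shifted sets.

    Each argument maps a gene ID to the tissue in which the gene is tissue
    specific in that species. Genes absent from either mapping are ignored.
    """
    maintained, shifted = [], []
    for gene in sorted(specific_tissue_a.keys() & specific_tissue_b.keys()):
        tissue_a, tissue_b = specific_tissue_a[gene], specific_tissue_b[gene]
        if tissue_a == tissue_b:
            maintained.append(gene)
        else:
            shifted.append((gene, tissue_a, tissue_b))  # direction: a -> b
    return maintained, shifted


# Hypothetical example for one species pair.
maintained, shifted = compare_specificity(
    {"g1": "brain", "g2": "testis", "g3": "liver"},
    {"g1": "brain", "g2": "brain", "g4": "skin"},
)
```

Keeping the direction in each shifted record matches the convention that a shift from tissue A to tissue B differs from a shift from B to A.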
FIG. 2.
Tissue-specificity shifting of gene expression between two tissues of two species. All genes have one-to-one homologous counterparts across the five species. Each circle represents a comparison between two tissues. The shifting direction is from a row tissue to a column tissue. Each comparison of the two tissues between a pair of species is illustrated in one sector of the circle. Such species pairs are abbreviated by their first letters; for example, dh represents the comparison between dogs and humans. The arrangements of the species pairs for the sectors are shown along the heatmap. The shifting discoveries were based on the τ-only measurements. (a) Absolute frequency. (b) Normalized frequency. For a species pair dh, the normalized tissue-specificity-shifting frequency from tissue T to tissue S is computed from three counts: the number of genes that are tissue-T-specific in species d and tissue-S-specific in species h; the total number of genes that are tissue-T-specific in species d; and the total number of genes that are tissue-S-specific in species h.
Similar findings were reported in Fukushima and Pollock (2020). Interestingly, many shifting events also happened from pituitaries to brains (Fig. 2a). Such pituitary → brain shifts were especially enriched in the comparison between rats and cynomolgus macaques (“rc”) and the comparison between rats and humans (“rh”). When the relative frequencies normalized by the numbers of tissue-specific genes in the two considered tissues were used, the pituitary → brain and the muscle → heart shifts were relatively more frequent than other shifts (Fig. 2b). In contrast, the liver and testis were more likely to preserve their expression tissue specificity (Fig. 2b).
To demonstrate the robustness of the results, we also applied a fold-change method to detect tissue-specific genes and shifting ones. A gene was considered tissue specific if (1) its expression level was more than fivefold higher in a particular tissue than in all other tissues and (2) it was highly expressed in that tissue (FPKM >5). The fold-change-based method is much more stringent than the τ-based method and leads to fewer tissue-specific genes. However, the shifting patterns between tissues were similar and the enrichment of shifts between related tissues (e.g., brain ↔ pituitary gland, or heart ↔ muscle) still existed (Supplementary Fig. S2).
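The two fold-change criteria can be expressed as a small check on one gene's expression profile. This is a sketch under the thresholds quoted above; the function name and the example values are hypothetical.

```python
def fold_change_specific_tissue(expression, fold=5.0, min_expression=5.0):
    """Return the tissue passing both fold-change criteria, or None.

    expression maps tissue name -> expression level for one gene.
    """
    if len(expression) < 2:
        return None
    ranked = sorted(expression.items(), key=lambda item: item[1], reverse=True)
    (top_tissue, top_value), (_, second_value) = ranked[0], ranked[1]
    # Exceeding fold * the second-highest tissue implies exceeding all others.
    if top_value > min_expression and top_value > fold * second_value:
        return top_tissue
    return None
```

Comparing the top tissue only against the runner-up is sufficient because every other tissue has equal or lower expression.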
To further confirm the shifts in tissue specificity of gene expression, we applied the EVE model (Rohlfs and Nielsen, 2015) to conduct formal hypothesis tests. The EVE model was originally developed to examine the evolution of gene expression by comparing the expression variance between species and that within species. In this study, we are interested in the divergence of tissue specificity and hence focus on scenarios with differences of tissue specificity between species larger than those within species. To obtain a tissue-specificity score per tissue, we defined our own Phi (φt) score by associating τ to the gene expression in tissue t normalized by its highest expression across all tissues (see Section 2).
Thus, φt = τ for the tissue with the highest expression; φt < τ for all other tissues. Considering a gene in a tissue t, if it met the tissue-specificity cutoff (τ ≥ 0.8) in at least one species, we calculated φt for all individuals across the five species and then conducted the phylogenetic ANOVA through EVE. A total of 380 genes were declared as significantly shifting their tissue specificity (false discovery rate [FDR] < 0.05). Thus, for those genes, the tissue-specificity variation between species was significantly larger than that within species.
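The multiple-testing step can be illustrated with a standard adjustment. The text reports an FDR threshold but does not name the procedure, so the Benjamini-Hochberg method used below is an assumption, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q < 0.05 for q in adjusted]
```

Genes whose adjusted value falls below 0.05 would be declared as shifting under this assumed procedure.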
3.3. Genes with shifting tissue specificity have a distinct conservation pattern
To assess the impact of the sequence-level conservation on the regulation-level conservation, we examined both the CDS regions and the promoter regions of genes with shifting tissue specificity assessed by our EVE model. We observed that shifting genes were significantly less conserved in CDS regions compared with other tissue-specific genes without shifting as well as housekeeping genes (the one-sided Wilcoxon signed-rank test, p values = 0.0027 and <2.2 × 10^−16, respectively; Fig. 3a).
FIG. 3.
Conservation comparison for genes with different degree of tissue specificity. Genes were grouped as tissue-specificity-shifting genes, tissue-specificity-not-shifting genes, and housekeeping genes. (a) Conservation scores of CDS regions for different gene groups. For each gene, the conservation score was calculated as the average of the PhastCons scores across CDS positions. For genes with multiple transcripts, the average scores of all transcripts were used. (b) Comparison of the conservation scores at the third positions of codons for different gene groups. The median PhastCons score at each CDS third position relative to the 5′ and 3′ ends is shown. (c) Comparison of the conservation scores at promoter regions for different gene groups. The third quartile of PhastCons score at each relative position is shown. The human annotations are used. CDS, coding sequence.
More interestingly, shifting genes tended to have less conserved third positions in codons. Thus, the 3-nt periodicity pattern for shifting genes showed deeper conservation decreases on the third positions (Fig. 3b). The conservation status in promoter regions (or 5′UTRs) also had a distinguishable pattern (Fig. 3c): shifting genes were less conserved in the immediate upstream region (−100 bp to 0) of start codons. However, they exhibited a conservation peak around (−250 bp, −100 bp), which was not observed in other nonshifting tissue-specific genes and housekeeping genes. The lengths of 5′UTRs were similar for the three gene groups (Supplementary Fig. S3).
The median length of 5′UTRs based on the human annotation was 105, 109, and 105 bp for tissue-specific shifting genes, tissue-specific nonshifting genes, and housekeeping genes, respectively. Thus, the (−100 bp to 0) region upstream of start codons was more likely to correspond to 5′UTRs and the (−250 bp, −100 bp) region was more likely to be located in promoter regions. Previous studies have shown that housekeeping genes evolve more slowly as a result of purifying selection acting on both CDS regions and core promoter regions of housekeeping genes (Zhu et al., 2008). In this study, the conservation difference between tissue-specific genes shifting their tissue specificity and those preserving their tissue specificity also suggests the different selection constraints on these genes.
3.4. Subgroups of genes with shifting tissue specificity
We further categorized the 380 genes with significant tissue-specificity divergence into four subgroups (Fig. 4a): (A) complete shifting: tissue t specific in some species, and tissue s (s≠t) specific in some other species (not necessarily in all). The τ-only-based study in Figure 2 focused on such complete shifting. If not (A), then (B) intermediate shifting: tissue t specific in some species; and in some other species the gene is not expressed (TPM <5) in tissue t and has much higher expression in other tissues although not yet meeting the tissue-specificity cutoff. If not (A) and (B), then (C) specificity loss: tissue t specific for some species; in other species, the gene lost tissue specificity by not being expressed at all (the highest TPM among all tissues <5).
FIG. 4.
Subgroups of genes with shifting specificity measured by EVE. (a) Number of genes for each shifting subgroup. A: complete shifting; B: intermediate shifting; C: specificity loss; and D: initial shifting. (b) The comparison of PhastCons scores between complete shifting (subtype A) and incomplete shifting genes (subtypes B–D) at each promoter position. p Values from Student's t-tests are shown. (c) The comparison of PhastCons scores between complete shifting and incomplete shifting genes at third codon positions. p Values from Student's t-tests are shown. EVE, expression variance and evolution.
If not (A), (B), and (C), then (D) initial shifting: tissue t specific for some species; and in other species the gene is expressed in tissue t (TPM ≥5) but has much higher expression in other tissues (at least twice the expression in tissue t) while not yet meeting the tissue-specificity cutoff. As shown in Figure 4a, most of the shifting genes fell into subgroup C (68.9%), in which a gene lost its tissue specificity by not being expressed in any tissue in certain species. Intermediate shifting accounted for another 23.7% of these divergent genes. Only 13 genes (PLET1, RDH8, FNDC9, DHRS2, CCN4, ANXA10, C14orf39, LCTL, S100A5, IRX6, RIPPLY2, CRYGN, and MROH6) completely shifted their tissue specificity from one tissue to another tissue under the more stringent statistical tests, compared with 898 genes in the τ-only-based study.
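The cascade of subgroup rules (A) to (D) can be sketched as a decision function applied to one gene. This is an illustration rather than the authors' implementation: the input layout and names are hypothetical, and because "much higher expression in other tissues" is not quantified for subgroup (B), the sketch assumes it means that some other tissue reaches the expression threshold of TPM 5.

```python
def classify_shift(species_profiles, min_tpm=5.0, fold=2.0):
    """Assign one gene to subgroup 'A', 'B', 'C', 'D', or None.

    species_profiles maps species -> (specific_tissue_or_None, {tissue: TPM}).
    """
    specific = {sp: t for sp, (t, _) in species_profiles.items() if t is not None}
    if not specific:
        return None
    if len(set(specific.values())) > 1:
        return "A"  # specific to different tissues in different species
    tissue_t = next(iter(specific.values()))
    others = [tpm for _, (t, tpm) in species_profiles.items() if t is None]

    def peak_elsewhere(tpm):
        return max((v for k, v in tpm.items() if k != tissue_t), default=0.0)

    # (B) assumption: "much higher elsewhere" means expressed (>= min_tpm) elsewhere.
    if any(tpm[tissue_t] < min_tpm and peak_elsewhere(tpm) >= min_tpm for tpm in others):
        return "B"
    if any(max(tpm.values()) < min_tpm for tpm in others):
        return "C"  # not expressed in any tissue
    if any(
        tpm[tissue_t] >= min_tpm and peak_elsewhere(tpm) >= fold * tpm[tissue_t]
        for tpm in others
    ):
        return "D"
    return None
```

The order of the checks mirrors the "if not (A), then (B)" wording, so each gene receives the first subgroup whose rule it satisfies.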
We examined the potential conservation differences among the complete shifting genes and other incomplete shifting genes. As shown in Figure 4b, complete shifting genes tend to be more conserved (further based on the signs of the t-test statistics) in the 60–90 and 280–290 base pair positions upstream of the CDS regions. However, in the third codon positions, no significantly different evolutionary signal was observed between complete shifting and incomplete shifting genes (Fig. 4c), possibly due to the short amount of evolutionary time. Although some third codon positions exhibited a significant p value (Student's t-tests) <0.05, none of them passed the FDR threshold of 0.1.
3.5. Synonymous codon usage bias in different gene groups
Since tissue-specific genes, especially the shifting ones, exhibited less conserved third positions of codons, we investigated their synonymous codon usage. We examined all 19 groups of synonymous codons with codon degeneracy (18 amino acids and the stop codon). As shown in Figure 5, all 19 groups exhibit significantly biased synonymous codon usage (p values ≤0.001, chi-square tests). However, the severity of codon usage bias differs markedly among gene groups.
FIG. 5.
Synonymous codon usage bias analysis for different gene groups. Single-letter amino acid codes (* represents the stop codon) for the 19 codons with degeneracy are shown along the horizontal axis. The vertical axis represents the significance of the codon usage bias (−log10 p, truncated at 300 for those very significant ones with “>” on the bars) for a specific codon in the three gene groups. The p value was based on the chi-square goodness-of-fit test.
The codon usage for tissue-specificity-shifting genes was more relaxed or uniform with an average p value of 1.1 × 10^−4 for the chi-square goodness-of-fit tests (brown bars in Fig. 5). Housekeeping genes exhibited the highest codon usage bias with very extreme p values (blue bars in Fig. 5), whereas tissue-specific but not shifting genes showed an intermediate level of codon usage bias (purple bars in Fig. 5). Thus, tissue-specificity-shifting genes had less constraint on the selection of synonymous codons and displayed less conserved third codon positions.
3.6. Functional classification for different gene groups
Gene ontologies for genes maintaining or shifting their tissue specificity were functionally clustered using the PANTHER classification system (http://pantherdb.org/; v.14.0). For genes maintaining their tissue specificity, we generated the PANTHER analysis for each tissue separately. We particularly focused on brain-specific and testis-specific genes since these were the two tissues with the largest numbers of tissue-specific genes. Brain-specific genes were more enriched for ontology categories compared with testis-specific ones. For example, we found 103 function hits out of 130 brain-specific genes, compared with only 127 hits out of 349 testis-specific genes (two-proportions z-test, p value <2.2 × 10^−16).
The top three molecular function GO terms in brain-specific genes are “binding,” “transporter activity,” and “molecular transducer activity.” Only 19 pathway hits were found for testis-specific genes, yet unexpectedly, we found two pathways involving neurodegenerative diseases. Testis-specific genes DNAI2, GAPDHS, CAPN11, and DNAH8 are involved in the Huntington disease pathway, whereas PSMA8 is in the Parkinson disease pathway. All five genes are indeed testis enriched when we examined their protein-level tissue specificity in the Human Protein Atlas database (www.proteinatlas.org).
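The comparison of hit proportions can be reproduced approximately with a pooled two-proportion z-test. The sketch below uses the normal approximation without a continuity correction, which is an assumption because the exact test settings are not given; the function name is hypothetical.

```python
import math


def two_proportion_z_test(hits_1, total_1, hits_2, total_2):
    """Pooled two-proportion z-test; returns (z, two-sided p value)."""
    p1, p2 = hits_1 / total_1, hits_2 / total_2
    pooled = (hits_1 + hits_2) / (total_1 + total_2)
    standard_error = math.sqrt(
        pooled * (1.0 - pooled) * (1.0 / total_1 + 1.0 / total_2)
    )
    z = (p1 - p2) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value


# Counts quoted in the text: brain-specific vs. testis-specific function hits.
z, p_value = two_proportion_z_test(103, 130, 127, 349)
```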
In the functional classification for shifting genes combining all tissues, we found 242 hits out of the 380 shifting genes for molecular function. The top three molecular function GO terms are “binding,” “catalytic activity,” and “molecular function regulator.” For the pathway analysis, it was not surprising to see that many shifting genes were involved in many different signaling pathways such as nicotinic acetylcholine receptor signaling pathway, transforming growth factor-beta (TGF-β) signaling pathway, and Wnt signaling pathway (Fig. 6).
FIG. 6.
Gene ontology pathway analysis for genes with shifting tissue specificity.
3.7. Tissue-specificity shifting at the alternative splicing level
Besides the shifting of tissue specificity at the gene level, we examined the shifting at the splicing level as well. We applied the Tau (τ) method, substituting exon splicing ratios for gene expression levels, to examine possible tissue-specificity shifting at the splicing level. Figure 7 shows the pairwise comparison among the five species. We still observed that the majority of tissue specificities were conserved along the diagonal line. However, the ratio of specificity-maintaining cassette exons to specificity-shifting ones was far lower than that for genes.
FIG. 7.
Tissue-specificity shifting of alternative splicing between two tissues of two species. All homologous exons between two species are considered and they may not have one-to-one homologous counterparts in other species. Each circle represents a comparison between two tissues. The shifting direction is from a row tissue to a column tissue. Each comparison of the two tissues between a pair of species is illustrated in one sector of the circle. The shifting discoveries were based on the τ-only measurements. (a) Absolute frequency. (b) Normalized frequency. For a species pair dh, the normalized tissue-specificity-shifting frequency from tissue T to tissue S is computed from three counts: the number of cassette exons whose splicing ratios are tissue-T-specific in species d and tissue-S-specific in species h; the total number of cassette exons whose splicing ratios are tissue-T-specific in species d; and the total number of cassette exons whose splicing ratios are tissue-S-specific in species h.
A total of 366 cassette exons maintained their tissue specificity in at least two species, and 260 cassette exons shifted their tissue specificity at the splicing level. The tissue specificity of the splicing of cassette exons was not as conserved as that of gene expression. Cassette exons were more prone to shift their tissue specificity (ratio: 260/366 vs. 898/3522, two-proportions z-test, p value <2.2 × 10^−16). Moreover, we observed hotspots of splicing-specificity-shifting events such as pituitary → brain, muscle → brain, and testis → brain (Fig. 7a).
In addition, brains were more likely to preserve their tissue specificity of splicing (Figs. 7a, b). Alternative splicing has been reported to evolve more rapidly than gene expression (Barbosa-Morais et al., 2012). In this study, we showed that the tissue specificity of alternative splicing also tended to evolve more rapidly than the tissue specificity of gene expression. Since the EVE model requires using one-to-one homologous cassette exons across all species, yet only 44 such cassette exons exist across all the five species, we only explored such tissue-specificity shifting with the τ measurements.
4. Conclusion
In addition to the divergence of gene expression or alternative splicing, the tissue specificity of gene expression also diverged along the evolutionary path. The divergence of tissue specificity may underlie the diverse phenotypes across different species. The understanding of such diversification is valuable to dissect human disease pathology.
Supplementary Material
Author Disclosure Statement
The authors declare they have no conflicting financial interests.
Funding Information
W.J. and L.C. were partially supported by NIH grants R01GM137428 and R01NS104041.
REFERENCES
- Altschul, S.F., Gish, W., Miller, W., et al. 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
- Barbosa-Morais, N.L., Irimia, M., Pan, Q., et al. 2012. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593.
- Barshir, R., Shwartz, O., Smoly, I.Y., et al. 2014. Comparative analysis of human tissue interactomes reveals factors leading to tissue-specific manifestation of hereditary diseases. PLoS Comput. Biol. 10, e1003632.
- Cui, Y., Cai, M., and Stanley, H.E. 2017. Comparative analysis and classification of cassette exons and constitutive exons. Biomed. Res. Int. 2017, 7323508.
- Dobin, A., Davis, C.A., Schlesinger, F., et al. 2013. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21.
- Fukushima, K., and Pollock, D.D. 2020. Amalgamated cross-species transcriptomes reveal organ-specific propensity in gene expression evolution. Nat. Commun. 11, 4459.
- GTEx Consortium. 2013. The Genotype-Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585.
- Huh, J.W., Kim, Y.H., Park, S.J., et al. 2012. Large-scale transcriptome sequencing and gene analyses in the crab-eating macaque (Macaca fascicularis) for biomedical research. BMC Genomics 13, 163.
- Kitsak, M., Sharma, A., Menche, J., et al. 2016. Tissue specificity of Human Disease Module. Sci. Rep. 6, 35241.
- Koh, W., Pan, W., Gawad, C., et al. 2014. Noninvasive in vivo monitoring of tissue-specific global gene expression in humans. Proc. Natl. Acad. Sci. U. S. A. 111, 7361–7366.
- Kryuchkova-Mostacci, N., and Robinson-Rechavi, M. 2017. A benchmark of gene expression tissue-specificity metrics. Brief Bioinform. 18, 205–214.
- Lage, K., Hansen, N.T., Karlberg, E.O., et al. 2008. A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes. Proc. Natl. Acad. Sci. U. S. A. 105, 20870–20875.
- Liao, B.Y., and Zhang, J. 2006. Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol. Biol. Evol. 23, 530–540.
- Marquez, Y., Brown, J.W., Simpson, C., et al. 2012. Transcriptome survey reveals increased complexity of the alternative splicing landscape in Arabidopsis. Genome Res. 22, 1184–1195.
- Meireles-Filho, A.C., and Stark, A. 2009. Comparative genomics of gene regulation-conservation and divergence of cis-regulatory information. Curr. Opin. Genet. Dev. 19, 565–570.
- Naqvi, S., Godfrey, A.K., Hughes, J.F., et al. 2019. Conservation, acquisition, and functional impact of sex-biased gene expression in mammals. Science 365, eaaw7317.
- Pan, Q., Shai, O., Lee, L.J., et al. 2008. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415.
- Patro, R., Duggal, G., Love, M.I., et al. 2017. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419.
- Rohlfs, R.V., and Nielsen, R. 2015. Phylogenetic ANOVA: The expression variance and evolution model for quantitative trait evolution. Syst. Biol. 64, 695–708.
- Roux, J., Rosikiewicz, M., and Robinson-Rechavi, M. 2015. What to compare and how: Comparative transcriptomics for Evo-Devo. J. Exp. Zool. B Mol. Dev. Evol. 324, 372–382.
- Sakuma, M., Iida, K., and Hagiwara, M. 2015. Deciphering targeting rules of splicing modulator compounds: Case of TG003. BMC Mol. Biol. 16, 16.
- Shafer, M.E.R. 2019. Cross-species analysis of single-cell transcriptomic data. Front Cell Dev. Biol. 7, 175.
- Uhlén, M., Hallström, B.M., Lindskog, C., et al. 2016. Transcriptomics resources of human tissues and organs. Mol. Syst. Biol. 12, 862.
- Winter, E.E., Goodstadt, L., and Ponting, C.P. 2004. Elevated rates of protein secretion, evolution, and disease among tissue-specific genes. Genome Res. 14, 54–61.
- Wittkopp, P.J., and Kalay, G. 2011. Cis-regulatory elements: Molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69.
- Yu, Y., Fuscoe, J.C., Zhao, C., et al. 2014. A rat RNA-Seq transcriptomic BodyMap across 11 organs and 4 developmental stages. Nat. Commun. 5, 3230.
- Zhu, J., He, F., Hu, S., et al. 2008. On the nature of human housekeeping genes. Trends Genet. 24, 481–484.