Author manuscript; available in PMC: 2024 May 1.
Published in final edited form as: Nat Genet. 2023 Oct 30;55(11):1901–1911. doi: 10.1038/s41588-023-01537-1

A pan-tissue survey of mosaic chromosomal alterations in 948 individuals

Teng Gao 1, Maria Eleni Kastriti 2,3, Viktor Ljungström 1, Andreas Heinzel 4, Arthur S Tischler 5, Rainer Oberbauer 4, Po-Ru Loh 6,7, Igor Adameyko 2,3, Peter J Park 1, Peter V Kharchenko 1,8
PMCID: PMC10838621  NIHMSID: NIHMS1958862  PMID: 37904053

Abstract

Genetic mutations accumulate in an organism’s body throughout its lifetime. While somatic single-nucleotide variants have been well-characterized in the human body, the repertoire and consequences of large chromosomal alterations in normal tissues remain largely unknown. Here, we present a pan-tissue survey of mosaic chromosomal alterations (mCAs) in 948 healthy individuals from the Genotype-Tissue Expression project, augmenting RNA-based allelic imbalance estimation with haplotype phasing. We found that approximately a quarter of the subjects carry a clonally-expanded mCA in at least one tissue, with incidence strongly correlated with age. The prevalence and genome-wide patterns of mCAs vary considerably across tissue types, suggesting tissue-specific mutagenic exposure and selection pressures. The mCA landscapes in normal adrenal and pituitary glands resemble those in tumors arising from these tissues, whereas the same is not true for the esophagus and skin. Together, our findings show a widespread, age-dependent emergence of mCAs across normal human tissues.

Keywords: somatic mosaicism, genome integrity, aging, cancer

Introduction

Healthy tissues in the human body accrue mutations with age, including those in known cancer genes1–3. A subset of these mutations may drive proliferation of the originating cell, leading to macroscopic clonal expansions over time. Most somatic clones, however, do not give rise to cancer, and their impact on tissue function remains unclear. Somatic single-nucleotide variants (SNVs) can be found pervasively in most organs2,4,5. In contrast, mosaic chromosomal alterations (mCAs) are rare in healthy tissues and more indicative of malignant progression1,3,6–9. Large-scale studies of mCAs in the blood have elucidated their age-related incidence, genetic and environmental risk factors, as well as the connection to hematological malignancies10–15. However, our understanding of mCAs in healthy solid tissues remains limited. Studies of somatic mosaicism in solid organs typically require costly or time-consuming techniques such as deep sequencing of micro-dissected tissue regions1,3,9,16,17, clonal expansion of individual cells in vitro18, or single-cell DNA sequencing19. As a consequence, existing studies have either focused on a single organ6,7,20–22 or analyzed samples from only a handful of individuals9,23–25. This makes it challenging to identify and compare patterns of mCAs across different organs, especially given their rare occurrences in healthy tissues. One prior study examined the landscape of mCAs in histologically normal tissues adjacent to the tumor (NATs) from a large cancer cohort26. However, NATs are morphologically and phenotypically distinct from healthy tissues27–30.

The Genotype-Tissue Expression (GTEx) project systematically measured gene expression for 16,672 tissue samples from 948 healthy individuals (age 20–70 years), covering 52 tissue types in aggregate31. The limited ability to detect chromosomal alterations based on RNA-seq data, however, has hindered the characterization of mCAs from such datasets. Here, we augment RNA-based allelic imbalance estimation with recent advances in haplotype phasing, enabling high-accuracy detection of rare mCAs from RNA-seq data. Our analysis demonstrates age-related acquisition of mCAs in various normal tissue types and previously unseen tissue-specific patterns of chromosomal alterations.

Results

Accurate detection of low-clonality mCAs from RNA-seq data

Mosaic chromosomal alterations are challenging to detect from RNA-seq. As with other types of sequencing data, chromosomal alterations can be detected based on deviations in genomic coverage and allele frequencies32–35. However, RNA-seq only captures a small fraction of the genome sequence and thus provides limited detection power. Furthermore, allele-specific expression can also arise due to transcription stochasticity, genomic imprinting, and tissue-specific regulatory effects in the absence of genomic alterations, hindering the specificity of mCA detection36,37. Finally, unlike clonal alterations in cancer, mCAs in normal tissues are typically present in a small proportion of the cells, yielding weak genomic signals in bulk data. Haplotype information allows allelic imbalance signals to be aggregated across consecutive loci, thereby improving the detection of subclonal alterations12,33,34,38,39. The power of this approach can be augmented by longer-range haplotype phasing. To improve phasing for the GTEx cohort, we combined recent advances in haplotype estimation (Fig. 1a; Methods)40,41,42. Importantly, we performed population-based phasing on WGS-derived genotypes using the largest reference panel available to date (TOPMed; n=97,256). This approach allowed us to improve phasing quality by approximately three-fold compared to earlier analyses34,43, yielding large haplotype blocks (perfectly phased genomic intervals with no switch errors) that have a median size of 1.1 Mb (original: 0.34 Mb) and contain a median of 7 genes (original: 2.5 genes; Fig. 1b and Supplementary Fig. 1a; Supplementary Methods).

Fig. 1: Detection of mosaic chromosomal alterations from the GTEx dataset.

a, Schematic of phasing strategy. Top, unphased SNPs within expressed regions of genes. Bottom, SNPs are phased into haplotypes. Left of arrow, data modalities (WGS, WES, RNA-seq) used to perform genotyping and phasing. Right of arrow, algorithms (Eagle2, phASER) and reference panels (TOPMed) used to perform phasing. b, Phasing performance of a GTEx subject’s chromosome 9q using different phasing strategies. RNA, genotyping from RNA. GTEx, original GTEx phasing. TOPMed+phASER, the strategy employed in the current study. Each dot represents a distinct phased block. The numbers of phased blocks are marked above the figure in brackets. c, Schematic of the phase permutation test to assess the statistical significance of haplotype imbalance. Top, observed allele counts in an aberrant region. Bottom, the same set of allele counts with randomized phasing between genes. Right bar charts show aggregated haplotype counts in the region. Star represents a statistically significant difference between the haplotype counts (schematic only). d, Per-autosome distribution of detected mCAs. Each line represents one detected event, colored by tissue type. Multi-tissue chr10 alterations from one individual are not shown.

Larger phased haplotype blocks can help distinguish consistent allelic bias in consecutive genes that is characteristic of mCAs (Fig. 1c, top) from random patterns of allele-specific expression (Fig. 1c, bottom). Combined with a custom haplotype-aware Hidden Markov Model for RNA (HaHMMR; Methods), the improved phasing indeed resulted in enhanced statistical power to detect low-clonality events (Supplementary Fig. 1). To ensure stringent control of false discovery rate, we devised a permutation test based on the coordination of allelic imbalance between genes to evaluate the statistical significance of identified events (Fig. 1c; Methods).
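The idea behind the phase permutation test can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation (whose details are in the Methods and not reproduced here): the per-gene count layout, the function names, and the imbalance statistic are all assumptions made for the example. Each gene contributes a pair of phased allele counts; the null distribution is built by randomly swapping the haplotype labels of each gene, which destroys any coordination of imbalance between genes.

```python
import random

def haplotype_imbalance(genes):
    """Absolute deviation of the aggregated haplotype-A fraction from 0.5.

    `genes` is a list of (count_hap_a, count_hap_b) tuples, one per gene
    (a hypothetical input layout chosen for this sketch).
    """
    a = sum(g[0] for g in genes)
    b = sum(g[1] for g in genes)
    return abs(a / (a + b) - 0.5)

def phase_permutation_pvalue(genes, n_perm=2000, seed=0):
    """Permutation P value for coordinated allelic imbalance across genes."""
    rng = random.Random(seed)
    observed = haplotype_imbalance(genes)
    hits = 0
    for _ in range(n_perm):
        # Swap haplotype labels independently per gene: within-gene imbalance
        # is preserved, but its direction is no longer shared between genes.
        shuffled = [(a, b) if rng.random() < 0.5 else (b, a) for a, b in genes]
        if haplotype_imbalance(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Thirty genes all biased toward the same haplotype (mCA-like pattern).
coordinated = [(60, 40)] * 30
# Thirty genes with the same per-gene bias but alternating direction
# (pattern expected from uncoordinated allele-specific expression).
alternating = [(60, 40), (40, 60)] * 15

print(phase_permutation_pvalue(coordinated))
print(phase_permutation_pvalue(alternating))
```

In this toy setting the coordinated region yields a small P value, while the alternating region yields a P value of 1 because its aggregated imbalance is zero.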

To evaluate the sensitivity of our mCA detection approach, we used The Cancer Genome Atlas (TCGA)44 tumor and adjacent normal samples sequenced by both RNA-seq and WGS to simulate mosaic samples with various subclonal fractions (Supplementary Methods; Supplementary Table 1). On a per-chromosome level, HaHMMR was able to detect WGS-validated events at 10% clonal fraction with appreciable sensitivity (0.17), and recovered more than half of the true events at 20% clonal fraction (sensitivity=0.6; Extended Data Fig. 1a). As expected, the detection sensitivity strongly depended on the number of expressed genes in the altered regions (Supplementary Fig. 2). To validate the specificity of our approach, we applied HaHMMR to 744 GTEx RNA-seq samples with paired WGS. We observed a false positive rate of 6.1×10⁻⁵ per chromosome for a range of Q value thresholds (Extended Data Fig. 1b). The estimated false discovery rate for mCAs above 15% clonality is less than 0.13 (assuming a prevalence of 2% among samples; Extended Data Fig. 1c; Supplementary Methods). To compare against existing methods, we turned to the 600 esophagus samples in GTEx examined by an earlier study2. The esophagus mucosa is known to undergo age-related clonal expansion of copy-neutral loss-of-heterozygosity (CNLoH) on chr9q1,22. Compared to the previous analysis, HaHMMR identified 4.7 times as many chr9q CNLoH events, with a more pronounced enrichment in samples with older age (Supplementary Fig. 3). Taken together, these evaluations demonstrate the efficacy of HaHMMR for the precise and sensitive discovery of rare mCAs from RNA-seq data.
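The relationship between sensitivity, false positive rate, prevalence, and false discovery rate can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the authors' procedure (which is described in the Supplementary Methods): it assumes 22 independently tested autosomes per sample and treats prevalence as the fraction of samples carrying a detectable event. The function name and the sensitivity value passed in the example are illustrative.

```python
def expected_fdr(sensitivity, fpr_per_chrom, prevalence, n_chrom=22):
    """Expected fraction of positive samples that are false positives.

    Assumes (for illustration) that each of `n_chrom` autosomes is tested
    independently, so a truly negative sample is called positive if any
    chromosome yields a false call.
    """
    fp_per_sample = 1 - (1 - fpr_per_chrom) ** n_chrom
    true_calls = prevalence * sensitivity
    false_calls = (1 - prevalence) * fp_per_sample
    return false_calls / (true_calls + false_calls)

# Reported per-chromosome false positive rate and assumed prevalence;
# the sensitivity of 0.6 is the value reported at 20% clonal fraction.
print(expected_fdr(sensitivity=0.6, fpr_per_chrom=6.1e-5, prevalence=0.02))
```

Because true events are rare, even a very low per-chromosome false positive rate contributes a non-trivial share of calls, which is why the false discovery rate is far larger than the false positive rate.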

Age-related acquisition of mCAs across normal tissues

We applied HaHMMR to systematically profile mCAs in the GTEx v8 dataset. Retaining only high-confidence events based on statistical evidence (Methods), the final mCA calls comprised 377 events from 290 samples (1.8% of all samples; Fig. 1d and Fig. 2). We further classified the detected mCAs into amplifications (n=54), deletions (n=50), and CNLoHs (n=273; Extended Data Figs. 2,3; Methods). The events ranged from 6.7% to 100% (median: 22%) in cell fraction (Fig. 3a) and from 1.3 Mb to 250 Mb (median: 57 Mb) in size, containing a median of 451 expressed genes (IQR: 269–515; Supplementary Fig. 4). The chromosomal breakpoints of the detected mCAs were enriched near telomeres and the pericentromeric region of chr9q, reflecting the preponderance of chromosome- and arm-level events (Supplementary Fig. 5).

Fig. 2: Examples of mCAs detected in different tissues.

Events shown are from distinct individuals. In each figure panel, the top track shows expression log2 fold-change (logFC); each dot represents a gene. The bottom track shows phased paternal haplotype frequency (pHF); each dot represents a SNP. Deviations in logFC from 0 reflect total copy number change; a positive deviation reflects a copy number gain, whereas a negative deviation reflects a copy number loss. Deviations in pHF from 0.5 reflect chromosomal imbalance; switches in the direction of deviation (the “block” patterns) represent phase switch errors from population-based haplotype estimation (an example is pointed out by the black arrow in the bottom right panel). Red lines in the top track represent log2ϕ, where ϕ is the estimated total copy number change relative to two copies for the segment. Red lines in the bottom track represent 0.5±θ, where θ is the estimated haplotype imbalance for the segment. Gray vertical dashed lines denote centromere positions. Red vertical dashed lines mark gene positions. NEU, neutral. AMP, amplification. DEL, deletion. CNLoH, copy-neutral loss of heterozygosity.
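The way ϕ and θ depend on event type and mutant cell fraction can be worked out with simple mixture arithmetic. The sketch below is illustrative and makes assumptions not stated in the text: gains and losses involve a single copy, the remaining cells are diploid, and expression scales linearly with copy number. The function name `expected_signal` is hypothetical.

```python
import math

def expected_signal(event, cell_fraction):
    """Return (log2 phi, theta) for a mosaic event at a given cell fraction.

    Average per-cell copies of the major and minor haplotype are computed
    for a mixture of mutant cells (fraction f) and diploid cells (1 - f).
    """
    f = cell_fraction
    if event == "AMP":      # one extra copy of the major haplotype
        major, minor = 1 + f, 1.0
    elif event == "DEL":    # one copy of the minor haplotype lost
        major, minor = 1.0, 1 - f
    elif event == "CNLoH":  # minor haplotype replaced by a major copy
        major, minor = 1 + f, 1 - f
    else:
        raise ValueError(f"unknown event type: {event}")
    total = major + minor
    phi = total / 2                  # total copy number relative to two copies
    theta = major / total - 0.5      # haplotype imbalance around 0.5
    return math.log2(phi), theta

for event in ("AMP", "DEL", "CNLoH"):
    print(event, expected_signal(event, 0.22))
```

Under these assumptions a CNLoH leaves logFC at zero while producing the largest haplotype imbalance for a given cell fraction (θ = f/2), which is consistent with allelic imbalance being the more informative signal for copy-neutral events.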

Fig. 3: Age-related acquisition and tissue-specific prevalence of mCAs.

a, Inferred clonal fractions of detected mCAs. Black curves and shaded areas represent the estimated probability density. Each dot represents a distinct event. For each event type, the number of detected events is indicated in brackets. b, Prevalence of mCAs by age groups. The y-axis represents the fraction of subjects with mCA detected in any tissue (red), in the esophagus (green), and in any tissue other than the esophagus (blue). Centers of the error bands (marked in solid dots) represent the observed fractions of subjects with detectable mCAs (the number of cases in each group is indicated on top in brackets). Shaded areas of the error bands represent 95% confidence intervals from the binomial distribution. Multi-tissue chr10 alterations from one individual are excluded. c, Prevalence of mCAs in different tissues. d, Prevalence of mCAs by tissue subtype. P values from two-sided Fisher’s exact tests are shown on the right. In c and d, the centers and whiskers of the error bars respectively represent the observed fractions of samples with detectable mCAs and 95% confidence intervals from the binomial distribution. The numbers of biologically independent samples examined are indicated in brackets (cases/total). e, Prevalence of detectable mCAs versus the prevalence of clonally expanded SNVs (VAF ≥ 5%) in different tissues. The estimated Pearson correlation coefficient and a two-sided P-value are marked on top. Red dashed line shows a fitted linear model (no intercept) using robust regression with Huber weighting.

Overall, 243 out of 947 individuals (25.7%) had a detectable mCA in at least one tissue. The prevalence of these events showed a striking age-related trend (Fig. 3b), increasing from 1.2% (CI: 0.03%–6.7%) in subjects under 30 years of age, to 29.3% (CI: 25.8%–33.0%) in subjects above 50 years of age. Many individual tissues also showed signs of age-related accumulation of mCAs (Extended Data Fig. 4 and Supplementary Fig. 6). Samples with detectable mCAs generally harbored higher SNV burden in various tissues (Extended Data Fig. 5). Interestingly, the overall mCA incidence plateaued past age 50 and slightly decreased in the oldest age group (60–70 years), which may be in part due to the exclusion of individuals with cancer and tissue samples with neoplastic changes from the GTEx study (Fig. 3b; Supplementary Fig. 7).
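Binomial confidence intervals such as those quoted above can be computed with the exact (Clopper–Pearson) method. The text only states that intervals come "from the binomial distribution", so the choice of the exact method here is an assumption, and the helper names are illustrative. The sketch uses only the standard library and solves for the bounds by bisection.

```python
from math import exp, lgamma, log

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        total += exp(log_coef + i * log(p) + (n - i) * log(1 - p))
    return min(total, 1.0)

def _solve_increasing(func, target, iters=60):
    """Bisection on [0, 1] for a function that increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n."""
    # Lower bound: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: P(X <= k | p) = alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

# Overall prevalence reported in the text: 243 of 947 individuals.
print(clopper_pearson(243, 947))
```

The exact interval is asymmetric for small counts, which matters for the youngest age group where very few subjects carry a detectable event.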

Among the 243 individuals with detectable mCAs, most (213/243; 88%) carried events in only one tissue. One individual harbored a mosaic chr10 CNLoH detectable throughout the body (cell fraction 4.4%–27%), reflecting a single event that arose early in development45–47 (Supplementary Fig. 8). We excluded this event from the subsequent analyses. In the additional 29 individuals, mCAs were observed in multiple tissues. These subjects tended to be older compared to individuals with mCA detected in a single tissue (P=0.017; one-sided t-test), consistent with the age-related acquisition of genome aberrations. In most subjects with mCAs detected in multiple tissues (23/29), the events affected distinct genomic regions, indicating their independent origin. The other cases included mirrored and identical events (Extended Data Fig. 6). Taken together, the age-related incidence of detectable mCAs in specific tissues indicates that most of these events likely reflect postnatal clonal expansions arising with advancing age.

Tissue-specificity of mCAs and parallels with malignancies

The prevalence of mCAs varied considerably across tissue types (Fig. 3c). Tissues that most frequently harbored mCAs were the esophagus (10.0%), adrenal gland (6.7%), and pituitary gland (2.8%). Although the remaining tissues showed low frequencies of mCAs, we identified at least one event in 20 other tissue types (Fig. 3c; examples shown in Fig. 2). Tissue subtypes of the same organ can also exhibit striking differences in mCA prevalence. For example, mCAs in the esophagus were found almost exclusively in the mucosa (147/149), whereas the muscularis and gastroesophageal junction rarely harbored any detectable mCAs (Fig. 3d). Similarly, sun-exposed skin harbored significantly more mCAs than non-exposed skin, reflecting the effect of ultraviolet exposure48 (Fig. 3d). The variations in mCA prevalence across tissue types could not be explained by sample technical covariates such as sequencing coverage (Supplementary Fig. 9). However, as our approach was only able to detect mCAs occurring at sufficiently high clonal fraction (≥7–10%), we expect the observed mCA prevalence to be influenced by the developmental timing of the event, proliferation patterns within the tissue, as well as the tissue sampling scheme. For instance, the clonal scope of esophagus tissues, which are sampled as local regions, would be substantially smaller than that of blood samples where clones are well-mixed. These factors would similarly impact the detection of somatic SNVs. Indeed, the prevalence of detectable mCAs was correlated with that of clonally expanded SNVs (VAF ≥ 5%) across tissues (Pearson’s R=0.56, P=3.9×10⁻⁵; Fig. 3e). The high frequencies of mCAs in the esophagus mucosa and adrenal gland appeared to stand out above this underlying trend (Fig. 3e).

In addition to the prevalence of detected events, the positional distributions of mCAs along the genome also showed significant tissue-specificity (Fig. 1d and Extended Data Fig. 3; P < 1×10⁻⁵, two-sided Fisher’s exact test). In particular, the distributions of mCAs that we detected from solid tissues were markedly different from those reported in blood12 (P < 1×10⁻⁵, two-sided Fisher’s exact test), reflecting tissue-specific mechanisms of mutagenesis and clonal selection.

Given the open questions about the extent to which somatic variations in normal tissues reflect genomic aberrations in different cancers3,12,49,50, we asked whether the detected mCAs resemble those seen in cancers originating in the corresponding tissue. Of note, tissue samples exhibiting visible neoplastic expansion were excluded from sample collection based on pathology evaluation51. We first defined genomic regions recurrently altered by mCAs (Fig. 4a; Methods). Recurrently gained regions across tissues included chr3q, chr12, and chr19 affecting several known oncogenes such as PIK3CA/B, KRAS, ERBB3, MDM2, and CCNE1 that are frequently amplified in cancer (Fig. 4a, left). Recurrent deletions on chr3 and chr9q affected several tumor suppressors involved in DNA repair (MLH1, BAP1, and ATR), cell cycle (PPP6C), and growth (TSC1; Fig. 4a, middle). Although chr3q was both recurrently gained and deleted, the two types of events occurred in separate sets of tissues, highlighting distinct selection pressures due to the tissue environment (Extended Data Fig. 3). We also identified recurrent CNLoHs on chr17p and chr9q (Fig. 4a, right), which likely serve as “second hits” to somatic mutations in TP53 and NOTCH1 that further promote clonal expansion13. To assess the overall concordance of mCA landscapes between cancer and normal tissues, we compared the chromosome-wise mCA frequency of each tissue to that of the matching cancer types (Methods; Supplementary Table 2). We found that the mCA patterns in normal tissues in aggregate showed a higher resemblance to the corresponding cancers than expected by chance (P < 1×10⁻⁴, permutation test; Supplementary Fig. 10).

Fig. 4: Genomic distributions of mosaic chromosomal alterations across tissues.

a, Recurrence analysis of mCAs by event types (left panel, amplifications; middle panel, deletions; right panel, CNLoHs). The height of the histograms represents the number of events covering a given genomic position. Color intensities indicate significance levels of one-sided permutation P values. NS, not significant. Known oncogenes and tumor suppressors in recurrently altered regions are highlighted. The event coverage shown for chr9q CNLoH is capped at 12 (up to 123 additional event coverages are indicated by a triangle). b, Chromosomal alteration frequencies by genomic positions in the normal adrenal gland, pheochromocytoma (PCC), and adrenal cortical carcinoma (ACC). The numbers of biologically independent samples are indicated in brackets. Event frequencies are plotted by 5 Mb bins.

Intriguingly, the overall mCA landscape in the normal adrenal gland strongly resembled that of pheochromocytoma (PCC), which originates from chromaffin cells in the adrenal medulla (Fig. 4b). Particularly striking was the recurrent loss or CNLoH events on chromosome 3 (n=7), which is also the second most frequent event in PCC52 (~50% of PCC patients; Fig. 4b). In PCCs, chr3 deletions frequently target the VHL gene located on chromosome 3p25–26, a tumor suppressor regulating cell growth and division52. On the other hand, other alterations recurrent in PCC (e.g., chr1p and chr17p deletion) were not observed in the normal adrenal gland, indicating that these events are more specific to malignancy. We also detected alterations on chromosomes 7, 8, 9, and 22, which are common in both PCC and adrenocortical carcinoma (ACC; Fig. 4b).

Normal pituitary tissues also exhibited mCA patterns reminiscent of pituitary cancers. Pituitary tissues had the highest mCA burden per mutated sample and tended to harbor chromosomal gains more frequently than other tissues (Extended Data Fig. 7). This mutational pattern is typical of hormone-secreting pituitary tumors, which are characterized by frequent arm-level copy number variations (CNVs) disrupting large fractions of the genome, with very low SNV and focal CNV burden53. Furthermore, the recurrently gained chromosomes in the normal pituitary are concordant with those in pituitary tumors53 (Extended Data Fig. 8a).

In certain tissue types, however, we observed a dissimilarity between the normal and cancer mCA landscapes. In normal esophagus, chr9q CNLoH (n=137) was the most frequent event, followed by chr17p CNLoH (n=5) and chr3 gain (n=5). The mCA landscapes of esophageal cancers, however, are dominated by a distinct set of events (Extended Data Fig. 8b). Although chr9q LoH is also observed in esophageal cancers, it is not the most recurrent event. These observations are consistent with prior studies in smaller cohorts1,2,22. Likewise, mCAs characteristic of skin cancers were rarely observed in the normal skin (Extended Data Fig. 8c). Although chr9q LoH was recurrent in both skin cancers and normal skin (n=13), such events likely reflect benign clonal expansion through somatic NOTCH1 mutations1,3,22,50.

Risk factors of mCAs

Somatic mutagenesis can be associated with demographic and environmental factors1,10–13,22. We therefore assessed the effect of age, sex, ancestry, as well as lifestyle habits (drinking and smoking) on mCA incidence (Methods). As expected, age was significantly associated with mCAs in the esophagus mucosa (OR=1.7, P=1×10⁻⁸) as well as in other tissues (OR=1.3, P=5×10⁻⁴; Fig. 5a and Supplementary Fig. 6). Drinking and smoking were associated with mCAs specifically in the esophagus mucosa (OR=2.1, P=2.3×10⁻³, smoking; OR=2.4, P=4.2×10⁻³, drinking; Fig. 5a), but not in other tissues. In the esophagus mucosa, we also observed a dosage effect by increasing drinking frequency among alcohol users (P=3.5×10⁻⁴, multivariate logistic regression; Extended Data Fig. 9b; Methods), with an especially high risk of detectable mCAs for daily drinkers (OR=6.0, P=3.9×10⁻⁷). We did not observe a significant association with sex or ancestry, possibly due to the limited sample size.

Fig. 5: Risk factors of mCAs and associations with morphological features.

a, Associations of mCAs with demographic factors and lifestyle history (esophagus mucosa: n=533; other: n=15,357 biologically independent samples). Unadjusted two-sided P values from multivariate logistic regression are shown on the right and stars denote their significance levels (*P < 0.05, **P < 0.01, ***P < 0.001). Age was analyzed in decades. b, Associations of mCAs with tissue morphology features (n=15,847 biologically independent samples). Two-sided Q values (FDR-corrected P values) from multivariate logistic regression are shown on the right and stars denote their significance levels (*Q < 0.05, **Q < 0.01, ***Q < 0.001). In a and b, the centers (marked by diamonds) and whiskers of the error bars respectively represent the estimated odds ratio (OR) and 95% confidence interval from the regression model.

Functional impacts of mCAs

We next asked whether mCAs can impact tissue maintenance and function in the absence of neoplastic changes. We first assessed the association of mCA with morphological features extracted from pathology evaluation of the source tissue (Methods). We found that the presence of mCA was significantly associated with hyperplasia (OR=3.1, Q=1.8×10⁻³) and inflammation (OR=3.8, Q=1.8×10⁻³; Fig. 5b, Supplementary Fig. 11). The latter association is especially interesting as it suggests that either clonal expansions carrying mCAs can elicit local inflammation54,55 or, alternatively, prolonged inflammation due to other causes can promote mutagenesis56 or the expansion of mutated clones over time57–59. Since the prevalences of hyperplasia and inflammation vary substantially across tissue types, these associations may in part reflect distinct proliferative potentials and clonal dynamics of different tissues.

Functional impacts of mCAs were also apparent at the level of transcriptional state. Chromosomal gains and deletions altered the dosage of 487 genes on average and resulted in systematic up- and down-regulation of gene expression within the affected regions (Extended Data Fig. 2a and Supplementary Fig. 4). However, the majority (71%) of the detected mCAs were copy-neutral. In contrast to gains and deletions, CNLoHs do not modify the copy number of genes, but have been shown to create a selective advantage by interacting with somatic or inherited variants1,15,60. Given the well-characterized effects of inherited genetic variations on gene expression, we hypothesized that mosaic CNLoHs can alter the transcriptional profile of a tissue by deleting or duplicating regulatory variants at quantitative trait loci (QTLs). We first focused on cis expression QTLs (cis-eQTLs), where inherited variants (eVariants) influence the expression of nearby genes (eGenes). Across tissues, mosaic CNLoHs affected an average of 138 tissue-specific eVariants (Fig. 6a), of which 13% were predicted to result in a greater than 2-fold change in expression. We reasoned that the impact of mosaicism on tissue-level gene expression should be reflected by a shift in the observed phenotype (e.g., eGene expression level) toward that of homozygotes, depending on which allele is in excess (Fig. 6b). To demonstrate this potential mechanism, we focused on 7 eVariants in the esophagus mucosa that have large effects and were recurrently affected by chr9q CNLoH events (Methods). As expected, all 7 loci showed a trend in the expected direction (P=0.008, one-sided binomial test), out of which 3 showed a significant correlation between eGene expression and eVariant fraction (Fig. 6c, Supplementary Fig. 12). Based on the catalog of cis splicing QTLs (cis-sQTLs) and the associated variants (sVariants) within GTEx, mosaic CNLoHs affected an average of 41 tissue-specific sVariants (Fig. 6d). We similarly tested whether mosaic CNLoHs can also cause differential splicing via sQTLs (Fig. 6e). Out of 23 sQTLs recurrently affected by chr9q CNLoHs in the esophagus mucosa, 18 showed a trend consistent with the established differential splicing pattern (P=0.005, one-sided binomial test; Fig. 6f, Supplementary Fig. 13). Interestingly, one of the significant loci impacted CDC26, a cell cycle gene involved in the anaphase-promoting complex (Fig. 6f). Taken together, these examples demonstrate allelic dosage adjustment at QTLs as a potential pathway by which mosaic CNLoHs can impact gene expression and thereby cellular phenotypes.
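The two quantities used in this analysis, the tissue-level variant fraction under a mosaic CNLoH and the one-sided binomial P value for directional consistency, can be reproduced with a short sketch. It assumes a null probability of 0.5 that a locus trends in the expected direction by chance, and a heterozygous site in a mixture of CNLoH and normal cells; both are assumptions of this illustration, and the function names are hypothetical.

```python
from math import comb

def variant_fraction(cell_fraction, variant_gained):
    """Tissue-level allele fraction of a heterozygous variant under CNLoH.

    In CNLoH cells (fraction f) the site becomes homozygous, so the allele
    fraction shifts from 0.5 toward 1 (variant duplicated) or 0 (variant lost).
    """
    if variant_gained:
        return (1 + cell_fraction) / 2
    return (1 - cell_fraction) / 2

def one_sided_sign_test(n_consistent, n_total):
    """P(X >= n_consistent) for X ~ Binomial(n_total, 0.5)."""
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total

print(variant_fraction(0.22, variant_gained=True))
print(one_sided_sign_test(7, 7))    # 0.0078125, i.e. the reported P=0.008
print(one_sided_sign_test(18, 23))  # about 0.0053, i.e. the reported P=0.005
```

Both reported P values are consistent with this simple sign test, although the exact test specification is given in the Methods rather than in this section.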

Fig. 6: Mosaic CNLoHs impact gene expression and splicing patterns via QTLs.

a, Histogram of the number of cis-eQTL variants affected per mCA event and tissue. b, Schematic of a CNLoH event adjusting inherited variant dosage at a cis-eQTL thereby influencing gene expression. c, Example eQTLs showing significant correlations between mosaic eVariant genotype fraction and eGene expression. d, Histogram of the number of cis-sQTL variants affected per mCA event and tissue. e, Schematic of a CNLoH event adjusting inherited variant dosage at a cis-sQTL thereby influencing alternative splicing patterns. f, Example sQTLs showing significant correlations between mosaic sVariant genotype fraction and splicing phenotypes. In c and f, for each locus the top panel shows CNLoH events (green lines) affecting the e/sVariant (solid dots). The pair of e/sGene and e/sVariant is indicated in the title. Dot colors indicate variant dosage inferred by the mosaic cell fraction. The bottom panel shows the relationship between e/sGene expression and e/sVariant dosage. Boxplots show the distribution of wild-type data points. Each dot represents a distinct sample. Dot colors indicate sample type. WT, wild-type. The centers (marked by dashed lines) and shaded areas of the error bands respectively represent the predicted means and 95% confidence intervals of a linear regression model fitted on mosaic data points. The estimated Pearson correlation coefficient and a one-sided Q-value (FDR-corrected P value) are marked on top. The numbers of biologically independent samples are indicated in brackets (for the WT category, sample sizes are provided for the three respective genotypes).

Single-cell dissection of the phenotypic impacts of mCAs

To follow up on the unexpected patterns of mosaicism in the normal adrenal gland, we screened 13 adrenal scRNA-seq samples (Supplementary Table 3) for mCAs using Numbat34. In one sample (DO6; age 64 years), we identified a high-confidence clone with multiple aberrations (Fig. 7a). The mutant clone appeared to originate from the adrenocortical cell populations (Fig. 7b,c). Based on marker gene expression of cortical zones, we found that the mutant clone made up 75% of the zona glomerulosa and 61% of the zona fasciculata cells, whereas the vast majority of zona reticularis cells were genotypically normal (Fig. 7b–d). This distribution is consistent with the current centripetal model of adrenal cortex replenishment, from the outer zona glomerulosa towards the inner zona reticularis layers61 (Fig. 7b).

Fig. 7: Single-cell dissection of mosaic chromosomal alterations in the adrenal gland.

a, Pseudobulk copy number profile of the mutant clone in DO6 inferred from scRNA-seq. The top track shows expression log2 fold-change (logFC); each dot represents a gene. The bottom track shows phased paternal haplotype frequency (pHF); each dot represents a SNP. b, Diagram of human adrenal gland tissue organization and marker genes. ZG, zona glomerulosa. ZF, zona fasciculata. ZR, zona reticularis. c, Cell type annotations (left panel), adrenal cortex zone annotations (middle panel), and single-cell genotype assignments (right panel) of DO6 adrenal cells visualized on a t-distributed stochastic neighbor-embedding (t-SNE) of gene expression profiles. In all panels, only cell types with > 10 cells are shown. In the right panel, only cells with genotype classification confidence > 0.95 are shown. d, Proportions of mutant cells in different adrenal cortex zones. Only cells with genotype classification confidence higher than 0.95 are included. The centers and whiskers of the error bars respectively represent the observed fractions of aneuploid cells and 95% confidence intervals from the binomial distribution. The numbers of cells (with genotype classification confidence > 0.95) in each category are marked above in brackets (aneuploid/total; measurements are biologically independent). e, RNAscope® in situ hybridization for cortical markers HSD3B2 (green) and CYP11B2 (red) with DAPI counterstain (blue) on the adrenal gland section of DO6. The left panel shows an overview of the tissue section. White arrows indicate areas with ectopic expression of marker genes. v, vessel. The right panel shows a higher-resolution view of a region (white box in the left panel) with abnormal tissue organization. The experiment was performed once. f, Differentially expressed genes between the mutant and wild-type cells in the ZF compartment. Each dot represents a gene. The upward triangle denotes a data point beyond the axis limit. NEU, neutral. AMP, amplification. DEL, deletion.

We further performed histological examination of adrenal tissues from the above donor (DO6) as well as an age- and sex-matched control (DO3). In situ hybridization for cortical marker genes of the DO6 adrenal cortex revealed diffuse cortical hyperplasia (Fig. 7e, left; Extended Data Fig. 10), as well as disorganized and ectopic expression of HSD3B2 and CYP11B2 (Fig. 7e, right; Extended Data Fig. 10). Furthermore, we observed abnormal extracortical landmarks outside the capsular layer (Extended Data Fig. 10). In agreement with our scRNA-seq analysis, the organization of the medulla was unaffected (Extended Data Fig. 10d).

To investigate the transcriptional changes in the mutant clone, we performed differential gene expression (DE) analysis within the zona fasciculata compartment, comparing cells with and without mCAs. We found 34 significantly differentially expressed genes in cis (i.e., residing within the affected regions) and 40 in trans (i.e., residing outside the affected regions) of the detected mCAs (Fig. 7f). The upregulated genes include MC2R (FC=1.35; Q=8.5×10⁻⁵), which encodes the receptor of adrenocorticotropic hormone (ACTH), as well as CYP21A2 (FC=1.13; Q=3.6×10⁻¹³) and HSD3B2 (FC=1.17; Q=1.1×10⁻³), which encode key enzymes in cortisol production. Of note, ACTH is the main regulator of cortisol production in the zona fasciculata, and the overexpression of these genes indicates an overreactive adrenal phenotype. Also upregulated is IGFBP2 (FC=1.42; Q=5.6×10⁻⁵), which encodes a binding protein involved in insulin-like growth factor signaling, a pathway associated with overgrowth and tumorigenesis in the adrenal cortex62,63. Overall, these features illustrate the potential impacts of mCAs on organ morphology and function.

Discussion

Chromosomal aberration is a phenomenon we normally ascribe to cancer cells34,64,65. Here, by examining an extensive panel of RNA-seq data, we were able to profile the patterns and prevalences of mCAs in many normal tissues in the human body. It is important to note that our approach is only able to detect relatively large mutant clones (≥7–10% cell fraction) within the tissue. Any mCA we identify was presumably acquired in a single progenitor cell which subsequently underwent considerable clonal expansion. Both the prevalence and genomic distribution of mCAs were highly tissue-specific, suggesting varying tissue turnover dynamics, mutagenic exposure, and selection pressures.

While age appeared to be the universal driver behind mCA acquisition across tissues, we were also able to characterize the effects of extrinsic risk factors such as smoking, drinking, and ultraviolet exposure, extending the observations from previous studies14,22,48. Due to the limited cohort size, we were unable to systematically discover inherited variants associated with mCA acquisition. However, we note one instance of chr8q CNLoH in a lung sample that potentially reflects a known mechanism of mCA acquisition via a rare NBN frameshift variant, which was previously described in clonal hematopoiesis15 (Supplementary Fig. 14). Larger cohorts are needed to identify inherited genetic mechanisms underpinning mCA acquisition in different tissues.

Our analysis revealed previously unrecognized clonal mosaicism in the adult human adrenal and pituitary glands. The striking similarity between the normal and tumor mCA landscapes suggests that morphologically normal clones carrying large mCAs may represent a premalignant stage of tumorigenesis in these two organs. This is in contrast to the normal esophagus and skin that predominantly lack alterations characteristic of the corresponding cancers, suggesting that these aberrations generally occur late in tumor evolution, and likely do not provide selective advantage until transformation1,3,22,48. It is important to note, however, that our estimated mCA prevalence in the normal adrenal gland (6.7%) far exceeds the prevalence of malignant cancers arising from the adrenal gland (0.0001%–0.04%)66,67. Similarly, the observed mCA incidence in the normal pituitary (2.8%) is far higher than the incidence of pituitary adenoma and carcinoma (0.0005%–0.5%)68. It is therefore likely that most detectable mCAs do not give rise to malignancies. Nevertheless, the expansion of aneuploid clonal populations in the adrenal and pituitary glands may exert systemic effects by distorting hormonal profiles or their dynamics53,69. Indeed, our scRNA-seq analysis of a normal adult adrenal gland harboring mCAs revealed gene expression changes related to cellular proliferation and hormone secretion.

Understanding the oncogenic potential of mutant clones is especially important for cancer surveillance in the healthy population. However, such fitness effects are difficult to evaluate as our ability to detect mutations is biased by the tissue sampling scheme, the developmental timing of the event, and the proliferation patterns of the tissue8,70–72. These factors similarly influence the detection of SNVs, leading to a correlation in the observed prevalences of SNVs and mCAs across tissues (Fig. 3e). Prevalences of mCAs in the esophagus mucosa and adrenal gland appeared to be above this background trend, suggesting that mCAs in these tissues may provide a selective advantage to the affected clones. While such a fitness effect is known for LoH of the NOTCH1 and TP53 loci1,3, analysis of larger cohorts is needed to characterize the impact of recurrent mCAs in specific genomic regions.

We expect the extent of somatic variation within the human body to be substantially higher than the estimates in our study. The vast majority of events below 10% clonal fraction would be missed by our mCA screening approach (Extended Data Fig. 1a). Furthermore, as our RNA-based mCA detection approach depends heavily on gene expression in the altered regions, the prevalence of events in gene-poor or transcriptionally silenced regions of the genome was likely underestimated. Future studies employing more sensitive techniques such as high-coverage WGS should help uncover a more complete landscape of mCAs across tissues. Further investigations are also needed to identify the mechanisms that generate mCAs in normal tissues. Although it is well known that mosaic CNLoH events frequently result from mitotic recombination73, mechanisms for generating chromosomal deletions and gains may include non-allelic homologous recombination, non-homologous end joining, or chromosome mis-segregation, among others. Here as well, WGS data would enable analysis of exact breakpoints that are difficult to determine from RNA data, shedding light on the underlying mutational processes. Nonetheless, our findings demonstrate that age-related clonal expansions of mCAs are widespread across normal human tissues, exhibiting complex relationships with tumorigenesis. The mCA screening paradigm employed here can be used to guide more focused future studies using single-cell, spatially-resolved, or functional assays to characterize the etiology and impact of mCAs in normal tissues.

Methods

Study cohorts, datasets and ethical approval.

All analyses performed in the study comply with ethical regulations within the respective cohorts (GTEx, TCGA, Vienna adrenal study). Bulk RNA-seq, WGS, WES, individual-level genotype data, and the associated sample manifests from the GTEx v8 release were downloaded from the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL) platform (https://gen3.theanvil.io/) using the Gen3 client. Additional WGS and genotyping data included in the GTEx v9 release were obtained through dbGaP and Terra/AnVIL (phs000424.v9.p2). Copy number profiles, WGS, and RNA-seq data for TCGA samples were downloaded from the GDC Data Portal (https://portal.gdc.cancer.gov). All GTEx human donors were deceased, with informed consent obtained via next-of-kin or legally authorized representative for participation in the study. Each institution that collected specimens has either submitted a GTEx research protocol and undergone IRB review, or consulted with their Office of Research Subject Protection. Please refer to the original publications for a full description of cohort demographics, patient recruitment, and ethics approval within GTEx and TCGA31,44,51. The collection and study of human adrenal glands of kidney donors were carried out under an institutional ethical permit approved by the Ethics Committee of the Medical University of Vienna (EK Nr 1544/2020). Informed consent of the donors or next-of-kin was in place and no compensation was provided to participants.

Generation of single-cell transcriptomic dataset of the human adrenal gland.

Excised human adrenal glands from local kidney donors were kept in sterile 1× DPBS without calcium or magnesium on ice until arrival to the laboratory. The gland was cleaned from the surrounding fat, cut into strips and then into pieces of 5 mm × 5 mm. Enzymatic digestion took place in a mix of dispase II (2 mg/ml; Roche, Cat No 04942078001), collagenase P (2 mg/ml; Roche Cat No 11213865001) in 1× DPBS solution without calcium or magnesium containing 2 mM CaCl2 at 37°C, with mild shaking for 30–40 min, followed by mechanical dissociation with a wide-bore pipette tip. Single-cell suspension was transferred into a clean Falcon tube through a 100-μm-pore size cell culture filter. Cells were pelleted by centrifugation at 300 g, 5 min, 4°C and washed three times with 10% FBS (Fetal Bovine Serum) in 1× DPBS without calcium or magnesium. Single cells were methanol-fixed according to the demonstrated protocol by 10x Genomics in 90% methanol and stored at −80°C. Rehydration of methanol-fixed single cells was performed just prior to library preparation. Briefly, fixed cell suspension was equilibrated on ice and spun down at 3000 g, 10 min, 4°C. Supernatant was removed and replaced by 1 mL rehydration buffer (1% BSA, 0.5 U/μL RNaseOUT [ThermoFisher Scientific, 10777019] in 1× DPBS). Spinning down, removal of supernatant and resuspension in rehydration buffer were repeated two more times before cells were counted and finally resuspended to a concentration of 700–1500 cells/μL prior to generating emulsion and loading to the Chromium controller and downstream library generation and sequencing. Single-cell RNA-seq libraries were generated using the Chromium Controller and the Next GEM Single Cell 3’ Reagent Kit (v3.1, 10x Genomics) according to the manufacturer’s instructions aiming for a maximum cell recovery of 10,000 cells. 
Library concentrations were quantified with the Qubit 2.0 Fluorometric Quantitation system (Life Technologies, Carlsbad, CA, USA) and the size distribution was assessed using the Experion Automated Electrophoresis System (Bio-Rad, Hercules, CA, USA). Libraries were sequenced by the Biomedical Sequencing Facility at the CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, using the Illumina HiSeq 4000/NovaSeq 6000 platform. Raw sequencing data were demultiplexed using bcl2fastq v2.20.0.422. The data were further processed using the cellranger count pipeline (v7.0.0, 10x Genomics) with the GRCh38 reference genome and GENCODE v32 gene annotation.

Fixation, processing, and staining of human adrenal gland samples.

Following removal of the surrounding fat, a 0.5-cm-thick biopsy of the adrenal gland was fixed in 4% PFA with mild agitation for 12–16 hrs at 4°C. Following washes with 1× PBS and cryoprotection in 30% sucrose for 48–72 hrs the adrenal biopsy was embedded in Tissue-TEK O.C.T. and stored at −80°C until further processing. For downstream stainings 10-μm-thick cryosections were collected on SuperFrost PLUS slides.

Immunofluorescent staining.

Slides equilibrated to room temperature were subjected to heat-mediated antigen retrieval in 1× Target Retrieval Solution (Dako, S1699) by submerging the slides in the solution and steaming for 5 min and allowed to cool down. Sections were then washed three times in PBS with 0.1% Tween-20 (1× PBST) and incubated with primary antibodies diluted in DAKO Antibody Diluent (Cat No S0809, Agilent) at 4°C, overnight. Following washes in 1× PBST and incubation with secondary antibodies diluted in DAKO Antibody Diluent at room temperature for 1 h, slides were washed three times in 1× PBST and mounted with Mowiöl mounting medium. Primary antibodies used: rabbit anti-CHGA (1:500, Cat No 259 003, Synaptic Systems), rabbit anti-CYP11B1 (1:50, Cat No HPA056348, Sigma-Aldrich), mouse anti-SMA-Cy3 (1:1000, Cat No C6198, Sigma-Aldrich). DAPI was applied following staining at a concentration of 0.5 mg/ml. For primary antibody detection, secondary antibodies raised in donkey, conjugated with Alexa-488, and -555 fluorophores were used (1:1000, Molecular Probes, ThermoFisher Scientific).

RNAscope® in situ hybridization.

RNAscope® in situ hybridization was performed on cryosections from PFA fixed, cryoprotected adrenal biopsies (see above for fixation and sectioning) with the manual RNAscope® Multiplex Fluorescent V2 Assay according to manufacturer’s instructions. Probes used were Hs-CYP11B2-C2 (Cat No 592851-C2) and Hs-HSD3B2-C3 (Cat No 467681-C3; commercially available through the ACDBio website). DAPI was applied following staining at a concentration of 0.5 mg/ml.

Microscopy and figure assembly.

Images were acquired using an LSM 780 Zeiss confocal microscope equipped with 5×, 10×, 20×, and 40× objectives. Acquired images in the lsm format were processed with ImageJ (v1.54d) for export as tiff files. Figures were then assembled with Adobe Photoshop (v24.7) and Illustrator (v27.8.1).

Phasing strategy.

We combined population-based phasing and read-backed phasing, integrating across data modalities (WGS, WES, bulk RNA-seq). For the subjects with DNA genotyping data (n=945), we first phased common and low-frequency variants (population AF > 0.001 in the GTEx cohort) genotyped from WGS using Eagle241 (v2.4) with the TOPMed reference panel42. Subject VCFs were submitted to the TOPMed imputation server (https://imputation.biodatacatalyst.nhlbi.nih.gov). We then used phASER40 to perform read-backed phasing of rare variants. For subjects with genotypes available in the v8 release, the existing phASER annotations in the v8 analysis genotype VCF were used. For subjects with genotypes available in the v9 release, we ran phASER (v1.1.1) under settings --mapq 255 --baseq 10 on all available RNA-seq BAMs. We only kept read-phased variants linked with a common or low-frequency variant with phasing confidence > 99%. For subjects without WGS data (n=3), we performed genotyping from bulk RNA-seq data using a panel of common germline single-nucleotide polymorphisms (SNPs; population AF > 0.05 in the 1000 Genomes Project). SNPs with allele fraction between 0.15 and 0.85 were called heterozygous, and those with allele fraction of 1 and covered by more than 10 reads were called homozygous. The genotyping calls were then combined with those from WES. When the same SNP was genotyped from both RNA-seq and WES, the genotype from WES was used. Lastly, we removed variants with population AF < 0.001 in the TOPMed reference panel before performing population-based phasing using the TOPMed imputation server.

Detection of mCAs from bulk RNA-seq.

We modified the haplotype-aware HMM in Numbat34 to optimize the detection of mCAs from bulk RNA-seq of normal tissues (we refer to the resulting method as HaHMMR). First, we simplified the state space to only consist of diploid, heterozygous deletion, CNLoH, and imbalanced amplification. Each of the aberrant states consists of two haplotype states (major and minor) with reciprocal allelic deviations as described previously34. The transition matrix is structured such that transition between aberrant states is forbidden, i.e., only one aberrant state is allowed per mCA segment. In addition, for each SNP j, we introduced a site-specific parameter a_j (where a_j = 1 when the phased allele is the alternative allele, and a_j = 0 otherwise) in the allele count model to account for allelic mapping bias:

$$Y_j \sim \mathrm{BetaBinom}\left(m_j,\ \left(\theta + (-1)^{a_j} r\right)\gamma,\ \left(1 - \theta + (-1)^{a_j} r\right)\gamma\right)$$

where θ is the haplotype fraction, m_j is the phased allele count at SNP j, and γ is the inverse overdispersion in allele counts. We fixed r = 0.015 based on empirical estimates. The forward algorithm was used to identify the optimal deviations in θ for each chromosome by maximizing the total model likelihood. Additional details on the HMM configuration are given in the Supplementary Methods. Expression profiles from samples with age < 30 years were used as expression reference for samples deriving from the same tissue; the sample itself was always excluded from the reference. For tissue types where fewer than two samples were collected from individuals with age < 30 years, we included samples from the youngest two individuals as the reference. For the analysis of all RNA-seq samples, we used a transition probability of t = 10⁻⁶ and a phase switch rate of λ = 0.5 unless otherwise specified.

Permutation test to evaluate event significance.

We devised a permutation test to evaluate the statistical significance of coordinated allelic imbalance across genes. This procedure is based on the rationale that randomizing phasing between genes should destroy the likelihood evidence of true events (H1) but not artifacts (H0). For each called event, we randomly shuffled the phasing between genes and recalculated the log likelihood ratio (LLR) using the forward algorithm for 500 repetitions, thus generating independent samples for the null distribution of the LLR statistic. Assuming asymptotic normality, we approximated the true null distribution of the LLR by fitting a Gaussian distribution to the obtained samples using maximum likelihood. A one-sided P value for the observed LLR (x) was then calculated as 1 − Φ(x; μ̂, σ̂), where Φ is the normal CDF and μ̂ and σ̂ are the estimated mean and variance of the null distribution.
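The Gaussian-approximated permutation P value described above can be sketched as follows. This is a minimal illustration and not the HaHMMR implementation: `recompute_llr` is a hypothetical callback standing in for "shuffle the phasing between genes and rerun the forward algorithm", and the code assumes the second Gaussian parameter is used as a standard deviation when evaluating the normal CDF.

```python
import math
import random


def gaussian_mle(samples):
    """Maximum-likelihood Gaussian fit: sample mean and 1/n standard deviation."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n
    return mu, math.sqrt(var)


def upper_tail_p(x, mu, sigma):
    """One-sided P value 1 - Phi((x - mu) / sigma), computed with erfc."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def permutation_p_value(observed_llr, recompute_llr, n_perm=500, seed=0):
    """Fit a Gaussian to permuted LLRs and score the observed LLR against it.

    recompute_llr(rng) is a hypothetical function that returns the LLR of the
    event after one random shuffle of the phasing between genes.
    """
    rng = random.Random(seed)
    null_llrs = [recompute_llr(rng) for _ in range(n_perm)]
    mu, sigma = gaussian_mle(null_llrs)
    return upper_tail_p(observed_llr, mu, sigma)
```

Fitting a parametric null lets the test report P values far below 1/500, which a purely empirical count over 500 permutations could not resolve.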

Event filtering and quality control.

We applied several filters to the raw mCA calls to reduce the rate of false positives. For each chromosome, we derived a combined P value using Fisher’s method from all aberrant regions. Multiple-testing correction was performed on the per-chromosome P values using the Benjamini-Hochberg procedure to obtain adjusted significance scores (Q values). We removed any events that contained ≤ 50 genes, as well as recurrent artifacts in the telomeric regions of chr19 and chr20. Finally, we only retained high-confidence events (Q < 7.5×10⁻³ and LLR > 35) for the final call sets.
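The filtering steps above can be illustrated with a short sketch. The function names and the flat list inputs are hypothetical; the removal of chr19 and chr20 telomeric artifacts is not modeled. Fisher's method uses the closed-form chi-square survival function for even degrees of freedom, so only the standard library is needed.

```python
import math


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df under H0.

    For 2k degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (Q values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def passes_filters(q_value, llr, n_genes, q_max=7.5e-3, llr_min=35, min_genes=50):
    """Keep events spanning more than 50 genes with Q < 7.5e-3 and LLR > 35."""
    return n_genes > min_genes and q_value < q_max and llr > llr_min
```

In this reading, the per-chromosome Q value is shared by every event on that chromosome, while the gene count and LLR thresholds are applied per event.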

Alteration type assignment.

We classified mCA events as deletion, amplification, and CNLoH events using the estimated haplotype imbalance (θ) and expression fold-change (ϕ). First, the posterior distributions for θ and ϕ (with a uniform prior) were obtained using Laplace approximation, based on the observed allele counts and gene expression counts. Then, the likelihood of each genotype (g ∈ {(3:1), (2:1), (1:0), (2:0)}) was obtained by integrating over all possible cell fractions f:

$$P(G = g) \propto \int_0^1 p_\theta(\theta(f,g))\, p_\phi(\phi(f,g))\, df$$

where p_θ and p_ϕ are the posterior distributions of θ and ϕ, respectively; θ(f,g) and ϕ(f,g) are the expected haplotype imbalance and expression fold-change, respectively, for a given mutant cell fraction and genotype. Finally, the normalized posterior probabilities were obtained for amplification P(G ∈ {(3:1), (2:1)}) = P(G=(3:1)) + P(G=(2:1)), deletion P(G=(1:0)), and CNLoH P(G=(2:0)). The most likely mCA type was then assigned to the event.

Cell fraction inference.

We inferred the mutant cell fraction of each mCA event based on the posterior distributions of the haplotype imbalance (θ) and expression fold-change (ϕ), integrating over uncertainty in genotypes:

$$p(f) \propto \sum_{g \in \mathcal{G}} p_\theta(\theta(f,g))\, p_\phi(\phi(f,g))$$

The maximum likelihood estimate of the mutant cell fraction is:

$$\hat{f} = \operatorname*{arg\,max}_{f \in [0,1]} p(f)$$
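The alteration type assignment and cell fraction inference can be sketched together on a grid over f. Several ingredients here are assumptions made for illustration and are not taken from the text: the posteriors of θ and ϕ are represented as Gaussians given by (mean, standard deviation) pairs, θ is read as the major haplotype fraction, genotypes are read as (major:minor) copy numbers, and the expected values θ(f,g) and ϕ(f,g) come from a simple mixture of mutant and diploid cells.

```python
import math

# Genotypes written as (major copies, minor copies), grouped by alteration type.
GENOTYPES = {"amp": [(3, 1), (2, 1)], "del": [(1, 0)], "loh": [(2, 0)]}


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def expected_theta(f, g):
    """Assumed major haplotype fraction in a mix of mutant (f) and diploid cells."""
    major, minor = g
    return (f * major + (1 - f)) / (f * (major + minor) + 2 * (1 - f))


def expected_phi(f, g):
    """Assumed expression fold-change: average total copy number divided by 2."""
    major, minor = g
    return (f * (major + minor) + 2 * (1 - f)) / 2


def joint_density(f, g, theta_post, phi_post):
    return (normal_pdf(expected_theta(f, g), *theta_post)
            * normal_pdf(expected_phi(f, g), *phi_post))


def infer(theta_post, phi_post, n_grid=1000):
    """Return (posterior over alteration types, grid estimate of cell fraction)."""
    grid = [(i + 0.5) / n_grid for i in range(n_grid)]  # midpoints of [0, 1]
    all_genotypes = [g for gs in GENOTYPES.values() for g in gs]
    # Genotype evidence: midpoint-rule integral over the cell fraction f.
    evidence = {
        g: sum(joint_density(f, g, theta_post, phi_post) for f in grid) / n_grid
        for g in all_genotypes
    }
    total = sum(evidence.values())
    type_post = {t: sum(evidence[g] for g in gs) / total
                 for t, gs in GENOTYPES.items()}
    # Cell fraction: sum the density over genotypes, then maximize over f.
    f_hat = max(grid, key=lambda f: sum(
        joint_density(f, g, theta_post, phi_post) for g in all_genotypes))
    return type_post, f_hat
```

For example, under these assumptions a heterozygous deletion at 40% cell fraction has expected θ = 0.625 and ϕ = 0.8; passing tight posteriors centered on those values, such as `infer((0.625, 0.01), (0.8, 0.02))`, assigns the event to the deletion class with an estimated cell fraction close to 0.4.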

Event recurrence analysis.

To identify recurrently altered regions, we compared the number of observed events at each genomic position with a null distribution assuming that mCAs are uniformly distributed across the genome. Under the null hypothesis, the probability that mCA event i with length l_i covers any given genomic position j is l_i/L, where L is the size of the genome (excluding sex chromosomes). Let I_i be a binary indicator of whether event i covers genomic position j, then

$$I_i \sim \mathrm{Bern}(l_i / L), \quad i = 1, \ldots, M$$

where M is the total number of events. The number of events affecting position j is then a random variable distributed as

$$C_j \sim \sum_{i=1}^{M} I_i$$

For each alteration type, we obtained the empirical distribution of C_j by running 10,000 simulations. Finally, we computed a one-sided P value for the number of observed events at each genomic position based on the empirical null distributions.
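The null simulation for event recurrence can be sketched as below. Each simulated count is a sum of independent Bernoulli draws with event-specific probabilities. The function names are hypothetical, and the P value is taken as the plain fraction of simulated counts at least as large as the observed count, which is an assumption about how the empirical null was used.

```python
import random


def simulate_null_counts(event_lengths, genome_size, n_sim=10000, seed=0):
    """Simulate the number of events covering a fixed position under uniformity.

    Each event i covers the position independently with probability l_i / L.
    """
    rng = random.Random(seed)
    probs = [length / genome_size for length in event_lengths]
    return [sum(rng.random() < p for p in probs) for _ in range(n_sim)]


def empirical_upper_p(observed, null_counts):
    """One-sided P value: fraction of simulated counts >= the observed count."""
    return sum(c >= observed for c in null_counts) / len(null_counts)
```

Because the null depends only on the event lengths and the genome size, one simulated distribution per alteration type can be reused for every genomic position.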

Tissue specificity of mCA landscapes.

To assess whether the positional distribution of mCAs along the genome is tissue-specific, we binned events from each tissue (or tissue groups) by chromosome and tested the association of tissue type and chromosomal distribution using a two-sided Fisher’s exact test with Monte Carlo simulation (10⁵ replicates). For the comparison with blood mCAs, we obtained processed segments from the Loh et al. 2018 study12 and removed focal events (segment size < 5 Mb).

Analysis of somatic SNVs.

To assess the relationship between mCAs and somatic SNVs, we obtained somatic SNV calls from the Yizhak et al. study (available via dbGaP: phs000424)2, which analyzed samples included in the GTEx v7 release. For the analysis of mCA and SNV prevalence across tissues, we only included tissue types with at least 10 samples in the previous study.

Comparison with cancer genomic alteration landscapes.

To compare the mCA landscapes in normal and cancer tissues, we first defined a matching between each GTEx normal tissue type with one or multiple cancer types originating from that tissue (Supplementary Table 2). We then defined chromosomal alterations in TCGA based on the total copy number and minor copy number of each segment as well as the tumor ploidy (median total copy number of all segments). Specifically, we classified segments with total copy number higher than the ploidy as an amplification, segments with total copy number lower than the ploidy as a deletion, and segments with total copy number of 2 and minor copy number of 0 as a CNLoH. We additionally obtained copy number segments for pituitary, BCC and cSCC from three other studies53,74,75. We removed focal events (segment size < 5 Mb) from all the above datasets. We used a permutation test to evaluate the statistical significance of the global resemblance between the mCA landscapes in normal tissues and the corresponding cancer types. We modeled each observed mCA event (from normal tissues) as a sample from a categorical distribution with 22 × 3 categories (3 mCA types per chromosome). That is, for an event from normal tissue type k, its distribution is parameterized by a probability vector vk, which is the mCA frequency by chromosome and mCA type (amplification, deletion, CNLoH) for the matching cancer type(s) of tissue k. When there is zero instance of a particular chromosome and mCA type combination for a tissue’s matching cancer samples, a pseudocount of 1 is added. A model log-likelihood can then be computed for the observed set of mCAs. The permutation test is based on the rationale that if the observed mCAs are completely independent of the mCA frequencies in the corresponding cancer types (H0), then randomly permuting the matching cancer types (and therefore the vks) would result in a similar log-likelihood. 
Conversely, if there is any resemblance between the mCA landscapes in normal tissues and the corresponding cancers (H1), then permuting the normal-cancer matching would result in a significantly lower log-likelihood. We performed a total of 10,000 permutations and calculated a one-sided empirical P value as P = (r + 1)/(n + 1), where r is the number of times the permutation log-likelihood is greater than or equal to the observed log-likelihood and n is the total number of permutations.
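The matching permutation test can be sketched as follows. The data layout is hypothetical: events are given per normal tissue as lists of category labels (one label per chromosome and mCA type combination), and each tissue has a dictionary of category probabilities derived from its matched cancer type(s). The code follows the categorical log-likelihood and the empirical P value described above.

```python
import math
import random


def normalize(counts):
    """Turn cancer event counts into probabilities, filling zeros with a pseudocount of 1."""
    filled = {c: (n if n > 0 else 1) for c, n in counts.items()}
    total = sum(filled.values())
    return {c: n / total for c, n in filled.items()}


def log_likelihood(events_by_tissue, freq_by_tissue):
    """Categorical log-likelihood of normal-tissue events under cancer frequencies."""
    ll = 0.0
    for tissue, events in events_by_tissue.items():
        probs = freq_by_tissue[tissue]
        ll += sum(math.log(probs[category]) for category in events)
    return ll


def matching_permutation_p(events_by_tissue, freq_by_tissue, n_perm=10000, seed=0):
    """Permute the normal-cancer matching and return P = (r + 1) / (n + 1)."""
    rng = random.Random(seed)
    tissues = list(events_by_tissue)
    observed = log_likelihood(events_by_tissue, freq_by_tissue)
    r = 0
    for _ in range(n_perm):
        shuffled = tissues[:]
        rng.shuffle(shuffled)
        permuted = {t: freq_by_tissue[s] for t, s in zip(tissues, shuffled)}
        if log_likelihood(events_by_tissue, permuted) >= observed:
            r += 1
    return (r + 1) / (n_perm + 1)
```

Note that a random shuffle occasionally reproduces the original matching, so with only a few tissue types the smallest attainable P value is bounded well above 1/(n + 1).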

Detection of mCAs from scRNA-seq data.

We used Numbat34 (v1.0.2; https://github.com/kharchenkolab/numbat) to detect mCAs in the adrenal gland scRNA-seq samples, with parameters t = 10⁻⁶ and max_entropy = 0.8. Clustered gene expression profiles from the youngest donor (DO2) were used as expression reference. The rest of the parameters were kept as the default.

Differential gene expression analysis.

For the adrenal gland scRNA-seq sample with detectable mCA, we used the two-sided Mann-Whitney U-test implemented in pagoda276 (v1.0.9; https://github.com/kharchenkolab/pagoda2) to identify confident differentially expressed genes between subclones. We used default parameter settings, with an adjusted P value (Q value) threshold of 0.01. The Benjamini-Hochberg procedure was used for multiple testing corrections.

Associations of mCAs with demographic and morphological features.

Multivariate logistic regressions were performed using the glm function included in the stats package within R (v4.2.2). In the analysis of demographic factors and lifestyle habits (smoking and drinking), we adjusted for total allele coverage (defined as the total allele counts of phased heterozygous SNPs in the RNA-seq sample; centered and scaled). In the analysis of morphological features, we restricted the analysis to solid tissues (excluding blood), adjusting for sex, age, ancestry, and allele coverage.

Dosage effect of drinking frequency.

We assessed the dosage effect of increasing drinking frequency on the presence of detectable mCAs in the esophagus mucosa by multivariate logistic regression adjusted for sex, age, ancestry, and allele coverage. Drinking frequency was included in the model as an ordinal variable with integer levels of 1, 2, 3 respectively for monthly, weekly, and daily drinking periods, and its main effect was assessed.

QTL analysis.

To study the effect of mCAs on QTLs, we used the eQTL and sQTL catalogs from the GTEx v8 release. For eQTLs, we restricted our analysis to the set of significant eGenes and their lead eVariants (Q < 0.05). For the analysis on eVariant dosage, we selected eVariants in the esophagus mucosa that have high confidence (Q < 10⁻³), have large effect sizes (|log2(aFC)| > 1), are putatively causal (CAVIAR causal probability > 0.1), and have a consistent trend in expression across genotypes (0/0, 0/1, 1/1). In each sample with CNLoH covering an eVariant, we phased the eVariant with respect to the major and minor haplotypes using the closest expressed common SNP within 0.5 Mb (which does not reside in the same gene and has posterior allele assignment confidence > 0.9) as anchor. Out of all candidate loci, 7 had at least 10 mosaic samples (mosaic cell fraction > 15%) in which we could confidently phase the eVariant in order to determine variant dosage. For sQTLs, we restricted our analysis to the set of significant sGenes and their lead sVariants (Q < 0.05). For the analysis on sVariant dosage, we selected sVariants in the esophagus mucosa that have high confidence (Q < 10⁻³), have large effect sizes (|β| > 0.03), and have a consistent trend in splicing phenotypes across genotypes. We performed variant phasing in mosaic samples using the same procedure as described above. Out of all candidate loci, 23 had a sufficient number of mosaic samples (cell fraction > 15%) in which we could confidently phase the sVariant in order to determine variant dosage. For each QTL, the correlation between variant dosage and expression or splicing phenotype was assessed by Pearson’s product-moment correlation coefficient (one-sided cor.test in R) and multiple-testing correction was performed using the Benjamini-Hochberg procedure.

Statistics and reproducibility.

The study was designed to be a retrospective analysis of existing datasets. The experiments were not randomized, and the investigators were not blinded to allocation during the experiments and outcome assessment. No statistical method was used to predetermine sample sizes because the prevalence of mCAs in normal tissues was unknown. Out of all samples in the GTEx v8 collection with downloadable data, we removed samples with high levels of inter-individual contamination (contamination score higher than the 98th percentile). The contamination score was calculated as the fraction of SNPs covered by at least 15 reads with AF < 0.2 or AF > 0.8. We removed samples deriving from cell lines (EBV-transformed lymphocytes and cultured fibroblasts), since our study focused on whole tissues. The final dataset consisted of 16,365 whole tissue samples from 947 individuals after quality control and exclusion of cell lines. To ensure the quality of ground-truth CNV calls from the TCGA validation cohort, we excluded tumor samples with low purity (≤85%), tumor samples without matched adjacent normal, and samples whose DNA and RNA were not co-extracted from the same tissue portion (Supplementary Methods). We excluded one additional tumor sample due to difference in sequencing technique (single-end RNA sequencing). Custom statistical analyses and visualizations were performed in R (v4.2.2). For all boxplots, box center line indicates median; box limits indicate upper and lower quartiles; whiskers indicate 1.5× interquartile range.

Extended Data

Extended Data Fig. 1. Performance evaluation of mCA detection using HaHMMR.


a, Detection sensitivity with respect to clonal fraction. b, False positive rate (1-specificity) as a function of Q value cutoff. The center (marked by black line) and shaded area of the error band respectively represent the observed false positive rates and 95% confidence intervals from the binomial distribution. Red bar indicates the range of Q value cutoffs at which a false positive rate of 6.1×10⁻⁵ is achieved. Red triangle marks the Q value cutoff chosen for the study and the expected false positive rate. c, Estimated false discovery rate (FDR) as a function of clonal fraction and event prevalence among samples.

Extended Data Fig. 2. Assignment of mCA types.


a, Expression log2 fold-change (logFC) and haplotype imbalance of detected events. Each dot represents a detected event. Dashed lines indicate the expected logFC and haplotype imbalance for different mutant cell fractions and alteration types. b, Genome-wide distribution of detected mCAs. Each line represents a detected mCA, colored by alteration type. Multi-tissue chr10 alterations from one individual are not shown.

Extended Data Fig. 3. Genomic distribution of mCAs for each individual tissue.


Each line represents a detected mCA, colored by alteration type. Multi-tissue chr10 alterations from one individual are not shown.

Extended Data Fig. 4. Age-related acquisition of mCAs in different tissue types.


The centers (marked by solid dots) and whiskers (vertical lines) of the error bars respectively represent the observed fractions of samples with detectable mCA and 95% confidence intervals from the binomial distribution. For each group, the number of mCA-positive cases and the total number of biologically independent samples are indicated in brackets. Multi-tissue chr10 alterations from one individual are excluded.

Extended Data Fig. 5. Point mutation burden and mCAs in normal tissues.


SNV burden (VAF ≥ 5%) in samples with and without detectable mCAs. Black curves and shaded areas represent the estimated probability density. Each dot represents a distinct sample. The total numbers of biologically independent samples are marked in brackets. Only tissue types with more than 3 samples with mCAs are included. Q values (FDR-corrected P values) from two-sided Wilcoxon tests are shown on top of each panel.

Extended Data Fig. 6. Examples of overlapping mCAs detected in different tissues of the same individual.


In the subjects shown in a and b, a mirrored pattern of haplotype imbalance is observed in the tissue pair. In the subjects shown in c and d, a concordant pattern of haplotype imbalance is observed in the tissue pair, possibly reflecting the same event that arose during development or blood infiltration. In all sets of figures, the left panels show allele profiles of the affected chromosomes. Alleles in altered regions are colored by the inferred haplotypes. Gray vertical dashed lines denote centromere positions. The middle panels show the inferred total copy number and haplotype proportion for pairs of events. The centers (marked by diamonds) of the error bars represent the maximum likelihood estimates of total copy numbers (y-axis) and haplotype proportions (x-axis). The whiskers of the error bars represent 95% confidence intervals derived from the model likelihood. The right panels show schematics of possible chromosomal alterations.

Extended Data Fig. 7. Mosaic chromosomal alteration types and genome-wide burden per tissue.

a, Number of events by alteration type in each tissue. The number on top of each bar denotes the total number of events. b, Fraction of genome altered in samples with detectable mCAs from each tissue type. Each dot represents a distinct sample with detectable mCAs. The total numbers of biologically independent samples are indicated in brackets.

Extended Data Fig. 8. Comparison of cancer and normal chromosomal alteration landscapes.

a, Chromosomal alteration frequencies by genomic positions in the normal pituitary and pituitary adenomas (PA). b, Chromosomal alteration frequencies by genomic positions in the normal esophagus, esophageal squamous cell carcinoma (ESCC), and esophageal adenocarcinoma (EAC). c, Chromosomal alteration frequencies by genomic positions in the normal skin, skin cutaneous melanoma (SKCM), cutaneous squamous cell carcinoma (cSCC), and skin basal cell carcinoma (BCC). For a-c, the numbers of biologically independent samples are indicated in brackets. Event frequencies are plotted by 5 Mb bins.
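
To make the 5 Mb binning concrete, the sketch below computes, for a single chromosome, the fraction of samples carrying an event that overlaps each bin. The overlap rule (any overlap counts), the denominator (all samples in the cohort), the half-open coordinate convention, and every name used here are assumptions for illustration; the caption specifies only the bin size.

```python
BIN_SIZE = 5_000_000  # 5 Mb, as stated in the caption


def binned_event_frequency(events, chrom_length, n_samples, bin_size=BIN_SIZE):
    """Fraction of samples with an event overlapping each bin of one chromosome.

    events: iterable of (sample_id, start, end) with half-open [start, end)
            coordinates in base pairs (assumed convention).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    samples_per_bin = [set() for _ in range(n_bins)]
    for sample_id, start, end in events:
        first = start // bin_size
        last = min((end - 1) // bin_size, n_bins - 1)
        for b in range(first, last + 1):
            # A set ensures each sample is counted at most once per bin.
            samples_per_bin[b].add(sample_id)
    return [len(s) / n_samples for s in samples_per_bin]


if __name__ == "__main__":
    # Hypothetical events on a 20 Mb chromosome in a cohort of 4 samples.
    events = [
        ("s1", 0, 12_000_000),
        ("s2", 6_000_000, 9_000_000),
        ("s1", 7_000_000, 8_000_000),
    ]
    print(binned_event_frequency(events, 20_000_000, 4))
```

Counting samples rather than events keeps a sample with several overlapping calls in the same bin from inflating the frequency.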

Extended Data Fig. 9. Associations of different drink types and drink frequencies with mCAs in the esophagus mucosa.

In both a and b, the centers (marked by diamonds) and whiskers of the error bars respectively represent the estimated odds ratio (OR) and 95% confidence interval from multivariate logistic regression (Methods). Unadjusted two-sided P values from the multivariate logistic regression model are shown on the right. The numbers of biologically independent samples examined are n=535 for drink type and n=510 for drink frequency.
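
For readers less familiar with how such error bars arise, the sketch below converts a logistic regression coefficient and its standard error into an odds ratio with a 95% confidence interval. It assumes a Wald-type interval on the log-odds scale; the caption does not state which interval construction was used, and the function name and example numbers are hypothetical.

```python
from math import exp, log

Z_95 = 1.959963984540054  # 97.5th percentile of the standard normal distribution


def odds_ratio_ci(beta: float, se: float, z: float = Z_95):
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error."""
    # The interval is symmetric on the log-odds scale and is then exponentiated.
    return exp(beta), exp(beta - z * se), exp(beta + z * se)


if __name__ == "__main__":
    # Hypothetical coefficient: log-odds of log(2) with a standard error of 0.5.
    print(odds_ratio_ci(log(2), 0.5))
```

Because the interval is symmetric only on the log scale, the plotted whiskers appear asymmetric around the odds ratio on a linear axis.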

Extended Data Fig. 10. Histological characterization of adult adrenal glands with and without mCAs.

a, Hematoxylin-eosin (H&E) stain of an adrenal section from an age- and gender-matched donor (DO3, female, age 59 years) and b, that of the donor with genome aberration (DO6). In DO6, diffuse cortical hyperplasia is evident from the thickened cortical layers. Star denotes extracapsular cortical nubbin. Black arrows indicate cortical extrusions. c, RNAscope® in situ hybridization for cortical markers HSD3B2 and CYP11B2. A zoomed-in view of the tissue section is shown for the region highlighted in a (black rectangle). d, Left panel, immunofluorescent staining of adrenal gland section of DO6 against cortical marker CYP11B1 and smooth muscle actin (SMA) as a marker of blood vessels. Right panel, immunofluorescent staining against chromaffin cell marker chromogranin A (CHGA) and SMA. ZG, zona glomerulosa. ZF, zona fasciculata. ZR, zona reticularis. v, vessel. Extensive extracapsular cortical extrusions and nubbin are observed in the cortex. In contrast, the organization of the medulla remains normal. Each experiment in a-d was performed once.

Supplementary Material

Supplementary Information
Source Data Fig. 7f
Source Data Fig. 4a
Source Data Fig. 1d

Acknowledgements

P.V.K., I.A., and T.G. were in part supported by ERC Synergy grant no. 85629 (KILL-OR-DIFFERENTIATE) from the European Research Council. P.V.K. serves on the Scientific Advisory Board to Celsius Therapeutics Inc. and Biomage Inc. P.J.P. was supported by NIH grant R01HG012573. The work of I.A. was additionally supported by the Knut and Alice Wallenberg Foundation, Swedish Research Council, Bertil Hallsten Foundation, Paradifference Foundation, and Austrian Science Fund Project Grant. The work of A.S.T. was funded by the Paradifference Foundation. R.O. was funded by the Vienna Science and Technology Fund (WWTF Genomics based immunologic risk stratification #10.47379/LS20081). P.-R.L. was supported by NIH grant no. DP2 ES030554, a Burroughs Wellcome Fund Career Award at the Scientific Interfaces and the Next Generation Fund at the Broad Institute of MIT and Harvard. V.L. is supported by the Swedish Research Council (2020–00583). M.E.K. was supported by the Novo Nordisk Foundation (Postdoc fellowship in Endocrinology and Metabolism at International Elite Environments, NNF17OC0026874) and Stiftelsen Riksbankens Jubileumsfond (Erik Rönnbergs fond stipend). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The GTEx project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The authors would like to thank Kristin Ardlie for sharing data on cancer incidental findings in GTEx; Jonathan Mitchel, Dominik Glodzik, and Vinay Viswanadham for helpful discussions; and Shannon Ehmsen for creating graphical illustrations.

Footnotes

Competing Interests Statement

P.V.K. serves on the Scientific Advisory Board to Celsius Therapeutics Inc. and Biomage Inc.

P.V.K. is an employee of Altos Labs Inc. The rest of the authors declare no competing interests.

Code Availability

Custom scripts and analysis notebooks for reproducing results in the paper are available on Zenodo (DOI: 10.5281/zenodo.8310299; ref. 78). The HaHMMR method for mCA detection from bulk RNA-seq data is available on GitHub (https://github.com/kharchenkolab/hahmmr; ref. 79).

Inclusion & Ethics

To our knowledge, all local collaborators where the research was conducted were included as co-authors.

Data Availability

GTEx v8 mCA calls are provided as Source Data Fig. 1. Genome-wide mCA recurrence levels are provided as Source Data Fig. 4. Access to GTEx v8 data (including additional WGS and genotyping data in the v9 release) can be requested through dbGaP (phs000424.v8.p2; phs000424.v9.p2). Processed copy number segments for cSCC, BCC, and normal blood can be obtained from the original publications (refs. 12, 74, 75). Copy number profiles, WGS, and RNA-seq data for TCGA samples can be downloaded from the GDC Data Portal (https://portal.gdc.cancer.gov) with appropriate access permission from dbGaP (phs000178.v11.p8). Pituitary sequencing data from Bi et al. can be obtained from the European Genome-phenome Archive (EGA; EGAS00001001714). Somatic SNV calls from Yizhak et al. (ref. 2) with original GTEx sample IDs can be obtained through dbGaP (phs000424). Processed gene expression data of the DO6 adrenal gland scRNA-seq sample are available on Zenodo (DOI: 10.5281/zenodo.8336489; ref. 77). Access to the raw sequencing data for this donor sample can be requested through EGA (EGAD00001011288), subject to approval by the data access committee and under the condition that the data will not be propagated further. Access to the source human adrenal tissue materials is restricted due to privacy agreements with the research participants. The list of differentially expressed genes between mutant and normal adrenal cells is provided as Source Data Fig. 7.

References

  • 1. Martincorena I et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
  • 2. Yizhak K et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, (2019).
  • 3. Martincorena I et al. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015).
  • 4. Rockweiler NB et al. The origins and functional effects of postzygotic mutations throughout the human life span. Science 380, eabn7113 (2023).
  • 5. Jaiswal S & Ebert BL. Clonal hematopoiesis in human aging and disease. Science 366, eaan4673 (2019).
  • 6. Li R et al. Macroscopic somatic clonal expansion in morphologically normal human urothelium. Science 370, 82–89 (2020).
  • 7. Lawson ARJ et al. Extensive heterogeneity in somatic mutation and selection in the human bladder. Science 370, 75–82 (2020).
  • 8. Lee-Six H et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
  • 9. Li R et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature 597, 398–403 (2021).
  • 10. Laurie CC et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat. Genet. 44, 642–650 (2012).
  • 11. Jacobs KB et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nat. Genet. 44, 651–658 (2012).
  • 12. Loh P-R et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).
  • 13. Terao C et al. Chromosomal alterations among age-related haematopoietic clones in Japan. Nature 584, 130–135 (2020).
  • 14. Thompson DJ et al. Genetic predisposition to mosaic Y chromosome loss in blood. Nature 575, 652–657 (2019).
  • 15. Loh P-R, Genovese G & McCarroll SA. Monogenic and polygenic inheritance become instruments for clonal selection. Nature 584, 136–141 (2020).
  • 16. Ellis P et al. Reliable detection of somatic mutations in solid tissues by laser-capture microdissection and low-input DNA sequencing. Nat. Protoc. 16, 841–871 (2021).
  • 17. Brunner SF et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
  • 18. Blokzijl F et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016).
  • 19. Luquette LJ et al. Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements. Nat. Genet. (2022) doi: 10.1038/s41588-022-01180-2.
  • 20. Yoshida K et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
  • 21. Ng SWK et al. Convergent somatic mutations in metabolism genes in chronic liver disease. Nature 598, 473–478 (2021).
  • 22. Yokoyama A et al. Age-related remodelling of oesophageal epithelia by mutated cancer drivers. Nature 565, 312–317 (2019).
  • 23. Coorens THH et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021).
  • 24. Moore L et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386 (2021).
  • 25. Park S et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 597, 393–397 (2021).
  • 26. Jakubek YA et al. Large-scale analysis of acquired chromosomal alterations in non-tumor samples from patients with cancer. Nat. Biotechnol. 38, 90–96 (2020).
  • 27. Aran D et al. Comprehensive analysis of normal adjacent to tumor transcriptomes. Nat. Commun. 8, 1077 (2017).
  • 28. Heaphy CM et al. Telomere DNA content and allelic imbalance demonstrate field cancerization in histologically normal tissue adjacent to breast tumors. Int. J. Cancer 119, 108–116 (2006).
  • 29. Trujillo KA et al. Markers of fibrosis and epithelial to mesenchymal transition demonstrate field cancerization in histologically normal tissue adjacent to breast tumors. Int. J. Cancer 129, 1310–1321 (2011).
  • 30. Heaphy CM, Griffith JK & Bisoffi M. Mammary field cancerization: molecular evidence and clinical importance. Breast Cancer Res. Treat. 118, 229–239 (2009).
  • 31. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
  • 32. Serin Harmanci A, Harmanci AO & Zhou X. CaSpER identifies and visualizes CNV events by integrative analysis of single-cell or bulk RNA-sequencing data. Nat. Commun. 11, 89 (2020).
  • 33. Ozcan Z et al. Chromosomal imbalances detected via RNA-sequencing in 28 cancers. Bioinformatics (2022) doi: 10.1093/bioinformatics/btab861.
  • 34. Gao T et al. Haplotype-aware analysis of somatic copy number variations from single-cell transcriptomes. Nat. Biotechnol. (2022) doi: 10.1038/s41587-022-01468-y.
  • 35. Fan J et al. Linking transcriptional and genetic tumor heterogeneity through allele analysis of single-cell RNA-seq data. Genome Res. 28, 1217–1227 (2018).
  • 36. Reinius B & Sandberg R. Random monoallelic expression of autosomal genes: stochastic transcription and allele-level regulation. Nat. Rev. Genet. 16, 653–664 (2015).
  • 37. Castel SE, Levy-Moonshine A, Mohammadi P, Banks E & Lappalainen T. Tools and best practices for data processing in allelic expression analysis. Genome Biol. 16, 195 (2015).
  • 38. Vattathil S & Scheet P. Haplotype-based profiling of subtle allelic imbalance with SNP arrays. Genome Res. 23, 152–158 (2013).
  • 39. Nik-Zainal S et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
  • 40. Castel SE, Mohammadi P, Chung WK, Shen Y & Lappalainen T. Rare variant phasing and haplotypic expression from RNA sequencing with phASER. Nat. Commun. 7, 12817 (2016).
  • 41. Loh P-R et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
  • 42. Taliun D et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021).
  • 43. GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
  • 44. Cancer Genome Atlas Research Network et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).
  • 45. Li N et al. Causal variants screened by whole exome sequencing in a patient with maternal uniparental isodisomy of chromosome 10 and a complicated phenotype. Exp. Ther. Med. 11, 2247–2253 (2016).
  • 46. Benn P. Uniparental disomy: Origin, frequency, and clinical significance. Prenat. Diagn. 41, 564–572 (2021).
  • 47. Bizzotto S et al. Landmarks of human embryonic development inscribed in somatic mutations. Science 371, 1249–1253 (2021).
  • 48. Fowler JC et al. Selection of Oncogenic Mutant Clones in Normal Human Skin Varies with Body Site. Cancer Discov. 11, 340–361 (2021).
  • 49. Colom B et al. Mutant clones in normal epithelium outcompete and eliminate emerging tumours. Nature 598, 510–514 (2021).
  • 50. Colom B et al. Spatial competition shapes the dynamic mutational landscape of normal esophageal epithelium. Nat. Genet. 52, 604–614 (2020).
  • 51. Carithers LJ et al. A Novel Approach to High-Quality Postmortem Tissue Procurement: The GTEx Project. Biopreserv. Biobank. 13, 311–319 (2015).
  • 52. Fishbein L et al. Comprehensive Molecular Characterization of Pheochromocytoma and Paraganglioma. Cancer Cell 31, 181–193 (2017).
  • 53. Bi WL et al. Landscape of Genomic Alterations in Pituitary Adenomas. Clin. Cancer Res. 23, 1841–1851 (2017).
  • 54. Mantovani A, Allavena P, Sica A & Balkwill F. Cancer-related inflammation. Nature 454, 436–444 (2008).
  • 55. Santaguida S et al. Chromosome Mis-segregation Generates Cell-Cycle-Arrested Cells with Complex Karyotypes that Are Eliminated by the Immune System. Dev. Cell 41, 638–651.e5 (2017).
  • 56. Kay J, Thadhani E, Samson L & Engelward B. Inflammation-induced DNA damage, mutations and cancer. DNA Repair (Amst.) 83, 102673 (2019).
  • 57. Heyde A et al. Increased stem cell proliferation in atherosclerosis accelerates clonal hematopoiesis. Cell 184, 1348–1361.e22 (2021).
  • 58. Hormaechea-Agulla D et al. Chronic infection drives Dnmt3a-loss-of-function clonal hematopoiesis via IFNγ signaling. Cell Stem Cell 28, 1428–1442.e6 (2021).
  • 59. Cai Z et al. Inhibition of inflammatory signaling in Tet2 mutant preleukemic cells mitigates stress-induced abnormalities and clonal hematopoiesis. Cell Stem Cell 23, 833–849.e5 (2018).
  • 60. Gao T et al. Interplay between chromosomal alterations and gene mutations shapes the evolutionary trajectory of clonal hematopoiesis. Nat. Commun. 12, 338 (2021).
  • 61. Walczak EM & Hammer GD. Regulation of the adrenocortical stem cell niche: implications for disease. Nat. Rev. Endocrinol. 11, 14–28 (2015).
  • 62. Drelon C et al. Analysis of the role of Igf2 in adrenal tumour development in transgenic mouse models. PLoS One 7, e44171 (2012).
  • 63. Hoeflich A et al. Overexpression of insulin-like growth factor-binding protein-2 results in increased tumorigenic potential in Y-1 adrenocortical tumor cells. Cancer Res. 60, 834–838 (2000).
  • 64. Patel AP et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science 344, 1396–1401 (2014).
  • 65. Gao R et al. Delineating copy number and clonal substructure in human tumors from single-cell transcriptomes. Nat. Biotechnol. 39, 599–608 (2021).
  • 66. Sharma E et al. The characteristics and trends in adrenocortical carcinoma: A United States population based study. J. Clin. Med. Res. 10, 636–640 (2018).
  • 67. Aygun N & Uludag M. Pheochromocytoma and paraganglioma: From epidemiology to clinical findings. SiSli Etfal Hastan. Tip Bul. / Med. Bull. Sisli Hosp. 54, 159–168 (2020).
  • 68. Dekkers OM, Karavitaki N & Pereira AM. The epidemiology of aggressive pituitary tumors (and its challenges). Rev. Endocr. Metab. Disord. 21, 209–212 (2020).
  • 69. Beuschlein F et al. Constitutive activation of PKA catalytic subunit in adrenal Cushing’s syndrome. N. Engl. J. Med. 370, 1019–1028 (2014).
  • 70. Kumagai K et al. Expansion of gastric intestinal metaplasia with copy number aberrations contributes to field cancerization. Cancer Res. (2022) doi: 10.1158/0008-5472.CAN-21-1523.
  • 71. Olafsson S et al. Somatic Evolution in Non-neoplastic IBD-Affected Colon. Cell 182, 672–684.e11 (2020).
  • 72. Cagan A et al. Somatic mutation rates scale with lifespan across mammals. Nature 604, 517–524 (2022).
  • 73. Moynahan ME & Jasin M. Mitotic homologous recombination maintains genomic stability and suppresses tumorigenesis. Nat. Rev. Mol. Cell Biol. 11, 196–207 (2010).

Methods-only References

  • 74. Bonilla X et al. Genomic analysis identifies new drivers and progression pathways in skin basal cell carcinoma. Nat. Genet. 48, 398–406 (2016).
  • 75. Inman GJ et al. The genomic landscape of cutaneous SCC reveals drivers and a novel azathioprine associated mutational signature. Nat. Commun. 9, 3667 (2018).
  • 76. Fan J et al. Characterizing transcriptional heterogeneity through pathway and gene set overdispersion analysis. Nat. Methods 13, 241–244 (2016).
  • 77. Adameyko I, Heinzel A, Oberbauer R & Kastriti ME. Adult adrenal gland scRNA-seq. (2023) doi: 10.5281/ZENODO.8336489.
  • 78. Gao T. Analysis code for pan-tissue mCA study. (Zenodo, 2023). doi: 10.5281/ZENODO.8310299.
  • 79. Gao T. kharchenkolab/hahmmr: v1.0.0. (Zenodo, 2023). doi: 10.5281/ZENODO.8342630.
