PLOS Genetics. 2021 Jan 14;17(1):e1009302. doi: 10.1371/journal.pgen.1009302

UV-exposure, endogenous DNA damage, and DNA replication errors shape the spectra of genome changes in human skin

Natalie Saini 1,¤, Camille K Giacobone 1, Leszek J Klimczak 2, Brian N Papas 2, Adam B Burkholder 2, Jian-Liang Li 2, David C Fargo 2, Re Bai 3, Kevin Gerrish 3, Cynthia L Innes 4, Shepherd H Schurman 4, Dmitry A Gordenin 1,*
Editor: Mitch McVey 5
PMCID: PMC7808690  PMID: 33444353

Abstract

Human skin is continuously exposed to environmental DNA damage, leading to the accumulation of somatic mutations over the lifetime of an individual. Mutagenesis in human skin cells can also be caused by endogenous DNA damage and by DNA replication errors. The contributions of these processes to the somatic mutation load in the skin of healthy humans have so far not been accurately assessed because the low numbers of mutations from current sequencing methodologies preclude the distinction between sequencing errors and true somatic genome changes. In this work, we sequenced genomes of single cell-derived clonal lineages obtained from primary skin cells of a large cohort of healthy individuals across a wide range of ages. We report here the range of mutation load and a comprehensive view of the various somatic genome changes that accumulate in skin cells. We demonstrate that UV-induced base substitutions, insertions and deletions are prominent even in sun-shielded skin. In addition, we detect accumulation of mutations due to spontaneous deamination of methylated cytosines as well as insertions and deletions characteristic of DNA replication errors in these cells. The endogenously induced somatic mutations and indels also demonstrate a linear increase with age, while UV-induced mutation load is age-independent. Finally, we show that DNA replication stalling at common fragile sites is a potent source of gross chromosomal rearrangements in human cells. Thus, somatic mutations in skin of healthy individuals reflect the interplay of environmental and endogenous factors in facilitating genome instability and carcinogenesis.

Author summary

Skin forms the first barrier against a variety of environmental toxins and DNA damaging agents. Additionally, the DNA of skin cells suffers from endogenous damage and errors during replication. Altogether, these lesions cause a variety of genome changes resulting in disease including cancer. However, an accurate measurement of the range and complete spectrum of genome changes in healthy skin was missing due to technical or biological limitations of prior studies. We present here accurate measurements of the various types of somatic genome changes that we found in skin fibroblasts and melanocytes from 21 donors ranging in age from 25 to 79 years, which allowed us to distinguish age-related from age-independent changes. Our cohort contains both White and African American donors, allowing an estimation of the impacts of skin color on mutagenesis. As a result, we revealed the complete spectrum and determined the range of somatic genome changes and their etiologies in healthy human skin fibroblasts and melanocytes and highlighted molecular mechanisms underlying these changes. Therefore, our study introduces a baseline for defining disease levels of genome instability in skin.

Introduction

Cells within the human body encounter a vast variety of DNA damaging agents throughout an individual’s lifetime. By some estimates, cells may receive 70,000 DNA lesions per day [1,2]. Erroneous repair or lack of repair of these lesions would lead to a variety of genome changes including somatic single base substitutions, insertions and deletions, rearrangements and copy number changes. Large-scale sequencing studies of single cells, clonally expanded single cells and bulk cells from healthy humans have demonstrated that healthy human tissues are genetically mosaic with thousands of somatic mutations [3–11]. Analysis of such accumulated somatic genome changes has enabled elucidation of the sources of the mutation-initiating lesions as well as the various DNA repair pathways that may be involved in error-prone repair of DNA damage in human cancers [12–15]. Since at least half of the somatic genome changes seen in cancers originate in healthy pre-cancerous cells [16], it is imperative to establish the sources of DNA damage and their impacts on genome stability in healthy cancer-free tissues.

Skin is the largest tissue in the human body and forms the first line of defense against environmental toxins and DNA damaging agents, with ultraviolet (UV) radiation being the most potent environmental mutagen in skin cells. In fact, melanoma genomes have the highest burdens of mutations, with UV-induced mutation signatures predominating amongst the mutation signatures identified in this cancer type [12,13]. The pathogenic impact of UV-radiation in generating genome instability is multifaceted. UV-induced DNA lesions are a source of replicative polymerase stalling [17,18] and require translesion synthesis (TLS) over cyclobutane pyrimidine dimers (CPD) and pyrimidine 6–4 pyrimidone (6-4PP) [19–27]. Error-prone TLS over UV-induced lesions leads to C➔T changes in the yCn motif (y is any pyrimidine, n is any nucleotide, mutated base is capitalized). Cytosines within a CPD may also be deaminated to uracils and, upon copying by the canonical DNA polymerases or by the TLS polymerase, Pol η, would be fixed as yCn➔yTn changes or as CC➔TT changes in the next round of replication [19,21]. Error-prone TLS across thymine CPDs can also lead to T➔C changes [19,21,28,29], preferring the nTt➔nCt motif [8]. Altogether, these base-substitution motifs derived from experimental data constitute a significant part of mutation signature SBS7b extracted by non-negative matrix factorization analysis from mutation catalogs of thousands of whole-genome sequenced human cancers [13]. In the absence of TLS, a UV-induced lesion would not result in base substitutions but can impede replication fork progression. Restart of a stalled replication fork can result in the formation of single-stranded gaps in the sister DNA molecules, which can later be converted to double strand breaks (DSBs) [30,31]. Inaccurate repair of such a DSB via homologous recombination (HR) or non-homologous end joining (NHEJ) can lead to structural changes or copy number variation, or can generate a small insertion or deletion.

In agreement with UV radiation being the major source of DNA damage in skin cells, various studies have demonstrated that C➔T changes in the yCn context are the most prevalent base substitutions in skin fibroblasts, melanocytes, and keratinocytes. In addition, human skin cells also carry CC➔TT changes and T➔C changes in the nTt motif [7,8,32,33]. Moreover, we previously demonstrated that fibroblasts obtained from sun-exposed body sites carry a higher mutation burden along with a higher contribution of a UV-mutation signature than fibroblasts obtained from sun-shielded sites [8]. Our findings were also supported by the study of Tang et al., who demonstrated a higher mutation burden in melanocytes from sun-exposed body sites than from sun-shielded body sites via either whole exome sequencing or targeted sequencing of 509 cancer-associated genes in single melanocytes [33]. In summary, numerous studies have established and verified the prominent mutagenic effects of the bypass of UV-induced lesions by translesion polymerases, generating a characteristic base substitution signature in skin cells. However, the broad spectrum of somatic genome instability, including consequences of UV-induced DSBs in cells of healthy human skin, has neither been established nor characterized by mutation signature analysis.

In addition to environmental DNA damage, cells may also accumulate somatic genome changes due to endogenous DNA damage or errors during DNA replication in the form of base substitutions, small insertions or deletions (indels), and gross chromosomal rearrangements. Somatic mutations in skin cells have been measured either by deep sequencing of bulk tissue [32] or whole-genome sequencing of single cell-derived induced pluripotent stem cells [5] or single cell-derived clonal lineages [7,8]. However, due to either small sample sizes or difficulties in accurately identifying somatic indels and chromosomal rearrangements using induced pluripotent stem cells or bulk cells, none of these studies have been able to adequately characterize the different sources of DNA damage and their mutagenic outcomes in skin cells from healthy donors [34].

Here, we present an integrated analysis of the various types of somatic genome changes that are found in skin fibroblasts and melanocytes from a total of 21 donors ranging in ages from 25 to 79 years. Unlike previous studies, our cohort contains both White and African American donors, allowing a better estimation of the impacts of skin color on mutagenesis in skin cells. Our work provides the normal range of the burden and types of somatic genome instability in human skin cells. We show here that in skin cells, endogenous DNA damage in the form of spontaneously deaminated cytosines at CpG motifs, oxidative DNA damage, as well as DNA replication errors, are a substantial source of somatic mutagenesis. Additionally, UV-induced DNA damage is prevalent even in sun-shielded skin cells and manifests as single base substitutions arising from DNA synthesis over lesions by TLS and by deletions of five or more nucleotides arising from end-joining repair of UV-induced DSBs. Our analysis also highlights the differences in the outcomes of UV-induced DSBs and DSBs induced by endogenous DNA damage in cancer-free skin cells. Overall, we provide a comprehensive analysis of the various UV-induced and endogenous genome de-stabilizing processes that operate in healthy skin cells.

Results

Study design

Based on our prior study [8], we performed whole-genome sequencing of hip skin cells because hip skin is mostly sun-shielded. UV-induced mutagenesis contributes less to the overall mutation spectrum in this tissue, which allows better detection of other mutational processes. In this study, we analyzed somatic genome changes in single cell lineages from 34 fibroblasts obtained from skin biopsies taken from hips of a total of 21 donors, ages 25 to 79. Our dataset includes the hip fibroblasts from two donors sequenced previously [8]. In addition, we sequenced five genomes of clonal single melanocyte-derived lineages. Single skin fibroblasts were propagated in culture up to approximately 1,000,000 cells, which provided sufficient high-quality genomic DNA for whole-genome sequencing and follow-up validation. The clonal single-melanocyte lineages were cultured in media up to 10,000 cells. DNA from these cells was whole-genome amplified and sequenced. In addition, we were able to grow one melanocyte clone up to 1,000,000 cells and performed whole-genome sequencing on this sample without whole-genome amplification. From each donor, we also sequenced whole blood DNA (Fig 1A).

Fig 1. Schematics and total base substitutions identified per clonal lineage in this study.

Fig 1

(A) Schematics of the study design. From each donor, we obtained blood for whole-genome sequencing. In addition, we obtained skin biopsies from the hips of the donors from which fibroblasts and melanocyte clonal lineages were obtained. Fibroblasts were grown up to a million cells and their DNA was directly used for whole-genome sequencing, while melanocytes grew up to 10,000 cells and the DNA was whole-genome amplified and thereafter sequenced. (B) The total base substitutions in each clonal lineage versus the age of the donors. The pink filled circles denote melanocyte clones. The x-axis denotes the ages of the donors, while the y-axis denotes the number of base substitutions. The solid black line is the linear regression line for the samples, while the dotted black curves are the 95% confidence intervals. The source data for this figure is in S1 and S2 Tables.

The median sequencing depth for the samples was 78X with a minimum average coverage per site of 50X (S1 Table). The genome-wide changes detected in the clones were compared to blood samples from the same donors and only the variants unique to the clones were denoted as somatic changes in the clones. Stringent filtering criteria were applied to exclude changes that could have occurred during limited propagation of the clone. For this purpose, only base substitution and indel calls with allele frequencies between 45% and 55% (heterozygous alleles) or above 90% (homozygous alleles) were considered clonal and somatic in the initiating melanocyte or fibroblast cell. All other calls that did not conform to these allele frequencies were considered sub-clonal and were removed from the analysis as these most likely represented culture-induced artifacts. We also analyzed the allele frequencies of all somatic base substitutions in the clonal lineages. All fibroblast clonal lineages demonstrated a peak of mutation calls at the 45% to 55% allele frequencies, indicating that these samples were clonal (S1 Fig and S2 Table). We did not see such a peak in the whole-genome amplified melanocyte clones, which could reflect uneven genome amplification and localized genome duplications during the whole-genome amplification step. Nonetheless, analyzing only heterozygous mutation calls between 45% and 55% allele frequencies and homozygous mutation calls at >90% allele frequencies allows us to estimate the minimum number of somatic mutations in the founder cells. Mutations that accrue during culture and/or polymerase errors during whole genome amplification are expected to be non-clonal and to have allele frequencies <45%. For structural changes, calls with variant junction reads representing at least 30% of the total junction reads, and with no reads representing the variants in the blood genomes, were identified as clonal somatic rearrangements present in the initiating fibroblast.
Multiple samples sequenced from the same donors allowed intra-individual comparisons of somatic genome instability in humans.
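The allele-frequency thresholds described above can be illustrated with a minimal Python sketch. The function name `classify_call`, the read-count inputs, and the choice to treat the 45% and 55% boundaries as inclusive are assumptions made for illustration; the actual filtering pipeline is not specified at this level of detail in the text.

```python
def classify_call(alt_reads, total_reads):
    """Label a variant call by its allele frequency (AF).

    Thresholds follow the text: 45-55% AF is treated as clonal heterozygous,
    >90% AF as clonal homozygous, and everything else as sub-clonal.
    Inclusive 45%/55% boundaries are an assumption of this sketch.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    af = alt_reads / total_reads
    if 0.45 <= af <= 0.55:
        return "clonal_heterozygous"
    if af > 0.90:
        return "clonal_homozygous"
    return "sub_clonal"


# Hypothetical (alt reads, total reads) pairs; not data from the study.
calls = [(39, 78), (75, 78), (12, 78)]
labels = [classify_call(alt, total) for alt, total in calls]
```

Under this scheme a call supported by 12 of 78 reads would be discarded as sub-clonal, which is the behavior the text describes for likely culture-induced or amplification-induced artifacts.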

UV-induced base substitutions and C➔T changes due to spontaneous cytosine deamination are prevalent in skin cells

We detected 402 to 14,029 base substitutions in each clonal lineage sequenced, and the total number of mutations did not increase with the age of the donors (Fig 1B). Analysis of the mutation spectrum revealed that the predominant mutation in many samples was the C➔T base substitution (S2 Fig and S2 Table). The number and types of base substitutions in melanocyte clones were similar to those seen in the fibroblasts (Figs 1B and S2 and S2–S4 Tables). To identify the predominant mutation signatures in our samples, we determined the cosine similarities of the 96 tri-nucleotide motif mutation profiles in our samples versus all the published mutation signatures derived from analysis of thousands of mutation catalogs from human tumors ([13], and https://cancer.sanger.ac.uk/cosmic/signatures). This allowed us to agnostically determine the mutation signatures previously identified in cancers that were also overrepresented in our samples. We saw that the SBS7b signature was overrepresented in many samples. This signature consists of C➔T changes at cC or tC motifs (S2 and S3 Figs). In addition, SBS1 was only weakly represented in our samples. We also found mutation signatures SBS2 and SBS11 in the samples that also carry a strong SBS7b mutation signature. These are likely due to the overlap of the SBS2 and SBS11 mutation signatures with UV-induced mutations (S3 Table). We also used non-negative matrix factorization (NMF)-based deconvolution of mutation signatures as a parallel approach to agnostically determine the predominant signatures in our samples for single base substitutions and dinucleotide substitutions. We were able to detect SBS1, SBS5, and prominent UV mutation signatures (SBS7b and DBS1) in our cohort (S3 Table, S2 and S3 Figs). In 28 samples we also detected either SBS4 or SBS18, which are indicative of oxidative damage in the cells leading to G➔T (C➔A) changes (S2 and S3 Figs and S3 Table).
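The cosine-similarity comparison between a sample's mutation profile and a reference signature can be sketched as follows. This is a generic illustration only: the function name and the toy four-channel vectors are assumptions, and a real comparison would use the 96 trinucleotide channels in a consistent order for both vectors.

```python
import math


def cosine_similarity(profile_a, profile_b):
    """Cosine similarity between two equal-length mutation-count vectors."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(a * b for a, b in zip(profile_a, profile_b))
    norm_a = math.sqrt(sum(a * a for a in profile_a))
    norm_b = math.sqrt(sum(b * b for b in profile_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy 4-channel vectors standing in for 96-channel profiles (hypothetical).
sample = [10, 0, 5, 1]
signature = [0.6, 0.0, 0.3, 0.1]
similarity = cosine_similarity(sample, signature)
```

Because cosine similarity ignores vector length, raw counts from a sample can be compared directly with a signature expressed as proportions.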

We then sought to determine if the most prominent components of mutational signatures identified above were statistically enriched in our samples. For this purpose, we used our previously described knowledge-based trinucleotide-motif-centered pipeline [8,14,15,35]. This pipeline calculates enrichment with mutations within pre-defined trinucleotide motifs. It also calculates the sample-specific p-values for enrichments and minimum estimate of mutation load assigned to a motif-specific mutagenic process after stringent statistical filtering. nCg➔nTg changes likely arise upon spontaneous deamination of methylated cytosines [15]. These mutations constitute the major component of SBS1 in COSMIC [12,13]. SBS1-associated mutation load has been shown to increase with age in cancers [36] and in healthy individuals [7,10,37,38]. Analysis of the nCg➔nTg changes in our donors demonstrated that this mutation type is statistically enriched in all the samples and was also found to linearly increase with the ages of the participants with an average increase of 0.4 mutations per year (Fig 2 and S4 Table). We also detected statistically significant enrichment with UV-associated C➔T changes in the tC or cC context (yCn➔yTn, major component of COSMIC SBS7b) in many of the sequenced samples as well as a prominent presence of CC➔TT changes (Fig 2 and S4 Table). The estimates of yCn➔yTn minimum mutation load correlate with direct counts of the less frequent CC➔TT and nTt➔nCt changes, which have previously been shown to be associated with UV-induced DNA damage in human cells [8,13] (S4 Fig). Altogether these three types of changes indicate the contribution of UV-induced changes in mutation load accumulated in skin. Interestingly, the UV-induced mutations did not correlate with the ages of the participants (Fig 2).
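A motif-centered enrichment calculation of this kind can be sketched in a few lines of Python. The formulas below (fold enrichment as the ratio of observed motif mutations to the motif's share of the background, and minimum mutation load as the excess over random expectation) are one commonly used formulation and are an assumption here; the text does not spell out the exact formulas, the background window, or the statistical test, so the `significant` flag is simply passed in rather than computed.

```python
def motif_enrichment(mut_motif, mut_base, ctx_motif, ctx_base):
    """Fold enrichment of mutations in a motif relative to its abundance.

    mut_motif: mutations of the focal base that fall in the motif
    mut_base:  all mutations of the focal base (any context)
    ctx_motif: occurrences of the motif in the background sequence
    ctx_base:  occurrences of the focal base in the background sequence
    """
    if min(mut_base, ctx_motif, ctx_base) <= 0:
        raise ValueError("mut_base, ctx_motif and ctx_base must be positive")
    return (mut_motif * ctx_base) / (mut_base * ctx_motif)


def minimum_mutation_load(mut_motif, enrichment, significant):
    """Mutations in excess of random expectation; 0 if not enriched."""
    if not significant or enrichment <= 1:
        return 0.0
    return mut_motif * (enrichment - 1) / enrichment


# Hypothetical counts, not taken from the study.
e = motif_enrichment(mut_motif=300, mut_base=500, ctx_motif=2000, ctx_base=10000)
load = minimum_mutation_load(300, e, significant=True)
```

Setting the load to zero for samples without significant enrichment mirrors the way a minimum estimate of 0 is reported for some samples later in the text.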

Fig 2. Analysis of the motif-specific mutation signatures in the genomes of skin cells.

Fig 2

The minimum mutation load for the (A) nCg➔nTg mutation signature, (B) yCn➔yTn mutation signature and (C) the total mutation load for the CC➔TT dinucleotide changes are plotted against the ages of the participants. The solid pink circles denote the mutation load in melanocytes. The black solid line is the linear regression, and the dotted curves are the 95% confidence intervals for each dataset. The source data for this figure is S4 Table.
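The regression lines in these panels correspond to an ordinary least-squares fit of mutation load against donor age. A minimal sketch is given below; the function name and the toy data are hypothetical, the data were chosen so that the slope comes out at 0.4 mutations per year purely for illustration, and the 95% confidence bands shown in the figure are not computed here.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical donor ages and motif-specific mutation loads.
ages = [25, 40, 55, 70]
loads = [30, 36, 42, 48]
slope, intercept = linear_fit(ages, loads)
```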

SBS1 mutations identified by SigProfilerExtractor (NMF-based deconvolution) and the minimum number of nCg➔nTg mutations analyzed by the knowledge-based pipeline correlate with each other. We saw a similar correlation between the mutations attributed to SBS7b by SigProfilerExtractor and the minimum number of yCn➔yTn mutations identified by the knowledge-based pipeline (S5 Fig). These data indicate that both methodologies perform similarly in the evaluation of mutation signatures.

Bulk exome sequencing reveals the presence of cancer drivers in the samples at less than 10% allele frequencies

Whole-exome sequencing up to 100X to 150X was also performed for the bulk samples from which 14 fibroblast clones and four melanocyte clones, respectively, were derived (S1 Table). Analysis of single nucleotide variants (SNVs) in the bulk samples and comparisons with the clonal lineages derived from the bulk cells revealed the presence of overlapping SNVs in bulk and the corresponding clones. Interestingly, we did not see any overlapping SNVs between bulk samples and clonal lineages that were not derived from the same bulk sample, even if they were coming from the same donor (S5 Table). This observation validates our mutation calling pipeline and provides support for the presence of the mutations detected in the clonal lineages in the original skin biopsies.
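The bulk-versus-clone comparison amounts to intersecting two sets of variant calls. The sketch below assumes each SNV is keyed by chromosome, position, reference allele and alternate allele; this keying, the function name, and the example variants are illustrative assumptions rather than the authors' implementation.

```python
def shared_variants(bulk_calls, clone_calls):
    """Return variants present in both call sets.

    Each variant is a (chrom, pos, ref, alt) tuple.
    """
    return set(bulk_calls) & set(clone_calls)


# Hypothetical call sets: one bulk sample, a clone derived from it,
# and a clone derived from a different bulk sample.
bulk_a = {("chr1", 1000, "C", "T"), ("chr2", 500, "G", "A")}
clone_from_a = {("chr1", 1000, "C", "T"), ("chr3", 42, "T", "C")}
clone_from_b = {("chr5", 77, "C", "T")}

matched = shared_variants(bulk_a, clone_from_a)
unmatched = shared_variants(bulk_a, clone_from_b)
```

An empty intersection for clones that do not originate from the bulk sample is the pattern described above.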

The somatic mutations identified in the bulk samples were predominantly at or less than 10% allele frequencies (S5 Table and S6 Fig). This observation was also found to hold true for the allele frequencies of the overlapping SNVs in bulk samples and their corresponding clonal lineages (S7 Fig). The low allele frequency in the population demonstrates the large amount of heterogeneity in the dermal and the epidermal tissue.

We also annotated all the SNVs in all whole-genome sequenced clones and whole-exome sequenced bulk tissues for functional effects. All non-synonymous SNVs, stop gains, and start or stop loss SNVs in the clones and bulk tissues were further analyzed using the Cancer Genome Interpreter [39] to determine if these were potentially cancer drivers. Of the 672 SNVs in the clones that had potentially functional impacts, 32 changes were in tumor driver genes and 13 changes were annotated as tumor drivers. One sample, DAG_H95, was found to have 3 tumor driver mutations; however, the donor does not have any history of cancer (S2 Table). We also detected 3190 SNVs in the bulk tissues that could alter protein sequence, of which 390 were within tumor driver genes, and 62 of the mutations were annotated as driver mutations. Interestingly, these driver mutations were also present at allele frequencies between 2% and 10% in the samples (S7 Fig). Overall, the results suggest that normal sun-shielded human skin carries a substantial proportion of cancer driver mutations, albeit at low allele frequencies.

Single base indels in homonucleotide repeats and deletions larger than 5 bases are ubiquitous in skin cells

We detected from 7 to 71 indels in the donors (Fig 3A and S6 Table). The insertions ranged from 1 base to 40 bases and deletions ranged from 1 to 171 bases (Fig 3B and S6 Table). The total number of indels per sample does not appear to increase statistically with the ages of the donors (Fig 3A). NMF-based deconvolution analysis of indel signatures or measuring the cosine similarities of indel patterns with the indel signatures currently annotated in cancers [13] demonstrated that two indel types were prevalent in our samples. The first was single base insertions or deletions in homopolymeric stretches, associated in our samples with ID1, ID2 and ID7 (S8 Fig and S7 and S8 Tables). Since many samples had very low numbers of indels, it is possible that mathematical deconvolution of indel signatures may carry errors. Therefore, instead of the number of mutations within each signature, we used the total number of single base insertions or deletions in homopolymeric stretches for further downstream analyses. These types of indels were also found to increase linearly with the ages of the donors (0.22 mutations per year), consistent with the idea that they were associated with polymerase slippage at the homopolymeric repeats [40,41] during ongoing DNA replication in fibroblasts over the donors’ lifetime (Fig 3C). The second class of indels was deletions spanning five nucleotides or more, many of which have microhomology of one or more bases at the deletion junction (S8 Fig and S8 Table). Based on cosine similarities, these indels were highly similar to ID8 (S8 Table), the indel signature associated with double-strand break repair via non-homologous end joining [13]. Consistently, NMF-based deconvolution of indel signatures applied to our samples identified these deletions of five or more bases as part of a novel signature, which is a composite signature made up of ID8-like indels as well as indels in homopolymeric repeats (Signature A in S8 Fig and S8 Table).
Such deletions spanning five or more nucleotides were identified in almost all samples and did not demonstrate a statistically significant increase with the ages of the donors. We also did not see any differences between the indel load or signatures in melanocytes versus the fibroblasts indicating that the processes yielding indels in both cell types are likely the same (Fig 3C).
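Microhomology at a deletion junction can be measured by checking how far the deleted segment matches the sequence flanking it. The sketch below is one simple way to do this and is an assumption of this illustration, not the method used by the indel caller or signature tools in the study; coordinates are 0-based and the example sequences are made up.

```python
def deletion_microhomology(ref, start, length):
    """Length of microhomology at a deletion junction.

    ref: reference sequence; start: 0-based index of the first deleted base;
    length: number of deleted bases. Counts bases at the start of the deleted
    segment that repeat immediately after it, plus bases at the end of the
    segment that repeat immediately before it.
    """
    end = start + length
    if start < 0 or length <= 0 or end > len(ref):
        raise ValueError("deletion lies outside the reference sequence")
    right = 0
    while (right < length and end + right < len(ref)
           and ref[start + right] == ref[end + right]):
        right += 1
    left = 0
    while (left < length - right and start - 1 - left >= 0
           and ref[end - 1 - left] == ref[start - 1 - left]):
        left += 1
    return left + right


# Hypothetical 7-base deletion "ACGGGTC" followed by "ACG...".
mh = deletion_microhomology("TTTACGGGTCACGAA", start=3, length=7)
is_long_deletion = 7 >= 5  # size class used in the text: five bases or more
```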

Fig 3. Analyses of indels in skin cells.

Fig 3

(A) The total number of indels identified in each sample plotted against the ages of the donors. The black dots denote fibroblast clones, while the pink dots denote the melanocyte clones. (B) The distribution of the lengths of the insertions and deletions detected in the clonal lineages. The source data for panels A and B is in S6 Table. (C) The number of insertions and deletions in homopolymeric repeats and the deletions spanning 5 bases or more plotted against the ages of the donors. The open circles denote fibroblast clones, while the filled in circles denote melanocyte clones. The source data for this figure are in S7 Table. (D) The number of insertions and deletions in homopolymeric repeats and the deletions spanning five bases or more plotted against the yCn➔yTn minimum mutation load in skin cells. The source data for this panel are in S4 and S6 Tables. In the graphs, the black solid line is the linear regression of the data, and the dotted black curves are the 95% confidence intervals.

The number of deletions spanning five or more nucleotides was also found to correlate in our samples with the UV-associated trinucleotide-centered yCn➔yTn mutation signature. We did not see a positive correlation between the UV-associated mutation signatures and single base indels in homopolymeric repeats (Fig 3D). These data indicate that, unlike indels at homopolymeric repeats, deletions of five or more bases in human skin cells have UV-induced DNA double strand breaks as their underlying etiology.

The majority of the insertions in skin cells are templated

The predominant insertions detected in our clonal lineages were templated single-base insertions (i.e. copied from the neighboring bases). Of the 186 single base insertions, 163 of the insertions were copied from the adjacent base (S9 Table). Such insertions most likely represent polymerase slippage events within homopolymeric runs of bases or erroneous Okazaki fragment maturation [40,41] and constitute the ID1, ID2 and ID3 indel signatures as mentioned above [13].

We also detected 28 instances of insertions larger than two bases in length. 18 of these larger insertions were a duplication of the neighboring residues. Three of these templated insertions carried small mismatches likely due to errors during copying of the neighboring residues (S9 Table). Such templated insertions along with deletions spanning five bases or more have been shown to be characteristic of non-homologous or microhomology-mediated end-joining of double-strand breaks [13,42,43]. As such, it is likely that repair of UV-induced double strand breaks also leads to insertions of two bases and more. However, the low numbers of such events do not allow statistical verification of this hypothesis.
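Whether an insertion is templated, in the sense of duplicating the neighboring reference bases, can be checked with a short comparison. The function below is a hypothetical sketch that requires an exact match to the bases immediately to the left or right of the insertion point; it would therefore not flag the templated insertions with small mismatches mentioned above.

```python
def is_templated_insertion(ref, pos, inserted):
    """True if the inserted bases duplicate the adjacent reference bases.

    ref: reference sequence; pos: 0-based index before which the bases are
    inserted; inserted: the inserted sequence.
    """
    if not inserted or not 0 <= pos <= len(ref):
        raise ValueError("empty insertion or position outside the reference")
    n = len(inserted)
    left = ref[max(0, pos - n):pos]   # bases immediately upstream
    right = ref[pos:pos + n]          # bases immediately downstream
    return inserted == left or inserted == right


# Hypothetical reference with a run of T bases.
reference = "ACGTTTTGCA"
single_base = is_templated_insertion(reference, 5, "T")
untemplated = is_templated_insertion(reference, 1, "G")
duplication = is_templated_insertion(reference, 10, "GCA")
```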

UV-induced mutation load varies by race and is not impacted by the sex of the donors

Our cohort included five African American or Black donors and 16 White donors, thus allowing us to also determine if the accumulation of somatic genome changes is different between the two races. The total base substitutions in samples from the White donors (median 1824), were higher than in the skin fibroblasts and melanocytes obtained from African American or Black donors (median 715, p-value = 0.00002193, calculated by two-tailed Mann Whitney test). We reasoned that this lower mutation load in the African American donors might reflect the protective effect of melanin in skin. Consistently, we did see a prominent presence of UV-associated yCn➔yTn changes in skin cells from White donors (median for White donors = 209). However, we did not detect statistically significant enrichment with this mutation type in skin cells from Black donors (minimum estimate of mutation load = 0, Fig 4A and S10 Table). The number of nCg➔nTg mutations, which are not associated with UV-lesions did not vary across the two categories of donors (median for White donors = 48, median for African American donors = 40). In addition, although we did not see any difference in the total number of indels or the number of indels in homopolymeric repeats in skin cells obtained from donors of either race, we found increased numbers of deletions spanning five bases or more in skin cells obtained from White donors (median = 9) as compared to the skin cells obtained from African American or Black donors (median = 2, P-value = 0.0002781, calculated by two tailed Mann Whitney test) (Fig 4B and S10 Table). In order to avoid skewing of the data due to differences in sequencing methodologies used for melanocytes and fibroblasts in our samples, we also calculated P-values for each of the cohorts using a two tailed Mann-Whitney test after excluding the data from the melanocytes. 
Even in this data set, we were clearly able to detect an increase in UV-induced mutations in White donors as compared to African American or Black donors (S10 Table). Overall, our data can be explained by melanin in skin providing strong protection against UV-associated somatic mutations in the form of both UV-signature base substitutions as well as deletions of five bases or more.
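For readers who want to see how a two-sided Mann-Whitney comparison of two donor groups works, the following sketch computes the U statistic with mid-ranks for ties and an approximate p-value from the normal approximation. This is a simplified illustration: the text does not state whether an exact or asymptotic test was used, the normal approximation is less reliable for small groups, and the example values are hypothetical rather than data from the study.

```python
import math


def mann_whitney_u(sample_x, sample_y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U for sample_x, approximate two-sided p-value). Ties receive
    mid-ranks and the variance is corrected for ties; no continuity
    correction is applied.
    """
    n_x, n_y = len(sample_x), len(sample_y)
    if n_x == 0 or n_y == 0:
        raise ValueError("both samples must be non-empty")
    pooled = sorted([(v, 0) for v in sample_x] + [(v, 1) for v in sample_y])
    n = n_x + n_y
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    mean_u = n_x * n_y / 2
    var_u = n_x * n_y / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    return u_x, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical mutation loads for two groups of clones.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
```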

Fig 4. Base substitutions and indels in African American and White donors.

Fig 4

(A) The total number of base substitutions, the nCg➔nTg minimum mutation load and the yCn➔yTn minimum mutation load in African American and White donors. Melanocytes are depicted as filled black circles. (B) The total number of indels, single nucleotide indels in homopolymeric repeats and deletions spanning five bases or more in the African American and White donors. Melanocytes are depicted as filled black circles. A two-sided Mann-Whitney U-test was performed to compare the mutation load across the two cohorts. * denotes a Bonferroni corrected P-value < 0.05. The source data for this figure is in S10 Table.

Our cohort also consists of eight men and 13 women. Analysis of mutation load based on sex did not demonstrate any differences between the skin cells obtained from the men or the women (S9 Fig and S10 Table).

Structural variant hotspots colocalize with common fragile sites

Structural variants were only analyzed for the fibroblast clonal lineages and the single melanocyte clonally grown lineage for which we were able to obtain sufficient cells for WGS without whole-genome amplification, since the genome amplification process can result in many false rearrangement calls. There were 120 structural variants in the 35 sequenced clonal lineages (from 1 to 14 in each isolate) (Fig 5A and S11 Table). The structural variants included deletions, duplications, and inversions ranging in size from 225 bp to 39 Mbp, as well as translocations. No age-dependent increase in structural variants was evident.

Fig 5. The structural variants identified in the genomes of skin cells.

Fig 5

(A) The number of structural variants in each donor plotted against the ages of the donors. The black inclined line denotes the linear regression of the data, while the dotted curves denote the 95% confidence intervals. (B) The number of structural variants that were or were not within hotspots and common fragile sites. A Fisher’s exact test was performed to determine if structural variants in hotspots were also preferentially present within common fragile sites. (C) The types of structural variants that overlap and do not overlap common fragile sites. A Chi-square test was performed to determine if the structural variant types within common fragile sites were different from those that did not overlap common fragile sites. The source data for this figure are in S11 Table.
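A two-sided Fisher's exact test on a 2x2 contingency table, as used for panel B, can be written directly from the hypergeometric distribution. The implementation below is a generic sketch with a hypothetical function name; it is not the authors' code, and the example table is a textbook-style toy rather than the counts from this figure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(k) for k in range(k_min, k_max + 1)
            if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Toy table, not data from the study.
p_toy = fisher_exact_two_sided(8, 2, 1, 5)
```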

We identified genomic regions that are hotspots for chromosomal breakage and structural variation. Two or more rearrangements were denoted as part of a hotspot if they were less than or equal to 1 Mbp apart and were present in different samples. Of the 120 structural variants identified, 55 rearrangements were within hotspots. Previously, we showed that structural variants identified in skin fibroblasts of two donors were often in the vicinities of common fragile sites (CFSs) [8]. To determine if the structural variants in this larger data set also often colocalize with CFSs, we identified those deletions, duplications and inversions that intersect common fragile sites within the HumCFS database [44]. We also classified translocations as colocalizing with fragile sites if their breakpoints were within 10 kb of a CFS. In total, 63 rearrangements were found to colocalize with CFSs. Of the rearrangements within CFSs, 18 were on chromosome 7, and 14 of these were within FRA7J, implying that this fragile site is expressed more prominently in fibroblasts than the other fragile sites, leading to higher levels of replication stalling and gross chromosomal breakage. Moreover, the majority of the rearrangements within hotspots also colocalized with CFSs, while the majority of rearrangements that were not within hotspots were scattered across the genome (Fig 5B and S11 Table). Interestingly, we did not see any difference in the types of structural variants between those that overlapped CFSs and those that did not (Fig 5C). We also examined the use of microhomology at the breakpoints to assess a role for microhomology-mediated repair of DSBs at CFSs. Of the 120 rearrangements, only 15 contained microhomology at the breakpoints (6 overlapping CFSs and 9 not overlapping CFSs). These regions of microhomology were small, ranging from 2 to 3 bases. A Fisher’s exact test demonstrated no significant bias in the use of microhomology between the variants that overlapped CFSs and those that did not (P-value 0.41) (S11 Table).
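The hotspot rule described above (two or more rearrangements no more than 1 Mbp apart and present in different samples) can be sketched in Python. This is a minimal illustration and not the pipeline used in this study: the `(sample, chrom, pos)` record layout and the function name `find_hotspot_variants` are hypothetical, and each rearrangement is reduced to a single breakpoint coordinate for simplicity.

```python
from collections import defaultdict

HOTSPOT_WINDOW_BP = 1_000_000  # "less than or equal to 1 Mbp apart"


def find_hotspot_variants(variants, window=HOTSPOT_WINDOW_BP):
    """Return the indices of variants that lie within `window` bp of a
    variant from a different sample on the same chromosome.

    `variants` is a list of (sample, chrom, pos) tuples (hypothetical layout).
    """
    by_chrom = defaultdict(list)
    for idx, (sample, chrom, pos) in enumerate(variants):
        by_chrom[chrom].append((pos, sample, idx))

    in_hotspot = set()
    for records in by_chrom.values():
        records.sort()
        for i, (pos_i, sample_i, idx_i) in enumerate(records):
            # Records are sorted by position, so stop at the first one
            # that falls outside the window.
            for pos_j, sample_j, idx_j in records[i + 1:]:
                if pos_j - pos_i > window:
                    break
                if sample_j != sample_i:
                    in_hotspot.update((idx_i, idx_j))
    return in_hotspot


example = [
    ("donor1", "chr7", 110_000_000),
    ("donor2", "chr7", 110_600_000),  # within 1 Mbp of the donor1 variant
    ("donor2", "chr7", 150_000_000),  # isolated
    ("donor3", "chr1", 5_000_000),
    ("donor3", "chr1", 5_200_000),    # same sample, so not a hotspot
]
print(sorted(find_hotspot_variants(example)))  # prints [0, 1]
```

Requiring a different sample for each pair keeps two nearby breakpoints from a single complex event in one clone from being counted as a recurrent hotspot.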

Overall, we hypothesize that replication-associated difficulties at CFSs are responsible for the generation of rearrangement hotspots in healthy human cells.

Discussion

In this study, we revealed and accurately measured the load of the major types of somatic genome changes in human skin. We grew single cell-derived clonal lineages from human skin fibroblasts and melanocytes. Whole genome sequencing of these samples allowed us to detect, with high accuracy, the somatic genome changes that were present in the original single cells. Moreover, the methodology provides sufficient DNA for orthogonal validation of the changes, allowing us to apply the most stringent criteria for identifying different kinds of genome changes without losing sensitivity.

Our work provides the range of normal somatic genome changes in human skin cells in donors across a wide range of ages and of different races. We demonstrate that each skin cell carries 402 to 14029 base substitutions, 7 to 71 indels and 1 to 14 structural variants. The mutation burden in healthy skin cells was also similar to the median mutation load in cancers [45]. Interestingly, we identified various cancer driver mutations in the clones as well as in the bulk tissue samples, although these driver mutations were present at low allele frequencies in the bulk samples. This observation echoes previous findings that normal tissue often contains cells with driver mutations [11,32,37,38,46,47].

Analysis of mutation signatures in the clonal lineages allowed us to differentiate between endogenous DNA damage-induced mutations, replication-associated errors and environmental DNA damage-induced genome changes. C➔T changes at CpG motifs as well as single nucleotide insertions and deletions were found to increase with the ages of the donors and were indicative of endogenous mutational processes and replication errors, respectively. In addition, UV-induced base substitution signatures were prominent in many samples even though they were obtained from sun-shielded skin. UV-induced DNA damage can also lead to the formation of double strand breaks in the genome. We showed here that deletions spanning 5 or more nucleotides, with or without microhomologies at the junctions, strongly correlated with the UV-induced base substitution signature. Previously, this indel signature (ID8) has been identified in a wide variety of cancers, and likely represents repair of double strand breaks via non-homologous end joining (NHEJ) pathways [13]. As such, we hypothesize that ID8-like indels are characteristic of UV damage in human cells. Since we also detect deletions with limited microhomologies at the junctions, it is possible that in addition to canonical NHEJ, microhomology-mediated end joining (MMEJ) or polymerase theta-mediated end joining (TMEJ) [48,49] may also participate in the repair of UV-induced DSBs in skin cells. In addition to deletions, we also detected a few instances of long insertions in our samples, often templated from the flanking sequences. Such locally templated insertions are highly characteristic of TMEJ and are likely formed by Polθ-dependent synthesis wherein one resected DSB end uses the second resected DSB end as a template for synthesis [42]. Overall, our data indicate that ID8-like indels along with a small number of templated insertion events accumulate in skin cells due to UV-induced DNA damage and error-prone repair via NHEJ or TMEJ.

Interestingly, although non-UV mutations (nCg➔nTg and indels at homopolymeric repeats) increased with the ages of the participants, we did not see a similar correlation between UV-exposure-induced mutations and the ages of the donors. Because we measured mutation load in sun-shielded skin cells, even intermittent UV-exposure arising from the clothing and lifestyle choices of the participants during their lifetimes is likely to cause UV-induced DNA damage and to affect the lifetime accumulation of UV-induced mutation load in hip-derived cells. Thus, the absence of a correlation between the age of the donors and the UV-induced mutation load might be due to differences in the cumulative UV-exposure across the lifetimes of different donors.

DNA double strand breaks can be channeled into repair via two major pathways, HR or NHEJ. One major factor that determines the choice of the repair pathway in cells is the cell cycle stage. Cells in the S or early G2 phases of the cell cycle predominantly repair DSBs via HR, while NHEJ events peak in the G1 or late G2 phases [50,51]. Since HR is mostly error-free, we would not be able to detect HR activity that may have occurred in skin cells. Nonetheless, the prominent presence of UV-associated NHEJ- or TMEJ-generated indels in human skin further indicates that the majority of UV-associated damage and mutagenesis accrues in quiescent, non-dividing cells.

We also demonstrated here that UV-associated mutation load is decreased in skin cells from African American donors as compared to White donors. We hypothesize that this effect may be due to the protective effects of melanin against UV-induced DNA damage. The lower rates of skin cancer in African Americans are in agreement with this decreased mutation load. While skin cancer accounts for up to 35–45% of all cancers in Caucasians [52], it only accounts for 1–2% of the neoplasms in African Americans [53–55]. Moreover, the impact of UV-exposure as a risk factor for skin cancers is decreased in African Americans as compared to Caucasians [55,56]. These observations imply that the lowered mutation burden due to UV-radiation is indicative of the lower risk of UV-induced skin cancer in African Americans.

In addition to small indels, we also detected large structural variant hotspots in our samples that often coincided with CFSs. For example, rearrangements in FRA7J were found in 14 different donors, making it the most common hotspot in our samples (S11 Table). Recurrent breakage at this fragile site has been implicated in the Williams-Beuren syndrome, and this region contains the genes LIMK1, EIF4H (WBSCR1) and AUTS2 as well as the tumor suppressor gene FZD9 [57,58]. One explanation for the large number of rearrangements found at a single fragile site is that the genes within this fragile site are preferentially expressed in fibroblasts, which may cause transcription-replication collisions that often lead to breakage and rearrangements. Alternatively, the replication timing within the fragile locus may be delayed, leading to unfinished replication and fragility. Tissue-specific expression and alteration in replication timing at fragile sites have been observed previously in cultured cells [59–61]. As such, we surmise that fibroblast-specific replication-associated difficulties at common fragile sites lead to the formation of rearrangement hotspots in normal skin.

Overall, our work provides an accurate and comprehensive catalog of the somatic genome changes attributable to different DNA damaging processes that act upon human skin cells over the lifetime of the individuals. Our analysis uniquely identifies and measures the impacts of endogenously operating DNA damage, DNA replication errors as well as environmental DNA damage on the somatic mutation load and profiles in each single cell-derived lineage. Finally, we provide the reference for the burden, types and etiologies underlying somatic genome instability in cells of healthy human skin which is required for defining disease level of somatic genome instability.

Materials and methods

Ethics statement

Written consent was obtained from all participants in the Environmental Polymorphisms Registry (registered with ClinicalTrials.gov, NCT00341237, and approved by the NIH Institutional Review Board, protocol 04-E-0053). Each participant provided their age, sex and self-identified race.

Sample collection and processing

4 mm punch skin biopsies were collected from donors’ hips. Samples were collected from healthy cancer-free skin. After overnight incubation of the biopsies at 4°C in 2.66 units/ml dispase (Roche) and 50 μg/ml gentamycin (Sigma Aldrich), the epidermis and dermis were separated. The epidermis was emulsified and plated in a six-well cell culture dish in the DermaLife Ma Melanocyte Medium Complete Kit (Lifeline Cell Technology) supplemented with 100 μg/ml primocin (Invitrogen). Melanocytes were identified based on their dendritic shape and ability to grow adhered to the dish in serum-free media. The dermis from each biopsy was divided into six to eight pieces, which were then allowed to adhere to a six-well cell culture dish and were grown in Dulbecco’s modified Eagle’s medium (Gibco) supplemented with 1X non-essential amino acids (Hyclone), 10% Cosmic Calf Serum (Hyclone), 10% AmnioMax C-100 supplement (Gibco) and 100 μg/ml primocin. Fibroblasts were identified as adherent cells elongated in shape that grew from the dermis pieces. All cultures were incubated at 37°C in an incubator containing 5% carbon dioxide. A portion of the bulk cultures of both fibroblasts and melanocytes was harvested for genomic DNA, and another portion was diluted and plated to obtain single cell-derived clones. Fibroblast clones were expanded in culture for 5 to 6 additional passages (4–6 weeks) to obtain ~10^6 cells, and genomic DNA was extracted. Genomic DNA extraction from all samples was performed with the DNeasy Blood and Tissue kit (Qiagen). Melanocyte clones were expanded in culture for 2 to 3 passages to obtain 10,000 cells, and genomic DNA was extracted. 1 to 2.5 ng of the melanocyte genomic DNA was treated with USER (NEB) to remove deaminated cytosines, which are an artifact of DNA extraction [62]. The DNA was amplified using the REPLI-g Mini Kit (Qiagen). 12 to 14 different primer sets were used to PCR-amplify random loci, ranging from 100 bp to 500 bp, at different chromosomal positions from the amplified genomic DNA. Samples with 10 or more reactions yielding the correct amplification product were subsequently purified and used for whole-genome sequencing. This quality check allowed us to sequence only genomic DNA with uniform amplification. Venous blood was collected in three to five 8.5 mL PAXgene blood DNA tubes (PreAnalytiX/Qiagen) and DNA was isolated from whole blood samples. For 38 DNA samples, libraries were prepared using the Truseq DNA PCR-free 350bp insert kit (Illumina) and were subsequently sequenced using Illumina HiseqX. For the 30 remaining samples, libraries were prepared using the Nextera DNA Flex library Prep kit (Illumina) and sequenced using the NovaSeq 6000 system. All samples were sequenced as 150 base-paired reads to a depth of 50X to 132X. For a subset of donors, additional skin biopsies were obtained for establishing a second clonal fibroblast lineage.

We also analyzed whole-genome sequenced clonal hip fibroblasts (D1-L-H, D1-R-H1, D1-R-H2, D2-L-H and D2-R-H) from 2 donors that were obtained in a previous study [8].

Calling somatic genome changes in sequenced clones

The FASTQ reads for each clone and blood sample were aligned to the hg19 genome using the GATK best practices pipeline [63]. Three base substitution callers, SomaticSniper [64], VarScan2 [65,66] and Mutect2 [67], were used to identify the clone-specific mutation calls that were not present in the blood of the same donors. Only base substitutions detected by all three callers were analyzed further. Any somatic mutations that were also present in the dbSNP138 database, and any SNVs that overlapped the SimpleRepeats track in the UCSC Genome Browser, were removed. The final mutation calls were filtered based on allele frequencies, such that only heterozygous mutations with allele frequencies between 45% and 55% or homozygous mutations with allele frequencies greater than 90% were kept. This methodology of using three independent mutation callers and stringent filtering criteria was used previously for accurate measurements of somatic mutations in human fibroblasts and has demonstrated very high accuracy by orthogonal validations [8]. For bulk whole exome sequencing, Mutect2 was used to call mutations. SNVs that overlapped the SimpleRepeats track or were in the dbSNP138 database were removed. The mutation calls for both whole exome and whole-genome sequencing have been organized as MAF files in the TCGA format and have been submitted to dbGAP study phs001182.v2.p1. Somatic structural variants within 1Mb of each other in different donors were marked as being within a “hotspot”.
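The consensus and allele-frequency filters described above can be illustrated with a short Python sketch. It assumes that each caller's output has already been parsed into `(chrom, pos, ref, alt)` tuples and that an allele-frequency lookup is available; those inputs, the function names and the `excluded` set (standing in for dbSNP138 and SimpleRepeats hits) are hypothetical and do not correspond to the actual pipeline code.

```python
def consensus_calls(*caller_outputs):
    """Keep only variants reported by every caller (set intersection)."""
    sets = [set(calls) for calls in caller_outputs]
    return set.intersection(*sets)


def passes_allele_frequency(af):
    """Heterozygous window of 45-55% or homozygous above 90%, as in the text."""
    return 0.45 <= af <= 0.55 or af > 0.90


def filter_clonal_snvs(caller_outputs, allele_freq, excluded=frozenset()):
    """Apply the consensus, exclusion-list and allele-frequency filters in turn."""
    kept = []
    for variant in sorted(consensus_calls(*caller_outputs)):
        if variant in excluded:
            continue  # e.g. present in dbSNP138 or overlapping SimpleRepeats
        if passes_allele_frequency(allele_freq[variant]):
            kept.append(variant)
    return kept


v1 = ("chr1", 1000, "C", "T")
v2 = ("chr2", 2000, "C", "T")
v3 = ("chr3", 3000, "G", "A")
v4 = ("chr4", 4000, "T", "C")
v5 = ("chr5", 5000, "A", "G")  # reported by a single caller only

somaticsniper = [v1, v2, v3, v4]
varscan2 = [v1, v2, v3, v4]
mutect2 = [v1, v2, v3, v4, v5]
allele_freq = {v1: 0.50, v2: 0.30, v3: 0.97, v4: 0.48}

kept = filter_clonal_snvs(
    [somaticsniper, varscan2, mutect2], allele_freq, excluded={v4}
)
print(kept)  # v1 (heterozygous) and v3 (homozygous) remain
```

In this toy example, `v2` fails the allele-frequency window, `v4` is dropped by the exclusion list and `v5` never reaches the later filters because only one caller reported it.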

Delly was used to identify structural variants in the form of deletions, duplications, inversions and translocations [8,68]. Calls that were designated “LowQual” and/or “IMPRECISE” were removed. Clonality of structural variants was determined based on the allelic fraction of reads supporting the variants in the clone. Structural variants with 30% or more reads supporting the structural variant and the absence of any reads supporting the variants in blood were denoted as clonal somatic changes. Due to the low number of variants, we cannot rule out that some of the structural variants may have been generated by a rearrangement during the first few cell divisions of the founder cell in culture. However, we think this is unlikely as these cells were passaged fewer than 3 times before the generation of a clonal lineage. Moreover, the number and types of variants are similar to those detected in previous studies [69–71], indicating that the structural variants detected in our work were likely present in founder cells.
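As a minimal sketch of the clonality rule above (at least 30% of reads in the clone supporting the variant and no supporting reads in blood), the check could be written as follows in Python. The function name and the read-count arguments are hypothetical; how supporting reads are counted from Delly output is not shown here.

```python
def is_clonal_somatic_sv(clone_support_reads, clone_total_reads,
                         blood_support_reads, min_fraction=0.30):
    """Return True when the variant meets the 30% clone-support threshold
    and has no supporting reads in the matched blood sample."""
    if clone_total_reads <= 0:
        return False  # no coverage at the breakpoint, cannot be assessed
    fraction = clone_support_reads / clone_total_reads
    return fraction >= min_fraction and blood_support_reads == 0


print(is_clonal_somatic_sv(15, 40, 0))  # prints True (37.5% support)
print(is_clonal_somatic_sv(5, 40, 0))   # prints False (12.5% support)
print(is_clonal_somatic_sv(20, 40, 1))  # prints False (read support in blood)
```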

Indels were detected using the tool SV-ABA [72] and were filtered based on multiple criteria. Indel calls with a quality score less than 50 were removed, and the remaining calls were only included if they occurred at 45%-55% allele frequency (heterozygous indel) or at 90%-100% allele frequency (homozygous indel). Indels that overlapped with the SimpleRepeats or the RepeatMasker tracks were removed as these calls were often found to be erroneous. A subset of the indels was visually verified by inspection of the alignments using the Integrative Genomics Viewer [73]. 46 indels were orthogonally verified via PCR amplification and Sanger sequencing.

Analyzing base substitution and indel signatures in clones

We used the SigProfilerMatrixGenerator [74] to identify the different types of indels in our samples. SigProfilerExtractor [13] was used to deconvolute the single base substitution, dinucleotide base substitution and indel signatures in our samples. 9 processes with 10 iterations were used within SigProfilerExtractor for extraction of indel and base substitution signatures. MutationalPatterns [75] was used to both identify the cosine similarities between the mutation patterns in samples in this study and the signatures identified in COSMIC and to also identify the contributions of COSMIC signatures in our samples. For base substitutions, the function fit_to_signatures() within MutationalPatterns was used to identify the contributions of the COSMIC signatures on the mutation profile of each sample. For indels, the function fit_to_signatures() was modified to allow the matrix to have 83 rows instead of 96 so that the indels in our samples could be compared to the known ID signatures (83 channels) in COSMIC.
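The cosine similarity used to compare a sample's mutation profile with a reference signature can be sketched in plain Python. This illustrates the metric only and does not reproduce the MutationalPatterns implementation; the function name is hypothetical, and the two vectors are assumed to share the same channel order (96 channels for base substitutions, 83 for indels).

```python
import math


def cosine_similarity(profile, signature):
    """Cosine of the angle between two equally long count or probability vectors."""
    if len(profile) != len(signature):
        raise ValueError("profile and signature must have the same number of channels")
    dot = sum(p * s for p, s in zip(profile, signature))
    norm_p = math.sqrt(sum(p * p for p in profile))
    norm_s = math.sqrt(sum(s * s for s in signature))
    if norm_p == 0 or norm_s == 0:
        return 0.0  # an all-zero vector has no defined direction
    return dot / (norm_p * norm_s)


# Toy three-channel vectors; real profiles would have 96 or 83 channels.
print(cosine_similarity([1, 0, 2], [2, 0, 4]))  # same direction, close to 1
print(cosine_similarity([1, 0, 0], [0, 1, 0]))  # no shared channels, 0.0
```

Because the measure depends only on the direction of the vectors, a sample with few mutations and a sample with many mutations can be compared with the same signature without normalizing the counts first.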

The enrichment of mutation signatures in each of the samples analyzed in this study was calculated as described in [8,14,35]. For this calculation, context is defined as the +/- 20 bases surrounding the mutated base. The mutated residue is capitalized in the annotation of the signature and the equation to calculate enrichment of a given mutation signature is provided below with the UV-mutation signature yCn➔yTn as an example.

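$$\mathrm{Enrichment}(yCn \rightarrow yTn) = \frac{\mathrm{Mutations}_{yCn \rightarrow yTn} \times \mathrm{Context}_{c}}{\mathrm{Mutations}_{C \rightarrow T} \times \mathrm{Context}_{ycn}}$$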

For each motif, the reverse complement was also taken into account in the calculations. Mutations <10 bases apart were excluded from this calculation as these are “complex” mutations that likely arise due to the activity of translesion polymerases and may confound the analysis of the mutation signatures. To determine if increased fold enrichments for the mutation signatures were statistically significant, a Fisher’s exact test was performed in which the number of mutations within the trinucleotide motif (Mutations_yCn→yTn) and the number that did not conform to the trinucleotide motif (Mutations_C→T) were compared with the number of unmutated bases in the context that were within the trinucleotide motif (Context_ycn) and those that were not (Context_c). Multiple hypothesis testing was accounted for by correcting the P-values via the Benjamini-Hochberg method. For samples where enrichment > 1 and the corrected P-value < 0.05, the Minimum Mutation load was calculated for the enriched signature. The equation for calculating this is provided below for the yCn➔yTn mutation signature.

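$$\mathrm{MinimumMutLoad}(yCn \rightarrow yTn) = \mathrm{Mutations}_{yCn \rightarrow yTn} \times \frac{\mathrm{Enrichment}_{yCn \rightarrow yTn} - 1}{\mathrm{Enrichment}_{yCn \rightarrow yTn}}$$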
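The enrichment and Minimum Mutation load calculations above, together with the Benjamini-Hochberg correction, can be sketched in Python. This is an illustrative sketch under stated assumptions and not the P-MACD code: the function names and the toy counts are hypothetical, and the Fisher's exact test is not implemented here, so P-values are taken as given inputs.

```python
def enrichment(mut_motif, mut_base, ctx_motif, ctx_base):
    """(Mutations_yCn->yTn x Context_c) / (Mutations_C->T x Context_ycn)."""
    if mut_base == 0 or ctx_motif == 0:
        raise ValueError("denominator counts must be non-zero")
    return (mut_motif * ctx_base) / (mut_base * ctx_motif)


def minimum_mutation_load(mut_motif, enrich):
    """Mutations_yCn->yTn x (Enrichment - 1) / Enrichment."""
    return mut_motif * (enrich - 1) / enrich


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts: 300 of 400 C->T mutations fall in the motif, while the motif
# makes up 5,000 of the 10,000 cytosines in the surrounding context.
e = enrichment(mut_motif=300, mut_base=400, ctx_motif=5000, ctx_base=10000)
print(e)                              # prints 1.5
print(minimum_mutation_load(300, e))  # prints 100.0
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

In the toy example, an enrichment of 1.5 means that one third of the 300 motif mutations exceed what would be expected from the motif's share of the context, which gives the minimum load of 100.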

Analysis of bulk DNA samples

We also sequenced the exomes of 15 fibroblast bulk samples and 4 melanocyte bulk samples directly cultured from the biopsies. Bulk samples are defined as cells that were not propagated clonally. Libraries were prepared using the Nextera Flex for Enrichment library prep kit, Illumina Exome Panel (Illumina) and IDT for Illumina UD Indexes (Illumina). All samples were sequenced using the NovaSeq 6000 system to approximately 150X depth. Somatic mutations were called in the samples by using Mutect2. Whole-genome-sequenced blood samples from the donor corresponding to each bulk sample were used as a proxy for germline mutations. Only single nucleotide variants called by Mutect2 were further analyzed. Any mutations that were within the dbSNP138 database or in the SimpleRepeats track were removed.

Annotation of SNVs

We used Annovar [76] to annotate SNVs for changes to protein sequence using the refGene track from UCSC Genome Browser. Nonsynonymous SNVs or SNVs affecting start or stop codons and splice sites were further annotated using the Cancer Genome Interpreter [39] as driver mutations or passenger mutations.

Supporting information

S1 Fig. The distribution of the allele frequencies of the whole genome sequenced clones in this study.

The plots for melanocyte clones are in red. The source data for this figure is in S2 Table.

(TIF)

S2 Fig

(A) The mutation spectra in each sequenced clone in this study. The melanocyte clones are marked with an “M” in the X-axis. The source data for this figure are in S2 Table. (B) The NMF-derived mutation signature loads as determined by SigProfilerExtractor in each clone sequenced in this study. Samples from African American donors are annotated with an “A” and melanocyte clones are marked with an “M” in the X-axis. The source data for this figure are in S3 Table.

(TIF)

S3 Fig. The NMF-derived single base substitution and double base substitution signatures identified in human skin cells.

Total mutations corresponding to the signature in the cohort as determined by SigProfilerExtractor are shown.

(TIF)

S4 Fig. The correlation of the different UV-specific mutation signatures in the samples in this study.

The total nTt➔nCt mutations and the CC➔TT mutations are plotted against the yCn➔yTn total mutation load in the samples. The black inclined line denotes the linear regression of the data, and the dotted black lines denote the 95% confidence intervals. The source data for this figure are in S4 Table.

(TIF)

S5 Fig. Comparison of the NMF-derived mutation signatures and the trinucleotide-specific mutation signatures in this study.

The nCg➔nTg minimum mutation load in each sample is plotted against SBS1-associated mutations as determined by SigProfilerExtractor, and the yCn➔yTn minimum mutation load in each sample is plotted against SBS7b-associated mutations. The linear regression of the data is shown, and the dotted lines denote the 95% confidence intervals. The source data for this figure are in S3 and S4 Tables.

(TIF)

S6 Fig. The distribution of the allele frequencies of the whole exome sequenced bulk samples in this study.

The source data for this figure is in S5 Table.

(TIF)

S7 Fig. The distribution of the allele frequencies of consensus alleles and cancer drivers.

(A) The allele frequencies of the consensus SNVs identified in the bulk and the corresponding clones are shown. (B) The allele frequency distribution of the cancer driver mutations identified in the exome of the bulk samples. The source data for this figure is in S5 Table.

(TIF)

S8 Fig. The NMF-derived indel mutation signatures in this study.

Total mutations corresponding to the signature as determined by SigProfilerExtractor are shown.

(TIF)

S9 Fig. The analyses of the impact of sex on mutation and indel load in the samples.

The total base substitutions and the total indels in the clonal lineages derived from males and females in this study are shown. A Mann-Whitney U-test was used to determine if the distributions of mutation and indel load were statistically different between the two cohorts. The P-value for the base substitutions was 0.4041, while the P-value for the indels was 0.9401. The source data for this figure is in S10 Table.

(TIF)

S1 Table. Coverage statistics and donor characteristics for all samples sequenced in this study.

(XLSX)

S2 Table. The somatic base substitutions in the whole genome sequenced single skin cell clonal lineages.

a) The fraction of all SNVs prior to filtering that correspond to each allele frequency bin. SNVs that corresponded to allele frequencies between 45% and 55% or above 90% were considered clonal. b) The exonic somatic base substitutions in the samples. c) The mutation spectra in the samples. The reverse complements are considered in the mutation spectra analyses.

(XLSX)

S3 Table. Agnostic base substitution signature analyses.

a) The contributions of previously determined mutation signatures using MutationalPatterns. b) The number of mutations corresponding to each signature identified by SigProfilerExtractor

(XLSX)

S4 Table. Motif-specific mutation signature analyses.

a) nCg➔nTg mutation signature analysis. b) yCn➔yTn mutation signature analysis. c) nTt➔nCt mutation signature analysis. d) CC➔TT total dinucleotide substitutions.

(XLSX)

S5 Table. The somatic mutation list for whole exome sequenced bulk tissues.

(XLSX)

S6 Table. Somatic indels identified in the samples.

(XLSX)

S7 Table. The distribution of the types of indels in each sample as identified by SigProfilerMatrixGenerator.

(XLSX)

S8 Table. Agnostic indel signature analysis.

a) The contributions of previously determined mutation signatures using MutationalPatterns. b) The number of mutations corresponding to each signature identified by SigProfilerExtractor

(XLSX)

S9 Table. The somatic templated insertions identified in human skin fibroblasts and melanocytes.

(XLSX)

S10 Table. The somatic genome changes in the samples along with the cell type, and the sex and race of the donors.

a) The total number of somatic genome changes in the samples along with the cell type, and the sex and race of the donors. b) The median values and P-values for Mann-Whitney tests for pairwise comparisons of the somatic genome changes between the sets of clones obtained from White and African American donors.

(XLSX)

S11 Table. The somatic structural variants identified in the donors.

a) All the somatic structural variants annotated for hotspots, common fragile sites and microhomologies. b) A Fisher's exact test for the use of microhomologies in the SVs that are present within or out of common fragile sites.

(XLSX)

Acknowledgments

We are thankful to Drs Natasha Degtyareva, Kathleen Hudson, Scott Lujan and Sriram Vijayraghavan for critically reading this manuscript and providing their feedback.

Data Availability

All BAM and MAF files are available under controlled access from the dbGaP database (phs001182.v2.p1 https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001182.v2.p1). All other data, including the underlying numerical data for all graphs and summary statistics, are in Supplementary Tables. The R-code for analysis of the trinucleotide-specific mutation signatures can be accessed via https://github.com/NIEHS/P-MACD.

Funding Statement

This work was supported by the US National Institutes of Health Intramural Research Program Project Z1AES103266 to D.A.G. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Tubbs A, Nussenzweig A. Endogenous DNA Damage as a Source of Genomic Instability in Cancer. Cell. 2017;168(4). 10.1016/j.cell.2017.01.002 WOS:000396277600015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Lindahl T, Barnes DE. Repair of endogenous DNA damage. Cold Spring Harb Sym. 2000;65:127–33. 10.1101/sqb.2000.65.127 WOS:000169676800014. [DOI] [PubMed] [Google Scholar]
  • 3.Lodato MA, Woodworth MB, Lee S, Evrony GD, Mehta BK, Karger A, et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science. 2015;350(6256):94–8. Epub 2015/10/03. 10.1126/science.aab1785 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Rouhani FJ, Nik-Zainal S, Wuster A, Li Y, Conte N, Koike-Yusa H, et al. Mutational History of a Human Cell Lineage from Somatic to Induced Pluripotent Stem Cells. PLoS Genet. 2016;12(4):e1005932 10.1371/journal.pgen.1005932 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Abyzov A, Tomasini L, Zhou B, Vasmatzis N, Coppola G, Amenduni M, et al. One thousand somatic SNVs per skin fibroblast cell set baseline of mosaic mutational load with patterns that suggest proliferative origin. Genome Res. 2017;27(4):512–23. Epub 2017/02/24. 10.1101/gr.215517.116 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.D'Antonio M, Benaglio P, Jakubosky D, Greenwald WW, Matsui H, Donovan MKR, et al. Insights into the Mutational Burden of Human Induced Pluripotent Stem Cells from an Integrative Multi-Omics Approach. Cell Rep. 2018;24(4):883–94. Epub 2018/07/26. 10.1016/j.celrep.2018.06.091 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Franco I, Helgadottir HT, Moggio A, Larsson M, Vrtačnik P, Johansson A, et al. Whole genome DNA sequencing provides an atlas of somatic mutagenesis in healthy human cells and identifies a tumor-prone cell type. Genome Biol. 2019;20(1):285–. 10.1186/s13059-019-1892-z . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Saini N, Roberts SA, Klimczak LJ, Chan K, Grimm SA, Dai S, et al. The Impact of Environmental and Endogenous Damage on Somatic Mutation Load in Human Skin Fibroblasts. PLoS genetics. 2016;12(10):e1006385–e. 10.1371/journal.pgen.1006385 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Bae T, Tomasini L, Mariani J, Zhou B, Roychowdhury T, Franjic D, et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science. 2018;359(6375):550–5. Epub 2017/12/09. 10.1126/science.aan8690 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Blokzijl F, de Ligt J, Jager M, Sasselli V, Roerink S, Sasaki N, et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature. 2016;538(7624):260–4. Epub 2016/10/05. 10.1038/nature19768 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Brunner SF, Roberts ND, Wylie LA, Moore L, Aitken SJ, Davies SE, et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature. 2019;574(7779):538–42. 10.1038/s41586-019-1670-9 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Alexandrov LB, Nik-Zainal S, Wedge DC, Aparicio SA, Behjati S, Biankin AV, et al. Signatures of mutational processes in human cancer. Nature. 2013;500(7463):415–21. 10.1038/nature12477 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, et al. The repertoire of mutational signatures in human cancer. Nature. 2020;578(7793):94–101. Epub 2020/02/07. 10.1038/s41586-020-1943-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Roberts SA, Lawrence MS, Klimczak LJ, Grimm SA, Fargo D, Stojanov P, et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat Genet. 2013;45(9):970–6. Epub 2013/07/16. 10.1038/ng.2702 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Roberts SA, Gordenin DA. Hypermutation in human cancer genomes: footprints and mechanisms. Nature reviews Cancer. 2014;14(12):786–800. Epub 2014/11/25. 10.1038/nrc3816 . [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Tomasetti C, Vogelstein B, Parmigiani G. Half or more of the somatic mutations in cancers of self-renewing tissues originate prior to tumor initiation. Proc Natl Acad Sci U S A. 2013;110(6):1999–2004. 10.1073/pnas.1221068110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Kaufmann WK, Cleaver JE. Mechanisms of inhibition of DNA replication by ultraviolet light in normal human and xeroderma pigmentosum fibroblasts. J Mol Biol. 1981;149(2):171–87. Epub 1981/06/25. 10.1016/0022-2836(81)90297-7 . [DOI] [PubMed] [Google Scholar]
  • 18.Rudolph CJ, Upton AL, Lloyd RG. Replication fork stalling and cell cycle arrest in UV-irradiated Escherichia coli. Genes Dev. 2007;21(6):668–81. Epub 2007/03/21. 10.1101/gad.417607 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Sugiyama T, Chen Y. Biochemical reconstitution of UV-induced mutational processes. Nucleic Acids Res. 2019;47(13):6769–82. Epub 2019/05/06. 10.1093/nar/gkz335 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Yu SL, Johnson RE, Prakash S, Prakash L. Requirement of DNA polymerase eta for error-free bypass of UV-induced CC and TC photoproducts. Mol Cell Biol. 2001;21(1):185–8. Epub 2000/12/13. doi: 10.1128/MCB.21.1.185-188.2001
  • 21. Stary A, Kannouche P, Lehmann AR, Sarasin A. Role of DNA polymerase eta in the UV mutation spectrum in human cells. J Biol Chem. 2003;278(21):18767–75. Epub 2003/03/20. doi: 10.1074/jbc.M211838200
  • 22. Johnson RE, Prakash S, Prakash L. Efficient bypass of a thymine-thymine dimer by yeast DNA polymerase, Poleta. Science. 1999;283(5404):1001–4. Epub 1999/02/12. doi: 10.1126/science.283.5404.1001
  • 23. Masutani C, Kusumoto R, Iwai S, Hanaoka F. Mechanisms of accurate translesion synthesis by human DNA polymerase eta. EMBO J. 2000;19(12):3100–9. Epub 2000/06/17. doi: 10.1093/emboj/19.12.3100
  • 24. Washington MT, Prakash L, Prakash S. Mechanism of nucleotide incorporation opposite a thymine-thymine dimer by yeast DNA polymerase eta. Proc Natl Acad Sci U S A. 2003;100(21):12093–8. Epub 2003/10/07. doi: 10.1073/pnas.2134223100
  • 25. Dumstorf CA, Clark AB, Lin Q, Kissling GE, Yuan T, Kucherlapati R, et al. Participation of mouse DNA polymerase iota in strand-biased mutagenic bypass of UV photoproducts and suppression of skin cancer. Proc Natl Acad Sci U S A. 2006;103(48):18083–8. Epub 2006/11/23. doi: 10.1073/pnas.0605247103
  • 26. Ziv O, Geacintov N, Nakajima S, Yasui A, Livneh Z. DNA polymerase zeta cooperates with polymerases kappa and iota in translesion DNA synthesis across pyrimidine photodimers in cells from XPV patients. Proc Natl Acad Sci U S A. 2009;106(28):11552–7. Epub 2009/07/01. doi: 10.1073/pnas.0812548106
  • 27. Yoon JH, Prakash L, Prakash S. Highly error-free role of DNA polymerase eta in the replicative bypass of UV-induced pyrimidine dimers in mouse and human cells. Proc Natl Acad Sci U S A. 2009;106(43):18219–24. Epub 2009/10/14. doi: 10.1073/pnas.0910121106
  • 28. Zhang H, Siede W. UV-induced T→C transition at a TT photoproduct site is dependent on Saccharomyces cerevisiae polymerase eta in vivo. Nucleic Acids Res. 2002;30(5):1262–7. doi: 10.1093/nar/30.5.1262
  • 29. McCulloch SD, Kokoska RJ, Masutani C, Iwai S, Hanaoka F, Kunkel TA. Preferential cis-syn thymine dimer bypass by DNA polymerase eta occurs with biased fidelity. Nature. 2004;428(6978):97–100. doi: 10.1038/nature02352
  • 30. Lopes M, Foiani M, Sogo JM. Multiple mechanisms control chromosome integrity after replication fork uncoupling and restart at irreparable UV lesions. Mol Cell. 2006;21(1):15–27. Epub 2006/01/03. doi: 10.1016/j.molcel.2005.11.015
  • 31. Elvers I, Johansson F, Groth P, Erixon K, Helleday T. UV stalled replication forks restart by re-priming in human fibroblasts. Nucleic Acids Res. 2011;39(16):7049–57. Epub 2011/06/08. doi: 10.1093/nar/gkr420
  • 32. Martincorena I, Roshan A, Gerstung M, Ellis P, Van Loo P, McLaren S, et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science. 2015;348(6237):880–6. doi: 10.1126/science.aaa6806
  • 33. Tang J, Fewings E, Chang D, Zeng H, Liu S, Jorapur A, et al. The genomic landscapes of individual melanocytes from human skin. Nature. 2020;586(7830):600–5. Epub 2020/10/09. doi: 10.1038/s41586-020-2785-8
  • 34. Saini N, Gordenin DA. Somatic mutation load and spectra: A record of DNA damage and repair in healthy human cells. Environ Mol Mutagen. 2018;59(8):672–86. Epub 2018/08/29. doi: 10.1002/em.22215
  • 35. Chan K, Roberts SA, Klimczak LJ, Sterling JF, Saini N, Malc EP, et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat Genet. 2015;47(9):1067–72. Epub 2015/08/11. doi: 10.1038/ng.3378
  • 36. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, et al. Clock-like mutational processes in human somatic cells. Nat Genet. 2015;47(12):1402–7. doi: 10.1038/ng.3441
  • 37. Moore L, Leongamornlert D, Coorens THH, Sanders MA, Ellis P, Dentro SC, et al. The mutational landscape of normal human endometrial epithelium. Nature. 2020;580(7805):640–6. Epub 2020/05/01. doi: 10.1038/s41586-020-2214-z
  • 38. Lee-Six H, Olafsson S, Ellis P, Osborne RJ, Sanders MA, Moore L, et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature. 2019;574(7779):532–7. doi: 10.1038/s41586-019-1672-7
  • 39. Tamborero D, Rubio-Perez C, Deu-Pons J, Schroeder MP, Vivancos A, Rovira A, et al. Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med. 2018;10(1):25. Epub 2018/03/30. doi: 10.1186/s13073-018-0531-8
  • 40. Gordenin DA, Resnick MA. Yeast ARMs (DNA at-risk motifs) can reveal sources of genome instability. Mutat Res. 1998;400(1–2):45–58. Epub 1998/08/01. doi: 10.1016/s0027-5107(98)00047-5
  • 41. Sia EA, Jinks-Robertson S, Petes TD. Genetic control of microsatellite stability. Mutat Res. 1997;383(1):61–70. Epub 1997/01/31. doi: 10.1016/s0921-8777(96)00046-8
  • 42. Carvajal-Garcia J, Cho JE, Carvajal-Garcia P, Feng W, Wood RD, Sekelsky J, et al. Mechanistic basis for microhomology identification and genome scarring by polymerase theta. Proc Natl Acad Sci U S A. 2020;117(15):8476–85. Epub 2020/04/03. doi: 10.1073/pnas.1921791117
  • 43. Yu AM, McVey M. Synthesis-dependent microhomology-mediated end joining accounts for multiple types of repair junctions. Nucleic Acids Res. 2010;38(17):5706–17. Epub 2010/05/13. doi: 10.1093/nar/gkq379
  • 44. Kumar R, Nagpal G, Kumar V, Usmani SS, Agrawal P, Raghava GPS. HumCFS: a database of fragile sites in human chromosomes. BMC Genomics. 2019;19(Suppl 9):985. Epub 2019/04/20. doi: 10.1186/s12864-018-5330-5
  • 45. Lawrence MS, Stojanov P, Polak P, Kryukov GV, Cibulskis K, Sivachenko A, et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499(7457):214–8. Epub 2013/06/19. doi: 10.1038/nature12213
  • 46. Martincorena I, Fowler JC, Wabik A, Lawson ARJ, Abascal F, Hall MWJ, et al. Somatic mutant clones colonize the human esophagus with age. Science. 2018;362(6417):911–7. Epub 2018/10/20. doi: 10.1126/science.aau3879
  • 47. Yoshida K, Gowers KHC, Lee-Six H, Chandrasekharan DP, Coorens T, Maughan EF, et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature. 2020;578(7794):266–72. Epub 2020/01/31. doi: 10.1038/s41586-020-1961-1
  • 48. Wang H, Xu X. Microhomology-mediated end joining: new players join the team. Cell Biosci. 2017;7:6. Epub 2017/01/20. doi: 10.1186/s13578-017-0136-8
  • 49. Sfeir A, Symington LS. Microhomology-Mediated End Joining: A Back-up Survival Mechanism or Dedicated Pathway? Trends Biochem Sci. 2015;40(11):701–14. Epub 2015/10/07. doi: 10.1016/j.tibs.2015.08.006
  • 50. Karanam K, Kafri R, Loewer A, Lahav G. Quantitative live cell imaging reveals a gradual shift between DNA repair mechanisms and a maximal use of HR in mid S phase. Mol Cell. 2012;47(2):320–9. Epub 2012/07/31. doi: 10.1016/j.molcel.2012.05.052
  • 51. Delacote F, Lopez BS. Importance of the cell cycle phase for the choice of the appropriate DSB repair pathway, for genome stability maintenance: the trans-S double-strand break repair model. Cell Cycle. 2008;7(1):33–8. Epub 2008/01/17. doi: 10.4161/cc.7.1.5149
  • 52. Ridky TW. Nonmelanoma skin cancer. J Am Acad Dermatol. 2007;57(3):484–501. Epub 2007/05/22. doi: 10.1016/j.jaad.2007.01.033
  • 53. Halder RM, Bridgeman-Shah S. Skin cancer in African Americans. Cancer. 1995;75(2 Suppl):667–73. Epub 1995/01/15.
  • 54. Gloster HM Jr., Neal K. Skin cancer in skin of color. J Am Acad Dermatol. 2006;55(5):741–60; quiz 61–4. Epub 2006/10/21. doi: 10.1016/j.jaad.2005.08.063
  • 55. Bradford PT. Skin cancer in skin of color. Dermatol Nurs. 2009;21(4):170–7, 206; quiz 178. Epub 2009/08/21.
  • 56. Halder RM, Bang KM. Skin cancer in blacks in the United States. Dermatol Clin. 1988;6(3):397–405. Epub 1988/07/01.
  • 57. Plaja A, Castells N, Cueto-Gonzalez AM, del Campo M, Vendrell T, Lloveras E, et al. A Novel Recurrent Breakpoint Responsible for Rearrangements in the Williams-Beuren Region. Cytogenet Genome Res. 2015;146(3):181–6. Epub 2015/09/19. doi: 10.1159/000439463
  • 58. Georgakilas AG, Tsantoulis P, Kotsinas A, Michalopoulos I, Townsend P, Gorgoulis VG. Are common fragile sites merely structural domains or highly organized "functional" units susceptible to oncogenic stress? Cell Mol Life Sci. 2014;71(23):4519–44. Epub 2014/09/23. doi: 10.1007/s00018-014-1717-x
  • 59. Le Tallec B, Dutrillaux B, Lachages AM, Millot GA, Brison O, Debatisse M. Molecular profiling of common fragile sites in human fibroblasts. Nat Struct Mol Biol. 2011;18(12):1421–3. Epub 2011/11/08. doi: 10.1038/nsmb.2155
  • 60. Murano I, Kuwano A, Kajii T. Fibroblast-specific common fragile sites induced by aphidicolin. Hum Genet. 1989;83(1):45–8. Epub 1989/08/01. doi: 10.1007/BF00274145
  • 61. Maccaroni K, Balzano E, Mirimao F, Giunta S, Pelliccia F. Impaired Replication Timing Promotes Tissue-Specific Expression of Common Fragile Sites. Genes (Basel). 2020;11(3). Epub 2020/03/25. doi: 10.3390/genes11030326
  • 62. Chen C, Xing D, Tan L, Li H, Zhou G, Huang L, et al. Single-cell whole-genome analyses by Linear Amplification via Transposon Insertion (LIANTI). Science. 2017;356(6334):189–94. Epub 2017/04/15. doi: 10.1126/science.aak9787
  • 63. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491–8. Epub 2011/04/12. doi: 10.1038/ng.806
  • 64. Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012;28(3):311–7. Epub 2011/12/14. doi: 10.1093/bioinformatics/btr665
  • 65. Koboldt DC, Larson DE, Wilson RK. Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection. Curr Protoc Bioinformatics. 2013;44:15.4.1–15.4.17. Epub 2015/01/02. doi: 10.1002/0471250953.bi1504s44
  • 66. Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22(3):568–76. Epub 2012/02/04. doi: 10.1101/gr.129684.111
  • 67. Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31(3):213–9. Epub 2013/02/12. doi: 10.1038/nbt.2514
  • 68. Rausch T, Zichner T, Schlattl A, Stutz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28(18):i333–i9. Epub 2012/09/11. doi: 10.1093/bioinformatics/bts378
  • 69. Abyzov A, Mariani J, Palejev D, Zhang Y, Haney MS, Tomasini L, et al. Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature. 2012;492(7429):438–42. Epub 2012/11/20. doi: 10.1038/nature11629
  • 70. Knouse KA, Wu J, Whittaker CA, Amon A. Single cell sequencing reveals low levels of aneuploidy across mammalian tissues. Proc Natl Acad Sci U S A. 2014;111(37):13409–14. Epub 2014/09/10. doi: 10.1073/pnas.1415287111
  • 71. Knouse KA, Wu J, Amon A. Assessment of megabase-scale somatic copy number variation using single-cell sequencing. Genome Res. 2016;26(3):376–84. Epub 2016/01/17. doi: 10.1101/gr.198937.115
  • 72. Wala JA, Bandopadhayay P, Greenwald NF, O'Rourke R, Sharpe T, Stewart C, et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 2018;28(4):581–91. Epub 2018/03/15. doi: 10.1101/gr.221028.117
  • 73. Thorvaldsdottir H, Robinson JT, Mesirov JP. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 2013;14(2):178–92. Epub 2012/04/21. doi: 10.1093/bib/bbs017
  • 74. Bergstrom EN, Huang MN, Mahto U, Barnes M, Stratton MR, Rozen SG, et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics. 2019;20(1):685. Epub 2019/09/01. doi: 10.1186/s12864-019-6041-2
  • 75. Blokzijl F, Janssen R, van Boxtel R, Cuppen E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 2018;10(1):33. Epub 2018/04/27. doi: 10.1186/s13073-018-0539-0
  • 76. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. Epub 2010/07/06. doi: 10.1093/nar/gkq603

Decision Letter 0

Mitch McVey, David J Kwiatkowski

Transfer Alert

This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.

30 Sep 2020

Dear Dr Gordenin,

Thank you very much for submitting your Research Article entitled 'UV-exposure, endogenous DNA damage and DNA replication errors shape the spectra of genome changes in human skin' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by three independent peer reviewers. The reviewers appreciated the attention to an important topic and thought that your analysis of UV-induced mutations between different ethnic groups adds novelty to the study. However, they also identified some aspects of the manuscript that should be improved to increase its impact.

Specifically, they asked for additional analyses of some of the data and modifications or additions to several figures. Two reviewers would like to see justifications for the choice of thresholds for allele frequencies. In addition, all felt strongly that the code used during the analysis should be made available on a public repository such as GitHub. Finally, I agree with reviewer 3 that it would be helpful if you could speculate as to the basis behind the surprising finding that UV-induced mutation load is age-independent.

We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer.

In addition, we ask that you:

1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

* Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. *

To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

[LINK]

Please let us know if you have any questions while making these revisions.

Sincerely,

Mitch McVey

Guest Editor

PLOS Genetics

David Kwiatkowski

Section Editor: Cancer Genetics

PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: In the manuscript the authors describe their study of somatic mutations in the normal skin of multiple individuals. For this they rely on cell cloning, i.e., single cells are cultured until colonies are of sufficient size to be sequenced without (or with little) DNA amplification. A few similar previous studies have been conducted and are cited in this manuscript. The novel aspects of this study are: 1) cloning and studying mutations in melanocytes; 2) using samples from different races; 3) using samples from different sexes; 4) a wide age range of participants who donated skin samples; 5) analysis that goes beyond SNVs to include indels and SVs.

The study reports several significant, new, and interesting findings, of which for me the most interesting were:

* support for UV-induced mutations from the identified indels;

* the experimental evidence that dark skin has a protective effect against UV-induced mutations.

I only have a few suggestions on how to improve the manuscript, as well as some minor questions/concerns.

* The threshold of allele frequency for somatic variants in clones “within 45% and 55%” seems quite stringent. Could the authors estimate their sensitivity for discovering the variants from founder cells in a clone?

* Can the authors estimate an average increase in non-UV-induced mutations with age?

* Figure 3B. It is common to show the distribution of indels using negative values for the length of deletions. Please consider this.

* It was unclear which variants were analyzed for cancer drivers. Were there any mutations discovered in clones and in bulks that were predicted to be a driver?

* Can the authors comment about breakpoint features of SVs in CFS and out of CFS? Were there any differences?

* Could the authors comment on the possibility that SVs in the CFS are the result of culturing fibroblasts before deriving clones? I wonder whether the CFSs mentioned in the text may have been determined as such from cultured fibroblasts?

* Line 346: Should there be a letter after the first stated CFS “FRA7”?
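
Reviewer #1's question about the sensitivity of the 45% to 55% allele frequency window can be illustrated with a simple calculation. The sketch below is not the authors' method; it assumes that reads at a heterozygous clonal site are sampled binomially with a true allele fraction of 0.5, with no amplification or mapping bias, and the read depths shown are hypothetical values chosen only for illustration.

```python
from math import comb


def prob_vaf_in_window(depth, true_vaf=0.5, low=0.45, high=0.55):
    """Probability that the observed allele fraction (alt reads / depth)
    falls inside [low, high] under a binomial read-sampling model."""
    total = 0.0
    for alt in range(depth + 1):
        if low <= alt / depth <= high:
            total += comb(depth, alt) * true_vaf**alt * (1 - true_vaf) ** (depth - alt)
    return total


# Hypothetical per-site read depths; not taken from the manuscript.
for depth in (30, 60, 100):
    print(depth, round(prob_vaf_in_window(depth), 3))
```

Under these assumptions the fraction of true clonal heterozygous variants that land inside the window grows with depth, which is one way to frame the sensitivity estimate the reviewer requests.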

Reviewer #2: In this work Saini et al extend their previous study (Saini et al, PloS Genetics 2016) on the whole genome sequencing of clonally expanded skin fibroblasts sampled from anatomical sites presumed to have experienced a range of UV exposure. The previous work considered six sampling sites from one donor and four sampling sites from a second donor, both donors of primarily European ancestry. This new work re-analyses the hip-derived samples from the existing data but substantially adds to it expanding the number of donors to 21, including ethnic diversity as a sampling metric and including a limited comparison of mutation patterns between skin fibroblasts and melanocytes.

Analysis includes the quantification of single base substitution signatures, double base changes, indels and structural changes. Several of the results are confirmatory, but importantly so, for example the UV signature mutation load not correlating with patient age and the consistent detection of substantial UV signatures even in sites that are mostly shielded from the sun. However, the comparison of mutation spectra and loads for non-cancerous samples between ethnicities is, to my knowledge, novel at the whole-genome multiple sample scale. This reveals a striking reduction in UV signature mutation load for African American samples compared to European – not an unexpected result but important validation of expectation. The authors also find that >4 bp deletions show a relative enrichment in European ancestry skin, suggesting a UV contribution to these multi-basepair deletions. The other key finding is the high frequency of chromosomal rearrangements with breakpoints localised approximately to common fragile sites – again building on earlier observations of the same group but much more solid in this expanded study.

Overall the manuscript is well thought out and presented. The results do provide new insights and the primary data generated is likely to be of interest to multiple groups working on the mechanisms of mutagenesis and those studying skin cancer. To the extent practically possible with human sample data, the experiments are generally well controlled and conclusions justified by the evidence presented. Probably the most important aspect of this work is the comparison of mutation spectra and loads for skin from patients with dramatically different levels of naturally occurring melanin, which is expected to protect from UV damage. A fundamental limitation to the interpretation of these results is that UV exposure per donor is unmeasured and not controlled for. Being hip derived samples the assumption is that these are sites predominantly shielded from the sun, but as such, small differences in exposure (e.g. one bad sunburn) could strongly distort the total lifetime exposure to UV. I think that a caveat/qualification to this effect should be more clearly stated in the discussion.

While I am supportive of publication I suggest the following points could be addressed where practical.

#1. Currently each donor is classified as “White” or “Black or African American”. Within both of these groups there is likely to be a range of skin pigmentation in non-sun-exposed sites (hips). Were measures of that baseline pigmentation taken? It would be useful to report those measures and correlate mutation load/spectra rather than the binary classification of ancestry.

#2. Figure 4 plots include melanocyte clones (the “White” outlier with ~14k SBS mutations must be DAG_H275) but these aren’t discriminated in these plots as they are in figures 1 to 3. They should be indicated, particularly when they represent outlier data-points. Do the significant differences noted in these plots remain significant after excluding the melanocyte clones?

#3. A useful complement to the existing plots would be a plot summarising: (a) SBS mutation load per clone, (b) mutation signature contribution per clone like Fig S1 but with a richer set of signatures such as COSMIC, (c) annotation of ethnicity and cell type (fibroblast/melanocyte) per clone. Such a plot, possibly rank-ordered by total mutation load, would be an informative summary of the key data generated and would make apparent some results that appear to be masked in the current presentation. For example the clone DAG_H275 has a high SBS load and from Fig S1 looks to be dominated by C→T substitutions consistent with UV exposure, but then comparison to Fig 4a (highest purple value to highest red value) and mutation counts in table S2 indicate that this clone must be dominated by rCn to rTn mutations (COSMIC signature SBS2). Is this an outlier amongst melanocyte clones?

#4. Figure 3b. It is difficult to see the individual histogram bars, especially at the left edge. Suggest that this plot would be more informative if points were used rather than histograms and the x-axis shown on a log scale (in addition to the y-axis being kept on a log scale).

#5. Figure S2. X-axis annotations are not clear (partially overlapped by subsequent panels).

#6. Code availability: “The R-code for analysis of the trinucleotide-specific mutation signatures is the same as used in [8] and will be provided on request.” For efficient and long-term reproducibility, code, especially code used across multiple publications, should be in a public repository such as GitHub, Figshare, Zenodo or an institutional repository that provides a DOI.

#7. Data availability: In the additional information front section of the submission, the authors state “Yes - all data are fully available without restriction”. This should also be clearly stated in the manuscript. I would also suggest that the work will be more widely used and cited if the full list of somatic mutation calls (e.g. MAF files filtered for germline variants) can be made available without the requirement for DAC approval, which, although an important process for limiting access to potentially personally identifiable germline genotypes, is unnecessarily limiting for somatic mutations.
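
The display convention raised for Figure 3B (Reviewer #1's suggestion to plot deletions with negative lengths, and point #4 above) can be sketched as follows. This is only an illustration under stated assumptions: indel calls are assumed to be available as VCF-style REF/ALT allele pairs, and the four calls listed are hypothetical, not data from the study.

```python
from collections import Counter


def signed_indel_length(ref, alt):
    """Signed indel length from VCF-style alleles:
    insertions are positive, deletions are negative."""
    return len(alt) - len(ref)


# Hypothetical (REF, ALT) pairs used only to show the convention.
calls = [("A", "AT"), ("ACG", "A"), ("ATTTTT", "A"), ("G", "GCC")]

# Tally calls per signed length; these counts are what a histogram
# or point plot of the indel length distribution would display.
counts = Counter(signed_indel_length(ref, alt) for ref, alt in calls)
for length in sorted(counts):
    print(length, counts[length])
```

With signed lengths, deletions and insertions occupy opposite sides of zero on the x-axis, so short and long events of each type can be compared in one panel.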

Reviewer #3: The manuscript by Saini et al. provides a very interesting analysis of mutation signatures in primary skin cells in donors of different ages and races. As such, it is of great interest to the scientific community. I have several major and minor points for authors to address.

Major points:

How did sequencing errors affect the results?

How did whole-genome amplification, and potential errors and biases, affect the results?

Please justify the choice of 45%-55% and 90% threshold for allele frequencies used. Were simulations used to derive these?

The finding that UV-induced mutation load is independent of age is counter-intuitive. Please provide an explanation behind this unexpected result.

Also, how do the authors explain that UV-based substitution signatures were prominent in many samples obtained from sun-shielded skin?

The paper should be read by a native speaker of English to fix some minor grammar issues.

The authors should provide all their code on GitHub.

The authors should provide a file with all the genetic variants they identified including their genomic locations for all the samples.

Can the authors speculate about the applications of their study with respect to the increased risk of skin cancer in individuals with high UV exposure?

Minor points:

The title should read 'UV exposure, endogenous DNA damage, and DNA replication errors shape the spectra of genome changes in human skin'

Abstract: usually ‘mutation load’ is used as singular

Abstract: ‘…DNA replication stalling at common fragile sites is a potent source…'

Abstract, last sentence: please change ‘intrinsic factors’ -> ‘endogenous factors’

P. 3, line 39: ‘Large scale’ -> ‘Large-scale’

P. 3, line 44: there should be no comma after ‘Since’

P. 4, line 57: add space after ‘Pol’ and before the greek letter of the polymerase

P. 4, line 59: ‘Error-prone’

P. 4, line 69: ‘UV radiation’

P. 4, line 71: add comma after ‘melanocytes’

P. 4, line 76: ‘burdens’ -> ‘burden’; ‘sun shielded’ -> ‘sun-shielded’

P. 5, line 82: add comma after ‘skin’, change ‘signature’ -> ‘mutation signature’

P. 5, line 86: add comma after ‘(InDels)’

P. 5, line 89: change to ‘…due to either small sample sizes or difficulties…’

P. 6, line 101: add comma after ‘errors’

P. 6, line 114: remove ‘-‘ after ‘cell’

P. 6, line 116: here and elsewhere in the paper spell out all numbers below 10

P. 6, lines 118 and 121: ‘whole-genome’

P. 7, line 122: ‘whole-genome’

Figures 1-3, 5: ‘R-square’ -> ‘R-squared’

P. 7, line 135: ‘sequencing coverage’ -> ‘sequencing depth’ (was this average per site?)

P. 7, line 137: ‘variations’ -> ‘variants’

P. 8, line 163: ‘see’ -> ‘saw’

P. 9, line 169: add comma after ‘SBS5’

P. 9, line 173: ‘that’ -> ‘this’

P. 9, line 179: ‘SBS1-associated’

P. 10, line 189: please add a reference after ‘have previously been shown to be associated with UV-induced DNA damage in human cells’

P. 10, line 206: ‘Whole-exome’

P. 12, line 240: remove comma after ‘Since’

P. 12, line 247: ‘spanning 5 nucleotides or larger’ -> ‘spanning five or more nucleotides’

P. 12, line 248: ‘have microhomology of one or more bases’

P. 12, line 252: ‘deletions of 5 bases or more’ -> ‘deletions of five or more bases’

P. 13, line 254: ‘spanning five or more nucleotides’

P. 13, line 256: ‘indel load'

P. 13, line 257, and elsewhere in the manuscript: ‘InDel’ -> ‘indel’

P. 14, line 290 and elsewhere in the manuscript: ‘gender’ -> ‘sex’

P. 14, line 297: ‘do see’ -> ‘did see’

P. 16, line 324: ‘whole-genome’

P. 16, lines 327 and 338: add space before ‘bp’ and ‘Mbp’

P. 17, line 351: ‘surmise’ -> ‘hypothesize’

P. 18, line 356: ‘Whole-genome’

P. 18, line 373: ‘endogenously operating’ -> ‘endogenous’

P. 20, line 416: ‘loads’ -> ‘load’

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlie graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No: The authors should provide all their code on GitHub.

The authors should provide a file with all the genetic variants they identified including their genomic locations for all the samples.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Martin S Taylor

Reviewer #3: No

Decision Letter 1

Mitch McVey, David J Kwiatkowski

7 Dec 2020

Dear Dr Gordenin,

We are pleased to inform you that your manuscript entitled "UV-exposure, endogenous DNA damage, and DNA replication errors shape the spectra of genome changes in human skin" has been editorially accepted for publication in PLOS Genetics. Congratulations! The reviewers all agreed that your revisions address their concerns and that the manuscript will make an important contribution to the somatic mutation field. As the managing editor, I agree with this sentiment. Please note that Reviewer 1 did request one additional wording change when you submit the final version.

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Mitch McVey

Guest Editor

PLOS Genetics

David Kwiatkowski

Section Editor: Cancer Genetics

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors did adequately respond to my comments. I would just like to comment on the following statement: “We did not see such a peak in the whole-genome amplified melanocyte clones which could reflect uneven genome amplification and localized genome duplications during the whole-genome amplification step.” The in vitro amplification step is likely to make the distribution of allele frequencies broader, but the distribution should still be centered at 50%. Based on Figure S1, a possible interpretation is that some of the melanocyte colonies have two founder cells and thus the corresponding distributions have peaks at 25% frequency. In this regard, I’m glad that the authors are careful in describing their results with the statement “… allows us to estimate the minimum accurate number of somatic mutations in the founder cells,” but, in light of possible technical challenges, I would remove the word “accurate” from the statement.

Reviewer #2: I am happy that my previously raised points have been addressed to the extent practically possible. To my reading, the points raised by the other reviewers also seem to have been well addressed by the revisions to the manuscript. I'm happy to recommend publication; it's a good, well-thought-out study that complements a recent flurry of papers on somatic mutation in normal (non-cancer) tissue. Clearly a hot area, in which this paper still has its unique selling points, including the stratification by ancestry/pigmentation.

Reviewer #3: The authors have adequately addressed my concerns.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: None

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Martin S. Taylor

Reviewer #3: No

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-20-01306R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Acceptance letter

Mitch McVey, David J Kwiatkowski

23 Dec 2020

PGENETICS-D-20-01306R1

UV-exposure, endogenous DNA damage, and DNA replication errors shape the spectra of genome changes in human skin

Dear Dr Gordenin,

We are pleased to inform you that your manuscript entitled "UV-exposure, endogenous DNA damage, and DNA replication errors shape the spectra of genome changes in human skin" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Melanie Wincott

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. The distribution of the allele frequencies of the whole genome sequenced clones in this study.

    The plots for melanocyte clones are in red. The source data for this figure is in S2 Table.

    (TIF)

    S2 Fig

    (A) The mutation spectra in each sequenced clone in this study. The melanocyte clones are marked with an “M” in the X-axis. The source data for this figure are in S2 Table. (B) The NMF-derived mutation signature loads as determined by SigProfilerExtractor in each clone sequenced in this study. Samples from African American donors are annotated with an “A” and melanocyte clones are marked with an “M” in the X-axis. The source data for this figure are in S3 Table.

    (TIF)

    S3 Fig. The NMF-derived single base substitution and double base substitution signatures identified in human skin cells.

    Total mutations corresponding to the signature in the cohort as determined by SigProfilerExtractor are shown.

    (TIF)

    S4 Fig. The correlation of the different UV-specific mutation signatures in the samples in this study.

    The total nTt➔nCt mutations and the CC➔TT mutations are plotted against the yCn➔yTn total mutation load in the samples. The black inclined line denotes the linear regression of the data, and the dotted black lines denote the 95% confidence intervals. The source data for this figure are in S4 Table.

    (TIF)

    S5 Fig. Comparison of the NMF-derived mutation signatures and the trinucleotide-specific mutation signatures in this study.

    The nCg➔nTg minimum mutation load in each sample is plotted against SBS1-associated mutations as determined by SigProfilerExtractor, and the yCn➔yTn minimum mutation load in each sample is plotted against SBS7b-associated mutations. The linear regression of the data is shown, and the dotted lines denote the 95% confidence intervals. The source data for this figure are in S3 and S4 Tables.

    (TIF)

    S6 Fig. The distribution of the allele frequencies of the whole exome sequenced bulk samples in this study.

    The source data for this figure is in S5 Table.

    (TIF)

    S7 Fig. The distribution of the allele frequencies of consensus alleles and cancer drivers.

    (A) The allele frequencies of the consensus SNVs identified in the bulk and the corresponding clones are shown. (B) The allele frequency distribution of the cancer driver mutations identified in the exome of the bulk samples. The source data for this figure is in S5 Table.

    (TIF)

    S8 Fig. The NMF-derived indel mutation signatures in this study.

    Total mutations corresponding to the signature as determined by SigProfilerExtractor are shown.

    (TIF)

    S9 Fig. The analyses of the impact of sex on mutation and indel load in the samples.

    The total base substitutions and the total indels in the clonal lineages derived from males and females in this study are shown. A Mann-Whitney U-test was used to determine whether the distributions of mutation and indel load differed significantly between the two cohorts. The P-value for the base substitutions was 0.4041, while the P-value for the indels was 0.9401. The source data for this figure is in S10 Table.

    (TIF)

    S1 Table. Coverage statistics and donor characteristics for all samples sequenced in this study.

    (XLSX)

    S2 Table. The somatic base substitutions in the whole genome sequenced single skin cell clonal lineages.

    a) The fraction of all SNVs prior to filtering that correspond to each allele frequency bin. SNVs that corresponded to allele frequencies between 45% and 55% or above 90% were considered clonal. b) The exonic somatic base substitutions in the samples. c) The mutation spectra in the samples. The reverse complements are considered in the mutation spectra analyses.

    (XLSX)
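The clonality rule and the reverse-complement convention described for S2 Table can be illustrated with a short Python sketch. This is not the authors' pipeline (their analysis code is in R); the function names are hypothetical, and treating the 45% and 55% bounds as inclusive is an assumption, since the caption does not say how the edges are handled.

```python
# Illustrative sketch only: names and boundary handling are assumptions,
# not taken from the authors' code.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_clonal(vaf):
    """Return True if a variant allele frequency (0..1) counts as clonal.

    Rule from the S2 Table caption: 45% to 55% (assumed inclusive here),
    or strictly above 90%.
    """
    return 0.45 <= vaf <= 0.55 or vaf > 0.90


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def collapse_to_pyrimidine(context, alt):
    """Express a substitution with a pyrimidine (C or T) as the mutated base.

    context: trinucleotide with the mutated base in the centre.
    alt:     the new base at the central position.
    A purine-centred change is replaced by its reverse complement, so that
    for example G>A in CGA is counted together with C>T in TCG.
    """
    if context[1] in "CT":
        return context, alt
    return reverse_complement(context), alt.translate(COMPLEMENT)


if __name__ == "__main__":
    for vaf in (0.25, 0.50, 0.95):
        print(vaf, is_clonal(vaf))
    print(collapse_to_pyrimidine("CGA", "A"))
```

Collapsing each change onto the pyrimidine strand is the usual way to merge complementary substitutions into a single spectrum category, which is what "the reverse complements are considered" implies.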

    S3 Table. Agnostic base substitution signature analyses.

    a) The contributions of previously determined mutation signatures using MutationalPatterns. b) The number of mutations corresponding to each signature identified by SigProfilerExtractor.

    (XLSX)

    S4 Table. Motif-specific mutation signature analyses.

    a) nCg➔nTg mutation signature analysis. b) yCn➔yTn mutation signature analysis. c) nTt➔nCt mutation signature analysis. d) CC➔TT total dinucleotide substitutions.

    (XLSX)

    S5 Table. The somatic mutation list for whole exome sequenced bulk tissues.

    (XLSX)

    S6 Table. Somatic indels identified in the samples.

    (XLSX)

    S7 Table. The distribution of the types of indels in each sample as identified by SigProfilerMatrixGenerator.

    (XLSX)

    S8 Table. Agnostic indel signature analysis.

    a) The contributions of previously determined mutation signatures using MutationalPatterns. b) The number of mutations corresponding to each signature identified by SigProfilerExtractor.

    (XLSX)

    S9 Table. The somatic templated insertions identified in human skin fibroblasts and melanocytes.

    (XLSX)

    S10 Table. The somatic genome changes in the samples along with the cell type, and the sex and race of the donors.

    a) The total number of somatic genome changes in the samples along with the cell type, and the sex and race of the donors. b) The median values and P-values for Mann-Whitney tests for pairwise comparisons of the somatic genome changes between the sets of clones obtained from White and African American donors.

    (XLSX)

    S11 Table. The somatic structural variants identified in the donors.

    a) All the somatic structural variants annotated for hotspots, common fragile sites and microhomologies. b) A Fisher's exact test for the use of microhomologies in the SVs that are present within or outside common fragile sites.

    (XLSX)
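For readers unfamiliar with the test named in S11 Table b, the sketch below shows how a two-sided Fisher's exact test on a 2×2 table (for example, SVs with or without microhomology, inside or outside common fragile sites) could be computed with the Python standard library. The function name and the counts in the usage example are hypothetical, and whether the authors used a one- or two-sided test is not stated, so two-sided is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from S11 Table):
    # rows = inside / outside fragile sites,
    # columns = with / without microhomology.
    print(fisher_exact_two_sided(3, 1, 1, 3))
```

An exact test is a reasonable choice when some cells of the table hold small counts, as is often the case for structural variants in individual clones.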

    Attachment

    Submitted filename: 2020_Saini_Response_to_Reviewers_V3.docx

    Data Availability Statement

    All BAM and MAF files are available under controlled access from the dbGaP database (phs001182.v2.p1; https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001182.v2.p1). All other data, including the underlying numerical data for all graphs and summary statistics, are in the Supplementary Tables. The R code for analysis of the trinucleotide-specific mutation signatures can be accessed via https://github.com/NIEHS/P-MACD.


    Articles from PLoS Genetics are provided here courtesy of PLOS
