Abstract
Purpose
Population-scale, exome-sequenced cohorts with linked electronic health records (EHR) permit genome-first exploration of phenotype. Phenotype and cancer risk are well characterized in children with a pathogenic DICER1 (HGNC ID:17098) variant. Here, the prevalence, penetrance, and phenotype of pathogenic germline DICER1 variants in adults were investigated in 2 population-scale cohorts.
Methods
Variant pathogenicity was classified using published DICER1 ClinGen criteria in the UK Biobank (469,787 exomes; unrelated: 437,663) and Geisinger (170,503 exomes; unrelated: 109,789) cohorts. In the UK Biobank cohort, cancer diagnoses in the EHR, cancer registry, and death registry were queried. For the Geisinger cohort, the Geisinger Cancer Registry and EHR were queried.
Results
In the UK Biobank, there were 46 unique pathogenic DICER1 variants in 57 individuals (1:8242; 95% CI: 1:6362-1:10,677). In Geisinger, there were 16 unique pathogenic DICER1 variants (including 1 microdeletion) in 21 individuals (1:8119; 95% CI: 1:5310-1:12,412). Cohorts were well powered to find larger effect sizes for common cancers. Cancers were not significantly enriched in DICER1 heterozygotes; however, there was a ∼4-fold increased risk for thyroid disease in both cohorts. There were multiple ICD10 codes enriched >2-fold in both cohorts.
Conclusion
Estimates of pathogenic germline DICER1 prevalence, thyroid disease penetrance, and cancer phenotype from genomically ascertained adults are determined in 2 large cohorts.
Keywords: DICER1, DICER1 syndrome, Health care population, Penetrance, Prevalence
Introduction
DICER1 (HGNC ID:17098) is essential for processing pre-microRNA into mature microRNA. Pathogenic and likely pathogenic (P/LP) germline variation in DICER1 underlies DICER1-related tumor predisposition, which is associated with increased risk for a variety of childhood tumors, including pleuropulmonary blastoma, cystic nephroma, nasal chondromesenchymal hamartoma, ciliary body medulloepithelioma, and a wide spectrum of sarcomas. Both children and adults can develop macrocephaly, multinodular goiter, and thyroid cancer. Females are at increased risk for ovarian sex-cord stromal tumors, including Sertoli-Leydig cell tumor and gynandroblastoma, and rhabdomyosarcoma of the cervix.1 Nontumor features are reported to include macrocephaly and dental, renal, and urinary tract anomalies,2,3 although replication of these observations is needed. In addition to P/LP germline DICER1 variants, specific somatic “hotspot” missense variants (S1344, E1705, D1709, G1809, D1810, and E1813) are found in DICER1-associated tumors in most cases.4 The lifetime penetrance of DICER1-associated neoplasms is modest, as determined by family-based studies.5 However, DICER1 loss-of-function variants are more common than expected, with an estimated prevalence ranging from 1/4600 to 1/10,600 individuals,6,7 including an estimate from genomic ascertainment of 92,296 participants of the Geisinger MyCode Community Health Initiative.7
Phenotype and cancer risk are generally well characterized in children with a P/LP DICER1 variant. Here, the prevalence, penetrance, and phenotype of P/LP germline DICER1 variants in adults were investigated in 2 population-scale cohorts using revised American College of Medical Genetics and Genomics-Association for Molecular Pathology DICER1 variant classification criteria.8
Materials and Methods
UK Biobank cohort
The UK Biobank cohort consists of nearly half a million consented participants aged 37 to 69 years at the time of enrollment. Human subjects’ protection and review was through the North West Multi-centre Research Ethics Committee. In this study, 469,787 individuals who were exome sequenced were investigated.9 The total number of unrelated individuals (>3rd degree) was 437,663, which was calculated using R package “ukbtools” with function “ukb_gene_samples_to_remove.”
Geisinger DiscovEHR cohort
The DiscovEHR cohort consists of 170,503 participants aged 0 to 90+ who were enrolled in the Geisinger Health System.10 This study was approved by the Geisinger Institutional Review Board; participants consented for broad research use of their exome and linked electronic health record (EHR) data. The total number of unrelated individuals (>3rd degree) was 109,789.
DICER1 variant annotation, classification, and control selection
All DICER1 variants were annotated using snpEFF and ANNOVAR. Gene annotation was based on transcript NM_177438.3. Variants with GQ<30 and ABHet<0.2 were excluded from the analysis. DICER1 variants were classified using recently published ClinGen DICER1 Variant Curation Expert Panel criteria,8 excluding case-level clinical criteria codes. Exome-derived copy-number variation (CNV) was only examined in the Geisinger cohort because these data were not available from the UK Biobank exome cohort. A sample with a DICER1 copy-number loss was called from exome sequencing using the reference-based CLAMMS algorithm and confirmed by Illumina chip array.11 Samples underwent quality assurance to remove those with sex mismatch, other large chromosomal abnormalities, and outliers of derivative log ratio spread and genomic wave factors. Outliers were defined as 1.5 times the interquartile range from the third quartile of the distribution of derivative log ratio spread.12 From both cohorts, DICER1 heterozygotes (hereafter, “heterozygotes”) were selected if they harbored a pathogenic (P, Bayesian points: ≥10) variant, a likely pathogenic (LP, Bayesian points: 6-9) variant, a copy-number deletion (Geisinger only), or a variant of uncertain significance (VUS) that was probably LP (Bayesian points: 5); DICER1 controls were selected if they harbored a canonical (wild type), benign, or likely benign (B/LB) DICER1 variant.
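The case-selection rule in this paragraph can be sketched as a small lookup. This is an illustrative sketch only, not the authors' pipeline: the function names `classify_by_points` and `is_heterozygote` are hypothetical, and only the point ranges stated in the text (≥10, 6-9, and 5) are encoded; lower point totals are lumped together as "other" rather than split into VUS, LB, and B.

```python
from typing import Optional


def classify_by_points(points: int) -> str:
    """Map a Bayesian point total to the tiers named in the text."""
    if points >= 10:
        return "P"
    if 6 <= points <= 9:
        return "LP"
    if points == 5:
        return "VUS-probably-LP"
    # Lower totals (remaining VUS, LB, B) are not separated in this sketch.
    return "other"


def is_heterozygote(points: Optional[int], cnv_deletion: bool = False) -> bool:
    """Return True if a participant meets the case definition in the text.

    `points` is None when no scored variant is present (hypothetical convention).
    """
    if cnv_deletion:  # copy-number deletions were only available for Geisinger
        return True
    return points is not None and classify_by_points(points) != "other"
```

A participant with a 9-point variant would be counted as a heterozygote under this rule, whereas one with a 4-point VUS would not; whether such a participant is a control depends on the separate B/LB or wild-type requirement.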
EHR-linked phenotypes in heterozygotes and controls
For UK Biobank, EHR-linked phenotypes and date and age at diagnosis were retrieved on 2/23/23 from fields 41270/41280 (diagnosis-ICD10), 40001/40007 (underlying cause of death: ICD10), 40006/40008/40013 (Cancer Registry: type of cancer ICD10/ICD9). For Geisinger, EHR-linked phenotypes and age at diagnosis were retrieved in December 2022. Tumor Registry phenotypes and age at diagnosis were retrieved in April 2022 and reflect the most recent update.
Thyroid phenotype analyses were performed using ICD10 codes E01-E07, excluding E03.2, E03.3, and E06.4 (codes for medication-induced and postinfectious hypothyroidism and for drug-induced thyroiditis). For noncancer analyses, all ICD10 codes were collapsed to their ICD10 category.
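The thyroid code definition and the collapse to ICD10 category can be sketched as follows. This is a minimal sketch under stated assumptions: the helper names are hypothetical, codes are assumed to arrive either dotted ("E04.9") or undotted ("E049"), and the "category" is assumed to be the first 3 characters of the code.

```python
# Subcodes excluded from the thyroid phenotype, stored without the dot.
EXCLUDED_SUBCODES = {"E032", "E033", "E064"}
# ICD10 categories E01 through E07.
THYROID_CATEGORIES = {f"E0{i}" for i in range(1, 8)}


def normalize(code: str) -> str:
    """Strip whitespace and dots so 'E04.9' and 'E049' compare equal."""
    return code.strip().upper().replace(".", "")


def icd10_category(code: str) -> str:
    """Collapse a code to its 3-character ICD10 category."""
    return normalize(code)[:3]


def is_thyroid_event(code: str) -> bool:
    """True for E01-E07 codes other than E03.2, E03.3, and E06.4."""
    c = normalize(code)
    if c[:4] in EXCLUDED_SUBCODES:
        return False
    return c[:3] in THYROID_CATEGORIES
```

For example, `is_thyroid_event("E04.9")` is true, whereas `is_thyroid_event("E03.2")` is false because that subcode is excluded.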
EHR review for Geisinger cohort
The EHR of heterozygotes was reviewed by a genetic counselor using a structured approach that captured imaging reports (brain, head/neck/sinus, chest, abdomen/pelvis, and extremities), procedures (major and minor surgeries), pathology reports, hospitalization records, and documentation of DICER1-associated clinical features, such as macrocephaly, sinusitis/chronic nasal congestion, dysmorphic features, and ophthalmology. In addition, a keyword search for “PPB” (pleuropulmonary blastoma, a DICER1-associated tumor), “DICER1,” and “DICER” was performed.7 Previously reviewed individuals were rereviewed for updated information.
Statistical analyses
Statistical analyses were performed using R version 4.1.0. All odds ratios were calculated from multivariate logistic regression analyses correcting for sex, ethnicity, body mass index (BMI), and smoking status. Forest plots were generated using R package “ggplot2.” SAIGE-GENE+ analysis was performed using R package SAIGE-GENE+ version 1.1.6.2.13 Power estimates were performed by adapting formulas from Chow et al14 to a cohort study setting.
For UK Biobank Kaplan-Meier (KM) time-to-event analyses, thyroid phenotypes were obtained from field 41270 and age at diagnosis was calculated from field 41280 minus year of birth. For Cox-proportional hazard models, sex, ethnicity, BMI, smoking history, and age at enrollment were used as covariates. For Geisinger KM analyses, the earliest recorded diagnoses of thyroid phenotypes in the EHR were used. Only diagnoses 3 months after the first encounter in the EHR were included in KM analyses. The last age of encounter was used as event age for censored subjects. Univariate and multivariate Cox-proportional hazards were used to compare differences between KM curves. Multivariate analyses adjusted for sex, current age, self-reported race/ethnicity, lifetime median BMI, and smoking status. The unrelated cohort in the Geisinger population was determined by random selection of one individual from each family with relationships determined by identity by descent to the third degree.15
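The time-to-event setup described here can be illustrated with a small Kaplan-Meier estimator that reports cumulative incidence (1 minus survival). This is a sketch, not the authors' R code: the function name and the example ages are hypothetical, each subject contributes an age at first thyroid diagnosis or an age at last encounter if censored, and the 3-month rule, left-truncation correction, and Cox covariate adjustment are deliberately omitted.

```python
from collections import Counter
from typing import List, Tuple


def km_cumulative_incidence(
    subjects: List[Tuple[float, bool]]
) -> List[Tuple[float, float]]:
    """Kaplan-Meier cumulative incidence at each event age.

    subjects: (age, had_event) pairs; had_event is False for censored
    subjects, whose age is the age at last encounter.
    """
    events = Counter(age for age, had_event in subjects if had_event)
    exits = Counter(age for age, _ in subjects)  # events plus censorings
    at_risk = len(subjects)
    survival = 1.0
    curve = []
    for age in sorted(exits):
        d = events.get(age, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((age, 1 - survival))
        at_risk -= exits[age]  # everyone exiting at this age leaves the risk set
    return curve


# Hypothetical toy data: two diagnoses (ages 40 and 55) and two censored subjects.
toy = [(40, True), (50, False), (55, True), (60, False)]
print(km_cumulative_incidence(toy))
```

In the toy data, the cumulative incidence is 0.25 after the first diagnosis and 0.625 after the second, because only 2 subjects remain at risk at age 55.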
Results
Prevalence of P/LP germline DICER1 variants in UK Biobank and Geisinger
In this study, 469,787 individuals from the UK Biobank and 170,503 individuals from Geisinger were investigated. Both cohort demographics were similar to those previously reported (Supplemental Table 1). Across both cohorts, there were a total of 60 unique DICER1 variants: 58 unique LP variants, 1 VUS but probably LP, and 1 CNV deletion (determined in Geisinger only) in 78 individuals (Supplemental Table 2 [hereafter, “P/LP variants”]). There were 2 variants observed in both cohorts (c.4050+1G>A and c.1525C>T p.(Arg509∗)). Of the total of 60 unique variants, 59 variants were putative loss of function (including the CNV deletion), and 1 was a missense “hotspot” variant (c.5127T>A p.(Asp1709Glu); previously reported7). In the UK Biobank, there were a total of 10 recurrently observed variants, where 4 recurrently observed variants were in related DICER1 heterozygotes. In Geisinger, there were a total of 3 recurrently observed variants, where 2 recurrently observed variants were in related DICER1 heterozygotes (Supplemental Table 2).
In the UK Biobank, 46 unique variants in 57 individuals (1:8242, 95% CI: 1:6362-1:10,677) were observed; 9 of these individuals were closely related (<3rd degree) in 4 pedigrees, giving an adjusted prevalence of 1:8416; 95% CI: 1:6418-1:11,035 (Table 1). Of the 57 DICER1 heterozygotes, 28 (49.1%) were male. The median age was 69.9 years (range: 55-84 years); 9 (15.8%) were deceased.
Table 1.
Unique and total DICER1 pathogenic/likely pathogenic variant counts and prevalence in the UK Biobank and Geisinger cohorts in all participants and unrelated participants
| Cohort | Unique Variant Count | Number of Individuals | Prevalence |
|---|---|---|---|
| UK Biobank (all) | 46 (25 frameshift, 16 stopgain, 5 canonical splice site; copy-number analysis unavailable) | 57 | 1:8242 (95% CI: 1:6362-1:10,677) |
| UK Biobank (unrelated) | 46 | 52 | 1:8416 (95% CI: 1:6418-1:11,035) |
| Geisinger (all) | 16 (9 frameshift, 3 stopgain, 2 canonical splice site, 1 missense, 1 CNV deletion) | 21 | 1:8119 (95% CI: 1:5310-1:12,412) |
| Geisinger (unrelated) | 16 | 18 | 1:6099 (95% CI: 1:3858-1:9641) |
CNV, copy-number variation.
See Supplemental Table 2 for details on variant information.
In Geisinger, 16 unique variants in 21 individuals (1:8119; 95% CI: 1:5310-1:12,412) were observed; 5 individuals were closely related (<3rd degree) in 2 pedigrees, giving an adjusted prevalence of 1:6099 (95% CI: 1:3858-1:9641) (Table 1). Of these 21 DICER1 heterozygotes, 6 (28.6%) were male. The median age was 58.3 years (range: 27-94 years); 2 (9.5%) were deceased. Seven of the DICER1 variants were previously reported by us in the Geisinger cohort7; by the Variant Curation Expert Panel classification rules, 4 DICER1 variants were reclassified as VUS (Supplemental Table 3).
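The "1:N" prevalence format and its confidence interval can be reproduced with a few lines of arithmetic. The text does not name the interval method; the sketch below assumes a Wilson score interval, which happens to match the reported UK Biobank bounds, so treat that choice (and the helper names) as an assumption rather than the authors' stated method.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(cases: int, n: int, level: float = 0.95):
    """Wilson score interval for a binomial proportion (lower, upper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = cases / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def one_in(p: float) -> int:
    """Express a proportion as the N in '1:N'."""
    return round(1 / p)


# UK Biobank, all participants: 57 heterozygotes among 469,787 exomes.
low, high = wilson_interval(57, 469_787)
print(f"1:{one_in(57 / 469_787)} (95% CI: 1:{one_in(high)}-1:{one_in(low)})")
# prints "1:8242 (95% CI: 1:6362-1:10677)"
```

Note that the upper bound of the proportion gives the smaller N, which is why the interval is printed as 1:6362 to 1:10,677.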
Selection of controls in UK Biobank and Geisinger
In the UK Biobank, 398,883 individuals (unrelated: 373,897) who harbored B/LB variants or wild-type DICER1 were defined as controls. Likewise, in the Geisinger cohort, 167,990 individuals (unrelated: 109,789) who harbored B/LB or canonical DICER1 variation were defined as controls.
Power to detect predisposition to common and rare cancers in UK Biobank and Geisinger
Supplemental Figure 1 shows power as a function of presumed true odds ratio for a range of cancer rates in the UK Biobank cohort using the cohort-specific DICER1 heterozygote prevalence of 1:8416. (The rate of cancer in the UK Biobank cohort is 24%.) Supplemental Figure 2 shows power as a function of odds ratio for a range of cancer rates in the Geisinger cohort using the cohort-specific DICER1 heterozygote prevalence of 1:8119. (The rate of all cancer in the Geisinger cohort is 24%.) Both estimates assume that the sequenced cohort is a true sample of the adult population and that there were no ascertainment biases. Under these ideal circumstances, in the UK Biobank cohort, there was sufficient power (≥80%) to detect common cancers (≥5% rate, which would include many sex-specific cancers, such as breast and prostate) with odds ratio of at least 2.93. In the smaller Geisinger cohort, there was sufficient power (≥80%) to detect common cancers (≥12% rate) with odds ratio of at least 3.64.
Cancer phenotypes
Because germline DICER1 P/LP variants are known to be associated with a variety of cancers16 in children, all cancer-associated ICD10 codes were examined. In the UK Biobank, only fields 40001/40007 (underlying cause of death: ICD10) and 40006/40008 (Cancer Registry: type of cancer ICD10) were queried. In the Geisinger cohort, the Geisinger Cancer Registry (established in 1943) and linked EHR were queried. In the UK Biobank and Geisinger cohorts, there were 14 and 6 cancers among 12 (21%) and 5 (25%) DICER1 heterozygotes, respectively; cancers were not significantly enriched compared with controls in either cohort (Fisher’s exact test, P = .7174 and P = 1, respectively). Table 2 lists the neoplasms that were reported in the DICER1 heterozygotes in the 2 cohorts. Of note is that the meningioma observed in DICER1 heterozygotes in both the UK Biobank and Geisinger cohorts was coded as a malignancy. The individual in the Geisinger cohort with a DICER1 CNV deletion did not have a history of any tumors (benign or malignant).
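The enrichment comparison above uses Fisher’s exact test on a 2×2 table of cancer status by heterozygote status. The sketch below shows the two-sided test in plain Python. It is illustrative only: the function name is hypothetical, and the control counts in the example are invented placeholders (the text does not report them), so the resulting P value is not the published one. For biobank-scale counts, a library routine such as `scipy.stats.fisher_exact` would be far more practical than exact integer arithmetic.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / total

    observed = prob(a)
    p_value = 0.0
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        pk = prob(k)
        if pk <= observed * (1 + 1e-9):  # small tolerance for float ties
            p_value += pk
    return p_value


# 12 of 57 UK Biobank heterozygotes had cancer (from the text); the control
# row (240 of 1000) is a hypothetical placeholder, not the study's data.
print(fisher_exact_two_sided(12, 45, 240, 760))
```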
Table 2.
C-codes reported in pathogenic/likely pathogenic DICER1 heterozygotes in the UK Biobank and Geisinger cohorts
UK Biobank

| ID | DICER1 Variant | Sex | Age (y) | ICD9/10 | Age at Cancer (y) | Vital Status |
|---|---|---|---|---|---|---|
| UKB1 | p.(Asp1414fs) | Female | 78 | C92.0 Acute myeloid leukemia | 77 | Died 78.1 (due to AML) |
| UKB2 | p.(Ser1344fs) | Female | 57 | 1944 Pineoblastoma | 15 | |
| | | | | D32.0 Meningiomaᶜ | 50 | |
| UKB3 | p.(Ser1216fs) | Female | 64 | 630 Choriocarcinoma | 23 | |
| UKB4 | p.(Phe854fs) | Female | 72 | C50.5 Breast cancer | 62 | |
| UKB5 | p.(Tyr936∗) | Female | 81 | 1744 Breast cancer | 50 | |
| UKB6 | p.(Leu303fs) | Female | 67 | C50.4 Breast cancer | 56 | Died 67 (due to breast cancer) |
| UKB7 | p.(Cys1584fs) | Male | 76 | C61 Prostate cancer | 67 | |
| | | | | C44.4 Skin cancer, squamous cell carcinoma | 71 | |
| UKB8 | p.(Gln105∗) | Female | 72 | C25.1 Pancreatic cancer | 68 | Died 72 (due to pancreatic cancer) |
| UKB9 | p.(Glu221∗) | Female | 79 | C56 Ovarian cancer | 58 | |
| | | | | C50.4 Breast cancer | 68 | |
| UKB10 | p.(Val121fs) | Male | 76 | C91.1 Chronic lymphocytic leukemia | 72 | Died 76 |
| UKB11 | p.(Ser1629fs) | Female | 69 | C44.3 Skin cancer, basal cell carcinoma | 47 | |
| UKB12 | p.(Ser1101fs) | Male | 67 | C43.6 Skin cancer, superficial spreading melanoma | 62 | |

Geisinger

| ID | DICER1 Variant | Sex | Age (y) | ICD10 | Age at Cancer (y) | Vital Status |
|---|---|---|---|---|---|---|
| GHS1 | c.4050+1G>A | Female | 62 | C50.2 Breast cancer | 54 | |
| GHS2ᵃ | p.(Asn1668fs) | Male | 55 | C73.9 Papillary thyroid cancer | 36 | |
| GHS3ᵃ | p.(Ser1823fs) | Female | 44 | C75.3 Pineoblastoma | 14 | Died 44 |
| | | | | C70.0 Meningioma | 41 | |
| GHS4ᵇ | p.(Glu128fs) | Male | 67 | C34.1 Lung cancer | 62 | |
| GHS5ᵇ | p.(Glu128fs) | Female | 65 | C34.3 Lung cancer | 54 | |

AML, acute myeloid leukemia.
Please refer to Supplemental Table 2 for additional variant information.
ᵃ Previously reported.
ᵇ Related individuals.
ᶜ Meningioma found in this participant was reported in Cancer Registry with malignant histology.
Risk and penetrance of thyroid phenotypes
DICER1 heterozygotes were significantly enriched for thyroid disease (E01-E07; excluding medication/postinfection-associated thyroid codes E03.2, E03.3 and E06.4) with an approximately 4-fold increased risk for these phenotypes (Figure 1) in both cohorts. In both cohorts, DICER1 heterozygotes harbored only the E03 (“other hypothyroidism”), E04 (“other non-toxic goiter”), and E05 (“thyrotoxicosis”) codes; these 3 ICD10 codes were evaluated separately (Figure 1). DICER1 heterozygotes in UK Biobank were significantly enriched in the E03 and E04 codes, whereas Geisinger DICER1 heterozygotes were only significant for E04 (Figure 1). Because the Geisinger cohort was enriched for related individuals, to correct for relatedness, a SAIGE-GENE+ analysis was performed. SAIGE analysis confirmed that the thyroid phenotype (E01-E07 codes) was enriched in DICER1 heterozygotes compared with controls (P = .004). The individual in the Geisinger cohort with a CNV deletion did not have a history of thyroid disease.
Figure 1.
Risk of thyroid phenotypes in DICER1 heterozygotes in UK Biobank and Geisinger. Adjusted odds ratios, 95% CIs, and adjusted P values for hypothyroidism (E03), nontoxic goiter (E04), thyrotoxicosis (E05), and all thyroid codes (E01-E07) were calculated using logistic regression with age at enrollment (only for UK Biobank), sex, race, BMI, and smoking status as covariates.
The cumulative incidence (penetrance) of thyroid phenotypes was also evaluated using KM time-to-event and Cox-proportional hazard analyses. In the UK Biobank, the age-related penetrance of thyroid phenotypes for P/LP DICER1 variants was significantly higher than controls (hazard ratio [HR] 4.2 [95% CI: 2.5-7.1], log-rank P = 7.28E−8) and remained significant after adjusting for race, smoking, BMI, sex, and age at enrollment in a Cox-proportional hazard model (HR 4.3 [95% CI: 2.5-7.3], log-rank P = 4.86E−8) (Figure 2A, UK Biobank, all participants). Similar results were observed when the analyses were restricted to unrelated individuals in the cohort (HR 4.5 [95% CI: 2.6-7.7], log-rank P = 6.99E−8) (Supplemental Figure 3, UK Biobank, unrelated). Likewise, in the Geisinger cohort, a higher penetrance of thyroid phenotypes was observed in DICER1 heterozygotes compared with controls (HR 1.9 [95% CI: 0.9-3.8], log-rank P = .07). The difference was not statistically significant and remained not significant after adjusting for covariates (HR 1.9 [0.9-4.0], log-rank P = .08) likely because of the small sample size (Figure 2B [Geisinger, all participants]). Similar results were observed when the analyses were restricted to unrelated individuals (HR 1.8 [0.9-3.5], log-rank P = .11) (Supplemental Figure 4 [Geisinger, unrelated]).
Figure 2.
Cumulative incidence of thyroid phenotypes. Reverse Kaplan-Meier plots in DICER1 heterozygotes compared with controls for (A) UK Biobank and (B) Geisinger (adjusted for relatedness). Thyroid ICD10 codes, including E01-E07 (but excluding E03.2, E03.3, and E06.4), are considered events. The log-rank test P values from Cox-proportional hazard model were adjusted for age at enrollment (only for UK Biobank), sex, race, BMI, and smoking status. For (B), left-truncation bias correction was applied using first encounter date.
Genomic ascertainment of excess ICD10 phenotypes in DICER1 heterozygotes
Germline P/LP variants in DICER1 are linked to macrocephaly,17 as well as structural renal anomalies and dental and ocular phenotypes.2,3,18-20 The prevalence of noncancer ICD codes in DICER1 heterozygotes was compared with controls. In the UK Biobank cohort, any ICD10 code observed in more than 10% (at least 6) of DICER1 heterozygotes was tested for enrichment; for ICD10 (field id 41270), this resulted in a total of 24 ICD10 codes (Supplemental Table 4). Age of onset of diagnosis was not significantly different between DICER1 heterozygotes and controls (Supplemental Table 5). Six ICD10 codes were enriched >2-fold compared with controls (Supplemental Table 5); 2 of those phenotypes (base codes R50 and Z11) were statistically significant after Bonferroni correction (Figure 3A). To determine if the UK Biobank DICER1 heterozygotes with retention of urine (base code R33) and acute kidney failure (base code N17) harbored renal or urinary tract structural abnormalities, abdominal MRI (UK Biobank field 12224) was reviewed. None of the 57 DICER1 heterozygotes had an abdominal MRI performed. A review of all ICD10 codes in the 6 UK Biobank participants with an acute kidney failure code did not reveal a specific cause of that diagnosis, although all of them also harbored chronic renal failure and hypertension codes.
Figure 3.
Risk of other phenotypes in DICER1 heterozygotes. ICD10 codes enriched in DICER1 heterozygotes vs controls in (A) UK Biobank and (B) Geisinger. Adjusted odds ratios and 95% CIs were calculated using logistic regression with age at enrollment (only for UK Biobank), sex, race, BMI, and smoking status as covariates. Red font denotes statistical significance after Bonferroni correction (P < .002 for UK Biobank and P < .0004 for Geisinger).
In the Geisinger cohort, any ICD10 code observed in more than 20% (at least 4) of DICER1 heterozygotes was tested for enrichment; this resulted in a total of 112 ICD10 codes (Supplemental Table 6). There were 18 ICD10 codes enriched >2-fold compared with controls (Supplemental Table 5), and 2 phenotypes, E89 (“Postprocedural endocrine and metabolic complications and disorders, not elsewhere classified”) and E04 (“Other nontoxic goiter”) were statistically significant after Bonferroni correction (Figure 3B). Age of onset of diagnosis was not significantly different between DICER1 heterozygotes and controls after Bonferroni correction (Supplemental Table 5). There was no overlap in ICD codes across the 2 cohorts.
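The Bonferroni thresholds quoted in the Figure 3 legend follow from dividing a family-wise error rate by the number of ICD10 codes tested in each cohort. The sketch below assumes a family-wise alpha of 0.05, which the text does not state explicitly but which is consistent with the reported thresholds; the function names are hypothetical.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value: float, n_tests: int, alpha: float = 0.05) -> bool:
    """True if a nominal P value survives Bonferroni correction."""
    return p_value < bonferroni_threshold(n_tests, alpha)


# Numbers of ICD10 codes tested, as reported in the text.
for cohort, n_codes in {"UK Biobank": 24, "Geisinger": 112}.items():
    print(f"{cohort}: P < {bonferroni_threshold(n_codes):.5f}")
```

With 24 codes the threshold is about .0021, and with 112 codes about .00045, which round to the P < .002 and P < .0004 cutoffs in the legend.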
Structured chart review of 21 DICER1 heterozygotes in the Geisinger cohort
In the Geisinger cohort, a genetic counselor performed a structured chart review of the 21 DICER1 heterozygotes. The review confirmed diagnoses from the ICD codes. The keyword search revealed that 1 male heterozygous for the splice acceptor variant c.1510-1G>A had bilateral thyroid cysts at age 41 years and a son (not a participant in the DiscovEHR cohort) with a history of PPB. None of the 21 DICER1 heterozygotes had notation of “DICER1” (or similar) in their EHR.
Discussion
In this analysis, the prevalence of a DICER1 P/LP variant was determined and risk of thyroid, cancer, and noncancer phenotypes was quantified after genomic ascertainment in adults in 2 population-scale exome-sequenced cohorts linked to the electronic health record. The 2 estimates of P/LP DICER1 variant prevalence from the 2 different cohorts are remarkably congruent. The prevalence of a DICER1 P/LP variant in the Geisinger cohort in this analysis was lower than a previously published estimate,7 likely because of the use of more conservative and peer-reviewed variant classification.8 To our knowledge, estimates of P/LP DICER1 variant prevalence from the UK Biobank have not been previously reported.
The increased likelihood to develop multinodular goiter in DICER1 heterozygotes is well established.21 From a family-based cohort study of DICER1 heterozygotes, the cumulative incidence (penetrance) of multinodular goiter (or history of thyroidectomy) by age 40 years was estimated to be 75% in women and 17% in men.19 After genomic ascertainment, a significantly elevated risk of thyroid disease (not just multinodular goiter) was observed and quantified in the 2 different cohorts; a significant excess risk of “other nontoxic goiter” was also observed in both cohorts. The significant excess cumulative incidence of thyroid disease in UK Biobank (but not Geisinger) may be due to its larger sample size and lower rate of thyroid diagnoses in controls, which may reflect differences in the background rates of clinical thyroid imaging and testing at Geisinger and in the UK.
A comprehensive approach was taken to identify cancers in both cohorts. Of the 6 malignancies among the DICER1 heterozygotes in the Geisinger cohort, 2 (papillary thyroid cancer and pineoblastoma) were known to be DICER1-associated and had been previously reported.7 Three malignancies (breast and lung) were common and are not known to be associated with germline variation in DICER1. Although meningioma is typically histologically benign, the tumor observed in the Geisinger DICER1 heterozygote was assigned an ICD C-code by the Geisinger Cancer Registry. In the UK Biobank, there were 14 malignancies; only 1 (pineoblastoma) was known to be associated with variation in DICER1. In each cohort, the participant with a pineoblastoma developed a meningioma, presumably secondary to radiation exposure, decades after the diagnosis of pineoblastoma; meningioma is not known to be associated with germline DICER1 variation. With 1 exception, the remaining malignancies were common, diagnosed at a typical age for the tumor type, and not known to be associated with germline variation in DICER1. The 1 exception was the diagnosis of “choriocarcinoma” in a female at age 23 years made in the early 1980s in the UK Biobank cohort, well before the recognition of DICER1-related tumor predisposition. Overall, although the lack of a significant excess of cancers among older DICER1 heterozygotes vs controls in both cohorts is noteworthy, this study is underpowered to detect modest (<2.5×, UK Biobank; <3.5×, Geisinger) effect sizes, especially for sex-specific common cancers. However, given the power calculation assumptions, both cohorts (but especially UK Biobank) were well powered to find larger effect sizes (>3.5×) for common cancers that would be expected from a cancer susceptibility gene of moderate-to-high penetrance. The number of observations was modest; this limitation is acknowledged, and the findings merit replication in other genomically ascertained cohorts.
An exploratory analysis was performed to investigate the excess (if any) of ICD codes in genomically ascertained DICER1 heterozygotes vs controls. The top significant findings after Bonferroni correction from the UK Biobank cohort (“fever of other and unknown origin” and “encounter for screening for infectious and parasitic diseases”; Figure 3A, Supplemental Table 4) are of unclear etiology. The remaining 4 excess ICD codes (whether statistically significant or not) in the UK Biobank cohort are plausibly related to known DICER1 biology. One (“other hypothyroidism” [E03]) is noted above. Two codes (“retention of urine” [R33] and “acute kidney failure” [N17]) may be related to abnormalities in the genitourinary system. Renal cysts and structural kidney and ureter abnormalities are documented in DICER1 heterozygotes,1,3 and there is evidence of a role for DICER1 in kidney pathology22 and proteinuria from a rare-variant PheWAS using UK Biobank data.23 Lastly, 1 code (“Other diseases of intestine” [K63]) arose from 7 individuals with a code for “polyp of colon”; all diagnoses were made at age 60 years or older. Although juvenile-type polyps in the small intestine have been reported in DICER1 heterozygotes or children with PPB,24,25 a link to an excess risk of colon polyps (of any type) has not been established.
The most significant excess ICD codes after Bonferroni correction in the Geisinger cohort (Figure 3B, Supplemental Table 6) are plausibly related to variation in DICER1: “post-procedure care from endocrine procedures” (eg, thyroidectomy) and “other nontoxic goiter.” Of the remaining 16 excess ICD codes (whether statistically significant or not) in the Geisinger cohort, some may plausibly arise from germline DICER1 variation based on known clinical features and biology. These include “other retinal disorders” (H35), which includes 5 diagnoses related to macular degeneration, a process linked to DICER1 deficit18,26 in humans. “Chronic rhinitis, nasopharyngitis, and pharyngitis” (J31) is intriguing given the increased risk of nasal chondromesenchymal hamartomas in DICER1 heterozygotes.27 The nominal excess of “pneumonia, unspecified organism” (J18) codes is of unclear etiology. Lastly, there were multiple orthopedic diagnoses related to spondylosis (M46, M67, and M47). Interestingly, in a rare-variant PheWAS using UK Biobank data,23 a DICER1 variant was observed in excess, although not at genome-wide significance, in individuals with a code of “other spondylosis with myelopathy (cervical region).”
There are limitations to this study. ICD coding was created for use in billing, not research. The studied cohorts are predominantly of European ancestry. The modest number of individuals with DICER1 P/LP variants limits power. CNV data were not available from the UK Biobank; thus, DICER1 heterozygotes were likely undercounted as cases and some may be in the control group, but these effects were likely modest. The median age in both cohorts was around 60 years. Some DICER1 heterozygotes may not have survived long enough to enroll in a study of older adults. Of those who did survive, recall bias and/or lost records may have frustrated attempts to document childhood illnesses, including severe ones such as cancer. In the Geisinger cohort, there were ∼3000 individuals <18 years old; however, none of them harbored a DICER1 P/LP variant.
Enrollment in the studied populations was subject to ascertainment biases as individuals with certain early childhood conditions leading to death or disabilities would be less likely to participate in these cohorts. The “healthy volunteer” bias (compared with the UK population) of the UK Biobank has been reported before.28
In summary, genomic ascertainment of germline DICER1 P/LP variants in adults in 2 population-scale, exome-sequenced cohorts provided refined determinations of their prevalence, estimated the penetrance of thyroid disease in older adults, and quantified excess ICD diagnoses. Additional studies in larger, more diverse cohorts are needed to follow up these observations.
Data Availability
The data supporting the findings of this article are reported in the main text, figures, and tables. Data to reproduce the results are available to qualified academic noncommercial researchers under a data access agreement.
ORCIDs
Jung Kim: http://orcid.org/0000-0001-6274-2841
Jeremy Haley: http://orcid.org/0000-0002-3279-7873
Jessica N. Hatton: http://orcid.org/0000-0003-2104-171X
Uyenlinh L. Mirshahi: http://orcid.org/0000-0003-4972-5451
H. Shanker Rao: http://orcid.org/0000-0001-6827-1470
Mark F. Ramos: http://orcid.org/0000-0002-5069-2397
Diane Smelser: http://orcid.org/0000-0002-3925-8864
Gretchen M. Urban: http://orcid.org/0009-0002-0990-6872
Kris Ann P. Schultz: http://orcid.org/0000-0002-1788-5832
David J. Carey: http://orcid.org/0000-0001-6404-1950
Douglas R. Stewart: http://orcid.org/0000-0001-8193-1488
Conflict of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors would like to acknowledge the participants of the MyCode Community Initiative for the use of their genomic and electronic health information, without which parts of this study would not be possible. This work used the computational resources of the National Institutes of Health High Performance Computing Biowulf cluster.
Funding
This work was supported by the Intramural Research program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD. Kris Ann Schultz receives funding from National Institutes of Health National Cancer Institute grant R01/R37CA244940, Children’s Minnesota Foundation, Pine Tree Apple Classic Fund and Rein in Sarcoma. The patient enrollment and exome sequencing were funded by Regeneron Genetics Center (Tarrytown, NY).
Author Information
Conceptualization: D.R.S., J.K.; Data Curation: J.K., J.H., J.N.H., U.L.M., G.M.U.; Formal Analysis: J.K., J.H., H.S.R., M.F.R., U.L.M., D.S.; Funding Acquisition: D.R.S.; Project Administration: D.J.C., D.R.S.; Supervision: D.J.C., D.R.S.; Writing-original draft: J.K., D.R.S.; Writing-review and editing: J.K., J.H., J.N.H., U.L.M., H.S.R., M.F.R., D.S., G.M.U., K.A.P.S., D.J.C., D.R.S.
Ethics Declaration
For UK Biobank, human subjects’ protection and review were conducted through the North West Multi-centre Research Ethics Committee as a Research Tissue Bank (https://www.ukbiobank.ac.uk/learn-more-about-uk-biobank/about-us/ethics). This approval does not require each researcher to obtain a separate institutional review board approval. This research has been conducted using the UK Biobank Resource under Application Number 54389. For Geisinger, the study was approved by the Geisinger Institutional Review Board; participants consented to broad research use of their exome and linked electronic health record (EHR) data. Data from both the Geisinger and UK Biobank studies were deidentified.
Footnotes
The Article Publishing Charge (APC) for this article was paid by Douglas R. Stewart.
Additional Information
The online version of this article (https://doi.org/10.1016/j.gimo.2024.101846) contains supplemental material, which is available to authorized users.
References
- 1. Schultz K.A.P., Stewart D.R., Kamihara J., et al. DICER1 tumor predisposition. In: Adam M.P., Mirzaa G.M., Pagon R.A., et al., editors. GeneReviews®. University of Washington; Seattle: 1993-2020. pp. 1–34.
- 2. Choi S., Lee J.S., Bassim C.W., et al. Dental abnormalities in individuals with pathogenic germline variation in DICER1. Am J Med Genet A. 2019;179(9):1820–1825. doi: 10.1002/ajmg.a.61292.
- 3. Khan N.E., Ling A., Raske M.E., et al. Structural renal abnormalities in the DICER1 syndrome: a family-based cohort study. Pediatr Nephrol. 2018;33(12):2281–2288. doi: 10.1007/s00467-018-4040-1.
- 4. de Kock L., Wu M.K., Foulkes W.D. Ten years of DICER1 mutations: provenance, distribution, and associated phenotypes. Hum Mutat. 2019;40(11):1939–1953. doi: 10.1002/humu.23877.
- 5. Stewart D.R., Best A.F., Williams G.M., et al. Neoplasm risk among individuals with a pathogenic germline variant in DICER1. J Clin Oncol. 2019;37(8):668–676. doi: 10.1200/JCO.2018.78.4678.
- 6. Kim J., Field A., Schultz K.A.P., Hill D.A., Stewart D.R. The prevalence of DICER1 pathogenic variation in population databases. Int J Cancer. 2017;141(10):2030–2036. doi: 10.1002/ijc.30907.
- 7. Mirshahi U.L., Kim J., Best A.F., et al. A genome-first approach to characterize DICER1 pathogenic variant prevalence, penetrance, and phenotype. JAMA Netw Open. 2021;4(2) doi: 10.1001/jamanetworkopen.2021.0112.
- 8. Hatton J.N., Frone M.N., Cox H.C., et al. Specifications of the ACMG/AMP variant classification guidelines for germline DICER1 variant curation. Hum Mutat. 2023;2023 doi: 10.1155/2023/9537832.
- 9. Backman J.D., Li A.H., Marcketta A., et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599(7886):628–634. doi: 10.1038/s41586-021-04103-z.
- 10. Carey D.J., Fetterolf S.N., Davis F.D., et al. The Geisinger MyCode community health initiative: an electronic health record-linked biobank for precision medicine research. Genet Med. 2016;18(9):906–913. doi: 10.1038/gim.2015.187.
- 11. Packer J.S., Maxwell E.K., O’Dushlaine C., et al. CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data. Bioinformatics. 2016;32(1):133–135. doi: 10.1093/bioinformatics/btv547.
- 12. Fortier N., Rudy G., Scherer A. Detection of CNVs in NGS data using VS-CNV. Methods Mol Biol. 2018;1833:115–127. doi: 10.1007/978-1-4939-8666-8_9.
- 13. Zhou W., Nielsen J.B., Fritsche L.G., et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018;50(9):1335–1341. doi: 10.1038/s41588-018-0184-y.
- 14. Chow S.C., Shao J., Wang H. Sample Size Calculations in Clinical Research. 2nd ed. Chapman & Hall/CRC; 2008.
- 15. Staples J., Maxwell E.K., Gosalia N., et al. Profiling and leveraging relatedness in a precision medicine cohort of 92,455 exomes. Am J Hum Genet. 2018;102(5):874–889. doi: 10.1016/j.ajhg.2018.03.012.
- 16. Kommoss F.K.F., Chong A.S., Chong A.L., et al. Genomic characterization of DICER1-associated neoplasms uncovers molecular classes. Nat Commun. 2023;14(1):1677. doi: 10.1038/s41467-023-37092-w.
- 17. Khan N.E., Bauer A.J., Doros L., et al. Macrocephaly associated with the DICER1 syndrome. Genet Med. 2017;19(2):244–248. doi: 10.1038/gim.2016.83.
- 18. Kaneko H., Dridi S., Tarallo V., et al. DICER1 deficit induces Alu RNA toxicity in age-related macular degeneration. Nature. 2011;471(7338):325–330. doi: 10.1038/nature09830.
- 19. Khan N.E., Bauer A.J., Schultz K.A.P., et al. Quantification of thyroid cancer and multinodular goiter risk in the DICER1 syndrome: a family-based cohort study. J Clin Endocrinol Metab. 2017;102(5):1614–1622. doi: 10.1210/jc.2016-2954.
- 20. Huryn L.A., Turriff A., Harney L.A., et al. DICER1 syndrome: characterization of the ocular phenotype in a family-based cohort study. Ophthalmology. 2019;126(2):296–304. doi: 10.1016/j.ophtha.2018.09.038.
- 21. Rio Frio T., Bahubeshi A., Kanellopoulou C., et al. DICER1 mutations in familial multinodular goiter with and without ovarian Sertoli-Leydig cell tumors. JAMA. 2011;305(1):68–77. doi: 10.1001/jama.2010.1910.
- 22. Ho J.J., Marsden P.A. Dicer cuts the kidney. J Am Soc Nephrol. 2008;19(11):2043–2046. doi: 10.1681/ASN.2008090986.
- 23. Wang Q., Dhindsa R.S., Carss K., et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature. 2021;597(7877):527–532. doi: 10.1038/s41586-021-03855-y.
- 24. Priest J.R., Williams G.M., Hill D.A., Dehner L.P., Jaffé A. Pulmonary cysts in early childhood and the risk of malignancy. Pediatr Pulmonol. 2009;44(1):14–30. doi: 10.1002/ppul.20917.
- 25. Foulkes W.D., Bahubeshi A., Hamel N., et al. Extending the phenotypes associated with DICER1 mutations. Hum Mutat. 2011;32(12):1381–1384. doi: 10.1002/humu.21600.
- 26. Tarallo V., Hirano Y., Gelfand B.D., et al. DICER1 loss and Alu RNA induce age-related macular degeneration via the NLRP3 inflammasome and MyD88. Cell. 2012;149(4):847–859. doi: 10.1016/j.cell.2012.03.036.
- 27. Stewart D.R., Messinger Y., Williams G.M., et al. Nasal chondromesenchymal hamartomas arise secondary to germline and somatic mutations of DICER1 in the pleuropulmonary blastoma tumor predisposition disorder. Hum Genet. 2014;133(11):1443–1450. doi: 10.1007/s00439-014-1474-9.
- 28. Fry A., Littlejohns T.J., Sudlow C., et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol. 2017;186(9):1026–1034. doi: 10.1093/aje/kwx246.



