Abstract
Background:
Although respiratory pathology is known to develop in young children with cystic fibrosis (CF), the determinants of early-onset lung disease have not been elucidated.
Objective:
We aimed to determine the impact of potential intrinsic and extrinsic risk factors during the first 3 years of life, testing the hypothesis that both contribute significantly to early-onset CF lung disease.
Design:
We studied 104 infants born during 2012-2017, diagnosed through newborn screening by age 3 months and evaluated comprehensively to 36 months of age. Lung disease manifestations were quantified with a new scoring system known as CFELD for Cystic Fibrosis Early-onset Lung Disease. The variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene were determined and categorized. Whole genome sequencing was performed on each subject and the data transformed to polygenic risk scores (PRS) that aggregate variants associated with lung function. Extrinsic factors included socioeconomic status (SES) indicators and environmental experiences such as exposures to smoking, pets, and daycare.
Results:
We found by univariate analysis that CFTR genotype and genetic modifiers aggregated by the PRS method were significantly associated with early-onset CF lung disease. Ordinal logistic regression analysis demonstrated that high and stable SES (maternal education ≥community college, stable 2-parent home, and not receiving Medicaid) and better growth (weight-for-age and height-for-age z-scores) reduced the risk of more severe CFELD, while exposure to smoking and daycare ≥20 hours/week increased it.
Conclusions:
Extrinsic, modifiable determinants are influential early and potentially as important as the intrinsic risk factors in the onset of CF lung disease.
Keywords: cystic fibrosis, whole genome sequencing, polygenic risk score, socioeconomic status, maternal education
INTRODUCTION
The course of cystic fibrosis (CF) due to pathogenic variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene is dominated by its impact on the respiratory system.1 It is known from histopathological studies on autopsied infants2,3 and chest CT observations4–6 on patients diagnosed through newborn screening that lung disease develops as early as 10 weeks of age,4,5 and nearly one-third have bronchiectasis by 2-4 months of age.6 In addition, early development of lung disease, defined herein as during the first 3 years of life,7 correlates with its progression in children with CF.8,9 Yet very little attention has been given to the timing and determinants of early onset of CF lung disease, or to the interplay of intrinsic/genetic10–12 and extrinsic/environmental risk factors in young children that potentially determine prognosis as patients age.13–15 Thus, important gaps in knowledge exist with regard to the early onset of lung disease, but there are clues from a variety of studies.11–16 This topic is increasingly important in the era of CFTR modulators recently approved for children17 because decision-making on when to initiate such expensive therapy can be challenging.
The characteristic clinical features of CF and our unique cohort of young children in the longitudinal FIRST (Feeding Infants Right… from the STart) project18 enabled us to develop a clinical severity scoring system for infants and toddlers that we refer to as CFELD for CF Early-onset Lung Disease.7 Its first application confirmed the heterogeneity of CF lung disease, particularly in its onset between 1 and 3 years of age.7 Because we performed whole genome sequencing (WGS) on the majority of patients,19 we were afforded an unprecedented opportunity to study the impact of CFTR genotype and potential non-CFTR genetic factors (modifiers) in addition to extrinsic risk factors. Genetic modifiers were of special interest in this study since lung disease manifestations vary greatly even among patients with the same CFTR genotype such as homozygous F508del,10 yet they have not been investigated before with regard to the onset of lung disease in young children with CF. Rather than focusing on one or a limited number of potential modifiers to evaluate risk alleles, we elected to employ the polygenic risk score (PRS) strategy20 using aggregated variants to estimate an overall genetic effect on CFELD severity,21 a method that has been used successfully in cardiovascular22 and cancer research.20 The prospective FIRST study design provided us with an abundance of extrinsic variables recorded systematically that include care variations and socioeconomic status (SES) indicators.13–15 Our hypothesis is that extrinsic/environmental factors are at least as important as genetic predisposition for developing the early-onset lung disease phenotype. Thus, the overall goal of this study was to delineate prospectively and comprehensively the effect of genetic and socioenvironmental determinants on early-onset lung disease in children with CF during the first 3 years of life.
MATERIALS AND METHODS
Study Design and Population
The complete FIRST cohort and its design and data collection are described elsewhere.7,18 Briefly, we evaluated children born during 2012-2017, enrolled after diagnosis through newborn screening at six CF Centers (Madison and Milwaukee, WI; Boston, MA; Indianapolis, IN; Salt Lake City, UT; and Chicago, IL), and followed with study visits conducted in conjunction with routine care according to clinical practice guidelines for treatment and regular follow-up evaluations,23,24 i.e., monthly after diagnosis until age 6 months, every 2 months from 6 to 12 months, and every 3 months thereafter. The FIRST project was approved by the Institutional Review Boards at all participating institutions. Informed written consent was obtained from the parents/guardians of all participating patients. The WGS component19 was initiated in 2017. Of the 145 infants who completed follow-up to ≥3 years of age7,18 and were thus eligible for WGS, the families of 116 subjects were available to approach for consent, and 90% (N=104) agreed.
Outcome measure: CF early-onset lung disease (CFELD) severity
The development of the CFELD scoring system was recently published.7 Briefly, clinical manifestations of CF lung disease (respiratory symptoms, pulmonary exacerbations, microbial infections, and hospitalizations) were recorded systematically on a pulmonary interval history form at each CF center visit (totaling 1612 forms from 104 subjects in their first 3 years of life). These data were used to derive an overall CFELD score for each subject, ranging from zero (best) to 12 (worst). The CFELD scores were further classified into five severity categories: asymptomatic (zero), minimal (1-2), mild (3-4), moderate (5-8), and severe (9-12).
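As an illustration only, the score-to-category boundaries listed above can be written as a small lookup function. The sketch below is in Python, which is not the software used in this study, and it assumes that CFELD scores are whole numbers; the function name `cfeld_category` is hypothetical.

```python
def cfeld_category(score: int) -> str:
    """Map a CFELD score (0 = best, 12 = worst) to its severity category.

    Assumption for illustration: scores are integers from 0 to 12.
    """
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("CFELD score must be an integer")
    if not 0 <= score <= 12:
        raise ValueError("CFELD score must be between 0 and 12")
    if score == 0:
        return "asymptomatic"
    if score <= 2:
        return "minimal"
    if score <= 4:
        return "mild"
    if score <= 8:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    for s in (0, 2, 4, 8, 12):
        print(s, cfeld_category(s))
```

The five category labels returned here are the ordered outcome levels used later in the ordinal logistic regression.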
Extrinsic risk factors: socioenvironmental exposure and CF clinical care
Data on SES and environmental risk factors listed in Table 1 were collected at enrollment and annually thereafter via parent questionnaires (available in the online supplement), with a total of 474 questionnaires collected in the first 3 years of life. CF clinical care data included initiation of CF therapy as indicated by the age at the first CF center visit, pancreatic sufficiency or insufficiency as assessed by serial measurements of fecal elastase-1 as described previously,18 and use of CFTR modulators (ivacaftor after 2015 and ivacaftor/lumacaftor after 2018) that were prospectively recorded in pulmonary interval history forms.7 In addition, growth (recumbent length before 2 years of age, standing height after 2 years of age, and weights) was measured at each study visit, and z-scores for weight-for-age and length-for-age were calculated based on the WHO references for age 0-24 months and CDC references for age 25-36 months as recommended by the CDC and the American Academy of Pediatrics.25,26
Table 1.
Characteristics of the study cohort
Characteristic | WGS cohort (n=104) | Whole cohort (n=145) |
---|---|---|
CF-related characteristics: | ||
Age at first CF center visit | ||
At birth (meconium ileus presentation) | 16 (15%)a | 23 (16%) |
1-2 weeks | 55 (53%) | 80 (55%) |
3-4 weeks | 21 (20%) | 26 (18%) |
≥5 weeks | 12 (12%) | 16 (11%) |
Pancreatic phenotype at age 3 years by fecal elastase-1 | ||
Pancreatic sufficient (PS)b | 9 (9%) | 19 (13%) |
Pancreatic insufficient (PI) | 95 (91%) | 126 (87%) |
CFTR genotype | ||
Homozygous F508del | 52 (50%) | 74 (51%) |
F508del and another PI-associated variantc | 38 (36%) | 48 (33%) |
Non-F508del PI-associated variants in both alleles | 6 ( 6%) | 7 ( 5%) |
At least one PS-associated variantd | 8 ( 8%) | 16 (11%) |
CFTR modulator (ivacaftor) therapy before 3 years of age | 10 (10%) | 11 ( 8%) |
Age starting ivacaftor therapy (years) | 1.9 ± 0.7 | 1.9 ± 0.7 |
Years of ivacaftor therapy | 1.2 ± 0.7 | 1.1 ± 0.7 |
Growth status at birth and 3 years of age | ||
Weight-for-age z-score at birth | −0.22 ± 1.03 | −0.18 ± 1.04 |
Length-for-age z-score at birth | 0.15 ± 1.21 | 0.21 ± 1.18 |
Weight-for-age z-score at 3 years of age | 0.19 ± 0.87 | 0.20 ± 0.86 |
Length-for-age z-score at 3 years of age | −0.08 ± 0.88 | −0.07 ± 0.86 |
Demographic and socioeconomic status (SES) characteristics: | ||
Female sex | 47 (45%) | 66 (46%) |
Race: Non-White | 3 ( 3%) | 4 (3%) |
Ethnicity: Hispanic | 6 ( 6%) | 7 (5%) |
Parental education at or above community college | ||
Both parents | 61 (59%) | 86 (59%) |
Mother only | 13 (12%) | 14 (10%) |
Father only | 9 ( 9%) | 13 ( 9%) |
Neither parent | 21 (20%) | 32 (22%) |
Household annual income | ||
< $40,000 | 33 (32%) | 41 (28%) |
$40,000-$79,000 | 39 (37%) | 47 (32%) |
≥$80,000 | 32 (31%) | 54 (37%) |
Having stable 2-parent home in the first 3 years of lifee | 88 (85%) | 126 (87%) |
Receiving Medicaid or other public health insurance | 41 (39%) | 53 (37%) |
Environmental characteristics: | ||
Having sibling(s) with CF | 16 (15%) | 23 (16%) |
Having any pets at home | 67 (64%) | 97 (67%) |
Dogs | 55 (53%) | 79 (54%) |
Cats | 27 (26%) | 39 (27%) |
Exposure to passive smoke | 10 (10%) | 15 (10%) |
Daycare attendance before 3 years of age | ||
No daycare | 83 (80%) | 113 (78%) |
6-9 hours per week | 3 ( 3%) | 5 ( 3%) |
≥20 hours per week | 18 (17%) | 27 (19%) |
a: Values are N (%) of subjects or mean ± SD.
b: Defined by all fecal elastase-1 values >200 μg/g during the first 3 years of life.
c, d: See Table 2.
e: Defined by 2-parent households that did not change during the first 3 years of life.
Intrinsic risk factors: demographic characteristics and genetic factors
Demographics (race and ethnicity) were recorded by the research coordinators at enrollment.7 Two genetic factors were examined, namely, CFTR genotype and PRS constructed to aggregate variants associated with lung function. Data on the CFTR gene variants were obtained from newborn screening reports and confirmed with WGS. Thirty-one variants were identified in our cohort (Table 2), and each was categorized as a PI-associated variant (PI-v) when the percentage of PI for that variant was >50% or as a PS-associated variant (PS-v) when the PI percentage was <50%, using the online CFTR2 database (https://cftr2.org/mutations_history, data version January 10, 2020).
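The categorization rule above, together with the four genotype groups reported in Table 1, can be sketched in a few lines. This is an illustrative Python sketch and not the authors' code; the function names and the input layout are hypothetical. The fallback for variants absent from CFTR2 follows the Table 2 footnote (all fecal elastase values <200 μg/g), and a value of exactly 50% is rejected because the text does not define it.

```python
from typing import Optional, Sequence, Tuple

F508DEL = "c.1521_1523delCTT"  # cDNA name of F508del, as listed in Table 2


def classify_variant(pct_pi_cftr2: Optional[float],
                     all_elastase_below_200: Optional[bool] = None) -> str:
    """Return 'PI-v' or 'PS-v' for one CFTR variant."""
    if pct_pi_cftr2 is not None:
        if pct_pi_cftr2 > 50:
            return "PI-v"
        if pct_pi_cftr2 < 50:
            return "PS-v"
        raise ValueError("exactly 50% PI is not covered by the stated rule")
    # Variant not reported in CFTR2: fall back on the clinical phenotype.
    if all_elastase_below_200:
        return "PI-v"
    raise ValueError("cannot classify without CFTR2 data or elastase evidence")


def genotype_category(alleles: Sequence[Tuple[str, str]]) -> str:
    """Group a subject by two (cdna_name, variant_class) pairs."""
    (name1, cls1), (name2, cls2) = alleles
    if "PS-v" in (cls1, cls2):
        return "At least one PS-associated variant"
    n_f508del = (name1 == F508DEL) + (name2 == F508DEL)
    if n_f508del == 2:
        return "Homozygous F508del"
    if n_f508del == 1:
        return "F508del and another PI-associated variant"
    return "Non-F508del PI-associated variants in both alleles"


if __name__ == "__main__":
    g551d = ("c.1652G>A", classify_variant(96))      # %PI from Table 2
    splice = ("c.2657+5G>A", classify_variant(43))   # %PI from Table 2
    print(genotype_category([g551d, splice]))
```

With the Table 2 values, F508del (98% PI) is classified as PI-v and c.2657+5G>A (43% PI) as PS-v, so a subject carrying both falls into the "at least one PS-associated variant" group.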
Table 2.
CFTR variants detected in 104 children in the FIRST cohort
Variant c.DNA name (legacy name) from CFTR2a | Allele count (frequency) | Number of subjectsb | CFTR2 allele frequency | %PI in CFTR2 | %PI based on fecal elastase |
---|---|---|---|---|---|
PI-associated variantsc: | |||||
c.1521_1523delCTT (F508del) | 149 (71.6%) | 97d | 69.7% | 98% | 92% |
c.1652G>A (G551D) | 9 (4.3%) | 9e | 2.10% | 96% | 100% |
c.1624G>T (G542X) | 9 (4.3%) | 7f | 2.54% | 98% | 89% |
c.489+1G>T (621+1G→T) | 6 (2.9%) | 6 | 0.93% | 99% | 100% |
c.3909C>G (N1303K) | 3 (1.4%) | 3 | 1.58% | 98% | 100% |
c.4077_4080delTGTTinsAA (4209TGTT→AA) | 2 (1.0%) | 2 | 0.01% | 100% | 100% |
c.1657C>T (R553X) | 2 (1.0%) | 2 | 0.93% | 97% | 100% |
c.1680-886A>G (1811+1634A→G) | 2 (1.0%) | 2 | 0.06% | 96% | 100% |
c.1477C>T (Q493X) | 2 (1.0%) | 2g | 0.21% | 96% | 100% |
c.1400T>C (L467P) | 1 (0.5%) | 1 | 0.03% | 100% | 100% |
c.54-5940_273+10250del21kb (CFTRdele 2,3) | 1 (0.5%) | 1 | 0.29% | 100% | 100% |
c.274-1G>A (406-1G→A) | 1 (0.5%) | 1f | 0.03% | 100% | 100% |
c.1766+1G>A (1898+1G→A) | 1 (0.5%) | 1 | 0.30% | 99% | 100% |
c.3528delC (3659delC) | 1 (0.5%) | 1 | 0.38% | 99% | 100% |
c.1021_1022dupTC (1154insTC) | 1 (0.5%) | 1 | 0.15% | 99% | 100% |
c.1679G>C (R560T) | 1 (0.5%) | 1 | 0.24% | 98% | 100% |
c.1519_1521delATC (I507del) | 1 (0.5%) | 1 | 0.46% | 98% | 100% |
c.1585-1G>A (1717-1G→A) | 1 (0.5%) | 1 | 0.86% | 97% | 100% |
c.2051_2052delAAinsG (2183AA→G) | 1 (0.5%) | 1f | 0.38% | 96% | 100% |
c.4242+1G>T (4374+1G→T) | 1 (0.5%) | 1 | 0.01% | 93% | 100% |
c.2052dupA (2184insA) | 1 (0.5%) | 1 | 0.23% | 85% | 0% |
c.2988G>A (3120G→A) | 1 (0.5%) | 1 | 0.06% | 55% | 100% |
c.2554_2555insTh | 1 (0.5%) | 1 | -- | -- | 100% |
c.1469delTh | 1 (0.5%) | 1 | -- | -- | 100% |
PS-associated variantsi: | |||||
c.2657+5G>A (2789+5G→A) | 2 (1.0%) | 2j | 0.72% | 43% | 0% |
c.1021T>C (S341P) | 1 (0.5%) | 1 | 0.02% | 38% | 0% |
c.349C>T (R117C) | 1 (0.5%) | 1 | 0.10% | 24% | 0% |
c.617T>G (L206W) | 1 (0.5%) | 1 | 0.23% | 20% | 0% |
c.328G>C (D110H) | 1 (0.5%) | 1 | 0.05% | 17% | 0% |
c.[350G>A;1210-12T[7]] (R117H; 7T) | 1 (0.5%) | 1 | 0.09% | 15% | 0% |
c.4364C>G (S1455X) | 1 (0.5%) | 1 | 0.01% | 10% | 0% |
a: CFTR2 (https://cftr2.org/mutations_history), data version January 10, 2020.
b: In this column, numbers without superscripts indicate that subjects had c.1521_1523delCTT in their second allele.
c: Defined by CFTR2 %PI higher than 50%.
d: 52 had homozygous c.1521_1523delCTT and 45 had c.1521_1523delCTT and a different variant in the second allele.
e: 7 had c.1652G>A/c.1521_1523delCTT; 1 had c.1652G>A/c.274-1G>A; 1 had c.1652G>A/c.2657+5G>A.
f: 2 had homozygous c.1624G>T; 2 had c.1624G>T/c.1521_1523delCTT; 1 had c.1624G>T/c.2051_2052delAAinsG; 1 had c.1624G>T/c.1477C>T; 1 had c.1624G>T/c.274-1G>A.
g: 1 had c.1477C>T/c.1521_1523delCTT; 1 had c.1624G>T/c.1477C>T, which is the same subject listed in footnote f.
h: Not reported in CFTR2; categorized as PI-associated variants based on all fecal elastase values <200 μg/g stool.
i: Defined by CFTR2 %PI lower than 50%.
j: 1 had c.2657+5G>A/c.1521_1523delCTT; 1 had c.2657+5G>A/c.1652G>A, which is the same subject listed in footnote e.
WGS was performed on leukocyte DNA extracted from EDTA-treated blood using Illumina HiSeqX and NovaSeq 6000 platforms, aiming for a mean depth of 30X as described previously.19,27 DNA sequence reads were aligned to reference genome GRCh38. The resultant variant call format (VCF) files were processed, and only bi-allelic single-nucleotide polymorphisms (SNPs) on autosomes were kept for statistical analysis.
Statistical analysis
Our statistical analysis plan featured the state-of-the-art, well-established PRS-CS method,28 which leverages genome-wide association statistics and employs Bayesian regularization to calculate PRS and improve its predictive performance. More specifically, to calculate PRS for each subject, we used summary statistics from a genome-wide association study (GWAS) of lung function with 400,102 individuals.21 SNP coordinates in the GWAS summary statistics were lifted over to genome build GRCh38 using the UCSC liftOver software. PRS was constructed using the PRS-CS method28 for the lung function parameter forced expiratory volume in one second, with a total of 1,060,005 variants; higher PRS therefore indicate better lung function. PRS were standardized to z-scores with a mean of 0 and a variance of 1 in all analyses.
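To make the scoring and standardization steps concrete, the following minimal Python sketch computes a raw score as a weighted sum of effect-allele dosages and then converts a set of raw scores to z-scores. It is not the PRS-CS software and not the authors' pipeline: the per-variant weights stand in for the effect sizes that a method such as PRS-CS would supply, the toy numbers are invented, the function names are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def raw_prs(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1, or 2) for one subject."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores: Sequence[float]) -> List[float]:
    """Convert raw scores to z-scores with mean 0 and variance 1."""
    mu = mean(scores)
    sd = pstdev(scores)  # assumption: population SD
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


if __name__ == "__main__":
    weights = [0.10, -0.20, 0.05]             # toy effect sizes, not real data
    cohort = [[0, 1, 2], [2, 0, 1], [1, 1, 1]]  # toy dosages for 3 subjects
    raw = [raw_prs(d, weights) for d in cohort]
    print(standardize(raw))
```

Because the weights here are effects on forced expiratory volume in one second, a larger z-score corresponds to a genetic profile associated with better lung function.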
Univariate analysis of each risk factor used analysis of variance to compare means and the median test to compare medians. The chi-square test or Fisher’s exact test (when the sample size was <5 in any subgroup) was used to compare proportions. Ordinal logistic regression was conducted to assess the likelihood of having more severe early-onset CF lung disease, incorporating multiple intrinsic and extrinsic risk factors and using the five CFELD categories in the following order: asymptomatic, minimal, mild, moderate, and severe. Statistical significance is presented as P values and odds ratios with 95% confidence intervals, plotted on the log scale in the forest plot in Figure 3. We also calculated partial R² to quantify the relative contributions of intrinsic and extrinsic risk factors to the continuous CFELD score. All analyses were performed by using SAS or R.
Figure 3. Risk of CF Early-onset Lung Disease (CFELD) associated with three overarching categories of factors, namely, genetic, socioeconomic status (SES) and environmental, and CF clinical care factors, as assessed by ordinal logistic regression.
The outcome variable consisted of five CFELD severity categories in the following order: asymptomatic, minimal, mild, moderate, and severe. Odds ratio (OR) with 95% confidence interval (CI) and p-values are shown for each factor. An OR of greater than one indicates a greater likelihood of having more severe CFELD. For factors that have more than 3 categories, the reference group is marked. The demographic covariates sex, race (White or Non-White), and ethnicity (Hispanic or Non-Hispanic) were also adjusted for in the model.
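For readers less familiar with how odds ratios and their intervals arise from a logistic model, the sketch below converts a log-odds coefficient and its standard error into an OR with a 95% CI. It assumes a Wald-type interval, which the text does not specify, and it is written in Python rather than the SAS or R used for the actual analyses; the function name is hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def odds_ratio_with_ci(beta: float, se: float,
                       z: float = Z_95) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) from a log-odds coefficient and its SE.

    Assumption: a Wald interval, exp(beta +/- z * se).
    """
    if se < 0:
        raise ValueError("standard error must be non-negative")
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


if __name__ == "__main__":
    # Toy coefficient and standard error, not taken from the study.
    print(odds_ratio_with_ci(0.5, 0.25))
```

An interval built this way is symmetric around the OR on the log scale, which is why forest plots of odds ratios are usually drawn with a logarithmic axis.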
RESULTS
Characteristics of the study population and CFTR variants
The characteristics shown in Table 1 did not differ significantly between the 104 FIRST subjects participating in the WGS component and the whole cohort of 145 subjects who completed 36-month follow-up. The WGS cohort had three non-white subjects: one Black and two identifying as More Than One Race.
CFTR genotypes are listed by the frequency of pathogenic variants with and without the most common c.1521_1523delCTT (F508del) allele in Table 2 along with the distribution of pancreatic phenotypes. Both of these characteristics resembled the national CF population as reported in the 2021 CF Foundation Patient Registry.29 The most frequent CFTR variant identified was F508del with an allele frequency of 71.6% as expected. Two variants we identified, c.2554_2555insT and c.1469delT, have not yet been documented in the CFTR2 database. Among all 31 variants, only one variant had incongruent categorization of PI vs. PS when compared to using the clinical method of fecal elastase-1; specifically, the 2184insA detected in a single FIRST subject had all fecal elastase-1 consistently in the PS range (315, 368, 394 and 283 μg/g at age 4, 12, 24 and 36 months, respectively). On the other hand, in the CFTR2 database, 85% of patients with this mutation are PI.
As shown in Table 1, under our pancreatic phenotype-determined categorization method, 36% of the FIRST-WGS cohort had F508del and another PI-v in the second allele, 6% had two PI-v that were not F508del, and 8% had at least one allele with PS-v. Excluding the subject with 2184insA that had incongruent categorization of PI vs. PS, the 95 subjects who had PI-v in both alleles of the CFTR gene had significantly lower (p< 0.001) fecal elastase-1 (39 ± 40 μg/g stool) and higher (p=0.007) sweat chloride (102 ± 14 mmol/L) compared to the 8 subjects who had PS-v (fecal elastase-1: 473 ± 34; sweat chloride: 65 ± 25).
Univariate analyses of the effects of genetic factors on CFELD severity
To test the hypothesis that genetic variations predict CFELD score, we performed univariate analyses on the association of two genetic factors, CFTR genotype and PRS, with CFELD severity. Overall, the prevalence of CFELD categories in the FIRST-WGS cohort was 6% asymptomatic, 16% minimal, 29% mild, 36% moderate, and 13% severe. The distribution of CFELD severity differed significantly among the four CFTR genotype categories. Specifically, proportionately more subjects who had PS-v had asymptomatic or minimal CFELD compared to the other three CFTR genotype categories (Figure 1A). CFELD scores were also significantly correlated with PRS z-scores (Figure 1B). Consistent with this, a higher prevalence of moderate and severe CFELD was found among subjects with low PRS z-scores compared to those with high PRS z-scores (greater than 1) (Figure 1C).
Figure 1. Impact of genetic factors on CF Early-onset Lung Disease (CFELD) severity.
Univariate analysis of the distribution of the CF Early-onset Lung Disease (CFELD) severity categories by genetic factors. Panel A shows the CFELD categories by Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) genotype categories; pancreatic insufficiency associated variants (PI-v) and pancreatic sufficiency associated variants (PS-v) are described in METHODS and listed in Table 2. Panel B shows the correlation between the CFELD score and the polygenic risk score (PRS) z-score, which is an aggregated genetic risk score associated with lung function as described in METHODS; note that higher PRS z-scores indicate better lung function. Panel C shows the CFELD categories by PRS z-score categories. P-values in panels A and C were obtained by chi-square or Fisher’s exact test.
Univariate analyses of the effects of SES and environmental factors on CFELD severity
As shown in Figure 2A, maternal education was significantly predictive of CFELD severity. Medicaid patients had more severe CFELD (Figure 2B), although the p-value was 0.06. Paternal education (p=0.13) and annual household income (p=0.74) were not significant; however, both were significantly associated with caregiver stability, defined as having a stable 2-parent home that did not change during the first 3 years of life (Figure 2C). These observations prompted us to derive a composite SES indicator we refer to as “high and stable SES” (Figure 2D) that incorporated maternal education, Medicaid insurance coverage, and stable 2-parent home for the subsequent multiple regression analysis described below.
Figure 2. Impact of socioeconomic status (SES) indicators on CF Early-onset Lung Disease (CFELD) severity.
Univariate analysis of the distribution of the CFELD severity categories by maternal education (A) and type of health insurance coverage (B). Paternal education and household income were found to be significantly associated with having a stable 2-parent home in the first 3 years of life (C), which prompted us to derive a composite SES indicator we refer to as “high and stable SES” (D). P-values were obtained by chi-square or Fisher’s exact test.
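The composite "high and stable SES" indicator combines three yes/no items: maternal education at or above community college, a stable 2-parent home, and not receiving Medicaid. A minimal Python sketch of that combination is shown below. The record layout and names are hypothetical, and the sketch assumes that all three conditions must hold at once, which is how the definition in the Abstract reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SesRecord:
    """Hypothetical per-subject SES fields used only for this illustration."""
    maternal_education_community_college_or_above: bool
    stable_two_parent_home: bool
    receives_medicaid: bool


def high_and_stable_ses(record: SesRecord) -> bool:
    """True only when all three favorable conditions are met (assumed AND rule)."""
    return (record.maternal_education_community_college_or_above
            and record.stable_two_parent_home
            and not record.receives_medicaid)


if __name__ == "__main__":
    print(high_and_stable_ses(SesRecord(True, True, False)))
    print(high_and_stable_ses(SesRecord(True, True, True)))
```

Collapsing the three items into one binary indicator yields a single SES term for the multiple regression model instead of three correlated ones.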
Regarding environmental factors shown in Table 1, only daycare attendance was a significant predictor of CFELD severity in univariate analyses, i.e., proportionately more FIRST subjects who attended daycare for ≥20 hours per week had severe CFELD (28%) compared to those in daycare for <20 hours per week or those with no daycare (10%), p=0.03.
Multiple regression analysis to identify predictors for CFELD severity
Our next analysis applied an ordinal logistic regression model to assess each factor’s effect on the risk of more severe CFELD, cumulative over the first 3 years of life, after adjusting for all the other factors. Figure 3 shows that both genetic factors, CFTR genotype and PRS z-scores, were significant predictors. Having F508del or other PI-v in both alleles was associated with a significantly greater risk of more severe CFELD, as was having low or intermediate PRS z-scores when compared with the high PRS z-score group.
Similarly, the effect of the composite SES indicator was highly significant (Figure 3); having high and stable SES reduced the risk of more severe CFELD to 19% of that in the low and unstable SES group. In contrast, smoke exposure and daycare attendance were associated with an increased risk of more severe CFELD in the multiple regression model. We found that daycare attendance ≥20 hours/week was also significantly associated with a higher likelihood of having more physician-reported pulmonary exacerbation (PEx) episodes [OR=6.5 (95% CI: 1.7-25.6), p=0.005], which are a major component of the CFELD scoring system.
Analysis of the CF clinical care factors demonstrated that better nutritional status, as indicated by higher weight-for-age and length-for-age z-scores at birth and higher height-for-age z-scores at 3 years of age, was associated with a lower likelihood of more severe CFELD (Figure 3). Earlier age at the first CF center visit after positive newborn screening and longer duration of CFTR modulator ivacaftor therapy appeared beneficial, but their effects were not statistically significant.
We also estimated the relative contributions of intrinsic and extrinsic risk factors to CFELD. Intrinsic and extrinsic risk factors showed substantial and similar contributions to the variation in CFELD severity, with partial R² of 0.239 and 0.342, respectively.
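The text does not state the formula used for partial R². For orientation, one common definition for a group of predictors in a linear model compares the residual sum of squares of a reduced model that omits the group with that of the full model. The symbols below are introduced here for illustration and are not taken from the study.

```latex
% Partial R^2 for a group of predictors G (one common definition; assumed here):
%   SSE_full    = residual sum of squares of the model with all predictors
%   SSE_reduced = residual sum of squares of the model omitting group G
R^2_{\text{partial}}(G)
  = \frac{\mathrm{SSE}_{\text{reduced}} - \mathrm{SSE}_{\text{full}}}
         {\mathrm{SSE}_{\text{reduced}}}
```

Under this definition each value is the share of otherwise unexplained variation accounted for by that group, so the values for the intrinsic and extrinsic groups need not sum to 1.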
Analysis among children with homozygous F508del genotype
Having observed that PRS z-scores and SES indicators were significant predictors of CFELD severity in the FIRST cohort, our final analysis focused on evaluating the half of the cohort with homozygous F508del genotype, testing the hypothesis that these factors are also determinants of early-onset lung disease in the first 3 years of life in young children with the most prevalent CFTR genotype that generally causes “classical CF”.1
The distribution of CFELD severity in F508del homozygotes (N=52) revealed a slightly lower prevalence in the asymptomatic category (4% vs. 6%) and a higher prevalence in the severe category (21% vs. 13%) in comparison with the overall FIRST-WGS cohort (N=104). Ordinal logistic regression analysis identified four significant predictors of CFELD severity. Specifically, high PRS z-scores (p=0.026) and high and stable SES (p=0.012) were associated with a lower likelihood of more severe CFELD, while daycare attendance (p=0.032) and having GI/nutrition-related hospitalizations (p=0.019) were associated with a higher likelihood of more severe CFELD.
DISCUSSION
Our results show that while intrinsic/genetic factors are determinants of early-onset CF lung disease, extrinsic/environmental influences are similarly and perhaps equally important. It is important to highlight that the genetic factors include not only the CFTR genotype, which we demonstrated recently,7 but also potential genetic modifiers aggregated through the PRS strategy using WGS data in the present study.20,21 The resulting finding that PRS is a significant predictor of CFELD severity helps explain the heterogeneity of lung disease in CF. Previous research has suggested that CFTR genotype may be a determinant of lung disease based on studies in older CF patients1 and in a small series12 we published on the timing of lung disease onset in children less than 5 years of age; however, Cutting11 pointed out that “minimal correlation has been found between CFTR genotype and severity of lung disease.” In addition, to our knowledge, the additive effects of multiple genetic modifiers on early lung disease have not been demonstrated previously, nor have WGS data been investigated in young children with CF. Although these findings may not seem surprising, it is clinically important that as early as 3 years of age children with CF display such heterogeneity of lung disease onset attributable to intrinsic factors. This observation has implications for the application of precision, personalized medicine and early therapeutic decision-making with regard to expensive CFTR modulators.17 Because diagnosis through newborn screening is accompanied by CFTR genotype determination, information on the early disease liability of variants can be incorporated into routine practice.
Our design/methodology for studying potential genotype effects proved advantageous, particularly the development of a pancreatic phenotype-determined categorization method. We considered and explored using the genotype classification method of McKone et al,30 or derivatives of it, but some variants such as F508del do not fit within one class; moreover, other variants assigned to classes IV or V that would originally have been considered “mild” have been subsequently identified as PI-associated with typical CF lung disease.1 But a phenotype-driven scheme helps ensure that the genotype pathogenicity is correctly interpreted. In addition, because the cost of WGS has become affordable, we suggest that assessment of aggregated genetic modifiers could potentially be used after early diagnosis in efforts to predict the pulmonary prognosis. Our data demonstrate the feasibility and potential value of this approach as does the recent report by Zhou et al.31
Perhaps most importantly, our finding that even by 3 years of age extrinsic/environmental factors can have significant impacts on early-onset lung disease is novel and also clinically valuable. The strongly detrimental effect of SES-related risks being manifested so early in life underscores the need for early comprehensive socioenvironmental assessments and supportive interventions. We were especially impressed by the impact of maternal education level and daycare. Maternal education has been reported previously in several studies as influencing adherence32 and outcomes.32,33 Daycare exposure, however, has only been studied34,35 to a limited extent in older children with CF and has not been identified previously as a risk factor in CF despite more than three decades of convincing worldwide pediatric studies36–40 with compelling data on the greater likelihood of infections and antibiotic usage in preschool children with daycare exposures.
The extrinsic risks we discovered, in contrast to genetic factors, are potentially modifiable. As stated by Oates and Schechter,41 “We Can Do Better.” Parents, particularly mothers who are the usual primary caregivers, can be better educated about the complex care requirements of young children with CF.23,24 Daycare environments can also be carefully selected and/or modified to reduce microbial exposure risks.39,40 In addition, maternal support systems have been demonstrated to be effective for enhancing child care.42,43 Our finding that these extrinsic risk factors play a significant role in the development of early-onset lung disease in F508del homozygous children provides new evidence to justify increased effort towards reducing the risk associated with these extrinsic factors. Optimizing growth in the first 3 years of life, i.e., achieving good weight-for-age and height-for-age, is critical and can be achieved by adhering to the comprehensive nutrition therapy and management outlined in the clinical practice guidelines.23,24
Ever since Kerem et al10 reported the heterogeneity of pulmonary dysfunction among F508del homozygotes, a convincing explanation for this wide variability has remained a significant gap in our knowledge about CF lung disease determinants. Much attention has been given to individual genetic modifiers, and some studies have been informative.11,44–46 Unlike the GWAS strategy applied in many of these studies using very large cohorts with retrospectively assessed clinical data on CF patients having established and often progressive pulmonary dysfunction, we employed a novel design by focusing on the onset of CF lung disease and aggregating potential genetic modifiers in a prospectively, comprehensively studied cohort evaluated longitudinally since shortly after birth. With the application of a hypothesis-testing approach, this strategy proved advantageous as we found both genetic and nongenetic factors to be significantly associated with CFELD severity and the early lung disease onset phenotype.
Collaco et al16 concluded that “genetic and environmental factors contribute equally to pulmonary function variation in CF” based on assessment of an international cohort consisting of 134 monozygotic twins and 272 dizygotic twins and siblings studied at 6 to 40 years of age. Although the potency of genotype was clear, the evaluation of environmental factors was complex, as would be expected for such a widely variable age range of subjects, and had limited specificity at the individual level. Our results support and extend to younger patients the qualitative observations from twin studies, but we lack the statistical power for conclusive heritability estimates that might allow precise determination of the relative impact of intrinsic and extrinsic risk factors on early CF lung disease onset.
Limitations of our study include the relatively small size of the FIRST cohort compared to GWAS projects and the aggregation of potential genetic modifiers through PRS methodology rather than their individualization. The former is mitigated by our a priori hypothesis-testing design and the latter by the value of the UK Biobank database21 combined with how well the PRS strategy26 has performed in cancer and cardiovascular studies.20,22 Although we have not reported on individual SNPs, this is a future goal. Another potential limitation is our reliance on clinical data for evaluation of lung disease. However, since objective measures of lung function such as multiple breath washout are not performed in routine clinical settings for children younger than 3 years of age, the CFELD system provides a standardized approach to assess lung disease severity. In addition, it was developed with sound multiattribute methodology.7 Nevertheless, validation of the CFELD system by objective measures of lung disease is needed and is planned by the FIRST team when all subjects have completed their age 6-years assessments. In addition, future studies will be organized to evaluate whether the CFELD scores are influenced by elexacaftor-tezacaftor-ivacaftor, the highly effective CFTR modulator therapy that just received approval for use in CF children aged 2-5 years. Lastly, our SES and environmental data rely on parental reporting, and some values, such as exposure to smoking, may be biased.
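The trade-off between aggregating genetic modifiers and individualizing them can be seen in how a PRS is formed: in its simplest form it is a weighted sum of effect-allele dosages, so many variants collapse into one number per subject. The sketch below assumes hypothetical variant identifiers and weights and does not reproduce the study's WGS pipeline or its weighting method.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies).

    Only variants present in both mappings contribute to the score.
    """
    shared = sorted(dosages.keys() & weights.keys())
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical per-variant weights (e.g., from an external GWAS) and
# one subject's genotype dosages; names and values are illustrative only.
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.02}
subject = {"variant_a": 2, "variant_b": 1, "variant_c": 0}

score = polygenic_risk_score(subject, weights)
```

Because the score is a single sum, two subjects with different genotypes can receive the same value, which is why the contribution of any individual SNP cannot be read from a PRS alone.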
In conclusion, the early onset of CF lung disease can be attributed similarly to both genetic and non-genetic (both social and environmental) risk factors. Although inherited DNA variations cannot be modified, some extrinsic exposures can and should become targets for interventions.
ACKNOWLEDGEMENTS
We thank Manavalan Gajapathy and Brandon M. Wilk for expert WGS informatics contributions and the following faculty members at the six participating CF centers that assumed the leadership role in the Feeding Infants Right… From the Start (FIRST) Project: Michael Rock, MD (University of Wisconsin – Madison and American Family Children’s Hospital, Madison, WI), Nick Antos, MD and Hara Levy, MD (Medical College of Wisconsin and Children’s Hospital of Wisconsin, Milwaukee, WI), Jon Gaffin, MD and Henry Dorkin, MD (Harvard University and Children’s Hospital Boston, Boston, MA), Michelle Howenstine, MD and Clement Ren, MD (Indiana University and Riley Children’s Hospital, Indianapolis, IN), Fadi Asfour, MD and Barbara Chatfield, MD (University of Utah and Primary Children’s Hospital, Salt Lake City, UT), and Susanna McColley, MD and Hara Levy, MD (Northwestern University and Lurie Children’s Hospital, Chicago, IL).
We are most grateful to the following research coordinators for their superb management of the FIRST study activities on a day-to-day basis at each study site: Danielle Sander, Taiya Bach and Anita Laxova (Madison, WI), Laura Roth, Danielle Graf, Theresa Kump, Briana Horn, and Rachel Bersie (Milwaukee, WI), Olivia Killilea, Maggie Hui, Rachel Gross, Kayla Regan, Sean Ruvolo, Kathy Doan, Kelsey Hill, Audrey Petteruti, and Olyn Andrade (Boston, MA), Misty Thompson and Lisa Bendy (Indianapolis, IN), Jane Vroom and Heather Oldroyd (Salt Lake City, UT), and Rashika Rangaraj and Zainub Ashrafi (Chicago, IL).
We are deeply indebted to the following Registered Dietitian Nutritionists (RDNs) for their commitment and laborious collection of nutritional data: Erin Seffrood and Mary Marcus (Madison, WI), Olivia Lampone and Tami Miller (Milwaukee, WI), Laura Jay, Jessica Leonard, Sharon Silverman, Mollie Studley (Boston, MA), Karen Maguiness (Indianapolis, IN), Catherine McDonald (Salt Lake City, UT) and Eileen Potter (Chicago, IL).
We greatly appreciate the assistance from the following nurses for facilitating subject enrollment, data and biological specimen collections: Darci Pfeil (Madison, WI), Nicole Brueck (Milwaukee, WI), Monica Ulles and Chelsey Cheng (Boston, MA), Suzanne Meihls (Salt Lake City, UT), and Stacey Bichl (Chicago, IL).
In addition, the FIRST project includes an outstanding team of researchers in the Madison Data Coordinating Center who are responsible for validating and compiling the data (Danielle Sander, Taiya Bach, Lyanne Chin, Suzanne Shoff and Makayla Schuchardt), food record analysis (Rachel Fenske and Lisa Davis), biological specimen management and biomarker analysis (Sangita Murali and Lyanne Chin), as well as statistical analysis (Zhumin Zhang and Lyanne Chin).
Lastly, Frank Greer, MD (neonatologist infant nutrition consultant) and Christopher Green, MD, University of Wisconsin Department of Pediatrics, are acknowledged with gratitude for their essential contributions to the FIRST startup phase and CFELD scoring system, respectively.
Funding:
This work is supported by grants from the NIH (R01 DK072126, R56 DK109692 and R01 DK109692), the CF Foundation (LAI14A0, LAI15A0, LAI17A0, WORTHE19A0), and The Legacy of Angels Foundation. L. Huang is supported by a Shapiro Summer Research award from the University of Wisconsin School of Medicine and Public Health and a CF Foundation Traineeship award (HUANG21H0).
Abbreviations:
- CF: cystic fibrosis
- FIRST: Feeding Infants Right… from the STart
- CFTR: cystic fibrosis transmembrane conductance regulator
- PI: pancreatic insufficiency
- PS: pancreatic sufficiency
- PI-v: PI-associated variants
- PS-v: PS-associated variants
- CFELD: Cystic Fibrosis Early-onset Lung Disease
- WGS: whole-genome sequencing
- PRS: polygenic risk score
- SES: socioeconomic status
- SNPs: single-nucleotide polymorphisms
- GWAS: genome-wide association study
Footnotes
Some of the data in the manuscript were presented as an abstract at the North American Cystic Fibrosis Conference (November 2022) and the European Cystic Fibrosis Society Conference (June 2023).
Conflict of Interest/Disclosures: The authors have no conflict of interest to disclose.
References
- 1. Bell SC, Mall MA, Gutierrez H, Macek M, Madge S, Davies JC, et al. The future of cystic fibrosis care: a global perspective. Lancet Respir Med 2020;8:65–124.
- 2. Esterly JR, Oppenheimer EH. Observations in cystic fibrosis of the pancreas. 3. Pulmonary lesions. Johns Hopkins Med J 1968;122:94–101.
- 3. Bedrossian CWM, Donald Greenberg S, Singer DB, Hansen JJ, Rosenberg HS. The lung in cystic fibrosis. A quantitative study including prevalence of pathologic findings among different age groups. Hum Pathol 1976;7:195–204.
- 4. Sly PD, Brennan S, Gangell C, de Klerk N, Murray C, Mott L, et al. Lung disease at diagnosis in infants with cystic fibrosis detected by newborn screening. Am J Respir Crit Care Med 2009;180:146–52.
- 5. Stick SM, Brennan S, Murray C, Douglas T, von Ungern-Sternberg BS, Garratt LW, et al. Bronchiectasis in Infants and Preschool Children Diagnosed with Cystic Fibrosis after Newborn Screening. J Pediatr 2009;155:623–8.
- 6. Sly PD, Gangell CL, Chen L, Ware RS, Ranganathan S, Mott LS, et al. Risk Factors for Bronchiectasis in Children with Cystic Fibrosis. N Engl J Med 2013;368:1963–70.
- 7. Huang L, Lai HJ, Antos N, Rock MJ, Asfour F, Howenstine M, et al. Defining and identifying early-onset lung disease in cystic fibrosis with cumulative clinical characteristics. Pediatr Pulmonol 2022;57:2363–73.
- 8. Barreda CB, Farrell PM, Laxova A, Eickhoff JC, Braun AT, Coller RJ, et al. Newborn screening alone insufficient to improve pulmonary outcomes for cystic fibrosis. J Cyst Fibros 2021;20:492–8.
- 9. Stanojevic S, Davis SD, Retsch-Bogart G, Webster H, Davis M, Johnson RC, et al. Progression of lung disease in preschool patients with cystic fibrosis. Am J Respir Crit Care Med 2017;195:1216–25.
- 10. Kerem E, Corey M, Kerem B, Rommens J, Markiewicz D, Levison H, et al. The Relation between Genotype and Phenotype in Cystic Fibrosis — Analysis of the Most Common Mutation (ΔF508). N Engl J Med 1990;323:1517–22.
- 11. Cutting GR. Modifier genes in Mendelian disorders: The example of cystic fibrosis. Ann N Y Acad Sci 2010;1214:57–69.
- 12. Braun AT, Farrell PM, Ferec C, Audrezet MP, Laxova A, Li Z, et al. Cystic fibrosis mutations and genotype-pulmonary phenotype analysis. J Cyst Fibros 2006;5:33–41.
- 13. Oates GR, Schechter MS. Socioeconomic status and health outcomes: cystic fibrosis as a model. Expert Rev Respir Med 2016;10:967–77.
- 14. Quittner AL, Schechter MS, Rasouliyan L, Haselkorn T, Pasta DJ, Wagener JS. Impact of socioeconomic status, race, and ethnicity on quality of life in patients with cystic fibrosis in the United States. Chest 2010;137:642–50.
- 15. Schlüter DK, Southern KW, Dryden C, Diggle P, Taylor-Robinson D. Impact of newborn screening on outcomes and social inequalities in cystic fibrosis: A UK CF registry-based study. Thorax 2020;75:123–31.
- 16. Collaco JM, Blackman SM, McGready J, Naughton KM, Cutting GR. Quantification of the relative contribution of environmental and genetic factors to variation in cystic fibrosis lung function. J Pediatr 2010;157:802–7.
- 17. Zemanick ET, Taylor-Cousar JL, Davies J, Gibson RL, Mall MA, McKone EF, et al. A phase 3 open-label study of elexacaftor/tezacaftor/ivacaftor in children 6 through 11 years of age with cystic fibrosis and at least one F508del allele. Am J Respir Crit Care Med 2021;203:1522–32.
- 18. Lai HJ, Chin L, Murali S, Bach T, Sander D, Farrell PM. Vitamins A, D, E status as related to supplementation and lung disease markers in young children with cystic fibrosis. Pediatr Pulmonol 2022;57:935–44.
- 19. Lai HJ, Song J, Lu Q, Murali S, Gajapathy M, Wilk BM, et al. Genetic factors help explain the variable responses of young children with cystic fibrosis to vitamin D supplements. Clin Nutr ESPEN 2022;51:367–76.
- 20. Lewis CM, Vassos E. Polygenic risk scores: From research tools to clinical instruments. Genome Med 2020;12(44). doi: 10.1186/s13073-020-00742-5.
- 21. Shrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Melbourne CA, et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat Genet 2019;51:481–93.
- 22. Khera AV, Emdin CA, Drake I, Natarajan P, Bick AG, Cook NR, et al. Genetic Risk, Adherence to a Healthy Lifestyle, and Coronary Disease. N Engl J Med 2016;375:2349–58.
- 23. Borowitz D, Robinson KA, Rosenfeld M, Davis SD, Sabadosa KA, Spear SL, et al. Cystic fibrosis Foundation evidence-based guidelines for management of infants with cystic fibrosis. J Pediatr 2009;155:73–93.
- 24. Lahiri T, Hempstead SE, Brady C, Cannon CL, Clark K, Condren ME, et al. Clinical practice guidelines from the cystic fibrosis foundation for preschoolers with cystic fibrosis. Pediatrics 2016;137:1–26.
- 25. Grummer-Strawn LM, Reinold C, Krebs NF. Use of World Health Organization and CDC growth charts for children aged 0-59 months in the United States. MMWR Recomm Rep 2010;59(RR-9):1–15.
- 26. Section on Breastfeeding. Breastfeeding and the use of human milk. Pediatrics 2012;129(3):e827–41. doi: 10.1542/peds.2011-3552.
- 27. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment / Map (SAM) Format and SAMtools 1000 Genome Project Data Processing Subgroup. Bioinformatics 2009;25:1–2.
- 28. Ge T, Chen CY, Ni Y, Feng YCA, Smoller JW. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 2019;10:1776. doi: 10.1038/s41467-019-09718-5.
- 29. Cystic Fibrosis Foundation Patient Registry 2021. Annual Data Report. Bethesda, Maryland: 2021.
- 30. McKone EF, Goss CH, Aitken ML. CFTR genotype as a predictor of prognosis in cystic fibrosis. Chest 2006;130:1441–7.
- 31. Zhou YH, Gallins PJ, Pace RG, Dang H, Aksit MA, Blue EE, et al. Genetic Modifiers of Cystic Fibrosis Lung Disease Severity: Whole Genome Analysis of 7,840 Patients. Am J Respir Crit Care Med. 2023 Mar 15. doi: 10.1164/rccm.202209-1653OC. Epub ahead of print.
- 32. Oates GR, Stepanikova I, Gamble S, Gutierrez HH, Harris WT. Adherence to airway clearance therapy in pediatric cystic fibrosis: Socioeconomic factors and respiratory outcomes. Pediatr Pulmonol 2015;50:1244–52.
- 33. Kosorok MR, Jalaluddin M, Farrell PM, Shen G, Colby CE, Laxova A, et al. Comprehensive analysis of risk factors for acquisition of Pseudomonas aeruginosa in young children with cystic fibrosis. Pediatr Pulmonol 1998;26:81–8.
- 34. Sanders DB, Emerson J, Ren CL, Schechter MS, Gibson RL, Morgan W, et al. Early childhood risk factors for decreased FEV1 at age six to seven years in young children with cystic fibrosis. Ann Am Thorac Soc 2015;12:1170–6.
- 35. Rosenfeld M, Emerson J, McNamara S, Thompson V, Ramsey BW, Morgan W, et al. Risk factors for age at initial Pseudomonas acquisition in the cystic fibrosis epic observational cohort. J Cyst Fibros 2012;11:446–53.
- 36. Ferson MJ. Infections in day care. Curr Opin Pediatr 1993;5:35–40.
- 37. Sun Y, Sundell J. Early daycare attendance increase the risk for respiratory infections and asthma of children. J Asthma 2011;48:790–6.
- 38. Schuez-Havupalo L, Toivonen L, Karppinen S, Kaljonen A, Peltola V. Daycare attendance and respiratory tract infections: A prospective birth cohort study. BMJ Open 2017;7(9):e014635. doi: 10.1136/bmjopen-2016-014635.
- 39. Brady M. Infectious disease in pediatric out-of-home child care. Am J Infect Control 2005;33:276–85.
- 40. Huskins WC. Transmission and control of infections in out-of-home child care. Pediatr Infect Dis J 2000;19(10 SUPPL.):106–10.
- 41. Oates GR, Schechter MS. Social inequities and cystic fibrosis outcomes: we can do better. Ann Am Thorac Soc 2021;18:215–7.
- 42. Budds K. Validating social support and prioritizing maternal wellbeing: Beyond intensive mothering and maternal responsibility. Philos Trans R Soc Lond B Biol Sci 2021;376(1827):20200029. doi: 10.1098/rstb.2020.0029.
- 43. Patel S, Patel S. The effectiveness of lactation consultants and lactation counselors on breastfeeding outcomes. J Hum Lact 2016;32:530–41.
- 44. Drumm ML, Konstan MW, Schluchter MD, Handler A, Pace R, Zou F, et al. Gene Modifier Study Group. Genetic modifiers of lung disease in cystic fibrosis. N Engl J Med 2005;353:1443–53.
- 45. Corvol H, Blackman SM, Boëlle PY, Gallins PJ, Pace RG, Stonebraker JR, et al. Genome-wide association meta-analysis identifies five modifier loci of lung disease severity in cystic fibrosis. Nat Commun 2015;6:8382. doi: 10.1038/ncomms9382.
- 46. Dorfman R, Sandford A, Taylor C, Huang B, Frangolias D, Wang Y, et al. Complex two-gene modulation of lung disease severity in children with cystic fibrosis. J Clin Invest 2008;118:1040–9.