Author manuscript; available in PMC: 2022 Feb 1.
Published in final edited form as: J Allergy Clin Immunol. 2020 Jul 7;147(2):677–685.e10. doi: 10.1016/j.jaci.2020.06.026

Unsupervised Modeling and Genome-Wide Association Identify Novel Features of Allergic March Trajectories

Stanislaw J Gabryszewski 1,#, Xiao Chang 2,#, Jesse W Dudley 3, Frank Mentch 2, Michael March 2, John H Holmes 6, Jason Moore 6, Robert W Grundmeier 3,5, Hakon Hakonarson 2,4,5, David A Hill 1,5,7,*
PMCID: PMC7790850  NIHMSID: NIHMS1610104  PMID: 32650023

Abstract

Background:

The allergic march refers to the natural history of allergic conditions during infancy and childhood. However, population-level disease incidence patterns do not necessarily reflect the development of allergic disease in individuals. A better understanding of the factors that predispose to different allergic trajectories is needed.

Objective:

Determine the demographic and genetic features that associate with the major allergic march trajectories.

Methods:

Presence or absence of common allergic conditions (atopic dermatitis, AD; IgE-mediated food allergy, IgE-FA; asthma; and allergic rhinitis, AR) was ascertained in a pediatric primary care birth cohort of 158,510 subjects. Hierarchical clustering and decision tree modeling were used to associate demographic features with allergic outcomes. A genome-wide association study (GWAS) tested for risk loci associated with specific allergic trajectories.

Results:

We found an association between self-identified “Black” race and progression from AD to asthma. Conversely, “Asian or Pacific Islander” race associated with AD to IgE-FA, and “White” race associated with AD to AR. GWAS of trajectory groups identified risk loci associated with progression from AD to Asthma (rs60242841), and AD to AR (rs9565267, rs151041509, rs78171803). Consistent with our epidemiologic associations, rs60242841 is more common in individuals of African ancestry (AA) than European ancestry (EA), while rs9565267 and rs151041509 are more common in EA than AA individuals.

Conclusion:

We identify novel associations between race and progression along distinct allergic trajectories. Ancestral genetic differences may contribute to these associations. These results uncover important health disparities, refine the concept of the allergic march, and represent a step towards developing individualized medical approaches for these conditions.

Keywords: Atopic march, Allergic march, Allergic trajectory, Atopic dermatitis, IgE-mediated food allergy, Asthma, Allergic rhinitis, Epidemiology, Genome-wide association study

CAPSULE SUMMARY:

We identified novel associations between race and a child’s progression along distinct allergic march trajectories. We report ancestral disparities in novel SNPs that may contribute to these associations.

Graphical Abstract (image not reproduced)

INTRODUCTION:

The allergic march (also known as the atopic march) refers to the developmental relationship between the most common allergic manifestations. Based on population-level disease incidence patterns, the march begins with atopic dermatitis (AD) and progresses to IgE-mediated food allergy (IgE-FA), asthma, and allergic rhinitis (AR).(1–3) Our prior longitudinal studies of this progression in birth cohorts using observational data derived from electronic medical records have allowed measurement of the individual and cumulative risk that early march members impart for development of the latter.(4, 5) However, population-level studies often ignore the fact that multiple distinct disease developmental trajectories exist within the allergic march.(2, 6) Indeed, some individuals develop the allergic manifestations in a non-traditional order, skip manifestations, or halt progression on the march altogether.(2, 6, 7) The environmental, genetic, and immunologic factors that predispose to these distinct allergic trajectories are not well understood.

Health disparities relating to the major allergic manifestations have been recognized for decades. For example, asthma is more common and more severe among children; women; low-income, inner-city residents; and African American and Puerto Rican communities.(8–10) Emerging evidence also supports racial and ethnic disparity in food allergy, with worse outcomes among African American and Hispanic children.(11–13) These health disparities likely result from a complex interplay between biologic and environmental factors. There is considerable interest in better understanding the origins of allergic disparities in an effort to close these disparity gaps.(8)

The era of electronic medical records (EMR) has allowed the application of ‘big data’ analytic approaches to clinical datasets. In particular, unsupervised modeling of clinical data has emerged as an important approach that allows for the detection of significant associations in a minimally-biased manner.(14) By applying these computational techniques, clinical datasets can become important tools both for addressing predefined questions and for discovery, facilitating the development of new hypotheses for subsequent testing.

Here we employ a reductionist strategy to better understand how key demographic and genetic factors influence progression on the allergic march. To that end, we focus on children with AD who go on to develop at least one other major allergic manifestation (IgE-FA, Asthma, AR). We partner minimally-biased computational modeling with genome-wide association to dissect the key demographic and genetic features that associate most strongly with specific allergic march outcomes. This approach is designed to improve our understanding of the relationship between these conditions, identify previously unrecognized health disparities, and inform efforts to better identify, diagnose, and treat at-risk patient populations.(15)

METHODS:

Birth cohort generation and clinical data extraction:

The Children’s Hospital of Philadelphia (CHOP) network consists of 31 outpatient clinical care sites across the greater Delaware Valley that provide both primary and subspecialty pediatric care. Data from this network have previously been validated as a tool to assess regional disease epidemiology.(4, 16)

We examined the electronic medical records (Epic Systems Inc., Verona, WI) of the 158,510 children who are part of the CHOP primary care birth cohort, as previously defined.(4) To be included in this cohort, patients were required to have established care in the CHOP primary care network before their first birthday, and between January 1, 2001 and December 31, 2016. Patients were also required to have received primary care from a CHOP-affiliated practice for at least 2 years. Observation time for children was censored on the date of their last face-to-face outpatient health care encounter in our health system before their 18th birthday, and patients were assumed to be observed continuously until they were censored. The average observation time was 7.3 years.
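The inclusion and censoring rules above can be sketched in a few lines of Python. This is an illustration only, not the authors' extraction code: the function names are hypothetical, and the reading of "at least 2 years" of primary care as the span between the first and last qualifying encounter is an assumption the text does not spell out.

```python
from datetime import date

# Enrollment window stated in the text.
ENROLL_START = date(2001, 1, 1)
ENROLL_END = date(2016, 12, 31)

def add_years(d, years):
    """Same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def censor_date(birth, encounters):
    """Last outpatient encounter before the 18th birthday, or None."""
    cutoff = add_years(birth, 18)
    eligible = [e for e in encounters if e < cutoff]
    return max(eligible) if eligible else None

def in_birth_cohort(birth, encounters):
    """Apply the three inclusion rules to one child's encounter dates."""
    last = censor_date(birth, encounters)
    if last is None:
        return False
    first = min(encounters)
    established_in_infancy = first < add_years(birth, 1)
    in_window = ENROLL_START <= first <= ENROLL_END
    # Assumption: two years of care = first-to-last encounter span.
    followed_two_years = last >= add_years(first, 2)
    return established_in_infancy and in_window and followed_two_years
```

Observation time for an included child would then run from the first encounter to the value returned by `censor_date`.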

For our epidemiological studies, participants self-identified as “White”, “Black”, “Asian or Pacific Islander”, “Other”, or “Unknown”. Due to our interest in the potential differences in allergic trajectory across populations, we excluded children with unknown race from our analysis.

Generation of allergic disease cohorts:

We ascertained the presence or absence of allergic conditions of interest (AD, IgE-FA, asthma, and AR) using a combination of International Classification of Diseases (ICD) diagnosis codes, allergen information, and medication prescriptions, as described previously.(4, 5, 17) Conditions were identified using the following ICD codes: AD (691.nn, L20.nn); IgE-FA (V15.0[1-5], 995.[6-7], 995.7, Z91.01[0-3], Z91.018, T78.1XXA); asthma (493.n; J45.nn); AR (477.nn; J30.n). All diagnoses were made in accordance with established practice parameters. For inclusion in a disease cohort, patients were required to have diagnosis codes representative of an atopic condition on two separate care visits, occurring at least six months apart. We did not include analysis of food protein-induced enterocolitis syndrome and eosinophilic esophagitis in this study as a result of comparatively smaller patient numbers.
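The two-visit rule described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: codes are matched by prefix (treating "n" as any digit), six months is approximated as 182 days, and IgE-FA is left out because it depends on the additional prescription and allergen criteria. The names `PREFIXES` and `meets_two_visit_rule` are hypothetical.

```python
from datetime import date, timedelta

# Assumption: "691.nn" etc. are matched by prefix. IgE-FA is omitted
# because it also requires prescription and allergen information.
PREFIXES = {
    "AD": ("691.", "L20."),
    "Asthma": ("493.", "J45."),
    "AR": ("477.", "J30."),
}

def meets_two_visit_rule(visits, condition, min_gap=timedelta(days=182)):
    """visits: iterable of (visit_date, icd_code) pairs for one patient.

    True when the condition was coded on at least two visits whose
    dates are at least `min_gap` apart (assumed ~six months).
    """
    prefixes = PREFIXES[condition]
    dates = sorted(d for d, code in visits if code.startswith(prefixes))
    # If the earliest and latest coded visits are far enough apart,
    # a qualifying pair of visits exists.
    return len(dates) >= 2 and dates[-1] - dates[0] >= min_gap
```

For example, a patient coded 691.8 in January 2010 and L20.9 in September 2010 would qualify for the AD cohort under this sketch, while two asthma codes two months apart would not qualify for the asthma cohort.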

To maximize the specificity of our IgE-FA cohort, we: (1) required an epinephrine auto-injector prescription and a food allergen entry in the EMR allergy module, (2) excluded diagnosis codes relating to food protein-induced enterocolitis syndrome and eosinophilic esophagitis, and (3) recoded patients with diagnosis codes corresponding to lactose intolerance and gluten sensitivity/celiac disease as non-milk or non-wheat allergic, respectively. To maximize specificity of our asthma cohort, we: (1) limited our analysis to ICD codes occurring after 1 year of age, (2) required prescriptions for asthma-specific medications (e.g., albuterol, inhaled corticosteroid) on at least two separate dates, and (3) ignored ICD codes related to viral-induced wheeze and reactive airway disease. Prior chart review has shown these methods to have a high degree of sensitivity and specificity for identifying conditions of interest.(4)

Modeling of Allergic Trajectories:

To identify potential associations between patient demographic characteristics and allergy development in a minimally-biased manner, we performed both an unsupervised cluster analysis and a supervised decision tree analysis in R (www.r-project.org). For our unsupervised approach, we utilized the hierarchical clustering R package Pvclust to examine individuals with more than one allergic manifestation.(18) The measure of dissimilarity was the correlation distance measure implemented by the Pvclust algorithm, and 5000 bootstraps were used. For our supervised analysis, we performed a CART decision tree analysis restricted to the self-identified White, Black, and Asian or Pacific Islander populations and the primary allergic trajectory outcomes, using the Rpart package in R and a complexity parameter of 0.005.(19)
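To make the dissimilarity measure concrete, the sketch below computes a correlation distance between two binary indicator vectors, using the common definition of one minus the Pearson correlation. The analysis itself was run with Pvclust in R; this Python version, its function names, and the toy vectors are illustrative assumptions rather than a reproduction of that pipeline.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def correlation_distance(x, y):
    """Assumed definition: 1 - r, so 0 = identical pattern, 2 = opposite."""
    return 1.0 - pearson(x, y)

# Toy indicator columns over six hypothetical subjects (1 = feature present).
feature_a = [1, 1, 0, 0, 1, 0]
feature_b = [1, 1, 0, 0, 1, 0]   # same pattern as feature_a
feature_c = [0, 0, 1, 1, 0, 1]   # complementary pattern
```

Features that co-occur in the same subjects have a small distance and are merged early in the hierarchy; the bootstrap step then assesses how stable each merge is.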

GWAS:

Using our birth cohort, we identified individuals with AD who went on to develop IgE-FA, asthma, or AR as their next allergic manifestation. Using an honest-broker system, these subjects were cross-referenced and supplemented with genotyping data from individuals of African ancestry (AA) in the CHOP Center for Applied Genomics. Allergy-free control subjects, defined as typically developing children with no relevant allergy ICD codes for AD, IgE-FA, Asthma, AR, nutritional deficiencies, or other skin conditions,(20) were recruited through the CHOP Health Care Network. There were no specific criteria to exclude autoimmune diseases in the generation of the control subject cohort. DNA samples of participants were genotyped at the Center for Applied Genomics (CAG) at CHOP using the Illumina HumanHap550, Human Quad610, and Infinium Global Screening arrays. Discovery and replication cohorts were utilized (Supplemental Table I). The discovery cohort contained samples genotyped on GSA arrays, while the replication cohort included samples genotyped on HumanHap550 and Human Quad610 arrays.

EIGENSTRAT was applied to detect and correct for potential substructures and outliers.(21) AA individuals were strictly selected by comparing principal component analysis results of participants and reference populations from Hapmap3 (Supplemental Figure 1). Samples with a chip-wide genotyping failure rate greater than 5% were excluded. In addition, SNP markers with minor allele frequencies less than 1%, genotyping failure rates of greater than 2%, and Hardy-Weinberg P-values less than 1×10⁻⁶ were excluded before imputation. Pairwise identity-by-descent values were calculated by PLINK to remove cryptic relatedness and duplicated samples.(22) Genotype imputation was performed with the Michigan Imputation Server using the minimac4 imputation algorithm.(23) The reference panel utilized for imputation was obtained from over 100,000 deep-coverage whole genome sequences (WGS) derived from the Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Heart, Lung, and Blood Institute. The TOPMed imputation reference panel achieved a significant improvement in imputation qualities and accuracies of rare variants.(24) Therefore, in addition to common SNPs (MAFs > 1%, Rsq (imputation quality metric) > 0.3), well-imputed rare SNPs (MAFs < 1%, Rsq > 0.5) with MAFs above 0.1% were also retained for association analysis. No genomic inflation was detected (Supplemental Figure 2).
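The marker-level thresholds in this paragraph can be written as two small predicates. This Python sketch only restates the cut-offs given in the text; the function names are hypothetical, the treatment of values exactly at a threshold is an assumption, and the actual filtering was done with dedicated genetics tools rather than code like this.

```python
def passes_pre_imputation_qc(maf, missing_rate, hwe_p):
    """Marker QC before imputation, using the thresholds in the text.

    Markers are excluded for MAF < 1%, genotyping failure rate > 2%,
    or Hardy-Weinberg P < 1e-6. Boundary handling is an assumption.
    """
    return maf >= 0.01 and missing_rate <= 0.02 and hwe_p >= 1e-6

def passes_post_imputation_filter(maf, rsq):
    """Retention rule after imputation.

    Common SNPs (MAF > 1%) need Rsq > 0.3; rare SNPs with
    0.1% < MAF <= 1% need Rsq > 0.5; rarer variants are dropped.
    """
    if maf > 0.01:
        return rsq > 0.3
    if maf > 0.001:
        return rsq > 0.5
    return False
```

Note the stricter imputation-quality requirement for rare variants, which are harder to impute accurately than common ones.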

Association analyses were implemented using logistic regression with an additive model on the imputed dosage of the effect allele while adjusting for sex and the first five principal components.(21) Meta-analysis was performed by PLINK2. Fixed-effects P values were reported.
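As an illustration of how a fixed-effects meta-analysis combines per-cohort estimates, the sketch below uses inverse-variance weighting, the usual fixed-effects formulation. It is a generic Python example under that assumption, not a description of PLINK2's internals; per-cohort effect sizes and standard errors are not reported in this section, so the inputs are hypothetical.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects combination of per-cohort estimates.

    betas: log-odds-ratio estimates, one per cohort (hypothetical inputs).
    ses:   matching standard errors.
    Returns (combined_beta, combined_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    # Two-sided normal P value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p
```

When two cohorts show effects in the same direction, the combined standard error shrinks and the combined P value can be much smaller than either cohort's alone, which is how discovery and replication signals that are individually short of genome-wide significance can reach it jointly.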

Statistics:

Chi-squared testing was used to examine differences in demographic features of the allergy cohorts by comparing (1) cohort A to (2) the entire cohort minus cohort A. The hierarchical clustering R package Pvclust calculates P values via multiscale bootstrap resampling for each cluster in the hierarchy.(18) The P value of a cluster is represented as a value between 0 and 100, which indicates how strongly the cluster is supported by the data. Two types of P values are calculated: an AU (Approximately Unbiased) P value and a BP (Bootstrap Probability) value. Associations with an AU larger than 95% (P value < 0.05) are boxed in red, and are strongly supported by the data.
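The "cohort A versus everyone else" comparison can be worked through with the sex counts for the AD cohort reported in Table I. The Python sketch below is illustrative: it assumes a Pearson chi-squared test on a 2×2 table without continuity correction, which the text does not specify, and the function name is hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared test (1 df, no continuity correction).

    Table layout:        feature present   feature absent
      cohort A                  a                 b
      rest of cohort            c                 d
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-squared with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p

# Counts from Table I: males and females in the AD cohort and overall.
ad_male, ad_female = 10210, 8386
all_male, all_female = 81266, 77244

chi2, p = chi_square_2x2(
    ad_male, ad_female,
    all_male - ad_male,      # males outside the AD cohort
    all_female - ad_female,  # females outside the AD cohort
)
```

With these counts the P value falls well below 0.001, in line with the sex difference reported for the allergy cohorts.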

Availability of data:

The epidemiologic dataset supporting the conclusions of this article is available in the Zenodo repository (https://zenodo.org/record/3888959). GWAS summary statistics are available on GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics) using accession numbers GCST90000019, GCST90000020, GCST90000021, GCST90000022, GCST90000023, and GCST90000024.

Ethical and regulatory oversight:

The GWAS was performed under IRB# 16-013278. The CHOP Institutional Review Board reviewed the primary care cohort generation and modeling portion of our study and determined that it was exempt from requiring ethics approval or subject consent as it did not meet the definition of human subject research.

RESULTS:

Cohort demographics and primary march trajectories

Of the 158,510 children in the CHOP primary care birth cohort, there were 18,596 subjects in the AD cohort, 6,472 in the IgE-FA cohort, 29,351 in the asthma cohort, and 27,094 in the AR cohort (Table I). The demographic characteristics of each cohort demonstrate significantly increased prevalence of males and self-identified Black individuals in the allergy groups, as compared to the overall cohort (57% males in the allergy cohorts vs. 51% overall; 45% Black in the allergy cohorts vs. 32% overall; Chi-Squared p<0.001 for both). Consistent with our prior analyses,(4) the average ages of diagnosis for the major allergic manifestations were 472 days (for AD); 839 days (for IgE-FA); 1026 days (for Asthma); and 1457 days (for AR). There were differences in average diagnosis age for each manifestation between the major racial groups studied (Supplemental Table II).

Table I. Demographic characteristics of allergic disease cohorts. Column headings give cohort (n); cell values are % (n).

| Characteristic | AD (18,596) | IgE-FA (6,472) | Asthma (29,351) | AR (27,094) | Total (158,510) |
| --- | --- | --- | --- | --- | --- |
| Gender | | | | | |
| Male | 55 (10,210) | 59 (3,819) | 60 (17,512) | 56 (15,200) | 51 (81,266) |
| Female | 45 (8,386) | 41 (2,653) | 40 (11,839) | 44 (11,894) | 49 (77,244) |
| Race | | | | | |
| White | 32 (5,964) | 47 (3,034) | 40 (11,688) | 43 (11,698) | 50 (78,691) |
| Black | 51 (9,465) | 34 (2,227) | 46 (13,504) | 44 (11,860) | 32 (50,847) |
| Asian or Pacific Islander | 5 (957) | 7 (444) | 3 (785) | 3 (748) | 4 (5,824) |
| Other | 2 (312) | 2 (140) | 2 (548) | 1 (416) | 2 (3,158) |
| Unknown | 10 (1,898) | 10 (627) | 9 (2,826) | 9 (2,372) | 12 (19,990) |
| Ethnicity | | | | | |
| Hispanic or Latino | 6 (1,094) | 4 (283) | 7 (1,957) | 6 (1,578) | 7 (10,495) |
| Non-Hispanic or Latino | 94 (17,502) | 96 (6,189) | 93 (27,394) | 94 (25,516) | 93 (148,015) |
| Birth year | | | | | |
| 2000 to 2004 | 13 (2,512) | 9 (602) | 13 (3,716) | 15 (4,173) | 9 (13,357) |
| 2005 to 2009 | 42 (7,829) | 45 (2,912) | 47 (13,915) | 52 (14,468) | 38 (60,576) |
| 2010 to 2014 | 32 (5,919) | 38 (2,458) | 35 (10,246) | 29 (7,793) | 39 (62,076) |
| 2015 or Later | 13 (2,336) | 8 (500) | 5 (1,474) | 2 (660) | 14 (22,501) |
| Payer type | | | | | |
| Medicaid | 43 (7,994) | 26 (1,678) | 42 (12,364) | 38 (10,175) | 33 (52,289) |
| Non-Medicaid | 57 (10,602) | 74 (4,794) | 58 (16,987) | 62 (16,919) | 67 (106,221) |

As there are many possible orders by which individuals can develop the allergic manifestations, we chose to adopt a reductionist model by focusing on each of the major march transitions as determined by population-level data: AD to IgE-FA; AD to Asthma; AD to AR; IgE-FA to Asthma; IgE-FA to AR; Asthma to AR; and AR to Asthma.(1) The numbers of subjects that fell into each of these groups are shown in Supplemental Figure 3. We define these cohorts as the “primary allergic march transitions”.

Unsupervised and supervised modeling of allergy development

In our unsupervised hierarchical clustering analysis of the primary allergic march transitions, we found that children who identified as Black were more likely to progress from AD to Asthma, while those who identified as Asian or Pacific Islander were more likely to progress from AD to IgE-FA (Figure 1). We found other associations between allergic transitions and demographic features. For example, females were more likely to progress from AD to AR, while individuals who identified as White were more likely to progress from AR to Asthma. Together, these results suggest that one or more factors associated with self-identified race influence the pattern of allergic disease development in children.

Figure 1.


Unsupervised cluster dendrogram of associations between allergic transitions and demographic features. Transitions including atopic dermatitis (AD), IgE-mediated food allergy (IgE-FA), asthma, and allergic rhinitis (AR) were considered. The P value of a cluster is represented as a probability of association value between 0 and 100 ((1-P)×100). Higher values indicate stronger associations. Two types of P values are calculated: an AU (Approximately Unbiased) P value and a BP (Bootstrap Probability) value. Blue values are AU values, and green values are BP values. Associations with an AU larger than 95% (P value ≤ 0.05) are boxed in red. “Asian” = Asian and Pacific Islander.

As AD is a key risk factor for subsequent allergy development,(25–27) we further refined our model by focusing on the first march ‘decision point’: the development of the second allergic manifestation in subjects with AD. To do so, we first identified the group of subjects who progressed from AD to at least one of the other allergic manifestations (IgE-FA, Asthma, or AR). We next grouped these subjects based on the next allergic manifestation they developed. We define these three cohorts as the “primary allergic march trajectories” (Supplemental Table III). To illustrate the associations between subject race and the three primary trajectories, we used a supervised decision tree analysis. Consistent with our unsupervised model, we found that individuals who identified as Black were more likely to progress from AD to Asthma, while individuals who identified as Asian or Pacific Islander were more likely to progress from AD to IgE-FA (Figure 2). We did not find a significant effect of subject sex on this outcome (not shown). This more directed form of modeling also associated progression from AD to AR with individuals who identified as White. Together, these findings further support race as a key demographic feature that associates with distinct allergic march trajectories.
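The grouping step described above, assigning each subject with AD to a trajectory by the next manifestation developed, can be sketched as follows. This Python example is an assumption-laden illustration: it takes "next" to mean the earliest of IgE-FA, asthma, or AR diagnosed after the AD diagnosis, leaves simultaneous diagnoses unhandled, and uses a hypothetical function name and input format.

```python
LATER_CONDITIONS = ("IgE-FA", "Asthma", "AR")

def primary_trajectory(dx_age_days):
    """Classify one subject into a primary allergic march trajectory.

    dx_age_days: dict mapping condition name to age at diagnosis in
    days; conditions never diagnosed are simply absent.
    Returns "AD to <condition>" or None if the subject has no AD or no
    later manifestation. Ties between later diagnoses are not resolved
    in any principled way (assumption: they are rare).
    """
    ad_age = dx_age_days.get("AD")
    if ad_age is None:
        return None
    later = {
        cond: age
        for cond, age in dx_age_days.items()
        if cond in LATER_CONDITIONS and age > ad_age
    }
    if not later:
        return None
    next_condition = min(later, key=later.get)
    return f"AD to {next_condition}"
```

A hypothetical subject diagnosed at the cohort-average ages (AD at 472 days, IgE-FA at 839, asthma at 1026, AR at 1457) would be placed in the AD to IgE-FA group, whereas a subject whose asthma preceded AD would not be assigned to the AD to Asthma group.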

Figure 2.


Supervised decision tree analysis of associations between allergic trajectories and race. Trajectories including atopic dermatitis (AD), IgE-mediated food allergy (IgE-FA), asthma, and allergic rhinitis (AR) are shown. Proportions in each node represent the proportion of the population in the node that belongs to each class (AD to AR; AD to Asthma; AD to IgE-FA), and nodes are color-coded based on the most frequent class. Percentage indicates the percent of the total population that remains in each node. “Asian” = Asian and Pacific Islander.

Genome wide analysis of allergic trajectories

There are many factors that can influence race-associated outcomes and that should be considered as co-variates, including those of social, environmental, and genetic origins.(28–30) To determine whether specific single-nucleotide polymorphisms (SNPs) predominate in each of the three primary trajectory groups, we performed a GWAS using genotyped samples from individuals of African ancestry (AA). Using this approach, we found four SNPs that reached genome-wide significance. The first SNP we detected (rs60242841) is located on chromosome 2 in an intron of LINC00299 (Figure 3). In our population, rs60242841 was significantly associated with the AD to Asthma allergic trajectory in both the discovery and replication cohorts (Table II). Meta-analysis of the two cohorts yielded a genome-wide significant association (rs60242841, Pcombined = 2.21×10⁻⁸). In prior studies, rs60242841 has been associated with Insulin-like growth factor 1 levels,(31) and is located in a region of the genome that includes several allergy-related traits in individuals of European ancestry (EA). Notably, rs60242841 is more common in AA as compared with EA individuals (frequencies of 0.0838 and 0.000519, respectively), an ancestral predominance that is consistent with the findings of our unsupervised and supervised demographic modeling.

Figure 3.


Manhattan plot and regional association of the genome-wide significant locus associated with the atopic dermatitis to asthma trajectory. (A) -log10(P) values for the association tests (two-tailed) shown on the y axis, chromosomes ordered on the x axis. One genetic locus surpassed the genome-wide significance threshold (-log10(P) > 7.3; indicated by the orange dotted line). Lead variant indicated by a diamond, red circles show the variants in linkage disequilibrium. The light and dark grey colors differentiate adjacent chromosomes. (B) Purple diamond indicates the most significantly associated SNP, and circles represent the other SNPs in the region, with coloring from blue to red corresponding to r² values from 0 to 1 with the index SNP. Known regional genomic elements are shown. The SNP position refers to the National Center for Biotechnology Information (NCBI) build 36. Estimated recombination rates are from HapMap build 36. “Rec. rate” = Recombination rate; “n/a” = no r² data.

Table II. Genome-wide significant associations with allergic march trajectories. Frequencies are given as cases/controls.

| Trajectory | SNP | Locus | Gene | Frequency, Discovery | Frequency, Replication | P value, Discovery | P value, Replication | P value, Combined |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AD to Asthma | rs60242841 | 2p25.1 | LINC00299 | 0.111/0.065 | 0.126/0.069 | 4.05×10⁻⁵ | 1.44×10⁻⁴ | 2.21×10⁻⁸ |
| AD to AR (All) | rs9565267 | 13q22.2 | Intergenic | 0.102/0.048 | 0.095/0.050 | 7.28×10⁻⁶ | 1.53×10⁻³ | 4.68×10⁻⁸ |
| AD to AR (Male) | rs151041509 | 4q33 | LINC02512 | 0.191/0.099 | 0.236/0.094 | 3.44×10⁻⁴ | 6.72×10⁻⁶ | 1.37×10⁻⁸ |
| AD to AR (Female) | rs78171803 | 17q22 | Intergenic | 0.05/0.009 | 0.068/0.016 | 8.69×10⁻⁵ | 1.22×10⁻⁴ | 4.32×10⁻⁸ |
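The figure legends state the genome-wide significance threshold as -log10(P) > 7.3, which corresponds closely to the conventional P < 5×10⁻⁸ cut-off. As a quick check, the Python sketch below applies that threshold to the combined P values in Table II; the dictionary and function names are illustrative.

```python
import math

# Threshold as stated in the figure legends.
NEG_LOG10_THRESHOLD = 7.3

# Combined (meta-analysis) P values from Table II.
combined_p = {
    "rs60242841": 2.21e-8,   # AD to Asthma
    "rs9565267": 4.68e-8,    # AD to AR (All)
    "rs151041509": 1.37e-8,  # AD to AR (Male)
    "rs78171803": 4.32e-8,   # AD to AR (Female)
}

def is_genome_wide_significant(p, threshold=NEG_LOG10_THRESHOLD):
    """True when -log10(p) exceeds the plotted significance line."""
    return -math.log10(p) > threshold

significant = {
    snp: is_genome_wide_significant(p) for snp, p in combined_p.items()
}
```

All four combined P values clear the threshold, whereas none of the individual discovery or replication P values in Table II would do so on their own.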

The second SNP we detected (rs9565267) is located in an intergenic region of chromosome 13 (Figure 4). We observed that rs9565267 is significantly associated with the AD to AR allergic trajectory in both the discovery and replication cohorts (Table II), and met genome-wide significance in a meta-analysis combining the discovery and replication cohorts (rs9565267, Pcombined = 4.68×10⁻⁸). This SNP has been associated with several inflammatory respiratory conditions in prior GWAS.(32) Notably, rs9565267 is more common in EA as compared with AA individuals (frequencies of 0.177 vs 0.0489, respectively), an observation that is again consistent with our demographic modeling that found individuals who identified as White to be more likely to progress from AD to AR.

Figure 4.


Manhattan plot and regional association of the genome-wide significant locus associated with the atopic dermatitis to allergic rhinitis trajectory. (A) -log10(P) values for the association tests (two-tailed) shown on the y axis, chromosomes ordered on the x axis. One genetic locus surpassed the genome-wide significance threshold (-log10(P) > 7.3; indicated by the orange dotted line). Lead variant indicated by a diamond, red circles show the variants in linkage disequilibrium. The light and dark grey colors differentiate adjacent chromosomes. (B) Purple diamond indicates the most significantly associated SNP, and circles represent the other SNPs in the region, with coloring from blue to red corresponding to r² values from 0 to 1 with the index SNP. Known regional genomic elements are shown. The SNP position refers to the National Center for Biotechnology Information (NCBI) build 36. Estimated recombination rates are from HapMap build 36. “Rec. rate” = Recombination rate; “n/a” = no r² data.

Finally, as female sex was found to associate with the AD to AR trajectory in our unsupervised modeling, we performed a sex-stratified GWAS analysis.(33) We did not find a significant effect of sex on the AD to Asthma trajectory GWAS associations. However, when considering sex differences in SNPs associated with the AD to AR trajectory, we identified two additional trajectory-linked SNPs (Figure 5). The SNP rs151041509 is significantly associated with the AD to AR allergic trajectory in males (Pcombined = 1.37×10⁻⁸; Figure 5A, B; Table II). Further, the SNP rs78171803 is significantly associated with the AD to AR allergic trajectory in females (Pcombined = 4.32×10⁻⁸; Figure 5C, D; Table II). The SNP rs151041509 is enriched in EA as compared with AA individuals (frequencies of 0.23 vs 0.11, respectively), while rs78171803 is enriched in AA as compared with EA individuals (frequencies of 0.023 and 0.0000649, respectively). Together, these results indicate that different SNPs are associated with the AD to AR allergic trajectory in males and females.

Figure 5.


Manhattan plot and regional association of genome-wide significant loci associated with the atopic dermatitis to allergic rhinitis trajectory in males or females. (A, C) -log10(P) values for the association tests (two-tailed) shown on the y axis, chromosomes ordered on the x axis. One genetic locus surpassed the genome-wide significance threshold (-log10(P) > 7.3; indicated by the orange dotted line) in each group. Lead variant indicated by a diamond, red circles show the variants in linkage disequilibrium. The light and dark grey colors differentiate adjacent chromosomes. (B, D) Purple diamond indicates the most significantly associated SNP in each group, and circles represent the other SNPs in the region, with coloring from blue to red corresponding to r² values from 0 to 1 with the index SNP. Known regional genomic elements are shown. The SNP position refers to the National Center for Biotechnology Information (NCBI) build 36. Estimated recombination rates are from HapMap build 36. “Rec. rate” = Recombination rate; “n/a” = no r² data.

DISCUSSION:

We use a large primary care birth cohort and several epidemiologic and bioinformatic approaches to identify key demographic and genetic features that associate with specific allergic disease developmental patterns. Given existing data on the complexity of allergy development in individuals,(7) we use a reductionist approach by focusing on those patients with AD who go on to develop at least one additional allergic manifestation.

Using unsupervised modeling, we find that self-identified race is strongly associated with the major allergic trajectories in our cohort. Specifically, we find that the AD to Asthma trajectory predominates in individuals who identify as Black, while the AD to IgE-FA trajectory is most common in individuals who identify as Asian and Pacific Islander, and the AD to AR trajectory is most common in individuals who identify as White. Health disparities in allergy are well established. Asthma disproportionately affects African Americans both in terms of prevalence and severity.(9, 10) There is also emerging evidence of racial and ethnic disparity in food allergy, again with worse outcomes in African American and Hispanic children.(11–13) However, our results provide an important new lens through which to view disparities within the allergic manifestations by identifying differential risk of allergic multimorbidity within distinct racial groups.

Here we focused on the identification of ancestral SNPs that may contribute to the observed epidemiological associations. We report four polymorphisms that are significantly associated with distinct allergic trajectories. The SNP rs60242841, which is associated with progression from AD to Asthma and is enriched in AA individuals, represents a T to C change in an intronic region of long intergenic non-protein coding RNA 299 (LINC00299). The SNP rs60242841 is associated with higher levels of Insulin Like Growth Factor 1 (IGF1),(31) an association that may be relevant to known clinical, genetic, and inflammatory connections between obesity and asthma.(34, 35) Further, multiple SNPs located near rs60242841 (Supplemental Figure 4) are associated with allergy-related traits in EA individuals (Supplemental Table IV). Associations between rs60242841 and allergy-related traits were not found in prior studies, likely because rs60242841 is rare in EA individuals (frequency of 0.000519). We may have detected an association with rs60242841 because our GWAS was performed on AA individuals, in whom rs60242841 is more common (frequency of 0.0838). Thus, this genomic region is highly associated with several allergy-related traits, and rs60242841 is both over-represented in AA individuals and associated with progression from AD to Asthma. This region may be involved in the regulation of LINC00299, or genes near it, to confer this risk.

Our second identified SNP, rs9565267, is associated with progression from AD to AR. This SNP is enriched in EA individuals, and represents a G to A or T change in an intergenic region. This SNP is associated with scleroderma/systemic sclerosis (P=0.0000068, UKB); emphysema (P=0.00061, UKB); pneumonia (P=0.0031, UKB); and pleurisy (P=0.035, UKB) in prior GWAS.(32) As it is present in a non-coding region, it is possible that rs9565267 plays a role in the access to and/or transcriptional regulation of one or more genes or non-coding genomic elements. The association of rs9565267 with multiple inflammatory conditions of the airway is consistent with a contributing role for this SNP in progression from AD to AR. In sum, rs9565267 is associated with inflammatory airway disease, and is both over-represented in EA individuals and associated with progression from AD to AR.

The third and fourth SNPs we identified were found in our sex-stratified analysis. The SNP rs151041509 reaches genome-wide significance in association with progression from AD to AR in males. This SNP is in proximity to LINC01612, and Genotype-Tissue Expression Project (GTEx) data indicate that rs151041509 is an expression quantitative trait locus (eQTL) of LINC02512 in multiple tissues including lung, liver, colon, adipose, and kidney (Supplemental Figure 5). Moreover, rs151041509 is significantly associated with diagnosis of allergic hay fever, rhinitis, or eczema (P = 0.019) as well as forced vital capacity (P = 5.5×10⁻⁴) in the round 2 GWAS results of UK Biobank.(32) In the analysis of AD to AR allergic trajectory in females, rs78171803 reaches genome-wide significance. This SNP is significantly associated with circulating leukocyte levels in a previously published GWAS.(36) The SNP rs151041509 is enriched in EA individuals, while rs78171803 is enriched in AA individuals. No genome-wide significant locus was detected in the sex-stratified analyses of AD to asthma, and we did not observe a significant genetic association with the AD to IgE-FA trajectory. This may be due to the small genotyped sample size of our cohort.

It is notable that the differences in the ancestral distributions of rs60242841, rs9565267, and rs151041509 are consistent with our epidemiologic modeling of the major allergy trajectories, raising the possibility that these polymorphisms contribute to the observed associations. However, it is important to note that race is a social rather than biological construct, one that is associated with other relevant factors, many of which are themselves implicated in health disparities, that we could not fully account for in our analysis.(37–40) As such, the identification of genetic polymorphisms that predominate in particular ancestral groups should not distract attention from other factors that may contribute to particular outcomes, including those of biologic, environmental, institutional, and/or social origins.(28) Additional studies are therefore needed to determine the relative contribution of these SNPs to the observed associations in context with other covariates of race that may influence allergy development, diagnosis, and/or clinical care.

There are additional limitations to our study that should be noted. This study was a secondary analysis of health records at a single institution collected as part of routine care. We relied primarily on diagnosis codes to identify conditions of interest. Choice of diagnosis codes, which may be affected by billing or administrative constraints, may introduce biases in our data collection. However, in a prior chart review of a subset of our cohort in which we compared diagnosis codes to commonly accepted practice parameters and consensus guidelines,(4) we found a high degree of diagnosis code accuracy. It is possible that some assignments to racial groups were made by others, rather than being self-identified, and this may introduce additional bias and confounders into our analyses. Further, variations in follow-up time within our cohort could influence the association results. The reductionist approach of focusing on the major allergic manifestations is a simplification that does not consider other allergic conditions, such as eosinophilic esophagitis.(5) In addition, our approach does not consider possible effects on disease endophenotypes that may exist for each condition.(41) Similarly, by focusing on the primary allergy transitions and trajectories, we do not consider other permutations by which individuals may acquire allergic manifestations.(2, 6) As such, additional studies are warranted to refine these observations and further improve our understanding of the genetic, environmental, and immunologic factors that contribute to particular allergic trajectories.

The results presented here are consistent with the notion that genetic factors may contribute to allergic march trajectories. However, a limitation of our GWAS is that it assumes a monogenic effect. Genetic risk factors that contribute to allergy development and progression may in fact be polygenic, and identification of such cases would require further studies with a larger sample size. In addition, our GWAS was performed on AA individuals only, as the number of genotyped EA individuals in our cohort with AD was too small for GWAS. As a result, our GWAS may not have identified relevant SNPs for an EA population. To account for this, our analysis used the TOPMed WGS data set as the imputation reference panel, which significantly improved both imputation quality and accuracy for admixed populations including African Americans.(24) Further, both detected associations were supported in two independent cohorts, indicating a high robustness of our reported associations. Allele frequencies of detected variants in controls are highly consistent between the discovery and replication cohorts, and similar to the allele frequencies recorded in the public database, providing evidence that the detected associations were not driven by controls. Nevertheless, future studies with greater sample sizes may uncover additional novel associations in both AA and EA populations that further delineate the genetic underpinnings of the allergic march trajectories.
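
The control allele frequency check described above can be sketched as follows. This is a minimal Python illustration, not the study's procedure. It computes the frequency of the alternate allele from genotype counts and tests whether the frequencies from several sources fall within a chosen tolerance. The function names, the tolerance of 0.02, and all counts and frequencies are hypothetical assumptions.

```python
def allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """Alternate-allele frequency from diploid genotype counts.

    Each individual carries two alleles: heterozygotes contribute one
    alternate allele and alternate homozygotes contribute two.
    """
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotyped individuals")
    return (n_het + 2 * n_alt_hom) / total_alleles


def frequencies_consistent(freqs, tolerance=0.02):
    """True if the spread (max - min) of the frequencies is within tolerance.

    The tolerance is an illustrative assumption, not a value from the study.
    """
    return max(freqs) - min(freqs) <= tolerance


# Hypothetical control genotype counts (ref/ref, ref/alt, alt/alt).
discovery = allele_frequency(810, 180, 10)   # 200 alt alleles of 2000
replication = allele_frequency(405, 90, 5)   # 100 alt alleles of 1000
public_db = 0.11                             # hypothetical database value

print(discovery, replication)
print(frequencies_consistent([discovery, replication, public_db]))
```

If the control frequency in one cohort differed sharply from the others, an apparent case-control difference could reflect unusual controls rather than a true effect in cases, which is the concern this comparison addresses.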

Taken together, we find that both self-identified race and specific genetic polymorphisms are associated with disease progression along distinct allergic march trajectories. This work refines our understanding of a simplified yet fundamental concept in the field of allergy, and provides a novel conceptual framework within which to consider why children develop certain allergic diseases. Finally, our findings are an important step towards addressing allergic healthcare disparities by improving prevention, diagnosis, and treatment of at-risk patient populations.(15)

Supplementary Material

sf1
sf3
sf2
sf6
sf4
sf5

KEY MESSAGES:

  • Self-identified race is associated with progression on distinct allergic march trajectories.

  • SNPs that vary in frequency among different ancestral groups may contribute to race-trajectory associations.

Acknowledgements and Funding:

We thank Julianna Dilollo and Dr. Melanie Ruffner for critical reading of this manuscript. We acknowledge Drs. Kellie Owens, Risa Lavizzo-Mourey, and Steven Joffe for valuable ethics consultation related to the relationship between genomics, race, and health disparities. This work was supported by the National Institutes of Health DK116668 (DAH), LM010098 (JM), AI116794 (JM), LM012601 (JM), the CHOP Endowed Chair in Genomic Research (HH), the American College of Allergy, Asthma, and Immunology (DAH), American Academy of Allergy, Asthma, and Immunology (DAH), the American Partnership For Eosinophilic Disorders (DAH), and CHOP Research Institute Developmental Awards to the Hill Lab and the Center for Applied Genomics (HH). The content of this work is the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

ABBREVIATIONS:

AD: Atopic dermatitis
IgE-FA: IgE-mediated food allergy
AR: Allergic rhinitis
EMR: Electronic medical record
CHOP: Children’s Hospital of Philadelphia
MAF: Minor allele frequency
WGS: Whole genome sequences
GWAS: Genome-wide association study
AA: African ancestry
EA: European ancestry
CAG: Center for Applied Genomics
TOPMed: Trans-Omics for Precision Medicine
ICD: International Classification of Diseases
IGF1: Insulin like growth factor 1
SNP: Single nucleotide polymorphism

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Conflicts of Interest: The authors have no relevant conflicts of interest.

References:

  • 1. Hill DA, Spergel JM. The atopic march: Critical evidence and clinical relevance. Ann Allergy Asthma Immunol. 2018. February;120(2):131–7.
  • 2. Paller AS, Spergel JM, Mina-Osorio P, Irvine AD. The atopic march and atopic multimorbidity: Many trajectories, many pathways. J Allergy Clin Immunol. 2019. January 01;143(1):46–55.
  • 3. Aw M, Penn J, Gauvreau GM, Lima H, Sehmi R. Atopic march: Collegium internationale allergologicum update 2020. Int Arch Allergy Immunol. 2020;181(1):1–10.
  • 4. Hill DA, Grundmeier RW, Ram G, Spergel JM. The epidemiologic characteristics of healthcare provider-diagnosed eczema, asthma, allergic rhinitis, and food allergy in children: A retrospective cohort study. BMC Pediatr. 2016. August 20;16:133.
  • 5. Hill DA, Grundmeier RW, Ramos M, Spergel JM. Eosinophilic esophagitis is a late manifestation of the allergic march. J Allergy Clin Immunol Pract. 2018. June 13.
  • 6. Hill DA, Camargo CA Jr, Paller AS, Spergel JM. A march by any other name. Ann Allergy Asthma Immunol. 2018. July;121(1):137–8.
  • 7. Belgrave DC, Granell R, Simpson A, Guiver J, Bishop C, Buchan I, et al. Developmental profiles of eczema, wheeze, and rhinitis: Two population-based birth cohort studies. PLoS Med. 2014. October 21;11(10):e1001748.
  • 8. Guidelines for the diagnosis and management of asthma. NIH; 2007. August.
  • 9. Gamble C, Talbott E, Youk A, Holguin F, Pitt B, Silveira L, et al. Racial differences in biologic predictors of severe asthma: Data from the severe asthma research program. J Allergy Clin Immunol. 2010. December 01;126(6):1149–56.e1.
  • 10. Akinbami LJ, Moorman JE, Liu X. Asthma prevalence, health care use, and mortality: United states, 2005–2009. Natl Health Stat Report. 2011. January 12;(32):1–14.
  • 11. Greenhawt M, Weiss C, Conte ML, Doucet M, Engler A, Camargo CA. Racial and ethnic disparity in food allergy in the united states: A systematic review. J Allergy Clin Immunol Pract. 2013. August 01;1(4):378–86.
  • 12. Joseph CL, Zoratti EM, Ownby DR, Havstad S, Nicholas C, Nageotte C, et al. Exploring racial differences in IgE-mediated food allergy in the WHEALS birth cohort. Ann Allergy Asthma Immunol. 2016. March 01;116(3):219–224.e1.
  • 13. Mahdavinia M, Fox SR, Smith BM, James C, Palmisano EL, Mohammed A, et al. Racial differences in food allergy phenotype and health care utilization among US children. J Allergy Clin Immunol Pract. 2017. April 01;5(2):352–357.e1.
  • 14. Moore JH, Boland MR, Camara PG, Chervitz H, Gonzalez G, Himes BE, et al. Preparing next-generation scientists for biomedical big data: Artificial intelligence approaches. Per Med. 2019. May 01;16(3):247–57.
  • 15. Ortiz RA, Barnes KC. Genetics of allergic diseases. Immunol Allergy Clin North Am. 2015. February 01;35(1):19–44.
  • 16. Feemster KA, Li Y, Grundmeier R, Localio AR, Metlay JP. Validation of a pediatric primary care network in a US metropolitan region as a community-based infectious disease surveillance system. Interdiscip Perspect Infect Dis. 2011;2011:219859.
  • 17. Ruffner MA, Wang KY, Dudley JW, Cianferoni A, Grundmeier RW, Spergel JM, et al. Elevated atopic comorbidity in patients with food protein-induced enterocolitis. J Allergy Clin Immunol Pract. 2019. November 20.
  • 18. Suzuki R, Shimodaira H. Pvclust: An R package for assessing the uncertainty in hierarchical clustering. Bioinformatics. 2006. June 15;22(12):1540–2.
  • 19. Therneau TM, Atkinson EJ. An introduction to recursive partitioning using the RPART routines. Mayo Clinic Division of Biostatistics.
  • 20. Almoguera B, Vazquez L, Mentch F, March ME, Connolly JJ, Peissig PL, et al. Novel locus for atopic dermatitis in african americans and replication in european americans. J Allergy Clin Immunol. 2019. March 01;143(3):1229–31.
  • 21. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006. August 01;38(8):904–9.
  • 22. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007. September 01;81(3):559–75.
  • 23. Das S, Forer L, Schonherr S, Sidore C, Locke AE, Kwong A, et al. Next-generation genotype imputation service and methods. Nat Genet. 2016. October 01;48(10):1284–7.
  • 24. Kowalski MH, Qian H, Hou Z, Rosen JD, Tapia AL, Shan Y, et al. Use of >100,000 NHLBI trans-omics for precision medicine (TOPMed) consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed african and hispanic/latino populations. PLoS Genet. 2019. December 23;15(12):e1008500.
  • 25. Dharmage SC, Lowe AJ, Matheson MC, Burgess JA, Allen KJ, Abramson MJ. Atopic dermatitis and the atopic march revisited. Allergy. 2014. January;69(1):17–27.
  • 26. Belgrave DC, Simpson A, Buchan IE, Custovic A. Atopic dermatitis and respiratory allergy: What is the link. Curr Dermatol Rep. 2015;4(4):221–7.
  • 27. Flohr C, Perkin M, Logan K, Marrs T, Radulovic S, Campbell LE, et al. Atopic dermatitis and disease severity are the main risk factors for food sensitization in exclusively breastfed infants. J Invest Dermatol. 2014. February;134(2):345–50.
  • 28. Wilkinson DY, King G. Conceptual and methodological issues in the use of race as a variable: Policy implications. Milbank Q. 1987;65 Suppl 1:56–71.
  • 29. Burchard EG, Ziv E, Coyle N, Gomez SL, Tang H, Karter AJ, et al. The importance of race and ethnic background in biomedical research and clinical practice. N Engl J Med. 2003. March 20;348(12):1170–5.
  • 30. Powe CE, Evans MK, Wenger J, Zonderman AB, Berg AH, Nalls M, et al. Vitamin D-binding protein and vitamin D status of black americans and white americans. N Engl J Med. 2013. November 21;369(21):1991–2000.
  • 31. Prins BP, Kuchenbaecker KB, Bao Y, Smart M, Zabaneh D, Fatemifar G, et al. Genome-wide analysis of health-related biomarkers in the UK household longitudinal study reveals novel associations. Sci Rep. 2017. September 08;7(1):11008.
  • 32. UK biobank [homepage on the Internet]. [cited 1/1/2020]. Available from: http://www.nealelab.is/uk-biobank/.
  • 33. Khramtsova EA, Davis LK, Stranger BE. The role of sex in the genomics of human complex traits. Nat Rev Genet. 2019. March 01;20(3):173–90.
  • 34. Zhu Z, Guo Y, Shi H, Liu CL, Panganiban RA, Chung W, et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK biobank. J Allergy Clin Immunol. 2019. October 24.
  • 35. Peters U, Dixon AE, Forno E. Obesity and asthma. J Allergy Clin Immunol. 2018. April;141(4):1169–79.
  • 36. Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016. November 17;167(5):1415–1429.e19.
  • 37. Keet CA, McCormack MC, Pollack CE, Peng RD, McGowan E, Matsui EC. Neighborhood poverty, urban residence, race/ethnicity, and asthma: Rethinking the inner-city asthma epidemic. J Allergy Clin Immunol. 2015. March;135(3):655–62.
  • 38. Sitarik A, Havstad S, Kim H, Zoratti EM, Ownby D, Johnson CC, et al. Racial disparities in allergic outcomes persist to age 10 years in black and white children. Ann Allergy Asthma Immunol. 2020. January 13.
  • 39. Banda Y, Kvale MN, Hoffmann TJ, Hesselson SE, Ranatunga D, Tang H, et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the genetic epidemiology research on adult health and aging (GERA) cohort. Genetics. 2015. August 01;200(4):1285–95.
  • 40. Yudell M, Roberts D, DeSalle R, Tishkoff S. SCIENCE AND SOCIETY. taking race out of human genetics. Science. 2016. February 05;351(6273):564–5.
  • 41. Kuruvilla ME, Lee FE, Lee GB. Understanding asthma phenotypes, endotypes, and mechanisms of disease. Clin Rev Allergy Immunol. 2019. April 01;56(2):219–33.
