The Journal of Infectious Diseases. 2023 Mar 27;228(8):979–989. doi: 10.1093/infdis/jiad068

Genome-Wide Association Studies of Diarrhea Frequency and Duration in the First Year of Life in Bangladeshi Infants

Rebecca M Munday 1, Rashidul Haque 2, Genevieve L Wojcik 3, Poonum Korpe 4, Uma Nayak 5,6, Beth D Kirkpatrick 7, William A Petri Jr 8, Priya Duggal 9,✉,1,3
PMCID: PMC11007397  PMID: 36967705

Abstract

Background

Diarrhea is the second leading cause of death in children under 5 years old worldwide. Known diarrhea risk factors include sanitation, water sources, and pathogens but do not fully explain the heterogeneity in frequency and duration of diarrhea in young children. We evaluated the role of host genetics in diarrhea.

Methods

Using 3 well-characterized birth cohorts from an impoverished area of Dhaka, Bangladesh, we compared infants with no diarrhea in the first year of life to those with an abundance, measured by either frequency or duration. We performed a genome-wide association analysis for each cohort under an additive model and then meta-analyzed across the studies.

Results

For diarrhea frequency, we identified 2 genome-wide significant loci associated with not having any diarrhea, on chromosome 21 within the noncoding RNA AP000959 (C allele odds ratio [OR] = 0.31, P = 4.01 × 10⁻⁸), and on chromosome 8 within SAMD12 (T allele OR = 0.35, P = 4.74 × 10⁻⁷). For duration of diarrhea, we identified 2 loci associated with no diarrhea, including the same locus on chromosome 21 (C allele OR = 0.31, P = 1.59 × 10⁻⁸) and another locus on chromosome 17 near WSCD1 (C allele OR = 0.35, P = 1.09 × 10⁻⁷).

Conclusions

These loci are in or near genes involved in enteric nervous system development and intestinal inflammation and may be potential targets for diarrhea therapeutics.

Keywords: GWAS, association, diarrhea, enterics, host genetics, malnutrition


In comparisons of infants with no diarrhea to those with either extreme frequency or duration, infants carrying a specific genetic variant (rs2827548) in a noncoding RNA on chromosome 21 were nearly 70% less likely to have extreme frequency/duration of diarrhea.


Each year, 525 000 children <5 years old die from diarrhea, making it the second most common cause of death in that age group [1]. Efforts to reduce diarrhea have included rotavirus immunization, increasing access to clean water, improving toilet facilities, advocating exclusive breastfeeding, and health education programs [1]. While childhood mortality and overall rates of diarrhea have improved [2], there are still many children experiencing multiple bouts of diarrhea in their first year of life, particularly in low-to-middle-income countries [1]. Outside of public health measures and education, the reported heterogeneity of diarrhea even among children with similar exposures suggests that immunity and host genetics may play an important role [3–5].

Previous studies of the host genetic contribution to diarrheal disease generally fall into 3 categories: bowel disorders, inherited Mendelian disorders, and pathogen-specific analyses. Genetic studies of bowel disorders, including irritable bowel syndrome [6], celiac disease [7], and inflammatory bowel disease [8, 9], have identified hundreds of susceptibility genes. Studies of inherited Mendelian disorders, such as congenital sodium diarrhea [10] and trichohepatoenteric syndrome [11], have also identified associated genes. Early pathogen-specific analyses focused on candidate genes, such as FUT2, FUT3, and ABO [12, 13], but more recently have expanded to evaluating genes across the human genome. These include the identification of host genes associated with Entamoeba histolytica [14], Cryptosporidium [15], Shigella [16], and Campylobacter [17] diarrheal infections. A large genome-wide association study (GWAS) of European and American children enrolled in several cohorts identified an association between the FUT2 locus and infant diarrhea [18] among European-ancestry children not vaccinated against rotavirus.

In this study we interrogated the human genome for associations with diarrhea among infants in low-income households in the first year of life from Dhaka, Bangladesh. Focusing this study in an urban slum with limited sanitation and overcrowding, we aimed to identify those children on the tail ends of the spectrum: fully protected or highly susceptible to multiple diarrheal episodes in the first year of life. The identification of genes associated with these extreme outcomes could provide ideal targets for drug development or therapeutics for all children.

METHODS

Institutional Review Boards at Johns Hopkins Bloomberg School of Public Health, University of Virginia, and International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b) approved study protocols. A parent/guardian of each participant provided written informed consent.

Study Participants

Infants were ascertained from 3 independent birth cohorts, all from Mirpur, Dhaka, Bangladesh: the Dhaka Birth Cohort (DBC) [19], the Performance of Rotavirus and Oral Polio Vaccines in Developing Countries (PROVIDE) Study [20], and the Cryptosporidiosis Birth Cohort (CBC) [21]. In each cohort, enrollment took place within 1 week after birth and individuals were followed for at least the first year of life. Field research assistants visited each family multiple times every week for the first year, collecting information about feeding habits and symptoms of illness, including but not limited to, diarrhea, vomiting, coughing, and fever. Anthropometric data were recorded at time of enrollment and subsequently every 3 months.

Clinical Definitions

For each participant, the first year of life was defined as an interval beginning on the day of birth and ending on the first birthday. For those born on February 29, the first birthday was defined as March 1. Any diarrheal episodes occurring within the interval were then counted and used to define the phenotype. Diarrhea was defined as 3 or more abnormally loose stools in a 24-hour period. An episode of diarrhea was defined as a period of days during which a caregiver reported that an infant had diarrhea, including no more than 2 consecutive nondiarrheal days. In PROVIDE and CBC, some infants underwent lactulose-mannitol testing during the first year. As lactulose is an osmotic laxative, incidents of diarrhea that occurred on the same day as or the day following a lactulose-mannitol test were excluded from the total count. Total duration of diarrhea was defined as the sum of all days in all episodes in the first year, including nondiarrheal days within a given episode. For example, if an infant had 2 days of diarrhea followed by 2 nondiarrheal days followed by 1 more day of diarrhea, this would be a single episode of 5 days duration.
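The episode and duration rules above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names (`first_birthday`, `group_episodes`, `total_duration`) are hypothetical, and treating lactulose-mannitol exclusions as a set of dates to ignore is an assumption about how the exclusion was applied. Restricting the input to the first year of life is left to the caller.

```python
from datetime import date

MAX_GAP_DAYS = 2  # at most 2 consecutive nondiarrheal days within one episode


def first_birthday(birth):
    """First birthday; infants born on February 29 are assigned March 1."""
    try:
        return birth.replace(year=birth.year + 1)
    except ValueError:  # February 29 does not exist in the following year
        return date(birth.year + 1, 3, 1)


def group_episodes(diarrhea_days, excluded_days=frozenset()):
    """Group reported diarrhea days into episodes.

    diarrhea_days: iterable of dates on which a caregiver reported diarrhea.
    excluded_days: dates to ignore (assumption: the day of, and the day
                   after, a lactulose-mannitol test).
    Returns a list of (first_day, last_day) tuples, one per episode.
    """
    days = sorted(set(diarrhea_days) - set(excluded_days))
    episodes = []
    for day in days:
        if episodes and (day - episodes[-1][1]).days - 1 <= MAX_GAP_DAYS:
            # 0 to 2 nondiarrheal days since the last diarrhea day: same episode.
            episodes[-1] = (episodes[-1][0], day)
        else:
            episodes.append((day, day))
    return episodes


def total_duration(episodes):
    """Total days across episodes, counting nondiarrheal days inside an episode."""
    return sum((last - first).days + 1 for first, last in episodes)


# Worked example from the text: 2 days of diarrhea, 2 nondiarrheal days, 1 more day.
example = group_episodes([date(2010, 1, 1), date(2010, 1, 2), date(2010, 1, 5)])
print(len(example), total_duration(example))  # 1 episode lasting 5 days
```

A gap of 3 or more nondiarrheal days would instead start a new episode, so the same three dates shifted to January 1, 2, and 6 would yield 2 episodes and 3 total days.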

We considered 2 case definitions based on extremes in either number of diarrheal episodes or days with diarrhea, as determined by examining the distributions. Infants were defined as controls if they had zero episodes of diarrhea and/or had zero days of diarrhea in the first year of life. The first case definition was infants with 6 or more diarrheal episodes in the first year of life. The second case definition was infants reporting 25 or more days of diarrhea in the first year of life.
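As a rough sketch of how these definitions partition infants, the following hypothetical helper (`classify` is an illustrative name, not from the study) assigns phenotype labels from the episode count and total days. Infants who meet neither the control nor either case definition receive no label, which we assume corresponds to exclusion from both case-control comparisons.

```python
MIN_EPISODES = 6   # first case definition: 6 or more episodes
MIN_DAYS = 25      # second case definition: 25 or more days of diarrhea


def classify(n_episodes, n_days):
    """Return the set of phenotype labels for one infant.

    An empty set means the infant falls between the extremes and is
    (by assumption) not used in either case-control analysis.
    """
    if n_episodes == 0 and n_days == 0:
        return {"control"}
    labels = set()
    if n_episodes >= MIN_EPISODES:
        labels.add("case_frequency")
    if n_days >= MIN_DAYS:
        labels.add("case_duration")
    return labels
```

Returning a set rather than a single label reflects that the two case groups are not mutually exclusive: an infant with 7 episodes totalling 30 days belongs to both.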

To compare exposure profiles, we analyzed all available surveillance (nondiarrheal) stool samples in the birth cohorts. Both DBC and CBC had monthly stool collections, which were tested via real-time polymerase chain reaction (RT-PCR) for 3 pathogens: Entamoeba histolytica, Cryptosporidium, and Giardia. In DBC, 22 of 26 children without diarrhea had at least 1 surveillance sample tested and 314 of 354 in the remaining cohort had at least 1 surveillance sample tested. In CBC, all 365 children had at least 1 surveillance sample tested. PROVIDE did not conduct monthly stool collection but clinical samples obtained at 6 weeks and 10 weeks were tested for enteropathogens via TaqMan array card. Using this method, 20 of 32 children without diarrhea had samples tested and 242 of 443 in the remaining cohort had samples tested.

Socioeconomic Definitions

Household income was reported as total monthly income in Bangladeshi taka (BDT). For comparison, during the time of data collection, the international poverty line for a family of 5 was approximately 21 000 BDT/month ($1.90 USD/day/person [22] × 5 people × 30 days/month × 75 BDT per USD [average exchange rate over study periods]).
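The poverty-line figure follows directly from the quantities quoted above; the short calculation below simply multiplies them out (variable names are illustrative).

```python
usd_per_person_per_day = 1.90   # international poverty line [22]
household_size = 5              # people
days_per_month = 30
bdt_per_usd = 75                # average exchange rate over the study periods

poverty_line_bdt = (
    usd_per_person_per_day * household_size * days_per_month * bdt_per_usd
)
print(round(poverty_line_bdt))  # 21375, i.e. approximately 21 000 BDT/month
```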

Sanitation was defined as either improved toilets, including those utilizing a sewer or septic tank, as well as water-sealed or slab latrines, and ventilated-improved pit latrines [23], or unimproved toilets, including other pit latrines, open latrines, hanging latrines, and those with no facility. Drinking water was considered protected if from municipality-supplied piped water, pumped water, and tube wells [23], and unprotected if from an open well or surface water such as from a river, pond, or canal.

Genotyping, Quality Control, and Imputation

Methods for genotyping, quality control, and imputation have been previously described [14] and are outlined in detail in Supplementary Figure 1. All steps were done for each cohort separately. Briefly, infants from DBC were genotyped using 3 different Illumina arrays (Human1M-duoV3, HumanOmni1-Quad v1.0, and HumanOmni2.5-4 v1), infants from PROVIDE were genotyped using the Expanded Multi-Ethnic Genotyping Array (MEGA-EX) from Illumina, and infants from CBC were genotyped using Illumina's Infinium Multi-Ethnic Global Array (MEGA). Standard quality control measures included checking clustering, single-nucleotide polymorphism (SNP) missingness, individual missingness, and heterozygosity rate. Principal component analysis outliers, heterozygosity outliers, and 1 member at random from each pair of relatives were removed. Variants were filtered for minor allele frequency (MAF) >0.005 and Hardy-Weinberg equilibrium P value >5 × 10⁻⁵. After this process, phasing and imputation were conducted using SHAPEIT [24, 25] and IMPUTE2 [26–30], with 1000 Genomes Project phase 3 as reference [31]. After imputation, each cohort was rechecked for any additional cryptic relatedness using PLINK [32].
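To illustrate the two variant-level thresholds, the sketch below computes MAF from genotype counts and applies a Hardy-Weinberg test. Note the substitution: it uses a 1-degree-of-freedom chi-square goodness-of-fit test as a simple stand-in, which is not necessarily the test the authors or their software applied (exact tests are common in practice). All function names are hypothetical.

```python
import math

MAF_MIN = 0.005     # variants kept if MAF > 0.005
HWE_P_MIN = 5e-5    # variants kept if Hardy-Weinberg P > 5e-5


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from counts of the three genotypes (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) Hardy-Weinberg P value; an approximation only."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_variant_qc(n_aa, n_ab, n_bb):
    return (minor_allele_frequency(n_aa, n_ab, n_bb) > MAF_MIN
            and hwe_chi2_p(n_aa, n_ab, n_bb) > HWE_P_MIN)
```

For example, hypothetical counts of 25/50/25 are in exact Hardy-Weinberg proportions and pass, whereas 50/0/50 (no heterozygotes at all) fail the equilibrium filter.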

Intercohort relatedness was assessed using KING [33], and 1 member from each pair of relatives was removed (the relative with less complete covariate data) (n = 24). CBC was restricted to the Mirpur site, resulting in removal of 221 individuals. Infants were removed if they left the study before their first birthday, to ensure proper phenotyping. Final sample sizes for each cohort were 380 in DBC, 475 in PROVIDE, and 365 in CBC.

Association Analysis

We used SNPTEST [26, 28, 34] to run logistic regression for each case-control analysis, assuming an additive model of inheritance, and including sex and the first principal component as covariates (5.6%, 0.03%, 0.04% of the variance explained in DBC, PROVIDE, and CBC, respectively). In DBC, genotyping batch was also included as a covariate to account for any potential batch effects. Each cohort was analyzed separately, and results were filtered for MAF ≥0.05 and information score (INFO) ≥ 0.7. Filtered summary statistics were used in META [35] for inverse-variance weighted, fixed-effects meta-analyses. Meta-analysis results were restricted to variants that were observed in all 3 cohorts and passed heterogeneity testing (Phet > .05). We conducted a sensitivity analysis by adding variables for socioeconomic status to the model and analyzing the top loci again. We added income as a continuous variable and access to improved toilet as a discrete binary variable, with all other aspects of the model as specified above. We assessed potential interaction between the top SNPs and sex, income, and sanitation using nested models in R.
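For readers unfamiliar with the pooling step, the following is a minimal sketch of an inverse-variance weighted, fixed-effects meta-analysis for a single variant. It is not the META implementation; the function name and the input values are hypothetical, and the heterogeneity statistic is assumed here to be Cochran's Q. Each cohort contributes a log odds ratio weighted by the inverse of its squared standard error.

```python
import math


def fixed_effects_meta(betas, ses):
    """Pool per-cohort log odds ratios with inverse-variance weights.

    betas: per-cohort log odds ratios for the effect allele.
    ses:   matching per-cohort standard errors.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    # With exactly 3 cohorts, Q has 2 degrees of freedom, and the chi-square
    # survival function with 2 df has the closed form exp(-Q / 2).
    p_het = math.exp(-q / 2) if len(betas) == 3 else None
    return {"beta": beta, "se": se, "or": math.exp(beta),
            "p": p_value, "q": q, "p_het": p_het}


# Hypothetical per-cohort estimates (not taken from the study).
result = fixed_effects_meta(betas=[-1.2, -1.0, -1.3], ses=[0.40, 0.35, 0.38])
keep = result["p_het"] is not None and result["p_het"] > 0.05
```

The `keep` flag mirrors the restriction to variants that pass heterogeneity testing (Phet > .05); a general implementation would need a chi-square survival function for other numbers of cohorts.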

Annotation and Functional Assessment

Results from the meta-analysis were privately uploaded to LocusZoom [36] and FUMA [37]. Complete methods for each of the FUMA tests can be found online (https://fuma.ctglab.nl/). Briefly, annotation included MAGMA competitive gene set analysis [38], functional consequences of SNPs on genes as assigned by ANNOVAR [39], overlap with regulatory elements, regulatory potential from RegulomeDB [40], expression quantitative trait loci (eQTLs) from the Genotype-Tissue Expression (GTEx) project [41], and chromatin interactions. Default parameters were used for all gene mapping, with eQTLs assessed in all available tissue types, and all available built-in chromatin interaction data utilized. Where applicable, expression data were obtained from The Human Protein Atlas (https://www.proteinatlas.org) [42]. Global allele frequencies were visualized using the Geography of Genetic Variants Browser (https://popgen.uchicago.edu/ggv/) [43].

RESULTS

Clinical and Sociodemographic Characteristics of Each Cohort

Overall, the male-to-female ratio was approximately 1:1 in each cohort, with an average household size of 5–6 (Supplementary Table 1). Median monthly household income steadily increased over time reflecting overall changes in the Dhaka community, from 6000 BDT in DBC (2008–2010) to 10 000 BDT in PROVIDE (2011–2012) and 14 000 BDT in CBC (2014–2016). All 3 cohorts were well below the international poverty line (21 000 BDT for family of 5). In all cohorts the infants had an average of approximately 4 episodes of diarrhea and total duration of 16–18 days over the course of the first year. At birth, the length-for-age Z-scores (LAZ), representing long-term nutritional status, were similar in all cohorts. Weight-for-length Z-scores (WLZ) at birth, representing acute nutritional status, improved over time but still fell more than 1 standard deviation below the global median. At 1 year of age, the average LAZ in each cohort was lower than at birth, indicating chronic undernutrition over the course of the first year. The proportion of households using improved toilets, including sewer, septic tank, water- or slab-sealed pits, and ventilated-improved pits, sharply increased from 38% in DBC to 95% in PROVIDE and 94% in CBC. Very few households in any cohort had unprotected drinking water (<1%).

The number of episodes of diarrhea and total days of diarrhea are detailed in Supplementary Figure 2. We defined controls as infants with no reported diarrhea in the first year (0 episodes/0 days) and cases as those with 6 or more episodes and/or 25 or more days. The 2 groups with extreme diarrhea had considerable but not complete overlap of individuals. In DBC and PROVIDE, the children with no diarrhea were more likely to live in households with access to improved toilets (Table 1). In all 3 cohorts we noted an inverse relationship between median age at first diarrheal episode and both total number of episodes and total duration in days. In DBC and PROVIDE, children with the most diarrhea (6 or more episodes and/or 25 or more days) had their first episode at a median age <60 days, and the corresponding infants in CBC had theirs at a median age <70 days. This is in stark contrast to the children with only 1–2 episodes and/or 1–14 days, who were closer to 150 days old at the time of their first diarrheal episode (Supplementary Figure 3).

Table 1.

Demographics by Phenotype and Cohort

| Demographics | DBC: No Diarrhea (n = 26) | DBC: 6+ Diarrheal Episodes (n = 96) | DBC: 25+ Days of Diarrhea (n = 70) | PROVIDE: No Diarrhea (n = 32) | PROVIDE: 6+ Diarrheal Episodes (n = 117) | PROVIDE: 25+ Days of Diarrhea (n = 130) | CBC: No Diarrhea (n = 32) | CBC: 6+ Diarrheal Episodes (n = 80) | CBC: 25+ Days of Diarrhea (n = 93) |
|---|---|---|---|---|---|---|---|---|---|
| Female, % | 58 | 47 | 47 | 66 | 39 | 41 | 59 | 54 | 48 |
| Maternal age at enrollment, mean (SD) | 25.6 (5.93) | 25.5 (5.01) | 25.4 (5.30) | 25.2 (4.74) | 24.5 (4.37) | 24.4 (4.14) | 24.6 (5.06) | 25.0 (4.66) | 24.8 (4.81) |
| Household size, mean (SD) | 5.77 (2.76) | 5.43 (1.93) | 5.41 (2.34) | 5.28 (2.29) | 5.44 (2.35) | 5.35 (2.29) | 5.41 (2.62) | 5.54 (2.39) | 5.24 (2.20) |
| Household income, median (IQR) | 7.50 (6.05–8.87) | 5.50 (4.29–8.00) | 5.35 (4.22–7.15) | 10.0 (7.75–15.0) | 9.00 (6.00–14.0) | 9.00 (6.50–15.0) | 15.0 (9.00–21.2) | 12.0 (10.0–18.0) | 14.0 (10.0–18.0) |
| Improved toilet, % | 65 | 28 | 24 | 100 | 94 | 92 | 88 | 99 | 98 |
| Birth WLZ, mean (SD) | −1.51 (1.01) | −1.28 (1.27) | −1.35 (1.33) | −1.20 (0.78) | −1.39 (0.99) | −1.37 (0.96) | −0.91 (1.41) | −1.26 (1.17) | −1.39 (1.19) |
| Birth LAZ, mean (SD) | −0.52 (0.94) | −1.05 (1.03) | −1.10 (1.05) | −1.09 (0.78) | −0.85 (0.95) | −0.85 (0.94) | −1.05 (0.96) | −0.92 (1.04) | −0.92 (1.00) |
| 1 y WLZ, mean (SD) | −0.94 (0.79) | −1.04 (1.15) | −1.14 (1.11) | −0.82 (0.92) | −0.62 (1.11) | −0.59 (1.03) | −0.54 (0.99) | −0.32 (1.14) | −0.38 (1.18) |
| 1 y LAZ, mean (SD) | −1.38 (0.98) | −1.81 (0.98) | −1.89 (0.93) | −1.62 (1.24) | −1.58 (1.13) | −1.58 (1.11) | −1.48 (0.84) | −1.38 (0.98) | −1.23 (0.93) |
| Age at first diarrheal episode, median (IQR) | NA | 50.0 (23.0–78.0) | 39.0 (21.2–76.7) | NA | 58.0 (31.0–93.0) | 58.0 (29.2–96.7) | NA | 62.5 (33.0–86.5) | 66.0 (39.0–91.0) |

Household income is reported in thousands of Bangladeshi taka. Improved toilet includes sewer or septic tank, water-sealed or slab latrines, and ventilated-improved pit latrines. Phenotypes are no diarrhea, 6 or more diarrheal episodes in the first year of life, and 25 or more days of diarrhea in the first year of life. The 2 diarrhea groups are not mutually exclusive.

Abbreviations: CBC, Cryptosporidiosis Birth Cohort; DBC, Dhaka Birth Cohort; IQR, interquartile range; LAZ, length-for-age Z score; NA, not applicable; PROVIDE, Performance of Rotavirus and Oral Polio Vaccines in Developing Countries Study; WLZ, weight-for-length Z score.

We evaluated whether the children with no diarrhea were exposed to pathogens in the environment at a level comparable to those with diarrhea (Figure 1). In the available samples, the children with no diarrhea had similar rates of asymptomatic infection compared to their respective cohorts. This indicates that these children were exposed to the same pathogens as their peers but did not develop diarrhea. PROVIDE nondiarrheal samples were collected at 6 weeks and 10 weeks of age (Supplementary Figure 4).

Figure 1.

Proportion of children carrying pathogens in surveillance stool. A, Surveillance stool samples from the Dhaka Birth Cohort. Left bars represent proportion of children without diarrhea in the first year of life who tested positive for a given pathogen at least once in surveillance (total n = 22). Right bars represent proportion of the remaining cohort who tested positive for a given pathogen at least once in surveillance (total n = 314). B, Surveillance stool samples from the Cryptosporidiosis Birth Cohort. Left bars represent proportion of children without diarrhea in the first year of life who carried a given pathogen in surveillance stool (total n = 32). Right bars represent proportion of the remaining cohort who carried pathogen in surveillance stool (total n = 333).

Genome-Wide Association Analyses

We evaluated approximately 7.36 million variants in 383 infants. Comparing infants with 6 or more diarrheal episodes to those with no diarrhea, we identified 2 protective genome-wide significant (P < 5 × 10⁻⁷) regions on chromosomes 21 and 8 (Figure 2A). The top SNP was rs2827548 (effect allele frequency [EAF] = 0.26) located on chromosome 21 in an intron of the noncoding RNA AP000959 (C allele odds ratio [OR] = 0.31, 95% confidence interval [CI] = .21–.47, P = 4.01 × 10⁻⁸). The protective effect size was consistent across cohorts (Phet = .24; Figure 3A and Supplementary Table 2). Interestingly, in the meta-analysis for no diarrhea versus 25 or more days of diarrhea, we observed the same association with SNP rs2827548 (EAF = 0.25, C allele OR = 0.31, 95% CI = .21–.46, P = 1.59 × 10⁻⁸; Figure 2B and Figure 3B, and Supplementary Table 2).
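The relationship among the reported OR, its confidence interval, and the P value can be made concrete with a small back-calculation on the log-odds scale. This is an illustrative sketch with hypothetical function names; because the published OR and CI are rounded to 2 decimals, the resulting Wald P value only roughly approximates the reported one and should not be expected to reproduce it.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Standard error of the log OR implied by a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


def approx_wald_p(odds_ratio, lower, upper):
    """Two-sided Wald P value implied by an OR and its 95% CI."""
    z = math.log(odds_ratio) / se_from_ci(lower, upper)
    return math.erfc(abs(z) / math.sqrt(2))


# rs2827548, 0 versus 6 or more episodes: OR = 0.31, 95% CI = .21-.47
p = approx_wald_p(0.31, 0.21, 0.47)

# An OR of 0.31 corresponds to a 69% reduction in the odds of being a case
# per copy of the C allele under the additive model.
odds_reduction = 1 - 0.31
```

Note that the reduction is in odds, not in probability; the two are close only when the outcome is rare, which is not the case in this extreme-phenotype design.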

Figure 2.

Manhattan plots of genome-wide association meta-analyses. A, Comparison of having no diarrheal episodes versus 6 or more diarrheal episodes in the first year of life. Three peaks are identified, 1 suggestive and 2 significant. B, Comparison of no days of diarrhea versus 25 or more days of diarrhea. Four peaks are identified; those on chromosomes 17 and 21 are genome-wide significant. Chromosome 15 encompasses 2 suggestive peaks. Each point is a single variant. The x-axis is chromosomal position and the y-axis is the −log10(P). The dashed line indicates the genome-wide significance threshold, P = 5 × 10⁻⁷.

Figure 3.

Forest plots of ORs for top SNP at each locus. A, The 3 most significant SNPs, rs2827548 (chromosome 21), rs1915541 (chromosome 8), and rs6841521 (chromosome 4), from the GWAS of no diarrheal episodes versus 6 or more diarrheal episodes. ORs are indicated by points, with bars showing the 95% confidence intervals. Dashed lines represent an OR of 1.0 or the null hypothesis of no difference between groups. B, The 4 most significant SNPs, rs2827548 (chromosome 21), rs5026214 (chromosome 17), rs11635981 (chromosome 15), and rs1873995 (chromosome 15), from the GWAS of no days of diarrhea versus 25 or more days of diarrhea. ORs are indicated by points, with bars showing the 95% confidence intervals. Dashed lines represent an OR of 1.0 or the null hypothesis of no difference between groups. Abbreviations: CBC, the Cryptosporidiosis Birth Cohort; DBC, the Dhaka Birth Cohort; GWAS, genome-wide association study; OR, odds ratio; PROVIDE, Performance of Rotavirus and Oral Polio Vaccines in Developing Countries; SNP, single-nucleotide polymorphism.

The other protective locus for 0 versus 6 or more episodes of diarrhea was rs1915541 on chromosome 8 (EAF = 0.25), in the first intron of the sterile alpha motif domain containing 12 (SAMD12) gene (T allele OR = 0.35, 95% CI = .23–.53, P = 4.74 × 10⁻⁷). In the analysis of 0 versus 25 or more days of diarrhea, another protective locus was identified on chromosome 17 for rs5026214 (EAF = 0.55), near WSCD1 (C allele OR = 0.35, 95% CI = .24–.51, P = 1.09 × 10⁻⁷). To account for any underlying socioeconomic changes in the cohorts over time, we performed a sensitivity analysis on the top loci that included monthly household income and access to improved toilet facility. All top SNPs remained significant, with consistent effect sizes (Supplementary Table 3).

The allele frequencies for the effect allele were similar in both diarrhea case categories across the 3 studies (Supplementary Table 4). Interestingly, for rs2827548, the children with no diarrhea had an effect allele (C allele) frequency of 0.42, higher than that of the 1000 Genomes Bengali in Bangladesh population (0.33) and higher than that of any other 1000 Genomes population in Asia or Europe (Figure 4). Consistent with the protective effect, the effect allele frequencies were higher in the no diarrhea group as compared to the case categories. We evaluated the genetic principal components by case definition, and there was no substructure detected to suggest that the children with no diarrhea are genetically distinct from the other children at a genome-wide level (Supplementary Figure 5).

Figure 4.

Frequency of chromosome 21 effect allele compared to global populations. Each pie chart represents the frequencies of the C (dark) and G (light) alleles in the 1000 Genomes Project population closest to its location on the map. Some circles are slightly shifted to avoid overlap. BEB is the Bengali in Bangladesh population, and to the right of that are shown the allele frequencies in our meta-analyses.

Assessment of Interaction

We were interested in whether there were any interactions between our top loci and sociodemographic factors, including sex, income, and sanitation. When we stratified the infants in DBC by access to improved toilets, we observed that the top SNP on chromosome 21, rs2827548, was protective against 6 or more episodes of diarrhea only for those with access to improved toilets; those with unimproved toilets showed no difference in case status by genotype (Supplementary Figure 6A). We were unable to include a similar stratification for the other cohorts because most families had improved toilets. Including an interaction term in the generalized linear model revealed a significant interaction between access to improved toilet and the top SNP on chromosome 21, rs2827548, for the outcome of 0 versus 6 or more episodes of diarrhea in DBC (Supplementary Table 5). The same interaction was significant in DBC for the analysis of 0 versus 25 or more days of diarrhea (Supplementary Table 6). In PROVIDE, we noted a significant interaction between sex and the top SNP on chromosome 21, rs2827548, for both outcomes (Supplementary Tables 7 and 8). In PROVIDE, we also observed a significant interaction between income and the top SNP on chromosome 17, rs5026214, for the analysis of 0 versus 25 or more days of diarrhea. No significant interactions were observed in CBC (Supplementary Tables 9 and 10).

Annotation and Functional Assessment

One way to assess potential function of noncoding variation is to look for chromatin interactions, or areas of the genome that are in close contact and become ligated together during the experiment. FUMA specifically annotates interactions between a locus of interest (the significant region from the GWAS) and the promoter region of a gene, which indicates potential regulatory function. For the chromosome 21 locus (AP000959), FUMA identified significant (false discovery rate [FDR] ≤ 1 × 10⁻⁶) chromatin interactions, which mapped to 40 different genes, including NCAM2 (Supplementary Figure 7). The chromosome 8 locus had significant (FDR ≤ 1 × 10⁻⁶) chromatin interactions mapped to 26 different genes (Supplementary Figure 8). These included SAMD12, COLEC10, NOV, and MAL2. Additionally, the top SNP in this locus, rs1915541, was identified as an eQTL for SAMD12 in 10 different tissues, including esophagus mucosa and whole blood. The protective allele (T) is associated with lower expression of SAMD12 in esophagus mucosa (normalized effect size [NES] = −0.40, P = 7.1 × 10⁻¹⁵) and higher expression of SAMD12 in whole blood (NES = 0.38, P = 1.1 × 10⁻¹⁴). Finally, the chromosome 17 locus had significant (FDR ≤ 1 × 10⁻⁶) chromatin interactions mapped to 25 different genes (Supplementary Figure 9). These included DHX33, NLRP1, DERL2, WSCD1, and OR1G1. Zoomed-in plots of each top locus and quantile-quantile plots for both analyses can be found in the supplementary materials (Supplementary Figures 10–14).

DISCUSSION

In this study of Bangladeshi infants, we identified 3 loci that were protective against extreme frequency or duration of diarrhea in the first year of life. These infants lived in a poor urban slum with limited sanitation; thus, they were exposed to a variety of pathogens throughout the first year of life. We identified a small group in each cohort who, despite these exposures (documented by surveillance sampling), did not have a diarrheal episode in the first year of life. Furthermore, we noted an interaction between our top SNP, rs2827548, and access to improved toilets, which demonstrated that protective genetic variants may not be sufficient to guard against extreme illness when there is insufficient sanitation. The comprehensive characterization of these children with household visits afforded us the opportunity to identify these protective loci that may serve as therapeutic targets for diarrheal disease across populations.

The first locus on chromosome 21 was shared between both frequency and duration of diarrhea. The intronic variant in the lncRNA AP000959 had several significant chromatin interactions, including with the neural cell adhesion molecule 2 gene (NCAM2). NCAM2 is involved in development of the enteric nervous system, and shows higher expression in wild-type mouse gut versus the aganglionic model used for Hirschsprung disease [44–46]. The enteric nervous system has many functions, including responding to sensory stimuli from the wall of the bowel, controlling intestinal motility, regulating intestinal secretion, and controlling intestinal blood flow. Furthermore, NCAM2 is downregulated in ulcerative colitis [47], although it is unclear whether this is a cause or consequence of disease.

Another significant locus, SAMD12, was identified for frequency of diarrhea and reached suggestive associations for duration of diarrhea. SAMD12 has high protein abundance in the gut [42] and the intronic variant we identified is an eQTL for SAMD12 in esophagus mucosa and whole blood. Additionally, SAMD12 is differentially expressed in a FOXO3 knockout model of inflammatory colon cancer [48] and intronic repeat expansions in this gene are associated with benign adult familial myoclonic epilepsy (Mendelian Inheritance in Man 601068).

An additional locus on chromosome 17 was significant in the analysis of days and suggestive in the analysis of episodes. The lead SNP, rs5026214, is near the WSCD1 gene, which is downregulated in pediatric inflammatory bowel disease [49]. According to the Human Protein Atlas, this protein is most abundant in the gastrointestinal tract [42]. Another SNP, rs146313367, upstream of WSCD1 (but not in allelic association with our lead SNP) was suggestive in a GWAS of irritable bowel syndrome in the UK Biobank [50]. This locus also had chromatin interactions with several genes that could be related to gut inflammation, including DHX33, OR1G1, and NLRP1.

It is remarkable that these novel genetic associations are distinct from our previous studies [14–17] of genetic susceptibility to specific pathogens, as well as those previously described for diarrhea in European-ancestry infants [18]. This suggests that these loci are not pathogen specific but rather reflect protection from diarrhea generally. This is also reflected in the similar GWAS results between extremes of the frequency of incident diarrheal episodes and duration, because the same children were classified as having no diarrheal episodes and thus no days of diarrhea. However, there were loci that differed between the 2 analyses, indicating that protection from incident diarrhea may be different from protection for total duration of diarrhea. If we can identify infants at risk for repeat diarrheal episodes or prolonged bouts of diarrhea, we may be able to target treatment or therapeutics to limit both the acute and chronic effects of infant diarrhea.

There were limitations to this study. First, to get a clear distinction between those with diarrhea and those who did not have any diarrhea in this high-exposure environment, we opted to analyze children at the extreme ends of the distributions. However, this limited our sample size and power. Second, children get diarrhea for many different reasons, including pathogens, stress, dietary changes, and bowel disorders. We could not evaluate the cause of the diarrhea, but by using the extremes of the distribution we highlight those children most prone to either frequent or long episodes.

In our genome-wide analyses of extreme frequency and prolonged duration of diarrhea in the first year of life in Bangladeshi infants, we identified protective genes with plausible functional links to the enteric nervous system, intestinal inflammation, and response to pathogens. Understanding the genetic architecture of these variants may provide targets for treatment and prevention of diarrhea.

Supplementary Data

Supplementary materials are available at The Journal of Infectious Diseases online. Consisting of data provided by the authors to benefit the reader, the posted materials are not copyedited and are the sole responsibility of the authors, so questions or comments should be addressed to the corresponding author.

Contributor Information

Rebecca M Munday, Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA.

Rashidul Haque, International Centre for Diarrhoeal Disease Research, Bangladesh, Dhaka, Bangladesh.

Genevieve L Wojcik, Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.

Poonum Korpe, Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.

Uma Nayak, Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, Virginia, USA; Department of Public Health Sciences, University of Virginia School of Medicine, Charlottesville, Virginia, USA.

Beth D Kirkpatrick, Vaccine Testing Center, Larner College of Medicine, University of Vermont, Burlington, Vermont, USA.

William A Petri, Jr, Department of Medicine, Infectious Diseases, and International Health, University of Virginia School of Medicine, Charlottesville, Virginia, USA.

Priya Duggal, Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.

Notes

Acknowledgments. We thank the families of Mirpur who participated in these cohort studies, as well as the laboratory staff and field research assistants.

Disclaimer. The funders had no role in study design, analysis, or publication.

Financial support. This work was supported by the Burroughs Wellcome Fund (Maryland Genetics Epidemiology and Medicine Training Program); the National Institute of Allergy and Infectious Diseases (grant numbers AI146123, AI043596, and AI026649); and the Bill and Melinda Gates Foundation (grants to W. A. P. Jr). icddr,b is grateful to the governments of Bangladesh, the UK, Sweden, and Canada for core unrestricted support.

References

  • 1. World Health Organization. Diarrhoeal disease, 2017. https://www.who.int/news-room/fact-sheets/detail/diarrhoeal-disease. Accessed 17 June 2022.
  • 2. World Health Organization, United Nations Children's Fund (UNICEF). Ending preventable child deaths from pneumonia and diarrhoea by 2025: the integrated global action plan for pneumonia and diarrhoea (GAPPD). Geneva: World Health Organization, 2013.
  • 3. Platts-Mills JA, Babji S, Bodhidatta L, et al. Pathogen-specific burdens of community diarrhoea in developing countries: a multisite birth cohort study (MAL-ED). Lancet Glob Health 2015; 3:e564–75.
  • 4. Anders KL, Thompson CN, Thuy NTV, et al. The epidemiology and aetiology of diarrhoeal disease in infancy in southern Vietnam: a birth cohort study. Int J Infect Dis 2015; 35:3–10.
  • 5. Sarkar R, Gladstone BP, Warier JP, et al. Rotavirus and other diarrheal disease in a birth cohort from Southern Indian community. Indian Pediatr 2016; 53:583–8.
  • 6. Eijsbouts C, Zheng T, Kennedy NA, et al. Genome-wide analysis of 53,400 people with irritable bowel syndrome highlights shared genetic pathways with mood and anxiety disorders. Nat Genet 2021; 53:1543–52.
  • 7. Serena G, Lima R, Fasano A. Genetic and environmental contributors for celiac disease. Curr Allergy Asthma Rep 2019; 19:40.
  • 8. Garza-Hernandez D, Sepulveda-Villegas M, Garcia-Pelaez J, et al. A systematic review and functional bioinformatics analysis of genes associated with Crohn's disease identify more than 120 related genes. BMC Genomics 2022; 23:302.
  • 9. Fachal L; International IBD Genetics Consortium. OP11 expanded genome-wide association study of inflammatory bowel disease identifies 174 novel loci and directly implicates new genes in disease susceptibility. J Crohns Colitis 2022; 16:i011–3.
  • 10. Müller T, Wijmenga C, Phillips AD, et al. Congenital sodium diarrhea is an autosomal recessive disorder of sodium/proton exchange but unrelated to known candidate genes. Gastroenterology 2000; 119:1506–13.
  • 11. Hartley JL, Zachos NC, Dawood B, et al. Mutations in TTC37 cause trichohepatoenteric syndrome (phenotypic diarrhea of infancy). Gastroenterology 2010; 138:2388–98.e2.
  • 12. Marionneau S, Ruvoën N, Le Moullac-Vaidye B, et al. Norwalk virus binds to histo-blood group antigens present on gastroduodenal epithelial cells of secretor individuals. Gastroenterology 2002; 122:1967–77.
  • 13. Tan M, Jiang X. Norovirus and its histo-blood group antigen receptors: an answer to a historical puzzle. Trends Microbiol 2005; 13:285–93.
  • 14. Wojcik GL, Marie C, Abhyankar MM, et al. Genome-wide association study reveals genetic link between diarrhea-associated Entamoeba histolytica infection and inflammatory bowel disease. mBio 2018; 9:e01668-18.
  • 15. Wojcik GL, Korpe P, Marie C, et al. Genome-wide association study of cryptosporidiosis in infants implicates PRKCA. mBio 2020; 11:e03343-19.
  • 16. Duchen D, Haque R, Chen L, et al. Host genome wide association study of infant susceptibility to Shigella-associated diarrhea. Infect Immun 2021; 89:e00012-21.
  • 17. Munday RM, Haque R, Jan N-J, et al. Genome-wide association study of Campylobacter-positive diarrhea identifies genes involved in toxin processing and inflammatory response. mBio 2022; 13:e0055622.
  • 18. Bustamante M, Standl M, Bassat Q, et al. A genome-wide association meta-analysis of diarrhoeal disease in young children identifies FUT2 locus and provides plausible biological pathways. Hum Mol Genet 2016; 25:4127–42.
  • 19. Mondal D, Minak J, Alam M, et al. Contribution of enteric infection, altered intestinal barrier function, and maternal malnutrition to infant malnutrition in Bangladesh. Clin Infect Dis 2012; 54:185–92.
  • 20. Kirkpatrick BD, Colgate ER, Mychaleckyj JC, et al. The “performance of rotavirus and oral polio vaccines in developing countries” (PROVIDE) study: description of methods of an interventional study designed to explore complex biologic problems. Am J Trop Med Hyg 2015; 92:744–51.
  • 21. Steiner KL, Ahmed S, Gilchrist CA, et al. Species of Cryptosporidia causing subclinical infection associated with growth faltering in rural and urban Bangladesh: a birth cohort study. Clin Infect Dis 2018; 67:1347–55.
  • 22. World Health Organization. Proportion of population below the international poverty line (%). https://www.who.int/data/gho/indicator-metadata-registry/imr-details/4744. Accessed 7 July 2022.
  • 23. World Health Organization. Improved sanitation facilities and drinking-water sources. https://www.who.int/data/nutrition/nlis/info/improved-sanitation-facilities-and-drinking-water-sources. Accessed 7 July 2022.
  • 24. Delaneau O, Zagury J-F, Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nat Methods 2013; 10:5–6.
  • 25. Delaneau O, Marchini J, McVean G. Integrating sequence and array data to create an improved 1000 genomes project haplotype reference panel. Nat Commun 2014; 5:3934.
  • 26. Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 2007; 39:906–13.
  • 27. Howie BN, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLOS Genet 2009; 5:e1000529.
  • 28. Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet 2010; 11:499–511.
  • 29. Howie B, Marchini J, Stephens M. Genotype imputation with thousands of genomes. G3 (Bethesda) 2011; 1:457–70.
  • 30. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 2012; 44:955–9.
  • 31. Auton A, Abecasis GR, Altshuler DM, et al. A global reference for human genetic variation. Nature 2015; 526:68–74.
  • 32. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 2015; 4:7.
  • 33. Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W-M. Robust relationship inference in genome-wide association studies. Bioinformatics 2010; 26:2867–73.
  • 34. Burton PR, Clayton DG, Cardon LR. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007; 447:661–78.
  • 35. Liu JZ, Tozzi F, Waterworth DM, et al. Meta-analysis and imputation refines the association of 15q25 with smoking quantity. Nat Genet 2010; 42:436–40.
  • 36. Pruim RJ, Welch RP, Sanna S, et al. Locuszoom: regional visualization of genome-wide association scan results. Bioinformatics 2010; 26:2336–7.
  • 37. Watanabe K, Taskesen E, van Bochoven A, Posthuma D. Functional mapping and annotation of genetic associations with FUMA. Nat Commun 2017; 8:1826.
  • 38. de Leeuw CA, Mooij JM, Heskes T, Posthuma D. MAGMA: generalized gene-set analysis of GWAS data. PLOS Comput Biol 2015; 11:e1004219.
  • 39. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010; 38:e164.
  • 40. Boyle AP, Hong EL, Hariharan M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res 2012; 22:1790–7.
  • 41. GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 2017; 550:204–13.
  • 42. Uhlén M, Fagerberg L, Hallström BM, et al. Proteomics. Tissue-based map of the human proteome. Science 2015; 347:1260419.
  • 43. Marcus JH, Novembre J. Visualizing the geography of genetic variants. Bioinformatics 2017; 33:594–5.
  • 44. Vohra BPS, Tsuji K, Nagashimada M, et al. Differential gene expression and functional analysis implicates novel mechanisms in enteric nervous system precursor migration and neuritogenesis. Dev Biol 2006; 298:259–71.
  • 45. Schriemer D, Sribudiani Y, IJpma A, et al. Regulators of gene expression in enteric neural crest cells are putative Hirschsprung disease genes. Dev Biol 2016; 416:255–65.
  • 46. Avetisyan M. Development of enteric neurons and muscularis macrophages [Dissertation]. St Louis: Washington University in St Louis, 2019.
  • 47. Schniers A, Anderssen E, Fenton CG, et al. The proteome of ulcerative colitis in colon biopsies from adults—optimized sample preparation and comparison with healthy controls. Proteomics Clin Appl 2017; 11:1700053.
  • 48. Penrose HM, Cable C, Heller S, et al. Loss of forkhead box O3 facilitates inflammatory colon cancer: transcriptome profiling of the immune landscape and novel targets. Cell Mol Gastroenterol Hepatol 2018; 7:391–408.
  • 49. Fang K, Grisham MB, Kevil CG. Application of comparative transcriptional genomics to identify molecular targets for pediatric IBD. Front Immunol 2015; 6:165.
  • 50. Bonfiglio F, Zheng T, Garcia-Etxebarria K, et al. Female-specific association between variants on chromosome 9 and self-reported diagnosis of irritable bowel syndrome. Gastroenterology 2018; 155:168–79.

Articles from The Journal of Infectious Diseases are provided here courtesy of Oxford University Press
