Skip to main content
iScience logoLink to iScience
. 2022 Apr 1;25(5):104189. doi: 10.1016/j.isci.2022.104189

Balancing selection at the human salivary agglutinin gene (DMBT1) driven by host-microbe interactions

Adel F Alharbi 1,2, Nongfei Sheng 3, Katie Nicol 1, Nicklas Strömberg 3, Edward J Hollox 1,4,
PMCID: PMC9038570  PMID: 35494225

Summary

Discovering loci under balancing selection in humans can identify loci with alleles that affect response to the environment and disease. Genome variation data have identified the 5′ region of the DMBT1 gene as undergoing balancing selection in humans. DMBT1 encodes the pattern-recognition glycoprotein DMBT1, also known as SALSA, gp340, or salivary agglutinin. DMBT1 binds to a variety of pathogens through a tandemly arranged scavenger receptor cysteine-rich (SRCR) domain, with the number of domains polymorphic in humans. We show that the signal of balancing selection is driven by one haplotype usually carrying a shorter SRCR repeat and another usually carrying a longer SRCR repeat. DMBT1 encoded by a shorter SRCR repeat allele does not bind a cariogenic and invasive Streptococcus mutans strain, in contrast to the long SRCR allele that shows binding. Our results suggest that balancing selection at DMBT1 is due to host-microbe interactions of encoded SRCR tandem repeat alleles.

Subject areas: biological sciences, genetics, evolutionary mechanisms

Graphical abstract

graphic file with name fx1.jpg

Highlights

  • Clear evidence from many analyses show balancing selection at DMBT1

  • Scavenger-receptor cysteine-rich domain array associated with balancing selection

  • Genetic variation, not alternative splicing, responsible for protein isoforms

  • Long, but not short, DMBT1 isoforms bind a cariogenic strain of Streptococcus mutans


Biological sciences; Genetics; Evolutionary mechanisms

Introduction

Patterns of human genetic variation across populations are consistent with most genetic variation being neutral, with only a small subset of variation affected by natural selection (Fu and Akey, 2013). The role of balancing selection in shaping the pattern of human genetic diversity by maintaining alleles in a population at higher frequencies and/or for long periods of time remains elusive (Key et al., 2014). From the earliest study on malaria and sickle cell disease to modern genome scans, it is becoming clear that pathogen pressure has been important in maintaining alleles at intermediate frequencies at particular loci. Most evidence has focused on well-established examples, such as the sickle-cell trait protecting against severe malaria in individuals heterozygous for the beta-globin (HBB) sickle-cell allele, glucose-6-phosphate dehydrogenase (G6PD), and the major histocompatibility complex (MHC) region (Key et al., 2014; Solberg et al., 2008). In these examples, overdominance (heterozygote advantage) maintains two or more distinct alleles in a population. Other balancing selection mechanisms, such as frequency-dependent selection or fluctuating selection, remain underexplored.

Analysis of polymorphisms maintained in both human and chimpanzee lineages suggest, in addition to the MHC region, that there are 125 further regions of balancing selection enriched for genes likely to be involved in host-pathogen interactions (Leffler et al., 2013). Scans of traditional within-species summary statistics that can detect balancing selection, such as Tajima’s D or HKA tests (Bubb et al., 2006; Andrés et al., 2009), have identified some loci undergoing balancing selection, but miss other well-known examples, suggesting a lack of power of the statistics used. In addition, cryptic duplications or copy-number variation (CNV) can be misinterpreted as balancing selection if the genome is not fully or correctly annotated. More accurate annotation of CNV in genomes combined with whole genome sequencing data and statistics designed to sensitively detect haplotypes exhibiting characteristics of balancing selection divergence and frequency have enabled identification of a larger number of balancing selection sites (Bitarello et al., 2018; de Filippo et al., 2016; DeGiorgio et al., 2014; Siewert and Voight, 2017).

In almost all cases where genome-wide scans for balancing selection have been done, moving beyond a statistical signal and determining the molecular basis for the balancing selection has been lacking. This is important not only to provide a clear explanation for patterns of selection and our understanding of the past pressures on humans but also to understand how the variation interacts with pathogens today, with consequences for disease progression, resistance, and treatment (Key et al., 2014).

The mucin hybrid glycoprotein DMBT1 (also known as SALSA, gp340, muclin, or salivary agglutinin), encoded by the DMBT1 gene, is a pattern recognition and scavenger receptor protein (Canton et al., 2013). DMBT1 is composed of tandemly arranged scavenger receptor cysteine-rich (SRCR) domains and interspersed glycosylated Ser/Thr-rich motif (SID) domains for microbe and host-ligand binding and CUB (complement C1r/C1s, Uegf, Bmp1) and ZP (zona pellucida-like) domains for cell polarization and polymerization (Mollenhauer et al., 1997; Prakobphol et al., 2000; Reichhardt et al., 2017).

DMBT1 is expressed by macrophages, in tissues encountering microbes, such as the lung and colon, and tissues producing fluids that encounter microbes, such as saliva, amniotic fluid, and meconium. DMBT1 binds many different viruses and bacteria, including Streptococcus mutans, and host ligands, such as secretory IgA, lactoferrin, and mucin-5B, through the SRCR and SID domains (Thornton et al., 2001; Ligtenberg et al., 2004; Loimaranta et al., 2005). Saliva adhesion and aggregation of streptococcal ligands by DMBT1 and its activation of complement by the lectin pathway have different outcomes in solution versus on surfaces (Loimaranta et al., 2005; Leito et al., 2011; Reichhardt et al., 2012; Reichhardt and Meri, 2016; Gunput et al., 2016).

The DMBT1 gene shows extensive multiallelic copy-number variation (CNV) across all populations, with the tandemly repeated CNV leading to different alleles with between 7 and 21 SRCR domains (Sasaki et al., 2002; Polley et al., 2015). Salivary DMBT1 protein size isoforms I–IV, corresponding in size to shorter and longer isoforms, exist (Eriksson et al., 2007; Esberg et al., 2012), and variation in the number of SRCR domains influences binding of DMBT1 to bacteria, with a short isoform (8 SRCR domains) binding bacteria 30%–40% less effectively than a longer isoform (13 SRCR domains) (Esberg et al., 2012; Bikker et al., 2017). Multiple transcripts of DMBT1 have been detected and are often interpreted as evidence of alternative splicing, although the range and nature of the alternative transcripts are consistent with CNV of the underlying DMBT1 SRCR exons. The O-glycosylation of the Thr-rich SID domains in DMBT1 by short-chain GalGalNAc and long-chain type 1 and type 2 Lewis and ABO chains also vary between individuals (Schulz et al., 2002; Eriksson et al., 2007). The Lewis and ABO glycosylation of DMBT1 depends on Secretor (Se) status, encoded by a polymorphism in the fucosyltransferase 2 (FUT2) gene (Kelly et al., 1995). However, the relative contributions of variation in SRCR copy number at the DNA level, alternative splicing at the RNA level, and alternative glycosylation at the posttranslational processing level, to the observed variation in DMBT1 protein size between individuals remains to be determined.

The consequences of genetic, translational, or posttranslational variation on DMBT1 function and role in disease development remains largely unknown, and DMBT1 complexity, redundancy of pattern recognition pathways, and disease heterogeneity may explain difficulties in linking DMBT1 to disease development. Analysis of variation at the population level, combined with the known function of DMBT1 and the binding affinities of the SRCR domains, suggest an evolutionary explanation based on an interaction between the human dietary environment and genetic variation affecting affinity to the causative agent of dental caries, the bacteria Streptococcus mutans (Polley et al., 2015). This was subsequently supported by direct evidence that different length alleles have differing binding affinities to S. mutans (Bikker et al., 2017).

Several genome-wide selection scans have identified DMBT1 as a gene showing one of the strongest signals for balancing selection in Europeans and Africans (Figure 1). Using T1 and T2 tests, which measure the ratio of within-species to between-species variation and any excess of intermediate frequency variants respectively, DMBT1 was found in the top 20 of genomic loci in both European-Americans from Utah (CEU) and Yoruba from Ibadan in Nigeria (YRI), but not Chinese from Beijing (CHB) (DeGiorgio et al., 2014). The noncentral deviation statistics NCD1 and NCD2, which measure the departure of the allele frequency -spectrum from neutrality under balancing selection, showed strong evidence of balancing selection at DMBT1 in European and African populations from the 1000 Genomes project (Bitarello et al., 2018). The beta statistic, which identifies clusters of alleles at similar frequency, identified SNPs in the 5′ region of DMBT1 to show significant balancing selection in all 1000 Genomes Project populations (Siewert and Voight, 2017). Balancing selection has been acting at this locus since before human-chimpanzee divergence, as two trans-species SNPs (rs74577795, rs79314843), a hallmark of long-term balancing selection, have been identified either side of the DMBT1 gene (Leffler et al., 2013) (Figure 1).

Figure 1.

Figure 1

Evidence for balancing selection at the human DMBT1 gene

The DMBT1 gene is shown in blue, with three tracks above representing Tajima’s D measured from sequenced genomes from three populations (European-Americans from Utah (CEU) in green, Chinese from Beijing (CHB) in blue, Yoruba from Ibadan (YRI) in orange), image taken from the 1000 Genomes selection browser (https://hsb.upf.edu/). Below the DMBT1 gene the SNP rs11523871 and two CNVs thar affect the copy number of the SRCR repeats are shown. Above the tracks showing Tajima’s D are different sources of evidence of balancing selection, namely the beta statistic (Siewert and Voight, 2017), the NCD statistic (purple (Bitarello et al., 2018), trans-specific variants (Leffler et al., 2013), and composite likelihood ratio tests (DeGiorgio et al., 2014).

Genome-wide selection scans for balancing selection examine patterns of SNP diversity and so, taken together, identify regions flanking the CNV, in particular a region spanning exon 1 and the 5′ region of the gene, which shows two highly diverged haplotypes maintained at high frequencies. Whether these signals of balancing selection are independent, or are driven by linkage disequilibrium with the CNV within the DMBT1 gene, was unclear and was the motivation for this study.

Our first aim was to dissect the relationship between the evidence of balancing selection shown by the pattern of variation in regions flanking the DMBT1 gene and the alleles of the CNV affecting the number of SRCR domains within the DMBT1 gene. Our second aim was to clarify the relationship between the CNV, the DMBT1 transcript, and the different glycoprotein isoforms shown to differ functionally. Our third aim was to explore the effect of proteins from different DMBT1 allele lengths on pattern recognition of a cariogenic and invasive S. mutans phenotype.

Results

Two divergent haplotypes explain the footprint of balancing selection at DMBT1

The population genetics statistic Tajima’s D can be used to identify regions of balancing selection (Tajima, 1989). Initial inspection of Tajima’s D values across the DMBT1 region showed a 16 kb region (GRCh38 chr10:122555466-122571966, Figure 1) of high value (2.67 in CEU) at the 5′ end of DMBT1 overlapping with signals of balancing selection identified using genome-wide scans of other statistics (Figure 1) (Pybus et al., 2014). We calculated Tajima’s D in 10 kb windows across the genome for all populations in the Human Genome Diversity Project, and the value for the 10 kb window within this DMBT1 region (GRCh38 chr10:122560000-122570000) was z-normalized against the distribution of all Tajima's D values calculated across the genome for that population. As expected, a high z-normalized Tajima’s D value was shown across almost all European populations and African populations, with large differences in values sometimes observed between closely located populations, particularly in Asia (Table S1).

We examined the haplotype structure of the 16 kb region in the French population and the Yoruba population from the Human Genome Diversity Panel, both showing a high positive Tajima’s D. Two highly diverged haplotypes were responsible for the signal of balancing selection in these and other populations (Figure S1). Therefore, the allelic status at many SNPs in this region can distinguish the two major haplotype clades (Figure S1), and we chose one SNP (rs11523871) whose alleles act as a proxy for the two haplotypes under balancing selection.

Copy-number variation of DMBT1 SRCR domains is associated with the two haplotype clades

Our hypothesis was that the high frequency of the divergent haplotypes was driven by linkage of each haplotype to distinct spectra of copy-number variants in the DMBT1 gene. To explore whether the copy number of the tandemly repeated SRCR domains is responsible for the signal of balancing selection, we phased individual SRCR domain copy numbers to haplotypes for 132 unrelated samples in the CEPH cohort by observing segregation in pedigrees. rs11523871-C haplotypes carry fewer SRCR repeats compared with rs11523871-A haplotypes (p = 3.282 × 10−7, n = 263, Wilcoxon rank sum test with continuity correction), with the relationship mostly explained by a higher frequency of the short 8 SRCR domain allele on the rs11523871-C haplotype (Figure 2A). We also phased individual SRCR domain copy numbers to haplotypes in a cohort of 58 Yoruba individuals from Ibadan Nigeria (YRI), using parent-offspring trio information and the software CNVice (Zuccherato et al., 2017). Again, we found that rs11523871-C haplotypes carry fewer SRCR domain repeats compared with rs11523871-A haplotypes (p = 5.5 × 10−4, n = 116, Wilcoxon rank sum test with continuity correction), but this association is due primarily to a high frequency of the 11 allele on the rs11523871-C haplotypes (Figure 2B). In summary, association between SNP alleles and SRCR domain copy numbers is population specific, with the rs11523871-C haplotype carrying, on average, a shorter SRCR domain than the rs11523871-A haplotype.

Figure 2.

Figure 2

Association of rs11523871 and DMBT1 SRCR repeat copy number

(A and B) For two populations, CEPH (A) and YRI (B), the distributions of DMBT1 SRCR repeat domain copy numbers associated with the rs11523871-A allele (blue, above the x axis) and rs11523871-C (red, below the x axis) are shown. The y axis shows the number of observations in the two samples (CEPH n = 263, YRI n = 116).

Alleles in the two haplotype clades are unlikely to affect protein function and do not affect gene expression in a manner consistent with balancing selection

We also explored other explanations for the high frequency of divergent haplotypes. Firstly, we explored whether any of the variants in the divergent haplotypes were likely to have a direct effect on the DMBT1 protein. There are two common SNPs that alter an amino acid, rs11523871 (Thr42Pro) and rs3013236 (Leu54Ser), both outside the functional SRCR domains. Analysis using Polyphen and SIFT through Ensembl’s Variant Effect Predictor (McLaren et al., 2016) reported these variants as benign and tolerated, respectively, arguing against these SNPs causing a functional difference, and therefore, they are unlikely to be responsible for the signal of balancing selection.

If the genotype of one of the variants within the divergent haplotype were to affect expression levels of the gene, it is possible that this could explain the pattern of balancing selection observed. Analysis of GTex data confirmed the established tissue expression pattern, with high expression levels in lung, small intestine, colon, and minor salivary gland (Figure 3A). Of these tissues, only in the colon and skin do DMBT1 expression levels show a statistically significant relationship with genotype at rs11523871, with p = 1.7 × 10−19, n = 368 for colon, p = 2.3 × 10−22, n = 605 for sun-exposed skin, and p = 2.0 × 10−13 n = 517 for non-sun-exposed skin. In each case, a linear increase in expression with each C allele is observed, with the CC genotype showing on average the highest expression (Figure 3B), although given the low level of DMBT1 expression, the biological relevance of this effect in skin is likely to be small. Notably, there is no evidence of a relationship between rs11523871 and expression levels in the minor salivary gland, which expresses DMBT1 at high levels for secretion into the buccal cavity and therefore subsequent interaction with S. mutans. Given the lack of evidence in other tissues, and the uncertainty surrounding measuring the abundance of transcripts containing tandem repeats using short-read RNAseq, we tested the relationship between rs11523871 genotype and expression level in the duodenum from 41 individuals using digital droplet PCR (Figure 3C). This shows variation in expression levels (ANOVA, p = 0.027) and weak support for a linear relationship with the C allele (p = 0.047, UBC as reference, p = 0.09, RPLP0 as reference, two-tailed F-test). Taken together, there is no evidence of a heterozygous effect on expression level, but a relationship between allele C, acting as a proxy for one haplotype at the balancing selection locus at rs11523871 and high expression.

Figure 3.

Figure 3

DMBT1 gene expression and rs11523871 genotype

(A) Tissue expression of DMBT1 across 54 tissues, ordered by mean expression level, from RNAseq data. Data and image from the GTEx Portal Locus Browser v.8.

(B) Violin plots show rs11523871 genotype and expression level for the three tissues showing a statistically significant relationship. Median and interquartile range are shown by the white line and grey box.

(C) Boxplots showing rs11523871 genotype and expression level in duodenum from 41 healthy patients, normalized against two different housekeeping genes. Left boxplot shows data normalized to RPLP0 expression; right boxplot shows data normalized to UBC expression. Boxplots indicate median, interquartile range, and range.

Copy-number variation of DMBT1 SRCR domains is associated with DMBT1 protein size variation in saliva

It is known that cloned DMBT1 transcripts of different lengths generate proteins of different lengths, but the basis of the size variation in DMBT1 protein isoforms isolated directly from saliva, for example, is unclear. To address this, we determined SRCR domain diploid copy number on eight DNA samples from individuals that had previously been assigned as different protein isoforms (I-IV) by western blot of saliva (Eriksson et al., 2007) (Table 1). Even with a small sample size, a strong linear relationship between SRCR diploid copy number and DMBT1 protein isoforms was present (linear regression, p = 0.005, Kendall’s rank correlation p = 0.004), explaining most of the variation in protein size (r2 = 0.75).

Table 1.

DMBT1 protein isoforms and SRCR repeat diploid copy number

Sample CNV1 diploid copy number CNV2 diploid copy number SRCR repeat diploid copy number Inferred Secretor status DMBT1 (gp340) protein isoform Approxiamate isoform sizea (kDa) S. mutans binding activityb
1 2 7 27 + III 389 +
2 2 4 24 I 345
3 2 2 22 + I 345
4 2 6 26 + II 375 +
5 2 4 24 + II 375 +
6 1 4 20 IV 287,345
7 2 5 25 + II 375 +
8 2 9 29 + III 389 +
a

Protein isoform data from (Eriksson et al., 2007).

b

See Figure 5.

We also explored the possibility of alternative splicing in generating DMBT1 variants. Genetic variation in SRCR domain copy number must be reflected in the transcript and the protein if balancing selection operating on the function of the protein is reflected in genetic variation. Long (8 kb, 13 SRCR domains) and short (6 kb, 8 SRCR domains) transcripts have been cloned from cDNA and studied extensively, yet different transcript lengths have previously often been ascribed to alternative splicing between tissues rather than genetically-encoded polymorphic variation. To help resolve this issue, we took a H292 lung cell line model, where DMBT1 is strongly expressed, and used long single molecule sequencing to determine the extent to which the copy number of SRCR domain repeats in DMBT1 transcripts reflects the copy number of SRCR domain repeats in the DMBT1 gene (Figure 4). Using PRT and long PCR, we showed that the H292 cell line was homozygous for 11 SRCR domain repeats. Five transcripts with mappable 3′ and 5′ ends were identified, of which four showed 11 SRCR domain repeats and one showed 10. This suggests that, for this cell line model at least, alternative splicing can generate alternative numbers of SRCR domain repeats but plays a relatively minor role, with the length of most transcripts matching the genetically encoded number of SRCR domains.

Figure 4.

Figure 4

Identification of transcripts spanning the SRCR repeats in DMBT1

The DMBT1 allele from the genome assembly, with 14 tandemly arranged SRCR repeats highlighted in blue, is shown at the top of the figure, with GRCh38 coordinates as a scale immediately underneath. The sequence alignment of single-molecule sequencing reads mapping to the DMBT1 gene are shown, with at the bottom, the genome features format file (GFF) derived from the sequence alignments. In the GFF image, black boxes indicate complete SRCR repeats, with the total number of tandemly repeated SRCR repeats for that transcript highlighted in red on the left of the particular transcript.

A short DMBT1 SRCR domain allele shows loss of pattern recognition of a cariogenic and invasive S. mutans phenotype

We next explored the ability of DMBT1 protein size isoforms I–IV to bind a S. mutans SpaP A, Cnm phenotype using a western-blot-like assay and saliva with known DMBT1 size isoforms I–IV (Eriksson et al., 2007). Both spaP and Cnm are adhesins expressed on the surface of Streptococcus mutans thought to mediate interactions with salivary DMBT1 (Brady et al., 1992, 2010; Esberg et al., 2017; Loimaranta et al., 2005). DMBT1 size isoforms II and III (carrying longer alleles) mediated distinct binding, whereas isoforms I and IV (carrying shorter alleles) did not bind the S. mutans ligand (Figure 5 and Table 1). Both silver staining and western blot with anti-DMBT1 antisera verified the amount of DMBT1 in the saliva (Figure 5). Binding to saliva of single (spaP A- or Cnm-) and double (spaP A-, Cnm-) knock-out mutants of the wild-type strain (spaP A, Cnm) and recombinant rCnm adhesin to saliva indicated that binding to DMBT1 was mediated mainly by the Cnm adhesin (Figure 5). The weak residual activity upon binding of the double mutant (spaP A-, Cnm-) may reflect other adhesins other than Cnm or spaP (Figure S2).

Figure 5.

Figure 5

Differential binding of S. mutans by DMBT1 isoforms in saliva

Blots of SDS-PAGE gels with saliva from different individuals with DMBT1 size isoforms I–IV.

(A and B) (A) probed using a biotinylated S. mutans SpaP A, Cnm strain and (B) probed with DMBT1-specific antibodies. The positions of DMBT1 on both blots is indicated. Note that DMBT1 isoform size differences are not seen, as they are not resolved at the SDS-PAGE gel density used before the blotting.

Binding of S. mutans occurred also to mono and dimeric amylase (50 and 100 kDa) and PRP (43 and 37 kDa) protein bands, although with large individual variation. Recombinant Cnm bound DMBT1 and amylase but not PRP (Figure S2).

Variation in secretor-dependent glycosylation may contribute to DMBT1 protein isoform variation

Previous reports have suggested that differential glycosylation may explain the variation in DMBT1 isoform size observed, and indeed DMBT1 shows complex glycosylation patterns, including glycosylation by the product of the FUT2 gene, where a null allele is responsible for the Secretor negative blood group, and shows evidence of balancing selection itself (Koda et al., 2000; Soejima et al., 2007; Ferrer-Admetlla et al., 2009). We genotyped the FUT2 gene to infer Se status and found 2 Se− and 6Se+ individuals. Incorporating this into our linear model improved the fit of the model to the data (r2 = 0.85), although secretor status was not a statistically significant predictor (p = 0.07). It is clear that SRCR domain copy number is primarily responsible for the different protein isoforms of DMBT1, although Se status, and other glycosylation, may contribute.

Discussion

In this study, we have shown that the strong signature of balancing selection spanning the 5′ region of the DMBT1 gene is most likely to be explained by tight linkage of particular SNP haplotypes at this region to particular SRCR domain repeat copy numbers within the DMBT1 gene. In particular, the signal of balancing selection across populations is driven by one haplotype carrying a short SRCR domain repeat copy-number allele and another carrying a longer SRCR domain repeat copy-number allele. We also show that SRCR domain repeat copy number is the primary determinant of both transcript length and protein isoform size in vivo and that the short protein isoform shows complete loss of pattern recognition or binding to a cariogenic and invasive S. mutans phenotype.

The finding of the short protein isoform showing complete loss of binding of the collagen-binding Cnm S. mutans phenotype associated with caries and invasive systemic conditions is interesting for several reasons. The fact that previous studies have shown a 30%–40% reduction of binding of other S. mutans types and other bacteria suggests that the longer DMBT1 isoform may give optimal protection against the Cnm and other collagen-binding phenotypes with the potential to cross the basal membrane barrier and infect the extracellular matrix. However, our data suggest that other co-receptors, besides DMBT1, may contribute to the partial reduction of S. mutans binding when whole saliva is analyzed. Although the number of individuals in our study are too few for firm conclusions, our present findings are consistent with previous findings (Eriksson et al., 2007) in suggesting that the short DMBT1 isoform may lack FUT2-dependent fucose receptors for many microbial ligands. Studies showing a role for DMBT1 genetic and protein variation in individuals with high risk of caries, and different causal subtypes, indicate the clinical significance of DMBT1 (Jonasson et al., 2007; Esberg et al., 2017; Strömberg et al., 2017).

Taken together, our work and previous work showing that DMBT1 molecules with fewer SRCR repeat domains are less effective at binding bacteria provides a link between the observation of balancing selection in the genome and the functional basis for that balancing selection (Bikker et al., 2017). The exact nature of the balancing selection remains unclear. For example, could it be due to overdominance, where heterozygotes with a long and a short DMBT1 allele are favored. An alternative mechanism could be fluctuating selection, where a change in the environment across space or time alters the selective advantages of different alleles. In support of this, the Tajima’s D value of the linked region can vary wildly between closely located populations (Table S1) with more subtle differences in the SRCR copy number (Polley et al., 2015). A possible scenario is that alleles reducing tethering to teeth are favored in populations with cereal-rich diets, alleles with enhanced binding to bacteria are favored during outbreaks of diarrheal or lung disease, and alleles with improved viral binding signaling to complement are favored during viral infectious disease epidemics.

Any or all of these explanations are plausible, and although it is possible to determine relative importance of the different alleles on these different functions, the actual selection pressures that have acted on DMBT1 and are responsible for the variation that we see are likely to be very difficult to determine.

Finding explanations for patterns of balancing selection in the human genome can have medical importance, as it can identify alleles that have a functional effect in a disease process. Examples include loci under balancing selection for malaria, where GWAS have supported the association. Loci under balancing selection may identify further loci invisible to current GWAS either because of limitations in measuring phenotypes or the transient complex nature of other phenotypes, such as infectious disease. It is notable that, as yet, there is no convincing association of variation at DMBT1 and disease—there are no GWAS-based associations in the region of balancing selection, and early suggestions that the number of SRCR repeat domains was associated with Crohn disease (CD) (Renner et al., 2007) have not been replicated (Polley et al., 2016b), although there is some evidence of an association of CD with rs2981804 in a candidate gene study (Diegelmann et al., 2013), and recent evidence suggests a genetic and functional link between SRCR copy number and the number of urinary tract infections in children with vesicoureteral reflux (Hains et al., 2021). However, most links between DMBT1 and diseases such as lung disease rely entirely on evidence from functional studies (Hartshorn et al., 2006; Müller et al., 2008, 2015).

In conclusion, our work supports the concept that selection acts on a highly mutable SRCR domain-encoding tandem repeat in the DMBT1 gene in humans, at least in part, by differential binding properties of the molecule to co-receptors on the tooth surface and diverse bacteria on mucosal surfaces. The expression of DMBT1 in saliva, lung, intestinal surfaces, amniotic fluid, and as a major protein (4%–10%) in the first meconium stool of newborns may suggest an important role in the host’s interaction with both pathogenic and commensal bacteria. Understanding the consequences of variation of this molecule in the mucosal innate immune response to infection will be an important aspect of understanding the role of DMBT1 in health and disease.

Limitations of the study

As with many studies on human genetic variation, we cannot be certain that the effects we observe now may have actually been the effects responsible in the past. For example, we observe that the two haplotypes showing evidence of balancing selection are correlated with longer and shorter SRCR arrays within DMBT1, but given the high mutation rate of the length of these arrays, this might have not been the case in the past. Our long-read sequencing of the DMBT1 transcript is only from one cell line representing one tissue from one individual, and further research would extend this to multiple tissues in multiple individuals to confirm the generality of our observations. Finally, we focus on cariogenic S. mutans for binding assays, which has medical relevance now, but interactions with other bacteria may have contributed to balancing selection at this locus in the past.

STAR★Methods

Key resources table

REAGENT or RESOURCE SOURCE IDENTIFIER
Antibodies

Anti-DMBT1 D. Malamud, University of Pennsylvania mAb143

Bacterial and virus strains

S. mutans SpaP A, Cnm phenotype (strain n49) Sample collection, Biobank Department of Odontology, Umeå University
Region Västerbotten
472

Biological samples

YRI Genomic DNA samples Coriell cell repositories MGP00013
CEU DNA samples from three generation pedigrees CEPH n/a

Chemicals, peptides, and recombinant proteins

Streptavidin-POD GE Healthcare L1058765
NHS-LC-biotin Pierce 21336P

Critical commercial assays

Superscript III First Strand Synthesis Supermix Kit Thermo Fisher 18080400
Primers and TaqMan hydrolysis probes for ddPCR of UBC, RPLPO and DMBT1 Biorad 10031276 and 10031279
2x ddPCR Supermix for Probe (no dUTP) Biorad 1863024
sequence-specific cDNA-PCR Sequencing kit Oxford Nanopore SQK-PCS-109
Maxwell 16 Cell DNA purification kit Promega AS1020
Maxwell 16 LEV Simply-RNA Cell kit Promega AS1270

Deposited data

Human Genome Diversity Project variation vcf files Wellcome Trust Sanger Institute ftp.sanger.ac.uk
GTex RNAseq expression data gtexportal.org

Experimental models: Cell lines

NCI-H292 cell line European collection of authenticated cell cultures 91091815

Oligonucleotides

5' [PHOS] ACTTGCCTGTCGCTCTATCTT
CGCAGTTTCACCAAAATTCCTTT 3'
This paper DMBT1 cDNA primer
5’TCAGTGATGGTGAATGTTTGTCA-3’ This paper rs11523871-C allele
5’GACCTTACCTTCTGCTACAGTCGG-3’ This paper rs11523871-C allele
5’TGTGAGTGATTTATTTCGGCATTC-3’ This paper rs11523871-T allele
5’GACCTTACCTTCTGCTACAGTCGA-3’ This paper rs11523871-T allele
5′ATTGATTCACTTCACGGATCAAG 3′ This paper Positive control
5′TCTAAGAAATTCCCATGACAGGT 3′ This paper Positive control

Software and algorithms

ONT Guppy v3.3.3 https://github.com/nanoporetech https://github.com/nanoporetech
NanoPlot v.1.32.1 https://github.com/wdecoster/NanoPlot https://github.com/wdecoster/NanoPlot
Pychopper https://github.com/nanoporetech/pychopper https://github.com/nanoporetech/pychopper
Minimap2 v2.17 https://github.com/lh3/minimap2 https://github.com/lh3/minimap2
SAMTools v1.9 https://github.com/samtools/ https://github.com/samtools/
SHAPEIT2v837 http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html

Resource availability

Lead contact

Further information and requests should be directed to the corresponding author Edward Hollox (ejh33@leicester.ac.uk).

Materials availability

This study did not generate new unique reagents.

Experimental model and subject details

Source of biological material

The Yoruba from Ibadan, Nigeria (YRI) HapMap samples (90 DNA samples) were derived from lymphoblastoid cell lines and are available from Coriell Cell repositories. Centre de’Etude du Polymorphisme Humain (CEPH) provides a set of DNA samples derived from lymphoblastoid cell lines from European origin individuals as a three-generation pedigrees (Stevens et al., 2012). DNA samples with matching DMBT1 isoform data were derived from saliva collected from healthy volunteers as part of a project at the University of Umea, Sweden, with appropriate informed consent. Matching DNA and cDNA was generated from duodenal biopsies of individuals as part of a previous study (Wang et al., 1994), with local ethics approval UCLH (ref 01/0236).

The H292 cell line (NCI-H292, ECACC 91091815), derived from a human lung mucoepidermoid carcinoma, was supplied by the European Collection of Authenticated Cell Culture (ECACC) (Carney et al., 1985) and grown according to the supplier's recommendation.

Method details

Isolation of nucleic acids

For DNA isolation, a Maxwell 16 (Promega) instrument was configured with the Standard Elution Volume (SEV) hardware and using the Maxwell 16 Cell DNA purification kit (Promega). For total RNA isolation, the SEV hardware was replaced with the Low Elution Volume (LEV) hardware and the Simply-RNA Cell kit (Promega) was used. RNA integrity was measured by electrophoresis on an Agilent Bioanalyzer 2100.

Measurement of DMBT1 SRCR copy number

DMBT1 copy number was determined using paralogue ratio tests (PRTs) as previously published (Polley et al., 2015, 2016a, 2016b) with long PCR used to confirm genotypes (Polley et al., 2015; Sasaki et al., 2002). Briefly, the PRT amplifies a section of DNA in a copy number variable region (test) and in a non-copy number variable region (reference) using the same primer pair. Following amplification and electrophoresis, quantification of test and reference amplicons provides a measure of the copy number of the test locus relative to the reference locus (Armour et al., 2007; Hollox, 2017). For DMBT1, two independent PRTs each are used to measure copy number at CNV1 and CNV2 . Data analysis, and positive controls, were as previously published (Polley et al., 2015).

SNP rs11523871 genotyping

In samples where rs11523871 genotyping data was not publicly available, it was- genotyped directly from genomic DNA using allele-specific PCR. Primers 5’TCAGTGATGGTGAATGTTTGTCA-3’ and 5’GACCTTACCTTCTGCTACAGTCGG-3’ were for the C allele; and 5’TGTGAGTGATTTATTTCGGCATTC-3’ and 5’GACCTTACCTTCTGCTACAGTCGA-3’ were for the T allele, and were included together with positive control primers from unrelated sequence 5′ATTGATTCACTTCACGGATCAAG 3′ and 5′TCTAAGAAATTCCCATGACAGGT 3′ . For each sample, C-specific and T-specific reactions included primers each at a final concentration of 0.5 μM, 10 ng of genomic DNA, 0.2 mM dNTPs and 1U Taq DNA polymerase in a standard ammonium sulfate-based PCR buffer with 1.5 mM MgCl2. Standard cycling used an annealing temperature of 57°C optimised to ensure allele specificity.

Measurement of DMBT1 gene expression using digital droplet PCR

cDNA was generated from total RNA using the Superscript III First Strand Synthesis Supermix Kit (Invitrogen). Primers and TaqMan hydrolysis probes for RPLPO and UBC, as reference genes, and DMBT1, were commercially available and ordered from BioRad. cDNA was digested using 50U XhoI (New England Biolabs), then 1-5 ng added to a PCR mix containing 11 μL 2x ddPCR Supermix for Probe (no dUTP, BioRad), 1.1 μL 20x DMBT1 probe assay (FAM-labelled), 1.1 μL 20x reference probe assay (HEX-labelled) in a final volume of 22 μL. After droplet generation and thermal cycling, following manufacturer’s instructions, droplets were counted by a QX200 droplet reader (Biorad) and analysed using Quantasoft (Biorad).

DMBT1 transcript sequencing

For sequencing on a MinION (Oxford Nanopore), a sequencing library was prepared from total RNAusing the sequence-specific cDNA-PCR Sequencing kit (SQK-PCS-109, Oxford Nanopore Technologies, UK) following the manufacturer’s instructions. A primer corresponding to the 3' end of DMBT1 (5' [PHOS] ACTTGCCTGTCGCTCTATCTTCGCAGTTTCACCAAAATTCCTTT 3' was used to generate cDNA from total high-quality RNA following the manufacturer’s instructions, to enrich the resulting sequence for DMBT1 transcripts. The cDNA subsequently was amplified by PCR to enrich for full-length transcripts, following Oxford Nanopore Technology (ONT) protocols. ONT Guppy software v3.3.3 MinKNOW was used to perform live basecalling, with NanoPlot v.1.32.1 (De Coster et al., 2018) to check quality of reads. Identification of full length cDNA transcripts used Pychopper (ONT), and were mapped to chromosome 10 from the GRCh38 human genome assembly using Minimap2 v2.17 (Li, 2018). Following sorting and indexing the bam alignment files using SAMTools v1.9 (Li et al., 2009), an annotation file was generated by the spliced_bam2gff tool from the Pinfish pipeline (ONT), and visualised using the Ensembl genome browser (Yates et al., 2020).

S. mutans strains and biotinylation

A wildtype S. mutans SpaP A, Cnm phenotype (strain n49) and corresponding single and double knock-out mutants were used (Esberg et al., 2017). We also generated and used a recombinant rCnm protein from the same strain. For biotinylation, bacterial cells grown overnight on blood agar plates were harvested, washed 2 times in 10 mM phosphate-buffered saline pH 7.2 and resuspended in 1 mL 0.2 M NaHCO3, pH 8.3. The resultant bacterial suspension (5 × 109 cells/mL) was mixed (10:1) with NHS-LC-biotin (Pierce, 21336P) dissolved in DMSO (1 mg/mL) and the mixture incubated with rotation for 1 hour at room temperature. After incubation, biotinylated bacteria were washed 3 times in PBS buffer, resuspended to a final concentration of 5 × 109 cell/mL in adhesion buffer (50 mM KCl, 1 mM CaCl2, 0.1 mM MgCl2, 1 mM K2HPO4, pH 6.5) and stored at −20°C until use.

Binding of S.mutans and antisera to saliva DMBT1 size isoform I-IV phenotypes

Parotid saliva (15 μL) from individuals with DMBT1 size isoform I-IV phenotypes were separated by electrophoresis on 4–15% gradient SDS-PAGE gels (BioRad) and subjected to Western blot-like binding of biotinylated bacteria. Briefly, salivary proteins were transferred to methanol treated PVDF membranes using the Trans-Blot Turbo Transfer system (BioRad). After transfer, the membrane was washed 3 times with PBS buffer with 0.05% Tween 20 (PBST), blocked in 5% milk in PBST for 1 h at room temperature, washed 3 times with PBST and incubated with a suspension of biotinylated bacteria overnight at 4°C. The membrane was then washed three times (3 × 10 min) with PBST and bound bacteria were detected by incubating with streptavidin-POD (GE Healthcare), diluted 1:10 000 in adhesion buffer, for 1 h followed by three PBST washes (3 × 10 min). Bands were visualized with enhanced chemiluminescence reagent Chemidoc XRS (Biorad) and a BioRad system was used to capture the images. Western blot experiments also used specific anti-DMBT1 (mAb143 provided by D. Malamud, University of Pennsylvania), anti-PRP and anti-amylase antisera as well as pure salivary protein references to identify receptor-active saliva protein bands.

Quantification and statistical analysis

Statistical analysis used the statistical software R 3.6.3, implemented in RStudio v1.1.456 (https://www.R-project.org/). HGDP variation data, derived from whole genome sequencing (Almarri et al., 2020), were downloaded from the Wellcome Sanger Institute (ftp.sanger.ac.uk as vcf files) and phased using SHAPEIT2v837 (Delaneau et al., 2013). Population genetics statistics were calculated using vcftools (Danecek et al., 2011), with an example script at the following address http://doi.org/10.25392/leicester.data.19372493. Tajima’s D values were normalised by reporting the z value of the observed Tajima’s D value in a distribution of Tajima’s D values, measured across the genome in 10 kb non-overlapping windows. Populations with fewer than three variable sites in the 10 kb DMBT1 region were excluded. Median spanning haplotype networks were calculated from phased vcf genotype data using SNiPlay (Dereeper et al., 2015).

Acknowledgments

Thanks to Dr Rachael Madison for technical support. Access to a Biorad QX200 digital droplet generator and reader was through the NUCLEUS Genomics service at the University of Leicester. We thank Prof Dallas Swallow for access to intestinal DNA and cDNA samples. AA was supported by the Saudi Arabian Ministry of Health and by a PhD studentship grant from the Saudi Arabian Cultural Bureau, London. Funding acquisition: AFA and EJH.

Author contributions

Conceptualization: EJH, NS, and AFA; Investigation: AFA, NS, and KN; Resources: NS and EJ; Writing—Original Draft: EJH and AFA, Writing—Review and Editing: all authors; Supervision: EJH and NS.

Declaration of interests

The authors declare no competing interests.

Published: May 20, 2022

Footnotes

Supplemental information can be found online at https://doi.org/10.1016/j.isci.2022.104189.

Supplemental information

Dcoument S1. Figures S1 and S2 and Table S1
mmc1.pdf (182.4KB, pdf)

Data and code availability

This paper analyzes existing publically available data, with information in the key resources table. This paper does not report any original code. Any additional information required to reanalyze the data reported in this paper is available from the lead contact on request.

References

  1. Almarri M.A., Bergstrom A., Prado-Martinez J., Yang F., Fu B., Dunham A.S., Chen Y., Hurles M.E., Tyler-Smith C., Xue Y. Population structure, stratification, and introgression of human structural variation. Cell. 2020;182:189–199.e15. doi: 10.1016/j.cell.2020.05.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Andrés A.M., Hubisz M.J., Indap A., Torgerson D.G., Degenhardt J.D., Boyko A.R., Gutenkunst R.N., White T.J., Green E.D., Bustamante C.D., et al. Targets of balancing selection in the human genome. Mol. Biol. Evol. 2009;26:2755–2764. doi: 10.1093/molbev/msp190. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Armour J.A., Palla R., Zeeuwen P.L., Heijer M., Schalkwijk J., Hollox E.J. Accurate, high-throughput typing of copy number variation using paralogue ratios from dispersed repeats. Nucleic Acids Res. 2007;35:e19. doi: 10.1093/nar/gkl1089. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bikker F.J., End C., Ligtenberg A.J.M., Blaich S., Lyer S., Renner M., Wittig R., Nazmi K., van Nieuw Amerongen A., Poustka A., et al. The scavenging capacity of DMBT1 is impaired by germline deletions. Immunogenetics. 2017;69:401–407. doi: 10.1007/s00251-017-0982-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Bitarello B.D., de Filippo C., Teixeira J.C., Schmidt J.M., Kleinert P., Meyer D., Andrés A.M. Signatures of long-term balancing selection in human genomes. Genome Biol. Evol. 2018;10:939–955. doi: 10.1093/gbe/evy054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Brady L.J., Maddocks S.E., Larson M.R., Forsgren N., Persson K., Deivanayagam C.C., Jenkinson H.F. The changing faces of Streptococcus antigen I/II polypeptide family adhesins. Mol. Microbiol. 2010;77:276–286. doi: 10.1111/j.1365-2958.2010.07212.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Brady L.J., Piacentini D.A., Crowley P.J., Oyston P.C., Bleiweis A.S. Differentiation of salivary agglutinin-mediated adherence and aggregation of mutans streptococci by use of monoclonal antibodies against the major surface adhesin P1. Infect. Immun. 1992;60:1008–1017. doi: 10.1128/iai.60.3.1008-1017.1992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Bubb K.L., Bovee D., Buckley D., Haugen E., Kibukawa M., Paddock M., Palmieri A., Subramanian S., Zhou Y., Kaul R., et al. Scan of human genome reveals no new loci under ancient balancing selection. Genetics. 2006;173:2165–2177. doi: 10.1534/genetics.106.055715. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Canton J., Neculai D., Grinstein S. Scavenger receptors in homeostasis and immunity. Nat. Rev. Immunol. 2013;13:621–634. doi: 10.1038/nri3515. [DOI] [PubMed] [Google Scholar]
  10. Carney D.N., Gazdar A.F., Bepler G., Guccion J.G., Marangos P.J., Moody T.W., Zweig M.H., Minna J.D. Establishment and identification of small cell lung cancer cell lines having classic and variant features. Cancer Res. 1985;45:2913–2923. [PubMed] [Google Scholar]
  11. Danecek P., Auton A., Abecasis G., Albers C.A., Banks E., DePristo M.A., Handsaker R.E., Lunter G., Marth G.T., Sherry S.T. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. De Coster W., D’Hert S., Schultz D.T., Cruts M., Van Broeckhoven C. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics. 2018;34:2666–2669. doi: 10.1093/bioinformatics/bty149. [DOI] [PMC free article] [PubMed] [Google Scholar]
  13. de Filippo C., Key F.M., Ghirotto S., Benazzo A., Meneu J.R., Weihmann A., NISC Comparative Sequence Program. Parra G., Green E.D., Andrés A.M. Recent selection changes in human genes under long-term balancing selection. Mol. Biol. Evol. 2016;33:1435–1447. doi: 10.1093/molbev/msw023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. DeGiorgio M., Lohmueller K.E., Nielsen R. A model-based approach for identifying signatures of ancient balancing selection in genetic data. PLoS Genet. 2014;10:e1004561. doi: 10.1371/journal.pgen.1004561. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. Delaneau O., Zagury J.-F., Marchini J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods. 2013;10:5–6. doi: 10.1038/nmeth.2307. [DOI] [PubMed] [Google Scholar]
  16. Dereeper A., Homa F., Andres G., Sempere G., Sarah G., Hueber Y., Dufayard J.-F., Ruiz M. SNiPlay3: a web-based application for exploration and large scale analyses of genomic variations. Nucleic Acids Res. 2015;43:W295–W300. doi: 10.1093/nar/gkv351. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Diegelmann J., Czamara D., Le Bras E., Zimmermann E., Olszak T., Bedynek A., Göke B., Franke A., Glas J., Brand S. Intestinal DMBT1 expression is modulated by Crohn’s disease-associated IL23R variants and by a DMBT1 variant which influences binding of the transcription factors CREB1 and ATF-2. PLoS One. 2013;8:e77773. doi: 10.1371/journal.pone.0077773. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Eriksson C., Frängsmyr L., Danielsson Niemi L., Loimaranta V., Holmskov U., Bergman T., Leffler H., Jenkinson H.F., Strömberg N. Variant size- and glycoforms of the scavenger receptor cysteine-rich protein gp-340 with differential bacterial aggregation. Glycoconj. J. 2007;24:131–142. doi: 10.1007/s10719-006-9020-1. [DOI] [PubMed] [Google Scholar]
  19. Esberg A., Löfgren-Burström A., Öhman U., Strömberg N. Host and bacterial phenotype variation in adhesion of Streptococcus mutans to matched human hosts. Infect. Immun. 2012;80:3869–3879. doi: 10.1128/IAI.00435-12. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Esberg A., Sheng N., Mårell L., Claesson R., Persson K., Borén T., Strömberg N. Streptococcus mutans adhesin biotypes that match and predict individual caries development. EBioMedicine. 2017;24:205–215. doi: 10.1016/j.ebiom.2017.09.027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Ferrer-Admetlla A., Sikora M., Laayouni H., Esteve A., Roubinet F., Blancher A., Calafell F., Bertranpetit J., Casals F. A natural history of FUT2 polymorphism in humans. Mol. Biol. Evol. 2009;26:1993–2003. doi: 10.1093/molbev/msp108. [DOI] [PubMed] [Google Scholar]
  22. Fu W., Akey J.M. Selection and adaptation in the human genome. Annu. Rev. Genomics Hum. Genet. 2013;14:467–489. doi: 10.1146/annurev-genom-091212-153509. [DOI] [PubMed] [Google Scholar]
  23. Gunput S.T., Wouters D., Nazmi K., Cukkemane N., Brouwer M., Veerman E.C., Ligtenberg A.J. Salivary agglutinin is the major component in human saliva that modulates the lectin pathway of the complement system. Innate Immun. 2016;22:257–265. doi: 10.1177/1753425916642614. [DOI] [PubMed] [Google Scholar]
  24. Hains D.S., Polley S., Liang D., Saxena V., Arregui S., Ketz J., Barr-Beare E., Rawson A., Spencer J.D., Cohen A., et al. Deleted in malignant brain tumor 1 genetic variation confers urinary tract infection risk in children and mice. Clin. Transl. Med. 2021;11:e477. doi: 10.1002/ctm2.477. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Hartshorn K.L., Ligtenberg A., White M.R., Van Eijk M., Hartshorn M., Pemberton L., Holmskov U., Crouch E. Salivary agglutinin and lung scavenger receptor cysteine-rich glycoprotein 340 have broad anti-influenza activities and interactions with surfactant protein D that vary according to donor source and sialylation. Biochem. J. 2006;393:545–553. doi: 10.1042/BJ20050695. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Hollox E.J. Genotyping. Springer; 2017. Analysis of copy number variation using the paralogue ratio test (PRT) pp. 127–146. [DOI] [PubMed] [Google Scholar]
  27. Jonasson A., Eriksson C., Jenkinson H.F., Källestål C., Johansson I., Strömberg N. Innate immunity glycoprotein gp-340 variants may modulate human susceptibility to dental caries. BMC Infect. Dis. 2007;7:57. doi: 10.1186/1471-2334-7-57. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. Kelly R.J., Rouquier S., Giorgi D., Lennon G.G., Lowe J.B. Sequence and expression of a candidate for the human Secretor blood group alpha(1,2)fucosyltransferase gene (FUT2). Homozygosity for an enzyme-inactivating nonsense mutation commonly correlates with the non-secretor phenotype. J. Biol. Chem. 1995;270:4640–4649. doi: 10.1074/jbc.270.9.4640. [DOI] [PubMed] [Google Scholar]
  29. Key F.M., Teixeira J.C., de Filippo C., Andrés A.M. Advantageous diversity maintained by balancing selection in humans. Curr. Opin. Genet. Dev. Genet. Hum. Evol. 2014;29:45–51. doi: 10.1016/j.gde.2014.08.001. [DOI] [PubMed] [Google Scholar]
  30. Koda Y., Tachida H., Soejima M., Takenak O., Kimura H. Ancient Origin of the Null Allele se428 of the Human ABO-Secretor Locus (FUT2) J. Mol. Evol. 2000;50:243–248. doi: 10.1007/s002399910028. [DOI] [PubMed] [Google Scholar]
  31. Leffler E.M., Gao Z., Pfeifer S., Ségurel L., Auton A., Venn O., Bowden R., Bontrop R., Wall J.D., Sella G., et al. Multiple instances of ancient balancing selection shared between humans and chimpanzees. Science. 2013;339:1578–1582. doi: 10.1126/science.1234070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Leito J.T.D., Ligtenberg A.J.M., van Houdt M., van den Berg T.K., Wouters D. The bacteria binding glycoprotein salivary agglutinin (SAG/gp340) activates complement via the lectin pathway. Mol. Immunol. 2011;49:185–190. doi: 10.1016/j.molimm.2011.08.010. [DOI] [PubMed] [Google Scholar]
  33. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094–3100. doi: 10.1093/bioinformatics/bty191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Ligtenberg A.J.M., Bikker F.J., De Blieck-Hogervorst J.M.A., Veerman E.C.I., Nieuw Amerongen A.V. Binding of salivary agglutinin to IgA. Biochem. J. 2004;383:159–164. doi: 10.1042/BJ20040265. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Loimaranta V., Jakubovics N.S., Hytönen J., Finne J., Jenkinson H.F., Strömberg N. Fluid- or surface-phase human salivary scavenger protein gp340 exposes different bacterial recognition properties. Infect. Immun. 2005;73:2245–2252. doi: 10.1128/IAI.73.4.2245-2252.2005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The Ensembl variant effect predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Mollenhauer J., Wiemann S., Scheurlen W., Korn B., Hayashi Y., Wilgenbus K.K., von Deimling A., Poustka A. DMBT1, a new member of the SRCR superfamily, on chromosome 10q25.3-26.1 is deleted in malignant brain tumours. Nat. Genet. 1997;17:32–39. doi: 10.1038/ng0997-32. [DOI] [PubMed] [Google Scholar]
  39. Müller H., End C., Weiss C., Renner M., Bhandiwad A., Helmke B.M., Gassler N., Hafner M., Poustka A., Mollenhauer J., Poeschl J. Respiratory Deleted in Malignant Brain Tumours 1 (DMBT1) levels increase during lung maturation and infection. Clin. Exp. Immunol. 2008;151:123–129. doi: 10.1111/j.1365-2249.2007.03528.x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Müller H., Nagel C., Weiss C., Mollenhauer J., Poeschl J. Deleted in malignant brain tumors 1 (DMBT1) elicits increased VEGF and decreased IL-6 production in type II lung epithelial cells. BMC Pulm. Med. 2015;15:32. doi: 10.1186/s12890-015-0027-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Polley S., Cipriani V., Khan J.C., Shahid H., Moore A.T., Yates J.R.W., Hollox E.J. Analysis of copy number variation at DMBT1 and age-related macular degeneration. BMC Med. Genet. 2016;17:44. doi: 10.1186/s12881-016-0311-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Polley S., Louzada S., Forni D., Sironi M., Balaskas T., Hains D.S., Yang F., Hollox E.J. Evolution of the rapidly mutating human salivary agglutinin gene (DMBT1) and population subsistence strategy. Proc. Natl. Acad. Sci. U S A. 2015;112:5105–5110. doi: 10.1073/pnas.1416531112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Polley S., Prescott N., Nimmo E., Veal C., Vind I., Munkholm P., Fode P., Mansfield J., Skyt Andersen P., Satsangi J., et al. Copy number variation of scavenger-receptor cysteine-rich domains within DMBT1 and Crohn’s disease. Eur. J. Hum. Genet. 2016;24:1294–1300. doi: 10.1038/ejhg.2015.280. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Prakobphol A., Xu F., Hoang V.M., Larsson T., Bergstrom J., Johansson I., Frängsmyr L., Holmskov U., Leffler H., Nilsson C., et al. Salivary agglutinin, which binds Streptococcus mutans and Helicobacter pylori, is the lung scavenger receptor cysteine-rich protein gp-340. J. Biol. Chem. 2000;275:39860–39866. doi: 10.1074/jbc.M006928200. [DOI] [PubMed] [Google Scholar]
  45. Pybus M., Dall’Olio G.M., Luisi P., Uzkudun M., Carreño-Torres A., Pavlidis P., Laayouni H., Bertranpetit J., Engelken J. 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans. Nucleic Acids Res. 2014;42:D903–D909. doi: 10.1093/nar/gkt1188. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Reichhardt M.P., Holmskov U., Meri S. SALSA-A dance on a slippery floor with changing partners. Mol. Immunol. 2017;89:100–110. doi: 10.1016/j.molimm.2017.05.029. [DOI] [PubMed] [Google Scholar]
  47. Reichhardt M.P., Loimaranta V., Thiel S., Finne J., Meri S., Jarva H. The salivary scavenger and agglutinin binds MBL and regulates the lectin pathway of complement in solution and on surfaces. Front. Immunol. 2012;3:205. doi: 10.3389/fimmu.2012.00205. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Reichhardt M.P., Meri S. SALSA: a regulator of the early steps of complement activation on mucosal surfaces. Front. Immunol. 2016;7:85. doi: 10.3389/fimmu.2016.00085. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Renner M., Bergmann G., Krebs I., End C., Lyer S., Hilberg F., Helmke B., Gassler N., Autschbach F., Bikker F., et al. DMBT1 confers mucosal protection in vivo and a deletion variant is associated with Crohn’s disease. Gastroenterology. 2007;133:1499–1509. doi: 10.1053/j.gastro.2007.08.007. [DOI] [PubMed] [Google Scholar]
  50. Sasaki H., Betensky R.A., Cairncross J.G., Louis D.N. DMBT1 polymorphisms: relationship to malignant glioma tumorigenesis. Cancer Res. 2002;62:1790–1796. [PubMed] [Google Scholar]
  51. Schulz B.L., Oxley D., Packer N.H., Karlsson N.G. Identification of two highly sialylated human tear-fluid DMBT1 isoforms: the major high-molecular-mass glycoproteins in human tears. Biochem. J. 2002;366:511–520. doi: 10.1042/BJ20011876. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Siewert K.M., Voight B.F. Detecting long-term balancing selection using allele frequency correlation. Mol. Biol. Evol. 2017;34:2996–3005. doi: 10.1093/molbev/msx209. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Soejima M., Pang H., Koda Y. Genetic variation of FUT2 in a Ghanaian population: identification of four novel mutations and inference of balancing selection. Ann. Hematol. 2007;86:199–204. doi: 10.1007/s00277-006-0203-4. [DOI] [PubMed] [Google Scholar]
  54. Solberg O.D., Mack S.J., Lancaster A.K., Single R.M., Tsai Y., Sanchez-Mazas A., Thomson G. Balancing selection and heterogeneity across the classical human leukocyte antigen loci: a meta-analytic review of 497 population studies. Hum. Immunol. 2008;69:443–464. doi: 10.1016/j.humimm.2008.05.001. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Stevens E.L., Heckenberg G., Baugher J.D., Roberson E.D.O., Downey T.J., Pevsner J. Consanguinity in centre d’Étude du Polymorphisme Humain (CEPH) pedigrees. Eur. J. Hum. Genet. 2012;20:657–667. doi: 10.1038/ejhg.2011.266. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Strömberg N., Esberg A., Sheng N., Mårell L., Löfgren-Burström A., Danielsson K., Källestål C. Genetic- and lifestyle-dependent dental caries defined by the acidic proline-rich protein genes PRH1 and PRH2. EBioMedicine. 2017;26:38–46. doi: 10.1016/j.ebiom.2017.11.019. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Tajima F. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989;123:585–595. doi: 10.1093/genetics/123.3.585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Thornton D.J., Davies J.R., Kirkham S., Gautrey A., Khan N., Richardson P.S., Sheehan J.K. Identification of a nonmucin glycoprotein (gp-340) from a purified respiratory mucin preparation: evidence for an association involving the MUC5B mucin. Glycobiology. 2001;11:969–977. doi: 10.1093/glycob/11.11.969. [DOI] [PubMed] [Google Scholar]
  59. Wang Y., Harvey C., Rousset M., Swallow D.M. Expression of human intestinal mRNA transcripts during development: analysis by a semiquantitative RNA polymerase chain reaction method. Pediatr. Res. 1994;36:514–521. doi: 10.1203/00006450-199410000-00018. [DOI] [PubMed] [Google Scholar]
  60. Yates A.D., Achuthan P., Akanni W., Allen J., Allen J., Alvarez-Jarreta J., Amode M.R., Armean I.M., Azov A.G., Bennett R., et al. Ensembl 2020. Nucleic Acids Res. 2020;48:D682–D688. doi: 10.1093/nar/gkz966. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Zuccherato L.W., Schneider S., Tarazona-Santos E., Hardwick R.J., Berg D.E., Bogle H., Gouveia M.H., Machado L.R., Machado M., Rodrigues-Soares F., et al. Population genetics of immune-related multilocus copy number variation in Native Americans. J. R. Soc. Interf. 2017;14:20170057. doi: 10.1098/rsif.2017.0057. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Dcoument S1. Figures S1 and S2 and Table S1
mmc1.pdf (182.4KB, pdf)

Data Availability Statement

This paper analyzes existing publically available data, with information in the key resources table. This paper does not report any original code. Any additional information required to reanalyze the data reported in this paper is available from the lead contact on request.


Articles from iScience are provided here courtesy of Elsevier

RESOURCES