PLOS One. 2023 Jan 24;18(1):e0258009. doi: 10.1371/journal.pone.0258009

Cytokine gene polymorphism and parasite susceptibility in free-living rodents: Importance of non-coding variants

Agnieszka Kloch 1,*, Ewa J Mierzejewska 2, Renata Welc-Falęciak 3, Anna Bajer 4, Aleksandra Biedrzycka 5
Editor: Johan R Michaux 6
PMCID: PMC9873194  PMID: 36693052

Abstract

Associations between genetic variants and susceptibility to infections have long been studied in free-living hosts to infer the contemporary evolutionary forces that shape the genetic polymorphisms of immunity genes. Despite extensive studies of proteins interacting with pathogen-derived ligands, such as MHC (major histocompatibility complex) or TLR (Toll-like receptors), little is known about the efferent arm of the immune system. Cytokines are signalling molecules that trigger and modulate the immune response, acting as a crucial link between innate and adaptive immunity. In the present study we investigated how genetic variation in cytokines in bank voles Myodes glareolus affects their susceptibility to infection by parasites (nematodes: Aspiculuris tianjinensis, Heligmosomum mixtum, Heligmosomoides glareoli) and microparasites (Cryptosporidium sp., Babesia microti, Bartonella sp.). We focused on three cytokines: tumour necrosis factor (TNF), lymphotoxin alpha (LTα), and interferon beta (IFNβ1). Overall, we identified four single nucleotide polymorphisms (SNPs) associated with susceptibility to nematodes: two located in LTα and two in IFNβ1. One of those variants was synonymous, and another was located in an intron. Each SNP associated with parasite load was located in or next to a codon under selection: three codons displayed signatures of positive selection, and one of purifying selection. Our results indicate that cytokines are prone to parasite-driven selection and that non-coding variants, although commonly disregarded in studies of the genetic background of host-parasite co-evolution, may play a role in susceptibility to infections in wild systems.

1. Introduction

Parasite-driven selection is considered a key factor shaping the evolution of the components of the immune system. In vertebrates, the immune system comprises many interacting molecules, yet the mechanisms of this selection have been comprehensively studied only in the case of major histocompatibility complex (MHC) genes, e.g. [1]. The exceptionally high variation of MHC genes is attributed to parasite-driven balancing selection, which maintains alleles within a population at higher frequencies than expected. Balancing selection operates through several mechanisms that are not mutually exclusive. A heterozygote advantage occurs when heterozygotes are able to recognise a wider array of pathogen-derived motifs than homozygous individuals [2]. A rare-allele advantage occurs when parasites are most likely to adapt to the most frequent host genotypes, so that rare variants are usually beneficial [3]. Those variants are then favoured by natural selection and their frequency increases until they become common and targeted by parasite counter-adaptations. Such heterogeneity over time (and space) results in the third mechanism, fluctuating selection [4, 5]. Though the MHC plays an important role in antigen-based pathogen recognition, it is neither the only nor the main factor responsible for resistance against pathogens [6, 7]. Recently, researchers have focused on the components of innate immunity, in particular on the Toll-like receptors, e.g. [8–10], but the evolution of other elements of the immune system, including cytokines, remains poorly understood.

Cytokines, signalling molecules capable of triggering and modulating the immune response, are the crucial link between innate and adaptive immunity. As an efferent arm of the immune system they are expected to be more evolutionarily constrained than elements of the afferent arm [11]. Nonetheless, a handful of studies have so far reported signatures of balancing or positive selection in this group of molecules. For instance, balancing selection was found in genes coding for interleukins Il-1B, Il-2 and tumour necrosis factor TNF in field voles [12], and in interleukin Il-10 and protein CD14 in humans [13]. Another kind of evidence in support of contemporary parasite-driven selection operating on cytokines comes from significant associations between genetic variants and susceptibility to infections. In humans, polymorphism within the lymphotoxin alpha (LTα) gene has been linked to several diseases, such as leprosy [14] and malaria [15]. Among rodents, both positive and negative associations between variation in interleukins and infections with uni- and multicellular parasites have been found in the field vole [16]. Only one study focusing on non-coding variation in a free-living mammal has revealed that variants affecting susceptibility may be located outside the coding part of a cytokine gene [17].

In the present study we investigated a system including the bank vole Myodes glareolus and its parasites from three locations in NE Poland, differing in the composition of the community of gastrointestinal helminths [18–21]. There are three dominant nematode species in this study system: the oxyurid Aspiculuris tianjinensis (formerly A. tetraptera) and two Heligmosomoidae worms that show site-specific patterns: Heligmosomum mixtum is absent at the site Pilchy, while Heligmosomoides glareoli is rare at the site Urwitałt. Apart from nematodes, voles are also infected with cestodes and a range of vector-transmitted blood parasites and protozoa, e.g. [18, 22]. Previous studies have shown that the presence of a blood parasite decreases winter survival in root voles [23], and that nematode infections reduce reproductive fitness [24] and depauperate gut microbiota diversity [25] in mice. In our study system, we have previously reported associations between variation in the MHC and TLR (Toll-like receptor) genes [10, 21] and susceptibility to infections. Both gene families code for proteins directly interacting with structures derived from pathogens. TLRs recognize general pathogen-associated motifs triggering the innate immune response, and MHC molecules present specific antigens as a part of the adaptive immune response. Here, we completed this picture by focusing on cytokines, which act as a functional link between those two branches of the immune system.

We selected three genes coding for non-interleukin cytokines, as this group is less frequently studied than interleukins (e.g. [16]): tumour necrosis factor (TNF), lymphotoxin alpha (LTα) and interferon beta (IFNβ1). Their versatile role within the immune system, along with evidence from human studies demonstrating their importance in responding to various pathogens, makes them promising candidate genes for studying the mechanisms of parasite-driven selection.

The TNF and lymphotoxin alpha (LTα) genes have a similar structure, and until recently LTα was called tumour necrosis factor beta (TNFβ). The genes, separated by just 1600 bp, are located on chromosome 17 in the proximity of the genes encoding the major histocompatibility complex (MHC), a key component of the adaptive immune system. Tumour necrosis factor (TNF) initiates the acute phase response and acts as an endogenous pyrogen contributing to inflammation. By stimulating the endothelial cells in blood vessels it plays a role in preventing the pathogen from entering the bloodstream and in locally containing an infection [26]. It has been shown that genetic variants located in the promoter sequence of TNF affect resistance against viral infections in bank voles [17]. Here, we expanded this study by examining associations between polymorphisms within TNF and susceptibility to other groups of pathogens, namely nematodes and blood microparasites.

LTα is expressed in lymphocytes and plays a major role in immunomodulation and signal transduction within the immune system. It induces inflammation and is the key factor facilitating the innate immune response through activation of the IFNβ and NF-κB pathways [14, 27]. LTα is necessary for effective adaptive responses involving T and B lymphocytes [28] and for developing intestinal lymph nodes [29]. Thus, we expected to confirm links between LTα variants and resistance against gastrointestinal nematode infections. Human GWAS have reported associations of single-nucleotide polymorphisms, including promoter and intron variants of TNF and LTα, with susceptibility to several bacterial and protist infections, including Mycobacterium leprae, M. tuberculosis and Plasmodium falciparum (reviewed in [30]). Thus, we expected to find associations between polymorphisms in those genes and resistance against microbial pathogens. IFNβ1 is produced in response to viral but also bacterial infections [31], and through a specific pathway, it plays a major role in linking innate and adaptive immunity. This cytokine has not yet been studied in the context of parasite-driven selection, and we expected that variation within the IFNβ1 gene may play a role in resistance against bacterial pathogens.

Helminth infections, which are prevalent in wild animals, show immunomodulatory effects suppressing acute, inflammatory Th-1 type reactions, and it has been shown that the presence of nematodes may alter the outcome of bacterial infections in free-living mammals [32]. Assuming that polymorphisms within inflammatory cytokines may play a role in this process, we expected to find associations between the studied gene variants and the presence of helminth infections.

2. Material and methods

2.1. Samples and parasite screening

In the present work we used samples collected in autumn (September) of 2005 and 2016 at three sites in NE Poland: Urwitałt (53.80004 N, 21.65250 E), Pilchy (53.70268 N, 21.80116 E), and Tałty (53.89362 N, 21.55392 E). All sites are located in coniferous and mixed forests, situated within ~20 km of each other. The sites Urwitałt and Pilchy lie on opposite banks of Lake Śniardwy, and Tałty is ~10 km north of Urwitałt. All three sites are on public ground managed by the Polish State Forests and no specific permission to access the land was required. The number of samples collected in each year at each site is given in Table 1.

Table 1. Number of samples analysed per gene, site and year.

As explained in the Methods, only samples with all SNPs successfully genotyped were included in the association analysis; these are presented in the bottom part of the table. For technical reasons, not all individuals were genotyped in all genes.

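| Gene | Pilchy 2005 | Tałty 2005 | Tałty 2016 | Urwitałt 2005 | Urwitałt 2016 | Total |
|---|---|---|---|---|---|---|
| Sequenced | | | | | | |
| TNF | 30 | 18 | 8 | 13 | 11 | 80 |
| LTα | 45 | 17 | 8 | 44 | 12 | 126 |
| IFNβ1 | 41 | 0 | 0 | 44 | 0 | 85 |
| After quality filtering | | | | | | |
| TNF | 23 | 18 | 7 | 8 | 11 | 67 |
| LTα | 38 | 17 | 8 | 40 | 11 | 114 |
| IFNβ1 | 41 | 0 | 0 | 44 | 0 | 85 |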

The field procedures followed the guidelines of the National Ethics Committee for Experimentation on Animals and were approved by the Local Ethical Committee no. 1 in Warsaw, decisions no. 280/2003 and 304/2012. Parasite screening followed protocols described in [18, 21]. In brief, voles were live-trapped in wooden traps and transported to the field station, where they were anesthetized with isoflurane and killed by cervical dislocation. GI helminths were identified in autopsies. Worm counts were taken as the parasite abundance. Infections with the intestinal protist Cryptosporidium sp. were identified in faecal smears using the Ziehl-Neelsen staining technique [33]. The blood pathogen Bartonella sp. was identified by PCR using degenerate primers described by [34], and for Babesia microti we used the protocol described by [35]. Details of the PCR reactions are given in S1 Table.

2.2. Sequencing and genotyping

DNA was extracted from vole ears using the Qiagen DNeasy Blood & Tissue Kit. To amplify partial sequences of TNF, we used primers designed by [12]; LTα and IFNβ1 were amplified using degenerate primers designed in our group (S1 Table). To minimize PCR errors, we used the high-fidelity polymerases Phusion or Q5 (New England Biolabs). The mix contained 10 µM dNTPs, 0.5 µM of each primer, 0.02 U/µl of the polymerase, and 10–50 ng of genomic DNA. PCR conditions specific for each locus are given in S1 Table. Amplified regions spanned most of the expressed exons and the respective separating introns (S2 Table).

Samples from 2005 and 2016 were processed separately following the same protocol. In both experiments, the amplicons were pooled for each individual as described previously [10] and purified twice using CleanUp kit (Aabiot). The libraries were constructed using Nextera XT DNA Library Preparation Kit, and sequenced using MiSeq Reagent Kit v3 in an Illumina MiSeq machine. The only difference between experiments was the number of cycles used for sequencing: samples from 2005 were sequenced using 2x75 paired-end kit, and samples from 2016 were processed using 2x150bp paired-end kit. Both runs gave high coverage exceeding 1000x per site and per sample.

Reads from both runs were processed in the same pipeline as described previously [10]. In brief, adapter-clipped reads were mapped using bwa-mem ver. 0.7.12 with default parameters [36] against a reference constructed from regions including the genes of interest extracted from the bank vole genome (PRJNA290429). Duplicated read pairs were removed using Picard MarkDuplicates (http://picard.sourceforge.net), and variants were called in a two-step procedure in FreeBayes v1.1.0-60 [37]. In the first round, potential variants were called using the following parameters: minimal fraction of alternate allele of 20% as recommended by [38], minimum number of reads supporting alternate allele > 2, and minimal read coverage > 5. The results were filtered using vcffilter v. 41 (https://github.com/ekg/vcflib) with conservative criteria, retaining only calls that met all of the following conditions: QUAL/AO > 10 (removes low-quality calls), DP > 10 (removes loci with low read depth), SAF > 0 & SAR > 0 (removes alleles present on only one strand), and RPR > 0 & RPL > 0 (removes alleles observed only by reads placed to the left or to the right). The resulting high-confidence variants were used to construct haplotypes in a second round of variant calling in FreeBayes, where physical position on a read was used for phasing by specifying --max-complex-gap 37. A fraction of SNPs (5 of 36 in TNF, and 6 of 18 in LTα) could not be phased by FreeBayes; these were computationally assigned to DNA strands using the PHASE algorithm [39]. Reconstructed alleles were converted to fasta format using vcfx [40, 41]. The reading frames and intron/exon structure were resolved through an alignment to the orthologous mouse sequences (TNF: Y00467.1, LTα: Y00137.1, IFNβ1: NP_034640).
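
The four retention criteria applied with vcffilter can be written as a single predicate. The following is a minimal sketch for illustration only, not the pipeline used in this study: it assumes that each call has already been parsed into a QUAL value and a dictionary of FreeBayes INFO fields for a single alternate allele, and the function name `passes_filters` and the example records are hypothetical.

```python
def passes_filters(qual, info):
    """Return True if a variant call meets all four retention criteria.

    qual -- the QUAL column of the VCF record
    info -- dict with the INFO fields AO, DP, SAF, SAR, RPR and RPL
            (assumed here to hold one value each, i.e. one alternate allele)
    """
    ao = info["AO"]                    # reads supporting the alternate allele
    if ao == 0:
        return False                   # no alternate observations at all
    return (
        qual / ao > 10                 # quality per alternate observation
        and info["DP"] > 10            # total read depth at the locus
        and info["SAF"] > 0 and info["SAR"] > 0   # seen on both strands
        and info["RPR"] > 0 and info["RPL"] > 0   # reads placed on both sides
    )


# Hypothetical records, for illustration only.
well_supported = {"AO": 40, "DP": 1200, "SAF": 18, "SAR": 22, "RPR": 21, "RPL": 19}
one_strand_only = dict(well_supported, SAR=0)

print(passes_filters(900.0, well_supported))   # all criteria met
print(passes_filters(900.0, one_strand_only))  # fails the strand criterion
print(passes_filters(300.0, well_supported))   # fails QUAL/AO > 10
```

Combining the conditions with a logical AND mirrors the intent of the text: a call is kept only when every criterion holds.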

Sequenced parts of TNF and LTα spanned ca. 800 bp and each contained 3 exons. The IFNβ1 gene is intronless. The number of polymorphic sites per locus was similar considering the length of the sequenced fragment (S2 Table). However, most variation in TNF was located in introns, whereas in exons we found only 2 SNPs forming 3 haplotypes. After filtering out variants that were in linkage disequilibrium, not in Hardy-Weinberg equilibrium, with minor allele frequency <5%, or present in fewer than 5 hosts, of the initial 18 SNPs in TNF, 17 in LTα, and 8 in IFNβ1, we retained one SNP in TNF, 7 SNPs in LTα, and two in IFNβ1.

2.3. Associations between genetic variants and parasite load

The sample size was strongly limited by ethical considerations. Unable to sample more animals to test for the effect of rare alleles, we filtered the data to remove variants observed too rarely to produce robust results [42]. First, we removed variants and individuals with missing SNPs, keeping only those with a genotyping rate of 100%. Next, SNPs were filtered in PLINK v1.90 [43] so that only those in Hardy-Weinberg equilibrium (deviation threshold of p<0.001) and not in strong linkage disequilibrium with another retained SNP (pruning threshold of r>0.7) were kept. (We tested LD thresholds from r>0.6 to r>0.9, and all values resulted in the same set of SNPs.) This ensured that the genetic variants fitted in the models could be treated as independent explanatory variables.
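
To illustrate what the Hardy-Weinberg filter checks, the sketch below compares observed genotype counts at a biallelic SNP with the counts expected from the allele frequencies. It is a simplified stand-in rather than the procedure run in PLINK, which reports an exact test; here a chi-square goodness-of-fit test with one degree of freedom is used instead, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb -- observed counts of the three genotypes at one SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)    # frequency of allele A
    q = 1.0 - p                        # frequency of allele B
    if p == 0.0 or q == 0.0:
        return 1.0                     # monomorphic site, nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical genotype counts, for illustration only.
in_equilibrium = hwe_chi2_pvalue(25, 50, 25)     # matches expected proportions
no_heterozygotes = hwe_chi2_pvalue(50, 0, 50)    # strong heterozygote deficit

for label, p_value in [("balanced", in_equilibrium), ("deficit", no_heterozygotes)]:
    print(label, "retain" if p_value >= 0.001 else "remove")
```

With the p<0.001 threshold named in the text, only markers showing a strong departure from the expected genotype proportions are removed.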

In a second step, we excluded from the models variants with minor allele frequency (MAF) <5% and those present in fewer than 5 animals (see S3 Table), as we lacked statistical power to test for their effects. The number of samples before and after filtering is given in Table 1. Although new beneficial mutations may initially be present in one or few individuals, in practice it is impossible to test for the effect of alleles of MAF <5% when the relative risk of disease is lower than 1.5 [44]. Thus, due to computational limitations, our analysis omitted the effects of rare alleles.
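
The two frequency criteria can be sketched as follows. This is an illustrative reading of the filter, not the code used in this study: it assumes biallelic SNPs with genotypes written as two-letter strings, interprets "present in fewer than 5 animals" as the number of hosts carrying the minor allele, and uses hypothetical function names and data.

```python
from collections import Counter


def minor_allele_summary(genotypes):
    """Return (minor allele frequency, number of hosts carrying the minor allele).

    genotypes -- list of two-letter strings such as "AA", "AG" or "GG".
    """
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) < 2:
        return 0.0, 0                  # monomorphic: no minor allele
    minor = min(allele_counts, key=allele_counts.get)
    maf = allele_counts[minor] / sum(allele_counts.values())
    carriers = sum(1 for g in genotypes if minor in g)
    return maf, carriers


def keep_snp(genotypes, min_maf=0.05, min_carriers=5):
    """Apply both thresholds named in the text."""
    maf, carriers = minor_allele_summary(genotypes)
    return maf >= min_maf and carriers >= min_carriers


# Hypothetical genotype vectors for 100 hosts each.
common_variant = ["AA"] * 90 + ["AG"] * 8 + ["GG"] * 2   # MAF 12/200, 10 carriers
rare_variant = ["AA"] * 97 + ["AG"] * 3                  # MAF 3/200, 3 carriers

print(minor_allele_summary(common_variant), keep_snp(common_variant))
print(minor_allele_summary(rare_variant), keep_snp(rare_variant))
```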

Among the parasites found in voles, there were five nematode species, one cestode, one intestinal protist and two blood parasites. The prevalence varied from 5 to 60% but due to limitations of generalized linear models, we ran only models where prevalence was 20–80% for a given parasite/gene combination (S4 Table, S1 Fig). In total, we constructed 23 models, 16 with infection presence/absence as the response variable, and 7 with parasite abundance as the response variable. We tested for the effect of TNF variants on prevalence and abundance of A. tianjinensis and H. mixtum and on prevalence of Cryptosporidium, Babesia microti and Bartonella sp.; of LTα variants on prevalence and abundance of A. tianjinensis and H. mixtum and on prevalence of Cryptosporidium, Babesia microti and Bartonella sp.; and of IFNβ1 variants on prevalence and abundance of A. tianjinensis, H. mixtum and H. glareoli and on prevalence of Cryptosporidium, Babesia microti and Bartonella sp.
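
The model counts follow directly from the gene and parasite combinations listed above, as the short enumeration below shows. The data structure and variable names are chosen for illustration; only the combinations themselves come from the text.

```python
# Nematodes tested per gene (prevalence and abundance models).
nematodes = {
    "TNF": ["A. tianjinensis", "H. mixtum"],
    "LTα": ["A. tianjinensis", "H. mixtum"],
    "IFNβ1": ["A. tianjinensis", "H. mixtum", "H. glareoli"],
}
# Microparasites tested for every gene (prevalence models only).
microparasites = ["Cryptosporidium", "Babesia microti", "Bartonella sp."]

models = []
for gene, worms in nematodes.items():
    for parasite in worms + microparasites:
        models.append((gene, parasite, "presence/absence"))
    for parasite in worms:
        models.append((gene, parasite, "abundance"))

n_presence = sum(1 for m in models if m[2] == "presence/absence")
n_abundance = sum(1 for m in models if m[2] == "abundance")
print(n_presence, n_abundance, len(models))
```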

We adopted a two-step procedure used in several association studies of free-living mammals, where a minimal model with non-genetic terms is fitted first, and then the effects of genetic terms are tested, e.g. [16, 45]. Such a procedure prevents overfitting the model. The non-genetic terms were: year and site of sampling, host sex, and host age approximated by its body mass (S5 Table). If p<0.1, these terms were included in the models as co-factors. As response variables, we used either i) parasite presence/absence, or ii) parasite abundance (number of parasites of a given species per host). Parasite presence/absence was modelled using a binomial distribution with a logit link function, and abundance was fitted using a Poisson distribution and a log link function in R [46]. To control for overdispersion, in abundance models we used quasi-Poisson errors and in presence models quasi-binomial errors implemented in the glm function in the R library {stats}. The effect sizes were estimated using partial R2 implemented in the R package {rsq}. The significance of terms was determined using LR type III tests.
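
Why quasi-Poisson errors are useful for worm counts can be seen from the dispersion statistic, the Pearson chi-square divided by the residual degrees of freedom. The sketch below computes it for the simplest case, an intercept-only Poisson model in which the fitted mean equals the sample mean; it is an illustration in Python rather than the R code used here, and the worm counts are hypothetical.

```python
def pearson_dispersion(counts):
    """Pearson dispersion for an intercept-only Poisson model.

    The fitted value for every host is the sample mean, so the statistic is
    sum((y - mean)^2 / mean) divided by the residual degrees of freedom (n - 1).
    """
    n = len(counts)
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    pearson_chi2 = sum((y - mean) ** 2 / mean for y in counts)
    return pearson_chi2 / (n - 1)


# Hypothetical worm counts: most hosts uninfected, one heavily infected.
worm_counts = [0, 0, 0, 1, 0, 2, 0, 0, 12, 0, 3, 0]
dispersion = pearson_dispersion(worm_counts)
print(round(dispersion, 2), "overdispersed" if dispersion > 1 else "not overdispersed")
```

A value well above 1 indicates more variance than a Poisson model allows, which is typical of aggregated parasite distributions; quasi-likelihood models scale the standard errors by this quantity.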

After fitting the minimal non-genetic models, we tested for genetic effects. The models were constructed as above, with parasite presence/absence or abundance as response variables. The explanatory variables were the significant non-genetic terms and the genotypes at each SNP retained after filtering. The number of successfully genotyped individuals differed between the studied genes, so we constructed separate sets of models for each gene. To control for multiple comparisons we applied the conservative Bonferroni correction. In total, we tested for the effect of 10 SNPs retained after filtering (one in TNF, seven in LTα, and two in IFNβ1), so as a significance threshold corresponding to α = 0.05 we adopted α = 0.05/10 = 0.005.
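
The correction amounts to simple arithmetic, sketched below together with the conversion of a likelihood-ratio χ2 statistic into a p-value for 1 or 2 degrees of freedom. The two statistics are values of the kind listed in Table 2 and serve only as an illustration; the plain chi-square conversion ignores the dispersion scaling of the quasi-likelihood models, and the function name `chi2_sf` is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 and df = 2 are handled in this sketch")


n_tests = 1 + 7 + 2            # SNPs retained in TNF, LTα and IFNβ1
alpha = 0.05
threshold = alpha / n_tests    # Bonferroni-corrected significance level

# Two likelihood-ratio statistics with 2 degrees of freedom.
for chi2 in (12.480, 7.636):
    p_value = chi2_sf(chi2, df=2)
    verdict = "significant" if p_value < threshold else "not significant"
    print(f"chi2 = {chi2}: p = {p_value:.5f}, {verdict} after correction")
```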

2.4. Tests of selection

To identify potential targets of selection in exonic regions, we computed phylogenetically controlled codon-based tests which are suitable for identifying sites under selection using sets of sequences from a single species [47]. For the calculations, we used DataMonkey server [48]. Prior to analysis, we tested for possible recombinations using GARD, Genetic Algorithm Recombination Detection [49]. We computed three models based on dN/dS ratio: MEME (Mixed Effects Model of Evolution), which employs mixed-effects maximum likelihood to infer synonymous and nonsynonymous substitution rates and detect episodic positive selection under a proportion of branches [50]; FUBAR (Fast Unconstrained Bayesian Approximation), which estimates the dN/dS ratio using a Bayesian approach to detect sites under pervasive diversifying selection [51]; and FEL (Fixed Effects Likelihood), which uses a maximum-likelihood approach to detect pervasive diversifying selection using corresponding phylogeny [47]. Contrary to the other two tests, FEL assumes constant selection pressure across phylogeny.
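
All three tests build on the distinction between synonymous and nonsynonymous substitutions that underlies the dN/dS ratio. The sketch below shows only this classification step under the standard genetic code; it does not reproduce MEME, FUBAR or FEL, and the codons used are hypothetical examples rather than sequences from the studied genes.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change within a codon.

    codon    -- three-letter DNA codon, e.g. "AAT"
    position -- 0, 1 or 2, the site within the codon that changes
    new_base -- the base found in the alternative allele
    Returns (kind, amino acid before, amino acid after).
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "nonsynonymous"
    return kind, before, after


# Hypothetical examples: an A-to-G change at the second codon position turns
# an asparagine codon into a serine codon; a third-position change does not
# alter the amino acid.
print(classify_substitution("AAT", 1, "G"))
print(classify_substitution("AAT", 2, "C"))
```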

3. Results

3.1. Cytokine polymorphisms and susceptibility to infections

After filtering the variants, we included in the models one SNP in the TNF gene, 7 SNPs in LTα and two in IFNβ1. Contrary to expectations, in none of the genes did we find an association between genetic variants and susceptibility to infections with microparasites: the intestinal protozoan Cryptosporidium, the bacterium Bartonella sp., and the apicomplexan blood parasite Babesia microti.

Moreover, there were no significant associations between TNF variants and susceptibility to helminths (S6 Table).

None of the LTα variants affected parasite prevalence, but after correcting for multiple comparisons the variant LTα 525 was significantly associated with abundance of the nematode Heligmosomum mixtum. The SNP LTα 525 was located in an intron, and TT homozygotes were infected by more worms (3.03 on average) than individuals with genotypes GG and TG (1.25 and 0.94, respectively; Fig 1).

Fig 1. Effect of SNP genotypes in LTα and IFNβ1 on the risk of infection and parasite abundance.

Infection intensity is expressed as the number of worms per host; the bottom and top of a box are the 25th and 75th percentiles, and the outliers are points more than 1.5 times the interquartile range above the third quartile.

SNPs in IFNβ1 affected the risk of infection and parasite load with the nematode H. glareoli. Voles heterozygous at IFNβ1 105 were less frequently infected with H. glareoli (11.7%) than homozygotes TT (57.14%) and CC (18.9%; Fig 1, Table 2), and they also harboured fewer H. glareoli worms (0.411 on average, compared to 1.78 and 1.19 in the respective homozygotes). At IFNβ1 127, homozygotes AA were most often infected (38%), compared to 18.2% of individuals with the genotype AG and 16.1% with GG. IFNβ1 105 was a synonymous substitution, and IFNβ1 127 was a non-synonymous substitution (Asn→Ser).

Table 2. Summary of GLM models showing significant effect of SNP genotype at given locus on the parasite load.

Table presents summary of GLM models including non-genetic terms as indicated in S4 Table. All models are given in S6 Table. R2 is partial coefficient of determination (effect size), χ2 and p-values are based on LR type III test. To control for multiple comparisons when testing for the effect of several genetic variants, we used conservative Bonferroni correction. For 10 genetic terms (SNPs) tested, the critical p-level corresponding to α = 0.05 is 0.005. p-values of genetic terms significant after correction are given in bold.

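| Gene | Nematode | Variable | R2 | χ2 | df | p |
|---|---|---|---|---|---|---|
| Presence / absence | | | | | | |
| IFNβ1 | H. glareoli | IFNβ1 105 | 0.267 | 30.025 | 2 | **3.21 × 10⁻⁷** |
| | | IFNβ1 127 | 0.077 | 7.636 | 2 | 0.022 |
| | | site | 0.492 | 91.094 | 1 | <0.001 |
| | | host body mass | 0.084 | 7.149 | 1 | 0.007 |
| | | host sex | 0.191 | 17.498 | 1 | <0.001 |
| Abundance | | | | | | |
| IFNβ1 | H. glareoli | IFNβ1 105 | 0.136 | 12.986 | 2 | **0.00151** |
| | | IFNβ1 127 | 0.172 | 12.546 | 2 | **0.00189** |
| | | site | 0.180 | 42.039 | 1 | <0.001 |
| | | host sex | 0.127 | 10.558 | 1 | 0.001 |
| LTα | H. mixtum | LTα 322 | 0.001 | 0.051 | 1 | 0.822 |
| | | LTα 347 | -0.002 | 2.039 | 1 | 0.153 |
| | | LTα 371 | 0.000 | 0.000 | 1 | 1.000 |
| | | LTα 389 | 0.000 | 0.000 | 1 | 1.000 |
| | | LTα 411 | 0.001 | 0.334 | 1 | 0.563 |
| | | LTα 488 | 0.009 | 0.287 | 1 | 0.592 |
| | | LTα 525 | 0.046 | 12.480 | 2 | **0.00195** |
| | | site | 0.124 | 30.689 | 2 | <0.001 |
| | | host body mass | 0.023 | 2.779 | 1 | 0.095 |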

3.2. Signatures of selection

No signal of recombination was detected in any of the studied genes using the GARD method. MEME did not detect any codon, thus we found no evidence for episodic selection. FEL and FUBAR results were generally consistent across loci (S7 Table, Fig 2). In the TNF gene, two codons located in exon 4 bore signatures of purifying selection, as indicated by FUBAR only. FUBAR and FEL detected two sites under purifying selection in LTα. Two codons in this gene were under positive selection according to FUBAR; one of those codons comprised SNP 371, located near the 3' end of the third exon. In IFNβ1, the positively selected codon 39 comprised the SNP IFNβ1 127, which affected intensity of infection with the nematode H. glareoli. Two codons in TNF and one in IFNβ1 displayed signatures of negative selection. The negatively selected codon 31 in IFNβ1 comprised the variant IFNβ1 105, which was associated with prevalence and abundance of H. glareoli.

Fig 2. Schematic position of the sites under selection (marked as triangles) in the studied genes in relation to exon-intron structure.

Positions of SNPs included in the GLM models are marked with bars and numbers, and in red we marked positions of SNPs that were significantly associated with parasite load. In bank vole sequences the numbers show distance in base pairs from the first transcribed nucleotide in mouse CDS. Note that the length of introns differed between mouse and vole.

4. Discussion

4.1. Associations between cytokine variants and parasite load

In the present study we found associations between SNPs in the cytokines LTα and IFNβ1 and susceptibility to helminth infections but not to microparasites (bacteria and protists). The intronic variant LTα 525 affected abundance of the nematode Heligmosomum mixtum. IFNβ1 105 affected prevalence and abundance of H. glareoli, and IFNβ1 127 affected risk of infection with this parasite.

Despite numerous examples showing links between SNP variation in cytokines and a variety of diseases and conditions (reviewed in [30, 52]), the role of these cell-signalling proteins in resistance against infections in free-living hosts has rarely been studied. In the present paper, we show that variation in the genes coding for two cytokines, lymphotoxin alpha (LTα) and interferon beta (IFNβ1), affects susceptibility to nematode infections in bank voles. To our knowledge, this is the first report showing that they play a role in resistance against infections in free-living species. We hypothesise that the mechanism may be linked to the immunomodulatory effect of nematodes altering production of inflammatory cytokines, yet this hypothesis needs further study.

Among the three frequent bank vole nematodes analysed here, H. glareoli lives in close contact with the intestinal wall, likely feeding on mucus and blood, while the larger H. mixtum dwells in the intestine lumen [53]. These two parasite species occur in voles in relatively low numbers, rarely exceeding 10–15 worms. In contrast, infections with A. tianjinensis occupying the caecum are often abundant, reaching dozens or hundreds of worms. Such a high parasite burden may result in pathological changes in the gut [54] and lead to depauperate gut microbiota diversity [25], which can have a negative impact on host fitness. In our previous work with the same study system, we found associations between infections with A. tianjinensis and genotype in MHC-DRB [21]. This effect was site-dependent: the same variant had the opposite effect at different sites. In our previous studies, we focused on haplotypes rather than SNP variants.

The signatures of positive selection and lower parasite load in LTα homozygotes reported in the current study suggest the overdominance of one allele which forms a putatively beneficial genotype. Due to the versatile role of LTα in the immune response, it is difficult to provide a functional explanation for the observed polymorphisms. Yet, LTα is crucial for development of Peyer's patches, groupings of lymphoid follicles in the mucous membrane lining the small intestine. In mice infected with the nematode Heligmosomoides polygyrus, LTα signalling was essential for the generation of T cells triggering interleukin Il-4 production [55]. IFNβ, on the other hand, is generally linked to antiviral resistance but its activation is mediated by LTα [14], which may explain its links to resistance against gastro-intestinal infections reported in the current paper. Contrary to expectations, we did not find an effect of variation within the TNF gene on susceptibility to infections with either nematodes or microparasites. TNF triggers an inflammatory, Th-1 type immune response that is usually suppressed by nematode infections [32], which may explain our results.

It is important to note that association studies of wild systems, such as the one presented here, have some caveats. First of all, multiple comparisons may produce false positives. As wild systems are far from randomized trials, it is difficult to control for underlying population structure or standing genetic variation, even though we incorporated factors such as sampling site into our association models. To strengthen our results, in S8 Table we present models with all non-genetic terms fitted (rather than only those that significantly affect parasite load), and it is clear that these do not affect the links between genetic variants and susceptibility to parasitic infection. Nonetheless, the candidate gene approach presented here is just a first step in showing the role of genetic variation within immunity genes in pathogen resistance; to better understand this connection, further studies should functionally verify the effect of the predicted SNPs.

4.2. Signatures of selection

Most studies of selection acting on immunity genes in wild mammals have focused on proteins presenting motifs derived from pathogens, such as MHC or TLR, where nucleotide composition was directly attributed to functional variation. In cytokines, identifying potential targets for selection is even more difficult. Some authors have suggested that molecules of such a function are primarily affected by purifying selection [11], but several studies in free-living mammals have reported otherwise [12, 16].

We found signatures of positive selection in two variants: LTα 371 and IFNβ1 127, which suggests evolutionary pressure favouring these sites. On the other hand, the variant IFNβ1 105 displayed signatures of purifying selection, and individuals with the genotype TT were more susceptible to the nematode H. glareoli. This may be explained by a significant effect of these variants on parasite load, which is consistent with the hypothesis of a pathogen-mediated model of evolution through frequency-dependent selection. It is important to note that signatures of selection do not always imply that the selected site is of functional importance, and some variation at the sequence level may not be ultimately adaptive [56]. The inference about functional importance of the sites under selection may be strengthened by analysing their associations with parasite susceptibility or resistance [57]. Detecting such a link suggests contemporary parasite-driven selection operating at those loci. In our study, both codons displaying signatures of selection in IFNβ1 comprised SNPs that were significantly associated with parasite load, which allows us to attribute the signatures of selection to the evolutionary pressure from parasites. However, we are aware that this is correlative evidence which may have arisen from an effect of SNPs located in a physically linked region. To strengthen our hypothesis, candidate SNPs should be verified by functional in vitro testing.

4.3. Role of non-coding and synonymous variance

Variant IFNβ1 105, affecting parasite load, is synonymous. This result may seem confusing, as synonymous substitutions do not affect the amino-acid composition of a protein. However, site-specific signals of selection in this locus strongly suggest that non-neutral pressure is exerted on these sites. The role of non-coding variants may be more significant than previously thought; for instance, the list of human diseases associated with synonymous mutations is expanding [58]. In a large meta-analysis of human GWAS, [59] reported that synonymous SNPs were as often involved in disease mechanisms as non-synonymous SNPs, and they were not in linkage disequilibrium with causal non-synonymous SNPs. Several mechanisms may explain the role of synonymous mutations. They may affect mRNA splicing and the stability of transcripts [60]. Although coding for the same amino-acid, some variants may be preferred during elongation, promoting co-evolution to optimise translation efficiency [61]. On the other hand, since synonymous mutations are assumed to be evolutionarily silent, their effects might have been under-reported [58, 59]. This further underlines the importance of studies on synonymous SNPs in non-model species for a better understanding of processes maintaining genetic diversity within the immune system in the wild.

Another non-coding SNP that significantly affected parasite load was the intronic variant LTα 525. Notably, it was not linked to any exonic variant, and it was in strong LD only with another intronic SNP. Again, human association studies have confirmed the role of intronic polymorphisms in susceptibility to diseases [62, 63], particularly when mutations are located close to intron-exon junctions or within a branchpoint sequence [64]. Intronic variants may also affect expression through alternative splicing or interactions with regulatory elements [63]. The intronic variant LTα 525, which affected susceptibility to infection with H. mixtum, was located only 18 bp from the intron-exon junction. Introns in the LTα gene have been shown to affect expression in several in vitro and in vivo studies (reviewed in [65]), and intronic SNPs can still affect splicing or expression even if separated by over 30 bp from any splice site [63, 66].

Studies of the effect of intronic variants on resistance against infections in the wild are rare, but in humans, positively selected SNPs associated with susceptibility to Lassa virus have been found in the interleukin IL21 gene outside the open reading frame [67]. The authors suggested that those variants may lead to regulatory changes such as differential gene expression. Intronic variants may also affect cytokine interactions with other components of the immune system. For instance, an intronic variant in the human IFNγ gene coincides with a putative NF-κB binding site, which might have functional consequences for the transcription of the gene [68]. In bank voles, a SNP located within the promoter of the TNF gene affected susceptibility to Puumala virus (PUUV) [17]. Unfortunately, we could not confirm this pattern, as the fragment amplified in the current study did not span the 5’ upstream region.

5. Conclusion

In the present paper we examined parasite-driven selection in cytokines, secreted molecules involved in signal transduction. We identified SNPs affecting the load of intestinal nematodes, and we showed that codons comprising those SNPs display signatures of selection. Importantly, among those variants we found one located in an intron and another representing a synonymous substitution. Such variants have commonly been disregarded in studies of the genetic background of host-parasite co-evolution, yet our results suggest that they may play a role in parasite resistance in wild systems. We propose that non-coding variants should not be automatically considered non-functional, and that only by including them in association studies will we be able to understand the genetic background of parasite resistance in the wild.

Supporting information

S1 Table. PCR conditions and sequences of the primers used in the current study.

The primers included degenerate sites (marked in bold).

(PDF)

S2 Table. Characteristics of the studied amplicons and summary of polymorphisms within the studied genes.

The numbers of the corresponding exons and introns in the mouse are given in parentheses. For TNF and LTα, the values after the slash summarise polymorphism in the exonic parts only.

(PDF)

S3 Table. Number of voles with given genotypes after filtering.

We removed missing calls, variants with MAF < 0.05, variants not in Hardy-Weinberg equilibrium (threshold p < 0.001), and variants in linkage disequilibrium (r > 0.7).

(PDF)
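The variant filters described for S3 Table can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: it takes hypothetical genotype counts for one biallelic SNP, computes the minor allele frequency (MAF), and tests Hardy-Weinberg equilibrium with a chi-square approximation (1 degree of freedom), whereas the original analysis may have used a different (e.g. exact) test. The function names and default thresholds merely mirror the values quoted above, and the linkage-disequilibrium filter is omitted.

```python
from math import erfc, sqrt


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, from counts of the three genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit p-value (1 df) against Hardy-Weinberg
    proportions. This is an approximation, assumed here for illustration."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic site: no deviation can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))


def passes_filters(n_aa, n_ab, n_bb, maf_min=0.05, hwe_alpha=0.001):
    """Keep a variant only if it is common enough and consistent with HWE."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
            and hwe_p_value(n_aa, n_ab, n_bb) >= hwe_alpha)
```

For example, hypothetical counts of 25, 50 and 25 individuals match Hardy-Weinberg proportions exactly and pass both filters, whereas 50, 0 and 50 (no heterozygotes) fail the equilibrium test.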

S4 Table. Prevalence of infections among bank voles.

The number of infected animals differs between the studied genes because not all individuals were genotyped at all three loci. non-inf: number of non-infected hosts; inf: number of infected hosts; %: percentage of hosts infected.

(PDF)

S5 Table. Summary of the effect of non-genetic terms on parasite load.

Terms with p < 0.1 (marked in bold) were included in GLM models testing for the effect of genetic variants (S6 Table). The models were run separately on three datasets, each including voles genotyped at a given locus (not all animals were genotyped at all loci).

(PDF)

S6 Table. Effect of cytokine genetic variants on parasite load.

As response variables we used only pathogens that infected 20–80% of hosts (S4 Table). β is the parameter estimate for each contrast, R² is the partial coefficient of determination (effect size), and χ² and p-values are based on the LR type III test. To control for multiple comparisons when testing for the effect of several genetic variants, we used the conservative Bonferroni correction; for 10 genetic terms (SNPs) tested, the critical p-level corresponding to α = 0.05 was 0.005. Exact p-values of genetic terms significant after correction are given in bold.

(PDF)
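The Bonferroni threshold quoted for S6 Table follows from dividing the family-wise α by the number of tests: 0.05 / 10 = 0.005. A minimal sketch of that arithmetic is given below; the function names and the list of p-values are hypothetical and are not results from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test critical p-level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Ten hypothetical p-values, one per genetic term (illustrative only).
example_p_values = [0.001, 0.004, 0.006, 0.03, 0.2, 0.5, 0.8, 0.049, 0.0049, 0.9]
flags = significant_after_bonferroni(example_p_values)
```

With ten tests, a nominal p-value of 0.03 or 0.049 would not count as significant, which is why the correction is described as conservative.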

S7 Table. Codons under selection.

Only exonic parts of the studied genes are analysed. Codons were numbered starting from the first genotyped nucleotide, not from the first transcribed nucleotide. Names of the corresponding SNPs, as used in the current paper, are given in brackets. Codons that comprised SNPs significantly associated with the parasite load are given in bold.

(PDF)

S8 Table. Effect of cytokine genetic variants on parasite load with all non-genetic terms fitted.

β is the parameter estimate for each contrast, R² is the partial coefficient of determination (effect size), and χ² and p-values are based on the LR type III test. To control for multiple comparisons when testing for the effect of several genetic variants, we used the conservative Bonferroni correction; for 10 genetic terms (SNPs) tested, the critical p-level corresponding to α = 0.05 was 0.005. Exact p-values of genetic terms significant after correction are given in bold.

(PDF)

S1 Fig

Parasite load by parasite species in individuals genotyped in a) TNF, b) LTα, and c) IFNβ1. Dark bars represent infected animals, light bars non-infected animals.

(PDF)

Acknowledgments

We thank D.R. Laetsch and M.A. Wenzel for their valuable hints on the bioinformatic pipeline and data analysis. We are thankful to W. Babik who provided access to an Illumina MiSeq platform, and to K. Dudek who prepared the Nextera library.

Data Availability

The raw reads from Illumina sequencing are available from the SRA archive under accession PRJNA395763. The minimal data set underlying the results, containing individual genotypes and data on individual parasite load, body mass, sex, and sampling details, is stored in the Open Science Framework repository https://doi.org/10.17605/OSF.IO/QKFUM.

Funding Statement

The work was supported by grant no. 2012/07/B/NZ8/00058 from the Polish National Science Centre to AK. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Radwan J, Babik W, Kaufman J, Lenz TL, Winternitz J. 2020. Advances in the Evolutionary Understanding of MHC Polymorphism. Trends Genet, 36: 298–311. doi: 10.1016/j.tig.2020.01.008
  • 2. Penn DJ, Damjanovich K, Potts WK (2002) MHC heterozygosity confers a selective advantage against multiple-strain infections. Proc Nat Acad Sci USA 99, 11260–11264. doi: 10.1073/pnas.162006499
  • 3. Borghans JAM, Beltman JB, De Boer RJ (2004) MHC polymorphism under host–pathogen coevolution. Immunogenet 55: 732–739. doi: 10.1007/s00251-003-0630-5
  • 4. Hedrick PW (2002) Pathogen resistance and genetic variation at MHC loci. Evolution 56:1902–1908. doi: 10.1111/j.0014-3820.2002.tb00116.x
  • 5. Loiseau C, Zoorob R, Garnier S, Birard J, Federici P, Julliard R, et al. (2008) Antagonistic effects of a MHC class I allele on malaria–infected house sparrows. Ecol Lett 11:258–265. doi: 10.1111/j.1461-0248.2007.01141.x
  • 6. Jepson A, Banya W, Sisay-Joof F, Hassan-King M, Nunes C, Bennett S, et al. (1997). Quantification of the relative contribution of major histocompatibility complex (MHC) and non-MHC genes to human immune responses to foreign antigens. Infect Immun 65: 872–876. doi: 10.1128/IAI.65.3.872-876.1997
  • 7. Acevedo-Whitehouse K, Cunningham AA. (2006) Is MHC enough for understanding wildlife immunogenetics? Trends Ecol Evol. 21(8): 433–8. doi: 10.1016/j.tree.2006.05.010
  • 8. Fornůsková A, Vinkler M, Pagès M, Galan M, Jousselin E, Cerqueira F, Morand S, Charbonnel N, Bryja J, Cosson JF (2013). Contrasted evolutionary histories of two Toll-like receptors Tlr4 and Tlr7 in wild rodents Murinae. BMC Evol Biol, 13: 194. doi: 10.1186/1471-2148-13-194
  • 9. Babik W, Dudek K, Fijarczyk A, Pabijan M, Stuglik M, Szkotak R, et al. (2015). Constraint and Adaptation in newt Toll-Like Receptor Genes. Genome Biol Evol, 7: 81–95.
  • 10. Kloch A, Wenzel MA, Laetsch DR, Michalski O, Bajer A, Behnke J, et al. (2018). Signatures of balancing selection in toll-like receptor (TLR) genes–novel insights from a free-living rodent. Sci Rep 8: 8361.
  • 11. Chapman JR, Hellgren O, Helin AS, Kraus RH, Cromie RL, Waldenström J. (2016). The Evolution of Innate Immune Genes: Purifying and Balancing Selection on β-Defensins in Waterfowl. Mol Biol Evol, 33: 3075–3087.
  • 12. Turner AK, Begon M, Jackson JA, Paterson S. (2012). Evidence for selection at cytokine loci in a natural population of field voles (Microtus agrestis). Mol Ecol 21:1632–1646.
  • 13. Ferrer-Admetlla A, Bosch E, Sikora M, Marquès-Bonet T, Ramírez-Soriano A, Muntasell A, et al. (2008). Balancing Selection Is the Main Force Shaping the Evolution of Innate Immunity Genes. J Immunol, 181: 1315–1322. doi: 10.4049/jimmunol.181.2.1315
  • 14. Ware CF. (2005). Network communications: lymphotoxins, LIGHT, and TNF. Annu Rev Immunol 23: 787–819. doi: 10.1146/annurev.immunol.23.021704.115719
  • 15. Barbier M, Delahaye NF, Fumoux F, Rihet P. (2008) Family-based association of a low producing lymphotoxin-alpha allele with reduced Plasmodium falciparum parasitemia. Microbes Infect 10: 673–679. doi: 10.1016/j.micinf.2008.03.001
  • 16. Turner AK, Begon M, Jackson JA, Bradley JE, Paterson S (2011) Genetic Diversity in Cytokines Associated with Immune Variation and Resistance to Multiple Pathogens in a Natural Rodent Population. PLoS Genet 7: e1002343. doi: 10.1371/journal.pgen.1002343
  • 17. Guivier E, Galan M, Salvador AR, Xuéreb A, Chaval Y, Olsson GE, et al. (2010). Tnf-α expression and promoter sequences reflect the balance of tolerance/resistance to Puumala hantavirus infection in European bank vole populations. Infect Genet Evol 10: 1208–1217.
  • 18. Bajer A, Behnke JM, Pawełczyk A, Kuliś K, Sereda MJ, Siński E (2005). Medium-term temporal stability of the helminth component community structure in bank voles (Clethrionomys glareolus) from the Mazury Lake District region of Poland. Parasitology 130: 213–228.
  • 19. Behnke JM, Bajer A, Harris PD, Newington L, Pidgeon E, Rowlands G, et al. (2008a). Temporal and between-site variation in helminth communities of bank voles (Myodes glareolus) from N.E. Poland. 1. Regional fauna and component community levels. Parasitology 135: 985–97.
  • 20. Behnke JM, Bajer A, Harris PD, Newington L, Pidgeon E, Rowlands G, et al. (2008b). Temporal and between-site variation in helminth communities of bank voles (Myodes glareolus) from N.E. Poland. 2. The infracommunity level. Parasitology 135: 999–1018.
  • 21. Kloch A, Babik W, Bajer A, Siński E, Radwan J. (2010). Effects of an MHC-DRB genotype and allele number on the load of gut parasites in the bank vole Myodes glareolus. Mol Ecol 19 Suppl 1:255–265. doi: 10.1111/j.1365-294X.2009.04476.x
  • 22. Welc-Falęciak R, Paziewska A, Bajer A, Behnke JM, Siński E. (2008). Bartonella spp. infection in rodents from different habitats in the Mazury Lake District, Northeast Poland. Vector Borne Zoonotic Dis 8:467–474. doi: 10.1089/vbz.2007.0217
  • 23. Kloch A, Baran K, Buczek M, Konarzewski M, Radwan J. (2012) MHC influences infection with parasites and winter survival in the root vole Microtus oeconomus. Evol Ecol 27: 635–653.
  • 24. Porcherie A (2005) Susceptibilité aux parasites des hybrides entre Mus musculus musculus et Mus musculus domesticus: origine du phénomène et rôle dans la contre-sélection des hybrides. Thesis of the University of Montpellier II, Montpellier.
  • 25. Guiver E, Galan M, Lippens C, Bellenger J, Faivre B, Sorci G. 2022. Increasing helminth infection burden depauperates the diversity of the gut microbiota and alters its composition in mice. Current Research in Parasitology & Vector-Borne Diseases 2: 100082.
  • 26. Murphy K, Travers P, Walport M, Janeway C. (2012). Janeway’s immunobiology. 8th ed. New York: Garland Science.
  • 27. Iizuka K, Chaplin DD, Wang Y, Wu Q, Pegg LE, Yokoyama WM, et al. (1999). Requirement for membrane lymphotoxin in natural killer cell development. Proc Natl Acad Sci USA 96: 6336–6340. doi: 10.1073/pnas.96.11.6336
  • 28. De Togni P, Goellner J, Ruddle NH, Streeter PR, Fick A, Mariathasan S, et al. (1994). Abnormal development of peripheral lymphoid organs in mice deficient in lymphotoxin. Science 264:703–707. doi: 10.1126/science.8171322
  • 29. Fu YX, Chaplin DD (1999) Development and maturation of secondary lymphoid tissues. Annu. Rev. Immunol. 17, 399–433. doi: 10.1146/annurev.immunol.17.1.399
  • 30. Qidwai T, Khan F. (2011). Tumour Necrosis Factor Gene Polymorphism and Disease Prevalence. Scand J Immunol 74: 522–547. doi: 10.1111/j.1365-3083.2011.02602.x
  • 31. Nagarajan UM (2011) Induction and Function of IFNβ During Viral and Bacterial Infection. Crit Rev Immunol. 31: 459–474.
  • 32. Lamb T. (Ed.) (2012) Immunity to parasitic infections. Wiley-Blackwell.
  • 33. Henricksen S, Pohlenz J (1981) Staining of cryptosporidia by modified Ziehl–Nielsen technique. Acta Vet Scand 22: 594–596.
  • 34. Norman AF, Regnery R, Jameson P, Greene C, Krause DC (1995). Differentiation of Bartonella-like isolates at the species level by PCR-restriction fragment length polymorphism in the citrate synthase gene. J Clin Microbiol 33: 1797–1803. doi: 10.1128/jcm.33.7.1797-1803.1995
  • 35. Persing DH, Mathiesen D, Marshall WF, Telford SR, Spielman A, Thomford JW, et al. (1992). Detection of Babesia microti by polymerase chain reaction. J Clin Microbiol, 30: 2097–2103.
  • 36. Li H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.
  • 37. Garrison E, Marth G. (2012). Haplotype-based variant detection from short-read sequencing. preprint arXiv:1207.3907.
  • 38. Nielsen R, Paul JS, Albrechtsen A, Song YS. (2011). Genotype and SNP calling from next-generation sequencing data. Nat Rev Genet 12:443–451. doi: 10.1038/nrg2986
  • 39. Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978–989. doi: 10.1086/319501
  • 40. Castelli EC, Mendes-Junior CT, Sabbagh A, Porto IO, Garcia A, Ramalho J, et al. (2015). HLA-E coding and 3’ untranslated region variability determined by next-generation sequencing in two West-African population samples. Hum Immunol. 76: 945–53. doi: 10.1016/j.humimm.2015.06.016
  • 41. Lima THA, Buttura RV, Donadi EA, Veiga-Castelli LC, Mendes-Junior CT, Castelli E. (2016). HLA-F coding and regulatory segments variability determined by massively parallel sequencing procedures in a Brazilian population sample. Hum Immunol. 76: 841–853.
  • 42. Hong EP, Park JW. (2012). Sample size and statistical power calculation in genetic association studies. Genomics Inform 10: 117–22. doi: 10.5808/GI.2012.10.2.117
  • 43. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet 81: 559–575.
  • 44. Foulkes WD. (2008). Inherited Susceptibility to Common Cancers. N Engl J Med 359: 2143–2153. doi: 10.1056/NEJMra0802968
  • 45. Paterson S, Wilson K, Pemberton JM (1998) Major histocompatibility complex variation associated with juvenile survival and parasite resistance in a large unmanaged ungulate population. Proc Natl Acad Sci USA 95: 3714–9. doi: 10.1073/pnas.95.7.3714
  • 46. R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
  • 47. Kosakovsky Pond SL, Frost SD. (2005). Not so different after all, a comparison of methods for detecting amino acid sites under selection. Mol Biol Evol 22: 1208–1222. doi: 10.1093/molbev/msi105
  • 48. Delport W, Poon AFY, Frost SDW, Kosakovsky Pond SL. (2010). Datamonkey 2010, a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics, 26: 2455–2457. doi: 10.1093/bioinformatics/btq429
  • 49. Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW (2006). GARD: a genetic algorithm for recombination detection. Bioinformatics, 22: 3096–3098. doi: 10.1093/bioinformatics/btl474
  • 50. Murrell B, Wertheim JO, Moola S, Weighill T, Scheffler K, Kosakovsky Pond SL (2012). Detecting individual sites subject to episodic diversifying selection. PLoS Genetics, 8: e1002764. doi: 10.1371/journal.pgen.1002764
  • 51. Murrell B, Moola S, Mabona A, Weighill T, Sheward D, Kosakovsky Pond SL, et al. (2013). FUBAR, a fast, unconstrained bayesian approximation for inferring selection. Mol Biol Evol, 30: 1196–1205. doi: 10.1093/molbev/mst030
  • 52. Hollegaard MV, Bidwell JW. (2006). Cytokine gene polymorphism in human disease: on-line databases. Genes Immun 7: 269–276.
  • 53. Haukisalmi V, Henttonen H. (1993). Coexistence in Helminths of the Bank Vole Clethrionomys glareolus. II. Intestinal Distribution and Interspecific Interactions. J Anim Ecol 62: 230–238.
  • 54. Deter J, Charbonnel N, Cosson J-Fr, Morand S. (2008). Regulation of vole populations by the nematode Trichuris arvicolae: insights from modelling. J Wildl Res 54: 60–70.
  • 55. Else KJ, Finkelman FD (1998). Intestinal nematode parasites, cytokines and effector mechanisms. Int J Parasitol 28: 1145–58. doi: 10.1016/s0020-7519(98)00087-3
  • 56. Těšický M, Velová H, Novotný M, Kreisinger J, Beneš V, Vinkler M. (2022). Positive selection and convergent evolution shape molecular phenotypic traits of innate immunity receptors in tits (Paridae). Mol Ecol 29: 3056–3070.
  • 57. Piertney SB, Oliver MK. (2006). The evolutionary ecology of the major histocompatibility complex. Heredity 96: 7–21. doi: 10.1038/sj.hdy.6800724
  • 58. Sauna ZE, Kimchi-Sarfaty Ch. (2011). Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12: 683–691. doi: 10.1038/nrg3051
  • 59. Chen R, Davydov EV, Sirota M, Butte AJ (2010) Non-Synonymous and Synonymous Coding SNPs Show Similar Likelihood and Effect Size of Human Disease Association. PLoS ONE 5: e13574. doi: 10.1371/journal.pone.0013574
  • 60. Cartegni L, Chew SL, Krainer AR. (2002). Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nature Rev Genet 3: 285–298. doi: 10.1038/nrg775
  • 61. Plotkin JB, Kudla G. (2011) Synonymous but not the same: the causes and consequences of codon bias. Nature Rev Genet 12: 32–42. doi: 10.1038/nrg2899
  • 62. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. (2009). Finding the missing heritability of complex diseases. Nature 461: 747–753. doi: 10.1038/nature08494
  • 63. Cooper DN. (2010). Functional intronic polymorphisms: Buried treasure awaiting discovery within our genes. Hum Genomics 4: 284–288. doi: 10.1186/1479-7364-4-5-284
  • 64. Královicová J, Lei H, Vorechovský I. (2006). Phenotypic consequences of branch point substitutions. Hum Mutat 27: 803–813. doi: 10.1002/humu.20362
  • 65. Yokley BH. (2010). Regulation of the lymphotoxin alpha gene: characterization of elements located between the transcription and translation start sites that impact expression. PhD thesis, Georgetown University.
  • 66. Coulombe-Huntington J, Lam KCL, Dias C, Majewski J. (2009). Fine-scale variation and genetic determinants of alternative splicing across individuals. PLoS Genet 5:e1000766. doi: 10.1371/journal.pgen.1000766
  • 67. Andersen KG, Shylakhter I, Tabrizi S, Grossman SR, Happi CT, Sabeti PC (2012). Genome-wide scans provide evidence for positive selection of genes implicated in Lassa fever. Philos Trans R Soc Lond B Biol Sci., 367: 868–877. doi: 10.1098/rstb.2011.0299
  • 68. Pravica V, Perrey C, Stevens A, Lee JH, Hutchinson IV. (2000) A single nucleotide polymorphism in the first intron of the human IFN-gamma gene: absolute correlation with a polymorphic CA microsatellite marker of high IFN-gamma production. Hum Immunol 61: 863–6. doi: 10.1016/s0198-8859(00)00167-1

Decision Letter 0

Johan R Michaux

10 May 2022

PONE-D-21-29710

Cytokine gene polymorphism and parasite susceptibility in free-living rodents: importance of non-coding variants

PLOS ONE

Dear Dr. Kloch,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. More precisely, the three reviewers agree that your manuscript is interesting but needs major revision before possible acceptance.

A first comment concerns the sampling cohort, which is quite small and heterogeneous (recruitment from 2005 to 2016). It is therefore not easy to compare data from different years and locations. Other factors could also play a role in explaining the obtained results. To address these risks, I agree with the first reviewer that it would be important to strengthen the multiple-test correction and, accordingly, to rewrite the abstract, results and discussion.

Moreover, I also agree with the second and third reviewers, who propose strengthening the introduction by giving more information on the study system and on the rationale for associating these particular cytokine genes with host-parasite interactions, by explaining your aims and hypotheses more clearly in the last paragraph of the introduction, and by discussing mechanisms of balancing selection.

Like the three reviewers, I also think that it would be good to clarify the Methods section, particularly the statistical analyses.

All reviewers noted several grammatical and typographical errors. They ask that the text be revised with the help of a native English speaker. I agree with their comments and think that this would improve the text.

Finally, all minor comments noted by the three reviewers will have to be taken into account. This will really improve the quality of the text.

Please submit your revised manuscript by June 24, 2022. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Johan R. Michaux

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at 

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and 

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. In your Methods section, please provide additional location information, including geographic coordinates of your field collection site if available.

3. In your Methods section, please provide additional information regarding the permits you obtained for the work. Please ensure you have included the full name of the authority that approved the field site access and, if no permits were required, a brief statement explaining why.

4. To comply with PLOS ONE submissions requirements, please provide methods of sacrifice in the Methods section of your manuscript.

5. Thank you for stating the following financial disclosure: "The work was supported by grant no. 2012/07/B/NZ8/00058 from the Polish National Science Centre to AK."

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." 

If this statement is not correct you must amend it as needed. 

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

6. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

7. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

8. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. 

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

Reviewer #3: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments

This is an interesting study focused on the variation of three cytokines in bank voles. I particularly appreciate that the authors try to correlate particular SNP variation with infection/disease data, which is very rare outside human studies. I also agree with the authors that further research should focus more on genes outside MHC and TLRs and/or non-coding regions.

My major comments are that the authors should better explain their aims and hypothesis in the last paragraph of the introduction and discuss mechanisms of balancing selection. To avoid multiple testing issues, I am not sure whether it would be better to merge models from Table S6 with models in Table 7 rather than including only significant and marginally significant predictors from models in Table S6 in the final models in Table S7. However, I am leaving the decision to the editor. I also think that the manuscript should be checked by an English linguist.

Please, include the line numbering next time as it helps to orient while reviewing the manuscript. I had to mostly calculate line numbers by hand as the manuscript is in PDF format and also copy the whole sentences that need to be modified…

Specific line-by-line comments

***abstract

*P1, L9-L11 Please revise the sentence below: some words are missing in the part marked with quotation marks

Two SNPs in LTα and two in IFNβ1 significantly affected susceptibility to nematodes, “and was of them was also associated with susceptibility“ to microbial pathogen Bartonella.

*P1, L3-L4

MHC is not a receptor. Revise the sentence below for accuracy:

Despite extensive studies of receptors, such as MHC or TLR, little is known about efferent arm of the immune system.

***Introduction

*P3, L3 What are higher vertebrates? Maybe mentioning mammals is redundant in: In mammals and higher vertebrates, the immune system comprises dozens of interacting molecules, yet the mechanisms of this selection have been comprehensively studied only in the case of major histocompatibility complex (MHC) genes (e.g. Radwan et al 2020).

*P3, L11

Typing error: e.g.

in eg. Fornuskova et al. 2013, Babik et al. 2015, Kloch et al. 2018)

*P3, L16-18 Maybe abbreviation mentioned for the first time should be explained already here:

For instance, balancing selection was found in interleukins Il-1B, Il-2, and TNF in field voles (Turner et al. 2012), and in Il-10 and CD14 in humans (Ferrer-Atmetlla et al. 2008).

and further in L20, In humans, polymorphism within the LTα has been linked to several diseases, such as Mycobacterium leprae (Ware 2005) and malaria (Barbier et al. 2008)

*L27, Please, revise the first part of the sentence for clarity. Gene name abbreviations are not necessary to be explained here. Also, all gene names should be written in Italics here and in other places in the manuscript where they mean gene products and not proteins.

To better understand the role of the parasite-driven selection maintaining polymorphism in cytokines, and to further explain the role of this polymorphism in resistance against pathogens in the wild, we studied three cytokine genes: tumor necrosis factor (TNF), lymphotoxin alpha (LTα) formerly known as tumor necrosis factor beta (TNFβ), and interferon beta (IFNβ1).

*I think that the aims of the study and hypothesis should be better clarified in the last paragraph of the introduction.

*I would also briefly mention the mechanism of parasite-mediated balancing selection in the introduction: heterozygote advantage, NFDS, and space-time fluctuating selection.

***Methods

*P5, L23, What reference sequences were used for numbering? Please, mention their sequence IDs.

*P5, L25: The following sentence should be better in results.

The sequencing revealed 43 SNP in ~2000bp in total

*P4, L26

I am lacking details about which tissue has been used for DNA extraction and which extraction kit has been used. It should be here even if it is referenced in one of the author´s papers cited.

*P5, L24

I am not sure if I fully understand the sequencing strategy. Do authors have only one amplicon per gene? But given the restricted overlaps between forward and reverse reads/ or little variation in the overlapping regions, there was a need to phase the alleles? If so, please mention it explicitly in the methods. What is the percentage of alleles that were directly reconstructed without the need of phasing?

*P6, L1

What is MAF? Please explain. Please also state the variant frequency in % rather than "fewer than 5 animals".

*P6, L3

I appreciate that the authors used Benjamini-Yekutieli false discovery rate to avoid type I error. However, to avoid multiple testing issues, I am not sure whether it would be better to merge models from Table S6 with models in Table 7 rather than including only significant and marginally significant predictors from models in Table S6 to the final models in Table S7.

*P6, L20

I do not see any results for recombination testing; given that, I assume that no recombination was detected. Please mention it either here or in the relevant result section.

*P6, L21

Please, add that the selection methods used are based on dN/dS ratio testing.

***Results

*P6, 3.1. Cytokine polymorphisms and susceptibility to infections

I would add some general summary of polymorphism revealed, ideally with a table in SI with values such as number of sequences, number of unique nucleotide alleles, number of variable sites, number of substitutions to see how much are cytokines variable.

*** Discussion

*P7, L30

MHC is not a receptor. Please, correct the following sentence:

Studies of associations between polymorphisms in the immunity genes and susceptibility to diseases in wild mammals usually focus on receptor proteins, such as MHC or TLR

*P7, L30 I do not agree with the statement that nucleotide variation in molecules that physically interact with pathogen structure is ultimately functional. There is much variation that is the most likely non-adaptive, even if it is under positive selection. Please, see e.g. Těšický et al. 2020 wherein some approaches on how to distinguish putative functional variation from non-functional one are outlined.

Těšický M, Velová H, Novotný M, et al (2020) Positive selection and convergent evolution shape molecular phenotypic traits of innate immunity receptors in tits (Paridae). Mol Ecol 3056–3070. https://doi.org/10.1111/mec.15547

Original sentence: Since they physically interact with pathogen-derived motifs, their nucleotide composition can be directly attributed to functional variation

*P8, L7 Please, clarify what is meant by repeated rounds of positive selection interspersed with purifying selection.

***Tables and figures

*Figure 1

I think that it is quite confusing to present both the percentage and the number of individuals in the upper panels of the figure, as there is no scale on the y-axis for the percentage. Also, e.g. the column for 57 % is higher than the one for 34.4 %. Why is there no CC genotype for LTalfa 322?

*Figure 2.

It seems that the last sentence of figure 1 caption is unfinished: “Positions of SNPs are given with bars, and SNPs”

Please, also add the numbering at the beginnings and the ends of exons.

* Table S6 and S7

Please add a new column with the number of observations in both tables

*Table S8

LTalfa 107* - what does this asterisk mean?

Please, include the same position identified under selection by multiple methods in the same row to allow a reader to better compare which positions have been identified by the multiple selection methods.

Reviewer #2: In the paper “Cytokine gene polymorphism and parasite susceptibility in free-living rodents: importance of non-coding variants”, Agnieszka Kloch and co-Authors investigated how genetic variation in cytokines affects susceptibility to parasitic diseases in bank voles.

The Authors, in particular, studied the three cytokines TNF, LTα and IFNβ1, demonstrating that two SNPs in LTα and two in IFNβ1 significantly affected susceptibility to nematodes and to the microbial pathogen Bartonella.

The Authors concluded that the identified cytokines are prone to parasite-driven selection, and non-coding variants may be linked to susceptibility to infections in wild systems.

The cohort is quite small for this particular type of study; moreover, the data are quite heterogeneous, because sampling was performed in 2005 and 2016, and it is not easy to compare data from different years and locations.

The Authors wrote that they corrected the results by means of the false discovery rate; nevertheless, the present paper contains many comparisons, and given the large number of tests performed in the study, some of the obtained p-values may be spurious. I therefore ask the Authors to strengthen the multiple-test correction; accordingly, the Authors should rewrite the abstract, results and discussion. I recognise that if the Authors applied too strong a correction (such as Bonferroni), most of the comparisons would probably be lost; so, I ask the Authors to set a threshold that removes P-values which may have turned out significant only by chance. In fact, in my opinion, P-values of 0.02 or 0.03 obtained in a small cohort study without a strong correction for multiple testing may be spurious.
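To make the trade-off between the two corrections discussed here concrete, the following is a minimal Python sketch of Bonferroni and Benjamini-Yekutieli adjusted p-values. It is an illustration only: the function names are hypothetical, the input p-values are invented for the example and are not taken from the manuscript, and the sketch does not claim to reproduce the authors' analysis.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli adjusted p-values (FDR under arbitrary dependence)."""
    m = len(pvalues)
    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m
    c_m = sum(1.0 / k for k in range(1, m + 1))
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values, chosen only to illustrate the calculation
example = [0.001, 0.02, 0.03, 0.5]
print(bonferroni(example))
print(benjamini_yekutieli(example))
```

With only a handful of tests the two adjustments can give similar values; the difference in stringency grows as the number of tests increases.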

Minor changes:

• Introduction, L3: please change the word “dozen” with another more scientific word

• Introduction: in the sentence “In humans, polymorphism within the LTα …”, please define LT the first time you mention it.

• Material and Methods: I ask the Authors to add a brief comment in the main text about the sentence: “samples from 2005 were sequenced using 150 cycles, and samples from 2016 using 300 cycles”.

• Material and Methods: In the sentence: “In a second step, we excluded from the model variants with MAF<0.5 and those present in fewer than 5 animals (see Table S4), as we lacked statistical power to test for their effects”, I understand the reasons for the lack of statistical power, but, in my opinion, a new mutation may be (at the beginning) present in only one animal and with a MAF<0.5. Please comment on this in the main text.

• Supplementary Tables 2: the Authors indicated that some primers contain degenerate nucleotides (such as W, Y, etc.); I ask the Authors to add a brief comment and an explanation of this in the main-text

In conclusion, in my opinion the present paper cannot be accepted in its present form; I ask the Authors for both major and minor revisions. A major rewrite of all sections of the paper is necessary.

Reviewer #3: In this manuscript, the authors investigate associations between genetic variants in cytokine genes and parasite infection and parasite load in wild bank voles. The authors find that there are associations between a few SNPs and parasite infection or load and there is evidence that these variants are under selection.

I believe that the topic is interesting, particularly because studies looking beyond the MHC and in wild systems are rare. However, I do feel that the introduction could be strengthened by giving a bit more background to the study system and stating the rationale for looking at these particular cytokine genes in this host-parasite system. In addition, I think the methods could be a bit clearer on the details of the study system and statistical analyses. There are also some grammatical errors and typos, which I have tried to point out as best as I can despite the lack of line numbers. I go through these points in more detail, along with other comments that I hope will improve the clarity of the manuscript, below:

Abstract, line 3: little is known about ‘the’ efferent arm of the immune system

Abstract: would be nice to add in the sample size into the abstract

Abstract, line 9: I think this should read ‘one of them’ rather than ‘was of them’

Abstract, line 10: would be nice to add in the type of selection observed

Abstract, final sentence: it appears that there are more associations with coding (exonic) variants that affect parasite infection/load than intronic variants?

Introduction, paragraph 2, line 3: I think the ‘and’ needs to be removed here before ‘they are expected’ and ‘have’ needs to be added before ‘reported signatures’

Introduction, paragraph 2, line 6 and 11: might need to specify that selection was found in the genes (line 6) and variation was genetic variants in interleukin genes (line 11)

Introduction, paragraph 2, line 12: find = found

Introduction, paragraph 2, line 12: ‘A’ single study. Might need to specify whether this has been investigated and not found, or if only one study has investigated non-coding regions

Introduction, paragraph 2: in the next paragraph you explain what LTα is but it would be better to put this here on the first mention of this cytokine gene.

Introduction, paragraph 3, first sentence: remove ‘the’ before parasite-driven selection and ‘this polymorphism’ should be ‘these polymorphisms’

Introduction, paragraph 3, second sentence: remove ‘an’ before evidence, its importance = their importance and ‘makes them a promising candidate gene’ should be ‘makes them promising candidate genes’

Introduction, paragraph 3, third sentence: add ‘the’ before acute phase and remove ‘the’ before inflammation

Introduction: I believe that the introduction would benefit from bringing in the study system and describing the types of pathogens that are present in the two populations of bank voles. What pathogens cause more morbidity or mortality in this system? It would then be nice to use this information to link into why you are studying these particular cytokine genes so that it feels more hypothesis driven. This would also help you to develop key expectations of the types of genes that might be important – from previous work for example by Turner et al in the field voles – and also what types of variants might be important (e.g. synonymous vs non-synonymous and exonic vs intronic).

Materials and methods, section 2.1: It would be important to add here the sample size of voles in the two years at each site into the main text (rather than a supplementary table). Later, post QC of SNPs it would be good to state from X number of individuals X were used in the final analyses.

Materials and methods, section 2.1: Are any of the individuals in these populations related? Did you control for this at all?

Materials and methods, section 2.1: A brief discussion of the differences in parasite burden between the two sites would be good to have here (or in the introduction) so that this manuscript can stand alone.

Materials and methods, section 2.1: In the introduction and discussion it would be important to introduce and compare respectively the associations you find here with those in the MHC and TLR genes in the same study system – are patterns of synonymous vs non-synonymous and exonic vs intronic similar? Are effect sizes larger in the MHC or TLR genes than in the cytokine genes?

Materials and methods, section 2.1, second to last sentence: ‘the’ protocol

Materials and methods, page 5, third sentence: repeat of the word ‘using’

Materials and methods, page 5, first paragraph: why were a different number of cycles used?

Materials and methods, page 5, second paragraph, second sentence: ‘a’ reference and ‘genes of interest’

Materials and methods, page 5, second paragraph: it would help to give a version and reference for both the bank vole genome and mouse genome used (for orthologues).

Materials and methods, page 5, second paragraph: it would help the reader if you explained what the abbreviations are that are used as your QC criteria in vcffilter.

Materials and methods, page 5, line 12: ‘where physical position’ rather than ‘were physical position’

Materials and methods, page 5, second paragraph, final sentence: would be nice to add in the number of genes and corresponding chromosomes

Materials and methods, page 5, section 2.3, first paragraph: I don’t really understand the last sentence about minimising false positives– could you clarify?

Materials and methods, page 5, section 2.3, second paragraph: do you mean you removed SNPs with a genotyping rate <100% and any individuals that had any missing SNP genotypes? Please give final sample sizes after QC

Materials and methods, page 5, section 2.3, second paragraph: How many SNPs deviated from HWE? This might be expected if selection is occurring on these loci.

Materials and methods, page 6, first paragraph: I think there is a typo – surely minor allele frequency must be <0.5?

Materials and methods, page 6, first paragraph: rather than put the pathogens in a supplementary table it would be important to specifically list each pathogen looked at and ideally the % prevalence. Histograms of parasite load would be nice and should go in supplement. I think this is really important because it is not immediately obvious to the reader how many pathogens you are looking at and therefore it is hard to put the results into context.

Materials and methods, page 6, second paragraph: some details of the models are missing including the type of link, how significance of fixed effects was determined, how post-hoc testing was done between SNP genotypes, which package was used for models etc. How did you control for overdispersion (and presumably zero-inflation) in the Poisson model? Or did the model fit without other means of controlling for overdispersion? Did every pathogen abundance/load model suitably fit a Poisson model? Did you try running these in a hurdle model which should combine the two models you mention (parasite infection y/n and parasite load)? Did you test for any interactions between SNPs and non-genetic sources such as sex or site?

Materials and methods, page 6, section 2.4: it would be useful to the reader if you could explain how these codon-based tests identify sites under selection, even briefly.

Results: I think it would be useful to explicitly state each of the null results (for both models and each pathogen) you found too rather than just the significant results. It would also be good to state X number of SNPs out of X total SNPs in a gene were associated with a trait – as you did with TNF in the first sentence. It would also be good to state whether SNP effects were additive, and to compare the effects of different SNPs (on the same parasite) in the discussion.

Results: There is some inconsistency in how you explain the results. For some SNPs you give the average number of worms in each genotype class but not for others – it would be good to provide this for all. Also it would be good to have errors around these values and to explain how these were calculated in the methods – is it from the raw data or post hoc testing controlling for other non-genetic variables in the model?

Results: I would not discuss a result that did not meet the p value after correction for multiple testing.

Results: It would be nice to have full results from each model (model estimates, errors and p values for each SNP and for non-genetic variables too and post hoc testing between genotypes) reported somewhere, even if in the supplement. You could also compare the effects of non-genetic sources to SNPs in each model

Results, page 6, second paragraph: give full Latin name on first mention

Results, page 7, second paragraph: typo – lowerst

Results, page 7, second paragraph: what does average high mean?

Results, page 7, section 3.2: typo compring = comprising

Figure 1: Explain bar chart colours and outliers and labels. A stacked bar chart might be a better option.

Figure 2: I think the caption is unfinished?

Discussion: it would be nice if the discussion was reformatted to follow the same format as the results – starting with SNPs associated with infection or parasite load, followed by discussion of selection on these variants

Discussion, page 8, line 3: I would rephrase to ‘genetic variation in cytokine genes also plays’ rather than ‘cytokine variance’ as this sounds like you are actually measuring cytokine levels

Discussion, page 8, paragraph 2: there are a lot of ‘what’s in this paragraph that should be ‘which’

Discussion, page 8, paragraph 2: I wouldn’t comment on the effect on Bartonella sp. as I think it was non significant after correction for multiple testing

Discussion, page 8, paragraph 2: following on from the final sentence it would be nice to have some insight into how these pathogens affect morbidity and mortality in this species or a similar species

Discussion, page 8, paragraph 3: This is a nice discussion of the role of synonymous mutations

Discussion: some more discussion of other wild systems (e.g. Turner et al’s work in the field voles) and comparisons to effects of the MHC and TLR variants previously investigated in this system would strengthen the manuscript. It would also be nice to add whether variants in IFNβ1 and LTα (or orthologues) have previously been implicated in susceptibility to parasitic diseases in other species.

Discussion, page 9: the conclusion feels a bit rushed. While there is an intronic and a synonymous SNP, it appears that there are more associations between exonic and non-synonymous SNPs and parasite infection/load. So can you say that intronic and synonymous variants play a (more) important role? Or are you just saying that they are also present and should not be ruled out?

Discussion: some discussion of the advantages and disadvantages of candidate gene studies would be helpful to the reader

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jan 24;18(1):e0258009. doi: 10.1371/journal.pone.0258009.r002

Author response to Decision Letter 0


7 Jul 2022

Response to the Reviewer #1:

My major comments are that authors should better explain their aims and hypothesis in the last paragraph of the introduction and discuss mechanisms of balancing selection.

We expanded the Introduction by adding a passage describing the study system, and we specified the aims and the hypothesis. We also added a passage to the Introduction discussing the mechanisms of balancing selection (BS).

To avoid multiple testing issues, I am not sure whether it would be better to merge models from Table S6 with models in Table 7 rather than including only significant and marginally significant predictors from models in Table S6 to the final models in Table S7.

The two-step procedure, in which a minimal non-genetic model is constructed first and genetic terms are then fitted, has often been used before (e.g. Paterson et al. PNAS 1998: 95, Turner et al. PLoS Genetics 2011). It helps to avoid overfitting and improves the fit of the model by including only those non-genetic factors that indeed affect parasite load. We tested two alternative procedures: i) model simplification based on AIC and removing variables that explain little variance in the data, and ii) fitting non-genetic terms as fixed variables in mixed models. Both gave results similar to those of the original models, but the resulting models had a poorer fit.

Specific line-by-line comments

***abstract

*P1, L9-L11 Please revise the sentence below: some words are missing in the part marked with quotation marks

Two SNPs in LTα and two in IFNβ1 significantly affected susceptibility to nematodes, “and was of them was also associated with susceptibility“ to microbial pathogen Bartonella.

The sentence was rewritten, and the part about Bartonella was removed, because this association became non-significant after applying the Bonferroni correction.

*P1, L3-L4

MHC is not a receptor. Revise the sentence below for accuracy:

Despite extensive studies of receptors, such as MHC or TLR, little is known about efferent arm of the immune system.

We changed “receptor” to the more general term “proteins interacting with pathogen-derived ligands”.

***Introduction

*P3, L3 What are higher vertebrates? Maybe mentioning mammals is redundant in: In mammals and higher vertebrates, the immune system comprises dozens of interacting molecules, yet the mechanisms of this selection have been comprehensively studied only in the case of major histocompatibility complex (MHC) genes (e.g. Radwan et al 2020).

Since MHC has also been studied in fish, we decided that this sentence may be misleading, and we changed “In mammals and higher vertebrates” simply to “in vertebrates”.

*P3, L16-18 Maybe abbreviation mentioned for the first time should be explained already here:

For instance, balancing selection was found in interleukins Il-1B, Il-2, and TNF in field voles (Turner et al. 2012), and in Il-10 and CD14 in humans (Ferrer-Atmetlla et al. 2008).

and further in L20, In humans, polymorphism within the LTα has been linked to several diseases, such as Mycobacterium leprae (Ware 2005) and malaria (Barbier et al. 2008)

We added the full names of these proteins.

*L27, Please, revise the first part of the sentence for clarity. Gene name abbreviations are not necessary to be explained here. Also, all gene names should be written in Italics here and in other places in the manuscript where they mean gene products and not proteins.

To better understand the role of the parasite-driven selection maintaining polymorphism in cytokines, and to further explain the role of this polymorphism in resistance against pathogens in the wild, we studied three cytokine genes: tumor necrosis factor (TNF), lymphotoxin alpha (LTα) formerly known as tumor necrosis factor beta (TNFβ), and interferon beta (IFNβ1).

After re-arranging the Introduction, we provided the full names of the genes at their first occurrence in the text (except for the abstract). We changed the abbreviations to italics throughout the paper to make clear that we refer to a gene, and not to its product.

*I think that the aims of the study and hypothesis should be better clarified in the last paragraph of the introduction.

We clarified the hypothesis and aims

*I would also briefly mention the mechanism of parasite-mediated balancing selection in the introduction: heterozygote advantage, NFDS, and space-time fluctuating selection.

We added a relevant passage to the first paragraph of the Introduction

***Methods

*P5, L23, What reference sequences were used for numbering? Please, mention their sequence IDs.

We added GenBank accession numbers for the reference sequences.

*P5, L25: The following sentence should be better in results.

The sequencing revealed 43 SNP in ~2000bp in total

We decided to remove this sentence, as we describe the number of SNPs in each gene in detail in section 2.3, paragraph 2.

*P4, L26

I am lacking details about which tissue has been used for DNA extraction and which extraction kit has been used. It should be here even if it is referenced in one of the author´s papers cited.

We added information on DNA extraction.

*P5, L24

I am not sure if I fully understand the sequencing strategy. Do authors have only one amplicon per gene? But given the restricted overlaps between forward and reverse reads/ or little variation in the overlapping regions, there was a need to phase the alleles? If so, please mention it explicitly in the methods. What is the percentage of alleles that were directly reconstructed without the need of phasing?

In our case, Illumina reads are shorter than the amplicons. The chemistry allows for sequencing fragments of up to 300 bp, and the amplified fragments of LTα and TNF were ~800 bp. Thus, prior to sequencing, the amplicons had to be split into shorter fragments using the Illumina Nextera kit. As a result, some distant SNPs were always located in separate fragments. Lacking information about the physical position of alleles on a DNA strand, Freebayes (the software used for variant calling and haplotype reconstruction) could not phase some variants. The alleles in those SNPs had to be phased computationally. There were 5 such SNPs out of 36 SNPs in TNF, 6 of 18 in LTα, and none in IFNβ1. We added this information and we changed the potentially confusing phrase “Phased alleles were reconstructed...” to “Phased alleles were converted to fasta format...”

*P6, L1

What is MAF? Please explain. Please also state the variant frequency in % rather than "fewer than 5 animals".

MAF is minor allele frequency, and a MAF of 0.05 is a standard filtering threshold. However, in our case, due to the relatively low sample size, after removing alleles with a frequency <5% we still had some variants that were present in too few voles to be included in the statistical models. Thus, we applied a second filtering step, removing alleles present in fewer than 5 animals, to gain enough statistical power.
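The two filtering steps described in this response can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' pipeline: SNPs are assumed to be biallelic, genotypes are encoded as two-character strings such as "AG", the helper name `passes_filters` is hypothetical, and the toy genotypes are invented.

```python
from collections import Counter


def passes_filters(genotypes, maf_threshold=0.05, min_carriers=5):
    """Return True if a biallelic SNP passes both filtering steps.

    Step 1: minor allele frequency (MAF) must be at least maf_threshold.
    Step 2: the minor allele must be carried by at least min_carriers animals.
    """
    # Count every allele copy across all diploid genotypes
    counts = Counter(allele for g in genotypes for allele in g)
    if len(counts) != 2:
        return False  # monomorphic or not biallelic: outside this sketch
    minor = min(counts, key=counts.get)
    maf = counts[minor] / sum(counts.values())
    # Animals carrying at least one copy of the minor allele
    carriers = sum(1 for g in genotypes if minor in g)
    return maf >= maf_threshold and carriers >= min_carriers


# Invented examples (not data from the study)
rare = ["AA"] * 40 + ["AG"] * 4                      # fails the MAF step
few_carriers = ["AA"] * 30 + ["AG"] * 3 + ["GG"]     # passes MAF, too few carriers
kept = ["AA"] * 30 + ["AG"] * 6                      # passes both steps
print(passes_filters(rare), passes_filters(few_carriers), passes_filters(kept))
```

The second example shows why a carrier-count filter can remove variants that a frequency threshold alone would keep when the sample is small.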

*P6, L3

I appreciate that the authors used Benjamini-Yekutieli false discovery rate to avoid type I error. However, to avoid multiple testing issues, I am not sure whether it would be better to merge models from Table S6 with models in Table 7 rather than including only significant and marginally significant predictors from models in Table S6 to the final models in Table S7.

In general, we followed a two-step procedure used in several association studies of free-living mammals (e.g. Paterson et al. PNAS 1998: 95, Turner et al. PLoS Genetics 2011) to avoid model overfitting and to obtain better-fitting models. We added this explanation to section 2.3.

Before submitting the paper, we tested two alternative procedures: i) model simplification based on AIC and removing variables that explain little variance in the data, and ii) fitting non-genetic terms as fixed variables in mixed models. Both gave results similar to those of the original models, but the resulting models had a poorer fit.

*P6, L20

I do not see any results for recombination testing; given that, I assume that no recombination was detected. Please mention it either here or in the relevant result section.

We found no evidence for recombination, and we added a sentence stating this at the beginning of section 3.2.

*P6, L21

Please, add that the selection methods used are based on dN/dS ratio testing.

We added a brief description of the tests.

***Results

*P6, 3.1. Cytokine polymorphisms and susceptibility to infections

I would add some general summary of polymorphism revealed, ideally with a table in SI with values such as number of sequences, number of unique nucleotide alleles, number of variable sites, number of substitutions to see how much are cytokines variable.

We rewrote the table (now Table S2), adding basic statistics describing nucleotide diversity, such as the number of segregating sites (S), the number of haplotypes, nucleotide diversity (π), etc., and we summarized the table in the first section of the Results.

*** Discussion

*P7, L30

MHC is not a receptor. Please, correct the following sentence:

Studies of associations between polymorphisms in the immunity genes and susceptibility to diseases in wild mammals usually focus on receptor proteins, such as MHC or TLR

We changed “receptor” to a descriptive term : “proteins presenting motifs derived from pathogens”

*P7, L30 I do not agree with the statement that nucleotide variation in molecules that physically interact with pathogen structure is ultimately functional. There is much variation that is the most likely non-adaptive, even if it is under positive selection. Please, see e.g. Těšický et al. 2020 wherein some approaches on how to distinguish putative functional variation from non-functional one are outlined.

Těšický M, Velová H, Novotný M, et al (2020) Positive selection and convergent evolution shape molecular phenotypic traits of innate immunity receptors in tits (Paridae). Mol Ecol 3056–3070. https://doi.org/10.1111/mec.15547

Original sentence: Since they physically interact with pathogen-derived motifs, their nucleotide composition can be directly attributed to functional variation

Thank you for this remark and for suggesting the paper. We added this remark to the re-arranged version of the Discussion.

*P8, L7 Please, clarify what is meant by repeated rounds of positive selection interspersed with purifying selection.

The sentence was a clumsy attempt to describe frequency-dependent selection where a rare allele is supposed to be advantageous and it is favored but once its frequency increases, it is more likely to be “recognized” by pathogens and thus it turns disadvantageous. We altered the sentence so that it better describes FDS.

***Tables and figures

*Figure 1

I think that it is quite confusing to present in upper panels of the figure both percentage and no. individuals as there is no scale on the y-axis for the percentage. Also, e.g. column for 57 % is higher than for 34.4 %. Why there is no CC genotype for LTalfa 322?

We clarified the figure, changing bars to stacked bars and by removing %. The % were meant to show how many individuals with given genotype were infected but we agree that this seemed confusing.

There was only one individual with the genotype CC in LTa and we removed it from the analysis as we lack statistical power to test the effect of this genotype. We described this in the Results.

*Figure 2.

It seems that the last sentence of figure 1 caption is unfinished: “Positions of SNPs are given with bars, and SNPs”

Please, also add the numbering at the beginnings and the ends of exons.

We completed the sentence (it seems that there was some formatting error and it was not visible) and we added the numbering.

* Table S6 and S7

Please add a new column with the number of observations in both tables

We added sample sizes below the gene names.

*Table S8

LTalfa 107* - what does this asterisk mean?

Please, include the same position identified under selection by multiple methods in the same row to allow a reader to better compare which positions have been identified by the multiple selection methods.

We aligned the positions vertically in the table to facilitate comparisons. We used an asterisk to indicate that the positively selected codon 107 is next to codon 108, which corresponds to LTα SNP 322 that was significantly associated with the presence of A. tianjinensis. We added this information below the table.

Response to the Reviewer #2:

The cohort is quite small for this particular type of study; then, the data are quite heterogeneous, because the recruitment was performed in 2005 and 2016; it is not easy to compare data from different years and locations.

We are afraid that this is an inherent problem in field studies, where obtaining sample sizes comparable to human association studies is difficult and also unethical if many animals are culled.

We dealt with the problem as best we could by controlling for data heterogeneity in GLMs. In the models, we included all factors besides genetic ones that could affect parasite load, e.g. year, sampling site, or host sex.

The Authors wrote they corrected the results by means of false discovery rate; nevertheless, in the present paper there are a lot of comparisons and given the great number of tests performed in the study, some obtained p-values may be spurious. I therefore ask the Authors to strengthen multiple test correction; accordingly, the Authors should rewrite abstract, results and discussion. I recognise that if the Authors would apply a too strong correction (such as Bonferroni), probably most of the comparisons would be lost; so, I ask the Authors to set a value in order to clean the P which may be resulted significant only by chance. In fact, in my opinion, P-values of 0.02, 0.03 obtained in a small cohort study without a strong correction for multiple testing may be spurious.

We applied Bonferroni correction as requested and significant values remained valid except for the association with Bartonella.

We tested for the effect of 10 SNPs (2 in IFN, 1 in TNF, and 7 in LTα), and in the revised manuscript we adopted the most conservative criterion, i.e. Bonferroni correction with 10 comparisons. The critical p-level corresponding to α=0.05 is therefore 0.05/10 = 0.005. On the other hand, there is no statistical correction that could be applied when the same set of explanatory variables (alleles, in our case) is fitted in different models with different parasite species as the response, as this does not meet the definition of multiple comparisons.
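The arithmetic behind this threshold can be written out as a short sketch. The function names and the p-values in the example are hypothetical and serve only to show how the critical level of 0.005 is applied; they are not results from the study.

```python
def bonferroni_threshold(alpha, n_tests):
    """Critical p-level after Bonferroni correction: alpha divided by the number of tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value as significant (True) or not (False) after correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# 10 SNPs tested, alpha = 0.05, as in the response above.
threshold = bonferroni_threshold(0.05, 10)

# Hypothetical p-values for 10 SNPs (illustration only).
made_up_p = [0.001, 0.02, 0.03, 0.004, 0.5, 0.2, 0.07, 0.9, 0.6, 0.11]
flags = significant_after_bonferroni(made_up_p)
```

With these made-up values only the first and fourth tests stay below 0.005, which mirrors the reviewer's concern that p-values around 0.02 to 0.03 would not survive a strict correction.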

Minor changes:

• Introduction, L3: please change the word “dozen” with another more scientific word

We wanted to emphasize that the number of interacting molecules is high, so we changed “dozen” to simply “many”.

• Introduction: in the sentence “In humans, polymorphism within the LTα …”, please define LT the first time you mention it.

We added the definition (marked in blue, as it was also requested by the Reviewer #1)

• Material and Methods: I ask the Authors a brief comment in the main-text about the sentence: “samples from 2005 were sequenced using 150 cycles, and samples from 2016 using 300 cycles.

The number of cycles is a characteristic of the sequencing run on the MiSeq machine. We changed the text to the perhaps more commonly used alternative: “using 2x75 paired-end kit, and samples from 2016 were processed using 2x150bp paired-end kit”

• Material and Methods: In the sentence: “In a second step, we excluded from the model variants with MAF<0.5 and those present in fewer than 5 animals (see Table S4), as we lacked statistical power to test for their effects”, I understand the reasons of a lack of statistical power, but, in my opinion, a new mutation may be (at the beginning) present in only one animal and with a MAF<0.5. Please comment this fact in the main-text.

Yes, we are aware that this is a usual problem with estimating the effect of rare alleles, and obviously a beneficial mutation will initially be present in a single animal (or a few animals). In practice, it is impossible to test for the effect of alleles with MAF <5% when the relative risk of disease is lower than 1.5 (Foulkes 2008), and such strong risks are rarely observed in wild systems. Although new methods to deal with rare variants have been developed (e.g. SKAT, Wu et al. 2011, Am J Hum Genet 89), they are not suitable for response variables with a Poisson distribution, such as parasite counts. We added this comment to the text.

Supplementary Tables 2: the Authors indicated that some primers contain degenerate nucleotides (such as W, Y, etc.); I ask the Authors to add a brief comment and an explanation of this in the main-text

We do not think it is necessary to comment, as degenerate primers, in particular those with a single degenerate site, are commonly used. Nonetheless, we added the word “degenerate” to the passage where we refer to Table S2.

Response to the Reviewer #3:

I do feel that the introduction could be strengthened by giving a bit more background to the study system and stating the rationale for looking at these particular cytokine genes in this host-parasite system.

We expanded the Introduction adding a passage describing the study system. We also added the hypothesis and the goals.

In addition, I think the methods could be a bit clearer on the details of the study system and statistical analyses.

Following the Reviewers' remarks, we clarified and completed the Methods with a focus on the statistical approach.

There are also some grammatical errors and typos, which I have tried to point out as best as I can despite the lack of line numbers. I go through these points in more detail, along with other comments that I hope will improve the clarity of the manuscript, below:

We greatly appreciate the effort and we apologize for the lack of line numbering, which we somehow missed in the submission process. The minor grammatical errors and typos were corrected in the text but, for simplicity, we did not mark them in colour. The text was sent to a professional English editing service.

Abstract: would be nice to add in the sample size into the abstract

We agree that the sample size should be provided, but the number of genotyped animals differed between the studied genes. Explaining this would take at least two sentences, which may be too complex for an abstract.

Abstract, line 9: I think this should read ‘one of them’ rather than ‘was of them’

Yes, this was an editing mistake, the sentence was corrected (correction in blue, as it was also pointed out by the Reviewer 1)

Abstract, line 10: would be nice to add in the type of selection observed

Added

Abstract, final sentence: it appears that there are more associations with coding (exonic) variants that affect parasite infection/load than intronic variants?

We clarified the sentence so that it accurately describes the results presented in the revised version of the paper.

Introduction, paragraph 2, line 6 and 11: might need to specify that selection was found in the genes (line 6) and variation was genetic variants in interleukin genes (line 11)

Thank you for pointing out this too-often used mental shortcut, of course the selection acts on genes, not on the proteins they code. We clarified this in the text.

Introduction, paragraph 2, line 12: ‘A’ single study. Might need to specify whether this has been investigated and not found, or if only one study has investigated non-coding regions

To clarify, we changed “A study” to “Only one study focusing on non-coding variance in free-living animal”.

Introduction, paragraph 2: in the next paragraph you explain what LTα is but it would be better to put this here on the first mention of this cytokine gene.

We explained the abbreviation here (as requested by the other Reviewers) but we opt to explain the function later, where we introduce LTα along with other two cytokines studied in the paper.

Introduction: I believe that the introduction would benefit from bringing in the study system and describing the types of pathogens that are present in the two populations of bank voles. What pathogens cause more morbidity or mortality in this system? It would then be nice to use this information to link into why you are studying these particular cytokine genes so that it feels more hypothesis driven. This would also help you to develop key expectations of the types of genes that might be important – from previous work for example by Turner et al in the field voles – and also what types of variants might be important (e.g. synonymous vs non-synonymous and exonic vs intronic).

We re-arranged the Introduction so that it now includes the paragraph describing the study system, which was formerly the second paragraph of the Methods. We added brief information about the fitness consequences of infections (although there are surprisingly few studies on that topic in wild systems). We also formulated the hypothesis better and clarified the aims.

Materials and methods, section 2.1: It would be important to add here the sample size of voles in the two years at each site into the main text (rather than a supplementary table). Later, post QC of SNPs it would be good to state from X number of individuals X were used in the final analyses.

We moved table S1 to the main text (now Table 1), and we amended it so that it shows number of samples collected / sequenced and number of samples included in the final analyses.

Materials and methods, section 2.1: Are any of the individuals in these populations related? Did you control for this at all?

We did not control for relatedness in the current paper. However, samples collected in 2005 and used in the current paper have been genotyped at 7 microsatellite loci; those results are reported in Kloch et al. 2010 and were later used in Kloch et al. 2018.

In the 2018 paper, we fitted the first and second principal components (PC1 and PC2) of the relatedness matrix in GLM models analysing links between genetic variants in TLR genes and susceptibility to infections. These two variables were not significant in any model, and their effect sizes were low.

In the current paper we did not include relatedness data as we lack microsatellite analysis for samples collected in 2016. However, based on the 2018 results we are pretty confident that relatedness did not affect the outcome of the current study.

Materials and methods, section 2.1: A brief discussion of the differences in parasite burden between the two sites would be good to have here (or in the introduction) so that this manuscript can stand alone.

We added a description of the study system to the Introduction and we explained the differences between sites. Generally, the main difference is the site-specific occurrence of H. mixtum and H. glareoli. The remaining GI nematodes are observed at all sites. Their abundance varies from year to year, but there is no consistent temporal pattern.

Materials and methods, section 2.1: In the introduction and discussion it would be important to introduce and compare respectively the associations you find here with those in the MHC and TLR genes in the same study system – are patterns of synonymous vs non-synonymous and exonic vs intronic similar? Are effect sizes larger in the MHC or TLR genes than in the cytokine genes

Both previous studies (MHC and TLR) included haplotypes, not individual SNPs, and comprised only exonic sequences. Since those studies focused on functional differences between variants, we analysed haplotypes rather than single nucleotide substitutions as in the current paper. In the MHC study we analysed whole haplotypes of exon 2 of MHC DRB, and in the TLR study we analysed amino-acid haplotypes. Hypothetically, we could extract the synonymous/non-synonymous data from our archives, but this would take much more time than is available for the review. Nonetheless, we added a passage referring to our previous results.

Materials and methods, page 5, first paragraph: why were a different number of cycles used?

Between the first and second sequencing runs, the prices and availability of Illumina kits changed, so in the second run we were able to afford longer reads (the number of cycles equals the read length).

Materials and methods, page 5, second paragraph: it would help to give a version and reference for both the bank vole genome and mouse genome used (for orthologues).

We added the reference to the mouse sequences (in blue as also requested by Reviewer #1). For bank vole reference, we used draft genome with BioProject accession no. PRJNA290429 which was described in the third paragraph of the section 2.2.

Materials and methods, page 5, second paragraph: it would help the reader if you explained what the abbreviations are that are used as your QC criteria in vcffilter.

We added a description, and now the passage reads: “The results were filtered […] with following conservative criteria: remove low quality calls (QUAL/AO > 10), remove loci with low read depth (DP > 10), remove alleles present in only one strand (SAF > 0 & SAR > 0), remove alleles that are only observed by reads placed to the left or right (RPR > 0 & RPL > 0).”
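The logic of the four criteria quoted above can be sketched as a simple predicate. This is an illustration only: the function name and the example records are hypothetical, each call is assumed to carry a single alternate allele (so every INFO field is one number), and the expressions in parentheses are read as the conditions a call must satisfy to be kept. It does not reproduce the authors' actual vcffilter invocation.

```python
def keep_call(qual, info):
    """Return True if a variant call satisfies all four criteria.

    qual: value of the QUAL column of the VCF record.
    info: dict with FreeBayes-style INFO fields AO, DP, SAF, SAR, RPR, RPL.
    """
    return (
        info["AO"] > 0
        and qual / info["AO"] > 10                  # quality per alternate observation
        and info["DP"] > 10                         # total read depth at the locus
        and info["SAF"] > 0 and info["SAR"] > 0     # allele seen on both strands
        and info["RPR"] > 0 and info["RPL"] > 0     # supporting reads placed on both sides
    )


# Hypothetical records (illustration only).
good = {"AO": 20, "DP": 45, "SAF": 11, "SAR": 9, "RPR": 8, "RPL": 12}
one_strand = {"AO": 20, "DP": 45, "SAF": 20, "SAR": 0, "RPR": 8, "RPL": 12}

kept = keep_call(600.0, good)
rejected = keep_call(600.0, one_strand)
```

The second record is rejected because the alternate allele is supported by reads from one strand only, which is the pattern the SAF/SAR criterion is meant to catch.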

Materials and methods, page 5, second paragraph, final sentence: would be nice to add in the number of genes and corresponding chromosomes

We are sorry but we don’t quite understand this comment. We studied only 3 genes. Did the Reviewer mean number of haplotypes/alleles found in our study or number of SNPs per gene? We added a short summary of the genetic variance in the studied loci in section 3.1, as requested also by Reviewer 1.

We cannot provide chromosomes, as the chromosome-resolved bank vole genome has not been released yet and the location of the studied genes on chromosomes in vole genome is unknown.

Materials and methods, page 5, section 2.3, first paragraph: I don’t really understand the last sentence about minimising false positives– could you clarify?

We agree that this sentence was unclear. We meant that it is hard to provide a statistically robust analysis when an individual with a rare allele has a rare parasite. An increased sample size might help, but we could not sample more animals for ethical reasons. Thus, we applied several filters removing variables with too few observations to produce reliable results. We rewrote this passage to make our point clear.

Materials and methods, page 5, section 2.3, second paragraph: do you mean you removed SNPs with a genotyping rate <100% and any individuals that had any missing SNP genotypes? Please give final sample sizes after QC

We clarified this by including the number of samples before and after filtering in Table 1.

Materials and methods, page 5, section 2.3, second paragraph: How many SNPs deviated from HWE? This might be expected if selection is occurring on these loci.

We agree that selection could affect HWE but there can be also other reasons for disequilibrium. We decided to remove loci not in HWE following recommendations for association analysis (PLINK manual for example). This step is advised as such variants may result from genotyping errors. Due to HWE, we removed 8 SNPs in TNF, 3 in LTa, and 0 in IFN. In tests of selection, we used all variants, regardless of HWE.
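A check of this kind can be illustrated with a one-degree-of-freedom chi-square goodness-of-fit test for a biallelic SNP. The response does not state which HWE test was used, so the chi-square version below is an assumption chosen for simplicity (association toolkits often use an exact test instead), and the function name and genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP.

    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts in 100 animals (illustration only).
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)     # exactly at HWE proportions
chi2_bad, p_bad = hwe_chi_square(40, 20, 40)   # strong heterozygote deficit
```

A locus like the second one would be removed from the association models, while, as stated above, it would still be retained in the tests of selection.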

Materials and methods, page 6, first paragraph: I think there is a typo – surely minor allele frequency must be <0.5?

Yes, this was a typo, we meant 0.05.

Materials and methods, page 6, first paragraph: rather than put the pathogens in a supplementary table it would be important to specifically list each pathogen looked at and ideally the % prevalence. Histograms of parasite load would be nice and should go in supplement. I think this is really important because it is not immediately obvious to the reader how many pathogens you are looking at and therefore it is hard to put the results into context.

We moved Table S5, which shows prevalence in each sample set (i.e. in samples genotyped at each locus), to the main text; it is now Table 2. We also added a passage summarizing these data. Graphs of parasite loads are in Figure S1.

Materials and methods, page 6, second paragraph: some details of the models are missing including the type of link, how significance of fixed effects was determined, how post-hoc testing was done between SNP genotypes, which package was used for models etc. How did you control for overdispersion (and presumably zero-inflation) in the poisson model? Or did the model fit without other means of controlling for overdispersion? Did every pathogen abundance/load model suitably fit a poisson model? Did you try running these in a hurdle model which should combine the two models you mention (parasite infection y/n and parasite load)? Did you test for any interactions between SNPs and non-genetic sources such as sex or site?

We wished to make this section concise, but we admit that several details were indeed missing, and we completed section 2.3. Parasite presence/absence was modelled using a binomial distribution with a logit link function, and abundance was fitted using a Poisson distribution and a log link function. To control for overdispersion, we used quasi-Poisson errors implemented in the glm function in the R library {stats}. The significance of terms was determined using LR type III tests. We tested for interactions, but we lacked the df to do that properly, i.e. not all combinations of alleles and non-genetic factors were present in the data, resulting in a poor fit of the models.

As explained in the letter to the Editor, before submitting the original paper we examined several statistical approaches (e.g. mixed models with non-genetic data as fixed terms, stepwise-simplified models, models with a negative binomial distribution for count data instead of Poisson, etc.) and concluded that all produced similar results. They all indicated the same set of SNPs to be significant, although the exact p-values varied slightly between models. We present the models with the simplest structure, as in more complicated models we encountered problems related to data structure, such as separation or singularity. The fact that the same SNPs were significant regardless of the model structure makes us confident that the presented effects are valid.
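The role of the quasi-Poisson errors mentioned above can be illustrated by estimating the dispersion parameter as the Pearson chi-square divided by the residual degrees of freedom. This is a sketch under stated assumptions: the models in the paper were fitted with glm in R, whereas the function name, the worm counts, and the intercept-only fit below are hypothetical and written in Python purely for illustration.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Estimate the dispersion as Pearson chi-square / residual degrees of freedom.

    observed: parasite counts; fitted: model-predicted means (all > 0);
    n_params: number of estimated model coefficients.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("not enough observations for the number of parameters")
    pearson = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return pearson / resid_df


# Hypothetical worm counts; an intercept-only Poisson model fits the mean to every animal.
counts = [0, 0, 1, 2, 0, 10, 0, 3]
mean_count = sum(counts) / len(counts)
phi = pearson_dispersion(counts, [mean_count] * len(counts), n_params=1)
```

A value close to 1 is what a Poisson model expects. Values well above 1, as in this made-up example, indicate overdispersion; under quasi-Poisson errors the standard errors of the coefficients are inflated by the square root of this estimate.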

Materials and methods, page 6, section 2.4: it would be useful to the reader if you could explain how these codon-based tests identify sites under selection, even briefly.

We added a brief explanation of what the tests do. We also clarified the results of those tests in section 3.2.

Results: I think it would be useful to explicitly state each of the null results (for both models and each pathogen) you found too rather than just the significant results. It would also be good to state X number of SNPs out of X total SNPs in a gene were associated with a trait – as you did with TNF in the first sentence. It would also be good to state whether SNP effects were additive, and to compare the effects of different SNPs (on the same parasite) in the discussion.

We rewrote this section of results following the suggestion.

Results: There is some inconsistency in how you explain the results. For some SNPs you give the average number of worms in each genotype class but not for others – it would be good to provide this for all. Also it would be good to have errors around these values and to explain how these were calculated in the methods – is it from the raw data or post hoc testing controlling for other non-genetic variables in the model?

We clarified the description of the results in the text and in Fig. 1.

Results: I would not discuss a result that did not meet the p value after correction for multiple testing.

We removed this part of the discussion

Results: It would be nice to have full results from each model (model estimates, errors and p values for each SNP and for non-genetic variables too and post hoc testing between genotypes) reported somewhere, even if in the supplement. You could also compare the effects of non-genetic sources to SNPs in each model

Full results from each model are in Table S5. We added parameter estimates and effect sizes.

Results, page 7, second paragraph: what does average high mean?

Clearly this was an editing error; in the revised version we rewrote the sentence.

Figure 1: Explain bar chart colours and outliers and labels. A stacked bar chart might be a better option.

In the caption we explained the colours and defined the outliers. We also changed the bars to stacked bars.

Discussion: it would be nice if the discussion was reformatted to follow the same format as the results – starting with SNPs associated with infection or parasite load, followed by discussion of selection on these variants

We reformatted as requested. The discussion is now divided into three sub-sections (the extra one that does not follow the Result pattern is about the role of non-coding variants).

Discussion, page 8, line 3: I would rephrase to ‘genetic variation in cytokine genes also plays’ rather than ‘cytokine variance’ as this sounds like you are actually measuring cytokine levels

We rephrased the sentence

Discussion, page 8, paragraph 2: there are a lot of ‘what’s in this paragraph that should be ‘which’

We corrected that.

Discussion, page 8, paragraph 2: I wouldn’t comment on the effect on Bartonella sp. as I think it was non significant after correction for multiple testing

After applying the Bonferroni correction this result turned non-significant, and we removed Bartonella from the discussion.

Discussion, page 8, paragraph 2: following on from the final sentence it would be nice to have some insight into how these pathogens affect morbidity and mortality in this species or a similar species

We added a paragraph describing effect of nematode infections on host fitness in rodents.

Discussion, page 8, paragraph 3: This is a nice discussion of the role of synonymous mutations. Thank you

Thank you :)

Discussion: some more discussion of other wild systems (e.g. Turner et al’s work in the field voles) and comparisons to effects of the MHC and TLR variants previously investigated in this system would strengthen the manuscript. It would also be nice to add whether variants in IFNβ1 and LTα (or orthologues) have previously been implicated in susceptibility to parasitic diseases in other species.

We elaborated this part of the Discussion.

Discussion, page 9: the conclusion feels a bit rushed. While there is an intronic and a synonymous SNP, it appears that more associations with exonic and non-synonymous SNPs and parasite infection/load – so can you say that intronic and synonymous variants play a (more) important role? Or are you just saying that they are also present and should not be ruled out?

In the Conclusion, we added a brief summary of the key findings. At the moment, we cannot say how important their role is, but our findings clearly suggest that they should not be disregarded in future studies. We clarified our point.

Discussion: some discussion of the advantages and disadvantages of candidate gene studies would be helpful to the reader

Having dealt with both candidate-gene studies and genome-wide scans in our work, we see this topic as too broad to be included in the Discussion. To discuss the pros and cons we would need to go into technical details, and we think that this would be off-topic with respect to the main theme of the paper.

Attachment

Submitted filename: READYrebuttal.docx

Decision Letter 1

Johan R Michaux

1 Sep 2022

PONE-D-21-29710R1

Cytokine gene polymorphism and parasite susceptibility in free-living rodents: importance of non-coding variants

PLOS ONE

Dear Dr. Kloch,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

All reviewers agree that your manuscript is interesting, but two of them consider that it still needs major revisions before final acceptance.

Two important aspects in particular should be corrected:

- the first one concerns the risks of false positives and the interest to use multiple testing corrections. One reviewer would be happy if you could add some text to the discussion to indicate the caveats of this approach and the follow up studies needed to confirm any associations which are observed in this study.

- the second one concerns positive selection analyses. Positive selection analysis cannot be performed on exon-intron sequences and authors need to redo the analysis only on CDS regions and re-interpret results where necessary.

Different other comments are given in the new reviews. I suggest you to consider all of them, as they will improve the quality of your study.

Please submit your revised manuscript by Oct 16 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Johan R. Michaux

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

Reviewer #3: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

Reviewer #3: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I appreciate that authors have carefully addressed my comments and significantly improved the quality of the manuscript. I now have one major comment concerning positive selection analysis that I unfortunately missed previously. Positive selection analysis cannot be performed on exon-intron sequences and authors need to redo the analysis only on CDS regions and re-interpret results where necessary. Please, find some other comments below.

1) L49, Parasites may affect the genetic variation of their hosts through three main mechanisms.

There are also some other “main” mechanisms, such as positive or purifying selection. A better opening sentence for BS is needed.

2) L127, All three sites are on public ground managed by the Polish State Forests and no specific permission to access the land was required.

This sentence can be omitted.

3) L171, A fraction of SNPs (5 of 36 in TNF, and 6 of 18 in LTα) could not be phased by FreeBayes.

To avoid confusion, it would be better to use “reconstructed” rather than “phased”. Also, it would be better to write “a fraction of alleles…”

4) Table 1 can be better placed in SI as it shows only supportive information.

5) L233, We overcome this drawback by using two additional tests (MEME and FUBAR) to test for episodic selection.

FUBAR detects only pervasive selection. Please correct the statement.

5) L238-239 should be moved to the methods.

6) L346-349. This is a redundant sentence, as the mechanism of NFSD is already well explained in the introduction.

7) L357, Given the absence of revealed recombination, those results must be interpreted with caution. Without direct functional testing using mouse models, the revealed association between parasite load and particular SNPs must be interpreted with caution as it can be caused by some other adjacent variation. Please, incorporate this point in the discussion.

7) L642,

It seems that the authors analysed the pattern of positive selection across whole exon-intron sequences. This is misleading, since dN/dS-based selection methods are intended only for protein-coding data. Introns are under a different mode of evolution. I ask the authors to perform selection analyses only on protein-coding data, i.e. the CDS region, and to change the interpretation of the data where necessary. Also, please explicitly mention in the methods that the selection analysis was performed only on exonic sequences.

8) Figure 2

Why do authors not provide location of sites under selection also for TNF?

Reviewer #2: (No Response)

Reviewer #3: I reviewed the earlier version of the manuscript and I am pleased to see that the majority of comments and suggestions from the reviewers have been addressed. I would like to highlight, however, that the reviewing process would have been easier if the authors had added line numbers corresponding to each of their edits for each of their responses to the editor's and reviewers' comments.

The manuscript reads much better following revision, and I do believe that all the sections of the manuscript have been strengthened. However, I am still slightly skeptical about the associations since population structure can lead to false positives. Ideally, population structure would be accounted for in some way, or a set of control genes would also be included to check whether the test statistic (lambda) follows the null distribution or whether it is inflated. I am aware however that the authors do not have this data, and I am also aware that there are a lot of candidate gene studies out there that do not account for population structure, and I do believe that multiple testing correction in the revision has reduced the chance of false positives. As a result, I would be happy if the authors added some text to the discussion to indicate the caveats of this approach and the follow up studies needed to confirm any associations they see.

Comments to the authors response to the editor and reviewers:

• In terms of the non-genetic sources of variation to include in the model: I would usually advocate keeping all terms in the model as a more conservative approach. However, I understand that overfitting is a problem, particularly when the sample sizes are low. I would trust the authors' assertion that the models were similar if they added the alternative models with all fixed effects fitted (non-genetic sources and SNPs) to the supplement. I also think it would have been an easier approach to test the fixed effects for each parasite and each parasite model (absence or parasite load), rather than checking for each parasite model/gene subset, as they could be significant in one and not the other just due to small changes in sample size.

• I do feel that the results are strengthened since applying the Bonferroni correction. I think ideally the number of tests would be - the number of parasites tested (5?) x number of parasite models (2) x number of genes (as SNPs within a gene might not be independent?) or SNPs. I would be interested to see what the other reviewers think of the criterion of 10 tests because I understand that my suggestion would be very, very conservative. I would also like to note that given the current criterion the association between LTa 322 and A. tianjinensis is technically not significant (p=0.0054) and reference to this association should be removed.

• I would add the data availability statement into the manuscript

• You mention that you exclude variants (genotypes?) present in fewer than 5 animals but then I am unsure why LTa 322 is included since it has no CC genotype (or had 1 animal which was removed)?

• I feel that the introduction is much improved by the addition of information about the study system. I do feel as though some hypotheses are a bit vague – as well as the statement that the authors focused on these genes as “little is known”. I did expect some more references to human GWAS which may have found SNPs associated with these genes and parasite/pathogen infections, although perhaps there are none. Here it would be better to specifically indicate the parasite species studied rather than e.g. ‘blood microparasites’.

• Page 2, line 108 – parasite load of what species?

• The statistical methods are a lot clearer now, thank you.

• I appreciate the re-writing of the conclusion when I queried how important the non-coding variants are – I wonder also if this part of the title could be removed as you did find more coding variants.

• I appreciate that the authors brought in the table of parasite species (Table 2) into the main manuscript in response to my query that I wanted the parasite species tested mentioned specifically in the text. However, I do feel like there is a lot of information in this table for the main manuscript. I would really prefer if the authors just listed the species studied directly in the text (referring to the type of parasite they are too) and put this table back in the Supp. I suggest this because it is much easier for readers to work out which species are looked at than having to look up the table. I also found the table difficult to interpret because I was unclear if parasite species were included if it was 20-80% infections across all gene subsets or only if in any – and if the latter – did you only run models where prevalence was > 20% for a given parasite/ gene combination (i.e. was H. glareoli only run with IFN variants and not the other two genes?). Therefore, I feel like it would be helpful to add the parasite species tested directly to the text as it was not immediately clear to me. If the authors elected to keep the table in the main manuscript you could remove the numbers infected and non-infected (as you could work this out from Table 1 from the % prevalence) to make it simpler. Either way I would highlight the species you tested in bold to help the reader.

Other comments

• I would add the parasite species tested in the abstract. I imagine most people would be looking specifically for genetic variants associated with a given parasite (or closely related parasite) so this would help their search.

• Page 3, line 74 – “non-coding variance” = non-coding variants

• Add how species abundance (not just prevalence) was measured to the methods.

• Add season/month of sample collection to methods.

• Methods - are quasi-binomial and quasi-poisson not two different types of model?

• Results – I would add null results for each parasite/gene combination.

• Table 3 – I am glad the authors have put the full model results in the main manuscript. However, the formatting is not as nice as the old table and I am unclear why there are lines around host body mass. I would recommend adding a reference to the supp table with the other (non-sig) results to the legend. There is a reference to padj which isn’t reported in the table.

• I think it would be nice to add to the summary at the start of the discussion which genes and which parasite species tested were not significant.

• Page 10, line 291 – change individuals to worms

• Page 10, line 298 – SNP variance = SNP variants

• Page 10, line 305 – generation *of* T cells

• Page 10, line 310 – but you also investigated the association between TNF and other non-nematode infections?

• Page 11, line 330 – strengthened

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

Reviewer #3: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2023 Jan 24;18(1):e0258009. doi: 10.1371/journal.pone.0258009.r004

Author response to Decision Letter 1


31 Oct 2022

Reviewer #1:

Changes in text following Reviewer #1's suggestions are marked in blue.

We refer to line numbers in “track changes” version of the revised manuscript.

I appreciate that authors have carefully addressed my comments and significantly improved the quality of the manuscript.

Thank you. We are glad that we managed to address the Reviewer's remarks.

I now have one major comment concerning positive selection analysis that I unfortunately missed previously. Positive selection analysis cannot be performed on exon-intron sequences and authors need to redo the analysis only on CDS regions and re-interpret results where necessary.

Thank you for pointing out this important mistake. We calculated the tests again using only exonic fragments, changing relevant parts in Methods and Results (lines 290-297). (There were no changes in IFNb1, as it consisted of exonic part only).

1) L49, Parasites may affect the genetic variation of their hosts through three main mechanisms.

There are also some other “main” mechanisms, such as positive or purifying selection. A better opening sentence for BS is needed.

We agree with the remark, although summarizing such a complex idea in a single sentence is challenging. Nonetheless, we attempted to better introduce BS (lines 48-50).

2) L127, All three sites are on public ground managed by the Polish State Forests and no specific permission to access the land was required. This sentence can be omitted.

This sentence was added upon request of the Editor: „In your Methods section, please provide additional information regarding the permits you obtained for the work. Please ensure you have included the full name of the authority that approved the field site access and, if no permits were required, a brief statement explaining why.”

3) L171, A fraction of SNPs (5 of 36 in TNF, and 6 of 18 in LTα) could not be phased by FreeBayes. To avoid confusion, it would be better to use “reconstructed” rather than “phased”. Also, it would be better to write “a fraction of alleles…”

By using the term “a fraction of SNPs” we wanted to depict the situation precisely. In the vcf file resulting from our pipeline, most of the SNPs were phased (i.e. assigned to a given DNA strand) based on their positions on sequenced Illumina fragments (reads). If two SNPs are located in the same read, FreeBayes assumes that they originated from the same DNA strand. This method does not work for SNPs that are separated from other SNPs by more than a read length and thus never occur in a read with any other SNP. For these SNPs we had to use the PHASE algorithm, which assigned them computationally to given DNA strands.

We clarified the sentence, so now it reads “a fraction of SNPs (5 of 36 in TNF, and 6 of 18 in LTα) could not be phased by FreeBayes; these were computationally assigned to DNA strands using PHASE (Stephens et al. 2001). Reconstructed alleles were converted to fasta format ..” (lines 188-190).
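
The read-based criterion described in this response can be illustrated with a minimal sketch. It is not the authors' pipeline and does not call FreeBayes or PHASE; the function name, SNP coordinates, and read length are hypothetical, and the sketch assumes single-end reads, so that two SNPs can share a read only if they lie less than one read length apart.

```python
def split_by_read_phasing(snp_positions, read_length):
    """Separate SNPs that can share a read with a neighbour from isolated ones.

    Simplifying assumption: single-end reads of fixed length, so two SNPs
    can be covered by the same read only if they are < read_length apart.
    """
    positions = sorted(snp_positions)
    read_phased, isolated = [], []
    for i, pos in enumerate(positions):
        # Distances to the nearest SNP on the left and on the right, if any.
        gaps = []
        if i > 0:
            gaps.append(pos - positions[i - 1])
        if i < len(positions) - 1:
            gaps.append(positions[i + 1] - pos)
        if gaps and min(gaps) < read_length:
            read_phased.append(pos)   # can be phased from read evidence
        else:
            isolated.append(pos)      # needs statistical phasing instead
    return read_phased, isolated


# Hypothetical SNP coordinates (bp) and read length, for illustration only.
positions = [120, 180, 230, 900, 1500, 1560]
phased, isolated = split_by_read_phasing(positions, read_length=150)
print("read-phased:", phased)
print("isolated:", isolated)
```

In this toy example only the SNP at position 900 has no neighbour within a read length, which corresponds to the kind of site that had to be assigned to a strand computationally.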

4) Table 1 can be better placed in SI as it shows only supportive information.

The table was initially in the SI but was placed in the main text at the request of Reviewer #3: “Materials and methods, section 2.1: It would be important to add here the sample size of voles in the two years at each site into the main text (rather than a supplementary table).” We can move it back to the SI if both Reviewers (#1 and #3) agree.

5) L233, We overcome this drawback by using two additional tests (MEME and FUBAR) to test for episodic selection. FUBAR detects only pervasive selection. Please correct the statement.

We agree with this remark. After consideration, we decided to remove two final sentences of this paragraph, as the principles of all the tests are described above. We kept the information that FEL assumes constant selection pressure across phylogeny.

5) L238-239 should be moved to the methods.

This passage was added at the request of Reviewer #1 in the first round of reviews: “*P6, 3.1. Cytokine polymorphisms and susceptibility to infections. I would add some general summary of polymorphism revealed, ideally with a table in SI with values such as number of sequences, number of unique nucleotide alleles, number of variable sites, number of substitutions to see how much are cytokines variable.” Nonetheless, in the current revision, we moved it to the end of section 2.2, lines 196-199.

6) L346-349. This is a redundant sentence, as the mechanism of NFSD is already well explained in the introduction.

This was added at the request of Reviewer #1: “P8, L7 Please, clarify what is meant by repeated rounds of positive selection interspersed with purifying selection.” However, as the explanation of NFSD was also added to the introduction, we removed this redundant fragment.

7) L357, Given the absence of revealed recombination, those results must be interpreted with caution. Without direct functional testing using mouse models, the revealed association between parasite load and particular SNPs must be interpreted with caution as it can be caused by some other adjacent variation. Please, incorporate this point in the discussion.

We agree. We added this point to the discussion, lines 373-376.

7) L642,

It seems that the authors analysed the pattern of positive selection across whole exon-intron sequences. This is misleading, since dN/dS-based selection methods are intended only for protein-coding data. Introns are under a different mode of evolution. I ask the authors to perform selection analyses only on protein-coding data, i.e. the CDS region, and to change the interpretation of the data where necessary. Also, please explicitly mention in the methods that the selection analysis was performed only on exonic sequences.

Thank you for pointing out this mistake. We have now recalculated the site-selection tests using exonic sequences only in LTa and TNF (IFNb1 did not contain introns). We changed the relevant parts of Methods (l. 226) and Results (section 3.2), including Fig 2.
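
As an illustration of why codon-based tests need exon-only input, the sketch below concatenates exons from a gene sequence and checks that the result can be read in whole codons. It is a toy example under stated assumptions, not the procedure used in the manuscript: the sequence, the exon coordinates (0-based, end-exclusive), and the function names are hypothetical.

```python
def extract_cds(genomic_seq, exons):
    """Concatenate exon intervals (0-based, end-exclusive) into one CDS."""
    cds = "".join(genomic_seq[start:end] for start, end in sorted(exons))
    # dN/dS methods read the alignment codon by codon, so the coding
    # sequence must have a length divisible by three.
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of 3; check exon boundaries")
    return cds


def codons(cds):
    """Split a coding sequence into consecutive codons."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


# Toy gene: exons in upper case, an 11-bp intron in lower case.
genomic = "ATGGCC" + "gtaagtttcag" + "AAATGA"
exons = [(0, 6), (17, 23)]  # hypothetical exon coordinates

cds = extract_cds(genomic, exons)
print(cds)
print(codons(cds))
```

If the intron were left in, the reading frame downstream of the first exon would shift and the downstream "codons" would no longer correspond to amino acids, which is why synonymous and non-synonymous changes cannot be classified on exon-intron sequences.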

8) Figure 2

Why do authors not provide location of sites under selection also for TNF?

Initially, Fig 2 included only the genes in which we found significant associations between particular SNPs and parasite load. We agree that this may be confusing, so we added TNF to the Figure.

Reviewer #3:

Changes in text following Reviewer’s #2 suggestions are marked in green.

We refer to line numbers in “track changes” version of the revised manuscript.

I reviewed the earlier version of the manuscript and I am pleased to see that the majority of comments and suggestions from the reviewers have been addressed. I would like to highlight, however, that the reviewing process would have been easier if the authors had added line numbers corresponding to each of their edits for each of their responses to the editor's and reviewers' comments.

Thank you for your appreciation. In this revision, we have added line numbers corresponding to the edits.

However, I am still slightly skeptical about the associations since population structure can lead to false positives. Ideally, population structure would be accounted for in some way, or a set of control genes would also be included to check whether the test statistic (lambda) follows the null distribution or whether it is inflated. I am aware however that the authors do not have this data, and I am also aware that there are a lot of candidate gene studies out there that do not account for population structure, and I do believe that multiple testing correction in the revision has reduced the chance of false positives. As a result, I would be happy if the authors added some text to the discussion to indicate the caveats of this approach and the follow up studies needed to confirm any associations they see.

We are grateful for the Reviewer's understanding that some factors are difficult (or impossible) to control in field data. We did our best to make our results as robust as possible. We added a paragraph discussing the caveats of our study design at the end of section 4.1.
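
For readers unfamiliar with the inflation statistic (lambda) mentioned by the Reviewer, the following minimal sketch shows how it is commonly computed from a set of association p-values: the median observed 1-df chi-square statistic divided by its expected median under the null. The p-values below are synthetic placeholders, not data from this study, and the function name is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed chi-square (1 df) divided by its null expectation."""
    nd = NormalDist()
    # A two-sided p-value p corresponds to a 1-df chi-square statistic equal
    # to the squared standard-normal quantile at 1 - p/2 (requires 0 < p <= 1).
    chi2 = [nd.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455
    return median(chi2) / expected_median


# Synthetic p-values evenly spread over (0, 1): what a well-behaved set of
# control loci would be expected to look like under the null.
null_p = [(i + 0.5) / 101 for i in range(101)]
print(genomic_inflation(null_p))

# Synthetic p-values skewed towards zero, mimicking inflation, e.g. from
# unaccounted population structure.
inflated_p = [p ** 2 for p in null_p]
print(genomic_inflation(inflated_p))
```

A value close to 1 indicates that test statistics follow the null distribution; values well above 1 suggest systematic inflation. Estimating lambda reliably requires many loci that are assumed to be unassociated with the trait, which is the kind of control set the Reviewer notes is unavailable here.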

In terms of the non-genetic sources of variation to include in the model: I would usually advocate keeping all terms in the model as a more conservative approach. However, I understand that overfitting is a problem, particularly when the sample sizes are low. I would trust the authors' assertion that the models were similar if they added the alternative models with all fixed effects fitted (non-genetic sources and SNPs) to the supplement. I also think it would have been an easier approach to test the fixed effects for each parasite and each parasite model (absence or parasite load), rather than checking for each parasite model/gene subset, as they could be significant in one and not the other just due to small changes in sample size.

We added models with all genetic terms in Table S7. Please note that for IFN there is no factor “year”, as for this gene we genotyped only samples from 2005. Similarly, there is no “year” for Cryptosporidium, which was analysed only in samples from 2005.

I do feel that the results are strengthened since applying the Bonferroni correction. I think ideally the number of tests would be - the number of parasites tested (5?) x number of parasite models (2) x number of genes (as SNPs within a gene might not be independent?) or SNPs. I would be interested to see what the other reviewers think of the criterion of 10 tests because I understand that my suggestion would be very, very conservative. I would also like to note that given the current criterion the association between LTa 322 and A. tianjinensis is technically not significant (p=0.0054) and reference to this association should be removed.

Thank you for raising this point. Prior to publication, we discussed among ourselves what we should consider multiple comparisons. For instance, we tested for linkage between SNPs within a locus, as described in the first paragraph of section 2.3, so technically they should be independent; yet statistically, the more explanatory variables there are, the higher the probability that any of them is significant by chance. Thus, we used the criterion of 10, as this was the number of explanatory variables whose effect we were interested in (the non-genetic terms were included only to control for their contribution to the observed variance in parasite load).

It should probably be clarified that, because prevalence was <20% in some combinations of loci and parasites, we ran 23 models in total, as shown in the table below. (This information was added in line 220.) If we consider all combinations of parasite x type of test x SNP as multiple comparisons, this gives a total of 74 “comparisons”, resulting in a p-value threshold of 0.05 / 74 = 0.00068. Even under this super-conservative criterion, the association between IFNβ 105 and prevalence of H. glareoli remains significant.

Table. Combinations of parasites / genes tested in the GLM models. In total we run 23 tests. When each SNP is considered, this gives 74 comparisons.

Presence / absence: 16 models. Abundance: 7 models.

| Parasite / pathogen | Presence/absence: TNF | Presence/absence: LTα | Presence/absence: IFNβ1 | Abundance: TNF | Abundance: LTα | Abundance: IFNβ1 |
| Aspiculuris tianjensis | 1 SNP | 7 SNP | 2 SNP | 1 SNP | 7 SNP | 2 SNP |
| Heligmosomum mixtum | 1 SNP | 7 SNP | 2 SNP | 1 SNP | 7 SNP | 2 SNP |
| Heligmosomoides glareoli | - | - | 2 SNP | - | - | 2 SNP |
| Cryptosporidium sp. | 1 SNP | 7 SNP | 2 SNP | - | - | - |
| Babesia microti | 1 SNP | 7 SNP | 2 SNP | - | - | - |
| Bartonella sp. | 1 SNP | 7 SNP | 2 SNP | - | - | - |

However, we doubt whether models of abundance and presence/absence should be considered as multiple comparisons, as biologically each of them involves a different hypothesis. Resistance against parasites (i.e. preventing a pathogen from colonizing the host) involves different immunological mechanisms than dealing with the number of pathogens once the infection has happened. If we accept this assumption, the threshold p-value for presence/absence tests is 0.05 / 52 (10 SNPs tested in 5 parasites + 2 SNPs tested for H. glareoli) = 0.00096, and for abundance this value is 0.05 / 22 (10 SNPs for two parasites plus 2 SNPs tested for H. glareoli) = 0.0023. Again, with those thresholds our results remain valid.

Finally, we think that corrections for multiple comparisons can be applied only to explanatory variables, as we are interested in minimizing the risk of false positives. Models with different parasite species as response variables do not fulfil this criterion, as finding an association between a genetic marker and susceptibility to disease A is independent of susceptibility to disease B, at least statistically, if we do not assume any underlying biological mechanisms that may link these two diseases. Thus, we simply corrected for the number of loci tested. Such an approach is recommended in human GWAS studies, for instance in the PLINK manual.
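
The counts and thresholds quoted in this response can be re-derived from the table of tested parasite/gene combinations. The sketch below is only an arithmetic check, not the authors' code; the dictionary layout and variable names are hypothetical, while the SNP counts per gene and the tested combinations are taken from the table.

```python
ALPHA = 0.05
SNPS_PER_GENE = {"TNF": 1, "LTa": 7, "IFNb1": 2}
ALL_GENES = ("TNF", "LTa", "IFNb1")

# Genes tested for each parasite, as listed in the table above.
presence_models = {
    "Aspiculuris tianjensis": ALL_GENES,
    "Heligmosomum mixtum": ALL_GENES,
    "Heligmosomoides glareoli": ("IFNb1",),
    "Cryptosporidium sp.": ALL_GENES,
    "Babesia microti": ALL_GENES,
    "Bartonella sp.": ALL_GENES,
}
abundance_models = {
    "Aspiculuris tianjensis": ALL_GENES,
    "Heligmosomum mixtum": ALL_GENES,
    "Heligmosomoides glareoli": ("IFNb1",),
}


def count_models(models):
    """One GLM per parasite/gene combination."""
    return sum(len(genes) for genes in models.values())


def count_comparisons(models):
    """One comparison per SNP in every parasite/gene combination."""
    return sum(SNPS_PER_GENE[g] for genes in models.values() for g in genes)


n_models = count_models(presence_models) + count_models(abundance_models)
n_presence = count_comparisons(presence_models)
n_abundance = count_comparisons(abundance_models)
n_total = n_presence + n_abundance

print("models:", n_models)
print("comparisons:", n_total, "threshold:", round(ALPHA / n_total, 5))
print("presence/absence:", n_presence, "threshold:", round(ALPHA / n_presence, 5))
print("abundance:", n_abundance, "threshold:", round(ALPHA / n_abundance, 4))
```

The totals reproduce the 23 models, the 74 comparisons, and the three Bonferroni thresholds given in the text (0.00068, 0.00096, and 0.0023).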

We agree that the association between LTa 322 and A. tianjinensis is weak and should not be reported so we adjusted Results and Discussion.

I would add the data availability statement into the manuscript

Initially, it was included in the PLOS submission forms, but it seems it did not appear in the manuscript. We added the statement to the main text (lines 451-455).

You mention that you exclude variants (genotypes?) present in fewer than 5 animals but then I am unsure why LTa 322 is included since it has no CC genotype (or had 1 animal which was removed)?

We removed this one individual with the genotype CC from the analysis so that we still can test for the effect of LTa322.

I feel that the introduction is much improved by the addition of information about the study system. I do feel as though some hypotheses are a bit vague – as well as the statement that the authors focused on these genes as “little is known”. I did expect some more references to human GWAS which may have found SNPs associated with these genes and parasite/pathogen infections, although perhaps there are none. Here it would be better to specifically indicate the parasite species studied rather than e.g. ‘blood microparasites’.

We added some text to this paragraph to make it more specific. Although we did not intend to cite many human GWAS studies, as they usually have much greater power and a different design compared to studies of wild systems, we included a reference to a review by Khan and Qidwai 2011 (line 123). We rephrased the sentence in line 97 (“little is known”), and we reformulated the hypotheses.

Page 2, line 108 – parasite load of what species?

We rephrased the hypothesis and we made it more specific (lines 123-125 and 129-134).

The statistical methods are a lot clearer now, thank you.

We are happy that we succeeded in making the text clearer.

I appreciate the re-writing of the conclusion when I queried how important the non-coding variants are – I wonder also if this part of the title could be removed as you did find more coding variants.

We agree to remove “non-coding variants” from the title, subject to the approval of the Associate Editor.

I appreciate that the authors brought in the table of parasite species (Table 2) into the main manuscript in response to my query that I wanted the parasite species tested mentioned specifically in the text. However, I do feel like there is a lot of information in this table for the main manuscript. I would really prefer if the authors just listed the species studied directly in the text (referring to the type of parasite they are too) and put this table back in the Supp. I suggest this because it is much easier for readers to work out which species are looked at than having to look up the table. I also found the table difficult to interpret because I was unclear if parasite species were included if it was 20-80% infections across all gene subsets or only if in any – and if the latter – did you only run models where prevalence was > 20% for a given parasite/ gene combination (i.e. was H. glareoli only run with IFN variants and not the other two genes?). Therefore, I feel like it would be helpful to add the parasite species tested directly to the text as it was not immediately clear to me. If the authors elected to keep the table in the main manuscript you could remove the numbers infected and non-infected (as you could work this out from Table 1 from the % prevalence) to make it simpler. Either way I would highlight the species you tested in bold to help the reader.

We put Table 2 back in the Suppl Mat (now S4). We ran tests only where prevalence was >20% for a given gene/parasite combination; we clarified this in the text and summarized the models tested (lines 220-226).

I would add the parasite species tested in the abstract. I imagine most people would be looking specifically for genetic variants associated with a given parasite (or closely related parasite) so this would help their search.

Added.

Page 3, line 74 – “non-coding variance” = non-coding variants

Corrected

Add how species abundance (not just prevalence) was measured to the methods.

Worms of each species were counted, and the counts were used as abundance. We added this information in line 135.

Add season/month of sample collection to methods.

Added. In both years it was September (line 138).

Methods - are quasi-binomial and quasi-poisson not two different types of model?

Yes, they are. The sentence may have been confusing, so we clarified it: “in abundance models we used quasi-Poisson errors and in presence models quasi-binomial errors” (line 235).

Results – I would add null results for each parasite/gene combination.

We added a suitable paragraph to the first part of the section 3.1. (lines 266-273)

Table 3 – I am glad the authors have put the full model results in the main manuscript. However, the formatting is not as nice as the old table and I am unclear why there are lines around host body mass. I would recommend adding a reference to the supp table with the other (non-sig) results to the legend. There is a reference to padj which isn’t reported in the table.

The lines were editing errors. We corrected the caption. In the revised version we removed the results for LTa 322 (marginally non-significant, p=0.0052), which prompted us to reformat the table.

I think it would be nice to add to the summary at the start of the discussion which genes and which parasite species tested were not significant.

Added (lines 304-309)

Page 10, line 291 – change individuals to worms

Corrected

Page 10, line 298 – SNP variance = SNP variants

corrected to „variants”

Page 10, line 305 – generation *of* T cells

corrected

Page 10, line 310 – but you also investigated the association between TNF and other non-nematode infections?

Yes, we clarified this (line 337)

Page 11, line 330 – strengthened

corrected

Attachment

Submitted filename: 2response_rev.docx

Decision Letter 2

Johan R Michaux

8 Jan 2023

Cytokine gene polymorphism and parasite susceptibility in free-living rodents: importance of non-coding variants

PONE-D-21-29710R2

Dear Dr. Kloch,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. We would just like you to take into account the last minor comments suggested by one of the reviewers.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Johan R. Michaux

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I again appreciate that the authors carefully integrated my comments. Now I only have a few minor comments.

L30, The MHC and TLR abbreviations should be explained at first mention, both here and in the introduction.

L33, Latin name can be added if there is space

L50 Please, add "mutually not exclusive mechanisms"

L59, Please add the reference Acevedo-Whitehouse K, Cunningham AA. Is MHC enough for understanding wildlife immunogenetics? Trends Ecol Evol. 2006 Aug;21(8):433-8. doi: 10.1016/j.tree.2006.05.010. Epub 2006 Jun 9. PMID: 16764966.

L73, Mycobacterium is not a disease

L90, Please, explain TLR abbreviation

L92-93, I do not like the sentence. Better: "TLRs recognize..."

L196-197 I do not understand why you filtered out variants that are not in Hardy-Weinberg equilibrium. I would expect that strong parasite-mediated selection acting on a particular SNP might cause it to deviate from Hardy-Weinberg equilibrium. Please explain.

L350-351 Given the context, I would use "further studies are needed, ideally accounting for genome-wide variation" rather than "further studies should functionally verify the effect of predicted SNPs."

L363-365 Please, remove the following part: “through frequency-dependent selection” as it is speculative.

L378, I suggest using "Candidate SNPs should be verified by functional in vitro testing." rather than the previous "To strengthen our hypothesis, further studies using direct functional tests eg. using mouse models are needed."

L672-675, Bonferroni correction

I am not a biostatistician, but I agree with the other reviewer's conservative view on the Bonferroni correction. To obtain the critical p-value, I would simply divide p = 0.05 by the final number of all tests/models performed in the second statistical step. To make this clearer for the reader, you could also number all models in the tables and refer to the model numbers when discussing their results in the text.

L690 Please, add reference sequence ID.

Reviewer #2: All comments have been addressed by the Authors; now, in my opinion, the present paper can be accepted for publication.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

Acceptance letter

Johan R Michaux

13 Jan 2023

PONE-D-21-29710R2

Cytokine gene polymorphism and parasite susceptibility in free-living rodents: importance of non-coding variants

Dear Dr. Kloch:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Johan R. Michaux

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Table. PCR conditions and sequences of the primers used in the current study.

    The primers included degenerated sites (marked in bold).

    (PDF)

    S2 Table. Characteristics of the studied amplicons and summary of polymorphisms within the studied genes.

    The numbers of the corresponding exons and introns in the mouse are given in parentheses. For TNF and LTα, the values after the slash summarise polymorphism in the exonic parts only.

    (PDF)

    S3 Table. Number of voles with given genotypes after filtering.

    We removed missing calls, variants with MAF<0.05, variants not in Hardy-Weinberg equilibrium (threshold p<0.001), and variants in linkage disequilibrium (r>0.7).

    (PDF)
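
The filtering thresholds listed for S3 Table can be illustrated with a minimal Python sketch. It assumes per-SNP genotype counts as input and uses a chi-square goodness-of-fit test (1 degree of freedom) for Hardy-Weinberg equilibrium. The source does not state which test or software was used, so the function names and the choice of test are hypothetical. The linkage-disequilibrium filter (r>0.7) is omitted because it needs pairwise genotype data.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Minor allele frequency from counts of the three genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1 - freq_a)


def hwe_chi2_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used here for illustration only;
    the original analysis may have relied on an exact test instead.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_filters(n_aa: int, n_ab: int, n_bb: int,
                   min_maf: float = 0.05, hwe_p: float = 0.001) -> bool:
    """Keep a variant only if MAF >= min_maf and the HWE p-value >= hwe_p."""
    if n_aa + n_ab + n_bb == 0:
        return False  # no called genotypes
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= min_maf
            and hwe_chi2_p_value(n_aa, n_ab, n_bb) >= hwe_p)
```

For example, a variant with genotype counts 25/50/25 passes both filters, whereas counts of 50/0/50 (no heterozygotes) fail the Hardy-Weinberg check.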

    S4 Table. Prevalence of infections among bank voles.

    The number of infected animals differs between the studied genes because not all individuals were genotyped at all three loci. non-inf–number of non-infected hosts, inf–number of infected hosts, %–percentage of hosts infected.

    (PDF)

    S5 Table. Summary of effect of non-genetic terms on parasite load.

    Terms with p<0.1 (marked in bold) were included in GLM models testing for the effect of genetic variance (S7 Table). The models were run separately on three datasets, each including voles genotyped at a given locus (not all animals were genotyped in all loci).

    (PDF)

    S6 Table. Effect of cytokine genetic variants on parasite load.

    As response variables we used only pathogens that infected 20–80% of hosts (S4 Table). β is the parameter estimate for each contrast, R2 is the partial coefficient of determination (effect size), and χ2 and p-values are based on the type III LR test. To control for multiple comparisons when testing for the effect of several genetic variants, we used the conservative Bonferroni correction; for 10 genetic terms (SNPs) tested, the critical p-level corresponding to α = 0.05 was 0.005. Exact p-values of genetic terms significant after correction are given in bold.

    (PDF)
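
The Bonferroni threshold quoted for S6 Table follows from dividing α by the number of tests (0.05 / 10 = 0.005). A minimal Python sketch of that arithmetic is given below. The function names and the example p-values in the usage note are hypothetical and are not taken from the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Critical p-value after Bonferroni correction for n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values: dict, n_tests: int,
                                 alpha: float = 0.05) -> dict:
    """Flag each term whose p-value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return {term: p < threshold for term, p in p_values.items()}
```

Calling `significant_after_correction({"snp_a": 0.001, "snp_b": 0.02}, n_tests=10)` marks only the hypothetical `snp_a` as significant, because 0.02 exceeds the corrected threshold of 0.005.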

    S7 Table. Codons under selection.

    Only exonic parts of the studied genes are analysed. Codons were numbered starting from the first genotyped nucleotide, not from the first transcribed nucleotide. Names of the corresponding SNPs, as used in the current paper, are given in brackets. Codons that comprised SNPs significantly associated with the parasite load are given in bold.

    (PDF)

    S8 Table. Effect of cytokine genetic variants on parasite load with all non-genetic terms fitted.

    β is the parameter estimate for each contrast, R2 is the partial coefficient of determination (effect size), and χ2 and p-values are based on the type III LR test. To control for multiple comparisons when testing for the effect of several genetic variants, we used the conservative Bonferroni correction; for 10 genetic terms (SNPs) tested, the critical p-level corresponding to α = 0.05 was 0.005. Exact p-values of genetic terms significant after correction are given in bold.

    (PDF)

    S1 Fig

    Parasite load by parasite species in individuals genotyped in a) TNF, b) LTα, and c) IFNβ1. Dark bars represent infected animals; light bars represent non-infected animals.

    (PDF)

    Attachment

    Submitted filename: READYrebuttal.docx

    Attachment

    Submitted filename: 2response_rev.docx

    Data Availability Statement

    The raw reads from Illumina sequencing are available from the SRA archive (PRJNA395763). The minimal data set underlying the results, containing individual genotypes and data on individual parasite load, body mass, sex, and sampling details, is stored in the Open Science Framework repository https://doi.org/10.17605/OSF.IO/QKFUM.


    Articles from PLOS ONE are provided here courtesy of PLOS
