eLife. 2021 May 25;10:e64669. doi: 10.7554/eLife.64669

Detecting adaptive introgression in human evolution using convolutional neural networks

Graham Gower 1, Pablo Iáñez Picazo 1, Matteo Fumagalli 2, Fernando Racimo 1
Editors: George H Perry3, George H Perry4
PMCID: PMC8192126  PMID: 34032215

Abstract

Studies in a variety of species have shown evidence for positively selected variants introduced into a population via introgression from another, distantly related population—a process known as adaptive introgression. However, there are few explicit frameworks for jointly modelling introgression and positive selection, in order to detect these variants using genomic sequence data. Here, we develop an approach based on convolutional neural networks (CNNs). CNNs do not require the specification of an analytical model of allele frequency dynamics and have outperformed alternative methods for classification and parameter estimation tasks in various areas of population genetics. Thus, they are potentially well suited to the identification of adaptive introgression. Using simulations, we trained CNNs on genotype matrices derived from genomes sampled from the donor population, the recipient population and a related non-introgressed population, in order to distinguish regions of the genome evolving under adaptive introgression from those evolving neutrally or experiencing selective sweeps. Our CNN architecture exhibits 95% accuracy on simulated data, even when the genomes are unphased, and accuracy decreases only moderately in the presence of heterosis. As a proof of concept, we applied our trained CNNs to human genomic datasets—both phased and unphased—to detect candidates for adaptive introgression that shaped our evolutionary history.

Research organism: Human

Introduction

Ancient DNA studies have shown that human evolution during the Pleistocene was characterised by numerous episodes of interbreeding between distantly related groups (Green et al., 2010; Reich et al., 2010; Meyer et al., 2012; Prüfer et al., 2017; Kuhlwilm et al., 2016). We now know, for example, that considerable portions of the modern human gene pool derive from Neanderthals and Denisovans (Green et al., 2010; Reich et al., 2010; Prüfer et al., 2014). In the past few years, several methods have been developed to identify regions of present-day or ancient human genomes containing haplotypes that were introgressed from other groups of hominins. These include methods based on probabilistic models (Sankararaman et al., 2014; Steinrücken et al., 2018; Racimo et al., 2017a), on summary statistics (Vernot and Akey, 2014; Vernot et al., 2016; Racimo et al., 2017b; Durvasula and Sankararaman, 2019) and on ancestral recombination graph reconstructions (Kuhlwilm et al., 2016; Hubisz et al., 2020; Speidel et al., 2019). Presumably, some of the introgressed material may have had fitness consequences in the recipient populations. While recent evidence suggests that a large proportion of Neanderthal ancestry was likely negatively selected (Harris and Nielsen, 2016; Juric et al., 2016), there is also support for positive selection on a smaller proportion of the genome—a phenomenon known as adaptive introgression (AI) (Whitney et al., 2006; Hawks and Cochran, 2006; Racimo et al., 2015).

Genomic evidence for AI has been found in numerous other species, including butterflies (Pardo-Diaz et al., 2012; Enciso-Romero et al., 2017), mosquitoes (Norris et al., 2015), hares (Jones et al., 2018), poplars (Suarez-Gonzalez et al., 2016), and monkeyflowers (Hendrick et al., 2016). A particularly striking example is AI in dogs, which appears to show strong parallels to AI in humans when occupying the same environmental niches. For example, a variant of the gene EPAS1 has been shown to have introgressed from an archaic human population into the ancestors of Tibetans, and subsequently risen in frequency in the latter population, as a consequence of positive selection to high altitude (Huerta-Sánchez et al., 2014). A different high-frequency EPAS1 variant is also uniquely found in Tibetan Mastiffs, and appears to also have introgressed into this gene pool via admixture with a different species, in this case Tibetan wolves (Miao et al., 2017), likely due to the same selective pressures.

To detect AI, researchers can look for regions of the genome with a particularly high frequency of introgressed haplotypes from a donor species or population into a recipient species or population. These haplotypes are often detected assuming neutrality of archaic alleles since the introgression event (Vernot et al., 2016; Vernot and Akey, 2014; Sankararaman et al., 2016). Other studies have designed statistics that are sensitive to characteristic patterns left by AI, using simulations incorporating both admixture and selection (Gittelman et al., 2016; Racimo et al., 2017b). More recently, Setter et al., 2020 developed a likelihood framework to look for local alterations to the site frequency spectrum that are consistent with adaptive introgression, using only data from the recipient species. The main challenge that these studies face is that it is hard to jointly model selection from material introduced via admixture (Racimo et al., 2015).

To overcome the need to compress data into summary statistics (which might miss important features), deep learning techniques are increasingly becoming a popular solution to address problems in population genetics. These problems include the inference of demographic histories (Sheehan and Song, 2016; Flagel et al., 2019; Villanea and Schraiber, 2019; Mondal et al., 2019; Sanchez et al., 2021), admixture (Blischak et al., 2021), recombination (Chan et al., 2018; Flagel et al., 2019; Adrion et al., 2020b), and natural selection (Schrider and Kern, 2018; Sheehan and Song, 2016; Torada et al., 2019; Isildak et al., 2021). Deep learning is a branch of machine learning that relies on algorithms structured as multi-layered networks, which are trained using known relationships between the input data and the desired output. They can be used for classification, prediction, or data compression (Aggarwal, 2018). Among the techniques in this field, convolutional neural networks (CNNs) are a family of methods originally designed for image recognition and segmentation (LeCun and Bengio, 1995; Krizhevsky et al., 2012), which have been recently applied to population genetic data (Chan et al., 2018; Flagel et al., 2019; Torada et al., 2019; Isildak et al., 2021; Blischak et al., 2021; Sanchez et al., 2021). A CNN can learn complex spatial patterns from large datasets that may be informative for classification or prediction, using a series of linear operations known as convolutions, to compress the data into features that are useful for inference.

Despite the recent advances in deep learning for population genetics, no dedicated attempts have been made to identify AI from population genomic data. Here, we develop a deep learning method called genomatnn that jointly models archaic admixture and positive selection, in order to identify regions of the genome under adaptive introgression. We trained a CNN to learn relevant features directly from a genotype matrix at a candidate region, containing data from the donor population, the recipient population and an unadmixed outgroup. The method has >88% precision to detect AI and is effective on both ancient and recently selected introgressed haplotypes. We then applied our method to population genomic datasets where the donor population is either Neanderthals or Denisovans and the recipient populations are Europeans or Melanesians, respectively. In each case, we used the Yoruba population as an unadmixed outgroup and we were able to both recover previously identified AI regions and unveil new candidates for AI in human history.

Results

A CNN for detecting adaptive introgression

We assume we have sequence data from multiple populations: the donor population and the recipient population in an admixture event, as well as an unadmixed population that is a sister group to the recipient (Figure 1). Our method relies on partitioning the genomes into windows, which we chose to be 100 kbp in size. For each window, we constructed an n×m matrix, where n corresponds to the number of haplotypes (or diploid genotypes, for unphased data) and m corresponds to a set of equally sized bins along the genomic length of the window. Each matrix entry contains the count of minor alleles in an individual’s haplotype (or diploid genotype) in a given bin. Within each population, we sorted these pseudo-haplotypes (or genotypes) according to similarity to the donor population, and concatenated the matrices for each of the populations into a single pseudo-genotype matrix (Figure 1).
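The binning and sorting steps described above can be sketched in a few lines of dependency-free Python. This is an illustration only, not the genomatnn implementation: the function names, the 0/1 minor-allele encoding of the input, and the use of an L1 distance to a single donor profile as the similarity measure are all assumptions made for the example.

```python
def bin_minor_allele_counts(positions, haplotypes, window_start, window_size, num_bins):
    """Count minor alleles per haplotype in equally sized bins along a window.

    positions:  genomic coordinate of each segregating site.
    haplotypes: one list per haplotype; entry j is 1 if that haplotype
                carries the minor allele at site j, else 0 (assumed encoding).
    Returns one row per haplotype, each holding num_bins counts.
    """
    matrix = [[0] * num_bins for _ in haplotypes]
    for j, pos in enumerate(positions):
        offset = pos - window_start
        if not 0 <= offset < window_size:
            continue  # site lies outside the window
        b = offset * num_bins // window_size  # index of the bin containing the site
        for i, hap in enumerate(haplotypes):
            matrix[i][b] += hap[j]
    return matrix


def sort_by_similarity(matrix, donor_profile):
    """Order rows by increasing L1 distance to a donor profile (assumed metric)."""
    def distance(row):
        return sum(abs(a - b) for a, b in zip(row, donor_profile))
    return sorted(matrix, key=distance)


# Two haplotypes, four sites, a 100 bp window split into 4 bins.
rows = bin_minor_allele_counts(
    positions=[5, 15, 25, 95],
    haplotypes=[[1, 1, 0, 1], [0, 1, 1, 0]],
    window_start=0, window_size=100, num_bins=4,
)
ordered = sort_by_similarity(rows, donor_profile=[1, 1, 0, 0])
```

Binning by position rather than by site index keeps the spacing between sites, so regions with many segregating sites produce larger counts than sparse regions.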

Figure 1. A schematic overview of how genomatnn detects adaptive introgression.

We first simulate a demographic history that includes introgression, such as Demographic Model A1 shown in (A), using the SLiM engine in stdpopsim. Parameter values for this model are given in Appendix 3—table 1. Three distinct scenarios are simulated for a given demographic model: neutral mutations only, a sweep in the recipient population, and adaptive introgression. The tree sequence file from each simulation is converted into a genotype matrix for input to the CNN. (B) shows a genotype matrix from an adaptive introgression simulation, where lighter pixels indicate a higher density of minor alleles, and haplotypes within each population are sorted left-to-right by similarity to the donor population (Nea). In this example, haplotype diversity is low in the recipient population (CEU), which closely resembles the donor (Nea). Thousands of simulations are produced for each simulation scenario, and their genotype matrices are used to train a binary-classification CNN (C). The CNN is trained to output Pr[AI], the probability that the input matrix corresponds to adaptive introgression. Finally, the trained CNN is applied to genotype matrices derived from a VCF/BCF file (D).

Figure 1—figure supplement 1. Schematic overview of Demographic Model A1 and A2.

Schematic overview of Demographic Model A1 (A) and A2 (B). Each population is depicted as a tube, where the tube’s width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate either an ancestor/descendant relation (thick solid lines, open arrow heads), an admixture pulse (dashed lines, closed arrow heads), or a period of continuous migration (thin solid lines, closed arrow heads). The time at which each continuous-migration line is drawn was chosen randomly from the time interval over which the migration occurs. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.
Figure 1—figure supplement 2. Schematic overview of Demographic Model B.

Overview of the Jacobs et al., 2019 demographic model (A), featuring two pulses of Denisovan gene flow into Papuans, which we implemented as the PapuansOutOfAfrica_10J19 model in stdpopsim. The same model is shown in (B), zoomed in to more clearly show the many events occurring between generations 800–2300. Each population is depicted as a tube, where the tube’s width is proportional to the population’s size at any given time. Horizontal lines with arrows indicate either an ancestor/descendant relation (thick solid lines, open arrow heads), an admixture pulse (dashed lines, closed arrow heads), or a period of continuous migration (thin solid lines, closed arrow heads). The time at which each continuous-migration line is drawn was chosen randomly from the time interval over which the migration occurs. DenA and NeaA are the sampled populations corresponding to Altai Denisovan and Altai Neanderthal, while Den1, Den2, and Nea1 correspond to introgressing lineages. A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

We designed a CNN (Figure 1) that takes this concatenated matrix as input to distinguish between adaptive introgression scenarios and other types of neutral or selection scenarios. The CNN was trained using simulations, and uses a series of convolution layers with successively smaller outputs, to extract increasingly higher level features of the genotype matrices—features which are simultaneously informative of introgression and selection. The CNN outputs the probability that the input matrix comes from a genomic region that underwent adaptive introgression. As our simulations used a wide range of selection coefficients and times of selection onset, the network does not assume these parameters are known a priori, and is able to detect complete or incomplete sweeps at any time after gene flow.

Our method has several innovative features relative to previous population genetic implementations of CNNs (described extensively in the Materials and methods section). For example, when loading the genotype matrices as input, we implemented an image resizing scheme that leads to fast training times, while avoiding the drawbacks of similar methods (Torada et al., 2019), by preserving inter-allele distances and thus the local density of segregating sites. Additionally, instead of using pooling layers, we used a 2 × 2 step size when performing convolutions. This has the same effect as pooling, in that the output size of the layer is smaller than the input, so the accuracy of the model is comparable to traditional implementations of CNNs, but it has a much lower computational burden (Springenberg et al., 2015).
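To illustrate why a 2 × 2 step size shrinks the output in the same way as a pooling layer, the sketch below implements a plain unpadded 2D convolution with a configurable stride. It is a toy written for this explanation, not the network code: the 8 × 8 input, the 2 × 2 all-ones filter, and the function name are assumptions chosen to keep the arithmetic easy to follow.

```python
def conv2d(image, kernel, stride=1):
    """Unpadded 2D cross-correlation using the same stride in both dimensions."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0
            # Weighted sum over the patch whose top-left corner is (r*stride, c*stride).
            for i in range(kh):
                for j in range(kw):
                    acc += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8 x 8 input
kernel = [[1, 1], [1, 1]]                                  # toy 2 x 2 filter

dense = conv2d(image, kernel, stride=1)    # 7 x 7 output
strided = conv2d(image, kernel, stride=2)  # 4 x 4 output, half the input size per axis
```

With a stride of 2 the filter is evaluated at every second position along each axis, so the downsampling happens inside the convolution and no separate pooling pass over a full-resolution feature map is needed.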

Furthermore, we incorporated a framework to visualise the features of the input data that draw the most attention from the CNN, by plotting saliency maps (Simonyan et al., 2014). Saliency maps can help to understand which regions of the genotype matrix contribute the most towards the CNN prediction score.

We also provide downloadable pre-trained CNNs as well as a pipeline for training new CNNs (see Materials and methods). These interface with a new selection module that we designed and incorporated into the stdpopsim framework (Adrion et al., 2020a), using the forwards-in-time simulator SLiM (Haller and Messer, 2019). We believe this will facilitate the application of the method to other datasets, allowing users to modify its parameters according to the specific requirements of the biological system under study.

Performance on simulations

We aimed to assess the performance of our method on simulations. We performed simulations under two main demographic models:

  • Demographic Model A1: a three-population model including an African, a European and a Neanderthal population, with Neanderthal gene flow into Europeans (Figure 1 and Figure 1—figure supplement 1)

  • Demographic Model B: a more complex model, including an African, a Melanesian, a Neanderthal and a Denisovan population, with two pulses of Denisovan gene flow into Melanesians, plus Neanderthal gene flow into non-Africans, based on Jacobs et al., 2019 (Figure 1—figure supplement 2).

When training a CNN on Demographic Model A1 using phased data, we obtained a precision of 90.2% (the proportion of AI predictions that were AI simulations) and 97.9% negative predictive value (NPV; the proportion of ‘not-AI’ predictions that were either neutral or sweep simulations) (Figure 2). The network output higher probabilities for AI simulations with larger selection coefficients, and for older times of onset of selection. We also observed that the network falsely classified neutral simulations as AI more frequently than it falsely classified sweep simulations. When the CNN was trained on this same demographic model assuming genotypes were unphased, the results were very similar, with 88.1% precision and 98.7% NPV.

Figure 2. CNN performance on validation simulations for Demographic Model A1.

The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient, s, and time of onset of selection Tsel. (C) ROC curves, precision-recall curves and MCC-F1 curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F1-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects. TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F1: harmonic mean of precision and recall.

Figure 2—figure supplement 1. Performance evaluation for Demographic Model B.

CNN performance on validation simulations for Demographic Model B with unphased data. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%. (A) Confusion matrix. For the two prediction categories, either 'not AI' or AI, we show the proportion attributed to each of the true (simulated) scenarios. (B) Average CNN prediction for AI scenarios, binned by selection coefficient, s, and time of onset of selection Tsel. (C) ROC curves, precision-recall curves and MCC-F1 curves. The positive condition is AI. The negative conditions are shown using different line styles/colours. The circles indicate the point in ROC-space (respectively Precision-Recall-space, and MCC-F1-space) when using the threshold Pr[AI]>0.5 for classifying a genotype matrix as AI. DFE: distribution of fitness effects. TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; TPR: true positive rate; FPR: false positive rate; ROC: receiver operating characteristic; MCC: Matthews correlation coefficient; F1: harmonic mean of precision and recall.
Figure 2—figure supplement 2. Comparison to other methods and performance evaluation with misspecified demographic models.

Unit-normalised Matthews correlation coefficient (MCC) versus F1 score (the harmonic mean of precision and recall). A value of 0.5 on the vertical axis corresponds to the performance of a random classifier. The point at coordinate (1,1) marked with a black dot corresponds to 100% true positives and 0% false positives. Lines in MCC-F1 space were drawn by calculating the MCC and F1 values for 100 false-positive rates between 0% and 100%, and the point closest to (1,1) is indicated with the symbol shown in the legend. This point may not correspond to an acceptably low false-positive rate, but for the classifiers shown here it is indicative of the method’s overall performance. In all panels, condition positive is the AI simulation scenario, and the condition negative varies by panel column (indicated at top). The 'weakly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model A2. The 'strongly misspecified' row used simulations of Model A1 as the training/null, and evaluated the method on simulations of Model B.

When training a CNN on Demographic Model B (assuming unphased genotypes, as accurately phased data are not readily available for Melanesian genomes), we obtained 88.8% precision and 82.5% NPV (Figure 2—figure supplement 1). We note here that the network had greater precision when detecting AI derived from the more ancient pulse of Denisovan gene flow than the younger pulse.

Kim et al., 2018 and Zhang et al., 2020 recently suggested that introduced genetic material can mask deleterious recessive variation and produce a signal very similar to adaptive introgression. To assess whether heterosis following introgression affects the false positive rates in our CNN, we simulated a distribution of fitness effects (DFE) with recessive dominance for 70% of derived mutations (the rest were simulated as neutral), and found this only slightly increases the false positive rate (Figure 2).

MCC-F1 curve

While precision, recall, and false positive rate are informative, these each consider only two of the four cells in a confusion matrix (true positives, true negatives, false positives, false negatives), and may produce a distorted view of performance with imbalanced datasets (Chicco, 2017). To obtain a more robust performance assessment, we plotted the Matthews correlation coefficient (MCC; Matthews, 1975) against F1-score (the harmonic mean of precision and recall) for false-positive-rate thresholds from 0% to 100% (Figure 2, Figure 2—figure supplement 1, Figure 2—figure supplement 2), as recently suggested by Cao et al., 2020. MCC produces low scores unless a classifier has good performance in all four confusion matrix cells, and also accounts for class imbalance. In MCC-F1 space, the point (1, 1) indicates perfect predictions, and values of 0.5 for the (unit-normalised) MCC indicate random guessing. These results confirm our earlier findings, that the CNN performance is excellent for Demographic Model A1 when considering either neutral or sweep simulations as the condition negative, and performance decreases slightly when DFE simulations are the negative condition (Figure 2). Furthermore, the CNN performance is not as good for Demographic Model B, but this is unlikely to be caused by using unphased genotypes (Figure 2—figure supplement 1 and Figure 2—figure supplement 2).
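As a worked illustration of the two quantities plotted in MCC-F1 space, the following sketch computes the unit-normalised MCC and the F1 score from the four confusion-matrix counts. The function name and the convention of returning 0 for undefined ratios are assumptions made for this example; the confusion-matrix counts are invented and are not results from the paper.

```python
import math


def unit_mcc_and_f1(tp, fp, tn, fn):
    """Return (unit-normalised MCC, F1) for one confusion matrix.

    Undefined ratios (zero denominators) are reported as 0, which is an
    assumption of this sketch rather than a universal convention.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    # Rescale MCC from [-1, 1] to [0, 1], so 0.5 means random guessing.
    return (mcc + 1) / 2, f1


perfect = unit_mcc_and_f1(tp=50, fp=0, tn=50, fn=0)     # (1.0, 1.0)
random_guess = unit_mcc_and_f1(tp=25, fp=25, tn=25, fn=25)  # (0.5, 0.5)
```

Evaluating this function at each classification threshold and plotting the resulting pairs traces out one curve in MCC-F1 space.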

Comparison to other methods

We compared the performance of our CNN to VolcanoFinder (Setter et al., 2020), which scans genomes of the recipient population for patterns of diversity indicative of AI using a coalescent-based model of adaptive introgression (Figure 2—figure supplement 2). However, this method only incorporates information from a single population and is designed to detect 'ghost' introgression in cases where the source is not available. VolcanoFinder performed poorly for the demographic models considered here, in some cases worse than guessing randomly. We also compared our CNN to an outlier-based approach for a range of summary statistics that are sensitive to AI (Racimo et al., 2017b). Our CNN is closest to a perfect MCC-F1 score for Demographic Models A1 and B, closely followed by the Q95(1%, 100%) and then U(1%, 20%, 100%) statistics developed in Racimo et al., 2017b.

Demographic model misspecification

We then tested robustness to demographic misspecification, by evaluating the CNN trained on Demographic Model A1 against simulations for two other demographic models (Figure 2—figure supplement 2). We considered weak misspecification, where the true demographic history is similar to Demographic Model A1 but also includes archaic admixture within Africa following Ragsdale and Gravel, 2019 (Demographic Model A2; Figure 1—figure supplement 1). This resulted in only a small performance reduction. We also considered strong misspecification, where the true demographic history is Demographic Model B. As there are more Melanesian individuals than European individuals in our simulations (because we aimed to mimic the real number of genomes available in our data analysis below), we downsampled the Melanesian genomes to match the number of European genomes, so as to perform a fair misspecification comparison. In this case, the performance of the CNN was noticeably worse than that of the summary statistics, but still better than VolcanoFinder. We note that the performance of the summary statistics also decreased, matching their performance in the correctly specified assessments on Demographic Model B. Interestingly, we found that the Q95(1%, 100%) statistic was the most robust method for both cases of misspecification.

Network attention

To understand which features of the input matrices were used by the CNN to make its predictions, we constructed saliency maps (Simonyan et al., 2014). This technique works by computing the gradient of a network’s output with respect to a single input. Thus, highlighted regions from the saliency map indicate where small changes in the input matrix have a relatively large influence over the CNN output prediction.
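The idea of a saliency map as the gradient of the output with respect to each input entry can be shown on a toy model. The sketch below is not the CNN used here: it substitutes a small logistic model as a stand-in classifier and approximates the gradient by central finite differences, whereas gradients for a real network are normally obtained by backpropagation. All names and weights are hypothetical.

```python
import math


def toy_model(matrix, weights, bias=0.0):
    """Stand-in classifier: logistic function of a weighted sum of matrix entries."""
    z = bias + sum(
        w * x
        for w_row, x_row in zip(weights, matrix)
        for w, x in zip(w_row, x_row)
    )
    return 1.0 / (1.0 + math.exp(-z))


def saliency_map(model, matrix, eps=1e-5):
    """Absolute central-difference gradient of model(matrix) for each entry."""
    saliency = []
    for i, row in enumerate(matrix):
        sal_row = []
        for j in range(len(row)):
            up = [list(r) for r in matrix]
            down = [list(r) for r in matrix]
            up[i][j] += eps    # nudge one entry up...
            down[i][j] -= eps  # ...and down, leaving all others unchanged
            sal_row.append(abs(model(up) - model(down)) / (2 * eps))
        saliency.append(sal_row)
    return saliency


weights = [[2.0, 0.0], [0.0, -1.0]]  # hypothetical weights
inputs = [[0.0, 0.0], [0.0, 0.0]]    # hypothetical 2 x 2 input matrix
sal = saliency_map(lambda m: toy_model(m, weights), inputs)
```

Entries with zero weight receive zero saliency, while the entry with the largest absolute weight receives the largest value, which is the behaviour the maps are used to read off for the real network. Averaging such maps over many inputs from one scenario gives a per-scenario summary.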

We calculated an average saliency map for each simulation scenario (neutral, sweep, or AI), for a CNN trained on Demographic Model A1 (Figure 3). Our results show that when the network was presented with an AI matrix, it focused most of the attention on the Neanderthal and European haplotypes, rather than on the African haplotypes. In non-AI scenarios, the network focused sharply on the Neanderthal and left-most European haplotypes. The saliency maps also show a concentration of attention in the central region of the genomic window, closer to where the selected mutation was drawn (even though this mutation does not exist in neutral simulations, and was removed from sweep and AI simulations before constructing genotype matrices; see Materials and methods). We also note a periodic vertical banding pattern in the saliency maps, corresponding to the filter width for the convolution layers.

Figure 3. Saliency maps, showing the CNN’s attention across the input matrices for each simulated scenario, calculated for the CNN trained on Demographic Model A1, filtered for beneficial allele frequency >0.25.

Each panel shows the average gradient over 300 input matrices encoding either neutral (top), sweep (middle), or AI (bottom) simulations. Pink/purple colours indicate larger gradients, where small changes in the genotype matrix have a relatively larger influence over the CNN’s prediction. Columns in the input matrix correspond to haplotypes from the populations labelled at the bottom.

Calibration

We implemented a score calibration scheme to account for the fact that our simulation categories (neutrality, sweep, and AI) will be highly imbalanced in real data applications (Guo et al., 2017; Kull et al., 2017). CNN classifiers sometimes produce improperly calibrated probabilities (Guo et al., 2017). In our case, this occurs because the proportion of each category is not known in reality, and thus does not match the simulated proportion. For this reason, we fitted our calibration procedure using training data resampled with various ratios of neutral:sweep:AI simulations (Figure 4). We tested different calibration methods by fitting the calibrator to the training dataset, and inspecting reliability plots and the sum of residuals on a validation dataset (see Materials and methods, Figure 4—figure supplement 1, Figure 4—figure supplement 2, Figure 4—figure supplement 3, Figure 4—figure supplement 4). When the probabilities are calibrated for even class ratios, Manhattan plots show a large number of high probability candidates across the genome, which obscure the strongest peaks (Figure 4). However, once calibrated for class ratios that are skewed towards the neutral class, strong candidates for AI become more apparent.
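The resampling step can be pictured as subsampling each simulation class until the class sizes follow a chosen neutral:sweep:AI ratio, before a calibrator is fitted to the resampled set. The sketch below shows only that subsampling, under assumptions of our own: the function name, sampling without replacement, and the rule of keeping as many examples as the ratios allow are illustrative choices, and the calibrator itself is not shown.

```python
import random


def resample_by_ratio(examples_by_class, ratios, seed=0):
    """Subsample each class so that class sizes follow the requested ratios.

    examples_by_class: dict mapping class name -> list of examples.
    ratios:            dict mapping class name -> relative weight.
    Sampling is without replacement (an assumption of this sketch).
    """
    rng = random.Random(seed)
    # Largest scale for which every class still has enough examples.
    scale = min(
        len(examples_by_class[name]) / weight
        for name, weight in ratios.items()
        if weight > 0
    )
    return {
        name: rng.sample(examples_by_class[name], round(scale * weight))
        for name, weight in ratios.items()
    }


# Placeholder data: 1000 dummy examples per class.
data = {name: list(range(1000)) for name in ("neutral", "sweep", "AI")}
skewed = resample_by_ratio(data, {"neutral": 1, "sweep": 0.1, "AI": 0.02})
sizes = {name: len(items) for name, items in skewed.items()}
```

With 1000 placeholder examples per class and ratios of 1:0.1:0.02, the resampled set holds 1000 neutral, 100 sweep and 20 AI examples, so a calibrator fitted to it sees AI as the rare class.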

Figure 4. Comparison of Manhattan plots using beta-calibrated output probabilities for different class ratios.

Each row indicates a single CNN, with equivalent data filtering. Each column indicates different class ratios used for calibration (Neutral:Sweep:AI). AF = Minimum beneficial allele frequency.

Figure 4—figure supplement 1. Reliability plots for Demographic Model A1 with AF > 5%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots (Z), which for well-calibrated predictions is approximately normally distributed (Turner et al., 2019).
Figure 4—figure supplement 2. Reliability plots for Demographic Model A1 with AF > 25%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model A1 with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots (Z), which for well-calibrated predictions is approximately normally distributed (Turner et al., 2019).
Figure 4—figure supplement 3. Reliability plots for Demographic Model B with AF > 5%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 5%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots (Z), which for well-calibrated predictions is approximately normally distributed (Turner et al., 2019).
Figure 4—figure supplement 4. Reliability plots for Demographic Model B with AF > 25%.

Reliability of probabilities produced by the CNN, for the validation dataset, with and without calibration, for Demographic Model B with a minimum beneficial allele frequency of 25%. The variance-normalised sum of residuals is inset in the upper left corner of each of the reliability plots (Z), which for well-calibrated predictions is approximately normally distributed (Turner et al., 2019).

Candidates for Neanderthal adaptive introgression in European genomes

We applied our method to a combined genomic panel of archaic hominins (Prüfer et al., 2017; Meyer et al., 2012) and present-day humans (Auton et al., 2015; Jacobs et al., 2019), to look for regions of the genome where non-African humans show signatures of AI from archaic hominins. First, we looked for Neanderthal introgression into the ancestors of Northwestern Europeans (CEU panel), using Yoruba Africans (YRI panel) as the unadmixed sister population. To give the network the best chance of avoiding false positives, we tried two different beneficial-allele frequency cutoffs for training: 5% and 25% (Table 1 and Appendix 1—table 1).

Table 1. Top ranking gene candidates corresponding to Neanderthal AI in Europeans.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom  Start      End        Genes
1      104500001  104600000
2      109360001  109460000  LIMS1; RANBP2; CCDC138; EDAR
2      160160001  160280000  TANC1; WDSUB1; BAZ2B
3      114480001  114620000  ZBTB20
4      54240001   54340000   SCFD2; FIP1L1; LNX1
5      39220001   39320000   FYB; C9; DAB2
6      28180001   28320000   ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23
8      143440001  143560000  TSNARE1; BAI1
9      16700001   16820000   BNC2
12     85780001   85880000   ALX1
19     20220001   20380000   ZNF682; ZNF90; ZNF486
19     33580001   33740000   RHPN2; GPATCH1; WDR88; LRP3; SLC7A10
20     62100001   62280000   CHRNA4; KCNQ2; EEF1A2; PPDPF; PTK6; SRMS; C20orf195; HELZ2; GMEB2; STMN3; RTEL1; TNFRSF6B; ARFRP1; ZGPAT; LIME1; SLC2A4RG; ZBTB46
21     25840001   25940000

We focus here on describing the results from the 25% condition (Figure 5 and Appendix 4). We found several candidate genes for AI that have been reported before (Sankararaman et al., 2014; Vernot and Akey, 2014; Gittelman et al., 2016; Racimo et al., 2017b), including BNC2, KCNQ2/EEF1A2, WDR88/GPATCH1 and TANC1. Notably, the candidate region we identify on chromosome 2 around TANC1 extends farther downstream of this gene, also overlapping BAZ2B (Appendix 4—figure 3). BAZ2B codes for a protein related to chromatin remodelling, and may have a role in transcriptional activation. Mutations in BAZ2B have recently been associated with neurodevelopmental disorders, including developmental delay, autism spectrum disorder and intellectual disability (Scott et al., 2020).

Figure 5. Application of the trained CNN to the Vindija and Altai Neanderthals, and 1000 Genomes populations YRI and CEU.


The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Additionally, we found two candidates for AI that have not been previously reported, spanning the regions chr6:28.18 Mb–28.32 Mb (Appendix 4—figure 7) and chr20:62.1 Mb–62.28 Mb (Appendix 4—figure 13), including multiple genes encoding zinc finger proteins. UK Biobank PheWAS associations (Canela-Xandri et al., 2018) suggest both regions generally affect phenotypes related to blood, including platelet, erythrocyte and leukocyte counts (at the p < 10⁻⁸ association level, the chr6 region has 91 hits, while the chr20 region has 19, with 10 of these traits in common).

Candidates for Denisovan adaptive introgression in Melanesian genomes

We then looked for Denisovan AI in Melanesian genomes from the IGDP panel (Jacobs et al., 2019), also considering Yoruba Africans as the unadmixed sister group, and using two different beneficial-allele frequency cutoffs for training: 5% and 25% (Table 2 and Appendix 1—table 2).

Table 2. Top ranking gene candidates corresponding to Denisovan AI in Melanesians.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >0.25, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom Start End Genes
2 129960001 130060000
3 3740001 3840000 SUMF1; LRRN1
4 41980001 42080000 TMEM33; DCAF4L1; SLC30A9; BEND4
5 420001 520000 PDCD6; AHRR; C5orf55; EXOC3; CTD-2228K2.5; SLC9A3; CEP72
6 74640001 74740000
6 81960001 82060000
6 137920001 138120000 TNFAIP3
7 25100001 25200000 OSBPL3; CYCS; C7orf31; NPVF
7 38020001 38120000 EPDR1; NME8; SFRP4; STARD3NL
7 121160001 121260000
8 3040001 3140000 CSMD1
12 84640001 84740000
12 108240001 108340000 PRDM4; ASCL4
12 114020001 114280000 RBM19
14 61860001 61960000 PRKCH
14 63120001 63220000 KCNH5
14 96700001 96820000 BDKRB2; BDKRB1; ATG2B; GSKIP; AK7
15 55260001 55400000 RSL24D1; RAB27A
16 62600001 62700000
16 78360001 78460000 WWOX
18 22060001 22160000 OSBPL1A; IMPACT; HRH4
22 19040001 19140000 DGCR5; DGCR2; DGCR14; TSSK2; GSC2; SLC25A1; CLTCL1

Again, we focus on describing the results from the 25% condition (Figure 6 and Appendix 5). Among the top candidates, we found a previously reported candidate for AI in Melanesians: TNFAIP3 (Vernot et al., 2016; Gittelman et al., 2016). Denisovan substitutions carried by the introgressed haplotype in this gene have been found to enhance the immune response by tuning the phosphorylation of the encoded A20 protein, which is an immune response inhibitor (Zammit et al., 2019).

Figure 6. Application of the trained CNN to the Altai Denisovan and Altai Neanderthal, the 1000 Genomes YRI population, and IGDP Melanesians.


The CNN was applied to overlapping 100 kbp windows, moving along the chromosome in steps of size 20 kbp. The CNN was trained using only AI simulations with selected mutation having allele frequency > 25%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

We found evidence for Denisovan AI in Melanesians at several other candidate regions. A few of these regions (or contiguous regions) were previously reported by Sankararaman et al., 2016 but not extensively described, possibly because the previously reported sections of those regions deemed to be introgressed were intergenic. One of the regions with strong evidence for AI (chr7:25.1 Mb–25.2 Mb; Appendix 5—figure 8) overlaps the CYCS gene. This gene codes for cytochrome C: a small heme protein that plays a crucial role in the electron transport chain in mitochondria, and has been associated with various blood-related diseases, like thrombocytopenia (Morison et al., 2008; De Rocco et al., 2014; Uchiyama et al., 2018). Another top candidate region (chr12:108.24–108.34 Mb, Appendix 5—figure 13) is upstream of PRDM4 and ASCL4. The former gene codes for a transcription factor that may be involved in the nerve growth factor cell survival pathway and might play a role in tumour suppression (Yang and Huang, 1999). The latter gene codes for a different transcription factor that may be involved in skin development (Jonsson et al., 2004).

We detected signatures of Denisovan AI in a region in chromosome 3 near SUMF1 and LRRN1 (Appendix 5—figure 2), which was also identified in Jacobs et al., 2019. SUMF1 codes for an enzyme involved in the hydrolysis of sulfate esters, which has been associated with sulfatase deficiency (Cosma et al., 2003). LRRN1 encodes a protein involved in neuronal differentiation, which has been associated with neuroblastoma and Alzheimer’s disease (Bai et al., 2014; Hossain et al., 2012). Another candidate region is in chromosome 7 and is upstream of SFRP4 (Appendix 5—figure 9), which encodes a protein associated with diabetes (Mahdi et al., 2012) and Pyle’s disease (Kiper et al., 2016). There is also a candidate region upstream of RAB27A, in chromosome 15 (Appendix 5—figure 18). Mutations in this gene cause Griscelli syndrome, which results in pigmentary dilution in the hair and skin, as well as melanosome accumulation in melanocytes (Ménasché et al., 2000). Finally, we found evidence for Denisovan AI in two nearby regions in chromosome 14 (Appendix 5—figure 15 and Appendix 5—figure 16). One of these overlaps with PRKCH, which encodes a protein kinase associated with cerebral infarction (Kubo et al., 2007). The other overlaps with KCNH5, which codes for a potassium channel that may be associated with epileptic encephalopathy (Veeramah et al., 2013).

Discussion

We have developed a new method to detect adaptive introgression along the genome using convolutional neural networks. The method has high precision when reporting candidate AI loci, and high negative predictive value when rejecting loci as not-AI: we obtain greater than 90% accuracy under a variety of different selection scenarios (Appendix 2—table 1), with low false positive rates.

As reported previously (Kim et al., 2018; Zhang et al., 2020), heterosis following introgression can produce patterns very similar to AI, and we found this can inflate false positive detection of AI by our CNN to a small extent. However, we simulated a DFE with recessive dominance for all mutations, which is not realistic in general, so our results in this regard represent a worst-case scenario. A possible future improvement would be to train the CNN on simulations incorporating heterosis. We did not attempt this here because realistic DFE simulations represent a substantial computational burden.

When the demographic model is correctly specified, our CNN performed better than using summary statistics, although the Q95(1%, 100%) statistic (Racimo et al., 2017b) also performed remarkably well. This statistic captures high-frequency derived alleles that are shared between the donor and recipient population, to the exclusion of a non-introgressed sister population—intuitively, these are the same features we expect our CNN to be learning. Because of its relative robustness to model misspecification, an outlier approach based on Q95 may be a better choice than our CNN when there is uncertainty regarding the demographic history of the study system. We also found that VolcanoFinder performed very poorly across all our tests, but this is arguably an unfair comparison because it only incorporates information from a single population, and Setter et al., 2020 themselves found that their method has low sensitivity when the donor population split from the recipient population recently (on the order of N generations ago for Neanderthals/Denisovans and humans).

The CNN took approximately 15 min to train on one NVIDIA Tesla T4 GPU, which amounts to 60 CPU hours for an equivalent CPU-only training procedure. All data were loaded into memory, which required approximately 120 GB RAM during training. The computational bottleneck lay in the generation of SLiM forward simulations: 300,000 simulations took approximately 80 weeks of CPU time for each of demographic models A1 and B. In the future, considerable speedups could potentially be obtained by optimising the simulation step, perhaps by implementing an adaptive introgression simulation framework that takes advantage of the backwards coalescent (e.g. building on the work by Setter et al., 2020).

We applied the method to human data, to look for adaptive introgression from archaic humans into the ancestors of present-day human genomes. When looking for Neanderthal AI in European genomes, we found previously reported candidate genes (BNC2, WDR88/GPATCH1, KCNQ2/EEF1A2, TANC1/BAZ2B). We also recovered candidates for adaptive introgression from Denisovans by applying our method to unphased Melanesian genomes. The top candidates include TNFAIP3, which has been reported before, but also include other, novel regions, containing genes involved in blood diseases (CYCS), neurological diseases (PRKCH, KCNH5, LRRN1), metabolism (SFRP4, SUMF1), and skin development (ASCL4, RAB27A).

We note, however, that, as with previous methods, visual inspection of the haplotypes or genotypes of the top candidate regions remains necessary to accurately assess whether a region may have been under adaptive introgression. For example, in the scans we performed, we found a few candidate regions for Neanderthal AI in Europeans that are likely to be false positives: chr2:109360001–109460000 (Appendix 4—figure 2), chr4:54240001–54340000 (Appendix 4—figure 5) and chr8:143440001–143560000 (Appendix 4—figure 8). These appear to be the result of shared ancestral variation between European and African populations, and yet are classified as having high probability of being under AI. These regions also appear to be generally low in diversity, which is possibly a result of data missingness rather than low diversity per se. Thus, our method allows for a rapid scan and prioritisation of potential targets, but these need to be further assessed with care. Inclusion of more complex selection scenarios, involving positive or balancing selection on ancestral variation, as well as linked selection, might serve to reduce the rate of false positives in the future. Furthermore, our simulation procedure does not model genotype errors or variation in data missingness. Not explicitly accounting for this may negatively impact the robustness of the minor allele density computation and the subsequent haplotype sorting procedure, and, in turn, affect the accuracy of the CNN.

Conversely, there may be regions under AI that are classified as highly probable by the CNN, but that did not appear in our top candidates. Validating a large number of candidates might be difficult, but one could imagine running a differently trained CNN (perhaps one better tailored to distinguish AI from more similar scenarios, like selection on shared ancestral variation) on the subset of the regions that are predicted to be AI using a lenient probability cutoff. One could also use our method more generally, to assess the impact of AI across the genome, by comparing the distribution of probability scores with those of simulation scenarios under different amounts of admixture and selection, though in that case one would need to train the CNN on a wider range of admixture rates and demographic models.

The performance of our method necessarily depends upon the demographic history of the populations involved. We found it more challenging to detect AI when the gene flow is more recent or the introgressing population is more diverged from the panel that is used to represent it. This is apparent when comparing results for the Neanderthal-into-European demographic scenario and the Denisovan-into-Melanesian demographic scenario. In the former, gene flow is older (∼55 kya versus ∼50 kya and ∼30 kya) (Sankararaman et al., 2016; Jacobs et al., 2019) and sequences are available for a population closely related to the putative source, which increases power. Furthermore, for the two putative pulses of Denisovan gene flow (Jacobs et al., 2019), we find our model has greater recall with AI for the more ancient pulse (94% versus 83.6%; Figure 2—figure supplement 1), likely because haplotypes from the older pulse have more time to rise in frequency. Similarly, recall is diminished when the onset of selection is more recent. We also found that distinguishing AI from a selective sweep (hard or soft) is relatively easier than distinguishing AI from neutral variation.

Our method requires sequencing data from the population from which the introgression event originated. This may be problematic in cases where the source of introgression may be distantly related to the population genomic panel that is used to represent it. Future work could involve developing a CNN that can detect adaptive introgression from a ghost (unsampled) population, for cases in which genomic data from the source is unavailable (e.g. see Setter et al., 2020).

The method can take either phased or unphased data as input. This flexibility allows for its application to a range of study systems in the future, in which phasing may not be financially or methodologically feasible. It does, however, require called genotypes and is therefore not yet suitable for genomes sequenced at low coverage. One could envision extending the framework developed here to low-coverage genomes by working with matrices of genotype likelihoods (Korneliussen et al., 2014) rather than matrices of genotypes or haplotypes. Flagel et al., 2019, for example, developed a CNN to infer recombination rates in tetraploids without genotype calls, using read pileup information. A CNN could learn the relationship between observed read counts or genotype likelihoods under a given adaptive introgression scenario and the model parameters that can generate those data, but we leave that to future work.

Future studies could also address the fact that we must use simulations to train the network, which involves an implicit amount of supervision by the user. The range of parameters and models that are simulated during training is necessarily specified a priori, and misspecification can negatively affect CNN performance. One promising way to address this is the use of generative adversarial networks (GANs). Indeed, recent work suggests that one can train a GAN to learn to generate realistic population genomic data for any population (Wang et al., 2020).

The attention analyses performed here allowed only a posteriori reasoning on how the network learned to predict AI, so further work is needed in this area. For instance, interpretability of neural networks can be assessed using symbolic metamodelling (Alaa and van der Schaar, 2019) with reinforcement learning algorithms deployed to identify the subset of most informative features of input data (Yoon et al., 2019). In this context, such approaches should be able to pinpoint the important characteristics of genomic data, and possibly derive more informative summary statistics to predict complex evolutionary events.

In summary, we have shown that CNNs are a powerful approach to detecting adaptive introgression and can recover both known and novel selection candidates that were introduced via admixture. As in previous applications to other problems in the field (Sheehan and Song, 2016; Flagel et al., 2019; Schrider and Kern, 2018; Villanea and Schraiber, 2019; Mondal et al., 2019; Torada et al., 2019; Isildak et al., 2021), this exemplifies how deep learning can serve as a very powerful tool for population genetic inference. This type of technique may thus be a useful resource for future studies aiming to unravel our past history and that of other species, as statistical methodologies and computational resources continue to improve.

Materials and methods

Simulations

For CNN training, we performed simulations under three scenarios: neutral mutations only; positive selection of a de novo mutation in the recipient population (selective sweep); and positive selection of a derived mutation that was transferred via gene flow from the donor population to the recipient population (adaptive introgression, AI). In the sweep and AI scenarios, the selection coefficient was drawn log-uniformly from between 0.0001 and 0.1 for Europeans and between 0.001 and 0.1 for Melanesians (in the latter case, very few selected alleles survive with a very small selection coefficient, so we narrowed the range to reduce computational burden). The uniformly distributed time of mutation was decoupled from the uniformly distributed time of selection onset, thus allowing for soft sweeps (Hermisson and Pennings, 2005). For the selective sweep scenario, the mutation and selection times could occur at any time older than 1 kya but more recent than the split between the recipient population and its unadmixed sister population, with the constraint that the mutation must be introduced before the onset of selection. For the AI scenario, a neutrally evolving mutation was introduced to the donor population any time more recent than the split between the donor and the ancestor of recipient and unadmixed sister population, but older than 1 kya before the introgression event. Then, this mutation was transmitted to the recipient population, whereupon selection could start to act on it at any time after introgression but before 1 kya.
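As an illustration of the log-uniform draw described above, the following is a minimal sketch and not the code used in this study. The function name, the sample size and the random seed are hypothetical; only the two ranges (0.0001 to 0.1 and 0.001 to 0.1) come from the text.

```python
import numpy as np

def draw_selection_coefficients(n, s_min, s_max, rng):
    """Draw n selection coefficients log-uniformly between s_min and s_max."""
    # Sample uniformly on the log10 scale, then transform back.
    log_s = rng.uniform(np.log10(s_min), np.log10(s_max), size=n)
    return 10.0 ** log_s

rng = np.random.default_rng(1)  # arbitrary seed, for reproducibility only
s_europeans = draw_selection_coefficients(10_000, 1e-4, 1e-1, rng)
s_melanesians = draw_selection_coefficients(10_000, 1e-3, 1e-1, rng)
```

Under a log-uniform draw, each order of magnitude of s is equally likely, so weak selection coefficients are sampled far more often than under a uniform draw on the natural scale.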

We further evaluated our Demographic Model A1 CNNs using an additional 10,000 simulations that incorporated a DFE using the parameters estimated for Europeans in Kim et al., 2017 and used in Kim et al., 2018. We considered two mutation types: 30% neutral and 70% deleterious. The deleterious portion of introduced mutations had a selection coefficient drawn from a reflected gamma distribution with shape parameter 0.186, and expected value −0.01314833. We approximated the dominance scheme from Kim et al., 2018, using a fixed dominance coefficient for deleterious mutations of 0.5/(1-7071.07*E[s]) where E[s] is the expected value from the gamma distribution (i.e. all deleterious mutations were effectively recessive).
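The DFE parameters above imply a gamma scale parameter and a dominance coefficient that can be worked out directly. The sketch below is illustrative only: the variable and function names are hypothetical, and we assume the usual shape/scale parameterisation in which the expected value equals shape × scale.

```python
import numpy as np

SHAPE = 0.186            # gamma shape parameter, from the text
MEAN_S = -0.01314833     # expected selection coefficient E[s], from the text
SCALE = abs(MEAN_S) / SHAPE  # assumes E[|s|] = shape * scale

# Fixed dominance coefficient for deleterious mutations, as given in the text.
H_DELETERIOUS = 0.5 / (1 - 7071.07 * MEAN_S)

def draw_deleterious(n, rng):
    """Draw n selection coefficients from the reflected (negative) gamma."""
    return -rng.gamma(SHAPE, SCALE, size=n)

rng = np.random.default_rng(2)  # arbitrary seed
s = draw_deleterious(200_000, rng)
print(f"h = {H_DELETERIOUS:.5f}, mean s = {s.mean():.5f}")
```

The resulting dominance coefficient is roughly 0.005, which is why deleterious mutations behave as effectively recessive in these simulations.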

To incorporate selection, we implemented a new module in stdpopsim (Adrion et al., 2020a), which leverages the forwards-in-time simulator SLiM (Haller and Messer, 2019) for simulating selection. For consistency, we also used stdpopsim’s SLiM engine for neutral simulations. stdpopsim uses SLiM’s ability to output tree sequences (Haller et al., 2019; Kelleher et al., 2018), which retain complete information about the samples’ marginal genealogies. Further, stdpopsim recapitates the tree sequences (ensuring that all sampled lineages have a single common ancestor), and applies neutrally evolving mutations to the genealogies, using the coalescent framework of msprime (Kelleher et al., 2016).

We simulated 100 kbp regions, with a mutation rate of 1.29×10⁻⁸ per site per generation (Tian et al., 2019), an empirical recombination map drawn uniformly at random from the HapMapII genetic map (Frazer et al., 2007), and the selected mutation introduced at the region’s midpoint. For both the sweep scenario and the AI scenario, we used a rejection-sampling approach to condition on the selected allele’s frequency being ≥ in the recipient population at the end of the simulation. This was done by saving the simulation state prior to the introduction of the selected mutation (and saving again after successful transmission to the recipient population, for the AI scenario), then restoring simulations to the most recent save point if the mutation was lost, or if the allele frequency threshold was not met at the end of the simulation.

To speed up simulations, we applied a scaling factor of Q=10. Scaling divides population sizes (N) and event times (T) by Q, and multiplies the mutation rate µ, recombination rate r and selection coefficient s by Q, such that the population genetic parameters θ=4Nμ, ρ=4Nr, and Ns remain approximately invariant to the applied scaling factor (Haller and Messer, 2019). After simulating, we further filtered our AI scenario simulations to exclude those that ended with a minor beneficial allele frequency less than a specific cutoff. We tried two cutoffs—5% and 25%—and present results for both. Rejection sampling within SLiM was not possible at these higher thresholds, as simulations often had low probability of reaching the threshold, particularly for recently introduced mutations. We note that this post-simulation filtering alters the distributions of selection coefficients and times of selection onset used for CNN training.
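The scaling rule can be checked with simple arithmetic. In the sketch below, the function names and the example values of N, T, r and s are hypothetical placeholders; only Q=10 and the mutation rate are taken from the text. The check covers the compound parameters themselves and says nothing about how closely the scaled dynamics match the unscaled ones.

```python
import math

def apply_scaling(N, T, mu, r, s, Q=10):
    """Divide N and T by Q; multiply mu, r and s by Q."""
    return {"N": N / Q, "T": T / Q, "mu": mu * Q, "r": r * Q, "s": s * Q}

def compound_parameters(p):
    """Population genetic parameters expected to be preserved by scaling."""
    return {
        "theta": 4 * p["N"] * p["mu"],
        "rho": 4 * p["N"] * p["r"],
        "Ns": p["N"] * p["s"],
    }

# Hypothetical example values, except mu, which is the rate used in this study.
original = {"N": 10_000, "T": 2_000, "mu": 1.29e-8, "r": 1e-8, "s": 0.01}
scaled = apply_scaling(**original, Q=10)

before = compound_parameters(original)
after = compound_parameters(scaled)
```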

To investigate Neanderthal gene flow into Europeans, we simulated an out-of-Africa demographic model with a single pulse of Neanderthal gene flow into Europeans but not into African Yoruba (Demographic Model A1, Figure 1—figure supplement 1), using a composite of previously published model parameters (Appendix 3—table 1). The number of samples to simulate for each population was chosen to match the YRI and CEU panels in the 1000 Genomes dataset (Auton et al., 2015), and the two high coverage Neanderthal genomes (Prüfer et al., 2014). The two simulated Neanderthals were sampled at times corresponding to the estimated ages of the samples as reported in Prüfer et al., 2017. To test model misspecification, we performed an additional 10,000 simulations per simulation scenario on a modified version of this model that also incorporates archaic admixture in Africa (Ragsdale and Gravel, 2019) (Demographic Model A2; Figure 1—figure supplement 1).

To investigate Denisovan gene flow into Melanesian populations, we simulated an out-of-Africa demographic history incorporating two pulses of Denisovan gene flow (Malaspinas et al., 2016; Jacobs et al., 2019) implemented as the PapuansOutOfAfrica_10J19 model in stdpopsim (Adrion et al., 2020a). For this demographic model we sampled a single Denisovan and a single Neanderthal (with sampling time of the latter corresponding to the Altai Neanderthal’s estimated age). The number of Melanesian samples was chosen to match a subset of the IGDP panel (Jacobs et al., 2019). The Baining population of New Britain was excluded at the request of the IGDP data access committee, and we also excluded first-degree relatives, resulting in a total of 139 Melanesian individuals used in the analysis. As this demographic model includes two pulses of Denisovan admixture, we simulated half of our AI simulations to correspond with gene flow from the first pulse, and half from the second pulse.

Conversion of simulations to genotype matrices

We converted the tree sequence files from the simulations into genotype matrices using the tskit Python API (Kelleher et al., 2016). Major alleles (those with sample frequency greater than 0.5 after merging all individuals) were encoded in the matrix as 0, while minor alleles were encoded as 1. In the event of equal counts for both alleles, the major allele was chosen at random. Only sites with a minor allele frequency >5% were retained. For sweep and AI simulations, we excluded the site of the selected mutation.
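The major/minor encoding and frequency filter can be sketched as follows. This is a minimal illustration on a plain NumPy array, not the code used in this study: the function name is hypothetical, and we assume the input is a sites × haplotypes matrix of 0/1 alleles (the layout one would obtain, for example, from a tskit genotype matrix of biallelic sites).

```python
import numpy as np

def encode_minor_alleles(G, rng, maf_threshold=0.05):
    """Recode a (sites, haplotypes) 0/1 matrix so that 1 marks the minor allele.

    Returns the recoded matrix restricted to sites with minor allele
    frequency above maf_threshold, plus the boolean mask of retained sites.
    """
    G = np.asarray(G)
    n = G.shape[1]
    count1 = G.sum(axis=1)
    # Allele 1 is the major allele when carried by more than half the samples.
    flip = count1 * 2 > n
    # Equal counts: choose the major allele at random.
    ties = count1 * 2 == n
    flip[ties] = rng.random(int(ties.sum())) < 0.5
    M = np.where(flip[:, None], 1 - G, G).astype(np.int8)
    maf = M.sum(axis=1) / n
    keep = maf > maf_threshold
    return M[keep], keep

rng = np.random.default_rng(3)  # arbitrary seed
G = np.array([[1, 1, 1, 0],   # allele 1 is major: gets flipped
              [0, 0, 0, 1],   # allele 1 is minor: unchanged
              [0, 0, 0, 0],   # monomorphic: filtered out
              [1, 1, 0, 0]])  # tie: major allele chosen at random
M, keep = encode_minor_alleles(G, rng)
```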

We note that different simulations result in different numbers of segregating sites, but a constraint for efficient CNN training is that each datum in a batch must have the same dimensions. Existing approaches to solve this problem are to use only a fixed number of segregating sites (Chan et al., 2018), to pad the matrix out to the maximum number of observed segregating sites (Flagel et al., 2019), or to use an image-resize function to constrain the size of the input data (Torada et al., 2019). Each approach discards spatial information about the local density of segregating sites, although this may be recovered by including an additional vector of inter-site distances as input to the network (Flagel et al., 2019).

To obtain the benefits of image resizing (fast training times for reduced sizes and easy application to genomic windows of a fixed size), while avoiding its drawbacks, we chose to resize our input matrices differently, and only along the dimension corresponding to sites. To resize the genomic window to have length m, the window was partitioned into m bins, and for each individual haplotype we counted the number of minor alleles observed per bin. Compared with interpolation-based resizing (Torada et al., 2019), binning is qualitatively similar, but preserves inter-allele distances and thus the local density of segregating sites. Furthermore, as we do not resize along the dimension corresponding to individuals, this also permits the use of permutation-invariant networks (Chan et al., 2018), although we do not pursue that network architecture here.
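The binning step described above can be sketched as follows. The function name, argument names and toy data are hypothetical, and we assume integer site positions measured from the start of the window.

```python
import numpy as np

def bin_minor_alleles(M, positions, window_length, m):
    """Count minor alleles per bin for each haplotype.

    M: (sites, haplotypes) matrix with 1 marking the minor allele.
    positions: integer site coordinates in [0, window_length).
    Returns an (m, haplotypes) matrix of per-bin minor allele counts.
    """
    M = np.asarray(M, dtype=np.int32)
    positions = np.asarray(positions, dtype=np.int64)
    # Map each site to one of m equal-width bins along the window.
    bins = np.clip(positions * m // window_length, 0, m - 1)
    out = np.zeros((m, M.shape[1]), dtype=np.int32)
    # Unbuffered add, so several sites falling in one bin all contribute.
    np.add.at(out, bins, M)
    return out

# Toy example: a window of length 100 resized to m=4 bins.
M = np.array([[1, 0], [1, 1], [0, 1], [1, 1]])
positions = [0, 10, 30, 99]
binned = bin_minor_alleles(M, positions, window_length=100, m=4)
```

Because sites are assigned to bins by position rather than by rank, a region dense in segregating sites yields larger counts in the corresponding bins, which is how the local density of segregating sites is retained.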

We report results for m=256, but also tried m=32, 64, and 128 bins. Preliminary results indicated greater training and validation accuracy for CNNs trained with more bins, around 1% difference between both 32 and 64, and 64 and 128, although only marginal improvement for 256 compared with 128 bins. When matching unphased data, we combined genotypes by summing minor allele counts between the chromosomes of each individual. We note that all data were treated as either phased, or unphased, and no mixed phasing was considered.
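For unphased data, the summation over an individual's chromosomes might look like the sketch below. The function name is hypothetical, and we assume that consecutive columns of the binned matrix belong to the same diploid individual; other sample orderings would need the columns to be rearranged first.

```python
import numpy as np

def to_unphased(binned, ploidy=2):
    """Sum per-bin minor allele counts across each individual's chromosomes.

    binned: (m, haplotypes) matrix in which consecutive groups of `ploidy`
    columns belong to one individual (an assumption of this sketch).
    Returns an (m, individuals) matrix.
    """
    binned = np.asarray(binned)
    m, n_hap = binned.shape
    if n_hap % ploidy != 0:
        raise ValueError("number of haplotypes must be a multiple of ploidy")
    return binned.reshape(m, n_hap // ploidy, ploidy).sum(axis=2)

binned = np.array([[2, 1, 0, 3],
                   [0, 0, 1, 1]])
unphased = to_unphased(binned)
```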

We then partitioned the resized genotype matrix into submatrices by population. Submatrices were ordered left-to-right according to the donor, recipient, and unadmixed populations respectively. For genotype matrices including both Neanderthals and Denisovans, we placed the non-donor archaic population to the left of the donor. To ensure that a non-permutation-invariant CNN could learn the structure in our data, we sorted the haplotypes (Flagel et al., 2019; Torada et al., 2019). The resized haplotypes/individuals within each submatrix were ordered left-to-right by decreasing similarity to the donor population, calculated as the Euclidean distance to the average minor-allele density of the donor population (analogous to a vector of the donor allele frequencies). An example (phased) genotype matrix image for an AI simulation is shown in Figure 1.
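The sorting step can be sketched as follows, for a single population submatrix. Names are hypothetical, and we assume bins are stored along rows and haplotypes (or individuals) along columns. Decreasing similarity to the donor corresponds to increasing Euclidean distance.

```python
import numpy as np

def sort_by_donor_similarity(sub, donor):
    """Order the columns of `sub` by increasing distance to the donor mean.

    sub: (m, n) binned matrix for one population.
    donor: (m, n_donor) binned matrix for the donor population.
    """
    sub = np.asarray(sub, dtype=float)
    # Average minor-allele density of the donor, one value per bin.
    donor_mean = np.asarray(donor, dtype=float).mean(axis=1, keepdims=True)
    dist = np.sqrt(((sub - donor_mean) ** 2).sum(axis=0))
    order = np.argsort(dist, kind="stable")
    return sub[:, order]

donor = np.array([[2, 2],
                  [0, 0]])
recipient = np.array([[0, 2, 1],
                      [0, 0, 0]])
recipient_sorted = sort_by_donor_similarity(recipient, donor)
# Submatrices would then be joined left-to-right: donor, recipient, unadmixed.
full = np.concatenate([sort_by_donor_similarity(donor, donor),
                       recipient_sorted], axis=1)
```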

Conversion of empirical data to genotype matrices

Using bcftools (Li, 2011), we performed a locus-wise intersection of the following VCFs: 1000 Genomes (Auton et al., 2015), IGDP (Jacobs et al., 2019), the high coverage Denisovan genome (Meyer et al., 2012), and the Altai and Vindija Neanderthal genomes (Prüfer et al., 2014). All VCFs corresponded to the GRCh37/hg19 reference sequence. Genotype matrices were constructed by parsing the output of bcftools query over 100 kbp windows, filtering out sites with sample allele frequency <5% or with more than 10% of genotypes missing, then excluding windows with fewer than 20 segregating sites. Each genotype matrix was then resized and sorted as described for simulations. When data were considered to be phased, as for the CEU/YRI populations, we also treated the Neanderthal genotypes as if they were phased according to REF/ALT columns in the VCF. While this is equivalent to random phasing, both high-coverage Neanderthal individuals are highly inbred, so this is unlikely to be problematic in practice.

CNN model architecture and training

We implemented the CNN model in Keras (Chollet, 2015), configured to use the TensorFlow backend (Abadi et al., 2015). To save disk space and memory, the input matrices were stored as 8-bit integers rather than floating point numbers, and were not mean-centred or otherwise normalised prior to input into the network. We instead made the first layer of our network a batch normalisation layer, which simultaneously converts the input layer to floating point numbers and learns the best normalisation of the data for the network.

The CNN architecture (Figure 1) consists of k convolution blocks, each comprising a batch normalisation layer followed by a 2D convolution layer with 2 × 2 stride, 16 filters of size 4 × 4, and leaky ReLU activation. The k blocks are followed by a single fully-connected output node of size one, with sigmoid activation giving the probability Pr[AI]. Unlike many CNN architectures (e.g. Torada et al., 2019), ours does not include pooling layers, and instead uses a 2 × 2 stride size to reduce the output size of successive blocks (Springenberg et al., 2015). This is computationally cheaper and made no observable difference to network performance. We sought to maximise the depth of the network, but the size of the input matrix constrains the maximum number of blocks in the network due to successive halving of the dimensionality in each block. For m=256 resizing bins, we used k=7 blocks.
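The constraint that input size places on network depth can be seen with a short calculation of the size along the bin dimension after each block. This sketch assumes that each stride-2 convolution halves the dimension with rounding up (as with 'same' padding); the padding mode is our assumption, and the function name is hypothetical.

```python
import math

def sizes_after_blocks(n, k, stride=2):
    """Size along one dimension after each of k strided convolution blocks.

    Assumes the output size of each block is ceil(input size / stride).
    """
    sizes = [n]
    for _ in range(k):
        n = math.ceil(n / stride)
        sizes.append(n)
    return sizes

print(sizes_after_blocks(256, 7))
# → [256, 128, 64, 32, 16, 8, 4, 2]
```

Under this assumption, k=7 blocks reduce the m=256 bins to a dimension of 2, leaving little room for further blocks.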

We partitioned 100,000 independent simulations for each of the three selection scenarios into training and validation sets (approximate 90%/10% split). The hyperparameters and network architecture were tuned on a smaller preliminary set of simulations that did not vary the selection coefficient or time of onset of selection, so we chose not to split the simulations into a third ‘test’ set when evaluating the models trained on our final simulations. The model was trained for three epochs, with model weights updated after batches of 64, using the Adam optimiser and cross-entropy for the loss function. We evaluated model fit by inspecting loss and accuracy terms at the end of training (Appendix 2—table 1). Preliminary analyses indicated three epochs were sufficient for approximate convergence between training and validation metrics, and we did not observe divergence (which would likely indicate overfitting) even when training for additional epochs.

Comparison to other methods

We converted our simulated tree sequences to VolcanoFinder (Setter et al., 2020) input (a per-locus allele counts file, and a frequency-spectrum input file). One frequency-spectrum input file was created for each distinct demographic model, obtained by averaging over all neutral-scenario simulations. VolcanoFinder was run following examples in the manual, using 800 evenly-spaced genomic bins (-G 800), and taking the maximum value for the likelihood-ratio test statistic (LRT) as the summary statistic for that simulation. VolcanoFinder further requires a value for the divergence between the donor and recipient populations, which it can estimate by doing a grid search for the value which maximises the LRT. However, this is more computationally intensive than providing a value, so we obtained a value by grid search for a small sample of our simulations for each of Demographic Models A1 and B, and used the most frequently observed value (-D 0.001075 for Demographic Model A1 and A2, and -D 0.001465 for Demographic Model B).

Additional AI-related summary statistics were chosen based on Racimo et al., 2017b and calculated on the simulated tree sequences. The fd statistic was implemented from the description in Martin et al., 2015 and the remaining statistics were implemented from their description in Racimo et al., 2017b. Summary statistics (including the VolcanoFinder LRT) were obtained for 600,000 simulations (100,000 for each of three simulation scenarios, for each of Demographic Models A1 and B). We calculated p-values by comparing each statistic to the null distribution that was obtained from the neutral simulation scenarios.
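A p-value obtained by comparison to a simulated null distribution can be sketched as below. The text does not specify the exact estimator, so the one-sided (upper-tail) test and the add-one correction used here are our assumptions, and the function name is hypothetical.

```python
import numpy as np

def empirical_p_values(stat, null):
    """Upper-tail empirical p-values of `stat` against a null sample.

    Uses (number of null values >= stat + 1) / (null sample size + 1),
    so that p-values are never exactly zero (an assumption of this sketch).
    """
    null_sorted = np.sort(np.asarray(null, dtype=float))
    stat = np.asarray(stat, dtype=float)
    # Number of null values greater than or equal to each observed statistic.
    n_ge = len(null_sorted) - np.searchsorted(null_sorted, stat, side="left")
    return (n_ge + 1) / (len(null_sorted) + 1)

null = [0, 1, 2, 3, 4]   # toy null distribution from neutral simulations
stat = [5, 2, -1]        # toy observed statistics
p = empirical_p_values(stat, null)
```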

Calibration

For a well calibrated output, we expect proportion x of the output probabilities with Pr[AI] ≈ x to be true positives. It has been noted elsewhere (Guo et al., 2017) that CNNs may produce improperly calibrated probabilities. Moreover, even if the probabilities are calibrated with respect to the validation dataset (which has even class ratios), this is unlikely to hold for empirical data, as the relative ratios of AI versus not-AI windows in the genome are very skewed.
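The notion of calibration described above can be made concrete with a reliability curve, which compares the mean predicted probability in each bin with the observed fraction of true positives. The sketch below uses synthetic, perfectly calibrated predictions; the function name, the number of bins and the synthetic data are hypothetical.

```python
import numpy as np

def reliability_curve(prob, label, n_bins=10):
    """Mean predicted probability and observed positive fraction per bin."""
    prob = np.asarray(prob, dtype=float)
    label = np.asarray(label, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin indices 0 .. n_bins - 1.
    idx = np.digitize(prob, edges[1:-1])
    mean_pred = np.full(n_bins, np.nan)
    frac_pos = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            mean_pred[b] = prob[mask].mean()
            frac_pos[b] = label[mask].mean()
    return mean_pred, frac_pos

rng = np.random.default_rng(4)  # arbitrary seed
prob = rng.random(200_000)
# Synthetic labels that are positive with probability equal to the prediction.
label = rng.random(200_000) < prob
mean_pred, frac_pos = reliability_curve(prob, label)
```

For well-calibrated predictions the two returned arrays agree closely, so the points lie near the diagonal of a reliability plot.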

We tested three calibration methods: beta calibration (Kull et al., 2017), isotonic regression (Chakravarti, 1989), and Platt scaling (Platt, 1999). To calibrate our CNN output, we first resampled our training dataset to the desired class ratios. We then fit each calibrator to predict the true class in the resampled training dataset from the CNN prediction for the resampled training dataset. To assess the calibration procedure, we inspected reliability plots for our calibrated and uncalibrated predictions, as evaluated with a resampled validation dataset (Figure 4—figure supplement 1, Figure 4—figure supplement 2, Figure 4—figure supplement 3, Figure 4—figure supplement 4). We also checked whether the sum of the residuals was normally distributed, following the approach of Turner et al., 2019. Both beta calibration and isotonic regression gave well-calibrated probabilities compared with uncalibrated model outputs, and for our predictions on empirical data we chose to apply beta calibration due to its relative simplicity.
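The resample-then-fit procedure can be sketched in pure Python. This is an illustration under stated assumptions, not the genomatnn implementation. It uses Platt scaling, which is one of the three methods tested but not the beta calibration finally chosen. The calibrator is fitted to hypothetical CNN logits by plain gradient descent, and the function names, learning rate and simulated score distributions are all invented for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def resample_to_ratios(items_by_class, ratios, n_reference, rng):
    """Sample with replacement so that class sizes follow the requested ratios."""
    resampled = []
    for label, ratio in ratios.items():
        k = round(n_reference * ratio)
        resampled.extend(rng.choices(items_by_class[label], k=k))
    return resampled


def log_loss(a, b, data):
    """Mean binary cross-entropy of p = sigmoid(a * score + b)."""
    eps = 1e-12
    total = 0.0
    for s, y in data:
        p = min(max(sigmoid(a * s + b), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def fit_platt(data, lr=0.1, n_iter=500):
    """Fit the two Platt-scaling parameters by gradient descent on the log loss."""
    a, b = 1.0, 0.0  # start from the uncalibrated output
    n = len(data)
    for _ in range(n_iter):
        grad_a = sum((sigmoid(a * s + b) - y) * s for s, y in data) / n
        grad_b = sum(sigmoid(a * s + b) - y for s, y in data) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


rng = random.Random(1)
# Hypothetical (logit, is_AI) pairs standing in for CNN predictions per scenario.
pool = {
    "neutral": [(rng.gauss(-1.0, 1.0), 0) for _ in range(1000)],
    "sweep": [(rng.gauss(0.0, 1.0), 0) for _ in range(1000)],
    "AI": [(rng.gauss(1.0, 1.0), 1) for _ in range(1000)],
}
ratios = {"neutral": 1.0, "sweep": 0.1, "AI": 0.02}
resampled = resample_to_ratios(pool, ratios, 500, rng)
a, b = fit_platt(resampled)
print(len(resampled), log_loss(1.0, 0.0, resampled), log_loss(a, b, resampled))
```

The fitted pair (a, b) would then be applied to CNN logits from empirical windows. A reliability plot on a separately resampled validation set remains the check on whether the calibration succeeded.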

Saliency maps

We computed average saliency maps, by averaging over a set of input-specific saliency maps that were calculated for a set of 300 simulated genotype matrices for each simulated scenario. The input-specific saliency maps were calculated using tf-keras-vis v0.5.5 (Kubota, 2020) configured to use ‘vanilla’ saliency maps. A sharper image was obtained by exchanging the CNN output layer’s sigmoid activation with linear activation, as recommended in the tf-keras-vis documentation. For the ‘vanilla’ saliency option, the image-specific class saliency is calculated by computing the gradient of a network’s output with respect to a single input. The exact details of how the saliency is calculated via propagation through a neural network can be found in Simonyan et al., 2014, who offer this interpretation: ‘[T]he magnitude of the derivative indicates which pixels need to be changed the least to affect the class score the most’.
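The following toy example illustrates what a 'vanilla' saliency map measures and why a linear output activation gives a sharper image. It is a sketch only: it replaces the CNN with a hypothetical one-layer model whose gradient can be written by hand, and it does not use the tf-keras-vis API. The weights and inputs are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def vanilla_saliency(weights, bias, x, activation="linear"):
    """Magnitude of d(output)/d(x_i) for a single input of a one-layer toy model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    if activation == "linear":
        scale = 1.0  # output is z itself, so the gradient is just the weights
    else:
        p = sigmoid(z)
        scale = p * (1.0 - p)  # sigmoid derivative, at most 0.25
    return [abs(scale * w) for w in weights]


def average_saliency(weights, bias, inputs, activation="linear"):
    """Average the input-specific saliency maps position by position."""
    maps = [vanilla_saliency(weights, bias, x, activation) for x in inputs]
    return [sum(column) / len(maps) for column in zip(*maps)]


# Hypothetical model and three hypothetical binary inputs.
weights, bias = [2.0, -1.0, 0.0, 4.0], 0.5
inputs = [[1, 0, 1, 0], [0, 1, 1, 1], [1, 1, 0, 0]]

avg_linear = average_saliency(weights, bias, inputs, "linear")
avg_sigmoid = average_saliency(weights, bias, inputs, "sigmoid")
print(avg_linear)
print(avg_sigmoid)
```

In this toy model the sigmoid output scales every gradient by a factor of at most 0.25, and by much less when the output saturates near 0 or 1. This is one way to see why a linear output layer yields a sharper map.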

Application of trained CNN to empirical datasets

We show Manhattan plots where each data point is a 100 kbp window that moves along the genome in steps of size 20 kbp. Gene annotations were extracted from the Ensembl release 87 GFF3 file (with GRCh37/hg19 coordinates), obtained via Ensembl's FTP server. We extracted the records with source=‘ensembl_havana’ and type=‘gene’, and report the genes that intersected with the 30 top-ranking CNN predictions or a 100 kbp flanking region. Adjacent regions were merged together prior to intersection, so that genes were reported only once.
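The windowing, merging and gene-intersection steps can be sketched as below. This is not the genomatnn code. It assumes 1-based inclusive coordinates (matching the form of the intervals in the Appendix 1 tables), treats overlapping or abutting windows as 'adjacent', and drops incomplete windows at chromosome ends. The gene names and coordinates are hypothetical.

```python
def sliding_windows(chrom_length, window=100_000, step=20_000):
    """Yield 1-based inclusive (start, end) windows along a chromosome."""
    start = 1
    while start + window - 1 <= chrom_length:
        yield (start, start + window - 1)
        start += step


def merge_adjacent(intervals):
    """Merge overlapping or abutting 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def genes_near(regions, genes, flank=100_000):
    """Names of genes overlapping any region extended by `flank` on each side."""
    hits = []
    for name, g_start, g_end in genes:
        for r_start, r_end in regions:
            if g_start <= r_end + flank and g_end >= r_start - flank:
                hits.append(name)
                break  # report each gene only once
    return hits


# Three hypothetical top-ranking windows on one chromosome.
top_windows = [(16720001, 16820000), (16700001, 16800000), (16760001, 16860000)]
merged = merge_adjacent(top_windows)

# Hypothetical gene records: (name, start, end).
genes = [("GENE_A", 16_900_000, 16_950_000), ("GENE_B", 17_000_000, 17_100_000)]
hits = genes_near(merged, genes)
print(merged, hits)
```

Merging before intersection means a gene spanning several neighbouring high-scoring windows appears once in the output.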

Compute resources

All simulations and results reported here were obtained on a compute server with two Intel Xeon 6248 CPUs (80 cores total), 768 GB RAM, and five NVIDIA Tesla T4 GPUs. 300,000 SLiM simulations took approximately 80 weeks of CPU time for each of Demographic Models A1 and B. Each simulation executes independently, and is readily distributed across cores or compute nodes. The simulations produced 450 GB of tree sequence files. The resized genotype matrices were compressed into a Zarr cache (Zarr Development Team, 2020) with size 2.8 GB, for faster loading. Training a single CNN on one GPU took approximately 15 min, or 60 CPU hours for an equivalent CPU-only training procedure. We did not attempt to optimise memory usage, and thus all data were loaded into memory, requiring approximately 120 GB RAM during training. Predicting AI for all genomic windows on an empirical dataset (22 single-chromosome BCF files) took 1 CPU hour. However, our prediction pipeline uses multiprocessing and efficiently scales to 80 cores.

Code availability

The source code for performing simulations, training and evaluating a CNN, and applying a CNN to empirical VCF data is provided in a new Python application called genomatnn (Gower, 2021).

Acknowledgements

We thank Andrew Kern, Martin Sikora, Flora Jay and Anders Albrechtsen, as well as three anonymous reviewers, and the members of the Racimo group and the PopSim consortium, for helpful advice and discussions. We also thank Murray Cox and Georgi Hudjashov for facilitating access to the IGDP data.

Funding

FR and GG were supported by a Villum Fonden Young Investigator award to FR (project no. 00025300). MF was supported by a Leverhulme Research Project grant (RPG-2018-208) and the Imperial College European Partners Fund. FR was also supported by a Lundbeckfonden grant (R302-2018-2155) and a Novo Nordisk Fonden grant (NNF18SA0035006) to the GeoGenetics Centre. The funding sources were not involved in study design, data collection and interpretation, or the decision to submit the work for publication.

Appendix 1

Appendix 1—table 1. Top ranking gene candidates corresponding to Neanderthal AI in Europeans.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom Start End Genes
1 39420001 39520000 RRAGC; MYCBP; GJA9; RHBDL2; AKIRIN1; NDUFS5; MACF1
2 159880001 160280000 TANC1; WDSUB1; BAZ2B
2 180060001 180160000 SESTD1
2 227800001 227900000 RHBDD1; COL4A4
2 238820001 238960000 LRRFIP1; RBM44; RAMP1; UBE2F; SCLY; ESPNL; KLHL30
3 114500001 114600000 ZBTB20
5 57960001 58060000 RAB3C
6 28160001 28380000 ZSCAN16-AS1; ZSCAN16; ZKSCAN8; ZSCAN9; ZKSCAN4; NKAPL; PGBD1; ZSCAN31; ZKSCAN3; ZSCAN12; ZSCAN23; GPX6
8 17060001 17160000 MICU3; ZDHHC2; CNOT7; VPS37A; MTMR7
8 91840001 91940000 TMEM64; NECAB1; TMEM55A
9 16700001 16860000 BNC2
10 11800001 11900000 ECHDC3; PROSER2; UPF2
11 37740001 37840000
19 20260001 20360000 ZNF90; ZNF486
19 33580001 33700000 RHPN2; GPATCH1; WDR88; LRP3; SLC7A10
20 14340001 14440000 MACROD2; FLRT3

Appendix 1—table 2. Top ranking gene candidates corresponding to Denisovan AI in Melanesians.

We show genes which overlap, or are within 100 kbp of, the 30 highest ranked 100 kbp intervals. Adjacent intervals have been merged. The CNN was trained using only AI simulations with selected mutation having allele frequency >5%, and subsequently calibrated with resampled neutral:sweep:AI ratios of 1:0.1:0.02.

Chrom Start End Genes
1 2880001 2980000 ACTRT2; LINC00982; PRDM16
1 220080001 220180000 SLC30A10; EPRS; BPNT1; IARS2
2 221040001 221140000
3 15400001 15500000 SH3BP5; METTL6; EAF1; COLQ
4 41960001 42100000 TMEM33; DCAF4L1; SLC30A9; BEND4
5 135440001 135540000 TGFBI; SMAD5-AS1; SMAD5; TRPC7
6 81980001 82120000 FAM46A
7 121160001 121260000
9 95500001 95600000 IPPK; BICD2; ZNF484
10 59660001 59760000
12 80780001 80880000 OTOGL; PTPRQ
12 84620001 84740000
14 57620001 57760000 EXOC5; AP5M1; NAA30
17 29480001 29720000 NF1; OMG; EVI2B; EVI2A; RAB11FIP4
18 38180001 38320000
20 54340001 54440000

Appendix 2

Appendix 2—table 1. Loss and accuracy for CNNs after training for three epochs, as reported by Keras/Tensorflow, for the training and validation datasets.

Binary cross-entropy was used for the loss function.

Demographic model | Hyperparameters | Training loss | Training accuracy | Validation loss | Validation accuracy
A1 | AF>0.05 | 0.1592 | 0.9458 | 0.1618 | 0.9468
A1 | AF>0.25 | 0.1224 | 0.9585 | 0.1265 | 0.9578
A1 | AF>0.25; unphased | 0.1347 | 0.9537 | 0.1368 | 0.9530
B | AF>0.05; unphased | 0.3415 | 0.8439 | 0.3441 | 0.8439
B | AF>0.25; unphased | 0.3546 | 0.8372 | 0.3583 | 0.8376

Appendix 3

Appendix 3—table 1. Parameter values used for simulating Demographic Model A1.

A Demes-format YAML file for each demographic model is available from the genomatnn git repository.

Parameter | Description | Value | Units | Source
N_Anc | ancestral pop. size | 18500 | | Kuhlwilm et al., 2016
N_Nea | Neanderthal pop. size | 3400 | | Kuhlwilm et al., 2016
N_YRI | YRI pop. size | 27600 | | Kuhlwilm et al., 2016
N_CEU0 | CEU bottleneck pop. size | 1080 | | Ragsdale and Gravel, 2019
N_CEU1 | CEU growth-start pop. size | 1450 | | Ragsdale and Gravel, 2019
N_CEU2 | CEU current pop. size | 13377 | |
r_CEU | CEU growth rate | 0.00202 | | Ragsdale and Gravel, 2019
T_CEU2 | CEU time at growth start | 31.9 | kya | Ragsdale and Gravel, 2019
T_0 | Nea/other split time | 550 | kya | Prüfer et al., 2017
T_1 | CEU/YRI split time | 65.7 | kya | Ragsdale and Gravel, 2019
T_2 | time of Nea → CEU gene flow | 55 | kya | Prüfer et al., 2017
g | generation time | 29 | years | Prüfer et al., 2017
α | Nea → CEU admixture proportion | 2.25 | % | Prüfer et al., 2017
T_Altai | sampling time | 115 | kya | Prüfer et al., 2017
T_Vindija | sampling time | 55 | kya | Prüfer et al., 2017
n_Nean | sample size | 2 | diploid individuals |
n_Afr | sample size | 108 | diploid individuals |
n_Eur | sample size | 99 | diploid individuals |
s | selection coefficient | 10^Unif(-4, -1) | |
T_sel1 | selection onset (sweep) | Unif(1, T_1) | kya |
T_mut1 | mutation (sweep) | Unif(T_sel1, T_1) | kya |
T_sel2 | selection onset (AI) | Unif(1, T_2) | kya |
T_mut2 | mutation (AI) | Unif(T_2, T_0) | kya |

Appendix 4

Appendix 4—figure 1. Haplotype plot for the candidate region chr1:104500001–104600000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 2. Haplotype plot for the candidate region chr2:109360001–109460000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 3. Haplotype plot for the candidate region chr2:160160001–160280000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 4. Haplotype plot for the candidate region chr3:114480001–114620000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 5. Haplotype plot for the candidate region chr4:54240001–54340000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 6. Haplotype plot for the candidate region chr5:39220001–39320000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 7. Haplotype plot for the candidate region chr6:28180001–28320000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 8. Haplotype plot for the candidate region chr8:143440001–143560000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 9. Haplotype plot for the candidate region chr9:16700001–16820000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 10. Haplotype plot for the candidate region chr12:85780001–85880000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 11. Haplotype plot for the candidate region chr19:20220001–20380000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 12. Haplotype plot for the candidate region chr19:33580001–33740000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 13. Haplotype plot for the candidate region chr20:62100001–62280000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 4—figure 14. Haplotype plot for the candidate region chr21:25840001–25940000 in the Neanderthal-into-European AI scan.

Bright yellow indicates minor allele, dark blue indicates major allele. Haplotypes within populations are sorted left-to-right by similarity to Neanderthals.

Appendix 5

Appendix 5—figure 1. Genotype plot for the candidate region chr2:129960001–130060000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 2. Genotype plot for the candidate region chr3:3740001–3840000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 3. Genotype plot for the candidate region chr4:41980001–42080000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 4. Genotype plot for the candidate region chr5:420001–520000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 5. Genotype plot for the candidate region chr6:74640001–74740000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 6. Genotype plot for the candidate region chr6:81960001–82060000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 7. Genotype plot for the candidate region chr6:137920001–138120000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 8. Genotype plot for the candidate region chr7:25100001–25200000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 9. Genotype plot for the candidate region chr7:38020001–38120000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 10. Genotype plot for the candidate region chr7:121160001–121260000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 11. Genotype plot for the candidate region chr8:3040001–3140000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 12. Genotype plot for the candidate region chr12:84640001–84740000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 13. Genotype plot for the candidate region chr12:108240001–108340000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 14. Genotype plot for the candidate region chr12:114020001–114280000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 15. Genotype plot for the candidate region chr14:61860001–61960000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 16. Genotype plot for the candidate region chr14:63120001–63220000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 17. Genotype plot for the candidate region chr14:96700001–96820000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 18. Genotype plot for the candidate region chr15:55260001–55400000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 19. Genotype plot for the candidate region chr16:62600001–62700000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 20. Genotype plot for the candidate region chr16:78360001–78460000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 21. Genotype plot for the candidate region chr18:22060001–22160000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Appendix 5—figure 22. Genotype plot for the candidate region chr22:19040001–19140000 in the Denisovan-into-Melanesian AI scan.

Dark blue = homozygote major allele, light blue = heterozygote, yellow = homozygote minor allele. Genotypes within populations are sorted left-to-right by similarity to the Denisovan.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Graham Gower, Email: graham.gower@gmail.com.

George H Perry, Pennsylvania State University, United States.

Funding Information

This paper was supported by the following grants:

  • Villum Fonden 00025300 to Fernando Racimo.

  • Leverhulme Trust RPG-2018-208 to Matteo Fumagalli.

  • Lundbeckfonden R302-2018-2155 to Fernando Racimo.

  • Novo Nordisk Fonden NNF18SA0035006 to Fernando Racimo.

Additional information

Competing interests

No competing interests declared.

Author contributions

Graham Gower: Software, Formal analysis, Investigation, Visualization, Methodology, Writing - original draft, Writing - review and editing.

Pablo Iáñez Picazo: Software, Formal analysis, Visualization, Methodology, Writing - review and editing.

Matteo Fumagalli: Conceptualization, Writing - review and editing.

Fernando Racimo: Conceptualization, Supervision, Funding acquisition, Methodology, Writing - original draft, Writing - review and editing.

Additional files

Transparent reporting form

Data availability

Source code is available from https://github.com/grahamgower/genomatnn/.

References

  1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M, Ghemawat S, Goodfellow I, Harp A, Irving G, Isard M, Jia Y, Jozefowicz R, Kaiser L, Kudlur M, Levenberg J, Mané D, Monga R, Moore S, Murray D, Olah C, Schuster M, Shlens J, Steiner B, Sutskever I, Talwar K, Tucker P, Vanhoucke V, Vasudevan V, Viégas F, Vinyals O, Warden P, Wattenberg M, Wicke M, Yu Y, Zheng X. TensorFlow: large-scale machine learning on heterogeneous systems. arXiv. 2015 https://arxiv.org/abs/1603.04467
  2. Adrion JR, Cole CB, Dukler N, Galloway JG, Gladstein AL, Gower G, Kyriazis CC, Ragsdale AP, Tsambos G, Baumdicker F, Carlson J, Cartwright RA, Durvasula A, Gronau I, Kim BY, McKenzie P, Messer PW, Noskova E, Ortega-Del Vecchyo D, Racimo F, Struck TJ, Gravel S, Gutenkunst RN, Lohmueller KE, Ralph PL, Schrider DR, Siepel A, Kelleher J, Kern AD. A community-maintained standard library of population genetic models. eLife. 2020a;9:e54967. doi: 10.7554/eLife.54967.
  3. Adrion JR, Galloway JG, Kern AD. Predicting the landscape of recombination using deep learning. Molecular Biology and Evolution. 2020b;37:1790–1808. doi: 10.1093/molbev/msaa038.
  4. Aggarwal CC. Neural Networks and Deep Learning. Springer; 2018.
  5. Alaa AM, van der Schaar M. Demystifying black-box models with symbolic metamodels. In: Wallach H, Larochelle H, Beygelzimer A, Alché-Buc F. d, Fox E, Garnett R, editors. Advances in Neural Information Processing Systems 32. Curran Associates, Inc; 2019. pp. 11304–11314.
  6. Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR, 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
  7. Bai Z, Stamova B, Xu H, Ander BP, Wang J, Jickling GC, Zhan X, Liu D, Han G, Jin LW, DeCarli C, Lei H, Sharp FR. Distinctive RNA expression profiles in blood associated with Alzheimer disease after accounting for white matter hyperintensities. Alzheimer Disease and Associated Disorders. 2014;28:226–233. doi: 10.1097/WAD.0000000000000022.
  8. Blischak PD, Barker MS, Gutenkunst RN. Chromosome-scale inference of hybrid speciation and admixture with convolutional neural networks. Molecular Ecology Resources. 2021;8:13355. doi: 10.1111/1755-0998.13355.
  9. Canela-Xandri O, Rawlik K, Tenesa A. An atlas of genetic associations in UK Biobank. Nature Genetics. 2018;50:1593–1599. doi: 10.1038/s41588-018-0248-z.
  10. Cao C, Chicco D, Hoffman MM. The MCC-F1 curve: a performance evaluation technique for binary classification. arXiv. 2020 https://arxiv.org/abs/2006.11278
  11. Chakravarti N. Isotonic median regression: a linear programming approach. Mathematics of Operations Research. 1989;14:303–308. doi: 10.1287/moor.14.2.303.
  12. Chan J, Perrone V, Spence J, Jenkins P, Mathieson S, Song Y. A likelihood-free inference framework for population genetic data using exchangeable neural networks. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors. Advances in Neural Information Processing Systems 31. Curran Associates, Inc; 2018. pp. 8594–8605.
  13. Chicco D. Ten quick tips for machine learning in computational biology. BioData Mining. 2017;10:35. doi: 10.1186/s13040-017-0155-3.
  14. Chollet F. Keras. 2015 https://keras.io
  15. Cosma MP, Pepe S, Annunziata I, Newbold RF, Grompe M, Parenti G, Ballabio A. The multiple sulfatase deficiency gene encodes an essential and limiting factor for the activity of sulfatases. Cell. 2003;113:445–456. doi: 10.1016/s0092-8674(03)00348-9.
  16. De Rocco D, Cerqua C, Goffrini P, Russo G, Pastore A, Meloni F, Nicchia E, Moraes CT, Pecci A, Salviati L, Savoia A. Mutations of cytochrome c identified in patients with thrombocytopenia THC4 affect both apoptosis and cellular bioenergetics. Biochimica Et Biophysica Acta (BBA) - Molecular Basis of Disease. 2014;1842:269–274. doi: 10.1016/j.bbadis.2013.12.002.
  17. Durvasula A, Sankararaman S. A statistical model for reference-free inference of archaic local ancestry. PLOS Genetics. 2019;15:e1008175. doi: 10.1371/journal.pgen.1008175.
  18. Enciso-Romero J, Pardo-Díaz C, Martin SH, Arias CF, Linares M, McMillan WO, Jiggins CD, Salazar C. Evolution of novel mimicry rings facilitated by adaptive introgression in tropical butterflies. Molecular Ecology. 2017;26:5160–5172. doi: 10.1111/mec.14277.
  19. Flagel L, Brandvain Y, Schrider DR. The unreasonable effectiveness of convolutional neural networks in population genetic inference. Molecular Biology and Evolution. 2019;36:220–238. doi: 10.1093/molbev/msy224.
  20. Frazer KA, Ballinger DG, Cox DR, Hinds DA, Stuve LL, Gibbs RA, Belmont JW, Boudreau A, Hardenbol P, Leal SM, Pasternak S, Wheeler DA, Willis TD, Yu F, Yang H, Zeng C, Gao Y, Hu H, Hu W, Li C, Lin W, Liu S, Pan H, Tang X, Wang J, Wang W, Yu J, Zhang B, Zhang Q, Zhao H, Zhao H, Zhou J, Gabriel SB, Barry R, Blumenstiel B, Camargo A, Defelice M, Faggart M, Goyette M, Gupta S, Moore J, Nguyen H, Onofrio RC, Parkin M, Roy J, Stahl E, Winchester E, Ziaugra L, Altshuler D, Shen Y, Yao Z, Huang W, Chu X, He Y, Jin L, Liu Y, Shen Y, Sun W, Wang H, Wang Y, Wang Y, Xiong X, Xu L, Waye MM, Tsui SK, Xue H, Wong JT, Galver LM, Fan JB, Gunderson K, Murray SS, Oliphant AR, Chee MS, Montpetit A, Chagnon F, Ferretti V, Leboeuf M, Olivier JF, Phillips MS, Roumy S, Sallée C, Verner A, Hudson TJ, Kwok PY, Cai D, Koboldt DC, Miller RD, Pawlikowska L, Taillon-Miller P, Xiao M, Tsui LC, Mak W, Song YQ, Tam PK, Nakamura Y, Kawaguchi T, Kitamoto T, Morizono T, Nagashima A, Ohnishi Y, Sekine A, Tanaka T, Tsunoda T, Deloukas P, Bird CP, Delgado M, Dermitzakis ET, Gwilliam R, Hunt S, Morrison J, Powell D, Stranger BE, Whittaker P, Bentley DR, Daly MJ, de Bakker PI, Barrett J, Chretien YR, Maller J, McCarroll S, Patterson N, Pe'er I, Price A, Purcell S, Richter DJ, Sabeti P, Saxena R, Schaffner SF, Sham PC, Varilly P, Altshuler D, Stein LD, Krishnan L, Smith AV, Tello-Ruiz MK, Thorisson GA, Chakravarti A, Chen PE, Cutler DJ, Kashuk CS, Lin S, Abecasis GR, Guan W, Li Y, Munro HM, Qin ZS, Thomas DJ, McVean G, Auton A, Bottolo L, Cardin N, Eyheramendy S, Freeman C, Marchini J, Myers S, Spencer C, Stephens M, Donnelly P, Cardon LR, Clarke G, Evans DM, Morris AP, Weir BS, Tsunoda T, Mullikin JC, Sherry ST, Feolo M, Skol A, Zhang H, Zeng C, Zhao H, Matsuda I, Fukushima Y, Macer DR, Suda E, Rotimi CN, Adebamowo CA, Ajayi I, Aniagwu T, Marshall PA, Nkwodimmah C, Royal CD, Leppert MF, Dixon M, Peiffer A, Qiu R, Kent A, Kato K, Niikawa N, Adewole IF, Knoppers BM, Foster MW, Clayton EW, Watkin J, 
Gibbs RA, Belmont JW, Muzny D, Nazareth L, Sodergren E, Weinstock GM, Wheeler DA, Yakub I, Gabriel SB, Onofrio RC, Richter DJ, Ziaugra L, Birren BW, Daly MJ, Altshuler D, Wilson RK, Fulton LL, Rogers J, Burton J, Carter NP, Clee CM, Griffiths M, Jones MC, McLay K, Plumb RW, Ross MT, Sims SK, Willey DL, Chen Z, Han H, Kang L, Godbout M, Wallenburg JC, L'Archevêque P, Bellemare G, Saeki K, Wang H, An D, Fu H, Li Q, Wang Z, Wang R, Holden AL, Brooks LD, McEwen JE, Guyer MS, Wang VO, Peterson JL, Shi M, Spiegel J, Sung LM, Zacharia LF, Collins FS, Kennedy K, Jamieson R, Stewart J, International HapMap Consortium. A second generation human haplotype map of over 3.1 million SNPs. Nature. 2007;449:851–861. doi: 10.1038/nature06258.
  21. Gittelman RM, Schraiber JG, Vernot B, Mikacenic C, Wurfel MM, Akey JM. Archaic Hominin Admixture Facilitated Adaptation to Out-of-Africa Environments. Current Biology : CB. 2016;26:3375–3382. doi: 10.1016/j.cub.2016.10.041.
  22. Gower G. Predicts adaptive introgression using a CNN trained on genotype matrices. 7a51abd. GitHub. 2021 https://github.com/grahamgower/genomatnn
  23. Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, Hansen NF, Durand EY, Malaspinas AS, Jensen JD, Marques-Bonet T, Alkan C, Prüfer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Höber B, Höffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, Novod N, Affourtit J, Egholm M, Verna C, Rudan P, Brajkovic D, Kucan Ž, Gušic I, Doronichev VB, Golovanova LV, Lalueza-Fox C, de la Rasilla M, Fortea J, Rosas A, Schmitz RW, Johnson PLF, Eichler EE, Falush D, Birney E, Mullikin JC, Slatkin M, Nielsen R, Kelso J, Lachmann M, Reich D, Pääbo S. A draft sequence of the Neandertal genome. Science. 2010;328:710–722. doi: 10.1126/science.1188021.
  24. Guo C, Pleiss G, Sun Y, Weinberger KQ. On calibration of modern neural networks. arXiv. 2017 https://arxiv.org/abs/1706.04599
  25. Haller BC, Galloway J, Kelleher J, Messer PW, Ralph PL. Tree-sequence recording in SLiM opens new horizons for forward-time simulation of whole genomes. Molecular Ecology Resources. 2019;19:552–566. doi: 10.1111/1755-0998.12968.
  26. Haller BC, Messer PW. SLiM 3: Forward Genetic Simulations Beyond the Wright-Fisher Model. Molecular Biology and Evolution. 2019;36:632–637. doi: 10.1093/molbev/msy228.
  27. Harris K, Nielsen R. The Genetic Cost of Neanderthal Introgression. Genetics. 2016;203:881–891. doi: 10.1534/genetics.116.186890.
  28. Hawks J, Cochran G. Dynamics of adaptive introgression from archaic to modern humans. PaleoAnthropology. 2006;2006:101–115.
  29. Hendrick MF, Finseth FR, Mathiasson ME, Palmer KA, Broder EM, Breigenzer P, Fishman L. The genetics of extreme microgeographic adaptation: an integrated approach identifies a major gene underlying leaf trichome divergence in Yellowstone Mimulus guttatus. Molecular Ecology. 2016;25:5647–5662. doi: 10.1111/mec.13753.
  30. Hermisson J, Pennings PS. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 2005;169:2335–2352. doi: 10.1534/genetics.104.036947.
  31. Hossain S, Takatori A, Nakamura Y, Suenaga Y, Kamijo T, Nakagawara A. NLRR1 enhances EGF-mediated MYCN induction in neuroblastoma and accelerates tumor growth in vivo. Cancer Research. 2012;72:4587–4596. doi: 10.1158/0008-5472.CAN-12-0943.
  32. Hubisz MJ, Williams AL, Siepel A. Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph. PLOS Genetics. 2020;16:e1008895. doi: 10.1371/journal.pgen.1008895.
  33. Huerta-Sánchez E, Jin X, Asan, Bianba Z, Peter BM, Vinckenbosch N, Liang Y, Yi X, He M, Somel M, Ni P, Wang B, Ou X, Huasang, Luosang J, Cuo ZX, Li K, Gao G, Yin Y, Wang W, Zhang X, Xu X, Yang H, Li Y, Wang J, Wang J, Nielsen R. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature. 2014;512:194. doi: 10.1038/nature13408.
  34. Isildak U, Stella A, Fumagalli M. Distinguishing between recent balancing selection and incomplete sweep using deep neural networks. Molecular Ecology Resources. 2021;1:13379. doi: 10.1111/1755-0998.13379.
  35. Jacobs GS, Hudjashov G, Saag L, Kusuma P, Darusallam CC, Lawson DJ, Mondal M, Pagani L, Ricaut FX, Stoneking M, Metspalu M, Sudoyo H, Lansing JS, Cox MP. Multiple deeply divergent denisovan ancestries in papuans. Cell. 2019;177:1010–1021. doi: 10.1016/j.cell.2019.02.035.
  36. Jones MR, Mills LS, Alves PC, Callahan CM, Alves JM, Lafferty DJR, Jiggins FM, Jensen JD, Melo-Ferreira J, Good JM. Adaptive introgression underlies polymorphic seasonal camouflage in snowshoe hares. Science. 2018;360:1355–1358. doi: 10.1126/science.aar5273. [DOI] [PubMed] [Google Scholar]
  37. Jonsson M, Björntorp Mark E, Brantsing C, Brandner JM, Lindahl A, Asp J. Hash4, a novel human achaete-scute homologue found in fetal skin. Genomics. 2004;84:859–866. doi: 10.1016/j.ygeno.2004.07.004. [DOI] [PubMed] [Google Scholar]
  38. Juric I, Aeschbacher S, Coop G. The Strength of Selection against Neanderthal Introgression. PLOS Genetics. 2016;12:e1006340. doi: 10.1371/journal.pgen.1006340. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Kelleher J, Etheridge AM, McVean G. Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes. PLOS Computational Biology. 2016;12:e1004842. doi: 10.1371/journal.pcbi.1004842. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Kelleher J, Thornton KR, Ashander J, Ralph PL. Efficient pedigree recording for fast population genetics simulation. PLOS Computational Biology. 2018;14:e1006581. doi: 10.1371/journal.pcbi.1006581. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Kim BY, Huber CD, Lohmueller KE. Inference of the Distribution of Selection Coefficients for New Nonsynonymous Mutations Using Large Samples. Genetics. 2017;206:345–361. doi: 10.1534/genetics.116.197145. [DOI] [PMC free article] [PubMed] [Google Scholar]
  42. Kim BY, Huber CD, Lohmueller KE. Deleterious variation shapes the genomic landscape of introgression. PLOS Genetics. 2018;14:e1007741. doi: 10.1371/journal.pgen.1007741. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Kiper POS, Saito H, Gori F, Unger S, Hesse E, Yamana K, Kiviranta R, Solban N, Liu J, Brommage R, Boduroglu K, Bonafé L, Campos-Xavier B, Dikoglu E, Eastell R, Gossiel F, Harshman K, Nishimura G, Girisha KM, Stevenson BJ, Takita H, Rivolta C, Superti-Furga A, Baron R. Cortical-Bone fragility--insights from sFRP4 deficiency in Pyle's Disease. New England Journal of Medicine. 2016;374:2553–2562. doi: 10.1056/NEJMoa1509342. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Korneliussen TS, Albrechtsen A, Nielsen R. ANGSD: Analysis of Next Generation Sequencing Data. BMC Bioinformatics. 2014;15:356. doi: 10.1186/s12859-014-0356-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems; 2012. pp. 84–90. [Google Scholar]
  46. Kubo M, Hata J, Ninomiya T, Matsuda K, Yonemoto K, Nakano T, Matsushita T, Yamazaki K, Ohnishi Y, Saito S, Kitazono T, Ibayashi S, Sueishi K, Iida M, Nakamura Y, Kiyohara Y. A nonsynonymous SNP in PRKCH (protein kinase C eta) increases the risk of cerebral infarction. Nature Genetics. 2007;39:212–217. doi: 10.1038/ng1945. [DOI] [PubMed] [Google Scholar]
  47. Kubota Y. tf-keras-vis. 2020 https://github.com/keisen/tf-keras-vis
  48. Kuhlwilm M, Gronau I, Hubisz MJ, de Filippo C, Prado-Martinez J, Kircher M, Fu Q, Burbano HA, Lalueza-Fox C, de la Rasilla M, Rosas A, Rudan P, Brajkovic D, Kucan Ž, Gušic I, Marques-Bonet T, Andrés AM, Viola B, Pääbo S, Meyer M, Siepel A, Castellano S. Ancient gene flow from early modern humans into Eastern Neanderthals. Nature. 2016;530:429–433. doi: 10.1038/nature16544. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Kull M, Filho TS, Flach P. Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers. Proceedings of the 20th International Conference on Artificial Intelligence and Statistics; 2017. pp. 623–631. [Google Scholar]
  50. LeCun Y, Bengio Y. Convolutional networks for images, speech, and time series. In: Arbib M. A, editor. The Handbook of Brain Theory and Neural Networks. MIT Press; 1995. pp. 255–258. [Google Scholar]
  51. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Mahdi T, Hänzelmann S, Salehi A, Muhammed SJ, Reinbothe TM, Tang Y, Axelsson AS, Zhou Y, Jing X, Almgren P, Krus U, Taneera J, Blom AM, Lyssenko V, Esguerra JL, Hansson O, Eliasson L, Derry J, Zhang E, Wollheim CB, Groop L, Renström E, Rosengren AH. Secreted frizzled-related protein 4 reduces insulin secretion and is overexpressed in type 2 diabetes. Cell Metabolism. 2012;16:625–633. doi: 10.1016/j.cmet.2012.10.009. [DOI] [PubMed] [Google Scholar]
  53. Malaspinas AS, Westaway MC, Muller C, Sousa VC, Lao O, Alves I, Bergström A, Athanasiadis G, Cheng JY, Crawford JE, Heupink TH, Macholdt E, Peischl S, Rasmussen S, Schiffels S, Subramanian S, Wright JL, Albrechtsen A, Barbieri C, Dupanloup I, Eriksson A, Margaryan A, Moltke I, Pugach I, Korneliussen TS, Levkivskyi IP, Moreno-Mayar JV, Ni S, Racimo F, Sikora M, Xue Y, Aghakhanian FA, Brucato N, Brunak S, Campos PF, Clark W, Ellingvåg S, Fourmile G, Gerbault P, Injie D, Koki G, Leavesley M, Logan B, Lynch A, Matisoo-Smith EA, McAllister PJ, Mentzer AJ, Metspalu M, Migliano AB, Murgha L, Phipps ME, Pomat W, Reynolds D, Ricaut FX, Siba P, Thomas MG, Wales T, Wall CM, Oppenheimer SJ, Tyler-Smith C, Durbin R, Dortch J, Manica A, Schierup MH, Foley RA, Lahr MM, Bowern C, Wall JD, Mailund T, Stoneking M, Nielsen R, Sandhu MS, Excoffier L, Lambert DM, Willerslev E. A genomic history of Aboriginal Australia. Nature. 2016;538:207–214. doi: 10.1038/nature18299. [DOI] [PubMed] [Google Scholar]
  54. Martin SH, Davey JW, Jiggins CD. Evaluating the use of ABBA-BABA statistics to locate introgressed loci. Molecular Biology and Evolution. 2015;32:244–257. doi: 10.1093/molbev/msu269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica Et Biophysica Acta (BBA) - Protein Structure. 1975;405:442–451. doi: 10.1016/0005-2795(75)90109-9. [DOI] [PubMed] [Google Scholar]
  56. Ménasché G, Pastural E, Feldmann J, Certain S, Ersoy F, Dupuis S, Wulffraat N, Bianchi D, Fischer A, Le Deist F, de Saint Basile G. Mutations in RAB27A cause Griscelli syndrome associated with haemophagocytic syndrome. Nature Genetics. 2000;25:173–176. doi: 10.1038/76024. [DOI] [PubMed] [Google Scholar]
  57. Meyer M, Kircher M, Gansauge MT, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prüfer K, de Filippo C, Sudmant PH, Alkan C, Fu Q, Do R, Rohland N, Tandon A, Siebauer M, Green RE, Bryc K, Briggs AW, Stenzel U, Dabney J, Shendure J, Kitzman J, Hammer MF, Shunkov MV, Derevianko AP, Patterson N, Andrés AM, Eichler EE, Slatkin M, Reich D, Kelso J, Pääbo S. A high-coverage genome sequence from an archaic Denisovan individual. Science. 2012;338:222–226. doi: 10.1126/science.1224344. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Miao B, Wang Z, Li Y. Genomic analysis reveals hypoxia adaptation in the tibetan mastiff by introgression of the gray wolf from the tibetan plateau. Molecular Biology and Evolution. 2017;34:734–743. doi: 10.1093/molbev/msw274. [DOI] [PubMed] [Google Scholar]
  59. Mondal M, Bertranpetit J, Lao O. Approximate Bayesian computation with deep learning supports a third archaic introgression in Asia and Oceania. Nature Communications. 2019;10:246. doi: 10.1038/s41467-018-08089-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  60. Morison IM, Cramer Bordé EM, Cheesman EJ, Cheong PL, Holyoake AJ, Fichelson S, Weeks RJ, Lo A, Davies SM, Wilbanks SM, Fagerlund RD, Ludgate MW, da Silva Tatley FM, Coker MS, Bockett NA, Hughes G, Pippig DA, Smith MP, Capron C, Ledgerwood EC. A mutation of human cytochrome c enhances the intrinsic apoptotic pathway but causes only thrombocytopenia. Nature Genetics. 2008;40:387–389. doi: 10.1038/ng.103. [DOI] [PubMed] [Google Scholar]
  61. Norris LC, Main BJ, Lee Y, Collier TC, Fofana A, Cornel AJ, Lanzaro GC. Adaptive introgression in an African malaria mosquito coincident with the increased usage of insecticide-treated bed nets. PNAS. 2015;112:815–820. doi: 10.1073/pnas.1418892112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  62. Pardo-Diaz C, Salazar C, Baxter SW, Merot C, Figueiredo-Ready W, Joron M, McMillan WO, Jiggins CD. Adaptive introgression across species boundaries in Heliconius butterflies. PLOS Genetics. 2012;8:e1002752. doi: 10.1371/journal.pgen.1002752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Platt JC. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers; 1999. pp. 61–74. [Google Scholar]
  64. Prüfer K, Racimo F, Patterson N, Jay F, Sankararaman S, Sawyer S, Heinze A, Renaud G, Sudmant PH, de Filippo C, Li H, Mallick S, Dannemann M, Fu Q, Kircher M, Kuhlwilm M, Lachmann M, Meyer M, Ongyerth M, Siebauer M, Theunert C, Tandon A, Moorjani P, Pickrell J, Mullikin JC, Vohr SH, Green RE, Hellmann I, Johnson PL, Blanche H, Cann H, Kitzman JO, Shendure J, Eichler EE, Lein ES, Bakken TE, Golovanova LV, Doronichev VB, Shunkov MV, Derevianko AP, Viola B, Slatkin M, Reich D, Kelso J, Pääbo S. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature. 2014;505:43–49. doi: 10.1038/nature12886. [DOI] [PMC free article] [PubMed] [Google Scholar]
  65. Prüfer K, de Filippo C, Grote S, Mafessoni F, Korlević P, Hajdinjak M, Vernot B, Skov L, Hsieh P, Peyrégne S, Reher D, Hopfe C, Nagel S, Maricic T, Fu Q, Theunert C, Rogers R, Skoglund P, Chintalapati M, Dannemann M, Nelson BJ, Key FM, Rudan P, Kućan Ž, Gušić I, Golovanova LV, Doronichev VB, Patterson N, Reich D, Eichler EE, Slatkin M, Schierup MH, Andrés AM, Kelso J, Meyer M, Pääbo S. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science. 2017;358:655–658. doi: 10.1126/science.aao1887. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Racimo F, Sankararaman S, Nielsen R, Huerta-Sánchez E. Evidence for archaic adaptive introgression in humans. Nature Reviews. Genetics. 2015;16:359. doi: 10.1038/nrg3936. [DOI] [PMC free article] [PubMed] [Google Scholar]
  67. Racimo F, Gokhman D, Fumagalli M, Ko A, Hansen T, Moltke I, Albrechtsen A, Carmel L, Huerta-Sánchez E, Nielsen R. Archaic Adaptive Introgression in TBX15/WARS2. Molecular Biology and Evolution. 2017a;34:509–524. doi: 10.1093/molbev/msw283. [DOI] [PMC free article] [PubMed] [Google Scholar]
  68. Racimo F, Marnetto D, Huerta-Sánchez E. Signatures of Archaic Adaptive Introgression in Present-Day Human Populations. Molecular Biology and Evolution. 2017b;34:296–317. doi: 10.1093/molbev/msw216. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Ragsdale AP, Gravel S. Models of archaic admixture and recent history from two-locus statistics. PLOS Genetics. 2019;15:e1008204. doi: 10.1371/journal.pgen.1008204. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Reich D, Green RE, Kircher M, Krause J, Patterson N, Durand EY, Viola B, Briggs AW, Stenzel U, Johnson PL, Maricic T, Good JM, Marques-Bonet T, Alkan C, Fu Q, Mallick S, Li H, Meyer M, Eichler EE, Stoneking M, Richards M, Talamo S, Shunkov MV, Derevianko AP, Hublin JJ, Kelso J, Slatkin M, Pääbo S. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature. 2010;468:1053. doi: 10.1038/nature09710. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Sanchez T, Cury J, Charpiat G, Jay F. Deep learning for population size history inference: design, comparison and combination with approximate bayesian computation. Molecular Ecology Resources. 2021;1:13224. doi: 10.1111/1755-0998.13224. [DOI] [PubMed] [Google Scholar]
  72. Sankararaman S, Mallick S, Dannemann M, Prüfer K, Kelso J, Pääbo S, Patterson N, Reich D. The genomic landscape of Neanderthal ancestry in present-day humans. Nature. 2014;507:354–357. doi: 10.1038/nature12961. [DOI] [PMC free article] [PubMed] [Google Scholar]
  73. Sankararaman S, Mallick S, Patterson N, Reich D. The Combined Landscape of Denisovan and Neanderthal Ancestry in Present-Day Humans. Current Biology : CB. 2016;26:1241–1247. doi: 10.1016/j.cub.2016.03.037. [DOI] [PMC free article] [PubMed] [Google Scholar]
  74. Schrider DR, Kern AD. Supervised Machine Learning for Population Genetics: A New Paradigm. Trends in Genetics : TIG. 2018;34:301–312. doi: 10.1016/j.tig.2017.12.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Scott TM, Guo H, Eichler EE, Rosenfeld JA, Pang K, Liu Z, Lalani S, Bi W, Yang Y, Bacino CA, Streff H, Lewis AM, Koenig MK, Thiffault I, Bellomo A, Everman DB, Jones JR, Stevenson RE, Bernier R, Gilissen C, Pfundt R, Hiatt SM, Cooper GM, Holder JL, Scott DA. BAZ2B haploinsufficiency as a cause of developmental delay, intellectual disability, and autism spectrum disorder. Human Mutation. 2020;41:921–925. doi: 10.1002/humu.23992. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Setter D, Mousset S, Cheng X, Nielsen R, DeGiorgio M, Hermisson J. VolcanoFinder: Genomic scans for adaptive introgression. PLOS Genetics. 2020;16:e1008867. doi: 10.1371/journal.pgen.1008867. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Sheehan S, Song YS. Deep Learning for Population Genetic Inference. PLOS Computational Biology. 2016;12:e1004845. doi: 10.1371/journal.pcbi.1004845. [DOI] [PMC free article] [PubMed] [Google Scholar]
  78. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv. 2014 https://arxiv.org/abs/1312.6034
  79. Speidel L, Forest M, Shi S, Myers SR. A method for genome-wide genealogy estimation for thousands of samples. Nature Genetics. 2019;51:1321–1329. doi: 10.1038/s41588-019-0484-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  80. Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M. Striving for simplicity: the all convolutional net. arXiv. 2015 https://arxiv.org/abs/1412.6806
  81. Steinrücken M, Spence JP, Kamm JA, Wieczorek E, Song YS. Model-based detection and analysis of introgressed neanderthal ancestry in modern humans. Molecular Ecology. 2018;27:3873–3888. doi: 10.1111/mec.14565. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Suarez-Gonzalez A, Hefer CA, Christe C, Corea O, Lexer C, Cronk QC, Douglas CJ. Genomic and functional approaches reveal a case of adaptive introgression from Populus balsamifera (balsam poplar) in P. trichocarpa (black cottonwood). Molecular Ecology. 2016;25:2427–2442. doi: 10.1111/mec.13539. [DOI] [PubMed] [Google Scholar]
  83. Tian X, Browning BL, Browning SR. Estimating the Genome-wide Mutation Rate with Three-Way Identity by Descent. American Journal of Human Genetics. 2019;105:883–893. doi: 10.1016/j.ajhg.2019.09.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  84. Torada L, Lorenzon L, Beddis A, Isildak U, Pattini L, Mathieson S, Fumagalli M. ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics. 2019;20:337. doi: 10.1186/s12859-019-2927-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  85. Turner R, Hung J, Frank E, Saatci Y, Yosinski J. Metropolis-Hastings generative adversarial networks. arXiv. 2019 https://arxiv.org/abs/1811.11357
  86. Uchiyama Y, Yanagisawa K, Kunishima S, Shiina M, Ogawa Y, Nakashima M, Hirato J, Imagawa E, Fujita A, Hamanaka K, Miyatake S, Mitsuhashi S, Takata A, Miyake N, Ogata K, Handa H, Matsumoto N, Mizuguchi T. A novel CYCS mutation in the α-helix of the CYCS C-terminal domain causes non-syndromic thrombocytopenia. Clinical Genetics. 2018;94:548–553. doi: 10.1111/cge.13423. [DOI] [PubMed] [Google Scholar]
  87. Veeramah KR, Johnstone L, Karafet TM, Wolf D, Sprissler R, Salogiannis J, Barth-Maron A, Greenberg ME, Stuhlmann T, Weinert S, Jentsch TJ, Pazzi M, Restifo LL, Talwar D, Erickson RP, Hammer MF. Exome sequencing reveals new causal mutations in children with epileptic encephalopathies. Epilepsia. 2013;54:1270–1281. doi: 10.1111/epi.12201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  88. Vernot B, Tucci S, Kelso J, Schraiber JG, Wolf AB, Gittelman RM, Dannemann M, Grote S, McCoy RC, Norton H, Scheinfeldt LB, Merriwether DA, Koki G, Friedlaender JS, Wakefield J, Pääbo S, Akey JM. Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science. 2016;352:235–239. doi: 10.1126/science.aad9416. [DOI] [PMC free article] [PubMed] [Google Scholar]
  89. Vernot B, Akey JM. Resurrecting surviving Neandertal lineages from modern human genomes. Science. 2014;343:1017–1021. doi: 10.1126/science.1245938. [DOI] [PubMed] [Google Scholar]
  90. Villanea FA, Schraiber JG. Multiple episodes of interbreeding between Neanderthal and modern humans. Nature Ecology & Evolution. 2019;3:39. doi: 10.1038/s41559-018-0735-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  91. Wang Z, Wang J, Kourakos M, Hoang N, Lee HH, Mathieson I, Mathieson S. Automatic inference of demographic parameters using generative adversarial networks. bioRxiv. 2020 doi: 10.1101/2020.08.05.237834. [DOI] [PMC free article] [PubMed]
  92. Whitney KD, Randell RA, Rieseberg LH. Adaptive introgression of herbivore resistance traits in the weedy sunflower Helianthus annuus. The American Naturalist. 2006;167:794–807. doi: 10.1086/504606. [DOI] [PubMed] [Google Scholar]
  93. Yang XH, Huang S. PFM1 (PRDM4), a new member of the PR-domain family, maps to a tumor suppressor locus on human chromosome 12q23-q24.1. Genomics. 1999;61:319–325. doi: 10.1006/geno.1999.5967. [DOI] [PubMed] [Google Scholar]
  94. Yoon J, Jordon J, van der Schaar M. INVASE: instance-wise variable selection using neural networks. International Conference on Learning Representations; 2019. pp. 1–24. [Google Scholar]
  95. Zammit NW, Siggs OM, Gray PE, Horikawa K, Langley DB, Walters SN, Daley SR, Loetsch C, Warren J, Yap JY, Cultrone D, Russell A, Malle EK, Villanueva JE, Cowley MJ, Gayevskiy V, Dinger ME, Brink R, Zahra D, Chaudhri G, Karupiah G, Whittle B, Roots C, Bertram E, Yamada M, Jeelall Y, Enders A, Clifton BE, Mabbitt PD, Jackson CJ, Watson SR, Jenne CN, Lanier LL, Wiltshire T, Spitzer MH, Nolan GP, Schmitz F, Aderem A, Porebski BT, Buckle AM, Abbott DW, Ziegler JB, Craig ME, Benitez-Aguirre P, Teo J, Tangye SG, King C, Wong M, Cox MP, Phung W, Tang J, Sandoval W, Wertz IE, Christ D, Goodnow CC, Grey ST. Denisovan, modern human and mouse TNFAIP3 alleles tune A20 phosphorylation and immunity. Nature Immunology. 2019;20:1299–1310. doi: 10.1038/s41590-019-0492-0. [DOI] [PubMed] [Google Scholar]
  96. Zarr Development Team. Zarr. 2.4.0. 2020 https://zarr.readthedocs.io/en/stable/
  97. Zhang X, Kim B, Lohmueller KE, Huerta-Sánchez E. The Impact of Recessive Deleterious Variation on Signals of Adaptive Introgression in Human Populations. Genetics. 2020;215:799–812. doi: 10.1534/genetics.120.303081. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: George H Perry
Reviewed by: Diego Ortega Del Vecchyo

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

This paper describes a novel approach for detecting adaptive introgression using a deep neural network. The authors demonstrate that their method can accurately detect adaptive introgression under a number of scenarios, and they apply the method to find loci where modern humans may have received beneficial alleles from Neandertals and Denisovans. This application of a convolutional neural network to detect events of adaptive introgression represents an excellent contribution to the field of population and evolutionary genomics.

Decision letter after peer review:

Thank you for submitting your article "Detecting adaptive introgression in human evolution using convolutional neural networks" for consideration by eLife. Your article has been reviewed by 3 peer reviewers, and the evaluation has been overseen by George Perry as the Senior Editor and Reviewing Editor. The following individual involved in review of your submission has agreed to reveal their identity: Diego Ortega Del Vecchyo (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

We would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). Specifically, we are asking editors to accept without delay manuscripts, like yours, that they judge can stand as eLife papers without additional data, even if they feel that they would make the manuscript stronger. Thus the revisions requested below only address clarity and presentation.

Summary:

This paper describes a novel approach for detecting adaptive introgression using a deep neural network. The authors demonstrate that their method can accurately detect adaptive introgression under a number of scenarios, and they apply the method to find loci where modern humans may have received beneficial alleles from Neanderthals and Denisovans. This application of a convolutional neural network to detect events of adaptive introgression represents an excellent contribution to the field of population and evolutionary genomics.

Essential revisions:

I highlight three points of essential revision, and then include the full reviews below which contain additional context on these points as well as a number of other excellent suggestions for your consideration as you revise your paper.

1) Model misidentification. You should test a few more scenarios in which there is demographic model misspecification. For example, you assume that the Yoruba are an 'unadmixed sister population'. Yet recent papers have pointed out that this assumption is potentially incorrect on timescales relevant for your analysis. How does your method perform despite such potential model misidentification? See the individual reviews for several other specific scenarios that would also be ideal examples to test.

2) Comparison with previous methods. The reviewers point out several previous methods to detect adaptive introgression. Please perform a direct comparison of results obtained between your and the previously available approaches.

3) Select a method other than guided backpropagation for the saliency map, and then evaluate the data along the lines of the detailed suggestions provided by reviewer 3.

Reviewer #1:

This manuscript, "Detecting adaptive introgression in human evolution using convolutional neural networks" by Gower et al., proposes a novel approach toward detecting adaptive introgression using a deep neural network. The paper is well written, and the authors have taken care to ensure reproducibility and code availability. This method joins a rapidly growing group of deep learning tools for population genetic analysis, and adds to the growing body of evidence that these methods represent an important step forward for the field.

1) I do think that it is essential that the authors compare the power of their method to some existing methods. I know that not all approaches use the exact same information as genomatnn, but I do think that the authors could easily compare power to some of the statistics from the Racimo et al., 2017 MBE paper. This would at least offer some context to help readers gauge the level of advance offered by this method.

2) As the authors point out, their accuracy can go down substantially in the case of model misspecification. But they have only examined one scenario--training on Model A and applying to Model B--and this might be unrealistically pessimistic. It would therefore be helpful to see what happens for more modest amounts of misspecification. For example, the authors could sample a few different areas of the parameter space between Model A and Model B and see what happens when the model gets more and more misspecified. This would give the reader a better feel for how good a demographic model estimate needs to be to use the genomatnn method in practice.
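The graded misspecification suggested in point 2 can be sketched as a simple interpolation between two parameter sets. This is only an illustration under a strong assumption: that both demographic models can be written with the same set of numeric parameters, which may not hold for the paper's Models A and B. The parameter names and values below are hypothetical placeholders, not values from the manuscript.

```python
def interpolate_models(model_a, model_b, fractions):
    """Linearly interpolate every shared numeric parameter between two models.

    A fraction of 0.0 reproduces model_a and 1.0 reproduces model_b.
    """
    shared = sorted(set(model_a) & set(model_b))
    return [
        {key: (1 - f) * model_a[key] + f * model_b[key] for key in shared}
        for f in fractions
    ]


# Hypothetical placeholder parameters (NOT taken from the paper).
model_a = {"split_time": 1000.0, "pop_size": 5000.0}
model_b = {"split_time": 3000.0, "pop_size": 15000.0}

# Five points along the path from Model A to Model B.
grid = interpolate_models(model_a, model_b, [0.0, 0.25, 0.5, 0.75, 1.0])
```

A network trained on the first parameter set could then be evaluated on simulations drawn from each intermediate set, giving a curve of accuracy against the degree of misspecification.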

Reviewer #2:

The authors have developed a new method to detect adaptive introgression events using convolutional neural networks (CNNs). To use the CNNs, the authors create a training set consisting of 100 kb region simulations where three different scenarios could take place: an adaptive introgression event, a de novo mutation undergoing a sweep without introgression and a scenario without advantageous mutations. The authors use these simulations to train their CNNs to be able to differentiate between an adaptive introgression event happening in the 100 kb region from the latter two scenarios. The methodology presented by the authors includes innovations in the form of codifying the data to run the CNNs and the CNN architecture used to solve this problem. The authors show that their methodology is able to perform very accurate classifications of regions undergoing adaptive introgression events in realistic human demographic scenarios. Finally, the authors show an application of their method to identify regions undergoing adaptive introgression events from Neanderthals to Europeans, and from Denisovans to Melanesians.

Convolutional neural networks have been recently applied to efficiently solve a variety of problems in population genetics. The application presented by the authors to detect events of adaptive introgression is an excellent and necessary contribution to the field. The manuscript is very well written, and the methods are clearly explained. The method is very robust and I can see that it will be applied to other species as genomic data becomes available. I only have a few minor comments about this manuscript.

The authors assume that YRI is an 'unadmixed sister population'. However, African populations also experienced introgression from an archaic ghost population (as reported by Ragsdale and Gravel (2019) PLoS Genetics; Durvasula and Sankararaman (2020), Science Advances). Would this have an impact on the detection of introgressed segments from Neanderthals to Europeans or from Denisovans to Melanesians using the method developed by the authors?

Reviewer #3:

The authors present in this work a new method based on a convolutional neural network to infer adaptive introgression in human populations. They trained their network on two types of scenarios with different demographic models. They show that the method works well.

The authors propose interesting features for their network which I think will be of interest to the population genomics and deep learning communities.

They also advertise their method as being one of the few to do adaptive introgression inference.

The method is constrained by a demographic model.

The network takes as input a genotype matrix, with sorted and ordered individuals, and m bins in a window of 100 kb.
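The input described above can be illustrated with a minimal sketch that bins per-site allele counts into m equal-width bins across a window and then reorders the individuals. The sorting rule used here (descending total allele count) is an assumption chosen for simplicity and is not necessarily the ordering used by the authors; the function names are hypothetical.

```python
def bin_genotypes(positions, genotypes, window_length, num_bins):
    """Sum allele counts per individual within equal-width bins.

    positions: site coordinates, each in [0, window_length).
    genotypes: one row per site, one allele count per individual.
    Returns a list of num_bins rows, one column per individual.
    """
    num_individuals = len(genotypes[0]) if genotypes else 0
    binned = [[0] * num_individuals for _ in range(num_bins)]
    for pos, row in zip(positions, genotypes):
        bin_index = pos * num_bins // window_length
        for j, count in enumerate(row):
            binned[bin_index][j] += count
    return binned


def sort_individuals(binned):
    """Reorder columns by descending total allele count (illustrative rule only)."""
    num_individuals = len(binned[0])
    totals = [sum(row[j] for row in binned) for j in range(num_individuals)]
    order = sorted(range(num_individuals), key=lambda j: -totals[j])
    return [[row[j] for j in order] for row in binned]


# Toy window of length 100 with 4 sites, 3 individuals and m = 4 bins.
positions = [10, 20, 60, 90]
genotypes = [[0, 1, 2], [1, 1, 0], [0, 2, 0], [1, 0, 0]]
binned = bin_genotypes(positions, genotypes, window_length=100, num_bins=4)
matrix = sort_individuals(binned)
```

Binning gives every window the same matrix shape regardless of how many segregating sites it contains, which is what allows a fixed-size network input.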

The network is trained as a classification task to say whether the window of interest corresponds to adaptive introgression or not. The output of the sigmoid function (last layer of the CNN) is then used as the probability of AI in the given window.

This probability is then calibrated to take into account the fact that, in real-world datasets, the categories used in the training phase will likely not match the relative frequencies of 100 kb regions under neutrality, selective sweep or AI.
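The effect of a mismatch in class frequencies can be illustrated with a standard prior-shift correction, which rescales the predicted odds by the ratio of target to training prior odds. This is a simpler stand-in chosen for illustration and is not the calibration procedure used by the authors; the prior values below are hypothetical.

```python
def adjust_for_class_prior(p, train_prior, target_prior):
    """Rescale a predicted probability from the training prior to a target prior.

    p: classifier output, interpreted as P(AI | data) under the training prior.
    train_prior: fraction of AI examples in the training data.
    target_prior: assumed fraction of AI windows in real data.
    """
    if p <= 0.0 or p >= 1.0:
        return float(p)  # probabilities of 0 or 1 are unchanged by rescaling
    odds = p / (1.0 - p)
    prior_ratio = (target_prior / (1.0 - target_prior)) / (
        train_prior / (1.0 - train_prior)
    )
    new_odds = odds * prior_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical priors: AI is one third of training windows but 1% of real ones.
adjusted = adjust_for_class_prior(0.9, train_prior=1 / 3, target_prior=0.01)
```

When AI is much rarer in real data than in training data, even a confident raw output shrinks substantially, which is why uncalibrated outputs should not be read as probabilities.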

Finally, the authors apply their method to real datasets and propose new genes as candidates for archaic adaptive introgression.

1. The authors suggest that few methods exist for this task. However, I believe it would be of interest to know how existing methods (such as the one from Setter et al., 2020) perform compared to genomatnn, if possible.

2. The results are good (>95% recall, Figure 2) for scenarios with a high selection coefficient and/or early time of onset. A version of Figure 2B showing precision in addition to recall would help to show whether good precision is found in the same region of parameter space or in another one (e.g. low selection coefficient and late time of onset). Besides, the results seem quite affected by the selection coefficient, the time of onset, and also the time of gene flow. Again, a comparison with other methods might help to assess whether the results presented here are good or not. Could this be calibrated as well, or given more weight in training, to improve accuracy for low selection coefficients, for instance?

3. The authors do not use a test set. How can we be sure that the authors did not (unconsciously) "overfit" the validation dataset after trying different hyperparameters? Showing results on a test set is better practice.

4. The attention analysis should be improved. First, guided backpropagation has been shown not to pass sanity checks (Adebayo et al., "Sanity checks for saliency maps", NIPS, 2018). It was shown that guided backpropagation highlights pixels in a way that is independent of both the data used for training and the model parameters, and that this method should not be used for explanation and interpretation tasks. In this paper, guided backpropagation appears to work merely as an edge detector. I wonder whether this is what is captured by the vertical lines we see in the saliency map, which would correspond to clusters made by the rearrangement of the individuals. I would therefore strongly recommend that the authors choose another method for the saliency map (one that passes sanity checks). With the new saliency maps, if there is a signal localized along the SNP dimension, it would be interesting to know whether this is actually due to the beneficial mutation being in the middle of the window, or caused by some other artifact. To do so, the authors could re-simulate scenarios that were easily classified correctly (e.g. with high selection coefficient and early time of onset), but with the mutation not in the middle.

5. I do not understand why the author propose to study the results with 2 cut-offs of 5 and 25% for the beneficial allele. Also only the 25% cut-off is discussed at the end. So why a cut-off in the first place? Then, why two? And finally, why showing and discussing the results only with 25%? For instance, table 1 and S2 have only 5 regions in common out of 25. How the reader should interpret that?

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

Thank you for resubmitting your work entitled "Detecting adaptive introgression in human evolution using convolutional neural networks" for further consideration by eLife. Your revised article has been evaluated by George Perry (Senior Editor) and a Reviewing Editor.

All reviewers expressed appreciation for your hard work and thorough revisions. We have just two relatively small remaining requests before we can consider accepting your paper for publication in eLife:

1. You now state in the text: "We note, however, that, as with previous methods, visual inspection of the haplotypes or genotypes of the top candidate regions remains a necessary criterion to accurately assess whether a region may have been under adaptive introgression." We would also like for you to briefly discuss what to do with regions classified as AI that are not among the "top candidates". These cases might be harder to validate by visual inspection. Do you think that these cases should be discarded? Or can you share any other insights on how to tell whether these AI regions could be legitimate, and whether users could use this tool to detect the impact of AI across the genome more broadly than just the "top candidates".

2. One previous reviewer comment concerned the absence of a test set (because having only a validation set might lead to overfitting hyperparameters to that set). In your response to reviewer comments you noted that you actually used different simulations and then simulate a new dataset while writing the paper. However, this clarification does not appear in the main text – please revise the manuscript to describe to the reader that the choice of hyperparameters was not made on the validation set mentioned, but instead during the "Preliminary analysis". Otherwise, the reader could think that the network might be biased.

eLife. 2021 May 25;10:e64669. doi: 10.7554/eLife.64669.sa2

Author response


Essential revisions:

I highlight three points of essential revision, and then include the full reviews below which contain additional context on these points as well as a number of other excellent suggestions for your consideration as you revise your paper.

We have addressed all points below.

1) Model misidentification. You should test a few more scenarios in which there is demographic model misspecification. For example, you assume that the Yoruba are an 'unadmixed sister population'. Yet recent papers have pointed out that this assumption is potentially incorrect on timescales relevant for your analysis. How does your method perform despite such potential model misidentification? See the individual reviews for several other specific scenarios that would also be ideal examples to test.

We now include two evaluations of the method with mis-specified demographic models. We retain the existing strongly mis-specified evaluation, training on Demographic Model A1 (Neanderthal/European) and evaluating using model B (Denisovan/Melanesian). In addition, we evaluated the method on a weakly mis-specified model, training on Model A1 and evaluating using model A2 (an extension to model A1 that also includes archaic admixture in Africa as described in Ragsdale and Gravel, 2019). The Results section now reports:

“We then tested robustness to demographic misspecification, by evaluating the CNN trained on Demographic Model A1 against simulations for two other demographic models (Figure 2—figure supplement 2). We considered weak misspecification, where the true demographic history is similar to Demographic Model A1 but also includes archaic admixture within Africa following Ragsdale and Gravel, (2019) (Demographic Model A2; Figure 1—figure supplement 1). This resulted in only a small performance reduction. We also considered strong misspecification, where the true demographic history is Demographic Model B. As there are more Melanesian individuals than European individuals in our simulations (because we aimed to mimic the real number of genomes available in our data analysis below), we down-sampled the Melanesian genomes to match the number of European genomes, so as to perform a fair misspecification comparison. In this case, the performance of the CNN was noticeably worse than that of the summary statistics, but still better than VolcanoFinder. We note that the summary statistics performance decreased also, to match their performance for the correctly-specified assessments on Demographic Model B. Interestingly, we found that the Q95(1%, 100%) statistic was the most robust method for both cases of misspecification.”

To more clearly explain the demographic models, we have added a supplementary figure (Figure 1—figure supplement 1) that shows diagrams of both Demographic Model A1 and A2 together. A diagram of Demographic Model B previously appeared as Figure S1, which we have now removed in favour of a much improved diagram (now Figure 1—figure supplement 2). Writing a demographic model can be error prone (particularly if one must gather the parameters from supplementary material), and so we also provide Demes-format YAML files implementing the three demographic models, available from the genomatnn repository (https://github.com/grahamgower/genomatnn/tree/main/demographic_models). This format is intended to become a de facto standard, used in stdpopsim and elsewhere (see https://popsim-consortium.github.io/demes-spec-docs/).

2) Comparison with previous methods. The reviewers point out several previous methods to detect adaptive introgression. Please perform a direct comparison of results obtained between your and the previously available approaches.

We have now compared the performance of our method to various summary statistics developed by Martin et al. (2015) and Racimo et al., (2017), as well as to the VolcanoFinder method (Setter and Mousset et al., 2020). We note, however, that the latter is meant for cases of a deeply divergent “ghost” adaptive introgression (where data from the source population is absent), and so comparisons with it may not be as appropriate as with the former methods. The results now report:

“We compared the performance of our CNN to VolcanoFinder (Setter et al., 2020), which scans genomes of the recipient population for "volcanoes" of diversity using a coalescent-based model of adaptive introgression (Figure 2—figure supplement 2). However, this method only incorporates information from a single population and performed poorly for the demographic models considered here, in some cases worse than guessing randomly. We also compared our CNN to an outlier-based approach for a range of summary statistics that are sensitive to adaptive introgression (Racimo et al., 2017). Our CNN is closest to a perfect MCC-F1 score for Demographic Models A1 and B, closely followed by the Q95(1%, 100%) and then U(1%, 20%, 100%) statistics developed in Racimo et al., (2017).”

To avoid adding a substantial number of figures to the manuscript, and to make fairer and easier comparisons between methods and the different demographic models (correctly and incorrectly specified), we have included a single multi-panel supplementary figure (Figure 2—figure supplement 2) that shows MCC-F1 curves for a range of scenarios. We’ve replaced the TNR/NPV panel in Figure 2C and Figure 2—figure supplement 1C with an MCC-F1 curve and added a section to the results which introduces the MCC-F1:

“While precision, recall, and false positive rate are informative, these each consider only two of the four cells in a confusion matrix (true positives, true negatives, false positives, false negatives), and may produce a distorted view of performance with imbalanced datasets (Chicco, 2017). To obtain a more robust performance assessment, we plotted the Matthews correlation coefficient (MCC; Matthews, 1975) against F1-score (the harmonic mean of precision and recall) for false-positive-rate thresholds from 0 to 100 (Figure 2, Figure 2—figure supplement 1, Figure 2—figure supplement 2), as recently suggested by Cao et al., (2020). MCC produces low scores unless a classifier has good performance in all four confusion matrix cells, and also accounts for class imbalance. In MCC-F1 space, the point (1, 1) indicates perfect predictions, and values of 0.5 for the (unit-normalised) MCC indicate random guessing. These results confirm our earlier findings, that the CNN performance is excellent for Demographic Model A1 when considering either neutral and sweep simulations as the condition negative, and performance decreases slightly when DFE simulations are the negative condition (Figure 2). Furthermore, the CNN performance is not as good for Demographic Model B, but this is unlikely to be caused by using unphased genotypes (Figure 2—figure supplement 1 and Figure 2—figure supplement 2).”
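As an illustration of the quantities described above, the following is a minimal sketch of how points on an MCC-F1 curve could be computed from prediction scores and true labels. It is not the genomatnn implementation: the function names are hypothetical, the curve is swept over score thresholds chosen by the caller, and setting MCC to 0 when a confusion-matrix margin is empty is an assumed convention.

```python
import math


def confusion(scores, labels, threshold):
    """Count (tp, fp, tn, fn) when scores above `threshold` are called positive."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score > threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def unit_mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient rescaled from [-1, 1] to [0, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is 0 when any margin of the matrix is empty.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return (mcc + 1) / 2


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def mcc_f1_curve(scores, labels, thresholds):
    """Return one (F1, unit-normalised MCC) point per threshold."""
    points = []
    for t in thresholds:
        tp, fp, tn, fn = confusion(scores, labels, t)
        points.append((f1_score(tp, fp, fn), unit_mcc(tp, fp, tn, fn)))
    return points
```

In this sketch a classifier that separates the two classes perfectly at some threshold yields the point (1, 1), and a confusion matrix with equal counts in all four cells yields a unit-normalised MCC of 0.5, matching the interpretation of random guessing given in the text.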

3) Select a method other than guided propagation for the saliency map, and then evaluate the data along the lines of the detailed suggestions provided by reviewer 3.

We have now switched to using the “vanilla” saliency method originally proposed in Simonyan, Vedaldi and Zisserman, (2014), as provided by the tf-keras-vis python package. The results (Figure 3) are very similar, and our interpretation remains the same after these changes. Additionally, we changed the colour scheme for this figure to show the bordering pixels more clearly. These pixels indicate a gradient close to zero, but with the previous colour scheme they could have been mistaken for a black border.

Reviewer #1:

1) I do think that it is essential that the authors compare the power of their method to some existing methods. I know that not all approaches use the exact same information as genomatnn, but I do think that the authors could easily compare power to some of the statistics from Racimo et al., 2017 MBE paper. This would at least offer some context to help readers gauge the level of advance offered by this method.

We have now compared the performance of our method to various summary statistics developed by Martin et al., (2015) and Racimo et al., (2017), as well as to the VolcanoFinder method (Setter and Mousset et al., 2020). These new results are summarised in the Essential revisions section above.

2) As the authors point out, their accuracy can go down substantially in the case of model misspecification. But they have only examined one scenario (training on Model A and applying to Model B), and this might be unrealistically pessimistic. It would therefore be helpful to see what happens for more modest amounts of misspecification. For example, the authors could sample a few different areas of the parameter space between Model A and Model B and see what happens when the model gets more and more mis-specified. This would give the reader a better feel for how good a demographic model estimate needs to be to use the genomatnn method in practice.

We have now included a more modest model misspecification scenario, in which the model is trained with Model A1 and then tested on Model A2, a version of Model A1 in which there is also migration with an archaic African lineage (Ragsdale and Gravel, 2019). These new results are summarised in the Essential revisions section above.

Reviewer #2:

The authors have developed a new method to detect adaptive introgression events using convolutional neural networks (CNNs). To use the CNNs, the authors create a training set consisting of 100 kb region simulations where three different scenarios could take place: an adaptive introgression event, a de novo mutation undergoing a sweep without introgression and a scenario without advantageous mutations. The authors use these simulations to train their CNNs to be able to differentiate between an adaptive introgression event happening in the 100 kb region from the latter two scenarios. The methodology presented by the authors includes innovations in the form of codifying the data to run the CNNs and the CNN architecture used to solve this problem. The authors show that their methodology is able to perform very accurate classifications of regions undergoing adaptive introgression events in realistic human demographic scenarios. Finally, the authors show an application of their method to identify regions undergoing adaptive introgression events from Neanderthals to Europeans, and from Denisovans to Melanesians.

Convolutional neural networks have been recently applied to efficiently solve a variety of problems in population genetics. The application presented by the authors to detect events of adaptive introgression is an excellent and necessary contribution to the field. The manuscript is very well written, and the methods are clearly explained. The method is very robust and I can see that it will be applied to other species as genomic data becomes available.

The authors assume that YRI is an 'unadmixed sister population'. However, African populations also had introgression with another archaic ghost population (as reported by Ragsdale and Gravel (2019) PLoS Genetics; Durvasula and Sankararaman (2020), Science Advances). Would this have an impact on the detection of introgressed segments from Neanderthals to Europeans or Denisova to Melanesian populations using the method developed by the authors?

We now also evaluated our CNN using a model that includes archaic admixture into Yoruba ancestors (Figure 1—figure supplement 1; Ragsdale and Gravel, 2019). The CNN performance is decreased only slightly compared to a correctly specified model (Figure 2—figure supplement 2). These new results are summarised in the Essential revisions section above.

Reviewer #3:

The authors present in this work a new method based on convolutional neural network to infer adaptive introgression in human population. They trained their network on two types of scenarios with different demographic models. They show that the method works well.

The authors propose interesting features for their network which I think will be of interest of the population genomic and deep learning community.

They also advertise their method as being one of the few to do adaptive introgression inference.

The method is constrained by a demographic model.

The network takes as input a genotype matrix, with sorted and ordered individuals, and m bins in a window of 100kb.

The network is trained as a classification task to say whether the window of interest corresponds to adaptive introgression or not. The output of the sigmoid function (last layer of the CNN), is then used as the probability of AI in the given window.

This probability is then calibrated to take into account the fact that on real world dataset, the categories used in the training phase will likely not match the relative frequency of 100kb region under neutrality, selective sweep or AI.

Finally, the authors apply their method to a real dataset and propose new genes as candidates for archaic adaptive introgression.

1. The authors suggest that few methods exist for this task, however, I believe this would be of interest to know how existing methods perform (such as the one from Setter et al., 2020) compared to genomatnn, if possible.

We have now made a comparison to VolcanoFinder (Setter and Mousset et al., 2020), as well as several summary statistics (Martin et al., 2015; Racimo et al., 2017). These new results are summarised in the Essential revisions section above. We note that the comparison with VolcanoFinder may not be entirely fair, as Setter et al., themselves suggest their method works poorly for a divergence time between donor-recipient on the scale of Humans and Neanderthals/Denisovans.

2. The results are good (>95% recall, Figure 2) for scenarios with high selection coefficient and/or early time of onset. The same as Figure 2B but with the precision in addition to recall might be interesting to know whether good precision is found in the same space or in another one (e.g. low selection coeff and late time of onset). Besides, the results seem quite affected by the coefficient of selection, time of onset, and also time of gene flow. Again, having a comparison with other method might help to assess whether the results presented here are good or not, in terms of method. Could this be calibrated as well, or given more weight in the training to improve accuracy for low selection coefficient for instance?

Figure 2B shows a heatmap of the true positive rate (aka sensitivity, aka recall: TP/(TP + FN)) across the space of selection coefficient, s, and time of onset of selection, Tsel. The information for this heatmap is easily obtained: the value in any given s/Tsel bin is equivalent to the average Pr{AI} prediction score across AI scenario simulations with s/Tsel values corresponding to the bin. To make an equivalent plot for the precision (TP/(TP + FP)), we would need to also count false positives (FP) corresponding to each s/Tsel bin (a non-AI simulation with the given s/Tsel parameters that was given a Pr{AI} > 0.5). But when the condition negative is the DFE or neutral scenario (the majority of our false positives), the simulations cannot be assigned to an s/Tsel bin because there is no positively selected allele and thus no s/Tsel parameters.
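The binning argument above can be made concrete with a small sketch. It assumes each AI simulation is summarised as a hypothetical tuple (s, Tsel, Pr{AI}) and counts a true positive when Pr{AI} exceeds 0.5. The function names and bin edges are illustrative and are not taken from the genomatnn code. Only AI simulations enter the calculation, which is why recall can be binned this way while precision cannot: neutral and DFE simulations carry no s or Tsel value to bin on.

```python
from collections import defaultdict


def bin_index(value, edges):
    """Return i such that edges[i] <= value < edges[i + 1], or None if outside."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def recall_by_bin(ai_sims, s_edges, t_edges, cutoff=0.5):
    """Recall, TP / (TP + FN), per (s, Tsel) bin, from AI simulations only.

    `ai_sims` is an iterable of (s, t_sel, pr_ai) tuples (hypothetical layout).
    """
    called = defaultdict(int)  # AI simulations predicted as AI (true positives)
    total = defaultdict(int)   # all AI simulations in the bin (TP + FN)
    for s, t_sel, pr_ai in ai_sims:
        key = (bin_index(s, s_edges), bin_index(t_sel, t_edges))
        if None in key:
            continue  # simulation falls outside the plotted parameter range
        total[key] += 1
        if pr_ai > cutoff:
            called[key] += 1
    return {key: called[key] / total[key] for key in total}
```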

The poorer performance of the method in the recent past, and/or with lower selection coefficients, is likely because the haplotypes carrying the beneficial allele had not risen to higher frequency in the recipient population. In the discussion we state:

“for the two putative pulses of Denisovan gene flow (Jacobs et al., 2019), we find our model has greater recall with AI for the more ancient pulse (94% versus 83.6%; Figure 2—figure supplement 1), likely because haplotypes from the older pulse have more time to rise in frequency. Similarly, recall is diminished when the onset of selection is more recent.”

We expect other methods will have difficulty with recent and/or weak selection for the same reason.

3. Authors do not use a test set. How can we be sure that the author did not "overfit" (unconsciously) the validation dataset after trying different hyperparameters? Showing result on a test set is better practice.

We split our data into 90%/10% training/validation sets. The hyperparameters and network architecture were largely tuned on a smaller preliminary set of training/validation simulations that did not vary the selection coefficient or time of onset of selection. After doing simulations that varied the selection coefficient and time of onset of selection, we again split into training/validation sets. Later, when the manuscript was at an advanced draft stage, we discovered our simulations contained a bug, and thus we reran all simulations once again. These latter simulations were used for the training/validation sets reported in the manuscript.
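A 90%/10% split of this kind can be sketched as follows. This is a generic illustration, not the authors' code: the function name, the fixed seed and the choice to shuffle before splitting are all assumptions.

```python
import random


def split_train_validation(items, validation_fraction=0.1, seed=1234):
    """Shuffle `items` reproducibly and return (training, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

A third held-out "test" subset would be obtained in the same way, by splitting the training portion a second time.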

4. The attention analysis should be improved. First, guided backpropagation has been shown to not pass sanity check (Adebayo et al., "Sanity check for saliency maps", NIPS, 2018). It was shown that guided backpropagation highlight pixels that are independent of both the data used for training and the model parameters, and this method should not be used for explanation and interpretation tasks. In this paper, guided backpropagation appears to work merely as an edge detector. I wonder whether this is what is captured by the vertical lines we see in the saliency map that would correspond to clusters made by the rearrangement of the individuals. So I would highly recommend the author to choose another method for the saliency map (that passes sanity check). With the new saliency maps, if there is a signal localized along the SNP dimension, it would be interesting to know whether this actually due to the fact that the beneficial mutation was in the middle of the window, or caused by some other artifact. To do so, the authors could re-simulate scenarios that were easily well classed (e.g. with high selection coefficient and early time of onset), but with the mutation not in the middle.

We thank the reviewer for pointing out the interesting work of Adebayo et al., (2018). We have now switched to using the “vanilla” saliency method originally proposed in Simonyan, Vedaldi and Zisserman, (2014), as provided by the tf-keras-vis python package (as opposed to the keras-vis package we used previously). Note that this package does not include implementations of any “suspect” saliency methods identified by Adebayo et al. The results, and our interpretation, remain the same after these changes.
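For readers unfamiliar with the "vanilla" saliency method, the idea is to take the absolute gradient of the network's output score with respect to each input pixel. The sketch below illustrates this on a toy stand-in for a trained network (a sigmoid of a weighted sum) and approximates the gradient by central finite differences. Real implementations such as tf-keras-vis obtain the gradient by backpropagation instead. Every name in the sketch is hypothetical.

```python
import math


def toy_score(matrix, weights):
    """Hypothetical stand-in for a trained network: sigmoid of a weighted sum."""
    z = sum(
        w * x
        for w_row, x_row in zip(weights, matrix)
        for w, x in zip(w_row, x_row)
    )
    return 1 / (1 + math.exp(-z))


def vanilla_saliency(score_fn, matrix, eps=1e-5):
    """Absolute gradient of the score with respect to each input cell.

    The gradient is approximated by central finite differences, which is
    practical only for tiny inputs such as this example.
    """
    saliency = []
    for i, row in enumerate(matrix):
        out_row = []
        for j, value in enumerate(row):
            up = [list(r) for r in matrix]
            down = [list(r) for r in matrix]
            up[i][j] = value + eps
            down[i][j] = value - eps
            grad = (score_fn(up) - score_fn(down)) / (2 * eps)
            out_row.append(abs(grad))
        saliency.append(out_row)
    return saliency
```

Cells whose perturbation does not change the score receive a saliency of zero. This is how a near-zero gradient at the border of an input would appear in such a map.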

The vertical bands that appeared in the previous saliency map remain. After much investigation we believe the vertical bands derive from the width of the convolution filter in the first convolution layer, as increasing the filter width does appear to increase the width of the vertical bands in the saliency maps (results not shown).

We have further provided saliency maps for each of the distinct simulation scenarios: neutral, sweep, and AI. Even in the saliency map produced for neutral simulations, it is clear that attention is more concentrated in the middle of the genomic window. This strongly suggests that the trained network has learned to focus here, rather than this being an artefact of edge detection for some classes of input.

5. I do not understand why the author propose to study the results with 2 cut-offs of 5 and 25% for the beneficial allele. Also only the 25% cut-off is discussed at the end. So why a cut-off in the first place? Then, why two? And finally, why showing and discussing the results only with 25%? For instance, table 1 and S2 have only 5 regions in common out of 25. How the reader should interpret that?

The motivation for the cut-off was unclear, and we have now fixed that in the text:

“To give the network the best chance of avoiding false positives, we tried two different beneficial-allele frequency cut-offs for training: 5% and 25% (Table 1 and Appendix 1-Table 1). We focus here on describing the results from the 25% condition […]”

The high probability candidates are very similar between the two cut-offs (e.g. see peaks in Figure 4), although when looking at only the top-ranked candidates, we can see differences between the two cut-offs. Naturally, we should expect some variation, because these results are derived from distinct CNNs (the same architecture, but two different training runs), and the training process is stochastic in each case.

[Editors' note: further revisions were suggested prior to acceptance, as described below.]

All reviewers expressed appreciation for your hard work and thorough revisions. We have just two relatively small remaining requests before we can consider accepting your paper for publication in eLife:

1. You now state in the text: "We note, however, that, as with previous methods, visual inspection of the haplotypes or genotypes of the top candidate regions remains a necessary criterion to accurately assess whether a region may have been under adaptive introgression." We would also like for you to briefly discuss what to do with regions classified as AI that are not among the "top candidates". These cases might be harder to validate by visual inspection. Do you think that these cases should be discarded? Or can you share any other insights on how to tell whether these AI regions could be legitimate, and whether users could use this tool to detect the impact of AI across the genome more broadly than just the "top candidates".

We have added the following paragraph to the discussion below the text quoted above.

“Conversely, there may be regions under AI that are classified as highly probable by the CNN, but that did not appear in our top candidates. Validating a large number of candidates might be difficult, but one could imagine running a differently trained CNN (perhaps one better tailored to distinguish AI from more similar scenarios, like selection on shared ancestral variation) on the subset of the regions that are predicted to be AI using a lenient probability cut-off. One could also use our method more generally, to assess the impact of AI across the genome, by comparing the distribution of probability scores with those of simulation scenarios under different amounts of admixture and selection, though in that case one would need to train the CNN on a wider range of admixture rates and demographic models.”

2. One previous reviewer comment concerned the absence of a test set (because having only a validation set might lead to overfitting hyperparameters to that set). In your response to reviewer comments you noted that you actually used different simulations and then simulate a new dataset while writing the paper. However, this clarification does not appear in the main text – please revise the manuscript to describe to the reader that the choice of hyperparameters was not made on the validation set mentioned, but instead during the "Preliminary analysis". Otherwise, the reader could think that the network might be biased.

We have added the following text to the “CNN model architecture and training” subsection of the Methods:

“The hyperparameters and network architecture were tuned on a smaller preliminary set of simulations that did not vary the selection coefficient or time of onset of selection, so we chose not to split the simulations into a third "test" set when evaluating the models trained on our final simulations.”

Associated Data


    Supplementary Materials

    Transparent reporting form

    Data Availability Statement

    Source code is available from https://github.com/grahamgower/genomatnn/.

    The following datasets were generated:


    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
