Abstract
Hypolimnas misippus is a Batesian mimic of the toxic African Queen butterfly (Danaus chrysippus). Female H. misippus butterflies use two major wing patterning loci (M and A) to imitate three color morphs of D. chrysippus found in different regions of Africa. In this study, we examine the evolution of the M locus and identify it as an example of adaptive atavism. This phenomenon involves a morphological reversion to an ancestral character that results in an adaptive phenotype. We show that H. misippus has re-evolved an ancestral wing pattern present in other Hypolimnas species, repurposing it for Batesian mimicry of a D. chrysippus morph. Using haplotagging, a linked-read sequencing technology, and our new analytical tool, Wrath, we discover two large transposable element insertions located at the M locus and establish that these insertions are present in the dominant allele responsible for producing the mimetic phenotype. By conducting a comparative analysis involving additional Hypolimnas species, we demonstrate that the dominant allele is derived. This suggests that, in the derived allele, the transposable elements disrupt a cis-regulatory element, leading to the reversion to an ancestral phenotype that is then utilized for Batesian mimicry of a distinct model, a different morph of D. chrysippus. Our findings present a compelling instance of convergent evolution and adaptive atavism, in which the same pattern element has independently evolved multiple times in Hypolimnas butterflies, repeatedly playing a role in Batesian mimicry of diverse model species.
Keywords: transposable elements, mimicry, adaptive evolution, linked-reads, bioinformatic methods
Introduction
Butterfly wing patterns are a classic example of adaptive evolution. Evolutionary genetic studies have dissected the loci controlling wing pattern in several species of butterflies from a wide range of ecotypes and families, providing extensive information on the evolution of adaptive traits (Jiggins 2017; Beldade and Brakefield 2018). The genetic architectures uncovered are varied, from supergenes formed by inversions encompassing multiple loci in Heliconius numata (Joron et al. 2011) to transposable element (TE) insertions in the peppered moth (van’t Hof et al. 2016) or a gene duplication in the wood tiger moth (Brien et al. 2023).
Key insights on the genetic basis of butterfly wing patterns come from the Heliconius genus of tropical butterflies (Jiggins 2017). These are best known for the multiple instances of Müllerian mimicry in which several pairs of unpalatable sympatric species converge on the same wing pattern, sharing the costs of teaching predators. Numerous studies have looked at the genetic basis of their mimetic patterns, identifying the main genes contributing to these adaptive phenotypes and describing their genetic architecture (Joron et al. 2011; Reed et al. 2011; Martin and Reed 2014; Nadeau et al. 2016; Westerman et al. 2018). Although much is known about the genetic basis of mimicry in Heliconius, exploring other systems, particularly those with other evolutionary dynamics such as Batesian mimics, in which palatable mimics resemble toxic models, will provide crucial knowledge on the evolution of adaptive phenotypes.
The Hypolimnas genus of tropical butterflies is diverse in wing pattern phenotypes (Fig. 1B). Interestingly, the genus presents many instances of Batesian mimicry, with the main models being Danaid species of the Danaus, Amauris, and Euploea genera (Vane-Wright et al. 1977). Despite the diversity in phenotype and model species being mimicked, some wing pattern elements are common in most Hypolimnas, exemplified by the black-and-white forewing tips found in most species (17/21 species with phenotype data) or the common black or brown background color. These common wing patterns are often adaptive, and it is not known whether they have independently evolved multiple times or are shared through common ancestry. A less likely hypothesis is that these phenotypes are ancestral but have been lost and re-evolved in some species through convergent evolution. This hypothesis is similar to atavism, in which mutations or recombination events recreate an ancestral phenotype using existing developmental machinery, but differs from it in that atavism is often maladaptive. Hypolimnas therefore offer an opportunity to study the evolution of adaptive phenotypes in a group that has not been well studied to date.
Fig. 1.
Mimicry in Hypolimnas misippus and the Hypolimnas genus. A) Female morphs of H. misippus side by side with their matching model morphs of D. chrysippus. Names of the forewing morph of H. misippus are specified below each photo. Although morphs matching the bottom-right H. misippus (immima forewing and white-spotted hindwing) exist within the D. chrysippus hybrid zone, they are considered maladaptive intermediates outside of it. D. chrysippus is not sexually dimorphic; individuals shown are all males. Non-mimetic H. misippus male at the bottom. B) Phylogram of the Hypolimnas genus extracted from Sahoo et al. (2018) (concatenated Bayesian inference tree) showing that black-and-white forewing tips are common through the genus and most likely ancestral. For presentation purposes, one specimen is shown per species, although not all species are monophyletic. Choosing other specimens would not change the conclusion on the ancestrality of black-and-white wing tips. All species shown are sexually dimorphic and/or polymorphic except antevorta, dexithea, inopinata, and usambara. Species showing Batesian mimicry are indicated by a small dark dot. Recurrent forewing phenotypes are indicated by wing drawings. Male and female signs indicate the sex of the individual photographed. Butterfly photos are reproduced from Moore (2023) under CC-BY.
Hypolimnas misippus or Diadem is a pantropical butterfly with complex Batesian mimicry. Females are mimetic and polymorphic, with detailed resemblances to three of the morphs of the toxic African Queen, Danaus chrysippus (Fig. 1A; Smith 1973). Despite the striking mimicry, a puzzling mismatch exists in the geographical distribution of H. misippus and D. chrysippus morphs across Africa, in that the most abundant models are not reflected in the frequency of mimics at a given location (Gordon et al. 2010). This, together with the fact that maladaptive intermediate morphs of H. misippus are commonly found, suggests that current selection for mimicry might be weak and raises the question of how the polymorphism is maintained (Gordon and Smith 1998; Gordon et al. 2010). Clarifying the genetic underpinnings of wing mimicry in H. misippus will shed light on this complex case of Batesian mimicry and the forces maintaining polymorphism in the population.
Wing coloration in H. misippus is determined by two loci of large effect, the M and A loci, determining forewing and hindwing patterns, respectively (Smith and Gordon 1987; Gordon and Smith 1989; VanKuren et al. 2019). The existence of a third locus, the hindwing white suppressor S, has also been hypothesized (Gordon and Smith 1989). The M locus is a Mendelian locus with two alleles: the dominant M allele (diploid genotype M-) produces the mimetic black-and-white forewing tips of the misippus morph (Fig. 1A), whereas recessive homozygotes (mm) have mimetic orange or intermediate forewings, known as the inaria and immima morphs, respectively. Epistasis exists between the M and the A loci, producing the intermediate immima forms in mm genotypes when the dominant A allele for white hindwings is present (mmA- genotype). Previous work has mapped the M locus to an intergenic region of 10 kb near genes of interest such as pink and Sox 5/6 (VanKuren et al. 2019). However, not much is known about the structure of the locus itself, which of the alleles is derived, and whether it arose through de novo mutation or introgression.
Structural variation forms a large part of the genetic variation observed in wild populations and can play a key role in adaptation and speciation (Auton et al. 2015; Wellenreuther et al. 2019). Structural variants (SVs) are typically defined as events larger than 50 bp and include various combinations of gains, losses, and rearrangement of genetic material, which can have extensive effects on gene content, as well as genetic contiguity (reviewed in Ho et al. 2020). These effects have major roles in adaptation and speciation in many species (reviewed in Hoffmann and Rieseberg 2008; Kondrashov 2012; Faria et al. 2019) as well as human disease (Weischenfeldt et al. 2013; Zeevi et al. 2019). For example, inversions have often been associated with complex phenotypes, as reduced recombination at the inversion promotes the joint inheritance of co-adapted alleles (Kirkpatrick and Barton 2006). Examples of this are seen in elytra coloration in ladybirds and reproductive morph switches in the ruff (Küpper et al. 2015; Lamichhaney et al. 2016; Ando et al. 2018; Gautier et al. 2018; reviewed in Thompson and Jiggins 2014; Orteu and Jiggins 2020). In other cases, gene duplications might give rise to adaptive loci through neo-functionalization as seen in heterostyly in Primula plants and in the complex phenotypes of the wood tiger moth (Li et al. 2016; Brien et al. 2023).
Despite the importance of SVs in phenotypic variation, their study is limited by the difficulty of detecting them using high throughput “short-read” DNA sequencing (Mahmoud et al. 2019). SVs involve the rearrangement of otherwise identical DNA sequences, so their detection often requires sequencing molecules that span the rearranged sequence junction. Relative to the size of an SV (often >50 kb), the fraction of read molecules (typically 300–500 bp) that span junctions can be vanishingly small. This problem is made worse by ambiguous mapping due to repetitive elements, which contribute to the formation of SVs (Sharp et al. 2005; Carvalho and Lupski 2016; Payer et al. 2017). Nevertheless, a number of programs exist to detect SVs from short-read sequencing (Rausch et al. 2012; Sindi et al. 2012; Layer et al. 2014; Iakovishina et al. 2016). Long-read sequencing, in contrast, has improved our power to detect SVs via reads that span repetitive and problematic regions, but is limited by cost (Sedlazeck et al. 2018; Ho et al. 2020).
Linked-read sequencing has emerged as an alternative that combines the scalability of short-read sequencing with the retention of linkage information (Marks et al. 2019). The newly developed “haplotagging” is a simple linked-read technique that can be used to sequence entire study populations with hundreds of individuals (Meier et al. 2021). In this approach, large DNA molecules are barcoded as they are broken up for short-read sequencing. For SV detection, the barcoded, larger DNA molecules greatly boost the fraction of junction-spanning molecules, thus improving detection power. Importantly, haplotagging can be easily scaled up to population level by multiplexing, which makes it possible to track the frequency of polymorphic SVs in single individuals, making it an ideal tool for the study of adaptation and speciation in non-model organisms (Meier et al. 2021).
Here, we dissect the genetic architecture of an adaptive polymorphism in H. misippus using haplotagging data. First, we describe our custom program, WRapped Analysis of Tagged Haplotypes (Wrath, github.com/annaorteu/wrath), and validate it in two ways: (1) we run Wrath on published Heliconius haplotagging data with known SVs and (2) we test it against simulated Heliconius data. Thereafter, we focus on the H. misippus case by first performing an association study using hundreds of whole genome haplotagging sequences to pinpoint the candidate locus controlling mimicry in this system. We then use the linked-read information to dissect the genetic structure of the locus by applying Wrath. Finally, we perform a cross-species comparison within the genus Hypolimnas to investigate the evolutionary history of the wing pattern mimicry alleles.
Results
Wrath: A Tool for Visualizing and Detecting Candidate SVs From Linked-Read Data
We describe Wrath, our program for the visualization and exploration of SVs from linked-read data. Wrath divides chromosomes into genomic windows and quantifies barcode sharing among them, creates a matrix and heatmap that can be used to identify outlier regions caused by SVs, and performs statistical testing to identify candidate SVs (see Methods for a more detailed description). We validate Wrath’s accuracy with two approaches: (1) we apply Wrath to an existing haplotagging dataset of Heliconius butterflies (Meier et al. 2021) and (2) we create a simulated dataset of known SVs by inducing SVs of a range of sizes (1, 5, 10, 50, and 100 kb) on the Heliconius melpomene genome (v2.5) and test the sensitivity of Wrath at detecting them using the Meier et al. (2021) dataset.
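The windowing and barcode-sharing step can be illustrated with a minimal Python sketch. It assumes that reads have already been reduced to (position, barcode) pairs for one chromosome and it uses the Jaccard index as the sharing measure. The function name, the input format, and the choice of index are illustrative assumptions, not a description of Wrath's actual implementation.

```python
def barcode_sharing_matrix(reads, chrom_length, window_size):
    """Return a symmetric matrix of barcode sharing between genomic windows.

    reads: iterable of (position, barcode) pairs for one chromosome
           (hypothetical input format; positions are 0-based and
           smaller than chrom_length).
    Sharing is measured with the Jaccard index (an assumption made here).
    """
    n_windows = -(-chrom_length // window_size)  # ceiling division
    barcodes = [set() for _ in range(n_windows)]
    for pos, barcode in reads:
        barcodes[pos // window_size].add(barcode)

    matrix = [[0.0] * n_windows for _ in range(n_windows)]
    for i in range(n_windows):
        for j in range(i, n_windows):
            union = barcodes[i] | barcodes[j]
            if union:  # windows without any barcode keep a value of 0
                shared = len(barcodes[i] & barcodes[j]) / len(union)
                matrix[i][j] = matrix[j][i] = shared
    return matrix


# Toy example: barcode "A" spans windows 0 and 1, "B" occurs only in
# window 1, and "C" links the distant windows 0 and 3.
reads = [(100, "A"), (1200, "A"), (1500, "B"), (50, "C"), (3500, "C")]
m = barcode_sharing_matrix(reads, chrom_length=4000, window_size=1000)
```

In this toy case, the high value between windows 0 and 3 is the kind of off-diagonal outlier that a heatmap of the matrix would expose as a candidate SV.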
To validate Wrath, we first use it on an available dataset of the two tropical butterfly species Heliconius erato and H. melpomene (Meier et al. 2021). Each species presents two morphs or subspecies with mimetic wing patterns which hybridize: H. melpomene plesseni and H. m. malleti, and H. erato notabilis and H. e. lativitta. We used this dataset to explore which SVs are found in these populations. First, we searched for any SVs present in the dataset genome wide and identified 2,284 large (>10 kb) putative SVs in H. melpomene and 2,265 in H. erato (Supplementary Material online). We then explored these using the heatmaps produced by Wrath. Patterns of barcode sharing observed in the heatmaps can be used to identify the type of SV present in the samples (Fig. 2A–C). For example, inversions result in a bowtie pattern in the heatmap, as more barcodes are shared than expected between loci that are far apart in the reference genome. We produced heatmap plots for all large scaffolds of H. melpomene and H. erato and explored the SVs present in the dataset (Fig. 2C and Supplementary Material online). For example, a known SV in chromosome 2 in H. erato (Meier et al. 2021) was clearly visible in the heatmap (supplementary fig. S1, Supplementary Material online). With this, we show that Wrath can visualize patterns of barcode sharing and help prioritize the order of exploration of SVs, as visual examination of the haplotagging data helps explore the SV content in the samples. Nonetheless, some SVs produce similar or matching signals. For example, patterns of interchromosomal translocations like that shown in Fig. 2A can also be produced by TE insertions.
Fig. 2.
Wrath produces visualizations of linked-read data to identify candidate SVs. A) Hypothetical Wrath outputs for some SV types showing heatmaps of barcode sharing between genomic windows of a chromosome. On top of each heatmap are depicted the reference genome and its order of loci, and below a depiction of the rearranged genome region containing an SV. Points A, B, C, and D depict different loci around the breakpoints of the SV that we use as a guide through the diagrams. B) Linked-read mapping pattern on the reference genome for each of the hypothetical SVs. C) Wrath output heatmaps from the Heliconius dataset depicting possible SVs that match the hypothetical predictions. Plots are produced using all H. melpomene (Hmel) or H. erato (Herato) at a time, focusing on the following genomic positions (left to right): Hmel208001o:6150000-6850000, Hmel213001o:4850000-5550000, Herato2101:12790000-13490000, Hmel214004o:2750000-3450000, and Hmel220003o:10150000-10850000. D) Wrath outputs of the region around one of the loci (the optix locus) associated with color pattern in H. erato (left) and H. melpomene (right). Each triangle half of the matrix depicts barcode sharing for one of the color pattern subspecies—in each case depicted by the side. Arrows point at putative deletions that are polymorphic between the two subspecies. Above the heatmap is plotted the Manhattan plot of the result of the Genome Wide Association Study (GWAS) of color pattern between each pair of subspecies. These show only the region around the associated optix locus (Herato1801:1000000-1500000 for H. erato and Hmel218003o:700000-850000 for H. melpomene). Triangles depict the correspondence of regions between the GWAS and heatmap. Black rectangles depict the location of optix.
Finally, we assessed the time required to run Wrath on the dataset. On a small subset of the data and running with 20 threads, the parallel implementation ran 10× faster than a single-threaded implementation of the same algorithm (supplementary fig. S2, Supplementary Material online).
Wrath Identifies SVs From Simulated Data, But Is Limited by Molecule Size and Window Size
To further test Wrath’s accuracy, we applied it to simulated data with known SVs. To do that we induced 710 SVs of a range of sizes (1, 5, 10, 50, and 100 kb) on the H. melpomene genome. We then mapped the H. melpomene ssp. plesseni samples from Meier et al. (2021) and tested the sensitivity (true positives/(true positives + false negatives)) of Wrath at detecting them using two genomic window sizes, 10 and 5 kb. First, we defined a measure of error to categorize the SVs identified by Wrath as correct or incorrect. To do that we measure the absolute distance between detected and simulated breakpoints and scale it by genomic window size to make the measure easier to interpret (supplementary fig. S3, Supplementary Material online). We use this scaled error measure to evaluate the sensitivity of Wrath. SVs equal to or larger than 50 kb were detected with low errors and high sensitivity using either window size (for threshold = 20 windows, sensitivity ≥0.981, mean = 0.994 for 10 kb and ≥0.846, mean = 0.877 for 5 kb windows; supplementary figs. S4 and S5, Supplementary Material online). On the other hand, Wrath largely failed at recovering SVs of 10 kb or smaller size. There are two reasons for this: window size and molecule size. Smaller genomic window sizes allow for the detection of smaller SVs, and their breakpoints are detected with higher precision (i.e. distance between the identified breakpoint and the real one is smaller). Using a 5 kb window, we were able to identify most simulated SVs of 50 kb size or larger within 20 windows. However, smaller genomic windows come at a computational cost and produce noisier results, that is, if coverage is not high enough, there might not be enough barcodes in each window to pick up signals of SVs with barcode sharing. A second limitation to the detectability of SVs, particularly for inversions, is molecule size, which in this dataset ranges from 40 to 60 kb (Meier et al. 2021).
SVs that are much smaller than the molecule size (e.g. 10 kb) will not have any difference in barcode sharing at either side of the breakpoints, which is the signal used to detect certain rearrangements such as inversions. Thus, molecule size poses a lower limit on the size of certain types of SVs that can be detected using linked-read data. Finally, for very large inversions (i.e. 100 kb), sensitivity of the detection is correlated with window size, because smaller windows identify the edges of the bowtie pattern, which are not equivalent to the breakpoints. However, the signal of depletion of barcode sharing that the inversion breakpoint has at the diagonal (second case in Fig. 2A–C) will be identified as a separate SV by Wrath, which can be used to accurately identify the breakpoint.
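The scaled error and sensitivity measures used in this section can be sketched in a few lines of Python. The rule used here to match detected to simulated breakpoints (a simulated breakpoint counts as recovered if any detected breakpoint lies within the threshold) is an assumption made for illustration, as are the function names and the toy coordinates.

```python
def scaled_error(detected_bp, simulated_bp, window_size):
    """Absolute breakpoint distance expressed in units of genomic windows."""
    return abs(detected_bp - simulated_bp) / window_size


def sensitivity(simulated, detected, window_size, threshold):
    """True positives / (true positives + false negatives).

    A simulated breakpoint is a true positive if at least one detected
    breakpoint has a scaled error <= threshold (assumed matching rule);
    otherwise it is a false negative. `simulated` must be non-empty.
    """
    true_pos = sum(
        any(scaled_error(d, s, window_size) <= threshold for d in detected)
        for s in simulated
    )
    false_neg = len(simulated) - true_pos
    return true_pos / (true_pos + false_neg)


# Toy coordinates (hypothetical): two of three simulated breakpoints are
# recovered within 20 windows of 10 kb, one within 2 windows.
simulated = [100_000, 500_000, 900_000]
detected = [130_000, 505_000, 2_000_000]
loose = sensitivity(simulated, detected, window_size=10_000, threshold=20)
strict = sensitivity(simulated, detected, window_size=10_000, threshold=2)
```

Scaling by window size makes thresholds comparable across runs: a threshold of 20 windows corresponds to 200 kb with 10 kb windows but only 100 kb with 5 kb windows.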
Multiple Deletions Are Found at the Optix Color Locus in H. melpomene and H. erato
Following the genome wide analyses of the Heliconius dataset, we were interested in visualizing barcode sharing and candidate SVs at a local scale, particularly to explore whether known wing patterning loci in Heliconius were also associated with structural variation. Using Wrath, we found that in both species pairs, there are multiple putative deletions of 1–10 kb that differentiate color morphs at the locus associated with red pattern elements near the gene optix (Fig. 2D). Deletions leave an area depleted of barcode sharing, as reads do not map to the area, which can be visually identified using the heatmaps (Fig. 2A–C). However, only polymorphic deletions can be identified, as deletions that are fixed in all sequenced individuals but not in the reference cannot be distinguished from assembly artifacts or poor mapping (e.g. repetitive regions for which mapping reads are filtered out due to low mapping quality). Larger deletions (>10 kb) leave an additional signature in the heatmap, showing increased barcode sharing between the breakpoints, in the shape of a triangle (Fig. 2A–C). This is because more molecules than expected span through the breakpoints in individuals presenting the deletion. This signature is harder to detect in smaller deletions, as barcode sharing across the small, deleted region is similarly high in individuals with and without the deletion.
The red patterning locus contains cis-regulatory elements that control the expression of optix and influence development of red color elements, which have been functionally tested with CRISPR (supplementary fig. S6, Supplementary Material online; Lewis et al. 2019). Thus, one possibility is that these deletions are disrupting the function of an optix cis-regulatory element (CRE) and affecting optix expression, although they will need to be functionally tested. Alternatively, it could be that these deletions are in linkage with selected single nucleotide polymorphisms (SNPs) and thus associated with the adaptive color patterns. Finally, the deletion detected in H. melpomene was detected previously using short-read sequencing of two different subspecies (Wallbank et al. 2016).
Forewing Mimicry in H. misippus Is Controlled by the M Locus
To explore the genetic underpinnings of Batesian mimicry in H. misippus, we first confirmed the previously described identity of the M locus (VanKuren et al. 2019). We sequenced 332 H. misippus females collected in distinct locations across Africa (supplementary table S1, Supplementary Material online) using haplotagging. The dataset contains 275 misippus (M-) individuals and 57 inaria/immima (mm, supplementary table S2, Supplementary Material online), sequenced to 0.81× coverage on average (supplementary table S3 and fig. S7, Supplementary Material online). By using haplotagging with a large dataset, we could sequence at low coverage per individual without compromising on statistical power to detect loci associated with mimicry. This is because, although read coverage is low, molecular coverage (i.e. coverage of DNA molecules) is higher in linked-read data, as SNP information of reads belonging to the same DNA molecule can be used for imputation and phasing (Marks et al. 2019; Meier et al. 2021). Also, using a large population sample (>200 individuals) facilitates the identification of regions associated with the trait of interest (Lou et al. 2021).
First, we parsed and demultiplexed the data and then imputed SNPs and phased haplotypes, which resulted in the identification of 46.1 M SNPs, with a mean phased block N50 of 109.02 kbp (supplementary fig. S8, Supplementary Material online). We then performed a Genome Wide Association Study (GWAS) to identify the locus controlling forewing mimicry. A single large peak of association with variation in forewing phenotype was found on chromosome 26 (6,731,000–6,743,400 bp; χ² (1, N = 331) = 118; P-value = 1.703 × 10⁻²⁷ of top associated SNP; Fig. 3A), corresponding to the M locus (VanKuren et al. 2019). Principal component analysis (PCA) of the whole of chromosome 26 showed no evidence for population structure in the data (supplementary fig. S9, Supplementary Material online). In contrast, when using just the associated region for PCA, individuals of the same phenotype were found closer together (supplementary fig. S10, Supplementary Material online). Closer examination of the associated region in the GWAS result revealed three separate peaks of association (Fig. 3D), suggestive of a linked haplotype block. By contrast, the intervening SNPs with little to no association occur within tracts with extremely low read depth in individuals with the inaria (mm) phenotype (Fig. 3E), indicative of an absence of sequence reads matching these tracts in inaria individuals. One possibility is that structural variation is segregating at this locus.
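As a worked check of the reported significance, the short Python sketch below converts the test statistic of the top SNP into a P-value and compares it with the Bonferroni threshold given in the caption of Fig. 3 (P = 0.05, N = 46,088,305). It assumes that the reported statistic is a chi-square with one degree of freedom; the function names are illustrative.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X >= stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


def bonferroni_threshold(alpha, n_tests):
    """Genome-wide significance threshold after Bonferroni correction."""
    return alpha / n_tests


p_top = chi2_sf_1df(118)  # statistic as rounded in the text
threshold = bonferroni_threshold(0.05, 46_088_305)
print(f"top SNP P ~ {p_top:.2e}; Bonferroni threshold ~ {threshold:.2e}")
```

Because the statistic is rounded to 118 in the text, the recomputed P-value agrees with the reported 1.703 × 10⁻²⁷ only to within rounding, but it lies many orders of magnitude below the corrected threshold of roughly 1.1 × 10⁻⁹.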
Fig. 3.
Two large indels are found at the locus associated with forewing mimicry in mm individuals. A) GWAS of forewing phenotype shows a unique peak at chromosome 26. Blue line indicates the genome wide significance threshold with Bonferroni correction for multiple testing (P = 0.05, N = 46,088,305). B) The GWAS peak showing local annotation track with the three main candidate genes colored: Pink, Sox 5/6 and g1743. C) Heatmap of barcode sharing of the region around the association peak. Upper triangle shows barcode sharing for M- individuals and lower triangle mm individuals. A different pattern of barcode sharing between mm and M- individuals is seen at the associated region. D) At a finer scale, the association peak reveals a three-peak structure. E) Depth of read coverage around the associated region supports the hypothesis of deletions, as mm individuals present almost 0 coverage between the association peaks, while M- individuals have more constant coverage throughout the region (note that some M- individuals are likely to be heterozygous, carrying one copy of the recessive m allele, explaining the partial reduction in read depth in M- individuals). F) A zoom in of the barcode sharing heatmap reveals a signal of depletion between the peaks of association in mm individuals, a signature of deletions relative to the reference (or insertions in the reference).
misippus Individuals Carry Multiple TE Insertions at the M Locus
To explore structural variation more closely, we applied Wrath to H. misippus population data. The region associated with differences in forewing pattern spans 10 kb in length, thus a very small window size is necessary to elucidate whether there is any structural variation at the locus, given that SVs can only be detected if they are larger than the genomic windows used (Methods). We visualized barcode sharing using a window size of 100 bp around the M locus and identified two putative deletions in the recessive m allele relative to the reference genome, which is a haploid assembly generated from an M homozygote individual (Fig. 3C and F; see Methods). These two indels perfectly match the locations of the troughs of association seen in the GWAS analysis where read coverage is almost zero in mm individuals, supporting the hypothesis of two deletions in the m allele (or two insertions in the M allele; Fig. 3E). This explains the decline in association in these two regions, as SNPs cannot be confidently called in mm individuals.
To verify the presence of these indels, we designed PCR primers flanking each indel, and at the breakpoints, and amplified them from misippus (M-) and inaria/immima (mm) individuals (four individuals per phenotype; supplementary fig. S11, Supplementary Material online). This confirmed that two insertions of 2.4 and 4.3 kb are present in the dominant misippus phenotype relative to the recessive inaria/immima. A set of TE insertions detected with RepeatMasker composes the entirety of the two insertions, which are situated in a 3′-UTR intron of the gene g1743, an ankyrin repeat and sterile alpha motif domain containing gene of unknown function (Figs. 3B and 4A and supplementary tables S4 and S5, Supplementary Material online). Insertion A (most downstream) is composed of a tandem duplication of a Helitron family TE and an unknown TE, while insertion B is composed of three Helitrons, four LINEs, and two unknown TEs (Fig. 4A and supplementary table S6, Supplementary Material online). Given that the insertions are found in the dominant allele, the most plausible explanation is that the insertions are modifying the expression of a nearby gene, either g1743 or others such as Sox 5/6 and pink, by acting as or affecting existing cis-regulatory elements.
Fig. 4.
TE insertions are found in the dominant derived allele M. A) Zoom in of the association peak showing a TE annotation track. The regions between the association peaks are composed of TE insertions. B) Schematic of the structure of the dominant derived (top) and recessive ancestral M locus alleles. C) The alignment of the reference genomes of H. misippus and H. bolina (in reverse orientation) shows that H. bolina does not present the TE insertions. D) Read coverage of H. bolina and other Hypolimnas species suggests that other Hypolimnas do not carry the insertions and that those are thus derived. E) A phylogeny of the Hypolimnas species used and other Lepidoptera. H. deceptor, which is missing, is a sister species to H. anthedon (Fig. 1B). Phylogeny extracted from Kumar et al. (2022). F) PCA of the locus associated with forewing phenotype reveals the structure by species, while local PCA in sliding windows of 25 SNPs reveals that the m allele is more similar to the H. bolina samples in the region of the first peak (chromosome 26: 6732649-6732923), supporting the hypothesis of the recessive allele being ancestral.
The M Allele Carrying TE Insertions Is Derived and Produces an Atavistic Adaptive Phenotype
We next explored the evolutionary history of the M locus. The presence of the TE insertions could either be ancestral or derived. In other words, either the ancestor of both alleles already carried these TE insertions, and they were subsequently deleted to form the m allele as it is today, or the ancestor lacked the TE insertions, and they were inserted to form the M allele as it is today. To test these hypotheses, we explored this region in the genome of Hypolimnas bolina, a relative of H. misippus with very different wing patterns that diverged approximately 8 million years ago (Sahoo et al. 2018). First, we aligned the H. misippus and H. bolina reference genomes using Satsuma2, an aligner intended for inferring homology from sequence similarity (Grabherr et al. 2010). We identified an orthologous region on chromosome 26 of H. bolina showing strong synteny across the M locus, with no further rearrangements except for the TE insertions, indicating conserved synteny in H. misippus (Fig. 4C). Furthermore, the alignment shows that, while the peaks of association have homologous sequences in chromosome 26 in H. bolina, the two indels between the peaks have no matches in the H. bolina genome (Fig. 4C). This strongly suggests that the m allele is ancestral, and the TEs represent derived insertions into the M allele.
To further explore the origin of the alleles, we analyzed whole genome resequencing data from 4 other Hypolimnas species, including 214 H. bolina, 4 Hypolimnas anthedon, 4 Hypolimnas deceptor, and 2 Hypolimnas usambara (see Fig. 4E for a phylogeny), sequenced to an average coverage of ∼6.5×. First, we mapped all resequenced Hypolimnas to the H. misippus reference genome and quantified read coverage at the M locus. Read coverage across the TE insertions is approximately zero in the outgroup species (Fig. 4D), supporting the hypothesis that the TE insertions are derived and unique to the M allele of H. misippus.
To investigate in more detail the evolution of the M and m alleles, we evaluated the SNP variation at the locus across all the H. misippus and H. bolina samples using PCA. PCA was conducted using the SNPs of the associated region, and then repeated using windows of 25 SNPs across the region. We chose PCA because, compared to other techniques such as phylogenetic trees, PCA and similar dimension reduction techniques focus on the main mode of variation and can be more robust under low-coverage sequencing scenarios such as our dataset. Here, our aim is to identify overall genetic similarity: in broader regions, a PCA should recapitulate the species relationships and separate all H. misippus from all H. bolina samples. However, if the derived M allele accumulated an excess of derived mutations, we may find certain tracts in which the inaria/immima individuals, which carry the ancestral m allele, group more closely with the H. bolina individuals than with the H. misippus individuals carrying the M allele.
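The windowed PCA described above can be sketched as follows with NumPy. This is a minimal illustration that assumes a complete samples × SNPs genotype matrix coded 0/1/2 and non-overlapping windows; the function names, the step size, and the toy data are assumptions, and the handling of missing or low-coverage genotypes is not addressed.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project samples onto the principal components of a samples x SNPs matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def windowed_pca(genotypes, window=25, step=25, n_components=2):
    """Yield (start_index, scores) for PCA run on consecutive windows of SNPs."""
    n_snps = genotypes.shape[1]
    for start in range(0, n_snps - window + 1, step):
        block = genotypes[:, start:start + window]
        yield start, pca_scores(block, n_components)


# Toy data (hypothetical): rows 0-1 carry M, rows 2-3 are mm, rows 4-5 are
# H. bolina. In the first window only M carriers differ from the rest; in
# the second window only H. bolina differs.
g = np.zeros((6, 50))
g[0:2, :25] = 2
g[4:6, 25:] = 2
results = list(windowed_pca(g))
```

In the first toy window the mm samples sit with the H. bolina samples on PC1, whereas in the second they sit with the M carriers, which mirrors the contrast between local and region-wide patterns that the analysis is designed to detect.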
Across the whole associated region that contains the M locus, the PCA reflects the species relationships, with one cluster for each species (Fig. 4F, top). In contrast, performing local PCAs in genomic windows of 25 SNPs reveals a different pattern. We identified a region containing the top associated SNPs of the most upstream peak of association (6,732,649–6,732,923) for which the relationship between the samples did not reflect the species relationship (Fig. 4F, bottom). Inaria/immima mm individuals are found closer to H. bolina than to misippus M- individuals, which could suggest that the m allele is ancestral. The TE insertions found in the M allele are expected to experience reduced recombination because of their reduced effective population size, which could in turn reduce recombination in the flanking regions. The TEs might also be disrupting a functional element and thus be under weak selection, which, coupled with the low recombination, could lead to the accumulation of mutations. These coupled effects could make the M allele retain fewer ancestral SNPs at the flanking region than the m allele. Taken together, the read coverage, reference genome alignment, and local PCA at the associated region suggest that the recessive m allele that produces orange forewings in homozygosis is ancestral to the dominant M allele that produces black-and-white forewing tips.
Discussion
Here, we present a case of adaptive atavism in the diadem butterfly, H. misippus, in which the derived allele is associated with a reversion to an ancestral yet adaptive phenotype. Atavisms are caused by mutational or recombination events that enable the pre-existing developmental machinery to reproduce the ancestral character (Hall 2010). Crucially, they are often maladaptive, as the lost phenotype has been selected against, such as hind limbs in whales and teeth in birds, or are associated with a malfunctional state such as cancer (Thomas et al. 2017). In line with this, Stephen Jay Gould revisited Dollo's law, which refers to the paleontological observation that morphological traits that are lost in an evolutionary lineage do not later on re-evolve in that lineage (Gould 1970). We present a case where the atavistic phenotype is adaptive, with the derived allele of the M locus in H. misippus producing a mimetic wing phenotype. We show that two large insertions of 2.4 and 4.3 kb are found in the dominant allele of the M locus and that these are formed by multiple TE insertions. By comparison to other Hypolimnas species, we show that the insertions are derived. Our results suggest that, from an ancestral black-and-white forewing morph, an orange morph evolved in H. misippus by a mutation in an unknown locus, and that this morph reached fixation in the population. Following that, TE insertions at the M locus created the M allele, which reverted the phenotype to the ancestral black-and-white forewing morph. Melanised apexes in the forewing with subapical white bands (i.e. forewings with black-and-white tips) are a common wing phenotype in Hypolimnas present in 81% of the species (17 out of 21 with phenotype data; Fig. 1B) and in Nymphalids such as Danaids or some Nymphalinae, including Antanartia and Vanessa species (e.g. Vanessa cardui; Fig. 4D). Here, we can understand the phenomenon of “evolutionary reversion” in the H. misippus butterfly as a molecular example of convergent evolution that is the re-evolution of the same (ancestral) phenotype via regulatory rewiring. Under this model, the original mutation that caused the change to orange wings would not be identifiable by sampling wild H. misippus, as this mutation fixed deep in the past, prior to the emergence of the M allele.
In summary, we show that H. misippus is an example of adaptive atavism in which the TE insertions in the derived M allele cause a reversal to an ancestral phenotype, the black-and-white phenotype, which has an adaptive function in Batesian mimicry. Adaptive atavism is a rare event with only a few known examples such as the re-evolution of wings in stick insects (Whiting et al. 2003) and aphids (Saleh Ziabari et al. 2023), sexual reproduction in oribatid mites (Domes et al. 2007), and shell coiling in gastropods (Collin and Cipriani 2003). Alternatively, it could be that the TE insertions are not directly causal but in linkage disequilibrium with the causal mutation. Functional testing such as gene editing with CRISPR-Cas9 would be necessary to prove the causality of the TE insertions. Furthermore, introgression of the m allele from another Hypolimnas species such as H. bolina could explain the local PCA results. However, phylogenetic trees of those SNPs do not show any introgression signal, that is clustering of H. bolina samples with H. misippus m alleles (Supplementary Material online).
The TE insertions identified are found in the intron of the gene g1743, which encodes an ankyrin repeat and sterile alpha motif domain containing protein of unknown function. Our results represent a similar case to the peppered moth, where a 22-kb TE insertion increases the expression of cortex, resulting in the production of melanic morphs (van’t Hof et al. 2016). Similarly, variation in wing color in the H. melpomene/timareta lineage is associated with a TE insertion in the cis-regulatory region of cortex, suggesting that cis-regulatory structural variation controls these mimetic phenotypes (Livraghi et al. 2021). Outside of Lepidoptera, TE insertions in cis-regulatory regions have also been found to be of adaptive importance, such as in egg-spot phenotypes in cichlid fish (Santos et al. 2014) and flowering time in the annual and inbreeding forb Capsella rubella (Niu et al. 2019). Given that the M locus insertion is found in a non-coding region, the most plausible explanation is that the insertion is modifying the expression of a nearby gene, either g1743 or others such as Sox 5/6 and pink by acting as or affecting existing cis-regulatory elements. Also, given that the insertion is found in the dominant allele, an increase in expression of the candidate gene or a certain isoform are possible explanations, which could be achieved by the disruption of a repressor or the generation of a novel enhancing function. Overall, this case adds more evidence that cis-regulatory mutations are associated with pattern variation, while coding mutations are more likely to be associated with color, and sheds light on the adaptive importance of TEs (Casacuberta and González 2013; van’t Hof et al. 2016; Orteu and Jiggins 2020).
Although the accuracy of the mimicry seen in H. misippus would suggest strong selection, intermediates are often found (Gordon et al. 2010). Moreover, while the subspecies of the model D. chrysippus each have large geographically distinct regions in which they are monomorphic (Liu et al. 2022), the three H. misippus mimetic morphs are found in all these regions. In other words, there is a phenotypic mismatch, in which many H. misippus individuals are mimics of a model that is rare or absent in their region, suggesting that other evolutionary forces might be at play (Gordon et al. 2010). Negative-frequency dependent selection has been invoked as one of several forces maintaining the polymorphism in H. misippus (Gordon 1987; Gordon et al. 2010). Another alternative is that structural variation causes associative overdominance and prevents the fixation of a single allele even in scenarios of weak mimicry selection. An example of this has been shown in H. numata, in which a supergene containing three inversions controls wing phenotype (Jay et al. 2021). The inversions result in a region of low recombination, which in turn leads to the accumulation of deleterious recessive mutations. This leads to reduced fitness of homozygotes for both inversion alleles due to the deleterious effects of recessive mutations at different, but tightly linked sites (i.e. associative overdominance). The TE insertions at the M allele of H. misippus could theoretically lead to associative overdominance by reducing local recombination between M and m alleles, but the impact on recombination is likely to be less severe than that of a large inversion, so the role of recombination suppression in the maintenance of polymorphism requires further investigation.
In addition to our empirical results, we also present Wrath, a user-friendly, flexible, and fast tool for the visualization of haplotagging data and exploration of candidate SVs, which we test using two large haplotagging datasets and simulated data. Wrath produces heatmap plots of barcode sharing that can be used to visually inspect the data and identify candidate SVs. Wrath can be run with any chosen window size, which gives flexibility to the user and allows for the detection of SVs of different sizes. There are multiple software solutions for the identification of SVs from linked-read data, including LongRanger (Sudmant et al. 2015), Leviathan (Morisse et al. 2021), NAIBR (Elyanow et al. 2018), and GROC-SV (Spies et al. 2017). LongRanger and GROC-SV are very well curated tools for the analysis of linked-reads produced by 10×-Genomics, while NAIBR uses the BAM (Binary Alignment Map) files produced by LongRanger for its SV detection pipeline. Whilst these programs could be used for haplotagging data, the data would need to be converted to the 10×-Genomics format for input to these programs. Finally, Leviathan can take haplotagging data as input and produces a list of detected candidate SVs and their predicted breakpoints; however, unlike the above tools, Leviathan does not produce graphic visualizations of barcode sharing, which we have found a very useful tool for manual verification of putative SVs. We tested the effectiveness of Wrath using our dataset of H. misippus, a published dataset from Heliconius butterflies and simulated data, and show that Wrath can be useful in the visualization of haplotagging data to identify candidate SVs.
Altogether, our study presents a striking case of adaptive atavism in which an ancestral trait reappears for an adaptive function. Furthermore, our results highlight the importance of structural variation in the evolution of adaptive phenotypes, adding to the mounting evidence that TEs have an important role in adaptive evolution and particularly in the evolution of color and mimetic phenotypes in Lepidoptera. Finally, we have shown that Wrath is an easy and flexible means to visualize haplotagging data and explore candidate SVs.
Methods
Visualization and Exploration of SVs From Haplotagging Data Using Wrath
To analyze haplotagging data, we developed Wrath (WRapped Analysis of Tagged Haplotypes, available at github.com/annaorteu/wrath), a program for the exploration and visualization of SVs consisting of three steps.
Barcode Parsing
Haplotagging reads are produced using magnetic beads that present a modified Tn5 enzyme on their surface carrying sequencing adapters, each with a unique barcode. During library preparation, DNA molecules wrap around the beads and are cut into smaller fragments and barcodes attached to them. Thus, reads belonging to the same DNA molecule present the same unique barcode, and the small size of the beads ensures that each barcode combination is unique to one or a small number of molecules. In the sequencing files, barcode information is included as four nucleotide sequences of 6 bp each (two per index read). To analyze haplotagging data, first, molecule information needs to be included as a BX tag in the information fields of the fastq files, a process known as molecule demultiplexing. Once the reads include information on their molecule-of-origin in their BX tag, they are ready to be mapped.
SV Visualization
Using mapped reads and a reference genome, Wrath plots heatmaps of barcode sharing with a single command. Wrath can also produce a list of candidate SVs. Haplotagging reads belonging to the same DNA molecule present the same unique barcode and are expected to map in close proximity in the genome in the absence of rearrangements. Thus, we can use patterns of barcode sharing between more distant genomic windows that exceed the background expectation to identify SVs.
Wrath divides a given chromosome into n windows of size m (m needs to be specified, by default 10 kb) and identifies the barcodes attached to the reads mapping in each of the genomic windows. Window size is chosen based on two factors. The first is computational overhead, as Wrath builds a matrix of n × n dimensions, which can require a large amount of memory for large values of n. The second is molecule size, which depends on several factors such as sample preservation, DNA extraction, and library preparation. By default, molecules are assumed to be centered around 50 kb in length, although they can be much larger (Meier et al. 2021). Window size needs to be smaller than molecule size (e.g. 10 kb window size and 50 kb molecule size) as the identification of SVs is only possible if molecules span more than one window.
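The trade-off between window size and memory can be illustrated with a short calculation. The following is a minimal sketch, not part of Wrath: the function name, the 20 Mb chromosome length, and the assumption of a dense matrix with 8 bytes per entry are all hypothetical choices made for illustration.

```python
import math

def window_matrix_footprint(chrom_length_bp, window_size_bp=10_000, bytes_per_entry=8):
    """Return (number of windows, approximate bytes) for a dense n x n matrix.

    Assumes one 8-byte float per window pair; the storage actually used by
    Wrath may differ.
    """
    n_windows = math.ceil(chrom_length_bp / window_size_bp)
    return n_windows, n_windows * n_windows * bytes_per_entry

# Hypothetical 20 Mb chromosome at two window sizes.
for window in (10_000, 1_000):
    n, n_bytes = window_matrix_footprint(20_000_000, window)
    print(f"window={window} bp -> n={n}, ~{n_bytes / 1e9:.3f} GB")
```

Under these assumptions, reducing the window size tenfold increases the matrix size a hundredfold, which is why small windows are best restricted to specific regions of interest.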
Once the chromosome has been split into windows, Wrath determines the barcodes that are present in each of those windows, calculates the Jaccard index for each pair of windows along the chromosome, and stores the value in a matrix of n × n dimensions. The Jaccard index is an index of similarity that quantifies barcode sharing between windows:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where J is the Jaccard value between windows A and B from a given chromosome, |A ∩ B| is the number of barcodes present in both windows, and |A ∪ B| is the number of barcodes present in at least one of them.
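The construction of the barcode-sharing matrix can be sketched as follows. This is an illustrative reimplementation under simplifying assumptions, not Wrath's code: reads are represented as hypothetical (position, barcode) pairs with 0-based positions, barcodes are toy strings rather than real BX tags, and the function names are invented for this example.

```python
import numpy as np

def jaccard(a, b):
    """Jaccard index of two barcode sets: shared barcodes over all barcodes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def barcode_sharing_matrix(read_hits, chrom_length, window_size=10_000):
    """Build an n x n matrix of Jaccard indices between genomic windows.

    read_hits: iterable of (position, barcode), positions 0-based and
    smaller than chrom_length (an assumption of this sketch).
    """
    n = -(-chrom_length // window_size)  # ceiling division
    barcodes = [set() for _ in range(n)]
    for position, barcode in read_hits:
        barcodes[position // window_size].add(barcode)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            matrix[i, j] = matrix[j, i] = jaccard(barcodes[i], barcodes[j])
    return matrix

# Toy data: barcode "B" is seen in the first two windows.
toy_hits = [(100, "A"), (200, "B"), (10_500, "B"), (10_900, "C"), (25_000, "D")]
toy_matrix = barcode_sharing_matrix(toy_hits, chrom_length=30_000)
print(toy_matrix)
```

The pairwise loop is quadratic in the number of windows, which is consistent with the memory and run-time considerations that motivate the choice of window size.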
The highest values are expected around the diagonal, which then decay exponentially with distance from it. This is because windows that are closer to each other are expected to share more barcodes than windows that are further apart, as DNA molecules span more than one window. SVs such as moderate to large inversions (>50 kb), intrachromosomal translocations, and long duplicated regions are expected to deviate from the background distribution of barcode sharing. For example, inversions show up as bowtie patterns of excessive barcode sharing (Fig. 2). We define excessive sharing as barcode sharing that is statistically higher than that expected by the distance between the windows given that barcode sharing decays exponentially from the diagonal. Conversely, SV of a much smaller size than the average molecule length cannot be detected with linked-reads. In those cases, short-read methods are more appropriate.
Finally, the construction of the matrix and the calculation of Jaccard indices for each pair of genomic windows can be computationally expensive, so Wrath can be run in parallel, which reduces computation time.
Exploration of Candidate SVs
Wrath has the additional functionality of producing a list of candidate SVs. Wrath detects SVs from the matrix by finding window comparisons with excess of barcode sharing, such as in inversions, interchromosomal translocations, and duplications. In a chromosome without any structural variation compared to the reference genome, windows in close proximity are expected to share more barcodes than windows that are far apart. Thus, in the matrix of barcode sharing, values are expected to be the highest at the diagonal and to decay exponentially with distance from it, following a double exponential decay model. Knowing the distance between each entry and the diagonal and the value of barcode sharing, we can identify outliers, that is, candidate SVs. Wrath fits a double exponential decay model to the data such that
where x is the distance of each entry to the diagonal, y is the value of the entry (barcode sharing between a pair of windows), and a, b, and c are parameters that are estimated from the data. While a relates to read coverage and noise in barcode sharing and b relates to read coverage (both being positively correlated), c is inversely related to molecule length (supplementary fig. S12, Supplementary Material online).
The prediction bands fitted by the model include the background distribution of barcode sharing and any windows whose values are outside the prediction bands (α = 0.05) are then classified as putative SVs. Once the model has been fitted, Wrath outputs a list of putative SVs with their genomic coordinates and produces plots of the fitted model and identified outliers. The putative SVs are not classified into SV types and are intended to be used for prioritization processes before further exploration.
Additionally, Wrath scales barcode sharing values according to their distance to the diagonal using Z-scores and filters out any absolute Z-score larger than a given threshold (default 2). This additional filter reduces the number of false positives close to the diagonal, as the closer the genomic windows are, the more variation there is in their barcode sharing.
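The idea of scaling barcode sharing by distance to the diagonal can be sketched as below. This is a simplified illustration and not Wrath's implementation: each entry is standardized against all entries at the same distance from the diagonal, and the function name and toy matrix are hypothetical. How the resulting threshold is combined with the model-based outliers is left out of this sketch.

```python
import numpy as np

def zscore_by_distance(matrix):
    """Standardize each entry against entries at the same distance from the diagonal."""
    n = matrix.shape[0]
    z = np.zeros((n, n), dtype=float)
    for d in range(n):
        idx = np.arange(n - d)
        values = matrix[idx, idx + d]
        sd = values.std()
        # Distances with (numerically) no variation are left at zero.
        if not np.isclose(sd, 0.0):
            scaled = (values - values.mean()) / sd
            z[idx, idx + d] = scaled
            z[idx + d, idx] = scaled
    return z

# Toy matrix: sharing decays with distance, plus one excess value 4 windows apart.
n = 12
i, j = np.indices((n, n))
toy = 1.0 / (1.0 + np.abs(i - j))
toy[0, 4] = toy[4, 0] = 0.9
z = zscore_by_distance(toy)
flagged = np.abs(z) > 2  # threshold of 2, matching the default mentioned above
print(np.argwhere(flagged))
```

Standardizing per distance means that an entry is only compared with window pairs that are equally far apart, so the naturally high values near the diagonal do not dominate.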
Wrath can be applied to single populations to visualize and explore putative SVs in each chromosome of the genome. It can also be applied to detect SVs in different populations separately, which can then easily be compared and scanned for overlaps using bedtools (see case-studies below).
Analyses of Heliconius Data
Data from Meier et al. (2021) were used. First, reads were pre-processed, and adapters and low-quality ends were trimmed using TRIMMOMATIC (Bolger et al. 2014). Then, we mapped the reads to the respective reference genome, H. erato v1.0 (Nadeau et al. 2014) and H. melpomene v2.5 (Davey et al. 2016, 2017), using BWA-MEM (Li 2013) and marked PCR duplicates using the MarkDuplicates utility from Picard tools (broadinstitute.github.io/picard). Alignment (BAM) files were used as input for Wrath, which we ran separately for each scaffold in the reference genome with a window size of 10 kb. Reads with a mapping quality below 10 were filtered out by Wrath. To examine the color loci specifically, we ran Wrath with a window size of 1 kb and specifying the desired coordinates.
Simulations of SVs in the H. melpomene Genome
We tested Wrath's accuracy with synthetic SV data. We generated a modified version of the H. melpomene Hmel2.5 genome assembly carrying 710 SVs of known location and size using the script simversion.py (https://github.com/simonhmartin/simversion). Insertions, deletions, and inversions ranging in size from 1 to 100 kb (1, 5, 10, 50, and 100 kb) were distributed along each chromosome with the intervals between them randomly sampled from an exponential distribution with mean 1/3 Mb. This approach of sampling the intervals between variants avoids any overlaps between the simulated SVs, but it does not rule out the possibility of overlap between simulated SVs and real SVs in the resequenced individuals. The simulated insertions comprised randomly generated sequences. We then mapped the H. melpomene ssp. plesseni samples from Meier et al. (2021) to the modified genome following the methods above. We then applied Wrath to these simulated data using two window sizes, 5 and 10 kb, and evaluated the results.
To evaluate the accuracy of the detection, we defined three measures: overlap, error, and sensitivity (supplementary figs. S3–S5, Supplementary Material online). First, we calculated the overlap between the simulated SV and the detected SV relative to the length of the simulated SV. We also defined a measure of error of detection of SVs in the simulated data. This error is calculated by adding up the absolute distances between detected SV breakpoints and simulated ones and scaled by window size. The error can be interpreted as the number of genomic windows between the SV breakpoint detected by Wrath and the simulated SV breakpoint. Once we have calculated these two metrics, we select detected SV with the maximum overlap and minimum scaled error for each simulated SV, also filtering for a minimum overlap of 0.9. Finally, we defined 3 error thresholds: 15, 20, and 25 genomic windows (supplementary fig. S5, Supplementary Material online) and calculated sensitivity (true positives/(true positives + false negatives)) for each type of SV.
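The three measures can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the evaluation script used in the study: SVs are hypothetical (start, end) tuples on one chromosome, ties are resolved by taking maximum overlap first and minimum scaled error second, and every simulated SV without an accepted match counts as a false negative.

```python
def overlap_fraction(sim, det):
    """Fraction of the simulated SV (start, end) covered by the detected SV."""
    shared = min(sim[1], det[1]) - max(sim[0], det[0])
    return max(shared, 0) / (sim[1] - sim[0])

def scaled_error(sim, det, window_size):
    """Summed absolute breakpoint distances, expressed in genomic windows."""
    return (abs(sim[0] - det[0]) + abs(sim[1] - det[1])) / window_size

def sensitivity(simulated, detected, window_size, min_overlap=0.9, max_error=15):
    """True positives / (true positives + false negatives)."""
    true_positives = 0
    for sim in simulated:
        candidates = [
            (overlap_fraction(sim, det), scaled_error(sim, det, window_size))
            for det in detected
        ]
        candidates = [c for c in candidates if c[0] >= min_overlap]
        if not candidates:
            continue  # no detected SV overlaps enough: false negative
        best = max(candidates, key=lambda c: (c[0], -c[1]))
        if best[1] <= max_error:
            true_positives += 1
    return true_positives / len(simulated)

# Toy example: the first simulated SV is recovered, the second is missed.
simulated = [(100_000, 200_000), (500_000, 550_000)]
detected = [(95_000, 205_000), (700_000, 750_000)]
print(sensitivity(simulated, detected, window_size=10_000))
```

In this toy case the first detected SV fully covers the first simulated SV with breakpoints 5 kb off on each side, giving a scaled error of one window.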
H. misippus Sampling
Samples were collected in multiple locations in Africa. Most sample bodies were preserved in 100% ethanol immediately after collection, with some exceptions that were air dried. Samples in ethanol were kept at room temperature for 2 mo and then stored at −80 °C. A small (1/8) piece of the thorax was used for sequencing.
H. misippus Sequencing Library Construction
DNA extractions and haplotag libraries were prepared essentially as described in Meier et al. (2021), with some modifications. Haplotag libraries were prepared in batches of 96 samples with DNA sample diluted to 0.15 ng/µl with 10 mM Tris, pH 8, and quantified with Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific). We used only 1.2 µl of haplotagging beads (∼0.88 million beads, each carrying one of 885K well-specific barcodes) per sample, 30 µl of WASH buffer (20 mM Tris pH 8, 50 mM NaCl, 0.1% Triton X-100), 10 µl of 5× tagmentation buffer (50 mM TAPS pH 8.5 with NaOH, 25 mM MgCl2, 50% N,N-dimethylformamide), and 25 µl of 0.6% SDS for Tn5-stripping following tagmentation. For sub-sampling, one-tenth of the beads + DNA (0.15 ng DNA per sample) from each of the 96 samples was pooled into a single eight-tube PCR strip, and then again from every eight pools into four final sample pools. With only four pooled samples on the magnetic stand, the buffer was removed, and 20 µl of 1× Lambda Exonuclease buffer, supplemented with 10 units of exonuclease and 5 units of Lambda exonuclease (New England BioLabs), was added to each sample. Samples were incubated at 37 °C for 30 min, and then washed twice for 5 min with 150 µl of WASH buffer. The DNA library was then amplified using Q5 High-Fidelity DNA Polymerase (New England BioLabs) in four 25 µl PCR reactions according to manufacturer's instructions, using 4 µl of 10 µM TruSeq-F AATGATACGGCGACCACCGAGATCTACAC and TruSeq-R CAAGCAGAAGACGGCATACGAGAT primers, with the following cycling conditions: 10 min at 72 °C followed by 30 s at 98 °C and 10 cycles of 98 °C for 15 s, 65 °C for 30 s, and 72 °C for 60 s. Libraries were pooled after PCR into a single library pool, size selected using Ampure magnetic beads (Beckman Coulter), Qubit quantified, and adjusted with 10 mM Tris, pH 8, 0.1 mM EDTA to 2.5 nM concentration for sequencing.
Sequencing and demultiplexing. Pooled libraries were sequenced on a HiSeq 3000 (Illumina) instrument at the Genome Core Facility at the MPI Tübingen Campus with a 150 + 13 + 12 + 150 cycle run setting, such that the run produced 13 and 12 nt in the i7 and i5-index reads, respectively. Sequence data were first converted into fastq format using bcl2fastq v2.17.1.14 (Illumina) with the following parameters: --use-bases-mask=Y150,I13,I12,Y150 --minimum-trimmed-read-length=1 --mask-short-adapter-reads=1 --create-fastq-for-index-reads.
Quality Control and Read Mapping
First, molecules were demultiplexed. When using haplotagging, the molecule of origin information is embedded in the read name in the fastq file as a string of four barcodes of six nucleotides each. This was done as described in Meier et al. (2021) to generate the modified fastq files. The molecule ID is then included in the BX tag of each read. Barcode mismatches caused by sequencing errors are allowed as long as there is an unambiguous closest match. Once the BX tag was created, we pre-processed the reads, cutting adapters and low-quality ends using TRIMMOMATIC (Bolger et al. 2014), mapped to the reference genome HypMis_v2 using BWA-MEM2 (Vasimuddin et al. 2019), and sorted by coordinate using Samtools v1.9 (Danecek et al. 2021). PCR duplicates were marked using the MarkDuplicates utility from Picard tools (broadinstitute.github.io/picard) with two specific options CREATE_INDEX = TRUE and READ_ONE_BARCODE_TAG = BX. We then demultiplexed the individuals using their barcodes and included their individual ID information in the read group field.
SNP Calling and Imputation
SNPs were identified using the mpileup utility of bcftools v1.11 (Danecek et al. 2021), running each chromosome separately including the INFO/AD, AD, DP, DV, DPR, INFO/DPR, DP4, SP tags in the output (-a option), setting the minimum mapping quality to 10 (-q) and the minimum base quality to 20 (-Q), ignoring Read Group tags (--ignore-RG) and removing duplicates (-F 1024), with the output piped directly to bcftools call using the alternative model for multiallelic and rare-variant calling (--multiallelic-caller), including only variants in the output (--variants-only) and the fields GQ and GP (-f GQ,GP). Then, using bcftools query (-f), we generated a file containing the chromosome, position, reference, and alternative alleles for each SNP and with that produced a file of SNP positions that we could use as one of the inputs for the SNP imputation program STITCH (Davies et al. 2016). Following that, we generated genomic windows of 500 kb using bedtools over which we could iterate to run the remainder of the pipeline.
We ran STITCH separately for each of the genomic intervals using all our bam files as input. STITCH imputes SNPs from read and linked-read information but requires fine tuning of the input parameters. To optimize the values, we tested multiple values and compared the results, evaluating their performance using the M locus following methods from Meier et al. (2021). SNPs at the M locus are expected to be 0/0 or 0/1 for misippus individuals and 1/1 for inaria individuals. Options that optimized the results were K = 30, method = diploid, nGen = 500, readAware = TRUE, keepInterimFiles = FALSE, shuffle_bin_radius = 500, expRate = 5, iSizeUpperLimit = 500,000, keepSampleReadsInRAM = TRUE, and use_bx_tag = FALSE. We concatenated the resulting imputed variant calls (vcf files) using bcftools concat.
SNP Phasing in H. misippus
To phase the SNPs into haplotypes, we used HapCut2 (Edge et al. 2017), which we ran separately for each of the 500 kb intervals used for imputation and each individual separately. First, we filtered SNPs based on their informativeness (INFO_SCORE ≥ 0.2) and selected all heterozygous SNPs. We used this as input for the extractHAIRS utility of HapCut2 together with the BAM files with marked duplicates and the option --10X turned on, which indicates that the input contains linked-reads. This produced a file with unlinked fragments, which we then used as input for the LinkFragments.py script of HapCut2, which integrates the information of the linked-reads. We specified a maximum distance of 50 kb. Then, we used the linked fragment file and vcf as input for the HAPCUT2 utility with the options --nf 1 --threshold 30 --error_analysis_mode 1 --call_homozygous 1. Finally, we integrated the resulting vcf with our vcf of homozygous sites.
Phenotyping and GWAS
We photographed forewings and hindwings of each individual in a standardized set-up, using a green background and a color checker. Phenotypes were scored by hand following the phenotype categorizations of Gordon et al. (2010), coding misippus morphs as 1, and inaria as 0. All sample phenotypes are found in supplementary table S2, Supplementary Material online. Using the merged HAPCUT2 output vcf file as input and the phenotype scores, we performed a GWAS with Plink v1.9 (Purcell et al. 2007) using the option --assoc.
Detection of SVs in H. misippus
Genome wide SVs were identified using Wrath using the same method as for the Heliconius data. We used the intersect utility from bedtools v2.30.0 to assess overlap between SVs identified in homozygous recessive, and heterozygous and homozygous dominant individuals, setting the minimum fraction of overlap to 0.8 for both sets and extracting only one match per SV (intersectBed with options -f 0.8 -F 0.8 -u).
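The overlap criterion applied with bedtools can be mimicked in a few lines to make the logic explicit. The following is a sketch for illustration only, not a replacement for bedtools: intervals are hypothetical (chromosome, start, end) tuples, and an interval from the first set is reported once if some interval in the second set overlaps it by at least 80% of the length of both.

```python
def reciprocal_hits(set_a, set_b, min_frac=0.8):
    """Return intervals of set_a that overlap an interval of set_b by at
    least min_frac of both intervals; each set_a interval is reported once."""
    hits = []
    for chrom_a, start_a, end_a in set_a:
        for chrom_b, start_b, end_b in set_b:
            if chrom_a != chrom_b:
                continue
            shared = min(end_a, end_b) - max(start_a, start_b)
            if shared <= 0:
                continue
            if (shared >= min_frac * (end_a - start_a)
                    and shared >= min_frac * (end_b - start_b)):
                hits.append((chrom_a, start_a, end_a))
                break  # report each interval only once
    return hits

# Toy candidate SVs from two groups of individuals (made-up coordinates).
group_1 = [("chr1", 0, 100), ("chr1", 500, 600), ("chr2", 0, 100)]
group_2 = [("chr1", 10, 105), ("chr1", 500, 1000)]
print(reciprocal_hits(group_1, group_2))
```

Requiring the fraction on both intervals prevents a small candidate SV from being matched to a much larger one that merely contains it, as happens with the second toy interval.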
DNA Extractions and Amplification of the M Locus in H. misippus
DNA extractions were carried out using a custom protocol using PureLink buffers and homemade magnetic beads. Briefly, a small piece of thorax tissue (1/10) was placed in an eight-tube PCR strip. Then, 45 µl of PureLink Digestion buffer and 10 µl of Proteinase K (20 mg/ml) were added, and the mix was incubated at 58 °C with shaking (500 rpm). Thereafter, we added 2 µl of RNAseA (DNAse free, 10 mg/ml) and incubated for 10 min at room temperature. Then, we added 45 µl of PureLink Lysis buffer and incubated at 58 °C for 30 min with shaking (500 rpm). We then used a homemade magnetic bead mix to extract the DNA from the lysate. First, we added 37.5 µl of magnetic beads and 75 µl of lysate to a 96-well plate. After mixing, we incubated for 15 min at room temperature, placed the plate in a magnetic stand for 10 min, removed the supernatant, and cleaned the beads with 80% ethanol. After drying, we added 50 µl of 10 mM Tris (pH = 8) to elute and incubated at 45 °C for 15 min without resuspending. Then, we resuspended the beads and incubated for 20 min at room temperature. Finally, we placed the plate on the magnetic stand and, after 10 min, transferred the supernatant (the DNA) to a fresh tube.
To amplify the regions of interest, we designed primers at each side of the deletions and at the breakpoints (supplementary fig. S11, Supplementary Material online). We used Q5 High-Fidelity 2× Master Mix from New England BioLabs with 35 cycles. We used eight individuals: four inaria/immima (CAM035232, CAM035239, CAM035244, and CAM035250) and four misippus (CAM035230, CAM035240, CAM035245, and CAM035249).
Reference Genome Alignments
To identify putative homologous regions of the reference genomes of H. bolina HypBol_v1 (Orteu et al. 2023) and H. misippus, we aligned the two references to each other using Satsuma2 (Grabherr et al. 2010) with default parameters. We visualized the resulting alignments using the asynt R functions (Kim et al. 2022) from https://github.com/simonhmartin/asynt.
Sample Preparation and Genome Wide Analysis of H. bolina Samples and Other Hypolimnas Species
A total of 214 wild and reared H. bolina samples were used. Briefly, sample DNA was extracted, and DNA Nextera libraries were prepared using custom protocols. First, DNA was extracted of the H. bolina samples following a custom protocol that uses PureLink buffers and homemade magnetic beads. Briefly, a small piece of thorax tissue was placed in PureLink Digestion buffer and Proteinase K and incubated for 2–3 h at 58 °C with shaking (500 rpm). RNAse (DNAse free) was added together with PureLink Lysis buffer and the samples were incubated for 30 min at 58 °C with shaking (500 rpm). Afterwards, to pellet any undigested solids, the samples were spun at 4,000g for 10 min at room temperature. Following that, the DNA was extracted from the lysate using a homemade magnetic bead mix with two rounds of 80% ethanol clean-ups.
From the extracted DNA, libraries were prepared following a method based on Nextera DNA Library Prep (Illumina, Inc.) with purified Tn5 transposase (Picelli et al. 2014). PCR extension with the N701–N800 i7-index primer and the N501–N508 and N5017 i5-index primers was performed to barcode the samples. Library purification and size selection were done using the same homemade beads as above.
Whole genomes were sequenced with short reads to ∼6.5× coverage. Reads were trimmed using fastp (Chen et al. 2018), mapped to the reference genome HypMis_v2 using BWA-MEM2 (Vasimuddin et al. 2019), and sorted by coordinate using Samtools v1.9 (Danecek et al. 2021). PCR duplicates were marked using MarkDuplicatesSpark from GATK (Auwera and O’Connor 2020) and SNPs were called using bcftools v1.11 (Danecek et al. 2021) mpileup as described above. Imputation was carried out with STITCH (Davies et al. 2016) using the same settings as for H. misippus.
Read Depth Analysis and PCA
We calculated read depth from the bam files using the depth utility from samtools (Danecek et al. 2021) with the -a option to output depth for all sites, including those with no reads mapping to them. We visualized the output in R v 4.1.2 using the ggplot2 package (Wickham 2016). VCFs of phased haplotypes were used for PCA, which was performed using Plink v1.9. For local PCA, a sliding window of 25 SNPs was used.
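The local PCA procedure can be sketched as follows. This is an illustration under stated assumptions and not the Plink-based analysis used here: genotypes are a hypothetical samples × SNPs array of allele counts, PCA is computed by SVD of the mean-centered matrix (Plink applies its own standardization), and the step between windows, which is not specified above, is set equal to the window size.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """PC scores from an SVD of the mean-centered genotype matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def local_pca(genotypes, window=25, step=25):
    """Run a separate PCA in consecutive windows of `window` SNPs.

    Returns a list of (index of first SNP, scores) tuples. The step size is
    an assumption of this sketch.
    """
    n_snps = genotypes.shape[1]
    return [
        (start, pca_scores(genotypes[:, start:start + window]))
        for start in range(0, n_snps - window + 1, step)
    ]

# Toy data: 10 samples, 100 SNPs with random allele counts of 0, 1, or 2.
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(10, 100))
windows = local_pca(toy_genotypes)
print([start for start, _ in windows])
```

Comparing the first principal component across windows is what allows tracts in which samples group by allele rather than by species to be distinguished from the genome-wide pattern.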
Acknowledgments
This project would not have been possible without the help of Eunice Katili and Charo Ngumbao at Kipepeo Project in Malindi, Kenya, who were crucial in collecting samples. A.O. thanks Gergely Flamich for the numerous discussions about Wrath. A.O. thanks the Cambridge Trust and St. John's College for their support.
Contributor Information
Anna Orteu, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK; Tree of Life Programme, Wellcome Sanger Institute, Hinxton, UK.
Marek Kucka, Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany.
Ian J Gordon, Centre of Excellence in Biodiversity, University of Rwanda, Huye, Rwanda.
Ivy Ng’iru, Mpala Research Centre, Nanyuki 10400, Laikipia, Kenya; School of Biosciences, Cardiff University, Cardiff CF 10 3AX, UK; UK Centre for Ecology and Hydrology, Wallingford OX10 8BB, UK.
Eva S M van der Heijden, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK; Tree of Life Programme, Wellcome Sanger Institute, Hinxton, UK.
Gerard Talavera, Institut Botànic de Barcelona (IBB), CSIC-CMCNB, Barcelona, Catalonia, Spain.
Ian A Warren, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK.
Steve Collins, African Butterfly Research Institute, Nairobi, Kenya.
Richard H ffrench-Constant, Centre for Ecology and Conservation, University of Exeter in Cornwall, Penryn TR10 9FE, UK.
Dino J Martins, Turkana Basin Institute, Stony Brook University, Stony Brook, NY 11794, USA.
Yingguang Frank Chan, Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany.
Chris D Jiggins, Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK.
Simon H Martin, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK.
Supplementary Material
Supplementary material is available at Molecular Biology and Evolution online.
Author Contributions
A.O., C.D.J., D.J.M., R.H.f.-C., S.C., and I.J.G. conceptualized the project. A.O., M.K., I.J.G., I.N., E.S.M.H., G.T., I.A.W., S.C., and Y.F.C. collected material and/or generated data. A.O. led the project and performed analyses. A.O. and S.H.M. developed Wrath. S.H.M. and C.D.J. supervised the project. A.O. produced visualizations and wrote the manuscript. All authors read and approved the final manuscript.
Funding
This work was supported by the Natural Environment Research Council (grant number NE/L002507/1 to A.O.); the Varley-Gradwell Travelling Fellowship in Insect Ecology 2021 from the University of Oxford to A.O.; the Student Research Award from the American Society of Naturalists to A.O.; and the Ron Leuschner Memorial Fund for Research from the Lepidopterists’ Society. S.H.M. was supported by a University Research Fellowship from the Royal Society (grant number URF\R1\180682).
Data Availability
Wrath is available at https://github.com/annaorteu/wrath. Whole genome sequencing samples of H. misippus, H. bolina, and other Hypolimnas are available at ENA project accession PRJEB64669. Supporting material, including Wrath output for Heliconius erato, Heliconius melpomene, and H. misippus and the Heliconius melpomene simulation results, has been deposited in Dryad (10.5281/zenodo.8199940).
References
- Ando T, Matsuda T, Goto K, Hara K, Ito A, Hirata J, Yatomi J, Kajitani R, Okuno M, Yamaguchi K, et al. Repeated inversions within a pannier intron drive diversification of intraspecific colour patterns of ladybird beetles. Nat Commun. 2018:9(1):3843. 10.1038/s41467-018-06116-1.
- Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, et al. A global reference for human genetic variation. Nature. 2015:526(7571):68–74. 10.1038/nature15393.
- Auwera Gd, O’Connor BD. Genomics in the cloud: using Docker, GATK, and WDL in Terra. Sebastopol (CA): O’Reilly Media, Inc; 2020.
- Beldade P, Brakefield PM. The genetics and evo-devo of butterfly wing patterns. Nat Rev Genet. 2018:3(6):442–452. 10.1038/nrg818.
- Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014:30(15):2114–2120. 10.1093/bioinformatics/btu170.
- Brien MN, Orteu A, Yen EC, Galarza JA, Kirvesoja J, Pakkanen H, Wakamatsu K, Jiggins CD, Mappes J. Colour polymorphism associated with a gene duplication in male wood tiger moths. eLife. 2023:12(e80116):e80116. 10.7554/eLife.80116.
- Carvalho CMB, Lupski JR. Mechanisms underlying structural variant formation in genomic disorders. Nat Rev Genet. 2016:17(4):224–238. 10.1038/nrg.2015.25.
- Casacuberta E, González J. The impact of transposable elements in environmental adaptation. Mol Ecol. 2013:22(6):1503–1517. 10.1111/mec.12170.
- Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018:34(17):i884–i890. 10.1093/bioinformatics/bty560.
- Collin R, Cipriani R. Dollo's law and the re-evolution of shell coiling. Proc R Soc Lond Ser B: Biol Sci. 2003:270(1533):2551–2555. 10.1098/rspb.2003.2517.
- Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021:10(2):giab008. 10.1093/gigascience/giab008.
- Davey JW, Barker SL, Rastas PM, Pinharanda A, Martin SH, Durbin R, McMillan WO, Merrill RM, Jiggins CD. No evidence for maintenance of a sympatric Heliconius species barrier by chromosomal inversions. Evol Lett. 2017:1(3):138–154. 10.1002/evl3.12.
- Davey JW, Chouteau M, Barker SL, Maroja L, Baxter SW, Simpson F, Merrill RM, Joron M, Mallet J, Dasmahapatra KK, et al. Major improvements to the Heliconius melpomene genome assembly used to confirm 10 chromosome fusion events in 6 million years of butterfly evolution. G3 (Bethesda). 2016:6(3):695–708. 10.1534/g3.115.023655.
- Davies RW, Flint J, Myers S, Mott R. Rapid genotype imputation from sequence without reference panels. Nat Genet. 2016:48(8):965–969. 10.1038/ng.3594.
- Domes K, Norton RA, Maraun M, Scheu S. Reevolution of sexuality breaks Dollo's law. Proc Natl Acad Sci U S A. 2007:104(17):7139–7144. 10.1073/pnas.0700034104.
- Edge P, Bafna V, Bansal V. HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies. Genome Res. 2017:27(5):801–812. 10.1101/gr.213462.116.
- Elyanow R, Wu H-T, Raphael BJ. Identifying structural variants using linked-read sequencing data. Bioinformatics. 2018:34(2):353–360. 10.1093/bioinformatics/btx712.
- Faria R, Johannesson K, Butlin RK, Westram AM. Evolving inversions. Trends Ecol Evol. 2019:34(3):239–248. 10.1016/j.tree.2018.12.005.
- Gautier M, Yamaguchi J, Foucaud J, Loiseau A, Ausset A, Facon B, Gschloessl B, Lagnel J, Loire E, Parrinello H, et al. The genomic basis of color pattern polymorphism in the harlequin ladybird. Curr Biol. 2018:28(20):3296–3302.e7. 10.1016/j.cub.2018.08.023.
- Gordon IJ. Natural-selection for rare and mimetic color pattern combinations in wild populations of the diadem butterfly, Hypolimnas misippus L. Biol J Linnean Soc. 1987:31(1):1–23. 10.1111/j.1095-8312.1987.tb01977.x.
- Gordon IJ, Edmunds M, Edgar JA, Lawrence J, Smith DAS. Linkage disequilibrium and natural selection for mimicry in the Batesian mimic Hypolimnas misippus (L.) (Lepidoptera: Nymphalidae) in the Afrotropics. Biol J Linnean Soc. 2010:100(1):180–194. 10.1111/j.1095-8312.2010.01412.x.
- Gordon JS, Smith DAS. Genetics of the mimetic African butterfly Hypolimnas misippus: hindwing polymorphism. Heredity (Edinb). 1989:63(3):409–425. 10.1038/hdy.1989.115.
- Gordon IJ, Smith DAS. Body size and colour-pattern genetics in the polymorphic mimetic butterfly Hypolimnas misippus (L.). Heredity (Edinb). 1998:80(1):62–69. 10.1046/j.1365-2540.1998.00259.x.
- Gould SJ. Dollo on Dollo's law: irreversibility and the status of evolutionary laws. J Hist Biol. 1970:3(2):189–212. 10.1007/BF00137351.
- Grabherr MG, Russell P, Meyer M, Mauceli E, Alföldi J, Di Palma F, Lindblad-Toh K. Genome-wide synteny through highly sensitive sequence alignment: Satsuma. Bioinformatics. 2010:26(9):1145–1151. 10.1093/bioinformatics/btq102.
- Hall BK. Atavisms. Curr Biol. 2010:20(20):R871. 10.1016/j.cub.2010.07.002.
- Ho SS, Urban AE, Mills RE. Structural variation in the sequencing era. Nat Rev Genet. 2020:21(3):171–189. 10.1038/s41576-019-0180-9.
- Hoffmann AA, Rieseberg LH. Revisiting the impact of inversions in evolution: from population genetic markers to drivers of adaptive shifts and speciation? Ann Rev Ecol Evol Syst. 2008:39(1):21–42. 10.1146/annurev.ecolsys.39.110707.173532.
- Iakovishina D, Janoueix-Lerosey I, Barillot E, Regnier M, Boeva V. SV-Bay: structural variant detection in cancer genomes using a Bayesian approach with correction for GC-content and read mappability. Bioinformatics. 2016:32(7):984–992. 10.1093/bioinformatics/btv751.
- Jay P, Chouteau M, Whibley A, Bastide H, Parrinello H, Llaurens V, Joron M. Mutation load at a mimicry supergene sheds new light on the evolution of inversion polymorphisms. Nat Genet. 2021:53(3):288–293. 10.1038/s41588-020-00771-1.
- Jiggins CD. The ecology and evolution of Heliconius butterflies. Oxford (UK): Oxford University Press; 2017.
- Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR, Whibley A, Becuwe M, Baxter SW, Ferguson L, et al. Chromosomal rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature. 2011:477(7363):203–206. 10.1038/nature10341.
- Kim K-W, De-Kayne R, Gordon IJ, Omufwoko KS, Martins DJ, ffrench-Constant R, Martin SH. Stepwise evolution of a butterfly supergene via duplication and inversion. Phil Trans R Soc B: Biol Sci. 2022:377(1856):20210207. 10.1098/rstb.2021.0207.
- Kirkpatrick M, Barton N. Chromosome inversions, local adaptation and speciation. Genetics. 2006:173(1):419–434. 10.1534/genetics.105.047985.
- Kondrashov FA. Gene duplication as a mechanism of genomic adaptation to a changing environment. Proc R Soc B: Biol Sci. 2012:279(1749):5048–5057. 10.1098/rspb.2012.1108.
- Kumar S, Suleski M, Craig JM, Kasprowicz AE, Sanderford M, Li M, Stecher G, Hedges SB. TimeTree 5: an expanded resource for species divergence times. Mol Biol Evol. 2022:39(8):msac174. 10.1093/molbev/msac174.
- Küpper C, Stocks M, Risse JE, Dos Remedios N, Farrell LL, McRae SB, Morgan TC, Karlionova N, Pinchuk P, Verkuil YI, et al. A supergene determines highly divergent male reproductive morphs in the ruff. Nat Genet. 2015:48(1):79–83. 10.1038/ng.3443.
- Lamichhaney S, Fan G, Widemo F, Gunnarsson U, Thalmann DS, Hoeppner MP, Kerje S, Gustafson U, Shi C, Zhang H, et al. Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax). Nat Genet. 2016:48(1):84–88. 10.1038/ng.3430.
- Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: a probabilistic framework for structural variant discovery. Genome Biol. 2014:15(6):R84. 10.1186/gb-2014-15-6-r84.
- Lewis JJ, Geltman RC, Pollak PC, Rondem KE, Van Belleghem SM, Hubisz MJ, Munn PR, Zhang L, Benson C, Mazo-Vargas A, et al. Parallel evolution of ancient, pleiotropic enhancers underlies butterfly wing pattern mimicry. Proc Natl Acad Sci U S A. 2019:116(48):24174–24183. 10.1073/pnas.1907068116.
- Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997 [q-bio]. http://arxiv.org/abs/1303.3997
- Li J, Cocker JM, Wright J, Webster MA, McMullan M, Dyer S, Swarbreck D, Caccamo M, Oosterhout Cv, Gilmartin PM. Genetic architecture and evolution of the S locus supergene in Primula vulgaris. Nat Plants. 2016:2(12):16188. 10.1038/nplants.2016.188.
- Liu W, Smith DAS, Raina G, Stanforth R, Ng'Iru I, Ireri P, Martins DJ, Gordon IJ, Martin SH. Global biogeography of warning coloration in the butterfly Danaus chrysippus. Biol Lett. 2022:18(6):20210639. 10.1098/rsbl.2021.0639.
- Livraghi L, Hanly JJ, Van Belleghem SM, Montejo-Kovacevich G, van der Heijden ES, Loh LS, Ren A, Warren IA, Lewis JJ, Concha C, et al. Cortex cis-regulatory switches establish scale colour identity and pattern diversity in Heliconius. eLife. 2021:10(e68549):e68549. 10.7554/eLife.68549.
- Lou RN, Jacobs A, Wilder AP, Therkildsen NO. A beginner's guide to low-coverage whole genome sequencing for population genomics. Mol Ecol. 2021:30(23):5966–5993. 10.1111/mec.16077.
- Mahmoud M, Gobet N, Cruz-Dávalos DI, Mounier N, Dessimoz C, Sedlazeck FJ. Structural variant calling: the long and the short of it. Genome Biol. 2019:20(1):246. 10.1186/s13059-019-1828-7.
- Marks P, Garcia S, Barrio AM, Belhocine K, Bernate J, Bharadwaj R, Bjornson K, Catalanotti C, Delaney J, Fehr A, et al. Resolving the full spectrum of human genome variation using linked-reads. Genome Res. 2019:29(4):635–645. 10.1101/gr.234443.118.
- Martin A, Reed RD. Wnt signaling underlies evolution and development of the butterfly wing pattern symmetry systems. Dev Biol. 2014:395(2):367–378. 10.1016/j.ydbio.2014.08.031.
- Meier JI, Salazar PA, Kučka M, Davies RW, Dréau A, Aldás I, Box Power O, Nadeau NJ, Bridle JR, Rolian C, et al. Haplotype tagging reveals parallel formation of hybrid races in two butterfly species. Proc Natl Acad Sci U S A. 2021:118(25):e2015005118. 10.1073/pnas.2015005118.
- Moore R. 2023. Auckland Museum Entomology Collection. https://www.gbif.org/dataset/1671ddfb-1f8a-4d24-ab4d-50adb89af001
- Morisse P, Legeai F, Lemaitre C. 2021. LEVIATHAN: efficient discovery of large structural variants by leveraging long-range information from linked-reads data. biorxiv 437002. https://www.biorxiv.org/content/10.1101/2021.03.25.437002v1, 25 March 2021, preprint: not peer reviewed.
- Nadeau NJ, Pardo-Diaz C, Whibley A, Supple MA, Saenko SV, Wallbank RWR, Wu GC, Maroja L, Ferguson L, Hanly JJ, et al. The gene cortex controls mimicry and crypsis in butterflies and moths. Nature. 2016:534(7605):106–110. 10.1038/nature17961.
- Nadeau NJ, Ruiz M, Salazar P, Counterman B, Medina JA, Ortiz-Zuazaga H, Morrison A, McMillan WO, Jiggins CD, Papa R. Population genomics of parallel hybrid zones in the mimetic butterflies, H. melpomene and H. erato. Genome Res. 2014:24(8):1316–1333. 10.1101/gr.169292.113.
- Niu X-M, Xu Y-C, Li Z-W, Bian Y-T, Hou X-H, Chen J-F, Zou Y-P, Jiang J, Wu Q, Ge S, et al. Transposable elements drive rapid phenotypic variation in Capsella rubella. Proc Natl Acad Sci U S A. 2019:116(14):6908–6913. 10.1073/pnas.1811498116.
- Orteu A, Jiggins CD. The genomics of coloration provides insights into adaptive evolution. Nat Rev Genet. 2020:21(8):461–475. 10.1038/s41576-020-0234-z.
- Orteu A, McCarthy SA, Hornett EA, Gemmel M, Reynolds LA, Warren IA, Gordon IJ, Hurst GDD, Durbin R, Martin SH, et al. 2023. The Hypolimnas misippus genome supports a common origin of the W chromosome in Lepidoptera. biorxiv 533969. https://www.biorxiv.org/content/10.1101/2023.03.24.533969v1, 24 March 2023, preprint: not peer reviewed.
- Payer LM, Steranka JP, Yang WR, Kryatova M, Medabalimi S, Ardeljan D, Liu C, Boeke JD, Avramopoulos D, Burns KH. Structural variants caused by Alu insertions are associated with risks for many human diseases. Proc Natl Acad Sci U S A. 2017:114(20):E3984–E3992. 10.1073/pnas.1704117114.
- Picelli S, Björklund ÅK, Reinius B, Sagasser S, Winberg G, Sandberg R. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 2014:24(12):2033–2040. 10.1101/gr.177881.114.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007:81(3):559–575. 10.1086/519795.
- Rausch T, Zichner T, Schlattl A, Stütz AM, Benes V, Korbel JO. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012:28(18):i333–i339. 10.1093/bioinformatics/bts378.
- Reed RD, Papa R, Martin A, Hines HM, Kronforst MR, Chen R, Halder G, Nijhout HF, Mcmillan WO. Optix drives the repeated. Science. 2011:333(6046):1137–1141. 10.1126/science.1208227.
- Sahoo RK, Lohman DJ, Wahlberg N, Müller CJ, Brattström O, Collins SC, Peggie D, Aduse-Poku K, Kodandaramaiah U. Evolution of Hypolimnas butterflies (Nymphalidae): out-of-Africa origin and Wolbachia-mediated introgression. Mol Phylogenet Evol. 2018:123:50–58. 10.1016/j.ympev.2018.02.001.
- Saleh Ziabari O, Li B, Hardy NB, Brisson JA. Aphid male wing polymorphisms are transient and have evolved repeatedly. Evolution. 2023:77(4):1056–1065. 10.1093/evolut/qpad024.
- Santos ME, Braasch I, Boileau N, Meyer BS, Sauteur L, Böhne A, Belting H-G, Affolter M, Salzburger W. The evolution of cichlid fish egg-spots is linked with a cis-regulatory change. Nat Commun. 2014:5(1):5149. 10.1038/ncomms6149.
- Sedlazeck FJ, Rescheneder P, Smolka M, Fang H, Nattestad M, von Haeseler A, Schatz MC. Accurate detection of complex structural variations using single-molecule sequencing. Nat Methods. 2018:15(6):461–468. 10.1038/s41592-018-0001-7.
- Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, Pertz LM, Clark RA, Schwartz S, Segraves R, et al. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet. 2005:77(1):78–88. 10.1086/431652.
- Sindi SS, Önal S, Peng LC, Wu H-T, Raphael BJ. An integrative probabilistic model for identification of structural variation in sequencing data. Genome Biol. 2012:13(3):R22. 10.1186/gb-2012-13-3-r22.
- Smith DAS. Batesian mimicry between Danaus chrysippus and Hypolimnas misippus (Lepidoptera) in Tanzania. Nature. 1973:242(5393):129–131. 10.1038/242129a0.
- Smith DAS, Gordon JS. The genetics of the butterfly Hypolimnas misippus (L.): the classification of phenotypes and the inheritance of forms misippus and inaria. Heredity (Edinb). 1987:59(3):467–475. 10.1038/hdy.1987.157.
- Spies N, Weng Z, Bishara A, McDaniel J, Catoe D, Zook JM, Salit M, West RB, Batzoglou S, Sidow A. Genome-wide reconstruction of complex structural variants using read clouds. Nat Methods. 2017:14(9):915–920. 10.1038/nmeth.4366.
- Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Hsi-Yang Fritz M, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015:526(7571):75–81. 10.1038/nature15394.
- Thomas F, Ujvari B, Renaud F, Vincent M. Cancer adaptations: Atavism, de novo selection, or something in between? BioEssays. 2017:39(8):1700039. 10.1002/bies.201700039.
- Thompson MJ, Jiggins CD. Supergenes and their role in evolution. Heredity (Edinb). 2014:113(1):1–8. 10.1038/hdy.2014.20.
- Vane-Wright RI, Ackery PR, Smiles RL. The polymorphism, mimicry, and host plant relationships of Hypolimnas butterflies. Biol J Linnean Soc. 1977:9(3):285–297. 10.1111/j.1095-8312.1977.tb00271.x.
- VanKuren NW, Massardo D, Nallu S, Kronforst MR. Butterfly mimicry polymorphisms highlight phylogenetic limits of gene reuse in the evolution of diverse adaptations. Mol Biol Evol. 2019:36(12):2842–2853. 10.1093/molbev/msz194.
- van’t Hof AE, Campagne P, Rigden DJ, Yung CJ, Lingley J, Quail MA, Hall N, Darby AC, Saccheri IJ. The industrial melanism mutation in British peppered moths is a transposable element. Nature. 2016:534(7605):102–105. 10.1038/nature17951.
- Vasimuddin Md, Misra S, Li H, Aluru S. 2019. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. In: 2019 IEEE international parallel and distributed processing symposium (IPDPS). p. 314–324.
- Wallbank RWR, Baxter SW, Pardo-Diaz C, Hanly JJ, Martin SH, Mallet J, Dasmahapatra KK, Salazar C, Joron M, Nadeau N, et al. Evolutionary novelty in a butterfly wing pattern through enhancer shuffling. PLoS Biol. 2016:14(1):e1002353. 10.1371/journal.pbio.1002353.
- Weischenfeldt J, Symmons O, Spitz F, Korbel JO. Phenotypic impact of genomic structural variation: insights from and for human disease. Nat Rev Genet. 2013:14(2):125–138. 10.1038/nrg3373.
- Wellenreuther M, Mérot C, Berdan E, Bernatchez L. Going beyond SNPs: the role of structural genomic variants in adaptive evolution and species diversification. Mol Ecol. 2019:28(6):1203–1209. 10.1111/mec.15066.
- Westerman EL, VanKuren NW, Massardo D, Tenger-Trolander A, Zhang W, Hill RI, Perry M, Bayala E, Barr K, Chamberlain N, et al. Aristaless controls butterfly wing color variation used in mimicry and mate choice. Curr Biol. 2018:28(21):3469–3474.e4. 10.1016/j.cub.2018.08.051.
- Whiting MF, Bradler S, Maxwell T. Loss and recovery of wings in stick insects. Nature. 2003:421(6920):264–267. 10.1038/nature01313.
- Wickham H. ggplot2: elegant graphics for data analysis. 2nd ed. Cham: Springer International Publishing; 2016.
- Zeevi D, Korem T, Godneva A, Bar N, Kurilshikov A, Lotan-Pompan M, Weinberger A, Fu J, Wijmenga C, Zhernakova A, et al. Structural variation in the gut microbiome associates with host health. Nature. 2019:568(7750):43–48. 10.1038/s41586-019-1065-y.