BMC Biology. 2025 Jul 1;23:179. doi: 10.1186/s12915-025-02282-z

An ancient influenza genome from Switzerland allows deeper insights into host adaptation during the 1918 flu pandemic in Europe

Christian Urban 1,2, Bram Vrancken 3,4, Livia V Patrono 5, Ariane Düx 5, Mathilde Le Vu 1, Katarina L Matthes 1, Nina Maria Burkhard-Koren 6, Navena Widulin 7, Thomas Schnalke 7, Sabina Carraro 1, Frank Rühli 1, Philippe Lemey 4, Kaspar Staub 1, Sébastien Calvignac-Spencer 5, Verena J Schuenemann 1,2,8,9
PMCID: PMC12211374  PMID: 40597331

Abstract

Background

From 1918 to 1920, the largest influenza A virus (IAV) pandemic known to date spread globally, causing between 20 and 100 million deaths. Historical records have captured critical aspects of the disease dynamics, such as the occurrence and severity of the pandemic waves. Yet, other important pieces of information, such as the mutations that allowed the virus to adapt to its new host, can only be obtained from IAV genomes. The analysis of specimens collected during the pandemic and still preserved in historical pathology collections can significantly contribute to a better understanding of its course. However, efficient RNA processing protocols are required to work with such specimens.

Results

Here, we describe an alternative protocol for efficient ancient RNA sequencing and evaluate its performance on historical samples, including a published positive control. The phenol/chloroform-free protocol efficiently recovers ancient viral RNA, especially small fragments, and maintains information about RNA fragment directionality through incorporating fragments by a ligation-based approach. One of the assessed historical samples allowed for the recovery of the first 1918 IAV genome from Switzerland. This genome, derived from a patient deceased during the beginning of the first pandemic wave in Switzerland, already harbours mutations linked to human adaptation.

Conclusion

We introduce an alternative, efficient workflow for ancient RNA recovery from formalin-fixed wet specimens. We also present the first precisely dated and complete influenza genome from Europe, highlighting the early occurrence of mutations associated with adaptation to humans during the first European wave of the 1918 pandemic.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12915-025-02282-z.

Keywords: Ancient RNA, Influenza A virus, 1918 flu pandemic, Ancient RNA method comparison

Background

A better understanding of historical pandemics can provide relevant information for the prevention of future pandemics [1]. For example, reconstructing the evolution of pandemic pathogens can unveil their animal reservoir, if they are zoonotic, and the mutations associated with host switches [2–5].

The 1918–1920 influenza pandemic was the most devastating of the twentieth century, resulting in an estimated 20 to 100 million deaths, including a disproportionate number of young people [6, 7]. While it is probably the best-researched pandemic [8, 9], there are still important unanswered questions. For example, the evolution of its causative agent, the influenza A virus (IAV), is often assumed to have played a significant role in its multi-wave nature [10–12], but the relative paucity of 1918 IAV genomes has largely prevented substantiating or dismissing this notion. This is in part because valuable information contained in historical epidemiological data and human wet specimens from medical-historical collections has largely remained locked in archives. Furthermore, truly interdisciplinary studies, for example between biologists and historical epidemiologists, are still rare [13].

Because IAV isolation only became possible in the 1930s, genomic information on 1918 IAV started accumulating when historical biological material such as permafrost remains and formalin-fixed paraffin-embedded (FFPE) pathology samples were first investigated in the 1990s–2000s [14–19]. These PCR-based efforts culminated with the sequencing of a first complete genome in 2005 [20]. Since then, the use of high-throughput sequencing methods seems to have facilitated the use of fixed biological material, which has been used to generate an additional five high-coverage 1918 IAV genomes [12, 21, 22]. However, such studies remain rare as many researchers consider the detection of viral RNA fragments from such specimens to represent a significant challenge.

Indeed, unprotected phosphodiester bonds in RNA are estimated to be 200 times less stable than in DNA [23]. Combined with the action of ubiquitously present RNases, this can lead to rapid degradation of unprotected RNA [24]. Formalin fixation stops the action of RNases and generally halts post-mortem decay processes, but it also has many negative effects on RNA quality. The formalin solution can cause deamination [25], crosslinking between proteins and nucleic acids [26, 27], as well as fragmentation [28]. The preparation of samples as FFPE blocks, which adds steps of tissue dehydration and paraffin-embedding, still results in measurable degradation of nucleic acids in a matter of years [29]. Recent studies have already demonstrated the successful extraction of unexpectedly long viral RNA fragments from century-old formalin-fixed wet specimen samples, hinting at a potential long-term RNA preservation mechanism by the viral capsid [12, 30]. Diversifying and improving extraction and library preparation methods will likely be essential to unlock information from the many thousands of specimens in medical collections [31].

Here, we present a new user-friendly ancient RNA (aRNA) workflow, which yielded reliable results from formalin-fixed wet specimen samples linked to the 1918 influenza pandemic. We were able to reconstruct a new well-covered (63.9 × average coverage) genome from Zurich, Switzerland, a country from which genomic information on this pandemic was so far missing. This genome, dating to the early first wave in Switzerland in the summer of 1918, already showed many genomic signs of adaptation to the human host.

Results

Ligation-based workflow assessment—overview

Although ancient RNA research is a rather young field, a few workflows for RNA extraction from various tissue types and for library preparation are already available [30, 32, 33]. However, each of those workflows also comes with different restrictions. When designing our new ancient RNA processing workflow, we aimed to create a workflow that is compatible with formalin-fixed wet specimen samples from medical collections and that efficiently recovers ancient RNA fragments in a broad range of sizes. Moreover, the RNA extraction should be phenol- and chloroform-free to enable safe and easy use in a clean room environment without a fume hood. Lastly, we opted for a ligation-based conversion protocol to ensure complete fragment incorporation in libraries while maintaining information about RNA fragment directionality.

A brief summary of the developed ligation-based workflow in comparison to another phenol- and chloroform-free (further referred to as “R6” [30]) workflow suitable for formalin-fixed wet specimen samples from medical collections is represented in Fig. 1. In short, the developed ligation-based workflow uses a short high temperature decrosslinking step, followed by extracting RNA fragments (including small RNA fragments), DNase treatment to remove genomic DNA, polynucleotide kinase treatment (PNK) for RNA end repair, and complete fragment incorporation in a library while maintaining strand orientation information by the NEBNext Small RNA Library Prep Set for Illumina. In contrast, the previous R6 Protocol [30] uses a similar high temperature decrosslinking step, followed by RNA extraction (potentially losing smaller fragments), DNase treatment to remove genomic DNA, rRNA depletion (lowering rRNA content but potentially also losing fragments of interest in the cleanup steps), random-hexamer-based cDNA synthesis, and DNA library preparation (losing information about RNA orientation).

Fig. 1.

Comparison between two phenol- and chloroform-free workflows. The new ligation-based workflow is on the right and the R6 workflow [12] on the left

In order to assess the new protocol, we compared its performance to the previously published phenol- and chloroform-free workflow using a random hexamer-based cDNA synthesis [30] (Fig. 1). As a positive control, we used BE-572 [12], which worked very well with the previously published, phenol- and chloroform-free workflow. A negative control was included for each processing batch to assess potential background. For a direct comparison, we cut each sample and the positive control individually into small pieces and then ground them to a mash. Equal amounts of mash were then used for each of the six different samples and the positive control, BE-572 [12], to run both workflows (Fig. 1) in independent triplicates for comparison. These samples consist of three lung tissue samples (time of autopsy ranges between 1918 and 1946) and three liver tissue samples (time of autopsy ranges between 1912 and 1947) from the Medical Collection of formalin-fixed wet specimens at the IEM, UZH. Of particular interest is the specimen ZH1502 dating to 15 July 1918, from which we recovered fragments positive for influenza A virus in our analyses (see the Methods section for a detailed description of ZH1502 and Fig. 2B for a picture of the sample).

Fig. 2.

Results for sample ZH1502 mapping against the influenza A reference. A The average coverage of the reference for both RNA workflows. B Picture of the collection specimen from which sample ZH1502 was taken. C Length distribution of mapped reads from ZH1502 against MU-162 reference [12]. D Mapping mismatches over the first 25 nt of read 1 from ZH1502 against MU-162 reference [12]

The first comparison of the two workflows was made immediately after decrosslinking and RNA extraction, by Qubit quantification of extracted total RNA. We would like to caution that these numbers only serve as rough indicators, since ssDNA from the high-heat decrosslinking step could still be present after the RNA extraction and be detected in the Qubit HS RNA assay. The RNA measurements show no clear tendency towards either of the two extraction methods (Fig. S1): one method could result in slightly higher yields for one sample, whereas the opposite is true for another sample. Most samples were close to the lower end of the detection range of the Qubit HS RNA assay, indicating overall low amounts of extracted input material for library preparation.

Ligation-based workflow assessment—pathogen RNA detection

From here on, we discuss the comparison of the sequencing data for both workflows. To equalise the read numbers across all samples and replicates, only the first 2.5 million reads of R1 were used for the direct comparison. From the previous publication of the positive control, it is known that the sample can display a high content of IAV RNA fragments, potentially leading to a good reference coverage even without enrichment [12]. Therefore, we first examined the IAV fragments in the samples by aligning them against the reference of the de novo reconstruction based on the MU-162 sample [12]. For the positive control, the average coverage with the ligation protocol was slightly higher than with the R6 workflow, but not significantly so (ligation-based: 4.3 × coverage; R6: 3.6 × coverage; Fig. 2A). All negative controls yielded no hits, for all replicates and protocols. Additionally, sample ZH1502 also showed very good coverage (ligation-based: 12.4 ×; R6: 2.3 ×; Fig. 2A).
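
The read-number equalisation described above (keeping only the first 2.5 million R1 reads) can be sketched in Python. This is a minimal illustration rather than the pipeline used in this study: the function name `first_n_reads` is hypothetical, and the sketch assumes unwrapped FASTQ input in which every record spans exactly four lines.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator, qualities.
FastqRecord = Tuple[str, ...]


def first_n_reads(lines: Iterable[str], n_reads: int) -> Iterator[FastqRecord]:
    """Yield the first n_reads records of a four-line-per-record FASTQ stream."""
    stripped = (line.rstrip("\n") for line in lines)
    # Each record spans four lines, so cap the stream at 4 * n_reads lines.
    capped = islice(stripped, 4 * n_reads)
    while True:
        record = tuple(islice(capped, 4))
        if len(record) < 4:  # end of input, or a truncated trailing record
            return
        yield record
```

With an open file handle `handle` (hypothetical), `first_n_reads(handle, 2_500_000)` would stream the subset without loading the whole file into memory.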

Next, we investigated the distribution of read lengths mapping to the MU-162 reference. For the positive control and ZH1502, less than 5% of reads were 72 nt or shorter with the R6 workflow, whereas more than 60% of reads were 72 nt or shorter with the ligation-based protocol, for all replicates. Exact distributions for the mapped read lengths are shown in Fig. 2C and Additional file 1: Fig. S2B. We were therefore able to increase the number of recovered short fragments without decreasing the reference coverage. This could be especially helpful for strongly degraded samples with a high number of short RNA fragments. We further assessed the recovered fragment lengths of the two protocols for the IAV-positive samples by re-sequencing with longer read lengths (NextSeq2000, 300PE). The results indicate longer average fragment lengths for the R6 protocol (Additional file 1: Figs. S5, S6). The strand orientation information retained by the ligation protocol showed that negative-strand fragments tended to be longer (Additional file 1: Fig. S6). This cannot be assessed with the R6 protocol because strand orientation information is lost (Additional file 1: Fig. S5).
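
The short-read proportion reported above is a simple ratio over the lengths of the mapped reads. The following sketch shows the calculation under the assumption that the mapped read lengths have already been extracted into a list; the function name and the example values in the usage note are illustrative only.

```python
from typing import Sequence


def fraction_at_or_below(read_lengths: Sequence[int], threshold: int = 72) -> float:
    """Return the fraction of mapped reads whose length is <= threshold (in nt)."""
    if not read_lengths:
        raise ValueError("no mapped reads supplied")
    short = sum(1 for length in read_lengths if length <= threshold)
    return short / len(read_lengths)
```

For example, `fraction_at_or_below([30, 72, 73, 150])` counts two of four reads as short.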

Another advantage of the ligation-based protocol is that the adapters are ligated directly to the ssRNA fragment. This should allow the complete incorporation of the whole ssRNA fragment in a library without causing mismatches or fragment shortening by random hexamer priming. We briefly checked for cytosine to thymine (C → T) mismatches, as previously reported for RNA fragments [33], and for all other mismatches, potentially caused by mispriming [34]. Indeed, we see a high number of general mismatches towards the beginning of read 1 for the R6 workflow (positive control: 33.7%, Additional file 1: Fig. S2A; ZH1502: 28.7%, Fig. 2D), which is not present for the ligation-based workflow (positive control: 6.5%, Additional file 1: Fig. S2A; ZH1502: 6.0%, Fig. 2D). The previously described C → T mismatch pattern distribution is only visible for the ligation-based workflow (positive control first base: 13.2%, then diminishing, Additional file 1: Fig. S2A; ZH1502 first base: 15.2%, then diminishing, Fig. 2D) and not for the R6 workflow (positive control first base: 0.6%, not diminishing, Additional file 1: Fig. S2A; ZH1502 first base: 1.1%, not diminishing, Fig. 2D).
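
A per-position mismatch profile of the kind summarised above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code behind Fig. 2D: it assumes ungapped read/reference pairs given in read orientation, and it normalises each rate by the number of reads covering the position, which may differ from the normalisation used for the figures. The function name `mismatch_profile` is hypothetical.

```python
from typing import Iterable, List, Tuple


def mismatch_profile(
    pairs: Iterable[Tuple[str, str]], n_positions: int = 25
) -> List[Tuple[float, float]]:
    """Return (C>T rate, other-mismatch rate) for the first n_positions read positions.

    Each pair is (read, reference) for an ungapped alignment in read orientation.
    """
    totals = [0] * n_positions
    c_to_t = [0] * n_positions
    other = [0] * n_positions
    for read, ref in pairs:
        for i, (read_base, ref_base) in enumerate(zip(read[:n_positions], ref[:n_positions])):
            totals[i] += 1
            if read_base == ref_base:
                continue
            if ref_base == "C" and read_base == "T":
                c_to_t[i] += 1  # deamination-like signal
            else:
                other[i] += 1  # e.g. mispriming or sequencing error
    return [
        (c_to_t[i] / totals[i], other[i] / totals[i]) if totals[i] else (0.0, 0.0)
        for i in range(n_positions)
    ]
```

Keeping C → T separate from all other mismatches mirrors the distinction drawn in the text between deamination and mispriming.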

Ligation-based workflow assessment—host RNA detection

A previous study suggested that host RNA can be more fragmented than viral RNA of the same sample [12]. Given the better recovery rate of smaller fragments with the ligation-based workflow, we assessed the human RNA content of all six samples and the positive control. As a first step, the rRNA content for both workflows was assessed. The ligation-based workflow has no rRNA depletion step, to minimise sample loss during the multiple incubation steps for the rRNA depletion and the additional cleanup. Therefore, it is not surprising to see that some of the samples processed with this workflow have a high rRNA content. The positive control has an average of 24.4% rRNA among all reads mapping to the human reference (GRCh38.p13), and ZH1502 has 40.2%. All other samples have less than 1% rRNA content for all replicates (Fig. 3A). The R6 workflow uses an rRNA depletion step and shows a lower rRNA proportion for the positive control and ZH1502 (< 2% for all replicates, Fig. 3A). Ribosomal RNA is the most abundant RNA in a cell. Therefore, if not depleted, it might serve as an indication of general host RNA preservation. As a next step, the actual coverage of known isoform references was assessed for all samples by mapping against the human reference. The results for the ligation-based workflow reflect the results from the rRNA content. Only the positive control and ZH1502 show visible isoform coverage (Fig. 3B). For both samples, most isoforms have low coverage (< 10% covered) and only a tiny number show ≥ 10% coverage (Fig. 3B). Isoform fraction coverage for all other samples and the negative control is negligible. For the R6-based workflow, the number of covered isoform references is comparable to the negative controls (Fig. 3B). Absolute numbers of reads falling in exons, introns, rRNA, lncRNA, and unannotated regions were assessed for both protocols (Fig. S7). Both the ligation-based workflow and the R6 workflow show a background of lncRNA reads mapping to unannotated regions in all samples and the negative control. The proportion of reads mapping to unannotated regions is especially high for the triplicates of the positive control of the ligation-based workflow (36.7%, 36.1%, 36.1%).

Fig. 3.

Results of mapping against human transcriptome reference. A Fraction of reads mapping to rRNA. B Number of isoform references with < 10% and ≥ 10% partial coverage

Reconstruction of a new 1918 IAV genome

After establishing the workflow, the influenza virus positive sample ZH1502 was subjected to a deeper analysis. For this, all reads from the R6 workflow and the ligation-based workflow were combined and trimmed by 1 nt on both sides to reduce mismatches caused by either mispriming (R6) or C → T mismatches (ligation-based). Reads were then aligned to the MU-162 [12] reference, resulting in an overall average coverage of 63.9 ×, with 98.8% of the reference genome covered at least 1 × and 96.7% covered at least 5 × (Fig. 4A, Additional file 2: Table S1). In contrast to MU-162 [12], a detailed dating and contextualization within the 1918–1920 pandemic was possible for this newly recovered genome thanks to its preserved autopsy report from the Pathological Institute of the former Cantonal Hospital Zurich (nowadays University Hospital Zurich). Based on this information, the specimen ZH1502 could be associated with a deceased patient of 18 years whose autopsy was performed on 15 July 1918, at the onset of the first wave of the “Spanish flu” pandemic in Switzerland [35, 36].
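
The two quantities used above, end trimming by 1 nt and the coverage summary (average depth plus the fraction of reference positions covered at least 1 × and 5 ×), can be sketched as follows. This is a minimal illustration, assuming that per-position depths have already been computed by an alignment tool; the function names `trim_ends` and `coverage_summary` are hypothetical and do not correspond to the software used in this study.

```python
from typing import Dict, Sequence


def trim_ends(sequence: str, n: int = 1) -> str:
    """Remove n bases from both ends of a read; reads too short to trim become empty."""
    if len(sequence) <= 2 * n:
        return ""
    return sequence[n:len(sequence) - n]


def coverage_summary(
    depths: Sequence[int], thresholds: Sequence[int] = (1, 5)
) -> Dict[str, float]:
    """Summarise per-position depths as mean depth and breadth at each threshold."""
    if not depths:
        raise ValueError("no reference positions supplied")
    summary = {"mean_depth": sum(depths) / len(depths)}
    for threshold in thresholds:
        covered = sum(1 for depth in depths if depth >= threshold)
        summary[f"breadth_{threshold}x"] = covered / len(depths)
    return summary
```

In practice the quality string of each read would need the same trimming as its sequence, so that both stay the same length.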

Fig. 4.

A Segment-wise coverage plots of ZH1502 mapping against the MU-162 reference [12]. Coloured bars indicate SNPs. Synonymous SNPs are shown in orange and nonsynonymous SNPs in green. The circle at the top of the bar indicates actual SNP coverage and might therefore slightly vary from the average coverage of the region. For more detailed SNP information, see also Table 1. The abbreviations used in the figure are as follows: Segment 1: Polymerase Basic 2 (PB2); Segment 2: Polymerase Basic 1 (PB1); Segment 3: Polymerase Acidic (PA); Segment 4: Hemagglutinin (HA); Segment 5: Nucleoprotein (NP); Segment 6: Neuraminidase (NA); Segment 7: Matrix Proteins (MP1 and MP2); Segment 8: Non-Structural Proteins (NS1 and NS2). B Newly reported influenza-like cases in Zurich city and Zurich Canton from 1918 to 1925 (mandatory reporting introduced on 25 July 1918 [41]). “A” marks the wave from which sample ZH1502 originates. For a higher temporal resolution, see also Additional file 1: Fig. S1

The 1918–1920 flu pandemic in Switzerland

In Switzerland, the 1918–1920 pandemic occurred in three to four waves, causing at least 2 million infections (at least two-thirds of the population) and about 25,000 deaths (0.67% of the population) between summer 1918 and spring 1920 [37]. As elsewhere, the death toll included a disproportionate number of young people and especially men. While the second wave in the autumn and winter of 1918 was the strongest wave anywhere in the country, eastern Switzerland was hit less severely than the rest of Switzerland by the summer wave in July and August 1918, as measured by the mortality figures [37–41].

In the canton and city of Zurich, the pandemic probably began in early July 1918, when there were various reports from factories of outbreaks of the disease [42]. Even though a few autopsies at the Cantonal Hospital Zurich in June had already listed pneumonia as the main cause of death, influenza pneumonia was first given as the explicit cause in the autopsy reports on 9 July 1918 [36]. The first wave probably emerged in the first weeks of July 1918, but this cannot be traced precisely based on incidence data, as the cantonal influenza reporting obligation for physicians was not introduced until 25 July 1918. Overall, 2370 deaths from flu were reported in the canton of Zurich in 1918 (4.5 deaths per 1000 inhabitants). In the city of Zurich, 920 deaths from flu were reported in 1918 [43].

The weekly course of the pandemic in Zurich is shown in Additional file 1: Fig. S3 [41]. There was clearly a summer wave in Zurich from early July 1918, as can be seen from the increased incidence of both disease cases (only reliable from the mandatory reporting date of 25 July 1918) and hospitalisations. This first wave was apparently not as deadly as in western Switzerland, as all-cause mortality incidence only moderately increased. The date of the autopsy of the 18-year-old man under consideration here (indicated by the black vertical line in Additional file 1: Fig. S3, or with “A” in Fig. 4B) therefore falls in an early phase of the pandemic in Zurich, when the first wave was on the rise and before the cantonal authorities reacted with measures. The first wave appeared to level off in August and September 1918 after the authorities took measures. After a rather calm September, the fatal and powerful autumn and winter wave emerged. A small third wave can then be seen in the first few months of 1919, followed by a very pronounced later wave in February 1920, which again had a large impact on mortality.

IAV single nucleotide polymorphisms (SNPs) and phylogenetic analysis

ZH1502 differs from the best-quality European genome it was mapped to (MU-162 [12]) at 35 positions, of which 14 result in amino acid changes (Table 1 and Fig. 4A). SNPs are not distributed evenly amongst the IAV genome segments. Segment 1 (Polymerase Basic 2 (PB2)) has the highest number of ZH1502/MU-162 SNPs (n = 12, 34.3% of all SNPs; Table 2), and the same is observed when only considering non-synonymous SNPs (n = 5, 35.7% of all non-synonymous SNPs; Table 2). Importantly, this is not a mere reflection of segment length, as SNP density was also the highest for PB2. At the opposite end of the spectrum, the ZH1502 and MU-162 segment 4 sequences (hemagglutinin (HA)) are identical.

Table 1.

SNP positions and coverage. A detailed list containing the SNP position, percentage of SNP occurrence, coverage, and amino acid change, if applicable, in respect to the MU-162 reference [12]. Segment 4, coding for hemagglutinin (HA), did not contain any SNPs

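| Gene | Nucleotide change (position of start codon as 1) | SNP coverage (x fold) | Percentage of SNP | Amino acid change |
|---|---|---|---|---|
| MP1 | 123T>C | 52 | 65% | A41= |
| MP1 | 459G>A | 31 | 90% | Q153= |
| MP1 | 693T>C | 67 | 93% | D231= |
| MP2 | 161T>G | 78 | 88% | L54R |
| NA | 393C>T | 12 | 75% | T108= |
| NA | 435G>A | 6 | 100% | S122= |
| NA | 588C>A | 24 | 67% | S173= |
| NA | 774C>A | 4 | 100% | I235= |
| NA | 791C>T | 7 | 57% | T241I |
| NA | 1036G>A | 16 | 81% | V323I |
| NA | 1386A>G | 24 | 100% | E439= |
| NP | 181T>A | 168 | 100% | L61I |
| NP | 864G>A | 73 | 100% | G288= |
| NP | 1221T>C | 56 | 98% | S407= |
| NS1 | 45T>C | 32 | 94% | L15= |
| NS1 | 64G>T | 18 | 100% | V22F |
| PA | 531T>C | 23 | 100% | T177= |
| PA | 1009T>G | 24 | 94% | S337A |
| PA | 1203A>G | 75 | 99% | R401= |
| PA | 1601C>T | 36 | 67% | P534L |
| PA | 1709T>C | 13 | 100% | I570T |
| PB1 | 2081G>A | 34 | 97% | S694N |
| PB1 | 2202A>G | 111 | 100% | R734= |
| PB2 | 12G>A | 79 | 100% | M4I |
| PB2 | 213T>C | 170 | 59% | N71= |
| PB2 | 239A>G | 158 | 56% | K80R |
| PB2 | 246T>C | 156 | 95% | N82= |
| PB2 | 322A>G | 136 | 93% | T108A |
| PB2 | 549G>A | 118 | 62% | L183= |
| PB2 | 816A>G | 105 | 94% | V272= |
| PB2 | 999A>T | 99 | 59% | T333= |
| PB2 | 1326A>G | 38 | 100% | A442= |
| PB2 | 1452T>A | 68 | 100% | G484= |
| PB2 | 1615A>G | 64 | 100% | I539V |
| PB2 | 1891T>A | 87 | 99% | L631M |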

Table 2.

SNP frequency for individual ZH1502 genome segments. Both total SNPs and non-synonymous SNPs are listed

| Genome segment | MU-162 reference length [nt] | Total SNPs | SNPs per 1000 nt | Non-synonymous SNPs | Non-synonymous SNPs per 1000 nt |
|---|---|---|---|---|---|
| Segment 1 (PB2) | 2329 | 12 | 5.15 | 5 | 2.15 |
| Segment 2 (PB1) | 2333 | 2 | 0.86 | 1 | 0.43 |
| Segment 3 (PA) | 2233 | 5 | 2.24 | 3 | 1.34 |
| Segment 4 (HA) | 1778 | 0 | 0.00 | 0 | 0.00 |
| Segment 5 (NP) | 1565 | 3 | 1.92 | 1 | 0.64 |
| Segment 6 (NA) | 1369 | 7 | 5.11 | 2 | 1.46 |
| Segment 7 (MP) | 1097 | 4 | 3.65 | 1 | 0.91 |
| Segment 8 (NS) | 841 | 2 | 2.38 | 1 | 1.19 |
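
The densities and shares in Table 2 follow from the segment lengths and SNP counts alone, as the sketch below illustrates. The segment lengths and counts are copied from Table 2; the dictionary layout and the function name `snp_summary` are illustrative choices, not part of the original analysis.

```python
from typing import Dict, Tuple

# Segment -> (MU-162 reference length in nt, total SNPs, non-synonymous SNPs), from Table 2.
SEGMENTS: Dict[str, Tuple[int, int, int]] = {
    "PB2": (2329, 12, 5),
    "PB1": (2333, 2, 1),
    "PA": (2233, 5, 3),
    "HA": (1778, 0, 0),
    "NP": (1565, 3, 1),
    "NA": (1369, 7, 2),
    "MP": (1097, 4, 1),
    "NS": (841, 2, 1),
}


def snp_summary(segments: Dict[str, Tuple[int, int, int]]) -> Dict[str, Dict[str, float]]:
    """Compute per-segment SNP densities (per 1000 nt) and shares of the genome-wide totals."""
    total_snps = sum(total for _, total, _ in segments.values())
    total_nonsyn = sum(nonsyn for _, _, nonsyn in segments.values())
    summary = {}
    for name, (length, total, nonsyn) in segments.items():
        summary[name] = {
            "snps_per_1000_nt": 1000 * total / length,
            "nonsyn_per_1000_nt": 1000 * nonsyn / length,
            "share_of_all_snps": total / total_snps,
            "share_of_nonsyn_snps": nonsyn / total_nonsyn,
        }
    return summary
```

Normalising by segment length is what allows the comparison between PB2 and the much shorter NA segment, whose raw SNP counts differ but whose densities are close.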

To investigate whether this pattern is shared with other IAV pandemics, we built maximum likelihood trees and compared overall tree length and average pairwise patristic distances during the 1918 and 2009 pandemics. To do this, we subsampled the 2009 pandemic to generate genome sets matching the 1918 pandemic genome set. Both measures are higher in the current sample of 1918 IAV than in any of 10 matched subsamples of the 2009 pandemic for three segments: segment 1 (PB2), segment 3 (Polymerase Acidic (PA)) and segment 4 (HA) (Fig. 5A, B). This effect is so pronounced for average distances that PB2 appears as the second most diverse segment in 1918, when it is only the sixth most diverse in 2009 (Fig. 5B).
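
The two measures compared above can be made concrete with a toy example: overall tree length is the sum of all branch lengths, and the patristic distance between two tips is the sum of the branch lengths on the path connecting them. The sketch below is an illustration only. It assumes a rooted tree stored as a child-to-parent mapping, which is a hypothetical representation, and it is not the software used to analyse the maximum likelihood trees.

```python
from itertools import combinations
from typing import Dict, Optional, Tuple

# node -> (parent, length of the branch leading to the parent); the root has parent None.
Tree = Dict[str, Tuple[Optional[str], float]]


def tree_length(tree: Tree) -> float:
    """Sum of all branch lengths in the tree."""
    return sum(length for parent, length in tree.values() if parent is not None)


def distances_to_ancestors(tree: Tree, node: str) -> Dict[str, float]:
    """Map node and each of its ancestors to the distance travelled from node."""
    distances: Dict[str, float] = {}
    travelled = 0.0
    current: Optional[str] = node
    while current is not None:
        distances[current] = travelled
        parent, length = tree[current]
        travelled += length
        current = parent
    return distances


def patristic_distance(tree: Tree, a: str, b: str) -> float:
    """Path length between two nodes through their most recent common ancestor."""
    from_a = distances_to_ancestors(tree, a)
    travelled = 0.0
    current = b
    while current not in from_a:  # climb from b until a shared ancestor is reached
        parent, length = tree[current]
        travelled += length
        current = parent
    return travelled + from_a[current]


def mean_pairwise_patristic(tree: Tree) -> float:
    """Average patristic distance over all pairs of tips (requires at least two tips)."""
    parents = {parent for parent, _ in tree.values()}
    tips = [node for node in tree if node not in parents]
    pairs = list(combinations(tips, 2))
    return sum(patristic_distance(tree, a, b) for a, b in pairs) / len(pairs)
```

Because both measures grow with the number and spacing of sampled genomes, comparing them between pandemics is only meaningful on matched sets, which is why the 2009 data were subsampled.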

Fig. 5.

Comparison of divergence (panel A) and diversity (panel B) between the 1918 and 2009 H1N1 pandemics. For each segment, the black dot represents the divergence/diversity estimate for the 1918 pandemic. Coloured dots represent the 2009H1N1pdm divergence/diversity estimates based on 10 subsamples of the available 2009H1N1pdm sequence data with spatiotemporal spacing matching that of the available 1918 sample as closely as possible

The HA segment has the highest number of available 1918 to 1919 pandemic references of all IAV segments [12, 14, 21, 22, 44, 45]. Therefore, we used this segment to build a time-scaled phylogeny using BEAST [46]. ZH1502 clusters in a relatively well supported clade that comprises strains from mainland Europe and North America from the first and second waves of the pandemic (Fig. 6).

Fig. 6.

Time-scaled phylogeny of HA sequences from 1918 and 1919 pandemic IAV. Sequences from the USA are in grey, published European sequences in green, and the new ZH1502 sequence in red. Support values are given only for branches with > 80% posterior probability. The same model specification as for Fig. 3 by Patrono and colleagues (2022) was used [12]

Finally, ZH1502 shows human-like residues at three sites previously discussed as possibly differentiating first and second wave genomes (D222 in HA, and D16 and P283 in nucleoprotein (NP)—segment 5) [12, 44].

Discussion

Countless medical collections worldwide are true treasure chests for historic pathogen research. The samples for this study originate from specimens of the Human Remains Collection of the Institute of Evolutionary Medicine (University of Zurich), which houses over 1700 formalin-fixed wet specimen samples. Larger collections, though limited in number, like the Pathological-Anatomical Collection at the Narrenturm in Vienna, can house over 35,000 well curated and documented wet-specimen samples [31], showcasing the importance of this sample type for scientific research as well as the necessity for efficient methods to access its genomic information. Early work on human immunodeficiency virus 1 (HIV-1) demonstrated that archival samples can offer key insights into viral evolutionary histories [47], and this eventually resulted in more sensitive approaches for recovering viral RNA from such samples [48]. The first successful reconstruction of a pathogen genome with high-throughput sequencing from a formalin-fixed wet specimen sample [30] sparked further interest in efficient ancient RNA extraction protocols. Ancient RNA is often only present in trace amounts [33], as the samples were not specifically collected and preserved for RNA work, which can be challenging even for more recent samples. Issues can arise from post-mortem RNA degradation, as delayed fixation can cause severe RNA degradation [49, 50], as well as from the effects of formalin fixation itself [51]. Efficient workflows for ancient RNA sequencing that account for these limitations [25, 26, 28] are needed to facilitate RNA studies of formalin-fixed wet specimen samples. Efficient protocols for aRNA sequencing of formalin-fixed wet specimens are not limited to human samples from medical collections. Animal specimens preserved in this way and housed in museum collections could significantly enhance our understanding of interspecies transmission of zoonotic pathogens, such as IAV. While there is evidence of human-to-animal transmission during the pandemic [52], no credible IAV sequences from animal samples of that period are currently available. Although animal specimens were not included in this study, such data could meaningfully inform ongoing discussions on topics like viral reassortment and the role of potential mixing vessels [2, 3, 53, 54].

The ligation-based workflow was mainly designed for pathogen RNA recovery. However, it can also be used for host RNA detection (Fig. 3, Fig. S7). It should be noted that the fraction of reads mapping to unannotated regions of the human reference (Fig. S7A) could originate partly from ssDNA fragments. The efficiency of DNase treatment of aDNA might be impaired [55], so the presence of a few reads of aDNA origin cannot be ruled out. This does not impact aRNA analysis from viruses, but should be taken into account when analysing host RNA data.

The ligation-based protocol incorporates complete RNA fragments for cDNA synthesis by ligation of adapters to the 3′- and 5′-sites. Complete cDNA synthesis is important, since both adapter sequences are required for further library construction. In contrast, random hexamer-based cDNA synthesis can generate double-stranded cDNA for further library preparation even if the initial first-strand cDNA synthesis stops prematurely. Hence, we suspected that a ligation-based library preparation would be more impacted by RNA modifications which might cause premature termination of cDNA synthesis, like crosslinking [56] or abasic sites [57]. Formalin solution is known to cause crosslinks between RNA and various types of biomolecules [58–60], and various processes can lead to an abasic site in RNA over time, e.g. acid-induced depurination [61]. Therefore, we compared the total number of mapped fragments > 72 nt and those ≤ 72 nt for both methods (Fig. 2C, Additional file 1: Fig. S2B). Indeed, the number of large fragments is higher for the R6 workflow. More importantly, the number of smaller fragments is significantly higher in the ligation-based workflow (Fig. 2C, Additional file 1: Fig. S2B). This indicates a better potential for the ligation-based workflow to recover highly fragmented RNA. In addition, the increased number of small fragments in the ligation-based protocol resulted in comparable coverage to the R6 workflow (Fig. 2A). Based on the data alone, it is not clear if the lower number of long fragments for the ligation-based workflow is linked to premature termination of cDNA synthesis upon encounter of ancient RNA modifications. It might also be linked to an inherent bias of the extraction and library preparation methods, both being predominantly developed for short RNA fragments. Since the two workflows were compared as complete processes, it is not possible to pinpoint the specific step responsible for the lack of small fragments in the R6 workflow. It could be attributed to the RNA extraction, which is not optimal for small RNA fragments, to the clean-up during the rRNA depletion step, or to the library preparation itself. In conclusion, the ligation-based protocol performs on par with the random hexamer protocol with regard to average reference coverage, but strongly favours the recovery of short RNA fragments. The results for fragment lengths are consistent with the sequencing data obtained with longer read lengths (Additional file 1: Figs. S5, S6, Additional file 2: Table S1). The information about RNA fragment orientation from the ligation-based protocol indicates that negative-strand IAV fragments are slightly longer. Since IAV has a negative-strand RNA genome, this could further support the initial idea of a protective function of the viral capsid in RNA preservation [12]. Furthermore, although limited, protective effects from tight association of the viral genome (and cRNA) with NP may contribute to IAV RNA preservation in historic samples [62, 63]. We also note that directional information paves the way to differentiating IAV replication dynamics (negative strand) and transcriptional activity (positive strand).

A general benefit of ligation-based cDNA synthesis is that RNA fragments are completely incorporated in the sequencing library and no random priming is involved. Therefore, we assumed that only ligation-based cDNA synthesis protocols can pick up C → T mismatches caused by deamination, which were previously observed to occur at higher frequency towards fragment ends not only for ancient DNA [64] but also for ancient RNA [33]. In both cases, the lower deamination rate in the middle regions is assumed to be linked to a protective function of, in the case of ancient DNA, double-stranded DNA [65, 66] or, in the case of ancient RNA, secondary structure formation [67]. We validated the initial observation [33] for the ligation-based protocol (Fig. 2D, Additional file 1: Fig. S2A). For this workflow, we can indeed see a higher C → T mismatch rate towards the fragment end, rapidly dropping off after the first base position. All other mismatches, counted together, have a much lower frequency. In contrast, we see almost no C → T mismatches for the random hexamer-based workflow, especially at the very beginning, whereas other mismatches are found with high frequency at the fragment start (Fig. 2D, Additional file 1: Fig. S2A). This is in accordance with the previous observations that random hexamer misprimings occur predominantly at the primer's 5′ site and that C → T mismatches occur with one of the lowest ratios [34]. In conclusion, mismatches towards the fragment end in the ligation-based protocol could serve as a very first indication of authenticity for ancient RNA, as for ancient DNA [64], whereas random hexamer-based workflows can lead to higher overall mismatches towards fragment ends due to mispriming.
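To make the position-wise mismatch pattern concrete, the following simplified sketch computes a C → T mismatch rate per position counted from the fragment start. It is not the DamageProfiler implementation used in this study; it assumes gap-free reference/read pairs of equal length in the same orientation, and the function name is hypothetical.

```python
def c_to_t_rate_by_position(aligned_pairs, n_positions=10):
    """Return the C->T mismatch rate for each of the first n_positions.

    aligned_pairs: iterable of (reference, read) strings, gap-free,
    equal in length and in the same orientation (an assumption of this sketch).
    Positions without any reference C are reported as None.
    """
    ref_c = [0] * n_positions   # reference C observed at each position
    c_to_t = [0] * n_positions  # reference C read as T at each position
    for reference, read in aligned_pairs:
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if i >= n_positions:
                break
            if ref_base == "C":
                ref_c[i] += 1
                if read_base == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None for i in range(n_positions)]


# Hypothetical toy alignments, for illustration only.
pairs = [("CACG", "TACG"), ("CCGT", "CCGT")]
print(c_to_t_rate_by_position(pairs, n_positions=4))
```

Normalising by the number of reference C bases at each position, rather than by the number of reads, avoids inflating or deflating the rate where base composition differs between positions.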

Here, we have used our technical investigations to add to the knowledge about the 1918 influenza pandemic by reconstructing a new genome from Zurich, Switzerland. The patient died on July 15th, 1918, during the early first wave of the pandemic in the city. Interestingly, this IAV genome already encodes amino acid residues associated with human adaptation at multiple positions. For example, it encodes an aspartic acid at position 16 of its nucleoprotein (D16) and a proline at position 283 (P283), which are associated with resistance to the human myxovirus resistance protein 1 (MxA) [4, 68–70]. MxA is an interferon-induced antiviral factor targeting the nucleoprotein of susceptible viruses and thereby sequestering viral RNA [71–74], and it is assumed to represent an important barrier against zoonotic transmissions of avian-like IAV to humans [75]. The only other clearly dated European first wave IAV genomes, BE-572 and BE-576 from Berlin (Germany), were reconstructed from samples originating about three weeks earlier (June 27th, 1918) than ZH1502 and showed avian-like G16 and L283 for NP [12]. Taken together, these findings suggest that viruses with both genotypes co-circulated in the early summer of 1918, but fixation of the NP adaptive changes was still under way. A perhaps less likely alternative scenario is that fixation happened in the exact three weeks between the sampling of ZH1502 and BE-572/576.

Sheng and colleagues [44] similarly hypothesised that the viruses involved in the first and second waves of the 1918 pandemic presented different residues at position 222 of the HA gene. This position in the HA receptor-binding domain influences the sialic acid binding preferences of the protein. G222 can bind to avian-like receptors (α2–3 SA) and human-like receptors (α2–6 SA), whereas D222 only binds to human-like sialic acids (α2–6 SA) [5, 44, 76]. ZH1502 already displays human-like D222 in HA, just as BE-572 and BE-576 [12], to which the ZH1502 HA segment is closely related (Fig. 6). This reinforces the notion that the G222D substitution had already reached high frequency in viruses of the first wave. In summary, by July 1918, first wave viruses had already evolved several critical adaptations to their new human niche.
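The marker residues discussed in the two preceding paragraphs can be summarised in a small lookup sketch. The residues and their interpretation are taken from the text; the data structure, labels, and function name are hypothetical, and the observed residues are assumed to have been read off the translated segments beforehand.

```python
# (protein, position) -> {residue: interpretation given in the text}
ADAPTATION_MARKERS = {
    ("NP", 16): {"D": "MxA resistance-associated", "G": "avian-like"},
    ("NP", 283): {"P": "MxA resistance-associated", "L": "avian-like"},
    ("HA", 222): {"D": "human-like receptors only", "G": "avian- and human-like receptors"},
}


def classify_markers(observed):
    """Map each observed residue to its interpretation, or 'unclassified'."""
    result = {}
    for site, residue in observed.items():
        result[site] = ADAPTATION_MARKERS.get(site, {}).get(residue, "unclassified")
    return result


# Residues as reported in the text for ZH1502 and for BE-572.
zh1502 = {("NP", 16): "D", ("NP", 283): "P", ("HA", 222): "D"}
be572 = {("NP", 16): "G", ("NP", 283): "L", ("HA", 222): "D"}

for name, genome in (("ZH1502", zh1502), ("BE-572", be572)):
    print(name, classify_markers(genome))
```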

ZH1502 is the seventh 1918 influenza case for which high-coverage genome-wide information has been obtained [12, 20–22]. Although this number is still very small, it allows us to develop more comparisons with subsequent IAV pandemics. The 2009 H1N1 pandemic is the only one for which such a rich genomic record exists that it is possible to draw subsamples matching the spatiotemporal spacing of the small collection of 1918 cases. Recent comparisons based on a smaller set of 1918 IAV sequences had shown similarities between both pandemics, with widespread intercontinental dispersal and persistence of lineages across waves [12]. Here, we identify measurable differences in terms of diversity generated across the 8 genomic segments. We show that 1918 IAV evolved more diverse HA, PA, and PB2 than matched 2009pdm IAV. This high diversity may have favoured the transition toward higher pathogenicity since functional analyses have shown that the HA and polymerase genes were crucial for the high pathogenicity of the only 1918 IAV pandemic virus reconstructed thus far [77–79].

Finally, the fact that the same HA variant was found associated with clearly divergent other segments (especially in PB2) is a first hint at potential reassortment early in the pandemic (segment-wise maximum likelihood (ML) trees, Additional file 1: Fig. S4). However, given the few sequences at hand, it is not possible to exclude completely an alternative scenario combining strong purifying selection on HA and strong diversifying selection on some other segments (including PB2). In general, gaining more insight into the evolution of 1918 IAV will require the identification of more cases in historical pathology collections and the generation of more genomes.

Conclusions

All European influenza A genomes from the 1918–1920 flu pandemic originate from formalin-fixed wet specimen samples [12]. Historical pathology collections certainly house more biological samples of this pandemic, as well as many well-documented samples with great potential for other ancient pathogen genomics. Here, we introduced a new protocol for ancient RNA sequencing from such samples, thereby contributing to making them more accessible. We used this method to reconstruct the first influenza A virus genome from the early first wave in Switzerland, which already contained three amino acid residue changes associated with human adaptation. The first precisely dated and complete genome (97% coverage at ≥ 3×) from Europe is currently the only first wave genome with the substitutions G16D and L283P, which potentially gave this strain an advantage during the course of the pandemic as all high-coverage second wave genomes also carry these mutations.

Methods

Detailed sample information for the influenza A positive sample ZH1502

The specimen from sample ZH1502 is nowadays part of the Human Remains Collection of the Institute of Evolutionary Medicine, University of Zurich (Inventory Number 1156). According to the autopsy report from the pathological institute at the former Cantonal Hospital Zurich (nowadays University Hospital Zurich), the deceased patient was an 18-year-old man of rather strong physique. The autopsy No. 496 was performed on 15 July 1918 [36]. The main cause of death was bilateral haemorrhagic bronchopneumonia. The autopsy report also notes secondary diagnoses such as additional pathologies of the upper respiratory tract (bronchitis, laryngitis, tracheitis), lower respiratory tract (pleural effusion), heart (cardiac hypertrophy, dilation of the aorta) and digestive organs (gastritis).

The medical history of autopsies at Cantonal Hospital Zurich during the 1918 pandemic has been very well researched and analysed using all autopsy reports 20 years ago [35, 36] (these reports are no longer accessible in their original form today). At the beginning of the twentieth century, autopsies were performed on almost all deaths at the Cantonal Hospital Zurich by experienced pathologists [35, 80]. Influenza was listed as the main cause of death in 411 of 970 autopsies between May 1918 and April 1919. In all of the 411 influenza deaths, pneumonia and the pulmonary complications associated with it were the main causes. The death of the 18-year-old man described here corresponds to the typical picture at the time: the autopsied influenza deaths tended to be young (average 28 years) and male (62%), and the other confirmed diseases in addition to pneumonia were frequent inflammation in the upper respiratory tract [35].

The historical-epidemiological data on the pandemic course in Zurich, which are used for context, were transcribed from the weekly statistical bulletin of the Federal Office of Public Health in an earlier project [81]. The following parameters are used to illustrate the course of the pandemic: (a) Deaths from all causes (for the city of Zurich). These figures refer to both the resident and non-resident population. The quality of these historical vital statistics is considered in the literature to be very good. However, death figures by age, sex, and cause were not available on a weekly basis. (b) New cases of influenza reported by physicians (for the city and canton of Zurich). This series begins with the introduction of mandatory reporting of influenza in the canton of Zurich on 25 July 1918. The authorities at the time estimated that these figures understated the true number of cases three- to fourfold due to unreported mild cases [43]. (c) Hospitalisations due to the category “Infectious diseases in general and influenza in particular” (for the canton of Zurich).

Brief sample information for the other samples

As a positive control, previously published and described sample BE-572 was used [12]. All other samples are part of the same collection (IEM, University of Zurich) as ZH1502. The remaining two lung samples are ZH1414 and ZH1501. ZH1414 was collected from a 1946 lung specimen diagnosed with pneumonia. ZH1501 was collected from a 1918 lung specimen diagnosed with tuberculosis.

The three liver samples are described in short: ZH1415 has been collected from a 1933 liver carcinoma specimen. ZH1416 has been collected from a 1947 primary liver carcinoma specimen. ZH1417 has been collected from a 1912 liver specimen with infiltrating gall bladder carcinoma.

Workflow comparison

The R6-based workflow and the ligation-based workflow were conducted as described below. The sample preparation step was only performed once for all experiments at the same time. This step and all following pre-PCR steps were carried out in the cleanroom facility of the Institute of Evolutionary Medicine, University of Zurich. After surface removal, a uniform sample mash was produced for each sample, and aliquots of exactly 25 mg were taken for all subsequent experiments (a total of 24 aliquots per sample).

Ligation-based workflow

All pre-PCR steps of the workflow were conducted in a clean room facility matching the high standards for ancient DNA workflows. All reagents and consumables used were nuclease-free and PCR-clean, whenever possible. All pipettes and surfaces involved in the workflow were cleaned with RNase-away prior to PCR amplification.

Sample preparation

Sample preparation was conducted for one sample at a time in an RNase-away-cleaned PCR cabinet. The sample was placed in a single-use petri dish and the surface was removed using a pair of single-use scalpels. The surface-removed sample was transferred to a new petri dish with a fresh pair of single-use scalpels and cut into small pieces. Generating very small pieces with a high surface-to-volume ratio is important at this point. For most tissues, tearing off tiny fibrous pieces (using one scalpel to hold down the sample and one to tear away tiny fragments) proved best for this purpose. Twenty-five milligrams of the resulting fragments were then transferred to 1.5-ml Eppendorf DNA LoBind tubes. Four aliquots of the sample mash were used for each experiment. Aliquots were stored at 4 °C for short-term storage. This is a safe stopping point.

RNA decrosslinking and extraction

Decrosslinking was carried out at 98 °C for 15 min, followed by RNA extraction based on a Qiagen miRNeasy Tissue/Cells Advanced Mini Kit. For details, see below and refer to the manufacturer’s instructions.

  1. Make sure a cooling block is properly cooled to − 20 °C before starting the workflow and that Qiagen miRNeasy Tissue/Cells Advanced Mini Kit is set up according to manufacturer’s instructions.

  2. Retrieve 4 aliquots of 25 mg sample mash per sample. Note: Include 4 empty tubes as negative control for each processing batch, which get treated identically to the samples for all subsequent steps including sequencing and data analysis.

  3. Preheat a heat block to 98 °C

  4. Add 50 µl of nuclease-free water to each sample aliquot or empty tube

  5. Use a fresh single-use pestle fitting 1.5-ml Eppendorf DNA LoBind tubes to further grind up the mash in the liquid for each sample. Properly grinding the sample is important to the workflow. However, formalin-fixed wet specimen samples hardly ever yield a homogeneous mash. After grinding the sample (pushing and rotating the pestle) a few times, continue with the next step even if there are still a few fragments present. Additional grinding steps will follow during the extraction.

  6. Put all sample tubes in the heat block at 98 °C for 15 min

  7. Just before the end of the 15-min incubation, retrieve the cooling block from − 20 °C

  8. Place all samples in the cooling block for 2 min

  9. Transfer samples to a rack at room temperature

  10. Add 450 µl buffer RLT. Note: Buffer RLT contains guanidine thiocyanate. Do not bring in contact with bleach. Collect waste separately and make sure to dispose of it according to your local guidelines.

  11. For each sample, use a fresh pestle fitting 1.5-ml Eppendorf DNA LoBind tubes to grind the sample (properly push and rotate)

  12. Incubate for 5 min at room temperature

  13. Repeat steps 11 to 12 for a total of 4 times

  14. Pellet debris for 3 min at maximum speed in a centrifuge

  15. Transfer supernatant for each aliquot to a new tube (maintain 4 tubes per sample)

  16. Add 140 µl buffer AL to each tube

  17. Add 20 µl buffer RPP to each tube and vortex for > 20 s

  18. Incubate for 3 min at room temperature

  19. Centrifuge at 12,000 g for 3 min

  20. Transfer supernatant to a new tube (maintain 4 tubes per sample)

  21. Add one volume of isopropanol and mix by pipetting. Do not centrifuge.

  22. Transfer 700 µl to an RNeasy mini column

  23. Centrifuge 15 s at > 8000 g

  24. Use a pipette to remove and discard the flow-through

  25. Load the remaining volume of the corresponding sample aliquot to the column and repeat steps 23 to 24 once

  26. Load 700 µl buffer RWT to the RNeasy column and centrifuge for 15 s at > 8000 g

  27. Use a pipette to remove and discard the flow-through

  28. Load 500 µl buffer RPE to the RNeasy column and centrifuge for 15 s at > 8000 g

  29. Use a pipette to remove and discard the flow-through

  30. Load 500 µl of 80% ethanol to the RNeasy Column and centrifuge for 2 min at > 8000 g

  31. Discard the flow-through and the Collection tube. Transfer the RNeasy column to a new collection tube

  32. Spin for 1 min at maximum speed

  33. Transfer RNeasy column to a new 1.5-ml Eppendorf DNA LoBind Tube and discard the collection tube

  34. Add 35 µl (if running a Qubit assay: 37 µl) nuclease-free water directly to the membrane of the RNeasy column

  35. Incubate for 1 min at room temperature

  36. Elute by spinning for 1 min at maximum speed

  37. Combine all four extracts from the same sample in one Eppendorf DNA LoBind tube for further processing

  38. Optional: Use 10 µl of combined extract to run a Qubit RNA High Sensitivity assay according to the manufacturer’s instructions

DNA digestion and cleanup

DNA digestion was performed on the combined extracts using the Invitrogen TURBO DNA-free Kit and purification by Zymo RNA Clean & Concentrator-5. For details, see below and refer to the manufacturer’s instructions.

  1. Split the combined extract (~ 140 µl) in three PCR tubes with 45 µl of extract, each

  2. Add 5 µl DNase Buffer. Mix by pipetting

  3. Add 1 µl DNase I. Gently mix by pipetting

  4. Incubate 30 min at 37 °C in a PCR cycler. If possible, set lid temperature to 45 °C. Otherwise, leave lid heating off.

  5. Combine all three reactions of the same sample in one 1.5-ml Eppendorf DNA LoBind tube

  6. Add 15 µl DNase inactivation reagents. Vortex and incubate for 5 min at room temperature, while occasionally vortexing

  7. Spin down for 2 min at maximum speed and transfer supernatant (~ 150 µl) to a new 1.5-ml Eppendorf DNA LoBind tube

  8. Add 300 µl Zymo RNA Binding Buffer to each sample and mix.

  9. Add 450 µl of Ethanol (95% to 100%) to each sample and mix.

  10. Combine Zymo-Spin™ IC column and a collection tube

  11. Transfer 450 µl of sample mix to the Zymo-Spin™ IC column

  12. Centrifuge at room temperature at 12,000 g for 30 s

  13. Aspirate flow-through using a 1000-µl pipette tip and discard

  14. Repeat steps 11 to 13 once to load the remaining sample mix

  15. Add 400 µl RNA Prep Buffer

  16. Centrifuge at room temperature at 12,000 g for 30 s

  17. Aspirate flow-through using a 1000-µl pipette tip and discard

  18. Add 700 µl RNA wash buffer

  19. Centrifuge at room temperature at 12,000 g for 30 s

  20. Transfer Zymo-Spin™ IC column to a new 1.5-ml Eppendorf DNA LoBind tube

  21. Add 40 µl nuclease-free water to the Zymo-Spin™ IC Column and incubate for 1 min

  22. Centrifuge at 16,000 g and room temperature for 1 min

  23. Continue with the next step or store sample short term at − 20 °C. This is the last stopping point before the indexing step of the library preparation part.

End-repair and cleanup

5′ phosphorylation and removal of 3′ phosphoryl groups was performed using New England Biolabs T4 Polynucleotide Kinase followed by a cleanup step using Zymo RNA Clean & Concentrator-5. For details, see below and refer to the manufacturer’s instructions.

  1. Transfer the complete eluate from the DNase digestion step to a PCR tube

  2. Add 5 µl T4 PNK Reaction Buffer (10 ×)

  3. Add 5 µl of ATP (10 mM)

  4. Add 1 µl T4 PNK

  5. Pipette mix and spin down briefly

  6. Incubate at 37 °C for 30 min in a PCR cycler

  7. Transfer reaction to a new 1.5-ml Eppendorf DNA LoBind tube

  8. Add 100 µl Zymo RNA Binding Buffer to each sample and mix.

  9. Add 150 µl of Ethanol (95% to 100%) to each sample and mix.

  10. Combine Zymo-Spin™ IC column and a collection tube

  11. Transfer complete cleanup mix to the Zymo-Spin™ IC column

  12. Centrifuge at room temperature at 12,000 g for 30 s

  13. Aspirate flow-through using a 1000-µl pipette tip and discard

  14. Add 400 µl RNA Prep Buffer

  15. Centrifuge at room temperature at 12,000 g for 30 s

  16. Aspirate flow-through using a 1000-µl pipette tip and discard

  17. Add 700 µl RNA Wash Buffer

  18. Centrifuge at room temperature at 12,000 g for 30 s

  19. Transfer Zymo-Spin™ IC column to a new 1.5-ml Eppendorf DNA LoBind tube

  20. Add 7 µl nuclease-free water to the Zymo-Spin™ IC column and incubate for 1 min

  21. Centrifuge at 16,000 g and room temperature for 1 min

  22. Continue immediately with the library preparation step

Ligation-based library preparation

The library preparation is based on the NEBNext® Multiplex Small RNA Library Prep Set for Illumina with multiplexing using NEBNext® Multiplex Small RNA Library Prep Set for Illumina® (Set 1) and cleanup by the Qiagen MinElute PCR Purification Kit. For details, see below and refer to the manufacturer’s instructions.

  1. Make sure a PCR tube cooling block is properly cooled to − 20 °C before starting the workflow.

  2. Transfer the solution from the end-repair step to a new PCR tube.

  3. Set PCR cycler to 65 °C with lid temperature of 75 °C.

  4. Incubate samples for 20 min at 65 °C in a PCR cycler.

  5. Dilute 3′ SR Adaptor for Illumina 1:2 in nuclease-free water.

  6. After 20 min of incubation, take out the tube and set the PCR cycler to 70 °C with an 80 °C lid temperature.

  7. Add 1 µl of 1:2 diluted 3′ SR Adaptor for Illumina, for a total of ~ 7 µl of reaction volume.

  8. Incubate for 2 min at 70 °C in a PCR cycler.

  9. Transfer tube to pre-cooled PCR cooling block.

  10. Take out PCR tube and set the cycler to incubation at 25 °C with lid heating off.

  11. Return cooling block to − 20 °C for later use.

  12. Add 10 µl 3′ Ligation Reaction Buffer (2 ×).

  13. Add 3 µl 3′ Ligation Enzyme Mix, for a total of ~ 20 µl reaction volume.

  14. Pipette mix and spin down. Incubate 1 h at 25 °C.

  15. Dilute SR RT Primer for Illumina 1:2 in nuclease-free water.

  16. Take the tubes out of the PCR cycler after the 1 h incubation time.

  17. Add 4.5 µl of nuclease-free water.

  18. Add 1 µl of 1:2 diluted SR RT Primer for Illumina, for a total of ~ 25.5 µl reaction volume. Mix by pipetting.

  19. Program a PCR cycler with the following protocol with lid temperature at 85 °C: 5 min at 75 °C, 15 min at 37 °C, 15 min at 25 °C, hold at 4 °C.

  20. Transfer the tube to the PCR cycler and start the program.

  21. 5 min before the PCR program ends, carry out steps 22 to 26.

  22. Set another PCR cycler to 70 °C incubation.

  23. Resuspend 5′ SR Adaptor for Illumina in 120 µl nuclease-free water. Vortex and spin down.

  24. Transfer sample number times 1 µl of resuspended 5′ SR Adaptor for Illumina to a new PCR tube (e.g. 5 µl, if processing 5 samples).

  25. Add the same volume of nuclease-free water to the PCR tube to dilute the 5′ SR Adaptor for Illumina 1:2 and mix.

  26. Incubate at 70 °C for 2 min. Immediately place in pre-cooled cooling block and use within 30 min.

Note: Minimise freezing/thawing of 5′ SR Adaptor for Illumina. Aliquot the remaining volume in new 1.5-ml Eppendorf DNA LoBind tubes and store at − 80 °C. A − 80 °C freezer outside of the clean room can be used. In this case, pack each aliquot individually in three layers of Ziplock bags. Wipe with DNase away and remove one layer of Ziplock bag at each clean room “checkpoint” you pass (e.g. door to changing room, door to clean room, and door to library preparation room). If required, the aliquot can be retrieved on dry ice and stored in closer proximity to the clean room facility one day before use. If using a pre-aliquoted 5′ SR Adaptor for Illumina, skip the resuspension step (step 23) of the protocol.

  1. After the sample incubation is done, retrieve PCR tubes from the cycler.

  2. Add 1 µl of denatured and 1:2 diluted 5′ SR Adaptor for Illumina.

  3. Add 1 µl 5′ Ligation Reaction Buffer (10 ×).

  4. Add 2.5 µl 5′ Ligation Enzyme Mix, for a total of ~ 30 µl of reaction volume.

  5. Mix by pipetting and spin down.

  6. Set a thermal cycler to 25 °C and incubate for 1 h.

  7. Retrieve reactions from the thermal cycler.

  8. Set the cycler to 50 °C with a lid temperature of 60 °C.

  9. Add 8 µl First Strand Synthesis Reaction Buffer.

  10. Add 1 µl Murine RNase Inhibitor.

  11. Add 1 µl ProtoScript II Reverse Transcriptase, for a total of ~ 40 µl of reaction volume.

  12. Incubate for 1 h at 50 °C.

  13. Retrieve tube from thermal cycler.

  14. Add 5.0 µl nuclease-free water.

  15. Add 2.5 µl Index (X) Primer (choose Index Primer accordingly. Make sure that there are no overlaps with other samples for the same sequencing run).

  16. Add 2.5 µl SR Primer for Illumina.

  17. Add 50 µl LongAmp Taq 2X Master Mix.

  18. Mix by pipetting. Transport to post-PCR laboratory on ice.

  19. In a post-PCR laboratory, program a thermal cycler with the following program:
    • Initial denaturation: 94 °C for 30 s.
    • 18 cycles of denaturation, 94 °C for 15 s;
    • Annealing, 62 °C for 30 s;
    • Extension, 70 °C for 30 s.
    • Final extension: 70 °C for 5 min.
    • Hold: 4 °C indefinitely.
  20. Spin down the reaction tubes, place them in a thermal cycler, and start the program.

NOTE: After the indexing PCR, samples can be stored at − 20 °C. This is a safe stopping point.
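As a plausibility check on the volumes listed in the library preparation steps above, the following sketch adds up the additions stage by stage and reproduces the approximate totals stated in the protocol (~ 7, ~ 20, ~ 25.5, ~ 30, ~ 40 µl, and 100 µl for the indexing PCR). The stage names and the grouping of additions are our own labels for illustration.

```python
# Additions in µl per stage, taken from the protocol steps above.
# The first stage is the stated ~7 µl of end-repaired RNA plus diluted 3' SR Adaptor.
STAGES = [
    ("RNA plus 3' SR Adaptor", [7.0]),
    ("3' ligation", [10.0, 3.0]),
    ("RT primer hybridisation", [4.5, 1.0]),
    ("5' ligation", [1.0, 1.0, 2.5]),
    ("first-strand synthesis", [8.0, 1.0, 1.0]),
    ("indexing PCR", [5.0, 2.5, 2.5, 50.0]),
]


def running_volumes(stages):
    """Return (stage name, cumulative reaction volume in µl) for each stage."""
    total = 0.0
    totals = []
    for name, additions in stages:
        total += sum(additions)
        totals.append((name, total))
    return totals


for name, volume in running_volumes(STAGES):
    print(f"{name}: {volume} µl")
```

The final 100 µl matches the subsequent cleanup, where the complete reaction is combined with 500 µl Buffer PB to give 600 µl.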

Index library cleanup

The libraries were cleaned up using a Qiagen MinElute PCR Purification Kit. For details, see below and refer to the manufacturer’s instructions.

  1. Prepare 50 ml of Tris–EDTA-Tween (TET) buffer: 10 mM Tris–HCl, 1 mM EDTA, 0.05% Tween-20, pH 8.0. Can be stored at room temperature for a month.

  2. Make sure that the kit has been set up according to the manufacturer’s instructions and Buffer PE is ready to use.

  3. Add 500 µl Buffer PB to a new 1.5-ml Eppendorf DNA LoBind tube.

    Note: Buffer PB contains guanidine hydrochloride. Do not bring into contact with bleach. Collect waste separately and make sure to dispose of it according to your local guidelines.

  4. Add the complete volume of reaction mix from Indexing PCR Step to the tube and mix.

  5. Transfer all 600 µl of cleanup mix to a pre-assembled MinElute column.

  6. Spin for 1 min at maximum speed at room temperature.

  7. Discard flow through.

  8. Add 750 µl Buffer PE.

  9. Spin for 1 min at maximum speed at room temperature.

  10. Discard flow through.

  11. Spin for 1 min at maximum speed.

  12. Rotate MinElute columns (↻) in centrifuge by 180°

  13. Spin again for 1 min at maximum speed.

  14. Transfer MinElute column to a new 1.5-ml Eppendorf DNA LoBind tube.

  15. Add 50 µl of TET and incubate for 1 min.

  16. Elute by spinning 1 min at maximum speed.

Note: After the cleanup, samples can be stored at − 20 °C. This is a safe stopping point.

qPCR and amplification

At this point, the indexed libraries were still too low in concentration for a QC by Tapestation (or comparable instruments). A qPCR was performed to assess sample specific cycles for further amplification. Amplification was then carried out using Herculase II Fusion DNA Polymerase followed by a Qiagen MinElute PCR Purification Kit cleanup.

qPCR

Prepare the qPCR standard as indicated elsewhere [82]. Prepare a standard dilution series of 10³ to 10⁹ copies/μl. Prepare a mastermix of 7 μl nuclease-free water, 1 μl IS5 (10 μM), 1 μl IS6 (10 μM), and 10 μl Power SYBR™ Green PCR Master Mix (calculate for 10% extra). Distribute 19 μl of mastermix to the appropriate number of wells in a 96-well qPCR plate to run standards and samples in duplicates. Run the qPCR as indicated by the manufacturer’s instructions with an annealing temperature of 55 °C. Calculate cycles for amplification of 10 μl of input volume to end up with just below 10¹³ total theoretical fragments (based on qPCR).
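The cycle calculation can be sketched as follows. This is a minimal illustration that assumes perfect doubling per cycle, which real amplifications will not reach; the function name and the example concentration are hypothetical.

```python
def amplification_cycles(copies_per_ul, input_ul=10.0, target_total=1e13):
    """Largest number of cycles that keeps the theoretical product below target_total.

    Assumes an idealised doubling of the fragment number in every cycle.
    """
    amount = copies_per_ul * input_ul
    if amount <= 0:
        raise ValueError("copies_per_ul and input_ul must be positive")
    cycles = 0
    # Add a cycle only while the doubled amount still stays below the target.
    while amount * 2 < target_total:
        amount *= 2
        cycles += 1
    return cycles


# Hypothetical qPCR estimate of 1e8 copies/µl in the indexed library.
print(amplification_cycles(1e8))
```

With 10⁸ copies/µl and 10 µl input, the starting amount is 10⁹ fragments; 13 doublings give about 8.2 × 10¹², whereas a 14th cycle would exceed 10¹³.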

Amplification

  1. Prepare a master mix on ice for the amplification containing the following quantities (incl. + 10% extra) per reaction: 66 µl nuclease-free water, 22 µl 5X Herculase II Reaction buffer, 1.1 µl dNTP mix (25 mM each), 4.4 µl IS5 Primer, 4.4 µl IS6 Primer, 1.1 µl Herculase II Fusion DNA Polymerase.

  2. Add 90 µl of amplification master mix to a new PCR tube per sample on ice.

  3. Add 10 µl of the respective purified index library to the 90 µl of reaction mix on ice.

  4. Mix and spin down.

  5. Program a separate thermal cycler for each cycle number x, as determined by qPCR, with the following program:

    Initial denaturation: 95 °C for 2 min.

    x cycles of denaturation, 95 °C for 30 s;

    Annealing, 60 °C for 30 s;

    Extension, 72 °C for 30 s.

    Final extension: 72 °C for 5 min.

    Hold: 4 °C indefinitely.

  6. Transfer the samples for amplification to the appropriate thermal cycler matching the required cycle number and start the cycler.

  7. After the thermal cyclers are finished, proceed to “Amplified library cleanup”.
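The master mix quantities in step 1 can be scaled to a batch with a short sketch. The per-reaction volumes are those listed above and already include the 10% excess; the function name and the rounding to 0.1 µl are assumptions made for illustration.

```python
# Per-reaction volumes in µl from step 1 (these already include the 10% excess).
PER_REACTION_UL = {
    "nuclease-free water": 66.0,
    "5X Herculase II Reaction buffer": 22.0,
    "dNTP mix (25 mM each)": 1.1,
    "IS5 Primer": 4.4,
    "IS6 Primer": 4.4,
    "Herculase II Fusion DNA Polymerase": 1.1,
}


def master_mix(n_reactions):
    """Scale the per-reaction volumes to n_reactions, rounded to 0.1 µl."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    return {
        reagent: round(volume * n_reactions, 1)
        for reagent, volume in PER_REACTION_UL.items()
    }


# Example: three libraries amplified in parallel.
for reagent, volume in master_mix(3).items():
    print(f"{reagent}: {volume} µl")
```

The listed quantities sum to 99 µl per reaction, i.e. the 90 µl dispensed in step 2 plus 10%.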

Amplified library cleanup

The libraries were cleaned up using a Qiagen MinElute PCR Purification Kit. For details, see below and refer to the manufacturer’s instructions.

  1. Prepare 50 ml of Tris–EDTA-Tween (TET) buffer: 10 mM Tris–HCl, 1 mM EDTA, 0.05% Tween-20, pH 8.0. Can be stored at room temperature for a month.

  2. Make sure that the kit has been set up according to the manufacturer’s instructions and Buffer PE is ready to use.

  3. Add 500 µl Buffer PB to a new 1.5-ml Eppendorf DNA LoBind tube.

    Note: Buffer PB contains guanidine hydrochloride. Do not bring into contact with bleach. Collect waste separately and make sure to dispose of it according to your local guidelines.

  4. Add the complete volume of reaction mix from the amplification step to the tube and mix.

  5. Transfer all 600 µl of cleanup mix to a pre-assembled MinElute column.

  6. Spin for 1 min at maximum speed at room temperature.

  7. Discard flow through.

  8. Add 750 µl buffer PE and incubate for 2 min.

  9. Spin for 1 min at maximum speed at room temperature.

  10. Discard flow through.

  11. Spin for 1 min at maximum speed.

  12. Rotate MinElute columns (↻) in centrifuge by 180°

  13. Spin again for 1 min at maximum speed.

  14. Transfer MinElute Column to a new 1.5-ml Eppendorf DNA LoBind tube.

  15. Add 20 µl of TET and incubate for 1 min.

  16. Elute by spinning 1 min at maximum speed.

Note: After the cleanup, samples can be stored at − 20 °C. This is a safe stopping point.

Tapestation quality control

The amplified libraries were diluted 1:10 and measured with D1000 reagents on the Agilent Tapestation according to the manufacturer’s instructions. A portion of each amplified library was diluted to 10 nM and the concentration was verified by another Tapestation measurement. Diluted amplified libraries were pooled equimolar and sent for sequencing.
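The dilution to 10 nM follows the usual relation C1 · V1 = C2 · V2. The sketch below assumes that the Tapestation reading refers to the 1:10 pre-dilution and is expressed in nM, and that a final volume of 20 µl is wanted; the final volume and the function name are assumptions, not values from the protocol.

```python
def dilution_volumes(measured_nM, predilution_factor=10, target_nM=10.0, final_ul=20.0):
    """Return (library µl, diluent µl) needed to reach target_nM in final_ul.

    measured_nM is the reading of the pre-diluted library; the undiluted
    stock concentration is measured_nM * predilution_factor.
    """
    stock_nM = measured_nM * predilution_factor
    if stock_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nM * final_ul / stock_nM  # C1 * V1 = C2 * V2
    return round(library_ul, 2), round(final_ul - library_ul, 2)


# Hypothetical reading of 5 nM for the 1:10 dilution, i.e. a 50 nM stock.
print(dilution_volumes(5.0))
```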

Sequencing

Samples were sequenced 75 paired-end on the Illumina NextSeq 500. Sequencing was conducted by the Functional Genomics Center Zurich and the Next Generation Sequencing Facility of Vienna BioCenter Core Facilities.

R6-based workflow

All pre-PCR steps were carried out in a clean room of the designated ancient DNA facility at the Institute of Evolutionary Medicine (University of Zurich). The R6-based workflow was carried out based on the published workflow by Ariane Düx and colleagues (source). In short, four 25 mg aliquots of the sample mash created in “Sample preparation” were used for each individual sample, and four empty tubes were added per processing batch as negative control. After heat-mediated decrosslinking, RNA was extracted by a DNeasy® Blood & Tissue Kit (Qiagen). DNA was removed using the TURBO DNA-free™ Kit (Ambion) followed by a cleanup by RNA Clean & Concentrator-5 Kit (Zymo Research). Ribosomal RNA was depleted by NEBNext® rRNA Depletion Kit (Human/Mouse/Rat) with RNA Sample Purification Beads (New England Biolabs). Random hexamer-based cDNA synthesis was conducted using the SuperScript™ IV First-Strand Synthesis System (Invitrogen) for first strand synthesis and NEBNext® mRNA Second Strand Synthesis Module (New England Biolabs) for second strand synthesis. Cleanup was conducted using the MagSi-NGSprep Plus Beads (Steinbrenner Laborsysteme) followed by direct application of the NEBNext® Ultra™ II DNA Library Prep Kit for Illumina® (New England Biolabs) and the NEBNext® Multiplex Oligos for Illumina® (New England Biolabs) for dual indexing. Cleanups during the library preparation were carried out using MagSi-NGSprep Plus Beads (Steinbrenner Laborsysteme). After concluding the library preparation with dual indexing, samples were purified, quantified, amplified, quality controlled, and prepared for sequencing in exactly the same way as the samples for the ligation-based workflow, which is a slight deviation from the original protocol [30]. Sequencing was performed as for the ligation-based protocol.

Comparative mapping to influenza A reference

For each replicate individually, 2.5 million reads were mapped against the MU-162 IAV reference [12]. In short, the first 2.5 million reads from read 1 were used for each of the replicates (R6 workflow and ligation-based workflow) to be processed with the EAGER pipeline version 1.92.37 [83]. Read quality was assessed using FastQC version 0.11.5 [84] followed by adapter-trimming using AdapterRemoval version 2.2.1a [85]. Reads below 30 nt length were discarded. The pre-processed reads were then mapped against the IAV reference MU-162 [12] by BWA [86] with a maximum edit distance of 0.04 and mapping quality 37. Read duplicates were removed by MarkDuplicates [87] and mapping mismatches assessed by DamageProfiler version 1.0 [88].
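Two of the pre-processing choices described above, taking the first 2.5 million reads and discarding reads below 30 nt, can be sketched in plain Python. In the study these steps were part of the EAGER pipeline with AdapterRemoval; the generator functions below are hypothetical stand-ins that assume an uncompressed, four-line-per-record FASTQ stream of already adapter-trimmed reads.

```python
from itertools import islice


def first_reads(fastq_lines, max_reads=2_500_000):
    """Yield up to max_reads records as (header, sequence, quality) tuples."""
    lines = iter(fastq_lines)
    for _ in range(max_reads):
        record = list(islice(lines, 4))  # one FASTQ record spans four lines
        if len(record) < 4:
            return  # end of input or truncated record
        header, sequence, _separator, quality = (line.rstrip("\n") for line in record)
        yield header, sequence, quality


def drop_short(records, min_length=30):
    """Keep only records whose sequence is at least min_length nt long."""
    return (record for record in records if len(record[1]) >= min_length)


# Hypothetical in-memory FASTQ lines; a file handle could be passed instead.
example = [
    "@r1\n", "A" * 35 + "\n", "+\n", "I" * 35 + "\n",
    "@r2\n", "A" * 20 + "\n", "+\n", "I" * 20 + "\n",
    "@r3\n", "C" * 30 + "\n", "+\n", "I" * 30 + "\n",
]
kept = list(drop_short(first_reads(example, max_reads=3)))
print([header for header, _, _ in kept])
```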

Comparative mapping to human transcriptome reference

For each replicate individually, 2.5 million reads were mapped against the human reference GRCh38.p13 Release 42 (GENCODE [89]). STAR 2.7.10b was used to remove adapter sequences, trim the front of the read by 1 nt, filter out reads shorter than 30 nt, and map the remaining reads to the reference [90]. Duplicates were marked by MarkDuplicates [87].

Re-sequencing with longer read lengths

All libraries from IAV-positive samples (positive control and ZH1502) were re-sequenced in 300 paired-end (300PE) mode on the Illumina NextSeq 2000. Sequencing was conducted by the Functional Genomics Center Zurich.

Comparative mapping to influenza A reference for NextSeq2000 data

For each replicate individually, the original number of reads from the NextSeq2000 300PE sequencing run were mapped against the MU-162 IAV reference [12]. In short, all reads were used for each of the replicates (R6 workflow and ligation-based workflow) to be processed with the EAGER pipeline version 1.92.37 [83]. Read quality was assessed using FastQC version 0.11.5 [84] followed by adapter-trimming and merging using AdapterRemoval version 2.2.1a [85]. Reads below 30 nt length were discarded. The pre-processed reads were then mapped against the IAV reference MU-162 [12] by BWA [86] with a maximum edit distance of 0.04 and mapping quality 37. Read duplicates were removed by MarkDuplicates [87] and mapping mismatches assessed by DamageProfiler version 1.0 [88].

Mapping to influenza A reference and SNP-calling

For the actual reconstruction of the 1918 IAV genome, adapters were trimmed for all 75PE reads from the NextSeq500 and only merged reads were kept (AdapterRemoval version 2.2.1a [85]). These data were then combined with adapter-trimmed and merged reads (AdapterRemoval version 2.2.1a [85]) from 300PE sequencing on the NextSeq2000. Reads of the combined dataset were trimmed by 1 nt on both ends using fastp v0.23.4, and reads below 30 nt were filtered out [91]. The total pre-processed data were then used in EAGER v1.92.37 [83] for mapping by BWA [86] with a maximum edit distance of 0.04 and mapping quality 37 to the MU-162 reference [12]. Read duplicates were removed by MarkDuplicates [87] and mapping mismatches assessed by DamageProfiler version 1.0 [88]. Criteria for SNP-calling were at least 3× coverage and a SNP frequency above 50%. All SNPs were verified by hand to rule out incorrect SNP-calling due to mismatches at read ends. Although we did not apply a strict quantitative threshold, we made sure that any variant position we called was supported by multiple reads in which the variant did not appear in the last 3 positions of a (trimmed) read.
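The SNP-calling criteria above can be expressed as a small Python sketch. It is an illustration only, not the procedure used in the study, where the read-end check was done by hand without a fixed threshold. Two points are assumptions made here for the sake of a runnable example: "multiple reads" is encoded as at least two (`MIN_INTERIOR_READS`), and a base is treated as terminal if it lies within 3 positions of either read end. All names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MIN_COVERAGE = 3          # from the text: at least 3x coverage
MIN_FREQUENCY = 0.5       # from the text: SNP frequency must exceed 50%
END_MARGIN = 3            # from the text: last 3 positions of a read
MIN_INTERIOR_READS = 2    # ASSUMPTION: stand-in for "multiple reads"

# One observation per read covering the site:
# (base, 0-based position within the read, read length)
Observation = Tuple[str, int, int]


def call_snp(ref_base: str, observations: Iterable[Observation]) -> Optional[str]:
    """Return the variant base if the site passes all criteria, else None."""
    obs = list(observations)
    coverage = len(obs)
    if coverage < MIN_COVERAGE:
        return None

    counts = Counter(base for base, _, _ in obs)
    base, n = counts.most_common(1)[0]
    if base == ref_base or n / coverage <= MIN_FREQUENCY:
        return None

    # Count supporting reads in which the variant is not near a read end
    # (ASSUMPTION: both ends are checked, not only the 3' end).
    interior = sum(
        1
        for b, pos, length in obs
        if b == base and END_MARGIN <= pos < length - END_MARGIN
    )
    return base if interior >= MIN_INTERIOR_READS else None
```

A site rejected only by the read-end check would, in the spirit of the manual verification described above, be a candidate for visual inspection rather than automatic exclusion.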

Phylogenetic divergence and diversity analyses

For each segment, maximum likelihood (ML) trees were estimated with IQ-TREE v.2.2.6 [92] under a GTR + G model of nucleotide substitution. Divergence was measured as the total tree length, and diversity as the average pairwise distance between the tips of the tree. The same strategy as outlined by Patrono and colleagues [12] was used to create subsamples from the 2009H1N1pdm with spatiotemporal spacing matching that of the available 1918 sample of each segment. For this, an in-house developed R script capitalising on the ape and readxl packages was used [93–95].
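The two summary statistics defined above can be made concrete with a short sketch. The study used an in-house R script based on ape; the Python code below is a separate illustration on a hard-coded toy tree, with a hypothetical child-to-parent representation instead of a parsed IQ-TREE Newick file. Divergence is the sum of all branch lengths; diversity is the mean patristic (along-branch) distance over all pairs of tips.

```python
from itertools import combinations
from typing import Dict, Tuple

# Toy representation: child -> (parent, branch length leading to the child)
Tree = Dict[str, Tuple[str, float]]


def tree_length(tree: Tree) -> float:
    """Divergence: total length of all branches."""
    return sum(length for _, length in tree.values())


def ancestor_distances(tree: Tree, node: str) -> Dict[str, float]:
    """Distance from `node` to itself and to each of its ancestors."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def mean_pairwise_tip_distance(tree: Tree) -> float:
    """Diversity: average patristic distance between all pairs of tips."""
    parents = {parent for parent, _ in tree.values()}
    tips = [node for node in tree if node not in parents]
    paths = {tip: ancestor_distances(tree, tip) for tip in tips}
    dists = [
        # The closest shared ancestor minimises the summed path length.
        min(paths[a][x] + paths[b][x] for x in paths[a] if x in paths[b])
        for a, b in combinations(tips, 2)
    ]
    return sum(dists) / len(dists)


# Toy tree in Newick notation: ((A:1,B:2)n1:0.5,C:3)root;
toy = {"A": ("n1", 1.0), "B": ("n1", 2.0), "n1": ("root", 0.5), "C": ("root", 3.0)}
```

For the toy tree, the pairwise distances are 3.0 (A to B), 4.5 (A to C) and 5.5 (B to C). Both statistics depend only on branch lengths, so the arbitrary root placement in this representation does not affect them.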

Supplementary Information

12915_2025_2282_MOESM1_ESM.pdf (1.3MB, pdf)

Additional file 1: Fig. S1-S7. FigS1–RNA quantification after extraction. FigS2–Mismatches and read-length of IAV mapping for positive control BE-572. FigS3–The weekly course of the 1918–1920 pandemic in Zurich. FigS4–Maximum Likelihood trees for all individual IAV segments. FigS5–R6 Protocol fragment length distribution from 300 PE sequencing. FigS6–Ligation Protocol fragment length distribution from 300 PE sequencing. FigS7–Mapping results against human reference.

12915_2025_2282_MOESM2_ESM.xlsx (16.6KB, xlsx)

Additional file 2: Table S1: Overview table of IAV mapping results.

Acknowledgements

Olivia Keiser, Ella Ziegler, Christian Althaus and Joël Floris contributed to the interpretation of the historical data and/or the historical context analysis. Sequencing for this publication was conducted by the Functional Genomics Center Zurich and the Next Generation Sequencing Facility of the Vienna Biocenter Core Facilities. Lennart Opitz contributed to the analysis of the human transcriptome data. Anja Furtwängler tested the protocol and provided feedback.

Abbreviations

aDNA: Ancient DNA

aRNA: Ancient RNA

C: Cytosine

FFPE: Formalin-fixed paraffin-embedded

HA: Hemagglutinin

HIV-1: Human immunodeficiency virus 1

IAV: Influenza A virus

ML: Maximum likelihood

MP1: Matrix Protein 1

MP2: Matrix Protein 2

NA: Neuraminidase

NS1: Non-Structural Protein 1

NS2: Non-Structural Protein 2

NP: Nucleoprotein

PA: Polymerase Acidic

PB1: Polymerase Basic 1

PB2: Polymerase Basic 2

PNK: Polynucleotide Kinase

SNP: Single-nucleotide polymorphism

T: Thymine

Authors' contributions

V.J.S. and C.U. conceived and designed the study. F.R., S.C., N.W., T.S., and S.C.-S. provided samples and K.M., M.L.V., N.M.B.-K., and K.S. provided historical context and data. V.J.S. supervised the work. C.U., A.D., and L.V.P. performed the experimental work. C.U., B.V., P.L., and S.C.-S. analysed the sequencing data. V.J.S., K.S., and F.R. provided funding. C.U., V.J.S., and K.S. wrote the manuscript with input from all authors. All authors read and approved the final manuscript.

Funding

Open access funding provided by University of Basel. This work was supported by the University of Zurich’s University Research Priority Program “Evolution in Action: From Genomes to Ecosystems” (V.J.S.), the Foundation for Research in Science and the Humanities at the University of Zurich (PI Kaspar Staub, Grant-No. STWF-21–011), the Digitalization Initiative of the Zurich Higher Education Institutions DIZH (PI Kaspar Staub, Grant-No. 2021.1_RC_ID_15), and the Mäxi Foundation (F.R.). P.L. acknowledges funding from the Research Foundation—Flanders (“Fonds voor Wetenschappelijk Onderzoek – Vlaanderen,” G051322N) and the EU Horizon 2020 grant MOOD (H2020-874850). Bram Vrancken is a postdoctoral researcher of the Fonds de la Recherche Scientifique – FNRS.

Data availability

In agreement with the ethics committee of the Charité, Berlin, and the IEM Human Remains Scientific Committee, human reads were removed from all sequencing files before uploading them to NCBI under the following BioProject ID: PRJNA1181848.

Historical data is available under https://zenodo.org/records/7986584 or 10.5281/zenodo.7986584 or www.leaddata.ch.

Declarations

Ethics approval and consent to participate

For the German sample, ethics approval was obtained from the ethics committee of the Charité (Berlin, Germany) under the reference number EA4/212/19.

All human samples from Switzerland are more than 70 years old and anonymized. Therefore, they do not require ethical approval for genetic analysis under current Swiss law (https://www.admin.ch/opc/de/classified-compilation/20061313/index.html).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Christian Urban, Email: christian.urban@uzh.ch.

Verena J. Schuenemann, Email: verena.schuenemann@unibas.ch

References

  • 1. Miller MA, Viboud C, Balinska M, Simonsen L. The signature features of influenza pandemics–implications for policy. N Engl J Med. 2009;360:2595–8.
  • 2. Worobey M, Han G-Z, Rambaut A. Genesis and pathogenesis of the 1918 pandemic H1N1 influenza A virus. Proc Natl Acad Sci U S A. 2014;111:8107–12.
  • 3. Worobey M, Han G-Z, Rambaut A. A synchronized global sweep of the internal genes of modern avian influenza virus. Nature. 2014;508:254–7.
  • 4. Mänz B, Dornfeld D, Götz V, Zell R, Zimmermann P, Haller O, et al. Pandemic influenza A viruses escape from restriction by human MxA through adaptive mutations in the nucleoprotein. PLoS Pathog. 2013;9:e1003279.
  • 5. Srinivasan A, Viswanathan K, Raman R, Chandrasekaran A, Raguram S, Tumpey TM, et al. Quantitative biochemical rationale for differences in transmissibility of 1918 pandemic influenza A viruses. Proc Natl Acad Sci U S A. 2008;105:2800–5.
  • 6. Murray CJL, Lopez AD, Chin B, Feehan D, Hill KH. Estimation of potential global pandemic influenza mortality on the basis of vital registry data from the 1918–20 pandemic: a quantitative analysis. Lancet. 2006;368:2211–8.
  • 7. Mills CE, Robins JM, Lipsitch M. Transmissibility of 1918 pandemic influenza. Nature. 2004;432:904–6.
  • 8. Morens DM, Taubenberger JK, Fauci AS. A centenary tale of two pandemics: the 1918 influenza pandemic and COVID-19, part I. Am J Public Health. 2021;111:1086–94.
  • 9. Morens DM, Taubenberger JK. The mother of all pandemics is 100 years old (and going strong)! Am J Public Health. 2018;108:1449–54.
  • 10. Matthes KL, Le Vu M, Bhattacharyya U, Galliker A, Kordi M, Floris J, et al. Reinfections and cross-protection in the 1918/19 influenza pandemic: Revisiting a survey among male and female factory workers. Int J Public Health. 2023;68:1605777.
  • 11. Barry JM, Viboud C, Simonsen L. Cross-protection between successive waves of the 1918–1919 influenza pandemic: epidemiological evidence from US Army camps and from Britain. J Infect Dis. 2008;198:1427–34.
  • 12. Patrono LV, Vrancken B, Budt M, Düx A, Lequime S, Boral S, et al. Archival influenza virus genomes from Europe reveal genomic variability during the 1918 pandemic. Nat Commun. 2022;13:2314. Raw sequencing data available under European Nucleotide Archive Project PRJEB41631: https://www.ebi.ac.uk/ena/browser/view/PRJEB41631.
  • 13. Simonsen L, Viboud C, Chowell G, Andreasen V, Olson DR, Parekh V, et al. The need for interdisciplinary studies of historic pandemics. Vaccine. 2011;29(Suppl 2):B1–5.
  • 14. Reid AH, Fanning TG, Hultin JV, Taubenberger JK. Origin and evolution of the 1918 “Spanish” influenza virus hemagglutinin gene. Proc Natl Acad Sci. 1999;96:1651–6.
  • 15. Taubenberger JK, Reid AH, Krafft AE, Bijwaard KE, Fanning TG. Initial genetic characterization of the 1918 “Spanish” influenza virus. Science. 1997;275:1793–6.
  • 16. Reid AH, Fanning TG, Janczewski TA, Taubenberger JK. Characterization of the 1918 “Spanish” influenza virus neuraminidase gene. Proc Natl Acad Sci U S A. 2000;97:6785–90.
  • 17. Basler CF, Reid AH, Dybing JK, Janczewski TA, Fanning TG, Zheng H, et al. Sequence of the 1918 pandemic influenza virus nonstructural gene (NS) segment and characterization of recombinant viruses bearing the 1918 NS genes. Proc Natl Acad Sci U S A. 2001;98:2746–51.
  • 18. Reid AH, Fanning TG, Janczewski TA, McCall S, Taubenberger JK. Characterization of the 1918 “Spanish” influenza virus matrix gene segment. J Virol. 2002;76:10717–23.
  • 19. Reid AH, Fanning TG, Janczewski TA, Lourens RM, Taubenberger JK. Novel origin of the 1918 pandemic influenza virus nucleoprotein gene. J Virol. 2004;78:12462–70.
  • 20. Taubenberger JK, Reid AH, Lourens RM, Wang R, Jin G, Fanning TG. Characterization of the 1918 influenza virus polymerase genes. Nature. 2005;437:889–93.
  • 21. Xiao Y, Sheng Z-M, Williams SL, Taubenberger JK. Two complete 1918 influenza A/H1N1 pandemic virus genomes characterized by next-generation sequencing using RNA isolated from formalin-fixed, paraffin-embedded autopsy lung tissue samples along with evidence of secondary bacterial co-infection. MBio. 2024;15:e0321823.
  • 22. Xiao YL, Kash JC, Beres SB, Sheng ZM, Musser JM, Taubenberger JK. High-throughput RNA sequencing of a formalin-fixed, paraffin-embedded autopsy lung tissue sample from the 1918 influenza pandemic. J Pathol. 2013;229:535–45.
  • 23. Lindahl T. The Croonian Lecture, 1996: endogenous damage to DNA. Philos Trans R Soc Lond B Biol Sci. 1996;351:1529–38.
  • 24. Deng M, Wang X, Xiong Z, Tang P. Control of RNA degradation in cell fate decision. Front Cell Dev Biol. 2023;11:1164546.
  • 25. Prentice LM, Miller RR, Knaggs J, Mazloomian A, Aguirre Hernandez R, Franchini P, et al. Formalin fixation increases deamination mutation signature but should not lead to false positive mutations in clinical practice. PLoS ONE. 2018;13:e0196434.
  • 26. Hoffman EA, Frey BL, Smith LM, Auble DT. Formaldehyde crosslinking: a tool for the study of chromatin complexes. J Biol Chem. 2015;290:26404–11.
  • 27. Karmakar S, Harcourt EM, Hewings DS, Scherer F, Lovejoy AF, Kurtz DM, et al. Organocatalytic removal of formaldehyde adducts from RNA and DNA bases. Nat Chem. 2015;7:752–8.
  • 28. Krafft AE, Duncan BW, Bijwaard KE, Taubenberger JK, Lichy JH. Optimization of the isolation and amplification of RNA from formalin-fixed, paraffin-embedded tissue: the Armed Forces Institute of Pathology Experience and Literature Review. Mol Diagn. 1997;2:217–30.
  • 29. Yi QQ, Yang R, Shi JF, Zeng NY, Liang DY, Sha S, et al. Effect of preservation time of formalin-fixed paraffin-embedded tissues on extractable DNA and RNA quantity. J Int Med Res. 2020;48:300060520931259.
  • 30. Düx A, Lequime S, Patrono LV, Vrancken B, Boral S, Gogarten JF, et al. Measles virus and rinderpest virus divergence dated to the sixth century BCE. Science. 2020;368:1367–70.
  • 31. Boer LL, Kircher SG, Rehder H, Behunova J, Winter E, Ringl H, et al. History and highlights of the teratological collection in the Narrenturm, Vienna (Austria). Am J Med Genet A. 2023;191:1301–24.
  • 32. Smith O, Palmer SA, Clapham AJ, Rose P, Liu Y, Wang J, et al. Small RNA activity in archeological barley shows novel germination inhibition in response to environment. Mol Biol Evol. 2017;34:2555–62.
  • 33. Smith O, Dunshea G, Sinding MHS, Fedorov S, Germonpre M, Bocherens H, et al. Ancient RNA from late Pleistocene permafrost and historical canids shows tissue-specific transcriptome survival. PLoS Biol. 2019;17:e3000166.
  • 34. van Gurp TP, McIntyre LM, Verhoeven KJF. Consistent errors in first strand cDNA due to random hexamer mispriming. PLoS ONE. 2013;8:e85583.
  • 35. Burkhard-Koren NM, Haberecker M, Maccio U, Ruschitzka F, Schuepbach RA, Zinkernagel AS, et al. Higher prevalence of pulmonary macrothrombi in SARS-CoV-2 than in influenza A: autopsy results from “Spanish flu” 1918/1919 in Switzerland to Coronavirus disease 2019. Hip Int. 2021;7:135–43.
  • 36. Koren NM, Heitz PU. Die Spanische Grippe in Zürich 1918/19: Erfahrungen aus heutiger Sicht anhand von 970 Sektionen des Pathologischen Institutes Zürich. Diss. Univ. Zürich, 2003.
  • 37. Krämer D, Pfister C, Segesser DM. “Woche für Woche neue Preisaufschläge”: Nahrungsmittel-, Energie- und Ressourcenkonflikte in der Schweiz des Ersten Weltkrieges. 2., unveränderte Auflage. Berlin: Schwabe Verlag; 2016.
  • 38. Staub K, Jüni P, Urner M, Matthes KL, Leuch C, Gemperle G, et al. Public health interventions, epidemic growth, and regional variation of the 1918 influenza pandemic outbreak in a Swiss Canton and its greater regions. Ann Intern Med. 2021;174:533–9.
  • 39. Tscherrig A. Krankenbesuche verboten!: die Spanische Grippe 1918/19 und die kantonalen Sanitätsbehörden in Basel-Landschaft und Basel-Stadt. Verlag Basel-Landschaft; 2016.
  • 40. Chowell G, Ammon CE, Hengartner NW, Hyman JM. Estimation of the reproductive number of the Spanish flu epidemic in Geneva, Switzerland. Vaccine. 2006;24:6747–50.
  • 41. Ziegler E, Matthes KL, Middelkamp PW, Schuenemann VJ, Althaus CL, Rühli F, et al. Retrospective modelling of the disease and mortality burden of the 1918–1920 influenza pandemic in Zurich, Switzerland. Epidemics. 2025;50:100813.
  • 42. Thalmann H. Die Grippeepidemie 1918/19 im Zürich. Th. méd. Zurich, 1968; 1968.
  • 43. Bureau ES. Die Influenza-Pandemie in der Schweiz 1918/1919. Bull des Schweizerischen Gesundheitsamtes. 1919;29:337–44.
  • 44. Sheng Z-M, Chertow DS, Ambroggio X, McCall S, Przygodzki RM, Cunningham RE, et al. Autopsy series of 68 cases dying before and during the 1918 influenza pandemic peak. Proc Natl Acad Sci U S A. 2011;108:16416–21.
  • 45. Reid AH, Janczewski TA, Lourens RM, Elliot AJ, Daniels RS, Berry CL, et al. 1918 influenza pandemic caused by highly conserved viruses with two receptor-binding variants. Emerg Infect Dis. 2003;9:1249–53.
  • 46. Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018;4:vey016.
  • 47. Worobey M, Gemmel M, Teuwen DE, Haselkorn T, Kunstman K, Bunce M, et al. Direct evidence of extensive diversity of HIV-1 in Kinshasa by 1960. Nature. 2008;455:661–4.
  • 48. Worobey M, Watts TD, McKay RA, Suchard MA, Granade T, Teuwen DE, et al. 1970s and “Patient 0” HIV-1 genomes illuminate early HIV/AIDS history in North America. Nature. 2016;539:98–101.
  • 49. Jones W, Greytak S, Odeh H, Guan P, Powers J, Bavarva J, et al. Deleterious effects of formalin-fixation and delays to fixation on RNA and miRNA-Seq profiles. Sci Rep. 2019;9:6980.
  • 50. Thakral S, Purohit P, Mishra R, Gupta V, Setia P. The impact of RNA stability and degradation in different tissues to the determination of post-mortem interval: a systematic review. Forensic Sci Int. 2023;349:111772.
  • 51. Liu Y, He H, Yi S, Hu Q, Zhang W, Huang D. Comparison of different methods for repairing damaged DNA from buffered and unbuffered formalin-fixed tissues. Int J Legal Med. 2018;132:675–81.
  • 52. Koen JS. A practical method for field diagnosis of swine disease. Am J Vet Med. 1919;14:468–70.
  • 53. Nelson MI, Worobey M. Origins of the 1918 pandemic: revisiting the swine “mixing vessel” hypothesis. Am J Epidemiol. 2018;187:2498–502.
  • 54. He CQ, He M, He HB, Wang HM, Ding NZ. The matrix segment of the “Spanish flu” virus originated from intragenic recombination between avian and human influenza A viruses. Transbound Emerg Dis. 2019;66:2188–95.
  • 55. Smith O. Small RNA-mediated regulation, adaptation and stress response in barley archaeogenome. Doctoral dissertation. University of Warwick; 2012. https://wrap.warwick.ac.uk/id/eprint/57032/.
  • 56. Engreitz JM, Sirokman K, McDonel P, Shishkin AA, Surka C, Russell P, et al. RNA-RNA interactions enable specific targeting of noncoding RNAs to nascent Pre-mRNAs and chromatin sites. Cell. 2014;159:188–99.
  • 57. Küpfer PA, Crey-Desbiolles C, Leumann CJ. Trans-lesion synthesis and RNaseH activity by reverse transcriptases on a true abasic RNA template. Nucleic Acids Res. 2007;35:6846–53.
  • 58. Velema WA, Lu Z. Chemical RNA cross-linking: mechanisms, computational analysis, and biological applications. JACS Au. 2023;3:316–32.
  • 59. Dator RP, Murray KJ, Luedtke MW, Jacobs FC, Kassie F, Nguyen HD, et al. Identification of formaldehyde-induced DNA-RNA cross-links in the A/J mouse lung tumorigenesis model. Chem Res Toxicol. 2022;35:2025–36.
  • 60. Suryo Rahmanto A, Blum CJ, Scalera C, Heidelberger JB, Mesitov M, Horn-Ghetko D, et al. K6-linked ubiquitylation marks formaldehyde-induced RNA-protein crosslinks for resolution. Mol Cell. 2023;83:4272–89.e10.
  • 61. Prashar T, De La Selle F, Hudak KA. Abasic RNA: its formation and potential role in cellular stress response. RNA Biol. 2023;20:348–58.
  • 62. Dawson WK, Lazniewski M, Plewczynski D. RNA structure interactions and ribonucleoprotein processes of the influenza A virus. Brief Funct Genomics. 2018;17:402–14.
  • 63. Moeller A, Kirchdoerfer RN, Potter CS, Carragher B, Wilson IA. Organization of the influenza virus replication machinery. Science. 2012;338:1631–4.
  • 64. Sawyer S, Krause J, Guschanski K, Savolainen V, Pääbo S. Temporal patterns of nucleotide misincorporations and DNA fragmentation in ancient DNA. PLoS ONE. 2012;7:e34131.
  • 65. Brotherton P, Endicott P, Sanchez JJ, Beaumont M, Barnett R, Austin J, et al. Novel high-resolution characterization of ancient DNA reveals C > U-type base modification events as the sole cause of post mortem miscoding lesions. Nucleic Acids Res. 2007;35:5717–28.
  • 66. Briggs AW, Stenzel U, Johnson PLF, Green RE, Kelso J, Prüfer K, et al. Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci U S A. 2007;104:14616–21.
  • 67. Smith O, Gilbert MTP. Ancient RNA. In: Lindqvist C, Rajora OP, editors. Paleogenomics: genome-scale analysis of ancient DNA. Cham: Springer International Publishing; 2019. p. 53–74.
  • 68. Ashenberg O, Padmakumar J, Doud MB, Bloom JD. Deep mutational scanning identifies sites in influenza nucleoprotein that affect viral inhibition by MxA. PLoS Pathog. 2017;13:e1006288.
  • 69. Chen GW, Chang SC, Mok CK, Lo YL, Kung YN, Huang JH, et al. Genomic signatures of human versus avian influenza A viruses. Emerg Infect Dis. 2006;12:1353–60.
  • 70. Pan C, Cheung B, Tan S, Li C, Li L, Liu S, et al. Genomic signature and mutation trend analysis of pandemic (H1N1) 2009 influenza A virus. PLoS ONE. 2010;5:e9549.
  • 71. Haller O, Kochs G. Human MxA protein: an interferon-induced dynamin-like GTPase with broad antiviral activity. J Interferon Cytokine Res. 2011;31:79–87.
  • 72. Haller O, Kochs G. Interferon-induced mx proteins: dynamin-like GTPases with antiviral activity. Traffic. 2002;3:710–7.
  • 73. Gao S, von der Malsburg A, Dick A, Faelber K, Schröder GF, Haller O, et al. Structure of myxovirus resistance protein a reveals intra- and intermolecular domain interactions required for the antiviral function. Immunity. 2011;35:514–25.
  • 74. Pavlovic J, Haller O, Staeheli P. Human and mouse Mx proteins inhibit different steps of the influenza virus multiplication cycle. J Virol. 1992;66:2564–9.
  • 75. Zimmermann P, Mänz B, Haller O, Schwemmle M, Kochs G. The viral nucleoprotein determines Mx sensitivity of influenza A viruses. J Virol. 2011;85:8133–40.
  • 76. Stevens J, Blixt O, Glaser L, Taubenberger JK, Palese P, Paulson JC, et al. Glycan microarray analysis of the hemagglutinins from modern and pandemic influenza viruses reveals different receptor specificities. J Mol Biol. 2006;355:1143–55.
  • 77. Tumpey TM, Basler CF, Aguilar PV, Zeng H, Solórzano A, Swayne DE, et al. Characterization of the reconstructed 1918 Spanish influenza pandemic virus. Science. 2005;310:77–80.
  • 78. Watanabe T, Tisoncik-Go J, Tchitchek N, Watanabe S, Benecke AG, Katze MG, et al. 1918 Influenza virus hemagglutinin (HA) and the viral RNA polymerase complex enhance viral pathogenicity, but only HA induces aberrant host responses in mice. J Virol. 2013;87:5239–54.
  • 79. Watanabe T, Kawaoka Y. Pathogenesis of the 1918 pandemic influenza virus. PLoS Pathog. 2011;7:e1001218.
  • 80. Studer E. Ueber die Influenza-Epidemie 1918/19 nach Beobachtungen auf der medizinischen Universitätsklinik in Zürich. Buchdruckerei Schöpfheim; 1920.
  • 81. Ziegler E, Birkhölzer I, Simola J, Matthes K, Floris J, Staub K. All cause mortality and morbidity from influenza in the City and the Canton of Zurich, 1910–1970. 2023.
  • 82. Gansauge M-T, Aximu-Petri A, Nagel S, Meyer M. Manual and automated preparation of single-stranded DNA libraries for the sequencing of DNA from ancient biological remains and other sources of highly degraded DNA. Nat Protoc. 2020;15:2279–300.
  • 83. Peltzer A, Jäger G, Herbig A, Seitz A, Kniep C, Krause J, et al. EAGER: efficient ancient genome reconstruction. Genome Biol. 2016;17:60.
  • 84. Andrews S. FastQC: a quality control tool for high throughput sequence data. 2010.
  • 85. Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016;9:88.
  • 86. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–60.
  • 87. Broad Institute. Picard tools. 2017. https://github.com/broadinstitute/picard/releases. Accessed 1 Mar 2024.
  • 88. Neukamm J, Peltzer A, Nieselt K. DamageProfiler: fast damage pattern calculation for ancient DNA. Bioinformatics. 2021;37:3652–3.
  • 89. Frankish A, Diekhans M, Ferreira A-M, Johnson R, Jungreis I, Loveland J, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019;47:D766–73. GENCODE Release 42 (GRCh38.p13) available under the following link: https://www.gencodegenes.org/human/release_42.html.
  • 90. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
  • 91. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–90.
  • 92. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol. 2020;37:1530–4.
  • 93. Paradis E, Schliep K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics. 2019;35:526–8.
  • 94. Wickham H, Bryan J. readxl: read excel files. 2023.
  • 95. R Core Team. R: a language and environment for statistical computing. 2023. https://www.R-project.org.
