Developing and Benchmarking One Health Genomic Surveillance Tools for Influenza A Virus in Wastewater

Minxi Jiang; Audrey LW Wang; James B Thissen; Kara L Nelson; Lenore Pipes; Rose S Kantor

doi:10.1101/2025.09.19.676942

This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Sep 20:2025.09.19.676942. [Version 1] doi: 10.1101/2025.09.19.676942

Developing and Benchmarking One Health Genomic Surveillance Tools for Influenza A Virus in Wastewater

Minxi Jiang ^a, Audrey LW Wang ^a, James B Thissen ^b, Kara L Nelson ^a, Lenore Pipes ^c,^*, Rose S Kantor ^b,^*

PMCID: PMC12458313 PMID: 41000925

Abstract

Influenza A viruses (IAV) remain a persistent One Health threat, and whole-genome sequencing from wastewater offers a promising surveillance tool. However, IAV is at low abundance in wastewater, making it difficult to sequence. We benchmarked four targeted enrichment methods suited for whole-genome sequencing including custom and off-the-shelf amplicon and probe-based methods. Our custom HA tiled-amplicon panel was sensitive, fast, and cost-effective, making it suitable for monitoring low-abundance seasonal variants of known subtypes. However, its reliance on conserved and intact primer-binding sites limited primer design to fewer subtypes. A previously published universal amplicon method targeted all IAV subtypes, but it performed poorly in wastewater due to its reliance on intact genome segments. Probe-capture methods were resilient to RNA degradation and mismatches, potentially enabling broader surveillance and detection of emerging strains. However, probes were costly, labor-intensive, and less sensitive than tiled-amplicon. When testing compatibility of sequencing methods with upstream virus concentration and extraction methods, ultrafiltration-based virus concentration outperformed large-volume direct extraction with all four sequencing methods. This set of benchmarking comparisons and custom panels provides needed information for the translation of IAV genomic sequencing into a routine component of wastewater surveillance.

1. Introduction

Influenza A viruses (IAV) are panzoonotic, and animal spillover events have led to at least 5 previous pandemics in the 20th and 21st centuries^1,2. Of recent concern, highly pathogenic avian influenza H5N1 clade 2.3.4.4.b reached wild birds in the US in 2022, subsequently spreading to poultry, dairy cattle, and humans with multiple species crossover events since 2024³. In this context, whole genome sequencing of human and animal specimens has provided critical information for tracking the origins of human infections, detecting adaptation to human hosts and antiviral resistance, and examining the suitability of existing candidate vaccine viruses⁴. Meanwhile, the US CDC conducts routine monitoring of human seasonal influenza through whole genome sequencing of approximately 7,000 clinical specimens each year. These sequences are used to inform vaccine development for seasonal IAV strains (H3N2 and H1N1)⁵. However, experiences from surveillance during the COVID-19 pandemic highlighted biases in clinical surveillance systems due to differences in symptoms, testing availability, and test-seeking behavior for different populations⁶. Wastewater-based genomic surveillance has the potential to complement clinical data and to integrate human and animal testing with fewer biases, operationalizing the One Health paradigm.

Acquisition of high-quality influenza A whole genome sequences from wastewater is challenging due to viral genetic diversity across multiple subtypes, the segmented nature of the genome, low concentrations of target viral RNA, and a complex sequencing background. To overcome these challenges, three principal enrichment strategies have been applied, each with limited success: first, a whole genome amplification method using universal primers was paired with nanopore sequencing, making use of the highly conserved 5’ and 3’ terminal sequences present on all segments across subtypes⁷. Lee et al. reported sequencing data for just 12 of 48 samples tested, and no single sample yielded all eight segments⁸. Second, several multi-virus probe-capture panels have also been applied to detect the whole genome of IAV in wastewater, including Illumina’s RVOP^9–11, Qiagen’s xHYB adventitious agent panel¹², and Twist’s Comprehensive Viral Research panel¹³. However, in all reports, IAV genomes were highly incomplete unless data from many samples were combined. While whole-genome recovery is valuable for identifying reassortment events and mutations across different segments, each carrying different public health risks, prioritizing key segments may offer a faster and potentially more sensitive alternative for subtype tracking, vaccine development, and early warning. A recently developed tiled amplicon panel covering the HA, NA, and M segments of seasonal subtypes was applied to wastewater, with successful sequencing of H1N1 but limited detection of H3N2¹⁴. A commercial tiled amplicon panel targeting H1N1, H1N2, and H3N2 (CleanPlex Respiratory Virus, Paragon Genomics) was also applied to wastewater, yielding high recovery for six of eight IAV segments¹⁵.

The quality of extracted nucleic acids from wastewater may also limit IAV genome recovery. RNA degradation during wastewater transit and sample processing could lead to fragmented genomes that cannot be enriched by certain downstream sequencing methods¹⁶. Reflecting this challenge, we previously found that wastewater processing methods that were more sensitive for PCR-based viral detection did not necessarily yield the best results for sequencing¹⁷, Additionally, we predicted that the ratio between target viral nucleic acid and non-target nucleic acids in a sample may affect the sensitivity of probe-capture sequencing. This ratio is largely determined by the method of wastewater concentration and by the number of targets in the probe panel¹⁸. Further work is needed to understand the interaction between sample processing methods and sequencing methods for specific viruses of interest such as IAV.

The goal of this study was to develop a One Health-focused, wastewater-based genomic surveillance workflow, which is capable of sensitively, efficiently, and cost-effectively detecting and characterizing both circulating and emerging influenza A viruses (IAVs). To our knowledge when we embarked on this research, no publicly available tiled amplicon methods yet existed for IAV subtype H5N1. Similarly, no IAV-specific probe set could target both zoonotic and seasonal human subtypes, although human-specific broad viral panels (Twist CVRP, Illumina VSP) and avian-specific IAV panels existed¹⁹. This motivated us to design two custom panels (1) a HA-segment tiled-amplicon panel prioritized for H1N1, H3N2, and H5N1 subtypes; (2) a probe-capture panel covering the whole genome (8 segments) of 11 representative panzoonotic IAV subtypes. We benchmarked the performance of these assays against two existing sequencing methods (Table 1): (3) a pan-viral whole-genome probe-capture panel (Probe-Twist), and (4) a whole-genome amplification method covering all IAV subtypes and segments (Universal-amplicon)⁷. We report on challenges in custom panel design, sequencing sensitivity, interactions with upstream virus concentration methods, and logistical factors such as cost and turnaround time.

Table 1.

Sequence enrichment methods applied in this study.

Method	Targeted segment/subtype	Sequencing	Input vol. (uL)	PCR cycles	Insert size (bp)	Avg. Depth per sample (M reads)^*	Source
Tiled-amplicon	HA / H1N1, H3N2, H5N1	NextSeq 300 PE	5	35+8	400–565	8.4 ± 1.5	This study
Probe-IAV	All / 11 subtypes^**	NextSeq 300 PE	15	12+17	300–500	20.3 ± 2.0	This study
Probe-Twist	All /119	NextSeq 150 PE	15	10+8	180–220	2.8 ± 0.1	Twist
Universal-amplicon	All / all	minION	5	35		0.5 ± 0.2	Zhou et al. (2009)⁷
Unenriched	All / all	NextSeq 300 PE	15	12	300–500	14.4	NA

Open in a new tab

All values are reported for S1, except for the unenriched samples, for which the sequencing depth is reported for S3. The sequencing depths for other samples are included in Table S7.

^**

see Supplementary Methods.

2. Results

2.1. Design of the Probe-IAV and HA segment tiled-amplicon panel

To compile a dataset for probe design, we first explored sequences and host diversity of IAV genomes. Searching GISAID for the 11 most prevalent panzoonotic IAV subtypes yielded 486,782 complete genomes collected within the previous year (Supplementary Methods). We noted that the number of H1N1 and H3N2 sequences released in a single year exceeded the total 5-year data for H5N1 and all archived sequences of other subtypes (Table S1). Further, the recent H1N1 and H3N2 sequences were predominantly from human hosts. To improve host diversity, we added avian and swine H1N1 and H3N2 sequences from the past 10 years (Figure S1). We also included sequences from the four spike-in strains used for benchmarking, resulting in a final dataset of 525,075 sequences. We used Syotti for probe design, testing multiple strategies to balance genome coverage, probe count, and hybridization efficiency across IAV genomes. Clustering input sequences at 100% identity initially produced up to 69,835 probes. A stepwise design approach, which targeted 90% of genomes with stricter mismatch limits and the remaining 10% with relaxed criteria (e.g., allowing gaps), helped reduce the probe count but still exceeded our target of fewer than 7,500 probes (Table S2). Ultimately, clustering IAV genomes at 90% identity resulted in a final set of 7,448 probes (Design 15, Table S2). When probes were aligned to the spike-in reference genomes, all spike-in viruses showed coverage breadth >75%, except for H3N1, where the MP and PB2 segments exhibited lower coverage breadth (Figure S2). We also observed regions with higher predicted probe density (>4x), particularly at the ends of each segment in the H1N1 genome.

Tiled primer design used a dataset consisting of HA sequences from the previous one-year for H3N2 and H1N1, and 5 years for H5N1. Unlike probe design, clustering was performed at 100% identity to account for the higher specificity of PCR. The final datasets contained sequences from 8558, 5876, and 6052 strains for H1N1, H3N2, and H5N1, respectively. We investigated design tools including PriMux^20,21, PrimalScheme²², and Olivar²³. The latter two tools have since been updated^24,25, but at the time of our design, PrimalScheme input was limited to 200 reference genomes, while Olivar required a risk profile rather than an alignment. No tool could accommodate segmented genomes as input. These restrictions prompted us to choose PriMux and to focus solely on the HA segment to limit complexity. Although the kmer-based-approach of PriMux allows amplification of a broader diversity of sequences, we chose to perform separate runs for each subtype to improve specificity. PriMux produced shorter-than-requested amplicons, and after tuning parameters, the resulting amplicons were typically around 400 nt. However, some amplicons were shorter or longer than expected and/or spanned other amplicons. Manual review was required for several steps, resulting in a lengthy design process: first, more than one primer set per amplicon was often produced, and manual review was required to choose the best one; second external tools were required to predict primer dimers and hairpins, and the melting temperatures of the primers varied widely; third, compatibility between primer sets from multiple subtypes required manual investigation; lastly, we found some cases where none of the PriMux primer options for a given tile were suitable based on our review, necessitating manual redesign (see Supplementary Methods).

2.2. Comparison of coverage breadth and depth among different methods for IAV sequencing

Four targeted IAV sequencing methods (Table 1) were benchmarked by sequencing three RNA mixtures in triplicate. The mixtures, “S1”, “S2”, and “S3”, were prepared by combining purified RNA from four IAV strains (subtypes H1N1, H3N2, H3N1, and H5N1) with IAV-negative wastewater RNA (Figure 1a). The resulting target-to-background RNA mass ratio increased by approximately 100-fold from S1 to S3, (4.4×10⁻⁸ to 2.4×10⁻⁶), although for H3N1 and H5N1, equal concentrations were targeted to spike into mixtures S2 and S3 due to limited viral titers. Unenriched metagenomic sequencing was also performed with mixture S3 to assess the fold-enrichment of the targeted sequencing methods.

Across all sequencing methods, the coverage breadth of HA segments increased with increasing viral RNA concentrations. At the lowest concentration (mixture S1), the Universal-amplicon method recovered the HA segment only for H3N2, while Probe-Twist panel recovered 86 ± 10% for H3N1 and H3N2, but only 59 ± 4% for H1N1 and 40 ± 17% for H5N1 (Table S8). The two customized approaches, Tiled-amplicon and Probe-IAV, achieved over 78 ± 13% coverage for all four strains (Table S8). At the highest IAV concentrations (mixture S3), Tiled-amplicon, Probe-IAV, and Probe-Twist each recovered near-complete HA segments for H3N1, H3N2, and H5N1 (coverage breadth >98.5%, Figure 1b, Table S8), while tiled amplicon coverage of H1N1 remained limited to 87% due to one failed tile. Universal-amplicon did not produce reads for HA from H5N1 at any concentration, as the synthetic RNA lacked the primer-binding regions. Notably, all enrichment methods outperformed unenriched metagenomic sequencing, which recovered less than 22% of the HA segment for H3N1 and H3N2 even at the highest input concentration (Table S8). We note that coverage breadth is somewhat dependent on total sequencing effort, which is not directly comparable across sequencing platforms, insert sizes, and read lengths. For these analyses, we used the full depth shown in Table S7.

To assess potential regional biases in HA segment recovery, coverage depth profiles for each method were examined in mixture S3 (Figure 1c). The Tiled-amplicon method generally showed higher normalized coverage depth for most regions of the HA segment, and exhibited a repeating pattern of peaks and valleys due to tile overlap, along with reduced coverage for one tile of H3N1 and H3N2. Probe-IAV provided the most uniform HA segment coverage across all strains. In contrast, Probe-Twist exhibited coverage dropouts in H3N2 (500–1000bp) and H5N1 (1200–1500bp). The Universal-amplicon method recovered only the ends of the H1N1 HA segment, while unenriched samples recovered only short and randomly distributed fragments.

For the three whole-genome methods, we expanded our coverage analyses to include all eight segments (Figure S3). Here, the two probe-capture methods both performed well for mixture S3 (Figure S3a and Table S8). For mixture S1, both showed similar coverage breadth across different strains, but Probe-IAV provided higher and more even coverage depth than Probe-Twist (Figure S3b). Notably, when compared to the predicted coverage results, Probe-IAV showed higher empirical coverage breadth for segments with low predicted probe density (e.g., PB2 and NS of H3N1 showed over 95% coverage despite predicted recovery of less than 75%; Figure S2a). Additionally, regions with higher designed probe density did not consistently correspond to greater coverage depth; instead, coverage was generally uniform across the genome (Figure S2b vs. Figure 1c). Unlike the two customized methods, the universal-amplicon method continued to perform poorly for mixture S1, achieving good coverage across all strains only for the two shortest segments, MP and NS (> 99.4%, Table S8), while coverage for other segments varied widely among strains (Figure S3a).

2.3. Comparison of sensitivity and quantitativeness of different sequencing methods

To compare the sensitivity and potential enrichment bias towards different strains, we analyzed the relationship between library input quantity (as determined by dPCR assays targeting the HA segment of each strain) and the resulting coverage depth (RPKM) across all sequencing methods and strains (Figure 2a). H5N1 was excluded from this correlation because dPCR failed for H5 in spike-in samples, even though three sequencing methods yielded reads for H5N1 (Figure 1a).

Figure 2: — (a) Correlation between sequencing library input (HA segment copy number measured by dPCR) and the normalized coverage depth (RPKM) for each sequencing method. Note that library input gene copies accounts for differences in library input volumes. (b) log10(RPKM) values of all 8 segments within each virus strain, colored by sequencing method for mixture S3 (top) and mixture S1 (bottom). Note that in many cases, Probe-IAV and Probe-Twist are completely overlapping.

The correlations were strong for Probe-Twist across all strains (R² = 0.96–0.99, Table S9), while Universal-amplicon displayed the weakest correlations (for H1N1 and H3N2; R² < 0.82, Table S9). Tiled-amplicon and Probe-IAV performed well for H1N1 (R² ≥ 0.97). Notably, the slopes of the regression lines for different IAV subtypes were generally similar within sequencing methods, with highly parallel or overlapping trends (e.g., H3N1 and H3N2 in the two probe-capture methods, Figure 2a). However, regression lines for H3 and H1 were vertically offset within a given sequencing method, showing that normalized coverage depth is predicted to differ by subtype, even if library input quantities are held constant. Among all sequencing methods, the Tiled-amplicon was the most sensitive, yielding the highest coverage depths for all strains, particularly at the lowest input levels (< 10 gc). Universal-amplicon showed lower overall sensitivity but had a steeper response curve for H3N2 (slope = 1.25, Table S9), with RPKM values eventually surpassing that of Probe-Twist and Probe-IAV at higher input concentrations. The two probe-capture methods showed similar performance across the tested concentration range, but more gradual slopes for Probe-IAV suggest this method may be more sensitive at low inputs (Figure 2a). When the coverage depth analysis was extended to compare across segments within a strain, both probe-capture methods (Probe-IAV and Probe-Twist) yielded relatively uniform RPKM values across segments (Figure 2b). In contrast, the Universal-amplicon method exhibited extreme biases towards random segments (Figure 2b). Notably, both probe-capture methods displayed a modest bias toward the MP segment.

2.4. Impact of virus decay and concentration/extraction methods on IAV sequencing

To evaluate the impact of concentration/extraction methods and virus decay on sequencing outcomes, we spiked equal volumes (10 uL) of cultured H1N1, H3N2, and H3N1 into 50 mL raw wastewater aliquots. Immediately after spike-in (time 0), RNA was extracted from one set of aliquots using two different concentration/extraction methods: Innovaprep + Powerviral (IP) and the large-volume Promega Wizard Enviro TNA kit (PMG). The remaining aliquots were incubated for 3 hours at room temperature to reach equilibrium partitioning of viruses to solids and to allow potential viral decay, followed by RNA extraction in triplicate using both methods (Figure 3a). IAV was quantified in extracted RNA using dPCR and sequenced using the four methods.

After three hours of incubation, the PMG method recovered significantly higher HA quantities (by dPCR) than IP for all IAV strains (Figure 3b, p < 0.05, Table S10). However, IP samples yielded higher normalized coverage depth, especially when sequenced with Tiled-amplicon and Probe-Twist (p < 0.05, Table S10). HA segment coverage breadth was similar between IP and PMG-treated samples, with >80% coverage across all strains when sequenced using Tiled-amplicon, Probe-IAV, and Probe-Twist. In contrast, coverage was generally poor with the Universal-amplicon method (Figure 3c). The potential impact of virus decay on genome integrity was inspected by comparing the coverage across the HA genome. All methods except universal-amplicon consistently recovered the full HA segment across all strains, and no region-specific decay was observed. However, sharp drops in coverage depth were observed at the ends of the HA segment with the Probe-IAV method (first 100 bp for H1N1 and the last 100 bp for H3N1, Figure 3d). Overall, coverage depth generally followed the trend IP_0h > IP_3h > PMG_0h > PMG_3h across all sequencing methods, while Universal-amplicon showed greater variability (Figure 3d).

To assess the effects of decay across the whole genome, we compared the log₂ fold change (log₂FC) in coverage breadth between zero and three hours of incubation using both IP and PMG concentration/extraction methods (Figure 3e). IP samples showed little change in coverage breadth across virus strains and segments, with only minor decreases (log₂FC > 0.1) observed in Universal-amplicon. PMG samples exhibited greater coverage loss across more segments for all three whole-genome methods (Table S11). Universal-amplicon showed both increases and decreases in segment coverage with large standard deviations, which suggests stochastic amplification in PMG samples.

2.5. Economic analysis for different sequencing methods

To support practical decision-making and optimize wastewater processing workflows, we conducted a cost and labor analysis of the four sequencing methods (Table S12). The amplicon-based approaches had lower reagent costs ($173 per sample) and labor time (0.38 h per sample for H5 tiled-amplicon) compared to the probe-capture methods. The two probe-capture approaches had similar instrument and labor requirements, but the customized Probe-IAV was slightly less expensive than Probe-Twist in reagent costs ($533 vs. $572 per sample). Processing 24 probe-capture samples required ~40 hours, nearly half of which was hands-off time for PCR cycling and hybridization (~16–17 hours). In contrast, 57 tiled-amplicon samples were processed in ~20 hours, with only ~3 hours attributed to PCR cycling. The sequencing cost also varied depending on the required sequencing depth per sample. Although our study used the lowest sequencing depth for Probe-Twist due to cross-lab operational differences, in practice, the most sensitive method (tiled-amplicon) would likely incur the lowest sequencing depth cost.

3. Discussion

3.1. Performance of different targeted sequencing methods

The four benchmarked sequencing methods differed in their targets (HA segment vs. whole genome), reaction chemistry (PCR vs. hybridization), and library preparation protocols. These differences resulted in varying sequencing performance, which can be leveraged to suit different surveillance objectives.

One objective is the ability to detect low-abundance emerging variants within the seasonal subtypes of IAV. The sensitivity (highest RPKM) and coverage breadth of the HA segment (>87% for all subtypes) achieved with tiled-amplicon make it well-suited for this application. Although our current panel targets only the HA segments of H1N1, H3N2, and H5N1 subtypes, the design principles and performance demonstrated here suggest that this approach can be extended to other segments, enabling broader, subtype-specific coverage as shown in recent studies^26,27. An important concern is that viral evolution could lead to amplicon drop-out or decreased recovery, necessitating primer redesign²⁸ (see below).

Another objective is whole-genome variant analysis of prevalent human and panzoonotic subtypes of IAV in wastewater. Such analysis is important to understand host range adaptation, antiviral resistance, and virulence^29,30, and would be strengthened by the recovery of whole genomes for identification of unique mutations indicative of particular strains. In this regard, the Universal-amplicon method performed poorly, likely due to its dependence on high-quality, high-concentration genomic RNA inputs. (Figure 1b, 1c, and 2b). The two probe panels performed well, consistently recovering near-complete genomes. We observed differences at low input quantities (Figure 1b, 1c, and 2b), perhaps attributable to the probe design and sequencing settings: First, our custom Probe-IAV panel incorporated sequences updated through 2024, including H5N1 2.3.4.4b and all spike-in strains, whereas Probe-Twist used references only up to 2018, with the most recent H5N1 from 2017 (Casey Riegler, Twist Bioscience, personal communication, 7/11/25). Further, the broad viral target range of Probe-Twist may have reduced read coverage for low-abundance targets like IAV^18,31. Second, due to differences in laboratory protocols, Probe-Twist had lower sequencing depth and shorter paired-read length than Probe-IAV (Table 1), which likely reduced its HA segment coverage breadth, especially in the lowest-concentration sample (S1). Despite these differences, the Probe-Twist panel still performed well, although with lower sensitivity for the H5N1 2.3.4.4b strain than the custom Probe-IAV panel. These results highlight the general robustness of the probe-capture approach to detect strains not included in the original design and suggest that regular design updates will increase sensitivity for emerging strains.

In addition to qualitative surveillance information, we were interested in the ability of sequencing to provide semiquantitative information about strain prevalence. This could be achieved if the sequencing method preserved relative abundance information. Our RNA mixtures contained a constant amount of wastewater RNA background with serially diluted viral RNA, but none of the sequencing methods reliably reflected the concentration ratios between subtypes as verified by dPCR (Figure 2a), as H1N1 was consistently undercounted relative to H3N1 and H3N2. Such discrepancies are likely to become more complex in real wastewater samples, where viral composition varies longitudinally and geographically (Section 3.3). We recommend against quantitative comparisons across subtypes using amplicon- or probe-based enrichment, as enrichment efficiencies may differ across subtypes; however, there may be less bias within subtypes, as observed for our H3N1 and H3N2 strains (Figure 2a; highly overlapping lines). In contrast to uneven subtype enrichment, the two probe-capture methods generally enriched all eight IAV segments evenly (Figure 2b), which might assist in identifying potential reassortments across different subtypes.

3.2. Design of Probe Capture vs. Tiled-amplicon Sequencing

We designed custom probe and tiled amplicon panels, allowing us to compare the ease of design and accuracy of in silico performance predictions for both. Successful design of broad panels was easier for probes than primers due to the availability of comprehensive design algorithms and to different constraints inherent in the enrichment mechanisms. In probe capture, random mismatches have been shown to be more detrimental than continuous mismatches³², leading us to choose a strategy that allowed gaps in some regions while keeping strict mismatch limits in others. In silico performance predictions by Syotti underestimated actual coverage breadth, likely because sequence mapping oversimplified these important details of hybridization chemistry.

Meanwhile, PCR-based enrichment is affected by mismatches only if they are in the primer-binding regions. Our simulate_pcr results predicted 2–3 primer mismatches which, although theoretically allowable under permissive conditions, caused certain tiles to fail in vitro. Additional design needs for PCR included consistent ranges of amplicon sizes and melting temperatures, and prevention of primer dimers and unwanted amplicons. These challenges would be expected to intensify when the number of primers increases, e.g., for panels targeting more genome segments, using shorter amplicons, or targeting increased diversity of IAV strains and subtypes. Recent primer design tools, such as Olivar model basic PCR reaction parameters²³. Expanding models in tools like Olivar and Syotti to incorporate reaction dynamics and integrating iterative design-test-refine workflows could improve future performance. Overall, we found that increasing the diversity of input sequences increased both probe and primer counts in design outputs. While the cost of increased probes was higher compared to primers, the mechanistic constraints for increased primers were far greater than for probes.

Both probe-capture and tiled-amplicon panels may require updates over time because Influenza A viruses continually evolve through genetic drift and reassortment, altering the sequence diversity of circulating strains³⁴. Probe panels can be easily updated with newly released sequences or genomes reconstructed from wastewater, and the required frequency of updates depends on the balance between viral evolution rate and mismatch tolerance. The frequency of primer updates for amplicon-based panels depends on whether the primer binding sites remain “low entropy” over time²³. Designing primers that target broader and more conserved regions, where feasible within PCR constraints, could reduce the need for frequent redesign. For example, the Universal-amplicon panel is highly robust to viral evolution due to the conserved 5’ and 3’ termini in IAV genomes. Lastly, although probe design updates are likely to be more straightforward, the lead time and cost required would be expected to be higher than that of primers.

3.3. Wastewater-Specific Challenges and Implications of One Health IAV surveillance

Real-world application of IAV targeted sequencing in wastewater must account for factors that affect RNA yield, integrity, and purity, including concentration and extraction methods, virus decay, and freeze-thaw cycles. Based on our findings here and previously^17,35, large-volume concentration and extraction (PMG) maximizes total IAV RNA yield, whereas ultrafiltration (IP) produces a higher proportion of IAV RNA relative to background, which is particularly beneficial for probe-capture methods where off-target reads are common (Figure 3b). IP also outperformed PMG for amplicon sequencing despite higher dPCR signals with PMG (Figure 3b). One plausible explanation is greater RNA fragmentation in PMG samples: longer amplicons require longer intact RNA segments containing both primer binding sites, making tiled-amplicon and especially universal-amplicon more vulnerable to fragmentation, whereas short dPCR targets remain amplifiable. Beyond extraction, viral RNA quantity and quality may also be impacted by virus decay in wastewater or during freeze-thaw cycles after extraction. This could lead to either overall decreased virus signal or more fragmented RNA. We observed that wastewater incubation led to modest, genome-wide RNA degradation, affecting amplicon methods more severely due to their reliance on intact primer sites. Probe-capture showed a similar overall decline but with end losses and sporadic dips (Figure 3d). More experiments are needed to confirm whether wastewater processing causes region-specific degradation.

Ultimately, the choice of sequencing approach for IAV characterization in wastewater should be guided by the surveillance goal, available budget, and laboratory capacity. Practically, the tiled-amplicon assay is faster and more cost-effective to implement, making it well-suited for high-throughput subtyping of specific HA subtypes. In contrast, probe-capture is more labor-intensive and expensive (Table S12), but offers whole-genome coverage, broader subtype detection, and greater sensitivity in highly degraded samples.

The findings of this study have limitations when applied to real wastewater conditions. The spike-in samples used ranged from ~50 to <5 genome copies/μL (S3-S1), which may not fully represent the lower and more variable concentrations typically found in wastewater³⁶. Moreover, the use of a single grab sample of IAV-negative wastewater as the background does not account for the complexity and variability of chemistry and microbiology across wastewater from different locations and over time. These factors may influence the observed quantitative relationship between dPCR and RPKM, especially for probe-capture methods. Further investigation is required to assess the performance of these sequencing methods under a range of real-world wastewater conditions.

4. Methods

4.1. Culturing and Acquisition of Spike-in Influenza A Viruses

Three IAV serotypes were obtained from BEI: H1N1 (A/California/04/2009), H3N2 (A/Netherlands/823/1992), and a reassorted H3N1 strain, containing the HA segment from H3N2 (A/Texas/1/77) and the remaining seven segments from H1N1 (A/Puerto Rico/8/34). The strains were propagated, harvested, and titered following established protocols as previously described³⁷. For H5N1, synthetic RNA controls were obtained from Twist Bioscience, consisting of the HA (GenBank: OR051630.1) and NA (GenBank: OR051629.1) segments. All viruses and RNA were stored at −80°C prior to use.

4.2. Design of Genomic Surveillance Tools

The custom IAV-specific probe panel was designed to target 11 different subtypes of IAV: H1N1, H3N1, H5N1, H1N2, H2N2, H3N2, H9N2, H7N9, H4N6, H5N6, and H5N9³⁸. Genomes from avian, swine, and human hosts were downloaded from GISAID and filtered for completeness. 525,075 sequences are quality-filtered and clustered by segment at 90% identity, producing a final set of 589 sequences for probe design (See Table S1 and Supplementary Methods for more details on the input genomes). Syotti³⁹ was used to design probes with a length of 120 bp (-L 120), a mismatch tolerance of 5 (-d 5), coverage of 90% (-c 0.90), and random input sequence order (-r). This was followed by a fill-gaps command with redesigned probes for gaps larger than 20 bp (-g 20) and a mismatch tolerance of 5 (-d 5). The coverage of the generated probes was validated using Syotti against the spike-in reference genomes (Supplementary methods). Probes with alignment lengths greater than 80 bp to the human genome or prokaryotic sequences were removed from the panel. The final probe panel contained 7,448 probes, which were synthesized by Twist Bioscience (San Francisco, California, USA).

Custom tiled amplicon primers were designed for the HA segments of H1N1, H3N2, and H5N1 subtypes using PriMux^40,41. Briefly, HA sequences were downloaded from GISAID and filtered to remove short sequences (see Supplementary Methods). After clustering segments from each subtype at 100% identity with CD-HIT v4.8.1⁴², the reference genomes for the cultured virus strains were added, and multiple sequence alignments were generated for each subtype using MAFFT v7.525⁴³. Each alignment was submitted separately to PriMux with options for 450 nt segments with 200 nt overlaps. The resulting primer sets were analyzed with simulate_PCR⁴⁴ with a word-size of 5, maximum of 3 mismatches, and no mismatches allowed within 3 nt of the 3’ end, and visualized in Python v3.9.13 (Figure S2c). Subsequent manual curation and iteration were performed to optimize the primer design (see Supplementary Methods).

4.3. Pre-existing Methods for IAV Sequencing

Primers for whole-segment amplification were adapted from widely used sets for influenza genomic surveillance in clinical samples⁷, referred to herein as “Universal-amplicon”. These primers targeted the conserved non-coding regions at both ends (3’ UTR and 5’ UTR) of each segment. The Comprehensive Viral Research Panel (Twist Biosciences) was adopted to represent an off-the-shelf probe panel, referred to herein as “Probe-Twist”. This panel contains >1 million unique probes targeting 3,153 viral genomes, including influenza A virus genomes from multiple collection dates spanning all four spike-in subtypes (H1N1, H3N1, H3N2, H5N1).

4.4. Sample Preparation

Direct RNA mixtures of extracted virus RNA and wastewater RNA

Direct RNA mixtures were prepared by separately extracting viral stock RNA and IAV-negative wastewater RNA, then combining them at a series of three targeted concentration ratios to mimic post-extraction conditions (Figure 1a). RNA from H1N1, H3N2, and H3N1 viral stocks was isolated using the Direct-zol RNA kit (Zymo Research). Twist synthetic RNA controls for the H5N1 HA and NA segments were used directly without extraction. Wastewater was collected as a grab sample (10/3/2025) from a large city in California, and viral RNA was obtained using the InnovaPrep Concentrating Pipette, followed by extraction with the AllPrep PowerViral Kit (Qiagen) as previously described¹⁷. Wastewater RNA was treated with DNase (Qiagen). Viral RNA was quantified by dPCR using assays targeting the HA gene and distinguishing the spike-in strains, as previously described³⁷. The wastewater RNA was confirmed to be HA-negative for all four IAV spike-in strains. Virus cocktails were then prepared, and a 5 uL aliquot was mixed with 12 uL of wastewater RNA to reach target concentrations of around 5.8,58.8, and 588 gc/uL per virus in library input (Figure 1a and Table S6). The mass ratio of virus to wastewater background ranged from 4.4×10⁻⁸ to 2.4×10⁻⁶ (S1 to S3, Table S6).

Incubation mixtures with spike-in virus in 40-mL wastewater

Incubation samples were prepared by spiking virus stocks into wastewater, followed by incubation before concentration and extraction (Figure 3a). These samples were designed to mimic potential virus decay in wastewater. Specifically, 10 μL of each of the three virus stocks (H1N1, H3N1, and H3N2) was spiked into 40 mL aliquots of wastewater influent. Two aliquots were concentrated and extracted immediately, while six aliquots underwent a 3-hour rotation at room temperature before concentration and extraction⁴⁵. Viral RNA was concentrated and extracted from each 40 mL aliquot according to two protocols previously described³⁵. Briefly, the Innovaprep method (IP) included the addition of 400μL of 5% Tween 20, centrifugation at 7,000 g, ultrafiltration of the supernatant using the Innovaprep CP Select pipette (CC08004 Unirradiated, InnovaPrep), and elution using elution fluid Tris (InnovaPrep). Nucleic acids were extracted from viral concentrate with the AllPrep PowerViral kit (Qiagen), with an on-column DNase treatment. The Promega method (PMG) utilized the Wizard Enviro Total Nucleic Acid kit (Promega), according to manufacturer’s instructions, followed by RNA purification using Promega RQ1 DNase. For both methods, the final RNA was eluted in 80 μL per sample. The concentrations of all three virus input stocks were tested after same-day extraction using the Direct-zol RNA kit (Zymo Research) and dPCR quantification to avoid extra freeze-thaw cycles.

4.5. Library preparation and sequencing

Probe capture sequencing using Twist library preparation and enrichment kits

The two probe-capture methods, Probe-IAV and Probe-Twist, followed the Twist Total Nucleic Acids Library Preparation EF Kit 2.0 and the Twist Target Enrichment Standard Hybridization v2 Protocol, with optimizations for viral samples. Briefly, 15 μL of RNA (approximately 50 ng) were used as input for library preparation. First-strand cDNA synthesis was carried out using the ProtoScript II First Strand Synthesis Kit with Random Primer 6 (New England Biolabs) for the Probe-IAV panel, while Superscript IV (Thermo Fisher Scientific) was used for the Probe-Twist panel. For both panels, second-strand cDNA synthesis was performed using the NEBNext Ultra II Non-directional RNA Second Strand Synthesis Module (New England Biolabs). For fragmentation, cDNA samples (≤ 25 ng) intended for Probe-IAV were incubated for 5 minutes to achieve a target segment length of 330bp, samples intended for Probe Twist were incubated for 20 minutes to achieve a segment length of 180–200 bp. After end repair and dA-tailing using NEBNext Ultra II End Prep reagent, adapters were ligated, and each sample was indexed using Twist’s unique dual index system. For barcoding PCR, 12 amplification cycles were used for the Probe-IAV panel, while 10 cycles were performed for Probe-Twist panel.

Following PCR clean-up, 5–8 indexed libraries were multiplexed to a total mass of 1500 ng for each hybridization reaction. Hybridization was performed for 16 hours, followed by washing with magnetic Twist Streptavidin Binding Beads and elution. The post-capture libraries were amplified with 17 cycles in the Probe-IAV and 8 cycles in the Probe-Twist, based on their different panel sizes. Enriched samples were purified before PCR quantification for sample pooling. After the final clean-up, all enriched samples from the customized IAV panel were sequenced on a NextSeq P2 600-cycle 300 PE, and Probe-Twist samples were sequenced on NextSeq P1 300-cycle 150 PE.

Tiled amplicon sequencing

5 μL RNA samples were used as input for HA segment tiled amplicon library preparation. Nuclease-free water served as the negative control, while a mixture of pure virus stock and Twist RNA control was used as the positive control. cDNA was synthesized using the SuperScript IV One-Step Master Mix (Thermo Fisher Scientific). All primers, reactions, and cycling conditions are summarized in Tables S3–5. For each sample, four separate reactions were performed using H1 & H3 primer pools 1 and 2, and H5 primer pools 1 and 2, generating 300–415 bp tiled amplicons. The reaction products were purified using AMPure XP beads with a ratio of 1X, and the amplicons were confirmed via e-gel and quantified by Qubit DNA assay. Based on the Qubit concentration, the four PCR products were combined in a 3:3:1:1 mass ratio for H1 & H3 pool 1, H1 & H3 pool 2, H5 pool 1, and H5 pool 2, respectively. Nuclease-free water was added to reach 50 μL per sample as the final input for end-repair and dA-tailing (NEBNext Ultra II module). Following the Twist EF library preparation protocol, Twist adaptors were ligated to the dA-tailed sequences, and each sample was indexed with Twist’s unique dual indices. An additional 8-cycle barcoding PCR was performed. The final enriched libraries, containing ~400–565 bp segments, were quality-checked using a fragment analyzer, followed by size selection to remove amplicons smaller than 300 bp. Libraries were pooled and sequenced on NextSeq P2 600 cycles 300 PE.

Universal amplicon library and Nanopore sequencing

5-μL samples were used for whole-segment universal primer amplicon sequencing library preparation. The same negative and positive controls were used as in the tiled-amplicon method. Primers for whole-segment RT-PCR were taken directly from Zhou et al., 2009⁷. RT-PCR was performed according to the Superscript IV protocol (Tables S4 and S5). After purification with 0.8X beads, amplicons were eluted in 20 μL nuclease-free water. Up to 200 fmol of purified amplicon was used as input to the Oxford Nanopore ligation library preparation with Native Barcoding Kit 24 V14 (SQK-NBD114.24). Following library preparation, the pooled sample was eluted in 15 μL of elution buffer and quantified by Qubit DNA assay. A total of 50 fmol was loaded onto a minION flowcell (FLO-MIN114) for sequencing. Basecalling was performed using the high accuracy model: dna_r10.4.1_e8.2_400bps_hac@v5.0.0.

4.6. Bioinformatic analysis

The reference genomes for four spike-in viruses were downloaded from NCBI using the subtypes and accession numbers provided by the manufacturer or culturing lab. Segments were aligned using MUSCLE (v3.8.31)⁴⁶. The highly conserved 5’ and 3’ UTRs were trimmed from all references prior to mapping to prevent mis-mapping of reads across strains and segments. All raw short reads generated from Illumina sequencing were quality-trimmed using fastp (v0.24.0)⁴⁷ to remove adaptors, polyX tails, and low-quality bases, as well as to filter out reads shorter than 70 bp. For tiled-amplicon sequencing, forward and reverse primers were additionally trimmed using Cutadapt⁴⁸. All high-quality reads were summarized for statistical metrics using Seqkit (v2.4.0)⁴⁹ and mapped to the trimmed reference genomes using Bowtie2 (v2.5.1)⁵⁰. Concordantly mapped paired reads were further filtered with Reformat (BBMap v39.01)⁵¹ to ensure the total number of indels, deletions, insertions, and substitutions was less than 5. Filtered mapped reads were re-paired and indexed before downstream coverage analysis with Samtools (v1.17)⁵², which summarized mapping statistics, coverage depth, and coverage breadth. Variant calling was performed using Samtools mpileup and iVar (v1.4.2)⁵³, with variants filtered by a minimum mapping quality of 20 and a coverage depth of at least 10. BCFtools (v1.17) was also used to generate VCF files and consensus sequences.

4.7. Data analysis

The normalization of coverage depth and reads RPKM across different sequencing methods was calculated with equations (1) and (2), respectively.

RPKM = \frac{\frac{mapped reads}{genome length (k b)}}{Total reads after fastp filtering (M)}

(1)

Coverage depth = \frac{coverage depth per position}{Total length of filtered reads (G b)}

(2)

The normality of data was assessed using the Shapiro-Wilk test. If data were normally distributed, statistical differences between different sequencing methods were evaluated using the one-way ANOVA test followed by post hoc pairwise Tukey’s HSD. Otherwise, Kruskal-Wallis tests were applied, followed by the post hoc pairwise Dunn’s test (Table S10). All statistical tests were performed using the Python package scipy.stats, and significance was determined at a 95% confidence interval (p < 0.05). All analysis and data visualization were performed with custom python and R scripts, available upon request.

4.8. Economic analysis

The economic analysis includes both cost and labor evaluations for each sequencing method. Costs focus on reagents and consumables, excluding instrument purchase and maintenance. Reagents sold in bulk formats (e.g., 96 reactions) were normalized to a per-sample cost, and sequencing costs were based on the facility and sequencing depth used in this study. Labor time was recorded from manual processing of 24 probe-capture and 57 tiled-amplicon samples. Although labor time could be converted into operational costs, this was not done in the current analysis. All cost data were sourced from purchase records, list prices, or lab documentation and are reported in 2024 U.S. dollars.

Supplementary Material

Supplement 1

media-1.xlsx^{(80KB, xlsx)}

Supplement 2

media-2.pdf^{(1.1MB, pdf)}

Acknowledgements

Funding was provided by the UCOP Lab Fees CRT Award (L22CR4507) and NIH R00 Award (4R00GM144747-03). We thank Dr. Jarno N. Alanko, author of Syotti, for his suggestions during probe design. We are grateful to Anne Beaudoin at EBMUD for assistance with wastewater sample collection. The spike-in virus was cultured by Dr. Staci R. Kane, Dr. Monica K. Borucki, and Jordan Boeck. We prepared sequencing libraries with guidance from Casey Riegler (Twist Bioscience) and Bryan Bach (Functional Genomics Laboratory). Flow-cell loading and sequencing were performed at the Vincent J. Coates Sequencing Laboratory (QB3, UC Berkeley; RRID: SCR_022170) with advice from Shana L. McDevitt and Carrie Rose. Funding to RSK and JBT was provided by the Lawrence Livermore National Laboratory’s Laboratory Directed Research and Development Program. This work was performed in part under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Release Number LLNL-JRNL-2011445. LLNL Disclaimer: This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees make any warranty, expressed or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.

4.9. Data availability

All raw sequencing data and analysis code will be made available upon request.

Reference

1.Biggerstaff M., Cauchemez S., Reed C., Gambhir M. & Finelli L. Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature. BMC Infect. Dis. 14, 480 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
2.Horimoto T. & Kawaoka Y. Influenza: lessons from past pandemics, warnings from current incidents. Nat. Rev. Microbiol. 3, 591–600 (2005). [DOI] [PubMed] [Google Scholar]
3.Nguyen T.-Q. et al. Emergence and interstate spread of highly pathogenic avian influenza A(H5N1) in dairy cattle in the United States. Science 388, eadq0900 (2025). [Google Scholar]
4.CDC. Genetic Sequences of Highly Pathogenic Avian Influenza A(H5N1) Viruses Identified in a Person in Louisiana. Avian Influenza (Bird Flu) https://www.cdc.gov/bird-flu/spotlights/h5n1-response-12232024.html (2025). [Google Scholar]
5.CDC. Influenza Virus Genome Sequencing and Genetic Characterization. Influenza (Flu) https://www.cdc.gov/flu/php/viruses/genetic-characterization.html (2024). [Google Scholar]
6.Pagel C. & Yates C. A. Tackling the pandemic with (biased) data. Science 374, 403–404 (2021). [DOI] [PubMed] [Google Scholar]
7.Zhou B. et al. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and Swine origin human influenza a viruses. J. Virol. 83, 10309–10313 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
8.Lee A. J. et al. Wastewater monitoring of human and avian influenza A viruses in Northern Ireland: a genomic surveillance study. Lancet Microbe 5, 100933 (2024). [DOI] [PubMed] [Google Scholar]
9.Khan M. et al. Significance of wastewater surveillance in detecting the prevalence of SARS-CoV-2 variants and other respiratory viruses in the community – A multi-site evaluation. One Health 16, 100536 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Maqsood R. et al. Influenza Virus Genomic Surveillance, Arizona, USA, 2023–2024. Viruses 16, 692 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
11.Rothman J. A. et al. RNA Viromics of Southern California Wastewater and Detection of SARS-CoV-2 Single-Nucleotide Variants. Appl. Environ. Microbiol. 87, e01448–21 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
12.Wyler E. et al. Pathogen dynamics and discovery of novel viruses and enzymes by deep nucleic acid sequencing of wastewater. Environ. Int. 190, 108875 (2024). [DOI] [PubMed] [Google Scholar]
13.Tisza M. J. et al. Sequencing-Based Detection of Avian Influenza A(H5N1) Virus in Wastewater in Ten Cities. N. Engl. J. Med. 391, 1157–1159 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
14.John A. et al. Characterizing influenza A virus lineages and clinically relevant mutations through high-coverage wastewater sequencing. Water Res. 287, 124453 (2025). [DOI] [PubMed] [Google Scholar]
15.Vo V. et al. Identification and genome sequencing of an influenza H3N2 variant in wastewater from elementary schools during a surge of influenza A cases in Las Vegas, Nevada. Sci. Total Environ. 872, 162058 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Kantor R. S., Nelson K. L., Greenwald H. D. & Kennedy L. C. Challenges in Measuring the Recovery of SARS-CoV-2 from Wastewater. Environ. Sci. Technol. 55, 3514–3519 (2021). [DOI] [PubMed] [Google Scholar]
17.Jiang M. et al. Evaluation of the Impact of Concentration and Extraction Methods on the Targeted Sequencing of Human Viruses from Wastewater. Environ. Sci. Technol. 58, 8239–8250 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Kantor R. S. & Jiang M. Considerations and Opportunities for Probe Capture Enrichment Sequencing of Emerging Viruses from Wastewater. Environ. Sci. Technol. 58, 8161–8168 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
19.Kuchinski K. S., Duan J., Himsworth C., Hsiao W. & Prystajecky N. A. ProbeTools: designing hybridization probes for targeted genomic sequencing of diverse and hypervariable viral taxa. BMC Genomics 23, 579 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
20.Hysom D. A. et al. Skip the Alignment: Degenerate, Multiplex Primer and Probe Design Using K-mer Matching Instead of Alignments. PLOS ONE 7, e34560 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
21.Gardner S. N. et al. Multiplex degenerate primer design for targeted whole genome amplification of many viral genomes. Adv. Bioinforma. 2014, 101894 (2014). [Google Scholar]
22.Kent C. et al. PrimalScheme: open-source community resources for low-cost viral genome sequencing. 2024.12.20.629611 Preprint at 10.1101/2024.12.20.629611 (2024). [DOI] [Google Scholar]
23.Wang M. X. et al. Olivar: towards automated variant aware primer design for multiplex tiled amplicon sequencing of pathogens. Nat. Commun. 15, 6306 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
24.Kent C. et al. PrimalScheme: open-source community resources for low-cost viral genome sequencing. 2024.12.20.629611 Preprint at 10.1101/2024.12.20.629611 (2024). [DOI] [Google Scholar]
25.treangenlab/Olivar. Treangen Lab; (2025). [Google Scholar]
26.John A. et al. Wastewater sequencing reveals the genomic landscape of Influenza A virus in Switzerland. 2025.04.17.25325990 Preprint at 10.1101/2025.04.17.25325990 (2025). [DOI] [Google Scholar]
27.Nash D. et al. A Genome Sequence Variant Monitoring Program for Seasonal Influenza A H3N2 and Respiratory Syncytial Virus A using Wastewater-Based Surveillance in Ontario, Canada. 2025.07.29.667219 Preprint at 10.1101/2025.07.29.667219 (2025). [DOI] [Google Scholar]
28.Davis J. J. et al. Analysis of the ARTIC Version 3 and Version 4 SARS-CoV-2 Primers and Their Impact on the Detection of the G142D Amino Acid Substitution in the Spike Protein. Microbiol. Spectr. 9, e01803–21. [Google Scholar]
29.Chakraborty C., Das A., Bhattacharya M. & Islam Md. A. Mutations in the influenza virus, primarily H5N1, enhance the virus’s virulence, favor receptor interaction, and increase drug resistance. Int. J. Surg. Lond. Engl. 111, 4128–4131 (2025). [Google Scholar]
30.Ping J. et al. PB2 and Hemagglutinin Mutations Are Major Determinants of Host Range and Virulence in Mouse-Adapted Influenza A Virus. J. Virol. 84, 10606–10618 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
31.McCall C. et al. Targeted Metagenomic Sequencing for Detection of Vertebrate Viruses in Wastewater for Public Health Surveillance. ACS EST Water 3, 2955–2965 (2023). [Google Scholar]
32.The Effects of mismatches on DNA Capture by Hybridization | Twist Bioscience. https://www.twistbioscience.com/resources/white-paper/effects-mismatches-dna-capture-hybridization.
33.Untergasser A. et al. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 35, W71–W74 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Kim H., Webster R. G. & Webby R. J. Influenza Virus: Dealing with a Drifting and Shifting Pathogen. Viral Immunol. 31, 174–183 (2018). [DOI] [PubMed] [Google Scholar]
35.Wang A. L. et al. Benchmarking Concentration and Extraction Methods for Wastewater-Based Surveillance of Eight Human Respiratory Viruses: Implications for Rapid Application to Novel Pathogens. Environ. Sci. Technol. (2025) doi: 10.1021/acs.est.4c13635. [DOI] [Google Scholar]
36.Wolfe M. K. et al. Wastewater-Based Detection of Two Influenza Outbreaks. Environ. Sci. Technol. Lett. 9, 687–692 (2022). [Google Scholar]
37.Wang A. L. et al. Benchmarking concentration and direct extraction methods for wastewater-based surveillance of eight human respiratory viruses: implications for rapid application to novel pathogens. 2024.11.27.625007 Preprint at 10.1101/2024.11.27.625007 (2024). [DOI] [Google Scholar]
38.Zhuang Q. et al. Diversity and distribution of type A influenza viruses: an updated panorama analysis based on protein sequences. Virol. J. 16, 85 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
39.Alanko J. N. et al. Syotti: scalable bait design for DNA enrichment. Bioinformatics 38, i177–i184 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
40.Gardner S. N. et al. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes. Adv. Bioinforma. 2014, 101894 (2014). [Google Scholar]
41.Hysom D. A. et al. Skip the Alignment: Degenerate, Multiplex Primer and Probe Design Using K-mer Matching Instead of Alignments. PLOS ONE 7, e34560 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
42.Fu L., Niu B., Zhu Z., Wu S. & Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
43.Katoh K., Misawa K., Kuma K. & Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
44.Gardner S. N. & Slezak T. Simulate_PCR for amplicon prediction and annotation from multiplex, degenerate primers and probes. BMC Bioinformatics 15, 237 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
45.Ye Y., Ellenberg R. M., Graham K. E. & Wigginton K. R. Survivability, Partitioning, and Recovery of Enveloped Viruses in Untreated Municipal Wastewater. Environ. Sci. Technol. 50, 5077–5085 (2016). [DOI] [PubMed] [Google Scholar]
46.Edgar R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
47.Chen S., Zhou Y., Chen Y. & Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
48.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011). [Google Scholar]
49.Shen W., Le S., Li Y. & Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11, e0163962 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
50.Langmead B. & Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
51.Bushnell B. BBMap: A Fast, Accurate, Splice-Aware Aligner. https://www.osti.gov/biblio/1241166 (2014).
52.Li H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]
53.Grubaugh N. D. et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 20, 8 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplement 1

media-1.xlsx^{(80KB, xlsx)}

Supplement 2

media-2.pdf^{(1.1MB, pdf)}

Data Availability Statement

All raw sequencing data and analysis code will be made available upon request.

[R1] 1.Biggerstaff M., Cauchemez S., Reed C., Gambhir M. & Finelli L. Estimates of the reproduction number for seasonal, pandemic, and zoonotic influenza: a systematic review of the literature. BMC Infect. Dis. 14, 480 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R2] 2.Horimoto T. & Kawaoka Y. Influenza: lessons from past pandemics, warnings from current incidents. Nat. Rev. Microbiol. 3, 591–600 (2005). [DOI] [PubMed] [Google Scholar]

[R3] 3.Nguyen T.-Q. et al. Emergence and interstate spread of highly pathogenic avian influenza A(H5N1) in dairy cattle in the United States. Science 388, eadq0900 (2025). [Google Scholar]

[R4] 4.CDC. Genetic Sequences of Highly Pathogenic Avian Influenza A(H5N1) Viruses Identified in a Person in Louisiana. Avian Influenza (Bird Flu) https://www.cdc.gov/bird-flu/spotlights/h5n1-response-12232024.html (2025). [Google Scholar]

[R5] 5.CDC. Influenza Virus Genome Sequencing and Genetic Characterization. Influenza (Flu) https://www.cdc.gov/flu/php/viruses/genetic-characterization.html (2024). [Google Scholar]

[R6] 6.Pagel C. & Yates C. A. Tackling the pandemic with (biased) data. Science 374, 403–404 (2021). [DOI] [PubMed] [Google Scholar]

[R7] 7.Zhou B. et al. Single-reaction genomic amplification accelerates sequencing and vaccine production for classical and Swine origin human influenza a viruses. J. Virol. 83, 10309–10313 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R8] 8.Lee A. J. et al. Wastewater monitoring of human and avian influenza A viruses in Northern Ireland: a genomic surveillance study. Lancet Microbe 5, 100933 (2024). [DOI] [PubMed] [Google Scholar]

[R9] 9.Khan M. et al. Significance of wastewater surveillance in detecting the prevalence of SARS-CoV-2 variants and other respiratory viruses in the community – A multi-site evaluation. One Health 16, 100536 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Maqsood R. et al. Influenza Virus Genomic Surveillance, Arizona, USA, 2023–2024. Viruses 16, 692 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R11] 11.Rothman J. A. et al. RNA Viromics of Southern California Wastewater and Detection of SARS-CoV-2 Single-Nucleotide Variants. Appl. Environ. Microbiol. 87, e01448–21 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R12] 12.Wyler E. et al. Pathogen dynamics and discovery of novel viruses and enzymes by deep nucleic acid sequencing of wastewater. Environ. Int. 190, 108875 (2024). [DOI] [PubMed] [Google Scholar]

[R13] 13.Tisza M. J. et al. Sequencing-Based Detection of Avian Influenza A(H5N1) Virus in Wastewater in Ten Cities. N. Engl. J. Med. 391, 1157–1159 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.John A. et al. Characterizing influenza A virus lineages and clinically relevant mutations through high-coverage wastewater sequencing. Water Res. 287, 124453 (2025). [DOI] [PubMed] [Google Scholar]

[R15] 15.Vo V. et al. Identification and genome sequencing of an influenza H3N2 variant in wastewater from elementary schools during a surge of influenza A cases in Las Vegas, Nevada. Sci. Total Environ. 872, 162058 (2023). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Kantor R. S., Nelson K. L., Greenwald H. D. & Kennedy L. C. Challenges in Measuring the Recovery of SARS-CoV-2 from Wastewater. Environ. Sci. Technol. 55, 3514–3519 (2021). [DOI] [PubMed] [Google Scholar]

[R17] 17.Jiang M. et al. Evaluation of the Impact of Concentration and Extraction Methods on the Targeted Sequencing of Human Viruses from Wastewater. Environ. Sci. Technol. 58, 8239–8250 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Kantor R. S. & Jiang M. Considerations and Opportunities for Probe Capture Enrichment Sequencing of Emerging Viruses from Wastewater. Environ. Sci. Technol. 58, 8161–8168 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R19] 19.Kuchinski K. S., Duan J., Himsworth C., Hsiao W. & Prystajecky N. A. ProbeTools: designing hybridization probes for targeted genomic sequencing of diverse and hypervariable viral taxa. BMC Genomics 23, 579 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R20] 20.Hysom D. A. et al. Skip the Alignment: Degenerate, Multiplex Primer and Probe Design Using K-mer Matching Instead of Alignments. PLOS ONE 7, e34560 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R21] 21.Gardner S. N. et al. Multiplex degenerate primer design for targeted whole genome amplification of many viral genomes. Adv. Bioinforma. 2014, 101894 (2014). [Google Scholar]

[R22] 22.Kent C. et al. PrimalScheme: open-source community resources for low-cost viral genome sequencing. 2024.12.20.629611 Preprint at 10.1101/2024.12.20.629611 (2024). [DOI] [Google Scholar]

[R23] 23.Wang M. X. et al. Olivar: towards automated variant aware primer design for multiplex tiled amplicon sequencing of pathogens. Nat. Commun. 15, 6306 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R24] 24.Kent C. et al. PrimalScheme: open-source community resources for low-cost viral genome sequencing. 2024.12.20.629611 Preprint at 10.1101/2024.12.20.629611 (2024). [DOI] [Google Scholar]

[R25] 25.treangenlab/Olivar. Treangen Lab; (2025). [Google Scholar]

[R26] 26.John A. et al. Wastewater sequencing reveals the genomic landscape of Influenza A virus in Switzerland. 2025.04.17.25325990 Preprint at 10.1101/2025.04.17.25325990 (2025). [DOI] [Google Scholar]

[R27] 27.Nash D. et al. A Genome Sequence Variant Monitoring Program for Seasonal Influenza A H3N2 and Respiratory Syncytial Virus A using Wastewater-Based Surveillance in Ontario, Canada. 2025.07.29.667219 Preprint at 10.1101/2025.07.29.667219 (2025). [DOI] [Google Scholar]

[R28] 28.Davis J. J. et al. Analysis of the ARTIC Version 3 and Version 4 SARS-CoV-2 Primers and Their Impact on the Detection of the G142D Amino Acid Substitution in the Spike Protein. Microbiol. Spectr. 9, e01803–21. [Google Scholar]

[R29] 29.Chakraborty C., Das A., Bhattacharya M. & Islam Md. A. Mutations in the influenza virus, primarily H5N1, enhance the virus’s virulence, favor receptor interaction, and increase drug resistance. Int. J. Surg. Lond. Engl. 111, 4128–4131 (2025). [Google Scholar]

[R30] 30.Ping J. et al. PB2 and Hemagglutinin Mutations Are Major Determinants of Host Range and Virulence in Mouse-Adapted Influenza A Virus. J. Virol. 84, 10606–10618 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R31] 31.McCall C. et al. Targeted Metagenomic Sequencing for Detection of Vertebrate Viruses in Wastewater for Public Health Surveillance. ACS EST Water 3, 2955–2965 (2023). [Google Scholar]

[R32] 32.The Effects of mismatches on DNA Capture by Hybridization | Twist Bioscience. https://www.twistbioscience.com/resources/white-paper/effects-mismatches-dna-capture-hybridization.

[R33] 33.Untergasser A. et al. Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res. 35, W71–W74 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R34] 34.Kim H., Webster R. G. & Webby R. J. Influenza Virus: Dealing with a Drifting and Shifting Pathogen. Viral Immunol. 31, 174–183 (2018). [DOI] [PubMed] [Google Scholar]

[R35] 35.Wang A. L. et al. Benchmarking Concentration and Extraction Methods for Wastewater-Based Surveillance of Eight Human Respiratory Viruses: Implications for Rapid Application to Novel Pathogens. Environ. Sci. Technol. (2025) doi: 10.1021/acs.est.4c13635. [DOI] [Google Scholar]

[R36] 36.Wolfe M. K. et al. Wastewater-Based Detection of Two Influenza Outbreaks. Environ. Sci. Technol. Lett. 9, 687–692 (2022). [Google Scholar]

[R37] 37.Wang A. L. et al. Benchmarking concentration and direct extraction methods for wastewater-based surveillance of eight human respiratory viruses: implications for rapid application to novel pathogens. 2024.11.27.625007 Preprint at 10.1101/2024.11.27.625007 (2024). [DOI] [Google Scholar]

[R38] 38.Zhuang Q. et al. Diversity and distribution of type A influenza viruses: an updated panorama analysis based on protein sequences. Virol. J. 16, 85 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R39] 39.Alanko J. N. et al. Syotti: scalable bait design for DNA enrichment. Bioinformatics 38, i177–i184 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R40] 40.Gardner S. N. et al. Multiplex Degenerate Primer Design for Targeted Whole Genome Amplification of Many Viral Genomes. Adv. Bioinforma. 2014, 101894 (2014). [Google Scholar]

[R41] 41.Hysom D. A. et al. Skip the Alignment: Degenerate, Multiplex Primer and Probe Design Using K-mer Matching Instead of Alignments. PLOS ONE 7, e34560 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R42] 42.Fu L., Niu B., Zhu Z., Wu S. & Li W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R43] 43.Katoh K., Misawa K., Kuma K. & Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R44] 44.Gardner S. N. & Slezak T. Simulate_PCR for amplicon prediction and annotation from multiplex, degenerate primers and probes. BMC Bioinformatics 15, 237 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R45] 45.Ye Y., Ellenberg R. M., Graham K. E. & Wigginton K. R. Survivability, Partitioning, and Recovery of Enveloped Viruses in Untreated Municipal Wastewater. Environ. Sci. Technol. 50, 5077–5085 (2016). [DOI] [PubMed] [Google Scholar]

[R46] 46.Edgar R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R47] 47.Chen S., Zhou Y., Chen Y. & Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R48] 48.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011). [Google Scholar]

[R49] 49.Shen W., Le S., Li Y. & Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11, e0163962 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R50] 50.Langmead B. & Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R51] 51.Bushnell B. BBMap: A Fast, Accurate, Splice-Aware Aligner. https://www.osti.gov/biblio/1241166 (2014).

[R52] 52.Li H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]

[R53] 53.Grubaugh N. D. et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 20, 8 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]

PERMALINK

This is a preprint.

Developing and Benchmarking One Health Genomic Surveillance Tools for Influenza A Virus in Wastewater

Minxi Jiang

Audrey LW Wang

James B Thissen

Kara L Nelson

Lenore Pipes

Rose S Kantor

Roles

Abstract

1. Introduction

Table 1.

2. Results

2.1. Design of the Probe-IAV and HA segment tiled-amplicon panel

2.2. Comparison of coverage breadth and depth among different methods for IAV sequencing

Figure 1. Performance of different sequencing methods for detecting the HA segment of IAV.

2.3. Comparison of sensitivity and quantitativeness of different sequencing methods

Figure 2:

2.4. Impact of virus decay and concentration/extraction methods on IAV sequencing

Figure 3. Comparison of concentration/extraction methods and the impact of incubation on genome-wide segment recovery across sequencing methods and IAV strains.

2.5. Economic analysis for different sequencing methods

3. Discussion

3.1. Performance of different targeted sequencing methods

3.2. Design of Probe Capture vs. Tiled-amplicon Sequencing

3.3. Wastewater-Specific Challenges and Implications of One Health IAV surveillance

4. Methods

4.1. Culturing and Acquisition of Spike-in Influenza A Viruses

4.2. Design of Genomic Surveillance Tools

4.3. Pre-existing Methods for IAV Sequencing

4.4. Sample Preparation

Direct RNA mixtures of extracted virus RNA and wastewater RNA

Incubation mixtures with spike-in virus in 40-mL wastewater

4.5. Library preparation and sequencing

Probe capture sequencing using Twist library preparation and enrichment kits

Tiled amplicon sequencing

Universal amplicon library and Nanopore sequencing

4.6. Bioinformatic analysis

4.7. Data analysis

4.8. Economic analysis

Supplementary Material

Acknowledgements

4.9. Data availability

Reference

Associated Data

Supplementary Materials

Data Availability Statement

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases