Abstract
Background
Hybrid capture is a critical technology for selective enrichment of genomic regions of interest in genomic analysis. Despite its widespread adoption, the core methodology has remained largely unchanged for over 15 years, with traditional workflows involving time-consuming bead-based capture steps, multiple temperature-controlled washes, and post-hybridization PCR. These steps introduce workflow complexity, increase turnaround time, and can negatively impact library complexity and variant calling accuracy.
Results
We present a simplified hybrid capture workflow that eliminates these complexities by directly loading the hybridization product onto the sequencing flow cell. The approach is enabled by the development of a streptavidin flow cell surface, a method to circularize and amplify captured targets on the flow cell, and a fast hybridization protocol. Our workflow reduces the time from the start of library preparation to the start of sequencing by over 50% while maintaining or improving capture specificity and library complexity. We demonstrate improved variant calling performance with indel false positive and false negative reductions of 89% and 67%, respectively. We also show how the approach can be used to create an entirely PCR-free targeted sequencing workflow.
Conclusions
We present a targeted sequencing workflow that eliminates bead-based capture, multiple washes, and post-hybridization PCR, while improving various aspects of data quality. The performance of the approach was evaluated by sequencing hundreds of samples and demonstrating high on-target rates, reduced duplicates, and improved indel accuracy. By combining the approach with a PCR-free library preparation, we enable an entirely PCR-free targeted sequencing assay which further improves indel calling, and shows the ability to call an HTT expansion, associated with Huntington’s disease. This streamlined approach addresses key operational challenges in targeted sequencing, offering potential benefits for applications requiring rapid turnaround times or increased capability in variant detection.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12864-025-11939-6.
Keywords: Hybrid capture, Improved workflow, Targeted sequencing, PCR-free, Exome sequencing, Improved indel calling, HTT repeat expansion detection
Background
Hybridization-based target selection, also known as hybrid capture, is a powerful molecular biology technique. It enables researchers to selectively enrich specific regions of the genome for high-throughput sequencing, facilitating deeper analysis and understanding of targeted loci. In this technique, DNA or RNA probes (baits) are designed to be complementary to the regions of interest in the genome. These baits are then used to capture and enrich target sequences from a fragmented genomic DNA library. The process generally involves hybridizing the library fragments to the baits, capturing the baits with hybridized library, followed by washing away unbound fragments. The captured target sequences are then eluted or directly amplified for downstream sequencing.
Hybridization-based methods are nearly ubiquitous in genomics, with applications in basic and translational research [1–12]. Other targeting approaches using nanopore sequencing have also been demonstrated [13–15]. The solution-phase hybrid capture method is the primary approach used today, driven by optimization of hybridization-based probe designs and the broad availability of targeted panels [2, 3, 5, 8, 9, 11]. A popular application is exome sequencing, which focuses on the protein-coding regions of the genome that harbor the majority of known disease-causing mutations while comprising only ~1% of the human genome. The approach is also applied to smaller panels, allowing researchers to focus on specific genes or genomic regions of interest across applications such as cancer genomics, microbial genomics, forensic science and agriculture [12, 16–22]. Hybrid capture offers several advantages over alternative target enrichment methods such as multiplex PCR. Most notably, hybrid capture can be applied to a wide range of target sizes, from a few kilobases to many megabases that cover the entire exome [9]. The method provides cost savings in sequencing while delivering relatively uniform coverage across the target regions, minimizing biases, and ensuring accurate representation of the sequences [10].
A major drawback of targeted hybridization methods is a lengthy and complex workflow. The post-hybridization steps of the workflow have remained largely the same since their original development with minor optimizations in hybridization time and the use of refined buffers and wash steps to increase specificity. The methods nearly universally use magnetic beads containing streptavidin to bind the biotinylated oligo baits that have been hybridized to the target library. Following bead binding, a series of temperature-controlled washes to remove the unbound and non-specific material are performed. These steps are essential in achieving a high on-target rate, but they lead to a large loss of DNA, thus requiring PCR amplification of the captured DNA to generate a final library with sufficient input material for sequencing. The common use of two rounds of PCR throughout the hybridization selection process followed by exacting washes and the need to retain bead materials through multiple wash steps make the post-hybridization steps a key source of variability in the assay. The post-hybridization PCR also leads to reduced library complexity through a sampling effect, where random amplification of DNA molecules results in the loss of some rare sequences. Streamlining this workflow would substantially reduce turnaround time and increase efficiency across sequencing applications.
Here we describe Trinity, a modified hybrid capture workflow that addresses the workflow challenges of the traditional approach while retaining its key advantages and extending its capabilities. These changes eliminate the need for post-hybridization PCR amplification, multiple wash steps, and the use of streptavidin beads. Traditional target enrichment workflows require 12–24 h to complete [8, 9, 20]. With the Trinity workflow, the entire process from the start of library preparation to sequencer loading can be completed in as little as 5 h. The sequencing results demonstrate reduced duplicate rates, improved on-target rates in smaller panels, and higher accuracy indel calling. When combined with a PCR-free library prep, Trinity also supports an entirely PCR-free targeted sequencing workflow.
Methods
Flow cell functionalization
Passivated streptavidin flow cells were prepared using formulations developed by Element Biosciences, Inc. The flow cells can be purchased as part of the sequencing kits under part numbers 860-00019 and 860-00020.
Samples and library preparation
Human genomic DNAs from cell line samples HG001, HG002, HG003, and HG004 were obtained from the Coriell Institute. The myeloid gDNA reference sample was purchased from Horizon Discovery (catalog no. HD829). Additionally, representative DNA samples were provided from blood, bone marrow, fresh frozen tissue, FFPE biopsy, and FFPE resection for evaluation.
Extracted DNA samples were fragmented via enzyme treatment or mechanical shearing. Libraries were prepared using the IDT xGen Exome Sequencing Kit Trinity for Element AVITI System (catalog no. 10022463), Twist for Element Exome 2.0 + Comp Library Preparation (catalog no. 109326 or 109327), or Roche KAPA EvoPrep (material no. 10154039001) and EvoPlus v2 (material no. 09420037001) Library Preparation with xGen Stubby Adapter-UDI’s for Element (catalog no. 10017036), following the vendor’s instructions. For some experiments, a pool of “end-polished” libraries was used. These were prepared with terminal primers and an additional five cycles of PCR prior to library QC. Briefly, 100 ng of each individually indexed library was input into a 50 µl PCR reaction containing 1X IDT xGen HiFi master mix and 5 µl of xGen Library Amp Primer Mix for Element (catalog no. 10016959). The libraries were amplified in a thermocycler with the following PCR program: 98 °C for 45 s, followed by 5 cycles of 98 °C for 45 s, 60 °C for 30 s, and 72 °C for 45 s, and a final 72 °C extension for 1 min. The end-polished libraries were purified using 1.0x SPRI beads and quantified with the Qubit dsDNA HS Assay Kit (catalog no. 32851).
For PCR-free library preparation, 500 ng gDNA per sample was input into the PCR-free workflow using the Element Elevate Enzymatic Library Prep Kits (catalog no. 830-00009). The gDNA was enzymatically sheared for 15 min at 37 °C. The final PCR-free libraries were quantified using qPCR as described in the user guide.
For PCR-free library preparation of the repeat expanded control samples, 1 µg of gDNA per sample was input into the PCR-free workflow using the Element Elevate Mechanical Library Prep Kits (catalog no. 830-00008). The ME220 Covaris was used to shear the DNA with the following program settings:
| Duration (s) | Peak Power (W) | Duty Factor (%) | Cycles/Burst | Avg Power (W) | Iterations |
|---|---|---|---|---|---|
| 10 | 50 | 20 | 1000 | 10 | 7 |
Following post-ligation cleanup, a 0.46X/0.62X double-sided size selection was used to target a 300–350 bp insert library. The final PCR-free libraries were quantified using qPCR as described in the user guide.
Hybridization and sequencing
Trinity hybridization reactions for IDT and Twist exome panels were carried out according to the Trinity hybridization user guides and using the associated kits (Trinity Sequencing User Guide, IDT xGen Exome Sequencing Kit Trinity for Element AVITI System, Twist for Element Exome 2.0 + Comprehensive Exome Spike-in Library Preparation and Standard Hybridization with Trinity Sequencing Workflow, and Twist for Element Exome 2.0 + Comprehensive Exome Spike-in Library Preparation and Fast Hybridization with Trinity Sequencing Workflow). Additionally, the Trinity workflow with the IDT exome kit was carried out with one- and two-hour fast hybridizations (Supplementary Table 6). For this experiment, libraries were pooled by 24-plex for hybridization reactions. After pooling, Human Cot DNA and Trinity Binding Reagent were added to each reaction prior to being dried down as per the xGen Exome Hybridization & Trinity Run Setup protocol. After the reactions were dried down, some minor deviations were made from the standard 16-hour hybridization workflow: the pellets were resuspended in 8.5 µL of xGen 2x Hybridization Buffer, 4 µL of xGen Exome v2 Panel, and 4.5 µL of nuclease-free water (omitting the xGen Hybridization Buffer Enhancer). These reactions were incubated at room temperature for 5–10 min, then sealed and transferred to a thermal cycler, where hybridization was carried out at the temperature specified in the user guide.
Trinity hybridization of PCR-free libraries was carried out using a 3 µg PCR-free library pool (quantified by qPCR) as input into the Trinity for Twist fast hybridization workflow, and then proceeded according to the Trinity user guide.
Trinity hybridization reactions for non-exome panels from the same vendors were carried out according to the same Trinity protocols with some modifications. For the analysis of the Horizon myeloid reference sample (HD829, Horizon Discovery), 6.25 µg of material was hybridized following the Twist for Element Fast Hybridization workflow with 4 µL of the GMS Myeloid gene panel (Genomic Medicine Sweden). For the analysis of the 0.8 Mb panel, 24 µg of total end-polished material was hybridized according to the IDT xGen Trinity workflow with 4 µL of the IDT xGen Pan-Cancer Hybridization Panel. For the analysis of the 43 Mb exome panel, 5.2 µg of total material was hybridized with the Roche KAPA HyperExome v2 Probes Whole Exome Sequencing solution panel and reagents, plus the Trinity Binding Reagent, with reagent volumes and hybridization temperature adjusted to accommodate the addition.
After hybridization was complete, the reactions were removed from the thermal cycler and Trinity run setup proceeded as per the xGen or Twist Exome Hybridization & Trinity Sequencing user guides. The quantity of each diluted hybridization reaction used for final loading was adjusted depending on the library type (stubby or full-length adapter libraries) and hybridization time (1 or 16 hours). Trinity hybridization reactions were sequenced on AVITI with a 2 × 75 or 2 × 150 Trinity Sequencing Kit, following the Trinity Sequencing User Guide.
Data analysis
FASTQ files were downsampled to an appropriate read depth, e.g. 30 M paired-end reads (30 M polonies, 60 M total reads/sample). Reads were aligned to the hg38 or GRCh37 reference genome, chosen to match the target panel coordinates, using Sentieon BWA-MEM (sentieon-genomics-202308.03). After alignment, the corresponding BED files were applied for panel performance evaluation using Picard (v2.25.0) and Sentieon (sentieon-genomics-202308.03). Picard uses the Lander-Waterman equation [23] to estimate library complexity from the duplicate rate. Variant calling within the defined target regions was performed using DeepVariant (v1.6.1). Variant calling benchmarking utilized hap.py (v0.3.14) against the NIST v4.2.1 truthsets. For the PCR-free data, ExpansionHunter (v5.0.0) was used to estimate sizes of repeat alleles at loci defined within a comprehensive STR catalog corresponding to the hg38 reference. REViewer (v0.2.7) was used to visualize alignment of reads containing tandem repeats.
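As a rough illustration of the complexity estimate, the sketch below solves the Lander-Waterman relation C/X = 1 − exp(−N/X) for the library size X by bisection, where N is the number of read pairs examined and C is the number of unique (non-duplicate) pairs. The function name and the example counts are hypothetical, and this is a minimal sketch of the relation rather than Picard's own code.

```python
import math


def estimate_library_size(read_pairs: int, unique_pairs: int) -> float:
    """Solve C/X = 1 - exp(-N/X) for X, the number of distinct molecules."""
    n, c = float(read_pairs), float(unique_pairs)
    if not 0 < c < n:
        raise ValueError("need 0 < unique_pairs < read_pairs")

    def excess(x: float) -> float:
        # Expected unique pairs from a library of size x, minus the observed count.
        # -expm1(-n/x) equals 1 - exp(-n/x) but stays accurate when n/x is small.
        return x * -math.expm1(-n / x) - c

    # A library of size C would yield fewer than C unique pairs, so excess(C) < 0.
    lo = hi = c
    while excess(hi) < 0:
        lo, hi = hi, hi * 2.0

    # Bisection: excess() is increasing in x, so the root lies between lo and hi.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical counts: 1,000,000 read pairs of which 900,000 are unique.
size = estimate_library_size(1_000_000, 900_000)
print(f"estimated library size: {size:,.0f} molecules")
```

A lower duplicate rate at the same read depth yields a larger estimated library, which is why the duplicate rates reported in the Results translate directly into complexity differences.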
Results
Workflow and performance comparison
Fig. 1.
Comparison of the traditional hybrid selection workflow (A) and Trinity (B). The simplified Trinity workflow eliminates bead binding, washes, post-hyb PCR, and library QC
Figure 1A illustrates the traditional hybrid capture workflow as originally described in Gnirke et al. [8]. We set out to eliminate all the manual post-hybridization steps by enabling direct loading of the hybridization product onto the flow cell. A major challenge with this approach is that the hybridization product contains not only library fragments bound to the baits (i.e. the targeted regions) but also the unbound library fragments, which have far greater abundance. To overcome this specificity challenge, we developed a passivated streptavidin flow cell surface that could directly capture the biotinylated baits with bound library elements, without capturing the unbound (off-target) library elements. This flow cell surface obviates the need for the streptavidin beads and temperature-controlled bead washing steps that require centrifugation and often multiple subsequent washes [24]. In the traditional workflow, off-target sequences are inadvertently captured through two mechanisms: (1) hybridization between repetitive elements within the genomic inserts themselves, and (2) adapter-mediated cross-hybridization, where complementary adapter sequences anneal between on-target and off-target library fragments. These are two major contributors to off-target sequences in the hybrid capture workflow [25]. The insert annealing is usually prevented by adding repetitive blocking DNA [26]. With our new flow cell, biotinylated probe/library complexes are captured on the surface, and we perform an on-flow cell circularization followed by rolling circle amplification (RCA) [27]. Our use of a circularization oligonucleotide prevents adapter annealing and results in high on-target performance, without the need for adapter-specific “blocker” reagents that are typically used to achieve high specificity [28]. The RCA product is directly sequenced on the AVITI instrument as previously described [27]. Figure 1B illustrates the Trinity workflow. The initial steps leading to hybridization remain unchanged, but the workflows diverge significantly thereafter, with Trinity proceeding to sequencing following hybridization. In addition to eliminating the bead capture, wash steps, post-hybridization PCR, and QC, we optimized conditions to shorten the hybridization time to one hour while maintaining high specificity and uniformity.
The optimization enabled us to go from library preparation to loading of the sequencer within a single work shift. Specifically, we completed the workflow in under 5 h with 2.5 h for the library preparation, 1 h for pooling and pre-hybridization steps, 1 h for the hybridization, and 20 min for setting up the sequencing run. For the traditional workflow, the additional steps required us to break the workflow into two days.
To evaluate the performance of the Trinity assay, we compared the traditional workflow with a 16-hour hybridization to the Trinity workflow with a 1-hour hybridization across two bait set vendors. Table 1 shows the average results for 24 samples in each workflow. Trinity performance exceeded the traditional approach across most metrics including fold-80, duplicate rate, and variant calling accuracy, particularly for indels. The 16-hour traditional approach from Vendor B showed a higher on-target rate, but this was partially offset by the higher duplicate rate. Additionally, Trinity demonstrated higher variant calling accuracy when starting with the same number of input reads. A limited cross-platform comparison using Vendor B exome data for HG001 [29] showed improved indel calling with Trinity relative to the traditional workflow sequenced on NovaSeq 6000. Detailed results are provided in Supplementary Table 1.
Table 1.
Metric comparison of the traditional exome approach with 16-hour hybridization and Trinity with 1-hour hybridization across two bait set manufacturers. Vendor A was sequenced with 2 × 150 reads and Vendor B with 2 × 75 reads. Vendor panels differ in size and content. All analyses were performed with 30 million reads per sample, and metrics were averaged over 24 samples per condition
| Metric | Traditional 16 h Vendor A | Trinity 1 h Vendor A | Traditional 16 h Vendor B | Trinity 1 h Vendor B |
|---|---|---|---|---|
| On-Target | 88% | 87% | 93% | 87% |
| Fold-80 | 1.47 | 1.27 | 1.46 | 1.43 |
| % Duplicates | 2.31% | 1.17% | 3.60% | 1.27% |
| SNP F1 | 0.991 | 0.993 | 0.989 | 0.991 |
| Indel F1 | 0.957 | 0.975 | 0.967 | 0.981 |
To evaluate the robustness of the Trinity assay, we sequenced 216 whole exome samples (replicates of HG001 and HG002) with the Trinity workflow and collected metrics. The study utilized 9 flow cells multiplexed at 24 samples per hybridization reaction per flow cell. We evaluated bait sets from two vendors and different hybridization protocols, with multiple replicates for each configuration. Each of the conditions showed metrics within the expected range with low variation across runs (Supplementary Table 2).
We next evaluated Trinity’s compatibility with FFPE and biopsy samples, which are widely used in clinical research despite their challenges of limited DNA quality and quantity. For this, we prepared libraries from 8 gDNA samples from blood, 8 FFPE preserved samples from needle biopsy, and 8 FFPE preserved samples from tissue resection, starting with 50 ng DNA input (DIN ranging from 1.7 to 4.9 for FFPE and 4.7–8.7 for blood) per sample. The libraries were pooled for hybridization to an exome panel and sequenced following the Trinity protocol. Trinity showed higher mean target coverage due to lower duplicate rates compared to the traditional workflow with the same samples. Statistical significance of the improvement varied by sample type (Supplementary Fig. 1).
Evaluation of different panel sizes
Bait sets targeting the exome and its extensions represent some of the largest panels routinely used in research. Many applications use panels with a more focused target space and therefore a small fraction of the number of baits used for the exome. We explored the compatibility of the Trinity workflow with a variety of such panels to assess performance and determine the amount of starting material required. Testing ten panels ranging from 730 kb (191 genes, 7816 probes, where ~0.024% of the genome is targeted by capture probes) to 43 Mb (~1.39% of the genome), we investigated (1) whether adequate target density could be achieved with smaller panels and (2) whether we could predict the required amount of input material based primarily on panel size.
Though multiple factors may impact conversion efficiency from an input capturable molecule to a successfully sequenced polony (including panel design, capture efficiencies, and vendor differences), we found that capture performance primarily correlates with panel size relative to genome size. By accounting for this potential capture space of molecules, we found that we could achieve our target flow cell output (~800 M paired-end reads) for a large range of panel sizes by appropriately scaling input into the hybridization reaction. Figure 2A shows the inverse relationship between target size and required input in a log-log plot across a range of commercially available panels (gray circles) and those empirically tested (colored circles). With the ten panels tested, we were able to achieve target density while using total pool input amounts from 0.9 to 24 µg from the largest to the smallest panels, respectively (Supplementary Table 3). The simple relationship between the input amount and the panel size serves as a useful starting point for the loading optimization of any custom panel within the tested range.
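One way to turn this relationship into a starting estimate is a straight line in log-log space. The sketch below draws that line through the two extremes reported above (24 µg for the 730 kb panel and 0.9 µg for the 43 Mb panel) and interpolates between them. The two-point fit, the function names, and the 3.1 Gb genome size are illustrative assumptions and not the fit shown in Fig. 2A, so any value it returns is only a first guess to be refined by loading optimization.

```python
import math

GENOME_BP = 3.1e9  # assumed human genome size, used only for the fraction targeted

# Extremes of the tested range as reported in the text:
# (panel size in bp, total pool input in micrograms).
SMALL_PANEL = (730e3, 24.0)
LARGE_PANEL = (43e6, 0.9)


def fit_power_law(p1, p2):
    """Return (a, b) so that input_ug = a * size_bp ** b passes through p1 and p2."""
    (s1, m1), (s2, m2) = p1, p2
    b = math.log(m2 / m1) / math.log(s2 / s1)  # slope of the line in log-log space
    a = m1 / s1 ** b
    return a, b


def suggested_input_ug(panel_bp, a, b):
    """Interpolate a starting pool input; refuses to extrapolate beyond the tested sizes."""
    if not SMALL_PANEL[0] <= panel_bp <= LARGE_PANEL[0]:
        raise ValueError("panel size outside the tested range")
    return a * panel_bp ** b


a, b = fit_power_law(SMALL_PANEL, LARGE_PANEL)
for size_bp in (730e3, 5e6, 43e6):  # 5 Mb is a hypothetical custom panel
    fraction = 100 * size_bp / GENOME_BP
    print(f"{size_bp / 1e6:6.2f} Mb  {fraction:.3f}% of genome  "
          f"~{suggested_input_ug(size_bp, a, b):.1f} ug pool input")
```

The negative exponent reflects the inverse relationship: a smaller panel captures a smaller fraction of the input molecules, so more total DNA is needed to reach the same flow cell output.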
Fig. 2.
Library loading and performance across panel sizes. (A) Log-log plot demonstrating the relationship between panel size and library loading amount. The gray circles provide the approximate loading concentration as a function of panel size for commercially available targeted panels that are similar in size to those that we have empirically tested (colored circles). The DNA amount is cumulative across multiplexed samples, so the upper left-most point (a 200 kb panel, which captures roughly 0.006% of the human genome) would have a corresponding loading pool amount of ~96 µg, or 1 µg per sample in a 96-plex assay. (B) On-target performance for a 0.8 Mb pan-cancer panel with Traditional and Trinity workflows (24 µg hybridization input, 18 µg loading) and a 43 Mb exome panel (3 µg hybridization input, 1.6 µg loading)
Importantly, the Trinity workflow maintained high on-target stringency despite these varying hybridization inputs. When testing panels at both extremes of our size range, Trinity demonstrated superior performance compared to the traditional workflow (Fig. 2B). The 800 kb panel showed a significant increase in on-target rate from 85.4% with the traditional workflow to 88.9% with the Trinity workflow (p-value < 0.0001), while the 43.2 Mb panel improved from 84.4 to 89.4% (p-value < 0.0001). In addition, mean target coverage, duplicate rate, and indel calling accuracy were comparable or better than in the traditional assay (Supplementary Table 4).
Evaluation of somatic use cases
We sequenced several somatic panels using the Trinity protocol and 1-hour hybridization to evaluate the somatic use case. A representative example was a 7,901-probe custom oncology panel applied to 10 samples. Compared to the traditional workflow, Trinity showed a nearly two-fold reduction in duplicates (from 56% to 33%), corresponding to higher library complexity and higher mean target coverage when starting with the same number of input reads (Fig. 3A). One of the 10 samples was a reference standard (Horizon myeloid DNA HD829) with 22 known variants at specified allele frequencies. We detected all 22 variants with observed allele frequencies closely matching their expected values (Fig. 3B).
Fig. 3.
Duplicate percentage and variant calling performance for custom oncology panel. (A) Percent duplicates with Traditional and Trinity workflows as determined by Sentieon Dedup algorithm implementation of the analogous Picard metric. Mean target coverage is 871X and 1430X for Traditional and Trinity, respectively, with the difference explained by the difference in duplicates. (B) Expected versus observed allele frequency from the expected variants in the Horizon myeloid DNA HD829 reference sample. All 22 variants are shown, though many points near 5% allele frequency overlap
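The link between duplicates and coverage stated in the caption can be checked with simple arithmetic. If both workflows start from the same number of reads and deduplicated coverage scales with the non-duplicate fraction, the expected coverage ratio is (1 − d_Trinity)/(1 − d_Traditional). The sketch below assumes this simple proportionality and ignores any difference in on-target rate; the function name is hypothetical.

```python
def expected_coverage_ratio(dup_a: float, dup_b: float) -> float:
    """Coverage of workflow B relative to A when only the duplicate fraction differs."""
    return (1.0 - dup_b) / (1.0 - dup_a)


# Duplicate fractions from Fig. 3A (Traditional 56%, Trinity 33%).
predicted = expected_coverage_ratio(0.56, 0.33)
# Mean target coverage from the caption (Trinity 1430X, Traditional 871X).
observed = 1430 / 871

print(f"predicted {predicted:.2f}x, observed {observed:.2f}x")
# prints "predicted 1.52x, observed 1.64x"
```

Under these assumptions the duplicate difference accounts for most of the observed coverage gain; rounding of the reported percentages and other factors not modeled here may contribute the remainder.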
Performance of the PCR-free workflow
The Trinity workflow obviates the need for post-hybridization PCR, which presents a unique opportunity to eliminate PCR from all process steps and to evaluate an entirely PCR-free exome. We prepared PCR-free whole genome libraries from the well-characterized Genome in a Bottle samples HG001-HG004, executed the Trinity workflow with exome baits, and sequenced two replicates of each sample on the AVITI sequencer, which employs a rolling circle amplification chemistry as opposed to a PCR-based approach [27]. As a control, the same samples were processed using the traditional hybrid capture method that employs PCR before and after the hybridization. We also sequenced the same samples using Trinity with PCR prior to hybridization to determine the impact of each PCR step. Using 25 million reads per sample, we saw 1.8% duplicates with the traditional method, 1.3% with Trinity PCR, and 0.41% with Trinity PCR-free. This translates to a 4.4-fold increase in library complexity when comparing the traditional method to Trinity PCR-free, as estimated via Picard [30] (see Methods). Figure 4 shows the comparison of variant calling performance among the methods. The PCR-free exome shows a modest improvement in SNP F1-score (from 0.987 to 0.992) but the improvement in indel calling is striking (from 0.938 to 0.985). The number of indel false negative calls is reduced by an average of 67% and the number of indel false positive calls is reduced by an average of 89%. The result demonstrates that most indel errors in exome sequencing are caused by the PCR amplification (as opposed to e.g. coverage non-uniformity). Interestingly, the indel performance gap between Trinity with PCR and Trinity PCR-free is smaller than the gap between traditional and Trinity with PCR, suggesting that more of the benefit comes either from the elimination of the post-hybridization PCR step or from some other aspect of the Trinity assay.
To determine the impact of higher coverage on variant calling benchmarking, we repeated the analysis with 80 million reads per sample (Supplementary Fig. 2) and demonstrated an indel F1-score above 0.99, approaching the standard set by PCR-free whole genome sequencing [31].
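The fold change in library complexity quoted above can be sanity-checked with a low-duplicate approximation of the Lander-Waterman relation. Expanding C/X = 1 − exp(−N/X) to second order gives a duplicate fraction d ≈ N/(2X), so X ≈ N/(2d), and at equal read depth the complexity ratio reduces to the ratio of duplicate fractions. The sketch below applies this approximation, which is not Picard's exact estimator, to the reported duplicate rates; the function name is hypothetical.

```python
def approx_library_size(reads: float, dup_fraction: float) -> float:
    """Small-duplicate approximation X ~ N / (2 d) of the Lander-Waterman estimate."""
    if not 0 < dup_fraction < 1:
        raise ValueError("dup_fraction must be between 0 and 1")
    return reads / (2.0 * dup_fraction)


N = 25e6  # reads per sample; the ratio below does not depend on this value

traditional = approx_library_size(N, 0.018)   # 1.8% duplicates
pcr_free = approx_library_size(N, 0.0041)     # 0.41% duplicates

fold = pcr_free / traditional
print(f"approximate fold increase in complexity: {fold:.1f}")
# prints "approximate fold increase in complexity: 4.4"
```

The approximation holds only when duplicate rates are small, as they are here; at higher duplicate rates the full relation would need to be solved numerically.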
Fig. 4.
Variant calling benchmarking comparing traditional workflow to Trinity and Trinity PCR-free. (A) The SNP performance is comparable, while the indel F1 scores (B) are markedly higher for the Trinity and Trinity PCR-free assay. The indel improvements are significant with p-values < 0.0001. Reads were down sampled to 25 million per sample prior to analysis
We next evaluated the ability of the PCR-free exome data to detect repeat expansions, a variant type reliably called only in PCR-free whole genome data [32], with initial testing focused on the HTT locus. The HTT (CAG)n repeat expansion is associated with Huntington’s disease and any length over 35 repeat units is pathogenic. We prepared libraries from Coriell samples NA13503 and NA13509, known to harbor a pathogenic expansion in the HTT locus of lengths 45 repeat units and 70 repeat units, respectively. The samples were prepared and sequenced in triplicate using the Trinity PCR-free workflow. Figure 5 shows the allele lengths for both alleles in each sample. The 45-unit expansion is called with the correct length in each replicate. The 70-unit expansion is slightly underestimated, but the reported length is well within the pathogenic range, and the confidence intervals provided by ExpansionHunter contain 70 repeats in each expansion (Supplementary Table 5). The non-expanded alleles in all samples were assigned the expected length. We next looked at 9 additional repeat expansion disease loci across these samples, though we did not expect any of them to harbor pathogenic alleles. We observed sufficient coverage and high confidence length estimates for 6 of the loci, while the other 3 had insufficient coverage for confident calls with the enrichment probe set that we used (Supplementary Figs. 3 and 4).
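To make the interpretation of such calls concrete, the sketch below checks a genotype against the pathogenic threshold stated above (more than 35 repeat units) and tests whether a confidence interval contains an expected length. The input strings mimic the slash-separated `REPCN` (repeat counts) and `REPCI` (confidence intervals) fields of the ExpansionHunter VCF output; that layout is assumed here, and the example values and helper names are hypothetical rather than results from this study.

```python
HTT_PATHOGENIC_MIN = 36  # any length over 35 repeat units is considered pathogenic


def parse_alleles(repcn: str, repci: str):
    """Return [(count, ci_low, ci_high), ...], one tuple per allele.

    Assumed layout: repcn like "17/66" and repci like "17-17/60-74".
    """
    counts = [int(value) for value in repcn.split("/")]
    intervals = [tuple(int(v) for v in ci.split("-")) for ci in repci.split("/")]
    if len(counts) != len(intervals):
        raise ValueError("REPCN and REPCI describe different numbers of alleles")
    return [(count, low, high) for count, (low, high) in zip(counts, intervals)]


def is_pathogenic(count: int) -> bool:
    return count >= HTT_PATHOGENIC_MIN


def ci_contains(allele, expected: int) -> bool:
    _, low, high = allele
    return low <= expected <= high


# Hypothetical genotype: one normal allele and one expanded allele.
alleles = parse_alleles("17/66", "17-17/60-74")
for allele in alleles:
    count = allele[0]
    label = "pathogenic" if is_pathogenic(count) else "normal"
    print(f"{count} repeats: {label}; CI contains 70: {ci_contains(allele, 70)}")
```

In this hypothetical case the point estimate of the expanded allele falls below an expected length of 70 while its confidence interval still contains it, which is the situation described for the longer expansion.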
Fig. 5.
Repeat expansion performance. (A) Normal and expanded alleles in the HTT locus based on Trinity PCR-free data. Error bars reflect variation across 3 replicates of each sample. NA13503 and NA13509 both have one expanded allele, while HG001 has two normal alleles. 40 M reads were used for each sample. (B) Allele visualizations from REViewer [33] for a representative replicate of NA13509
Discussion
The current implementation of Trinity has certain limitations that will be addressed in future versions. First, the assay is currently limited to 96-plex unique dual indexing sets. While this is adequate for exome sequencing, where 24 samples typically fill the flow cell, much smaller targeted panels could benefit from a larger set of indices depending on desired coverage. Second, the testing mostly focused on baits from two manufacturers (Twist Bioscience and Integrated DNA Technologies). As probe design strategies vary significantly (e.g. single stranded vs. double stranded probes or DNA vs. RNA probes) additional work is underway to ensure broad compatibility with other manufacturers and probe designs. Finally, in the evaluation of repeat expansion calling, we only tested expanded alleles at the HTT locus. For other repeat loci, we evaluated coverage depth in healthy controls but have yet to procure and sequence samples carrying the pathogenic expanded alleles. For loci that were not well covered, we are exploring optimized probe locations.
Our benchmarking focused on comparing the traditional hybrid capture workflow to Trinity using only the AVITI sequencing platform. This allowed for a controlled comparison in which key variables such as the library preparation, exome panel, sequencing chemistry, variant calling pipeline, and benchmarking methodology were held constant. While our cross-platform comparison was limited, published benchmarks [34, 35] of exome performance report findings consistent with ours, with Trinity demonstrating improved indel performance while maintaining comparable SNP accuracy.
We believe that additional gains and extensions are possible. For example, given the performance of the 1-hour hybridization, it is likely that the hybridization time could be reduced further without a significant drop in on-target rate for some panel designs and sizes. We are also exploring a modification where the hybridization occurs on the flow cell, thus eliminating additional hands-on steps and equipment from the assay. Finally, by combining a low level of the original input libraries with the libraries captured in the hybridization, we can tune the on-target rate for applications that require an enrichment of certain targets along with a uniform genomic background. This combined workflow could be used for imputation or copy number calling. Beyond workflow improvements, the higher library complexity achieved by eliminating post-hybridization PCR opens additional research possibilities. Deep sequencing of these high-complexity libraries may facilitate the discovery of low frequency minor alleles in somatic applications or low abundance species in metagenomics. Work is ongoing to evaluate and optimize these capabilities and to match them to specific applications.
Conclusion
We present a simplified hybrid capture workflow that eliminates the steps between hybridization and sequencer loading. Several technological challenges had to be resolved to achieve this simplification without sacrificing specificity. These included the development of a streptavidin flow cell surface that enabled direct capture of the biotinylated probes bound to target, the development of an on-flow cell process for the circularization and amplification of the captured molecules, and the optimization of the hybridization protocol. We also established a relationship between target panel size and DNA input that provides guidance to custom panel users. The workflow was routinely completed within a single work shift enabling users to start a sequencing run on the same day that they began library preparation. The performance of the approach was evaluated by sequencing hundreds of samples and demonstrating high on-target rates, reduced duplicates, and improved indel accuracy. By combining Trinity with a PCR-free library preparation, we enable an entirely PCR-free targeted sequencing assay which further improves indel calling and demonstrates the ability to detect pathogenic HTT repeat expansions, indicating potential for application to other expansion loci. This streamlined approach addresses key operational challenges in targeted sequencing, offering potential benefits for applications requiring rapid turnaround times or increased capability in variant detection.
Acknowledgements
We acknowledge Joseph Puglisi for valuable comments and discussion during the writing of the manuscript. We thank Andrew Carroll and the DeepVariant team at Google for cross-platform comparison analysis.
Authors’ contributions
AM, XQ, KW, LE, KM, KD, MO, SB, JM, MM, BK, KG, RA, AB, JC, PM, and SD performed library preparation, protocol development, and sequencing. KW and BL performed data analysis. JZ, SK, and SL contributed to writing of the manuscript. MM created the figures. JZ, SL, and MP conceived of the method.
Funding
Not applicable.
Data availability
Data underlying Figs. 2 and 4 cannot be publicly shared due to sample privacy restrictions. All other datasets generated and analyzed during this study are available in the NCBI Sequence Read Archive under BioProject accession PRJNA1292974. Platform comparison data presented in Supplementary Table 1 is hosted in a public Google Cloud Storage bucket maintained by collaborators at Google. Direct file links are provided in Supplementary Table 7.
Declarations
Ethics approval and consent to participate
This study used publicly available reference materials from the National Institute of Standards and Technology (NIST), including the HG001 reference sample, which do not require ethics approval. Additionally, anonymized customer samples were analyzed, from which only high-level aggregate metrics were reported without any individual-level data. These anonymized samples were processed according to standard operating procedures with appropriate consent for research use. No identifiable human data or individual results are reported in this study.
Consent for publication
Not applicable.
Competing interests
All authors are current or former employees of Element Biosciences and may hold stock options in the company.
Footnotes
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Adeline Huizhen Mah, Xiaodong Qi and Junhua Zhao contributed equally to this work.