Abstract
Multiplex PCR-based assays are indispensable platforms for rapid and cost-effective DNA-based multi-target detection. The success of such an assay depends heavily on the accurate design of oligonucleotide primers, arguably its most vital component. In this study, the ThermoPlex design tool is introduced, offering an automated design pipeline for target-specific multiplex PCR primers motivated by DNA thermodynamics. From a sequence alignment of all relevant target and non-target sequences, ThermoPlex automatically designs multiplex PCR primer candidates in a matter of minutes. The software also offers tools for thermodynamic calculations that can be used either apart from the automated primer screening routine or in conjunction with other existing primer design tools, depending on the needs of the user. Theoretical and experimental analyses presented in this study provide insights into the performance of the software and serve to establish the reliability of its framework.
Keywords: DNA Thermodynamics, Multiplex PCR, Primer Design, Target-specific PCR
Introduction
Multiplex PCR-based assays remain widely employed for rapid DNA-based target detection due to their multi-target detection capabilities in a single assay platform [1]. Through the years, various detection and identification assays have emerged that use this platform for a wide variety of organisms, including microbial [2, 3], plant [4], and animal [5–9] species. Recently, with the emergence of the coronavirus disease 2019 (COVID-19), numerous multiplex PCR assays for clinical diagnostics have been developed commercially and publicly [10–13], taking advantage of the assay’s ability to simultaneously detect the target along with multiple non-target samples in a single PCR system. Multiplex PCR assays are thus expected to remain a relevant assay platform in the coming years, owing to their versatility across fields of application and their cost-effectiveness.
The success of a PCR assay heavily depends on the design of its primers, perhaps its most vital component. In fact, PCR assay design issues typically arise from the designer’s lack of experience with the primer design process [14]. This includes understanding of key parameters to optimize and familiarity with appropriate design tools to generate optimal primers. The complexity of the problem further escalates when trying to design a multiplex PCR assay. Apart from the primer’s interaction with the target DNA, one must consider the interaction of multiple primers among themselves, as all these components interact together in a single PCR solution system. These interactions result in numerous undesirable hybridization reactions affecting the assay’s overall specificity and efficiency. Tackling such complexity with rigor, researchers often resort to an iterative, expensive, and labor-intensive workflow [15] with high incidence of failure even after extensive experimentation [16].
The design challenges described can be alleviated by in silico approaches that systematically evaluate the performance of primers before they undergo experimental scrutiny. The most common type of in silico method for primer design evaluates primer performance through heuristics driven by practical experience with PCR (e.g. melting point matching, base pair mismatches, BLAST/alignment scores). However, major caveats of these approaches include the lack of direct physical interpretation and the reliance on parameters with arbitrarily chosen thresholds [17]. Such approaches can thus be considered inadequate for the proper design of multiplex PCR assays. In contrast, approaches based on nucleic acid thermodynamics can overcome these limitations. The thermodynamics of nucleic acid interactions has been extensively studied in the past two decades, with values for useful thermodynamic parameters [enthalpy (ΔH), entropy (ΔS), and Gibbs free energy (ΔG) changes] already compiled for every possible interaction motif [18–24]. This makes it possible to quantitatively predict thermodynamic interactions between nucleic acids of any base sequence. Moreover, the use of thermodynamic parameters through the Nearest-Neighbor (NN) model [25, 26] has been shown to yield fairly accurate results consistent with experimental observations for oligonucleotide interactions [18]. This makes nucleic acid thermodynamics a solid foundation for developing a rigorous approach to designing multiplex PCR primers.
In this study, a design tool called ThermoPlex is introduced (Fig. 1), with the aim of assisting with the design and screening of multiplex-compatible, target-specific PCR primers from target and non-target sequence alignment datasets. Although numerous programs already exist for predicting thermodynamic interactions of nucleic acids, very few open-access programs focus on automated and streamlined screening of multiplex PCR primer candidates from a given sequence dataset. The methodology utilizes a novel quadratic-time algorithm based on the NN model doublet parameters to predict the ensemble thermodynamics of DNA–DNA interactions. The thermodynamic information is then used to select target-specific primer candidates against undesirable non-target sequences. The target-specific primer candidates are then grouped based on binding region overlaps and assessed for multiplex compatibility through an algorithm that computationally simulates multi-reaction equilibrium thermodynamics. The ThermoPlex software was developed in MATLAB, with user interaction available either through a standalone GUI-enabled app (Windows/macOS) or through the source code. The standalone app and the source code are available at https://github.com/aagmata/ThermoPlex.
Figure 1.
Snapshot of the ThermoPlex software running the (A) ThermoDHyb and (B) SiMulEq routines
Methodology
ThermoDHyb: thermodynamics of DNA hybridization prediction algorithm
The ThermoDHyb algorithm aims to predict the thermodynamics of the double-stranded ensemble of interactions between two single-stranded DNAs. The algorithm calculates the Gibbs free energy change (ΔG) between a single-stranded reference state and each double-stranded microstate corresponding to a local minimum in the cumulative pairwise energy matrix P. The routine derives its main working principle from the Smith–Waterman algorithm [27] for sequence alignment. Pairwise sequence comparison (Fig. 2A) is done per sequence doublet according to the NN model, with the values of the matrix populated according to the equation:
Figure 2.
(A) Cumulative pairwise energy matrix of doublet pairs for the sequences 5′-ATCGCCT-3′ and 5′-AGGTCGAT-3′ at 40°C and 1 M NaCl simulation conditions. Doublets of the first and second sequence are listed per row and column, respectively. Arrows represent the cumulative dependence of each element on its neighboring elements. Solid arrows depict the dependence chosen, corresponding to the minimum score evaluated through Equation (1). (B) Propagation of cumulative energy scores (down-right for P, up-left for P′). Terminal doublet pairs are boxed; arrows show the path for tracing the doublets to infer the minimum free energy (MFE) structure of each microstate. (C) Global MFE structure and corresponding free energy change of the most stable microstate predicted by ThermoDHyb for the two sequences at 40°C and 1 M NaCl simulation conditions
| (1) |
where each matrix element holds the cumulative free energy value, and two length-dependent penalty terms account for bulge and internal loop formation, respectively. Doublet ΔG values were calculated from enthalpy and entropy values while also accounting for the salt concentration dependence according to Owczarzy’s model [28].
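The matrix-filling idea can be illustrated with a minimal sketch. It is a simplified Smith–Waterman-style recurrence that keeps the minimum (most negative) cumulative score, not a reproduction of Equation (1): the scores `paired`, `unpaired`, and `gap` are hypothetical toy values standing in for NN doublet free energies and loop penalties, and all function names are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def doublets(seq):
    """Split a sequence into overlapping two-base steps (NN doublets)."""
    return [seq[i:i + 2] for i in range(len(seq) - 1)]


def fill_energy_matrix(seq_a, seq_b, paired=-1.0, unpaired=1.0, gap=1.0):
    """Fill a cumulative energy matrix with a Smith-Waterman-style minimum.

    paired, unpaired and gap are toy scores (hypothetical), standing in for
    NN doublet free energies and bulge/internal-loop penalties.
    """
    rows = doublets(seq_a)
    # Reverse-complementing seq_b turns antiparallel pairing into equality.
    cols = doublets(reverse_complement(seq_b))
    matrix = [[0.0] * (len(cols) + 1) for _ in range(len(rows) + 1)]
    for i in range(1, len(rows) + 1):
        for j in range(1, len(cols) + 1):
            step = paired if rows[i - 1] == cols[j - 1] else unpaired
            matrix[i][j] = min(
                0.0,                          # start a new helix here
                matrix[i - 1][j - 1] + step,  # extend along the diagonal
                matrix[i - 1][j] + gap,       # skip a doublet in seq_a
                matrix[i][j - 1] + gap,       # skip a doublet in seq_b
            )
    return matrix


def lowest_energy(matrix):
    """Return (energy, row, column) of the most negative cell."""
    return min(
        (value, i, j)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    )


if __name__ == "__main__":
    # The two sequences shown in Fig. 2, scored with the toy values above.
    scores = fill_energy_matrix("ATCGCCT", "AGGTCGAT")
    print(lowest_energy(scores))
```

Filling the matrix visits every doublet pair once, which is the source of the N × M cost of this kind of recurrence.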
The matrix-filling procedure is performed bi-directionally (Fig. 2B) to predict multiple initiation and propagation paths across each local minimum, each corresponding to a system microstate. The energies of the set of microstates are then summed according to the partition function [Equation (2)], which is in turn used to find the hybridization Gibbs free energy according to Equation (3):
| (2) |
| (3) |
This allows ThermoDHyb to calculate thermodynamic interactions that account for the significant contributors to the partition function, weighted according to the Gibbs factor. Subsequently, the routine traces back the propagation path from the terminal state along each local minimum to predict the minimum free energy (MFE) structure of each microstate (Fig. 2C). The inner workings of the algorithm, together with the assumptions made, are explained further in Supplementary Section S1.
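As a hedged illustration of the ensemble step, the sketch below assumes the standard statistical-mechanical forms Z = Σ exp(−ΔG_i/RT) and ΔG = −RT ln Z, which match the description of Equations (2) and (3) but are not copied from them. The function names and the example energies are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ensemble_free_energy(microstate_dg, temp_celsius):
    """Combine microstate free energies (kcal/mol) into one ensemble value.

    Assumes Z = sum(exp(-dG_i / RT)) and dG = -RT * ln(Z).
    """
    rt = R_KCAL * (temp_celsius + 273.15)
    g_min = min(microstate_dg)
    # Factor out the most stable state so the exponentials stay well scaled.
    z_scaled = sum(math.exp(-(g - g_min) / rt) for g in microstate_dg)
    return g_min - rt * math.log(z_scaled)


def microstate_weights(microstate_dg, temp_celsius):
    """Return the Gibbs-factor weight (probability) of each microstate."""
    rt = R_KCAL * (temp_celsius + 273.15)
    g_min = min(microstate_dg)
    factors = [math.exp(-(g - g_min) / rt) for g in microstate_dg]
    total = sum(factors)
    return [factor / total for factor in factors]


if __name__ == "__main__":
    # Hypothetical microstate energies in kcal/mol at 40 degrees Celsius.
    energies = [-8.0, -6.5, -3.0]
    print(ensemble_free_energy(energies, 40.0))
    print(microstate_weights(energies, 40.0))
```

Under these assumptions the ensemble value is never less stable than the best single microstate, and weakly bound microstates contribute little to the sum.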
ThermoDHyb was benchmarked for accuracy against the Visual OMP software (DNAsoftware™), DINAMelt, and NUPACK by performing a pairwise linear regression of global MFE values and concordance testing of predicted global MFE structures. Ninety-nine randomly generated sequence pairs with sizes ranging from 20 to 40 bp were simulated for interaction in 1 M NaCl solution. The sub-routine was also benchmarked for average-case computational time complexity to see how its runtime scales with input sequence length (N). Random pairs of equal-length sequences ranging from 15 to 100 bp were simulated at 1 bp increments, with five replicates per increment. The mean runtime of the replicates at each length was then used for polynomial curve-fitting, from the linear to the quartic model, using the equations in Table 3. Curve-fitting was implemented through non-linear least squares using the lsqnonlin function in MATLAB with default parameters.
Table 3.
Resulting fitting parameters for each model through non-linear regression of ThermoDHyb runtimes.
| Model | Fitting parameter a | Fitting parameter b |
|---|---|---|
| Linear | 1.85 × 10⁻² | −0.4037 |
| Quadratic | 1.59 × 10⁻² | 0.0326 |
| Cubic | 1.57 × 10⁻⁶ | 0.1930 |
| Quartic | 1.57 × 10⁻⁸ | 0.2824 |
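The model comparison can be sketched as follows. Two assumptions are made that the source does not state: each model is taken to have the two-parameter form t = a·N^k + b (suggested by the two fitted parameters and the y-intercept discussion), and closed-form linear least squares replaces MATLAB's lsqnonlin, which is valid here because the assumed model is linear in a and b for a fixed k. The runtimes in the example are synthetic, not measurements.

```python
def fit_power_model(lengths, runtimes, degree):
    """Least-squares fit of runtime = a * N**degree + b; returns (a, b, sse)."""
    x = [float(n) ** degree for n in lengths]
    count = len(x)
    mean_x = sum(x) / count
    mean_y = sum(runtimes) / count
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, runtimes))
    a = sxy / sxx
    b = mean_y - a * mean_x
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, runtimes))
    return a, b, sse


def best_degree(lengths, runtimes, degrees=(1, 2, 3, 4)):
    """Return the degree whose fit leaves the smallest sum of squared errors."""
    return min(degrees, key=lambda d: fit_power_model(lengths, runtimes, d)[2])


if __name__ == "__main__":
    # Synthetic runtimes (not measurements) generated from a quadratic law.
    lengths = list(range(15, 101))
    runtimes = [2.0e-4 * n ** 2 + 0.03 for n in lengths]
    for degree in (1, 2, 3, 4):
        print(degree, fit_power_model(lengths, runtimes, degree))
    print("best degree:", best_degree(lengths, runtimes))
```

Ranking models by the sum of squared errors is a choice made for this sketch; the study itself judges the fits by visual overlap and by how close the intercept is to zero.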
SiMulEq: Simulation of multi-reaction equilibria interaction
The SiMulEq algorithm simulates the concentration of the double-stranded DNA products in thermodynamic equilibrium formed by the primers binding specifically and non-specifically to the given DNA template. The equilibrium scenario is modeled as a set of interdependent competitive reactions that seek to minimize the total Gibbs free energy of the system (Fig. 3). A fully-defined system of equations was derived from the minimization of the total Gibbs free energy as a function of all the species’ chemical potential through the summability rule of partial properties [29]. The function was then constrained with material balance equations representing nucleotide strand conservation. By employing the method of Lagrange multipliers [30], the problem is converted from an optimization problem to a multi-dimensional root-finding problem of the form (derivation at Supplementary Section S2):
Figure 3.
Complex multi-reaction equilibrium scenario accounting for each of the possible pairwise interactions in a two-primer, one-template PCR reaction system. Template–template interactions are not accounted for as they occur through a different mechanism in the PCR annealing step
| (4a) |
| (4b) |
| (4c) |
where the variables are the initial single-stranded DNA concentrations and the equilibrium single- and double-stranded DNA concentrations in the multiplex solution (in mol/l), a repetition multiplier, and the Lagrange multiplier for each strand species. The resulting fully defined system of equations is solved through Newton’s method with a finite-difference Jacobian. This yields the concentration of each chemical species at chemical equilibrium for any given temperature. The concentration values are then corrected (see Supplementary Section S3) for amplification inhibition caused by non-target primers binding between the forward target primer and the reverse primer (Fig. 4). SiMulEq then repeats this procedure over a range of temperatures (40–70°C) in 2°C increments. The final outputs of SiMulEq are equilibrium product distribution (EPD) curves as functions of temperature.
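To make the competitive-equilibrium idea concrete, the sketch below solves a reduced model in which several primers compete for one template and only primer–template duplexes form. This is a deliberate simplification: it ignores primer–primer interactions, uses bisection on the free template concentration instead of the Lagrange-multiplier system solved by Newton's method, and omits the inhibition correction. The ΔH and ΔS values and all function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def equilibrium_constant(dh, ds, temp_celsius):
    """K from dG = dH - T*dS, with dH in kcal/mol and dS in kcal/(mol*K)."""
    temp = temp_celsius + 273.15
    return math.exp(-(dh - temp * ds) / (R_KCAL * temp))


def duplex_concentrations(template0, primers0, constants, iterations=200):
    """Equilibrium primer-template duplex concentrations (mol/l).

    For a free template concentration T, primer conservation gives
    [T.P_i] = K_i * T * P_i0 / (1 + K_i * T); T is found by bisection on
    the template balance T + sum([T.P_i]) = template0.
    """
    def bound(free_t):
        return [k * free_t * p0 / (1.0 + k * free_t)
                for k, p0 in zip(constants, primers0)]

    low, high = 0.0, template0
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if mid + sum(bound(mid)) > template0:
            high = mid  # too much template accounted for: lower the guess
        else:
            low = mid
    return bound(0.5 * (low + high))


def epd_curve(template0, primers0, thermo, temps=range(40, 71, 2)):
    """Duplex concentrations at each temperature; thermo holds (dH, dS) pairs."""
    curve = []
    for temp in temps:
        ks = [equilibrium_constant(dh, ds, temp) for dh, ds in thermo]
        curve.append((temp, duplex_concentrations(template0, primers0, ks)))
    return curve


if __name__ == "__main__":
    # Hypothetical values: one strongly and one weakly binding primer.
    thermo = [(-150.0, -0.41), (-120.0, -0.34)]
    for temp, products in epd_curve(1e-12, [5e-7, 5e-7], thermo):
        print(temp, products)
```

Bisection is used because the template balance is monotonic in the free template concentration, so the root is bracketed between zero and the initial template concentration.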
Figure 4.

Extension inhibition mechanism where amplification of P1 products is inhibited by a primer (P2) binding along its extension path
ThermoPlex primer selection routine
The ThermoPlex primer selection routine (Fig. 5) aims to select multiplex-compatible, target-specific PCR primers from an input alignment of target sequences. Target-specific forward primers are first selected by iteratively evaluating every possible primer sequence from the consensus target sequence against all the non-target sequences through a sliding window with size equal to the primer length + 2 (the value 2 corresponds to a single-nucleotide flanking region at each of the 5′ and 3′ ends). This assumes that the most favorable interaction of a primer with a non-target template is located in the same region from which the primer was derived in the target sequence. In each iteration, a primer is first evaluated across a set of heuristic criteria (Fig. 5*) prior to thermodynamic calculations. Subsequently, the ΔG of hybridization between the primer and the sequence frame of the non-target sequence is calculated through ThermoDHyb at a single user-specified temperature (chosen arbitrarily near the PCR annealing temperature). The template fractional conversion, defined as the mole fraction of the target or non-target template hybridized by the primer, is then calculated according to the equation:
Figure 5.
The ThermoPlex algorithm is subdivided into target-specific primer selection and multiplex primer compatibility evaluation. *Selection criteria include: number of mismatches < user-defined mismatch parameter; primer does not contain ambiguous nucleotide characters; primer does not form stable secondary structures (checked through the rnafold function of MATLAB’s Bioinformatics Toolbox); fractional specificity (ψ) and target fractional conversion ≥ user-defined thresholds
| (5) |
where the two concentration terms are the initial concentrations of the target template and the primer, respectively. The derivation is explained further in Supplementary Section S4. The derived model depends on the initial template concentration, which enters through the primer-to-template concentration ratio and is a major source of uncertainty because its exact value in molar units is difficult to know. To ensure the robustness of the model despite this uncertainty, a sensitivity analysis for all the relevant variables (concentration ratio, MgCl2 concentration, and temperature) was performed to check whether the uncertainty in the template concentration materially affects the predicted fractional conversion. This was implemented through a Monte Carlo simulation with uniformly distributed input variables.
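A hedged sketch of a fractional-conversion calculation is given below. It assumes a single bimolecular equilibrium P + T ⇌ PT with K = exp(−ΔG/RT) and solves the resulting mass-action quadratic in closed form; the actual Equation (5) may be written differently. Function names and the example ΔG are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fractional_conversion(dg, temp_celsius, primer0, template0):
    """Fraction of template hybridized at equilibrium for P + T <-> PT.

    With y = [PT], mass action gives K*y**2 - (K*(P0 + T0) + 1)*y + K*P0*T0 = 0.
    The smaller root is written in a cancellation-free form and divided by T0.
    """
    k = math.exp(-dg / (R_KCAL * (temp_celsius + 273.15)))
    b = k * (primer0 + template0) + 1.0
    disc = b * b - 4.0 * k * k * primer0 * template0
    return 2.0 * k * primer0 / (b + math.sqrt(disc))


if __name__ == "__main__":
    # Hypothetical dG of -10 kcal/mol at 55 degrees Celsius, 0.5 umol/l primer.
    for ratio in (1e1, 1e2, 1e4, 1e8):
        template0 = 5e-7 / ratio
        print(ratio, fractional_conversion(-10.0, 55.0, 5e-7, template0))
```

When the primer is in large excess the result approaches K·P0/(1 + K·P0), which no longer contains the template concentration; this is one way to see why uncertainty in that concentration matters little at high primer-to-template ratios.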
Fractional specificity (ψ) is then calculated as the ratio of the fractional conversion of the target to that of the non-target:
| (6) |
The fractional specificity ψ describes how much more likely a primer is to hybridize with the target than with the undesirable non-target sequence. For example, ψ = 1000 means that there is 1000 times more primer–target product than primer–non-target product upon annealing at the specified temperature. Both ψ and the target fractional conversion are used as selection criteria, based on user-defined thresholds, to screen for primer specificity and efficiency, respectively. Primers that pass the criteria are included in the set of primers derived from a given target that are specific against a given non-target. Upon completion of the iteration through all the non-target sequences, the intersection of these sets is taken, comprising the set of primers for the target that are specific against all the non-target sequences. If the intersection is an empty set, the process is repeated with a reduced primer length to increase the relative effect of mismatches and thus increase primer specificity. The whole process is repeated until each target has its own set of target-specific primers.
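The screening loop described above can be sketched as a sliding window followed by a set intersection. This is an illustrative outline only: `is_specific` is a hypothetical user-supplied predicate standing in for the thermodynamic criteria, the example predicate simply counts mismatches, and the default thresholds are the values listed in Table 1.

```python
AMBIGUOUS = set("RYSWKMBDHVN-")  # IUPAC ambiguity codes and the gap symbol


def candidate_windows(target, non_target, primer_len):
    """Yield (start, primer, frame) for each window of primer_len + 2 bases."""
    for start in range(1, len(target) - primer_len):
        primer = target[start:start + primer_len]
        # The frame adds one flanking base on each side of the primer site.
        frame = non_target[start - 1:start + primer_len + 1]
        yield start, primer, frame


def mismatches(primer, frame):
    """Count mismatches between the primer and the core of the frame."""
    return sum(a != b for a, b in zip(primer, frame[1:-1]))


def passes_thresholds(x_target, x_non_target, psi_min=1000.0, x_min=0.5):
    """Apply fractional specificity (ratio) and conversion thresholds."""
    return x_target >= x_min and x_target / x_non_target >= psi_min


def specific_primers(target, non_targets, primer_len, is_specific):
    """Primers from the target that pass is_specific against every non-target."""
    per_non_target = []
    for non_target in non_targets:
        per_non_target.append({
            (start, primer)
            for start, primer, frame
            in candidate_windows(target, non_target, primer_len)
            if not set(primer) & AMBIGUOUS and is_specific(primer, frame)
        })
    return set.intersection(*per_non_target) if per_non_target else set()


if __name__ == "__main__":
    # Toy aligned sequences of equal length (hypothetical data).
    hits = specific_primers(
        "ACGTACGTAC",
        ["ACGTTCGTAC", "ACGAACGTAC"],
        4,
        lambda primer, frame: mismatches(primer, frame) >= 1,
    )
    print(sorted(hits))
```

In a fuller version, `is_specific` would compute the fractional conversions for the target and the non-target frame and pass them to `passes_thresholds`, and the caller would retry with a shorter `primer_len` whenever the intersection is empty.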
The set of all target-specific primers for each target is then used to build all possible primer combinations, with one primer for each target per combination. The combinations are evaluated on the constraint of amplicon length resolution: each primer in a multiplex combination should be spaced at least a user-defined number of base pairs apart (Table 1). This ensures that the primers produce distinguishable gel bands corresponding to each of the targets in a multiplex set. Each primer combination is then evaluated on its multi-reaction equilibrium thermodynamics using SiMulEq. New values of fractional specificity are also calculated in the context of the multiplex reaction system for each primer, relative to its desired, highest-yielding product, at each temperature increment:
| (7) |
This results in a curve describing the fractional conversion of each target from its hybridization with its intended primer relative to all the undesired double-stranded products formed between the other primers and that target. The routine ultimately outputs EPD curves, ψ curves, and gel profiles for each multiplex primer combination, which give insights into the performance of each multiplex primer set across varying temperature conditions. The ThermoPlex subroutines that comprise the whole primer selection process, “Select Target-Specific Primers” and “Select Multiplex-Compatible Primers”, were benchmarked in terms of runtime, with the input parameters listed in Table 1, to test how fast each performs across various personal computing platforms.
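The combination-building step and the multiplex specificity ratio can be sketched as below. The data layout (a dictionary of `(position, sequence)` tuples per target) and both function names are assumptions for illustration, and the ratio follows the verbal description of Equation (7) rather than its exact form.

```python
import math
from itertools import combinations, product


def multiplex_combinations(primers_by_target, resolution):
    """Yield one-primer-per-target sets whose sites are far enough apart.

    primers_by_target maps a target name to a list of (position, sequence)
    tuples; resolution is the minimum spacing in bp between any two sites.
    """
    targets = sorted(primers_by_target)
    for combo in product(*(primers_by_target[t] for t in targets)):
        positions = [position for position, _ in combo]
        if all(abs(a - b) >= resolution
               for a, b in combinations(positions, 2)):
            yield dict(zip(targets, combo))


def multiplex_specificity(desired_product, undesired_products):
    """Ratio of the desired duplex to the sum of all undesired duplexes."""
    total_undesired = sum(undesired_products)
    if total_undesired == 0.0:
        return math.inf
    return desired_product / total_undesired


if __name__ == "__main__":
    # Hypothetical primer sites for two targets, 50 bp resolution (Table 1).
    candidates = {
        "target_A": [(100, "primer_a1"), (160, "primer_a2")],
        "target_B": [(120, "primer_b1"), (300, "primer_b2")],
    }
    for combo in multiplex_combinations(candidates, 50):
        print(combo)
```

Because a single reverse primer is shared in the validation assays, spacing the forward primer sites apart translates directly into amplicons of distinguishable length.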
Table 1.
ThermoPlex parameters used in the study for the design of Multiplex PCR Primers.
| Parameters | Defined or set as threshold |
|---|---|
| Simulation temperature | 55°C |
| MgCl2 concentration | 10 mmol/l |
| Equimolar primer concentration | 0.5 µmol/l |
| Fractional specificity (ψ) threshold at simulation temperature | 1000 |
| Fractional conversion threshold at simulation temperature | 0.5 |
| Initial primer length | 20 |
| Mismatch parameter | 3 |
| Amplicon length resolution | 50 |
Laboratory validation
The predictive capability of the SiMulEq algorithm in simulating multi-reaction equilibrium primer interactions was tested using an in-house multiplex assay for identifying five sardine species through the amplification of the Cytochrome C Oxidase I (COI) gene. These primers were previously designed using conventional heuristics (e.g. melting temperature balancing, minimal mismatches) and trial-and-error approaches without thermodynamic modeling. A simulation was performed using the primer sequences listed in Table 2 and then compared with the gel electrophoresis profile of the assay.
Table 2.
Target species sample set for the PCR trial run with corresponding specific primers and expected amplicon product size for both In-house and ThermoPlex-designed primers.
| Target species | Sequence (5′ to 3′) | Approx. product size relative to sequence (bp) |
|---|---|---|
| In-house primers | ||
| S.fimbriata | CTAACAGACCGAAACCTAAA | 121 |
| S.lemuru | ATATCAAACCCCACTCTTCG | 215 |
| H.quadrimaculatus | AGTTATACCCATCCTGATCG | 542 |
| S.tawilis | ACCTTACCATCTTCTCACTC | 318 |
| A.sirm | CAGTATACCCCCCACTTTCT | 369 |
| Designed primers | ||
| S.fimbriata | TATTACTACGATTATCAACA | 171 |
| S.goni | CTCCGTCGACCTAACTATTT | 239 |
| S.lemuru | GATCAAATCTACAATGTTAT | 511 |
| H.quadrimaculatus | GAAATTTAAACACAACCTTC | 24 |
The capability of the ThermoPlex algorithm to assist in the screening of target-specific multiplex PCR primers was also assessed. Multiplex PCR primers (Table 2), also targeting the COI gene region, were designed using the algorithm for an assay to delineate four sardine species. This set excluded two species from the first pentaplex validation assay (S.tawilis and A.sirm) and incorporated one new species (S.goni) due to evolving project objectives, prioritization, and inherent limitations of ThermoPlex with respect to its input data. Specifically, the distribution of informative polymorphic sites in the S.tawilis and A.sirm sequences was insufficient to support the automated selection of target-specific primers, particularly when both were included alongside S.goni. This reflects a key limitation of the algorithm with respect to its input sequence data: it depends on the presence of localized sequence variation to enable effective target discrimination. One of the three predicted sets of primers was then randomly picked to be evaluated in the laboratory. For all the PCR reactions performed, one universal reverse primer was used with the sequence 5′-TAGACTTCTGGGTGGCCAAAGAATCA-3′, known to work for all the species investigated in the study. The reverse primer was intentionally kept the same across all assays to ensure that gel band distinctions would be driven solely by forward primer specificity. This also allowed direct assessment of forward primer performance in multiplex settings.
High-quality DNA used for the experiment was extracted from tissue samples of each of the target species using Qiagen’s DNeasy Blood & Tissue Kit. The extracted DNA was amplified through PCR using a 10 µl PCR mix comprised of 2.45 µl of ultrapure water, 2 µl of 5x GoTaq® PCR Buffer, 0.4 µl of 25 mM MgCl2, 0.8 µl of 2.5 mM dNTP, 5 × 0.5 µl of 10 µM each of the five forward primers, 0.5 µl of 10 µM reverse primer, 0.8 µl of Bovine Serum Albumin (BSA), 0.05 µl of 5 U/µl GoTaq® polymerase and 0.5 µl of DNA template. Gel electrophoresis was then performed using 3% agarose gel to view the results of the amplification.
Results and discussion
Benchmarking of ThermoDHyb algorithm and ThermoPlex subroutines
Table 3 lists the models used to assess the computational complexity of the ThermoDHyb algorithm and their corresponding fitted parameters. Visual inspection of the time-complexity benchmark (Fig. 6) suggests that the quadratic model is the best fit for the simulation runtimes, as it overlaps the data points more closely than the other polynomial models. Moreover, the fitting parameter b (y-intercept) of the quadratic model is the closest to zero, further supporting this model over the others. Together, these indicate an average-case quadratic time complexity, O(N²), for the ThermoDHyb algorithm in the special case of equal sequence lengths. The general case for two sequences of lengths N and M is thus expected to be O(NM), as this represents the computational bottleneck of filling the matrix of doublet interactions. The sub-routine’s efficiency better fits the iterative nature of the ThermoPlex algorithm compared to other published algorithms predicting nucleic acid thermodynamic interactions, such as those of McCaskill [32], Dirks [33], and NUPACK [36], which run in higher-order polynomial time. Although these algorithms sample the possible microstates in the ensemble more rigorously, the microstates predicted by ThermoDHyb should be enough to capture the most important contributors to the ensemble free energy, particularly in PCR applications, where interactions are limited to short oligonucleotide primers that seldom adopt more complex secondary structures and interactions such as pseudoknots. Indeed, the benchmark results (Table 4) demonstrate the algorithm’s efficiency, with the subroutines completing the computations in under 10 min across the personal computers and operating systems tested in the study.
Figure 6.
Benchmarking for time-complexity of ThermoDHyb through exploration of runtime as a function of sequence length (bp). Box plots (blue) were generated from sequence length of 15–100 bp at 1 bp increment with five replicates each bp. The orange lines represent each polynomial model fitted through non-linear least squares
Table 4.
Computing platform specifications, subroutine average runtimes for three replicates, and input parameters for the ThermoPlex benchmarking.
| | 1 | 2 | 3 |
|---|---|---|---|
| Specifications | |||
| Model | ASUS TUF A15 | MSI GL63 8RD | MacBook Pro 2019 |
| Operating system | Windows 10 | Windows 10 | MacOS Catalina (10.15.7) |
| Processor | AMD Ryzen 7 4800H | Intel® Core™ i7-8750 | Intel® Core™ i5-8279U |
| No. of cores | 8 | 6 | 4 |
| Clock rate | 2.9 GHz | 2.20 GHz | 1.4 GHz |
| RAM | 8 GB | 12 GB | 8 GB |
| Average runtimes | |||
| Select target-specific primers | 54 s | 1 min 32 s | 1 min 44 s |
| Select multiplex-compatible primers | 3 min 43 s | 5 min 5 s | 5 min 12 s |
| TOTAL | 4 min 37 s | 6 min 37 s | 6 min 56 s |
| Benchmark parameters | |||
| No. of target sequences | 4 | ||
| Sequence alignment length | 563 | ||
| No. of predicted multiplex combinations | 3 | ||
Meanwhile, results of the linear regression analysis (Fig. 7) between ThermoDHyb and three other programs [(A) Visual OMP [34], (B) DINAMelt [35], and (C) NUPACK [36]] suggest good concordance in the predicted ΔG values, especially with Visual OMP. A near-unity regression slope and R² of 0.9959 and 0.9770, respectively, as well as a near-zero y-intercept of 0.0192, indicate a good one-to-one agreement in the predicted values. Good concordance (84%) in terms of predicted secondary structures is also observed, with non-concordance existing only at more positive ΔG values (> −3 kcal/mol). These suggest that the ThermoDHyb algorithm is at least on par with the Visual OMP software in terms of accuracy in predicting the thermodynamic interactions of two oligonucleotides. Although quite expensive, the Visual OMP software has been extensively used in numerous multiplex PCR primer design studies [37–40] and has yielded accurate results. Moreover, fairly good concordance in predicted secondary structures (70% and 84%) and ΔG values, at least in terms of trend, can still be seen in the comparison with both DINAMelt and NUPACK. It is also worth noting that the y-intercept of Fig. 7C implies a shift in the predicted ΔG value of ∼3.41 kcal/mol on average by NUPACK. This shift is consistent with the study of Zhang et al. [41], who needed to adjust the NUPACK-predicted values by 1.5 kcal/mol for their model to better fit the experimental results. All of these results support the reliability of the ThermoDHyb algorithm and thus provide a solid framework for the thermodynamic calculations of oligonucleotide interactions performed within the ThermoPlex routines.
Figure 7.
Linear regression analysis for the benchmarking of predicted global minimum free energy values (kcal/mol), simulated at 37°C and 1 M NaCl solution, as predicted by ThermoDHyb compared to various software: (A) Visual OMP [34], (B) DINAMelt [35], and (C) NUPACK [36]. The x- and y-axes correspond to the ΔG values predicted by ThermoDHyb and by the other software, respectively. Blue and orange data points correspond to concordant and non-concordant predicted secondary structures between ThermoDHyb and the software
Sensitivity analysis of the model for fractional conversion
The scatter plots from Monte Carlo (MC) simulations in Fig. 8A–C show that the fractional conversion is most sensitive to changes in temperature, followed by MgCl2 concentration, and lastly the primer-to-template concentration ratio. This is shown by a more distinct pattern in plot C than in B, and in B than in A. This trend is expected since DNA hybridization is most strongly influenced by temperature, as heat is mainly responsible for breaking the hydrogen bonds between the DNA bases in PCR reactions. Meanwhile, salt concentration moderately influences DNA hybridization stability through the neutralization of the negative charges on the backbones by metal cations (Na+ and Mg2+), reducing electrostatic repulsion between the phosphate groups [42]. No pattern is discernible in plot A, but further evaluation was performed since a pattern may be heavily diluted by the high variation due to temperature. This was done by performing another MC simulation in which the temperature is held constant at 62°C (Fig. 8D and E), thus comparing only the influence of the MgCl2 concentration and the concentration ratio relative to each other. In the second simulation, a pattern is clearly discernible in plot E and still none in plot D; however, outliers (colored yellow) can be observed in plot E, indicating that the concentration ratio still influences the model to a certain extent. Linking these outliers to plot D (also yellow) suggests that the significant influence of the ratio is limited to its lowest values (<30), and that the influence diminishes as the ratio increases (blue). PCR reactions are typically conducted with a ratio of at least 10⁸ to ensure that the template anneals to the primers rather than to itself [43]; thus, the model is sufficiently valid for PCR applications despite the uncertainty attributed to the template concentration. The simulation results also provide an interesting insight into the modeling of bimolecular interactions beyond PCR applications, as the function was derived from a generalized mass action equation for bimolecular reactions. For instance, the model can be used to predict the fractional conversion of a limiting reactant even without accurate information about its concentration, provided that the excess reactant is at least a factor of 10² more concentrated than the limiting one.
Figure 8.
Monte Carlo (MC) simulation (n = 1000) scatter plots of fractional conversion versus primer-to-template concentration ratio (A and D), MgCl2 concentration (B and E), and temperature (C). The temperature was held constant at 62°C for the MC simulations (n = 1000) in D and E
SiMulEq and ThermoPlex simulations, and laboratory validation
The EPD curves generated by the SiMulEq algorithm (Fig. 9) illustrate the predicted product distributions of the whole multiplex assay as functions of temperature. Each plot corresponds to the multiplex assay’s performance on each target species’ gene sequence. Inspection of the resulting EPD curves suggests that the primers for S.lemuru and S.fimbriata are highly specific to their respective target species, because in plots D and E no curves are visible other than those of the products from their respective targets. The opposite holds for plots A to C, where multiple curves are visible aside from each target’s own curve, particularly for S.tawilis (plot B), for which multiple curves are already apparent at temperatures as high as 55°C. This implies that the primer for S.tawilis should demonstrate a certain degree of non-specificity toward S.lemuru and S.fimbriata. Indeed, this is consistent with the experimental results shown in Fig. 11. The gel electrophoresis profile for the assay indicates the same expected non-specificity of the S.tawilis primer for both species, as shown by the two faint gel bands in its lane whose sizes are consistent with those of the S.lemuru and S.fimbriata PCR products. The consistency of the SiMulEq simulations with the experimental results supports the reliability of the algorithm in the thermodynamic analysis of multiplex PCR assays. It is worth noting, however, that this type of theoretical analysis does not directly translate to expected PCR products per se, because the simulation captures only what happens in a single annealing cycle, or at best just the first cycle. Nevertheless, the analysis of the EPD curves still proves insightful since it illustrates the assay’s behavior as hybridization due to non-specific primer binding becomes increasingly significant at lower temperatures.
Figure 9.
Equilibrium Product Distribution (EPD) curves as functions of temperature generated by SiMulEq from the in-house developed multiplex PCR assay primers, simulated per target COI gene sequence. Each color and panel corresponds to the target species the primer is designed for: (A) H. quadrimaculatus, (B) S. tawilis, (C) A. sirm, (D) S. lemuru, and (E) S. fimbriata. Primer sequences and their targets are listed in Table 2
Figure 10.
Output of the ThermoPlex design routine showing (A) EPD curves, (B) ψ curves (log10 y-axis scale) as functions of temperature (black lines correspond to the threshold ψ = 1000 at the simulation temperature), and (C) predicted gel electrophoresis profiles for the multiplex assay
The output of the ThermoPlex design routine is shown in Fig. 10, which includes the multiplex specificity (ψ) curves and the predicted gel electrophoresis profile for the designed assay. Because a scoring function to rank all the designed candidate multiplex primer sets is still under development, one assay out of the three candidate primer sets was randomly selected for laboratory validation. The ψ curves in Fig. 10B show the temperature limits at which each primer in the assay should theoretically retain its specificity above the user-defined threshold (black horizontal line). The curves suggest that at a temperature of 55°C the assay retains specificity, with the least specific primer (S.lemuru) still having ψ values in the order of 10⁴; that is, the primer is expected to hybridize 10⁴ times more to the target than all the other primers in the assay combined. The result of the simulation is supported by the experimental results in Fig. 12. The gel electrophoresis profile of the designed multiplex assay shows no discernible non-specificity, with only one distinct gel band appearing for each target. The experimental results thus show that the set of multiplex primers automatically designed by ThermoPlex performs adequately, supporting the reliability of the primer screening routine that the software implements. Tests involving larger data sets (e.g. longer sequence lengths, more targets), however, should be performed to further assess the automated screening capabilities of ThermoPlex.
Figure 11.
Gel electrophoresis profile of the PCR products for the in-house developed multiplex PCR assay. The characteristic ladder was produced from an equimolar mixture of single-plex amplicons from each species.
Figure 12.
Gel electrophoresis profile of the PCR products of the ThermoPlex-designed multiplex PCR assay. The characteristic ladder was produced from an equimolar mixture of single-plex amplicons from each species.
Conclusion and recommendations
Despite the proven performance of ThermoPlex in automatically designing multiplex PCR primers, one caveat is the limitation in terms of sequence input to the program, specifically:
1. The program strictly requires the sequence input to be aligned and trimmed to equal length.
2. In its current state, it is unable to handle sequence alignments with gaps, that is, gene regions with insertions and deletions.
3. It also fails to pick candidates for sequences with highly dispersed and sparse polymorphic sites, even when genetic distances are fairly high.
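The first two input requirements listed above can be checked before running a design. The sketch below is one possible pre-flight check, written under the assumption that the alignment has already been loaded into a dictionary of name-to-sequence strings and that gaps are encoded as `-`; the function `check_alignment` and the toy sequences are hypothetical and are not part of ThermoPlex.

```python
def check_alignment(seqs: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Requirement 1: all sequences trimmed to the same length.
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        problems.append(f"sequences differ in length: {sorted(lengths)}")

    # Requirement 2: no gap characters (insertions/deletions) in the alignment.
    for name, seq in seqs.items():
        if "-" in seq:
            problems.append(f"{name} contains alignment gaps")

    return problems

# Toy input: one clean pair, then one sequence with a gap and a length mismatch.
clean = {"sp_A": "ACGTACGT", "sp_B": "ACGTTCGT"}
dirty = {"sp_A": "ACGTACGT", "sp_B": "ACG-TCG"}

print(check_alignment(clean))
print(check_alignment(dirty))
```

A check of this kind only reports the problem; trimming the alignment or choosing a gap-free region remains a decision for the user.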
The third limitation was exemplified in our study by S. tawilis and A. sirm, where the pattern of sequence variation hindered the identification of discriminatory primer binding sites. While these species were ultimately excluded from the automated design routine testing, their exclusion underscores the importance of polymorphism clustering, beyond overall sequence divergence, in enabling effective computational primer design. As the number of targets in a multiplex assay increases, the algorithm faces a progressively constrained design space: primers must achieve high specificity across a larger set of competing sequences, which amplifies the requirement for locally concentrated, target-distinctive sequence polymorphisms for each target. In this context, dispersed or weakly informative variation becomes increasingly inadequate. It may be necessary, in such cases, to evaluate alternative gene regions with higher local discriminatory power, while keeping track of the balance between interspecific resolution and universality within the target taxa.
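The distinction between overall divergence and locally clustered polymorphism can be illustrated with a simple count. The sketch below is an assumption-laden illustration and not the ThermoPlex screening algorithm: it treats a site as target-distinctive when the target base differs from every non-target base, and it reports the largest number of such sites inside any window of primer-like width. The function names, the 20-base window, and the toy sequences are all hypothetical.

```python
def distinctive_sites(target: str, non_targets: list[str]) -> list[int]:
    """Positions where the target base differs from every non-target base."""
    return [
        i for i, base in enumerate(target)
        if all(other[i] != base for other in non_targets)
    ]

def max_window_count(sites: list[int], window: int) -> int:
    """Largest number of sites that fit inside any window of the given width."""
    best = 0
    left = 0
    for right in range(len(sites)):
        # Shrink from the left until the span fits within the window.
        while sites[right] - sites[left] >= window:
            left += 1
        best = max(best, right - left + 1)
    return best

def mutate(seq: str, positions: list[int], base: str = "G") -> str:
    """Helper for building toy sequences with substitutions at given positions."""
    chars = list(seq)
    for p in positions:
        chars[p] = base
    return "".join(chars)

reference = "A" * 40                                 # toy non-target sequence
clustered = mutate(reference, [10, 12, 14, 16])      # 4 differences, close together
dispersed = mutate(reference, [2, 13, 25, 37])       # 4 differences, spread out

for name, seq in [("clustered", clustered), ("dispersed", dispersed)]:
    sites = distinctive_sites(seq, [reference])
    print(name, len(sites), max_window_count(sites, window=20))
```

Both toy targets differ from the reference at four sites, so their overall distance is identical, yet only the clustered one places all four differences within a single 20-base window.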
Moreover, the laboratory assessments of ThermoPlex and its subroutines have so far lacked a sufficient quantitative basis, even though the program is motivated by quantitative thermodynamic variables. Its predictions should therefore be tested with quantitative experiments such as qPCR-based methodologies. Such experiments could also be used to empirically fine-tune the models underlying the algorithm's framework, making them more consistent with real-life expectations.
Input parameters were also not optimized, so the space of candidate primer sets was not explored exhaustively in search of more optimal multiplex primers. A ranking or scoring system will also be needed, especially when the program returns more candidate primer sets than a user can practically evaluate. For now, the workaround is to make the parameters (for example, MgCl2 concentration and specificity thresholds) stringent enough to produce a manageable number of candidate sets. These features will be implemented in the future to further optimize the selection process.
In addition, a key limitation of the present ThermoPlex framework is that, while it comprehensively evaluates equilibrium concentrations for primer–target and primer–non-target interactions, it does not yet incorporate explicit penalties for primer–primer hybridizations (primer–dimer formation). This omission is less consequential in low-plex assays (e.g. the four to five targets used in this study), where dimerization is comparatively rare and unlikely to distort amplification outcomes. However, as multiplex complexity increases, the risk of primer–dimer formation scales disproportionately and may significantly influence both reaction efficiency and specificity. Future iterations of ThermoPlex will therefore integrate a scoring module that accounts for thermodynamic stability and predicted equilibrium abundance of dimeric structures. Incorporating such a criterion, validated against experimental datasets, will provide a more rigorous selection framework under high-plex conditions and strengthen the predictive robustness of the platform.
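The scaling argument can be made explicit: with p primers in one tube there are p(p+1)/2 possible primer–primer pairings once self-dimers are included, so the number of interactions to screen grows quadratically with plex level. The sketch below enumerates those pairings and applies a crude 3'-end complementarity check. This is a plain sequence heuristic chosen for illustration, not the thermodynamic scoring module described above, and the function names, the overlap cutoff, and the toy primers are hypothetical.

```python
from itertools import combinations_with_replacement

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def pair_count(n_primers: int) -> int:
    """Number of primer-primer pairings, including self-dimers."""
    return n_primers * (n_primers + 1) // 2

def three_prime_overlap(a: str, b: str, max_len: int = 8) -> int:
    """Longest k such that the last k bases of a pair perfectly with the last k of b."""
    for k in range(min(max_len, len(a), len(b)), 0, -1):
        if a[-k:] == reverse_complement(b[-k:]):
            return k
    return 0

def screen_pairs(primers: dict[str, str], min_overlap: int = 4):
    """Flag every pairing whose 3' ends are complementary over min_overlap bases."""
    flagged = []
    for (na, a), (nb, b) in combinations_with_replacement(primers.items(), 2):
        k = three_prime_overlap(a, b)
        if k >= min_overlap:
            flagged.append((na, nb, k))
    return flagged

# Toy primers (not from the study); P1 and P2 share a palindromic 3' end.
primers = {"P1": "GGGGAATT", "P2": "CCCCAATT", "P3": "ACACACAC"}

print([pair_count(p) for p in (5, 10, 20)])
print(screen_pairs(primers))
```

A thermodynamic treatment would replace the perfect-match test with a free-energy or equilibrium-abundance estimate for each pairing, but the enumeration over all pairings would remain the same.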
In summary, the ThermoPlex program demonstrates great potential to improve the accuracy and speed of designing multiplex PCR primers, given the evidence presented in this study. With the program's automated screening routine, designers can expect to have candidate multiplex PCR primers within minutes, subject to the speed of the computing platform, the number of design targets, the length of the input sequence, and the number of candidate multiplex sets predicted. All of this is achievable on common personal computers without the need for high-performance computing. Moreover, apart from the automated primer screening algorithm, the program also offers useful and insightful tools motivated by DNA thermodynamics via its ThermoDHyb and SiMulEq routines. Depending on the needs of the user, these novel algorithms can be used independently or in concert with other existing programs. Although the program is already useful in its current state, further improvements are planned to enhance its capabilities, such as more robust handling of sequence alignments, the ability to score and rank predicted multiplex sets, and possible expansion to a web server format.
Acknowledgements
The authors would like to acknowledge the contribution of Dr Ma. Rio. Naguit, Dr Asuncion De Guzman, Mr Jerry Garcia, Mr Jhunrey Follante, and Mr John Christopher Azcarraga in the collection and processing of sardine samples. Additionally, the authors would like to express their gratitude to Ms. Dame Loveliness Apaga for her valuable insights into chemical thermodynamics and physical chemistry.
Contributor Information
Altair Agmata, Department of Chemical Engineering, University of the Philippines, Diliman, Quezon City 1101, Philippines; The Marine Science Institute, University of the Philippines, Diliman, Quezon City 1101, Philippines.
Kevin Labrador, The Marine Science Institute, University of the Philippines, Diliman, Quezon City 1101, Philippines.
Joseph Dominic Palermo, The Marine Science Institute, University of the Philippines, Diliman, Quezon City 1101, Philippines; Institute of Environmental Science and Meteorology, University of the Philippines, Diliman, Quezon City 1101, Philippines.
Maria Josefa Pante, The Marine Science Institute, University of the Philippines, Diliman, Quezon City 1101, Philippines.
Supplementary data
Supplementary data is available at Biology Methods and Protocols online.
Conflict of interest statement. None declared.
Funding
The work was made possible through funding support from the Department of Science and Technology—Philippine Council for Agriculture, Aquatic, and Natural Resources Research and Development (DOST-PCAARRD).
Data availability
The standalone app, together with the source code, is available in the GitHub repository (https://github.com/aagmata/ThermoPlex).
References
- 1. Ali ME, Razzak MA, Hamid SBA. Multiplex PCR in species authentication: probability and prospects—a review. Food Anal Methods 2014;7:1933–49. [Google Scholar]
- 2. Aguilera-Arreola MG, Gonzalez-Cardel AM, Tenorio AM et al. Highly specific and efficient primers for in-house multiplex PCR detection of Chlamydia trachomatis, Neisseria gonorrhoeae, Mycoplasma hominis and Ureaplasma urealyticum. BMC Res Notes 2014;7:433. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. Ke L, Wang L, Li H et al. Molecular identification of lactic acid bacteria in Chinese rice wine using species-specific multiplex PCR. Eur Food Res Technol 2014;239:59–65. [Google Scholar]
- 4. Moon JC, Kim JH, Jang CS. Development of multiplex PCR for species-specific identification of the Poaceae family based on chloroplast gene, rpoC2. Appl Biol Chem 2016;59:201–7. [Google Scholar]
- 5. Ali ME, Razzak MA, Hamid SBA et al. Multiplex PCR assay for the detection of five meat species forbidden in Islamic foods. Food Chem 2015;177:214–24. [DOI] [PubMed] [Google Scholar]
- 6. Hou B, Meng X, Zhang L et al. Development of a sensitive and specific multiplex PCR method for the simultaneous detection of chicken, duck and goose DNA in meat products. Meat Sci 2015;101:90–4. [DOI] [PubMed] [Google Scholar]
- 7. Piergiorge RM, Pontes MN, Duarte AVB et al. Haplotype-specific single-locus multiplex PCR assay for molecular identification of sea-bob shrimp, Xiphopenaeus kroyeri (Heller, 1862), cryptic species from the Southwest Atlantic using a DNA pooling strategy for simultaneous identification of multiple s. Biochem Syst Ecol 2014;54:348–53. [Google Scholar]
- 8. You J, Huang L, Zhuang J et al. Species-specific multiplex real-time PCR assay for identification of deer and common domestic animals. Food Sci Biotechnol 2014;23:133–9. [Google Scholar]
- 9. Ravago-Gotanco RG, Manglicmot MT, Pante MJR. Multiplex PCR and RFLP approaches for identification of rabbitfish (Siganus) species using mitochondrial gene regions. Mol Ecol Resour 2010;10:741–3. [DOI] [PubMed] [Google Scholar]
- 10. Ishige T, Murata S, Taniguchi T et al. Highly sensitive detection of SARS-CoV-2 RNA by multiplex rRT-PCR for molecular diagnosis of COVID-19 by clinical laboratories. Clin Chim Acta 2020;507:139–42. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Mancini F, Barbanti F, Scaturro M et al. ; Istituto Superiore di Sanità (ISS) COVID-19 Team. Multiplex real-time reverse-transcription polymerase chain reaction assays for diagnostic testing of severe acute respiratory syndrome coronavirus 2 and seasonal influenza viruses: a challenge of the phase 3 pandemic setting. J Infect Dis 2021;223:765–74. 10.1093/infdis/jiaa658 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Park M, Won J, Choi BY et al. Optimization of primer sets and detection protocols for SARS-CoV-2 of coronavirus disease 2019 (COVID-19) using PCR and real-time PCR. Exp Mol Med 2020;52:963–77. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. Kudo E, Israelow B, Vogels CBF et al. ; Yale IMPACT Research Team. Detection of SARS-CoV-2 RNA by multiplex RT-qPCR. PLOS Biol 2020;18:e3000867. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Bustin S, Huggett J. qPCR primer design revisited. Biomol Detect Quantif 2017;14:19–28. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15. Henegariu O, Heerema NA, Dlouhy SR et al. Multiplex PCR: critical parameters and step-by-step protocol. Biotechniques 1997;23:504–11. [DOI] [PubMed] [Google Scholar]
- 16. Yuryev A. PCR Primer Design. Totowa, NJ: Humana Press, 2007. [Google Scholar]
- 17. Mann T, Humbert R, Dorschner M et al. A thermodynamic approach to PCR primer design. Nucleic Acids Res 2009;37:e95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. SantaLucia J, Hicks D. The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct 2004;33:415–40. [DOI] [PubMed] [Google Scholar]
- 19. SantaLucia J. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA 1998;95:1460–5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Allawi HT, SantaLucia J. Thermodynamics and NMR of internal G.T mismatches in DNA. Biochemistry 1997;36:10581–94. [DOI] [PubMed] [Google Scholar]
- 21. Allawi HT, SantaLucia J. Nearest neighbor thermodynamic parameters for internal G?? A mismatches in DNA. Biochemistry 1998;37:2170–9. [DOI] [PubMed] [Google Scholar]
- 22. Bommarito S, Peyret N, SantaLucia JJ. Thermodynamic parameters for DNA sequences with dangling ends. Nucleic Acids Res 2000;28:1929–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Allawi HT, SantaLucia J. Thermodynamics of internal C?? T mismatches in DNA. Nucleic Acids Res 1998;26:2694–701. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Allawi HT, SantaLucia J. Nearest-neighbor thermodynamics of internal A·C mismatches in DNA: sequence dependence and pH effects. Biochemistry 1998;37:9435–44. [DOI] [PubMed] [Google Scholar]
- 25. Crothers DM, Zimm BH. Theory of the melting transition of synthetic polynucleotides: evaluation of the stacking free energy. J Mol Biol 1964;9:1–9. [DOI] [PubMed] [Google Scholar]
- 26. Kallenback NR, Crothers DM. Theory of thermal transitions in cohered DNA from phage lambda. Proc Natl Acad Sci USA 1966;56:1018–25. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol 1981;147:195–7. [DOI] [PubMed] [Google Scholar]
- 28. Owczarzy R, Moreira BG, You Y et al. Predicting stability of DNA duplexes in solutions containing magnesium and monovalent cations. Biochemistry 2008;47:5336–53. [DOI] [PubMed] [Google Scholar]
- 29. Dimitrov R. A, Zuker M. Prediction of hybridization and melting for double-stranded nucleic acids. Biophys J 2004;87:215–26. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Smith JM, Abbott MM, Van Ness HC. Introduction to Chemical Engineering Thermodynamics, 7th edn. New York, NY, USA: McGraw-Hill, 2005. [Google Scholar]
- 31. Montero-Pau J, Gómez A, Muñoz J. Application of an inexpensive and high-throughput genomic DNA extraction method for the molecular ecology of zooplanktonic diapausing eggs. Limnol Ocean Methods 2008;6:218–22. [Google Scholar]
- 32. McCaskill JS. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 1990;29:1105–19. [DOI] [PubMed] [Google Scholar]
- 33. Dirks RM, Bois JS, Schaeffer JM et al. Thermodynamic analysis of interacting nucleic acid strands. SIAM Rev 2007;49:65–88. [Google Scholar]
- 34. SantaLucia J. Physical Principles and Visual-OMP Software for Optimal PCR Design. Totowa, NJ: Humana Press, 2007, 3–33. [DOI] [PubMed] [Google Scholar]
- 35. Markham NR, Zuker M. DINAMelt web server for nucleic acid melting prediction. Nucleic Acids Res 2005;33:W577–81. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 36. Zadeh JN, Steenberg CD, Bois JS et al. NUPACK: analysis and design of nucleic acid systems. J Comput Chem 2011;32:170–3. [DOI] [PubMed] [Google Scholar]
- 37. Medina RA, Rojas M, Tuin A et al. Development and characterization of a highly specific and sensitive SYBR green reverse transcriptase PCR assay for detection of the 2009 pandemic H1N1 influenza virus on the basis of sequence signatures. J Clin Microbiol 2011;49:335–44. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38. Pierce KE, Khan H, Mistry R et al. Rapid detection of sequence variation in Clostridium difficile genes using LATE-PCR with multiple mismatch-tolerant hybridization probes. J Microbiol Methods 2012;91:269–75. [DOI] [PubMed] [Google Scholar]
- 39. Moser MJ, Christensen DR, Norwood D et al. Multiplexed detection of anthrax-related toxin genes. J Mol Diagn 2006;8:89–96. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40. Ronish B, Hakhverdyan M, Ståhl K et al. Design and verification of a highly reliable linear-after-the-exponential PCR (LATE-PCR) assay for the detection of African swine fever virus. J Virol Methods 2011;172:8–15. [DOI] [PubMed] [Google Scholar]
- 41. Zhang DY, Chen SX, Yin P. Optimizing the specificity of nucleic acid hybridization. Nat Chem 2012;4:208–14. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42. Tan Z-J, Chen S-J. Nucleic acid helix stability: effects of salt concentration, cation valence and size, and chain length. Biophys J 2006;90:1175–90. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43. Cha RS, Thilly WG. Specificity, efficiency, and fidelity of PCR. PCR Methods Appl 1993;3:S18–S29. [DOI] [PubMed] [Google Scholar]