Abstract
Polymerase chain reaction (PCR) is a critical step in amplicon-based microbial community profiling, allowing the selective amplification of marker genes such as 16S rRNA from environmental or host-associated samples. Despite its widespread use, PCR is known to introduce amplification bias, where some DNA sequences are preferentially amplified over others due to factors such as primer-template mismatches, sequence GC content, and secondary structures. Although these biases are known to distort estimated relative abundances, their implications for ecological metrics remain poorly understood. In this study, we conduct a comprehensive evaluation of how PCR bias influences both within-sample (α-diversity) and between-sample (β-diversity) analyses. We show that perturbation-invariant diversity measures remain unaffected by PCR bias, whereas widely used metrics such as Shannon diversity and Weighted-UniFrac are sensitive to it, with the resulting bias varying according to the true community composition. To address this, we provide theoretical and empirical insight into how PCR-induced bias varies across ecological analyses and community structures, and we offer practical guidance on when bias-correction methods should be applied. Our findings highlight the importance of selecting appropriate diversity metrics for PCR-based microbial ecology workflows and offer guidance for improving the reliability of diversity analyses.
1. Introduction
The emergence of next-generation sequencing technologies, along with advances in computational tools, has substantially expanded microbiome research across biomedicine [Tringe and Hugenholtz, 2008]. Amplicon-based microbiome studies typically target hypervariable regions of the 16S rRNA gene, using them as barcodes to quantify the relative abundance of bacterial taxa in mixed communities [Youssef et al., 2009, Caporaso et al., 2011]. To isolate the 16S rRNA gene and generate sufficient material for sequencing, multiple cycles of polymerase chain reaction (PCR) amplification are usually required [Pinto and Bhatt, 2024]. However, amplification efficiency can vary across 16S rRNA templates, leading to systematic over- or under-representation of certain taxa in sequencing libraries [Acinas et al., 2005, Suzuki and Giovannoni, 1996, Polz and Cavanaugh, 1998]. These biases can lead to over 30-fold errors in observed relative abundances, severely distorting estimates of microbial composition [Silverman et al., 2021]. Overall, PCR bias can be a substantial source of error in 16S rRNA studies [Acinas et al., 2005, Eisenstein, 2018, Polz and Cavanaugh, 1998, Suzuki and Giovannoni, 1996, McLaren et al., 2019, Silverman et al., 2021, Korvigo et al., 2022]. Despite efforts to optimize experimental protocols (e.g., Krehenwinkel et al. [2017]), PCR bias remains an outstanding problem.
PCR bias likely originates from multiple distinct processes. Mismatches between primer and template sequences are one source of bias [Parada et al., 2016]. Yet, this primer-mismatch bias is unlikely to persist past the initial cycles of PCR, after which the template's primer-binding sites are replaced by sequences complementary to the primers themselves [Wu et al., 2009]. Other template-level factors, however, can lead to non-primer-mismatch (NPM) bias, which persists through later cycles [Silverman et al., 2021]. For example, GC-rich templates are more stable and can require higher melting temperatures or can form hairpin structures that hamper amplification [Frey et al., 2008]. Mock community studies suggest that NPM bias dominates primer-mismatch bias [Silverman et al., 2021, Korvigo et al., 2022, Gimpel et al., 2024]. Because NPM bias is the dominant and persistent source of PCR distortion, we use the terms “PCR bias” and “NPM bias” interchangeably throughout this work.
Recent studies have shown that PCR bias is highly consistent across PCR cycles and can be accurately modeled by a simple exponential bias model. Let $a_0$ represent the true ratio of two templates before amplification (i.e., at cycle 0), and let $b = e_1/e_2$ denote the ratio of their per-cycle amplification efficiencies, where each $e_i \in [1, 2]$, with 1 indicating no amplification and 2 indicating perfect doubling. After $x$ cycles of PCR, the observed ratio of the two templates becomes $a_x$. These quantities are related through the following model:

$$a_x = a_0\, b^{x}. \qquad (1)$$

This model was originally proposed by Suzuki and Giovannoni [1996], who used it to study PCR reactions with two templates. It was recently generalized by Silverman et al. [2021] and McLaren et al. [2019], who extended it to the case of more than two templates. Silverman et al. [2021] further generalized this model to situations where the ratios (e.g., $a_x$) are not measured directly but observed only through noisy sequencing data. By sequencing 16S rRNA libraries with different numbers of PCR cycles, those authors could directly infer relative PCR efficiencies (e.g., $b$) and the unbiased abundance ratios (e.g., $a_0$). They found that this simple model explained over 95% of variation due to PCR bias in both mock and real 16S rRNA libraries. Those findings have subsequently been validated by multiple groups [Gimpel et al., 2024, Korvigo et al., 2022].
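As a minimal numerical illustration of the two-template exponential model, the Python sketch below (not code from the original study) tracks the observed ratio $a_x = a_0 b^x$ over PCR cycles. The efficiencies $e_1 = 1.95$ and $e_2 = 1.90$ are hypothetical values chosen only for illustration. Taking logarithms shows that $\log a_x = \log a_0 + x \log b$ grows linearly in $x$, which is the property the log-contrast formulation builds on.

```python
import math


def observed_ratio(a0: float, b: float, x: int) -> float:
    """Exponential PCR bias model: ratio after x cycles is a0 * b**x."""
    return a0 * b ** x


# Hypothetical per-cycle efficiencies, each in [1, 2]
e1, e2 = 1.95, 1.90
b = e1 / e2

for x in (0, 10, 20, 35):
    a_x = observed_ratio(1.0, b, x)
    # log(a_x) = log(a0) + x * log(b): linear in the cycle number x
    print(x, round(a_x, 3), round(math.log(a_x), 3))
```

With these made-up values, a per-cycle efficiency difference of only about 2.6% compounds to a more than twofold distortion of a 1:1 ratio after 35 cycles.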
Despite advances in modeling PCR bias, relatively little research has investigated which types of microbiome analyses are most affected. For instance, it remains unclear when researchers should perform calibration experiments, such as sequencing replicates with varying numbers of PCR cycles, to enable application of models like Equation (1). Most existing studies have focused on how PCR bias distorts estimates of relative abundance, leading to systematic over- or under-representation of certain taxa. However, such distortions may not affect all downstream analyses equally. For example, both Silverman et al. [2021] and McLaren et al. [2019] have shown that differential abundance analyses, which aim to estimate changes in relative abundance across conditions (e.g., health vs. disease), can be invariant to PCR bias. In contrast, other commonly used approaches remain under-explored. In particular, ecological diversity measures such as α-diversity (e.g., Shannon index) and β-diversity (e.g., Bray-Curtis dissimilarity) form the basis of many microbiome analyses, yet the extent to which PCR bias affects these metrics has not been systematically studied.
This article presents the first systematic evaluation of how PCR bias affects ecological analyses based on α- and β-diversity metrics. First, we prove that there exists a class of perturbation-invariant diversity measures that are unaffected by PCR bias. Second, we demonstrate that a broader class of perturbation-sensitive metrics, including Shannon diversity and Bray-Curtis dissimilarity, is impacted by PCR bias. Importantly, this bias is not consistent: it depends on the true underlying composition, implying that PCR bias can also distort differential diversity analyses aimed at identifying changes in diversity across groups. To guide future research, we provide intuitive explanations of how bias varies with community composition and which types of microbial communities are most susceptible. Based on our findings, we offer a simple recommendation: researchers who adopt perturbation-invariant metrics need not perform calibration experiments to mitigate PCR bias, whereas those using perturbation-sensitive metrics should strongly consider doing so.
2. Results
2.1. A Statistical Model for PCR Bias
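This article uses the model of Silverman et al. [2021], which extends Equation (1) to account for more than two taxa and for measurement error. Note that Equation (1) can be written as a linear log-contrast model:

$$\psi \log w^{(x)} = \psi \log w^{(0)} + x\, \psi \log e, \qquad \psi = (1, -1),$$

where $w^{(x)} = (w_1^{(x)}, w_2^{(x)})^\top$ and $w^{(0)}$ are the relative abundances of the two templates after $x$ and $0$ cycles, and $e = (e_1, e_2)^\top$ holds their per-cycle efficiencies; $w^{(x)}$, $w^{(0)}$, and $e$ are 2-vectors. Since $\psi \log w^{(x)} = \log a_x$, $\psi \log w^{(0)} = \log a_0$, and $\psi \log e = \log b$, this is simply the logarithm of Equation (1). The vector $\psi$ is an example of a contrast: a vector whose elements sum to 0. To extend this model beyond $D = 2$ taxa we use this linear log-contrast form.

The vector $\psi$ can be thought of as a $1 \times 2$ matrix. In moving to $D > 2$ taxa we now consider a $(D-1) \times D$ contrast matrix $\Psi$ whose rows sum to zero: $\sum_{j=1}^{D} \Psi_{ij} = 0$ for all $i$. Because every row is a contrast, the normalization of $w^{(x)}$ to unit sum cancels, and we obtain a multivariate linear log-contrast model:

$$\Psi \log w^{(x)} = \Psi \log w^{(0)} + x\, \Psi \log e, \qquad (2)$$

where $w^{(x)}$, $w^{(0)}$, and $e$ are all $D$-vectors. For our purposes, we can choose any contrast matrix $\Psi$ with rank $D-1$, e.g., $\Psi = [\,I_{D-1}, -\mathbf{1}_{D-1}\,]$, which corresponds to the contrast matrix of the Additive Log-Ratio (ALR) transform [Pawlowsky-Glahn et al., 2015]. We use the shorthand $\eta^{(x)} = \Psi \log w^{(x)}$, $\alpha = \Psi \log w^{(0)}$, and $\beta = \Psi \log e$, resulting in the linear model $\eta^{(x)} = \alpha + x\beta$.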
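Equation (2) can be checked numerically. The Python sketch below assumes the ALR contrast matrix with the last taxon as reference; the composition `w0` and efficiencies `e` are randomly generated, hypothetical values. Amplification is applied multiplicatively and then renormalized to unit sum, and because every row of $\Psi$ sums to zero the normalizing constant drops out, so $\eta^{(x)} = \alpha + x\beta$ holds exactly.

```python
import numpy as np


def alr_contrast(D: int) -> np.ndarray:
    """(D-1) x D ALR contrast matrix [I, -1], last taxon as reference."""
    return np.hstack([np.eye(D - 1), -np.ones((D - 1, 1))])


D = 4
Psi = alr_contrast(D)

rng = np.random.default_rng(0)
w0 = rng.dirichlet(np.ones(D))       # hypothetical cycle-0 composition
e = rng.uniform(1.6, 2.0, size=D)    # hypothetical per-cycle efficiencies
x = 35

# Multiplicative amplification followed by closure (renormalization)
wx = w0 * e**x
wx /= wx.sum()

alpha = Psi @ np.log(w0)
beta = Psi @ np.log(e)
eta_x = Psi @ np.log(wx)

# The closure constant cancels because each row of Psi sums to zero
print(np.allclose(eta_x, alpha + x * beta))
```

Any other rank-$(D-1)$ contrast matrix would work equally well here; the ALR is used only because it is the example named in the text.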
Sequencing itself is a noisy measurement process that produces a zero-laden $D \times N$ count matrix $Y$, with element $Y_{dn}$ representing the number of sequencing reads mapping to taxon $d$ in sample $n$. As a result, we can neither measure $\eta$ directly nor calculate it directly from $Y$. Instead, Silverman et al. [2021] proposed the following Bayesian hierarchical model, which treats $\eta$ as nuisance parameters to be estimated along with the parameters of interest, $\alpha$ and $\beta$:

$$
\begin{aligned}
Y_{\cdot n} &\sim \text{Multinomial}(w_n), \\
\eta_n &= \Psi \log w_n, \\
\eta_n &\sim N(\alpha + x_n \beta, \Sigma),
\end{aligned}
$$

where $x_n$ is the number of PCR cycles applied to sample $n$, $w_n$ is a $D$-dimensional vector of relative abundances, $\eta_n$ is a $(D-1)$-dimensional vector of log-ratios, and $\Sigma$ is a $(D-1) \times (D-1)$ covariance matrix that is also estimated from the observed data. As in Silverman et al. [2021], we estimate the parameters of interest ($\alpha$ and $\beta$) and the nuisance parameters ($\eta$ and $\Sigma$) using the fido R package [Silverman et al., 2022]. This model has been experimentally validated by multiple groups [Korvigo et al., 2022, Gimpel et al., 2024, Silverman et al., 2021].
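To make the hierarchical model concrete, the following Python sketch simulates counts from its generative form. It is not a substitute for fitting with fido: it only draws data forward from the model, using the ALR as $\Psi$, and every parameter value (`alpha`, `beta`, `Sigma`, the cycle numbers, and a fixed sequencing depth of 10,000 reads per sample) is a hypothetical assumption.

```python
import numpy as np


def alr_inv(eta: np.ndarray) -> np.ndarray:
    """Inverse ALR: map D-1 log-ratios back to a D-part composition."""
    z = np.append(eta, 0.0)
    z = np.exp(z - z.max())  # subtract max for numerical stability
    return z / z.sum()


rng = np.random.default_rng(1)
D, N = 5, 6
alpha = rng.normal(0, 1, size=D - 1)       # hypothetical cycle-0 log-ratios
beta = rng.normal(0, 0.05, size=D - 1)     # hypothetical per-cycle bias
Sigma = 0.1 * np.eye(D - 1)                # hypothetical residual covariance
cycles = np.array([0, 0, 20, 20, 35, 35])  # PCR cycles x_n per sample
depth = 10_000                             # assumed reads per sample

Y = np.zeros((D, N), dtype=int)
for n, x_n in enumerate(cycles):
    eta_n = rng.multivariate_normal(alpha + x_n * beta, Sigma)  # latent log-ratios
    w_n = alr_inv(eta_n)                                         # relative abundances
    Y[:, n] = rng.multinomial(depth, w_n)                        # observed counts
print(Y)
```

Fitting runs this logic in reverse: given `Y` and `cycles`, the posterior over `alpha` and `beta` is what fido estimates.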
When applied to datasets that include calibration samples (i.e., samples sequenced after undergoing different numbers of PCR cycles), this model enables inference of the taxon-specific relative amplification efficiencies $\beta$ and the underlying unbiased relative abundances $\alpha$, with quantified uncertainty. In practice, we often include additional covariates (beyond cycle number, $x_n$) to account for batch effects. By including additional covariates, this model can also be applied to samples from distinct microbial communities, each with its own unbiased relative abundance vector $\alpha_k$ (see Methods 4.1).
To illustrate the model, we reproduce the analysis of Silverman et al. [2021] to estimate how PCR bias alters relative abundance estimates in both an in vitro and an ex vivo human gut microbiome study. Supplemental Figure S1, which parallels figures from the original study, visually demonstrates the extent of this distortion. Briefly, some taxa, such as Holdemania, are consistently underrepresented by a factor of approximately 32, whereas others, such as Bacteroides, are over-represented by a factor of about 4. Overall, many taxa show clear evidence of PCR bias, with 95% credible intervals for their amplification efficiencies excluding the null (no bias).
2.2. Sensitivity and Invariance of Estimands to PCR Bias
Most prior work on PCR bias has focused on its effect on relative abundance estimates (e.g., Silverman et al. [2021]). However, relative abundances are often not the ultimate quantity of interest; instead, they serve as intermediate inputs for downstream ecological or statistical estimands. Here, we show that such estimands can be broadly classified into two categories: perturbation-invariant estimands, which remain unaffected by PCR bias, and perturbation-sensitive estimands, whose values can change substantially under PCR bias. This distinction provides a principled framework for understanding when PCR bias can be safely ignored and when it must be explicitly accounted for.
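Let $w_{dn}$ denote the estimated relative abundance of taxon $d$ in the $n$-th sample. A common estimand in differential abundance analyses is the relative log-fold-change: the difference in mean log-ratio abundance of two taxa $i$ and $j$ between two biological conditions (e.g., health versus disease), represented by sample sets $G_1$ and $G_2$ of sizes $N_1$ and $N_2$:

$$\theta_{ij} = \frac{1}{N_1} \sum_{n \in G_1} \log \frac{w_{in}}{w_{jn}} - \frac{1}{N_2} \sum_{n \in G_2} \log \frac{w_{in}}{w_{jn}}.$$

In Corollary 1, we show that $\theta_{ij}$ is an example of a broader class of estimands that are perturbation invariant. Formally, an estimand $f(w_1, \dots, w_N)$ of the sample compositions is perturbation invariant if it satisfies

$$f(w_1 \oplus v, \dots, w_N \oplus v) = f(w_1, \dots, w_N)$$

for any $D$-dimensional vector $v$ with positive entries, where $u \oplus v = (u_1 v_1, \dots, u_D v_D) / \sum_{d} u_d v_d$ denotes compositional perturbation.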
From the perspective of PCR bias, perturbation-invariant estimands are special because they are also invariant to PCR bias. We prove this formally in Theorem 1. This theorem and corollary formalize prior work which hypothesized that relative log-fold-changes were invariant to PCR bias [McLaren et al., 2022, Silverman et al., 2021]. Intuitively, PCR bias, as modeled in Equation (2), acts as a perturbation: for samples amplified with the same number of cycles $x$, every sample's log-ratio coordinates are shifted by the same term $x\beta$. This shift is analogous to adding a global nuisance term, which affects all samples identically and therefore cancels out in any estimand that depends only on differences between samples in log-ratio space, rather than on their absolute log-ratio coordinates. Thus, perturbation-invariant estimands are unaffected by PCR bias. In the following section, we turn to perturbation-sensitive estimands, which do not share this invariance and can be strongly influenced by PCR bias.
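The invariance argument can be checked numerically. In this Python sketch, two hypothetical groups of compositions are perturbed by the same vector $v = e^{35}$ (elementwise), mimicking 35 PCR cycles with made-up efficiencies. The relative log-fold-change, i.e., the difference in group means of $\log(w_i/w_j)$, is unchanged, whereas the mean log-ratio of a single group, which is not a between-sample difference, shifts by $35 \log(e_i/e_j)$.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 6
i, j = 0, 1  # taxa forming the log-ratio

# Hypothetical compositions for two groups (rows are samples)
health = rng.dirichlet(np.ones(D), size=8)
disease = rng.dirichlet(np.ones(D), size=7)


def log_ratio(W: np.ndarray) -> np.ndarray:
    """Per-sample log(w_i / w_j)."""
    return np.log(W[:, i] / W[:, j])


def theta(g1: np.ndarray, g2: np.ndarray) -> float:
    """Relative log-fold-change: difference in group means of log(w_i / w_j)."""
    return log_ratio(g1).mean() - log_ratio(g2).mean()


def perturb(W: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Compositional perturbation W (+) v, applied identically to every sample."""
    P = W * v
    return P / P.sum(axis=1, keepdims=True)


e = rng.uniform(1.5, 2.0, size=D)  # hypothetical per-cycle efficiencies
v = e**35

theta_0 = theta(health, disease)
theta_35 = theta(perturb(health, v), perturb(disease, v))
mean_0 = log_ratio(health).mean()                 # single-group mean: not invariant
mean_35 = log_ratio(perturb(health, v)).mean()
print(theta_0, theta_35, mean_0, mean_35)
```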
2.3. Common Ecological α-Diversity Analyses are Impacted by PCR Bias
The use of α-diversity metrics in microbiome research has become ubiquitous with the rise of next-generation sequencing (NGS). These metrics provide a quantitative description of within-sample diversity, capturing aspects of richness and evenness, and are central to ecological and microbiological studies. Reliable and reproducible estimates are particularly important because microbial community structure can strongly influence host physiology and ecosystem stability [Cassol et al., 2025]. However, unlike relative log-fold-changes, we find that α-diversity estimands are perturbation sensitive and therefore susceptible to PCR bias.
We evaluated four commonly used α-diversity metrics derived from relative abundances: Shannon’s index, Simpson’s index, the Gini coefficient, and the Aitchison norm [Willis and Martin, 2020, Lian et al., 2024, Egozcue and Pawlowsky-Glahn, 2019]. Using the Bayesian multinomial logistic-normal model described in Section 2.1 and data from Silverman et al. [2021], we estimated posterior distributions of these metrics for the true (pre-amplification) compositions and for compositions after 35 PCR cycles (Figure 1). The dataset comprised ten mock communities with controlled proportions of ten bacterial isolates and four human gut microbiota samples from distinct artificial gut systems. Formally, let $f$ denote an α-diversity metric on the $D$-part simplex. We define PCR-induced bias as:

$$\text{Bias}_f = f\big(\Psi^{-1}(\alpha + 35\beta)\big) - f\big(\Psi^{-1}(\alpha)\big),$$

where $\alpha$ represents the estimated log-ratio composition before amplification (0th cycle), $\beta$ the per-cycle PCR bias, and $\Psi^{-1}$ the map from log-ratio coordinates back to the simplex (e.g., the inverse ALR transform).
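A minimal Python sketch of this bias calculation follows, under stated assumptions: the inverse ALR plays the role of $\Psi^{-1}$, Simpson's index is taken in its Gini-Simpson form $1 - \sum_d p_d^2$ (other conventions exist, and the paper's custom R functions may differ), and `alpha` and `beta` are hypothetical random values standing in for posterior draws. The bias is the metric at 35 cycles minus its value at cycle 0.

```python
import numpy as np


def alr_inv(eta):
    """Inverse ALR (last taxon as reference)."""
    z = np.append(eta, 0.0)
    z = np.exp(z - z.max())
    return z / z.sum()


def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def simpson(p):
    # Gini-Simpson form (assumed convention)
    return 1.0 - np.sum(p**2)


def gini(p):
    s = np.sort(p)
    n = s.size
    idx = np.arange(1, n + 1)
    return np.sum((2 * idx - n - 1) * s) / (n * s.sum())


def aitchison_norm(p):
    clr = np.log(p) - np.log(p).mean()  # centered log-ratio
    return np.linalg.norm(clr)


def pcr_bias(f, alpha, beta, x=35):
    """f(composition after x cycles) - f(composition at cycle 0)."""
    return f(alr_inv(alpha + x * beta)) - f(alr_inv(alpha))


rng = np.random.default_rng(3)
alpha = rng.normal(0, 1, size=9)    # hypothetical 10-taxon community
beta = rng.normal(0, 0.05, size=9)  # hypothetical per-cycle bias
for f in (shannon, simpson, gini, aitchison_norm):
    print(f.__name__, round(pcr_bias(f, alpha, beta), 3))
```

Applying `pcr_bias` to each posterior draw of $(\alpha, \beta)$, rather than to a single point estimate, would yield a posterior distribution of the bias.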
Figure 1: α-diversity metrics are impacted by PCR bias.
Violin plots show posterior distributions of four α-diversity metrics (Shannon, Simpson, Gini, and Aitchison) for in vitro and ex vivo data, estimated before amplification (0th cycle) and after 35 PCR cycles using the model of Silverman et al. [2021]. Density plots to the right show the posterior distributions of the PCR-induced bias (35-cycle value minus 0-cycle value) for two representative samples. PCR introduces substantial, sample-specific shifts, indicating that comparisons of α-diversity across groups may reflect technical artifacts rather than true biological differences.
All four metrics are clearly impacted by PCR bias, with values after 35 PCR cycles substantially differing from their true pre-amplification (0-cycle) values (Figure 1). The magnitude and direction of this bias vary across communities, reflecting its dependence on the underlying composition. Consequently, PCR bias can distort not only absolute estimates of diversity but also comparisons across experimental conditions, which we refer to as differential α-diversity analyses, such as studies testing whether α-diversity differs between cases and controls (e.g., Van Syoc et al. [2022]).
To assess how strongly PCR bias could affect differential α-diversity analyses, we used mock community data from Silverman et al. [2021]. Because the original study did not include natural experimental groupings, we applied an optimization procedure to construct groupings that maximized the change in statistical signal across PCR cycles. We quantified this change as $\Delta = |S_{35} - S_0|$, where $S_0$ and $S_{35}$ are the ANOVA values at 0 and 35 PCR cycles, respectively. All four metrics showed substantial shifts: Simpson’s index had a $\Delta$ of 0.37, the Shannon index and Aitchison norm each reached 0.48, and the Gini coefficient exhibited the largest shift at 0.54. These results demonstrate that PCR bias could produce large changes in apparent differences between groups, underscoring the need for caution when interpreting differential α-diversity analyses.
To illustrate how PCR bias in α-diversity depends on community composition, we used a simplified three-taxon system for visualization (Figure 2 and Supplementary Figure S2). This hypothetical example is not drawn from the mock data but instead illustrates how the magnitude and direction of bias depend on a community’s position in the compositional simplex. Bias is minimal near the edges and vertices, where one or a few taxa dominate, but increases toward the interior, where communities are more even. Intuitively, highly uneven communities are less affected because small changes in relative abundance have little impact on diversity measures: Shannon and Simpson indices are already low in skewed communities, the Gini coefficient is near its upper bound, and the Aitchison norm changes little because extreme compositions occupy fixed positions in log-ratio space. In contrast, even communities are more susceptible: small perturbations can substantially alter the relative balance of taxa, changing Shannon and Simpson diversity, increasing the Gini coefficient, and pushing the Aitchison norm toward more extreme log-ratio coordinates. These visualizations provide intuition for why some communities in the mock and gut data exhibited large shifts in differential α-diversity analyses, whereas others were less affected.
Figure 2: Bias in Shannon index varies across the compositional space.
Simplex plots illustrate how PCR amplification alters Shannon diversity under two different bias scenarios. Each point represents an initial (pre-amplification) three-taxon composition, and its color indicates the change in Shannon diversity after 35 PCR cycles. The orange arrow shows the direction and magnitude of the PCR bias vector in composition space, and the green dot marks the unbiased origin. Even small differences in amplification efficiency produce highly non-uniform distortions, with the magnitude of bias depending strongly on the initial community composition.
2.4. Ecological β-Diversity Analyses are Also Impacted by PCR Bias
β-diversity metrics quantify differences in community composition between samples and are widely used in microbiome research, particularly when α-diversities alone cannot distinguish communities with similar richness and evenness but different taxonomic structures. Among commonly used indices, Weighted-UniFrac incorporates phylogenetic relationships by weighting abundance differences by branch length, whereas Bray-Curtis measures compositional dissimilarity based on relative abundances [Whittaker, 1960, Lozupone et al., 2007]. Because both metrics are perturbation sensitive, they may be affected by PCR bias. Formally, for a β-diversity metric $g$, we define bias as:

$$\text{Bias}_g = g\big(\Psi^{-1}(\alpha_A + 35\beta),\, \Psi^{-1}(\alpha_B + 35\beta)\big) - g\big(\Psi^{-1}(\alpha_A),\, \Psi^{-1}(\alpha_B)\big),$$

where $\alpha_A$ and $\alpha_B$ are the log-ratio compositions of the two communities before amplification.
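The pairwise bias can be sketched in Python for Bray-Curtis, which needs only relative abundances; Weighted-UniFrac is omitted because it additionally requires a phylogenetic tree. As before, the inverse ALR stands in for $\Psi^{-1}$, and `alpha_a`, `alpha_b`, and `beta` are hypothetical values, not estimates from the study.

```python
import numpy as np


def alr_inv(eta):
    """Inverse ALR (last taxon as reference)."""
    z = np.append(eta, 0.0)
    z = np.exp(z - z.max())
    return z / z.sum()


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return np.abs(p - q).sum() / (p + q).sum()


def pairwise_bias(g, alpha_a, alpha_b, beta, x=35):
    """g after x cycles minus g at cycle 0; both communities share the same beta."""
    before = g(alr_inv(alpha_a), alr_inv(alpha_b))
    after = g(alr_inv(alpha_a + x * beta), alr_inv(alpha_b + x * beta))
    return after - before


rng = np.random.default_rng(4)
alpha_a = rng.normal(0, 1, size=9)  # hypothetical community A (10 taxa)
alpha_b = rng.normal(0, 1, size=9)  # hypothetical community B
beta = rng.normal(0, 0.05, size=9)  # hypothetical per-cycle bias
print(round(pairwise_bias(bray_curtis, alpha_a, alpha_b, beta), 3))
```

Because the same `beta` is added to both communities, any nonzero result reflects the metric's sensitivity to a shared perturbation rather than a difference in how the two samples were amplified.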
Using the Bayesian multinomial logistic-normal model from Silverman et al. [2021] and the same human gut data described in Section 2.3, we estimated pairwise Bray-Curtis and Weighted-UniFrac distances before amplification (0 cycles) and after 35 PCR cycles (Figure 3). PCR introduced substantial, sample-specific shifts in both metrics, with the magnitude of bias varying considerably across sample pairs.
Figure 3: PCR bias alters pairwise β-diversity estimates.
Heatmaps show the change in Weighted-UniFrac (A) and Bray-Curtis (B) distances between pairs of gut microbiome samples after 35 PCR cycles compared to the pre-amplification (0-cycle) values. Each tile shows the posterior median bias, with 95% credible intervals in parentheses. Variation across pairs indicates that PCR bias introduces systematic, sample-specific distortions in inter-sample dissimilarities.
To evaluate how strongly PCR bias could affect downstream inference, we used the mock community data to test a worst-case scenario for differential β-diversity analyses. Because the original study lacked natural experimental groupings, we applied an optimization procedure to construct groupings that maximized the change in PERMANOVA across PCR cycles. We defined this change as $\Delta = |S_{35} - S_0|$, where $S_0$ and $S_{35}$ are the PERMANOVA values at 0 and 35 cycles, respectively. The optimized configuration yielded modest but non-negligible shifts for both Bray-Curtis and Weighted-UniFrac ($\Delta = 0.12$ for Weighted-UniFrac). These results indicate that PCR bias can plausibly distort community-level comparisons.
Finally, to build intuition for why some community pairs are more affected than others, we visualized PCR-induced bias using a simplified three-taxon system (Figure 4 and Supplementary Figure S4). This hypothetical example, included for visualization only, shows how bias depends on the positions of the two communities in the compositional simplex. Bias is minimal when the PCR-induced compositional shift is approximately orthogonal to the differences between the two communities, producing a characteristic X-shaped region of low bias in the simplex. Conversely, when PCR bias aligns with the key taxonomic or phylogenetic differences between communities, even small amplification shifts can substantially inflate or deflate β-diversity estimates. This provides intuition for the heterogeneity in bias observed in the mock and gut data.
Figure 4: Bias in Weighted-UniFrac depends on community composition.
Simplex plots show how Weighted-UniFrac bias varies as the second community moves across the compositional simplex relative to a fixed reference community, with PCR bias held constant. Color indicates the change in Weighted-UniFrac (35-cycle minus 0-cycle value). The orange arrow shows the direction of PCR bias, the green dot marks the unbiased origin, and the grey circle marks the fixed reference. Regions of minimal bias form a characteristic X-shaped band, illustrating that bias is lowest when PCR-induced shifts are orthogonal to the differences between communities.
2.5. Aitchison Distance Remains Invariant to PCR Bias
Unlike the other β-diversity metrics considered here, the Aitchison distance is unaffected by PCR bias because it is perturbation invariant, meaning it is unchanged by the additive shifts in log-ratio space introduced during amplification. For two communities $w_A$ and $w_B$, the Aitchison distance is defined as:

$$d_A(w_A, w_B) = \big\lVert \operatorname{clr}(w_A) - \operatorname{clr}(w_B) \big\rVert_2,$$

where $\operatorname{clr}(w_n)$ is the centered log-ratio (CLR) transformed relative abundance vector for sample $n$. Intuitively, this distance measures how far apart two communities are in log-ratio space: a value of zero indicates identical relative compositions, whereas larger values reflect greater differences in the compositional structure of the communities. Because the distance is Euclidean, squared differences accumulate additively across CLR coordinates, making it straightforward to interpret which log-ratios contribute most to the separation.
As we prove in Corollary 2, the additive perturbations introduced by PCR amplification cancel in this log-ratio representation, leaving the distance unchanged (provided both samples undergo the same number of PCR cycles). This invariance has important practical implications. Unlike Bray-Curtis or Weighted-UniFrac, which we show to be sensitive to PCR bias, the Aitchison distance provides a robust framework for β-diversity analysis. Because it remains stable even when amplification bias is present, the Aitchison distance helps ensure that observed differences between communities reflect genuine ecological or experimental variation rather than PCR-induced artifacts.
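The contrast between the Aitchison distance and the Aitchison norm can be seen in a few lines of Python. Two hypothetical compositions are perturbed by the same made-up amplification vector $v = e^{35}$; since $\operatorname{clr}(w \oplus v) = \operatorname{clr}(w) + \operatorname{clr}(v)$, the shared shift cancels in the distance but not in the single-sample norm.

```python
import numpy as np


def clr(p):
    """Centered log-ratio transform."""
    lp = np.log(p)
    return lp - lp.mean()


def aitchison_distance(p, q):
    return np.linalg.norm(clr(p) - clr(q))


def perturb(p, v):
    """Compositional perturbation p (+) v."""
    r = p * v
    return r / r.sum()


rng = np.random.default_rng(5)
D = 6
p, q = rng.dirichlet(np.ones(D), size=2)        # hypothetical communities
v = rng.uniform(1.5, 2.0, size=D) ** 35          # hypothetical 35-cycle amplification

d0, d35 = aitchison_distance(p, q), aitchison_distance(perturb(p, v), perturb(q, v))
n0, n35 = np.linalg.norm(clr(p)), np.linalg.norm(clr(perturb(p, v)))
print(d0, d35)  # distance: invariant
print(n0, n35)  # norm of a single sample: sensitive
```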
3. Discussion
While PCR is a routine step in microbiome profiling, differences in taxon-specific amplification efficiencies can introduce substantial bias in estimated community compositions. Primer mismatch effects occur during the initial amplification cycles, but multiple studies have shown that non-primer-mismatch (NPM) sources of bias dominate PCR bias overall [Gimpel et al., 2024, Korvigo et al., 2022, Silverman et al., 2021]. Here we extended prior studies of NPM sources of PCR bias (called PCR bias for brevity) to systematically evaluate how these biases affect ecological diversity analyses.
Our results show that PCR bias alters both α- and β-diversity estimates, with the magnitude and direction of distortion depending strongly on the underlying community composition. For α-diversity, bias is greatest in communities with even taxonomic structures, where small shifts in relative abundances substantially change diversity metrics, whereas highly skewed communities show minimal distortion. For β-diversity, bias is lowest when the PCR-induced perturbation is largely orthogonal to the differences between communities, but can be substantial when it aligns with the primary axes of taxonomic or phylogenetic variation. These composition-dependent effects mean that PCR bias can obscure or exaggerate apparent ecological differences between groups. Using ANOVA for α-diversities and PERMANOVA for β-diversities, we further demonstrate that bias can amplify or suppress group-level signals, potentially leading to misinterpretation in ordination, clustering, or differential diversity analyses.
On the other hand, when all samples are amplified under the same PCR protocol, the multiplicative model of PCR bias implies that taxon-specific efficiencies are applied uniformly, producing a constant additive shift in log-ratio space. As we prove formally, perturbation-invariant estimands, such as differential log-ratios and the Aitchison distance, are unaffected by such shifts, making them well-suited for downstream analyses when consistent amplification protocols are used.
Given these findings, we recommend using compositional metrics based on log-ratio transformations, such as the Aitchison distance, whenever possible for analyses of PCR-amplified microbiome data. These methods are perturbation invariant and therefore unaffected by PCR bias. However, the choice of metric should ultimately be guided by the scientific question, and some analyses may require traditional diversity measures (e.g., Shannon, Simpson, Gini, Bray-Curtis, Weighted UniFrac). Because these metrics are sensitive to PCR bias and the magnitude of bias depends on community composition, they should be interpreted with caution—both in terms of their absolute values and when used for between-sample comparisons. When such traditional metrics are necessary, we recommend performing calibration experiments (e.g., sequencing the same community with different numbers of PCR cycles) to estimate community-specific amplification biases and adjust the diversity measures following techniques introduced in this article and in Silverman et al. [2021].
Our analysis has focused on continuous functions of relative abundances and has not considered presence/absence-based metrics (e.g., Jaccard, unweighted UniFrac), which are more sensitive to detection thresholds and sequencing noise. Although PCR bias can theoretically push rare taxa below detection, modeling such effects requires explicitly accounting for detection variability and is beyond the scope of this study. Future work should assess how PCR bias interacts with presence/absence metrics and evaluate its impact across a broader range of experimental settings. Independent validation data, such as technical replicates or mock communities, will be essential for determining the operational relevance of these findings in real-world microbiome studies. As more datasets become available, our understanding of the ecological consequences of PCR bias will continue to improve.
4. Methods
4.1. PCR Bias Model with Covariates
We modeled PCR bias using the Bayesian multinomial logistic-normal framework introduced by Silverman et al. [2021], which extends the standard exponential amplification model to accommodate multiple taxa, technical noise, and structured sample covariates. The model is specified as:

$$
\begin{aligned}
Y_n &\sim \text{Multinomial}(w_n), \\
\eta_n &= \Psi \log w_n, \\
\eta_n &\sim N(B X_n, \Sigma),
\end{aligned}
$$

where $Y_n$ is the vector of sequencing read counts for sample $n$, $w_n$ is the corresponding vector of relative abundances, and $\eta_n$ denotes the log-ratio transformed abundance. The matrix $B$ contains regression coefficients, $\Sigma$ is the residual covariance, and $X_n$ is a covariate vector specifying sample-level features.
This formulation allows for flexible modeling of complex experimental designs. For example, to jointly model samples from two biological communities with varying numbers of PCR cycles, we can construct a design matrix $X$, where each column $X_n$ includes a one-hot encoding of community membership and a numeric variable for PCR cycle count:

$$X_n = \begin{pmatrix} c_{1n} \\ c_{2n} \\ x_n \end{pmatrix},$$

with $c_{kn} = 1$ if sample $n$ belongs to community $k$ (and zero otherwise), and $x_n$ denoting the number of PCR cycles applied. In this case, the coefficient matrix takes the form:

$$B = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \beta_1 \\ \vdots & \vdots & \vdots \\ \alpha_{(D-1)1} & \alpha_{(D-1)2} & \beta_{D-1} \end{pmatrix},$$

where $\alpha_{jk}$ is the baseline (cycle-0) log-ratio abundance of contrast $j$ in community $k$, and $\beta_j$ represents the per-cycle PCR amplification bias on that log-ratio. For a sample from community $k$, $B X_n = \alpha_k + x_n \beta$, where $\alpha_k$ is the $k$-th column of $B$. This structure enables simultaneous inference of both biological variation and amplification bias, even when only some samples vary in PCR cycle number.
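The design described above can be sketched in Python to confirm that $B X_n$ reduces to "community baseline plus cycles times bias". The community labels, cycle numbers, and coefficient values below are hypothetical; the actual fitting in this study used fido in R, which accepts a design matrix of this shape.

```python
import numpy as np

rng = np.random.default_rng(6)
D = 4                                          # taxa, so D - 1 = 3 log-ratios
community = np.array([0, 0, 0, 1, 1, 1])       # hypothetical community of each sample
cycles = np.array([15, 25, 35, 15, 25, 35])    # hypothetical PCR cycles x_n
N = community.size

# Design matrix X (3 x N): one-hot community indicators, then cycle count
X = np.vstack([community == 0, community == 1, cycles]).astype(float)

alpha1 = rng.normal(size=D - 1)            # cycle-0 log-ratios, community 1
alpha2 = rng.normal(size=D - 1)            # cycle-0 log-ratios, community 2
beta = rng.normal(0, 0.05, size=D - 1)     # per-cycle bias shared by both communities
B = np.column_stack([alpha1, alpha2, beta])  # (D - 1) x 3 coefficient matrix

mean_eta = B @ X  # (D - 1) x N matrix of expected log-ratios
print(mean_eta.round(3))
```

Sharing a single bias column across communities encodes the assumption that amplification efficiencies depend on taxa, not on which community a sample came from.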
All model fitting was performed in R using the fido package [Silverman et al., 2022].
4.2. Datasets and Preprocessing
We analyzed two publicly available datasets originally described by Silverman et al. [2021]: ten in vitro mock communities composed of known proportions of ten bacterial isolates and four ex vivo human gut microbiota samples derived from distinct artificial gut systems. Each sample was split into three replicates, with each replicate subjected to a different number of PCR cycles prior to sequencing, enabling estimation of all model parameters. All analyses were conducted using the same preprocessed datasets and parameter settings as in the original study. We re-ran the publicly available fido pipeline without modification to estimate posterior distributions of $\alpha_k$ (true relative abundances, in log-ratio coordinates, for each community at cycle 0) and $\beta$ (taxon-specific relative amplification efficiencies).
4.3. Diversity Metrics and Bias Estimation
We evaluated the effect of PCR bias on commonly used ecological diversity measures. Posterior samples from the fitted model were used to estimate taxon relative abundances at 0 and 35 PCR cycles, which served as inputs for downstream calculations.
α-diversity.
We analyzed four α-diversity metrics: Shannon index, Simpson index, Gini coefficient, and the Aitchison norm. Shannon, Simpson, and Gini indices were computed using custom R functions. The Aitchison norm was calculated as the Euclidean norm of centered log-ratio (CLR) transformed abundance vectors. PCR-induced bias for each metric $f$ was defined as:

$$\text{Bias}_f = f\big(w^{(35)}\big) - f\big(w^{(0)}\big),$$

where $w^{(0)}$ and $w^{(35)}$ are the estimated relative abundances at 0 and 35 PCR cycles.
β-diversity.
We evaluated three β-diversity metrics: Bray-Curtis dissimilarity, Weighted UniFrac, and Aitchison distance. Bray-Curtis dissimilarities were calculated using the vegdist() function in the vegan package. Weighted UniFrac distances were computed using the UniFrac() function in the phyloseq package with a pruned Greengenes2 phylogenetic tree. Aitchison distances were calculated as Euclidean distances in centered log-ratio (CLR) space. Bias was defined analogously to α-diversity metrics:

$$\text{Bias}_g = g\big(w_A^{(35)}, w_B^{(35)}\big) - g\big(w_A^{(0)}, w_B^{(0)}\big).$$
4.4. Statistical Analyses
Alpha diversity.
To quantify the effect of PCR bias on group-level diversity comparisons, we computed mean diversity values per sample at both cycle 0 and cycle 35 and evaluated group differences using ANOVA. A genetic algorithm was applied to identify binary groupings $G$ of samples that maximized the absolute change in the ANOVA value $S$ across PCR cycles:

$$\Delta^{*} = \max_{G} \big| S_{35}(G) - S_0(G) \big|.$$
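For a small number of samples, the worst-case grouping can also be found by exhaustive search, which the Python sketch below uses as a simple stand-in for the genetic algorithm. Two assumptions are labeled here: the ANOVA summary $S$ is taken to be the one-way ANOVA $R^2$ (between-group over total sum of squares), and groups must contain at least two samples. The per-sample diversity values are hypothetical.

```python
import itertools

import numpy as np


def anova_r2(values, groups):
    """One-way ANOVA R^2: between-group SS over total SS (assumed summary S)."""
    grand = values.mean()
    ss_tot = np.sum((values - grand) ** 2)
    ss_between = sum(
        values[groups == g].size * (values[groups == g].mean() - grand) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_tot


def max_anova_shift(div0, div35, min_size=2):
    """Exhaustive search over binary groupings for the largest |S_35 - S_0|."""
    n = div0.size
    best_delta, best_groups = -1.0, None
    # Fix sample 0 in group 0 so each split is visited only once
    for bits in itertools.product([0, 1], repeat=n - 1):
        groups = np.array((0,) + bits)
        if min(np.sum(groups == 0), np.sum(groups == 1)) < min_size:
            continue
        delta = abs(anova_r2(div35, groups) - anova_r2(div0, groups))
        if delta > best_delta:
            best_delta, best_groups = delta, groups
    return best_delta, best_groups


rng = np.random.default_rng(7)
div0 = rng.normal(2.0, 0.3, size=10)          # hypothetical diversity at cycle 0
div35 = div0 + rng.normal(0, 0.2, size=10)    # hypothetical diversity at cycle 35
print(max_anova_shift(div0, div35))
```

With ten samples there are only 511 distinct splits, so exhaustive search is cheap; a stochastic optimizer becomes useful as the number of samples grows.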
Beta diversity.
To assess changes in community structure, we performed PERMANOVA (adonis2() in vegan) on each β-diversity distance matrix. Particle swarm optimization (psoptim() in R) was used to find the grouping of samples that maximized changes in PERMANOVA across PCR cycles, providing a worst-case estimate of how much amplification could distort community-level comparisons.
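The quantity being optimized can be illustrated with a short numpy implementation of the one-way PERMANOVA partition (pseudo-F and $R^2$ computed from a distance matrix, following Anderson's sums-of-squares formulation). This is a sketch rather than a reimplementation of vegan's adonis2: it omits permutation p-values and adonis2's options, and treating $R^2$ as the PERMANOVA value that is compared across cycles is an assumption.

```python
import numpy as np


def permanova_stats(dist, groups):
    """One-way PERMANOVA pseudo-F and R^2 from a square distance matrix."""
    D2 = np.asarray(dist, dtype=float) ** 2
    N = D2.shape[0]
    labels = np.unique(groups)
    a = labels.size
    iu = np.triu_indices(N, k=1)
    ss_total = D2[iu].sum() / N
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = D2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(idx.size, k=1)].sum() / idx.size
    ss_among = ss_total - ss_within
    pseudo_f = (ss_among / (a - 1)) / (ss_within / (N - a))
    return pseudo_f, ss_among / ss_total


# With Euclidean distances on 1-D points, PERMANOVA reduces to classical ANOVA
x = np.array([0.0, 1.0, 3.0, 4.0])
dist = np.abs(x[:, None] - x[None, :])
print(permanova_stats(dist, np.array([0, 0, 1, 1])))
```

Evaluating this function on the 0-cycle and 35-cycle distance matrices for a candidate grouping gives the two values whose difference an optimizer such as psoptim() would try to maximize.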
4.5. Visualization of Composition-Dependent Bias
To build intuition for composition-dependent effects, we simulated PCR bias in a simplified three-taxon system. Bias was evaluated across a grid of compositions in the 3-part simplex, holding the amplification bias vector constant. This visualization highlights how bias depends on the position of communities in the simplex and explains variability observed in the mock community analyses.
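A minimal Python version of this simulation is sketched below for the Shannon index. The per-cycle efficiencies are hypothetical, the grid uses interior points with spacing 1/20 (so no taxon has zero abundance), and amplification is modeled as elementwise multiplication by $e^{35}$ followed by renormalization, consistent with the exponential model. The resulting `grid` and `bias` arrays could then be drawn on a ternary plot.

```python
import numpy as np


def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))


def amplify(p, e, x=35):
    """Apply x cycles of multiplicative amplification, then renormalize."""
    q = p * e**x
    return q / q.sum()


e = np.array([1.90, 1.85, 1.80])  # hypothetical per-cycle efficiencies
n = 20                            # grid resolution: compositions (i, j, k) / n

grid, bias = [], []
for i in range(1, n):
    for j in range(1, n - i):
        p = np.array([i, j, n - i - j]) / n  # interior point of the 3-part simplex
        grid.append(p)
        bias.append(shannon(amplify(p, e)) - shannon(p))
grid, bias = np.array(grid), np.array(bias)

print("largest |bias| at", grid[np.argmax(np.abs(bias))])
print("smallest |bias| at", grid[np.argmin(np.abs(bias))])
```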
Supplementary Material
Acknowledgments
The authors would like to thank Dr. Rachel Silverman for her manuscript comments. JDS was supported by NIGMS R01GM148972-01.
Data and Code Availability
All code and data used in this study are available at https://github.com/dharmikrathod/pcr_bias_code.
References
- Acinas Silvia G, Sarma-Rupavtarm Ramahi, Klepac-Ceraj Vanja, and Polz Martin F. PCR-induced sequence artifacts and bias: insights from comparison of two 16S rRNA clone libraries constructed from the same sample. Applied and Environmental Microbiology, 71(12):8966–8969, 2005.
- Caporaso J. Gregory, Lauber Christian L., Walters William A., Berg-Lyons Donna, Lozupone Catherine A., Turnbaugh Peter J., Fierer Noah, and Knight Rob. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proceedings of the National Academy of Sciences, 108:4516–4522, March 2011. ISSN 0027-8424, 1091-6490. doi: 10.1073/pnas.1000080107. URL https://pnas.org/doi/full/10.1073/pnas.1000080107.
- Cassol Ignacio, Ibañez Mauro, and Bustamante Juan Pablo. Key features and guidelines for the application of microbial alpha diversity metrics. Scientific Reports, 15(1):622, 2025.
- Egozcue Juan José and Pawlowsky-Glahn Vera. Compositional data: the sample space and its structure. Test, 28(3):599–638, 2019.
- Eisenstein Michael. Microbiology: making the best of PCR bias. Nature Methods, 15:317–320, 2018.
- Frey U, Bachmann H, Peters J, Sutter G, and Groner B. PCR-amplification of GC-rich regions: ‘slowdown PCR’. Nature Protocols, 3(8):1312–1317, 2008. doi: 10.1038/nprot.2008.117.
- Gimpel Andreas L., Fan Bowen, Chen Dexiong, Wölfle Laetitia O. D., Horn Max, Meng-Papaxanthos Laetitia, Antkowiak Philipp L., Stark Wendelin J., Christen Beat, Borgwardt Karsten, and Grass Robert N. Deep learning uncovers sequence-specific amplification bias in multi-template PCR. bioRxiv, 2024. doi: 10.1101/2024.09.20.614030. URL https://www.biorxiv.org/content/early/2024/09/20/2024.09.20.614030.
- Korvigo Ilia, Igolkina Anna A., Kichko Arina A., Aksenova Tatiana, and Andronov Evgeny E. Be aware of the allele-specific bias and compositional effects in multi-template PCR. PeerJ, 10:e13888, August 2022. ISSN 2167-8359. doi: 10.7717/peerj.13888. URL https://peerj.com/articles/13888.
- Krehenwinkel Henrik, Wolf Madeline, Lim Jun Ying, Rominger Andrew J, Simison Warren B, and Gillespie Rosemary G. Estimating and mitigating amplification bias in qualitative and quantitative arthropod metabarcoding. Scientific Reports, 7(1):1–12, 2017.
- Lian Meng, Chen Long, Hui Cang, Zhu Fuyuan, and Shi Peijian. On the relationship between the Gini coefficient and skewness. Ecology and Evolution, 14(12):e70637, 2024.
- Lozupone Catherine A, Hamady Micah, Kelley Scott T, and Knight Rob. Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial communities. Applied and Environmental Microbiology, 73(5):1576–1585, 2007.
- McLaren Michael R, Willis Amy D, and Callahan Benjamin J. Consistent and correctable bias in metagenomic sequencing experiments. eLife, 8:e46923, 2019. doi: 10.7554/eLife.46923. URL https://elifesciences.org/articles/46923.
- McLaren Michael R., Nearing Jacob T., Willis Amy D., Lloyd Karen G., and Callahan Benjamin J. Implications of taxonomic bias for microbial differential-abundance analysis. bioRxiv, 2022. doi: 10.1101/2022.08.19.504330. URL https://www.biorxiv.org/content/early/2022/10/08/2022.08.19.504330.
- Parada Alma E, Needham David M, and Fuhrman Jed A. Every base matters: assessing small subunit rRNA primers for marine microbiomes with mock communities, time series and global field samples. Environmental Microbiology, 18(5):1403–1414, 2016.
- Pawlowsky-Glahn Vera, Egozcue Juan José, and Tolosana-Delgado Raimon. Modeling and analysis of compositional data. John Wiley & Sons, 2015.
- Pinto Yishay and Bhatt Ami S. Sequencing-based analysis of microbiomes. Nature Reviews Genetics, 25(12):829–845, December 2024. ISSN 1471-0056, 1471-0064. doi: 10.1038/s41576-024-00746-6. URL https://www.nature.com/articles/s41576-024-00746-6.
- Polz Martin F and Cavanaugh Colleen M. Bias in template-to-product ratios in multitemplate PCR. Applied and Environmental Microbiology, 64(10):3724–3730, 1998.
- Silverman Justin D, Bloom Rachael J, Jiang Sharon, Durand Heather K, Dallow Eric, Mukherjee Sayan, and David Lawrence A. Measuring and mitigating PCR bias in microbiota datasets. PLoS Computational Biology, 17(7):e1009113, 2021. doi: 10.1371/journal.pcbi.1009113. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1009113.
- Silverman Justin D, Roche Kimberly, Holmes Zachary C, David Lawrence A, and Mukherjee Sayan. Bayesian multinomial logistic normal models through marginally latent matrix-t processes. Journal of Machine Learning Research, 23(7):1–42, 2022.
- Suzuki M T and Giovannoni S J. Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Applied and Environmental Microbiology, 62(2):625–630, February 1996. ISSN 0099-2240, 1098-5336. doi: 10.1128/aem.62.2.625-630.1996. URL https://journals.asm.org/doi/10.1128/aem.62.2.625-630.1996.
- Tringe Susannah G and Hugenholtz Philip. A renaissance for the pioneering 16S rRNA gene. Current Opinion in Microbiology, 11(5):442–446, 2008. doi: 10.1016/j.mib.2008.09.011.
- Van Syoc Emily, Weaver Evelyn, Rogers Connie J, Silverman Justin D, Ramachandran Ramesh, and Ganda Erika. Metformin modulates the gut microbiome in broiler breeder hens. Frontiers in Physiology, 13:1000144, 2022. doi: 10.3389/fphys.2022.1000144.
- Whittaker Robert Harding. Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30(3):279–338, 1960.
- Willis Amy D and Martin Bryan D. Estimating diversity in networked ecological communities. Biostatistics, 23(1):207–222, May 2020. ISSN 1465-4644. doi: 10.1093/biostatistics/kxaa015. URL https://doi.org/10.1093/biostatistics/kxaa015.
- Wu Jer-Horng, Hong Pei-Ying, and Liu Wen-Tso. Quantitative effects of position and type of single mismatch on single base primer extension. Journal of Microbiological Methods, 77(3):267–275, 2009.
- Youssef Noha H, Sheik Cody S, Krumholz Lee R, Najar Fares Z, Roe Bruce A, and Elshahed Mostafa S. Comparison of species richness estimates obtained using nearly complete fragments and simulated pyrosequencing-generated fragments in 16S rRNA gene-based environmental surveys. Applied and Environmental Microbiology, 75(16):5227–5236, 2009. doi: 10.1128/AEM.00592-09.