PLOS ONE. 2022 Dec 1;17(12):e0277299. doi: 10.1371/journal.pone.0277299

Quantitative PCR from human genomic DNA: The determination of gene copy numbers for congenital adrenal hyperplasia and RCCX copy number variation

Márton Doleschall 1,*, Ottó Darvasi 2, Zoltán Herold 3, Zoltán Doleschall 4, Gábor Nyirő 1,3, Anikó Somogyi 5, Péter Igaz 1,3,6, Attila Patócs 2,5,7
Editor: H Hakan Aydin 8
PMCID: PMC9714944  PMID: 36454796

Abstract

Quantitative PCR (qPCR) is used for the determination of gene copy number (GCN). GCNs contribute to human disorders, and characterize copy number variation (CNV). The single laboratory method validations of duplex qPCR assays with hydrolysis probes on CYP21A1P and CYP21A2 genes, residing in a CNV (RCCX CNV) and related to congenital adrenal hyperplasia, were performed using 46 human genomic DNA samples. We also performed the verifications on 5 qPCR assays for the genetic elements of RCCX CNV: C4A, C4B, CNV breakpoint, HERV-K(C4) CNV deletion and insertion alleles. Precision of each qPCR assay was under 1.01 CV%. Accuracy (relative error) ranged from 4.96±4.08% to 9.91±8.93%. Accuracy was not tightly linked to precision, but was significantly correlated with the efficiency of normalization using the RPPH1 internal reference gene (Spearman’s ρ: 0.793–0.940, p<0.0001), ambiguity (ρ = 0.671, p = 0.029) and misclassification (ρ = 0.769, p = 0.009). A strong genomic matrix effect was observed, and target-singleplex (one target gene in one assay) qPCR was able to appropriately differentiate 2 GCN from 3 GCN at best. The analysis of all GCNs from the 7 qPCR assays using a multiplex approach increased the resolution of differentiation, and produced 98% of GCNs unambiguously, all of which were in 100% concordance with GCNs measured by Southern blot, MLPA and aCGH. We conclude that the use of an internal (in one assay with the target gene) reference gene, the use of allele-specific primers or probes, and the multiplex approach (in one assay or different assays) are crucial for GCN determination using qPCR or other methods.

Introduction

Quantitative PCR (qPCR) was originally developed for virus quantification [1], and has recently attracted more attention owing to the SARS-CoV-2 pandemic. It is most often used for the quantification of mRNA levels [2], but gene copy number (GCN) determination in diploid genomes has long benefited from it [3]. GCN is the number of repeats of a gene in one or two sets of chromosomes. The vast majority of genes occur twice in a diploid genome, but the copy numbers of some genes can differ from two. GCN is a non-negative whole number, and the “integer GCN” term is used when this characteristic is emphasized. Variations in GCN contribute to both rare genetic disorders and common diseases in humans [4].

The qPCR for human GCN determination can be distinguished from qPCR for gene expression by some key features: 1.) Genomic DNA is the template. The template complexity, which can reduce the performance of qPCR [5], is much greater in genomic DNA than in total mRNA of a particular tissue: The haploid human genome consists of 3.1 billion base pairs, and millions of base pairs differ between two random haploid chromosome sets [6], while a few hundred genes account for 50% of transcripts in most human tissues [7] covering only a couple of hundred thousand base pairs in total length.

2.) Limit of detection (LOD) is not crucial. The absolute copy number of a target gene in a DNA sample is proportional to the absolute number of haploid chromosome sets, which can be approximately calculated from the mass of genomic DNA in the sample. The absolute number of haploid chromosome sets can be more accurately determined by the quantitative measurement of a reference gene, which invariably occurs once in each haploid chromosome set. The ratio of absolute copy numbers of a target gene and a reference gene in the DNA sample of a subject is identical to the ratio of the copies of target and reference genes in two haploid chromosome sets of a diploid cell. The ratio of the target and reference genes is not conditional on the amount of genomic DNA in a sample, and the GCN of a target gene is easily calculated from this ratio since the GCN of a reference gene is always two in a diploid cell. Therefore, the amount of genomic DNA in a measurement also does not influence GCN (in theory), and can be chosen to be conveniently above the LOD.

3.) The differentiation between greater consecutive GCNs is difficult. The quantification cycle (Cq) is determined by qPCR to characterize the absolute copy number of a gene in reality. Cq is measured from a relatively short DNA sequence specific to a target or a reference gene, and GCN is calculated from Cqs related to the target and reference genes. GCN determined by qPCR can be called “measured GCN”, and can be a positive real number (for example, a rational number), not necessarily a non-negative whole number. The relationship between Cq and GCN can be described by the equation: $Cq_{\mathrm{target}} - Cq_{\mathrm{reference}} = -\left(\frac{\log_2(\mathrm{GCN})}{\log_2(2)} - 1\right)$. The reference gene Cq is constant in theory, and therefore only the target gene Cq determines GCN. The theoretical difference between two target gene Cqs derived from two consecutive GCNs will approach zero if GCN approaches infinity. This means that the theoretical difference of two target gene Cqs is ∞ between 0 and 1 GCN, ΔCq = 1 between 2 and 1 GCNs, ΔCq = 0.585 between 3 and 2 GCNs, ΔCq = 0.415 between 4 and 3 GCNs, ΔCq = 0.322 between 5 and 4 GCNs, and so on. Therefore, it becomes more and more difficult to differentiate the greater consecutive GCNs, which presents the key problem of qPCR as well as other molecular biology methods for GCN determination.

4.) The inaccurately measured GCNs can be easily identified in the majority of cases. Ambiguity is the state of a measured GCN which is not close enough to an integer GCN to assign unequivocally the measured GCN to the integer GCN. The measured GCN is a continuous variable, and therefore a measured GCN can be about halfway between two integer GCNs, which clearly indicates the inaccuracy of the particular measurement. The distribution of several measured GCNs derived from the same integer GCN approaches a normal distribution, resulting in the majority of the measured GCNs around the real integer GCN (unambiguous GCNs), some measured GCNs between the real GCN and an adjacent integer GCN (ambiguous GCNs) and a few measured GCNs around the adjacent integer GCNs (misclassified GCNs).
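
The relationship between Cq and GCN described in point 3 can be illustrated with a minimal Python sketch. It assumes equal amplification efficiency for the target and the reference gene (an efficiency of 1 meaning perfect doubling per cycle) and a reference GCN of 2; the function names are illustrative and are not taken from the study.

```python
import math

REFERENCE_GCN = 2  # assumption: the reference gene occurs twice per diploid genome


def measured_gcn(cq_target: float, cq_reference: float, efficiency: float = 1.0) -> float:
    """Measured GCN from the Cq difference of target and reference gene.

    Assumes the same amplification efficiency for both genes
    (efficiency = 1.0 corresponds to a doubling in every cycle).
    """
    delta_cq = cq_target - cq_reference
    return REFERENCE_GCN * (1.0 + efficiency) ** (-delta_cq)


def delta_cq_between(gcn_low: int, gcn_high: int) -> float:
    """Theoretical Cq gap between two non-zero GCNs at perfect efficiency."""
    return math.log2(gcn_high / gcn_low)


# The gap between consecutive GCNs shrinks as GCN grows.
for low in range(1, 5):
    print(f"{low} -> {low + 1} GCN: dCq = {delta_cq_between(low, low + 1):.3f}")
```

The loop prints the ΔCq values listed in point 3 (1.000, 0.585, 0.415, 0.322), which shows why a fixed measurement error becomes more limiting at higher GCNs.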

Returning to the key problem, there are two techniques allowing GCN methods to extend beyond this limitation: 1.) The use of allele-specific primers or probes. Paralogous genes or gene variants are the copies of the same gene (or very similar ones), which have high DNA sequence similarity, and are located at the different loci of a haploid chromosome set. There are very often sequence differences between the paralogous gene variants which can be targeted in an allele-specific way, and the total GCN of the gene variants can be decomposed into smaller GCNs. 2.) The use of several probes (in one or different assays) for the same target gene. The integer GCN of a target gene can be estimated from the measured GCNs of more DNA sequences that are parts of the target gene, which can make the estimation more reliable. This estimation of the integer GCN can be based on the personal decision of an operator [8], a simple mathematical measure such as arithmetic mean [9] or a more complex statistical method such as a classifier.

The above-mentioned features and techniques are well illustrated by RCCX copy number variation (CNV, S1 Table). RCCX CNV usually consists of 1–3 tandem repeats of a DNA segment on one chromosome, and each DNA segment harbors 2 complete genes, complement component 4 (C4) and steroid 21-hydroxylase (CYP21) [10, 11]. Therefore, the copy numbers of the RCCX CNV segment, C4 and CYP21 are identical, and no exception to this has been described yet. Each of the two genes has 2 paralogous gene variants. C4 genes are sorted into C4A and C4B genes differing in 5 nucleotides in exon 26. The CYP21 genes sort into a functioning gene (CYP21A2) and a pseudogene (CYP21A1P) having several sequence differences. CYP21A2 contributes to the steroidogenesis of adrenal glands, and its mutations cause congenital adrenal hyperplasia (CAH) [12]. An additional sequence difference of C4 genes derived from a virus insertion in intron 9 (called human endogenous retrovirus K (HERV-K(C4) CNV)), and an RCCX CNV breakpoint, where two CNV segments are joined, are currently being researched [13]. Multiplex ligation-dependent probe amplification (MLPA) for the GCNs of CYP21 genes is commercially available, and is recognized as an appropriate method in the genetic testing of CAH [14]. MLPA uses multiple CYP21A1P- and CYP21A2-specific probes, and special statistical methods followed by the final evaluation of integer GCNs by the operator.

The GCNs in RCCX CNV are also determined by qPCR based on allele-specific primers or probes [13, 15–17]. However, these qPCR assays are singleplex for target genes (target-singleplex) in stark contrast to MLPA, and none of their published documentations completely meets the requirements of “minimum information for publication of quantitative real-time PCR experiments” (MIQE) [18, 19]. In this study we therefore aimed to simultaneously assess the performance of 7 qPCR assays for the GCN determination of the genetic elements of RCCX CNV according to MIQE. Verifications were performed on C4A, C4B, HERV-K(C4) CNV deletion, HERV-K(C4) CNV insertion, and RCCX CNV breakpoint qPCR assays in the current study because some information has been published on these assay performances (S2 Table), whereas single laboratory method validations [20] were completed on CYP21 qPCR assays. Furthermore, our goals were to examine the optimal laboratory strategy of qPCR for GCN determination in general, and the fit-for-purpose of qPCR assay for CYP21A2 GCN determination in the genetic testing of CAH.

Materials and methods

DNA samples

The qPCR validation and verification processes were completely in accordance with MIQE (S1 File). The current research was conducted with the approval by the National Scientific and Ethical Committee, Medical Research Council of Hungary (TUKEB, ETT), approval number 4457/2012/EKU. Written informed consent was given by all of the study subjects. Genomic DNA samples were extracted from whole blood of 10 healthy subjects, 11 patients with CAH and 5 patients with non-functioning adrenal incidentaloma (NFAI) using a Qiagen QIAcube instrument (S3 Table) with Qiagen QIAamp DNA blood mini kit or a Roche DNA isolation kit for mammalian blood (S1 File). Genomic DNA samples were also purchased from the International Histocompatibility Working Group (IHWG). Purchased DNA samples derive from 18 healthy subjects of the HapMap European reference population (CEU) [21] and 2 HLA homozygous cell lines, COX and QBL. Purchased DNA samples were isolated by IHWG using 5-Prime ArchivePure DNA cell/tissue kit, and their nominal concentrations were 100 ng/μl. DNA samples of COX and QBL were applied in a 1:1 mixture. An SD039 reference DNA sample for MLPA with a nominal concentration of 10 ng/μl, which is included in MRC Holland SALSA MLPA probe mix P050 CAH, was also used. The concentration and purity of all DNA stock solutions were determined by a Thermo Fisher Scientific (TFS) NanoDrop 2000 spectrophotometer and a TFS Qubit 1 fluorometer with Qubit dsDNA high sensitivity assay kit, and DNA integrity was checked on 0.7 m/V% agarose gels.

Positive controls, samples for calibration curves and study groups

DNA working solutions with 5 ng/μl DNA concentration were separately diluted from the stock solutions of the DNA samples for 3 replicate measurements, except for ones for positive controls (more than 3 separately diluted working solutions) and for the calibration curve (a series of dilutions). DNA samples derived from our own subjects were divided based on DNA quality into “good quality” (A260/A280>1.8 and A260/A230>2.0 and no sign of DNA degradation) and “bad quality” study groups (n = 10). The SD039 reference sample for MLPA was assigned to the “good quality” group (n = 17). The DNA samples purchased from IHWG were labeled as the “population” group (n = 19).

Measurements

The applied custom primers (Integrated DNA Technologies) and hydrolysis probes (produced by TFS) for C4 genes [16], CYP21 genes [15], HERV-K(C4) CNV alleles and RCCX CNV breakpoint [13] (S4 Table) have been previously published. All these hydrolysis probes had a 5’-fluorochrome, 6-carboxyfluorescein (FAM) reporter, and a 3’-nonfluorescent quencher and a 3’-minor groove binder. The mix for qPCR measurements (S5 Table) also contained ribonuclease P RNA component H1 (RPPH1) internal reference gene included in TFS TaqMan human RNase P copy number reference assay with a probe labeled with 2′-chloro-7′-phenyl-1,4-dichloro-6-carboxy-fluorescein (VIC), and TFS TaqMan fast advanced master mix (FAMM). The concentrations of custom primers and probes were slightly different in different qPCR assays (S5 Table). The same qPCR profile according to the manual of FAMM was used for all 7 qPCR assays. The qPCR measurements were carried out on a TFS QuantStudio 7 qPCR instrument (QS7) with TFS QuantStudio software v1.2 except for some experiments with SYBR Green and robustness experiments (see more later). The three DNA working solutions of a DNA sample were separately measured for a target gene on different days except for ones for positive controls. We measured 3 replicates from the same and 3 replicates from the different DNA working solutions of a DNA sample for positive control on each of the three days. The quantification threshold of the RPPH1 reference gene was always 0.1 for relative quantification. The quantification thresholds for the replicate measurements of each target gene were tuned in a way that the average of relative errors (REs) of measurements (based on a preliminary calculation) in a replicate measurement of “good quality” and “population” study group equaled approximately zero (S6 Table). Therefore, all measurements of these study groups (n = 36) were taken into account, instead of the use of arbitrarily selected reference samples and ΔΔCt method. There was no need for the further normalization or correction of the Cq values, and there was no Cq optimization for precision, calibration curve or the variances of RE.

The melting curve analyses were performed with Bioline SensiFAST SYBR Lo-ROX kit on QS7. The micro-capillary electrophoreses of duplex qPCR reactions were carried out by Agilent Bioanalyzer 2100 instrument with Agilent Bioanalyzer high sensitivity DNA kit. MRC Holland SALSA MLPA EK1 reagent kit and probe mix P050 CAH kit on TFS ProFlex PCR and TFS 3130 capillary electrophoresis instruments were used for the measurements of MLPA, and MRC Holland Coffalyser software v140721 for the calculation of MLPA results. Robustness experiments were performed using a Roche LightCycler 1.0 instrument with Roche LightCycler FastStart DNA master SYBR green I reagent, QS7 with TFS TaqMan universal master mix II without uracil-N-glycosylase (UMM2) or a TFS 7500 Fast qPCR instrument (7500F) with FAMM. All qPCR reagents in robustness experiments were used according to the manuals of the manufacturers.

Calculation and statistics

Non-specific PCR products were in silico predicted by Primer Blast [22], and the secondary structures of PCR products were in silico determined by UNAFold [23]. The limit of detection was estimated according to Hubaux and Vos [24], and statistical metrics were used based on MIQE (S6 Table). Statistical analyses were performed with R v4.0.2 [25] and SPSS v26. Normal distribution was tested by Shapiro–Wilk (SW) test. Fisher’s exact test (FE), Student’s t-test, Wilcoxon test, ANOVA with Tukey post-hoc test, Kruskal-Wallis (KW) test with Dunn post-hoc test, Pearson’s correlation, Spearman’s rank correlation and Levene’s test were used for basic statistics. Tests were two-tailed, p-values were corrected with the false discovery rate (FDR) method, and p<0.05 was considered as statistically significant. A linear mixed-effect model of the R package lme4 [26] was applied to the analyses of slopes of calibration curves. Integer GCNs were estimated from measured GCNs by a machine learning classifier, the linear discriminant analysis (LDA).

Results

Parameters of genomic DNA samples

DNA concentration, A260/A280 and A260/A230 were determined, and DNA integrity was also tested. The genomic DNA samples purchased from IHWG were assigned to the “population” study group (n = 19). The DNA samples of the subjects, enrolled by us, were divided based on DNA quality into “good quality” (A260/A280≥1.8 and A260/A230≥2.0 and no sign of DNA degradation) and “bad quality” study groups (n = 10). A reference sample (SD039) was also assigned to the “good quality” group (n = 17). The mean and SD belonging to the A260/A280 and A260/A230 quality parameters of the stock solutions of genomic DNA samples were 1.848±0.043 and 2.064±0.483 in the “good quality”, 1.817±0.049 and 1.899±0.182 in the “population”, and 1.787±0.046 and 2.152±0.089 in the “bad quality” study group (S1 File). The “population” group included 5 DNA samples of 19 which had A260/A280 values slightly lower than 1.8. However, all samples were intact in the “population” and “good quality” groups. The DNA samples in the “bad quality” study group were partially degraded. The concentrations of DNA stock solutions measured by Qubit were significantly lower (Wilcoxon test: p<0.0001) than those by NanoDrop (S1 Fig), agreeing with a previous finding [27]. Nevertheless, genomic DNA concentrations theoretically affect neither precision nor accuracy in qPCR as long as DNA concentration is in the linear range of the measurement.

Analytical specificity

Putative non-specific PCR products were found only at the primer pair of HERV-K(C4) CNV insertion target element using Primer-BLAST (S7 Table). Non-specific PCR product was observed only at the primer pairs of C4A and C4B target genes by melting curve analyses (S2 Fig), and the micro-capillary electrophoresis of duplex qPCR reactions confirmed this finding. Nevertheless, the same PCR product is amplified from both C4A and C4B gene variants, and allele-specific hydrolysis probes have also discriminated between the different target sequences of the mixed PCR products generated by the same primer pair in a previous study [28]. A couple of nucleotide differences in a target sequence can completely block the binding of the non-specific probe, and a non-specific PCR product can bind a specific probe by chance with very low probability. Therefore, the non-specific PCR product does not necessarily distort the quantitative PCR performance of C4 assays.

Matrix effect of genomic DNA

The Cq values (S1 File) of the RPPH1 reference gene from different assays could be compared, because the concentration of RPPH1 reagents, qPCR running parameters and the quantification threshold for RPPH1 were identical in all assays. There were no significant differences between the RPPH1 Cqs in replicate measurements (SW FDR: p = 0.676–0.768; ANOVA: p = 0.632; Tukey: p = 0.717–1.000) (S3 Fig), but significant differences (SW FDR: p = 0.002–0.958, 3 significant ones out of 46; KW: p<0.0001; Dunn: 39.7% significant pairs) were observed between DNA samples (Fig 1). The matrix effect of genomic DNA (sample-to-sample variation) could cause the differing Cq means of different samples, because the causes, derived from the quality and quantity of DNA solutions, could be ruled out; the DNA quality was completely checked, the DNA extractions were performed in 2 independent laboratories using different methods, the concentrations of DNA stock solutions were double-checked with 2 different methods, and Cqs were measured from 3 independent diluted working solution series.

Fig 1. Cq values (N = 966) of the RPPH1 reference gene grouped by DNA samples.


The measurements of only one, predefined DNA working solution were taken into account for a calibration curve or a positive control DNA sample. A light blue dot indicates one RPPH1 Cq value. Means and standard deviations are indicated by bars. The mean of RPPH1 Cq (26.0724) is indicated by a horizontal grey line.

Linearity, PCR efficiency and analytical sensitivity

Separate calibration curves were measured using 3 different genomic DNA samples in each assay, and 5 DNA samples were selected in total 1.) to ensure the GCNs of the target genes were as diverse as possible, and 2.) to measure each sample at least 3 times (S4 and S5 Figs). The effects of samples and different assays on the regression slopes were examined by a linear mixed-effects model (S8 Table). Samples had twice as large an effect on the standard deviation (SD) of slopes as the assays. The effect of samples on slope deviations is more likely to arise from the matrix effect of genomic DNA (template complexity) than from the deviated PCR inhibition of different samples because there is no difference in the deviation of Cqs from the lines of calibration curves between low and high DNA concentrations.

The average PCR efficiencies in different assays were around 1 (S9 Table), and there were no significant differences between them (SW FDR: p = 0.857, ANOVA: p = 0.333, Tukey: p = 0.356–1.000 for target genes, SW FDR: p = 0.687–0.983, ANOVA: p = 0.604, Tukey: p = 0.627–1.000 for reference genes). The PCR efficiencies of target and RPPH1 pairs from the same assays and DNA samples showed a strong and significant correlation (Spearman’s ρ = 0.7426, p = 0.0001) (S6 Fig), indicating that RPPH1 effectively compensated for the matrix effect on PCR efficiency. Estimated LODs were around the theoretical limit, which seems overestimated (S9 Table). Nevertheless, the lowest dilution of calibration curves contained approximately 400–1200 copies of genomic template depending on the GCN, and all 63 measurements on this dilution produced adequate Cqs and GCNs, indicating that the lowest applied concentration was well above LOD, in agreement with a previous study [29].
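
PCR efficiency is conventionally derived from the slope of a calibration curve (Cq against the base-10 logarithm of template amount) as E = 10^(−1/slope) − 1, so that E = 1 corresponds to a doubling per cycle. The following is a minimal sketch of that calculation, assuming an ordinary least-squares fit and a hypothetical two-fold dilution series; the values are illustrative and are not data from the study.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pcr_efficiency(amounts, cqs):
    """Efficiency from the calibration slope: E = 10**(-1/slope) - 1."""
    log_amounts = [math.log10(a) for a in amounts]
    slope, _intercept = linear_fit(log_amounts, cqs)
    return 10.0 ** (-1.0 / slope) - 1.0


# Hypothetical two-fold dilution series (ng of genomic DNA per reaction)
# in which Cq rises by exactly one cycle per dilution step.
amounts = [20.0, 10.0, 5.0, 2.5, 1.25]
cqs = [24.0, 25.0, 26.0, 27.0, 28.0]
print(f"PCR efficiency: {pcr_efficiency(amounts, cqs):.3f}")
```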

Precision

Repeatability and reproducibility were assessed from the same and separate dilutions of positive control samples, and both of them showed low and quite homogeneous coefficient of variation % (CV%) values throughout the assays (S10 Table). Reproducibility values were also assessed in “good quality”, “population” and “bad quality” study groups producing comparable results, and the highest pooled CV% was 1.01.
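
As an illustration of the precision metric, CV% is the sample standard deviation of replicate Cqs divided by their mean, times 100. The sketch below also pools CV% values across samples as a root mean square, which is an assumption made here for illustration (it presumes equal replicate numbers) and is not necessarily the pooling used in S10 Table. The Cq values are hypothetical.

```python
import math
from statistics import mean, stdev


def cv_percent(cqs):
    """Coefficient of variation (%) of replicate Cq values."""
    return 100.0 * stdev(cqs) / mean(cqs)


def pooled_cv_percent(groups):
    """Root-mean-square pooling of per-sample CV% values.

    Assumption: every group has the same number of replicates.
    """
    cvs = [cv_percent(group) for group in groups]
    return math.sqrt(sum(cv ** 2 for cv in cvs) / len(cvs))


# Hypothetical Cq triplicates of two samples
replicates = [[26.10, 26.25, 26.05], [25.80, 25.95, 25.90]]
print(f"pooled CV% = {pooled_cv_percent(replicates):.2f}")
```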

Gene copy numbers and their concordance

The measured GCNs between ±0.3 of an integer GCN were considered as unambiguous. The ambiguity of GCNs is usually defined by a customarily applied fixed limit, which increases the ambiguous GCNs at higher integer GCNs. For instance, a 10% difference between the integer and measured GCNs means an unambiguous result at a GCN of 2 (1.8 or 2.2 measured GCNs), but an ambiguous one at a GCN of 4 (3.6 or 4.4 measured GCNs) using a ±0.3 fixed limit. The majority of the average measured GCNs of samples in “good quality” and “population” study groups (Fig 2, S1 File) were unambiguous except for the GCNs of the HERV-K(C4) CNV insertion assay (S11 Table). Measured GCNs in the “bad quality” group were markedly ambiguous in CYP21A1P and CYP21A2 assays, implying a high sensitivity of these assays to DNA quality.
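
The fixed ±0.3 limit can be expressed as a small helper. This is a minimal sketch with a hypothetical function name; it rounds a measured GCN to the nearest integer and reports whether the measurement lies within the limit. A tiny tolerance is added only to absorb floating-point rounding at the boundary.

```python
import math


def classify_measured_gcn(measured: float, limit: float = 0.3):
    """Return (nearest integer GCN, True if the result is unambiguous)."""
    nearest = max(0, math.floor(measured + 0.5))
    unambiguous = abs(measured - nearest) <= limit + 1e-9
    return nearest, unambiguous


# A 10% deviation is unambiguous at a GCN of 2 but ambiguous at a GCN of 4.
for value in (1.8, 2.2, 3.6, 4.4):
    print(value, classify_measured_gcn(value))
```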

Fig 2. Average measured gene copy numbers (GCNs) of samples in different qPCR assays for RCCX CNV.


Bars indicate standard deviation.

There is no reference method or certified reference material for GCN determinations in RCCX CNV. However, MLPA is most often used for CYP21 genes in the genetic testing of CAH [14]. Furthermore, the GCNs of C4A and C4B in the CEU human reference population [30, 31] and the full RCCX CNV DNA sequences [32, 33] and GCNs [13] of COX and QBL HLA homozygous cell lines have been determined (S1 File). The GCNs were fully concordant in SD039 and COX-QBL samples, and C4A, C4B and RCCX CNV breakpoint GCNs were reasonably concordant with the previous results of Southern blot and array comparative genomic hybridization (CGH) (S12 Table). CYP21A1P and CYP21A2 GCNs were also suitably concordant with GCNs determined by MLPA (S7 Fig). The precisions of MLPA probes ranged between 12.60–38.62 CV% (S13 Table). The ambiguity was significantly higher for the CYP21A2 MLPA probe, which recognizes the same insertion allele of an 8 bp genetic variant as the CYP21A2 qPCR assay, compared to the ambiguity of qPCR (35% vs 11% ambiguous GCNs, FE FDR: p = 0.024) in spite of appropriate quality controls of MLPA (S8 Fig). The reproducibility of MLPA was also assessed with the same dilutions of positive control samples in the same way as performed in qPCR assays. The reproducibility of GCNs based on positive controls for β-defensin loci was 6.25 CV% for qPCR and 2.88 CV% for MLPA in a previous study [34], whereas the values were 5.08 CV% for CYP21A1P qPCR, 3.04 CV% for CYP21A2 qPCR, 4.84 CV% for CYP21A1P MLPA and 7.52 CV% for CYP21A2 MLPA in the current study.

Expected gene copy numbers, estimated integer gene copy numbers and consistency

A larger deviation (above 0.4) of a particular measured GCN from the expected GCN, calculated by the multiple linear regression of other GCNs in the same sample, highlighted inconsistent results (S9 Fig, S1 File). Integer GCNs were estimated from the measured GCNs by LDA. Total C4, total CYP21, total HERV-K(C4) CNV GCNs in addition to RCCX CNV breakpoint GCN plus 2 were all equal, and used for the estimation of total integer GCNs. The integer GCN of a paralogous gene was identical to total GCN, where the GCN of the other paralogous gene was 0. The integer GCNs of a paralogous gene pair having both GCNs larger than 0 were estimated from the measured GCNs of the paralogous gene pair and RCCX CNV breakpoint (Fig 3).
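
The equality of the four total GCN estimates suggests a simple consistency check. The sketch below is a deliberately simplified stand-in for the multiple linear regression described above: it only compares the spread of the four totals with a tolerance. The dictionary keys, the reuse of 0.4 as tolerance and the sample values are assumptions made for illustration.

```python
def total_gcn_estimates(gcn: dict) -> dict:
    """Four independent estimates of the RCCX CNV segment copy number."""
    return {
        "C4": gcn["C4A"] + gcn["C4B"],
        "CYP21": gcn["CYP21A1P"] + gcn["CYP21A2"],
        "HERV-K(C4)": gcn["HERV_del"] + gcn["HERV_ins"],
        "breakpoint+2": gcn["breakpoint"] + 2,
    }


def is_consistent(gcn: dict, tolerance: float = 0.4) -> bool:
    """True if the four totals agree within the tolerance (assumed value)."""
    totals = list(total_gcn_estimates(gcn).values())
    return max(totals) - min(totals) <= tolerance


# Hypothetical measured GCNs of a sample with 4 RCCX CNV segments
sample = {
    "C4A": 2.05, "C4B": 1.96,
    "CYP21A1P": 2.10, "CYP21A2": 1.90,
    "HERV_del": 1.00, "HERV_ins": 3.05,
    "breakpoint": 2.02,
}
print(total_gcn_estimates(sample), is_consistent(sample))
```

A sample failing such a check would be a candidate for re-measurement before any integer GCN is assigned.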

Fig 3. Locations of samples based on discriminant scores generated by linear discriminant analyses (LDA).


A.) LDA of total gene copy numbers (GCNs). The GCNs of C4 genes, CYP21 genes, the alleles of HERV-K(C4) CNV and the copy number of RCCX CNV breakpoint + 2 equal each other as well as the copy number of RCCX CNV segment due to genome biology reasons. Grey numbers indicate the total integer GCNs of the particular clusters. B.) LDA of C4 genes. First grey number indicates the integer C4A GCNs of the particular clusters, and the second one indicates the integer C4B GCNs. C.) LDA of CYP21 genes. First grey number indicates the integer CYP21A1P GCNs of the particular clusters, and the second one indicates the integer CYP21A2 GCNs. D.) LDA of the alleles of HERV-K(C4) CNV. First grey number indicates the integer HERV-K(C4) CNV deletion GCNs of the particular clusters, and the second one indicates the integer HERV-K(C4) CNV insertion GCNs. Blue empty circle indicates the NA11839 sample, which was correctly classified for total GCN in spite of its incorrect input class, but its cross-validation of LDA showed a different class for C4 genes. Red arrow indicates the H005 sample, which also received a different class of total GCN by cross-validation than its original class.

The estimations of integer GCNs were ambiguous in 3 cases (1.94%) based on the cross-validation and probabilities of LDA, while the majority of estimations (N = 150, 98.06%) passed the cross-validation (S1 File). All unambiguously estimated integer GCNs were in 100% concordance with the integer GCNs measured by Southern blot, MLPA and array CGH (N = 137), suggesting that the probabilities and cross-validation of LDA accurately indicated the ambiguity at estimated integer GCNs. Furthermore, all unambiguously estimated integer GCNs showed 100% consistency with each other.

Estimated accuracy

The RE of measurements in study groups sometimes significantly deviated from a normal distribution and from each other (Fig 4). However, there was no such significant difference (SW FDR: 0.055–0.892; ANOVA: 0.552–0.972) in REs grouped by replicate measurements (S10 Fig). There were significant differences between REs in the same assay grouped by GCNs (S11 Fig), although a clear tendency could not be observed. The REs grouped by DNA samples (Fig 5) reflected a significant matrix effect of genomic DNA (SW FDR: p = 0.060–0.987; ANOVA: p<0.0001; Tukey: 21.5% significant pairs), but the matrix effect on accuracy seemed to be smaller than that on RPPH1 Cqs.

Fig 4. Estimated accuracy of different qPCR assays for RCCX CNV grouped by study group.


Estimated accuracy is expressed by the relative error of the qPCR measurements. Samples with rarer GCNs were selected for “good quality” and “bad quality” groups, which may cause the difference between these groups and the “population” group in HERV-K(C4) CNV assays, whereas the lower GCNs of the “bad quality” study group in CYP21A1P and CYP21A2 assays were already observed at measured GCNs. Relative errors were not calculated for the samples with 0 GCNs in the particular assay. Bars indicate means and standard deviation. SW: Shapiro–Wilk test, FDR: false discovery rate method for multiple testing correction, KW: Kruskal-Wallis test.

Fig 5. Estimated accuracy grouped by samples.


Estimated accuracy is expressed by the relative error of the qPCR measurements. Relative errors were not calculated for the samples with 0 GCNs. Bars indicate means and standard deviation.

The means and SDs belonging to the absolute values of average REs of samples in the assays were between 4.96±4.08% and 9.91±8.93% (S12 Fig). The distributions of average REs fitted to normal distributions. The normal distribution of a particular assay, characterized by mean and SD, was assumed around every integer GCN of the particular assay to estimate the ambiguity and misclassification rates. The estimated ambiguity and misclassification rates of assays (Table 1) were in accordance with observed ambiguity and concordance (S11 and S12 Tables). The estimated ambiguities were fairly moderate at a GCN of 2 in the assays with better performance, whereas the estimated misclassifications were sufficiently low at a GCN of 3. The ambiguity and misclassification rates were also estimated based on the normal distributions calculated by the average REs of each GCN in a particular assay (S14 Table).

Table 1. Estimated ambiguity and misclassification rates of different qPCR assays for RCCX CNV at different gene copy numbers (GCNs).

| | C4A assay | C4B assay | CYP21A1P assay | CYP21A2 assay | HERV-K(C4) CNV deletion assay | HERV-K(C4) CNV insertion assay | RCCX CNV breakpoint assay |
|---|---|---|---|---|---|---|---|
| ambiguity at 1 GCN | 0.02% | 0.01% | 0.15% | 0.03% | <0.01% | 2.57% | <0.01% |
| ambiguity at 2 GCN | 6.16% | 4.94% | 11.25% | 7.07% | 2.78% | 26.47% | 2.05% |
| ambiguity at 3 GCN | 21.27% | 19.01% | 29.00% | 22.83% | 14.24% | 45.71% | 12.23% |
| ambiguity at 4 GCN | 35.00% | 32.58% | 42.74% | 36.6% | 27.12% | 57.71% | 24.65% |
| misclassification at 1 GCN | <0.01% | <0.01% | <0.01% | <0.01% | <0.01% | <0.01% | <0.01% |
| misclassification at 2 GCN | <0.01% | <0.01% | 0.02% | <0.01% | <0.01% | 0.93% | <0.01% |
| misclassification at 3 GCN | 0.36% | 0.23% | 1.37% | 0.49% | 0.06% | 8.28% | 0.03% |
| misclassification at 4 GCN | 2.92% | 2.19% | 6.41% | 3.50% | 1.02% | 19.32% | 0.68% |

Estimations were performed based on the normal distributions (characterized by mean and standard deviation) of average relative errors of samples.
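
The kind of estimation summarized in Table 1 can be sketched with the standard library. The sketch assumes that the relative error of an assay is normally distributed, that ambiguity is the probability of a measured GCN falling outside ±0.3 of the true integer GCN, and that misclassification is the probability of falling beyond ±0.7 (that is, within 0.3 of another integer or further away). These definitions and the SD of 6.5% are assumptions for illustration, not values taken from the study.

```python
from statistics import NormalDist


def estimated_rates(integer_gcn: int, re_sd: float, re_mean: float = 0.0,
                    limit: float = 0.3):
    """Ambiguity and misclassification probabilities at a non-zero integer GCN.

    re_mean and re_sd describe the relative error as fractions
    (0.065 means 6.5%); the error in GCN units scales with the GCN.
    """
    measured = NormalDist(mu=integer_gcn * (1.0 + re_mean),
                          sigma=integer_gcn * re_sd)

    def prob_outside(half_width: float) -> float:
        inside = (measured.cdf(integer_gcn + half_width)
                  - measured.cdf(integer_gcn - half_width))
        return 1.0 - inside

    ambiguity = prob_outside(limit)
    misclassification = prob_outside(1.0 - limit)
    return ambiguity, misclassification


for gcn in (1, 2, 3, 4):
    amb, mis = estimated_rates(gcn, re_sd=0.065)  # hypothetical SD
    print(f"GCN {gcn}: ambiguity {amb:.2%}, misclassification {mis:.2%}")
```

Because the same relative error translates into a larger absolute error at higher GCNs, both rates rise with GCN, which mirrors the trend in Table 1.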

Robustness of CYP21A1P and CYP21A2 assays

The robustness was screened through precision in CYP21 genes, which was reasonably similar to the general setup except when using different qPCR reagents and instruments (S15 Table). Performance was assessed with UMM2 and a 7500F (S13–S16 Figs and S16–S20 Tables). The estimated ambiguity and misclassification rates with UMM2 were similar to the better ones of the assays measured by the general setup, despite the higher imprecision and worse linearity. The PCR efficiencies between CYP21 target and RPPH1 showed a higher correlation (S17 Fig) with UMM2 than with 7500F. Normalized root-mean-square error (NRMSE) is a common measure of the differences between a predicted value and a measured one. NRMSE was used to characterize how efficiently the Cqs of a target gene follow the Cqs of RPPH1. The reproducibility values of target and RPPH1 were significantly correlated from sample to sample in all CYP21 assays, as well as NRMSE and accuracy, but there was no correlation between precision and accuracy (S21 Table). The same relationships could be observed in other assays (S22 Table). Moreover, the SDs of accuracies (the SDs of the average REs of samples) in the assays from the default and robustness experiments of “good quality” and “population” study groups were significantly correlated with observed ambiguities and misclassifications (Spearman’s ρ = 0.671, p = 0.029 and ρ = 0.769, p = 0.009).
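
For orientation, a generic NRMSE can be computed as below. The normalization by the range of the measured values is an assumption (normalization by the mean is also common), and the sketch does not reproduce how the target and RPPH1 Cqs were paired in the study; the function name is illustrative.

```python
import math


def nrmse(predicted, measured):
    """Root-mean-square error normalized by the range of the measured values."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    rmse = math.sqrt(sum(squared) / len(squared))
    value_range = max(measured) - min(measured)
    if value_range == 0:
        raise ValueError("measured values must not all be identical")
    return rmse / value_range


# Hypothetical values: a constant offset of 1 over a measured range of 2
print(nrmse([2.0, 3.0, 4.0], [1.0, 2.0, 3.0]))
```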

Discussion

Although the performance parameters of the 7 qPCR assays in the current study varied, the number of assays was large enough and the measurement conditions were homogeneous enough to draw general conclusions. Precision based on Cqs and calibration curve parameters was not strongly linked to the accuracy of GCNs. The HERV-K(C4) CNV insertion assay showed precision and calibration curve parameters as good as those of the other assays, but its accuracy (RE) had a high SD. In contrast, worse precision and calibration curves were observed in the assays of CYP21 genes measured with UMM2, but their accuracies were similar to those of the assays with high performance. Accuracy is directly associated with ambiguity and misclassification, and should therefore be considered the key performance parameter of qPCR for GCN. The matrix effect of genomic DNA, observed in RPPH1 Cqs, calibration curves and, to a lesser extent, accuracy, seemed to have a great impact on the performance of qPCR for GCN. More effective normalization to the RPPH1 reference gene was correlated with higher accuracy, and the normalization probably reduced the matrix effect on accuracy. If ambiguity of around 5% and misclassification under 1% are considered acceptable, the methodological limit of a singleplex qPCR assay for a target gene with 3 replicates proved, in general, to be a GCN of 2 for ambiguity and a GCN of 3 for misclassification. Both a lower methodological GCN limit [35] and a similar one [36] have been described in the qPCR literature. At any rate, the range of integer GCNs occurring in patients or a study population often exceeds the methodological GCN limit of target-singleplex qPCR, producing enough ambiguous GCNs and misclassifications to call the fitness for purpose into question at higher GCNs.
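
The role of the internal reference gene can be illustrated with a short calculation. This is a sketch and not the study's analysis pipeline: it assumes that template copies are read from calibration lines of the form Cq = slope × log10(N) + intercept, and that the RPPH1 reference is present in 2 copies per diploid genome. The function names and the Cq, slope and intercept values are hypothetical; the only value taken from the article is that a slope of -3.322 corresponds to a PCR efficiency of 1.

```python
def pcr_efficiency(slope):
    """PCR efficiency from the slope of a calibration curve (Cq vs log10 copies)."""
    return 10 ** (-1.0 / slope) - 1.0


def template_copies(cq, slope, intercept):
    """Invert the calibration line Cq = slope * log10(N) + intercept."""
    return 10 ** ((cq - intercept) / slope)


def measured_gcn(cq_target, cq_reference, target_curve, reference_curve,
                 reference_gcn=2):
    """Measured GCN of the target, normalized to the reference in the same well.

    Each curve is a (slope, intercept) pair. Assumption: the reference gene
    has `reference_gcn` copies per diploid genome.
    """
    target_n = template_copies(cq_target, *target_curve)
    reference_n = template_copies(cq_reference, *reference_curve)
    return reference_gcn * target_n / reference_n


if __name__ == "__main__":
    curve = (-3.322, 38.0)  # hypothetical slope and intercept
    print(pcr_efficiency(curve[0]))
    # A target Cq one cycle later than the reference Cq, with equal curves.
    print(measured_gcn(27.0, 26.0, curve, curve))
```

Because both Cqs come from the same reaction, any matrix effect that shifts them equally cancels in the ratio, which is the benefit of an internal reference described above.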

The use of multiple target sequences from a multiplex assay [37] or separate assays [38] can overcome the GCN limitation of the target-singleplex method. The measured GCNs of the multiple target genetic elements were classified using LDA as a multiplex approach. The majority of integer GCNs (98.06%) estimated from measured GCNs passed the cross-validation of LDA and were qualified as unambiguous. Furthermore, all these unambiguously estimated integer GCNs were in 100% concordance with the integer GCN estimations from MLPA and the findings of Southern blot and array CGH from previous studies [30, 31]. LDA was highly effective, even when using the ambiguous data of qPCR assays with lower accuracy, such as the HERV-K(C4) CNV insertion assay, and DNA samples of bad quality. A low sample size in a GCN class limits the performance of the classifier, so it may be worth using a reference set enriched with samples having rarer GCNs. The multiplex approach could render qPCR for the genetic elements of RCCX CNV very effective; however, it would be interesting to see how this approach handles a target region with a higher average GCN than that found in RCCX CNV.
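
The cross-validation idea, and the reason why rare GCN classes limit it, can be shown with a deliberately simplified stand-in. The sketch below is not LDA: it uses a nearest-centroid classifier (which ignores the covariance structure that LDA models) together with leave-one-out cross-validation. All function names and the measured GCN values are hypothetical.

```python
def class_centroids(samples):
    """Mean feature vector of every class in a list of (features, label) pairs."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def nearest_class(features, centroids):
    """Label of the centroid with the smallest squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def leave_one_out(samples):
    """Classify each sample from all other samples; None if its class would be empty."""
    predictions = []
    for index, (features, label) in enumerate(samples):
        others = samples[:index] + samples[index + 1:]
        if all(other_label != label for _, other_label in others):
            predictions.append(None)  # singleton class: cannot be cross-validated
            continue
        predictions.append(nearest_class(features, class_centroids(others)))
    return predictions


if __name__ == "__main__":
    # Hypothetical measured GCNs (CYP21A1P, CYP21A2) with integer GCN classes.
    samples = [
        ((1.05, 2.02), (1, 2)),
        ((0.95, 1.96), (1, 2)),
        ((2.10, 2.05), (2, 2)),
        ((1.90, 1.95), (2, 2)),
        ((2.00, 3.10), (2, 3)),  # the only sample of its class
    ]
    print(leave_one_out(samples))
```

The last sample has no classmates once it is left out, so no prediction can be checked for it. A reference set enriched with rarer GCNs avoids this situation.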

Several studies [29, 34, 39–41] contrast one molecular biology method for GCN with another, often with the ambition of pronouncing one of them more advanced or suitable. However, the final conclusions of these studies are controversial, and it is difficult to draw a general conclusion because: 1.) The performances are compared using only a few metrics. GCNs and their concordance between the examined methods are the typical levels of comparison. Methodological differences can hamper the comparison at multiple levels; for instance, a measurement from a single replicate does not allow the use of many standard performance metrics such as CV%. 2.) Performance metrics are poorly assessed or documented. This is well illustrated by qPCR experiments, where usually only a fraction of the performance metrics required by MIQE are published. Ambiguity between measured and integer GCNs is seldom stated explicitly, but it can be assessed in the majority of GCN determination methods, and the performance of a method can be inferred from it. 3.) Study conclusions are drawn from one or a couple of assays. The performance of a GCN method has an upper limit for theoretical or practical reasons, but it can be arbitrarily poor owing to poor design or execution. Therefore, an assay with higher performance characterizes its method better than one with lower performance. For instance, the reproducibility of GCN in MLPA for CAH in the current study was slightly worse than the reproducibility in MLPA for β-defensin in a previous article [34], and the MLPA results for β-defensin more aptly characterize MLPA in general. 4.) The performance of a GCN method is only compared to other methods, not to the fitness for purpose. The target-singleplex performance of MLPA for the same genetic variant that is detected by the CYP21A2 qPCR assay was lower, but it does not follow that MLPA is unfit for purpose. The real strengths of MLPA are the multiplex approach, which provides appropriate final integer GCN results, and the relatively simple procedure. In addition, the MLPA for the genetic test of CAH has some potential to identify CAH mutations and chimeric CYP21 gene variants in the same assay [42].

Southern blot was the first GCN determination method for RCCX CNV [43], and uses a multiplex approach because the unlabeled genomic DNA fragment pattern bound to the membrane can be examined with several probes for the elements of RCCX CNV in succession. The disadvantages of Southern blot include high labor intensity, high time demand, and only semi-quantitative GCN results based on a human operator’s evaluation [44], which decrease its suitability for the genetic test of CAH. Array CGH is a high-throughput method, but its high labor intensity, high time demand and high cost do not fit the needs of CAH laboratories, where the vast majority of array CGH results would not be used. Other GCN determination methods based on the multiple elements of RCCX CNV, such as the paralog ratio test and high resolution melting PCR, have been described [30, 45]. Digital PCR has been used for the GCN determination of C4 paralogous gene variants and HERV-K(C4) CNV [46], but its methodological performance has not yet been evaluated on the genetic elements of RCCX CNV.

The CYP21A2 qPCR assay in the current study produces a reasonable number of ambiguous results at a GCN of 2, and the measurements of ambiguous GCNs can be conveniently repeated because the labor intensity and time demand of qPCR are low. Lower GCNs (0 and 1) are frequently examined for genetic testing, and misclassification for these GCNs is very low. A GCN of 3 seldom has to be examined, and therefore the high ambiguity at a GCN of 3 is acceptable, as is the low level of misclassification. Furthermore, the analysis of CAH mutations has to be performed in the genetic testing of CAH, and its results should correspond to the GCNs, which would indicate a possible misclassification at a GCN of 3 [47]. Overall, we therefore suggest that the target-singleplex CYP21A2 qPCR assay is fit for the purpose of the genetic testing of CAH.

Supporting information

S1 Fig. The relationship between the concentrations of DNA stock solutions measured by NanoDrop 2000 spectrophotometer and Qubit 1 fluorometer.

Pearson’s correlation coefficient is indicated by “r”. The data of one predefined dilution (first of separately diluted ones) of each sample for positive control and calibration curve was included in the study groups. Repeatability values of DNA concentration determinations based on positive control samples were 3.93 CV% for NanoDrop and 2.58 CV% for Qubit, and reproducibility values were 3.99 CV% and 4.22 CV%, respectively. The correlation of all samples between NanoDrop and Qubit was high (Pearson’s r = 0.887, p<0.0001), although the concentrations of DNA stock solutions measured by Qubit were significantly lower (Wilcoxon test: p<0.0001). Pearson correlation coefficients (r) between NanoDrop and Qubit values in the “bad quality” study group suggest that DNA quality influenced DNA concentration measurements.

(TIF)

S2 Fig. Melting curve analyses of quantitative PCR primers for different target genes using positive control samples.

Rn is the normalized fluorescence of the reporter dye. Red arrow shows the peak of a non-specific product.

(TIF)

S3 Fig. Cq values (N = 966) of the RPPH1 reference gene grouped by target genes and replicate measurements.

The measurements of only one, predetermined DNA working solution were taken into account for a calibration curve or a positive control DNA sample. A light blue dot indicates one RPPH1 Cq value. Means and standard deviations are indicated by bars. The mean of RPPH1 Cq (26.0724) is indicated by a horizontal grey line. Replicate measurement 1, 2 and 3 are indicated by p1, p2 and p3.

(TIF)

S4 Fig. Calibration curves of target genetic elements.

Ntemp is the genomic copy number of the particular target genetic elements, which is conditional on the amount of genomic DNA in the series of dilutions (2.5, 5, 10, 20, 40 and 80 ng total DNA in a measurement) and the gene copy number of target elements in the diploid genome (1–4). CI means 95% confidence interval. The CIs of the lines are not depicted because they would be too close to the lines to discern.

(TIF)

S5 Fig. Calibration curves of the RPPH1 reference gene from different qPCR assays.

Ntemp is the genomic copy number of the target genetic elements of the RPPH1 gene, which is conditional on the amount of genomic DNA in the series of dilutions (2.5, 5, 10, 20, 40 and 80 ng total DNA in a measurement). CI means 95% confidence interval. The CIs of the lines are not depicted because they would be too close to the lines to discern.

(TIF)

S6 Fig. Relationship between the PCR efficiencies of target genes and corresponding RPPH1 reference genes of each sample for calibration curves.

Black line is a simple linear regression. Dotted line indicates the 95% confidence intervals.

(TIF)

S7 Fig. Relationships between gene copy numbers (GCNs) of qPCR assays for CYP21 genes and MLPA for CAH.

Ambiguity thresholds (±0.3 of integer GCNs for qPCR and according to the manual for MLPA) are indicated by grey dotted lines. Bars indicate the standard deviation of GCNs in the case of qPCR, and the standard deviation of the dosage quotients (equivalent of GCN in MLPA) of different probes for the same CYP21 gene in the case of MLPA.

(TIF)

S8 Fig. Relationship between gene copy numbers (GCNs) of qPCR assays and MLPA probes for the same two alleles of an 8 bp deletion variant in CYP21 genes (rs387906510, c.332_339delGAGACTAC).

Two MLPA hybridization probes (15221-L20261 and 15221-L20262) detect the two alleles of the same 8 bp deletion variant in CYP21 genes, which is also detected by CYP21A1P and CYP21A2 qPCR assays in the current study. Therefore, these probes and assays are suitable for direct comparison after normalization with an internal reference probe. The MLPA reference probe 16316-L21434 was used for the normalization of the CYP21A2 probe (15221-L20261) and CYP21A1P probe (15221-L20262), because this reference probe showed the highest correlation with the CYP21 probes and the other MLPA reference probes. Only the GCNs of the replicates of the first replicate measurement were used for qPCR, because MLPA according to the official manual applies to one measurement of each DNA sample by default. The ratios of CYP21 and reference probes of MLPA were tuned to approximately zero average relative error, as was done for qPCR. Ambiguity thresholds (±0.3 of integer GCNs) are indicated by grey dotted lines.

(TIF)

S9 Fig. Relationship of measured and expected GCNs in qPCR assays for RCCX CNV.

A multiple regression model was built from all measured GCNs of study groups based on the genomic relations. The expected total GCNs and RCCX CNV breakpoint GCN plus 2 were calculated in each replicate measurement based on the model. Then the expected average GCN of a particular target gene was calculated using the average of the corresponding expected total GCNs in proportion to the measured GCNs of the particular target gene and its allelic counterpart.

(TIF)

S10 Fig. Estimated accuracy of different qPCR assays for RCCX CNV grouped by replicate measurements.

Estimated accuracy is expressed as the relative error of the qPCR measurements. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation.

(TIF)

S11 Fig. Estimated accuracy of different qPCR assays for RCCX CNV grouped by gene copy numbers (GCNs).

Estimated accuracy is expressed as the relative error of the qPCR measurements. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation. All statistical tests were calculated based on “good quality” and “population” study groups. SW—Shapiro–Wilk test, FDR—false discovery rate method for multiple testing correction, KW—Kruskal-Wallis test.

(TIF)

S12 Fig. Average estimated accuracy of “good quality” and “population” study groups in different qPCR assays for RCCX CNV.

Estimated accuracy is expressed as the average relative error of the samples. The means and SDs of absolute values of average relative errors of the samples are indicated under the p-values of the Shapiro-Wilk test (SW). Relative errors were not calculated for the samples with 0 GCNs in the particular assay. Bars indicate means and standard deviation. FDR: false discovery rate method for multiple testing correction. The variances of average relative errors of samples were significantly different (Levene’s test: p = 0.0026) between assays. However, only the difference between HERV-K(C4) CNV insertion and RCCX CNV breakpoint assays was significant (Levene’s test FDR: p = 0.04996) after multiple testing correction.

(TIF)

S13 Fig. Calibration curves of CYP21A1P and CYP21A2 target and RPPH1 reference genes for robustness.

Ntemp is the copy number of the particular target genetic element in a measurement, which is conditional on the amount of genomic DNA in the series of dilutions (2.5, 5, 10, 20, 40 and 80 ng total DNA in a measurement) and the copy number of the particular genetic element in the diploid genome. CI means 95% confidence interval. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

(TIF)

S14 Fig. Average measured gene copy numbers of samples in CYP21A1P and CYP21A2 qPCR assays for robustness.

Bars indicate standard deviation. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

(TIF)

S15 Fig. Estimated accuracy of measurements in CYP21A1P and CYP21A2 qPCR assays for robustness grouped by study group.

Estimated accuracy is expressed by the relative error of the qPCR measurements. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

(TIF)

S16 Fig. Average estimated accuracy of samples in CYP21A1P and CYP21A2 qPCR assays for robustness.

Estimated accuracy is expressed by the average relative error of the samples. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation. SW—Shapiro–Wilk test, FDR—false discovery rate method for multiple testing correction. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

(TIF)

S17 Fig. Relationship between the PCR efficiencies of target genes and corresponding RPPH1 reference genes of each sample in CYP21A1P and CYP21A2 qPCR assays for robustness.

Correlations were made under the assumption that the PCR efficiencies of CYP21A1P and CYP21A2 assays behave in a similar way. Black line is a simple linear regression. Dotted lines indicate the 95% confidence intervals. UMM2–TaqMan universal master mix II, 7500F–7500 Fast qPCR instrument.

(TIF)

S1 Table. Glossary of important terms used in the current article.

There is no official consensus on the terms of ambiguity, concordance, misclassification and consistency, but they have been widely used in the literature of gene copy number determination.

(PDF)

S2 Table. Performance information of quantitative PCR (qPCR) for gene copy number (GCN) determination in selected literature.

The term “accuracy” is used as a difference between a measured value and a “true” value determined by a reference material or method. “Ambiguity” is a state of a measured GCN not being close enough to an integer GCN to assign it clearly. “Misclassification” is a state of an integer GCN estimated from measured GCN not being identical to the genuine integer GCN. PMID–PubMed ID, avr–average, SD–standard deviation, ΔCq–Cq(target gene) − Cq(reference gene), CV%–coefficient of variation %. Footnotes: (1) There are criteria, but no detailed information. (2) It is implicitly stated on a graph. (3) It is expressed in relation to the results of other method(s) for GCN determination.

(PDF)

S3 Table. The address of the manufacturers of reagents and instruments.

(PDF)

S4 Table. Primers and hydrolysis probes used in the current study.

All primers and probes were purified by HPLC. At least one of the primers of a primer pair is bound to an intronic sequence. Allele-specific sites are indicated on the sequences by underscore.

(PDF)

S5 Table. The concentrations of quantitative PCR reagents in different set-ups.

The RNaseP copy number (CN) reference assay contains the RPPH1 internal reference gene. A reaction usually contained 10 ng genomic DNA, but the reactions for calibration curves also contained 2.5, 5, 20, 40 or 80 ng genomic DNA. TaqMan fast advanced master mix was used as qPCR reagent with AmpliTaq™ Fast DNA Polymerase. Additional Mg2+, additional dNTP and other additives were not added to the reactions. MicroAmp fast 96-well reaction plates (cat. no.: 4346907) were used for qPCR measurements.

(PDF)

S6 Table. Statistical metrics of the current study.

m(gDNA)–mass of genomic DNA, Mnucl avr−average molar mass of nucleotide, Nnucl in hap gen−the amount of nucleotide pairs in the haploid human genome, NA−Avogadro’s number, CV–coefficient of variation, GCN–gene copy number, mGCNrep—measured gene copy number of replicate, Cq−quantification cycle, REmGCN−relative error of measured GCN of replicate, eiGCN–estimated integer gene copy number, NRMSE–normalized root-mean-square error.

(PDF)

S7 Table. Parameters of analytical specificity assessed by in silico analysis, melting curve analysis and Agilent Bioanalyzer 2100 (micro-capillary electrophoresis).

Non-specific PCR product was observed only at the primer pairs of C4A and C4B target genes.

(PDF)

S8 Table. Analyses of the slopes of calibration curves using a linear mixed-effect model.

Fixed effects were calculated based on all calibration curves of target genes or the RPPH1 gene, and a random effect of assays or samples was separately modeled. Both fixed effects of the target and RPPH1 reference genes were calculated from all assays, and can be interpreted as the estimated average slopes. These were very close to a perfect value of -3.322, corresponding to a PCR efficiency of 1. Individual random effects are expressed as a standard deviation from the fixed effect.

(PDF)

S9 Table. The PCR efficiencies and estimated limit of detection (LOD) of target genetic elements and RPPH1 reference genes of different qPCR assays in singleplex or duplex PCR reaction.

The PCR efficiencies and estimated LOD were also assessed in the singleplex PCR reaction of each target or reference gene using the AI001 DNA sample. There were no significant differences between average PCR efficiencies from multiplex reactions (SW FDR: p = 0.857, ANOVA: p = 0.333, Tukey: p = 0.356–1.000 for target genes, SW FDR: p = 0.687–0.983, ANOVA: p = 0.604, Tukey: p = 0.627–1.000 for reference genes). LOD was estimated by the Hubaux-Vos method. SD—standard deviation, CI—confidence interval.

(PDF)

S10 Table. Precisions of target and RPPH1 reference genes in different qPCR assays for RCCX CNV.

All precisions are calculated by pooled coefficient of variation (CV) and expressed as CV%. Repeatability and reproducibility with same and different dilutions were assessed in positive control samples. The measurements of replicates for reproducibility were performed on different days. Reproducibility in “good quality”, “population” and “bad quality” study groups was assessed in samples with GCNs higher than zero. Some tendencies might be observed; the repeatability values from the measurement of the same dilutions tended to be lower than those from measurement of different dilutions, and repeatability values tended to be lower than reproducibility values.

(PDF)

S11 Table. Ambiguities in different qPCR assays for RCCX CNV.

The measured GCNs within ±0.3 of an integer GCN were considered as unambiguous.

(PDF)

S12 Table. Concordance in different qPCR assays for RCCX CNV.

Concordance was assessed in the samples with unambiguous gene copy numbers (GCNs) compared to the GCNs from MLPA, Southern blot and array CGH or to the estimated integer GCNs. Percentages indicate the rate of correctly determined GCNs. Percentage is not calculated when n<9.

(PDF)

S13 Table. Precisions of MLPA probes for CAH.

The peak heights of MLPA probes were determined by Coffalyser software with the default setting. All precisions are calculated by pooled coefficient of variation (CV) and expressed as CV%. The precisions (repeatability and reproducibility) were assessed with the same dilutions of positive control samples in the same way as performed in qPCR assays. Peak heights equaling zero were excluded from calculations.

(PDF)

S14 Table. Estimated ambiguity and misclassification rates of different qPCR assays for RCCX CNV at different gene copy numbers (GCNs).

Estimations were done based on the means and variances of average relative errors of different GCNs. Estimations were not performed for GCNs with fewer than 5 samples.

(PDF)

S15 Table. Precisions of target and RPPH1 reference genes in CYP21A1P and CYP21A2 qPCR assays for robustness.

The primers of CYP21A1P and CYP21A2 genes were tested in a LightCycler 1.0 instrument with SYBR Green dye, and several qPCR parameters such as total volume, annealing temperature, primer concentration, probe concentration, qPCR reagent (UMM2) and qPCR instrument (7500F) were changed in the FAMM-GS7 system usually used for the current study. All precisions are calculated by pooled coefficient of variation (CV) and expressed as CV%. Repeatability and reproducibility were assessed in positive control samples from the same dilution. FAMM–TaqMan fast advanced master mix, UMM2–TaqMan universal master mix II, GS7–GeneStudio 7 qPCR instrument, 7500F–7500 Fast qPCR instrument.

(PDF)

S16 Table. The PCR efficiencies of CYP21A1P and CYP21A2 target and RPPH1 reference genes for robustness.

SD—standard deviation, CI—confidence interval. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

(PDF)

S17 Table. Ambiguities in CYP21A1P and CYP21A2 qPCR assays for robustness.

UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

(PDF)

S18 Table. Misclassifications in CYP21A1P and CYP21A2 qPCR assays for robustness.

Misclassification was assessed in the samples with unambiguous GCNs compared to estimated integer GCNs. Percentage is not calculated for n<9. UMM2–TaqMan universal master mix II, 7500F–7500 Fast qPCR instrument.

(PDF)

S19 Table. Statistical tests for the variances of average relative errors of samples in CYP21A1P and CYP21A2 qPCR assays for robustness.

The variances of average relative errors of samples in the “good quality” and “population” study groups of the assays with UMM2 and 7500F were significantly different. The top left cell of the table contains the test results for all four groups; the other cells contain the results between pairs after multiple testing correction by the false discovery rate method. UMM2–TaqMan universal master mix II, 7500F–7500 Fast qPCR instrument.

(PDF)

S20 Table. Estimated ambiguity and misclassification rates of CYP21A1P and CYP21A2 qPCR assays for robustness at different gene copy numbers (GCNs).

Estimations were made based on the means and standard deviation of average relative errors. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

(PDF)

S21 Table. Correlation between reproducibility values (coefficients of variance (CVs)) of target Cqs of samples, reproducibility values of the reference Cqs of samples, normalized root-mean-square errors (NRMSEs) of the target Cqs of samples, and accuracy (the average relative errors of samples) in “good quality” and “population” study groups of CYP21A1P and CYP21A2 qPCR assays for robustness and RCCX CNV.

Pooled CV of target and reference gene (reproducibility or inter-assay precision), pooled NRMSE of target Cqs characterizing the normalization of Cqs of target genes by Cqs of the reference gene, and the standard deviation of the average relative error used for the estimation of ambiguity and misclassification rates are given for information purposes. Correlation was assessed by Spearman’s correlation. UMM2–TaqMan universal master mix II, 7500F–7500 Fast qPCR instrument.

(PDF)

S22 Table. Correlation between reproducibility values (coefficients of variance (CVs)) of target Cqs of samples, reproducibility values of the reference Cqs of samples, normalized root-mean-square errors (NRMSEs) of the target Cqs of samples and accuracy (the average relative errors of samples) in “good quality” and “population” study groups of C4A, C4B, HERV-K(C4) CNV deletion, HERV-K(C4) CNV insertion and RCCX CNV breakpoint qPCR assays.

Pooled CV of target and reference gene (reproducibility or inter-assay precision), pooled NRMSE of target Cqs characterizing the normalization of Cqs of target genes by Cqs of the reference gene, and the standard deviation of the average relative error used for the estimation of ambiguity and misclassification rates are given for information purposes. Correlation was assessed by Spearman’s correlation.

(PDF)

S1 File. Sheet 1: MIQE checklist.

All essential information (E) and desirable information (D) are submitted in the current manuscript.

Sheet 2: Data of DNA stock solutions. h–healthy, nfai–non-functioning adrenal incidentaloma, sv–simple virilizing congenital adrenal hyperplasia, sw–salt wasting congenital adrenal hyperplasia; g–good quality, p–population, b–bad quality; Q–Qiagen QIAcube with QIAamp DNA blood mini kit, R–Roche DNA isolation kit for mammalian blood, 5P–5 Prime ArchivePure DNA cell/tissue kit; i–intact, sd–slightly degraded, pd–partially degraded.

Sheet 3: Raw Cq values of the main qPCR experiments. Outliers were not identified using quartile ± 1.5 * interquartile range criterion. NTC–no template control.

Sheet 4: Raw peak heights of MLPA probes. Coffalyser software with the default settings was used for the determination.

Sheet 5: Raw dosage quotient of CYP21A1P and CYP21A2 MLPA probes. Coffalyser software with the default settings was used for the determination.

Sheet 6: Raw Cq values for the repeatability and reproducibility values of the robustness experiments. Outliers were not identified using quartile ± 1.5 * interquartile range criterion. NTC–no template control; FAMM–TaqMan fast advanced master mix, UMM2–TaqMan universal master mix II, GS7–GeneStudio 7 qPCR instrument, 7500F–7500 Fast qPCR instrument.

Sheet 7: Raw Cq values for the detailed CYP21 robustness experiments. Outliers were not identified using quartile ± 1.5 * interquartile range criterion. NTC–no template control; FAMM–TaqMan fast advanced master mix, UMM2–TaqMan universal master mix II, GS7–GeneStudio 7 qPCR instrument, 7500F–7500 Fast qPCR instrument.

Sheet 8: Detailed gene copy number results. GCN–gene copy number, avr–average, SD–standard deviation, lwr–lower, upr–upper, CI–95% confidence interval, CV%–coefficient of variation %, DQ–dosage quotient, CGH–comparative genome hybridization; wd–with dilution, wod–without dilution; g–good quality, p–population, b–bad quality; orange number or amb–ambiguous result, red number–misclassification, orange background–not all MLPA probes were used for a CYP21 gene due to chimeric gene, red background–MLPA was inconclusive probably due to double chimeric genes.

Sheet 9: Expected gene copy numbers. GCN–gene copy number, avr–average, exp–expected, dev–deviation, mod–modified, SD–standard deviation, lwr–lower, upr–upper, CI–95% confidence interval, CV%–coefficient of variation %, DQ–dosage quotient, CGH–comparative genome hybridization; g–good quality, p–population, b–bad quality; orange number–ambiguous result, red number–misclassification at GCN or a deviation above 0.4 at avr dev GCNs or a misclassification at rounded GCNs.

Sheet 10: Detailed results of linear discriminant analyses. The estimated total integer gene copy number (GCN) differed from the input class in sample NA11839 (the only sample having inconsistency in its input classes) and in sample H005, which has a relatively low probability value of classification. Cross-validation, in which the integer GCN of one sample is estimated without its input class from the data of all other samples, supported the estimation in the former case, and led to a different classification in the latter one. A deviating classification of C4 genes was also observed by cross-validation in sample NA11839. The cross-validation of CYP21 genes could not be performed in two samples (H004 and H010) with the rare 3 GCN of CYP21A2, because their classes consisted of only one sample. avr–average, P–probability, red number–inconsistent input class, orange background–low probability or difference between input, resultant or cross-validation class, yellow background–cross-validation cannot be applied, red background–ambiguous estimated integer GCN.

(XLSX)

Acknowledgments

We are indebted to Mark Eyre for English proofreading. We thank Prof. Barna Vasarhelyi for ensuring an inspiring research environment. Otto Darvasi passed away before the submission of the final version of this manuscript. Marton Doleschall accepts responsibility for the integrity and validity of the data collected and analyzed.

Data Availability

All relevant data are within the manuscript and its Supporting Information files. The minimal data set will also be available in the Zenodo database (DOI: 10.5281/zenodo.6780358) after the publication of the current paper.

Funding Statement

The current research was supported by Semmelweis Science and Innovation Fund to MD (STIA-KF-17) and Hungarian Scientific Research Fund to AP (K125231). MD was supported by Janos Bolyai Research Scholarship from the Hungarian Academy of Sciences, and the UNKP-19-4 New National Excellence Program of the Ministry for Innovation and Technology. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Abbott MA, Poiesz BJ, Byrne BC, Kwok S, Sninsky JJ, et al. (1988) Enzymatic gene amplification: qualitative and quantitative methods for detecting proviral DNA amplified in vitro. J Infect Dis 158: 1158–1169. doi: 10.1093/infdis/158.6.1158
  • 2. VanGuilder HD, Vrana KE, Freeman WM (2008) Twenty-five years of quantitative PCR for gene expression analysis. Biotechniques 44: 619–626. doi: 10.2144/000112776
  • 3. Gelmini S, Orlando C, Sestini R, Vona G, Pinzani P, et al. (1997) Quantitative polymerase chain reaction-based homogeneous assay with fluorogenic probes to measure c-erbB-2 oncogene amplification. Clin Chem 43: 752–758.
  • 4. Fanciulli M, Petretto E, Aitman TJ (2010) Gene copy number variation and common human disease. Clin Genet 77: 201–213. doi: 10.1111/j.1399-0004.2009.01342.x
  • 5. Yao Y, Nellaker C, Karlsson H (2006) Evaluation of minor groove binding probe and Taqman probe PCR assays: Influence of mismatches and template complexity on quantification. Mol Cell Probes 20: 311–316. doi: 10.1016/j.mcp.2006.03.003
  • 6. Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA (2011) Clan genomics and the complex architecture of human disease. Cell 147: 32–43. doi: 10.1016/j.cell.2011.09.008
  • 7. Mele M, Ferreira PG, Reverter F, DeLuca DS, Monlong J, et al. (2015) Human genomics. The human transcriptome across tissues and individuals. Science 348: 660–665. doi: 10.1126/science.aaa0355
  • 8. Concolino P, Mello E, Toscano V, Ameglio F, Zuppi C, et al. (2009) Multiplex ligation-dependent probe amplification (MLPA) assay for the detection of CYP21A2 gene deletions/duplications in congenital adrenal hyperplasia: first technical report. Clin Chim Acta 402: 164–170. doi: 10.1016/j.cca.2009.01.008
  • 9. Walker S, Janyakhantikul S, Armour JA (2009) Multiplex Paralogue Ratio Tests for accurate measurement of multiallelic CNVs. Genomics 93: 98–103. doi: 10.1016/j.ygeno.2008.09.004
  • 10. Banlaki Z, Szabo JA, Szilagyi A, Patocs A, Prohaszka Z, et al. (2013) Intraspecific evolution of human RCCX copy number variation traced by haplotypes of the CYP21A2 gene. Genome Biol Evol 5: 98–112.
  • 11. Handsaker RE, Van Doren V, Berman JR, Genovese G, Kashin S, et al. (2015) Large multiallelic copy number variations in humans. Nat Genet 47: 296–303. doi: 10.1038/ng.3200
  • 12. El-Maouche D, Arlt W, Merke DP (2017) Congenital adrenal hyperplasia. Lancet 390: 2194–2210. doi: 10.1016/S0140-6736(17)31431-9
  • 13. Wu YL, Savelli SL, Yang Y, Zhou B, Rovin BH, et al. (2007) Sensitive and specific real-time polymerase chain reaction assays to accurately determine copy number variations (CNVs) of human complement C4A, C4B, C4-long, C4-short, and RCCX modules: elucidation of C4 CNVs in 50 consanguineous subjects with defined HLA genotypes. J Immunol 179: 3012–3025. doi: 10.4049/jimmunol.179.5.3012
  • 14. Baumgartner-Parzer S, Witsch-Baumgartner M, Hoeppner W (2020) EMQN best practice guidelines for molecular genetic testing and reporting of 21-hydroxylase deficiency. Eur J Hum Genet. doi: 10.1038/s41431-020-0653-5
  • 15. Szabo JA, Szilagyi A, Doleschall Z, Patocs A, Farkas H, et al. (2013) Both positive and negative selection pressures contribute to the polymorphism pattern of the duplicated human CYP21A2 gene. PLoS One 8: e81977. doi: 10.1371/journal.pone.0081977
  • 16. Szilagyi A, Blasko B, Szilassy D, Fust G, Sasvari-Szekely M, et al. (2006) Real-time PCR quantification of human complement C4A and C4B genes. BMC Genet 7: 1. doi: 10.1186/1471-2156-7-1
  • 17. Parajes S, Quinterio C, Dominguez F, Loidi L (2007) A simple and robust quantitative PCR assay to determine CYP21A2 gene dose in the diagnosis of 21-hydroxylase deficiency. Clin Chem 53: 1577–1584. doi: 10.1373/clinchem.2007.087361
  • 18. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, et al. (2009) The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55: 611–622. doi: 10.1373/clinchem.2008.112797
  • 19. Bustin S, Nolan T (2017) Talking the talk, but not walking the walk: RT-qPCR as a paradigm for the lack of reproducibility in molecular research. Eur J Clin Invest 47: 756–774. doi: 10.1111/eci.12801
  • 20. Theodorsson E (2012) Validation and verification of measurement methods in clinical chemistry. Bioanalysis 4: 305–320. doi: 10.4155/bio.11.311
  • 21. The International HapMap Project (2003) The International HapMap Project. Nature 426: 789–796. doi: 10.1038/nature02168
  • 22. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, et al. (2012) Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13: 134. doi: 10.1186/1471-2105-13-134
  • 23. Markham NR, Zuker M (2008) UNAFold: software for nucleic acid folding and hybridization. Methods Mol Biol 453: 3–31. doi: 10.1007/978-1-60327-429-6_1
  • 24. Hubaux A, Vos G (1970) Decision and detection limits for linear calibration curves. Anal Chem 42: 849–855.
  • 25. R Core Team (2019) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • 26. Bates D, Machler M, Bolker BM, Walker SC (2015) Fitting Linear Mixed-Effects Models Using lme4. Journal of Statistical Software 67: 1–48.
  • 27. Nakayama Y, Yamaguchi H, Einaga N, Esumi M (2016) Pitfalls of DNA Quantification Using DNA-Binding Fluorescent Dyes and Suggested Solutions. PLoS One 11: e0150528. doi: 10.1371/journal.pone.0150528
  • 28. Lee LG, Connell CR, Bloch W (1993) Allelic discrimination by nick-translation PCR with fluorogenic probes. Nucleic Acids Res 21: 3761–3766. doi: 10.1093/nar/21.16.3761
  • 29. Leung EKY, Agolini E, Pei X, Melis R, McMillin GA, et al. (2017) Validation of an Extensive CYP2D6 Assay Panel Based on Invader and TaqMan Copy Number Assays. Journal of Applied Laboratory Medicine 1: 471–482. doi: 10.1373/jalm.2016.021923
  • 30. Fernando MM, Boteva L, Morris DL, Zhou B, Wu YL, et al. (2010) Assessment of complement C4 gene copy number using the paralog ratio test. Hum Mutat 31: 866–874. doi: 10.1002/humu.21259
  • 31. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, et al. (2010) Origins and functional impact of copy number variation in the human genome. Nature 464: 704–712. doi: 10.1038/nature08516
  • 32. Horton R, Wilming L, Rand V, Lovering RC, Bruford EA, et al. (2004) Gene map of the extended human MHC. Nat Rev Genet 5: 889–899. doi: 10.1038/nrg1489
  • 33. Norman PJ, Norberg SJ, Guethlein LA, Nemat-Gorgani N, Royce T, et al. (2017) Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II. Genome Res 27: 813–823. doi: 10.1101/gr.213538.116
  • 34. Perne A, Zhang X, Lehmann L, Groth M, Stuber F, et al. (2009) Comparison of multiplex ligation-dependent probe amplification and real-time PCR accuracy for gene copy number quantification using the beta-defensin locus. Biotechniques 47: 1023–1028. doi: 10.2144/000113300
  • 35. Bubner B, Gase K, Baldwin IT (2004) Two-fold differences are the detection limit for determining transgene copy numbers in plants by real-time PCR. BMC Biotechnol 4: 14. doi: 10.1186/1472-6750-4-14
  • 36. Weaver S, Dube S, Mir A, Qin J, Sun G, et al. (2010) Taking qPCR to a higher level: Analysis of CNV reveals the power of high throughput qPCR to enhance quantitative resolution. Methods 50: 271–276. doi: 10.1016/j.ymeth.2010.01.003
  • 37. Jiang W, Johnson C, Simecek N, Lopez-Alvarez MR, Di D, et al. (2016) qKAT: a high-throughput qPCR method for KIR gene copy number and haplotype determination. Genome Med 8: 99. doi: 10.1186/s13073-016-0358-0
  • 38. Hollox EJ, Detering JC, Dehnugara T (2009) An integrated approach for measuring copy number variation at the FCGR3 (CD16) locus. Hum Mutat 30: 477–484. doi: 10.1002/humu.20911
  • 39. Whale AS, Huggett JF, Cowen S, Speirs V, Shaw J, et al. (2012) Comparison of microfluidic digital PCR and conventional quantitative PCR for measuring copy number variation. Nucleic Acids Res 40: e82. doi: 10.1093/nar/gks203
  • 40. Cantsilieris S, Western PS, Baird PN, White SJ (2014) Technical considerations for genotyping multi-allelic copy number variation (CNV), in regions of segmental duplication. BMC Genomics 15: 329. doi: 10.1186/1471-2164-15-329
  • 41. Fode P, Jespersgaard C, Hardwick RJ, Bogle H, Theisen M, et al. (2011) Determination of beta-defensin genomic copy number in different populations: a comparison of three methods. PLoS One 6: e16768. doi: 10.1371/journal.pone.0016768
  • 42. Chen W, Xu Z, Sullivan A, Finkielstain GP, Van Ryzin C, et al. (2012) Junction site analysis of chimeric CYP21A1P/CYP21A2 genes in 21-hydroxylase deficiency. Clin Chem 58: 421–430. doi: 10.1373/clinchem.2011.174037
  • 43. Chung EK, Yang Y, Rennebohm RM, Lokki ML, Higgins GC, et al. (2002) Genetic sophistication of human complement components C4A and C4B and RP-C4-CYP21-TNX (RCCX) modules in the major histocompatibility complex. Am J Hum Genet 71: 823–837. doi: 10.1086/342777
  • 44. Cantsilieris S, Baird PN, White SJ (2013) Molecular methods for genotyping complex copy number polymorphisms. Genomics 101: 86–93. doi: 10.1016/j.ygeno.2012.10.004
  • 45. Jaimes-Bernal CP, Trujillo M, Marquez FJ, Caruz A (2020) Complement C4 Gene Copy Number Variation Genotyping by High Resolution Melting PCR. Int J Mol Sci 21.
  • 46. Sekar A, Bialas AR, de Rivera H, Davis A, Hammond TR, et al. (2016) Schizophrenia risk from complex variation of complement component 4. Nature 530: 177–183. doi: 10.1038/nature16549
  • 47. Doleschall M, Luczay A, Koncz K, Hadzsiev K, Erhardt E, et al. (2017) A unique haplotype of RCCX copy number variation: from the clinics of congenital adrenal hyperplasia to evolutionary genetics. Eur J Hum Genet 25: 702–710. doi: 10.1038/ejhg.2017.38

Decision Letter 0

H Hakan Aydin

17 May 2022

PONE-D-22-07409

Quantitative PCR from human genomic DNA: the determination of gene copy numbers for CAH and RCCX CNV

PLOS ONE

Dear Dr. Doleschall,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jul 01 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

H. Hakan Aydin, MD, FAACC

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for including the following ethics statement on the submission details page:

'The current research was conducted with the approval by the National Scientific and Ethical Committee, Medical Research Council of Hungary (TUKEB, ETT). Approval number is 4457/2012/EKU.'

Please also include this information in the ethics statement in the Methods section of your manuscript.

3. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If the need for consent was waived by the ethics committee, please include this information.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

5. Please include your full ethics statement in the ‘Methods’ section of your manuscript file. In your statement, please include the full name of the IRB or ethics committee who approved or waived your study, as well as whether or not you obtained informed written or verbal consent. If consent was waived for your study, please include this information in your statement as well.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This manuscript is presenting qPCR approached for determination of gene copy number variation for the CAH and RCCX CNV from human genomes. They use a lot a of great statistical analysis to evaluate the value, the standard deviation and abilities to many primers, assays set to evaluate it. Methods seem to be well done and will be interesting for the readers of the journal but more work need to be done to make this paper more clear for the reader. Many section need to be redone, they indicated in acknowledgments someone that help with the English but believe need to be done again. I will ID some not clear sections in my comments but more work need to be done to the entire manuscript. Not all acronym are defined well and this manuscript presents so many supplement materials important and cited in the text that is is also difficult to follow. More than 20 figures and 20 tables if we included the one not supplement in.

Specific comments:

Title: Please define CAH and RCCX CVN, CAH dont seem to be defined in the manuscript

Abstract: Define the RCCX CVN.

P3L46 to L53, rephrase this section not clear, you mention the DNA genomic template and GCN is a whole number, very not clear and difficult to follow.

P4L62-67 difficult to follow as well.

P5L104 need a dot after File). L 106 will change by Qiagen with "using a Qiagen ..."

P6L109, it was not clear here what it mean by "Purchase DNA samples ..." and believe those sample were the samples used for "Population" important part in the manuscript but difficult to understand, you purchase blood for your population studies? or DNA not sure I follow here?

All material and method is lacking on clarity.

L116 "DNA working solution" of what?

L117 not sure I follow what you mean by "in a separate parallel measurement? Clarify this paragraph?

L127 Do you need the double "M" in "FAMM"

L129 and over the manuscipt you need to add the brand or cie name, ex for the Quantstudio 7 qPCR system (QS7)

Define also RPPH1 reference gene. L134 Info on the Bioline qPCR reagents? What PCR profile did you use? Also not sure the S5-table is clear enough about the master mixes?

L159 Primer-BLAST need reference

L206 define the HERV-K(c4) CNV?

Was not easy to follow what exactly mean in your manuscript the "good quality" "population" "bad quality" and they are related to your Purchase DNA?

Figure S1: do you have a "r" or it should be a "r square"

S4-table: What is the information on dye and quencher used?

In abstract you mentioned it was tested with Southern, MLPA and aCGH was it discussed? Very short discussion.

Reviewer #2: The paper technical sound and do the data support the conclusions. Data analysis appropriately and rigorously and presented in an intelligible fashion and English language is need revision. The data underlying the findings described in their manuscript fully available without restriction, with rare exception . The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2022 Dec 1;17(12):e0277299. doi: 10.1371/journal.pone.0277299.r002

Author response to Decision Letter 0


30 Jun 2022

@Reviewer #1:

We thank Reviewer #1 for the valuable remarks, which have helped us to greatly improve the manuscript.

Please find our answers below.

@Many section need to be redone, they indicated in acknowledgments someone that help with the English but believe need to be done again.

The English has been revised by a native speaker biologist.

@I will ID some not clear sections in my comments but more work need to be done to the entire manuscript.

The manuscript has been revised, as proposed.

@Not all acronym are defined well

The absent definitions have been added.

@and this manuscript presents so many supplement materials important and cited in the text that is is also difficult to follow. More than 20 figures and 20 tables if we included the one not supplement in.

The first full version of the current manuscript was twice as long as the submitted one. A lot of information was moved from the main text to the legends of the supplemental materials because of the word limit required by many journals, but we have restored the most important pieces of information. We have tried to balance the amount of information provided in the main text against its length.

@Specific comments:

The locations of changes are indicated according to “Revised Manuscript with Track Changes”.

@Title: Please define CAH and RCCX CVN, CAH dont seem to be defined in the manuscript

We have replaced “CAH” with “congenital adrenal hyperplasia” and “CNV” with “copy number variation” in the title. The exact definition of RCCX CNV is in S1 Table. We think the explanation for RCCX is too long and complicated to place into the title. CAH is defined in P8L140 of the manuscript.

@Abstract: Define the RCCX CVN.

We have added “a CNV” to the Abstract to aid understanding. The length of the Abstract is limited, so the exact definition of RCCX is too long to add.

@P3L46 to L53, rephrase this section not clear, you mention the DNA genomic template and GCN is a whole number, very not clear and difficult to follow.

@P4L62-67 difficult to follow as well.

We have rewritten the whole section: “The qPCR for human GCN determination can be distinguished from qPCR for gene expression by some key features: 1.) Genomic DNA is the template. The template complexity, which can reduce the performance of qPCR, is much greater in genomic DNA than in total mRNA of a particular tissue: The haploid human genome consists of 3.1 billion base pairs, and millions of base pairs differ between two random haploid chromosome sets, while a few hundred genes account for 50% of transcripts in most human tissues covering only a couple of hundred thousand base pairs in total length.

2.) Limit of detection (LOD) is not crucial. The absolute copy number of a target gene in a DNA sample is proportional to the absolute number of haploid chromosome sets, which can be approximately calculated from the mass of genomic DNA in the sample. The absolute number of haploid chromosome sets can be more accurately determined by the quantitative measurement of a reference gene, which invariably occurs once in each haploid chromosome set. The ratio of absolute copy numbers of a target gene and a reference gene in the DNA sample of a subject is identical to the ratio of the copies of target and reference genes in two haploid chromosome sets of a diploid cell. The ratio of the target and reference genes does not depend on the amount of genomic DNA in a sample, and the GCN of a target gene is easily calculated from this ratio since the GCN of a reference gene is always two in a diploid cell. Therefore, the amount of genomic DNA in a measurement also does not influence GCN (in theory), and can be chosen to be conveniently above the limit of detection (LOD).

3.) The differentiation between greater consecutive GCNs is difficult. The quantification cycle (Cq) is determined by qPCR to characterize the absolute copy number of a gene in reality. Cq is proportional to a relatively short DNA sequence specific to a target or a reference gene, and GCN is calculated from Cqs related to the target and reference genes. GCN determined by qPCR can be called “measured GCN”, and can be a positive real number (for example, a rational number), not necessarily a non-negative whole number. The relationship between Cq and GCN can be described by the equation: $Cq_{\text{target gene}} - Cq_{\text{reference gene}} = -\left(\frac{\log_2(\mathrm{GCN})}{\log_2(2)} - 1\right)$. The reference gene Cq is constant in theory, and therefore only the target gene Cq determines GCN. The theoretical difference between two target gene Cqs derived from two consecutive GCNs will approach zero if GCN approaches infinity. This means that the theoretical difference of two target gene Cqs is ∞ between 0 and 1 GCN, ΔCq=1 between 2 and 1 GCNs, ΔCq=0.585 between 3 and 2 GCNs, ΔCq=0.415 between 4 and 3 GCNs, ΔCq=0.322 between 5 and 4 GCNs, and so on. Therefore, it becomes more and more difficult to differentiate greater consecutive GCNs, which presents the key problem of qPCR as well as other molecular biology methods for GCN determination.

4.) The inaccurately measured GCNs can be easily identified in the majority of cases. Ambiguity is the state of a measured GCN which is not close enough to an integer GCN to assign unequivocally the measured GCN to the integer GCN. The measured GCN is a continuous variable, and therefore a measured GCN can be about halfway between two integer GCNs, which clearly indicates the inaccuracy of the particular measurement. The distribution of several measured GCNs derived from the same integer GCN approaches a normal distribution, resulting in the majority of the measured GCNs around the real integer GCN (unambiguous GCNs), some measured GCNs between the real GCN and an adjacent integer GCN (ambiguous GCNs) and a few measured GCNs around the adjacent integer GCNs (misclassified GCNs).”
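The relationships quoted above can be illustrated with a minimal Python sketch. It assumes 100% PCR efficiency (one doubling per cycle) and a reference gene present twice per diploid genome, as the quoted text states. The function names and the `tolerance` cut-off for flagging ambiguity are hypothetical choices made for illustration only; they are not the classification procedure used in the study.

```python
import math

# Assumption from the quoted text: the reference gene occurs twice in a diploid cell.
REFERENCE_GCN = 2


def measured_gcn(cq_target: float, cq_reference: float) -> float:
    """Invert Cq(target) - Cq(reference) = -(log2(GCN) - 1).

    Assumes ideal amplification (100% efficiency) for both genes.
    """
    delta_cq = cq_target - cq_reference
    return REFERENCE_GCN * 2.0 ** (-delta_cq)


def cq_gap(lower_gcn: int, higher_gcn: int) -> float:
    """Theoretical difference between the target Cqs of two integer GCNs."""
    return math.log2(higher_gcn / lower_gcn)


def classify(gcn: float, tolerance: float = 0.25):
    """Return (nearest integer GCN, is_ambiguous).

    `tolerance` is a hypothetical cut-off: a measured GCN further than this
    from the nearest integer is flagged as ambiguous.
    """
    nearest = round(gcn)
    return nearest, abs(gcn - nearest) > tolerance


if __name__ == "__main__":
    # The gap between consecutive GCNs shrinks as GCN grows.
    for low in range(1, 5):
        print(f"{low} vs {low + 1} GCN: delta Cq = {cq_gap(low, low + 1):.3f}")
    print(classify(measured_gcn(24.4, 25.0)))
```

The shrinking values returned by `cq_gap` reproduce the ΔCq series given in the quoted text, which is why small Cq errors matter more at higher copy numbers.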

@P5L104 need a dot after File).

We have added it to the text.

@L 106 will change by Qiagen with "using a Qiagen ..."

We have modified the text in P9L169 as it was requested.

@P6L109, it was not clear here what it mean by "Purchase DNA samples ..." and believe those sample were the samples used for "Population" important part in the manuscript but difficult to understand, you purchase blood for your population studies? or DNA not sure I follow here?

We have modified the text: “Genomic DNA samples were also purchased from the International Histocompatibility Working Group (IHWG).”

@L116 "DNA working solution" of what?

We have modified the text: “DNA working solutions with 5 ng/µl DNA concentration were separately diluted from the stock solutions of the DNA samples for 3 replicate measurements”

@L117 not sure I follow what you mean by "in a separate parallel measurement? Clarify this paragraph?

We have clarified the paragraph: “DNA working solutions with 5 ng/µl DNA concentration were separately diluted from the stock solutions of the DNA samples for 3 replicate measurements, except for ones for positive controls (more than 3 separately diluted working solutions) and for the calibration curve (a series of dilutions). DNA samples derived from our own subjects were divided based on DNA quality into “good quality” (A260/A280>1.8 and A260/A230>2.0 and no sign of DNA degradation) and “bad quality” study groups (n=10). The SD039 reference sample for MLPA was assigned to the “good quality” group (n=17). The DNA samples purchased from IHWG were labeled as the “population” group (n=19).”
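As a rough illustration of the sample handling described above, the following Python sketch computes the stock volume needed for a 5 ng/µl working solution and assigns a sample to a quality group using the absorbance thresholds quoted in the text. The function names are hypothetical, and treating any sample that fails a criterion as "bad quality" is an assumption; the response does not spell out the exact rule.

```python
WORKING_CONC_NG_PER_UL = 5.0  # working concentration stated in the text


def stock_volume_ul(stock_conc_ng_per_ul: float, final_volume_ul: float,
                    target_conc_ng_per_ul: float = WORKING_CONC_NG_PER_UL) -> float:
    """Volume of stock needed for the working solution (C1 * V1 = C2 * V2)."""
    if stock_conc_ng_per_ul < target_conc_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    return target_conc_ng_per_ul * final_volume_ul / stock_conc_ng_per_ul


def quality_group(a260_a280: float, a260_a230: float, degraded: bool) -> str:
    """Apply the quoted thresholds: A260/A280 > 1.8, A260/A230 > 2.0, no degradation.

    Assumption: failing any single criterion places the sample in "bad quality".
    """
    good = a260_a280 > 1.8 and a260_a230 > 2.0 and not degraded
    return "good quality" if good else "bad quality"


if __name__ == "__main__":
    print(stock_volume_ul(50.0, 100.0))      # stock volume in microlitres
    print(quality_group(1.9, 2.1, False))
```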

@L127 Do you need the double "M" in "FAMM"

FAM usually stands for fluorescein amidites in molecular biology, which are important synthetic equivalents of fluorescein dye, and used for labeling oligonucleotide probes. We think it would be ideal to reserve this acronym for the dyes.

@L129 and over the manuscipt you need to add the brand or cie name, ex for the Quantstudio 7 qPCR system (QS7)

We have removed the information from S3 Table and added it to the text as it was requested.

@Define also RPPH1 reference gene.

We have defined it in P10L200.

@L134 Info on the Bioline qPCR reagents?

We have added it to the text in P11L220.

@What PCR profile did you use?

The qPCR profiles were used according to the manufacturers' manuals. We have added the information to the text in P11L205-6.

@Also not sure the S5-table is clear enough about the master mixes?

We think all of the information is provided now in Materials and Methods section (P10L195-P11L205), S5 Table and the legend of S5 Table.

@L159 Primer-BLAST need reference

We have added it to the text in P12L232.

@L206 define the HERV-K(c4) CNV?

We have added it to the text in P6L116, and we have also added some additional information to the S1 Table.

@Was not easy to follow what exactly mean in your manuscript the "good quality" "population" "bad quality" and they are related to your Purchase DNA?

They are study groups as described in P10L189-193. The study groups allow more detailed insights. For example, the estimated accuracies of the study groups differed significantly only in certain qPCR assays, as shown in Fig. 4. One article (PMID: 21364933) has claimed that qPCR for GCN determination in general is very sensitive to the quality of the genomic DNA, which generates systematic biases. However, this conclusion was drawn from one qPCR assay. Our results confirmed this conclusion only partly, and also indicated that experimental results based on one assay should not be generalized.

@Figure S1: do you have a "r" or it should be a "r square"

The “r” stands for Pearson’s correlation coefficient. We have clarified it in the legend of S1 Fig. The “r” is directly related to the coefficient of determination “r square”: “r square” is the square of “r”. “r” is a number between -1 and 1 that characterizes the direction (negative or positive) and strength of the correlation, whereas “r square” is a number between 0 and 1 without the directional component.
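
The relation between “r” and “r square” can be illustrated with a minimal sketch. The data points below are hypothetical and are not taken from the study; the function name `pearson_r` is only an illustrative helper.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical, decreasing data: r is negative, r squared is not.
x = [1.0, 2.0, 3.0, 4.0]
y_down = [8.1, 5.9, 4.2, 1.8]

r = pearson_r(x, y_down)
r_squared = r ** 2  # the sign (direction) is lost when squaring
```

Here `r` keeps the sign of the relationship, while `r_squared` only reports how much of the variance is shared.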

@S4-table: What is the information on dye and quencher used?

The custom Taqman probes, used in the current study, contained a 3’-nonfluorescent quencher and a 3’-minor groove binder. We have not found the exact chemical compositions for them. We have added the information to the text in P10L198-199.

@In abstract you mentioned it was tested with Southern, MLPA and aCGH was it discussed?

We have added the next sentences to the text: “Furthermore, all these unambiguously estimated integer GCNs were in 100% concordance with the integer GCN estimations from MLPA and the findings of Southern blot and array CGH from previous studies.” and ”Southern blot was the first GCN determination method for RCCX CNV [43], and uses a multiplex approach because the unlabeled genomic DNA fragment pattern bound to the membrane can be examined with several probes for the elements of RCCX CNV in succession. The disadvantages of Southern blot include high labor intensity, high time demand, and only semi-quantitative GCN results based on a human operator’s evaluation [44], decreasing its suitability for the genetic test of CAH. Array CGH is a high-throughput method, but its high labor intensity, high time demand and high cost do not fit to the needs of CAH laboratories, where the vast majority of array CGH results would not be used.”

@Very short discussion.

We have substantially extended the Discussion.

@Reviewer #2

We would like to thank the Reviewer for the remarks.

@English language is need revision.

The English has been revised by a native speaker biologist.

@The data underlying the findings described in their manuscript fully available without restriction, with rare exception . The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository.

We have added the raw MLPA data and the robustness experiments data of qPCR to the minimal data set in S1 File. Our minimal data set will also be available in the Zenodo database (DOI: 10.5281/zenodo.6780358) after the publication of the current paper.

Attachment

Submitted filename: GCN rebuttal letter v1.1.docx

Decision Letter 1

H Hakan Aydin

25 Oct 2022

Quantitative PCR from human genomic DNA: the determination of gene copy numbers for congenital adrenal hyperplasia and RCCX copy number variation

PONE-D-22-07409R1

Dear Dr. Doleschall,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

H. Hakan Aydin, MD, FAACC

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

H Hakan Aydin

8 Nov 2022

PONE-D-22-07409R1

Quantitative PCR from human genomic DNA: the determination of gene copy numbers for congenital adrenal hyperplasia and RCCX copy number variation

Dear Dr. Doleschall:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor H. Hakan Aydin

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig. The relationship between the concentrations of DNA stock solutions measured by NanoDrop 2000 spectrophotometer and Qubit 1 fluorometer.

    Pearson’s correlation coefficient is indicated by “r”. The data of one predefined dilution (first of separately diluted ones) of each sample for positive control and calibration curve was included in the study groups. Repeatability values of DNA concentration determinations based on positive control samples were 3.93 CV% for NanoDrop and 2.58 CV% for Qubit, and reproducibility values were 3.99 CV% and 4.22 CV%, respectively. The correlation of all samples between NanoDrop and Qubit was high (Pearson’s r = 0.887, p<0.0001), although the concentrations of DNA stock solutions measured by Qubit were significantly lower (Wilcoxon test: p<0.0001). Pearson correlation coefficients (r) between NanoDrop and Qubit values in the “bad quality” study group suggest that DNA quality influenced DNA concentration measurements.

    (TIF)

    S2 Fig. Melting curve analyses of quantitative PCR primers for different target genes using positive control samples.

    Rn is the normalized fluorescence of the reporter dye. Red arrow shows the peak of a non-specific product.

    (TIF)

    S3 Fig. Cq values (N = 966) of the RPPH1 reference gene grouped by target genes and replicate measurements.

    The measurements of only one, predetermined DNA working solution were taken into account for a calibration curve or a positive control DNA sample. A light blue dot indicates one RPPH1 Cq value. Means and standard deviations are indicated by bars. The mean of RPPH1 Cq (26.0724) is indicated by a horizontal grey line. Replicate measurements 1, 2 and 3 are indicated by p1, p2 and p3.

    (TIF)

    S4 Fig. Calibration curves of target genetic elements.

    Ntemp is the genomic copy number of the particular target genetic elements, which is conditional on the amount of genomic DNA in the series of dilutions (2.5, 5, 10, 20, 40 and 80 ng total DNA in a measurement) and the gene copy number of target elements in the diploid genome (1–4). CI means 95% confidence interval. The CIs of the lines are not depicted because they would be too close to the lines to discern.

    (TIF)

    S5 Fig. Calibration curves of the RPPH1 reference gene from different qPCR assays.

    Ntemp is the genomic copy number of the target genetic elements of the RPPH1 gene, which is conditional on the amount of genomic DNA in the series of dilutions (2.5, 5, 10, 20, 40 and 80 ng total DNA in a measurement). CI means 95% confidence interval. The CIs of the lines are not depicted because they would be too close to the lines to discern.

    (TIF)

    S6 Fig. Relationship between the PCR efficiencies of target genes and corresponding RPPH1 reference genes of each sample for calibration curves.

    The black line is a simple linear regression. Dotted lines indicate the 95% confidence interval.

    (TIF)

    S7 Fig. Relationships between gene copy numbers (GCNs) of qPCR assays for CYP21 genes and MLPA for CAH.

    Ambiguity thresholds (±0.3 of integer GCNs for qPCR and according to the manual for MLPA) are indicated by grey dotted lines. Bars indicate the standard deviation of GCNs in the case of qPCR, and the standard deviation of the dosage quotients (equivalent of GCN in MLPA) of different probes for the same CYP21 gene in the case of MLPA.

    (TIF)

    S8 Fig. Relationship between gene copy numbers (GCNs) of qPCR assays and MLPA probes for the same two alleles of an 8 bp deletion variant in CYP21 genes (rs387906510, c.332_339delGAGACTAC).

    Two MLPA hybridization probes (15221-L20261 and 15221-L20262) detect the two alleles of the same 8 bp deletion variant in CYP21 genes, which is also detected by CYP21A1P and CYP21A2 qPCR assays in the current study. Therefore, these probes and assays are suitable for direct comparison after normalization with an internal reference probe. The MLPA reference probe 16316-L21434 was used for the normalization of the CYP21A2 probe (15221-L20261) and CYP21A1P probe (15221-L20262), because this reference probe showed the highest correlation with the CYP21 probes and the other MLPA reference probes. Only the GCNs of the replicates of the first replicate measurement were used for qPCR, because MLPA, according to the official manual, uses one measurement of each DNA sample by default. The ratios of CYP21 and reference probes of MLPA were tuned to approximately zero average relative error, as was done for qPCR. Ambiguity thresholds (±0.3 of integer GCNs) are indicated by grey dotted lines.

    (TIF)

    S9 Fig. Relationship of measured and expected GCNs in qPCR assays for RCCX CNV.

    A multiple regression model was built from all measured GCNs of study groups based on the genomic relations. The expected total GCNs and RCCX CNV breakpoint GCN plus 2 were calculated in each replicate measurement based on the model. Then the expected average GCN of a particular target gene was calculated using the average of corresponding expected total GCNs in proportion to the measured GCNs of the particular target gene and its allelic counterpart.

    (TIF)

    S10 Fig. Estimated accuracy of different qPCR assays for RCCX CNV grouped by replicate measurements.

    Estimated accuracy is expressed as the relative error of the qPCR measurements. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation.

    (TIF)

    S11 Fig. Estimated accuracy of different qPCR assays for RCCX CNV grouped by gene copy numbers (GCNs).

    Estimated accuracy is expressed as the relative error of the qPCR measurements. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation. All statistical tests were calculated based on “good quality” and “population” study groups. SW—Shapiro–Wilk test, FDR—false discovery rate method for multiple testing correction, KW—Kruskal-Wallis test.

    (TIF)

    S12 Fig. Average estimated accuracy of “good quality” and “population” study groups in different qPCR assays for RCCX CNV.

    Estimated accuracy is expressed as the average relative error of the samples. The means and SDs of absolute values of average relative errors of the samples are indicated under the p-values of the Shapiro-Wilk test (SW). Relative errors were not calculated for the samples with 0 GCNs in the particular assay. Bars indicate means and standard deviation. FDR–false discovery rate method for multiple testing correction. The variances of average relative errors of samples were significantly different (Levene’s test: p = 0.0026) between assays. However, only the difference between HERV-K(C4) CNV insertion and RCCX CNV breakpoint assays was significant (Levene’s test FDR: p = 0.04996) after multiple testing correction.

    (TIF)

    S13 Fig. Calibration curves of CYP21A1P and CYP21A2 target and RPPH1 reference genes for robustness.

    Ntemp is the copy number of the particular target genetic element in a measurement, which is conditional on the amount of genomic DNA in the series of dilutions (2.5, 5, 10, 20, 40 and 80 ng total DNA in a measurement) and the copy number of the particular genetic element in the diploid genome. CI means 95% confidence interval. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (TIF)

    S14 Fig. Average measured gene copy numbers of samples in CYP21A1P and CYP21A2 qPCR assays for robustness.

    Bars indicate standard deviation. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (TIF)

    S15 Fig. Estimated accuracy of measurements in CYP21A1P and CYP21A2 qPCR assays for robustness grouped by study group.

    Estimated accuracy is expressed by the relative error of the qPCR measurements. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (TIF)

    S16 Fig. Average estimated accuracy of samples in CYP21A1P and CYP21A2 qPCR assays for robustness.

    Estimated accuracy is expressed by the average relative error of the samples. Relative errors were not calculated for the samples with 0 GCN in the particular assay. Bars indicate means and standard deviation. SW—Shapiro–Wilk test, FDR—false discovery rate method for multiple testing correction. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (TIF)

    S17 Fig. Relationship between the PCR efficiencies of target genes and corresponding RPPH1 reference genes of each sample in CYP21A1P and CYP21A2 qPCR assays for robustness.

    Correlations were made under the assumption that the PCR efficiencies of CYP21A1P and CYP21A2 assays behave in a similar way. The black line is a simple linear regression. Dotted lines indicate the 95% confidence interval. UMM2–TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (TIF)

    S1 Table. Glossary of important terms used in the current article.

    There is no official consensus on the terms of ambiguity, concordance, misclassification and consistency, but they have been widely used in the literature of gene copy number determination.

    (PDF)

    S2 Table. Performance information of quantitative PCR (qPCR) for gene copy number (GCN) determination in selected literature.

    The term “accuracy” is used as a difference between a measured value and a “true” value determined by a reference material or method. “Ambiguity” is a state of a measured GCN not being close enough to an integer GCN to assign it clearly. “Misclassification” is a state of an integer GCN estimated from measured GCN not being identical to the genuine integer GCN. PMID–PubMed ID, avr–average, SD–standard deviation, ΔCq = Cq(target gene) − Cq(reference gene), CV%–coefficient of variation %. ¹ There are criteria, but no detailed information. ² It is implicitly stated on a graph. ³ It is expressed in relation to the results of other method(s) for GCN determination.

    (PDF)

    S3 Table. The address of the manufacturers of reagents and instruments.

    (PDF)

    S4 Table. Primers and hydrolysis probes used in the current study.

    All primers and probes were purified by HPLC. At least one of the primers of a primer pair is bound to an intronic sequence. Allele-specific sites are indicated on the sequences by underscore.

    (PDF)

    S5 Table. The concentrations of quantitative PCR reagents in different set-ups.

    The RNaseP copy number (CN) reference assay contains the RPPH1 internal reference gene. A reaction usually contained 10 ng genomic DNA, but the reactions for calibration curves also contained 2.5, 5, 20, 40 or 80 ng genomic DNA. TaqMan fast advanced master mix was used as qPCR reagent with AmpliTaq™ Fast DNA Polymerase. Additional Mg2+, additional dNTP and other additives were not added to the reactions. MicroAmp fast 96-well reaction plates (cat. no.: 4346907) were used for qPCR measurements.

    (PDF)

    S6 Table. Statistical metrics of the current study.

    m(gDNA)–mass of genomic DNA, Mnucl avr–average molar mass of nucleotide, Nnucl in hap gen–the number of nucleotide pairs in the haploid human genome, NA–Avogadro’s number, CV–coefficient of variation, GCN–gene copy number, mGCNrep–measured gene copy number of replicate, Cq–quantification cycle, REmGCN–relative error of measured GCN of replicate, eiGCN–estimated integer gene copy number, NRMSE–normalized root-mean-square error.

    (PDF)
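
How quantities of this kind combine can be sketched as follows. This is a minimal illustration, not the formulas of S6 Table: the genome size (3.1e9 base pairs) and the average molar mass per base pair (650 g/mol) are assumed round values, and all function names are hypothetical. The relative error is taken here relative to the estimated integer GCN, which is also an assumption.

```python
AVOGADRO = 6.02214076e23          # 1/mol
BP_PER_HAPLOID_GENOME = 3.1e9     # assumption: approximate haploid human genome size
MOLAR_MASS_PER_BP = 650.0         # assumption: average g/mol per nucleotide pair

def haploid_genome_copies(mass_ng):
    """Approximate number of haploid genome copies in a given mass of genomic DNA."""
    mass_g = mass_ng * 1e-9
    return mass_g * AVOGADRO / (BP_PER_HAPLOID_GENOME * MOLAR_MASS_PER_BP)

def template_copies(mass_ng, gcn_diploid):
    """Copies of a target element, given its gene copy number per diploid genome."""
    diploid_genomes = haploid_genome_copies(mass_ng) / 2.0
    return diploid_genomes * gcn_diploid

def relative_error_percent(measured_gcn, integer_gcn):
    """Relative error of a measured GCN; undefined (None) when the integer GCN is 0."""
    if integer_gcn == 0:
        return None
    return 100.0 * (measured_gcn - integer_gcn) / integer_gcn
```

Under these assumptions, 10 ng of genomic DNA corresponds to roughly three thousand haploid genome copies, and a measured GCN of 2.2 against an integer GCN of 2 gives a relative error of about 10%.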

    S7 Table. Parameters of analytical specificity assessed by in silico analysis, melting curve analysis and Agilent Bioanalyzer 2100 (micro-capillary electrophoresis).

    Non-specific PCR product was observed only with the primer pairs of C4A and C4B target genes.

    (PDF)

    S8 Table. Analyses of the slopes of calibration curves using a linear mixed-effect model.

    Fixed effects were calculated based on all calibration curves of target genes or the RPPH1 gene, and a random effect of assays or samples was separately modeled. Both fixed effects of the target and RPPH1 reference genes were calculated from all assays, which can be interpreted as the estimated average slopes. These were very close to a perfect value of -3.322, corresponding to a PCR efficiency of 1. Individual random effects are expressed as a standard deviation from the fixed effect.

    (PDF)
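
The link between a calibration curve slope of -3.322 and a PCR efficiency of 1 follows from the standard relation E = 10^(-1/slope) - 1. The sketch below uses ordinary least squares on a synthetic, ideal dilution series rather than the linear mixed-effect model of S8 Table; the Cq values are hypothetical and the function names are illustrative.

```python
import math

def fit_slope(log10_amount, cq):
    """Ordinary least-squares slope of Cq against log10(template amount)."""
    n = len(cq)
    mean_x = sum(log10_amount) / n
    mean_y = sum(cq) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_amount, cq))
    sxx = sum((x - mean_x) ** 2 for x in log10_amount)
    return sxy / sxx

def efficiency_from_slope(slope):
    """PCR efficiency from the calibration slope; 1.0 means perfect doubling per cycle."""
    return 10.0 ** (-1.0 / slope) - 1.0

# Dilution series as in the calibration curves (ng total DNA per measurement).
dna_ng = [2.5, 5, 10, 20, 40, 80]
log10_amount = [math.log10(m) for m in dna_ng]

# Hypothetical ideal Cq values: every doubling of template lowers Cq by one cycle.
cq_ideal = [30.0 - math.log2(m / 2.5) for m in dna_ng]

slope = fit_slope(log10_amount, cq_ideal)
efficiency = efficiency_from_slope(slope)
```

Because the template copy number is proportional to the DNA mass, regressing on log10 of the mass gives the same slope as regressing on log10 of Ntemp; only the intercept differs.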

    S9 Table. The PCR efficiencies and estimated limit of detection (LOD) of target genetic elements and RPPH1 reference genes of different qPCR assays in singleplex or duplex PCR reaction.

    The PCR efficiencies and estimated LOD were also assessed in the singleplex PCR reaction of each target or reference gene using the AI001 DNA sample. There were no significant differences between average PCR efficiencies from multiplex reactions (SW FDR: p = 0.857, ANOVA: p = 0.333, Tukey: p = 0.356–1.000 for target genes, SW FDR: p = 0.687–0.983, ANOVA: p = 0.604, Tukey: p = 0.627–1.000 for reference genes). LOD was estimated by the Hubaux-Vos method. SD—standard deviation, CI—confidence interval.

    (PDF)

    S10 Table. Precisions of target and RPPH1 reference genes in different qPCR assays for RCCX CNV.

    All precisions are calculated by pooled coefficient of variation (CV) and expressed as CV%. Repeatability and reproducibility with same and different dilutions were assessed in positive control samples. The measurements of replicates for reproducibility were performed on different days. Reproducibility in “good quality”, “population” and “bad quality” study groups was assessed in samples with GCNs higher than zero. Some tendencies might be observed; the repeatability values from the measurement of the same dilutions tended to be lower than those from measurement of different dilutions, and repeatability values tended to be lower than reproducibility values.

    (PDF)
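
One common way to pool coefficients of variation across samples is sketched below. It weights each sample's squared CV by its degrees of freedom; whether this matches the exact definition used in the study is an assumption, and the Cq values are hypothetical.

```python
import math
import statistics

def cv(values):
    """Coefficient of variation (sample standard deviation divided by the mean)."""
    return statistics.stdev(values) / statistics.mean(values)

def pooled_cv_percent(groups):
    """Pooled CV% over groups of replicates, weighted by degrees of freedom (n - 1)."""
    numerator = sum((len(g) - 1) * cv(g) ** 2 for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return 100.0 * math.sqrt(numerator / denominator)

# Hypothetical Cq replicates of two samples.
cq_groups = [
    [25.9, 26.0, 26.1],
    [24.9, 25.0, 25.1],
]
pooled = pooled_cv_percent(cq_groups)
```

With equal group sizes this reduces to the root mean square of the individual CVs.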

    S11 Table. Ambiguities in different qPCR assays for RCCX CNV.

    Measured GCNs within ±0.3 of an integer GCN were considered unambiguous.

    (PDF)
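
The ±0.3 ambiguity rule can be sketched as a small classification helper. The function name is hypothetical, and treating the threshold itself as unambiguous (an inclusive boundary) is an assumption not stated in the legend.

```python
def classify_gcn(measured_gcn, threshold=0.3):
    """Return (nearest integer GCN, is_ambiguous) for a measured GCN.

    A measurement is unambiguous when it lies within +/- threshold of the
    nearest integer. The boundary is treated as inclusive (assumption), with
    a small tolerance to absorb floating-point rounding.
    """
    nearest = max(0, round(measured_gcn))
    is_ambiguous = abs(measured_gcn - nearest) > threshold + 1e-9
    return nearest, is_ambiguous

# Hypothetical measured GCNs.
examples = [classify_gcn(v) for v in (2.12, 2.45, 0.05, 3.71)]
```

A measured GCN of 2.12 is assigned to 2 unambiguously, while 2.45 is nearest to 2 but flagged as ambiguous because it lies between the acceptance windows of 2 and 3.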

    S12 Table. Concordance in different qPCR assays for RCCX CNV.

    Concordance was assessed in the samples with unambiguous gene copy numbers (GCNs) compared to the GCNs from MLPA, Southern blot and array CGH or to the estimated integer GCNs. Percentages indicate the rate of correctly determined GCNs. Percentage is not calculated when n<9.

    (PDF)

    S13 Table. Precisions of MLPA probes for CAH.

    The peak heights of MLPA probes were determined by Coffalyser software with the default setting. All precisions are calculated by pooled coefficient of variation (CV) and expressed as CV%. The precisions (repeatability and reproducibility) were assessed with the same dilutions of positive control samples in the same way as performed in qPCR assays. Peak heights equaling zero were excluded from calculations.

    (PDF)

    S14 Table. Estimated ambiguity and misclassification rates of different qPCR assays for RCCX CNV at different gene copy numbers (GCNs).

    Estimations were made based on the means and variances of average relative errors of different GCNs. Estimations were not performed for GCNs with fewer than 5 samples.

    (PDF)

    S15 Table. Precisions of target and RPPH1 reference genes in CYP21A1P and CYP21A2 qPCR assays for robustness.

    The primers of CYP21A1P and CYP21A2 genes were tested in a LightCycler 1.0 instrument with Sybr Green dye, and several qPCR parameters such as total volume, annealing temperature, primer concentration, probe concentration, qPCR reagent (UMM2) and qPCR instrument (7500F) were changed in the FAMM-GS7 system usually used for the current study. All precisions are calculated by pooled coefficient of variation (CV) and expressed as CV%. Repeatability and reproducibility were assessed in positive control samples from the same dilution. FAMM–TaqMan fast advanced master mix, UMM2–TaqMan universal master mix II, GS7–GeneStudio 7 qPCR instrument, 7500F - 7500 Fast qPCR instrument.

    (PDF)

    S16 Table. The PCR efficiencies of CYP21A1P and CYP21A2 target and RPPH1 reference genes for robustness.

    SD—standard deviation, CI—confidence interval. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (PDF)

    S17 Table. Ambiguities in CYP21A1P and CYP21A2 qPCR assays for robustness.

    UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (PDF)

    S18 Table. Misclassifications in CYP21A1P and CYP21A2 qPCR assays for robustness.

    Misclassification was assessed in the samples with unambiguous GCNs compared to estimated integer GCNs. Percentage is not calculated for n<9. UMM2–TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (PDF)

    S19 Table. Statistical tests for the variances of average relative errors of samples in CYP21A1P and CYP21A2 qPCR assays for robustness.

    The variances of average relative errors of samples in the “good quality” and “population” study groups of the assays with UMM2 and 7500F were significantly different. The top left cell of the table contains the test results for all four groups; the other cells contain the results between pairs after multiple testing correction by the false discovery rate method. UMM2–TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (PDF)

    S20 Table. Estimated ambiguity and misclassification rates of CYP21A1P and CYP21A2 qPCR assays for robustness at different gene copy numbers (GCNs).

    Estimations were made based on the means and standard deviation of average relative errors. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (PDF)

    S21 Table. Correlation between reproducibility values (coefficients of variation (CVs)) of target Cqs of samples, reproducibility values of the reference Cqs of samples, normalized root-mean-square errors (NRMSEs) of the target Cqs of samples, and accuracy (the average relative errors of samples) in “good quality” and “population” study groups of CYP21A1P and CYP21A2 qPCR assays for robustness and RCCX CNV.

    Pooled CV of target and reference gene (reproducibility or inter-assay precision), pooled NRMSD of target Cqs characterizing the normalization of Cqs of target genes by Cqs of the reference gene, and the standard deviation of the average relative error used for the estimation of ambiguity and misclassification rates are given for information purposes. Correlation was assessed by Spearman’s correlation. UMM2—TaqMan universal master mix II, 7500F - 7500 Fast qPCR instrument.

    (PDF)

    S22 Table. Correlation between reproducibility values (coefficients of variation (CVs)) of target Cqs of samples, reproducibility values of the reference Cqs of samples, normalized root-mean-square errors (NRMSEs) of the target Cqs of samples and accuracy (the average relative errors of samples) in “good quality” and “population” study groups of C4A, C4B, HERV-K(C4) CNV deletion, HERV-K(C4) CNV insertion and RCCX CNV breakpoint qPCR assays.

    Pooled CV of target and reference gene (reproducibility or interassay precision), pooled NRMSD of target Cqs characterizing the normalization of Cqs of target genes by Cqs of reference gene, and the standard deviation of average relative error used for the estimation of ambiguity and misclassification rates are given for information purposes. Correlation was assessed by Spearman’s correlation.

    (PDF)

    S1 File. Sheet 1: MIQE checklist.

    All essential information (E) and desirable information (D) are submitted in the current manuscript.

    Sheet 2: Data of DNA stock solutions. h–healthy, nfai–non-functioning adrenal incidentaloma, sv–simple virilizing congenital adrenal hyperplasia, sw–salt wasting congenital adrenal hyperplasia; g–good quality, p–population, b–bad quality; Q–Qiagen QIAcube with QIAamp DNA blood mini kit, R–Roche DNA isolation kit for mammalian blood, 5P - 5 Prime ArchivePure DNA cell/tissue kit; i–intact, sd–slightly degraded, pd–partially degraded.

    Sheet 3: Raw Cq values of the main qPCR experiments. Outliers were not identified using the quartile ± 1.5 * interquartile range criterion. NTC–no template control.

    Sheet 4: Raw peak heights of MLPA probes. Coffalyser software with the default settings was used for the determination.

    Sheet 5: Raw dosage quotient of CYP21A1P and CYP21A2 MLPA probes. Coffalyser software with the default settings was used for the determination.

    Sheet 6: Raw Cq values for the repeatability and reproducibility values of the robustness experiments. Outliers were not identified using the quartile ± 1.5 * interquartile range criterion. NTC–no template control; FAMM–TaqMan fast advanced master mix, UMM2–TaqMan universal master mix II, GS7–GeneStudio 7 qPCR instrument, 7500F - 7500 Fast qPCR instrument.

    Sheet 7: Raw Cq values for the detailed CYP21 robustness experiments. Outliers were not identified using the quartile ± 1.5 * interquartile range criterion. NTC–no template control; FAMM–TaqMan fast advanced master mix, UMM2–TaqMan universal master mix II, GS7–GeneStudio 7 qPCR instrument, 7500F - 7500 Fast qPCR instrument.

    Sheet 8: Detailed gene copy number results. GCN–gene copy number, avr–average, SD–standard deviation, lwr–lower, upr–upper, CI–95% confidence interval, CV%–coefficient of variation %, DQ–dosage quotient, CGH–comparative genome hybridization; wd–with dilution, wod–without dilution; g–good quality, p–population, b–bad quality; orange number or amb–ambiguous result, red number–misclassification, orange background–not all MLPA probes were used for a CYP21 gene due to chimeric gene, red background–MLPA was inconclusive probably due to double chimeric genes.

    Sheet 9: Expected gene copy numbers. GCN–gene copy number, avr–average, exp–expected, dev–deviation, mod–modified, SD–standard deviation, lwr–lower, upr–upper, CI–95% confidence interval, CV%–coefficient of variation %, DQ–dosage quotient, CGH–comparative genome hybridization; g–good quality, p–population, b–bad quality; orange number–ambiguous result, red number–misclassification at GCN or a deviation above 0.4 at avr dev GCNs or a misclassification at rounded GCNs.

    Sheet 10: Detailed results of linear discriminant analyses. The estimated total integer gene copy number (GCN) differed from the input class in sample NA11839 (the only sample having inconsistency in its input classes) and in sample H005, which has a relatively low probability value of classification. Cross-validation, in which the integer GCN of one sample is estimated without its input class and based on the data of all other samples, supported the estimation in the former case, and led to a different classification in the latter one. A deviated classification of C4 genes was also observed by cross-validation in sample NA11839. The cross-validation of CYP21 genes could not be performed in two samples (H004 and H010) with the rare 3 GCN of CYP21A2, because their classes consisted of only one sample. avr–average, P–probability, red number–inconsistent input class, orange background–low probability or difference between input, resultant or cross-validation class, yellow background–cross-validation cannot be applied, red background–ambiguous estimated integer GCN.

    (XLSX)
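
The quartile ± 1.5 * interquartile range criterion mentioned for the raw Cq sheets can be sketched as below. The quartile convention (the "inclusive" method of Python's `statistics.quantiles`) is an assumption, since the source does not state which one was used, and the Cq values are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical Cq values with one clearly deviating measurement.
cq_values = [25.8, 25.9, 26.0, 26.0, 26.1, 26.2, 29.5]
flagged = iqr_outliers(cq_values)
```

Different quartile conventions can shift the fences slightly for small samples, so borderline values may be flagged under one convention and not under another.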

    Data Availability Statement

    All relevant data are within the manuscript and its Supporting Information files. The minimal data set will also be available in the Zenodo database (DOI: 10.5281/zenodo.6780358) after the publication of the current paper.


    Articles from PLOS ONE are provided here courtesy of PLOS
