PLOS One. 2020 Aug 26;15(8):e0238245. doi: 10.1371/journal.pone.0238245

Non-invasive prenatal testing (NIPT) by low coverage genomic sequencing: Detection limits of screened chromosomal microdeletions

Marcel Kucharik 1,2, Andrej Gnip 3,4, Michaela Hyblova 3,4, Jaroslav Budis 1,2,5, Lucia Strieskova 1, Maria Harsanyova 1,6, Ondrej Pös 1,6, Zuzana Kubiritova 1,6,7,*, Jan Radvanszky 1,7, Gabriel Minarik 3,4, Tomas Szemes 1,2,6
Editor: Kelvin Yuen Kwong Chan8
PMCID: PMC7449492  PMID: 32845907

Abstract

We studied the detection limits of chromosomal microaberrations in non-invasive prenatal testing, focusing on five target microdeletion syndromes: DiGeorge, Prader-Willi/Angelman, 1p36, Cri-Du-Chat, and Wolf-Hirschhorn syndromes. We used known cases of pathogenic deletions from the ISCA database to define the regions critical for the target syndromes. Our approach to detecting microdeletions from whole-genome sequencing data is based on sample normalization and read counting for individual bins. We performed both an in-silico study using artificially created data sets and a laboratory test on mixed DNA samples with known microdeletions to assess the sensitivity of prediction for varying fetal fractions, deletion lengths, and sequencing read counts. The in-silico study showed a sensitivity of 79.3% for 10% fetal fraction with 20M read count, which further increased to 98.4% if we searched only for deletions longer than 3Mb. The test on laboratory-prepared mixed samples was in agreement with the in-silico results: we correctly detected 24 out of 29 control samples. Our results suggest that it is possible to incorporate microaberration detection into basic NIPT as part of the offered screening/diagnostics procedure; however, accuracy and reliability depend on several specific factors.

Introduction

In recent years, prenatal care has sought to determine the genetic background of a fetus while reducing the risk of miscarriage. One of the most important milestones in this field was the discovery of cell-free fetal DNA (cffDNA) in the plasma of pregnant women [1]. It led to the emergence of a new basic and applied research field of non-invasive prenatal testing (NIPT) [2]. Several commercial companies, e.g. Sequenom, BGI, Illumina, Natera, Roche, and LifeCodexx, use the results of this research in their services focused on NIPT of T13, T18, T21, and sex chromosome aneuploidies. Furthermore, they are broadening their product portfolios with tests for the detection of the most common chromosomal microdeletions. Such tests can be used to identify pregnancies with a high risk of DiGeorge syndrome, 1p36 syndrome, Cri-Du-Chat syndrome, Prader-Willi/Angelman syndrome, and Wolf-Hirschhorn syndrome. However, validation of methods for the detection of microdeletions is very limited. While the sensitivity and specificity of detection of the most common trisomies and sex chromosome aneuploidies have been published in large meta-analyses [3–5], reports of these important test parameters for the above-mentioned microdeletions are very scarce [6–8]. Furthermore, the sensitivity for different input parameters, such as fetal fraction and aberration size, has not been evaluated, so the importance of each parameter has not been quantified. This is due to the very low prevalence of such syndromes, which results in very limited clinical validation data sets. Therefore, the currently available proof-of-principle studies on the detection of chromosomal microdeletions, and the corresponding validation studies, relied on analyses of artificial data. These included samples prepared either by in-silico manipulation of massively parallel sequencing data, or by mixing normal and well-defined microdeletion-positive DNA samples, with subsequent testing on a few real clinical samples [6,9].
The sensitivity and specificity of detecting microdeletion syndromes, as with the most common trisomies, were found to depend strongly on technical and biological parameters of the sample and of the test itself. Coverage of the target region, fetal fraction, and the size and position of the deletions were identified as the most prominent factors [9].

Recent years have yielded a handful of tools for CNV detection from (shallow) whole-genome sequencing data, some even with comparisons between the available tools [10]. However, none of these works has thoroughly studied the importance of technical and biological parameters (such as sequencing depth, fetal fraction, or variation length) for prediction accuracy. All the tools for CNV detection follow a similar scheme: first they bin the reads and work only with bin counts, then they apply some form of normalization and noise correction, and lastly they segment the normalized signal and call variations. Notable CNV detection tools include Wisecondor X [10] (the successor of the Wisecondor [11] tool), CNVkit [12], CNVnator [13], and iCopyDav [14].

In the present work, we tested detection limits for different sizes and positions of known microdeletions causing clinically relevant syndromes using an in-house CNV detection tool (available at https://github.com/marcelTBI/CNV_data). We identified syndrome-specific critical regions ranging from 1.9 Mb to 21 Mb in pooled data from the public database of the International Standards for Cytogenomic Arrays (ISCA) Consortium [15]. Subsequently, we estimated the sensitivity of detection in artificially prepared data mimicking the sizes and positions of pathogenic deletions. The experiment was split into two parts. The first analyses were performed on data prepared by artificially “spiking in” reads into sequencing data from physiological pregnancies. In the second experiment, analyses were performed in a blinded manner on artificially prepared sample mix-ups, with defined proportions of normal DNA and control DNA containing well-defined microdeletions, mimicking different fetal fractions ranging from 5% to 20%.

Material and methods

The study has been approved by the Ethical Committee of the Bratislava Self-Governing Region (Sabinovska ul.16, 820 05 Bratislava) on 30 April 2015 under the decision ID 03899_2015. In each relevant case written informed consents consistent with the Helsinki declaration were obtained.

Retrieval of pathogenic regions

The ISCA database [15] was searched for deletions located in genomic regions associated with five selected microdeletion syndromes (22q11.2 –DiGeorge syndrome, 4p16.3 –Wolf-Hirschhorn syndrome, 15q11 –Angelman/Prader-Willi syndrome, 5p15 –Cri-Du-Chat syndrome, 1p36 – 1p36 deletion syndrome), using the hg19/GRCh37 version of the human reference genome. Coverage was calculated as the number of deletions which included the given nucleotide position, separately for pathogenic (or likely pathogenic) and benign (or likely benign) deletions.
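
The per-position coverage described above can be sketched with a simple sweep over interval end points. This is only an illustration, not the authors' implementation: the function names and the toy intervals are hypothetical, and intervals are assumed to be 1-based and inclusive.

```python
from collections import defaultdict

def coverage_profile(deletions):
    """Return sorted (position, coverage) breakpoints for inclusive intervals.

    Each pair means: from `position` up to the next breakpoint, exactly
    `coverage` deletions include every nucleotide.
    """
    delta = defaultdict(int)
    for start, end in deletions:
        delta[start] += 1      # deletion starts covering at `start`
        delta[end + 1] -= 1    # and stops covering right after `end`
    profile, running = [], 0
    for pos in sorted(delta):
        running += delta[pos]
        profile.append((pos, running))
    return profile

def coverage_at(profile, position):
    """Look up the coverage of a single nucleotide position."""
    current = 0
    for pos, cov in profile:
        if pos > position:
            break
        current = cov
    return current

# Hypothetical toy intervals; these are not ISCA records.
pathogenic = [(100, 500), (200, 600), (250, 400)]
profile = coverage_profile(pathogenic)
print(coverage_at(profile, 300))
```

In practice the same profile would be built twice, once for pathogenic (or likely pathogenic) and once for benign (or likely benign) deletions.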

Initially, a coverage of 45 or more pathogenic deletions was used as a cut-off to specify the boundaries of the pathogenic regions. Afterwards, the regions were checked for outliers (pathogenic deletions located entirely outside the given region), and the coverage cut-off was gradually decreased until the pathogenic region was defined in such a way that there were no outliers (in the case of Wolf-Hirschhorn, Angelman/Prader-Willi, and Cri-Du-Chat syndromes), or that the remaining outliers were considered not relevant (in the case of DiGeorge and 1p36 deletion syndromes). The pathogenic region for Wolf-Hirschhorn syndrome defined by a coverage cut-off of 45 was the only one that did not have any outliers initially (Fig 3). The region for DiGeorge syndrome defined by the same cut-off had a considerable number of outliers located towards the 3’-end, but these corresponded to another well-described syndrome, known as 22q11.2 distal deletion syndrome [16]. Therefore, no further adjustment of the region boundaries was needed. To eliminate outliers, the coverage cut-off was reduced to 25 for Angelman/Prader-Willi syndrome, to 28 for Cri-Du-Chat syndrome, and to 3 for 1p36 deletion syndrome. There were still considerable numbers of overhangs from the defined pathogenic regions (pathogenic deletions extending beyond the regions), but as long as there were no pathogenic deletions located entirely outside of them, it was assumed that the defined regions cover the genes responsible for the syndromes. The two outliers found in the 1p36 genomic region (with positions chr1:27,133,503–28,011,702 and chr1:27,927,633–28,215,952 on the hg19 human genome assembly) were considered a distinct entity from the deletions responsible for the 1p36 deletion syndrome and were thus excluded from the analysis.
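
The cut-off lowering procedure can be illustrated with a toy sketch. It assumes the region is taken as the span from the first to the last position that reaches the cut-off, and that deletions judged not relevant are passed in explicitly; both are assumptions of this example, as are the function names and data. The per-position counting is only practical at toy scale.

```python
def covered_positions(deletions, cutoff):
    """Positions included in at least `cutoff` deletions (toy scale only)."""
    counts = {}
    for start, end in deletions:
        for pos in range(start, end + 1):
            counts[pos] = counts.get(pos, 0) + 1
    return {pos for pos, n in counts.items() if n >= cutoff}

def region_without_outliers(deletions, start_cutoff, ignore=()):
    """Lower the cut-off until no deletion lies entirely outside the region.

    Returns (cutoff, (region_start, region_end)), or None if nothing fits.
    Deletions listed in `ignore` are not counted as outliers.
    """
    for cutoff in range(start_cutoff, 0, -1):
        kept = covered_positions(deletions, cutoff)
        if not kept:
            continue
        lo, hi = min(kept), max(kept)
        outliers = [d for d in deletions
                    if d not in ignore and (d[1] < lo or d[0] > hi)]
        if not outliers:
            return cutoff, (lo, hi)
    return None

# Hypothetical toy deletions (inclusive coordinates).
deletions = [(10, 50), (20, 60), (30, 70), (65, 90)]
print(region_without_outliers(deletions, start_cutoff=3))
print(region_without_outliers(deletions, start_cutoff=3, ignore=[(65, 90)]))
```

With the toy data, the deletion (65, 90) lies entirely outside the region defined at cut-off 3, so the cut-off drops to 2 unless that deletion is explicitly treated as not relevant.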

Fig 3. Coverage plot representing genomic positions of critical regions of the five most common microdeletion syndromes.

Coverage of pathogenic and likely pathogenic deletions is denoted by blue, while coverage of benign and likely benign microdeletions is in green. The critical region is visualized in red.

Three pathogenic deletions in the 22q11.2 genomic region overlapped the DiGeorge syndrome critical region by at most 20kb, and in all three cases the overlap was located at the 3’-end of the critical region. These were assumed to be related to the 22q11.2 distal deletion syndrome and were excluded from further analysis. The final set of microdeletions is summarized in Table 1. In total, we used 533 microdeletions for the evaluation of our methods.

Table 1. Counts of microdeletions from the ISCA database.

Deletion size
Syndrome 0-1Mb 1-2Mb 2-3Mb 3-4Mb 4-5Mb 5-10Mb 10-40Mb
1p36 deletion 8 20 15 13 12 27 5
Wolf-Hirschhorn 0 5 5 12 5 11 10
Cri-du-chat 0 2 2 3 4 8 31
Angelman/Prader-Willi 6 0 0 1 48 34 1
DiGeorge 12 22 205 6 0 0 0

Preparation of artificial NIPT data sets

Sequencing data from healthy NIPT samples can be used to identify the limits of microaberration detection. Reads of such samples are binned into equal-size bins according to the read start position. We used a bin size of 20kb, although a bin size of 50kb was used in a previous study [9].
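
As a minimal sketch of this binning step, each read can be assigned to a bin by integer division of its start position. The input format (pairs of chromosome name and 0-based start) and the function name are assumptions made for illustration.

```python
from collections import Counter

BIN_SIZE = 20_000  # 20kb bins, as used in the text

def bin_reads(read_starts, bin_size=BIN_SIZE):
    """Count reads per (chromosome, bin index), assigning each read by its start."""
    counts = Counter()
    for chrom, start in read_starts:
        counts[(chrom, start // bin_size)] += 1
    return counts

# Hypothetical 0-based read start positions.
reads = [("chr5", 10), ("chr5", 19_999), ("chr5", 20_000), ("chr22", 45_123)]
counts = bin_reads(reads)
print(counts[("chr5", 0)], counts[("chr5", 1)], counts[("chr22", 2)])
```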

Artificial data sets mimicking aberrated samples were created from data sets of healthy samples by multiplying the bins corresponding to the pathogenic regions. The multiplication coefficients were determined by a target fetal fraction. Specifically, for a target fetal fraction ff, all bins between the start and the end of the simulated pathogenic region were multiplied by (1-ff/2) to simulate a chromosomal microdeletion or by (1+ff/2) to simulate a chromosomal microduplication. Since the fetal fraction is known not to be constant throughout the whole genome [17], we applied the multiplication after both normalization procedures that correct for differences within a sample, such as GC content bias and differences between different parts of the genome.
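
The coefficients follow from the fact that the fetus contributes a fraction ff of the reads and a heterozygous fetal deletion removes one of its two copies, so the affected bins lose ff/2 of their signal. A minimal sketch, assuming a plain list of already normalized bin values and hypothetical bin indices:

```python
def simulate_aberration(bin_values, first_bin, last_bin, ff, kind="deletion"):
    """Scale bins first_bin..last_bin (inclusive) as a fetal CNV would.

    ff is the target fetal fraction, e.g. 0.10 for 10%.
    """
    if kind == "deletion":
        factor = 1 - ff / 2
    elif kind == "duplication":
        factor = 1 + ff / 2
    else:
        raise ValueError("kind must be 'deletion' or 'duplication'")
    out = list(bin_values)  # leave the healthy sample untouched
    for i in range(first_bin, last_bin + 1):
        out[i] *= factor
    return out

# Hypothetical flat profile of six bins; bins 2..4 carry the simulated deletion.
normalized = [100.0] * 6
simulated = simulate_aberration(normalized, 2, 4, ff=0.10)
print(simulated)
```

At a 10% fetal fraction the simulated deletion lowers the affected bins by only 5%, which is why sequencing depth and deletion length matter so much for detection.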

We created an artificially aberrated data set for each combination of NIPT data set and pathogenic region (identified in the previous section), with either a varying fetal fraction in [0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2] or a varying read count from 5M to 20M with a step of 1M. All artificial data sets were evaluated by the same algorithm, described below.

Identification of microaberrations

To identify microaberrations, we employed an approach similar to that described by Zhao et al. [9]. Briefly, we thoroughly normalized the bin counts and then used the circular binary segmentation algorithm on these bin counts to identify consistent segments of same coverage. The segments were then evaluated to determine their significance. More details are in later sections. Significant deviations are visualized for individual chromosomes (Fig 1) and for the target syndromes.

Fig 1. Visualization of segment significance.

Bin read counts are depicted as gray dots, segments and their significance as colored horizontal lines. Here, we can see an approximately 16.2 Mb long fetal deletion covering the whole Cri-Du-Chat region and a small deletion in the p14.1 region, both depicted in red. However, a possibly fetal deletion depicted in yellow between them may suggest that we are dealing with only one long deletion. The light grey vertical band depicts an unmappable region around the centromere; black bands signify bins that did not pass filtration for healthy characteristics (see Normalization and filtering in the Material and methods section) and are excluded from the analysis. The light red band emphasizes the detected deletion and the pathogenic region. The approximated Z-score of the deletion is displayed over the reported segments. The estimated level of aberration based on the fetal fraction of this sample (12.6%) is visualized as a red dashed line for a fetal aberration and a magenta dashed line for a maternal aberration. This is one of the very few real NIPT samples with a microaberration available to us.

Normalization and filtering

To obtain bin counts adjusted for intra- and inter- genomic differences we normalized and filtered the bins by a four-step procedure:

  1. LOESS-based GC correction for 20kb bins similar to the one described by Alkan et al. [18]

  2. PCA normalization to remove higher-order population artifacts on autosomal chromosomes (similar to [9])

  3. filtering of bins with low mean and high variance in bin count

  4. subtracting a per-bin mean read count to obtain data normalized around zero

The data used for training both PCA normalization and per-bin mean normalization is available at https://github.com/marcelTBI/CNV_data as bin counts for individual samples (20kb bin size). We used 449 samples for training of PCA normalization (at least 10M reads per sample) and 1238 samples for mean normalization (at least 7M reads per sample). All training samples were collected as standard NIPT samples and checked to be genetically healthy and singleton pregnancies (we include both female and male fetus samples, since we use PCA normalization on autosomal chromosomes only).

In PCA normalization, the LOESS-normalized sample bin count matrix was transformed into principal space and the first 15 principal components were stored. The first principal components represent common noise in euploid samples [9], so to normalize a sample, we projected its bin counts onto all but the first 15 principal components and thus obtained bin counts free of this bias.

Then, only bins with healthy characteristics were retained, that is, bins with neither a very low mean nor a very high variance. If the read coverage were uniform, there would be around 7 reads per bin when the sample is normalized to 1M reads. We filtered out bins that had a mean read count of less than 3.0 (less than half of the ideal mean) or a variance higher than 1.5. These numbers were selected manually to ensure that problematic bins would be filtered out while still keeping around 88% of the bins. Note that this filtering drops all unmappable regions of the genome (centromeres, etc.).
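
The filtering rule can be sketched as follows. The thresholds are those stated in the text; the input layout (one list of bin counts per training sample, already rescaled to 1M reads), the use of the population variance, and the function name are assumptions of this example.

```python
from statistics import mean, pvariance

MIN_MEAN = 3.0       # bins with a lower mean read count are dropped
MAX_VARIANCE = 1.5   # bins with a higher variance are dropped

def healthy_bins(training_counts, min_mean=MIN_MEAN, max_variance=MAX_VARIANCE):
    """Return the indices of bins to keep.

    training_counts[s][b] is the count of bin b in training sample s.
    """
    n_bins = len(training_counts[0])
    keep = []
    for b in range(n_bins):
        column = [sample[b] for sample in training_counts]
        if mean(column) >= min_mean and pvariance(column) <= max_variance:
            keep.append(b)
    return keep

# Hypothetical counts for three training samples and four bins.
training = [
    [7.0, 1.0, 7.0, 6.5],
    [6.8, 0.5, 12.0, 7.1],
    [7.2, 1.5, 2.0, 6.9],
]
kept = healthy_bins(training)
print(kept)
```

In the toy data, bin 1 is dropped for its low mean and bin 2 for its high variance.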

Finally, a per-bin mean read count was subtracted to obtain data normalized around zero. The autosomal bin counts of the training samples were first rescaled to contain the same number of reads. Since the fetal fraction and fetal sex affect the bins of the sex chromosomes, these were rescaled independently of the autosomal chromosomes.

Segment identification and CNV calling

We used the circular binary segmentation (CBS) algorithm provided by the R package “DNAcopy” [19] to identify same-coverage segments. CBS partitions a chromosome into regions with equal copy numbers, so it can locate change points quite precisely. However, the algorithm tends to over-partition a chromosome. Thus, we used a simple rule to determine the significance of a segment. In the ideal case, a deletion/duplication on a fetal chromosome decreases/increases the bin count by mb*ff/2, where ff is the fetal fraction and mb is the mean bin count. Since this is a crude simplification, and because of uncertainties in the fetal fraction estimation [17,20] and the non-uniform distribution of fetal fraction across the genome [17], we mark as significant all segments that exceed 75% of this theoretical increase/decrease. This percentage can be varied and represents a trade-off between sensitivity and specificity of prediction. A segment was categorized as a maternal aberration if it exceeded 75% of the maternal level mb*(1-ff)/2. When the fetal fraction is close to 50%, we cannot determine whether we are dealing with a maternal or a fetal aberration, but such fetal fractions are extremely rare. Furthermore, this approach is theoretically able to distinguish between maternal and fetomaternal variants, but the distinction is very unreliable because the method is focused on fetal variants. Thus, in the following text we label both fetomaternal and maternal aberrations simply as “maternal”. The minimal length of a segment categorized as significant was set to 200,000 bases for maternal and 600,000 bases for fetal detections.
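
The significance rule can be expressed as a small classifier applied to each segment returned by the segmentation step. This is a sketch under stated assumptions rather than the authors' code: the function name, the order of the checks (maternal level first), and the example values of mb are hypothetical; the 75% share, the two reference levels, and the minimal lengths come from the text.

```python
FETAL_MIN_LEN = 600_000      # minimal length of a fetal detection, in bases
MATERNAL_MIN_LEN = 200_000   # minimal length of a maternal detection, in bases
THRESHOLD_SHARE = 0.75       # share of the theoretical level that must be exceeded

def classify_segment(level, length, ff, mb, share=THRESHOLD_SHARE):
    """Classify a segment by its mean deviation from zero.

    level  -- mean normalized bin count of the segment (0 means no change)
    length -- segment length in bases
    ff     -- fetal fraction, e.g. 0.10
    mb     -- mean bin count of the sample
    """
    fetal_level = mb * ff / 2
    maternal_level = mb * (1 - ff) / 2
    kind = "deletion" if level < 0 else "duplication"
    size = abs(level)
    if size >= share * maternal_level and length >= MATERNAL_MIN_LEN:
        return "maternal " + kind
    if size >= share * fetal_level and length >= FETAL_MIN_LEN:
        return "fetal " + kind
    return "not significant"

# Hypothetical sample: 12.6% fetal fraction, mean bin count of 140.
print(classify_segment(-8.0, 16_200_000, ff=0.126, mb=140))
print(classify_segment(-60.0, 300_000, ff=0.126, mb=140))
print(classify_segment(-8.0, 400_000, ff=0.126, mb=140))
```

As ff approaches 50%, the two reference levels coincide and the two classes can no longer be separated, which matches the limitation noted above.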

The presented method needs at least an estimate of the fetal fraction to be able to categorize the segments properly. However, when the fetal fraction is unknown, a small value (5%) can be used to categorize all found deviations from normal as significant, at the cost of an increase in false positives. We found 5% to be the lowest fetal fraction at which at least long aberrations can be detected reliably; CNV detection in samples with a lower fetal fraction is thus not recommended.

Preparation of control DNA sample mixes

The control DNA sample mixes were prepared as mix-ups of healthy female plasma DNA and DNA from affected males with a confirmed microdeletion syndrome, in different ratios to simulate different fetal fractions. Genomic DNA from clinically affected male probands with a confirmed microdeletion syndrome was acquired from the Coriell Repository biobank (Table A in S1 Text). Two additional anonymized male DiGeorge samples were donated by the Clinic of Genetics at the University Hospital in Bratislava. The male genomic DNA was fragmented with dsDNA shearase according to the manufacturer’s protocol in order to obtain fragments shorter than 500bp. Plasma DNA was obtained from non-pregnant female volunteers to serve as the “maternal” part of the artificial mix-up. Sequencing libraries were then prepared and quantified separately from non-pregnant plasma and from sheared genomic DNA, with a PCR-free modification of our previously described method [21]. Libraries were prepared using the TruSeq Nano DNA kit (Illumina).

Affected male genomic libraries were then mixed with healthy female plasma libraries to create artificial libraries aiming for different fetal fractions between 5% and 20%. Massively parallel sequencing on the NextSeq (Illumina) platform was then performed, targeting 20M uniquely mapped paired-end reads (2x35bp) per sample. This number of reads (instead of the more commonly used 10M) was selected to improve sensitivity while keeping sequencing costs reasonable. The fetal fraction was measured from the reads mapped to the Y chromosome. Fig 2 and Table A in S1 Text summarize the control DNA sample mixes.

Fig 2. Control samples and their detection accuracy.

Y axis shows the deletion sizes, X axis the fetal fraction, the color of the point determines its read count. Different syndromes are plotted with different shapes. Detection range was plotted based on all control sample mixups. More detailed data is in the Table A in S1 Text.

Results

Identification of critical regions of chromosomal microdeletion associated syndromes

For the five selected syndromes we identified 570 records corresponding to samples with pathogenic phenotypes and 245 records for benign phenotypes in the ISCA database. These records were used for determination of critical regions. For further analyses, we filtered out outliers and kept 533 regions corresponding to the five selected syndromes. Coverage plots, results of determination of sizes, and positions of all microdeletions associated with syndromes of interest are summarized in Table 2 and visualized in Fig 3. Comparison of studied syndromes in ISCA and DECIPHER databases [22] at the time of our study is available in Table B in S1 Text.

Table 2. Size and positions of the pathogenic regions for target syndromes (hg19).

Syndrome Chromosome Start position End position
1p36 deletion 1 564,424 21,598,492
Wolf-Hirschhorn 4 85,040 2,010,761
Cri-du-chat 5 1 15,678,560
Angelman/Prader-Willi 15 22,779,922 28,559,437
DiGeorge 22 18,661,724 21,505,417

Analyses of in-silico prepared artificial data

To test the sensitivity of detection, we tested the algorithm on 200 different NIPT data sets and on 533 pathogenic regions from the ISCA database. In the first analysis, we kept the read count fixed to 20 million (20M) and the fetal fraction varied from 5% to 20% with a step of 2.5%. The second analysis fixed the fetal fraction to 10% and varied the read count from 5M to 20M with a step of 1M.

Detection accuracy for different fetal fractions

Fetal fraction and size of the deletion were previously reported to be the most crucial factors in detection of chromosomal microdeletions [9]. Sensitivity and specificity calculations for different fetal fractions and sizes of microdeletions at fixed read count to 20M reads per sample were performed (Figs 4 and 5). Data for particular syndromes at 10M reads can be found in the S1 Text.

Fig 4. Sensitivity of the prediction for different fetal fraction and microdeletion size.

Read count was set to 20M in each sample.

Fig 5. Specificity of the prediction for different fetal fraction and microdeletion size.

Read count was set to 20M in each sample.

Out of the 746,200 (533x200x7) simulations carried out, the simulated syndrome was correctly predicted in 571,255 cases (sensitivity = 76.6%). Furthermore, this sensitivity increased to 99.6% if the fetal fraction was at least 10% and the size of the deletion was at least 3Mb. The very poor sensitivity in the 0-1Mb range is caused by our strict filtering of short detections (see Material and methods), which, as a trade-off, increases specificity.
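
The reported figures can be checked with a few lines of arithmetic. Every simulated case contains a deletion, so the sensitivity is simply the share of simulations in which the syndrome was detected; the variable names are illustrative.

```python
def sensitivity(detected, total_positive_cases):
    """Share of truly aberrated cases that were correctly detected."""
    return detected / total_positive_cases

n_regions, n_samples, n_fetal_fractions = 533, 200, 7
total = n_regions * n_samples * n_fetal_fractions
detected = 571_255

print(total)
print(round(100 * sensitivity(detected, total), 1))
```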

In Figs A-F in S1 Text, we report sensitivity for a case, when critical regions are enlarged by 2Mb on both sides. However, there is no significant increase of sensitivity.

Detection accuracy for different read counts

Next analysis focused on estimation of sensitivity of microdeletion detection at different levels of read counts per sample. Fetal fraction was fixed to 10% in this case. Results from calculations for read counts ranging from 5M to 20M of reads per sample are presented in Figs 6 and 7.

Fig 6. Sensitivity of the prediction for different read count and microdeletion size.

Fetal fraction was set to 10% for all samples.

Fig 7. Specificity of the prediction for different read count and microdeletion size.

Fetal fraction was set to 10% for all samples.

Out of the 1,705,600 (533x200x16) simulations carried out, the simulated syndrome was correctly predicted in 937,335 cases (sensitivity = 55.0%). Furthermore, this sensitivity increased to 97.1% if the read count was at least 15M and the size of the deletion was at least 3Mb. Figures for particular syndromes can be found in the S1 Text.

Validation on control samples

Out of 29 tested DNA samples, created by mixing DNA samples from physiological pregnancies and microdeletion-positive control samples, we were able to correctly detect 24 (Fig 2 and Table A in S1 Text). All 5 undetected samples had DiGeorge syndrome, a fetal fraction below 10%, and a deletion size below 3Mb. Surprisingly, the read count seems to have a minimal effect on prediction accuracy; however, we cannot rule out the possibility that the limited number of data points skewed this finding.

Our main goal is to evaluate how the accuracy of prediction is dependent on the parameters like fetal fraction and aberration length. Since the real microaberration samples are very scarce, we were forced to use these “simulated” data mixed in the lab from two different samples to be able to, at least partially, control parameters like fetal fraction and aberration length. By using these mixed samples, we introduce a new kind of bias since we are using DNA from two different individuals and also we slightly deviate from standard NIPT protocol. However, the data seem to be consistent with the very few real microaberration samples we encountered in NIPT (data not included).

Discussion

Following confirmation of the presence of cffDNA in maternal blood and its use for fetal sex determination [1], the first NIPT applications for common aneuploidies were introduced [23] and quickly became widely used in prenatal care worldwide [24]. According to the updated statement of the American College of Medical Genetics and Genomics [25], there is strong evidence that NIPT can replace conventional screening for Down, Edwards, and Patau syndromes, as it can be performed from the 9th gestational week. Nowadays, NIPT is being implemented in public prenatal care and has recently become a standard screening procedure for all pregnant women in the Netherlands [26]. Moreover, the use of whole-genome sequencing based tests allows detection of a wider range of chromosomal aberrations. In line with this, studies using SNP-based whole-genome scans [6,7] or low coverage whole-genome sequencing (0.2x coverage) [9] suggested high sensitivities for detection of the five most frequent microdeletion syndromes. However, these studies had limited means to validate their performance, since real samples in which NIPT-detected microaberrations were confirmed invasively or postpartum are available in very limited numbers. Therefore, mostly artificially prepared data or DNA sample mix-ups are used for proof-of-principle pilot studies. Both scenarios were tested in our study, yielding comparable results. However, it should be noted that some characteristics of NIPT samples, such as fragmentation patterns, are not realistically represented in sample mix-ups or in-silico samples [27]. We evaluated the performance of our algorithm for detection of microdeletions using a low uniquely mapped read count of 10M, which is currently considered standard for reliable detection of Down, Edwards, and Patau syndromes. The motivation for our effort was to define the limitations that apply when such low read counts are used.
The use of lower read counts is one of the key means of reducing the substantial sequencing costs per sample. When the required read count drops to a reasonable level of around 10M reads, the use of middle-throughput massively parallel sequencing platforms, such as the Illumina NextSeq 500/550, becomes an option, creating the potential for wider adoption of testing across the world.

Critical region determination

As in our study, Zhao et al. [9] defined a critical region specific for each of the syndromes, using data from the DECIPHER database [22]. We instead used the ISCA database and identified the critical region for every syndrome manually. The main advantage of our approach is the possibility of defining the critical regions of microdeletion syndromes more specifically. For example, DiGeorge syndrome was previously described as a 22q11.21 deletion [28], but the real critical region is not the whole band, only its 3Mb middle part. Our study also brings new information that could be used to further specify the size and localization of the tested microdeletions. Both of these parameters were found to be among the four most critical ones [9,29], and only very few pathogenic deletions overlap the critical regions in both the ISCA and DECIPHER databases (Table B in S1 Text).

Critical parameters for detection of microdeletions

As fetal fraction and deletion size were found to be the most critical parameters in NIPT, different combinations of them were tested. Based on the real range of fetal fractions in routine NIPT testing [6,9], we tested fetal fractions from 5% to 20%. Fetal fractions lower than 5% are problematic due to an increased number of false negative detections. For each of 200 different samples, 533 simulated cases were evaluated. Our approach achieved a sensitivity of 79.3% for 10% fetal fraction with 20M read count, which further increased to 98.4% if we searched only for deletions longer than 3Mb.

To support our in-silico findings, we designed an artificial laboratory sample evaluation test. We used artificial mixtures for all studied syndromes, prepared from control DNA samples with precise information about the microdeletion size and position. The only undetected samples were those with a DiGeorge syndrome microdeletion shorter than 3Mb and, simultaneously, a fetal fraction lower than 10% (Fig 2). These results are in accordance with the in-silico simulated data.

Read count importance

When we tested the influence of the read count, the fetal fraction was fixed to 10%, the percentage corresponding to the average fetal fraction in pregnant women in the weeks of pregnancy most relevant for NIPT (between the 10th and 13th week) [30]. We concluded that the influence of the read count is significant and that the increase in prediction accuracy does not stop at 10M reads (Fig 6 and Fig J in S1 Text). A higher number of reads allows the accuracy to increase further. Based on our results, we recommend using approximately 16M-17M reads (approx. 0.35x genome coverage for 2x35bp reads) for analyses, because the detection rate reaches a plateau around this point for 10% fetal fraction and ≥ 3Mb deletion size (Fig 6). Using even more reads could be beneficial, especially for small deletions and low fetal fractions, but it does not add to the prediction accuracy in most of the tested cases and thus unnecessarily raises the costs of the analysis. On the other hand, if the fetal fraction is higher (>12.5%) and we focus only on moderate deletion sizes (>3Mb), even a read count as low as 10M was shown to be sufficient, suggesting that this test can (and should) be included as part of a basic NIPT even with low coverage. However, it should be mentioned that deletions causing DiGeorge syndrome (the most frequent syndrome on our list) are usually shorter than 3Mb (Fig 7). Moreover, the genomic location of this syndrome contains a small unmappable region in the middle, which further decreases prediction accuracy. For this syndrome, we recommend using at least 20M reads.
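
The relation between read count and genome coverage quoted above can be reproduced with a rough calculation. The sketch assumes that the read count refers to read pairs of 2x35bp and uses an approximate human genome size of 3.1 Gb; both are assumptions of this example, so the result only approximates the value given in the text.

```python
def genome_coverage(read_pairs, read_length=35, genome_size=3.1e9):
    """Approximate fold coverage for paired-end reads of the given length."""
    return read_pairs * 2 * read_length / genome_size

for read_pairs in (10e6, 16e6, 17e6, 20e6):
    print(f"{read_pairs / 1e6:.0f}M read pairs: {genome_coverage(read_pairs):.2f}x")
```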

Applicability in routine practice

To test the applicability of the in-silico results in clinical practice, we tested control mix-up samples, and the results were concordant with those from the in-silico evaluation (all undetected samples had a fetal fraction below 10% and a variation shorter than 3Mb). As further proof of the applicability of our approach, we have already detected two microdeletions causing Cri-Du-Chat syndrome and one DiGeorge microdeletion in samples from routine NIPT in which this test is implemented. These findings were subsequently confirmed by conventional methods on samples from amniocentesis.

We should note that in a real-case scenario the sensitivity would be slightly lower, since our experiments did not cover various rarer scenarios, such as mosaicism. Fetal mosaicism will most likely cause a negative call, since the “effective” fetal fraction for the aberration will be below the detection threshold. On the other hand, maternal mosaicism and a fetal aberration can be indistinguishable from the read counts alone, which will result in a false positive call. These latter two cases can be distinguished based on the input fetal fraction: if we see an aberration with a level that is too far from both the expected fetal and maternal levels, we can classify it as “possibly maternal mosaic”. However, there are more (unlikely) alternatives, such as a double deletion, multiple duplication, or a wrongly estimated fetal fraction.

Although we present results for a non-public tool, we believe that the general trends of sensitivity based on fetal fraction, variation length, and read count are transferable to any of the available CNV detection tools, since they all share a common methodology.

Conclusions

Our results suggest that it is possible to incorporate microaberration detection into whole-genome based NIPT as part of the offered screening/diagnostics procedure, with no or only a slight increase in read depth. On the other hand, this is highly dependent on the specific parameters of the test used as well as on the aims of testing. The test has excellent accuracy when the fetal fraction is above 10% and the variation length is above 3Mb, making it a potential diagnostic tool when these requirements are met. Even when they are not met, the test can detect a significant number of variants, making it a valuable screening tool in those cases.

However, final decisions on the use and evaluation of the test results, together with specific test parameters, should be a compromise between the cost of the test, the objective of testing, and the required sensitivity and specificity. The limitations of this approach should always be kept in mind, and the professional judgement of a skilled and properly trained evaluator remains necessary. Moreover, using this approach, it is possible to distinguish between microdeletions derived from the mother and those derived from the fetus, based on the expected gain or loss of read counts per bin according to the determined fetal fraction. This distinction is available for almost the whole range of fetal fractions observed in clinical practice, except for data with a fetal fraction around 50%, where the two would be indistinguishable due to the similar ratio of maternal and fetal DNA. The expected values are shown in Fig 1 as dashed lines.
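The exception around 50% fetal fraction follows from a short calculation. As a sketch, assume heterozygous deletions, let f denote the fetal fraction, and let r denote the expected read count in the deleted region relative to a normal region (these symbols are introduced here for illustration only):

```latex
\[
r_{\text{fetal}}(f) = 1 - \frac{f}{2},
\qquad
r_{\text{maternal}}(f) = 1 - \frac{1 - f}{2} = \frac{1 + f}{2}
\]
\[
r_{\text{fetal}}(f) - r_{\text{maternal}}(f) = \frac{1}{2} - f
\quad\Longrightarrow\quad
r_{\text{fetal}} = r_{\text{maternal}} \iff f = \frac{1}{2}
\]
```

For a typical fetal fraction of 10%, the two expected levels are 0.95 and 0.55, a gap of 0.4; the gap shrinks linearly and vanishes as f approaches 50%.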

Supporting information

S1 Text

(PDF)

Acknowledgments

We would like to thank Dr. Iveta Mlkva (Clinic of Genetics at the University Hospital in Bratislava) for kindly providing the control samples with DiGeorge syndrome and for helpful discussions.

Data Availability

Data used for training and testing are available from GitHub (https://github.com/marcelTBI/CNV_data). These data include anonymized mapped data without genomic information to ensure participant confidentiality. The repository contains all data and scripts needed to reproduce the results of the article. The scripts and the data are for non-commercial use only, since they are part of the commercially used tests Trisomy Test + and Trisomy Test Complete (https://trisomytest.sk/en/) and are the intellectual property of Geneton Ltd.

Funding Statement

This work was supported by the “REVOGENE – Research centre for molecular genetics” project (ITMS 26240220067), supported by the Operational Programme Research and Development funded by the European Regional Development Fund (ERDF) (https://ec.europa.eu/regional_policy/en/funding/erdf/), and by the Slovak Research and Development Agency (APVV) project (https://www.apvv.sk/?lang=en) number APVV-15-0232. The funders Geneton Ltd., Medirex Inc., Trisomy Test Ltd. provided support in the form of salaries for authors MK, JB, LS, MaH, OP, ZK, JR, TS, AG, MiH, GM, respectively, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.

References

  • 1. Lo YM, Corbetta N, Chamberlain PF, Rai V, Sargent IL, Redman CW, et al. Presence of fetal DNA in maternal plasma and serum. Lancet. 1997;350: 485–487. doi:10.1016/S0140-6736(97)02174-0
  • 2. Pös O, Budiš J, Szemes T. Recent trends in prenatal genetic screening and testing. F1000Res. 2019;8. doi:10.12688/f1000research.17047.1
  • 3. Gil MM, Quezada MS, Revello R, Akolekar R, Nicolaides KH. Analysis of cell-free DNA in maternal blood in screening for fetal aneuploidies: updated meta-analysis. Ultrasound Obstet Gynecol. 2015;45: 249–266. doi:10.1002/uog.14791
  • 4. Taylor-Phillips S, Freeman K, Geppert J, Agbebiyi A, Uthman OA, Madan J, et al. Accuracy of non-invasive prenatal testing using cell-free DNA for detection of Down, Edwards and Patau syndromes: a systematic review and meta-analysis. BMJ Open. 2016;6: e010002. doi:10.1136/bmjopen-2015-010002
  • 5. Mackie FL, Hemming K, Allen S, Morris RK, Kilby MD. The accuracy of cell-free fetal DNA-based non-invasive prenatal testing in singleton pregnancies: a systematic review and bivariate meta-analysis. BJOG. 2017;124: 32–46. doi:10.1111/1471-0528.14050
  • 6. Wapner RJ, Babiarz JE, Levy B, Stosic M, Zimmermann B, Sigurjonsson S, et al. Expanding the scope of noninvasive prenatal testing: detection of fetal microdeletion syndromes. American Journal of Obstetrics and Gynecology. 2015. pp. 332.e1–332.e9. doi:10.1016/j.ajog.2014.11.041
  • 7. Ravi H, McNeill G, Goel S, Meltzer SD, Hunkapiller N, Ryan A, et al. Validation of a SNP-based non-invasive prenatal test to detect the fetal 22q11.2 deletion in maternal plasma samples. PLoS One. 2018;13: e0193476. doi:10.1371/journal.pone.0193476
  • 8. Hu H, Wang L, Wu J, Zhou P, Fu J, Sun J, et al. Noninvasive prenatal testing for chromosome aneuploidies and subchromosomal microdeletions/microduplications in a cohort of 8141 single pregnancies. Human Genomics. 2019. doi:10.1186/s40246-019-0198-2
  • 9. Zhao C, Tynan J, Ehrich M, Hannum G, McCullough R, Saldivar J-S, et al. Detection of fetal subchromosomal abnormalities by sequencing circulating cell-free DNA from maternal plasma. Clin Chem. 2015;61: 608–616. doi:10.1373/clinchem.2014.233312
  • 10. Raman L, Dheedene A, De Smet M, Van Dorpe J, Menten B. WisecondorX: improved copy number detection for routine shallow whole-genome sequencing. Nucleic Acids Res. 2019;47: 1605–1614. doi:10.1093/nar/gky1263
  • 11. Straver R, Sistermans EA, Holstege H, Visser A, Oudejans CBM, Reinders MJT. WISECONDOR: detection of fetal aberrations from shallow sequencing maternal plasma based on a within-sample comparison scheme. Nucleic Acids Res. 2014;42: e31. doi:10.1093/nar/gkt992
  • 12. Talevich E, Shain AH, Botton T, Bastian BC. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. PLoS Comput Biol. 2016;12: e1004873. doi:10.1371/journal.pcbi.1004873
  • 13. Abyzov A, Urban AE, Snyder M, Gerstein M. CNVnator: an approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 2011;21: 974–984. doi:10.1101/gr.114876.110
  • 14. Dharanipragada P, Vogeti S, Parekh N. iCopyDAV: Integrated platform for copy number variations-Detection, annotation and visualization. PLoS One. 2018;13: e0195334. doi:10.1371/journal.pone.0195334
  • 15. Riggs ER, Wain KE, Riethmaier D, Savage M, Smith-Packard B, Kaminsky EB, et al. Towards a Universal Clinical Genomics Database: the 2012 International Standards for Cytogenomic Arrays Consortium Meeting. Hum Mutat. 2013;34: 915–919. doi:10.1002/humu.22306
  • 16. Mikhail FM, Burnside RD, Rush B, Ibrahim J, Godshalk R, Rutledge SL, et al. The recurrent distal 22q11.2 microdeletions are often de novo and do not represent a single clinical entity: a proposed categorization system. Genet Med. 2014;16: 92–100. doi:10.1038/gim.2013.79
  • 17. Kim SK, Hannum G, Geis J, Tynan J, Hogg G, Zhao C, et al. Determination of fetal DNA fraction from the plasma of pregnant women using sequence read counts. Prenat Diagn. 2015;35: 810–815. doi:10.1002/pd.4615
  • 18. Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, et al. Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet. 2009;41: 1061–1067. doi:10.1038/ng.437
  • 19. Seshan VE, Olshen A. DNAcopy: DNA copy number data analysis. R package version 1.48.0. 2016 [cited 10 Mar 2020]. Available: http://bioconductor.org/packages/DNAcopy/
  • 20. Gazdarica J, Hekel R, Budis J, Kucharik M, Duris F, Radvanszky J, et al. Combination of Fetal Fraction Estimators Based on Fragment Lengths and Fragment Counts in Non-Invasive Prenatal Testing. Int J Mol Sci. 2019;20. doi:10.3390/ijms20163959
  • 21. Minarik G, Repiska G, Hyblova M, Nagyova E, Soltys K, Budis J, et al. Utilization of Benchtop Next Generation Sequencing Platforms Ion Torrent PGM and MiSeq in Noninvasive Prenatal Testing for Chromosome 21 Trisomy and Testing of Impact of In Silico and Physical Size Selection on Its Analytical Performance. PLoS One. 2015;10: e0144811. doi:10.1371/journal.pone.0144811
  • 22. Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M, Rajan D, et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am J Hum Genet. 2009;84: 524–533. doi:10.1016/j.ajhg.2009.03.010
  • 23. Lo YMD, Lun FMF, Chan KCA, Tsui NBY, Chong KC, Lau TK, et al. Digital PCR for the molecular detection of fetal chromosomal aneuploidy. Proc Natl Acad Sci U S A. 2007;104: 13116–13121. doi:10.1073/pnas.0705765104
  • 24. Minear MA, Lewis C, Pradhan S, Chandrasekharan S. Global perspectives on clinical adoption of NIPT. Prenat Diagn. 2015;35: 959–967. doi:10.1002/pd.4637
  • 25. Gregg AR, Skotko BG, Benkendorf JL, Monaghan KG, Bajaj K, Best RG, et al. Noninvasive prenatal screening for fetal aneuploidy, 2016 update: a position statement of the American College of Medical Genetics and Genomics. Genet Med. 2016;18: 1056–1065. doi:10.1038/gim.2016.97
  • 26. van Schendel RV, van El CG, Pajkrt E, Henneman L, Cornel MC. Implementing non-invasive prenatal testing for aneuploidy in a national healthcare system: global challenges and national solutions. BMC Health Services Research. 2017. doi:10.1186/s12913-017-2618-0
  • 27. Chandrananda D, Thorne NP, Bahlo M. High-resolution characterization of sequence signatures due to non-random cleavage of cell-free DNA. BMC Medical Genomics. 2015. doi:10.1186/s12920-015-0107-z
  • 28. Scambler PJ. The 22q11 deletion syndromes. Human Molecular Genetics. 2000. pp. 2421–2426. doi:10.1093/hmg/9.16.2421
  • 29. Neofytou MC, Tsangaras K, Kypri E, Loizides C, Ioannides M, Achilleos A, et al. Targeted capture enrichment assay for non-invasive prenatal testing of large and small size sub-chromosomal deletions and duplications. PLoS One. 2017;12: e0171319. doi:10.1371/journal.pone.0171319
  • 30. Ashoor G, Syngelaki A, Poon LCY, Rezende JC, Nicolaides KH. Fetal fraction in maternal plasma cell-free DNA at 11–13 weeks’ gestation: relation to maternal and fetal characteristics. Ultrasound Obstet Gynecol. 2013;41: 26–32. doi:10.1002/uog.12331

Decision Letter 0

Kelvin Yuen Kwong Chan

16 Apr 2020

PONE-D-20-07674

Non-invasive prenatal testing (NIPT) by low coverage genomic sequencing: Detection limits of screened chromosomal microdeletions

PLOS ONE

Dear Kubiritova,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

We would appreciate receiving your revised manuscript by May 31 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Kelvin Yuen Kwong Chan, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Competing Interests section:

"I have read the journal's policy and the authors of this manuscript have the following competing interests: We declare potential competing financial interest in the form of employee contracts (see affiliations for each author) with Geneton Ltd. that participated in the development of a commercial NIPT test in Slovakia. On the other hand, Geneton Ltd. is not a provider of this commercial test, but still continues to do basic and applied research in the field of NIPT. Gnip A, Minarik G and Hyblova M are employees of Medirex Inc./TrisomyTest Ltd. (the commercial providers of NIPT testing in Slovakia), their participation in the study was, however, limited to the routine NIPT testing that generated the genomic results reused in our study. The other authors declare no possible competing interests."

We note that one or more of the authors are employed by a commercial company: Geneton Ltd., Medirex Inc., Trisomy Test Ltd.

  1. Please provide an amended Funding Statement declaring this commercial affiliation, as well as a statement regarding the Role of Funders in your study. If the funding organization did not play a role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript and only provided financial support in the form of authors' salaries and/or research materials, please review your statements relating to the author contributions, and ensure you have specifically and accurately indicated the role(s) that these authors had in your study. You can update author roles in the Author Contributions section of the online submission form.

Please also include the following statement within your amended Funding Statement.

“The funder provided support in the form of salaries for authors [insert relevant initials], but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.”

If your commercial affiliation did play a role in your study, please state and explain this role within your updated Funding Statement.

2. Please also provide an updated Competing Interests Statement declaring this commercial affiliation along with any other relevant declarations relating to employment, consultancy, patents, products in development, or marketed products, etc. 

Within your Competing Interests Statement, please confirm that this commercial affiliation does not alter your adherence to all PLOS ONE policies on sharing data and materials by including the following statement: "This does not alter our adherence to PLOS ONE policies on sharing data and materials." (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests). If this adherence statement is not accurate and there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include both an updated Funding Statement and Competing Interests Statement in your cover letter. We will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

4. Your ethics statement must appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please also ensure that your ethics statement is included in your manuscript, as the ethics section of your online submission will not be published alongside your manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In the paper “Non-invasive prenatal testing (NIPT) by low coverage genomic sequencing: Detection limits of screened chromosomal microdeletions” the authors investigate the feasibility of using cell-free fetal DNA obtained for non-invasive prenatal testing to screen the fetal genome for five clinically relevant microdeletion syndromes using whole genome sequencing based methods.

Currently, NIPT is predominantly focused on the detection of fetal trisomies of chromosomes 13, 18, 21 and the sex chromosomes. Although specific targeted microdeletion tests are offered, they are normally not observed with WGS based methods. For WGS based methods it is not well established what the detection limits and accuracy measures for these syndromes are.

The investigated microdeletion syndromes are sporadic and mostly caused by de-novo deletions that vary in size from ~1.5MB to ~20MB. Boundaries for the actual ‘critical regions’ (the regions that supposedly cause the actual pathogenicity) are often not clearly defined and are mostly smaller than the deletions that are typically observed. Due to the rarity of these syndromes, large scale validation sets are not available. These points complicate the accurate formulation of the performance measures for the detection of these syndromes from NIPT data.

This paper first addresses the problem of defining accurate boundaries for deletions that would cause these syndromes. Then, it uses an ‘in silico’ and a lab-based simulation experiment to determine the sensitivity and precision of detecting deletions that intersect these regions.

In short, the conclusion of this paper is that most syndromes (deletions) can be readily detected from typical NIPT (given a fetal fraction of at least 10% and approximately 20M sequencing reads). The most common syndrome, DiGeorge, is however the first to be missed when the fetal fraction becomes too low (<10%).

The paper poses important research questions and answers them sufficiently given the boundaries of the experimental setup.

general remarks:

1) It should be made clearer that the ‘lab mix experiment’ is also a simulated dataset (albeit simulated in the lab). No actual fetal microdeletion pregnancy data are presented. Please reflect on the complexities that will arise when dealing with real microdeletion NIPT data. For instance, what can we expect from the non-uniform distribution of fetal cfDNA, fetal/maternal mosaicism of certain microdeletions, and the huge variation in fetal fraction between samples?

2) Although a complete benchmark is not necessary, the paper would benefit from a comparison of other available tools.

3) The scripts/code do not seem to be publicly available, and the methods section alone does not provide enough detail on the use of the parameters to reproduce these findings accurately. For example, the use of ‘a few steps’ and ‘in-house rules’ (line 127-128), but also ‘bin counts corresponding to these 15 first principal components were removed’, needs a more in-depth explanation of what is actually happening to make this reproducible. Finally, the fact that the segmentation rule uses the exact simulated fetal fraction seems to be a bit of a fitting procedure. The effect of correctly specifying this parameter on the presented results and the applicability to real NIPT data are unclear to me and should be addressed. We recommend that you make the scripts available for other users to benefit.

Remarks on the methods section:

Determination of critical/pathological boundaries

To determine the pathological boundaries for each syndrome, the paper describes a somewhat ad-hoc procedure that identifies the most common intersection of deletions in patients with the same syndrome, whereas for most of these common syndromes the most common boundaries are thought to be well known. Figures 3 and 4 were more helpful and convincing than the description of the filtering criteria. Figures for the other three syndromes are supplied as supplementary material. Please replace them all by a single Figure in the main text showing the coverage for all five syndromes, as it clearly shows the complexities involved in determining the pathological boundaries, especially for the telomeric microdeletions.

Preparation of artificial NIPT data sets

Based on real NIPT sequencing data, read counts in all bins (of size 20kb) that fall within the pathological boundaries are multiplied by 1-(ff/2) to simulate a deletion. This way, fetal fraction, number of sequenced reads, and deletion sizes can be varied in order to test detection accuracy. Although this might be a correct way to simulate the problem ‘in silico’, it should be mentioned in the text that this approach assumes that the fetal fraction is uniformly distributed across the bins, which we know is not the case. Also, a note on mosaicism might be in order.
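The scaling step summarized above can be sketched in a few lines. This is a minimal illustration of the 1-(ff/2) rule as described in this review, not the authors' script; the function name and the list-based representation of bin counts are assumptions, and the sketch shares the simplification noted above that the fetal fraction is uniform across bins.

```python
BIN_SIZE = 20_000  # bin size in bp, as described above


def simulate_deletion(bin_counts, start_bp, end_bp, fetal_fraction):
    """Return a copy of bin_counts with a heterozygous fetal deletion
    simulated in the half-open interval [start_bp, end_bp)."""
    factor = 1 - fetal_fraction / 2
    first_bin = start_bp // BIN_SIZE
    last_bin = (end_bp - 1) // BIN_SIZE
    return [
        count * factor if first_bin <= i <= last_bin else count
        for i, count in enumerate(bin_counts)
    ]


# Ten bins of 100 reads each; delete 40-100 kb at 10% fetal fraction.
counts = [100.0] * 10
simulated = simulate_deletion(counts, 40_000, 100_000, 0.10)
print(simulated)
```

Only the bins overlapping the interval are scaled; all other bins are returned unchanged, so read count and deletion size can be varied independently of the fetal fraction.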

Identification of microaberrations

Circular binary segmentation on normalized bin counts; “Identified segments were evaluated using an in-house rule to determine significance”: this needs more detail. It should be replaced by the statistical test that was performed. Or elaborate on the ‘in-house’ rule.

Normalisation

- Then use first 15 PCs from PCA based on reference set of 341 healthy samples:

o Was this based on cfDNA or normal DNA sequencing from 341 samples? No pregnancies?

o “bin counts corresponding to these 15 first principal components were removed”, it is not clear to me what is actually done then? Are certain bins removed directly? Or are bin counts adjusted according to a multiple regression scheme with the 15 PCs as covariates? Maybe add a reference?

o Also, the manuscript says that these 15 PCs now represent ‘common noise’, but couldn’t it be that this is not noise but actual structure that is caused by e.g. common CNVs? Or are there other technical confounders with respect to the sequencing of the reference set (e.g. different sequencing platforms, read length etc.)? Please explain.
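One common reading of "removing" principal components, sketched here for concreteness, is to subtract from each sample its projection onto the leading components learned from the reference set. This is an assumption about what such a step could look like, not a description of what the authors did; the function names are hypothetical and the input is assumed to be a samples-by-bins matrix of already normalized bin counts.

```python
import numpy as np


def fit_components(reference: np.ndarray, n_components: int = 15):
    """reference: (n_samples, n_bins) bin counts from healthy samples."""
    mean = reference.mean(axis=0)
    # Rows of vt are principal axes in bin space, ordered by explained variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]


def remove_components(sample: np.ndarray, mean: np.ndarray,
                      components: np.ndarray) -> np.ndarray:
    """Subtract the part of a centered sample explained by the components."""
    centered = sample - mean
    projection = components.T @ (components @ centered)
    return centered - projection
```

Under this reading no bins are discarded; every bin count is adjusted. Whether the removed structure is technical noise or genuine biological signal, such as common CNVs, cannot be decided by the projection itself, which is the concern raised above.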

Segment identification and CNV calling

- Summary: “Plain circular binary segmentation; over-partitions the genome; fix to pair CBS with a rule based on fetal fraction; theoretical case, mean bin count changes with respect to the fetal fraction in the following manner: mb*ff/2; use segmentation threshold based on the fraction of the theoretical in/decrease.”

- The use of such a rule depends on knowing the actual fetal fraction. In your experiments this is not a problem, as you know the fetal fraction, however, in practice, the fetal fraction can be predicted, but this comes with a large error. The effect of this parameter on the results presented here should therefore be made clear. This seems a weak spot of the presented method, and mentioning this would help readers to be aware of this.
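The sensitivity of such a rule to the fetal fraction estimate can be illustrated with a small sketch. It assumes the threshold is a fixed fraction of the theoretical shift mb*ff/2 quoted above; the `fraction` value of 0.5 and both function names are hypothetical and are not taken from the manuscript.

```python
def shift_threshold(mean_bin_count: float, fetal_fraction: float,
                    fraction: float = 0.5) -> float:
    """Minimum absolute shift in mean bin count for a segment to be kept."""
    theoretical_shift = mean_bin_count * fetal_fraction / 2
    return fraction * theoretical_shift


def is_called(segment_mean: float, mean_bin_count: float,
              estimated_ff: float, fraction: float = 0.5) -> bool:
    """Keep a segment only if its shift reaches the ff-based threshold."""
    shift = abs(segment_mean - mean_bin_count)
    return shift >= shift_threshold(mean_bin_count, estimated_ff, fraction)


# A true 8% fetal fraction lowers a mean of 100 to 96 inside a fetal deletion.
print(is_called(96.0, 100.0, estimated_ff=0.08))  # estimate matches the truth
print(is_called(96.0, 100.0, estimated_ff=0.20))  # fetal fraction overestimated
```

In this toy setting the same segment is kept when the estimate matches the true fetal fraction and dropped when the fetal fraction is overestimated, which is why the error of the fetal fraction estimator matters for results on real samples.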

Other:

- Fig 1, is this real, in silico or DNA sample mix data?

- Fig 2: ‘Real samples’ would probably be better labeled ‘control samples’, as is done throughout the text

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jasper Linthorst and Erik Sistermans

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 26;15(8):e0238245. doi: 10.1371/journal.pone.0238245.r002

Author response to Decision Letter 0


16 Jul 2020

Among the attached files there is also a 'Response to reviewers' file, in which both reviewer and editor comments are answered.

Below I copy the text of the attached file labeled 'Response to reviewers':

First, we would like to thank the reviewers for their critical revision of our manuscript and for the comments and suggestions, which were incorporated in our revised manuscript. Our specific replies are provided below.

Comments to the academic editor and requirements of the journal:

- We carefully checked the manuscript and edited some sections (specified below, or see the 'Revised Manuscript with Track Changes') to meet PLOS ONE's style requirements. On the title page, we added TrisomyTest Ltd. to the affiliation section, since it was missing, and deleted the 'Author contributions' section, as it should be uploaded during online submission. We also updated our Funding Statement and Competing Interests Statement according to the requirements. In the Methods section, we included the Ethics Statement and edited the “Normalization and filtering” subsection. We also attached a 'Data Availability Statement' and included the URL https://github.com/marcelTBI/CNV_data, where the data used for training both PCA normalization and per-bin mean normalization are available as bin counts for individual samples (20kb bin size).

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

- In answer to the first and second points, we added some information to the Introduction, Results, and Discussion sections and rewrote the Materials and Methods section according to your comments.

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

- We have made the data used for training both PCA normalization and per-bin mean normalization available at https://github.com/marcelTBI/CNV_data as bin counts for individual samples (20kb bin size). Raw genomic sequences cannot be made publicly available, in order to protect the privacy of study participants (specified in detail in the Data Availability Statement).

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

- Thank you.

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In the paper “Non-invasive prenatal testing (NIPT) by low coverage genomic sequencing: Detection limits of screened chromosomal microdeletions” the authors investigate the feasibility of using cell-free fetal DNA obtained for non-invasive prenatal testing to screen the fetal genome for five clinically relevant microdeletion syndromes using whole genome sequencing based methods.

Currently, NIPT is predominantly focused on the detection of fetal trisomies of chromosomes 13, 18, 21 and the sex chromosomes. Although specific targeted microdeletion tests are offered, they are normally not observed with WGS based methods. For WGS based methods it is not well established what the detection limits and accuracy measures for these syndromes are.

The investigated microdeletion syndromes are sporadic and mostly caused by de-novo deletions that vary in size from ~1.5MB to ~20MB. Boundaries for the actual ‘critical regions’ (the regions that supposedly cause the actual pathogenicity) are often not clearly defined and are mostly smaller than the deletions that are typically observed. Due to the rarity of these syndromes, large scale validation sets are not available. These points complicate the accurate formulation of the performance measures for the detection of these syndromes from NIPT data.

This paper first addresses the problem of defining accurate boundaries for deletions that would cause these syndromes. Then, it uses an ‘in silico’ and a lab-based simulation experiment to determine the sensitivity and precision of detecting deletions that intersect these regions.

In short; the conclusion of this paper is that most syndromes (deletions) can be readily detected from typical NIPT (given a fetal fraction of at least 10% and approximately 20M sequencing reads). The most common DiGeorge syndrome however, is the first to be missed when the fetal fraction becomes too low (<10%).

The paper poses important research questions and answers them sufficiently given the boundaries of the experimental setup.

Responses to the reviewers' comments:

General remarks

1) It should be made more clear that the ‘lab mix experiment’ is also a simulated dataset (be it simulated in the lab). No actual fetal micro-deletion pregnancy data is presented. Please reflect on the complexities that will arise when dealing with real microdeletion NIPT data. For instance, what can we expect from the non-uniform distribution of fetal cfDNA and fetal/maternal mosaicism of certain microdeletions, and the huge variation in fetal fraction between samples.

- Thank you for this valuable remark. We have added notes on mosaicism, fetal fraction, and the complexities of real data to the Discussion and Results sections.

2) Although a complete benchmark is not necessary, the paper would benefit from a comparison of other available tools.

- We have included a short overview of available tools in the Introduction. We are also working on a comparison of prediction accuracy as a function of the parameters (fetal fraction, aberration length, …) for a few of the most notable tools; however, this work is not yet complete and is well beyond the scope of the present article.

3) The scripts/code do not seem to be publicly available, and the methods section alone does not provide enough detail on the use of the parameters to reproduce these findings accurately. For example, the use of ‘a few steps’ and ‘in-house rules’ (line 127-128), and also ‘bin counts corresponding to these 15 first principal components were removed’, need a more in-depth explanation of what is actually happening in order to make this reproducible. Finally, the fact that the segmentation rule uses the exact simulated fetal fraction seems to be a bit of a fitting procedure. The effect of correctly specifying this parameter on the presented results and the applicability to real NIPT data are unclear to me, and should be addressed. We recommend that you make the scripts available for other users to benefit.

- We have expanded the description of the normalization, filtering, and segment identification used. The effect of the fetal fraction used is also discussed at the end of the “Segment identification and CNV calling” section. Unfortunately, the scripts cannot be made publicly available, as we state in the “Identification of microaberrations” section. A few numbers were changed to reflect recent changes to our CNV detection procedures (the tool is constantly “in development”), but these are mostly only “cosmetic” changes.

Remarks on the methods section:

Determination of critical/pathological boundaries

To determine the pathological boundaries for each syndrome, the paper describes a somewhat ad-hoc manner in which the most common intersection of deletions in patients with the same syndrome is described, whereas for most of these common syndromes the most common boundaries are thought to be well known. Figures 3 and 4 were more helpful and convincing than the description of filtering criteria. Figures for the other three syndromes are supplied supplementary. Please replace them all by a single Figure in the main text showing the coverage for all five syndromes, as it clearly shows the complexities involved in determining the pathological boundaries, especially for the telomeric microdeletions.

- As was suggested, we have created a single figure including Figures 3 and 4 and Figures A, B and C from supplementary material in the main text as Figure 3. In addition to this, we edited figure numbering in the main text and also in the supplementary material.

Preparation of artificial NIPT data sets

Based on real NIPT sequencing data, read counts in all bins (of size 20kb) that fall within the pathological boundaries are multiplied by 1-(ff/2) to simulate a deletion. This way, fetal fraction, number of sequenced reads and deletion sizes can be varied in order to test detection accuracy. Although this might be a correct way to simulate the problem ‘in silico’, it should be mentioned in the text that this approach assumes that the fetal fraction is uniformly distributed across the bins, which we know is not the case. Also, a note on mosaicism might be in order.

- We have added a remark on non-constant fetal fraction. Mosaicism is discussed in the Discussion.
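
The scaling described in the comment above can be illustrated with a minimal sketch. It is not the authors' script: the function name `simulate_deletion`, its arguments, and the toy bin counts are assumptions made for illustration. It applies the factor 1-(ff/2) to every bin in the deleted region, which assumes a heterozygous fetal deletion and a fetal fraction that is constant across bins.

```python
import numpy as np

def simulate_deletion(bin_counts, start_bin, end_bin, fetal_fraction):
    """Return a copy of bin_counts with a fetal deletion simulated in bins [start_bin, end_bin).

    Hypothetical helper: every affected bin is scaled by 1 - ff/2, i.e. the fetus
    loses one of its two copies and contributes a fraction ff of all cfDNA reads.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must lie in [0, 1]")
    simulated = np.asarray(bin_counts, dtype=float).copy()
    simulated[start_bin:end_bin] *= 1.0 - fetal_fraction / 2.0
    return simulated

# Toy example: 10 bins with 100 reads each, deletion over bins 3..5, ff = 10%.
counts = np.full(10, 100.0)
simulated = simulate_deletion(counts, start_bin=3, end_bin=6, fetal_fraction=0.10)
print(simulated)
```

With a 10% fetal fraction the affected bins drop by only 5%, which shows why detection becomes difficult at low fetal fractions and for short deletions.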

Identification of microaberrations

Circular binary segmentation on normalized bin counts; “Identified segments were evaluated using an in-house rule to determine significance”: this needs more detail. It should be replaced by the statistical test that was performed. Or elaborate on the ‘in-house’ rule.

- We have moved all the details of the segment categorization to “Segment identification and CNV calling” section.

Normalisation

- Then use first 15 PCs from PCA based on reference set of 341 healthy samples:

o Was this based on cfDNA or normal DNA sequencing from 341 samples? No pregnancies?

o “bin counts corresponding to these 15 first principal components were removed”, it is not clear to me what is actually done then? Are certain bins removed directly? Or are bin counts adjusted according to a multiple regression scheme with the 15 PCs as covariates? Maybe add a reference?

o Also, the manuscript says that these 15 PCs now represent ‘common noise’, but couldn’t it be that this is not noise but actual structure that is caused by e.g. common CNVs? Or are there other technical confounders with respect to the sequencing of the reference set (e.g. different sequencing platforms, read length etc.)? Please explain.

- We have rewritten the “Normalization and filtering” section to accommodate all of the questions. PCA normalization was done according to the publication mentioned in the previous section; the reference was added to the Normalisation section, too.
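
One common reading of "removing" principal components, offered here only as a hedged sketch and not as the authors' confirmed procedure, is to fit a PCA on the reference samples and subtract from each sample its projection onto the leading components. The function names, the synthetic data, and the use of 3 components (rather than the 15 mentioned in the manuscript) are assumptions made for illustration.

```python
import numpy as np

def fit_noise_components(reference, n_components):
    """Fit PCA on a (n_samples, n_bins) reference matrix.

    Returns the per-bin means and the first n_components principal axes,
    shape (n_components, n_bins), with orthonormal rows.
    """
    reference = np.asarray(reference, dtype=float)
    bin_means = reference.mean(axis=0)
    centered = reference - bin_means
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return bin_means, vt[:n_components]

def remove_noise_components(sample, bin_means, components):
    """Subtract the sample's projection onto the given components."""
    centered = np.asarray(sample, dtype=float) - bin_means
    projection = components.T @ (components @ centered)
    return centered - projection

# Synthetic reference set: 50 samples, 200 bins, one shared noise pattern.
rng = np.random.default_rng(0)
pattern = np.sin(np.linspace(0.0, 6.0, 200))
reference = 100.0 + 5.0 * np.outer(rng.normal(size=50), pattern) + rng.normal(size=(50, 200))
bin_means, components = fit_noise_components(reference, n_components=3)

sample = 100.0 + 4.0 * pattern + rng.normal(size=200)
residual = remove_noise_components(sample, bin_means, components)
```

Under this reading no bins are dropped; each bin count is adjusted. As the reviewer notes, whatever structure the leading components capture is removed, whether it is technical noise or real recurring variation.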

Segment identification and CNV calling

- Summary: “Plain circular binary segmentation; over-partitions the genome; fix to pair CBS with a rule based on fetal fraction; theoretical case, mean bin count changes with respect to the fetal fraction in the following manner: mb*ff/2; use segmentation threshold based on the fraction of the theoretical in/decrease.”

- The use of such a rule depends on knowing the actual fetal fraction. In your experiments this is not a problem, as you know the fetal fraction, however, in practice, the fetal fraction can be predicted, but this comes with a large error. The effect of this parameter on the results presented here should therefore be made clear. This seems a weak spot of the presented method, and mentioning this would help readers to be aware of this.

- We have added a remark on the fetal fraction input to the end of the section.
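
The dependence on the fetal fraction can be made concrete with a small sketch. It assumes the rule summarized above: the theoretical change of the mean bin count is mb*ff/2, and a segment is called when its deviation exceeds a chosen fraction of that change. The function names and the value `threshold_fraction=0.5` are hypothetical and are not taken from the manuscript.

```python
def expected_shift(mean_bin_count, fetal_fraction):
    """Theoretical change in mean bin count for a fetal one-copy gain or loss: mb * ff / 2."""
    return mean_bin_count * fetal_fraction / 2.0

def classify_segment(segment_mean, mean_bin_count, fetal_fraction, threshold_fraction=0.5):
    """Label a segment by comparing its deviation with a fraction of the expected shift.

    threshold_fraction is an illustrative assumption, not a published parameter.
    """
    threshold = threshold_fraction * expected_shift(mean_bin_count, fetal_fraction)
    deviation = segment_mean - mean_bin_count
    if deviation <= -threshold:
        return "deletion"
    if deviation >= threshold:
        return "duplication"
    return "neutral"

# Same segment, evaluated with the true fetal fraction and with an overestimate.
call_true_ff = classify_segment(96.0, 100.0, fetal_fraction=0.10)
call_overestimated_ff = classify_segment(96.0, 100.0, fetal_fraction=0.20)
print(call_true_ff, call_overestimated_ff)
```

In this toy case the segment is called a deletion when the fetal fraction is specified as 10%, but not when it is overestimated as 20%, which illustrates the reviewer's concern about fetal fraction estimates that carry a large error.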

Other:

- Fig 1, is this real, in silico or DNA sample mix data?

- This is real data. We added this information to the sample description.

- Fig 2 ‘Real samples’, probably better to use ‘control samples’, as this is also done throughout the text

- In the main text of the submitted manuscript we used “control samples” in the Fig 2.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jasper Linthorst and Erik Sistermans

- We agree with the possibility to publish the peer review history of our article.

We hope that our changes will strengthen our manuscript and that both the academic editor and the reviewers will be satisfied with these changes and modifications.

Sincerely,

Zuzana Kubiritova

Attachment

Submitted filename: Response to Reviewers.docx

Decision Letter 1

Kelvin Yuen Kwong Chan

13 Aug 2020

Non-invasive prenatal testing (NIPT) by low coverage genomic sequencing: Detection limits of screened chromosomal microdeletions

PONE-D-20-07674R1

Dear Dr. Kubiritova,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Kelvin Yuen Kwong Chan, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Acceptance letter

Kelvin Yuen Kwong Chan

17 Aug 2020

PONE-D-20-07674R1

Non-invasive prenatal testing (NIPT) by low coverage genomic sequencing: Detection limits of screened chromosomal microdeletions

Dear Dr. Kubiritova:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Kelvin Yuen Kwong Chan

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Text

    (PDF)

    Attachment

    Submitted filename: Response to Reviewers.docx

    Data Availability Statement

    Data used for training and testing are available from GitHub (https://github.com/marcelTBI/CNV_data). These data include anonymized mapped data without genomic information to ensure participant confidentiality. The repository contains all data and scripts needed to reproduce the results of the article. The scripts and the data are for non-commercial use only, since they are part of the commercially used tests Trisomy Test + and Trisomy Test Complete (https://trisomytest.sk/en/) and are the intellectual property of Geneton Ltd.


    Articles from PLoS ONE are provided here courtesy of PLOS
