PLoS One. 2016 Aug 16;11(8):e0161012. doi: 10.1371/journal.pone.0161012

Low Input Whole-Exome Sequencing to Determine the Representation of the Tumor Exome in Circulating DNA of Non-Small Cell Lung Cancer Patients

Steffen Dietz 1,2,#, Uwe Schirmer 1,2,#, Clémentine Mercé 1,2, Nikolas von Bubnoff 3,4, Edgar Dahl 5, Michael Meister 2,6, Thomas Muley 2,6, Michael Thomas 2,7, Holger Sültmann 1,2,4,*
Editor: Alvaro Galli 8
PMCID: PMC4987014  PMID: 27529345

Abstract

Circulating cell-free DNA (cfDNA) released from cancerous tissues has been found to harbor tumor-associated alterations and to represent the molecular composition of the tumor. Recent advances in technologies, especially in next-generation sequencing, enable the analysis of low amounts of cfDNA from body fluids. We analyzed the exomes of tumor tissue and matched serum samples to investigate the molecular representation of the tumor exome in cfDNA. To this end, we implemented a workflow for sequencing of cfDNA from low serum volumes (200 μl) and performed whole-exome sequencing (WES) of serum and matched tumor tissue samples from six non-small cell lung cancer (NSCLC) patients and two control sera. Exomes, including untranslated regions (UTRs) of cfDNA were sequenced with an average coverage of 68.5x. Enrichment efficiency, target coverage, and sequencing depth of cfDNA reads were comparable to those from matched tissues. Discovered variants were compared between serum and tissue as well as to the COSMIC database of known mutations. Although not all tissue variants could be confirmed in the matched serum, up to 57% of the tumor variants were reflected in matched cfDNA with mutations in PIK3CA, ALK, and PTEN as well as variants at COSMIC annotated sites in all six patients analyzed. Moreover, cfDNA revealed a mutation in MTOR, which was not detected in the matched tissue, potentially from an untested region of the heterogeneous primary tumor or from a distant metastatic clone. WES of cfDNA may provide additional complementary molecular information about clinically relevant mutations and the clonal heterogeneity of the tumors.

Introduction

Since circulating cell-free DNA (cfDNA) was first shown to carry somatic aberrations, its utility for molecular characterization of tumor diseases has been demonstrated in several recent studies [1–4]. Thus, the analysis of cfDNA has become one focus of biomarker research in molecular oncology. Currently, tissue biopsies are still the gold standard for molecular genotyping of tumor diseases. However, tissue biopsies are associated with the risk of invasive procedures and often provide only limited information about the heterogeneous molecular composition of the tumor and its genetic causes. In particular, characterization of spatial and temporal intra-tumor heterogeneity of primary and metastatic lesions would require serial sampling from multiple sites, which is rarely feasible, indicating the strong need for less invasive approaches [5, 6]. cfDNA, which is easy to obtain from blood, is a potential source of diagnostic and prognostic biomarkers. Recent studies demonstrated the analysis of cfDNA as a potential minimally invasive surrogate for cancer diagnostics and prognostics [7–9]. Sequential characterization of genetic aberrations in cfDNA has been demonstrated for dynamic therapy monitoring and as an indicator of molecularly manifested resistance [10–12]. Moreover, detection of cfDNA in the circulation of cancer patients after surgery could potentially indicate minimal residual disease, which may eventually lead to disease recurrence [13, 14].

Recent technological advances, especially in sequencing and digital PCR technologies, allow the analysis of low amounts of circulating DNA from different body fluids. To date, BEAMing and digital (droplet) PCR have been introduced to detect and track mutations in cfDNA in plasma and serum from cancer patients [8, 15, 16]. These technologies are predominantly used for the analysis of mutational hotspots, as they require previous knowledge of the mutation sites. In addition, the poor integrity of cfDNA, which is typically about 166 bp in size [17], considerably reduces the efficiencies of all PCR-dependent approaches. In contrast, next-generation sequencing allows global identification of molecular variants leading to malignant transformation at a genome-wide scale. In the past decade, large international sequencing consortia have revealed various cancer-associated somatic alterations and have led to a better understanding of the complex molecular composition of tumors, e.g. non-small cell lung cancer (NSCLC) [18, 19]. Only a few prominent cancer genes were found to be recurrently mutated at high frequencies among multiple tumor types, whereas the majority of somatic events are present at lower frequencies [20–22]. In NSCLC, which is the leading cause of malignancy-related mortality [23], patient tissues often harbor activating mutations in KRAS or in members of the ERBB gene family as well as loss-of-function mutations in the tumor suppressor gene TP53 [19]. However, comprehensive molecular genotyping efforts also revealed a broad mutational spectrum [18, 19]. Hence, since cancer harbors individual mutational signatures, exome sequencing offers the advantage of identifying individual coding and UTR mutations aside from the prominent mutational hotspots. Different approaches including whole-genome as well as targeted deep sequencing of cancer-associated loci in cfDNA have been reported for cancer genotyping [1, 2, 12, 24].
Furthermore, recent proof-of-concept studies illustrate the utility of whole-exome sequencing (WES) of cfDNA for disease monitoring under therapy in several cancer entities, including NSCLC [11, 25, 26]. Besides profiling of disease-associated genetic variants, exome sequencing further enables the identification of emerging molecular resistance markers. However, to date there is no general consensus or standardized method for the analysis and WES of cfDNA, and most commonly available technologies require large amounts of starting material. Moreover, the molecular representation of the complex tumor exome in cfDNA has not yet been investigated comprehensively.

Here, we evaluated WES to assess the exomes of six NSCLC patients in primary tumor and corresponding serum samples. To this end, we implemented a workflow for WES from low volumes of 200 μl serum by combining an ultra-low input library preparation protocol with a hybridization-based exome enrichment technology. Our results provide evidence for cfDNA to inform about the molecular constitution of the disease in the six advanced cancer patients with up to 57% of the tumor variants represented in the matched serum samples. By comparing gene sets of frequently mutated genes and the COSMIC database to WES data, we identified common cancer associated mutations (e.g. PIK3CA, ALK, MAP2K3, and PTEN) in serum and tissue pairs. Moreover, we detected additional mutations of clinical relevance in cfDNA, including a potentially actionable mutation in MTOR, which were not found in the primary tumors. In summary, we show that WES of cfDNA informs about the primary tumors’ molecular alterations and can provide complementary information about the mutational patterns in distant clones.

Materials and Methods

Sample collection

Tumor tissue and corresponding serum from six NSCLC patients were collected at the Thoraxklinik Heidelberg and provided via the Lung Biobank Heidelberg. Of the six cases, three were diagnosed with lung adenocarcinoma (LUAD) and three with squamous cell carcinoma (SCC). All patients had provided written informed consent. Blood was collected in S-Monovette 7.5 ml Z-Gel tubes (Sarstedt, Nümbrecht, Germany), allowed to clot for 60 min and then centrifuged for 10 min at 2,000 × g at 10°C. Serum was stored at −80°C until use. Two serum pools were collected at the Thoraxklinik Heidelberg and used as controls and for protocol implementation. Tissue samples were examined for tumor cell content by pathologists, snap-frozen and stored at −80°C. The study was approved by the local ethics committee of the Medical Faculty Heidelberg (270/2001) with amendment 3 (July 31, 2014).

Isolation and QC of circulating DNA

DNA was isolated from 200 μL serum using the QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany). To ensure efficient lysis of DNA-bound proteins, serum was subjected to proteinase K digestion at 37°C for 1 h. Purified cfDNA was quantified by digital PCR using the QuantStudio 3D System (Thermo Fisher Scientific, Waltham, MA, USA). Allele copies of the TERT locus in plasma DNA were quantified and the DNA amount was calculated based on an external standard reference curve of fragmented genomic DNA. Briefly, 3 μL of purified cfDNA were mixed with 7.25 μL QS3D Master Mix v2, 0.75 μL TaqMan Copy Number Reference Assay TERT (Thermo Fisher Scientific), and 3.5 μL water. Due to the low integrity of cfDNA, genomic DNA (Roche Diagnostics, Mannheim, Germany) of the external standard curve was sheared to the same length in order to compensate for the influence of the DNA integrity on PCR reactions and quantity estimations. The integrity of cfDNA was examined by capillary electrophoresis on a Bioanalyzer 2100 system with the High Sensitivity DNA Kit (Agilent Technologies, Santa Clara, CA, USA). Approximately 500 pg cfDNA was used for Bioanalyzer analysis. Digital PCR chips were loaded, thermal cycled, and analyzed according to the manufacturer's instructions.
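The standard-curve quantification described above can be illustrated with a minimal sketch. It assumes a simple linear relationship between the input mass of sheared genomic DNA and the measured TERT copies. The standard-curve values, the elution volume, and the function names are hypothetical and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_to_ng(copies, slope, intercept):
    """Invert the standard curve: measured copies -> DNA mass (ng) in the reaction."""
    return (copies - intercept) / slope


def serum_ng_per_ml(ng_in_reaction, input_ul, elution_ul, serum_ml):
    """Scale the mass in the dPCR input volume to ng per mL of serum."""
    total_ng = ng_in_reaction / input_ul * elution_ul
    return total_ng / serum_ml


# Hypothetical standard curve: sheared genomic DNA (ng) vs. measured TERT copies.
standard_ng = [0.5, 1.0, 2.0, 4.0]
standard_copies = [150.0, 300.0, 600.0, 1200.0]
slope, intercept = fit_line(standard_ng, standard_copies)

# Hypothetical sample: 450 copies measured in 3 uL of eluate,
# assumed elution volume 50 uL, isolated from 0.2 mL serum.
sample_ng = copies_to_ng(450.0, slope, intercept)
concentration = serum_ng_per_ml(sample_ng, input_ul=3.0, elution_ul=50.0, serum_ml=0.2)
```

Shearing the standard to cfDNA-like fragment length, as described above, matters because the slope of such a curve depends on how many template fragments still span the amplicon.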

Isolation of genomic DNA from tumor tissues

Fresh frozen tumor tissue was homogenized using a TissueLyser II (Qiagen) and genomic DNA was extracted using the AllPrep DNA/RNA/miRNA Universal Kit (Qiagen) according to the manufacturer’s protocol. DNA concentrations were determined using a Nanodrop ND-1000 spectrophotometer.

Library preparation and exome enrichment

Prior to library preparation, tissue and serum DNA were sheared to an average fragment length of 150 bp using an S220 Focused-ultrasonicator (Covaris, Woburn, MA, USA). Sequencing libraries were prepared by adapter ligation and PCR amplification using the ThruPLEX-FD Prep Kit (Rubicon Genomics, Ann Arbor, MI, USA) according to the manufacturer’s instructions. Starting from approximately 10 ng of cfDNA, libraries were generated using a total of 11 amplification cycles, consisting of four cycles to fuse the index adapters with the prepared template molecules and seven amplification cycles. Corresponding tumor tissue libraries were prepared from 50 ng DNA using seven amplification cycles. To reduce the number of PCR duplicates in sequencing reads and to avoid amplification biases, EvaGreen was added to the PCR reaction master mix and the amplification was monitored in real time. Once the PCR reaction had reached the exponential amplification phase, it was terminated. The number of required PCR cycles was evaluated in previous experiments. Different barcodes were used for library indexing to allow sample pooling for multiplexed exome capture and sequencing. Hybridization-based exome enrichment was performed using the Agilent SureSelectXT2 All Exon v5 + UTR target enrichment system (Agilent Technologies, Santa Clara, CA, USA). Equal amounts of 215 ng of 7 multiplexed libraries (3 from serum and corresponding tissues as well as 1 from pooled control serum) were combined for enrichment. Universal Blocking Oligos (Integrated DNA Technologies, Coralville, IA, USA) were added to the library pools to ensure compatibility of the hybridization probes with ThruPLEX libraries. Captured libraries were amplified independently in two separate PCR reactions and pooled again afterwards.
Library sizes and qualities were evaluated pre- and post-exome enrichment by Bioanalyzer 2100 analysis using the High Sensitivity DNA Kit (Agilent Technologies) and quantified using the Qubit dsDNA HS Assay kit (Thermo Fisher Scientific). Enriched multiplexes were subjected to 100 bp paired-end sequencing using the Illumina HiSeq 2000 v3 at the DKFZ Genomics and Proteomics Core Facility. Each 7-plexed library pool was loaded on two lanes in order to increase the read count per sample.

NGS data processing

A custom computational analysis pipeline was implemented for WES data processing as well as for comparison of variants called from tumor tissue and matched serum samples. After quality score estimation using FastQC (v0.11.5), FASTQ files were aligned to the human genome (hg19/GRCh37) using BWA v0.7.4 [27]. Mapping statistics were calculated using SAMtools (v0.1.19) [28], and target enrichment quality and target coverage were assessed using the R Target Enrichment Quality Control (TEQC 3.2.0) package [29] and a custom R script (http://www.gettinggeneticsdone.com/2014/03/visualize-coverage-exome-targeted-ngs-bedtools.html). PCR duplicates were removed using Picard MarkDuplicates (Picard tools v1.129). Mapped reads were locally realigned around known insertion and deletion sites [30] and base quality scores were recalibrated using RealignerTargetCreator, IndelRealigner, and BaseRecalibrator from GATK (v3.5–0) [31].
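The order of the post-alignment steps can be sketched as a list of command lines. This is an illustration under stated assumptions, not the authors' pipeline. The jar names, all file names, the known-sites files, and the PrintReads step (which applies a recalibration table in GATK 3.x) are assumptions, and the commands are only assembled, not executed.

```python
def gatk3(tool, reference, *args):
    """Build a GATK 3.x command line for one walker (-T) on a reference (-R)."""
    return ["java", "-jar", "GenomeAnalysisTK.jar", "-T", tool, "-R", reference, *args]


def build_commands(sample, reference="hg19.fa",
                   known_indels="known_indels.vcf", known_snps="dbsnp.vcf"):
    """Return the post-alignment steps for one coordinate-sorted BAM file.

    All file names are hypothetical placeholders.
    """
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    intervals = f"{sample}.realign.intervals"
    realigned_bam = f"{sample}.realigned.bam"
    recal_table = f"{sample}.recal.table"
    recal_bam = f"{sample}.recal.bam"
    return [
        # 1. Remove PCR duplicates with Picard MarkDuplicates.
        ["java", "-jar", "picard.jar", "MarkDuplicates",
         f"INPUT={sorted_bam}", f"OUTPUT={dedup_bam}",
         f"METRICS_FILE={sample}.dup_metrics.txt", "REMOVE_DUPLICATES=true"],
        # 2. Find intervals around known indels that need realignment.
        gatk3("RealignerTargetCreator", reference,
              "-I", dedup_bam, "-known", known_indels, "-o", intervals),
        # 3. Locally realign reads in those intervals.
        gatk3("IndelRealigner", reference,
              "-I", dedup_bam, "-known", known_indels,
              "-targetIntervals", intervals, "-o", realigned_bam),
        # 4. Model systematic base quality errors using known SNP sites.
        gatk3("BaseRecalibrator", reference,
              "-I", realigned_bam, "-knownSites", known_snps, "-o", recal_table),
        # 5. Write a BAM with recalibrated qualities (assumed step).
        gatk3("PrintReads", reference,
              "-I", realigned_bam, "-BQSR", recal_table, "-o", recal_bam),
    ]


commands = build_commands("P1_serum")
for cmd in commands:
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the placeholder paths point to real files.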

Variant calling and processing

Variants and small INDELs were called using HaplotypeCaller from GATK (v3.5–0). Annotation and effect prediction of identified variants was performed using snpEff 4.1g [32]. Since no matched normal tissue or germline DNA was available from the tumor patients, all variants were subsequently filtered. Variants present in the dbSNP database (dbSNP138) were considered as SNPs and removed. Variants in tumor tissues were only retained for further analysis if they had a mutant allele frequency between 20% and 80% (above 80% was considered as homozygous and thus as germline variant), a minimum sequencing depth of 20x, and a minimum base quality of 50. Variants in cfDNA with a sequencing depth <10x were removed. Variants in tumor tissue were compared with those in the corresponding serum using VCFtools (v0.1.12b) [33]. We further excluded identical variants identified in more than 2 patients, as these are most likely technical artifacts. To identify cancer-relevant mutations, variants from NSCLC tissue and serum DNA were compared to the COSMIC database of known somatic mutations. Since no matched normal tissue was available from NSCLC patients, we designed gene sets for LUAD and SCC based on the frequency of mutations listed in the COSMIC and TCGA datasets: The LUAD set of 58 genes was built based on the most frequently mutated genes in the TCGA [19] and COSMIC database for LUAD, a published NSCLC gene panel [12], and the COSMIC top 20 cancer genes for LUAD (S1 Table). The SCC set of 45 genes was designed based on the most frequently mutated genes in the TCGA [18] and COSMIC database for SCC, a published NSCLC gene panel [12], and the COSMIC top 20 cancer genes for SCC (S2 Table). Variants in tumor tissues and corresponding cfDNA were screened for mutations in genes of the LUAD and SCC sets using VCFtools (v0.1.12b) [33] and visualized in the Integrative Genomics Viewer (IGV v2.3) [34].

Sanger Sequencing

Prior to Sanger sequencing, a 98 bp fragment spanning the MTOR mutation c.4228C>A (p.P1410T) in patient 4 was amplified using the KAPA High Fidelity HotStart PCR kit (Kapa Biosystems, Wilmington, MA, USA). The PCR reaction contained 1X KAPA HiFi Fidelity buffer, 0.3 mM each dNTP, 0.3 μM forward primer (5´-GAGGACCGTCGCTTGGTG-3´), 0.3 μM reverse primer (5´-CGAGCATATGCCAAAGCACT-3´), 0.5 U KAPA HiFi HotStart DNA Polymerase, and 5 ng cfDNA or 20 ng tumor tissue DNA in a total volume of 25 μl per reaction. Cycling conditions were as follows: Initial denaturation at 95°C for 3 min, 35 cycles of 98°C for 20 s, 62°C for 15 s, and 72°C for 30 s, followed by a final extension at 72°C for 5 min. The PCR products were purified using the QIAquick PCR Purification Kit (Qiagen) according to the manufacturer’s protocol. Sequencing was performed at GATC Biotech AG (Konstanz, Germany).

Results

Of the six NSCLC patients analyzed, three were female and three male. All patients were diagnosed with advanced, lymph node-positive stage III tumors, three SCC and three LUAD. All patients included had a smoking history. Patient data and clinical characteristics are summarized in Table 1.

Table 1. Patient characteristics.

Patient Gender Smoking history (py) Tumor type Stage TNM Diameter
P1 F former smoker (40 py) SCC III A pT4 N1 M0 5.5 cm
P2 M smoker (40 py) SCC III A pT3 N2 M0 8 cm
P3 M former smoker (50 py) LUAD III A pT4 N1 M0 11.2 cm
P4 F former smoker (15 py) LUAD III B pT4 N2 M0 7.2 cm
P5 M former smoker (-) LUAD III B pT4 N2 M0 9.5 cm
P6 F smoker (60 py) SCC III B pT4 N2 M0 5.5 cm

(F: female; M: male; py: packyears; SCC: squamous cell carcinoma, LUAD: lung adenocarcinoma)

Experimental platform and serum processing

To investigate genomic alterations in cfDNA, we initially implemented an experimental and computational workflow (S1 Fig) for WES analysis of cfDNA from low volumes of serum and matched tumor tissue samples. Information including yields and input amounts from each step of the workflow is summarized in Table 2. Starting from 200 μL serum, purified cfDNA was quantified by digital PCR. Quantification revealed a wide range of cfDNA amounts from 131 ng/mL to 1,168 ng/mL serum. The recovery from 200 μL was higher in sera from NSCLC patients (median: 76.01 ng; range: 26.22–233.67 ng), compared to pooled control sera (median: 34.82 ng; range 24.5–45.13 ng). To assess the integrity of cfDNA, we performed capillary gel electrophoresis. Quality assessment also revealed variance between the samples and clear differences in the integrity and size distribution of cfDNA fragments. Profiles of all serum samples revealed an accumulation of short DNA molecules with a predominant fragment size of 166 bp, which is consistent with the nucleosomal appearance of circulating DNA fragments bound to a nucleosome plus linker histones [35–37]. No difference was observed between serum DNA from cancer patients and control subjects. However, sizing of cfDNA from NSCLC patients further revealed a di- and trinucleosomal fragmentation pattern with molecules of multiples of this size (Fig 1A). We observed cfDNA with a median fragment length of about 360 and 541 bp in four of the six cases (data not shown), representing an (oligo-)nucleosomal laddering and thus indicating the potential origin of cfDNA from cellular DNA cleavage during apoptosis [38, 39]. Previous reports have shown a correlation between the biphasic pattern of plasma DNA fragments and the number of circulating tumor cells (CTCs) as well as elevated plasma DNA concentrations [40].
Further, an increased percentage of mutated DNA molecules in the circulation of cancer patients with biphasic plasma DNA size distribution was noted [40]. In addition, we detected high molecular weight DNA in the sera of patients 2 and 4.

Table 2. Sample characteristics and quality metrics of the sequencing data from cfDNA and corresponding tumor tissues.

Patient Sample ctrl1 ctrl2 P1 P2 P3 P4 P5 P6 median (P1-P6)
DNA amount (ng/mL serum) Serum 123 226 314 1168 620 446 298 131 380
Fragment size Serum 173 169 166 178 166 165 177 159 166
Library insert size Serum 148 149 168 175 163 165 164 167 166
Tissue - - 145 134 127 128 132 141 133
GC content (%) Serum 47 45 47 48 48 47 47 47 47
Tissue - - 48 46 45 44 46 45 45.5
Number of raw reads (mio.) Serum 166 118 140 182 136 181 190 107 160.5
Tissue - - 192 91 161 145 159 119 152
Properly paired reads (mio.) Serum 139 100 120 157 115 155 162 91 137.5
Tissue - - 160 78 140 126 126 103 126
Median target coverage Serum 80x 49x 63x 74x 48x 85x 77x 38x 68.5x
Tissue - - 92x 39x 71x 57x 65x 54x 61x
Targets with coverage >20x (%) Serum 66 62 64 64 60 66 64 58 64
Tissue - - 67 61 65 63 65 63 64
High quality filtered reads (mio.) Serum 26.8 15.96 17 14.56 11.44 23.31 16.3 11.15 15.43
Tissue - - 39.59 36.15 36.93 39.86 54.84 38.66 39.125
Number of variants called Serum 53,728 43,232 43,315 37,170 32,350 46,716 39,933 34,273 38,552
Tissue - - 50,084 44,876 49,024 47,245 49,105 48,080 48,552
Number of variants not in dbSNP Serum 11,449 8,966 9,733 8,305 7,255 10,782 9,090 7,299 8,698
Tissue - - 12,678 10,985 12,937 11,943 11,845 11,253 11,894
Filtered variants Serum 7,623 5,049 2,660 1,073 589 4,105 1,759 769 1,416
Tissue - - 3,322 1,892 2,861 2,232 2,820 2,294 2,557
Common variants in serum + tissue Serum + Tissue - - 1,090 234 148 1,265 621 241 431
Common variants in serum + tissue (% of tissue variants) Serum + Tissue - - 32.81 12.37 5.17 56.68 22.02 10.51 17.195
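The last two rows of Table 2 follow from the filtered tissue counts and the shared variant counts. The short sketch below recomputes the per-patient percentages and the medians from the values in the table. The variable names are illustrative.

```python
from statistics import median

# Values copied from Table 2 (patients P1 to P6).
filtered_tissue = {"P1": 3322, "P2": 1892, "P3": 2861, "P4": 2232, "P5": 2820, "P6": 2294}
common = {"P1": 1090, "P2": 234, "P3": 148, "P4": 1265, "P5": 621, "P6": 241}

# Shared variants as a percentage of the filtered tissue variants, per patient.
shared_pct = {
    patient: round(100 * common[patient] / filtered_tissue[patient], 2)
    for patient in common
}

median_common = median(common.values())   # median number of shared variants
median_pct = median(shared_pct.values())  # median shared percentage

for patient, pct in shared_pct.items():
    print(f"{patient}: {common[patient]} shared variants ({pct}% of tissue variants)")
print(f"median: {median_common} shared variants ({median_pct:.3f}%)")
```

With six patients the median is the mean of the two middle values, which is why the table reports 17.195% rather than one of the per-patient percentages.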

Fig 1. Integrity of cfDNA and a corresponding sequencing library.

(A) Integrity and size distribution of cfDNA fragments from patient 1 showing a nucleosomal laddering of cfDNA with fragment sizes of 166, 360, and 515 bp; (B) Corresponding sequencing library from patient 1, prepared from 10 ng cfDNA.

Library preparation and exome sequencing

Due to the observed size distribution and the nucleosomal laddering, we sheared the cfDNA by ultrasonication in order to increase the amount of appropriately sized input molecules for library preparation. Most commercially available technologies for WES require amounts of < 1 μg genomic DNA as starting material. However, since the DNA yields from 200 μL serum or plasma are typically in the low ng range, we aimed to perform WES from serum DNA by combining an ultra-low input library preparation protocol with a hybridization-based approach for exome enrichment. Starting from 10 ng of sonicated serum DNA, we generated indexed sequencing libraries from the six NSCLC and two control samples. Quality assessment confirmed sufficient yields above 200 ng as well as good quality of the sequencing libraries with median sizes of 297 bp (Fig 1B).

Hybridization-based exome enrichment was performed using the Agilent SureSelectXT2 All Exon v5 + UTR target enrichment system. Compatibility of the SureSelect technology with ThruPLEX low input libraries has been shown in a previous study [41]. Here we combined the ThruPLEX-FD library preparation with the SureSelect technology for WES analysis of cfDNA. In each analyzed multiplex, we pooled equal amounts of cfDNA and corresponding tumor tissue libraries from three NSCLC patients as well as one library generated from control serum DNA. To further increase the complexity, captured libraries of each pool were split, amplified independently in two separate PCR reactions, and pooled again after amplification. To assess the quality of the enriched products, we performed fragment analysis. Consistent with the average size of the input libraries, both multiplexes revealed fragment sizes of approximately 295 bp and were sequenced on two lanes on the Illumina HiSeq instrument.

Evaluation of cfDNA sequencing performance

A median of 161 million paired reads (range: 107–190 million) was obtained from serum DNA and approximately 145 million paired reads (range: 90–192 million) from the corresponding NSCLC tissues. We first examined the overall performance of our exome sequencing approach and data quality of serum DNA reads by calculating different quality metrics, including read count, library insert size, GC content, properly paired reads, enrichment efficiency, target coverage, and read count after post-processing (Table 2). We observed no difference in the alignment of serum and tissue reads: A mean of 86% of reads from serum and 85% of reads from tissue samples were uniquely aligned to the human reference genome (hg19), resulting in 130 million and 122 million perfectly mapped reads. After removal of PCR duplicates, we investigated whether the DNA shearing had a negative influence on the fragmented cfDNA molecules, which were already of mononucleosomal size before sonication. Estimation of the actual library insert sizes from patient sera using Picard revealed a median insert size of 166 bp, which is consistent with the median size of cfDNA fragments of 166 bp.

Regions with high or low GC content negatively affect library PCR amplification [42, 43] and target hybridization efficiency [44]. Thus, GC- or AT-rich regions might be underrepresented especially in cfDNA reads with an increased number of amplification cycles. Analysis of the GC composition revealed no differences between the GC contents of serum (mean 47%) and tissue samples (mean 46%), indicating that the target regions are equally represented in both specimen types.

Capture efficiency is a central aspect of hybridization-based exome sequencing. In order to evaluate the exome enrichment performance, we estimated the percentage of reads aligned to the target as well as the target region coverage using the R package TEQC [29]. A fraction of 84% of the uniquely and properly paired cfDNA reads were mapped to the target region, resulting in a median exome sequencing depth of 68.5x (range 38x to 85x, Fig 2A). No differences of on-target ratios between serum and tissue DNA were observed. About 64% of the target regions in serum (58–66%) and tumor tissue (61–67%) were sequenced with > 20x coverage (Table 2). Corresponding tissue exomes were sequenced with a median depth of 61x (range 39x to 92x, Fig 2B). Although on average fewer reads were obtained from tissue samples by equal uniquely mapped read fraction on target, the higher coverage of tissue samples might be a result of the increased number of duplicates in serum libraries due to the lower starting amount. A higher library complexity might also influence hybridization efficiency leading to the higher coverage.

Fig 2. Target coverage distributions.

Exome sequence coverages in primary NSCLC tissues (A) and cfDNA from corresponding serum samples (B).

In order to achieve only high confidence unique target reads for variant analysis, we performed stringent post-mapping read processing using RealignerTargetCreator and IndelRealigner from GATK. Mapped reads were locally realigned around known insertion and deletion sites from the 1000 Genomes Project [30] in order to reduce the number of mismatching bases, which are easily mistaken as SNPs. Furthermore, all Phred scores were recalibrated to more accurately represent the real error probability, taking into account known SNPs and specific positions on the reads. After post-processing, we retained a median of 15 million (range 11.1–26.8 million) de-duplicated high quality reads localized to the target regions from serum DNA and 41 million reads (range: 36.2–54.8 million) from tissue DNA (Table 2).

Identification of high-confidence variants in serum and tissue

The main aim of this study was to compare variants from tumor tissues with those found in corresponding serum samples, independent of their somatic origin. Therefore, we assessed the common variants in serum and tissue pairs in order to examine the informative value of cfDNA and the extent to which it represents the tumors' genetic profiles. First, we called variants in the filtered reads of cfDNA and corresponding NSCLC exomes using the GATK HaplotypeCaller. Consistent with previous reports on WES without matched normal tissue [45–47], we identified mean numbers of 48,069 variants in tissue and 38,959 variants in serum samples. On average, 75% of the variants found in tissues and 78% of the variants found in serum samples were annotated as single nucleotide polymorphisms (SNPs) in the dbSNP (v129) database and therefore excluded from further analysis.

Next, we applied filters to remove low quality and germline variants for the serum vs. tissue comparison. We retained tissue variants with a mutant allele frequency between 20% and 80%, a minimum coverage of 20x, and a base quality ≥ 50 that were shared by at most two samples. Variants with an allele frequency above 80% were considered as homozygous germline variants and thus excluded from tissue as well as serum calls. As allele frequencies below 1% have been reported for somatic alterations in cfDNA [15], no lower frequency limit for calls in serum samples was used. Only variants with a sequencing depth < 10x were removed. These filtering steps led to a final data set of 2,557 (range: 1,892–3,322) high-confidence variants in NSCLC tissues and 1,416 (range: 589–4,105) in the corresponding serum samples (Table 2).
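The filter criteria above can be written as a small set of predicates. This is a minimal sketch and not the authors' implementation. The `Variant` record, its field names, and the inclusive handling of the 20% and 80% boundaries are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical per-sample variant record (field names are illustrative)."""
    patient: str
    chrom: str
    pos: int
    ref: str
    alt: str
    allele_freq: float  # mutant allele fraction, 0.0 to 1.0
    depth: int          # sequencing depth at the site
    quality: float      # quality score of the call
    in_dbsnp: bool      # True if the variant is listed in dbSNP

    @property
    def key(self):
        return (self.chrom, self.pos, self.ref, self.alt)


def passes_tissue_filter(v):
    """Tissue: not in dbSNP, allele frequency 20% to 80%, depth >= 20x, quality >= 50."""
    return (not v.in_dbsnp
            and 0.20 <= v.allele_freq <= 0.80
            and v.depth >= 20
            and v.quality >= 50)


def passes_serum_filter(v):
    """Serum: not in dbSNP, allele frequency <= 80%, depth >= 10x, no lower frequency limit."""
    return not v.in_dbsnp and v.allele_freq <= 0.80 and v.depth >= 10


def drop_recurrent(variants, max_patients=2):
    """Remove identical variants seen in more than `max_patients` patients."""
    patients_per_key = {}
    for v in variants:
        patients_per_key.setdefault(v.key, set()).add(v.patient)
    return [v for v in variants if len(patients_per_key[v.key]) <= max_patients]


# Hypothetical example: a low-frequency serum call passes the serum filter
# but would fail the tissue filter.
low_af_call = Variant("P4", "chr1", 1000, "C", "A", 0.15, 40, 60.0, False)
print(passes_serum_filter(low_af_call), passes_tissue_filter(low_af_call))
```

The asymmetric lower bound is the key design choice: tumor-derived fragments can be a small fraction of total cfDNA, so a 20% floor in serum would discard most true somatic calls.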

Variants in cancer-associated genes

To investigate to what extent cfDNA informs about cancerous molecular alterations, we compared the variants from tumor tissues with those found in the corresponding serum samples. Of the 2,557 high-confidence tissue and 1,416 serum variants, a median of 431 variants, representing 17.2% of the tissue variants (range 5.2%–56.7%; 148–1,265 variants), were called in both specimen types (Fig 3, Table 2). We further analyzed the variants commonly identified in serum and matched NSCLC tissue from each patient. Consistent with previous findings [46, 47], we detected 39% (1,242) synonymous and 61% (1,966) non-synonymous variants, including a median of 238 (range: 76–654) missense variants among the coding alterations identified in the 6 patients. To identify cancerous somatic mutations in the absence of germline controls, we used the COSMIC database and the sets of NSCLC-associated genes (S1 and S2 Tables). In the first approach, we compared serum and matched tissue variants with the COSMIC database of known mutation sites in human cancers (Fig 3). In the common variants of tissue and matched serum pairs, we identified 81 (range: 30–222) variants at COSMIC-annotated sites in each of the six patients. A median of 1,218 (range: 441–2,840) variants was identified in cfDNA, but not in the matched tissues, and 1,970 (range: 967–2,713) were exclusive for the tumor tissues. Of these, an average of 254 serum and 363 tissue variants were found at COSMIC-annotated sites.
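The comparison of shared and exclusive variants can be sketched with plain set operations. The study used VCFtools for this step; the Python sets below are a stand-in for illustration. The variant keys, the example positions, and the set of COSMIC-annotated sites are hypothetical.

```python
def compare_variants(tissue, serum, cosmic_sites):
    """Split variant keys (chrom, pos, ref, alt) into shared and exclusive sets.

    `cosmic_sites` is a set of (chrom, pos) positions treated as COSMIC-annotated.
    """
    shared = tissue & serum
    tissue_only = tissue - serum
    serum_only = serum - tissue

    def at_cosmic(variants):
        return {v for v in variants if (v[0], v[1]) in cosmic_sites}

    return {
        "shared": shared,
        "tissue_only": tissue_only,
        "serum_only": serum_only,
        "shared_cosmic": at_cosmic(shared),
        "tissue_only_cosmic": at_cosmic(tissue_only),
        "serum_only_cosmic": at_cosmic(serum_only),
        # Shared variants as a percentage of all tissue variants.
        "shared_pct_of_tissue": 100 * len(shared) / len(tissue) if tissue else 0.0,
    }


# Hypothetical toy data for one patient.
tissue_calls = {("chr3", 101, "G", "A"), ("chr10", 202, "T", "C"), ("chr17", 303, "C", "T")}
serum_calls = {("chr3", 101, "G", "A"), ("chr1", 404, "C", "A")}
cosmic = {("chr3", 101), ("chr17", 303)}

result = compare_variants(tissue_calls, serum_calls, cosmic)
print(len(result["shared"]), len(result["tissue_only"]), len(result["serum_only"]))
```

Keying on chromosome, position, reference, and alternate allele means that two different substitutions at the same site count as distinct variants, whereas the COSMIC lookup here is by site only, in line with the "COSMIC-annotated sites" wording above.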

Fig 3. Comparison of shared and exclusive variants in serum and tumor tissue pairs compared to the COSMIC database of annotated somatic mutations.

In order to validate the performance of our approach, we performed WES of two serum pools (ctrl1 and ctrl2) from control subjects without evidence of NSCLC. Sequencing data were processed and variants filtered with the presented bioinformatical pipeline. Variant calling revealed 53,728 and 43,232 variants in cfDNA from ctrl1 and ctrl2. Upon SNP removal, a total of 11,449 and 8,966 variants were filtered using identical criteria as for the NSCLC serum variants. Filtration resulted in 7,623 and 5,056 remaining variant calls in cfDNA from pool ctrl1 and ctrl2, respectively. Thereof, only 64 and 51 variants were found at COSMIC annotated sites, including only 22 and 23 missense as well as 14 and 4 frameshift variants in ctrl1 and ctrl2, respectively (Table 2). Thus, the rate of coding COSMIC annotated mutations in pooled control samples is lower compared to NSCLC patient sera.

We further used sets of genes that had previously been found to harbor mutations associated with NSCLC to identify potential driver and prominent lung cancer mutations. Based on the TCGA and COSMIC databases as well as a published NSCLC panel [12], we designed sets of 58 and 45 genes for LUAD and SCC, respectively. By comparing these target genes and the COSMIC reference to the WES data, we identified a broad range of NSCLC-associated somatic mutations in tissue and matched cfDNA (Table 3). Among the tumor tissue and cfDNA pairs, we identified COSMIC-listed mutations in various kinases, including PIK3CA, ALK, MAP2K3, and PAK2. We further detected a splice site variant in the tumor suppressor gene PTEN. Moreover, cfDNA confirmed variants in LRP1B, MET, and the epigenetic modulator KMT2C. Apart from confirming variants detected in tumor tissues, cfDNA revealed additional variants of clinical relevance. For example, cfDNA of patient 4 revealed an MTOR mutation with an allele frequency of 15%, which was not found in tumor tissue. In order to support this finding, we performed Sanger sequencing of cfDNA and genomic DNA from the corresponding primary tumor tissue of patient 4. Sanger sequencing confirmed the presence of the MTOR mutation with a lower frequency compared to the wild type in serum cfDNA from patient 4. As expected, this mutation was not present in the tumor tissue (Fig 4). Although we identified common and COSMIC-annotated variants in serum and tissue pairs of all six patients, exome analysis of cfDNA could not confirm a median of 2,126 (range: 967–2,713) mutations identified in primary tumor tissues. None of the five TP53 variants identified in tissues was found in the serum samples, and cfDNA also did not reflect the potential driver mutations in PIK3CA and CDKN2A from primary tissues of patients 1 and 2, respectively.

Table 3. Coding variants identified in tumor tissue and serum samples.

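| Case | Gene | Coding Consequence | Tissue | Serum |
|---|---|---|---|---|
| P1 | ALK | p.E1419K, COSM159021 | | |
| P1 | VEGFB | p.A194_A195dup | | |
| P1 | PDGFRA | p.S478P, COSM5008347 | | |
| P1 | MAP2K3 | p.L219W, COSM1579439 | | |
| P1 | ROS1 | p.R560H | | |
| P1 | TP53 | p.R175H, COSM10648 | | |
| P1 | PIK3CA | p.E545K, COSM763 | | |
| P1 | LRP1B | p.D2670E | | |
| P2 | NOTCH4 | p.L16_C17insL, COSM451257 | | |
| P2 | TP53 | p.Y236D, COSM43602 | | |
| P2 | CDKN2A | p.R58*, stopgain, COSM12473 | | |
| P2 | FLT1 | c.1437-6dupT | | |
| P2 | CSMD3 | p.S253C, COSM3644419 | | |
| P3 | VEGFA | p.E273G | | |
| P3 | KMT2C | p.Tyr816fs, at COSM289942 | | |
| P3 | TGFA | p.P54L | | |
| P3 | NOTCH1 | p.P1210T | | |
| P3 | FLT1 | p.R183L | | |
| P3 | TP53 | p.P177L, COSM44097 | | |
| P3 | CSMD3 | splice site | | |
| P3 | RYR2 | p.D2932H | | |
| P3 | PRKCG | p.M355I | | |
| P3 | PRKCG | p.V356F | | |
| P4 | PTEN | splice site | | |
| P4 | PIK3CA | p.I391M, COSM328028 | | |
| P4 | TP53 | p.K120_A129dup | | |
| P4 | MTOR | p.R32L | | |
| P4 | MTOR | p.P1410T | | |
| P5 | MET | p.T1010I, COSM707 | | |
| P5 | PAK2 | p.K128R, COSM4005518 | | |
| P5 | LRP1B | p.G3615A | | |
| P5 | EPHA3 | p.R914H, at COSM4002833 | | |
| P6 | TP53 | p.H179R, COSM10889 | | |
| P6 | PTCH1 | p.L420I | | |
| P6 | ERBB3 | p.T906S | | |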

Fig 4. Sanger sequencing results from patient 4.

Confirmation of the presence of the MTOR mutation c.4228C>A (p.P1410T) at a lower allele frequency in cfDNA and its absence in the corresponding primary tumor tissue.

Discussion

Currently, cancer genome sequencing is used to identify genetic variants associated with malignant transformation. Since somatic alterations were first found in the blood of cancer patients, sequencing of cfDNA has been shown to be useful for minimally invasive diagnostics and therapy monitoring of malignant diseases [12, 48]. A few proof-of-concept studies have demonstrated the feasibility of WES of cfDNA for disease monitoring in several cancer entities, including NSCLC [11, 25, 26]. However, to what extent cfDNA represents the tumors' molecular profiles in the circulation of cancer patients has not yet been systematically investigated. Moreover, standardized methods are needed to translate WES of cfDNA into clinical practice. Here, we present a robust experimental workflow for WES analysis of cfDNA and evaluate the molecular representation of the tumor exome in cfDNA by WES of matched tumor and serum samples from NSCLC patients.

Most commonly available technologies for WES require large amounts of starting material, which cannot be obtained from serum samples. DNA amount and complexity of the sequencing library are therefore the limiting factors of hybridization-based WES, especially since more PCR cycles are required when the input material is limited. Based on previous reports on WES from low input samples and cfDNA, we used the ThruPLEX-FD Prep Kit (Rubicon Genomics) for library generation [11, 26]. Evaluation of cfDNA sequencing data illustrates the high performance of the established workflow, which combines the ThruPLEX-FD library preparation with the SureSelect technology for exome enrichment. We observed no differences in mapping performance, enrichment efficiency, target coverage, and sequencing depth between cfDNA reads and those from matched tissue samples.

We demonstrate the utility of WES for the identification of variants in serum samples from cancer patients. SNP removal and annotation of called variants using the COSMIC database of known mutations in cancer further showed that somatic mutations can be identified in the absence of germline controls. Our results from variant calling of matched serum and tissue pairs illustrate the informative value of cfDNA for cancer genotyping. While other groups used sequencing approaches primarily for limited numbers of prominent cancer-associated genes [12, 48], we performed WES in order to estimate the representation of the tumor exomes in cfDNA. A median of 17.19% of the tissue variants (range: 5.17%–56.68%) was also found in the corresponding serum samples of the six tested NSCLC exomes. In addition, 81 (range: 30–222) of the common mutations in the serum and tissue pairs were at COSMIC-annotated mutation sites.
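
The representation measure described above (the share of tissue variants that are also found in the matched serum) can be sketched as a simple set comparison. The Python example below is a minimal illustration under stated assumptions: variants are keyed by (chromosome, position, reference, alternate allele), the function name `shared_fraction` is hypothetical, and the three toy patients contain made-up placeholder variants rather than data from this study.

```python
from statistics import median


def shared_fraction(tissue_variants, serum_variants):
    """Fraction of tissue variants that are also present in the matched serum."""
    tissue = set(tissue_variants)
    if not tissue:
        return 0.0
    return len(tissue & set(serum_variants)) / len(tissue)


# Hypothetical toy call sets; coordinates and alleles are placeholders only.
patients = {
    "A": {
        "tissue": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"),
                   ("chr3", 300, "G", "A"), ("chr4", 400, "T", "C")},
        "serum": {("chr1", 100, "A", "G"), ("chr9", 900, "G", "T")},
    },
    "B": {
        "tissue": {("chr5", 500, "C", "A"), ("chr6", 600, "G", "C")},
        "serum": {("chr5", 500, "C", "A"), ("chr6", 600, "G", "C")},
    },
    "C": {
        "tissue": {("chr7", 700, "A", "T")},
        "serum": set(),
    },
}

# Per-patient representation of the tissue call set in serum, then the median.
fractions = {pid: shared_fraction(d["tissue"], d["serum"])
             for pid, d in patients.items()}
median_fraction = median(fractions.values())
print(fractions)
print(f"median shared fraction: {median_fraction:.2%}")
```

Keying variants on position and both alleles is one possible design choice; matching on position alone would also count different substitutions at the same site as shared.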

Although these data demonstrate the informative value of cfDNA at least for advanced cancers, the sequencing depth of WES represents a major limiting factor of the technology compared to targeted approaches, especially for the detection of low allele frequencies. Since allele frequencies below 1% have been reported for tumor fragments in the circulation [15], the achieved sequencing depth was possibly too low to analyze the full representation of the tumor exome in cfDNA. Thus, an extensive fraction of variants, including TP53 mutations in five patients, a PIK3CA mutation in patient 1, and a CDKN2A mutation in patient 2, were exclusive to the primary tissues and not found in the corresponding sera. Their low abundances in the circulation could be influenced by several factors, including differences in tumor load, influences of therapy on the presence of cfDNA, or temporal variations of the abundance of cfDNA with respect to the tumor status.
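
The effect of sequencing depth on the detection of low allele frequencies can be illustrated with a simple binomial sampling model. The Python sketch below is an assumption-laden illustration, not the variant caller used in this study: it takes a depth of 68 reads (approximating the reported average cfDNA coverage of 68.5x), assumes a hypothetical calling threshold of three variant-supporting reads, and ignores sequencing errors and uneven coverage.

```python
from math import comb


def detection_probability(depth, allele_fraction, min_alt_reads):
    """P(at least min_alt_reads variant reads) under a binomial sampling model."""
    p_below = sum(
        comb(depth, k) * allele_fraction**k * (1 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Assumed values: depth of 68 reads and a threshold of 3 supporting reads.
for allele_fraction in (0.01, 0.05, 0.15):
    p = detection_probability(68, allele_fraction, 3)
    print(f"allele fraction {allele_fraction:.0%}: detection probability {p:.3f}")
```

Under these assumptions, a variant at a 1% allele fraction would reach the three-read threshold only a few percent of the time, whereas a variant at 15% would almost always be sampled often enough.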

Apart from shared mutations, 1,218 (range: 441–2,840) variants were identified in cfDNA only and were absent from the matched tissues. Although the cellular origin of these variants is difficult to trace, such variants may derive from different cell types and tissues in the body. Previous studies noted an accumulation of variants and mutations in different tissues within the same individual [49–52]. Thus, variants could originate from healthy cells, which accumulated mutations during differentiation and aging. However, analysis of cfDNA also allows for the identification of somatic mutations originating from metastatic lesions distinct from primary tumors [25]. Notably, we detected an MTOR mutation with a frequency of 15% in cfDNA of patient 4, which was not detected in primary tissue, and confirmed this finding by Sanger sequencing. Such mutated alleles in cfDNA might have originated from an untested tissue lesion and thus provide complementary molecular information about therapeutically relevant mutations and the clonal heterogeneity of the disease.

In summary, we evaluated cfDNA to assess the exomes of six NSCLC patients in primary tumor and corresponding serum samples. We show that exome analysis of cfDNA is feasible for minimally invasive characterization of tumor diseases. Our results provide evidence that cfDNA can inform about the molecular alterations in advanced cancer. Nevertheless, further evaluation and larger cohorts of different entities are needed to fully understand the value of WES of cfDNA as a faithful representation of tumors.

Supporting Information

S1 Fig. Experimental and computational workflow for whole-exome sequencing of tumor tissues and cfDNA from corresponding serum samples.

(TIF)

S1 Table. LUAD panel of recurrently mutated genes in lung adenocarcinomas.

(XLSX)

S2 Table. SCC panel of recurrently mutated genes in lung squamous cell carcinomas.

(XLSX)

Acknowledgments

We thank Stephan Wolf and the DKFZ Genomics and Proteomics Core Facility for technical support and high-throughput sequencing. Tissue and serum samples were provided by LungBiobank Heidelberg, a member of the BioMaterialBank Heidelberg (BMBH) and the biobank platform of the German Center for Lung Research (DZL).

Data Availability

All fastq sequencing files are available at the Sequence Read Archive (http://www.ncbi.nlm.nih.gov/sra). Accession number: SRP073475.

Funding Statement

The authors have no support or funding to report.

References

  • 1. Leary RJ, Sausen M, Kinde I, Papadopoulos N, Carpten JD, Craig D, et al. Detection of chromosomal alterations in the circulation of cancer patients with whole-genome sequencing. Sci Transl Med. 2012;4(162):162ra54. doi:10.1126/scitranslmed.3004742
  • 2. Forshew T, Murtaza M, Parkinson C, Gale D, Tsui DW, Kaper F, et al. Noninvasive identification and monitoring of cancer mutations by targeted deep sequencing of plasma DNA. Sci Transl Med. 2012;4(136):136ra68. doi:10.1126/scitranslmed.3003726
  • 3. De Mattos-Arruda L, Weigelt B, Cortes J, Won HH, Ng CK, Nuciforo P, et al. Capturing intra-tumor genetic heterogeneity by de novo mutation profiling of circulating cell-free tumor DNA: a proof-of-principle. Ann Oncol. 2014;25(9):1729–35. doi:10.1093/annonc/mdu239
  • 4. Sorenson GD, Pribish DM, Valone FH, Memoli VA, Bzik DJ, Yao SL. Soluble normal and mutated DNA sequences from single-copy genes in human blood. Cancer Epidemiol Biomarkers Prev. 1994;3(1):67–71.
  • 5. de Bruin EC, McGranahan N, Mitter R, Salm M, Wedge DC, Yates L, et al. Spatial and temporal diversity in genomic instability processes defines lung cancer evolution. Science. 2014;346(6206):251–6. doi:10.1126/science.1253462
  • 6. Gerlinger M, Rowan AJ, Horswell S, Larkin J, Endesfelder D, Gronroos E, et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med. 2012;366(10):883–92. doi:10.1056/NEJMoa1113205
  • 7. Bettegowda C, Sausen M, Leary RJ, Kinde I, Wang Y, Agrawal N, et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci Transl Med. 2014;6(224):224ra24. doi:10.1126/scitranslmed.3007094
  • 8. Thierry AR, Mouliere F, El Messaoudi S, Mollevi C, Lopez-Crapez E, Rolet F, et al. Clinical validation of the detection of KRAS and BRAF mutations from circulating tumor DNA. Nat Med. 2014;20(4):430–5. doi:10.1038/nm.3511
  • 9. Diehl F, Schmidt K, Choti MA, Romans K, Goodman S, Li M, et al. Circulating mutant DNA to assess tumor dynamics. Nat Med. 2008;14(9):985–90. doi:10.1038/nm.1789
  • 10. Dawson SJ, Tsui DW, Murtaza M, Biggs H, Rueda OM, Chin SF, et al. Analysis of circulating tumor DNA to monitor metastatic breast cancer. N Engl J Med. 2013;368(13):1199–209. doi:10.1056/NEJMoa1213261
  • 11. Murtaza M, Dawson SJ, Tsui DW, Gale D, Forshew T, Piskorz AM, et al. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA. Nature. 2013;497(7447):108–12. doi:10.1038/nature12065
  • 12. Newman AM, Bratman SV, To J, Wynne JF, Eclov NC, Modlin LA, et al. An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage. Nat Med. 2014;20(5):548–54. doi:10.1038/nm.3519
  • 13. Diaz LA Jr, Bardelli A. Liquid biopsies: genotyping circulating tumor DNA. J Clin Oncol. 2014;32(6):579–86. doi:10.1200/JCO.2012.45.2011
  • 14. McBride DJ, Orpana AK, Sotiriou C, Joensuu H, Stephens PJ, Mudie LJ, et al. Use of cancer-specific genomic rearrangements to quantify disease burden in plasma from patients with solid tumors. Genes Chromosomes Cancer. 2010;49(11):1062–9. doi:10.1002/gcc.20815
  • 15. Diehl F, Li M, Dressman D, He Y, Shen D, Szabo S, et al. Detection and quantification of mutations in the plasma of patients with colorectal tumors. Proc Natl Acad Sci U S A. 2005;102(45):16368–73. doi:10.1073/pnas.0507904102
  • 16. Wang Z, Chen R, Wang S, Zhong J, Wu M, Zhao J, et al. Quantification and dynamic monitoring of EGFR T790M in plasma cell-free DNA by digital PCR for prognosis of EGFR-TKI treatment in advanced NSCLC. PLoS One. 2014;9(11):e110780. doi:10.1371/journal.pone.0110780
  • 17. Jiang P, Chan CW, Chan KC, Cheng SH, Wong J, Wong VW, et al. Lengthening and shortening of plasma DNA in hepatocellular carcinoma patients. Proc Natl Acad Sci U S A. 2015;112(11):E1317–25. doi:10.1073/pnas.1500076112
  • 18. Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489(7417):519–25. doi:10.1038/nature11404
  • 19. Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature. 2014;511(7511):543–50. doi:10.1038/nature13385
  • 20. Alexandrov LB, Stratton MR. Mutational signatures: the patterns of somatic mutations hidden in cancer genomes. Curr Opin Genet Dev. 2014;24:52–60. doi:10.1016/j.gde.2013.11.014
  • 21. Lawrence MS, Stojanov P, Mermel CH, Robinson JT, Garraway LA, Golub TR, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505(7484):495–501. doi:10.1038/nature12912
  • 22. Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr, Kinzler KW. Cancer genome landscapes. Science. 2013;339(6127):1546–58. doi:10.1126/science.1235122
  • 23. Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2015;136(5):E359–86. doi:10.1002/ijc.29210
  • 24. Mohan S, Heitzer E, Ulz P, Lafer I, Lax S, Auer M, et al. Changes in colorectal carcinoma genomes under anti-EGFR therapy identified by whole-genome plasma DNA sequencing. PLoS Genet. 2014;10(3):e1004271. doi:10.1371/journal.pgen.1004271
  • 25. Butler TM, Johnson-Camacho K, Peto M, Wang NJ, Macey TA, Korkola JE, et al. Exome Sequencing of Cell-Free DNA from Metastatic Cancer Patients Identifies Clinically Actionable Mutations Distinct from Primary Disease. PLoS One. 2015;10(8):e0136407. doi:10.1371/journal.pone.0136407
  • 26. Klevebring D, Neiman M, Sundling S, Eriksson L, Darai Ramqvist E, Celebioglu F, et al. Evaluation of exome sequencing to estimate tumor burden in plasma. PLoS One. 2014;9(8):e104417. doi:10.1371/journal.pone.0104417
  • 27. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60. doi:10.1093/bioinformatics/btp324
  • 28. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. doi:10.1093/bioinformatics/btp352
  • 29. Hummel M, Bonnin S, Lowy E, Roma G. TEQC: an R package for quality control in target capture experiments. Bioinformatics. 2011;27(9):1316–7. doi:10.1093/bioinformatics/btr122
  • 30. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi:10.1038/nature15393
  • 31. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297–303. doi:10.1101/gr.107524.110
  • 32. Cingolani P, Platts A, Wang le L, Coon M, Nguyen T, Wang L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80–92. doi:10.4161/fly.19695
  • 33. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8. doi:10.1093/bioinformatics/btr330
  • 34. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, et al. Integrative genomics viewer. Nat Biotechnol. 2011;29(1):24–6. doi:10.1038/nbt.1754
  • 35. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A. 2008;105(42):16266–71. doi:10.1073/pnas.0808319105
  • 36. Lo YM, Chan KC, Sun H, Chen EZ, Jiang P, Lun FM, et al. Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. Sci Transl Med. 2010;2(61):61ra91. doi:10.1126/scitranslmed.3001720
  • 37. Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell. 2016;164(1–2):57–68. doi:10.1016/j.cell.2015.11.050
  • 38. Jahr S, Hentze H, Englisch S, Hardt D, Fackelmayer FO, Hesch RD, et al. DNA fragments in the blood plasma of cancer patients: quantitations and evidence for their origin from apoptotic and necrotic cells. Cancer Res. 2001;61(4):1659–65.
  • 39. Thierry AR, Mouliere F, Gongora C, Ollier J, Robert B, Ychou M, et al. Origin and quantification of circulating DNA in mice with human colorectal cancer xenografts. Nucleic Acids Res. 2010;38(18):6159–75. doi:10.1093/nar/gkq421
  • 40. Heitzer E, Auer M, Hoffmann EM, Pichler M, Gasch C, Ulz P, et al. Establishment of tumor-specific copy number alterations from plasma DNA of patients with cancer. Int J Cancer. 2013;133(2):346–56. doi:10.1002/ijc.28030
  • 41. Rykalina VN, Shadrin AA, Amstislavskiy VS, Rogaev EI, Lehrach H, Borodina TA. Exome sequencing from nanogram amounts of starting DNA: comparing three approaches. PLoS One. 2014;9(7):e101154. doi:10.1371/journal.pone.0101154
  • 42. Aird D, Ross MG, Chen WS, Danielsson M, Fennell T, Russ C, et al. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 2011;12(2):R18. doi:10.1186/gb-2011-12-2-r18
  • 43. Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods. 2009;6(4):291–5. doi:10.1038/nmeth.1311
  • 44. Kane MD, Jatkoe TA, Stumpf CR, Lu J, Thomas JD, Madore SJ. Assessment of the sensitivity and specificity of oligonucleotide (50mer) microarrays. Nucleic Acids Res. 2000;28(22):4552–7.
  • 45. Belkadi A, Bolze A, Itan Y, Cobat A, Vincent QB, Antipenko A, et al. Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci U S A. 2015;112(17):5473–8. doi:10.1073/pnas.1418631112
  • 46. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, Lee C, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461(7261):272–6. doi:10.1038/nature08250
  • 47. Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337(6090):64–9. doi:10.1126/science.1219240
  • 48. Frenel JS, Carreira S, Goodall J, Roda D, Perez-Lopez R, Tunariu N, et al. Serial Next-Generation Sequencing of Circulating Cell-Free DNA Evaluating Tumor Clone Response To Molecularly Targeted Drug Administration. Clin Cancer Res. 2015;21(20):4586–96. doi:10.1158/1078-0432.CCR-15-0584
  • 49. Frumkin D, Wasserstrom A, Kaplan S, Feige U, Shapiro E. Genomic variability within an organism exposes its cell lineage tree. PLoS Comput Biol. 2005;1(5):e50. doi:10.1371/journal.pcbi.0010050
  • 50. Gomez-Ramos A, Sanchez-Sanchez R, Muhaisen A, Rabano A, Soriano E, Avila J. Similarities and differences between exome sequences found in a variety of tissues from the same individual. PLoS One. 2014;9(7):e101412. doi:10.1371/journal.pone.0101412
  • 51. Holstege H, Pfeiffer W, Sie D, Hulsman M, Nicholas TJ, Lee CC, et al. Somatic mutations found in the healthy blood compartment of a 115-yr-old woman demonstrate oligoclonal hematopoiesis. Genome Res. 2014;24(5):733–42. doi:10.1101/gr.162131.113
  • 52. Lupski JR. Genetics. Genome mosaicism—one human, multiple genomes. Science. 2013;341(6144):358–9. doi:10.1126/science.1239503

Articles from PLoS ONE are provided here courtesy of PLOS
