Scientific Reports. 2022 May 19;12:8458. doi: 10.1038/s41598-022-12457-1

Evaluation of cfDNA as an early detection assay for dense tissue breast cancer

Mouadh Barbirou 1,4, Amanda A Miller 1, Erik Gafni 2, Amel Mezlini 3, Asma Zidi 3, Nathan Boley 2, Peter J Tonellato 1
PMCID: PMC9120463  PMID: 35589867

Abstract

A cell-free DNA (cfDNA) assay would be a promising approach to early cancer diagnosis, especially for patients with dense tissues. Consistent cfDNA signatures have been observed for many carcinogens. Recently, investigations of cfDNA as a reliable early detection bioassay have presented a powerful opportunity for detecting dense tissue screening complications early. We performed a prospective study to evaluate the potential of characterizing cfDNA as a central element in the early detection of dense tissue breast cancer (BC). Plasma samples were collected from 32 consenting subjects with dense tissue and positive mammograms, 20 with positive biopsies and 12 with negative biopsies. After screening and before biopsy, cfDNA was extracted, and whole-genome next-generation sequencing (NGS) was performed on all samples. Copy number alteration (CNA) and single nucleotide polymorphism (SNP)/insertion/deletion (Indel) analyses were performed to characterize cfDNA. In the positive-positive subjects (cases), a total of 5 CNAs overlapped with 5 previously reported BC-related oncogenes (KSR2, MAP2K4, MSI2, CANT1 and MSI2). In addition, 1 SNP was detected in KMT2C, a BC oncogene, and 9 others were detected in or near 10 genes (SERAC1, DAGLB, MACF1, NVL, FBXW4, FANK1, KCTD4, CAVIN1; ATP6V0A1 and ZBTB20-AS1) previously associated with non-BC cancers. For the positive–negative subjects (screening), 3 CNAs were detected in BC genes (ACVR2A, CUL3 and PIK3R1), and 5 SNPs were identified in 6 non-BC cancer genes (SNIP1, TBC1D10B, PANK1, PRKCA and RUNX2; SUPT3H). This study presents evidence of the potential of using cfDNA somatic variants as dense tissue BC biomarkers from a noninvasive liquid bioassay for early cancer detection.

Subject terms: Cancer, Genetics, Molecular biology, Biomarkers, Molecular medicine, Oncology

Introduction

Breast cancer (BC) is the most prevalent cancer worldwide, with an estimated 2.3 million new cases in 2020 [1]. According to the GLOBOCAN Cancer Tomorrow Prediction, incidence is expected to increase by 33.8% by 2040, to approximately 3 million new cases [2]. Mortality due to BC remains high in low-income countries, due in part to the lack of options for early detection and therapy management [3]. In Tunisia, approximately 32.2 incident cases and 10.3 related deaths per 100,000 women were reported in late 2019 [4]. Currently, mammography is the only noninvasive method for detecting evidence of possible BC in dense tissue patients, and ultrasound-assisted core needle biopsy is the only robust and effective means of obtaining definitive diagnosis and staging of BC. Together, they provide a tenuous tandem method for accurately detecting early BC in dense tissue patients. Mammography has low sensitivity, with up to 34% false negative diagnoses for female dense tissue patients under 40 [5,6]. Complementary invasive ultrasound-assisted core needle biopsy has a number of shortcomings, including difficulty in targeting small lesions and the possibility of missing or underestimating lesions [7]. In addition, the mammography-tissue biopsy tandem does not provide detailed information (such as genetic mutations) that could be of great value in obtaining a precise diagnosis and delivering optimized therapy [7]. Collectively, these limitations suggest the untapped value of a more refined, robust, information-rich, noninvasive approach that reduces the need for repeated biopsies, unnecessary surgeries, and nonideal treatments, especially for women with dense breast tissue. In this context, liquid biopsy based on a simple noninvasive blood test is a very promising approach for investigating the tumor-derived material that circulates in the bloodstream after being shed from primary tumors and their metastatic sites [8].
Among the tumor components in bodily fluids identified during the past decade, increasing attention has been given to circulating tumor DNA (ctDNA), which is now considered useful for the early detection and management of solid tumors such as those of colorectal, prostate and lung cancers [9]. The small nucleic acid fragments known as ctDNA (approximately 134–144 bp) are associated with abnormal cell structures and altered mechanisms [10]. Prior investigations have largely shown a high concordance between the ctDNA molecular profile and traditional tumor tissue using the same testing protocols [11]. Advances in next-generation sequencing (NGS) have simplified and improved the speed of the molecular identification and testing of ctDNA genomic alterations, proving value for novel target variant identification with the potential to improve patient outcomes [12]. Molecular investigations have demonstrated that the BC patient genome includes somatic mutations and copy number alterations (CNAs) that correlate with cancer susceptibility and staging [13]. These genetic alterations can be detected in ctDNA from BC patients and thus are candidates for early BC detection and improved screening programs [14]. However, there are limited data from ctDNA molecular testing on how variant profiles differ between dense tissue subjects with positive mammograms and positive ultrasound biopsy and those with positive mammograms and negative ultrasound biopsy. In this study, we aimed to assess the differences in somatic variant profiles, including CNAs, single nucleotide polymorphisms (SNPs), and insertions/deletions (Indels), between subjects with positive mammograms and positive biopsies (pos-pos) versus subjects with positive mammograms and negative biopsies (pos-neg) using a ctDNA assay and to examine the differences in BC early detection and clinical outcomes of ctDNA testing.

Methods

Cohort

A cohort of 32 subjects with dense tissue and positive mammograms from Salah Azaiz Institute in Tunisia between June 2019 and January 2020 was recruited into the study. Clinical information was obtained through the medical records and a personal interview during sample collection. Cell-free DNA (cfDNA) sample collection was conducted after a positive mammogram but before ultrasound-assisted core needle biopsy. Microbiopsy test results were documented after confirmation by two independent physicians (radiologist and oncologist). This research was conducted through an Institutional Review Board-approved protocol (ISA/2019/04), and all subjects provided written informed consent for our study.

Sample preparation and cfDNA sequencing

Ten milliliters of peripheral blood was obtained from each subject immediately before ultrasound-guided core needle biopsy. Plasma from Streck BCT tubes was prepared within 2 h after blood collection and stored at −20 °C in the clinic until shipment to the research laboratory. cfDNA was isolated from 5 ml of plasma with a MagMAX Cell-Free DNA Isolation Kit (MM; Applied Biosystems, Thermo Fisher Scientific, Foster City, CA, USA) and then eluted in 60 µl of elution buffer according to the manufacturer’s protocol. cfDNA was quantified using a QuantiFluor dsDNA System and GloMax Discover Microplate Reader (Promega, Madison, WI, USA). The distribution of fragment lengths was checked by electrophoresis on an Agilent 2100 Bioanalyzer with a High Sensitivity Large Fragment 50 kb DNA Kit (Agilent Technologies Inc., Santa Clara, CA, USA). An NEBNext Ultra II DNA Library Prep kit (New England Biolabs, UK; E7645) was used for cfDNA whole-genome library preparation. Higher-pass whole-genome sequencing was started with 10 ng of cfDNA input (median of 5 ng). Finally, the 32 libraries were pooled and sequenced using 150 bp paired-end reads and 8 bp dual indices on an Illumina NovaSeq machine (Illumina, San Diego, CA, USA), producing cfDNA whole-genome sequences for each subject.

Pathologic assessment and subject segregation

Pathologic tissues obtained by ultrasound-guided biopsy and under mammography for the whole cohort were reviewed by designated breast pathologists from Salah Azaiz Institute in Tunisia. According to the evaluation results from standard histology and mammogram imaging, the cohort was classified into two groups: the screening group, corresponding to subjects with positive mammography and negative biopsy (pos-neg; N = 12), and the cases group, corresponding to subjects with positive mammography and positive biopsy (pos-pos; N = 20). The absence of tumoral tissue as confirmed by examination was designated a “negative” biopsy, and a designation of a “positive” biopsy was made if the sample indicated stage I or II breast malignancy according to the 8th Edition of the American Joint Committee on Cancer (AJCC) Staging Manual for breast cancer [15].

cfDNA sequence analysis

The analysis workflow performed in this study is summarized in Fig. 1. First, cfDNA whole-genome sequencing data were stored in FASTQ files and then adapter trimmed using fastp (version 0.19.10) with default settings plus the options -p and --detect_adapter_for_pe [16]. The paired-end reads were aligned with BWA (version 0.7.17-r1188) [17] to the GRCh38 human reference genome. The resulting BAM files were processed using the Picard (version 2.18.9) UmiAwareMarkDuplicatesWithMateCigar function (http://broadinstitute.github.io/picard/) to remove duplicate reads. FastQC (version 0.11.9) was run before and after adapter trimming for FASTQ quality control [18], and Picard CollectWgsMetrics was used for BAM file quality control (http://broadinstitute.github.io/picard/).

Figure 1.

Schematic representation of the analysis workflow. cfDNA cell-free DNA, QC quality control, CNA copy number alterations, SNPs single nucleotide polymorphisms, Indels Insertion/deletions, CADD combined annotation dependent depletion, UCSC University of California Santa Cruz.

CNA

ichorCNA (version 0.3.2, https://github.com/broadinstitute/ichorCNA) was then applied to all high-quality aligned reads in each subject’s BAM files to estimate the tumor-derived DNA fraction (TF) and detect CNAs, using the recommended default parameters except for those adjusted to account for low cfDNA content samples [19]. Given the absence of an established control reference CNA set for these samples, no false-positive filtering was performed. Subsequently, the detected CNAs were grouped by subject status into “pos-pos” and “pos-neg” groups. The CNAs collected for each group were filtered to include only those shared by at least 2 subjects in the group and thereafter filtered to include only alterations exclusive to that group. These pos-pos and pos-neg exclusive CNAs were separately examined with the UCSC Genome Browser [20] to determine the genes with which they overlapped. The CNA-tagged genes were then tested against the Cancer Genes set found in the Precision Oncology Knowledge Base (OncoKB, 27) to determine which cancers (if any) the genes were associated with. These CNA-tagged cancer genes were then tested against the Candidate Cancer Gene Database [21] to identify predicted associated cancers.
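
The two-step group filter described above (recurrence within a group, then exclusivity to that group) can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes, hypothetically, that each CNA is represented as a (chromosome, start, end, type) tuple, that two CNAs are "the same" only when these tuples match exactly, and that exclusivity is judged against every CNA called in any subject of the other group; the text does not specify these details. The subject IDs and coordinates are toy values.

```python
from collections import Counter

def recurrent(calls_by_subject, min_subjects=2):
    """CNAs seen in at least `min_subjects` subjects of one group."""
    counts = Counter()
    for cnas in calls_by_subject.values():
        counts.update(set(cnas))  # count each CNA at most once per subject
    return {cna for cna, n in counts.items() if n >= min_subjects}

def all_calls(calls_by_subject):
    """Every CNA called in any subject of a group."""
    return {cna for cnas in calls_by_subject.values() for cna in cnas}

def group_exclusive(group_calls, other_calls, min_subjects=2):
    """Recurrent CNAs of one group that never occur in the other group.

    Assumption: exclusivity is checked against all calls of the other
    group, not only its recurrent ones.
    """
    return recurrent(group_calls, min_subjects) - all_calls(other_calls)

# Toy data (hypothetical subjects and coordinates, for illustration only).
pos_pos = {
    "S1": [("chr12", 1000, 2000, "GAIN"), ("chr1", 100, 200, "DEL")],
    "S2": [("chr12", 1000, 2000, "GAIN"), ("chr7", 500, 900, "GAIN")],
    "S3": [("chr7", 500, 900, "GAIN")],
}
pos_neg = {
    "S4": [("chr2", 3000, 4000, "DEL"), ("chr7", 500, 900, "GAIN")],
    "S5": [("chr2", 3000, 4000, "DEL")],
}

pos_pos_exclusive = group_exclusive(pos_pos, pos_neg)
pos_neg_exclusive = group_exclusive(pos_neg, pos_pos)
print(pos_pos_exclusive)
print(pos_neg_exclusive)
```

In this toy example the chr7 gain is recurrent in the pos-pos group but also occurs in a pos-neg subject, so it is dropped from the pos-pos exclusive set. Real CNA segments rarely share exact breakpoints, so a practical implementation would likely match on overlapping bins or intervals instead.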

SNPs and indels

Grouped by pathology type (pos-pos; pos-neg), each subject’s BAM files were then analyzed with the Mutect2 tool of GATK (v. 4.1.8.1) [22] to detect somatic SNPs and Indels within the 22 autosomes against a ‘panel of normals’ created from the 1000 Genomes project [23] and the gnomAD [24] database as a ‘germline-resource’ included in the GATK resource bundle (https://console.cloud.google.com/storage/browser/genomics-public-data/resources/broad/hg38/v0). Identified variants were then filtered using GATK FilterMutectCalls [22] with the recommended default parameters and thereafter annotated using ANNOVAR [25]. Variants with a minor allele frequency (MAF) ≥ 1% in the 1000 Genomes and ExAC databases were excluded [26]. Subsequently, candidate variants without a predicted deleterious nature were removed from consideration. To detect deleterious mutations, all variants were ranked using the CADD database (version 1.6), and those with a PHRED-scaled score of > 10 were considered as having a probable deleterious function and retained in their respective pos-pos and pos-neg grouped collection [27]. For coding variants, the deleterious nature was predicted by MutationTaster [28], PolyPhen V2 [29], Provean [30], and SIFT [31], provided by the dbNSFP database (version 4.1) [32]. The grouped variants predicted to be deleterious by at least three of the four prediction engines were retained. For noncoding variants, the designation of ‘deleterious’ was assigned after application of SNPNexus [33] and a threshold of FunSeq2 score ≥ 1.5 [34]. The coding and noncoding deleterious variants were then collected into the pos-pos and pos-neg groupings. As with the candidate cfDNA CNAs, candidate cfDNA SNPs and Indels were filtered to include those appearing in at least two individuals within the group and thereafter exclusive to either the pos-pos or the pos-neg group. These pos-pos and pos-neg exclusive variants were then used to identify their associated genes, and the cancer associations of these genes were determined using the Candidate Cancer Gene Database [21].
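
The thresholds in this section (MAF below 1%, CADD PHRED score above 10, at least three of four predictors for coding variants, FunSeq2 score of at least 1.5 for noncoding variants) can be sketched as a small filter. This is an illustrative sketch, not the study pipeline. The record layout and key names (`maf_1000g`, `maf_exac`, `cadd_phred`, `calls`, `funseq2`) are hypothetical, the label "D" for a deleterious call is an assumption, and a missing population frequency is treated as rare, which the text does not state.

```python
PREDICTORS = ("MutationTaster", "PolyPhen2", "PROVEAN", "SIFT")

def is_rare(variant, max_maf=0.01):
    """Keep variants with MAF < 1% in both databases.

    Assumption: a missing frequency (None) counts as rare.
    """
    freqs = (variant.get("maf_1000g"), variant.get("maf_exac"))
    return all(f is None or f < max_maf for f in freqs)

def is_deleterious(variant):
    """Apply the CADD cutoff, then the coding or noncoding rule."""
    if variant["cadd_phred"] <= 10:
        return False
    if variant["coding"]:
        # Coding: at least 3 of the 4 predictors must call it deleterious.
        votes = sum(variant["calls"].get(p) == "D" for p in PREDICTORS)
        return votes >= 3
    # Noncoding: FunSeq2 score threshold.
    return variant["funseq2"] >= 1.5

def filter_variants(variants):
    return [v for v in variants if is_rare(v) and is_deleterious(v)]

# Toy records (hypothetical values, for illustration only).
toy = [
    {"id": "v1", "coding": True, "maf_1000g": 0.0004, "maf_exac": None,
     "cadd_phred": 24.0,
     "calls": {"MutationTaster": "D", "PolyPhen2": "D", "PROVEAN": "D", "SIFT": "T"}},
    {"id": "v2", "coding": True, "maf_1000g": 0.02, "maf_exac": 0.015,
     "cadd_phred": 30.0,
     "calls": {"MutationTaster": "D", "PolyPhen2": "D", "PROVEAN": "D", "SIFT": "D"}},
    {"id": "v3", "coding": False, "maf_1000g": None, "maf_exac": None,
     "cadd_phred": 12.5, "funseq2": 1.5},
    {"id": "v4", "coding": False, "maf_1000g": 0.003, "maf_exac": None,
     "cadd_phred": 8.0, "funseq2": 2.0},
    {"id": "v5", "coding": True, "maf_1000g": None, "maf_exac": 0.001,
     "cadd_phred": 15.0,
     "calls": {"MutationTaster": "D", "PolyPhen2": "T", "PROVEAN": "D", "SIFT": "T"}},
]

kept_ids = [v["id"] for v in filter_variants(toy)]
print(kept_ids)
```

Here v2 is dropped as too common, v4 fails the CADD cutoff, and v5 has only two deleterious calls. The group-level steps (shared by at least two subjects, exclusive to one group) would follow this per-variant filter.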

Statistical analysis

Statistical analysis was performed with R (version 3.6.2) [35]. Continuous variables are expressed as means ± SDs, while categorical data are expressed as percentages of the total. Independent-sample t tests were applied for intergroup comparisons of normally distributed continuous data, and chi-square tests were applied for categorical variables. P < 0.05 was considered statistically significant. The boxplots of tumor fraction estimates by group were created with the R package ggplot2 [36].
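
As a worked illustration of the categorical comparison, the sketch below computes a Pearson chi-square test for a 2×2 table using only the Python standard library. The study itself used R; this is merely a re-computation under the assumption that no continuity correction was applied. The counts are the hypertension row of Table 1 (6 of 20 pos-pos subjects and 1 of 12 pos-neg subjects). With one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(chi2 / 2)).

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypertension, Table 1: pos-pos 6 yes / 14 no, pos-neg 1 yes / 11 no.
chi2, p = pearson_chi2_2x2(6, 14, 1, 11)
print(f"chi2 = {chi2:.4f}, p = {p:.4f}")
```

The resulting p-value is approximately 0.151, in line with the value reported for hypertension in Table 1. With expected cell counts this small, an exact test could also be considered, although the text does not mention one.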

Ethical approval and consent to participate

All subject investigations conformed to the principles outlined in the Declaration of Helsinki and were performed under the study protocol approved by the ethics committee of Salah Azaiz Institute (SAI) (Ethics Committee registration number ISA/2019/04). All subjects were informed about the purposes of the study and consented in writing to participate.

Results

Cohort

A total of 32 women with dense breast tissue and a positive screening mammogram were recruited before microbiopsy. Detailed clinicopathological characteristics of the cohort are described in Table 1. Blood samples were acquired from all subjects for cfDNA analysis. Tumor status was confirmed by the pathology report from nodule biopsy and subsequent ultrasound. Twelve subjects with no confirmed tumors were stratified as pos-neg (age: 42.00 ± 4.73, BMI: 31.29 ± 6.53); 33.33% had a family history of nonbreast cancer. The remaining 20 subjects with confirmed tumors, 11 in stage I and 9 in stage II (age: 43.50 ± 3.95, BMI: 29.76 ± 5.07), were placed in the pos-pos group; 70% had a family history of cancer (55% nonbreast cancer and 15% breast cancer; Table 1). No significant differences were observed between groups concerning the clinicopathological parameters (Table 1).

Table 1.

Participants' characteristics (Pos-pos and Pos-neg).

Parameters | Pos-pos, N = 20 (%) | Pos-neg, N = 12 (%) | Total, N = 32 (%) | P¹
Demographic
Age (years)² | 43.50 ± 3.95 | 42.00 ± 4.73 | 42.94 ± 4.25 | 0.3673
BMI² | 29.76 ± 5.07 | 31.29 ± 6.53 | 30.33 ± 5.60 | 0.4949
Risk factors
Smoking (never/sometimes) | 19/1 | 11/1 | 30/2 | 0.7061
Alcohol use (never/sometimes) | 20/0 | 12/0 | 32/0 | NA
Clinical history
Hypertension | 6 (30.00%) | 1 (8.33%) | 7 (21.88%) | 0.1512
Hyperglycemia | 2 (10.00%) | 2 (16.67%) | 4 (12.50%) | 0.5809
Anemia | 5 (25.00%) | 2 (16.67%) | 7 (21.88%) | 0.5809
Cancer family history
Other cancer | 11 (55.00%) | 4 (33.33%) | 15 (46.88%) | 0.5153
Breast cancer | 3 (15.00%) | 0 (0.00%) | 3 (9.38%) | 0.1587
TNM classification
I | 11 | NA | NA | NA
II | 9 | NA | NA | NA

Pos-neg Positive–negative subjects, Pos-pos Positive-positive subjects, BMI Body Mass Index, TNM Tumor, Nodes, Metastases classification according to the American Joint Committee on Cancer (AJCC), NA Not Applicable.

¹ Pearson chi-square test (categorical variables), Student t-test (continuous variables). Values in bold are statistically significant (P < 0.05).

² Mean ± standard deviation.

Tumor fraction estimation

The level of tumor-derived DNA in plasma at baseline (after the positive mammogram and before microbiopsy) was estimated. Subjects were first analyzed as one group and then stratified by biopsy pathology results into four groups (pos-neg subjects, pos-pos Stage I, pos-pos Stage II and all pos-pos subjects). The TF cutoff, i.e., the lower limit of sensitivity for detecting the presence of tumor, was set to 3%, as suggested by the authors of the ichorCNA software. For the pos-neg cohort, the mean TF was 0.016 (range 0.012–0.021), and for the all pos-pos group, the mean TF was 0.018 (range 0.009–0.058). The difference in mean TF between the two groups was not statistically significant (p0 = 0.53). The pos-pos TF range was wider, suggesting greater variability among TFs in the pos-pos group than in the pos-neg group. The mean TF for the pos-pos Stage I group was 0.014 (range 0.009–0.020) versus 0.022 (range 0.013–0.058) for the pos-pos Stage II group; the differences between these groups and the pos-neg group were not significant (p1 = 0.27 and p2 = 0.28, respectively). The difference in mean TF between the pos-pos Stage I and II groups was also not statistically significant (p3 = 0.17), although the pos-pos Stage II group had a larger mean TF and contained the only subject with a TF above the 3% cutoff (Fig. 2).

Figure 2.

Distribution of tumor fraction estimation. p0 Pos-neg vs. Pos-pos, p1 Pos-neg vs. Pos-pos Stage I, p2 Pos-neg vs. Pos-pos Stage II, p3 Pos-pos Stage I vs. Pos-pos Stage II. p-value: Student t-test, Pos-Neg Positive–negative subjects, Pos-Pos Positive-positive subjects.

CNAs and associated genes

CNA analysis detected a total of 1253 CNAs across all subjects, 1105 of which were in the pos-neg group and 868 in the pos-pos group. A total of 720 CNAs were shared by both groups, 385 were found solely in the pos-neg group and 148 solely in the pos-pos group. The 1105 pos-neg CNAs were classified as gain (306), deletion (748) and amplification (51). Of the 868 pos-pos CNAs, 382 were classified as gain, 435 as deletion and 51 as amplification (Fig. 3 and Table 2). Among the pos-neg subjects, chromosomes (Chr) 1 and 2 had the highest number of CNAs, 109 and 212, respectively, while for pos-pos cases, Chr 1 and 4 had 126 and 97 CNAs, respectively (Table 2). Across the 1253 total CNAs, 90 overlapping known oncogenes were identified; 15 were associated with CNAs found in both groups, 11 of which were previously described in cancers other than BC and 4 with a known association with BC. In addition, 49 deletion CNAs were detected in pos-neg subjects; 30 overlapped with genes previously described as associated with different cancers, 3 of which were previously associated with BC. On the other hand, 26 CNAs classified as gain were detected among the pos-pos subjects; 18 of these CNAs had a potential impact on genes that were previously described as associated with different cancers, 5 of which were described in BC (Table 3).
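
The CNA totals reported in this paragraph follow from simple set arithmetic, which the short Python check below makes explicit. It only uses the counts reported in this section and in Table 2; the variable names are illustrative.

```python
# Counts reported in the text and Table 2.
pos_neg_total = 1105   # all CNAs seen in pos-neg subjects
pos_pos_total = 868    # all CNAs seen in pos-pos subjects
shared = 720           # CNAs seen in both groups

# CNAs found in only one group.
pos_neg_only = pos_neg_total - shared
pos_pos_only = pos_pos_total - shared

# Size of the union: |A or B| = |A only| + |A and B| + |B only|.
all_cnas = pos_neg_only + shared + pos_pos_only

# Per-class breakdowns (gain, deletion, amplification) from Table 2.
pos_neg_by_class = {"GAIN": 306, "DEL": 748, "AMP": 51}
pos_pos_by_class = {"GAIN": 382, "DEL": 435, "AMP": 51}

print(pos_neg_only, pos_pos_only, all_cnas)
print(sum(pos_neg_by_class.values()), sum(pos_pos_by_class.values()))
```

The group-exclusive counts come out as 385 (pos-neg) and 148 (pos-pos), the union as 1253, and the per-class counts sum to the reported group totals of 1105 and 868.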

Figure 3.

Diagram of the CNAs distribution in study groups. CNAs: Copy Number Alterations. *1: Exclusive CNAs detected in Pos-neg. *2: Shared CNAs between Pos-neg and Pos-pos. *3: Exclusive CNAs detected in Pos-pos. *4: Total CNAs for Pos-neg. *5: Total CNAs for Pos-pos. *6: Total CNAs detected in the study cohort (Pos-neg and Pos-pos). Pos-Neg: Positive–negative subjects; Pos-Pos: Positive-positive subjects.

Table 2.

Copy Number Alteration count for study subjects and stratified by subject’s group according to ichorCNA.

CNA filtering | CNA count
All subjects | 1253 (454 GAIN, 748 DEL, 51 AMP)
Pos-neg | 1105 (306 GAIN, 748 DEL, 51 AMP)
Pos-pos | 868 (382 GAIN, 435 DEL, 51 AMP)

Subject segregation | GAIN, Pos-neg | GAIN, Pos-pos | DEL, Pos-neg | DEL, Pos-pos | AMP, Pos-neg | AMP, Pos-pos
Shared by at least 2 subjects in a group | 200 | 355 | 563 | 435 | 51 | 51
Exclusive for a particular group | 72 | 148 | 313 | 0 | 0 | 0

CNA location by chromosome | Pos-neg | Pos-pos
CHR1 | 109 | 126
CHR2 | 212 | 0
CHR3 | 0 | 0
CHR4 | 97 | 97
CHR5 | 87 | 0
CHR6 | 0 | 0
CHR7 | 79 | 79
CHR8 | 88 | 88
CHR9 | 0 | 0
CHR10 | 72 | 72
CHR11 | 61 | 61
CHR12 | 0 | 17
CHR13 | 67 | 67
CHR14 | 0 | 0
CHR15 | 34 | 34
CHR16 | 64 | 64
CHR17 | 18 | 36
CHR18 | 52 | 52
CHR19 | 0 | 0
CHR20 | 35 | 35
CHR21 | 20 | 20
CHR22 | 10 | 20

CNA Copy Number Alteration, Pos-neg Positive–negative subjects, Pos-pos Positive-positive subjects, CHR chromosome, DEL Deletion, AMP Amplification, G1 Screening Subjects Group, G2 Cases Group.

Table 3.

Classification of Copy Number Alteration by gene and cancer impact according to study groups.

Gene | Genomic position | Location | CNA in Pos-neg | CNA in Pos-pos | Cancer related (CCGD) | BC related (CCGD)
JUN | 58780790_58784047 | CHR1 | DEL | Gain | Blood |
JAK1 | 64833244_65000000 | CHR1 | DEL | Gain | Liver, Blood, Colorectal, Pancreatic |
NEGR1 | 71395942_72000000 | CHR1 | DEL | Gain | Liver |
FUBP1 | 77948404_77979086 | CHR1 | DEL | Gain | Liver, Blood, Colorectal, Pancreatic, Gastric |
RBM15 | 110338505_110346681 | CHR1 | DEL | Gain | Liver, Blood, Colorectal |
VTCN1 | 117143586_117210927 | CHR1 | DEL | Gain | Pancreatic |
DDR2 | 162632463_162787405 | CHR1 | DEL | Gain | Sarcoma |
NUF2 | 163321934_163355764 | CHR1 | DEL | Gain | |
PBX1 | 164559634_164851831 | CHR1 | DEL | Gain | Gastric |
TPR | 186311651_186375253 | CHR1 | DEL | Gain | Blood, Colorectal |
CDC73 | 193121957_193254815 | CHR1 | DEL | Gain | Blood, Gastric |
PIK3C2B | 204422627_204490424 | CHR1 | DEL | Gain | Blood, Colorectal |
MDM4 | 204516405_204558120 | CHR1 | DEL | Gain | |
PGBD5 | 230314489_230426332 | CHR1 | DEL | Gain | |
FH | 241497602_241519761 | CHR1 | DEL | Gain | |
PRDM16 | 3069202_3438621 | CHR1 | NA | Gain | Blood, Colorectal, Pancreatic, Gastric |
CAMTA1 | 7000001_7769706 | CHR1 | NA | Gain | Liver, Blood, Colorectal |
SDHB | 17018721_17054170 | CHR1 | NA | Gain | |
PAX7 | 18630845_18748866 | CHR1 | NA | Gain | Colorectal |
CDC42 | 22052708_22090807 | CHR1 | NA | Gain | Liver, Blood, Colorectal, Pancreatic |
STK40 | 36339623_36385896 | CHR1 | NA | Gain | Liver, Blood, Colorectal |
CSF3R | 36466042_36483278 | CHR1 | NA | Gain | Blood, Colorectal |
RRAGC | 38838197_38859772 | CHR1 | NA | Gain | Liver, Blood, Gastric |
MPL | 43337848_43352772 | CHR1 | NA | Gain | Blood |
IGF1 | 102395873_102480645 | CHR12 | NA | Gain | Liver, Pancreatic |
DTX1 | 113057689_113098028 | CHR12 | NA | Gain | |
TBX3 | 114670254_114684175 | CHR12 | NA | Gain | |
KSR2 | 117453011_117968990 | CHR12 | NA | Gain | | BC
NCOR2 | 124324414_124495252 | CHR12 | NA | Gain | Liver, Blood, Colorectal, Pancreatic, Skin |
MAP2K4 | 12020876_12143828 | CHR17 | NA | Gain | Liver, Blood, Colorectal, Pancreatic | BC
CCT6B | 34927860_34961460 | CHR17 | NA | Gain | Blood |
COL1A1 | 50183288_50201632 | CHR17 | NA | Gain | |
HLF | 55264959_55325187 | CHR17 | NA | Gain | Liver |
MSI2 | 57256522_57684689 | CHR17 | NA | Gain | Liver, Blood, Pancreatic, Gastric, Thyroid | BC
GNA13 | 65009288_65056740 | CHR17 | NA | Gain | Liver, Colorectal, Pancreatic |
AXIN2 | 65528562_65561648 | CHR17 | NA | Gain | Colorectal, Lung, Endometrial, Bladder |
CANT1 | 79000001_79009817 | CHR17 | NA | Gain | | BC
MN1 | 27748276_27801756 | CHR22 | NA | Gain | |
GTSE1 | 46296869_46330810 | CHR22 | NA | Gain | |
MYCN | 15940549_15947004 | CHR2 | Gain | NA | |
CENPA | 26786055_26794589 | CHR2 | Gain | NA | |
PPP1CB | 28751747_28802930 | CHR2 | Gain | NA | Liver, Blood, Colorectal, Pancreatic |
ALK | 29192773_29921586 | CHR2 | Gain | NA | |
YPEL5 | 30146940_30160533 | CHR2 | Gain | NA | Liver |
EPAS1 | 46297406_46386697 | CHR2 | Gain | NA | Liver, Blood |
FANCL | 58159246_58241350 | CHR2 | Gain | NA | |
ETAA1 | 67397321_67412089 | CHR2 | Gain | NA | |
DCTN1 | 74361153_74380355 | CHR2 | Gain | NA | Colorectal, Sarcoma |
INPP4A | 98444949_98581821 | CHR2 | Gain | NA | |
SOS1 | 39000001_39121051 | CHR2 | Gain | NA | Liver, Blood |
TET3 | 74000001_74108176 | CHR2 | Gain | NA | Blood, Colorectal, Pancreatic, Gastric |
AFF3 | 100000001_100106128 | CHR2 | Gain | NA | Colorectal, Blood |
CXCR4 | 136114348_136116243 | CHR2 | DEL | NA | |
LRP1B | 140231422_141000000 | CHR2 | DEL | NA | Gastric |
ACVR2A | 147845028_147930822 | CHR2 | DEL | NA | Liver, Pancreatic, Colorectal, Gastric | BC
H3F3AP4 | 174719907_174720318 | CHR2 | DEL | NA | |
CHN1 | 174799312_175000000 | CHR2 | DEL | NA | Blood |
HOXD13 | 176092720_176095944 | CHR2 | DEL | NA | |
HOXD11 | 176104215_176109754 | CHR2 | DEL | NA | |
NFE2L2 | 177230307_177264727 | CHR2 | DEL | NA | Liver, Blood, Colorectal, Pancreatic |
PMS1 | 189784380_189877629 | CHR2 | DEL | NA | |
STAT1 | 190969033_191000000 | CHR2 | DEL | NA | Blood |
STAT4 | 191029575_191151590 | CHR2 | DEL | NA | Blood |
CREB1 | 207529891_207603431 | CHR2 | DEL | NA | Blood, Sarcoma, Colorectal, Pancreatic, Gastric |
CPS1 | 210477681_210678142 | CHR2 | DEL | NA | Liver, Colorectal |
ERBB4 | 211375716_212000000 | CHR2 | DEL | NA | Liver |
IKZF2 | 213005362_213151603 | CHR2 | DEL | NA | Blood |
BARD1 | 214725645_214809683 | CHR2 | DEL | NA | |
INHA | 219572309_219575711 | CHR2 | DEL | NA | |
PAX3 | 222200985_222298996 | CHR2 | DEL | NA | |
ACSL3 | 222861035_222944639 | CHR2 | DEL | NA | |
CUL3 | 224470149_224585363 | CHR2 | DEL | NA | Lung, Blood, Sarcoma, Colorectal, Pancreatic, Gastric | BC
IRS1 | 226731316_226799759 | CHR2 | DEL | NA | |
ACKR3 | 236569824_236582354 | CHR2 | DEL | NA | |
HDAC4 | 239048167_239400949 | CHR2 | DEL | NA | Blood, Colorectal |
DROSHA | 31400496_31532061 | CHR5 | DEL | NA | Liver |
LIFR | 38474962_38595404 | CHR5 | DEL | NA | Liver |
RICTOR | 38937919_39000000 | CHR5 | DEL | NA | Liver, Blood, Colorectal, Gastric |
MAP3K1 | 56815548_56896152 | CHR5 | DEL | NA | Liver, Pancreatic, Colorectal, Skin, Thyroid |
PIK3R1 | 68215755_68301821 | CHR5 | DEL | NA | Liver, Colorectal, Pancreatic, Gastric, Thyroid | BC
ARHGEF28 | 73626157_73941992 | CHR5 | DEL | NA | Colorectal, Pancreatic |
MEF2C | 88718240_88904257 | CHR5 | DEL | NA | Blood, Sarcoma, Skin |
ARHGAP26 | 143000001_143229011 | CHR5 | DEL | NA | Blood, Liver, Colorectal |
CSF1R | 150053290_150113372 | CHR5 | DEL | NA | Blood, Sarcoma |
PDGFRB | 150113838_150155845 | CHR5 | DEL | NA | Blood |
CD74 | 150400040_150412751 | CHR5 | DEL | NA | |
EBF1 | 158695919_159000000 | CHR5 | DEL | NA | Sarcoma |
GABRA6 | 161685720_161702592 | CHR5 | DEL | NA | |
GABRA6 161685720_161702592 CHR5 DEL NA

Bold indicates genes associated with BC.

Pos-neg Positive–negative subjects, Pos-pos Positive-positive subjects, CHR CHRomosome, DEL Deletion, BC Breast Cancer, ID Identification, NA Not Applicable, CCGD Candidate Cancer Gene Database.

SNPs, indels and associated genes

A total of 1,583,400 variants (1,282,284 SNPs, 47,693 multiple nucleotide polymorphisms (MNPs) and 253,423 Indels) were identified across all subjects before MAF and CADD filtering. MAF filtering left 1,467,158 variants (1,215,768 SNPs, 47,693 MNPs and 203,697 Indels), and subsequent CADD filtering left 143,719 variants (134,929 SNPs, 2386 MNPs and 6404 Indels). Of these 143,719 variants, 9494 and 134,225 were identified as coding and noncoding variants, respectively. Of the 9494 total coding variants, 3196 were predicted to have a deleterious impact; of these, 2139 were exclusive to the pos-pos group, and 1048 were exclusive to the pos-neg group. Subsequently, 10 variants were identified as shared by at least 2 subjects, 6 for the pos-pos group and 4 for the pos-neg group. Of the 134,225 noncoding variants detected, 78,704 were exclusive to the pos-pos group, and 38,845 were exclusive to the pos-neg group. Thereafter, 3992 and 1144 variants were identified as shared by at least 2 subjects of each group, respectively, of which 12 and 3 had a FunSeq2 score ≥ 1.5 (Table 4). Functional annotation of these noncoding variants identified 7 intronic variants, 5 in pos-pos and 2 in pos-neg subjects, and 3 upstream and downstream variants, 2 in pos-pos and 1 in pos-neg subjects (Table 4). A final set of 25 variants overlapped with oncogenes. Eighteen variants were identified among the pos-pos subjects (6 coding and 12 noncoding), and 10 of these 18 variants were previously described to be associated with liver, blood, pancreatic and skin cancers; only one pos-pos variant, rs2884935, was found in a gene (KMT2C) associated with BC. Among the pos-neg subjects, 7 variants were related to oncogenes (4 coding and 3 noncoding), and 5 of these were associated with blood, colorectal and pancreatic cancers, but none were detected in breast oncogenes (Table 5).

Table 4.

Variants count with functional annotation of noncoding variants.

Variants filtering | Variant count
FilterMutectCalls | Total: 1,583,400 (SNPs: 1,282,284; MNPs: 47,693; Indels: 253,423)
AF < 0.01 in 1000G ALL and non-TCGA ExAC ALL | 1,467,158 (SNPs: 1,215,768; MNPs: 47,693; Indels: 203,697)
CADD (SNPs) or CADD Indel (indels) scaled Phred score > 10 | 143,719 (SNPs: 134,929; MNPs: 2386; Indels: 6404)

Variant stratification | Coding variants | Noncoding variants
Total count | 9494 | 134,225
Predicted deleterious by at least 3 of MutationTaster, PolyPhen V2, Provean and SIFT | 3196 | NA
Exclusive to a particular group | Total: (G1: 2139; G2: 1048) | Total: (G1: 78,704; G2: 38,845)
Shared by at least 2 subjects in same group | Total: (G1: 6; G2: 4) | Total: (G1: 3992; G2: 1144)
FunSeq2 score ≥ 1.5 | NA | Total: (G1: 12; G2: 3)

Functional annotation of noncoding variants (FunSeq2 score ≥ 1.5) according to ANNOVAR
Region hit (RefSeq) | G1 | G2
Intergenic | 2 | 0
Intronic | 5 | 2
ncRNA_intronic | 1 | 0
3’UTR | 0 | 0
Upstream and Downstream | 2 | 1
5’UTR | 2 | 0
ncRNA_exonic | 0 | 0

Bold indicates final variant count after filtering.

RefSeq Reference sequence database, ncRNA non-coding transcript variant, NA Not Applicable, ExAC Exome aggregation consortium, AF Allele Frequency, 1000G 1000 Genomes project for all individuals in this release, CADD Combined Annotation Dependent Depletion, SNPs Single Nucleotide Polymorphisms, Indels insertions/deletions, MNPS Multi-nucleotide Polymorphisms, PolyPhen V2 PolyPhen Version 2, G1 positive-positive subjects, G2 positive–negative subjects, SIFT Sorting Intolerant From Tolerant, PROVEAN Protein Variation Effect Analyzer.

Table 5.

Classification of detected variants by gene and cancer impact.

Genes | SNP ID | AF | Genomic structure | Functional annotation | Cancer related | BC related
Pos-pos
CNTN3 | rs139142211 | 0.0004 | Coding | EX | |
TMEM44 | rs146561237 | NA | Coding | EX | |
ANK2 | rs776254819 | NA | Coding | EX | |
SERAC1 | rs757825963 | NA | Coding | EX | Blood |
DAGLB | rs766835420 | NA | Coding | EX | Blood, Colorectal |
TNC | rs376093344 | NA | Coding | EX | |
MACF1 | NA | NA | Noncoding | INT | Liver, Blood, Pancreatic |
BATF3 | NA | NA | Noncoding | Upstream | |
NVL | NA | NA | Noncoding | INT | Blood |
FBXW4 | rs147494591 | 0.0078 | Noncoding | INT | Blood |
FANK1 | NA | NA | Noncoding | INT | Colorectal |
KCTD4 | NA | NA | Noncoding | 5’UTR | Colorectal |
SHF | NA | NA | Noncoding | Upstream | |
CAVIN1; ATP6V0A1 | rs190711126 | 0.0004 | Noncoding | Intergenic | Blood, Colorectal, Pancreatic |
HIF3A | NA | NA | Noncoding | 5’UTR | |
LOC101927050; LOC654342 | rs11883680 | NA | Noncoding | Intergenic | |
ZBTB20-AS1 | rs114892760 | 0.0032 | Noncoding | ncRNA_intronic | Liver, Blood, Pancreatic, Skin |
KMT2C | rs2884935 | NA | Noncoding | INT | Liver, Blood, Pancreatic, Colorectal, Gastric | Breast
Pos-neg
SNIP1 | rs202020647 | 0.0002 | Coding | EX | Colorectal |
ATP2A1 | rs769732457 | NA | Coding | EX | |
TBC1D10B | rs145571848 | NA | Coding | EX | Blood, Colorectal |
EVPL | rs201833287 | 0.0002 | Coding | EX | |
PANK1 | NA | NA | Noncoding | Upstream | Liver, Blood |
PRKCA | rs139323901 | 0.003 | Noncoding | INT | Blood, Colorectal, Pancreatic, Gastric |
RUNX2; SUPT3H | NA | NA | Noncoding | INT | Blood |

Bold indicates genes associated with BC.

AF 1000G Phase 3 all population Allele Frequency, Column in bold variant previously described as associated with cancer, BC Breast Cancer, SNP Single Nucleotide Polymorphism, Pos-neg positive–negative subjects, ID Identification, Pos-pos positive-positive subjects, rs reference SNP, INT intronic, EX EXonic, NA Not Applicable, G Group, Cancer related according to Candidate Cancer Gene Database.

Significant values are in bold.

Discussion

Multiple studies have demonstrated the significance of noninvasive ctDNA variant testing for the early detection of solid tumors and subsequent improved outcomes [37], therapy management [38], response assessment [39], and tumor resistance [40]. Short-fragment, low tumor-fraction cfDNA testing presents a challenge to early detection efforts, however. These fragments were largely investigated in clinical applications related to treatment prediction, relapse, and drug resistance [41]. Most previous studies focused on cfDNA levels as a predictive biomarker for therapeutic response in solid cancers [42]. Recently, a large-scale study based on cfDNA concentration showed that variation in the cfDNA level in plasma is not related to patient outcome and thus suggested that cfDNA concentration could not serve as a reliable biomarker for cancer management [43]. However, investigating cfDNA molecular profiles remains a viable way to evaluate their value in detecting and characterizing a patient’s cancer status. In this study, we report a combined analysis of cfDNA whole-genome profiles between subjects with positive mammograms and biopsies versus subjects with positive mammograms and negative biopsies and suggest the possible role of these differences in the early detection of BC and subsequent clinical diagnosis, precision treatment protocols, and hopefully improved outcomes.

According to our assessment of previous research, our study is the first to examine and propose a full ctDNA analysis, including CNA and SNP/Indel detection and characterization, for identifying breast tumors in dense tissue subjects before mammogram identification. We assert that such an approach, when demonstrated to be robust, could serve as a precision oncology application in early BC detection.

In this study, the mean TF (0.016 and 0.018 for the pos-neg and pos-pos groups, respectively) was lower than the 3% recommended TF cutoff. The low TFs obtained in this study may be related to the low sensitivity in detecting the presence of ctDNA in our sequenced data19. However, the TF ranges were larger in the pos-pos group than in the pos-neg group and thus are possibly a different indicator of the presence of cancer than the TF alone. In addition, a higher TF was found in pos-pos stage II than in pos-pos stage I, suggesting that the ctDNA fraction increases as a function of tumor progression. These results support the interpretation that the isolated DNA fragments were ctDNA, an interpretation consistent with previous liquid biomarker studies investigating cfDNA as an early detection and prognosis biomarker in BC44. Other studies have demonstrated the reliability of ctDNA biomarkers for cancer therapeutic decision-making, evaluating patients’ resistance to treatment45,46, and tracking tumor progression during and after therapy47,48. The results of this study identified deletion and gain CNAs exclusively found in pos-neg subjects that overlapped across 11 known oncogenes. Three of these genes, JAK1, FUBP1, and RBM15, are all associated with liver, blood, colorectal and pancreatic cancers; three, TPR, CDC73 and PIK3C2B are all associated with blood and colorectal cancers; and five, JUN, NEGR1, VTCN1, DDR2 and PBX1, are associated with blood, liver, pancreatic, sarcoma and gastric cancer, respectively. In addition, among the pos-neg subjects, three exclusive deletion CNAs overlapped with the ACVR2A, CUL3 and PIK3R1 oncogenes, which are associated with BC. Among the pos-pos subjects, five exclusive gain CNAs overlapped with the KSR2, MAP2K4, MSI2, CANT1 and MSI2 oncogenes, all previously associated with BC (Table 3). 
Differences in the detected deletion and gain CNAs associated with pos-neg and pos-pos subjects may be related to epigenetic modifications and their impact on somatic alterations leading to oncogenesis and tumor growth49. The precise differences in nucleosome positioning between tumor and normal cells have been described as actively involved in the footprints of transcription factors associated with oncogenesis detectable in cfDNA fragments50. The nuclear architecture responsible for gene structure and expression has been correlated with cfDNA nucleosome occupancies, suggesting the potential for the early-stage detection of cancer cells51. Recently, these same nucleosome footprints identified cell types shedding cfDNA whose molecular profile suggested involvement in multiple pathological states, including cancer52. cfDNA profiling was also found to be informative of tumor localization and progression53. Differential release of cfDNA was also correlated with tumor heterogeneity among patients diagnosed with similar cancers and thus could be a promising biomarker of therapy management54. The collective evidence from the current and previous studies suggests that CNAs previously described in breast tissue coupled to their presence in a ctDNA-based biopsy may play an important role in the early detection and diagnosis of BC. The SNP and Indel results identified 10 functionally important variants in the pos-pos subjects previously associated with cancer. One variant, rs757825963, was located in SERAC1, a known BC risk factor. In addition, SERAC1 is also associated with leukopenia55, and increased expression of SERAC1 has been correlated with BC risk56. SERAC1 also has a strong interaction with multiple splicing factors (hnRNP A3, hnRNP J, hnRNP G, FMRP, Fox-2) in the context of cancer prognosis and development57. 
The clear and important role of SERAC1 in splicing events suggests a likely role as an early detection liquid biopsy biomarker when coupled to the role of cfDNA variants associated with dysregulation related to epigenetics. Another identified variant, rs147494591, was found in FBXW4, which encodes an F-box protein; F-box proteins are involved in biological processes such as cell growth, division, development, differentiation, survival and death58. This variant suggests another possible molecular biomarker for early BC detection. Previous studies found that decreased expression of FBXW4 was correlated with poor survival among non-small-cell lung cancer patients59. A recent study showed that downregulation of FBXW4 favored colorectal tumor relapse and limited the survival range60. Together with the results of this study, these previous findings suggest that FBXW4 may be an important prognostic indicator in oncology. Pos-pos subject variants identified in NVL suggest a role in the dysregulation of telomere function, possibly initiating breast tumor development. Depletion of NVL was strongly associated with lower hTERT levels and decreased telomerase activity in multiple pathogeneses61. Two exclusively pos-pos variants found in known BC risk-associated genes (FANK1 and KCTD4) suggest further pos-pos cfDNA somatic association with BC risk. FANK1 was recently identified in mammalian cells as a novel binding partner of RYBP, which prevents the proteasomal degradation of polyubiquitinated FANK1 and thereby leads to the activation of the AP-1 signaling pathway and the induction of tumor cell apoptosis62. KCTD4 was reported as a tumor suppressor gene associated with leukemia or lymphoma development in an insertional mutagenesis mouse model study63. 
The deregulation of both FANK1 and KCTD4 may be a consequence of the observed somatic variants, thus suggesting another association with tumor development and their use as early detection biomarkers in a cfDNA-based assay. The two pos-pos–associated variants (rs766835420 and rs190711126), located in DAGLB and CAVIN1/ATP6V0A1, respectively, were positively associated with BC. SNPs of DAGLB have been correlated with increased DAGLB expression in stomach tissues, and DAGLB expression was also significantly elevated in gastric tumors compared to adjacent tissues, thus supporting the potential of DAGLB as a susceptibility gene for gastric cancer64. Loss of stromal CAVIN1 expression negates the ability of stromal cells to sequester lipids and is associated with the upregulation of inflammatory factors such as cytokines and their receptors, matrix metalloproteinases, and markers for cancer-associated fibroblasts (CAFs)65. Deregulation of any inflammatory microenvironment factors, such as those seen with CAVIN1, promotes aggressive cancer phenotypes, thus supporting the critical function of CAVIN1 in the stromal component in tumorigenesis and suggesting a metastasis-suppressing role for this gene66. Any deleterious variant appearing in CAVIN1 will likely contribute to lower CAVIN1 expression and loss of stromal cell function, suggesting a role in breast cancer genesis and tumor development. Other deleterious pos-pos variants found in MACF1 and ZBTB20-AS1 align with earlier studies showing that MACF1 mutations detected in tissue-specific genomes are responsible for function dysregulation associated with cancer67, and a correlation study found that key ZBTB20-AS1 lncRNAs are associated with colon tumor staging and likely tumor progression68. Finally, a pos-pos exclusive variant was associated with KMT2C, a known BC risk factor. 
In addition, KMT2C is the gene with the highest mutation count predominantly found in BC, with some mutations associated with chromatin function, affecting transcription mechanisms identified in breast tumor development69. KMT2C mutations were also shown to be key to ERα regulation, which can lead to hormone-driven breast cancer cell proliferation70. In summary, the somatic variants found in the pos-pos cases investigated in this study present a rich and highly associated set of potential biomarkers shown to affect key molecular mechanisms important to oncogenesis (and its suppression) and therefore may be putative biomarkers for early BC detection.
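As an illustration only, the notion of a group-exclusive alteration used throughout this discussion can be expressed as a set difference: a gene is exclusive to one group if it is altered in at least one subject of that group and in no subject of the other. The sketch below is not the analysis pipeline used in this study. The subject identifiers, the per-subject gene assignments, and the placeholder gene GENE_X are hypothetical and are chosen only to make the example run.

```python
from typing import Dict, Set


def exclusive_genes(group_a: Dict[str, Set[str]],
                    group_b: Dict[str, Set[str]]) -> Set[str]:
    """Genes altered in at least one subject of group_a and in no subject of group_b."""
    seen_a = set().union(*group_a.values())  # all genes seen in group A
    seen_b = set().union(*group_b.values())  # all genes seen in group B
    return seen_a - seen_b


# Hypothetical per-subject calls (NOT study data). GENE_X stands for a gene
# altered in both groups, which therefore is exclusive to neither.
pos_pos = {
    "case_01": {"KMT2C", "SERAC1", "GENE_X"},
    "case_02": {"FBXW4", "GENE_X"},
}
pos_neg = {
    "screen_01": {"PRKCA", "GENE_X"},
    "screen_02": {"SNIP1"},
}

pos_pos_only = exclusive_genes(pos_pos, pos_neg)
pos_neg_only = exclusive_genes(pos_neg, pos_pos)
print(sorted(pos_pos_only))
print(sorted(pos_neg_only))
```

Under this definition a gene shared by both groups drops out of both exclusive sets, which is why exclusivity established in a small cohort remains provisional until larger groups are examined.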

Concerning the pos-neg screening group, 6 oncogenes were identified as containing exclusive variants: SNIP1, TBC1D10B, PANK1, PRKCA, RUNX2 and SUPT3H. PRKCA has been previously identified as associated with BC and encodes a calcium-dependent protein kinase involved in multiple biological functions, including calcium ion transport, exocytosis, cell growth, and proliferation71. PRKCA is also a central signaling node and coinhibitor of the ESR1, mTORC1, and HDAC genes known to suppress breast cancer72. The collective evidence suggests that PRKCA is an important candidate for breast carcinoma stem cell management73. Two hypotheses suggest a role for PRKCA somatic variants in the absence of cancer in pos-neg subjects. First, these variants may have a protective effect against BC oncogenesis via the modulation of PRKCA expression, thus delaying if not stopping tumor development and growth.

Despite the notable results, there are limitations to be acknowledged. This is a small subject study, and a large cohort study must follow to validate these results and thereby challenge the robustness of the proposed biomarkers. Additionally, it is important that an additional study be performed with healthy control subjects (neg-neg) to test for any BC-associated cfDNA variants. These studies should also include normal tissue (from all subjects) and tumor tissue samples (from pos-pos cases) to validate the cfDNA profile against the tumor profile, thus confirming that cfDNA is actually ctDNA. TF levels must also be tested against presence and staging to further validate the use of TF range and low TF to confirm tumor presence and absence. Some detected variants in the pos-pos case group were previously detected in non-BC tumors. This result raises the possibility that such ctDNA variations may be present due to genome disorder, suggesting that these may not be valid biomarkers for BC.
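The TF validation proposed above could begin with a simple per-group summary of the mean, the range, and the number of samples at or above the 3% cutoff. The following is a minimal sketch under stated assumptions: the TF values are placeholders and not data from this study, and the function name and output fields are hypothetical.

```python
from statistics import mean
from typing import Dict, List

TF_CUTOFF = 0.03  # the 3% recommended TF cutoff referred to in the Discussion


def summarize_tf(values: List[float]) -> Dict[str, float]:
    """Return the mean, the range width, and the count of samples at or above the cutoff."""
    if not values:
        raise ValueError("at least one TF value is required")
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "n_at_or_above_cutoff": sum(v >= TF_CUTOFF for v in values),
    }


# Placeholder tumor fractions (NOT study data), keyed by subject group.
tf_by_group = {
    "pos-neg": [0.010, 0.015, 0.020],
    "pos-pos": [0.005, 0.020, 0.035],
}

summary = {group: summarize_tf(vals) for group, vals in tf_by_group.items()}
for group, stats in summary.items():
    print(group, stats)
```

Reporting the range alongside the mean reflects the observation that the two groups had similar mean TFs but different TF ranges; a formal comparison would require a larger cohort and an appropriate statistical test.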

Conclusions

Early breast cancer detection is of paramount importance in managing the most common cancer worldwide. Any bioassay suggested to be a robust test of early BC must be precise, repeatable, inexpensive and preferably noninvasive to replace the standard mammogram-biopsy protocol for BC diagnosis, but at this time, no such bioassay exists. Studies such as this in dense tissue subjects provide promising evidence that a low-TF (thus providing early detection), noninvasive, robust bioassay may be available through cfDNA molecular testing. The presented results are the first to describe a coupled analysis of CNA and SNP/Indel identification using cfDNA profiles for breast cancer early detection. Before these promising results can be used in the development of a panel of biomarkers for a biopsy, further understanding of early breast tumor biology and of the mechanisms that lead to tumor progression is greatly needed to identify the molecular biomarkers to be used with such a highly informative assay. The molecular profiling and analysis workflow performed in this study on cfDNA taken from early screened and confirmed BC subjects presents promising results contributing to the knowledge required to create such a liquid biopsy test. Further investigations building on this work are needed to confirm the results of this study, test the putative cfDNA molecular biomarkers and confirm their validity for inclusion in an early BC detection bioassay. In this way, these biomarkers could contribute to significant improvements in BC diagnosis and therefore to improved treatment optimization and subsequent outcomes, reducing the devastating incidence and mortality of breast cancer.

Acknowledgements

We thank all blood donors who participated in the present study. We express our thanks to Drs. Eduardo J. Simoes and Balkiss Bouhaouala-Zahar for their excellent assistance with experiments, discussion of results and suggested ideas for consideration.

Abbreviations

cfDNA: Cell-free DNA
BC: Breast cancer
CNAs: Copy number alterations
ctDNA: Circulating tumor DNA
NGS: Next-generation sequencing
SNPs: Single nucleotide polymorphisms
Indels: Insertions/deletions
MM: MagMAX
TF: Tumor fraction
MAF: Minor allele frequency
MNPs: Multiple nucleotide polymorphisms

Author contributions

M.B.: Participated in study design, carried out the study and managed all project study participants who aided with experiments, patient consenting and chart, data review, manuscript preparation and data analysis. A.A.M.: Data analysis and processing, variant calling, and manuscript editing. E.G.: Participated in study design, data processing, sequencing alignment and manuscript editing. A.M.: Clinical data acquisition, update and review. A.Z.: Patient recruitment and confirmation of patients' pathological reports. N.B.: Discussion of results and review of manuscript. P.J.T.: Project principal investigation, original idea, study concept and design, guided overall study analysis, discussion of results, supervised the bioinformatics and statistical data analysis and interpretation, review of manuscript. All authors read and approved the final manuscript.

Funding

This work was supported by funding provided by the Center for Biomedical Informatics, School of Medicine, University of Missouri, Columbia.

Data availability

The datasets used and analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests

Erik Gafni and Nathan Boley are employees of Ravel Biotechnology Startup. The remaining authors have no conflict.

Footnotes

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  • 1.20-Breast-fact-sheet.pdf.
  • 2.Cancer Tomorrow [Internet]. [cited 2021 Feb 5]. Available from: https://gco.iarc.fr/tomorrow/en
  • 3.Lei S, et al. Global patterns of breast cancer incidence and mortality: A population-based cancer registry data analysis from 2000 to 2020. Cancer Commun. 2021;41:1183–1194. doi: 10.1002/cac2.12207. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Kahale LA, Ouertatani H, Brahem AB, Grati H, Hamouda MB, Saz-Parkinson Z, et al. Contextual differences considered in the Tunisian ADOLOPMENT of the European Guidelines on Breast Cancer Screening [Internet]. In Review; 2020 Sep [cited 2021 Feb 5]. Available from: https://www.researchsquare.com/article/rs-72256/v1 [DOI] [PMC free article] [PubMed]
  • 5.Wang L. Early diagnosis of breast cancer. Sensors. 2017;17:1572. doi: 10.3390/s17071572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Identification and validation of plasma biomarkers for diagnosis of breast cancer in South Asian women | Scientific Reports. https://www.nature.com/articles/s41598-021-04176-w. [DOI] [PMC free article] [PubMed]
  • 7.Peled M, Agassi R, Czeiger D, Ariad S, Riff R, Rosenthal M, et al. Cell-free DNA concentration in patients with clinical or mammographic suspicion of breast cancer. Sci. Rep. 2020;10(1):14601. doi: 10.1038/s41598-020-71357-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Tzanikou E, Lianidou E. The potential of ctDNA analysis in breast cancer. Crit. Rev. Clin. Lab. Sci. 2020;57(1):54–72. doi: 10.1080/10408363.2019.1670615. [DOI] [PubMed] [Google Scholar]
  • 9.Song Q, Zhang Y, Liu H, Du Y. Potential of using cell-free DNA and miRNA in breast milk to screen early breast cancer. Biomed. Res. Int. 2020;3(2020):1–11. doi: 10.1155/2020/8126176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Stewart CM, Kothari PD, Mouliere F, Mair R, Somnay S, Benayed R, et al. The value of cell-free DNA for molecular pathology. J. Pathol. 2018;244(5):616–627. doi: 10.1002/path.5048. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Toor OM, Ahmed Z, Bahaj W, Boda U, Cummings LS, McNally ME, et al. Correlation of somatic genomic alterations between tissue genomics and ctDNA employing next-generation sequencing: Analysis of lung and gastrointestinal cancers. Mol. Cancer Ther. 2018;17(5):1123–1132. doi: 10.1158/1535-7163.MCT-17-1015. [DOI] [PubMed] [Google Scholar]
  • 12.Horak P, Fröhling S, Glimm H. Integrating next-generation sequencing into clinical oncology: Strategies, promises and pitfalls. ESMO Open. 2016;1(5):e000094. doi: 10.1136/esmoopen-2016-000094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Azim HA, Nguyen B, Brohée S, Zoppoli G, Sotiriou C. Genomic aberrations in young and elderly breast cancer patients. BMC Med. 2015;13(1):266. doi: 10.1186/s12916-015-0504-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Clifton K, Luo J, Tao Y, Saam J, Rich T, Roshal A, et al. Mutation profile differences in younger and older patients with advanced breast cancer using circulating tumor DNA (ctDNA) Breast Cancer Res Treat. 2020 doi: 10.1007/s10549-020-06019-0. [DOI] [PubMed] [Google Scholar]
  • 15.Giuliano AE, Edge SB, Hortobagyi GN. Eighth edition of the AJCC cancer staging manual: Breast cancer. Ann. Surg. Oncol. 2018;25(7):1783–1785. doi: 10.1245/s10434-018-6486-6. [DOI] [PubMed] [Google Scholar]
  • 16.fastp: an ultra-fast all-in-one FASTQ preprocessor | Bioinformatics | Oxford Academic. https://academic.oup.com/bioinformatics/article/34/17/i884/5093234?login=true. [DOI] [PMC free article] [PubMed]
  • 17.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Home - BioBam - Bioinformatics Made Easy. BioBam. https://www.biobam.com/.
  • 19.Adalsteinsson VA, et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 2017;8:1324. doi: 10.1038/s41467-017-00965-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Navarro Gonzalez J, et al. The UCSC genome browser database: 2021 update. Nucl. Acids Res. 2021;49:1046–1057. doi: 10.1093/nar/gkaa1070. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21.Abbott KL, et al. The candidate cancer gene database: A database of cancer driver genes from forward genetic screens in mice. Nucleic Acids Res. 2015;43:844–848. doi: 10.1093/nar/gku770. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.McKenna A, et al. The genome analysis toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74. doi: 10.1038/nature15393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.The mutational constraint spectrum quantified from variation in 141,456 humans | Nature. https://www.nature.com/articles/s41586-020-2308-7. [DOI] [PMC free article] [PubMed]
  • 25.Wang K, Li M, Hakonarson H. ANNOVAR: Functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164–e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Lek M, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291. doi: 10.1038/nature19057. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 2014;46(3):310–315. doi: 10.1038/ng.2892. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28.Schwarz JM, Rödelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat. Methods. 2010;7(8):575–576. doi: 10.1038/nmeth0810-575. [DOI] [PubMed] [Google Scholar]
  • 29.Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nat. Methods. 2010;7(4):248–249. doi: 10.1038/nmeth0410-248. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Choi Y, Chan AP. PROVEAN web server: A tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. 2015;31(16):2745–2747. doi: 10.1093/bioinformatics/btv195. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31(13):3812–3814. doi: 10.1093/nar/gkg509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Liu X, Li C, Mou C, Dong Y, Tu Y. dbNSFP v4: A comprehensive database of transcript-specific functional predictions and annotations for human nonsynonymous and splice-site SNVs. Genome Med. 2020;12(1):103. doi: 10.1186/s13073-020-00803-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Oscanoa J, et al. SNPnexus: A web server for functional annotation of human genome sequence variation (2020 update) Nucleic Acids Res. 2020;48:W185–W192. doi: 10.1093/nar/gkaa420. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Fu Y, Liu Z, Lou S, Bedford J, Mu XJ, Yip KY, et al. FunSeq2: A framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 2014;15(10):480. doi: 10.1186/s13059-014-0480-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Bunn, A., Korpela, M. Crossdating in dplR. :12.
  • 36.Villanueva RAM, Chen ZJ. ggplot2: Elegant graphics for data analysis. Meas. Interdiscip. Res. Perspect. 2019;17(3):160–167. doi: 10.1080/15366367.2019.1565254. [DOI] [Google Scholar]
  • 37.Chera BS, Kumar S, Shen C, Amdur R, Dagan R, Green R, et al. Plasma circulating tumor HPV DNA for the surveillance of cancer recurrence in HPV-associated oropharyngeal cancer. J. Clin. Oncol. 2020;38(10):1050–1058. doi: 10.1200/JCO.19.02444. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Tie J, Cohen JD, Wang Y, Christie M, Simons K, Lee M, et al. Circulating tumor DNA analyses as markers of recurrence risk and benefit of adjuvant therapy for stage III colon cancer. JAMA Oncol. 2019;5(12):1710–1717. doi: 10.1001/jamaoncol.2019.3616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Christensen E, Birkenkamp-Demtröder K, Sethi H, Shchegrova S, Salari R, Nordentoft I, et al. Early detection of metastatic relapse and monitoring of therapeutic efficacy by ultra-deep sequencing of plasma cell-free DNA in patients with urothelial bladder carcinoma. JCO. 2019;37(18):1547–1557. doi: 10.1200/JCO.18.02052. [DOI] [PubMed] [Google Scholar]
  • 40.Horn L, Whisenant JG, Wakelee H, Reckamp KL, Qiao H, Leal TA, et al. Monitoring therapeutic response and resistance: Analysis of circulating tumor DNA in patients with ALK+ lung cancer. J. Thorac. Oncol. 2019;14(11):1901–1911. doi: 10.1016/j.jtho.2019.08.003. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Kilgour E, Rothwell DG, Brady G, Dive C. Liquid biopsy-based biomarkers of treatment response and resistance. Cancer Cell. 2020;37(4):485–495. doi: 10.1016/j.ccell.2020.03.012. [DOI] [PubMed] [Google Scholar]
  • 42.Kumar S, Guleria R, Singh V, Bharti AC, Mohan A, Das BC. Plasma DNA level in predicting therapeutic efficacy in advanced nonsmall cell lung cancer. Eur. Respir. J. 2010;36(4):885–892. doi: 10.1183/09031936.00187909. [DOI] [PubMed] [Google Scholar]
  • 43.Pan S, Xia W, Ding Q, Shu Y, Xu T, Geng Y, et al. Can plasma DNA monitoring be employed in personalized chemotherapy for patients with advanced lung cancer? Biomed. Pharmacother. 2012;66(2):131–137. doi: 10.1016/j.biopha.2011.11.022. [DOI] [PubMed] [Google Scholar]
  • 44.Li BT, Drilon A, Johnson ML, Hsu M, Sima CS, McGinn C, et al. A prospective study of total plasma cell-free DNA as a predictive biomarker for response to systemic therapy in patients with advanced non-small-cell lung cancers†. Ann. Oncol. 2016;27(1):154–159. doi: 10.1093/annonc/mdv498. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45.Fernandez-Garcia D, Hills A, Page K, Hastings RK, Toghill B, Goddard KS, et al. Plasma cell-free DNA (cfDNA) as a predictive and prognostic marker in patients with metastatic breast cancer. Breast Cancer Res. 2019;21(1):149. doi: 10.1186/s13058-019-1235-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Choudhury, A.D., Werner, L., Francini, E., Wei, X.X., Ha, G., Freeman, S.S., et al. Tumor fraction in cell-free DNA as a biomarker in prostate cancer. JCI Insight [Internet]. [cited 2021 Feb 5];3(21). Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6238737/ [DOI] [PMC free article] [PubMed]
  • 47.Housman G, Byler S, Heerboth S, Lapinska K, Longacre M, Snyder N, et al. Drug resistance in cancer: An overview. Cancers. 2014;6(3):1769–1792. doi: 10.3390/cancers6031769. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Ng SB, Chua C, Ng M, Gan A, Poon PS, Teo M, et al. Individualised multiplexed circulating tumour DNA assays for monitoring of tumour presence in patients after colorectal cancer surgery. Sci. Rep. 2017;7(1):40737. doi: 10.1038/srep40737. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Conteduca V, Wetterskog D, Scarpi E, Romanel A, Gurioli G, Jayaram A, et al. Plasma tumour DNA as an early indicator of treatment response in metastatic castration-resistant prostate cancer. Br. J. Cancer. 2020;123(6):982–987. doi: 10.1038/s41416-020-0969-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Sun K, Jiang P, Chan KCA, Wong J, Cheng YKY, Liang RHS, et al. Plasma DNA tissue mapping by genome-wide methylation sequencing for noninvasive prenatal, cancer, and transplantation assessments. PNAS. 2015;112(40):E5503–E5512. doi: 10.1073/pnas.1508736112. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Kang, H., Hata, A. Chapter six-control of drosha-mediated microRNA maturation by smad proteins. In: Guo F, Tamanoi F, editors. The Enzymes [Internet]. Academic Press; 2012 [cited 2021 Feb 5]. p. 123–36. (Eukaryotic RNases and their Partners in RNA Degradation and Biogenesis, Part B; vol. 32). Available from: https://www.sciencedirect.com/science/article/pii/B9780124047419000064
  • 52.Chromatin - an overview | ScienceDirect Topics [Internet]. [cited 2021 Feb 5]. Available from: https://www.sciencedirect.com/topics/neuroscience/chromatin
  • 53.Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA comprises an in vivo nucleosome footprint that informs its tissues-of-origin. Cell. 2016;164(1):57–68. doi: 10.1016/j.cell.2015.11.050. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54.Gundem G, Van Loo P, Kremeyer B, Alexandrov LB, Tubio JMC, Papaemmanuil E, et al. The evolutionary history of lethal metastatic prostate cancer. Nature. 2015;520(7547):353–357. doi: 10.1038/nature14347. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55.Brastianos PK, Carter SL, Santagata S, Cahill DP, Taylor-Weiner A, Jones RT, et al. Genomic characterization of brain metastases reveals branched evolution and potential therapeutic targets. Cancer Discov. 2015;5(11):1164–1177. doi: 10.1158/2159-8290.CD-15-0369. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56.Svedberg A, Björn N, Sigurgeirsson B, Pradhananga S, Brandén E, Koyi H, et al. Genetic association of gemcitabine/carboplatin-induced leukopenia and neutropenia in non-small cell lung cancer patients using whole-exome sequencing. Lung Cancer. 2020;1(147):106–114. doi: 10.1016/j.lungcan.2020.07.005. [DOI] [PubMed] [Google Scholar]
  • 57.Kar SP, Beesley J, Olama AAA, Michailidou K, Tyrer J, Kote-Jarai Z, et al. Genome-wide meta-analyses of breast, ovarian, and prostate cancer association studies identify multiple new susceptibility loci shared by at least two cancer types. Cancer Discov. 2016;6(9):1052–1067. doi: 10.1158/2159-8290.CD-15-1227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58.Zheng, Y., Shen, Z., Fan, Z., Wang, W., Geng, Q., Kan, Q., et al. Alternative splicing events and subtype analysis of esophageal cancer [Internet]. In Review; 2020 Oct [cited 2021 Feb 5]. Available from: https://www.researchsquare.com/article/rs-80935/v1
  • 59.Skaar JR, Pagan JK, Pagano M. Mechanisms and function of substrate recruitment by F-box proteins. Nat. Rev. Mol. Cell Biol. 2013;14(6):369–381. doi: 10.1038/nrm3582. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60.Lockwood WW, Chandel SK, Stewart GL, Erdjument-Bromage H, Beverly LJ. The novel ubiquitin ligase complex, SCFFbxw4, interacts with the COP9 signalosome in an F-box dependent manner, is mutated, lost and under-expressed in human cancers. PLoS ONE. 2013;8(5):e63610. doi: 10.1371/journal.pone.0063610. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 61.Zhang Y, Sun L, Wang X, Sun Y, Chen Y, Xu M, et al. FBXW4 acts as a protector of FOLFOX-based chemotherapy in metastatic colorectal cancer identified by co-expression network analysis. Front. Genet. 2020 doi: 10.3389/fgene.2020.00113/full?report=reader. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62.Wang M, Chen J, He K, Wang Q, Li Z, Shen J, et al. The NVL gene confers risk for both major depressive disorder and schizophrenia in the Han Chinese population. Prog. Neuropsychopharmacol. Biol. Psychiatry. 2015;1(62):7–13. doi: 10.1016/j.pnpbp.2015.04.001. [DOI] [PubMed] [Google Scholar]
  • 63.Ma W, Zhang X, Li M, Ma X, Huang B, Chen H, et al. Proapoptotic RYBP interacts with FANK1 and induces tumor cell apoptosis through the AP-1 signaling pathway. Cell. Signal. 2016;28(8):779–787. doi: 10.1016/j.cellsig.2016.03.012. [DOI] [PubMed] [Google Scholar]
  • 64.Jofra Hernández R, Calabria A, Sanvito F, De Mattia F, Farinelli G, Scala S, et al. Hematopoietic tumors in a mouse model of X-linked chronic granulomatous disease after lentiviral vector-mediated gene therapy. Mol. Ther. 2021;29(1):86–102. doi: 10.1016/j.ymthe.2020.09.030. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Ni J, Deng B, Zhu M, Wang Y, Yan C, Wang T, et al. Integration of GWAS and eQTL analysis to identify risk loci and susceptibility genes for gastric cancer. Front Genet. 2020 doi: 10.3389/fgene.2020.00679/full?report=reader. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.Low J-Y, Brennen WN, Meeker AK, Ikonen E, Simons BW, Laiho M. Stromal CAVIN1 controls prostate cancer microenvironment and metastasis by modulating lipid distribution and inflammatory signaling. Mol. Cancer Res. 2020;18(9):1414–1426. doi: 10.1158/1541-7786.MCR-20-0364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Full article: Single nucleotide polymorphism mutation related genes in bladder cancer for the treatment of patients: a study based on the TCGA database [Internet]. [cited 2021 Feb 5]. Available from: https://www.tandfonline.com/doi/full/10.1080/13102818.2020.1864231
  • 68.Qian W, Feng Y, Li J, Peng W, Gu Q, Zhang Z, et al. Construction of ceRNA networks reveals differences between distal and proximal colon cancers. Oncol. Rep. 2019;41(5):3027–3040. doi: 10.3892/or.2019.7083. [DOI] [PubMed] [Google Scholar]
  • 69.Argyri, M., Viktor, L., Malin, M., Arendt, M.L., Jessika, N., et al. Targeted sequencing reveals the somatic mutation landscape in a Swedish breast cancer cohort. Scientific Reports (Nature Publisher Group) [Internet]. 2020 [cited 2021 Feb 5];10(1). Available from: https://search.proquest.com/docview/2471554712/abstract/45F07C679DAC4029PQ/1 [DOI] [PMC free article] [PubMed]
  • 70.Gala K, Li Q, Sinha A, Razavi P, Dorso M, Sanchez-Vega F, et al. KMT2C mediates the estrogen dependence of breast cancer through regulation of ERα enhancer function. Oncogene. 2018;37(34):4692–4710. doi: 10.1038/s41388-018-0273-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71.Chen J, Wu F, Shi Y, Yang D, Xu M, Lai Y, et al. Identification of key candidate genes involved in melanoma metastasis. Mol. Med. Rep. 2019;20(2):903–914. doi: 10.3892/mmr.2019.10314. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72.Sulaiman A, McGarry S, Lam KM, El-Sahli S, Chambers J, Kaczmarek S, et al. Co-inhibition of mTORC1, HDAC and ESR1α retards the growth of triple-negative breast cancer and suppresses cancer stem cells. Cell Death Dis. 2018;9(8):1–14. doi: 10.1038/s41419-018-0811-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73.Zhang Z, Chen X, Zhang J, Dai X. Cancer stem cell transcriptome landscape reveals biomarkers driving breast carcinoma heterogeneity. Breast Cancer Res. Treat. 2021 doi: 10.1007/s10549-020-06045-y. [DOI] [PubMed] [Google Scholar]

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
