PLOS ONE. 2021 Jan 11;16(1):e0236907. doi: 10.1371/journal.pone.0236907

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

Hideki Tokunaga 1,2, Keita Iida 3,4, Atsushi Hozawa 3, Soichi Ogishima 2,3, Yoh Watanabe 5, Shogo Shigeta 1, Muneaki Shimada 1,2, Yumi Yamaguchi-Kabata 3, Shu Tadaka 3, Fumiki Katsuoka 3,6, Shin Ito 3,7,8, Kazuki Kumada 2,3, Yohei Hamanaka 3, Nobuo Fuse 3, Kengo Kinoshita 2,3,7,8,9, Masayuki Yamamoto 2,3,6, Nobuo Yaegashi 1,3, Jun Yasuda 3,7,8,*
Editor: Yonglan Zheng 10
PMCID: PMC7799847  PMID: 33428613

Abstract

Identification of the population frequencies of definitely pathogenic germline variants in two major hereditary breast and ovarian cancer syndrome (HBOC) genes, BRCA1/2, is essential to estimate the number of HBOC patients. In addition, the identification of moderately penetrant HBOC gene variants that contribute to increasing the risk of breast and ovarian cancers in a population is critical to establish personalized health care. A prospective cohort subjected to genome analysis can provide both sets of information. Computational scoring and prospective cohort studies may help to identify such likely pathogenic variants in the general population. Using InterVar software, we annotated the variants in the BRCA1 and BRCA2 genes from a dataset of 3,552 whole-genome sequences obtained from members of the prospective cohorts of the Tohoku Medical Megabank Project (TMM). Computational impact scores (CADD_phred and Eigen_raw) and minor allele frequencies (MAFs) of pathogenic (P) and likely pathogenic (LP) variants in ClinVar were used as filtration criteria. Familial predispositions to cancers among the 35,000 TMM genome cohort participants were analyzed to verify the identified pathogenicity. Seven potentially pathogenic variants were newly identified. According to self-reported cancer histories, the sisters of carriers of these moderately deleterious variants and of definite P and LP variants among members of the TMM prospective cohort showed a statistically significant preponderance for cancer onset. Filtering by computational scoring and MAF is useful to identify potentially pathogenic variants in BRCA genes in the Japanese population. These results should help to follow up the carriers of variants of uncertain significance in the HBOC genes in the longitudinal prospective cohort study.

Introduction

Since the precision medicine initiative was launched in 2015 by the US government [1], prediction of the disease risks of individuals by using their genomic information has become plausible in a clinical setting. In Japan, gene profiling assays for cancer tissues and companion diagnostic tests for cancer-predisposing genes are now covered by the national health insurance system. These gene profiling tests can examine variations in most of the genes conferring susceptibility to two major adult-onset hereditary cancer-predisposing syndromes, hereditary breast and ovarian cancer syndrome (HBOC) and Lynch syndrome. Nowadays, the clinical significance of variants of these genes is important for patient care and the health of their relatives at the bedside.

The correct judgment of the pathogenicity of germline variants in these cancer-predisposing genes is critical for the physicians who manage such patients and undertake gene profiling analyses for cancer treatment. For example, in carriers of disease-causing mutations of HBOC, prophylactic surgery is beneficial [2, 3]. Testing of BRCA genes may help carriers’ decision-making regarding prophylactic salpingectomy or salpingo-oophorectomy because, in patients with high-grade serous carcinoma arising from the fallopian tube, germline BRCA mutations are more prevalent in Japanese women than in other ethnic groups [4, 5]. Synthetic lethal drugs for cancers associated with homologous recombination defects are available for patients carrying disease-causing mutations of HBOC [6, 7]. In this context, variants of uncertain significance (VUSs) would clearly be a source of major problems for clinicians. Kurian et al. reported that inexperienced breast surgeons tend to manage patients with VUSs in the BRCA1 or BRCA2 gene as pathogenic HBOC mutation carriers [8]. This means that the lack of comprehensive annotation methods for variants might cause overdiagnosis or overtreatment in patients with BRCA mutations that are uncharacterized but actually benign.

To overcome these difficulties, studies have been conducted at several levels (single organization, single nation, and worldwide). As a single-organization study, Sugano et al. reported the BRCA1 and BRCA2 germline variants in 135 HBOC patients and identified 28 pathogenic ones [9]. As a nationwide study, Arai et al. examined 830 Japanese HBOC pedigrees collected by the Japanese HBOC consortium and identified 49 different pathogenic variants among them [10]. Similarly, a nationwide multicenter study revealed that germline BRCA1/2 mutations were present in 14.7% of 634 Japanese women with ovarian cancer [5]. Lee et al. also examined the variants in the BRCA1 and BRCA2 genes in the germline genomic DNA of breast and ovarian cancer patients and calculated posterior probabilities for the disease-causing mutations; they identified five previously unreported variants as candidate pathogenic ones [11]. Finally, as an international study, the BRCA Challenge project established an open-access database, BRCA Exchange, to provide reliable and easily accessible variant data for better clinical treatment of HBOC [12]. As of October 2020, the BRCA Exchange database has collected more than 40,000 variants in the BRCA1/2 genes from major clinical databases and estimated their pathogenicity under expert peer review in collaboration with the ENIGMA consortium [13]. The purposes of this comprehensive database are to provide reliable and easily accessible variant data interpreted for the high-penetrance phenotype of HBOC and to develop a model database for the utilization and sharing of public data to provide better clinical treatment of hereditary disease. In this database, more than 4,900 variants are annotated as “pathogenic” by the ENIGMA consortium.
Recently, a large-scale Japanese project that sequenced 11 breast cancer-predisposing genes in the germline genomic DNA of HBOC patients revealed 134 pathogenic germline variants in the BRCA1 and BRCA2 genes that were concentrated in cancer patients [14]. Patient-based studies are very effective for identifying potential germline pathogenic variants, but they cannot estimate the frequencies of those alleles in the general population, which is critical for estimating the number of HBOC patients in a community. In addition, moderately deleterious HBOC gene variants contribute to increasing the risk of breast and ovarian cancers in a population, so identifying them is critical for establishing personalized health care. The carriers of moderately deleterious HBOC variants would not undergo drastic prophylactic modalities, but frequent examination would be advisable for earlier detection of the cancers. A prospective cohort subjected to genome analysis would provide both sets of information.

Only analyses of prospective cohorts of the general population can confirm the causality of VUSs via the collection of follow-up data and using the precise minor allele frequencies. However, in the case of follow-up surveys in prospective cohorts, it is critical to focus on the participants who need to be carefully followed up because of the limitation of available resources [15]. An appropriate method to select participants for detailed follow-up studies is critical for analyzing the causalities of germline VUSs in cancer-predisposing genes.

Here, we describe the levels of known and potentially disease-causing variants in the BRCA genes among the general Japanese population, by analyzing a whole-genome reference panel for the Japanese characterized by the Tohoku Medical Megabank (TMM) Project. The TMM Project involves a combination of prospective cohort, biobanking, and genome-omics analysis (for reviews, see [16–19]). The dataset collected so far includes more than 3,500 independent whole-human-genome sequences (3.5KJPNv2) [20] with self-reported individual and family history data. The main benefit of whole-genome sequencing is that it provides more comprehensive information on the structure of the two HBOC genes than the exome-based approach. We also refer to these data to test whether computational annotation can identify any variants that might cause HBOC with high penetrance.

Materials and methods

Ethics approval and consent to participate

This study was approved by the ethics committee of Tohoku Medical Megabank Organization at Tohoku University (registration number: 2018-4-003). All participants in the present study were recruited by Tohoku Medical Megabank Organization at Tohoku University and provided written informed consent to participate in the cohort study.

Dataset

Subjects were drawn from the TMM Community-Based Cohort (TMM CommCohort) Study established by the Tohoku Medical Megabank Project [21], in which more than 120,000 adults participate. Whole-genome sequences have been obtained for some of the participants; the criteria for selecting WGS samples are described elsewhere [19, 22]. In brief, the samples for development of the Japanese whole-genome sequencing dataset were selected based on their SNP array data. Only one sample was selected from each kinship group to obtain precise allele frequencies. Whole-genome sequencing was performed with HiSeq 2500 sequencers (Illumina, Inc., San Diego, CA) using a PCR-free protocol on genomic DNA extracted from whole blood.

Annotation of genomic variants in the BRCA genes

The 3.5KJPNv2 variant data were downloaded from the jMorp database (https://jmorp.megabank.tohoku.ac.jp/) [23]. The dataset is divided in two in terms of autosomal variants, namely, single-nucleotide variations (tommo-3.5KJPNv2v2-20181105open-af_snvall-autosome.vcf.gz) and indels (tommo-3.5KJPNv2v2-20181105open-af_indelall-autosome.vcf.gz), with index files. We defined the BRCA1 and BRCA2 regions based on GeneCards (https://www.genecards.org/) [24] as chr17:41,196,312–41,277,500 and chr13:32,889,611–32,973,809 (hg19), respectively. Variant extraction was performed with bcftools [25, 26]. The 3.5KJPNv2 VCF file integrates multiple alleles in single lines, so normalization was performed with bcftools.
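The region-based extraction and the splitting of multi-allelic records can be illustrated with a minimal Python sketch. This is not the authors' pipeline, which used bcftools with options that are not listed in the text. The function names are hypothetical, only the hg19 coordinates are taken from the paragraph above, and the sketch splits ALT alleles without the left-alignment or per-allele INFO handling that a full normalization step would also perform.

```python
# Hypothetical helpers for illustration; only the hg19 coordinates come from the text.
BRCA_REGIONS = {
    "BRCA1": ("17", 41_196_312, 41_277_500),
    "BRCA2": ("13", 32_889_611, 32_973_809),
}


def gene_for_position(chrom, pos):
    """Return the BRCA gene whose region contains (chrom, pos), or None."""
    # Accept both "chr13" and "13" style chromosome names.
    chrom = chrom[3:] if chrom.startswith("chr") else chrom
    for gene, (region_chrom, start, end) in BRCA_REGIONS.items():
        if chrom == region_chrom and start <= pos <= end:
            return gene
    return None


def split_multiallelic(chrom, pos, ref, alt_field):
    """Split a comma-separated VCF ALT field into one record per ALT allele.

    Assumption: per-allele INFO values (such as allele frequencies) would have
    to be split the same way; that step is omitted here.
    """
    return [(chrom, pos, ref, alt) for alt in alt_field.split(",")]
```

For example, `gene_for_position("chr13", 32914066)` falls inside the BRCA2 interval, while a position one base before the BRCA2 start returns `None`.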

The BRCA variants in 3.5KJPNv2 were annotated with the InterVar [27] command line package (default options), which depends on ANNOVAR [28]. InterVar is an analytical package that estimates the clinical impact of gene variants based on guidelines for variant interpretation in clinical sequencing, namely, those of the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology [29]. InterVar includes annotations from ClinVar (version from December 1, 2015) [30] and predicts pathogenicity using indices such as Combined Annotation Dependent Depletion (CADD) [31], DANN [32], and Eigen [33]. The positions of the candidate pathogenic variants found in the Korean population [11] were reported as cDNA positions. To apply these data to the InterVar software, the TransVar annotation program [34] was used to obtain the genomic positions of the variants, followed by the InterVar annotation described above. To compare the variant frequencies in 3.5KJPNv2 and in the gnomAD database for the BRCA1/2 variants, we downloaded gnomAD data [35] from the associated webpage (https://gnomad.broadinstitute.org/; downloaded on February 23, 2020). The selected variants were visualized with the mutation mapper at cBioPortal [36, 37].

The RIKEN 2000 genome allele frequency data [38] were downloaded from the Japanese Encyclopedia of Genetic Associations (http://jenger.riken.jp/data) and TCGA germline variant data were as described previously [39].

Obtaining individual and family histories

TMM prospective cohort project data are stored in a supercomputer system, with secure data access [40]. The TMM database is a relational database and it consists of several separate datasets. The key is the participants’ IDs to link the information stored in the different tables. The individual and family histories were extracted from a large data matrix consisting of self-reported findings from a paper-based questionnaire given to the members of the cohort. The dataset consists of 35,199 participants in the TMM CommCohort and the data were frozen for distribution to the Japanese scientific community in 2017, as a provisional version. For most of the participants, whole-genome sequencing data are not available. The detailed method for obtaining the participants’ past and family histories, which consist of 269 entries for malignant neoplasms and 1271 for other diseases, is described elsewhere [21]. We did not use TMM Project Birth and Three-Generation Cohort data because the participants are expected to be relatively young and their family members may not be old enough to obtain positive cases [41].

We excluded the self-reported questionnaire records of participants who checked more than 50 items for past and family histories of malignant neoplasms, because most of these participants showed contradictory histories, such as male participants reporting a personal history of ovarian cancer. After this removal, 35,136 records remained. In the statistical analysis comparing carriers of candidate BRCA pathogenic variants with other TMM CommCohort participants regarding self-reported individual and family histories, we employed the binomial distribution to calculate the p-value. Then, we calculated the accumulation of past and family histories only for the items of malignant neoplasms. The questionnaire asked only about the presence or absence of such histories, which could be represented as “0” or “1” for each item. This made it impossible to give a weight to the numbers of affected siblings or offspring.
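A binomial p-value of this kind can be sketched as follows. The text does not specify whether a one-sided or two-sided test was used, so the upper-tail form below is an assumption, as are the function name and the idea of using the background rate among non-carriers as the success probability.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p).

    Assumed usage: n carriers, k of whom report a given history, and p the
    background proportion observed in the other cohort participants.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

With hypothetical numbers, `binomial_upper_tail(5, 20, 0.1)` gives the probability of observing at least 5 positive histories among 20 carriers when the background rate is 10%.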

In terms of the access to data from the TMM prospective cohort project, users should obtain approval from the sample and data access committee of the TMM Biobank [17]. This committee consists of experts both inside and outside the TMM. Upon the receipt of an application to the committee, the Group of Materials and Information Management in the TMM at Tohoku University supports the procedures for data utilization.

Statistics

To analyze the correlations among the three computational estimates of the impacts of variants, we employed R 3.6.1 for calculating the Pearson correlation coefficient. We applied Fisher’s exact test and chi-squared test with Yates’ correction for calculating the p-values of the differences in numbers of cancer-bearing family members.
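The two tests named above can be written out for a 2×2 table as a minimal, standard-library Python sketch. The authors used R, so this is an illustration of the tests rather than their code; the function names are hypothetical, and the two-sided Fisher p-value follows the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb, erfc, sqrt


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table at most as probable as the observed one (small tolerance for ties).
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def chi2_yates(a, b, c, d):
    """Chi-squared statistic with Yates' continuity correction (all margins must be nonzero)."""
    n = a + b + c + d
    numerator = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def chi2_pvalue_1df(statistic):
    """Upper-tail p-value of a chi-squared statistic with one degree of freedom."""
    return erfc(sqrt(statistic / 2))
```

Here the rows could stand for carriers and non-carriers, and the columns for participants with and without a cancer-bearing family member; any counts passed in would be the reader's own.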

Results and discussion

Summary of BRCA variants in 3.5KJPNv2

More than 3,600 variants were found in the BRCA genes, 6.15% of which are in coding regions; for comparison, coding exonic regions make up 9.58% of the two genes in hg19. Indels account for 23.1% of the total variants in the two genes in 3.5KJPNv2. Indel calling from short-read sequence data is less reliable than single-nucleotide variant calling, so the indels found in 3.5KJPNv2 may require further verification using long-read sequencing data.

How many known pathogenic mutations of the BRCA genes are identified in 3.5KJPNv2? We estimated this in a previous study on 2KJPN [42], relative to which there should be more pathogenic variants here. S1 Table indicates the annotation results of the variants in the BRCA1 and BRCA2 regions using the InterVar package. Ten variants in the BRCA genes are annotated as “pathogenic” (P) or “likely pathogenic” (LP) by referring to the ClinVar database. The accumulated frequency of pathogenic variations of BRCA genes in 3.5KJPNv2 is 0.0018, which might be lower than the clinical estimation of HBOC carriers in Japan.

To obtain deeper insight into the 3.5KJPNv2 BRCA variants, we compared the results with the gnomAD database, which contains human exome variants from more than 130,000 multi-ethnic individuals (https://gnomad.broadinstitute.org/) (S1 Table). The total number of pathogenic variants in the BRCA genes is much smaller in 3.5KJPNv2 than in gnomAD (S1 Table). However, once the large difference in the numbers of collected samples is taken into account, the numbers of P and LP variants per person in 3.5KJPNv2 were very similar to those in gnomAD. Specifically, the rates of ClinVar P or LP variants were 2.81 × 10⁻³/person and 2.50 × 10⁻³/person in 3.5KJPNv2 and gnomAD, respectively. To investigate the population specificity, we extracted ClinVar and InterVar P or LP variants found in East Asian populations in the gnomAD database (gnomAD-EAS; Table 1). Intriguingly, there were only four overlaps between 3.5KJPN and gnomAD-EAS for ClinVar and InterVar P or LP variants (Table 1). For example, one of the most prominent BRCA1 pathogenic variants, L63X [9, 10], does not appear in gnomAD-EAS (Table 1). In contrast, the two most prevalent P or LP variants in gnomAD-EAS, BRCA2 p.G2508S and p.A2786T, are present in 3.5KJPNv2. These two variants may be commonly distributed among East Asian populations. These results support the notion that pathogenic variants of a gene are highly specific to each ethnic group and thus that population-specific collection of whole-genome sequencing data is critical for nationwide public health care planning [19].

Table 1. P or LP variants in the BRCA1/2 genes in gnomAD-EAS population.

| Chr | Start | End | Ref | Alt | Gene.refGene | Func.refGene | AAChange | ClinVar | InterVar and Evidence | 3.5KJPN | gnomAD EAS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 13 | 32890557 | 32890558 | AG | - | BRCA2 | splicing | NA | Likely_pathogenic | Uncertain significance | . | 1.09E-04 |
| 13 | 32890627 | 32890627 | - | T | BRCA2 | frameshift insertion | p.T10fs | Pathogenic | Pathogenic | . | 6.41E-04 |
| 13 | 32905124 | 32905127 | GACA | - | BRCA2 | frameshift deletion | p.V250fs | Pathogenic | Pathogenic | . | 5.44E-05 |
| 13 | 32906565 | 32906565 | - | A | BRCA2 | frameshift insertion | p.T317fs | Pathogenic | Pathogenic | . | 5.87E-05 |
| 13 | 32907014 | 32907014 | A | T | BRCA2 | stopgain | p.K467X | Pathogenic | Pathogenic | . | 5.53E-05 |
| 13 | 32910831 | 32910831 | C | G | BRCA2 | stopgain | p.S780X | Pathogenic | Pathogenic | . | 5.44E-05 |
| 13 | 32910932 | 32910932 | C | - | BRCA2 | frameshift deletion | p.P814fs | Pathogenic | Pathogenic | . | 5.51E-05 |
| 13 | 32911601 | 32911601 | C | T | BRCA2 | stopgain | p.Q1037X | Pathogenic | Pathogenic | . | 5.45E-05 |
| 13 | 32911659 | 32911662 | AAAA | - | BRCA2 | frameshift deletion | p.Q1056fs | Pathogenic | Pathogenic | . | 5.48E-05 |
| 13 | 32912090 | 32912091 | TG | - | BRCA2 | frameshift deletion | p.C1200fs | Pathogenic | Pathogenic | . | 5.44E-05 |
| 13 | 32912234 | 32912237 | AGTG | - | BRCA2 | frameshift deletion | p.S1248fs | Pathogenic | Pathogenic | . | 5.47E-05 |
| 13 | 32913656 | 32913657 | AG | - | BRCA2 | frameshift deletion | p.S1722fs | Pathogenic | Pathogenic | . | 6.42E-04 |
| 13 | 32914066 | 32914069 | AATT | - | BRCA2 | frameshift deletion | p.T1858fs | Pathogenic | Pathogenic | 0.0003 | 5.51E-05 |
| 13 | 32914137 | 32914137 | C | A | BRCA2 | stopgain | p.S1882X | Pathogenic | Pathogenic | . | 5.45E-05 |
| 13 | 32914172 | 32914172 | - | A | BRCA2 | stopgain | p.Y1894_E1895delinsX | Pathogenic | Pathogenic | . | 5.44E-05 |
| 13 | 32914356 | 32914356 | C | A | BRCA2 | stopgain | p.S1955X | Pathogenic | Pathogenic | . | 1.00E-04 |
| 13 | 32914976 | 32914977 | AA | - | BRCA2 | frameshift deletion | p.K2162fs | Pathogenic | Pathogenic | . | 5.77E-05 |
| 13 | 32929367 | 32929370 | AAAC | - | BRCA2 | frameshift deletion | p.K2459fs | Pathogenic | Pathogenic | . | 5.44E-05 |
| 13 | 32930609 | 32930609 | C | T | BRCA2 | stopgain | p.R2494X | Pathogenic | Pathogenic | . | 1.63E-04 |
| 13 | 32931957 | 32931957 | - | A | BRCA2 | frameshift insertion | p.D2566fs | Pathogenic | Pathogenic | . | 5.44E-05 |
| 13 | 32937315 | 32937315 | G | T | BRCA2 | splicing | NA | Pathogenic/Likely_pathogenic | Pathogenic | . | 6.41E-04 |
| 13 | 32937362 | 32937362 | A | G | BRCA2 | nonsynonymous SNV | p.I2675V | Pathogenic/Likely_pathogenic | Likely pathogenic | 0.0001 | 5.44E-05 |
| 13 | 32913569 | 32913569 | - | T | BRCA2 | frameshift insertion | p.L1693fs | UNK | Likely pathogenic | . | 5.71E-05 |
| 13 | 32930651 | 32930651 | G | A | BRCA2 | nonsynonymous SNV | p.G2508S | Conflicting | Likely pathogenic | 0.0003 | 2.26E-03 |
| 13 | 32930727 | 32930727 | C | T | BRCA2 | nonsynonymous SNV | p.S2533F | UNK | Likely pathogenic | . | 5.44E-05 |
| 13 | 32932057 | 32932057 | A | C | BRCA2 | nonsynonymous SNV | p.E2599A | UNK | Likely pathogenic | . | 5.44E-05 |
| 13 | 32936755 | 32936755 | T | A | BRCA2 | nonsynonymous SNV | p.M2634K | UNK | Likely pathogenic | . | 5.44E-05 |
| 13 | 32937581 | 32937581 | G | C | BRCA2 | nonsynonymous SNV | p.G2748R | UNK | Likely pathogenic | . | 5.44E-05 |
| 13 | 32944557 | 32944557 | C | T | BRCA2 | nonsynonymous SNV | p.R2784W | Conflicting | Likely pathogenic | . | 5.44E-05 |
| 13 | 32944563 | 32944563 | G | A | BRCA2 | nonsynonymous SNV | p.A2786T | Conflicting | Likely pathogenic | 0.0001 | 7.52E-04 |
| 13 | 32953474 | 32953474 | G | T | BRCA2 | nonsynonymous SNV | p.Q2925H | UNK | Likely pathogenic | . | 5.45E-05 |
| 13 | 32953617 | 32953617 | G | A | BRCA2 | nonsynonymous SNV | p.R2973H | Conflicting | Likely pathogenic | . | 1.64E-04 |
| 13 | 32968844 | 32968844 | A | G | BRCA2 | nonsynonymous SNV | p.Y3092C | Conflicting | Likely pathogenic | . | 5.02E-05 |
| 13 | 32969032 | 32969032 | T | - | BRCA2 | frameshift deletion | p.F3155fs | UNK | Likely pathogenic | . | 6.42E-04 |
| 17 | 41201209 | 41201209 | G | - | BRCA1 | frameshift deletion | p.Q1779fs | Pathogenic | Pathogenic | . | 5.44E-05 |
| 17 | 41209095 | 41209095 | G | A | BRCA1 | stopgain | p.R1751X | Pathogenic | Uncertain significance | . | 5.01E-05 |
| 17 | 41215948 | 41215948 | G | A | BRCA1 | nonsynonymous SNV | p.R1699W | Pathogenic | Uncertain significance | . | 5.44E-05 |
| 17 | 41244106 | 41244106 | C | - | BRCA1 | frameshift deletion | p.E1148fs | Pathogenic | Pathogenic | . | 5.44E-05 |
| 17 | 41245115 | 41245115 | G | - | BRCA1 | frameshift deletion | p.P811fs | Pathogenic | Pathogenic | . | 5.44E-05 |
| 17 | 41256190 | 41256190 | G | T | BRCA1 | stopgain | p.Y130X | Pathogenic | Pathogenic | . | 5.44E-05 |
| 17 | 41276080 | 41276080 | G | A | BRCA1 | stopgain | p.Q12X | Pathogenic | Pathogenic | . | 5.44E-05 |

Estimation of pathogenic variants in the two BRCA genes in the Japanese population

In the case of ClinVar, the data are based on previous reports of the identification of pathogenic variants in disease-predisposed families, so there might be new, unreported pathogenic variants to be found in the general population. To address this issue, we applied an annotation approach with InterVar. As stated above, InterVar is designed to estimate the clinical importance of human genetic variants that have not been reported previously, in accordance with the American College of Medical Genetics (ACMG) Guidelines of secondary findings in clinical sequencing [29]. Interestingly, the package annotates another 13 variants as P or LP in the BRCA genes, as well as all of the 10 ClinVar P and/or LP variants. Among the 13 newly annotated P or LP variants, 4 are frameshift indels and 9 are nonsynonymous variants. None of these four frameshift indels is annotated in dbSNP, so they should not be considered discordant with ClinVar. Four nonsynonymous variants detected by InterVar are annotated as “conflicting interpretation of pathogenicity” in the ClinVar database. One of the LP variants from InterVar, BRCA1 p.L52F, shows a quite high minor allele frequency (MAF) in 3.5KJPNv2 (0.0037) compared with the other definite ClinVar P or LP variants. This variant was estimated to be a VUS in the Japanese HBOC consortium study [10] and “likely benign” by Lee et al. in a Korean prospective study on breast cancer patients [11].

There is a large publicly available dataset of Japanese whole-genome sequencing data from RIKEN [38]. It consists of deep sequencing data from 2,234 whole genomes (average depth of 25×), 1,939 of which are from BioBank Japan (BBJ), a large biobank of patients suffering from more than 50 diseases [43]. The detailed composition of the samples from BBJ is not available, but 1,276 patients with six diseases including breast cancer are included. Hence, it can be expected that pathogenic variants found in 3.5KJPNv2 might be enriched in the RIKEN dataset, although the selection criteria of the samples for the RIKEN project are unknown. As expected, two InterVar P or LP variants, BRCA2 c.5573_5577C and BRCA1 p.L63X, are enriched (9.75- and 16.2-fold, respectively) in the RIKEN dataset (S2 Table). In addition, a pathogenic variant not found in TMM 3.5KJPNv2 was identified (BRCA2 p.E2877X). In contrast, the prevalent InterVar LP variant, BRCA1 p.L52F, is not enriched in the RIKEN Japanese whole-genome dataset (0.5-fold, S2 Table). Similarly, we checked the germline variants of the BRCA genes in TCGA dataset [39] and found three BRCA2 pathogenic variants that overlapped with 3.5KJPNv2 (p.T219fs, p.T1858fs, and p.N2134fs); all three of these are highly enriched in TCGA (382-fold, 318-fold, and 95.5-fold, respectively).

These results suggest that the annotation by InterVar may include false positives as well as false negatives, although no VUSs identified by the Japanese HBOC consortium are included in our estimation [10]. Precise data on the MAFs obtained by the unbiased selection of panel constituents from the general population are critical for estimating the pathogenicity of VUSs and should be included in the criteria of pathogenicity for adult-onset hereditary disorders such as HBOC based on the InterVar annotation.

Estimate of computational scoring tools’ performance in predicting pathogenicity of novel 3.5KJPNv2 BRCA variants

InterVar annotates the functional impact of variants based on the ACMG guidelines, and it largely depends on previous reports to define the parameters for scoring. For example, criterion PS1 of InterVar states that “the variant involves the same amino acid change as a previously established pathogenic variant regardless of nucleotide change.” This means that one needs previous knowledge about pathogenic variants in order to annotate a variant as “pathogenic” by InterVar. In contrast, only one supportive item, PP3, is used from computational estimations in InterVar: “Multiple lines of computational evidence support a pathogenic effect on the gene or gene product (e.g., conservation, evolutionary, splicing impact).” Hence, InterVar may underestimate the clinical impact of potentially pathogenic variants about which previous information is not available. The tendency might be worse in the noncoding regions of coding genes such as BRCA1/2 because of the lack of functional studies for such regions. Whole-genome sequencing data are now accumulating, and comparisons between phenotypes and the noncoding variants found by WGS will provide critical data for the interpretation of noncoding variants. Therefore, we tested whether unbiased, computational estimations of the pathogenicity of the variants can be used to find potentially pathogenic variants without previous knowledge.

Across the BRCA variants, the Pearson correlation coefficients of CADD_phred with DANN_rankscore and Eigen_raw were determined to be 0.815 and 0.860, respectively, showing that both DANN and Eigen correlate well with CADD. However, interestingly, the distributions of ClinVar and/or InterVar P or LP variants were quite different. The P or LP variants were more widely dispersed in the plot of CADD_phred against DANN_rankscore than in the plot of CADD_phred against Eigen_raw. Among the P or LP variants, the Pearson correlation coefficients of CADD_phred with DANN_rankscore and Eigen_raw were 0.127 and 0.541, respectively. Interestingly, in both of the scatter plots, BRCA1 p.L52F, a benign variant annotated as LP by InterVar, showed similar scores to the other P or LP variants in the three parameters. S3 Table shows the details of the computational scoring for the ClinVar/InterVar P or LP variants. The ClinVar P or LP variants clearly showed higher average and minimum CADD_phred and Eigen_raw scores than the InterVar P or LP variants, but this was not the case for DANN_rankscore. Based on this observation, we decided to use CADD_phred and Eigen_raw for further filtration of potentially pathogenic mutations.
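The Pearson correlation coefficient used in these comparisons can be computed as in the following sketch. The authors calculated it in R; this standard-library Python version only illustrates the formula, and the function name is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

Passing the per-variant CADD_phred values as `xs` and the matching Eigen_raw or DANN_rankscore values as `ys` would give coefficients of the kind reported above, provided that variants lacking either score are dropped first.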

Minor allele frequencies are also critical parameters for interpreting the clinical impact of germline variants. As expected, both CADD_phred and Eigen_raw show weak positive correlations with the reverse logarithmic minor allele frequencies (Pearson correlation coefficients = 0.172 and 0.161, respectively). The CADD_phred and Eigen_raw scores of the InterVar P or LP variants are similar to those of the ClinVar P or LP variants (Table 2), with the exception of the BRCA1 p.L52F variant. Based on these comparisons, we defined computational thresholds for possible pathogenic BRCA single-nucleotide variants as follows: CADD_phred ≥ 25.9, Eigen_raw ≥ 0.501, and MAF ≤ 0.0003 (Fig 1A).
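The three thresholds can be expressed as a simple filter, sketched below in Python. The cutoffs are those stated above; the class and function names are hypothetical, and treating variants without a score (for example indels) as failing the filter is an assumption of this sketch. The other steps summarized in Fig 1A are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds quoted in the text for single-nucleotide variants.
CADD_PHRED_MIN = 25.9
EIGEN_RAW_MIN = 0.501
MAF_MAX = 0.0003


@dataclass
class Variant:
    name: str
    cadd_phred: Optional[float]  # None when no score is available
    eigen_raw: Optional[float]
    maf: float


def passes_filter(v):
    """True if the variant meets all three computational + MAF thresholds."""
    if v.cadd_phred is None or v.eigen_raw is None:
        return False  # assumption: unscored variants are not candidates
    return (
        v.cadd_phred >= CADD_PHRED_MIN
        and v.eigen_raw >= EIGEN_RAW_MIN
        and v.maf <= MAF_MAX
    )


# Score and frequency values as listed in Table 2.
examples = [
    Variant("BRCA2 p.S196I", 29.8, 0.783, 0.0003),
    Variant("BRCA2 p.P2802L", 32.0, 0.291, 0.0003),
    Variant("BRCA1 p.Q1604L", 18.26, -0.385, 0.0001),
]
candidates = [v.name for v in examples if passes_filter(v)]
```

Of these three rows, only BRCA2 p.S196I satisfies all three conditions; p.P2802L fails on Eigen_raw and p.Q1604L fails on both scores.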

Table 2. Summary of candidate “pathogenic” variants of BRCA genes in 3.5KJPN version 2.

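| Class | Position | Ref | Alt | Ref.Gene | ExonicFunc.refGene | AAChange | InterVar | ClinVar | 3.5KJPN | CADD | Eigen | OncoKB | BRCA Exchange |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InterVar P or LP | 32903605 | TG | - | BRCA2 | frameshift deletion | p.T219fs | P | P | 0.0001 | . | . | NA | No data |
| InterVar P or LP | 32911577 | - | T | BRCA2 | frameshift insertion | p.M1029fs | LP | UNK | 0.0001 | . | . | NA | No data |
| InterVar P or LP | 32913262 | GT | - | BRCA2 | frameshift deletion | p.K1590fs | LP | UNK | 0.0001 | . | . | NA | No data |
| InterVar P or LP | 32914066 | AATT | - | BRCA2 | frameshift deletion | p.T1858fs | P | P | 0.0003 | . | . | NA | Pathogenic |
| InterVar P or LP | 32914210 | CT | - | BRCA2 | frameshift deletion | p.N1906fs | P | P | 0.0001 | . | . | NA | No data |
| InterVar P or LP | 32914894 | TAACT | - | BRCA2 | frameshift deletion | p.N2134fs | P | P | 0.0001 | . | . | NA | No data |
| InterVar P or LP | 32920978 | C | T | BRCA2 | stopgain | p.R2318X | P | P | 0.0003 | 46 | 0.506 | NA | Pathogenic |
| InterVar P or LP | 32930651 | G | A | BRCA2 | nonsynonymous SNV | p.G2508S | LP | Conflicting | 0.0003 | 34 | 0.924 | Likely Neutral | Not reviewed |
| InterVar P or LP | 32930714 | G | - | BRCA2 | frameshift deletion | p.G2529fs | LP | UNK | 0.0001 | . | . | NA | No data |
| InterVar P or LP | 32936719 | A | C | BRCA2 | nonsynonymous SNV | p.N2622T | LP | UNK | 0.0001 | 28.7 | 0.965 | NA | No data |
| InterVar P or LP | 32937362 | A | G | BRCA2 | nonsynonymous SNV | p.I2675V* | LP | P/LP | 0.0001 | 25.9 | 0.756 | Likely Oncogenic | Not reviewed |
| InterVar P or LP | 32944563 | G | A | BRCA2 | nonsynonymous SNV | p.A2786T | LP | Conflicting | 0.0001 | 28 | 0.585 | Likely Oncogenic | Not reviewed |
| InterVar P or LP | 32944612 | C | T | BRCA2 | nonsynonymous SNV | p.P2802L | LP | UNK | 0.0003 | 32 | 0.291 | NA | Not reviewed |
| InterVar P or LP | 32954267 | G | A | BRCA2 | nonsynonymous SNV | p.V3081I | LP | UNK | 0.0001 | 23.8 | -0.348 | NA | Not reviewed |
| InterVar P or LP | 41197729 | T | C | BRCA1 | nonsynonymous SNV | p.Y1853C | LP | LP | 0.0003 | 27 | 0.694 | Likely Oncogenic | Not reviewed |
| InterVar P or LP | 41215947 | C | T | BRCA1 | nonsynonymous SNV | p.R1699Q | LP | Conflicting | 0.0003 | 35 | 0.871 | Likely Oncogenic | Not reviewed |
| InterVar P or LP | 41223120 | T | A | BRCA1 | nonsynonymous SNV | p.Q1604L | LP | UNK | 0.0001 | 18.26 | -0.385 | NA | No data |
| InterVar P or LP | 41226421 | - | A | BRCA1 | frameshift insertion | p.V1534fs | LP | UNK | 0.0001 | . | . | NA | No data |
| InterVar P or LP | 41228562 | T | C | BRCA1 | nonsynonymous SNV | p.K1476R | LP | UNK | 0.0001 | 23.1 | 0.12 | NA | No data |
| InterVar P or LP | 41244334 | G | - | BRCA1 | stopgain | p.L1072X | P | P | 0.0001 | . | . | NA | Pathogenic |
| InterVar P or LP | 41244748 | G | A | BRCA1 | stopgain | p.Q934X | P | P | 0.0001 | 35 | 0.501 | NA | Pathogenic |
| InterVar P or LP | 41258497 | A | T | BRCA1 | stopgain | p.L63X | P | P | 0.0003 | 39 | 0.807 | NA | Pathogenic |
| Computational+MAF | 32900706 | G | T | BRCA2 | nonsynonymous SNV | p.S196I | VUS | VUS | 0.0003 | 29.8 | 0.783 | Likely Oncogenic | Not reviewed |
| Computational+MAF | 32930669 | A | G | BRCA2 | nonsynonymous SNV | p.K2514E | VUS | VUS | 0.0001 | 32 | 0.865 | NA | Not reviewed |
| Computational+MAF | 32944605 | C | T | BRCA2 | nonsynonymous SNV | p.P2800S | VUS | VUS | 0.0001 | 33 | 0.864 | NA | Not reviewed |
| Computational+MAF | 32968948 | T | G | BRCA2 | nonsynonymous SNV | p.W3127G | VUS | UNK | 0.0001 | 29.3 | 0.764 | NA | No data |
| Computational+MAF | 41197783 | C | T | BRCA1 | nonsynonymous SNV | p.R1835Q | VUS | VUS | 0.0001 | 34 | 0.738 | Likely Oncogenic | Not reviewed |
| Computational+MAF | 41215957 | C | T | BRCA1 | nonsynonymous SNV | p.V1696M | VUS | VUS | 0.0001 | 33 | 0.754 | Likely Oncogenic | Not reviewed |
| Computational+MAF | 41256212 | G | A | BRCA1 | nonsynonymous SNV | p.S123F | VUS | UNK | 0.0001 | 28.2 | 0.686 | NA | Not reviewed |
| Benign | 32913077 | G | A | BRCA2 | nonsynonymous SNV | p.G1529R | Benign | Likely benign | 0.0001 | | | NA | Benign / Little Clinical Significance |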

* Known as a splicing error-causing variant [38].

Fig 1. Relationships between the variant impact scores for the BRCA genes in 3.5KJPNv2.

Panel a. Schematic diagram of filtering steps for candidate pathogenic variants in the BRCA genes. The details of the filtering process are described in the main text. MAF indicates minor allele frequency. Panel b. Distribution of candidate pathogenic variants of BRCA cDNA in 3.5KJPNv2. Schematic diagram of the BRCA1 and BRCA2 cDNA generated with Mutation Mapper on the cBioPortal. “F,” “N,” “S,” and “X” indicate frameshifts, nonsynonymous single-nucleotide variants, splicing error variants, and stopgains, respectively. The height of lollipops indicates the number of cases found in 3.5KJPNv2. Asterisks indicate variants in the computational + MAF set.

Eight BRCA variants in 3.5KJPNv2 fulfill these three criteria, which were defined from the ClinVar P or LP variants (Table 2). One of these, BRCA2 p.G1529R, is annotated as “benign” by InterVar and “likely benign” by ClinVar. This variant is quite rare but is found in two different ethnic groups, namely, African-Americans and non-Finnish Europeans (minor allele frequencies of 0.0003 and 0.0007, respectively; Table 2). Because ClinVar annotated the variant as “likely benign,” we excluded it from further analysis. A summary of the filtering criteria is shown in Fig 1A.
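The three-criteria filter described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the cutoff values (`cadd_min`, `eigen_min`, `maf_max`) are hypothetical placeholders rather than the thresholds derived from the ClinVar P or LP variants, and the `Variant` type and `passes_filter` function are names introduced here only for the example. The rows are copied from Table 2.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    gene: str
    change: str
    maf: float
    cadd_phred: Optional[float]  # None when no score is listed ("." in Table 2)
    eigen_raw: Optional[float]


def passes_filter(v, cadd_min=20.0, eigen_min=0.5, maf_max=0.0003):
    """Return True when a variant meets all three criteria.

    The default cutoffs are illustrative assumptions, not the study's values.
    """
    if v.cadd_phred is None or v.eigen_raw is None:
        return False  # variants without scores (e.g. frameshift indels) cannot be evaluated
    return v.cadd_phred >= cadd_min and v.eigen_raw >= eigen_min and v.maf <= maf_max


# Rows taken from Table 2.
variants = [
    Variant("BRCA2", "p.S196I", 0.0003, 29.8, 0.783),
    Variant("BRCA1", "p.R1835Q", 0.0001, 34.0, 0.738),
    Variant("BRCA1", "p.Q1604L", 0.0001, 18.26, -0.385),
    Variant("BRCA1", "p.K1476R", 0.0001, 23.1, 0.12),
    Variant("BRCA1", "p.V1534fs", 0.0001, None, None),
]

candidates = [v.change for v in variants if passes_filter(v)]
print(candidates)
```

Keeping the cutoffs as function parameters makes it straightforward to rerun the same filter with thresholds re-derived from a different set of reference P or LP variants.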

We also tested these criteria on the 134 potentially pathogenic BRCA variants that were shown to be enriched among female breast cancer cases in a previous study [14]. Among them, only 13 variants are found in the latest version of the Japanese whole-genome reference panel (4.7KJPN; jMorp database: https://jmorp.megabank.tohoku.ac.jp/202001/), and all of the available MAFs are ≤ 0.0003. Eighty-seven variants are annotated as P or LP by both ClinVar and InterVar, and 130 variants are annotated as P or LP by either ClinVar or InterVar. Four variants were annotated as “pathogenic” by Momozawa et al. [14] but not as P or LP by InterVar (S4 Table). Three of them showed high CADD_phred (24–35) and Eigen_raw (0.571–0.871) scores (S4 Table). The exception is BRCA1 p.K1095E, which is annotated as “likely benign” by InterVar and for which neither the CADD_phred nor the Eigen_raw score meets our criteria for pathogenicity. Therefore, our criteria correspond well with the previous study.

A summary of the variants identified in this study is shown in Table 2, and the distribution of the candidate pathogenic variants in the BRCA proteins is shown in Fig 1B. The nonsynonymous variants tend to localize toward the C-terminal end of the proteins, while the frameshift indels and stopgains are localized between the N-terminus and the middle of the protein sequence. BRCA2 I2675V is known as a “splicing error-causing variant” [44], and it is the most C-terminal variant causing large structural changes in the BRCA2 mRNA in our collection. So far, there are no other splicing error-causing variants of the BRCA1/2 genes in 3.5KJPN. As shown in Table 1, there are two splicing-affecting variants (BRCA2:g.chr13: 32890558AGdel and BRCA2:g.chr13: 32937315G>A) in gnomAD-EAS, indicating that we did not miss a large number of splicing error-causing variants in the BRCA1/2 genes in 3.5KJPN. We obtained additional annotations at the cBioPortal to draw a schematic diagram; three candidate pathogenic variants identified based on the three criteria are annotated as likely oncogenic, as are four InterVar P or LP variants (Table 2). This suggests that our approach can effectively identify pathogenic variants in the BRCA genes. Sugano et al. described BRCA1 Y1853C as a VUS, although both ClinVar and InterVar annotated it as LP [9]. Later, Kawaku et al. showed the pathogenic potential of this variant by experimental and genetic analyses [45]. Similarly, another variant, BRCA2 p.G2508S, is annotated as “likely neutral” by the OncoKB database. However, this variant was recently described as “moderately oncogenic” by Shimelis et al., based on a genome-wide association study of more than 12,000 cases and controls [46]. Therefore, we decided to include this variant for further study.

The variant-calling procedure of Tadaka et al. [20], which describes the basic quality control steps used to generate 3.5KJPNv2, includes no step for ruling out false positives caused by clonal hematopoiesis [47]. Some of the BRCA variants analyzed in this study might therefore have originated from somatic mutations in the blood leukocytes of the cohort participants. However, we believe that clonal hematopoiesis should not have contributed substantially to our dataset for the following reasons. First, Tadaka et al. used GATK HaplotypeCaller for variant calling for 3.5KJPNv2, which is suited to detecting variants in near-diploid genomes. Thus, most of the variants caused by clonal hematopoiesis would not reach sufficient variant read depth in a WGS sample. Second, the average age of the individuals from whom the samples in 3.5KJPNv2 were derived was around 56 years [20]. Clonal hematopoiesis occurs mainly in the elderly, becoming prominent in those aged over 65 [47]. Hence, clonal hematopoiesis is unlikely to have strongly affected our results.
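The read-depth argument can be made concrete with a simple binomial calculation. The sketch below is only an illustration under stated assumptions: it does not reproduce the genotype likelihood model of GATK HaplotypeCaller, and the read depth (30), the number of variant reads required (10), and the clonal allele fraction (5%) are hypothetical values chosen for the example.

```python
from math import comb


def prob_at_least(k, depth, vaf):
    """P(X >= k) for X ~ Binomial(depth, vaf): chance of seeing at least k variant reads."""
    return sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(k, depth + 1)
    )


depth = 30          # hypothetical WGS read depth at the site
min_alt_reads = 10  # hypothetical number of variant reads supporting a heterozygous call

germline = prob_at_least(min_alt_reads, depth, 0.5)  # germline heterozygote: about half of reads
somatic = prob_at_least(min_alt_reads, depth, 0.05)  # small clone: about 5% of reads

print(f"germline heterozygote: {germline:.3f}, low-fraction clone: {somatic:.2e}")
```

Under these assumed numbers, a germline heterozygous variant almost always reaches the read threshold, whereas a variant present in a small hematopoietic clone almost never does.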

Potentially pathogenic BRCA variant carriers tend to have cancer-prone family histories

Members of the TMM CommCohort reported their individual and family histories of various disorders, including cancers, by completing a paper-based questionnaire. BRCA pathogenic variant carriers and their family members might be expected to suffer from cancers more often than other cohort members and their families. Fig 2 shows the numbers of cases of cancer among the participants themselves, their family members, and their spouses. Although the number of non-carrier participants was more than 1,500 times greater than the number of InterVar P or LP and computational + MAF-selected BRCA variant carriers, the overall profiles of cancer onset were similar. For example, fathers of the participants suffered more from cancers than mothers, regardless of the participants’ BRCA variant status. A prominent difference between the carriers of potentially pathogenic BRCA variants and the rest of the cohort was in the rate of cancer-bearing sisters: the InterVar P or LP carriers had a much higher rate of cancer-bearing sisters than the rest of the cohort (Fig 2 and S5 Table; p = 3.08 × 10⁻⁵, chi-squared test with Yates’ correction). In addition, the rate of cancer-bearing offspring was higher in the InterVar P or LP carriers than in the others, with marginal significance (p = 0.041). Interestingly, the rate of cancer onset in the participants themselves did not differ markedly between the InterVar P or LP carriers and the rest of the cohort members. This may be reasonable, as nearly half of the InterVar P or LP carriers are male and thus are less likely to suffer from BRCA-related breast cancers (S5 Table).
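For readers unfamiliar with the test named above, the following is a minimal sketch of a chi-squared test with Yates’ continuity correction on a 2 × 2 table, written with the Python standard library only. The function name `yates_chi2` is introduced here for illustration, and the counts are made-up numbers, not the values in S5 Table.

```python
from math import erfc, sqrt


def yates_chi2(a, b, c, d):
    """Chi-squared test with Yates' continuity correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * corrected**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = erfc(sqrt(stat / 2))  # survival function of chi-squared with 1 df
    return stat, p_value


# Hypothetical counts: rows = carriers / non-carriers,
# columns = sister with cancer / sister without cancer.
stat, p = yates_chi2(10, 20, 30, 40)
print(f"chi2 = {stat:.4f}, p = {p:.3f}")
```

When one group is very small, as with the carriers here, expected cell counts can be low, and an exact test is a commonly used alternative to the corrected chi-squared statistic.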

Fig 2. Preponderance of individual and family histories of cancer for the TMM CommCohort.

Numbers of positive cases of self-reported individual and family histories of cancer among the TMM CommCohort participants. Vertical axes indicate the number of cases with positivity for each item below the horizontal axis. The right and left axes indicate the BRCA candidate pathogenic variant-positive and -negative cases, respectively. The scales of the vertical axes are adjusted by showing the “Family” bars at the same height. “Total cases” indicates the number of cases analyzed, while “Self” indicates individual past history of any malignancy. “Father,” “Mother,” “Brother,” “Sister,” “Offspring,” and “Spouse” indicate the cancer-related histories of the participants’ family members. “Family” indicates a case of any cancer among any of the blood relatives, except the participants themselves. Cases in which the “Spouse” was positive are not included in “Family.” Solid and gray bars represent numbers of cases positive for the BRCA candidate pathogenic variants and the rest of the TMM CommCohort cases, respectively. Asterisks indicate statistically significant differences (single: p < 0.05, double: p < 10⁻⁴) upon comparison with the total analyzed TMM CommCohort cases (Fig 2).

Recent progress in bioinformatics may open up a completely different path for filtering the VUSs in hereditary disorders, namely, artificial intelligence-mediated approaches. One example is CADD, which was reported in 2014 [31]. CADD scores are calculated for all of the approximately 8.6 billion possible single-nucleotide changes in the human genome. The calculation is based on machine learning using the evolutionarily conserved “proxy-neutral” variants found in both apes and humans and the recently emerged “proxy-pathogenic” rare variants in the human genome alone [48]. In 2016, a further dataset, Eigen, was released, for which the calculation was performed without training data, using the principal component that gives the largest diversity among the variants prepared from all possible single-nucleotide changes in the human genome [33]. These annotation tools have achieved some clinically significant findings in genome-wide association studies (for example, see [49, 50]). Recently, the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium successfully applied CADD to estimate the biological impacts of cancer mutations [51]. Therefore, computational scoring appears to be a powerful means of predicting the clinical impact of single-nucleotide variants in cancer-predisposing genes. The present study shows the potential for applying this approach to find pathogenic variations in cancer-predisposing syndromes by using genome reference panels with precise MAF estimation. This study also shows that a MAF estimate for the general population is much more useful for the annotation of pathogenic variants than one derived from a biased collection of population samples.
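As background for reading the CADD_phred column of Table 2: CADD's scaled scores are PHRED-like ranks, so that a score of 10 corresponds to the top 10% of all possible single-nucleotide variants, 20 to the top 1%, and 30 to the top 0.1%. The short sketch below converts a scaled score into that top fraction; the helper name `top_fraction` is introduced here for illustration, and the example scores are taken from Table 2.

```python
def top_fraction(cadd_phred):
    """Fraction of all possible SNVs ranked at or above a given CADD_phred score."""
    return 10 ** (-cadd_phred / 10)


# CADD_phred values taken from Table 2.
for change, score in [("p.Q1604L", 18.26), ("p.S123F", 28.2), ("p.L63X", 39.0)]:
    print(f"{change}: CADD_phred {score} -> top {top_fraction(score):.4%} of possible SNVs")
```

Because the scale is logarithmic, a difference of 10 in CADD_phred corresponds to a tenfold difference in rank.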

Recently, Findlay et al. reported that a saturation genome editing-based approach could functionally characterize more than 4,000 BRCA1 variants that are in the functionally critical regions [52]. Thirteen BRCA1 variants in 3.5KJPNv2 correspond to entries in the list of Findlay et al., and three of them were annotated as loss-of-function variants by Findlay et al. (L63X, R71S, and Y1853C). Two variants were discordant between the work of Findlay et al. and the present study: BRCA1 p.R71S was not picked up by our survey but is annotated as “loss of function” in the dataset of Findlay et al., while BRCA1 V1696M was picked up by our survey but is annotated as “functional” in the same dataset. It is possible that the pathogenicity of BRCA variants is affected by other genetic modifiers and/or environmental factors. Most of the computational methods for estimating the impact of genetic variants depend on “known datasets” when they perform machine learning. There are probably many “unknown factors” that are essential for the correct estimation of the pathogenicity of variants. Further studies should be performed to provide new and critical information for the computational estimation of the pathogenicity of genetic variations. Follow-up of the carriers of these variants in prospective cohort studies may also provide clues for resolving any discordant results.
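The comparison described above amounts to checking, variant by variant, whether a loss-of-function call in the functional dataset matches selection in the present survey. The sketch below shows that check for the four variants named in the text only; the remaining nine overlapping variants are not listed in this section and are therefore omitted. The dictionary names are introduced here for illustration.

```python
# Functional class reported by Findlay et al. for the four variants named in the text.
findlay_class = {
    "L63X": "loss of function",
    "R71S": "loss of function",
    "Y1853C": "loss of function",
    "V1696M": "functional",
}

# Whether each variant was picked up as a candidate in the present study.
selected_here = {"L63X": True, "R71S": False, "Y1853C": True, "V1696M": True}

# A variant is discordant when exactly one of the two sources flags it.
discordant = sorted(
    name
    for name, label in findlay_class.items()
    if (label == "loss of function") != selected_here[name]
)
print(discordant)
```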

There was a significant preponderance of cancer in the family histories of those with potentially pathogenic BRCA variants only among the sisters of TMM CommCohort members. The carriers found in the TMM CommCohort were mainly male and the female carriers were relatively young, so they themselves had not yet accumulated many cancer cases. A preponderance of a history of cancer in the mothers was not observed, but the mothers should have been aged over 80, so their accumulation of sporadic cancers would have obscured the HBOC cases. Our study suggests that the self-reported data of the TMM CommCohort are useful to analyze the genotype–phenotype relationships, at least in cancer-predisposing syndromes.

It is not easy to estimate the clinical significance of VUSs that may have clinically significant effects on the hosts’ predisposition for cancer, but with relatively low penetrance. It is an important insight that VUSs may have moderate but significant effects on cancer onset, and that this risk can be reduced by personalized health care based on information on the genetic variant. Around 10 years ago, a review by Berger et al. proposed that haploinsufficiency is not uncommon in the onset of cancer in HBOC patients with pathogenic variants in the BRCA genes [53]. Moderately deleterious variants are also critical for the successful establishment of precision medicine and/or personalized health care [54]. Carriers in moderately penetrant HBOC families may not need radical interventions such as prophylactic surgery, but they may be encouraged to continue undergoing close health checks to detect HBOC-related cancers as early as possible.

Conclusions

The present study indicates that a large dataset of Japanese whole-genome sequencing data (3.5KJPNv2) includes definitely and potentially pathogenic variants in representative genes responsible for HBOC: BRCA1 and BRCA2. ClinVar and the ACMG-guided annotation tool InterVar detected more than 20 variants as pathogenic or likely pathogenic, including one obviously benign variant in 3.5KJPNv2. In addition, the use of the combination of computational scoring and MAF picked up another eight candidates, including one likely benign mutant as defined by ClinVar. Some of the variants show concordance with other databases in terms of the pathogenic annotations. The self-reported individual and family histories of the carriers of potentially pathogenic BRCA variants were analyzed and the carriers’ sisters showed a significant history of cancer themselves. This study indicates that prospective genomic cohort studies are a powerful tool for identifying pathogenic variants. The present study should be useful for identifying such moderately deleterious variations in populations and contribute to the development of personalized health care based on individual genomic information.

Supporting information

S1 Table. Functional annotations of the BRCA gene variants in 3.5KJPNv2 and gnomAD.

(XLSX)

S2 Table. InterVar P or LP variants in the BRCA genes of the RIKEN 2,234 Japanese whole-genome sequence dataset.

(XLSX)

S3 Table. Comparison of scores for pathogenic variants in the BRCA genes in 3.5KJPN.

(XLSX)

S4 Table. Details of “pathogenic” BRCA variants but not P or LP by InterVar in the paper by Momozawa et al.

(XLSX)

S5 Table. Statistics of the sisters or offspring cancer histories of candidate BRCA pathogenic variants carriers in the TMM-Comm cohort.

(XLSX)

Acknowledgments

We thank all past and present members of Tohoku Medical Megabank Organization at Tohoku University (present members are listed at https://www.megabank.tohoku.ac.jp/english/a191201/). We also thank Edanz Group (https://en-author-services.edanzgroup.com/ac) for editing the English text of a draft of this manuscript.

Data Availability

In terms of the ethical restrictions on access to the data used in our study, the data that we used are histories of disease and genomic information; both of these sets of data are private and it would be possible to identify an individual with them. Therefore, it is necessary to obtain approval for data access from the TMM prospective cohort project; specifically, users should obtain approval from the sample and data access committee of the TMM Biobank. This committee consists of experts both inside and outside the TMM. Upon applying to this committee, the Group of Materials and Information Management in the TMM at Tohoku University supports the procedures for data transfer. The Group of Materials and Information Management can be contacted at dist@megabank.tohoku.ac.jp.

Funding Statement

This work was supported by JSPS KAKENHI (Grant Numbers JP17K07193, JP19H03795, and JP17K11265) for JY, NY, and MS, respectively. This work was supported by The National Cancer Center Research and Development Fund (29-A-3) and AMED (Grant Number JP19ck0106319) for NY and HT, respectively. This work was supported in part by the Tohoku Medical Megabank Project through the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Japan for MY; the Reconstruction Agency, MEXT, Japan for MY; by the Japan Agency for Medical Research and Development (AMED; Grant Numbers JP17km0105001 and JP17km0105002) awarded to MY; and AMED GRIFIN project (Grant Numbers JP17km0405203 and JP18km0405203) awarded to MY. All computational resources were provided by the ToMMo supercomputer system (http://sc.megabank.tohoku.ac.jp/en), which is supported by the Facilitation of R&D Platform for AMED Genome Medicine Support conducted by AMED (Grant Number JP17km0405001) awarded to MY.

References

1. Collins FS, Varmus H. A new initiative on precision medicine. The New England journal of medicine. 2015;372(9):793–5. doi: 10.1056/NEJMp1500523
2. Casey MJ, Colanta AB. Mullerian intra-abdominal carcinomatosis in hereditary breast ovarian cancer syndrome: implications for risk-reducing surgery. Fam Cancer. 2016;15(3):371–84. doi: 10.1007/s10689-016-9878-4
3. George SH, Garcia R, Slomovitz BM. Ovarian Cancer: The Fallopian Tube as the Site of Origin and Opportunities for Prevention. Front Oncol. 2016;6:108. doi: 10.3389/fonc.2016.00108
4. Sakurada S, Watanabe Y, Tokunaga H, Takahashi F, Yamada H, Takehara K, et al. Clinicopathologic features and BRCA mutations in primary fallopian tube cancer in Japanese women. Jpn J Clin Oncol. 2018;48(9):794–8. doi: 10.1093/jjco/hyy095
5. Enomoto T, Aoki D, Hattori K, Jinushi M, Kigawa J, Takeshima N, et al. The first Japanese nationwide multicenter study of BRCA mutation testing in ovarian cancer: CHARacterizing the cross-sectionaL approach to Ovarian cancer geneTic TEsting of BRCA (CHARLOTTE). Int J Gynecol Cancer. 2019;29(6):1043–9. doi: 10.1136/ijgc-2019-000384
6. Tung NM, Garber JE. BRCA1/2 testing: therapeutic implications for breast cancer management. Br J Cancer. 2018;119(2):141–52. doi: 10.1038/s41416-018-0127-5
7. Noordermeer SM, van Attikum H. PARP Inhibitor Resistance: A Tug-of-War in BRCA-Mutated Cells. Trends Cell Biol. 2019;29(10):820–34. doi: 10.1016/j.tcb.2019.07.008
8. Kurian AW, Li Y, Hamilton AS, Ward KC, Hawley ST, Morrow M, et al. Gaps in Incorporating Germline Genetic Testing Into Treatment Decision-Making for Early-Stage Breast Cancer. J Clin Oncol. 2017;35(20):2232–9. doi: 10.1200/JCO.2016.71.6480
9. Sugano K, Nakamura S, Ando J, Takayama S, Kamata H, Sekiguchi I, et al. Cross-sectional analysis of germline BRCA1 and BRCA2 mutations in Japanese patients suspected to have hereditary breast/ovarian cancer. Cancer Sci. 2008;99(10):1967–76. doi: 10.1111/j.1349-7006.2008.00944.x
10. Arai M, Yokoyama S, Watanabe C, Yoshida R, Kita M, Okawa M, et al. Genetic and clinical characteristics in Japanese hereditary breast and ovarian cancer: first report after establishment of HBOC registration system in Japan. J Hum Genet. 2018;63(4):447–57. doi: 10.1038/s10038-017-0355-1
11. Lee JS, Oh S, Park SK, Lee MH, Lee JW, Kim SW, et al. Reclassification of BRCA1 and BRCA2 variants of uncertain significance: a multifactorial analysis of multicentre prospective cohort. J Med Genet. 2018;55(12):794–802. doi: 10.1136/jmedgenet-2018-105565
12. Cline MS, Liao RG, Parsons MT, Paten B, Alquaddoomi F, Antoniou A, et al. BRCA Challenge: BRCA Exchange as a global resource for variants in BRCA1 and BRCA2. PLoS Genet. 2018;14(12):e1007752. doi: 10.1371/journal.pgen.1007752
13. Spurdle AB, Healey S, Devereau A, Hogervorst FB, Monteiro AN, Nathanson KL, et al. ENIGMA--evidence-based network for the interpretation of germline mutant alleles: an international initiative to evaluate risk and clinical significance associated with sequence variation in BRCA1 and BRCA2 genes. Hum Mutat. 2012;33(1):2–7. doi: 10.1002/humu.21628
14. Momozawa Y, Iwasaki Y, Parsons MT, Kamatani Y, Takahashi A, Tamura C, et al. Germline pathogenic variants of 11 breast cancer genes in 7,051 Japanese patients and 11,241 controls. Nat Commun. 2018;9(1):4083. doi: 10.1038/s41467-018-06581-8
15. Manolio TA, Weis BK, Cowie CC, Hoover RN, Hudson K, Kramer BS, et al. New models for large prospective studies: is there a better way? Am J Epidemiol. 2012;175(9):859–66. doi: 10.1093/aje/kwr453
16. Kuriyama S, Yaegashi N, Nagami F, Arai T, Kawaguchi Y, Osumi N, et al. The Tohoku Medical Megabank Project: Design and Mission. J Epidemiol. 2016;26(9):493–511. doi: 10.2188/jea.JE20150268
17. Minegishi N, Nishijima I, Nobukuni T, Kudo H, Ishida N, Terakawa T, et al. Biobank Establishment and Sample Management in the Tohoku Medical Megabank Project. Tohoku J Exp Med. 2019;248(1):45–55. doi: 10.1620/tjem.248.45
18. Koshiba S, Motoike I, Saigusa D, Inoue J, Shirota M, Katoh Y, et al. Omics research project on prospective cohort studies from the Tohoku Medical Megabank Project. Genes Cells. 2018;23(6):406–17. doi: 10.1111/gtc.12588
19. Yasuda J, Kinoshita K, Katsuoka F, Danjoh I, Sakurai-Yageta M, Motoike IN, et al. Genome analyses for the Tohoku Medical Megabank Project towards establishment of personalized healthcare. J Biochem. 2019;165(2):139–58. doi: 10.1093/jb/mvy096
20. Tadaka S, Katsuoka F, Ueki M, Kojima K, Makino S, Saito S, et al. 3.5KJPNv2: an allele frequency panel of 3552 Japanese individuals including the X chromosome. Hum Genome Var. 2019;6:28. doi: 10.1038/s41439-019-0059-5
21. Hozawa A, Tanno K, Nakaya N, Nakamura T, Tsuchiya N, Hirata T, et al. Study profile of The Tohoku Medical Megabank Community-Based Cohort Study. J Epidemiol. 2020. doi: 10.2188/jea.JE20190271
22. Nagasaki M, Yasuda J, Katsuoka F, Nariai N, Kojima K, Kawai Y, et al. Rare variant discovery by deep whole-genome sequencing of 1,070 Japanese individuals. Nat Commun. 2015;6:8018. doi: 10.1038/ncomms9018
23. Tadaka S, Saigusa D, Motoike IN, Inoue J, Aoki Y, Shirota M, et al. jMorp: Japanese Multi Omics Reference Panel. Nucleic Acids Res. 2018;46(D1):D551–D7. doi: 10.1093/nar/gkx978
24. Stelzer G, Rosen N, Plaschkes I, Zimmerman S, Twik M, Fishilevich S, et al. The GeneCards Suite: From Gene Data Mining to Disease Genome Sequence Analyses. Curr Protoc Bioinformatics. 2016;54:1.30.1–1.30.33. doi: 10.1002/cpbi.5
25. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987–93. doi: 10.1093/bioinformatics/btr509
26. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. doi: 10.1093/bioinformatics/btp352
27. Li Q, Wang K. InterVar: Clinical Interpretation of Genetic Variants by the 2015 ACMG-AMP Guidelines. Am J Hum Genet. 2017;100(2):267–80. doi: 10.1016/j.ajhg.2017.01.004
28. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38(16):e164. doi: 10.1093/nar/gkq603
29. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405–24. doi: 10.1038/gim.2015.30
30. Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 2014;42(Database issue):D980–5. doi: 10.1093/nar/gkt1113
31. Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet. 2014;46(3):310–5. doi: 10.1038/ng.2892
32. Quang D, Chen Y, Xie X. DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics. 2015;31(5):761–3. doi: 10.1093/bioinformatics/btu703
33. Ionita-Laza I, McCallum K, Xu B, Buxbaum JD. A spectral approach integrating functional genomic annotations for coding and noncoding variants. Nat Genet. 2016;48(2):214–20. doi: 10.1038/ng.3477
34. Zhou W, Chen T, Chong Z, Rohrdanz MA, Melott JM, Wakefield C, et al. TransVar: a multilevel variant annotator for precision genomics. Nat Methods. 2015;12(11):1002–3. doi: 10.1038/nmeth.3622
35. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536(7616):285–91. doi: 10.1038/nature19057
36. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6(269):pl1. doi: 10.1126/scisignal.2004088
37. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401–4. doi: 10.1158/2159-8290.CD-12-0095
38. Okada Y, Momozawa Y, Sakaue S, Kanai M, Ishigaki K, Akiyama M, et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat Commun. 2018;9(1):1631. doi: 10.1038/s41467-018-03274-0
39. Huang KL, Mashl RJ, Wu Y, Ritter DI, Wang J, Oh C, et al. Pathogenic Germline Variants in 10,389 Adult Cancers. Cell. 2018;173(2):355–70 e14. doi: 10.1016/j.cell.2018.03.039
40. Takai-Igarashi T, Kinoshita K, Nagasaki M, Ogishima S, Nakamura N, Nagase S, et al. Security controls in an integrated Biobank to protect privacy in data sharing: rationale and study design. BMC Med Inform Decis Mak. 2017;17(1):100. doi: 10.1186/s12911-017-0494-5
41. Kuriyama S, Metoki H, Kikuya M, Obara T, Ishikuro M, Yamanaka C, et al. Cohort Profile: Tohoku Medical Megabank Project Birth and Three-Generation Cohort Study (TMM BirThree Cohort Study): Rationale, Progress and Perspective. Int J Epidemiol. 2019. doi: 10.1093/ije/dyz169
42. Yamaguchi-Kabata Y, Yasuda J, Tanabe O, Suzuki Y, Kawame H, Fuse N, et al. Evaluation of reported pathogenic variants and their frequencies in a Japanese population based on a whole-genome reference panel of 2049 individuals. J Hum Genet. 2018;63(2):213–30. doi: 10.1038/s10038-017-0347-1
43. Nakamura Y. The BioBank Japan Project. Clinical advances in hematology & oncology: H&O. 2007;5(9):696–7.
44. Bonnet C, Krieger S, Vezain M, Rousselin A, Tournier I, Martins A, et al. Screening BRCA1 and BRCA2 unclassified variants for splicing mutations using reverse transcription PCR on patient RNA and an ex vivo assay based on a splicing reporter minigene. J Med Genet. 2008;45(7):438–46. doi: 10.1136/jmg.2007.056895
45. Kawaku S, Sato R, Song H, Bando Y, Arinami T, Noguchi E. Functional analysis of BRCA1 missense variants of uncertain significance in Japanese breast cancer families. J Hum Genet. 2013;58(9):618–21. doi: 10.1038/jhg.2013.71
46. Shimelis H, Mesman RLS, Von Nicolai C, Ehlen A, Guidugli L, Martin C, et al. BRCA2 Hypomorphic Missense Variants Confer Moderate Risks of Breast Cancer. Cancer Res. 2017;77(11):2789–99. doi: 10.1158/0008-5472.CAN-16-2568
47. Zink F, Stacey SN, Norddahl GL, Frigge ML, Magnusson OT, Jonsdottir I, et al. Clonal hematopoiesis, with and without candidate driver mutations, is common in the elderly. Blood. 2017;130(6):742–52. doi: 10.1182/blood-2017-02-769869
48. Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 2019;47(D1):D886–D94. doi: 10.1093/nar/gky1016
49. He KY, Li X, Kelly TN, Liang J, Cade BE, Assimes TL, et al. Leveraging linkage evidence to identify low-frequency and rare variants on 16p13 associated with blood pressure using TOPMed whole genome sequencing data. Hum Genet. 2019;138(2):199–210. doi: 10.1007/s00439-019-01975-0
50. Wallen ZD, Chen H, Hill-Burns EM, Factor SA, Zabetian CP, Payami H. Plasticity-related gene 3 (LPPR1) and age at diagnosis of Parkinson disease. Neurol Genet. 2018;4(5):e271. doi: 10.1212/NXG.0000000000000271
51. ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature. 2020;578(7793):82–93. doi: 10.1038/s41586-020-1969-6
52. Findlay GM, Daza RM, Martin B, Zhang MD, Leith AP, Gasperini M, et al. Accurate classification of BRCA1 variants with saturation genome editing. Nature. 2018;562(7726):217–22. doi: 10.1038/s41586-018-0461-z
53. Berger AH, Knudson AG, Pandolfi PP. A continuum model for tumour suppression. Nature. 2011;476(7359):163–9. doi: 10.1038/nature10275
54. Yoshida T, Ono H, Kuchiba A, Saeki N, Sakamoto H. Genome-wide germline analyses on cancer susceptibility and GeMDBJ database: Gastric cancer as an example. Cancer Sci. 2010;101(7):1582–9. doi: 10.1111/j.1349-7006.2010.01590.x

Decision Letter 0

Yonglan Zheng

23 Sep 2020

PONE-D-20-21851

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes in a 3,552 Japanese whole-genome sequence dataset (3.5KJPNv2)

PLOS ONE

Dear Dr. Yasuda,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Nov 07 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yonglan Zheng

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

3. To comply with PLOS ONE submission guidelines, in your Methods section, please provide additional information regarding your statistical analyses. For more information on PLOS ONE's expectations for statistical reporting, please see https://journals.plos.org/plosone/s/submission-guidelines.#loc-statistical-reporting.

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

5. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall:

This study identifies participants in a whole-genome databank with BRCA1/2 mutations that can be considered pathogenic. They identify several variants that are reportedly novel based on ClinVar annotation and discuss the validity of these variants regarding possible pathogenicity.

Strengths:

-The introduction does contain a thorough review of prior studies regarding BRCA1 and BRCA2 mutations in the Japanese population

-Computational annotation of genomes is clearly described using standard resources in the field

-The paper highlights several methods that are used to validate pathogenicity of BRCA1/2 mutations, and the self-reported findings of family members with cancer and variant annotations is an interesting one highlighted in this paper.

Areas for Improvement:

Major:

-The authors do not describe the work of international consortia in this area such as the ENIGMA consortium (https://enigmaconsortium.org/library/enigma-publications/). When they describe the work of prior authors, they refer to other studies in the Japanese population (which is thoroughly reviewed), but not other international consortia or databases which have been dedicated to BRCA pathogenicity. This is of concern as it is not clear if the authors searched other sources to determine that their findings were novel.

-The authors do not clarify the aim of their approach and its novelty relative to prior studies in the Japanese population with larger patient samples. This is a large sample of whole genomes and thereby offers novelty to identify pathogenic mutations that may not have been previously identified through exome means. It is instead unclear if they are interested in ascertaining the general population frequency of pathogenic mutations in the Japanese population, or in validation of their computational strategies. The authors need to clearly define their aim in the abstract and introduction.

-From their methods, it is not clear if and how the authors address splicing variants and noncoding variants if their goal was to identify potentially pathogenic variants. Intervar does not address these – it uses exonic variants only. These would not be consistently reported in ClinVar, and it is known that several splicing variants in BRCA1 and BRCA2 can lead to clinical pathogenic findings. This would leverage the major benefit offered by the whole genome data.

-From their methods, it is not clear how the authors can confirm that their findings are specifically germline and not due to clonal hematopoiesis. Other approaches to identifying pathogenic germline variants do include methods of quality control in this area for confirmation or allude to this issue in the discussion if this cannot be resolved with the methods used.

-While comparison to GnomAD is standard practice, this is typically done in the context of population-specific evaluations of specific allele frequencies. The conclusions regarding BRCA1/2 variants across both databases is not a typical use of GnomAD nor a clear conclusion with all population data in GnomAD aggregated. GnomAD should be used to discuss specific mutations identified or specific populations. (Would recommend that Table 1 be revised accordingly.)

-Please revise or provide a table to clarify the population studied in the database and range of number of mutations and types of mutations identified per individual (not average number per individual, which is difficult to interpret). This information is not clear from the paper as written. Additionally, the utility of the data provided in the Figures and Tables is frankly mixed, with some information very helpful for the reader and some information not clearly helpful towards the authors’ conclusions. To this end: Figure 1C is appropriate to serve as Figure 1. Figure 1 A and B should either be provided later in the text (with that section moved down) or added to supplementary figures. Figure 2 should be moved up (and could also be a candidate for Figure 1). Please move supplementary Table 2 to the actual paper or provide a table of the novel variants identified by the authors and the annotation information as they provide in the text. Please move Table 2 to the supplement as it is not clear that this provides extensive additional data. Supplementary Table 4 should be moved to the main text.

-Explanations or interpretations from the authors in the results outside of description of the actual results should be moved to the methods or discussion sections as appropriate. The results section is extremely long as a result and the discussion section is too short.

*Regarding the reliability of short-read sequence data

*Findings as interpreted by the authors regarding GnomAD

*CADD vs. Eigen scores

*Data regarding specific variants from outside sources

Minor:

-The title was slightly unclear: Would rephrase to

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

-The introduction contains conflations regarding methods of identifying pathogenic mutations. For example, the reference to allelic dropout in the paper by Yost et al. is describing germline mutations taken from patients’ tumors, which, while a means of confirming pathogenicity, is subject to its own issues. The question that the authors are asking, though, is regarding unaffected carriers who have BRCA1 and BRCA2 mutations in this cohort. It is confusing to switch back and forth between germline testing by blood / via unaffected carriers and via tumor in affected carriers given the significant differences between these two methods unless this is delineated clearly. The introduction and references should be reviewed to clarify prior methods of identifying BRCA1/BRCA2 pathogenic variants and associated literature, and then also to discuss the findings that have been specific to the Japanese population.

-For the methods, any ANNOVAR/annotation software using the ClinVar database should have the date of reference noted, since ClinVar is updated regularly.

-It would be very helpful for the authors to review the specific methods from prior work that are relevant for their study. For example, an extremely brief review regarding the criteria for WGS selection in the Megabank, depth of sequencing, as well as the methods of how these sequences are obtained (e.g. from whole blood?) and how families may be linked in the Project data.

-For the methods, it would be helpful for there to be quantitative descriptions of the filtering process and use of the self-questionnaire data, as this is not replicable based on the current description.

-Computational estimation of pathogenicity is a data source, but this is an ongoing point of information used in interpreting pathogenicity. The authors’ conclusion at one point between conflicting data sources that “pathogenicity of BRCA variants would be affected by other genetic modifiers and/or environmental factors,” while absolutely true, is not as applicable in discussing the discordance between different methods of estimating pathogenicity (computer vs. saturation genomics modeling). Rather, the question is regarding the fallibility of these estimation approaches. Given the authors’ findings regarding some “pathogenic” mutations annotated as such but clearly benign on further review, this warrants a significant component of the discussion.

-The comparison between ClinVar and InterVar mutations in the tables is unclear. Does this mean “known pathogenic” and “annotated as pathogenic and novel, but under review”?

-Formatting of captions for tables and figures is not consistent.

-The final paragraph of the results is written as though to conflate variants of uncertain significance and moderate penetrance. Please revise this.

-Regarding data access: It would be more appropriate for the authors to state that they do not own the data themselves, but access to it is governed by the steering committee. dbGAP in the US is available but under the same restrictions, and patients’ privacy is honored.

Reviewer #2: The authors indicate that a large dataset of Japanese whole-genome sequencing data includes pathogenic variations in BRCA1/2 genes responsible for HBOC. ClinVar and InterVar detected more than 20 variants as pathogenic or likely pathogenic. The use of the combination of computational scoring and MAF picked up another eight candidates, including one likely benign mutant as defined by ClinVar. The self-reported individual and family histories of the carriers of potentially pathogenic BRCA variants were analyzed and the carriers’ sisters showed a significant history of cancer themselves.

There are major comments on this study.

1. They use ClinVar, InterVar, computational scoring systems and MAF to evaluate the pathogenicity of the BRCA variants found in their cohort. The approaches are all common and novelty of the study is limited.

2. It is not clear why the difference was observed only in the cancer-bearing sisters of the TMM CommCohort. It seems that the paper-based questionnaires are not robust enough to differentiate the pathogenic BRCA variant carriers or to evaluate BRCA annotation systems.

3. Functional analysis is recommended to confirm that their annotation is accurate for the variants where discordance was observed between ClinVar and their annotation system.

Minor comments are the following:

1. The meaning of the sentence on p20, l297-299 is not clear.

2. p21, l309 cBioPortal.

3. The total number of all candidates in Figure 3 should be 27 considering male and female number.

4. Poor figure resolution.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2021 Jan 11;16(1):e0236907. doi: 10.1371/journal.pone.0236907.r002

Author response to Decision Letter 0


20 Oct 2020

PONE-D-20-21851

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes in a 3,552 Japanese whole-genome sequence dataset (3.5KJPNv2)

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

We believe that the revised manuscript meets the requirements of PLOS ONE.

2. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

We removed the following two sentences that imply the use of inaccessible data. We believe that this deletion will not affect the main message of our manuscript:

We also checked the status regarding smoking and alcohol consumption in the potential HBOC carriers and others in the TMM CommCohort participants; we did not observe any significant difference between them (data not shown).

Intriguingly, there are 0 and 8 overlaps between two datasets for ClinVar and InterVar P or LP variants, respectively (data not shown).

3. To comply with PLOS ONE submission guidelines, in your Methods section, please provide additional information regarding your statistical analyses. For more information on PLOS ONE's expectations for statistical reporting, please see https://journals.plos.org/plosone/s/submission-guidelines.#loc-statistical-reporting.

We added a “Statistics” subsection in the “Methods” section as follows:

“Statistics

To analyze the correlations among the three computational estimates of the impacts of variants, we employed R 3.6.1 for calculating the Pearson correlation coefficient. We applied Fisher’s exact test or the chi-squared test with Yates’ correction for calculating the p-values of the differences in numbers of cancer-bearing family members.”
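As an illustration of the tests named in this subsection, the following is a minimal sketch in plain Python rather than the R 3.6.1 code the authors report using. The function names (`pearson_r`, `fisher_exact_two_sided`, `chi2_yates`) and the 2x2 table layout are assumptions made for this example, and any counts passed in are toy values, not study data.

```python
from math import comb, erfc, sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def chi2_yates(a, b, c, d):
    """Chi-squared statistic with Yates' correction and its p-value (1 d.f.).

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    num = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    stat = num / den
    # For 1 degree of freedom, P(X > stat) equals erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))
```

In a table of this kind, rows could stand for carrier status and columns for whether a family member reported cancer; Fisher's exact test is the usual choice when expected cell counts are small.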

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In terms of access to data from the TMM prospective cohort project, users should obtain approval from the sample and data access committee of the TMM Biobank (Minegishi, 2019). The sample and data access committee consists of experts both inside and outside the TMM. Upon applying to this committee, the Group of Materials and Information Management in the TMM at Tohoku University supports the procedures for data transfer.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

In terms of access to the data used in our manuscript, the Group of Materials and Information Management in the Tohoku Medical Megabank Organization at Tohoku University supports the procedures for data transfer. The Group of Materials and Information Management can be contacted at dist@megabank.tohoku.ac.jp.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

There are restrictions on the data usage, so we cannot upload our data for replication by a third party without conditions.

5. Your ethics statement should only appear in the Methods section of your manuscript. If your ethics statement is written in any section besides the Methods, please move it to the Methods section and delete it from any other section. Please ensure that your ethics statement is included in your manuscript, as the ethics statement entered into the online submission form will not be published alongside your manuscript.

We moved “Ethics approval and consent to participate” to the “Methods” section.

Reviewers' comments:

Comments to the Author

________________________________________

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Overall:

This study identifies participants in a whole-genome databank with BRCA1/2 mutations that can be considered pathogenic. They identify several variants that are reportedly novel based on ClinVar annotation and discuss the validity of these variants regarding possible pathogenicity.

Strengths:

-The introduction does contain a thorough review of prior studies regarding BRCA1 and BRCA2 mutations in the Japanese population

-Computational annotation of genomes is clearly described using standard resources in the field

-The paper highlights several methods that are used to validate pathogenicity of BRCA1/2 mutations, and the self-reported findings of family members with cancer and variant annotations is an interesting one highlighted in this paper.

Areas for Improvement:

Major:

-The authors do not describe the work of international consortia in this area such as the ENIGMA consortium (https://enigmaconsortium.org/library/enigma-publications/). When they describe the work of prior authors, they refer to other studies in the Japanese population (which is thoroughly reviewed), but not other international consortia or databases which have been dedicated to BRCA pathogenicity. This is of concern as it is not clear if the authors searched other sources to determine that their findings were novel.

Thank you for your constructive comment. We added the following text in the “Introduction” section:

“Moreover, the BRCA Challenge project established an open-access database, BRCA Exchange, to provide reliable and easily accessible variant data for better clinical treatments of HBOC [12]. As of October 2020, the BRCA Exchange database has collected more than 40,000 variants in the BRCA1/2 genes from major clinical databases and estimated their pathogenicity under expert peer review in collaboration with the ENIGMA consortium [13]. The purposes of this comprehensive database are to provide reliable and easily accessible variant data interpreted for the high-penetrance phenotype of HBOC and to develop a model database for the utilization and sharing of public data to provide better clinical treatments of hereditary disease. In this database, there are more than 4,900 variants annotated as “pathogenic” by the ENIGMA consortium.”

-The authors do not clarify the aim of their approach and its novelty relative to prior studies in the Japanese population with larger patient samples. This is a large sample of whole genomes and thereby offers novelty to identify pathogenic mutations that may not have been previously identified through exome means. It is instead unclear if they are interested in ascertaining the general population frequency of pathogenic mutations in the Japanese population, or in validation of their computational strategies. The authors need to clearly define their aim in the abstract and introduction.

Thank you very much for the constructive comment. Both of the points that the reviewer raised are of interest to us. One interest is ascertaining the precise frequency of pathogenic mutations in the general Japanese population. The other is validating computational strategies to identify moderately pathogenic variants. Therefore, we amended our manuscript in the “Abstract” and “Introduction” sections to reflect this.

In the “Abstract” section

“Identification of frequencies of definitely pathogenic germline variants in a population in two major hereditary breast and ovarian cancer syndrome (HBOC) genes, BRCA1/2, will be essential to estimate the number of HBOC patients. In addition, moderately pathogenic HBOC gene variants contribute to increasing the risk of breast and ovarian cancers in a population, and identification of such variants will be critical in the establishment of personalized healthcare. The prospective cohort with genome analyses will provide both types of information.”

In the “Introduction” section

“Patient-based studies for identifying germline pathogenic variants are very effective for identifying potential pathogenic variants, but cannot estimate the frequencies of those alleles in the general population, which will be critical to estimate the number of HBOC patients in a community. In addition, moderately pathogenic HBOC gene variants will contribute to increasing the risk of breast and ovarian cancers in a population, and identification of such variants will be critical in the establishment of personalized healthcare. The carriers of moderately pathogenic HBOC variants will not have undergone drastic prophylactic procedures, but frequent examination is recommended for earlier detection of cancers. The prospective cohort with genome analyses will provide both types of information.”

-From their methods, it is not clear if and how the authors address splicing variants and noncoding variants if their goal was to identify potentially pathogenic variants. Intervar does not address these – it uses exonic variants only. These would not be consistently reported in ClinVar, and it is known that several splicing variants in BRCA1 and BRCA2 can lead to clinical pathogenic findings. This would leverage the major benefit offered by the whole genome data.

Thank you for the critical comment. We completely agree with the reviewer’s point that the alteration of splicing patterns by genetic variation is critical and whole-genome sequencing data would outperform the exome data for the detection of unknown pathologic splicing variants. In the manuscript by Li et al., the authors point out that InterVar software can infer splicing impacts using ANNOVAR from the dbscsnv11 database. Therefore, the splicing variants that can be detected by dbscsnv11 will be detected by InterVar, although the original paper for dbscsnv11 discussed the intrinsic limitations of their software. Therefore, we added the following text to the “Introduction”:

“The main benefit of the whole-genome sequencing of the dataset is that it provides more comprehensive information of the structure of the two HBOC genes than the exome-based approach”.

-From their methods, it is not clear how the authors can confirm that their findings are specifically germline and not due to clonal hematopoiesis. Other approaches to identifying pathogenic germline variants do include methods of quality control in this area for confirmation or allude to this issue in the discussion if this cannot be resolved with the methods used.

Thank you for the critical comments. The reviewer’s point is correct that there is no step for ruling out the false positives caused by clonal hematopoiesis. The basic quality control steps when generating 3.5KJPNv2 were described by Tadaka et al. in Human Genome Variation (2019) 6:28. There is no description of how to avoid clonal hematopoiesis in the manuscript. However, we believe that clonal hematopoiesis should not have contributed substantially to our dataset for the following reasons.

First, Tadaka et al. used GATK HaplotypeCaller for variant calling, which is suitable to detect variants in near-diploid genomes. Thus, most of the variants caused by clonal hematopoiesis would not reach sufficient variant read depth in a WGS sample. Second, the average age of the individuals from whom the samples in 3.5KJPNv2 were derived was around 56 years, as estimated by Tadaka et al. in Human Genome Variation (2019) 6:28. However, clonal hematopoiesis mainly occurs in the elderly, and it has been reported to become prominent in Icelandic individuals older than 65 (Zink et al., Blood 2017). Thus, as the reviewer suggested, we added a discussion about clonal hematopoiesis in the “Results and Discussion” section.

“In the variant call procedures by Tadaka et al., there is no step for ruling out the false positives caused by clonal hematopoiesis [47]. The basic quality control steps when generating 3.5KJPNv2 were described by Tadaka et al. [20], suggesting that some of the BRCA variants analyzed in this study might have originated from somatic mutations in the blood leucocytes from the cohort participants. However, we believe that clonal hematopoiesis should not have contributed substantially to our dataset for the following reasons. First, Tadaka et al. used GATK HaplotypeCaller for variant calling for 3.5KJPNv2, which is suitable to detect variants in near-diploid genomes. Thus, most of the variants caused by clonal hematopoiesis would not reach sufficient variant read depth in a WGS sample. In addition, the average age of individuals from whom the samples in 3.5KJPNv2 were derived was around 56 years old [20]. Clonal hematopoiesis occurs mainly in the elderly, becoming prominent in those aged over 65 [47]. Hence, clonal hematopoiesis may not have strongly affected our results.”
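The read-depth argument above can be made concrete with a small sketch. It is not a step in the 3.5KJPNv2 pipeline and does not reproduce how GATK HaplotypeCaller genotypes a site; it only illustrates, under a simple binomial model, why a variant present in a minority of blood cells tends to show far fewer alternate reads than the roughly 50% expected for a germline heterozygote. The function names and the `alpha` cutoff are hypothetical and chosen for illustration.

```python
from math import comb


def binom_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def variant_allele_fraction(alt_reads, depth):
    """Fraction of reads at a site that carry the alternate allele."""
    return alt_reads / depth


def looks_subclonal(alt_reads, depth, alpha=1e-3):
    """Flag a site whose alt-read count is improbably low for a germline heterozygote.

    Under the heterozygous model each read carries the alternate allele with
    probability 0.5. alpha is a hypothetical cutoff, not a published threshold.
    """
    return binom_cdf(alt_reads, depth) < alpha
```

For example, at a hypothetical depth of 30 reads, a site with 3 alternate reads would be flagged, whereas a site with 14 alternate reads would be consistent with a heterozygous germline call.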

-While comparison to GnomAD is standard practice, this is typically done in the context of population-specific evaluations of specific allele frequencies. The conclusions regarding BRCA1/2 variants across both databases is not a typical use of GnomAD nor a clear conclusion with all population data in GnomAD aggregated. GnomAD should be used to discuss specific mutations identified or specific populations. (Would recommend that Table 1 be revised accordingly.)

Thank you very much for your constructive comment. We agree with the points raised and amended the structure of the manuscript accordingly. We rearranged previous Table 1 as Supplementary Table 1 and renumbered the other supplementary tables accordingly (see also the following point). We then changed the main text as follows:

“To investigate the population specificity, we extracted ClinVar and InterVar P or LP variants found in East Asian populations in the GnomAD database (GnomAD-EAS; Table 1). Intriguingly, there were only four overlaps between 3.5KJPN and GnomAD-EAS for ClinVar and InterVar P or LP variants (Table 1). For example, one of the most prominent BRCA1 pathogenic variants, L63X [9, 10], does not appear in GnomAD-EAS (Table 1). In contrast, the two most prevalent P or LP variants, BRCA2 p.G2508S and p.A2786T, are present in 3.5KJPNv2. These two variants may be commonly distributed among East Asian populations.”.

The amended Table 1 is large, so we could not include it in this document. Please check it in the revised main text.

-Please revise or provide a table to clarify the population studied in the database and range of number of mutations and types of mutations identified per individual (not average number per individual, which is difficult to interpret). This information is not clear from the paper as written. Additionally, the utility of the data provided in the Figures and Tables is frankly mixed, with some information very helpful for the reader and some information not clearly helpful towards the authors’ conclusions. To this end: Figure 1C is appropriate to serve as Figure 1. Figure 1 A and B should either be provided later in the text (with that section moved down) or added to supplementary figures. Figure 2 should be moved up (and could also be a candidate for Figure 1). Please move supplementary Table 2 to the actual paper or provide a table of the novel variants identified by the authors and the annotation information as they provide in the text. Please move Table 2 to the supplement as it is not clear that this provides extensive additional data. Supplementary Table 4 should be moved to the main text.

Thank you for your constructive comment. We followed the reviewer’s suggestions. Previous Figures 1C and 2 were changed to Figure 1A and 1B, respectively. We also rearranged Table 2 as Supplementary Table 2. Then, we described the findings appearing in previous Figure 1A and B in the main text as follows:

“The Pearson correlation coefficients of CADD_phred with DANN_rankscore and Eigen_raw are 0.815 and 0.860, respectively, showing that both DANN and Eigen correlate well with CADD. However, interestingly, the distributions of ClinVar and/or InterVar P or LP variants are quite different. CADD_phred and DANN_rankscore show wider distributions in P or LP variants compared with CADD_phred and Eigen_raw. Among the P or LP variants, the Pearson correlation coefficients of CADD_phred with DANN_rankscore and Eigen_raw are 0.127 and 0.541, respectively. Interestingly, in both of the scatter plots, BRCA1 p.L52F, a benign variant annotated as LP by InterVar, shows scores similar to those of the other P or LP variants in the three parameters. Supplementary Table 3 shows the details of the computational scoring in the ClinVar/InterVar P or LP variants.”
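For readers who wish to reproduce this kind of comparison, the Pearson correlation coefficient between two score columns can be computed as in the following minimal sketch. The function name and the example score lists are hypothetical and are not taken from the 3.5KJPNv2 data; the sketch only illustrates the statistic named in the quoted text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-variant scores, for illustration only.
    cadd_phred = [24.0, 35.0, 12.5, 28.1, 19.9]
    eigen_raw = [0.57, 0.87, 0.10, 0.66, 0.31]
    print(round(pearson_r(cadd_phred, eigen_raw), 3))
```

Computing the coefficient once over all variants and once over the P or LP subset would yield the two pairs of values discussed in the quoted passage.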

We also removed previous Supplementary Table 4 and updated the main text.

“In addition, the difference of the ratio of cancer-bearing offspring was higher in the InterVar P or LP carriers than in the others, although this was not statistically significant (p = 0.041).”

The other proposals, such as converting previous Supplementary Table 2 to new Table 2 with all of the described annotations, have also been acted upon. Therefore, the numbering of the figures and tables has been updated throughout the manuscript.

-Explanations or interpretations from the authors in the results outside of description of the actual results should be moved to the methods or discussion sections as appropriate. The results section is extremely long as a result and the discussion section is too short.

*Regarding the reliability of short-read sequence data

*Findings as interpreted by the authors regarding GnomAD

*CADD vs. Eigen scores

*Data regarding specific variants from outside sources

Thank you for the constructive comment. We actually followed the recommendations of the journal itself and used the “Results and Discussion” format that merges the two sections into one. This enabled the “Conclusions” section to be more concise. The reviewer’s points are all important, but some of the issues might be outside the remit of our current study. For example, our data were not focused on the reliability of short read sequences. Similarly, we do not have enough data to discuss which of CADD or Eigen is better. In terms of the GnomAD data, we amended Table 1 and added some discussion as per the reviewer’s suggestion.

Minor:

-The title was slightly unclear: Would rephrase to

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

Thank you for this constructive comment. We agree with this and rephrased our manuscript’s title as suggested.

-The introduction contains conflations regarding methods of identifying pathogenic mutations. For example, the reference to allelic dropout in the paper by Yost et al. is describing germline mutations taken from patients’ tumors, which, while a means of confirming pathogenicity, is subject to its own issues. The question that the authors are asking, though, is regarding unaffected carriers who have BRCA1 and BRCA2 mutations in this cohort. It is confusing to switch back and forth between germline testing by blood / via unaffected carriers and via tumor in affected carriers given the significant differences between these two methods unless this is delineated clearly. The introduction and references should be reviewed to clarify prior methods of identifying BRCA1/BRCA2 pathogenic variants and associated literature, and then also to discuss the findings that have been specific to the Japanese population.

Thank you for the critical comment, with which we completely agree. We removed the sentences related to the study by Yost et al. accordingly.

-For the methods, any ANNOVAR/annotation software using the ClinVar database should have the date of reference noted, since ClinVar is updated regularly.

Thank you for the constructive comment. We added information on the ClinVar version (version from December 1, 2015) in the “Methods” section.

-It would be very helpful for the authors to review the specific methods from prior work that are relevant for their study. For example, an extremely brief review regarding the criteria for WGS selection in the Megabank, depth of sequencing, as well as the methods of how these sequences are obtained (e.g. from whole blood?) and how families may be linked in the Project data.

Thank you for the constructive comment. We added a brief review about the TMM project data management in the “Methods” section as follows.

“The whole-genome sequences of some of the participants have been obtained; the criteria for selecting WGS samples are described elsewhere [19, 22]. In brief, the samples for development of the Japanese whole-genome sequencing dataset were selected based on the SNP array data of the samples. Only one sample was picked up from a kinship group to obtain the precise allele frequencies. The whole-genome sequencing was performed with HiSeq 2500 sequencers (Illumina, Inc., San Diego, CA) with a PCR-free protocol from the genomic DNA extracted from whole blood.”.

-For the methods, it would be helpful for there to be quantitative descriptions of the filtering process and use of the self-questionnaire data, as this is not replicable based on the current description.

Thank you for the critical comment. We added the following explanations of filtering the self-questionnaire data to the “Methods” section.

“The self-reported questionnaire data were filtered out for the participants who checked more than 50 items for past and family histories of malignant neoplasms. Most of the participants who checked more than 50 items showed contradictory histories, such as a self-history of ovarian cancer being recorded by male participants. Therefore, we decided to remove such records and obtained 35,136 records as a result. In the statistical analysis comparing carriers of candidate BRCA pathogenic variants and other TMM CommCohort participants regarding self-reported individual and family histories, we employed the binomial distribution to calculate the p-value. Then, we calculated the accumulation of past and family histories only for the items of malignant neoplasms. The questionnaire just asked about the presence or absence of such histories, which could be represented as “0” or “1” for each item. This made it impossible to give a weight to the numbers of affected siblings or offspring.”.
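The filtering and testing steps described in the quoted text could be sketched as follows. This is an illustration under stated assumptions, not the authors' actual procedure: the record layout (a dictionary with an "items" list of 0/1 answers), the function names, and the choice of a one-sided upper-tail probability are all hypothetical, since the response only states that the binomial distribution was used.

```python
from math import comb


def filter_records(records, max_items=50):
    """Drop records in which more than max_items history items were checked.

    Each record is assumed (hypothetically) to hold a list of 0/1 answers
    under the key "items".
    """
    return [r for r in records if sum(r["items"]) <= max_items]


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    n: number of carriers, k: carriers reporting the history item,
    p: background proportion among the other cohort participants.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Hypothetical records: the second one checks 51 items and is removed.
    records = [
        {"id": "A", "items": [1, 0, 0] + [0] * 60},
        {"id": "B", "items": [1] * 51 + [0] * 12},
    ]
    kept = filter_records(records)
    print([r["id"] for r in kept])
    # Hypothetical counts: 6 of 27 carriers versus a 10% background rate.
    print(binomial_upper_tail(6, 27, 0.10))
```

Because each questionnaire item is recorded only as present or absent, the count k is the number of carriers with at least one affected relative of the given type, not the number of affected relatives.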

-Computational estimation of pathogenicity is a data source, but this is an ongoing point of information used in interpreting pathogenicity. The authors’ conclusion at one point between conflicting data sources that “pathogenicity of BRCA variants would be affected by other genetic modifiers and/or environmental factors,” while absolutely true, is not as applicable in discussing the discordance between different methods of estimating pathogenicity (computer vs. saturation genomics modeling). Rather, the question is regarding the fallibility of these estimation approaches. Given the authors’ findings regarding some “pathogenic” mutations annotated as such but clearly benign on further review, this warrants a significant component of the discussion.

Thank you for the thoughtful comment, with which we completely agree. We added the following text in the “Results and discussion” section accordingly:

“Most of the computational estimation methods for the impact of genetic variations depend on the ‘known data sets’ when they perform machine learning. Probably, there are many ‘unknown factors’ that are essential for correct estimation of variants. Further studies should be necessary to provide the new and critical information for the computational estimation of genetic variations. Follow-up of the carriers of these variants in prospective cohort studies may provide the clue to solve the discordance, too”.

-The comparison between ClinVar and InterVar mutations in the tables is unclear. Does this mean “known pathogenic” and “annotated as pathogenic and novel, but under review”?

Thank you for this comment. The reviewer’s understanding is correct.

-Formatting of captions for tables and figures is not consistent.

Thank you for this comment. There were several inconsistent descriptions in the legends of previous Figure 1a and b. We removed them from the revised version of the manuscript and fixed some symbols in the new Figure 1b. We also revised the tables. We believe that there are now no inconsistencies.

-The final paragraph of the results is written as though to conflate variants of uncertain significance and moderate penetrance. Please revise this.

Thank you for this thoughtful comment. Accordingly, we amended the text as follows:

“Among those genes, it is not easy to estimate the clinical significance of the VUSs that may have clinically significant effects on the hosts’ predisposition for cancer, but with relatively low penetrance.”.

-Regarding data access: It would be more appropriate for the authors to state that they do not own the data themselves, but access to it is governed by the steering committee. dbGAP in the US is available but under the same restrictions, and patients’ privacy is honored.

Thank you for this constructive comment. In accordance with this, we added a paragraph in the methods section.

“In terms of the access to data from the TMM prospective cohort project, users should obtain approval from the sample and data access committee of the TMM Biobank [17]. This committee consists of experts both inside and outside the TMM. Upon the receipt of an application to the committee, the Group of Materials and Information Management in the TMM at Tohoku University supports the procedures for data utilization”.

Reviewer #2: The authors indicate that a large dataset of Japanese whole-genome sequencing data includes pathogenic variations in BRCA1/2 genes responsible for HBOC. ClinVar and InterVar detected more than 20 variants as pathogenic or likely pathogenic. The use of the combination of computational scoring and MAF picked up another eight candidates, including one likely benign mutant as defined by ClinVar. The self-reported individual and family histories of the carriers of potentially pathogenic BRCA variants were analyzed and the carriers’ sisters showed a significant history of cancer themselves.

There are major comments on this study.

1. They use ClinVar, InterVar, computational scoring systems and MAF to evaluate the pathogenicity of the BRCA variants found in their cohort. The approaches are all common and novelty of the study is limited.

Thank you for this comment. However, we believe that the use of the prospective cohort’s self-reporting data for validation of the pathogenicity of the variants in BRCA1/2 is quite a novel idea.

2. It is not clear why the difference was observed only in the cancer-bearing sisters of the TMM CommCohort. It seems that the paper-based questionnaire is not robust enough to differentiate the pathogenic BRCA variant carriers or to evaluate BRCA annotation systems.

Thank you for this comment, with which we basically agree. We did not intend to identify a new, highly penetrant pathogenic variant through this study. Instead, we attempted to develop a method to identify “moderately pathogenic” or “low penetrant” deleterious variants. To validate such variants, large-scale data collection methods are critical. Therefore, a paper-based questionnaire is appropriate to obtain the necessary information.

3. Functional analysis is recommended to confirm that their annotation is accurate for the variants for which discordance was observed between ClinVar and their annotation system.

Thank you for the constructive comment. We performed some functional analyses to check this discordance. However, HBOC is not an acute disorder. Simple functional analysis may not be applicable for evaluating the small differences in the functions of the variants. Therefore, epidemiological methods would be more appropriate.

Minor comments are the following:

1. The meaning of the sentence at p20, l297-299 is not clear.

Thank you for this constructive comment. Accordingly, we amended the text as follows:

“Four variants were annotated as “pathogenic” by Momozawa et al. [14], but not annotated as P or LP by InterVar (Supplementary Table 4). Among them, three variants showed high CADD_phred (24–35) and Eigen_raw scores (0.571–0.871) (Supplementary Table 4)”.

2. p21, l309 cBioPortal.

Thank you for this comment. We fixed the capitalization of this term throughout the main text.

3. The total number of all candidates in Figure 3 should be 27 considering male and female number.

Thank you for carefully checking the data. We fixed the figure accordingly.

4. Poor figure resolution.

Thank you for this comment. We generated high-resolution figures and substituted them for the old ones. We hope that the new figures have sufficient resolution.

________________________________________

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

________________________________________

In compliance with data protection regulations, you may request that we remove your personal registration details at any time. Please contact the publication office if you have any questions.

Attachment

Submitted filename: Response_to_Reviewersfinal.docx

Decision Letter 1

Yonglan Zheng

23 Nov 2020

PONE-D-20-21851R1

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

PLOS ONE

Dear Dr. Yasuda,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 07 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yonglan Zheng

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: I Don't Know

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors’ efforts are appreciated with regard to their revisions on the manuscript. This has resulted in text that is significantly improved, but continues to have some ongoing issues towards publication.

MAJOR:

(1) With regard to InterVar use, to reiterate/clarify the point from the initial review, the authors did not explain their handling of noncoding variants (which, as they clarified, is the point of WGS data over exonic data). InterVar describes handling of the mutation types in question as follows:

*Splicing -> Intervar does address some splicing variants, but they describe in the Li paper that their data cleaning procedure includes removal of variants with conflicting interpretation as part of data cleaning prior to checking the dbscsnv11 database. (This is confusing as the authors then show variants that have conflicting interpretations in ClinVar? I wasn't clear how these would have been identified if Intervar was used as described in the Li paper?)

*Intronic / noncoding variants -> This point was not addressed by the authors although the WGS would allow them to do so (and that is noted as one of their reasons for novelty). ANNOVAR identifies intergenic and noncoding variants, but InterVar’s web application is specifically designed for annotation use in exons and their paper notes that:

“InterVar is designed to interpret genetic variants that are likely to cause Mendelian diseases or are highly penetrant for Mendelian diseases (OR > 5) and cannot handle alleles that increase susceptibility to common and complex traits. Therefore, we caution that the current interpretation is appropriate only for Mendelian diseases or Mendelian forms of complex diseases.”

If the authors intend to only address splicing using WGS data over the existing exonic data, then region of application is what needs to be clarified consistently through the paper and a point should be made in the discussion regarding limitations of InterVar in this context.

If they intend to state that they analyzed intronic / noncoding variants related to BRCA1/BRCA2, then it is not clear that InterVar as described with use of default settings is a good tool for intronic / noncoding variants. I would refer the authors to the literature for references such as this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266896/ It is noted that the variant annotation, albeit different, cited by the authors in their prior analysis on 2KJPN demonstrates ability to pursue this work – this point is just not clear.

(2) There continue to be problems with conflation of moderate penetrance and likely pathogenic variants that are concerning. The term “moderately pathogenic variants” remains unclear. Please revise “moderately pathogenic” to “likely pathogenic” (related to variants) or “moderate penetrance” (related to genes) depending on what was intended as these are the standard accepted terms in the field.

For instance, these descriptions are completely inaccurate regarding management of patients with BRCA1/BRCA2 mutations because of this conflation problem.

“The carriers of moderately pathogenic HBOC variants will not have undergone the drastic prophylactic modalities but frequent examination will be recommendable for earlier detection of the cancers.”

“The presence of such variants may not be critical to prompt radical interventions such as prophylactic surgery, but the carriers may be encouraged to continue undergoing close health checks to detect HBOC cancers as early as possible.”

Also:

“Many of the cancer-predisposing genes are known to be associated with juvenile cancer syndromes such as Li Fraumeni syndrome. The variants responsible for juvenile cancer syndromes are usually very pathogenic and show strong effects on gene functions.”

Li Fraumeni syndrome is a specific diagnosis associated with a specific gene, not multiple genes. These sentences in particular should just be removed completely.

(3) If part of the purpose of the paper is to evaluate the performance of InterVar in annotation, ClinVar and InterVar variants should be treated as separate categories. There are multiple parts of the results in which variants from these groups are just put together.

MINOR:

Introduction:

-On page 4, in the introduction, the term “approaches” is unclear since what the authors are referring to is that there are several efforts (whereas approaches implies different strategies of ascertaining pathogenicity of identified mutations).

-Would modify result/discussion subtitles from “Pathogenic variants in the two…” -> “Estimation of pathogenic variants…” and “Estimate of computational scoring tools for pathogenicity of the 3.5KJPNv2 BRCA variants” -> “Estimate of computational scoring tools’ performance in predicting pathogenicity of novel 3.5KJPNv2 BRCA variants”

-Please revise references to genomAD for spelling/capitalization accuracy in the Introduction.

-Methods:

-The context for the statement in the methods regarding candidate variants in the Korean population is not clear.

-Continuing to have a lot of difficulty understanding how the authors were able to link information from the familial TMM database / questionnaire with the sequencing information if these are located in two separate datasets w/ two separate accesses.

-Results/Discussion

The authors’ note about cancer-bearing offspring mentions p=0.041 is not statistically significant but does not specify what threshold would be significant – presumably with a Chi-squared test this would be 0.05, so this is statistically significant?). However, if there was a correction done for multiple testing (which would be appropriate if testing was done across numerous variants), then this is not clear from the paper as written.
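The arithmetic behind this point can be made concrete with a minimal sketch. It assumes a Bonferroni correction, which is only one possible method and is not stated in the manuscript; the function names and the number of tests are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value, alpha=0.05, n_tests=1):
    """True if p_value falls below the (optionally corrected) threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


if __name__ == "__main__":
    p = 0.041  # value reported for cancer-bearing offspring
    # Without correction, p is compared against alpha = 0.05.
    print(is_significant(p))
    # With a hypothetical family of 4 comparisons, the threshold is 0.0125.
    print(is_significant(p, n_tests=4))
```

Under this assumption, p = 0.041 is below 0.05 when tested alone but not once the threshold is divided by two or more comparisons, which is why the manuscript should state which threshold applies.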

Reviewer #2: (No Response)

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

PLoS One. 2021 Jan 11;16(1):e0236907. doi: 10.1371/journal.pone.0236907.r004

Author response to Decision Letter 1


2 Dec 2020

PONE-D-20-21851R1

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

________________________________________

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors’ efforts are appreciated with regard to their revisions on the manuscript. This has resulted in text that is significantly improved, but continues to have some ongoing issues towards publication.

MAJOR: (1) With regard to InterVar use, to reiterate/clarify the point from the initial review, the authors did not explain their handling of noncoding variants (which, as they clarified, is the point of WGS data over exonic data). InterVar describes handling of the mutation types in question as follows:

*Splicing -> Intervar does address some splicing variants, but they describe in the Li paper that their data cleaning procedure includes removal of variants with conflicting interpretation as part of data cleaning prior to checking the dbscsnv11 database. (This is confusing as the authors then show variants that have conflicting interpretations in Clinvar? I wasn't clear how these would have been identified if Intervar was used as described in the Li paper?)

Thank you very much for the detailed comment. To address this point, we rechecked the InterVar annotation data and did not find any variants among the BRCA1/2 gene variants in 3.5KJPN that may affect splicing but were interpreted as “Benign” or “Likely benign”. There is one variant in 3.5KJPN, BRCA2 I2675V, that is known to cause a splicing error, and it is annotated as P or LP by both ClinVar and InterVar. To clarify the issue, we added the following sentences in the “Results and discussions” section (p22).

“So far, there are no other splicing error-causing variants of the BRCA1/2 genes in the 3.5KJPN. As shown in Table 1, there are two splicing-affecting variants (BRCA2:g.chr13: 32890558AGdel and BRCA2:g.chr13: 32937315G>A) in the GnomAD-EAS, indicating that we did not miss a large number of splicing error-causing variants in the BRCA1/2 genes in the 3.5KJPN.”

*Intronic / noncoding variants -> This point was not addressed by the authors although the WGS would allow them to do so (and that is noted as one of their reasons for novelty). ANNOVAR identifies intergenic and noncoding variants, but InterVar’s web application is specifically designed for annotation use in exons and their paper notes that:

“InterVar is designed to interpret genetic variants that are likely to cause Mendelian diseases or are highly penetrant for Mendelian diseases (OR > 5) and cannot handle alleles that increase susceptibility to common and complex traits. Therefore, we caution that the current interpretation is appropriate only for Mendelian diseases or Mendelian forms of complex diseases.”

If the authors intend to only address splicing using WGS data over the existing exonic data, then region of application is what needs to be clarified consistently through the paper and a point should be made in the discussion regarding limitations of InterVar in this context.

If they intend to state that they analyzed intronic / noncoding variants related to BRCA1/BRCA2, then it is not clear that InterVar as described with use of default settings is a good tool for intronic / noncoding variants. I would refer the authors to the literature for references such as this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6266896/ It is noted that the variant annotation, albeit different, cited by the authors in their prior analysis on 2KJPN demonstrates ability to pursue this work – this point is just not clear.

Thank you very much for the comment, and we apologize for our unclear explanations. We used the command-line InterVar software with its default options. It can also annotate noncoding variants of a gene and intergenic regions. As we mentioned in the Methods section of our manuscript, the command-line InterVar depends on ANNOVAR functions, so it can cover noncoding variants. In the BRCA1/2 genes in 3.5KJPN, there are 2891 noncoding (intronic, 5’-UTR, and 3’-UTR) variants, and all of them were annotated by the software. None of them is annotated as P or LP by either ClinVar or InterVar. To clarify the issue, we added the following sentences to the “Results and discussions” section (p17).

“Hence, InterVar may underestimate the clinical impact of potentially pathogenic variants about which previous information is not available. The tendency might be worse in the noncoding regions in the coding genes like the BRCA1/2 genes because of the lack of functional studies for such regions. Nowadays, the whole genome sequencing data is accumulating and comparisons between the phenotypes and variants in the noncoding regions found by the WGS will provide critical data for the interpretation of the noncoding variants.”
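The tally of noncoding variants described in the response above can be illustrated with a minimal sketch. It assumes an ANNOVAR-style tab-separated table with `Func.refGene` and `Gene.refGene` columns; the rows, positions, and the helper name `count_noncoding` are hypothetical placeholders and are not taken from the 3.5KJPN data or from the authors' pipeline.

```python
import csv
import io
from collections import Counter

# Hypothetical ANNOVAR-style rows (placeholder positions, not real variants).
ANNOTATED_TSV = (
    "Chr\tStart\tRef\tAlt\tFunc.refGene\tGene.refGene\n"
    "13\t100\tA\tG\tintronic\tBRCA2\n"
    "13\t200\tC\tT\tUTR3\tBRCA2\n"
    "17\t300\tG\tA\texonic\tBRCA1\n"
    "17\t400\tT\tC\tUTR5\tBRCA1\n"
    "17\t500\tG\tC\tintronic\tBRCA1\n"
)

# Region labels treated as noncoding here (intronic, 5'-UTR, 3'-UTR).
NONCODING = {"intronic", "UTR5", "UTR3"}


def count_noncoding(tsv_text, genes=("BRCA1", "BRCA2")):
    """Count noncoding variants per region label for the given genes."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        if row["Gene.refGene"] in genes and row["Func.refGene"] in NONCODING:
            counts[row["Func.refGene"]] += 1
    return counts


counts = count_noncoding(ANNOTATED_TSV)
print(dict(counts), "total:", sum(counts.values()))
```

A real annotation table has many more columns; only the two columns used here are assumed to be present.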

(2) There continue to be problems with conflation of moderate penetrance and likely pathogenic variants that are concerning. The term “moderately pathogenic variants” remains unclear. Please revise “moderately pathogenic” to “likely pathogenic” (related to variants) or “moderate penetrance” (related to genes) depending on what was intended as these are the standard accepted terms in the field.

For instance, these descriptions are completely inaccurate regarding management of patients with BRCA1/BRCA2 mutations because of this conflation problem.

“The carriers of moderately pathogenic HBOC variants will not have undergone the drastic prophylactic modalities but frequent examination will be recommendable for earlier detection of the cancers.”

“The presence of such variants may not be critical to prompt radical interventions such as prophylactic surgery, but the carriers may be encouraged to continue undergoing close health checks to detect HBOC cancers as early as possible.”

Thank you very much for the comments. We understand the reviewer’s criticism to be that the word “pathogenic” should not be used for a variant with low susceptibility that may not require any drastic medical action. We would like to use “likely pathogenic” for variants that can cause disease with a probability high enough to warrant medical action, and “moderate” for variants that confer a low but significant susceptibility to disease onset. We therefore amended the manuscript throughout so that “pathogenic” is not used to describe such weakly disease-causing variants. We amended the two example sentences as follows:

“The carriers of moderately deleterious HBOC variants would not undergo drastic prophylactic modalities, but frequent examination would be recommendable for earlier detection of the cancers.”

“The carriers in moderately penetrant HBOC families may not be critical to prompt radical interventions such as prophylactic surgery, but the carriers may be encouraged to continue undergoing close health checks to detect HBOC cancers as early as possible.”

Also:

“Many of the cancer-predisposing genes are known to be associated with juvenile cancer syndromes such as Li Fraumeni syndrome. The variants responsible for juvenile cancer syndromes are usually very pathogenic and show strong effects on gene functions.”

Li Fraumeni syndrome is a specific diagnosis associated with a specific gene, not multiple genes. These sentences in particular should just be removed completely.

Thank you very much for the critical comment. We accept the request and have deleted the sentences.

(3) If part of the purpose of the paper is to evaluate the performance of InterVar in annotation, Clinvar and InterVar variants should be treated as separate categories. There are multiple parts of the results in which variants from these groups are just put together.

Thank you very much for the careful comments, and we apologize for our confusing explanation. We did not intend to evaluate the performance of InterVar in annotation. We used it only to supplement the annotations missing from ClinVar and to obtain other parameters such as CADD, DANN, and Eigen. Most of the conflicts between the two databases are caused by the lack of information in ClinVar, so we treated them in basically the same manner.

MINOR:

Introduction:

-On page 4, in the introduction, the term “approaches” is unclear since what the authors are referring to is that there are several efforts (whereas approaches implies different strategies of ascertaining pathogenicity of identified mutations).

Thank you very much for the careful comments. We amended the sentences to clarify our intention as follows:

“Several levels of studies (single organization, single nation, and international level) have been done previously. As a single organization study, Sugano et al. reported the BRCA1 and BRCA2 germline variants in 135 HBOC patients and identified 28 pathogenic ones [9]. As the nationwide study, Arai et al. examined 830 Japanese HBOC pedigrees collected by the Japanese HBOC consortium and identified 49 different pathogenic variants among them [10]. Similarly, a nationwide multicenter study revealed that germline BRCA 1/2 mutations were present in 14.7% of 634 Japanese women with ovarian cancer [5]. Lee et al. also examined the variants in the BRCA1 and BRCA2 genes in breast and ovarian cancer patients’ germline genomic DNA and calculated posterior probabilities for the disease-causing mutations; they identified five previously unreported variants as candidate pathogenic ones [11]. Finally, as an international study, BRCA Exchange…”

-Would modify result/discussion subtitles from “Pathogenic variants in the two…” -> “Estimation of pathogenic variants…” and “Estimate of computational scoring tools for pathogenicity of the 3.5KJPNv2 BRCA variants” -> “Estimate of computational scoring tools’ performance in predicting pathogenicity of novel 3.5KJPNv2 BRCA variants”

Thank you very much for the critical comment. We accept the request and amended the subtitles as suggested.

-Please revise references to genomAD for spelling/capitalization accuracy in the Introduction.

Thank you very much for the careful comment. We corrected the spelling to “gnomAD” throughout the manuscript.

-Methods:

-The context for the statement in the methods regarding candidate variants in the Korean population is not clear.

Thank you very much for the comment. We tried to fix the issue as follows:

“The positions of the candidate pathological variants found in the Korean population [11] were described as the cDNA positions. To apply the data to the InterVar software, …”

-Continuing to have a lot of difficulty understanding how the authors were able to link information from the familial TMM database / questionnaire with the sequencing information if these are located in two separate datasets w/ two separate accesses.

Thank you for the comment, and we apologize for the unclear explanation. We added the following explanation to the Methods section.

“The TMM database is a relational database and it consists of several separate datasets. The key is the participants’ IDs to link the information stored in the different tables”
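The linkage by participant ID described above can be sketched as a join between two tables. This is only an illustration using an in-memory SQLite database; the table names, column names, IDs, and values are hypothetical and do not reflect the actual TMM database schema.

```python
import sqlite3

# In-memory database with two hypothetical tables keyed by participant ID.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE genotypes (participant_id TEXT, variant TEXT);
    CREATE TABLE questionnaire (participant_id TEXT, sister_with_cancer INTEGER);
    INSERT INTO genotypes VALUES ('P001', 'variant_A'), ('P003', 'variant_B');
    INSERT INTO questionnaire VALUES ('P001', 1), ('P002', 0), ('P003', 0);
    """
)

# Link sequencing-derived records to questionnaire answers via the shared ID.
rows = conn.execute(
    """
    SELECT g.participant_id, g.variant, q.sister_with_cancer
    FROM genotypes AS g
    JOIN questionnaire AS q ON q.participant_id = g.participant_id
    ORDER BY g.participant_id
    """
).fetchall()
conn.close()

for participant_id, variant, sister_with_cancer in rows:
    print(participant_id, variant, sister_with_cancer)
```

Only participants present in both tables appear in the result, which is the behavior of an inner join on the shared key.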

-Results/Discussion

The authors’ note about cancer-bearing offspring mentions that p=0.041 is not statistically significant but does not specify what threshold would be significant (presumably with a chi-squared test this would be 0.05, so this is statistically significant?). However, if there was a correction done for multiple testing (which would be appropriate if testing was done across numerous variants), then this is not clear from the paper as written.

Thank you very much for the constructive comments. We felt that 0.041 was not low enough to be “significant”. As the reviewer suggested, it should be interpreted as “marginally significant” if p < 0.05 is the threshold, so we amended the main text as follows. To fulfil the requirements of the PLOS ONE publication policy, we added the data on offspring cancer burden in the TMM database as Supplementary Table 5 and revised Figure 2 accordingly. The legend of Figure 2 was also amended.

“A prominent difference between those definitely carrying potentially pathogenic BRCA variants and the rest of the cohort was in the rate of cancer-bearing sisters: the InterVar P or LP carriers were shown to have a much higher rate of cancer-bearing sisters than the rest of the cohort (Fig. 2 and Supplementary Table 5; p = 3.08 × 10^-5, chi-squared test with Yates’ correction). In addition, the rate of cancer-bearing offspring was higher in the InterVar P or LP carriers than in the others with marginally significant (p = 0.041).”

Figure 2 legend:

“…Asterisks indicate statistically significant differences (single: p < 0.05, double: p < 10^-4) upon comparison with the total analyzed TMM CommCohort cases (Fig. 2).”
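For readers unfamiliar with the test named in the amended text, the following is a minimal sketch of a chi-squared test with Yates’ continuity correction on a 2×2 table, together with a Bonferroni-style threshold as one common way to handle the multiple-testing question raised by the reviewer. The counts and the number of tests are hypothetical, the function names are illustrative, and nothing here is stated in the source to be the authors' actual procedure.

```python
import math


def yates_chi2(a, b, c, d):
    """Chi-squared test with Yates' correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    # Yates' correction: reduce |ad - bc| by n/2, but never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: carriers vs. others, with vs. without a cancer-bearing sister.
stat, p_value = yates_chi2(20, 5, 5, 20)
print(f"chi2 = {stat:.2f}, p = {p_value:.2e}")

# Assuming two comparisons (an assumption, not stated in the source).
print("threshold:", bonferroni_threshold(0.05, 2))
```

Under the assumed two comparisons, the per-test threshold would be 0.025, so a p-value of 0.041 would pass an uncorrected 0.05 threshold but not the corrected one.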

Reviewer #2: (No Response)

________________________________________

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

________________________________________

In compliance with data protection regulations, you may request that we remove your personal registration details at any time. Please contact the publication office if you have any questions.

Attachment

Submitted filename: Response_to_Reviewers.docx

Decision Letter 2

Yonglan Zheng

7 Dec 2020

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

PONE-D-20-21851R2

Dear Dr. Yasuda,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yonglan Zheng

Academic Editor

PLOS ONE

Acceptance letter

Yonglan Zheng

22 Dec 2020

PONE-D-20-21851R2

Novel candidates of pathogenic variants of the BRCA1 and BRCA2 genes from a dataset of 3,552 Japanese whole genomes (3.5KJPNv2)

Dear Dr. Yasuda:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yonglan Zheng

Academic Editor

PLOS ONE

Associated Data

    Supplementary Materials

    S1 Table. Functional annotations of the BRCA gene variants in 3.5KJPNv2 and gnomAD.

    (XLSX)

    S2 Table. InterVar P or LP variants in the BRCA genes of the RIKEN 2,234 Japanese whole-genome sequence dataset.

    (XLSX)

    S3 Table. Comparison of scores for pathogenic variants in the BRCA genes in 3.5KJPN.

    (XLSX)

    S4 Table. Details of “pathogenic” BRCA variants but not P or LP by InterVar in the paper by Momozawa et al.

    (XLSX)

    S5 Table. Statistics of the sisters or offspring cancer histories of candidate BRCA pathogenic variants carriers in the TMM-Comm cohort.

    (XLSX)

    Attachment

    Submitted filename: Response_to_Reviewersfinal.docx

    Attachment

    Submitted filename: Response_to_Reviewers.docx

    Data Availability Statement

    In terms of the ethical restrictions on access to the data used in our study, the data that we used are histories of disease and genomic information; both of these sets of data are private and it would be possible to identify an individual with them. Therefore, it is necessary to obtain approval for data access from the TMM prospective cohort project; specifically, users should obtain approval from the sample and data access committee of the TMM Biobank. This committee consists of experts both inside and outside the TMM. Upon applying to this committee, the Group of Materials and Information Management in the TMM at Tohoku University supports the procedures for data transfer. The Group of Materials and Information Management can be contacted at dist@megabank.tohoku.ac.jp.


    Articles from PLoS ONE are provided here courtesy of PLOS
