Summary
The increasing availability of tumor and germline genomic data allows integration of these two data sets to better understand cancer risk. We provide an overview of the types of research being performed and the tools and data available to researchers.
Abstract
Cancer is characterized by a diversity of genetic and epigenetic alterations occurring in both the germline and somatic (tumor) genomes. Hundreds of germline variants associated with cancer risk have been identified, and large amounts of data identifying mutations in the tumor genome that participate in tumorigenesis have been generated. Increasingly, these two genomes are being explored jointly to better understand how cancer risk alleles contribute to carcinogenesis and whether they influence development of specific tumor types or mutation profiles. To understand how data from germline risk studies and tumor genome profiling are being integrated, we reviewed 160 articles describing research that incorporated data from both genomes, published between January 2009 and December 2012, and summarized the current state of the field. We identified three principal types of research questions being addressed using these data: (i) use of tumor data to determine the putative function of germline risk variants; (ii) identification and analysis of relationships between host genetic background and particular tumor mutations or types; and (iii) use of tumor molecular profiling data to reduce genetic heterogeneity or refine phenotypes for germline association studies. We also found descriptive studies that compared germline and tumor genomic variation in a gene or gene family, and papers describing research methods, data sources, or analytical tools. We identified a large set of tools and data resources that can be used to analyze and integrate data from both genomes. Finally, we discuss opportunities and challenges for cancer research that integrates germline and tumor genomics data.
Introduction
The progression from cancer susceptibility to tumorigenesis involves two separate, but related genomes—the germline and the somatic, or tumor genomes. Significant advances have been made both in identifying inherited cancer risk variants and in describing the myriad genetic and epigenetic mutations present in tumor genomes. Increasingly, investigators are incorporating data from both genomes into research to understand carcinogenesis.
Genome wide association studies (GWAS) of cancer have identified hundreds of variants associated with cancer susceptibility. Current technology allows genotyping of millions of SNPs in hundreds of thousands of cases and controls. GWAS also have incorporated whole genome and whole exome sequencing studies, generating even more data on the genetics of cancer risk (1,2). These data have been collected and shared through the creation of large scale resources, such as the 1000 Genomes Project and the NHLBI Exome Sequencing Project, which will assist further discovery efforts (3,4).
Similarly, novel molecular technologies have accelerated progress in understanding the molecular alterations present in the tumor itself that are important for tumorigenesis. The Cancer Genome Atlas [TCGA (5)] and the International Cancer Genome Consortium [ICGC (6)] were established to generate comprehensive catalogs of genomic characteristics of tumors, including mutations, gene expression patterns, and epigenetic changes, from tumors representing 50 different cancer types, and to coordinate efforts to comprehensively characterize more than 25,000 cancer-normal genomes collected globally (7,8).
Despite the vast amounts of data generated to date, determining the function of germline variants associated with cancer risk, whether a tumor mutation is a driver or passenger, and how molecular aberrations in both genomes influence cancer risk, initiation, and progression remains challenging. Understanding which of the many genes mutated in the tumor genome are true drivers for the establishment and growth of the tumor requires complex analyses, and the location of many cancer susceptibility loci in noncoding regions of the genome means that their role in carcinogenesis is not easily discerned (9,10). Catalogs of germline cancer risk data and tumor molecular profiles provide an opportunity to integrate information from both the germline and tumor genomes to better understand carcinogenesis. Expression and epigenetic data derived from tumor genomes is proving useful for understanding the function of cancer risk alleles, particularly those that lie in non-coding regions of the genome (10,11). The ability to classify tumors into multiple subtypes based on commonalities at the molecular level rather than by histological observations could impact studies of cancer risk by more precisely defining the cancer phenotype, as has been done in studies of breast cancer (12). Joint analysis of both the germline and tumor genomes should help determine whether and the extent to which pathways involved in cancer risk, initiation, progression, and response to therapy or prognosis intersect.
We conducted a literature review to assess how germline and somatic data are being integrated to address questions regarding the impact of germline risk alleles and mutations in the tumor on carcinogenesis, identify possible research gaps, and assess resource needs and opportunities to foster work in this field. We describe how combined analysis of germline and tumor data is broadening our knowledge of cancer biology and also review the methods, analytical strategies, and data resources used in this work.
Methods
Literature search
We searched for articles describing combined analyses of germline and somatic data (defined as data generated from analysis of tumor genomes). For this review, we defined ‘somatic alterations’ as alterations such as mutations, changes in gene expression, and epigenetic modifications found in tumor tissue. Three separate PubMed searches were performed for relevant English-language studies conducted in humans (Figure 1). The first search was performed in May 2012 for articles with publication dates starting on January 1, 2009 using the terms: (somatic [tiab] OR gene expression [tiab]) AND (germline [tiab] OR snp [tiab] OR polymorphism* [tiab] OR Polymorphism, Single Nucleotide [mesh] OR Polymorphism, Genetic [mh: noexp] OR gwas OR Genome-Wide Association Study [mesh]); AND cancer [tiab] (693 articles). A second, bridge search was performed in October 2012 to capture articles more recently published or listed in PubMed from May to October 2012, using the identical criteria (46 articles). Additional potential articles were identified on the TCGA website (5) or recognized as examples of germline and somatic integration research by NCI staff (10 articles).
Because the primary search and bridge searches found only a small number of articles examining association of germline genetic variants with tumor subtypes, a supplemental literature search was performed in July 2013 to identify additional articles using the following criteria: (GWAS and tumor subtype) AND [‘1 January 2009’ (Date—Publication): ‘31 December 2012’ (Date—Publication)] (49 articles). Abstracts from the supplemental search were reviewed by two reviewers and consensus was used for inclusion.
A total of 798 articles were identified. Abstracts were reviewed first, and articles underwent full review if they appeared to incorporate both germline and tumor genomic data. Ten percent of articles excluded during abstract review (primary and bridge search) were reviewed independently for quality-control purposes. Articles were excluded for any of the following reasons: no integration of germline and somatic data (i.e. the research did not combine the two data sources), no cancer endpoint in the study, no somatic component, no germline component, limited to in vitro functional studies (i.e. used only established cell lines), animal studies, and case reports of small sample size (N ≤ 2). After exclusion, 160 articles, 24 from the supplemental search, were considered relevant and included for data abstraction (Supplementary Table I, available at Carcinogenesis Online).
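The article counts reported for the searches can be cross-checked with simple arithmetic. The sketch below is illustrative only: the dictionary keys are our own labels, and the excluded count and the number of included articles from non-supplemental sources are derived here by subtraction rather than reported in the text.

```python
# Article counts as reported in the Methods; the keys are our own labels.
identified = {
    "primary_search_may_2012": 693,
    "bridge_search_oct_2012": 46,
    "tcga_website_or_nci_staff": 10,
    "supplemental_search_jul_2013": 49,
}

total_identified = sum(identified.values())  # should match the reported total
included = 160                               # articles retained for data abstraction
included_from_supplemental = 24              # as reported

# Derived quantities (not stated in the text; computed by subtraction).
excluded = total_identified - included
included_from_other_sources = included - included_from_supplemental

print(total_identified, excluded, included_from_other_sources)
```

The four sources sum to the 798 articles reported above.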
Data abstraction
From the relevant 160 articles, we abstracted data about study design, molecular approaches used for the analysis of germline and somatic data, data resources, and software and analysis programs used. To better understand the research questions these data were being used to address, articles were assigned to one of three categories: (i) use of tumor data to determine the putative function of germline risk variants; (ii) identification and analysis of relationships between host genetic background and particular tumor mutations or subtypes; and (iii) use of tumor molecular profiling data to reduce genetic heterogeneity or refine phenotypes for association studies. Although some articles described research that included characteristics of multiple categories, each article was classified according to the category that was the main focus of the work. We identified three articles where a single focus could not be determined; these were included in multiple categories. Several articles did not fall into these main categories and were classified as descriptive studies of germline and tumor variation in a gene or related genes (16 articles), or as papers describing research methods, data sources, or analytical tools (16 articles). Data analysis tools (e.g. web-based analytical programs, software tools, suites of programs, or prediction programs) and data resources (e.g. databases or catalogues of DNA sequences, sources of datasets) used in the relevant papers were characterized as being used for analysis, cataloging germline or somatic variation, germline and somatic data integration, or functional annotation (Supplementary Tables II and III, available at Carcinogenesis Online).
Results
Cancer types included
The 160 articles we identified that described use of both germline and tumor data encompassed 18 different cancer types (Figure 2). The most common cancer types were breast (N = 39); colon, rectum, or colorectal (N = 23); and lung (N = 20). Several articles were reviews or focused on analytical methods or best practices and were not specific to particular cancer types, or incorporated data from more than one cancer type and were considered multiple or not otherwise specified (NOS) (N = 29).
Molecular approaches used for characterization of germline and somatic alterations
In articles describing joint analysis of germline and tumor data, a range of molecular approaches were used for characterization of variation (Figures 3 and 4). Targeted genotyping of candidate genes was common in studies of germline risk, as was use of data from prior studies, genetic screening, or data classified as ‘known mutation status’ from TCGA. To examine variation in tumor tissue, studies most frequently evaluated gene expression, using targeted and genome-wide approaches.
Databases and analytic tools
We identified 64 data analysis tools used in these studies (Supplementary Table II, available at Carcinogenesis Online). Most tools were used for analysis or characterization of germline or somatic data sources independently, rather than explicitly for integration. These tools were used for a variety of purposes including sequence alignment, sequence annotation, pathway analysis, and variant evaluation for SNPs and copy number alterations (CNAs) including identification, cataloging, imputation, and haplotype analysis. Some packages or toolkits provided a platform for using analysis tools, such as GenePattern, Genome Analysis Toolkit (GATK), and Partek Genomics Suite.
We identified 35 data resources that were used in these studies (Supplementary Table III, available at Carcinogenesis Online). Many of these resources are not specific to cancer research, but provide richly annotated data on human genomic variation such as the International HapMap project (http://hapmap.ncbi.nlm.nih.gov/), the 1000 Genomes Project (http://www.1000genomes.org/) and National Center for Biotechnology Information (NCBI) resources including the SNP database (dbSNP). Other data resources were specific to cancer research including the Catalogue of Somatic Mutations in Cancer (COSMIC), TCGA, Cancer Gene Census, and ICGC.
Categories of research questions integrating germline and tumor genomic data
We classified the 160 articles into three broad categories according to the types of research questions being addressed. We classified 51 articles as integrating germline and somatic data to address questions about germline variant function (category 1), 40 as describing the influence of the germline genome on the tumor that develops (category 2), and 34 as using somatic data to refine tumor phenotypes for association studies (category 3). We found three articles that described research that had characteristics of both categories 1 and 2 and were considered both as category 1 and 2. The descriptive studies and articles reporting data resources or tools were not classified according to the three categories of research. In the following sections, we report our findings and provide illustrative examples of studies from each category.
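The category counts can be reconciled with the total of 160 articles under one reading of the text. The sketch below assumes that the three articles without a single focus are tallied separately from the per-category counts of 51, 40, and 34, and that the descriptive and methods articles number 16 each, as stated in the Methods. That reading is our assumption; the labels are ours as well.

```python
# Counts as reported; keys are our own labels.
category_counts = {
    "category_1_variant_function": 51,
    "category_2_germline_influence_on_tumor": 40,
    "category_3_phenotype_refinement": 34,
    # ASSUMPTION: these three are counted apart from the totals above.
    "categories_1_and_2_no_single_focus": 3,
    "descriptive_gene_or_gene_family": 16,
    "methods_data_sources_or_tools": 16,
}

total = sum(category_counts.values())
print(total)
```

If the three dual-category articles were instead already included in the counts for categories 1 and 2, the total would not equal 160, which is why the separate tally is assumed here.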
Category 1: use of tumor data to determine the function of germline risk variants
The majority of articles we found that incorporated germline and tumor data in their analyses focused on use of data from tumor tissue to understand the functional consequences of germline variation (51 articles), a challenge given that many of these risk alleles lie in non-coding regions of the genome. In many of the articles, investigators used data generated from local analysis of tumor samples, rather than from a database such as TCGA, ICGC, or COSMIC. Several of the articles we found described research in which tumor data was used to examine the potential impact of germline risk variants on regulatory events such as mRNA expression, transcription factor binding, or formation of transcription complexes at putative regulatory elements. Other studies in this category explored relationships between germline variants and somatic changes including CNAs, methylation patterns, telomere length, and chromosomal rearrangement.
Notably, tumor data was most commonly used to assess whether germline risk variants affected gene expression. As an example of this type of work, Fu et al. used data from tumor tissue to determine that a bladder cancer risk SNP located at 8q24.3 was associated with increased expression of PSCA in tumor samples. Another bladder cancer risk SNP at this locus appeared to disrupt nuclear protein binding to DNA (13), implying a potential regulatory effect. Similarly, Loo et al. performed a cis-eQTL analysis of genome-wide expression data and 18 colorectal cancer risk variants and found that three of these were significantly associated with expression levels of genes located within 2 Mb upstream or downstream of the variant (14). Two of these variants were associated with increased expression of the genes ATP5C1 and DLGAP5, which encode mitochondrial proteins involved in cellular metabolism and cell division; these genes were not previously known to have a role in colon carcinogenesis.
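To make the idea of a cis-eQTL test concrete, the following minimal sketch selects genes within a 2 Mb window of a variant and regresses tumor expression on risk-allele dosage. It is not the pipeline used in the cited studies: the gene names, coordinates, dosages, and expression values are hypothetical, and a real analysis would add covariates, significance testing, and multiple-testing correction.

```python
from dataclasses import dataclass

WINDOW_BP = 2_000_000  # 2 Mb cis window on either side of the variant


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site (hypothetical coordinates)


def genes_in_cis(snp_chrom, snp_pos, genes, window=WINDOW_BP):
    """Return genes on the same chromosome whose TSS lies within the window."""
    return [g for g in genes
            if g.chrom == snp_chrom and abs(g.tss - snp_pos) <= window]


def ols_slope(dosage, expression):
    """Least-squares slope of expression on risk-allele dosage (0, 1, or 2)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, expression))
    return sxy / sxx


# Hypothetical toy data, for illustration only.
genes = [Gene("GENE_A", "chr1", 1_500_000),
         Gene("GENE_B", "chr1", 5_000_000),
         Gene("GENE_C", "chr2", 1_000_000)]
cis_genes = genes_in_cis("chr1", 1_000_000, genes)

dosage = [0, 0, 1, 1, 2, 2]                  # risk-allele copies per tumor donor
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]  # normalized expression of GENE_A
slope = ols_slope(dosage, expression)

print([g.name for g in cis_genes], slope)
```

A positive slope indicates that expression rises with each additional copy of the risk allele, which is the pattern the studies above report for their associated genes.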
As an example of functional research that specifically used TCGA tumor data to further characterize germline cancer risk variants, Feng et al. mined TCGA glioblastoma data to evaluate the functional consequences of molecular alterations observed in the tumor genome at the 9p21.3 locus. This locus encodes the CDKN2A/CDKN2B tumor suppressor genes and also harbors germline polymorphisms associated with glioblastoma risk (15). This analysis found that CNAs detected in the tumor genome strongly affected expression of both CDKN2A and CDKN2B.
The articles described in this category demonstrate how integrating germline and tumor genomic data can link germline polymorphisms to events occurring in the tumor, which may help to elucidate the role of germline risk variants in tumorigenesis. This approach also has shown that genes not previously known to be involved in carcinogenesis may participate in this process.
Category 2: analysis of relationships between germline genetic variation (host genetic background) and tumor types or somatic mutations
We found 40 articles that examined the relationship between germline variants and tumor characteristics. This type of relationship is exemplified by hereditary cancer syndromes such as Lynch syndrome, in which inherited mutations in mismatch repair genes lead to the development of colorectal tumors with characteristic microsatellite instability (16). The availability of large amounts of molecular data on both germline risk variants and tumor mutations will allow investigators to explore such relationships in greater detail and discover new links between germline and somatic events involved in carcinogenesis.
Most of the articles analyzing relationships between germline variants and tumor characteristics linked mutations in one gene to tumor type, although recently investigators have begun to explore this relationship on a genome-wide basis. As examples of articles describing relationships between mutations in specific genes and tumor type, Banneau et al. (17) found that women with Cowden syndrome have germline PTEN mutations that associate with development of breast tumors with an apocrine phenotype. Similarly, Rausch et al. observed that medulloblastomas in patients with germline TP53 mutations characteristic of Li-Fraumeni syndrome often showed evidence of chromothripsis, which is characterized by massive genomic rearrangements and may indicate aggressive disease (18). Finally, Liu et al. (19) found that germline polymorphisms in the EGFR gene were associated with exon 19 microdeletions in lung tumors that are associated with response to tyrosine kinase inhibitors.
Recently, researchers have begun to explore the connection between the germline and tumor genomes on a broader scale. For example, a study by Dworkin et al. (20) described the analysis of molecular profiles of squamous cell carcinomas (SCCs) in organ transplant recipients, who tend to develop multiple tumors. They found that the molecular profiles of the SCCs occurring within a single patient were more similar than profiles of tumors taken from different patients. They concluded that the host background has a significant impact on tumor characteristics, even within a given tumor type. La Framboise et al. (21) used glioblastoma data from TCGA to examine connections between the germline and tumor genomes. They developed a method to identify preferentially amplified alleles in the tumor, hypothesizing that oncogenic germline variants with roles in tumor development would be selectively retained. Positively selected alleles included known glioma risk variants, strengthening the hypothesis that certain germline risk variants impact tumor development.
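The notion of preferential allelic amplification can be illustrated with a simple sign test. The sketch below is a generic illustration and not the method developed in the cited study: for a hypothetical set of tumors from heterozygous patients, it asks whether one germline allele is amplified more often than the 50% expected by chance, using an exact two-sided binomial test. The allele calls are invented for the example.

```python
from math import comb


def two_sided_binomial_p(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one. Integer binomial coefficients are compared to avoid
    floating-point ties.
    """
    observed = comb(n, k)
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


# Hypothetical data: which allele was amplified in each heterozygous tumor.
amplified_allele = ["risk"] * 9 + ["other"]
k = amplified_allele.count("risk")
n = len(amplified_allele)
p_value = two_sided_binomial_p(k, n)

print(k, n, p_value)
```

A small p-value in such a test would be consistent with selective retention of one allele, although it cannot by itself distinguish selection from technical or structural biases.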
Although a relationship between germline susceptibility and characteristics of the tumor that develops has been suspected for some time, the availability of data from both genomes is refining our understanding of this relationship. New information on associations between germline risk and tumor phenotype may impact risk prediction, prevention, and treatment.
Category 3: use of tumor molecular profiling data to reduce genetic heterogeneity or refine phenotypes
Although cancers are usually described by anatomical site, histological types have been recognized for many different cancers and germline risk variants have been found to associate with specific tumor subtypes. Similarly, gene expression patterns have been identified that correspond to specific cancer subtypes, as shown for breast cancer (22). The extensive molecular profiling of tumors performed by groups such as TCGA and ICGC has confirmed that most cancers can also be grouped based on commonalities at the molecular level (23–27), and, perhaps unsurprisingly, incorporating this information into association studies has revealed links between molecular subtype and germline variants. There were 34 articles in this category, some of which categorized tumors by histological characteristics, while others used molecular data, for example, estrogen receptor status for breast cancer, to define tumor subtypes.
We found several articles in which tumors were stratified by histological type for GWAS, and SNPs that associated with specific types were identified. For example, GWAS of gastric cancer have found that certain polymorphisms are more strongly associated with a particular histological type of gastric cancer [diffuse-type gastric cancer versus intestinal type (23,24,28)]. Similar observations have been made for lung cancer, for which certain SNPs associate more strongly with specific histological types (29,30). Susceptibility loci have been found that associate with risk for different histological types of ovarian cancer as well (31,32). For example, two SNPs in IL1A were associated with decreased risk for clear cell, endometrioid, and mucinous type ovarian cancer, but were not associated with risk for the more common serous type (33).
The generation of large amounts of tumor molecular data has allowed a similar approach to be used, that is, refining tumor phenotype using molecular profiling, to find germline variants that associate with risk for a specific tumor subtype. The majority of this work has been performed for breast cancer, for which data on hormone receptor status and intrinsic subtype have been available for some time. Several loci have been found that associate more strongly with estrogen receptor positive (ER+) breast cancer (34–36). Other variants, including SNPs in FGFR2 and TNRC9, were found to have a stronger association with ER− breast cancer than with ER+ tumors (37,38). Variants also have been identified that seem to associate specifically with triple negative breast cancer (ER−, PR− and HER2−), a more aggressive form of the disease. Similar work has been performed using intrinsic subtype to categorize breast cancers, and has found SNPs that associate specifically with a given subtype (39).
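A subtype-stratified association analysis of this kind can be sketched by computing a separate odds ratio for each tumor subtype against a shared control group. The example below is a simplified illustration with hypothetical carrier counts for an unnamed variant; it uses a carrier versus non-carrier 2x2 table and a Woolf (log-scale) 95% confidence interval, whereas published GWAS typically fit regression models with covariates.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_carriers, case_noncarriers,
                  ctrl_carriers, ctrl_noncarriers, z=1.96):
    """Odds ratio with a Woolf confidence interval from a 2x2 table."""
    odds_ratio = (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)
    se_log_or = sqrt(1 / case_carriers + 1 / case_noncarriers
                     + 1 / ctrl_carriers + 1 / ctrl_noncarriers)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (carriers, non-carriers), for illustration only.
controls = (100, 100)
cases_by_subtype = {
    "ER_positive": (120, 80),
    "ER_negative": (50, 50),
}

results = {subtype: odds_ratio_ci(*counts, *controls)
           for subtype, counts in cases_by_subtype.items()}

for subtype, (or_, lo, hi) in results.items():
    print(f"{subtype}: OR={or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In this toy example the variant is associated with one subtype only, an effect that would be diluted if both subtypes were pooled into a single case group.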
As profiling data becomes available for more cancer sites, it will be interesting to see whether specific germline risk alleles associate with molecular subtypes for these cancers. In addition, it is possible that reducing phenotypic heterogeneity using molecular data may decrease ‘noise’ and facilitate discovery of novel, rarer variants.
Discussion
The past decade has seen dramatic increases in both the identification of germline variants associated with cancer risk and the generation of multi-dimensional molecular profiles of tumor genomes (40). Thanks to new technologies, increased computational power, and new analytic strategies, increasingly sophisticated analyses that can incorporate data from both the tumor and germline genomes are well underway. These efforts are allowing investigators to develop a better understanding of the connection between germline risk variants and tumor biology, as well as the events that lead to increased cancer risk and other cancer-related outcomes such as progression and survival. We conducted a literature review to assess the current state of cancer research that integrates data from germline and tumor genomes, understand the types of approaches and sources of data commonly used in these efforts, and to identify research opportunities as well as challenges and resource needs.
Our initial search for articles that included data from both the tumor and germline genomes, published between 2009 and 2012, identified more than 700 manuscripts, but of these, only 160 articles satisfied our inclusion criteria. We acknowledge that limiting our search to the years 2009–2012 means that some articles were missed. However, given that the first (glioblastoma) and second (ovarian carcinoma) molecular analyses of tumor genomes by TCGA were published in 2008 and 2011, respectively (25,26), and ICGC began similar work in 2008 (7), with data release in 2010 and initial research publications in 2012 (41), we believe this timeframe nonetheless provides a fairly comprehensive overview of the field. Most of the 160 articles also focused on targeted, rather than genome-wide approaches. We expect that, as more data become available through these resources and as tools for analyses become more sophisticated and widely used, publication of articles integrating germline and tumor genomic data and using genome-wide approaches for discovery may increase accordingly.
A large number of tools and databases available for this type of research were identified, but little overlap was observed across different studies regarding the use of specific analytical tools. It is possible that the types of research questions addressed in these articles required the use of customized tools, but it also is possible that these tools are not being cataloged or disseminated in an efficient manner. One consequence of the proliferation of customized tools for analyzing these types of data is that comparability across research findings may be difficult. It may be worthwhile to evaluate and compare different tools and develop a consensus regarding best practices for tools and approaches used to analyze the complex molecular profiles of tumors and to integrate this data with results from germline association studies. Recently, the ICGC–TCGA Somatic Mutation Calling challenge was launched to accomplish similar goals for mutation calling methods (42).
In our review, we noted the sources of data used by investigators, for example, whether researchers generated molecular data from their own tissue collections or used data from a central resource such as TCGA. We found that in most cases, investigators relied on their own resources for data; we speculate that this could be due to the relatively recent availability of TCGA data, or because the investigators’ own data was more suitable for their research purposes. We also tabulated the types of cancers studied in these reports. Of research that relied on TCGA data, studies of the most mature data (i.e. data generated for glioblastoma and ovarian cancer) were the most common. However, more commonly occurring cancers, such as breast and colorectal, were the most frequent cancer types studied across all the articles we found.
We classified the 160 articles that fit our criteria into three main categories of research. The large number of studies using tumor data to determine the function of germline risk variants (category 1) was perhaps expected, given that this has been a difficult task made more challenging by the location of many cancer risk variants in non-coding regions. The availability of tumor molecular data has provided investigators with an opportunity to test whether risk variants have an impact on processes occurring in the tumor. Because trait-associated SNPs are considered likely to exert regulatory effects (43), most of these studies mined expression data to determine if a risk allele was associated with changes in gene expression in the tumor. Several such studies observed a regulatory effect and in several cases this work led to identification of genes not previously known to be involved in development of that particular type of cancer (13,14,44). Although expression studies were most common, additional data types exist in TCGA and ICGC and we expect that investigators will assess the impact of risk variants on chromatin conformation, long range interactions, and epigenetic events. Moreover, the growth of large databases and tools for mining and interpreting large scale data could enable more genome-wide analyses, which our results suggest are limited to date, and further facilitate efforts to determine function and understand interactions. Additional work will be required to understand the biological impact of changes in gene expression and the roles of newly identified genes in carcinogenesis, but this approach offers a way to begin to understand the function of risk variants and annotate the previously uncharacterized regions of the genome in which they are found.
Researchers have long suspected that the host genome may influence the tumor that develops [category 2 (45,46)]. We found 40 articles in which certain germline risk variants were associated with tumor characteristics, including histology and the presence of specific mutations. The creation of catalogs of tumor mutation data will allow genome-wide assessment of the interaction between germline variation and tumor characteristics, rather than a gene by gene approach. Better understanding of the interactions between these two genomes will help strengthen the hypothesis that some germline risk variants have a direct role in tumorigenesis. However, it also is possible that risk variants affect processes extrinsic to the tumor, such as immune surveillance; evading mechanisms for suppressing tumor growth may be another way in which germline variants affect cancer risk. It will be interesting to see what new insights into the host-tumor relationship are gleaned from studies integrating information from both genomes.
The realization that molecular profiling can more precisely delineate tumor subtypes than histological information offers additional opportunities for cancer researchers (category 3). Evidence from association studies that stratify tumors by histological types laid the groundwork for the idea that refining tumor phenotypes could lead to the identification of new or different germline variants associated with each specific cancer type (23,24,28). GWAS in which breast tumors were stratified by hormone receptor status or intrinsic subtypes identified risk variants specific for each subtype and strengthened some associations beyond those observed when all tumor subtypes were analyzed together (34–36). Although obtaining sufficient numbers of specific molecular subtypes of tumors to adequately power such studies will be challenging, reducing the heterogeneity created by analyses of multiple phenotypes may allow the identification of rarer alleles. In addition, knowing that a specific risk profile is associated with a more or less aggressive subtype of a particular cancer may lead to changes in screening and prophylaxis strategies.
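The trade-off between reduced heterogeneity and reduced sample size can be made explicit with an approximate power calculation. The sketch below uses a normal approximation for a two-sided comparison of carrier frequencies between cases and controls, ignoring the negligible contribution of the opposite tail. All frequencies and sample sizes are hypothetical and chosen only to illustrate the direction of the effects.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_case, p_control, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Assumes equal numbers of cases and controls and uses the normal
    approximation; the far rejection tail is ignored.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_case * (1 - p_case) / n_per_group
              + p_control * (1 - p_control) / n_per_group)
    z_effect = abs(p_case - p_control) / se
    return NormalDist().cdf(z_effect - z_crit)


# Hypothetical scenarios: carrier frequency in cases versus controls.
pooled = approx_power(0.28, 0.25, n_per_group=2000)           # all subtypes, diluted effect
subtype_same_effect = approx_power(0.28, 0.25, n_per_group=400)
subtype_larger_effect = approx_power(0.35, 0.25, n_per_group=400)

print(round(pooled, 2), round(subtype_same_effect, 2), round(subtype_larger_effect, 2))
```

Under these assumed numbers, restricting to a subtype lowers power when the effect size is unchanged, but can raise it when the association is concentrated in that subtype and the effect is therefore larger.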
We acknowledge several limitations to this literature analysis. It is possible that some relevant articles may have been missed, but we believe our review provides a fairly accurate overview of the types of research in the field. We also acknowledge that our categorization of the articles was subjective. The purpose of developing these categories was to provide a way to organize and understand the types of research currently underway that integrate data from the germline and tumor genomes and also identify gaps in the research and future opportunities. Some of the articles we found may contain aspects of more than one category, which could slightly alter the absolute size of each category, but we believe that the relative sizes of each category would not change significantly. Nonetheless, these classifications were helpful for developing an understanding of the field.
We noted several opportunities regarding the use of both germline and tumor data to understand cancer risk and biology. Specifically, data generated from tumor tissue is proving useful for determining the function of germline risk variants, particularly those that lie in non-coding regions. Molecular profiling to refine tumor subtypes may help identify rarer alleles and also may improve risk prediction and treatment decision-making. Our review also identified several challenges. The variety of tools and databases currently used for work in this field suggests a need to identify best practices, set standards and develop effective ways to share tools to ensure that results are comparable across studies. Analysis of associations by tumor subtypes may theoretically help to detect rarer variants. However, although power might be increased because heterogeneity has been reduced, obtaining adequate numbers of cases of a particular subtype to generate sufficient power for such analyses may be difficult. Additionally, although tumor data have shown utility for analyzing the function of risk alleles, investigators must expand such analyses to normal tissue to determine the role of germline risk variants in host processes that may occur prior to tumor initiation. Leveraging and integrating information from both the germline and tumor genomes to better understand carcinogenesis will require multi-disciplinary collaborations that include researchers in diverse fields, including genetic epidemiology, cancer biology, and bioinformatics.
Supplementary material
Supplementary Tables I–III can be found at http://carcin.oxfordjournals.org/
Funding
SAIC-Frederick, Inc., NCI-Frederick, Frederick, Maryland 21709, National Cancer Institute contract (HHSN261200800001E) to K.G., H.F. and C.H.
Acknowledgements
The authors would like to thank Sheri Schully, PhD, for contract oversight and helpful discussions.
Conflict of Interest Statement: None declared.
Glossary
Abbreviations:
- ER: estrogen receptor
- GWAS: genome-wide association studies
- ICGC: International Cancer Genome Consortium
- TCGA: The Cancer Genome Atlas
References
- 1. Green E.D., et al. (2011). Charting a course for genomic medicine from base pairs to bedside. Nature, 470, 204–213.
- 2. Hindorff L., et al. (2013). A Catalog of Published Genome-Wide Association Studies. www.genome.gov/gwastudies (5 September 2013, date last accessed).
- 3. The 1000 Genomes Project Consortium (2012). An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.
- 4. Abecasis G.R., et al. (2010). A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.
- 5. (2013). The Cancer Genome Atlas. http://cancergenome.nih.gov/ (5 September 2013, date last accessed).
- 6. (2013). International Cancer Genome Consortium. http://icgc.org/ (5 September 2013, date last accessed).
- 7. The International Cancer Genome Consortium (2010). International network of cancer genome projects. Nature, 464, 993–998.
- 8. Chin L., et al. (2011). Making sense of cancer genomic data. Genes Dev., 25, 534–555.
- 9. Lawrence M.S., et al. (2013). Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature, 499, 214–218.
- 10. Freedman M.L., et al. (2011). Principles for the post-GWAS functional characterization of cancer risk loci. Nat. Genet., 43, 513–518.
- 11. Sara H., et al. (2010). A decade of cancer gene profiling: from molecular portraits to molecular function. Methods Mol. Biol., 576, 61–87.
- 12. Garcia-Closas M., et al. (2013). Genome-wide association studies identify four ER negative-specific breast cancer risk loci. Nat. Genet., 45, 392–398, 398e1.
- 13. Fu Y.P., et al. (2012). Common genetic variants in the PSCA gene influence gene expression and bladder cancer risk. Proc. Natl. Acad. Sci. U. S. A., 109, 4974–4979.
- 14. Loo L.W., et al. (2012). cis-Expression QTL analysis of established colorectal cancer risk variants in colon tumors and adjacent normal tissue. PLoS ONE, 7, e30477.
- 15. Feng J., et al. (2012). An integrated analysis of germline and somatic, genetic and epigenetic alterations at 9p21.3 in glioblastoma. Cancer, 118, 232–240.
- 16. Al-Sohaily S., et al. (2012). Molecular pathways in colorectal cancer. J. Gastroenterol. Hepatol., 27, 1423–1431.
- 17. Banneau G., et al. (2010). Molecular apocrine differentiation is a common feature of breast cancer in patients with germline PTEN mutations. Breast Cancer Res., 12, R63.
- 18. Rausch T., et al. (2012). Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell, 148, 59–71.
- 19. Liu W., et al. (2011). Functional EGFR germline polymorphisms may confer risk for EGFR somatic mutations in non-small cell lung cancer, with a predominant effect on exon 19 microdeletions. Cancer Res., 71, 2423–2427.
- 20. Dworkin A.M., et al. (2010). Germline variation controls the architecture of somatic alterations in tumors. PLoS Genet., 6, e1001136.
- 21. LaFramboise T., et al. (2010). Allelic selection of amplicons in glioblastoma revealed by combining somatic and germline analysis. PLoS Genet., 6, e1001086.
- 22. Sorlie T., et al. (2003). Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc. Natl. Acad. Sci. U. S. A., 100, 8418–8423.
- 23. Ju H., et al. (2009). A regulatory polymorphism at position-309 in PTPRCAP is associated with susceptibility to diffuse-type gastric cancer and gene expression. Neoplasia, 11, 1340–1347.
- 24. Ju H., et al. (2010). SERPINE1 intron polymorphisms affecting gene expression are associated with diffuse-type gastric cancer susceptibility. Cancer, 116, 4248–4255.
- 25. Cancer Genome Atlas Research Network (2011). Integrated genomic analyses of ovarian carcinoma. Nature, 474, 609–615.
- 26. Cancer Genome Atlas Research Network (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455, 1061–1068.
- 27. Cancer Genome Atlas Network (2012). Comprehensive molecular portraits of human breast tumours. Nature, 490, 61–70.
- 28. Lim B., et al. (2011). Increased genetic susceptibility to intestinal-type gastric cancer is associated with increased activity of the RUNX3 distal promoter. Cancer, 117, 5161–5171.
- 29. Kazma R., et al. (2012). Lung cancer and DNA repair genes: multilevel association analysis from the International Lung Cancer Consortium. Carcinogenesis, 33, 1059–1064.
- 30. Yang Y.L., et al. (2011). IKZF1 deletions predict a poor prognosis in children with B-cell progenitor acute lymphoblastic leukemia: a multicenter analysis in Taiwan. Cancer Sci., 102, 1874–1881.
- 31. Bolton K.L., et al. (2012). Role of common genetic variants in ovarian cancer susceptibility and outcome: progress to date from the Ovarian Cancer Association Consortium (OCAC). J. Intern. Med., 271, 366–378.
- 32. Song H., et al. (2009). A genome-wide association study identifies a new ovarian cancer susceptibility locus on 9p22.2. Nat. Genet., 41, 996–1000.
- 33. White K.L., et al. (2012). Ovarian cancer risk associated with inherited inflammation-related variants. Cancer Res., 72, 1064–1069.
- 34. Warren H., et al. (2012). 9q31.2-rs865686 as a susceptibility locus for estrogen receptor-positive breast cancer: evidence from the Breast Cancer Association Consortium. Cancer Epidemiol. Biomarkers Prev., 21, 1783–1791.
- 35. Lambrechts D., et al. (2012). 11q13 is a susceptibility locus for hormone receptor positive breast cancer. Hum. Mutat., 33, 1123–1132.
- 36. Figueroa J.D., et al. (2011). Associations of common variants at 1p11.2 and 14q24.1 (RAD51L1) with breast cancer risk and heterogeneity by tumor subtype: findings from the Breast Cancer Association Consortium. Hum. Mol. Genet., 20, 4693–4706.
- 37. Reeves G.K., et al. (2010). Incidence of breast cancer and its subtypes in relation to individual and multiple low-penetrance genetic susceptibility loci. JAMA, 304, 426–434.
- 38. Li J., et al. (2010). A genome-wide association scan on estrogen receptor-negative breast cancer. Breast Cancer Res., 12, R93.
- 39. Han W., et al. (2011). Common genetic variants associated with breast cancer in Korean women and differential susceptibility according to intrinsic subtype. Cancer Epidemiol. Biomarkers Prev., 20, 793–798.
- 40. Rahman N. (2014). Realizing the promise of cancer predisposition genes. Nature, 505, 302–308.
- 41. Jones D.T., et al. (2012). Dissecting the genomic complexity underlying medulloblastoma. Nature, 488, 100–105.
- 42. Boutros P.C., et al. (2014). Global optimization of somatic variant identification in cancer genomes with a global community challenge. Nat. Genet., 46, 318–319.
- 43. Nicolae D.L., et al. (2010). Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet., 6, e1000888.
- 44. Grisanzio C., et al. (2012). Genetic and functional analyses implicate the NUDT11, HNF1B, and SLC22A3 genes in prostate cancer pathogenesis. Proc. Natl. Acad. Sci. U. S. A., 109, 11252–11257.
- 45. Kilpivaara O., et al. (2009). A germline JAK2 SNP is associated with predisposition to the development of JAK2(V617F)-positive myeloproliferative neoplasms. Nat. Genet., 41, 455–459.
- 46. Landi M.T., et al. (2006). MC1R germline variants confer risk for BRAF-mutant melanoma. Science, 313, 521–522.