Pan-cancer analysis of whole genomes

The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium

doi:10.1038/s41586-020-1969-6

2020 Feb 5;578(7793):82–93. doi: 10.1038/s41586-020-1969-6

¹Applied Tumor Genomics Research Program, Research Programs Unit, University of Helsinki, Helsinki, Finland

²Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK

³Memorial Sloan Kettering Cancer Center, New York, NY USA

⁴Genome Science Division, Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan

⁵Department of Surgery, University of Chicago, Chicago, IL USA

⁶Department of Surgery, Division of Hepatobiliary and Pancreatic Surgery, School of Medicine, Keimyung University Dongsan Medical Center, Daegu, South Korea

⁷Department of Oncology, Gil Medical Center, Gachon University, Incheon, South Korea

⁸Hiroshima University, Hiroshima, Japan

⁹Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA

¹⁰University of Texas MD Anderson Cancer Center, Houston, TX USA

¹¹King Faisal Specialist Hospital and Research Centre, Al Maather, Riyadh, Saudi Arabia

¹²Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

¹³Bioinformatics Core Facility, University Medical Center Hamburg, Hamburg, Germany

¹⁴Heinrich Pette Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany

¹⁵Ontario Tumour Bank, Ontario Institute for Cancer Research, Toronto, ON Canada

¹⁶Department of Pathology, The University of Texas MD Anderson Cancer Center, Houston, TX USA

¹⁷Laboratory of Pathology, Center for Cancer Research, National Cancer Institute, Bethesda, MD USA

¹⁸Department of Cellular and Molecular Medicine and Department of Bioengineering, University of California San Diego, La Jolla, CA USA

¹⁹UC San Diego Moores Cancer Center, San Diego, CA USA

²⁰Canada’s Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC Canada

²¹Sir Peter MacCallum Department of Oncology, Peter MacCallum Cancer Centre, University of Melbourne, Melbourne, VIC Australia

²²Centre for Research in Molecular Medicine and Chronic Diseases (CiMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain

²³Department of Zoology, Genetics and Physical Anthropology (CiMUS), Universidade de Santiago de Compostela, Santiago de Compostela, Spain

²⁴The Biomedical Research Centre (CINBIO), Universidade de Vigo, Vigo, Spain

²⁵Royal National Orthopaedic Hospital - Bolsover, London, UK

²⁶Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX USA

²⁷Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, TX USA

²⁸The Jackson Laboratory for Genomic Medicine, Farmington, CT USA

²⁹Genome Informatics Program, Ontario Institute for Cancer Research, Toronto, ON Canada

³⁰Institute of Human Genetics, Christian-Albrechts-University, Kiel, Germany

³¹Institute of Human Genetics, Ulm University and Ulm University Medical Center, Ulm, Germany

³²Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, St. Lucia, Brisbane, QLD Australia

³³Salford Royal NHS Foundation Trust, Salford, UK

³⁴Department of Surgery, Pancreas Institute, University and Hospital Trust of Verona, Verona, Italy

³⁵Molecular and Medical Genetics, OHSU Knight Cancer Institute, Oregon Health and Science University, Portland, OR USA

³⁶Department of Molecular Oncology, BC Cancer Research Centre, Vancouver, BC Canada

³⁷The McDonnell Genome Institute at Washington University, St. Louis, MO USA

³⁸University College London, London, UK

³⁹Division of Cancer Genomics, National Cancer Center Research Institute, National Cancer Center, Tokyo, Japan

⁴⁰DLR Project Management Agency, Bonn, Germany

⁴¹Tokyo Women’s Medical University, Tokyo, Japan

⁴²Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY USA

⁴³Los Alamos National Laboratory, Los Alamos, NM USA

⁴⁴Department of Pathology, University Health Network, Toronto General Hospital, Toronto, ON Canada

⁴⁵Nottingham University Hospitals NHS Trust, Nottingham, UK

⁴⁶Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁴⁷Computational Biology Program, Ontario Institute for Cancer Research, Toronto, ON Canada

⁴⁸Department of Molecular Genetics, University of Toronto, Toronto, ON Canada

⁴⁹Vector Institute, Toronto, ON Canada

⁵⁰Hematopathology Section, Institute of Pathology, Christian-Albrechts-University, Kiel, Germany

⁵¹Department of Pathology and Laboratory Medicine, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

⁵²Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, Norway

⁵³Pathology, Hospital Clinic, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain

⁵⁴Department of Veterinary Medicine, Transmissible Cancer Group, University of Cambridge, Cambridge, UK

⁵⁵Alvin J. Siteman Cancer Center, Washington University School of Medicine, St. Louis, MO USA

⁵⁶Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Glasgow, UK

⁵⁷Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

⁵⁸Broad Institute of MIT and Harvard, Cambridge, MA USA

⁵⁹Dana-Farber/Boston Children’s Cancer and Blood Disorders Center, Boston, MA USA

⁶⁰Department of Pediatrics, Harvard Medical School, Boston, MA USA

⁶¹Leeds Institute of Medical Research @ St. James’s, University of Leeds, St. James’s University Hospital, Leeds, UK

⁶²Department of Pathology and Diagnostics, University and Hospital Trust of Verona, Verona, Italy

⁶³Department of Surgery, Princess Alexandra Hospital, Brisbane, QLD Australia

⁶⁴Surgical Oncology Group, Diamantina Institute, University of Queensland, Brisbane, QLD Australia

⁶⁵Department of Population and Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH USA

⁶⁶Research Health Analytics and Informatics, University Hospitals Cleveland Medical Center, Cleveland, OH USA

⁶⁷Gloucester Royal Hospital, Gloucester, UK

⁶⁸European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK

⁶⁹Diagnostic Development, Ontario Institute for Cancer Research, Toronto, ON Canada

⁷⁰Barcelona Supercomputing Center (BSC), Barcelona, Spain

⁷¹Arnie Charbonneau Cancer Institute, University of Calgary, Calgary, AB Canada

⁷²Departments of Surgery and Oncology, University of Calgary, Calgary, AB Canada

⁷³Department of Pathology, Oslo University Hospital, The Norwegian Radium Hospital, Oslo, Norway

⁷⁴PanCuRx Translational Research Initiative, Ontario Institute for Cancer Research, Toronto, ON Canada

⁷⁵Department of Oncology, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University School of Medicine, Baltimore, MD USA

⁷⁶University Hospital Southampton NHS Foundation Trust, Southampton, UK

⁷⁷Royal Stoke University Hospital, Stoke-on-Trent, UK

⁷⁸Genome Sequence Informatics, Ontario Institute for Cancer Research, Toronto, ON Canada

⁷⁹Human Longevity Inc, San Diego, CA USA

⁸⁰Olivia Newton-John Cancer Research Institute, La Trobe University, Heidelberg, VIC Australia

⁸¹Computer Network Information Center, Chinese Academy of Sciences, Beijing, China

⁸²Genome Canada, Ottawa, ON Canada

⁸³CNAG-CRG, Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Barcelona, Spain

⁸⁴Universitat Pompeu Fabra (UPF), Barcelona, Spain

⁸⁵Buck Institute for Research on Aging, Novato, CA USA

⁸⁶Duke University Medical Center, Durham, NC USA

⁸⁷Department of Human Genetics, Hannover Medical School, Hannover, Germany

⁸⁸Center for Bioinformatics and Functional Genomics, Cedars-Sinai Medical Center, Los Angeles, CA USA

⁸⁹Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA USA

⁹⁰The Hebrew University Faculty of Medicine, Jerusalem, Israel

⁹¹Barts Cancer Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, UK

⁹²Department of Computer Science, Bioinformatics Group, University of Leipzig, Leipzig, Germany

⁹³Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany

⁹⁴Transcriptome Bioinformatics, LIFE Research Center for Civilization Diseases, University of Leipzig, Leipzig, Germany

⁹⁵Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA USA

⁹⁶Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA USA

⁹⁷Harvard Medical School, Boston, MA USA

⁹⁸USC Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, CA USA

⁹⁹Department of Diagnostics and Public Health, University and Hospital Trust of Verona, Verona, Italy

¹⁰⁰Department of Mathematics, Aarhus University, Aarhus, Denmark

¹⁰¹Department of Molecular Medicine (MOMA), Aarhus University Hospital, Aarhus N, Denmark

¹⁰²Instituto Carlos Slim de la Salud, Mexico City, Mexico

¹⁰³Department of Medical Biophysics, University of Toronto, Toronto, ON Canada

¹⁰⁴Cancer Division, Garvan Institute of Medical Research, Kinghorn Cancer Centre, University of New South Wales (UNSW Sydney), Sydney, NSW Australia

¹⁰⁵South Western Sydney Clinical School, Faculty of Medicine, University of New South Wales (UNSW Sydney), Liverpool, NSW Australia

¹⁰⁶West of Scotland Pancreatic Unit, Glasgow Royal Infirmary, Glasgow, UK

¹⁰⁷Center for Digital Health, Berlin Institute of Health and Charité - Universitätsmedizin Berlin, Berlin, Germany

¹⁰⁸Heidelberg Center for Personalized Oncology (DKFZ-HIPO), German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁰⁹The Preston Robert Tisch Brain Tumor Center, Duke University Medical Center, Durham, NC USA

¹¹⁰Massachusetts General Hospital, Boston, MA USA

¹¹¹National Institute of Biomedical Genomics, Kalyani, West Bengal India

¹¹²Institute of Clinical Medicine and Institute of Oral Biology, University of Oslo, Oslo, Norway

¹¹³University of North Carolina at Chapel Hill, Chapel Hill, NC USA

¹¹⁴ARC-Net Centre for Applied Research on Cancer, University and Hospital Trust of Verona, Verona, Italy

¹¹⁵The Institute of Cancer Research, London, UK

¹¹⁶Centre for Computational Biology, Duke-NUS Medical School, Singapore, Singapore

¹¹⁷Programme in Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore, Singapore

¹¹⁸Division of Oncology and Pathology, Department of Clinical Sciences Lund, Lund University, Lund, Sweden

¹¹⁹Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich-Heine-University, Düsseldorf, Germany

¹²⁰Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan

¹²¹RIKEN Center for Integrative Medical Sciences, Yokohama, Japan

¹²²Department of Internal Medicine/Hematology, Friedrich-Ebert-Hospital, Neumünster, Germany

¹²³Departments of Dermatology and Pathology, Yale University, New Haven, CT USA

¹²⁴Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain

¹²⁵Radcliffe Department of Medicine, University of Oxford, Oxford, UK

¹²⁶Canadian Center for Computational Genomics, McGill University, Montreal, QC Canada

¹²⁷Department of Human Genetics, McGill University, Montreal, QC Canada

¹²⁸Department of Human Genetics, University of California Los Angeles, Los Angeles, CA USA

¹²⁹Department of Pharmacology, University of Toronto, Toronto, ON Canada

¹³⁰Faculty of Medicine and Health Technology, Tampere University and Tays Cancer Center, Tampere University Hospital, Tampere, Finland

¹³¹Haematology, Leeds Teaching Hospitals NHS Trust, Leeds, UK

¹³²Translational Research and Innovation, Centre Léon Bérard, Lyon, France

¹³³Fox Chase Cancer Center, Philadelphia, PA USA

¹³⁴International Agency for Research on Cancer, World Health Organization, Lyon, France

¹³⁵Earlham Institute, Norwich, UK

¹³⁶Norwich Medical School, University of East Anglia, Norwich, UK

¹³⁷Department of Molecular Biology, Faculty of Science, Radboud Institute for Molecular Life Sciences, Radboud University, Nijmegen, The Netherlands

¹³⁸CRUK Manchester Institute and Centre, Manchester, UK

¹³⁹Department of Radiation Oncology, University of Toronto, Toronto, ON Canada

¹⁴⁰Division of Cancer Sciences, Manchester Cancer Research Centre, University of Manchester, Manchester, UK

¹⁴¹Radiation Medicine Program, Princess Margaret Cancer Centre, Toronto, ON Canada

¹⁴²Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA USA

¹⁴³Department of Surgery, Division of Thoracic Surgery, The Johns Hopkins University School of Medicine, Baltimore, MD USA

¹⁴⁴Division of Molecular Pathology, The Netherlands Cancer Institute, Oncode Institute, Amsterdam, The Netherlands

¹⁴⁵Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA USA

¹⁴⁶UC Santa Cruz Genomics Institute, University of California Santa Cruz, Santa Cruz, CA USA

¹⁴⁷Division of Applied Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁴⁸German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁴⁹National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg, Germany

¹⁵⁰Center for Biological Sequence Analysis, Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark

¹⁵¹Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark

¹⁵²Institute for Molecular Bioscience, University of Queensland, St. Lucia, Brisbane, QLD Australia

¹⁵³Biomedical Engineering, Oregon Health and Science University, Portland, OR USA

¹⁵⁴Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany

¹⁵⁵Institute of Pharmacy and Molecular Biotechnology and BioQuant, Heidelberg University, Heidelberg, Germany

¹⁵⁶Federal Ministry of Education and Research, Berlin, Germany

¹⁵⁷Melanoma Institute Australia, University of Sydney, Sydney, NSW Australia

¹⁵⁸Pediatric Hematology and Oncology, University Hospital Muenster, Muenster, Germany

¹⁵⁹Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD USA

¹⁶⁰McKusick-Nathans Institute of Genetic Medicine, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University School of Medicine, Baltimore, MD USA

¹⁶¹Foundation Medicine, Inc, Cambridge, MA USA

¹⁶²Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA USA

¹⁶³Department of Genetics, Stanford University School of Medicine, Stanford, CA USA

¹⁶⁴Bakar Computational Health Sciences Institute and Department of Pediatrics, University of California, San Francisco, CA USA

¹⁶⁵Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, Oslo, Norway

¹⁶⁶National Cancer Institute, National Institutes of Health, Bethesda, MD USA

¹⁶⁷Royal Marsden NHS Foundation Trust, London and Sutton, UK

¹⁶⁸Genome Biology Unit, European Molecular Biology Laboratory (EMBL), Heidelberg, Germany

¹⁶⁹Department of Oncology, University of Cambridge, Cambridge, UK

¹⁷⁰Li Ka Shing Centre, Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK

¹⁷¹Institut Gustave Roussy, Villejuif, France

¹⁷²Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK

¹⁷³Department of Haematology, University of Cambridge, Cambridge, UK

¹⁷⁴Anatomia Patológica, Hospital Clinic, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain

¹⁷⁵Spanish Ministry of Science and Innovation, Madrid, Spain

¹⁷⁶University of Michigan Comprehensive Cancer Center, Ann Arbor, MI USA

¹⁷⁷Department for BioMedical Research, University of Bern, Bern, Switzerland

¹⁷⁸Department of Medical Oncology, Inselspital, University Hospital and University of Bern, Bern, Switzerland

¹⁷⁹Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern, Switzerland

¹⁸⁰University of Pavia, Pavia, Italy

¹⁸¹University of Alabama at Birmingham, Birmingham, AL USA

¹⁸²UHN Program in BioSpecimen Sciences, Toronto General Hospital, Toronto, ON Canada

¹⁸³Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY USA

¹⁸⁴Centre for Law and Genetics, University of Tasmania, Sandy Bay Campus, Hobart, TAS Australia

¹⁸⁵Faculty of Biosciences, Heidelberg University, Heidelberg, Germany

¹⁸⁶Department of Biochemistry, Microbiology and Immunology, Faculty of Medicine, University of Ottawa, Ottawa, ON Canada

¹⁸⁷Division of Anatomic Pathology, Mayo Clinic, Rochester, MN USA

¹⁸⁸Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA

¹⁸⁹Illawarra Shoalhaven Local Health District L3 Illawarra Cancer Care Centre, Wollongong Hospital, Wollongong, NSW Australia

¹⁹⁰BioForA, French National Institute for Agriculture, Food, and Environment (INRAE), ONF, Orléans, France

¹⁹¹Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD USA

¹⁹²University of California San Diego, San Diego, CA USA

¹⁹³Division of Experimental Pathology, Mayo Clinic, Rochester, MN USA

¹⁹⁴Centre for Cancer Research, The Westmead Institute for Medical Research, University of Sydney, Sydney, NSW Australia

¹⁹⁵Department of Gynaecological Oncology, Westmead Hospital, Sydney, NSW Australia

¹⁹⁶PDXen Biosystems Inc, Seoul, South Korea

¹⁹⁷Korea Advanced Institute of Science and Technology, Daejeon, South Korea

¹⁹⁸Electronics and Telecommunications Research Institute, Daejeon, South Korea

¹⁹⁹Institut National du Cancer (INCA), Boulogne-Billancourt, France

²⁰⁰Department of Genetics, Informatics Institute, University of Alabama at Birmingham, Birmingham, AL USA

²⁰¹Division of Medical Oncology, National Cancer Centre, Singapore, Singapore

²⁰²Medical Oncology, University and Hospital Trust of Verona, Verona, Italy

²⁰³Department of Pediatrics, University Hospital Schleswig-Holstein, Kiel, Germany

²⁰⁴Hepatobiliary/Pancreatic Surgical Oncology Program, University Health Network, Toronto, ON Canada

²⁰⁵School of Biological Sciences, University of Auckland, Auckland, New Zealand

²⁰⁶Department of Surgery, University of Melbourne, Parkville, VIC Australia

²⁰⁷The Murdoch Children’s Research Institute, Royal Children’s Hospital, Parkville, VIC Australia

²⁰⁸Walter and Eliza Hall Institute, Parkville, VIC Australia

²⁰⁹Vancouver Prostate Centre, Vancouver, Canada

²¹⁰Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, ON Canada

²¹¹University of East Anglia, Norwich, UK

²¹²Norfolk and Norwich University Hospital NHS Trust, Norwich, UK

²¹³Victorian Institute of Forensic Medicine, Southbank, VIC Australia

²¹⁴Department of Biomedical Informatics, Harvard Medical School, Boston, MA USA

²¹⁵Department of Chemistry, Centre for Molecular Science Informatics, University of Cambridge, Cambridge, UK

²¹⁶Ludwig Center at Harvard Medical School, Boston, MA USA

²¹⁷Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX USA

²¹⁸Peter MacCallum Cancer Centre, University of Melbourne, Melbourne, VIC Australia

²¹⁹Physics Division, Optimization and Systems Biology Lab, Massachusetts General Hospital, Boston, MA USA

²²⁰Department of Medicine, Baylor College of Medicine, Houston, TX USA

²²¹University of Cologne, Cologne, Germany

²²²International Genomics Consortium, Phoenix, AZ USA

²²³Genomics Research Program, Ontario Institute for Cancer Research, Toronto, ON Canada

²²⁴Barking Havering and Redbridge University Hospitals NHS Trust, Romford, UK

²²⁵Children’s Hospital at Westmead, University of Sydney, Sydney, NSW Australia

²²⁶Department of Medicine, Section of Endocrinology, University and Hospital Trust of Verona, Verona, Italy

²²⁷Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY USA

²²⁸Department of Biology, ETH Zurich, Zürich, Switzerland

²²⁹Department of Computer Science, ETH Zurich, Zurich, Switzerland

²³⁰SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland

²³¹Weill Cornell Medical College, New York, NY USA

²³²Academic Department of Medical Genetics, University of Cambridge, Addenbrooke’s Hospital, Cambridge, UK

²³³MRC Cancer Unit, University of Cambridge, Cambridge, UK

²³⁴Departments of Pediatrics and Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

²³⁵Seven Bridges Genomics, Charlestown, MA USA

²³⁶Annai Systems, Inc, Carlsbad, CA USA

²³⁷Department of Pathology, General Hospital of Treviso, Department of Medicine, University of Padua, Treviso, Italy

²³⁸Department of Computational Biology, University of Lausanne, Lausanne, Switzerland

²³⁹Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland

²⁴⁰Swiss Institute of Bioinformatics, University of Geneva, Geneva, Switzerland

²⁴¹The Francis Crick Institute, London, UK

²⁴²University of Leuven, Leuven, Belgium

²⁴³Institute of Medical Genetics and Applied Genomics, University of Tübingen, Tübingen, Germany

²⁴⁴Computational and Systems Biology, Genome Institute of Singapore, Singapore, Singapore

²⁴⁵School of Computing, National University of Singapore, Singapore, Singapore

²⁴⁶Big Data Institute, Li Ka Shing Centre, University of Oxford, Oxford, UK

²⁴⁷Biomedical Data Science Laboratory, Francis Crick Institute, London, UK

²⁴⁸Bioinformatics Group, Department of Computer Science, University College London, London, UK

²⁴⁹The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON Canada

²⁵⁰Breast Cancer Translational Research Laboratory JC Heuson, Institut Jules Bordet, Brussels, Belgium

²⁵¹Department of Oncology, Laboratory for Translational Breast Cancer Research, KU Leuven, Leuven, Belgium

²⁵²Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona, Spain

²⁵³Research Program on Biomedical Informatics, Universitat Pompeu Fabra, Barcelona, Spain

²⁵⁴Division of Medical Oncology, Princess Margaret Cancer Centre, Toronto, ON Canada

²⁵⁵Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY USA

²⁵⁶Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY USA

²⁵⁷Department of Pathology, UPMC Shadyside, Pittsburgh, PA USA

²⁵⁸Independent Consultant, Wellesley, USA

²⁵⁹Department of Cell and Molecular Biology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden

²⁶⁰Department of Medicine and Department of Genetics, Washington University School of Medicine, St. Louis, MO USA

²⁶¹Hefei University of Technology, Anhui, China

²⁶²Translational Cancer Research Unit, GZA Hospitals St.-Augustinus, Center for Oncological Research, Faculty of Medicine and Health Sciences, University of Antwerp, Antwerp, Belgium

²⁶³Simon Fraser University, Burnaby, BC Canada

²⁶⁴University of Pennsylvania, Philadelphia, PA USA

²⁶⁵Faculty of Science and Technology, University of Vic - Central University of Catalonia (UVic-UCC), Vic, Spain

²⁶⁶The Wellcome Trust, London, UK

²⁶⁷The Hospital for Sick Children, Toronto, ON Canada

²⁶⁸Department of Pathology, Queen Elizabeth University Hospital, Glasgow, UK

²⁶⁹Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, Brisbane, QLD Australia

²⁷⁰Department of Oncology, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK

²⁷¹Department of Public Health and Primary Care, Centre for Cancer Genetic Epidemiology, University of Cambridge, Cambridge, UK

²⁷²Prostate Cancer Canada, Toronto, ON Canada

²⁷³University of Cambridge, Cambridge, UK

²⁷⁴Department of Laboratory Medicine, Translational Cancer Research, Lund University Cancer Center at Medicon Village, Lund University, Lund, Sweden

²⁷⁵Heidelberg University, Heidelberg, Germany

²⁷⁶New BIH Digital Health Center, Berlin Institute of Health (BIH) and Charité - Universitätsmedizin Berlin, Berlin, Germany

²⁷⁷CIBER Epidemiología y Salud Pública (CIBERESP), Madrid, Spain

²⁷⁸Research Group on Statistics, Econometrics and Health (GRECS), UdG, Barcelona, Spain

²⁷⁹Quantitative Genomics Laboratories (qGenomics), Barcelona, Spain

²⁸⁰Icelandic Cancer Registry, Icelandic Cancer Society, Reykjavik, Iceland

²⁸¹State Key Laboratory of Cancer Biology, and Xijing Hospital of Digestive Diseases, Fourth Military Medical University, Shaanxi, China

²⁸²Department of Medicine (DIMED), Surgical Pathology Unit, University of Padua, Padua, Italy

²⁸³Rigshospitalet, Copenhagen, Denmark

²⁸⁴Center for Cancer Genomics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA

²⁸⁵Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, QC Canada

²⁸⁶Australian Institute of Tropical Health and Medicine, James Cook University, Douglas, QLD Australia

²⁸⁷Department of Neuro-Oncology, Istituto Neurologico Besta, Milano, Italy

²⁸⁸Bioplatforms Australia, North Ryde, NSW Australia

²⁸⁹Department of Pathology (Research), University College London Cancer Institute, London, UK

²⁹⁰Department of Surgical Oncology, Princess Margaret Cancer Centre, Toronto, ON Canada

²⁹¹Department of Medical Oncology, Josephine Nefkens Institute and Cancer Genomics Centre, Erasmus Medical Center, Rotterdam, The Netherlands

²⁹²The University of Queensland Thoracic Research Centre, The Prince Charles Hospital, Brisbane, QLD Australia

²⁹³CIBIO/InBIO - Research Center in Biodiversity and Genetic Resources, Universidade do Porto, Vairão, Portugal

²⁹⁴HCA Laboratories, London, UK

²⁹⁵University of Liverpool, Liverpool, UK

²⁹⁶The Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel

²⁹⁷Department of Neurosurgery, University of Florida, Gainesville, FL USA

²⁹⁸Department of Pathology, Graduate School of Medicine, University of Tokyo, Tokyo, Japan

²⁹⁹University of Milano Bicocca, Monza, Italy

³⁰⁰BGI-Shenzhen, Shenzhen, China

³⁰¹Department of Pathology, Oslo University Hospital Ulleval, Oslo, Norway

³⁰²Center for Biomedical Informatics, Harvard Medical School, Boston, MA USA

³⁰³Department Biochemistry and Molecular Biomedicine, University of Barcelona, Barcelona, Spain

³⁰⁴Office of Cancer Genomics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA

³⁰⁵Cancer Epigenomics, German Cancer Research Center (DKFZ), Heidelberg, Germany

³⁰⁶Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA

³⁰⁷Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX USA

³⁰⁸Department of Computer Science, Yale University, New Haven, CT USA

³⁰⁹Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT USA

³¹⁰Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT USA

³¹¹Center for Cancer Research, Massachusetts General Hospital, Boston, MA USA

³¹²Department of Pathology, Massachusetts General Hospital, Boston, MA USA

³¹³Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY USA

³¹⁴Division of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN USA

³¹⁵University of Sydney, Sydney, NSW Australia

³¹⁶University of Oxford, Oxford, UK

³¹⁷Department of Surgery, Academic Urology Group, University of Cambridge, Cambridge, UK

³¹⁸Department of Medicine II, University of Würzburg, Würzburg, Germany

³¹⁹Sylvester Comprehensive Cancer Center, University of Miami, Miami, FL USA

³²⁰Institut Hospital del Mar d’Investigacions Mèdiques (IMIM), Barcelona, Spain

³²¹Genome Integrity and Structural Biology Laboratory, National Institute of Environmental Health Sciences (NIEHS), Durham, NC USA

³²²St. Thomas’s Hospital, London, UK

³²³Osaka International Cancer Center, Osaka, Japan

³²⁴Department of Pathology, Skåne University Hospital, Lund University, Lund, Sweden

³²⁵Department of Medical Oncology, Beatson West of Scotland Cancer Centre, Glasgow, UK

³²⁶National Human Genome Research Institute, National Institutes of Health, Bethesda, MD USA

³²⁷Centre for Cancer Research, Victorian Comprehensive Cancer Centre, University of Melbourne, Melbourne, VIC Australia

³²⁸Department of Medicine, Section of Hematology/Oncology, University of Chicago, Chicago, IL USA

³²⁹German Center for Infection Research (DZIF), Partner Site Hamburg-Borstel-Lübeck-Riems, Hamburg, Germany

³³⁰Bioinformatics Research Centre (BiRC), Aarhus University, Aarhus, Denmark

³³¹Department of Biotechnology, Ministry of Science and Technology, Government of India, New Delhi, Delhi India

³³²National Cancer Centre Singapore, Singapore, Singapore

³³³Brandeis University, Waltham, MA USA

³³⁴Department of Urologic Sciences, University of British Columbia, Vancouver, BC Canada

³³⁵Department of Internal Medicine, Stanford University, Stanford, CA USA

³³⁶The University of Texas Health Science Center at Houston, Houston, TX USA

³³⁷Imperial College NHS Trust, Imperial College, London, UK

³³⁸Senckenberg Institute of Pathology, University of Frankfurt Medical School, Frankfurt, Germany

³³⁹Department of Medicine, Division of Biomedical Informatics, UC San Diego School of Medicine, San Diego, CA USA

³⁴⁰Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center, Houston, TX USA

³⁴¹Oxford Nanopore Technologies, New York, NY USA

³⁴²Institute of Medical Science, University of Tokyo, Tokyo, Japan

³⁴³Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA USA

³⁴⁴Wakayama Medical University, Wakayama, Japan

³⁴⁵Department of Internal Medicine, Division of Medical Oncology, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

³⁴⁶University of Tennessee Health Science Center for Cancer Research, Memphis, TN USA

³⁴⁷Department of Histopathology, Salford Royal NHS Foundation Trust, Salford, UK

³⁴⁸Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK

³⁴⁹BIOPIC, ICG and College of Life Sciences, Peking University, Beijing, China

³⁵⁰Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, China

³⁵¹Children’s Hospital of Philadelphia, Philadelphia, PA USA

³⁵²Department of Bioinformatics and Computational Biology and Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA

³⁵³Karolinska Institute, Stockholm, Sweden

³⁵⁴The Donnelly Centre, University of Toronto, Toronto, ON Canada

³⁵⁵Department of Medical Genetics, College of Medicine, Hallym University, Chuncheon, South Korea

³⁵⁶Department of Experimental and Health Sciences, Institute of Evolutionary Biology (UPF-CSIC), Universitat Pompeu Fabra, Barcelona, Spain

³⁵⁷Health Data Science Unit, University Clinics, Heidelberg, Germany

³⁵⁸Massachusetts General Hospital Center for Cancer Research, Charlestown, MA USA

³⁵⁹Hokkaido University, Sapporo, Japan

³⁶⁰Department of Pathology and Clinical Laboratory, National Cancer Center Hospital, Tokyo, Japan

³⁶¹Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

³⁶²Computational Biology, Leibniz Institute on Aging - Fritz Lipmann Institute (FLI), Jena, Germany

³⁶³University of Melbourne Centre for Cancer Research, Melbourne, VIC Australia

³⁶⁴University of Nebraska Medical Center, Omaha, NE USA

³⁶⁵Syntekabio Inc, Daejeon, South Korea

³⁶⁶Department of Pathology, Academic Medical Center, Amsterdam, The Netherlands

³⁶⁷China National GeneBank-Shenzhen, Shenzhen, China

³⁶⁸Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany

³⁶⁹Division of Life Science and Applied Genomics Center, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China

³⁷⁰Icahn School of Medicine at Mount Sinai, New York, NY USA

³⁷¹Geneplus-Shenzhen, Shenzhen, China

³⁷²School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China

³⁷³AbbVie, North Chicago, IL USA

³⁷⁴Institute of Pathology, Charité – University Medicine Berlin, Berlin, Germany

³⁷⁵Centre for Translational and Applied Genomics, British Columbia Cancer Agency, Vancouver, BC Canada

³⁷⁶Edinburgh Royal Infirmary, Edinburgh, UK

³⁷⁷Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany

³⁷⁸Department of Pediatric Immunology, Hematology and Oncology, University Hospital, Heidelberg, Germany

³⁷⁹German Cancer Research Center (DKFZ), Heidelberg, Germany

³⁸⁰Heidelberg Institute for Stem Cell Technology and Experimental Medicine (HI-STEM), Heidelberg, Germany

³⁸¹Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY USA

³⁸²New York Genome Center, New York, NY USA

³⁸³Department of Urology, James Buchanan Brady Urological Institute, Johns Hopkins University School of Medicine, Baltimore, MD USA

³⁸⁴Department of Preventive Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan

³⁸⁵Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX USA

³⁸⁶Department of Pathology and Immunology, Baylor College of Medicine, Houston, TX USA

³⁸⁷Michael E. DeBakey Veterans Affairs Medical Center, Houston, TX USA

³⁸⁸Technical University of Denmark, Lyngby, Denmark

³⁸⁹Department of Pathology, College of Medicine, Hanyang University, Seoul, South Korea

³⁹⁰Academic Unit of Surgery, School of Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow Royal Infirmary, Glasgow, UK

³⁹¹Department of Pathology, Asan Medical Center, College of Medicine, Ulsan University, Songpa-gu, Seoul South Korea

³⁹²Science Writer, Garrett Park, MD USA

³⁹³International Cancer Genome Consortium (ICGC)/ICGC Accelerating Research in Genomic Oncology (ARGO) Secretariat, Ontario Institute for Cancer Research, Toronto, ON Canada

³⁹⁴University of Ljubljana, Ljubljana, Slovenia

³⁹⁵Department of Public Health Sciences, University of Chicago, Chicago, IL USA

³⁹⁶Research Institute, NorthShore University HealthSystem, Evanston, IL USA

³⁹⁷Department for Biomedical Research, University of Bern, Bern, Switzerland

³⁹⁸Centre of Genomics and Policy, McGill University and Génome Québec Innovation Centre, Montreal, QC Canada

³⁹⁹Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

⁴⁰⁰Hopp Children’s Cancer Center (KiTZ), Heidelberg, Germany

⁴⁰¹Pediatric Glioma Research Group, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁴⁰²Cancer Research UK, London, UK

⁴⁰³Indivumed GmbH, Hamburg, Germany

⁴⁰⁴Genome Integration Data Center, Syntekabio, Inc, Daejeon, South Korea

⁴⁰⁵University Hospital Zurich, Zurich, Switzerland

⁴⁰⁶Clinical Bioinformatics, Swiss Institute of Bioinformatics, Geneva, Switzerland

⁴⁰⁷Institute for Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland

⁴⁰⁸Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland

⁴⁰⁹MRC Human Genetics Unit, MRC IGMM, University of Edinburgh, Edinburgh, UK

⁴¹⁰Women’s Cancer Program at the Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, CA USA

⁴¹¹Department of Biology, Bioinformatics Group, Division of Molecular Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia

⁴¹²Department for Internal Medicine II, University Hospital Schleswig-Holstein, Kiel, Germany

⁴¹³Genetics and Molecular Pathology, SA Pathology, Adelaide, SA Australia

⁴¹⁴Department of Gastric Surgery, National Cancer Center Hospital, Tokyo, Japan

⁴¹⁵Department of Bioinformatics, Division of Cancer Genomics, National Cancer Center Research Institute, Tokyo, Japan

⁴¹⁶A.A. Kharkevich Institute of Information Transmission Problems, Moscow, Russia

⁴¹⁷Oncology and Immunology, Dmitry Rogachev National Research Center of Pediatric Hematology, Moscow, Russia

⁴¹⁸Skolkovo Institute of Science and Technology, Moscow, Russia

⁴¹⁹Department of Surgery, The George Washington University, School of Medicine and Health Science, Washington, DC USA

⁴²⁰Endocrine Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD USA

⁴²¹Melanoma Institute Australia, Macquarie University, Sydney, NSW Australia

⁴²²MIT Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA USA

⁴²³Tissue Pathology and Diagnostic Oncology, Royal Prince Alfred Hospital, Sydney, NSW Australia

⁴²⁴Cholangiocarcinoma Screening and Care Program and Liver Fluke and Cholangiocarcinoma Research Centre, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand

⁴²⁵Controlled Department and Institution, New York, NY USA

⁴²⁶Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY USA

⁴²⁷National Cancer Center, Gyeonggi, South Korea

⁴²⁸Department of Biochemistry, College of Medicine, Ewha Womans University, Seoul, South Korea

⁴²⁹Health Sciences Department of Biomedical Informatics, University of California San Diego, La Jolla, CA USA

⁴³⁰Research Core Center, National Cancer Centre Korea, Goyang-si, South Korea

⁴³¹Department of Health Sciences and Technology, Sungkyunkwan University School of Medicine, Seoul, South Korea

⁴³²Samsung Genome Institute, Seoul, South Korea

⁴³³Breast Oncology Program, Dana-Farber/Brigham and Women’s Cancer Center, Boston, MA USA

⁴³⁴Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY USA

⁴³⁵Division of Breast Surgery, Brigham and Women’s Hospital, Boston, MA USA

⁴³⁶Integrative Bioinformatics Support Group, National Institute of Environmental Health Sciences (NIEHS), Durham, NC USA

⁴³⁷Department of Clinical Science, University of Bergen, Bergen, Norway

⁴³⁸Center For Medical Innovation, Seoul National University Hospital, Seoul, South Korea

⁴³⁹Department of Internal Medicine, Seoul National University Hospital, Seoul, South Korea

⁴⁴⁰Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland

⁴⁴¹Functional and Structural Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁴⁴²Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD USA

⁴⁴³Institute for Medical Informatics Statistics and Epidemiology, University of Leipzig, Leipzig, Germany

⁴⁴⁴Morgan Welch Inflammatory Breast Cancer Research Program and Clinic, The University of Texas MD Anderson Cancer Center, Houston, TX USA

⁴⁴⁵Department of Hematology and Oncology, Georg-Augusts-University of Göttingen, Göttingen, Germany

⁴⁴⁶Institute of Cell Biology (Cancer Research), University of Duisburg-Essen, Essen, Germany

⁴⁴⁷King’s College London and Guy’s and St. Thomas’ NHS Foundation Trust, London, UK

⁴⁴⁸Center for Epigenetics, Van Andel Research Institute, Grand Rapids, MI USA

⁴⁴⁹The University of Queensland Centre for Clinical Research, Royal Brisbane and Women’s Hospital, Herston, QLD Australia

⁴⁵⁰Department of Pediatric Oncology and Hematology, University of Cologne, Cologne, Germany

⁴⁵¹University of Düsseldorf, Düsseldorf, Germany

⁴⁵²Department of Pathology, Institut Jules Bordet, Brussels, Belgium

⁴⁵³Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden

⁴⁵⁴Children’s Medical Research Institute, Sydney, NSW Australia

⁴⁵⁵ILSbio, LLC Biobank, Chestertown, MD USA

⁴⁵⁶Division of Genetics and Genomics, Boston Children’s Hospital, Harvard Medical School, Boston, MA USA

⁴⁵⁷Institute for Bioengineering and Biopharmaceutical Research (IBBR), Hanyang University, Seoul, South Korea

⁴⁵⁸Department of Statistics, University of California Santa Cruz, Santa Cruz, CA USA

⁴⁵⁹National Genotyping Center, Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan

⁴⁶⁰Department of Vertebrate Genomics/Otto Warburg Laboratory Gene Regulation and Systems Biology of Cancer, Max Planck Institute for Molecular Genetics, Berlin, Germany

⁴⁶¹McGill University and Genome Quebec Innovation Centre, Montreal, QC Canada

⁴⁶²biobyte solutions GmbH, Heidelberg, Germany

⁴⁶³Gynecologic Oncology, NYU Laura and Isaac Perlmutter Cancer Center, New York University, New York, NY USA

⁴⁶⁴Division of Oncology, Stem Cell Biology Section, Washington University School of Medicine, St. Louis, MO USA

⁴⁶⁵Department of Systems Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA

⁴⁶⁶Harvard University, Cambridge, MA USA

⁴⁶⁷Urologic Oncology Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD USA

⁴⁶⁸University of Oslo, Oslo, Norway

⁴⁶⁹University of Toronto, Toronto, ON Canada

⁴⁷⁰Peking University, Beijing, China

⁴⁷¹School of Life Sciences, Peking University, Beijing, China

⁴⁷²Leidos Biomedical Research, Inc, McLean, VA USA

⁴⁷³Hematology, Hospital Clinic, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona, Barcelona, Spain

⁴⁷⁴Second Military Medical University, Shanghai, China

⁴⁷⁵Chinese Cancer Genome Consortium, Shenzhen, China

⁴⁷⁶Department of Medical Oncology, Beijing Hospital, Beijing, China

⁴⁷⁷Laboratory of Molecular Oncology, Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education), Peking University Cancer Hospital and Institute, Beijing, China

⁴⁷⁸School of Medicine/School of Mathematics and Statistics, University of St. Andrews, St. Andrews, Fife UK

⁴⁷⁹Institute for Systems Biology, Seattle, WA USA

⁴⁸⁰Department of Biochemistry and Molecular Biology, Faculty of Medicine, University Institute of Oncology-IUOPA, Oviedo, Spain

⁴⁸¹Institut Bergonié, Bordeaux, France

⁴⁸²Cancer Unit, MRC University of Cambridge, Cambridge, UK

⁴⁸³Department of Pathology and Laboratory Medicine, Center for Personalized Medicine, Children’s Hospital Los Angeles, Los Angeles, CA USA

⁴⁸⁴John Curtin School of Medical Research, Canberra, ACT Australia

⁴⁸⁵MVZ Department of Oncology, PraxisClinic am Johannisplatz, Leipzig, Germany

⁴⁸⁶Department of Information Technology, Ghent University, Ghent, Belgium

⁴⁸⁷Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium

⁴⁸⁸Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH USA

⁴⁸⁹Computational Biology Program, School of Medicine, Oregon Health and Science University, Portland, OR USA

⁴⁹⁰Department of Surgery, Duke University, Durham, NC USA

⁴⁹¹Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

⁴⁹²Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain

⁴⁹³University of Glasgow, Glasgow, UK

⁴⁹⁴Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain

⁴⁹⁵Division of Oncology, Washington University School of Medicine, St. Louis, MO USA

⁴⁹⁶Department of Surgery and Cancer, Imperial College, London, INY UK

⁴⁹⁷Applications Department, Oxford Nanopore Technologies, Oxford, UK

⁴⁹⁸Department of Obstetrics, Gynecology and Reproductive Services, University of California San Francisco, San Francisco, CA USA

⁴⁹⁹Department of Biochemistry and Molecular Medicine, University of California at Davis, Sacramento, CA USA

⁵⁰⁰STTARR Innovation Facility, Princess Margaret Cancer Centre, Toronto, ON Canada

⁵⁰¹Discipline of Surgery, Western Sydney University, Penrith, NSW Australia

⁵⁰²Yale School of Medicine, Yale University, New Haven, CT USA

⁵⁰³Department of Genetics, Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

⁵⁰⁴Departments of Neurology and Neurosurgery, Henry Ford Hospital, Detroit, MI USA

⁵⁰⁵Precision Oncology, OHSU Knight Cancer Institute, Oregon Health and Science University, Portland, OR USA

⁵⁰⁶Institute of Pathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

⁵⁰⁷Department of Health Sciences, Faculty of Medical Sciences, Kyushu University, Fukuoka, Japan

⁵⁰⁸Heidelberg Academy of Sciences and Humanities, Heidelberg, Germany

⁵⁰⁹Department of Clinical Pathology, University of Melbourne, Melbourne, VIC Australia

⁵¹⁰Department of Pathology, Roswell Park Cancer Institute, Buffalo, NY USA

⁵¹¹Department of Computer Science, University of Helsinki, Helsinki, Finland

⁵¹²Institute of Biotechnology, University of Helsinki, Helsinki, Finland

⁵¹³Organismal and Evolutionary Biology Research Programme, University of Helsinki, Helsinki, Finland

⁵¹⁴Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Washington University School of Medicine, St. Louis, MO USA

⁵¹⁵Penrose St. Francis Health Services, Colorado Springs, CO USA

⁵¹⁶Institute of Pathology, Ulm University and University Hospital of Ulm, Ulm, Germany

⁵¹⁷National Cancer Center, Tokyo, Japan

⁵¹⁸Genome Institute of Singapore, Singapore, Singapore

⁵¹⁹Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT USA

⁵²⁰German Cancer Aid, Bonn, Germany

⁵²¹Programme in Cancer and Stem Cell Biology, Centre for Computational Biology, Duke-NUS Medical School, Singapore, Singapore

⁵²²The Chinese University of Hong Kong, Shatin, NT, Hong Kong China

⁵²³Fourth Military Medical University, Shaanxi, China

⁵²⁴The University of Cambridge School of Clinical Medicine, Cambridge, UK

⁵²⁵St. Jude Children’s Research Hospital, Memphis, TN USA

⁵²⁶University Health Network, Princess Margaret Cancer Centre, Toronto, ON Canada

⁵²⁷Center for Biomolecular Science and Engineering, University of California Santa Cruz, Santa Cruz, CA USA

⁵²⁸Department of Medicine, University of Chicago, Chicago, IL USA

⁵²⁹Department of Neurology, Mayo Clinic, Rochester, MN USA

⁵³⁰Cambridge Oesophagogastric Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK

⁵³¹Department of Computer Science, Carleton College, Northfield, MN USA

⁵³²Institute of Cancer Sciences, College of Medical Veterinary and Life Sciences, University of Glasgow, Glasgow, UK

⁵³³Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL USA

⁵³⁴HudsonAlpha Institute for Biotechnology, Huntsville, AL USA

⁵³⁵O’Neal Comprehensive Cancer Center, University of Alabama at Birmingham, Birmingham, AL USA

⁵³⁶Department of Pathology, Keio University School of Medicine, Tokyo, Japan

⁵³⁷Department of Hepatobiliary and Pancreatic Oncology, National Cancer Center Hospital, Tokyo, Japan

⁵³⁸Sage Bionetworks, Seattle, WA USA

⁵³⁹Lymphoma Genomic Translational Research Laboratory, National Cancer Centre, Singapore, Singapore

⁵⁴⁰Department of Clinical Pathology, Robert-Bosch-Hospital, Stuttgart, Germany

⁵⁴¹Department of Cell and Systems Biology, University of Toronto, Toronto, ON Canada

⁵⁴²Department of Biosciences and Nutrition, Karolinska Institutet, Stockholm, Sweden

⁵⁴³Center for Liver Cancer, Research Institute and Hospital, National Cancer Center, Gyeonggi, South Korea

⁵⁴⁴Division of Hematology-Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, South Korea

⁵⁴⁵Samsung Advanced Institute for Health Sciences and Technology, Sungkyunkwan University School of Medicine, Seoul, South Korea

⁵⁴⁶Cheonan Industry-Academic Collaboration Foundation, Sangmyung University, Cheonan, South Korea

⁵⁴⁷NYU Langone Medical Center, New York, NY USA

⁵⁴⁸Department of Hematology and Medical Oncology, Cleveland Clinic, Cleveland, OH USA

⁵⁴⁹Department of Radiation Oncology, University of California San Francisco, San Francisco, CA USA

⁵⁵⁰Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA

⁵⁵¹Helen F. Graham Cancer Center at Christiana Care Health Systems, Newark, DE USA

⁵⁵²Heidelberg University Hospital, Heidelberg, Germany

⁵⁵³CSRA Incorporated, Fairfax, VA USA

⁵⁵⁴Research Department of Pathology, University College London Cancer Institute, London, UK

⁵⁵⁵Department of Research Oncology, Guy’s Hospital, King’s Health Partners AHSC, King’s College London School of Medicine, London, UK

⁵⁵⁶Faculty of Medicine and Health Sciences, Macquarie University, Sydney, NSW Australia

⁵⁵⁷University Hospital of Minjoz, INSERM UMR 1098, Besançon, France

⁵⁵⁸Spanish National Cancer Research Centre, Madrid, Spain

⁵⁵⁹Center of Digestive Diseases and Liver Transplantation, Fundeni Clinical Institute, Bucharest, Romania

⁵⁶⁰Cureline, Inc, South San Francisco, CA USA

⁵⁶¹St. Luke’s Cancer Centre, Royal Surrey County Hospital NHS Foundation Trust, Guildford, UK

⁵⁶²Cambridge Breast Unit, Addenbrooke’s Hospital, Cambridge University Hospital NHS Foundation Trust and NIHR Cambridge Biomedical Research Centre, Cambridge, UK

⁵⁶³East of Scotland Breast Service, Ninewells Hospital, Aberdeen, UK

⁵⁶⁴Department of Genetics, Microbiology and Statistics, University of Barcelona, IRSJD, IBUB, Barcelona, Spain

⁵⁶⁵Department of Obstetrics and Gynecology, Medical College of Wisconsin, Milwaukee, WI USA

⁵⁶⁶Hematology and Medical Oncology, Winship Cancer Institute of Emory University, Atlanta, GA USA

⁵⁶⁷Department of Computer Science, Princeton University, Princeton, NJ USA

⁵⁶⁸Vanderbilt Ingram Cancer Center, Vanderbilt University, Nashville, TN USA

⁵⁶⁹Ohio State University College of Medicine and Arthur G. James Comprehensive Cancer Center, Columbus, OH USA

⁵⁷⁰Department of Surgery, Yokohama City University Graduate School of Medicine, Kanagawa, Japan

⁵⁷¹Division of Chromatin Networks, German Cancer Research Center (DKFZ) and BioQuant, Heidelberg, Germany

⁵⁷²Research Computing Center, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

⁵⁷³School of Molecular Biosciences and Center for Reproductive Biology, Washington State University, Pullman, WA USA

⁵⁷⁴Finsen Laboratory and Biotech Research and Innovation Centre (BRIC), University of Copenhagen, Copenhagen, Denmark

⁵⁷⁵Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON Canada

⁵⁷⁶Department of Pathology, Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY USA

⁵⁷⁷University Hospital Giessen, Pediatric Hematology and Oncology, Giessen, Germany

⁵⁷⁸Oncologie Sénologie, ICM Institut Régional du Cancer, Montpellier, France

⁵⁷⁹Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany

⁵⁸⁰Institute of Pathology, University of Wuerzburg, Wuerzburg, Germany

⁵⁸¹Department of Urology, North Bristol NHS Trust, Bristol, UK

⁵⁸²SingHealth, Duke-NUS Institute of Precision Medicine, National Heart Centre Singapore, Singapore, Singapore

⁵⁸³Department of Computer Science, University of Toronto, Toronto, ON Canada

⁵⁸⁴Bern Center for Precision Medicine, University Hospital of Bern, University of Bern, Bern, Switzerland

⁵⁸⁵Englander Institute for Precision Medicine, Weill Cornell Medicine and New York Presbyterian Hospital, New York, NY USA

⁵⁸⁶Meyer Cancer Center, Weill Cornell Medicine, New York, NY USA

⁵⁸⁷Pathology and Laboratory, Weill Cornell Medical College, New York, NY USA

⁵⁸⁸Vall d’Hebron Institute of Oncology: VHIO, Barcelona, Spain

⁵⁸⁹General and Hepatobiliary-Biliary Surgery, Pancreas Institute, University and Hospital Trust of Verona, Verona, Italy

⁵⁹⁰National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, India

⁵⁹¹Indiana University, Bloomington, IN USA

⁵⁹²Department of Pathology, GZA-ZNA Hospitals, Antwerp, Belgium

⁵⁹³Analytical Biological Services, Inc, Wilmington, DE USA

⁵⁹⁴Sydney Medical School, University of Sydney, Sydney, NSW Australia

⁵⁹⁵cBio Center, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA USA

⁵⁹⁶Department of Cell Biology, Harvard Medical School, Boston, MA USA

⁵⁹⁷Advanced Centre for Treatment Research and Education in Cancer, Tata Memorial Centre, Navi Mumbai, Maharashtra India

⁵⁹⁸School of Environmental and Life Sciences, Faculty of Science, The University of Newcastle, Ourimbah, NSW Australia

⁵⁹⁹Department of Dermatology, University Hospital of Essen, Essen, Germany

⁶⁰⁰Bioinformatics and Omics Data Analytics, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶⁰¹Department of Urology, Charité Universitätsmedizin Berlin, Berlin, Germany

⁶⁰²Martini-Clinic, Prostate Cancer Center, University Medical Center Hamburg-Eppendorf, Hamburg, Germany

⁶⁰³Department of General Internal Medicine, University of Kiel, Kiel, Germany

⁶⁰⁴German Cancer Consortium (DKTK), Partner site Berlin, Berlin, Germany

⁶⁰⁵Cancer Research Institute, Beth Israel Deaconess Medical Center, Boston, MA USA

⁶⁰⁶University of Pittsburgh, Pittsburgh, PA USA

⁶⁰⁷Department of Ophthalmology and Ocular Genomics Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, MA USA

⁶⁰⁸Center for Psychiatric Genetics, NorthShore University HealthSystem, Evanston, IL USA

⁶⁰⁹Van Andel Research Institute, Grand Rapids, MI USA

⁶¹⁰Laboratory of Molecular Medicine, Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan

⁶¹¹Japan Agency for Medical Research and Development, Tokyo, Japan

⁶¹²Korea University, Seoul, South Korea

⁶¹³Murtha Cancer Center, Walter Reed National Military Medical Center, Bethesda, MD USA

⁶¹⁴Human Genetics, University of Kiel, Kiel, Germany

⁶¹⁵Department of Oncologic Pathology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA USA

⁶¹⁶Oregon Health and Science University, Portland, OR USA

⁶¹⁷Center for RNA Interference and Noncoding RNA, The University of Texas MD Anderson Cancer Center, Houston, TX USA

⁶¹⁸Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX USA

⁶¹⁹Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center, Houston, TX USA

⁶²⁰University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK

⁶²¹Department of Radiation Oncology, Radboud University Nijmegen Medical Centre, Nijmegen, GA The Netherlands

⁶²²Institute for Genomics and Systems Biology, University of Chicago, Chicago, IL USA

⁶²³Clinic for Hematology and Oncology, St.-Antonius-Hospital, Eschweiler, Germany

⁶²⁴Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY USA

⁶²⁵University of Iceland, Reykjavik, Iceland

⁶²⁶Division of Computational Genomics and Systems Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶²⁷Dundee Cancer Centre, Ninewells Hospital, Dundee, UK

⁶²⁸Department for Internal Medicine III, University of Ulm and University Hospital of Ulm, Ulm, Germany

⁶²⁹Institut Curie, INSERM Unit 830, Paris, France

⁶³⁰Department of Gastroenterology and Hepatology, Yokohama City University Graduate School of Medicine, Kanagawa, Japan

⁶³¹Department of Laboratory Medicine, Radboud University Nijmegen Medical Centre, Nijmegen, GA The Netherlands

⁶³²Division of Cancer Genome Research, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶³³Department of General Surgery, Singapore General Hospital, Singapore, Singapore

⁶³⁴Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore

⁶³⁵Department of Medical and Clinical Genetics, Genome-Scale Biology Research Program, University of Helsinki, Helsinki, Finland

⁶³⁶East Anglian Medical Genetics Service, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK

⁶³⁷Irving Institute for Cancer Dynamics, Columbia University, New York, NY USA

⁶³⁸Institute of Molecular and Cell Biology, Singapore, Singapore

⁶³⁹Laboratory of Cancer Epigenome, Division of Medical Science, National Cancer Centre Singapore, Singapore, Singapore

⁶⁴⁰Universite Lyon, INCa-Synergie, Centre Léon Bérard, Lyon, France

⁶⁴¹Department of Urology, Mayo Clinic, Rochester, MN USA

⁶⁴²Royal National Orthopaedic Hospital - Stanmore, Stanmore, Middlesex UK

⁶⁴³Department of Biochemistry, Genetics and Immunology, University of Vigo, Vigo, Spain

⁶⁴⁴Giovanni Paolo II / I.R.C.C.S. Cancer Institute, Bari, BA Italy

⁶⁴⁵Neuroblastoma Genomics, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶⁴⁶Fondazione Policlinico Universitario Gemelli IRCCS, Rome, Italy

⁶⁴⁷University of Verona, Verona, Italy

⁶⁴⁸Centre National de Génotypage, CEA - Institute de Génomique, Evry, France

⁶⁴⁹CAPHRI Research School, Maastricht University, Maastricht, ER The Netherlands

⁶⁵⁰Department of Biopathology, Centre Léon Bérard, Lyon, France

⁶⁵¹Université Claude Bernard Lyon 1, Villeurbanne, France

⁶⁵²Core Research for Evolutional Science and Technology (CREST), JST, Tokyo, Japan

⁶⁵³Department of Biological Sciences, Laboratory for Medical Science Mathematics, Graduate School of Science, University of Tokyo, Yokohama, Japan

⁶⁵⁴Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, Japan

⁶⁵⁵Cancer Ageing and Somatic Mutation Programme, Wellcome Sanger Institute, Hinxton, UK

⁶⁵⁶University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK

⁶⁵⁷Centre for Cancer Research and Cell Biology, Queen’s University, Belfast, UK

⁶⁵⁸Breast Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX USA

⁶⁵⁹Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD USA

⁶⁶⁰Department of Oncology-Pathology, Science for Life Laboratory, Karolinska Institute, Stockholm, Sweden

⁶⁶¹School of Cancer Sciences, Faculty of Medicine, University of Southampton, Southampton, UK

⁶⁶²Department of Gene Technology, Tallinn University of Technology, Tallinn, Estonia

⁶⁶³Genetics and Genome Biology Program, SickKids Research Institute, The Hospital for Sick Children, Toronto, ON Canada

⁶⁶⁴Departments of Neurosurgery and Hematology and Medical Oncology, Winship Cancer Institute and School of Medicine, Emory University, Atlanta, GA USA

⁶⁶⁵Department of Clinical and Molecular Medicine, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway

⁶⁶⁶Argmix Consulting, North Vancouver, BC Canada

⁶⁶⁷Department of Information Technology, Ghent University, Interuniversitair Micro-Electronica Centrum (IMEC), Ghent, Belgium

⁶⁶⁸Nuffield Department of Surgical Sciences, John Radcliffe Hospital, University of Oxford, Oxford, UK

⁶⁶⁹Institute of Mathematics and Computer Science, University of Latvia, Riga, LV Latvia

⁶⁷⁰Discipline of Pathology, Sydney Medical School, University of Sydney, Sydney, NSW Australia

⁶⁷¹Department of Applied Mathematics and Theoretical Physics, Centre for Mathematical Sciences, University of Cambridge, Cambridge, UK

⁶⁷²Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY USA

⁶⁷³Department of Statistics, Columbia University, New York, NY USA

⁶⁷⁴Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden

⁶⁷⁵School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China

⁶⁷⁶Department of Histopathology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK

⁶⁷⁷Oxford NIHR Biomedical Research Centre, University of Oxford, Oxford, UK

⁶⁷⁸Georgia Regents University Cancer Center, Augusta, GA USA

⁶⁷⁹Wythenshawe Hospital, Manchester, UK

⁶⁸⁰Department of Genetics, Washington University School of Medicine, St. Louis, MO USA

⁶⁸¹Department of Biological Oceanography, Leibniz Institute of Baltic Sea Research, Rostock, Germany

⁶⁸²Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK

⁶⁸³Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX USA

⁶⁸⁴Thoracic Oncology Laboratory, Mayo Clinic, Rochester, MN USA

⁶⁸⁵Institute for Genomic Medicine, Nationwide Children’s Hospital, Columbus, OH USA

⁶⁸⁶Department of Obstetrics and Gynecology, Division of Gynecologic Oncology, Mayo Clinic, Rochester, MN USA

⁶⁸⁷International Institute for Molecular Oncology, Poznań, Poland

⁶⁸⁸Poznan University of Medical Sciences, Poznań, Poland

⁶⁸⁹Genomics and Proteomics Core Facility High Throughput Sequencing Unit, German Cancer Research Center (DKFZ), Heidelberg, Germany

⁶⁹⁰NCCS-VARI Translational Research Laboratory, National Cancer Centre Singapore, Singapore, Singapore

⁶⁹¹Edison Family Center for Genome Sciences and Systems Biology, Washington University, St. Louis, MO USA

⁶⁹²MRC-University of Glasgow Centre for Virus Research, Glasgow, UK

⁶⁹³Department of Medical Informatics and Clinical Epidemiology, Division of Bioinformatics and Computational Biology, OHSU Knight Cancer Institute, Oregon Health and Science University, Portland, OR USA

⁶⁹⁴School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China

⁶⁹⁵Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD USA

⁶⁹⁶Department of Cancer Genome Informatics, Graduate School of Medicine, Osaka University, Osaka, Japan

⁶⁹⁷Institute of Computer Science, Heidelberg University, Heidelberg, Germany

⁶⁹⁸School of Mathematics and Statistics, University of Sydney, Sydney, NSW Australia

⁶⁹⁹Ben May Department for Cancer Research, University of Chicago, Chicago, IL USA

⁷⁰⁰Department of Human Genetics, University of Chicago, Chicago, IL USA

⁷⁰¹Tri-Institutional PhD Program in Computational Biology and Medicine, Weill Cornell Medicine, New York, NY USA

⁷⁰²The First Affiliated Hospital, Xi’an Jiaotong University, Xi’an, China

⁷⁰³Department of Medicine and Therapeutics, The Chinese University of Hong Kong, Shatin, NT, Hong Kong China

⁷⁰⁴Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX USA

⁷⁰⁵Duke-NUS Medical School, Singapore, Singapore

⁷⁰⁶Department of Surgery, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai, China

⁷⁰⁷School of Computing Science, University of Glasgow, Glasgow, UK

⁷⁰⁸Division of Orthopaedic Surgery, Oslo University Hospital, Oslo, Norway

⁷⁰⁹Eastern Clinical School, Monash University, Melbourne, VIC Australia

⁷¹⁰Epworth HealthCare, Richmond, VIC Australia

⁷¹¹Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, MA USA

⁷¹²Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH USA

⁷¹³The Ohio State University Comprehensive Cancer Center (OSUCCC – James), Columbus, OH USA

⁷¹⁴The University of Texas School of Biomedical Informatics (SBMI) at Houston, Houston, TX USA

⁷¹⁵Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC USA

⁷¹⁶Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, IL USA

⁷¹⁷Faculty of Medicine and Health, University of Sydney, Sydney, NSW Australia

⁷¹⁸Department of Pathology, Erasmus Medical Center Rotterdam, Rotterdam, GD The Netherlands

⁷¹⁹Division of Molecular Carcinogenesis, The Netherlands Cancer Institute, Amsterdam, CX The Netherlands

⁷²⁰Institute of Molecular Life Sciences and Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland

PMCID: PMC7025898 EMSID: EMS85186 PMID: 32025007

Abstract

Cancer is driven by genetic change, and the advent of massively parallel sequencing has enabled systematic documentation of this variation at the whole-genome scale^1–3. Here we report the integrative analysis of 2,658 whole-cancer genomes and their matching normal tissues across 38 tumour types from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). We describe the generation of the PCAWG resource, facilitated by international data sharing using compute clouds. On average, cancer genomes contained 4–5 driver mutations when combining coding and non-coding genomic elements; however, in around 5% of cases no drivers were identified, suggesting that cancer driver discovery is not yet complete. Chromothripsis, in which many clustered structural variants arise in a single catastrophic event, is frequently an early event in tumour evolution; in acral melanoma, for example, these events precede most somatic point mutations and affect several cancer-associated genes simultaneously. Cancers with abnormal telomere maintenance often originate from tissues with low replicative activity and show several mechanisms of preventing telomere attrition to critical levels. Common and rare germline variants affect patterns of somatic mutation, including point mutations, structural variants and somatic retrotransposition. A collection of papers from the PCAWG Consortium describes non-coding mutations that drive cancer beyond those in the TERT promoter⁴; identifies new signatures of mutational processes that cause base substitutions, small insertions and deletions and structural variation^5,6; analyses timings and patterns of tumour evolution⁷; describes the diverse transcriptional consequences of somatic mutation on splicing, expression levels, fusion genes and promoter activity^8,9; and evaluates a range of more-specialized features of cancer genomes^8,10–18.

Subject terms: Cancer genomics

The flagship paper of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium describes the generation of the integrative analyses of 2,658 cancer whole genomes and their matching normal tissues across 38 tumour types, the structures for international data sharing and standardized analyses, and the main scientific findings from across the consortium studies.

Main

Cancer is the second most-frequent cause of death worldwide, killing more than 8 million people every year; the incidence of cancer is expected to increase by more than 50% over the coming decades^19,20. ‘Cancer’ is a catch-all term used to denote a set of diseases characterized by autonomous expansion and spread of a somatic clone. To achieve this behaviour, the cancer clone must co-opt multiple cellular pathways that enable it to disregard the normal constraints on cell growth, modify the local microenvironment to favour its own proliferation, invade through tissue barriers, spread to other organs and evade immune surveillance²¹. No single cellular program directs these behaviours. Rather, there is a large pool of potential pathogenic abnormalities from which individual cancers draw their own combinations: the commonalities of macroscopic features across tumours belie a vastly heterogeneous landscape of cellular abnormalities.

This heterogeneity arises from the stochastic nature of Darwinian evolution. There are three preconditions for Darwinian evolution: characteristics must vary within a population; this variation must be heritable from parent to offspring; and there must be competition for survival within the population. In the context of somatic cells, heritable variation arises from mutations acquired stochastically throughout life, notwithstanding additional contributions from germline and epigenetic variation. A subset of these mutations alter the cellular phenotype, and a small subset of those variants confer an advantage on clones during the competition to escape the tight physiological controls wired into somatic cells. Mutations that provide a selective advantage to the clone are termed driver mutations, as opposed to selectively neutral passenger mutations.

Initial studies using massively parallel sequencing demonstrated the feasibility of identifying every somatic point mutation, copy-number change and structural variant (SV) in a given cancer^1–3. In 2008, recognizing the opportunity that this advance in technology provided, the global cancer genomics community established the ICGC with the goal of systematically documenting the somatic mutations that drive common tumour types²².

The pan-cancer analysis of whole genomes

The expansion of whole-genome sequencing studies from individual ICGC and TCGA working groups presented the opportunity to undertake a meta-analysis of genomic features across tumour types. To achieve this, the PCAWG Consortium was established. A Technical Working Group implemented the informatics analyses by aggregating the raw sequencing data from different working groups that studied individual tumour types, aligning the sequences to the human genome and delivering a set of high-quality somatic mutation calls for downstream analysis (Extended Data Fig. 1). Given the recent meta-analysis of exome data from the TCGA Pan-Cancer Atlas^23–25, scientific working groups concentrated their efforts on analyses best-informed by whole-genome sequencing data.

Extended Data Fig. 1 — After alignment to the genome, somatic mutations were identified by three pipelines, with subsequent merging into a consensus variant set used for downstream scientific analyses. Subs, substitutions; DKFZ/EMBL, the German Cancer Research Centre (DKFZ) and European Molecular Biology Laboratory (EMBL).

We collected genome data from 2,834 donors (Extended Data Table 1), of which 176 were excluded after quality assurance. A further 75 had minor issues that could affect some of the analyses (grey-listed donors) and 2,583 had data of optimal quality (white-listed donors) (Supplementary Table 1). Across the 2,658 white- and grey-listed donors, whole-genome sequencing data were available from 2,605 primary tumours and 173 metastases or local recurrences. Mean read coverage was 39× for normal samples, whereas tumours had a bimodal coverage distribution with modes at 38× and 60× (Supplementary Fig. 1). RNA-sequencing data were available for 1,222 donors. The final cohort comprised 1,469 men (55%) and 1,189 women (45%), with a mean age of 56 years (range, 1–90 years) across 38 tumour types (Extended Data Table 1 and Supplementary Table 1).

Extended Data Table 1.

Overview of the tumour types included in PCAWG project

Adeno., adenocarcinoma; Ca., carcinoma; Comb., combined; F, female; HCC, hepatocellular carcinoma; M, male; Med, median; 10–90th, 10–90th centiles; SCC, squamous cell carcinoma.

To identify somatic mutations, we analysed all 6,835 samples using a uniform set of algorithms for alignment, variant calling and quality control (Extended Data Fig. 1, Supplementary Fig. 2 and Supplementary Methods 2). We used three established pipelines to call somatic single-nucleotide variations (SNVs), small insertions and deletions (indels), copy-number alterations (CNAs) and SVs. Somatic retrotransposition events, mitochondrial DNA mutations and telomere lengths were also called by bespoke algorithms. RNA-sequencing data were uniformly processed to call transcriptomic alterations. Germline variants identified by the three separate pipelines included single-nucleotide polymorphisms, indels, SVs and mobile-element insertions (Supplementary Table 2).

The requirement to uniformly realign and call variants on approximately 5,800 whole genomes presented considerable computational challenges, and raised ethical issues owing to the use of data from different jurisdictions (Extended Data Table 2). We used cloud computing^26,27 to distribute alignment and variant calling across 13 data centres on 3 continents (Supplementary Table 3). Core pipelines were packaged into Docker containers²⁸ as reproducible, stand-alone packages, which we have made available for download. Data repositories for raw and derived datasets, together with portals for data visualization and exploration, have also been created (Box 1 and Supplementary Table 4).

Extended Data Table 2.

Ethical considerations of genomic cloud computing

Box 1  Online resources for data access, visualization and analysis.

The PCAWG landing page (http://docs.icgc.org/pcawg) provides links to several data resources for interactive online browsing, analysis and download of PCAWG data and results (Supplementary Table 4).

Direct download of PCAWG data

Aligned PCAWG read data in BAM format are also available at the European Genome Phenome Archive (EGA; https://www.ebi.ac.uk/ega/search/site/pcawg under accession number EGAS00001001692). In addition, all open-tier PCAWG genomics data, as well as reference datasets used for analysis, can be downloaded from the ICGC Data Portal at http://docs.icgc.org/pcawg/data/. Controlled-tier genomic data, including SNVs and indels that originated from TCGA projects (in VCF format) and aligned reads (in BAM format) can be downloaded using the Score (https://www.overture.bio/) software package, which has accelerated and secure file transfer, as well as BAM slicing facilities to selectively download defined regions of genomic alignments.

PCAWG computational pipelines

The core alignment, somatic variant-calling, quality-control and variant consensus-generation pipelines used by PCAWG have each been packaged into portable cross-platform images using the Dockstore system⁸⁴ and released under an Open Source licence that enables unrestricted use and redistribution. All PCAWG Dockstore images are available to the public at https://dockstore.org/organizations/PCAWG/collections/PCAWG.

ICGC Data Portal

The ICGC Data Portal⁸⁵ (https://dcc.icgc.org) serves as the main entry point for accessing PCAWG datasets with a single uniform web interface and a high-performance data-download client. This uniform interface provides users with easy access to the myriad of PCAWG sequencing data and variant calls that reside in many repositories and compute clouds worldwide. Streaming technology⁸⁶ provides users with high-level visualizations in real time of BAM and VCF files stored remotely on the Cancer Genome Collaboratory.

UCSC Xena

UCSC Xena⁸⁷ (https://pcawg.xenahubs.net) visualizes all PCAWG primary results, including copy-number, gene-expression, gene-fusion and promoter-usage alterations, simple somatic mutations, large somatic structural variations, mutational signatures and phenotypic data. These open-access data are available through a public Xena hub, and consensus simple somatic mutations can be loaded to the local computer of a user via a private Xena hub. Kaplan–Meier plots, histograms, box plots, scatter plots and transcript-specific views offer additional visualization options and statistical analyses.

The Expression Atlas

The Expression Atlas (https://www.ebi.ac.uk/gxa/home) contains RNA-sequencing and expression microarray data for querying gene expression across tissues, cell types, developmental stages and/or experimental conditions⁸⁸. Two different views of the data are provided: summarized expression levels for each tumour type and gene expression at the level of individual samples, including reference-gene expression datasets for matching normal tissues.

PCAWG Scout

PCAWG Scout (http://pcawgscout.bsc.es/) provides a framework for -omics workflow and website templating to generate on-demand, in-depth analyses of the PCAWG data that are openly available to the whole research community. Views of protected data are available that still safeguard sensitive data. Through the PCAWG Scout web interface, users can access an array of reports and visualizations that leverage on-demand bioinformatic computing infrastructure to produce results in real time, allowing users to discover trends as well as form and test hypotheses.

Chromothripsis Explorer

Chromothripsis Explorer (http://compbio.med.harvard.edu/chromothripsis/) is a portal that allows structural variation in the PCAWG dataset to be explored on an individual patient basis through the use of circos plots. Patterns of chromothripsis can also be explored in aggregated formats.

Benchmarking of genetic variant calls

To benchmark mutation calling, we ran the 3 core pipelines, together with 10 additional pipelines, on 63 representative tumour–normal genome pairs (Supplementary Note 1). For 50 of these cases, we performed validation by hybridization of tumour and matched normal DNA to a custom bait set with deep sequencing²⁹. The 3 core somatic variant-calling pipelines had individual estimates of sensitivity of 80–90% to detect a true somatic SNV called by any of the 13 pipelines; more than 95% of SNV calls made by each of the core pipelines were genuine somatic variants (Fig. 1a). For indels—a more-challenging class of variants to identify with short-read sequencing—the 3 core algorithms had individual sensitivity estimates in the range of 40–50%, with precision of 70–95% (Fig. 1b). For individual SV algorithms, we estimated precision to be in the range 80–95% for samples in the 63-sample pilot dataset.

Fig. 1 — a, Scatter plot of estimated sensitivity and precision for somatic SNVs across individual algorithms assessed in the validation exercise across n = 63 PCAWG samples. Core algorithms included in the final PCAWG call set are shown in blue. b, Sensitivity and precision estimates across individual algorithms for somatic indels. c, Accuracy (precision, sensitivity and F₁ score, defined as 2 × sensitivity × precision/(sensitivity + precision)) of somatic SNV calls across variant allele fractions (VAFs) for the core algorithms. The accuracy of two methods of combining variant calls (two-plus, which was used in the final dataset, and logistic regression) is also shown. d, Accuracy of indel calls across variant allele fractions.

Next, we defined a strategy to merge results from the three pipelines into one final call-set to be used for downstream scientific analyses (Methods and Supplementary Note 2). Sensitivity and precision of consensus somatic variant calls were 95% (90% confidence interval, 88–98%) and 95% (90% confidence interval, 71–99%), respectively, for SNVs (Extended Data Fig. 2). For somatic indels, sensitivity and precision were 60% (34–72%) and 91% (73–96%), respectively (Extended Data Fig. 2). Regarding somatic SVs, we estimate the sensitivity of merged calls to be 90% for true calls generated by any one pipeline; precision was estimated as 97.5%. The improvement in calling accuracy from combining different pipelines was most noticeable in variants with low variant allele fractions, which probably originate from tumour subclones (Fig. 1c, d). Germline variant calls, phased using a haplotype-reference panel, displayed a precision of more than 99% and a sensitivity of 92–98% (Supplementary Note 2).
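As a minimal sketch of how a "two-plus" style merge and the accuracy metrics of Fig. 1c could be computed, the Python below keeps any variant reported by at least two call sets and scores the result against a validated truth set. Reading "two-plus" as "supported by at least two callers" is an assumption, as are the function names, the tuple representation of a variant and the toy data; the actual PCAWG merging procedure is the one described in Methods and Supplementary Note 2.

```python
from collections import Counter


def two_plus_consensus(call_sets, min_callers=2):
    """Keep variants reported by at least `min_callers` of the call sets."""
    counts = Counter(variant for calls in call_sets for variant in set(calls))
    return {variant for variant, n in counts.items() if n >= min_callers}


def accuracy(called, truth):
    """Return (sensitivity, precision, F1) of a call set against a truth set."""
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(called) if called else 0.0
    total = sensitivity + precision
    # F1 as defined in the Fig. 1 legend: 2 x sens x prec / (sens + prec).
    f1 = 2 * sensitivity * precision / total if total else 0.0
    return sensitivity, precision, f1


# Hypothetical toy data: variants are (chromosome, position, ref, alt) tuples.
truth = {
    ("chr1", 100, "C", "T"),
    ("chr1", 200, "G", "A"),
    ("chr2", 50, "A", "G"),
    ("chr3", 10, "T", "C"),
}
pipeline_a = {("chr1", 100, "C", "T"), ("chr1", 200, "G", "A"), ("chr5", 999, "C", "A")}
pipeline_b = {("chr1", 100, "C", "T"), ("chr2", 50, "A", "G")}
pipeline_c = {("chr1", 200, "G", "A"), ("chr2", 50, "A", "G"), ("chr7", 1, "G", "T")}

consensus = two_plus_consensus([pipeline_a, pipeline_b, pipeline_c])
sensitivity, precision, f1 = accuracy(consensus, truth)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f} F1={f1:.2f}")
```

In this toy example the two calls made by a single pipeline are dropped, which raises precision, while the one true variant that no pipeline reported limits sensitivity.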

Extended Data Fig. 2 — a, F₁ accuracy, precision and sensitivity estimates for somatic SNVs across the core algorithms and different approaches to merging the call-sets. The box plots demarcate the interquartile range and median of estimates across the n = 50 samples in the validation dataset. b, F₁ accuracy, precision and sensitivity estimates for somatic indels (n = 50 samples). SVM, support vector machine; union, calls made by any of the variant-calling algorithms; intersect2, calls made by any combination of two variant-calling algorithms; intersect3, calls made by any three variant-calling algorithms.

Analysis of PCAWG data

The uniformly generated, high-quality set of variant calls across more than 2,500 donors provided the springboard for a series of scientific working groups to explore the biology of cancer. A comprehensive suite of companion papers that describe the analyses and discoveries across these thematic areas is copublished with this paper^4–18 (Extended Data Table 3).

Extended Data Table 3.

Scientific output using PCAWG data, in bite-size chunks

Key findings are described further in associated papers^4–18.

Pan-cancer burden of somatic mutations

Across the 2,583 white-listed PCAWG donors, we called 43,778,859 somatic SNVs, 410,123 somatic multinucleotide variants, 2,418,247 somatic indels, 288,416 somatic SVs, 19,166 somatic retrotransposition events and 8,185 de novo mitochondrial DNA mutations (Supplementary Table 1). There was considerable heterogeneity in the burden of somatic mutations across patients and tumour types, with a broad correlation in mutation burden among different classes of somatic variation (Extended Data Fig. 3). Analysed at a per-patient level, this correlation held, even when considering tumours with similar purity and ploidy (Supplementary Fig. 3). Why such correlation should apply on a pan-cancer basis is unclear. It is likely that age has some role, as we observe a correlation between most classes of somatic mutation and age at diagnosis (around 190 SNVs per year, P = 0.02; about 22 indels per year, P = 5 × 10⁻⁵; 1.5 SVs per year, P < 2 × 10⁻¹⁶; linear regression with likelihood ratio tests; Supplementary Fig. 4). Other factors are also likely to contribute to the correlations among classes of somatic mutation, as there is evidence that some DNA-repair defects can cause multiple types of somatic mutation³⁰, and a single carcinogen can cause a range of DNA lesions³¹.

Extended Data Fig. 3 — The y axis is on a log scale. The 2,583 donors with the highest quality metrics (white-listed donors) are plotted. SNVs indicate substitutions; indels are taken as insertions or deletions <100 bp in size; retrotranspositions are the combined counts of somatic retrotransposon insertions, transductions and somatic pseudogene insertions.

Panorama of driver mutations in cancer

We extracted the subset of somatic mutations in PCAWG tumours that have high confidence to be driver events on the basis of current knowledge. One challenge to pinpointing the specific driver mutations in an individual tumour is that not all point mutations in recurrently mutated cancer-associated genes are drivers³². For genomic elements significantly mutated in PCAWG data, we developed a ‘rank-and-cut’ approach to identify the probable drivers (Supplementary Methods 8.1). This approach works by ranking the observed mutations in a given genomic element based on recurrence, estimated functional consequence and expected pattern of drivers in that element. We then estimate the excess burden of somatic mutations in that genomic element above that expected for the background mutation rate, and cut the ranked mutations at this level. Mutations in each element with the highest driver ranking were then assigned as probable drivers; those below the threshold will probably have arisen through chance and were assigned as probable passengers. Improvements to features that are used to rank the mutations and the methods used to measure them will contribute to further development of the rank-and-cut approach.
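The rank-and-cut idea can be illustrated with a short Python sketch: rank the mutations observed in one genomic element, estimate how many exceed the background expectation, and label that many top-ranked mutations as probable drivers. The single composite `score`, the use of a floor when converting the excess to a count, and all names and numbers are hypothetical simplifications; the features and estimation actually used are those described in Supplementary Methods 8.1.

```python
import math


def rank_and_cut(mutations, expected_background):
    """Split mutations in one genomic element into probable drivers and passengers.

    mutations: list of (mutation_id, score) pairs; a higher score means more
        driver-like (a hypothetical stand-in for recurrence, functional
        consequence and the expected pattern of drivers in the element).
    expected_background: number of mutations expected in the element from the
        background mutation rate alone.
    """
    observed = len(mutations)
    # Excess burden above background, rounded down to a whole number of mutations.
    excess = max(0, math.floor(observed - expected_background))
    ranked = sorted(mutations, key=lambda item: item[1], reverse=True)
    drivers = [mutation_id for mutation_id, _ in ranked[:excess]]
    passengers = [mutation_id for mutation_id, _ in ranked[excess:]]
    return drivers, passengers


# Hypothetical element with five observed mutations and 2.4 expected by chance.
observed_mutations = [("m1", 0.9), ("m2", 0.1), ("m3", 0.75), ("m4", 0.3), ("m5", 0.6)]
drivers, passengers = rank_and_cut(observed_mutations, expected_background=2.4)
print("probable drivers:", drivers)
print("probable passengers:", passengers)
```

Here the excess is two mutations, so the two highest-ranked mutations are labelled probable drivers and the remaining three probable passengers.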

We also needed to account for the fact that some bona fide cancer genomic elements were not rediscovered in PCAWG data because of low statistical power. We therefore added previously known cancer-associated genes to the discovery set, creating a ‘compendium of mutational driver elements’ (Supplementary Methods 8.2). Then, using stringent rules to nominate driver point mutations that affect these genomic elements on the basis of prior knowledge³³, we separated probable driver from passenger point mutations. To cover all classes of variant, we also created a compendium of known driver SVs, using analogous rules to identify which somatic CNAs and SVs are most likely to act as drivers in each tumour. For probable pathogenic germline variants, we identified all truncating germline point mutations and SVs that affect high-penetrance germline cancer-associated genes.

This analysis defined a set of mutations that we could confidently assert, based on current knowledge, drove tumorigenesis in the more than 2,500 tumours of PCAWG. We found that 91% of tumours had at least one identified driver mutation, with an average of 4.6 drivers per tumour identified, showing extensive variation across cancer types (Fig. 2a). For coding point mutations, the average was 2.6 drivers per tumour, similar to numbers estimated in known cancer-associated genes in tumours in the TCGA using analogous approaches³².

Fig. 2 — a, Top, putative driver mutations in PCAWG, represented as a circos plot. Each sector represents a tumour in the cohort. From the periphery to the centre of the plot the concentric rings represent: (1) the total number of driver alterations; (2) the presence of whole-genome (WG) duplication; (3) the tumour type; (4) the number of driver CNAs; (5) the number of driver genomic rearrangements; (6) driver coding point mutations; (7) driver non-coding point mutations; and (8) pathogenic germline variants. Bottom, snapshots of the panorama of driver mutations. The horizontal bar plot (left) represents the proportion of patients with different types of drivers. The dot plot (right) represents the mean number of each type of driver mutation across tumours with at least one event (the square dot) and the standard deviation (grey whiskers), based on n = 2,583 patients. b, Genomic elements targeted by different types of mutations in the cohort altered in more than 65 tumours. Both germline and somatic variants are included. Left, the heat map shows the recurrence of alterations across cancer types. The colour indicates the proportion of mutated tumours and the number indicates the absolute count of mutated tumours. Right, the proportion of each type of alteration that affects each genomic element. c, Tumour-suppressor genes with biallelic inactivation in 10 or more patients. The values included under the gene labels represent the proportions of patients who have biallelic mutations in the gene out of all patients with a somatic mutation in that gene. GR, genomic rearrangement; SCNA, somatic copy-number alteration; SGR, somatic genome rearrangement; TSG, tumour suppressor gene; UTR, untranslated region.

To address the frequency of non-coding driver point mutations, we combined promoters and enhancers that are known targets of non-coding drivers^34–37 with those newly discovered in PCAWG data; this is reported in a companion paper⁴. Using this approach, only 13% (785 out of 5,913) of driver point mutations were non-coding in PCAWG. Nonetheless, 25% of PCAWG tumours bear at least one putative non-coding driver point mutation, and one third (237 out of 785) affected the TERT promoter (9% of PCAWG tumours). Overall, non-coding driver point mutations are less frequent than coding driver mutations. With the exception of the TERT promoter, individual enhancers and promoters are only infrequent targets of driver mutations⁴.

Across tumour types, SVs and point mutations have different relative contributions to tumorigenesis. Driver SVs are more prevalent in breast adenocarcinomas (6.4 ± 3.7 SVs (mean ± s.d.) compared with 2.2 ± 1.3 point mutations; P < 1 × 10⁻¹⁶, Mann–Whitney U-test) and ovary adenocarcinomas (5.8 ± 2.6 SVs compared with 1.9 ± 1.0 point mutations; P < 1 × 10⁻¹⁶), whereas driver point mutations have a larger contribution in colorectal adenocarcinomas (2.4 ± 1.4 SVs compared with 7.4 ± 7.0 point mutations; P = 4 × 10⁻¹⁰) and mature B cell lymphomas (2.2 ± 1.3 SVs compared with 6 ± 3.8 point mutations; P < 1 × 10⁻¹⁶), as previously shown³⁸. Across tumour types, there are differences in which classes of mutation affect a given genomic element (Fig. 2b).

We confirmed that many driver mutations that affect tumour-suppressor genes are two-hit inactivation events (Fig. 2c). For example, of the 954 tumours in the cohort with driver mutations in TP53, 736 (77%) had both alleles mutated, 96% of which (707 out of 736) combined a somatic point mutation that affected one allele with somatic deletion of the other allele. Overall, 17% of patients had rare germline protein-truncating variants (PTVs) in cancer-predisposition genes³⁹, DNA-damage response genes⁴⁰ and somatic driver genes. Biallelic inactivation due to somatic alteration on top of a germline PTV was observed in 4.5% of patients overall, with 81% of these affecting known cancer-predisposition genes (such as BRCA1, BRCA2 and ATM).

PCAWG tumours with no apparent drivers

Although more than 90% of PCAWG cases had identified drivers, we found none in 181 tumours (Extended Data Fig. 4a). Reasons for missing drivers have not yet been systematically evaluated in a pan-cancer cohort, and could arise from either technical or biological causes.

Technical explanations could include poor-quality samples, inadequate sequencing or failures in the bioinformatic algorithms used. We assessed the quality of the samples and found that 4 of the 181 cases with no known drivers had more than 5% tumour DNA contamination in their matched normal sample (Fig. 3a). Using an algorithm designed to correct for this contamination⁴¹, we identified previously missed mutations in genes relevant to the respective cancer types. Similarly, if the fraction of tumour cells in the cancer sample is low through stromal contamination, the detection of driver mutations can be impaired. Most tumours with no known drivers had an average power to detect mutations close to 100%; however, a few had power in the 70–90% range (Fig. 3b and Extended Data Fig. 4b). Even in adequately sequenced genomes, lack of read depth at specific driver loci can impair mutation detection. For example, only around 50% of PCAWG tumours had sufficient coverage to call a mutation (≥90% power) at the two TERT promoter hotspots, probably because the high GC content of this region causes biased coverage (Fig. 3c). In fact, 6 hepatocellular carcinomas and 2 biliary cholangiocarcinomas among the 181 cases with no known drivers actually did contain TERT mutations, which were discovered after deep targeted sequencing⁴².
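To make the dependence of detection power on purity and read depth concrete, the following Python sketch computes the probability of observing enough variant-supporting reads under a simple binomial model. The model, the threshold of three supporting reads and the function names are assumptions made for illustration; they are not the power calculation used in the PCAWG analyses.

```python
from math import comb


def expected_vaf(purity, tumour_copy_number=2, multiplicity=1, normal_copy_number=2):
    """Expected variant allele fraction of a clonal mutation.

    purity: fraction of cells in the sample that are tumour cells.
    multiplicity: number of tumour chromosome copies carrying the mutation.
    """
    mutant = purity * multiplicity
    total = purity * tumour_copy_number + (1 - purity) * normal_copy_number
    return mutant / total


def detection_power(depth, vaf, min_alt_reads=3):
    """P(at least `min_alt_reads` variant reads) with reads ~ Binomial(depth, vaf)."""
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# A heterozygous clonal mutation in a diploid region of a 50%-pure tumour.
vaf = expected_vaf(purity=0.5)
for depth in (10, 30, 60):
    print(f"depth {depth}x: power = {detection_power(depth, vaf):.3f}")
```

Under this model, power falls quickly when local depth drops, which is consistent with the reduced sensitivity described for poorly covered loci such as the TERT promoter hotspots.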

Fig. 3 — a, Individual estimates of the percentage of tumour-in-normal contamination across patients with no driver mutations in PCAWG (n = 181). No data were available for myelodysplastic syndromes and acute myeloid leukaemia. Points represent estimates for individual patients, and the coloured areas are estimated density distributions (violin plots). Abbreviations of the tumour types are defined in Extended Data Table 1. b, Average detection sensitivity by tumour type for tumours without known drivers (n = 181). Each dot represents a given sample and is the average sensitivity of detecting clonal substitutions across the genome, taking into account purity and ploidy. Coloured areas are estimated density distributions, shown for cohorts with at least five cases. c, Detection sensitivity for TERT promoter hotspots in tumour types in which TERT is frequently mutated. Coloured areas are estimated density distributions. d, Significant copy-number losses identified by two-sided hypothesis testing using GISTIC2.0, corrected for multiple-hypothesis testing. Numbers in parentheses indicate the number of genes in significant regions when analysing medulloblastomas without known drivers (n = 42). Significant regions with known cancer-associated genes are labelled with the representative cancer-associated gene. e, Aneuploidy in chromophobe renal cell carcinomas and pancreatic neuroendocrine tumours without known drivers. Patients are ordered on the y axis by tumour type and then by presence of whole-genome duplication (bottom) or not (top).

Finally, technical reasons for missing driver mutations include failures in the bioinformatic algorithms. This affected 35 myeloproliferative neoplasms in PCAWG, in which the JAK2^V617F driver mutation should have been called. Our somatic variant-calling algorithms rely on ‘panels of normals’, typically from blood samples, to remove recurrent sequencing artefacts. As 2–5% of healthy individuals carry occult haematopoietic clones⁴³, recurrent driver mutations in these clones can enter panels of normals.

With regard to biological causes, tumours may be driven by mutations in cancer-associated genes that are not yet described for that tumour type. Using driver discovery algorithms on tumours with no known drivers, no individual genes reached significance for point mutations. However, we identified a recurrent CNA that spanned SETD2 in medulloblastomas that lacked known drivers (Fig. 3d), indicating that restricting hypothesis testing to missing-driver cases can improve power if undiscovered genes are enriched in such tumours. Inactivation of SETD2 in medulloblastoma significantly decreased gene expression (P = 0.002) (Extended Data Fig. 4c). Notably, SETD2 mutations occurred exclusively in medulloblastoma group-4 tumours (P < 1 × 10⁻⁴). Group-4 medulloblastomas are known for frequent mutations in other chromatin-modifying genes⁴⁴, and our results suggest that SETD2 loss of function is an additional driver that affects chromatin regulators in this subgroup.

Two tumour types had a surprisingly high fraction of patients without identified driver mutations: chromophobe renal cell carcinoma (44%; 19 out of 43) and pancreatic neuroendocrine cancers (22%; 18 out of 81) (Extended Data Fig. 4a). A notable feature of the missing-driver cases in both tumour types was a remarkably consistent profile of chromosomal aneuploidy, in patterns that have previously been reported^45,46 (Fig. 3e). The absence of other identified driver mutations in these patients raises the possibility that certain combinations of whole-chromosome gains and losses may be sufficient to initiate a cancer in the absence of more-targeted driver events such as point mutations, fusion genes or focal CNAs.

Even after accounting for technical issues and novel drivers, 5.3% of PCAWG tumours still had no identifiable driver events. In a research setting, in which we are interested in drawing conclusions about populations of patients, the consequences of technical issues that affect occasional samples will be mitigated by sample size. In a clinical setting, in which we are interested in the driver mutations in a specific patient, these issues become substantially more important. Careful and critical appraisal of the whole pipeline—including sample acquisition, genome sequencing, mapping, variant calling and driver annotation, as done here—should be required for laboratories that offer clinical sequencing of cancer genomes.

Patterns of clustered mutations and SVs

Some somatic mutational processes generate multiple mutations in a single catastrophic event, typically clustered in genomic space, leading to substantial reconfiguration of the genome. Three such processes have previously been described: (1) chromoplexy, in which repair of co-occurring double-stranded DNA breaks—typically on different chromosomes—results in shuffled chains of rearrangements^47,48 (Extended Data Fig. 5a); (2) kataegis, a focal hypermutation process that leads to locally clustered nucleotide substitutions, biased towards a single DNA strand^49–51 (Extended Data Fig. 5b); and (3) chromothripsis, in which tens to hundreds of DNA breaks occur simultaneously, clustered on one or a few chromosomes, with near-random stitching together of the resulting fragments^52–55 (Extended Data Fig. 5c). We characterized the PCAWG genomes for these three processes (Fig. 4).

Extended Data Fig. 5 — a, Chromoplexy example in a thyroid adenocarcinoma. Genes at the breakpoints are schematically depicted in their normal genomic context and again in the reconstructed derivative chromosomes below. b, Distinct kataegis signatures in the genome of a pancreatic adenocarcinoma sample. SVs and their classification are shown above the main rainfall plot, as well as the total and minor allele copy number. Tra, translocation; del, deletion; dup, duplication; t2tInv, tail-to-tail inversion; h2hInv, head-to-head inversion. Magnifications of the three foci on chromosomes 1, 8 and 12, respectively, highlight distinct manifestations of kataegis. Left, a novel process similar to signature 17 with T > N mutations at CT or TT dinucleotides. Middle, the prototypical APOBEC3A/B type with C > T (signature 2) and/or C>G/A (signature 13) substitutions at TpC. Right, an alternative cytidine deaminase(s) with a preference for substitutions at C/GpC. Most of the SNVs in each of these foci can be phased to the same allele and no evidence of anti-phasing is observed. c, Example of a chromothripsis event in a melanoma. The black points (top) represent copy-number estimates from individual genomic bins, with SVs shown as coloured arcs (translocation in black, deletion in purple, duplication in brown, tail-to-tail inversion in cyan, head-to-head inversion in green) that mostly demarcate copy-number changes. The mate chromosomes are displayed above translocations. Bottom, the variant allele fractions of somatic mutations distributed along the relevant chromosomal region.

Chromoplexy events and reciprocal translocations were identified in 467 (17.8%) samples (Fig. 4a, c). Chromoplexy was prominent in prostate adenocarcinoma and lymphoid malignancies, as previously described^47,48, and—unexpectedly—thyroid adenocarcinoma. Different genomic loci were recurrently rearranged by chromoplexy across the three tumour types, mediated by positive selection for particular fusion genes or enhancer-hijacking events. Of 13 fusion genes or enhancer hijacking events in 48 thyroid adenocarcinomas, at least 4 (31%) were caused by chromoplexy, with a further 4 (31%) part of complexes that contained chromoplexy footprints (Extended Data Fig. 5a). These events generated fusion genes that involved RET (two cases) and NTRK3 (one case)⁵⁶, and the juxtaposition of the oncogene IGF2BP3 with regulatory elements from highly expressed genes (five cases).

Kataegis events were found in 60.5% of all cancers, with particularly high abundance in lung squamous cell carcinoma, bladder cancer, acral melanoma and sarcomas (Fig. 4a, b). Typically, kataegis comprises C > N mutations in a TpC context, which are probably caused by APOBEC activity^49–51, although a process of T > N conversion in a TpT or CpT context, attributed to error-prone polymerases, has recently been described⁵⁷. The APOBEC signature accounted for 81.7% of kataegis events and correlated positively with APOBEC3B expression levels, somatic SV burden and age at diagnosis (Supplementary Fig. 5). Furthermore, 5.7% of kataegis events involved the T > N error-prone polymerase signature and 2.3% of events, most notably in sarcomas, showed cytidine deamination in an alternative GpC or CpC context.
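The three sequence contexts named above can be told apart with a simple lookup on the mutated base and its 5′ neighbour. The sketch below is illustrative only and is not the consortium's classification pipeline. It assumes that each substitution is already reported on the pyrimidine strand, and that the mutated T in the polymerase-type process is the 3′ base of the CpT or TpT dinucleotide, as the CpTpT context cited for these events suggests. The function name and the returned labels are hypothetical.

```python
def classify_kataegis_context(five_prime: str, ref: str, alt: str) -> str:
    """Assign one substitution to a sequence context named in the text.

    Assumes the call is already oriented to the pyrimidine strand
    (ref is C or T); `five_prime` is the base immediately 5' of ref.
    """
    five_prime, ref, alt = five_prime.upper(), ref.upper(), alt.upper()
    if ref not in {"C", "T"} or alt not in {"A", "C", "G", "T"} or ref == alt:
        raise ValueError("expected a pyrimidine-strand substitution")
    if ref == "C":
        if five_prime == "T":
            return "APOBEC-like (C>N at TpC)"
        if five_prime in {"G", "C"}:
            return "alternative deaminase (C>N at GpC/CpC)"
        return "other"
    # ref == "T": error-prone polymerase type, T>N preceded by C or T
    if five_prime in {"C", "T"}:
        return "polymerase-like (T>N at CpT/TpT)"
    return "other"
```

For instance, `classify_kataegis_context("T", "C", "G")` falls in the APOBEC-like class, whereas the same C > G change preceded by a G falls in the alternative deaminase class.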

Kataegis events were frequently associated with somatic SV breakpoints (Fig. 4a and Supplementary Fig. 6a), as previously described^50,51. Deletions and complex rearrangements were most-strongly associated with kataegis, whereas tandem duplications and other simple SV classes were only infrequently associated (Supplementary Fig. 6b). Kataegis inducing predominantly T > N mutations in CpTpT context was enriched near deletions, specifically those in the 10–25-kilobase (kb) range (Supplementary Fig. 6c).

Samples with extreme kataegis burden (more than 30 foci) comprise four types of focal hypermutation (Extended Data Fig. 6): (1) off-target somatic hypermutation and foci of T > N at CpTpT, found in B cell non-Hodgkin lymphoma and oesophageal adenocarcinomas, respectively; (2) APOBEC kataegis associated with complex rearrangements, notably found in sarcoma and melanoma; (3) rearrangement-independent APOBEC kataegis on the lagging strand and in early-replicating regions, mainly found in bladder and head and neck cancer; and (4) a mix of the last two types. Kataegis only occasionally led to driver mutations (Supplementary Table 5).

We identified chromothripsis in 587 samples (22.3%), most frequently among sarcoma, glioblastoma, lung squamous cell carcinoma, melanoma and breast adenocarcinoma¹⁸. Chromothripsis increased with whole-genome duplications in most cancer types (Extended Data Fig. 7a), as previously shown in medulloblastoma⁵⁸. The most recurrently associated driver was TP53⁵² (pan-cancer odds ratio = 3.22; pan-cancer P = 8.3 × 10⁻³⁵; q < 0.05 in breast lobular (odds ratio = 13), colorectal (odds ratio = 25), prostate (odds ratio = 2.6) and hepatocellular (odds ratio = 3.9) cancers; Fisher–Boschloo tests). In two cancer types (osteosarcoma and B cell lymphoma), women had a higher incidence of chromothripsis than men (Extended Data Fig. 7b). In prostate cancer, we observed a higher incidence of chromothripsis in patients with late-onset than early-onset disease⁵⁹ (Extended Data Fig. 7c).
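To make the odds ratios quoted above concrete, the following sketch computes an odds ratio and an exact P value from a 2 × 2 table of driver status against chromothripsis status. It is a simplified stand-in: it uses the ordinary two-sided Fisher exact test, not the Fisher–Boschloo test reported here, and the counts in the usage note are invented for illustration and are not PCAWG data.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the table [[a, b], [c, d]].

    a: driver mutated, chromothripsis present
    b: driver mutated, chromothripsis absent
    c: driver wild type, chromothripsis present
    d: driver wild type, chromothripsis absent
    """
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value (plain Fisher, not Boschloo)."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are no more probable than the observed one.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

With hypothetical counts of 30, 10, 20 and 40, `odds_ratio(30, 10, 20, 40)` gives 6.0, meaning that the odds of chromothripsis are sixfold higher in the mutated group of that invented table.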

Extended Data Fig. 7 — a, Odds ratios per cancer type of containing chromothripsis in whole-genome duplicated versus diploid samples (n = 2,583 patients). ***q < 0.001; **q < 0.01; *q < 0.05. Two-sided hypothesis testing was performed using Fisher–Boschloo tests, corrected for multiple-hypothesis testing. b, Same as a for female versus male. c, Proportion of mutations explained by single-base substitution signature 1 and age at diagnosis in prostate cancer samples (n = 210 patients) with or without chromothripsis (q < 0.05). The early-onset prostate cancer project drives the signal and was sequenced at lower depth. For the box-and-whisker plots, the box denotes the interquartile range, with the median marked as a horizontal line. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Two-sided hypothesis testing was performed using Mann–Whitney U-tests. d, Counts of co-occurrence of chromothripsis with amplification (blue) and homozygous deletions (red) in driver regions: observed (thick line) versus randomized (shaded area and thin line). The cumulative number of drivers that were hit is plotted as a function of the number of times those drivers were hit. e, For each sample in which chromothripsis coincided with a driver event in those genes, we show the fold change in gene expression compared to the median expression of the gene in non-chromothripsis samples of the same cancer type, coloured by cancer type and shaped by the type of driver event. We show with added transparency the fold changes calculated the same way for samples with driver mutations hitting the same driver genes, but that had no evidence of chromothripsis. Analysis is based on n = 1,222 patients with RNA-sequencing data. f, Enrichment of co-occurrence of chromothripsis with driver events. The x axis shows the association of chromothripsis with a driver in a given cancer type compared with its rate of association with that driver in all other cancer types. 
The y axis shows the association of chromothripsis with a driver in a given cancer type compared with its rate of association with all other drivers in that type. Exact binomial tests are used and P values are corrected for multiple testing according to the Benjamini–Hochberg method.

Chromothripsis regions coincided with 3.6% of all identified drivers in PCAWG and around 7% of copy-number drivers (Fig. 4d). These proportions are considerably enriched compared to expectation if selection were not acting on these events (Extended Data Fig. 7d). The majority of coinciding driver events were amplifications (58%), followed by homozygous deletions (34%) and SVs within genes or promoter regions (8%). We frequently observed a ≥2-fold increase or decrease in expression of amplified or deleted drivers, respectively, when these loci were part of a chromothripsis event, compared with samples without chromothripsis (Extended Data Fig. 7e).

Chromothripsis manifested in diverse patterns and frequencies across tumour types, which we categorized on the basis of five characteristics (Fig. 4a). In liposarcoma, for example, chromothripsis events often involved multiple chromosomes, with universal MDM2 amplification⁶⁰ and co-amplification of TERT in 4 of 19 cases (Fig. 4d). By contrast, in glioblastoma the events tended to affect a smaller region on a single chromosome that was distant from the telomere, resulting in focal amplification of EGFR and MDM2 and loss of CDKN2A. Acral melanomas frequently exhibited CCND1 amplification, and lung squamous cell carcinomas SOX2 amplifications. In both cases, these drivers were altered by chromothripsis more frequently than other drivers in the same cancer type, and more frequently than the same driver in other cancer types (Fig. 4d and Extended Data Fig. 7f). Finally, in chromophobe renal cell carcinoma, chromothripsis nearly always affected chromosome 5 (Supplementary Fig. 7): these samples had breakpoints immediately adjacent to TERT, increasing TERT expression by 80-fold on average compared with samples without rearrangements (P = 0.0004; Mann–Whitney U-test).

Timing clustered mutations in evolution

An unanswered question for clustered mutational processes is whether they occur early or late in cancer evolution. To address this, we used molecular clocks to define broad epochs in the life history of each tumour^49,61. One transition point is between clonal and subclonal mutations: clonal mutations occurred before, and subclonal mutations after, the emergence of the most-recent common ancestor. In regions with copy-number gains, molecular time can be further divided according to whether mutations preceded the copy-number gain (and were themselves duplicated) or occurred after the gain (and are therefore present on only one chromosomal copy)⁷.
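The epochs described above can be expressed as a small decision rule. The sketch below is a deliberately simplified illustration and not the timing method used in the analysis: it assumes that each mutation already has an estimated multiplicity (number of chromosome copies carrying it) and a cancer cell fraction, and the cutoffs `clonal_ccf_cutoff` and `early_multiplicity_cutoff` are hypothetical. It also ignores the fact that a single-copy mutation in a gained region may have arisen before the gain on the allele that was not duplicated.

```python
def timing_category(
    multiplicity: float,
    cancer_cell_fraction: float,
    region_gained: bool,
    clonal_ccf_cutoff: float = 0.9,          # hypothetical cutoff
    early_multiplicity_cutoff: float = 1.5,  # hypothetical cutoff
) -> str:
    """Place one mutation into a broad molecular-time epoch."""
    # Mutations absent from a sizeable share of tumour cells arose after
    # the most-recent common ancestor.
    if cancer_cell_fraction < clonal_ccf_cutoff:
        return "subclonal"
    # Without a copy-number gain there is no further clock to read.
    if not region_gained:
        return "clonal [NA]"
    # Mutations carried on several copies were duplicated with the gain,
    # so they preceded it; single-copy mutations are treated as later.
    if multiplicity >= early_multiplicity_cutoff:
        return "clonal [early]"
    return "clonal [late]"
```

Under these assumptions, a fully clonal mutation carried on two copies of a gained segment is labelled `clonal [early]`, and the same mutation carried on one copy is labelled `clonal [late]`.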

Chromothripsis tended to have greater relative odds of being clonal than subclonal, suggesting that it occurs early in cancer evolution, especially in liposarcomas, prostate adenocarcinoma and squamous cell lung cancer (Fig. 5a). As previously reported, chromothripsis was especially common in melanomas⁶². We identified 89 separate chromothripsis events that affected 66 melanomas (61%); 47 out of 89 events affected genes known to be recurrently altered in melanoma⁶³ (Supplementary Table 6). Involvement of a region on chromosome 11 that includes the cell-cycle regulator CCND1 occurred in 21 cases (10 out of 86 cutaneous, and 11 out of 21 acral or mucosal melanomas), typically combining chromothripsis with amplification (19 out of 21 cases) (Extended Data Fig. 8). Co-involvement of other cancer-associated genes in the same chromothripsis event was also frequent, including TERT (five cases), CDKN2A (three cases), TP53 (two cases) and MYC (two cases) (Fig. 5b). In these co-amplifications, a chromothripsis event involving multiple chromosomes initiated the process, creating a derivative chromosome in which hundreds of fragments were stitched together in a near-random order (Fig. 5b). This derivative then rearranged further, leading to massive co-amplification of the multiple target oncogenes together with regions located nearby on the derivative chromosome.

Fig. 5 — a, Extent and timing of chromothripsis, kataegis and chromoplexy across PCAWG. Top, stacked bar charts illustrate co-occurrence of chromothripsis, kataegis and chromoplexy in the samples. Middle, relative odds of clustered events being clonal or subclonal are shown with bootstrapped 95% confidence intervals. Point estimates are highlighted when they do not overlap odds of 1:1. Bottom, relative odds of the events being early or late clonal are shown as above. Sample sizes (number of patients) are shown across the top. b, Three representative patients with acral melanoma and chromothripsis-induced amplification that simultaneously affects *TERT* and *CCND1*. The black points (top) represent sequence coverage from individual genomic bins, with SVs shown as coloured arcs (translocation in black, deletion in purple, duplication in brown, tail-to-tail inversion in cyan and head-to-head inversion in green). Bottom, the variant allele fractions of somatic point mutations.

Extended Data Fig. 8 — a, Examples of amplifications that occurred early in the development of melanoma. The black points (top) represent copy-number estimates from individual genomic bins, with SVs shown as coloured arcs (translocation in black, deletion in purple, duplication in brown, tail-to-tail inversion in cyan and head-to-head inversion in green) that mostly demarcate copy-number changes. Bottom, the variant allele fractions of SNVs distributed along the relevant chromosomal region. The paucity of somatic mutations at high variant allele fractions in the most-heavily amplified regions indicates that these amplifications began very early in tumour evolution, before the lineage had had opportunity to acquire many SNVs. b, Example of an amplification that occurred late in melanoma development. The large numbers of somatic mutations at high variant allele fractions in the most-heavily amplified regions indicate that these amplifications began late in tumour evolution, after the lineage had already acquired many SNVs.

In these cases of amplified chromothripsis, we can use the inferred number of copies bearing each SNV to time the amplification process. SNVs present on the chromosome before amplification will themselves be amplified and are therefore reported in a high fraction of sequence reads (Fig. 5b and Extended Data Fig. 8). By contrast, late SNVs that occur after the amplification has concluded will be present on only one chromosome copy out of many, and thus have a low variant allele fraction. Regions of CCND1 amplification had few—sometimes zero—mutations at high variant allele fraction in acral melanomas, in contrast to later CCND1 amplifications in cutaneous melanomas, in which hundreds to thousands of mutations typically predated amplification (Fig. 5b and Extended Data Fig. 9a, b). Thus, both chromothripsis and the subsequent amplification generally occurred very early during the evolution of acral melanoma. By comparison, in lung squamous cell carcinomas, similar patterns of chromothripsis followed by SOX2 amplification are characterized by many amplified SNVs, suggesting a later event in the evolution of these cancers (Extended Data Fig. 9c).
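The reasoning above rests on converting a variant allele fraction into the number of chromosome copies that carry the SNV. A minimal sketch of that conversion follows, using the standard relation between variant allele fraction, tumour purity and local copy number; the function names, the assumption of a diploid normal contaminant and the `min_copies` cutoff are illustrative choices and not values taken from the analysis.

```python
def snv_multiplicity(
    vaf: float, purity: float, tumour_cn: float, normal_cn: float = 2.0
) -> float:
    """Estimated number of chromosome copies per tumour cell carrying an SNV.

    vaf: variant allele fraction of the SNV
    purity: fraction of tumour cells in the sample, in (0, 1]
    tumour_cn: total copy number of the locus in tumour cells
    normal_cn: copy number in contaminating normal cells (assumed diploid)
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Average number of copies of the locus per cell in the mixed sample.
    mean_copies = purity * tumour_cn + (1.0 - purity) * normal_cn
    return vaf * mean_copies / purity


def fraction_amplified_snvs(multiplicities, min_copies: float = 2.0) -> float:
    """Share of SNVs carried on at least `min_copies` copies (hypothetical cutoff)."""
    values = list(multiplicities)
    if not values:
        return 0.0
    return sum(m >= min_copies for m in values) / len(values)
```

A region in which almost no SNVs reach a high multiplicity is consistent with an amplification that began before many mutations had accumulated, whereas a high value of `fraction_amplified_snvs` points to a later amplification.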

Extended Data Fig. 9 — a, Copy-number plot of chromothriptic regions categorized as ‘liposarc-like’ in five acral melanomas with *CCND1* amplification. Segments indicate the copy number of the major allele. Points represent SNV multiplicities, that is, the estimated number of copies carrying each SNV, coloured by base change and shaped by strand. Small vertical arrows link SNVs to their corresponding copy-number segment. Kataegis foci are shown within black boxes and show typical strand specificities (all triangles or all circles), similar multiplicities and base changes of signatures 2 and 13 (red and black, respectively). A coloured bar (top right) represents the molecular timing of the amplification (red bar; high is early, low is late) and is coloured by the fraction of total SNVs assigned to the following timing categories: clonal [early], clonal mutations that occurred before duplications involving the relevant chromosome (including whole-genome duplications); clonal [late], clonal mutations that occurred after such duplications; and clonal [NA], mutations that occurred when no duplication was observed. b, Same as a in two cutaneous melanomas, one shows early amplification, the other late amplification. c, Same as a, b, for three lung squamous cell carcinomas and late amplification of *SOX2*.

Notably, in cancer types in which the mutational load was sufficiently high, we could detect a larger-than-expected number of SNVs on an intermediate number of DNA copies, suggesting that they appeared during the amplification process (Supplementary Fig. 8).

Germline effects on somatic mutations

We integrated the set of 88 million germline genetic variant calls with somatic mutations in PCAWG, to study germline determinants of somatic mutation rates and patterns. First, we performed a genome-wide association study of somatic mutational processes with common germline variants (minor allele frequency (MAF) > 5%) in individuals with inferred European ancestry. An independent genome-wide association study was performed in East Asian individuals from Asian cancer genome projects. We focused on two prevalent endogenous mutational processes: spontaneous deamination of 5-methylcytosine at CpG dinucleotides⁵ (signature 1) and activity of the APOBEC3 family of cytidine deaminases⁶⁴ (signatures 2 and 13). No locus reached genome-wide significance (P < 5 × 10⁻⁸) for signature 1 (Extended Data Fig. 10a, b). However, a locus at 22q13.1 predicted an APOBEC3B-like mutagenesis at the pan-cancer level⁶⁵ (Fig. 6a). The strongest signal at 22q13.1 was driven by rs12628403, and the minor (non-reference) allele was protective against APOBEC3B-like mutagenesis (β = −0.43, P = 5.6 × 10⁻⁹, MAF = 8.2%, n = 1,201 donors) (Extended Data Fig. 10c). This variant tags a common, approximately 30-kb germline SV that deletes the APOBEC3B coding sequence and fuses the APOBEC3B 3′ untranslated region with the coding sequence of APOBEC3A. The deletion is known to increase breast cancer risk and APOBEC mutagenesis in breast cancer genomes^66,67. Here, we found that rs12628403 reduces APOBEC3B-like mutagenesis specifically in cancer types with low levels of APOBEC mutagenesis (β_low = −0.50, P_low = 1 × 10⁻⁸; β_high = 0.17, P_high = 0.2), and increases APOBEC3A-like mutagenesis in cancer types with high levels of APOBEC mutagenesis (β_high = 0.44, P_high = 8 × 10⁻⁴; β_low = −0.21, P_low = 0.02). Moreover, we identified a second, novel locus at 22q13.1 that was associated with APOBEC3B-like mutagenesis across cancer types (rs2142833, β = 0.23, P = 1.3 × 10⁻⁸). 
We independently validated the association between both loci and APOBEC3B-like mutagenesis using East Asian individuals from Asian cancer genome projects (β_rs12628403 = 0.57, P_rs12628403 = 4.2 × 10⁻¹²; β_rs2142833 = 0.58, P_rs2142833 = 8 × 10⁻¹⁵) (Extended Data Fig. 10d). Notably, in a conditional analysis that accounted for rs12628403, we found that rs2142833 and rs12628403 are inherited independently in Europeans (r²<0.1), and rs2142833 remained significantly associated with APOBEC3B-like mutagenesis in Europeans (β_EUR = 0.17, P_EUR = 3 × 10⁻⁵) and East Asians (β_ASN = 0.25, P_ASN = 2 × 10⁻³) (Extended Data Fig. 10e, f). Analysis of donor-matched expression data further suggests that rs2142833 is a cis-expression quantitative trait locus (eQTL) for APOBEC3B at the pan-cancer level (β = 0.19, P = 2 × 10⁻⁶) (Extended Data Fig. 10g, h), consistent with cis-eQTL studies in normal cells^68,69.
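The independence of the two variants is summarized above by r² < 0.1. When haplotype phase is not available, r² is commonly approximated as the squared Pearson correlation between allele dosages (0, 1 or 2 copies of the minor allele per donor). The sketch below shows that approximation; it is not necessarily the calculation performed for this analysis, and the function name is hypothetical.

```python
from math import fsum


def genotype_r2(dosages_a, dosages_b) -> float:
    """Squared Pearson correlation between two variants' allele dosages."""
    if len(dosages_a) != len(dosages_b) or len(dosages_a) < 2:
        raise ValueError("need two equal-length vectors with at least 2 donors")
    n = len(dosages_a)
    mean_a = fsum(dosages_a) / n
    mean_b = fsum(dosages_b) / n
    cov = fsum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = fsum((a - mean_a) ** 2 for a in dosages_a)
    var_b = fsum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic variant")
    return cov * cov / (var_a * var_b)
```

Values near 0 indicate that the variants are inherited largely independently, which is what allows the conditional analysis to separate their effects.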

Extended Data Fig. 10 — Genome-wide association of somatic CpG mutagenesis in individuals of European ancestry (n = 1,201 patients) based on mutational signature analysis (a) and NpCpG motif analysis (b). Two-sided hypothesis testing was performed using PLINK v.1.9. To mitigate multiple-hypothesis testing, the significance threshold was set to genome-wide significance (P < 5 × 10⁻⁸). c, d, Locuszoom plot for somatic APOBEC3B-like mutagenesis association results, linkage disequilibrium and recombination rates around the genome-wide significant 22q13.1 locus in individuals with European (c) and East Asian (d) ancestry (n = 1,201 and 318 patients, respectively). Locuszoom plot for somatic APOBEC3B-like mutagenesis association results around the 22q13.1 locus in individuals with European (e) and East Asian (f) ancestry after conditioning on rs12628403. g, h, Association between rs2142833 and expression of *APOBEC3* genes in PCAWG tumour samples (adjusted for sex, age at diagnosis, histology and population structure in linear-regression models with two-sided hypothesis testing not corrected for multiple tests). For the box-and-whisker plot, the box denotes the interquartile range, with the median marked as a horizontal line. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Outliers are shown as points.

Fig. 6 — a, Association between common (MAF > 5%) germline variants and somatic APOBEC3B-like mutagenesis in individuals of European ancestry (n = 1,201). Two-sided hypothesis testing was performed with PLINK v.1.9. To mitigate multiple-hypothesis testing, the significance threshold was set to genome-wide significance (P < 5 × 10⁻⁸). b, Templated insertion SVs in a *BRCA1*-associated prostate cancer. Left, chromosome bands (1); SVs ≤ 10 megabases (Mb) (2); 1-kb read depth corrected to copy number 0–6 (3); inter- and intrachromosomal SVs > 10 Mb (4). Right, a complex somatic SV composed of a 2.2-kb tandem duplication on chromosome 2 together with a 232-base-pair (bp) inverted templated insertion SV that is derived from chromosome 5 and inserted in between the tandem duplication (bottom). Consensus sequence alignment of locally assembled Oxford Nanopore Technologies long sequencing reads to chromosomes 2 and 5 of the human reference genome (top). Breakpoints are circled and marked as 1 (beginning of tandem duplication), 2 (end of tandem duplication) or 3 (inverted templated insertion). For each breakpoint, the middle panel shows Illumina short reads at SV breakpoints. c, Association between rare germline PTVs (MAF < 0.5%) and somatic CpG mutagenesis (approximately corresponding to signature 1) in individuals of European ancestry (n = 1,201). Genes highlighted in blue or red were associated with lower or higher somatic mutation rates, respectively. Two-sided hypothesis testing was performed using linear-regression models with sex, age at diagnosis and cancer project as variables. To mitigate multiple-hypothesis testing, the significance threshold was set to exome-wide significance (P < 2.5 × 10⁻⁶). The black line represents the identity line that would be followed if the observed P values followed the null expectation; the shaded area shows the 95% confidence intervals. d, Catalogue of polymorphic germline L1 source elements that are active in cancer. 
The chromosomal map shows germline source L1 elements as volcano symbols. Each volcano is colour-coded according to the type of source L1 activity. The contribution of each source locus (expressed as a percentage) to the total number of transductions identified in PCAWG tumours is represented as a gradient of volcano size, with top contributing elements exhibiting larger sizes.

Second, we performed a rare-variant association study (MAF < 0.5%) to investigate the relationship between germline PTVs and somatic DNA rearrangements in individuals with European ancestry (Extended Data Fig. 11a–c). Germline BRCA2 and BRCA1 PTVs were associated with an increased burden of small (less than 10 kb) somatic SV deletions (P = 1 × 10⁻⁸) and tandem duplications (P = 6 × 10⁻¹³), respectively, corroborating recent studies in breast and ovarian cancer^30,70. In PCAWG data, this pattern also extends to other tumour types, including adenocarcinomas of the prostate and pancreas⁶, typically in the setting of biallelic inactivation. In addition, tumours with high levels of small SV tandem duplications frequently exhibited a novel and distinct class of SVs termed ‘cycles of templated insertions’⁶. These complex SV events consist of DNA templates that are copied from across the genome, joined into one contiguous sequence and inserted into a single derivative chromosome. We found a significant association between germline BRCA1 PTVs and templated insertions at the pan-cancer level (P = 4 × 10⁻¹⁵) (Extended Data Fig. 11d, e). Whole-genome long-read sequencing data generated for a BRCA1-deficient PCAWG prostate tumour verified the small tandem-duplication and templated-insertion SV phenotypes (Fig. 6b). Almost all (20 out of 21) BRCA1-associated tumours with a templated-insertion SV phenotype displayed combined germline and somatic hits in the gene. Together, these data suggest that biallelic inactivation of BRCA1 is a driver of the templated-insertion SV phenotype.

Extended Data Fig. 11 — a–d, f, Data are based on two-sided rare-variant association testing across n = 2,583 patients, with a stringent P value threshold of P < 2.5 × 10⁻⁶ used to mitigate multiple-hypothesis testing (significant genes marked with coloured circles). Blue/red circles mark genes that decrease/increase somatic mutation rates. The black line represents the identity line that would be followed if the observed P values followed the null expectation, with the shaded area showing the 95% confidence intervals. a, QQ plots for the proportion of somatic SV deletions, tandem duplications, inversions and translocation in cancer genomes. b, QQ plots for the proportion of somatic SV deletions in cancer genomes stratified by four size groups (1–10 kb, 10–100 kb, 100–1,000 kb and >1,000 kb). c, QQ plots for the proportion of somatic SV tandem duplications in cancer genomes stratified by four size groups (1–10 kb, 10–100 kb, 100–1,000 kb and >1,000 kb). d, QQ plot for the presence or absence of somatic SV templated insertion (cycles) in cancer genomes. e, Number of SV-templated insertion cycles in PCAWG tumours with germline *BRCA1* PTVs. Only histological samples with at least one germline *BRCA1* PTV carrier are shown (n = 1,095 patients combined). The box denotes the interquartile range, with the median marked as a horizontal line. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Outliers are shown as points. f, QQ plot for somatic CpG mutagenesis in cancer genomes based on NpCpG motif analysis. g, Violin plots show estimated densities of the proportion of somatic CpG mutations in PCAWG donors with germline *MBD4* and *BRCA2* PTVs. The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Two-sided hypothesis testing, not corrected for multiple testing, was performed using linear regression models. 
h, Replication of germline *MBD4* and *BRCA2* PTV associations with somatic CpG mutagenesis in TCGA whole-exome sequencing donors. Violin plots show the estimated density of the proportion of somatic CpG mutations in TCGA exomes with germline *MBD4* and *BRCA2* PTVs. The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Two-sided hypothesis testing, not corrected for multiple testing, was performed using linear-regression models. i, Correlation between *MBD4* expression and somatic CpG mutagenesis in primary solid PCAWG tumours. Hypothesis testing was two-sided and not corrected for multiple testing, using linear-regression models. The box denotes the interquartile range, with the median marked as a horizontal line. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. j, Data are mean ± s.e.m. across n = 20 tumour types. The dashed black line shows the fitted line to the data, estimated using linear-regression models. Hypothesis testing was two-sided and not corrected for multiple testing, using Spearman’s rank correlations. k, *MBD4* effect sizes (open circles) with 95% confidence intervals (error bars) for individual cancer types were estimated using linear-regression analysis after (if available) accounting for sex, age at diagnosis (young/old) and ICGC project. Hypothesis testing was two-sided and not corrected for multiple testing.

Third, rare-variant association analysis revealed that patients with germline MBD4 PTVs had increased rates of somatic C > T mutation at CpG dinucleotides (P < 2.5 × 10⁻⁶) (Fig. 6c and Extended Data Fig. 11f, g). Analysis of previously published whole-exome sequencing samples from the TCGA (n = 8,134) replicated the association between germline MBD4 PTVs and increased somatic CpG mutagenesis at the pan-cancer level (P = 7.1 × 10⁻⁴) (Extended Data Fig. 11h). Moreover, gene-expression profiling revealed a significant but modest correlation between MBD4 expression and somatic CpG mutation rates between and within PCAWG tumour types (Extended Data Fig. 11i–k). MBD4 encodes a DNA-repair enzyme that removes thymidines from T:G mismatches within methylated CpG sites⁷¹, a functionality that would be consistent with a CpG mutational signature in cancer.

Fourth, we assessed long interspersed nuclear elements (LINE-1; L1 hereafter) that mediate somatic retrotransposition events^72–74. We identified 114 germline source L1 elements capable of active somatic retrotransposition, including 70 that represent insertions with respect to the human reference genome (Fig. 6d and Supplementary Table 7), and 53 that were tagged by single-nucleotide polymorphisms in strong linkage disequilibrium (Supplementary Table 7). Only 16 germline L1 elements accounted for 67% (2,440 out of 3,669) of all L1-mediated transductions¹⁰ detected in the PCAWG dataset (Extended Data Fig. 12a). These 16 hot-L1 elements followed two broad patterns of somatic activity (8 of each), which we term Strombolian and Plinian in analogy to patterns of volcanic activity. Strombolian L1s are frequently active in cancer, but mediate only small-to-modest eruptions of somatic L1 activity in cancer samples (Extended Data Fig. 12b). By contrast, Plinian L1s are more rarely seen, but display aggressive somatic activity. Whereas Strombolian elements are typically relatively common (MAF > 2%) and sometimes even fixed in the human population, all Plinian elements were infrequent (MAF ≤ 2%) in PCAWG donors (Extended Data Fig. 12c; P = 0.001, Mann–Whitney U-test). This dichotomous pattern of activity and allele frequency may reflect differences in age and selective pressures, with Plinian elements potentially inserted into the human germline more recently. PCAWG donors bear on average between 50 and 60 L1 source elements and between 5 and 7 elements with hot activity (Extended Data Fig. 12d), but only 38% (1,075 out of 2,814) of PCAWG donors carried ≥1 Plinian element. Some L1 germline source loci caused somatic loss of tumour-suppressor genes (Extended Data Fig. 12e). Many are restricted to individual continental population ancestries (Extended Data Fig. 12f–j).
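The contrast between Strombolian and Plinian elements rests on two quantities per source element: how often the element is active, and how many transductions it mediates when it is active. The sketch below computes both from per-sample transduction counts. It is an illustration under stated assumptions and not the TraFiC-mem workflow: the function name and input layout are hypothetical, and no cutoff for assigning either label is given because the text does not state one.

```python
from typing import Mapping, Tuple


def l1_activity_profile(
    transductions_per_sample: Mapping[str, int],
    n_samples_with_retrotransposition: int,
) -> Tuple[float, float]:
    """Return (activity rate, percentage of samples in which the element is active).

    transductions_per_sample: transductions attributed to ONE source element,
        keyed by sample identifier (hypothetical input layout).
    n_samples_with_retrotransposition: number of samples showing any
        retrotransposition activity, used as the denominator.
    """
    active = [n for n in transductions_per_sample.values() if n > 0]
    if not active or n_samples_with_retrotransposition <= 0:
        return 0.0, 0.0
    # Average number of transductions in the samples where the element fired.
    activity_rate = sum(active) / len(active)
    percent_active = 100.0 * len(active) / n_samples_with_retrotransposition
    return activity_rate, percent_active
```

In this framing, a Strombolian element would show a high percentage of active samples with a low activity rate, and a Plinian element the reverse. As a numerical example, an element that mediates 72 transductions spread evenly over 6 samples has an activity rate of 12 transductions per active sample.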

Extended Data Fig. 12 — a, Left, dots show the number of transductions promoted by each hot element in individual samples. Arrows highlight retrotransposition burst. Right, the contribution of each hot locus is represented. The total number of transductions mediated by each source element is shown on the right. b, Source L1 activity rate (that is, measured as the average number of transductions mediated by an element) versus the percentage of samples with retrotransposition activity in which the germline element is active. For visualization purposes, extreme points observed for a source L1 with an activity rate of 49 and for a L1 active in 31% of the samples are shown at ≥20 and ≥10, respectively. c, Contrasting allele frequencies for Strombolian and Plinian source loci (sample sizes shown under each axis label). The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Hypothesis testing was performed using two-sided Mann–Whitney U-tests without correction for multiple tests. d, Numbers of active and hot source L1 elements per donor. Data are mean ± s.d. number of elements per donor. e, The novel Plinian source element on 7p12.3 mediates 72 transductions among only 6 cancer samples. This generates a transduction that induces the deletion of the tumour-suppressor gene *CDKN2A*. f, Violin plots show the estimated number of distinct germline MEI alleles per PCAWG donor. The box denotes the interquartile range, with the median marked as a white point. The whiskers extend as far as the range or 1.5× the interquartile range, whichever is less. Donors are grouped according to their genetic ancestry: AFR, African; AMR, admixed American; EAS, East Asian; EUR, European; SAS, South Asian. Sample sizes are shown under each axis label. 
g, For each type of MEI (L1, Alu and SVA) identified both in PCAWG and in the 1000 Genomes Project (1KGP), the correlations between allele frequency estimates per ancestry derived from both projects are displayed in a blue (0) to red (1) coloured gradient. n = 2,583 PCAWG patients. Two-sided hypothesis testing was performed using Spearman’s rank correlations without correction for multiple tests. h, Example correlation between MEI allele frequencies derived from PCAWG and the 1000 Genomes Project for individuals with European ancestry (n = 1,201 patients in PCAWG). Two-sided hypothesis testing was performed using Spearman’s rank correlations without correction for multiple tests. i, Evaluation of TraFiC-mem false-discovery rate on a liver hepatocellular carcinoma sample (DO50807) and a cell line (NCI-BL2087) sequenced using single-molecule sequencing with MinION (Oxford Nanopore). For each allele frequency bin (common, >5%; low frequency, 1–5%; rare, <1%), the percentage of events supported by N long reads is represented (N ranges from 0–1 to more than 5). MEIs supported by at least two Nanopore reads were considered to be true positives (blue palette) and were classified as false positives (red) otherwise. The total number of germline MEIs per allele frequency bin is shown on the right. j, Correlation between predicted MEI lengths from Illumina and Nanopore data. Two-sided hypothesis testing was performed using Spearman’s rank correlations without correction for multiple testing.

Replicative immortality

One of the hallmarks of cancer is the ability of cancer to evade cellular senescence²¹. Normal somatic cells typically have finite cell division potential; telomere attrition is one mechanism to limit numbers of mitoses⁷⁵. Cancers enlist multiple strategies to achieve replicative immortality. Overexpression of the telomerase gene, TERT, which maintains telomere lengths, is especially prevalent. This can be achieved through point mutations in the promoter that lead to de novo transcription factor binding^34,37; hitching TERT to highly active regulatory elements elsewhere in the genome^46,76; insertions of viral enhancers upstream of the gene^77,78; and increased dosage through chromosomal amplification, as we have seen in melanoma (Fig. 5b). In addition, there is an ‘alternative lengthening of telomeres’ (ALT) pathway, in which telomeres are lengthened through homologous recombination, mediated by loss-of-function mutations in the ATRX and DAXX genes⁷⁹.

As reported in a companion paper¹³, 16% of tumours in the PCAWG dataset exhibited somatic mutations in at least one of ATRX, DAXX and TERT. TERT alterations were detected in 270 samples, whereas 128 tumours had alterations in ATRX or DAXX, of which 71 were protein-truncating. In the companion paper, which focused on describing patterns of ALT and TERT-mediated telomere maintenance¹³, 12 features of telomeric sequence were measured in the PCAWG cohort. These included counts of nine variants of the core hexameric sequence, the number of ectopic telomere-like insertions within the genome, the number of genomic breakpoints and telomere length as a ratio between tumour and normal. Here we used the 12 features as an overview of telomere integrity across all tumours in the PCAWG dataset.
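To make the hexamer-count features more concrete, the following is a minimal sketch that tallies the canonical TTAGGG repeat and three of the variant hexamers discussed in this section (TTCGGG, TGAGGG and GTAGGG) in a single read. The function name, the short motif list and the toy read are assumptions made for illustration only; this is not the feature-extraction procedure of the companion paper, which measured nine variants in telomeric sequence.

```python
from collections import Counter

CANONICAL = "TTAGGG"
# Three of the variant hexamers mentioned in the text (illustrative subset only).
VARIANTS = ("TTCGGG", "TGAGGG", "GTAGGG")


def count_hexamers(read, motifs=(CANONICAL,) + VARIANTS):
    """Count each motif over every 6-bp window of the read (sliding by 1 bp)."""
    counts = Counter({motif: 0 for motif in motifs})
    for start in range(len(read) - 5):
        window = read[start:start + 6]
        if window in counts:
            counts[window] += 1
    return counts


# Hypothetical toy read: four canonical repeats and two variant repeats.
toy_read = "TTAGGG" * 3 + "TTCGGG" + "TTAGGG" + "TGAGGG"
hexamer_counts = count_hexamers(toy_read)
```

In practice such counts would be accumulated over many telomeric reads and normalized before tumour and normal samples are compared; that normalization is outside the scope of this sketch.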

On the basis of these 12 features, tumour samples formed 4 distinct subclusters (Fig. 7a and Extended Data Fig. 13a), suggesting that telomere-maintenance mechanisms are more diverse than the well-established TERT and ALT dichotomy. Clusters C1 (47 tumours) and C2 (42 tumours) were enriched for traits of the ALT pathway—having longer telomeres, more genomic breakpoints, more ectopic telomere insertions and variant telomere sequence motifs (Supplementary Fig. 9). C1 and C2 were distinguished from one another by the latter having a considerable increase in the number of TTCGGG and TGAGGG variant motifs among the telomeric hexamers. Thyroid adenocarcinomas were markedly enriched among C3 samples (26 out of 33 C3 samples; P < 10⁻¹⁶); the C1 cluster (ALT subtype 1) was common among sarcomas; and both pancreatic endocrine neoplasms and low-grade gliomas had a high proportion of samples in the C2 cluster (ALT subtype 2) (Fig. 7b). Notably, some of the thyroid adenocarcinomas and pancreatic neuroendocrine tumours that cluster together (cluster C3) had matched normal samples that also cluster together (normal cluster N3) (Extended Data Fig. 13a) and which share common properties. For example, the GTAGGG repeat was overrepresented among samples in this group (Supplementary Fig. 10).

Somatic driver mutations were also unevenly distributed across the four clusters (Fig. 7c). C1 tumours were enriched for RB1 mutations or SVs (P = 3 × 10⁻⁵), as well as frequent SVs that affected ATRX (P = 6 × 10⁻¹⁴), but not DAXX. RB1 and ATRX mutations were largely mutually exclusive (Extended Data Fig. 13b). By contrast, C2 tumours were enriched for somatic point mutations in ATRX and DAXX (P = 6 × 10⁻⁵), but not RB1. The enrichment of RB1 mutations in C1 remained significant when only leiomyosarcomas and osteosarcomas were considered, confirming that this enrichment is not merely a consequence of the different distribution of tumour types across clusters. C3 samples had frequent TERT promoter mutations (30%; P = 2 × 10⁻⁶).

There was a marked predominance of RB1 mutations in C1. Nearly a third of the samples in C1 contained an RB1 alteration, and these alterations were evenly distributed across truncating SNVs, SVs and shallow deletions (Extended Data Fig. 13c). Previous research has shown that RB1 mutations are associated with long telomeres in the absence of TERT mutations and ATRX inactivation⁸⁰, and studies using mouse models have shown that knockout of Rb-family proteins causes elongated telomeres⁸¹. The association with the C1 cluster here suggests that RB1 mutations can represent another route to activating the ALT pathway, one with subtly different properties of telomeric sequence compared with the inactivation of DAXX, alterations of which fall almost exclusively in cluster C2.

Tumour types with the highest rates of abnormal telomere maintenance mechanisms often originate in tissues that have low endogenous replicative activity (Fig. 7d). In support of this, we found an inverse correlation between previously estimated rates of stem cell division across tissues⁸² and the frequency of telomere maintenance abnormalities (P = 0.01, Poisson regression) (Extended Data Fig. 13d). This suggests that restriction of telomere maintenance is an important tumour-suppression mechanism, particularly in tissues with low steady-state cellular proliferation, in which a clone must overcome this constraint to achieve replicative immortality.

Conclusions and future perspectives

The resource reported in this paper and its companion papers has yielded insights into the nature and timing of the many mutational processes that shape large- and small-scale somatic variation in the cancer genome; the patterns of selection that act on these variations; the widespread effect of somatic variants on transcription; the complementary roles of the coding and non-coding genome for both germline and somatic mutations; the ubiquity of intratumoral heterogeneity; and the distinctive evolutionary trajectory of each cancer type. Many of these insights can be obtained only from an integrated analysis of all classes of somatic mutation on a whole-genome scale, and would not be accessible with, for example, targeted exome sequencing.

The promise of precision medicine is to match patients to targeted therapies using genomics. A major barrier to its evidence-based implementation is the daunting heterogeneity of cancer chronicled in these papers, from tumour type to tumour type, from patient to patient, from clone to clone and from cell to cell. Building meaningful clinical predictors from genomic data can be achieved, but will require knowledge banks comprising tens of thousands of patients with comprehensive clinical characterization⁸³. As these sample sizes will be too large for any single funding agency, pharmaceutical company or health system, international collaboration and data sharing will be required. The next phase of ICGC, ICGC-ARGO (https://www.icgc-argo.org/), will bring the cancer genomics community together with healthcare providers, pharmaceutical companies, data science and clinical trials groups to build comprehensive knowledge banks of clinical outcome and treatment data from patients with a wide variety of cancers, matched with detailed molecular profiling.

Extending the story begun by TCGA, ICGC and other cancer genomics projects, the PCAWG has brought us closer to a comprehensive narrative of the causal biological changes that drive cancer phenotypes. We must now translate this knowledge into sustainable, meaningful clinical treatments.

Methods

Samples

We compiled an inventory of matched tumour–normal whole-cancer genomes in the ICGC Data Coordinating Centre. Most samples came from treatment-naive, primary cancers, although a small number of donors had multiple samples of primary, metastatic and/or recurrent tumours. Our inclusion criteria were: (1) matched tumour and normal specimen pair; (2) a minimal set of clinical fields; and (3) characterization of tumour and normal whole genomes using Illumina HiSeq paired-end sequencing reads.

We collected genome data from 2,834 donors, representing all ICGC and TCGA donors that met these criteria at the time of the final data freeze in autumn 2014 (Extended Data Table 1). After quality assurance (Supplementary Methods 2.5), data from 176 donors were excluded as unusable, 75 had minor issues that could affect some analyses (grey-listed donors) and 2,583 had data of optimal quality (white-listed donors) (Supplementary Table 1). Across the 2,658 white- and grey-listed donors, whole-genome sequences were available from 2,605 primary tumours and 173 metastases or local recurrences. Matching normal samples were obtained from blood (2,064 donors), tissue adjacent to the primary tumour (87 donors) or from distant sites (507 donors). Whole-genome sequencing data were available for tumour and normal DNA for the entire cohort. The mean read coverage was 39× for normal samples, whereas tumours had a bimodal coverage distribution with modes at 38× and 60× (Supplementary Fig. 1). The majority of specimens (65.3%) were sequenced using 101-bp paired-end reads. An additional 28% were sequenced with 100-bp paired-end reads. Of the remaining specimens, 4.7% were sequenced with read lengths longer than 101 bp, and 1.9% with read lengths shorter than 100 bp. The distribution of read lengths by tumour cohort is shown in Supplementary Fig. 11. Median read length for whole-genome sequencing paired-end reads was 101 bp (mean = 106.2, s.d. = 16.7; minimum–maximum = 50–151). RNA-sequencing data were collected and re-analysed centrally for 1,222 donors, including 1,178 primary tumours, 67 metastases or local recurrences and 153 matched normal tissue samples adjacent to the primary tumour.

Demographically, the cohort included 1,469 men (55%) and 1,189 women (45%), with a mean age of 56 years (range, 1–90 years) (Supplementary Table 1). Using population ancestry-differentiated single nucleotide polymorphisms, the ancestry distribution was heavily weighted towards donors of European descent (77% of total) followed by East Asians (16%), as expected for large contributions from European, North American and Australian projects (Supplementary Table 1).
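The cohort figures quoted above can be cross-checked with simple arithmetic. The sketch below only re-adds numbers stated in the text; the variable names are ours and carry no meaning beyond this illustration.

```python
# Donor accounting after quality assurance (numbers taken from the text).
collected = 2834
excluded, grey_listed, white_listed = 176, 75, 2583
assert excluded + grey_listed + white_listed == collected

# White- and grey-listed donors form the analysed cohort.
analysed = white_listed + grey_listed  # 2,658 donors

# Source of the matched normal sample, per donor.
normal_sources = {"blood": 2064, "adjacent tissue": 87, "distant site": 507}
assert sum(normal_sources.values()) == analysed

# Sex distribution of the analysed cohort.
men, women = 1469, 1189
assert men + women == analysed
pct_men = round(100 * men / analysed)      # reported as 55%
pct_women = round(100 * women / analysed)  # reported as 45%

# Read-length categories (percent of specimens); rounding leaves the sum at 99.9.
read_length_pct = {"101 bp": 65.3, "100 bp": 28.0, ">101 bp": 4.7, "<100 bp": 1.9}
total_pct = round(sum(read_length_pct.values()), 1)
```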

We consolidated histopathology descriptions of the tumour samples, using the ICD-O-3 tumour site controlled vocabulary⁸⁹. Overall, the PCAWG dataset comprises 38 distinct tumour types (Extended Data Table 1 and Supplementary Table 1). Although the most common tumour types are included in the dataset, their distribution does not match the relative population incidences, largely owing to differences among contributing ICGC/TCGA groups in the numbers of sequenced samples.

Uniform processing and somatic variant calling

To generate a consistent set of somatic mutation calls that could be used for cross-tumour analyses, we analysed all 6,835 samples using a uniform set of algorithms for alignment, variant calling and quality control (Extended Data Fig. 1, Supplementary Fig. 2, Supplementary Table 3 and Supplementary Methods 2). We used the BWA-MEM algorithm⁹⁰ to align each tumour and normal sample to human reference build hs37d5 (as used in the 1000 Genomes Project⁹¹). Somatic mutations were identified in the aligned data using three established pipelines, which were run independently on each tumour–normal pair. Each of the three pipelines (labelled ‘Sanger’^92–95, ‘EMBL/DKFZ’^96,97 and ‘Broad’^98–101 after the computational biology groups that created or assembled them) consisted of multiple software packages for calling somatic SNVs, small indels, CNAs and somatic SVs (with intrachromosomal SVs defined as those >100 bp). Two additional variant algorithms^102,103 were included to further improve accuracy across a broad range of clonal and subclonal mutations. We tested different merging strategies using validation data, and chose the optimal method for each variant type to generate a final consensus set of mutation calls (Supplementary Methods 2.4).

Somatic retrotransposition events, including Alu and LINE-1 insertions⁷², L1-mediated transductions⁷³ and pseudogene formation¹⁰⁴, were called using a dedicated pipeline⁷³. We removed these retrotransposition events from the somatic SV call-set. Mitochondrial DNA mutations were called using a published algorithm¹⁰⁵. RNA-sequencing data were uniformly processed to quantify normalized gene-level expression, splicing variation and allele-specific expression, and to identify fusion transcripts, alternative promoter usage and sites of RNA editing⁸.

Integration, phasing and validation of germline variant call-sets

Calls of common (≥1% frequency in PCAWG) and rare (<1%) germline variants including single-nucleotide polymorphisms, indels, SVs and mobile-element insertions (MEIs) were generated using a population-scale genetic polymorphism-detection approach^91,106. The uniform germline data-processing workflow comprised variant identification using six different variant-calling algorithms^96,107,108 and was orchestrated using the Butler workflow system¹⁰⁹.

We performed call-set benchmarking, merging, variant genotyping and statistical haplotype-block phasing⁹¹ (Supplementary Methods 3.4). Using this strategy, we identified 80.1 million germline single-nucleotide polymorphisms, 5.9 million germline indels, 1.8 million multi-allelic short (<50 bp) germline variants, as well as germline SVs ≥ 50 bp in size including 29,492 biallelic deletions and 27,254 MEIs (Supplementary Table 2). We statistically phased this germline variant set using haplotypes from the 1000 Genomes Project⁹¹ as a reference panel, yielding an N50-phased block length of 265 kb based on haploid chromosomes from donor-matched tumour genomes. Precision estimates for germline SNVs and indels were >99% for the phased merged call-set, and sensitivity estimates ranged from 92% to 98%.
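For readers unfamiliar with the N50 statistic used for the phased block length, the following is a minimal sketch of how an N50 can be computed from a list of block lengths. The function name and the toy block lengths are hypothetical and unrelated to the PCAWG data.

```python
def n50(block_lengths):
    """Return the N50: the length L of the block at which the cumulative sum of
    blocks, sorted from longest to shortest, first reaches half of the total."""
    if not block_lengths:
        raise ValueError("need at least one block length")
    total = sum(block_lengths)
    running = 0
    for length in sorted(block_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical phased block lengths in kb (toy values, not PCAWG results).
toy_blocks_kb = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_blocks_kb)
```

Here the blocks sum to 150 kb, and the two longest blocks (50 kb and 40 kb) together cover at least half of that total, so the N50 of the toy data is 40 kb.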

Core alignment and variant calling by cloud computing

The requirement to uniformly realign and call variants on nearly 5,800 whole genomes (tumour plus normal) presented considerable computational challenges, and raised ethical issues owing to the use of data from different jurisdictions (Extended Data Table 2). To process the data, we adopted a cloud-computing architecture²⁶ in which the alignment and variant calling was spread across 13 data centres on 3 continents, representing a mixture of commercial, infrastructure-as-a-service, academic cloud compute and traditional academic high-performance computer clusters (Supplementary Table 3). Together, the effort used 10 million CPU-core hours.

To generate reproducible variant calling across the 13 data centres, we built the core pipelines into Docker containers²⁸, in which the workflow description, required code and all associated dependencies were packaged together in stand-alone packages. These heavily tested, extensively validated workflows are available for download (Box 1).

Validation, benchmarking and merging of somatic variant calls

To evaluate the performance of each of the mutation-calling pipelines and determine an integration strategy, we performed a large-scale deep-sequencing validation experiment (Supplementary Notes 1). We selected a pilot set of 63 representative tumour–normal pairs, on which we ran the 3 core pipelines, together with a set of 10 additional somatic variant-calling pipelines contributed by members of the PCAWG SNV Calling Methods Working Group. Sufficient DNA remained for 50 of the 63 cases for validation, which was performed by hybridization of tumour and matched normal DNA to a custom RNA bait set, followed by deep sequencing, as previously described²⁹. Although performed using the same sequencing chemistry as the original whole-genome sequencing analyses, the considerably greater depth achieved in the validation experiment enabled accurate assessment of sensitivity and precision of variant calls. Variant calls in repeat-masked regions were not tested, owing to the challenge of designing reliable validation probes in these areas.

The 3 core pipelines had individual estimates of sensitivity of 80–90% to detect a true somatic SNV called by any of the 13 pipelines, and >95% of SNV calls made by each of the core pipelines were genuine somatic variants (Fig. 1a). For indels, a more challenging class of variants to identify in short-read sequencing data, the 3 core algorithms had individual sensitivity estimates in the range of 40–50%, with precision of 70–95% (Fig. 1b). Validation of SV calls is inherently more difficult, as methods based on PCR or hybridization to RNA baits often fail to isolate DNA that spans the breakpoint. To assess the accuracy of SV calls, we therefore used the property that an SV must either generate a copy-number change or be balanced, whereas artefactual calls will not respect this property. For individual SV-calling algorithms, we estimated precision to be in the range of 80–95% for samples in the 63-sample pilot dataset.
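The two metrics reported here can be illustrated with a short sketch. It assumes that each variant is represented as a hashable tuple and that a validated truth set is available; the function name and the toy variants are hypothetical, and the sketch does not reproduce the statistical estimation used in the validation experiment.

```python
def evaluate_calls(called, truth):
    """Return (sensitivity, precision) of a call set against a validated truth set.

    sensitivity = TP / (TP + FN): fraction of true variants that were called.
    precision   = TP / (TP + FP): fraction of calls that are true variants.
    """
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    sens = true_pos / (true_pos + false_neg) if truth else float("nan")
    prec = true_pos / (true_pos + false_pos) if called else float("nan")
    return sens, prec


# Toy data: 10 validated variants; the caller finds 9 of them plus 1 artefact.
toy_truth = {("1", pos) for pos in range(100, 110)}
toy_called = {("1", pos) for pos in range(101, 110)} | {("2", 500)}
toy_sensitivity, toy_precision = evaluate_calls(toy_called, toy_truth)
```

With these toy sets, both sensitivity and precision are 0.9.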

Next, we examined multiple methods for merging calls made by several algorithms into a single definitive call-set to be used for downstream analysis. The final consensus calls for SNVs were based on a simple approach that required two or more methods to agree on a call. For indels, because methods were less concordant, we used stacked logistic regression^110,111 to integrate the calls. The merged SV set includes all calls made by two or more of the four primary SV-calling algorithms^96,100,112,113. Consensus CNA calls were obtained by joining the outputs of six individual CNA-calling algorithms with SV consensus breakpoints to obtain base-pair resolution CNAs (Supplementary Methods 2.4.3). Consensus purity and ploidy were derived, and a multitier system was developed for consensus copy-number calls (Supplementary Methods 2.4.3, and described in detail elsewhere⁷).
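The "two or more methods agree" rule for SNVs can be sketched as a simple vote across callers. The caller labels, the tuple representation of a variant (chromosome, position, reference allele, alternate allele) and the toy calls below are assumptions for illustration; the production merging workflow is described in the Supplementary Methods and is not reproduced here.

```python
from collections import Counter


def consensus_snvs(calls_by_caller, min_callers=2):
    """Return the variants reported by at least `min_callers` distinct callers."""
    support = Counter()
    for calls in calls_by_caller.values():
        # set() ensures that each caller votes at most once per variant.
        support.update(set(calls))
    return {variant for variant, n in support.items() if n >= min_callers}


# Hypothetical calls from three callers with placeholder names.
toy_calls = {
    "caller_a": [("1", 1000, "C", "T"), ("2", 2000, "G", "A")],
    "caller_b": [("1", 1000, "C", "T"), ("3", 3000, "A", "G")],
    "caller_c": [("2", 2000, "G", "A"), ("1", 1000, "C", "T")],
}
toy_consensus = consensus_snvs(toy_calls)
```

In this toy example the variants on chromosomes 1 and 2 are kept because at least two callers report them, whereas the variant on chromosome 3 is dropped because only one caller reports it.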

Overall, the sensitivity and precision of the consensus somatic variant calls were 95% (90% confidence interval, 88–98%) and 95% (90% confidence interval, 71–99%), respectively, for SNVs (Extended Data Fig. 2). For somatic indels, sensitivity and precision were 60% (90% confidence interval, 34–72%) and 91% (90% confidence interval, 73–96%), respectively. Regarding SVs, we estimate the sensitivity of the merging algorithm to be 90% for true calls generated by any one calling pipeline; precision was estimated to be 97.5%. That is, 97.5% of SVs in the merged SV call-set had an associated copy-number change or balanced partner rearrangement. The improvement in calling accuracy from combining different pipelines was most noticeable in variants that had low variant allele fractions, which are likely to originate from subclonal populations of the tumour (Fig. 1c, d). There remains much work to be done to improve indel calling software; we still lack sensitivity for calling even fully clonal complex indels from short-read sequencing data.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-020-1969-6.

Supplementary information

Supplementary Information (53 MB, pdf)

This file contains Supplementary Figures 1-19, Supplementary Methods and Supplementary Notes 1-6

Reporting Summary (97.3 KB, pdf)

Supplementary Tables (1.4 MB, zip)

This zipped file contains Supplementary Tables 1-21 and a Supplementary Table Guide

Supplementary Information (470.7 KB, pdf)

Acknowledgements

We thank research participants who donated samples and data, the physicians and clinical staff who contributed to sample annotation and collection, and the numerous funding agencies that contributed to the collection and analysis of this dataset.

Author contributions

Writing committee leads: Peter J. Campbell, Gad Getz, Jan O. Korbel, Joshua M. Stuart, Jennifer L. Jennings, Lincoln D. Stein. Head of project management: Jennifer L. Jennings. Sample collection: major contributions from Marc D. Perry, Hardeep K. Nahal-Bose; led by B. F. Francis Ouellette. Histopathology harmonization: major contribution from Constance H. Li; further contributions from Esther Rheinbay, G. Petur Nielsen, Dennis C. Sgroi, Chin-Lee Wu, William C. Faquin, Vikram Deshpande, Paul C. Boutros, Alexander J. Lazar, Katherine A. Hoadley; led by Lincoln D. Stein, David N. Louis. Uniform processing, somatic, germline variant calling: major contribution from L. Jonathan Dursi; further contributions from Christina K. Yung, Matthew H. Bailey, Gordon Saksena, Keiran M. Raine, Ivo Buchhalter, Kortine Kleinheinz, Matthias Schlesner, Junjun Zhang, Wenyi Wang, David A. Wheeler; led by Li Ding, Jared T. Simpson. Core alignment, variant calling by cloud computing: major contributions from Christina K. Yung, Brian D. O’Connor, Sergei Yakneen, Junjun Zhang; further contributions from Kyle Ellrott, Kortine Kleinheinz, Naoki Miyoshi, Keiran M. Raine, Adam P. Butler, Romina Royo, Gordon Saksena, Matthias Schlesner, Solomon I. Shorser, Miguel Vazquez. Integration, phasing, validation of germline variant callsets: major contributions from Tobias Rausch, Grace Tiao, Sebastian M. Waszak, Bernardo Rodriguez-Martin, Suyash Shringarpure, Dai-Ying Wu; further contributions from Sergei Yakneen, German M. Demidov, Olivier Delaneau, Shuto Hayashi, Seiya Imoto, Nina Habermann, Ayellet V. Segre, Erik Garrison, Andy Cafferkey, Eva G. Alvarez, José María Heredia-Genestar, Francesc Muyas, Oliver Drechsel, Alicia L. Bruzos, Javier Temes, Jorge Zamora, L. Jonathan Dursi, Adrian Baez-Ortega, Hyung-Lae Kim, Matthew H. Bailey, R. Jay Mashl, Kai Ye, Ivo Buchhalter, Anthony DiBiase, Kuan-lin Huang, Ivica Letunic, Michael D. McLellan, Steven J. 
Newhouse, Matthias Schlesner, Tal Shmaya, Sushant Kumar, David C. Wedge, Mark H. Wright, Venkata D. Yellapantula, Mark Gerstein, Ekta Khurana, Tomas Marques-Bonet, Arcadi Navarro, Carlos D. Bustamante, Jared T. Simpson, Li Ding, Reiner Siebert, Hidewaki Nakagawa, Douglas F. Easton; led by Stephan Ossowski, Jose M. C. Tubio, Gad Getz, Francisco M. De La Vega, Xavier Estivill, Jan O. Korbel. Validation, benchmarking, merging of somatic variant calls: major contribution from L. Jonathan Dursi; further contributions from David A. Wheeler, Christina K. Yung; led by Li Ding, Jared T. Simpson. Data and code availability: major contribution from Junjun Zhang; further contributions from Christina K. Yung, Sergei Yakneen, Denis Yuen, George L. Mihaiescu, Larsson Omberg; led by Vincent Ferretti. Pan-cancer burden of somatic mutations: major contribution from Junjun Zhang; led by Peter J. Campbell. Panorama of driver mutations in human cancer: led by Radhakrishnan Sabarinathan, Oriol Pich, Abel Gonzalez-Perez. PCAWG tumours with no apparent driver mutations: major contribution from Esther Rheinbay; further contributions from Amaro Taylor-Weiner, Radhakrishnan Sabarinathan; led by Peter J. Campbell, Gad Getz. Patterns, oncogenicity of kataegis, chromoplexy: major contributions from Matthew W. Fittall, Jonas Demeulemeester, Maxime Tarabichi; further contributions from Nicola D. Roberts, Peter J. Campbell, Jan O. Korbel; led by Peter Van Loo. Patterns, oncogenicity of chromothripsis: major contributions from Maxime Tarabichi, Jonas Demeulemeester, Matthew W. Fittall; further contributions from Isidro Cortes-Ciriano, Lara Urban, Peter J. Park, Peter J. Campbell, Jan O. Korbel; led by Peter Van Loo. Timing-clustered mutational processes during tumour evolution: major contributions from Jonas Demeulemeester, Maxime Tarabichi, Matthew W. Fittall; further contributions from Jan O. Korbel, Peter J. Campbell; led by Peter Van Loo. 
Germline effects on somatic mutation: major contributions from Sebastian M. Waszak, Bin Zhu, Bernardo Rodriguez-Martin, Esa Pitkanen, Tobias Rausch; further contributions from Yilong Li, Natalie Saini, Leszek J. Klimczak, Joachim Weischenfeldt, Nikos Sidiropoulos, Ludmil B. Alexandrov, Francesc Muyas, Raquel Rabionet, Georgia Escaramis, Adrian Baez-Ortega, Mattia Bosio, Aliaksei Z. Holik, Hana Susak, Eva G. Alvarez, Alicia L. Bruzos, Javier Temes, Aparna Prasad, Nina Habermann, Serap Erkek, Lara Urban, Claudia Calabrese, Benjamin Raeder, Eoghan Harrington, Simon Mayes, Daniel Turner, Sissel Juul, Steven A. Roberts, Lei Song, Roelof Koster, Lisa Mirabello, Xing Hua, Tomas J. Tanskanen, Marta Tojo, David C. Wedge, Jorge Zamora, Jieming Chen, Lauri A. Aaltonen, Gunnar Ratsch, Roland F. Schwarz, Atul J. Butte, Alvis Brazma, Peter J. Campbell, Stephen J. Chanock, Nilanjan Chatterjee, Oliver Stegle, Olivier Harismendy; led by G. Steven Bova, Dmitry A. Gordenin, Jose M. C. Tubio, Douglas F. Easton, Xavier Estivill, Jan O. Korbel. Replicative immortality: major contribution from David Haan; further contributions from Lina Sieverling, Lars Feuerbach; led by Lincoln D. Stein, Joshua M. Stuart. Ethical considerations of genomic cloud computing: led by Don Chalmers, Yann Joly, Bartha Knoppers, Fruzsina Molnar-Gabor, Jan O. Korbel, Mark Phillips, Adrian Thorogood, David Townend. Online resources for data access, visualization, exploration and analysis: major contributions from Mary Goldman, Junjun Zhang, Nuno A. Fonseca; further contributions from Qian Xiang, Brian Craft, Elena Pineiro-Yanez, Alfonso Munoz, Robert Petryszak, Anja Fullgrabe, Fatima Al-Shahrour, Maria Keays, David Haussler, John Weinstein, Wolfgang Huber, Alfonso Valencia, Irene Papatheodorou, Jingchun Zhu; led by Brian D. O’Connor, Lincoln D. Stein, Alvis Brazma, Vincent Ferretti, Miguel Vazquez. The 63-sample pilot-analysis validation process: major contribution from L. 
Jonathan Dursi; further contributions from Christina K. Yung, Matthew H. Bailey, Gordon Saksena, Keiran M. Raine, Ivo Buchhalter, Kortine Kleinheinz, Matthias Schlesner, Yu Fan, David Torrents, Matthias Bieg, Paul C. Boutros, Ken Chen, Zechen Chong, Kristian Cibulskis, Oliver Drechsel, Roland Eils, Robert S. Fulton, Josep Gelpi, Mark Gerstein, Santiago Gonzalez, Gad Getz, Ivo G. Gut, Faraz Hach, Michael Heinold, Taobo Hu, Vincent Huang, Barbara Hutter, Hyung-Lae Kim, Natalie Jager, Jongsun Jung, Sushant Kumar, Yogesh Kumar, Christopher Lalansingh, Ignaty Leshchiner, Ivica Letunic, Dimitri Livitz, Eric Z. Ma, Yosef E. Maruvka, R. Jay Mashl, Michael D. McLellan, Ana Milovanovic, Morten Muhlig Nielsen, Brian D. O’Connor, Stephan Ossowski, Nagarajan Paramasivam, Jakob Skou Pedersen, Marc D. Perry, Montserrat Puiggros, Romina Royo, Esther Rheinbay, S. Cenk Sahinalp, Iman Sarrafi, Chip Stewart, Miranda D. Stobbe, Grace Tiao, Jeremiah A. Wala, Jiayin Wang, Wenyi Wang, Sebastian M. Waszak, Joachim Weischenfeldt, Michael Wendl, Johannes Werner, Zhenggang Wu, Hong Xue, Sergei Yakneen, Takafumi N. Yamaguchi, Kai Ye, Venkata Yellapantula, Junjun Zhang, David A. Wheeler; led by Li Ding, Jared T. Simpson. Processing of validation data: major contributions from Christina K. Yung, Brian D. O’Connor, Sergei Yakneen, Junjun Zhang; further contributions from Kyle Ellrott, Kortine Kleinheinz, Naoki Miyoshi, Keiran M. Raine, Romina Royo, Gordon Saksena, Matthias Schlesner, Solomon I. Shorser, Miguel Vazquez, Joachim Weischenfeldt, Denis Yuen, Adam P. Butler, Brandi N. Davis-Dusenbery, Roland Eils, Vincent Ferretti, Robert L. Grossman, Olivier Harismendy, Youngwook Kim, Hidewaki Nakagawa, Steven J. Newhouse, David Torrents; led by Lincoln D. Stein. Whole-genome sequencing somatic variant calling: major contribution from Junjun Zhang; further contributions from Christina K. Yung, Solomon I. Shorser. Whole-genome alignment: Keiran M. Raine, Junjun Zhang, Brian D. O’Connor. 
DKFZ pipeline: Kortine Kleinheinz, Tobias Rausch, Jan O. Korbel, Ivo Buchhalter, Michael C. Heinold, Barbara Hutter, Natalie Jager, Nagarajan Paramasivam, Matthias Schlesner. EMBL pipeline: Joachim Weischenfeldt. Sanger pipeline: Keiran M. Raine, Jonathan Hinton, David R. Jones, Andrew Menzies, Lucy Stebbings, Adam P. Butler. Broad pipeline: Gordon Saksena, Dimitri Livitz, Esther Rheinbay, Julian M. Hess, Ignaty Leshchiner, Chip Stewart, Grace Tiao, Jeremiah A. Wala, Amaro Taylor-Weiner, Mara Rosenberg, Andrew J. Dunford, Manaswi Gupta, Marcin Imielinski, Matthew Meyerson, Rameen Beroukhim, Gad Getz. MuSE Pipeline: Yu Fan, Wenyi Wang. Consensus somatic SNV/indel annotation: Andrew Menzies, Matthias Schlesner, Juri Reimand, Priyanka Dhingra, Ekta Khurana. Somatic SNV, indel merging: major contribution from L. Jonathan Dursi; further contributions from Christina K. Yung, Matthew H. Bailey, Gordon Saksena, Keiran M. Raine, Ivo Buchhalter, Kortine Kleinheinz, Matthias Schlesner, Yu Fan, David Torrents, Matthias Bieg, Paul C. Boutros, Ken Chen, Zechen Chong, Kristian Cibulskis, Oliver Drechsel, Roland Eils, Robert S. Fulton, Josep L. Gelpi, Mark Gerstein, Santiago Gonzalez, Gad Getz, Ivo G. Gut, Faraz Hach, Michael Heinold, Taobo Hu, Vincent Huang, Barbara Hutter, Hyung-Lae Kim, Natalie Jager, Jongsun Jung, Sushant Kumar, Yogesh Kumar, Christopher Lalansingh, Ignaty Leshchiner, Ivica Letunic, Dimitri Livitz, Eric Z. Ma, Yosef E. Maruvka, R. Jay Mashl, Michael D. McLellan, Ana Milovanovic, Morten Muhlig Nielsen, Brian D. O’Connor, Stephan Ossowski, Nagarajan Paramasivam, Jakob Skou Pedersen, Marc D. Perry, Montserrat Puiggros, Romina Royo, Esther Rheinbay, S. Cenk Sahinalp, Iman Sarrafi, Chip Stewart, Miranda D. Stobbe, Grace Tiao, Jeremiah A. Wala, Jiayin Wang, Wenyi Wang, Sebastian M. Waszak, Joachim Weischenfeldt, Michael Wendl, Johannes Werner, Zhenggang Wu, Hong Xue, Sergei Yakneen, Takafumi N. Yamaguchi, Kai Ye, Venkata Yellapantula, Junjun Zhang, David A. 
Wheeler; major contributions from Li Ding, Jared T. Simpson. Somatic SV merging: Joachim Weischenfeldt, Francesco Favero, Yilong Li. Somatic CNA merging: Stefan Dentro, Jeff Wintersinger, Ignaty Leshchiner. Oxidative artefact filtration: Dimitri Livitz, Ignaty Leshchiner, Chip Stewart, Esther Rheinbay, Gordon Saksena, Gad Getz. Strand bias filtration: Matthias Bieg, Ivo Buchhalter, Johannes Werner, Matthias Schlesner. miniBAM generation: Jeremiah A. Wala, Gordon Saksena, Rameen Beroukhim, Gad Getz. Germline variant identification from whole-genome sequencing: major contributions from Tobias Rausch, Grace Tiao, Sebastian M. Waszak, Bernardo Rodriguez-Martin, Suyash Shringarpure, Dai-Ying Wu; further contributions from Sergei Yakneen, German M. Demidov, Olivier Delaneau, Shuto Hayashi, Seiya Imoto, Nina Habermann, Ayellet V. Segre, Erik Garrison, Andy Cafferkey, Eva G. Alvarez, Alicia L. Bruzos, Jorge Zamora, José María Heredia-Genestar, Francesc Muyas, Oliver Drechsel, L. Jonathan Dursi, Adrian Baez-Ortega, Hyung-Lae Kim, Matthew H. Bailey, R. Jay Mashl, Kai Ye, Ivo Buchhalter, Vasilisa Rudneva, Ji Wan Park, Eun Pyo Hong, Seong Gu Heo, Anthony DiBiase, Kuan-lin Huang, Ivica Letunic, Michael D. McLellan, Steven J. Newhouse, Matthias Schlesner, Tal Shmaya, Sushant Kumar, David C. Wedge, Mark H. Wright, Venkata D. Yellapantula, Mark Gerstein, Ekta Khurana, Tomas Marques-Bonet, Arcadi Navarro, Carlos D. Bustamante, Jared T. Simpson, Li Ding, Reiner Siebert, Hidewaki Nakagawa, Douglas F. Easton; led by Stephan Ossowski, Jose M. C. Tubio, Gad Getz, Francisco M. De La Vega, Xavier Estivill, Jan O. Korbel. RNA-sequencing analysis: major contributions from Nuno A. Fonseca, Andre Kahles, Kjong-Van Lehmann, Lara Urban, Cameron M. Soulette, Yuichi Shiraishi, Fenglin Liu, Yao He, Deniz Demircioglu, Natalie R. Davidson, Claudia Calabrese, Junjun Zhang, Marc D. Perry, Qian Xiang; further contributions from Liliana Greger, Siliang Li, Dongbing Liu, Stefan G. 
Stark, Fan Zhang, Samirkumar B. Amin, Peter Bailey, Aurelien Chateigner, Isidro Cortes-Ciriano, Brian Craft, Serap Erkek, Milana Frenkel-Morgenstern, Mary Goldman, Katherine A. Hoadley, Yong Hou, Matthew R. Huska, Ekta Khurana, Helena Kilpinen, Jan O. Korbel, Fabien C. Lamaze, Chang Li, Xiaobo Li, Xinyue Li, Xingmin Liu, Maximillian G. Marin, Julia Markowski, Tannistha Nandi, Morten Muhlig Nielsen, Akinyemi I. Ojesina, Qiang Pan-Hammarstrom, Peter J. Park, Chandra Sekhar Pedamallu, Jakob Skou Pedersen, Reiner Siebert, Hong Su, Patrick Tan, Bin Tean Teh, Jian Wang, Sebastian M. Waszak, Heng Xiong, Sergei Yakneen, Chen Ye, Christina Yung, Xiuqing Zhang, Liangtao Zheng, Jingchun Zhu, Shida Zhu, Philip Awadalla, Chad J. Creighton, Matthew Meyerson, B. F. Francis Ouellette, Kui Wu, Huanming Yang; led by Jonathan Goke, Roland F. Schwarz, Oliver Stegle, Zemin Zhang, Alvis Brazma, Gunnar Ratsch, Angela N. Brooks. Clustering of tumour genomes based on telomere maintenance-related features: major contribution from David Haan; led by Lincoln D. Stein, Joshua M. Stuart. Clustered mutational processes in PCAWG: major contributions from Jonas Demeulemeester, Maxime Tarabichi, Matthew W. Fittall; led by Peter J. Campbell, Jan O. Korbel, Peter Van Loo. Tumours without detected driver mutations: Esther Rheinbay, Amaro Taylor-Weiner, Radhakrishnan Sabarinathan, Peter J. Campbell, Gad Getz. Panorama of driver mutations in human cancer: major contributions from Radhakrishnan Sabarinathan, Oriol Pich; further contributions from Inigo Martincorena, Carlota Rubio-Perez, Malene Juul, Jeremiah A. Wala, Steven Schumacher, Ofer Shapira, Nikos Sidiropoulos, Sebastian M. Waszak, David Tamborero, Loris Mularoni, Esther Rheinbay, Henrik Hornshoj, Jordi Deu-Pons, Ferran Muinos, Johanna Bertl, Qianyun Guo, Chad J. Creighton, Joachim Weischenfeldt, Jan O. Korbel, Gad Getz, Peter J. Campbell, Jakob Skou Pedersen, Rameen Beroukhim; led by Abel Gonzalez-Perez. 
Pilot benchmarking, variant consensus development and validation: major contribution from L. Jonathan Dursi; further contributions from Christina K. Yung, Matthew H. Bailey, Gordon Saksena, Keiran M. Raine, Ivo Buchhalter, Kortine Kleinheinz, Matthias Schlesner, Yu Fan, David Torrents, Matthias Bieg, Paul C. Boutros, Ken Chen, Zechen Chong, Kristian Cibulskis, Oliver Drechsel, Roland Eils, Robert S. Fulton, Josep Gelpi, Mark Gerstein, Santiago Gonzalez, Gad Getz, Ivo G. Gut, Faraz Hach, Michael Heinold, Taobo Hu, Vincent Huang, Barbara Hutter, Hyung-Lae Kim, Natalie Jager, Jongsun Jung, Sushant Kumar, Yogesh Kumar, Christopher Lalansingh, Ignaty Leshchiner, Ivica Letunic, Dimitri Livitz, Eric Z. Ma, Yosef E. Maruvka, R. Jay Mashl, Michael D. McLellan, Ana Milovanovic, Morten Muhlig Nielsen, Brian D. O’Connor, Stephan Ossowski, Nagarajan Paramasivam, Jakob Skou Pedersen, Marc D. Perry, Montserrat Puiggros, Romina Royo, Esther Rheinbay, S. Cenk Sahinalp, Iman Sarrafi, Chip Stewart, Miranda D. Stobbe, Grace Tiao, Jeremiah A. Wala, Jiayin Wang, Wenyi Wang, Sebastian M. Waszak, Joachim Weischenfeldt, Michael Wendl, Johannes Werner, Zhenggang Wu, Hong Xue, Sergei Yakneen, Takafumi N. Yamaguchi, Kai Ye, Venkata Yellapantula, Junjun Zhang, David A. Wheeler; led by Li Ding, Jared T. Simpson. Production somatic variant calling on the PCAWG compute cloud: major contributions from Christina K. Yung, Brian D. O’Connor, Sergei Yakneen, Junjun Zhang; further contributions from Kyle Ellrott, Kortine Kleinheinz, Naoki Miyoshi, Keiran M. Raine, Romina Royo, Gordon Saksena, Matthias Schlesner, Solomon I. Shorser, Miguel Vazquez, Joachim Weischenfeldt, Denis Yuen, Adam P. Butler, Brandi N. Davis-Dusenbery, Roland Eils, Vincent Ferretti, Robert L. Grossman, Olivier Harismendy, Youngwook Kim, Hidewaki Nakagawa, Steven J Newhouse, David Torrents; led by Lincoln D. Stein. PCAWG data portals: major contributions from Mary Goldman, Junjun Zhang, Nuno A. 
Fonseca, Isidro Cortes-Ciriano; further contributions from Qian Xiang, Brian Craft, Elena Pineiro-Yanez, Brian D. O’Connor, Wojciech Bazant, Elisabet Barrera, Alfonso Munoz, Robert Petryszak, Anja Fullgrabe, Fatima Al-Shahrour, Maria Keays, David Haussler, John Weinstein, Wolfgang Huber, Alfonso Valencia, Irene Papatheodorou, Jingchun Zhu; led by Vincent Ferretti, Miguel Vazquez.

Data availability

The PCAWG-generated alignments, somatic variant calls, annotations and derived datasets are available for general research use for browsing and download at http://dcc.icgc.org/pcawg/ (Box 1 and Supplementary Table 4). In accordance with the data access policies of the ICGC and TCGA projects, most molecular, clinical and specimen data are in an open tier which does not require access approval. To access potentially identifying information, such as germline alleles and underlying read data, researchers will need to apply to the TCGA Data Access Committee (DAC) via dbGaP (https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) for access to the TCGA portion of the dataset, and to the ICGC Data Access Compliance Office (DACO; http://icgc.org/daco) for the ICGC portion. In addition, to access somatic single nucleotide variants derived from TCGA donors, researchers will also need to obtain dbGaP authorization.

Beyond the core sequence data and variant call-sets, the analyses in this paper used a number of datasets that were derived from the variant calls (Supplementary Table 4). The individual datasets are available at Synapse (https://www.synapse.org/), and are denoted with synXXXXX accession numbers; all these datasets are also mirrored at https://dcc.icgc.org, with full links, filenames, accession numbers and descriptions detailed in Supplementary Table 4. The datasets encompass: clinical data from each patient including demographics, tumour stage and vital status (syn10389158); harmonized tumour histopathology annotations using a standardised hierarchical ontology (syn1038916); inferred purity and ploidy values for each tumour sample (syn8272483); driver mutations for each patient from their cancer genome spanning all classes of variant, and coding versus non-coding drivers (syn11639581); mutational signatures inferred from PCAWG donors (syn11804065), including APOBEC mutagenesis (syn7437313); and transcriptional data from RNA sequencing, including gene expression levels (syn5553985, syn5553991, syn8105922) and gene fusions (syn10003873, syn7221157).
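As a convenience for readers who script their data retrieval, the accession list above can be held in a small lookup table. The following is a minimal sketch, not part of the PCAWG pipelines: the dictionary keys, the helper names `is_valid_accession` and `synapse_url`, and the URL pattern used to build browser links are assumptions made for illustration. The accessions are copied as printed in this statement and should be checked against Supplementary Table 4 before use.

```python
import re

# Accessions copied verbatim from the Data availability statement.
# The category labels (dictionary keys) are illustrative, not official names.
PCAWG_DATASETS = {
    "clinical_data": ["syn10389158"],
    "histopathology_annotations": ["syn1038916"],
    "purity_and_ploidy": ["syn8272483"],
    "driver_mutations": ["syn11639581"],
    "mutational_signatures": ["syn11804065"],
    "apobec_mutagenesis": ["syn7437313"],
    "gene_expression": ["syn5553985", "syn5553991", "syn8105922"],
    "gene_fusions": ["syn10003873", "syn7221157"],
}

# The text denotes datasets as "synXXXXX": the prefix "syn" followed by digits.
_ACCESSION_RE = re.compile(r"^syn\d+$")


def is_valid_accession(accession: str) -> bool:
    """Return True if the string has the synXXXXX shape described in the text."""
    return bool(_ACCESSION_RE.match(accession))


def synapse_url(accession: str) -> str:
    """Build a browser link for an accession.

    Assumption: the '#!Synapse:<id>' URL pattern is not stated in the paper.
    """
    if not is_valid_accession(accession):
        raise ValueError(f"not a Synapse accession: {accession!r}")
    return f"https://www.synapse.org/#!Synapse:{accession}"


def all_accessions() -> list[str]:
    """Flatten the table into a single list, preserving the order in the text."""
    return [acc for group in PCAWG_DATASETS.values() for acc in group]


if __name__ == "__main__":
    for name, accessions in PCAWG_DATASETS.items():
        for acc in accessions:
            print(f"{name}\t{acc}\t{synapse_url(acc)}")
```

Keeping the accessions in one table makes it easy to check each entry against Supplementary Table 4, which remains the authoritative source for links, filenames and descriptions.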

Code availability

Computational pipelines for calling somatic mutations are available to the public at https://dockstore.org/organizations/PCAWG/collections/PCAWG. A range of data-visualization and -exploration tools are also available for the PCAWG data (Box 1).

Competing interests

Gad Getz receives research funds from IBM and Pharmacyclics and is an inventor on patent applications related to MuTect, ABSOLUTE, MutSig, MSMuTect, MSMutSig and POLYSOLVER. Hikmat Al-Ahmadie is a consultant for AstraZeneca and Bristol-Myers Squibb. Samuel Aparicio is a founder and shareholder of Contextual Genomics. Pratiti Bandopadhayay receives grant funding from Novartis for an unrelated project. Rameen Beroukhim owns equity in Ampressa Therapeutics. Andrew Biankin receives grant funding from Celgene and AstraZeneca and is a consultant for or on advisory boards of AstraZeneca, Celgene, Elstar Therapeutics, Clovis Oncology and Roche. Ewan Birney is a consultant for Oxford Nanopore, Dovetail and GSK. Marcus Bosenberg is a consultant for Eli Lilly. Atul Butte is a cofounder of and consultant for Personalis and NuMedii, a consultant for Samsung, Geisinger Health, Mango Tree Corporation and Regenstrief Institute and, in the recent past, a consultant for 10x Genomics and Helix, a shareholder in Personalis, a minor shareholder in Apple, Twitter, Facebook, Google, Microsoft, Sarepta, 10x Genomics, Amazon, Biogen, CVS, Illumina, Snap and Sutro and has received honoraria and travel reimbursement for invited talks from Genentech, Roche, Pfizer, Optum, AbbVie and many academic institutions and health systems. Carlos Caldas has served on the Scientific Advisory Board of Illumina. Lorraine Chantrill acted on an advisory board for AMGEN Australia in the past 2 years. Andrew D. Cherniack receives research funding from Bayer. Helen Davies is an inventor on a number of patent applications that encompass the use of mutational signatures. Francisco De La Vega was employed at Annai Systems during part of the project. Ronny Drapkin serves on the scientific advisory board of Repare Therapeutics and Siamab Therapeutics. 
Rosalind Eeles has received an honorarium for the GU-ASCO meeting in San Francisco in January 2016 as a speaker, an honorarium and support from Janssen for the RMH FR meeting in November 2017 as a speaker (title: genetics and prostate cancer), an honorarium for a University of Chicago invited talk in May 2018 as a speaker and an educational honorarium paid by Bayer & Ipsen to attend GU Connect ‘Treatment sequencing for mCRPC patients within the changing landscape of mHSPC’ at a venue at ESMO, Barcelona, on 28 September 2019. Paul Flicek is a member of the scientific advisory boards of Fabric Genomics and Eagle Genomics. Ronald Ghossein is a consultant for Veracyte. Dominik Glodzik is an inventor on a number of patent applications that encompass the use of mutational signatures. Eoghan Harrington is a full-time employee of Oxford Nanopore Technologies and is a stock holder. Yann Joly is responsible for the Data Access Compliance Office (DACO) of ICGC 2009-2018. Sissel Juul is a full-time employee of Oxford Nanopore Technologies and is a stock holder. Vincent Khoo has received personal fees and non-financial support from Accuray, Astellas, Bayer, Boston Scientific and Janssen. Stian Knappskog is a coprincipal investigator on a clinical trial that receives research funding from AstraZeneca and Pfizer. Ignaty Leshchiner is a consultant for PACT Pharma. Carlos López-Otín has ownership interest (including stock and patents) in DREAMgenics. Matthew Meyerson is a scientific advisory board chair of, and consultant for, OrigiMed, has obtained research funding from Bayer and Ono Pharma and receives patent royalties from LabCorp. Serena Nik-Zainal is an inventor on a number of patent applications that encompass the use of mutational signatures. Nathan Pennell has done consulting work with Merck, AstraZeneca, Eli Lilly and Bristol-Myers Squibb. Xose S. Puente has ownership interest (including stock and patents) in DREAMgenics. Benjamin J. 
Raphael is a consultant for and has ownership interest (including stock and patents) in Medley Genomics. Jorge Reis-Filho is a consultant for Goldman Sachs and REPARE Therapeutics, a member of the scientific advisory boards of Volition RX and Paige.AI and an ad hoc member of the scientific advisory boards of Ventana Medical Systems, Roche Tissue Diagnostics, InVicro, Roche, Genentech and Novartis. Lewis R. Roberts has received grant support from ARIAD Pharmaceuticals, Bayer, BTG International, Exact Sciences, Gilead Sciences, Glycotest, RedHill Biopharma, Target PharmaSolutions and Wako Diagnostics and has provided advisory services to Bayer, Exact Sciences, Gilead Sciences, GRAIL, QED Therapeutics and TAVEC Pharmaceuticals. Richard A. Scolyer has received fees for professional services from Merck Sharp & Dohme, GlaxoSmithKline Australia, Bristol-Myers Squibb, Dermpedia, Novartis Pharmaceuticals Australia, Myriad, NeraCare GmbH and Amgen. Tal Shmaya is employed at Annai Systems. Reiner Siebert has received speaker honoraria from Roche and AstraZeneca. Sabina Signoretti is a consultant for Bristol-Myers Squibb, AstraZeneca, Merck, AACR and NCI and has received funding from Bristol-Myers Squibb, AstraZeneca and Exelixis and royalties from Biogenex. Jared Simpson has received research funding and travel support from Oxford Nanopore Technologies. Anil K. Sood is a consultant for Merck and Kiyatec, has received research funding from M-Trap and is a shareholder in BioPath. Simon Tavaré is on the scientific advisory board of Ipsen and a consultant for Kallyope. John F. Thompson has received honoraria and travel support for attending advisory board meetings of GlaxoSmithKline and Provectus and has received honoraria for participation in advisory boards for MSD Australia and BMS Australia. Daniel Turner is a full-time employee of Oxford Nanopore Technologies and is a stock holder. 
Naveen Vasudev has received speaker honoraria and/or consultancy fees from Bristol-Myers Squibb, Pfizer, EUSA Pharma, MSD and Novartis. Jeremiah A. Wala is a consultant for Nference. Daniel J. Weisenberger is a consultant for Zymo Research. Dai-Ying Wu is employed at Annai Systems. Cheng-Zhong Zhang is a cofounder and equity holder of Pillar Biosciences, a for-profit company that specializes in the development of targeted sequencing assays. The other authors declare no competing interests.

Footnotes

Peer review information Nature thanks Arul Chinnaiyan, Ben Lehner, Nicolas Robine and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors jointly supervised this work: Peter J. Campbell, Gad Getz, Jan O. Korbel, Joshua M. Stuart, Lincoln D. Stein

A list of members and their affiliations appears in the online version of the paper and lists of working groups appear in the Supplementary Information

Change history

1/25/2023

A Correction to this paper has been published: 10.1038/s41586-022-05598-w

Contributor Information

The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium:

Lauri A. Aaltonen, Federico Abascal, Adam Abeshouse, Hiroyuki Aburatani, David J. Adams, Nishant Agrawal, Keun Soo Ahn, Sung-Min Ahn, Hiroshi Aikata, Rehan Akbani, Kadir C. Akdemir, Hikmat Al-Ahmadie, Sultan T. Al-Sedairy, Fatima Al-Shahrour, Malik Alawi, Monique Albert, Kenneth Aldape, Ludmil B. Alexandrov, Adrian Ally, Kathryn Alsop, Eva G. Alvarez, Fernanda Amary, Samirkumar B. Amin, Brice Aminou, Ole Ammerpohl, Matthew J. Anderson, Yeng Ang, Davide Antonello, Pavana Anur, Samuel Aparicio, Elizabeth L. Appelbaum, Yasuhito Arai, Axel Aretz, Koji Arihiro, Shun-ichi Ariizumi, Joshua Armenia, Laurent Arnould, Sylvia Asa, Yassen Assenov, Gurnit Atwal, Sietse Aukema, J. Todd Auman, Miriam R. R. Aure, Philip Awadalla, Marta Aymerich, Gary D. Bader, Adrian Baez-Ortega, Matthew H. Bailey, Peter J. Bailey, Miruna Balasundaram, Saianand Balu, Pratiti Bandopadhayay, Rosamonde E. Banks, Stefano Barbi, Andrew P. Barbour, Jonathan Barenboim, Jill Barnholtz-Sloan, Hugh Barr, Elisabet Barrera, John Bartlett, Javier Bartolome, Claudio Bassi, Oliver F. Bathe, Daniel Baumhoer, Prashant Bavi, Stephen B. Baylin, Wojciech Bazant, Duncan Beardsmore, Timothy A. Beck, Sam Behjati, Andreas Behren, Beifang Niu, Cindy Bell, Sergi Beltran, Christopher Benz, Andrew Berchuck, Anke K. Bergmann, Erik N. Bergstrom, Benjamin P. Berman, Daniel M. Berney, Stephan H. Bernhart, Rameen Beroukhim, Mario Berrios, Samantha Bersani, Johanna Bertl, Miguel Betancourt, Vinayak Bhandari, Shriram G. Bhosle, Andrew V. Biankin, Matthias Bieg, Darell Bigner, Hans Binder, Ewan Birney, Michael Birrer, Nidhan K. Biswas, Bodil Bjerkehagen, Tom Bodenheimer, Lori Boice, Giada Bonizzato, Johann S. De Bono, Arnoud Boot, Moiz S. Bootwalla, Ake Borg, Arndt Borkhardt, Keith A. Boroevich, Ivan Borozan, Christoph Borst, Marcus Bosenberg, Mattia Bosio, Jacqueline Boultwood, Guillaume Bourque, Paul C. Boutros, G. Steven Bova, David T. Bowen, Reanne Bowlby, David D. L. 
Bowtell, Sandrine Boyault, Rich Boyce, Jeffrey Boyd, Alvis Brazma, Paul Brennan, Daniel S. Brewer, Arie B. Brinkman, Robert G. Bristow, Russell R. Broaddus, Jane E. Brock, Malcolm Brock, Annegien Broeks, Angela N. Brooks, Denise Brooks, Benedikt Brors, Søren Brunak, Timothy J. C. Bruxner, Alicia L. Bruzos, Alex Buchanan, Ivo Buchhalter, Christiane Buchholz, Susan Bullman, Hazel Burke, Birgit Burkhardt, Kathleen H. Burns, John Busanovich, Carlos D. Bustamante, Adam P. Butler, Atul J. Butte, Niall J. Byrne, Anne-Lise Børresen-Dale, Samantha J. Caesar-Johnson, Andy Cafferkey, Declan Cahill, Claudia Calabrese, Carlos Caldas, Fabien Calvo, Niedzica Camacho, Peter J. Campbell, Elias Campo, Cinzia Cantù, Shaolong Cao, Thomas E. Carey, Joana Carlevaro-Fita, Rebecca Carlsen, Ivana Cataldo, Mario Cazzola, Jonathan Cebon, Robert Cerfolio, Dianne E. Chadwick, Dimple Chakravarty, Don Chalmers, Calvin Wing Yiu Chan, Kin Chan, Michelle Chan-Seng-Yue, Vishal S. Chandan, David K. Chang, Stephen J. Chanock, Lorraine A. Chantrill, Aurélien Chateigner, Nilanjan Chatterjee, Kazuaki Chayama, Hsiao-Wei Chen, Jieming Chen, Ken Chen, Yiwen Chen, Zhaohong Chen, Andrew D. Cherniack, Jeremy Chien, Yoke-Eng Chiew, Suet-Feung Chin, Juok Cho, Sunghoon Cho, Jung Kyoon Choi, Wan Choi, Christine Chomienne, Zechen Chong, Su Pin Choo, Angela Chou, Angelika N. Christ, Elizabeth L. Christie, Eric Chuah, Carrie Cibulskis, Kristian Cibulskis, Sara Cingarlini, Peter Clapham, Alexander Claviez, Sean Cleary, Nicole Cloonan, Marek Cmero, Colin C. Collins, Ashton A. Connor, Susanna L. Cooke, Colin S. Cooper, Leslie Cope, Vincenzo Corbo, Matthew G. Cordes, Stephen M. Cordner, Isidro Cortés-Ciriano, Kyle Covington, Prue A. Cowin, Brian Craft, David Craft, Chad J. Creighton, Yupeng Cun, Erin Curley, Ioana Cutcutache, Karolina Czajka, Bogdan Czerniak, Rebecca A. Dagg, Ludmila Danilova, Maria Vittoria Davi, Natalie R. Davidson, Helen Davies, Ian J. Davis, Brandi N. Davis-Dusenbery, Kevin J. Dawson, Francisco M. 
De La Vega, Ricardo De Paoli-Iseppi, Timothy Defreitas, Angelo P. Dei Tos, Olivier Delaneau, John A. Demchok, Jonas Demeulemeester, German M. Demidov, Deniz Demircioğlu, Nening M. Dennis, Robert E. Denroche, Stefan C. Dentro, Nikita Desai, Vikram Deshpande, Amit G. Deshwar, Christine Desmedt, Jordi Deu-Pons, Noreen Dhalla, Neesha C. Dhani, Priyanka Dhingra, Rajiv Dhir, Anthony DiBiase, Klev Diamanti, Li Ding, Shuai Ding, Huy Q. Dinh, Luc Dirix, HarshaVardhan Doddapaneni, Nilgun Donmez, Michelle T. Dow, Ronny Drapkin, Oliver Drechsel, Ruben M. Drews, Serge Serge, Tim Dudderidge, Ana Dueso-Barroso, Andrew J. Dunford, Michael Dunn, Lewis Jonathan Dursi, Fraser R. Duthie, Ken Dutton-Regester, Jenna Eagles, Douglas F. Easton, Stuart Edmonds, Paul A. Edwards, Sandra E. Edwards, Rosalind A. Eeles, Anna Ehinger, Juergen Eils, Roland Eils, Adel El-Naggar, Matthew Eldridge, Kyle Ellrott, Serap Erkek, Georgia Escaramis, Shadrielle M. G. Espiritu, Xavier Estivill, Dariush Etemadmoghadam, Jorunn E. Eyfjord, Bishoy M. Faltas, Daiming Fan, Yu Fan, William C. Faquin, Claudiu Farcas, Matteo Fassan, Aquila Fatima, Francesco Favero, Nodirjon Fayzullaev, Ina Felau, Sian Fereday, Martin L. Ferguson, Vincent Ferretti, Lars Feuerbach, Matthew A. Field, J. Lynn Fink, Gaetano Finocchiaro, Cyril Fisher, Matthew W. Fittall, Anna Fitzgerald, Rebecca C. Fitzgerald, Adrienne M. Flanagan, Neil E. Fleshner, Paul Flicek, John A. Foekens, Kwun M. Fong, Nuno A. Fonseca, Christopher S. Foster, Natalie S. Fox, Michael Fraser, Scott Frazer, Milana Frenkel-Morgenstern, William Friedman, Joan Frigola, Catrina C. Fronick, Akihiro Fujimoto, Masashi Fujita, Masashi Fukayama, Lucinda A. Fulton, Robert S. Fulton, Mayuko Furuta, P. Andrew Futreal, Anja Füllgrabe, Stacey B. Gabriel, Steven Gallinger, Carlo Gambacorti-Passerini, Jianjiong Gao, Shengjie Gao, Levi Garraway, Øystein Garred, Erik Garrison, Dale W. Garsed, Nils Gehlenborg, Josep L. L. Gelpi, Joshy George, Daniela S. 
Gerhard, Clarissa Gerhauser, Jeffrey E. Gershenwald, Mark Gerstein, Moritz Gerstung, Gad Getz, Mohammed Ghori, Ronald Ghossein, Nasra H. Giama, Richard A. Gibbs, Bob Gibson, Anthony J. Gill, Pelvender Gill, Dilip D. Giri, Dominik Glodzik, Vincent J. Gnanapragasam, Maria Elisabeth Goebler, Mary J. Goldman, Carmen Gomez, Santiago Gonzalez, Abel Gonzalez-Perez, Dmitry A. Gordenin, James Gossage, Kunihito Gotoh, Ramaswamy Govindan, Dorthe Grabau, Janet S. Graham, Robert C. Grant, Anthony R. Green, Eric Green, Liliana Greger, Nicola Grehan, Sonia Grimaldi, Sean M. Grimmond, Robert L. Grossman, Adam Grundhoff, Gunes Gundem, Qianyun Guo, Manaswi Gupta, Shailja Gupta, Ivo G. Gut, Marta Gut, Jonathan Göke, Gavin Ha, Andrea Haake, David Haan, Siegfried Haas, Kerstin Haase, James E. Haber, Nina Habermann, Faraz Hach, Syed Haider, Natsuko Hama, Freddie C. Hamdy, Anne Hamilton, Mark P. Hamilton, Leng Han, George B. Hanna, Martin Hansmann, Nicholas J. Haradhvala, Olivier Harismendy, Ivon Harliwong, Arif O. Harmanci, Eoghan Harrington, Takanori Hasegawa, David Haussler, Steve Hawkins, Shinya Hayami, Shuto Hayashi, D. Neil Hayes, Stephen J. Hayes, Nicholas K. Hayward, Steven Hazell, Yao He, Allison P. Heath, Simon C. Heath, David Hedley, Apurva M. Hegde, David I. Heiman, Michael C. Heinold, Zachary Heins, Lawrence E. Heisler, Eva Hellstrom-Lindberg, Mohamed Helmy, Seong Gu Heo, Austin J. Hepperla, José María Heredia-Genestar, Carl Herrmann, Peter Hersey, Julian M. Hess, Holmfridur Hilmarsdottir, Jonathan Hinton, Satoshi Hirano, Nobuyoshi Hiraoka, Katherine A. Hoadley, Asger Hobolth, Ermin Hodzic, Jessica I. Hoell, Steve Hoffmann, Oliver Hofmann, Andrea Holbrook, Aliaksei Z. Holik, Michael A. Hollingsworth, Oliver Holmes, Robert A. Holt, Chen Hong, Eun Pyo Hong, Jongwhi H. Hong, Gerrit K. Hooijer, Henrik Hornshøj, Fumie Hosoda, Yong Hou, Volker Hovestadt, William Howat, Alan P. Hoyle, Ralph H. 
Hruban, Jianhong Hu, Taobo Hu, Xing Hua, Kuan-lin Huang, Mei Huang, Mi Ni Huang, Vincent Huang, Yi Huang, Wolfgang Huber, Thomas J. Hudson, Michael Hummel, Jillian A. Hung, David Huntsman, Ted R. Hupp, Jason Huse, Matthew R. Huska, Barbara Hutter, Carolyn M. Hutter, Daniel Hübschmann, Christine A. Iacobuzio-Donahue, Charles David Imbusch, Marcin Imielinski, Seiya Imoto, William B. Isaacs, Keren Isaev, Shumpei Ishikawa, Murat Iskar, S. M. Ashiqul Islam, Michael Ittmann, Sinisa Ivkovic, Jose M. G. Izarzugaza, Jocelyne Jacquemier, Valerie Jakrot, Nigel B. Jamieson, Gun Ho Jang, Se Jin Jang, Joy C. Jayaseelan, Reyka Jayasinghe, Stuart R. Jefferys, Karine Jegalian, Jennifer L. Jennings, Seung-Hyup Jeon, Lara Jerman, Yuan Ji, Wei Jiao, Peter A. Johansson, Amber L. Johns, Jeremy Johns, Rory Johnson, Todd A. Johnson, Clemency Jolly, Yann Joly, Jon G. Jonasson, Corbin D. Jones, David R. Jones, David T. W. Jones, Nic Jones, Steven J. M. Jones, Jos Jonkers, Young Seok Ju, Hartmut Juhl, Jongsun Jung, Malene Juul, Randi Istrup Juul, Sissel Juul, Natalie Jäger, Rolf Kabbe, Andre Kahles, Abdullah Kahraman, Vera B. Kaiser, Hojabr Kakavand, Sangeetha Kalimuthu, Christof von Kalle, Koo Jeong Kang, Katalin Karaszi, Beth Karlan, Rosa Karlić, Dennis Karsch, Katayoon Kasaian, Karin S. Kassahn, Hitoshi Katai, Mamoru Kato, Hiroto Katoh, Yoshiiku Kawakami, Jonathan D. Kay, Stephen H. Kazakoff, Marat D. Kazanov, Maria Keays, Electron Kebebew, Richard F. Kefford, Manolis Kellis, James G. Kench, Catherine J. Kennedy, Jules N. A. Kerssemakers, David Khoo, Vincent Khoo, Narong Khuntikeo, Ekta Khurana, Helena Kilpinen, Hark Kyun Kim, Hyung-Lae Kim, Hyung-Yong Kim, Hyunghwan Kim, Jaegil Kim, Jihoon Kim, Jong K. Kim, Youngwook Kim, Tari A. King, Wolfram Klapper, Kortine Kleinheinz, Leszek J. Klimczak, Stian Knappskog, Michael Kneba, Bartha M. Knoppers, Youngil Koh, Jan Komorowski, Daisuke Komura, Mitsuhiro Komura, Gu Kong, Marcel Kool, Jan O. 
Korbel, Viktoriya Korchina, Andrey Korshunov, Michael Koscher, Roelof Koster, Zsofia Kote-Jarai, Antonios Koures, Milena Kovacevic, Barbara Kremeyer, Helene Kretzmer, Markus Kreuz, Savitri Krishnamurthy, Dieter Kube, Kiran Kumar, Pardeep Kumar, Sushant Kumar, Yogesh Kumar, Ritika Kundra, Kirsten Kübler, Ralf Küppers, Jesper Lagergren, Phillip H. Lai, Peter W. Laird, Sunil R. Lakhani, Christopher M. Lalansingh, Emilie Lalonde, Fabien C. Lamaze, Adam Lambert, Eric Lander, Pablo Landgraf, Luca Landoni, Anita Langerød, Andrés Lanzós, Denis Larsimont, Erik Larsson, Mark Lathrop, Loretta M. S. Lau, Chris Lawerenz, Rita T. Lawlor, Michael S. Lawrence, Alexander J. Lazar, Ana Mijalkovic Lazic, Xuan Le, Darlene Lee, Donghoon Lee, Eunjung Alice Lee, Hee Jin Lee, Jake June-Koo Lee, Jeong-Yeon Lee, Juhee Lee, Ming Ta Michael Lee, Henry Lee-Six, Kjong-Van Lehmann, Hans Lehrach, Dido Lenze, Conrad R. Leonard, Daniel A. Leongamornlert, Ignaty Leshchiner, Louis Letourneau, Ivica Letunic, Douglas A. Levine, Lora Lewis, Tim Ley, Chang Li, Constance H. Li, Haiyan Irene Li, Jun Li, Lin Li, Shantao Li, Siliang Li, Xiaobo Li, Xiaotong Li, Xinyue Li, Yilong Li, Han Liang, Sheng-Ben Liang, Peter Lichter, Pei Lin, Ziao Lin, W. M. Linehan, Ole Christian Lingjærde, Dongbing Liu, Eric Minwei Liu, Fei-Fei Fei Liu, Fenglin Liu, Jia Liu, Xingmin Liu, Julie Livingstone, Dimitri Livitz, Naomi Livni, Lucas Lochovsky, Markus Loeffler, Georgina V. Long, Armando Lopez-Guillermo, Shaoke Lou, David N. Louis, Laurence B. Lovat, Yiling Lu, Yong-Jie Lu, Youyong Lu, Claudio Luchini, Ilinca Lungu, Xuemei Luo, Hayley J. Luxton, Andy G. Lynch, Lisa Lype, Cristina López, Carlos López-Otín, Eric Z. Ma, Yussanne Ma, Gaetan MacGrogan, Shona MacRae, Geoff Macintyre, Tobias Madsen, Kazuhiro Maejima, Andrea Mafficini, Dennis T. Maglinte, Arindam Maitra, Partha P. Majumder, Luca Malcovati, Salem Malikic, Giuseppe Malleo, Graham J. Mann, Luisa Mantovani-Löffler, Kathleen Marchal, Giovanni Marchegiani, Elaine R. 
Mardis, Adam A. Margolin, Maximillian G. Marin, Florian Markowetz, Julia Markowski, Jeffrey Marks, Tomas Marques-Bonet, Marco A. Marra, Luke Marsden, John W. M. Martens, Sancha Martin, Jose I. Martin-Subero, Iñigo Martincorena, Alexander Martinez-Fundichely, Yosef E. Maruvka, R. Jay Mashl, Charlie E. Massie, Thomas J. Matthew, Lucy Matthews, Erik Mayer, Simon Mayes, Michael Mayo, Faridah Mbabaali, Karen McCune, Ultan McDermott, Patrick D. McGillivray, Michael D. McLellan, John D. McPherson, John R. McPherson, Treasa A. McPherson, Samuel R. Meier, Alice Meng, Shaowu Meng, Andrew Menzies, Neil D. Merrett, Sue Merson, Matthew Meyerson, William Meyerson, Piotr A. Mieczkowski, George L. Mihaiescu, Sanja Mijalkovic, Tom Mikkelsen, Michele Milella, Linda Mileshkin, Christopher A. Miller, David K. Miller, Jessica K. Miller, Gordon B. Mills, Ana Milovanovic, Sarah Minner, Marco Miotto, Gisela Mir Arnau, Lisa Mirabello, Chris Mitchell, Thomas J. Mitchell, Satoru Miyano, Naoki Miyoshi, Shinichi Mizuno, Fruzsina Molnár-Gábor, Malcolm J. Moore, Richard A. Moore, Sandro Morganella, Quaid D. Morris, Carl Morrison, Lisle E. Mose, Catherine D. Moser, Ferran Muiños, Loris Mularoni, Andrew J. Mungall, Karen Mungall, Elizabeth A. Musgrove, Ville Mustonen, David Mutch, Francesc Muyas, Donna M. Muzny, Alfonso Muñoz, Jerome Myers, Ola Myklebost, Peter Möller, Genta Nagae, Adnan M. Nagrial, Hardeep K. Nahal-Bose, Hitoshi Nakagama, Hidewaki Nakagawa, Hiromi Nakamura, Toru Nakamura, Kaoru Nakano, Tannistha Nandi, Jyoti Nangalia, Mia Nastic, Arcadi Navarro, Fabio C. P. Navarro, David E. Neal, Gerd Nettekoven, Felicity Newell, Steven J. Newhouse, Yulia Newton, Alvin Wei Tian Ng, Anthony Ng, Jonathan Nicholson, David Nicol, Yongzhan Nie, G. Petur Nielsen, Morten Muhlig Nielsen, Serena Nik-Zainal, Michael S. Noble, Katia Nones, Paul A. Northcott, Faiyaz Notta, Brian D. O’Connor, Peter O’Donnell, Maria O’Donovan, Sarah O’Meara, Brian Patrick O’Neill, J. 
Robert O’Neill, David Ocana, Angelica Ochoa, Layla Oesper, Christopher Ogden, Hideki Ohdan, Kazuhiro Ohi, Lucila Ohno-Machado, Karin A. Oien, Akinyemi I. Ojesina, Hidenori Ojima, Takuji Okusaka, Larsson Omberg, Choon Kiat Ong, Stephan Ossowski, German Ott, B. F. Francis Ouellette, Christine P’ng, Marta Paczkowska, Salvatore Paiella, Chawalit Pairojkul, Marina Pajic, Qiang Pan-Hammarström, Elli Papaemmanuil, Irene Papatheodorou, Nagarajan Paramasivam, Ji Wan Park, Joong-Won Park, Keunchil Park, Kiejung Park, Peter J. Park, Joel S. Parker, Simon L. Parsons, Harvey Pass, Danielle Pasternack, Alessandro Pastore, Ann-Marie Patch, Iris Pauporté, Antonio Pea, John V. Pearson, Chandra Sekhar Pedamallu, Jakob Skou Pedersen, Paolo Pederzoli, Martin Peifer, Nathan A. Pennell, Charles M. Perou, Marc D. Perry, Gloria M. Petersen, Myron Peto, Nicholas Petrelli, Robert Petryszak, Stefan M. Pfister, Mark Phillips, Oriol Pich, Hilda A. Pickett, Todd D. Pihl, Nischalan Pillay, Sarah Pinder, Mark Pinese, Andreia V. Pinho, Esa Pitkänen, Xavier Pivot, Elena Piñeiro-Yáñez, Laura Planko, Christoph Plass, Paz Polak, Tirso Pons, Irinel Popescu, Olga Potapova, Aparna Prasad, Shaun R. Preston, Manuel Prinz, Antonia L. Pritchard, Stephenie D. Prokopec, Elena Provenzano, Xose S. Puente, Sonia Puig, Montserrat Puiggròs, Sergio Pulido-Tamayo, Gulietta M. Pupo, Colin A. Purdie, Michael C. Quinn, Raquel Rabionet, Janet S. Rader, Bernhard Radlwimmer, Petar Radovic, Benjamin Raeder, Keiran M. Raine, Manasa Ramakrishna, Kamna Ramakrishnan, Suresh Ramalingam, Benjamin J. Raphael, W. Kimryn Rathmell, Tobias Rausch, Guido Reifenberger, Jüri Reimand, Jorge Reis-Filho, Victor Reuter, Iker Reyes-Salazar, Matthew A. Reyna, Sheila M. Reynolds, Esther Rheinbay, Yasser Riazalhosseini, Andrea L. Richardson, Julia Richter, Matthew Ringel, Markus Ringnér, Yasushi Rino, Karsten Rippe, Jeffrey Roach, Lewis R. Roberts, Nicola D. Roberts, Steven A. Roberts, A. Gordon Robertson, Alan J. 
Robertson, Javier Bartolomé Rodriguez, Bernardo Rodriguez-Martin, F. Germán Rodríguez-González, Michael H. A. Roehrl, Marius Rohde, Hirofumi Rokutan, Gilles Romieu, Ilse Rooman, Tom Roques, Daniel Rosebrock, Mara Rosenberg, Philip C. Rosenstiel, Andreas Rosenwald, Edward W. Rowe, Romina Royo, Steven G. Rozen, Yulia Rubanova, Mark A. Rubin, Carlota Rubio-Perez, Vasilisa A. Rudneva, Borislav C. Rusev, Andrea Ruzzenente, Gunnar Rätsch, Radhakrishnan Sabarinathan, Veronica Y. Sabelnykova, Sara Sadeghi, S. Cenk Sahinalp, Natalie Saini, Mihoko Saito-Adachi, Gordon Saksena, Adriana Salcedo, Roberto Salgado, Leonidas Salichos, Richard Sallari, Charles Saller, Roberto Salvia, Michelle Sam, Jaswinder S. Samra, Francisco Sanchez-Vega, Chris Sander, Grant Sanders, Rajiv Sarin, Iman Sarrafi, Aya Sasaki-Oku, Torill Sauer, Guido Sauter, Robyn P. M. Saw, Maria Scardoni, Christopher J. Scarlett, Aldo Scarpa, Ghislaine Scelo, Dirk Schadendorf, Jacqueline E. Schein, Markus B. Schilhabel, Matthias Schlesner, Thorsten Schlomm, Heather K. Schmidt, Sarah-Jane Schramm, Stefan Schreiber, Nikolaus Schultz, Steven E. Schumacher, Roland F. Schwarz, Richard A. Scolyer, David Scott, Ralph Scully, Raja Seethala, Ayellet V. Segre, Iris Selander, Colin A. Semple, Yasin Senbabaoglu, Subhajit Sengupta, Elisabetta Sereni, Stefano Serra, Dennis C. Sgroi, Mark Shackleton, Nimish C. Shah, Sagedeh Shahabi, Catherine A. Shang, Ping Shang, Ofer Shapira, Troy Shelton, Ciyue Shen, Hui Shen, Rebecca Shepherd, Ruian Shi, Yan Shi, Yu-Jia Shiah, Tatsuhiro Shibata, Juliann Shih, Eigo Shimizu, Kiyo Shimizu, Seung Jun Shin, Yuichi Shiraishi, Tal Shmaya, Ilya Shmulevich, Solomon I. Shorser, Charles Short, Raunak Shrestha, Suyash S. Shringarpure, Craig Shriver, Shimin Shuai, Nikos Sidiropoulos, Reiner Siebert, Anieta M. Sieuwerts, Lina Sieverling, Sabina Signoretti, Katarzyna O. Sikora, Michele Simbolo, Ronald Simon, Janae V. Simons, Jared T. Simpson, Peter T. 
Simpson, Samuel Singer, Nasa Sinnott-Armstrong, Payal Sipahimalani, Tara J. Skelly, Marcel Smid, Jaclyn Smith, Karen Smith-McCune, Nicholas D. Socci, Heidi J. Sofia, Matthew G. Soloway, Lei Song, Anil K. Sood, Sharmila Sothi, Christos Sotiriou, Cameron M. Soulette, Paul N. Span, Paul T. Spellman, Nicola Sperandio, Andrew J. Spillane, Oliver Spiro, Jonathan Spring, Johan Staaf, Peter F. Stadler, Peter Staib, Stefan G. Stark, Lucy Stebbings, Ólafur Andri Stefánsson, Oliver Stegle, Lincoln D. Stein, Alasdair Stenhouse, Chip Stewart, Stephan Stilgenbauer, Miranda D. Stobbe, Michael R. Stratton, Jonathan R. Stretch, Adam J. Struck, Joshua M. Stuart, Henk G. Stunnenberg, Hong Su, Xiaoping Su, Ren X. Sun, Stephanie Sungalee, Hana Susak, Akihiro Suzuki, Fred Sweep, Monika Szczepanowski, Holger Sültmann, Takashi Yugawa, Angela Tam, David Tamborero, Benita Kiat Tee Tan, Donghui Tan, Patrick Tan, Hiroko Tanaka, Hirokazu Taniguchi, Tomas J. Tanskanen, Maxime Tarabichi, Roy Tarnuzzer, Patrick Tarpey, Morgan L. Taschuk, Kenji Tatsuno, Simon Tavaré, Darrin F. Taylor, Amaro Taylor-Weiner, Jon W. Teague, Bin Tean Teh, Varsha Tembe, Javier Temes, Kevin Thai, Sarah P. Thayer, Nina Thiessen, Gilles Thomas, Sarah Thomas, Alan Thompson, Alastair M. Thompson, John F. F. Thompson, R. Houston Thompson, Heather Thorne, Leigh B. Thorne, Adrian Thorogood, Grace Tiao, Nebojsa Tijanic, Lee E. Timms, Roberto Tirabosco, Marta Tojo, Stefania Tommasi, Christopher W. Toon, Umut H. Toprak, David Torrents, Giampaolo Tortora, Jörg Tost, Yasushi Totoki, David Townend, Nadia Traficante, Isabelle Treilleux, Jean-Rémi Trotta, Lorenz H. P. Trümper, Ming Tsao, Tatsuhiko Tsunoda, Jose M. C. Tubio, Olga Tucker, Richard Turkington, Daniel J. Turner, Andrew Tutt, Masaki Ueno, Naoto T. Ueno, Christopher Umbricht, Husen M. Umer, Timothy J. Underwood, Lara Urban, Tomoko Urushidate, Tetsuo Ushiku, Liis Uusküla-Reimand, Alfonso Valencia, David J. Van Den Berg, Steven Van Laere, Peter Van Loo, Erwin G. 
Van Meir, Gert G. Van den Eynden, Theodorus Van der Kwast, Naveen Vasudev, Miguel Vazquez, Ravikiran Vedururu, Umadevi Veluvolu, Shankar Vembu, Lieven P. C. Verbeke, Peter Vermeulen, Clare Verrill, Alain Viari, David Vicente, Caterina Vicentini, K. VijayRaghavan, Juris Viksna, Ricardo E. Vilain, Izar Villasante, Anne Vincent-Salomon, Tapio Visakorpi, Douglas Voet, Paresh Vyas, Ignacio Vázquez-García, Nick M. Waddell, Nicola Waddell, Claes Wadelius, Lina Wadi, Rabea Wagener, Jeremiah A. Wala, Jian Wang, Jiayin Wang, Linghua Wang, Qi Wang, Wenyi Wang, Yumeng Wang, Zhining Wang, Paul M. Waring, Hans-Jörg Warnatz, Jonathan Warrell, Anne Y. Warren, Sebastian M. Waszak, David C. Wedge, Dieter Weichenhan, Paul Weinberger, John N. Weinstein, Joachim Weischenfeldt, Daniel J. Weisenberger, Ian Welch, Michael C. Wendl, Johannes Werner, Justin P. Whalley, David A. Wheeler, Hayley C. Whitaker, Dennis Wigle, Matthew D. Wilkerson, Ashley Williams, James S. Wilmott, Gavin W. Wilson, Julie M. Wilson, Richard K. Wilson, Boris Winterhoff, Jeffrey A. Wintersinger, Maciej Wiznerowicz, Stephan Wolf, Bernice H. Wong, Tina Wong, Winghing Wong, Youngchoon Woo, Scott Wood, Bradly G. Wouters, Adam J. Wright, Derek W. Wright, Mark H. Wright, Chin-Lee Wu, Dai-Ying Wu, Guanming Wu, Jianmin Wu, Kui Wu, Yang Wu, Zhenggang Wu, Liu Xi, Tian Xia, Qian Xiang, Xiao Xiao, Rui Xing, Heng Xiong, Qinying Xu, Yanxun Xu, Hong Xue, Shinichi Yachida, Sergei Yakneen, Rui Yamaguchi, Takafumi N. Yamaguchi, Masakazu Yamamoto, Shogo Yamamoto, Hiroki Yamaue, Fan Yang, Huanming Yang, Jean Y. Yang, Liming Yang, Lixing Yang, Shanlin Yang, Tsun-Po Yang, Yang Yang, Xiaotong Yao, Marie-Laure Yaspo, Lucy Yates, Christina Yau, Chen Ye, Kai Ye, Venkata D. Yellapantula, Christopher J. Yoon, Sung-Soo Yoon, Fouad Yousif, Jun Yu, Kaixian Yu, Willie Yu, Yingyan Yu, Ke Yuan, Yuan Yuan, Denis Yuen, Christina K. Yung, Olga Zaikova, Jorge Zamora, Marc Zapatka, Jean C. 
Zenklusen, Thorsten Zenz, Nikolajs Zeps, Cheng-Zhong Zhang, Fan Zhang, Hailei Zhang, Hongwei Zhang, Hongxin Zhang, Jiashan Zhang, Jing Zhang, Junjun Zhang, Xiuqing Zhang, Xuanping Zhang, Yan Zhang, Zemin Zhang, Zhongming Zhao, Liangtao Zheng, Xiuqing Zheng, Wanding Zhou, Yong Zhou, Bin Zhu, Hongtu Zhu, Jingchun Zhu, Shida Zhu, Lihua Zou, Xueqing Zou, Anna deFazio, Nicholas van As, Carolien H. M. van Deurzen, Marc J. van de Vijver, L. van’t Veer, and Christian von Mering

Extended data is available for this paper at https://doi.org/10.1038/s41586-020-1969-6.

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-1969-6.

References

1. Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010).
2. Pleasance, E. D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184–190 (2010).
3. Ley, T. J. et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 456, 66–72 (2008).
4. Rheinbay, E. et al. Analyses of non-coding somatic drivers in 2,693 cancer whole genomes. Nature https://doi.org/10.1038/s41586-020-1965-x (2020).
5. Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature https://doi.org/10.1038/s41586-020-1943-3 (2020).
6. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature https://doi.org/10.1038/s41586-019-1913-9 (2020).
7. Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature https://doi.org/10.1038/s41586-019-1907-7 (2020).
8. PCAWG Transcriptome Core Group et al. Genomic basis of RNA alterations in cancer. Nature https://doi.org/10.1038/s41586-020-1970-0 (2020).
9. Zhang, Y. et al. High-coverage whole-genome analysis of 1,220 cancers reveals hundreds of genes deregulated by rearrangement-mediated cis-regulatory alterations. Nat. Commun. https://doi.org/10.1038/s41467-019-13885-w (2020).
10. Rodriguez-Martin, B. et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. https://doi.org/10.1038/s41588-019-0562-0 (2020).
11. Zapatka, M. et al. The landscape of viral associations in human cancers. Nat. Genet. https://doi.org/10.1038/s41588-019-0558-9 (2020).
12. Jiao, W. et al. A deep learning system can accurately classify primary and metastatic cancers based on patterns of passenger mutations. Nat. Commun. https://doi.org/10.1038/s41467-019-13825-8 (2020).
13. Sieverling, L. et al. Genomic footprints of activated telomere maintenance mechanisms in cancer. Nat. Commun. https://doi.org/10.1038/s41467-019-13824-9 (2020).
14. Yuan, Y. et al. Comprehensive molecular characterization of mitochondrial genomes in human cancers. Nat. Genet. https://doi.org/10.1038/s41588-019-0557-x (2020).
15. Akdemir, K. C. et al. Chromatin folding domains disruptions by somatic genomic rearrangements in human cancers. Nat. Genet. https://doi.org/10.1038/s41588-019-0564-y (2020).
16. Reyna, M. A. et al. Pathway and network analysis of more than 2,500 whole cancer genomes. Nat. Commun. https://doi.org/10.1038/s41467-020-14351-8 (2020).
17. Bailey, M. H. et al. Retrospective evaluation of whole exome and genome mutation calls in 746 cancer samples. Nat. Commun. (2020).
18. Cortes-Ciriano, I. et al. Comprehensive analysis of chromothripsis in 2,658 human cancers using whole-genome sequencing. Nat. Genet. https://doi.org/10.1038/s41588-019-0576-7 (2020).
19. Bray, F., Ren, J.-S., Masuyer, E. & Ferlay, J. Global estimates of cancer prevalence for 27 sites in the adult population in 2008. Int. J. Cancer 132, 1133–1145 (2013).
20. Tarver, T. Cancer Facts & Figures 2012. American Cancer Society (ACS). J. Consum. Health Internet 16, 366–367 (2012).
21. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
22. International Cancer Genome Consortium. International network of cancer genome projects. Nature 464, 993–998 (2010).
23. Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 (2018).
24. Sanchez-Vega, F. et al. Oncogenic signaling pathways in The Cancer Genome Atlas. Cell 173, 321–337 (2018).
25. Hoadley, K. A. et al. Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer. Cell 173, 291–304 (2018).
26. Stein, L. D., Knoppers, B. M., Campbell, P., Getz, G. & Korbel, J. O. Data analysis: create a cloud commons. Nature 523, 149–151 (2015).
27. Phillips, M. et al. Genomics: data sharing needs international code of conduct. Nature https://doi.org/10.1038/d41586-020-00082-9 (2020).
28. Krochmalski, J. Developing with Docker (Packt Publishing, 2016).
29. Welch, J. S. et al. The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264–278 (2012).
30. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
31. Meier, B. et al. C. elegans whole-genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency. Genome Res. 24, 1624–1636 (2014).
32. Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
33. Tamborero, D. et al. Cancer Genome Interpreter annotates the biological and clinical relevance of tumor alterations. Genome Med. 10, 25 (2018).
34. Huang, F. W. et al. Highly recurrent TERT promoter mutations in human melanoma. Science 339, 957–959 (2013).
35. Rheinbay, E. et al. Recurrent and functional regulatory mutations in breast cancer. Nature 547, 55–60 (2017).
36. Fredriksson, N. J., Ny, L., Nilsson, J. A. & Larsson, E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 46, 1258–1263 (2014).
37. Horn, S. et al. TERT promoter mutations in familial and sporadic melanoma. Science 339, 959–961 (2013).
38. Ciriello, G. et al. Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45, 1127–1133 (2013).
39. Rahman, N. Realizing the promise of cancer predisposition genes. Nature 505, 302–308 (2014).
40. Pearl, L. H., Schierz, A. C., Ward, S. E., Al-Lazikani, B. & Pearl, F. M. G. Therapeutic opportunities within the DNA damage response. Nat. Rev. Cancer 15, 166–180 (2015).
41. Taylor-Weiner, A. et al. DeTiN: overcoming tumor-in-normal contamination. Nat. Methods 15, 531–534 (2018).
42. Fujimoto, A. et al. Whole-genome mutational landscape and characterization of noncoding and structural mutations in liver cancer. Nat. Genet. 48, 500–509 (2016).
43. Shlush, L. I. Age-related clonal hematopoiesis. Blood 131, 496–504 (2018).
44. Northcott, P. A. et al. The whole-genome landscape of medulloblastoma subtypes. Nature 547, 311–317 (2017).
45. Scarpa, A. et al. Whole-genome landscape of pancreatic neuroendocrine tumours. Nature 543, 65–71 (2017).
46. Davis, C. F. et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, 319–330 (2014).
47. Berger, M. F. et al. The genomic complexity of primary human prostate cancer. Nature 470, 214–220 (2011).
48. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
49. Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
50. Nik-Zainal, S. et al. Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979–993 (2012).
51. Roberts, S. A. et al. Clustered mutations in yeast and in human cancers can arise from damaged long single-strand DNA regions. Mol. Cell 46, 424–435 (2012).
52. Rausch, T. et al. Genome sequencing of pediatric medulloblastoma links catastrophic DNA rearrangements with TP53 mutations. Cell 148, 59–71 (2012).
53. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
54. Korbel, J. O. & Campbell, P. J. Criteria for inference of chromothripsis in cancer genomes. Cell 152, 1226–1236 (2013).
55. Zhang, C.-Z. et al. Chromothripsis from DNA damage in micronuclei. Nature 522, 179–184 (2015).
56. The Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690 (2014).
57. Supek, F. & Lehner, B. Clustered mutation signatures reveal that error-prone DNA repair targets mutations to active genes. Cell 170, 534–547 (2017).
58. Mardin, B. R. et al. A cell-based model system links chromothripsis with hyperploidy. Mol. Syst. Biol. 11, 828 (2015).
59. Weischenfeldt, J. et al. Integrative genomic analyses reveal an androgen-driven somatic alteration landscape in early-onset prostate cancer. Cancer Cell 23, 159–170 (2013).
60. Garsed, D. W. et al. The architecture and evolution of cancer neochromosomes. Cancer Cell 26, 653–667 (2014).
61. Durinck, S. et al. Temporal dissection of tumorigenesis in primary cancers. Cancer Discov. 1, 137–143 (2011).
62. Hayward, N. K. et al. Whole-genome landscapes of major melanoma subtypes. Nature 545, 175–180 (2017).
63. The Cancer Genome Atlas Network. Genomic classification of cutaneous melanoma. Cell 161, 1681–1696 (2015).
64. Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
65. Chan, K. et al. An APOBEC3A hypermutation signature is distinguishable from the signature of background mutagenesis by APOBEC3B in human cancers. Nat. Genet. 47, 1067–1072 (2015).
66. Nik-Zainal, S. et al. Association of a germline copy number polymorphism of APOBEC3A and APOBEC3B with burden of putative APOBEC-dependent mutations in breast cancer. Nat. Genet. 46, 487–491 (2014).
67. Middlebrooks, C. D. et al. Association of germline variants in the APOBEC3 region with cancer risk and enrichment with APOBEC-signature mutations in tumors. Nat. Genet. 48, 1330–1338 (2016).
68. Westra, H.-J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
69. Stranger, B. E. et al. Population genomics of human gene expression. Nat. Genet. 39, 1217–1224 (2007).
70. Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc. Natl Acad. Sci. USA 113, E2373–E2382 (2016).
71. Hendrich, B., Hardeland, U., Ng, H. H., Jiricny, J. & Bird, A. The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature 401, 301–304 (1999).
72. Lee, E. et al. Landscape of somatic retrotransposition in human cancers. Science 337, 967–971 (2012).
73. Tubio, J. M. C. et al. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science 345, 1251343 (2014).
74. Helman, E. et al. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 24, 1053–1063 (2014).
75. Shay, J. W. & Wright, W. E. Hayflick, his limit, and cellular ageing. Nat. Rev. Mol. Cell Biol. 1, 72–76 (2000).
76. Peifer, M. et al. Telomerase activation by genomic rearrangements in high-risk neuroblastoma. Nature 526, 700–704 (2015).
77. Totoki, Y. et al. Trans-ancestry mutational landscape of hepatocellular carcinoma genomes. Nat. Genet. 46, 1267–1273 (2014).
78. Paterlini-Bréchot, P. et al. Hepatitis B virus-related insertional mutagenesis occurs frequently in human liver cancers and recurrently targets human telomerase gene. Oncogene 22, 3911–3916 (2003).
79. Heaphy, C. M. et al. Prevalence of the alternative lengthening of telomeres telomere maintenance mechanism in human cancer subtypes. Am. J. Pathol. 179, 1608–1615 (2011).
80. Barthel, F. P. et al. Systematic analysis of telomere length and somatic alterations in 31 cancer types. Nat. Genet. 49, 349–357 (2017).
81. García-Cao, M., Gonzalo, S., Dean, D. & Blasco, M. A. A role for the Rb family of proteins in controlling telomere length. Nat. Genet. 32, 415–419 (2002).
82. Tomasetti, C. & Vogelstein, B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science 347, 78–81 (2015).
83. Gerstung, M. et al. Precision oncology for acute myeloid leukemia using a knowledge bank approach. Nat. Genet. 49, 332–340 (2017).
84. O’Connor, B. D. et al. The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows. F1000Res. 6, 52 (2017).
85. Zhang, J. et al. The International Cancer Genome Consortium Data Portal. Nat. Biotechnol. 37, 367–369 (2019).
86. Miller, C. A., Qiao, Y., DiSera, T., D’Astous, B. & Marth, G. T. bam.iobio: a web-based, real-time, sequence alignment file inspector. Nat. Methods 11, 1189 (2014).
87. Goldman, M. et al. The UCSC Xena platform for public and private cancer genomics data visualization and interpretation. Preprint at https://www.biorxiv.org/content/10.1101/326470v6 (2019).
88. Papatheodorou, I. et al. Expression Atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res. 46, D246–D251 (2018).
89. NCI SEER. ICD-O-3 Coding Materials (2018).
90. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
91. 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
92. Raine, K. M. et al. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing data. Curr. Protoc. Bioinformatics 56, 15.9.1–15.9.17 (2016).
93. Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics 56, 15.10.1–15.10.18 (2016).
94. Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics 52, 15.7.1–15.7.12 (2015).
95. Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
96. Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics 28, i333–i339 (2012).
97. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).
98. Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013).
99. Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012).
100. Drier, Y. et al. Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability. Genome Res. 23, 228–235 (2013).
101. Ramos, A. H. et al. Oncotator: cancer variant annotation tool. Hum. Mutat. 36, E2423–E2429 (2015).
102. Moncunill, V. et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat. Biotechnol. 32, 1106–1112 (2014).
103. Fan, Y. et al. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol. 17, 178 (2016).
104. Cooke, S. L. et al. Processed pseudogenes acquired somatically during cancer development. Nat. Commun. 5, 3644 (2014).
105. Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 3, e02935 (2014).
106. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
107. Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907 (2012).
108. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
109. Yakneen, S., Waszak, S. M., Gertz, M., Korbel, J. O. & PCAWG Consortium. Butler enables rapid cloud-based analysis of thousands of human genomes. Nat. Biotechnol. https://doi.org/10.1038/s41587-019-0360-3 (2020).
110. Kim, S. Y., Jacob, L. & Speed, T. P. Combining calls from multiple somatic mutation-callers. BMC Bioinformatics 15, 154 (2014).
111. Breiman, L. Stacked regressions. Mach. Learn. 24, 49–64 (1996).
112. Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
113. Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).

Associated Data


Supplementary Materials

Supplementary Information (53 MB, pdf)

This file contains Supplementary Figures 1-19, Supplementary Methods and Supplementary Notes 1-6

Reporting Summary (97.3 KB, pdf)

Supplementary Tables (1.4 MB, zip)

This zipped file contains Supplementary Tables 1-21 and a Supplementary Table Guide

Supplementary Information (470.7 KB, pdf)

Supplementary information

Data Availability Statement


[CR70] 70.Menghi, F. et al. The tandem duplicator phenotype as a distinct genomic configuration in cancer. Proc. Natl Acad. Sci. USA113, E2373–E2382 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR71] 71.Hendrich, B., Hardeland, U., Ng, H. H., Jiricny, J. & Bird, A. The thymine glycosylase MBD4 can bind to the product of deamination at methylated CpG sites. Nature401, 301–304 (1999). [DOI] [PubMed] [Google Scholar]

[CR72] 72.Lee, E. et al. Landscape of somatic retrotransposition in human cancers. Science337, 967–971 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR73] 73.Tubio, J. M. C. et al. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science345, 1251343–1251343 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR74] 74.Helman, E. et al. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 24, 1053–1063 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR75] 75.Shay, J. W. & Wright, W. E. Hayflick, his limit, and cellular ageing. Nat. Rev. Mol. Cell Biol. 1, 72–76 (2000). [DOI] [PubMed] [Google Scholar]

[CR76] 76.Peifer, M. et al. Telomerase activation by genomic rearrangements in high-risk neuroblastoma. Nature526, 700–704 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR77] 77.Totoki, Y. et al. Trans-ancestry mutational landscape of hepatocellular carcinoma genomes. Nat. Genet. 46, 1267–1273 (2014). [DOI] [PubMed] [Google Scholar]

[CR78] 78.Paterlini-Bréchot, P. et al. Hepatitis B virus-related insertional mutagenesis occurs frequently in human liver cancers and recurrently targets human telomerase gene. Oncogene22, 3911–3916 (2003). [DOI] [PubMed] [Google Scholar]

[CR79] 79.Heaphy, C. M. et al. Prevalence of the alternative lengthening of telomeres telomere maintenance mechanism in human cancer subtypes. Am. J. Pathol. 179, 1608–1615 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR80] 80.Barthel, F. P. et al. Systematic analysis of telomere length and somatic alterations in 31 cancer types. Nat. Genet. 49, 349–357 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR81] 81.García-Cao, M., Gonzalo, S., Dean, D. & Blasco, M. A. A role for the Rb family of proteins in controlling telomere length. Nat. Genet. 32, 415–419 (2002). [DOI] [PubMed] [Google Scholar]

[CR82] 82.Tomasetti, C. & Vogelstein, B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science347, 78–81 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR83] 83.Gerstung, M. et al. Precision oncology for acute myeloid leukemia using a knowledge bank approach. Nat. Genet. 49, 332–340 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR84] 84.O’Connor, B. D. et al. The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows. F1000Res. 6, 52 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR85] 85.Zhang, J. et al. The International Cancer Genome Consortium Data Portal. Nat. Biotechnol. 37, 367–369 (2019). [DOI] [PubMed] [Google Scholar]

[CR86] 86.Miller, C. A., Qiao, Y., DiSera, T., D’Astous, B. & Marth, G. T. bam.iobio: a web-based, real-time, sequence alignment file inspector. Nat. Methods11, 1189–1189 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR87] 87.Goldman, M. et al. The UCSC Xena platform for public and private cancer genomics data visualization and interpretation. Preprint at https://www.biorxiv.org/content/10.1101/326470v6 (2019).

[CR88] 88.Papatheodorou, I. et al. Expression Atlas: gene and protein expression across multiple studies and organisms. Nucleic Acids Res. 46, D246–D251 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR89] 89.NCI SEER. ICD-O-3 Coding Materials (2018).

[CR90] 90.Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics26, 589–595 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR91] 91.1000 Genomes Project Consortium. A global reference for human genetic variation. Nature526, 68–74 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR92] 92.Raine, K. M. et al. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing data. Curr. Protoc. Bioinformatics56, 15.9.1–15.9.17 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR93] 93.Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics56, 15.10.1–15.10.18 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR94] 94.Raine, K. M. et al. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics52, 15.7.1–15.7.12 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR95] 95.Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics25, 2865–2871 (2009). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR96] 96.Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics28, i333–i339 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR97] 97.Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR98] 98.Cibulskis, K. et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat. Biotechnol. 31, 213–219 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR99] 99.Carter, S. L. et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR100] 100.Drier, Y. et al. Somatic rearrangements across cancer reveal classes of samples with distinct patterns of DNA breakage and rearrangement-induced hypermutability. Genome Res. 23, 228–235 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR101] 101.Ramos, A. H. et al. Oncotator: cancer variant annotation tool. Hum. Mutat. 36, E2423–E2429 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR102] 102.Moncunill, V. et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat. Biotechnol. 32, 1106–1112 (2014). [DOI] [PubMed] [Google Scholar]

[CR103] 103.Fan, Y. et al. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data. Genome Biol. 17, 178 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR104] 104.Cooke, S. L. et al. Processed pseudogenes acquired somatically during cancer development. Nat. Commun. 5, 3644 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR105] 105.Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife3, e02935 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR106] 106.Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature526, 75–81 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR107] 107.Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907 (2012).

[CR108] 108.DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR109] 109.Yakneen, S., Waszak, S. M., Gertz, M. & Korbel, J. O. & PCAWG Consortium. Butler enables rapid cloud-based analysis of thousands of human genomes. Nat. Biotechnol. 10.1038/s41587-019-0360-3 (2020). [DOI] [PMC free article] [PubMed]

[CR110] 110.Kim, S. Y., Jacob, L. & Speed, T. P. Combining calls from multiple somatic mutation-callers. BMC Bioinformatics15, 154 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR111] 111.Breiman, L. Stacked regressions. Mach. Learn. 24, 49–64 (1996). [Google Scholar]

[CR112] 112.Campbell, P. J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]

[CR113] 113.Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]

Box 1. Online resources for data access, visualization and analysis.