Association between tumour somatic mutations and venous thromboembolism in the 100,000 Genomes Project cancer cohort: a study protocol

Naomi Cornish; Sarah K Westbury; Matthew T Warkentin; Chrissie Thirlwell; Andrew D Mumford; Philip C Haycock

doi:10.12688/wellcomeopenres.23156.2

Wellcome Open Res. 2024 Dec 24;9:640. Originally published 2024 Nov 4. [Version 2] doi: 10.12688/wellcomeopenres.23156.2

PMCID: PMC11809147 PMID: 39931111

Version Changes

Revised. Amendments from Version 1

In this new version, several changes have been made to address the reviewers’ comments: 1) The algorithm used to identify venous thromboembolism from ICD10 codes in electronic health records has been expanded slightly to ensure upper limb venous thromboembolism diagnoses are identified (Table 1). We also justify the exclusion of superficial vein thrombosis of the lower limb in the ‘Primary outcome definition’ section. 2) We have made some minor changes to the text in the statistical analyses section (under primary analysis) to clarify how we will analyse the data and assess for violations of the proportional hazards assumption underlying the Cox regression models. 3) We have updated the sensitivity analyses to include an assessment of the interaction between a germline polygenic risk score and tumour somatic mutations on the rate of venous thromboembolism (sensitivity analysis 1), and an ancestry-stratified analysis (sensitivity analysis 5). 4) We have expanded the discussion to outline some additional limitations of the proposed analysis and contextualise these within the recent literature in this field.

Abstract

Venous thromboembolism (VTE) is a common cause of morbidity and mortality in patients with cancer. There is evidence that specific aberrations in tumour biology contribute to the pathophysiology of this condition. We plan to examine the association between tumour somatic mutations and VTE in an existing cohort of patients with cancer who were enrolled in the flagship Genomics England 100,000 Genomes Project. Here, we outline an a priori analysis plan to address this objective, including details on study cohort selection, exposure and outcome definitions, annotation of genetic variants and planned statistical analyses. We will assess the effect of 1) deleterious somatic DNA variants in each gene; 2) tumour mutational burden and 3) tumour mutational signatures on the rate of VTE (outcome) in a pan-cancer cohort. Sensitivity analyses will be performed to examine the robustness of any associations, including adjustment for potentially correlated covariates: tumour type, stage and systemic anti-cancer therapy. We hope that results from this study may help to identify key genes which are implicated in the development of cancer-associated thrombosis, which may shed light on related mechanistic pathways and/or provide data which can be integrated into genetic risk prediction models for these patients.

Keywords: Cancer, venous thromboembolism, tumour, somatic, mutation, genomic

Plain Language Summary

People with cancer have a significantly higher risk of developing a condition known as venous thromboembolism (a type of blood clot). This is a major cause of illness and death. Rates of venous thromboembolism vary between different types of cancer and the reasons for this are not well understood. The purpose of this study is to evaluate whether there is a link between acquired genetic variants (which occur within the tumour) and venous thromboembolism. This study will hopefully contribute to our understanding of the mechanisms which drive the development of cancer-associated venous thromboembolism and help future efforts to develop models for more accurately identifying people at highest risk of the condition.

Introduction

Venous thromboembolism (VTE) is a frequent complication of malignancy and a leading cause of cancer-related death ¹. Estimates of the absolute risk of VTE in cancer patients vary enormously, from approximately 0.6% to over 20%, depending on a myriad of incompletely-understood risk factors ². These include patient, cancer-specific and treatment-related variables ³.

One of the most powerful predictive factors for cancer-associated VTE is the anatomical type of tumour ⁴. Many tumours express elevated levels of prothrombotic proteins, or directly interact with adhesion molecules on platelets and vascular endothelial cells ³,⁵. This suggests that VTE may result from specific aberrations of tumour biology.

Several groups have previously explored the association between alterations within the tumour genome and VTE. The majority of these studies are relatively small and have focused on a handful of candidate genes (including KRAS, EGFR, ALK and IDH1), sometimes with conflicting results ⁶,⁷.

The largest study in this field used the Memorial Sloan Kettering (MSK)-IMPACT platform to sequence target genes in 11,695 tumour samples ⁸. The pan-cancer analysis was restricted to 53 known oncogenes or tumour-suppressor genes, and found that somatic mutations in 7 genes modulated the risk of VTE. A separate analysis in the same cohort found that several of these genes were also implicated in the development of arterial thrombosis ⁹. Recently, another publication from the MSK group has reported that levels of circulating tumour DNA independently predict VTE risk in a dose-dependent relationship. Interestingly, some gene-level alterations (including KRAS, STK11 and KEAP1) were also shown to be associated with VTE in the discovery cohort (n = 4,141) ¹⁰.

There is scope to attempt to replicate these results in an independent cohort and to further explore associations between VTE and somatic mutations in genes which have not previously been analysed.

Objective

We plan to conduct a pan-cancer analysis examining the association between somatic mutations across the tumour genome (exposure) and VTE (outcome).

We intend to analyse the exposure variable in 3 alternative ways: 1) a gene-centric approach, where the effect of somatic mutations is considered separately for each gene; 2) by assessing tumour mutational burden; and 3) by assessing tumour mutational signatures. The effect of these variables on the rate of VTE will be assessed using the statistical models described below.

Study population

This analysis uses existing data from the 100,000 Genomes Project Cancer Programme, a Genomics England (GEL) initiative which recruited 17,241 participants with cancer and performed whole genome sequencing (WGS) on matched germline and somatic (tumour) genomes ¹¹. Recruitment occurred between 2015 and 2019. Eligibility for the overall program covered patients aged from birth upwards, diagnosed with a solid organ or haematological malignancy. Previously treated patients presenting with cancer recurrence, progression or undergoing surgery following neoadjuvant chemotherapy were all included. Genomic data is linked to pseudo-anonymised longitudinal electronic health records from secondary care, including Hospital Episode Statistics (HES), the National Cancer Registration and Analysis Service (NCRAS), the Systemic Anti-Cancer Therapy (SACT) dataset, and mortality data from the Office for National Statistics (ONS).

From the 100,000 Genomes Project version 18 data-release (December 2023), we will select a cohort of participants who meet the following inclusion criteria:

1. Ongoing valid study consent and prospectively collected tumour sample (exclude patients with stored samples which were collected prior to study recruitment opening)

2. Paired somatic and germline WGS meeting the following quality control criteria: concordant phenotypic and karyotypic sex; read mapping quality > 30 across 210 Gb for tumour DNA and 85 Gb for germline DNA; cross-sample contamination <3% for germline DNA (assessed by VerifyBamID [https://github.com/statgen/verifyBamID]) and <5% for tumour DNA (assessed by ConPair [https://github.com/nygenome/Conpair]) ¹².

3. Linked NCRAS record with a cancer diagnosis congruent with the tumour type submitted to GEL.

4. Histology consistent with malignant cancer.

5. No missing or discrepant information for critical covariates (including age, sex, genetically inferred ancestry, cancer type and diagnosis date).

6. No prior history of VTE (patients with a documented VTE prior to study entry will be excluded).
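As an illustration, part of the selection logic above (the contamination thresholds in criterion 2, sex concordance, and the prior-VTE exclusion in criterion 6) could be expressed as a simple filter. This is a minimal sketch only: the `Participant` record and all of its field names are hypothetical and do not correspond to the GEL data model; only the thresholds are taken from the criteria above.

```python
from dataclasses import dataclass

# Thresholds taken from inclusion criterion 2 above.
MAX_GERMLINE_CONTAMINATION = 0.03  # <3%, assessed by VerifyBamID
MAX_TUMOUR_CONTAMINATION = 0.05    # <5%, assessed by ConPair


@dataclass
class Participant:
    # All field names are hypothetical, chosen for illustration only.
    participant_id: str
    sex_concordant: bool           # phenotypic and karyotypic sex agree
    germline_contamination: float  # fraction, e.g. 0.01 means 1%
    tumour_contamination: float    # fraction
    prior_vte: bool                # VTE documented before study entry


def passes_inclusion(p: Participant) -> bool:
    """Return True if the participant meets the subset of criteria sketched here."""
    return (
        p.sex_concordant
        and p.germline_contamination < MAX_GERMLINE_CONTAMINATION
        and p.tumour_contamination < MAX_TUMOUR_CONTAMINATION
        and not p.prior_vte
    )
```

The remaining criteria (consent, NCRAS linkage, histology, complete covariates) would be applied in the same way once the corresponding fields are identified.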

Primary outcome definition

The primary outcome of interest is the first occurrence of VTE. This is a composite phenotype which we will identify using ICD10 codes (version:2019) recorded in linked hospital episode statistics (including inpatient, outpatient and emergency department records). Death from any cause other than VTE will be recorded from linked ONS data.

The ICD10 codes used to define VTE have been extracted from Health Data Research UK coding algorithms for deep vein thrombosis (PH338) and pulmonary embolism (PH71) [https://phenotypes.healthdatagateway.org] ¹³,¹⁴. In addition to the codes listed in the original algorithms, we have included codes I808-9 and I828-9 which encompass VTE at other/unspecified sites (Table 1). This adaptation aims to ensure that upper limb VTE diagnoses are also identified, as these events are more common in cancer patients ¹⁵.

Table 1. ICD10 version: 2019 codes for venous thromboembolism in linked electronic health records.

Code	Description
I26	Pulmonary embolism
I636	Cerebral infarction due to cerebral venous thrombosis, non-pyogenic
I676	Non-pyogenic thrombosis of intracranial venous system
I801	Phlebitis and thrombophlebitis of femoral vein
I802	Phlebitis and thrombophlebitis of other deep vessels of lower extremities
I808	Phlebitis and thrombophlebitis of other sites
I809	Phlebitis and thrombophlebitis of unspecified site
I81	Portal vein thrombosis
I820	Budd-Chiari syndrome
I822	Embolism and thrombosis of vena cava
I823	Embolism and thrombosis of renal vein
I828	Embolism and thrombosis of other specified veins
I829	Embolism and thrombosis of unspecified vein

In line with recent literature in this field ¹⁶,¹⁷, we have not included code I800 (phlebitis and thrombophlebitis of superficial vessels of lower extremities) in the case definition. This diagnosis is unlikely to be recorded accurately in hospital records of study participants as it is generally managed in the community. Furthermore, since superficial vein thrombosis of the leg is primarily associated with varicose veins ¹⁸, including these patients as VTE cases may reduce the power of our analysis to detect the effects of tumour genomic aberrations on rates of VTE.
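The case definition in Table 1 can be sketched as a prefix match on recorded diagnosis codes. This is an illustrative sketch, not the study's extraction code: it assumes that codes may be recorded with or without a dot and that sub-codes of a listed stem (for example I269 under I26) should count as matches. The function names are hypothetical.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Code stems from Table 1. I800 (superficial vessels of the lower
# extremities) is deliberately absent, as explained in the text above.
VTE_CODE_STEMS = (
    "I26", "I636", "I676", "I801", "I802", "I808", "I809",
    "I81", "I820", "I822", "I823", "I828", "I829",
)


def normalise(code: str) -> str:
    """Upper-case the code and drop dots and whitespace ('i26.9' -> 'I269')."""
    return code.replace(".", "").strip().upper()


def is_vte_code(code: str) -> bool:
    """True if the code equals, or is a sub-code of, a Table 1 stem."""
    return normalise(code).startswith(VTE_CODE_STEMS)


def first_vte_date(episodes: Iterable[Tuple[date, str]]) -> Optional[date]:
    """Earliest date with a VTE code among (date, code) pairs, or None."""
    dates = [d for d, code in episodes if is_vte_code(code)]
    return min(dates) if dates else None
```

Prefix matching keeps the definition compact, but it relies on the assumption that every sub-code of a listed stem is intended to be included.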

Covariate assessment

Baseline covariates (including age and sex) will be obtained from participant data submitted at recruitment. Cancer-specific covariate information will be derived from linked NCRAS records: this includes date of cancer diagnosis, tumour site and histological type, date of chemotherapy or surgery and stage of cancer (recorded within 12 months of study entry). NCRAS diagnoses are coded using ICD10 version 2019 [https://icd.who.int/browse10/2019/en#/II]. Where applicable, information on the specific anti-cancer drug regimen administered will be obtained from linked SACT data.

Genetic data

For cancer participants enrolled in the 100,000 Genomes Project, tissue collection and DNA extraction were performed in local laboratories linked to the regional NHS Genomic Medicine Centres. Tumour DNA was obtained primarily from fresh-frozen histology specimens (and rarely from formalin-fixed paraffin-embedded samples). Germline DNA was obtained from blood or occasionally saliva. The protocols for sample collection and DNA extraction are described in the GEL Sample Handling Guidance (v.4.0), available from the Genomics England document library.

Samples were prepared with an Illumina TruSeq PCR-free library preparation kit, providing sufficient DNA was available; otherwise PCR-based library preparation was used for a minority of samples. DNA sequencing was performed centrally on an Illumina HiSeq platform to an average coverage of 30x (germline DNA) and 100x (tumour DNA). Sequence data was processed using the Illumina North Star (version 2.6.53.23) pipeline with read alignment against the human reference genome GRCh38-Decoy+EBV using ISAAC (version iSAAC-03.16.02.19) ¹².

Somatic variant calls and interpretation

Detection of somatic single nucleotide variants (SNVs) and insertions/deletions < 50bp has been performed using Strelka2 (v2.4.7) [https://github.com/Illumina/strelka]. Detection of large structural somatic variants (inversions, translocations) and insertions/deletions >50bp has been performed using Manta (v.0.28.0) [https://github.com/Illumina/manta]. In addition to the default quality filters applied by these pipelines, we will exclude 1) variants with germline allele frequency >1% in the GEL cohort (as these may indicate unsubtracted germline SNVs); 2) variants with somatic allele frequency >5% in the GEL cohort (as these may indicate potential technical artefacts); 3) indels in regions of high sequencing noise (i.e. where the proportion of low quality filtered base calls within 50bp of the variant exceeds 10%) ¹².
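The three additional exclusion rules can be read as a single per-variant predicate. The sketch below is illustrative only: the function and parameter names are hypothetical, and the inputs are assumed to be fractions (0.01 means 1%) already computed upstream.

```python
def keep_variant(
    germline_af: float,
    somatic_af: float,
    is_indel: bool,
    low_quality_fraction_50bp: float,
) -> bool:
    """Apply the three exclusion rules described above; True means keep.

    germline_af: allele frequency among germline samples in the cohort.
    somatic_af: allele frequency among tumour samples in the cohort.
    low_quality_fraction_50bp: proportion of low-quality filtered base
        calls within 50bp of the variant (only used for indels).
    """
    if germline_af > 0.01:   # possible unsubtracted germline variant
        return False
    if somatic_af > 0.05:    # possible recurrent technical artefact
        return False
    if is_indel and low_quality_fraction_50bp > 0.10:  # noisy region
        return False
    return True
```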

Variants have been annotated against the canonical transcript using Cellbase ¹⁹ (integrating information from ENSEMBL (version 90) ²⁰, COSMIC (version 86) ²¹ and ClinVar (October 2018 release) ²²). We will classify variants as potentially deleterious if they fall into any of the following categories:

Transcript ablation
Splice acceptor / splice donor variant
Stop gain / loss
Start loss
In-frame insertion / deletion
Frameshift
Missense variant predicted to be deleterious to protein function by in-silico prediction algorithms (e.g. FATHMM-MKL score ²³)
Listed in ClinVar [https://www.ncbi.nlm.nih.gov/clinvar] as pathogenic/likely pathogenic
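The categories above could be encoded roughly as follows. This is a sketch under stated assumptions: the consequence strings are Sequence Ontology terms as used in Ensembl annotations, the function name and arguments are hypothetical, and the FATHMM-MKL cut-off of 0.5 is an assumed default because the protocol does not state a threshold.

```python
from typing import Optional

# Consequence types treated as potentially deleterious regardless of any
# in-silico score (Sequence Ontology term names).
ALWAYS_DELETERIOUS = frozenset({
    "transcript_ablation",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "frameshift_variant",
})

PATHOGENIC_LABELS = frozenset({"pathogenic", "likely pathogenic"})


def is_potentially_deleterious(
    consequence: str,
    fathmm_mkl_score: Optional[float] = None,
    clinvar_significance: Optional[str] = None,
    fathmm_threshold: float = 0.5,  # assumed cut-off, not from the protocol
) -> bool:
    """Classify one annotated variant according to the list above."""
    if consequence in ALWAYS_DELETERIOUS:
        return True
    if clinvar_significance is not None:
        if clinvar_significance.strip().lower() in PATHOGENIC_LABELS:
            return True
    if consequence == "missense_variant" and fathmm_mkl_score is not None:
        return fathmm_mkl_score > fathmm_threshold
    return False
```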

Gene inclusion criteria for gene-centric analysis

1) We will analyse all genes which are listed in either tier 1 or tier 2 of the Cancer Gene Census [ https://cancer.sanger.ac.uk] ²¹. Tier 1 includes known oncogenic and tumour suppressor genes, while tier 2 includes genes where there is emerging (but less extensive) evidence to implicate their role in cancer pathology. Since it is becoming increasingly routine to perform sequencing of these genes in newly-diagnosed cancer patients ²⁴, identifying whether there are associations between these genes and VTE risk has potential clinical utility in the near-future.

2) We will analyse all genes for which germline polymorphisms have been shown to be associated with VTE in large population genome-wide or exome-wide association studies ²⁵–²⁷, as we hypothesise that somatic variation in these same genes may contribute to cancer-associated VTE.

3) Finally, since this is an exploratory analysis designed to identify new genes which have not previously been implicated in VTE, we will analyse any remaining genes for which at least 5% of the study cohort carry potentially deleterious somatic variants. This cut-off has been chosen based on power calculations indicating that in a sample of ~10,000 participants, with a VTE prevalence ~ 10%, there will be >80% power to detect an association between VTE and mutations which are present in >5% of the study cohort, assuming an effect size of 1.5 and type 1 error rate < 0.05 ²⁸.

For each gene included in the analysis we will dichotomise patients into one of two categories:

1) Gene mutated: if the participant carries one or more potentially deleterious somatic variants in the gene at a variant allele frequency (VAF) ≥5% in the tumour sample. VAF is calculated by dividing the number of variant reads by the total number of reads (variant + reference sequence) at that position ¹². This VAF threshold is derived from literature which suggests that variants called at a lower VAF are frequently due to sequencing errors ²⁹.

2) Gene unmutated: if the participant does not have any somatic variants in the gene, has a variant which is not predicted to be deleterious or has a variant present with a VAF < 5%.
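The dichotomisation rule can be sketched in a few lines. The function names and the tuple layout are hypothetical; the VAF formula and the 5% threshold are those given above.

```python
from typing import Iterable, Tuple


def vaf(variant_reads: int, reference_reads: int) -> float:
    """Variant allele frequency: variant reads / (variant + reference reads)."""
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    return variant_reads / total


def gene_mutated(
    variants: Iterable[Tuple[int, int, bool]],
    min_vaf: float = 0.05,
) -> bool:
    """Classify one gene in one tumour sample.

    Each variant is (variant_reads, reference_reads, deleterious).
    The gene counts as mutated if at least one potentially deleterious
    variant has VAF >= min_vaf; otherwise it is unmutated.
    """
    return any(
        deleterious and vaf(v_reads, r_reads) >= min_vaf
        for v_reads, r_reads, deleterious in variants
    )
```

For example, a deleterious variant supported by 5 of 100 reads sits exactly on the threshold and classifies the gene as mutated, whereas 4 of 100 reads does not.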

Global tumour mutational burden

In addition to the gene-centric approach described above, we will consider the association between global tumour mutational burden (calculated as the total number of small somatic variants and indels per Mb of coding sequence) and risk of VTE.
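As a worked example of the definition above, mutational burden is a count divided by the length of sequence considered, expressed in megabases. The function name is hypothetical and the coding-sequence length is assumed to be supplied by the caller, since the protocol does not state the value used.

```python
def tumour_mutational_burden(n_small_variants: int, coding_length_bp: int) -> float:
    """Small somatic variants (SNVs and indels) per megabase of coding sequence."""
    if coding_length_bp <= 0:
        raise ValueError("coding_length_bp must be positive")
    return n_small_variants / (coding_length_bp / 1_000_000)
```

Under these assumptions, 350 variants over 35 Mb of coding sequence gives a burden of 10 variants per Mb.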

Mutational signature analysis

We will also report the association between 30 specific mutational signatures [ https://cancer.sanger.ac.uk/cosmic/signatures_v2] and risk of VTE. The contribution of each mutational signature to the overall mutation burden has been computed by GEL using the R package nnls ³⁰.
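For orientation, non-negative least squares fitting of signature contributions can be written as below. The notation is introduced here for illustration and is not taken from the GEL pipeline: m is the vector of observed mutation counts across the 96 trinucleotide contexts for one tumour, S is the 96 × 30 matrix of reference signature profiles, and x holds the 30 non-negative signature exposures. Normalising the exposures to proportions c_k is an assumption about how "contribution" is expressed.

```latex
\hat{x} = \arg\min_{x \ge 0} \; \lVert S x - m \rVert_2^2 ,
\qquad
c_k = \frac{\hat{x}_k}{\sum_{j=1}^{30} \hat{x}_j}, \quad k = 1, \dots, 30
```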

Statistical analyses

Primary analysis

Our primary objective is to identify whether there are potentially causal relationships between somatic mutations in the tumour of a person with cancer and VTE. We will therefore use Cox proportional hazards regression to assess the effect of 1) deleterious somatic DNA variants in each gene; 2) tumour mutational burden and 3) tumour mutational signatures on the rate (hazard) of VTE (outcome) in the pan-cancer cohort. We have chosen this model over a competing risks analysis, based on consensus in the literature that cause-specific hazard models are more appropriate for research questions focused around etiology, rather than prediction which requires accurate estimates of absolute risk that account for competing risks of mortality ³¹.

Time under observation will begin from the first date that a participant’s tumour was sampled for sequencing (hereafter referred to as study entry). For participants who have had multiple tumour samples submitted to GEL, only DNA samples submitted at initial study entry will be used for analysis. Follow-up time will be defined as the time from study entry until the diagnosis of VTE, death from any cause, the last date when electronic health records were uploaded to the GEL Research environment (July 2022), or administrative study termination after five years, whichever occurs first. Diagnosis of VTE will be considered as the event of interest while all other events will be considered as right censoring of the follow-up time.
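The follow-up rule can be sketched as a function returning the observed time and an event indicator. Several details are assumptions made for illustration: the data cut-off is taken as 31 July 2022 (the protocol gives only the month), five years is approximated as 5 × 365 days, and a VTE recorded on the same day as a censoring event is counted as an event. The function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

DATA_CUTOFF = date(2022, 7, 31)  # assumed day; the protocol states "July 2022"
MAX_FOLLOW_UP_DAYS = 5 * 365     # approximation of five years


def follow_up(
    study_entry: date,
    vte_date: Optional[date] = None,
    death_date: Optional[date] = None,
    data_cutoff: date = DATA_CUTOFF,
) -> Tuple[int, bool]:
    """Return (days of follow-up, VTE event indicator) for one participant."""
    admin_end = study_entry + timedelta(days=MAX_FOLLOW_UP_DAYS)
    censor_candidates = [data_cutoff, admin_end]
    if death_date is not None:
        censor_candidates.append(death_date)
    censor_date = min(censor_candidates)

    # VTE is the event of interest only if it occurs on or before censoring.
    if vte_date is not None and vte_date <= censor_date:
        return (vte_date - study_entry).days, True
    return (censor_date - study_entry).days, False
```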

In the primary analysis, we will adjust Cox models for baseline patient covariates which we have identified as potential confounders, including age at study entry, sex and the top 4 genetic principal components (Figure 1) (minimally adjusted model).
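In standard notation, the minimally adjusted model corresponds to a cause-specific Cox model of the following form. The symbols are introduced here for illustration and do not appear in the protocol: X_i is the exposure for participant i (gene mutated yes/no, standardised mutational burden, or a signature contribution), Z_i collects age, sex and the genetic principal components, and h_0(t) is the unspecified baseline hazard.

```latex
h_i(t \mid X_i, Z_i) = h_0(t)\,\exp\!\left(\beta X_i + \gamma^{\top} Z_i\right),
\qquad
\mathrm{HR} = \exp(\beta)
```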

To prevent convergence issues in the regression model caused by complete separation (due to zero / near-zero outcome events in an exposure group) ³², we will limit the analyses to genes/mutational signatures where at least 5 VTE events occur in each exposure category.
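For a binary exposure such as gene mutated versus unmutated, the event-count rule could be checked as below. This is a sketch with hypothetical names; how exposure categories would be formed for continuous mutational-signature contributions is not specified in the protocol and is not addressed here.

```python
from typing import Sequence


def enough_events(
    exposed: Sequence[bool],
    had_vte: Sequence[bool],
    min_events: int = 5,
) -> bool:
    """True if both exposure groups contain at least min_events VTE events.

    exposed[i] is True when participant i is in the mutated group and
    had_vte[i] is True when participant i experienced VTE during follow-up.
    """
    if len(exposed) != len(had_vte):
        raise ValueError("exposed and had_vte must have the same length")
    events_exposed = sum(1 for x, e in zip(exposed, had_vte) if x and e)
    events_unexposed = sum(1 for x, e in zip(exposed, had_vte) if not x and e)
    return events_exposed >= min_events and events_unexposed >= min_events
```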

Results will be expressed in terms of the hazard ratio (HR) and associated 95% confidence interval (95% CI) for VTE associated with each gene and mutational signature. For tumour mutational burden, we will standardize the exposure variable and report the HR and 95% CI for VTE per standard deviation increase in mutation burden. Results will be interpreted in the context of false-discovery rate corrected p-values to account for multiple testing.
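The protocol does not name a specific false-discovery rate method. As one common choice, the Benjamini-Hochberg step-up adjustment is sketched below; its use here is an assumption made for illustration, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each adjusted value is the smallest false-discovery rate at which the corresponding test would be declared significant, so a gene would be reported at a 5% FDR when its adjusted p-value is at most 0.05.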

We will assess for violations of the proportional hazards assumption underlying our statistical models by incorporating time-dependent interaction terms to examine whether the relative association between somatic mutations and VTE changes over time, and by evaluating Schoenfeld residuals.
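One common way to write the time-dependent interaction check is shown below. The choice g(t) = log t is an assumption made for illustration, since the protocol does not specify the time function; X_i denotes the exposure and Z_i the adjustment covariates.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta X_i + \theta\, X_i\, g(t) + \gamma^{\top} Z_i\right),
\qquad g(t) = \log t
```

Under proportional hazards θ = 0, so evidence that θ differs from zero suggests that the hazard ratio, exp(β + θ g(t)), varies over follow-up.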

Sensitivity analyses

Sensitivity analysis 1: Examine interactions between somatic mutations and other covariates which may mediate or moderate the association with VTE. We hypothesise that somatic mutations arising within a tumour contribute directly to VTE (for example through increased tumour-expression of pro-thrombotic proteins). However, these same mutations may drive tumour growth or proxy aggressive histological subtypes of malignancy. This may lead to an indirect association with VTE via other mechanisms (Figure 1). For example, large tumours may compress or invade vessels, contributing to venous stasis and subsequent thrombosis. Patients with advanced cancer are also more likely to experience complications such as infection or hospitalization, all of which have been associated with VTE ³,⁵. Chemotherapy and surgery are also strongly associated with VTE risk. In the era of molecularly targeted therapies, selection of systemic anti-cancer drugs may be dictated by knowledge of the underlying somatic mutation profile ³³. This may lead to associations between genetic variants and VTE which are mediated through drugs, rather than any direct biological phenomenon.

We are interested primarily in estimating the degree to which somatic mutations contribute to VTE through direct biological mechanisms. Therefore, for genes/mutational signatures where there is possible evidence from the primary analysis for an association with VTE (nominal P < 0.05), we will conduct mediation analyses to explore the degree to which these associations are dependent on indirect pathways/mediating variables.

We will compare estimates from the minimally adjusted (primary analysis) with estimates from Cox regressions where we include the following covariates (presumed mediators of the exposure-outcome relationship): 1) tumour type, grouped by anatomical site; 2) cancer stage and 3) SACT. SACT will be included as a time-dependent covariate. We will also group SACT regimens by type and include these drug groups as distinct covariates. Participants with missing information on any of the above covariates will be excluded from this sensitivity analysis.

It is also plausible that baseline germline thrombophilia polymorphisms modulate the impact of tumour genetic changes on rates of VTE. We will therefore examine the interaction between tumour somatic mutations in each gene and a 297-SNP germline polygenic risk score (derived from a GWAS by Klarin et al.) ³⁴ on the rate of VTE. We have selected this score because, compared with other VTE polygenic risk scores, it demonstrated superior predictive performance for cancer-associated thrombosis when evaluated in the UK Biobank cancer population ¹⁶.
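A multiplicative interaction of this kind is typically examined by adding a product term to the Cox model. The notation below is illustrative and not taken from the protocol: G_i indicates whether the gene is mutated in participant i, P_i is the polygenic risk score (assumed to be standardised), and Z_i are the adjustment covariates.

```latex
h_i(t) = h_0(t)\,\exp\!\left(\beta_1 G_i + \beta_2 P_i + \beta_3\, G_i P_i + \gamma^{\top} Z_i\right)
```

Here exp(β_3) is the factor by which the hazard ratio for the somatic mutation changes per unit increase in the score, so β_3 = 0 corresponds to no interaction on the multiplicative scale.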

Sensitivity analysis 2: Competing risks analysis. For genes/mutational signatures where there is statistical evidence indicating a possible association with the rate of VTE in the primary analysis (nominal p <0.05), we will use Fine and Gray regressions to estimate the risk of VTE in the gene-mutated vs reference groups, treating death from any other cause as a competing risk. We will report the subdistribution hazard ratio (SHR) and associated 95% CI for VTE associated with the gene/mutational signature, as well as the estimated cumulative incidence for VTE over time ³¹.
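In the Fine and Gray approach the regression is placed on the subdistribution hazard for VTE, which is linked directly to the cumulative incidence function. In the notation below, introduced here for illustration, X is the exposure, λ_{1,0}(t) is the baseline subdistribution hazard for VTE, and F_1(t | X) is the cumulative incidence of VTE by time t in the presence of the competing risk of death.

```latex
\lambda_1(t \mid X) = \lambda_{1,0}(t)\,\exp(\beta X),
\qquad
\mathrm{SHR} = \exp(\beta),
\qquad
F_1(t \mid X) = 1 - \exp\!\left(-\int_0^t \lambda_1(s \mid X)\,ds\right)
```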

Sensitivity analysis 3: Exclude relapsed and previously treated participants. The GEL cancer cohort includes some pre-treated patients with relapsed or progressive disease as well as patients who have undergone neo-adjuvant chemotherapy prior to study entry. Prior chemotherapy may be a powerful risk factor for VTE ⁵. It may also influence the somatic mutation profile of a tumour through treatment-induced mutagenesis and selection of tumour sub-clones ³⁵. Therefore, we will perform a sensitivity analysis which is limited to only newly diagnosed untreated cancer patients: excluding any patients who have received SACT prior to study entry or where the date between original cancer diagnosis and study entry is > 180 days.
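The restriction to newly diagnosed, untreated participants could be expressed as the following predicate. The function and argument names are hypothetical; the 180-day window and the exclusion of SACT before study entry are taken from the paragraph above.

```python
from datetime import date
from typing import Optional


def newly_diagnosed_untreated(
    diagnosis_date: date,
    study_entry: date,
    first_sact_date: Optional[date] = None,
    max_gap_days: int = 180,
) -> bool:
    """True if the participant is retained in sensitivity analysis 3."""
    # Exclude anyone who received SACT before study entry.
    if first_sact_date is not None and first_sact_date < study_entry:
        return False
    # Exclude if more than max_gap_days elapsed between diagnosis and entry.
    return (study_entry - diagnosis_date).days <= max_gap_days
```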

To control for variability in time between cancer diagnosis and tissue biopsy (for somatic DNA sequencing), we will also perform a sensitivity analysis where the outcome is time to VTE from cancer diagnosis, with left truncation at the point of tumour sampling.

Sensitivity analysis 4: Exclude patients with a documented indication for anticoagulation or anti-platelet therapy. Older patients with co-morbidities may be more likely to be prescribed anticoagulation for reasons other than VTE (for example, for stroke prevention in the context of atrial fibrillation) ³⁶. This is a potential confounding variable and failing to include it as a covariate may introduce bias. Prescription data are unfortunately not available for this cohort. However, we will perform a sensitivity analysis where we exclude patients who have a common medical indication for long-term anticoagulation or antiplatelet therapy documented in their hospital episode statistics prior to study entry: namely atrial fibrillation, acute coronary syndrome, ischaemic stroke/transient ischaemic attack or prosthetic heart valve replacement ³⁷,³⁸. (Note that patients with a prior history of VTE are excluded from the primary cohort.)

Sensitivity analysis 5: Ancestry-stratified analysis. The cohort is composed predominantly of participants of European genetic ancestry, which potentially limits the generalizability of any findings. Although power to detect ancestry-specific associations in non-Europeans will be low, we will assess the direction of effect of any gene associations which are identified from the primary analysis in an ancestry-stratified sensitivity analysis.

Discussion

In this protocol, we plan to examine the impact of tumour somatic mutations on rates of VTE. We will attempt to adjust for factors which may be correlated with both tumour genetics and VTE, including age, sex, tumour site and stage of cancer. However, there will still be residual and unmeasured confounding which may bias our results. For example, obesity is a risk factor both for the development of some cancers and for VTE ³⁹. Unfortunately, weight and height have been recorded for fewer than 5% of participants in the 100,000 Genomes Project cancer programme. Therefore, it is not possible to include body mass index as a covariate. Similarly, haematological laboratory parameters (including haemoglobin, leucocyte count and platelet count), which have previously been shown to be relevant in the clinical prediction of cancer-associated thrombosis ⁴⁰, are not available for this dataset.

The longitudinal health record data for participants in our cohort is obtained through linkage with pseudo-anonymised hospital-episode statistics (coded using ICD10 diagnoses). Unfortunately, this does not include full imaging reports or free-text hospital notes. It is therefore not possible to directly validate the specificity and sensitivity of ICD10 codes for VTE in the context of this study. We note recent work by Overvad et al. ¹⁷ showing that identification of cancer-associated VTE through ICD10 codes recorded in electronic health registry data was reasonably specific (positive predictive value >86%) and sensitive (~74% of oncology patients with venous thromboembolism had a relevant ICD10 code documented). We acknowledge that deriving diagnoses from administrative hospital statistics will inevitably lead to some degree of outcome misclassification, particularly given that no primary care data is currently available for participants, meaning VTE diagnoses which are only recorded in GP records will be missed. We anticipate this will result in an under-estimation of the true VTE incidence, which could potentially bias effect estimates towards the null (although bias in the other direction is also plausible) ⁴¹.

Risk of VTE is dynamic and likely to change over the course of a patient’s cancer illness. It may also be moderated by events such as hospital admissions and the administration of short or medium-term primary thromboprophylaxis. As we do not have prescription data available, we cannot specifically explore the impact of these factors. However, we will examine our data to ascertain whether the relative effect of somatic mutations on VTE varies over time and interpret our results in light of these considerations.

Despite these limitations, we believe this large pan-cancer cohort, with good quality paired tumour and germline whole genome sequence data and individual participant linkage to longitudinal health records, provides a unique opportunity to study the contribution of tumour genomic abnormalities to VTE pathophysiology. It is outside the scope of this analysis to attempt to develop or validate a comprehensive risk-prediction model. However, we hope that any gene associations which we identify can be carried forward into future (ideally prospectively-designed) studies to evaluate the value of adding tumour genome sequence data to existing risk-prediction algorithms for cancer associated thrombosis.

Study status

Prior to finalising this protocol, NC explored the GEL research environment platform to develop familiarity with the data structure and the available files for analysis, in order to inform judgements on the feasibility of the planned study.

To ascertain whether there was likely to be adequate power to perform the analyses described above, we also generated descriptive statistics on the overall cancer cohort, including:

Proportion of the sample with missing covariate data
Proportion of the population experiencing the outcome of interest (VTE)

No prior analyses examining the associations between somatic mutations and VTE have been performed.

Future amendments to this protocol will be clearly documented and justified in subsequent reports. We plan to submit findings from this study for publication once analyses are complete.

Ethics and consent

This study has been approved by the Genomics England research network, and access to the data will adhere to Genomics England research governance agreements. The 100,000 Genomes Project was approved by the NHS Health Research Authority East of England - Cambridge South Research Ethics Committee (REC ref: 14/EE/1112, 20th February 2015). Participants were recruited from across 13 NHS Genomic Medicine Centres and all participants agreed to participate in the 100,000 Genomes Project and provided informed written consent.

Acknowledgements

This research will be made possible through access to data in the National Genomic Research Library, which is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The National Genomic Research Library holds data provided by patients and collected by the NHS as part of their care and data collected as part of their participation in research. The National Genomic Research Library is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK and the Medical Research Council have also funded research infrastructure.

Funding Statement

This work was supported by Wellcome [225541; GW4 Clinical Academic Training Programme for Health Professionals to NC]; Cancer Research UK [C18281/A29019; to PCH].

The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

[version 2; peer review: 2 approved]

Data availability

Underlying data

The de-identified patient data which will be used for this analysis can be accessed via the Genomics England Research Environment subject to a collaborative agreement that adheres to patient-led governance. For more information about accessing the data, contact research-network@genomicsengland.co.uk or access the relevant information on the Genomics England website: https://www.genomicsengland.co.uk/research.

Reporting guidelines

University of Bristol data repository [ https://data.bris.ac.uk/data/] : STROBE checklist for ‘Association between tumour somatic mutations and venous thromboembolism in the 100,000 Genomes Project cancer cohort: a study protocol’. https://doi.org/10.5523/bris.1pmmcgyqaij8n27j94i2rgd4j9 ⁴².

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Software availability

Analyses will be carried out in R version 4.2.1 using standard open source software packages available from [https://cran.r-project.org] including survival, cmprsk and riskRegression ⁴³–⁴⁵.

All analysis scripts will be posted publicly on GitHub [https://github.com/NaomiC-0] following an airlock approval from the Genomics England data protection committee.

References

1. Khorana AA, Francis CW, Culakova E, et al.: Thromboembolism is a leading cause of death in cancer patients receiving outpatient chemotherapy. J Thromb Haemost. 2007;5(3):632–4. doi: 10.1111/j.1538-7836.2007.02374.x
2. Rutjes AW, Porreca E, Candeloro M, et al.: Primary prophylaxis for venous thromboembolism in ambulatory cancer patients receiving chemotherapy. Cochrane Database Syst Rev. 2020;12(12):CD008500. doi: 10.1002/14651858.CD008500.pub5
3. Noble S, Pasi J: Epidemiology and pathophysiology of cancer-associated thrombosis. Br J Cancer. 2010;102 Suppl 1(Suppl 1):S2–9. doi: 10.1038/sj.bjc.6605599
4. Mulder FI, Horváth-Puhó E, van Es N, et al.: Venous thromboembolism in cancer patients: a population-based cohort study. Blood. 2021;137(14):1959–69. doi: 10.1182/blood.2020007338
5. Guntupalli SR, Spinosa D, Wethington S, et al.: Prevention of venous thromboembolism in patients with cancer. BMJ. 2023;381:e072715. doi: 10.1136/bmj-2022-072715
6. Ünlü B, Versteeg HH: Cancer-associated thrombosis: the search for the holy grail continues. Res Pract Thromb Haemost. 2018;2(4):622–9. doi: 10.1002/rth2.12143
7. Abufarhaneh M, Pandya RK, Alkhaja A, et al.: Association between genetic mutations and risk of venous thromboembolism in patients with solid tumor malignancies: a systematic review and meta-analysis. Thromb Res. 2022;213:47–56. doi: 10.1016/j.thromres.2022.02.022
8. Dunbar A, Bolton KL, Devlin SM, et al.: Genomic profiling identifies somatic mutations predicting thromboembolic risk in patients with solid tumors. Blood. 2021;137(15):2103–13. doi: 10.1182/blood.2020007488
9. Feldman S, Gupta D, Navi BB, et al.: Tumor genomic profile is associated with Arterial Thromboembolism risk in patients with solid cancer. JACC CardioOncol. 2023;5(2):246–55. doi: 10.1016/j.jaccao.2023.01.009
10. Jee J, Brannon AR, Singh R, et al.: DNA liquid biopsy-based prediction of cancer-associated venous thromboembolism. Nat Med. 2024;30(9):2499–2507. doi: 10.1038/s41591-024-03195-0
11. Genomics England: The national genomics research library v5.1. 2020; [cited 2024 Sep 1].
12. Genomics England: Cancer analysis technical information document, version 2.0. 2019; [cited 2024 Sep 26].
13. Kuan V, Denaxas S, Gonzalez-Izquierdo A, et al.: PH338 / 676 - Venous thromboembolic disease (Excl PE). HDRUK Phenotype Library. 2021; [cited 2024 Sep 3].
14. Kuan V, Denaxas S, Gonzalez-Izquierdo A, et al.: PH71 / 142 - Pulmonary embolism. HDRUK Phenotype Library. 2021; [cited 2024 Sep 3].
15. Alikhan R, Gomez K, Maraveyas A, et al.: Cancer-associated venous thrombosis in adults (second edition): a British Society for Haematology guideline. Br J Haematol. 2024;205(1):71–87. doi: 10.1111/bjh.19414
16. Guman NAM, Mulder FI, Ferwerda B, et al.: Polygenic risk scores for prediction of cancer-associated venous thromboembolism in the UK Biobank cohort study. J Thromb Haemost. 2023;21(11):3175–3183. doi: 10.1016/j.jtha.2023.07.009
17. Overvad TF, Severinsen MT, Johnsen SP, et al.: Positive predictive value and sensitivity of cancer-associated venous thromboembolism diagnoses in the Danish National Patient Register. Thromb Res. 2024;241:109074. doi: 10.1016/j.thromres.2024.109074
18. Cosmi B: Management of superficial vein thrombosis. J Thromb Haemost. 2015;13(7):1175–83. doi: 10.1111/jth.12986
19. Bleda M, Tarraga J, de Maria A, et al.: CellBase, a comprehensive collection of RESTful web services for retrieving relevant biological information from heterogeneous sources. Nucleic Acids Res. 2012;40(Web Server issue):W609–14. doi: 10.1093/nar/gks575
20. Harrison PW, Amode MR, Austine-Orimoloye O, et al.: Ensembl 2024. Nucleic Acids Res. 2024;52(D1):D891–9. doi: 10.1093/nar/gkad1049
21. Sondka Z, Dhir NB, Carvalho-Silva D, et al.: COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 2024;52(D1):D1210–7. doi: 10.1093/nar/gkad986
22. Landrum MJ, Lee JM, Benson M, et al.: ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–7. doi: 10.1093/nar/gkx1153
23. Shihab HA, Rogers MF, Gough J, et al.: An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics. 2015;31(10):1536–43. doi: 10.1093/bioinformatics/btv009
24. NHS England: National genomic test directory. 2024; [cited 2024 Sep 3].
25. Thibord F, Klarin D, Brody JA, et al.: Cross-Ancestry investigation of Venous Thromboembolism genomic predictors. Circulation. 2022;146(16):1225–1242. doi: 10.1161/CIRCULATIONAHA.122.059675
26. Ghouse J, Tragante V, Ahlberg G, et al.: Genome-wide meta-analysis identifies 93 risk loci and enables risk prediction equivalent to monogenic forms of venous thromboembolism. Nat Genet. 2023;55(3):399–409. doi: 10.1038/s41588-022-01286-7
27. He XY, Wu BS, Yang L, et al.: Genetic associations of protein-coding variants in venous thromboembolism. Nat Commun. 2024;15(1):2819. doi: 10.1038/s41467-024-47178-8
28. Johnson J: Gene association study power calculator. 2017; [cited 2024 Jul 13].
29. Yan YH, Chen SX, Cheng LY, et al.: Confirming putative variants at ≤ 5% allele frequency using allele enrichment and Sanger sequencing. Sci Rep. 2021;11(1):11640. doi: 10.1038/s41598-021-91142-1
30. Alexandrov LB, Nik-Zainal S, Wedge DC, et al.: Deciphering signatures of mutational processes operative in human cancer. Cell Rep. 2013;3(1):246–59. doi: 10.1016/j.celrep.2012.12.008
31. Austin PC, Fine JP: Practical recommendations for reporting Fine-Gray model analyses for competing risk data. Stat Med. 2017;36(27):4391–400. doi: 10.1002/sim.7501
32. Agresti A: Building and applying logistic regression models. In: An Introduction to Categorical Data Analysis. John Wiley & Sons, Ltd, 2007 [cited 2024 Dec 19];137–72. doi: 10.1002/9780470114759.ch5
33. Sosinsky A, Ambrose J, Cross W, et al.: Insights for precision oncology from the integration of genomic and clinical data of 13,880 tumors from the 100,000 Genomes Cancer Programme. Nat Med. 2024;30(1):279–89. doi: 10.1038/s41591-023-02682-0
34. Klarin D, Busenkell E, Judy R, et al.: Genome-wide association analysis of venous thromboembolism identifies new risk loci and genetic overlap with arterial vascular disease. Nat Genet. 2019;51(11):1574–9. doi: 10.1038/s41588-019-0519-3
35. Bertrums EJM, de Kanter JK, Derks LLM, et al.: Selective pressures of platinum compounds shape the evolution of therapy-related myeloid neoplasms. Nat Commun. 2024;15(1):6025. doi: 10.1038/s41467-024-50384-z
36. Olesen JB, Lip GYH, Hansen ML, et al.: Validation of risk stratification schemes for predicting stroke and thromboembolism in patients with atrial fibrillation: nationwide cohort study. BMJ. 2011;342:d124. doi: 10.1136/bmj.d124
37. National Institute for Health and Care Excellence: Antiplatelet drugs | Treatment summaries | BNF content published by NICE. [cited 2024 Oct 1].
38. National Institute for Health and Care Excellence: Oral anticoagulants | Treatment summaries | BNF content published by NICE. [cited 2024 Oct 1].
39. Larsson SC, Burgess S: Causal role of high body mass index in multiple chronic diseases: a systematic review and meta-analysis of Mendelian Randomization studies. BMC Med. 2021;19(1):320. doi: 10.1186/s12916-021-02188-x
40. Khorana AA, Kuderer NM, Culakova E, et al.: Development and validation of a predictive model for chemotherapy-associated thrombosis. Blood. 2008;111(10):4902–7. doi: 10.1182/blood-2007-10-116327
41. Bakoyannis G, Yiannoutsos CT: Impact of and correction for outcome misclassification in cumulative incidence estimation. PLoS One. 2015;10(9):e0137454. doi: 10.1371/journal.pone.0137454
42. Gerds TA, Ohlendorff JS, Blanche P, et al.: riskRegression: risk regression models and prediction scores for survival analysis with competing risks. 2023; [cited 2024 Sep 10]. https://cran.r-project.org/web/packages/riskRegression/index.html
43. Gray B: cmprsk: subdistribution analysis of competing risks. 2024; [cited 2024 Sep 10]. https://cran.r-project.org/web/packages/cmprsk/index.html
44. Therneau TM, Lumley T, Elizabeth A, et al.: survival: survival analysis. 2024; [cited 2024 Sep 10]. https://cran.r-project.org/web/packages/survival/index.html
45. Cornish N: STROBE checklist for ‘Association between tumour somatic mutations and venous thromboembolism in the 100,000 Genomes Project cancer cohort: a study protocol’. 2024. doi: 10.5523/bris.1pmmcgyqaij8n27j94i2rgd4j9

Wellcome Open Res. 2024 Dec 30. doi: 10.21956/wellcomeopenres.25990.r116319

Reviewer response for version 2

Yan Xu ¹

The authors have adequately addressed the queries raised in my original review, and added additional analyses as applicable (e.g., germline predispositions to VTE). I have no further comments and wish the authors the best in their analysis which I anticipate with great interest.

Is the study design appropriate for the research question?

Yes

Is the rationale for, and objectives of, the study clearly described?

Yes

Are sufficient details of the methods provided to allow replication by others?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Reviewer Expertise:

Venous thromboembolism

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2024 Dec 27. doi: 10.21956/wellcomeopenres.25990.r116320

Reviewer response for version 2

Benilde Cosmi ¹

The authors have adequately addressed previously raised concerns.

Is the study design appropriate for the research question?

Yes

Is the rationale for, and objectives of, the study clearly described?

Yes

Are sufficient details of the methods provided to allow replication by others?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Reviewer Expertise:

Venous thromboembolism, heparin, vitamin K antagonists, direct oral anticoagulants

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Wellcome Open Res. 2024 Dec 5. doi: 10.21956/wellcomeopenres.25500.r110118

Reviewer response for version 1

Benilde Cosmi ¹

The authors plan to analyse whole genome sequencing (WGS) data from patients with cancer enrolled in the 100,000 Genomes Project Cancer Programme, a Genomics England (GEL) initiative, to assess the association between somatic mutations across the tumour genome (exposure) and venous thromboembolism (VTE) (outcome).

Some issues should be addressed as indicated below:

1- Outcomes: the diagnosis of DVT in any site and/or PE is based on ICD-10 codes as reported in Table 1. However, DVT of the upper limb is not considered, although it is quite common in cancer patients (mostly due to central venous catheters). Superficial vein thrombosis of the lower limbs was also excluded. The authors should justify these exclusions.

2- In addition, ICD-10 codes do not guarantee that the VTE diagnosis was objectively documented.

3- Sensitivity analysis 4: Cancer patients can also receive thromboprophylaxis in case of hospitalization for acute illness or surgery, and not only long term anticoagulation. In fact the risk of VTE is dynamic and it can change during the clinical course of cancer. This may also impact on the risk of VTE.

Is the study design appropriate for the research question?

Yes

Is the rationale for, and objectives of, the study clearly described?

Yes

Are sufficient details of the methods provided to allow replication by others?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Reviewer Expertise:

Venous thromboembolism, heparin, vitamin K antagonists, direct oral anticoagulants

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Wellcome Open Res. 2024 Dec 19.

Naomi Cornish ¹

Many thanks for taking the time to review this study protocol. We appreciate your valuable insights regarding the potential limitations of this analysis and where possible have incorporated your feedback to improve the revised manuscript. Our response to your comments is below:

1. The ICD10 codes which we propose using to identify venous thromboembolism from electronic health record data are derived from a previously published algorithm by Kuan et al which has been curated by a panel of clinicians. ^[1] We deliberately chose a standardised algorithm for venous thromboembolism in an effort to ensure that our outcome definition is easily reproducible. However, the algorithm was not developed specifically for cancer-associated thrombosis and, as you highlight, does not include upper limb deep vein thrombosis (DVT), which occurs more frequently in people with cancer. ^[2] Therefore, we have expanded the diagnostic algorithm to include codes I808-9 and I828-9, which encompass thromboembolism of ‘other’ or ‘unspecified’ veins (Table 1 of manuscript). Specific codes for upper extremity DVT (I826), axillary vein thrombosis (I82A), subclavian vein thrombosis (I82B) and internal jugular thrombosis (I82C) do not appear in the data. This is because the health records are coded using WHO ICD10 version 2019 ^[3], which lacks some of the more granular codes that appear in ICD10-CM (curated by the National Center for Health Statistics, US). ^[4] In line with recent literature in this field ^[5,6] we have not included ICD10 code I800 (“Phlebitis and thrombophlebitis of superficial vessels of lower extremities”) in the case definition. This diagnosis is unlikely to be recorded accurately in hospital records of study participants as it is generally managed in the community. Furthermore, although superficial vein thrombosis undoubtedly shares risk factors with deep venous thrombosis, it may have distinct aetiology in this context, given that it is primarily associated with the presence of varicose veins and lower limb venous insufficiency. ^[7] Including these patients as VTE cases may reduce the power of our analysis to detect the direct effects of tumour genomic aberrations on thrombosis development.
We have clarified our justification for this in the manuscript (primary outcome definition).

2. Unfortunately, the linked electronic health records for participants in this cohort do not include imaging reports or free-text hospital notes. It is therefore not possible to directly validate the specificity and sensitivity of ICD10 codes for venous thromboembolism in the context of this study. We note recent work by Overvad et al ^[6] showing that identification of incident cancer-associated venous thromboembolism through ICD10 codes recorded in electronic health registry data was reasonably specific (positive predictive value >86%) and sensitive (~74% of oncology patients with venous thromboembolism had a relevant ICD10 code documented). We acknowledge that using administrative data to identify our primary outcome will inevitably lead to the misclassification of some participants. However, we anticipate that this would be more likely to bias effect estimates towards the null, by diluting the ‘true’ case vs control groups and thereby reducing the power to detect causal associations (rather than introducing false positive associations, although this is also possible). ^[8] We have acknowledged these limitations in the manuscript (discussion section).

3. We agree that the risk of VTE is dynamic and likely to change over the course of a patient’s cancer illness. As we do not have prescription data available, we cannot explore the impact of short or medium-term anticoagulation prescriptions (for example in the context of hospital admissions) on the rate of venous thromboembolism. We have expanded on this limitation in the discussion section. We will examine whether the effect of somatic mutations on venous thromboembolism changes over time. This will include assessing the data for violations of the proportional hazards assumption underlying our statistical models, by incorporating time-dependent interaction terms and evaluating Schoenfeld residuals. We have added this clarification to the statistical analysis plan.
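
As an illustration only, the residual-based check described in point 3 can be sketched as follows. This is not the analysis code for the study: the protocol specifies R, where `cox.zph` in the `survival` package performs the formal test. The sketch uses plain Python, a single binary covariate (mutation present or absent) and invented toy data, and every function and variable name is hypothetical. It fits a one-covariate Cox model by Newton-Raphson, computes the Schoenfeld residual at each event time, and correlates the residuals with event time; a correlation far from zero would hint that the effect of the mutation changes over follow-up.

```python
import math

def risk_set_moments(times, x, beta, t):
    """Risk-weighted mean and variance of x among subjects still at risk at time t."""
    at_risk = [xi for ti, xi in zip(times, x) if ti >= t]
    weights = [math.exp(beta * xi) for xi in at_risk]
    total = sum(weights)
    mean = sum(w * xi for w, xi in zip(weights, at_risk)) / total
    var = sum(w * xi * xi for w, xi in zip(weights, at_risk)) / total - mean ** 2
    return mean, var

def fit_cox_one_covariate(times, events, x, max_iter=25, tol=1e-10):
    """Maximise the Cox partial likelihood for one covariate by Newton-Raphson."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for ti, ei, xi in zip(times, events, x):
            if ei:  # only observed events contribute to the partial likelihood
                mean, var = risk_set_moments(times, x, beta, ti)
                score += xi - mean
                info += var
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def schoenfeld_residuals(times, events, x, beta):
    """Observed covariate minus its risk-set expectation, at each event time."""
    out = []
    for ti, ei, xi in sorted(zip(times, events, x)):
        if ei:
            mean, _ = risk_set_moments(times, x, beta, ti)
            out.append((ti, xi - mean))
    return out

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

# Toy data (invented): follow-up time, VTE event indicator (0 = censored),
# and somatic mutation status (1 = mutated, 0 = wild type).
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
events = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1]
mutated = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

beta = fit_cox_one_covariate(times, events, mutated)
residuals = schoenfeld_residuals(times, events, mutated, beta)
rho = pearson([t for t, _ in residuals], [r for _, r in residuals])

print("log hazard ratio:", beta)
print("correlation of Schoenfeld residuals with time:", rho)
```

With only twelve toy observations the correlation carries no inferential weight; the sketch shows the mechanics only. In practice the residuals are usually scaled and tested against a transformation of time, and a time-dependent interaction term offers a complementary check.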

References:

1. Kuan V, Denaxas S, Gonzalez-Izquierdo A, Direk K, Bhatti O, Husain S, et al. A chronological map of 308 physical and mental health conditions from 4 million individuals in the English National Health Service. Lancet Digit Health 2019;1(2):e63–77.

2. Alikhan R, Gomez K, Maraveyas A, Noble S, Young A, Thomas M, et al. Cancer-associated venous thrombosis in adults (second edition): A British Society for Haematology Guideline. Br J Haematol 2024;205(1):71–87.

3. World Health Organization. International statistical classification of diseases and related health problems (11th ed.) [Internet]. 2019 [cited 2024 Jul 31]; Available from: https://icd.who.int/browse10/2019/en#/I80-I89.

4. CDC. ICD-10-CM [Internet]. Classification of Diseases, Functioning, and Disability. 2024 [cited 2024 Dec 12]; Available from: https://www.cdc.gov/nchs/icd/icd-10-cm/index.html.

5. Guman NAM, Mulder FI, Ferwerda B, Zwinderman AH, Kamphuisen PW, Büller HR, et al. Polygenic risk scores for prediction of cancer-associated venous thromboembolism in the UK Biobank cohort study. J Thromb Haemost 2023;S1538-7836(23)00571-8.

6. Overvad TF, Severinsen MT, Johnsen SP, Madsen SS, Kannik K, Stenfeldt LG, et al. Positive predictive value and sensitivity of cancer-associated venous thromboembolism diagnoses in the Danish National Patient Register. Thrombosis Research 2024;241:109074.

7. Cosmi B. Management of superficial vein thrombosis. Journal of Thrombosis and Haemostasis 2015;13(7):1175–83.

8. Bakoyannis G, Yiannoutsos CT. Impact of and Correction for Outcome Misclassification in Cumulative Incidence Estimation. PLOS ONE 2015;10(9):e0137454.

Wellcome Open Res. 2024 Nov 28. doi: 10.21956/wellcomeopenres.25500.r113413

Reviewer response for version 1

Yan Xu ¹

In this protocol, Dr. Cornish and colleagues seek to establish an association between somatic mutations in the tumoural environment and the risk of venous thromboembolism using the UK’s 100,000 Genomes Project. In addition to evaluating an association between tumoural somatic mutations and risk of VTE, the authors plan to adjust for several confounding variables (tumour type, stage, use of anti-cancer therapy) and perform sensitivity analyses to delineate the contribution from somatic mutations in the context of other factors that contribute to increased risk of VTE.

The authors have presented a clear proposal with important impact on the care of patients living with cancer. As anti-cancer therapies evolve and our understanding of VTE risks among patients with cancer is refined, there is a need for a precision medicine approach to improve our primary VTE prevention methods.

Major comments:

1. The authors propose use of tumour type, stage and systemic anti-cancer therapy as mediators and age at study entry, sex and the top 4 genetic principal components as confounders. In addition, the authors are encouraged to consider components of other established risk scores for VTE prediction in cancer patients (e.g., Khorana score components) such as anemia, thrombocytosis and leukocytosis as mediators, as they likely reflect the underlying inflammatory state of malignancy and are well-established risk factors for VTE in cancer patients. Furthermore, BMI should be considered a confounder given the association between elevated BMI and development of both selected malignancies (e.g., post-menopausal breast, colorectal) and VTE. If these variables are not available, this should be acknowledged as a limitation.
2. The authors should consider patient-related germline risk factors for cancer-associated VTE (e.g., a polygenic risk score for cancer-associated VTE has been evaluated by Guman et al, JTH 2023). While one presumes that germline risk factors should not correlate with somatic mutations in the tumoural environment, they may modulate the risk of observed VTE if those with germline predispositions (e.g., factor V Leiden) have higher tendencies to receive thromboprophylaxis.
3. Impact of ancestry on somatic variants should be considered as there seem to be fewer Domain 1 variants and those from all domains combined (Diversity and genetic ancestry effects in the… | Genomics England). Performance of specific somatic mutations on VTE should be evaluated in ancestral groups to ensure generalizability, rather than simply classifying PCA-defined ancestry as a mediator in the analysis.
4. Performance of ICD-10 codes (sensitivity, specificity, PPV, NPV) is not clear and would warrant validation.

Is the study design appropriate for the research question?

Yes

Is the rationale for, and objectives of, the study clearly described?

Yes

Are sufficient details of the methods provided to allow replication by others?

Yes

Are the datasets clearly presented in a useable and accessible format?

Yes

Reviewer Expertise:

Venous thromboembolism

References

1. Guman NAM, Mulder FI, Ferwerda B, et al.: Polygenic risk scores for prediction of cancer-associated venous thromboembolism in the UK Biobank cohort study. J Thromb Haemost. 2023;21(11):3175–3183. doi: 10.1016/j.jtha.2023.07.009
2. Diversity and genetic ancestry effects in the Cancer Cohort of the 100,000 Genomes Project. Genomics England.

Wellcome Open Res. 2024 Dec 19.

Naomi Cornish ¹

Thank you for taking the time to review our study protocol. We are very grateful for your expertise and the effort you have taken to provide detailed feedback. We have responded to each of your comments below and, where possible, we have adapted our analysis plan according to your suggestions (revision uploaded).

1. Obesity is a potentially confounding variable which may influence the development of both cancer and venous thromboembolism. ^[1] Unfortunately, weight and height have been recorded for fewer than 5% of participants in the 100,000 Genomes Project cancer programme. Therefore, it is not possible to include body mass index as a covariate. Similarly, haematological laboratory parameters (including haemoglobin, leucocyte count and platelet count) are not available for this dataset. Although these variables have undoubtedly been shown to be relevant in the clinical prediction of cancer-associated thrombosis, ^[2] our primary aim in this analysis is specifically to examine the contribution of tumour genomic abnormalities to the aetiology of venous thromboembolism, rather than to develop or validate a comprehensive risk-prediction model. We also note that prospective studies have shown that the clinical utility of the Khorana score is relatively low for several cancer sites. ^[3,4] Although it is outside the scope of this particular analysis, we hope that any gene associations which we identify can be carried forward into future studies to evaluate the value of adding tumour genome sequence data to existing risk-prediction algorithms. We have acknowledged these limitations in the updated manuscript (discussion section).

2. Recent UK guidelines do not currently suggest routine primary thromboprophylaxis for cancer patients who carry common germline thrombophilia variants (including Factor V Leiden). ^[5,6] However, we agree that germline predisposition to venous thromboembolism may well interact with tumour somatic mutations to modulate the risk of cancer-associated thrombosis. We therefore plan to examine whether a previously validated germline polygenic risk score ^[7] further affects the rate of venous thromboembolism in the context of different tumour somatic gene mutations (sensitivity analysis 1).

3. We recognise that population stratification may potentially confound observed associations between somatic mutations and venous thromboembolism. We attempt to mitigate this by including genetic principal components as covariates in the proposed statistical models. However, as you highlight, associations identified from a predominantly European study cohort may not be generalisable to other diverse populations. Therefore, we have updated the protocol to include an ancestry-stratified analysis (sensitivity analysis 5), while acknowledging that our power to detect ancestry-specific associations will be limited, given the relatively low numbers of non-European ancestry participants. ^[8]

4. The longitudinal health record data for participants in our cohort are obtained through linkage of pseudo-anonymised hospital-episode statistics (coded using ICD10 diagnoses) from NHS England. Unfortunately, this does not include full imaging reports or free-text hospital notes. It is therefore not possible to directly validate the specificity and sensitivity of ICD10 codes for venous thromboembolism in the context of this study. We note recent work by Overvad et al ^[9] showing that identification of incident cancer-associated venous thromboembolism through ICD10 codes recorded in electronic health registry data was reasonably specific (positive predictive value >86%) and sensitive (~74% of oncology patients with venous thromboembolism had a relevant ICD10 code documented). We acknowledge that using administrative data to identify our primary outcome will inevitably lead to the misclassification of some participants. However, we anticipate that this would be more likely to bias effect estimates towards the null, by diluting the ‘true’ case vs control groups and thereby reducing the power to detect causal associations (rather than introducing false positive associations, although this is also possible). ^[10] We have acknowledged these limitations in the manuscript (discussion section).
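
The direction of bias described in point 4 can be made concrete with a small worked calculation. The sketch below is illustrative and not part of the protocol. It takes the sensitivity (0.74) and positive predictive value (0.86) quoted above from Overvad et al and combines them with a hypothetical baseline VTE risk of 5% and a hypothetical true risk ratio of 2.0. It assumes that misclassification is non-differential (identical in mutation carriers and non-carriers) and that the quoted PPV applies at the baseline risk; all names are invented for illustration.

```python
def false_positive_rate_from_ppv(ppv, sensitivity, prevalence):
    """Solve PPV = se*p / (se*p + fpr*(1-p)) for the false positive rate."""
    return sensitivity * prevalence * (1 - ppv) / (ppv * (1 - prevalence))

def observed_risk(true_risk, sensitivity, false_positive_rate):
    """Proportion coded as VTE: detected true cases plus false positives."""
    return sensitivity * true_risk + false_positive_rate * (1 - true_risk)

SENSITIVITY = 0.74    # quoted in the response above
PPV = 0.86            # lower bound quoted in the response above
BASELINE_RISK = 0.05  # hypothetical VTE risk in non-carriers
TRUE_RR = 2.0         # hypothetical true risk ratio for carriers

# Assumption: the same error rates apply to carriers and non-carriers.
fpr = false_positive_rate_from_ppv(PPV, SENSITIVITY, BASELINE_RISK)

risk_noncarrier = observed_risk(BASELINE_RISK, SENSITIVITY, fpr)
risk_carrier = observed_risk(BASELINE_RISK * TRUE_RR, SENSITIVITY, fpr)
observed_rr = risk_carrier / risk_noncarrier

# For comparison: imperfect sensitivity with no false positives.
rr_no_false_positives = (
    observed_risk(BASELINE_RISK * TRUE_RR, SENSITIVITY, 0.0)
    / observed_risk(BASELINE_RISK, SENSITIVITY, 0.0)
)

print("true risk ratio:", TRUE_RR)
print("observed risk ratio:", round(observed_rr, 3))
print("observed risk ratio without false positives:", round(rr_no_false_positives, 3))
```

Under these assumed inputs the observed risk ratio is about 1.85 rather than 2.0. The attenuation is driven by false positives: imperfect sensitivity alone, applied equally to both groups, leaves the risk ratio unchanged while still reducing the number of detected cases and hence statistical power.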

References:

1. Larsson SC, Burgess S. Causal role of high body mass index in multiple chronic diseases: a systematic review and meta-analysis of Mendelian randomization studies. BMC Medicine 2021;19(1):320.

2. Khorana AA, Kuderer NM, Culakova E, Lyman GH, Francis CW. Development and validation of a predictive model for chemotherapy-associated thrombosis. Blood. 2008;111(10):4902–7.

3. Mulder FI, Candeloro M, Kamphuisen PW, Di Nisio M, Bossuyt PM, Guman N, et al. The Khorana score for prediction of venous thromboembolism in cancer patients: a systematic review and meta-analysis. Haematologica 2019;104(6):1277–87.

4. Overvad TF, Ording AG, Nielsen PB, Skjøth F, Albertsen IE, Noble S, et al. Validation of the Khorana score for predicting venous thromboembolism in 40 218 patients with cancer initiating chemotherapy. Blood Adv 2022;6(10):2967–76.

5. Arachchillage DJ, Mackillop L, Chandratheva A, Motawani J, MacCallum P, Laffan M. Thrombophilia testing: A British Society for Haematology guideline. Br J Haematol 2022;198(3):443–58.

6. Alikhan R, Gomez K, Maraveyas A, Noble S, Young A, Thomas M, et al. Cancer-associated venous thrombosis in adults (second edition): A British Society for Haematology Guideline. Br J Haematol 2024;205(1):71–87.

7. Guman NAM, Mulder FI, Ferwerda B, Zwinderman AH, Kamphuisen PW, Büller HR, et al. Polygenic risk scores for prediction of cancer-associated venous thromboembolism in the UK Biobank cohort study. J Thromb Haemost 2023;S1538-7836(23)00571-8.

8. Nguyen BT, Kuchenbaecker K, Silver M, Tallman S, Cho Y, Sosinsky A, et al. Diversity and genetic ancestry effects in the Cancer Cohort of the 100,000 Genomes Project [Internet]. Genomics England. 2023 [cited 2024 Dec 11]; Available from: https://www.genomicsengland.co.uk/blog/diversity-and-genetic-ancestry-effects-in-the-cancer-cohort-of-the-100-000-genomes-project

9. Overvad TF, Severinsen MT, Johnsen SP, Madsen SS, Kannik K, Stenfeldt LG, et al. Positive predictive value and sensitivity of cancer-associated venous thromboembolism diagnoses in the Danish National Patient Register. Thrombosis Research 2024;241:109074.

10. Bakoyannis G, Yiannoutsos CT. Impact of and Correction for Outcome Misclassification in Cumulative Incidence Estimation. PLOS ONE 2015;10(9):e0137454.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Data Availability Statement

Underlying data

Reporting guidelines

Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

