Journal of Peking University (Health Sciences). 2023 Apr 27;55(3):471–479. [Article in Chinese] doi: 10.19723/j.issn.1671-167X.2023.03.013

Development and validation of risk prediction model for new-onset cardiovascular diseases among breast cancer patients: Based on regional medical data of Inner Mongolia

Yun-jing ZHANG 1,2, Li-ying QIAO 3, Meng QI 4, Ying YAN 5, Wei-wei KANG 3, Guo-zhen LIU 6, Ming-yuan WANG 6, Yun-feng XI 3,*, Sheng-feng WANG 1,2,*
PMCID: PMC10258055  PMID: 37291923

Abstract

Objective

To develop and validate a 3-year prediction model for new-onset cardiovascular disease (CVD) in patients with breast cancer.

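Methods

Based on regional medical data from Inner Mongolia, female breast cancer patients aged 18 years or older who received anti-tumor treatment were included. Candidate predictors were identified with a multivariable Fine & Gray model and then screened with Lasso regression. Cox proportional hazards, Logistic regression, Fine & Gray, random forest, and XGBoost models were fitted on the training set, and their discrimination and calibration were evaluated on the test set using the area under the receiver operating characteristic (ROC) curve (AUC) and calibration curves, respectively.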

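Results

A total of 19 325 breast cancer patients receiving anti-tumor treatment were included, with a mean age of (52.76±10.44) years and a median follow-up of 1.18 years [interquartile range (IQR): 2.71 years]. Within 3 years of breast cancer diagnosis, 7 856 patients (40.65%) developed CVD. The predictors selected by Lasso regression were age at breast cancer diagnosis, gross domestic product (GDP) of the place of residence, tumor stage, history of hypertension, ischemic heart disease, and cerebrovascular disease, type of surgery, type of chemotherapy, and radiotherapy. Without considering survival time, the AUC of the XGBoost model was significantly higher than that of the random forest model [0.660 (95%CI: 0.644–0.675) vs. 0.608 (95%CI: 0.591–0.624), P < 0.001] and of Logistic regression [0.609 (95%CI: 0.593–0.625), P < 0.001], and Logistic regression and XGBoost were better calibrated. When survival time was considered, the AUCs of the Cox proportional hazards model and the Fine & Gray model did not differ significantly [0.600 (95%CI: 0.584–0.616) vs. 0.615 (95%CI: 0.599–0.631), P=0.188], but the Fine & Gray model was better calibrated.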

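Conclusion

It is feasible to develop a prediction model for new-onset CVD in breast cancer patients from regional medical data. Without considering survival time, Logistic regression and XGBoost performed better; when survival time was considered, the Fine & Gray model performed better.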

Keywords: Breast neoplasms; Cardiovascular diseases; Risk prediction model; Risk assessment; Medical records systems, computerized


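Breast cancer is the most commonly diagnosed cancer worldwide [1]. Because its prognosis is relatively favorable, the quality of life of breast cancer survivors has attracted growing attention. Cardiovascular disease (CVD) is one of the main comorbidities [2] and one of the leading causes of death [3-4] among breast cancer patients and seriously impairs their quality of life. CVD not only shares several risk factors with breast cancer, such as age, obesity, and smoking, but is also one of the main adverse effects of anti-tumor treatment for breast cancer [5-6]. Identifying breast cancer patients at high risk of CVD is therefore important for taking preventive measures. The 2021 breast cancer diagnosis and treatment guideline issued by the Chinese Society of Clinical Oncology (CSCO) recommends building prediction models with big data and artificial intelligence to support clinical decision-making [7]. In China, only one previous study has used the electronic health records of postoperative breast cancer inpatients in a tertiary hospital to predict venous thromboembolism (VTE) [8]; however, it was a single-center study and did not predict any CVD other than VTE. Studies outside China have built models predicting new-onset heart failure, stroke, ischemic heart disease, and other CVD in breast cancer patients after specific anti-tumor treatments such as anthracyclines or trastuzumab, but these models share similar shortcomings, such as not accounting for breast cancer treatment and covering only some CVD events [9-14].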

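Regional medical data provide population information with broad coverage and long time spans, yet no study in China has so far used such data to build a prediction model for new-onset CVD in breast cancer patients. Moreover, although the toxicity of anti-tumor treatment for breast cancer is a long-term effect, previous prediction models have mostly focused on CVD incidence within 3 years [15-16]. This study therefore set out to build a 3-year risk prediction model for new-onset CVD in breast cancer patients and to explore the feasibility of developing cardio-oncology prediction models from regional medical data.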

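1. Materials and methods

1.1. Data source

The database was built on the Inner Mongolia Regional Health Information Platform (IMRHIP) and contains medical insurance data, death registration data, and cancer registry data; all patient information was anonymized. The study was approved by the Medical Ethics Committee of the Inner Mongolia Autonomous Region Center for Disease Control and Prevention (NMCDCIRB2021001).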

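1.2. Study population

Breast cancer visits between January 1, 2012 and March 9, 2021 were identified in the Inner Mongolia Autonomous Region medical insurance database by a diagnosis name containing breast cancer or an International Classification of Diseases, 10th Revision (ICD-10) code of "C50". All female patients aged 18 years or older with at least one record diagnosing breast cancer and at least one record of anti-tumor treatment for breast cancer (surgery, chemotherapy, endocrine therapy, targeted therapy, or radiotherapy) were included [9]. Patients were excluded if (1) key information such as the primary diagnosis or national ID number was missing (10 patients, 0.04%), or (2) they had a history of another tumor before the first breast cancer diagnosis (3 698 patients, 16.06%).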
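The inclusion and exclusion rules above can be expressed as a filter over claim records. The following is a minimal Python sketch under stated assumptions: the record fields (`pid`, `date`, `icd10`, `diag_name`, `treatment`) and patient fields (`sex`, `birth`, `id_number`) are hypothetical, "another tumor" is approximated as any other ICD-10 C-code dated before the first breast cancer record, and age is taken at that first record. The actual IMRHIP schema and coding rules are not described in the source.

```python
from datetime import date

# Qualifying anti-tumor treatment types (hypothetical labels)
TREATMENTS = {"surgery", "chemotherapy", "endocrine", "targeted", "radiotherapy"}


def is_breast_cancer(rec):
    # ICD-10 code C50.x, or a diagnosis name containing "乳腺癌" (breast cancer)
    return rec["icd10"].startswith("C50") or "乳腺癌" in rec["diag_name"]


def age_on(birth, day):
    # Completed years of age on a given day
    return day.year - birth.year - ((day.month, day.day) < (birth.month, birth.day))


def select_cohort(patients, records):
    """Return sorted ids of patients meeting the (assumed) cohort rules."""
    by_pid = {}
    for rec in records:
        by_pid.setdefault(rec["pid"], []).append(rec)
    cohort = []
    for pid, recs in by_pid.items():
        info = patients.get(pid)
        # Exclusion (1): key information missing (assumed: ID number, birth date)
        if info is None or not info.get("id_number") or info.get("birth") is None:
            continue
        bc_dates = [r["date"] for r in recs if is_breast_cancer(r)]
        if info["sex"] != "F" or not bc_dates:
            continue
        first_bc = min(bc_dates)
        if age_on(info["birth"], first_bc) < 18:
            continue
        # At least one anti-tumor treatment record
        if not any(r["treatment"] in TREATMENTS for r in recs):
            continue
        # Exclusion (2): another malignancy (C-code other than C50) before first diagnosis
        if any(r["icd10"].startswith("C") and not r["icd10"].startswith("C50")
               and r["date"] < first_bc for r in recs):
            continue
        cohort.append(pid)
    return sorted(cohort)


# Hypothetical example data
patients = {
    "a": {"sex": "F", "birth": date(1970, 5, 1), "id_number": "x1"},
    "b": {"sex": "F", "birth": date(1960, 1, 1), "id_number": "x2"},  # prior lung cancer
    "c": {"sex": "F", "birth": date(1980, 1, 1), "id_number": "x3"},  # no treatment
    "d": {"sex": "F", "birth": date(2010, 1, 1), "id_number": "x4"},  # under 18
}
records = [
    {"pid": "a", "date": date(2015, 3, 1), "icd10": "C50.9", "diag_name": "", "treatment": None},
    {"pid": "a", "date": date(2015, 3, 10), "icd10": "C50.9", "diag_name": "", "treatment": "surgery"},
    {"pid": "b", "date": date(2013, 1, 1), "icd10": "C34.1", "diag_name": "", "treatment": None},
    {"pid": "b", "date": date(2016, 1, 1), "icd10": "", "diag_name": "乳腺癌", "treatment": "chemotherapy"},
    {"pid": "c", "date": date(2017, 1, 1), "icd10": "C50.4", "diag_name": "", "treatment": None},
    {"pid": "d", "date": date(2020, 1, 1), "icd10": "C50.1", "diag_name": "", "treatment": "surgery"},
]
print(select_cohort(patients, records))  # ['a']
```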

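1.3. Variable definitions

The outcome was any type of CVD other than congenital heart disease and rheumatic heart disease. CVD was classified mainly according to the European Society of Cardiology (ESC) cardio-oncology position paper [17], together with the World Health Organization classification of CVD [18], the Report on Cardiovascular Diseases in China 2018 [19], the Report on Cardiovascular Health and Diseases in China 2019 [20], and guidelines of the American College of Cardiology/American Heart Association [21], and was then adjusted on clinicians' advice. CVD was finally grouped into 12 categories: (1) myocardial dysfunction; (2) heart failure; (3) ischemic heart disease/coronary artery disease; (4) valvular heart disease; (5) arrhythmia; (6) thromboembolic disease; (7) peripheral vascular disease; (8) cerebrovascular disease; (9) pulmonary vascular disease; (10) other; (11) aortic aneurysm; (12) hypertension. New-onset CVD was defined as at least one new outpatient/inpatient diagnosis record of CVD, or at least one new prescription record of a CVD-related drug, within 3 years after breast cancer diagnosis, provided that the category of that CVD had never appeared in the records before the first breast cancer diagnosis [22]. The competing event was defined as non-CVD death, i.e., death from any cause other than the CVDs listed above.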
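The new-onset rule can be sketched as follows: a CVD category counts only if it never appears before the first breast cancer diagnosis, and the first such record inside the 3-year window is the event. This is a minimal sketch assuming each diagnosis or drug record has already been mapped to one of the 12 categories (the mapping is not reproduced here), that records dated on the diagnosis day count as post-diagnosis, and that 3 years is approximated as 1 095 days; the function name is hypothetical.

```python
from datetime import date, timedelta

# Assumption: "within 3 years" approximated as 3 * 365 days after diagnosis
WINDOW = timedelta(days=3 * 365)


def first_new_onset_cvd(bc_diagnosis, cvd_records):
    """Return (date, category) of the first new-onset CVD within the window, or None.

    cvd_records: list of (date, category) tuples from CVD diagnoses or CVD-drug
    prescriptions, each already mapped to one of the 12 CVD categories.
    """
    # Categories present before the first breast cancer diagnosis never count
    prior = {cat for day, cat in cvd_records if day < bc_diagnosis}
    window_end = bc_diagnosis + WINDOW
    for day, cat in sorted(cvd_records):
        if bc_diagnosis <= day <= window_end and cat not in prior:
            return day, cat
    return None


dx = date(2016, 1, 1)
cvd_records = [
    (date(2015, 6, 1), "hypertension"),   # pre-existing category
    (date(2016, 5, 1), "hypertension"),   # not new-onset: category seen before diagnosis
    (date(2017, 2, 1), "arrhythmia"),     # first new category inside the window
    (date(2020, 1, 1), "heart failure"),  # outside the 3-year window
]
print(first_new_onset_cvd(dx, cvd_records))  # (datetime.date(2017, 2, 1), 'arrhythmia')
```

Because the rule is category-based, a patient with pre-existing hypertension can still have a new-onset event in a different category, which is also why the sensitivity analysis additionally excludes patients with any baseline CVD.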

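Predictors: the definition and measurement of each predictor are given in Table 1. Age, ethnicity, type of medical insurance, comorbidity records, and treatment records were taken from the medical insurance database; tumor information was taken from the cancer registry database and supplemented according to treatment patterns and clinicians' advice; the gross domestic product (GDP) of each league, city, and district of the Inner Mongolia Autonomous Region was obtained from the Inner Mongolia Bureau of Statistics [23]. Comorbidity burden was summarized with the Charlson comorbidity index (CCI) [24].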
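For reference, the CCI sums condition weights from Charlson et al. [24]. The sketch below uses the original weights; the condition keys are hypothetical labels, the rule that only the more severe form of diabetes, liver disease, or tumor is counted follows common coding practice rather than anything stated in the source, and how the study mapped insurance records to conditions (including whether breast cancer itself contributes points) is not described.

```python
# Original Charlson et al. (1987) weights; condition keys are hypothetical labels
CHARLSON_WEIGHTS = {
    "myocardial_infarction": 1,
    "congestive_heart_failure": 1,
    "peripheral_vascular_disease": 1,
    "cerebrovascular_disease": 1,
    "dementia": 1,
    "chronic_pulmonary_disease": 1,
    "connective_tissue_disease": 1,
    "peptic_ulcer_disease": 1,
    "mild_liver_disease": 1,
    "diabetes": 1,
    "hemiplegia": 2,
    "moderate_severe_renal_disease": 2,
    "diabetes_with_end_organ_damage": 2,
    "any_tumor": 2,
    "leukemia": 2,
    "lymphoma": 2,
    "moderate_severe_liver_disease": 3,
    "metastatic_solid_tumor": 6,
    "aids": 6,
}

# Assumed convention: when both forms are recorded, count only the more severe one
HIERARCHY = [
    ("diabetes", "diabetes_with_end_organ_damage"),
    ("mild_liver_disease", "moderate_severe_liver_disease"),
    ("any_tumor", "metastatic_solid_tumor"),
]


def charlson_index(conditions):
    """Sum Charlson weights over a patient's recorded conditions (unknown keys ignored)."""
    present = set(conditions) & set(CHARLSON_WEIGHTS)
    for milder, severe in HIERARCHY:
        if severe in present:
            present.discard(milder)
    return sum(CHARLSON_WEIGHTS[c] for c in present)


print(charlson_index(["diabetes", "diabetes_with_end_organ_damage",
                      "congestive_heart_failure", "hypertension"]))  # 3
```

Hypertension contributes nothing here because it is not a Charlson condition, which is one reason the study also entered hypertension history as a separate predictor.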

Table 1.

The information of variables included in the prediction model for CVD among breast cancer patients in the Inner Mongolia Autonomous Region from 2012 to 2021

Variable | Measurement | Coding
Age at onset | Age at breast cancer diagnosis, calculated from date of birth and date of diagnosis | Years
Type of medical insurance | Type of medical insurance | 0=employee, 1=resident
Ethnicity | Ethnicity recorded in the medical insurance data | 0=Han, 1=Mongolian, 2=others
GDP of residence | GDP of the district where the patient lived, from the Inner Mongolia Bureau of Statistics | 100 million yuan
Tumor stage | Clinical TNM stage, supplemented by treatment patterns; stages 0, 1, 2, and 3 were defined as "early" and stage 4 as "advanced" | 0=early, 1=unknown, 2=advanced
Type of surgery | Treatment records in the medical insurance data | 0=none, 1=breast-conserving surgery, 2=mastectomy
Type of chemotherapy | Treatment records in the medical insurance data | 0=none, 1=anthracyclines, 2=non-anthracyclines
Endocrine therapy | Treatment records in the medical insurance data | 0=no, 1=yes
Type of targeted therapy | Treatment records in the medical insurance data | 0=none, 1=trastuzumab, 2=non-trastuzumab
Radiotherapy | Treatment records in the medical insurance data | 0=no, 1=yes
History of diabetes | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
History of renal disease | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
History of hypertension | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
History of ischemic heart disease | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
History of cerebrovascular disease | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
History of dyslipidemia | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
History of hypothyroidism | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
History of hyperthyroidism | Medical insurance records before breast cancer diagnosis | 0=no, 1=yes
Charlson comorbidity index | Medical insurance records | Points
Hospitalization within 1 year before diagnosis | Medical insurance records within 1 year before breast cancer diagnosis | 0=no, 1=yes
Length of stay within 1 year before diagnosis | Duration of hospitalization recorded in the medical insurance data within 1 year before breast cancer diagnosis | Days
CVD, cardiovascular diseases; GDP, gross domestic product.

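1.4. Statistical analysis

Prediction models were built with Logistic regression, the Cox proportional hazards model, the Fine & Gray model, random forest, and XGBoost. The data were randomly split 7:3 into a training set (70%) and a test set (30%), and variable selection and model fitting were performed on the training set. Ethnicity had missing values (0.03%), which were imputed with the mode. Candidate predictors for the regression models were determined from a multivariable Fine & Gray model and from risk factors reported in previous literature. Predictors were then screened with Lasso regression using 10-fold cross-validation, with the optimal tuning parameter taken as the largest λ whose cross-validated mean squared error lay within one standard error of the minimum (λ.1se).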
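The λ.1se rule is a simple selection step once cross-validation has produced a mean error and a standard error for each candidate λ (the R package glmnet's `cv.glmnet` reports these as `cvm` and `cvsd`). The Python sketch below, with a hypothetical function name and made-up CV summaries, assumes those per-λ summaries are already available and returns both the error-minimizing λ and the largest λ within one standard error of that minimum.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from per-lambda cross-validation summaries.

    lambdas, cv_mean, cv_se: equal-length sequences; cv_mean is the mean
    cross-validated error (e.g., over 10 folds) and cv_se its standard error.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Largest lambda (strongest penalty, fewest predictors) whose CV error
    # stays within one standard error of the minimum
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Hypothetical CV summaries, not values from the study
lambdas = [0.1, 0.05, 0.02, 0.01, 0.005]
cv_mean = [0.30, 0.26, 0.238, 0.23, 0.235]
cv_se = [0.01] * 5
print(select_lambdas(lambdas, cv_mean, cv_se))  # (0.01, 0.02)
```

Choosing λ.1se over λ.min trades a negligible loss in cross-validated error for a sparser, more stable predictor set, which suits a model intended to be embedded in routine information systems.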

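The models were evaluated for discrimination and calibration. For discrimination, receiver operating characteristic (ROC) curves were plotted, and the area under the ROC curve (AUC) and the C-index were used to measure discriminative ability [25]. Calibration was assessed with calibration curves.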
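The C-index extends the AUC to censored time-to-event data. Below is a minimal sketch of Harrell's C for right-censored data: a pair is comparable when one patient has an observed event strictly before the other's follow-up time, and concordant when that patient also has the higher predicted risk (ties in risk count one half). The source does not say which C-index estimator was used; competing-risk versions for the Fine & Gray model handle patients with a competing event differently, which this sketch does not attempt.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risks: predicted risk scores (higher means earlier expected event).
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # only an observed event can anchor a comparable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still event-free when i had the event
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.2]))  # 1.0: risk order matches event order
print(harrell_c(times, events, [0.2, 0.5, 0.7, 0.9]))  # 0.0: risk order fully reversed
```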

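Model training and data analysis were performed in R x64 4.2.2. Continuous variables were approximately normally distributed, are expressed as mean±standard deviation, and were compared between groups with the t test; categorical variables are expressed as n (%) and were compared with Pearson's χ2 test (binary variables). All tests were two-sided, and P < 0.05 was considered statistically significant.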
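As a worked example of Pearson's χ2 test for a binary variable, the radiotherapy row of Table 2 forms a 2×2 table: 587 of 7 856 patients with CVD and 533 of 11 469 patients without CVD received radiotherapy. The sketch computes the statistic without continuity correction, which is an assumption (R's `chisq.test` applies Yates' correction to 2×2 tables by default), and converts it to a P value using the identity P(χ2 > x) = erfc(√(x/2)) for 1 degree of freedom.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (rows[0] * cols[0] / n, rows[0] * cols[1] / n,
                rows[1] * cols[0] / n, rows[1] * cols[1] / n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p


# Rows: radiotherapy yes / no; columns: CVD / no CVD (counts from Table 2)
chi2, p = pearson_chi2_2x2(587, 533, 7856 - 587, 11469 - 533)
print(f"chi2 = {chi2:.1f}, P = {p:.1e}")
```

The statistic comes out at about 68 with P far below 0.001, consistent with the "< 0.001" reported in Table 2.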

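1.5. Sensitivity analysis

Because different CVDs may be related (for example, hypertension is a risk factor for other CVDs [26]), the sensitivity analysis excluded patients with CVD at baseline and removed history of cerebrovascular disease, history of hypertension, and history of ischemic heart disease from the predictors.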

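2. Results

2.1. Baseline characteristics

A total of 19 325 breast cancer patients receiving anti-tumor treatment were included, with a mean age at enrollment of (52.76±10.44) years. Total follow-up was 35 819.06 person-years, and the median follow-up was 1.18 years [interquartile range (IQR): 2.71 years]. Within 3 years of breast cancer diagnosis, 7 856 patients (40.65%) developed CVD. Compared with patients without CVD, patients with new-onset CVD were older at breast cancer diagnosis, more often had resident medical insurance, were more often at an early stage, less often had a history of diabetes, renal disease, hypertension, ischemic heart disease, or cerebrovascular disease, and more often received mastectomy, anthracycline chemotherapy, or radiotherapy; all differences were statistically significant (Table 2).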

Table 2.

Demographic characteristics of breast cancer patients receiving anti-tumor treatment in the Inner Mongolia Autonomous Region from 2012 to 2021

Characteristic | CVD (n=7 856) | No CVD (n=11 469) | P value
Age at onset/years, mean±SD | 53.48±10.38 | 52.28±10.45 | < 0.001
Type of medical insurance, n (%) | | | < 0.001
   Employee | 4 291 (54.62) | 6 829 (59.54) |
   Resident | 3 565 (45.38) | 4 640 (40.46) |
Ethnicity, n (%) | | | 0.002
   Han | 6 727 (85.63) | 9 620 (83.88) |
   Mongolian | 865 (11.01) | 1 372 (11.96) |
   Others | 263 (3.35) | 473 (4.12) |
GDP of residence/100 million yuan, median (IQR) | 156.21 (180.36) | 174.46 (312.93) | < 0.001
Charlson comorbidity index, mean±SD | 1.89±3.32 | 1.84±3.24 | 0.254
Tumor stage, n (%) | | | < 0.001
   Early | 3 619 (46.07) | 4 807 (41.91) |
   Advanced | 3 114 (39.64) | 4 680 (40.81) |
   Unknown | 1 123 (14.29) | 1 982 (17.28) |
Previous disease history, n (%) | | |
   Dyslipidemia | 943 (12.00) | 2 202 (19.20) | < 0.001
   Diabetes | 943 (12.00) | 2 202 (19.20) | < 0.001
   Renal disease | 81 (1.03) | 181 (1.58) | 0.002
   Hypertension | 1 325 (16.87) | 3 535 (30.82) | < 0.001
   Ischemic heart disease | 435 (5.54) | 1 313 (11.45) | < 0.001
   Cerebrovascular disease | 547 (6.96) | 1 372 (11.96) | < 0.001
   Hyperthyroidism | 10 (0.13) | 26 (0.23) | 0.160
   Hypothyroidism | 9 (0.11) | 24 (0.21) | 0.165
Hospitalization within 1 year before diagnosis, n (%) | 1 743 (22.19) | 3 309 (28.85) | < 0.001
Length of stay within 1 year before diagnosis/days, mean±SD | 2.88±9.15 | 3.67±8.99 | < 0.001
Type of surgery, n (%) | | | < 0.001
   Mastectomy | 3 262 (41.52) | 3 910 (34.09) |
   Breast-conserving | 247 (3.14) | 495 (4.32) |
   None | 4 347 (55.33) | 7 064 (61.59) |
Type of chemotherapy, n (%) | | | < 0.001
   Anthracyclines | 3 399 (43.27) | 4 299 (37.48) |
   Non-anthracyclines | 2 394 (30.47) | 3 593 (31.33) |
   None | 2 063 (26.26) | 3 577 (31.19) |
Endocrine therapy, n (%) | 4 669 (59.43) | 7 114 (62.03) | < 0.001
Type of targeted therapy, n (%) | | | 0.018
   Trastuzumab | 1 023 (13.02) | 1 649 (14.38) |
   Non-trastuzumab | 67 (0.85) | 111 (0.97) |
   None | 6 766 (86.13) | 9 709 (84.65) |
Radiotherapy, n (%) | 587 (7.47) | 533 (4.65) | < 0.001
Missing values existed in ethnicity (n=5). CVD, cardiovascular disease; GDP, gross domestic product; IQR, interquartile range; SD, standard deviation.

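2.2. Selection of predictors

In the multivariable Fine & Gray model, age at breast cancer diagnosis, ethnicity, GDP of residence, tumor stage, type of surgery, type of chemotherapy, endocrine therapy, radiotherapy, history of hypertension, and history of ischemic heart disease were statistically significant (Table 3). Based on previous studies, history of dyslipidemia, diabetes, cerebrovascular disease, and renal disease, as well as type of targeted therapy, were also included as potential predictors [14, 16, 27].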

Table 3.

Results of multivariate competing risk models for breast cancer patients receiving anti-tumor treatment in the Inner Mongolia Autonomous Region from 2012 to 2021

Factor | β | SE | Wald z | HR (95%CI) | P value
Age at onset/years | 0.02 | 0.00 | 14.17 | 1.02 (1.01-1.02) | < 0.001
Type of medical insurance | | | | |
   Employee (ref.) | | | | 1.00 |
   Resident | -0.03 | 0.03 | -1.05 | 0.97 (0.92-1.03) | 0.290
Ethnicity | | | | |
   Han (ref.) | | | | 1.00 |
   Mongolian | 0.00 | 0.04 | 0.12 | 1.00 (0.94-1.08) | 0.910
   Others | -0.13 | 0.06 | -2.02 | 0.88 (0.78-1.00) | 0.043
GDP of residence/100 million yuan | 0.00 | 0.00 | -8.36 | 1.00 (1.00-1.00) | < 0.001
Charlson comorbidity index | 0.00 | 0.00 | -0.40 | 1.00 (0.99-1.01) | 0.690
Tumor stage | | | | |
   Early (ref.) | | | | 1.00 |
   Advanced | -0.14 | 0.04 | -3.44 | 0.87 (0.80-0.94) | 0.001
   Unknown | -0.06 | 0.03 | -2.15 | 0.94 (0.88-0.99) | 0.031
Type of surgery | | | | |
   None (ref.) | | | | 1.00 |
   Breast-conserving | 0.06 | 0.07 | 0.83 | 1.06 (0.92-1.22) | 0.410
   Mastectomy | 0.27 | 0.03 | 8.75 | 1.31 (1.23-1.39) | < 0.001
Type of chemotherapy | | | | |
   None (ref.) | | | | 1.00 |
   Anthracyclines | 0.19 | 0.03 | 6.22 | 1.21 (1.14-1.29) | < 0.001
   Non-anthracyclines | 0.11 | 0.03 | 3.48 | 1.11 (1.05-1.18) | < 0.001
Endocrine therapy | | | | |
   No (ref.) | | | | 1.00 |
   Yes | -0.10 | 0.02 | -4.07 | 0.90 (0.86-0.95) | < 0.001
Type of targeted therapy | | | | |
   None (ref.) | | | | 1.00 |
   Trastuzumab | -0.01 | 0.03 | -0.26 | 0.99 (0.93-1.06) | 0.790
   Non-trastuzumab | -0.04 | 0.12 | -0.33 | 0.96 (0.76-1.21) | 0.740
Radiotherapy | | | | |
   No (ref.) | | | | 1.00 |
   Yes | 0.26 | 0.05 | 5.35 | 1.30 (1.18-1.42) | < 0.001
History of hypertension | | | | |
   No (ref.) | | | | 1.00 |
   Yes | -0.44 | 0.04 | -11.52 | 0.64 (0.60-0.69) | < 0.001
History of dyslipidemia | | | | |
   No (ref.) | | | | 1.00 |
   Yes | -0.02 | 0.04 | -0.53 | 0.98 (0.90-1.06) | 0.590
History of diabetes | | | | |
   No (ref.) | | | | 1.00 |
   Yes | 0.03 | 0.04 | 0.83 | 1.03 (0.96-1.12) | 0.410
History of ischemic heart disease | | | | |
   No (ref.) | | | | 1.00 |
   Yes | -0.35 | 0.06 | -6.31 | 0.71 (0.63-0.79) | < 0.001
History of cerebrovascular disease | | | | |
   No (ref.) | | | | 1.00 |
   Yes | -0.05 | 0.05 | -1.07 | 0.95 (0.86-1.05) | 0.280
History of renal disease | | | | |
   No (ref.) | | | | 1.00 |
   Yes | 0.08 | 0.12 | 0.72 | 1.09 (0.87-1.36) | 0.470
History of hyperthyroidism | | | | |
   No (ref.) | | | | 1.00 |
   Yes | -0.07 | 0.32 | -0.22 | 0.93 (0.50-1.75) | 0.830
History of hypothyroidism | | | | |
   No (ref.) | | | | 1.00 |
   Yes | 0.01 | 0.32 | 0.02 | 1.01 (0.53-1.90) | 0.980
Hospitalization within 1 year before diagnosis | | | | |
   No (ref.) | | | | 1.00 |
   Yes | -0.07 | 0.04 | -1.75 | 0.93 (0.86-1.01) | 0.081
Length of stay within 1 year before diagnosis/days | 0.00 | 0.00 | -1.20 | 1.00 (0.99-1.00) | 0.230
GDP, gross domestic product; HR, (subdistribution) hazard ratio; ref., reference category.
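Each row of Table 3 follows from β and SE: HR = exp(β), 95%CI = exp(β ± 1.96·SE), Wald z = β/SE, and the two-sided P value comes from the standard normal distribution. The sketch below, with a hypothetical function name, recomputes the radiotherapy row. Small differences from the table are expected because β and SE are printed rounded to two decimals.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def hr_summary(beta, se):
    """HR, 95% CI, Wald z and two-sided P value from a log-hazard coefficient."""
    hr = math.exp(beta)
    lower = math.exp(beta - Z_975 * se)
    upper = math.exp(beta + Z_975 * se)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    return hr, lower, upper, z, p


# Radiotherapy row of Table 3 (beta and SE as printed, i.e., rounded)
hr, lower, upper, z, p = hr_summary(0.26, 0.05)
print(f"HR {hr:.2f} ({lower:.2f}-{upper:.2f}), z = {z:.2f}")
```

This reproduces the HR of 1.30 and the lower bound of 1.18; the recomputed upper bound (1.43) and z (5.20) differ slightly from the table's 1.42 and 5.35 because of that rounding.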

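These factors were entered into Lasso regression (λ.1se=0.010 127). Age at breast cancer diagnosis, GDP of residence, tumor stage, history of hypertension, history of ischemic heart disease, history of cerebrovascular disease, type of surgery, type of chemotherapy, and radiotherapy were retained as predictors of CVD onset.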

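2.3. Model development and evaluation

Discrimination is shown in Figure 1. Without considering survival time, the AUC was 0.609 (95%CI: 0.593–0.625) for Logistic regression, 0.660 (95%CI: 0.644–0.675) for XGBoost, and 0.608 (95%CI: 0.591–0.624) for random forest. The AUC of XGBoost was significantly higher than those of Logistic regression (P < 0.001) and random forest (P < 0.001), whereas the AUCs of Logistic regression and random forest did not differ significantly (P=0.863). The two models that account for survival time, the Cox proportional hazards model and the Fine & Gray model, discriminated less well than XGBoost, with AUCs of 0.600 (95%CI: 0.584–0.616) and 0.615 (95%CI: 0.599–0.631), respectively, which did not differ significantly (P=0.188); their C-indices were 0.611 and 0.609, respectively.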
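The paper does not name the test used to compare AUCs. A common choice for two models scored on the same test set is DeLong's test (for example, the default method of `roc.test` in the R package pROC for paired ROC curves), which accounts for the correlation between the two AUCs. The following is a minimal quadratic-time sketch of that test with hypothetical function names and toy data, not the study's code.

```python
import math


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for a tie
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def delong_test(labels, scores_a, scores_b):
    """Compare two correlated AUCs computed on the same cases (DeLong et al., 1988).

    labels: 0/1 outcomes; scores_a, scores_b: predicted risks from two models.
    Returns (auc_a, auc_b, z, two-sided P). Runs in O(m * n) time.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    m, n = len(pos), len(neg)

    def components(s):
        # Structural components: placement values of each positive and negative case
        v10 = [sum(_psi(s[i], s[j]) for j in neg) / n for i in pos]
        v01 = [sum(_psi(s[i], s[j]) for i in pos) / m for j in neg]
        return v10, v01

    a10, a01 = components(scores_a)
    b10, b01 = components(scores_b)
    auc_a, auc_b = sum(a10) / m, sum(b10) / m
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    if var <= 0:
        raise ValueError("the AUC difference has zero estimated variance")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, z, math.erfc(abs(z) / math.sqrt(2))


labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # separates the classes perfectly
model_b = [0.9, 0.4, 0.2, 0.5, 0.3, 0.1]  # ranks 6 of 9 pairs correctly
print(delong_test(labels, model_a, model_b))
```

With only six cases the AUC difference of 1.0 vs. about 0.67 is not significant (P ≈ 0.22), which illustrates why the study's large test set was needed to detect differences of a few hundredths in AUC.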

Figure 1.

The ROC curves of the testing set for breast cancer patients receiving anti-tumor treatment in the Inner Mongolia Autonomous Region from 2012 to 2021

Hyperparameters were tuned on the training set. In the random forest model, the number of trees was 700 and 5 predictors were randomly sampled at each split. In the XGBoost model, the number of trees was 500, the learning rate was 0.018, the maximum tree depth was 8, the sample size (proportion of the training data sampled for each tree) was 1, and the minimum number of data points in a node required for the node to be split further was 40. ROC, receiver operating characteristic; AUC, area under the curve.

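Calibration curves are shown in Figure 2. Without considering survival time, Logistic regression and XGBoost were better calibrated; when survival time was considered, the Fine & Gray model was better calibrated.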
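A calibration curve for a binary-outcome model plots, for groups of patients ranked by predicted risk, the mean predicted risk against the observed event rate; points on the diagonal indicate good calibration. The sketch below builds such a table with equal-size groups (deciles by default). The grouping scheme is an assumption, since the source does not describe how its curves were constructed, and calibration of the Cox and Fine & Gray models would instead compare predicted 3-year risk with a censoring-aware observed estimate.

```python
def calibration_table(y_true, y_prob, n_groups=10):
    """Split cases into n_groups equal-size groups by predicted risk and return
    (mean predicted risk, observed event rate, group size) for each group."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    table = []
    for k in range(n_groups):
        group = pairs[k * n // n_groups:(k + 1) * n // n_groups]
        if not group:
            continue  # fewer cases than groups
        mean_pred = sum(p for p, _ in group) / len(group)
        observed = sum(y for _, y in group) / len(group)
        table.append((mean_pred, observed, len(group)))
    return table


# Hypothetical predictions: the low-risk group is under-predicted, the high-risk group as well
y_prob = [0.1, 0.1, 0.2, 0.2, 0.8, 0.8, 0.9, 0.9]
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
for mean_pred, observed, size in calibration_table(y_true, y_prob, n_groups=2):
    print(f"predicted {mean_pred:.2f}  observed {observed:.2f}  n={size}")
```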

Figure 2.

The calibration curves of the testing set for breast cancer patients receiving anti-tumor treatment in the Inner Mongolia Autonomous Region from 2012 to 2021

A, calibration curves of the testing set for Logistic regression, random forest, and XGBoost; B, calibration curves of the testing set for the Cox proportional hazards model and the Fine & Gray model.

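2.4. Sensitivity analysis

After excluding patients with any type of CVD before breast cancer diagnosis, 12 414 patients were included. Survival curves for the main and sensitivity analyses are shown in Figure 3. Compared with the main analysis, the multivariable Fine & Gray model additionally identified history of hyperthyroidism and hospitalization within 1 year before diagnosis as predictors. After Lasso selection (λ.1se=0.013 931), hospitalization within 1 year before diagnosis was added to the predictors. The results of the sensitivity analysis were consistent with those of the main analysis.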

Figure 3.

Survival curves of the main analysis and sensitivity analysis for breast cancer patients receiving anti-tumor treatment in the Inner Mongolia Autonomous Region from 2012 to 2021

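3. Discussion

Using a regional medical database in China, this study built a model predicting new-onset CVD within 3 years of breast cancer diagnosis and explored the feasibility of developing cardio-oncology prediction models from such data. The model included 9 predictors: age at breast cancer diagnosis, GDP of residence, tumor stage, history of hypertension, history of ischemic heart disease, history of cerebrovascular disease, type of surgery, type of chemotherapy, and radiotherapy. The Logistic regression, XGBoost, and Fine & Gray models performed better and could in future be used to assess the risk of new-onset CVD in breast cancer patients.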

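Clinical prediction models built on risk factors are an important tool for identifying high-risk populations and intervening early. Prediction modeling in cardio-oncology is still at an exploratory stage: studies should incorporate multidimensional predictors and build and validate models with multiple algorithms in search of better risk assessment tools. However, previous studies have shortcomings in design, predictors, and modeling. In terms of design, some studies used a single CVD event, such as venous thromboembolism (VTE) [8] or heart failure [10-11], as the outcome, which limits the settings in which the models apply. A better option is a health management model of joint prevention and co-management of multiple CVDs [28-29]. For example, the risk stratification tool developed by the Heart Failure Association of the European Society of Cardiology and the International Cardio-Oncology Society (the HFA-ICOS risk tool) predicts the rate of cardiotoxicity defined as a composite of decline in left ventricular ejection fraction (LVEF), congestive heart failure, cardiac death, or discontinuation of trastuzumab [30], and some studies used major adverse cardiovascular events (MACE) as the outcome [31]. This study therefore used a composite outcome covering multiple CVDs, which can be used to assess the overall cardiovascular risk of breast cancer patients.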

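In terms of predictors, given the cardiovascular toxicity of anti-tumor treatment, most models include the type of anti-tumor treatment as an important predictor [14, 30]. In this study, type of surgery, type of chemotherapy, and radiotherapy entered the model as predictors. Consistent with this, a considerable number of epidemiological studies have shown that patients who undergo breast cancer surgery [32], chemotherapy [33], or radiotherapy [34] have a higher risk of CVD. These associations are also mechanistically plausible [35]. For example, one mechanism of anthracycline cardiotoxicity is damage to cardiomyocytes through oxidative stress and inhibition of topoisomerase IIβ [36]; cancer surgery may cause blood loss and electrolyte imbalance, raising the risk of atrial fibrillation and other CVD [37]; and inflammatory changes and the production of reactive oxygen species are the main causes of early radiation-induced cardiac tissue injury, with persistent inflammation and oxidative stress leading to delayed tissue damage. Although the cardiovascular toxicity of trastuzumab has been established [22], trastuzumab was added to the Inner Mongolia Autonomous Region medical insurance drug catalogue only in 2019 [38], which may have left too few trastuzumab-treated patients to show a clear difference.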

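In terms of models, most studies have used traditional approaches such as Logistic regression [10], Cox regression [9, 14], and subdistribution hazard models [30], and very few have tried machine learning [16]. This study shows that both traditional and machine learning models can be developed from regional medical data, with comparable performance. Compared with traditional models, machine learning learns and recognizes patterns from the features of the data themselves and places fewer requirements on the data distribution [39]; when handling large, complex datasets, it can improve predictive accuracy through parameter optimization and model selection [40]. However, machine learning is also prone to overfitting and lower stability [41], so its benefits and drawbacks should be weighed in practice. Among the traditional models, the Fine & Gray model, which treats non-CVD death as a competing event for CVD, outperformed the Cox proportional hazards model. A likely reason is that the 3-year survival rate of Chinese breast cancer patients is 81.0% (95%CI: 77.8%–83.8%) [42], so death has a non-negligible influence on CVD onset; simply censoring patients who die leads to overestimation of CVD risk [43-44].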
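Why censoring deaths inflates CVD risk can be seen with the nonparametric estimators behind the two approaches. The Aalen-Johansen cumulative incidence (the quantity a Fine & Gray model targets) only lets a patient contribute a CVD event while she is still free of both CVD and death, whereas the naive 1 - Kaplan-Meier estimate treats a death as if the patient could still develop CVD later. The sketch below compares the two on hypothetical toy data; it illustrates the point in [43-44] and is not the study's Fine & Gray regression.

```python
def cif_and_naive_km(times, status, horizon=None):
    """Cumulative incidence of the event of interest up to `horizon`.

    status: 0 = censored, 1 = CVD (event of interest), 2 = competing event.
    Returns (Aalen-Johansen cumulative incidence, naive 1 - Kaplan-Meier that
    treats competing events as censored).
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    event_free = 1.0  # KM probability of being free of any event
    naive_surv = 1.0  # KM "CVD-free" probability with competing events censored
    cif = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        if horizon is not None and t > horizon:
            break
        d1 = d2 = c = 0
        while i < len(data) and data[i][0] == t:  # process tied times together
            s = data[i][1]
            d1 += s == 1
            d2 += s == 2
            c += s == 0
            i += 1
        cif += event_free * d1 / at_risk  # must be free of any event just before t
        event_free *= 1 - (d1 + d2) / at_risk
        naive_surv *= 1 - d1 / at_risk
        at_risk -= d1 + d2 + c
    return cif, 1 - naive_surv


# Hypothetical follow-up: two competing deaths, two CVD events, one censored
times = [1, 2, 3, 4, 5]
status = [2, 1, 2, 1, 0]
print(cif_and_naive_km(times, status))  # about (0.4, 0.625)
```

Here the naive estimate (0.625) exceeds the competing-risk estimate (0.4) because the two deaths are treated as if those patients remained at risk of CVD.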

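Using multicenter, large-sample population data, this study built a prediction model for new-onset CVD in Chinese breast cancer patients, showing that applying regional medical data to cardio-oncology prediction models is feasible. Moreover, the outcome covered all types of CVD except congenital and rheumatic heart disease, allowing a comprehensive assessment of the risk of any new-onset CVD in breast cancer patients. This study also has limitations. (1) Because the regional health information platform lacks key data such as body mass index (BMI), tumor subtype, LVEF, and laboratory test results [30], these could not be included as predictors. However, several comorbidities strongly associated with BMI, such as hypertension [45] and diabetes [46], were considered as candidate predictors, which partly offsets the effect of missing BMI on model performance; previous prediction models also suggest that comorbidities, type of anti-tumor treatment, and age are the more critical predictors [9, 12]. (2) The models were based only on the population of the Inner Mongolia Autonomous Region and were not externally validated; appropriate external validation and model updating should be attempted in future.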

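Prediction models developed from regional medical data can be embedded into existing information collection systems relatively conveniently and at low cost, supporting more precise diagnosis and treatment decisions for breast cancer patients. Future work could also explore the cost-benefit of interventions for high-risk patients [47]. The results show that building cardio-oncology prediction models from regional medical data in China is feasible, with the Logistic regression and XGBoost models performing best. We recommend further development of regional medical data, greater interoperability between databases, and broader application of machine learning in cardio-oncology to produce risk assessment tools of greater clinical value.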

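Acknowledgments

We thank Ke Yalei, Jiang Yuling, Rao Yingting, Xu Lu, He Guohua, Yu Yuelin, Ren Jing, Yan Xue, Deng Siwei, Yang Xinyu, Song Yutong, Yang Yingzi, Wen Qiaorui, Han Jing, Wu Yiwei, and Zhang Xiaoyu for their help with data curation.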

Funding Statement

Supported by the National Natural Science Foundation of China (82173616)

Contributor Information

Yun-feng XI, Email: xiyunfeng210@163.com.

Sheng-feng WANG, Email: shengfeng1984@126.com.

References

1. International Agency for Research on Cancer. World cancer day: Breast cancer overtakes lung cancer as leading cause of cancer worldwide. IARC showcases key research projects to address breast cancer [EB/OL]. (2021-02-04) [2023-02-20]. https://www.iarc.who.int/news-events/world-cancer-day-2021/.
2. Connor AE, Schmaltz CL, Jackson-Thompson J, et al. Comorbidities and the risk of cardiovascular disease mortality among racially diverse patients with breast cancer. Cancer. 2021;127(15):2614–2622. doi: 10.1002/cncr.33530.
3. Abdel-Qadir H, Austin PC, Lee DS, et al. A population-based study of cardiovascular mortality following early-stage breast cancer. JAMA Cardiol. 2017;2(1):88–93. doi: 10.1001/jamacardio.2016.3841.
4. Sturgeon KM, Deng L, Bluethmann SM, et al. A population-based study of cardiovascular disease mortality risk in US cancer patients. Eur Heart J. 2019;40(48):3889–3897. doi: 10.1093/eurheartj/ehz766.
5. Padegimas A, Clasen S, Ky B. Cardioprotective strategies to prevent breast cancer therapy-induced cardiotoxicity. Trends Cardiovasc Med. 2020;30(1):22–28. doi: 10.1016/j.tcm.2019.01.006.
6. Mehta LS, Watson KE, Barac A, et al. Cardiovascular disease and breast cancer: Where these entities intersect: A scientific statement from the American Heart Association. Circulation. 2018;137(8):e30–e66. doi: 10.1161/CIR.0000000000000556.
7. Breast Cancer Professional Committee of the China Anti-Cancer Association. Guidelines and standards for the diagnosis and treatment of breast cancer of the China Anti-Cancer Association (2021 edition) [in Chinese]. China Oncology. 2021;31(10):954–1040.
8. Li J, Qiang WM, Wang Y, et al. Development and validation of a risk assessment nomogram for venous thromboembolism associated with hospitalized postoperative Chinese breast cancer patients. J Adv Nurs. 2021;77(1):473–483. doi: 10.1111/jan.14571.
9. Ezaz G, Long JB, Gross CP, et al. Risk prediction model for heart failure and cardiomyopathy after adjuvant trastuzumab therapy for breast cancer. J Am Heart Assoc. 2014;3(1):e000472. doi: 10.1161/JAHA.113.000472.
10. Fogarassy G, Vathy-Fogarassy Á, Kenessey I, et al. Risk prediction model for long-term heart failure incidence after epirubicin chemotherapy for breast cancer: A real-world data-based, nationwide classification analysis. Int J Cardiol. 2019;285:47–52. doi: 10.1016/j.ijcard.2019.03.013.
11. Romond EH, Jeong JH, Rastogi P, et al. Seven-year follow-up assessment of cardiac function in NSABP B-31, a randomized trial comparing doxorubicin and cyclophosphamide followed by paclitaxel (ACP) with ACP plus trastuzumab as adjuvant therapy for patients with node-positive, human epidermal growth factor receptor 2-positive breast cancer. J Clin Oncol. 2012;30(31):3792–3799. doi: 10.1200/JCO.2011.40.0010.
12. Abdel-Qadir H, Thavendiranathan P, Austin PC, et al. Development and validation of a multivariable prediction model for major adverse cardiovascular events after early stage breast cancer: A population-based cohort study. Eur Heart J. 2019;40(48):3913–3920. doi: 10.1093/eurheartj/ehz460.
13. Dranitsaris G, Rayson D, Vincent M, et al. The development of a predictive model to estimate cardiotoxic risk for patients with metastatic breast cancer receiving anthracyclines. Breast Cancer Res Treat. 2008;107(3):443–450. doi: 10.1007/s10549-007-9803-5.
14. Kim DY, Park MS, Youn JC, et al. Development and validation of a risk score model for predicting the cardiovascular outcomes after breast cancer therapy: The CHEMO-RADIAT score. J Am Heart Assoc. 2021;10(16):e021931. doi: 10.1161/JAHA.121.021931.
15. Rushton M, Johnson C, Dent S. Trastuzumab-induced cardiotoxicity: Testing a clinical risk score in a real-world cardio-oncology population. Curr Oncol. 2017;24(3):176–180. doi: 10.3747/co.24.3349.
16. Chang WT, Liu CF, Feng YH, et al. An artificial intelligence approach for predicting cardiotoxicity in breast cancer patients receiving anthracycline. Arch Toxicol. 2022;96(10):2731–2737. doi: 10.1007/s00204-022-03341-y.
17. Zamorano JL, Lancellotti P, Rodriguez Munoz D, et al. 2016 ESC Position Paper on cancer treatments and cardiovascular toxicity developed under the auspices of the ESC Committee for Practice Guidelines: The Task Force for cancer treatments and cardiovascular toxicity of the European Society of Cardiology (ESC). Eur J Heart Fail. 2017;19(1):9–42. doi: 10.1002/ejhf.654.
18. World Health Organization. Cardiovascular diseases (CVDs) [EB/OL]. (2021-6-11) [2023-12-14]. https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds).
19. Hu SS, Gao RL, Liu LS, et al. Summary of the 2018 report on cardiovascular diseases in China [in Chinese]. Chinese Circulation Journal. 2019;34(3):209–220. doi: 10.3969/j.issn.1000-3614.2019.03.001.
20. The Writing Committee of the Report on Cardiovascular Health and Diseases in China. Report on cardiovascular health and diseases in China 2019: An updated summary [in Chinese]. Chinese Circulation Journal. 2020;35(9):833–854.
21. Grundy SM, Stone NJ, Bailey AL, et al. 2018 AHA/ACC/AACVPR/AAPA/ABC/ACPM/ADA/AGS/APhA/ASPC/NLA/PCNA guideline on the management of blood cholesterol: A report of the American College of Cardiology/American Heart Association Task Force on clinical practice guidelines. Circulation. 2019;139(25):e1082–e1143. doi: 10.1161/CIR.0000000000000625.
22. Chien HC, Kao Yang YH, Bai JP. Trastuzumab-related cardiotoxic effects in Taiwanese women: A nationwide cohort study. JAMA Oncol. 2016;2(10):1317–1325. doi: 10.1001/jamaoncol.2016.1269.
23. Inner Mongolia Autonomous Region Bureau of Statistics. National accounts: Gross regional product, annual data by league and city [EB/OL] [in Chinese]. (2021-03-01) [2023-02-20]. http://tj.nmg.gov.cn/datashow/easyquery/easyquery.htm?cn=B0103.
24. Charlson ME, Pompei P, Ales KL, et al. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. J Chronic Dis. 1987;40(5):373–383. doi: 10.1016/0021-9681(87)90171-8.
25. Wang JF, Zhang ZH, Zhou ZR, et al. Clinical prediction models: Model validation [in Chinese]. Chinese Journal of Evidence-Based Cardiovascular Medicine. 2019;11(2):141–144. doi: 10.3969/j.issn.1674-4055.2019.02.04.
26. The Writing Committee of the Report on Cardiovascular Health and Diseases in China. Report on cardiovascular health and diseases in China 2021: An updated summary [in Chinese]. Chinese Circulation Journal. 2022;37(6):553–578. doi: 10.3969/j.issn.1000-3614.2022.06.001.
27. Sutton AL, Felix AS, Wahl S, et al. Racial disparities in treatment-related cardiovascular toxicities amongst women with breast cancer: A scoping review [J/OL]. J Cancer Surviv. 2022 [2022-04-14]. https://doi.org/10.1007/s11764-022-01210-2.
28. Dai R. Research on the evaluation index system of the "joint prevention and co-management" health management model for cardiovascular and cerebrovascular diseases [D] [in Chinese]. Jiangsu: Nanjing Medical University, 2021.
29. Lin XF. The General Office of the State Council issues the China Medium- and Long-Term Plan for the Prevention and Treatment of Chronic Diseases (2017–2025) [in Chinese]. Journal of Traditional Chinese Medicine Management. 2017;25(4):14.
30. Battisti NML, Andres MS, Lee KA, et al. Incidence of cardiotoxicity and validation of the Heart Failure Association-International Cardio-Oncology Society risk stratification tool in patients treated with trastuzumab for HER2-positive early breast cancer. Breast Cancer Res Treat. 2021;188(1):149–163. doi: 10.1007/s10549-021-06192-w.
31. D'Agostino RB Sr, Vasan RS, Pencina MJ, et al. General cardiovascular risk profile for use in primary care: The Framingham Heart Study. Circulation. 2008;117(6):743–753. doi: 10.1161/CIRCULATIONAHA.107.699579.
32. Guha A, Fradley MG, Dent SF, et al. Incidence, risk factors, and mortality of atrial fibrillation in breast cancer: A SEER-Medicare analysis. Eur Heart J. 2022;43(4):300–312. doi: 10.1093/eurheartj/ehab745.
33. Henry ML, Niu J, Zhang N, et al. Cardiotoxicity and cardiac monitoring among chemotherapy-treated breast cancer patients. JACC Cardiovasc Imaging. 2018;11(8):1084–1093. doi: 10.1016/j.jcmg.2018.06.005.
34. Boekel NB, Jacobse JN, Schaapveld M, et al. Cardiovascular disease incidence after internal mammary chain irradiation and anthracycline-based chemotherapy for breast cancer. Br J Cancer. 2018;119(4):408–418. doi: 10.1038/s41416-018-0159-x.
35. Giordano G, Spagnuolo A, Olivieri N, et al. Cancer drug related cardiotoxicity during breast cancer treatment. Expert Opin Drug Saf. 2016;15(8):1063–1074. doi: 10.1080/14740338.2016.1182493.
36. Zhang S, Liu XB, Bawa-Khalfe T, et al. Identification of the molecular basis of doxorubicin-induced cardiotoxicity. Nat Med. 2012;18(11):1639–1642. doi: 10.1038/nm.2919.
37. Higuchi S, Kabeya Y, Matsushita K, et al. Incidence and complications of perioperative atrial fibrillation after non-cardiac surgery for malignancy. PLoS One. 2019;14(5):e0216239. doi: 10.1371/journal.pone.0216239.
38. Inner Mongolia Autonomous Region Healthcare Security Administration. Notice on implementing the National Drug Catalogue for Basic Medical Insurance, Work-Related Injury Insurance and Maternity Insurance (2019 edition) [EB/OL] [in Chinese]. (2019-12-06) [2023-02-25]. https://ylbzj.nmg.gov.cn/zwgk/zfxxgk/fdzdgknr/bmwj/202103/t20210326_1313389.html.
39. Nusinovici S, Tham YC, Chak Yan MY, et al. Logistic regression was as good as machine learning for predicting major chronic diseases. J Clin Epidemiol. 2020;122:56–69. doi: 10.1016/j.jclinepi.2020.03.002.
40. James G, Witten D, Hastie T, et al. An introduction to statistical learning with applications in R. New York: Springer; 2013.
41. Hastie T, Tibshirani R, Friedman JH. The elements of statistical learning: Data mining, inference, and prediction. New York: Springer; 2009.
42. International Agency for Research on Cancer. SURVCAN [EB/OL]. (2019-01-01) [2023-02-20]. https://gco.iarc.fr/survival/survcan/dataviz/table?mode=population&population_group=Asia&cancers=180&survival=5.
43. Nolan EK, Chen HY. A comparison of the Cox model to the Fine-Gray model for survival analyses of re-fracture rates. Arch Osteoporos. 2020;15(1):86. doi: 10.1007/s11657-020-00748-x.
44. Putter H, Fiocco M, Geskus RB. Tutorial in biostatistics: Competing risks and multi-state models. Stat Med. 2007;26(11):2389–2430. doi: 10.1002/sim.2712.
45. Hui CX, Chen WJ, Qian YG, et al. Multilevel analysis of overweight/obesity among residents of the Inner Mongolia Autonomous Region [in Chinese]. Chronic Pathematology Journal. 2020;21(3):319–322.
46. Wang RQ, Du ML, Liang DY, et al. Study on the influencing factors of diabetes among the floating population in Inner Mongolia [in Chinese]. Modern Preventive Medicine. 2018;45(1):155–159.
47. Galovic M, Döhler N, Erdélyi-Canavese B, et al. Prediction of late seizures after ischaemic stroke with a novel prognostic model (the SeLECT score): A multivariable prediction model development and validation study. Lancet Neurol. 2018;17(2):143–152. doi: 10.1016/S1474-4422(17)30404-0.
