Development and validation of a random survival forest model for prognosis prediction in extrahepatic cholangiocarcinoma after radical resection

Affiliation:

1. Department of General Surgery, the First Affiliated Hospital of Soochow University, Suzhou, Jiangsu 215000, China; 2. Department of Orthopedics, Jiaxing First Hospital, Jiaxing, Zhejiang 314000, China; 3. Department of Gastroenterology, People's Hospital of Henan University of Chinese Medicine, Zhengzhou, Henan 450000, China

About the author:

Wu Shiwei, master's student at the First Affiliated Hospital of Soochow University; research focus: hepatobiliary and pancreatic diseases.

Fund Project:

Suzhou Science and Technology Development Plan (SLJ2021001).

    Abstract:

    Background and Aims: Extrahepatic cholangiocarcinoma (ECCA) is a malignancy with insidious onset, aggressive invasiveness, and poor prognosis; postoperative recurrence is common, and 5-year overall survival is below 20%. Most existing prognostic models are based on the Cox proportional hazards model, which is constrained by the proportional hazards assumption and by assumed linear relationships. The random survival forest (RSF), a novel machine learning algorithm, can capture complex interactions and nonlinear effects among variables, but it has rarely been applied to ECCA. This study therefore used the RSF algorithm to build a prognostic model for patients with ECCA after radical resection, aiming to provide precise, individualized prognostic assessment and to support clinical decision-making.

    Methods: A total of 515 patients with ECCA who underwent radical resection and met the inclusion criteria were retrospectively enrolled from the SEER database (2016-2021) and randomly divided at a 7:3 ratio into a training set (n=361) and a test set (n=154). Demographic and clinical characteristics were collected. A Cox model was built using univariate and multivariate Cox regression, and an RSF model was built with variables selected by the variable importance (VIMP) and minimal depth methods. Model performance was evaluated with the concordance index (C-index), time-dependent area under the curve (AUC), Brier score, calibration curves, and decision curve analysis. Kaplan-Meier survival analysis and the interpretability tools SurvSHAP and SurvLIME were used to interpret the model.

    Results: Multivariate Cox regression identified seven independent prognostic factors: age, race, income, T stage, N stage, tumor size, and chemotherapy. The RSF model selected four key predictors: age, tumor size, regional lymph node positive rate, and chemotherapy. In the test set, the RSF model achieved a C-index of 0.751, compared with 0.711 for the Cox model. Its AUCs at 1, 2, and 3 years were 0.843, 0.749, and 0.814, respectively, and its overall performance, calibration, and net clinical benefit were also better than those of the Cox model. Regional lymph node positive rate, age, and tumor size showed nonlinear associations with survival risk, whereas chemotherapy was associated with a significantly lower mortality risk. Stratified survival curves indicated poorer prognosis in patients who did not receive chemotherapy or who had a lymph node positive rate >0.1, age >70 years, or tumor size >20 mm.

    Conclusion: The RSF model relies on only four readily available clinical variables and showed better predictive performance than the Cox model. It can serve as a reliable basis for individualized survival prediction and for planning postoperative follow-up in patients with ECCA. Combined with interpretability analysis, it has strong potential for clinical application and may help improve patients' survival outcomes and quality of life.
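As an illustration of two steps named in the Methods, the sketch below shows a 7:3 random split and Harrell's concordance index in plain Python. It is not the authors' code: the function names, the seed, the rounding-up rule for the training set size (chosen here only because it reproduces the reported 361/154 counts), and the four-patient toy data are all assumptions, and tied survival times are simply skipped.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(n_patients: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient indices and cut them 7:3 into training and test lists.

    Assumption: the training set size is rounded up (integer ceiling of 0.7 * n).
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_train = -(-n_patients * 7 // 10)  # ceiling division without floats
    return indices[:n_train], indices[n_train:]


def harrell_c_index(
    times: Sequence[float], events: Sequence[int], risk_scores: Sequence[float]
) -> float:
    """Share of comparable patient pairs whose predicted risks are ordered correctly.

    A pair (i, j) is comparable when patient i had an observed event and a
    strictly shorter follow-up time than patient j. Tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    train_idx, test_idx = split_train_test(515)
    print(len(train_idx), len(test_idx))  # 361 154

    # Hypothetical toy data: follow-up in months, 1 = death, 0 = censored.
    months = [3, 10, 24, 68]
    died = [1, 1, 0, 0]
    predicted_risk = [0.9, 0.6, 0.7, 0.1]
    print(harrell_c_index(months, died, predicted_risk))  # 0.8
```

In the toy data, five pairs are comparable and four are ranked correctly, giving 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.751 (RSF) and 0.711 (Cox) should be read.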

    Fig. 1 Nomogram based on the Cox proportional hazards model
    Fig. 2 Calibration and decision curve analyses of the Cox proportional hazards model in the test set. A: AUC at 1, 2, and 3 years in the test set; B: Kaplan-Meier curve in the test set; C: Calibration curve at 1 year; D: Calibration curve at 2 years; E: Calibration curve at 3 years; F: Decision curve analysis at 1 year; G: Decision curve analysis at 2 years; H: Decision curve analysis at 3 years
    Fig. 3 Development of the RSF model. A: Variable selection results using the VIMP method; B: Variable selection results combining the VIMP and minimal depth (MD) methods
    Fig. 4 Calibration and decision curve analyses of the RSF model in the test set. A: AUC at 1, 2, and 3 years in the test set; B: Kaplan-Meier curve in the test set; C: Calibration curve at 1 year; D: Calibration curve at 2 years; E: Calibration curve at 3 years; F: Decision curve analysis at 1 year; G: Decision curve analysis at 2 years; H: Decision curve analysis at 3 years
    Fig. 5 Model interpretation based on SurvSHAP
    Fig. 6 Kaplan-Meier survival curves of key variables
    Fig. 7 SurvLIME plots for individual patients (the left panel shows the effect of each variable on the survival probability of the selected patient: red indicates variables that reduce mortality risk, green indicates variables that increase mortality risk, and larger areas represent greater influence; the right panel compares predictions from the RSF model with those from the black-box model: the closer the two curves are, the more accurate the model results). A: Age 72 years, no chemotherapy, lymph node positive rate 24.0%, tumor size 30 mm; endpoint: death, survival time 3 months; B: Age 56 years, received chemotherapy, lymph node positive rate 0, tumor size 10 mm; endpoint: alive, survival time 68 months
    Table 1 Baseline characteristics of the enrolled patients
    Table 2 Univariate and multivariate Cox regression analysis
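
The Kaplan-Meier curves referred to in the abstract and in Fig. 6 are built with the product-limit estimator. The following is a minimal sketch of that estimator, assuming follow-up in months and a 0/1 event flag; the function name and the five-patient example are hypothetical and are not taken from the study data.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[int]
) -> List[Tuple[float, float]]:
    """Return (event time, survival probability) steps of the product-limit estimate.

    At each distinct event time t, survival is multiplied by
    1 - deaths(t) / at_risk(t), where at_risk counts patients still under
    observation at t (follow-up time >= t).
    """
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Hypothetical toy group: follow-up in months, 1 = death, 0 = censored.
    months = [3, 5, 5, 8, 12]
    died = [1, 1, 0, 1, 0]
    for t, s in kaplan_meier(months, died):
        print(f"month {t}: S(t) = {s:.2f}")
```

For the toy group, the estimate steps down at months 3, 5, and 8 to roughly 0.80, 0.60, and 0.30. Stratified curves such as those in Fig. 6 come from running the same estimator separately within each subgroup, for example patients with and without chemotherapy.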
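Cite this article

Wu Shiwei, Xiao Zhetai, Qin Zhanyu, Wang Boyu, Shi Yang. Development and validation of a random survival forest model for prognosis prediction in extrahepatic cholangiocarcinoma after radical resection [J]. China Journal of General Surgery (中国普通外科杂志), 2025, 34(8): 1696-1708.
DOI: 10.7659/j.issn.1005-6947.250160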

History
  • Received: 2025-03-19
  • Last revised: 2025-08-09
  • Published online: 2025-10-11