Development of a prediction model for mortality in liver cirrhosis patients based on H2O automated machine learning

Affiliations:

1. Department of Hepatobiliary Surgery, Jintan Affiliated Hospital of Jiangsu University, Changzhou, Jiangsu 213200, China; 2. Department of Orthopaedics, Jintan Affiliated Hospital of Jiangsu University, Changzhou, Jiangsu 213200, China; 3. Department of Oncology, Jintan Affiliated Hospital of Jiangsu University, Changzhou, Jiangsu 213200, China; 4. Department of Pediatrics, Xiangya Hospital, Central South University, Changsha, Hunan 410008, China; 5. Department of Hepatobiliary Surgery, Hunan Provincial People's Hospital/the First Affiliated Hospital of Hunan Normal University, Changsha, Hunan 410005, China

Author biography:

王玉, associate chief physician at Jintan Affiliated Hospital of Jiangsu University, mainly engaged in research on hepatobiliary surgical diseases.

Fund projects:

Changzhou (Jiangsu Province) Science and Technology Bureau, 13th Batch of the Science and Technology Plan (Applied Basic Research) (CJ20210005, CJ20210006); Jiangsu University Medical Education Collaborative Innovation Fund (JDY2022018).

Abstract:

Background and Aims: Patients with advanced liver cirrhosis often develop a series of complications that increase their risk of death, so identifying patients at high risk of death as early as possible is clinically important. This study used the automated machine learning (AutoML) framework of the H2O platform to develop a model that predicts death within 30 d of admission in patients with liver cirrhosis, with the aim of providing a new approach to improving prognosis and clinical management.

Methods: General information and laboratory test data recorded at admission were collected from patients hospitalized with liver cirrhosis at Jintan Affiliated Hospital of Jiangsu University and Hunan Provincial People's Hospital. The H2O AutoML framework was used to build models for the mortality outcome with multiple machine learning algorithms. Receiver operating characteristic (ROC) curves and confusion matrices were used to evaluate model performance, and the important variables were visualized.

Results: The best model was a gradient boosting machine (GBM), with a Gini value of 0.994, an R² of 0.775, and a LogLoss of 0.120. The important variables in the model included prothrombin time, creatinine, white blood cell count, and age. SHAP feature plots and partial dependence plots showed how the important variables relate to the model's overall predictions, and local interpretable model-agnostic explanations (LIME) visualization showed the role of each variable in individual predictions. In the validation set, the best GBM model had a specificity of 0.950, a sensitivity of 0.676, and an area under the ROC curve (AUC) of 0.793, outperforming models based on four other algorithms (extreme gradient boosting [XGBoost], logistic regression, random forest, and deep learning) as well as the model for end-stage liver disease (MELD) and albumin-bilirubin (ALBI) scores.

Conclusions: The established machine learning model provides an effective tool for screening the risk of short-term death in patients with liver cirrhosis. However, its reliability still needs to be evaluated further through multicenter external validation.
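The evaluation metrics named in the abstract (sensitivity, specificity, AUC, Gini, LogLoss) can be illustrated with a minimal sketch in plain Python. This is not the authors' code or the H2O implementation: the toy labels and probabilities are invented for illustration, the 0.5 threshold is an assumption, and the Gini value is assumed to follow the common convention Gini = 2 × AUC − 1.

```python
import math

def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels (1 = death, 0 = survival)."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # threshold is an assumed cut-off
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    """Share of true deaths that the model flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Share of survivors that the model correctly leaves unflagged."""
    return tn / (tn + fp)

def auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative
    (ties count as one half). Quadratic in sample size; fine for a small demo."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def gini(auc_value):
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc_value - 1.0

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

if __name__ == "__main__":
    # Invented toy data, not taken from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_prob = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.7, 0.05]
    tp, fp, tn, fn = confusion_counts(y_true, y_prob)
    a = auc(y_true, y_prob)
    print("sensitivity:", sensitivity(tp, fn))
    print("specificity:", specificity(tn, fp))
    print("AUC:", a, "Gini:", gini(a))
    print("LogLoss:", log_loss(y_true, y_prob))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC, Gini, and LogLoss are computed from the predicted probabilities directly, which is why a model can pair high specificity with modest sensitivity at one cut-off.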

Fig. 1 Relative ranking of variable importance in the best model (GBM)
Fig. 2 SHAP features of the variables in the best model (the horizontal axis represents the distribution of each variable across outcome categories; the vertical axis lists the variables)
Fig. 3 LIME visualization of variable importance in random samples (showing the role of important variables in individual predictions; p0 denotes survival, p1 denotes death)
Fig. 4 Partial dependence plots of variables in the best model (showing the marginal effects of the four top-ranked variables on the predictions of the GBM model)
Table 2 Performance of the machine learning models and commonly used scores on the validation set
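The commonly used scores that serve as comparators, MELD and ALBI, are closed-form functions of routine laboratory values. The sketch below uses the widely published formulas as an assumption; this page does not state which MELD variant or which unit conventions the authors applied, and the function names and example inputs are hypothetical.

```python
import math

def meld_score(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """Classic MELD (assumed variant): inputs below 1.0 are raised to 1.0 and
    creatinine is capped at 4.0 mg/dL. The dialysis rule is omitted here."""
    bili = max(bilirubin_mg_dl, 1.0)
    inr_adj = max(inr, 1.0)
    crea = min(max(creatinine_mg_dl, 1.0), 4.0)
    return (3.78 * math.log(bili)
            + 11.2 * math.log(inr_adj)
            + 9.57 * math.log(crea)
            + 6.43)

def albi_score(bilirubin_umol_l, albumin_g_l):
    """ALBI = 0.66 * log10(bilirubin in umol/L) - 0.085 * albumin in g/L."""
    return 0.66 * math.log10(bilirubin_umol_l) - 0.085 * albumin_g_l

def albi_grade(score):
    """Grade 1: score <= -2.60; grade 2: -2.60 < score <= -1.39; grade 3: above -1.39."""
    if score <= -2.60:
        return 1
    if score <= -1.39:
        return 2
    return 3

if __name__ == "__main__":
    # Hypothetical laboratory values for illustration only.
    print("MELD:", round(meld_score(2.5, 1.6, 1.3), 1))
    s = albi_score(40.0, 32.0)
    print("ALBI:", round(s, 2), "grade", albi_grade(s))
```

Because both scores are fixed formulas over a few variables, they give a transparent baseline against which a learned model such as the GBM can be compared on the same validation set.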
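Cite this article

王玉, 徐中华, 虞卫新, 张辉, 于倩倩, 段文斌. Development of a prediction model for mortality in liver cirrhosis patients based on H2O automated machine learning [J]. 中国普通外科杂志 (China Journal of General Surgery), 2023, 32(7): 1071-1078.
DOI: 10.7659/j.issn.1005-6947.2023.07.012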

History
  • Received: 2022-03-03
  • Last revised: 2023-01-10
  • Published online: 2023-11-03