Neural network prediction model for assisting diagnosis of microsatellite status in colorectal cancer
Authors: Hao Jun, Wang Shuai, Zhu Jun, Xu Chunsheng
Affiliations:

1. Department of Experimental Surgery, the First Affiliated Hospital, Air Force Medical University, Xi'an, Shaanxi 710000, China
2. Department of Gastrointestinal Surgery, the First Affiliated Hospital, Air Force Medical University, Xi'an, Shaanxi 710000, China
3. Department of Outpatient Services, Minggang Station Hospital, Xi'an Institute of Flight of the Air Force, Xinyang, Henan 463200, China
4. Department of General Surgery, Air Force Hospital of the PLA Southern Theater Command, Guangzhou, Guangdong 510000, China

About the author:

Hao Jun is an assistant researcher at the First Affiliated Hospital, Air Force Medical University. His research focuses on the mechanisms of colorectal cancer development.

Funding:

National Natural Science Foundation of China (82100680).

    Abstract:

    Background and Aims: Microsatellite instability (MSI) has become an important biomarker for clinical diagnosis, adjuvant therapy, and prognostic guidance in colorectal cancer (CRC), and for selecting CRC immunotherapy. MSI often accompanies deficient DNA mismatch repair (dMMR), that is, loss of mismatch repair proteins. At present, dMMR is diagnosed mainly by pathological immunohistochemistry for four repair proteins (MLH1, MSH2, MSH6, and PMS2). However, few studies have addressed accurate MSI prediction models or new MSI signature genes. With the development of artificial intelligence (AI) in medicine, accurate prediction and data mining have become research hotspots. This study aimed to establish a neural network model for MSI prediction and to identify new MSI signature genes.

    Methods: Three CRC datasets from GEO (GSE39582, GSE29638, and GSE75315) were used as the training set, and one TCGA CRC dataset was used as an independent external validation set. Based on the sequencing and microarray data of these datasets, a neural network prediction model for CRC MSI was built using differential analysis, the random forest algorithm, and the resilient backpropagation algorithm. Conventional machine learning models for MSI were built with the K-nearest neighbor (KNN) and support vector machine (SVM) algorithms. The predictive ability of the models was evaluated with confusion matrices, receiver operating characteristic (ROC) curves, and the area under the curve (AUC).

    Results: The training set included 787 cases: 111 (14.10%) microsatellite instability-high (MSI-H) and 676 (85.90%) microsatellite instability-low (MSI-L)/microsatellite stable (MSS). The validation set included 389 cases from the TCGA dataset: 67 (17.22%) MSI-H and 322 (82.78%) MSI-L/MSS. Differential analysis identified 100 MSI-related genes, of which 61 were upregulated and 39 were downregulated. Combining differential analysis with the random forest algorithm yielded the 30 genes that contributed most as MSI signature genes. From the expression matrix of the MSI-related genes, a neural network prediction model based on the expression of 23 genes was established. The model predicted MSI status accurately in both the training set (sensitivity 0.993, specificity 0.973, diagnostic accuracy 0.990, AUC 0.991) and the validation set (sensitivity 0.950, specificity 0.828, diagnostic accuracy 0.933, AUC 0.922). In addition, the neural network model predicted MSI more accurately than the other machine learning models (KNN and SVM).

    Conclusion: The neural network prediction model, combined with deep sequencing of tissue, can help clinicians diagnose the MSI status of CRC and provides a reference and decision-making basis for selecting tumor immunotherapy regimens. The identified MSI signature genes also provide clues and directions for further study of their functions and mechanisms.
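
The evaluation metrics named in the abstract (sensitivity, specificity, and diagnostic accuracy) all derive from a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code. The class `ConfusionMatrix`, the helper `confusion_from_labels`, and the five example labels are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # positive cases predicted positive
    fn: int  # positive cases predicted negative
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive

    @property
    def sensitivity(self) -> float:
        # Fraction of truly positive cases that the model flags as positive.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Fraction of truly negative cases that the model flags as negative.
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        # Fraction of all cases classified correctly.
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


def confusion_from_labels(truth, predicted, positive):
    """Tally a confusion matrix, treating `positive` as the positive class."""
    tp = fn = tn = fp = 0
    for actual, guess in zip(truth, predicted):
        if actual == positive:
            if guess == positive:
                tp += 1
            else:
                fn += 1
        elif guess == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionMatrix(tp=tp, fn=fn, tn=tn, fp=fp)


# Made-up labels for illustration only.
truth = ["MSI-H", "MSI-H", "MSS", "MSS", "MSS"]
predicted = ["MSI-H", "MSS", "MSS", "MSS", "MSI-H"]
cm = confusion_from_labels(truth, predicted, positive="MSI-H")
print(cm.sensitivity, cm.specificity, cm.accuracy)
```

Sensitivity and specificity swap when the positive class is swapped, and the abstract does not state which class was treated as positive, so the sketch takes it as an explicit argument.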

    Table 1 Basic clinical characteristics of the CRC datasets [n (%)]
    Table 2 Weight coefficients between the input layer and the hidden layer
    Table 3 Weight coefficients between the hidden layer and the output layer
    Table 4 Model evaluation metrics
    Table 5 Accuracy of different algorithms for predicting MSI status in CRC
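
Tables 2 and 3 list the weight coefficients between the input and hidden layers and between the hidden and output layers. As a rough illustration of how such coefficients turn gene expression values into a prediction, the sketch below runs the forward pass of a network with one hidden layer. It rests on assumptions the abstract does not confirm: a logistic activation in both layers, a single output node, and made-up weights and inputs. The function names `sigmoid` and `forward` are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic activation (assumed; the abstract does not name the activation)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(features, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass of a single-hidden-layer network with one output node.

    features        expression values of the input genes
    hidden_weights  one row of input-to-hidden weights per hidden node
    hidden_bias     one bias per hidden node
    output_weights  one hidden-to-output weight per hidden node
    output_bias     bias of the output node
    """
    hidden = [
        sigmoid(bias + sum(w * x for w, x in zip(row, features)))
        for row, bias in zip(hidden_weights, hidden_bias)
    ]
    return sigmoid(output_bias + sum(w * h for w, h in zip(output_weights, hidden)))


# Made-up example: 3 input genes and 2 hidden nodes (the study model uses 23 genes).
features = [1.0, 2.0, 0.0]
hidden_weights = [[0.5, -0.25, 0.3], [-1.0, 0.5, 0.2]]
hidden_bias = [0.0, 0.0]
output_weights = [1.0, -1.0]
output_bias = 0.0

score = forward(features, hidden_weights, hidden_bias, output_weights, output_bias)
print(score)  # a value in (0, 1), read as a score for one of the two classes
```

A threshold on the output score then gives the predicted class, and varying that threshold traces out a ROC curve.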
    Fig.1 Volcano plot of MSI differentially expressed genes (red dots represent upregulated genes, green dots represent downregulated genes, and black dots represent genes with no differential expression)
    Fig.2 MSI signature genes screened by random forest. A: Relationship between the number of decision trees and the error rate; B: Importance of the top 30 contributing genes
    Fig.3 Diagram of the artificial neural network
    Fig.4 ROC curves of the training and validation sets
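
The AUC that summarises a ROC curve such as those in Fig.4 equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. It is an illustrative implementation, not the authors' pipeline, and the function name `roc_auc`, the labels, and the scores are made up.

```python
def roc_auc(labels, scores):
    """AUC from its pairwise definition: P(positive score > negative score).

    labels  1 for the positive class, 0 for the negative class
    scores  model outputs, higher meaning "more likely positive"
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # correctly ordered pair
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up scores for illustration only.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 3 of 4 positive/negative pairs are ordered correctly
```

The pairwise loop costs time proportional to the number of positive cases times the number of negative cases, which is fine for cohorts of a few hundred cases. A rank-based formulation gives the same value more efficiently on large datasets.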
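Cite this article

Hao Jun, Wang Shuai, Zhu Jun, Xu Chunsheng. Neural network prediction model for assisting diagnosis of microsatellite status in colorectal cancer[J]. China Journal of General Surgery (中国普通外科杂志), 2023, 32(4): 488-496.
DOI: 10.7659/j.issn.1005-6947.2023.04.002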

History
  • Received: 2022-03-01
  • Last revised: 2022-05-07
  • Published online: 2023-04-28