Mining of genes involved in microsatellite instability in colorectal cancer through machine learning and evaluation of their application value
Authors: LI Xiuqin, HAN Tenghui, WANG Shuai, SHEN Gang, ZHU Jun
Affiliations:

1. Department of General Surgery, Southern Theater Command Air Force Hospital of the Chinese People's Liberation Army, Guangzhou, Guangdong 510000, China; 2. Department of Neurology, Xijing Hospital, Air Force Medical University of the Chinese People's Liberation Army, Xi'an, Shaanxi 710000, China; 3. Outpatient Department, Minggang Station Hospital, First Brigade, Xi'an Flight Academy of the Chinese People's Liberation Army Air Force, Xinyang, Henan 463200, China

About the author:

LI Xiuqin, attending physician at the Southern Theater Command Air Force Hospital of the Chinese People's Liberation Army, mainly engaged in clinical research on digestive tract tumors.

Corresponding author:

ZHU Jun, Email: zjsty@fmmu.edu.cn

Funding:

National Natural Science Foundation of China (82100680).

    Abstract:

    Background and Aims Colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the second leading cause of cancer death worldwide. The latest guidelines recommend that all CRC patients be tested for microsatellite instability (MSI). Patients with MSI often have deficient mismatch repair (dMMR). MSI/dMMR status has been used as a biomarker to predict a favorable response to immunotherapy and patient prognosis. However, MSI signature genes and their relationship with tumor-infiltrating immune cells have not been fully described. This study therefore used machine learning to discover novel MSI signature genes in CRC and to verify their diagnostic value and their relationship with immune cell infiltration.

    Methods According to the inclusion and exclusion criteria, the GSE39582 dataset from the GEO database was used as the training set, and the COAD dataset from the TCGA database was used as the external validation set. MSI signature genes were screened in the GSE39582 CRC dataset using machine learning methods (LASSO regression and the SVM-RFE algorithm) and validated in the TCGA COAD dataset. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate the diagnostic performance of each gene for MSI. The CIBERSORT algorithm was used to estimate the immune cell composition infiltrating each tumor sample, and Spearman correlation analysis was used to examine the relationship between MSI signature genes and immune cells.

    Results The training set included 536 CRC patients, of whom 77 (14.37%) had high microsatellite instability (MSI-H). The validation set included 389 CRC patients, of whom 67 (17.22%) had MSI-H. Baseline data analysis showed that MSI-H/dMMR CRC had more favorable TNM staging and survival rates than low microsatellite instability (MSI-L) or microsatellite stable (MSS)/proficient mismatch repair (pMMR) CRC (P<0.05). In the GSE39582 dataset, LASSO regression screened 21 MSI signature genes and the SVM-RFE algorithm screened 6 genes. Combining the two algorithms identified the MSI signature genes as EIF5A, CXCL13, HNRNPL, HOXC6, RPL22L1, and Y16709. The diagnostic efficacy of these genes was further verified in the TCGA database, and EIF5A was found to have the highest diagnostic efficacy, with AUC values of 0.922 in the training set and 0.805 in the validation set. In addition, Spearman correlation analysis found that EIF5A was mainly positively correlated with CD8+ T cells, activated dendritic cells, helper T cells, M1 macrophages, γδ T cells, and neutrophils, and negatively correlated with CD4+ memory T cells, M2 macrophages, resting dendritic cells, eosinophils, and regulatory T cells.

    Conclusion Analysis of novel MSI signature genes in CRC shows that EIF5A has good diagnostic performance and clinical value for determining CRC MSI status. EIF5A is also associated with immune cells and the immune microenvironment. Thus, EIF5A may become a new marker for immune checkpoint therapy.
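The two evaluation statistics named in the abstract, the AUC of a single gene for MSI status and the Spearman correlation between a gene and an immune cell fraction, can be illustrated with a minimal standard-library sketch. This is not the authors' code, and it does not reproduce the LASSO, SVM-RFE, or CIBERSORT steps. The function names and all toy values below are hypothetical and chosen only for illustration; they are not data from GSE39582 or TCGA.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc(scores, labels):
    """AUC from the Mann-Whitney U statistic; labels are 1 (MSI-H) or 0."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    ranks = average_ranks(scores)
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values: one gene's expression, MSI-H labels, and an
# immune cell fraction for six samples (illustration only).
expression = [2.1, 3.4, 1.9, 4.2, 2.3, 2.5]
msi_high = [0, 1, 0, 1, 1, 0]
cd8_fraction = [0.05, 0.12, 0.04, 0.20, 0.06, 0.07]

if __name__ == "__main__":
    print("AUC:", auc(expression, msi_high))
    print("Spearman rho:", spearman(expression, cd8_fraction))
```

The rank-based form of the AUC is used here because it handles tied expression values without building an explicit ROC curve; an AUC of 0.5 corresponds to no discrimination between MSI-H and other samples, and 1.0 to perfect separation.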

    Table 3 Diagnostic efficacy of different genes for MSI status in colorectal cancer
    Fig. 1 Volcano plot of genes differentially expressed by MSI status
    Fig. 2 MSI-related genes identified by LASSO regression and SVM-RFE. A: Selection of MSI-related genes by LASSO regression; B: Relationship between error and number of genes in SVM-RFE
    Fig. 3 Correlation between EIF5A and tumor-infiltrating immune cells
    Table 2 Baseline features of CRC patients in the TCGA and GEO datasets (continued)
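Cite this article

LI Xiuqin, HAN Tenghui, WANG Shuai, SHEN Gang, ZHU Jun. Mining of genes involved in microsatellite instability in colorectal cancer through machine learning and evaluation of their application value [J]. China Journal of General Surgery (中国普通外科杂志), 2022, 31(10): 1355-1362.
DOI: 10.7659/j.issn.1005-6947.2022.10.011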

History
  • Received: 2021-11-29
  • Last revised: 2022-04-18
  • Accepted:
  • Published online: 2022-10-31