Construction of a prognostic assessment model for hepatocellular carcinoma based on E2F targets and immune subtype differences
Author:
Corresponding author:
Affiliation:

1. The Eleventh Squadron of the Third Student Brigade, Basic Medicine College of Army Medical University, Chongqing 400038, China; 2. The Twelfth Squadron of the Fourth Student Brigade, Basic Medicine College of Army Medical University, Chongqing 400038, China; 3. The Ninth Squadron of the Third Student Brigade, Basic Medicine College of Army Medical University, Chongqing 400038, China; 4. Department of Hepatobiliary Surgery, the First Affiliated Hospital of Army Medical University, Chongqing 400038, China

Author biography:

HE Si, undergraduate student at the Basic Medicine College of Army Medical University, mainly engaged in research on bioinformatics analysis.

Fund Project:

Supported by the Technology Innovation and Application Development Special Fund of the Chongqing Science and Technology Bureau (CSTC2021jscx-gksb-N0009).

    Abstract:

    Background and Aims: Hepatocellular carcinoma (HCC) is the most common type of liver cancer. Patients with HCC have a poor prognosis, and effective prognostic prediction remains a major challenge. Many studies have confirmed that gene markers related to the E2F gene family and the immune microenvironment are important prognostic factors in cancer. This study therefore used the TCGA database to screen for HCC gene markers related to the E2F gene family and the immune microenvironment, to establish a new risk score model for HCC, and to predict potential therapeutic targets for HCC.

    Methods: A large HCC (LIHC) cohort (424 samples) was downloaded from the TCGA database. Gene set enrichment analysis, single-sample gene set enrichment analysis, and differential gene expression analysis after clustering on the single-sample enrichment scores were performed. Marker genes were screened and modeled by Lasso regression; a score was calculated for each patient from the model, and patients were divided into high-risk and low-risk groups. Several statistical methods, including the receiver operating characteristic (ROC) curve, Kaplan-Meier survival curves, and Cox regression analysis, were used to verify the model's reliability. All statistical analyses were performed in R. Finally, genetic alterations of the model's marker genes in the TCGA-HCC samples were queried in the cBioPortal database, and protein interaction information was downloaded from the STRING database and visualized in Cytoscape.

    Results: After the E2F target gene set and the immune-related differentially expressed genes closely related to HCC were identified, seven genes significantly associated with the overall survival of HCC patients (CYR61, FBLN5, LPA, SAA1, SDC3, SERPINE1, SSRP1) were screened from them, and a prognostic 7-mRNA signature model was established: risk score = -0.55 × CYR61 expression - 0.18 × FBLN5 expression - 0.17 × LPA expression - 0.06 × SAA1 expression + 0.31 × SDC3 expression + 0.38 × SERPINE1 expression + 1.08 × SSRP1 expression. The AUC of the model's ROC curve was 0.846. Kaplan-Meier survival curves showed that patients with high risk scores had a poor prognosis (P<0.001). The ability of the high/low risk score to discriminate prognosis was similar to that of tumor size and UICC stage and better than that of lymph node metastasis, distant metastasis, and BMI. Multivariate Cox regression analysis showed that the predictive ability of the 7-mRNA signature model was independent of clinical factors. In addition, combined with proteomic analysis, SERPINE1 and LPA were identified as the key genes among the seven, suggesting that inhibiting plasminogen activation may be a new targeted approach for treating HCC.

    Conclusion: This study reveals the correlation of seven genes with E2F targets and immunity, provides new biomarkers for poor prognosis in HCC patients, and establishes a prognostic risk score model with high predictive accuracy. However, the predictive ability of the multi-gene prognostic model still needs to be confirmed by a large body of multicenter evidence, and the functions of the genes included in the model and the mechanisms in which they participate require further study.
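The risk score formula stated in the abstract is a plain weighted sum, which can be sketched in a few lines. The coefficients below are copied from the abstract; the function name, the dictionary layout, and the example expression values are assumptions made for illustration only. The abstract does not state the expression unit or normalization, so the inputs here are hypothetical numbers, not study data.

```python
# Coefficients copied from the 7-mRNA risk score formula in the abstract.
COEFFICIENTS = {
    "CYR61": -0.55,
    "FBLN5": -0.18,
    "LPA": -0.17,
    "SAA1": -0.06,
    "SDC3": 0.31,
    "SERPINE1": 0.38,
    "SSRP1": 1.08,
}


def risk_score(expression):
    """Return the weighted sum of the seven genes' expression values.

    `expression` maps gene symbol -> expression value (unit is an assumption;
    the abstract does not specify the normalization).
    """
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical expression profile (illustrative numbers only, not study data).
example_patient = {
    "CYR61": 2.0, "FBLN5": 1.0, "LPA": 3.0, "SAA1": 5.0,
    "SDC3": 2.0, "SERPINE1": 1.0, "SSRP1": 4.0,
}
print(round(risk_score(example_patient), 2))
```

Genes with negative coefficients (CYR61, FBLN5, LPA, SAA1) lower the score as their expression rises, while SDC3, SERPINE1, and SSRP1 raise it, with SSRP1 carrying the largest weight.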

    Table 1 Clinical characteristics of HCC patients in this study [n (%)]
    Table 2 Cox regression analysis with clinical indicators
    Fig. 1 Overall flow chart
    Fig. 2 Identification of the E2F targets gene set with the highest enrichment score and its related genes A: Enrichment plot of the E2F targets gene set distinguishing the cancer and non-cancer groups; B: P-values and gene counts of the 10 marker gene sets ranked by normalized enrichment score; C: Coefficients of the selected features as a function of the λ parameter; D: Partial likelihood deviance versus log(λ) from the Lasso-Cox regression model
    Fig. 3 Identification of immune subtypes and immune-related differentially expressed genes A: Clustering of the cancer-group samples by enrichment score; B: Volcano plot of differentially expressed mRNAs between the two immune subgroups; C: Coefficients of the selected features as a function of the λ parameter; D: Partial likelihood deviance versus log(λ) from the Lasso-Cox regression model
    Fig. 4 Construction of the prognostic model A: Coefficients of the selected features as a function of the λ parameter; B: Partial likelihood deviance versus log(λ) from the Lasso-Cox regression model; C: Construction of the multivariate Cox regression model and its nomogram; D: ROC analysis of the prognostic ability of the 7-mRNA signature model
    Fig. 5 Analysis and verification of the prognostic ability of the 7-mRNA signature model A-B: Patients with a risk score above the median labeled high risk and those below the median labeled low risk; C-H: Kaplan-Meier survival curves plotted by high/low risk score, tumor size (T), lymph node metastasis (N), distant metastasis (M), UICC stage, and BMI, respectively; I-J: Kaplan-Meier survival curves for the high- and low-risk groups plotted separately within the two UICC subgroups (stages Ⅰ-Ⅱ and stages Ⅲ-Ⅳ)
    Fig. 6 Analysis of the model's seven genes and search for drug targets A: Box plot comparing the expression of the seven genes between cancer and non-cancer samples; B: Protein interaction network and identification of the core genes
    Table 3 Genetic variation of the model's 7 genes in 372 TCGA-HCC samples (%)
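The Fig. 5 caption describes splitting patients at the median risk score and plotting Kaplan-Meier curves per group. The following is a minimal sketch of those two steps in plain Python, not the authors' R workflow. The function names and all data values are hypothetical, and assigning a score exactly equal to the median to the low-risk group is an assumption, since the caption does not say how ties are handled.

```python
from statistics import median


def split_by_median(scores):
    """Label each score 'high' if above the cohort median, otherwise 'low'.

    Assumption: a score equal to the median is labeled 'low'.
    """
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - deaths / at_risk  # product-limit update
        curve.append((t, survival))
    return curve


# Hypothetical cohort (illustrative numbers only, not study data).
scores = [0.2, 1.5, 0.9, 2.4, 0.4, 1.8]
times = [60, 12, 48, 8, 72, 20]   # months of follow-up
events = [0, 1, 1, 1, 0, 1]       # 1 = death, 0 = censored

groups = split_by_median(scores)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier([times[i] for i in idx],
                                 [events[i] for i in idx])
    print(label, curves[label])
```

Comparing the two curves statistically (for example with a log-rank test, which is a common choice but not named in the abstract) would be a separate step.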
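Cite this article

HE Si, ZHAO Yang, ZHU Yongqian, WU Zhuoyi, WU Ying, XIE Junrong, ZHENG Dengye, JIAN Hongmei. Construction of prognostic assessment model for hepatocellular carcinoma based on E2F targets and immune subtype differences[J]. China Journal of General Surgery (中国普通外科杂志), 2023, 32(1): 64-73.
DOI: 10.7659/j.issn.1005-6947.2023.01.005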

History
  • Received: 2022-04-06
  • Last revised: 2022-06-17
  • Accepted:
  • Published online: 2023-02-03