Construction of a decision tree for pancreatic cancer diagnosis based on serum miRNA expression data

Zhou Jun (周钧), Email: zhoujun15974240006@126.com



    Abstract:

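    Background and Aims: Pancreatic cancer is characterized by a low early detection rate, a high risk of distant metastasis and a poor response to chemotherapy, so most patients have a poor prognosis. Developing tools for the early diagnosis of pancreatic cancer is therefore of great importance. This study used bioinformatics and machine learning methods to screen serum miRNAs that can distinguish sample types (pancreatic cancer versus normal) and to construct a decision tree from them.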
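    Methods: Serum miRNA expression profiles and sample group information for GSE113486 and GSE85589 were downloaded from the Gene Expression Omnibus (GEO) database, and batch effects were removed with the ComBat function. LASSO regression was used to screen key miRNAs that distinguish tumor from non-tumor samples, and a decision tree was built on these key miRNAs with the rpart function. ROC curves were used to evaluate the predictive performance of the decision tree, and the Wilcoxon test was used to compare the expression of the tree's observed variables between tumor and normal samples. Finally, target mRNAs of the key miRNAs were predicted with miRDB, miRTarBase and TargetScan, and enrichment analysis was performed on them.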
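The decision-tree step relies on rpart. For a classification tree, rpart's default split criterion is the Gini index, and growing the tree amounts to repeatedly choosing the variable and threshold that most reduce impurity. The following is a minimal pure-Python sketch of one such split on a single variable. The function names and toy expression values are hypothetical illustrations, not the study's code or data, and the sketch omits what rpart adds on top (search across all variables, surrogate splits, minimum node sizes, and cost-complexity pruning through the cp parameter).

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of 0/1 class labels (hypothetical coding: 0 = control, 1 = cancer)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best split 'value < threshold'.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values, the usual way CART-style trees enumerate splits on a numeric
    variable. This is a single-variable sketch, not rpart itself.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of the two child nodes, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best
```

With the hypothetical values [1.2, 1.5, 1.9, 2.4, 3.1, 3.3, 3.8, 4.0] and labels [0, 0, 0, 0, 1, 1, 0, 1], `best_split` returns a threshold of 2.75 with a weighted Gini of 0.1875: the left node is pure and the right node holds three of four cases.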
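    Results: After batch-effect removal, serum miRNA expression profiles from 119 healthy controls and 128 pancreatic cancer samples were included in the study. LASSO regression with 10-fold cross-validation selected 33 miRNAs, and the samples were then randomly divided into a training set (60%) and a test set (40%). In the training set, the 33 miRNAs were used to build a decision tree; after pruning, miR-4532 and miR-4668-5p were retained as its observed variables. In ROC analysis, the area under the curve (AUC) was 0.9481 in the training set and 0.9024 in the test set. Both miR-4532 and miR-4668-5p were more highly expressed in pancreatic cancer serum samples than in normal serum samples, and the differences were statistically significant (P<0.05). Six target mRNAs were predicted for miR-4532 and 73 for miR-4668-5p. These targets may be involved in functions such as transcription regulator complex, nuclear chromatin, transcription repressor complex, regulation of megakaryocyte differentiation, adhesion assembly, cell-substrate junction organization, megakaryocyte differentiation and negative regulation of focal adhesion assembly, and were mainly enriched in the transcriptional misregulation in cancer, FoxO, adherens junction, pancreatic cancer, hepatitis B, hepatocellular carcinoma, TGF-β and MAPK signaling pathways.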
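The results report both AUC values and a Wilcoxon P value. For a single continuous score these are closely linked: the ROC AUC equals the Mann-Whitney U statistic (equivalent to the Wilcoxon rank-sum statistic) divided by the product of the two group sizes. The sketch below computes U, the AUC and a two-sided P value for one marker, as an illustration under stated assumptions. The function names and input values are hypothetical, and the P value uses the plain normal approximation with no tie or continuity correction, whereas R's `wilcox.test` by default uses an exact test for small samples without ties and a continuity-corrected normal approximation otherwise, so P values can differ somewhat.

```python
import math
from typing import List, Tuple


def rank_with_ties(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def mann_whitney_auc(cases: List[float], controls: List[float]) -> Tuple[float, float, float]:
    """Return (U, AUC, two-sided p) comparing cases (e.g. cancer) with controls.

    U counts case/control pairs in which the case is higher (ties count 1/2),
    so AUC = U / (n1 * n0). The p-value is a plain normal approximation
    without tie or continuity correction (a simplifying assumption).
    """
    n1, n0 = len(cases), len(controls)
    if n1 == 0 or n0 == 0:
        raise ValueError("both groups need at least one sample")
    ranks = rank_with_ties(list(cases) + list(controls))
    rank_sum_cases = sum(ranks[:n1])
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    auc = u / (n1 * n0)
    mean_u = n1 * n0 / 2
    sd_u = math.sqrt(n1 * n0 * (n1 + n0 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, auc, p_two_sided
```

For example, with hypothetical cases [3.1, 3.3, 4.0, 2.9] and controls [1.2, 1.5, 2.9, 3.8], the function returns U = 12.5 and AUC = 0.78125; the tied 2.9 pair contributes one half. Applied to a decision tree, the same formula would take the tree's predicted cancer probabilities as the score, and because a small tree yields only a few distinct probabilities, the average-rank handling of ties matters.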
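    Conclusion: The decision tree built on miR-4532 and miR-4668-5p performs well in distinguishing pancreatic cancer from normal serum samples and has some value for the early diagnosis of pancreatic cancer.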


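Cite this article:

ZHANG Fan, LI Zedong, PENG Yu, CHEN Sheng, ZHOU Jun. Construction of decision tree for diagnosis of pancreatic cancer based on serum miRNA expression data[J]. 中国普通外科杂志 (China Journal of General Surgery), 2021, 30(2): 211-218.
DOI:10.7659/j.issn.1005-6947.2021.02.010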

History
  • Received: 2020-11-03
  • Last revised: 2021-01-25
  • Published online: 2021-02-25