Abstract: Background and Aims: Pancreatic cancer is characterized by a low early-detection rate, a high risk of distant metastasis, and a poor response to chemotherapy, and most patients face a dismal prognosis. Developing tools for the early diagnosis of pancreatic cancer is therefore of great importance. This study aimed to identify serum miRNAs that distinguish pancreatic cancer from normal serum samples and to construct a decision tree model from them using bioinformatics and machine learning methods.
Methods: The serum miRNA expression datasets GSE113486 and GSE85589 were downloaded from the Gene Expression Omnibus (GEO) database, and batch effects were removed using the ComBat function. Hub miRNAs for distinguishing tumor from normal serum samples were screened by LASSO regression analysis. A decision tree model was constructed from the hub miRNAs using the rpart software package, and its diagnostic efficacy was evaluated with ROC curves. Differences in the expression of the decision tree variables between tumor and normal serum samples were compared using the Wilcoxon test. Finally, target mRNAs of the hub miRNAs were predicted using the miRDB, miRTarBase, and TargetScan databases, and enrichment analysis was performed.
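As a minimal illustration of how a classification tree of the kind built with rpart chooses a split, the Python sketch below searches one feature for the threshold that most reduces Gini impurity (the default splitting index for rpart classification trees) and then picks the best feature among several candidate miRNAs for the root node. It is a simplified CART step under stated assumptions, not the authors' code: it omits rpart's surrogate splits, minimum node sizes, and complexity-parameter pruning, and the function names, the midpoint thresholds, and the toy miRNA names and expression values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 class labels (1 = cancer, assumed)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Threshold on one feature that maximizes the weighted Gini decrease.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values (a simplifying assumption); samples below the threshold go left.
    Returns (threshold, impurity_decrease), or (None, 0.0) if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini(labels)
    best_t: Optional[float] = None
    best_gain = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best_gain:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_t, best_gain


def best_root_split(
    features: Dict[str, List[float]], labels: List[int]
) -> Tuple[Optional[str], Optional[float], float]:
    """Pick the (feature, threshold) pair with the largest Gini decrease."""
    best: Tuple[Optional[str], Optional[float], float] = (None, None, 0.0)
    for name, vals in features.items():
        t, g = best_split(vals, labels)
        if t is not None and g > best[2]:
            best = (name, t, g)
    return best
```

For example, with hypothetical features {"miR_A": [1.0, 1.5, 3.0, 3.5], "miR_B": [2.0, 1.0, 2.0, 1.0]} and labels [0, 0, 1, 1], best_root_split returns ("miR_A", 2.25, 0.5): miR_A separates the classes perfectly, while miR_B yields no impurity decrease. A full tree repeats this search recursively within each child node and is then pruned.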
Results: After batch-effect removal, the serum miRNA expression profiles of 119 healthy controls and 128 pancreatic cancer samples were included in the study. LASSO regression with 10-fold cross-validation identified 33 hub miRNAs. All samples were then randomly divided into a training set (60%) and a validation set (40%), and the 33 hub miRNAs were used to construct the decision tree model in the training set. After pruning, miR-4532 and miR-4668-5p were retained as the split variables of the decision tree. ROC analysis gave an area under the curve (AUC) of 0.9481 in the training set and 0.9024 in the validation set. In addition, miR-4532 and miR-4668-5p were significantly overexpressed in pancreatic cancer serum samples compared with normal serum samples. Six target mRNAs were predicted for miR-4532 and 73 for miR-4668-5p. These targets were mainly associated with functions such as transcription regulator complex, nuclear chromatin, transcription repressor complex, cell-substrate junction organization, regulation of megakaryocyte differentiation, and negative regulation of focal adhesion assembly, and were mainly enriched in pathways including transcriptional misregulation in cancer, FoxO signaling, adherens junction, pancreatic cancer, hepatitis B, hepatocellular carcinoma, TGF-β signaling, and MAPK signaling.
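The reported AUC values and the Wilcoxon comparison rest on the same rank statistic: the ROC AUC of a score equals the Mann-Whitney U statistic of the positive class divided by n_pos * n_neg, and U is also the statistic of the Wilcoxon rank-sum test. The stdlib Python sketch below illustrates that identity, assuming label 1 marks pancreatic cancer and higher scores indicate cancer; the function names and toy scores are hypothetical, and the sketch does not reproduce the authors' ROC software or their p-values.

```python
from typing import List


def average_ranks(xs: List[float]) -> List[float]:
    """1-based ranks of xs; tied values receive the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc_mann_whitney(scores: List[float], labels: List[int]) -> float:
    """ROC AUC as U / (n_pos * n_neg), with U the Mann-Whitney statistic
    of the positive class (label 1). Tied positive/negative pairs count 1/2."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one sample of each class")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)
```

For instance, auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]) returns 0.75: three of the four positive/negative pairs are ordered correctly. Because a pruned tree assigns every sample in a leaf the same predicted probability, tied scores are common, and the average-rank treatment makes the result match the trapezoidal area under the stepwise ROC curve.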
Conclusion: The decision tree constructed from miR-4532 and miR-4668-5p distinguishes pancreatic cancer from normal serum samples with good efficacy and may have value for the early detection of pancreatic cancer.