Abstract: Background and Aims: Pancreatic cancer is a common malignant tumor of the digestive tract, and its main pathological type is pancreatic adenocarcinoma (PAAD). Because early diagnosis is difficult and effective treatments are lacking, the prognosis of PAAD is extremely poor. Identifying new targets for the diagnosis and treatment of PAAD is therefore of great significance. This study used bioinformatics analysis to screen for hub genes related to the diagnosis and prognosis of PAAD, and then constructed a support vector machine (SVM) model to classify PAAD and normal pancreatic samples, with the aim of providing a useful resource for research on the diagnosis, treatment, and mechanisms of PAAD.
Methods: Three microarray datasets (GSE28735, GSE62165, GSE62452) were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between PAAD tissue and normal pancreatic tissue were screened using the limma package in R. GO and KEGG pathway enrichment analyses of the DEGs were performed using the STRING database. Protein-protein interaction (PPI) networks of the DEGs were then generated with the STRING server and visualized in Cytoscape, and key subnetwork modules were identified with the MCODE plug-in. The R survival package was used to screen the PPI network and key subnetworks for key nodes related to prognosis, and these key nodes were uploaded to Metascape for functional enrichment analysis. The recursive feature elimination (RFE) algorithm in the R caret package was used to select the optimal feature genes among the key nodes, and the expression differences of the optimal feature genes were verified in the GEPIA database. An SVM classifier based on the optimal feature genes was constructed using the R e1071 package, and the R pROC package was used to validate the model in the three microarray datasets. In the TCGA database, the R package survminer was used to select, among the optimal feature genes, those related to the prognosis of PAAD as the hub genes.
Results: A total of 257 DEGs were screened, including 168 up-regulated genes and 89 down-regulated genes. GO analysis showed that the DEGs were mainly associated with terms such as extracellular matrix organization, cell adhesion, and serine-type peptidase activity. KEGG analysis showed that the DEGs were mainly enriched in protein digestion and absorption, pancreatic secretion, focal adhesion, and the PI3K-Akt signaling pathway. Survival analysis showed that 14 key nodes were associated with prognosis in both GSE28735 and GSE62452 (all P<0.05), and these genes were implicated in neoplasm invasiveness and oncogenesis. RFE selected 8 optimal feature genes: LAMA3, FN1, ITGA3, MET, PLAU, CENPF, MMP14, and OAS2. Validation in the GEPIA database showed that all 8 optimal feature genes were significantly up-regulated in PAAD tissues (all P<0.01). The areas under the ROC curve (AUC) of the SVM model built on these genes were 0.898, 1.000, and 0.905 in the three microarray datasets, respectively. Validation in the TCGA database showed that up-regulation of LAMA3, ITGA3, MET, PLAU, CENPF, and OAS2 was associated with poor prognosis of PAAD (all P<0.05).
Conclusion: The hub genes LAMA3, ITGA3, MET, PLAU, CENPF, and OAS2 may be new targets for the diagnosis or treatment of PAAD. The SVM model based on the 8 optimal feature genes offers an effective tool for diagnosing PAAD.
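The ROC-based validation mentioned in the Methods reduces to one quantity, the AUC, which can be read as the probability that a randomly chosen PAAD sample receives a higher classifier score than a randomly chosen normal sample. The sketch below illustrates that definition in plain Python. It is not the study's code (the study used the R pROC package), the function name `roc_auc` is hypothetical, and the labels and scores are made-up toy values rather than data from the three datasets.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for tumor (positive), 0 for normal (negative).
    scores: classifier decision values, higher meaning "more tumor-like".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy example: three tumor and three normal samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly

# Perfectly separated scores give an AUC of exactly 1.0.
perfect_auc = roc_auc([1, 1, 0, 0], [0.9, 0.7, 0.2, 0.1])
```

Under this reading, an AUC of 1.000 means that every tumor sample in that dataset scored above every normal sample, while a value near 0.9 means roughly one in ten tumor-normal pairs was ranked the wrong way round.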