Abstract:
Background and Aims: Pancreatic ductal adenocarcinoma (PDAC) is the most common pathological type of pancreatic cancer. It carries a poor long-term prognosis, and individualized prognostic assessment tools are lacking. This study aimed to construct a prognostic nomogram for PDAC patients from large-sample real-world data in the Surveillance, Epidemiology, and End Results (SEER) database, using machine learning algorithms, to provide precise, individualized prognostic evaluation and inform clinical decision-making.
Methods: Clinical and prognostic data of patients with pathologically diagnosed PDAC from 2000 to 2018 were extracted from the SEER database according to the inclusion and exclusion criteria. The data were randomly divided into a training set (70%) and a validation set (30%). In the training set, independent prognostic factors were identified using univariate and multivariate Cox proportional hazards models, LASSO regression, and random survival forests. A nomogram was developed to predict 6-, 12-, and 36-month cancer-specific survival (CSS) and overall survival (OS). The model was then assessed in both the training and validation sets using the concordance index (C-index), receiver operating characteristic (ROC) curves, calibration curves, survival curves, and decision curve analysis.
Results: A total of 4,237 patients were included, 2,965 in the training set and 1,272 in the validation set, with comparable baseline characteristics. The median follow-up time was 18 (9-36) months in the training set and 18 (9-37) months in the validation set. In the multivariate Cox model, age, T stage, N stage, M stage, differentiation, surgery, systemic therapy, and chemotherapy were independent prognostic factors for OS (all P<0.05). For CSS, the independent prognostic factors were age, T stage, N stage, M stage, differentiation, surgery, and chemotherapy (all P<0.05).
The LASSO regression model identified age, differentiation, T stage, N stage, M stage, chemotherapy, surgery, lymph node dissection, radiotherapy, and systemic therapy as associated with OS, and T stage, N stage, M stage, chemotherapy, surgery, lymph node dissection, radiotherapy, and systemic therapy as associated with CSS. The random survival forest model ranked the top five variables affecting OS as systemic therapy, differentiation, N stage, chemotherapy, and T stage; for CSS, the top five were systemic therapy, differentiation, N stage, chemotherapy, and AJCC stage. Based on the results of the multivariate Cox, LASSO, and random survival forest models, together with clinical significance, a prediction model for OS and CSS at 6, 12, and 36 months was constructed from seven clinical features: age, T stage, N stage, M stage, differentiation, surgery, and chemotherapy. For OS, the C-index was 0.692 (95% CI=0.681-0.704) in the training set and 0.680 (95% CI=0.664-0.698) in the validation set; for CSS, it was 0.696 (95% CI=0.684-0.707) and 0.680 (95% CI=0.662-0.698), respectively. ROC curves indicated good predictive value, and calibration curves closely matched the ideal 45° reference line.
Conclusion: Age, TNM stage, differentiation, surgery, and chemotherapy are independent prognostic factors for PDAC patients. The prognostic model based on these variables has high discrimination and accuracy and can assist clinicians in developing precise, personalized treatment and follow-up plans for PDAC patients.
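For readers less familiar with the C-index reported above, the following minimal Python sketch shows one common way to compute Harrell's concordance index from follow-up times, event indicators, and model risk scores. It is an illustration only and not the authors' implementation: the function name `harrell_c_index` and the four-patient toy data are hypothetical, and the sketch assumes the simplification that pairs with tied follow-up times are skipped.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- follow-up time per patient (e.g. months)
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score means higher predicted risk

    Simplifying assumption: pairs with identical follow-up times are skipped.
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("all inputs must have the same length")

    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            # A censored patient cannot anchor a pair as the earlier failure.
            continue
        for j in range(n):
            # Pair is comparable when patient i had the event before time j.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half

    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.1, 0.4, 0.7]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risk)
```

On this scale, 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the values of about 0.68 to 0.70 reported above are read.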