Development and validation of a prediction model for VTE risk in gastric and esophageal cancer patients
Objective This study examines the risk of venous thromboembolism (VTE) in patients with gastric or esophageal cancer (GC/EC) and investigates the risk factors for VTE in this population. Using machine learning techniques, it aims to develop an interpretable VTE risk prediction model that identifies GC/EC patients at high risk of VTE early in clinical practice, thereby enabling precise anticoagulant prophylaxis and thrombus management.

Methods This real-world study aimed to predict VTE in patients with GC/EC. Data were collected from inpatients diagnosed with GC/EC at Sichuan Provincial People’s Hospital between 1 January 2018 and 31 June 2023. Using nine supervised learning algorithms, 576 prediction models were developed from 56 available variables. A simplified modeling approach was then applied using the top 12 feature variables from the best-performing model. The primary metric for assessing predictive performance was the area under the ROC curve (AUC). In addition, the training data used to construct the best model were used to externally validate several existing assessment models: the Padua, Caprini, Khorana, and COMPASS-CAT scores.

Results A total of 3,742 GC/EC patient cases were collected after excluding duplicate visit records. Of these, 861 (23.0%) patients were included in the study, of whom 124 (14.4%) developed VTE. The top five models by AUC for full-variable modeling were GBoost (0.9646), Logistic Regression (0.9443), AdaBoost (0.9382), CatBoost (0.9354), and XGBoost (0.8097). For simplified modeling, the top five were Simp-CatBoost (0.8811), Simp-GBoost (0.8771), Simp-Random Forest (0.8736), Simp-AdaBoost (0.8263), and Simp-Logistic Regression (0.8090). After weighing predictive performance against practicality, the Simp-GBoost model was selected as the best model for this study. External validation of the Padua, Caprini, Khorana, and COMPASS-CAT scores on the training set of the Simp-GBoost model yielded AUCs of 0.4367, 0.2900, 0.5000, and 0.3633, respectively.

Conclusion This study analyzed the risk factors for VTE in GC/EC patients and constructed a well-performing VTE risk prediction model capable of accurately identifying a patient's level of VTE risk. Four existing VTE risk scoring systems were externally validated on the dataset of this study. The results showed that the model established here had greater clinical utility for patients with GC/EC than these scoring systems. The Simp-GBoost model can provide intelligent assistance in the early clinical assessment of VTE risk in these patients.
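As a reading aid for the AUC values reported above, the following is a minimal sketch of how an AUC can be computed from outcome labels and risk scores, using the pairwise (Mann-Whitney) definition. It is not the authors' code; the function name `roc_auc` and the toy data are hypothetical and are not drawn from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half). Hypothetical helper for illustration."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 = VTE, 0 = no VTE; scores are predicted risks.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.2, 0.6, 0.7, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 5 of 6 pairs ranked correctly
```

Under this definition, a score that assigns the same value to every patient gives an AUC of exactly 0.5, and a value below 0.5 means the score ranks patients with the outcome lower than those without it more often than not.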
