Development and economic assessment of machine learning models to predict glycosylated hemoglobin in type 2 diabetes
Background: Glycosylated hemoglobin (HbA1c) is recommended for diagnosing and monitoring type 2 diabetes (T2D). However, in real-world practice, HbA1c is monitored less often than the guidelines recommend. Machine learning models that identify patients with T2D who have poor glycemic control could optimize management and decrease medical service costs. Methods: This study was carried out on patients with T2D who were examined for HbA1c at the Sichuan Provincial People’s Hospital from April 2018 to December 2019. Characteristics were extracted from interviews and electronic medical records. Two versions of the data, one excluding and one including fasting blood glucose (FBG), were each randomly divided into a training dataset and a test dataset at a ratio of 8:2 after data pre-processing. Four imputation methods, four feature screening methods, and six machine learning algorithms were used to optimize the data and develop models. Models were compared on the basis of predictive performance metrics, especially the model benefit (MB, a confusion matrix combined with the economic burden associated with therapeutic inertia). The contributions of features were interpreted using SHapley Additive exPlanation (SHAP). Finally, we validated the sample size using the best model. Results: The study included 980 patients with T2D, of whom 513 (52.3%) were defined as positive (needing an HbA1c test). Models trained on the data that included FBG showed better predictive performance than models trained on the data that excluded it. The best model used modified random forest as the imputation method, ElasticNet as the feature screening method, and LightGBM as the algorithm. The MB, AUC, and AUPRC of the best model, among a total of 192 trained models, were 43475.750 (¥), 0.972, 0.944, and 0.974, respectively.
FBG values, previous HbA1c values, having a reasonable diet, health status scores, the type of metformin manufacturer, the measurement interval, EQ-5D scores, occupational status, and age were the most important contributors to the prediction model. Conclusion: MB could serve as an indicator for evaluating model predictive performance. The proposed model performed well in identifying patients with T2D who need to undergo the HbA1c test and could help improve individualized T2D management.
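
As a minimal illustration of two points in the Methods, the Python sketch below (i) enumerates the pipeline combinations, showing that 2 datasets × 4 imputation methods × 4 screening methods × 6 algorithms gives the 192 trained models, and (ii) computes a model-benefit-style score by weighting each cell of the confusion matrix with a monetary value. The abstract does not give the MB formula or the cost figures, so the function `model_benefit`, the per-cell values in `HYPOTHETICAL_VALUES`, and all method labels other than modified random forest, ElasticNet, and LightGBM are assumptions made purely for illustration.

```python
from itertools import product

# Placeholder labels: the abstract names only the best choice in each family,
# so every other entry below is a hypothetical stand-in.
DATASETS = ["without_FBG", "with_FBG"]
IMPUTERS = ["modified_random_forest", "imputer_2", "imputer_3", "imputer_4"]
SCREENERS = ["ElasticNet", "screener_2", "screener_3", "screener_4"]
ALGORITHMS = ["LightGBM", "algo_2", "algo_3", "algo_4", "algo_5", "algo_6"]


def pipeline_grid():
    """Return every (dataset, imputer, screener, algorithm) combination."""
    return list(product(DATASETS, IMPUTERS, SCREENERS, ALGORITHMS))


def model_benefit(y_true, y_pred, value_per_cell):
    """Sum the confusion-matrix counts weighted by a monetary value per cell.

    Assumed form only: the paper's actual MB definition may differ.
    Labels are 1 (HbA1c test needed) or 0 (not needed).
    """
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["TP"] += 1
        elif truth == 0 and pred == 1:
            counts["FP"] += 1
        elif truth == 0 and pred == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return sum(counts[cell] * value_per_cell[cell] for cell in counts)


# Hypothetical values in yuan per patient; not taken from the study.
HYPOTHETICAL_VALUES = {"TP": 100.0, "FP": -20.0, "TN": 0.0, "FN": -150.0}

if __name__ == "__main__":
    print("pipelines:", len(pipeline_grid()))
    print("MB:", model_benefit([1, 1, 0, 0], [1, 0, 1, 0], HYPOTHETICAL_VALUES))
```

Under this assumed form, a missed positive (a patient who needs the test but is not flagged) is penalized more heavily than a false alarm, which is one way a cost-weighted score can rank models differently from AUC or AUPRC.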