Background:
The risk of bowel surgery in patients with inflammatory bowel disease (IBD) and Clostridioides difficile infection (CDI) can change rapidly over the course of the disease. To address this challenge, we developed a machine learning model that uses electronic health record (EHR) data from 3 months before the initial CDI diagnosis to up to 1 year after it to predict the risk of bowel surgery in these patients.
Methods:
This longitudinal study analyzed EHR data from 2012 to 2021 for adult IBD patients with primary CDI. The model, based on XGBoost, used demographics, comorbidities, care settings, treatment plans, procedures, medications, and lab tests as features, drawn from the period from 3 months before CDI to 1 year after it (or until 30 days before bowel surgery, whichever was earlier). The outcome variable was any bowel surgery within 1 year of initial CDI. The dataset was split into 80% for training and 20% for testing. Model performance was evaluated using accuracy, area under the receiver operating characteristic curve (AuROC), precision, recall, and F1 score. SHAP (SHapley Additive exPlanations) was used to determine feature importance.
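As an illustration of the evaluation metrics named above, the following is a minimal sketch in plain Python, not the study's code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for the example. AuROC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_score: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 at an assumed decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AuROC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AuROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = surgery, 0 = no surgery.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]

metrics = classification_metrics(y_true, y_score)
auc = auroc(y_true, y_score)
print(metrics, auc)
```

The pairwise AuROC is quadratic in the number of cases, which is acceptable for a sketch; a rank-based implementation would be preferable on a cohort of this size.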
Results:
Overall, 2495 patients with a diagnosis of IBD (72.4% ulcerative colitis [UC] and 27.1% Crohn’s disease) and CDI were included; 54.4% were female, the median age was 48 years, the median BMI was 25.3, and 93% were white. Of these, 500 patients underwent bowel surgery within 1 year after incident CDI. The median time to surgery after incident CDI was 92 days (range 1-356 days). We trained an XGBoost classifier on 154 features and achieved an overall accuracy of 86.9%, an AuROC of 0.86, a recall of 89.83%, a precision of 88.1%, and an F1 score of 0.88 (Figure 1).
In the SHAP summary plot, use of metronidazole to treat incident CDI and inpatient status at the time of first CDI pushed the model's output towards a higher likelihood of bowel surgery. Higher platelet counts likewise pushed the output towards a higher likelihood of surgery, as did lower levels of aspartate aminotransferase (AST) and potassium (Figure 2).
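To make the notion of a feature "pushing" the output concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function by enumerating all feature coalitions. It is only an illustration of the quantity that SHAP estimates: the abstract does not state which SHAP explainer was used, and the function `toy_model`, its coefficients, the all-zero baseline, and the input vector are hypothetical and carry no clinical meaning.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: List[float]) -> float:
    """Hypothetical score with one interaction term (features 0 and 2)."""
    return 0.2 * z[0] + 0.1 * z[1] + 0.3 * z[0] * z[2]


x = [1.0, 1.0, 1.0]          # hypothetical patient
baseline = [0.0, 0.0, 0.0]   # hypothetical reference input

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

Positive values correspond to features that raise the output relative to the baseline, and the values sum to the difference between the prediction at `x` and the prediction at the baseline. The interaction term is shared equally between the two features involved.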
Conclusion:
This study demonstrates the efficacy of a machine learning model that uses comprehensive EHR data to predict bowel surgery risk in IBD patients with CDI. The model's potential to capture complex, non-linear interactions among variables highlights the role of ML in clinical decision-making and precision medicine.

Figure 1. Receiver operating characteristic (ROC) curve displaying the diagnostic ability of the predictive model for bowel surgery, with an area under the curve (AUC) of 0.86.
Figure 2. SHAP (SHapley Additive exPlanations) summary plot showing the impact of various features on the model predicting bowel surgery. Features are ranked by importance, with color indicating feature value (high or low) and horizontal position showing the impact on the model prediction.