315

GASTROENTEROLOGY SPECIFIC AI MODEL OUTPERFORMS ATTENDING PHYSICIAN CLINICAL NOTES IN A REAL-WORLD DATA EVALUATION

Date
May 19, 2024

Introduction: Artificial intelligence (AI) large language models (LLMs) show promise in medicine; however, general-purpose AI models underperform on clinical tasks. Recognizing this potential, our team developed a specialty-specific, multi-task clinical LLM: GastroGPT (Fig 1). We have demonstrated the platform's superiority over general-purpose LLMs in a simulated environment and now seek to compare GastroGPT's note-taking abilities with those of attending physicians using real-world clinical data.

Materials and Methods: GastroGPT was evaluated on 3,530 selected gastroenterology-focused intensive care admissions in the Medical Information Mart for Intensive Care III (MIMIC-III) database, which contains de-identified, comprehensive clinical patient data. Selected attending physician notes were assessed across seven domains mirroring clinical flow and were used to select cases representing the gastroenterology subspecialties. A novel guideline-based, expert-derived, weighted objective rubric, the Clinical Language Model Evaluation Rubric (CLEAR), was used to assess GastroGPT and physician performance on key clinical tasks: assessment, diagnostic workup, treatment planning, follow-up, multidisciplinary care, history gathering, and patient education. CLEAR incorporates subtasks and essential skills under each task to enable standardized evaluation. In total, CLEAR encompasses 57 benchmarked and weighted subtasks across the clinical tasks. The primary outcome was overall weighted performance; secondary outcomes were individual task performance and consistency across case complexity. Multivariable regression was used to identify predictors of score.
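The abstract does not publish the CLEAR weights or its aggregation formula, so the following is only a minimal sketch of how a weighted rubric of this kind could be scored. It assumes a plain weighted mean at both the subtask and task level and a 0 to 10 score scale. The names `Subtask`, `task_score`, and `overall_score`, and all weights and scores, are hypothetical illustrations rather than study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    name: str
    weight: float  # hypothetical relative importance of the subtask
    score: float   # rater score, assumed here to lie on a 0-10 scale


def task_score(subtasks):
    """Weighted mean of the subtask scores within one clinical task."""
    total_weight = sum(s.weight for s in subtasks)
    if total_weight <= 0:
        raise ValueError("subtask weights must sum to a positive value")
    return sum(s.weight * s.score for s in subtasks) / total_weight


def overall_score(tasks, task_weights):
    """Weighted mean of per-task scores across all clinical tasks."""
    total_weight = sum(task_weights[name] for name in tasks)
    if total_weight <= 0:
        raise ValueError("task weights must sum to a positive value")
    weighted_sum = sum(
        task_weights[name] * task_score(subtasks)
        for name, subtasks in tasks.items()
    )
    return weighted_sum / total_weight


# Illustrative values only; these are NOT CLEAR weights or study results.
example_tasks = {
    "assessment": [
        Subtask("summarizes presentation", weight=2.0, score=8.0),
        Subtask("states working diagnosis", weight=1.0, score=5.0),
    ],
    "diagnostic workup": [
        Subtask("orders indicated tests", weight=1.0, score=6.0),
    ],
}
example_task_weights = {"assessment": 1.0, "diagnostic workup": 1.0}

example_overall = overall_score(example_tasks, example_task_weights)
```

With these made-up inputs, the assessment task scores 7.0, the diagnostic workup task scores 6.0, and the overall weighted score is 6.5. The actual rubric may combine its 57 subtasks differently.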

Results: GastroGPT achieved higher note scores than attending physicians for gastroenterology-focused cases (8.1 ± 0.6 vs 6.5 ± 1.4, p<0.001). Across all clinical tasks in the notes, GastroGPT showed superior performance to attending physicians (Fig 2): 1) assessment and summary (8.5 ± 0.3 vs 7.18 ± 0.59), 2) diagnostic workup (8.5 ± 0.4 vs 7.6 ± 0.3, p<0.001), 3) treatment planning and management (7.6 ± 0.4 vs 6.5 ± 0.4, p<0.001), 4) follow-up and 5) multidisciplinary care (8.5 ± 0.3 vs 6.7 ± 0.6, p<0.001). In 6) additional history and 7) patient education, GastroGPT was compared only with ChatGPT. GastroGPT was superior to ChatGPT4 in all cases (p<0.001), and ChatGPT4 in turn scored lower than physicians (5.2 ± 2.1 vs 6.5 ± 1.4, p<0.05). In multivariable analysis, use of GastroGPT was a predictor of higher scores after adjusting for other clinical factors. Subgroup analysis demonstrated consistent GastroGPT performance across levels of case complexity.
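To give a rough sense of the size of the overall difference, the reported means and standard deviations can be converted into a standardized effect size. The sketch below computes Cohen's d using the root mean square of the two standard deviations, which assumes equal group sizes and independent groups. The abstract reports neither an effect size nor the statistical test used, and the same cases may have been scored for both note sources, so this is an illustrative calculation rather than a result of the study. The function name `cohens_d` is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d from summary statistics, assuming equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / pooled_sd


# Overall note scores reported above: GastroGPT 8.1 ± 0.6, physicians 6.5 ± 1.4.
d_overall = cohens_d(8.1, 0.6, 6.5, 1.4)
print(f"Cohen's d (overall note score): {d_overall:.2f}")
```

Under these assumptions d is about 1.49, conventionally read as a large effect. A paired analysis of the same cases would call for a different estimator.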

Conclusion: The gastroenterology-specific AI model GastroGPT outperformed attending physician note-taking across all tasks, while the general-purpose AI model, ChatGPT4, was inferior to both GastroGPT and physician notes.

Figure 1. Diagram illustrating the design and working principles of the gastroenterology-specific artificial intelligence large language model, GastroGPT.

Figure 2. Bar chart comparing the evaluation scores of GastroGPT, attending physician and ChatGPT.
* p < 0.05 vs ChatGPT
** p < 0.05 vs Attending Physician and ChatGPT

