315

GASTROENTEROLOGY SPECIFIC AI MODEL OUTPERFORMS ATTENDING PHYSICIAN CLINICAL NOTES IN A REAL-WORLD DATA EVALUATION

Date
May 19, 2024

Introduction: Artificial intelligence (AI) large language models (LLMs) show promise in medicine; however, general-purpose AI models underperform on clinical tasks. Recognizing this potential, our team developed a specialty-specific, multi-task clinical LLM: GastroGPT (Fig 1). We demonstrated the superiority of the platform over general-purpose LLMs in a simulated environment and now seek to compare GastroGPT’s note-taking abilities with those of attending physicians using real-world clinical data.

Materials and Methods: GastroGPT was evaluated on 3,530 selected gastroenterology-focused intensive care admissions in the Medical Information Mart for Intensive Care III (MIMIC-III) database, which includes de-identified, comprehensive clinical patient data. Selected attending physician notes were assessed across seven domains mirroring clinical flow and were used to select cases representing the gastroenterology subspecialties. A novel guideline-based, expert-derived, weighted objective rubric called the Clinical Language Model Evaluation Rubric (CLEAR) was used to assess GastroGPT and physician performance on key clinical tasks, including assessment, diagnostic workup, treatment planning, follow-up, multidisciplinary care, history gathering, and patient education. CLEAR incorporates subtasks and essential skills under each task to enable standardized evaluation. In total, CLEAR encompasses 57 benchmarked and weighted subtasks across the clinical tasks. Overall weighted performance was the primary outcome; secondary outcomes were individual task performance and consistency across case complexity. Multivariable regression was used to identify predictors of score.
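
The abstract does not give the CLEAR weights or the aggregation formula. As a minimal sketch of how a weighted rubric of this kind could be rolled up, the Python below assumes each subtask has a relative weight and a 0 to 10 score, each task score is the weight-normalized mean of its subtasks, and the overall score is the weight-normalized mean of the task scores. The names `Subtask`, `task_score`, and `overall_score`, along with every weight and score shown, are hypothetical illustrations and not values from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    name: str
    weight: float  # hypothetical relative importance within its task
    score: float   # hypothetical rater score, assumed 0-10 scale


def task_score(subtasks):
    """Weight-normalized mean of the subtask scores for one clinical task."""
    total_weight = sum(s.weight for s in subtasks)
    if total_weight <= 0:
        raise ValueError("subtask weights must sum to a positive value")
    return sum(s.weight * s.score for s in subtasks) / total_weight


def overall_score(tasks):
    """Weight-normalized mean of task scores.

    `tasks` maps a task name to (task_weight, list_of_subtasks).
    """
    total_weight = sum(weight for weight, _ in tasks.values())
    if total_weight <= 0:
        raise ValueError("task weights must sum to a positive value")
    weighted_sum = sum(weight * task_score(subs) for weight, subs in tasks.values())
    return weighted_sum / total_weight


# Illustrative note with two tasks; all numbers are made up for the example.
example_note = {
    "assessment": (2.0, [
        Subtask("summarizes presentation", 1.0, 8.0),
        Subtask("states working diagnosis", 3.0, 9.0),
    ]),
    "diagnostic workup": (1.0, [
        Subtask("orders indicated tests", 1.0, 6.0),
    ]),
}

if __name__ == "__main__":
    print(f"assessment: {task_score(example_note['assessment'][1]):.2f}")
    print(f"overall:    {overall_score(example_note):.2f}")
```

Normalizing by the sum of weights keeps every task score and the overall score on the same scale as the subtask scores, so a full rubric with more subtasks would remain comparable to the example.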

Results: GastroGPT achieved higher note scores than attending physicians for gastroenterology-focused cases (8.1 ± 0.6 vs 6.5 ± 1.4, p<0.001). Across all clinical tasks in the notes, GastroGPT showed superior performance to attending physicians (Fig 2): 1) assessment and summary (8.5 ± 0.3 vs 7.18 ± 0.59), 2) diagnostic workup (8.5 ± 0.4 vs 7.6 ± 0.3, p<0.001), 3) treatment planning and management (7.6 ± 0.4 vs 6.5 ± 0.4, p<0.001), 4) follow-up and 5) multidisciplinary care (8.5 ± 0.3 vs 6.7 ± 0.6, p<0.001). For 6) additional history and 7) patient education, GastroGPT was compared only with ChatGPT. GastroGPT was superior to ChatGPT4 in all cases (p<0.001), and ChatGPT4 scored lower than attending physicians (5.2 ± 2.1 vs 6.5 ± 1.4; p<0.05). In multivariable analysis, GastroGPT was a predictor of higher scores after adjusting for other clinical factors. Subgroup analysis demonstrated consistent GastroGPT performance across levels of case complexity.

Conclusion: The gastroenterology-specific AI model GastroGPT achieved superior performance to attending physician note-taking across all tasks, while the general-purpose AI model, ChatGPT4, was inferior to both GastroGPT and physician notes.
Figure 1. Diagram illustrating the design and working principles of the gastroenterology-specific artificial intelligence large language model, GastroGPT.

Figure 2. Bar chart comparing the evaluation scores of GastroGPT, attending physician and ChatGPT.
* p < 0.05 vs ChatGPT
** p < 0.05 vs Attending Physician and ChatGPT

