314

VALIDATION OF GPT-4 FOR CLINICAL EVENT CLASSIFICATION: A COMPARATIVE ANALYSIS WITH ICD CODES AND HUMAN REVIEWERS

Date
May 19, 2024

BACKGROUND: Effective clinical event classification is essential for clinical research and quality improvement. The validation of artificial intelligence (AI) models such as Generative Pre-trained Transformer 4 (GPT-4) for this task, and their comparison with conventional methods, remain unexplored.

METHODS: We evaluated the performance of GPT-4 in classifying gastrointestinal (GI) bleeding episodes from 200 medical discharge summaries and compared the results with those of human reviewers and an International Classification of Diseases (ICD) code-based system. The analysis evaluated accuracy, sensitivity, and specificity against a ground truth determined by three independent physician reviewers.
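The scoring step described in the Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each summary receives a binary GI-bleeding label (1 = bleed, 0 = none), that the three physicians' labels are combined by majority vote (the abstract does not say how the ground truth was reached), and it uses hypothetical toy labels rather than study data. The helper names `majority_vote`, `confusion_counts`, and `metrics` are illustrative.

```python
def majority_vote(reviewer_labels):
    """Combine per-summary binary labels from several reviewers.

    Assumption (not stated in the abstract): ground truth = majority vote.
    reviewer_labels is a list of per-reviewer label lists.
    """
    n_reviewers = len(reviewer_labels)
    return [int(2 * sum(votes) > n_reviewers) for votes in zip(*reviewer_labels)]


def confusion_counts(predicted, truth):
    """Return (TP, FP, FN, TN) for binary predictions against ground truth."""
    pairs = list(zip(predicted, truth))
    tp = sum(p == 1 and t == 1 for p, t in pairs)
    fp = sum(p == 1 and t == 0 for p, t in pairs)
    fn = sum(p == 0 and t == 1 for p, t in pairs)
    tn = sum(p == 0 and t == 0 for p, t in pairs)
    return tp, fp, fn, tn


def metrics(predicted, truth):
    tp, fp, fn, tn = confusion_counts(predicted, truth)
    return {
        "accuracy": (tp + tn) / len(truth),
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
    }


# Hypothetical toy data (NOT study data): 8 summaries, 3 physician reviewers.
physicians = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1, 0, 0, 0],
]
truth = majority_vote(physicians)        # [1, 1, 0, 0, 1, 0, 1, 0]
gpt4_labels = [1, 1, 0, 1, 1, 0, 1, 0]   # hypothetical model output
print(metrics(gpt4_labels, truth))
# → {'accuracy': 0.875, 'sensitivity': 1.0, 'specificity': 0.75}
```

The same `metrics` call would apply unchanged to ICD-derived labels or to an individual reviewer's labels, which is what makes the comparison across methods like-for-like.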

RESULTS: GPT-4 exhibited an accuracy of 94.4% in identifying GI bleeding occurrences, outperforming ICD codes (accuracy 63.5%, P<0.001). GPT-4’s accuracy was lower than that of Reviewer 1 (98.5%, P<0.001) and statistically similar to that of Reviewer 2 (90.8%, P=0.170). For location classification, GPT-4 achieved accuracies of 81.7% and 83.5% for confirmed and probable GI bleeding locations, respectively, figures that were either slightly lower than or comparable to those of the human reviewers. GPT-4 was also highly efficient, analyzing the dataset in 12.7 minutes at a cost of $21.20 USD, whereas each human reviewer required 8-9 hours.
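The efficiency figures can be put on a per-summary basis with simple arithmetic. The sketch below uses only the numbers reported above and assumes that the 8-9 hours per reviewer refer to the same 200-summary dataset; it ignores prompt-design and setup time, which the abstract does not report.

```python
# Figures from the Results paragraph; per-summary values are derived arithmetic.
N_SUMMARIES = 200
GPT4_MINUTES = 12.7
GPT4_COST_USD = 21.2
HUMAN_HOURS_RANGE = (8, 9)  # per reviewer, assumed to cover the same 200 summaries

seconds_per_summary = GPT4_MINUTES * 60 / N_SUMMARIES
cost_per_summary = GPT4_COST_USD / N_SUMMARIES
# Ratio of one reviewer's total time to GPT-4's total time.
speedups = [hours * 60 / GPT4_MINUTES for hours in HUMAN_HOURS_RANGE]

print(f"GPT-4: {seconds_per_summary:.2f} s and ${cost_per_summary:.3f} per summary")
print(f"Speed-up vs one reviewer: {speedups[0]:.1f}x to {speedups[1]:.1f}x")
# → GPT-4: 3.81 s and $0.106 per summary
# → Speed-up vs one reviewer: 37.8x to 42.5x
```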

CONCLUSION: Our study indicates that GPT-4 offers a reliable, cost-efficient, and fast alternative to current clinical event classification methods, outperforming the conventional ICD coding system and performing comparably to individual expert human reviewers. Its implementation could facilitate more accurate and granular clinical research and quality audits. However, GPT-4’s current limitations, such as occasional hallucinations and difficulty processing nuanced clinical scenarios, warrant further investigation and model refinement. Future research should explore the scalability, privacy, and ethical implications of high-performance AI models in clinical data processing.

Figure. Confusion Matrix for GPT

Table. Comparison of Performance between GPT, ICD, Reviewer 1, and Reviewer 2 for Gastrointestinal Bleeding Detection and Localization
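Because GPT-4, the ICD codes, and each reviewer classified the same discharge summaries, their accuracies are paired. The abstract does not state which statistical test produced its P values; one common choice for comparing two classifiers on the same cases is McNemar's test, sketched here in its exact (binomial) form. The function name `mcnemar_exact` and the discordant counts in the usage line are hypothetical and not taken from the study.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value for two classifiers on the same cases.

    b: cases classifier A got right and classifier B got wrong
    c: cases classifier A got wrong and classifier B got right
    Cases both got right (or both got wrong) do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant cases: no evidence of a difference
    k = min(b, c)
    # Under H0, each discordant case favours A or B with probability 0.5,
    # so the smaller discordant count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts (not study data).
print(mcnemar_exact(b=15, c=5))  # about 0.041
```

Only the discordant cases carry information about which method is more accurate, which is why a paired test like this is usually preferred over comparing two independent proportions.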



Related Products

INTER-HOSPITAL VARIABILITY AND TRENDS IN CHOLECYSTECTOMY RATES FOR BILIARY COLIC: A NATIONAL PERSPECTIVE FROM 2012 TO 2019
BACKGROUND: Cholecystectomy (CCY) for biliary colic (BC) is subject to varied interpretations of surgical indications, thus influencing decision-making processes. This study aims to elucidate the longitudinal trends and inter-hospital variability in CCY rates for BC hospitalizations in the US…
OPTIMIZING DIAGNOSTIC PRECISION: PATIENT SELECTION AND DIAGNOSTIC ENDOSCOPY IN GASTROINTESTINAL GRAFT VERSUS HOST DISEASE POST ALLOGENEIC BONE MARROW TRANSPLANTATION
Graft versus host disease (GvHD) is the most common complication of bone marrow transplant (BMT) and affects the gastrointestinal system in 50% of cases. Symptoms can be difficult to separate from immunosuppression or chemo side effects…
ENDOSCOPIC GASTRIC REMODELING FOR THE TREATMENT OF DIABESITY: AN ORGAN-ON-A-CHIP MODEL
Background: Gastric sleeve stenosis (GSS) is an increasingly common adverse event following sleeve gastrectomy (SG) and thought to result from progressive rotation and/or scarring of the sleeve. Objective diagnostic criteria for this condition are lacking…