313

ENHANCING PATIENT-REPORTED OUTCOMES ASSESSMENT IN INFLAMMATORY BOWEL DISEASE: A COMPARATIVE STUDY OF NATURAL LANGUAGE PROCESSING METHODS AND GPT-4 INTEGRATION

Date
May 19, 2024

Background: Patient-reported outcomes (PROs) are a core measure of disease activity in clinical studies of inflammatory bowel disease (IBD). While PROs are commonly captured in a structured form in prospective trials, they are typically captured as free text in electronic health records (EHR), significantly limiting their utility for research and quality improvement. This contributes to missing data in IBD registries, or the exclusion of unstructured information in favor of analysis-ready data. Computational methods for extracting information from free text have recently undergone dramatic changes, particularly following the release of GPT-4, OpenAI’s latest large language model (LLM).

Aims: To develop, evaluate, and compare natural language processing (NLP) methods for extracting 3 IBD PROs from clinical notes: abdominal pain, diarrhea, and fecal blood.

Methods: We queried a deidentified EHR database at the University of California San Francisco (UCSF) to establish a corpus of IBD clinic notes for model training and evaluation (Table 1). Two physicians independently annotated 1,050 notes for the presence or absence of each PRO using a predefined protocol. Of these, 900 notes (85%) were used to train custom extraction algorithms that combine rule-based approaches (Regex), NLP programs (SciSpacy), and supervised learning models to predict PROs in each note. The remaining notes (15%) were used for internal testing and model selection. The top-performing model was externally tested on notes from Stanford University. As a further comparison, we evaluated the zero-shot performance of GPT-4 (that is, without task-specific training data) on the same UCSF test set.
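The rule-based (Regex) component described above can be illustrated with a minimal sketch. The keyword patterns, the negation cues, and the function name `extract_pros` are hypothetical and chosen only for illustration; the abstract does not list the study's actual rules, and this sketch omits the SciSpacy and supervised-learning stages.

```python
import re

# Hypothetical keyword patterns for the three PROs (not the study's actual rules).
PRO_PATTERNS = {
    "abdominal_pain": re.compile(
        r"\b(abdominal|belly|stomach)\s+(pain|cramp\w*)", re.I
    ),
    "diarrhea": re.compile(
        r"\b(diarrhea|loose\s+stools?|watery\s+stools?)", re.I
    ),
    "fecal_blood": re.compile(
        r"\b(hematochezia|bloody\s+stools?|blood\s+in\s+(the\s+)?stools?|rectal\s+bleeding)",
        re.I,
    ),
}

# Illustrative negation cues: a cue anywhere earlier in the same sentence
# marks the mention as negated. Real clinical text needs finer scoping.
NEGATION = re.compile(r"\b(no|denies|denied|without|negative\s+for)\b[^.;]*$", re.I)


def extract_pros(note: str) -> dict:
    """Return {PRO name: True/False} for whether a non-negated mention exists."""
    sentences = re.split(r"(?<=[.;])\s+", note)
    result = {}
    for name, pattern in PRO_PATTERNS.items():
        present = False
        for sentence in sentences:
            for match in pattern.finditer(sentence):
                preceding = sentence[: match.start()]
                if not NEGATION.search(preceding):
                    present = True
        result[name] = present
    return result


if __name__ == "__main__":
    example = "Patient reports abdominal pain and loose stools. Denies blood in the stool."
    print(extract_pros(example))
```

In a pipeline like the one described, per-note outputs of this kind would more plausibly serve as input features for a supervised classifier than as final predictions, because simple negation windows misfire on sentences such as "No fever but has diarrhea".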

Results: Inter-rater reliability between annotators was >90%. The top-performing UCSF models were XGBoost models, with accuracies of 92% (abdominal pain), 82% (diarrhea), and 80% (fecal blood) on an internal test set (n=50). On external validation at Stanford (n=250), accuracies for all three models fell to 61-62%, and the models identified the absence of PROs better than their presence (Table 2). Zero-shot GPT-4 achieved accuracies of 91% (abdominal pain), 90% (diarrhea), and 88% (fecal blood) on the same UCSF test set.
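The metrics reported above can be computed as in the following sketch. It assumes binary labels (1 = PRO present, 0 = absent) and assumes that inter-rater reliability means simple percent agreement, which the abstract does not specify. The function names and the toy labels are hypothetical and are not study data. Sensitivity and specificity are included because they separate the ability to identify presence from the ability to identify absence.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = PRO present)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # present, predicted present
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # absent, predicted absent
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # absent, predicted present
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # present, predicted absent
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def percent_agreement(rater_a, rater_b):
    """Fraction of notes on which two annotators assigned the same label."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(1 for a, b in zip(rater_a, rater_b) if a == b) / len(rater_a)


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 0, 0, 0, 0, 0, 0, 1]
    print(binary_metrics(truth, preds))
    print(percent_agreement([1, 0, 1, 1], [1, 0, 0, 1]))
```

A model with moderate accuracy but specificity well above sensitivity, as in the toy labels, shows the pattern the external validation describes: absence is recognized more reliably than presence.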

Conclusions: Traditional NLP models achieve high institution-specific accuracy, and our open-source code provides a framework for building such custom tools. However, variation in documentation styles across institutions likely decreases their generalizability. In contrast, GPT-4 performs comparably to or better than our custom NLP models despite no prior training on institutional data. GPT-4 shows resilience across domain shifts, with outstanding performance on a test set that spanned 10 years and included multiple authors, writing styles, note templates, and changes in IBD management guidelines. LLMs like GPT-4 represent an exciting opportunity to leverage unstructured EHR data for future IBD outcomes research.
