541

AUTOMATING CROHN’S DISEASE PHENOTYPING: A NATURAL LANGUAGE PROCESSING APPROACH

Date
May 19, 2024

Background: The Montreal Classification (MC) captures the heterogeneity of Crohn's disease (CD). While the MC is an important tool for characterizing CD, ascertaining it for real-world studies requires manual chart review, which is labor-intensive and scales poorly. We therefore aimed to use electronic health records (EHR) to develop automated MC phenotyping and, using this information, to identify incident CD cases and controls.

Methods: We identified CD patients (n=7,624) in the Mount Sinai Health System EHR based on CD diagnosis codes and medications (Figure 1). We then developed an automated phenotyping pipeline that extracts MC disease behavior and age at diagnosis from EHR narrative text, using a rule-based approach built on the spaCy framework. Two reviewers annotated a test set of randomly selected clinical notes (n=150) and radiology reports (n=50) at the sentence level (n=15,390 sentences). The algorithms were evaluated for recall, precision, specificity, and F1 score. For each CD patient, the date of the first coded CD diagnosis was taken as the disease index date. To filter for incident cases, we compared the index date with the patient's prior encounter history and with the extracted age at diagnosis. To validate the resulting incident case cohort, the index dates, and the control cohort, we manually reviewed the charts of 50 randomly selected cases and controls from the resulting cohorts.
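The abstract does not publish its extraction rules. The Python sketch below uses plain regular expressions as a stand-in for the spaCy rule-based matching the authors describe, so it only roughly illustrates what rule-based extraction of diagnosis timing involves. The patterns, the function name `extract_year_of_diagnosis`, and the `birth_year` parameter are all hypothetical. The function returns a year of diagnosis, taken either directly from a stated year or derived from a stated age, which corresponds to the quantity Figure 1 labels YoD (NLP). Unlike a production pipeline, it makes no attempt to handle negation, family history, or uncertain mentions.

```python
import re
from typing import Optional

# Hypothetical patterns (the abstract does not publish its rules).
# Matches "Crohn's", "Crohn’s disease", "Crohns", or the abbreviation "CD".
CD_TERM = r"(?:crohn[’']?s(?:\s+disease)?|cd)"

# e.g. "diagnosed with Crohn's disease in 2015"
YEAR_PATTERN = re.compile(
    rf"diagnosed\s+with\s+{CD_TERM}\s+in\s+((?:19|20)\d{{2}})", re.IGNORECASE
)
# e.g. "diagnosed with CD at age 19" / "diagnosed with Crohn's at the age of 19"
AGE_PATTERN = re.compile(
    rf"diagnosed\s+with\s+{CD_TERM}\s+at\s+(?:the\s+)?age\s+(?:of\s+)?(\d{{1,2}})",
    re.IGNORECASE,
)


def extract_year_of_diagnosis(sentence: str, birth_year: Optional[int] = None) -> Optional[int]:
    """Return the CD diagnosis year mentioned in one sentence, or None.

    A stated year is used directly. A stated age is converted to a year only
    when the patient's birth year is known. Negation and family history are
    not handled in this sketch.
    """
    year_match = YEAR_PATTERN.search(sentence)
    if year_match:
        return int(year_match.group(1))
    age_match = AGE_PATTERN.search(sentence)
    if age_match and birth_year is not None:
        # Approximate: birth year + age can be off by one depending on the birthday.
        return birth_year + int(age_match.group(1))
    return None


print(extract_year_of_diagnosis("She was diagnosed with CD in 2015."))  # 2015
print(extract_year_of_diagnosis("Diagnosed with Crohn's disease at age 19.", birth_year=1990))  # 2009
```

A matcher in a framework such as spaCy works on tokens rather than on raw character patterns. It can also be combined with context rules, for example to skip "no history of" or "mother was diagnosed with". This is the main reason to prefer it over bare regular expressions at scale.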

Results: On the sentence-level labeled test data, inter-annotator agreement (Cohen's kappa) was 0.84. For MC disease behavior, note-level recall was high on clinical notes (minimum 0.92, for B2) but lower on radiology reports (0.64 for B3 and 0.71 for B2) (Table 1). Perianal disease was identified with high recall (1.00) and precision (0.86 on clinical notes and 0.80 on radiology reports). For age at diagnosis, note-level recall and precision were 0.81 and 0.88, respectively. Given this performance, we extracted age at diagnosis from the clinical text of 4,344 CD patients in the Mount Sinai Health System. We then compared it with each patient's first coded encounter and first coded CD diagnosis in the EHR, which yielded a sub-cohort of 229 incident CD cases (Figure 1). On chart review, the phenotyping algorithm identified cases and controls with high accuracy (0.96 and 0.95, respectively). In 83% of cases, the automatically identified first date of CD diagnosis was at most 180 days before the first diagnosis date determined by chart review.
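The following is a minimal sketch of how the reported agreement and performance figures can be computed. It assumes categorical labels per sentence. It also assumes that a note counts as positive for a label when at least one of its sentences is positive, because the abstract does not state its note-level aggregation rule. `cohens_kappa`, `note_level`, and `classification_metrics` are hypothetical helper names. In practice, an established implementation such as scikit-learn's `cohen_kappa_score` would usually be preferred.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label sequences of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def note_level(sentence_labels: dict[str, list[int]]) -> dict[str, int]:
    """Assumed rule: a note is positive if any of its sentences is positive."""
    return {note_id: int(any(labels)) for note_id, labels in sentence_labels.items()}


def classification_metrics(gold: dict[str, int], predicted: dict[str, int]) -> dict[str, float]:
    """Precision, recall, specificity, and F1 over notes keyed by note ID."""
    tp = fp = fn = tn = 0
    for note_id, g in gold.items():
        p = predicted.get(note_id, 0)
        if g and p:
            tp += 1
        elif p:
            fp += 1
        elif g:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "specificity": specificity, "f1": f1}


# Toy example: sentence labels for one label (e.g. B2) in five notes.
gold = note_level({"n1": [0, 1, 0], "n2": [0, 0], "n3": [1], "n4": [0], "n5": [1, 1]})
pred = note_level({"n1": [1, 0, 0], "n2": [0, 1], "n3": [0], "n4": [0], "n5": [0, 1]})
print(classification_metrics(gold, pred))
```

In the toy data, note `n1` counts as a true positive at the note level even though the algorithm flagged a different sentence from the one the annotators marked. This is one reason why note-level and sentence-level scores can differ.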

Conclusion: We demonstrate the feasibility of automatically extracting CD diagnosis and MC from clinical text in EHR data with good precision. This approach can facilitate large-scale data extraction for real-world research and proved useful for identifying newly diagnosed CD patients.
Figure 1: NLP-based phenotyping algorithm to identify incident Crohn’s disease (CD) cases. CD cases were defined by prescribed medication and a coded CD diagnosis on at least three different days. The year of diagnosis [YoD (NLP)] was extracted from clinical text and then compared with the codified year of diagnosis [YoD (EHR)]. Only patients whose first clinical encounter occurred at least 180 days before the first coded CD diagnosis were considered incident CD cases. Cohort sizes after each filtering step are shown next to the arrows. MSDW: Mount Sinai Data Warehouse; dx: diagnosis; med: medication.
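The filtering steps in Figure 1 can be sketched as follows. This is an illustration based on assumptions and is not the authors' implementation. Several choices are illustrative only: the `Patient` record, the requirement that YoD (NLP) match the year of the index date within a hypothetical `YEAR_TOLERANCE`, and the treatment of patients without an extracted year as non-incident. The 3-day and 180-day thresholds are taken from the figure caption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Thresholds from Figure 1; YEAR_TOLERANCE is a hypothetical choice.
MIN_DX_DAYS = 3                      # coded CD diagnosis on >= 3 distinct days
MIN_LOOKBACK = timedelta(days=180)   # first encounter >= 180 days before index date
YEAR_TOLERANCE = 0                   # allowed |YoD (NLP) - YoD (EHR)| in years


@dataclass
class Patient:
    patient_id: str
    cd_dx_dates: list[date]        # dates carrying a coded CD diagnosis
    has_cd_medication: bool        # any CD-related prescription (drug list not specified)
    encounter_dates: list[date]    # all recorded clinical encounters
    yod_nlp: Optional[int] = None  # year of diagnosis extracted from notes


def is_cd_case(p: Patient) -> bool:
    return p.has_cd_medication and len(set(p.cd_dx_dates)) >= MIN_DX_DAYS


def is_incident_case(p: Patient) -> bool:
    if not is_cd_case(p) or p.yod_nlp is None:
        return False
    index_date = min(p.cd_dx_dates)  # first coded CD diagnosis
    first_encounter = min([*p.encounter_dates, *p.cd_dx_dates])
    has_lookback = index_date - first_encounter >= MIN_LOOKBACK
    # YoD (EHR) is taken here as the calendar year of the index date.
    years_agree = abs(p.yod_nlp - index_date.year) <= YEAR_TOLERANCE
    return has_lookback and years_agree


dx = [date(2020, 3, 1), date(2020, 4, 1), date(2020, 6, 1)]
patients = [
    Patient("A", dx, True, [date(2018, 1, 10)], yod_nlp=2020),  # incident
    Patient("B", dx, True, [date(2020, 1, 15)], yod_nlp=2020),  # too little prior history
    Patient("C", dx, True, [date(2015, 5, 2)], yod_nlp=2012),   # long-standing disease
]
print([p.patient_id for p in patients if is_incident_case(p)])  # ['A']
```

Patient C shows why the text-derived year matters. A patient diagnosed years earlier elsewhere can still have a long encounter history before the first coded CD diagnosis, and in that case only the narrative text reveals that the disease is not new.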

Table 1: Note-level performance of the rule-based phenotyping algorithms on the newly annotated test datasets. P: perianal, B2: stricturing behavior, B3: penetrating behavior.

Related Products

CHARACTERIZING EPITOPES AND ISOTYPES OF CROHN'S DISEASE-ASSOCIATED ANTI-GRANULOCYTE MACROPHAGE-COLONY STIMULATING FACTOR ANTIBODIES.
Granulocyte Macrophage-Colony Stimulating Factor (GM-CSF) maintains mononuclear phagocytes (MNP) and supports intestinal immune homeostasis. Anti-GM-CSF autoantibodies (aGMAbs) are found in approx. 25% of all Crohn’s disease (CD) patients…
DECIPHERING THE POTENTIAL CRITICAL LINK BETWEEN VIRUSES AND GM-CSF AUTOANTIBODIES IN PREDICTING CROHN'S DISEASE COMPLICATIONS
Anti-granulocyte/macrophage-colony stimulating factor autoantibody (GM-CSF AuAB) has been identified before the onset of Crohn’s Disease (CD) and is associated with an increase in complicated disease. However, its connection to the gut microbiome, a key factor in CD progression, remains unclear…
IBD Diagnosis, Monitoring, and Prediction of Complications
SOCIETY: AGA This session highlights the latest research on novel prediction models in IBD as well as the role of the fecal microbiome and imaging in IBD…
SIMPLE ENDOSCOPIC SCORE FOR CROHN’S DISEASE (SES-CD) ≥ 7 PREDICTS DISEASE PROGRESSION IN PATIENTS WITH MILD CD
Approximately 20-30% of individuals with Crohn's disease (CD) experience a relatively mild disease course without progression. However, there is no widely accepted objective definition of mild CD. We aimed to identify endoscopic severity cut-offs that can be used to define mild non-progressive CD…