541

AUTOMATING CROHN’S DISEASE PHENOTYPING: A NATURAL LANGUAGE PROCESSING APPROACH

Date
May 19, 2024

Background: The Montreal Classification (MC) captures the heterogeneity of Crohn's disease (CD). Although the MC is an important tool for characterizing CD, ascertaining it for real-world studies requires manual chart review, which is labor-intensive and scales poorly. We therefore aimed to use electronic health record (EHR) data to develop automated MC phenotyping and, using this information, to identify incident CD cases and controls.

Methods: We defined CD patients (n=7,624) from the Mount Sinai Health System EHR based on CD diagnosis codes and medications (Figure 1). We then developed a pipeline for automated phenotyping that extracts MC disease behavior and age at diagnosis from EHR narrative texts, using a rule-based approach built on the spaCy framework. Two reviewers labeled a test set of randomly selected clinical notes (n=150) and radiology reports (n=50) at the sentence level (n=15,390). The algorithms were evaluated for recall, precision, specificity, and F1 score. For each CD patient, the first coded CD diagnosis was taken as the disease index date. We compared the index date with the prior patient encounter history and the extracted age at diagnosis to filter for incident cases. To confirm the validity of the extracted incident case cohort, the index date, and the control cohort, we manually reviewed the charts of 50 randomly selected cases and controls from the resulting cohorts.
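The sentence-level rules and their roll-up to the note level can be illustrated with a minimal sketch. It makes several assumptions: it uses plain regular expressions rather than spaCy, the keyword lists and negation cues are hypothetical placeholders and not the study's rules, and the choice to label a perianal sentence as P only (so that a perianal fistula is not counted as B3) is a simplification made here.

```python
import re

# Hypothetical keyword rules; the abstract does not list the study's patterns.
RULES = {
    "B2": re.compile(r"\b(?:strictur\w*|stenos[ie]s)\b", re.IGNORECASE),
    "B3": re.compile(r"\b(?:fistul\w*|penetrating)\b", re.IGNORECASE),
    "P": re.compile(r"\bperi-?anal\b", re.IGNORECASE),
}
# Hypothetical negation cues; a sentence containing one is skipped entirely.
NEGATION = re.compile(r"\b(?:no|without|denies|negative for)\b", re.IGNORECASE)


def split_sentences(note):
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def label_sentence(sentence):
    """Return the set of labels whose rule fires in a non-negated sentence."""
    if NEGATION.search(sentence):
        return set()
    labels = {name for name, pattern in RULES.items() if pattern.search(sentence)}
    # Assumption of this sketch: perianal mentions are labeled P only.
    if "P" in labels:
        labels.discard("B3")
    return labels


def label_note(note):
    """A note carries a label if any of its sentences carries it."""
    labels = set()
    for sentence in split_sentences(note):
        labels |= label_sentence(sentence)
    return labels
```

For a note such as "CT shows a stricture in the terminal ileum. No fistula or abscess. History of perianal fistula.", `label_note` returns the labels B2 and P: the second sentence is dropped by the negation cue and the third is labeled P only.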

Results: For the sentence-level labeled test data, the Cohen's kappa inter-annotator agreement was 0.84. For MC disease behavior, the algorithm had high recall on clinical notes (minimum 0.92, for B2) and lower recall on radiology reports (0.64 for B3, 0.71 for B2) at the note level (Table 1). Perianal disease was identified with high recall (1.00) and precision (0.86 with clinical notes and 0.80 with radiology reports). For age at diagnosis, note-level recall and precision were 0.81 and 0.88, respectively. With the algorithms performing well, we extracted the age at diagnosis from the clinical text of 4,344 CD patients of the Mount Sinai Health System and compared it with the first coded patient encounters and CD diagnosis in the patients' EHR, yielding a sub-cohort of 229 incident CD cases (Figure 1). Our phenotyping algorithm identified cases and controls with high accuracy (0.96 and 0.95, respectively). In 83% of cases, the automatically identified first date of CD diagnosis was at most 180 days before the reviewed first date of diagnosis.
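For readers unfamiliar with the agreement statistic, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it from two parallel label lists. The function name and the example labels are illustrative assumptions, not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

With annotator A giving `["B2", "B2", "none", "none"]` and annotator B giving `["B2", "none", "none", "none"]`, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.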

Conclusion: We demonstrate the feasibility of automatically extracting CD diagnosis and MC from clinical texts in the EHR with good precision. This approach can facilitate large-scale data extraction for real-world research and proved useful for identifying newly diagnosed patients with CD.
Figure 1: NLP-based phenotyping algorithm to identify Crohn’s disease (CD) incident cases. CD cases were defined based on prescribed medication and coded CD diagnosis on at least three different days. The year of diagnosis [YoD (NLP)] was extracted from clinical text and subsequently compared with the codified year of diagnosis [YoD (EHR)]. Only patients that had their first clinical encounter at least 180 days before the first coded CD diagnosis were considered CD incident cases. Cohort sizes after each filtering step are indicated next to the arrows. MSDW: Mount Sinai Data Warehouse, dx: diagnosis, med: medication.
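The incident-case filter described in this caption can be sketched as follows. The 180-day lookback comes from the caption. Everything else is an assumption made for illustration: the `Patient` record and its field names, the rule that the NLP-extracted year must equal the year of the first coded diagnosis (the abstract says only that the two were compared), and the exclusion of patients with no extracted year.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Patient:
    # Hypothetical record layout for this sketch.
    first_encounter: date        # first clinical encounter in the EHR
    first_coded_cd_dx: date      # first coded CD diagnosis (index date)
    yod_nlp: Optional[int]       # year of diagnosis extracted from text


def is_incident_case(patient, min_lookback_days=180, max_year_gap=0):
    """Apply the lookback filter and the YoD (NLP) vs. YoD (EHR) comparison."""
    if patient.yod_nlp is None:
        # Assumption: patients without an extracted year are excluded.
        return False
    lookback = (patient.first_coded_cd_dx - patient.first_encounter).days
    if lookback < min_lookback_days:
        return False
    # Assumption: years must agree within max_year_gap (exact match by default).
    return abs(patient.yod_nlp - patient.first_coded_cd_dx.year) <= max_year_gap
```

The lookback requirement is meant to exclude patients whose first coded diagnosis coincides with their arrival in the health system, since such patients may have been diagnosed elsewhere earlier.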


Table 1: Performance of rule-based phenotyping algorithms on note-level using the newly annotated test datasets. P: perianal, B2: stricturing behavior, B3: penetrating behavior.
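The note-level metrics reported here (recall, precision, specificity, F1 score) all follow from the confusion counts for one label. A minimal sketch is given below. The function name is hypothetical, and returning 0.0 for an undefined ratio is a convention chosen for this example.

```python
def note_level_metrics(gold, predicted):
    """Compute recall, precision, specificity and F1 for one binary label.

    gold and predicted are parallel lists of booleans, one entry per note.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    tn = sum((not g) and (not p) for g, p in zip(gold, predicted))

    def ratio(numerator, denominator):
        # Convention of this sketch: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    specificity = ratio(tn, tn + fp)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f1": f1,
    }
```

For five notes with gold labels `[True, True, True, False, False]` and predictions `[True, True, False, True, False]`, the counts are 2 true positives, 1 false negative, 1 false positive and 1 true negative, giving recall 2/3, precision 2/3, specificity 1/2 and F1 2/3.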


