Background: The Montreal Classification (MC) captures the heterogeneity of Crohn's disease (CD). Although the MC is an important tool for characterizing CD, ascertaining it for real-world studies requires manual chart review, which is labor-intensive and scales poorly. We therefore aimed to use electronic health records (EHR) to develop automated MC phenotyping and, using this information, to identify incident CD cases and controls.
Methods: We defined CD patients (n=7,624) from the Mount Sinai Health System EHR based on CD diagnosis codes and medications (Figure 1). We then developed an automated phenotyping pipeline to extract MC disease behavior and age at diagnosis from EHR narrative texts, using a rule-based approach built on the spaCy framework. Two reviewers labeled a test set of randomly selected clinical notes (n=150) and radiology reports (n=50) at the sentence level (n=15,390). The algorithms were evaluated for recall, precision, specificity, and F1 score. For each CD patient, the first coded CD diagnosis was taken as the disease index date. We compared the index date with the patient's prior encounter history and the extracted age at diagnosis to filter for incident cases. To confirm the validity of the extracted incident case cohort, the index date, and the control cohort, we conducted manual chart review of 50 randomly selected cases and controls from the resulting cohorts.
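The incident-case filter described above can be illustrated with a minimal sketch. It is not the study's pipeline: a plain regular expression stands in for the spaCy-based rules, and the pattern, the function names, the requirement that the extracted year equal the coded year, and the exclusion of patients without an extracted year are all assumptions made for illustration. Only the 180-day lookback comes from the text.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical rule: a diagnosis cue followed, within the same sentence,
# by "in"/"since" and a four-digit year. The study's real rules are
# spaCy-based and are not given in the abstract.
_YEAR_PATTERN = re.compile(
    r"\b(?:diagnosed|dx)\b[^.]{0,40}?\b(?:in|since)\s+((?:19|20)\d{2})\b",
    re.IGNORECASE,
)


def extract_year_of_diagnosis(sentence: str) -> Optional[int]:
    """Return the year of diagnosis mentioned in a sentence, if any."""
    match = _YEAR_PATTERN.search(sentence)
    return int(match.group(1)) if match else None


def is_incident_case(
    first_encounter: date,
    first_cd_code: date,
    yod_nlp: Optional[int],
    min_lookback_days: int = 180,
) -> bool:
    """Flag a patient as an incident case under the stated assumptions.

    1. The first encounter must precede the first coded CD diagnosis
       (the index date) by at least `min_lookback_days`.
    2. Assumption: the year extracted from text must equal the coded year;
       patients without an extracted year are excluded.
    """
    lookback_days = (first_cd_code - first_encounter).days
    if lookback_days < min_lookback_days:
        return False
    return yod_nlp is not None and yod_nlp == first_cd_code.year
```

In this sketch, a patient first seen on 2014-01-10, first coded with CD on 2015-03-01, and with a note stating a diagnosis in 2015 would be kept, whereas the same patient with a note stating a diagnosis in 2009 would be treated as a prevalent case and dropped.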
Results: For the sentence-level labeled test data, inter-annotator agreement (Cohen's kappa) was 0.84. For MC disease behavior, the algorithm had high note-level recall on clinical notes, with a minimum value of 0.92 for B2, and lower recall on radiology reports (0.64 for B3, 0.71 for B2) (Table 1). Perianal disease was identified with high recall (1.00) and precision (0.86 with clinical notes and 0.80 with radiology reports). For age at diagnosis, note-level recall and precision were 0.81 and 0.88, respectively. Given this performance, we extracted the age at diagnosis from the clinical text of 4,344 CD patients in the Mount Sinai Health System and compared it with the first coded patient encounters and CD diagnosis in the patients’ EHR, which yielded a sub-cohort of 229 incident CD cases (Figure 1). The phenotyping algorithm identified cases and controls with high accuracy (0.96 and 0.95, respectively). In 83% of cases, the automatically identified first date of CD diagnosis was at most 180 days before the reviewed first date of diagnosis.
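For readers less familiar with the reported metrics, the following sketch shows how recall, precision, specificity, F1 score, and Cohen's kappa are conventionally computed from paired labels. The function names and the choice to return 0.0 for an undefined ratio are assumptions of this illustration, and the code is not taken from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; an undefined ratio is reported as 0.0 (a convention)."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(gold: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Recall, precision, specificity, and F1 for one binary label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    tn = sum(1 for g, p in pairs if not g and not p)
    recall = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": _ratio(tn, tn + fp),
        # F1 is the harmonic mean of precision and recall.
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters labeled independently at their own rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used a single identical label throughout.
    return (observed - expected) / (1.0 - expected)
```

Note-level figures such as those in Table 1 would correspond to calling `binary_metrics` once per label (for example B2, B3, or perianal) over the annotated notes, while `cohens_kappa` applies to the two reviewers' sentence-level labels.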
Conclusion: We demonstrate the feasibility of automatically extracting CD diagnosis and MC from clinical texts with good precision using EHR data. This approach can facilitate large-scale data extraction for real-world research and proved useful for identifying newly diagnosed patients with CD.
![Figure 1: NLP-based phenotyping algorithm to identify Crohn’s disease (CD) incident cases.](https://assets.prod.dp.digitellcdn.com/api/services/imgopt/fmt_webp/akamai-opus-nc-public.digitellcdn.com/uploads/ddw/abstracts/4035787_File000001.jpg.webp)
Figure 1: NLP-based phenotyping algorithm to identify Crohn’s disease (CD) incident cases. CD cases were defined based on prescribed medication and coded CD diagnosis on at least three different days. The year of diagnosis [YoD (NLP)] was extracted from clinical text and subsequently compared with the codified year of diagnosis [YoD (EHR)]. Only patients that had their first clinical encounter at least 180 days before the first coded CD diagnosis were considered CD incident cases. Cohort sizes after each filtering step are indicated next to the arrows. MSDW: Mount Sinai Data Warehouse, dx: diagnosis, med: medication.
Table 1: Note-level performance of the rule-based phenotyping algorithms on the newly annotated test datasets. P: perianal, B2: stricturing behavior, B3: penetrating behavior.