Background: Diagnosing gastric intestinal metaplasia (GIM) is highly challenging for trainee endoscopists (TEs) because the mucosal changes are subtle and the lesions are mostly flat and easily overlooked. We hypothesized that incorporating “Deep-GI”, an artificial intelligence (AI) model specifically designed for real-time localization and segmentation of GIM lesions, might improve their performance. This study aimed to compare the performance of TEs in diagnosing GIM with that of AI reading.
Methods: From February 2023 to September 2023, we enrolled patients with suspected GIM undergoing surveillance esophagogastroduodenoscopy (EGD). All EGDs were performed by TEs independently; attending staff were allowed to intervene only when a significant lesion was missed. All endoscopists were blinded to the results of any previous EGD and pathology. GIM detection was mapped to 5 areas in accordance with the Sydney protocol, using white light imaging (WLI) followed sequentially by narrow-band imaging (NBI). The procedures were displayed simultaneously on 2 monitors: one unlabeled for the TE, and the other showing AI labels, viewed separately by a research officer who did not disclose the AI results. The GIM readings from the 5 areas by TEs and by AI were recorded separately. In areas where either AI or the TE identified GIM, a targeted biopsy of that lesion was obtained. In areas where neither AI nor the TE detected GIM, a random biopsy was performed to confirm the absence of GIM. Pathological diagnosis required unanimous confirmation by 2 pathologists. The performances of TE and AI readings were compared using validity measures including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy.
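The validity measures named above follow the standard definitions based on a 2×2 table of readings against pathology. The following is a minimal sketch, not the study's analysis code; the `Counts` class, the `validity_metrics` function, and the example counts are hypothetical and are not taken from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 table of readings versus pathology (hypothetical container)."""
    tp: int  # read as GIM, pathology confirms GIM
    fp: int  # read as GIM, pathology shows no GIM
    fn: int  # read as no GIM, pathology confirms GIM
    tn: int  # read as no GIM, pathology shows no GIM


def validity_metrics(c: Counts) -> dict[str, float]:
    """Return the five validity measures as fractions between 0 and 1.

    Assumes every denominator is non-zero.
    """
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# Illustrative counts only; these are NOT the study's results.
example = Counts(tp=40, fp=10, fn=20, tn=130)
metrics = validity_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

In this sketch each biopsied area is one observation, which mirrors the per-area recording described in the Methods; it does not account for several areas coming from the same patient.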
Results: A total of 375 biopsies were obtained from 75 patients (64±9.5 years; 39% male). A total of 77 lesions (20.5%) from 34 patients were GIM. During WLI inspection, TEs’ GIM diagnosis showed 44% sensitivity, 94% specificity, 64% PPV, 87% NPV, and 83% accuracy. Incorporating AI into WLI significantly increased TEs’ sensitivity from 44% to 64% and accuracy from 83% to 89% (p<0.05 for both). Compared to WLI, NBI alone increased sensitivity to 68% (p=0.004). However, specificity decreased from 94% to 87% (p=0.004), and no improvement in accuracy was observed. Combining AI with NBI increased sensitivity, NPV, and accuracy to 71.4%, 92.6%, and 88.5%, respectively (p<0.05 for all), while specificity and PPV were maintained at 93% and 72.4%, respectively.
Conclusion: The sensitivity and PPV for GIM diagnosis by TEs under WLI were suboptimal. Both NBI and AI improved sensitivity. However, NBI readings decreased specificity while AI readings did not. A combination of NBI and AI further increased sensitivity, NPV, and accuracy while maintaining high specificity and PPV.

Figure: Positions of the trainee endoscopist and the two monitors (one unlabeled, the other AI-labeled) during the study.
Table 2: Comparison of validity scores of GIM diagnosis between trainee endoscopists using white light EGD and NBI with and without AI.