502

INTERPRETABLE DIAGNOSIS AND REPORTING OF GASTRIC LESIONS USING LARGE LANGUAGE MODEL UNDER WHITE LIGHT ENDOSCOPY

Date
May 19, 2024

Background: Gastric cancer is the third-leading cause of cancer-related death globally. Detecting and diagnosing early gastric cancer (EGC) under white light endoscopy (WLE) is critical for improving patient survival. However, previous computer-aided systems that use convolutional neural networks to assist in diagnosing EGC under WLE remain controversial because of the “black-box” nature of the models: they cannot explain the basis for their diagnoses and do not produce a text description of lesions. We aimed to develop an explainable intelligent system, based on a large language model (LLM), to assist in the diagnosis and automatic reporting of gastric lesions.
Method: We retrospectively collected 4758 images from Renmin Hospital of Wuhan University from 2016 to 2021. Images were randomly assigned to a training set and a testing set at a ratio of 9:1. First, we paired images of benign lesions (including xanthelasma, submucosal tumors, benign ulcers, and polyps) with sentences stating the lesion type. Then, experts annotated 1800 images of early gastric neoplasms and of focal non-neoplastic lesions that shared similar feature types with the neoplasms. Each image was annotated with five features (morphology, tone, rough or smooth surface, presence or absence of spontaneous bleeding, and whether the boundary is visible), and the features were described in fluent sentences that were paired with the corresponding image as the training set. We deployed Mini-GPT4 locally and fine-tuned it on these image-sentence pairs to classify benign lesions and to automatically describe the features of early gastric neoplasms in sentences. Finally, we paired the images with the corresponding feature statements to further train the algorithm, and validated its accuracy in determining whether a lesion is an early gastric neoplasm using only the descriptive sentences automatically generated by the model.
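The data preparation described above can be illustrated with a minimal Python sketch. It shows a seeded 9:1 random split and one way of turning the five annotated features into a descriptive sentence. The names LesionFeatures, describe, and split_train_test, the example feature values, and the sentence template are all hypothetical; in the study the sentences were written by experts, and the actual preprocessing is not specified in this abstract.

```python
import random
from dataclasses import dataclass


@dataclass
class LesionFeatures:
    """The five annotated features; field names and values are illustrative."""
    morphology: str             # e.g. "elevated", "flat", "depressed" (assumed vocabulary)
    tone: str                   # e.g. "reddish", "pale" (assumed vocabulary)
    rough_surface: bool         # True = rough, False = smooth
    spontaneous_bleeding: bool
    visible_boundary: bool


def describe(f: LesionFeatures) -> str:
    """Render the five features as one sentence (hypothetical template)."""
    surface = "rough" if f.rough_surface else "smooth"
    bleeding = "with" if f.spontaneous_bleeding else "without"
    boundary = "a visible" if f.visible_boundary else "no visible"
    return (
        f"The lesion is {f.morphology} and {f.tone} in tone, "
        f"with a {surface} surface, {bleeding} spontaneous bleeding, "
        f"and {boundary} boundary."
    )


def split_train_test(items, train_fraction=0.9, seed=0):
    """Shuffle a copy of items with a fixed seed and cut it at train_fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Integer IDs stand in for the 4758 images.
    train_ids, test_ids = split_train_test(range(4758))
    print(len(train_ids), len(test_ids))
    print(describe(LesionFeatures("depressed", "reddish", True, False, True)))
```

Fixing the seed keeps the split reproducible, so the same images stay in the testing set across fine-tuning runs.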
Results: The LLM achieved an accuracy of 88.25% in classifying benign lesions and an average accuracy of 79.12% in describing the features of focal neoplastic and non-neoplastic lesions. Its accuracy in determining whether a lesion is an early gastric neoplasm based on textual information was 77.22%.
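The abstract does not state how the average feature-description accuracy was computed. One plausible reading, sketched below in Python, is an unweighted mean of the per-feature accuracies over the annotated features. This is an assumption, and the function names and toy labels are hypothetical.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)


def mean_feature_accuracy(predicted, expected):
    """Unweighted mean of per-feature accuracies (assumed definition).

    Both arguments map a feature name to a list of labels, one per image.
    Returns the mean and the per-feature breakdown.
    """
    per_feature = {
        name: accuracy(predicted[name], expected[name]) for name in expected
    }
    return sum(per_feature.values()) / len(per_feature), per_feature


if __name__ == "__main__":
    # Toy labels for two features on two images; not data from the study.
    expected = {"tone": ["reddish", "pale"], "visible_boundary": [True, False]}
    predicted = {"tone": ["reddish", "reddish"], "visible_boundary": [True, False]}
    mean, breakdown = mean_feature_accuracy(predicted, expected)
    print(mean, breakdown)
```

Reporting the per-feature breakdown alongside the mean shows which features the model describes least reliably.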
Conclusion: Large language models achieved considerable performance in classifying benign lesions and describing the features of suspicious lesions based on white light endoscopy images, and can potentially determine, to a certain extent, whether suspicious lesions are EGC from feature description sentences. They show an encouraging future for interpretable diagnosis and automated reporting of lesions under white light endoscopy.