Introduction: Despite recent advances, the diagnosis and characterization of biliary strictures remain challenging. Artificial intelligence (AI) applied to digital single-operator cholangioscopy (D-SOC) is expected to significantly increase the diagnostic yield for indeterminate biliary strictures, and pilot studies of AI algorithms applied to D-SOC have shown promising results. This multicenter study aimed to validate a convolutional neural network (CNN) model on a large dataset of D-SOC images, providing automatic detection of malignant biliary strictures as well as their morphological characterization.
Methods: Our group conducted an international study including D-SOC exams from three centers in Portugal (Centro Hospitalar Universitário de São João, Porto, Portugal, n=123), Spain (Hospital Universitario Puerta de Hierro Majadahonda, Madrid, Spain, n=18), and the United States of America (New York University Langone Hospital, New York, USA, n=23). Each frame was labelled as normal/benign findings or, if histopathological evidence of biliary malignancy was available, as a malignant lesion. We also evaluated the network's performance in detecting morphologic features, namely tumor vessels, papillary projections, nodules and masses. The image dataset was split into training and validation datasets at a ratio of 90% to 10%, respectively. The performance of the CNN was measured by calculating the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively).
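As an illustration of how the threshold-dependent metrics listed above are conventionally derived from frame-level predictions, the following is a minimal sketch, not the study's actual evaluation code. The class `ConfusionCounts`, the function `diagnostic_metrics`, and the example counts are hypothetical and chosen only for readability. The AUC is not covered here because it requires the network's continuous output scores, not thresholded predictions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Frame-level counts, with 'positive' meaning a malignant label."""
    tp: int  # malignant frames predicted malignant
    fp: int  # benign frames predicted malignant
    tn: int  # benign frames predicted benign
    fn: int  # malignant frames predicted benign


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, PPV, NPV and accuracy as fractions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for illustration only; these are NOT data from this study.
example = ConfusionCounts(tp=90, fp=20, tn=80, fn=10)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity depend only on the classifier, whereas PPV and NPV also depend on the proportion of malignant frames in the evaluated set.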
Results: A total of 103,082 images from 164 D-SOC exams from the three centers were included (53,678 of malignant strictures and 49,404 of benign findings). Figure 1 shows the predictions by the CNN. The model had an overall accuracy of 94.1%, a sensitivity of 93.5%, a specificity of 94.8%, a positive predictive value of 95.1%, a negative predictive value of 93.1% and an AUC of 0.96.
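The reported predictive values and accuracy can be related to the reported sensitivity, specificity and class balance through Bayes' rule. The sketch below is a reader-side consistency check and not part of the study's methods. It assumes, purely for illustration, that the validation frames share the malignant-to-benign proportion of the full dataset, which the abstract does not state.

```python
# Figures reported in the Results.
n_malignant = 53_678
n_benign = 49_404
sensitivity = 0.935
specificity = 0.948

# ASSUMPTION: the validation set has the same class balance as the full dataset.
prevalence = n_malignant / (n_malignant + n_benign)

# Expected fraction of all frames falling in each confusion-matrix cell.
tp = prevalence * sensitivity
fn = prevalence * (1 - sensitivity)
tn = (1 - prevalence) * specificity
fp = (1 - prevalence) * (1 - specificity)

ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = tp + tn

print(f"PPV {ppv:.1%}, NPV {npv:.1%}, accuracy {accuracy:.1%}")
```

Under that assumption, the derived values round to the PPV, NPV and accuracy reported above, so the five figures are mutually consistent.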
Our group evaluated the performance of the CNN for the detection of morphological characteristics associated with malignancy, including papillary projections, nodules, masses and tumor vessels. The accuracy for the detection of these features was 90.8%, 93.6%, 93.2% and 78.1%, respectively. The AUC for the detection of each morphologic feature is shown in Figure 2.
Discussion: The potential of deep learning algorithms to impact the care of patients with suspected biliary malignancy is vast. The authors have expanded this line of research with a multicenter study including patients from two continents, thus increasing the variability of the dataset. This study assessed the performance of multiple CNNs for detecting and differentiating malignant and benign biliary disorders using a large pool of D-SOC images. These preliminary results provide promising groundwork for further exploration of AI in this specific patient subset.

Figure 1: Output obtained during the training and development of the convolutional neural network. The bars represent the probability estimated by the network. The finding with the highest probability was output as the predicted classification. B, benign biliary findings; M, malignant biliary stricture.
Figure 2: Receiver operating characteristic analysis of the network's performance in detecting morphological features of malignant biliary strictures. ROC, receiver operating characteristic.