Society: AGA
Background & Aim:
For training deep learning algorithms in medical imaging, application-specific data are often scarce. Deep learning systems are therefore generally pretrained on large, publicly available, labeled data sets of general imagery unrelated to the envisioned application: the algorithm first learns basic features from these widely available data and is then refined on the scarce application-specific images. Pretraining might be more effective if the pretraining images resemble the envisioned application, i.e., domain-specific pretraining. We investigated whether pretraining on general endoscopic imagery improves the performance of five existing AI systems for gastrointestinal (GI) endoscopy, compared with current state-of-the-art pretraining approaches (i.e., supervised pretraining with ImageNet and semi-weakly supervised pretraining with the Billion-scale data set).
Methods:
Our group created an endoscopy-specific data set, GastroNet, for pretraining deep learning systems in endoscopy. GastroNet consists of 5,084,494 endoscopic images retrospectively collected between 2012 and 2020 in seven Dutch hospitals. We created four pretrained models: one using GastroNet and three using ImageNet and/or the Billion-scale data set. The pretraining method was either supervised, self-supervised, or semi-weakly supervised. The pretrained models were subsequently trained for five independent, commonly used applications in GI endoscopy, using their original application-specific data sets. Outcome parameters were: 1) classification and/or localization performance of the five trained applications; 2) change in performance when the amount of available application-specific training data was reduced, to investigate a possible difference in performance drop between the pretrained models. The combinations of pretraining data and method, test sets, and downstream tasks are visualized in Figure 1.
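The abstract does not specify the backbone architecture or framework. As a minimal sketch, assuming a PyTorch/torchvision ResNet-50 backbone and a hypothetical GastroNet checkpoint file (gastronet_resnet50.pth), fine-tuning a pretrained model toward a downstream classification application could look as follows; all names, paths, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: initialize a backbone with one of the pretraining
# strategies, then fine-tune it on application-specific images.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 2  # e.g., a binary lesion-classification downstream task

def build_model(pretraining: str) -> nn.Module:
    """Return a ResNet-50 initialized with the chosen pretraining option."""
    if pretraining == "imagenet":
        # Supervised ImageNet pretraining (a state-of-the-art baseline).
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
    else:
        model = models.resnet50(weights=None)
        if pretraining == "gastronet":
            # Domain-specific weights pretrained on general endoscopic
            # imagery (hypothetical checkpoint path, for illustration only).
            state = torch.load("gastronet_resnet50.pth", map_location="cpu")
            model.load_state_dict(state)
    # Replace the classification head for the downstream application.
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
    return model

model = build_model("gastronet")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
# ...fine-tune on the application-specific endoscopy images as usual.
```

The same fine-tuning loop would be reused across all four pretrained models, so that any performance difference is attributable to the pretraining data and method rather than the downstream training procedure.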
Results:
Overall, the domain-specific pretrained model achieved statistically superior performance for all five GI applications. Detailed results are presented in Table 1. This superiority was also reflected in a smaller performance drop when the amount of application-specific training data was artificially reduced.
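For the data-reduction comparison, one way to artificially shrink the application-specific training set is random subsampling at fixed fractions. The sketch below assumes a generic PyTorch Dataset; the fractions and seed are illustrative, not taken from the study.

```python
# Minimal sketch of an artificial data-reduction experiment.
import torch
from torch.utils.data import Dataset, Subset

def reduce_training_data(dataset: Dataset, fraction: float, seed: int = 0) -> Subset:
    """Return a reproducible random subset with `fraction` of the images."""
    n_keep = max(1, int(len(dataset) * fraction))
    generator = torch.Generator().manual_seed(seed)  # fixed seed for reproducibility
    indices = torch.randperm(len(dataset), generator=generator)[:n_keep]
    return Subset(dataset, indices.tolist())

# e.g., retrain each pretrained model on 50%, 25%, ... of the images and
# compare the resulting performance drop per pretraining strategy.
```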
Conclusion:
Domain-specific pretraining, using unlabeled general endoscopic images, is superior to current state-of-the-art pretraining approaches for developing deep learning algorithms in GI endoscopy. It also allows more effective use of the generally scarce application-specific endoscopy images. These findings might cause a paradigm shift in the development of AI systems in endoscopy.

Figure 1. Flow diagram of different pretraining methods, data sets and downstream tasks.
Table 1. Overview of performance on the five application-specific data sets using the four pretrained models. Cells highlighted in green indicate the highest-scoring pretrained model per application-specific data set.