Aims: to develop and validate the Gastroscopy RAte of Cleanliness Evaluation (GRACE), a new scale for grading mucosal visualization of the upper gastrointestinal tract during esophagogastroduodenoscopy (EGD), for use as a standardized and reliable quality tool.
Methods: a multicenter, international, cross-sectional study was conducted. The GRACE scale evaluates three anatomic areas (esophagus, stomach, and duodenum), each on four grades of cleanliness (from 0, worst, to 3, excellent). The three segment scores are summed to give a total score ranging from 0 to 9 (Figure 1). In the first phase, four expert endoscopists scored 60 selected images twice, two weeks apart; in the second phase, the same 60 images were again scored twice, two weeks apart, by one expert and one non-expert endoscopist from each of 27 endoscopy departments worldwide. For reproducibility assessment and clinical validation, in a third phase, the same expert and non-expert endoscopists applied the scale in real time to consecutive patients undergoing gastroscopy at their own centers, and their evaluations were compared with those of the original experts. Intra-rater reliability was assessed by Fleiss kappa, inter-rater reliability by the intraclass correlation coefficient (ICC), and per-class agreement by kappa for individual categories; for all these measures, almost perfect agreement was defined as a value >0.80.
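The scoring rule described above (three segments, each graded 0 to 3, summed to a total of 0 to 9) can be illustrated with a minimal Python sketch. The function name `grace_total` and its input validation are assumptions made for illustration and are not part of the published scale.

```python
def grace_total(esophagus: int, stomach: int, duodenum: int) -> int:
    """Sum the three per-segment cleanliness grades (0 = worst, 3 = excellent).

    Hypothetical helper: the abstract defines the arithmetic only,
    not any software implementation.
    """
    grades = {"esophagus": esophagus, "stomach": stomach, "duodenum": duodenum}
    for segment, grade in grades.items():
        # Reject booleans, non-integers, and values outside the 0-3 range.
        if isinstance(grade, bool) or not isinstance(grade, int) or not 0 <= grade <= 3:
            raise ValueError(
                f"{segment} grade must be an integer from 0 to 3, got {grade!r}"
            )
    return sum(grades.values())


# Illustrative grades, not study data.
print(grace_total(esophagus=3, stomach=2, duodenum=3))  # 8
```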
Results: in the first phase, the intra-rater Fleiss kappa was 0.89 [95% Confidence Interval (CI) 0.81-0.97], while the inter-rater ICC was 0.91 (95% CI 0.87-0.94) for single measures. In the second phase, 27 centers and 54 endoscopists participated (27 experts, 27 non-experts). The overall intra-rater Fleiss kappa was 0.85 (95% CI 0.83-0.87), with 0.86 (95% CI 0.83-0.86) for experts and 0.88 (95% CI 0.85-0.91) for non-experts, while the inter-rater ICC was 0.92 (95% CI 0.89-0.94) for single measures. The per-class kappa values for scores 0, 1, 2, and 3 were 1.00, 0.94, 0.87, and 0.93 in the first phase, and 0.97, 0.89, 0.85, and 0.92 in the second phase, respectively.
In the third phase, 1008 images were evaluated: the inter-rater ICC was 0.86 (95% CI 0.84-0.87) for single measures.
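For readers unfamiliar with the agreement statistic named in the Methods, the following is a minimal sketch of the standard Fleiss kappa formula, together with the >0.80 threshold used here for almost perfect agreement. The function names and the example matrices are illustrative assumptions; the abstract does not state which software or data layout the authors used, and the numbers below are not study data.

```python
ALMOST_PERFECT = 0.80  # threshold stated in the Methods


def fleiss_kappa(counts):
    """Fleiss kappa from a subjects x categories matrix of rating counts.

    counts[i][j] is the number of ratings that placed subject i in
    category j. Every subject must have the same number of ratings.
    """
    n_subjects = len(counts)
    n_ratings = sum(counts[0])
    if n_ratings < 2:
        raise ValueError("each subject needs at least two ratings")
    if any(sum(row) != n_ratings for row in counts):
        raise ValueError("every subject must have the same number of ratings")

    n_categories = len(counts[0])
    # Share of all ratings that fell in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_ratings)
        for j in range(n_categories)
    ]
    # Observed agreement for each subject, then averaged.
    p_i = [
        (sum(c * c for c in row) - n_ratings) / (n_ratings * (n_ratings - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (p_bar - p_e) / (1 - p_e)


def is_almost_perfect(value: float) -> bool:
    """Apply the >0.80 cut-off used in this study."""
    return value > ALMOST_PERFECT


# Two images, two ratings each, full agreement on every image.
perfect = [[2, 0], [0, 2]]
print(fleiss_kappa(perfect))  # 1.0
```

Each row of the input holds the rating counts for one image, so an image scored twice contributes a row whose entries sum to 2.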
Conclusions: the GRACE scale for esophagogastroduodenoscopy showed almost perfect reproducibility in terms of intra-rater reliability, inter-rater reliability, and per-class agreement, and these results were validated in a worldwide clinical setting. Real-time clinical application of this new cleanliness scale for the upper gastrointestinal tract during EGD could help standardize the evaluation of mucosal visibility, encourage endoscopists to achieve excellent visibility, and reduce the risk of missing lesions.
