Background: Control tissue is essential for research studies; however, endoscopy in children is an invasive procedure performed only when clinically indicated. Therefore, little is known about the true “normal” tissue that is used as control. This study aims to provide an in-depth exploration of “normal” duodenal tissue.
Methods: Pediatric archival duodenal biopsies without histopathologic abnormalities from initial endoscopies at the University of Virginia (UVA) were obtained. Duodenal mRNA-seq data from additional patients with no duodenal gastrointestinal (GI) disease per chart review were also obtained. Biopsy digitization is described in Figure 1A. Clinical metadata characterization, machine learning (ML)-based image analysis, and gene expression profiling are described in Figure 1B-D. Clinical characterization included K-means unsupervised clustering, pathologist review of patches from each cluster, and classification using decision trees. Image analysis consisted of unsupervised clustering of biopsy image patches followed by pathologist review, and an ML model was trained for cell annotation. Analysis of differential gene expression between patients with non-duodenal GI disease and patients without GI inflammation is ongoing. Statistical significance was calculated using Welch’s t-test for continuous variables and the chi-squared test for categorical variables, with α set to 0.05.
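The two significance tests named above can be sketched as follows. This is a minimal, standard-library illustration, not the study's analysis code: the function names `welch_t`, `chi2_independence`, and `chi2_pvalue_df1` are hypothetical, and the input values are made-up toy numbers. It returns the Welch statistic with its Welch-Satterthwaite degrees of freedom, and the Pearson chi-squared statistic without continuity correction.

```python
import math

def welch_t(a, b):
    """Welch's unequal-variance t-test: return (t statistic, degrees of freedom)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    # Unbiased sample variances (divide by n - 1).
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    se2 = var_a / na + var_b / nb  # squared standard error of the mean difference
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / ((var_a / na) ** 2 / (na - 1) + (var_b / nb) ** 2 / (nb - 1))
    return t, df

def chi2_independence(table):
    """Pearson chi-squared test of independence: return (statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_pvalue_df1(stat):
    """p-value for a chi-squared statistic with exactly 1 degree of freedom."""
    # With df = 1 the statistic is a squared standard normal, so P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))

# Toy inputs, invented for illustration only (not study data).
t_stat, t_df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
chi_stat, chi_df = chi2_independence([[10, 20], [20, 10]])
chi_p = chi2_pvalue_df1(chi_stat)
```

Converting the Welch statistic to a p-value requires the t-distribution CDF, which the standard library does not provide; in practice a statistics package would be used, for example `scipy.stats.ttest_ind` with `equal_var=False`.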
Results: A total of 198 pediatric patients with no duodenal abnormalities had initial endoscopies at UVA between 7/27/17 and 2/28/18. Cohort demographics, including a breakdown of those with non-duodenal GI inflammation (n=43) and those with no GI inflammation (n=155), are shown in Table 1. Biopsy digitization produced 399 whole-slide images (WSIs) and 221,897 patches. Unsupervised clustering of clinical metadata yielded 5 clusters, and pathologist review did not appreciate differences in representative patches from each cluster (Figure 1B). Decision tree classification of patients with non-duodenal inflammation versus controls based on clinical metadata was 95% accurate (Figure 1B). Unsupervised clustering of patches revealed 9 clusters, and pathologist review of representative patches did not reveal appreciable patterns (Figure 1C). A pilot of the ML cell annotation pipeline over-predicted epithelial cells, and model retraining is ongoing. mRNA-seq data were available for 68 patients, including 30 patients with non-duodenal GI inflammation (Figure 1D).
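The K-means clustering step used on the clinical metadata can be illustrated with a short sketch of Lloyd's algorithm. This is an assumption-laden toy, not the study's pipeline: the `kmeans` function is hypothetical, the two features (imagined here as visit and endoscopy counts) and all values are invented, and the study's actual feature set, scaling, and choice of cluster count are not described in this abstract.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (labels, centroids). Initial centroids are k distinct input points
    drawn with a seeded random generator.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

# Hypothetical toy metadata: (clinic visits, endoscopies) per patient.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
```

On real clinical metadata, features would normally be standardized before clustering so that no single variable dominates the distance, and the number of clusters would be chosen with a criterion such as the elbow or silhouette method; neither choice is specified in the source.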
Conclusions: This investigation of control pediatric duodenal tissue includes multiple levels of characterization. Our results suggest no appreciable histologic difference in control duodenal tissue between patients with non-duodenal GI disease and patients without any GI tract disease. Furthermore, we found that the numbers of repeat clinical visits and endoscopies were stronger predictors of an inflammatory diagnosis than other clinical metadata. Future analysis will examine tissue transcriptomics.

Figure 1: Overview of analysis workflow of control pediatric duodenal tissue. 1A Slide digitization workflow of duodenal tissue. 1B Clinical metadata characterization, including unsupervised clustering and clinical classification of data. 1C Machine learning-based characterization, including unsupervised clustering of patches and machine learning cell annotation prediction modeling. 1D Gene expression profiling of mRNA-seq data from normal duodenal tissue.
Table 1: Demographics and clinical metadata for the total cohort, and separately for patients with non-duodenal GI tract inflammation and patients with no GI tract inflammation.