Background: The genetic contribution to inflammatory bowel disease (IBD) spans monogenic and polygenic forms. Monogenic variants with moderate to high penetrance are enriched in individuals with very early onset IBD (VEO-IBD), who present before 6 years of age. To date, 102 genes have been reported to cause monogenic IBD; these genes affect essential pathways in intestinal homeostasis and immunity, including epithelial barrier function, phagocyte activity, immunometabolism, T and B cell development, and cytokine signaling. In this study, we systematically investigated genotype-phenotype associations for predicted gain-of-function (GOF) and loss-of-function (LOF) variants of monogenic IBD genes in the general population using a phenome-wide association study (PheWAS) approach.
Methods: Predicted GOF and LOF variants were extracted from 102 genes reported to cause monogenic IBD using the LoGoFunc and LOFTEE methods. Variant-level PheWAS was performed using whole-exome sequencing (WES) cohorts from BioMe Regeneron (n = 29,476) for discovery and BioMe Sema4 (n = 14,984) and UK Biobank (n = 180,501) for validation. Ancestry-specific analyses (European, African, Latino) were performed for the BioMe Biobank cohorts. ICD-9 and ICD-10 diagnoses were mapped to 1,856 phecodes, and variant-level tests were performed using Firth’s logistic regression.
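As an illustration of the variant-level test, the following is a minimal sketch of Firth's penalized logistic regression for a single variant and a single phecode, written in plain Python. It assumes only an intercept and a binary carrier indicator, with no covariates; the study's actual model specification is not given here, and the function name and all counts are hypothetical. It iterates the modified score equations, in which each residual is shifted by h(0.5 - p), where h is the hat-matrix diagonal.

```python
import math

def firth_logistic_2x2(cases_noncarrier, n_noncarrier,
                       cases_carrier, n_carrier,
                       max_iter=100, tol=1e-10):
    """Firth-penalized logistic regression of case status on carrier status.

    Hypothetical helper: intercept plus one binary predictor, no covariates.
    Returns (intercept, log odds ratio for carriers vs. non-carriers).
    """
    # Grouped data: (carrier indicator, group size, number of cases).
    groups = [(0.0, n_noncarrier, cases_noncarrier),
              (1.0, n_carrier, cases_carrier)]
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Fisher information matrix X^T W X (2x2, symmetric).
        i00 = i01 = i11 = 0.0
        stats = []
        for g, n, y in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))
            w = p * (1.0 - p)
            i00 += n * w
            i01 += n * w * g
            i11 += n * w * g * g
            stats.append((g, n, y, p, w))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # inverse information

        # Firth-modified score: sum over individuals of (y - p + h * (0.5 - p)) * x.
        u0 = u1 = 0.0
        for g, n, y, p, w in stats:
            h = w * (v00 + 2.0 * g * v01 + g * g * v11)  # hat diagonal per individual
            r = y - n * p + n * h * (0.5 - p)
            u0 += r
            u1 += r * g

        # Fisher-scoring update.
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical counts: 50 cases among 1,000 non-carriers, 0 cases among 20 carriers.
intercept, log_or = firth_logistic_2x2(50, 1000, 0, 20)
print("Firth odds ratio for carriers:", math.exp(log_or))
```

The penalty keeps the estimate finite even when no carrier has the phenotype, a situation that arises often for rare variants and in which ordinary maximum likelihood does not converge. In this covariate-free case the estimate coincides with adding 0.5 to each cell of the 2x2 table; with covariates, a general implementation would be needed.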
Results: LOF variants in ADA, ADA2, COL7A1, FMNL2, ITGB2, LRBA, SKIV2L, and TRIM22 and GOF variants in DOCK2, G6PC3, PIK3CD, and SAMD9 were significantly associated with increased risk of IBD-related phecodes in the Regeneron cohort and replicated in at least one validation cohort. These variants were also associated with phenotypes across multiple organ systems. rs61732239 in ADA (LOF) was associated with increased risk of diffuse diseases of connective tissue and infections in the Regeneron and Sema4 cohorts. rs115986203 in ADA2 (LOF) was associated with gastritis and duodenitis, bacterial infections, and atherosclerosis in the Regeneron and Sema4 cohorts. COL7A1 LOF variants were associated with increased risk of malignancies as well as autoimmune, musculoskeletal, and infection phecodes in the BioMe Biobank cohorts and UK Biobank. rs72719663 in LRBA (LOF) was associated with increased risk of adrenal insufficiency, osteitis deformans, and atrial fibrillation in the Regeneron and Sema4 cohorts. rs150969388 in DOCK2 (GOF) was associated with chronic liver disease, malignancy, and arrhythmia in the Regeneron cohort and UK Biobank.
Conclusions: We observed associations of variants in monogenic IBD genes with IBD, gastrointestinal conditions, infections, autoimmunity, and malignancy. Increased risk of atherosclerosis and arrhythmias was found for multiple variants, suggesting pleiotropic effects that extend to cardiovascular disease. BioMe ancestry-specific cohorts revealed multiple significant associations, highlighting the importance of including diverse cohorts in genomic studies.

Variant-level PheWAS results: Selected significant variant-level findings (p < 0.05) for a subset of VEO-IBD variants in the Regeneron BioMe Biobank cohort (n = 29,476) are shown in the correlation plot. Findings were replicated in at least one validation cohort (Sema4 BioMe, UK Biobank, or BioMe ancestry-specific cohorts). Point size corresponds to the p-value (-log10(p)) in Regeneron BioMe, and color represents the direction of association, with red indicating increased risk and blue indicating decreased risk.