Background:
It is well known that HLA-DQ2 or HLA-DQ8 is required for developing celiac disease (CeD), yet only 3% of DQ2/8 carriers have CeD. Therefore, a systematic approach is needed to improve the clinical utility of genetic testing and to provide better genetic risk stratification. The All of Us (AOU) Research Program is an NIH-supported project that aims to enroll a diverse group of participants in the United States and to conduct both genetic and environmental studies to advance population health. The aim of this study was to leverage the whole genome sequencing (WGS) data in AOU to identify additional genetic risk factors for CeD.
Methods:
We searched the AOU database for healthy controls and CeD cases; it contained WGS data for 196,544 controls and 1,360 CeD patients (AOU-CeD) (Figure 1A). After propensity matching by race, age, ethnicity, and sex at birth, a total of 1,356 CeD patients and 3,496 controls were included in our analysis; demographics are shown in Figure 1B. We then conducted a genome-wide association study (GWAS) using a logistic regression model for 32 million genetic variants with a minor allele frequency (MAF) > 1%. In addition, we imputed HLA genotypes for seven HLA genes (A, B, C, DPB1, DQA1, DQB1, and DRB1) using HLA Genotype Imputation with Attribute Bagging (HIBAG).
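The per-variant test behind a logistic-regression GWAS can be illustrated with a minimal Python sketch, assuming an additive model in which each genotype is coded as 0, 1, or 2 copies of the alternate allele. The function names, the Newton-Raphson fit, and the optional covariate matrix are illustrative assumptions: the abstract does not state which covariates were adjusted for or which software was used, and genome-scale analyses normally rely on dedicated tools (e.g., PLINK or Hail) rather than code like this. The MAF helper mirrors the > 1% filter described above.

```python
import math

import numpy as np


def minor_allele_frequency(dosages):
    """MAF from 0/1/2 alternate-allele dosages (assumes biallelic, no missing calls)."""
    alt_freq = float(np.mean(dosages)) / 2.0
    return min(alt_freq, 1.0 - alt_freq)


def logistic_wald_test(dosages, is_case, covariates=None, max_iter=25, tol=1e-8):
    """Additive logistic model: logit P(case) = b0 + b1 * dosage (+ covariates).

    Fitted by Newton-Raphson (IRLS). Returns (per-allele odds ratio, Wald p-value).
    The covariate matrix is a hypothetical input; the abstract does not list covariates.
    """
    g = np.asarray(dosages, dtype=float)
    y = np.asarray(is_case, dtype=float)
    columns = [np.ones_like(g), g]  # intercept and genotype dosage
    if covariates is not None:
        # Reshape so a single covariate vector also works; each column is one covariate.
        columns.extend(np.asarray(covariates, dtype=float).reshape(len(g), -1).T)
    X = np.column_stack(columns)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        information = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information X'WX
        step = np.linalg.solve(information, X.T @ (y - p))  # information^-1 * score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    information = X.T @ (X * (p * (1.0 - p))[:, None])
    se = math.sqrt(np.linalg.inv(information)[1, 1])  # standard error of b1
    z = beta[1] / se
    return math.exp(beta[1]), math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p


# Toy data (hypothetical): 50 non-carriers (10 cases), 50 carriers (30 cases).
dosages = [0] * 50 + [1] * 50
is_case = [1] * 10 + [0] * 40 + [1] * 30 + [0] * 20
if minor_allele_frequency(dosages) > 0.01:  # MAF > 1% filter
    odds_ratio, p_value = logistic_wald_test(dosages, is_case)
    print(f"OR={odds_ratio:.2f}, p={p_value:.2e}")
```

With a single binary predictor and no covariates, the fitted OR equals the cross-product OR of the 2x2 table, here (30 × 40)/(20 × 10) = 6, which is a convenient sanity check for the fitting loop.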
Results:
Our GWAS on AOU-CeD revealed a significant genome-wide association in the MHC region of chromosome 6, with the top SNP in the HLA-DQB1 gene (rs9274474, p=1.14e-71, OR=2.61; Figure 1C). Fine mapping of the MHC locus revealed positive associations for several immune-related genes, including C2 and TNF (Figure 1D). The strongest risk allele by odds ratio was in the SLC45A1 gene (rs187699606, OR=7.95, p=2.8e-6). In addition, several SNPs outside the MHC were associated at lower significance; the genes harboring them have been shown to play essential roles in intestinal homeostasis and immune response. One significant variant (rs553549332, RAD51B, p=6.6e-6, OR=4.51) was located at locus 14q24.1, which was previously reported in CeD. Overall, 28 novel loci were uncovered through this study. HLA typing showed that the DQB1*02:01 (OR=2.81; Table 1) and DQA1*05:01 (OR=2.84) alleles were enriched in the CeD cohort, and that the distribution of HLA-DQ genotypes differed from that of the CeD cohort in the UK Biobank. In addition, the AOU-CeD cohort was significantly enriched for the following alleles: A*01:01 (OR=1.53), B*08:01 (OR=2.41), C*07:01 (OR=1.90), and DRB1*03:01 (OR=2.81; Table 1).
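The odds ratios reported here express how much more common an allele is among cases than among controls. As an illustration of the arithmetic only, a minimal sketch computes a cross-product OR and a Woolf (log-scale) 95% confidence interval from a 2x2 table of counts. The function name and counts are hypothetical, and the abstract does not specify how each reported OR was computed (regression model versus raw counts).

```python
import math


def odds_ratio_with_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.959964):
    """Cross-product odds ratio with a Woolf (log-scale) 95% confidence interval.

    "Exposed" can mean allele counts or allele carriers in cases and controls.
    """
    cells = (case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed)
    if min(cells) <= 0:
        raise ValueError("all cells must be positive; add a continuity correction first")
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the AOU-CeD data.
print(odds_ratio_with_ci(30, 10, 20, 40))
```

Because the standard error grows as any cell count shrinks, a large OR estimated from few carriers comes with a correspondingly wide interval.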
Conclusion:
This analysis of the AOU-CeD cohort revealed multiple novel non-HLA genetic associations and an HLA-DQ genotype distribution in this US cohort that differs from that of the UK Biobank CeD cohort. We also found that HLA-A*01:01, B*08:01, and C*07:01 are risk alleles for CeD. A systematic approach combining all of these risk alleles might improve the clinical value of genetic testing for CeD.

Figure 1: GWAS and fine mapping of CeD using WGS data from “All of Us”.
(A) Workflow of the CeD patient and healthy control selection process. Participants with available survey and WGS data were selected using the cohort builder tool. Propensity matching was conducted using reported race, age, ethnicity, and sex at birth. (B) Basic demographics of the selected CeD patients and healthy controls. (C) GWAS performed using variants with MAF > 0.01. (D) Fine mapping of the MHC region, showing variants that pass a significance threshold of p < 1e-8.
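The propensity matching step in panel A can be sketched as greedy nearest-neighbour matching without replacement. This is a minimal sketch under stated assumptions: each participant's propensity score (the estimated probability of being a CeD case given race, age, ethnicity, and sex at birth) is assumed to have been estimated beforehand, for example with a logistic regression, and the function name, matching ratio, caliper, and greedy ordering are hypothetical choices, since the abstract does not report them.

```python
def greedy_caliper_match(case_scores, control_scores, ratio=2, caliper=0.05):
    """Greedy nearest-neighbour matching on propensity scores, without replacement.

    Cases are processed from highest to lowest score; each takes up to `ratio`
    of the closest still-unused controls whose score differs by <= `caliper`.
    Returns {case index: [control indices]}.
    """
    available = set(range(len(control_scores)))
    matches = {}
    order = sorted(range(len(case_scores)), key=lambda i: -case_scores[i])
    for i in order:
        # (distance, index) pairs sort by distance, breaking ties by control index.
        within = sorted(
            (abs(case_scores[i] - control_scores[j]), j)
            for j in available
            if abs(case_scores[i] - control_scores[j]) <= caliper
        )
        chosen = [j for _, j in within[:ratio]]
        available.difference_update(chosen)  # each control is used at most once
        matches[i] = chosen
    return matches


# Hypothetical propensity scores for 2 cases and 5 controls.
print(greedy_caliper_match([0.8, 0.3], [0.79, 0.31, 0.5, 0.82, 0.1]))
```

Processing high-score cases first is one common heuristic, because those cases usually have the fewest comparable controls; optimal matching algorithms avoid this order dependence at higher computational cost.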

Table 1: HIBAG-based HLA typing on WGS data in AOU-CeD. (Top) The three most common alleles of each of the seven HLA genes in AOU-CeD and healthy controls. P-values were calculated using a chi-square test of independence on the contingency table. (Bottom) HLA-DQ genotype frequencies for the known CeD risk alleles (DQ2.5, DQ2.2, DQ8, DQ7.5). ‘X’ indicates any HLA-DQ allele not known to be a risk factor for CeD. *Counts less than 20 are suppressed under the “All of Us” data dissemination policy.
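For a single allele, a chi-square test of contingency can be run on a 2x2 table (allele carriers versus non-carriers, or allele counts, in cases versus controls). A minimal sketch, assuming that 2x2 layout (the caption does not say whether larger tables were used), computes Pearson's statistic and its 1-degree-of-freedom p-value with the standard library; the function name and counts are hypothetical. SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction by default for 2x2 tables, so a `yates` flag is included to reproduce either convention.

```python
import math


def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Rows: cases, controls. Columns: allele present, allele absent.
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for r in range(2):
        for k in range(2):
            expected = row_totals[r] * col_totals[k] / n
            diff = abs(observed[r][k] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction, never overshooting
            stat += diff * diff / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts, not taken from Table 1.
print(chi2_2x2(30, 10, 20, 40))
```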