609

PREDICTING INCIDENT ADENOCARCINOMA OF THE ESOPHAGUS OR GASTRIC CARDIA USING MACHINE LEARNING OF ELECTRONIC HEALTH RECORDS

Date
May 8, 2023
Society: AGA

Background: Tools that predict incident esophageal adenocarcinoma (EAC) and gastric cardia adenocarcinoma (GCA) and that can be automated in electronic health records (EHRs) are needed to guide screening decisions. However, EHRs often have missing data for smoking, body mass index (BMI), and gastroesophageal reflux disease (GERD), which are important factors used by currently validated tools (Trøndelag Health [HUNT] and Kunzmann). We aimed to accurately predict EAC/GCA, even with missing data, using machine learning.
Methods: We performed retrospective analyses in the Veterans Health Administration (VHA) Corporate Data Warehouse among Veterans with ≥1 encounter between 2005 and 2018. Cases diagnosed with EAC/GCA were identified in the VHA Central Cancer Registry. The index date was the date of diagnosis for cases and was randomly selected for controls. We collected prescriptions, laboratory results, and International Classification of Diseases diagnoses recorded 1 to 5 years prior to index. We randomly divided the cohort into training (50%), preliminary validation (25%), and testing (25%) sets. In the preliminary validation set, simple random sampling imputation and extreme gradient boosting machine learning were most accurate. In the test set, we compared the final model, the Kettles Esophageal and Cardia Adenocarcinoma predictioN (K-ECAN) Tool, to HUNT, Kunzmann, and published guidelines. To simulate a non-VHA population, we randomly under-sampled males. We ranked variables by their proportion of the total gain in the loss function and by their mean Shapley Additive Explanations.
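The cohort split and the imputation step described in the Methods can be illustrated with a minimal sketch. The abstract does not define "simple random sampling imputation" in detail; the sketch assumes it means filling each missing value with a value drawn at random from the observed values of the same variable. The function names `split_cohort` and `random_sample_impute` are hypothetical and are not taken from the study.

```python
import random


def split_cohort(ids, seed=0):
    """Randomly split ids into training (50%), validation (25%), and testing (25%)."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = len(shuffled) // 2
    n_valid = len(shuffled) // 4
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder, roughly 25%
    return train, valid, test


def random_sample_impute(values, seed=0):
    """Replace each missing value (None) with a random draw from the observed values.

    Assumption: this is one plausible reading of "simple random sampling
    imputation"; the abstract does not specify the exact procedure.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    return [v if v is not None else rng.choice(observed) for v in values]


# Illustrative values only, not study data.
train, valid, test = split_cohort(range(100))
bmi = [29.3, None, 31.0, None, 27.5]
bmi_imputed = random_sample_impute(bmi)
```

In a real pipeline the pool of observed values would presumably come from the training set alone, so that the validation and testing sets do not influence the imputation.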
Results: We identified 8,430 cases of EAC, 2,965 cases of GCA, and 10,256,887 controls. The mean age was 59.6 years; 92% were male, 80% were white, and the mean BMI was 29.3 kg/m². In the test set, K-ECAN was well calibrated (Figure 1) and had better discrimination (area under the receiver operating characteristic curve [AUC] = 0.77) than HUNT (AUC = 0.68), Kunzmann (AUC = 0.64), or guidelines (Figure 2). Using only data from 4-5 years prior to index slightly diminished its accuracy (AUC = 0.75). When men were under-sampled to simulate a non-VHA population, the AUCs of HUNT and Kunzmann improved, but K-ECAN remained the most accurate (AUC = 0.85, Figure 2). The most important variables influencing K-ECAN included 4 known risk factors (age, race, sex, BMI) and 9 novel ones (COPD, greater Hct, lower HDL, greater LDL, lower serum CO2, lower Na, lower BUN, lower ALT, and greater WBC). Although GERD was strongly associated with EAC, it contributed only a small proportion of the gain in information.
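For readers less familiar with the discrimination metric, the AUC can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control, with ties counted as one half. The following is a minimal sketch of that rank-based calculation; the function name `auc` and the toy inputs are illustrative and do not reproduce the study's analysis.

```python
def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U); labels are 1 for cases and 0 for controls."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for label in labels if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run of tied scores occupying sorted positions i..j-1.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1..j
        cases_in_run = sum(1 for k in range(i, j) if pairs[k][1] == 1)
        rank_sum_pos += avg_rank * cases_in_run
        i = j

    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_statistic / (n_pos * n_neg)


# Toy example: three of the four case-control pairs are ordered correctly.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```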
Conclusions: We developed and internally validated a novel prediction tool for incident EAC/GCA using EHR data. K-ECAN identifies individuals who are at increased risk for EAC/GCA ≥3 years in advance and is more accurate than published guidelines. Further work is needed to validate K-ECAN outside VHA and to assess how best to implement it within EHRs.
Figure 1. Calibration Plot
Predicted and observed risks are the cumulative incidences per 100,000 individuals in the testing set over the 14 years of ascertainment. Each dot represents 2% of the testing set (51,398 individuals).
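The points of a calibration plot like this one could be computed along the following lines. This is a sketch under stated assumptions: individuals are sorted by predicted risk and grouped into 50 equal-sized bins (so each bin holds 2% of the set), and the observed risk in a bin is taken as the simple proportion of cases, which is a simplification of the cumulative incidence used in the abstract. The function name `calibration_points` is hypothetical.

```python
def calibration_points(predicted, observed, n_bins=50, per=100_000):
    """Return (mean predicted, observed) risk per `per` individuals for each bin.

    predicted: predicted probabilities; observed: 1 for cases, 0 for controls.
    Assumption: bins are equal-sized groups ordered by predicted risk.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    points = []
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue  # more bins than individuals
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_rate = sum(observed[i] for i in idx) / len(idx)
        points.append((mean_pred * per, obs_rate * per))
    return points


# Illustrative data only: two risk groups of 50 individuals each.
predicted = [0.001] * 50 + [0.01] * 50
observed = [0] * 50 + [1] + [0] * 49
points = calibration_points(predicted, observed, n_bins=2)
```

A well-calibrated model yields points close to the diagonal, where the predicted and observed risks agree.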

Figure 2. Receiver Operating Characteristic Curves.
Panel A: Entire Testing Set. AUCs [95% CIs] are displayed in parentheses.
Panel B: Simulated non-VHA Population
Because Kunzmann and HUNT were both developed in populations that were roughly 50% male and both rely heavily on sex for classifying risk, this analysis simulated a non-VHA population by using all available female controls and cancer cases and under-sampled men by including a random selection of an equal number of male controls as female controls and a random sample of male cancer cases to match the expected odds ratio of male sex of 8.33 in the US population.
ACG: American College of Gastroenterology, ACP: American College of Physicians, AGA: American Gastroenterological Association, ASGE: American Society for Gastrointestinal Endoscopy, BSG: British Society of Gastroenterology, ESGE: European Society for Gastrointestinal Endoscopy, HUNT: Trøndelag Health, VHA: Veterans Health Administration
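The under-sampling used for Panel B can be sketched as follows. When the number of male controls is set equal to the number of female controls, the odds ratio of male sex reduces to the ratio of male cases to female cases, so the target number of male cases is about 8.33 times the number of female cases. The function name `undersample_men` and the toy cohort sizes are hypothetical; the authors' actual sampling code is not described in the abstract.

```python
import random


def undersample_men(female_cases, female_controls, male_cases, male_controls,
                    target_or=8.33, seed=0):
    """Keep all women; sample men so the odds ratio of male sex is about target_or.

    With equal numbers of male and female controls,
    OR = (male cases / male controls) / (female cases / female controls)
       = male cases / female cases.
    """
    rng = random.Random(seed)
    n_male_controls = len(female_controls)
    n_male_cases = round(target_or * len(female_cases))
    if n_male_controls > len(male_controls) or n_male_cases > len(male_cases):
        raise ValueError("not enough men available to reach the target odds ratio")
    return {
        "cases": list(female_cases) + rng.sample(list(male_cases), n_male_cases),
        "controls": list(female_controls)
        + rng.sample(list(male_controls), n_male_controls),
    }


# Illustrative identifiers only, not study data.
female_cases = [("F", "case", i) for i in range(100)]
female_controls = [("F", "control", i) for i in range(2000)]
male_cases = [("M", "case", i) for i in range(5000)]
male_controls = [("M", "control", i) for i in range(20000)]
simulated = undersample_men(female_cases, female_controls, male_cases, male_controls)
```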

