Background
Conscious sedation is widely used during endoscopy to reduce patients’ anxiety and discomfort. After sedated endoscopy, patients stay in the recovery room, where nurses assess their consciousness level before they are discharged home. However, contextual factors such as high workload, limited recovery space, and patient demands can burden this decision-making process and influence its outcome.
Aims
We aimed to develop an artificial intelligence (AI)-assisted voice analytic tool to assess the consciousness level of patients after sedated endoscopy.
Methods
Patients aged over 18 years who were undergoing outpatient sedated endoscopy of any type were recruited from a university-affiliated hospital in Hong Kong. Patients were asked to read aloud the numbers 1 to 7 in Cantonese (a dialect of Chinese). Voice recordings made before and immediately after sedated endoscopy were labelled “conscious” and “unconscious”, respectively. To enlarge the sample size, we applied data augmentation techniques, including noise injection, time shifting, pitch shifting, and speed changing. Acoustic features and Mel-spectrograms were extracted from each audio recording and fed into several supervised machine learning classification models. Five-fold cross-validation was performed, and model performance was evaluated using the accuracy and the area under the receiver operating characteristic curve (AUC) of each fold. The best-performing model was then externally validated in another cohort of patients, who spoke a different dialect of Chinese (Mandarin), at a hospital in China.
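The audio preprocessing described above (augmentation followed by Mel-spectrogram extraction) can be sketched as follows. This is a minimal NumPy-only illustration, not the authors' pipeline: the sampling rate (16 kHz), FFT size, hop length, number of mel bands and noise level are assumptions, and every function name is hypothetical. Speed change is done here by naive resampling, which also shifts pitch; a pitch change that preserves duration needs a phase vocoder (for example, librosa's `librosa.effects.pitch_shift`) and is omitted to keep the sketch self-contained.

```python
import numpy as np

SR = 16_000  # assumed sampling rate in Hz; the abstract does not report one


def add_noise(y, snr_db, rng):
    """Noise injection: add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    noise_power = np.mean(y ** 2) / (10 ** (snr_db / 10))
    return y + rng.normal(0.0, np.sqrt(noise_power), size=y.shape)


def shift_time(y, shift_samples):
    """Time shifting: delay the waveform, pad the start with silence, keep the length."""
    s = int(np.clip(shift_samples, 0, len(y)))
    return np.concatenate([np.zeros(s), y[: len(y) - s]])


def change_speed(y, rate):
    """Speed change by linear-interpolation resampling; rate > 1 is faster.
    This naive approach also raises or lowers the pitch."""
    n_out = int(round(len(y) / rate))
    return np.interp(np.linspace(0, len(y) - 1, n_out), np.arange(len(y)), y)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, centre, right = hz_points[m : m + 3]
        rising = (fft_freqs - left) / (centre - left)
        falling = (right - fft_freqs) / (right - centre)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(y, sr=SR, n_fft=512, hop=160, n_mels=40):
    """Hann-windowed frames -> power spectrum -> mel projection -> log; shape (n_mels, n_frames)."""
    if len(y) < n_fft:
        y = np.pad(y, (0, n_fft - len(y)))
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    power = np.abs(np.fft.rfft(y[idx] * np.hanning(n_fft), axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10).T


# Usage: a 1 s synthetic tone stands in for a recorded count from 1 to 7.
rng = np.random.default_rng(0)
t = np.arange(SR) / SR
y = 0.5 * np.sin(2 * np.pi * 220 * t)
augmented = [y, add_noise(y, 20, rng), shift_time(y, 1600), change_speed(y, 1.1)]
specs = [log_mel_spectrogram(a) for a in augmented]
print([s.shape for s in specs])
```

Each augmented waveform yields its own spectrogram, so one recording contributes several training samples; note that recordings changed in speed produce a different number of frames, which a fixed-input CNN would need to pad or crop.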
Results
From September to October 2021, a total of 100 Cantonese-speaking patients were recruited in Hong Kong. After data augmentation, 1000 audio samples were included for modelling. The accuracies and AUCs of all models are presented in Table 1. Among these methods, the convolutional neural network (CNN) showed the highest mean 5-fold accuracy (89.20%) and AUC (0.95), exceeding the other machine learning methods. The training process of the CNN model is presented in Figure 1. Another 100 Mandarin-speaking patients were recruited in China for external validation of the CNN model. After fine-tuning, the model achieved an accuracy of 84.14% and an AUC of 0.91.
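The reported metrics (accuracy and AUC computed per fold, then averaged over five folds) can be sketched as below. A nearest-centroid scorer stands in for the CNN so the example stays self-contained; the features are synthetic, and the label encoding (0 = "conscious", 1 = "unconscious"), patient IDs and function names are hypothetical. Because augmented copies of one recording are near-duplicates, this sketch assigns folds by patient so that no patient appears in both training and test sets; the abstract does not state how its folds were formed.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive scores above a random negative (ties count 1/2)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def group_kfold(groups, k, rng):
    """Yield (train_idx, test_idx) so that each patient ID lands in exactly one test fold."""
    shuffled = rng.permutation(np.unique(groups))
    for test_groups in np.array_split(shuffled, k):
        test_mask = np.isin(groups, test_groups)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def cross_validate(X, y, groups, k=5, seed=0):
    """Per-fold accuracy and AUC of a nearest-centroid stand-in classifier, plus their means."""
    rng = np.random.default_rng(seed)
    accs, aucs = [], []
    for train, test in group_kfold(groups, k, rng):
        mu0 = X[train][y[train] == 0].mean(axis=0)
        mu1 = X[train][y[train] == 1].mean(axis=0)
        # Positive score: the test sample is closer to the class-1 centroid.
        score = ((X[test] - mu0) ** 2).sum(axis=1) - ((X[test] - mu1) ** 2).sum(axis=1)
        accs.append(np.mean((score > 0).astype(int) == y[test]))
        aucs.append(roc_auc(y[test], score))
    return 100 * np.mean(accs), np.mean(aucs), accs, aucs


# Usage on synthetic features: 100 hypothetical patients, 10 samples each, half per class.
rng = np.random.default_rng(42)
n_patients, per_class, dim = 100, 5, 16
groups = np.repeat(np.arange(n_patients), 2 * per_class)
y = np.tile(np.repeat([0, 1], per_class), n_patients)  # 0 = "conscious", 1 = "unconscious"
X = rng.normal(size=(y.size, dim)) + 0.8 * y[:, None]
mean_acc, mean_auc, accs, aucs = cross_validate(X, y, groups)
print(f"5-fold mean accuracy {mean_acc:.2f}%, mean AUC {mean_auc:.2f}")
```

The AUC here uses the Mann-Whitney formulation, which equals the area under the ROC curve and needs no threshold; accuracy, by contrast, depends on the decision threshold (a score of 0 in this sketch).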
Conclusion
We developed and validated an AI-assisted voice analytic model that efficiently identifies Cantonese- and Mandarin-speaking patients who have fully recovered after sedated endoscopy. Further randomized controlled trials are warranted to confirm its clinical effectiveness compared with nurse evaluation alone.

Table 1. Performance of different machine learning models in identifying the consciousness level of patients after sedated endoscopy.
Figure 1. Training process of the convolutional neural network (CNN)-based consciousness prediction system.