Background and Aims: Dysbiosis of the gut virus community is associated with colorectal cancer (CRC), but its viral features and genomic functions remain unclear. In this study, we aimed to characterize gut virus lifestyle, gene function, single nucleotide polymorphisms (SNPs), structural variations (SVs), abundance and phage-host interactions in CRC based on assembled gut virus genomes.
Methods: Faecal samples were collected from CRC patients (n=11) and healthy controls (n=10), followed by virus-like particle (VLP) isolation. Extracted VLP DNA (length > 15 kb, at least 10 μg per sample) was sequenced directly on PacBio sequencing platforms. Gut virus genomes were assembled using Canu and annotated using eggNOG. Differentially abundant viruses were identified using DESeq2. SNPs and SVs were analyzed using MetaPop and Sniffles2. Virus hosts were predicted using iPHoP.
Results: A total of 23,992 high-quality gut virus genomes were assembled, including 9,045 novel virus genomes. More novel viruses were detected in CRC (41.23%) than in healthy controls (34.44%). The proportion of phages, especially temperate phages, was significantly higher in CRC (37.29%) than in healthy controls (33.71%) (P < 0.001). KEGG pathway analysis of virus genes showed that the DNA repair and recombination, homologous recombination and mismatch repair pathways were enriched in CRC compared to healthy controls. In addition, 71 viruses showed significantly more SNPs in CRC than in healthy controls (P < 0.05). A total of 2,025 virus genes showed a higher frequency of SNPs in CRC than in healthy controls. Among them, 15 genes with differential SNPs could distinguish CRC from healthy controls with an AUC of 88.03% using a random forest model. Moreover, 15 differentially abundant viruses could distinguish CRC from healthy controls with an AUC of 94.12% in our training cohort. This performance was validated in five global cohorts, with AUCs of 71.94%-81.72%. Finally, we found that gut phage-host bacteria interactions were greatly altered in CRC compared to healthy controls, involving CRC-associated bacteria such as Bacteroides fragilis and Fusobacterium nucleatum. In particular, both the genome number (68) and relative abundance (4.28%) of B. fragilis-hosted phages were significantly lower in CRC than in healthy controls (129 and 4.86%, P < 0.05).
Conclusions: Multidimensional analysis based on PacBio sequencing demonstrated that gut virus communities, lifestyle, gene function, SNPs/SVs and phage-host interactions were significantly altered in CRC patients compared to healthy controls. Gut virus abundance and SNPs are potential biomarkers for the diagnosis of CRC.