Publication
Meta-analysis of metagenomes via machine learning and assembly graphs reveals strain switches in Crohn’s disease
Publisher:
Cold Spring Harbor Laboratory
Date:
05-07-2022
DOI:
10.1101/2022.06.30.498290
Abstract: Microbial strains have closely related genomes but may have different phenotypes in the same environment. Shotgun metagenomic sequencing can capture the genomes of all strains present in a community but strain-resolved analysis from shotgun sequencing alone remains difficult. We developed an approach to identify and interrogate strain-level differences in groups of metagenomes. We use this approach to perform a meta-analysis of stool microbiomes from in iduals with and without inflammatory bowel disease (IBD Crohn’s disease, ulcerative colitis n = 605), a disease for which there are not specific microbial biomarkers but some evidence that microbial strain variation may stratify by disease state. We first developed a machine learning classifier based on compressed representations of complete metagenomes (FracMinHash sketches) and identified genomes that correlate with IBD subtype. To rescue variation that may not have been present in the genomes, we then used assembly graph genome queries to recover strain variation for correlated genomes. Lastly, we developed a novel differential abundance framework that works directly on the assembly graph to uncover all sequence variants correlated with IBD. We refer to this approach as dominating set differential abundance analysis and have implemented it in the spacegraphcats software package . Using this approach, we identified five bacterial strains that are associated with Crohn’s disease. Our method captures variation within the entire sequencing data set, allowing for discovery of previously hidden disease associations.