Publication
You can move, but you can't hide: identification of mobile genetic elements with geNomad
Publisher:
Cold Spring Harbor Laboratory
Date:
06-03-2023
DOI:
10.1101/2023.03.05.531206
Abstract: Identifying and characterizing mobile genetic elements (MGEs) in sequencing data is essential for understanding their diversity, ecology, biotechnological applications, and impact on public health. Here, we introduce geNomad, a classification and annotation framework that combines information from gene content and a deep neural network to identify sequences of plasmids and viruses. geNomad uses a large dataset of marker proteins to provide functional gene annotation and taxonomic assignment of viral genomes. Using a conditional random field model, geNomad also detects proviruses integrated into host genomes with high precision. In benchmarks that included diverse MGE and chromosome sequences, geNomad significantly outperformed other tools in all evaluated clades of plasmids and viruses. Leveraging geNomad's speed and scalability, we were able to process public metagenomes and metatranscriptomes, leading to the discovery of millions of new viruses and plasmids that are available through the IMG/VR and IMG/PR databases. We anticipate that geNomad will enable further advancements in MGE research, and it is available at enomad.