ARDC Research Link Australia

Publication

The importance of complete genome sequences

Publisher: Elsevier BV

Date: 05-2002

DOI: 10.1016/S0966-842X(02)02354-5

Publication

Taxonomy of prokaryotic viruses: 2016 update from the ICTV bacterial and archaeal viruses subcommittee.

Publisher: Springer Science and Business Media LLC

Date: 31-12-2017

DOI: 10.1007/S00705-016-3173-4

Publication

Evolution of microbial pathogens

Publisher: Elsevier BV

Date: 03-2000

DOI: 10.1016/S0168-9525(99)01965-4

Publication

hafeZ: Active prophage identification through read mapping

Publisher: Cold Spring Harbor Laboratory

Date: 23-07-2021

DOI: 10.1101/2021.07.21.453177

Abstract: Bacteriophages that have integrated their genomes into bacterial chromosomes, termed prophages, are widespread across bacteria. Prophages are key components of bacterial genomes, with their integration often contributing novel, beneficial characteristics to the infected host. Likewise, their induction, through the production and release of progeny virions into the surrounding environment, can have considerable ramifications on bacterial communities. Yet, not all prophages can excise following integration, due to genetic degradation by their host bacterium. Here, we present hafeZ, a tool able to identify ‘active’ prophages (i.e. those undergoing induction) within bacterial genomes through genomic read mapping. We demonstrate its use by applying hafeZ to publicly available sequencing data from bacterial genomes known to contain active prophages and show that hafeZ can accurately identify their presence and location in the host chromosomes. hafeZ is implemented in Python 3.7 and freely available under an open-source GPL-3.0 license from github.com/Chrisjrt/hafeZ. Bugs and issues may be reported via the hafeZ GitHub issues page. Contact: cturkington@ucmerced.edu or chrisjrt1@gmail.com

Publication

The human gut virome: Composition, colonisation, interactions, and impacts on human health

Publisher: Center for Open Science

Date: 26-09-2022

DOI: 10.31219/OSF.IO/S9PX2

Abstract: The gut virome is an incredibly complex part of the gut ecosystem. Gut viruses play a role in many disease states, but it is not yet known to what extent the gut virome impacts everyday human health. New experimental and bioinformatic approaches are required to address this knowledge gap. Gut virome colonisation begins at birth, becoming temporally stable over a 30-month period. The stable virome is highly specific to each individual and is modulated by varying factors such as age, gender, diet, and disease state. The gut virome is primarily composed of bacteriophages, predominantly crAssphage and other Caudovirales. The stability of the virome’s regular constituents is disrupted by disease. Transferring the faecal microbiome and its viruses from a healthy individual can restore the functionality of the gut and alleviate the symptoms of some chronic illnesses. Investigation of the virome is a relatively novel field, with new genetic sequences being published at an increasing rate. Many of these present with a large percentage of unknown proteins; this ‘viral dark matter’ is one of the major questions facing virologists and bioinformaticians. Bioinformatics tools can be used to explore both new and existing publicly available viral sequence datasets to quantify and classify viral species, but not all these tools are created equal, with some being more precise, effective, and efficient than others. Here, we review the literature surrounding the gut virome, how it is established, how it impacts human health, the methods used to investigate it, and the viral dark matter veiling the understanding of the gut virome.

Publication

The Marine Viromes of Four Oceanic Regions

Publisher: Public Library of Science (PLoS)

Date: 07-11-2006

DOI: 10.1371/JOURNAL.PBIO.0040368

Publication

Aging and Intermittent Fasting Impact on Transcriptional Regulation and Physiological Responses of Adult Drosophila Neuronal and Muscle Tissues

Publisher: MDPI AG

Date: 10-04-2018

DOI: 10.3390/IJMS19041140

Publication

The GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average Genome Size in Four Major Biomes

Publisher: Public Library of Science (PLoS)

Date: 11-12-2009

DOI: 10.1371/JOURNAL.PCBI.1000593

Publication

Ten simple rules and a template for creating workflows-as-applications

Publisher: Public Library of Science (PLoS)

Date: 15-12-2022

DOI: 10.1371/JOURNAL.PCBI.1010705

Publication

NCBI’s Virus Discovery Codeathon: Building “FIVE” —The Federated Index of Viral Experiments API Index

Publisher: MDPI AG

Date: 10-12-2020

DOI: 10.3390/V12121424

Abstract: Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was elaborated. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As per the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access of large quantities of data and is publicly accessible. Many projects of database or index federation fail to provide easier alternatives to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.

Publication

Essential genes on metabolic maps

Publisher: Elsevier BV

Date: 10-2006

DOI: 10.1016/J.COPBIO.2006.08.006

Abstract: Within the past five years genome-scale gene essentiality data sets have been published for ten diverse bacterial species. These data are a rich source of information about cellular networks that we are only beginning to explore. The analysis of these data, very heterogeneous in nature, is a challenging task. Even the definition of 'essential genes' in various genome-scale studies varies from genes 'absolutely required for survival' to those 'strongly contributing to fitness' and robust competitive growth. A comparative analysis of gene essentiality across multiple organisms based on projection of experimentally observed essential genes to functional roles in a collection of metabolic pathways and subsystems is emerging as a powerful tool of systems biology.

Publication

THEA: A novel approach to gene identification in phage genomes

Publisher: Cold Spring Harbor Laboratory

Date: 15-02-2018

DOI: 10.1101/265983

Abstract: Currently there are no tools specifically designed for annotating genes in phages. Several tools are available that have been adapted to run on phage genomes, but due to their underlying design they are unable to capture the full complexity of phage genomes. Phages have adapted their genomes to be extremely compact, having adjacent genes that overlap, and genes completely inside of other longer genes. This non-delineated genome structure makes it difficult for gene prediction using the currently available gene annotators. Here we present THEA (The Algorithm), a novel method for gene calling specifically designed for phage genomes. While the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths: where open reading frames are favorable, and overlaps and gaps are less favorable, but still possible. We represent this network of connections as a weighted graph, and use graph theory to find the optimal path. We compare THEA to other gene callers by annotating a set of 2,133 complete phage genomes from GenBank, using THEA and the three most popular gene callers. We found that the four programs agree on 82% of the total predicted genes, with THEA predicting significantly more genes than the other three. We searched for these extra genes in both GenBank’s non-redundant protein database and sequence read archive, and found that they are present at levels that suggest that these are functional protein coding genes. The source code and all files can be found at: eprekate/THEA. Contact: Katelyn McNair, deprekate@gmail.com
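
The path-finding idea in the abstract above can be illustrated with a minimal Python sketch. This is not THEA's implementation: the ORF coordinates, the rewards, the penalty values, and the function names `transition_cost` and `best_gene_path` are all hypothetical, and the sketch assumes a linear genome, a single strand, and no fully nested genes. It only shows how rewarding open reading frames while penalising gaps and overlaps turns gene calling into a lowest-cost path problem on a weighted graph.

```python
from typing import List, Tuple

# An ORF is (start, end, reward). All values used with this sketch are
# illustrative assumptions, not scores produced by THEA.
Orf = Tuple[int, int, float]


def transition_cost(prev_end: int, next_start: int,
                    gap_penalty: float, overlap_penalty: float) -> float:
    """Edge weight for moving from one ORF's end to the next ORF's start."""
    distance = next_start - prev_end
    if distance >= 0:
        return gap_penalty * distance       # non-coding gap: mildly unfavorable
    return overlap_penalty * -distance      # overlap: unfavorable but allowed


def best_gene_path(orfs: List[Orf], genome_length: int,
                   gap_penalty: float = 0.01,
                   overlap_penalty: float = 0.05) -> List[Orf]:
    """Return the lowest-cost chain of ORFs from genome start to genome end."""
    ordered = sorted(orfs, key=lambda o: (o[0], o[1]))
    n = len(ordered)
    best = [0.0] * n   # best[i]: minimum cost of a path ending with ORF i
    back = [-1] * n    # back[i]: previous ORF on that path (-1 = genome start)

    for i, (start, end, reward) in enumerate(ordered):
        # Option 1: ORF i is the first gene on the path.
        best[i] = transition_cost(0, start, gap_penalty, overlap_penalty) - reward
        # Option 2: ORF i follows an earlier ORF j.
        for j in range(i):
            p_start, p_end, _ = ordered[j]
            if p_start >= start or p_end >= end:
                continue  # simplification: skip nested or identical-start ORFs
            cost = (best[j]
                    + transition_cost(p_end, start, gap_penalty, overlap_penalty)
                    - reward)
            if cost < best[i]:
                best[i], back[i] = cost, j

    # Close each candidate path with the gap to the genome end; the empty
    # path (no genes at all) costs one gap spanning the whole genome.
    best_total, last = gap_penalty * genome_length, -1
    for i, (_, end, _) in enumerate(ordered):
        total = best[i] + transition_cost(end, genome_length,
                                          gap_penalty, overlap_penalty)
        if total < best_total:
            best_total, last = total, i

    path = []
    while last != -1:
        path.append(ordered[last])
        last = back[last]
    return path[::-1]
```

Because the ORFs are sorted by start position, the graph is acyclic and a single dynamic-programming pass suffices here; a real gene caller would also need to handle both strands, nested genes, and empirically derived weights.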

Publication

A bioinformatic analysis of ribonucleotide reductase genes in phage genomes and metagenomes

Publisher: Springer Science and Business Media LLC

Date: 07-02-2013

DOI: 10.1186/1471-2148-13-33

Abstract: Ribonucleotide reductase (RNR), the enzyme responsible for the formation of deoxyribonucleotides from ribonucleotides, is found in all domains of life and many viral genomes. RNRs are also amongst the most abundant genes identified in environmental metagenomes. This study focused on understanding the distribution, diversity, and evolution of RNRs in phages (viruses that infect bacteria). Hidden Markov Model profiles were used to analyze the proteins encoded by 685 completely sequenced double-stranded DNA phages and 22 environmental viral metagenomes to identify RNR homologs in cultured phages and uncultured viral communities, respectively. RNRs were identified in 128 phage genomes, nearly tripling the number of phages known to encode RNRs. Class I RNR was the most common RNR class observed in phages (70%), followed by class II (29%) and class III (28%). Twenty-eight percent of the phages contained genes belonging to multiple RNR classes. RNR class distribution varied according to phage type, isolation environment, and the host’s ability to utilize oxygen. The majority of the phages containing RNRs are Myoviridae (65%), followed by Siphoviridae (30%) and Podoviridae (3%). The phylogeny and genomic organization of phage and host RNRs reveal several distinct evolutionary scenarios involving horizontal gene transfer, co-evolution, and differential selection pressure. Several putative split RNR genes interrupted by self-splicing introns or inteins were identified, providing further evidence for the role of frequent genetic exchange. Finally, viral metagenomic data indicate that RNRs are prevalent and highly dynamic in uncultured viral communities, necessitating future research to determine the environmental conditions under which RNRs provide a selective advantage. This comprehensive study describes the distribution, diversity, and evolution of RNRs in phage genomes and environmental viral metagenomes.
The distinct distributions of specific RNR classes amongst phages, combined with the various evolutionary scenarios predicted from RNR phylogenies, suggest multiple inheritance sources and different selective forces for RNRs in phages. This study significantly improves our understanding of phage RNRs, providing insight into the diversity and evolution of this important auxiliary metabolic gene as well as the evolution of phages in response to their bacterial hosts and environments.

Publication

Variability and host density independence in inductions-based estimates of environmental lysogeny

Publisher: Springer Science and Business Media LLC

Date: 28-04-2017

DOI: 10.1038/NMICROBIOL.2017.64

Publication

MultiPhATE2: code for functional annotation and comparison of phage genomes

Publisher: Oxford University Press (OUP)

Date: 17-03-2021

DOI: 10.1093/G3JOURNAL/JKAB074

Abstract: To address a need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of multiPhATE, a functional annotation code released previously, multiPhATE2 performs gene finding using multiple algorithms, compares the results of the algorithms, performs functional annotation of coding sequences, and incorporates additional search algorithms and databases to extend the search space of the original code. MultiPhATE2 performs gene matching among sets of closely related bacteriophage genomes, and uses multiprocessing to speed computations. MultiPhATE2 can be re-started at multiple points within the workflow to allow the user to examine intermediate results and adjust the subsequent computations accordingly. In addition, multiPhATE2 accommodates custom gene calls and sequence databases, again adding flexibility. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or MAC operating systems. Full documentation is provided as a README file and a Wiki website.

Publication

A Distinct Contractile Injection System Found in a Majority of Adult Human Microbiomes

Publisher: Cold Spring Harbor Laboratory

Date: 05-12-2019

DOI: 10.1101/865204

Abstract: An imbalance of normal bacterial groups such as Bacteroidales within the human gut is correlated with diseases like obesity. A current grand challenge in the microbiome field is to identify factors produced by normal microbiome bacteria that cause these observed health and disease correlations. While identifying factors like a bacterial injection system could provide a missing explanation for why Bacteroidales correlates with host health, no such factor has been identified to date. The lack of knowledge about these factors is a significant barrier to improving therapies like fecal transplants that promote a healthy microbiome. Here we show that a previously ill-defined Contractile Injection System is carried in the gut microbiome of 99% of individuals from the United States and Europe. This type of Contractile Injection System, which we name here Bacteroidales Injection System (BIS), is related to the contractile tails of bacteriophage (viruses of bacteria) and has been described to mediate interactions between bacteria and diverse eukaryotes like amoeba, insects and tubeworms. Our findings that BIS are ubiquitous within adult human microbiomes suggest that they shape host health by mediating interactions between Bacteroidales bacteria and the human host or its microbiome.

Publication

Low-Molecular-Weight Protein Tyrosine Phosphatases of Bacillus subtilis

Publisher: American Society for Microbiology

Date: 07-2005

DOI: 10.1128/JB.187.14.4945-4956.2005

Abstract: In gram-negative organisms, enzymes belonging to the low-molecular-weight protein tyrosine phosphatase (LMPTP) family are involved in the regulation of important physiological functions, including stress resistance and synthesis of the polysaccharide capsule. LMPTPs have been identified also in gram-positive bacteria, but their functions in these organisms are presently unknown. We cloned two putative LMPTPs from Bacillus subtilis, YfkJ and YwlE, which are highly similar to each other in primary structure as well as to LMPTPs from gram-negative bacteria. When purified from overexpressing Escherichia coli strains, both enzymes were able to dephosphorylate p-nitrophenyl-phosphate and phosphotyrosine-containing substrates in vitro but showed significant differences in kinetic parameters and sensitivity to inhibitors. Transcriptional analyses showed that yfkJ was transcribed at a low level throughout the growth cycle and underwent a σB-dependent transcriptional upregulation in response to ethanol stress. The transcription of ywlE was growth dependent but stress insensitive. Genomic deletion of each phosphatase-encoding gene led to a phenotype of reduced bacterial resistance to ethanol stress, which was more marked in the ywlE deletion strain. Our study suggests that YfkJ and YwlE play roles in B. subtilis stress resistance.

Publication

Draft Genome Sequence of Cylindrospermopsis raciborskii (Cyanobacteria) Strain ITEP-A1 Isolated from a Brazilian Semiarid Freshwater Body: Evidence of Saxitoxin and Cyli

Publisher: American Society for Microbiology

Date: 30-06-2016

DOI: 10.1128/GENOMEA.00228-16

Abstract: Cylindrospermopsis raciborskii ITEP-A1 is a saxitoxin-producing cyanobacterium. We report the draft genome sequence of ITEP-A1, which comprised 195 contigs that were assembled with SPAdes and annotated with Rapid Annotation using Subsystem Technology. The identified genome sequence had 3,605,836 bp, 40.1% G+C, and predicted 3,553 coding sequences (including the synthetase genes).

Publication

Genomic analysis and growth-phase-dependent regulation of the SEF14 fimbriae of Salmonella enterica serovar Enteritidis

Publisher: Microbiology Society

Date: 10-2001

DOI: 10.1099/00221287-147-10-2705

Abstract: Salmonella enterica serovar Enteritidis is a leading cause of food poisoning in the USA and Europe. Although Salmonella serovars share many fimbrial operons, a few fimbriae are limited to specific Salmonella serovars. SEF14 fimbriae are restricted to group D Salmonella and the genes encoding this virulence factor were acquired relatively recently. Genomic, genetic and gene expression studies have been integrated to investigate the ancestry, regulation and expression of the sef genes. Genomic comparisons of the Salmonella serovars sequenced revealed that the sef operon is inserted in leuX in Salmonella Enteritidis, Salmonella Paratyphi and Salmonella Typhi, and revealed the presence of a previously unidentified 25 kb pathogenicity island in Salmonella Typhimurium at this location. Salmonella Enteritidis contains a region of homology between the Salmonella virulence plasmid and the chromosome downstream of the sef operon. The sef operon itself consists of four co-transcribed genes, sefABCD, and adjacent to sefD there is an AraC-like transcriptional activator that is required for expression of the sef genes. Expression of the sef genes was optimal during growth in late exponential phase and was repressed during stationary phase. The regulation was coordinated by the RpoS sigma factor.

Publication

Microbes, metagenomes and marine mammals: enabling the next generation of scientist to enter the genomic era

Publisher: Springer Science and Business Media LLC

Date: 04-09-2013

DOI: 10.1186/1471-2164-14-600

Abstract: The revolution in DNA sequencing technology continues unabated, and is affecting all aspects of the biological and medical sciences. The training and recruitment of the next generation of researchers who are able to use and exploit the new technology is severely lacking and potentially negatively influencing research and development efforts to advance genome biology. Here we present a cross-disciplinary course that provides undergraduate students with practical experience in running a next generation sequencing instrument through to the analysis and annotation of the generated DNA sequences. Many labs across the world are installing next generation sequencing technology and we show that the undergraduate students produce quality sequence data and were excited to participate in cutting edge research. The students conducted the work flow from DNA extraction, library preparation, running the sequencing instrument, to the extraction and analysis of the data. They sequenced microbes, metagenomes, and a marine mammal, the Californian sea lion, Zalophus californianus. The students met sequencing quality controls, had no detectable contamination in the targeted DNA sequences, provided publication quality data, and became part of an international collaboration to investigate carcinomas in carnivores. Students learned important skills for their future education and career opportunities, and a perceived increase in students’ ability to conduct independent scientific research was measured. DNA sequencing is rapidly expanding in the life sciences. Teaching undergraduates to use the latest technology to sequence genomic DNA ensures they are ready to meet the challenges of the genomic era and allows them to participate in annotating the tree of life.

Publication

FOCUS2: agile and sensitive classification of metagenomics data using a reduced database

Publisher: Cold Spring Harbor Laboratory

Date: 31-03-2016

DOI: 10.1101/046425

Abstract: Metagenomics approaches rely on identifying the presence of organisms in the microbial community from a set of unknown DNA sequences. Sequence classification has valuable applications in multiple important areas of medical and environmental research. Here we introduce FOCUS2, an update of the previously published computational method FOCUS. FOCUS2 was tested with 10 simulated and 543 real metagenomes, demonstrating that the program is more sensitive, faster, and more computationally efficient than existing methods. The Python implementation is freely available at edwards.sdsu.edu/FOCUS2.

Publication

An Agile Functional Analysis of Metagenomic Data Using SUPER-FOCUS

Publisher: Springer New York

Date: 2017

DOI: 10.1007/978-1-4939-7015-5_4

Abstract: One of the main goals in metagenomics is to identify the functional profile of a microbial community from unannotated shotgun sequencing reads. Functional annotation is important in biological research because it enables researchers to identify the abundance of functional genes of the organisms present in the sample, answering the question, "What can the organisms in the sample do?" Most currently available approaches do not scale with increasing data volumes, which is important because both the number and lengths of the reads provided by sequencing platforms keep increasing. Here, we present SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with real metagenomes, and the results show that it accurately predicts the subsystems present in the profiled microbial communities, is computationally efficient, and up to 1000 times faster than other tools. SUPER-FOCUS is freely available at edwards.sdsu.edu/SUPERFOCUS.

Publication

Microfluidic PCR Combined with Pyrosequencing for Identification of Allelic Variants with Phenotypic Associations among Targeted Salmonella Genes

Publisher: American Society for Microbiology

Date: 15-10-2012

DOI: 10.1128/AEM.01703-12

Abstract: A novel targeted massive parallel sequencing approach identified genetic variation in eight known or predicted fimbrial adhesins for 46 Salmonella strains. The results highlight associations between specific adhesin alleles, host species, and antimicrobial resistance. The differentiation of allelic variants has potential applications for diagnostic microbiology and epidemiological investigations.

Publication

PRFect: A tool to predict programmed ribosomal frameshifts in prokaryotic and viral genomes

Publisher: Research Square Platform LLC

Date: 06-06-2023

DOI: 10.21203/RS.3.RS-2997217/V1

Abstract: Background: One of the stranger phenomena that can occur during gene translation is where, as a ribosome reads along the mRNA, various cellular and molecular properties contribute to stalling the ribosome on a slippery sequence, shifting the ribosome into one of the other two alternate reading frames. The alternate frame has different codons, so different amino acids are added to the peptide chain, but more importantly, the original stop codon is no longer in-frame, so the ribosome can bypass the stop codon and continue to translate the codons past it. This produces a longer version of the protein, a fusion of the original in-frame amino acids, followed by all the alternate frame amino acids. There is currently no automated software to predict the occurrence of these programmed ribosomal frameshifts (PRF), and they are currently only identified by manual curation. Results: Here we present PRFect, an innovative machine-learning method for the detection and prediction of PRFs in coding genes of various types. PRFect combines advanced machine learning techniques with the integration of multiple complex cellular properties, such as secondary structure, codon usage, ribosomal binding site interference, direction, and slippery site motif. Calculating and incorporating these diverse properties posed significant challenges, but through extensive research and development, we have achieved a user-friendly approach. The code for PRFect is freely available, open-source, and can be easily installed via a single command in the terminal. Our comprehensive evaluations on diverse organisms, including bacteria, archaea, and phages, demonstrate PRFect's strong performance, achieving high sensitivity, specificity, and an accuracy exceeding 90%. Conclusion: PRFect represents a significant advancement in the field of PRF detection and prediction, offering a powerful tool for researchers and scientists to unravel the intricacies of programmed ribosomal frameshifting in coding genes.

Publication

Decoding diversity in a coral reef fish species complex with restricted range using metagenomic sequencing of gut contents

Publisher: Wiley

Date: 10-03-2020

DOI: 10.1002/ECE3.6138

Publication

Genome Sequences of the Ethanol-Tolerant Lactobacillus vini Strains LMG 23202^T and JP7.8.9

Publisher: American Society for Microbiology

Date: 06-2012

DOI: 10.1128/JB.00446-12

Abstract: We report on the genome sequences of Lactobacillus vini type strain LMG 23202^T (DSM 20605) (isolated from fermenting grape musts in Spain) and the industrial strain L. vini JP7.8.9 (isolated from a bioethanol plant in northeast Brazil). All contigs were assembled using gsAssembler, and genes were predicted and annotated using Rapid Annotation using Subsystem Technology (RAST). The identified genome sequence of LMG 23202^T had 2,201,333 bp, 37.6% G+C, and 1,833 genes, whereas the identified genome sequence of JP7.8.9 had 2,301,037 bp, 37.8% G+C, and 1,739 genes. The gene repertoire of the species L. vini offers promising opportunities for biotechnological applications.

Publication

Biodiversity and biogeography of phages in modern stromatolites and thrombolites

Publisher: Springer Science and Business Media LLC

Date: 03-2008

DOI: 10.1038/NATURE06735

Abstract: Viruses, and more particularly phages (viruses that infect bacteria), represent one of the most abundant living entities in aquatic and terrestrial environments. The biogeography of phages has only recently been investigated and so far reveals a cosmopolitan distribution of phage genetic material (or genotypes). Here we address this cosmopolitan distribution through the analysis of phage communities in modern microbialites, the living representatives of one of the most ancient life forms on Earth. On the basis of a comparative metagenomic analysis of viral communities associated with marine (Highborne Cay, Bahamas) and freshwater (Pozas Azules II and Rio Mesquites, Mexico) microbialites, we show that some phage genotypes are geographically restricted. The high percentage of unknown sequences recovered from the three metagenomes (>97%), the low percentage similarities with sequences from other environmental viral (n = 42) and microbial (n = 36) metagenomes, and the absence of viral genotypes shared among microbialites indicate that viruses are genetically unique in these environments. Identifiable sequences in the Highborne Cay metagenome were dominated by single-stranded DNA microphages that were not detected in any other samples examined, including sea water, fresh water, sediment, terrestrial, extreme, metazoan-associated and marine microbial mats. Finally, a marine signature was present in the phage community of the Pozas Azules II microbialites, even though this environment has not been in contact with the ocean for tens of millions of years. Taken together, these results prove that viruses in modern microbialites display biogeographical variability and suggest that they may be derived from an ancient community.

Publication

A Distinct Contractile Injection System Gene Cluster Found in a Majority of Healthy Adult Human Microbiomes

Publisher: American Society for Microbiology

Date: 25-08-2020

DOI: 10.1128/MSYSTEMS.00648-20

Abstract: To engage with host cells, diverse pathogenic bacteria produce syringe-like structures called contractile injection systems (CIS). CIS are evolutionarily related to the contractile tails of bacteriophages and are specialized to puncture membranes, often delivering effectors to target cells. Although CIS are key for pathogens to cause disease, paradoxically, similar injection systems have been identified within healthy human microbiome bacteria. Here, we show that gene clusters encoding a predicted CIS, which we term Bacteroidales injection systems (BIS), are present in the microbiomes of nearly all adult humans tested from Western countries. BIS genes are enriched within human gut microbiomes and are expressed both in vitro and in vivo. Further, a greater abundance of BIS genes is present within healthy gut microbiomes than in those of humans with inflammatory bowel disease (IBD). Our discovery provides a potentially distinct means by which our microbiome interacts with the human host or its microbiome.

Publication

Predicting the capsid architecture of phages from metagenomic data

Publisher: Elsevier BV

Date: 2022

DOI: 10.1016/J.CSBJ.2021.12.032

Publication

Global phylogeography and ancient evolution of the widespread human gut virus crAssphage

Publisher: Cold Spring Harbor Laboratory

Date: 26-01-2019

DOI: 10.1101/527796

Abstract: Microbiomes are vast communities of microbes and viruses that populate all natural ecosystems. Viruses have been considered the most variable component of microbiomes, as supported by virome surveys and examples of high genomic mosaicism. However, recent evidence suggests that the human gut virome is remarkably stable compared to other environments. Here we investigate the origin, evolution, and epidemiology of crAssphage, a widespread human gut virus. Through a global collaboratory, we obtained DNA sequences of crAssphage from over one-third of the world's countries, and showed that its phylogeography is locally clustered within countries, cities, and individuals. We also found colinear crAssphage-like genomes in both Old-World and New-World primates, challenging genomic mosaicism and suggesting that the association of crAssphage with primates may be millions of years old. We conclude that crAssphage is a benign globetrotter virus that may have co-evolved with the human lineage and is an integral part of the normal human gut virome.

Publication

GenomePeek—an online tool for prokaryotic genome and metagenome analysis

Publisher: PeerJ

Date: 16-06-2015

DOI: 10.7717/PEERJ.1025

Publication

Mechanistic Model of Rothia mucilaginosa Adaptation toward Persistence in the CF Lung, Based on a Genome Reconstructed from Metagenomic Data

Publisher: Public Library of Science (PLoS)

Date: 30-05-2013

DOI: 10.1371/JOURNAL.PONE.0064285

Publication

‘Genome skimming’ with the MinION hand-held sequencer identifies CITES-listed shark species in India’s exports market

Publisher: Springer Science and Business Media LLC

Date: 14-03-2019

DOI: 10.1038/S41598-019-40940-9

Abstract: Chondrichthyes (sharks, rays, skates, and chimeras) are among the most threatened and data deficient vertebrate species. Global demand for shark and ray derived products drives unregulated and exploitative fishing practices, which are in turn facilitated by the lack of ecological data required for effective conservation of these species. Here, we describe a Next Generation Sequencing method (using the MinION, a hand-held portable sequencing device from Oxford Nanopore Technologies) and analysis pipeline for molecular ecological studies in Chondrichthyes. Using this method, the complete mitochondrial genome and nuclear intergenic and protein-coding sequences were obtained by direct sequencing of genomic DNA obtained from shark fin tissue. Recovered loci include mitochondrial barcode sequences (Cytochrome oxidase I, NADH2, 16S rRNA and 12S rRNA) and nuclear genetic loci such as 5.8S rRNA, Internal Transcribed Spacer 2, and 28S rRNA regions, which are commonly used for taxonomic identification. Other loci recovered were the nuclear protein-coding genes for antithrombin or SerpinC, Immunoglobulin lambda light chain, Preproghrelin, selenium binding protein 1 (SBP1), Interleukin-1 beta (IL-1β) and Recombination-Activating Gene 1 (RAG1). The median coverage across all genetic loci was 20x and sequence accuracy was ≥99.8% compared to reference sequences. Analyses of the nuclear ITS2 region and the mitochondrial protein-encoding loci allowed accurate taxonomic identification of the shark specimen as Carcharhinus falciformis, a CITES Appendix II species. MinION sequencing provided 1,152,211 bp of new shark genome, increasing the number of sequenced shark genomes to five. Phylogenetic analyses using both mitochondrial and nuclear loci provided evidence that Prionace glauca is nested within Carcharhinus, suggesting the need for taxonomic reassignment of P. glauca.
We increased genomic information about a shark species for ecological and population genetic studies, enabled accurate identification of the shark tissue for biodiversity indexing and resolved phylogenetic relationships among multiple taxa. The method was independent of amplification bias, and adaptable for field assessments of other Chondrichthyes and wildlife species in the future.

Publication

Author Correction: Guidelines for public database submission of uncultivated virus genome sequences for taxonomic classification

Publisher: Springer Science and Business Media LLC

Date: 22-08-2023

DOI: 10.1038/S41587-023-01952-Z

Publication

linsalrob/fasta_validator: Initial Release

Publisher: Zenodo

Date: 2019

DOI: 10.5281/ZENODO.2532044

Publication

Differential regulation of fasA and fasH expression of Escherichia coli 987P fimbriae by environmental cues

Publisher: Wiley

Date: 08-1997

DOI: 10.1046/J.1365-2958.1997.5161875.X

Abstract: An early process in the pathogenesis of enteric bacteria is colonization of the intestinal epithelium leading to local multiplication, pathophysiological interactions with the host and further spreading. Attachment is typically mediated by bacterial fimbriae, which are selectively expressed during growth in the intestine. Here we report an analysis of the regulation of 987P fimbrial expression of enterotoxigenic Escherichia coli (ETEC). Expression of both fasH, the transcriptional activator of the 987P fimbrial genes, and fasA, the major fimbrial subunit, is regulated in response to a variety of environmental stimuli. We have found that expression of fasH is regulated in response to the carbon status of the growth medium by the cAMP-CRP complex. Moreover, fasH is regulated in response to both the nitrogen status of the growth medium and the external pH. Expression of fasA is activated by FasH, and is also selectively regulated in response to growth temperature by HNS. Regulation of fimbrial expression by carbon and/or nitrogen gradients is proposed to provide a mechanism that allows preferential colonization of different segments of the intestine by various enteropathogens, such as ETEC, enteropathogenic E. coli and Vibrio cholerae.

Publication

Host interactions of novel Crassvirales species belonging to multiple families infecting bacterial host, Bacteroides cellulosilyticus WH2

Publisher: Microbiology Society

Date: 04-09-2023

DOI: 10.1099/MGEN.0.001100

Publication

Prophage rates in the human microbiome vary by body site and host health

Publisher: Cold Spring Harbor Laboratory

Date: 05-05-2023

DOI: 10.1101/2023.05.04.539508

Abstract: Phages integrated into a bacterial genome–called prophages–continuously monitor the health of the host bacteria to determine when to escape the genome, protect their host from other phage infections, and may provide genes that promote bacterial growth. Prophages are essential to almost all microbiomes, including the human microbiome. However, most human microbiome studies focus on bacteria, ignoring free and integrated phages, so we know little about how these prophages affect the human microbiome. We compared the prophages identified in 11,513 bacterial genomes isolated from human body sites to characterise prophage DNA in the human microbiome. Here, we show that prophage DNA comprised an average of 1-5% of each bacterial genome. The prophage content per genome varies with the isolation site on the human body, the health of the human, and whether the disease was symptomatic. The presence of prophages promotes bacterial growth and sculpts the microbiome. However, the disparities caused by prophages vary throughout the body.

Publication

Global phylogeography and ancient evolution of the widespread human gut virus crAssphage

Publisher: Springer Science and Business Media LLC

Date: 08-07-2019

DOI: 10.1038/S41564-019-0494-6

Publication

Structure and function of a cyanophage-encoded peptide deformylase

Publisher: Springer Science and Business Media LLC

Date: 14-02-2013

DOI: 10.1038/ISMEJ.2013.4

Publication

PRINSEQ++, a multi-threaded tool for fast and efficient quality control and preprocessing of sequencing datasets

Publisher: PeerJ

Date: 27-02-2019

DOI: 10.7287/PEERJ.PREPRINTS.27553V1

Abstract: PRINSEQ++ is a C++ implementation of the very popular software prinseq-lite for quality control and preprocessing of sequencing datasets. PRINSEQ++ can run multi-threaded processes, which makes it more than 10 times faster than the original version. It can read from, and write to, compressed files, drastically reducing the use of hard-drive. PRINSEQ++ can filter, trim and reformat sequences by a variety of options to improve downstream analysis. PRINSEQ++ is freely available on GitHub (github.com/Adrian-Cantu/PRINSEQ-plus-plus) and runs on all Unix-like systems.

Publication

Philympics 2021: Prophage Predictions Perplex Programs

Publisher: F1000 Research Ltd

Date: 05-08-2021

DOI: 10.12688/F1000RESEARCH.54449.1

Abstract: Background: Most bacterial genomes contain integrated bacteriophages (prophages) in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Methods: We present a library of gold-standard bacterial genome annotations that include manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms to identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and F1 score from the analysis of seven different algorithms for the prediction of prophages. Results: We identified different strengths and weaknesses between the prophage prediction tools. Several tools exhibit exceptional F1 scores, while others have better recall at the expense of more false positives. The tools vary greatly in runtime performance with few exhibiting all desirable qualities for large-scale analyses. Conclusions: Our library of gold-standard prophage annotations and benchmarking framework provide a valuable resource for exploring strengths and weaknesses of current and future prophage annotation tools. We discuss caveats and concerns in this analysis, how those concerns may be mitigated, and avenues for future improvements. This framework will help developers identify opportunities for improvement and test updates. It will also help users in determining the tools that are best suited for their analysis.

Publication

Sequencing at sea: Challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition

Publisher: PeerJ

Date: 11-07-2014

DOI: 10.7287/PEERJ.PREPRINTS.433V1

Abstract: Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty-six marine microbial genomes and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorus source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines.

Publication

Philympics 2021: Prophage Predictions Perplex Programs

Publisher: F1000 Research Ltd

Date: 08-04-2022

DOI: 10.12688/F1000RESEARCH.54449.2

Abstract: Background: Most bacterial genomes contain integrated bacteriophages (prophages) in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Methods: We present a library of gold-standard bacterial genomes with manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms and identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and F1 score from the analysis of ten different algorithms for the prediction of prophages. Results: We identified strengths and weaknesses between the prophage prediction tools. Several tools exhibit exceptional F1 scores, while others have better recall at the expense of more false positives. The tools vary greatly in runtime performance with few exhibiting all desirable qualities for large-scale analyses. Conclusions: Our library of gold-standard prophage annotations and benchmarking framework provide a valuable resource for exploring strengths and weaknesses of current and future prophage annotation tools. We discuss caveats and concerns in this analysis, how those concerns may be mitigated, and avenues for future improvements. This framework will help developers identify opportunities for improvement and test updates. It will also help users in determining the tools that are best suited for their analysis.
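The four scores named in this abstract (accuracy, precision, recall, and F1) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that predictions are compared with the curated annotation one gene at a time as true/false labels; the abstract does not state the unit of comparison, and the function name `prophage_metrics` is hypothetical rather than part of the published framework.

```python
def prophage_metrics(predicted, curated):
    """Compare predicted and curated labels (True = prophage gene).

    Both arguments are equal-length lists of booleans, one entry per gene.
    The per-gene comparison is an assumption made for this sketch.
    """
    if len(predicted) != len(curated):
        raise ValueError("predicted and curated must have the same length")

    pairs = list(zip(predicted, curated))
    tp = sum(1 for p, c in pairs if p and c)          # true positives
    fp = sum(1 for p, c in pairs if p and not c)      # false positives
    fn = sum(1 for p, c in pairs if not p and c)      # false negatives
    tn = sum(1 for p, c in pairs if not p and not c)  # true negatives

    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

A tool with better recall "at the expense of more false positives" would show a higher `recall` and a lower `precision` in this output, which is why F1, the harmonic mean of the two, is reported alongside them.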

Publication

Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

Publisher: Frontiers Media SA

Date: 08-05-2015

DOI: 10.3389/FMICB.2015.00381

Publication

The importance of complete genome sequences

Publisher: Elsevier BV

Date: 05-2002

DOI: 10.1016/S0966-842X(02)02353-3

Publication

Growth Score: A single metric to define growth in 96-well phenotype assays

Publisher: PeerJ

Date: 30-01-2018

DOI: 10.7287/PEERJ.PREPRINTS.26469V1

Abstract: High-throughput phenotype assays are a cornerstone of systems biology as they allow direct measurements of mutations, genes, strains, or even different genera. High-throughput methods also require data analytic methods that reduce complex time-series data to a single numeric evaluation. Here, we present the Growth Score, an improvement on the previous Growth Level formula. There is strong correlation between Growth Score and Growth Level, but the new Growth Score contains only essential growth curve properties while the formula of the previous Growth Level was convoluted and not easily interpretable. Several programs can be used to estimate the parameters required to calculate the Growth Score metric, including our PMAnalyzer pipeline.

Publication

Taxonomy of prokaryotic viruses: 2018-2019 update from the ICTV Bacterial and Archaeal Viruses Subcommittee

Publisher: Springer Science and Business Media LLC

Date: 11-03-2020

DOI: 10.1007/S00705-020-04577-8

Publication

Comparative Metagenomics Reveals Host Specific Metavirulomes and Horizontal Gene Transfer Elements in the Chicken Cecum Microbiome

Publisher: Public Library of Science (PLoS)

Date: 13-08-2008

DOI: 10.1371/JOURNAL.PONE.0002945

Publication

Nitrogen control in bacteria

Publisher: American Society for Microbiology

Date: 12-1995

DOI: 10.1128/MR.59.4.604-622.1995

Abstract: Nitrogen metabolism in prokaryotes involves the coordinated expression of a large number of enzymes concerned with both utilization of extracellular nitrogen sources and intracellular biosynthesis of nitrogen-containing compounds. The control of this expression is determined by the availability of fixed nitrogen to the cell and is effected by complex regulatory networks involving regulation at both the transcriptional and posttranslational levels. While the most detailed studies to date have been carried out with enteric bacteria, there is a considerable body of evidence to show that the nitrogen regulation (ntr) systems described in the enterics extend to many other genera. Furthermore, as the range of bacteria in which the phenomenon of nitrogen control is examined is being extended, new regulatory mechanisms are also being discovered. In this review, we have attempted to summarize recent research in prokaryotic nitrogen control to show the ubiquity of the ntr system, at least in gram-negative organisms and to identify those areas and groups of organisms about which there is much still to learn.

Publication

Genome analysis of the obligately lytic bacteriophage 4268 of Lactococcus lactis provides insight into its adaptable nature

Publisher: Elsevier BV

Date: 2006

DOI: 10.1016/J.GENE.2005.09.022

Abstract: Analysis of the complete nucleotide sequence of the lactococcal phage 4268, which is lytic for the cheese starter Lactococcus lactis DPC4268, is presented. Phage 4268 has a linear genome of 36,596 bp, which is modularly organised and encompasses 49 open reading frames. Putative functions were assigned to approximately 45% of the predicted products of these open reading frames based on sequence similarity with known proteins, N-terminal sequence analysis and identification of conserved domains. Significantly, a segment of the genome has homology to the recently sequenced lysogenic module in lactococcal phage phi31 that contains a lytic switch but no phage integrase or attachment site. This suggests that it is derived from a prophage. A phage 4268-encoded and a host-encoded methylase were found to be highly similar, having only two nucleotide mismatches, suggesting that the phage acquired the methylase gene to protect it from a host endonuclease. Comparative genomic analysis revealed significant homology between phage 4268 and the lactococcal phage BK5-T. The comparative analysis also supported the classification of phage 4268 and other BK5-T-related phage as separate from the proposed P335 species of lactococcal phage.

Publication

Correction: Corrigendum: Lytic to temperate switching of viral communities

Publisher: Springer Science and Business Media LLC

Date: 24-08-2016

DOI: 10.1038/NATURE19335

Publication

Experimental and Computational Assessment of Conditionally Essential Genes in Escherichia coli

Publisher: American Society for Microbiology

Date: 12-2006

DOI: 10.1128/JB.00740-06

Abstract: Genome-wide gene essentiality data sets are becoming available for Escherichia coli, but these data sets have yet to be analyzed in the context of a genome scale model. Here, we present an integrative model-driven analysis of the Keio E. coli mutant collection screened in this study on glycerol-supplemented minimal medium. Out of 3,888 single-deletion mutants tested, 119 mutants were unable to grow on glycerol minimal medium. These conditionally essential genes were then evaluated using a genome scale metabolic and transcriptional-regulatory model of E. coli, and it was found that the model made the correct prediction in ∼91% of the cases. The discrepancies between model predictions and experimental results were analyzed in detail to indicate where model improvements could be made or where the current literature lacks an explanation for the observed phenotypes. The identified set of essential genes and their model-based analysis indicates that our current understanding of the roles these essential genes play is relatively clear and complete. Furthermore, by analyzing the data set in terms of metabolic subsystems across multiple genomes, we can project which metabolic pathways are likely to play equally important roles in other organisms. Overall, this work establishes a paradigm that will drive model enhancement while simultaneously generating hypotheses that will ultimately lead to a better understanding of the organism.

Publication

Programmed ribosomal frameshifts, and how to find them

Publisher: Cold Spring Harbor Laboratory

Date: 12-04-2023

DOI: 10.1101/2023.04.10.536325

Abstract: One of the stranger phenomena that can occur during gene translation is where, as a ribosome reads along the mRNA, various cellular and molecular properties contribute to stalling the ribosome on a slippery sequence, shifting the ribosome into one of the other two alternate reading frames. The alternate frame has different codons, so different amino acids are added to the peptide chain, but more importantly, the original stop codon is no longer in-frame, so the ribosome can bypass the stop codon and continue to translate the codons past it. This produces a longer version of the protein, a fusion of the original in-frame amino acids, followed by all the alternate frame amino acids. There is currently no automated software to predict the occurrence of these programmed ribosomal frameshifts (PRF), and they are currently only identified by manual curation. Here we present the first machine-learning based method to detect and predict the presence of PRFs in all types of coding genes and taxa with an accuracy exceeding 90%.

Publication

Clinical Insights from Metagenomic Analysis of Sputum Samples from Patients with Cystic Fibrosis

Publisher: American Society for Microbiology

Date: 02-2014

DOI: 10.1128/JCM.02204-13

Abstract: As DNA sequencing becomes faster and cheaper, genomics-based approaches are being explored for their use in personalized diagnoses and treatments. Here, we provide a proof of principle for disease monitoring using personal metagenomic sequencing and traditional clinical microbiology by focusing on three adults with cystic fibrosis (CF). The CF lung is a dynamic environment that hosts a complex ecosystem composed of bacteria, viruses, and fungi that can vary in space and time. Not surprisingly, the microbiome data from the induced sputum samples we collected revealed a significant amount of species diversity not seen in routine clinical laboratory cultures. The relative abundances of several species changed as clinical treatment was altered, enabling the identification of the climax and attack communities that were proposed in an earlier work. All patient microbiomes encoded a diversity of mechanisms to resist antibiotics, consistent with the characteristics of multidrug-resistant microbial communities that are commonly observed in CF patients. The metabolic potentials of these communities differed by the health status and recovery route of each patient. Thus, this pilot study provides an example of how metagenomic data might be used with clinical assessments for the development of treatments tailored to individual patients.

Publication

Some of the most interesting CASP11 targets through the eyes of their authors

Publisher: Wiley

Date: 16-11-2015

DOI: 10.1002/PROT.24942

Publication

Real Time Metagenomics: Using k-mers to annotate metagenomes

Publisher: Oxford University Press (OUP)

Date: 09-10-2012

DOI: 10.1093/BIOINFORMATICS/BTS599

Abstract: Summary: Annotation of metagenomes involves comparing the individual sequence reads with a database of known sequences and assigning a unique function to each read. This is a time-consuming task that is computationally intensive (though not computationally complex). Here we present a novel approach to annotate metagenomes using unique k-mer oligopeptide sequences from 7 to 12 amino acids long. We demonstrate that k-mer-based annotations are faster and approach the sensitivity and precision of blastx-based annotations without losing accuracy. A last-common ancestor approach was also developed to describe the members of the community. Availability and implementation: This open-source application was implemented in Perl and can be accessed via a user-friendly website at tmg. In addition, code to access the annotation servers is available for download from FIGfams and k-mers are available for download from ftp://ftp.theseed.org/FIGfams/. Contact: redwards@mail.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
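The general idea of k-mer-based annotation described in this abstract can be sketched in a few lines of Python. This is a toy illustration, not the published Perl implementation: it assumes the read has already been translated to a peptide, that a lookup table of function-specific k-mers is available as a dictionary, and that the function with the most k-mer hits wins. The function name `annotate_peptide`, the voting rule, and the example table are all hypothetical.

```python
from collections import Counter


def annotate_peptide(peptide, kmer_to_function, k=8):
    """Assign a function to a peptide by counting signature k-mer hits.

    peptide          -- amino-acid string (translation of the read is assumed
                        to have been done elsewhere)
    kmer_to_function -- dict mapping amino-acid k-mers to a function label
                        (a stand-in for a real signature k-mer database)
    k                -- k-mer length; the abstract reports 7 to 12 residues
    Returns the label with the most hits, or None when no k-mer matches.
    """
    votes = Counter()
    # Slide a window of width k along the peptide and look up each k-mer.
    for start in range(len(peptide) - k + 1):
        label = kmer_to_function.get(peptide[start:start + k])
        if label is not None:
            votes[label] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy table; real tables would hold very many k-mers.
toy_table = {
    "MKTAYIAK": "hypothetical function A",
    "KTAYIAKQ": "hypothetical function A",
    "GGSGGSGG": "hypothetical function B",
}
best = annotate_peptide("MKTAYIAKQR", toy_table, k=8)
```

Because each lookup is a constant-time dictionary access, the cost grows linearly with read length, which is consistent with the abstract's point that the task is intensive rather than complex.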

Publication

Quality control and preprocessing of metagenomic datasets

Publisher: Oxford University Press (OUP)

Date: 28-01-2011

DOI: 10.1093/BIOINFORMATICS/BTR026

Abstract: Summary: Here, we present PRINSEQ for easy and rapid quality control and data preprocessing of genomic and metagenomic datasets. Summary statistics of FASTA (and QUAL) or FASTQ files are generated in tabular and graphical form and sequences can be filtered, reformatted and trimmed by a variety of options to improve downstream analysis. Availability and Implementation: This open-source application was implemented in Perl and can be used as a stand alone version or accessed online through a user-friendly web interface. The source code, user help and additional information are available at Contact: rschmied@sciences.sdsu.edu redwards@cs.sdsu.edu

Publication

Poster Session Abstracts

Publisher: Wiley

Date: 09-2012

DOI: 10.1002/PPUL.22682

Publication

Growth Score: a single metric to define growth in 96-well phenotype assays

Publisher: PeerJ

Date: 19-04-2018

DOI: 10.7717/PEERJ.4681

Abstract: High-throughput phenotype assays are a cornerstone of systems biology as they allow direct measurements of mutations, genes, strains, or even different genera. High-throughput methods also require data analytic methods that reduce complex time-series data to a single numeric evaluation. Here, we present the Growth Score, an improvement on the previous Growth Level formula. There is strong correlation between Growth Score and Growth Level, but the new Growth Score contains only essential growth curve properties while the formula of the previous Growth Level was convoluted and not easily interpretable. Several programs can be used to estimate the parameters required to calculate the Growth Score metric, including our PMAnalyzer pipeline.

Publication

Connecting genotype to phenotype in the era of high-throughput sequencing

Publisher: Elsevier BV

Date: 10-2011

DOI: 10.1016/J.BBAGEN.2011.03.010

Abstract: The development of next generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype: sequence annotation, and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single genome and metagenome data. This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.

Publication

Cyanobacterial biodiversity of semiarid public drinking water supply reservoirs assessed via next-generation DNA sequencing technology

Publisher: Springer Science and Business Media LLC

Date: 27-05-2019

DOI: 10.1007/S12275-019-8349-7

Abstract: Next-generation DNA sequencing technology was applied to generate molecular data from semiarid reservoirs during well-defined seasons. Target sequences of 16S-23S rRNA ITS and cpcBA-IGS were used to reveal the taxonomic groups of cyanobacteria present in the samples, and genes coding for cyanotoxins such as microcystins (mcyE), saxitoxins (sxtA), and cylindrospermopsins (cyrJ) were investigated. The presence of saxitoxins in the environmental samples was evaluated using an ELISA kit. Taxonomic analyses of high-throughput DNA sequencing data showed the dominance of the genus Microcystis in Mundaú reservoir. Furthermore, it was the most abundant genus in the dry season in Ingazeira reservoir. In the rainy season, 16S-23S rRNA ITS analysis revealed that Cylindrospermopsis raciborskii comprised 46.8% of the cyanobacterial community in Ingazeira reservoir, while the cpcBA-IGS region revealed that C. raciborskii (31.8%) was the most abundant taxon followed by Sphaerospermopsis aphanizomenoides (17.3%) and Planktothrix zahidii (16.6%). Despite the presence of other potential toxin-producing genera, the detected sxtA gene belonged to C. raciborskii, while the mcyE gene belonged to Microcystis in both reservoirs. The detected mcyE gene had good correlation with MC content, while the amplification of the sxtA gene was related to the presence of STX. The cyrJ gene was not detected in these samples. Using DNA analyses, our results showed that the cyanobacterial composition of Mundaú reservoir was similar in successive dry seasons, and it varied between seasons in Ingazeira reservoir. In addition, our data suggest that some biases of analysis influenced the cyanobacterial communities seen in the NGS output of Ingazeira reservoir.

Publication

The StkSR Two-Component System Influences Colistin Resistance in Acinetobacter baumannii

Publisher: MDPI AG

Date: 08-05-2022

DOI: 10.3390/MICROORGANISMS10050985

Abstract: Acinetobacter baumannii is an opportunistic human pathogen responsible for numerous severe nosocomial infections. Genome analysis on the A. baumannii clinical isolate 04117201 revealed the presence of 13 two-component signal transduction systems (TCS). Of these, we examined the putative TCS named here as StkSR. The stkR response regulator was deleted via homologous recombination and its progeny, ΔstkR, was phenotypically characterized. Antibiogram analyses of ΔstkR cells revealed a two-fold increase in resistance to the clinically relevant polymyxins, colistin and polymyxin B, compared to wildtype. PAGE-separation of silver stained purified lipooligosaccharide isolated from ΔstkR and wildtype cells ruled out the complete loss of lipooligosaccharide as the mechanism of colistin resistance identified for ΔstkR. Hydrophobicity analysis identified a phenotypical change of the bacterial cells when exposed to colistin. Transcriptional profiling revealed a significant up-regulation of the pmrCAB operon in ΔstkR compared to the parent, associating these two TCS and colistin resistance. These results reveal that there are multiple levels of regulation affecting colistin resistance; the suggested ‘cross-talk’ between the StkSR and PmrAB two-component systems highlights the complexity of these systems.

Publication

Hecatomb: An End-to-End Research Platform for Viral Metagenomics

Publisher: Cold Spring Harbor Laboratory

Date: 16-05-2022

DOI: 10.1101/2022.05.15.492003

Abstract: Analysis of viral diversity using modern sequencing technologies offers extraordinary opportunities for discovery. However, these analyses present a number of bioinformatic challenges due to viral genetic diversity and virome complexity. Due to the lack of conserved marker sequences, metagenomic detection of viral sequences requires a non-targeted, random (shotgun) approach. Annotation and enumeration of viral sequences relies on rigorous quality control and effective search strategies against appropriate reference databases. Virome analysis also benefits from the analysis of both individual metagenomic sequences as well as assembled contigs. Combined, virome analysis results in large amounts of data requiring sophisticated visualization and statistical tools. Here we introduce Hecatomb, a bioinformatics platform enabling both read and contig based analysis. Hecatomb integrates query information from both amino acid and nucleotide reference sequence databases. Hecatomb integrates data collected throughout the workflow enabling analyst driven virome analysis and discovery. Hecatomb is available on GitHub at handley/hecatomb. Hecatomb provides a single, modular software solution to the complex tasks required of many virome analyses. We demonstrate the value of the approach by applying Hecatomb to both a host-associated (enteric) and an environmental (marine) virome data set. Hecatomb provided data to determine true- or false-positive viral sequences in both data sets and revealed complex virome structure at distinct marine reef sites.

Publication

Combining de novo and reference-guided assembly with scaffold_builder

Publisher: Springer Science and Business Media LLC

Date: 22-11-2013

DOI: 10.1186/1751-0473-8-23

Abstract: Genome sequencing has become routine; however, genome assembly remains a challenge despite the computational advances in the last decade. In particular, the abundance of repeat elements in genomes makes it difficult to assemble them into a single complete sequence. Identical repeats shorter than the average read length can generally be assembled without issue. However, longer repeats such as ribosomal RNA operons cannot be accurately assembled using existing tools. The application Scaffold_builder was designed to generate scaffolds – super contigs of sequences joined by N-bases – based on the similarity to a closely related reference sequence. This is independent of mate-pair information and can be used complementarily for genome assembly, e.g. when mate-pairs are not available or have already been exploited. Scaffold_builder was evaluated using simulated pyrosequencing reads of the bacterial genomes Escherichia coli 042, Lactobacillus salivarius UCC118 and Salmonella enterica subsp. enterica serovar Typhi str. P-stx-12. Moreover, we sequenced two genomes from Salmonella enterica serovar Typhimurium LT2 G455 and Salmonella enterica serovar Typhimurium SDT1291 and show that Scaffold_builder decreases the number of contig sequences by 53% while more than doubling their average length. Scaffold_builder is written in Python and is available at caffold_builder. A web-based implementation is additionally provided to allow users to submit a reference genome and a set of contigs to be scaffolded.
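The notion of a scaffold as contigs "joined by N-bases" according to their position on a reference can be shown with a small Python sketch. It is not the Scaffold_builder code: it assumes each contig has already been placed on the reference by an alignment step that is not shown, that all contigs are in forward orientation, and that overlapping contigs are simply separated by a minimum gap rather than merged. The function name `build_scaffold` and its parameters are hypothetical.

```python
def build_scaffold(placements, min_gap=1):
    """Join contigs into one scaffold, padding the gaps with N bases.

    placements -- list of (ref_start, contig_sequence) tuples, where
                  ref_start is the 0-based reference coordinate at which the
                  contig aligns (assumed to come from a prior alignment)
    min_gap    -- number of Ns inserted when contigs abut or overlap
    """
    # Order contigs by where they start on the reference.
    ordered = sorted(placements, key=lambda item: item[0])

    pieces = []
    previous_end = None
    for ref_start, sequence in ordered:
        if previous_end is not None:
            # Gap size is estimated from the reference coordinates.
            gap = max(ref_start - previous_end, min_gap)
            pieces.append("N" * gap)
        pieces.append(sequence)
        previous_end = ref_start + len(sequence)
    return "".join(pieces)


# Three toy contigs given out of order; positions are made up.
scaffold = build_scaffold([(10, "ACGT"), (0, "TTTT"), (16, "GG")])
```

Estimating gap lengths from reference coordinates is what lets this kind of approach work without mate-pair information, at the cost of assuming that the reference and the sequenced genome are collinear.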

Publication

Genome Sequence of the Bacterioplanktonic, Mixotrophic Vibrio campbellii Strain PEL22A, Isolated in the Abrolhos Bank

Publisher: American Society for Microbiology

Date: 24-04-2012

DOI: 10.1128/JB.00377-12

Publication

Global microbialization of coral reefs

Publisher: Springer Science and Business Media LLC

Date: 25-04-2016

DOI: 10.1038/NMICROBIOL.2016.42

Abstract: Microbialization refers to the observed shift in ecosystem trophic structure towards higher microbial biomass and energy use. On coral reefs, the proximal causes of microbialization are overfishing and eutrophication, both of which facilitate enhanced growth of fleshy algae, conferring a competitive advantage over calcifying corals and coralline algae. The proposed mechanism for this competitive advantage is the DDAM positive feedback loop (dissolved organic carbon (DOC), disease, algae, microorganism), where DOC released by ungrazed fleshy algae supports copiotrophic, potentially pathogenic bacterial communities, ultimately harming corals and maintaining algal competitive dominance. Using an unprecedented data set of >400 samples from 60 coral reef sites, we show that the central DDAM predictions are consistent across three ocean basins. Reef algal cover is positively correlated with lower concentrations of DOC and higher microbial abundances. On turf and fleshy macroalgal-rich reefs, higher relative abundances of copiotrophic microbial taxa were identified. These microbial communities shift their metabolic potential for carbohydrate degradation from the more energy efficient Embden-Meyerhof-Parnas pathway on coral-dominated reefs to the less efficient Entner-Doudoroff and pentose phosphate pathways on algal-dominated reefs. This 'yield-to-power' switch by microorganism directly threatens reefs via increased hypoxia and greater CO2 release from the microbial respiration of DOC.

Publication

Baseline Assessment of Mesophotic Reefs of the Vitória-Trindade Seamount Chain Based on Water Quality, Microbial Diversity, Benthic Cover and Fish Biomass Data

Publisher: Public Library of Science (PLoS)

Date: 19-06-2015

DOI: 10.1371/JOURNAL.PONE.0130084

Publication

A Novel Group of Promiscuous Podophages Infecting Diverse Gammaproteobacteria from River Communities Exhibits Dynamic Intergenus Host Adaptation

Publisher: American Society for Microbiology

Date: 23-02-2021

DOI: 10.1128/MSYSTEMS.00773-20

Abstract: In natural environments, phages coexist and interact with a broad variety of bacteria, posing a conundrum for narrow-host-range phage maintenance in diverse communities. This context is rarely considered in the study of host-phage interactions, typically focused on narrow-host-range viruses and their infectivity in target bacteria isolated from sources distinct to where the phages were retrieved from.

Publication

NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements

Publisher: MDPI AG

Date: 16-09-2019

DOI: 10.3390/GENES10090714

Abstract: A wealth of viral data sits untapped in publicly available metagenomic data sets when it might be extracted to create a usable index for the virological research community. We hypothesized that work of this complexity and scale could be done in a hackathon setting. Ten teams comprised of over 40 participants from six countries, assembled to create a crowd-sourced set of analysis and processing pipelines for a complex biological data set in a three-day event on the San Diego State University campus starting 9 January 2019. Prior to the hackathon, 141,676 metagenomic data sets from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) were pre-assembled into contiguous assemblies (contigs) by NCBI staff. During the hackathon, a subset consisting of 2953 SRA data sets (approximately 55 million contigs) was selected, which were further filtered for a minimal length of 1 kb. This resulted in 4.2 million (Mio) contigs, which were aligned using BLAST against all known virus genomes, phylogenetically clustered and assigned metadata. Out of the 4.2 Mio contigs, 360,000 contigs were labeled with domains and an additional subset containing 4400 contigs was screened for virus or virus-like genes. The work yielded valuable insights into both SRA data and the cloud infrastructure required to support such efforts, revealing analysis bottlenecks and possible workarounds thereof. Mainly: (i) Conservative assemblies of SRA data improves initial analysis steps (ii) existing bioinformatic software with weak multithreading/multicore support can be elevated by wrapper scripts to use all cores within a computing node (iii) redesigning existing bioinformatic algorithms for a cloud infrastructure to facilitate its use for a wider audience and (iv) a cloud infrastructure allows a diverse group of researchers to collaborate effectively. The scientific findings will be extended during a follow-up event.
Here, we present the applied workflows, initial results, and lessons learned from the hackathon.

Publication

Phables: from fragmented assemblies to high-quality bacteriophage genomes

Publisher: Cold Spring Harbor Laboratory

Date: 04-04-2023

DOI: 10.1101/2023.04.04.535632

Abstract: Microbial communities found within the human gut have a strong influence on human health. Intestinal bacteria and viruses influence gastrointestinal diseases such as inflammatory bowel disease. Viruses infecting bacteria, known as bacteriophages, play a key role in modulating bacterial communities within the human gut. However, the identification and characterisation of novel bacteriophages remain a challenge. Available tools use similarities between sequences, nucleotide composition, and the presence of viral genes/proteins. Most available tools consider individual contigs to determine whether they are of viral origin. As a result of the challenges in viral assembly, fragmentation of viral genomes can occur, leading to the need for new approaches in viral identification. We introduce Phables, a new computational method to resolve bacteriophage genomes from fragmented viral metagenomic assemblies. Phables identifies bacteriophage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results of viral metagenomic samples obtained from different environments show that over 80% of the bacteriophage genomes resolved by Phables have high quality and are longer than the individual contigs identified by existing viral identification tools. Phables is available on GitHub at github.com/Vini2 hables (DOI: 10.5281/zenodo.7645166). vijini.mallawaarachchi@flinders.edu.au

Publication

The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes

Publisher: Springer Science and Business Media LLC

Date: 19-09-2008

DOI: 10.1186/1471-2105-9-386

Publication

Eye of newt and toe of frog

Publisher: Elsevier BV

Date: 06-1999

DOI: 10.1016/S0168-9525(98)01678-3

Publication

A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes

Publisher: Springer Science and Business Media LLC

Date: 24-07-2014

DOI: 10.1038/NCOMMS5498

Publication

Utilizing amino acid composition and entropy of potential open reading frames to identify protein-coding genes

Publisher: MDPI AG

Date: 08-01-2021

DOI: 10.3390/MICROORGANISMS9010129

Abstract: One of the main steps in gene-finding in prokaryotes is determining which open reading frames encode for a protein, and which occur by chance alone. There are many different methods to differentiate the two; the most prevalent approach is using shared homology with a database of known genes. This method presents many pitfalls, most notably the catch that you only find genes that you have seen before. The four most popular prokaryotic gene-prediction programs (GeneMark, Glimmer, Prodigal, Phanotate) all use a protein-coding training model to predict protein-coding genes, with the latter three allowing for the training model to be created ab initio from the input genome. Different methods are available for creating the training model, and to increase the accuracy of such tools, we present here GOODORFS, a method for identifying protein-coding genes within a set of all possible open reading frames (ORFS). Our workflow begins with taking the amino acid frequencies of each ORF, calculating an entropy density profile (EDP), using KMeans to cluster the EDPs, and then selecting the cluster with the lowest variation as the coding ORFs. To test the efficacy of our method, we ran GOODORFS on 14,179 annotated phage genomes, and compared our results to the initial training-set creation step of four other similar methods (Glimmer, MED2, PHANOTATE, Prodigal). We found that GOODORFS was the most accurate (0.94) and had the best F1-score (0.85), while Glimmer had the highest precision (0.92) and PHANOTATE had the highest recall (0.96).
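
The first two steps of the workflow described above (amino acid frequencies, then an entropy density profile) can be sketched as follows. This is an illustrative sketch only, not the GOODORFS code: the function name `entropy_density_profile` is hypothetical, and the EDP formula used here (each residue's entropy term divided by the total Shannon entropy) is one common definition that is assumed rather than taken from the abstract. The clustering step is omitted.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other character is ignored (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_density_profile(protein):
    """Return a 20-element profile for a translated ORF (hypothetical helper).

    Assumed definition: s_i = -(p_i * ln p_i) / H, where p_i is the frequency
    of amino acid i and H is the Shannon entropy of the frequency vector.
    """
    counts = Counter(aa for aa in protein.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    freqs = [counts[aa] / total for aa in AMINO_ACIDS]
    # Per-residue entropy terms; a frequency of zero contributes nothing.
    terms = [-p * math.log(p) if p > 0 else 0.0 for p in freqs]
    entropy = sum(terms)
    if entropy == 0:
        # Only one residue type is present, so the profile is undefined;
        # returning zeros is a choice made for this sketch.
        return [0.0] * len(AMINO_ACIDS)
    return [t / entropy for t in terms]
```

The resulting 20-dimensional vectors are the kind of input a k-means implementation could cluster, which is the step the abstract describes next.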

Publication

Mosaic Prophages with Horizontally Acquired Genes Account for the Emergence and Diversification of the Globally Disseminated M1T1 Clone of Streptococcus pyogenes

Publisher: American Society for Microbiology

Date: 15-05-2005

DOI: 10.1128/JB.187.10.3311-3318.2005

Abstract: The recrudescence of severe invasive group A streptococcal (GAS) diseases has been associated with relatively few strains, including the M1T1 subclone that has shown an unprecedented global spread and prevalence and high virulence in susceptible hosts. To understand its unusual epidemiology, we aimed to identify unique genomic features that differentiate it from the fully sequenced M1 SF370 strain. We constructed DNA microarrays from an M1T1 shotgun library and, using differential hybridization, we found that both M1 strains are 95% identical and that the 5% unique M1T1 clone sequences more closely resemble sequences found in the M3 strain, which is also associated with severe disease. Careful analysis of these unique sequences revealed three unique prophages that we named M1T1.X, M1T1.Y, and M1T1.Z. While M1T1.Y is similar to phage 370.3 of the M1-SF370 strain, M1T1.X and M1T1.Z are novel and encode the toxins SpeA2 and Sda1, respectively. The genomes of these prophages are highly mosaic, with different segments being related to distinct streptococcal phages, suggesting that GAS phages continue to exchange genetic material. Bioinformatic and phylogenetic analyses revealed a highly conserved open reading frame (ORF) adjacent to the toxins in 18 of the 21 toxin-carrying GAS prophages. We named this ORF paratox, determined its allelic distribution among different phages, and found linkage disequilibrium between particular paratox alleles and specific toxin genes, suggesting that they may move as a single cassette. Based on the conservation of paratox and other genes flanking the toxins, we propose a recombination-based model for toxin dissemination among prophages. We also provide evidence that a minor population of the M1T1 clonal isolates have exchanged their virulence module on phage M1T1.Y, replacing it with a different module identical to that found on a related M3 phage. 
Taken together, the data demonstrate that mosaicism of the GAS prophages has contributed to the emergence and diversification of the M1T1 subclone.

Publication

Elucidating genomic gaps using phenotypic profiles

Publisher: F1000 Research Ltd

Date: 17-10-2016

DOI: 10.12688/F1000RESEARCH.5140.2

Abstract: Advances in genomic sequencing provide the ability to model the metabolism of organisms from their genome annotation. The bioinformatics tools developed to deduce gene function through homology-based methods are dependent on public databases; thus, novel discoveries are not readily extrapolated from current analysis tools with a homology dependence. Multi-phenotype Assay Plates (MAPs) provide a high-throughput method to profile bacterial phenotypes by growing bacteria in various growth conditions, simultaneously. More robust and accurate computational models can be constructed by coupling MAPs with current genomic annotation methods. PMAnalyzer is an online tool that analyzes bacterial growth curves from the MAP system which are then used to optimize metabolic models during in silico growth simulations. Using Citrobacter sedlakii as a prototype, the Rapid Annotation using Subsystem Technology (RAST) tool produced a model consisting of 1,367 enzymatic reactions. After the optimization, 44 reactions were added to, or modified within, the model. The model correctly predicted the outcome on 93% of growth experiments.

Publication

Lysogeny and Sporulation in Bacillus Isolates from the Gulf of Mexico

Publisher: American Society for Microbiology

Date: 02-2010

DOI: 10.1128/AEM.01710-09

Abstract: Eleven Bacillus isolates from the surface and subsurface waters of the Gulf of Mexico were examined for their capacity to sporulate and harbor prophages. Occurrence of sporulation in each isolate was assessed through decoyinine induction, and putative lysogens were identified by prophage induction by mitomycin C treatment. No obvious correlation between ability to sporulate and prophage induction was found. Four strains that contained inducible virus-like particles (VLPs) were shown to sporulate. Four strains did not produce spores upon induction by decoyinine but contained inducible VLPs. Two of the strains did not produce virus-like particles or sporulate significantly upon induction. Isolate B14905 had a high level of virus-like particle production and a high occurrence of sporulation and was further examined by genomic sequencing in an attempt to shed light on the relationship between sporulation and lysogeny. In silico analysis of the B14905 genome revealed four prophage-like regions, one of which was independently sequenced from a mitomycin C-induced lysate. Based on PCR and transmission electron microscopy (TEM) analysis of an induced phage lysate, one is a noninducible phage remnant, one may be a defective phage-like bacteriocin, and two were inducible prophages. One of the inducible phages contained four putative transcriptional regulators, one of which was a SinR-like regulator that may be involved in the regulation of host sporulation. Isolates that both possess the capacity to sporulate and contain temperate phage may be well adapted for survival in the oligotrophic ocean.

Publication

A diversity-generating retroelement encoded by a globally ubiquitous Bacteroides phage

Publisher: Springer Science and Business Media LLC

Date: 23-10-2018

DOI: 10.1186/S40168-018-0573-6

Publication

Prodigious Prevotella phages

Publisher: Springer Science and Business Media LLC

Date: 21-03-2019

DOI: 10.1038/S41564-019-0419-4

Publication

Viral diversity and dynamics in an infant gut

Publisher: Elsevier BV

Date: 06-2008

DOI: 10.1016/J.RESMIC.2008.04.006

Abstract: Metagenomic sequencing of DNA viruses from the feces of a healthy week-old infant revealed a viral community with extremely low diversity. The identifiable sequences were dominated by phages, which likely influence the diversity and abundance of co-occurring microbes. The most abundant fecal viral sequences did not originate from breast milk or formula, suggesting a non-dietary initial source of viruses. Certain sequences were stable in the infant's gut over the first 3 months of life, but microarray experiments demonstrated that the overall viral community composition changed dramatically between 1 and 2 weeks of age.

Publication

RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Publisher: Springer Science and Business Media LLC

Date: 10-02-2015

DOI: 10.1038/SREP08365

Publication

Charting the diversity of uncultured viruses of Archaea and Bacteria

Publisher: Springer Science and Business Media LLC

Date: 12-2019

DOI: 10.1186/S12915-019-0723-8

Abstract: Viruses of Archaea and Bacteria are among the most abundant and diverse biological entities on Earth. Unraveling their biodiversity has been challenging due to methodological limitations. Recent advances in culture-independent techniques, such as metagenomics, shed light on the unknown viral diversity, revealing thousands of new viral nucleotide sequences at an unprecedented scale. However, these novel sequences have not been properly classified and the evolutionary associations between them were not resolved. Here, we performed phylogenomic analysis of nearly 200,000 viral nucleotide sequences to establish GL-UVAB: Genomic Lineages of Uncultured Viruses of Archaea and Bacteria. The pan-genome content of the identified lineages shed light on some of their infection strategies, potential to modulate host physiology, and mechanisms to escape host resistance systems. Furthermore, using GL-UVAB as a reference database for annotating metagenomes revealed elusive habitat distribution patterns of viral lineages and environmental drivers of community composition. These findings provide insights about the genomic diversity and ecology of viruses of prokaryotes. The source code used in these analyses is freely available at rojects/gluvab/ .

Publication

Chromosomal Rearrangements Formed by rrn Recombination Do Not Improve Replichore Balance in Host-Specific Salmonella enterica Serovars

Publisher: Public Library of Science (PLoS)

Date: 19-10-2010

DOI: 10.1371/JOURNAL.PONE.0013503

Publication

Bacterial Community Associated with the Reef Coral Mussismilia braziliensis's Momentum Boundary Layer over a Diel Cycle

Publisher: Frontiers Media SA

Date: 22-05-2017

DOI: 10.3389/FMICB.2017.00784

Publication

The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation

Publisher: Oxford University Press (OUP)

Date: 03-01-2007

DOI: 10.1093/NAR/GKL947

Publication

Whole proteome analysis of post-translational modifications: Applications of mass-spectrometry for proteogenomic annotation

Publisher: Cold Spring Harbor Laboratory

Date: 09-08-2007

DOI: 10.1101/GR.6427907

Abstract: While bacterial genome annotations have significantly improved in recent years, techniques for bacterial proteome annotation (including post-translational chemical modifications, signal peptides, proteolytic events, etc.) are still in their infancy. At the same time, the number of sequenced bacterial genomes is rising sharply, far outpacing our ability to validate the predicted genes, let alone annotate bacterial proteomes. In this study, we use tandem mass spectrometry (MS/MS) to annotate the proteome of Shewanella oneidensis MR-1, an important microbe for bioremediation. In particular, we provide the first comprehensive map of post-translational modifications in a bacterial genome, including a large number of chemical modifications, signal peptide cleavages, and cleavages of N-terminal methionine residues. We also detect multiple genes that were missed or assigned incorrect start positions by gene prediction programs, and suggest corrections to improve the gene annotation. This study demonstrates that complementing every genome sequencing project by an MS/MS project would significantly improve both genome and proteome annotations for a reasonable cost.

Publication

Erratum: Functional metagenomic profiling of nine biomes

Publisher: Springer Science and Business Media LLC

Date: 10-2008

DOI: 10.1038/NATURE07346

Publication

Functional metagenomic profiling of nine biomes

Publisher: Springer Science and Business Media LLC

Date: 12-03-2008

DOI: 10.1038/NATURE06810

Abstract: Microbial activities shape the biogeochemistry of the planet and macroorganism health. Determining the metabolic processes performed by microbes is important both for understanding and for manipulating ecosystems (for example, disruption of key processes that lead to disease, conservation of environmental services, and so on). Describing microbial function is hampered by the inability to culture most microbes and by high levels of genomic plasticity. Metagenomic approaches analyse microbial communities to determine the metabolic processes that are important for growth and survival in any given environment. Here we conduct a metagenomic comparison of almost 15 million sequences from 45 distinct microbiomes and, for the first time, 42 distinct viromes and show that there are strongly discriminatory metabolic profiles across environments. Most of the functional diversity was maintained in all of the communities, but the relative occurrence of metabolisms varied, and the differences between metagenomes predicted the biogeochemical conditions of each environment. The magnitude of the microbial metabolic capabilities encoded by the viromes was extensive, suggesting that they serve as a repository for storing and sharing genes among their microbial hosts and influence global evolutionary and metabolic processes.

Publication

The smallest cells pose the biggest problems: high-performance computing and the analysis of metagenome sequence data

Publisher: IOP Publishing

Date: 07-2008

DOI: 10.1088/1742-6596/125/1/012050

Publication

Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software

Publisher: Springer Science and Business Media LLC

Date: 02-10-2017

DOI: 10.1038/NMETH.4458

Publication

RaFAH: Host prediction for viruses of Bacteria and Archaea based on protein content

Publisher: Elsevier BV

Date: 07-2021

DOI: 10.1016/J.PATTER.2021.100274

Publication

Perspective on taxonomic classification of uncultivated viruses

Publisher: Elsevier BV

Date: 12-2021

DOI: 10.1016/J.COVIRO.2021.10.011

Abstract: Historically, virus taxonomy has been limited to describing viruses that were readily cultivated in the laboratory or emerging in natural biomes. Metagenomic analyses, single-particle sequencing, and database mining efforts have yielded new sequence data on an astounding number of previously unknown viruses. As metagenomes are relatively free of biases, these data provide an unprecedented insight into the vastness of the virosphere, but to properly value the extent of this diversity it is critical that the viruses are taxonomically classified. Inclusion of uncultivated viruses has already improved the process as well as the understanding of the taxa, viruses, and their evolutionary relationships. The continuous development and testing of computational tools will be required to maintain a dynamic virus taxonomy that can accommodate the new discoveries.

Publication

Fastq-pair: Efficient synchronization of paired-end fastq files

Publisher: Cold Spring Harbor Laboratory

Date: 19-02-2019

DOI: 10.1101/552885

Abstract: Paired end DNA sequencing provides additional information about the sequence data that is used in sequence assembly, mapping, and other downstream bioinformatics analysis. Paired end reads are usually provided as two fastq-format files, with each file representing one end of the read. Many commonly used downstream tools require that the sequence reads appear in each file in the same order, and reads that do not have a pair in the corresponding file are placed in a separate file of singletons. Although most sequencing instruments capable of generating paired end reads produce files where each read has a corresponding mate, many downstream bioinformatics manipulations break the one-to-one correspondence between reads, and paired-end sequence files lose synchronicity, and contain either unordered sequences or sequences in one or other file without a mate. Trivial solutions to this problem require reading one or both of the DNA sequence files into memory but quickly become limited by computational resources for moderate to large sized sequence files that are common nowadays. Here, we introduce a fast and memory efficient solution, written in C for portability, that synchronizes paired-end fastq files for subsequent analysis and places unmatched reads into singleton files. Fastq-pair is freely available from insalrob/fastq-pair and is released under the MIT license.
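
To make the synchronization problem concrete, the sketch below shows the "trivial" in-memory approach that the abstract contrasts with Fastq-pair. It is not the Fastq-pair implementation (which is written in C and designed to be memory efficient). The function names are hypothetical, and the handling of `/1` and `/2` read-name suffixes is an assumption about how mates are named.

```python
def parse_fastq(text):
    """Split FASTQ text into (header, sequence, plus, quality) tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per record")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def read_id(header):
    """Return the read name without '@' and without a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def pair_reads(left, right):
    """Return (paired_left, paired_right, single_left, single_right).

    Paired records come out in the order of the left file, so the two
    paired lists are synchronized position by position.
    """
    # Holding one whole file in a dictionary is what limits this approach
    # to small inputs.
    right_by_id = {read_id(rec[0]): rec for rec in right}
    paired_left, paired_right, single_left = [], [], []
    matched = set()
    for rec in left:
        rid = read_id(rec[0])
        mate = right_by_id.get(rid)
        if mate is None:
            single_left.append(rec)
        else:
            paired_left.append(rec)
            paired_right.append(mate)
            matched.add(rid)
    single_right = [rec for rec in right if read_id(rec[0]) not in matched]
    return paired_left, paired_right, single_left, single_right
```

For example, if the left file holds reads r1 and r2 and the right file holds r3 and r1, the paired lists each contain r1, while r2 and r3 end up in the two singleton lists.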

Publication

Statement in Support of: “Virology under the Microscope—a Call for Rational Discourse”

Publisher: American Society for Microbiology

Date: 27-06-2023

DOI: 10.1128/MBIO.00815-23

Publication

Genome-Wide Study of the Defective Sucrose Fermenter Strain of Vibrio cholerae from the Latin American Cholera Epidemic

Publisher: Public Library of Science (PLoS)

Date: 25-05-2012

DOI: 10.1371/JOURNAL.PONE.0037283

Publication

Modeling of the Coral Microbiome: the Influence of Temperature and Microbial Network

Publisher: American Society for Microbiology

Date: 28-04-2020

DOI: 10.1128/MBIO.02691-19

Abstract: Coral microbiome dysbiosis (i.e., shifts in the microbial community structure or complete loss of microbial symbionts) caused by environmental changes is a key player in the decline of coral health worldwide. Multiple factors in the water column and the surrounding biological community influence the dynamics of the coral microbiome. However, by including only temperature as an external factor, our model proved to be successful in describing the microbial community associated with the surface mucus layer (SML) of the coral P. strigosa . The dynamic model developed and validated in this study is a potential tool to predict the coral microbiome under different temperature conditions.

Publication

Genomic and ecological attributes of marine bacteriophages encoding bacterial virulence genes

Publisher: Springer Science and Business Media LLC

Date: 05-02-2020

DOI: 10.1186/S12864-020-6523-2

Abstract: Bacteriophages encode genes that modify bacterial functions during infection. The acquisition of phage-encoded virulence genes is a major mechanism for the rise of bacterial pathogens. In coral reefs, high bacterial density and lysogeny has been proposed to exacerbate reef decline through the transfer of phage-encoded virulence genes. However, the functions and distribution of these genes in phage virions on the reef remain unknown. Here, over 28,000 assembled viral genomes from the free viral community in Atlantic and Pacific Ocean coral reefs were queried against a curated database of virulence genes. The diversity of virulence genes encoded in the viral genomes was tested for relationships with host taxonomy and bacterial density in the environment. These analyses showed that bacterial density predicted the profile of virulence genes encoded by phages. The Shannon diversity of virulence-encoding phages was negatively related with bacterial density, leading to dominance of fewer genes at high bacterial abundances. A statistical learning analysis showed that reefs with high microbial density were enriched in viruses encoding genes enabling bacterial recognition and invasion of metazoan epithelium. Over 60% of phages could not have their hosts identified due to limitations of host prediction tools; for those whose hosts were identified, host taxonomy was not an indicator of the presence of virulence genes. This study described bacterial virulence factors encoded in the genomes of bacteriophages at the community level. The results showed that the increase in microbial densities that occurs during coral reef degradation is associated with a change in the genomic repertoire of bacteriophages, specifically in the diversity and distribution of bacterial virulence genes. This suggests that phages are implicated in the rise of pathogens in disturbed marine ecosystems.

Publication

No Evidence Known Viruses Play a Role in the Pathogenesis of Onchocerciasis-Associated Epilepsy. An Explorative Metagenomic Case-Control Study

Publisher: MDPI AG

Date: 22-06-2021

DOI: 10.3390/PATHOGENS10070787

Abstract: Despite the increasing epidemiological evidence that the Onchocerca volvulus parasite is strongly associated with epilepsy in children, hence the name onchocerciasis-associated epilepsy (OAE), the pathophysiological mechanism of OAE remains to be elucidated. In June 2014, children with unprovoked convulsive epilepsy and healthy controls were enrolled in a case control study in Titule, Bas-Uélé Province in the Democratic Republic of the Congo (DRC) to identify risk factors for epilepsy. Using a subset of samples collected from individuals enrolled in this study (16 persons with OAE and 9 controls) plasma, buffy coat, and cerebrospinal fluid (CSF) were subjected to random-primed next-generation sequencing. The resulting sequences were analyzed using sensitive computational methods to identify viral DNA and RNA sequences. Anelloviridae, Flaviviridae, Hepadnaviridae (Hepatitis B virus), Herpesviridae, Papillomaviridae, Polyomaviridae (Human polyomavirus), and Virgaviridae were identified in cases and in controls. Not unexpectedly, a variety of bacteriophages were also detected in all cases and controls. However, none of the identified viral sequences were found enriched in OAE cases, which was our criteria for agents that might play a role in the etiology or pathogenesis of OAE.

Publication

Multi-Analytical Approach Reveals Potential Microbial Indicators in Soil for Sugarcane Model Systems

Publisher: Public Library of Science (PLoS)

Date: 09-06-2015

DOI: 10.1371/JOURNAL.PONE.0129765

Publication

Comparative genomics of closely related salmonellae

Publisher: Elsevier BV

Date: 02-2002

DOI: 10.1016/S0966-842X(01)02293-4

Abstract: As the number of completed genome sequences increases, there is increasing emphasis on comparative genomic analysis of closely related organisms. Comparison of the similarities and differences between the five publicly available Salmonella genome sequences reveals extensive sequence conservation among the Salmonella serovars. However, horizontal gene transfer has provided each genome with between 10% and 12% of unique DNA. Genome comparisons of the closely related salmonellae emphasize the insights that can be gleaned from sequencing genomes of a single species.

Publication

Taxonomic and Functional Microbial Signatures of the Endemic Marine Sponge Arenosclera brasiliensis

Publisher: Public Library of Science (PLoS)

Date: 02-07-2012

DOI: 10.1371/JOURNAL.PONE.0039905

Publication

Erratum for Cazares et al., “A Novel Group of Promiscuous Podophages Infecting Diverse Gammaproteobacteria from River Communities Exhibits Dynamic Intergenus Host Adaptation”

Publisher: American Society for Microbiology

Date: 28-06-2022

DOI: 10.1128/MSYSTEMS.00426-22

Publication

Draft Genome Sequence of the Shrimp Pathogen Vibrio harveyi CAIM 1792

Publisher: American Society for Microbiology

Date: 15-04-2012

DOI: 10.1128/JB.00079-12

Abstract: Vibrio harveyi is a Gram-negative bacterium found in tropical and temperate marine environments as a free-living organism or in association with aquatic animals. We report the first sequenced genome of a Vibrio harveyi strain, CAIM 1792, the etiologic agent of the “bright red” syndrome of the Pacific white shrimp Litopenaeus vannamei .

Publication

SEED Servers: High-Performance Access to the SEED Genomes, Annotations, and Metabolic Models

Publisher: Public Library of Science (PLoS)

Date: 24-10-2012

DOI: 10.1371/JOURNAL.PONE.0048053

Publication

Towards Predicting Gut Microbial Metabolism: Integration of Flux Balance Analysis and Untargeted Metabolomics

Publisher: MDPI AG

Date: 17-04-2020

DOI: 10.3390/METABO10040156

Abstract: Genomics-based metabolic models of microorganisms currently have no easy way of corroborating predicted biomass with the actual metabolites being produced. This study uses untargeted mass spectrometry-based metabolomics data to generate a list of accurate metabolite masses produced from the human commensal bacteria Citrobacter sedlakii grown in the presence of a simple glucose carbon source. A genomics-based flux balance metabolic model of this bacterium was previously generated using the bioinformatics tool PyFBA and phenotypic growth curve data. The high-resolution mass spectrometry data obtained through timed metabolic extractions were integrated with the predicted metabolic model through a program called MS_FBA. This program correlated untargeted metabolomics features from C. sedlakii with 218 of the 699 metabolites in the model using an exact mass match, with 51 metabolites further confirmed using predicted isotope ratios. Over 1400 metabolites were matched with additional metabolites in the ModelSEED database, indicating the need to incorporate more specific gene annotations into the predictive model through metabolomics-guided gap filling.
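
The exact-mass matching step mentioned above can be illustrated with a small sketch. This is not the MS_FBA program: the function name `match_by_mass`, the 10 ppm tolerance, and the two example metabolites are assumptions chosen for illustration, and the sketch compares neutral monoisotopic masses directly without modeling adducts or isotope ratios.

```python
def match_by_mass(observed_masses, model_masses, ppm=10.0):
    """Map each model metabolite to the observed masses within a ppm window.

    observed_masses: iterable of measured neutral masses (Da).
    model_masses: dict of metabolite name -> monoisotopic mass (Da).
    ppm: relative tolerance in parts per million (assumed value).
    """
    matches = {}
    for mass in observed_masses:
        for name, reference in model_masses.items():
            # Relative mass error expressed in parts per million.
            error_ppm = abs(mass - reference) / reference * 1e6
            if error_ppm <= ppm:
                matches.setdefault(name, []).append(mass)
    return matches


# Illustrative reference masses (C6H12O6 and C3H4O3), not data from the study.
model = {"glucose": 180.06339, "pyruvate": 88.01604}
observed = [180.0640, 88.0300, 150.0]
hits = match_by_mass(observed, model)
```

Here only the first observed mass falls inside the window for glucose; the second is far outside the window for pyruvate, and the third matches nothing.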

Publication

Genome of Staphylococcal Phage K: a New Lineage of Myoviridae Infecting Gram-Positive Bacteria with a Low G+C Content

Publisher: American Society for Microbiology

Date: 05-2004

DOI: 10.1128/JB.186.9.2862-2871.2004

Abstract: Phage K is a polyvalent phage of the Myoviridae family which is active against a wide range of staphylococci. Phage genome sequencing revealed a linear DNA genome of 127,395 bp, which carries 118 putative open reading frames. The genome is organized in a modular form, encoding modules for lysis, structural proteins, DNA replication, and transcription. Interestingly, the structural module shows high homology to the structural module from Listeria phage A511, suggesting intergenus horizontal transfer. In addition, phage K exhibits the potential to encode proteins necessary for its own replisome, including DNA ligase, primase, helicase, polymerase, RNase H, and DNA binding proteins. Phage K has a complete absence of GATC sites, making it insensitive to restriction enzymes which cleave this sequence. Three introns (lys-I1, pol-I2, and pol-I3) encoding putative endonucleases were located in the genome. Two of these (pol-I2 and pol-I3) were found to interrupt the DNA polymerase gene, while the other (lys-I1) interrupts the lysin gene. Two of the introns encode putative proteins with homology to HNH endonucleases, whereas the other encodes a 270-amino-acid protein which contains two zinc fingers (CX₂CX₂₂CX₂C and CX₂CX₂₃CX₂C). The availability of the genome of this highly virulent phage, which is active against infective staphylococci, should provide new insights into the biology and evolution of large broad-spectrum polyvalent phages.

Publication

Using viromes to predict novel immune proteins in non-model organisms

Publisher: The Royal Society

Date: 31-08-2016

DOI: 10.1098/RSPB.2016.1200

Abstract: Immunity is mostly studied in a few model organisms, leaving the majority of immune systems on the planet unexplored. To characterize the immune systems of non-model organisms alternative approaches are required. Viruses manipulate host cell biology through the expression of proteins that modulate the immune response. We hypothesized that metagenomic sequencing of viral communities would be useful to identify both known and unknown host immune proteins. To test this hypothesis, a mock human virome was generated and compared to the human proteome using tBLASTn, resulting in 36 proteins known to be involved in immunity. This same pipeline was then applied to reef-building coral, a non-model organism that currently lacks traditional molecular tools like transgenic animals, gene-editing capabilities, and in vitro cell cultures. Viromes isolated from corals and compared with the predicted coral proteome resulted in 2503 coral proteins, including many proteins involved with pathogen sensing and apoptosis. There were also 159 coral proteins predicted to be involved with coral immunity but currently lacking any functional annotation. The pipeline described here provides a novel method to rapidly predict host immune components that can be applied to virtually any system with the potential to discover novel immune proteins.

Publication

From DNA to FBA: How to Build Your Own Genome-Scale Metabolic Model

Publisher: Frontiers Media SA

Date: 17-06-2016

DOI: 10.3389/FMICB.2016.00907

Publication

Allelic variation contributes to bacterial host specificity

Publisher: Springer Science and Business Media LLC

Date: 30-10-2015

DOI: 10.1038/NCOMMS9754

Abstract: Understanding the molecular parameters that regulate cross-species transmission and host adaptation of potential pathogens is crucial to control emerging infectious disease. Although microbial pathotype diversity is conventionally associated with gene gain or loss, the role of pathoadaptive nonsynonymous single-nucleotide polymorphisms (nsSNPs) has not been systematically evaluated. Here, our genome-wide analysis of core genes within Salmonella enterica serovar Typhimurium genomes reveals a high degree of allelic variation in surface-exposed molecules, including adhesins that promote host colonization. Subsequent multinomial logistic regression, MultiPhen and Random Forest analyses of known/suspected adhesins from 580 independent Typhimurium isolates identifies distinct host-specific nsSNP signatures. Moreover, population and functional analyses of host-associated nsSNPs for FimH, the type 1 fimbrial adhesin, highlights the role of key allelic residues in host-specific adherence in vitro. Together, our data provide the first concrete evidence that functional differences between allelic variants of bacterial proteins likely contribute to pathoadaption to diverse hosts.

Publication

Taxonomy of prokaryotic viruses: update from the ICTV bacterial and archaeal viruses subcommittee

Publisher: Springer Science and Business Media LLC

Date: 05-01-2016

DOI: 10.1007/S00705-015-2728-0

Publication

Fimbrial expression in enteric bacteria: a critical step in intestinal pathogenesis

Publisher: Elsevier BV

Date: 07-1998

DOI: 10.1016/S0966-842X(98)01288-8

Abstract: The ability of species of enteric bacteria to recognize and colonize unique niches along the intestine is mainly based on receptor distribution and interpretation of a combination of environmental signals leading to the expression of specific adherence factors. Such elaborate orchestration of events is critical during the initial steps of pathogenesis.

Publication

Cystic Fibrosis Rapid Response: Translating Multi-omics Data into Clinically Relevant Information

Publisher: American Society for Microbiology

Date: 30-04-2019

DOI: 10.1128/MBIO.00431-19

Abstract: Proper management of polymicrobial infections in patients with cystic fibrosis (CF) has extended their life span. Information about the composition and dynamics of each patient’s microbial community aids in the selection of appropriate treatment of pulmonary exacerbations. We propose the cystic fibrosis rapid response (CFRR) as a fast approach to determine viral and microbial community composition and activity during CF pulmonary exacerbations. The CFRR potential is illustrated with a case study in which a cystic fibrosis fatal exacerbation was characterized by the presence of shigatoxigenic Escherichia coli. The incorporation of the CFRR within the CF clinic could increase the life span and quality of life of CF patients.

Publication

The human gut virome: composition, colonization, interactions, and impacts on human health

Publisher: Frontiers Media SA

Date: 24-05-2023

DOI: 10.3389/FMICB.2023.963173

Abstract: The gut virome is an incredibly complex part of the gut ecosystem. Gut viruses play a role in many disease states, but it is unknown to what extent the gut virome impacts everyday human health. New experimental and bioinformatic approaches are required to address this knowledge gap. Gut virome colonization begins at birth and is considered unique and stable in adulthood. The stable virome is highly specific to each individual and is modulated by varying factors such as age, diet, disease state, and use of antibiotics. The gut virome primarily comprises bacteriophages, predominantly order Crassvirales, also referred to as crAss-like phages, in industrialized populations and other Caudoviricetes (formerly Caudovirales). The stability of the virome’s regular constituents is disrupted by disease. Transferring the fecal microbiome, including its viruses, from a healthy individual can restore the functionality of the gut. It can alleviate symptoms of chronic illnesses such as colitis caused by Clostridioides difficile. Investigation of the virome is a relatively novel field, with new genetic sequences being published at an increasing rate. A large percentage of unknown sequences, termed ‘viral dark matter’, is one of the significant challenges facing virologists and bioinformaticians. To address this challenge, strategies include mining publicly available viral datasets, untargeted metagenomic approaches, and utilizing cutting-edge bioinformatic tools to quantify and classify viral species. Here, we review the literature surrounding the gut virome, its establishment, its impact on human health, the methods used to investigate it, and the viral dark matter veiling our understanding of the gut virome.

Publication

Improved allelic exchange vectors and their use to analyze 987P fimbria gene expression

Publisher: Elsevier BV

Date: 1998

DOI: 10.1016/S0378-1119(97)00619-7

Abstract: A series of vectors has been developed to provide improved positive and negative selection for allelic exchange. Based on homologous regions of DNA ranging in size from less than 200 bp to over 1 kb, we have successfully used these new plasmids to introduce or remove markers in chromosomal or plasmid DNA. Wild type fimbria genes were replaced both in Salmonella enteritidis (sefA, agfA and fimC) and Escherichia coli (fasA and fasH). Regulation of 987P fimbriation could be identified after replacement of fasA and fasH with allelic reporter fusions. The expression of fasA but not fasH is dependent upon the osmolarity of the growth medium in an HNS-dependent manner, but unlike some other fimbrial systems expression is not dependent on the exogenous iron concentration.

Publication

Finding novel genes in bacterial communities isolated from the environment

Publisher: Oxford University Press (OUP)

Date: 15-07-2006

DOI: 10.1093/BIOINFORMATICS/BTL247

Abstract: Motivation: Novel sequencing techniques can give access to organisms that are difficult to cultivate using conventional methods. When applied to environmental samples, the data generated has some drawbacks, e.g. short length of assembled contigs, in-frame stop codons and frame shifts. Unfortunately, current gene finders cannot circumvent these difficulties. At the same time, the automated prediction of genes is a prerequisite for the increasing amount of genomic sequences to ensure progress in metagenomics. Results: We introduce a novel gene finding algorithm that incorporates features overcoming the short length of the assembled contigs from environmental data, in-frame stop codons as well as frame shifts contained in bacterial sequences. The results show that by searching for sequence similarities in an environmental sample our algorithm is capable of detecting a high fraction of its gene content, depending on the species composition and the overall size of the sample. The method is valuable for hunting novel unknown genes that may be specific for the habitat where the sample is taken. Finally, we show that our algorithm can even exploit the limited information contained in the short reads generated by 454 technology for the prediction of protein coding genes. Availability: The program is freely available upon request. Contact: Lutz.Krause@CeBiTec.Uni-Bielefeld.DE

Publication

Towards the biogeography of butyrate-producing bacteria

Publisher: Cold Spring Harbor Laboratory

Date: 08-10-2022

DOI: 10.1101/2022.10.07.510278

Abstract: Butyrate-producing bacteria are found in many outdoor ecosystems and host organisms, including humans, and are vital to ecosystem functionality and human health. These bacteria ferment organic matter, producing the short-chain fatty acid butyrate. However, few (if any) studies have examined the macroecological influences on their large-scale biogeographical distribution. Here we aimed to characterise their global biogeography together with key explanatory climatic, geographic, and physicochemical variables. Location: Global, and the Australian continent. Time period: 2005-2020. Major taxa studied: Butyrate-producing bacteria. We developed new normalised butyrate production capacity (BPC) indices derived from global metagenomic (n = 13,078) and Australia-wide soil 16S rRNA (n = 1,331) data, using Geographic Information System (GIS) and modelling techniques to detail their ecological and biogeographical associations. The highest BPC scores were found in anoxic and fermentative environments, including the human and non-human animal gut, and in some plant-soil systems. Within plant-soil systems, roots and rhizospheres had the highest BPC scores. Among soil samples, geographic and climatic variables had the strongest overall influence on BPC scores, with human influence also making key contributions. Higher BPC scores were in soils from seasonally productive sandy rangelands, temperate rural residential areas, and sites with moderate-to-high soil iron concentrations. Abundances of butyrate-producing bacteria in outdoor soils follow complex ecological patterns influenced by geography, climate, soil chemistry, and hydrological fluctuations. Human population density and soil iron also play substantial roles, and their effects are dependent on a combination of ecological variables. These new biogeographical insights further our understanding of the global ecology patterns of butyrate-producing bacteria, with implications for emerging microbially-focussed ecological and human health policies.

Publication

Phage–bacteria relationships and CRISPR elements revealed by a metagenomic survey of the rumen microbiome

Publisher: Wiley

Date: 17-10-2011

DOI: 10.1111/J.1462-2920.2011.02593.X

Abstract: Viruses are the most abundant biological entities on the planet and play an important role in balancing microbes within an ecosystem and facilitating horizontal gene transfer. Although bacteriophages are abundant in rumen environments, little is known about the types of viruses present or their interaction with the rumen microbiome. We undertook random pyrosequencing of virus-enriched metagenomes (viromes) isolated from bovine rumen fluid and analysed the resulting data using comparative metagenomics. A high level of diversity was observed with up to 28,000 different viral genotypes obtained from each environment. The majority (~78%) of sequences did not match any previously described virus. Prophages outnumbered lytic phages approximately 2:1 with the most abundant bacteriophage and prophage types being associated with members of the dominant rumen phyla (Firmicutes and Proteobacteria). Metabolic profiling based on SEED subsystems revealed an enrichment of sequences with putative functional roles in DNA and protein metabolism, but a surprisingly low proportion of sequences assigned to carbohydrate and amino acid metabolism. We expanded our analysis to include previously described metagenomic data and 14 reference genomes. Clustered regularly interspaced short palindromic repeats (CRISPR) were detected in most of the microbial genomes, suggesting previous interactions between viral and microbial communities.

Publication

Applying Shannon's information theory to bacterial and phage genomes and metagenomes

Publisher: Springer Science and Business Media LLC

Date: 08-01-2013

DOI: 10.1038/SREP01033

Publication

The secret hidden in dust: Assessing the potential to use biological and chemical properties of the airborne fraction of soil for provenance assignment and forensic casework

Publisher: Elsevier BV

Date: 11-2023

DOI: 10.1016/J.FSIGEN.2023.102931

Publication

Mitochondrial genome to aid species delimitation and effective conservation of the Sharpnose Guitarfish (Glaucostegus granulatus)

Publisher: Elsevier BV

Date: 06-2020

DOI: 10.1016/J.MGENE.2020.100648

Publication

Emergent community architecture despite distinct diversity in the global whale shark (Rhincodon typus) epidermal microbiome

Publisher: Research Square Platform LLC

Date: 24-10-2022

DOI: 10.21203/RS.3.RS-2176943/V1

Abstract: Microbiomes confer beneficial physiological traits to their host, but microbial diversity is inherently variable, challenging the relationship between microbes and their functional contribution to host health. Here, we compare diversity and architectural complexity of the epidermal microbiome from 74 individual whale sharks (Rhincodon typus) across five aggregations, globally. We hypothesised co-occurrence patterns would occur independently of diversity patterns. Whale shark aggregation was the most important factor discriminating taxonomic diversity patterns. Microbiome network architecture was similar across all aggregations with degree distributions matching Erdos-Renyi graphs. However, networks had greater modularity than expected, indicating definitive microbiome structure. In addition, whale sharks hosted 35 ‘core’ microbiome members supporting the high modularity observed in microbiomes. Therefore, while variability in microbiome diversity is high, network structure and core taxa are inherent characteristics of the microbiome in whale sharks. We suggest host-microbiome and microbe-microbe interactions which drive self-assembly of the microbiome are, in part, the result of emergent functions that support functionally redundant key core microbial members. Teaser Sentence: The skin microbiome of whale sharks has emergent co-occurrence structure despite distinct diversity patterns.

Publication

PhANNs, a fast and accurate tool and web server to classify phage structural proteins

Publisher: Public Library of Science (PLoS)

Date: 02-11-2020

DOI: 10.1371/JOURNAL.PCBI.1007845

Abstract: For any given bacteriophage genome or phage-derived sequences in metagenomic data sets, we are unable to assign a function to 50–90% of genes, or more. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten structural classes or, if not predicted to fall in one of the ten classes, as “other,” providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally.

Publication

Multivariate analysis of functional metagenomes

Publisher: Frontiers Media SA

Date: 2013

DOI: 10.3389/FGENE.2013.00041

Publication

Comparative genomics of 274 Vibrio cholerae genomes reveals mobile functions structuring three niche dimensions

Publisher: Springer Science and Business Media LLC

Date: 05-08-2014

DOI: 10.1186/1471-2164-15-654

Publication

Using pyrosequencing to shed light on deep mine microbial ecology

Publisher: Springer Science and Business Media LLC

Date: 20-03-2006

DOI: 10.1186/1471-2164-7-57

Abstract: Contrasting biological, chemical and hydrogeological analyses highlights the fundamental processes that shape different environments. Generating and interpreting the biological sequence data was a costly and time-consuming process in defining an environment. Here we have used pyrosequencing, a rapid and relatively inexpensive sequencing technology, to generate environmental genome sequences from two sites in the Soudan Mine, Minnesota, USA. These sites were adjacent to each other, but differed significantly in chemistry and hydrogeology. Comparisons of the microbes and the subsystems identified in the two samples highlighted important differences in metabolic potential in each environment. The microbes were performing distinct biochemistry on the available substrates, and subsystems such as carbon utilization, iron acquisition mechanisms, nitrogen assimilation, and respiratory pathways separated the two communities. Although the correlation between much of the microbial metabolism occurring and the geochemical conditions from which the samples were isolated could be explained, the reason for the presence of many pathways in these environments remains to be determined. Despite being physically close, these two communities were markedly different from each other. In addition, the communities were also completely different from other microbial communities sequenced to date. We anticipate that pyrosequencing will be widely used to sequence environmental samples because of the speed, cost, and technical advantages. Furthermore, subsystem comparisons rapidly identify the important metabolisms employed by the microbes in different environments.

Publication

Microbial taxonomy in the post-genomic era: Rebuilding from scratch?

Publisher: Springer Science and Business Media LLC

Date: 23-12-2014

DOI: 10.1007/S00203-014-1071-2

Abstract: Microbial taxonomy should provide adequate descriptions of bacterial, archaeal, and eukaryotic microbial diversity in ecological, clinical, and industrial environments. Its cornerstone, the prokaryote species has been re-evaluated twice. It is time to revisit polyphasic taxonomy, its principles, and its practice, including its underlying pragmatic species concept. Ultimately, we will be able to realize an old dream of our predecessor taxonomists and build a genomic-based microbial taxonomy, using standardized and automated curation of high-quality complete genome sequences as the new gold standard.

Publication

Genome Sequence of the Human Pathogen Vibrio cholerae Amazonia

Publisher: American Society for Microbiology

Date: 27-09-2011

DOI: 10.1128/JB.05643-11

Publication

Salmonella enterica Serovar Typhi Possesses a Unique Repertoire of Fimbrial Gene Sequences

Publisher: American Society for Microbiology

Date: 05-2001

DOI: 10.1128/IAI.69.5.2894-2901.2001

Abstract: Salmonella enterica serotype Typhi differs from nontyphoidal Salmonella serotypes by its strict host adaptation to humans and higher primates. Since fimbriae have been implicated in host adaptation, we investigated whether the serotype Typhi genome contains fimbrial operons which are unique to this pathogen or restricted to typhoidal Salmonella serotypes. This study established for the first time the total number of fimbrial operons present in an individual Salmonella serotype. The serotype Typhi CT18 genome, which has been sequenced by the Typhi Sequencing Group at the Sanger Centre, contained a type IV fimbrial operon, an orthologue of the agf operon, and 12 putative fimbrial operons of the chaperone-usher assembly class. In addition to sef, fim, saf, and tcf, which had been described previously in serotype Typhi, we identified eight new putative chaperone-usher-dependent fimbrial operons, which were termed bcf, sta, stb, ste, std, stc, stg, and sth. Hybridization analysis performed with 16 strains of Salmonella reference collection C and 22 strains of Salmonella reference collection B showed that all eight putative fimbrial operons of serotype Typhi were also present in a number of nontyphoidal Salmonella serotypes. Thus, a simple correlation between host range and the presence of a single fimbrial operon seems at present unlikely. However, the serotype Typhi genome differed from that of all other Salmonella serotypes investigated in that it contained a unique combination of putative fimbrial operons.

Publication

Marine viruses discovered via metagenomics shed light on viral strategies throughout the oceans

Publisher: Springer Science and Business Media LLC

Date: 05-07-2017

DOI: 10.1038/NCOMMS15955

Abstract: Marine viruses are key drivers of host diversity, population dynamics and biogeochemical cycling and contribute to the daily flux of billions of tons of organic matter. Despite recent advancements in metagenomics, much of their biodiversity remains uncharacterized. Here we report a data set of 27,346 marine virome contigs that includes 44 complete genomes. These outnumber all currently known phage genomes in marine habitats and include members of previously uncharacterized lineages. We designed a new method for host prediction based on co-occurrence associations that reveals these viruses infect dominant members of the marine microbiome such as Prochlorococcus and Pelagibacter. A negative association between host abundance and the virus-to-host ratio supports the recently proposed Piggyback-the-Winner model of reduced phage lysis at higher host densities. An analysis of the abundance patterns of viruses throughout the oceans revealed how marine viral communities adapt to various seasonal, temperature and photic regimes according to targeted hosts and the diversity of auxiliary metabolic genes.

Publication

Diversification of the Salmonella Fimbriae: A Model of Macro- and Microevolution

Publisher: Public Library of Science (PLoS)

Date: 12-06-2012

DOI: 10.1371/JOURNAL.PONE.0038596

Publication

Characterization of the ELPhiS Prophage from Salmonella enterica Serovar Enteritidis Strain LK5

Publisher: American Society for Microbiology

Date: 15-03-2012

DOI: 10.1128/AEM.07241-11

Abstract: Phages are a primary driving force behind the evolution of bacterial pathogens by transferring a variety of virulence genes into their hosts. Similar to other bacterial genomes, the Salmonella enterica serovar Enteritidis LK5 genome contains several regions that are homologous to phages. Although genomic analysis demonstrated the presence of prophages, it was unable to confirm which phage elements within the genome were viable. Genetic markers were used to tag one of the prophages in the genome to allow monitoring of phage induction. Commonly used laboratory strains of Salmonella were resistant to phage infection, and therefore a rapid screen was developed to identify susceptible hosts. This approach showed that a genetically tagged prophage, ELPhiS (Enteritidis lysogenic phage S), was capable of infecting Salmonella serovars that are diverse in host range and virulence and has the potential to laterally transfer genes between these serovars via lysogenic conversion. The rapid screen approach is adaptable to any system with a large collection of isolates and may be used to test the viability of prophages found by sequencing the genomes of various bacterial pathogens.

Publication

Genomics to aid species delimitation and effective conservation of the Sharpnose Guitarfish (Glaucostegus granulatus)

Publisher: Cold Spring Harbor Laboratory

Date: 12-09-2019

DOI: 10.1101/767186

Abstract: The Sharpnose Guitarfish (Glaucostegus granulatus) is one of fifteen critically endangered Rhino Rays which has been exploited as incidental catch, leading to severe population depletions and localized disappearances. Like many chondrichthyan species, there are no species-specific time-series data available for the Sharpnose Guitarfish that can be used to calculate population reduction, partly due to a lack of species-specific reporting as well as limitations in accurate taxonomic identification. We here present the first complete mitochondrial genome and partial nuclear genome of the species and the first detailed phylogenetic assessment of the species. We expect that data presented in the current manuscript will aid in accurate species-specific landing and population assessments of the species in the future and will enable conservation efforts to protect and recover remaining populations.

Publication

Phables

Publisher: Oxford University Press (OUP)

Date: 21-09-2023

DOI: 10.1093/BIOINFORMATICS/BTAD586

Publication

Local genomic adaptation of coral reef-associated microbiomes to gradients of natural variability and anthropogenic stressors

Publisher: Proceedings of the National Academy of Sciences

Date: 30-06-2014

DOI: 10.1073/PNAS.1403319111

Abstract: Microbial communities associated with coral reefs influence the health and sustenance of keystone benthic organisms (e.g., coral holobionts). The present study investigated the community structure and metabolic potential of microbes inhabiting coral reefs located across an extensive area in the central Pacific. We found that the taxa present correlated strongly with the percent coverage of corals and algae, while community metabolic potential correlated best with geographic location. These findings are inconsistent with prevailing biogeographic models of microbial diversity (e.g., distance decay) and metabolic potential (i.e., similar functional profiles regardless of phylogenetic variability). Based on these findings, we propose that the primary carbon sources determine community structure and that local biogeochemistry determines finer-scale metabolic function.

Publication

Diel population and functional synchrony of microbial communities on coral reefs

Publisher: Springer Science and Business Media LLC

Date: 12-04-2019

DOI: 10.1038/S41467-019-09419-Z

Abstract: On coral reefs, microorganisms are essential for recycling nutrients to primary producers through the remineralization of benthic-derived organic matter. Diel investigations of reef processes are required to holistically understand the functional roles of microbial players in these ecosystems. Here we report a metagenomic analysis characterizing microbial communities in the water column overlying 16 remote forereef sites over a diel cycle. Our results show that microbial community composition is more dissimilar between day and night samples collected from the same site than between day or night samples collected across geographically distant reefs. Diel community differentiation is largely driven by the flux of Psychrobacter sp., which is two orders of magnitude more abundant during the day. Nighttime communities are enriched with species of Roseobacter, Halomonas, and Alteromonas encoding a greater variety of pathways for carbohydrate catabolism, further illustrating temporal patterns of energetic provisioning between different marine microbes. Dynamic diel fluctuations of microbial populations could also support the efficient trophic transfer of energy posited in coral reef food webs.

Publication

Viral and microbial community dynamics in four aquatic environments

Publisher: Springer Science and Business Media LLC

Date: 11-02-2010

DOI: 10.1038/ISMEJ.2010.1

Abstract: The species composition and metabolic potential of microbial and viral communities are predictable and stable for most ecosystems. This apparent stability contradicts theoretical models as well as the viral-microbial dynamics observed in simple ecosystems, both of which show Kill-the-Winner behavior causing cycling of the dominant taxa. Microbial and viral metagenomes were obtained from four human-controlled aquatic environments at various time points separated by one day to >1 year. These environments were maintained within narrow geochemical bounds and had characteristic species composition and metabolic potentials at all time points. However, underlying this stability were rapid changes at the fine-grained level of viral genotypes and microbial strains. These results suggest a model wherein functionally redundant microbial and viral taxa are cycling at the level of viral genotypes and virus-sensitive microbial strains. Microbial taxa, viral taxa, and metabolic function persist over time in stable ecosystems and both communities fluctuate in a Kill-the-Winner manner at the level of viral genotypes and microbial strains.

Publication

Abolishment of morphology-based taxa and change to binomial species names: 2022 taxonomy update of the ICTV bacterial viruses subcommittee

Publisher: Springer Science and Business Media LLC

Date: 23-01-2023

DOI: 10.1007/S00705-022-05694-2

Abstract: This article summarises the activities of the Bacterial Viruses Subcommittee of the International Committee on Taxonomy of Viruses for the period of March 2021−March 2022. We provide an overview of the new taxa proposed in 2021, approved by the Executive Committee, and ratified by vote in 2022. Significant changes to the taxonomy of bacterial viruses were introduced: the paraphyletic morphological families Podoviridae, Siphoviridae, and Myoviridae as well as the order Caudovirales were abolished, and a binomial system of nomenclature for species was established. In addition, one order, 22 families, 30 subfamilies, 321 genera, and 862 species were newly created, promoted, or moved.

Publication

Taxonomy of prokaryotic viruses: 2017 update from the ICTV Bacterial and Archaeal Viruses Subcommittee.

Publisher: Springer Science and Business Media LLC

Date: 22-01-2018

DOI: 10.1007/S00705-018-3723-Z

Publication

The Phage Proteomic Tree: a Genome-Based Taxonomy for Phage

Publisher: American Society for Microbiology

Date: 15-08-2002

DOI: 10.1128/JB.184.16.4529-4535.2002

Abstract: There are ∼10^31 phage in the biosphere, making them the most abundant biological entities on the planet. Despite their great numbers and ubiquitous presence, very little is known about phage biodiversity, biogeography, or phylogeny. Information is limited, in part, because the current ICTV taxonomical system is based on culturing phage and measuring physical parameters of the free virion. No sequence-based taxonomic systems have previously been established for phage. We present here the “Phage Proteomic Tree,” which is based on the overall similarity of 105 completely sequenced phage genomes. The Phage Proteomic Tree places phage relative to both their near neighbors and all other phage included in the analysis. This method groups phage into taxa that predicts several aspects of phage biology and highlights genetic markers that can be used for monitoring phage biodiversity. We propose that the Phage Proteomic Tree be used as the basis of a genome-based taxonomical system for phage.

Publication

Identification of major and minor chaperone proteins involved in the export of 987P fimbriae

Publisher: American Society for Microbiology

Date: 06-1996

DOI: 10.1128/JB.178.12.3426-3433.1996

Abstract: The 987P fimbriae of Escherichia coli consist mainly of the major subunit, FasA, and two minor subunits, FasF and FasG. In addition to the previously characterized outer membrane or usher protein FasD, the FasB, FasC, and FasE proteins are required for fimbriation. To better understand the roles of these minor proteins, their genes were sequenced and the predicted polypeptides were shown to be most similar to periplasmic chaperone proteins of fimbrial systems. Western blot (immunoblot) analysis and immunoprecipitation of various fas mutants with specific antibody probes identified both the subcellular localizations and associations of these minor components. FasB was shown to be a periplasmic chaperone for the major fimbrial subunit, FasA. A novel periplasmic chaperone, FasC, which stabilizes and specifically interacts with the adhesin, FasG, was identified. FasE, a chaperone-like protein, is also located in the periplasm and is required for optimal export of FasG and possibly other subunits. The use of different chaperone proteins for various 987P subunits is a novel observation for fimbrial biogenesis in bacteria. Whether other fimbrial systems use a similar tactic remains to be discovered.

Publication

PhiSiGns: an online tool to identify signature genes in phages and design PCR primers for examining phage diversity

Publisher: Springer Science and Business Media LLC

Date: 04-03-2012

DOI: 10.1186/1471-2105-13-37

Publication

A role for Salmonella fimbriae in intraperitoneal infections

Publisher: Proceedings of the National Academy of Sciences

Date: 02-2000

DOI: 10.1073/PNAS.97.3.1258

Abstract: Enteric bacteria possess multiple fimbriae, many of which play critical roles in attachment to epithelial cell surfaces. SEF14 fimbriae are only found in Salmonella enterica serovar Enteritidis (S. enteritidis) and closely related serovars, suggesting that SEF14 fimbriae may affect serovar-specific virulence traits. Despite evidence that SEF14 fimbriae are expressed by S. enteritidis in vivo, previous studies showed that SEF14 fimbriae do not mediate adhesion to the intestinal epithelium. Therefore, we tested whether SEF14 fimbriae are required for virulence at a stage in infection after the bacteria have passed the intestinal barrier. Polar mutations that disrupt the entire sef operon decreased virulence in mice more than 1,000-fold. Nonpolar mutations that disrupted sefA (encoding the major structural subunit) did not affect virulence, but mutations that disrupted sefD (encoding the putative adhesion subunit) resulted in a severe virulence defect. The results indicate that the putative SEF14 adhesion subunit is specifically required for a stage of the infection subsequent to transit across the intestinal barrier. Therefore, we tested whether SefD is required for uptake or survival in macrophages. The majority of wild-type bacteria were detected inside macrophages soon after i.p. infection, but the sefD mutants were not readily internalized by peritoneal macrophages. These results indicate that the potential SEF14 adhesion subunit is essential for efficient uptake or survival of S. enteritidis in macrophages. This report describes a role of fimbriae in intracellular infection, and indicates that fimbriae may be required for systemic infections at stages beyond the initial colonization of host epithelial surfaces.

Publication

Accessing the SEED Genome Databases via Web Services API: Tools for Programmers

Publisher: Springer Science and Business Media LLC

Date: 14-06-2010

DOI: 10.1186/1471-2105-11-319

Abstract: The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides Web services based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.

Publication

Phage Eco-Locator: a web tool for visualization and analysis of phage genomes in metagenomic data sets

Publisher: Springer Science and Business Media LLC

Date: 05-08-2011

DOI: 10.1186/1471-2105-12-S7-A9

Publication

Reference-independent comparative metagenomics using cross-assembly: crAss

Publisher: Oxford University Press (OUP)

Date: 16-10-2012

DOI: 10.1093/BIOINFORMATICS/BTS613

Abstract: Motivation: Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known microorganisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Nevertheless, it is possible to use more data in a comparative metagenomic analysis by creating a cross-assembly of all reads, i.e. a single assembly of reads from different samples. Comparative metagenomics studies the interrelationships between metagenomes from different samples. Using an assembly algorithm is a fast and intuitive way to link (partially) homologous reads without requiring a database of reference sequences. Results: Here, we introduce crAss, a novel bioinformatic tool that enables fast simple analysis of cross-assembly files, yielding distances between all metagenomic sample pairs and an insightful image displaying the similarities. Availability and implementation: crAss is available as a web server at rass/, and the Perl source code can be downloaded to run as a stand-alone command line tool. Contact: dutilh@cmbi.ru.nl Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Phage Genome Annotation Using the RAST Pipeline

Publisher: Springer New York

Date: 14-11-2017

DOI: 10.1007/978-1-4939-7343-9_17

Abstract: Phages are complex biomolecular machineries that have to survive in a bacterial world. Phage genomes show many adaptations to their lifestyle such as shorter genes, reduced capacity for redundant DNA sequences, and the inclusion of tRNAs in their genomes. In addition, phages are not free-living, they require a host for replication and survival. These unique adaptations provide challenges for the bioinformatics analysis of phage genomes. In particular, ORF calling, genome annotation, noncoding RNA (ncRNA) identification, and the identification of transposons and insertions are all complicated in phage genome analysis. We provide a road map through the phage genome annotation pipeline, and discuss the challenges and solutions for phage genome annotation as we have implemented in the rapid annotation using subsystems (RAST) pipeline.

Publication

Emergent community architecture despite distinct diversity in the global whale shark (Rhincodon typus) epidermal microbiome

Publisher: Springer Science and Business Media LLC

Date: 07-08-2023

DOI: 10.1038/S41598-023-39184-5

Abstract: Microbiomes confer beneficial physiological traits to their host, but microbial diversity is inherently variable, challenging the relationship between microbes and their contribution to host health. Here, we compare the diversity and architectural complexity of the epidermal microbiome from 74 individual whale sharks (Rhincodon typus) across five aggregations globally to determine if network properties may be more indicative of the microbiome-host relationship. On the premise that microbes are expected to exhibit biogeographic patterns globally and that distantly related microbial groups can perform similar functions, we hypothesized that microbiome co-occurrence patterns would occur independently of diversity trends and that keystone microbes would vary across locations. We found that whale shark aggregation was the most important factor in discriminating taxonomic diversity patterns. Further, microbiome network architecture was similar across all aggregations, with degree distributions matching Erdos–Renyi-type networks. The microbiome-derived networks, however, display modularity indicating a definitive microbiome structure on the epidermis of whale sharks. In addition, whale sharks hosted 35 high-quality metagenome assembled genomes (MAGs) of which 25 were present from all sample locations, termed the abundant ‘core’. Two main MAG groups formed, defined here as Ecogroup 1 and 2, based on the number of genes present in metabolic pathways, suggesting there are at least two important metabolic niches within the whale shark microbiome. Therefore, while variability in microbiome diversity is high, network structure and core taxa are inherent characteristics of the epidermal microbiome in whale sharks. We suggest the host-microbiome and microbe-microbe interactions that drive the self-assembly of the microbiome help support a functionally redundant abundant core and that network characteristics should be considered when linking microbiomes with host health.

Publication

multiPhATE: bioinformatics pipeline for functional annotation of phage isolates

Publisher: Cold Spring Harbor Laboratory

Date: 15-02-2019

DOI: 10.1101/551010

Abstract: To address the need for improved phage annotation tools that scale, we created an automated throughput annotation pipeline: multiple-genome Phage Annotation Toolkit and Evaluator (multiPhATE). multiPhATE is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. This tool incorporates a de novo phage gene-calling algorithm and assigns putative functions to gene calls using protein-, virus-, and phage-centric databases. multiPhATE’s modular construction allows the user to implement all or any portion of the analyses by acquiring local instances of the desired databases and specifying the desired analyses in a configuration file. We demonstrate multiPhATE by annotating two newly sequenced Yersinia pestis phage genomes. Within multiPhATE, the PhATE processing pipeline can be readily implemented across multiple processors, making it adaptable for throughput sequencing projects. Software documentation assists the user in configuring the system. multiPhATE was implemented in Python 3.7, and runs as a command-line code under Linux or Unix. multiPhATE is freely available under an open-source BSD3 license from arolzhou/multiPhATE . Instructions for acquiring the databases and third-party codes used by multiPhATE are included in the distribution README file. Users may report bugs by submitting to the github issues page associated with the multiPhATE distribution. zhou4@llnl.gov or carol.zhou@comcast.net . Data generated during the current study are included as supplementary files available for download at arolzhou/PhATE_docs .

Publication

The role of uridylyltransferase in the control of Klebsiella pneumoniae nif gene regulation

Publisher: Springer Science and Business Media LLC

Date: 03-1995

DOI: 10.1007/BF00705649

Publication

RaFAH: A superior method for virus-host prediction

Publisher: Cold Spring Harbor Laboratory

Date: 27-09-2020

DOI: 10.1101/2020.09.25.313155

Abstract: Viruses of prokaryotes are extremely abundant and diverse. Culture-independent approaches have recently shed light on the biodiversity of these biological entities 1,2. One fundamental question when trying to understand their ecological roles is: which host do they infect? To tackle this issue we developed a machine-learning approach named Random Forest Assignment of Hosts (RaFAH), based on the analysis of nearly 200,000 viral genomes. RaFAH outperformed other methods for virus-host prediction (F1-score = 0.97 at the level of phylum). RaFAH was applied to diverse datasets encompassing genomes of uncultured viruses derived from eight different biomes of medical, biotechnological, and environmental relevance, and was capable of accurately describing these viromes. This led to the discovery of 537 genomic sequences of archaeal viruses. These viruses represent previously unknown lineages and their genomes encode novel auxiliary metabolic genes, which shed light on how these viruses interfere with the host molecular machinery. RaFAH is available at rojects/rafah/ .

Publication

Searching the Sequence Read Archive using Jetstream and Wrangler

Publisher: ACM

Date: 22-07-2018

DOI: 10.1145/3219104.3229278

Publication

Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases

Publisher: Proceedings of the National Academy of Sciences

Date: 10-02-2009

DOI: 10.1073/PNAS.0806191105

Abstract: The complex microbiome of the rumen functions as an effective system for the conversion of plant cell wall biomass to microbial protein, short chain fatty acids, and gases. As such, it provides a unique genetic resource for plant cell wall degrading microbial enzymes that could be used in the production of biofuels. The rumen and gastrointestinal tract harbor a dense and complex microbiome. To gain a greater understanding of the ecology and metabolic potential of this microbiome, we used comparative metagenomics (phylotype analysis and SEED subsystems-based annotations) to examine randomly sampled pyrosequence data from 3 fiber-adherent microbiomes and 1 pooled liquid sample (a mixture of the liquid microbiome fractions from the same bovine rumens). Even though the 3 animals were fed the same diet, the community structure, predicted phylotype, and metabolic potentials in the rumen were markedly different with respect to nutrient utilization. A comparison of the glycoside hydrolase and cellulosome functional genes revealed that in the rumen microbiome, initial colonization of fiber appears to be by organisms possessing enzymes that attack the easily available side chains of complex plant polysaccharides and not the more recalcitrant main chains, especially cellulose. Furthermore, when compared with the termite hindgut microbiome, there are fundamental differences in the glycoside hydrolase content that appear to be diet driven for either the bovine rumen (forages and legumes) or the termite hindgut (wood).

Publication

Tractography dissection variability: What happens when 42 groups dissect 14 white matter bundles on the same dataset?

Publisher: Elsevier BV

Date: 11-2021

DOI: 10.1016/J.NEUROIMAGE.2021.118502

Publication

Metagenomics and metatranscriptomics: Windows on CF-associated viral and microbial communities

Publisher: Elsevier BV

Date: 03-2013

DOI: 10.1016/J.JCF.2012.07.009

Publication

Kullback Leibler divergence in complete bacterial and phage genomes

Publisher: PeerJ

Date: 30-11-2017

DOI: 10.7717/PEERJ.4026

Abstract: The amino acid content of the proteins encoded by a genome may predict the coding potential of that genome and may reflect lifestyle restrictions of the organism. Here, we calculated the Kullback–Leibler divergence from the mean amino acid content as a metric to compare the amino acid composition for a large set of bacterial and phage genome sequences. Using these data, we demonstrate that (i) there is a significant difference between amino acid utilization in different phylogenetic groups of bacteria and phages; (ii) many of the bacteria with the most skewed amino acid utilization profiles, or the bacteria that host phages with the most skewed profiles, are endosymbionts or parasites; (iii) the skews in the distribution are not restricted to certain metabolic processes but are common across all bacterial genomic subsystems; (iv) amino acid utilization profiles strongly correlate with GC content in bacterial genomes but very weakly correlate with the G+C percent in phage genomes. These findings might be exploited to distinguish coding from non-coding sequences in large data sets, such as metagenomic sequence libraries, to help in prioritizing subsequent analyses.

Publication

PARTIE: a partition engine to separate metagenomic and amplicon projects in the Sequence Read Archive

Publisher: Oxford University Press (OUP)

Date: 30-03-2017

DOI: 10.1093/BIOINFORMATICS/BTX184

Abstract: The Sequence Read Archive (SRA) contains raw data from many different types of sequence projects. As of 2017, the SRA contained approximately ten petabases of DNA sequence (10^16 bp). Annotations of the data are provided by the submitter, and mining the data in the SRA is complicated by both the amount of data and the detail within those annotations. Here, we introduce PARTIE, a partition engine optimized to differentiate sequence read data into metagenomic (random) and amplicon (targeted) sequence data sets. PARTIE subsamples reads from the sequencing file and calculates four different statistics: k-mer frequency, 16S abundance, prokaryotic- and viral-read abundance. These metrics are used to create a RandomForest decision tree to classify the sequencing data, and PARTIE provides mechanisms for both supervised and unsupervised classification. We demonstrate the accuracy of PARTIE for classifying SRA data, discuss the probable error rates in the SRA annotations and introduce a resource assessing SRA data. PARTIE and reclassified metagenome SRA entries are available from insalrob artie Supplementary data are available at Bioinformatics online.

Publication

Marine Environmental Genomics: Unlocking the Ocean's Secrets

Publisher: The Oceanography Society

Date: 06-2007

DOI: 10.5670/OCEANOG.2007.48

Publication

Discovery of an expansive bacteriophage family that includes the most abundant viruses from the human gut

Publisher: Springer Science and Business Media LLC

Date: 13-11-2017

DOI: 10.1038/S41564-017-0053-Y

Publication

Statement in Support of: Virology under the Microscope-a Call for Rational Discourse

Publisher: American Society for Microbiology

Date: 31-05-2023

DOI: 10.1128/JVI.00451-23

Publication

GenomePeek - An online tool for prokaryotic and metagenome analysis

Publisher: PeerJ

Date: 08-10-2014

DOI: 10.7287/PEERJ.PREPRINTS.525V1

Abstract: As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze this data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach where reads to a set of conserved genes are extracted, assembled and then aligned against the highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.

Publication

The Subsystems Approach to Genome Annotation and its Use in the Project to Annotate 1000 Genomes

Publisher: Oxford University Press (OUP)

Date: 25-09-2005

DOI: 10.1093/NAR/GKI866

Publication

Metagenomic analysis indicates that stressors induce production of herpes-like viruses in the coral Porites compressa

Publisher: Proceedings of the National Academy of Sciences

Date: 25-11-2008

DOI: 10.1073/PNAS.0808985105

Abstract: During the last several decades corals have been in decline and at least one-third of all coral species are now threatened with extinction. Coral disease has been a major contributor to this threat, but little is known about the responsible pathogens. To date most research has focused on bacterial and fungal diseases; however, viruses may also be important for coral health. Using a combination of empirical viral metagenomics and real-time PCR, we show that Porites compressa corals contain a suite of eukaryotic viruses, many related to the Herpesviridae. This coral-associated viral consortium was found to shift in response to abiotic stressors. In particular, when exposed to reduced pH, elevated nutrients, and thermal stress, the abundance of herpes-like viral sequences rapidly increased in 2 separate experiments. Herpes-like viral sequences were rarely detected in apparently healthy corals, but were abundant in a majority of stressed samples. In addition, surveys of the Nematostella and Hydra genomic projects demonstrate that even distantly related Cnidarians contain numerous herpes-like viral genes, likely as a result of latent or endogenous viral infection. These data support the hypotheses that corals experience viral infections, which are exacerbated by stress, and that herpes-like viruses are common in Cnidarians.

Publication

Analysis of Spounaviruses as a Case Study for the Overdue Reclassification of Tailed Bacteriophages

Publisher: Cold Spring Harbor Laboratory

Date: 16-11-2017

DOI: 10.1101/220434

Abstract: It is almost a cliché that tailed bacteriophages of the order Caudovirales are the most abundant and diverse viruses in the world. Yet, their taxonomy still consists of a single order with just three families: Myoviridae, Siphoviridae, and Podoviridae. Thousands of newly discovered phage genomes have recently challenged this morphology-based classification, revealing that tailed bacteriophages are genomically even more diverse than once thought. Here, we evaluate a range of methods for bacteriophage taxonomy by using a particularly challenging group as an example, the Bacillus phage SPO1-related viruses of the myovirid subfamily Spounavirinae. Exhaustive phylogenetic and phylogenomic analyses indicate that the spounavirins are consistent with the taxonomic rank of family and should be divided into at least five subfamilies. This work is a case study for virus genomic taxonomy and the first step in an impending massive reorganization of the tailed bacteriophage taxonomy.

Publication

Metagenomic analysis of stressed coral holobionts

Publisher: Wiley

Date: 08-2009

DOI: 10.1111/J.1462-2920.2009.01935.X

Abstract: The coral holobiont is the community of metazoans, protists and microbes associated with scleractinian corals. Disruptions in these associations have been correlated with coral disease, but little is known about the series of events involved in the shift from mutualism to pathogenesis. To evaluate structural and functional changes in coral microbial communities, Porites compressa was exposed to four stressors: increased temperature, elevated nutrients, dissolved organic carbon loading and reduced pH. Microbial metagenomic samples were collected and pyrosequenced. Functional gene analysis demonstrated that stressors increased the abundance of microbial genes involved in virulence, stress resistance, sulfur and nitrogen metabolism, motility and chemotaxis, fatty acid and lipid utilization, and secondary metabolism. Relative changes in taxonomy also demonstrated that coral-associated microbiota (Archaea, Bacteria, protists) shifted from a healthy-associated coral community (e.g. Cyanobacteria, Proteobacteria and the zooxanthellae Symbiodinium) to a community (e.g. Bacteroidetes, Fusobacteria and Fungi) of microbes often found on diseased corals. Additionally, low-abundance Vibrio spp. were found to significantly alter microbiome metabolism, suggesting that the contribution of just a few members of a community can profoundly shift the health status of the coral holobiont.

Publication

How Metagenomics Has Transformed Our Understanding of Bacteriophages in Microbiome Research

Publisher: MDPI AG

Date: 19-08-2022

DOI: 10.3390/MICROORGANISMS10081671

Abstract: The microbiome is an essential part of most ecosystems. It was originally studied mostly through culturing but relatively few microbes can be cultured, so much of the microbiome was left unexplored. The emergence of metagenomic sequencing techniques changed that and allowed the study of microbiomes from all sorts of habitats. Metagenomic sequencing also allowed for a more thorough exploration of prophages, viruses that integrate into bacterial genomes, and how they benefit their hosts. One issue with using open-access metagenomic data is that sequences added to databases often have little to no metadata to work with, so finding enough sequences can be difficult. Many metagenomes have been manually curated but this is a time-consuming process and relies heavily on the uploader to be accurate and thorough when filling in metadata fields and the curators to be working with the same ontologies. Using algorithms to automatically sort metagenomes based on either the taxonomic profile or the functional profile may be a viable solution to the issues with manually curated metagenomes, but it requires that the algorithm is trained on carefully curated datasets and using the most informative profile possible in order to minimize errors.

Publication

Standardized bacteriophage purification for personalized phage therapy

Publisher: Springer Science and Business Media LLC

Date: 24-07-2020

DOI: 10.1038/S41596-020-0346-0

Publication

Phage and bacteria diversification through a prophage acquisition ratchet

Publisher: Cold Spring Harbor Laboratory

Date: 09-04-2020

DOI: 10.1101/2020.04.08.028340

Abstract: Lysogeny is prevalent in the microbial-dense mammalian gut. This contrasts the classical view of lysogeny as a refuge used by phages under poor host growth conditions. Here we hypothesize that as carrying capacity increases, lysogens escape phage top-down control through superinfection exclusion, overcoming the canonical trade-off between competition and resistance. This hypothesis was tested by developing an ecological model that combined lytic and lysogenic communities and a diversification model that estimated the accumulation of prophages in bacterial genomes. The ecological model sampled phage-bacteria traits stochastically for communities ranging from 1 to 1000 phage-bacteria pairs, and it included a fraction of escaping lysogens proportional to the increase in carrying capacity. The diversification model introduced new prophages at each diversification step and estimated the distribution of prophages per bacteria using combinatorics. The ecological model recovered the range of abundances and sublinear relationship between phage and bacteria observed across eleven ecosystems. The diversification model predicted an increase in the number of prophages per genome as bacterial abundances increased, in agreement with the distribution of prophages on 833 genomes from marine and human-associated bacteria. The study of lysogeny presented here offers a framework to interpret viral and microbial abundances and reconciles the Kill-the-Winner and Piggyback-the-Winner paradigms in viral ecology.

Publication

Insights into antibiotic resistance through metagenomic approaches

Publisher: Future Medicine Ltd

Date: 2012

DOI: 10.2217/FMB.11.135

Abstract: The consequences of bacterial infections have been curtailed by the introduction of a wide range of antibiotics. However, infections continue to be a leading cause of mortality, in part due to the evolution and acquisition of antibiotic-resistance genes. Antibiotic misuse and overprescription have created a driving force influencing the selection of resistance. Despite the problem of antibiotic resistance in infectious bacteria, little is known about the diversity, distribution and origins of resistance genes, especially for the unculturable majority of environmental bacteria. Functional and sequence-based metagenomics have been used for the discovery of novel resistance determinants and the improved understanding of antibiotic-resistance mechanisms in clinical and natural environments. This review discusses recent findings and future challenges in the study of antibiotic resistance through metagenomic approaches.

Publication

Metagenomic and Small-Subunit rRNA Analyses Reveal the Genetic Diversity of Bacteria, Archaea, Fungi, and Viruses in Soil

Publisher: American Society for Microbiology

Date: 11-2007

DOI: 10.1128/AEM.00358-07

Abstract: Recent studies have highlighted the surprising richness of soil bacterial communities; however, bacteria are not the only microorganisms found in soil. To our knowledge, no study has compared the diversities of the four major microbial taxa, i.e., bacteria, archaea, fungi, and viruses, from an individual soil sample. We used metagenomic and small-subunit RNA-based sequence analysis techniques to compare the estimated richness and evenness of these groups in prairie, desert, and rainforest soils. By grouping sequences at the 97% sequence similarity level (an operational taxonomic unit [OTU]), we found that the archaeal and fungal communities were consistently less even than the bacterial communities. Although total richness levels are difficult to estimate with a high degree of certainty, the estimated number of unique archaeal or fungal OTUs appears to rival or exceed the number of unique bacterial OTUs in each of the collected soils. In this first study to comprehensively survey viral communities using a metagenomic approach, we found that soil viruses are taxonomically diverse and distinct from the communities of viruses found in other environments that have been surveyed using a similar approach. Within each of the four microbial groups, we observed minimal taxonomic overlap between sites, suggesting that soil archaea, bacteria, fungi, and viruses are globally as well as locally diverse.

Publication

PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies

Publisher: Oxford University Press (OUP)

Date: 14-05-2012

DOI: 10.1093/NAR/GKS406

Publication

Metagenomic analysis of the microbial community associated with the coral Porites astreoides

Publisher: Wiley

Date: 21-07-2007

DOI: 10.1111/J.1462-2920.2007.01383.X

Abstract: The coral holobiont is a dynamic assemblage of the coral animal, zooxanthellae, endolithic algae and fungi, Bacteria, Archaea and viruses. Zooxanthellae and some Bacteria form relatively stable and species-specific associations with corals. Other associations are less specific; coral-associated Archaea differ from those in the water column, but the same archaeal species may be found on different coral species. It has been hypothesized that the coral animal can adapt to differing ecological niches by 'switching' its microbial associates. In the case of corals and zooxanthellae, this has been termed adaptive bleaching and it has important implications for carbon cycling within the coral holobiont and ultimately the survival of coral reefs. However, the roles of other components of the coral holobiont are essentially unknown. To better understand these other coral associates, a fractionation procedure was used to separate the microbes, mitochondria and viruses from the coral animal cells and zooxanthellae. The resulting metagenomic DNA was sequenced using pyrosequencing. Fungi, Bacteria and phage were the most commonly identified organisms in the metagenome. Three of the four fungal phyla were represented, including a wide diversity of fungal genes involved in carbon and nitrogen metabolism, suggesting that the endolithic community is more important than previously appreciated. In particular, the data suggested that endolithic fungi could be converting nitrate and nitrite to ammonia, which would enable fixed nitrogen to cycle within the coral holobiont. The most prominent bacterial groups were Proteobacteria (68%), Firmicutes (10%), Cyanobacteria (7%) and Actinobacteria (6%). Functionally, the bacterial community was primarily heterotrophic and included a number of pathways for the degradation of aromatic compounds, the most abundant being the homogentisate pathway. The most abundant phage family was the ssDNA Microphage and most of the eukaryotic viruses were most closely related to those known to infect aquatic organisms. This study provides a metabolic and taxonomic snapshot of microbes associated with the reef-building coral Porites astreoides and presents a basis for understanding how coral-microbial interactions structure the holobiont and coral reefs.

Publication

Statement in Support of: “Virology under the Microscope—a Call for Rational Discourse”

Publisher: American Society for Microbiology

Date: 22-06-2023

DOI: 10.1128/MSPHERE.00165-23

Publication

Computational approaches to predict bacteriophage–host relationships

Publisher: Oxford University Press (OUP)

Date: 09-12-2015

DOI: 10.1093/FEMSRE/FUV048

Publication

The utility of dust for forensic intelligence: Exploring collection methods and detection limits for environmental DNA, elemental and mineralogical analyses of dust samples

Publisher: Elsevier BV

Date: 03-2023

DOI: 10.1016/J.FORSCIINT.2023.111599

Publication

Bacterial Viruses Subcommittee and Archaeal Viruses Subcommittee of the ICTV: update of taxonomy changes in 2021

Publisher: Springer Science and Business Media LLC

Date: 21-08-2021

DOI: 10.1007/S00705-021-05205-9

Abstract: In this article, we – the Bacterial Viruses Subcommittee and the Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV) – summarise the results of our activities for the period March 2020 – March 2021. We report the division of the former Bacterial and Archaeal Viruses Subcommittee in two separate Subcommittees, welcome new members, a new Subcommittee Chair and Vice Chair, and give an overview of the new taxa that were proposed in 2020, approved by the Executive Committee and ratified by vote in 2021. In particular, a new realm, three orders, 15 families, 31 subfamilies, 734 genera and 1845 species were newly created or redefined (moved/promoted).

Publication

Dynamics of infection in a novel group of promiscuous phages and hosts of multiple bacterial genera retrieved from river communities

Publisher: Cold Spring Harbor Laboratory

Date: 10-08-2020

DOI: 10.1101/2020.08.07.242396

Abstract: Phages are generally described as species- or even strain-specific viruses, implying an inherent limitation for some to be maintained and spread in diverse bacterial communities. Moreover, phage isolation and host range determination rarely consider the phage ecological context, likely biasing our notion on phage specificity. Here we identified and characterized a novel group of promiscuous phages existing in rivers by using diverse bacteria isolated from the same samples, and then used this biological system to investigate infection dynamics in distantly related hosts. We assembled a diverse collection of over 600 native bacterial strains and used them to isolate six podophages, named Atoyac, from different geographic origins and capable of infecting six genera in the Gammaproteobacteria. Atoyac phage genomes are highly similar to each other but not to those currently available in the genome and metagenome public databases. Detailed comparison of the phage’s infectivity in diverse hosts and through hundreds of interactions revealed variation in plating efficiency amongst bacterial genera, implying a cost associated with infection of distant hosts, and between phages, despite their sequence similarity. We show, through experimental evolution in single or alternate hosts of different genera, that plaque production efficiency is highly dynamic and tends towards optimization in hosts rendering low plaque formation. Complex adaptation outcomes observed in the evolution experiments differed between highly similar phages and suggest that propagation in multiple hosts may be key to maintain promiscuity in some viruses. Our study expands our knowledge of the virosphere and uncovers bacteria-phage interactions overlooked in natural systems. In natural environments, phages co-exist and interact with a broad variety of bacteria, posing a conundrum for narrow-host-range phage maintenance in diverse communities. This context is rarely considered in the study of host-phage interactions, typically focused on narrow-host-range viruses and their infectivity in target bacteria isolated from sources distinct to where the phages were retrieved from. By studying phage-host interactions in bacteria and viruses isolated from river microbial communities, we show that novel phages with promiscuous host range encompassing multiple bacterial genera can be found in the environment. Assessment of hundreds of interactions in diverse hosts revealed that similar phages exhibit different infection efficiency and adaptation patterns. Understanding host range is fundamental in our knowledge of bacteria-phage interactions and their impact in microbial communities. The dynamic nature of phage promiscuity revealed in our study has implications in different aspects of phage research such as horizontal gene transfer or phage therapy.

Publication

The skin microbiome of elasmobranchs follows phylosymbiosis, but in teleost fishes, the microbiomes converge

Publisher: Springer Science and Business Media LLC

Date: 13-06-2020

DOI: 10.1186/S40168-020-00840-X

Abstract: The vertebrate clade diverged into Chondrichthyes (sharks, rays, and chimeras) and Osteichthyes fishes (bony fishes) approximately 420 mya, with each group accumulating vast anatomical and physiological differences, including skin properties. The skin of Chondrichthyes fishes is covered in dermal denticles, whereas Osteichthyes fishes are covered in scales and are mucous rich. The divergence time among these two fish groups is hypothesized to result in predictable variation among symbionts. Here, using shotgun metagenomics, we test if patterns of diversity in the skin surface microbiome across the two fish clades match predictions made by phylosymbiosis theory. We hypothesize (1) the skin microbiome will be host and clade-specific, (2) evolutionary difference in elasmobranch and teleost will correspond with a concomitant increase in host-microbiome dissimilarity, and (3) the skin structure of the two groups will affect the taxonomic and functional composition of the microbiomes. We show that the taxonomic and functional composition of the microbiomes is host-specific. Teleost fish had lower average microbiome within clade similarity compared to among clade comparison, but their composition is not different among clade in a null based model. Elasmobranch’s average similarity within clade was not different than across clade and not different in a null based model of comparison. In the comparison of host distance with microbiome distance, we found that the taxonomic composition of the microbiome was related to host distance for the elasmobranchs, but not the teleost fishes. In comparison, the gene function composition was not related to the host-organism distance for elasmobranchs but was negatively correlated with host distance for teleost fishes. Our results show the patterns of phylosymbiosis are not consistent across both fish clades, with the elasmobranchs showing phylosymbiosis, while the teleost fish are not. The discrepancy may be linked to alternative processes underpinning microbiome assemblage, including possible historical host-microbiome evolution of the elasmobranchs and convergent evolution in the teleost which filter specific microbial groups. Our comparison of the microbiomes among fishes represents an investigation into the microbial relationships of the oldest divergence of extant vertebrate hosts and reveals that microbial relationships are not consistent across evolutionary timescales.

Publication

Transposases are the most abundant, most ubiquitous genes in nature

Publisher: Oxford University Press (OUP)

Date: 09-03-2010

DOI: 10.1093/NAR/GKQ140

Publication

TagCleaner: Identification and removal of tag sequences from genomic and metagenomic datasets

Publisher: Springer Science and Business Media LLC

Date: 23-06-2010

DOI: 10.1186/1471-2105-11-341

Publication

Coastal bacterioplankton community diversity along a latitudinal gradient in Latin America by means of V6 tag pyrosequencing

Publisher: Springer Science and Business Media LLC

Date: 13-11-2010

DOI: 10.1007/S00203-010-0644-Y

Abstract: The bacterioplankton diversity of coastal waters along a latitudinal gradient between Puerto Rico and Argentina was analyzed using a total of 134,197 high-quality sequences from the V6 hypervariable region of the small-subunit ribosomal RNA gene (16S rRNA) (mean length of 60 nt). Most of the OTUs were identified into Proteobacteria, Bacteroidetes, Cyanobacteria, and Actinobacteria, corresponding to approx. 80% of the total number of sequences. The number of OTUs corresponding to species varied between 937 and 1946 in the seven locations. Proteobacteria appeared at high frequency in the seven locations. An enrichment of Cyanobacteria was observed in Puerto Rico, whereas an enrichment of Bacteroidetes was detected in the Argentinian shelf and Uruguayan coastal lagoons. The highest number of sequences of Actinobacteria and Acidobacteria were obtained in the Amazon estuary mouth. The rarefaction curves and Good's coverage estimator for species diversity suggested a significant coverage, with values ranging between 92 and 97% for Good's coverage. Conserved taxa corresponded to approx. 52% of all sequences. This study suggests that human-contaminated environments may influence bacterioplankton diversity.

Publication

Prophage genomics reveals patterns in phage genome organization and replication

Publisher: Cold Spring Harbor Laboratory

Date: 07-03-2017

DOI: 10.1101/114819

Abstract: Temperate phage genomes are highly variable mosaic collections of genes that infect a bacterial host, integrate into the host’s genome or replicate as low copy number plasmids, and are regulated to switch from the lysogenic to lytic cycles to generate new virions and escape their host. Genomes from most Bacterial phyla contain at least one or more prophages. We updated our PhiSpy algorithm to improve detection of prophages and to provide a web-based framework for PhiSpy. We have used this algorithm to identify 36,488 prophage regions from 11,941 bacterial genomes, including almost 600 prophages with no known homology to any proteins. Transfer RNA genes were abundant in the prophages, many of which alleviate the limits of translation efficiency due to host codon bias and presumably enable phages to surpass the normal capacity of the hosts’ translation machinery. We identified integrase genes in 15,765 prophages (43% of the prophages). The integrase was routinely located at either end of the integrated phage genome, and was used to orient and align prophage genomes to reveal their underlying organization. The conserved genome alignments of phages recapitulate early, middle, and late gene order in transcriptional control of phage genes, and demonstrate that gene order, presumably selected by transcription timing and/or coordination among functional modules has been stably conserved throughout phage evolution.

Publication

Genomic Taxonomy of the Genus Prochlorococcus

Publisher: Springer Science and Business Media LLC

Date: 21-08-2013

DOI: 10.1007/S00248-013-0270-8

Abstract: The genus Prochlorococcus is globally abundant and dominates the total phytoplankton biomass and production in the oligotrophic ocean. The single species, Prochlorococcus marinus, comprises six named ecotypes. Our aim was to analyze the taxonomic structure of the genus Prochlorococcus. We analyzed the complete genomes of 13 cultured P. marinus type and reference strains by means of several genomic taxonomy tools (i.e., multilocus sequence analysis, amino acid identity, Karlin genomic signature, and genome to genome distance). In addition, we estimated the diversity of Prochlorococcus species in over 100 marine metagenomes from all the major oceanic provinces. According to our careful taxonomic analysis, the 13 strains corresponded, in fact, to ten different Prochlorococcus species. This analysis establishes a new taxonomic framework for the genus Prochlorococcus. Further, the analysis of the metagenomic data suggests that, in total, there may only be 35 Prochlorococcus species in the world's oceans. We propose that the dearth of species observed in this study is driven by high selective pressures that limit diversification in the global ocean.

Publication

Identification and removal of ribosomal RNA sequences from metatranscriptomes

Publisher: Oxford University Press (OUP)

Date: 06-12-2011

DOI: 10.1093/BIOINFORMATICS/BTR669

Abstract: Summary: Here, we present riboPicker, a robust framework for the rapid, automated identification and removal of ribosomal RNA sequences from metatranscriptomic datasets. The results can be exported for subsequent analysis, and the databases used for the web-based version are updated on a regular basis. riboPicker categorizes rRNA-like sequences and provides graphical visualizations and tabular outputs of ribosomal coverage, alignment results and taxonomic classifications. Availability and implementation: This open-source application was implemented in Perl and can be used as stand-alone version or accessed online through a user-friendly web interface. The source code, user help and additional information is available at Contact: rschmied@sciences.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

An application of statistics to comparative metagenomics

Publisher: Springer Science and Business Media LLC

Date: 20-03-2006

DOI: 10.1186/1471-2105-7-162

Abstract: Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.

Publication

Classification Confidence in Exploratory Learning: A User’s Guide

Publisher: MDPI AG

Date: 21-07-2023

DOI: 10.3390/MAKE5030043

Abstract: This paper investigates the post-hoc calibration of confidence for “exploratory” machine learning classification problems. The difficulty in these problems stems from the continuing desire to push the boundaries of which categories have enough examples to generalize from when curating datasets, and confusion regarding the validity of those categories. We argue that for such problems the “one-versus-all” approach (top-label calibration) must be used rather than the “calibrate-the-full-response-matrix” approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation using only the test set and the final model. Chief among these methods is the use of kernel density ratios for confidence calibration including a novel algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, may be performed using only the test dataset, and should be sanity-checked visually.

Publication

SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data

Publisher: Oxford University Press (OUP)

Date: 09-10-2015

DOI: 10.1093/BIOINFORMATICS/BTV584

Abstract: Summary: Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is one of the important goals in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question what they can do. Currently, available tools do not scale well with increasing data volumes, which is important because both the number and lengths of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS, SUbsystems Profile by databasE Reduction using FOCUS, an agile homology-based approach using a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. SUPER-FOCUS was tested with over 70 real metagenomes, the results showing that it accurately predicts the subsystems present in the profiled microbial communities, and is up to 1000 times faster than other tools. Availability and implementation: SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at edwards.sdsu.edu/SUPERFOCUS. Contact: redwards@mail.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Critical Assessment of Metagenome Interpretation – a benchmark of computational metagenomics software

Publisher: Cold Spring Harbor Laboratory

Date: 09-01-2017

DOI: 10.1101/099127

Abstract: In metagenome analysis, computational methods for assembly, taxonomic profiling and binning are key components facilitating downstream biological data interpretation. However, a lack of consensus about benchmarking datasets and evaluation metrics complicates proper performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on datasets of unprecedented complexity and realism. Benchmark metagenomes were generated from ~700 newly sequenced microorganisms and ~600 novel viruses and plasmids, including genomes with varying degrees of relatedness to each other and to publicly available ones and representing common experimental setups. Across all datasets, assembly and genome binning programs performed well for species represented by individual genomes, while performance was substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below the family level. Parameter settings substantially impacted performances, underscoring the importance of program reproducibility. While highlighting current challenges in computational metagenomics, the CAMI results provide a roadmap for software selection to answer specific research questions.

Publication

Latitude and chlorophyll a density drive the distribution of carbohydrate‐active enzymes in the planktonic microbial fraction of the epipelagic zone

Publisher: Wiley

Date: 04-08-2020

DOI: 10.1111/1758-2229.12865

Publication

The Promise and Pitfalls of Prophages

Publisher: Cold Spring Harbor Laboratory

Date: 21-04-2023

DOI: 10.1101/2023.04.20.537752

Abstract: Phages dominate every ecosystem on the planet. While virulent phages sculpt the microbiome by killing their bacterial hosts, temperate phages provide unique growth advantages to their hosts through lysogenic conversion. Many prophages benefit their host, and prophages are responsible for genotypic and phenotypic differences that separate individual microbial strains. However, the microbes also endure a cost to maintain those phages: additional DNA to replicate and proteins to transcribe and translate. We have never quantified those benefits and costs. Here, we analysed over two and a half million prophages from over half a million bacterial genome assemblies. Analysis of the whole dataset and a representative subset of taxonomically diverse bacterial genomes demonstrated that the normalised prophage density was uniform across all bacterial genomes above 2 Mbp. We identified a constant carrying capacity of phage DNA per bacterial DNA. We estimated that each prophage provides cellular services equivalent to approximately 2.4% of the cell’s energy or 0.9 ATP per bp per hour. We demonstrate analytical, taxonomic, geographic, and temporal disparities in identifying prophages in bacterial genomes that provide novel targets for identifying new phages. We anticipate that the benefits bacteria accrue from the presence of prophages balance the energetics involved in supporting prophages. Furthermore, our data will provide a new framework for identifying phages in environmental datasets, diverse bacterial phyla, and from different locations.

Publication

Viral communities associated with healthy and bleaching corals

Publisher: Wiley

Date: 12-08-2008

DOI: 10.1111/J.1462-2920.2008.01652.X

Publication

PHANOTATE: a novel approach to gene identification in phage genomes

Publisher: Oxford University Press (OUP)

Date: 25-04-2019

DOI: 10.1093/BIOINFORMATICS/BTZ265

Abstract: Currently there are no tools specifically designed for annotating genes in phages. Several tools are available that have been adapted to run on phage genomes, but due to their underlying design, they are unable to capture the full complexity of phage genomes. Phages have adapted their genomes to be extremely compact, having adjacent genes that overlap and genes completely inside of other longer genes. This non-delineated genome structure makes it difficult for gene prediction using the currently available gene annotators. Here we present PHANOTATE, a novel method for gene calling specifically designed for phage genomes. Although the compact nature of genes in phages is a problem for current gene annotators, we exploit this property by treating a phage genome as a network of paths: where open reading frames are favorable, and overlaps and gaps are less favorable, but still possible. We represent this network of connections as a weighted graph, and use dynamic programming to find the optimal path. We compare PHANOTATE to other gene callers by annotating a set of 2133 complete phage genomes from GenBank, using PHANOTATE and the three most popular gene callers. We found that the four programs agree on 82% of the total predicted genes, with PHANOTATE predicting more genes than the other three. We searched for these extra genes in both GenBank’s non-redundant protein database and all of the metagenomes in the sequence read archive, and found that they are present at levels that suggest that these are functional protein-coding genes. eprekate/PHANOTATE Supplementary data are available at Bioinformatics online.

Publication

PMAnalyzer: a new web interface for bacterial growth curve analysis

Publisher: Oxford University Press (OUP)

Date: 13-02-2017

DOI: 10.1093/BIOINFORMATICS/BTX084

Abstract: Bacterial growth curves are essential representations for characterizing bacteria metabolism within a variety of media compositions. Using high-throughput spectrophotometers capable of processing tens of 96-well plates, quantitative phenotypic information can be easily integrated into the current data structures that describe a bacterial organism. The PMAnalyzer pipeline performs a growth curve analysis to parameterize the unique features occurring within microtiter wells containing specific growth media sources. We have expanded the pipeline capabilities and provide a user-friendly, online implementation of this automated pipeline. PMAnalyzer version 2.0 provides fast automatic growth curve parameter analysis, growth identification and high resolution figures of sample-replicate growth curves and several statistical analyses. PMAnalyzer v2.0 can be found at manalyzer/. Source code for the pipeline can be found on GitHub at acuevas/PMAnalyzer. Source code for the online implementation can be found on GitHub at acuevas/PMAnalyzerWeb. Supplementary data are available at Bioinformatics online.

Publication

Reply to: Caution in inferring viral strategies from abundance correlations in marine metagenomes

Publisher: Springer Science and Business Media LLC

Date: 30-01-2019

DOI: 10.1038/S41467-018-08286-4

Publication

Explaining microbial phenotypes on a genomic scale: GWAS for microbes

Publisher: Oxford University Press (OUP)

Date: 26-04-2013

DOI: 10.1093/BFGP/ELT008

Publication

Microbial Ecology of Four Coral Atolls in the Northern Line Islands

Publisher: Public Library of Science (PLoS)

Date: 27-02-2008

DOI: 10.1371/JOURNAL.PONE.0001584

Publication

Compounding Achromobacter Phages for Therapeutic Applications

Publisher: MDPI AG

Date: 30-07-2023

DOI: 10.3390/V15081665

Abstract: Achromobacter species colonization of Cystic Fibrosis respiratory airways is an increasing concern. Two adult patients with Cystic Fibrosis colonized by Achromobacter xylosoxidans CF418 or Achromobacter ruhlandii CF116 experienced fatal exacerbations. Achromobacter spp. are naturally resistant to several antibiotics. Therefore, phages could be valuable as therapeutics for the control of Achromobacter. In this study, thirteen lytic phages were isolated and characterized at the morphological and genomic levels for potential future use in phage therapy. They are presented here as the Achromobacter Kumeyaay phage collection. Six distinct Achromobacter phage genome clusters were identified based on a comprehensive phylogenetic analysis of the Kumeyaay collection as well as the publicly available Achromobacter phages. The infectivity of all phages in the Kumeyaay collection was tested in 23 Achromobacter clinical isolates; 78% of these isolates were lysed by at least one phage. A cryptic prophage was induced in Achromobacter xylosoxidans CF418 when infected with some of the lytic phages. This prophage genome was characterized and is presented as Achromobacter phage CF418-P1. Prophage induction during lytic phage preparation for therapy interventions requires further exploration. Large-scale production of phages and removal of endotoxins using an octanol-based procedure resulted in a phage concentrate of 1 × 10^9 plaque-forming units per milliliter with an endotoxin concentration of 65 endotoxin units per milliliter, which is below the Food and Drug Administration recommended maximum threshold for human administration. This study provides a comprehensive framework for the isolation, bioinformatic characterization, and safe production of phages to kill Achromobacter spp. in order to potentially manage Cystic Fibrosis (CF) pulmonary infections.

Publication

Organizing the bacterial annotation space with amino acid sequence embeddings

Publisher: Springer Science and Business Media LLC

Date: 23-09-2022

DOI: 10.1186/S12859-022-04930-5

Abstract: Due to the ever-expanding gap between the number of proteins being discovered and their functional characterization, protein function inference remains a fundamental challenge in computational biology. Currently, known protein annotations are organized in human-curated ontologies; however, all possible protein functions may not be organized accurately. Meanwhile, recent advancements in natural language processing and machine learning have developed models which embed amino acid sequences as vectors in n-dimensional space. So far, these embeddings have primarily been used to classify protein sequences using manually constructed protein classification schemes. In this work, we describe the use of amino acid sequence embeddings as a systematic framework for studying protein ontologies. Using a sequence embedding, we show that the bacterial carbohydrate metabolism class within the SEED annotation system contains 48 clusters of embedded sequences despite this class containing 29 functional labels. Furthermore, by embedding Bacillus amino acid sequences with unknown functions, we show that these unknown sequences form clusters that are likely to have similar biological roles. This study demonstrates that amino acid sequence embeddings may be a powerful tool for developing more robust ontologies for annotating protein sequence data. In addition, embeddings may be beneficial for clustering protein sequences with unknown functions and selecting optimal candidate proteins to characterize experimentally.

Publication

Acidobacteria Subgroups and Their Metabolic Potential for Carbon Degradation in Sugarcane Soil Amended With Vinasse and Nitrogen Fertilizers

Publisher: Frontiers Media SA

Date: 30-07-2019

DOI: 10.3389/FMICB.2019.01680

Publication

Coral and Seawater Metagenomes Reveal Key Microbial Functions to Coral Health and Ecosystem Functioning Shaped at Reef Scale

Publisher: Springer Science and Business Media LLC

Date: 15-08-2023

DOI: 10.1007/S00248-022-02094-6

Abstract: The coral holobiont is comprised of a highly diverse microbial community that provides key services to corals such as protection against pathogens and nutrient cycling. The coral surface mucus layer (SML) microbiome is very sensitive to external changes, as it constitutes the direct interface between the coral host and the environment. Here, we investigate whether the bacterial taxonomic and functional profiles in the coral SML are shaped by the local reef zone and explore their role in coral health and ecosystem functioning. The analysis was conducted using metagenomes and metagenome-assembled genomes (MAGs) associated with the coral Pseudodiploria strigosa and the water column from two naturally distinct reef environments in Bermuda: inner patch reefs exposed to a fluctuating thermal regime and the more stable outer reefs. The microbial community structure in the coral SML varied according to the local environment, both at taxonomic and functional levels. The coral SML microbiome from inner reefs provides more gene functions that are involved in nutrient cycling (e.g., photosynthesis, phosphorus metabolism, sulfur assimilation) and those that are related to higher levels of microbial activity, competition, and stress response. In contrast, the coral SML microbiome from outer reefs contained genes indicative of a carbohydrate-rich mucus composition found in corals exposed to less stressful temperatures and showed high proportions of microbial gene functions that play a potential role in coral disease, such as degradation of lignin-derived compounds and sulfur oxidation. The fluctuating environment in the inner patch reefs of Bermuda could be driving a more beneficial coral SML microbiome, potentially increasing holobiont resilience to environmental changes and disease.

Publication

Ten simple rules and a template for creating workflows-as-applications

Publisher: Center for Open Science

Date: 19-09-2022

DOI: 10.31219/OSF.IO/8W5J3

Abstract: There is a growing trend for releasing bioinformatics workflows as command line applications. This is a good thing as workflow management systems add both functionality and reliability, while command line interfaces are convenient for end users. Developing command line software in this way is considerably faster. However, there are many potential pitfalls that developers of bioinformatics tools should avoid. We outline ten simple rules for converting workflow manager pipelines into command line applications, and present working examples that also function as templates using the two most popular workflow managers, Snakemake (eardymcjohnface/Snaketool) and Nextflow (eardymcjohnface/Nektool).

Publication

Confidence for exploratory machine learning

Publisher: Elsevier BV

Date: 02-2023

DOI: 10.1016/J.BPJ.2022.11.1601

Publication

Indirect effects of algae on coral: algae‐mediated, microbe‐induced coral mortality

Publisher: Wiley

Date: 05-06-2006

DOI: 10.1111/J.1461-0248.2006.00937.X

Abstract: Declines in coral cover are generally associated with increases in the abundance of fleshy algae. In many cases, it remains unclear whether algae are responsible, directly or indirectly, for coral death or whether they simply settle on dead coral surfaces. Here, we show that algae can indirectly cause coral mortality by enhancing microbial activity via the release of dissolved compounds. When coral and algae were placed in chambers together but separated by a 0.02 μm filter, corals suffered 100% mortality. With the addition of the broad-spectrum antibiotic ampicillin, mortality was completely prevented. Physiological measurements showed complementary patterns of increasing coral stress with proximity to algae. Our results suggest that as human impacts increase and algae become more abundant on reefs a positive feedback loop may be created whereby compounds released by algae enhance microbial activity on live coral surfaces causing mortality of corals and further algal growth.

Publication

FOCUS: an alignment-free model to identify organisms in metagenomes using non-negative least squares

Publisher: PeerJ

Date: 05-06-2014

DOI: 10.7717/PEERJ.425

Publication

multiPhATE: bioinformatics pipeline for functional annotation of phage isolates

Publisher: Oxford University Press (OUP)

Date: 14-05-2019

DOI: 10.1093/BIOINFORMATICS/BTZ258

Abstract: To address the need for improved phage annotation tools that scale, we created an automated throughput annotation pipeline: multiple-genome Phage Annotation Toolkit and Evaluator (multiPhATE). multiPhATE is a throughput pipeline driver that invokes an annotation pipeline (PhATE) across a user-specified set of phage genomes. This tool incorporates a de novo phage gene calling algorithm and assigns putative functions to gene calls using protein-, virus- and phage-centric databases. multiPhATE’s modular construction allows the user to implement all or any portion of the analyses by acquiring local instances of the desired databases and specifying the desired analyses in a configuration file. We demonstrate multiPhATE by annotating two newly sequenced Yersinia pestis phage genomes. Within multiPhATE, the PhATE processing pipeline can be readily implemented across multiple processors, making it adaptable for throughput sequencing projects. Software documentation assists the user in configuring the system. multiPhATE was implemented in Python 3.7, and runs as a command-line code under Linux or Unix. multiPhATE is freely available under an open-source BSD3 license from arolzhou/multiPhATE. Instructions for acquiring the databases and third-party codes used by multiPhATE are included in the distribution README file. Users may report bugs by submitting to the github issues page associated with the multiPhATE distribution. Supplementary data are available at Bioinformatics online.

Publication

Complete Genome Sequencing of a Multidrug-Resistant and Human-Invasive Salmonella enterica Serovar Typhimurium Strain of the Emerging Sequence Type 213 Genotype

Publisher: American Society for Microbiology

Date: 25-06-2015

DOI: 10.1128/GENOMEA.00663-15

Abstract: Salmonella enterica subsp. enterica serovar Typhimurium strain YU39 was isolated in 2005 in the state of Yucatán, Mexico, from a human systemic infection. The YU39 strain is representative of the multidrug-resistant emergent sequence type 213 (ST213) genotype. The YU39 complete genome is composed of a chromosome and seven plasmids.

Publication

Microbial Community Profile and Water Quality in a Protected Area of the Caatinga Biome

Publisher: Public Library of Science (PLoS)

Date: 16-02-2016

DOI: 10.1371/JOURNAL.PONE.0148296

Publication

Philympics 2021: Prophage Predictions Perplex Programs

Publisher: Cold Spring Harbor Laboratory

Date: 03-06-2021

DOI: 10.1101/2021.06.03.446868

Abstract: Most bacterial genomes contain integrated bacteriophages—prophages—in various states of decay. Many are active and able to excise from the genome and replicate, while others are cryptic prophages, remnants of their former selves. Over the last two decades, many computational tools have been developed to identify the prophage components of bacterial genomes, and it is a particularly active area for the application of machine learning approaches. However, progress is hindered and comparisons thwarted because there are no manually curated bacterial genomes that can be used to test new prophage prediction algorithms. Here, we present a library of gold-standard bacterial genome annotations that include manually curated prophage annotations, and a computational framework to compare the predictions from different algorithms. We use this suite to compare all extant stand-alone prophage prediction algorithms to identify their strengths and weaknesses. We provide a FAIR dataset for prophage identification, and demonstrate the accuracy, precision, recall, and F1 score from the analysis of seven different algorithms for the prediction of prophages. We discuss caveats and concerns in this analysis and how those concerns may be mitigated.

Publication

Guidelines for public database submission of uncultivated virus genome sequences for taxonomic classification

Publisher: Springer Science and Business Media LLC

Date: 07-2023

DOI: 10.1038/S41587-023-01844-2

Publication

Novel crAssphage isolates exhibit conserved gene order and purifying selection of the host specificity protein

Publisher: Cold Spring Harbor Laboratory

Date: 06-03-2023

DOI: 10.1101/2023.03.05.531146

Abstract: Bacteroides, the prominent bacteria in the human gut, play a crucial role in degrading complex polysaccharides. Their abundance is influenced by phages belonging to the Crassvirales order. Despite identifying over 600 Crassvirales genomes computationally, only a few have been successfully isolated. Continued efforts in isolation of more Crassvirales genomes can provide insights into phage-host-evolution and infection mechanisms. We focused on wastewater samples, as potential sources of phages infecting various Bacteroides hosts. Sequencing, assembly, and characterization of isolated phages revealed 14 complete genomes belonging to three novel Crassvirales species infecting Bacteroides cellulosilyticus WH2. These species, Kehishuvirus sp. ‘tikkala’ strain Bc01, Kolpuevirus sp. ‘frurule’ strain Bc03, and ‘Rudgehvirus jaberico’ strain Bc11, spanned two families, and three genera, displaying a broad range of virion productions. Upon testing all successfully cultured Crassvirales species and their respective bacterial hosts, we discovered that they do not exhibit co-evolutionary patterns with their bacterial hosts. Furthermore, we observed variations in gene similarity, with greater shared similarity observed within genera. However, despite belonging to different genera, the three novel species shared a unique structural gene that encodes the tail spike protein. When investigating the relationship between this gene and host interaction, we discovered evidence of purifying selection, indicating its functional importance. Moreover, our analysis demonstrated that this tail spike protein binds to the TonB-dependent receptors present on the bacterial host surface. Combining these observations, our findings provide insights into phage-host interactions and present three Crassvirales species as an ideal system for controlled infectivity experiments on one of the most dominant members of the human enteric virome.
Bacteriophages play a crucial role in shaping microbial communities within the human gut. Among the most dominant bacteriophages in the human gut microbiome are Crassvirales phages, which infect Bacteroides. Despite being widely distributed, only a few Crassvirales genomes have been isolated, leading to a limited understanding of their biology, ecology, and evolution. This study isolated and characterized three novel Crassvirales genomes belonging to two different families, and three genera, but infecting one bacterial host, Bacteroides cellulosilyticus WH2. Notably, the observation confirmed that the phages are not co-evolving with their bacterial hosts, but rather share an ability to exploit similar features in their bacterial host. Additionally, the identification of a critical viral protein undergoing purifying selection and interacting with the bacterial receptors opens doors to targeted therapies against bacterial infections. Given the role of Bacteroides in polysaccharide degradation in the human gut, our findings advance our understanding of the phage-host interactions and could have important implications for the development of phage-based therapies. These discoveries may hold implications for improving gut health and metabolism to support overall well-being. The genomes used in this research are available on Sequence Read Archive (SRA) within the project, PRJNA737576. Bacteroides cellulosilyticus WH2, Kehishuvirus sp. ‘tikkala’ strain Bc01, Kolpuevirus sp. ‘frurule’ strain Bc03, and ‘Rudgehvirus jaberico’ strain Bc11 are all available on GenBank with accessions NZ_CP072251.1 (B. cellulosilyticus WH2), QQ198717 (Bc01), QQ198718 (Bc03), and QQ198719 (Bc11), and we are working on making the strains available through ATCC. The 3D protein structures for the three Crassvirales genomes are available to download at 0.25451/flinders.21946034.

Publication

Correction: Corrigendum: Allelic variation contributes to bacterial host specificity

Publisher: Springer Science and Business Media LLC

Date: 08-08-2017

DOI: 10.1038/NCOMMS15229

Abstract: Nature Communications 6 Article: 8754 (2015) Published: 30 October 2015 Updated: 8 August 2017 In Fig. 3 of this Article, the numbers of isolates studied for the fimH41_Newp, fimH44_Newp and fimH45_Newp alleles were inadvertently swapped. The correct version of Fig. 3 appears below as Fig. 1.

Publication

Draft Genome Sequence of the Fish Pathogen Piscirickettsia salmonis

Publisher: American Society for Microbiology

Date: 26-12-2013

DOI: 10.1128/GENOMEA.00926-13

Abstract: Piscirickettsia salmonis is a Gram-negative intracellular fish pathogen that has a significant impact on the salmon industry. Here, we report the genome sequence of P. salmonis strain LF-89. This is the first draft genome sequence of P. salmonis, and it reveals interesting attributes, including flagellar genes, despite this bacterium being considered nonmotile.

Publication

MultiPhATE2: Code for Functional Annotation and Comparison of Bacteriophage Genomes

Publisher: Cold Spring Harbor Laboratory

Date: 07-10-2020

DOI: 10.1101/2020.10.05.324566

Abstract: To address the need for improved tools for annotation and comparative genomics of bacteriophage genomes, we developed multiPhATE2. As an extension of the multiPhATE code, multiPhATE2 performs gene finding and functional sequence annotation of predicted gene and protein sequences, and additional search algorithms and databases extend the search space of the original functional annotation subsystem. MultiPhATE2 includes comparative genomics codes for gene matching among sets of input bacteriophage genomes, and scales well to large input data sets with the incorporation of multiprocessing in the functional annotation and comparative genomics subsystems. MultiPhATE2 was implemented in Python 3.7 and runs as a command-line code under Linux or MAC-OS. MultiPhATE2 is freely available under an open-source GPL-3 license at arolzhou/multiPhATE2. Instructions for acquiring the databases and third party codes used by multiPhATE2 are found in the README file included with the distribution. Users may report bugs by submitting issues to the project GitHub repository webpage. Contact: zhou4@llnl.gov or multiphate@gmail.com. Supplementary materials, which demonstrate the outputs of multiPhATE2, are available in a GitHub repository, at arolzhou/multiPhATE2_supplementaryData/.

Publication

The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)

Publisher: Oxford University Press (OUP)

Date: 29-11-2013

DOI: 10.1093/NAR/GKT1226

Publication

Host Association and Spatial Proximity Shape but Do Not Constrain Population Structure in the Mutualistic Symbiont Xenorhabdus bovienii

Publisher: American Society for Microbiology

Date: 27-06-2023

DOI: 10.1128/MBIO.00434-23

Abstract: Microbial populations and species are notoriously hard to delineate. We used a population genomics approach to examine the population structure and the spatial scale of gene flow in Xenorhabdus bovienii, an intriguing species that is both a specialized mutualistic symbiont of nematodes and a broadly virulent insect pathogen.

Publication

PHACTS, a computational approach to classifying the lifestyle of phages

Publisher: Oxford University Press (OUP)

Date: 11-01-2012

DOI: 10.1093/BIOINFORMATICS/BTS014

Abstract: Motivation: Bacteriophages have two distinct lifestyles: virulent and temperate. The virulent lifestyle has many implications for phage therapy, genomics and microbiology. Determining which lifestyle a newly sequenced phage falls into is currently determined using standard culturing techniques. Such laboratory work is not only costly and time consuming, but also cannot be used on phage genomes constructed from environmental sequencing. Therefore, a computational method that utilizes the sequence data of phage genomes is needed. Results: Phage Classification Tool Set (PHACTS) utilizes a novel similarity algorithm and a supervised Random Forest classifier to make a prediction whether the lifestyle of a phage, described by its proteome, is virulent or temperate. The similarity algorithm creates a training set from phages with known lifestyles and along with the lifestyle annotation, trains a Random Forest to classify the lifestyle of a phage. PHACTS predictions are shown to have a 99% precision rate. Availability and implementation: PHACTS was implemented in the PERL programming language and utilizes the FASTA program (Pearson and Lipman, 1988) and the R programming language library ‘Random Forest’ (Liaw and Weiner, 2010). The PHACTS software is open source and is available as downloadable stand-alone version or can be accessed online as a user-friendly web interface. The source code, help files and online version are available at www.phantome.org/PHACTS/. Contact: katelyn@rohan.sdsu.edu redwards@sciences.sdsu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Biodiversity and Biogeography of Phages in Modern Stromatolites and Thrombolites

Publisher: Wiley

Date: 16-09-2011

DOI: 10.1002/9781118010549.CH5

Publication

PhANNs, a fast and accurate tool and web server to classify phage structural proteins

Publisher: Cold Spring Harbor Laboratory

Date: 03-04-2020

DOI: 10.1101/2020.04.03.023523

Abstract: For any given bacteriophage genome or phage sequences in metagenomic data sets, we are unable to assign a function to 50-90% of genes. Structural protein-encoding genes constitute a large fraction of the average phage genome and are among the most divergent and difficult-to-identify genes using homology-based methods. To understand the functions encoded by phages, their contributions to their environments, and to help gauge their utility as potential phage therapy agents, we have developed a new approach to classify phage ORFs into ten major classes of structural proteins or into an “other” category. The resulting tool is named PhANNs (Phage Artificial Neural Networks). We built a database of 538,213 manually curated phage protein sequences that we split into eleven subsets (10 for cross-validation, one for testing) using a novel clustering method that ensures there are no homologous proteins between sets yet maintains the maximum sequence diversity for training. An Artificial Neural Network ensemble trained on features extracted from those sets reached a test F1-score of 0.875 and test accuracy of 86.2%. PhANNs can rapidly classify proteins into one of the ten classes, and non-phage proteins are classified as “other”, providing a new approach for functional annotation of phage proteins. PhANNs is open source and can be run from our web server or installed locally. Bacteriophages (phages, viruses that infect bacteria) are the most abundant biological entity on Earth. They outnumber bacteria by a factor of ten. As phages are very different within them and from bacteria, and we have comparatively few phage genes in our database, we are unable to assign function to 50%-90% of phage genes. In this work, we developed PhANNs, a machine learning tool that can classify a phage gene as one of ten structural roles, or “other”. This approach does not require a similar gene to be known.

Publication

Metagenomic Analysis of Healthy and White Plague-Affected Mussismilia braziliensis Corals

Publisher: Springer Science and Business Media LLC

Date: 15-01-2013

DOI: 10.1007/S00248-012-0161-4

Abstract: Coral health is under threat throughout the world due to regional and global stressors. White plague disease (WP) is one of the most important threats affecting the major reef builder of the Abrolhos Bank in Brazil, the endemic coral Mussismilia braziliensis. We performed a metagenomic analysis of healthy and WP-affected M. braziliensis in order to determine the types of microbes associated with this coral species. We also optimized a protocol for DNA extraction from coral tissues. Our taxonomic analysis revealed Proteobacteria, Bacteroidetes, Firmicutes, Cyanobacteria, and Actinomycetes as the main groups in all healthy and WP-affected corals. Vibrionales, members of the Cytophaga-Flavobacterium-Bacteroides complex, Rickettsiales, and Neisseriales were more abundant in the WP-affected corals. Diseased corals also had more eukaryotic metagenomic sequences identified as Alveolata and Apicomplexa. Our results suggest that WP disease in M. braziliensis is caused by a polymicrobial consortium.

Publication

Biological chlorine cycling in the Arctic Coastal Plain

Publisher: Springer Science and Business Media LLC

Date: 07-07-2017

DOI: 10.1007/S10533-017-0359-0

Publication

Single-cell gene expression links SARS-CoV-2 infection and gut serotonin

Publisher: BMJ

Date: 23-08-2023

DOI: 10.1136/GUTJNL-2022-328262

Publication

Phylogenetic classification of short environmental DNA fragments

Publisher: Oxford University Press (OUP)

Date: 19-02-2008

DOI: 10.1093/NAR/GKN038

Publication

Optimizing and evaluating the reconstruction of Metagenome-assembled microbial genomes

Publisher: Springer Science and Business Media LLC

Date: 28-11-2017

DOI: 10.1186/S12864-017-4294-1

Publication

Sequencing of Seven Haloarchaeal Genomes Reveals Patterns of Genomic Flux

Publisher: Public Library of Science (PLoS)

Date: 24-07-2012

DOI: 10.1371/JOURNAL.PONE.0041389

Publication

A Glimpse into the Expanded Genome Content of Vibrio cholerae through Identification of Genes Present in Environmental Strains

Publisher: American Society for Microbiology

Date: 05-2005

DOI: 10.1128/JB.187.9.2992-3001.2005

Abstract: Vibrio cholerae has multiple survival strategies which are reflected both in its broad distribution in many aquatic environments and its high genotypic diversity. To obtain additional information regarding the content of the V. cholerae genome, suppression subtractive hybridization (SSH) was used to prepare libraries of DNA sequences from two southern California coastal isolates which are divergent or absent in the clinical strain V. cholerae O1 El Tor N16961. More than 1,400 subtracted clones were sequenced. This revealed the presence of novel sequences encoding functions related to cell surface structures, transport, metabolism, signal transduction, luminescence, mobile elements, stress resistance, and virulence. Flanking sequence information was determined for loci of interest, and the distribution of these sequences was assessed for a collection of V. cholerae strains obtained from southern California and Mexican environments. This led to the surprising observation that sequences related to the toxin genes toxA, cnf1, and exoY are widespread and more common in these strains than those of the cholera toxin genes which are a hallmark of the pandemic strains of V. cholerae. Gene transfer among these strains could be facilitated by a 4.9-kbp plasmid discovered in one isolate, which possesses similarity to plasmids from other environmental vibrios. By investigating some of the nucleotide sequence basis for V. cholerae genotypic diversity, DNA fragments have been uncovered which could promote survival in coastal environments. Furthermore, a set of genes has been described which could be involved in as yet undiscovered interactions between V. cholerae and eukaryotic organisms.

Publication

Heat-stressed coral microbiomes are stable and potentially beneficial at the level of taxa and functional genes

Publisher: Authorea, Inc.

Date: 03-02-2023

DOI: 10.22541/AU.167542930.03517639/V1

Abstract: Coral reef health is tightly connected to the coral microbiome. Coral bleaching and disease outbreaks have caused an unprecedented loss in coral cover worldwide, particularly correlated to a warming ocean. Coping mechanisms of the coral holobiont under high temperatures are not completely described, but the associated microbial community is a potential source of acquired heat-tolerance. The relationship between stress and stability in the microbiome is key to understanding the role that the coral microbiome plays in thermal tolerance. According to the Anna Karenina Principle (AKP), stress or disease will increase instability and stochasticity among animal microbiomes. Here we investigate whether heat stress results in microbiomes that follow the AKP. We used shotgun metagenomics in an experimental setting to understand the dynamics of microbial taxa and genes in the surface mucous layer (SML) microbiome of the coral Pseudodiploria strigosa under heat treatment. The metagenomes of corals exposed to heat stress showed high similarity, indicating a deterministic and stable response of the coral microbiome to disturbance, in opposition to the AKP. We hypothesize that this stability is the result of a selective pressure towards a coral microbiome that is assisting the holobiont to withstand heat stress. The coral SML microbiome responded to heat stress with an increase in the relative abundance of taxa with probiotic potential, and functional genes for nitrogen and sulfur acquisition. These consistent and specific microbial taxa and gene functions that significantly increased in proportional abundance in corals exposed to heat are potentially beneficial to coral health and thermal resistance.

Publication

Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition

Publisher: PeerJ

Date: 19-08-2014

DOI: 10.7717/PEERJ.520

Publication

Deviations from Ultrametricity in Phage Protein Distances

Publisher: World Scientific Pub Co Pte Lt

Date: 03-2009

DOI: 10.1142/S1230161209000062

Abstract: Distances in biological databases are known not to be ultrametric. Deviations from ultrametricity can however reveal useful features of biodata. In the present study we examine deviations from ultrametricity of the distances between known phage proteins quantified in two senses: (1) the failure of triangles to be isosceles and (2) failure of every point to be the center of any sphere in which it resides. The deviations from these two ultrametric properties undergo qualitative changes as a function of the distance. Below we describe these changes and how they can be observed. We further argue that the distances at which the qualitative changes take place reveal intrinsic scales in the dataset. Such scales are important for choosing threshold values of the distance in various algorithms and reveal natural chunking of the data that can be used to decide clade levels in phage phylogeny.

Publication

Viral metagenomics

Publisher: Springer Science and Business Media LLC

Date: 10-05-2005

DOI: 10.1038/NRMICRO1163

Abstract: Viruses, most of which infect microorganisms, are the most abundant biological entities on the planet. Identifying and measuring the community dynamics of viruses in the environment is complicated because less than one percent of microbial hosts have been cultivated. Also, there is no single gene that is common to all viral genomes, so total uncultured viral diversity cannot be monitored using approaches analogous to ribosomal DNA profiling. Metagenomic analyses of uncultured viral communities circumvent these limitations and can provide insights into the composition and structure of environmental viral communities.

Publication

Qudaich

Publisher: Cold Spring Harbor Laboratory

Date: 24-06-2016

DOI: 10.1101/060509

Abstract: Next generation sequencing (NGS) technology produces massive amounts of data in a reasonable time and low cost. Analyzing and annotating these data requires sequence alignments to compare them with genes, proteins and genomes in different databases. Sequence alignment is the first step in metagenomics analysis, and pairwise comparisons of sequence reads provide a measure of similarity between environments. Most of the current aligners focus on aligning NGS datasets against long reference sequences rather than comparing between datasets. As the number of metagenomes and other genomic data increases each year, there is a demand for more sophisticated, faster sequence alignment algorithms. Here, we introduce a novel sequence aligner, Qudaich, which can efficiently process large volumes of data and is suited to de novo comparisons of next generation reads datasets. Qudaich can handle both DNA and protein sequences and attempts to provide the best possible alignment for each query sequence. Qudaich can produce more useful alignments quicker than other contemporary alignment algorithms. Recent developments in sequencing technology provide high-throughput sequencing data and have resulted in large volumes of genomic and metagenomic data available in public databases. Sequence alignment is an important step for annotating these data. Many sequence aligners have been developed in the last few years for efficient analysis of these data; however, most of them are only able to align DNA sequences and mainly focus on aligning NGS data against long reference genomes. Therefore, in this study we have designed a new sequence aligner, qudaich, which can generate pairwise local sequence alignment (at both the DNA and protein level) between two NGS datasets and can efficiently handle the large volume of NGS datasets. In qudaich, we introduce a unique sequence alignment algorithm, which outperforms the traditional approaches. Qudaich not only takes less time to execute, but also finds more useful alignments than contemporary aligners.

Publication

Inside or Outside: Detecting the Cellular Location of Bacterial Pathogens

Publisher: Future Science Ltd

Date: 02-2001

DOI: 10.2144/01302ST03

Abstract: Salmonella are intracellular pathogens that infect and multiply inside macrophages. Although Salmonella are some of the best-studied pathogens, it is difficult to determine quickly and reliably whether the bacteria are intracellular or extracellular. We have developed a novel method using differential fluorescence of two fluorescent proteins to determine the cellular location of pathogenic bacteria in macrophage infection assays. Using the differential expression of two unique fluorescent proteins that are expressed under specific conditions, we have developed a real-time assay for macrophage infections. The critical advantages of this system are that it does not alter the bacterial surface, it is not toxic to either the bacteria or the host cell, and it may be used in real-time quantitative assays. This assay can be readily applied to any other model pathogenic systems such as Listeria, Mycobacteria, and Legionella in which intracellular gene expression has been characterized.

Publication

Community Genomics Among Stratified Microbial Assemblages in the Ocean's Interior

Publisher: American Association for the Advancement of Science (AAAS)

Date: 27-01-2006

DOI: 10.1126/SCIENCE.1120250

Abstract: Microbial life predominates in the ocean, yet little is known about its genomic variability, especially along the depth continuum. We report here genomic analyses of planktonic microbial communities in the North Pacific Subtropical Gyre, from the ocean's surface to near–sea floor depths. Sequence variation in microbial community genes reflected vertical zonation of taxonomic groups, functional gene repertoires, and metabolic potential. The distributional patterns of microbial genes suggested depth-variable community trends in carbon and energy metabolism, attachment and motility, gene mobility, and host-viral interactions. Comparative genomic analyses of stratified microbial communities have the potential to provide significant insight into higher-order community organization and dynamics.

Publication

Fast Identification and Removal of Sequence Contamination from Genomic and Metagenomic Datasets

Publisher: Public Library of Science (PLoS)

Date: 09-03-2011

DOI: 10.1371/JOURNAL.PONE.0017288

Publication

Phage Phenomics: Physiological Approaches to Characterize Novel Viral Proteins

Publisher: MyJove Corporation

Date: 11-06-2015

DOI: 10.3791/52854

Publication

Microbial genomic taxonomy

Publisher: Springer Science and Business Media LLC

Date: 2013

DOI: 10.1186/1471-2164-14-913

Publication

The minimum information about a genome sequence (MIGS) specification

Publisher: Springer Science and Business Media LLC

Date: 05-2008

DOI: 10.1038/NBT1360

Publication

In Vitro Characterization of the Bacillus subtilis Protein Tyrosine Phosphatase YwqE

Publisher: American Society for Microbiology

Date: 15-05-2005

DOI: 10.1128/JB.187.10.3384-3390.2005

Abstract: Both gram-negative and gram-positive bacteria possess protein tyrosine phosphatases (PTPs) with a catalytic Cys residue. In addition, many gram-positive bacteria have acquired a new family of PTPs, whose first characterized member was CpsB from Streptococcus pneumoniae. Bacillus subtilis contains one such CpsB-like PTP, YwqE, in addition to two class II Cys-based PTPs, YwlE and YfkJ. The substrates for both YwlE and YfkJ are presently unknown, while YwqE was shown to dephosphorylate two phosphotyrosine-containing proteins implicated in UDP-glucuronate biosynthesis, YwqD and YwqF. In this study, we characterize YwqE, compare the activities of the three B. subtilis PTPs (YwqE, YwlE, and YfkJ), and demonstrate that the two B. subtilis class II PTPs do not dephosphorylate the physiological substrates of YwqE.

Publication

Abrolhos Bank Reef Health Evaluated by Means of Water Quality, Microbial Diversity, Benthic Cover, and Fish Biomass Data

Publisher: Public Library of Science (PLoS)

Date: 05-06-2012

DOI: 10.1371/JOURNAL.PONE.0036687

Publication

Increasing DNA Transfer Efficiency by Temporary Inactivation of Host Restriction

Publisher: Future Science Ltd

Date: 05-1999

DOI: 10.2144/99265ST02

Abstract: E. coli and Salmonella typhimurium are widely used bacterial hosts for genetic manipulation of DNA from prokaryotes and eukaryotes. Introduction of foreign DNA by electroporation or transduction into E. coli and Salmonella is limited by host restriction of incoming DNA by the recipient cells. Here, we describe a simple method that temporarily inactivates host restriction, allowing high-frequency DNA transfer. This technique might be readily applied to a wide range of bacteria to increase DNA transfer between strains and species.

Publication

Genomic Comparison of the Closely-Related Salmonella enterica Serovars Enteritidis, Dublin and Gallinarum

Publisher: Public Library of Science (PLoS)

Date: 03-06-2015

DOI: 10.1371/JOURNAL.PONE.0126883

Publication

Analysis of Spounaviruses as a Case Study for the Overdue Reclassification of Tailed Phages

Publisher: Oxford University Press (OUP)

Date: 25-05-2019

DOI: 10.1093/SYSBIO/SYZ036

Abstract: Tailed bacteriophages are the most abundant and diverse viruses in the world, with genome sizes ranging from 10 kbp to over 500 kbp. Yet, due to historical reasons, all this diversity is confined to a single virus order, Caudovirales, composed of just four families: Myoviridae, Siphoviridae, Podoviridae, and the newly created Ackermannviridae family. In recent years, this morphology-based classification scheme has started to crumble under the constant flood of phage sequences, revealing that tailed phages are even more genetically diverse than once thought. This prompted us, the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV), to consider overall reorganization of phage taxonomy. In this study, we used a wide range of complementary methods (including comparative genomics, core genome analysis, and marker gene phylogenetics) to show that the group of Bacillus phage SPO1-related viruses previously classified into the Spounavirinae subfamily is clearly distinct from other members of the family Myoviridae and its diversity deserves the rank of an autonomous family. Thus, we removed this group from the Myoviridae family and created the family Herelleviridae, a new taxon of the same rank. In the process of the taxon evaluation, we explored the feasibility of different demarcation criteria and critically evaluated the usefulness of our methods for phage classification. The convergence of results, drawing a consistent and comprehensive picture of a new family with associated subfamilies, regardless of method, demonstrates that the tools applied here are particularly useful in phage taxonomy. We are convinced that creation of this novel family is a crucial milestone toward much-needed reclassification in the Caudovirales order.

Publication

Lytic to temperate switching of viral communities

Publisher: Springer Science and Business Media LLC

Date: 16-03-2016

DOI: 10.1038/NATURE17193

Abstract: Microbial viruses can control host abundances via density-dependent lytic predator-prey dynamics. Less clear is how temperate viruses, which coexist and replicate with their host, influence microbial communities. Here we show that virus-like particles are relatively less abundant at high host densities. This suggests suppressed lysis where established models predict lytic dynamics are favoured. Meta-analysis of published viral and microbial densities showed that this trend was widespread in diverse ecosystems ranging from soil to freshwater to human lungs. Experimental manipulations showed viral densities more consistent with temperate than lytic life cycles at increasing microbial abundance. An analysis of 24 coral reef viromes showed a relative increase in the abundance of hallmark genes encoded by temperate viruses with increased microbial abundance. Based on these four lines of evidence, we propose the Piggyback-the-Winner model wherein temperate dynamics become increasingly important in ecosystems with high microbial densities thus 'more microbes, fewer viruses'.

Publication

Detection of Large Numbers of Novel Sequences in the Metatranscriptomes of Complex Marine Microbial Communities

Publisher: Wiley

Date: 16-09-2011

DOI: 10.1002/9781118010549.CH27

Publication

Metagenomic and stable isotopic analyses of modern freshwater microbialites in Cuatro Ciénegas, Mexico

Publisher: Wiley

Date: 2009

DOI: 10.1111/J.1462-2920.2008.01725.X

Publication

Integration of genomic and proteomic analyses in the classification of the Siphoviridae family

Publisher: Elsevier BV

Date: 03-2015

DOI: 10.1016/J.VIROL.2014.10.016

Abstract: Using a variety of genomic (BLASTN, ClustalW) and proteomic (Phage Proteomic Tree, CoreGenes) tools, we have tackled the taxonomic status of members of the largest bacteriophage family, the Siphoviridae. In all, over 400 phages were examined, and we were able to propose 39 new genera, comprising 216 phage species, and add 62 species to two previously defined genera (Phic3unalikevirus and L5likevirus), grouping, in total, 390 fully sequenced phage isolates. Many of the remainder are orphans to which the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV) chooses not to ascribe genus status for the time being.

Publication

Poster Session Abstracts

Publisher: Wiley

Date: 21-09-2016

DOI: 10.1002/PPUL.23576

Publication

Bacterial carbon processing by generalist species in the coastal ocean

Publisher: Springer Science and Business Media LLC

Date: 27-01-2008

DOI: 10.1038/NATURE06513

Abstract: The assimilation and mineralization of dissolved organic carbon (DOC) by marine bacterioplankton is a major process in the ocean carbon cycle. However, little information exists on the specific metabolic functions of participating bacteria and on whether individual taxa specialize on particular components of the marine DOC pool. Here we use experimental metagenomics to show that coastal communities are populated by taxa capable of metabolizing a wide variety of organic carbon compounds. Genomic DNA captured from bacterial community subsets metabolizing a single model component of the DOC pool (either dimethylsulphoniopropionate or vanillate) showed substantial overlap in gene composition as well as a diversity of carbon-processing capabilities beyond the selected phenotypes. Our direct measure of niche breadth for bacterial functional assemblages indicates that, in accordance with ecological theory, heterogeneity in the composition and supply of organic carbon to coastal oceans may favour generalist bacteria. In the important interplay between microbial community structure and biogeochemical cycling, coastal heterotrophic communities may be controlled less by transient changes in the carbon reservoir that they process and more by factors such as trophic interactions and physical conditions.

Publication

The RAST Server: Rapid Annotations using Subsystems Technology

Publisher: Springer Science and Business Media LLC

Date: 08-02-2008

DOI: 10.1186/1471-2164-9-75

Abstract: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.

Publication

Temperate gut phages are prevalent, diverse, and predominantly inactive

Publisher: Cold Spring Harbor Laboratory

Date: 18-08-2023

DOI: 10.1101/2023.08.17.553642

Abstract: Large-scale metagenomic and data mining efforts have uncovered an expansive diversity of bacteriophages (phages) within the human gut 1–3 . These insights include broader phage populational dynamics such as temporal stability 4 , interindividual uniqueness 5,6 and potential associations to specific disease states 7,8 . However, the functional understanding of phage-host interactions and their impacts within this complex ecosystem have been limited due to a lack of cultured isolates for experimental validation. Here we characterise 125 active prophages originating from 252 diverse human gut bacterial isolates using seven different induction conditions to substantially expand the experimentally validated temperate phage-host pairs originating from the human gut. Importantly, only 17% of computationally predicted prophages were induced with common induction agents and these exhibited distinct gene patterns compared to non-induced predictions. Active Bacteroidota prophages were among the most prevalent members of the gut virome, with extensive use of diversity-generating retroelements and exhibiting broad host ranges. Moreover, active polylysogeny was present in 52% of studied gut lysogens and led to coordinated prophage induction across diverse conditions. This study represents a substantial expansion of experimentally validated gut prophages, providing key insights into their diversity and genetics, including a genetic pathway for prophage domestication and demonstration that differential induction was complex and influenced by divergent prophage integration sites. More broadly, it highlights the importance of experimental validation alongside genomic based computational prediction to enable further functional understanding of these commensal viruses within the human gut.

Publication

Functional characterization of ligninolytic Klebsiella spp. strains associated with soil and freshwater

Publisher: Springer Science and Business Media LLC

Date: 11-06-2018

DOI: 10.1007/S00203-018-1532-0

Abstract: Overcoming recalcitrance of lignin has motivated bioprospecting of high-yielding enzymes from environmental ligninolytic microorganisms associated with lignocellulose-degrading systems. Here, we performed isolation of 21 ligninolytic strains belonging to the genus Klebsiella spp., driven by the presence of lignin in the media. The fastest-growing strains (FP10-5.23, FP10-5.22 and P3TM1) reached the stationary phase in approximately 24 h, in the media containing lignin as the main carbon source. The strains showed biochemical evidence of ligninolytic potential in liquid and solid media by converting dyes whose molecular structures are similar to lignin fragments. In liquid medium, higher levels of dye decolorization were observed for P3TM.1 in the presence of methylene blue, reaching 98% decolorization in 48 h. The highest index values (1.25) were found for isolates P3TM.1 and FP10-5.23, in the presence of toluidine blue. The genomic analysis revealed the presence of more than 20 genes associated with known prokaryotic lignin-degrading systems. Identification of peroxidases (lignin peroxidase-LiP, dye-decolorizing peroxidase-DyP, manganese peroxidase-MnP) and auxiliary activities (AA2, AA3, AA6 and AA10 families) among the genetic repertoire suggests the ability to produce extracellular enzymes able to attack phenolic and non-phenolic lignin structures. Our results suggest that the Klebsiella spp. associated with fresh water and soil may play an important role in the cycling of recalcitrant molecules in the Caatinga (desert-like Brazilian biome), and represent a potential source of lignin-degrading enzymes with biotechnological applications.

Publication

An old dog learns new tricks

Publisher: Elsevier BV

Date: 05-2001

DOI: 10.1016/S0966-842X(01)02044-3

Robert Edwards

Researcher

Publications

The importance of complete genome sequences

Taxonomy of prokaryotic viruses: 2016 update from the ICTV bacterial and archaeal viruses subcommittee.

Evolution of microbial pathogens

hafeZ: Active prophage identification through read mapping

The human gut virome: Composition, colonisation, interactions, and impacts on human health

The Marine Viromes of Four Oceanic Regions

Aging and Intermittent Fasting Impact on Transcriptional Regulation and Physiological Responses of Adult Drosophila Neuronal and Muscle Tissues

The GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average Genome Size in Four Major Biomes

Ten simple rules and a template for creating workflows-as-applications

NCBI’s Virus Discovery Codeathon: Building “FIVE” —The Federated Index of Viral Experiments API Index

Essential genes on metabolic maps

THEA: A novel approach to gene identification in phage genomes

A bioinformatic analysis of ribonucleotide reductase genes in phage genomes and metagenomes

Variability and host density independence in inductions-based estimates of environmental lysogeny

MultiPhATE2: code for functional annotation and comparison of phage genomes

A Distinct Contractile Injection System Found in a Majority of Adult Human Microbiomes

Low-Molecular-Weight Protein Tyrosine Phosphatases of Bacillus subtilis

Draft Genome Sequence of Cylindrospermopsis raciborskii (Cyanobacteria) Strain ITEP-A1 Isolated from a Brazilian Semiarid Freshwater Body: Evidence of Saxitoxin and Cyli

Genomic analysis and growth-phase-dependent regulation of the SEF14 fimbriae of Salmonella enterica serovar Enteritidis

Microbes, metagenomes and marine mammals: enabling the next generation of scientist to enter the genomic era

FOCUS2: agile and sensitive classification of metagenomics data using a reduced database

An Agile Functional Analysis of Metagenomic Data Using SUPER-FOCUS

Microfluidic PCR Combined with Pyrosequencing for Identification of Allelic Variants with Phenotypic Associations among Targeted Salmonella Genes

PRFect: A tool to predict programmed ribosomal frameshifts in prokaryotic and viral genomes

Decoding diversity in a coral reef fish species complex with restricted range using metagenomic sequencing of gut contents

Genome Sequences of the Ethanol-Tolerant Lactobacillus vini Strains LMG 23202 T and JP7.8.9

Biodiversity and biogeography of phages in modern stromatolites and thrombolites

A Distinct Contractile Injection System Gene Cluster Found in a Majority of Healthy Adult Human Microbiomes

Predicting the capsid architecture of phages from metagenomic data

Global phylogeography and ancient evolution of the widespread human gut virus crAssphage

GenomePeek—an online tool for prokaryotic genome and metagenome analysis

Mechanistic Model of Rothia mucilaginosa Adaptation toward Persistence in the CF Lung, Based on a Genome Reconstructed from Metagenomic Data

‘Genome skimming’ with the MinION hand-held sequencer identifies CITES-listed shark species in India’s exports market

Author Correction: Guidelines for public database submission of uncultivated virus genome sequences for taxonomic classification

linsalrob/fasta_validator: Initial Release

Differential regulation of fasA and fasH expression of Escherichia coli 987P fimbriae by environmental cues

Host interactions of novel Crassvirales species belonging to multiple families infecting bacterial host, Bacteroides cellulosilyticus WH2

Prophage rates in the human microbiome vary by body site and host health

Global phylogeography and ancient evolution of the widespread human gut virus crAssphage

Structure and function of a cyanophage-encoded peptide deformylase

PRINSEQ++, a multi-threaded tool for fast and efficient quality control and preprocessing of sequencing datasets

Philympics 2021: Prophage Predictions Perplex Programs

Sequencing at sea: Challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition

Philympics 2021: Prophage Predictions Perplex Programs

Multidimensional metrics for estimating phage abundance, distribution, gene density, and sequence coverage in metagenomes

The importance of complete genome sequences

Growth Score: A single metric to define growth in 96-well phenotype assays

Taxonomy of prokaryotic viruses: 2018-2019 update from the ICTV Bacterial and Archaeal Viruses Subcommittee

Comparative Metagenomics Reveals Host Specific Metavirulomes and Horizontal Gene Transfer Elements in the Chicken Cecum Microbiome

Nitrogen control in bacteria

Genome analysis of the obligately lytic bacteriophage 4268 of Lactococcus lactis provides insight into its adaptable nature

Correction: Corrigendum: Lytic to temperate switching of viral communities

Experimental and Computational Assessment of Conditionally Essential Genes in Escherichia coli

Programmed ribosomal frameshifts, and how to find them

Clinical Insights from Metagenomic Analysis of Sputum Samples from Patients with Cystic Fibrosis

Some of the most interesting CASP11 targets through the eyes of their authors

Real Time Metagenomics: Using k-mers to annotate metagenomes

Quality control and preprocessing of metagenomic datasets

Poster Session Abstracts

Growth Score: a single metric to define growth in 96-well phenotype assays

Connecting genotype to phenotype in the era of high-throughput sequencing

Cyanobacterial biodiversity of semiarid public drinking water supply reservoirs assessed via next-generation DNA sequencing technology

The StkSR Two-Component System Influences Colistin Resistance in Acinetobacter baumannii

Hecatomb: An End-to-End Research Platform for Viral Metagenomics

Combining de novo and reference-guided assembly with scaffold_builder

Genome Sequence of the Bacterioplanktonic, Mixotrophic Vibrio campbellii Strain PEL22A, Isolated in the Abrolhos Bank

Global microbialization of coral reefs

Baseline Assessment of Mesophotic Reefs of the Vitória-Trindade Seamount Chain Based on Water Quality, Microbial Diversity, Benthic Cover and Fish Biomass Data

A Novel Group of Promiscuous Podophages Infecting Diverse Gammaproteobacteria from River Communities Exhibits Dynamic Intergenus Host Adaptation

NCBI’s Virus Discovery Hackathon: Engaging Research Communities to Identify Cloud Infrastructure Requirements

Phables: from fragmented assemblies to high-quality bacteriophage genomes

The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes

Genome Sequences of the Ethanol-Tolerant Lactobacillus vini Strains LMG 23202 ^T and JP7.8.9

Growth Score: A single metric to define growth in 96-well phenotype assays