ORCID Profile
0000-0002-8921-6005
Current Organisation
Ecole Normale Supérieure Lyon
Publisher: F1000 Research Ltd
Date: 19-04-2021
DOI: 10.12688/F1000RESEARCH.29032.2
Abstract: Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing to quality control and fine-grained, interactive exploration and plotting of final results.
Publisher: F1000 Research Ltd
Date: 18-01-2021
DOI: 10.12688/F1000RESEARCH.29032.1
Publisher: Cold Spring Harbor Laboratory
Date: 15-09-2021
DOI: 10.1101/2021.09.15.460475
Abstract: Short-read variant calling for bacterial genomics is a mature field, and there are many widely used software tools. Different underlying approaches (e.g. pileup, local or global assembly, paired-read use, haplotype use) lend each tool different strengths, especially when considering non-SNP (single nucleotide polymorphism) variation or potentially distant reference genomes. It would therefore be valuable to be able to integrate the results from multiple variant callers, using a robust statistical approach to “adjudicate” at loci where there is disagreement between callers. To this end, we present a tool, Minos, for variant adjudication by mapping reads to a genome graph of variant calls. Minos allows users to combine output from multiple variant callers without loss of precision. Minos also addresses a second problem of joint genotyping SNPs and indels in bacterial cohorts, which can also be framed as an adjudication problem. We benchmark on 62 samples from 3 species (Mycobacterium tuberculosis, Staphylococcus aureus, Klebsiella pneumoniae) and an outbreak of 385 M. tuberculosis samples. Finally, we joint genotype a large M. tuberculosis cohort (N ≈ 15k) for which the rifampicin phenotype is known. We build a map of non-synonymous variants in the RRDR (rifampicin resistance determining region) of the rpoB gene and extend current knowledge relating RRDR SNPs to heterogeneity in rifampicin resistance levels. We replicate this finding in a second M. tuberculosis cohort (N ≈ 13k). Minos is released under the MIT license, available at qbal-lab-org/minos.
Publisher: Wiley
Date: 08-08-2011
Publisher: Cold Spring Harbor Laboratory
Date: 12-11-2020
DOI: 10.1101/2020.11.12.380378
Abstract: Bacterial genomes follow a U-shaped frequency distribution whereby most genomic loci are either rare (accessory) or common (core); the union of these is the pan-genome. The alignable fraction of two genomes from a single species can be low (e.g. 50-70%), such that no single reference genome can access all single nucleotide polymorphisms (SNPs). The pragmatic solution is to choose a close reference and analyse SNPs only in the core genome. Given that much bacterial adaptability hinges on the accessory genome, this is an unsatisfactory limitation. We present a novel pan-genome graph structure and algorithms implemented in the software pandora, which approximates a sequenced genome as a recombinant of reference genomes, detects novel variation and then pan-genotypes multiple samples. The method takes fastq as input and outputs a multi-sample VCF with respect to an inferred data-dependent reference genome, and is available at mcolq andora. Constructing a reference graph from 578 E. coli genomes, we analyse a diverse set of 20 E. coli isolates. We show pandora recovers at least 13k more rare SNPs than single-reference based tools, achieves equal or better error rates with Nanopore as with Illumina data, 6-24x lower Nanopore error rates than other tools, and provides a stable framework for analysing diverse samples without reference bias. We also show that our inferred recombinant VCF reference genome is significantly better than simply picking the closest RefSeq reference. This is a step towards comprehensive cohort analysis of bacterial pan-genomic variation, with potential impacts on genotype-phenotype and epidemiological studies.
Publisher: Springer Science and Business Media LLC
Date: 14-09-2021
DOI: 10.1186/S13059-021-02473-1
Abstract: We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. As much bacterial adaptability hinges on the accessory genome, methods which analyze SNPs in just the core genome have unsatisfactory limitations. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.
Location: United Kingdom of Great Britain and Northern Ireland
No related grants have been discovered for Brice Letcher.