ORCID Profile
0000-0003-3418-4218
Current Organisations
University of Zurich
,
ETH Zürich
Does something not look right? The information on this page has been harvested from data sources that may not be up to date. We continue to work with information providers to improve coverage and quality. To report an issue, use the Feedback Form.
Publisher: Cold Spring Harbor Laboratory
Date: 23-09-2022
DOI: 10.1101/2022.09.22.508982
Abstract: Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for ex le, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption.
Publisher: Springer Science and Business Media LLC
Date: 30-11-2020
DOI: 10.1038/S41467-020-19894-4
Abstract: Single-cell RNA sequencing (scRNA-seq) has become an empowering technology to profile the transcriptomes of in idual cells on a large scale. Early analyses of differential expression have aimed at identifying differences between subpopulations to identify subpopulation markers. More generally, such methods compare expression levels across sets of cells, thus leading to cross-condition analyses. Given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making s le-level inferences, termed here as differential state analysis however, it is not clear which statistical framework best handles this situation. Here, we surveyed methods to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated pseudobulk data. To evaluate method performance, we developed a flexible simulation that mimics multi-s le scRNA-seq data. We analyzed scRNA-seq data from mouse cortex cells to uncover subpopulation-specific responses to lipopolysaccharide treatment, and provide robust tools for multi-condition analysis within the muscat R package.
Publisher: Cold Spring Harbor Laboratory
Date: 02-02-2020
DOI: 10.1101/2020.02.02.930578
Abstract: The massive growth of single-cell RNA-sequencing (scRNAseq) and the methods for its analysis still lack sufficient and up-to-date benchmarks that could guide analytical choices. Numerous benchmark studies already exist and cover most of scRNAseq processing and analytical methods but only a few give advice on a comprehensive pipeline. Moreover, current studies often focused on isolated steps of the process and do not address the impact of a tool on both the intermediate and the final steps of the analysis. Here, we present a flexible R framework for pipeline comparison with multi-level evaluation metrics. We apply it to the benchmark of scRNAseq analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction and clustering. We evaluate the choice of these tools with multi-purpose metrics to assess their ability to reveal cell population structure and lead to efficient clustering. On the basis of our systematic evaluations of analysis pipelines, we make a number of practical recommendations about current analysis choices and for a comprehensive pipeline. The evaluation framework that we developed, pipeComp ( lger ipeComp ), has been implemented so as to easily integrate any other step, tool, or evaluation metric allowing extensible benchmarks and easy applications to other fields of research in Bioinformatics, as we demonstrate through a study of the impact of removal of unwanted variation on differential expression analysis.
Publisher: Cold Spring Harbor Laboratory
Date: 26-07-2019
DOI: 10.1101/713412
Abstract: Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to profile the transcriptomes of in idual cells on a large scale. Many early analyses of differential expression have aimed at identifying differences between subpopulations, and thus are focused on finding subpopulation markers either in a single s le or across multiple s les. More generally, such methods can compare expression levels in multiple sets of cells, thus leading to cross-condition analyses. However, given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making s le-level inferences, termed here as differential state analysis. For ex le, one could investigate the condition-specific responses of cell subpopulations measured from patients from each condition however, it is not clear which statistical framework best handles this situation. In this work, we surveyed the methods available to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated “pseudobulk” data. We developed a flexible simulation platform that mimics both single and multi-s le scRNA-seq data and provide robust tools for multi-condition analysis within the muscat R package.
Publisher: Springer Science and Business Media LLC
Date: 09-2020
DOI: 10.1186/S13059-020-02136-7
Abstract: We present pipeComp ( lger ipeComp ), a flexible R framework for pipeline comparison handling interactions between analysis steps and relying on multi-level evaluation metrics. We apply it to the benchmark of single-cell RNA-sequencing analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction, and clustering. pipeComp can easily integrate any other step, tool, or evaluation metric, allowing extensible benchmarks and easy applications to other fields, as we demonstrate through a study of the impact of removal of unwanted variation on differential expression analysis.
Publisher: Cold Spring Harbor Laboratory
Date: 09-06-2020
DOI: 10.1101/2020.06.08.140608
Abstract: The arrangement of hypotheses in a hierarchical structure (e.g., phylogenies, cell types) appears in many research fields and indicates different resolutions at which data can be interpreted. A common goal is to find a representative resolution that gives high sensitivity to identify relevant entities (e.g., microbial taxa or cell subpopulations) that are related to a phenotypic outcome (e.g. disease status) while controlling false detections, therefore providing a more compact view of detected entities and summarizing characteristics shared among them. Current methods, either performing hypothesis tests at an arbitrary resolution or testing hypotheses at all possible resolutions leading to nested results, are suboptimal. Moreover, they are not flexible enough to work in situations where each entity has multiple features to consider and different resolutions might be required for different features. For ex le, in single cell RNA-seq data, an increasing focus is to find differential state genes that change expression within a cell subpopulation in response to an external stimulus. Such differential expression might occur at different resolutions (e.g., all cells or a small set of cells) for different genes. Our new algorithm treeclimbR is designed to fill this gap by exploiting a hierarchical tree of entities, proposing multiple candidates that capture the latent signal and pinpointing branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.
Publisher: F1000 Research Ltd
Date: 28-09-2021
DOI: 10.12688/F1000RESEARCH.73600.1
Abstract: Doublets are prevalent in single-cell sequencing data and can lead to artifactual findings. A number of strategies have therefore been proposed to detect them. Building on the strengths of existing approaches, we developed scDblFinder , a fast, flexible and accurate Bioconductor-based doublet detection method. Here we present the method, justify its design choices, demonstrate its performance on both single-cell RNA and accessibility sequencing data, and provide some observations on doublet formation, detection, and enrichment analysis. Even in complex datasets, scDblFinder can accurately identify most heterotypic doublets, and was already found by an independent benchmark to outcompete alternatives.
Publisher: F1000 Research Ltd
Date: 16-05-2022
DOI: 10.12688/F1000RESEARCH.73600.2
Abstract: Doublets are prevalent in single-cell sequencing data and can lead to artifactual findings. A number of strategies have therefore been proposed to detect them. Building on the strengths of existing approaches, we developed scDblFinder , a fast, flexible and accurate Bioconductor-based doublet detection method. Here we present the method, justify its design choices, demonstrate its performance on both single-cell RNA and accessibility (ATAC) sequencing data, and provide some observations on doublet formation, detection, and enrichment analysis. Even in complex datasets, scDblFinder can accurately identify most heterotypic doublets, and was already found by an independent benchmark to outcompete alternatives.
Publisher: Cold Spring Harbor Laboratory
Date: 07-08-2023
DOI: 10.1101/2023.08.04.552046
Abstract: Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in in idual and joint multi-omic profiling of single cells. As the accumulation of scATAC-seq and multi-omics datasets continue, challenges in analyzing such sparse, noisy, and high-dimensional data become pressing. Specifically, one challenge relates to optimizing the processing of chromatin-level measurements and efficiently extracting information to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practices. We benchmarked 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. By using 10 metrics calculated at the cell embedding, shared nearest neighbor graph, or partition levels, we evaluated the performance of each method at different data processing stages. This comprehensive approach allowed us to thoroughly understand the strengths and weaknesses of each method and the influence of parameter selection. Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. With large datasets, SnapATAC2 and ArchR are most scalable.
Publisher: Elsevier BV
Date: 06-2017
Location: Italy
No related grants have been discovered for Pierre-Luc Germain.