ARDC Research Link Australia

ORCID Profile
0000-0003-3418-4218

Current Organisations
University of Zurich, ETH Zürich

Publications

Publication

Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability

Publisher: Cold Spring Harbor Laboratory

Date: 23-09-2022

DOI: 10.1101/2022.09.22.508982

Abstract: Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption.

Publication

muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data

Publisher: Springer Science and Business Media LLC

Date: 30-11-2020

DOI: 10.1038/S41467-020-19894-4

Abstract: Single-cell RNA sequencing (scRNA-seq) has become an empowering technology to profile the transcriptomes of individual cells on a large scale. Early analyses of differential expression have aimed at identifying differences between subpopulations to identify subpopulation markers. More generally, such methods compare expression levels across sets of cells, thus leading to cross-condition analyses. Given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis; however, it is not clear which statistical framework best handles this situation. Here, we surveyed methods to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated pseudobulk data. To evaluate method performance, we developed a flexible simulation that mimics multi-sample scRNA-seq data. We analyzed scRNA-seq data from mouse cortex cells to uncover subpopulation-specific responses to lipopolysaccharide treatment, and provide robust tools for multi-condition analysis within the muscat R package.

Publication

pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single-cell RNA-seq preprocessing tools

Publisher: Cold Spring Harbor Laboratory

Date: 02-02-2020

DOI: 10.1101/2020.02.02.930578

Abstract: The massive growth of single-cell RNA-sequencing (scRNAseq) and the methods for its analysis still lack sufficient and up-to-date benchmarks that could guide analytical choices. Numerous benchmark studies already exist and cover most of scRNAseq processing and analytical methods but only a few give advice on a comprehensive pipeline. Moreover, current studies often focused on isolated steps of the process and do not address the impact of a tool on both the intermediate and the final steps of the analysis. Here, we present a flexible R framework for pipeline comparison with multi-level evaluation metrics. We apply it to the benchmark of scRNAseq analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction and clustering. We evaluate the choice of these tools with multi-purpose metrics to assess their ability to reveal cell population structure and lead to efficient clustering. On the basis of our systematic evaluations of analysis pipelines, we make a number of practical recommendations about current analysis choices and for a comprehensive pipeline. The evaluation framework that we developed, pipeComp ( lger ipeComp ), has been implemented so as to easily integrate any other step, tool, or evaluation metric allowing extensible benchmarks and easy applications to other fields of research in Bioinformatics, as we demonstrate through a study of the impact of removal of unwanted variation on differential expression analysis.

Publication

On the discovery of subpopulation-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 26-07-2019

DOI: 10.1101/713412

Abstract: Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to profile the transcriptomes of individual cells on a large scale. Many early analyses of differential expression have aimed at identifying differences between subpopulations, and thus are focused on finding subpopulation markers either in a single sample or across multiple samples. More generally, such methods can compare expression levels in multiple sets of cells, thus leading to cross-condition analyses. However, given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis. For example, one could investigate the condition-specific responses of cell subpopulations measured from patients from each condition; however, it is not clear which statistical framework best handles this situation. In this work, we surveyed the methods available to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated “pseudobulk” data. We developed a flexible simulation platform that mimics both single and multi-sample scRNA-seq data and provide robust tools for multi-condition analysis within the muscat R package.

Publication

pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single cell RNA-seq preprocessing tools

Publisher: Springer Science and Business Media LLC

Date: 09-2020

DOI: 10.1186/S13059-020-02136-7

Abstract: We present pipeComp ( lger ipeComp ), a flexible R framework for pipeline comparison handling interactions between analysis steps and relying on multi-level evaluation metrics. We apply it to the benchmark of single-cell RNA-sequencing analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction, and clustering. pipeComp can easily integrate any other step, tool, or evaluation metric, allowing extensible benchmarks and easy applications to other fields, as we demonstrate through a study of the impact of removal of unwanted variation on differential expression analysis.

Publication

treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses

Publisher: Cold Spring Harbor Laboratory

Date: 09-06-2020

DOI: 10.1101/2020.06.08.140608

Abstract: The arrangement of hypotheses in a hierarchical structure (e.g., phylogenies, cell types) appears in many research fields and indicates different resolutions at which data can be interpreted. A common goal is to find a representative resolution that gives high sensitivity to identify relevant entities (e.g., microbial taxa or cell subpopulations) that are related to a phenotypic outcome (e.g., disease status) while controlling false detections, therefore providing a more compact view of detected entities and summarizing characteristics shared among them. Current methods, either performing hypothesis tests at an arbitrary resolution or testing hypotheses at all possible resolutions leading to nested results, are suboptimal. Moreover, they are not flexible enough to work in situations where each entity has multiple features to consider and different resolutions might be required for different features. For example, in single cell RNA-seq data, an increasing focus is to find differential state genes that change expression within a cell subpopulation in response to an external stimulus. Such differential expression might occur at different resolutions (e.g., all cells or a small set of cells) for different genes. Our new algorithm treeclimbR is designed to fill this gap by exploiting a hierarchical tree of entities, proposing multiple candidates that capture the latent signal and pinpointing branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.

Publication

Doublet identification in single-cell sequencing data using scDblFinder

Publisher: F1000 Research Ltd

Date: 28-09-2021

DOI: 10.12688/F1000RESEARCH.73600.1

Abstract: Doublets are prevalent in single-cell sequencing data and can lead to artifactual findings. A number of strategies have therefore been proposed to detect them. Building on the strengths of existing approaches, we developed scDblFinder, a fast, flexible and accurate Bioconductor-based doublet detection method. Here we present the method, justify its design choices, demonstrate its performance on both single-cell RNA and accessibility sequencing data, and provide some observations on doublet formation, detection, and enrichment analysis. Even in complex datasets, scDblFinder can accurately identify most heterotypic doublets, and was already found by an independent benchmark to outcompete alternatives.

Publication

Doublet identification in single-cell sequencing data using scDblFinder

Publisher: F1000 Research Ltd

Date: 16-05-2022

DOI: 10.12688/F1000RESEARCH.73600.2

Abstract: Doublets are prevalent in single-cell sequencing data and can lead to artifactual findings. A number of strategies have therefore been proposed to detect them. Building on the strengths of existing approaches, we developed scDblFinder, a fast, flexible and accurate Bioconductor-based doublet detection method. Here we present the method, justify its design choices, demonstrate its performance on both single-cell RNA and accessibility (ATAC) sequencing data, and provide some observations on doublet formation, detection, and enrichment analysis. Even in complex datasets, scDblFinder can accurately identify most heterotypic doublets, and was already found by an independent benchmark to outcompete alternatives.

Publication

Benchmarking computational methods for single-cell chromatin data analysis

Publisher: Cold Spring Harbor Laboratory

Date: 07-08-2023

DOI: 10.1101/2023.08.04.552046

Abstract: Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As the accumulation of scATAC-seq and multi-omics datasets continues, challenges in analyzing such sparse, noisy, and high-dimensional data become pressing. Specifically, one challenge relates to optimizing the processing of chromatin-level measurements and efficiently extracting information to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practices. We benchmarked 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. By using 10 metrics calculated at the cell embedding, shared nearest neighbor graph, or partition levels, we evaluated the performance of each method at different data processing stages. This comprehensive approach allowed us to thoroughly understand the strengths and weaknesses of each method and the influence of parameter selection. Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. With large datasets, SnapATAC2 and ArchR are most scalable.

Publication

YY1 Haploinsufficiency Causes an Intellectual Disability Syndrome Featuring Transcriptional and Chromatin Dysfunction.

Publisher: Elsevier BV

Date: 06-2017

DOI: 10.1016/J.AJHG.2017.05.006

Related Organisations

Organisation

European Institute Of Oncology

Location: Italy

Organisation

University Of Zurich

Location: Switzerland

Organisation

University Of Montreal

Location: Canada

Organisation

European School Of Molecular Medicine

Location: Italy

Organisation

European School Of Molecular Medicine & University Of Milan

Location: Italy

Organisation

ETH Zürich

Location: Switzerland

Related Funding Activities

No related grants have been discovered for Pierre-Luc Germain.

Pierre-Luc Germain

Researcher
