ARDC Research Link Australia

Publication

Common Features of Regulatory T Cell Specialization During Th1 Responses

Publisher: Frontiers Media SA

Date: 13-06-2018

DOI: 10.3389/FIMMU.2018.01344

Publication

An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics

Publisher: Cold Spring Harbor Laboratory

Date: 15-11-2017

DOI: 10.1101/GR.218255.116

Abstract: Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.

Publication

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

Publisher: F1000 Research Ltd

Date: 24-05-2019

DOI: 10.12688/F1000RESEARCH.11622.3

Abstract: High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an updated R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signalling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models or linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g., multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g., plots of aggregated signals).

Publication

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

Publisher: F1000 Research Ltd

Date: 17-12-2019

DOI: 10.12688/F1000RESEARCH.11622.4

Abstract: High-dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high-throughput interrogation and characterization of cell populations. Here, we present an updated R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signalling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models or linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g., multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g., plots of aggregated signals).

Publication

A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes

Publisher: Cold Spring Harbor Laboratory

Date: 11-03-2019

DOI: 10.1101/574525

Abstract: A platform for highly parallel direct sequencing of native RNA strands was recently described by Oxford Nanopore Technologies (ONT); in order to assess overall performance in transcript-level investigations, the technology was applied for sequencing sets of synthetic transcripts as well as a yeast transcriptome. However, despite initial efforts it remains crucial to further investigate characteristics of ONT native RNA sequencing when applied to much more complex transcriptomes. Here we thus undertook extensive native RNA sequencing of polyA+ RNA from two human cell lines, and thereby analysed ~5.2 million aligned native RNA reads which consisted of a total of ~4.6 billion bases. To enable informative comparisons, we also performed relevant ONT direct cDNA- and Illumina-sequencing. We find that while native RNA sequencing does enable some of the anticipated advantages, key unexpected aspects hamper its performance, most notably the quite frequent inability to obtain full-length transcripts from single reads, as well as difficulties to unambiguously infer their true transcript of origin. While characterising issues that need to be addressed when investigating more complex transcriptomes, our study highlights that with some defined improvements, native RNA sequencing could be an important addition to the mammalian transcriptomics toolbox.

Publication

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets [version 1; referees: awaiting peer review]

Publisher: F1000 Research Ltd

Date: 26-05-2017

DOI: 10.12688/F1000RESEARCH.11622.1

Abstract: High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals).

Publication

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

Publisher: F1000 Research Ltd

Date: 14-11-2017

DOI: 10.12688/F1000RESEARCH.11622.2

Abstract: High dimensional mass and flow cytometry (HDCyto) experiments have become a method of choice for high throughput interrogation and characterization of cell populations. Here, we present an R-based pipeline for differential analyses of HDCyto data, largely based on Bioconductor packages. We computationally define cell populations using FlowSOM clustering, and facilitate an optional but reproducible strategy for manual merging of algorithm-generated clusters. Our workflow offers different analysis paths, including association of cell type abundance with a phenotype or changes in signaling markers within specific subpopulations, or differential analyses of aggregated signals. Importantly, the differential analyses we show are based on regression frameworks where the HDCyto data is the response; thus, we are able to model arbitrary experimental designs, such as those with batch effects, paired designs and so on. In particular, we apply generalized linear mixed models to analyses of cell population abundance or cell-population-specific analyses of signaling markers, allowing overdispersion in cell count or aggregated signals across samples to be appropriately modeled. To support the formal statistical analyses, we encourage exploratory data analysis at every step, including quality control (e.g. multi-dimensional scaling plots), reporting of clustering results (dimensionality reduction, heatmaps with dendrograms) and differential analyses (e.g. plots of aggregated signals).

Publication

SampleQC: robust multivariate, multi-cell type, multi-sample quality control for single-cell data

Publisher: Springer Science and Business Media LLC

Date: 10-02-2023

DOI: 10.1186/S13059-023-02859-3

Abstract: Quality control (QC) is a critical component of single-cell RNA-seq (scRNA-seq) processing pipelines. Current approaches to QC implicitly assume that datasets are comprised of one cell type, potentially resulting in biased exclusion of rare cell types. We introduce SampleQC, which robustly fits a Gaussian mixture model across multiple samples, improves sensitivity, and reduces bias compared to current approaches. We show via simulations that SampleQC is less susceptible to exclusion of rarer cell types. We also demonstrate SampleQC on a complex real dataset (867k cells over 172 samples). SampleQC is general, is implemented in R, and could be applied to other data types.

Publication

A reference single-cell map of freshly dissociated human synovium in inflammatory arthritis with an optimized dissociation protocol for prospective synovial biopsy collection

Publisher: Cold Spring Harbor Laboratory

Date: 03-06-2022

DOI: 10.1101/2022.06.01.493823

Abstract: Single-cell RNA-sequencing is advancing our understanding of synovial pathobiology in inflammatory arthritis. Here, we optimized the protocol for the dissociation of fresh synovial biopsies and created a reference single-cell map of fresh human synovium in inflammatory arthritis. We utilized the published method for dissociating cryopreserved synovium and optimized it for dissociating small fresh synovial biopsies. The optimized protocol enabled the isolation of a good yield of consistently highly viable cells, minimizing the dropout rate of prospectively collected biopsies. Our reference synovium map comprised over 100’000 unsorted single-cell profiles from 25 synovial tissues of patients with inflammatory arthritis. Synovial cells formed 11 lymphoid, 15 myeloid and 16 stromal cell clusters, including IFITM2+ synovial neutrophils. Using this reference map, we successfully annotated published synovial scRNA-seq datasets. Our dataset uncovered endothelial cell diversity and identified SOD2 high SAA1+SAA2+ and SERPINE1+COL5A3+ fibroblast clusters, expressing genes linked to cartilage breakdown (SDC4) and extracellular matrix remodelling (LOXL2, TGFBI, TGFB1), respectively. We broadened the characterization of tissue resident FOLR2+COLEC12 high and LYVE1+SLC40A1+ macrophages, inferring their extracellular matrix sensing and iron recycling activities. Our research brings an efficient synovium dissociation protocol and a reference annotation resource of fresh human synovium, while expanding the knowledge about synovial cell diversity in inflammatory arthritis.

Publication

Benchmarking comes of age

Publisher: Springer Science and Business Media LLC

Date: 09-10-2019

DOI: 10.1186/S13059-019-1846-5

Publication

H3K4me3 enrichment defines neuronal age, while a youthful H3K27ac signature is recapitulated in aged neurons

Publisher: Research Square Platform LLC

Date: 16-03-2022

DOI: 10.21203/RS.3.RS-1367459/V1

Abstract: Neurons live for the lifespan of the individual and underlie our ability for lifelong learning and memory. However, aging alters neuron morphology and function resulting in age-related cognitive decline. It is well established that epigenetic alterations are essential for learning and memory, yet few neuron-specific genome-wide epigenetic maps exist into old age. Comprehensive mapping of H3K4me3 and H3K27ac in mouse neurons across lifespan revealed plastic H3K4me3 marking that differentiates neuronal age linked to known characteristics of cellular and neuronal aging. We determined that neurons in old age recapitulate the H3K27ac enrichment at promoters, enhancers and super enhancers from young adult neurons, likely representing a re-activation of pathways to maintain neuronal output. Finally, this study identified new characteristics of neuronal aging, including altered rDNA regulation and epigenetic regulatory mechanisms. Collectively, these findings indicate a key role for epigenetic regulation in neurons, that is inextricably linked with aging.

Publication

Shedding Light on the Transcriptomic Dark Matter in Biological Psychiatry: Role of Long Noncoding RNAs in D-cycloserine-Induced Fear Extinction in Posttraumatic Stress Disorder

Publisher: Mary Ann Liebert Inc

Date: 06-2020

DOI: 10.1089/OMI.2020.0031

Publication

Highly efficient DNA-free gene disruption in the agricultural pest Ceratitis capitata by CRISPR-Cas9 RNPs

Publisher: Cold Spring Harbor Laboratory

Date: 18-04-2017

DOI: 10.1101/127506

Abstract: The Mediterranean fruitfly Ceratitis capitata (medfly) is an invasive agricultural pest of high economical impact and has become an emerging model for developing new genetic control strategies as alternative to insecticides. Here, we report the successful adaptation of CRISPR-Cas9-based gene disruption in the medfly by injecting in vitro pre-assembled, solubilized Cas9 ribonucleoprotein complexes (RNPs) loaded with gene-specific sgRNAs into early embryos. When targeting the eye pigmentation gene white eye (we), we observed a high rate of somatic mosaicism in surviving G0 adults. Germline transmission of mutated we alleles by G0 animals was on average above 70%, with individual cases achieving a transmission rate of nearly 100%. We further recovered large deletions in the we gene when two sites were simultaneously targeted by two sgRNAs. CRISPR-Cas9 targeting of the Ceratitis ortholog of the Drosophila segmentation paired gene (Ccprd) caused segmental malformations in late embryos and in hatched larvae. Mutant phenotypes correlate with repair by non-homologous end joining (NHEJ) lesions in the two targeted genes. This simple and highly effective Cas9 RNP-based gene editing to introduce mutations in Ceratitis capitata will significantly advance the design and development of new effective strategies for pest control management.

Publication

A general and powerful stage-wise testing procedure for differential expression and differential transcript usage

Publisher: Cold Spring Harbor Laboratory

Date: 16-02-2017

DOI: 10.1101/109082

Abstract: Reductions in sequencing cost and innovations in expression quantification have prompted an emergence of RNA-seq studies with complex designs and data analysis at transcript resolution. These applications involve multiple hypotheses per gene, leading to challenging multiple testing problems. Conventional approaches provide separate top-lists for every contrast and false discovery rate (FDR) control at individual hypothesis level. Hence, they fail to establish proper gene-level error control, which compromises downstream validation experiments. Tests that aggregate individual hypotheses are more powerful and provide gene-level FDR control, but in the RNA-seq literature no methods are available for post-hoc analysis of individual hypotheses. We introduce a two-stage procedure that leverages the increased power of aggregated hypothesis tests while maintaining high biological resolution by post-hoc analysis of genes passing the screening hypothesis. Our method is evaluated on simulated and real RNA-seq experiments. It provides gene-level FDR control in studies with complex designs while boosting power for interaction effects without compromising the discovery of main effects. In a differential transcript usage/expression context, stage-wise testing gains power by aggregating hypotheses at the gene level, while providing transcript-level assessment of genes passing the screening stage. Finally, a prostate cancer case study highlights the relevance of combining gene with transcript level results. Stage-wise testing is a general paradigm that can be adopted whenever individual hypotheses can be aggregated. In our context, it achieves an optimal middle ground between biological resolution and statistical power while providing gene-level FDR control, which is beneficial for downstream biological interpretation and validation.
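
Illustrative sketch (not part of the publication record): the two-stage idea in the abstract above can be outlined as follows. This is a simplified reading under stated assumptions, not the authors' implementation: the screening stage is assumed to use the Benjamini-Hochberg procedure on gene-level p-values supplied by the caller, and the confirmation stage is assumed to use a Bonferroni correction within each passing gene at a level scaled by the fraction of genes that passed. The function names `benjamini_hochberg` and `stagewise_test` are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return the set of indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value is under its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    return set(order[:k])


def stagewise_test(gene_pvalues, hypothesis_pvalues, alpha=0.05):
    """Two-stage testing sketch.

    gene_pvalues: dict gene -> aggregated (screening) p-value.
    hypothesis_pvalues: dict gene -> dict hypothesis name -> p-value.
    Returns dict gene -> list of confirmed hypotheses, for genes
    that pass the screening stage only.
    """
    genes = list(gene_pvalues)
    # Stage 1 (screening): FDR control across genes.
    rejected = benjamini_hochberg([gene_pvalues[g] for g in genes], alpha)
    passed = [genes[i] for i in sorted(rejected)]
    # Stage 2 (confirmation): assumed level scaled by the share of
    # genes that passed screening, then Bonferroni within each gene.
    alpha_stage2 = alpha * len(passed) / len(genes) if genes else 0.0
    results = {}
    for g in passed:
        pvals = hypothesis_pvalues[g]
        threshold = alpha_stage2 / len(pvals)
        results[g] = [name for name, p in pvals.items() if p <= threshold]
    return results


if __name__ == "__main__":
    gene_p = {"g1": 0.001, "g2": 0.8, "g3": 0.01, "g4": 0.5}
    tx_p = {
        "g1": {"tx1": 0.002, "tx2": 0.4},
        "g2": {"tx1": 0.9, "tx2": 0.7},
        "g3": {"tx1": 0.02, "tx2": 0.03},
        "g4": {"tx1": 0.6, "tx2": 0.5},
    }
    print(stagewise_test(gene_p, tx_p))
```

In this toy input, only genes that pass screening are examined at the transcript level, which is the property the abstract emphasises; the specific correction used within each gene is a design choice and other family-wise procedures could be substituted.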

Publication

Chromothripsis-like patterns are recurring but heterogeneously distributed features in a survey of 22,347 cancer genome screens

Publisher: Springer Science and Business Media LLC

Date: 29-01-2014

DOI: 10.1186/1471-2164-15-82

Abstract: Chromothripsis is a recently discovered phenomenon of genomic rearrangement, possibly arising during a single genome-shattering event. This could provide an alternative paradigm in cancer development, replacing the gradual accumulation of genomic changes with a “one-off” catastrophic event. However, the term has been used with varying operational definitions, with the minimal consensus being a large number of locally clustered copy number aberrations. The mechanisms underlying these chromothripsis-like patterns (CTLP) and their specific impact on tumorigenesis are still poorly understood. Here, we identified CTLP in 918 cancer samples, from a dataset of more than 22,000 oncogenomic arrays covering 132 cancer types. Fragmentation hotspots were found to be located on chromosome 8, 11, 12 and 17. Among the various cancer types, soft-tissue tumors exhibited particularly high CTLP frequencies. Genomic context analysis revealed that CTLP rearrangements frequently occurred in genomes that additionally harbored multiple copy number aberrations (CNAs). An investigation into the affected chromosomal regions showed a large proportion of arm-level pulverization and telomere related events, which would be compatible to a number of underlying mechanisms. We also report evidence that these genomic events may be correlated with patient age, stage and survival rate. Through a large-scale analysis of oncogenomic array data sets, this study characterized features associated with genomic aberrations patterns, compatible to the spectrum of “chromothripsis”-definitions as previously used. While quantifying clustered genomic copy number aberrations in cancer samples, our data indicates an underlying biological heterogeneity behind these chromothripsis-like patterns, beyond a well defined “chromothripsis” phenomenon.

Publication

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

Publisher: F1000 Research Ltd

Date: 10-09-2018

DOI: 10.12688/F1000RESEARCH.15666.2

Abstract: Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub ( arkrobinsonuzh/scRNAseq_clustering_comparison ). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor ( ackages/DuoClustering2018 ).

Publication

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

Publisher: F1000 Research Ltd

Date: 26-07-2018

DOI: 10.12688/F1000RESEARCH.15666.1

Abstract: Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 12 clustering algorithms, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using 9 publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. The R scripts providing an extensible framework for the evaluation of new methods and data sets are available on GitHub ( arkrobinsonuzh/scRNAseq_clustering_comparison ).

Publication

distinct: a novel approach to differential distribution analyses

Publisher: Cold Spring Harbor Laboratory

Date: 25-11-2020

DOI: 10.1101/2020.11.24.394213

Abstract: We present distinct, a general method for differential analysis of full distributions that is well suited to applications on single-cell data, such as single-cell RNA sequencing and high-dimensional flow or mass cytometry data. High-throughput single-cell data reveal an unprecedented view of cell identity and allow complex variations between conditions to be discovered; nonetheless, most methods for differential expression target differences in the mean and struggle to identify changes where the mean is only marginally affected. distinct is based on a hierarchical non-parametric permutation approach and, by comparing empirical cumulative distribution functions, identifies both differential patterns involving changes in the mean, as well as more subtle variations that do not involve the mean. We performed extensive benchmarks across both simulated and experimental datasets from single-cell RNA sequencing and mass cytometry data, where distinct shows favourable performance, identifies more differential patterns than competitors, and displays good control of false positive and false discovery rates. distinct is available as a Bioconductor R package.
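
Illustrative sketch (not part of the publication record): the general idea of comparing empirical cumulative distribution functions (ECDFs) with a permutation test, as mentioned in the abstract above, can be outlined as follows. This is a deliberately simplified, non-hierarchical stand-in and not the distinct algorithm itself: it assumes a Kolmogorov-Smirnov-style maximum ECDF distance as the test statistic and permutes pooled observations between two groups. The function names `ecdf_distance` and `permutation_pvalue` are hypothetical.

```python
import random


def ecdf_distance(x, y):
    """Maximum absolute difference between the ECDFs of x and y."""
    grid = sorted(set(x) | set(y))
    nx, ny = len(x), len(y)
    best = 0.0
    for t in grid:
        fx = sum(v <= t for v in x) / nx  # ECDF of x at t
        fy = sum(v <= t for v in y) / ny  # ECDF of y at t
        best = max(best, abs(fx - fy))
    return best


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value for the ECDF distance between x and y."""
    rng = random.Random(seed)
    observed = ecdf_distance(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        if ecdf_distance(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    group_a = [float(v) for v in range(10)]
    group_b = [float(v) for v in range(100, 110)]
    print(ecdf_distance(group_a, group_b))
    print(permutation_pvalue(group_a, group_b))
```

Because the statistic depends on the whole ECDF rather than the mean alone, a test of this kind can in principle react to changes in spread or shape; the hierarchical handling of samples described in the abstract is not reproduced here.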

Publication

Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 10-02-2022

DOI: 10.1101/2022.02.08.479579

Abstract: Long-read RNA sequencing (lrRNA-seq) produces detailed information about full-length transcripts, including novel and sample-specific isoforms. Furthermore, there is opportunity to call variants directly from lrRNA-seq data. However, most state-of-the-art variant callers have been developed for genomic DNA. Here, there are two objectives: first, we perform a mini-benchmark on GATK, DeepVariant, Clair3, and NanoCaller primarily on PacBio Iso-Seq data, but also on Nanopore and Illumina RNA-seq data; second, we propose a pipeline to process spliced-alignment files, making them suitable for variant calling with DNA-based callers. With such manipulations, high calling performance can be achieved using DeepVariant on Iso-seq data.

Publication

Global landscape of protein complexes in the yeast Saccharomyces cerevisiae

Publisher: Springer Science and Business Media LLC

Date: 03-2006

DOI: 10.1038/NATURE04670

Abstract: Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.

Publication

Phase I Trial Characterizing the Pharmacokinetic Profile of N-803, a Chimeric IL-15 Superagonist, in Healthy Volunteers

Publisher: The American Association of Immunologists

Date: 15-03-2022

DOI: 10.4049/JIMMUNOL.2100066

Abstract: The oncotherapeutic promise of IL-15, a potent immunostimulant, is limited by a short serum t1/2. The fusion protein N-803 is a chimeric IL-15 superagonist that has a & -fold longer in vivo t1/2 versus IL-15. This phase 1 study characterized the pharmacokinetic (PK) profile and safety of N-803 after s.c. administration to healthy human volunteers. Volunteers received two doses of N-803, and after each dose, PK and safety were assessed for 9 d. The primary endpoint was the N-803 PK profile, the secondary endpoint was safety, and immune cell levels and immunogenicity were measures of interest. Serum N-803 concentrations peaked 4 h after administration and declined with a t1/2 of ∼20 h. N-803 did not cause treatment-emergent serious adverse events (AEs) or grade ≥3 AEs. Injection site reactions, chills, and pyrexia were the most common AEs. Administration of N-803 was well tolerated and accompanied by proliferation of NK cells and CD8+ T cells and sustained increases in the number of NK cells. Our results suggest that N-803 administration can potentiate antitumor immunity.

Publication

pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single-cell RNA-seq preprocessing tools

Publisher: Cold Spring Harbor Laboratory

Date: 02-02-2020

DOI: 10.1101/2020.02.02.930578

Abstract: The massive growth of single-cell RNA-sequencing (scRNAseq) and the methods for its analysis still lack sufficient and up-to-date benchmarks that could guide analytical choices. Numerous benchmark studies already exist and cover most of scRNAseq processing and analytical methods but only a few give advice on a comprehensive pipeline. Moreover, current studies often focused on isolated steps of the process and do not address the impact of a tool on both the intermediate and the final steps of the analysis. Here, we present a flexible R framework for pipeline comparison with multi-level evaluation metrics. We apply it to the benchmark of scRNAseq analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction and clustering. We evaluate the choice of these tools with multi-purpose metrics to assess their ability to reveal cell population structure and lead to efficient clustering. On the basis of our systematic evaluations of analysis pipelines, we make a number of practical recommendations about current analysis choices and for a comprehensive pipeline. The evaluation framework that we developed, pipeComp ( lger ipeComp ), has been implemented so as to easily integrate any other step, tool, or evaluation metric allowing extensible benchmarks and easy applications to other fields of research in Bioinformatics, as we demonstrate through a study of the impact of removal of unwanted variation on differential expression analysis.

Publication

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

Publisher: Oxford University Press (OUP)

Date: 11-11-2009

DOI: 10.1093/BIOINFORMATICS/BTP616

Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (bioconductor.org). Contact: mrobinson@wehi.edu.au

Publication

Computational epigenomics: challenges and opportunities

Publisher: Frontiers Media SA

Date: 05-03-2015

DOI: 10.3389/FGENE.2015.00088

Publication

Robustly detecting differential expression in RNA sequencing data using observation weights

Publisher: Oxford University Press (OUP)

Date: 20-04-2014

DOI: 10.1093/NAR/GKU310

Publication

RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis

Publisher: Annual Reviews

Date: 20-07-2019

DOI: 10.1146/ANNUREV-BIODATASCI-072018-021255

Abstract: Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq data sets, as well as the performance of the myriad of methods developed. In this review, we give an overview of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on the quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.

Publication

A Panoramic View of Yeast Noncoding RNA Processing

Publisher: Elsevier BV

Date: 06-2003

DOI: 10.1016/S0092-8674(03)00466-5

Abstract: Predictive analysis using publicly available yeast functional genomics and proteomics data suggests that many more proteins may be involved in biogenesis of ribonucleoproteins than are currently known. Using a microarray that monitors abundance and processing of noncoding RNAs, we analyzed 468 yeast strains carrying mutations in protein-coding genes, most of which have not previously been associated with RNA or RNP synthesis. Many strains mutated in uncharacterized genes displayed aberrant noncoding RNA profiles. Ten factors involved in noncoding RNA biogenesis were verified by further experimentation, including a protein required for 20S pre-rRNA processing (Tsr2p), a protein associated with the nuclear exosome (Lrp1p), and a factor required for box C/D snoRNA accumulation (Bcd1p). These data present a global view of yeast noncoding RNA processing and confirm that many currently uncharacterized yeast proteins are involved in biogenesis of noncoding RNA.

Publication

Gapless provides combined scaffolding, gap filling, and assembly correction with long reads

Publisher: Life Science Alliance, LLC

Date: 04-05-2023

DOI: 10.26508/LSA.202201471

Abstract: Continuity, correctness, and completeness of genome assemblies are important for many biological projects. Long reads represent a major driver towards delivering high-quality genomes, but not everybody can achieve the necessary coverage for good long read-only assemblies. Therefore, improving existing assemblies with low-coverage long reads is a promising alternative. The improvements include correction, scaffolding, and gap filling. However, most tools perform only one of these tasks and the useful information of reads that supported the scaffolding is lost when running separate programs successively. Therefore, we propose a new tool for combined execution of all three tasks using PacBio or Oxford Nanopore reads. gapless is available at: chmeing/gapless.

Publication

T-cell acute leukaemia exhibits dynamic interactions with bone marrow microenvironments

Publisher: Springer Science and Business Media LLC

Date: 10-2016

DOI: 10.1038/NATURE19801

Publication

BAZ2A (TIP5) is involved in epigenetic alterations in prostate cancer and its overexpression predicts disease recurrence

Publisher: Springer Science and Business Media LLC

Date: 08-12-2014

DOI: 10.1038/NG.3165

Abstract: Prostate cancer is driven by a combination of genetic and/or epigenetic alterations. Epigenetic alterations are frequently observed in all human cancers, yet how aberrant epigenetic signatures are established is poorly understood. Here we show that the gene encoding BAZ2A (TIP5), a factor previously implicated in epigenetic rRNA gene silencing, is overexpressed in prostate cancer and is paradoxically involved in maintaining prostate cancer cell growth, a feature specific to cancer cells. BAZ2A regulates numerous protein-coding genes and directly interacts with EZH2 to maintain epigenetic silencing at genes repressed in metastasis. BAZ2A overexpression is tightly associated with a molecular subtype displaying a CpG island methylator phenotype (CIMP). Finally, high BAZ2A levels serve as an independent predictor of biochemical recurrence in a cohort of 7,682 individuals with prostate cancer. This work identifies a new aberrant role for the epigenetic regulator BAZ2A, which can also serve as a useful marker for metastatic potential in prostate cancer.

Publication

Carbon brainprint – An estimate of the intellectual contribution of research institutions to reducing greenhouse gas emissions

Publisher: Elsevier BV

Date: 07-2015

DOI: 10.1016/J.PSEP.2015.04.008

Publication

LincRNAs involved in DCS-induced fear extinction: Shedding light on the transcriptomic dark matter

Publisher: Cold Spring Harbor Laboratory

Date: 08-11-2019

DOI: 10.1101/834242

Abstract: There is a growing appreciation of the role of non-coding RNAs in the regulation of gene and protein expression. Long non-coding RNAs can modulate splicing by hybridizing with precursor messenger RNAs (pre-mRNAs) and influence RNA editing, mRNA stability, translation activation and microRNA-mRNA interactions by binding to mature mRNAs. LncRNAs are highly abundant in the brain and have been implicated in neurodevelopmental disorders. Long intergenic non-coding RNAs are the largest subclass of lncRNAs and play a crucial role in gene regulation. We used RNA sequencing and bioinformatic analyses to identify lincRNAs and their predicted mRNA targets associated with fear extinction that was induced by intra-hippocampally administered D-cycloserine in an animal model investigating the core phenotypes of PTSD. We identified 43 differentially expressed fear extinction related lincRNAs and 190 differentially expressed fear extinction related mRNAs. Eight of these lincRNAs were predicted to interact with and regulate 108 of these mRNAs and seven lincRNAs were predicted to interact with 22 of their pre-mRNA transcripts. On the basis of the functions of their target RNAs, we inferred that these lincRNAs bind to nucleotides, ribonucleotides and proteins and subsequently influence nervous system development and morphology, immune system functioning, and are associated with nervous system and mental health disorders. Quantitative trait loci that overlapped with fear extinction related lincRNAs included serum corticosterone level, neuroinflammation, anxiety, stress and despair related responses. This is the first study to identify lincRNAs and their RNA targets with a putative role in transcriptional regulation during fear extinction.

Publication

Are Epigenetic Factors Implicated in Chronic Widespread Pain?

Publisher: Public Library of Science (PLoS)

Date: 10-11-2016

DOI: 10.1371/JOURNAL.PONE.0165548

Publication

Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data

Publisher: Springer Science and Business Media LLC

Date: 24-04-2023

DOI: 10.1186/S13059-023-02923-Y

Abstract: Long-read RNA sequencing (lrRNA-seq) produces detailed information about full-length transcripts, including novel and sample-specific isoforms. Furthermore, there is an opportunity to call variants directly from lrRNA-seq data. However, most state-of-the-art variant callers have been developed for genomic DNA. Here, there are two objectives: first, we perform a mini-benchmark on GATK, DeepVariant, Clair3, and NanoCaller primarily on PacBio Iso-Seq data, but also on Nanopore and Illumina RNA-seq data; second, we propose a pipeline to process spliced-alignment files, making them suitable for variant calling with DNA-based callers. With such manipulations, high calling performance can be achieved using DeepVariant on Iso-Seq data.

Publication

Essential guidelines for computational method benchmarking

Publisher: Springer Science and Business Media LLC

Date: 20-06-2019

DOI: 10.1186/S13059-019-1738-8

Publication

zingeR: unlocking RNA-seq tools for zero-inflation and single cell applications

Publisher: Cold Spring Harbor Laboratory

Date: 30-06-2017

DOI: 10.1101/157982

Abstract: Dropout in single cell RNA-seq (scRNA-seq) applications causes many transcripts to go undetected. It induces excess zero counts, which leads to power issues in differential expression (DE) analysis and has triggered the development of bespoke scRNA-seq DE tools that cope with zero-inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce zingeR, a zero-inflated negative binomial model that identifies excess zero counts and generates observation weights to unlock bulk RNA-seq pipelines for zero-inflation, boosting performance in scRNA-seq differential expression analysis.

Publication

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

Publisher: Life Science Alliance, LLC

Date: 17-01-2019

DOI: 10.26508/LSA.201800175

Abstract: Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility score, which provides a way to evaluate the reliability of transcript-level abundance estimates and the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that although most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.
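The comparison described in the abstract above can be illustrated with a minimal Python sketch. It assumes that the predicted coverage of a junction is the summed abundance of the transcripts containing it, that predictions are rescaled to the observed total, and that disagreement is summarised as a normalised absolute difference. The function names, the toy data and the exact form of the summary are illustrative assumptions, not the published definition of the score.

```python
def predicted_junction_coverage(transcript_junctions, abundances):
    """Sum the abundance of every transcript that contains each junction."""
    predicted = {}
    for transcript, junctions in transcript_junctions.items():
        for junction in junctions:
            predicted[junction] = predicted.get(junction, 0.0) + abundances.get(transcript, 0.0)
    return predicted


def junction_disagreement(observed, predicted):
    """Return a value in [0, 1]; 0 means identical relative junction profiles.

    Illustrative summary only (an assumption, not the published score).
    Returns None when either profile is empty, since no comparison is possible.
    """
    junctions = sorted(set(observed) | set(predicted))
    obs = [observed.get(j, 0.0) for j in junctions]
    pred = [predicted.get(j, 0.0) for j in junctions]
    total_obs, total_pred = sum(obs), sum(pred)
    if total_obs == 0 or total_pred == 0:
        return None
    # Rescale predictions so both profiles have the same total coverage.
    scale = total_obs / total_pred
    scaled = [p * scale for p in pred]
    numerator = sum(abs(s - o) for s, o in zip(scaled, obs))
    denominator = sum(s + o for s, o in zip(scaled, obs))
    return numerator / denominator


# Hypothetical gene with two transcripts sharing junction j1.
transcript_junctions = {"tx1": ["j1", "j2"], "tx2": ["j1", "j3"]}
abundances = {"tx1": 3.0, "tx2": 1.0}
predicted = predicted_junction_coverage(transcript_junctions, abundances)
good = junction_disagreement({"j1": 40, "j2": 30, "j3": 10}, predicted)
poor = junction_disagreement({"j1": 40, "j2": 0, "j3": 40}, predicted)
```

In this toy case the first observed profile matches the prediction exactly, while the second puts reads on a junction that the abundance estimates barely support, so it receives a larger disagreement value.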

Publication

De novo assembly and sex-specific transcriptome profiling in the sand fly Phlebotomus perniciosus (Diptera, Phlebotominae), a major Old World vector of Leishmania infantum

Publisher: Springer Science and Business Media LLC

Date: 23-10-2015

DOI: 10.1186/S12864-015-2088-X

Publication

BayMeth: improved DNA methylation quantification for affinity capture sequencing data using a flexible Bayesian approach

Publisher: Springer Science and Business Media LLC

Date: 2014

DOI: 10.1186/GB-2014-15-2-R35

Publication

Maleness-on-the-Y (MoY) orchestrates male sex determination in major agricultural fruit fly pests

Publisher: American Association for the Advancement of Science (AAAS)

Date: 27-09-2019

DOI: 10.1126/SCIENCE.AAX1318

Abstract: The Mediterranean fruit fly or Medfly (Ceratitis capitata) is a global and highly destructive fruit pest. Meccariello et al. identified the master gene for male sex determination on the Y chromosome of Medfly and named it Maleness-on-the-Y (MoY) (see the Perspective by Makki and Meller). Flies of each sex were transformed into the other sex by genetic manipulation, and crosses of transformed flies generated male and female progeny. MoY is functionally conserved in the olive fruit fly and in the invasive oriental fruit fly. This discovery has potential for insect genetic control based on mass release of sterile males and future strategies based on gene drive. Science, this issue p. 1457; see also p. 1380

Publication

Mass Cytometric and Transcriptomic Profiling of Epithelial-Mesenchymal Transitions in Human Mammary Cell Lines

Publisher: Cold Spring Harbor Laboratory

Date: 27-03-2021

DOI: 10.1101/2021.03.26.436976

Abstract: Epithelial-mesenchymal transition (EMT) equips breast cancer cells for metastasis and treatment resistance. Inhibition and elimination of EMT-undergoing cells are therefore promising therapy approaches. However, detecting EMT-undergoing cells is challenging due to the intrinsic heterogeneity of cancer cells and the phenotypic diversity of EMT programs. Here, we profiled EMT transition phenotypes in four non-cancerous human mammary epithelial cell lines using a FACS surface marker screen, RNA sequencing, and mass cytometry. EMT was induced in the HMLE and MCF10A cell lines and in the HMLE-Twist-ER and HMLE-Snail-ER cell lines by chronic exposure to TGFβ1 or 4-hydroxytamoxifen, respectively. We observed a spectrum of EMT transition phenotypes in each cell line and the spectrum varied across the time course. Our data provide multiparametric insights at single-cell level into the phenotypic diversity of EMT at different time points and in four human cellular models. These insights are valuable to better understand the complexity of EMT, to compare EMT transitions between the cellular models used herein, and for the design of EMT time course experiments. Mendeley Data: DOI: 10.17632 t3gmyk5r2.1 ArrayExpress Data: Accession number E-MTAB-9365

Publication

SampleQC: robust multivariate, multi-celltype, multi-sample quality control for single cell data

Publisher: Cold Spring Harbor Laboratory

Date: 28-08-2021

DOI: 10.1101/2021.08.28.458012

Abstract: Quality control (QC) is a critical component of single-cell RNA-seq (scRNA-seq) processing pipelines. Current approaches to QC implicitly assume that datasets are comprised of one celltype, potentially resulting in biased exclusion of rare celltypes. We introduce SampleQC, which robustly fits a Gaussian mixture model across multiple samples, and improves sensitivity and reduces bias compared to current approaches. We show via simulations that SampleQC is less susceptible to exclusion of rarer celltypes. We also demonstrate SampleQC on a complex real dataset (867k cells over 172 samples). SampleQC is general, is implemented in R, and could be applied to other data types.

Publication

DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes

Publisher: Cold Spring Harbor Laboratory

Date: 17-08-2023

DOI: 10.1101/2023.08.17.553679

Abstract: Although transcriptomics data is typically used to analyse mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions. Here, we present DifferentialRegulation, a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, versus state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data. DifferentialRegulation is distributed as a Bioconductor R package.

Publication

From RNA-seq reads to differential expression results

Publisher: Springer Science and Business Media LLC

Date: 2010

DOI: 10.1186/GB-2010-11-12-220

Publication

Evaluation of affinity-based genome-wide DNA methylation data: Effects of CpG density, amplification bias, and copy number variation

Publisher: Cold Spring Harbor Laboratory

Date: 02-11-2010

DOI: 10.1101/GR.110601.110

Abstract: DNA methylation is an essential epigenetic modification that plays a key role associated with the regulation of gene expression during differentiation, but in disease states such as cancer, the DNA methylation landscape is often deregulated. There are now numerous technologies available to interrogate the DNA methylation status of CpG sites in a targeted or genome-wide fashion, but each method, due to intrinsic biases, potentially interrogates different fractions of the genome. In this study, we compare the affinity-purification of methylated DNA between two popular genome-wide techniques, methylated DNA immunoprecipitation (MeDIP) and methyl-CpG binding domain-based capture (MBDCap), and show that each technique operates in a different domain of the CpG density landscape. We explored the effect of whole-genome amplification and illustrate that it can reduce sensitivity for detecting DNA methylation in GC-rich regions of the genome. By using MBDCap, we compare and contrast microarray- and sequencing-based readouts and highlight the impact that copy number variation (CNV) can make in differential comparisons of methylomes. These studies reveal that the analysis of DNA methylation data and genome coverage is highly dependent on the method employed, and consideration must be made in light of the GC content, the extent of DNA amplification, and the copy number.

Publication

Doublet identification in single-cell sequencing data using scDblFinder

Publisher: F1000 Research Ltd

Date: 28-09-2021

DOI: 10.12688/F1000RESEARCH.73600.1

Abstract: Doublets are prevalent in single-cell sequencing data and can lead to artifactual findings. A number of strategies have therefore been proposed to detect them. Building on the strengths of existing approaches, we developed scDblFinder, a fast, flexible and accurate Bioconductor-based doublet detection method. Here we present the method, justify its design choices, demonstrate its performance on both single-cell RNA and accessibility sequencing data, and provide some observations on doublet formation, detection, and enrichment analysis. Even in complex datasets, scDblFinder can accurately identify most heterotypic doublets, and was already found by an independent benchmark to outcompete alternatives.

Publication

Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications

Publisher: Springer Science and Business Media LLC

Date: 26-02-2018

DOI: 10.1186/S13059-018-1406-4

Publication

Doublet identification in single-cell sequencing data using scDblFinder

Publisher: F1000 Research Ltd

Date: 16-05-2022

DOI: 10.12688/F1000RESEARCH.73600.2

Abstract: Doublets are prevalent in single-cell sequencing data and can lead to artifactual findings. A number of strategies have therefore been proposed to detect them. Building on the strengths of existing approaches, we developed scDblFinder, a fast, flexible and accurate Bioconductor-based doublet detection method. Here we present the method, justify its design choices, demonstrate its performance on both single-cell RNA and accessibility (ATAC) sequencing data, and provide some observations on doublet formation, detection, and enrichment analysis. Even in complex datasets, scDblFinder can accurately identify most heterotypic doublets, and was already found by an independent benchmark to outcompete alternatives.

Publication

Built on sand: the shaky foundations of simulating single-cell RNA sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 15-11-2021

DOI: 10.1101/2021.11.15.468676

Abstract: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyse aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant – on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task, and often use simulated data that provide a ground truth for evaluations. Thus, demanding a high quality standard for synthetically generated data is critical to make simulation study results credible and transferable to real data. Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects; they yield over-optimistic performance of integration, and potentially unreliable ranking of clustering methods; and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.

Publication

Faithful mRNA splicing depends on the Prp19 complex subunit faint sausage and is required for tracheal branching morphogenesis in Drosophila

Publisher: The Company of Biologists

Date: 2017

DOI: 10.1242/DEV.144535

Abstract: Morphogenesis requires the dynamic regulation of gene expression, including transcription, mRNA maturation and translation. Dysfunction of the general mRNA splicing machinery can cause surprisingly specific cellular phenotypes, but the basis for these effects is not clear. Here we show that the Drosophila faint sausage (fas) locus, implicated in epithelial morphogenesis and previously reported to encode a secreted immunoglobulin domain protein, in fact encodes a subunit of the spliceosome-activating Prp19 complex, which is essential for efficient pre-mRNA splicing. Loss of zygotic fas function globally impairs the efficiency of splicing, and is associated with widespread retention of introns in mRNAs and dramatic changes in gene expression. Surprisingly, despite these general effects, zygotic fas mutants show specific defects in tracheal cell migration during mid-embryogenesis when maternally supplied splicing factors have declined. We propose that tracheal branching, which relies on dynamic changes in gene expression, is particularly sensitive for efficient spliceosome function. Our results reveal an entry point to study requirements of the splicing machinery during organogenesis and provide a better understanding of disease phenotypes associated with mutations in general splicing factors.

Publication

Abscisic acid is a substrate of the ABC transporter encoded by the durable wheat disease resistance gene Lr34

Publisher: Wiley

Date: 22-04-2019

DOI: 10.1111/NPH.15815

Publication

Author Correction: High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy

Publisher: Springer Science and Business Media LLC

Date: 02-07-2018

DOI: 10.1038/S41591-018-0094-7

Abstract: In the version of this article initially published, Figs. 5a,c and 6a were incorrect because of an error in a metadata spreadsheet that led to the healthy donor patient 2 (HD2) samples being used twice in the analysis of baseline samples and in the analysis at 12 weeks of anti-PD-1 therapy, while HD3 samples had not been used.

Publication

Observation weights to unlock bulk RNA-seq tools for zero inflation and single-cell applications

Publisher: Cold Spring Harbor Laboratory

Date: 18-01-2018

DOI: 10.1101/250126

Abstract: Dropout events in single-cell transcriptome sequencing (scRNA-seq) cause many transcripts to go undetected and induce an excess of zero read counts, leading to power issues in differential expression (DE) analysis. This has triggered the development of bespoke scRNA-seq DE methods to cope with zero inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce a weighting strategy, based on a zero-inflated negative binomial (ZINB) model, that identifies excess zero counts and generates gene and cell-specific weights to unlock bulk RNA-seq DE pipelines for zero-inflated data, boosting performance for scRNA-seq.
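The weighting idea in the abstract above can be sketched in a few lines of Python. The sketch assumes a two-component zero-inflated negative binomial mixture with an already estimated mean, dispersion and zero-inflation probability for one gene in one cell; the weight of an observation is then the posterior probability that it came from the negative binomial component. The function and parameter names are hypothetical, and the parameter estimation that the method itself performs is not shown.

```python
import math


def nb_zero_probability(mu, dispersion):
    """P(count = 0) under a negative binomial with variance mu + dispersion * mu**2."""
    if dispersion == 0:
        return math.exp(-mu)  # Poisson limit
    return (1.0 + dispersion * mu) ** (-1.0 / dispersion)


def zinb_weight(count, mu, dispersion, pi_zero):
    """Posterior probability that an observation belongs to the NB component.

    pi_zero is the (assumed known) mixing probability of the excess-zero
    component. Positive counts can only come from the NB component.
    """
    if count > 0:
        return 1.0
    p0 = nb_zero_probability(mu, dispersion)
    return (1.0 - pi_zero) * p0 / (pi_zero + (1.0 - pi_zero) * p0)


# A zero in a gene with a high expected count is probably an excess zero,
# so it is down-weighted more than a zero in a lowly expressed gene.
w_low_mean = zinb_weight(0, mu=1.0, dispersion=1.0, pi_zero=0.5)
w_high_mean = zinb_weight(0, mu=50.0, dispersion=1.0, pi_zero=0.5)
```

Weights of this kind could then be passed as observation weights to a bulk RNA-seq differential expression tool, which is the use the abstract describes.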

Publication

Channel crosstalk correction in suspension and imaging mass cytometry

Publisher: Cold Spring Harbor Laboratory

Date: 07-09-2017

DOI: 10.1101/185744

Abstract: Mass cytometry enables simultaneous analysis of over 40 proteins and their modifications in single cells through use of metal-tagged antibodies. Compared to fluorescent dyes, the use of pure metal isotopes strongly reduces spectral overlap among measurement channels. Crosstalk still exists, however, caused by isotopic impurity, oxide formation, and mass cytometer properties. Spillover effects can be minimized, but not avoided, by following a set of constraining rules when designing an antibody panel. Generation of such low crosstalk panels requires considerable expert knowledge, knowledge of the abundance of each marker and substantial experimental effort. Here we describe a novel bead-based compensation workflow that includes R-based software and a web tool, which enables correction for interference between channels. We demonstrate utility in suspension mass cytometry and show how this approach can be applied to imaging mass cytometry. Our approach greatly simplifies the development of new antibody panels, increases flexibility for antibody-metal pairing, improves overall data quality, thereby reducing the risk of reporting cell phenotype and function artifacts, and greatly facilitates analysis of complex samples for which antigen abundances are unknown.
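As a rough illustration of what correcting for interference between channels can involve, the Python sketch below assumes a linear spillover model: the observed signal in each channel is the true signal plus fixed fractions of the true signal from other channels. Compensation then amounts to solving a small linear system. The spillover matrix, the plain linear solve and the clipping of negative values are assumptions made for this example and do not reproduce the published software.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("spillover matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def compensate(observed, spillover):
    """Recover true signals, assuming observed[j] = sum_i true[i] * spillover[i][j].

    spillover[i][j] is the fraction of channel i's signal detected in channel j
    (diagonal entries are 1). Negative solutions are clipped to zero, a crude
    stand-in for a proper non-negativity constraint.
    """
    n = len(observed)
    transposed = [[spillover[i][j] for i in range(n)] for j in range(n)]
    true_signal = solve_linear(transposed, observed)
    return [max(0.0, value) for value in true_signal]


# Hypothetical two-channel case: 10% of channel 0 spills into channel 1.
spillover = [[1.0, 0.1],
             [0.0, 1.0]]
observed = [100.0, 60.0]
corrected = compensate(observed, spillover)
```

In this example the 60 units observed in channel 1 include 10 units of spillover from channel 0, so the corrected signal for channel 1 is lower than the observed one.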

Publication

The hematopoietic oncoprotein FOXP1 promotes tumor cell survival in diffuse large B-cell lymphoma by repressing S1PR2 signaling

Publisher: American Society of Hematology

Date: 17-03-2016

DOI: 10.1182/BLOOD-2015-08-662635

Abstract: The sphingosine-1-phosphate receptor 2 (S1PR2) is a novel tumor suppressor and survival prognosticator in the ABC subtype of DLBCL. S1PR2 is a direct, repressed FOXP1 target; ectopic S1PR2 expression induces apoptosis in DLBCL cells in vitro and prevents tumor growth.

Publication

ALT-803, an IL-15 superagonist, in combination with nivolumab in patients with metastatic non-small cell lung cancer: a non-randomised, open-label, phase 1b trial

Publisher: Elsevier BV

Date: 05-2018

DOI: 10.1016/S1470-2045(18)30148-7

Publication

Validation of hypermethylated DNA regions found in colorectal cancers as potential aging-independent biomarkers of precancerous colorectal lesions

Publisher: Cold Spring Harbor Laboratory

Date: 25-05-2023

DOI: 10.1101/2023.05.24.542159

Abstract: We previously identified 16,772 colorectal cancer-associated hypermethylated DNA regions that were also detectable in precancerous colorectal lesions (preCRCs) and unrelated to normal mucosal aging. We have now conducted a study to validate 990 of these differently methylated DNA regions in a new series of preCRCs. We used targeted bisulfite sequencing to validate these 990 potential biomarkers in 59 preCRC tissue samples (41 conventional adenomas, 18 sessile serrated lesions), each with a patient-matched normal mucosal sample. Differential DNA methylation tests for each CpG dinucleotide were conducted, with results aggregated at region level, to choose panels of candidate biomarkers that were (cross-)validated with respect to their stratifying potential between preCRCs and normal mucosas as well as on an independent cohort. Strong differences in methylation level were observed across the full set of 990 investigated DMRs. Among the 100 randomly selected panels of 30 DMRs analyzed with our bioinformatic approach, the best performing panel correctly classified 58/59 tumors (area under the receiver operating curve: 0.998). These validated DNA hypermethylation markers can be exploited to develop more accurate noninvasive colorectal tumor screening assays.

Publication

Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability

Publisher: Cold Spring Harbor Laboratory

Date: 23-09-2022

DOI: 10.1101/2022.09.22.508982

Abstract: Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption.

Publication

Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data

Publisher: Cold Spring Harbor Laboratory

Date: 08-04-2016

DOI: 10.1101/047613

Abstract: Recent technological developments in high-dimensional flow cytometry and mass cytometry (CyTOF) have made it possible to detect expression levels of dozens of protein markers in thousands of cells per second, allowing cell populations to be characterized in unprecedented detail. Traditional data analysis by “manual gating” can be inefficient and unreliable in these high-dimensional settings, which has led to the development of a large number of automated analysis methods. Methods designed for unsupervised analysis use specialized clustering algorithms to detect and define cell populations for further downstream analysis. Here, we have performed an up-to-date, extensible performance comparison of clustering methods for high-dimensional flow and mass cytometry data. We evaluated methods using several publicly available data sets from experiments in immunology, containing both major and rare cell populations, with cell population identities from expert manual gating as the reference standard. Several methods performed well, including FlowSOM, X-shift, PhenoGraph, Rclusterpp, and flowMeans. Among these, FlowSOM had extremely fast runtimes, making this method well-suited for interactive, exploratory analysis of large, high-dimensional data sets on a standard laptop or desktop computer. These results extend previously published comparisons by focusing on high-dimensional data and including new methods developed for CyTOF data. R scripts to reproduce all analyses are available from GitHub ( mweber/cytometry-clustering-comparison ), and pre-processed data files are available from FlowRepository (FR-FCM-ZZPH), allowing our comparisons to be extended to include new clustering methods and reference data sets.

Publication

Male sex in houseflies is determined by Mdmd, a paralog of the generic splice factor gene CWC22

Publisher: American Association for the Advancement of Science (AAAS)

Date: 12-05-2017

DOI: 10.1126/SCIENCE.AAM5498

Abstract: Sex comes in many forms, even when considered at the molecular level. In different animals, the chromosomes and specific genes that function in sex determination vary widely. As a case in point, the familiar housefly displays a highly variable sex determination system. In this animal, the male determiner (M-factor) instructs male development when it is active, but female development results when it is inactive. Sharma et al. now identify the housefly M-factor, which arose via the co-option of existing genes, gene duplication, and neofunctionalization. The findings elucidate the remarkable diversity in sex-determining pathways and the forces that drive this diversity. Science, this issue p. 642

Publication

Differential splicing using whole-transcript microarrays

Publisher: Springer Science and Business Media LLC

Date: 22-05-2009

DOI: 10.1186/1471-2105-10-156

Publication

Do count-based differential expression methods perform poorly when genes are expressed in only one condition?

Publisher: Cold Spring Harbor Laboratory

Date: 07-04-2015

DOI: 10.1101/017673

Abstract: A correspondence with respect to: Comprehensive evaluation of differential gene expression analysis methods for RNA-seq data Rapaport F, Khanin R, Liang Y, Pirun M, Krek A, Zumbo P, Mason CE, Socci ND and Betel D, Genome Biol 2013, 14:R95

Publication

High-Definition Macromolecular Composition of Yeast RNA-Processing Complexes

Publisher: Elsevier BV

Date: 2004

DOI: 10.1016/S1097-2765(04)00003-6

Abstract: A remarkably large collection of evolutionarily conserved proteins has been implicated in processing of noncoding RNAs and biogenesis of ribonucleoproteins. To better define the physical and functional relationships among these proteins and their cognate RNAs, we performed 165 highly stringent affinity purifications of known or predicted RNA-related proteins from Saccharomyces cerevisiae. We systematically identified and estimated the relative abundance of stably associated polypeptides and RNA species using a combination of gel densitometry, protein mass spectrometry, and oligonucleotide microarray hybridization. Ninety-two discrete proteins or protein complexes were identified comprising 489 different polypeptides, many associated with one or more specific RNA molecules. Some of the pre-rRNA-processing complexes that were obtained are discrete sub-complexes of those previously described. Among these, we identified the IPI complex required for proper processing of the ITS2 region of the ribosomal RNA primary transcript. This study provides a high-resolution overview of the modular topology of noncoding RNA-processing machinery.

Publication

TAR Syndrome-associated Rbm8a deficiency causes hematopoietic defects and attenuates Wnt/PCP signaling

Publisher: Cold Spring Harbor Laboratory

Date: 12-04-2023

DOI: 10.1101/2023.04.12.536513

Abstract: Defects in blood development frequently occur among syndromic congenital anomalies. Thrombocytopenia-Absent Radius (TAR) Syndrome is a rare congenital condition with reduced platelets (hypomegakaryocytic thrombocytopenia) and forelimb anomalies, concurrent with more variable heart and kidney defects. TAR syndrome associates with hypomorphic gene function for RBM8A/Y14 that encodes a component of the exon junction complex involved in mRNA splicing, transport, and nonsense-mediated decay. How perturbing a general mRNA-processing factor causes the selective TAR Syndrome phenotypes remains unknown. Here, we connect zebrafish rbm8a perturbation to early hematopoietic defects via attenuated non-canonical Wnt/Planar Cell Polarity (PCP) signaling that controls developmental cell arrangements. In hypomorphic rbm8a zebrafish, we observe a significant reduction of cd41-positive thrombocytes. rbm8a-mutant zebrafish embryos accumulate mRNAs with individual retained introns, a hallmark of defective nonsense-mediated decay; affected mRNAs include transcripts for non-canonical Wnt/PCP pathway components. We establish that rbm8a-mutant embryos show convergent extension defects and that reduced rbm8a function interacts with perturbations in non-canonical Wnt/PCP pathway genes wnt5b, wnt11f2, fzd7a, and vangl2. Using live-imaging, we found reduced rbm8a function impairs the architecture of the lateral plate mesoderm (LPM) that forms hematopoietic, cardiovascular, kidney, and forelimb skeleton progenitors as affected in TAR Syndrome. Both mutants for rbm8a and for the PCP gene vangl2 feature impaired expression of early hematopoietic/endothelial genes including runx1 and the megakaryocyte regulator gfi1aa. Together, our data propose aberrant LPM patterning and hematopoietic defects as a possible consequence of attenuated non-canonical Wnt/PCP signaling upon reduced rbm8a function. These results link TAR Syndrome to a potential LPM origin and developmental mechanism.

Publication

ARMOR: An Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data

Publisher: Oxford University Press (OUP)

Date: 07-2019

DOI: 10.1534/G3.119.400185

Abstract: The extensive generation of RNA sequencing (RNA-seq) data in the last decade has resulted in a myriad of specialized software for its analysis. Each software module typically targets a specific step within the analysis pipeline, making it necessary to join several of them to get a single cohesive workflow. Multiple software programs automating this procedure have been proposed, but often lack modularity, transparency or flexibility. We present ARMOR, which performs an end-to-end RNA-seq data analysis, from raw read files, via quality checks, alignment and quantification, to differential expression testing, geneset analysis and browser-based exploration of the data. ARMOR is implemented using the Snakemake workflow management system and leverages conda environments; Bioconductor objects are generated to facilitate downstream analysis, ensuring seamless integration with many R packages. The workflow is easily implemented by cloning the GitHub repository, replacing the supplied input and reference files and editing a configuration file. Although we have selected the tools currently included in ARMOR, the setup is modular and alternative tools can be easily integrated.

Publication

censcyt: censored covariates in differential abundance analysis in cytometry

Publisher: Cold Spring Harbor Laboratory

Date: 10-11-2020

DOI: 10.1101/2020.11.09.374447

Abstract: Innovations in single cell technologies have led to a flurry of datasets and computational tools to process and interpret them, including analyses of cell composition changes and transition in cell states. The diffcyt workflow for differential discovery in cytometry data consists of several steps, including preprocessing, cell population identification and differential testing for an association with a binary or continuous covariate. However, the commonly measured quantity of survival time in clinical studies often results in a censored covariate where classical differential testing is inapplicable. To overcome this limitation, multiple methods to directly include censored covariates in differential abundance analysis were examined with the use of simulation studies and a case study. Results show high error control and decent sensitivity for a subset of the methods. The tested methods are implemented in the R package censcyt as an extension of diffcyt and are available at etogerber/censcyt. Methods for the direct inclusion of a censored variable as a predictor in GLMMs are a valid alternative to classical survival analysis methods, such as the Cox proportional hazard model, while allowing for more flexibility in the differential analysis.

Publication

RNA sequencing data: hitchhiker's guide to expression analysis

Publisher: PeerJ

Date: 17-10-2018

DOI: 10.7287/PEERJ.PREPRINTS.27283V1

Abstract: Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq datasets as well as the performance of the myriad of methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.

Publication

RNA sequencing data: hitchhiker's guide to expression analysis

Publisher: PeerJ

Date: 24-11-2018

DOI: 10.7287/PEERJ.PREPRINTS.27283V2

Abstract: Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq datasets as well as the performance of the myriad of methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.

Publication

Fibroblastic reticular cells initiate immune responses in visceral adipose tissues and secure peritoneal immunity

Publisher: American Association for the Advancement of Science (AAAS)

Date: 03-08-2018

DOI: 10.1126/SCIIMMUNOL.AAR4539

Abstract: MYD88 signaling in fibroblastic reticular cells drives the initiation of immune responses in fat-associated lymphoid clusters.

Publication

A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes

Publisher: Springer Science and Business Media LLC

Date: 31-07-2019

DOI: 10.1038/S41467-019-11272-Z

Abstract: A platform for highly parallel direct sequencing of native RNA strands was recently described by Oxford Nanopore Technologies, but despite initial efforts it remains crucial to further investigate the technology for quantification of complex transcriptomes. Here we undertake native RNA sequencing of polyA+ RNA from two human cell lines, analysing ~5.2 million aligned native RNA reads. To enable informative comparisons, we also perform relevant ONT direct cDNA- and Illumina-sequencing. We find that while native RNA sequencing does enable some of the anticipated advantages, key unexpected aspects currently hamper its performance, most notably the quite frequent inability to obtain full-length transcripts from single reads, as well as difficulties to unambiguously infer their true transcript of origin. While characterising issues that need to be addressed when investigating more complex transcriptomes, our study highlights that with some defined improvements, native RNA sequencing could be an important addition to the mammalian transcriptomics toolbox.

Publication

Mammalian Annotation Database for improved annotation and functional classification of Omics datasets from less well-annotated organisms

Publisher: Oxford University Press (OUP)

Date: 2019

DOI: 10.1093/DATABASE/BAZ086

Abstract: Next-generation sequencing technologies and the availability of an increasing number of mammalian and other genomes allow gene expression studies, particularly RNA sequencing, in many non-model organisms. However, incomplete genome annotation and assignments of genes to functional annotation databases can lead to a substantial loss of information in downstream data analysis. To overcome this, we developed the Mammalian Annotation Database tool (MAdb, madb.ethz.ch) to conveniently provide homologous gene information for selected mammalian species. The assignment between species is performed in three steps: (i) matching official gene symbols, (ii) using ortholog information contained in Ensembl Compara and (iii) pairwise BLAST comparisons of all transcripts. In addition, we developed a new tool (AnnOverlappeR) for the reliable assignment of the National Center for Biotechnology Information (NCBI) and Ensembl gene IDs. The gene lists translated to gene IDs of well-annotated species such as human can be used for improved functional annotation with relevant tools based on Gene Ontology and molecular pathway information. We tested the MAdb on a published RNA-seq data set for the pig and showed clearly improved overrepresentation analysis results based on the assigned human homologous gene identifiers. Using the MAdb revealed a similar list of human homologous genes and functional annotation results regardless of whether starting with gene IDs from NCBI or Ensembl. The MAdb database is accessible via a web interface and a Galaxy application.

Publication

iCOBRA: open, reproducible, standardized and live method benchmarking

Publisher: Springer Science and Business Media LLC

Date: 30-03-2016

DOI: 10.1038/NMETH.3805

Publication

Eleven grand challenges in single-cell data science

Publisher: Springer Science and Business Media LLC

Date: 07-02-2020

DOI: 10.1186/S13059-020-1926-6

Abstract: The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands—or even millions—of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.

Publication

Gapless provides combined scaffolding, gap filling and assembly correction with long reads

Publisher: Cold Spring Harbor Laboratory

Date: 09-03-2022

DOI: 10.1101/2022.03.08.483466

Abstract: Continuity, correctness and completeness of genome assemblies are important for many biological projects. Long reads represent a major driver towards delivering high-quality genomes, but not everybody can achieve the necessary coverage for good long-read-only assemblies. Therefore, improving existing assemblies with low-coverage long reads is a promising alternative. The improvements include correction, scaffolding and gap filling. However, most tools perform only one of these tasks and the useful information of reads that supported the scaffolding is lost when running separate programs successively. Therefore, we propose a new tool for combined execution of all three tasks using PacBio or Oxford Nanopore reads. gapless is available at: chmeing/gapless.

Publication

Loss of the Notch effector RBPJ promotes tumorigenesis

Publisher: Rockefeller University Press

Date: 15-12-2014

DOI: 10.1084/JEM.20121192

Abstract: Aberrant Notch activity is oncogenic in several malignancies, but it is unclear how expression or function of downstream elements in the Notch pathway affects tumor growth. Transcriptional regulation by Notch is dependent on interaction with the DNA-binding transcriptional repressor, RBPJ, and consequent derepression or activation of associated gene promoters. We show here that RBPJ is frequently depleted in human tumors. Depletion of RBPJ in human cancer cell lines xenografted into immunodeficient mice resulted in activation of canonical Notch target genes, and accelerated tumor growth secondary to reduced cell death. Global analysis of activated regions of the genome, as defined by differential acetylation of histone H4 (H4ac), revealed that the cell death pathway was significantly dysregulated in RBPJ-depleted tumors. Analysis of transcription factor binding data identified several transcriptional activators that bind promoters with differential H4ac in RBPJ-depleted cells. Functional studies demonstrated that NF-κB and MYC were essential for survival of RBPJ-depleted cells. Thus, loss of RBPJ derepresses target gene promoters, allowing Notch-independent activation by alternate transcription factors that promote tumorigenesis.

Publication

The Spinal Transcriptome after Cortical Stroke: In Search of Molecular Factors Regulating Spontaneous Recovery in the Spinal Cord

Publisher: Society for Neuroscience

Date: 08-04-2019

DOI: 10.1523/JNEUROSCI.2571-18.2019

Publication

Synaptic accumulation of FUS triggers age-dependent misregulation of inhibitory synapses in ALS-FUS mice

Publisher: Cold Spring Harbor Laboratory

Date: 10-06-2020

DOI: 10.1101/2020.06.10.136010

Abstract: FUS is a primarily nuclear RNA-binding protein with important roles in RNA processing and transport. FUS mutations disrupting its nuclear localization characterize a subset of amyotrophic lateral sclerosis (ALS-FUS) patients, through an unidentified pathological mechanism. FUS regulates nuclear RNAs, but its role at the synapse is poorly understood. Here, we used super-resolution imaging to determine the physiological localization of extranuclear, neuronal FUS and found it predominantly near the vesicle reserve pool of presynaptic sites. Using CLIP-seq on synaptoneurosome preparations, we identified synaptic RNA targets of FUS that are associated with synapse organization and plasticity. Synaptic FUS was significantly increased in a knock-in mouse model of ALS-FUS, at presymptomatic stages, accompanied by alterations in density and size of GABAergic synapses. RNA-seq of synaptoneurosomes highlighted age-dependent dysregulation of glutamatergic and GABAergic synapses. Our study indicates that FUS accumulation at the synapse in early stages of ALS-FUS results in synaptic impairment, potentially representing an initial trigger of neurodegeneration.

Publication

diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering

Publisher: Cold Spring Harbor Laboratory

Date: 18-06-2018

DOI: 10.1101/349738

Abstract: High-dimensional flow and mass cytometry allow cell types and states to be characterized in great detail by measuring expression levels of more than 40 targeted protein markers per cell at the single-cell level. However, data analysis can be difficult, due to the large size and dimensionality of datasets as well as limitations of existing computational methods. Here, we present diffcyt, a new computational framework for differential discovery analyses in high-dimensional cytometry data, based on a combination of high-resolution clustering and empirical Bayes moderated tests adapted from transcriptomics. Our approach provides improved statistical performance, including for rare cell populations, along with flexible experimental designs and fast runtimes in an open-source framework.

Publication

treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses

Publisher: Cold Spring Harbor Laboratory

Date: 09-06-2020

DOI: 10.1101/2020.06.08.140608

Abstract: The arrangement of hypotheses in a hierarchical structure (e.g., phylogenies, cell types) appears in many research fields and indicates different resolutions at which data can be interpreted. A common goal is to find a representative resolution that gives high sensitivity to identify relevant entities (e.g., microbial taxa or cell subpopulations) that are related to a phenotypic outcome (e.g. disease status) while controlling false detections, therefore providing a more compact view of detected entities and summarizing characteristics shared among them. Current methods, either performing hypothesis tests at an arbitrary resolution or testing hypotheses at all possible resolutions leading to nested results, are suboptimal. Moreover, they are not flexible enough to work in situations where each entity has multiple features to consider and different resolutions might be required for different features. For example, in single cell RNA-seq data, an increasing focus is to find differential state genes that change expression within a cell subpopulation in response to an external stimulus. Such differential expression might occur at different resolutions (e.g., all cells or a small set of cells) for different genes. Our new algorithm treeclimbR is designed to fill this gap by exploiting a hierarchical tree of entities, proposing multiple candidates that capture the latent signal and pinpointing branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.

Publication

Comparison of methyl-DNA immunoprecipitation (MeDIP) and methyl-CpG binding domain (MBD) protein capture for genome-wide DNA methylation analysis reveal CpG sequence coverage bias

Publisher: Informa UK Limited

Date: 2011

DOI: 10.4161/EPI.6.1.13313

Abstract: DNA methylation primarily occurs at CpG dinucleotides in mammals and is a common epigenetic mark that plays a critical role in the regulation of gene expression. Profiling DNA methylation patterns across the genome is vital to understand DNA methylation changes that occur during development and in disease phenotype. In this study, we compared two commonly used approaches to enrich for methylated DNA regions of the genome, namely methyl-DNA immunoprecipitation (MeDIP) that is based on enrichment with antibodies specific for 5'-methylcytosine (5MeC), and capture of methylated DNA using a methyl-CpG binding domain-based (MBD) protein to discover differentially methylated regions (DMRs) in cancer. The enriched methylated DNA fractions were interrogated on Affymetrix promoter tiling arrays and differentially methylated regions were identified. A detailed validation study of 42 regions was performed using Sequenom MassCLEAVE technique. This detailed analysis revealed that both enrichment techniques are sensitive for detecting DMRs and preferentially identified different CpG rich regions of the prostate cancer genome, with MeDIP commonly enriching for methylated regions with a low CpG density, while MBD capture favors regions of higher CpG density and identifies the greatest proportion of CpG islands. This is the first detailed validation report comparing different methylated DNA enrichment techniques for identifying regions of differential DNA methylation. Our study highlights the importance of understanding the nuances of the methods used for DNA genome-wide methylation analyses so that accurate interpretation of the biology is not overlooked.

Publication

Disentangling tumorigenesis-associated DNA methylation changes in colorectal tissues from those associated with ageing

Publisher: Informa UK Limited

Date: 09-08-2021

DOI: 10.1080/15592294.2021.1952375

Publication

FunSpec: a web-based cluster interpreter for yeast.

Publisher: Springer Science and Business Media LLC

Date: 2002

DOI: 10.1186/1471-2105-3-35

Abstract: For effective exposition of biological information, especially with regard to analysis of large-scale data types, researchers need immediate access to multiple categorical knowledge bases and need summary information presented to them on collections of genes, as opposed to the typical one gene at a time. We present here a web-based tool (FunSpec) for statistical evaluation of groups of genes and proteins (e.g. co-regulated genes, protein complexes, genetic interactors) with respect to existing annotations (e.g. functional roles, biochemical properties, localization). FunSpec is available online at funspec.med.utoronto.ca. FunSpec is helpful for interpretation of any data type that generates groups of related genes and proteins, such as gene expression clustering and protein complexes, and is useful for predictive methods employing "guilt-by-association."

Publication

TCF/LEF dependent and independent transcriptional regulation of Wnt/β‐catenin target genes

Publisher: EMBO

Date: 13-11-2018

DOI: 10.15252/EMBJ.201798873

Publication

The functional landscape of mouse gene expression

Publisher: Springer Science and Business Media LLC

Date: 2004

DOI: 10.1186/JBIOL16

Publication

Treatment of a metabolic liver disease by in vivo genome base editing in adult mice

Publisher: Springer Science and Business Media LLC

Date: 10-2018

DOI: 10.1038/S41591-018-0209-1

Abstract: CRISPR-Cas-based genome editing holds great promise for targeting genetic disorders, including inborn errors of hepatocyte metabolism. Precise correction of disease-causing mutations in adult tissues in vivo, however, is challenging. It requires repair of Cas9-induced double-stranded DNA (dsDNA) breaks by homology-directed mechanisms, which are highly inefficient in non-dividing cells. Here we corrected the disease phenotype of adult phenylalanine hydroxylase (Pah)

Publication

ARMOR: an Automated Reproducible MOdular workflow for preprocessing and differential analysis of RNA-seq data

Publisher: Cold Spring Harbor Laboratory

Date: 12-03-2019

DOI: 10.1101/575951

Abstract: The extensive generation of RNA sequencing (RNA-seq) data in the last decade has resulted in a myriad of specialized software for its analysis. Each software module typically targets a specific step within the analysis pipeline, making it necessary to join several of them to get a single cohesive workflow. Multiple software programs automating this procedure have been proposed, but often lack modularity, transparency or flexibility. We present ARMOR, which performs an end-to-end RNA-seq data analysis, from raw read files, via quality checks, alignment and quantification, to differential expression testing, geneset analysis and browser-based exploration of the data. ARMOR is implemented using the Snakemake workflow management system and leverages conda environments; Bioconductor objects are generated to facilitate downstream analysis, ensuring seamless integration with many R packages. The workflow is easily implemented by cloning the GitHub repository, replacing the supplied input and reference files and editing a configuration file. Although we have selected the tools currently included in ARMOR, the setup is modular and alternative tools can be easily integrated.

Publication

DNA methylation profiles of elderly individuals subjected to indentured childhood labor and trauma.

Publisher: Springer Science and Business Media LLC

Date: 27-02-2017

DOI: 10.1186/S12881-017-0370-2

Publication

DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics

Publisher: F1000 Research Ltd

Date: 06-12-2016

DOI: 10.12688/F1000RESEARCH.8900.2

Abstract: There are many instances in genomics data analyses where measurements are made on a multivariate response. For example, alternative splicing can lead to multiple expressed isoforms from the same primary transcript. There are situations where differences (e.g. between normal and disease state) in the relative ratio of expressed isoforms may have significant phenotypic consequences or lead to prognostic capabilities. Similarly, knowledge of single nucleotide polymorphisms (SNPs) that affect splicing, so-called splicing quantitative trait loci (sQTL) will help to characterize the effects of genetic variation on gene expression. RNA sequencing (RNA-seq) has provided an attractive toolbox to carefully unravel alternative splicing outcomes and recently, fast and accurate methods for transcript quantification have become available. We propose a statistical framework based on the Dirichlet-multinomial distribution that can discover changes in isoform usage between conditions and SNPs that affect relative expression of transcripts using these quantifications. The Dirichlet-multinomial model naturally accounts for the differential gene expression without losing information about overall gene abundance and by joint modeling of isoform expression, it has the capability to account for their correlated nature. The main challenge in this approach is to get robust estimates of model parameters with limited numbers of replicates. We approach this by sharing information and show that our method improves on existing approaches in terms of standard statistical performance metrics. The framework is applicable to other multivariate scenarios, such as Poly-A-seq or where beta-binomial models have been applied (e.g., differential DNA methylation). Our method is available as a Bioconductor R package called DRIMSeq.

Publication

Bisulfite sequencing of chromatin immunoprecipitated DNA (BisChIP-seq) directly informs methylation status of histone-modified DNA

Publisher: Cold Spring Harbor Laboratory

Date: 30-03-2012

DOI: 10.1101/GR.132076.111

Abstract: The complex relationship between DNA methylation, chromatin modification, and underlying DNA sequence is often difficult to unravel with existing technologies. Here, we describe a novel technique based on high-throughput sequencing of bisulfite-treated chromatin immunoprecipitated DNA (BisChIP-seq), which can directly interrogate genetic and epigenetic processes that occur in normal and diseased cells. Unlike most previous reports based on correlative techniques, we found using direct bisulfite sequencing of Polycomb H3K27me3-enriched DNA from normal and prostate cancer cells that DNA methylation and H3K27me3-marked histones are not always mutually exclusive, but can co-occur in a genomic region-dependent manner. Notably, in cancer, the co-dependency of marks is largely redistributed with an increase of the dual repressive marks at CpG islands and transcription start sites of silent genes. In contrast, there is a loss of DNA methylation in intergenic H3K27me3-marked regions. Allele-specific methylation status derived from the BisChIP-seq data clearly showed that both methylated and unmethylated alleles can simultaneously be associated with H3K27me3 histones, highlighting that DNA methylation status in these regions is not dependent on Polycomb chromatin status. BisChIP-seq is a novel approach that can be widely applied to directly interrogate the genomic relationship between allele-specific DNA methylation, histone modification, or other important epigenetic regulators.

Publication

DAMEfinder: A method to detect differential allele-specific methylation

Publisher: Cold Spring Harbor Laboratory

Date: 10-10-2019

DOI: 10.1101/800383

Abstract: DNA methylation is a highly studied epigenetic signature that is associated with regulation of gene expression, whereby genes with high levels of promoter methylation are generally repressed. Genomic imprinting occurs when one of the parental alleles is methylated, i.e., when there is inherited allele-specific methylation (ASM). A special case of imprinting occurs during X chromosome inactivation in females, where one of the two X chromosomes is silenced, in order to achieve dosage compensation between the sexes. Another more widespread form of ASM is sequence dependent (SD-ASM), where ASM is linked to a nearby heterozygous single nucleotide polymorphism (SNP). We developed a method to screen for genomic regions that exhibit loss or gain of ASM in samples from two conditions (treatments, diseases, etc.). The method relies on the availability of bisulfite sequencing data from multiple samples of the two conditions. We leverage other established computational methods to screen for these regions within a new R package called DAMEfinder. It calculates an ASM score for all CpG sites or pairs in the genome of each sample, and then quantifies the change in ASM between conditions. It then clusters nearby CpG sites with consistent change into regions. In the absence of SNP information, our method relies only on reads to quantify ASM. This novel ASM score compares favourably to current methods that also screen for ASM. Not only does it easily discern between imprinted and non-imprinted regions, but also females from males based on X chromosome inactivation. We also applied DAMEfinder to a colorectal cancer dataset and observed that colorectal cancer subtypes are distinguishable according to their ASM signature. We also re-discover known cases of loss of imprinting. We have designed DAMEfinder to detect regions of differential ASM (DAMEs), which is a more refined definition of differential methylation, and can therefore help in breaking down the complexity of DNA methylation and its influence in development and disease.

Publication

Highly efficient DNA-free gene disruption in the agricultural pest Ceratitis capitata by CRISPR-Cas9 ribonucleoprotein complexes

Publisher: Springer Science and Business Media LLC

Date: 30-08-2017

DOI: 10.1038/S41598-017-10347-5

Abstract: The Mediterranean fruitfly Ceratitis capitata (medfly) is an invasive agricultural pest of high economic impact and has become an emerging model for developing new genetic control strategies as an alternative to insecticides. Here, we report the successful adaptation of CRISPR-Cas9-based gene disruption in the medfly by injecting in vitro pre-assembled, solubilized Cas9 ribonucleoprotein complexes (RNPs) loaded with gene-specific single guide RNAs (sgRNA) into early embryos. When targeting the eye pigmentation gene white eye (we), a high rate of somatic mosaicism in surviving G0 adults was observed. Germline transmission rate of mutated we alleles by G0 animals was on average above 52%, with individual cases achieving nearly 100%. We further recovered large deletions in the we gene when two sites were simultaneously targeted by two sgRNAs. CRISPR-Cas9 targeting of the Ceratitis ortholog of the Drosophila segmentation paired gene (Ccprd) caused segmental malformations in late embryos and in hatched larvae. Mutant phenotypes correlate with repair by non-homologous end-joining (NHEJ) lesions in the two targeted genes. This simple and highly effective Cas9 RNP-based gene editing to introduce mutations in C. capitata will significantly advance the design and development of new effective strategies for pest control management.

Publication

DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics [version 1; referees: awaiting peer review]

Publisher: F1000 Research Ltd

Date: 13-06-2016

DOI: 10.12688/F1000RESEARCH.8900.1

Abstract: There are many instances in genomics data analyses where measurements are made on a multivariate response. For example, alternative splicing can lead to multiple expressed isoforms from the same primary transcript. There are situations where the total abundance of gene expression does not change (e.g. between normal and disease state), but differences in the relative ratio of expressed isoforms may have significant phenotypic consequences or lead to prognostic capabilities. Similarly, knowledge of single nucleotide polymorphisms (SNPs) that affect splicing, so-called splicing quantitative trait loci (sQTL), will help to characterize the effects of genetic variation on gene expression. RNA sequencing (RNA-seq) has provided an attractive toolbox to carefully unravel alternative splicing outcomes and recently, fast and accurate methods for transcript quantification have become available. We propose a statistical framework based on the Dirichlet-multinomial distribution that can discover changes in isoform usage between conditions and SNPs that affect splicing outcome using these quantifications. The Dirichlet-multinomial model naturally accounts for the differential gene expression without losing information about overall gene abundance and by joint modeling of isoform expression, it has the capability to account for their correlated nature. The main challenge in this approach is to get robust estimates of model parameters with limited numbers of replicates. We approach this by sharing information and show that our method improves on existing approaches in terms of standard statistical performance metrics. The framework is applicable to other multivariate scenarios, such as Poly-A-seq or where beta-binomial models have been applied (e.g., differential DNA methylation). Our method is available as a Bioconductor R package called DRIMSeq.
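
Illustrative note (not part of the published abstract): the Dirichlet-multinomial distribution named above can be written down compactly. The following is a minimal Python sketch of its log-probability for one gene's isoform counts. The function name and the parameterisation by a vector `alpha` are assumptions made for illustration and are not taken from the DRIMSeq package, which is written in R.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts, one per isoform (hypothetical input).
    alpha:  positive concentration parameters, one per isoform.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)
    a0 = sum(alpha)
    # Multinomial coefficient n! / prod(y_j!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(y + 1) for y in counts)
    # Normalising ratio Gamma(a0) / Gamma(n + a0)
    log_norm = math.lgamma(a0) - math.lgamma(n + a0)
    # Per-isoform ratios Gamma(y_j + alpha_j) / Gamma(alpha_j)
    log_terms = sum(
        math.lgamma(y + a) - math.lgamma(a) for y, a in zip(counts, alpha)
    )
    return log_coef + log_norm + log_terms
```

In this parameterisation the expected isoform proportions are `alpha[j] / sum(alpha)`, and a smaller `sum(alpha)` corresponds to more variability between replicates than a plain multinomial would allow.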

Publication

Acetylation of H2A.Z is a key epigenetic modification associated with gene deregulation and epigenetic remodeling in cancer

Publisher: Cold Spring Harbor Laboratory

Date: 25-07-2011

DOI: 10.1101/GR.118919.110

Abstract: Histone H2A.Z (H2A.Z) is an evolutionarily conserved H2A variant implicated in the regulation of gene expression; however, its role in transcriptional deregulation in cancer remains poorly understood. Using genome-wide studies, we investigated the role of promoter-associated H2A.Z and acetylated H2A.Z (acH2A.Z) in gene deregulation and its relationship with DNA methylation and H3K27me3 in prostate cancer. Our results reconcile the conflicting reports of positive and negative roles for histone H2A.Z and gene expression states. We find that H2A.Z is enriched in a bimodal distribution at nucleosomes, surrounding the transcription start sites (TSSs) of both active and poised gene promoters. In addition, H2A.Z spreads across the entire promoter of inactive genes in a deacetylated state. In contrast, acH2A.Z is only localized at the TSSs of active genes. Gene deregulation in cancer is also associated with a reorganization of acH2A.Z and H2A.Z nucleosome occupancy across the promoter region and TSS of genes. Notably, in cancer cells we find that a gain of acH2A.Z at the TSS occurs with an overall decrease of H2A.Z levels, in concert with oncogene activation. Furthermore, deacetylation of H2A.Z at TSSs is increased with silencing of tumor suppressor genes. We also demonstrate that acH2A.Z anti-correlates with promoter H3K27me3 and DNA methylation. We show, for the first time, that acetylation of H2A.Z is a key modification associated with gene activity in normal cells and epigenetic gene deregulation in tumorigenesis.

Publication

The proto CpG island methylator phenotype of sessile serrated adenomas/polyps

Publisher: Informa UK Limited

Date: 02-11-2018

DOI: 10.1080/15592294.2018.1543504

Publication

Comparison of clustering methods for high-dimensional single-cell flow and mass cytometry data

Publisher: Wiley

Date: 12-2016

DOI: 10.1002/CYTO.A.23030

Abstract: Recent technological developments in high-dimensional flow cytometry and mass cytometry (CyTOF) have made it possible to detect expression levels of dozens of protein markers in thousands of cells per second, allowing cell populations to be characterized in unprecedented detail. Traditional data analysis by "manual gating" can be inefficient and unreliable in these high-dimensional settings, which has led to the development of a large number of automated analysis methods. Methods designed for unsupervised analysis use specialized clustering algorithms to detect and define cell populations for further downstream analysis. Here, we have performed an up-to-date, extensible performance comparison of clustering methods for high-dimensional flow and mass cytometry data. We evaluated methods using several publicly available data sets from experiments in immunology, containing both major and rare cell populations, with cell population identities from expert manual gating as the reference standard. Several methods performed well, including FlowSOM, X-shift, PhenoGraph, Rclusterpp, and flowMeans. Among these, FlowSOM had extremely fast runtimes, making this method well-suited for interactive, exploratory analysis of large, high-dimensional data sets on a standard laptop or desktop computer. These results extend previously published comparisons by focusing on high-dimensional data and including new methods developed for CyTOF data. R scripts to reproduce all analyses are available from GitHub (mweber/cytometry-clustering-comparison), and pre-processed data files are available from FlowRepository (FR-FCM-ZZPH), allowing our comparisons to be extended to include new clustering methods and reference data sets. © 2016 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of ISAC.

Publication

Moderated statistical tests for assessing differences in tag abundance

Publisher: Oxford University Press (OUP)

Date: 19-09-2007

DOI: 10.1093/BIOINFORMATICS/BTM453

Abstract: Motivation: Digital gene expression (DGE) technologies measure gene expression by counting sequence tags. They are sensitive technologies for measuring gene expression on a genomic scale, without the need for prior knowledge of the genome sequence. As the cost of sequencing DNA decreases, the number of DGE datasets is expected to grow dramatically. Various tests of differential expression have been proposed for replicated DGE data using binomial, Poisson, negative binomial or pseudo-likelihood (PL) models for the counts, but none of these are usable when the number of replicates is very small. Results: We develop tests using the negative binomial distribution to model overdispersion relative to the Poisson, and use conditional weighted likelihood to moderate the level of overdispersion across genes. Not only is our strategy applicable even with the smallest number of libraries, but it also proves to be more powerful than previous strategies when more libraries are available. The methodology is equally applicable to other counting technologies, such as proteomic spectral counts. Availability: An R package can be accessed from bioinf.wehi.edu.au/resources/ Contact: smyth@wehi.edu.au Supplementary information: bioinf.wehi.edu.au/resources/
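
Illustrative note (not part of the published abstract): "overdispersion relative to the Poisson" can be made concrete with the mean-dispersion form of the negative binomial, in which the variance is mu + phi * mu^2 and the Poisson is the limit phi -> 0. The Python sketch below shows only that distribution. It does not implement the conditional weighted likelihood moderation described in the abstract, and the function names are hypothetical.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-probability of count y under a negative binomial.

    mu:  mean of the tag count (must be > 0).
    phi: dispersion (must be > 0); the variance is mu + phi * mu**2.
    """
    if mu <= 0 or phi <= 0:
        raise ValueError("mu and phi must be positive")
    r = 1.0 / phi  # "size" parameter of the negative binomial
    return (
        math.lgamma(y + r)
        - math.lgamma(r)
        - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )


def nb_variance(mu, phi):
    """Variance implied by mean mu and dispersion phi."""
    return mu + phi * mu ** 2
```

For example, with mu = 5 and phi = 0.5 the variance is 17.5, compared with 5 for a Poisson count with the same mean.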

Publication

An R-based reproducible and user-friendly preprocessing pipeline for CyTOF data

Publisher: F1000 Research Ltd

Date: 22-10-2020

DOI: 10.12688/F1000RESEARCH.26073.1

Abstract: Mass cytometry (CyTOF) has become a method of choice for in-depth characterization of tissue heterogeneity in health and disease, and is currently implemented in multiple clinical trials, where higher quality standards must be met. Currently, preprocessing of raw files is commonly performed in independent standalone tools, which makes it difficult to reproduce. Here, we present an R pipeline based on an updated version of CATALYST that covers all preprocessing steps required for downstream mass cytometry analysis in a fully reproducible way. This new version of CATALYST is based on Bioconductor’s SingleCellExperiment class and fully unit tested. The R-based pipeline includes file concatenation, bead-based normalization, single-cell deconvolution, spillover compensation and live cell gating after debris and doublet removal. Importantly, this pipeline also includes different quality checks to assess machine sensitivity and staining performance while allowing also for batch correction. This pipeline is based on open source R packages and can easily be adapted to different study designs. It therefore has the potential to significantly facilitate the work of CyTOF users while increasing the quality and reproducibility of data generated with this technology.

Publication

An R-based reproducible and user-friendly preprocessing pipeline for CyTOF data

Publisher: F1000 Research Ltd

Date: 08-08-2022

DOI: 10.12688/F1000RESEARCH.26073.2

Abstract: Mass cytometry (CyTOF) has become a method of choice for in-depth characterization of tissue heterogeneity in health and disease, and is currently implemented in multiple clinical trials, where higher quality standards must be met. Currently, preprocessing of raw files is commonly performed in independent standalone tools, which makes it difficult to reproduce. Here, we present an R pipeline based on an updated version of CATALYST that covers all preprocessing steps required for downstream mass cytometry analysis in a fully reproducible way. This new version of CATALYST is based on Bioconductor’s SingleCellExperiment class and fully unit tested. The R-based pipeline includes file concatenation, bead-based normalization, single-cell deconvolution, spillover compensation and live cell gating after debris and doublet removal. Importantly, this pipeline also includes different quality checks to assess machine sensitivity and staining performance while allowing also for batch correction. This pipeline is based on open source R packages and can easily be adapted to different study designs. It therefore has the potential to significantly facilitate the work of CyTOF users while increasing the quality and reproducibility of data generated with this technology.

Publication

Coordinated epigenetic remodelling of transcriptional networks occurs during early breast carcinogenesis

Publisher: Springer Science and Business Media LLC

Date: 05-2015

DOI: 10.1186/S13148-015-0086-0

Publication

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

Publisher: F1000 Research Ltd

Date: 16-11-2020

DOI: 10.12688/F1000RESEARCH.15666.3

Abstract: Subpopulation identification, usually via some form of unsupervised clustering, is a fundamental step in the analysis of many single-cell RNA-seq data sets. This has motivated the development and application of a broad range of clustering methods, based on various underlying algorithms. Here, we provide a systematic and extensible performance evaluation of 14 clustering algorithms implemented in R, including both methods developed explicitly for scRNA-seq data and more general-purpose methods. The methods were evaluated using nine publicly available scRNA-seq data sets as well as three simulations with varying degree of cluster separability. The same feature selection approaches were used for all methods, allowing us to focus on the investigation of the performance of the clustering algorithms themselves. We evaluated the ability of recovering known subpopulations, the stability and the run time and scalability of the methods. Additionally, we investigated whether the performance could be improved by generating consensus partitions from multiple individual clustering methods. We found substantial differences in the performance, run time and stability between the methods, with SC3 and Seurat showing the most favorable results. Additionally, we found that consensus clustering typically did not improve the performance compared to the best of the combined methods, but that several of the top-performing methods already perform some type of consensus clustering. All the code used for the evaluation is available on GitHub ( arkrobinsonuzh/scRNAseq_clustering_comparison ). In addition, an R package providing access to data and clustering results, thereby facilitating inclusion of new methods and data sets, is available from Bioconductor ( ackages/DuoClustering2018 ).

Publication

High-dimensional single-cell analysis reveals the immune signature of narcolepsy

Publisher: Rockefeller University Press

Date: 07-11-2016

DOI: 10.1084/JEM.20160897

Abstract: Narcolepsy type 1 is a devastating neurological sleep disorder resulting from the destruction of orexin-producing neurons in the central nervous system (CNS). Despite its striking association with the HLA-DQB1*06:02 allele, the autoimmune etiology of narcolepsy has remained largely hypothetical. Here, we compared peripheral mononucleated cells from narcolepsy patients with HLA-DQB1*06:02-matched healthy controls using high-dimensional mass cytometry in combination with algorithm-guided data analysis. Narcolepsy patients displayed multifaceted immune activation in CD4+ and CD8+ T cells dominated by elevated levels of B cell–supporting cytokines. Additionally, T cells from narcolepsy patients showed increased production of the proinflammatory cytokines IL-2 and TNF. Although it remains to be established whether these changes are primary to an autoimmune process in narcolepsy or secondary to orexin deficiency, these findings are indicative of inflammatory processes in the pathogenesis of this enigmatic disease.

Publication

treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses

Publisher: Springer Science and Business Media LLC

Date: 17-05-2021

DOI: 10.1186/S13059-021-02368-1

Abstract: treeclimbR is for analyzing hierarchical trees of entities, such as phylogenies or cell types, at different resolutions. It proposes multiple candidates that capture the latent signal and pinpoints branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single-cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.

Publication

A comprehensive single-cell atlas of freshly-dissociated human synovium in inflammatory arthritis with an optimized dissociation protocol for prospective fresh synovial biopsy collection

Publisher: Research Square Platform LLC

Date: 22-06-2022

DOI: 10.21203/RS.3.RS-1702574/V1

Abstract: Single-cell RNA-sequencing is advancing our understanding of synovial pathobiology in inflammatory arthritis. Here, we optimized the protocol for dissociation of synovial biopsies and created a comprehensive reference single-cell atlas of fresh human synovium in inflammatory arthritis. We derived our protocol from the published dissociation method for cryopreserved synovium (Donlin L. et al. Arthritis Res. Ther. 2019) with modifications to enrich synovial cells and minimize cell loss. These modifications enabled consistently high cell yield and viability, thereby minimizing the rate of synovial tissue sample dropout. Our single-cell atlas of the human synovium comprised more than 100’000 unsorted single-cell profiles from 27 synovia of patients with inflammatory arthritis. Synovial cells formed ten lymphoid, 14 myeloid and 17 stromal cell clusters, including IFITM2+ synovial neutrophils. We identified lining SOD2 high SAA1+SAA2+ and transitional SERPINE1+COL5A3+ synovial fibroblasts, exhibiting gene signatures linked to cartilage breakdown (SDC4) and extracellular matrix remodelling (LOXL2, TGFBI, TGFB1), respectively. We uncovered synovial endothelial cell diversity and broadened the transcriptional characterization of tissue-resident FOLR2+ COLEC12+ and SLC40A1+ synovial macrophages, inferring their extracellular matrix sensing and iron recycling activities. Our research brings an efficient synovium dissociation protocol for prospectively collected fresh synovial biopsies and expands the knowledge about human synovium composition in inflammatory arthritis.

Publication

Non-parent of Origin Expression of Numerous Effector Genes Indicates a Role of Gene Regulation in Host Adaption of the Hybrid Triticale Powdery Mildew Pathogen

Publisher: Frontiers Media SA

Date: 30-01-2018

DOI: 10.3389/FPLS.2018.00049

Publication

Epigenetic silencing of monoallelically methylated miRNA loci in precancerous colorectal lesions

Publisher: Springer Science and Business Media LLC

Date: 15-07-2013

DOI: 10.1038/ONCSIS.2013.21

Publication

Single nuclei RNAseq stratifies multiple sclerosis patients into distinct white matter glial responses

Publisher: Cold Spring Harbor Laboratory

Date: 09-04-2022

DOI: 10.1101/2022.04.06.487263

Abstract: The lack of understanding as to the cellular and molecular basis of clinical and genetic heterogeneity in progressive multiple sclerosis (MS) has hindered the search for new effective therapies and biomarkers. Here, to address this gap, we analysed 740,000 single nuclei RNAseq profiles of 165 samples of white matter (WM) lesions, normal appearing WM, grey matter (GM) lesions and normal appearing GM from 55 MS patients and 28 controls. We find that gene expression changes in response to MS are highly cell-type specific in WM and GM lesions but are largely shared within an individual cell-type across lesions, following a continuum rather than discrete lesion-specific molecular programs. The major biological determinants of variability in gene expression in MS samples relate to individual patient effects, rather than to lesion types or other metadata. Using multi-omics factor analysis (MOFA+), we identify three subgroups of MS patients with distinct oligodendrocyte composition and WM glial gene expression signatures, suggestive of engagement of different pathological/regenerative processes. The discovery of these three patterns significantly advances our mechanistic understanding of progressive MS, provides a framework to use molecular biomarkers to stratify patients for best therapeutic approaches for progressive MS, and highlights the need for precision-medicine approaches to address heterogeneity among MS patients.

Publication

A comparison of Affymetrix gene expression arrays.

Publisher: Springer Science and Business Media LLC

Date: 15-11-2007

DOI: 10.1186/1471-2105-8-449

Publication

Repitools: an R package for the analysis of enrichment-based epigenomic data

Publisher: Oxford University Press (OUP)

Date: 10-05-2010

DOI: 10.1093/BIOINFORMATICS/BTQ247

Abstract: Summary: Epigenetics, the study of heritable somatic phenotypic changes not related to DNA sequence, has emerged as a critical component of the landscape of gene regulation. The epigenetic layers, such as DNA methylation, histone modifications and nuclear architecture are now being extensively studied in many cell types and disease settings. Few software tools exist to summarize and interpret these datasets. We have created a toolbox of procedures to interrogate and visualize epigenomic data (both array- and sequencing-based) and make available a software package for the cross-platform R language. Availability: The package is freely available under LGPL from the R-Forge web site (repitools.r-forge.r-project.org/) Contact: mrobinson@wehi.edu.au

Publication

Consolidation of the cancer genome into domains of repressive chromatin by long-range epigenetic silencing (LRES) reduces transcriptional plasticity

Publisher: Springer Science and Business Media LLC

Date: 21-02-2010

DOI: 10.1038/NCB2023

Publication

Tandem repeat variation in human and great ape populations and its impact on gene expression divergence

Publisher: Cold Spring Harbor Laboratory

Date: 19-08-2015

DOI: 10.1101/GR.190868.115

Abstract: Tandem repeats (TRs) are stretches of DNA that are highly variable in length and mutate rapidly. They are thus an important source of genetic variation. This variation is highly informative for population and conservation genetics. It has also been associated with several pathological conditions and with gene expression regulation. However, genome-wide surveys of TR variation in humans and closely related species have been scarce due to technical difficulties derived from short-read technology. Here we explored the genome-wide diversity of TRs in a panel of 83 human and nonhuman great ape genomes, in a total of six different species, and studied their impact on gene expression evolution. We found that population diversity patterns can be efficiently captured with short TRs (repeat unit length, 1–5 bp). We examined the potential evolutionary role of TRs in gene expression differences between humans and primates by using 30,275 larger TRs (repeat unit length, 2–50 bp). Genes that contained TRs in the promoters, in their 3′ untranslated region, in introns, and in exons had higher expression divergence than genes without repeats in the regions. Polymorphic small repeats (1–5 bp) had also higher expression divergence compared with genes with fixed or no TRs in the gene promoters. Our findings highlight the potential contribution of TRs to human evolution through gene regulation.

Publication

Identifying transcription factor functions and targets by phenotypic activation

Publisher: Proceedings of the National Academy of Sciences

Date: 08-08-2006

DOI: 10.1073/PNAS.0605140103

Abstract: Mapping transcriptional regulatory networks is difficult because many transcription factors (TFs) are activated only under specific conditions. We describe a generic strategy for identifying genes and pathways induced by individual TFs that does not require knowledge of their normal activation cues. Microarray analysis of 55 yeast TFs that caused a growth phenotype when overexpressed showed that the majority caused increased transcript levels of genes in specific physiological categories, suggesting a mechanism for growth inhibition. Induced genes typically included established targets and genes with consensus promoter motifs, if known, indicating that these data are useful for identifying potential new target genes and binding sites. We identified the sequence 5′-TCACGCAA as a binding sequence for Hms1p, a TF that positively regulates pseudohyphal growth and previously had no known motif. The general strategy outlined here presents a straightforward approach to discovery of TF activities and mapping targets that could be adapted to any organism with transgenic technology.

Publication

BANDITS: Bayesian differential splicing accounting for sample-to-sample variability and mapping uncertainty

Publisher: Springer Science and Business Media LLC

Date: 16-03-2020

DOI: 10.1186/S13059-020-01967-8

Abstract: Alternative splicing is a biological process during gene expression that allows a single gene to code for multiple proteins. However, splicing patterns can be altered in some conditions or diseases. Here, we present BANDITS, an R/Bioconductor package to perform differential splicing, at both gene and transcript level, based on RNA-seq data. BANDITS uses a Bayesian hierarchical structure to explicitly model the variability between samples and treats the transcript allocation of reads as latent variables. We perform an extensive benchmark across both simulated and experimental RNA-seq datasets, where BANDITS has extremely favourable performance with respect to the competitors considered.

Publication

Regional Activation of the Cancer Genome by Long-Range Epigenetic Remodeling

Publisher: Elsevier BV

Date: 2013

DOI: 10.1016/J.CCR.2012.11.006

Abstract: Epigenetic gene deregulation in cancer commonly occurs through chromatin repression and promoter hypermethylation of tumor-associated genes. However, the mechanism underpinning epigenetic-based gene activation in carcinogenesis is still poorly understood. Here, we identify a mechanism of domain gene deregulation through coordinated long-range epigenetic activation (LREA) of regions that typically span 1 Mb and harbor key oncogenes, microRNAs, and cancer biomarker genes. Gene promoters within LREA domains are characterized by a gain of active chromatin marks and a loss of repressive marks. Notably, although promoter hypomethylation is uncommon, we show that extensive DNA hypermethylation of CpG islands or "CpG-island borders" is strongly related to cancer-specific gene activation or differential promoter usage. These findings have wide ramifications for cancer diagnosis, progression, and epigenetic-based gene therapies.

Publication

MiR-CLIP reveals iso-miR selective regulation in the miR-124 targetome

Publisher: Oxford University Press (OUP)

Date: 09-12-2020

DOI: 10.1093/NAR/GKAA1117

Abstract: Many microRNAs regulate gene expression via atypical mechanisms, which are difficult to discern using native cross-linking methods. To ascertain the scope of non-canonical miRNA targeting, methods are needed that identify all targets of a given miRNA. We designed a new class of miR-CLIP probe, whereby psoralen is conjugated to the 3p arm of a pre-microRNA to capture targetomes of miR-124 and miR-132 in HEK293T cells. Processing of pre-miR-124 yields miR-124 and a 5′-extended isoform, iso-miR-124. Using miR-CLIP, we identified overlapping targetomes from both isoforms. From a set of 16 targets, 13 were differently inhibited at mRNA/protein levels by the isoforms. Moreover, delivery of pre-miR-124 into cells repressed these targets more strongly than individual treatments with miR-124 and iso-miR-124, suggesting that isomirs from one pre-miRNA may function synergistically. By mining the miR-CLIP targetome, we identified nine G-bulged target-sites that are regulated at the protein level by miR-124 but not isomiR-124. Using structural data, we propose a model involving AGO2 helix-7 that suggests why only miR-124 can engage these sites. In summary, access to the miR-124 targetome via miR-CLIP revealed for the first time how heterogeneous processing of miRNAs combined with non-canonical targeting mechanisms expand the regulatory range of a miRNA.

Publication

Large-scale prediction of Saccharomyces cerevisiae gene function using overlapping transcriptional clusters.

Publisher: Springer Science and Business Media LLC

Date: 24-06-2002

DOI: 10.1038/NG906

Publication

RNA recognition by Npl3p reveals U2 snRNA-binding compatible with a chaperone role during splicing

Publisher: Research Square Platform LLC

Date: 23-09-2022

DOI: 10.21203/RS.3.RS-2017343/V1

Abstract: The conserved SR-like protein Npl3 promotes splicing of diverse pre-mRNAs. However, the RNA sequence(s) recognized by the RNA Recognition Motifs (RRM1 & RRM2) of Npl3 during the splicing reaction remain elusive. Here, we developed a split-iCRAC approach in yeast to uncover the consensus sequence bound to each RRM. High-resolution NMR structures show that RRM2 recognizes a 5´-GNGG-3´ motif leading to an unusual mille-feuille topology. These structures also reveal how RRM1 preferentially interacts with a CC-dinucleotide upstream of this motif, and how the inter-RRM linker and the region C-terminal to RRM2 contribute to cooperative RNA-binding. Structure-guided functional studies show that Npl3 genetically interacts with U2 snRNP specific factors and we provide evidence that Npl3 melts U2 snRNA stem-loop I, a prerequisite for U2/U6 duplex formation within the catalytic center of the B act spliceosomal complex. Thus, our findings suggest an unanticipated RNA chaperoning role for Npl3 during spliceosome active site formation.

Publication

Censcyt: censored covariates in differential abundance analysis in cytometry

Publisher: Springer Science and Business Media LLC

Date: 10-05-2021

DOI: 10.1186/S12859-021-04125-4

Abstract: Innovations in single cell technologies have led to a flurry of datasets and computational tools to process and interpret them, including analyses of cell composition changes and transition in cell states. The diffcyt workflow for differential discovery in cytometry data consists of several steps, including preprocessing, cell population identification and differential testing for an association with a binary or continuous covariate. However, the commonly measured quantity of survival time in clinical studies often results in a censored covariate where classical differential testing is inapplicable. To overcome this limitation, multiple methods to directly include censored covariates in differential abundance analysis were examined with the use of simulation studies and a case study. Results show that multiple imputation based methods offer on-par performance with the Cox proportional hazards model in terms of sensitivity and error control, while offering flexibility to account for covariates. The tested methods are implemented in the package censcyt as an extension of diffcyt and are available at ackages/censcyt . Methods for the direct inclusion of a censored variable as a predictor in GLMMs are a valid alternative to classical survival analysis methods, such as the Cox proportional hazard model, while allowing for more flexibility in the differential analysis.

Publication

FIRMA: a method for detection of alternative splicing from exon array data

Publisher: Oxford University Press (OUP)

Date: 23-06-2008

DOI: 10.1093/BIOINFORMATICS/BTN284

Abstract: Motivation: Analyses of EST data show that alternative splicing is much more widespread than once thought. The advent of exon and tiling microarrays means that researchers now have the capacity to experimentally measure alternative splicing on a genome wide level. New methods are needed to analyze the data from these arrays. Results: We present a method, finding isoforms using robust multichip analysis (FIRMA), for detecting differential alternative splicing in exon array data. FIRMA has been developed for Affymetrix exon arrays, but could in principle be extended to other exon arrays, tiling arrays or splice junction arrays. We have evaluated the method using simulated data, and have also applied it to two datasets: a panel of 11 human tissues and a set of 10 pairs of matched normal and tumor colon tissue. FIRMA is able to detect exons in several genes confirmed by reverse transcriptase PCR. Availability: R code implementing our methods is contributed to the package aroma.affymetrix. Contact: epurdom@stat.berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Publisher: F1000 Research Ltd

Date: 29-02-2016

DOI: 10.12688/F1000RESEARCH.7563.2

Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Various quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that the presence of differential isoform usage can lead to inflated false discovery rates in differential gene expression analyses on simple count matrices but that this can be addressed by incorporating offsets derived from transcript-level abundance estimates. We also show that the problem is relatively minor in several real data sets. Finally, we provide an R package ( tximport ) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

Publication

A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments.

Publisher: Springer Science and Business Media LLC

Date: 29-10-2007

DOI: 10.1186/1471-2105-8-419

Abstract: Gas chromatography-mass spectrometry (GC-MS) is a robust platform for the profiling of certain classes of small molecules in biological samples. When multiple samples are profiled, including replicates of the same sample and/or different sample states, one needs to account for retention time drifts between experiments. This can be achieved either by the alignment of chromatographic profiles prior to peak detection, or by matching signal peaks after they have been extracted from chromatogram data matrices. Automated retention time correction is particularly important in non-targeted profiling studies. A new approach for matching signal peaks based on dynamic programming is presented. The proposed approach relies on both peak retention times and mass spectra. The alignment of more than two peak lists involves three steps: (1) all possible pairs of peak lists are aligned, and similarity of each pair of peak lists is estimated; (2) the guide tree is built based on the similarity between the peak lists; (3) peak lists are progressively aligned starting with the two most similar peak lists, following the guide tree until all peak lists are exhausted. When two or more experiments are performed on different sample states and each consisting of multiple replicates, peak lists within each set of replicate experiments are aligned first (within-state alignment), and subsequently the resulting alignments are aligned themselves (between-state alignment). When more than two sets of replicate experiments are present, the between-state alignment also employs the guide tree. We demonstrate the usefulness of this approach on GC-MS metabolic profiling experiments acquired on wild-type and mutant Leishmania mexicana parasites. We propose a progressive method to match signal peaks across multiple GC-MS experiments based on dynamic programming. A sensitive peak similarity function is proposed to balance peak retention time and peak mass spectra similarities. 
This approach can produce the optimal alignment between an arbitrary number of peak lists, and models explicitly within-state and between-state peak alignment. The accuracy of the proposed method was close to the accuracy of manually-curated peak matching, which required tens of man-hours for the analyzed data sets. The proposed approach may offer significant advantages for processing of high-throughput metabolomics data, especially when large numbers of experimental replicates and multiple sample states are analyzed.
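
Illustrative note: the pairwise step described in the abstract above (aligning two peak lists by dynamic programming, scoring on both retention time and mass spectrum) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the similarity function (cosine similarity of spectra damped by a Gaussian retention-time term), the `rt_tol` and `min_sim` parameters, and all function names are hypothetical, and the guide-tree and progressive multi-list steps are not covered.

```python
import math


def peak_similarity(p, q, rt_tol=2.0):
    """Hypothetical score in [0, 1] for two peaks given as (retention_time, spectrum).

    Cosine similarity of the mass spectra, multiplied by a Gaussian
    penalty on the retention-time difference (assumed form).
    """
    rt_a, spec_a = p
    rt_b, spec_b = q
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm = math.sqrt(sum(a * a for a in spec_a)) * math.sqrt(sum(b * b for b in spec_b))
    cosine = dot / norm if norm else 0.0
    rt_term = math.exp(-((rt_a - rt_b) ** 2) / (2 * rt_tol ** 2))
    return cosine * rt_term


def align_peaks(list_a, list_b, min_sim=0.5, rt_tol=2.0):
    """Order-preserving pairwise alignment of two peak lists by dynamic programming.

    Returns index pairs (i, j) that maximize the summed similarity of
    matched peaks; peaks may stay unmatched, and a pair is only allowed
    when its similarity reaches min_sim (assumed threshold).
    """
    n, m = len(list_a), len(list_b)
    sim = [[peak_similarity(a, b, rt_tol) for b in list_b] for a in list_a]

    # score[i][j]: best total for the first i peaks of A and first j peaks of B
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(score[i - 1][j], score[i][j - 1])  # leave a peak unmatched
            if sim[i - 1][j - 1] >= min_sim:
                best = max(best, score[i - 1][j - 1] + sim[i - 1][j - 1])
            score[i][j] = best

    # Trace back from the bottom-right corner to recover the matched pairs
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = sim[i - 1][j - 1]
        if s >= min_sim and score[i][j] == score[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    run_a = [(10.0, [1, 0, 0]), (20.0, [0, 1, 0]), (30.0, [0, 0, 1])]
    run_b = [(10.5, [1, 0, 0]), (30.4, [0, 0, 1])]
    print(align_peaks(run_a, run_b))
```

In this toy input the middle peak of the first run has no counterpart in the second run and is left unmatched, which is the behaviour a retention-time-aware matcher needs when a compound is missing from one experiment.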

Publication

Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences

Publisher: F1000 Research Ltd

Date: 30-12-2015

DOI: 10.12688/F1000RESEARCH.7563.1

Abstract: High-throughput sequencing of cDNA (RNA-seq) is used extensively to characterize the transcriptome of cells. Many transcriptomic studies aim at comparing either abundance levels or the transcriptome composition between given conditions, and as a first step, the sequencing reads must be used as the basis for abundance quantification of transcriptomic features of interest, such as genes or transcripts. Several different quantification approaches have been proposed, ranging from simple counting of reads that overlap given genomic regions to more complex estimation of underlying transcript abundances. In this paper, we show that gene-level abundance estimates and statistical inference offer advantages over transcript-level analyses, in terms of performance and interpretability. We also illustrate that while the presence of differential isoform usage can lead to inflated false discovery rates in differential expression analyses on simple count matrices and transcript-level abundance estimates improve the performance in simulated data, the difference is relatively minor in several real data sets. Finally, we provide an R package ( tximport ) to help users integrate transcript-level abundance estimates from common quantification pipelines into count-based statistical inference engines.

Publication

Compensation of Signal Spillover in Suspension and Imaging Mass Cytometry

Publisher: Elsevier BV

Date: 05-2018

DOI: 10.1016/J.CELS.2018.02.010

Publication

An Optimized Tissue Dissociation Protocol for Single-Cell RNA Sequencing Analysis of Fresh and Cultured Human Skin Biopsies

Publisher: Frontiers Media SA

Date: 28-04-2022

DOI: 10.3389/FCELL.2022.872688

Abstract: We present an optimized dissociation protocol for preparing high-quality skin cell suspensions for in-depth single-cell RNA-sequencing (scRNA-seq) analysis of fresh and cultured human skin. Our protocol enabled the isolation of a consistently high number of highly viable skin cells from small freshly dissociated punch skin biopsies, which we use for scRNA-seq studies. We recapitulated not only the main cell populations of existing single-cell skin atlases, but also identified rare cell populations, such as mast cells. Furthermore, we effectively isolated highly viable single cells from ex vivo cultured skin biopsy fragments and generated a global single-cell map of the explanted human skin. The quality metrics of the generated scRNA-seq datasets were comparable between freshly dissociated and cultured skin. Overall, by enabling efficient cell isolation and comprehensive cell mapping, our skin dissociation-scRNA-seq workflow can greatly facilitate scRNA-seq discoveries across diverse human skin pathologies and ex vivo skin explant experimentations.

Publication

Synaptic FUS accumulation triggers early misregulation of synaptic RNAs in a mouse model of ALS

Publisher: Springer Science and Business Media LLC

Date: 21-05-2021

DOI: 10.1038/S41467-021-23188-8

Abstract: Mutations disrupting the nuclear localization of the RNA-binding protein FUS characterize a subset of amyotrophic lateral sclerosis patients (ALS-FUS). FUS regulates nuclear RNAs, but its role at the synapse is poorly understood. Using super-resolution imaging we determined that the localization of FUS within synapses occurs predominantly near the vesicle reserve pool of presynaptic sites. Using CLIP-seq on synaptoneurosomes, we identified synaptic FUS RNA targets, encoding proteins associated with synapse organization and plasticity. Significant increase of synaptic FUS during early disease in a mouse model of ALS was accompanied by alterations in density and size of GABAergic synapses. mRNAs abnormally accumulated at the synapses of 6-month-old ALS-FUS mice were enriched for FUS targets and correlated with those depicting increased short-term mRNA stability via binding primarily on multiple exonic sites. Our study indicates that synaptic FUS accumulation in early disease leads to synaptic impairment, potentially representing an initial trigger of neurodegeneration.

Publication

H3K4me3 enrichment defines neuronal age, while a youthful H3K27ac signature is recapitulated in aged neurons

Publisher: Cold Spring Harbor Laboratory

Date: 12-11-2021

DOI: 10.1101/2021.11.11.467877

Abstract: Neurons live for the lifespan of the individual and underlie our ability for lifelong learning and memory. However, aging alters neuron morphology and function resulting in age-related cognitive decline. It is well established that epigenetic alterations are essential for learning and memory, yet few neuron-specific genome-wide epigenetic maps exist into old age. Comprehensive mapping of H3K4me3 and H3K27ac in mouse neurons across lifespan revealed plastic H3K4me3 marking that differentiates neuronal age linked to known characteristics of cellular and neuronal aging. We determined that neurons in old age recapitulate the H3K27ac enrichment at promoters, enhancers and super enhancers from young adult neurons, likely representing a re-activation of pathways to maintain neuronal output. Finally, this study identified new characteristics of neuronal aging, including altered rDNA regulation and epigenetic regulatory mechanisms. Collectively, these findings indicate a key role for epigenetic regulation in neurons, that is inextricably linked with aging.

Publication

CD8+ T cells retain protective functions despite sustained inhibitory receptor expression during Epstein-Barr virus infection in vivo

Publisher: Public Library of Science (PLoS)

Date: 30-05-2019

DOI: 10.1371/JOURNAL.PPAT.1007748

Publication

Covalent linkage of the DNA repair template to the CRISPR-Cas9 nuclease enhances homology-directed repair

Publisher: eLife Sciences Publications, Ltd

Date: 29-05-2018

DOI: 10.7554/ELIFE.33761

Abstract: The CRISPR-Cas9 targeted nuclease technology allows the insertion of genetic modifications with single base-pair precision. The preference of mammalian cells to repair Cas9-induced DNA double-strand breaks via error-prone end-joining pathways rather than via homology-directed repair mechanisms, however, leads to relatively low rates of precise editing from donor DNA. Here we show that spatial and temporal co-localization of the donor template and Cas9 via covalent linkage increases the correction rates up to 24-fold, and demonstrate that the effect is mainly caused by an increase of donor template concentration in the nucleus. Enhanced correction rates were observed in multiple cell types and on different genomic loci, suggesting that covalently linking the donor template to the Cas9 complex provides advantages for clinical applications where high-fidelity repair is desired.

Publication

ReSeq simulates realistic Illumina high-throughput sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 17-07-2020

DOI: 10.1101/2020.07.17.209072

Abstract: In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions in the data processing from raw data to the scientific result. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to a better representation of the original k-mer spectrum and more faithful performance evaluations. ReSeq and all of its code are available at: chmeing/ReSeq

Publication

Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage

Publisher: Springer Science and Business Media LLC

Date: 26-01-2016

DOI: 10.1186/S13059-015-0862-3

Publication

High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy

Publisher: Springer Science and Business Media LLC

Date: 08-01-2018

DOI: 10.1038/NM.4466

Abstract: Immune-checkpoint blockade has revolutionized cancer therapy. In particular, inhibition of programmed cell death protein 1 (PD-1) has been found to be effective for the treatment of metastatic melanoma and other cancers. Despite a dramatic increase in progression-free survival, a large proportion of patients do not show durable responses. Therefore, predictive biomarkers of a clinical response are urgently needed. Here we used high-dimensional single-cell mass cytometry and a bioinformatics pipeline for the in-depth characterization of the immune cell subsets in the peripheral blood of patients with stage IV melanoma before and after 12 weeks of anti-PD-1 immunotherapy. During therapy, we observed a clear response to immunotherapy in the T cell compartment. However, before commencing therapy, a strong predictor of progression-free and overall survival in response to anti-PD-1 immunotherapy was the frequency of CD14

Publication

Small-sample estimation of negative binomial dispersion, with applications to SAGE data

Publisher: Oxford University Press (OUP)

Date: 11-07-2007

DOI: 10.1093/BIOSTATISTICS/KXM030

Abstract: We derive a quantile-adjusted conditional maximum likelihood estimator for the dispersion parameter of the negative binomial distribution and compare its performance, in terms of bias, to various other methods. Our estimation scheme outperforms all other methods in very small samples, typical of those from serial analysis of gene expression studies, the motivating data for this study. The impact of dispersion estimation on hypothesis testing is studied. We derive an "exact" test that outperforms the standard approximate asymptotic tests.

Publication

ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw

Publisher: Cold Spring Harbor Laboratory

Date: 16-07-2020

DOI: 10.1101/2020.07.16.206193

Abstract: Whole genome duplication (WGD) events are common in the evolutionary history of many living organisms. For decades, researchers have been trying to understand the genetic and epigenetic impact of WGD and its underlying molecular mechanisms. Particular attention was given to allopolyploid study systems, species resulting from a hybridization event accompanied by WGD. Investigating the mechanisms behind the survival of a newly formed allopolyploid highlighted the key role of DNA methylation. With the improvement of high-throughput methods, such as whole genome bisulfite sequencing (WGBS), an opportunity opened to further understand the role of DNA methylation at a larger scale and higher resolution. However, only a few studies have applied WGBS to allopolyploids, which might be due to lack of genomic resources combined with a burdensome data analysis process. To overcome these problems, we developed the Automated Reproducible Polyploid EpiGenetic GuIdance workflOw (ARPEGGIO): the first workflow for the analysis of epigenetic data in polyploids. This workflow analyzes WGBS data from allopolyploid species via the genome assemblies of the allopolyploid’s parent species. ARPEGGIO utilizes an updated read classification algorithm (EAGLE-RC), to tackle the challenge of sequence similarity amongst parental genomes. ARPEGGIO offers automation, but more importantly, a complete set of analyses including spot checks starting from raw WGBS data: quality checks, trimming, alignment, methylation extraction, statistical analyses and downstream analyses. A full run of ARPEGGIO outputs a list of genes showing differential methylation. ARPEGGIO’s design focuses on ease of use and reproducibility. ARPEGGIO was made simple to set up, run and interpret, and its implementation includes both package management and containerization. Here we discuss all the steps, challenges and implementation strategies; example datasets are provided to show how to use ARPEGGIO. 
In addition, we also test EAGLE-RC with publicly available datasets given a ground truth, and we show that EAGLE-RC decreases the error rate by 3 to 4 times compared to standard approaches. The goal of ARPEGGIO is to promote, support and improve polyploid research with a reproducible and automated set of analyses in a convenient implementation.

Publication

Abstract 4225: Is biomarker-driven precision medicine possible by using high dimensional augmented intelligence assisted analysis of cancer immune responses

Publisher: American Association for Cancer Research (AACR)

Date: 07-2019

DOI: 10.1158/1538-7445.AM2019-4225

Abstract: Checkpoint inhibitors have significantly accelerated cancer treatment but still a majority of patients do not respond. Biomarker driven patient stratification early to the right immunotherapeutic might enhance response and patient survival. Here we used high-dimensional mass cytometry (CyTOF) combined with machine-learning bioinformatics for the in-depth characterization of immune responses before and during anti-PD-1 immunotherapy. CyTOF allows us to monitor protein expression of 34 markers on a single cell while running 20 samples simultaneously. The analysis is data driven, can be adapted to high throughput approaches and can model arbitrary trial designs such as batch effects and paired designs and is quantitative over millions of events. Using CyTOF as a precision medicine tool we could predict response to anti-PD-1 using liquid blood biopsies. Biobanked peripheral blood mononuclear cells (PBMCs) from 51 patients with stage IV melanoma before and after 12 weeks of anti-PD-1 therapy were analyzed. We observed a clear T cell response on therapy. The most evident difference in responders before therapy was an enhanced frequency of CD14+ CD16+HLA-DRhi classical monocytes. We validated our results using conventional flow and found a clear correlation of enhanced monocyte frequencies before therapy initiation with clinical response such as lower hazard and extended progression-free and overall survival. In a second study we used CyTOF to monitor immune response in 21 non-small cell lung cancer (NSCLC) patients that initially responded and then progressed under anti-PD-1 to a novel combination immunotherapy of anti-PD-1 plus an IL-15 super-agonist (ALT-803). In this phase Ib clinical study a response in the CD8+ T cell compartment was observed. Unexpectedly, our high dimensional unbiased analysis was able to detect and characterize a strong expansion of innate tumor-reactive effector NK cells starting around day 4 of therapy. 
Taken together, our unbiased artificial intelligence driven immune workflow might support patient selection prior to therapy, and serve as a novel tool for precision medicine to select the right drug combination and identify new drug-able cell populations. Citation Format: Carsten Krieg, Luis Cardenas, Silvia Guglietta, John Wrangle, Mark Rubinstein, Mark Robinson. Is biomarker-driven precision medicine possible by using high dimensional augmented intelligence assisted analysis of cancer immune responses [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019 (13 Suppl):Abstract nr 4225.

Publication

CrispRVariants charts the mutation spectrum of genome engineering experiments

Publisher: Springer Science and Business Media LLC

Date: 07-2016

DOI: 10.1038/NBT.3628

Publication

Hand2 delineates mesothelium progenitors and is reactivated in mesothelioma

Publisher: Cold Spring Harbor Laboratory

Date: 11-11-2020

DOI: 10.1101/2020.11.11.355693

Abstract: The mesothelium forms epithelial membranes that line the body's cavities and surround the internal organs. Mesothelia widely contribute to organ homeostasis and regeneration, and their dysregulation can result in congenital anomalies of the viscera, ventral wall defects, and mesothelioma tumors. Nonetheless, the embryonic ontogeny and developmental regulation of mesothelium formation has remained uncharted. Here, we combine genetic lineage tracing, in toto live imaging, and single-cell transcriptomics in zebrafish to track mesothelial progenitor origins from the lateral plate mesoderm (LPM). Our single-cell analysis uncovers a post-gastrulation gene expression signature centered on hand2 that delineates distinct progenitor populations within the forming LPM. Combining gene expression analysis and imaging of transgenic reporter zebrafish embryos, we chart the origin of mesothelial progenitors to the lateral-most, hand2 -expressing LPM and confirm evolutionary conservation in mouse. Our time-lapse imaging of transgenic hand2 reporter embryos captures zebrafish mesothelium formation, documenting the coordinated cell movements that form pericardium and visceral and parietal peritoneum. We establish that the primordial germ cells migrate associated with the forming mesothelium as ventral migration boundary. Functionally, hand2 mutants fail to close the ventral mesothelium due to perturbed migration of mesothelium progenitors. Analyzing mouse and human mesothelioma tumors hypothesized to emerge from transformed mesothelium, we find de novo expression of LPM-associated transcription factors, and in particular of Hand2, indicating the re-initiation of a developmental transcriptional program in mesothelioma. Taken together, our work outlines a genetic and developmental signature of mesothelial origins centered around Hand2, contributing to our understanding of mesothelial pathologies and mesothelioma.

Publication

A scaling normalization method for differential expression analysis of RNA-seq data

Publisher: Springer Science and Business Media LLC

Date: 2010

DOI: 10.1186/GB-2010-11-3-R25

Publication

The synthetic genetic interaction spectrum of essential genes.

Publisher: Springer Science and Business Media LLC

Date: 11-09-2005

DOI: 10.1038/NG1640

Abstract: The nature of synthetic genetic interactions involving essential genes (those required for viability) has not been previously examined in a broad and unbiased manner. We crossed yeast strains carrying promoter-replacement alleles for more than half of all essential yeast genes to a panel of 30 different mutants with defects in diverse cellular processes. The resulting genetic network is biased toward interactions between functionally related genes, enabling identification of a previously uncharacterized essential gene (PGA1) required for specific functions of the endoplasmic reticulum. But there are also many interactions between genes with dissimilar functions, suggesting that individual essential genes are required for buffering many cellular processes. The most notable feature of the essential synthetic genetic network is that it has an interaction density five times that of nonessential synthetic genetic networks, indicating that most yeast genetic interactions involve at least one essential gene.

Publication

MyMED: A database system for biomedical research on MEDLINE data

Publisher: IBM

Date: 2004

DOI: 10.1147/SJ.2004.5386762

Publication

A unique enhancer boundary complex on the mouse ribosomal RNA genes persists after loss of Rrn3 or UBF and the inactivation of RNA polymerase I transcription

Publisher: Public Library of Science (PLoS)

Date: 17-07-2017

DOI: 10.1371/JOURNAL.PGEN.1006899

Publication

Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

Publisher: Public Library of Science (PLoS)

Date: 29-07-2011

DOI: 10.1371/JOURNAL.PONE.0022953

Publication

High-Throughput Mapping of a Dynamic Signaling Network in Mammalian Cells

Publisher: American Association for the Advancement of Science (AAAS)

Date: 11-03-2005

DOI: 10.1126/SCIENCE.1105776

Abstract: Signaling pathways transmit information through protein interaction networks that are dynamically regulated by complex extracellular cues. We developed LUMIER (for luminescence-based mammalian interactome mapping), an automated high-throughput technology, to map protein-protein interaction networks systematically in mammalian cells and applied it to the transforming growth factor-β (TGFβ) pathway. Analysis using self-organizing maps and k-means clustering identified links of the TGFβ pathway to the p21-activated kinase (PAK) network, to the polarity complex, and to Occludin, a structural component of tight junctions. We show that Occludin regulates TGFβ type I receptor localization for efficient TGFβ-dependent dissolution of tight junctions during epithelial-to-mesenchymal transitions.

Publication

Genomics by the beach

Publisher: Springer Science and Business Media LLC

Date: 2014

DOI: 10.1186/GB4171

Publication

Genome-wide analysis of mRNA stability using transcription inhibitors and microarrays reveals posttranscriptional control of ribosome biogenesis factors.

Publisher: Informa UK Limited

Date: 06-2004

DOI: 10.1128/MCB.24.12.5534-5547.2004

Publication

DAMEfinder: a method to detect differential allele-specific methylation

Publisher: Springer Science and Business Media LLC

Date: 06-2020

DOI: 10.1186/S13072-020-00346-8

Abstract: DNA methylation is a highly studied epigenetic signature that is associated with regulation of gene expression, whereby genes with high levels of promoter methylation are generally repressed. Genomic imprinting occurs when one of the parental alleles is methylated, i.e., when there is inherited allele-specific methylation (ASM). A special case of imprinting occurs during X chromosome inactivation in females, where one of the two X chromosomes is silenced, to achieve dosage compensation between the sexes. Another more widespread form of ASM is sequence dependent (SD-ASM), where ASM is linked to a nearby heterozygous single nucleotide polymorphism (SNP). We developed a method to screen for genomic regions that exhibit loss or gain of ASM in samples from two conditions (treatments, diseases, etc.). The method relies on the availability of bisulfite sequencing data from multiple samples of the two conditions. We leverage other established computational methods to screen for these regions within a new R package called DAMEfinder. It calculates an ASM score for all CpG sites or pairs in the genome of each sample, and then quantifies the change in ASM between conditions. It then clusters nearby CpG sites with consistent change into regions. In the absence of SNP information, our method relies only on reads to quantify ASM. This novel ASM score compares favorably to current methods that also screen for ASM. Not only does it easily discern between imprinted and non-imprinted regions, but also females from males based on X chromosome inactivation. We also applied DAMEfinder to a colorectal cancer dataset and observed that colorectal cancer subtypes are distinguishable according to their ASM signature. We also re-discover known cases of loss of imprinting. 
We have designed DAMEfinder to detect regions of differential ASM (DAMEs), which is a more refined definition of differential methylation, and can therefore help in breaking down the complexity of DNA methylation and its influence in development and disease.

Publication

CrispRVariants: precisely charting the mutation spectrum in genome engineering experiments

Publisher: Cold Spring Harbor Laboratory

Date: 10-12-2015

DOI: 10.1101/034140

Abstract: CRISPR-Cas9 and related technologies efficiently alter genomic DNA at targeted positions and have far-reaching implications for functional screening and therapeutic gene editing. Understanding and unlocking this potential requires accurate evaluation of editing efficiency. We show that methodological decisions for analyzing sequencing data can significantly affect mutagenesis efficiency estimates and we provide a comprehensive R-based toolkit, CrispRVariants and accompanying web tool CrispRVariantsLite, that resolves and localizes individual mutant alleles with respect to the endonuclease cut site. CrispRVariants-enabled analyses of newly generated and existing genome editing datasets underscore how careful consideration of the full variant spectrum gives insight toward effective guide and amplicon design as well as the mutagenic process.

Publication

The shaky foundations of simulating single-cell RNA sequencing data

Publisher: Springer Science and Business Media LLC

Date: 29-03-2023

DOI: 10.1186/S13059-023-02904-1

Abstract: With the emergence of hundreds of single-cell RNA-sequencing (scRNA-seq) datasets, the number of computational tools to analyze aspects of the generated data has grown rapidly. As a result, there is a recurring need to demonstrate whether newly developed methods are truly performant, on their own as well as in comparison to existing tools. Benchmark studies aim to consolidate the space of available methods for a given task and often use simulated data that provide a ground truth for evaluations, thus demanding a high quality standard to make results credible and transferable to real data. Here, we evaluated methods for synthetic scRNA-seq data generation in their ability to mimic experimental data. Besides comparing gene- and cell-level quality control summaries in both one- and two-dimensional settings, we further quantified these at the batch- and cluster-level. Secondly, we investigate the effect of simulators on clustering and batch correction method comparisons, and, thirdly, which and to what extent quality control summaries can capture reference-simulation similarity. Our results suggest that most simulators are unable to accommodate complex designs without introducing artificial effects, they yield over-optimistic performance of integration and potentially unreliable ranking of clustering methods, and it is generally unknown which summaries are important to ensure effective simulation-based method comparisons.

Publication

TreeSummarizedExperiment: a S4 class for data with hierarchical structure

Publisher: F1000 Research Ltd

Date: 02-03-2021

DOI: 10.12688/F1000RESEARCH.26669.2

Abstract: Data organized into hierarchical structures (e.g., phylogenies or cell types) arises in several biological fields. It is therefore of interest to have data containers that store the hierarchical structure together with the biological profile data, and provide functions to easily access or manipulate data at different resolutions. Here, we present TreeSummarizedExperiment, an R/S4 class that extends the commonly used SingleCellExperiment class by incorporating tree representations of rows and/or columns (represented by objects of the phylo class). It follows the convention of the SummarizedExperiment class, while providing links between the assays and the nodes of a tree to allow data manipulation at arbitrary levels of the tree. The package is designed to be extensible, allowing new functions on the tree (phylo) to be contributed. As the work is based on the SingleCellExperiment class and the phylo class, both of which are popular classes used in many R packages, it is expected to be able to interact seamlessly with many other tools.

Publication

TreeSummarizedExperiment: a S4 class for data with hierarchical structure

Publisher: F1000 Research Ltd

Date: 15-10-2020

DOI: 10.12688/F1000RESEARCH.26669.1

Abstract: Data organized into hierarchical structures (e.g., phylogenies or cell types) arises in several biological fields. It is therefore of interest to have data containers that store the hierarchical structure together with the biological profile data, and provide functions to easily access or manipulate data at different resolutions. Here, we present TreeSummarizedExperiment, an R/S4 class that extends the commonly used SingleCellExperiment class by incorporating tree representations of rows and/or columns (represented by objects of the phylo class). It follows the convention of the SummarizedExperiment class, while providing links between the assays and the nodes of a tree to allow data manipulation at arbitrary levels of the tree. The package is designed to be extensible, allowing new functions on the tree (phylo) to be contributed. As the work is based on the SingleCellExperiment class and the phylo class, both of which are popular classes used in many R packages, it is expected to be able to interact seamlessly with many other tools.

Publication

muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data

Publisher: Springer Science and Business Media LLC

Date: 30-11-2020

DOI: 10.1038/S41467-020-19894-4

Abstract: Single-cell RNA sequencing (scRNA-seq) has become an empowering technology to profile the transcriptomes of individual cells on a large scale. Early analyses of differential expression have aimed at identifying differences between subpopulations to identify subpopulation markers. More generally, such methods compare expression levels across sets of cells, thus leading to cross-condition analyses. Given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis; however, it is not clear which statistical framework best handles this situation. Here, we surveyed methods to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated pseudobulk data. To evaluate method performance, we developed a flexible simulation that mimics multi-sample scRNA-seq data. We analyzed scRNA-seq data from mouse cortex cells to uncover subpopulation-specific responses to lipopolysaccharide treatment, and provide robust tools for multi-condition analysis within the muscat R package.

Publication

MINI REVIEW: Statistical methods for detecting differentially methylated loci and regions

Publisher: Cold Spring Harbor Laboratory

Date: 15-07-2014

DOI: 10.1101/007120

Abstract: DNA methylation, and specifically the reversible addition of methyl groups at CpG dinucleotides genome-wide, represents an important layer that is associated with the regulation of gene expression. In particular, aberrations in the methylation status have been noted across a diverse set of pathological states, including cancer. With the rapid development and uptake of large scale sequencing of short DNA fragments, there has been an explosion of data analytic methods for processing and discovering changes in DNA methylation across diverse data types. In this mini-review, we aim to condense many of the salient challenges, such as experimental design, statistical methods for differential methylation detection and critical considerations such as cell type composition and the potential confounding that can arise from batch effects, into a compact and accessible format. Our main interests, from a statistical perspective, include the practical use of empirical Bayes or hierarchical models, which have been shown to be immensely powerful and flexible in genomics and the procedures by which control of false discoveries are made. Of course, there are many critical platform-specific data preprocessing aspects that we do not discuss here. In addition, we do not make formal performance comparisons of the methods, but rather describe the commonly used statistical models and many of the pertinent issues; we make some recommendations for further study.

Publication

Transcriptional networks: reverse-engineering gene regulation on a global scale

Publisher: Elsevier BV

Date: 12-2004

DOI: 10.1016/J.MIB.2004.10.009

Abstract: A major objective in post-genome research is to fully understand the transcriptional control of each gene and the targets of each transcription factor. In yeast, large-scale experimental and computational approaches have been applied to identify co-regulated genes, cis regulatory elements, and transcription factor DNA binding sites in vivo. Methods for modeling and predicting system behavior, and for reconciling discrepancies among data types, are being explored. The results indicate that a complete and comprehensive yeast transcriptional network will ultimately be achieved.

Publication

Genome-wide analysis of mouse transcripts using exon microarrays and factor graphs

Publisher: Springer Science and Business Media LLC

Date: 28-08-2005

DOI: 10.1038/NG1630

Abstract: Recent mammalian microarray experiments detected widespread transcription and indicated that there may be many undiscovered multiple-exon protein-coding genes. To explore this possibility, we hybridized labeled cDNA from unamplified, polyadenylation-selected RNA samples from 37 mouse tissues to microarrays encompassing 1.14 million exon probes. We analyzed these data using GenRate, a Bayesian algorithm that uses a genome-wide scoring function in a factor graph to infer genes. At a stringent exon false detection rate of 2.7%, GenRate detected 12,145 gene-length transcripts and confirmed 81% of the 10,000 most highly expressed known genes. Notably, our analysis showed that most of the 155,839 exons detected by GenRate were associated with known genes, providing microarray-based evidence that most multiple-exon genes have already been identified. GenRate also detected tens of thousands of potential new exons and reconciled discrepancies in current cDNA databases by 'stitching' new transcribed regions into previously annotated genes.

Publication

Benchmarking computational methods for single-cell chromatin data analysis

Publisher: Cold Spring Harbor Laboratory

Date: 07-08-2023

DOI: 10.1101/2023.08.04.552046

Abstract: Single-cell chromatin accessibility assays, such as scATAC-seq, are increasingly employed in individual and joint multi-omic profiling of single cells. As the accumulation of scATAC-seq and multi-omics datasets continues, challenges in analyzing such sparse, noisy, and high-dimensional data become pressing. Specifically, one challenge relates to optimizing the processing of chromatin-level measurements and efficiently extracting information to discern cellular heterogeneity. This is of critical importance, since the identification of cell types is a fundamental step in current single-cell data analysis practices. We benchmarked 8 feature engineering pipelines derived from 5 recent methods to assess their ability to discover and discriminate cell types. By using 10 metrics calculated at the cell embedding, shared nearest neighbor graph, or partition levels, we evaluated the performance of each method at different data processing stages. This comprehensive approach allowed us to thoroughly understand the strengths and weaknesses of each method and the influence of parameter selection. Our analysis provides guidelines for choosing analysis methods for different datasets. Overall, feature aggregation, SnapATAC, and SnapATAC2 outperform latent semantic indexing-based methods. For datasets with complex cell-type structures, SnapATAC and SnapATAC2 are preferred. With large datasets, SnapATAC2 and ArchR are most scalable.

Publication

Savant Genome Browser 2: visualization and analysis for population-scale genomics

Publisher: Oxford University Press (OUP)

Date: 25-05-2012

DOI: 10.1093/NAR/GKS427

Publication

DESpace: spatially variable gene detection via differential expression testing of spatial clusters

Publisher: Cold Spring Harbor Laboratory

Date: 18-04-2023

DOI: 10.1101/2023.04.17.537189

Abstract: Spatially resolved transcriptomics (SRT) enables scientists to investigate spatial context of mRNA abundance. Here, we introduce DESpace, a novel approach to discover spatially variable genes (SVGs), i.e., genes whose expression varies across the tissue. Our framework inputs all types of SRT data, summarizes spatial information via spatial clusters, and identifies spatially variable genes by performing differential gene expression testing between clusters. Although several methods have been proposed to identify SVGs, our approach adds some unique features; in particular, it allows identifying (and testing) the specific areas of the tissue affected by spatial variability, and it enables joint modelling of multiple samples (i.e., biological replicates). Furthermore, in our benchmarks, DESpace displays a higher true positive rate than competitors, controls for false positive and false discovery rates, and is among the most computationally efficient SVG tools. DESpace is distributed as a Bioconductor R package.

Publication

Protocol matters: which methylome are you actually studying?

Publisher: Future Medicine Ltd

Date: 08-2010

DOI: 10.2217/EPI.10.36

Abstract: The field of epigenetics is now capitalizing on the vast number of emerging technologies, largely based on second-generation sequencing, which interrogate DNA methylation status and histone modifications genome-wide. However, getting an exhaustive and unbiased view of a methylome at a reasonable cost is proving to be a significant challenge. In this article, we take a closer look at the impact of the DNA sequence and bias effects introduced to datasets by genome-wide DNA methylation technologies and, where possible, explore the bioinformatics tools that deconvolve them. There remains much to be learned about the performance of genome-wide technologies, the data we mine from these assays and how it reflects the actual biology. While there are several methods to interrogate the DNA methylation status genome-wide, our opinion is that no single technique suitably covers the minimum criteria of high coverage and high resolution at a reasonable cost. In fact, the fraction of the methylome that is studied currently depends entirely on the inherent biases of the protocol employed. There is promise for this to change, as the third generation of sequencing technologies is expected to again ‘revolutionize’ the way that we study genomes and epigenomes.

Publication

Towards unified quality verification of synthetic count data with countsimQC

Publisher: Oxford University Press (OUP)

Date: 04-10-2018

DOI: 10.1093/BIOINFORMATICS/BTX631

Abstract: Statistical tools for biological data analysis are often evaluated using synthetic data, designed to mimic the features of a specific type of experimental data. The generalizability of such evaluations depends on how well the synthetic data reproduce the main characteristics of the experimental data, and we argue that an assessment of this similarity should accompany any synthetic dataset used for method evaluation. We describe countsimQC, which provides a straightforward way to generate a stand-alone report that shows the main characteristics of (e.g. RNA-seq) count data and can be provided alongside a publication as verification of the appropriateness of any utilized synthetic data. countsimQC is implemented as an R package (for R versions ≥ 3.4) and is available from soneson/countsimQC under a GPL (≥2) license.

Publication

DUSP4 deficiency caused by promoter hypermethylation drives JNK signaling and tumor cell survival in diffuse large B cell lymphoma

Publisher: Rockefeller University Press

Date: 06-04-2015

DOI: 10.1084/JEM.20141957

Abstract: The epigenetic dysregulation of tumor suppressor genes is an important driver of human carcinogenesis. We have combined genome-wide DNA methylation analyses and gene expression profiling after pharmacological DNA demethylation with functional screening to identify novel tumor suppressors in diffuse large B cell lymphoma (DLBCL). We find that a CpG island in the promoter of the dual-specificity phosphatase DUSP4 is aberrantly methylated in nodal and extranodal DLBCL, irrespective of ABC or GCB subtype, resulting in loss of DUSP4 expression in 75% of examined cases. The DUSP4 genomic locus is further deleted in up to 13% of aggressive B cell lymphomas, and the lack of DUSP4 is a negative prognostic factor in three independent cohorts of DLBCL patients. Ectopic expression of wild-type DUSP4, but not of a phosphatase-deficient mutant, dephosphorylates c-JUN N-terminal kinase (JNK) and induces apoptosis in DLBCL cells. Pharmacological or dominant-negative JNK inhibition restricts DLBCL survival in vitro and in vivo and synergizes strongly with the Bruton’s tyrosine kinase inhibitor ibrutinib. Our results indicate that DLBCL cells depend on JNK signaling for survival. This finding provides a mechanistic basis for the clinical development of JNK inhibitors in DLBCL, ideally in synthetic lethal combinations with inhibitors of chronic active B cell receptor signaling.

Publication

Large‐scale mapping of human protein–protein interactions by mass spectrometry

Publisher: EMBO

Date: 2007

DOI: 10.1038/MSB4100134

Publication

Design and synthesis of 2-oxindole based multi-targeted inhibitors of PDK1/Akt signaling pathway for the treatment of glioblastoma multiforme

Publisher: Elsevier BV

Date: 11-2015

DOI: 10.1016/J.EJMECH.2015.10.020

Abstract: Aggressive behavior and diffuse infiltrative growth are the main features of Glioblastoma multiforme (GBM), together with the high degree of resistance and recurrence. Evidence indicates that GBM-derived stem cells (GSCs), endowed with unlimited proliferative potential, play a critical role in tumor development and maintenance. Among the many signaling pathways involved in maintaining GSC stemness, tumorigenic potential, and anti-apoptotic properties, the PDK1/Akt pathway is a challenging target to develop new potential agents able to affect GBM resistance to chemotherapy. In an effort to find new PDK1/Akt inhibitors, we rationally designed and synthesized a small family of 2-oxindole derivatives. Among them, compound 3 inhibited PDK1 kinase and downstream effectors such as CHK1, GSK3α and GSK3β, which contribute to GSC survival. Compound 3 appeared to be a good tool for studying the role of the PDK1/Akt pathway in GSC self-renewal and tumorigenicity, and might represent the starting point for the development of more potent and focused multi-target therapies for GBM.

Publication

CellMixS: quantifying and visualizing batch effects in single-cell RNA-seq data

Publisher: Life Science Alliance, LLC

Date: 23-03-2021

DOI: 10.26508/LSA.202001004

Abstract: A key challenge in single-cell RNA-sequencing (scRNA-seq) data analysis is batch effects that can obscure the biological signal of interest. Although there are various tools and methods to correct for batch effects, their performance can vary. Therefore, it is important to understand how batch effects manifest to adjust for them. Here, we systematically explore batch effects across various scRNA-seq datasets according to magnitude, cell type specificity, and complexity. We developed a cell-specific mixing score (cms) that quantifies mixing of cells from multiple batches. By considering distance distributions, the score is able to detect local batch bias as well as differentiate between unbalanced batches and systematic differences between cells of the same cell type. We compare metrics in scRNA-seq data using real and synthetic datasets and whereas these metrics target the same question and are used interchangeably, we find differences in scalability, sensitivity, and ability to handle differentially abundant cell types. We find that cell-specific metrics outperform cell type–specific and global metrics and recommend them for both method benchmarks and batch exploration.

Publication

A new bioinformatic pipeline allows the design of small, targeted gene panels for efficient TMB estimation

Publisher: Elsevier BV

Date: 04-2019

DOI: 10.1093/ANNONC/MDZ073.003

Publication

Discovery pipeline for epigenetically deregulated miRNAs in cancer: integration of primary miRNA transcription

Publisher: Springer Science and Business Media LLC

Date: 21-01-2011

DOI: 10.1186/1471-2164-12-54

Abstract: Cancer is commonly associated with widespread disruption of DNA methylation, chromatin modification and miRNA expression. In this study, we established a robust discovery pipeline to identify epigenetically deregulated miRNAs in cancer. Using an integrative approach that combines primary transcription, genome-wide DNA methylation and H3K9Ac marks with microRNA (miRNA) expression, we identified miRNA genes that were epigenetically modified in cancer. We find miR-205, miR-21, and miR-196b to be epigenetically repressed, and miR-615 epigenetically activated in prostate cancer cells. We show that detecting changes in primary miRNA transcription levels is a valuable method for detection of local epigenetic modifications that are associated with changes in mature miRNA expression.

Publication

Human neural networks with sparse TDP-43 pathology reveal NPTX2 misregulation in ALS/FTLD

Publisher: Cold Spring Harbor Laboratory

Date: 09-12-2021

DOI: 10.1101/2021.12.08.471089

Abstract: Human cellular models of neurodegeneration require reproducibility and longevity, which is necessary for simulating these age-dependent diseases. Such systems are particularly needed for TDP-43 proteinopathies [1,2], which involve human-specific mechanisms [3–6] that cannot be directly studied in animal models. To explore the emergence and consequences of TDP-43 pathologies, we generated iPSC-derived, colony morphology neural stem cells (iCoMoNSCs) via manual selection of neural precursors [7]. Single-cell transcriptomics (scRNA-seq) and comparison to independent NSCs [8] showed that iCoMoNSCs are uniquely homogenous and self-renewing. Differentiated iCoMoNSCs formed a self-organized multicellular system consisting of synaptically connected and electrophysiologically active neurons, which matured into long-lived functional networks. Neuronal and glial maturation in iCoMoNSC-derived cultures was similar to that of cortical organoids [9]. Overexpression of wild-type TDP-43 in a minority of iCoMoNSC-derived neurons led to progressive fragmentation and aggregation, resulting in loss of function and neurotoxicity. scRNA-seq revealed a novel set of misregulated RNA targets coinciding in both TDP-43 overexpressing neurons and patient brains exhibiting loss of nuclear TDP-43. The strongest misregulated target encoded for the synaptic protein NPTX2, which was consistently misaccumulated in ALS and FTLD patient neurons with TDP-43 pathology. Our work directly links TDP-43 misregulation and NPTX2 accumulation, thereby highlighting a new pathway of neurotoxicity.

Publication

Analysis of Next Generation Sequencing Data Using Integrated Nested Laplace Approximation (INLA)

Publisher: Springer International Publishing

Date: 2014

DOI: 10.1007/978-3-319-07212-8_4

Publication

Systematic Genetic Analysis with Ordered Arrays of Yeast Deletion Mutants

Publisher: American Association for the Advancement of Science (AAAS)

Date: 14-12-2001

DOI: 10.1126/SCIENCE.1065810

Abstract: In Saccharomyces cerevisiae, more than 80% of the ∼6200 predicted genes are nonessential, implying that the genome is buffered from the phenotypic consequences of genetic perturbation. To evaluate function, we developed a method for systematic construction of double mutants, termed synthetic genetic array (SGA) analysis, in which a query mutation is crossed to an array of ∼4700 deletion mutants. Inviable double-mutant meiotic progeny identify functional relationships between genes. SGA analysis of genes with roles in cytoskeletal organization (BNI1, ARP2, ARC40, BIM1), DNA synthesis and repair (SGS1, RAD27), or uncharacterized functions (BBC1, NBP2) generated a network of 291 interactions among 204 genes. Systematic application of this approach should produce a global map of gene function.

Publication

Do count-based differential expression methods perform poorly when genes are expressed in only one condition?

Publisher: Springer Science and Business Media LLC

Date: 08-10-2015

DOI: 10.1186/S13059-015-0781-3

Publication

Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability

Publisher: Springer Science and Business Media LLC

Date: 17-05-2023

DOI: 10.1186/S13059-023-02962-5

Abstract: Computational methods represent the lifeblood of modern molecular biology. Benchmarking is important for all methods, but with a focus here on computational methods, benchmarking is critical to dissect important steps of analysis pipelines, formally assess performance across common situations as well as edge cases, and ultimately guide users on what tools to use. Benchmarking can also be important for community building and advancing methods in a principled way. We conducted a meta-analysis of recent single-cell benchmarks to summarize the scope, extensibility, and neutrality, as well as technical features and whether best practices in open data and reproducible research were followed. The results highlight that while benchmarks often make code available and are in principle reproducible, they remain difficult to extend, for example, as new methods and new ways to assess methods emerge. In addition, embracing containerization and workflow systems would enhance reusability of intermediate benchmarking results, thus also driving wider adoption.

Publication

Reconstructing an Ancestral Mammalian Immune Supercomplex from a Marsupial Major Histocompatibility Complex

Publisher: Public Library of Science (PLoS)

Date: 31-01-2006

DOI: 10.1371/JOURNAL.PBIO.0040046

Publication

diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering

Publisher: Springer Science and Business Media LLC

Date: 14-05-2019

DOI: 10.1038/S42003-019-0415-5

Abstract: High-dimensional flow and mass cytometry allow cell types and states to be characterized in great detail by measuring expression levels of more than 40 targeted protein markers per cell at the single-cell level. However, data analysis can be difficult, due to the large size and dimensionality of datasets as well as limitations of existing computational methods. Here, we present diffcyt, a new computational framework for differential discovery analyses in high-dimensional cytometry data, based on a combination of high-resolution clustering and empirical Bayes moderated tests adapted from transcriptomics. Our approach provides improved statistical performance, including for rare cell populations, along with flexible experimental designs and fast runtimes in an open-source framework.

Publication

Pro-inflammatory Aorta-Associated Macrophages Are Involved in Embryonic Development of Hematopoietic Stem Cells

Publisher: Elsevier BV

Date: 06-2019

DOI: 10.1016/J.IMMUNI.2019.05.003

Publication

Circulating neutrophil subsets in advanced lung cancer patients exhibit unique immune signature and relate to prognosis

Publisher: Wiley

Date: 19-01-2020

DOI: 10.1096/FJ.201902467R

Publication

Differential Expression for RNA Sequencing (RNA-Seq) Data: Mapping, Summarization, Statistical Analysis, and Experimental Design

Publisher: Springer New York

Date: 22-09-2012

DOI: 10.1007/978-1-4614-0782-9_10

Publication

Bias, robustness and scalability in single-cell differential expression analysis

Publisher: Springer Science and Business Media LLC

Date: 26-02-2018

DOI: 10.1038/NMETH.4612

Abstract: Many methods have been used to determine differential gene expression from single-cell RNA (scRNA)-seq data. We evaluated 36 approaches using experimental and synthetic data and found considerable differences in the number and characteristics of the genes that are called differentially expressed. Prefiltering of lowly expressed genes has important effects, particularly for some of the methods developed for bulk RNA-seq data analysis. However, we found that bulk RNA-seq analysis methods do not generally perform worse than those developed specifically for scRNA-seq. We also present conquer, a repository of consistently processed, analysis-ready public scRNA-seq data sets that is aimed at simplifying method evaluation and reanalysis of published results. Each data set provides abundance estimates for both genes and transcripts, as well as quality control and exploratory analysis reports.

Publication

edgeR for differential RNA-seq and ChIP-seq analysis: an application to stem cell biology

Publisher: Springer New York

Date: 2014

DOI: 10.1007/978-1-4939-0512-6_3

Abstract: The edgeR package, an R-based tool within the Bioconductor project, offers a flexible statistical framework for detection of changes in abundance based on counts. In this chapter, we illustrate the use of edgeR on a human embryonic stem cell dataset, in particular for RNA-seq and ChIP-seq data. We focus on a step-by-step statistical analysis of differential expression, going from raw data to a list of putative differentially expressed genes and give examples of integrative analysis using the ChIP-seq data. We emphasize data quality spot checks and the use of positive controls throughout the process and give practical recommendations for reproducible research.

Publication

BANDITS: Bayesian differential splicing accounting for sample-to-sample variability and mapping uncertainty

Publisher: Cold Spring Harbor Laboratory

Date: 29-08-2019

DOI: 10.1101/750018

Abstract: Alternative splicing is a biological process during gene expression that allows a single gene to code for multiple proteins. However, splicing patterns can be altered in some conditions or diseases. Here, we present BANDITS, a R/Bioconductor package to perform differential splicing, at both gene and transcript-level, based on RNA-seq data. BANDITS uses a Bayesian hierarchical structure to explicitly model the variability between samples, and treats the transcript allocation of reads as latent variables. We perform an extensive benchmark across both simulated and experimental RNA-seq datasets, where BANDITS has extremely favorable performance with respect to the competitors considered.

Publication

The promise of functional genomics: completing the encyclopedia of a cell.

Publisher: Elsevier BV

Date: 10-2004

DOI: 10.1016/J.MIB.2004.08.015

Publication

Directed shotgun proteomics guided by saturated RNA-seq identifies a complete expressed prokaryotic proteome

Publisher: Cold Spring Harbor Laboratory

Date: 22-07-2013

DOI: 10.1101/GR.151035.112

Abstract: Prokaryotes, due to their moderate complexity, are particularly amenable to the comprehensive identification of the protein repertoire expressed under different conditions. We applied a generic strategy to identify a complete expressed prokaryotic proteome, which is based on the analysis of RNA and proteins extracted from matched samples. Saturated transcriptome profiling by RNA-seq provided an endpoint estimate of the protein-coding genes expressed under two conditions which mimic the interaction of Bartonella henselae with its mammalian host. Directed shotgun proteomics experiments were carried out on four subcellular fractions. By specifically targeting proteins which are short, basic, low abundant, and membrane localized, we could eliminate their initial underrepresentation compared to the estimated endpoint. A total of 1250 proteins were identified with an estimated false discovery rate below 1%. This represents 85% of all distinct annotated proteins and ∼90% of the expressed protein-coding genes. Genes that were detected at the transcript but not protein level were found to be highly enriched in several genomic islands. Furthermore, genes that lacked an ortholog and a functional annotation were not detected at the protein level; these may represent examples of overprediction in genome annotations. A dramatic membrane proteome reorganization was observed, including differential regulation of autotransporters, adhesins, and hemin binding proteins. Particularly noteworthy was the complete membrane proteome coverage, which included expression of all members of the VirB/D4 type IV secretion system, a key virulence factor.

Publication

The DNA hypermethylation phenotype of colorectal cancer liver metastases resembles that of the primary colorectal cancers

Publisher: Springer Science and Business Media LLC

Date: 06-04-2020

DOI: 10.1186/S12885-020-06777-6

Abstract: Identifying molecular differences between primary and metastatic colorectal cancers (now possible with the aid of omics technologies) can improve our understanding of the biological mechanisms of cancer progression and facilitate the discovery of novel treatments for late-stage cancer. We compared the DNA methylomes of primary colorectal cancers (CRCs) and CRC metastases to the liver. Laser microdissection was used to obtain epithelial tissue (10 to 25 × 10^6 μm^2) from sections of fresh-frozen samples of primary CRCs (n = 6), CRC liver metastases (n = 12), and normal colon mucosa (n = 3). DNA extracted from tissues was enriched for methylated sequences with a methylCpG binding domain (MBD) polypeptide-based protocol and subjected to deep sequencing. The performance of this protocol was compared with that of targeted enrichment for bisulfite sequencing used in a previous study of ours. MBD enrichment captured a total of 322,551 genomic regions (249.5 Mb or ~7.8% of the human genome), which included over seven million CpG sites. A few of these regions were differentially methylated at an expected false discovery rate (FDR) of 5% in neoplastic tissues (primaries: 0.67%, i.e., 2155 regions containing 279,441 CpG sites; liver metastases: 1%, i.e., 3223 regions containing 312,723 CpG sites) as compared with normal mucosa samples. Most of the differentially methylated regions (DMRs; 94% in primaries, 70% in metastases) were hypermethylated, and almost 80% of these (1882 of 2396) were present in both lesion types. At 5% FDR, no DMRs were detected in liver metastases vs. primary CRC. However, short regions of low-magnitude hypomethylation were frequent in metastases but rare in primaries.
Hypermethylated DMRs were far more abundant in sequences classified as intragenic, gene-regulatory, or CpG shelves-shores-island segments, whereas hypomethylated DMRs were equally represented in extragenic (mainly, open-sea) and intragenic (mainly, gene bodies) sequences of the genome. Compared with targeted enrichment, MBD capture provided a better picture of the extension of CRC-associated DNA hypermethylation but was less powerful for identifying hypomethylation. Our findings demonstrate that the hypermethylation phenotype in CRC liver metastases remains similar to that of the primary tumor, whereas CRC-associated DNA hypomethylation probably undergoes further progression after the cancer cells have migrated to the liver.

Publication

Maximizing mutagenesis with solubilized CRISPR-Cas9 ribonucleoprotein complexes.

Publisher: The Company of Biologists

Date: 2016

DOI: 10.1242/DEV.134809

Abstract: CRISPR-Cas9 enables efficient sequence-specific mutagenesis for creating somatic or germline mutants of model organisms. Key constraints in vivo remain the expression and delivery of active Cas9-guideRNA ribonucleoprotein complexes (RNPs) with minimal toxicity, variable mutagenesis efficiencies depending on targeting sequence, and high mutation mosaicism. Here, we apply in vitro-assembled, fluorescent Cas9-sgRNA RNPs in solubilizing salt solution to achieve maximal mutagenesis efficiency in zebrafish embryos. MiSeq-based sequence analysis of targeted loci in individual embryos using CrispRVariants, a customized software tool for mutagenesis quantification and visualization, reveals efficient bi-allelic mutagenesis that reaches saturation at several tested gene loci. Such virtually complete mutagenesis exposes loss-of-function phenotypes for candidate genes in somatic mutant embryos for subsequent generation of stable germline mutants. We further show that targeting of non-coding elements in gene-regulatory regions using saturating mutagenesis uncovers functional control elements in transgenic reporters and endogenous genes in injected embryos. Our results establish that optimally solubilized, in vitro assembled fluorescent Cas9-sgRNA RNPs provide a reproducible reagent for direct and scalable loss-of-function studies and applications beyond zebrafish experiments that require maximal DNA cutting efficiency in vivo.

Publication

ARPEGGIO: Automated Reproducible Polyploid EpiGenetic GuIdance workflOw

Publisher: Springer Science and Business Media LLC

Date: 17-07-2021

DOI: 10.1186/S12864-021-07845-2

Abstract: Whole genome duplication (WGD) events are common in the evolutionary history of many living organisms. For decades, researchers have been trying to understand the genetic and epigenetic impact of WGD and its underlying molecular mechanisms. Particular attention was given to allopolyploid study systems, species resulting from a hybridization event accompanied by WGD. Investigating the mechanisms behind the survival of a newly formed allopolyploid highlighted the key role of DNA methylation. With the improvement of high-throughput methods, such as whole genome bisulfite sequencing (WGBS), an opportunity opened to further understand the role of DNA methylation at a larger scale and higher resolution. However, only a few studies have applied WGBS to allopolyploids, which might be due to lack of genomic resources combined with a burdensome data analysis process. To overcome these problems, we developed the Automated Reproducible Polyploid EpiGenetic GuIdance workflOw (ARPEGGIO): the first workflow for the analysis of epigenetic data in polyploids. This workflow analyzes WGBS data from allopolyploid species via the genome assemblies of the allopolyploid’s parent species. ARPEGGIO utilizes an updated read classification algorithm (EAGLE-RC) to tackle the challenge of sequence similarity amongst parental genomes. ARPEGGIO offers automation, but more importantly, a complete set of analyses including spot checks starting from raw WGBS data: quality checks, trimming, alignment, methylation extraction, statistical analyses and downstream analyses. A full run of ARPEGGIO outputs a list of genes showing differential methylation. ARPEGGIO was made simple to set up, run and interpret, and its implementation ensures reproducibility by including both package management and containerization. We evaluated ARPEGGIO in two ways.
First, we tested EAGLE-RC’s performance with publicly available datasets given a ground truth, and we show that EAGLE-RC decreases the error rate by 3 to 4 times compared to standard approaches. Second, using the same initial dataset, we show agreement between ARPEGGIO’s output and published results. Compared to other similar workflows, ARPEGGIO is the only one supporting polyploid data. The goal of ARPEGGIO is to promote, support and improve polyploid research with a reproducible and automated set of analyses in a convenient implementation. ARPEGGIO is available at upermaxiste/ARPEGGIO .

Publication

12 Grand Challenges in Single-Cell Data Science

Publisher: PeerJ

Date: 23-08-2019

DOI: 10.7287/PEERJ.PREPRINTS.27885V3

Abstract: The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single-Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single-Cell Data Science' for the coming years.

Publication

Hand2 delineates mesothelium progenitors and is reactivated in mesothelioma

Publisher: Springer Science and Business Media LLC

Date: 30-03-2022

DOI: 10.1038/S41467-022-29311-7

Abstract: The mesothelium lines body cavities and surrounds internal organs, widely contributing to homeostasis and regeneration. Mesothelium disruptions cause visceral anomalies and mesothelioma tumors. Nonetheless, the embryonic emergence of mesothelia remains incompletely understood. Here, we track mesothelial origins in the lateral plate mesoderm (LPM) using zebrafish. Single-cell transcriptomics uncovers a post-gastrulation gene expression signature centered on hand2 in distinct LPM progenitor cells. We map mesothelial progenitors to lateral-most, hand2-expressing LPM and confirm conservation in mouse. Time-lapse imaging of zebrafish hand2 reporter embryos captures mesothelium formation including pericardium, visceral, and parietal peritoneum. We find primordial germ cells migrate with the forming mesothelium as ventral migration boundary. Functionally, hand2 loss disrupts mesothelium formation with reduced progenitor cells and perturbed migration. In mouse and human mesothelioma, we document expression of LPM-associated transcription factors including Hand2, suggesting re-initiation of a developmental program. Our data connects mesothelium development to Hand2, expanding our understanding of mesothelial pathologies.

Publication

12 Grand challenges in single-cell data science

Publisher: PeerJ

Date: 06-08-2019

DOI: 10.7287/PEERJ.PREPRINTS.27885V1

Abstract: The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.

Publication

12 Grand challenges in single-cell data science

Publisher: PeerJ

Date: 07-08-2019

DOI: 10.7287/PEERJ.PREPRINTS.27885V2

Abstract: The recent upswing of microfluidics and combinatorial indexing strategies, further enhanced by very low sequencing costs, have turned single cell sequencing into an empowering technology; analyzing thousands, or even millions, of cells per experimental run is becoming a routine assignment in laboratories worldwide. As a consequence, we are witnessing a data revolution in single cell biology. Although some issues are similar in spirit to those experienced in bulk sequencing, many of the emerging data science problems are unique to single cell analysis; together, they give rise to the new realm of 'Single Cell Data Science'. Here, we outline twelve challenges that will be central in bringing this new field forward. For each challenge, the current state of the art in terms of prior work is reviewed, and open problems are formulated, with an emphasis on the research goals that motivate them. This compendium is meant to serve as a guideline for established researchers, newcomers and students alike, highlighting interesting and rewarding problems in 'Single Cell Data Science' for the coming years.

Publication

Relationship between genome and epigenome - challenges and requirements for future research

Publisher: Springer Science and Business Media LLC

Date: 18-06-2014

DOI: 10.1186/1471-2164-15-487

Publication

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

Publisher: Cold Spring Harbor Laboratory

Date: 28-07-2018

DOI: 10.1101/378539

Abstract: Most methods for statistical analysis of RNA-seq data take a matrix of abundance estimates for some type of genomic features as their input, and consequently the quality of any obtained results is directly dependent on the quality of these abundances. Here, we present the junction coverage compatibility (JCC) score, which provides a way to evaluate the reliability of transcript-level abundance estimates as well as the accuracy of transcript annotation catalogs. It works by comparing the observed number of reads spanning each annotated splice junction in a genomic region to the predicted number of junction-spanning reads, inferred from the estimated transcript abundances and the genomic coordinates of the corresponding annotated transcripts. We show that while most genes show good agreement between the observed and predicted junction coverages, there is a small set of genes that do not. Genes with poor agreement are found regardless of the method used to estimate transcript abundances, and the corresponding transcript abundances should be treated with care in any downstream analyses.

Publication

stageR: a general stage-wise method for controlling the gene-level false discovery rate in differential expression and differential transcript usage

Publisher: Springer Science and Business Media LLC

Date: 07-08-2017

DOI: 10.1186/S13059-017-1277-0

Publication

Statistical methods for detecting differentially methylated loci and regions

Publisher: Frontiers Media SA

Date: 16-09-2014

DOI: 10.3389/FGENE.2014.00324

Publication

Active receptor tyrosine kinases, but not Brachyury, are sufficient to trigger chordoma in zebrafish

Publisher: No publisher found

Date: 2019

DOI: 10.1242/DMM.039545

Publication

Count-based differential expression analysis of RNA sequencing data using R and Bioconductor

Publisher: Springer Science and Business Media LLC

Date: 22-08-2013

DOI: 10.1038/NPROT.2013.099

Abstract: RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.

Publication

Origin and differentiation trajectories of fibroblastic reticular cells in the splenic white pulp

Publisher: Springer Science and Business Media LLC

Date: 15-04-2019

DOI: 10.1038/S41467-019-09728-3

Abstract: The splenic white pulp is underpinned by poorly characterized stromal cells that demarcate distinct immune cell microenvironments. Here we establish fibroblastic reticular cell (FRC)-specific fate-mapping in mice to define their embryonic origin and differentiation trajectories. Our data show that all reticular cell subsets descend from multipotent progenitors emerging at embryonic day 19.5 from periarterial progenitors. Commitment of FRC progenitors is concluded during the first week of postnatal life through occupation of niches along developing central arterioles. Single cell transcriptomic analysis facilitated deconvolution of FRC differentiation trajectories and indicated that perivascular reticular cells function both as adult lymphoid organizer cells and mural cell progenitors. The lymphotoxin-β receptor-independent sustenance of postnatal progenitor stemness unveils that systemic immune surveillance in the splenic white pulp is governed through subset specification of reticular cells from a multipotent periarterial progenitor cell. In sum, the finding that discrete signaling events in perivascular niches determine the differentiation trajectories of reticular cell networks explains the development of distinct microenvironmental niches in secondary and tertiary lymphoid tissues that are crucial for the induction and regulation of innate and adaptive immune processes.

Publication

Conserved stromal–immune cell circuits secure B cell homeostasis and function

Publisher: Springer Science and Business Media LLC

Date: 18-05-2023

DOI: 10.1038/S41590-023-01503-3

Abstract: B cell zone reticular cells (BRCs) form stable microenvironments that direct efficient humoral immunity with B cell priming and memory maintenance being orchestrated across lymphoid organs. However, a comprehensive understanding of systemic humoral immunity is hampered by the lack of knowledge of global BRC sustenance, function and major pathways controlling BRC–immune cell interactions. Here we dissected the BRC landscape and immune cell interactome in human and murine lymphoid organs. In addition to the major BRC subsets underpinning the follicle, including follicular dendritic cells, PI16+ RCs were present across organs and species. As well as BRC-produced niche factors, immune cell-driven BRC differentiation and activation programs governed the convergence of shared BRC subsets, overwriting tissue-specific gene signatures. Our data reveal that a canonical set of immune cell-provided cues enforce bidirectional signaling programs that sustain functional BRC niches across lymphoid organs and species, thereby securing efficient humoral immunity.

Publication

pubassistant.ch: consolidating publication profiles of researchers

Publisher: F1000 Research Ltd

Date: 30-09-2021

DOI: 10.12688/F1000RESEARCH.73493.1

Abstract: Online accounts to keep track of scientific publications, such as Open Researcher and Contributor ID (ORCID) or Google Scholar, can be time consuming to maintain and synchronize. Furthermore, the open access status of publications is often not easily accessible, hindering potential opening of closed publications. To lessen the burden of managing personal profiles, we developed an R Shiny app that allows publication lists from multiple platforms to be retrieved and consolidated, as well as interactive exploration and comparison of publication profiles. A live version can be found at pubassistant.ch.

Publication

pubassistant.ch: consolidating publication profiles of researchers

Publisher: F1000 Research Ltd

Date: 20-12-2021

DOI: 10.12688/F1000RESEARCH.73493.2

Abstract: Online accounts to keep track of scientific publications, such as Open Researcher and Contributor ID (ORCID) or Google Scholar, can be time consuming to maintain and synchronize. Furthermore, the open access status of publications is often not easily accessible, hindering potential opening of closed publications. To lessen the burden of managing personal profiles, we developed an R Shiny app that allows publication lists from multiple platforms to be retrieved and consolidated, as well as interactive exploration and comparison of publication profiles. A live version can be found at pubassistant.ch.

Publication

Mass cytometric and transcriptomic profiling of epithelial-mesenchymal transitions in human mammary cell lines

Publisher: Springer Science and Business Media LLC

Date: 09-02-2022

DOI: 10.1038/S41597-022-01137-4

Abstract: Epithelial-mesenchymal transition (EMT) equips breast cancer cells for metastasis and treatment resistance. However, detection, inhibition, and elimination of EMT-undergoing cells is challenging due to the intrinsic heterogeneity of cancer cells and the phenotypic diversity of EMT programs. We comprehensively profiled EMT transition phenotypes in four non-cancerous human mammary epithelial cell lines using a flow cytometry surface marker screen, RNA sequencing, and mass cytometry. EMT was induced in the HMLE and MCF10A cell lines and in the HMLE-Twist-ER and HMLE-Snail-ER cell lines by prolonged exposure to TGFβ1 or 4-hydroxytamoxifen, respectively. Each cell line exhibited a spectrum of EMT transition phenotypes, which we compared to the steady-state phenotypes of fifteen luminal, HER2-positive, and basal breast cancer cell lines. Our data provide multiparametric insights at single-cell level into the phenotypic diversity of EMT at different time points and in four human cellular models. These insights are valuable to better understand the complexity of EMT, to compare EMT transitions between the cellular models used here, and for the design of EMT time course experiments.

Publication

CellMixS: quantifying and visualizing batch effects in single cell RNA-seq data

Publisher: Cold Spring Harbor Laboratory

Date: 11-12-2020

DOI: 10.1101/2020.12.11.420885

Abstract: A key challenge in single cell RNA-sequencing (scRNA-seq) data analysis is dataset- and batch-specific differences that can obscure the biological signal of interest. While there are various tools and methods to perform data integration and correct for batch effects, their performance can vary between datasets and according to the nature of the bias. Therefore, it is important to understand how batch effects manifest in order to adjust for them in a reliable way. Here, we systematically explore batch effects in a variety of scRNA-seq datasets according to magnitude, cell type specificity and complexity. We developed a cell-specific mixing score (cms) that quantifies how well cells from multiple batches are mixed. By considering distance distributions (in a lower dimensional space), the score is able to detect local batch bias and differentiate between unbalanced batches (i.e., when one cell type is more abundant in a batch) and systematic differences between cells of the same cell type. We implemented cms and related metrics to detect batch effects or measure structure preservation in the CellMixS R/Bioconductor package. We systematically compare different metrics that have been proposed to quantify batch effects or bias in scRNA-seq data using real datasets with known batch effects and synthetic data that mimic various real data scenarios. While these metrics target the same question and are used interchangeably, we find differences in inter- and intra-dataset scalability, sensitivity and in a metric’s ability to handle batch effects with differentially abundant cell types. We find that cell-specific metrics outperform cell type-specific and global metrics and recommend them for both method benchmarks and batch exploration.

Publication

Small RNA-seq analysis of single porcine blastocysts revealed that maternal estradiol-17beta exposure does not affect miRNA isoform (isomiR) expression

Publisher: Springer Science and Business Media LLC

Date: 06-08-2018

DOI: 10.1186/S12864-018-4954-9

Publication

Copy-number-aware differential analysis of quantitative DNA sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 09-08-2012

DOI: 10.1101/GR.139055.112

Abstract: Developments in microarray and high-throughput sequencing (HTS) technologies have resulted in a rapid expansion of research into epigenomic changes that occur in normal development and in the progression of disease, such as cancer. Not surprisingly, copy number variation (CNV) has a direct effect on HTS read densities and can therefore bias differential detection results. We have developed a flexible approach called ABCD-DNA (affinity-based copy-number-aware differential quantitative DNA sequencing analyses) that integrates CNV and other systematic factors directly into the differential enrichment engine.

Publication

miRNA-Seq normalization comparisons need improvement

Publisher: Cold Spring Harbor Laboratory

Date: 24-04-2013

DOI: 10.1261/RNA.037895.112

Publication

pubassistant.ch: consolidating publication profiles of researchers

Publisher: F1000 Research Ltd

Date: 13-04-2022

DOI: 10.12688/F1000RESEARCH.73493.3

Abstract: Online accounts to keep track of scientific publications, such as Open Researcher and Contributor ID (ORCID) or Google Scholar, can be time consuming to maintain and synchronize. Furthermore, the open access status of publications is often not easily accessible, hindering potential opening of closed publications. To lessen the burden of managing personal profiles, we developed an R Shiny app that allows publication lists from multiple platforms to be retrieved and consolidated, as well as interactive exploration and comparison of publication profiles. A live version can be found at pubassistant.ch.

Publication

On the discovery of subpopulation-specific state transitions from multi-sample multi-condition single-cell RNA sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 26-07-2019

DOI: 10.1101/713412

Abstract: Single-cell RNA sequencing (scRNA-seq) has quickly become an empowering technology to profile the transcriptomes of individual cells on a large scale. Many early analyses of differential expression have aimed at identifying differences between subpopulations, and thus are focused on finding subpopulation markers either in a single sample or across multiple samples. More generally, such methods can compare expression levels in multiple sets of cells, thus leading to cross-condition analyses. However, given the emergence of replicated multi-condition scRNA-seq datasets, an area of increasing focus is making sample-level inferences, termed here as differential state analysis. For example, one could investigate the condition-specific responses of cell subpopulations measured from patients from each condition; however, it is not clear which statistical framework best handles this situation. In this work, we surveyed the methods available to perform cross-condition differential state analyses, including cell-level mixed models and methods based on aggregated “pseudobulk” data. We developed a flexible simulation platform that mimics both single and multi-sample scRNA-seq data and provide robust tools for multi-condition analysis within the muscat R package.

Publication

Definition and characterization of a trypsinosome from specific peptide characteristics by nano-HPLC-MS/MS and in silico analysis of complex protein mixtures.

Publisher: American Chemical Society (ACS)

Date: 20-10-2004

DOI: 10.1021/PR049909X

Abstract: Although HPLC-ESI-MS/MS is rapidly becoming an indispensable tool for the analysis of peptides in complex mixtures, the sequence coverage it affords is often quite poor. Low protein expression resulting in peptide signal intensities that fall below the limit of detection of the MS system in combination with differences in peptide ionization efficiency plays a significant role in this. A second important factor stems from differences in physicochemical properties of each peptide and how these properties relate to chromatographic retention and ultimate detection. To identify and understand those properties, we compared data from experimentally identified peptides with data from peptides predicted by in silico digest of all corresponding proteins in the experimental set. Three different complex protein mixtures extracted were used to define a training set to evaluate the amino acid retention coefficients based on linear regression analysis. The retention coefficients were also compared with other previous hydrophobic and retention scale. From this, we have constructed an empirical model that can be readily used to predict peptides that are likely to be observed on our HPLC-ESI-MS/MS system based on their physicochemical properties. Finally, we demonstrated that in silico prediction of peptides and their retention coefficients can be used to generate an inclusion list for a targeted mass spectrometric identification of low abundance proteins in complex protein samples. This approach is based on experimentally derived data to calibrate the method and therefore may theoretically be applied to any HPLC-MS/MS system on which data are being generated.

Publication

Scholarly Communication Practices in Humanities and Social Sciences: A Study of Researchers’ Attitudes and Awareness of Open Access

Publisher: Walter de Gruyter GmbH

Date: 12-2018

DOI: 10.1515/OPIS-2018-0013

Abstract: This paper examines issues relating to the perceptions and adoption of open access (OA) and institutional repositories. Using a survey research design, we collected data from academics and other researchers in the humanities, arts and social sciences (HASS) at a university in Australia. We looked at factors influencing choice of publishers and journal outlets, as well as the use of social media and nontraditional channels for scholarly communication. We used an online questionnaire to collect data and used descriptive statistics to analyse the data. Our findings suggest that researchers are highly influenced by traditional measures of quality, such as journal impact factor, and are less concerned with making their work more findable and promoting it through social media. This highlights a disconnect between researchers’ desired outcomes and the efforts that they put in toward the same. Our findings also suggest that institutional policies have the potential to increase OA awareness and adoption. This study contributes to the growing literature on scholarly communication by offering evidence from the HASS field, where limited studies have been conducted. Based on the findings, we recommend that academic librarians engage with faculty through outreach and workshops to change perceptions of OA and the institutional repository.

Publication

Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value

Publisher: Springer Science and Business Media LLC

Date: 02-02-2015

DOI: 10.1038/NCOMMS6899

Abstract: Epigenetic alterations in the cancer methylome are common in breast cancer and provide novel options for tumour stratification. Here, we perform whole-genome methylation capture sequencing on small amounts of DNA isolated from formalin-fixed, paraffin-embedded tissue from triple-negative breast cancer (TNBC) and matched normal samples. We identify differentially methylated regions (DMRs) enriched with promoters associated with transcription factor binding sites and DNA hypersensitive sites. Importantly, we stratify TNBCs into three distinct methylation clusters associated with better or worse prognosis and identify 17 DMRs that show a strong association with overall survival, including DMRs located in the Wilms tumour 1 (WT1) gene, bi-directional-promoter and antisense WT1-AS. Our data reveal that coordinated hypermethylation can occur in oestrogen receptor-negative disease, and that characterizing the epigenetic framework provides a potential signature to stratify TNBCs. Together, our findings demonstrate the feasibility of profiling the cancer methylome with limited archival tissue to identify regulatory regions associated with cancer.

Publication

benchmarkR: an R package for benchmarking genome-scale methods

Publisher: Cold Spring Harbor Laboratory

Date: 17-04-2015

DOI: 10.1101/018200

Abstract: benchmarkR is an R package designed to assess and visualize the performance of statistical methods for datasets that have an independent truth (e.g., simulations or datasets with large-scale validation), in particular for methods that claim to control false discovery rates (FDR). We augment some of the standard performance plots (e.g., receiver operating characteristic, or ROC, curves) with information about how well the methods are calibrated (i.e., whether they achieve their expected FDR control). For example, performance plots are extended with a point to highlight the power or FDR at a user-set threshold (e.g., at a method's estimated 5% FDR). The package contains general containers to store simulation results (SimResults) and methods to create graphical summaries, such as receiver operating characteristic curves (rocX), false discovery plots (fdX) and power-to-achieved FDR plots (powerFDR); each plot is augmented with some form of calibration information. We find these plots to be an improved way to interpret relative performance of statistical methods for genomic datasets where many hypothesis tests are performed. The strategies, however, are general and will find applications in other domains.

Publication

pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single cell RNA-seq preprocessing tools

Publisher: Springer Science and Business Media LLC

Date: 09-2020

DOI: 10.1186/S13059-020-02136-7

Abstract: We present pipeComp ( lger ipeComp ), a flexible R framework for pipeline comparison handling interactions between analysis steps and relying on multi-level evaluation metrics. We apply it to the benchmark of single-cell RNA-sequencing analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction, and clustering. pipeComp can easily integrate any other step, tool, or evaluation metric, allowing extensible benchmarks and easy applications to other fields, as we demonstrate through a study of the impact of removal of unwanted variation on differential expression analysis.

Publication

A cis-regulatory element promoting increased transcription at low temperature in cultured ectothermic Drosophila cells

Publisher: Springer Science and Business Media LLC

Date: 28-10-2021

DOI: 10.1186/S12864-021-08057-4

Abstract: Temperature change affects the myriad of concurrent cellular processes in a non-uniform, disruptive manner. While endothermic organisms minimize the challenge of ambient temperature variation by keeping the core body temperature constant, cells of many ectothermic species maintain homeostatic function within a considerable temperature range. The cellular mechanisms enabling temperature acclimation in ectotherms are still poorly understood. At the transcriptional level, the heat shock response has been analyzed extensively. The opposite, the response to sub-optimal temperature, has received less attention, in particular in animal species. The tissue specificity of transcriptional responses to cool temperature has not been addressed and it is not clear whether a prominent general response occurs. Cis-regulatory elements (CREs), which mediate increased transcription at cool temperature, and responsible transcription factors are largely unknown. The ectotherm Drosophila melanogaster with a presumed temperature optimum around 25 °C was used for transcriptomic analyses of effects of temperatures at the lower end of the readily tolerated range (14–29 °C). Comparative analyses with adult flies and cell culture lines indicated a striking degree of cell-type specificity in the transcriptional response to cool. To identify potential cis-regulatory elements (CREs) for transcriptional upregulation at cool temperature, we analyzed temperature effects on DNA accessibility in chromatin of S2R+ cells. Candidate cis-regulatory elements (CREs) were evaluated with a novel reporter assay for accurate assessment of their temperature-dependency. Robust transcriptional upregulation at low temperature could be demonstrated for a fragment from the pastrel gene, which expresses more transcript and protein at reduced temperatures. This CRE is controlled by the JAK/STAT signaling pathway and antagonizing activities of the transcription factors Pointed and Ets97D.
Beyond a rich data resource for future analyses of transcriptional control within the readily tolerated range of an ectothermic animal, a novel reporter assay permitting quantitative characterization of CRE temperature dependence was developed. Our identification and functional dissection of the pst_E1 enhancer demonstrate the utility of resources and assay. The functional characterization of this CoolUp enhancer provides initial mechanistic insights into transcriptional upregulation induced by a shift to temperatures at the lower end of the readily tolerated range.

Publication

Supervised spatial inference of dissociated single-cell data with SageNet

Publisher: Cold Spring Harbor Laboratory

Date: 15-04-2022

DOI: 10.1101/2022.04.14.488419

Abstract: Spatially-resolved transcriptomics uncovers patterns of gene expression at supercellular, cellular, or subcellular resolution, providing insights into spatially variable cellular functions, diffusible morphogens, and cell-cell interactions. However, for practical reasons, multiplexed single cell RNA-sequencing remains the most widely used technology for profiling transcriptomes of single cells, especially in the context of large-scale anatomical atlassing. Devising techniques to accurately predict the latent physical positions as well as the latent cell-cell proximities of such dissociated cells, represents an exciting and new challenge. Most of the current approaches rely on an ‘autocorrelation’ assumption, i.e., cells with similar transcriptomic profiles are located close to each other in physical space and vice versa. However, this is not always the case in native biological contexts due to complex morphological and functional patterning. To address this challenge, we developed SageNet, a graph neural network approach that spatially reconstructs dissociated single cell data using one or more spatial references. SageNet first estimates a gene-gene interaction network from a reference spatial dataset. This informs the structure of the graph on which the graph neural network is trained to predict the region of dissociated cells. Finally, SageNet produces a low-dimensional embedding of the query dataset, corresponding to the reconstructed spatial coordinates of the dissociated tissue. Furthermore, SageNet reveals spatially informative genes by extracting the most important features from the neural network model. We demonstrate the utility and robust performance of SageNet using molecule-resolved seqFISH and spot-based Spatial Transcriptomics reference datasets as well as dissociated single-cell data, across multiple biological contexts. SageNet is provided as an open-source Python software package at github.com/MarioniLab/SageNet.

Publication

Wnt inhibitory factor 1 (WIF1) is a marker of osteoblastic differentiation stage and is not silenced by DNA methylation in osteosarcoma

Publisher: Elsevier BV

Date: 04-2015

DOI: 10.1016/J.BONE.2014.12.063

Abstract: Wnt pathway targeting is of high clinical interest for treating bone loss disorders such as osteoporosis. These therapies inhibit the action of negative regulators of osteoblastic Wnt signaling. The report that Wnt inhibitory factor 1 (WIF1) was epigenetically silenced via promoter DNA methylation in osteosarcoma (OS) raised potential concerns for such treatment approaches. Here we confirm that Wif1 expression is frequently reduced in OS. However, we demonstrate that silencing is not driven by DNA methylation. Treatment of mouse and human OS cells showed that Wif1 expression was robustly induced by HDAC inhibition but not by methylation inhibition. Consistent with HDAC dependent silencing, the Wif1 locus in OS was characterized by low acetylation levels and a bivalent H3K4/H3K27-trimethylation state. Wif1 expression marked late stages of normal osteoblast maturation and stratified OS tumors based on differentiation stage across species. Culture of OS cells under differentiation inductive conditions increased expression of Wif1. Together these results demonstrate that Wif1 is not targeted for silencing by DNA methylation in OS. Instead, the reduced expression of Wif1 in OS cells is in context with their stage in differentiation.

Publication

Correction: TNFR2 induced priming of the inflammasome leads to a RIPK1-dependent cell death in the absence of XIAP

Publisher: Springer Science and Business Media LLC

Date: 23-01-2020

DOI: 10.1038/S41419-020-2261-2

Abstract: The original version of this article contained an error in the name of one of the co-authors (Erika Owsley). This has been corrected in the PDF and HTML versions.

Mark Robinson

Researcher

Related Links

Publications

Common Features of Regulatory T Cell Specialization During Th1 Responses

An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets

A comprehensive examination of Nanopore native RNA sequencing for characterization of complex transcriptomes

CyTOF workflow: differential discovery in high-throughput high-dimensional cytometry datasets [version 1; referees: awaiting peer review]

SampleQC: robust multivariate, multi-cell type, multi-sample quality control for single-cell data

A reference single-cell map of freshly dissociated human synovium in inflammatory arthritis with an optimized dissociation protocol for prospective synovial biopsy collection

Benchmarking comes of age

H3K4me3 enrichment defines neuronal age, while a youthful H3K27ac signature is recapitulated in aged neurons

Shedding Light on the Transcriptomic Dark Matter in Biological Psychiatry: Role of Long Noncoding RNAs in D-cycloserine-Induced Fear Extinction in Posttraumatic Stress Disorder

Highly efficient DNA-free gene disruption in the agricultural pest Ceratitis capitata by CRISPR-Cas9 RNPs

A general and powerful stage-wise testing procedure for differential expression and differential transcript usage

Chromothripsis-like patterns are recurring but heterogeneously distributed features in a survey of 22,347 cancer genome screens

A systematic performance evaluation of clustering methods for single-cell RNA-seq data

distinct: a novel approach to differential distribution analyses

Transformation of alignment files improves performance of variant callers for long-read RNA sequencing data

Global landscape of protein complexes in the yeast Saccharomyces cerevisiae

Phase I Trial Characterizing the Pharmacokinetic Profile of N-803, a Chimeric IL-15 Superagonist, in Healthy Volunteers

pipeComp, a general framework for the evaluation of computational pipelines, reveals performant single-cell RNA-seq preprocessing tools

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

Computational epigenomics: challenges and opportunities

Robustly detecting differential expression in RNA sequencing data using observation weights

RNA Sequencing Data: Hitchhiker's Guide to Expression Analysis

A Panoramic View of Yeast Noncoding RNA Processing

Gapless provides combined scaffolding, gap filling, and assembly correction with long reads

T-cell acute leukaemia exhibits dynamic interactions with bone marrow microenvironments

BAZ2A (TIP5) is involved in epigenetic alterations in prostate cancer and its overexpression predicts disease recurrence

Carbon brainprint – An estimate of the intellectual contribution of research institutions to reducing greenhouse gas emissions

LincRNAs involved in DCS-induced fear extinction: Shedding light on the transcriptomic dark matter

Are Epigenetic Factors Implicated in Chronic Widespread Pain?

Essential guidelines for computational method benchmarking

zingeR: unlocking RNA-seq tools for zero-inflation and single cell applications

A junction coverage compatibility score to quantify the reliability of transcript abundance estimates and annotation catalogs

De novo assembly and sex-specific transcriptome profiling in the sand fly Phlebotomus perniciosus (Diptera, Phlebotominae), a major Old World vector of Leishmania infantum

BayMeth: improved DNA methylation quantification for affinity capture sequencing data using a flexible Bayesian approach

Maleness-on-the-Y (MoY) orchestrates male sex determination in major agricultural fruit fly pests

Mass Cytometric and Transcriptomic Profiling of Epithelial-Mesenchymal Transitions in Human Mammary Cell Lines

SampleQC: robust multivariate, multi-celltype, multi-sample quality control for single cell data

DifferentialRegulation: a Bayesian hierarchical approach to identify differentially regulated genes

From RNA-seq reads to differential expression results

Evaluation of affinity-based genome-wide DNA methylation data: Effects of CpG density, amplification bias, and copy number variation

Doublet identification in single-cell sequencing data using scDblFinder

Observation weights unlock bulk RNA-seq tools for zero inflation and single-cell applications

Built on sand: the shaky foundations of simulating single-cell RNA sequencing data

Faithful mRNA splicing depends on the Prp19 complex subunit faint sausage and is required for tracheal branching morphogenesis in Drosophila

Abscisic acid is a substrate of the ABC transporter encoded by the durable wheat disease resistance gene Lr34

Author Correction: High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy

Observation weights to unlock bulk RNA-seq tools for zero inflation and single-cell applications

Channel crosstalk correction in suspension and imaging mass cytometry

The hematopoietic oncoprotein FOXP1 promotes tumor cell survival in diffuse large B-cell lymphoma by repressing S1PR2 signaling

ALT-803, an IL-15 superagonist, in combination with nivolumab in patients with metastatic non-small cell lung cancer: a non-randomised, open-label, phase 1b trial

Validation of hypermethylated DNA regions found in colorectal cancers as potential aging-independent biomarkers of precancerous colorectal lesions

Meta-analysis of (single-cell method) benchmarks reveals the need for extensibility and interoperability

Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data

Male sex in houseflies is determined by Mdmd, a paralog of the generic splice factor gene CWC22

Differential splicing using whole-transcript microarrays

Do count-based differential expression methods perform poorly when genes are expressed in only one condition?

High-Definition Macromolecular Composition of Yeast RNA-Processing Complexes

TAR Syndrome-associated Rbm8a deficiency causes hematopoietic defects and attenuates Wnt/PCP signaling

ARMOR: An Automated Reproducible MOdular Workflow for Preprocessing and Differential Analysis of RNA-seq Data

censcyt: censored covariates in differential abundance analysis in cytometry

RNA sequencing data: hitchhiker's guide to expression analysis

Fibroblastic reticular cells initiate immune responses in visceral adipose tissues and secure peritoneal immunity

Mammalian Annotation Database for improved annotation and functional classification of Omics datasets from less well-annotated organisms

iCOBRA: open, reproducible, standardized and live method benchmarking

Eleven grand challenges in single-cell data science

Gapless provides combined scaffolding, gap filling and assembly correction with long reads

Loss of the Notch effector RBPJ promotes tumorigenesis

The Spinal Transcriptome after Cortical Stroke: In Search of Molecular Factors Regulating Spontaneous Recovery in the Spinal Cord

Repitools: an R package for the analysis of enrichment-based epigenomic data