ARDC Research Link Australia

Publication

JAFFAL: Detecting fusion genes with long read transcriptome sequencing

Publisher: Cold Spring Harbor Laboratory

Date: 26-04-2021

DOI: 10.1101/2021.04.26.441398

Abstract: Massively parallel short read transcriptome sequencing has greatly expanded our knowledge of fusion genes which are drivers of tumor initiation and progression. In cancer, many fusions are also important diagnostic markers and targets for therapy. Long read transcriptome sequencing allows the full length of fusion transcripts to be discovered; however, this data has a high rate of errors and fusion finding algorithms designed for short reads do not work. While numerous fusion finding algorithms now exist for short read RNA sequencing data, there are few methods to detect fusions using third generation or long read sequencing data. Fusion finding in long read sequencing will allow the discovery of the full isoform structure of fusion genes. Here we present JAFFAL, a method to identify fusions from long-read transcriptome sequencing. We validated JAFFAL using simulation, cell line and patient data from Nanopore and PacBio. We show that fusions can be accurately detected in long read data with JAFFAL, providing better accuracy than other long read fusion finders and with similar performance as state-of-the-art methods applied to short read data. By comparing Nanopore transcriptome sequencing protocols we find that numerous chimeric molecules are generated during cDNA library preparation that are absent when RNA is sequenced directly. We demonstrate that JAFFAL enables fusions to be detected at the level of individual cells, when applied to long read single cell sequencing. Moreover, we demonstrate JAFFAL can identify fusions spanning three genes, highlighting the utility of long reads to characterise the transcriptional products of complex structural rearrangements with unprecedented resolution. JAFFAL is open source and available as part of the JAFFA package at github.com/Oshlack/JAFFA/wiki.

Publication

A Combination of Genomic Approaches Reveals the Role of FOXO1a in Regulating an Oxidative Stress Response Pathway

Publisher: Public Library of Science (PLoS)

Date: 27-02-2008

DOI: 10.1371/JOURNAL.PONE.0001670

Publication

Multimodal single cell analysis of the paediatric lower airway reveals novel immune cell phenotypes in early life health and disease

Publisher: Cold Spring Harbor Laboratory

Date: 17-06-2022

DOI: 10.1101/2022.06.17.496207

Abstract: Inflammation is a key driver of cystic fibrosis (CF) lung disease, not addressed by current standard care. Improved understanding of the mechanisms leading to aberrant inflammation may assist the development of effective anti-inflammatory therapy. Single-cell RNA sequencing (scRNA-seq) allows profiling of cell composition and function at unprecedented resolution. Herein, we seek to use multimodal single-cell analysis to comprehensively define immune cell phenotypes, proportions and functional characteristics in preschool children with CF. We analyzed 42,658 cells from bronchoalveolar lavage of 11 preschool children with CF and a healthy control using scRNA-seq and parallel assessment of 154 cell surface proteins. Validation of cell types identified by scRNA-seq was achieved by assessment of samples by spectral flow cytometry. Analysis of transcriptome expression and cell surface protein expression, combined with functional pathway analysis, revealed 41 immune and epithelial cell populations in BAL. Spectral flow cytometry analysis of over 256,000 cells from a subset of the same patients revealed high correlation in major cell type proportions across the two technologies. Macrophages consisted of 13 functionally distinct subpopulations, including previously undescribed populations enriched for markers of vesicle production and regulatory/repair functions. Other novel cell populations included CD4 T cells expressing inflammatory IFNα/β and NFκB signalling genes. Our work provides a comprehensive cellular analysis of the pediatric lower airway in preschool children with CF, reveals novel cell types and provides a reference for investigation of inflammation in early life CF.

Publication

Co-option of the cardiac transcription factor Nkx2.5 during development of the emu wing

Publisher: Springer Science and Business Media LLC

Date: 25-07-2017

DOI: 10.1038/S41467-017-00112-7

Abstract: The ratites are a distinctive clade of flightless birds, typified by the emu and ostrich that have acquired a range of unique anatomical characteristics since diverging from basal Aves at least 100 million years ago. The emu possesses a vestigial wing with a single digit and greatly reduced forelimb musculature. However, the embryological basis of wing reduction and other anatomical changes associated with loss of flight are unclear. Here we report a previously unknown co-option of the cardiac transcription factor Nkx2.5 to the forelimb in the emu embryo, but not in ostrich, or chicken and zebra finch, which have fully developed wings. Nkx2.5 is expressed in emu limb bud mesenchyme and maturing wing muscle, and mis-expression of Nkx2.5 throughout the limb bud in chick results in wing reductions. We propose that Nkx2.5 functions to inhibit early limb bud expansion and later muscle growth during development of the vestigial emu wing.

Publication

Nephron progenitor commitment is a stochastic process influenced by cell migration.

Publisher: eLife Sciences Publications, Ltd

Date: 24-01-2019

DOI: 10.7554/ELIFE.41156

Abstract: Progenitor self-renewal and differentiation is often regulated by spatially restricted cues within a tissue microenvironment. Here, we examine how progenitor cell migration impacts regionally induced commitment within the nephrogenic niche in mice. We identify a subset of cells that express Wnt4, an early marker of nephron commitment, but migrate back into the progenitor population where they accumulate over time. Single cell RNA-seq and computational modelling of returning cells reveals that nephron progenitors can traverse the transcriptional hierarchy between self-renewal and commitment in either direction. This plasticity may enable robust regulation of nephrogenesis as niches remodel and grow during organogenesis.

Publication

Transcriptional profiles for distinct aggregation states of mutant Huntingtin exon 1 protein unmask new Huntington's disease pathways

Publisher: Elsevier BV

Date: 09-2017

DOI: 10.1016/J.MCN.2017.07.004

Abstract: Huntington's disease is caused by polyglutamine (polyQ)-expansion mutations in the CAG tandem repeat of the Huntingtin gene. The central feature of Huntington's disease pathology is the aggregation of mutant Huntingtin (Htt) protein into micrometer-sized inclusion bodies. Soluble mutant Htt states are most proteotoxic and trigger an enhanced risk of death whereas inclusions confer different changes to cellular health, and may even provide adaptive responses to stress. Yet the molecular mechanisms underpinning these changes remain unclear. Using the flow cytometry method of pulse-shape analysis (PulSA) to sort neuroblastoma (Neuro2a) cells enriched with mutant or wild-type Htt into different aggregation states, we clarified which transcriptional signatures were specifically attributable to cells before versus after inclusion assembly. Dampened CREB signalling was the most striking change overall and invoked specifically by soluble mutant Httex1 states. Toxicity could be rescued by stimulation of CREB signalling. Other biological processes mapped to different changes before and after aggregation included NF-kB signalling, autophagy, SUMOylation, transcription regulation by histone deacetylases and BRD4, NAD+ biosynthesis, ribosome biogenesis and altered HIF-1 signalling. These findings open the path for therapeutic strategies targeting key molecular changes invoked prior to, and subsequently to, Httex1 aggregation.

Publication

The nature of the optical emission in radio-selected AGN

Publisher: WORLD SCIENTIFIC

Date: 10-2004

DOI: 10.1142/9789812702432_0029

Publication

STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 14-06-2021

DOI: 10.1101/2021.06.13.448220

Abstract: Short tandem repeats (STRs) are highly polymorphic with high mutation rates and expansions of STRs have been implicated as the causal variant in diseases. The application of genome sequencing in patients has recently allowed many new discoveries with over 50 disease causing loci known to date. There are several tools which allow genotyping of STRs from high-throughput sequencing (HTS) data. However, running these tools out of the box only allows around half of the known disease-causing loci to be genotyped, with lengths often limited to either read or fragment length which is less than the pathogenic cut-off for some diseases. While analysis tools can be customised to genotype extra loci, this requires proficiency in bioinformatics to set up, use, and analyse the resulting data, limiting their widespread usage by other researchers and clinicians. To address these issues, we have created new software called STRipy that has an intuitive graphical interface and requires no specific skills for usage, thus significantly simplifying detection of STR expansions from human HTS data. STRipy is able to target all known disease-causing STRs with genotyping performed with an established tool, ExpansionHunter, that is incorporated into the software. We have built additional functionality into STRipy to work with long alleles exceeding the fragment length. STRipy was validated using over 60 thousand simulated samples and was shown to work on whole genome sequencing of biological samples with pathogenic variants. Finally, we have used STRipy to acquire genotypes of pathogenic loci for thousands of samples from various populations which are provided to the user along with the data from the literature to assist with results interpretation. We believe the simplicity and breadth of STRipy will increase the testing of STR diseases in current datasets resulting in further diagnoses of rare diseases caused by STR expansions.

Publication

Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis

Publisher: Cold Spring Harbor Laboratory

Date: 09-10-2017

DOI: 10.1101/200287

Abstract: RNA-Seq analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data is mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods. Necklace is available from github.com/Oshlack/necklace/wiki under GPL 3.0.

Publication

Using DNA microarrays to study gene expression in closely related species

Publisher: Oxford University Press (OUP)

Date: 23-03-2007

DOI: 10.1093/BIOINFORMATICS/BTM111

Abstract: Motivation: Comparisons of gene expression levels within and between species have become a central tool in the study of the genetic basis for phenotypic variation, as well as in the study of the evolution of gene regulation. DNA microarrays are a key technology that enables these studies. Currently, however, microarrays are only available for a small number of species. Thus, in order to study gene expression levels in species for which microarrays are not available, researchers face three sets of choices: (i) use a microarray designed for another species, but only compare gene expression levels within species, (ii) construct a new microarray for every species whose gene expression profiles will be compared or (iii) build a multi-species microarray with probes from each species of interest. Here, we use data collected using a multi-primate cDNA array to evaluate the reliability of each approach. Results: We find that, for inter-species comparisons, estimates of expression differences based on multi-species microarrays are more accurate than those based on multiple species-specific arrays. We also demonstrate that within-species expression differences can be estimated using a microarray for a closely related species, without discernible loss of information. Contact: A.O. (oshlack@wehi.edu.au) or Y.G. (gilad@uchicago.edu) Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

MINTIE: identifying novel structural and splice variants in transcriptomes using RNA-seq data

Publisher: Cold Spring Harbor Laboratory

Date: 04-06-2020

DOI: 10.1101/2020.06.03.131532

Abstract: Genomic rearrangements can modify gene function by altering transcript sequences, and have been shown to be drivers in both cancer and rare diseases. Although there are now many methods to detect structural variants from Whole Genome Sequencing (WGS), RNA sequencing (RNA-seq) remains under-utilised as a technology for the detection of gene altering structural variants. Calling fusion genes from RNA-seq data is well established, but other transcriptional variants such as fusions with novel sequence, tandem duplications, large insertions and deletions, and novel splicing are difficult to detect using existing approaches. To identify all types of variants in transcriptomes, we developed MINTIE, an integrated pipeline for RNA-seq data. We take a reference free approach, which combines de novo assembly of transcripts with differential expression analysis, to identify up-regulated novel variants in a case sample. We validated MINTIE on simulated and real data sets and compared it with eight other approaches for finding novel transcriptional variants. We found MINTIE was able to detect % of variants while no other method was able to achieve this. We applied MINTIE to RNA-seq data from a cohort of acute lymphoblastic leukemia (ALL) patient samples and identified several clinically relevant variants, including a recurrent unpartnered fusion involving the tumour suppressor gene RB1, and variants in ALL-associated genes: tandem duplications in IKZF1 and PAX5, and novel splicing in ETV6. We further demonstrate the utility of MINTIE to identify rare disease variants using RNA-seq, including the discovery of an inter-chromosomal translocation in the DMD gene in a patient with muscular dystrophy. We posit that MINTIE will be able to identify new disease variants across a range of cancers and other disease types.

Publication

Accuracy of short tandem repeats genotyping tools in whole exome sequencing data

Publisher: F1000 Research Ltd

Date: 23-03-2020

DOI: 10.12688/F1000RESEARCH.22639.1

Abstract: Background: Short tandem repeats are an important source of genetic variation. They are highly mutable and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advantages in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeats genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. Methods: The analysis was performed on the Simons Simplex Collection dataset, where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. Results: We determined a relatively good performance of all tools when genotyping repeats of 3-6 bp in length, which could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. Conclusions: All tools have different strengths and weaknesses and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls.

Publication

Bazam: a rapid method for read extraction and realignment of high-throughput sequencing data

Publisher: Springer Science and Business Media LLC

Date: 18-04-2019

DOI: 10.1186/S13059-019-1688-1

Publication

Jarid2 regulates hematopoietic stem cell function by acting with polycomb repressive complex 2.

Publisher: American Society of Hematology

Date: 19-03-2015

DOI: 10.1182/BLOOD-2014-10-603969

Abstract: Depletion of Jarid2 in mouse and human hematopoietic stem cells enhances their activity. Jarid2 acts as part of PRC2 in hematopoietic stem and progenitor cells.

Publication

NKX2-5 regulates human cardiomyogenesis via a HEY2 dependent transcriptional network

Publisher: Springer Science and Business Media LLC

Date: 10-04-2018

DOI: 10.1038/S41467-018-03714-X

Abstract: Congenital heart defects can be caused by mutations in genes that guide cardiac lineage formation. Here, we show deletion of NKX2-5, a critical component of the cardiac gene regulatory network, in human embryonic stem cells (hESCs), results in impaired cardiomyogenesis, failure to activate VCAM1 and to downregulate the progenitor marker PDGFRα. Furthermore, NKX2-5 null cardiomyocytes have abnormal physiology, with asynchronous contractions and altered action potentials. Molecular profiling and genetic rescue experiments demonstrate that the bHLH protein HEY2 is a key mediator of NKX2-5 function during human cardiomyogenesis. These findings identify HEY2 as a novel component of the NKX2-5 cardiac transcriptional network, providing tangible evidence that hESC models can decipher the complex pathways that regulate early stage human heart development. These data provide a human context for the evaluation of pathogenic mutations in congenital heart disease.

Publication

Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data

Publisher: Oxford University Press (OUP)

Date: 18-05-2015

DOI: 10.1093/NAR/GKV526

Publication

Optimizing the noise versus bias trade-off for Illumina whole genome expression BeadChips

Publisher: Oxford University Press (OUP)

Date: 06-10-2010

DOI: 10.1093/NAR/GKQ871

Publication

Natural selection on gene expression

Publisher: Elsevier BV

Date: 08-2006

DOI: 10.1016/J.TIG.2006.06.002

Abstract: Changes in genetic regulation contribute to adaptations in natural populations and influence susceptibility to human diseases. Despite their potential phenotypic importance, the selective pressures acting on regulatory processes in general and gene expression levels in particular are largely unknown. Studies in model organisms suggest that the expression levels of most genes evolve under stabilizing selection, although a few are consistent with adaptive evolution. However, it has been proposed that gene expression levels in primates evolve largely in the absence of selective constraints. In this article, we discuss the microarray-based observations that led to these disparate interpretations. We conclude that in both primates and model organisms, stabilizing selection is likely to be the dominant mode of gene expression evolution. An important implication is that mutations affecting gene expression will often be deleterious and might underlie many human diseases.

Publication

The spectral energy distribution of PKS 2004-447: a compact steep-spectrum source and possible radio-loud narrow-line Seyfert 1 galaxy

Publisher: Oxford University Press (OUP)

Date: 21-07-2006

DOI: 10.1111/J.1365-2966.2006.10482.X

Publication

Identification of candidate gonadal sex differentiation genes in the chicken embryo using RNA-seq

Publisher: Springer Science and Business Media LLC

Date: 16-09-2015

DOI: 10.1186/S12864-015-1886-5

Publication

SWAN: Subset-quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips

Publisher: Springer Science and Business Media LLC

Date: 2012

DOI: 10.1186/GB-2012-13-6-R44

Publication

Single cell analysis of the developing mouse kidney provides deeper insight into marker gene expression and ligand-receptor crosstalk

Publisher: The Company of Biologists

Date: 12-06-2019

DOI: 10.1242/DEV.178673

Abstract: Recent advances in the generation of kidney organoids and the culture of primary nephron progenitors from mouse and human have been based on knowledge of the molecular basis of kidney development in mice. Although gene expression during kidney development has been intensely investigated, single cell profiling provides new opportunities to further subsect component cell types and the signalling networks at play. Here, we describe the generation and analysis of 6732 single cell transcriptomes from the fetal mouse kidney [embryonic day (E)18.5] and 7853 sorted nephron progenitor cells (E14.5). These datasets provide improved resolution of cell types and specific markers, including subdivision of the renal stroma and heterogeneity within the nephron progenitor population. Ligand-receptor interaction and pathway analysis reveals novel crosstalk between cellular compartments and associates new pathways with differentiation of nephron and ureteric epithelium cell types. We identify transcriptional congruence between the distal nephron and ureteric epithelium, showing that most markers previously used to identify ureteric epithelium are not specific. Together, this work improves our understanding of metanephric kidney development and provides a template to guide the regeneration of renal tissue.

Publication

Enhancer retargeting of CDX2 and UBTF::ATXN7L3 define a subtype of high-risk B-progenitor acute lymphoblastic leukemia

Publisher: American Society of Hematology

Date: 16-06-2022

DOI: 10.1182/BLOOD.2022015444

Abstract: Transcriptome sequencing has identified multiple subtypes of B-progenitor acute lymphoblastic leukemia (B-ALL) of prognostic significance, but a minority of cases lack a known genetic driver. Here, we used integrated whole-genome (WGS) and -transcriptome sequencing (RNA-seq), enhancer mapping, and chromatin topology analysis to identify previously unrecognized genomic drivers in B-ALL. Newly diagnosed (n = 3221) and relapsed (n = 177) B-ALL cases with tumor RNA-seq were studied. WGS was performed to detect mutations, structural variants, and copy number alterations. Integrated analysis of histone 3 lysine 27 acetylation and chromatin looping was performed using HiChIP. We identified a subset of 17 newly diagnosed and 5 relapsed B-ALL cases with a distinct gene expression profile and 2 universal and unique genomic alterations resulting from aberrant recombination-activating gene activation: a focal deletion downstream of PAN3 at 13q12.2 resulting in CDX2 deregulation by the PAN3 enhancer and a focal deletion of exons 18-21 of UBTF at 17q21.31 resulting in a chimeric fusion, UBTF::ATXN7L3. A subset of cases also had rearrangement and increased expression of the PAX5 gene, which is otherwise uncommon in B-ALL. Patients were more commonly female and young adult with median age 35 (range, 12-70 years). The immunophenotype was characterized by CD10 negativity and immunoglobulin M positivity. Among 16 patients with known clinical response, 9 (56.3%) had high-risk features including relapse (n = 4) or minimal residual disease & % at the end of remission induction (n = 5). CDX2-deregulated, UBTF::ATXN7L3 rearranged (CDX2/UBTF) B-ALL is a high-risk subtype of leukemia in young adults for which novel therapeutic approaches are required.

Publication

Clustering trees: a visualization for evaluating clusterings at multiple resolutions

Publisher: Oxford University Press (OUP)

Date: 07-2018

DOI: 10.1093/GIGASCIENCE/GIY083

Publication

STRetch: detecting and discovering pathogenic short tandem repeat expansions

Publisher: Cold Spring Harbor Laboratory

Date: 04-07-2017

DOI: 10.1101/159228

Abstract: Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Historically, pathogenic STR expansions could only be detected by single locus techniques, such as PCR and electrophoresis. The ability to use short read sequencing data to screen for STR expansions has the potential to reduce both the time and cost to reaching diagnosis and enable the discovery of new causal STR loci. Most existing tools detect STR variation within the read length, and so are unable to detect the majority of pathogenic expansions. Those tools that can detect large expansions are limited to a set of known disease loci and as yet no new disease causing STR expansions have been identified with high-throughput sequencing technologies. Here we address this by presenting STRetch, a new genome-wide method to detect STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting pathogenic STR expansions in short-read whole genome sequencing data with a very low false discovery rate. We further demonstrate the application of STRetch to solve cases of patients with undiagnosed disease and apply STRetch to the analysis of 97 whole genomes to reveal variation at STR loci. STRetch assesses expansions at all STR loci in the genome and allows screening for novel disease-causing STRs. STRetch is open source software, available from github.com/Oshlack/STRetch.

Publication

Expression profiling in primates reveals a rapid evolution of human transcription factors

Publisher: Springer Science and Business Media LLC

Date: 03-2006

DOI: 10.1038/NATURE04559

Abstract: Although it has been hypothesized for thirty years that many human adaptations are likely to be due to changes in gene regulation, almost nothing is known about the modes of natural selection acting on regulation in primates. Here we identify a set of genes for which expression is evolving under natural selection. We use a new multi-species complementary DNA array to compare steady-state messenger RNA levels in liver tissues within and between humans, chimpanzees, orangutans and rhesus macaques. Using estimates from a linear mixed model, we identify a set of genes for which expression levels have remained constant across the entire phylogeny (approximately 70 million years), and are therefore likely to be under stabilizing selection. Among the top candidates are five genes with expression levels that have previously been shown to be altered in liver carcinoma. We also find a number of genes with similar expression levels among non-human primates but significantly elevated or reduced expression in the human lineage, features that point to the action of directional selection. Among the gene set with a human-specific increase in expression, there is an excess of transcription factors; the same is not true for genes with increased expression in chimpanzee.

Publication

Near infrared micro-variability of radio-loud quasars

Publisher: Cambridge University Press (CUP)

Date: 2002

DOI: 10.1071/AS01083

Abstract: We observed three AGN from the Parkes Half-Jansky Flat-spectrum Sample at near infrared (NIR) wavelengths to search for micro-variability. In one source, the blue quasar PKS 2243–123, good evidence for NIR micro-variability was found. In the other two sources, PKS 2240–260 and PKS 2233–148, both BL Lacertae objects, no such evidence of variability was detected. We discuss the implications of these observations for the various mechanisms that have been proposed for micro-variability.

Publication

Functionally distinct roles for different miR-155 expression levels through contrasting effects on gene expression, in acute myeloid leukaemia.

Publisher: Springer Science and Business Media LLC

Date: 14-10-2017

DOI: 10.1038/LEU.2016.279

Abstract: Enforced expression of microRNA-155 (miR-155) in myeloid cells has been shown to have both oncogenic and tumour-suppressor functions in acute myeloid leukaemia (AML). We sought to resolve these contrasting effects of miR-155 overexpression using murine models of AML and human paediatric AML data sets. We show that the highest miR-155 expression levels inhibited proliferation in murine AML models. Over time, enforced miR-155 expression in AML in vitro and in vivo, however, favours selection of intermediate miR-155 expression levels that result in increased tumour burden in mice, without accelerating the onset of disease. Strikingly, we show that intermediate and high miR-155 expression also regulate very different subsets of miR-155 targets and have contrasting downstream effects on the transcriptional environments of AML cells, including genes involved in haematopoiesis and leukaemia. Furthermore, we show that elevated miR-155 expression detected in paediatric AML correlates with intermediate and not high miR-155 expression identified in our experimental models. These findings collectively describe a novel dose-dependent role for miR-155 in the regulation of AML, which may have important therapeutic implications.

Publication

Bpipe: a tool for running and managing bioinformatics pipelines

Publisher: Oxford University Press (OUP)

Date: 12-04-2012

DOI: 10.1093/BIOINFORMATICS/BTS167

Abstract: Summary: Bpipe is a simple, dedicated programming language for defining and executing bioinformatics pipelines. It specializes in enabling users to turn existing pipelines based on shell scripts or command line tools into highly flexible, adaptable and maintainable workflows with a minimum of effort. Bpipe ensures that pipelines execute in a controlled and repeatable fashion and keeps audit trails and logs to ensure that experimental results are reproducible. Requiring only Java as a dependency, Bpipe is fully self-contained and cross-platform, making it very easy to adopt and deploy into existing environments. Availability and implementation: Bpipe is freely available from bpipe.org under a BSD License. Contact: simon.sadedin@mcri.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Publication

A scaling normalization method for differential expression analysis of RNA-seq data

Publisher: Springer Science and Business Media LLC

Date: 2010

DOI: 10.1186/GB-2010-11-3-R25

Publication

A cross-package Bioconductor workflow for analysing methylation array data

Publisher: F1000 Research Ltd

Date: 08-06-2016

DOI: 10.12688/F1000RESEARCH.8839.1

Abstract: Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioconductor workflow using multiple packages for the analysis of methylation array data. Specifically, we demonstrate the steps involved in a typical differential methylation analysis pipeline including: quality control, filtering, normalization, data exploration and statistical testing for probe-wise differential methylation. We further outline other analyses such as differential methylation of regions, differential variability analysis, estimating cell type composition and gene ontology testing. Finally, we provide some examples of how to visualise methylation array data.

Publication

A cross-package Bioconductor workflow for analysing methylation array data.

Publisher: F1000 Research Ltd

Date: 26-07-2016

DOI: 10.12688/F1000RESEARCH.8839.2

Abstract: Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioconductor workflow using multiple packages for the analysis of methylation array data. Specifically, we demonstrate the steps involved in a typical differential methylation analysis pipeline including: quality control, filtering, normalization, data exploration and statistical testing for probe-wise differential methylation. We further outline other analyses such as differential methylation of regions, differential variability analysis, estimating cell type composition and gene ontology testing. Finally, we provide some examples of how to visualise methylation array data.

Publication

MLL-TFE3: a novel and aggressive KMT2A fusion identified in infant leukemia

Publisher: American Society of Hematology

Date: 09-10-2020

DOI: 10.1182/BLOODADVANCES.2020002708

Abstract: A novel KMT2A-rearrangement, MLL-TFE3, was identified in an infant leukemia patient. MLL-TFE3 expression produces aggressive leukemia in a mouse model.

Publication

Black hole mass estimates of radio-selected quasars

Publisher: American Astronomical Society

Date: 09-2002

DOI: 10.1086/341729

Publication

A cross-package Bioconductor workflow for analysing methylation array data

Publisher: F1000 Research Ltd

Date: 05-04-2017

DOI: 10.12688/F1000RESEARCH.8839.3

Abstract: Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioconductor workflow using multiple packages for the analysis of methylation array data. Specifically, we demonstrate the steps involved in a typical differential methylation analysis pipeline including: quality control, filtering, normalization, data exploration and statistical testing for probe-wise differential methylation. We further outline other analyses such as differential methylation of regions, differential variability analysis, estimating cell type composition and gene ontology testing. Finally, we provide some examples of how to visualise methylation array data.

Publication

Detecting copy number alterations in RNA-Seq using SuperFreq

Publisher: Cold Spring Harbor Laboratory

Date: 06-2020

DOI: 10.1101/2020.05.31.126888

Abstract: Calling copy number alterations (CNAs) from RNA-Seq is challenging, because differences in gene expression mean that read depth across genes varies by several orders of magnitude and there is a paucity of informative single nucleotide polymorphisms (SNPs). We previously developed SuperFreq to analyse exome data of tumours by combining variant calling and copy number estimation in an integrated pipeline. Here we have used the SuperFreq framework for the analysis of RNA sequencing (RNA-Seq) data, which allows for the detection of absolute and allele sensitive CNAs. SuperFreq uses an error-propagation framework to combine and maximise the information available in the read depth and B-allele frequencies of SNPs (BAFs) to make CNA calls on RNA-seq data. We used data from The Cancer Genome Atlas (TCGA) to evaluate the CNAs called from RNA-Seq with those generated from SNP-arrays. When ploidy estimates were consistent, we found excellent agreement with CNAs called from DNA of over 98% of the genome for acute myeloid leukaemia (TCGA-AML, n=116) and 87% for colorectal cancer (TCGA-CRC, n=377), which has a much higher CNA burden. As expected, the sensitivity of CNA calling from RNA-Seq was dependent on gene density. Nonetheless, using RNA-Seq SuperFreq detected 78% of CNA calls covering 100 or more genes with a precision of 94%. Recall dropped markedly for focal events, but this also depended on the signal intensity. For example, in the CRC cohort SuperFreq identified 100% (7/7) of cases with high-level amplification of ERBB2, where the copy number was typically , but identified only 6% (1/17) of cases with moderate amplification of IGF2, typically 4 or 5 copies over a smaller region (median 5 flanking genes for IGF2, compared to 20 for ERBB2). We were able to reproduce the relationship between mutational load and CNA profile in CRC using RNA-Seq alone.
SuperFreq offers an integrated platform for identification of CNAs and point mutations from RNA-seq in cancer transcriptomes. The software is implemented in R and is available through GitHub: github.com/ChristofferFlensburg/SuperFreq .

Publication

splatPop: simulating population scale single-cell RNA sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 17-06-2021

DOI: 10.1101/2021.06.17.448806

Abstract: With improving technology and decreasing costs, single-cell RNA sequencing (scRNA-seq) at the population scale has become more viable, opening up the doors to study functional genomics at the single-cell level. This development has led to a rush to adapt bulk methods and develop new single-cell-specific methods and tools for computational analysis of these studies. Many single-cell methods have been tested, developed, and benchmarked using simulated data. However, current scRNA-seq simulation frameworks do not allow for the simulation of population-scale scRNA-seq data. Here, we present splatPop, a new Splatter model, for flexible, reproducible, and well documented simulation of population-scale scRNA-seq data with known expression quantitative trait loci (eQTL) effects. The splatPop model also allows for the simulation of complex batch effects, cell group effects, and conditional effects between individuals from different cohorts.

Publication

Splatter: simulation of single-cell RNA sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 02-05-2017

DOI: 10.1101/133173

Abstract: As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.

Publication

SuperFreq: Integrated mutation detection and clonal tracking in cancer

Publisher: Cold Spring Harbor Laboratory

Date: 30-07-2018

DOI: 10.1101/380097

Abstract: Analysing multiple cancer samples from an individual patient can provide insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression. Existing approaches for clonal tracking from sequencing data typically require the user to combine multiple tools that are not purpose-built for this task. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application. We developed SuperFreq, a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both. SuperFreq does not require a matched normal and instead relies on unrelated controls. When analysing multiple samples from a single patient, SuperFreq cross checks variant calls to improve clonal tracking, which helps to separate somatic from germline variants, and to resolve overlapping CNA calls. To demonstrate our software we analysed 304 cancer-normal exome samples across 33 cancer types in The Cancer Genome Atlas (TCGA) and evaluated the quality of the SNV and CNA calls. We simulated clonal evolution through in silico mixing of cancer and normal samples in known proportion. We found that SuperFreq identified 93% of clones with a cellular fraction of at least 50% and mutations were assigned to the correct clone with high recall and precision. In addition, SuperFreq maintained a similar level of performance for most aspects of the analysis when run without a matched normal. SuperFreq is highly versatile and can be applied in many different experimental settings for the analysis of exomes and other capture libraries. We demonstrate an application of SuperFreq to leukaemia patients with diagnosis and relapse samples. SuperFreq is implemented in R and available on GitHub at github.com/ChristofferFlensburg/SuperFreq .

Publication

Whole exome sequencing in systemic juvenile idiopathic arthritis

Publisher: Elsevier BV

Date: 02-2016

DOI: 10.1016/J.PATHOL.2015.12.106

Publication

Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes

Publisher: Springer Science and Business Media LLC

Date: 2007

DOI: 10.1186/GB-2007-8-1-R2

Publication

From RNA-seq reads to differential expression results

Publisher: Springer Science and Business Media LLC

Date: 2010

DOI: 10.1186/GB-2010-11-12-220

Publication

MCM3AP in recessive Charcot-Marie-Tooth neuropathy and mild intellectual disability

Publisher: Oxford University Press (OUP)

Date: 19-06-2017

DOI: 10.1093/BRAIN/AWX138

Abstract: Defects in mRNA export from the nucleus have been linked to various neurodegenerative disorders. We report mutations in the gene MCM3AP, encoding the germinal center associated nuclear protein (GANP), in nine affected individuals from five unrelated families. The variants were associated with severe childhood onset primarily axonal (four families) or demyelinating (one family) Charcot-Marie-Tooth neuropathy. Mild to moderate intellectual disability was present in seven of nine affected individuals. The affected individuals were either compound heterozygous or homozygous for different MCM3AP variants, which were predicted to cause depletion of GANP or affect conserved amino acids with likely importance for its function. Accordingly, fibroblasts of affected individuals from one family demonstrated severe depletion of GANP. GANP has been described to function as an mRNA export factor, and to suppress TDP-43-mediated motor neuron degeneration in flies. Thus our results suggest defective mRNA export from nucleus as a potential pathogenic mechanism of axonal degeneration in these patients. The identification of MCM3AP variants in affected individuals from multiple centres establishes it as a disease gene for childhood-onset recessively inherited Charcot-Marie-Tooth neuropathy with intellectual disability.

Publication

A comparison of background correction methods for two-colour microarrays.

Publisher: Oxford University Press (OUP)

Date: 25-08-2007

DOI: 10.1093/BIOINFORMATICS/BTM412

Abstract: Motivation: Microarray data must be background corrected to remove the effects of non-specific binding or spatial heterogeneity across the array, but this practice typically causes other problems such as negative corrected intensities and high variability of low intensity log-ratios. Different estimators of background, and various model-based processing methods, are compared in this study in search of the best option for differential expression analyses of small microarray experiments. Results: Using data where some independent truth in gene expression is known, eight different background correction alternatives are compared, in terms of precision and bias of the resulting gene expression measures, and in terms of their ability to detect differentially expressed genes as judged by two popular algorithms, SAM and limma eBayes. A new background processing method (normexp) is introduced which is based on a convolution model. The model-based correction methods are shown to be markedly superior to the usual practice of subtracting local background estimates. Methods which stabilize the variances of the log-ratios along the intensity range perform the best. The normexp+offset method is found to give the lowest false discovery rate overall, followed by morph and vsn. Like vsn, normexp is applicable to most types of two-colour microarray data. Availability: The background correction methods compared in this article are available in the R package limma (Smyth, 2005) from www.bioconductor.org. Contact: smyth@wehi.edu.au Supplementary information: Supplementary data are available from bioinf.wehi.edu.au/resources/webReferences.html.

Publication

The role of cardiac transcription factor NKX2-5 in regulating the human cardiac miRNAome

Publisher: Springer Science and Business Media LLC

Date: 04-11-2019

DOI: 10.1038/S41598-019-52280-9

Abstract: MicroRNAs (miRNAs) are translational regulatory molecules with recognised roles in heart development and disease. Therefore, it is important to define the human miRNA expression profile in cardiac progenitors and early-differentiated cardiomyocytes and to determine whether critical cardiac transcription factors such as NKX2-5 regulate miRNA expression. We used an NKX2-5 eGFP/w reporter line to isolate both cardiac committed mesoderm and cardiomyocytes. We identified 11 miRNAs that were differentially expressed in NKX2-5-expressing cardiac mesoderm compared to non-cardiac mesoderm. Subsequent profiling revealed that the canonical myogenic miRNAs including MIR1-1, MIR133A1 and MIR208A were enriched in cardiomyocytes. Strikingly, deletion of NKX2-5 did not result in gross changes in the cardiac miRNA profile, either at committed mesoderm or cardiomyocyte stages. Thus, in early human cardiomyocyte commitment and differentiation, the cardiac myogenic miRNA program is predominantly regulated independently of the highly conserved NKX2-5-dependent gene regulatory network.

Publication

Benchmarking single-cell hashtag oligo demultiplexing methods

Publisher: Cold Spring Harbor Laboratory

Date: 21-12-2022

DOI: 10.1101/2022.12.20.521313

Abstract: Sample multiplexing is often used to reduce cost and limit batch effects in single-cell RNA sequencing (scRNA-seq) experiments. A commonly used multiplexing technique involves tagging cells prior to pooling with a hashtag oligo (HTO) that can be sequenced along with the cells’ RNA to determine their sample of origin. Several tools have been developed to demultiplex HTO sequencing data and assign cells to samples. In this study, we critically assess the performance of seven HTO demultiplexing tools: hashedDrops, HTODemux, GMM-Demux, demuxmix, deMULTIplex, BFF and HashSolo. The comparison uses data sets where each sample has also been demultiplexed using genetic variants from the RNA, enabling comparison of HTO demultiplexing techniques against complementary data from the genetic “ground truth”. We find that all methods perform similarly where HTO labelling is of high quality, but methods that assume a bimodal counts distribution perform poorly on lower quality data. We also suggest heuristic approaches for assessing the quality of HTO counts in a scRNA-seq experiment.

Publication

SuperTranscript: a data driven reference for analysis and visualisation of transcriptomes

Publisher: Cold Spring Harbor Laboratory

Date: 27-09-2016

DOI: 10.1101/077750

Abstract: Numerous methods have been developed to analyse RNA sequencing data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. De novo transcriptome assembly can build a reference transcriptome from the non-model sequencing data, but falls short of allowing most tools to be applied. Here we present superTranscripts, a simple but powerful solution to bridge that gap. SuperTranscripts are a substitute for a reference genome, consisting of all the unique exonic sequence, in transcriptional order, such that each gene is represented by a single sequence. We demonstrate how superTranscripts allow visualization, variant detection and differential isoform detection in non-model organisms, using widely applied methods that are designed to work with reference genomes. SuperTranscripts can also be applied to model organisms to enhance visualization and discover novel expressed sequence. We describe Lace, software to construct superTranscripts from any set of transcripts including de novo assembled transcriptomes. In addition we used Lace to combine reference and assembled transcriptomes for chicken and recovered the sequence of hundreds of gaps in the reference genome.

Publication

SFPQ-ABL1 and BCR-ABL1 use different signaling networks to drive B-cell acute lymphoblastic leukemia

Publisher: American Society of Hematology

Date: 07-04-2022

DOI: 10.1182/BLOODADVANCES.2021006076

Abstract: Philadelphia-like (Ph-like) acute lymphoblastic leukemia (ALL) is a high-risk subtype of B-cell ALL characterized by a gene expression profile resembling Philadelphia chromosome–positive ALL (Ph+ ALL) in the absence of BCR-ABL1. Tyrosine kinase–activating fusions, some involving ABL1, are recurrent drivers of Ph-like ALL and are targetable with tyrosine kinase inhibitors (TKIs). We identified a rare instance of SFPQ-ABL1 in a child with Ph-like ALL. SFPQ-ABL1 expressed in cytokine-dependent cell lines was sufficient to transform cells and these cells were sensitive to ABL1-targeting TKIs. In contrast to BCR-ABL1, SFPQ-ABL1 localized to the nuclear compartment and was a weaker driver of cellular proliferation. Phosphoproteomics analysis showed upregulation of cell cycle, DNA replication, and spliceosome pathways, and downregulation of signal transduction pathways, including ErbB, NF-κB, vascular endothelial growth factor (VEGF), and MAPK signaling in SFPQ-ABL1–expressing cells compared with BCR-ABL1–expressing cells. SFPQ-ABL1 expression did not activate phosphatidylinositol 3-kinase/protein kinase B (PI3K/AKT) signaling and was associated with phosphorylation of G2/M cell cycle proteins. SFPQ-ABL1 was sensitive to navitoclax and S-63845 and promotes cell survival by maintaining expression of Mcl-1 and Bcl-xL. SFPQ-ABL1 has functionally distinct mechanisms by which it drives ALL, including subcellular localization, proliferative capacity, and activation of cellular pathways. These findings highlight the role that fusion partners have in mediating the function of ABL1 fusions.

Publication

Slinker: Visualising novel splicing events in RNA-Seq data [version 1; peer review: awaiting peer review]

Publisher: F1000 Research Ltd

Date: 07-12-2021

DOI: 10.12688/F1000RESEARCH.74836.1

Abstract: Visualisation of the transcriptome relative to a reference genome is fraught with sparsity. This is due to RNA sequencing (RNA-Seq) reads being predominantly mapped to exons that account for just under 3% of the human genome. Recently, we have used exon-only references, superTranscripts, to improve visualisation of aligned RNA-Seq data through the omission of supposedly unexpressed regions such as introns. However, variation within these regions can lead to novel splicing events that may drive a pathogenic phenotype. In these cases, the loss of information in only retaining annotated exons presents significant drawbacks. Here we present Slinker, a bioinformatics pipeline written in Python and Bpipe that uses a data-driven approach to assemble sample-specific superTranscripts. At its core, Slinker uses Stringtie2 to assemble transcripts with any sequence across any gene. This assembly is merged with reference transcripts, converted to a superTranscript, of which rich visualisations are made through Plotly with associated annotation and coverage information. Slinker was validated on five novel splicing events of rare disease samples from a cohort of primary muscular disorders. In addition, Slinker was shown to be effective in visualising deletion events within transcriptomes of tumour samples in the important leukemia gene, IKZF1. Slinker offers a succinct visualisation of RNA-Seq alignments across typically sparse regions and is freely available on GitHub.

Publication

STRipy: A graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data.

Publisher: Hindawi Limited

Date: 21-04-2022

DOI: 10.1002/HUMU.24382

Abstract: Expansions of short tandem repeats (STRs) have been implicated as the causal variant in over 50 diseases known to date. There are several tools which can genotype STRs from high-throughput sequencing (HTS) data. However, running these tools out of the box only allows around half of the known disease-causing loci to be genotyped. Furthermore, the genotypes estimated at these loci are often underestimated with maximum lengths limited to either the read or fragment length, which is less than the pathogenic cutoff for some diseases. Although analysis tools can be customized to genotype extra loci, this requires proficiency in bioinformatics to set up, limiting their widespread usage by other researchers and clinicians. To address these issues, we have developed a new software tool called STRipy, which is able to target all known disease-causing STRs from HTS data. We created an intuitive graphical interface for STRipy and significantly simplified the detection of STR expansions. Moreover, we genotyped all disease loci for over two and a half thousand samples to provide population-wide distributions to assist with interpretation of results. We believe the simplicity and breadth of STRipy will increase the genotyping of STRs in sequencing data resulting in further diagnoses of rare STR diseases.

Publication

A prospective evaluation of whole-exome sequencing as a first-tier molecular test in infants with suspected monogenic disorders

Publisher: Elsevier BV

Date: 11-2016

DOI: 10.1038/GIM.2016.1

Abstract: To prospectively evaluate the diagnostic and clinical utility of singleton whole-exome sequencing (WES) as a first-tier test in infants with suspected monogenic disease. Singleton WES was performed as a first-tier sequencing test in infants recruited from a single pediatric tertiary center. This occurred in parallel with standard investigations, including single- or multigene panel sequencing when clinically indicated. The diagnosis rate, clinical utility, and impact on management of singleton WES were evaluated. Of 80 enrolled infants, 46 received a molecular genetic diagnosis through singleton WES (57.5%) compared with 11 (13.75%) who underwent standard investigations in the same patient group. Clinical management changed following exome diagnosis in 15 of 46 diagnosed participants (32.6%). Twelve relatives received a genetic diagnosis following cascade testing, and 28 couples were identified as being at high risk of recurrence in future pregnancies. This prospective study provides strong evidence for increased diagnostic and clinical utility of singleton WES as a first-tier sequencing test for infants with a suspected monogenic disorder. Singleton WES outperformed standard care in terms of diagnosis rate and the benefits of a diagnosis, namely, impact on management of the child and clarification of reproductive risks for the extended family in a timely manner. Genet Med 18(11), 1090-1096.

Publication

Clustering trees: a visualisation for evaluating clusterings at multiple resolutions

Publisher: Cold Spring Harbor Laboratory

Date: 02-03-2018

DOI: 10.1101/274035

Abstract: Clustering techniques are widely used in the analysis of large data sets to group together samples with similar properties. For example, clustering is often used in the field of single-cell RNA-sequencing in order to identify different cell types present in a tissue sample. There are many algorithms for performing clustering and the results can vary substantially. In particular, the number of groups present in a data set is often unknown and the number of clusters identified by an algorithm can change based on the parameters used. To explore and examine the impact of varying clustering resolution we present clustering trees. This visualisation shows the relationships between clusters at multiple resolutions allowing researchers to see how samples move as the number of clusters increases. In addition, meta-information can be overlaid on the tree to inform the choice of resolution and guide in identification of clusters. We illustrate the features of clustering trees using a series of simulations as well as two real examples, the classical iris dataset and a complex single-cell RNA-sequencing dataset. Clustering trees can be produced using the clustree R package available from CRAN ( CRAN.R-project.org/package=clustree ) and developed on GitHub ( github.com/lazappi/clustree ).

Publication

Author response: Nephron progenitor commitment is a stochastic process influenced by cell migration

Publisher: eLife Sciences Publications, Ltd

Date: 24-12-2018

DOI: 10.7554/ELIFE.41156.030

Publication

Disorders of sex development: insights from targeted gene sequencing of a large international patient cohort

Publisher: Springer Science and Business Media LLC

Date: 29-11-2016

DOI: 10.1186/S13059-016-1105-Y

Publication

RNA sequencing reveals sexually dimorphic gene expression before gonadal differentiation in chicken and allows comprehensive annotation of the W-chromosome

Publisher: Springer Science and Business Media LLC

Date: 2013

DOI: 10.1186/GB-2013-14-3-R26

Publication

Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database

Publisher: Cold Spring Harbor Laboratory

Date: 20-10-2017

DOI: 10.1101/206573

Abstract: As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database ( www.scRNA-tools.org ) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records of the growth of the field over time. In recent years single-cell RNA-sequencing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets but the field is moving at such a rapid pace it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website ( www.scRNA-tools.org ).
Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focused on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.

Publication

Accuracy of short tandem repeats genotyping tools in whole exome sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 04-02-2020

DOI: 10.1101/2020.02.03.933002

Abstract: Short tandem repeats are an important source of genetic variation; they are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington’s disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will help other researchers choose a suitable tool and parameters for analysis. The analysis was performed on the Simons Simplex Collection dataset where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. We determined a relatively good performance of all tools when genotyping repeats of 3-6 bp in length which could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. All tools have different strengths and weaknesses and the choice may depend on the type of analysis. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls.

Publication

Cpipe: A shared variant detection pipeline designed for diagnostic settings

Publisher: Springer Science and Business Media LLC

Date: 10-07-2015

DOI: 10.1186/S13073-015-0191-X

Publication

Transcript length bias in RNA-seq data confounds systems biology

Publisher: Springer Science and Business Media LLC

Date: 2009

DOI: 10.1186/1745-6150-4-14

Publication

Gene Regulation in Primates Evolves under Tissue-Specific Selection Pressures

Publisher: Public Library of Science (PLoS)

Date: 21-11-2008

DOI: 10.1371/JOURNAL.PGEN.1000271

Publication

ALLSorts: an RNA-Seq subtype classifier for B-cell acute lymphoblastic leukemia

Publisher: American Society of Hematology

Date: 15-07-2022

DOI: 10.1182/BLOODADVANCES.2021005894

Publication

Diagnostic impact and cost-effectiveness of whole-exome sequencing for ambulant children with suspected monogenic conditions

Publisher: American Medical Association (AMA)

Date: 09-2017

DOI: 10.1001/JAMAPEDIATRICS.2017.1755

Publication

Fast and accurate differential transcript usage by testing equivalence class counts

Publisher: Cold Spring Harbor Laboratory

Date: 19-12-2018

DOI: 10.1101/501106

Abstract: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from ‘pseudo’ or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Here we propose performing DTU testing directly on equivalence class read counts. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. We find that EC counts have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. We posit that equivalence class counts are a natural unit on which to perform many types of analysis.

Publication

Cell-Type–Specific Transcriptional Profiles of the Dimorphic Pathogen Penicillium marneffei Reflect Distinct Reproductive, Morphological, and Environmental Demands

Publisher: Oxford University Press (OUP)

Date: 11-2013

DOI: 10.1534/G3.113.006809

Abstract: Penicillium marneffei is an opportunistic human pathogen endemic to Southeast Asia. At 25° P. marneffei grows in a filamentous hyphal form and can undergo asexual development (conidiation) to produce spores (conidia), the infectious agent. At 37° P. marneffei grows in the pathogenic yeast cell form that replicates by fission. Switching between these growth forms, known as dimorphic switching, is dependent on temperature. To understand the process of dimorphic switching and the physiological capacity of the different cell types, two microarray-based profiling experiments covering approximately 42% of the genome were performed. The first experiment compared cells from the hyphal, yeast, and conidiation phases to identify “phase or cell-state–specific” gene expression. The second experiment examined gene expression during the dimorphic switch from one morphological state to another. The data identified a variety of differentially expressed genes that have been organized into metabolic clusters based on predicted function and expression patterns. In particular, C-14 sterol reductase–encoding gene ergM of the ergosterol biosynthesis pathway showed high-level expression throughout yeast morphogenesis compared to hyphal. Deletion of ergM resulted in severe growth defects with increased sensitivity to azole-type antifungal agents but not amphotericin B. The data defined gene classes based on spatio-temporal expression such as those expressed early in the dimorphic switch but not in the terminal cell types and those expressed late. Such classifications have been helpful in linking a given gene of interest to its expression pattern throughout the P. marneffei dimorphic life cycle and its likely role in pathogenicity.

Publication

STRetch

Publisher: Springer Science and Business Media LLC

Date: 21-08-2018

DOI: 10.1186/S13059-018-1505-2

Publication

splatPop: simulating population scale single-cell RNA sequencing data

Publisher: Springer Science and Business Media LLC

Date: 12-2021

DOI: 10.1186/S13059-021-02546-1

Abstract: Population-scale single-cell RNA sequencing (scRNA-seq) is now viable, enabling finer resolution functional genomics studies and leading to a rush to adapt bulk methods and develop new single-cell-specific methods to perform these studies. Simulations are useful for developing, testing, and benchmarking methods but current scRNA-seq simulation frameworks do not simulate population-scale data with genetic effects. Here, we present splatPop, a model for flexible, reproducible, and well-documented simulation of population-scale scRNA-seq data with known expression quantitative trait loci. splatPop can also simulate complex batch, cell group, and conditional effects between individuals from different cohorts as well as genetically-driven co-expression.

Publication

3D organoid-derived human glomeruli for personalised podocyte disease modelling and drug screening.

Publisher: Springer Science and Business Media LLC

Date: 04-12-2018

DOI: 10.1038/S41467-018-07594-Z

Abstract: The podocytes within the glomeruli of the kidney maintain the filtration barrier by forming interdigitating foot processes with intervening slit diaphragms, disruption in which results in proteinuria. Studies into human podocytopathies to date have employed primary or immortalised podocyte cell lines cultured in 2D. Here we compare 3D human glomeruli sieved from induced pluripotent stem cell-derived kidney organoids with conditionally immortalised human podocyte cell lines, revealing improved podocyte-specific gene expression, maintenance in vitro of polarised protein localisation and an improved glomerular basement membrane matrisome compared to 2D cultures. Organoid-derived glomeruli retain marker expression in culture for 96 h, proving amenable to toxicity screening. In addition, 3D organoid glomeruli from a congenital nephrotic syndrome patient with compound heterozygous NPHS1 mutations reveal reduced protein levels of both NEPHRIN and PODOCIN. Hence, human iPSC-derived organoid glomeruli represent an accessible approach to the in vitro modelling of human podocytopathies and screening for podocyte toxicity.

Publication

JAFFAL: detecting fusion genes with long-read transcriptome sequencing

Publisher: Springer Science and Business Media LLC

Date: 06-01-2022

DOI: 10.1186/S13059-021-02588-5

Abstract: In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion finding algorithms designed for short reads do not work. Here we present JAFFAL, to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes demonstrating transcripts detected from complex rearrangements. JAFFAL is available at github.com/Oshlack/JAFFA/wiki .

Publication

propeller: Testing for differences in cell type proportions in single cell data

Publisher: Cold Spring Harbor Laboratory

Date: 28-11-2021

DOI: 10.1101/2021.11.28.470236

Abstract: Single cell RNA Sequencing (scRNA-seq) has rapidly gained popularity over the last few years for profiling the transcriptomes of thousands to millions of single cells. This technology is now being used to analyse experiments with complex designs including biological replication. One question that can be asked from single cell experiments, which has been difficult to directly address with bulk RNA-seq data, is whether the cell type proportions are different between two or more experimental conditions. As well as gene expression changes, the relative depletion or enrichment of a particular cell type can be the functional consequence of disease or treatment. However, cell type proportion estimates from scRNA-seq data are variable and statistical methods that can correctly account for different sources of variability are needed to confidently identify statistically significant shifts in cell type composition between experimental conditions. We have developed propeller, a robust and flexible method that leverages biological replication to find statistically significant differences in cell type proportions between groups. Using simulated cell type proportions data we show that propeller performs well under a variety of scenarios. We applied propeller to test for significant changes in proportions of cell types related to human heart development, ageing and COVID-19 disease severity. The propeller method is publicly available in the open source speckle R package (phipsonlab/speckle). All the analysis code for the paper is available at phipsonlab/propeller-paper-analysis/, and the associated analysis website is available at phipsonlab.github.io/propeller-paper-analysis/. Contact: Alicia Oshlack (Alicia.Oshlack@petermac.org), Belinda Phipson (phipson.b@wehi.edu.au).

Publication

ALLSorts: a RNA-Seq classifier for B-Cell Acute Lymphoblastic Leukemia

Publisher: Cold Spring Harbor Laboratory

Date: 08-2021

DOI: 10.1101/2021.08.01.454393

Abstract: B-cell acute lymphoblastic leukemia (B-ALL) is the most common childhood cancer. Subtypes within B-ALL are distinguished by characteristic structural variants and mutations, which in some instances strongly correlate with responses to treatment. The World Health Organisation (WHO) recognises seven distinct classifications, or subtypes, as of 2016. However, recent studies have demonstrated that B-ALL can be segmented into 23 subtypes based on a combination of genomic features and gene expression profiles. A method to identify a patient’s subtype would have clear clinical utility. Despite this, no publicly available classification methods using RNA-Seq exist for this purpose. Here we present ALLSorts: a publicly available method that uses RNA-Seq data to classify B-ALL samples to 18 known subtypes and five meta-subtypes. ALLSorts is the result of a hierarchical supervised machine learning algorithm applied to a training set of 1223 B-ALL samples aggregated from multiple cohorts. Validation revealed that ALLSorts can accurately attribute samples to subtypes and can attribute multiple subtypes to a sample. Furthermore, when applied to both paediatric and adult cohorts, ALLSorts was able to classify previously undefined samples into subtypes. ALLSorts is available and documented on GitHub (github.com/Oshlack/AllSorts/). ALLSorts is a gene expression classifier for B-cell acute lymphoblastic leukemia, which predicts 18 distinct genomic subtypes, including those designated by the World Health Organisation (WHO) and provisional entities. It was trained and validated on over 2300 B-ALL samples, representing each subtype and a variety of clinical features. It correctly identified subtypes in 91% of cases in a held-out dataset and between 82% and 93% across a newly combined cohort of paediatric and adult samples. ALLSorts assigned subtypes to samples with previously unknown driver events.
ALLSorts is an accurate, comprehensive and freely available classification tool that distinguishes subtypes of B-cell acute lymphoblastic leukemia from RNA-sequencing.

Publication

As we come to the end of 2011, several members of the Genome Biology Editorial Board give their views on the state of play in genomics

Publisher: Springer Science and Business Media LLC

Date: 2011

DOI: 10.1186/GB-2011-12-12-137

Publication

Toblerone: detecting exon deletion events in cancer using RNA-seq

Publisher: Cold Spring Harbor Laboratory

Date: 31-10-2022

DOI: 10.1101/2022.10.27.514132

Abstract: Cancer is driven by mutations of the genome that can result in the activation of oncogenes or repression of tumour suppressor genes. In acute lymphoblastic leukemia (ALL) focal deletions in IKAROS family zinc finger 1 (IKZF1) result in the loss of zinc-finger DNA-binding domains and a dominant negative isoform that is associated with higher rates of relapse and poorer patient outcomes. Clinically, the presence of IKZF1 deletions informs prognosis and treatment options. In this work we developed a method for detecting exon deletions in genes using RNA-seq with application to IKZF1. We developed a pipeline that first uses a custom transcriptome reference consisting of transcripts with exon deletions. Next, RNA-seq reads are mapped using a pseudoalignment algorithm to identify reads that uniquely support deletions. These are then evaluated for evidence of the deletion with respect to gene expression and other samples. We applied the algorithm, named Toblerone, to a cohort of 99 B-ALL paediatric samples including validated IKZF1 deletions. Furthermore, we developed a graphical desktop app for non-bioinformatics users that can quickly and easily identify and report deletions in IKZF1 from RNA-seq data with informative graphical outputs.

Publication

TALLSorts: a T-cell acute lymphoblastic leukemia subtype classifier using RNA-seq expression data

Publisher: American Society of Hematology

Date: 30-10-2023

DOI: 10.1182/BLOODADVANCES.2023010385

Publication

Purification and Transcriptomic Analysis of Mouse Fetal Leydig Cells Reveals Candidate Genes for Specification of Gonadal Steroidogenic Cells

Publisher: Oxford University Press (OUP)

Date: 06-2015

DOI: 10.1095/BIOLREPROD.115.128918

Abstract: Male sex determination hinges on the development of testes in the embryo, beginning with the differentiation of Sertoli cells under the influence of the Y-linked gene SRY. Sertoli cells then orchestrate fetal testis formation including the specification of fetal Leydig cells (FLCs) that produce steroid hormones to direct virilization of the XY embryo. As the majority of XY disorders of sex development (DSDs) remain unexplained at the molecular genetic level, we reasoned that genes involved in FLC development might represent an unappreciated source of candidate XY DSD genes. To identify these genes, and to gain a more detailed understanding of the regulatory networks underpinning the specification and differentiation of the FLC population, we developed methods for isolating fetal Sertoli, Leydig, and interstitial cell-enriched subpopulations using an Sf1-eGFP transgenic mouse line. RNA sequencing followed by rigorous bioinformatic filtering identified 84 genes upregulated in FLCs, 704 genes upregulated in nonsteroidogenic interstitial cells, and 1217 genes upregulated in the Sertoli cells at 12.5 days postcoitum. The analysis revealed a trend for expression of components of neuroactive ligand interactions in FLCs and Sertoli cells and identified factors potentially involved in signaling between the Sertoli cells, FLCs, and interstitial cells. We identified 61 genes that were not known previously to be involved in specification or differentiation of FLCs. This dataset provides a platform for exploring the biology of FLCs and understanding the role of these cells in testicular development. In addition, it provides a basis for targeted studies designed to identify causes of idiopathic XY DSD.

Publication

Author Correction: Variants in SART3 cause a spliceosomopathy characterised by failure of testis development and neuronal defects (Nature Communications, (2023), 14, 1, (3403), 10.1038/s41467-023-39040-0)

Publisher: Springer Science and Business Media LLC

Date: 15-06-2023

DOI: 10.1038/S41467-023-39372-X

Publication

JAFFA: High sensitivity transcriptome-focused fusion gene detection.

Publisher: Springer Science and Business Media LLC

Date: 11-05-2015

DOI: 10.1186/S13073-015-0167-X

Publication

Differential Expression for RNA Sequencing (RNA-Seq) Data: Mapping, Summarization, Statistical Analysis, and Experimental Design

Publisher: Springer New York

Date: 22-09-2012

DOI: 10.1007/978-1-4614-0782-9_10

Publication

Toblerone: detecting exon deletion events in cancer using RNA-seq

Publisher: F1000 Research Ltd

Date: 03-02-2023

DOI: 10.12688/F1000RESEARCH.129490.1

Abstract: Cancer is driven by mutations of the genome that can result in the activation of oncogenes or repression of tumour suppressor genes. In acute lymphoblastic leukemia (ALL) focal deletions in IKAROS family zinc finger 1 (IKZF1) result in the loss of zinc-finger DNA-binding domains and a dominant negative isoform that is associated with higher rates of relapse and poorer patient outcomes. Clinically, the presence of IKZF1 deletions informs prognosis and treatment options. In this work we developed a method for detecting exon deletions in genes using RNA-seq with application to IKZF1. We developed a pipeline that first uses a custom transcriptome reference consisting of transcripts with exon deletions. Next, RNA-seq reads are mapped using a pseudoalignment algorithm to identify reads that uniquely support deletions. These are then evaluated for evidence of the deletion with respect to gene expression and other samples. We applied the algorithm, named Toblerone, to a cohort of 99 B-ALL paediatric samples including validated IKZF1 deletions. Furthermore, we developed a graphical desktop app for non-bioinformatics users that can quickly and easily identify and report deletions in IKZF1 from RNA-seq data with informative graphical outputs.

Publication

Unusual PDGFRB fusion reveals novel mechanism of kinase activation in Ph-like B-ALL

Publisher: Springer Science and Business Media LLC

Date: 21-02-2023

DOI: 10.1038/S41375-023-01843-X

Publication

Variants in SART3 cause a spliceosomopathy characterised by failure of testis development and neuronal defects

Publisher: Springer Science and Business Media LLC

Date: 09-06-2023

DOI: 10.1038/S41467-023-39040-0

Abstract: Squamous cell carcinoma antigen recognized by T cells 3 (SART3) is an RNA-binding protein with numerous biological functions including recycling small nuclear RNAs to the spliceosome. Here, we identify recessive variants in SART3 in nine individuals presenting with intellectual disability, global developmental delay and a subset of brain anomalies, together with gonadal dysgenesis in 46,XY individuals. Knockdown of the Drosophila orthologue of SART3 reveals a conserved role in testicular and neuronal development. Human induced pluripotent stem cells carrying patient variants in SART3 show disruption to multiple signalling pathways, upregulation of spliceosome components and demonstrate aberrant gonadal and neuronal differentiation in vitro. Collectively, these findings suggest that bi-allelic SART3 variants underlie a spliceosomopathy which we tentatively propose be termed INDYGON syndrome (Intellectual disability, Neurodevelopmental defects and Developmental delay with 46,XY GONadal dysgenesis). Our findings will enable additional diagnoses and improved outcomes for individuals born with this condition.

Publication

A 10-step guide to party conversation for bioinformaticians

Publisher: Springer Science and Business Media LLC

Date: 2013

DOI: 10.1186/GB-2013-14-1-104

Publication

Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis

Publisher: Oxford University Press (OUP)

Date: 05-2018

DOI: 10.1093/GIGASCIENCE/GIY045

Publication

Genome-wide DNA methylation analysis identifies hypomethylated genes regulated by FOXP3 in human regulatory T cells

Publisher: American Society of Hematology

Date: 17-10-2013

DOI: 10.1182/BLOOD-2013-02-481788

Abstract: Human naive CD4+ T cells and resting nTreg are differentially methylated at 127 regions in their genomic DNA. Forkhead-binding motifs are present in promoter-associated differentially methylated regions, inferring broader epigenetic control of Treg.

Publication

Splatter: simulation of single-cell RNA sequencing data

Publisher: Springer Science and Business Media LLC

Date: 12-09-2017

DOI: 10.1186/S13059-017-1305-0

Publication

Catchii: empowering literature review screening in healthcare

Publisher: Cold Spring Harbor Laboratory

Date: 14-02-2023

DOI: 10.1101/2023.02.10.23285791

Abstract: A systematic review is a type of literature review that aims to collect and analyse all available evidence from the literature on a particular topic. The process of screening and identifying eligible articles from the vast amounts of literature is a time-consuming task. Specialized software has been developed to aid in the screening process and save significant time and labour. However, the most suitable software tools that are available often come with a cost or only offer either a limited or a trial version for free. In this paper, we report the release of a new software application, Catchii, which contains all the necessary features of a systematic review screener application while being completely free. It supports a user at different stages of screening, from detecting duplicates to creating the final flowchart for a publication. Catchii is designed to provide a good user experience and streamline the screening process through its clean and user-friendly interface on both computers and mobile devices, as well as features such as multi-coloured keyword highlighting, the ability to screen titles and abstracts smoothly with an unstable or even absent internet connection, and more. Catchii is a valuable addition to the current selection of systematic review screening applications that also allows researchers without financial capabilities to access many of the features found in the best paid tools. Catchii is available at catchii.org

Publication

Haploinsufficiency for the Six2 gene increases nephron progenitor proliferation promoting branching and nephron number.

Publisher: Elsevier BV

Date: 03-2018

DOI: 10.1016/J.KINT.2017.09.015

Publication

Corset: enabling differential gene expression analysis for de novo assembled transcriptomes.

Publisher: Springer Science and Business Media LLC

Date: 2014

DOI: 10.1186/PREACCEPT-2088857056122054

Publication

Differentiation of human embryonic stem cells to HOXA+ hemogenic vasculature that resembles the aorta-gonad-mesonephros

Publisher: Springer Science and Business Media LLC

Date: 17-10-2016

DOI: 10.1038/NBT.3702

Abstract: The ability to generate hematopoietic stem cells from human pluripotent cells would enable many biomedical applications. We find that hematopoietic CD34

Publication

Women in Science

Publisher: Springer Science and Business Media LLC

Date: 08-03-2012

DOI: 10.1186/GB-2012-13-3-148

Publication

TALLSorts: a T-cell acute lymphoblastic leukaemia subtype classifier using RNA-seq expression data

Publisher: Cold Spring Harbor Laboratory

Date: 05-04-2023

DOI: 10.1101/2023.04.05.535648

Abstract: T-cell acute lymphoblastic leukaemia (T-ALL) is an aggressive and heterogenous haematological malignancy affecting both children and adults. T-ALL subtype identification is an emerging area of active research, with several recent studies proposing potential subtypes based on transcriptomic and genomic analyses. Here we present TALLSorts, a machine-learning bioinformatic tool which classifies T-ALL samples by using bulk RNA sequencing (RNA-seq) data. Trained on four international cohorts totalling 264 samples, TALLSorts exhibits excellent accuracy when tested on holdout and independent test sets. TALLSorts is publicly available for use and will be constantly updated as the field of T-ALL classification further develops.

Publication

Condensin I associates with structural and gene regulatory regions in vertebrate chromosomes

Publisher: Springer Science and Business Media LLC

Date: 03-10-2013

DOI: 10.1038/NCOMMS3537

Publication

Susceptibility to Acute Rheumatic Fever Based on Differential Expression of Genes Involved in Cytotoxicity, Chemotaxis, and Apoptosis

Publisher: American Society for Microbiology

Date: 02-2014

DOI: 10.1128/IAI.01152-13

Abstract: It is unknown why only some individuals are susceptible to acute rheumatic fever (ARF). We investigated whether there are differences in the immune response, detectable by gene expression, between individuals who are susceptible to ARF and those who are not. Peripheral blood mononuclear cells (PBMCs) from 15 ARF-susceptible and 10 nonsusceptible (control) adults were stimulated with rheumatogenic (Rh+) group A streptococci (GAS) or nonrheumatogenic (Rh−) GAS. RNA from stimulated PBMCs from each subject was cohybridized with RNA from unstimulated PBMCs on oligonucleotide arrays to compare gene expression. Thirty-four genes were significantly differentially expressed between ARF-susceptible and control groups after stimulation with Rh+ GAS. A total of 982 genes were differentially expressed between Rh+ GAS- and Rh− GAS-stimulated samples from ARF-susceptible individuals. Thirteen genes were differentially expressed in the same direction (predominantly decreased) between the two study groups and between the two stimulation conditions, giving a strong indication of their involvement. Seven of these were immune response genes involved in cytotoxicity, chemotaxis, and apoptosis. There was variability in the degree of expression change between individuals. The high proportion of differentially expressed apoptotic and immune response genes supports the current model of autoimmune and cytokine dysregulation in ARF. This study also raises the possibility that a “failed” immune response, involving decreased expression of cytotoxic and apoptotic genes, contributes to the immunopathogenesis of ARF.

Publication

A clinically driven variant prioritization framework outperforms purely computational approaches for the diagnostic analysis of singleton WES data

Publisher: Springer Science and Business Media LLC

Date: 23-08-2017

DOI: 10.1038/EJHG.2017.123

Publication

A comparison of control samples for ChIP-seq of histone modifications

Publisher: Frontiers Media SA

Date: 25-09-2014

DOI: 10.3389/FGENE.2014.00329

Publication

Limb patterning genes and heterochronic development of the emu wing bud

Publisher: Springer Science and Business Media LLC

Date: 03-2021

DOI: 10.1186/S13227-016-0063-5

Publication

Gene length and detection bias in single cell RNA sequencing protocols

Publisher: F1000 Research Ltd

Date: 28-04-2017

DOI: 10.12688/F1000RESEARCH.11290.1

Abstract: Background: Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity for profiling transcriptomes of hundreds to thousands of single cells. This technology has led to the discovery of novel cell types and revealed insights into the development of complex tissues. However, many technical challenges need to be overcome during data generation. Due to minute amounts of starting material, samples undergo extensive amplification, increasing technical variability. A solution for mitigating amplification biases is to include unique molecular identifiers (UMIs), which tag individual molecules. Transcript abundances are then estimated from the number of unique UMIs aligning to a specific gene, with PCR duplicates resulting in copies of the UMI not included in expression estimates. Methods: Here we investigate the effect of gene length bias in scRNA-Seq across a variety of datasets that differ in terms of capture technology, library preparation, cell types and species. Results: We find that scRNA-seq datasets that have been sequenced using a full-length transcript protocol exhibit gene length bias akin to bulk RNA-seq data. Specifically, shorter genes tend to have lower counts and a higher rate of dropout. In contrast, protocols that include UMIs do not exhibit gene length bias, with a mostly uniform rate of dropout across genes of varying length. Across four different scRNA-Seq datasets profiling mouse embryonic stem cells (mESCs), we found the subset of genes that are only detected in the UMI datasets tended to be shorter, while the subset of genes detected only in the full-length datasets tended to be longer. Conclusions: We find that the choice of scRNA-seq protocol influences the detection rate of genes, and that full-length datasets exhibit gene-length bias. In addition, despite clear differences between UMI and full-length transcript data, we illustrate that full-length and UMI data can be combined to reveal the underlying biology influencing expression of mESCs.

Publication

The polycomb repressive complex 2 governs life and death of peripheral T cells

Publisher: American Society of Hematology

Date: 31-07-2014

DOI: 10.1182/BLOOD-2013-12-544106

Abstract: Ezh2 represses Ifng, Gata3, and Il10 loci in naïve CD4+T cells, and its deficiency leads to Th1 skewing and IL-10 overproduction in Th2 cells. Ezh2 deficiency activates multiple death pathways in differentiated effector Th cells.

Publication

Gene ontology analysis for RNA-seq: accounting for selection bias

Publisher: Springer Science and Business Media LLC

Date: 2010

DOI: 10.1186/GB-2010-11-2-R14

Publication

Clinker: visualising fusion genes detected in RNA-seq data

Publisher: Cold Spring Harbor Laboratory

Date: 13-11-2017

DOI: 10.1101/218586

Abstract: Genomic profiling efforts have revealed a rich diversity of oncogenic fusion genes, and many are emerging as important therapeutic targets. While there are many ways to identify fusion genes from RNA-seq data, visualising these transcripts and their supporting reads remains challenging. Clinker is a bioinformatics tool written in Python, R and Bpipe, that leverages the superTranscript method to visualise fusion genes. We demonstrate the use of Clinker to obtain interpretable visualisations of the RNA-seq data that lead to fusion calls. In addition, we use Clinker to explore multiple fusion transcripts with novel breakpoints within the P2RY8-CRLF2 fusion gene in B-cell Acute Lymphoblastic Leukaemia (B-ALL). Clinker is freely available from GitHub (github.com/Oshlack/Clinker) under an MIT License. Contact: alicia.oshlack@mcri.edu.au

Publication

Statistical analysis of an RNA titration series evaluates microarray precision and sensitivity on a whole-array basis

Publisher: Springer Science and Business Media LLC

Date: 22-11-2006

DOI: 10.1186/1471-2105-7-511

Abstract: Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.

Publication

Clinker: visualizing fusion genes detected in RNA-seq data

Publisher: Oxford University Press (OUP)

Date: 07-2018

DOI: 10.1093/GIGASCIENCE/GIY079

Publication

Ximmer: a system for improving accuracy and consistency of CNV calling from exome data

Publisher: Oxford University Press (OUP)

Date: 06-09-2018

DOI: 10.1093/GIGASCIENCE/GIY112

Publication

Publisher Correction: The role of cardiac transcription factor NKX2-5 in regulating the human cardiac miRNAome

Publisher: Springer Science and Business Media LLC

Date: 27-12-2019

DOI: 10.1038/S41598-019-55970-6

Abstract: An amendment to this paper has been published and can be accessed via a link at the top of the paper.

Publication

Sierra: discovery of differential transcript usage from polyA-captured single-cell RNA-seq data

Publisher: Cold Spring Harbor Laboratory

Date: 06-12-2022

DOI: 10.1101/867309

Abstract: High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell-types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3’UTR shortening in cardiac fibroblasts. Sierra is available at github.com/VCCRI/Sierra .

Publication

Gene set enrichment analysis for genome-wide DNA methylation data

Publisher: Cold Spring Harbor Laboratory

Date: 25-08-2020

DOI: 10.1101/2020.08.24.265702

Abstract: DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalisation and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.

Publication

MINTIE: identifying novel structural and splice variants in transcriptomes using RNA-seq data

Publisher: Springer Science and Business Media LLC

Date: 22-10-2021

DOI: 10.1186/S13059-021-02507-8

Abstract: Calling fusion genes from RNA-seq data is well established, but other transcriptional variants are difficult to detect using existing approaches. To identify all types of variants in transcriptomes we developed MINTIE, an integrated pipeline for RNA-seq data. We take a reference-free approach, combining de novo assembly of transcripts with differential expression analysis to identify up-regulated novel variants in a case sample. We compare MINTIE with eight other approaches, detecting 85% of variants while no other method is able to achieve this. We posit that MINTIE will be able to identify new disease variants across a range of disease types.

Publication

SuperFreq: Integrated mutation detection and clonal tracking in cancer

Publisher: Public Library of Science (PLoS)

Date: 13-02-2020

DOI: 10.1371/JOURNAL.PCBI.1007603

Publication

Pooled-parent exome sequencing to prioritise de novo variants in genetic disease

Publisher: Cold Spring Harbor Laboratory

Date: 07-04-2019

DOI: 10.1101/601740

Abstract: In the clinical setting, exome sequencing has become standard-of-care in diagnosing rare genetic disorders, however many patients remain unsolved. Trio sequencing has been demonstrated to produce a higher diagnostic yield than singleton (proband-only) sequencing. Parental sequencing is especially useful when a disease is suspected to be caused by a de novo variant in the proband, because parental data provide a strong filter for the majority of variants that are shared by the proband and their parents. However the additional cost of sequencing the parents makes the trio strategy uneconomical for many clinical situations. With two thirds of the sequencing budget being spent on parents, these are funds that could be used to sequence more probands. For this reason many clinics are reluctant to sequence parents. Here we propose a pooled-parent strategy for exome sequencing of individuals with likely de novo disease. In this strategy, DNA from all the parents of a cohort of unrelated probands is pooled together into a single exome capture and sequencing run. Variants called in the proband can then be filtered if they are also found in the parent pool, resulting in a shorter list of prioritised variants. To evaluate the pooled-parent strategy we performed a series of simulations by combining reads from individual exomes to imitate sample pooling. We assessed the recall and false positive rate and investigated the trade-off between pool size and recall rate. We compared the performance of GATK HaplotypeCaller individual and joint calling, and FreeBayes to genotype pooled samples. Finally, we applied a pooled-parent strategy to a set of real unsolved cases and showed that the parent pool is a powerful filter that is complementary to other commonly used variant filters such as population variant frequencies.

Publication

miRNA-Seq normalization comparisons need improvement

Publisher: Cold Spring Harbor Laboratory

Date: 24-04-2013

DOI: 10.1261/RNA.037895.112

Publication

Sixty years of genome biology

Publisher: Springer Science and Business Media LLC

Date: 2013

DOI: 10.1186/GB-2013-14-4-113

Publication

Bazam: A rapid method for read extraction and realignment of high throughput sequencing data

Publisher: Cold Spring Harbor Laboratory

Date: 03-10-2018

DOI: 10.1101/433003

Abstract: As costs of high throughput sequencing have fallen, we are seeing vast quantities of short read genomic data being generated. Often, the data is exchanged and stored as aligned reads, which provides high compression and convenient access for many analyses. However, aligned data becomes outdated as new reference genomes and alignment methods become available. Moreover, some applications cannot utilise pre-aligned reads at all, necessitating conversion back to raw format (FASTQ) before they can be used. In both cases, the process of extraction and realignment is expensive and time consuming. We describe Bazam, a tool that efficiently extracts the original paired FASTQ from reads stored in aligned form (BAM or CRAM format). Bazam extracts reads in a format that directly allows realignment with popular aligners with high concurrency. Through eliminating steps and increasing the accessible concurrency, Bazam facilitates up to a 90% reduction in the time required for realignment compared to standard methods. Bazam can support selective extraction of read pairs from focused genomic regions, further increasing efficiency for targeted analyses. Bazam is additionally suitable as a base for other applications that require efficient paired read information, such as quality control, structural variant calling and alignment comparison. Bazam offers significant improvements for users needing to realign genomic data.

Publication

Segmental Duplications Contribute to Gene Expression Differences Between Humans and Chimpanzees

Publisher: Oxford University Press (OUP)

Date: 06-2009

DOI: 10.1534/GENETICS.108.099960

Abstract: In addition to specific changes in cis- and trans-regulatory elements, structural changes in the genome are hypothesized to underlie a large number of differences in gene expression between species. Accordingly, we show that species-specific segmental duplications are enriched with genes that are differentially expressed between humans and chimpanzees.

Publication

Expression discordance of monozygotic twins at birth: Effect of intrauterine environment and a possible mechanism for fetal programming

Publisher: Informa UK Limited

Date: 05-2011

DOI: 10.4161/EPI.6.5.15072

Abstract: Within-pair comparison of monozygotic (MZ) twins provides an ideal model for studying factors that regulate epigenetic profile, by controlling for genetic variation. Previous reports have demonstrated epigenetic variability within MZ pairs, but the contribution of early life exposures to this variation remains unclear. As epigenetic marks govern gene expression, we have used gene expression discordance as a proxy measure of epigenetic discordance in MZ twins at birth in two cell types. We found strong evidence of expression discordance at birth in both cell types and some evidence for higher discordance in twin pairs with separate placentas. Genes previously defined as being involved in response to the external environment showed the most variable expression within pairs, independent of cell type, supporting the idea that even slight differences in intrauterine environment can influence expression profile. Focusing on birthweight, previously identified as a predisposing factor for cardiovascular, metabolic and other complex diseases, and using a statistical model that estimated association based on within-pair variation of expression and birthweight, we found some association between birthweight and expression of genes involved in metabolism and cardiovascular function. This study is the first to examine expression discordance in newborn twins. It provides evidence of a link between birthweight and activity of specific cellular pathways and, as evidence points to gene expression profiles being maintained through cell division by epigenetic factors, provides a plausible biological mechanism for the previously described link between low birthweight and increased risk of later complex disease.

Publication

SuperTranscripts: a data driven reference for analysis and visualisation of transcriptomes

Publisher: Springer Science and Business Media LLC

Date: 04-08-2017

DOI: 10.1186/S13059-017-1284-1

Publication

Identification of recurrent FGFR3 fusion genes in lung cancer through kinome‐centred RNA sequencing

Publisher: Wiley

Date: 07-06-2013

DOI: 10.1002/PATH.4209

Abstract: Oncogenic fusion genes that involve kinases have proven to be effective targets for therapy in a wide range of cancers. Unfortunately, the diagnostic approaches required to identify these events are struggling to keep pace with the diverse array of genetic alterations that occur in cancer. Diagnostic screening in solid tumours is particularly challenging, as many fusion genes occur with a low frequency. To overcome these limitations, we developed a capture enrichment strategy to enable high-throughput transcript sequencing of the human kinome. This approach provides a global overview of kinase fusion events, irrespective of the identity of the fusion partner. To demonstrate the utility of this system, we profiled 100 non-small cell lung cancers and identified numerous genetic alterations impacting fibroblast growth factor receptor 3 (FGFR3) in lung squamous cell carcinoma and a novel ALK fusion partner in lung adenocarcinoma.

Publication

ChIP-seq analysis reveals distinct H3K27me3 profiles that correlate with transcriptional activity

Publisher: Oxford University Press (OUP)

Date: 07-06-2011

DOI: 10.1093/NAR/GKR416

Publication

A very radio loud narrow-line Seyfert 1: PKS 2004-447

Publisher: American Astronomical Society

Date: 10-09-2001

DOI: 10.1086/322299

Publication

Diagnostic and cost utility of whole exome sequencing in peripheral neuropathy

Publisher: Wiley

Date: 26-04-2017

DOI: 10.1002/ACN3.409

Publication

Analysis of epigenetic changes in survivors of preterm birth reveals the effect of gestational age and evidence for a long term legacy

Publisher: Springer Science and Business Media LLC

Date: 2013

DOI: 10.1186/GM500

Publication

Gene length and detection bias in single cell RNA sequencing protocols

Publisher: Cold Spring Harbor Laboratory

Date: 22-03-2017

DOI: 10.1101/119222

Abstract: Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity for profiling transcriptomes of hundreds to thousands of single cells. This technology has led to the discovery of novel cell types and revealed insights into the development of complex tissues. However, many technical challenges need to be overcome during data generation. Due to minute amounts of starting material, samples undergo extensive amplification, increasing technical variability. A solution for mitigating amplification biases is to include Unique Molecular Identifiers (UMIs), which tag individual molecules. Transcript abundances are then estimated from the number of unique UMIs aligning to a specific gene and PCR duplicates resulting in copies of the UMI are not included in expression estimates. Here we investigate the effect of gene length bias in scRNA-Seq across a variety of datasets differing in terms of capture technology, library preparation, cell types and species. We find that scRNA-seq datasets that have been sequenced using a full-length transcript protocol exhibit gene length bias akin to bulk RNA-seq data. Specifically, shorter genes tend to have lower counts and a higher rate of dropout. In contrast, protocols that include UMIs do not exhibit gene length bias, and have a mostly uniform rate of dropout across genes of varying length. Across four different scRNA-Seq datasets profiling mouse embryonic stem cells (mESCs), we found the subset of genes that are only detected in the UMI datasets tended to be shorter, while the subset of genes detected only in the full-length datasets tended to be longer. We briefly discuss the role of these genes in the context of differential expression testing and GO analysis. In addition, despite clear differences between UMI and full-length transcript data, we illustrate that full-length and UMI data can be combined to reveal underlying biology influencing expression of mESCs.

Publication

Fast and accurate differential transcript usage by testing equivalence class counts

Publisher: F1000 Research Ltd

Date: 07-03-2019

DOI: 10.12688/F1000RESEARCH.18276.1

Abstract: Background: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing differential expression testing directly on equivalence class read counts (ECs). Methods: Here we demonstrate that ECs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and drosophila data, as well as on a real dataset through subset testing. Results: We find that EC counts have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. Conclusions: We posit that equivalence class read counts are a natural unit on which to perform many types of analysis.

Publication

Using equivalence class counts for fast and accurate testing of differential transcript usage

Publisher: F1000 Research Ltd

Date: 29-04-2019

DOI: 10.12688/F1000RESEARCH.18276.2

Abstract: Background: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis compared to exon-level analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing a variety of RNA-seq analysis directly on equivalence class counts (ECCs). Methods: Here we demonstrate that ECCs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and drosophila data, as well as on a real dataset through subset testing. Results: We find that ECCs have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. Conclusions: We posit that equivalence class read counts are a natural unit on which to perform differential transcript usage analysis.

Publication

Performance of a validated spontaneous preterm delivery predictor in South Asian and Sub-Saharan African women: a nested case control study

Publisher: Informa UK Limited

Date: 30-11-2021

DOI: 10.1080/14767058.2021.2005573

Abstract: To address the disproportionate burden of preterm birth (PTB) in low- and middle-income countries, this study aimed to (1) verify the performance of the United States-validated spontaneous PTB (sPTB) predictor, comprised of the IBP4/SHBG protein ratio, in subjects from Bangladesh, Pakistan and Tanzania enrolled in the Alliance for Maternal and Newborn Health Improvement (AMANHI) biorepository study, and (2) discover biomarkers that improve performance of IBP4/SHBG in the AMANHI cohort. The performance of the IBP4/SHBG biomarker was first evaluated in a nested case control validation study, then utilized in a follow-on discovery study performed on the same samples. Levels of serum proteins were measured by targeted mass spectrometry. Differences between the AMANHI and U.S. cohorts were adjusted using body mass index (BMI) and gestational age (GA) at blood draw as covariates. Prediction of sPTB < 37 weeks and < 34 weeks was assessed by area under the receiver operator curve (AUC). In the discovery phase, an artificial intelligence method selected additional protein biomarkers complementary to IBP4/SHBG in the AMANHI cohort. The IBP4/SHBG biomarker significantly predicted sPTB < 37 weeks ( A protein biomarker pair developed in the U.S. may have broader application in diverse non-U.S. populations.

Alicia Oshlack

Researcher

Publications

JAFFAL: Detecting fusion genes with long read transcriptome sequencing

A Combination of Genomic Approaches Reveals the Role of FOXO1a in Regulating an Oxidative Stress Response Pathway

Multimodal single cell analysis of the paediatric lower airway reveals novel immune cell phenotypes in early life health and disease

Co-option of the cardiac transcription factor Nkx2.5 during development of the emu wing

Nephron progenitor commitment is a stochastic process influenced by cell migration.

Transcriptional profiles for distinct aggregation states of mutant Huntingtin exon 1 protein unmask new Huntington's disease pathways

The nature of the optical emission in radio-selected AGN

STRipy: a graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data

Necklace: combining reference and assembled transcriptomes for more comprehensive RNA-Seq analysis

Using DNA microarrays to study gene expression in closely related species

MINTIE: identifying novel structural and splice variants in transcriptomes using RNA-seq data

Accuracy of short tandem repeats genotyping tools in whole exome sequencing data

Bazam: a rapid method for read extraction and realignment of high-throughput sequencing data

Jarid2 regulates hematopoietic stem cell function by acting with polycomb repressive complex 2.

NKX2-5 regulates human cardiomyogenesis via a HEY2 dependent transcriptional network

Removing unwanted variation in a differential methylation analysis of Illumina HumanMethylation450 array data

Optimizing the noise versus bias trade-off for Illumina whole genome expression BeadChips

Natural selection on gene expression

The spectral energy distribution of PKS 2004-447: a compact steep-spectrum source and possible radio-loud narrow-line Seyfert 1 galaxy

Identification of candidate gonadal sex differentiation genes in the chicken embryo using RNA-seq

SWAN: Subset-quantile Within Array Normalization for Illumina Infinium HumanMethylation450 BeadChips

Single cell analysis of the developing mouse kidney provides deeper insight into marker gene expression and ligand-receptor crosstalk

Enhancer retargeting of CDX2 and UBTF::ATXN7L3 define a subtype of high-risk B-progenitor acute lymphoblastic leukemia

Clustering trees: a visualization for evaluating clusterings at multiple resolutions

STRetch: detecting and discovering pathogenic short tandem repeat expansions

Expression profiling in primates reveals a rapid evolution of human transcription factors

Near infrared micro-variability of radio-loud quasars

Functionally distinct roles for different miR-155 expression levels through contrasting effects on gene expression, in acute myeloid leukaemia.

Bpipe: a tool for running and managing bioinformatics pipelines

A scaling normalization method for differential expression analysis of RNA-seq data

A cross-package Bioconductor workflow for analysing methylation array data

A cross-package Bioconductor workflow for analysing methylation array data.

MLL-TFE3: a novel and aggressive KMT2A fusion identified in infant leukemia

Black hole mass estimates of radio-selected quasars

A cross-package Bioconductor workflow for analysing methylation array data

Detecting copy number alterations in RNA-Seq using SuperFreq

splatPop: simulating population scale single-cell RNA sequencing data

Splatter: simulation of single-cell RNA sequencing data

SuperFreq: Integrated mutation detection and clonal tracking in cancer

Whole exome sequencing in systemic juvenile idiopathic arthritis

Normalization of boutique two-color microarrays with a high proportion of differentially expressed probes

From RNA-seq reads to differential expression results

MCM3AP in recessive Charcot-Marie-Tooth neuropathy and mild intellectual disability

A comparison of background correction methods for two-colour microarrays.

The role of cardiac transcription factor NKX2-5 in regulating the human cardiac miRNAome

Benchmarking single-cell hashtag oligo demultiplexing methods

SuperTranscript: a data driven reference for analysis and visualisation of transcriptomes

SFPQ-ABL1 and BCR-ABL1 use different signaling networks to drive B-cell acute lymphoblastic leukemia

Slinker: Visualising novel splicing events in RNA-Seq data [version 1; peer review: awaiting peer review]

STRipy: A graphical application for enhanced genotyping of pathogenic short tandem repeats in sequencing data.

A prospective evaluation of whole-exome sequencing as a first-tier molecular test in infants with suspected monogenic disorders

Clustering trees: a visualisation for evaluating clusterings at multiple resolutions

Author response: Nephron progenitor commitment is a stochastic process influenced by cell migration

Disorders of sex development: insights from targeted gene sequencing of a large international patient cohort

RNA sequencing reveals sexually dimorphic gene expression before gonadal differentiation in chicken and allows comprehensive annotation of the W-chromosome

Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database

Accuracy of short tandem repeats genotyping tools in whole exome sequencing data

Cpipe: A shared variant detection pipeline designed for diagnostic settings

Transcript length bias in RNA-seq data confounds systems biology

Gene Regulation in Primates Evolves under Tissue-Specific Selection Pressures

ALLSorts: an RNA-Seq subtype classifier for B-cell acute lymphoblastic leukemia

Diagnostic impact and cost-effectiveness of whole-exome sequencing for ambulant children with suspected monogenic conditions

Fast and accurate differential transcript usage by testing equivalence class counts

Cell-Type–Specific Transcriptional Profiles of the Dimorphic Pathogen Penicillium marneffei Reflect Distinct Reproductive, Morphological, and Environmental Demands

STRetch

splatPop: simulating population scale single-cell RNA sequencing data

3D organoid-derived human glomeruli for personalised podocyte disease modelling and drug screening.

JAFFAL: detecting fusion genes with long-read transcriptome sequencing

propeller: Testing for differences in cell type proportions in single cell data

ALLSorts: a RNA-Seq classifier for B-Cell Acute Lymphoblastic Leukemia

As we come to the end of 2011, several members of the Genome Biology Editorial Board give their views on the state of play in genomics

Toblerone: detecting exon deletion events in cancer using RNA-seq