ORCID Profile
0000-0002-8461-7467
Current Organisation
The Walter and Eliza Hall Institute
Publisher: American Society of Hematology
Date: 09-10-2020
DOI: 10.1182/BLOODADVANCES.2020002708
Abstract: A novel KMT2A-rearrangement, MLL-TFE3, was identified in an infant leukemia patient. MLL-TFE3 expression produces aggressive leukemia in a mouse model.
Publisher: Springer Science and Business Media LLC
Date: 03-10-2013
DOI: 10.1038/NCOMMS3537
Publisher: Cold Spring Harbor Laboratory
Date: 26-04-2021
DOI: 10.1101/2021.04.26.441398
Abstract: Massively parallel short read transcriptome sequencing has greatly expanded our knowledge of fusion genes which are drivers of tumor initiation and progression. In cancer, many fusions are also important diagnostic markers and targets for therapy. Long read transcriptome sequencing allows the full length of fusion transcripts to be discovered, however, this data has a high rate of errors and fusion finding algorithms designed for short reads do not work. While numerous fusion finding algorithms now exist for short read RNA sequencing data, there are few methods to detect fusions using third generation or long read sequencing data. Fusion finding in long read sequencing will allow the discovery of the full isoform structure of fusion genes. Here we present JAFFAL, a method to identify fusions from long-read transcriptome sequencing. We validated JAFFAL using simulation, cell line and patient data from Nanopore and PacBio. We show that fusions can be accurately detected in long read data with JAFFAL, providing better accuracy than other long read fusion finders and with similar performance as state-of-the-art methods applied to short read data. By comparing Nanopore transcriptome sequencing protocols we find that numerous chimeric molecules are generated during cDNA library preparation that are absent when RNA is sequenced directly. We demonstrate that JAFFAL enables fusions to be detected at the level of individual cells, when applied to long read single cell sequencing. Moreover, we demonstrate JAFFAL can identify fusions spanning three genes, highlighting the utility of long reads to characterise the transcriptional products of complex structural rearrangements with unprecedented resolution. JAFFAL is open source and available as part of the JAFFA package at github.com/Oshlack/JAFFA/wiki.
Publisher: Cold Spring Harbor Laboratory
Date: 19-12-2018
DOI: 10.1101/501106
Abstract: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from ‘pseudo’ or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Here we propose performing DTU testing directly on equivalence class read counts. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. We find that EC counts have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. We posit that equivalence class counts are a natural unit on which to perform many types of analysis.
Publisher: Springer Science and Business Media LLC
Date: 25-07-2017
DOI: 10.1038/S41467-017-00112-7
Abstract: The ratites are a distinctive clade of flightless birds, typified by the emu and ostrich that have acquired a range of unique anatomical characteristics since diverging from basal Aves at least 100 million years ago. The emu possesses a vestigial wing with a single digit and greatly reduced forelimb musculature. However, the embryological basis of wing reduction and other anatomical changes associated with loss of flight are unclear. Here we report a previously unknown co-option of the cardiac transcription factor Nkx2.5 to the forelimb in the emu embryo, but not in ostrich, or chicken and zebra finch, which have fully developed wings. Nkx2.5 is expressed in emu limb bud mesenchyme and maturing wing muscle, and mis-expression of Nkx2.5 throughout the limb bud in chick results in wing reductions. We propose that Nkx2.5 functions to inhibit early limb bud expansion and later muscle growth during development of the vestigial emu wing.
Publisher: The Company of Biologists
Date: 2020
DOI: 10.1242/DEV.193037
Abstract: The genetic regulatory network controlling early fate choices during human blood cell development is not well understood. We used human pluripotent stem cell reporter lines to track the development of endothelial and haematopoietic populations in an in vitro model of human yolk-sac development. We identified SOX17−CD34+CD43− endothelial cells at day 2 of blast colony development, as a haemangioblast-like branch point from which SOX17−CD34+CD43+ blood cells and SOX17+CD34+CD43− endothelium subsequently arose. Most human blood cell development was dependent on RUNX1. Deletion of RUNX1 only permitted a single wave of yolk sac-like primitive erythropoiesis, but no yolk sac myelopoiesis or aorta-gonad-mesonephros (AGM)-like haematopoiesis. Blocking GFI1/1B activity with a small molecule inhibitor abrogated all blood cell development, even in cell lines with an intact RUNX1 gene. Together, our data define the hierarchical requirements for both RUNX1 and GFI1/1B during early human haematopoiesis arising from a yolk sac-like SOX17-negative haemogenic endothelial intermediate.
Publisher: Cold Spring Harbor Laboratory
Date: 22-04-2021
DOI: 10.1101/2021.04.21.440736
Abstract: The human genome contains more than 200,000 gene isoforms. However, different isoforms can be highly similar, and with an average length of 1.5kb remain difficult to study with short read sequencing. To systematically evaluate the ability to study the transcriptome at a resolution of individual isoforms we profiled 5 human cell lines with short read cDNA sequencing and Nanopore long read direct RNA, amplification-free direct cDNA, and PCR-cDNA sequencing. The long read protocols showed a high level of consistency, with amplification-free RNA and cDNA sequencing being most similar. While short and long reads generated comparable gene expression estimates, they differed substantially for individual isoforms. We find that increased read length improves read-to-transcript assignment, identifies interactions between alternative promoters and splicing, enables the discovery of novel transcripts from repetitive regions, facilitates the quantification of full-length fusion isoforms and enables the simultaneous profiling of m6A RNA modifications when RNA is sequenced directly. Our study demonstrates the advantage of long read RNA sequencing and provides a comprehensive resource that will enable the development and benchmarking of computational methods for profiling complex transcriptional events at isoform-level resolution.
Publisher: Springer Science and Business Media LLC
Date: 03-2021
Publisher: Springer Science and Business Media LLC
Date: 06-01-2022
DOI: 10.1186/S13059-021-02588-5
Abstract: In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion finding algorithms designed for short reads do not work. Here we present JAFFAL, a method to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes, demonstrating that transcripts from complex rearrangements can be detected. JAFFAL is available at github.com/Oshlack/JAFFA/wiki.
Publisher: Elsevier BV
Date: 02-2016
Publisher: Cold Spring Harbor Laboratory
Date: 08-2021
DOI: 10.1101/2021.08.01.454393
Abstract: B-cell acute lymphoblastic leukemia (B-ALL) is the most common childhood cancer. Subtypes within B-ALL are distinguished by characteristic structural variants and mutations, which in some instances strongly correlate with responses to treatment. The World Health Organisation (WHO) recognises seven distinct classifications, or subtypes, as of 2016. However, recent studies have demonstrated that B-ALL can be segmented into 23 subtypes based on a combination of genomic features and gene expression profiles. A method to identify a patient’s subtype would have clear clinical utility. Despite this, no publicly available classification methods using RNA-Seq exist for this purpose. Here we present ALLSorts: a publicly available method that uses RNA-Seq data to classify B-ALL samples to 18 known subtypes and five meta-subtypes. ALLSorts is the result of a hierarchical supervised machine learning algorithm applied to a training set of 1223 B-ALL samples aggregated from multiple cohorts. Validation revealed that ALLSorts can accurately attribute samples to subtypes and can attribute multiple subtypes to a sample. Furthermore, when applied to both paediatric and adult cohorts, ALLSorts was able to classify previously undefined samples into subtypes. ALLSorts is available and documented on GitHub (github.com/Oshlack/AllSorts/). ALLSorts is a gene expression classifier for B-cell acute lymphoblastic leukemia, which predicts 18 distinct genomic subtypes, including those designated by the World Health Organisation (WHO) and provisional entities. Trained and validated on over 2300 B-ALL samples, representing each subtype and a variety of clinical features. Correctly identified subtypes in 91% of cases in a held-out dataset and between 82-93% across a newly combined cohort of paediatric and adult samples. ALLSorts assigned subtypes to samples with previously unknown driver events.
ALLSorts is an accurate, comprehensive and freely available classification tool that distinguishes subtypes of B-cell acute lymphoblastic leukemia from RNA-sequencing.
Publisher: Springer Science and Business Media LLC
Date: 22-10-2021
DOI: 10.1186/S13059-021-02507-8
Abstract: Calling fusion genes from RNA-seq data is well established, but other transcriptional variants are difficult to detect using existing approaches. To identify all types of variants in transcriptomes we developed MINTIE, an integrated pipeline for RNA-seq data. We take a reference-free approach, combining de novo assembly of transcripts with differential expression analysis to identify up-regulated novel variants in a case sample. We compare MINTIE with eight other approaches, detecting 85% of variants while no other method is able to achieve this. We posit that MINTIE will be able to identify new disease variants across a range of disease types.
Publisher: Wiley
Date: 28-06-2019
DOI: 10.1002/PBC.27897
Abstract: We report two patients with leukaemia driven by the rare CNTRL-FGFR1 fusion oncogene. This fusion arises from a t(8;9)(p12;q33) translocation, and is a rare driver of biphenotypic leukaemia in children. We used RNA sequencing to report novel features of expressed CNTRL-FGFR1, including CNTRL-FGFR1 fusion alternative splicing. From this knowledge, we designed and tested a Droplet Digital PCR assay that detects CNTRL-FGFR1 expression to approximately one cell in 100 000 using fusion breakpoint-specific primers and probes. We also utilised cell-line models to show that effective tyrosine kinase inhibitors, which may be included in treatment regimens for this disease, are only those that block FGFR1 phosphorylation.
Publisher: Springer Science and Business Media LLC
Date: 07-2014
Publisher: Springer Science and Business Media LLC
Date: 11-05-2015
Publisher: Cold Spring Harbor Laboratory
Date: 04-06-2020
DOI: 10.1101/2020.06.03.131532
Abstract: Genomic rearrangements can modify gene function by altering transcript sequences, and have been shown to be drivers in both cancer and rare diseases. Although there are now many methods to detect structural variants from Whole Genome Sequencing (WGS), RNA sequencing (RNA-seq) remains under-utilised as a technology for the detection of gene altering structural variants. Calling fusion genes from RNA-seq data is well established, but other transcriptional variants such as fusions with novel sequence, tandem duplications, large insertions and deletions, and novel splicing are difficult to detect using existing approaches. To identify all types of variants in transcriptomes, we developed MINTIE, an integrated pipeline for RNA-seq data. We take a reference free approach, which combines de novo assembly of transcripts with differential expression analysis, to identify up-regulated novel variants in a case sample. We validated MINTIE on simulated and real data sets and compared it with eight other approaches for finding novel transcriptional variants. We found MINTIE was able to detect % of variants while no other method was able to achieve this. We applied MINTIE to RNA-seq data from a cohort of acute lymphoblastic leukemia (ALL) patient samples and identified several clinically relevant variants, including a recurrent unpartnered fusion involving the tumour suppressor gene RB1, and variants in ALL-associated genes: tandem duplications in IKZF1 and PAX5, and novel splicing in ETV6. We further demonstrate the utility of MINTIE to identify rare disease variants using RNA-seq, including the discovery of an inter-chromosomal translocation in the DMD gene in a patient with muscular dystrophy. We posit that MINTIE will be able to identify new disease variants across a range of cancers and other disease types.
Publisher: American Society of Hematology
Date: 09-03-2020
DOI: 10.1182/BLOODADVANCES.2019001008
Abstract: Acute lymphoblastic leukemia (ALL) is the most common childhood malignancy, and implementation of risk-adapted therapy has been instrumental in the dramatic improvements in clinical outcomes. A key to risk-adapted therapies includes the identification of genomic features of individual tumors, including chromosome number (for hyper- and hypodiploidy) and gene fusions, notably ETV6-RUNX1, TCF3-PBX1, and BCR-ABL1 in B-cell ALL (B-ALL). RNA-sequencing (RNA-seq) of large ALL cohorts has expanded the number of recurrent gene fusions recognized as drivers in ALL, and identification of these new entities will contribute to refining ALL risk stratification. We used RNA-seq on 126 ALL patients from our clinical service to test the utility of including RNA-seq in standard-of-care diagnostic pipelines to detect gene rearrangements and IKZF1 deletions. RNA-seq identified 86% of rearrangements detected by standard-of-care diagnostics. KMT2A (MLL) rearrangements, although usually identified, were the most commonly missed by RNA-seq as a result of low expression. RNA-seq identified rearrangements that were not detected by standard-of-care testing in 9 patients. These were found in patients who were not classifiable using standard molecular assessment. We developed an approach to detect the most common IKZF1 deletion from RNA-seq data and validated this using an RQ-PCR assay. We applied an expression classifier to identify Philadelphia chromosome–like B-ALL patients. T-ALL proved a rich source of novel gene fusions, which have clinical implications or provide insights into disease biology. Our experience shows that RNA-seq can be implemented within an individual clinical service to enhance the current molecular diagnostic risk classification of ALL.
Publisher: Elsevier BV
Date: 03-2012
Publisher: American Society of Hematology
Date: 07-04-2022
DOI: 10.1182/BLOODADVANCES.2021006076
Abstract: Philadelphia-like (Ph-like) acute lymphoblastic leukemia (ALL) is a high-risk subtype of B-cell ALL characterized by a gene expression profile resembling Philadelphia chromosome–positive ALL (Ph+ ALL) in the absence of BCR-ABL1. Tyrosine kinase–activating fusions, some involving ABL1, are recurrent drivers of Ph-like ALL and are targetable with tyrosine kinase inhibitors (TKIs). We identified a rare instance of SFPQ-ABL1 in a child with Ph-like ALL. SFPQ-ABL1 expressed in cytokine-dependent cell lines was sufficient to transform cells and these cells were sensitive to ABL1-targeting TKIs. In contrast to BCR-ABL1, SFPQ-ABL1 localized to the nuclear compartment and was a weaker driver of cellular proliferation. Phosphoproteomics analysis showed upregulation of cell cycle, DNA replication, and spliceosome pathways, and downregulation of signal transduction pathways, including ErbB, NF-κB, vascular endothelial growth factor (VEGF), and MAPK signaling in SFPQ-ABL1–expressing cells compared with BCR-ABL1–expressing cells. SFPQ-ABL1 expression did not activate phosphatidylinositol 3-kinase/protein kinase B (PI3K/AKT) signaling and was associated with phosphorylation of G2/M cell cycle proteins. SFPQ-ABL1 was sensitive to navitoclax and S-63845 and promotes cell survival by maintaining expression of Mcl-1 and Bcl-xL. SFPQ-ABL1 has functionally distinct mechanisms by which it drives ALL, including subcellular localization, proliferative capacity, and activation of cellular pathways. These findings highlight the role that fusion partners have in mediating the function of ABL1 fusions.
Publisher: F1000 Research Ltd
Date: 07-12-2021
DOI: 10.12688/F1000RESEARCH.74836.1
Abstract: Visualisation of the transcriptome relative to a reference genome is fraught with sparsity. This is due to RNA sequencing (RNA-Seq) reads being predominantly mapped to exons that account for just under 3% of the human genome. Recently, we have used exon-only references, superTranscripts, to improve visualisation of aligned RNA-Seq data through the omission of supposedly unexpressed regions such as introns. However, variation within these regions can lead to novel splicing events that may drive a pathogenic phenotype. In these cases, the loss of information in only retaining annotated exons presents significant drawbacks. Here we present Slinker, a bioinformatics pipeline written in Python and Bpipe that uses a data-driven approach to assemble sample-specific superTranscripts. At its core, Slinker uses Stringtie2 to assemble transcripts with any sequence across any gene. This assembly is merged with reference transcripts, converted to a superTranscript, of which rich visualisations are made through Plotly with associated annotation and coverage information. Slinker was validated on five novel splicing events of rare disease samples from a cohort of primary muscular disorders. In addition, Slinker was shown to be effective in visualising deletion events within transcriptomes of tumour samples in the important leukemia gene, IKZF1. Slinker offers a succinct visualisation of RNA-Seq alignments across typically sparse regions and is freely available on Github.
Publisher: Elsevier BV
Date: 03-2011
Publisher: Springer Science and Business Media LLC
Date: 04-08-2017
Publisher: Wiley
Date: 07-06-2013
DOI: 10.1002/PATH.4209
Abstract: Oncogenic fusion genes that involve kinases have proven to be effective targets for therapy in a wide range of cancers. Unfortunately, the diagnostic approaches required to identify these events are struggling to keep pace with the diverse array of genetic alterations that occur in cancer. Diagnostic screening in solid tumours is particularly challenging, as many fusion genes occur with a low frequency. To overcome these limitations, we developed a capture enrichment strategy to enable high-throughput transcript sequencing of the human kinome. This approach provides a global overview of kinase fusion events, irrespective of the identity of the fusion partner. To demonstrate the utility of this system, we profiled 100 non-small cell lung cancers and identified numerous genetic alterations impacting fibroblast growth factor receptor 3 (FGFR3) in lung squamous cell carcinoma and a novel ALK fusion partner in lung adenocarcinoma.
Publisher: Cold Spring Harbor Laboratory
Date: 21-08-2023
DOI: 10.1101/2023.08.21.554084
Abstract: The process of analyzing high throughput sequencing data often requires the identification and extraction of specific target sequences. This could include tasks such as identifying cellular barcodes and UMIs in single cell data, and specific genetic variants for genotyping. However, the existing tools which perform these functions are often task-specific, such as only demultiplexing barcodes for a dedicated type of experiment or are not tolerant to noise in the sequencing data. To overcome these limitations, we developed Flexiplex, a versatile and fast sequence searching and demultiplexing tool for omics data, which is based on the Levenshtein distance and thus allows imperfect matches. We demonstrate Flexiplex’s application on three use cases, identifying cell line specific sequences in Illumina short read single cell data, and discovering and demultiplexing cellular barcodes from noisy long-read single cell RNA-seq data. We show that Flexiplex achieves an excellent balance of accuracy and computational efficiency compared to leading task-specific tools. Flexiplex is available at davidsongroup.github.io/flexiplex/.
Publisher: Springer Science and Business Media LLC
Date: 2013
Publisher: Springer Science and Business Media LLC
Date: 16-09-2015
Publisher: Oxford University Press (OUP)
Date: 05-2018
Publisher: F1000 Research Ltd
Date: 07-03-2019
DOI: 10.12688/F1000RESEARCH.18276.1
Abstract: Background: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing differential expression testing directly on equivalence class read counts (ECs). Methods: Here we demonstrate that ECs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. Results: We find that EC counts have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. Conclusions: We posit that equivalence class read counts are a natural unit on which to perform many types of analysis.
Publisher: F1000 Research Ltd
Date: 29-04-2019
DOI: 10.12688/F1000RESEARCH.18276.2
Abstract: Background: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis compared to exon-level analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing a variety of RNA-seq analyses directly on equivalence class counts (ECCs). Methods: Here we demonstrate that ECCs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. Results: We find that ECCs have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. Conclusions: We posit that equivalence class read counts are a natural unit on which to perform differential transcript usage analysis.
Publisher: American Society of Hematology
Date: 15-07-2022
Start Date: 2018
End Date: 2020
Funder: National Health and Medical Research Council
Start Date: 2013
End Date: 2013
Funder: Swiss National Science Foundation
Start Date: 2022
End Date: 2013
Funder: National Health and Medical Research Council