ORCID Profile
0000-0001-9788-5690
Current Organisation
Peter MacCallum Cancer Centre
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Central Nervous System | Medical Biochemistry and Metabolomics | Regenerative Medicine (incl. Stem Cells and Tissue Engineering) | Medical Biochemistry and Metabolomics not elsewhere classified | Nanobiotechnology
Cardiovascular System and Diseases | Nervous System and Disorders | Blood Disorders
Publisher: Cold Spring Harbor Laboratory
Date: 26-04-2021
DOI: 10.1101/2021.04.26.441398
Abstract: Massively parallel short read transcriptome sequencing has greatly expanded our knowledge of fusion genes which are drivers of tumor initiation and progression. In cancer, many fusions are also important diagnostic markers and targets for therapy. Long read transcriptome sequencing allows the full length of fusion transcripts to be discovered; however, this data has a high rate of errors and fusion finding algorithms designed for short reads do not work. While numerous fusion finding algorithms now exist for short read RNA sequencing data, there are few methods to detect fusions using third generation or long read sequencing data. Fusion finding in long read sequencing will allow the discovery of the full isoform structure of fusion genes. Here we present JAFFAL, a method to identify fusions from long-read transcriptome sequencing. We validated JAFFAL using simulation, cell line and patient data from Nanopore and PacBio. We show that fusions can be accurately detected in long read data with JAFFAL, providing better accuracy than other long read fusion finders and with similar performance as state-of-the-art methods applied to short read data. By comparing Nanopore transcriptome sequencing protocols we find that numerous chimeric molecules are generated during cDNA library preparation that are absent when RNA is sequenced directly. We demonstrate that JAFFAL enables fusions to be detected at the level of individual cells, when applied to long read single cell sequencing. Moreover, we demonstrate JAFFAL can identify fusions spanning three genes, highlighting the utility of long reads to characterise the transcriptional products of complex structural rearrangements with unprecedented resolution. JAFFAL is open source and available as part of the JAFFA package at github.com/Oshlack/JAFFA/wiki.
Publisher: Public Library of Science (PLoS)
Date: 27-02-2008
Publisher: Cold Spring Harbor Laboratory
Date: 17-06-2022
DOI: 10.1101/2022.06.17.496207
Abstract: Inflammation is a key driver of cystic fibrosis (CF) lung disease, not addressed by current standard care. Improved understanding of the mechanisms leading to aberrant inflammation may assist the development of effective anti-inflammatory therapy. Single-cell RNA sequencing (scRNA-seq) allows profiling of cell composition and function at previously unprecedented resolution. Herein, we seek to use multimodal single-cell analysis to comprehensively define immune cell phenotypes, proportions and functional characteristics in preschool children with CF. We analyzed 42,658 cells from bronchoalveolar lavage of 11 preschool children with CF and a healthy control using scRNA-seq and parallel assessment of 154 cell surface proteins. Validation of cell types identified by scRNA-seq was achieved by assessment of samples by spectral flow cytometry. Analysis of transcriptome expression and cell surface protein expression, combined with functional pathway analysis, revealed 41 immune and epithelial cell populations in BAL. Spectral flow cytometry analysis of over 256,000 cells from a subset of the same patients revealed high correlation in major cell type proportions across the two technologies. Macrophages consisted of 13 functionally distinct subpopulations, including previously undescribed populations enriched for markers of vesicle production and regulatory/repair functions. Other novel cell populations included CD4 T cells expressing inflammatory IFNα/β and NFκB signalling genes. Our work provides a comprehensive cellular analysis of the pediatric lower airway in preschool children with CF, reveals novel cell types and provides a reference for investigation of inflammation in early life CF.
Publisher: Springer Science and Business Media LLC
Date: 25-07-2017
DOI: 10.1038/S41467-017-00112-7
Abstract: The ratites are a distinctive clade of flightless birds, typified by the emu and ostrich that have acquired a range of unique anatomical characteristics since diverging from basal Aves at least 100 million years ago. The emu possesses a vestigial wing with a single digit and greatly reduced forelimb musculature. However, the embryological basis of wing reduction and other anatomical changes associated with loss of flight are unclear. Here we report a previously unknown co-option of the cardiac transcription factor Nkx2.5 to the forelimb in the emu embryo, but not in ostrich, or chicken and zebra finch, which have fully developed wings. Nkx2.5 is expressed in emu limb bud mesenchyme and maturing wing muscle, and mis-expression of Nkx2.5 throughout the limb bud in chick results in wing reductions. We propose that Nkx2.5 functions to inhibit early limb bud expansion and later muscle growth during development of the vestigial emu wing.
Publisher: eLife Sciences Publications, Ltd
Date: 24-01-2019
DOI: 10.7554/ELIFE.41156
Abstract: Progenitor self-renewal and differentiation is often regulated by spatially restricted cues within a tissue microenvironment. Here, we examine how progenitor cell migration impacts regionally induced commitment within the nephrogenic niche in mice. We identify a subset of cells that express Wnt4, an early marker of nephron commitment, but migrate back into the progenitor population where they accumulate over time. Single cell RNA-seq and computational modelling of returning cells reveals that nephron progenitors can traverse the transcriptional hierarchy between self-renewal and commitment in either direction. This plasticity may enable robust regulation of nephrogenesis as niches remodel and grow during organogenesis.
Publisher: Elsevier BV
Date: 09-2017
DOI: 10.1016/J.MCN.2017.07.004
Abstract: Huntington's disease is caused by polyglutamine (polyQ)-expansion mutations in the CAG tandem repeat of the Huntingtin gene. The central feature of Huntington's disease pathology is the aggregation of mutant Huntingtin (Htt) protein into micrometer-sized inclusion bodies. Soluble mutant Htt states are most proteotoxic and trigger an enhanced risk of death whereas inclusions confer different changes to cellular health, and may even provide adaptive responses to stress. Yet the molecular mechanisms underpinning these changes remain unclear. Using the flow cytometry method of pulse-shape analysis (PulSA) to sort neuroblastoma (Neuro2a) cells enriched with mutant or wild-type Htt into different aggregation states, we clarified which transcriptional signatures were specifically attributable to cells before versus after inclusion assembly. Dampened CREB signalling was the most striking change overall and invoked specifically by soluble mutant Httex1 states. Toxicity could be rescued by stimulation of CREB signalling. Other biological processes mapped to different changes before and after aggregation included NF-kB signalling, autophagy, SUMOylation, transcription regulation by histone deacetylases and BRD4, NAD+ biosynthesis, ribosome biogenesis and altered HIF-1 signalling. These findings open the path for therapeutic strategies targeting key molecular changes invoked prior to, and subsequently to, Httex1 aggregation.
Publisher: WORLD SCIENTIFIC
Date: 10-2004
Publisher: Cold Spring Harbor Laboratory
Date: 14-06-2021
DOI: 10.1101/2021.06.13.448220
Abstract: Short tandem repeats (STRs) are highly polymorphic with high mutation rates and expansions of STRs have been implicated as the causal variant in diseases. The application of genome sequencing in patients has recently allowed many new discoveries with over 50 disease causing loci known to date. There are several tools which allow genotyping of STRs from high-throughput sequencing (HTS) data. However, running these tools out of the box only allows around half of the known disease-causing loci to be genotyped, with lengths often limited to either read or fragment length which is less than the pathogenic cut-off for some diseases. While analysis tools can be customised to genotype extra loci, this requires proficiency in bioinformatics to set up, use, and analyse the resulting data, limiting their widespread usage by other researchers and clinicians. To address these issues, we have created a new software called STRipy that has an intuitive graphical interface and requires no specific skills for usage, thus significantly simplifying detection of STR expansions from human HTS data. STRipy is able to target all known disease-causing STRs with genotyping performed with an established tool, ExpansionHunter, that is incorporated into the software. We have created additional functionality into STRipy to work with long alleles exceeding the fragment length. STRipy was validated using over 60 thousand simulated samples and was shown to work on whole genome sequencing of biological samples with pathogenic variants. Finally, we have used STRipy to acquire genotypes of pathogenic loci for thousands of samples from various populations which are provided to the user along with the data from the literature to assist with results interpretation. We believe the simplicity and breadth of STRipy will increase the testing of STR diseases in current datasets resulting in further diagnoses of rare diseases caused by STR expansions.
Publisher: Cold Spring Harbor Laboratory
Date: 09-10-2017
DOI: 10.1101/200287
Abstract: RNA-Seq analyses can benefit from performing a genome-guided and de novo assembly, in particular for species where the reference genome or the annotation is incomplete. However, tools for integrating assembled transcriptome with reference annotation are lacking. Necklace is a software pipeline that runs genome-guided and de novo assembly and combines the resulting transcriptomes with reference genome annotations. Necklace constructs a compact but comprehensive superTranscriptome out of the assembled and reference data. Reads are subsequently aligned and counted in preparation for differential expression testing. Necklace allows a comprehensive transcriptome to be built from a combination of assembled and annotated transcripts which results in a more comprehensive transcriptome for the majority of organisms. In addition RNA-seq data is mapped back to this newly created superTranscript reference to enable differential expression testing with standard methods. Necklace is available from github.com/Oshlack/necklace/wiki under GPL 3.0.
Publisher: Oxford University Press (OUP)
Date: 23-03-2007
DOI: 10.1093/BIOINFORMATICS/BTM111
Abstract: Motivation: Comparisons of gene expression levels within and between species have become a central tool in the study of the genetic basis for phenotypic variation, as well as in the study of the evolution of gene regulation. DNA microarrays are a key technology that enables these studies. Currently, however, microarrays are only available for a small number of species. Thus, in order to study gene expression levels in species for which microarrays are not available, researchers face three sets of choices: (i) use a microarray designed for another species, but only compare gene expression levels within species, (ii) construct a new microarray for every species whose gene expression profiles will be compared or (iii) build a multi-species microarray with probes from each species of interest. Here, we use data collected using a multi-primate cDNA array to evaluate the reliability of each approach. Results: We find that, for inter-species comparisons, estimates of expression differences based on multi-species microarrays are more accurate than those based on multiple species-specific arrays. We also demonstrate that within-species expression differences can be estimated using a microarray for a closely related species, without discernible loss of information. Contact: A.O. (oshlack@wehi.edu.au) or Y.G. (gilad@uchicago.edu) Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Cold Spring Harbor Laboratory
Date: 04-06-2020
DOI: 10.1101/2020.06.03.131532
Abstract: Genomic rearrangements can modify gene function by altering transcript sequences, and have been shown to be drivers in both cancer and rare diseases. Although there are now many methods to detect structural variants from Whole Genome Sequencing (WGS), RNA sequencing (RNA-seq) remains under-utilised as a technology for the detection of gene altering structural variants. Calling fusion genes from RNA-seq data is well established, but other transcriptional variants such as fusions with novel sequence, tandem duplications, large insertions and deletions, and novel splicing are difficult to detect using existing approaches. To identify all types of variants in transcriptomes, we developed MINTIE, an integrated pipeline for RNA-seq data. We take a reference free approach, which combines de novo assembly of transcripts with differential expression analysis, to identify up-regulated novel variants in a case sample. We validated MINTIE on simulated and real data sets and compared it with eight other approaches for finding novel transcriptional variants. We found MINTIE was able to detect % of variants while no other method was able to achieve this. We applied MINTIE to RNA-seq data from a cohort of acute lymphoblastic leukemia (ALL) patient samples and identified several clinically relevant variants, including a recurrent unpartnered fusion involving the tumour suppressor gene RB1, and variants in ALL-associated genes: tandem duplications in IKZF1 and PAX5, and novel splicing in ETV6. We further demonstrate the utility of MINTIE to identify rare disease variants using RNA-seq, including the discovery of an inter-chromosomal translocation in the DMD gene in a patient with muscular dystrophy. We posit that MINTIE will be able to identify new disease variants across a range of cancers and other disease types.
Publisher: F1000 Research Ltd
Date: 23-03-2020
DOI: 10.12688/F1000RESEARCH.22639.1
Abstract: Background: Short tandem repeats are an important source of genetic variation. They are highly mutable and repeat expansions are associated with dozens of human disorders, such as Huntington's disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will aid other researchers in choosing a suitable tool and parameters for analysis. Methods: The analysis was performed on the Simons Simplex Collection dataset, where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. Results: We determined a relatively good performance of all tools when genotyping repeats of 3-6 bp in length, which could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool, while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. Conclusions: All tools have different strengths and weaknesses and the choice may depend on the application. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls.
Publisher: Springer Science and Business Media LLC
Date: 18-04-2019
Publisher: American Society of Hematology
Date: 19-03-2015
DOI: 10.1182/BLOOD-2014-10-603969
Abstract: Depletion of Jarid2 in mouse and human hematopoietic stem cells enhances their activity. Jarid2 acts as part of PRC2 in hematopoietic stem and progenitor cells.
Publisher: Springer Science and Business Media LLC
Date: 10-04-2018
DOI: 10.1038/S41467-018-03714-X
Abstract: Congenital heart defects can be caused by mutations in genes that guide cardiac lineage formation. Here, we show deletion of NKX2-5 , a critical component of the cardiac gene regulatory network, in human embryonic stem cells (hESCs), results in impaired cardiomyogenesis, failure to activate VCAM1 and to downregulate the progenitor marker PDGFRα. Furthermore, NKX2-5 null cardiomyocytes have abnormal physiology, with asynchronous contractions and altered action potentials. Molecular profiling and genetic rescue experiments demonstrate that the bHLH protein HEY2 is a key mediator of NKX2-5 function during human cardiomyogenesis. These findings identify HEY2 as a novel component of the NKX2-5 cardiac transcriptional network, providing tangible evidence that hESC models can decipher the complex pathways that regulate early stage human heart development. These data provide a human context for the evaluation of pathogenic mutations in congenital heart disease.
Publisher: Oxford University Press (OUP)
Date: 18-05-2015
DOI: 10.1093/NAR/GKV526
Publisher: Oxford University Press (OUP)
Date: 06-10-2010
DOI: 10.1093/NAR/GKQ871
Publisher: Elsevier BV
Date: 08-2006
DOI: 10.1016/J.TIG.2006.06.002
Abstract: Changes in genetic regulation contribute to adaptations in natural populations and influence susceptibility to human diseases. Despite their potential phenotypic importance, the selective pressures acting on regulatory processes in general and gene expression levels in particular are largely unknown. Studies in model organisms suggest that the expression levels of most genes evolve under stabilizing selection, although a few are consistent with adaptive evolution. However, it has been proposed that gene expression levels in primates evolve largely in the absence of selective constraints. In this article, we discuss the microarray-based observations that led to these disparate interpretations. We conclude that in both primates and model organisms, stabilizing selection is likely to be the dominant mode of gene expression evolution. An important implication is that mutations affecting gene expression will often be deleterious and might underlie many human diseases.
Publisher: Oxford University Press (OUP)
Date: 21-07-2006
Publisher: Springer Science and Business Media LLC
Date: 16-09-2015
Publisher: Springer Science and Business Media LLC
Date: 2012
Publisher: The Company of Biologists
Date: 12-06-2019
DOI: 10.1242/DEV.178673
Abstract: Recent advances in the generation of kidney organoids and the culture of primary nephron progenitors from mouse and human have been based on knowledge of the molecular basis of kidney development in mice. Although gene expression during kidney development has been intensely investigated, single cell profiling provides new opportunities to further subsect component cell types and the signalling networks at play. Here, we describe the generation and analysis of 6732 single cell transcriptomes from the fetal mouse kidney [embryonic day (E)18.5] and 7853 sorted nephron progenitor cells (E14.5). These datasets provide improved resolution of cell types and specific markers, including subdivision of the renal stroma and heterogeneity within the nephron progenitor population. Ligand-receptor interaction and pathway analysis reveals novel crosstalk between cellular compartments and associates new pathways with differentiation of nephron and ureteric epithelium cell types. We identify transcriptional congruence between the distal nephron and ureteric epithelium, showing that most markers previously used to identify ureteric epithelium are not specific. Together, this work improves our understanding of metanephric kidney development and provides a template to guide the regeneration of renal tissue.
Publisher: American Society of Hematology
Date: 16-06-2022
Abstract: Transcriptome sequencing has identified multiple subtypes of B-progenitor acute lymphoblastic leukemia (B-ALL) of prognostic significance, but a minority of cases lack a known genetic driver. Here, we used integrated whole-genome (WGS) and -transcriptome sequencing (RNA-seq), enhancer mapping, and chromatin topology analysis to identify previously unrecognized genomic drivers in B-ALL. Newly diagnosed (n = 3221) and relapsed (n = 177) B-ALL cases with tumor RNA-seq were studied. WGS was performed to detect mutations, structural variants, and copy number alterations. Integrated analysis of histone 3 lysine 27 acetylation and chromatin looping was performed using HiChIP. We identified a subset of 17 newly diagnosed and 5 relapsed B-ALL cases with a distinct gene expression profile and 2 universal and unique genomic alterations resulting from aberrant recombination-activating gene activation: a focal deletion downstream of PAN3 at 13q12.2 resulting in CDX2 deregulation by the PAN3 enhancer and a focal deletion of exons 18-21 of UBTF at 17q21.31 resulting in a chimeric fusion, UBTF::ATXN7L3. A subset of cases also had rearrangement and increased expression of the PAX5 gene, which is otherwise uncommon in B-ALL. Patients were more commonly female and young adult with median age 35 (range,12-70 years). The immunophenotype was characterized by CD10 negativity and immunoglobulin M positivity. Among 16 patients with known clinical response, 9 (56.3%) had high-risk features including relapse (n = 4) or minimal residual disease & % at the end of remission induction (n = 5). CDX2-deregulated, UBTF::ATXN7L3 rearranged (CDX2/UBTF) B-ALL is a high-risk subtype of leukemia in young adults for which novel therapeutic approaches are required.
Publisher: Oxford University Press (OUP)
Date: 07-2018
Publisher: Cold Spring Harbor Laboratory
Date: 04-07-2017
DOI: 10.1101/159228
Abstract: Short tandem repeat (STR) expansions have been identified as the causal DNA mutation in dozens of Mendelian diseases. Historically, pathogenic STR expansions could only be detected by single locus techniques, such as PCR and electrophoresis. The ability to use short read sequencing data to screen for STR expansions has the potential to reduce both the time and cost to reaching diagnosis and enable the discovery of new causal STR loci. Most existing tools detect STR variation within the read length, and so are unable to detect the majority of pathogenic expansions. Those tools that can detect large expansions are limited to a set of known disease loci and as yet no new disease causing STR expansions have been identified with high-throughput sequencing technologies. Here we address this by presenting STRetch, a new genome-wide method to detect STR expansions at all loci across the human genome. We demonstrate the use of STRetch for detecting pathogenic STR expansions in short-read whole genome sequencing data with a very low false discovery rate. We further demonstrate the application of STRetch to solve cases of patients with undiagnosed disease and apply STRetch to the analysis of 97 whole genomes to reveal variation at STR loci. STRetch assesses expansions at all STR loci in the genome and allows screening for novel disease-causing STRs. STRetch is open source software, available from github.com/Oshlack/STRetch .
Publisher: Springer Science and Business Media LLC
Date: 03-2006
DOI: 10.1038/NATURE04559
Abstract: Although it has been hypothesized for thirty years that many human adaptations are likely to be due to changes in gene regulation, almost nothing is known about the modes of natural selection acting on regulation in primates. Here we identify a set of genes for which expression is evolving under natural selection. We use a new multi-species complementary DNA array to compare steady-state messenger RNA levels in liver tissues within and between humans, chimpanzees, orangutans and rhesus macaques. Using estimates from a linear mixed model, we identify a set of genes for which expression levels have remained constant across the entire phylogeny (approximately 70 million years), and are therefore likely to be under stabilizing selection. Among the top candidates are five genes with expression levels that have previously been shown to be altered in liver carcinoma. We also find a number of genes with similar expression levels among non-human primates but significantly elevated or reduced expression in the human lineage, features that point to the action of directional selection. Among the gene set with a human-specific increase in expression, there is an excess of transcription factors; the same is not true for genes with increased expression in chimpanzee.
Publisher: Cambridge University Press (CUP)
Date: 2002
DOI: 10.1071/AS01083
Abstract: We observed three AGN from the Parkes Half-Jansky Flat-spectrum Sample at near infrared (NIR) wavelengths to search for micro-variability. In one source, the blue quasar PKS 2243–123, good evidence for NIR micro-variability was found. In the other two sources, PKS 2240–260 and PKS 2233–148, both BL Lacertae objects, no such evidence of variability was detected. We discuss the implications of these observations for the various mechanisms that have been proposed for micro-variability.
Publisher: Springer Science and Business Media LLC
Date: 14-10-2017
DOI: 10.1038/LEU.2016.279
Abstract: Enforced expression of microRNA-155 (miR-155) in myeloid cells has been shown to have both oncogenic or tumour-suppressor functions in acute myeloid leukaemia (AML). We sought to resolve these contrasting effects of miR-155 overexpression using murine models of AML and human paediatric AML data sets. We show that the highest miR-155 expression levels inhibited proliferation in murine AML models. Over time, enforced miR-155 expression in AML in vitro and in vivo, however, favours selection of intermediate miR-155 expression levels that results in increased tumour burden in mice, without accelerating the onset of disease. Strikingly, we show that intermediate and high miR-155 expression also regulate very different subsets of miR-155 targets and have contrasting downstream effects on the transcriptional environments of AML cells, including genes involved in haematopoiesis and leukaemia. Furthermore, we show that elevated miR-155 expression detected in paediatric AML correlates with intermediate and not high miR-155 expression identified in our experimental models. These findings collectively describe a novel dose-dependent role for miR-155 in the regulation of AML, which may have important therapeutic implications.
Publisher: Oxford University Press (OUP)
Date: 12-04-2012
DOI: 10.1093/BIOINFORMATICS/BTS167
Abstract: Summary: Bpipe is a simple, dedicated programming language for defining and executing bioinformatics pipelines. It specializes in enabling users to turn existing pipelines based on shell scripts or command line tools into highly flexible, adaptable and maintainable workflows with a minimum of effort. Bpipe ensures that pipelines execute in a controlled and repeatable fashion and keeps audit trails and logs to ensure that experimental results are reproducible. Requiring only Java as a dependency, Bpipe is fully self-contained and cross-platform, making it very easy to adopt and deploy into existing environments. Availability and implementation: Bpipe is freely available from bpipe.org under a BSD License. Contact: simon.sadedin@mcri.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Springer Science and Business Media LLC
Date: 2010
Publisher: F1000 Research Ltd
Date: 08-06-2016
DOI: 10.12688/F1000RESEARCH.8839.1
Abstract: Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioconductor workflow using multiple packages for the analysis of methylation array data. Specifically, we demonstrate the steps involved in a typical differential methylation analysis pipeline including: quality control, filtering, normalization, data exploration and statistical testing for probe-wise differential methylation. We further outline other analyses such as differential methylation of regions, differential variability analysis, estimating cell type composition and gene ontology testing. Finally, we provide some examples of how to visualise methylation array data.
Publisher: F1000 Research Ltd
Date: 26-07-2016
DOI: 10.12688/F1000RESEARCH.8839.2
Abstract: Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioconductor workflow using multiple packages for the analysis of methylation array data. Specifically, we demonstrate the steps involved in a typical differential methylation analysis pipeline including: quality control, filtering, normalization, data exploration and statistical testing for probe-wise differential methylation. We further outline other analyses such as differential methylation of regions, differential variability analysis, estimating cell type composition and gene ontology testing. Finally, we provide some examples of how to visualise methylation array data.
Publisher: American Society of Hematology
Date: 09-10-2020
DOI: 10.1182/BLOODADVANCES.2020002708
Abstract: A novel KMT2A-rearrangement, MLL-TFE3, was identified in an infant leukemia patient. MLL-TFE3 expression produces aggressive leukemia in a mouse model.
Publisher: American Astronomical Society
Date: 09-2002
DOI: 10.1086/341729
Publisher: F1000 Research Ltd
Date: 05-04-2017
DOI: 10.12688/F1000RESEARCH.8839.3
Abstract: Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioconductor workflow using multiple packages for the analysis of methylation array data. Specifically, we demonstrate the steps involved in a typical differential methylation analysis pipeline including: quality control, filtering, normalization, data exploration and statistical testing for probe-wise differential methylation. We further outline other analyses such as differential methylation of regions, differential variability analysis, estimating cell type composition and gene ontology testing. Finally, we provide some examples of how to visualise methylation array data.
Publisher: Cold Spring Harbor Laboratory
Date: 06-2020
DOI: 10.1101/2020.05.31.126888
Abstract: Calling copy number alterations (CNAs) from RNA-Seq is challenging, because differences in gene expression mean that read depth across genes varies by several orders of magnitude and there is a paucity of informative single nucleotide polymorphisms (SNPs). We previously developed SuperFreq to analyse exome data of tumours by combining variant calling and copy number estimation in an integrated pipeline. Here we have used the SuperFreq framework for the analysis of RNA sequencing (RNA-Seq) data, which allows for the detection of absolute and allele sensitive CNAs. SuperFreq uses an error-propagation framework to combine and maximise the information available in the read depth and B-allele frequencies of SNPs (BAFs) to make CNA calls on RNA-seq data. We used data from The Cancer Genome Atlas (TCGA) to evaluate the CNA called from RNA-Seq with those generated from SNP-arrays. When ploidy estimates were consistent, we found excellent agreement with CNAs called from DNA of over 98% of the genome for acute myeloid leukaemia (TCGA-AML, n=116) and 87% for colorectal cancer (TCGA-CRC, n=377), which has a much higher CNA burden. As expected, the sensitivity of CNA calling from RNA-Seq was dependent on gene density. Nonetheless, using RNA-Seq SuperFreq detected 78% of CNA calls covering 100 or more genes with a precision of 94%. Recall dropped markedly for focal events, but this also depended on the signal intensity. For example, in the CRC cohort SuperFreq identified 100% (7/7) of cases with high-level amplification of ERBB2, where the copy number was typically , but identified only 6% (1/17) of cases with moderate amplification of IGF2, typically 4 or 5 copies over a smaller region (median 5 flanking genes for IGF2, compared to 20 for ERBB2). We were able to reproduce the relationship between mutational load and CNA profile in CRC using RNA-Seq alone.
SuperFreq offers an integrated platform for identification of CNAs and point mutations from RNA-seq in cancer transcriptomes. The software is implemented in R and is available through GitHub: github.com/ChristofferFlensburg/SuperFreq.
Publisher: Cold Spring Harbor Laboratory
Date: 17-06-2021
DOI: 10.1101/2021.06.17.448806
Abstract: With improving technology and decreasing costs, single-cell RNA sequencing (scRNA-seq) at the population scale has become more viable, opening up the doors to study functional genomics at the single-cell level. This development has led to a rush to adapt bulk methods and develop new single-cell-specific methods and tools for computational analysis of these studies. Many single-cell methods have been tested, developed, and benchmarked using simulated data. However, current scRNA-seq simulation frameworks do not allow for the simulation of population-scale scRNA-seq data. Here, we present splatPop, a new Splatter model, for flexible, reproducible, and well documented simulation of population-scale scRNA-seq data with known expression quantitative trait loci (eQTL) effects. The splatPop model also allows for the simulation of complex batch effects, cell group effects, and conditional effects between individuals from different cohorts.
Publisher: Cold Spring Harbor Laboratory
Date: 02-05-2017
DOI: 10.1101/133173
Abstract: As single-cell RNA sequencing technologies have rapidly developed, so have analysis methods. Many methods have been tested, developed and validated using simulated datasets. Unfortunately, current simulations are often poorly documented, their similarity to real data is not demonstrated, or reproducible code is not available. Here we present the Splatter Bioconductor package for simple, reproducible and well-documented simulation of single-cell RNA-seq data. Splatter provides an interface to multiple simulation methods including Splat, our own simulation, based on a gamma-Poisson distribution. Splat can simulate single populations of cells, populations with multiple cell types or differentiation paths.
Publisher: Cold Spring Harbor Laboratory
Date: 30-07-2018
DOI: 10.1101/380097
Abstract: Analysing multiple cancer samples from an individual patient can provide insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression. Existing approaches for clonal tracking from sequencing data typically require the user to combine multiple tools that are not purpose-built for this task. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application. We developed SuperFreq, a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) and clonal tracking for both. SuperFreq does not require a matched normal and instead relies on unrelated controls. When analysing multiple samples from a single patient, SuperFreq cross checks variant calls to improve clonal tracking, which helps to separate somatic from germline variants, and to resolve overlapping CNA calls. To demonstrate our software we analysed 304 cancer-normal exome samples across 33 cancer types in The Cancer Genome Atlas (TCGA) and evaluated the quality of the SNV and CNA calls. We simulated clonal evolution through in silico mixing of cancer and normal samples in known proportion. We found that SuperFreq identified 93% of clones with a cellular fraction of at least 50% and mutations were assigned to the correct clone with high recall and precision. In addition, SuperFreq maintained a similar level of performance for most aspects of the analysis when run without a matched normal. SuperFreq is highly versatile and can be applied in many different experimental settings for the analysis of exomes and other capture libraries. We demonstrate an application of SuperFreq to leukaemia patients with diagnosis and relapse samples. SuperFreq is implemented in R and available on GitHub at github.com/ChristofferFlensburg/SuperFreq .
Publisher: Elsevier BV
Date: 02-2016
Publisher: Springer Science and Business Media LLC
Date: 2007
Publisher: Springer Science and Business Media LLC
Date: 2010
Publisher: Oxford University Press (OUP)
Date: 19-06-2017
DOI: 10.1093/BRAIN/AWX138
Abstract: Defects in mRNA export from the nucleus have been linked to various neurodegenerative disorders. We report mutations in the gene MCM3AP, encoding the germinal center associated nuclear protein (GANP), in nine affected individuals from five unrelated families. The variants were associated with severe childhood onset primarily axonal (four families) or demyelinating (one family) Charcot-Marie-Tooth neuropathy. Mild to moderate intellectual disability was present in seven of nine affected individuals. The affected individuals were either compound heterozygous or homozygous for different MCM3AP variants, which were predicted to cause depletion of GANP or affect conserved amino acids with likely importance for its function. Accordingly, fibroblasts of affected individuals from one family demonstrated severe depletion of GANP. GANP has been described to function as an mRNA export factor, and to suppress TDP-43-mediated motor neuron degeneration in flies. Thus our results suggest defective mRNA export from the nucleus as a potential pathogenic mechanism of axonal degeneration in these patients. The identification of MCM3AP variants in affected individuals from multiple centres establishes it as a disease gene for childhood-onset recessively inherited Charcot-Marie-Tooth neuropathy with intellectual disability.
Publisher: Oxford University Press (OUP)
Date: 25-08-2007
DOI: 10.1093/BIOINFORMATICS/BTM412
Abstract: Motivation: Microarray data must be background corrected to remove the effects of non-specific binding or spatial heterogeneity across the array, but this practice typically causes other problems such as negative corrected intensities and high variability of low intensity log-ratios. Different estimators of background, and various model-based processing methods, are compared in this study in search of the best option for differential expression analyses of small microarray experiments. Results: Using data where some independent truth in gene expression is known, eight different background correction alternatives are compared, in terms of precision and bias of the resulting gene expression measures, and in terms of their ability to detect differentially expressed genes as judged by two popular algorithms, SAM and limma eBayes. A new background processing method (normexp) is introduced which is based on a convolution model. The model-based correction methods are shown to be markedly superior to the usual practice of subtracting local background estimates. Methods which stabilize the variances of the log-ratios along the intensity range perform the best. The normexp+offset method is found to give the lowest false discovery rate overall, followed by morph and vsn. Like vsn, normexp is applicable to most types of two-colour microarray data. Availability: The background correction methods compared in this article are available in the R package limma (Smyth, 2005) from www.bioconductor.org. Contact: smyth@wehi.edu.au Supplementary information: Supplementary data are available from bioinf.wehi.edu.au/resources/webReferences.html.
Publisher: Springer Science and Business Media LLC
Date: 04-11-2019
DOI: 10.1038/S41598-019-52280-9
Abstract: MicroRNAs (miRNAs) are translational regulatory molecules with recognised roles in heart development and disease. Therefore, it is important to define the human miRNA expression profile in cardiac progenitors and early-differentiated cardiomyocytes and to determine whether critical cardiac transcription factors such as NKX2-5 regulate miRNA expression. We used an NKX2-5 eGFP/w reporter line to isolate both cardiac committed mesoderm and cardiomyocytes. We identified 11 miRNAs that were differentially expressed in NKX2-5-expressing cardiac mesoderm compared to non-cardiac mesoderm. Subsequent profiling revealed that the canonical myogenic miRNAs including MIR1-1, MIR133A1 and MIR208A were enriched in cardiomyocytes. Strikingly, deletion of NKX2-5 did not result in gross changes in the cardiac miRNA profile, either at committed mesoderm or cardiomyocyte stages. Thus, in early human cardiomyocyte commitment and differentiation, the cardiac myogenic miRNA program is predominantly regulated independently of the highly conserved NKX2-5-dependent gene regulatory network.
Publisher: Cold Spring Harbor Laboratory
Date: 21-12-2022
DOI: 10.1101/2022.12.20.521313
Abstract: Sample multiplexing is often used to reduce cost and limit batch effects in single-cell RNA sequencing (scRNA-seq) experiments. A commonly used multiplexing technique involves tagging cells prior to pooling with a hashtag oligo (HTO) that can be sequenced along with the cells’ RNA to determine their sample of origin. Several tools have been developed to demultiplex HTO sequencing data and assign cells to samples. In this study, we critically assess the performance of seven HTO demultiplexing tools: hashedDrops, HTODemux, GMM-Demux, demuxmix, deMULTIplex, BFF and HashSolo. The comparison uses data sets where each sample has also been demultiplexed using genetic variants from the RNA, enabling comparison of HTO demultiplexing techniques against complementary data from the genetic “ground truth”. We find that all methods perform similarly where HTO labelling is of high quality, but methods that assume a bimodal counts distribution perform poorly on lower quality data. We also suggest heuristic approaches for assessing the quality of HTO counts in a scRNA-seq experiment.
Publisher: Cold Spring Harbor Laboratory
Date: 27-09-2016
DOI: 10.1101/077750
Abstract: Numerous methods have been developed to analyse RNA sequencing data, but most rely on the availability of a reference genome, making them unsuitable for non-model organisms. De novo transcriptome assembly can build a reference transcriptome from the non-model sequencing data, but falls short of allowing most tools to be applied. Here we present superTranscripts, a simple but powerful solution to bridge that gap. SuperTranscripts are a substitute for a reference genome, consisting of all the unique exonic sequence, in transcriptional order, such that each gene is represented by a single sequence. We demonstrate how superTranscripts allow visualization, variant detection and differential isoform detection in non-model organisms, using widely applied methods that are designed to work with reference genomes. SuperTranscripts can also be applied to model organisms to enhance visualization and discover novel expressed sequence. We describe Lace, software to construct superTranscripts from any set of transcripts including de novo assembled transcriptomes. In addition we used Lace to combine reference and assembled transcriptomes for chicken and recovered the sequence of hundreds of gaps in the reference genome.
Publisher: American Society of Hematology
Date: 07-04-2022
DOI: 10.1182/BLOODADVANCES.2021006076
Abstract: Philadelphia-like (Ph-like) acute lymphoblastic leukemia (ALL) is a high-risk subtype of B-cell ALL characterized by a gene expression profile resembling Philadelphia chromosome–positive ALL (Ph+ ALL) in the absence of BCR-ABL1. Tyrosine kinase–activating fusions, some involving ABL1, are recurrent drivers of Ph-like ALL and are targetable with tyrosine kinase inhibitors (TKIs). We identified a rare instance of SFPQ-ABL1 in a child with Ph-like ALL. SFPQ-ABL1 expressed in cytokine-dependent cell lines was sufficient to transform cells and these cells were sensitive to ABL1-targeting TKIs. In contrast to BCR-ABL1, SFPQ-ABL1 localized to the nuclear compartment and was a weaker driver of cellular proliferation. Phosphoproteomics analysis showed upregulation of cell cycle, DNA replication, and spliceosome pathways, and downregulation of signal transduction pathways, including ErbB, NF-κB, vascular endothelial growth factor (VEGF), and MAPK signaling in SFPQ-ABL1–expressing cells compared with BCR-ABL1–expressing cells. SFPQ-ABL1 expression did not activate phosphatidylinositol 3-kinase/protein kinase B (PI3K/AKT) signaling and was associated with phosphorylation of G2/M cell cycle proteins. SFPQ-ABL1 was sensitive to navitoclax and S-63845 and promotes cell survival by maintaining expression of Mcl-1 and Bcl-xL. SFPQ-ABL1 has functionally distinct mechanisms by which it drives ALL, including subcellular localization, proliferative capacity, and activation of cellular pathways. These findings highlight the role that fusion partners have in mediating the function of ABL1 fusions.
Publisher: F1000 Research Ltd
Date: 07-12-2021
DOI: 10.12688/F1000RESEARCH.74836.1
Abstract: Visualisation of the transcriptome relative to a reference genome is fraught with sparsity. This is due to RNA sequencing (RNA-Seq) reads being predominantly mapped to exons that account for just under 3% of the human genome. Recently, we have used exon-only references, superTranscripts, to improve visualisation of aligned RNA-Seq data through the omission of supposedly unexpressed regions such as introns. However, variation within these regions can lead to novel splicing events that may drive a pathogenic phenotype. In these cases, the loss of information in only retaining annotated exons presents significant drawbacks. Here we present Slinker, a bioinformatics pipeline written in Python and Bpipe that uses a data-driven approach to assemble sample-specific superTranscripts. At its core, Slinker uses Stringtie2 to assemble transcripts with any sequence across any gene. This assembly is merged with reference transcripts, converted to a superTranscript, of which rich visualisations are made through Plotly with associated annotation and coverage information. Slinker was validated on five novel splicing events of rare disease samples from a cohort of primary muscular disorders. In addition, Slinker was shown to be effective in visualising deletion events within transcriptomes of tumour samples in the important leukemia gene, IKZF1. Slinker offers a succinct visualisation of RNA-Seq alignments across typically sparse regions and is freely available on GitHub.
Publisher: Hindawi Limited
Date: 21-04-2022
DOI: 10.1002/HUMU.24382
Abstract: Expansions of short tandem repeats (STRs) have been implicated as the causal variant in over 50 diseases known to date. There are several tools which can genotype STRs from high-throughput sequencing (HTS) data. However, running these tools out of the box only allows around half of the known disease-causing loci to be genotyped. Furthermore, the genotypes estimated at these loci are often underestimated with maximum lengths limited to either the read or fragment length, which is less than the pathogenic cutoff for some diseases. Although analysis tools can be customized to genotype extra loci, this requires proficiency in bioinformatics to set up, limiting their widespread usage by other researchers and clinicians. To address these issues, we have developed a new software called STRipy, which is able to target all known disease-causing STRs from HTS data. We created an intuitive graphical interface for STRipy and significantly simplified the detection of STR expansions. Moreover, we genotyped all disease loci for over two and a half thousand samples to provide population-wide distributions to assist with interpretation of results. We believe the simplicity and breadth of STRipy will increase the genotyping of STRs in sequencing data resulting in further diagnoses of rare STR diseases.
Publisher: Elsevier BV
Date: 11-2016
DOI: 10.1038/GIM.2016.1
Abstract: To prospectively evaluate the diagnostic and clinical utility of singleton whole-exome sequencing (WES) as a first-tier test in infants with suspected monogenic disease. Singleton WES was performed as a first-tier sequencing test in infants recruited from a single pediatric tertiary center. This occurred in parallel with standard investigations, including single- or multigene panel sequencing when clinically indicated. The diagnosis rate, clinical utility, and impact on management of singleton WES were evaluated. Of 80 enrolled infants, 46 received a molecular genetic diagnosis through singleton WES (57.5%) compared with 11 (13.75%) who underwent standard investigations in the same patient group. Clinical management changed following exome diagnosis in 15 of 46 diagnosed participants (32.6%). Twelve relatives received a genetic diagnosis following cascade testing, and 28 couples were identified as being at high risk of recurrence in future pregnancies. This prospective study provides strong evidence for increased diagnostic and clinical utility of singleton WES as a first-tier sequencing test for infants with a suspected monogenic disorder. Singleton WES outperformed standard care in terms of diagnosis rate and the benefits of a diagnosis, namely, impact on management of the child and clarification of reproductive risks for the extended family in a timely manner. Genet Med 18(11), 1090-1096.
Publisher: Cold Spring Harbor Laboratory
Date: 02-03-2018
DOI: 10.1101/274035
Abstract: Clustering techniques are widely used in the analysis of large data sets to group together samples with similar properties. For example, clustering is often used in the field of single-cell RNA-sequencing in order to identify different cell types present in a tissue sample. There are many algorithms for performing clustering and the results can vary substantially. In particular, the number of groups present in a data set is often unknown and the number of clusters identified by an algorithm can change based on the parameters used. To explore and examine the impact of varying clustering resolution we present clustering trees. This visualisation shows the relationships between clusters at multiple resolutions allowing researchers to see how samples move as the number of clusters increases. In addition, meta-information can be overlaid on the tree to inform the choice of resolution and guide in identification of clusters. We illustrate the features of clustering trees using a series of simulations as well as two real examples, the classical iris dataset and a complex single-cell RNA-sequencing dataset. Clustering trees can be produced using the clustree R package available from CRAN ( CRAN.Rackage=clustree ) and developed on GitHub ( azappi/clustree ).
Publisher: eLife Sciences Publications, Ltd
Date: 24-12-2018
Publisher: Springer Science and Business Media LLC
Date: 29-11-2016
Publisher: Springer Science and Business Media LLC
Date: 2013
Publisher: Cold Spring Harbor Laboratory
Date: 20-10-2017
DOI: 10.1101/206573
Abstract: As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database ( www.scRNA-tools.org ) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records of the growth of the field over time. In recent years single-cell RNA-sequencing technologies have emerged that allow scientists to measure the activity of genes in thousands of individual cells simultaneously. This means we can start to look at what each cell in a sample is doing instead of considering an average across all cells in a sample, as was the case with older technologies. However, while access to this kind of data presents a wealth of opportunities it comes with a new set of challenges. Researchers across the world have developed new methods and software tools to make the most of these datasets but the field is moving at such a rapid pace it is difficult to keep up with what is currently available. To make this easier we have developed the scRNA-tools database and website ( www.scRNA-tools.org ).
Our database catalogues analysis tools, recording the tasks they can be used for, where they can be downloaded from and the publications that describe how they work. By looking at this database we can see that developers have focused on methods specific to single-cell data and that they embrace an open-source approach with permissive licensing, sharing of code and preprint publications.
Publisher: Cold Spring Harbor Laboratory
Date: 04-02-2020
DOI: 10.1101/2020.02.03.933002
Abstract: Short tandem repeats are an important source of genetic variation. They are highly mutable, and repeat expansions are associated with dozens of human disorders, such as Huntington’s disease and spinocerebellar ataxias. Technical advances in sequencing technology have made it possible to analyse these repeats at large scale; however, accurate genotyping is still a challenging task. We compared four different short tandem repeat genotyping tools on whole exome sequencing data to determine their genotyping performance and limits, which will help other researchers choose a suitable tool and parameters for analysis. The analysis was performed on the Simons Simplex Collection dataset where we used a novel method of evaluation with accuracy determined by the rate of homozygous calls on the X chromosome of male samples. In total we analysed 433 samples and around a million genotypes for evaluating tools on whole exome sequencing data. We determined a relatively good performance of all tools when genotyping repeats of 3-6 bp in length which could be improved with coverage and quality score filtering. However, genotyping homopolymers was challenging for all tools and a high error rate was present across different thresholds of coverage and quality scores. Interestingly, dinucleotide repeats displayed a high error rate as well, which was found to be mainly caused by the AC/TG repeats. Overall, LobSTR was able to make the most calls and was also the fastest tool while RepeatSeq and HipSTR exhibited the lowest heterozygous error rate at low coverage. All tools have different strengths and weaknesses and the choice may depend on the type of analysis. In this analysis we demonstrated the effect of using different filtering parameters and offered recommendations based on the trade-off between the best accuracy of genotyping and the highest number of calls.
Publisher: Springer Science and Business Media LLC
Date: 10-07-2015
Publisher: Springer Science and Business Media LLC
Date: 2009
Publisher: Public Library of Science (PLoS)
Date: 21-11-2008
Publisher: American Society of Hematology
Date: 15-07-2022
Publisher: American Medical Association (AMA)
Date: 09-2017
Publisher: Cold Spring Harbor Laboratory
Date: 19-12-2018
DOI: 10.1101/501106
Abstract: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from ‘pseudo’ or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Here we propose performing DTU testing directly on equivalence class read counts. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. We find that EC counts have similar sensitivity and false discovery rates as exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. We posit that equivalence class counts are a natural unit on which to perform many types of analysis.
Publisher: Oxford University Press (OUP)
Date: 11-2013
Abstract: Penicillium marneffei is an opportunistic human pathogen endemic to Southeast Asia. At 25° P. marneffei grows in a filamentous hyphal form and can undergo asexual development (conidiation) to produce spores (conidia), the infectious agent. At 37° P. marneffei grows in the pathogenic yeast cell form that replicates by fission. Switching between these growth forms, known as dimorphic switching, is dependent on temperature. To understand the process of dimorphic switching and the physiological capacity of the different cell types, two microarray-based profiling experiments covering approximately 42% of the genome were performed. The first experiment compared cells from the hyphal, yeast, and conidiation phases to identify “phase or cell-state–specific” gene expression. The second experiment examined gene expression during the dimorphic switch from one morphological state to another. The data identified a variety of differentially expressed genes that have been organized into metabolic clusters based on predicted function and expression patterns. In particular, C-14 sterol reductase–encoding gene ergM of the ergosterol biosynthesis pathway showed high-level expression throughout yeast morphogenesis compared to hyphal. Deletion of ergM resulted in severe growth defects with increased sensitivity to azole-type antifungal agents but not amphotericin B. The data defined gene classes based on spatio-temporal expression such as those expressed early in the dimorphic switch but not in the terminal cell types and those expressed late. Such classifications have been helpful in linking a given gene of interest to its expression pattern throughout the P. marneffei dimorphic life cycle and its likely role in pathogenicity.
Publisher: Springer Science and Business Media LLC
Date: 21-08-2018
Publisher: Springer Science and Business Media LLC
Date: 12-2021
DOI: 10.1186/S13059-021-02546-1
Abstract: Population-scale single-cell RNA sequencing (scRNA-seq) is now viable, enabling finer resolution functional genomics studies and leading to a rush to adapt bulk methods and develop new single-cell-specific methods to perform these studies. Simulations are useful for developing, testing, and benchmarking methods but current scRNA-seq simulation frameworks do not simulate population-scale data with genetic effects. Here, we present splatPop, a model for flexible, reproducible, and well-documented simulation of population-scale scRNA-seq data with known expression quantitative trait loci. splatPop can also simulate complex batch, cell group, and conditional effects between individuals from different cohorts as well as genetically-driven co-expression.
Publisher: Springer Science and Business Media LLC
Date: 04-12-2018
DOI: 10.1038/S41467-018-07594-Z
Abstract: The podocytes within the glomeruli of the kidney maintain the filtration barrier by forming interdigitating foot processes with intervening slit diaphragms, disruption in which results in proteinuria. Studies into human podocytopathies to date have employed primary or immortalised podocyte cell lines cultured in 2D. Here we compare 3D human glomeruli sieved from induced pluripotent stem cell-derived kidney organoids with conditionally immortalised human podocyte cell lines, revealing improved podocyte-specific gene expression, maintenance in vitro of polarised protein localisation and an improved glomerular basement membrane matrisome compared to 2D cultures. Organoid-derived glomeruli retain marker expression in culture for 96 h, proving amenable to toxicity screening. In addition, 3D organoid glomeruli from a congenital nephrotic syndrome patient with compound heterozygous NPHS1 mutations reveal reduced protein levels of both NEPHRIN and PODOCIN. Hence, human iPSC-derived organoid glomeruli represent an accessible approach to the in vitro modelling of human podocytopathies and screening for podocyte toxicity.
Publisher: Springer Science and Business Media LLC
Date: 06-01-2022
DOI: 10.1186/S13059-021-02588-5
Abstract: In cancer, fusions are important diagnostic markers and targets for therapy. Long-read transcriptome sequencing allows the discovery of fusions with their full-length isoform structure. However, due to higher sequencing error rates, fusion finding algorithms designed for short reads do not work. Here we present JAFFAL, to identify fusions from long-read transcriptome sequencing. We validate JAFFAL using simulations, cell lines, and patient data from Nanopore and PacBio. We apply JAFFAL to single-cell data and find fusions spanning three genes demonstrating transcripts detected from complex rearrangements. JAFFAL is available at github.com/Oshlack/JAFFA/wiki .
Publisher: Cold Spring Harbor Laboratory
Date: 28-11-2021
DOI: 10.1101/2021.11.28.470236
Abstract: Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity over the last few years for profiling the transcriptomes of thousands to millions of single cells. This technology is now being used to analyse experiments with complex designs including biological replication. One question that can be asked from single cell experiments, which has been difficult to directly address with bulk RNA-seq data, is whether the cell type proportions are different between two or more experimental conditions. As well as gene expression changes, the relative depletion or enrichment of a particular cell type can be the functional consequence of disease or treatment. However, cell type proportion estimates from scRNA-seq data are variable and statistical methods that can correctly account for different sources of variability are needed to confidently identify statistically significant shifts in cell type composition between experimental conditions. We have developed propeller, a robust and flexible method that leverages biological replication to find statistically significant differences in cell type proportions between groups. Using simulated cell type proportions data we show that propeller performs well under a variety of scenarios. We applied propeller to test for significant changes in proportions of cell types related to human heart development, ageing and COVID-19 disease severity. The propeller method is publicly available in the open source speckle R package ( hipsonlab/speckle ). All the analysis code for the paper is available at hipsonlab ropeller-paper-analysis/ , and the associated analysis website is available at phipsonlab.github.io ropeller-paper-analysis/ . Alicia Oshlack: Alicia.Oshlack@petermac.org Belinda Phipson: phipson.b@wehi.edu.au
Publisher: Cold Spring Harbor Laboratory
Date: 08-2021
DOI: 10.1101/2021.08.01.454393
Abstract: B-cell acute lymphoblastic leukemia (B-ALL) is the most common childhood cancer. Subtypes within B-ALL are distinguished by characteristic structural variants and mutations, which in some instances strongly correlate with responses to treatment. The World Health Organisation (WHO) recognises seven distinct classifications, or subtypes, as of 2016. However, recent studies have demonstrated that B-ALL can be segmented into 23 subtypes based on a combination of genomic features and gene expression profiles. A method to identify a patient’s subtype would have clear clinical utility. Despite this, no publicly available classification methods using RNA-Seq exist for this purpose. Here we present ALLSorts: a publicly available method that uses RNA-Seq data to classify B-ALL samples to 18 known subtypes and five meta-subtypes. ALLSorts is the result of a hierarchical supervised machine learning algorithm applied to a training set of 1223 B-ALL samples aggregated from multiple cohorts. Validation revealed that ALLSorts can accurately attribute samples to subtypes and can attribute multiple subtypes to a sample. Furthermore, when applied to both paediatric and adult cohorts, ALLSorts was able to classify previously undefined samples into subtypes. ALLSorts is available and documented on GitHub ( github.com/Oshlack/AllSorts/ ). ALLSorts is a gene expression classifier for B-cell acute lymphoblastic leukemia, which predicts 18 distinct genomic subtypes - including those designated by the World Health Organisation (WHO) and provisional entities. Trained and validated on over 2300 B-ALL samples, representing each subtype and a variety of clinical features. Correctly identified subtypes in 91% of cases in a held-out dataset and between 82-93% across a newly combined cohort of paediatric and adult samples. ALLSorts assigned subtypes to samples with previously unknown driver events.
ALLsorts is an accurate, comprehensive and freely available classification tool that distinguishes subtypes of B-cell acute lymphoblastic leukemia from RNA-sequencing.
Publisher: Springer Science and Business Media LLC
Date: 2011
Publisher: Cold Spring Harbor Laboratory
Date: 31-10-2022
DOI: 10.1101/2022.10.27.514132
Abstract: Cancer is driven by mutations of the genome that can result in the activation of oncogenes or repression of tumour suppressor genes. In acute lymphoblastic leukemia (ALL) focal deletions in IKAROS family zinc finger 1 (IKZF1) result in the loss of zinc-finger DNA-binding domains and a dominant negative isoform that is associated with higher rates of relapse and poorer patient outcomes. Clinically, the presence of IKZF1 deletions informs prognosis and treatment options. In this work we developed a method for detecting exon deletions in genes using RNA-seq with application to IKZF1. We developed a pipeline that first uses a custom transcriptome reference consisting of transcripts with exon deletions. Next, RNA-seq reads are mapped using a pseudoalignment algorithm to identify reads that uniquely support deletions. These are then evaluated for evidence of the deletion with respect to gene expression and other samples. We applied the algorithm, named Toblerone, to a cohort of 99 B-ALL paediatric samples including validated IKZF1 deletions. Furthermore, we developed a graphical desktop app for non-bioinformatics users that can quickly and easily identify and report deletions in IKZF1 from RNA-seq data with informative graphical outputs.
Publisher: American Society of Hematology
Date: 30-10-2023
Publisher: Oxford University Press (OUP)
Date: 06-2015
DOI: 10.1095/BIOLREPROD.115.128918
Abstract: Male sex determination hinges on the development of testes in the embryo, beginning with the differentiation of Sertoli cells under the influence of the Y-linked gene SRY. Sertoli cells then orchestrate fetal testis formation including the specification of fetal Leydig cells (FLCs) that produce steroid hormones to direct virilization of the XY embryo. As the majority of XY disorders of sex development (DSDs) remain unexplained at the molecular genetic level, we reasoned that genes involved in FLC development might represent an unappreciated source of candidate XY DSD genes. To identify these genes, and to gain a more detailed understanding of the regulatory networks underpinning the specification and differentiation of the FLC population, we developed methods for isolating fetal Sertoli, Leydig, and interstitial cell-enriched subpopulations using an Sf1-eGFP transgenic mouse line. RNA sequencing followed by rigorous bioinformatic filtering identified 84 genes upregulated in FLCs, 704 genes upregulated in nonsteroidogenic interstitial cells, and 1217 genes upregulated in the Sertoli cells at 12.5 days postcoitum. The analysis revealed a trend for expression of components of neuroactive ligand interactions in FLCs and Sertoli cells and identified factors potentially involved in signaling between the Sertoli cells, FLCs, and interstitial cells. We identified 61 genes that were not known previously to be involved in specification or differentiation of FLCs. This dataset provides a platform for exploring the biology of FLCs and understanding the role of these cells in testicular development. In addition, it provides a basis for targeted studies designed to identify causes of idiopathic XY DSD.
Publisher: Springer Science and Business Media LLC
Date: 15-06-2023
Publisher: Springer Science and Business Media LLC
Date: 11-05-2015
Publisher: Springer New York
Date: 22-09-2012
Publisher: F1000 Research Ltd
Date: 03-02-2023
DOI: 10.12688/F1000RESEARCH.129490.1
Abstract: Cancer is driven by mutations of the genome that can result in the activation of oncogenes or repression of tumour suppressor genes. In acute lymphoblastic leukemia (ALL) focal deletions in IKAROS family zinc finger 1 (IKZF1) result in the loss of zinc-finger DNA-binding domains and a dominant negative isoform that is associated with higher rates of relapse and poorer patient outcomes. Clinically, the presence of IKZF1 deletions informs prognosis and treatment options. In this work we developed a method for detecting exon deletions in genes using RNA-seq with application to IKZF1. We developed a pipeline that first uses a custom transcriptome reference consisting of transcripts with exon deletions. Next, RNA-seq reads are mapped using a pseudoalignment algorithm to identify reads that uniquely support deletions. These are then evaluated for evidence of the deletion with respect to gene expression and other samples. We applied the algorithm, named Toblerone, to a cohort of 99 B-ALL paediatric samples including validated IKZF1 deletions. Furthermore, we developed a graphical desktop app for non-bioinformatics users that can quickly and easily identify and report deletions in IKZF1 from RNA-seq data with informative graphical outputs.
Publisher: Springer Science and Business Media LLC
Date: 21-02-2023
Publisher: Springer Science and Business Media LLC
Date: 09-06-2023
DOI: 10.1038/S41467-023-39040-0
Abstract: Squamous cell carcinoma antigen recognized by T cells 3 (SART3) is an RNA-binding protein with numerous biological functions including recycling small nuclear RNAs to the spliceosome. Here, we identify recessive variants in SART3 in nine individuals presenting with intellectual disability, global developmental delay and a subset of brain anomalies, together with gonadal dysgenesis in 46,XY individuals. Knockdown of the Drosophila orthologue of SART3 reveals a conserved role in testicular and neuronal development. Human induced pluripotent stem cells carrying patient variants in SART3 show disruption to multiple signalling pathways, upregulation of spliceosome components and demonstrate aberrant gonadal and neuronal differentiation in vitro. Collectively, these findings suggest that bi-allelic SART3 variants underlie a spliceosomopathy which we tentatively propose be termed INDYGON syndrome (Intellectual disability, Neurodevelopmental defects and Developmental delay with 46,XY GONadal dysgenesis). Our findings will enable additional diagnoses and improved outcomes for individuals born with this condition.
Publisher: Springer Science and Business Media LLC
Date: 2013
Publisher: Oxford University Press (OUP)
Date: 05-2018
Publisher: American Society of Hematology
Date: 17-10-2013
DOI: 10.1182/BLOOD-2013-02-481788
Abstract: Human naive CD4+ T cells and resting nTreg are differentially methylated at 127 regions in their genomic DNA. Forkhead-binding motifs are present in promoter-associated differentially methylated regions, inferring broader epigenetic control of Treg.
Publisher: Springer Science and Business Media LLC
Date: 12-09-2017
Publisher: Cold Spring Harbor Laboratory
Date: 14-02-2023
DOI: 10.1101/2023.02.10.23285791
Abstract: A systematic review is a type of literature review that aims to collect and analyse all available evidence from the literature on a particular topic. The process of screening and identifying eligible articles from the vast amounts of literature is a time-consuming task. Specialized software has been developed to aid in the screening process and save significant time and labour. However, the most suitable software tools that are available often come with a cost or only offer either a limited or a trial version for free. In this paper, we report the release of a new software application, Catchii, which contains all the necessary features of a systematic review screener application while being completely free. It supports a user at different stages of screening, from detecting duplicates to creating the final flowchart for a publication. Catchii is designed to provide a good user experience and streamline the screening process through its clean and user-friendly interface on both computers and mobile devices, as well as features such as multi-coloured keyword highlighting, the ability to screen titles and abstracts smoothly with an unstable or even absent internet connection, and more. Catchii is a valuable addition to the current selection of systematic review screening applications that also allows researchers without financial capabilities to access many of the features found in the best paid tools. Catchii is available at catchii.org
Publisher: Elsevier BV
Date: 03-2018
Publisher: Springer Science and Business Media LLC
Date: 2014
Publisher: Springer Science and Business Media LLC
Date: 17-10-2016
DOI: 10.1038/NBT.3702
Abstract: The ability to generate hematopoietic stem cells from human pluripotent cells would enable many biomedical applications. We find that hematopoietic CD34
Publisher: Springer Science and Business Media LLC
Date: 08-03-2012
Publisher: Cold Spring Harbor Laboratory
Date: 05-04-2023
DOI: 10.1101/2023.04.05.535648
Abstract: T-cell acute lymphoblastic leukaemia (T-ALL) is an aggressive and heterogenous haematological malignancy affecting both children and adults. T-ALL subtype identification is an emerging area of active research, with several recent studies proposing potential subtypes based on transcriptomic and genomic analyses. Here we present TALLSorts, a machine-learning bioinformatic tool which classifies T-ALL samples by using bulk RNA sequencing (RNA-seq) data. Trained on four international cohorts totalling 264 samples, TALLSorts exhibits excellent accuracy when tested on holdout and independent test sets. TALLSorts is publicly available for use and will be constantly updated as the field of T-ALL classification further develops.
Publisher: Springer Science and Business Media LLC
Date: 03-10-2013
DOI: 10.1038/NCOMMS3537
Publisher: American Society for Microbiology
Date: 02-2014
DOI: 10.1128/IAI.01152-13
Abstract: It is unknown why only some individuals are susceptible to acute rheumatic fever (ARF). We investigated whether there are differences in the immune response, detectable by gene expression, between individuals who are susceptible to ARF and those who are not. Peripheral blood mononuclear cells (PBMCs) from 15 ARF-susceptible and 10 nonsusceptible (control) adults were stimulated with rheumatogenic (Rh+) group A streptococci (GAS) or nonrheumatogenic (Rh−) GAS. RNA from stimulated PBMCs from each subject was cohybridized with RNA from unstimulated PBMCs on oligonucleotide arrays to compare gene expression. Thirty-four genes were significantly differentially expressed between ARF-susceptible and control groups after stimulation with Rh+ GAS. A total of 982 genes were differentially expressed between Rh+ GAS- and Rh− GAS-stimulated samples from ARF-susceptible individuals. Thirteen genes were differentially expressed in the same direction (predominantly decreased) between the two study groups and between the two stimulation conditions, giving a strong indication of their involvement. Seven of these were immune response genes involved in cytotoxicity, chemotaxis, and apoptosis. There was variability in the degree of expression change between individuals. The high proportion of differentially expressed apoptotic and immune response genes supports the current model of autoimmune and cytokine dysregulation in ARF. This study also raises the possibility that a “failed” immune response, involving decreased expression of cytotoxic and apoptotic genes, contributes to the immunopathogenesis of ARF.
Publisher: Springer Science and Business Media LLC
Date: 23-08-2017
Publisher: Frontiers Media SA
Date: 25-09-2014
Publisher: Springer Science and Business Media LLC
Date: 03-2021
Publisher: F1000 Research Ltd
Date: 28-04-2017
DOI: 10.12688/F1000RESEARCH.11290.1
Abstract: Background: Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity for profiling transcriptomes of hundreds to thousands of single cells. This technology has led to the discovery of novel cell types and revealed insights into the development of complex tissues. However, many technical challenges need to be overcome during data generation. Due to minute amounts of starting material, samples undergo extensive amplification, increasing technical variability. A solution for mitigating amplification biases is to include unique molecular identifiers (UMIs), which tag individual molecules. Transcript abundances are then estimated from the number of unique UMIs aligning to a specific gene, with PCR duplicates resulting in copies of the UMI not included in expression estimates. Methods: Here we investigate the effect of gene length bias in scRNA-Seq across a variety of datasets that differ in terms of capture technology, library preparation, cell types and species. Results: We find that scRNA-seq datasets that have been sequenced using a full-length transcript protocol exhibit gene length bias akin to bulk RNA-seq data. Specifically, shorter genes tend to have lower counts and a higher rate of dropout. In contrast, protocols that include UMIs do not exhibit gene length bias, with a mostly uniform rate of dropout across genes of varying length. Across four different scRNA-Seq datasets profiling mouse embryonic stem cells (mESCs), we found the subset of genes that are only detected in the UMI datasets tended to be shorter, while the subset of genes detected only in the full-length datasets tended to be longer. Conclusions: We find that the choice of scRNA-seq protocol influences the detection rate of genes, and that full-length datasets exhibit gene-length bias.
In addition, despite clear differences between UMI and full-length transcript data, we illustrate that full-length and UMI data can be combined to reveal the underlying biology influencing expression of mESCs.
Publisher: American Society of Hematology
Date: 31-07-2014
DOI: 10.1182/BLOOD-2013-12-544106
Abstract: Ezh2 represses Ifng, Gata3, and Il10 loci in naïve CD4+T cells, and its deficiency leads to Th1 skewing and IL-10 overproduction in Th2 cells. Ezh2 deficiency activates multiple death pathways in differentiated effector Th cells.
Publisher: Springer Science and Business Media LLC
Date: 2010
Publisher: Cold Spring Harbor Laboratory
Date: 13-11-2017
DOI: 10.1101/218586
Abstract: Genomic profiling efforts have revealed a rich diversity of oncogenic fusion genes, and many are emerging as important therapeutic targets. While there are many ways to identify fusion genes from RNA-seq data, visualising these transcripts and their supporting reads remains challenging. Clinker is a bioinformatics tool written in Python, R and Bpipe, that leverages the superTranscript method to visualise fusion genes. We demonstrate the use of Clinker to obtain interpretable visualisations of the RNA-seq data that lead to fusion calls. In addition, we use Clinker to explore multiple fusion transcripts with novel breakpoints within the P2RY8-CRLF2 fusion gene in B-cell Acute Lymphoblastic Leukaemia (B-ALL). Clinker is freely available from GitHub at github.com/Oshlack/Clinker under an MIT License. alicia.oshlack@mcri.edu.au
Publisher: Springer Science and Business Media LLC
Date: 22-11-2006
Abstract: Concerns are often raised about the accuracy of microarray technologies and the degree of cross-platform agreement, but there are yet no methods which can unambiguously evaluate precision and sensitivity for these technologies on a whole-array basis. A methodology is described for evaluating the precision and sensitivity of whole-genome gene expression technologies such as microarrays. The method consists of an easy-to-construct titration series of RNA samples and an associated statistical analysis using non-linear regression. The method evaluates the precision and responsiveness of each microarray platform on a whole-array basis, i.e., using all the probes, without the need to match probes across platforms. An experiment is conducted to assess and compare four widely used microarray platforms. All four platforms are shown to have satisfactory precision but the commercial platforms are superior for resolving differential expression for genes at lower expression levels. The effective precision of the two-color platforms is improved by allowing for probe-specific dye-effects in the statistical model. The methodology is used to compare three data extraction algorithms for the Affymetrix platforms, demonstrating poor performance for the commonly used proprietary algorithm relative to the other algorithms. For probes which can be matched across platforms, the cross-platform variability is decomposed into within-platform and between-platform components, showing that platform disagreement is almost entirely systematic rather than due to measurement variability. The results demonstrate good precision and sensitivity for all the platforms, but highlight the need for improved probe annotation. They quantify the extent to which cross-platform measures can be expected to be less accurate than within-platform comparisons for predicting disease progression or outcome.
Publisher: Oxford University Press (OUP)
Date: 07-2018
Publisher: Oxford University Press (OUP)
Date: 06-09-2018
Publisher: Springer Science and Business Media LLC
Date: 27-12-2019
DOI: 10.1038/S41598-019-55970-6
Abstract: An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Publisher: Cold Spring Harbor Laboratory
Date: 06-12-2022
DOI: 10.1101/867309
Abstract: High-throughput single-cell RNA-seq (scRNA-seq) is a powerful tool for studying gene expression in single cells. Most current scRNA-seq bioinformatics tools focus on analysing overall expression levels, largely ignoring alternative mRNA isoform expression. We present a computational pipeline, Sierra, that readily detects differential transcript usage from data generated by commonly used polyA-captured scRNA-seq technology. We validate Sierra by comparing cardiac scRNA-seq cell-types to bulk RNA-seq of matched populations, finding significant overlap in differential transcripts. Sierra detects differential transcript usage across human peripheral blood mononuclear cells and the Tabula Muris, and 3’UTR shortening in cardiac fibroblasts. Sierra is available at github.com/VCCRI/Sierra .
Publisher: Cold Spring Harbor Laboratory
Date: 25-08-2020
DOI: 10.1101/2020.08.24.265702
Abstract: DNA methylation is one of the most commonly studied epigenetic marks, due to its role in disease and development. Illumina methylation arrays have been extensively used to measure methylation across the human genome. Methylation array analysis has primarily focused on preprocessing, normalisation and identification of differentially methylated CpGs and regions. GOmeth and GOregion are new methods for performing unbiased gene set testing following differential methylation analysis. Benchmarking analyses demonstrate GOmeth outperforms other approaches and GOregion is the first method for gene set testing of differentially methylated regions. Both methods are publicly available in the missMethyl Bioconductor R package.
Publisher: Springer Science and Business Media LLC
Date: 22-10-2021
DOI: 10.1186/S13059-021-02507-8
Abstract: Calling fusion genes from RNA-seq data is well established, but other transcriptional variants are difficult to detect using existing approaches. To identify all types of variants in transcriptomes we developed MINTIE, an integrated pipeline for RNA-seq data. We take a reference-free approach, combining de novo assembly of transcripts with differential expression analysis to identify up-regulated novel variants in a case sample. We compare MINTIE with eight other approaches, detecting 85% of variants while no other method is able to achieve this. We posit that MINTIE will be able to identify new disease variants across a range of disease types.
Publisher: Public Library of Science (PLoS)
Date: 13-02-2020
Publisher: Cold Spring Harbor Laboratory
Date: 07-04-2019
DOI: 10.1101/601740
Abstract: In the clinical setting, exome sequencing has become standard-of-care in diagnosing rare genetic disorders, however many patients remain unsolved. Trio sequencing has been demonstrated to produce a higher diagnostic yield than singleton (proband-only) sequencing. Parental sequencing is especially useful when a disease is suspected to be caused by a de novo variant in the proband, because parental data provide a strong filter for the majority of variants that are shared by the proband and their parents. However the additional cost of sequencing the parents makes the trio strategy uneconomical for many clinical situations. With two thirds of the sequencing budget being spent on parents, these are funds that could be used to sequence more probands. For this reason many clinics are reluctant to sequence parents. Here we propose a pooled-parent strategy for exome sequencing of individuals with likely de novo disease. In this strategy, DNA from all the parents of a cohort of unrelated probands is pooled together into a single exome capture and sequencing run. Variants called in the proband can then be filtered if they are also found in the parent pool, resulting in a shorter list of prioritised variants. To evaluate the pooled-parent strategy we performed a series of simulations by combining reads from individual exomes to imitate sample pooling. We assessed the recall and false positive rate and investigated the trade-off between pool size and recall rate. We compared the performance of GATK HaplotypeCaller individual and joint calling, and FreeBayes to genotype pooled samples. Finally, we applied a pooled-parent strategy to a set of real unsolved cases and showed that the parent pool is a powerful filter that is complementary to other commonly used variant filters such as population variant frequencies.
Publisher: Cold Spring Harbor Laboratory
Date: 24-04-2013
Publisher: Springer Science and Business Media LLC
Date: 2013
Publisher: Cold Spring Harbor Laboratory
Date: 03-10-2018
DOI: 10.1101/433003
Abstract: As costs of high throughput sequencing have fallen, we are seeing vast quantities of short read genomic data being generated. Often, the data is exchanged and stored as aligned reads, which provides high compression and convenient access for many analyses. However, aligned data becomes outdated as new reference genomes and alignment methods become available. Moreover, some applications cannot utilise pre-aligned reads at all, necessitating conversion back to raw format (FASTQ) before they can be used. In both cases, the process of extraction and realignment is expensive and time consuming. We describe Bazam, a tool that efficiently extracts the original paired FASTQ from reads stored in aligned form (BAM or CRAM format). Bazam extracts reads in a format that directly allows realignment with popular aligners with high concurrency. Through eliminating steps and increasing the accessible concurrency, Bazam facilitates up to a 90% reduction in the time required for realignment compared to standard methods. Bazam can support selective extraction of read pairs from focused genomic regions, further increasing efficiency for targeted analyses. Bazam is additionally suitable as a base for other applications that require efficient paired read information, such as quality control, structural variant calling and alignment comparison. Bazam offers significant improvements for users needing to realign genomic data.
Publisher: Oxford University Press (OUP)
Date: 06-2009
DOI: 10.1534/GENETICS.108.099960
Abstract: In addition to specific changes in cis- and trans-regulatory elements, structural changes in the genome are hypothesized to underlie a large number of differences in gene expression between species. Accordingly, we show that species-specific segmental duplications are enriched with genes that are differentially expressed between humans and chimpanzees.
Publisher: Informa UK Limited
Date: 05-2011
Abstract: Within-pair comparison of monozygotic (MZ) twins provides an ideal model for studying factors that regulate epigenetic profile, by controlling for genetic variation. Previous reports have demonstrated epigenetic variability within MZ pairs, but the contribution of early life exposures to this variation remains unclear. As epigenetic marks govern gene expression, we have used gene expression discordance as a proxy measure of epigenetic discordance in MZ twins at birth in two cell types. We found strong evidence of expression discordance at birth in both cell types and some evidence for higher discordance in twin pairs with separate placentas. Genes previously defined as being involved in response to the external environment showed the most variable expression within pairs, independent of cell type, supporting the idea that even slight differences in intrauterine environment can influence expression profile. Focusing on birthweight, previously identified as a predisposing factor for cardiovascular, metabolic and other complex diseases, and using a statistical model that estimated association based on within-pair variation of expression and birthweight, we found some association between birthweight and expression of genes involved in metabolism and cardiovascular function. This study is the first to examine expression discordance in newborn twins. It provides evidence of a link between birthweight and activity of specific cellular pathways and, as evidence points to gene expression profiles being maintained through cell division by epigenetic factors, provides a plausible biological mechanism for the previously described link between low birthweight and increased risk of later complex disease.
Publisher: Springer Science and Business Media LLC
Date: 04-08-2017
Publisher: Wiley
Date: 07-06-2013
DOI: 10.1002/PATH.4209
Abstract: Oncogenic fusion genes that involve kinases have proven to be effective targets for therapy in a wide range of cancers. Unfortunately, the diagnostic approaches required to identify these events are struggling to keep pace with the diverse array of genetic alterations that occur in cancer. Diagnostic screening in solid tumours is particularly challenging, as many fusion genes occur with a low frequency. To overcome these limitations, we developed a capture enrichment strategy to enable high-throughput transcript sequencing of the human kinome. This approach provides a global overview of kinase fusion events, irrespective of the identity of the fusion partner. To demonstrate the utility of this system, we profiled 100 non-small cell lung cancers and identified numerous genetic alterations impacting fibroblast growth factor receptor 3 (FGFR3) in lung squamous cell carcinoma and a novel ALK fusion partner in lung adenocarcinoma.
Publisher: Oxford University Press (OUP)
Date: 07-06-2011
DOI: 10.1093/NAR/GKR416
Publisher: American Astronomical Society
Date: 10-09-2001
DOI: 10.1086/322299
Publisher: Wiley
Date: 26-04-2017
DOI: 10.1002/ACN3.409
Publisher: Springer Science and Business Media LLC
Date: 2013
DOI: 10.1186/GM500
Publisher: Cold Spring Harbor Laboratory
Date: 22-03-2017
DOI: 10.1101/119222
Abstract: Single cell RNA sequencing (scRNA-seq) has rapidly gained popularity for profiling transcriptomes of hundreds to thousands of single cells. This technology has led to the discovery of novel cell types and revealed insights into the development of complex tissues. However, many technical challenges need to be overcome during data generation. Due to minute amounts of starting material, samples undergo extensive amplification, increasing technical variability. A solution for mitigating amplification biases is to include Unique Molecular Identifiers (UMIs), which tag individual molecules. Transcript abundances are then estimated from the number of unique UMIs aligning to a specific gene and PCR duplicates resulting in copies of the UMI are not included in expression estimates. Here we investigate the effect of gene length bias in scRNA-Seq across a variety of datasets differing in terms of capture technology, library preparation, cell types and species. We find that scRNA-seq datasets that have been sequenced using a full-length transcript protocol exhibit gene length bias akin to bulk RNA-seq data. Specifically, shorter genes tend to have lower counts and a higher rate of dropout. In contrast, protocols that include UMIs do not exhibit gene length bias, and have a mostly uniform rate of dropout across genes of varying length. Across four different scRNA-Seq datasets profiling mouse embryonic stem cells (mESCs), we found the subset of genes that are only detected in the UMI datasets tended to be shorter, while the subset of genes detected only in the full-length datasets tended to be longer. We briefly discuss the role of these genes in the context of differential expression testing and GO analysis. In addition, despite clear differences between UMI and full-length transcript data, we illustrate that full-length and UMI data can be combined to reveal underlying biology influencing expression of mESCs.
Publisher: F1000 Research Ltd
Date: 07-03-2019
DOI: 10.12688/F1000RESEARCH.18276.1
Abstract: Background: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing differential expression testing directly on equivalence class read counts (ECs). Methods: Here we demonstrate that ECs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. Results: We find that EC counts have similar sensitivity and false discovery rates to exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. Conclusions: We posit that equivalence class read counts are a natural unit on which to perform many types of analysis.
Publisher: F1000 Research Ltd
Date: 29-04-2019
DOI: 10.12688/F1000RESEARCH.18276.2
Abstract: Background: RNA sequencing has enabled high-throughput and fine-grained quantitative analyses of the transcriptome. While differential gene expression is the most widely used application of this technology, RNA-seq data also has the resolution to infer differential transcript usage (DTU), which can elucidate the role of different transcript isoforms between experimental conditions, cell types or tissues. DTU has typically been inferred from exon-count data, which has issues with assigning reads unambiguously to counting bins, and requires alignment of reads to the genome. Recently, approaches have emerged that use transcript quantification estimates directly for DTU. Transcript counts can be inferred from 'pseudo' or lightweight aligners, which are significantly faster than traditional genome alignment. However, recent evaluations show lower sensitivity in DTU analysis compared to exon-level analysis. Transcript abundances are estimated from equivalence classes (ECs), which determine the transcripts that any given read is compatible with. Recent work has proposed performing a variety of RNA-seq analyses directly on equivalence class counts (ECCs). Methods: Here we demonstrate that ECCs can be used effectively with existing count-based methods for detecting DTU. We evaluate this approach on simulated human and Drosophila data, as well as on a real dataset through subset testing. Results: We find that ECCs have similar sensitivity and false discovery rates to exon-level counts but can be generated in a fraction of the time through the use of pseudo-aligners. Conclusions: We posit that equivalence class read counts are a natural unit on which to perform differential transcript usage analysis.
Publisher: Informa UK Limited
Date: 30-11-2021
DOI: 10.1080/14767058.2021.2005573
Abstract: To address the disproportionate burden of preterm birth (PTB) in low- and middle-income countries, this study aimed to (1) verify the performance of the United States-validated spontaneous PTB (sPTB) predictor, comprised of the IBP4/SHBG protein ratio, in subjects from Bangladesh, Pakistan and Tanzania enrolled in the Alliance for Maternal and Newborn Health Improvement (AMANHI) biorepository study, and (2) discover biomarkers that improve performance of IBP4/SHBG in the AMANHI cohort. The performance of the IBP4/SHBG biomarker was first evaluated in a nested case control validation study, then utilized in a follow-on discovery study performed on the same samples. Levels of serum proteins were measured by targeted mass spectrometry. Differences between the AMANHI and U.S. cohorts were adjusted using body mass index (BMI) and gestational age (GA) at blood draw as covariates. Prediction of sPTB < 37 weeks and < 34 weeks was assessed by area under the receiver operator curve (AUC). In the discovery phase, an artificial intelligence method selected additional protein biomarkers complementary to IBP4/SHBG in the AMANHI cohort. The IBP4/SHBG biomarker significantly predicted sPTB < 37 weeks ( A protein biomarker pair developed in the U.S. may have broader application in diverse non-U.S. populations.
Start Date: 07-2011
End Date: 12-2019
Amount: $21,000,000.00
Funder: Australian Research Council
View Funded Activity