ORCID Profile
0000-0003-3861-0472
Current Organisation
Garvan Institute of Medical Research
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Developmental Genetics (incl. Sex Determination) | Evolution of Developmental Systems | Genetics
Expanding Knowledge in the Environmental Sciences | Effects of Climate Change and Variability on Australia (excl. Social Impacts) | Expanding Knowledge in the Biological Sciences
Publisher: Wiley
Date: 14-05-2022
DOI: 10.1002/JMV.27832
Publisher: Proceedings of the National Academy of Sciences
Date: 24-01-2022
Abstract: Reptiles have an extraordinary variety of mechanisms to determine sex. The best candidate sex-determining gene in our model reptile (the Australian central bearded dragon) is the key vertebrate sex gene nr5a1 (coding for the steroidogenic factor 1). There are no sex-specific sequence differences between nr5a1 alleles on the sex chromosomes, but the Z- and W-borne alleles are transcribed into remarkably different alternative transcripts. We propose that altered configuration of the repeat-laden W chromosome affects the conformation of the primary transcript to generate more diverse and potentially inhibitory W-borne isoforms that suppress testis determination. This is a mechanism for vertebrate sex determination, in which epigenetic control regulates the action of a gene present on both sex chromosomes.
Publisher: Research Square Platform LLC
Date: 14-04-2022
DOI: 10.21203/RS.3.RS-1495587/V1
Abstract: Chimeric antigen receptor (CAR) T cells have demonstrable efficacy in treating B-cell malignancies. Factors such as product composition, lymphodepletion and immune reconstitution are known to influence functional persistence of CAR+ T cells. However, little is known about the determinants of differentiation and phenotypic plasticity of CAR+ T and immune cells early post-infusion. We report single cell multi-omics analysis of molecular, clonal, and phenotypic profiles of CAR+ T and other immune cells circulating in patients receiving donor-derived products. We used these data to reconstruct a differentiation trajectory, which explained the observed phenotypic plasticity and identified cell fate of CAR+ and CAR- T cells. Following lympho-depletion, endogenous CAR- CD8+ and γδ T cells clonally expand and differentiate across heterogeneous phenotypes, from a dominant resting or proliferating state into precursors of exhausted T cells, and notably into a terminal NK-like phenotype. In parallel, following infusion, CAR+ T cells undergo a similar differentiation trajectory, showing increased proliferation, metabolic activity and exhaustion when compared to circulating CAR- T cells. The subset of NK-like CAR+ T cells was associated with increasing levels of circulating proinflammatory cytokines, including innate-like IL-12 and IL-18. These results demonstrate that differentiation and phenotype of CAR+ T cells are determined by non-CAR induced signals that are shared with endogenous T cells, and condition the patients' immune-recovery.
Publisher: MDPI AG
Date: 19-01-2022
DOI: 10.3390/V14020185
Abstract: Whole-genome sequencing of viral isolates is critical for informing transmission patterns and for the ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. Identified differences were predominantly driven by ambiguous site content. Ignoring these produced differences in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons, and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if variant allele frequency thresholds for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practices approach to bioinformatics between laboratories working on a common outbreak problem.
Publisher: Elsevier BV
Date: 08-2022
Publisher: Oxford University Press (OUP)
Date: 2023
DOI: 10.1093/BRAINCOMMS/FCAD208
Abstract: Cerebellar ataxia, neuropathy and vestibular areflexia syndrome is a progressive, generally late-onset, neurological disorder associated with biallelic pentanucleotide expansions in Intron 2 of the RFC1 gene. The locus exhibits substantial genetic variability, with multiple pathogenic and benign pentanucleotide repeat alleles previously identified. To determine the contribution of pathogenic RFC1 expansions to neurological disease within an Australasian cohort and further investigate the heterogeneity exhibited at the locus, a combination of flanking and repeat-primed PCR was used to screen a cohort of 242 Australasian patients with neurological disease. Patients whose data indicated large gaps within expanded alleles following repeat-primed PCR underwent targeted long-read sequencing to identify novel repeat motifs at the locus. To increase diagnostic yield, additional probes at the RFC1 repeat region were incorporated into the PathWest diagnostic laboratory targeted neurological disease gene panel to enable first-pass screening of the locus for all samples tested on the panel. Within the Australasian cohort, we detected known pathogenic biallelic expansions in 15.3% (n = 37) of patients. Thirty indicated biallelic AAGGG expansions, two had biallelic "Māori alleles" [(AAAGG)exp(AAGGG)exp], two samples were compound heterozygous for the Māori allele and an AAGGG expansion, two samples had biallelic ACAGG expansions and one sample was compound heterozygous for the ACAGG and AAGGG expansions. Forty-five samples tested indicated the presence of biallelic expansions not known to be pathogenic. A large proportion (84%) showed complex interrupted patterns following repeat-primed PCR, suggesting that these expansions are likely to be comprised of more than one repeat motif, including previously unknown repeats. Using targeted long-read sequencing, we identified three novel repeat motifs in expanded alleles.
Here, we also show that short-read sequencing can be used to reliably screen for the presence or absence of biallelic RFC1 expansions in all samples tested using the PathWest targeted neurological disease gene panel. Our results show that RFC1 pathogenic expansions make a substantial contribution to neurological disease in the Australasian population and further extend the heterogeneity of the locus. To accommodate the increased complexity, we outline a multi-step workflow utilizing both targeted short- and long-read sequencing to achieve a definitive genotype and provide accurate diagnoses for patients.
Publisher: Springer Science and Business Media LLC
Date: 13-04-2022
DOI: 10.1038/S41597-022-01276-8
Abstract: Recently we reported the accuracy and reproducibility of circulating tumor DNA (ctDNA) assays using a unique set of reference materials, associated analytical framework, and suggested best practices. With the rapid adoption of ctDNA sequencing in precision oncology, it is critical to understand the analytical validity and technical limitations of this cutting-edge and medical-practice-changing technology. The SEQC2 Oncopanel Sequencing Working Group has developed a multi-site, cross-platform study design for evaluating the analytical performance of five industry-leading ctDNA assays. The study used tailor-made reference samples at various levels of input material to assess ctDNA sequencing across 12 participating clinical and research facilities. The generated dataset encompasses multiple key variables, including a broad range of mutation frequencies, sequencing coverage depth, DNA input quantity, etc. It is the most comprehensive public-facing dataset of its kind and provides valuable insights into ultra-deep ctDNA sequencing technology. Eventually the clinical utility of ctDNA assays is required and our proficiency study and corresponding dataset are needed steps towards this goal.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 02-06-2017
Abstract: Alternative splicing in chromatin-modifying genes is associated with temperature-dependent sex in divergent reptile lineages.
Publisher: Cold Spring Harbor Laboratory
Date: 22-04-2021
DOI: 10.1101/2021.04.21.440861
Abstract: InterARTIC is an interactive web application for the analysis of viral whole-genome sequencing (WGS) data generated on Oxford Nanopore Technologies (ONT) devices. A graphical interface enables users with no bioinformatics expertise to analyse WGS experiments and reconstruct consensus genome sequences from individual isolates of viruses, such as SARS-CoV-2. InterARTIC is intended to facilitate widespread adoption and standardisation of ONT sequencing for viral surveillance and molecular epidemiology. We demonstrate the use of InterARTIC for the analysis of ONT viral WGS data from SARS-CoV-2 and Ebola virus, using a laptop computer or the internal computer on an ONT GridION sequencing device. We showcase the intuitive graphical interface, workflow customisation capabilities and job-scheduling system that facilitate execution of small- and large-scale WGS projects on any common virus. InterARTIC is a free, open-source web application implemented in Python. The application can be downloaded as a set of pre-compiled binaries that are compatible with all common Ubuntu distributions, or built from source. For further details please visit: github.com/Psy-Fer/interARTIC/ .
Publisher: Cold Spring Harbor Laboratory
Date: 26-02-2021
DOI: 10.1101/2021.02.17.21251943
Abstract: Australia's early COVID-19 experience involved clusters in northern Sydney, including hospital and aged-care facility (ACF) outbreaks. We explore transmission dynamics, drivers and outcomes of a metropolitan hospital COVID-19 outbreak that occurred in the context of established local community transmission. A retrospective cohort analysis is presented, with integration of viral genome sequencing, clinical and epidemiological data. We demonstrate using genomic epidemiology that the hospital outbreak (n=23) was linked to a concurrent outbreak at a local aged care facility, but was phylogenetically distinct from other community clusters. Thirty day survival was 50% for hospitalised patients (an elderly cohort with significant comorbidities) and 100% for staff. Staff who acquired infection were unable to attend work for a median of 26.5 days (range 14-191); an additional 140 staff were furloughed for quarantine. Transmission from index cases showed a wide dispersion (mean 3.5 persons infected for every patient case and 0.6 persons infected for every staff case). One patient, who received regular nebulised medication prior to their diagnosis being known, acted as an apparent superspreader. No secondary transmissions occurred from isolated cases or contacts who were quarantined prior to becoming infectious. This analysis elaborates the wide-ranging impacts on patients and staff of nosocomial COVID-19 transmission and highlights the utility of genomic analysis as an adjunct to traditional epidemiological investigations. Delayed case recognition resulted in nosocomial transmission but once recognised, prompt action by the outbreak management team and isolation with contact and droplet (without airborne) precautions were sufficient to prevent transmission within this cohort. Our findings support current PPE recommendations in Australia but demonstrate the risk of administering nebulised medications when COVID-19 is circulating locally.
Publisher: Cold Spring Harbor Laboratory
Date: 04-08-2020
DOI: 10.1101/2020.08.04.236893
Abstract: Viral whole-genome sequencing (WGS) provides critical insight into the transmission and evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Long-read sequencing devices from Oxford Nanopore Technologies (ONT) promise significant improvements in turnaround time, portability and cost, compared to established short-read sequencing platforms for viral WGS (e.g., Illumina). However, adoption of ONT sequencing for SARS-CoV-2 surveillance has been limited due to common concerns around sequencing accuracy. To address this, we performed viral WGS with ONT and Illumina platforms on 157 matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, enabling rigorous evaluation of analytical performance. Despite the elevated error rates observed in ONT sequencing reads, highly accurate consensus-level sequence determination was achieved, with single nucleotide variants (SNVs) detected at % sensitivity and % precision above a minimum ~60-fold coverage depth, thereby ensuring suitability for SARS-CoV-2 genome analysis. ONT sequencing also identified a surprising diversity of structural variation within SARS-CoV-2 specimens that were supported by evidence from short-read sequencing on matched samples. However, ONT sequencing failed to accurately detect short indels and variants at low read-count frequencies. This systematic evaluation of analytical performance for SARS-CoV-2 WGS will facilitate widespread adoption of ONT sequencing within local, national and international COVID-19 public health initiatives.
Publisher: Springer Science and Business Media LLC
Date: 09-12-2020
DOI: 10.1038/S41467-020-20075-6
Abstract: Viral whole-genome sequencing (WGS) provides critical insight into the transmission and evolution of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Long-read sequencing devices from Oxford Nanopore Technologies (ONT) promise significant improvements in turnaround time, portability and cost, compared to established short-read sequencing platforms for viral WGS (e.g., Illumina). However, adoption of ONT sequencing for SARS-CoV-2 surveillance has been limited due to common concerns around sequencing accuracy. To address this, here we perform viral WGS with ONT and Illumina platforms on 157 matched SARS-CoV-2-positive patient specimens and synthetic RNA controls, enabling rigorous evaluation of analytical performance. We report that, despite the elevated error rates observed in ONT sequencing reads, highly accurate consensus-level sequence determination was achieved, with single nucleotide variants (SNVs) detected at % sensitivity and % precision above a minimum ~60-fold coverage depth, thereby ensuring suitability for SARS-CoV-2 genome analysis. ONT sequencing also identified a surprising diversity of structural variation within SARS-CoV-2 specimens that were supported by evidence from short-read sequencing on matched samples. However, ONT sequencing failed to accurately detect short indels and variants at low read-count frequencies. This systematic evaluation of analytical performance for SARS-CoV-2 WGS will facilitate widespread adoption of ONT sequencing within local, national and international COVID-19 public health initiatives.
Publisher: Cold Spring Harbor Laboratory
Date: 20-06-2022
DOI: 10.1101/2022.06.19.496732
Abstract: Nanopore sequencing is an emerging technology that is being rapidly adopted in research and clinical genomics. We recently developed SLOW5, a new file format for storage and analysis of raw data from nanopore sequencing experiments. SLOW5 is a community-centric, open source format that offers considerable performance benefits over the existing nanopore data format, known as FAST5. Here we introduce slow5tools, a simple, intuitive toolkit for handling nanopore raw signal data in SLOW5 format. Slow5tools enables lossless FAST5-to-SLOW5 and SLOW5-to-FAST5 data conversion, and a range of tools for structuring, indexing, viewing and querying SLOW5 files. Slow5tools uses multi-threading, multi-processing and other engineering strategies to achieve fast data conversion and manipulation, including live FAST5-to-SLOW5 conversion during sequencing. We outline a series of examples and benchmarking experiments to illustrate slow5tools usage, and describe the engineering principles underpinning its high performance. Slow5tools is an essential toolkit for handling nanopore signal data, which was developed to support adoption of SLOW5 by the nanopore community. Slow5tools is written in C/C++ with minimal dependencies and is freely available as an open-source program under an MIT licence: asindu2008/slow5tools .
Publisher: Cold Spring Harbor Laboratory
Date: 19-10-2023
Publisher: Wiley
Date: 05-03-2022
DOI: 10.1111/JNS.12485
Abstract: Biallelic mutations in sorbitol dehydrogenase (SORD) have been recently identified as a common cause of recessive axonal Charcot-Marie-Tooth neuropathy (CMT2). We aimed to assess a novel long-read sequencing approach to overcome current limitations in SORD neuropathy diagnostics due to the SORD2P pseudogene and the phasing of biallelic mutations in recessive disease. We conducted a screen of our Australian whole exome sequencing (WES) CMT cohort to identify individuals with homozygous or compound heterozygous SORD variants. Individuals detected with SORD mutations then underwent long-read sequencing, clinical assessment, and serum sorbitol analysis. An individual was detected with compound heterozygous truncating mutations in SORD exon 7, NM_003104.5:c.625C>T (p.Arg209Ter) and NM_003104.5:c.757del (p.Ala253GlnfsTer27). Subsequent Oxford Nanopore Tech (ONT) long-read sequencing was used to successfully differentiate SORD from the highly homologous non-functional SORD2P pseudogene and confirmed that the mutations were biallelic through haplotype-resolved analysis. The patient presented with axonal sensorimotor polyneuropathy (CMT2) and ulnar neuropathy without compression at the elbow. Burning neuropathic pain in the forearms and feet was also reported and was exacerbated by alcohol consumption and improved with alcohol cessation. UPLC-tandem mass spectrometry confirmed that the patient had elevated serum sorbitol levels (12.0 mg/L) consistent with levels previously observed in patients with biallelic SORD mutations. This represents a novel clinical presentation and expands the phenotype associated with biallelic SORD mutations causing CMT2. Our study is the first report of long-read sequencing for an individual with CMT and demonstrates the utility of this approach for clinical genomics.
Publisher: Cold Spring Harbor Laboratory
Date: 06-04-2021
DOI: 10.1101/2021.04.06.438497
Abstract: The primary objective of the FDA-led Sequencing and Quality Control Phase 2 (SEQC2) project is to develop standard analysis protocols and quality control metrics for use in DNA testing to enhance scientific research and precision medicine. This study reports a targeted next generation sequencing (NGS) method that enables more accurate detection of actionable mutations in circulating tumor DNA (ctDNA) clinical specimens. This advancement was enabled by designing a synthetic internal standard spike-in for each actionable mutation target, suitable for use in NGS following hybrid-capture enrichment and unique molecular index (UMI) or non-UMI library preparation. When mixed with contrived ctDNA reference samples, internal standards enabled calculation of technical error rate, limit of blank, and limit of detection for each variant at each nucleotide position, in each sample. True positive mutations with variant allele fraction too low for detection by current practice were detected with this method, thereby increasing sensitivity.
Publisher: Cold Spring Harbor Laboratory
Date: 30-06-2021
DOI: 10.1101/2021.06.29.450255
Abstract: Nanopore sequencing is an emerging genomic technology with great potential. However, the storage and analysis of nanopore sequencing data have become major bottlenecks preventing more widespread adoption in research and clinical genomics. Here, we elucidate an inherent limitation in the file format used to store raw nanopore data (known as FAST5) that prevents efficient analysis on high-performance computing (HPC) systems. To overcome this, we have developed SLOW5, an alternative file format that permits efficient parallelisation and, thereby, acceleration of nanopore data analysis. For example, we show that using SLOW5 format, instead of FAST5, reduces the time and cost of genome-wide DNA methylation profiling by an order of magnitude on common HPC systems, and delivers consistent improvements on a wide range of different architectures. With a simple, accessible file structure and a ~25% reduction in size compared to FAST5, SLOW5 format will deliver substantial benefits to all areas of the nanopore community.
Publisher: Cold Spring Harbor Laboratory
Date: 10-2021
DOI: 10.1101/2021.09.27.21263187
Abstract: Short-tandem repeat (STR) expansions are an important class of pathogenic genetic variants. Over forty neurological and neuromuscular diseases are caused by STR expansions, with 37 different genes implicated to date. Here we describe the use of programmable targeted long-read sequencing with Oxford Nanopore's ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single, simple assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of expanded and non-expanded STR sites. In doing so, the assay correctly diagnoses all individuals in a cohort of patients (n = 27) with various neurogenetic diseases, including Huntington's disease, fragile X syndrome and cerebellar ataxia (CANVAS) and others. Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing, and identifies non-canonical STR motif conformations and internal sequence interruptions. Even in our relatively small cohort, we observe a wide diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of STR expansion disorders. Finally, we show how the flexible inclusion of pharmacogenomics (PGx) genes as secondary ReadUntil targets can identify clinically actionable PGx genotypes to further inform patient care, at no extra cost. Our study addresses the need for improved techniques for genetic diagnosis of STR expansion disorders and illustrates the broad utility of programmable long-read sequencing for clinical genomics. This study describes the development and validation of a programmable targeted nanopore sequencing assay for parallel genetic diagnosis of all known pathogenic short-tandem repeats (STRs) in a single, simple test.
Publisher: Research Square Platform LLC
Date: 13-07-2021
DOI: 10.21203/RS.3.RS-668517/V1
Abstract: Nanopore sequencing is an emerging genomic technology with great potential. However, the storage and analysis of nanopore sequencing data have become major bottlenecks preventing more widespread adoption in research and clinical genomics. Here, we elucidate an inherent limitation in the file format used to store raw nanopore data (known as FAST5) that prevents efficient analysis on high-performance computing (HPC) systems. To overcome this we have developed SLOW5, an alternative file format that permits efficient parallelisation and, thereby, acceleration of nanopore data analysis. For example, we show that using SLOW5 format, instead of FAST5, reduces the time and cost of genome-wide DNA methylation profiling by an order of magnitude on common HPC systems, and delivers consistent improvements on a wide range of different architectures. With a simple, accessible file structure and a ~25% reduction in size compared to FAST5, SLOW5 format will deliver substantial benefits to all areas of the nanopore community.
Publisher: Cold Spring Harbor Laboratory
Date: 23-07-2022
DOI: 10.1101/2022.07.22.501046
Abstract: We have designed, constructed, and debugged a synthetic 753,096 bp version of Saccharomyces cerevisiae chromosome XIV as part of the international Sc2.0 project. We showed that certain synthetic loxPsym recombination sites can interfere with mitochondrial protein localization, that the deletion of one intron (NOG2) reduced fitness, and that a reassigned stop codon can lead to a growth defect. In parallel to these rational debugging modifications, we used Adaptive Laboratory Evolution to generate a general growth defect suppressor rearrangement in the form of increased TAR1 copy number. We also extended the utility of the Synthetic Chromosome Recombination and Modification by LoxP-mediated Evolution (SCRaMbLE) system by engineering synthetic-wild-type tetraploid hybrid strains that buffer against essential gene loss. The presence of wild-type chromosomes in the hybrid tetraploids increased post-SCRaMbLE viability and heterologous DNA integration, highlighting the plasticity of the S. cerevisiae genome in the presence of rational and non-rational modifications.
Publisher: Springer Science and Business Media LLC
Date: 22-03-2019
DOI: 10.1038/S41467-019-09272-0
Abstract: Chirality is a property describing any object that is inequivalent to its mirror image. Due to its 5′-3′ directionality, a DNA sequence is distinct from a mirrored sequence arranged in reverse nucleotide-order, and is therefore chiral. A given sequence and its opposing chiral partner sequence share many properties, such as nucleotide composition and sequence entropy. Here we demonstrate that chiral DNA sequence pairs also perform equivalently during molecular and bioinformatic techniques that underpin genetic analysis, including PCR amplification, hybridization, whole-genome, target-enriched and nanopore sequencing, sequence alignment and variant detection. Given these shared properties, synthetic DNA sequences mirroring clinically relevant or analytically challenging regions of the human genome are ideal controls for clinical genomics. The addition of synthetic chiral sequences (sequins) to patient tumor samples can prevent false-positive and false-negative mutation detection to improve diagnosis. Accordingly, we propose that sequins can fulfill the need for commutable internal controls in precision medicine.
Publisher: Cold Spring Harbor Laboratory
Date: 25-11-2022
DOI: 10.1101/2021.11.25.468910
Abstract: Our understanding of the molecular pathology of posttraumatic stress disorder (PTSD) is rapidly evolving and is being driven by advances in sequencing techniques. Conventional short-read RNA sequencing (RNA-seq) is a central tool in transcriptomics research that enables unbiased gene expression profiling. With the recent emergence of Oxford Nanopore direct RNA-seq (dRNA-seq), it is now also possible to interrogate diverse RNA modifications, collectively known as the "epitranscriptome". Here, we present our analyses of the male and female mouse amygdala transcriptome and epitranscriptome, obtained using parallel Illumina RNA-seq and Oxford Nanopore dRNA-seq, associated with the acquisition of PTSD-like fear induced by Pavlovian cued-fear conditioning. We report significant sex-specific differences in the amygdala transcriptional response during fear acquisition, and a range of shared and dimorphic epitranscriptomic signatures. Differential RNA modifications are enriched among mRNA transcripts associated with neurotransmitter regulation and mitochondrial function, many of which have been previously implicated in PTSD. Very few differentially modified transcripts are also differentially expressed, suggesting an influential, expression-independent role for epitranscriptional regulation in PTSD-like fear-acquisition. Overall, our application of conventional and newly developed methods provides a platform for future work that will lead to new insights into and therapeutics for PTSD.
Publisher: Springer Science and Business Media LLC
Date: 08-04-2020
DOI: 10.1038/S41467-020-15697-9
Abstract: An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Publisher: Springer Science and Business Media LLC
Date: 27-03-2019
DOI: 10.1038/S41467-019-09374-9
Abstract: Fusion genes are a major cause of cancer. Their rapid and accurate diagnosis can inform clinical action, but current molecular diagnostic assays are restricted in resolution and throughput. Here, we show that targeted RNA sequencing (RNAseq) can overcome these limitations. First, we establish that fusion gene detection with targeted RNAseq is both sensitive and quantitative by optimising laboratory and bioinformatic variables using spike-in standards and cell lines. Next, we analyse a clinical patient cohort and improve the overall fusion gene diagnostic rate from 63% with conventional approaches to 76% with targeted RNAseq while demonstrating high concordance for patient samples with previous diagnoses. Finally, we show that targeted RNAseq offers additional advantages by simultaneously measuring gene expression levels and profiling the immune-receptor repertoire. We anticipate that targeted RNAseq will improve clinical fusion gene detection, and its increasing use will provide a deeper understanding of fusion gene biology.
Publisher: Cold Spring Harbor Laboratory
Date: 31-08-2018
DOI: 10.1101/404285
Abstract: Chirality is a geometric property describing any object that is inequivalent to a mirror image of itself. Due to its 5′-3′ directionality, a DNA sequence is distinct from a mirrored sequence arranged in reverse nucleotide order, and is therefore chiral. A given sequence and its opposing chiral partner sequence share many properties, such as nucleotide composition and sequence entropy. Here we demonstrate that chiral DNA sequence pairs also perform equivalently during molecular and bioinformatic techniques that underpin modern genetic analysis, including PCR amplification, hybridization, whole-genome, target-enriched and nanopore sequencing, sequence alignment and variant detection. Given these shared properties, synthetic DNA sequences that directly mirror clinically relevant and/or analytically challenging regions of the human genome are ideal reference standards for clinical genomics. We show how the addition of chiral DNA standards to patient tumor samples can prevent false-positive and false-negative mutation detection and, thereby, improve diagnosis. Accordingly, we propose that chiral DNA standards can fulfill the unmet need for commutable internal reference standards in precision medicine.
Publisher: Cold Spring Harbor Laboratory
Date: 22-08-2021
DOI: 10.1101/2021.08.19.21262296
Abstract: Whole-genome sequencing of viral isolates is critical for informing transmission patterns and ongoing evolution of pathogens, especially during a pandemic. However, when genomes have low variability in the early stages of a pandemic, the impact of technical and/or sequencing errors increases. We quantitatively assessed inter-laboratory differences in consensus genome assemblies of 72 matched SARS-CoV-2-positive specimens sequenced at different laboratories in Sydney, Australia. Raw sequence data were assembled using two different bioinformatics pipelines in parallel, and resulting consensus genomes were compared to detect laboratory-specific differences. Matched genome sequences were predominantly concordant, with a median pairwise identity of 99.997%. Identified differences were predominantly driven by ambiguous site content. Ignoring these produced differences in only 2.3% (5/216) of pairwise comparisons, each differing by a single nucleotide. Matched samples were assigned the same Pango lineage in 98.2% (212/216) of pairwise comparisons, and were mostly assigned to the same phylogenetic clade. However, epidemiological inference based only on single nucleotide variant distances may lead to significant differences in the number of defined clusters if variant allele frequency thresholds for consensus genome generation differ between laboratories. These results underscore the need for a unified, best-practices approach to bioinformatics between laboratories working on a common outbreak problem.
Publisher: Springer Science and Business Media LLC
Date: 06-08-2018
DOI: 10.1038/S41467-018-05555-0
Abstract: The complexity of microbial communities, combined with technical biases in next-generation sequencing, pose a challenge to metagenomic analysis. Here, we develop a set of internal DNA standards, termed 'sequins' (sequencing spike-ins), that together constitute a synthetic community of artificial microbial genomes. Sequins are added to environmental DNA samples prior to library preparation, and undergo concurrent sequencing with the accompanying sample. We validate the performance of sequins by comparison to mock microbial communities, and demonstrate their use in the analysis of real metagenome samples. We show how sequins can be used to measure fold change differences in the size and structure of accompanying microbial communities, and perform quantitative normalization between samples. We further illustrate how sequins can be used to benchmark and optimize new methods, including nanopore long-read sequencing technology. We provide metagenome sequins, along with associated data sets, protocols, and an accompanying software toolkit, as reference standards to aid in metagenomic studies.
Publisher: Springer Science and Business Media LLC
Date: 25-05-2021
DOI: 10.1186/S40478-021-01201-X
Abstract: Short tandem repeat (STR) expansion disorders are an important cause of human neurological disease. They have an established role in more than 40 different phenotypes including the myotonic dystrophies, Fragile X syndrome, Huntington's disease, the hereditary cerebellar ataxias, amyotrophic lateral sclerosis and frontotemporal dementia. STR expansions are difficult to detect and may explain unsolved diseases, as highlighted by recent findings including: the discovery of a biallelic intronic 'AAGGG' repeat in RFC1 as the cause of cerebellar ataxia, neuropathy, and vestibular areflexia syndrome (CANVAS) and the finding of 'CGG' repeat expansions in NOTCH2NLC as the cause of neuronal intranuclear inclusion disease and a range of clinical phenotypes. However, established laboratory techniques for diagnosis of repeat expansions (repeat-primed PCR and Southern blot) are cumbersome, low-throughput and poorly suited to parallel analysis of multiple gene regions. While next generation sequencing (NGS) has been increasingly used, established short-read NGS platforms (e.g., Illumina) are unable to genotype large and/or complex repeat expansions. Long-read sequencing platforms recently developed by Oxford Nanopore Technology and Pacific Biosciences promise to overcome these limitations to deliver enhanced diagnosis of repeat expansion disorders in a rapid and cost-effective fashion. We anticipate that long-read sequencing will rapidly transform the detection of short tandem repeat expansion disorders for both clinical diagnosis and gene discovery.
Publisher: Springer Science and Business Media LLC
Date: 29-09-2020
DOI: 10.1038/S42003-020-01270-Z
Abstract: The advent of portable nanopore sequencing devices has enabled DNA and RNA sequencing to be performed in the field or the clinic. However, advances in in situ genomics require parallel development of portable, offline solutions for the computational analysis of sequencing data. Here we introduce Genopo, a mobile toolkit for nanopore sequencing analysis. Genopo compacts popular bioinformatics tools to an Android application, enabling fully portable computation. To demonstrate its utility for in situ genome analysis, we use Genopo to determine the complete genome sequence of the human coronavirus SARS-CoV-2 in nine patient isolates sequenced on a nanopore device, with Genopo executing this workflow in less than 30 min per sample on a range of popular smartphones. We further show how Genopo can be used to profile DNA methylation in a human genome sample, illustrating a flexible, efficient architecture that is suitable to run many popular bioinformatics tools and accommodate small or large genomes. As the first ever smartphone application for nanopore sequencing analysis, Genopo enables the genomics community to harness this cheap, ubiquitous computational resource.
Publisher: Springer Science and Business Media LLC
Date: 12-01-2022
DOI: 10.1186/S13059-021-02579-6
Abstract: Next-generation sequencing (NGS) can identify mutations in the human genome that cause disease and has been widely adopted in clinical diagnosis. However, the human genome contains many polymorphic, low-complexity, and repetitive regions that are difficult to sequence and analyze. Despite their difficulty, these regions include many clinically important sequences that can inform the treatment of human diseases and improve the diagnostic yield of NGS. To evaluate the accuracy by which these difficult regions are analyzed with NGS, we built an in silico decoy chromosome, along with corresponding synthetic DNA reference controls, that encode difficult and clinically important human genome regions, including repeats, microsatellites, HLA genes, and immune receptors. These controls provide a known ground-truth reference against which to measure the performance of diverse sequencing technologies, reagents, and bioinformatic tools. Using this approach, we provide a comprehensive evaluation of short- and long-read sequencing instruments, library preparation methods, and software tools and identify the errors and systematic bias that confound our resolution of these remaining difficult regions. This study provides an analytical validation of diagnosis using NGS in difficult regions of the human genome and highlights the challenges that remain to resolve these difficult regions.
Publisher: Public Library of Science (PLoS)
Date: 15-04-2021
DOI: 10.1371/JOURNAL.PGEN.1009465
Abstract: How temperature determines sex remains unknown. A recent hypothesis proposes that conserved cellular mechanisms (calcium and redox 'CaRe' status) sense temperature and identify genes and regulatory pathways likely to be involved in driving sexual development. We take advantage of the unique sex determining system of the model organism, Pogona vitticeps, to assess predictions of this hypothesis. P. vitticeps has ZZ male: ZW female sex chromosomes whose influence can be overridden in genetic males by high temperatures, causing male-to-female sex reversal. We compare a developmental transcriptome series of ZWf females and temperature sex reversed ZZf females. We demonstrate that early developmental cascades differ dramatically between genetically driven and thermally driven females, later converging to produce a common outcome (ovaries). We show that genes proposed as regulators of thermosensitive sex determination play a role in temperature sex reversal. Our study greatly advances the search for the mechanisms by which temperature determines sex.
Publisher: Elsevier BV
Date: 07-2017
DOI: 10.1016/J.TIG.2017.04.004
Abstract: The combination of pervasive transcription and prolific alternative splicing produces a mammalian transcriptome of great breadth and diversity. The majority of transcribed genomic bases are intronic, antisense, or intergenic to protein-coding genes, yielding a plethora of short and long non-protein-coding regulatory RNAs. Long noncoding RNAs (lncRNAs) share most aspects of their biogenesis, processing, and regulation with mRNAs. However, lncRNAs are typically expressed in more restricted patterns, frequently from enhancers, and exhibit almost universal alternative splicing. These features are consistent with their role as modular epigenetic regulators. We describe here the key studies and technological advances that have shaped our understanding of the dimensions, dynamics, and biological relevance of the mammalian noncoding transcriptome.
Publisher: Cold Spring Harbor Laboratory
Date: 26-08-2023
DOI: 10.1101/2023.08.24.554710
Abstract: Studies of sex chromosome dosage compensation have historically focussed on therian mammals which have a conserved XY sex determination system. In contrast, lizards have sex determination systems that can differ between even closely related species that include XY and ZW systems and thermolabile systems where genetics and temperature interact to various degrees to determine sex. The eastern three-lined skink (Bassiana duperreyi) has a differentiated XY sex determination system, in which low temperature incubation during development can cause female to male sex reversal, producing XX males. This provides a unique opportunity to investigate how genotype and phenotype affect dosage compensation. We generated transcriptomes from brain and heart tissue of normal adult males and females, along with brain tissue of sex-reversed XX males. We observed partial dosage compensation between XX females and XY males in both brain and heart, with median gene expression from the X in normal males being 0.7 times that of normal females. Surprisingly, in brain of sex reversed XX males the median X chromosome output did not match that of either normal males or females, but instead was 0.89 times that of the normal XX female level. This suggests that not just genotype, but also sexual phenotype, influences gene dosage of the X chromosome. This has profound implications for our understanding of the evolution of dosage compensation.
Publisher: Elsevier BV
Date: 09-2021
DOI: 10.1016/J.CELREP.2021.109722
Abstract: DNA replication timing and three-dimensional (3D) genome organization are associated with distinct epigenome patterns across large domains. However, whether alterations in the epigenome, in particular cancer-related DNA hypomethylation, affect higher-order levels of genome architecture is still unclear. Here, using Repli-Seq, single-cell Repli-Seq, and Hi-C, we show that genome-wide methylation loss is associated with both concordant loss of replication timing precision and deregulation of 3D genome organization. Notably, we find distinct disruption in 3D genome compartmentalization, striking gains in cell-to-cell replication timing heterogeneity and loss of allelic replication timing in cancer hypomethylation models, potentially through the gene deregulation of DNA replication and genome organization pathways. Finally, we identify ectopic H3K4me3-H3K9me3 domains from across large hypomethylated domains, where late replication is maintained, which we purport serves to protect against catastrophic genome reorganization and aberrant gene transcription. Our results highlight a potential role for the methylome in the maintenance of 3D genome regulation.
Publisher: Frontiers Media SA
Date: 2013
Publisher: Cold Spring Harbor Laboratory
Date: 10-05-2023
DOI: 10.1101/2023.05.09.539953
Abstract: In silico simulation of next-generation sequencing data is a technique used widely in the genomics field. However, there is currently a lack of optimal tools for creating simulated data from 'third-generation' nanopore sequencing devices, which measure DNA or RNA molecules in the form of time-series current signal data. Here, we introduce Squigulator, a fast and simple tool for simulation of realistic nanopore signal data. Squigulator takes a reference genome, transcriptome or read sequences and generates corresponding raw nanopore signal data. This is compatible with basecalling software from Oxford Nanopore Technologies (ONT) and other third-party tools, thereby providing a useful substrate for testing, debugging, validation and optimisation of nanopore analysis methods. The user may generate noise-free 'ideal' data, realistic data with noise profiles emulating specific ONT protocols, or they may deterministically modify noise parameters and other variables to shape the data to their needs. To highlight its utility, we use Squigulator to model the degree to which different types of noise impact the accuracy of ONT basecalling and downstream variant detection, revealing new insights into the properties of ONT data. We provide Squigulator as an open-source tool for the nanopore community: asindu2008/squigulator
Publisher: American Association for the Advancement of Science (AAAS)
Date: 22-04-2022
Abstract: Sex determination and differentiation in reptiles are complex. In the model species, Pogona vitticeps, high incubation temperature can cause male to female sex reversal. To elucidate the epigenetic mechanisms of thermolabile sex, we used an unbiased genome-wide assessment of intron retention during sex reversal. The previously implicated chromatin modifiers (jarid2 and kdm6b) were two of three genes to display sex reversal-specific intron retention. In these species, embryonic intron retention resulting in C-terminally truncated jarid2 and kdm6b isoforms consistently occurs at low temperatures. High-temperature sex reversal is uniquely characterized by a high prevalence of N-terminally truncated isoforms of jarid2 and kdm6b, which are not present at low temperatures, or in two other reptiles with temperature-dependent sex determination. This work verifies that chromatin-modifying genes are involved in highly conserved temperature responses and can also be transcribed into isoforms with new sex-determining roles.
Publisher: Elsevier BV
Date: 08-2021
Publisher: Elsevier BV
Date: 02-2018
DOI: 10.1016/J.CELS.2017.12.005
Abstract: The human transcriptome is so large, diverse, and dynamic that, even after a decade of investigation by RNA sequencing (RNA-seq), we have yet to resolve its true dimensions. RNA-seq suffers from an expression-dependent bias that impedes characterization of low-abundance transcripts. We performed targeted single-molecule and short-read RNA-seq to survey the transcriptional landscape of a single human chromosome (Hsa21) at unprecedented resolution. Our analysis reaches the lower limits of the transcriptome, identifying a fundamental distinction between protein-coding and noncoding gene content: almost every noncoding exon undergoes alternative splicing, producing a seemingly limitless variety of isoforms. Analysis of syntenic regions of the mouse genome shows that few noncoding exons are shared between human and mouse, yet human splicing profiles are recapitulated on Hsa21 in mouse cells, indicative of regulation by a deeply conserved splicing code. We propose that noncoding exons are functionally modular, with alternative splicing generating an enormous repertoire of potential regulatory RNAs and a rich transcriptional reservoir for gene evolution.
Publisher: Cold Spring Harbor Laboratory
Date: 16-10-2020
DOI: 10.1101/2020.10.15.338855
Abstract: DNA replication timing and three-dimensional (3D) genome organisation occur across large domains associated with distinct epigenome patterns to functionally compartmentalise genome regulation. However, it is still unclear if alterations in the epigenome, in particular cancer-related DNA hypomethylation, can directly result in alterations to cancer higher order genome architecture. Here, we use Hi-C and single cell Repli-Seq, in the colorectal cancer DNMT1 and DNMT3B DNA methyltransferases double knockout model, to determine the impact of DNA hypomethylation on replication timing and 3D genome organisation. First, we find that the hypomethylated cells show a striking loss of replication timing precision with gain of cell-to-cell replication timing heterogeneity and loss of 3D genome compartmentalisation. Second, hypomethylated regions that undergo a large change in replication timing also show loss of allelic replication timing, including at cancer-related genes. Finally, we observe the formation of broad ectopic H3K4me3-H3K9me3 domains across hypomethylated regions where late replication is maintained, that potentially prevent aberrant transcription and loss of genome organisation after DNA demethylation. Together, our results highlight a previously underappreciated role for DNA methylation in maintenance of 3D genome architecture.
Publisher: Elsevier BV
Date: 09-2022
Publisher: Cold Spring Harbor Laboratory
Date: 02-04-2022
DOI: 10.1101/2022.03.29.22273013
Abstract: Chimeric antigen receptor (CAR) T cells have demonstrable efficacy in treating B-cell malignancies. Factors such as product composition, lymphodepletion and immune reconstitution are known to influence functional persistence of CAR+ T cells. However, little is known about the determinants of differentiation and phenotypic plasticity of CAR+ T and immune cells early post-infusion. We report single cell multi-omics analysis of molecular, clonal, and phenotypic profiles of CAR+ T and other immune cells circulating in patients receiving donor-derived products. We used these data to reconstruct a differentiation trajectory, which explained the observed phenotypic plasticity and identified cell fate of CAR+ and CAR- T cells. Following lympho-depletion, endogenous CAR- CD8+ and γδ T cells, clonally expand, and differentiate across heterogenous phenotypes, from a dominant resting or proliferating state into precursor of exhausted T cells, and notably into a terminal NK-like phenotype. In parallel, following infusion, CAR+ T cells undergo a similar differentiation trajectory, showing increased proliferation, metabolic activity and exhaustion when compared to circulating CAR- T cells. The subset of NK-like CAR+ T cells was associated with increasing levels of circulating proinflammatory cytokines, including innate-like IL-12 and IL-18. These results demonstrate that differentiation and phenotype of CAR+ T cells are determined by non-CAR induced signals that are shared with endogenous T cells, and condition the patients' immune-recovery. CAR+ and CAR- CD8+ T cells share a differentiation trajectory terminating in an NK-like phenotype that is associated with increased inflammatory cytokines levels.
Publisher: Springer Science and Business Media LLC
Date: 16-04-2021
DOI: 10.1186/S13059-021-02315-0
Abstract: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.
Publisher: Cold Spring Harbor Laboratory
Date: 25-05-2023
DOI: 10.1101/2023.05.25.542242
Abstract: DNA methylation (5-methylcytosine, 5mC) is a repressive gene regulatory mark widespread in vertebrate genomes, yet the developmental dynamics in which 5mC patterns are established vary across species. While mammals undergo two rounds of global 5mC erasure, the zebrafish genome exhibits localized maternal-to-paternal 5mC remodeling, in which the sperm epigenome is inherited in the early embryo. To date, it is unclear how evolutionarily conserved such 5mC remodeling strategies are, and what their biological function is. Here, we studied 5mC dynamics during the embryonic development of sea lamprey (Petromyzon marinus), a jawless vertebrate which occupies a critical phylogenetic position as the sister group of the jawed vertebrates. We employed base-resolution 5mC quantification in the lamprey germline, embryonic and somatic tissues, and discovered large-scale maternal-to-paternal epigenome remodeling that affects % of the embryonic genome and is predominantly associated with partially methylated domains (PMDs). We further demonstrate that sequences eliminated during programmed genome rearrangement (PGR), a hallmark of lamprey embryogenesis, are hypermethylated in sperm prior to the onset of PGR. Our study thus unveils important insights into the evolutionary origins of vertebrate 5mC reprogramming, and how this process might participate in diverse developmental strategies.
Publisher: Springer Science and Business Media LLC
Date: 08-08-2016
DOI: 10.1038/NMETH.3958
Abstract: RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
Publisher: Springer Science and Business Media LLC
Date: 08-08-2016
DOI: 10.1038/NMETH.3957
Abstract: The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed 'sequins', that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.
Publisher: Springer Science and Business Media LLC
Date: 19-06-2019
DOI: 10.1038/S41596-019-0175-1
Abstract: Next-generation sequencing (NGS) has been widely adopted to identify genetic variants and investigate their association with disease. However, the analysis of sequencing data remains challenging because of the complexity of human genetic variation and confounding errors introduced during library preparation, sequencing and analysis. We have developed a set of synthetic DNA spike-ins, termed 'sequins' (sequencing spike-ins), that are directly added to DNA samples before library preparation. Sequins can be used to measure technical biases and to act as internal quantitative and qualitative controls throughout the sequencing workflow. This step-by-step protocol explains the use of sequins for both whole-genome and targeted sequencing of the human genome. This includes instructions regarding the dilution and addition of sequins to human DNA samples, followed by the bioinformatic steps required to separate sequin- and sample-derived sequencing reads and to evaluate the diagnostic performance of the assay. These practical guidelines are accompanied by a broader discussion of the conceptual and statistical principles that underpin the design of sequin standards. This protocol is suitable for users with standard laboratory and bioinformatic experience. The laboratory steps require ~1-4 d and the bioinformatic steps (which can be performed with the provided example data files) take an additional day.
Publisher: Cold Spring Harbor Laboratory
Date: 11-06-2020
DOI: 10.1101/2020.06.09.143412
Abstract: DNA synthesis in vitro has enabled the rapid production of reference standards. These are used as controls, and allow measurement and improvement of the accuracy and quality of diagnostic tests. Current reference standards typically represent target genetic material, and act only as positive controls to assess test sensitivity. However, negative controls are also required to evaluate test specificity. Using a pair of chimeric A/B RNA standards, this allowed incorporation of positive and negative controls into diagnostic testing for the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). The chimeric standards constituted target regions for RT-PCR primer/probe sets that are joined in tandem across two separate synthetic molecules. Accordingly, a target region that is present in standard A provides a positive control, whilst being absent in standard B, thereby providing a negative control. This design enables cross-validation of positive and negative controls between the paired standards in the same reaction, with identical conditions. This enables control and test failures to be distinguished, increasing confidence in the accuracy of results. The chimeric A/B standards were assessed using the US Centers for Disease Control real-time RT-PCR protocol, and showed results congruent with other commercial controls in detecting SARS-CoV-2 in patient samples. This chimeric reference standard design approach offers extensive flexibility, allowing representation of diverse genetic features and distantly related sequences, even from different organisms.
Publisher: Oxford University Press (OUP)
Date: 30-05-2023
DOI: 10.1093/BIOINFORMATICS/BTAD352
Abstract: Nanopore sequencing is emerging as a key pillar in the genomic technology landscape but computational constraints limiting its scalability remain to be overcome. The translation of raw current signal data into DNA or RNA sequence reads, known as 'basecalling', is a major friction in any nanopore sequencing workflow. Here, we exploit the advantages of the recently developed signal data format 'SLOW5' to streamline and accelerate nanopore basecalling on high-performance computing (HPC) and cloud environments. SLOW5 permits highly efficient sequential data access, eliminating a potential analysis bottleneck. To take advantage of this, we introduce Buttery-eel, an open-source wrapper for Oxford Nanopore's Guppy basecaller that enables SLOW5 data access, resulting in performance improvements that are essential for scalable, affordable basecalling. Buttery-eel is available at github.com/Psy-Fer/buttery-eel.
Publisher: Cold Spring Harbor Laboratory
Date: 07-02-2023
DOI: 10.1101/2023.02.06.527365
Abstract: Nanopore sequencing is emerging as a key pillar in the genomic technology landscape but computational constraints limiting its scalability remain to be overcome. The translation of raw current signal data into DNA or RNA sequence reads, known as 'basecalling', is a major friction in any nanopore sequencing workflow. Here, we exploit the advantages of the recently developed signal data format 'SLOW5' to streamline and accelerate nanopore basecalling on high-performance computer (HPC) and cloud environments. SLOW5 permits highly efficient sequential data access, eliminating a significant analysis bottleneck. To take advantage of this, we introduce Buttery-eel, an open-source wrapper for Oxford Nanopore's Guppy basecaller that enables SLOW5 data access, resulting in performance improvements that are essential for scalable, affordable basecalling.
Publisher: Springer Science and Business Media LLC
Date: 2020
DOI: 10.1038/S41467-020-17445-5
Abstract: Standard units of measurement are required for the quantitative description of nature; however, few standard units have been established for genomics to date. Here, we have developed a synthetic DNA ladder that defines a quantitative standard unit that can measure DNA sequence abundance within a next-generation sequencing library. The ladder can be spiked into a DNA sample, and act as an internal scale that measures quantitative genetics features. Unlike previous spike-ins, the ladder is encoded within a single molecule, and can be equivalently and independently synthesized by different laboratories. We show how the ladder can measure diverse quantitative features, including human genetic variation and microbial abundance, and also estimate uncertainty due to technical variation and improve normalization between libraries. This ladder provides an independent quantitative unit that can be used with any organism, application or technology, thereby providing a common metric by which genomes can be measured.
Publisher: Elsevier BV
Date: 2021
DOI: 10.2139/SSRN.3830017
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 14-03-2023
DOI: 10.1212/NXG.0000000000200064
Abstract: Duchenne muscular dystrophy (DMD) is caused by pathogenic variants in the dystrophin gene (DMD). Hypermethylated CGG expansions within DIP2B 5′ UTR are associated with an intellectual development disorder. Here, we demonstrate the diagnostic utility of genomic short-read sequencing (SRS) and transcriptome sequencing to identify a novel DMD structural variant (SV) and a DIP2B CGG expansion in a patient with DMD for whom conventional diagnostic testing failed to yield a genetic diagnosis. We performed genomic SRS, skeletal muscle transcriptome sequencing, and targeted programmable long-read sequencing (LRS). The proband had a typical DMD clinical presentation, autism spectrum disorder (ASD), and dystrophinopathy on muscle biopsy. Transcriptome analysis identified 6 aberrantly expressed genes; DMD and DIP2B were the strongest underexpression and overexpression outliers, respectively. Genomic SRS identified a 216 kb paracentric inversion (NC_000023.11: g.33162217-33378800) overlapping 2 DMD promoters. ExpansionHunter indicated an expansion of 109 CGG repeats within the 5′ UTR of DIP2B. Targeted genomic LRS confirmed the SV and genotyped the DIP2B repeat expansion as 270 CGG repeats. Here, transcriptome data heavily guided genomic analysis to resolve a complex DMD inversion and a DIP2B repeat expansion. Longitudinal follow-up will be important for clarifying the clinical significance of the DIP2B genotype.
Publisher: Springer Science and Business Media LLC
Date: 19-06-2017
DOI: 10.1038/NRG.2017.44
Abstract: Next-generation sequencing (NGS) provides a broad investigation of the genome, and it is being readily applied for the diagnosis of disease-associated genetic features. However, the interpretation of NGS data remains challenging owing to the size and complexity of the genome and the technical errors that are introduced during sample preparation, sequencing and analysis. These errors can be understood and mitigated through the use of reference standards - well-characterized genetic materials or synthetic spike-in controls that help to calibrate NGS measurements and to evaluate diagnostic performance. The informed use of reference standards, and associated statistical principles, ensures rigorous analysis of NGS data and is essential for its future clinical use.
Publisher: American Association for the Advancement of Science (AAAS)
Date: 04-03-2022
Abstract: More than 50 neurological and neuromuscular diseases are caused by short tandem repeat (STR) expansions, with 37 different genes implicated to date. We describe the use of programmable targeted long-read sequencing with Oxford Nanopore's ReadUntil function for parallel genotyping of all known neuropathogenic STRs in a single assay. Our approach enables accurate, haplotype-resolved assembly and DNA methylation profiling of STR sites, from a list of predetermined candidates. This correctly diagnoses all individuals in a small cohort (n = 37) including patients with various neurogenetic diseases (n = 25). Targeted long-read sequencing solves large and complex STR expansions that confound established molecular tests and short-read sequencing and identifies noncanonical STR motif conformations and internal sequence interruptions. We observe a diversity of STR alleles of known and unknown pathogenicity, suggesting that long-read sequencing will redefine the genetic landscape of repeat disorders. Last, we show how the inclusion of pharmacogenomic genes as secondary ReadUntil targets can further inform patient care.
Publisher: Springer Science and Business Media LLC
Date: 06-07-2021
DOI: 10.1038/S41467-021-24442-9
Abstract: Spi-1 Proto-Oncogene (SPI1) fusion genes are recurrently found in T-cell acute lymphoblastic leukemia (T-ALL) cases but are insufficient to drive leukemogenesis. Here we show that SPI1 fusions in combination with activating NRAS mutations drive an immature T-ALL in vivo using a conditional bone marrow transplant mouse model. Addition of the oncogenic fusion to the NRAS mutation also results in a higher leukemic stem cell frequency. Mechanistically, genetic deletion of the β-catenin binding domain within Transcription factor 7 (TCF7)-SPI1 or use of a TCF/β-catenin interaction antagonist abolishes the oncogenic activity of the fusion. Targeting the TCF7-SPI1 fusion in vivo with a doxycycline-inducible knockdown results in increased differentiation. Moreover, both pharmacological and genetic inhibition lead to down-regulation of SPI1 targets. Together, our results reveal an example where TCF7-SPI1 leukemia is vulnerable to pharmacological targeting of the TCF/β-catenin interaction.
Publisher: Springer Science and Business Media LLC
Date: 10-08-2022
DOI: 10.1038/S41586-022-05054-9
Abstract: The notion that mobile units of nucleic acid known as transposable elements can operate as genomic controlling elements was put forward over six decades ago
Publisher: Springer Science and Business Media LLC
Date: 20-12-2117
Publisher: Springer Science and Business Media LLC
Date: 03-01-2022
DOI: 10.1038/S41587-021-01147-4
Abstract: Nanopore sequencing depends on the FAST5 file format, which does not allow efficient parallel analysis. Here we introduce SLOW5, an alternative format engineered for efficient parallelization and acceleration of nanopore data analysis. Using the example of DNA methylation profiling of a human genome, analysis runtime is reduced from more than two weeks to approximately 10.5 h on a typical high-performance computer. SLOW5 is approximately 25% smaller than FAST5 and delivers consistent improvements on different computer architectures.
Publisher: Oxford University Press (OUP)
Date: 15-12-2021
DOI: 10.1093/BIOINFORMATICS/BTAB846
Abstract: InterARTIC is an interactive web application for the analysis of viral whole-genome sequencing (WGS) data generated on Oxford Nanopore Technologies (ONT) devices. A graphical interface enables users with no bioinformatics expertise to analyze WGS experiments and reconstruct consensus genome sequences from individual isolates of viruses, such as SARS-CoV-2. InterARTIC is intended to facilitate widespread adoption and standardization of ONT sequencing for viral surveillance and molecular epidemiology. We demonstrate the use of InterARTIC for the analysis of ONT viral WGS data from SARS-CoV-2 and Ebola virus, using a laptop computer or the internal computer on an ONT GridION sequencing device. We showcase the intuitive graphical interface, workflow customization capabilities and job-scheduling system that facilitate execution of small- and large-scale WGS projects on any common virus. InterARTIC is a free, open-source web application implemented in Python that executes best-practice command line workflows from the ARTIC network. The application can be downloaded as a set of pre-compiled binaries that are compatible with all common Linux distributions, Windows with Linux subsystems, MacOSX and ARM systems. All code can be found on GitHub at github.com/Psy-Fer/interARTIC/ and documentation can be found at github.com/Psy-Fer/interARTIC/. Supplementary data are available at Bioinformatics online.
Publisher: Cold Spring Harbor Laboratory
Date: 04-02-2021
DOI: 10.1101/2021.02.03.429474
Abstract: How temperature determines sex remains unknown. A recent hypothesis proposes that conserved cellular mechanisms (calcium and redox 'CaRe' status) sense temperature and identify genes and regulatory pathways likely to be involved in driving sexual development. We take advantage of the unique sex determining system of the model organism, Pogona vitticeps, to assess predictions of this hypothesis. P. vitticeps has ZZ male: ZW female sex chromosomes whose influence can be overridden in genetic males by high temperatures, causing male-to-female sex reversal. We compare a developmental transcriptome series of ZWf females and temperature sex reversed ZZf females. We demonstrate that early developmental cascades differ dramatically between genetically driven and thermally driven females, later converging to produce a common outcome (ovaries). We show that genes proposed as regulators of thermosensitive sex determination play a role in temperature sex reversal. Our study greatly advances the search for the mechanisms by which temperature determines sex. In many reptiles and fish, environment can determine, or influence, the sex of developing embryos. How this happens at a molecular level has eluded resolution for half a century of intensive research. We studied the bearded dragon, a lizard that has sex chromosomes (ZZ male and ZW female), but in which temperature can override ZZ sex chromosomes to cause male to female sex reversal. This provides an unparalleled opportunity to disentangle, in the same species, the biochemical pathways required to make a female by these two different routes. We sequenced the transcriptomes of gonads from developing ZZ reversed and normal ZW dragon embryos and discovered that different sets of genes are active in ovary development driven by genotype or temperature.
Females whose sex was initiated by temperature showed a transcriptional profile consistent with the recently proposed Calcium-Redox hypothesis of cellular temperature sensing. These findings are important for understanding how the environment influences the development of sex, and more generally how the environment can epigenetically modify the action of genes.
Publisher: Cold Spring Harbor Laboratory
Date: 10-05-2017
DOI: 10.1101/136275
Abstract: The human transcriptome is so large, diverse and dynamic that, even after a decade of investigation by RNA sequencing (RNA-Seq), we are yet to resolve its true dimensions. RNA-Seq suffers from an expression-dependent bias that impedes characterization of low-abundance transcripts. We performed targeted single-molecule and short-read RNA-Seq to survey the transcriptional landscape of a single human chromosome (Hsa21) at unprecedented resolution. Our analysis reaches the lower limits of the transcriptome, identifying a fundamental distinction between protein-coding and noncoding gene content: almost every noncoding exon undergoes alternative splicing, producing a seemingly limitless variety of isoforms. Analysis of syntenic regions of the mouse genome shows that few noncoding exons are shared between human and mouse, yet human splicing profiles are recapitulated on Hsa21 in mouse cells, indicative of regulation by a deeply conserved splicing code. We propose that noncoding exons are functionally modular, with alternative splicing generating an enormous repertoire of potential regulatory RNAs and a rich transcriptional reservoir for gene evolution.
Publisher: Springer Science and Business Media LLC
Date: 29-01-2021
DOI: 10.1038/S41598-021-81760-0
Abstract: DNA synthesis in vitro has enabled the rapid production of reference standards. These are used as controls, and allow measurement and improvement of the accuracy and quality of diagnostic tests. Current reference standards typically represent target genetic material, and act only as positive controls to assess test sensitivity. However, negative controls are also required to evaluate test specificity. Using a pair of chimeric A/B RNA standards, this allowed incorporation of positive and negative controls into diagnostic testing for the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2). The chimeric standards constituted target regions for RT-PCR primer/probe sets that are joined in tandem across two separate synthetic molecules. Accordingly, a target region that is present in standard A provides a positive control, whilst being absent in standard B, thereby providing a negative control. This design enables cross-validation of positive and negative controls between the paired standards in the same reaction, with identical conditions. This enables control and test failures to be distinguished, increasing confidence in the accuracy of results. The chimeric A/B standards were assessed using the US Centres for Disease Control real-time RT-PCR protocol, and showed results congruent with other commercial controls in detecting SARS-CoV-2 in patient samples. This chimeric reference standard design approach offers extensive flexibility, allowing representation of diverse genetic features and distantly related sequences, even from different organisms.
Publisher: Springer Science and Business Media LLC
Date: 16-02-2021
DOI: 10.1038/S41598-021-83642-X
Abstract: Accumulating evidence supports the high prevalence of co-infections among Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) patients, and their potential to worsen the clinical outcome of COVID-19. However, there are few data on Southern Hemisphere populations, and most studies to date have investigated a narrow spectrum of viruses using targeted qRT-PCR. Here we assessed respiratory viral co-infections among SARS-CoV-2 patients in Australia, through respiratory virome characterization. Nasopharyngeal swabs of 92 SARS-CoV-2-positive cases were sequenced using pan-viral hybrid-capture and the Twist Respiratory Virus Panel. In total, 8% of cases were co-infected, with rhinovirus (6%) or influenzavirus (2%). Twist capture also achieved near-complete sequencing (>90% coverage, tenfold depth) of the SARS-CoV-2 genome in 95% of specimens with Ct < 30. Our results highlight the importance of assessing all pathogens in symptomatic patients, and the dual-functionality of Twist hybrid-capture, for SARS-CoV-2 whole-genome sequencing without amplicon generation and the simultaneous identification of viral co-infections with ease.
Publisher: F1000 Research Ltd
Date: 19-05-2021
DOI: 10.12688/WELLCOMEOPENRES.16661.1
Abstract: Late in 2020, two genetically-distinct clusters of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with mutations of biological concern were reported, one in the United Kingdom and one in South Africa. Using a combination of data from routine surveillance, genomic sequencing and international travel we track the international dispersal of lineages B.1.1.7 and B.1.351 (variant 501Y-V2). We account for potential biases in genomic surveillance efforts by including passenger volumes from the location where each lineage was first reported, London and South Africa respectively. Using the software tool grinch (global report investigating novel coronavirus haplotypes), we track the international spread of lineages of concern with automated daily reports. Further, we have built a custom tracking website (lobal_report.html) which hosts this daily report and will continue to include novel SARS-CoV-2 lineages of concern as they are detected.
Publisher: Springer Science and Business Media LLC
Date: 28-10-2022
DOI: 10.1038/S41467-022-34028-8
Abstract: Library adaptors are short oligonucleotides that are attached to RNA and DNA samples in preparation for next-generation sequencing (NGS). Adaptors can also include additional functional elements, such as sample indexes and unique molecular identifiers, to improve library analysis. Here, we describe Control Library Adaptors, termed CAPTORs, that measure the accuracy and reliability of NGS. CAPTORs can be integrated within the library preparation of RNA and DNA samples, and their encoded information is retrieved during sequencing. We show how CAPTORs can measure the accuracy of nanopore sequencing, evaluate the quantitative performance of metagenomic and RNA sequencing, and improve normalisation between samples. CAPTORs can also be customised for clinical diagnoses, correcting systematic sequencing errors and improving the diagnosis of pathogenic BRCA1/2 variants in breast cancer. CAPTORs are a simple and effective method to increase the accuracy and reliability of NGS, enabling comparisons between samples, reagents and laboratories, and supporting the use of nanopore sequencing for clinical diagnosis.
Publisher: F1000 Research Ltd
Date: 17-09-2021
DOI: 10.12688/WELLCOMEOPENRES.16661.2
Abstract: Late in 2020, two genetically-distinct clusters of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) with mutations of biological concern were reported, one in the United Kingdom and one in South Africa. Using a combination of data from routine surveillance, genomic sequencing and international travel we track the international dispersal of lineages B.1.1.7 and B.1.351 (variant 501Y-V2). We account for potential biases in genomic surveillance efforts by including passenger volumes from the location where each lineage was first reported, London and South Africa respectively. Using the software tool grinch (global report investigating novel coronavirus haplotypes), we track the international spread of lineages of concern with automated daily reports. Further, we have built a custom tracking website (lobal_report.html) which hosts this daily report and will continue to include novel SARS-CoV-2 lineages of concern as they are detected.
Publisher: Cold Spring Harbor Laboratory
Date: 10-10-2023
Publisher: Cold Spring Harbor Laboratory
Date: 30-05-2021
DOI: 10.1101/2021.05.28.446065
Abstract: A recent study proposed severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) hijacks the LINE-1 (L1) retrotransposition machinery to integrate into the DNA of infected cells. If confirmed, this finding could have significant clinical implications. Here, we applied deep ( ×) long-read Oxford Nanopore Technologies (ONT) sequencing to HEK293T cells infected with SARS-CoV-2, and did not find the virus integrated into the genome. By examining ONT data from separate HEK293T cultivars, we completely resolved 78 L1 insertions arising in vitro in the absence of L1 overexpression systems. ONT sequencing applied to hepatitis B virus (HBV) positive liver cancer tissues located a single HBV insertion. These experiments demonstrate reliable resolution of retrotransposon and exogenous virus insertions via ONT sequencing. That we found no evidence of SARS-CoV-2 integration suggests such events are, at most, extremely rare in vivo, and therefore are unlikely to drive oncogenesis or explain post-recovery detection of the virus.
Publisher: Wiley
Date: 05-10-2013
DOI: 10.1016/J.FEBSLET.2013.09.037
Abstract: In plants, the silencing efficacy of microRNAs (miRNAs) is thought to be predominantly determined by the degree of complementarity to their target genes. Here, silencing efficacy was determined for Arabidopsis miR159 and four artificial miRNAs (amiRNAs) that all target MYB33/MYB65 with analogous complementarities. As determined through complementation of a loss-of-function mir159 mutant, the amiRNAs displayed highly variable efficacies, none of which was as strong as endogenous miR159. This was despite amiRNA expression levels being many fold-higher than miR159 in wild-type. The results highlight the variable nature of miRNA silencing efficacy in plants, where it appears that factors additional to complementarity strongly impact silencing.
Publisher: Springer Science and Business Media LLC
Date: 06-04-2023
DOI: 10.1186/S13059-023-02910-3
Abstract: Nanopore sequencing is being rapidly adopted in genomics. We recently developed SLOW5, a new file format with advantages for storage and analysis of raw signal data from nanopore experiments. Here we introduce slow5tools, an intuitive toolkit for handling nanopore data in SLOW5 format. Slow5tools enables lossless data conversion and a range of tools for interacting with SLOW5 files. Slow5tools uses multi-threading, multi-processing, and other engineering strategies to achieve fast data conversion and manipulation, including live FAST5-to-SLOW5 conversion during sequencing. We provide examples and benchmarking experiments to illustrate slow5tools usage, and describe the engineering principles underpinning its performance.
Publisher: Oxford University Press (OUP)
Date: 27-01-2017
DOI: 10.1093/BIOINFORMATICS/BTX038
Abstract: Spike-in controls are synthetic nucleic-acid sequences that are added to a user's sample and constitute internal standards for subsequent steps in the next generation sequencing workflow. The Anaquin software toolkit can be used to analyze the performance of spike-in controls at multiple steps during RNA sequencing or genome sequencing analysis, providing useful diagnostic statistics, data visualization and sample normalization. The software is implemented in C++/R and is freely available under BSD license. The source code is available from tudent-t/Anaquin, binaries and user manual from www.sequin.xyz/software and R package from ackages/Anaquin. Supplementary data are available at Bioinformatics online.
Publisher: Elsevier BV
Date: 11-2021
Publisher: Oxford University Press (OUP)
Date: 14-07-2023
Abstract: Cerebellar Ataxia, Neuropathy and Vestibular Areflexia Syndrome (CANVAS) is an autosomal recessive neurodegenerative disease, usually caused by biallelic AAGGG repeat expansions in RFC1. In this study, we leveraged whole genome sequencing (WGS) data from nearly 10,000 individuals recruited within the Genomics England sequencing project to investigate the normal and pathogenic variation of the RFC1 repeat. We identified three novel repeat motifs, AGGGC (n=6 from 5 families), AAGGC (n=2 from 1 family), AGAGG (n=1), associated with CANVAS in the homozygous or compound heterozygous state with the common pathogenic AAGGG expansion. While AAAAG, AAAGGG and AAGAG expansions appear to be benign, here we show a pathogenic role for large AAAGG repeat configuration expansions (n=5). Long read sequencing was used to fully characterise the entire repeat sequence and revealed a pure AGGGC expansion in six patients, whereas the other patients presented complex motifs with AAGGG or AAAGG interruptions. All pathogenic motifs seem to have arisen from a common haplotype and are predicted to form highly stable G quadruplexes, which have been previously demonstrated to affect gene transcription in other conditions. The assessment of these novel configurations is warranted in CANVAS patients with negative or inconclusive genetic testing. Particular attention should be paid to carriers of compound AAGGG/AAAGG expansions, since the AAAGG motif when very large (& repeats) or in the presence of AAGGG interruptions. Accurate sizing and full sequencing of the satellite repeat with long read is recommended in clinically selected cases, in order to achieve an accurate molecular diagnosis and counsel patients and their families.
Start Date: 2022
End Date: 12-2025
Amount: $1,257,021.00
Funder: Australian Research Council