ORCID Profile
0000-0003-2259-1713
Current Organisations
Université de Montréal, Centre Hospitalier Universitaire Sainte-Justine Centre de Recherche
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Epigenetics (incl. Genome Methylation and Epigenomics) | Genomics | Bioinformatics | Genetics
Expanding Knowledge in the Biological Sciences | Expanding Knowledge in Technology
Publisher: Springer Science and Business Media LLC
Date: 22-05-2009
Abstract: We have recently identified two large families of extinct transposable elements termed Short Interspersed DEgenerated Retroposons (SIDERs) in the parasitic protozoan Leishmania major. The characterization of SIDER elements was limited to the SIDER2 subfamily, although members of both subfamilies have been shown to play a role in the regulation of gene expression at the post-transcriptional level. Apparent functional domestication of SIDERs prompted further investigation of their characterization, dissemination and evolution throughout the Leishmania genus, with particular attention to the disregarded SIDER1 subfamily. Using optimized statistical profiles of both SIDER1 and SIDER2 subgroups, we report the first automated and highly sensitive annotation of SIDERs in the genomes of L. infantum, L. braziliensis and L. major. SIDER annotations were combined with in silico mRNA extremity predictions to generate a detailed distribution map of the repeat family, hence uncovering an enrichment of antisense-oriented SIDER repeats between the polyadenylation and trans-splicing sites of intergenic regions, in contrast to the exclusive sense orientation of SIDER elements within 3'UTRs. Our data indicate that SIDER elements are quite uniformly dispersed throughout all three genomes and that their distribution is generally syntenic. However, only 47.4% of orthologous genes harbor a SIDER element in all three species. There is evidence for species-specific enrichment of SIDERs and for their preferential association, especially for SIDER2s, with different metabolic functions. Investigation of the sequence attributes and evolutionary relationship of SIDERs to other trypanosomatid retroposons reveals that SIDER1 is a truncated version of extinct autonomous ingi-like retroposons (DIREs), which were functional in the ancestral Leishmania genome. A detailed characterization of the sequence traits for both SIDER subfamilies unveils major differences.
The SIDER1 subfamily is more heterogeneous and shows an evolutionary link with vestigial DIRE retroposons as previously observed for the ingi/RIME and L1Tc/NARTc couples identified in the T. brucei and T. cruzi genomes, whereas no identified DIREs are related to SIDER2 sequences. Although SIDER1s and SIDER2s display equivalent genomic distribution globally, the varying degrees of sequence conservation, preferential genomic disposition, and differential association to orthologous genes allude to an intricate web of SIDER assimilation in these parasitic organisms.
Publisher: University of Toronto Press Inc. (UTPress)
Date: 09-2022
Abstract: BACKGROUND: COVID-19 is usually a time-limited disease. However, prolonged infections and reinfections can occur among immunocompromised patients. It can be difficult to distinguish a prolonged infection from a new one, especially when reinfection occurs early. METHODS: We report the case of a 57-year-old man infected with SARS-CoV-2 while undergoing chemotherapy for follicular lymphoma. He experienced prolonged symptomatic infection for 3 months despite a 5-day course of remdesivir and eventually deteriorated and died. RESULTS: Viral genome sequencing showed that his final deterioration was most likely due to reinfection. Serologic studies confirmed that the patient did not seroconvert. CONCLUSIONS: This case report highlights that reinfection can occur rapidly (62–67 d) among immunocompromised patients after a prolonged disease. We provide substantial proof of prolonged infection through repeated nucleic acid amplification tests and positive viral culture at day 56 of the disease course, and we put forward evidence of reinfection with viral genome sequencing.
Publisher: Cold Spring Harbor Laboratory
Date: 04-02-2019
DOI: 10.1101/539882
Abstract: The human brain is one of the last frontiers of biomedical research. Genome-wide association studies (GWAS) have succeeded in identifying thousands of haplotype blocks associated with a range of neuropsychiatric traits, including disorders such as schizophrenia, Alzheimer’s and Parkinson’s disease. However, the majority of single nucleotide polymorphisms (SNPs) that mark these haplotype blocks fall within non-coding regions of the genome, hindering their functional validation. While some of these GWAS loci may contain cis-acting regulatory DNA elements such as enhancers, we hypothesized that many are also transcribed into non-coding RNAs that are missing from publicly available transcriptome annotations. Here, we use targeted RNA capture (‘RNA CaptureSeq’) in combination with nanopore long-read cDNA sequencing to transcriptionally profile 1,023 haplotype blocks across the genome containing non-coding GWAS SNPs associated with neuropsychiatric traits, using post-mortem human brain tissue from three neurologically healthy donors. We find that the majority (62%) of targeted haplotype blocks, including 13% of intergenic blocks, are transcribed into novel, multi-exonic RNAs, most of which are not yet recorded in GENCODE annotations. We validated our findings with short-read RNA-seq, providing orthogonal confirmation of novel splice junctions and enabling a quantitative assessment of the long-read assemblies. Many novel transcripts are supported by independent evidence of transcription including cap analysis of gene expression (CAGE) data and epigenetic marks, and some show signs of potential functional roles. We present these transcriptomes as a preliminary atlas of non-coding transcription in human brain that can be used to connect neurological phenotypes with gene expression.
Publisher: American Society of Hematology
Date: 20-09-2023
Publisher: Cold Spring Harbor Laboratory
Date: 06-2021
DOI: 10.1101/2021.05.29.21257760
Abstract: The first confirmed case of COVID-19 in Quebec, Canada, occurred at Verdun Hospital on February 25, 2020. A month later, a localized outbreak was observed at this hospital. We performed tiled amplicon whole genome nanopore sequencing on nasopharyngeal swabs from all SARS-CoV-2 positive samples from 31 March to 17 April 2020 in 2 local hospitals to assess the viral diversity of the outbreak. We report 264 viral genomes from 242 individuals (both staff and patients) with associated clinical features and outcomes, as well as longitudinal samples, technical replicates and the first publicly disseminated SARS-CoV-2 genomes in Quebec. Viral lineage assessment identified multiple subclades in both hospitals, with a predominant subclade in the Verdun outbreak, indicative of hospital-acquired transmission. Dimensionality reduction identified two subclades that evaded supervised lineage assignment methods, including Pangolin, and identified certain symptoms (headache, myalgia and sore throat) that are significantly associated with favorable patient outcomes. We also address certain limitations of standard SARS-CoV-2 bioinformatics procedures, notably when presented with multiple viral haplotypes.
Publisher: Cold Spring Harbor Laboratory
Date: 16-02-2022
DOI: 10.1101/2022.02.14.480418
Abstract: A series of well-regulated cellular and molecular events result in the compartmentalization of the anterior foregut into the esophagus and trachea. Disruption of the compartmentalization process leads to esophageal atresia/tracheoesophageal fistula (EA/TEF). Therefore, the objective is to differentiate pluripotent stem cells (PSCs), namely, embryonic stem cells and iPSCs from healthy individuals and iPSCs from EA/TEF type C patients, into mature 3-dimensional esophageal organoids expressing Involucrin, Keratin-4, -13, and p63. CXCR4, SOX17, and GATA4 expression was similar in both patient and healthy endodermal cells. Key transcription factor SOX2 was significantly lower in patient-derived anterior foregut. RNA sequencing revealed critical genes GSTM1 and RAB37 to be significantly lower in patient-derived anterior foregut. Furthermore, we observed an abnormal expression of NKX2.1 in the patient-derived mature esophageal organoids. We therefore hypothesize that a transient dysregulation of SOX2 and the abnormal expression of NKX2.1 in patient-derived cells could be responsible for the abnormal foregut compartmentalization.
Publisher: Springer Science and Business Media LLC
Date: 29-10-2019
DOI: 10.1038/S41467-019-12671-Y
Abstract: Familial Adult Myoclonic Epilepsy (FAME) is characterised by cortical myoclonic tremor usually from the second decade of life and overt myoclonic or generalised tonic-clonic seizures. Four independent loci have been implicated in FAME on chromosomes (chr) 2, 3, 5 and 8. Using whole genome sequencing and repeat primed PCR, we provide evidence that chr2-linked FAME (FAME2) is caused by an expansion of an ATTTC pentamer within the first intron of STARD7. The ATTTC expansions segregate in 158/158 individuals typically affected by FAME from 22 pedigrees including 16 previously reported families recruited worldwide. RNA sequencing from patient derived fibroblasts shows no accumulation of the AUUUU or AUUUC repeat sequences and STARD7 gene expression is not affected. These data, in combination with other genes bearing similar mutations that have been implicated in FAME, suggest ATTTC expansions may cause this disorder, irrespective of the genomic locus involved.
Publisher: Oxford University Press (OUP)
Date: 11-07-2013
DOI: 10.1093/NAR/GKT596
Publisher: Elsevier BV
Date: 05-2016
Publisher: Springer Science and Business Media LLC
Date: 23-10-2020
DOI: 10.1038/S41598-020-75374-1
Abstract: Current methods for dengue virus (DENV) genome amplification amplify parts of the genome in at least 5 overlapping segments and then combine the output to characterize a full genome. This process is laborious, costly and requires at least 10 primers per serotype, thus increasing the likelihood of PCR bias. We introduce an assay to amplify near full-length dengue virus genomes as intact molecules, sequence these amplicons with third generation “nanopore” technology without fragmenting and use the sequence data to differentiate within-host viral variants with a bioinformatics tool (Nano-Q). The new assay successfully generated near full-length amplicons from DENV serotypes 1, 2 and 3 samples which were sequenced with nanopore technology. Consensus DENV sequences generated by nanopore sequencing had over 99.5% pairwise sequence similarity to Illumina generated counterparts provided the coverage was 100 with both platforms. Maximum likelihood phylogenetic trees generated from nanopore consensus sequences were able to reproduce the exact trees made from Illumina sequencing with a conservative 99% bootstrapping threshold (after 1000 replicates and 10% burn-in). Pairwise genetic distances of within host variants identified from the Nano-Q tool were less than that of between host variants, thus enabling the phylogenetic segregation of variants from the same host.
Publisher: Cold Spring Harbor Laboratory
Date: 30-08-2017
Abstract: RNA modifications have been historically considered as fine-tuning chemo-structural features of infrastructural RNAs, such as rRNAs, tRNAs, and snoRNAs. This view has changed dramatically in recent years, to a large extent as a result of systematic efforts to map and quantify various RNA modifications in a transcriptome-wide manner, revealing that RNA modifications are reversible, dynamically regulated, far more widespread than originally thought, and involved in major biological processes, including cell differentiation, sex determination, and stress responses. Here we summarize the state of knowledge and provide a catalog of RNA modifications and their links to neurological disorders, cancers, and other diseases. With the advent of direct RNA-sequencing technologies, we expect that this catalog will help prioritize those RNA modifications for transcriptome-wide maps.
Publisher: Cold Spring Harbor Laboratory
Date: 09-2020
Abstract: Nanopore sequencing enables direct measurement of RNA molecules without conversion to cDNA, thus opening the gates to a new era for RNA biology. However, the lack of molecular barcoding of direct RNA nanopore sequencing data sets severely affects the applicability of this technology to biological samples, where RNA availability is often limited. Here, we provide the first experimental protocol and associated algorithm to barcode and demultiplex direct RNA nanopore sequencing data sets. Specifically, we present a novel and robust approach to accurately classify raw nanopore signal data by transforming current intensities into images or arrays of pixels, followed by classification using a deep learning algorithm. We demonstrate the power of this strategy by developing the first experimental protocol for barcoding and demultiplexing direct RNA sequencing libraries. Our method, DeePlexiCon, can classify 93% of reads with 95.1% accuracy or 60% of reads with 99.9% accuracy. The availability of an efficient and simple multiplexing strategy for native RNA sequencing will improve the cost-effectiveness of this technology, as well as facilitate the analysis of lower-input biological samples. Overall, our work exemplifies the power, simplicity, and robustness of signal-to-image conversion for nanopore data analysis using deep learning.
Publisher: Cold Spring Harbor Laboratory
Date: 30-06-2021
DOI: 10.1101/2021.06.29.450255
Abstract: Nanopore sequencing is an emerging genomic technology with great potential. However, the storage and analysis of nanopore sequencing data have become major bottlenecks preventing more widespread adoption in research and clinical genomics. Here, we elucidate an inherent limitation in the file format used to store raw nanopore data – known as FAST5 – that prevents efficient analysis on high-performance computing (HPC) systems. To overcome this, we have developed SLOW5, an alternative file format that permits efficient parallelisation and, thereby, acceleration of nanopore data analysis. For example, we show that using SLOW5 format, instead of FAST5, reduces the time and cost of genome-wide DNA methylation profiling by an order of magnitude on common HPC systems, and delivers consistent improvements on a wide range of different architectures. With a simple, accessible file structure and a ~25% reduction in size compared to FAST5, SLOW5 format will deliver substantial benefits to all areas of the nanopore community.
Publisher: Oxford University Press (OUP)
Date: 26-01-2014
DOI: 10.1093/BIOINFORMATICS/BTU036
Abstract: Summary: The initial steps in the analysis of next-generation sequencing data can be automated by way of software ‘pipelines’. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. Availability and implementation: Ngsane is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at github.com/BauerLab/ngsane. Contact: Denis.Bauer@csiro.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: American Association for Cancer Research (AACR)
Date: 31-05-2011
DOI: 10.1158/0008-5472.CAN-10-4460
Abstract: The identification of cancer-associated long noncoding RNAs (lncRNAs) and the investigation of their molecular and biological functions are important to understand the molecular biology of cancer and its progression. Although the functions of lncRNAs and the mechanisms regulating their expression are largely unknown, recent studies are beginning to unravel their importance in human health and disease. Here, we report that a number of lncRNAs are differentially expressed in melanoma cell lines in comparison to melanocytes and keratinocyte controls. One of these lncRNAs, SPRY4-IT1 (GenBank accession ID AK024556), is derived from an intron of the SPRY4 gene and is predicted to contain several long hairpins in its secondary structure. RNA-FISH analysis showed that SPRY4-IT1 is predominantly localized in the cytoplasm of melanoma cells, and SPRY4-IT1 RNAi knockdown results in defects in cell growth, differentiation, and higher rates of apoptosis in melanoma cell lines. Differential expression of both SPRY4 and SPRY4-IT1 was also detected in vivo, in 30 distinct patient samples, classified as primary in situ, regional metastatic, distant metastatic, and nodal metastatic melanoma. The elevated expression of SPRY4-IT1 in melanoma cells compared to melanocytes, its accumulation in cell cytoplasm, and effects on cell dynamics, including increased rate of wound closure on SPRY4-IT1 overexpression, suggest that the higher expression of SPRY4-IT1 may have an important role in the molecular etiology of human melanoma. Cancer Res 71(11) 3852–62. ©2011 AACR.
Publisher: Proceedings of the National Academy of Sciences
Date: 28-06-2019
Abstract: In hypersaline environments, Nanohaloarchaeota (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaeota [DPANN] superphylum) are thought to be free-living microorganisms. We report cultivation of 2 strains of Antarctic Nanohaloarchaeota and show that they require the haloarchaeon Halorubrum lacusprofundi for growth. By performing growth using enrichments and fluorescence-activated cell sorting, we demonstrated successful cultivation of Candidatus Nanohaloarchaeum antarcticus, purification of Ca. Nha. antarcticus away from other species, and growth and verification of Ca. Nha. antarcticus with Hrr. lacusprofundi; these findings are analogous to those required for fulfilling Koch’s postulates. We use fluorescent in situ hybridization and transmission electron microscopy to assess cell structures and interactions; metagenomics to characterize enrichment taxa, generate metagenome assembled genomes, and interrogate Antarctic communities; and proteomics to assess metabolic pathways and speculate about the roles of certain proteins. Metagenome analysis indicates the presence of a single species, which is endemic to Antarctic hypersaline systems that support the growth of haloarchaea. The presence of unusually large proteins predicted to function in attachment and invasion of hosts plus the absence of key biosynthetic pathways (e.g., lipids) in metagenome assembled genomes of globally distributed Nanohaloarchaeota indicate that all members of the lineage have evolved as symbionts. Our work expands the range of archaeal symbiotic lifestyles and provides a genetically tractable model system for advancing understanding of the factors controlling microbial symbiotic relationships.
Publisher: Springer Science and Business Media LLC
Date: 22-03-2019
DOI: 10.1038/S41467-019-09272-0
Abstract: Chirality is a property describing any object that is inequivalent to its mirror image. Due to its 5′–3′ directionality, a DNA sequence is distinct from a mirrored sequence arranged in reverse nucleotide-order, and is therefore chiral. A given sequence and its opposing chiral partner sequence share many properties, such as nucleotide composition and sequence entropy. Here we demonstrate that chiral DNA sequence pairs also perform equivalently during molecular and bioinformatic techniques that underpin genetic analysis, including PCR amplification, hybridization, whole-genome, target-enriched and nanopore sequencing, sequence alignment and variant detection. Given these shared properties, synthetic DNA sequences mirroring clinically relevant or analytically challenging regions of the human genome are ideal controls for clinical genomics. The addition of synthetic chiral sequences (sequins) to patient tumor samples can prevent false-positive and false-negative mutation detection to improve diagnosis. Accordingly, we propose that sequins can fulfill the need for commutable internal controls in precision medicine.
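The mirroring idea in this abstract can be illustrated with a minimal Python sketch. It reverses a DNA string without complementing it and checks that nucleotide composition and Shannon entropy are unchanged. This is an illustration only, not the authors' sequin design procedure; the function names `mirror` and `shannon_entropy` and the example sequence are hypothetical.

```python
from collections import Counter
from math import log2


def mirror(seq: str) -> str:
    """Return the sequence in reverse nucleotide order (no complementing)."""
    return seq[::-1]


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits per base) of the single-nucleotide composition."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())


# Hypothetical example sequence, chosen only for illustration.
original = "ATGGCGTACCTGA"
mirrored = mirror(original)

# The mirrored sequence is a different string with identical base counts.
same_composition = Counter(original) == Counter(mirrored)
entropy_gap = abs(shannon_entropy(original) - shannon_entropy(mirrored))
```

Because reversal only reorders bases, any statistic that depends on composition alone is preserved, while the sequence itself no longer matches the original.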
Publisher: Cold Spring Harbor Laboratory
Date: 07-08-2018
DOI: 10.1101/386847
Abstract: The advent of nanopore sequencing has realised portable genomic research and applications. However, state of the art long read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimization and reference genome partitioning, but highlight the associated limitations and caveats of these approaches. We then demonstrate how these issues can be overcome through an appropriate merging technique. We extend the Minimap2 aligner and demonstrate that long read alignment to the human genome can be performed on a system with 2GB RAM with negligible impact on accuracy.
Publisher: Springer Science and Business Media LLC
Date: 06-08-2018
DOI: 10.1038/S41467-018-05555-0
Abstract: The complexity of microbial communities, combined with technical biases in next-generation sequencing, poses a challenge to metagenomic analysis. Here, we develop a set of internal DNA standards, termed “sequins” (sequencing spike-ins), that together constitute a synthetic community of artificial microbial genomes. Sequins are added to environmental DNA samples prior to library preparation, and undergo concurrent sequencing with the accompanying sample. We validate the performance of sequins by comparison to mock microbial communities, and demonstrate their use in the analysis of real metagenome samples. We show how sequins can be used to measure fold change differences in the size and structure of accompanying microbial communities, and perform quantitative normalization between samples. We further illustrate how sequins can be used to benchmark and optimize new methods, including nanopore long-read sequencing technology. We provide metagenome sequins, along with associated data sets, protocols, and an accompanying software toolkit, as reference standards to aid in metagenomic studies.
Publisher: Cold Spring Harbor Laboratory
Date: 24-09-2018
DOI: 10.1101/424945
Abstract: High-throughput single-cell RNA-Sequencing is a powerful technique for gene expression profiling of complex and heterogeneous cellular populations such as the immune system. However, these methods only provide short-read sequence from one end of a cDNA template, making them poorly suited to the investigation of gene-regulatory events such as mRNA splicing, adaptive immune responses or somatic genome evolution. To address this challenge, we have developed a method that combines targeted long-read sequencing with short-read based transcriptome profiling of barcoded single cell libraries generated by droplet-based partitioning. We use Repertoire And Gene Expression sequencing (RAGE-seq) to accurately characterize full-length T cell (TCR) and B cell (BCR) receptor sequences and transcriptional profiles of more than 7,138 lymphocytes sampled from the primary tumour and draining lymph node of a breast cancer patient. With this method we show that somatic mutation, alternate splicing and clonal evolution of T and B lymphocytes can be tracked across these tissue compartments. Our results demonstrate that RAGE-Seq is an accessible and cost-effective method for high-throughput deep single cell profiling, applicable to a wide range of biological challenges.
Publisher: Cold Spring Harbor Laboratory
Date: 05-09-2020
DOI: 10.1101/756122
Abstract: Nanopore sequencing has the potential to revolutionise genomics by realising portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these applications requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. For instance, comparing raw nanopore signals to a biological reference sequence is a computationally complex task despite leveraging a dynamic programming algorithm for Adaptive Banded Event Alignment (ABEA), a commonly used approach to polish sequencing data and identify non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. By optimising memory, compute and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than the original implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at asindu2008/f5c.
Publisher: Oxford University Press (OUP)
Date: 07-05-2010
DOI: 10.1093/NAR/GKQ349
Publisher: Springer Science and Business Media LLC
Date: 17-02-2019
DOI: 10.1038/S41467-019-11049-4
Abstract: High-throughput single-cell RNA sequencing is a powerful technique but only generates short reads from one end of a cDNA template, limiting the reconstruction of highly diverse sequences such as antigen receptors. To overcome this limitation, we combined targeted capture and long-read sequencing of T-cell-receptor (TCR) and B-cell-receptor (BCR) mRNA transcripts with short-read transcriptome profiling of barcoded single-cell libraries generated by droplet-based partitioning. We show that Repertoire and Gene Expression by Sequencing (RAGE-Seq) can generate accurate full-length antigen receptor sequences at nucleotide resolution, infer B-cell clonal evolution and identify alternatively spliced BCR transcripts. We apply RAGE-Seq to 7138 cells sampled from the primary tumor and draining lymph node of a breast cancer patient to track transcriptome profiles of expanded lymphocyte clones across tissues. Our results demonstrate that RAGE-Seq is a powerful method for tracking the clonal evolution from large numbers of lymphocytes applicable to the study of immunity, autoimmunity and cancer.
Publisher: Frontiers Media SA
Date: 12-04-2019
Publisher: Cambridge University Press (CUP)
Date: 2023
DOI: 10.1017/ASH.2023.119
Abstract: We evaluated the added value of infection control-guided, on demand, and locally performed severe acute respiratory coronavirus virus 2 (SARS-CoV-2) genomic sequencing to support outbreak investigation and control in acute-care settings. This 18-month prospective molecular epidemiology study was conducted at a tertiary-care hospital in Montreal, Canada. When nosocomial transmission was suspected by local infection control, viral genomic sequencing was performed locally for all putative outbreak cases. Molecular and conventional epidemiology data were correlated on a just-in-time basis to improve understanding of coronavirus disease 2019 (COVID-19) transmission and reinforce or adapt control measures. Between April 2020 and October 2021, 6 outbreaks including 59 nosocomial infections (per the epidemiological definition) were investigated. Genomic data supported 7 distinct transmission clusters involving 6 patients and 26 healthcare workers. We identified multiple distinct modes of transmission, which led to reinforcement and adaptation of infection control measures. Molecular epidemiology data also refuted (n = 14) suspected transmission events in favor of community acquired but institutionally clustered cases. SARS-CoV-2 genomic sequencing can refute or strengthen transmission hypotheses from conventional nosocomial epidemiological investigations, and guide implementation of setting-specific control strategies. Our study represents a template for prospective, on site, outbreak-focused SARS-CoV-2 sequencing. This approach may become increasingly relevant in a COVID-19 endemic state where systematic sequencing within centralized surveillance programs is not available. clinicaltrials.gov identifier: NCT05411562
Publisher: Elsevier BV
Date: 09-2021
DOI: 10.1016/J.CELREP.2021.109722
Abstract: DNA replication timing and three-dimensional (3D) genome organization are associated with distinct epigenome patterns across large domains. However, whether alterations in the epigenome, in particular cancer-related DNA hypomethylation, affect higher-order levels of genome architecture is still unclear. Here, using Repli-Seq, single-cell Repli-Seq, and Hi-C, we show that genome-wide methylation loss is associated with both concordant loss of replication timing precision and deregulation of 3D genome organization. Notably, we find distinct disruption in 3D genome compartmentalization, striking gains in cell-to-cell replication timing heterogeneity and loss of allelic replication timing in cancer hypomethylation models, potentially through the gene deregulation of DNA replication and genome organization pathways. Finally, we identify ectopic H3K4me3-H3K9me3 domains from across large hypomethylated domains, where late replication is maintained, which we purport serves to protect against catastrophic genome reorganization and aberrant gene transcription. Our results highlight a potential role for the methylome in the maintenance of 3D genome regulation.
Publisher: Frontiers Media SA
Date: 21-02-2022
Abstract: The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale leading to a tremendous amount of viral genome sequencing data. To assist in tracing infection pathways and design preventive strategies, a deep understanding of the viral genetic diversity landscape is needed. We present here a set of genomic surveillance tools from population genetics which can be used to better understand the evolution of this virus in humans. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic. We analyzed 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets. This approach enables real-time lineage identification, a clear description of the relationship between variants of concern, and efficient detection of recurrent mutations. Furthermore, time series change of Tajima's D by haplotype provides a powerful metric of lineage expansion. Finally, principal component analysis (PCA) highlights key steps in variant emergence and facilitates the visualization of genomic variation in the context of SARS-CoV-2 diversity. The computational framework presented here is simple to implement and insightful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of populations of humans and other organisms.
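For readers unfamiliar with the statistic named in this abstract, the following is a minimal Python sketch of the standard Tajima's D estimator. It assumes a small set of aligned, equal-length sequences with no gaps or ambiguous bases. It is not the authors' surveillance pipeline, and the function name `tajimas_d` is hypothetical. Negative values indicate an excess of rare variants, the pattern expected during a lineage expansion.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for aligned, equal-length sequences.

    Returns None when there are no segregating sites.
    """
    n = len(seqs)
    if n < 4:
        # For n < 4 the variance term of the estimator is zero.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of alignment columns carrying more than one allele.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)
    ) / n_pairs

    # Normalising constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignment in which every variant is a singleton (rare-variant excess).
expanding = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT"]
d_expanding = tajimas_d(expanding)
```

Tracking a lineage over time, as the abstract describes, would amount to recomputing this value on the sequences collected in each time window.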
Publisher: Cold Spring Harbor Laboratory
Date: 04-12-2019
DOI: 10.1101/864322
Abstract: Nanopore sequencing has enabled sequencing of native RNA molecules without conversion to cDNA, thus opening the gates to a new era for the unbiased study of RNA biology. However, a formal barcoding protocol for direct sequencing of native RNA molecules is currently lacking, limiting the efficient processing of multiple samples in the same flowcell. A major limitation for the development of barcoding protocols for direct RNA sequencing is the error rate introduced during the base-calling process, especially towards the 5’ and 3’ ends of reads, which complicates sequence-based barcode demultiplexing. Here, we propose a novel strategy to barcode and demultiplex direct RNA sequencing nanopore data, which does not rely on base-calling or additional library preparation steps. Specifically, custom DNA oligonucleotides are ligated to RNA transcripts during library preparation. Then, raw current signal corresponding to the DNA barcode is extracted and transformed into an array of pixels, which is used to determine the underlying barcode using a deep convolutional neural network classifier. Our method, DeePlexiCon, implements a 20-layer residual neural network model that can demultiplex 93% of the reads with 95.1% specificity, or 60% of reads with 99.9% specificity. The availability of an efficient and simple barcoding strategy for native RNA sequencing will enhance the use of direct RNA sequencing by making it more cost-effective to the entire community. Moreover, it will facilitate the applicability of direct RNA sequencing to samples where the RNA amounts are limited, such as patient-derived samples.
Publisher: Cold Spring Harbor Laboratory
Date: 27-06-2022
DOI: 10.1101/2022.06.22.22276550
Abstract: Gene expression profiling provides a detailed molecular snapshot of cellular phenotypes that can be used to compare different biological conditions. Nanopore sequencing technology can generate high-resolution transcriptomic data in real-time and at low cost, which heralds new opportunities for molecular medicine. In this study, we demonstrate the clinical utility of real-time transcriptomic profiling by processing RNA sequencing data from childhood acute lymphoblastic leukemia (ALL) patients on-the-fly with a trained neural network classifier. This strategy successfully distinguished 11/12 representative ALL molecular subtypes and one non-leukemia control in as little as 5 minutes of sequencing on a MinION sequencer or in less than 1 hour on disposable, low cost Flongle flow cells. Our findings suggest that real-time transcriptomics constitutes a drastically efficient solution for the molecular diagnosis of ALL and other diseases, where conventional clinical workflows require days if not weeks to achieve similar results.
Publisher: Springer Science and Business Media LLC
Date: 13-03-2019
DOI: 10.1038/S41598-019-40739-8
Abstract: The advent of Nanopore sequencing has realised portable genomic research and applications. However, state of the art long read aligners and large reference genomes are not compatible with most mobile computing devices due to their high memory requirements. We show how memory requirements can be reduced through parameter optimisation and reference genome partitioning, but highlight the associated limitations and caveats of these approaches. We then demonstrate how these issues can be overcome through an appropriate merging technique. We incorporated multi-index merging into the Minimap2 aligner and demonstrate that long read alignment to the human genome can be performed on a system with 2 GB RAM with negligible impact on accuracy.
Publisher: Springer Science and Business Media LLC
Date: 09-09-2019
DOI: 10.1038/S41467-019-11713-9
Abstract: The epitranscriptomics field has undergone an enormous expansion in the last few years; however, a major limitation is the lack of generic methods to map RNA modifications transcriptome-wide. Here, we show that using direct RNA sequencing, N6-methyladenosine (m6A) RNA modifications can be detected with high accuracy, in the form of systematic errors and decreased base-calling qualities. Specifically, we find that our algorithm, trained with m6A-modified and unmodified synthetic sequences, can predict m6A RNA modifications with ~90% accuracy. We then extend our findings to yeast data sets, finding that our method can identify m6A RNA modifications in vivo with an accuracy of 87%. Moreover, we further validate our method by showing that these ‘errors’ are typically not observed in yeast ime4-knockout strains, which lack m6A modifications. Our results open avenues to investigate the biological roles of RNA modifications in their native RNA context.
Publisher: Public Library of Science (PLoS)
Date: 02-12-2021
DOI: 10.1371/JOURNAL.PONE.0260714
Abstract: The first confirmed case of COVID-19 in Quebec, Canada, occurred at Verdun Hospital on February 25, 2020. A month later, a localized outbreak was observed at this hospital. We performed tiled amplicon whole genome nanopore sequencing on nasopharyngeal swabs from all SARS-CoV-2 positive samples from 31 March to 17 April 2020 in 2 local hospitals to assess viral diversity (unknown at the time in Quebec) and potential associations with clinical outcomes. We report 264 viral genomes from 242 individuals–both staff and patients–with associated clinical features and outcomes, as well as longitudinal samples and technical replicates. Viral lineage assessment identified multiple subclades in both hospitals, with a predominant subclade in the Verdun outbreak, indicative of hospital-acquired transmission. Dimensionality reduction identified two subclades with mutations of clinical interest, namely in the Spike protein, that evaded supervised lineage assignment methods–including Pangolin and NextClade supervised lineage assignment tools. We also report that certain symptoms (headache, myalgia and sore throat) are significantly associated with favorable patient outcomes. Our findings demonstrate the strength of unsupervised, data-driven analyses whilst suggesting that caution should be used when employing supervised genomic workflows, particularly during the early stages of a pandemic.
Publisher: Springer Science and Business Media LLC
Date: 05-08-2020
DOI: 10.1186/S12859-020-03697-X
Abstract: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at asindu2008/f5c.
Publisher: Springer Science and Business Media LLC
Date: 12-2017
Publisher: Cold Spring Harbor Laboratory
Date: 11-05-2020
Abstract: Noncoding RNA has a proven ability to direct and regulate chromatin modifications by acting as scaffolds between DNA and histone-modifying complexes. However, it is unknown if ncRNA plays any role in DNA replication and epigenome maintenance, including histone eviction and reinstallment of histone modifications after genome duplication. Isolation of nascent chromatin has identified a large number of RNA-binding proteins in addition to unknown components of the replication and epigenetic maintenance machinery. Here, we isolated and characterized long and short RNAs associated with nascent chromatin at active replication forks and track RNA composition during chromatin maturation across the cell cycle. Shortly after fork passage, GA-rich-, alpha- and TElomeric Repeat-containing RNAs (TERRA) are associated with replicated DNA. These repeat containing RNAs arise from loci undergoing replication, suggesting an interaction in cis. Post-replication during chromatin maturation, and even after mitosis in G1, the repeats remain enriched on DNA. This suggests that specific types of repeat RNAs are transcribed shortly after DNA replication and stably associate with their loci of origin throughout the cell cycle. The presented method and data enable studies of RNA interactions with replication forks and post-replicative chromatin and provide insights into how repeat RNAs and their engagement with chromatin are regulated with respect to DNA replication and across the cell cycle.
Publisher: Cold Spring Harbor Laboratory
Date: 22-06-2021
Abstract: The testis transcriptome is highly complex and includes RNAs that potentially hybridize to form double-stranded RNA (dsRNA). We isolated dsRNA using the monoclonal J2 antibody and deep-sequenced the enriched s les from testes of juvenile Dicer1 knockout mice, age-matched controls, and adult animals. Comparison of our data set with recently published data from mouse liver revealed that the dsRNA transcriptome in testis is markedly different from liver: In testis, dsRNA-forming transcripts derive from mRNAs including promoters and immediate downstream regions, whereas in somatic cells they originate more often from introns and intergenic transcription. The genes that generate dsRNA are significantly expressed in isolated male germ cells with particular enrichment in pachytene spermatocytes. dsRNA formation is lower on the sex (X and Y) chromosomes. The dsRNA transcriptome is significantly less complex in juvenile mice as compared to adult controls and, possibly as a consequence, the knockout of Dicer1 has only a minor effect on the total number of transcript peaks associated with dsRNA. The comparison between dsRNA-associated genes in testis and liver with a reported set of genes that produce endogenous siRNAs reveals a significant overlap in testis but not in liver. Testis dsRNAs also significantly associate with natural antisense genes—again, this feature is not observed in liver. These findings point to a testis-specific mechanism involving natural antisense transcripts and the formation of dsRNAs that feed into the RNA interference pathway, possibly to mitigate the mutagenic impacts of recombination and transposon mobilization.
Publisher: Public Library of Science (PLoS)
Date: 28-09-2007
Publisher: Springer Science and Business Media LLC
Date: 20-03-2008
Abstract: Leishmania and other members of the Trypanosomatidae family diverged early on in eukaryotic evolution and consequently display unique cellular properties. Their apparent lack of transcriptional regulation is compensated by complex post-transcriptional control mechanisms, including the processing of polycistronic transcripts by means of coupled trans-splicing and polyadenylation. Trans-splicing signals are often U-rich polypyrimidine (poly(Y)) tracts, which precede AG splice acceptor sites. However, as opposed to higher eukaryotes there is no consensus polyadenylation signal in trypanosomatid mRNAs. We refined a previously reported method to target 5' splice junctions by incorporating the pyrimidine content of query sequences into a scoring function. We also investigated a novel approach for predicting polyadenylation (poly(A)) sites in silico, by comparing query sequences to polyadenylated expressed sequence tags (ESTs) using position-specific scanning matrices (PSSMs). An additional analysis of the distribution of putative splice junction to poly(A) distances helped to increase prediction rates by limiting the scanning range. These methods were able to simplify splice junction prediction without loss of precision and to increase polyadenylation site prediction from 22% to 47% within 100 nucleotides. We propose a simplified trans-splicing prediction tool and a novel poly(A) prediction tool based on comparative sequence analysis. We discuss the impact of certain regions surrounding the poly(A) sites on prediction rates and contemplate correlating biological mechanisms. This work aims to sharpen the identification of potentially functional untranslated regions (UTRs) in a large-scale, comparative genomics framework.
Publisher: Springer Science and Business Media LLC
Date: 03-01-2022
DOI: 10.1038/S41587-021-01147-4
Abstract: Nanopore sequencing depends on the FAST5 file format, which does not allow efficient parallel analysis. Here we introduce SLOW5, an alternative format engineered for efficient parallelization and acceleration of nanopore data analysis. Using the example of DNA methylation profiling of a human genome, analysis runtime is reduced from more than two weeks to approximately 10.5 h on a typical high-performance computer. SLOW5 is approximately 25% smaller than FAST5 and delivers consistent improvements on different computer architectures.
Publisher: Oxford University Press (OUP)
Date: 23-07-2019
DOI: 10.1093/BIOINFORMATICS/BTZ586
Abstract: The management of raw nanopore sequencing data poses a challenge that must be overcome to facilitate the creation of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualization and signal processing. SquiggleKit is cross platform and freely available from GitHub at (github.com/Psy-Fer/SquiggleKit). Detailed documentation can be found at (psy-fer.github.io/SquiggleKitDocs/). All tools have been designed to operate in python 2.7+, with minimal additional libraries. Supplementary data are available at Bioinformatics online.
Publisher: Elsevier BV
Date: 02-2022
Publisher: Public Library of Science (PLoS)
Date: 26-07-2012
Publisher: American Society for Microbiology
Date: 19-07-2022
DOI: 10.1128/AAC.00198-22
Abstract: In vitro selection of remdesivir-resistant severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) revealed the emergence of a V166L substitution, located outside of the polymerase active site of the Nsp12 protein, after 9 passages of a single lineage. V166L remained the only Nsp12 substitution after 17 passages (10 μM remdesivir), conferring a 2.3-fold increase in 50% effective concentration (EC50).
Publisher: Cold Spring Harbor Laboratory
Date: 03-06-2021
DOI: 10.1101/2021.06.03.446959
Abstract: The rapid, global dispersion of SARS-CoV-2 since its initial identification in December 2019 has led to the emergence of a diverse range of variants. The initial concerns regarding the virus were quickly compounded with concerns relating to the impact of its mutated forms on viral infectivity, pathogenicity and immunogenicity. To address the latter, we seek to understand how the mutational landscape of SARS-CoV-2 has shaped HLA-restricted T cell immunity at the population level during the first year of the pandemic, before mass vaccination. We analyzed a total of 330,246 high quality SARS-CoV-2 genome assemblies sampled across 143 countries and all major continents. Strikingly, we found that specific mutational patterns in SARS-CoV-2 diversify T cell epitopes in an HLA supertype-dependent manner. In fact, we observed that proline residues are preferentially removed from the proteome of prevalent mutants, leading to a predicted global loss of SARS-CoV-2 T cell epitopes in individuals expressing HLA-B alleles of the B7 supertype family. In addition, we show that this predicted global loss of epitopes is largely driven by a dominant C-to-U mutation type at the RNA level. These results indicate that B7 supertype-associated epitopes, including the most immunodominant ones, were more likely to escape CD8+ T cell immunosurveillance during the first year of the pandemic. Together, our study lays the foundation to help understand how SARS-CoV-2 mutants shape the repertoire of T cell targets and T cell immunity across human populations. The proposed theoretical framework has implications in viral evolution, disease severity, vaccine resistance and herd immunity.
Publisher: Springer Science and Business Media LLC
Date: 02-03-2021
DOI: 10.1186/S12864-021-07460-1
Abstract: Hepatitis C (HCV) and many other RNA viruses exist as rapidly mutating quasi-species populations in a single infected host. High throughput characterization of full genome, within-host variants is still not possible despite advances in next generation sequencing. This limitation constrains viral genomic studies that depend on accurate identification of hemi-genome or whole genome, within-host variants, especially those occurring at low frequencies. With the advent of third generation long read sequencing technologies, including Oxford Nanopore Technology (ONT) and PacBio platforms, this problem is potentially surmountable. ONT is particularly attractive in this regard due to the portable nature of the MinION sequencer, which makes real-time sequencing in remote and resource-limited locations possible. However, this technology (termed here ‘nanopore sequencing’) has a comparatively high technical error rate. The present study aimed to assess the utility, accuracy and cost-effectiveness of nanopore sequencing for HCV genomes. We also introduce a new bioinformatics tool (Nano-Q) to differentiate within-host variants from nanopore sequencing. The Nanopore platform, when the coverage exceeded 300 reads, generated comparable consensus sequences to Illumina sequencing. Using HCV Envelope plasmids (~1800 nt) mixed in known proportions, the capacity of nanopore sequencing to reliably identify variants with an abundance as low as 0.1% was demonstrated, provided the autologous reference sequence was available to identify the matching reads. Successful pooling and nanopore sequencing of 52 samples from patients with HCV infection demonstrated its cost effectiveness (AUD$43 per sample with nanopore sequencing versus $100 with paired-end short read technology).
The Nano-Q tool successfully separated between-host sequences, including those from the same subtype, by bulk sorting and phylogenetic clustering without an autologous reference sequence (using only a subtype-specific generic reference). The pipeline also identified within-host viral variants and their abundance when the parameters were appropriately adjusted. Cost effective HCV whole genome sequencing and within-host variant identification without haplotype reconstruction are potential advantages of nanopore sequencing.
Publisher: Springer Science and Business Media LLC
Date: 29-05-2008
Abstract: Leishmania parasites cause a diverse spectrum of diseases in humans ranging from spontaneously healing skin lesions (e.g., L. major) to life-threatening visceral diseases (e.g., L. infantum). The high conservation in gene content and genome organization between Leishmania major and Leishmania infantum contrasts their distinct pathophysiologies, suggesting that highly regulated hierarchical and temporal changes in gene expression may be involved. We used a multispecies DNA oligonucleotide microarray to compare whole-genome expression patterns of promastigote (sandfly vector) and amastigote (mammalian macrophages) developmental stages between L. major and L. infantum. Seven per cent of the total L. infantum genome and 9.3% of the L. major genome were differentially expressed at the RNA level throughout development. The main variations were found in genes involved in metabolism, cellular organization and biogenesis, transport and genes encoding unknown function. Remarkably, this comparative global interspecies analysis demonstrated that only 10–12% of the differentially expressed genes were common to L. major and L. infantum. Differentially expressed genes are randomly distributed across chromosomes, further supporting a posttranscriptional control, which is likely to involve a variety of 3'UTR elements. This study highlighted substantial differences in gene expression patterns between L. major and L. infantum. These important species-specific differences in stage-regulated gene expression may contribute to the disease tropism that distinguishes L. major from L. infantum.
Publisher: Cold Spring Harbor Laboratory
Date: 16-02-2019
DOI: 10.1101/549741
Abstract: The management of raw nanopore sequencing data poses a challenge that must be overcome to accelerate the development of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualisation, and signal processing. Its modular tools can be used to reduce file numbers and memory footprint, identify poly-A tails, target barcodes, adapters, and find nucleotide sequence motifs in raw nanopore signal, amongst other applications. SquiggleKit serves as a bioinformatics portal into signal space, for novice and experienced users alike. It is comprehensively documented, simple to use, cross-platform compatible and freely available from (github.com/Psy-Fer/SquiggleKit).
Publisher: Springer Science and Business Media LLC
Date: 25-05-2020
Publisher: Springer New York
Date: 29-11-2017
DOI: 10.1007/978-1-4939-6613-4_4
Abstract: Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, the accessibility of affordable high-throughput sequencing is generating a wealth of novel, unannotated transcripts, especially long noncoding RNAs (lncRNAs) that are derived from genomic regions that are antisense, intronic, intergenic, and overlapping protein-coding loci. Parsing and characterizing the functions of noncoding RNAs, lncRNAs in particular, is one of the great challenges of modern genome biology. Here we discuss concepts and computational methods for the identification of structural domains in lncRNAs from genomic and transcriptomic data. In the first part, we briefly review how to identify RNA structural motifs in individual lncRNAs. In the second part, we describe how to leverage the evolutionary dynamics of structured RNAs in a computationally efficient screen to detect putative functional lncRNA motifs using comparative genomics.
Publisher: Cold Spring Harbor Laboratory
Date: 04-2011
DOI: 10.1261/RNA.2528811
Abstract: Long noncoding RNAs (lncRNAs) are increasingly recognized to play major regulatory roles in development and disease. To identify novel regulators in breast biology, we identified differentially regulated lncRNAs during mouse mammary development. Among the highest and most differentially expressed was a transcript (Zfas1) antisense to the 5′ end of the protein-coding gene Znfx1. In vivo, Zfas1 RNA is localized within the ducts and alveoli of the mammary gland. Zfas1 intronically hosts three previously undescribed C/D box snoRNAs (SNORDs): Snord12, Snord12b, and Snord12c. In contrast to the general assumption that noncoding SNORD-host transcripts function only as vehicles to generate snoRNAs, knockdown of Zfas1 in a mammary epithelial cell line resulted in increased cellular proliferation and differentiation, while not substantially altering the levels of the SNORDs. In support of an independent function, we also found that Zfas1 is extremely stable, with a half-life h. Expression analysis of the SNORDs revealed these were expressed at different levels, likely a result of distinct structures conferring differential stability. While there is relatively low primary sequence conservation between Zfas1 and its syntenic human ortholog ZFAS1, their predicted secondary structures have similar features. Like Zfas1, ZFAS1 is highly expressed in the mammary gland and is down-regulated in breast tumors compared to normal tissue. We propose a functional role for Zfas1/ZFAS1 in the regulation of alveolar development and epithelial cell differentiation in the mammary gland, which, together with its dysregulation in human breast cancer, suggests ZFAS1 as a putative tumor suppressor gene.
Publisher: Elsevier BV
Date: 08-2011
Location: Canada
Start Date: 2013
End Date: 2017
Funder: Cancer Council NSW
Start Date: 2018
End Date: 2019
Funder: Australian Research Council
Start Date: 2018
End Date: 2018
Funder: Cancer Institute NSW
Start Date: 2018
End Date: 08-2019
Amount: $388,950.00
Funder: Australian Research Council