ARDC Research Link Australia

Publication

An integrated platform to systematically identify causal variants and genes for polygenic human traits

Publisher: Cold Spring Harbor Laboratory

Date: 24-10-2019

Abstract: Genome-wide association studies (GWAS) have identified over 150,000 links between common genetic variants and human traits or complex diseases. Over 80% of these associations map to polymorphisms in non-coding DNA. Therefore, the challenge is to identify disease-causing variants, the genes they affect, and the cells in which these effects occur. We have developed a platform using ATAC-seq, DNaseI footprints, NG Capture-C and machine learning to address this challenge. Applying this approach to red blood cell traits identifies a significant proportion of known causative variants and their effector genes, which we show can be validated by direct in vivo modelling.

Publication

Multi Locus View : An Extensible Web Based Tool for the Analysis of Genomic Data

Publisher: Cold Spring Harbor Laboratory

Date: 16-06-2020

DOI: 10.1101/2020.06.15.151837

Abstract: Tracking and understanding data quality, analysis and reproducibility are critical concerns in the biological sciences. This is especially true in genomics where Next Generation Sequencing (NGS) based technologies such as ChIP-seq, RNA-seq and ATAC-seq are generating a flood of genome-scale data. These data-types are extremely high level and complex with single experiments capable of mapping ten to hundreds of thousands of biologically meaningful events across the genome. However, such data are usually processed with automated tools and pipelines, generating tabular outputs and static visualizations. These are difficult to interact with and require substantial bioinformatic skills to manipulate and query. Similarly, interpretation is normally made at a high level without the ability to visualise the underlying data in detail and so the complexity and quality of the real underlying biological signal is lost. Also genomics datasets require integration with other genomics datasets to be properly interpreted and this integration with multiple tracks again requires substantial bioinformatics skills and is difficult to visualise across multiple pertinent datasets. Conventional genome browsers do allow for the detailed visualisation of multiple tracks but are limited to browsing single locations and do not allow for interactions with the dataset as a whole. MLV has been developed to allow users to fluidly interact with genomics datasets at multiple scales, from complete metadata labelled and clustered populations to detailed representations of individual elements. It has inbuilt tools to integrate signals across multiple datasets and to perform dimensionality reduction and clustering analysis based on the extracted signal, allowing for the high-level analysis of complex datasets while maintaining visualisation of the fine grain structure of the data.
MLV’s ability to visualise clustering within the data combined with efficient tools for large-scale tagging of individual elements makes it a unique tool for the generation of annotated datasets for modern machine learning approaches. Multi Locus View (MLV) is a web based tool for the visualisation, analysis and annotation of Next Generation Sequencing data sets. The user is able to browse the raw data, cluster, and combine the data with other analysis. Intuitive filtering and visualisation then enables the user to quickly locate and annotate regions of interest. User datasets can then be shared with other users or made public for quick assessment from the academic community. MLV is publicly available at mlv.molbiol.ox.ac.uk and the source code is available at github.com/Hughes-Genome-Group/mlv

Publication

LanceOtron: a deep learning peak caller for genome sequencing experiments

Publisher: Oxford University Press (OUP)

Date: 22-07-2022

DOI: 10.1093/BIOINFORMATICS/BTAC525

Abstract: Genome sequencing experiments have revolutionized molecular biology by allowing researchers to identify important DNA-encoded elements genome wide. Regions where these elements are found appear as peaks in the analog signal of an assay’s coverage track, and despite the ease with which humans can visually categorize these patterns, the size of many genomes necessitates algorithmic implementations. Commonly used methods focus on statistical tests to classify peaks, discounting that the background signal does not completely follow any known probability distribution and reducing the information-dense peak shapes to simply maximum height. Deep learning has been shown to be highly accurate for many pattern recognition tasks, on par or even exceeding human capabilities, providing an opportunity to reimagine and improve peak calling. We present the peak calling framework LanceOtron, which combines deep learning for recognizing peak shape with multifaceted enrichment calculations for assessing significance. In benchmarking ATAC-seq, ChIP-seq and DNase-seq, LanceOtron outperforms long-standing, gold-standard peak callers through its improved selectivity and near-perfect sensitivity. A fully featured web application is freely available from LanceOtron.molbiol.ox.ac.uk, command line interface via python is pip installable from PyPI at roject/lanceotron/, and source code and benchmarking tests are available at github.com/LHentges/LanceOtron. Supplementary data are available at Bioinformatics online.

Publication

LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq

Publisher: Cold Spring Harbor Laboratory

Date: 27-01-2021

DOI: 10.1101/2021.01.25.428108

Abstract: ATAC-seq, ChIP-seq, and DNase-seq have revolutionized molecular biology by allowing researchers to identify important DNA-encoded elements genome-wide. Regions where these elements are found appear as peaks in the analog signal of an assay’s coverage track, and despite the ease with which humans can visually categorize these regions, meaningful peak calls from whole genome datasets require complex analytical techniques. Current methods focus on statistical tests to classify peaks, reducing the information-dense peak shapes to simply maximum height, and discounting that background signals do not completely follow any known probability distribution for significance testing. Deep learning has been shown to be highly accurate for image recognition, on par or exceeding human ability, providing an opportunity to reimagine and improve peak calling. We present the peak calling framework LanceOtron, which combines multifaceted enrichment measurements with deep learning image recognition techniques for assessing peak shape. In benchmarking transcription factor binding, chromatin modification, and open chromatin datasets, LanceOtron outperforms the long-standing, gold-standard peak caller MACS2 through its improved selectivity and near perfect sensitivity. In addition to command line accessibility, a graphical web application was designed to give any researcher the ability to generate optimal peak calls and interactive visualizations in a single step.

Publication

Defining genome architecture at base-pair resolution.

Publisher: Springer Science and Business Media LLC

Date: 09-06-2021

DOI: 10.1038/S41586-021-03639-4

Abstract: In higher eukaryotes, many genes are regulated by enhancers that are 10

Publication

Multi Locus View: an extensible web-based tool for the analysis of genomic data.

Publisher: Springer Science and Business Media LLC

Date: 25-05-2021

DOI: 10.1038/S42003-021-02097-Y

Abstract: Tracking and understanding data quality, analysis and reproducibility are critical concerns in the biological sciences. This is especially true in genomics where next generation sequencing (NGS) based technologies such as ChIP-seq, RNA-seq and ATAC-seq are generating a flood of genome-scale data. However, such data are usually processed with automated tools and pipelines, generating tabular outputs and static visualisations. Interpretation is normally made at a high level without the ability to visualise the underlying data in detail. Conventional genome browsers are limited to browsing single locations and do not allow for interactions with the dataset as a whole. Multi Locus View (MLV), a web-based tool, has been developed to allow users to fluidly interact with genomics datasets at multiple scales. The user is able to browse the raw data, cluster, and combine the data with other analysis and annotate the data. User datasets can then be shared with other users or made public for quick assessment from the academic community. MLV is publicly available at mlv.molbiol.ox.ac.uk.

Publication

The chromatin remodeller ATRX facilitates diverse nuclear processes, in a stochastic manner, in both heterochromatin and euchromatin

Publisher: Springer Science and Business Media LLC

Date: 17-06-2022

DOI: 10.1038/S41467-022-31194-7

Abstract: The chromatin remodeller ATRX interacts with the histone chaperone DAXX to deposit the histone variant H3.3 at sites of nucleosome turnover. ATRX is known to bind repetitive, heterochromatic regions of the genome including telomeres, ribosomal DNA and pericentric repeats, many of which are putative G-quadruplex forming sequences (PQS). At these sites ATRX plays an ancillary role in a wide range of nuclear processes facilitating replication, chromatin modification and transcription. Here, using an improved protocol for chromatin immunoprecipitation, we show that ATRX also binds active regulatory elements in euchromatin. Mutations in ATRX lead to perturbation of gene expression associated with a reduction in chromatin accessibility, histone modification, transcription factor binding and deposition of H3.3 at the sequences to which it normally binds. In erythroid cells where downregulation of α-globin expression is a hallmark of ATR-X syndrome, perturbation of chromatin accessibility and gene expression occurs in only a subset of cells. The stochastic nature of this process suggests that ATRX acts as a general facilitator of cell specific transcriptional and epigenetic programmes, both in heterochromatin and euchromatin.

Stephen Taylor

Researcher

Publications

An integrated platform to systematically identify causal variants and genes for polygenic human traits

Multi Locus View : An Extensible Web Based Tool for the Analysis of Genomic Data

LanceOtron: a deep learning peak caller for genome sequencing experiments

LanceOtron: a deep learning peak caller for ATAC-seq, ChIP-seq, and DNase-seq

Defining genome architecture at base-pair resolution.

Multi Locus View: an extensible web-based tool for the analysis of genomic data.

The chromatin remodeller ATRX facilitates diverse nuclear processes, in a stochastic manner, in both heterochromatin and euchromatin

Related Organisations

Wellcome Centre For Human Genetics

University Of Oxford

Related Funding Activities
