ARDC Research Link Australia

Publication

RefKA: A fast and efficient long-read genome assembly approach for large and complex genomes

Publisher: Cold Spring Harbor Laboratory

Date: 18-04-2020

DOI: 10.1101/2020.04.17.035287

Abstract: Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using sequence reads which can span repetitive regions. However, overlap-based assembly methods routinely used for this data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long-read genome assembly. This approach relies on breaking up a closely related reference genome into bins, aligning k-mers unique to each bin with PacBio reads, and then assembling each bin in parallel followed by a final bin-stitching step. During benchmarking, we assembled the wheat Chinese Spring (CS) genome using publicly available PacBio reads in parallel in 168 wall hours on a 250 CPU system. The maximum RAM used was 300 Gb and the computing time was 42,000 CPU hours. The approach opens applications for the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at github.com/AppliedBioinformatics/RefKA
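The binning idea in the abstract above can be illustrated with a minimal sketch. It is not the RefKA implementation: the function names, the fixed-size bins, and the "most hits wins" read assignment are assumptions made for illustration, and the sketch ignores reverse complements, sequencing errors, and k-mers that span bin boundaries.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every k-mer of length k in seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def bin_unique_kmers(reference, bin_size, k):
    """Map each k-mer that occurs in exactly one bin to that bin's index.

    Bins are consecutive, fixed-size slices of the reference (an assumption;
    the abstract does not say how bins are defined).
    """
    owners = defaultdict(set)
    for start in range(0, len(reference), bin_size):
        bin_id = start // bin_size
        for km in kmers(reference[start:start + bin_size], k):
            owners[km].add(bin_id)
    # Keep only k-mers seen in a single bin.
    return {km: next(iter(bins)) for km, bins in owners.items() if len(bins) == 1}


def assign_read(read, unique, k):
    """Return the bin whose unique k-mers the read matches most often, or None."""
    hits = Counter(unique[km] for km in kmers(read, k) if km in unique)
    return hits.most_common(1)[0][0] if hits else None


if __name__ == "__main__":
    # Toy reference made of two 10-bp bins (hypothetical data).
    reference = "ACGTACGTAA" + "TTTTGGGGCC"
    unique = bin_unique_kmers(reference, bin_size=10, k=4)
    print(assign_read("TTGGGG", unique, k=4))
    print(assign_read("CGTACG", unique, k=4))
```

Once reads are grouped by bin in this way, each group can be assembled independently, which is what makes the per-bin step parallelisable.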

Publication

Analysis of Bisulfite Sequencing Data Using Bismark and DMRcaller to Identify Differentially Methylated Regions

Publisher: Springer US

Date: 2022

DOI: 10.1007/978-1-0716-2067-0_23

Abstract: The mechanism of the addition of a methyl group to cytosine has been identified as one of several heritable epigenetic mechanisms. In plants, DNA methylation is involved in mediating response to stress, plant development, polyploidy, and domestication through regulation of gene expression. The correlation of epigenetic variation to phenotypic traits expands our understanding toward plant evolution, and provides new source for targeted manipulation in crop improvement. To address the increasing interest to map methylation landscape in plant species, this chapter describes methods to analyze bisulfite sequencing data and identify epigenetic variation between samples. We also detailed guidelines to highlight possible optimizations, as well as ways to tailor parameters according to data and biological variability.

Publication

There’s more to it: uncovering genomewide DNA methylation heterogeneity

Publisher: Future Medicine Ltd

Date: 07-2023

DOI: 10.2217/EPI-2023-0228

Abstract: Tweetable abstract Monitoring changes in methylation heterogeneity can be powerful in detecting disease progression early. This editorial highlights the importance of profiling methylation heterogeneity and identifies existing measures and research gaps.

Publication

Assembly and comparison of two closely related Brassica napus genomes

Publisher: Wiley

Date: 14-06-2017

DOI: 10.1111/PBI.12742

Publication

Long‐read sequencing reveals widespread intragenic structural variants in a recent allopolyploid crop plant

Publisher: Wiley

Date: 06-09-2020

DOI: 10.1111/PBI.13456

Abstract: Genome structural variation (SV) contributes strongly to trait variation in eukaryotic species and may have an even higher functional significance than single‐nucleotide polymorphism (SNP). In recent years, there have been a number of studies associating large chromosomal scale SV ranging from hundreds of kilobases all the way up to a few megabases to key agronomic traits in plant genomes. However, there have been little or no efforts towards cataloguing small‐ (30–10 000 bp) to mid‐scale (10 000–30 000 bp) SV and their impact on evolution and adaptation‐related traits in plants. This might be attributed to complex and highly duplicated nature of plant genomes, which makes them difficult to assess using high‐throughput genome screening methods. Here, we describe how long‐read sequencing technologies can overcome this problem, revealing a surprisingly high level of widespread, small‐ to mid‐scale SV in a major allopolyploid crop species, Brassica napus. We found that up to 10% of all genes were affected by small‐ to mid‐scale SV events. Nearly half of these SV events ranged between 100 bp and 1000 bp, which makes them challenging to detect using short‐read Illumina sequencing. Examples demonstrating the contribution of such SV towards eco‐geographical adaptation and disease resistance in oilseed rape suggest that revisiting complex plant genomes using medium‐coverage long‐read sequencing might reveal unexpected levels of functional gene variation, with major implications for trait regulation and crop improvement.

Publication

The pangenome of hexaploid bread wheat

Publisher: Wiley

Date: 05-04-2017

DOI: 10.1111/TPJ.13515

Abstract: There is an increasing understanding that variation in gene presence-absence plays an important role in the heritability of agronomic traits; however, there have been relatively few studies on variation in gene presence-absence in crop species. Hexaploid wheat is one of the most important food crops in the world and intensive breeding has reduced the genetic diversity of elite cultivars. Major efforts have produced draft genome assemblies for the cultivar Chinese Spring, but it is unknown how well this represents the genome diversity found in current modern elite cultivars. In this study we build an improved reference for Chinese Spring and explore gene diversity across 18 wheat cultivars. We predict a pangenome size of 140 500 ± 102 genes, a core genome of 81 070 ± 1631 genes and an average of 128 656 genes in each cultivar. Functional annotation of the variable gene set suggests that it is enriched for genes that may be associated with important agronomic traits. In addition to variation in gene presence, more than 36 million intervarietal single nucleotide polymorphisms were identified across the pangenome. This study of the wheat pangenome provides insight into genome diversity in elite wheat as a basis for genomics-based improvement of this important crop. A wheat pangenome, GBrowse, is available at appliedbioinformatics.com.au/cgi-bin/gb2/gbrowse/WheatPan/, and data are available to download from heat_genome_databases.php.
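The terms pangenome, core genome and variable gene set used in the abstract above can be made concrete with a small sketch. It only shows the set definitions on observed presence/absence calls; it is not the study's method, which reports modelled estimates with uncertainty. The function name, cultivar labels and gene IDs are hypothetical.

```python
def split_pangenome(presence):
    """Split observed genes into pangenome, core and variable sets.

    presence maps a cultivar name to the set of gene IDs called present in it.
    Pangenome = genes seen in any cultivar; core = genes seen in every
    cultivar; variable = pangenome minus core.
    """
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)
    # set.intersection needs at least one argument, so guard the empty case.
    core = set.intersection(*gene_sets) if gene_sets else set()
    return pan, core, pan - core


if __name__ == "__main__":
    # Hypothetical presence/absence calls for three cultivars.
    calls = {
        "cultivar_A": {"g1", "g2", "g3"},
        "cultivar_B": {"g1", "g2", "g4"},
        "cultivar_C": {"g1", "g3", "g4"},
    }
    pan, core, variable = split_pangenome(calls)
    print(len(pan), len(core), len(variable))
```

With only a few cultivars, the observed core set overestimates the true core and the observed pangenome underestimates the true one, which is one reason studies of this kind extrapolate with a model rather than report raw counts.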

Publication

Draft Genome Sequences of Helicobacter pylori Isolates from Malaysia, Cultured from Patients with Functional Dyspepsia and Gastric Cancer

Publisher: American Society for Microbiology

Date: 15-10-2012

DOI: 10.1128/JB.01278-12

Abstract: Helicobacter pylori is the main bacterial causative agent of gastroduodenal disorders and a risk factor for gastric adenocarcinoma and mucosa-associated lymphoid tissue (MALT) lymphoma. The draft genomes of 10 closely related H. pylori isolates from the multiracial Malaysian population will provide an insight into the genetic diversity of isolates in Southeast Asia. These isolates were cultured from gastric biopsy samples from patients with functional dyspepsia and gastric cancer. The availability of this genomic information will provide an opportunity for examining the evolution and population structure of H. pylori isolates from Southeast Asia, where the East meets the West.

Publication

Chromosome-Scale Assembly of Winter Oilseed Rape Brassica napus

Publisher: Frontiers Media SA

Date: 28-04-2020

DOI: 10.3389/FPLS.2020.00496

Publication

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids

Publisher: Wiley

Date: 24-08-2021

DOI: 10.1111/PBI.13674

Abstract: Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.

Publication

The genome of a southern hemisphere seagrass species (Zostera muelleri)

Publisher: Oxford University Press (OUP)

Date: 03-07-2016

DOI: 10.1104/PP.16.00868

Publication

Genome-wide survey of the seagrass Zostera muelleri suggests modification of the ethylene signalling network

Publisher: Oxford University Press (OUP)

Date: 06-01-2015

DOI: 10.1093/JXB/ERU510

Publication

Assessment of Web-Based Consumer Reviews as a Resource for Drug Performance

Publisher: JMIR Publications Inc.

Date: 28-08-2015

DOI: 10.2196/JMIR.4396

Publication

Transgressive and parental dominant gene expression and cytosine methylation during seed development in Brassica napus hybrids

Publisher: Cold Spring Harbor Laboratory

Date: 06-09-2022

DOI: 10.1101/2022.09.05.506610

Abstract: The enhanced performance of hybrids through heterosis remains a key aspect in plant breeding; however, the underlying mechanisms are still not fully elucidated. To investigate the potential role of transcriptomic and epigenomic patterns in early expression of hybrid vigour, we investigated gene expression, small RNA abundance and genome-wide methylation in hybrids from two distant Brassica napus ecotypes during seed and seedling developmental stages using next-generation sequencing technologies. A total of 71217, 773, 79518 and 31825 differentially expressed genes, microRNAs, small interfering RNAs and differentially methylated regions were identified, respectively. Approximately 70% of the differential expression and methylation patterns observed could be explained due to parental dominance levels. Via gene ontology enrichment and microRNA-target association analyses during seed development we found copies of reproductive, developmental and meiotic genes with transgressive and paternal dominance patterns. Interestingly, maternal dominance was more prominent in hypermethylated and downregulated features during seed formation. This contrasts to the general maternal gamete demethylation reported during gametogenesis in most plant species. Associations between methylation and gene expression allowed identification of putative epialleles with diverse pivotal biological functions during seed formation. Furthermore, most differentially methylated regions, differentially expressed siRNAs and transposable elements were found in regions flanking genes that had no differential expression. This suggests that differential expression and methylation of epigenomic features may help maintain expression of pivotal genes in a hybrid context. Differential expression and methylation patterns during seed formation in an F1 hybrid provide novel insight into genes and mechanisms with a potential role in early heterosis.
Transcriptomic and epigenomic profiling of gene expression and small RNAs during seed and seedling development reveals expression and methylation dominance levels with implications on early stage heterosis in oilseed rape.

Publication

Genomic comparison of two independent seagrass lineages reveals habitat-driven convergent evolution

Publisher: Oxford University Press (OUP)

Date: 18-04-2018

DOI: 10.1093/JXB/ERY147

Publication

A reference genome for pea provides insight into legume genome evolution

Publisher: Springer Science and Business Media LLC

Date: 09-2019

DOI: 10.1038/S41588-019-0480-1

Abstract: We report the first annotated chromosome-level reference genome assembly for pea, Gregor Mendel's original genetic model. Phylogenetics and paleogenomics show genomic rearrangements across legumes and suggest a major role for repetitive elements in pea genome evolution. Compared to other sequenced Leguminosae genomes, the pea genome shows intense gene dynamics, most likely associated with genome size expansion when the Fabeae diverged from its sister tribes. During Pisum evolution, translocation and transposition differentially occurred across lineages. This reference sequence will accelerate our understanding of the molecular basis of agronomically important traits and support crop improvement.

Publication

Long-read sequencing reveals widespread intragenic structural variants in a recent allopolyploid crop plant

Publisher: Cold Spring Harbor Laboratory

Date: 28-01-2020

DOI: 10.1101/2020.01.27.915470

Abstract: Genome structural variation (SV) contributes strongly to trait variation in eukaryotic species and may have an even higher functional significance than single nucleotide polymorphism (SNP). In recent years there have been a number of studies associating large, chromosomal scale SV ranging from hundreds of kilobases all the way up to a few megabases to key agronomic traits in plant genomes. However, there have been little or no efforts towards cataloging small (30 to 10,000 bp) to mid-scale (10,000 bp to 30,000 bp) SV and their impact on evolution and adaptation related traits in plants. This might be attributed to complex and highly-duplicated nature of plant genomes, which makes them difficult to assess using high-throughput genome screening methods. Here we describe how long-read sequencing technologies can overcome this problem, revealing a surprisingly high level of widespread, small to mid-scale SV in a major allopolyploid crop species, Brassica napus. We found that up to 10% of all genes were affected by small to mid-scale SV events. Nearly half of these SV events ranged between 100 bp and 1000 bp, which makes them challenging to detect using short read Illumina sequencing. Examples demonstrating the contribution of such SV towards eco-geographical adaptation and disease resistance in oilseed rape suggest that revisiting complex plant genomes using medium-coverage, long-read sequencing might reveal unexpected levels of functional gene variation, with major implications for trait regulation and crop improvement.

Publication

Single-cell genomic analysis in plants

Publisher: MDPI AG

Date: 22-01-2018

DOI: 10.3390/GENES9010050

Publication

RunBNG: A software package for BioNano genomic analysis on the command line

Publisher: Oxford University Press (OUP)

Date: 09-06-2017

DOI: 10.1093/BIOINFORMATICS/BTX366

Abstract: We developed runBNG, an open-source software package which wraps BioNano genomic analysis tools into a single script that can be run on the command line. runBNG can complete analyses, including quality control of single molecule maps, optical map de novo assembly, comparisons between different optical maps, super-scaffolding and structural variation detection. Compared to existing software BioNano IrysView and the KSU scripts, the major advantages of runBNG are that the whole pipeline runs on one single platform and it has a high customizability. runBNG is written in bash, with the requirement of BioNano IrysSolve packages, GCC, Perl and Python software. It is freely available at ppliedbioinformatics/runBNG. Supplementary data are available at Bioinformatics online.

HueyTyng Lee

Researcher

Publications

RefKA: A fast and efficient long-read genome assembly approach for large and complex genomes

Analysis of Bisulfite Sequencing Data Using Bismark and DMRcaller to Identify Differentially Methylated Regions

There’s more to it: uncovering genomewide DNA methylation heterogeneity

Assembly and comparison of two closely related Brassica napus genomes

Long‐read sequencing reveals widespread intragenic structural variants in a recent allopolyploid crop plant

The pangenome of hexaploid bread wheat

Draft Genome Sequences of Helicobacter pylori Isolates from Malaysia, Cultured from Patients with Functional Dyspepsia and Gastric Cancer

Chromosome-Scale Assembly of Winter Oilseed Rape Brassica napus

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids

The genome of a southern hemisphere seagrass species (Zostera muelleri)

Genome-wide survey of the seagrass Zostera muelleri suggests modification of the ethylene signalling network

Assessment of Web-Based Consumer Reviews as a Resource for Drug Performance

Transgressive and parental dominant gene expression and cytosine methylation during seed development in Brassica napus hybrids

Genomic comparison of two independent seagrass lineages reveals habitat-driven convergent evolution

A reference genome for pea provides insight into legume genome evolution

Long-read sequencing reveals widespread intragenic structural variants in a recent allopolyploid crop plant

Single-cell genomic analysis in plants

RunBNG: A software package for BioNano genomic analysis on the command line

Related Organisations

Malaysian Genomics Resource Center

Justus Liebig Universitat Giessen

Genome Institute Of Singapore

University Of Western Australia

Related Funding Activities
