ARDC Research Link Australia

Publication

RefKA: A fast and efficient long-read genome assembly approach for large and complex genomes

Publisher: Cold Spring Harbor Laboratory

Date: 18-04-2020

DOI: 10.1101/2020.04.17.035287

Abstract: Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using sequence reads which can span repetitive regions. However, overlap based assembly methods routinely used for this data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long read genome assembly. This approach relies on breaking up a closely related reference genome into bins, aligning k-mers unique to each bin with PacBio reads, and then assembling each bin in parallel followed by a final bin-stitching step. During benchmarking, we assembled the wheat Chinese Spring (CS) genome using publicly available PacBio reads in parallel in 168 wall hours on a 250 CPU system. The maximum RAM used was 300 Gb and the computing time was 42,000 CPU hours. The approach opens applications for the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at github.com/AppliedBioinformatics/RefKA

Publication

A pangenome analysis pipeline provides insights into functional gene identification in rice

Publisher: Springer Science and Business Media LLC

Date: 26-01-2023

DOI: 10.1186/S13059-023-02861-9

Abstract: A pangenome aims to capture the complete genetic diversity within a species and reduce bias in genetic analysis inherent in using a single reference genome. However, the current linear format of most plant pangenomes limits the presentation of position information for novel sequences. Graph pangenomes have been developed to overcome this limitation. However, bioinformatics analysis tools for graph format genomes are lacking. To overcome this problem, we develop a novel strategy for pangenome construction and a downstream pangenome analysis pipeline (PSVCP) that captures genetic variants’ position information while maintaining a linearized layout. Using PSVCP, we construct a high-quality rice pangenome using 12 representative rice genomes and analyze an international rice panel with 413 diverse accessions using the pangenome as the reference. We show that PSVCP successfully identifies causal structural variations for rice grain weight and plant height. Our results provide insights into rice population structure and genomic diversity. We characterize a new locus (qPH8-1) associated with plant height on chromosome 8 undetected by the SNP-based genome-wide association study (GWAS). Our results demonstrate that the pangenome constructed by our pipeline combined with a presence and absence variation-based GWAS can provide additional power for genomic and genetic analysis. The pangenome constructed in this study and the associated genome sequence and genetic variants data provide valuable genomic resources for rice genomics research and improvement in future.

Publication

Copy Number Variation among Resistance Genes Analogues in Brassica napus

Publisher: MDPI AG

Date: 04-11-2022

DOI: 10.3390/GENES13112037

Abstract: Copy number variations (CNVs) are defined as deletions, duplications and insertions among individuals of a species. There is growing evidence that CNV is a major factor underlying various autoimmune disorders and diseases in humans; however, in plants, especially oilseed crops, the role of CNVs in disease resistance is not well studied. Here, we investigate the genome-wide diversity and genetic properties of CNVs in resistance gene analogues (RGAs) across eight Brassica napus lines. A total of 1137 CNV events (704 deletions and 433 duplications) were detected across 563 RGAs. The results show CNVs are more likely to occur across clustered RGAs compared to singletons. In addition, 112 RGAs were linked to a blackleg resistance QTL, of which 25 were affected by CNV. Overall, we show that the presence and abundance of CNVs differ between lines, suggesting that in B. napus, the distribution of CNVs depends on genetic background. Our findings advance the understanding of CNV as an important type of genomic structural variation in B. napus and provide a resource to support breeding of advanced canola lines.

Publication

Applications of Optical Mapping for Plant Genome Assembly and Structural Variation Detection

Publisher: Springer US

Date: 2022

DOI: 10.1007/978-1-0716-2067-0_13

Abstract: Optical mapping plays an important role in plant genomics, particularly in plant genome assembly and large-scale structural variation detection. While DNA sequencing provides base-by-base nucleotide information, optical mapping shows the physical locations of selected enzyme restriction sites in a genome. The long single-molecule maps produced by optical mapping make it a useful auxiliary technique to DNA sequencing, which generally cannot span large and complex genomic regions. Although optical mapping, therefore, offers unique advantages to researchers, there are few dedicated tools to assist in optical mapping analyses. In this chapter, we present runBNG2, a successor of runBNG to help optical-mapping data analysis for diverse datasets.

Publication

Method for Genome-Wide Association Study

Publisher: Springer US

Date: 2020

DOI: 10.1007/978-1-0716-0235-5_7

Abstract: Genome-wide association studies (GWAS) are a valuable approach to identify single nucleotide polymorphisms (SNPs) associated with a phenotype of interest. There are now a variety of R-packages and command line tools available to perform GWAS. Here, we provide an example downloading and filtering SNP data, followed by GWAS analysis using the R-package rMVP.

Publication

Using genomics to adapt crops to climate change

Publisher: Springer International Publishing

Date: 2019

DOI: 10.1007/978-3-319-77878-5_5

Publication

Legume pangenome construction using an iterative mapping and assembly approach

Publisher: Springer US

Date: 2020

DOI: 10.1007/978-1-0716-0235-5_3

Abstract: A pangenome is a collection of genomic sequences found in the entire species rather than a single individual. It allows for comprehensive, species-wide characterization of genetic variations and mining of variable genes which may play important roles in phenotypes of interest. Recent advances in sequencing technologies have facilitated draft genome sequence construction and have made pangenome constructions feasible. Here, we present a reference genome-based iterative mapping and assembly method to construct a pangenome for a legume species.

Publication

An advanced reference genome of Trifolium subterraneum L. reveals genes related to agronomic performance

Publisher: Wiley

Date: 23-03-2017

DOI: 10.1111/PBI.12697

Publication

Assembly and comparison of two closely related Brassica napus genomes

Publisher: Wiley

Date: 14-06-2017

DOI: 10.1111/PBI.12742

Publication

Current status of structural variation studies in plants

Publisher: Wiley

Date: 20-07-2021

DOI: 10.1111/PBI.13646

Abstract: Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever‐greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.

Publication

Large-Scale Structural Variation Detection in Subterranean Clover Subtypes Using Optical Mapping

Publisher: Frontiers Media SA

Date: 17-07-2018

DOI: 10.3389/FPLS.2018.00971

Publication

Advances in genomics for adapting crops to climate change

Publisher: Elsevier BV

Date: 10-2016

DOI: 10.1016/J.CPB.2016.09.001

Publication

BioNanoAnalyst: A visualisation tool to assess genome assembly quality using BioNano data

Publisher: Springer Science and Business Media LLC

Date: 30-06-2017

DOI: 10.1186/S12859-017-1735-4

Publication

Construction and comparison of three reference‐quality genome assemblies for soybean

Publisher: Wiley

Date: 28-10-2019

DOI: 10.1111/TPJ.14500

Abstract: We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and for one accession of Glycine soja, the closest wild relative of G. max. The G. max assemblies provided are for widely used US cultivars: the northern line Williams 82 (Wm82) and the southern line Lee. The Wm82 assembly improves the prior published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide difference of 1.7 single-nucleotide polymorphisms (SNPs) per kb between Wm82 and Lee, and 4.7 SNPs per kb between these lines and G. soja. SNP distributions and comparisons with genotypes of the Lee and Wm82 parents highlight patterns of introgression and haplotype structure. Comparisons against the US germplasm collection show placement of the sequenced accessions relative to global soybean diversity. Analysis of a pan-gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found approximately 40-42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and approximately 32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci. For each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for the domestication and improvement of soybean, serving as a basis for future research and crop improvement efforts for this important crop species.

Publication

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids

Publisher: Wiley

Date: 24-08-2021

DOI: 10.1111/PBI.13674

Abstract: Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.

Publication

Advances in optical mapping for genomic research

Publisher: Elsevier BV

Date: 2020

DOI: 10.1016/J.CSBJ.2020.07.018

Publication

Toward haplotype studies in polyploid plants to assist breeding

Publisher: Elsevier BV

Date: 12-2021

DOI: 10.1016/J.MOLP.2021.11.004

Publication

Improvements in Genomic Technologies

Publisher: Elsevier BV

Date: 06-2017

DOI: 10.1016/J.TIBTECH.2017.02.009

Abstract: Second-generation sequencing (SGS) has advanced the study of crop genomes and has provided insights into diversity and evolution. However, repetitive DNA sequences in crops often lead to incomplete or erroneous assemblies because SGS reads are too short to fully resolve these repeats. To overcome some of these challenges, long-read sequencing and optical mapping have been developed to produce high-quality assemblies for complex genomes. Previously, high error rates, low throughput, and high costs have limited the adoption of long-read sequencing and optical mapping. However, with recent improvements and the development of novel algorithms, the application of these technologies is increasing. We review the development of long-read sequencing and optical mapping, and assess their application in crop genomics for breeding improved crops.

Publication

PanGraphViewer: A Versatile Tool to Visualize Pangenome Graphs

Publisher: Cold Spring Harbor Laboratory

Date: 02-04-2023

DOI: 10.1101/2023.03.30.534931

Abstract: Pangenome graphs provide a powerful way to present both sequence and structural features in a given genome relative to the typical features of a population. There are different methods of building pangenome graphs, but few tools are available to visualize them. To address this problem, we developed PanGraphViewer, which is written in Python 3 and runs on all major operating systems. The PanGraphViewer package contains two separate versions: a desktop-based application and a web-based application. Compared to other graph viewers that are initially designed to visualize individual genome graphs, PanGraphViewer targets pangenome graphs and allows the viewing of pangenome graphs built from multiple genomes in either the (reference) graphical fragment assembly format or the variant call format (VCF). Apart from visualization of different types of structural variations (SV), PanGraphViewer also integrates genome annotations with graph nodes to analyze insertions or deletions in a particular gene model. The graph node shapes in PanGraphViewer can represent different types of genomic variations when a VCF file is used. Notably, PanGraphViewer displays subgraphs from a chromosome or sequence segment based on any given coordinates. This function is absent from most genome graph viewers. PanGraphViewer is freely available at github.com/TF-Chan-Lab/panGraphViewer to facilitate pangenome analysis.

Publication

Single-cell genomic analysis in plants

Publisher: MDPI AG

Date: 22-01-2018

DOI: 10.3390/GENES9010050

Publication

Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding

Publisher: Wiley

Date: 24-06-2022

DOI: 10.1002/TPG2.20109

Abstract: The gene content of plants varies between individuals of the same species due to gene presence/absence variation, and selection can alter the frequency of specific genes in a population. Selection during domestication and breeding will modify the genomic landscape, though the nature of these modifications is only understood for specific genes or on a more general level (e.g., by a loss of genetic diversity). Here we have assembled and analyzed a soybean (Glycine spp.) pangenome representing more than 1,000 soybean accessions derived from the USDA Soybean Germplasm Collection, including both wild and cultivated lineages, to assess genomewide changes in gene and allele frequency during domestication and breeding. We identified 3,765 genes that are absent from the Lee reference genome assembly and assessed the presence/absence of all genes across this population. In addition to a loss of genetic diversity, we found a significant reduction in the average number of protein‐coding genes per individual during domestication and subsequent breeding, though with some genes and allelic variants increasing in frequency associated with selection for agronomic traits. This analysis provides a genomic perspective of domestication and breeding in this important oilseed crop.

Publication

RunBNG: A software package for BioNano genomic analysis on the command line

Publisher: Oxford University Press (OUP)

Date: 09-06-2017

DOI: 10.1093/BIOINFORMATICS/BTX366

Abstract: We developed runBNG, an open-source software package which wraps BioNano genomic analysis tools into a single script that can be run on the command line. runBNG can complete analyses, including quality control of single molecule maps, optical map de novo assembly, comparisons between different optical maps, super-scaffolding and structural variation detection. Compared to the existing software BioNano IrysView and the KSU scripts, the major advantages of runBNG are that the whole pipeline runs on a single platform and that it is highly customizable. runBNG is written in bash and requires the BioNano IrysSolve packages, GCC, Perl and Python. It is freely available at ppliedbioinformatics/runBNG. Supplementary data are available at Bioinformatics online.

Publication

Databases for wheat genomics and crop improvement

Publisher: Springer New York

Date: 2017

DOI: 10.1007/978-1-4939-7337-8_18

Abstract: The genomics revolution brought on by advances in high-throughput sequencing has led to the production of vast amounts of data. Databases play an essential role in storing and managing this information to make it available to researchers and crop breeders. This chapter provides an outline of how to use databases and tools for wheat genome research.

Publication

Supporting crop plant resilience during climate change

Publisher: Wiley

Date: 04-06-2023

DOI: 10.1002/CSC2.21019

Abstract: The changing climate poses significant threats to agriculture and the ability to ensure sufficient global food production. With the expanding population, there is an urgent demand to increase crop productivity to meet the rising food demand. Producing climate‐smart crop varieties together with developing new agronomic management strategies may help address this issue. Recent advances in genomics‐assisted breeding, the use of high‐throughput DNA sequencing, high‐resolution phenotyping, and advanced genome engineering can support the development of advanced, climate resilient crops. Here, we assess the potential to enhance the resilience of crops under the changing climate. Through the use of big data, advanced breeding strategies, and advanced agriculture practices, crop varieties could be produced with enhanced resilience and increased productivity and nutrition, supporting future global food security.

Yuxuan Yuan

Researcher

Publications

RefKA: A fast and efficient long-read genome assembly approach for large and complex genomes

A pangenome analysis pipeline provides insights into functional gene identification in rice

Copy Number Variation among Resistance Genes Analogues in Brassica napus

Applications of Optical Mapping for Plant Genome Assembly and Structural Variation Detection

Method for Genome-Wide Association Study

Using genomics to adapt crops to climate change

Legume pangenome construction using an iterative mapping and assembly approach

An advanced reference genome of Trifolium subterraneum L. reveals genes related to agronomic performance

Assembly and comparison of two closely related Brassica napus genomes

Current status of structural variation studies in plants

Large-Scale Structural Variation Detection in Subterranean Clover Subtypes Using Optical Mapping

Advances in genomics for adapting crops to climate change

BioNanoAnalyst: A visualisation tool to assess genome assembly quality using BioNano data

Construction and comparison of three reference‐quality genome assemblies for soybean

Modelling of gene loss propensity in the pangenomes of three Brassica species suggests different mechanisms between polyploids and diploids

Advances in optical mapping for genomic research

Toward haplotype studies in polyploid plants to assist breeding

Improvements in Genomic Technologies

PanGraphViewer: A Versatile Tool to Visualize Pangenome Graphs

Single-cell genomic analysis in plants

Sequencing the USDA core soybean collection reveals gene loss during domestication and breeding

RunBNG: A software package for BioNano genomic analysis on the command line

Databases for wheat genomics and crop improvement

Supporting crop plant resilience during climate change

Related Organisations

University Of Western Australia

The Chinese University Of Hong Kong School Of Life Sciences

Related Funding Activities
