ORCID Profile
0000-0003-0741-4196
Current Organisations
University of Western Australia, The Chinese University of Hong Kong School of Life Sciences
Publisher: Cold Spring Harbor Laboratory
Date: 18-04-2020
DOI: 10.1101/2020.04.17.035287
Abstract: Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using sequence reads which can span repetitive regions. However, overlap-based assembly methods routinely used for this data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long-read genome assembly. This approach relies on breaking up a closely related reference genome into bins, aligning k-mers unique to each bin with PacBio reads, and then assembling each bin in parallel followed by a final bin-stitching step. During benchmarking, we assembled the wheat Chinese Spring (CS) genome using publicly available PacBio reads in parallel in 168 wall hours on a 250 CPU system. The maximum RAM used was 300 Gb and the computing time was 42,000 CPU hours. The approach opens applications for the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at github.com/AppliedBioinformatics/RefKA
Publisher: Springer Science and Business Media LLC
Date: 26-01-2023
DOI: 10.1186/S13059-023-02861-9
Abstract: A pangenome aims to capture the complete genetic diversity within a species and reduce bias in genetic analysis inherent in using a single reference genome. However, the current linear format of most plant pangenomes limits the presentation of position information for novel sequences. Graph pangenomes have been developed to overcome this limitation. However, bioinformatics analysis tools for graph format genomes are lacking. To overcome this problem, we develop a novel strategy for pangenome construction and a downstream pangenome analysis pipeline (PSVCP) that captures genetic variants’ position information while maintaining a linearized layout. Using PSVCP, we construct a high-quality rice pangenome using 12 representative rice genomes and analyze an international rice panel with 413 diverse accessions using the pangenome as the reference. We show that PSVCP successfully identifies causal structural variations for rice grain weight and plant height. Our results provide insights into rice population structure and genomic diversity. We characterize a new locus (qPH8-1) associated with plant height on chromosome 8 undetected by the SNP-based genome-wide association study (GWAS). Our results demonstrate that the pangenome constructed by our pipeline combined with a presence and absence variation-based GWAS can provide additional power for genomic and genetic analysis. The pangenome constructed in this study and the associated genome sequence and genetic variants data provide valuable genomic resources for rice genomics research and improvement in future.
Publisher: MDPI AG
Date: 04-11-2022
Abstract: Copy number variations (CNVs) are defined as deletions, duplications and insertions among individuals of a species. There is growing evidence that CNV is a major factor underlying various autoimmune disorders and diseases in humans; however, in plants, especially oilseed crops, the role of CNVs in disease resistance is not well studied. Here, we investigate the genome-wide diversity and genetic properties of CNVs in resistance gene analogues (RGAs) across eight Brassica napus lines. A total of 1137 CNV events (704 deletions and 433 duplications) were detected across 563 RGAs. The results show CNVs are more likely to occur across clustered RGAs compared to singletons. In addition, 112 RGAs were linked to a blackleg resistance QTL, of which 25 were affected by CNV. Overall, we show that the presence and abundance of CNVs differ between lines, suggesting that in B. napus, the distribution of CNVs depends on genetic background. Our findings advance the understanding of CNV as an important type of genomic structural variation in B. napus and provide a resource to support breeding of advanced canola lines.
Publisher: Springer US
Date: 2022
DOI: 10.1007/978-1-0716-2067-0_13
Abstract: Optical mapping plays an important role in plant genomics, particularly in plant genome assembly and large-scale structural variation detection. While DNA sequencing provides base-by-base nucleotide information, optical mapping shows the physical locations of selected enzyme restriction sites in a genome. The long single-molecule maps produced by optical mapping make it a useful auxiliary technique to DNA sequencing, which generally cannot span large and complex genomic regions. Although optical mapping, therefore, offers unique advantages to researchers, there are few dedicated tools to assist in optical mapping analyses. In this chapter, we present runBNG2, a successor of runBNG to help optical-mapping data analysis for diverse datasets.
Publisher: Springer US
Date: 2020
DOI: 10.1007/978-1-0716-0235-5_7
Abstract: Genome-wide association studies (GWAS) are a valuable approach to identify single nucleotide polymorphisms (SNPs) associated with a phenotype of interest. There are now a variety of R-packages and command line tools available to perform GWAS. Here, we provide an example of downloading and filtering SNP data, followed by GWAS analysis using the R-package rMVP.
Publisher: Springer International Publishing
Date: 2019
Publisher: Springer US
Date: 2020
DOI: 10.1007/978-1-0716-0235-5_3
Abstract: A pangenome is a collection of genomic sequences found in the entire species rather than a single individual. It allows for comprehensive, species-wide characterization of genetic variations and mining of variable genes which may play important roles in phenotypes of interest. Recent advances in sequencing technologies have facilitated draft genome sequence construction and have made pangenome constructions feasible. Here, we present a reference genome-based iterative mapping and assembly method to construct a pangenome for a legume species.
Publisher: Wiley
Date: 23-03-2017
DOI: 10.1111/PBI.12697
Publisher: Wiley
Date: 14-06-2017
DOI: 10.1111/PBI.12742
Publisher: Wiley
Date: 20-07-2021
DOI: 10.1111/PBI.13646
Abstract: Structural variations (SVs) including gene presence/absence variations and copy number variations are a common feature of genomes in plants and, together with single nucleotide polymorphisms and epigenetic differences, are responsible for the heritable phenotypic diversity observed within and between species. Understanding the contribution of SVs to plant phenotypic variation is important for plant breeders to assist in producing improved varieties. The low resolution of early genetic technologies and inefficient methods have previously limited our understanding of SVs in plants. However, with the rapid expansion in genomic technologies, it is possible to assess SVs with an ever‐greater resolution and accuracy. Here, we review the current status of SV studies in plants, examine the roles that SVs play in phenotypic traits, compare current technologies and assess future challenges for SV studies.
Publisher: Frontiers Media SA
Date: 17-07-2018
Publisher: Elsevier BV
Date: 10-2016
Publisher: Springer Science and Business Media LLC
Date: 30-06-2017
Publisher: Wiley
Date: 28-10-2019
DOI: 10.1111/TPJ.14500
Abstract: We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and for one accession of Glycine soja, the closest wild relative of G. max. The G. max assemblies provided are for widely used US cultivars: the northern line Williams 82 (Wm82) and the southern line Lee. The Wm82 assembly improves the prior published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide difference of 1.7 single-nucleotide polymorphisms (SNPs) per kb between Wm82 and Lee, and 4.7 SNPs per kb between these lines and G. soja. SNP distributions and comparisons with genotypes of the Lee and Wm82 parents highlight patterns of introgression and haplotype structure. Comparisons against the US germplasm collection show placement of the sequenced accessions relative to global soybean diversity. Analysis of a pan-gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found approximately 40-42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and approximately 32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci. For each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for the domestication and improvement of soybean, serving as a basis for future research and crop improvement efforts for this important crop species.
Publisher: Wiley
Date: 24-08-2021
DOI: 10.1111/PBI.13674
Abstract: Plant genomes demonstrate significant presence/absence variation (PAV) within a species; however, the factors that lead to this variation have not been studied systematically in Brassica across diploids and polyploids. Here, we developed pangenomes of polyploid Brassica napus and its two diploid progenitor genomes B. rapa and B. oleracea to infer how PAV may differ between diploids and polyploids. Modelling of gene loss suggests that loss propensity is primarily associated with transposable elements in the diploids while in B. napus, gene loss propensity is associated with homoeologous recombination. We use these results to gain insights into the different causes of gene loss, both in diploids and following polyploidization, and pave the way for the application of machine learning methods to understanding the underlying biological and physical causes of gene presence/absence.
Publisher: Elsevier BV
Date: 2020
Publisher: Elsevier BV
Date: 12-2021
Publisher: Elsevier BV
Date: 06-2017
DOI: 10.1016/J.TIBTECH.2017.02.009
Abstract: Second-generation sequencing (SGS) has advanced the study of crop genomes and has provided insights into diversity and evolution. However, repetitive DNA sequences in crops often lead to incomplete or erroneous assemblies because SGS reads are too short to fully resolve these repeats. To overcome some of these challenges, long-read sequencing and optical mapping have been developed to produce high-quality assemblies for complex genomes. Previously, high error rates, low throughput, and high costs have limited the adoption of long-read sequencing and optical mapping. However, with recent improvements and the development of novel algorithms, the application of these technologies is increasing. We review the development of long-read sequencing and optical mapping, and assess their application in crop genomics for breeding improved crops.
Publisher: Cold Spring Harbor Laboratory
Date: 02-04-2023
DOI: 10.1101/2023.03.30.534931
Abstract: Pangenome graphs provide a powerful way to present both sequence and structural features in a given genome relative to the typical features of a population. There are different methods of building pangenome graphs, but few tools are available to visualize them. To address this problem, we developed PanGraphViewer, which is written in Python 3 and runs on all major operating systems. The PanGraphViewer package contains two separate versions: a desktop-based application and a web-based application. Compared to other graph viewers that are initially designed to visualize individual genome graphs, PanGraphViewer targets pangenome graphs and allows the viewing of pangenome graphs built from multiple genomes in either the (reference) graphical fragment assembly format or the variant call format (VCF). Apart from visualization of different types of structural variations (SV), PanGraphViewer also integrates genome annotations with graph nodes to analyze insertions or deletions in a particular gene model. The graph node shapes in PanGraphViewer can represent different types of genomic variations when a VCF file is used. Notably, PanGraphViewer displays subgraphs from a chromosome or sequence segment based on any given coordinates. This function is absent from most genome graph viewers. PanGraphViewer is freely available at github.com/TF-Chan-Lab anGraphViewer to facilitate pangenome analysis.
Publisher: MDPI AG
Date: 22-01-2018
DOI: 10.3390/GENES9010050
Publisher: Wiley
Date: 24-06-2022
DOI: 10.1002/TPG2.20109
Abstract: The gene content of plants varies between individuals of the same species due to gene presence/absence variation, and selection can alter the frequency of specific genes in a population. Selection during domestication and breeding will modify the genomic landscape, though the nature of these modifications is only understood for specific genes or on a more general level (e.g., by a loss of genetic diversity). Here we have assembled and analyzed a soybean (Glycine spp.) pangenome representing more than 1,000 soybean accessions derived from the USDA Soybean Germplasm Collection, including both wild and cultivated lineages, to assess genomewide changes in gene and allele frequency during domestication and breeding. We identified 3,765 genes that are absent from the Lee reference genome assembly and assessed the presence/absence of all genes across this population. In addition to a loss of genetic diversity, we found a significant reduction in the average number of protein‐coding genes per individual during domestication and subsequent breeding, though with some genes and allelic variants increasing in frequency associated with selection for agronomic traits. This analysis provides a genomic perspective of domestication and breeding in this important oilseed crop.
Publisher: Oxford University Press (OUP)
Date: 09-06-2017
DOI: 10.1093/BIOINFORMATICS/BTX366
Abstract: We developed runBNG, an open-source software package which wraps BioNano genomic analysis tools into a single script that can be run on the command line. runBNG can complete analyses, including quality control of single molecule maps, optical map de novo assembly, comparisons between different optical maps, super-scaffolding and structural variation detection. Compared to existing software BioNano IrysView and the KSU scripts, the major advantages of runBNG are that the whole pipeline runs on one single platform and it has a high customizability. runBNG is written in bash, with the requirement of BioNano IrysSolve packages, GCC, Perl and Python software. It is freely available at ppliedbioinformatics/runBNG. Supplementary data are available at Bioinformatics online.
Publisher: Springer New York
Date: 2017
DOI: 10.1007/978-1-4939-7337-8_18
Abstract: The genomics revolution brought on by advances in high-throughput sequencing has led to the production of vast amounts of data. Databases play an essential role in storing and managing this information to make it available to researchers and crop breeders. This chapter provides an outline of how to use databases and tools for wheat genome research.
Publisher: Wiley
Date: 04-06-2023
DOI: 10.1002/CSC2.21019
Abstract: The changing climate poses significant threats to agriculture and the ability to ensure sufficient global food production. With the expanding population, there is an urgent demand to increase crop productivity to meet the rising food demand. Producing climate‐smart crop varieties together with developing new agronomic management strategies may help address this issue. Recent advances in genomics‐assisted breeding, the use of high‐throughput DNA sequencing, high‐resolution phenotyping, and advanced genome engineering can support the development of advanced, climate resilient crops. Here, we assess the potential to enhance the resilience of crops under the changing climate. Through the use of big data, advanced breeding strategies, and advanced agriculture practices, crop varieties could be produced with enhanced resilience and increased productivity and nutrition, supporting future global food security.
Location: Hong Kong
No related grants have been discovered for Yuxuan Yuan.