ORCID Profile
0000-0003-1070-213X
Current Organisations
Murdoch University, University of Western Australia, Guangdong Academy of Agricultural Sciences
Publisher: Springer US
Date: 2020
DOI: 10.1007/978-1-0716-0235-5_3
Abstract: A pangenome is a collection of genomic sequences found in the entire species rather than a single individual. It allows for comprehensive, species-wide characterization of genetic variations and mining of variable genes which may play important roles in phenotypes of interest. Recent advances in sequencing technologies have facilitated draft genome sequence construction and have made pangenome constructions feasible. Here, we present a reference genome-based iterative mapping and assembly method to construct a pangenome for a legume species.
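The iterative idea can be illustrated with a toy Python sketch. It assumes, purely for illustration, that each accession is already assembled into contigs and that "mapping" is approximated by exact substring search; the chapter's method maps sequencing reads and assembles the unmapped ones with dedicated tools, and the function names here are hypothetical.

```python
def contig_is_represented(contig: str, pangenome: list[str]) -> bool:
    """Return True if the contig occurs verbatim in any pangenome sequence.

    Exact substring search stands in for read mapping in this toy example.
    """
    return any(contig in seq for seq in pangenome)


def build_pangenome(reference: list[str], accessions: dict[str, list[str]]) -> list[str]:
    """Iteratively extend a reference with contigs it does not yet contain.

    accessions maps a (hypothetical) accession name to its assembled contigs.
    Accessions are processed in insertion order, so each one is compared
    against the reference plus everything added by earlier accessions.
    """
    pangenome = list(reference)  # copy so the caller's reference is untouched
    for contigs in accessions.values():
        for contig in contigs:
            if not contig_is_represented(contig, pangenome):
                pangenome.append(contig)  # novel sequence joins the pangenome
    return pangenome


ref = ["ACGTACGTAAAA"]
acc = {"acc1": ["CGTACG", "TTTTGGGG"], "acc2": ["TTTTGGGG", "CCCCAAAA"]}
pan = build_pangenome(ref, acc)
```

Because each accession is compared against the already extended pangenome, a sequence shared by several accessions is added only once.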
Publisher: Cold Spring Harbor Laboratory
Date: 13-12-2021
DOI: 10.1101/2021.12.12.472330
Abstract: The availability of increasing quantities of crop pangenome data permits the detailed association of gene content with agronomic traits. Here, we investigate the disease resistance gene content of diverse soybean cultivars and report a significant negative correlation between the number of NLR resistance (R) genes and yield. We find no association between R genes and seed weight, oil or protein content, and we find no correlation between yield and the number of RLK or RLP genes, or the total number of genes. These results suggest that recent yield improvement in soybean may be partially associated with the selective loss of NLR genes. Three quarters of soybean NLR genes do not show presence/absence variation, limiting the ability to select for their absence, and so the deletion or disabling of select NLR genes may support future yield improvement.
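The abstract does not state which correlation statistic was used. As an illustration only, assuming a Pearson correlation between per-cultivar NLR gene counts and yield, the calculation could be sketched as follows; the values in the usage lines are hypothetical and do not come from the study.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Co-variation of the two samples around their means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical per-cultivar values, for illustration only.
nlr_counts = [480, 470, 455, 440]
yields_t_per_ha = [2.8, 2.9, 3.1, 3.3]
r = pearson_r(nlr_counts, yields_t_per_ha)  # negative: fewer NLRs, higher yield
```

On Python 3.10 or later, `statistics.correlation` computes the same coefficient. Calling a correlation "significant" also requires a hypothesis test, which this sketch omits.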
Publisher: MDPI AG
Date: 31-05-2018
Publisher: Springer Science and Business Media LLC
Date: 09-02-2022
DOI: 10.1007/S00122-022-04045-8
Abstract: The major soy protein QTL, cqProt-003, was analysed for haplotype diversity and global distribution, and the results indicate that a 304 bp deletion and variable tandem repeats in protein-coding regions are likely causal candidates. Here, we present association and linkage analysis of 985 wild, landrace and cultivar soybean accessions in a pan genomic dataset to characterize the major high-protein/low-oil associated locus cqProt-003 located on chromosome 20. A significant trait-associated region within a 173 kb linkage block was identified, and variants in the region were characterized, identifying 34 high confidence SNPs, 4 insertions, 1 deletion and a larger 304 bp structural variant in the high-protein haplotype. Trinucleotide tandem repeats of variable length present in the second exon of gene Glyma.20G085100 are strongly correlated with the high-protein phenotype and likely represent causal variation. Structural variation has previously been found in the same gene, for which we report the global distribution of the 304 bp deletion and have identified additional nested variation present in high-protein individuals. Mapping variation at the cqProt-003 locus across demographic groups suggests that the high-protein haplotype is common in wild accessions (94.7%), rare in landraces (10.6%) and near absent in cultivated breeding pools (4.1%), suggesting its decrease in frequency primarily correlates with domestication and continued during subsequent improvement. However, the variation that has persisted in under-utilized wild and landrace populations holds high breeding potential for breeders willing to forego seed oil to maximize protein content. The results of this study include the identification of distinct haplotype structures within the high-protein population, and a broad characterization of the genomic context and linkage patterns of cqProt-003 across global populations, supporting future functional characterization and modification.
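Group-wise figures such as the haplotype frequencies in wild accessions, landraces and cultivars are proportions of accessions carrying a given haplotype. A minimal sketch, assuming each accession has already been assigned a group label and a haplotype label; the labels and records below are hypothetical.

```python
from collections import Counter


def haplotype_frequency(records, haplotype: str) -> dict[str, float]:
    """Percentage of accessions in each group that carry `haplotype`.

    records: iterable of (group, haplotype_label) pairs, one per accession.
    """
    totals = Counter()    # accessions per group
    carriers = Counter()  # accessions per group carrying the haplotype
    for group, hap in records:
        totals[group] += 1
        if hap == haplotype:
            carriers[group] += 1
    # Counter returns 0 for groups with no carriers.
    return {group: 100.0 * carriers[group] / n for group, n in totals.items()}


# Hypothetical accessions, for illustration only.
records = [
    ("wild", "high-protein"), ("wild", "high-protein"),
    ("wild", "high-protein"), ("wild", "other"),
    ("landrace", "high-protein"), ("landrace", "other"),
    ("landrace", "other"), ("landrace", "other"),
    ("cultivar", "other"), ("cultivar", "other"),
]
freqs = haplotype_frequency(records, "high-protein")
# {'wild': 75.0, 'landrace': 25.0, 'cultivar': 0.0}
```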
Publisher: Oxford University Press (OUP)
Date: 19-01-2023
DOI: 10.1093/HR/UHAD005
Abstract: Rhodomyrtus tomentosa is an important fleshy-fruited tree and a well-known medicinal plant of the Myrtaceae family that is widely cultivated in tropical and subtropical areas of the world. However, studies on the evolution and genomic breeding of R. tomentosa have been hindered by the lack of a reference genome. Here, we present a chromosome-level, gap-free telomere-to-telomere (T2T) genome assembly of R. tomentosa generated using PacBio and ONT long-read sequencing. The assembled genome is 470.35 Mb in size, with a contig N50 of ~43.80 Mb and 11 pseudochromosomes. A total of 33 382 genes and 239.31 Mb of repetitive sequences were annotated in this genome. Phylogenetic analysis indicated that R. tomentosa has evolved independently since 14.37 MYA and shares a recent WGD event with other Myrtaceae species. We identified four major anthocyanin compounds and their synthetic pathways in R. tomentosa. Comparative genomic and gene expression analyses suggested that the coloring and high anthocyanin accumulation in R. tomentosa tend to be determined by activation of the anthocyanin synthesis pathway, with positive selection and up-regulation of MYB transcription factors implicated in this process. Copy number increases of the downstream anthocyanin transport-related OMT and GST genes were also detected in R. tomentosa. Expression analysis and pathway identification highlighted the importance of starch degradation, response to stimuli, effects of hormones, and cell wall metabolism during fleshy fruit development in Myrtaceae. Our genome assembly provides a foundation for investigating the origins and differentiation of Myrtaceae species and for accelerating the genetic improvement of R. tomentosa.
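Contig N50, reported here and in the wild barley and yellowhorn assembly abstracts on this page, is the contig length L such that contigs of length L or longer together hold at least half of the assembled bases. A minimal Python sketch of that standard definition, using made-up contig lengths:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the assembly
            return length
    raise AssertionError("unreachable: running total always reaches total")


example = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])  # 10 + 9 + 8 = 27 of 54 bases -> 8
```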
Publisher: Wiley
Date: 28-10-2019
DOI: 10.1111/TPJ.14500
Abstract: We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and for one accession of Glycine soja, the closest wild relative of G. max. The G. max assemblies provided are for widely used US cultivars: the northern line Williams 82 (Wm82) and the southern line Lee. The Wm82 assembly improves on the previously published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide differences of 1.7 single-nucleotide polymorphisms (SNPs) per kb between Wm82 and Lee, and 4.7 SNPs per kb between these lines and G. soja. SNP distributions and comparisons with genotypes of the Lee and Wm82 parents highlight patterns of introgression and haplotype structure. Comparisons against the US germplasm collection show placement of the sequenced accessions relative to global soybean diversity. Analysis of a pan-gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found approximately 40-42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and approximately 32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci. For each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for the domestication and improvement of soybean, serving as a basis for future research and crop improvement efforts for this important crop species.
Publisher: Cold Spring Harbor Laboratory
Date: 25-02-2022
DOI: 10.1101/2022.02.23.481560
Abstract: Bread wheat is one of humanity’s most important staple crops, characterized by a large and complex genome with a high level of gene presence/absence variation between cultivars, hampering genomic approaches for crop improvement. With the growing global population and the increasing impact of climate change on crop yield, there is an urgent need to apply genomic approaches to accelerate wheat breeding. With recent advances in DNA sequencing technology, a growing number of high-quality reference genomes are becoming available, reflecting the genetic content of a diverse range of cultivars. However, information on the presence or absence of genomic regions has been hard to visualize and interrogate due to the size of these genomes and the lack of suitable bioinformatics tools. To address this limitation, we have produced a wheat pangenome graph maintained within an online database to facilitate interrogation and comparison of wheat cultivar genomes. The database allows users to visualize regions of the pangenome to assess presence/absence variation between bread wheat genomes. Database URL: www.appliedbioinformatics.com.au/wheat_panache
Publisher: Elsevier BV
Date: 10-2022
DOI: 10.1016/J.PLAPHY.2022.07.027
Abstract: Extreme weather events have become more frequent, increasing crop yield fluctuations in many regions and thus the risk to global food security. Breeding crop cultivars with improved tolerance to a combination of abiotic stresses is an effective solution to counter the adverse impact of climate change. The ever-increasing genomic data and analytical tools provide unprecedented opportunities to mine genes with tolerance to multiple abiotic stresses through bioinformatics analysis. We undertook an integrated meta-analysis using 260 barley transcriptome datasets related to drought, salt, heat, cold, and waterlogging stresses. A total of 223 shared differentially expressed genes (DEGs) were identified in response to the five abiotic stresses, and these were significantly enriched in 'glutathione metabolism' and 'monoterpenoid biosynthesis' pathways. Using weighted gene co-expression network analysis (WGCNA), we further identified 15 hub genes (e.g., MYB, WRKY, NADH, and GST4) and selected the GST4 gene for functional validation. HvGST4 overexpression in Arabidopsis thaliana enhanced the tolerance to multiple abiotic stresses, likely through increasing the content of glutathione to scavenge reactive oxygen species and alleviate cell membrane peroxidation. Furthermore, we showed that virus-induced gene silencing (VIGS) of HvGST4 in barley leaves exacerbated cell membrane peroxidation under five abiotic stresses, reducing tolerance to multiple abiotic stresses. Our study provides a new solution for identifying genes with tolerance to multiple abiotic stresses based on meta-analysis, which could contribute to breeding new varieties adapted genetically to adverse environmental conditions.
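In the simplest reading, genes "shared" across the five stresses are the intersection of the per-stress DEG lists. The sketch below assumes DEG calling has already been done for each stress; the gene identifiers are hypothetical, and the study's meta-analysis may have applied additional criteria.

```python
def shared_degs(deg_sets: dict[str, set[str]]) -> set[str]:
    """Genes called differentially expressed under every condition."""
    sets = list(deg_sets.values())
    if not sets:
        return set()
    # Start from a copy of the first set and intersect it with the rest.
    return set(sets[0]).intersection(*sets[1:])


# Hypothetical DEG calls per stress, for illustration only.
degs = {
    "drought": {"geneA", "geneB", "geneC"},
    "salt": {"geneA", "geneB", "geneD"},
    "heat": {"geneA", "geneB"},
    "cold": {"geneA", "geneB", "geneE"},
    "waterlogging": {"geneB", "geneA", "geneF"},
}
common = shared_degs(degs)  # equals {"geneA", "geneB"}
```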
Publisher: MDPI AG
Date: 23-04-2021
Abstract: Sulfate transporters (SULTRs), also known as H⁺/SO₄²⁻ symporters, play a key role in sulfate transport, plant growth and stress responses. However, the evolutionary relationships and functional differentiation of SULTRs in Gramineae crops are rarely reported. Here, 111 SULTRs were retrieved from the genomes of 10 Gramineae species, including Brachypodium distachyon, Hordeum vulgare, Setaria italica, Sorghum bicolor, Zea mays, Oryza barthii, Oryza rufipogon, Oryza glaberrima and Oryza sativa (Oryza sativa ssp. indica and Oryza sativa ssp. japonica). The SULTRs were clustered into five clades based on a phylogenetic analysis. Synteny analysis indicates that whole-genome duplication/segmental duplication and tandem duplication events were essential in the expansion of the SULTR family. We further found that different clades and orthologous groups of SULTRs were under strong purifying selection. Expression analysis showed that rice SULTRs with high-affinity transporters are associated with the functions of sulfate uptake and transport during rice seedling development. Furthermore, using Oryza sativa ssp. indica as a model species, we found that OsiSULTR10 was significantly upregulated under salt stress, while OsiSULTR3 and OsiSULTR12 showed remarkable upregulation under high temperature, low-selenium and drought stresses. OsiSULTR3 and OsiSULTR9 were upregulated under both low-selenium and high-selenium stresses. This study illustrates the expression and evolutionary patterns of the SULTR family in Gramineae species, which will facilitate further studies of SULTRs in other Gramineae species.
Publisher: Cold Spring Harbor Laboratory
Date: 13-10-2021
DOI: 10.1101/2021.10.12.464159
Abstract: Here, we present association and linkage analysis of 985 wild, landrace and cultivar soybean accessions in a pan genomic dataset to characterize the major high-protein/low-oil associated locus cqProt-003 located on chromosome 20. A significant trait-associated region within a 173 kb linkage block was identified and variants in the region were characterised, identifying 34 high confidence SNPs, 4 insertions, 1 deletion and a larger 304 bp structural variant in the high-protein haplotype. Trinucleotide tandem repeats of variable length present in the third exon of gene 20G085100 are strongly correlated with the high-protein phenotype and likely represent causal variation. Structural variation has previously been found in the same gene, for which we report the global distribution of the 304 bp deletion and have identified additional nested variation present in high-protein individuals. Mapping variation at the cqProt-003 locus across demographic groups suggests that the high-protein haplotype is common in wild accessions (94.7%), rare in landraces (10.6%) and near absent in cultivated breeding pools (4.1%), suggesting its decrease in frequency primarily correlates with domestication and continued during subsequent improvement. However, the variation that has persisted in under-utilized wild and landrace populations holds high breeding potential for breeders willing to forego seed oil to maximise protein content. The results of this study include the identification of distinct haplotype structures within the high-protein population, and a broad characterization of the genomic context and linkage patterns of cqProt-003 across global populations, supporting future functional characterisation and modification. The major soy protein QTL, cqProt-003, was analysed for haplotype diversity and global distribution, and the results indicate that a 304 bp deletion and variable tandem repeats in protein-coding regions are likely causal candidates.
Publisher: Springer Science and Business Media LLC
Date: 10-08-2023
DOI: 10.1038/S41597-023-02434-2
Abstract: Wild barley accessions from “Evolution Canyon (EC)” in Mount Carmel, Israel, are ideal models for cereal chromosome evolution studies. Here, the wild barley EC_S1 is from the south slope, with higher daily temperatures and drought, while EC_N1 is from the north slope, with a cooler climate and higher relative humidity; these contrasting environments result in differentiated selection. We assembled a 5.03 Gb genome with a contig N50 of 3.53 Mb for wild barley EC_S1 and a 5.05 Gb genome with a contig N50 of 3.45 Mb for EC_N1 using 145 Gb and 160.0 Gb of Illumina sequencing data, 295.6 Gb and 285.35 Gb of Nanopore sequencing data, and 555.1 Gb and 514.5 Gb of Hi-C sequencing data, respectively. BUSCO and CEGMA evaluations indicated highly complete assemblies. Using full-length transcriptome data, we predicted 39,179 and 38,373 high-confidence genes in EC_S1 and EC_N1, of which 93.6% and 95.2% were functionally annotated, respectively. We annotated repetitive elements and non-coding RNAs. These two wild barley genome assemblies will provide a rich gene pool for domesticated barley.
Publisher: Springer Science and Business Media LLC
Date: 26-01-2023
DOI: 10.1186/S13059-023-02861-9
Abstract: A pangenome aims to capture the complete genetic diversity within a species and reduce the bias in genetic analysis inherent in using a single reference genome. However, the current linear format of most plant pangenomes limits the presentation of position information for novel sequences. Graph pangenomes have been developed to overcome this limitation. However, bioinformatics analysis tools for graph-format genomes are lacking. To overcome this problem, we develop a novel strategy for pangenome construction and a downstream pangenome analysis pipeline (PSVCP) that captures genetic variants’ position information while maintaining a linearized layout. Using PSVCP, we construct a high-quality rice pangenome using 12 representative rice genomes and analyze an international rice panel with 413 diverse accessions using the pangenome as the reference. We show that PSVCP successfully identifies causal structural variations for rice grain weight and plant height. Our results provide insights into rice population structure and genomic diversity. We characterize a new locus (qPH8-1) associated with plant height on chromosome 8 that was undetected by the SNP-based genome-wide association study (GWAS). Our results demonstrate that the pangenome constructed by our pipeline combined with a presence and absence variation-based GWAS can provide additional power for genomic and genetic analysis. The pangenome constructed in this study and the associated genome sequence and genetic variant data provide valuable genomic resources for rice genomics research and improvement in the future.
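A presence/absence variation (PAV) based GWAS asks whether carrying a sequence is associated with a trait. As a deliberately simplified sketch, not the PSVCP pipeline, the per-variant comparison can be illustrated with Welch's t statistic between carriers and non-carriers. Real analyses typically also model population structure and kinship, which this sketch ignores, and the data below are hypothetical.

```python
import math
from statistics import mean, variance


def pav_welch_t(presence: list[int], trait: list[float]) -> float:
    """Welch's t statistic comparing trait values of accessions that carry
    a sequence (presence == 1) with those that lack it (presence == 0)."""
    if len(presence) != len(trait):
        raise ValueError("one presence call per trait value is required")
    present = [t for p, t in zip(presence, trait) if p == 1]
    absent = [t for p, t in zip(presence, trait) if p == 0]
    if len(present) < 2 or len(absent) < 2:
        raise ValueError("need at least two accessions in each group")
    # statistics.variance is the sample variance (n - 1 denominator).
    se = math.sqrt(variance(present) / len(present) + variance(absent) / len(absent))
    return (mean(present) - mean(absent)) / se


# Hypothetical PAV calls and plant heights, for illustration only.
calls = [1, 1, 1, 0, 0, 0]
heights = [10.0, 11.0, 12.0, 1.0, 2.0, 3.0]
t_stat = pav_welch_t(calls, heights)  # large positive: carriers are taller
```

Converting the statistic to a p-value and correcting for the many variants tested would be needed before calling any association significant.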
Publisher: Wiley
Date: 27-07-2023
DOI: 10.1111/PBI.14077
Publisher: Oxford University Press (OUP)
Date: 30-07-2021
Abstract: Domestication and breeding have reshaped the genomic architecture of chicken, but the retention and loss of genomic elements during these evolutionary processes remain unclear. We present the first chicken pan-genome constructed using 664 individuals, which identified approximately 66.5 Mb of additional sequences that are absent from the reference genome (GRCg6a). The constructed pan-genome encodes 20,491 predicted protein-coding genes, and conserved genes show higher expression levels than dispensable genes. Presence/absence variation (PAV) analyses demonstrated that gene PAV in chicken was shaped by selection, genetic drift, and hybridization. PAV-based genome-wide association studies identified numerous candidate mutations related to growth, carcass composition, meat quality, or physiological traits. Among them, a deletion in the promoter region of IGF2BP1 affecting chicken body size is reported, which is supported by functional studies and additional samples. This is the first report of the causal variant underlying the repeatedly reported chicken body-size quantitative trait locus on chromosome 27. Therefore, the chicken pan-genome is a useful resource for biological discovery and breeding. It improves our understanding of chicken genome diversity and provides materials to unveil the evolutionary history of chicken domestication.
Publisher: Wiley
Date: 29-05-2022
DOI: 10.1002/TPG2.20221
Abstract: Bread wheat (Triticum aestivum L.) is one of humanity's most important staple crops, characterized by a large and complex genome with a high level of gene presence–absence variation (PAV) between cultivars, hampering genomic approaches for crop improvement. With the growing global population and the increasing impact of climate change on crop yield, there is an urgent need to apply genomic approaches to accelerate wheat breeding. With recent advances in DNA sequencing technology, a growing number of high‐quality reference genomes are becoming available, reflecting the genetic content of a diverse range of cultivars. However, information on the presence or absence of genomic regions has been hard to visualize and interrogate because of the size of these genomes and the lack of suitable bioinformatics tools. To address this limitation, we have produced a wheat pangenome graph maintained within an online database to facilitate interrogation and comparison of wheat cultivar genomes. The database allows users to visualize regions of the pangenome to assess PAV between bread wheat genomes.
Publisher: Springer Science and Business Media LLC
Date: 14-04-2021
Publisher: Wiley
Date: 28-12-2022
DOI: 10.1002/TPG2.20184
Abstract: In the last decade, more than 70 quantitative trait loci (QTL) related to soybean [Glycine max (L.) Merr.] partial resistance (PR) against Phytophthora sojae have been identified by genome-wide association studies (GWAS). However, most of them have either a minor effect on the resistance level or are specific to a single phenotypic variable or one isolate, thereby limiting their use in breeding programs. In this study, we have used an analytical approach combining (a) the phenotypic characterization of a diverse panel of 357 soybean accessions for resistance to P. sojae captured through a single variable, corrected dry weight; (b) a new hydroponic assay allowing the inoculation of a combination of P. sojae isolates covering the spectrum of commercially relevant Rps genes; and (c) exhaustive genotyping through whole-genome resequencing (WGS). This led to the identification of a novel P. sojae resistance QTL with a relatively major effect compared with the previously reported QTL. The QTL interval, spanning ∼500 kb on chromosome (Chr) 15, does not colocalize with previously reported QTL for P. sojae resistance. Plants carrying the favorable allele at this QTL were 60% more resistant. Eight genes were found to reside in the linkage disequilibrium (LD) block containing the peak single-nucleotide polymorphism (SNP), including Glyma.15G217100, which encodes a major latex protein (MLP)-like protein, with a functional annotation related to pathogen resistance. Expression analysis of Glyma.15G217100 indicated that it was nearly eight times more highly expressed in a group of plant introductions (PIs) carrying the resistant (R) allele compared with those carrying the susceptible (S) allele within a short period after inoculation. These results offer new and valuable options to develop improved soybean cultivars with broad resistance to P. sojae through marker-assisted selection.
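An LD block is delimited by pairwise linkage disequilibrium between SNPs. One common approximation for unphased genotypes is r², the squared Pearson correlation of allele dosages (0, 1 or 2 copies of an allele) at two SNPs; haplotype-based r² would require phased data. A minimal sketch with hypothetical genotypes, not the procedure used in the study:

```python
def ld_r2(g1: list[int], g2: list[int]) -> float:
    """Squared correlation of allele dosages (0, 1, 2) at two biallelic SNPs,
    measured across the same individuals in the same order."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("genotype vectors must have equal length >= 2")
    m1 = sum(g1) / n
    m2 = sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("a monomorphic SNP has undefined r^2")
    return cov * cov / (v1 * v2)


# Hypothetical dosages for four individuals at two SNPs.
perfect = ld_r2([0, 1, 2, 0], [0, 1, 2, 0])     # complete LD
independent = ld_r2([0, 2, 0, 2], [0, 0, 2, 2])  # no correlation
```

Squaring the correlation makes r² ignore which allele is coded as 1, so perfectly anti-correlated dosages also give r² = 1.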
Publisher: Cold Spring Harbor Laboratory
Date: 02-10-2023
Publisher: Frontiers Media SA
Date: 21-03-2023
DOI: 10.3389/FPLS.2023.1147946
Abstract: Yellowhorn (Xanthoceras sorbifolia) is a species of deciduous tree that is native to Northern and Central China, including the Loess Plateau. The yellowhorn tree is a hardy plant, tolerating a wide range of growing conditions, and is often grown for ornamental purposes in parks, gardens, and other landscaped areas. The seeds of yellowhorn are edible and have high oil and fatty acid contents, making it an ideal plant for oil production. However, the mechanisms underlying its ability to adapt to extreme environments and the genetic basis of oil synthesis remain to be elucidated. In this study, we report a high-quality, near gap-less yellowhorn genome assembly which has the highest genome continuity (contig N50 of 32.5 Mb). Comparative genomics analysis showed that 1,237 and 231 gene families under expansion and the yellowhorn-specific gene family NB-ARC were enriched in photosynthesis and root cap development, which may contribute to the environmental adaptation and abiotic stress resistance of yellowhorn. A 3-ketoacyl-CoA thiolase (KAT) gene (Xso_LG02_00600) was identified under positive selection, which may be associated with variations of seed oil content among different yellowhorn cultivars. This study provides insights into environmental adaptation and seed oil content variations of yellowhorn to accelerate its genetic improvement.
Publisher: Wiley
Date: 19-08-2022
DOI: 10.1111/NPH.17658
Publisher: Wiley
Date: 16-11-2022
DOI: 10.1111/PBI.13917
Abstract: Divergent selection of populations in contrasting environments leads to functional genomic divergence. However, the genomic architecture underlying heterogeneous genomic differentiation remains poorly understood. Here, we de novo assembled two high-quality wild barley (Hordeum spontaneum K. Koch) genomes and examined genomic differentiation and gene expression patterns under abiotic stress in two populations. These two populations had a shared ancestry and originated in close geographic proximity but experienced different selective pressures due to their contrasting micro-environments. We identified structural variants that may have played significant roles in affecting genes potentially associated with well-differentiated phenotypes such as flowering time and drought response between two wild barley genomes. Among them, a 29-bp insertion into the promoter region formed a cis-regulatory element in the HvWRKY45 gene, which may contribute to enhanced tolerance to drought. A single SNP mutation in the promoter region may influence HvCO5 expression and be putatively linked to local flowering time adaptation. We also revealed significant genomic differentiation between the two populations with ongoing gene flow. Our results indicate that SNPs and small SVs link to genetic differentiation at the gene level through local adaptation and are maintained through divergent selection. In contrast, large chromosome inversions may have shaped the heterogeneous pattern of genomic differentiation along the chromosomes by suppressing chromosome recombination and gene flow. Our research offers novel insights into the genomic basis underlying local adaptation and provides valuable resources for the genetic improvement of cultivated barley.
Publisher: Springer Science and Business Media LLC
Date: 06-02-2023
DOI: 10.1186/S12915-022-01503-Z
Abstract: Gene duplication is a prevalent phenomenon and a major driving force underlying genome evolution. The process leading to the fixation of gene duplicates following duplication is critical to understanding how genomes evolve but remains only partially understood. Most previous studies on gene retention are based on gene duplicate analyses in a single reference genome. No population-based comparative gene retention analysis has been performed to date. Taking advantage of recently published genomic data in Triticeae, we dissected a divergent homogentisate phytyltransferase (HPT2) lineage caught in the middle stage of gene fixation following duplication. The presence/absence of HPT2 in barley (diploid), wild emmer (tetraploid), and bread wheat (hexaploid) pangenome lines appears to be associated with gene dosage constraint and environmental adaptation. Based on these observations, we adopted a phylogeny-based orthology inference approach and performed comparative gene retention analyses across barley, wild emmer, and bread wheat. This led to the identification of 326 HPT2-pattern-like genes at whole genome scale, representing a pool of gene duplicates in the middle stage of gene fixation. The majority of these HPT2-pattern-like genes were identified as small-scale duplicates, such as dispersed, tandem, and proximal duplications. Natural selection analyses showed that HPT2-pattern-like genes have experienced relaxed selection pressure, which is generally accompanied by partial positive selection and transcriptional divergence. Functional enrichment analyses showed that HPT2-pattern-like genes are over-represented in molecular-binding and defense response functions, supporting the potential role of environmental adaptation during gene retention. We also observed that gene duplicates from larger gene families are more likely to be lost, implying a gene dosage constraint effect.
Further comparative gene retention analysis in barley and bread wheat pangenome lines revealed combined effects of species-specific selection and gene dosage constraint. Comparative gene retention analyses at the population level support gene dosage constraint, environmental adaptation, and species-specific selection as three factors that may affect gene retention following gene duplication. Our findings shed light on the evolutionary process leading to the retention of newly formed gene duplicates and will greatly improve our understanding of genome evolution via duplication.
Publisher: MDPI AG
Date: 22-01-2018
DOI: 10.3390/GENES9010050
Publisher: Wiley
Date: 24-06-2022
DOI: 10.1002/TPG2.20109
Abstract: The gene content of plants varies between individuals of the same species due to gene presence/absence variation, and selection can alter the frequency of specific genes in a population. Selection during domestication and breeding will modify the genomic landscape, though the nature of these modifications is only understood for specific genes or on a more general level (e.g., by a loss of genetic diversity). Here we have assembled and analyzed a soybean (Glycine spp.) pangenome representing more than 1,000 soybean accessions derived from the USDA Soybean Germplasm Collection, including both wild and cultivated lineages, to assess genomewide changes in gene and allele frequency during domestication and breeding. We identified 3,765 genes that are absent from the Lee reference genome assembly and assessed the presence/absence of all genes across this population. In addition to a loss of genetic diversity, we found a significant reduction in the average number of protein‐coding genes per individual during domestication and subsequent breeding, though with some genes and allelic variants increasing in frequency associated with selection for agronomic traits. This analysis provides a genomic perspective of domestication and breeding in this important oilseed crop.
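Given a gene presence/absence matrix, the population frequency of each gene and the number of genes per individual follow directly. A minimal sketch, assuming a dictionary that maps gene IDs to 0/1 calls per accession in a fixed order; treating genes present in every accession as "core" is a common convention adopted here as an assumption, and the matrix is hypothetical.

```python
def pav_summary(pav: dict[str, list[int]]):
    """Summarise a gene presence/absence (PAV) matrix.

    pav maps gene ID -> list of 0/1 calls, one per accession, all in the
    same accession order. Returns (frequency per gene, set of genes present
    in every accession, mean number of genes per accession).
    """
    if not pav:
        raise ValueError("empty PAV matrix")
    n_acc = len(next(iter(pav.values())))
    if n_acc == 0 or any(len(calls) != n_acc for calls in pav.values()):
        raise ValueError("every gene needs one call per accession")
    freq = {gene: sum(calls) / n_acc for gene, calls in pav.items()}
    core = {gene for gene, f in freq.items() if f == 1.0}
    # Gene count of each accession is the column sum of the matrix.
    per_accession = [sum(calls[i] for calls in pav.values()) for i in range(n_acc)]
    mean_genes = sum(per_accession) / n_acc
    return freq, core, mean_genes


# Hypothetical matrix: three genes scored in four accessions.
pav = {"g1": [1, 1, 1, 1], "g2": [1, 0, 1, 0], "g3": [0, 0, 0, 1]}
freq, core, mean_genes = pav_summary(pav)
```

Running the function separately on the columns for wild, landrace and cultivated accessions would give per-group mean gene counts, the quantity whose decline over domestication and breeding the abstract reports.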
Publisher: American Association for the Advancement of Science (AAAS)
Date: 2023
Publisher: Springer Science and Business Media LLC
Date: 08-04-2022
DOI: 10.1186/S12870-022-03559-Z
Abstract: Recent growth in crop genomic and trait data has opened opportunities for the application of novel approaches to accelerate crop improvement. Machine learning and deep learning are at the forefront of prediction-based data analysis. However, few approaches for genotype to phenotype prediction compare machine learning with deep learning and further interpret the models that support the predictions. This study uses genome-wide molecular markers and traits across 1110 soybean individuals to develop accurate prediction models. For 13/14 sets of predictions, XGBoost or random forest outperformed deep learning models in prediction performance. Top-ranked SNPs by F-score were identified from XGBoost, and further investigation found that they overlapped with significantly associated loci identified by GWAS and in previous literature. Feature importance rankings were used to reduce marker input by up to 90%, and subsequent models maintained or improved their prediction performance. These findings support interpretable machine learning as an approach for genomic-based prediction of traits in soybean and other crops.
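Reducing marker input with feature-importance rankings amounts to keeping the top-ranked markers and retraining on them. The sketch below shows only the selection step, assuming importance scores (such as the XGBoost F-scores mentioned above) have already been computed; the marker IDs and scores are hypothetical, and model retraining is out of scope.

```python
def top_features(importance: dict[str, float], keep_fraction: float) -> list[str]:
    """Return the marker IDs with the highest importance scores.

    importance: marker ID -> importance score (higher means more important).
    keep_fraction: share of markers to retain, in (0, 1].
    """
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    k = max(1, round(len(importance) * keep_fraction))
    # Stable sort: markers with tied scores keep their input order.
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


# Hypothetical scores for 20 markers; keeping 10% drops 90% of the input.
scores = {f"snp{i}": float(i) for i in range(20)}
kept = top_features(scores, 0.1)  # ['snp19', 'snp18']
```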
No related grants have been discovered for HAIFEI HU.