ARDC Research Link Australia

Publication

Bioinformatics and plant breeding.

Publisher: CABI

Date: 2020

DOI: 10.1079/9781789240214.0071

Publication

The application of pangenomics and machine learning in genomic selection in plants.

Publisher: Wiley

Date: 20-07-2021

DOI: 10.1002/TPG2.20112

Abstract: Genomic selection approaches have increased the speed of plant breeding, leading to growing crop yields over the last decade. However, climate change is impacting current and future yields, resulting in the need to further accelerate breeding efforts to cope with these changing conditions. Here we present approaches to accelerate plant breeding and incorporate nonadditive effects in genomic selection by applying state-of-the-art machine learning approaches. These approaches are made more powerful by the inclusion of pangenomes, which represent the entire genome content of a species. Understanding the strengths and limitations of machine learning methods, compared with more traditional genomic selection efforts, is paramount to the successful application of these methods in crop breeding. We describe examples of genomic selection and pangenome-based approaches in crop breeding, discuss machine learning-specific challenges, and highlight the potential for the application of machine learning in genomic selection. We believe that careful implementation of machine learning approaches will support crop improvement to help counter the adverse outcomes of climate change on crop production.
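Machine learning models for genomic selection are usually judged against an additive baseline. A minimal sketch of one such baseline follows: ridge regression of phenotypes on a SNP marker matrix (similar in spirit to rrBLUP), scored by the correlation between predicted and observed values on held-out lines. The 0/1/2 marker coding, the penalty `lam` and the simulated data are assumptions for illustration; this is not the review's own method.

```python
import numpy as np


def ridge_marker_effects(X, y, lam=1.0):
    """Estimate additive SNP effects by ridge regression (rrBLUP-like baseline).

    X   : (n_lines, n_markers) genotypes, assumed coded 0/1/2
    y   : (n_lines,) phenotypes
    lam : ridge penalty; illustrative value, normally tuned by cross-validation
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean  # centre markers and phenotype so the intercept is unpenalised
    yc = y - y_mean
    n_markers = X.shape[1]
    # Closed-form ridge solution: (Xc'Xc + lam*I) beta = Xc'yc
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_markers), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


def predict_gebv(X, intercept, beta):
    """Genomic estimated breeding values (GEBVs) for genotyped candidates."""
    return intercept + X @ beta


# Simulated, purely additive toy data (no real breeding population involved)
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 50)).astype(float)
true_beta = rng.normal(0.0, 0.5, size=50)
y = X @ true_beta + rng.normal(0.0, 1.0, size=200)

# Train on 150 lines, then predict the 50 held-out "selection candidates"
b0, beta = ridge_marker_effects(X[:150], y[:150], lam=5.0)
gebv = predict_gebv(X[150:], b0, beta)
r = np.corrcoef(gebv, y[150:])[0, 1]  # predictive ability
print(f"predictive ability r = {r:.2f}")
```

Because this toy phenotype is purely additive, a linear model is well suited to it; nonadditive effects, which the abstract targets with machine learning, are exactly what such a baseline cannot capture.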

Publication

Pangenomes as a Resource to Accelerate Breeding of Under-Utilised Crop Species

Publisher: MDPI AG

Date: 28-02-2022

DOI: 10.3390/IJMS23052671

Abstract: Pangenomes are a rich resource to examine the genomic variation observed within a species or genus, supporting population genetics studies, with applications for the improvement of crop traits. Major crop species such as maize (Zea mays), rice (Oryza sativa), Brassica (Brassica spp.), and soybean (Glycine max) have had pangenomes constructed and released, and this has led to the discovery of valuable genes associated with disease resistance and yield components. However, pangenome data are not available for many less prominent crop species that are currently under-utilised. Despite many under-utilised species being important food sources in regional populations, the scarcity of genomic data for these species hinders their improvement. Here, we assess several under-utilised crops and review the pangenome approaches that could be used to build resources for their improvement. Many of these under-utilised crops are cultivated in arid or semi-arid environments, suggesting that novel genes related to drought tolerance may be identified and used for introgression into related major crop species. In addition, we discuss how previously collected data could be used to enrich pangenome functional analysis in genome-wide association studies (GWAS) based on studies in major crops. Considering the technological advances in genome sequencing, pangenome references for under-utilised species are becoming more obtainable, offering the opportunity to identify novel genes related to agro-morphological traits in these species.

Publication

The Global Assessment of Oilseed Brassica Crop Species Yield, Yield Stability and the Underlying Genetics

Publisher: MDPI AG

Date: 17-10-2022

DOI: 10.3390/PLANTS11202740

Abstract: The global demand for oilseeds is increasing along with the human population. The family of Brassicaceae crops is no exception, typically harvested as a valuable source of oil, rich in beneficial molecules important for human health. The global capacity for improving Brassica yield has steadily risen over the last 50 years, with the major crop Brassica napus (rapeseed, canola) production increasing to ~72 Mt in 2020. In contrast, the production of Brassica mustard crops has fluctuated, rarely improving in farming efficiency. The drastic increase in global yield of B. napus is largely due to the demand for a stable source of cooking oil. Furthermore, with the adoption of highly efficient farming techniques, yield enhancement programs, breeding programs, the integration of high-throughput phenotyping technology and establishing the underlying genetics, B. napus yields have increased by fold since 1978. Yield stability has been improved with new management strategies targeting diseases and pests, as well as by understanding the complex interaction of environment, phenotype and genotype. This review assesses the global yield and yield stability of agriculturally important oilseed Brassica species and discusses how contemporary farming and genetic techniques have driven improvements.

Publication

DNABERT-based explainable lncRNA identification in plant genome assemblies

Publisher: Cold Spring Harbor Laboratory

Date: 10-02-2022

DOI: 10.1101/2022.02.09.479647

Abstract: Long non-coding ribonucleic acids (lncRNAs) have been shown to play an important role in plant gene regulation, being involved in both epigenetic and transcript regulation. LncRNAs are transcripts longer than 200 nucleotides that are not translated into functional proteins but can be translated into small peptides. Machine learning and deep learning models have predominantly used transcriptome data with manually defined features to detect lncRNAs; however, they often underrepresent the abundance of lncRNAs and can be biased in their detection. Here we present a study using Natural Language Processing (NLP) models to identify plant lncRNAs from genomic sequences rather than transcriptomic data. The NLP models were trained to predict lncRNAs for seven model and crop species (Zea mays, Arabidopsis thaliana, Brassica napus, Brassica oleracea, Brassica rapa, Glycine max and Oryza sativa) using publicly available genomic references. We demonstrated that lncRNAs can be accurately predicted from genomic sequences, and that genome assembly quality affects the accuracy of lncRNA identification. Furthermore, we demonstrated that the NLP models are applicable for cross-species prediction as they could predict lncRNAs from a species not used to train the model, with an average of 61% accuracy. Finally, we show that the models can be interpreted using explainable artificial intelligence to identify motifs important for lncRNA prediction and that these motifs were frequently present flanking the lncRNA sequence. We demonstrate for the first time the identification of lncRNAs from genomic sequences, instead of transcriptome sequences, allowing the identification of lowly expressed lncRNAs. A deep learning model (natural language processing) was employed for the prediction of lncRNAs in two monocot and five dicot plant species. We used explainable machine learning to extract the genomic motifs associated with lncRNA identification, highlighting potentially conserved structures present in more than one plant species.
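Treating a genomic sequence as language starts with tokenization. DNABERT, the model named in the title, splits DNA into overlapping k-mers; a minimal sketch of that step follows, assuming stride 1 and a hypothetical vocabulary layout (five special tokens first, then all 4^k k-mers in lexicographic order). It is illustrative only and is not the study's pipeline.

```python
from itertools import product


def kmer_tokens(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int) -> dict[str, int]:
    """Hypothetical vocabulary: special tokens first, then all 4**k k-mers."""
    specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {tok: idx for idx, tok in enumerate(specials + kmers)}


def encode(seq: str, k: int, vocab: dict[str, int]) -> list[int]:
    """Map a sequence to token ids; k-mers containing N (or other symbols) become [UNK]."""
    ids = [vocab["[CLS]"]]
    ids += [vocab.get(tok, vocab["[UNK]"]) for tok in kmer_tokens(seq, k)]
    ids.append(vocab["[SEP]"])
    return ids


print(kmer_tokens("ATGCGTAC", k=3))  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA', 'TAC']
vocab3 = build_vocab(3)
print(len(vocab3))                   # 69 (5 special tokens + 64 3-mers)
print(encode("ATGN", 3, vocab3))     # [2, 19, 1, 3]
```

Overlapping k-mers keep every base in several tokens, so local context around each position is preserved at the cost of longer token sequences.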

Publication

Resources for image-based high-throughput phenotyping in crops and data sharing challenges

Publisher: Oxford University Press (OUP)

Date: 28-06-2021

DOI: 10.1093/PLPHYS/KIAB301

Abstract: High-throughput phenotyping (HTP) platforms are capable of monitoring the phenotypic variation of plants through multiple types of sensors, such as red, green and blue (RGB) cameras, hyperspectral sensors, and computed tomography, which can be associated with environmental and genotypic data. Because of the wide range of information provided, HTP datasets represent a valuable asset to characterize crop phenotypes. As HTP becomes widely employed with more tools and data being released, it is important that researchers are aware of these resources and how they can be applied to accelerate crop improvement. Researchers may exploit these datasets either for phenotype comparison or employ them as a benchmark to assess tool performance and to support the development of tools that are better at generalizing between different crops and environments. In this review, we describe the use of image-based HTP for yield prediction, root phenotyping, development of climate-resilient crops, detecting pathogen and pest infestation, and quantitative trait measurement. We emphasize the need for researchers to share phenotypic data, and offer a comprehensive list of available datasets to assist crop breeders and tool developers to leverage these resources in order to accelerate crop breeding.

Publication

Genetic and Genomic Resources for Soybean Breeding Research

Publisher: MDPI AG

Date: 27-04-2022

DOI: 10.3390/PLANTS11091181

Abstract: Soybean (Glycine max) is a legume species of significant economic and nutritional value. The yield of soybean continues to increase with the breeding of improved varieties, and this is likely to continue with the application of advanced genetic and genomic approaches for breeding. Genome technologies continue to advance rapidly, with an increasing number of high-quality genome assemblies becoming available. With accumulating data from marker arrays and whole-genome resequencing, studying variations between individuals and populations is becoming increasingly accessible. Furthermore, the recent development of soybean pangenomes has highlighted the significant structural variation between individuals, together with knowledge of what has been selected for or lost during domestication and breeding, information that can be applied for the breeding of improved cultivars. Because of this, resources such as genome assemblies, SNP datasets, pangenomes and associated databases are becoming increasingly important for research underlying soybean crop improvement.

Publication

Maize Yield Prediction at an Early Developmental Stage Using Multispectral Images and Genotype Data for Preliminary Hybrid Selection

Publisher: MDPI AG

Date: 04-10-2021

DOI: 10.3390/RS13193976

Abstract: Assessing crop production in the field often requires breeders to wait until the end of the season to collect yield-related measurements, limiting the pace of the breeding cycle. Early prediction of crop performance can reduce this constraint by allowing breeders more time to focus on the highest-performing varieties. Here, we present a multimodal deep learning model for predicting the performance of maize (Zea mays) at an early developmental stage, offering the potential to accelerate crop breeding. We employed multispectral images and eight vegetation indices, collected by an uncrewed aerial vehicle approximately 60 days after sowing, over three consecutive growing cycles (2017, 2018 and 2019). The multimodal deep learning approach was used to integrate field management and genotype information with the multispectral data, providing context to the conditions that the plants experienced during the trial. Model performance was assessed using holdout data, in which the model accurately predicted the yield (RMSE 1.07 t/ha, a relative RMSE of 7.60% of 16 t/ha, and R2 score 0.73) and identified the majority of high-yielding varieties, outperforming previously published models for early yield prediction. The inclusion of vegetation indices was important for model performance, with the normalized difference vegetation index (NDVI) and the green normalized difference vegetation index (GNDVI) contributing the most. The model provides a decision support tool, identifying promising lines early in the field trial.
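The two indices named above and the two reported error metrics have standard definitions: NDVI = (NIR - Red) / (NIR + Red), GNDVI = (NIR - Green) / (NIR + Green), RMSE is the square root of the mean squared error, and R2 = 1 - SS_res / SS_tot. A minimal sketch follows; the band arrays, toy reflectance values and the choice of averaging pixels into one plot-level feature are assumptions for illustration, not the study's code.

```python
import numpy as np


def normalized_difference(a, b):
    """(a - b) / (a + b) per pixel; returns 0 where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    return np.divide(a - b, denom, out=np.zeros_like(denom), where=denom != 0)


def ndvi(nir, red):
    return normalized_difference(nir, red)


def gndvi(nir, green):
    return normalized_difference(nir, green)


def rmse(y_true, y_pred):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up reflectance values for a 2x2-pixel plot
nir = np.array([[0.50, 0.45], [0.40, 0.60]])
red = np.array([[0.10, 0.15], [0.20, 0.20]])
green = np.array([[0.20, 0.15], [0.20, 0.20]])
plot_ndvi = float(ndvi(nir, red).mean())      # one plot-level feature per index
plot_gndvi = float(gndvi(nir, green).mean())

print(rmse([10, 12, 14], [11, 12, 13]))      # about 0.816 (t/ha if inputs are t/ha)
print(r2_score([10, 12, 14], [11, 12, 13]))  # 0.75
```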

Publication

Segmentation of Sandplain Lupin Weeds from Morphologically Similar Narrow-Leafed Lupins in the Field

Publisher: MDPI AG

Date: 29-03-2023

DOI: 10.3390/RS15071817

Abstract: Narrow-leafed lupin (Lupinus angustifolius) is an important dryland crop, providing a protein source in global grain markets. While agronomic practices have successfully controlled many dicot weeds among narrow-leafed lupins, the closely related sandplain lupin (Lupinus cosentinii) has proven difficult to control, reducing yield and harvest quality. Here, we successfully trained a segmentation model to detect sandplain lupins and differentiate them from narrow-leafed lupins under field conditions. The deep learning model was trained using 9171 images collected from a field site in the Western Australian grain belt. Images were collected using an unoccupied aerial vehicle at heights of 4, 10, and 20 m. The dataset was supplemented with images sourced from the WeedAI database, which were collected at 1.5 m. The resultant model had an average precision of 0.86, intersection over union of 0.60, and F1 score of 0.70 for segmenting the narrow-leafed and sandplain lupins across the multiple datasets. Images collected at a closer range and showing plants at an early developmental stage had significantly higher precision and recall scores (p-value < 0.05), indicating image collection methods and plant developmental stages play a substantial role in the model performance. Nonetheless, the model identified 80.3% of the sandplain lupins on average, with a low variation (±6.13%) in performance across the 5 datasets. The results presented in this study contribute to the development of precision weed management systems within morphologically similar crops, particularly for sandplain lupin detection, supporting future narrow-leafed lupin grain yield and quality.
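The precision, recall, intersection over union (IoU) and F1 scores reported above are built from true positive, false positive and false negative counts. A minimal sketch computing them per class from label maps follows. It works at pixel level for a single image pair; the label scheme (0 background, 1 narrow-leafed lupin, 2 sandplain lupin) is hypothetical, and the study's "average precision" may be computed differently (for example averaged over images or as an area under a precision-recall curve), so values from this sketch are not directly comparable.

```python
import numpy as np


def mask_metrics(pred, truth):
    """Pixel-level precision, recall, IoU and F1 for one class's binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.sum(pred & truth))    # predicted and present
    fp = int(np.sum(pred & ~truth))   # predicted but absent
    fn = int(np.sum(~pred & truth))   # present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "iou": iou, "f1": f1}


def per_class_metrics(pred_labels, true_labels, classes):
    """Apply mask_metrics to each class id in a pair of label maps."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    return {c: mask_metrics(pred_labels == c, true_labels == c) for c in classes}


# Hypothetical labels: 0 = background, 1 = narrow-leafed lupin, 2 = sandplain lupin
pred_labels = [[1, 2], [0, 2]]
true_labels = [[1, 2], [2, 0]]
print(per_class_metrics(pred_labels, true_labels, classes=[1, 2]))
```

For a single class, F1 computed this way equals the Dice coefficient, 2TP / (2TP + FP + FN).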

Publication

Plant pangenomics

Publisher: Elsevier BV

Date: 04-2020

DOI: 10.1016/J.PBI.2019.12.005

Abstract: With the assembly of increasing numbers of plant genomes, it is becoming accepted that a single reference assembly does not reflect the gene diversity of a species. The production of pangenomes, which reflect the structural variation and polymorphisms in genomes, enables in-depth comparisons of variation within species or higher taxonomic groups. In this review, we discuss the current and emerging approaches for pangenome assembly, analysis and visualisation. In addition, we consider the potential of pangenomes for applied crop improvement, evolutionary and biodiversity studies. To fully exploit the value of pangenomes it is important to integrate broad information such as phenotypic, environmental, and expression data to gain insights into the role of variable regions within genomes.
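A common first summary of a gene-based pangenome is to classify genes from a presence/absence variation (PAV) matrix as core (present in every individual) or dispensable (present in some), and to track pangenome and core-genome size as individuals are added, which shows whether the gene content is approaching saturation. A minimal sketch follows; the gene names and matrix are hypothetical, and in practice studies differ in thresholds (for example "soft-core" categories) and usually average the size curves over many random orderings of individuals.

```python
def classify_genes(pav: dict[str, list[int]]) -> dict[str, str]:
    """Label each gene 'core' (present in every individual) or 'dispensable'
    (present in some but not all) from a 0/1 presence/absence matrix."""
    labels = {}
    for gene, calls in pav.items():
        if not calls or any(c not in (0, 1) for c in calls):
            raise ValueError(f"{gene}: expected a non-empty list of 0/1 calls")
        n_present = sum(calls)
        if n_present == 0:
            raise ValueError(f"{gene}: absent from all individuals")
        labels[gene] = "core" if n_present == len(calls) else "dispensable"
    return labels


def size_curves(pav: dict[str, list[int]], n_individuals: int):
    """Pangenome and core-genome size as individuals 1..n are added in order."""
    pan, core = [], []
    for i in range(1, n_individuals + 1):
        pan.append(sum(1 for calls in pav.values() if any(calls[:i])))
        core.append(sum(1 for calls in pav.values() if all(calls[:i])))
    return pan, core


# Hypothetical PAV matrix: one row per gene, one column per individual
pav = {
    "geneA": [1, 1, 1],
    "geneB": [1, 0, 1],
    "geneC": [0, 0, 1],
}
print(classify_genes(pav))  # {'geneA': 'core', 'geneB': 'dispensable', 'geneC': 'dispensable'}
print(size_curves(pav, 3))  # ([2, 2, 3], [2, 1, 1])
```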

Publication

Expanding Gene-Editing Potential in Crop Improvement with Pangenomes

Publisher: MDPI AG

Date: 18-02-2022

DOI: 10.3390/IJMS23042276

Abstract: Pangenomes aim to represent the complete repertoire of the genome diversity present within a species or cohort of species, capturing the genomic structural variance between individuals. This genomic information coupled with phenotypic data can be applied to identify genes and alleles involved with abiotic stress tolerance, disease resistance, and other desirable traits. The characterisation of novel structural variants from pangenomes can support genome editing approaches such as Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR associated protein Cas (CRISPR-Cas), providing functional information on gene sequences and new target sites in variant-specific genes with increased efficiency. This review discusses the application of pangenomes in genome editing and crop improvement, focusing on the potential of pangenomes to accurately identify target genes for CRISPR-Cas editing of plant genomes while avoiding adverse off-target effects. We consider the limitations of applying CRISPR-Cas editing with pangenome references and potential solutions to overcome these limitations.
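The point that pangenome variants can create or remove target sites in variant-specific genes can be illustrated with a minimal sketch. It assumes SpCas9 conventions (a 20-nt protospacer immediately followed by an NGG PAM), scans both strands of each haplotype, and reports sites found in only one haplotype. Function names and sequences are hypothetical, and exact string comparison is a simplification: off-target assessment in practice also considers near matches elsewhere in the genome.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spcas9_sites(seq: str, spacer_len: int = 20) -> set[str]:
    """Protospacers immediately 5' of an NGG PAM on either strand.

    Reverse-strand hits are returned 5'->3' as read on the reverse strand.
    """
    seq = seq.upper()
    # Zero-width lookahead so that overlapping sites are all reported
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % spacer_len)
    hits = set()
    for strand in (seq, revcomp(seq)):
        hits.update(m.group(1) for m in pattern.finditer(strand))
    return hits


def variant_specific_sites(haplotypes: dict[str, str]) -> dict[str, set[str]]:
    """Sites found in exactly one haplotype (exact matches only)."""
    sites = {name: spcas9_sites(s) for name, s in haplotypes.items()}
    specific = {}
    for name, own in sites.items():
        others = set().union(*(s for n, s in sites.items() if n != name))
        specific[name] = own - others
    return specific


spacer = "GATTACAGATTACAGATTAC"
haps = {"hap1": spacer + "AGG",   # intact NGG PAM
        "hap2": spacer + "AGC"}   # a SNP removes the PAM
print(variant_specific_sites(haps))  # {'hap1': {'GATTACAGATTACAGATTAC'}, 'hap2': set()}
```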

Publication

High-Throughput Genotyping Technologies in Plant Taxonomy

Publisher: Springer US

Date: 11-12-2020

DOI: 10.1007/978-1-0716-0997-2_9

Monica Furaste Danilevicz

Researcher

Publications

Bioinformatics and plant breeding.

The application of pangenomics and machine learning in genomic selection in plants.

Pangenomes as a Resource to Accelerate Breeding of Under-Utilised Crop Species

The Global Assessment of Oilseed Brassica Crop Species Yield, Yield Stability and the Underlying Genetics

DNABERT-based explainable lncRNA identification in plant genome assemblies

Resources for image-based high-throughput phenotyping in crops and data sharing challenges

Genetic and Genomic Resources for Soybean Breeding Research

Maize Yield Prediction at an Early Developmental Stage Using Multispectral Images and Genotype Data for Preliminary Hybrid Selection

Segmentation of Sandplain Lupin Weeds from Morphologically Similar Narrow-Leafed Lupins in the Field

Plant pangenomics

Expanding Gene-Editing Potential in Crop Improvement with Pangenomes

High-Throughput Genotyping Technologies in Plant Taxonomy

Related Organisations

University Of Western Australia

Universidade Federal Do Rio De Janeiro

Related Funding Activities
