ORCID Profile
0000-0003-0413-6397
Current Organisation
Australian Genome Research Facility Ltd Melbourne
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Terrestrial Ecology | Plant Physiology | Ecological Impacts of Climate Change | Ecological Applications
Climate Change Adaptation Measures | Forest and Woodlands Flora, Fauna and Biodiversity
Publisher: Springer Science and Business Media LLC
Date: 20-01-2016
Publisher: Cold Spring Harbor Laboratory
Date: 18-04-2020
DOI: 10.1101/2020.04.17.035287
Abstract: Recent advances in long-read sequencing have the potential to produce more complete genome assemblies using sequence reads which can span repetitive regions. However, overlap-based assembly methods routinely used for this data require significant computing time and resources. Here, we have developed RefKA, a reference-based approach for long-read genome assembly. This approach relies on breaking up a closely related reference genome into bins, aligning k-mers unique to each bin with PacBio reads, and then assembling each bin in parallel followed by a final bin-stitching step. During benchmarking, we assembled the wheat Chinese Spring (CS) genome using publicly available PacBio reads in parallel in 168 wall hours on a 250 CPU system. The maximum RAM used was 300 Gb and the computing time was 42,000 CPU hours. The approach opens applications for the assembly of other large and complex genomes with much-reduced computing requirements. The RefKA pipeline is available at github.com/AppliedBioinformatics/RefKA
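The binning idea in this abstract (split a reference into bins, keep the k-mers found in only one bin, then route reads by those k-mers) can be illustrated with a toy sketch. This is not the RefKA implementation: the function names, the fixed-width bins, and the simplifications (no reverse complements, no k-mers spanning bin boundaries, exact matching instead of alignment) are all assumptions made for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def bin_unique_kmers(reference, bin_size, k):
    """Map each bin index to the k-mers that occur in that bin and no other.

    Hypothetical helper: bins are fixed-width slices of the reference, and
    k-mers that straddle a bin boundary are ignored.
    """
    seen_in = defaultdict(set)  # k-mer -> set of bin indices containing it
    for start in range(0, len(reference), bin_size):
        bin_seq = reference[start:start + bin_size]
        for kmer in kmers(bin_seq, k):
            seen_in[kmer].add(start // bin_size)

    unique = defaultdict(set)
    for kmer, bins in seen_in.items():
        if len(bins) == 1:
            unique[next(iter(bins))].add(kmer)
    return dict(unique)


def assign_read(read, unique, k):
    """Return the bin sharing the most bin-unique k-mers with the read, or None."""
    read_kmers = set(kmers(read, k))
    hits = {}
    for bin_id, bin_kmers in unique.items():
        shared = len(read_kmers & bin_kmers)
        if shared:
            hits[bin_id] = shared
    return max(hits, key=hits.get) if hits else None
```

Once reads are grouped this way, each bin's reads could be handed to an assembler independently, which is what makes the per-bin step parallel.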
Publisher: Wiley
Date: 22-08-2015
DOI: 10.1111/PBI.12240
Abstract: Despite being a major international crop, our understanding of the wheat genome is relatively poor due to its large size and complexity. To gain a greater understanding of wheat genome diversity, we have identified single nucleotide polymorphisms between 16 Australian bread wheat varieties. Whole-genome shotgun Illumina paired read sequence data were mapped to the draft assemblies of chromosomes 7A, 7B and 7D to identify more than 4 million intervarietal SNPs. SNP density varied between the three genomes, with much greater density observed on the A and B genomes than the D genome. This variation may be a result of substantial gene flow from the tetraploid Triticum turgidum, which possesses A and B genomes, during early co-cultivation of tetraploid and hexaploid wheat. In addition, we examined SNP density variation along the chromosome syntenic builds and identified genes in low-density regions which may have been selected during domestication and breeding. This study highlights the impact of evolution and breeding on the bread wheat genome and provides a substantial resource for trait association and crop improvement. All SNP data are publicly available on a generic genome browser GBrowse at www.wheatgenome.info.
Publisher: Springer Science and Business Media LLC
Date: 24-10-2014
Publisher: Oxford University Press (OUP)
Date: 10-02-2022
DOI: 10.1093/G3JOURNAL/JKAC034
Abstract: Shrimp are a valuable aquaculture species globally; however, disease remains a major hindrance to shrimp aquaculture sustainability and growth. Mechanisms mediated by endogenous viral elements have been proposed as a means by which shrimp that encounter a new virus start to accommodate rather than succumb to infection over time. However, evidence on the nature of such endogenous viral elements and how they mediate viral accommodation is limited. More extensive genomic data on Penaeid shrimp from different geographical locations should assist in exposing the diversity of endogenous viral elements. In this context, reported here is a PacBio Sequel-based draft genome assembly of an Australian black tiger shrimp (Penaeus monodon) inbred for 1 generation. The 1.89 Gbp draft genome is comprised of 31,922 scaffolds (N50: 496,398 bp) covering 85.9% of the projected genome size. The genome repeat content (61.8% with 30% representing simple sequence repeats) is almost the highest identified for any species. The functional annotation identified 35,517 gene models, of which 25,809 were protein-coding and 17,158 were annotated using interproscan. Scaffold scanning for specific endogenous viral elements identified an element comprised of a 9,045-bp stretch of repeated, inverted, and jumbled genome fragments of infectious hypodermal and hematopoietic necrosis virus bounded by a repeated 591/590 bp host sequence. As only near complete linear ∼4 kb infectious hypodermal and hematopoietic necrosis virus genomes have been found integrated in the genome of P. monodon previously, its discovery has implications regarding the validity of PCR tests designed to specifically detect such linear endogenous viral element types. The existence of joined inverted infectious hypodermal and hematopoietic necrosis virus genome fragments also provides a means by which hairpin double-stranded RNA could be expressed and processed by the shrimp RNA interference machinery.
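The scaffold N50 quoted in this abstract is a standard contiguity statistic: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. A minimal sketch of the usual calculation follows; the function name is an illustrative choice, and the lengths in the usage note are toy values, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

For example, with toy lengths `[2, 3, 4, 5, 6]` the total is 20, and the two longest scaffolds (6 + 5 = 11) already cover half of it, so the N50 is 5.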
Publisher: Wiley
Date: 14-06-2017
DOI: 10.1111/PBI.12742
Publisher: Wiley
Date: 05-04-2017
DOI: 10.1111/TPJ.13515
Abstract: There is an increasing understanding that variation in gene presence-absence plays an important role in the heritability of agronomic traits; however, there have been relatively few studies on variation in gene presence-absence in crop species. Hexaploid wheat is one of the most important food crops in the world and intensive breeding has reduced the genetic diversity of elite cultivars. Major efforts have produced draft genome assemblies for the cultivar Chinese Spring, but it is unknown how well this represents the genome diversity found in current modern elite cultivars. In this study we build an improved reference for Chinese Spring and explore gene diversity across 18 wheat cultivars. We predict a pangenome size of 140 500 ± 102 genes, a core genome of 81 070 ± 1631 genes and an average of 128 656 genes in each cultivar. Functional annotation of the variable gene set suggests that it is enriched for genes that may be associated with important agronomic traits. In addition to variation in gene presence, more than 36 million intervarietal single nucleotide polymorphisms were identified across the pangenome. This study of the wheat pangenome provides insight into genome diversity in elite wheat as a basis for genomics-based improvement of this important crop. A wheat pangenome, GBrowse, is available at appliedbioinformatics.com.au/cgi-bin/gb2/gbrowse/WheatPan/, and data are available to download from heat_genome_databases.php.
Publisher: Wiley
Date: 15-10-2019
DOI: 10.1111/PBI.13015
Publisher: Oxford University Press (OUP)
Date: 06-01-2015
DOI: 10.1093/JXB/ERU510
Publisher: Oxford University Press (OUP)
Date: 05-04-2019
DOI: 10.1093/BIB/BBY016
Abstract: Improving productivity of the staple crops wheat and rice is essential to feed the growing global population, particularly in the context of a changing climate. However, current rates of yield gain are insufficient to support the predicted population growth. New approaches are required to accelerate the breeding process, and many of these are driven by the application of large-scale crop data. To leverage the substantial volumes and types of data that can be applied for precision breeding, the wheat and rice research communities are working towards the development of integrated systems to access and standardize the dispersed, heterogeneous available data. Here, we outline the initiatives of the International Wheat Information System (WheatIS) and the International Rice Informatics Consortium (IRIC) to establish Web-based single-access systems and data mining tools to make the available resources more accessible, drive discovery and accelerate the production of new crop varieties. We discuss the progress of WheatIS and IRIC towards unifying specialized wheat and rice databases and building custom software platforms to manage and interrogate these data. Single-access crop information systems will strengthen scientific collaboration, optimize the use of public research funds and help achieve the required yield gains in the two most important global food crops.
Publisher: Springer Science and Business Media LLC
Date: 27-11-2019
DOI: 10.1007/S10142-018-0647-3
Abstract: Next-generation DNA sequencing technologies, such as RNA-Seq, currently dominate genome-wide gene expression studies. A standard approach to analyse this data requires mapping sequence reads to a reference and counting the number of reads which map to each gene. However, for many transcriptome studies, a suitable reference genome is unavailable, especially for meta-transcriptome studies which assay gene expression from mixed populations of organisms. Where a reference is unavailable, it is possible to generate a reference by the de novo assembly of the sequence reads. However, the high cost of generating high-coverage data for de novo assembly hinders this approach and more importantly the accurate assembly of such data is challenging, especially for meta-transcriptome data, and resulting assemblies frequently suffer from collapsed regions or chimeric sequences. As an alternative to the standard reference mapping approach, we have developed a k-mer-based analysis pipeline (DiffKAP) to identify differentially expressed reads between RNA-Seq datasets without the requirement for a reference. We compared the DiffKAP approach with the traditional Tophat/Cuffdiff method using RNA-Seq data from soybean, which has a suitable reference genome. We subsequently examined differential gene expression for a coral meta-transcriptome where no reference is available, and validated the results using qRT-PCR. We conclude that DiffKAP is an accurate method to study differential gene expression in complex meta-transcriptomes without the requirement of a reference genome.
Publisher: IEEE
Date: 2007
DOI: 10.1109/FBIT.2007.70
Publisher: Springer New York
Date: 2017
DOI: 10.1007/978-1-4939-7337-8_18
Abstract: The genomics revolution brought on by advances in high-throughput sequencing has led to the production of vast amounts of data. Databases play an essential role in storing and managing this information to make it available to researchers and crop breeders. This chapter provides an outline of how to use databases and tools for wheat genome research.
Publisher: Elsevier BV
Date: 11-2017
DOI: 10.1016/J.BIORTECH.2017.06.003
Abstract: To map out key lipid-related pathways that lead to rapid triacylglyceride accumulation in oleaginous microalgae, RNA-Seq was performed with Tetraselmis sp. M8 at 24h after exhaustion of exogenous nitrogen to reveal molecular changes during early stationary phase. Further gene expression profiling by quantitative real-time PCR at 16-72h revealed a distinct shift in expression of the fatty acid/triacylglyceride biosynthesis and β-oxidation pathways, when cells transitioned from log-phase into early-stationary and stationary phase. Metabolic reconstruction modeling combined with real-time PCR and RNA-Seq gene expression data indicates that the increased lipid accumulation is a result of a decrease in lipid catabolism during the early-stationary phase combined with increased metabolic fluxes in lipid biosynthesis during the stationary phase. During these two stages, Tetraselmis shifts from reduced lipid consumption to active lipid production. This process appears to be independent from DGAT expression, a key gene for lipid accumulation in microalgae.
Publisher: Springer New York
Date: 2016
DOI: 10.1007/978-1-4939-3167-5_18
Abstract: The recent advances in high throughput RNA sequencing (RNA-Seq) have generated huge amounts of data in a very short span of time for a single sample. These data have required the parallel advancement of computing tools to organize and interpret them meaningfully in terms of biological implications, at the same time using minimum computing resources to reduce computation costs. Here we describe the method of analyzing RNA-seq data using the set of open source software programs of the Tuxedo suite: TopHat and Cufflinks. TopHat is designed to align RNA-seq reads to a reference genome, while Cufflinks assembles these mapped reads into possible transcripts and then generates a final transcriptome assembly. Cufflinks also includes Cuffdiff, which accepts the reads assembled from two or more biological conditions and analyzes their differential expression of genes and transcripts, thus aiding in the investigation of their transcriptional and post transcriptional regulation under different conditions. We also describe the use of an accessory tool called CummeRbund, which processes the output files of Cuffdiff and gives an output of publication quality plots and figures of the user's choice. We demonstrate the effectiveness of the Tuxedo suite by analyzing RNA-Seq datasets of Arabidopsis thaliana root subjected to two different conditions.
Publisher: Springer Science and Business Media LLC
Date: 11-11-2016
DOI: 10.1038/NCOMMS13390
Abstract: There is an increasing awareness that as a result of structural variation, a reference sequence representing a genome of a single individual is unable to capture all of the gene repertoire found in the species. A large number of genes affected by presence/absence and copy number variation suggest that it may contribute to phenotypic and agronomic trait diversity. Here we show by analysis of the Brassica oleracea pangenome that nearly 20% of genes are affected by presence/absence variation. Several genes displaying presence/absence variation are annotated with functions related to major agronomic traits, including disease resistance, flowering time, glucosinolate metabolism and vitamin biosynthesis.
Publisher: Hindawi Limited
Date: 2008
DOI: 10.1155/2008/513701
Abstract: Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of each of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that dinucleotide frequency is not a sufficiently strong signature for binning 10 kb long DNA sequences, compared to the other three. Furthermore, we observed that increased order of oligonucleotide frequency may deteriorate the assignment result in some cases, which indicates the possible existence of optimal species-specific oligonucleotide frequency. We replaced SOM with growing self-organising map (GSOM) where comparable results are obtained while gaining 7%–15% speed improvement.
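The oligonucleotide-frequency features this abstract compares (di- through pentanucleotide) can be sketched as a fixed-length vector of relative k-mer counts per sequence. The sketch below is an assumption about the general shape of such features, not the paper's exact preprocessing: the function name is hypothetical, reverse complements are not merged, and windows containing ambiguous bases are simply skipped.

```python
from itertools import product


def oligo_frequencies(seq, k):
    """Return relative frequencies of all 4**k k-mers over A, C, G, T.

    The vector is ordered lexicographically by k-mer (AA, AC, AG, ... for k=2).
    """
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [counts[key] / total for key in sorted(counts)]
```

A vector such as `oligo_frequencies(contig, 4)` has 256 entries; one such vector per sequence could serve as input to a clustering method such as a SOM or GSOM, neither of which is implemented here.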
Publisher: Springer Science and Business Media LLC
Date: 28-04-2008
Abstract: In metagenomic studies, a process called binning is necessary to assign contigs that belong to multiple species to their respective phylogenetic groups. Most of the current methods of binning, such as BLAST, k-mer and PhyloPythia, involve assigning sequence fragments by comparing sequence similarity or sequence composition with already-sequenced genomes that are still far from comprehensive. We propose a semi-supervised seeding method for binning that does not depend on knowledge of completed genomes. Instead, it extracts the flanking sequences of highly conserved 16S rRNA from the metagenome and uses them as seeds (labels) to assign other reads based on their compositional similarity. The proposed seeding method is implemented on an unsupervised Growing Self-Organising Map (GSOM), and called Seeded GSOM (S-GSOM). We compared it with four well-known semi-supervised learning methods in a preliminary test, separating random-length prokaryotic sequence fragments sampled from the NCBI genome database. We identified the flanking sequences of the highly conserved 16S rRNA as suitable seeds that could be used to group the sequence fragments according to their species. S-GSOM showed superior performance compared to the semi-supervised methods tested. Additionally, S-GSOM may also be used to visually identify some species that do not have seeds. The proposed method was then applied to simulated metagenomic datasets using two different confidence threshold settings and compared with PhyloPythia, k-mer and BLAST. At the reference taxonomic level Order, S-GSOM outperformed all k-mer and BLAST results and showed comparable results with PhyloPythia for each of the corresponding confidence settings, where S-GSOM performed better than PhyloPythia in the ≥ 10 reads datasets and comparable in the ≥ 8 kb benchmark tests. In the task of binning using semi-supervised learning methods, results indicate S-GSOM to be the best of the methods tested.
Most importantly, the proposed method does not require knowledge from known genomes and uses only very few labels (one per species is sufficient in most cases), which are extracted from the metagenome itself. These advantages make it a very attractive binning method. S-GSOM outperformed the binning methods that depend on already-sequenced genomes, and compares well to the current most advanced binning method, PhyloPythia.
Publisher: Springer Berlin Heidelberg
Date: 2009
Publisher: Wiley
Date: 03-05-2011
Publisher: Springer New York
Date: 2016
Publisher: Springer Science and Business Media LLC
Date: 10-03-2015
DOI: 10.1007/S00122-015-2488-Y
Abstract: We characterise the distribution of crossover and non-crossover recombination in Brassica napus and Cicer arietinum using a low-coverage genotyping by sequencing pipeline SkimGBS. The growth of next-generation DNA sequencing technologies has led to a rapid increase in sequence-based genotyping for applications including diversity assessment, genome structure validation and gene-trait association. We have established a skim-based genotyping by sequencing method for crop plants and applied this approach to genotype segregating populations of Brassica napus and Cicer arietinum. Comparison of progeny genotypes with those of the parental individuals allowed the identification of crossover and non-crossover (gene conversion) events. Our results identify the positions of recombination events with high resolution, permitting the mapping and frequency assessment of recombination in segregating populations.
Publisher: Springer Science and Business Media LLC
Date: 30-06-2017
Publisher: Oxford University Press (OUP)
Date: 03-07-2016
DOI: 10.1104/PP.16.00868
Publisher: Wiley
Date: 10-01-2018
DOI: 10.1111/PBI.12867
Publisher: Oxford University Press (OUP)
Date: 18-04-2018
DOI: 10.1093/JXB/ERY147
Publisher: Wiley
Date: 28-01-2019
DOI: 10.1111/TPJ.14194
Abstract: Advances in sequencing technology have led to a rapid rise in the genomic data available for plants, driving new insights into the evolution, domestication and improvement of crops. Single nucleotide polymorphisms (SNPs) are a major component of crop genomic diversity, and are invaluable as genetic markers in research and breeding programs. High-throughput SNP arrays, or 'SNP chips', can generate reproducible sets of informative SNP markers and have been broadly adopted. Although there are many public repositories for sequencing data, which are routinely uploaded, there are no formal repositories for crop SNP array data. To make SNP array data more easily accessible, we have developed CropSNPdb (snpdb.appliedbioinformatics.com.au), a database for SNP array data produced by the Illumina Infinium™ hexaploid bread wheat (Triticum aestivum) 90K and Brassica 60K arrays. We currently host SNPs from datasets covering 526 Brassica lines and 309 bread wheat lines, and provide search, download and upload utilities for users. CropSNPdb provides a useful repository for these data, which can be applied for a range of genomics and molecular crop-breeding activities.
Publisher: Public Library of Science (PLoS)
Date: 28-10-2015
Publisher: Springer Science and Business Media LLC
Date: 2014
Location: Australia
Start Date: 03-2021
End Date: 02-2024
Amount: $410,237.00
Funder: Australian Research Council