ARDC Research Link Australia

ORCID Profile
0000-0003-1315-4896

Current Organisations
Macquarie University, Fundação Oswaldo Cruz, CSIRO

Publications

Publication

Identification of Protein Isoforms Using Reference Databases Built from Long and Short Read RNA-Sequencing

Publisher: American Chemical Society (ACS)

Date: 25-05-2022

DOI: 10.1021/ACS.JPROTEOME.1C00968

Abstract: Alternative splicing can lead to distinct protein isoforms. These can have different functions in specific cells and tissues or in different developmental stages. In this study, we explored whether transcripts assembled from long read, nanopore-based, direct RNA-sequencing (RNA-seq) could improve the identification of protein isoforms in human K562 cells. By comparing with Illumina-based short read RNA-seq, we showed that a large proportion of Ensembl transcripts (5949/14,326) and genes expressing alternatively spliced transcripts (486/2981) identified with long direct reads were missed by short paired-end reads. By co-analyzing proteomic and transcriptomic data, we also showed that some peptides (826/35,976), proteins (262/3215), and protein isoforms arising from distinct transcript variants (574/1212) identified with isoform-specific peptides via custom long-read-based databases were missed in Illumina-derived databases. Finally, we generated unequivocal peptide evidence for a set of protein isoforms and showed that long read, direct RNA-seq allows the discovery of novel protein isoforms not already in reference databases or custom databases built from short read RNA-seq data. Our analysis highlights the benefits of long read RNA-seq data in the generation of reference databases to increase tandem mass spectrometry (MS/MS) identification of protein isoforms.

Publication

Proteogenomic Discovery of a Small, Novel Protein in Yeast Reveals a Strategy for the Detection of Unannotated Short Open Reading Frames

Publisher: American Chemical Society (ACS)

Date: 10-11-2015

DOI: 10.1021/ACS.JPROTEOME.5B00734

Abstract: In recent years, proteomic data have contributed to genome annotation efforts, most notably in humans and mice, and spawned a field termed "proteogenomics". Yeast, in contrast with higher eukaryotes, has a small genome, which has lent itself to simpler ORF prediction. Despite this, continual advances in mass spectrometry suggest that proteomics should be able to improve genome annotation even in this well-characterized species. Here we applied a proteogenomics workflow to yeast to identify novel protein-coding genes. Specific databases were generated, from intergenic regions of the genome, which were then queried with MS/MS data. This suggested the existence of several putative novel ORFs of <100 codons, one of which we chose to validate. Synthetic peptides, RNA-Seq analysis, and evidence of evolutionary conservation allowed for the unequivocal definition of a new protein of 78 amino acids encoded on chromosome X, which we dub YJR107C-A. It encodes a new type of domain, which ab initio modeling suggests as predominantly α-helical. We show that this gene is nonessential for growth; however, deletion increases sensitivity to osmotic stress. Finally, from the above discovery process, we discuss a generalizable strategy for the identification of short ORFs and small proteins, many of which are likely to be undiscovered.

Publication

MS2‐Deisotoper: A Tool for Deisotoping High‐Resolution MS/MS Spectra in Normal and Heavy Isotope‐Labelled Samples

Publisher: Wiley

Date: 08-08-2019

DOI: 10.1002/PMIC.201800444

Abstract: High-resolution MS/MS spectra of peptides can be deisotoped to identify monoisotopic masses of peptide fragments. The use of such masses should improve protein identification rates. However, deisotoping is not universally used and its benefits have not been fully explored. Here, MS2-Deisotoper, a tool for use prior to database search, is used to identify monoisotopic peaks in centroided MS/MS spectra. MS2-Deisotoper works by comparing the mass and relative intensity of each peptide fragment peak to every other peak of greater mass, and by applying a set of rules concerning mass and intensity differences. After comprehensive parameter optimization, it is shown that MS2-Deisotoper can improve the number of peptide spectrum matches (PSMs) identified by up to 8.2% and proteins by up to 2.8%. It is effective with SILAC and non-SILAC MS/MS data. The identification of unique peptide sequences is also improved, increasing the number of human proteoforms by 3.7%. Detailed investigation of results shows that deisotoping increases Mascot ion scores, improves FDR estimation for PSMs, and leads to greater protein sequence coverage. At a peptide level, it is found that the efficacy of deisotoping is affected by peptide mass and charge. MS2-Deisotoper can be used via a user interface or as a command-line tool.

Publication

GOANA: A Universal High-Throughput Web Service for Assessing and Comparing the Outcome and Efficiency of Genome Editing Experiments

Publisher: Mary Ann Liebert Inc

Date: 04-2021

DOI: 10.1089/CRISPR.2020.0068

Publication

Occurrence of multiple genotype infection caused by Leishmania infantum in naturally infected dogs

Publisher: Public Library of Science (PLoS)

Date: 27-07-2020

DOI: 10.1371/JOURNAL.PNTD.0007986

Publication

MethylQuant: A Tool for Sensitive Validation of Enzyme-Mediated Protein Methylation Sites from Heavy-Methyl SILAC Data

Publisher: American Chemical Society (ACS)

Date: 03-11-2018

DOI: 10.1021/ACS.JPROTEOME.7B00601

Abstract: The study of post-translational methylation is hampered by the fact that large-scale LC-MS/MS experiments produce high methylpeptide false discovery rates (FDRs). The use of heavy-methyl stable isotope labeling by amino acids in cell culture (heavy-methyl SILAC) can drastically reduce these FDRs; however, this approach is limited by a lack of heavy-methyl SILAC compatible software. To fill this gap, we recently developed MethylQuant. Here, using an updated version of MethylQuant, we demonstrate its methylpeptide validation and quantification capabilities and provide guidelines for its best use. Using reference heavy-methyl SILAC data sets, we show that MethylQuant predicts with statistical significance the true or false positive status of methylpeptides in samples of varying complexity, degree of methylpeptide enrichment, and heavy to light mixing ratios. We introduce methylpeptide confidence indicators, MethylQuant Confidence and MethylQuant Score, and demonstrate their strong performance in complex samples characterized by a lack of methylpeptide enrichment. For these challenging data sets, MethylQuant identifies 882 of 1165 true positive methylpeptide spectrum matches (i.e., >75% sensitivity) at high specificity (<2% FDR) and achieves near-perfect specificity at 41% sensitivity. We also demonstrate that MethylQuant produces high accuracy relative quantification data that are tolerant of interference from coeluting peptide ions. Together MethylQuant's capabilities provide a path toward routine, accurate characterizations of the methylproteome using heavy-methyl SILAC.

Publication

Proteomic validation of transcript isoforms, including those assembled from RNA-Seq data

Publisher: American Chemical Society (ACS)

Date: 20-05-2015

DOI: 10.1021/PR5011394

Abstract: Human proteome analysis now requires an understanding of protein isoforms. We recently published the PG Nexus pipeline, which facilitates high confidence validation of exons and splice junctions by integrating genomics and proteomics data. Here we comprehensively explore how RNA-seq transcriptomics data, and proteomic analysis of the same sample, can identify protein isoforms. RNA-seq data from human mesenchymal (hMSC) stem cells were analyzed with our new TranscriptCoder tool to generate a database of protein isoform sequences. MS/MS data from matching hMSC samples were then matched against the TranscriptCoder-derived database, along with Ensembl and the neXtProt database. Querying the TranscriptCoder-derived or Ensembl database could unambiguously identify ∼450 protein isoforms, with isoform-specific proteotypic peptides, including candidate hMSC-specific isoforms for the genes DPYSL2 and FXR1. Where isoform-specific peptides did not exist, groups of nonisoform-specific proteotypic peptides could specifically identify many isoforms. In both the above cases, isoforms will be detectable with targeted MS/MS assays. Unfortunately, our analysis also revealed that some isoforms will be difficult to identify unambiguously as they do not have peptides that are sufficiently distinguishing. We covisualize mRNA isoforms and peptides in a genome browser to illustrate the above situations. Mass spectrometry data is available via ProteomeXchange (PXD001449).

Publication

Visualizing Post-Translational Modifications in Protein Interaction Networks Using PTMOracle

Publisher: Wiley

Date: 17-01-2019

DOI: 10.1002/CPBI.71

Abstract: Post‐translational modifications (PTMs) of proteins act as key regulators of protein activity, including the regulation of protein‐protein interactions (PPIs). However, exploring functional links between PTMs and PPIs can be difficult. PTMOracle is a Cytoscape app that facilitates the co‐visualization and co‐analysis of PTMs in the context of PPI networks. PTMOracle also allows extensive data to be integrated and co‐analyzed, allowing the role of domains, motifs, and disordered regions to be considered. Here, we describe several PTMOracle protocols investigating complex PTM‐associated relationships and their role in PPIs. This is assisted by OraclePainter for coloring proteins by the modifications present and visualizing these in the context of networks, by OracleTools for cross‐matching PTMs with sequence features for all nodes in the network, and by OracleResults for exploring specific proteins and visualizing their PTMs in the context of protein sequences. This unit aims to demonstrate how PTMOracle can be used to systematically explore network visualizations and generate testable hypotheses regarding the functional role of PTMs in PPIs, and how the results can be analyzed to better understand the regulatory role of PTMs in PPIs. © 2019 by John Wiley & Sons, Inc.

Publication

Large Scale Mass Spectrometry-based Identifications of Enzyme-mediated Protein Methylation Are Subject to High False Discovery Rates

Publisher: Elsevier BV

Date: 03-2016

DOI: 10.1074/MCP.M115.055384

Publication

Supporting pandemic response using genomics and bioinformatics: A case study on the emergent SARS-CoV-2 outbreak

Publisher: Hindawi Limited

Date: 25-05-2020

DOI: 10.1111/TBED.13588

Publication

Publisher Correction: The RNA Atlas expands the catalog of human non-coding RNAs

Publisher: Springer Science and Business Media LLC

Date: 28-06-2021

DOI: 10.1038/S41587-021-00996-3

Publication

Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: Validation of genes and Alternative mRNA splicing

Publisher: American Chemical Society (ACS)

Date: 12-11-2014

DOI: 10.1021/PR400820P

Abstract: Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrated Genome Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.

Publication

PTMOracle: A Cytoscape App for Covisualizing and Coanalyzing Post-Translational Modifications in Protein Interaction Networks

Publisher: American Chemical Society (ACS)

Date: 06-04-2017

DOI: 10.1021/ACS.JPROTEOME.6B01052

Abstract: Post-translational modifications of proteins (PTMs) act as key regulators of protein activity and of protein-protein interactions (PPIs). To date, it has been difficult to comprehensively explore functional links between PTMs and PPIs. To address this, we developed PTMOracle, a Cytoscape app for coanalyzing PTMs within PPI networks. PTMOracle also allows extensive data to be integrated and coanalyzed with PPI networks, allowing the role of domains, motifs, and disordered regions to be considered. For proteins of interest, or a whole proteome, PTMOracle can generate network visualizations to reveal complex PTM-associated relationships. This is assisted by OraclePainter for coloring proteins by modifications, OracleTools for network analytics, and OracleResults for exploring tabulated findings. To illustrate the use of PTMOracle, we investigate PTM-associated relationships and their role in PPIs in four case studies. In the yeast interactome and its rich set of PTMs, we construct and explore histone-associated and domain-domain interaction networks and show how integrative approaches can predict kinases involved in phosphodegrons. In the human interactome, a phosphotyrosine-associated network is analyzed but highlights the sparse nature of human PPI networks and lack of PTM-associated data. PTMOracle is open source and available at the Cytoscape app store: pps tmoracle .

Publication

The RNA Atlas expands the catalog of human non-coding RNAs

Publisher: Springer Science and Business Media LLC

Date: 17-06-2021

DOI: 10.1038/S41587-021-00936-1

Abstract: Existing compendia of non-coding RNA (ncRNA) are incomplete, in part because they are derived almost exclusively from small and polyadenylated RNAs. Here we present a more comprehensive atlas of the human transcriptome, which includes small and polyA RNA as well as total RNA from 300 human tissues and cell lines. We report thousands of previously uncharacterized RNAs, increasing the number of documented ncRNAs by approximately 8%. To infer functional regulation by known and newly characterized ncRNAs, we exploited pre-mRNA abundance estimates from total RNA sequencing, revealing 316 microRNAs and 3,310 long non-coding RNAs with multiple lines of evidence for roles in regulating protein-coding genes and pathways. Our study both refines and expands the current catalog of human ncRNAs and their regulatory interactions. All data, analyses and results are available for download and interrogation in the R2 web portal, serving as a basis for future exploration of RNA biology and function.

Publication

Genome Sequence of Campylobacter showae UNSWCD, Isolated from a Patient with Crohn's Disease

Publisher: American Society for Microbiology

Date: 28-02-2013

DOI: 10.1128/GENOMEA.00193-12

Abstract: Campylobacter showae UNSWCD was isolated from a patient with Crohn's disease. Here we present a 2.1 Mb draft assembly of its genome.

Related Organisations

Organisation

Macquarie University

Location: Australia

Organisation

Universidade Federal Do Espirito Santo

Location: Brazil

Organisation

Fundação Oswaldo Cruz

Location: Brazil

Organisation

Commonwealth Scientific And Industrial Research Organisation

Location: Australia

Organisation

University Of New South Wales

Location: Australia

Organisation

CSIRO

Location: Australia

Related Funding Activities

No related grants have been discovered for Aidan P. Tay.

Aidan P. Tay

Researcher
