ORCID Profile
0000-0002-3004-2119
Current Organisation
University of Melbourne
Publisher: Proceedings of the National Academy of Sciences
Date: 24-01-2022
Abstract: Sulfoquinovose, a sulfosugar derivative of glucose, is produced by most photosynthetic organisms and contains up to half of all sulfur in the biosphere. Several pathways for its breakdown are known, though they provide access to only half of the carbon in sulfoquinovose and none of its sulfur. Here, we describe a fundamentally different pathway within the plant pathogen Agrobacterium tumefaciens that features oxidative desulfurization of sulfoquinovose to access all carbon and sulfur within the molecule. Biochemical and structural analyses of the pathway’s key proteins provided insights into how the sulfosugar is recognized and degraded. Genes encoding this sulfoquinovose monooxygenase pathway are present in many plant pathogens and symbionts, alluding to a possible role for sulfoquinovose in plant host–bacteria interactions.
Publisher: American Association for Cancer Research (AACR)
Date: 07-2015
DOI: 10.1158/2159-8290.CD-14-1096
Abstract: Familial renal cell carcinoma (RCC) is genetically heterogeneous and may be caused by mutations in multiple genes, including VHL, MET, SDHB, FH, FLCN, PTEN, and BAP1. However, most individuals with inherited RCC do not have a detectable germline mutation. To identify novel inherited RCC genes, we undertook exome resequencing studies in a familial RCC kindred and identified a CDKN2B nonsense mutation that segregated with familial RCC status. Targeted resequencing of CDKN2B in individuals (n = 82) with features of inherited RCC then revealed three candidate CDKN2B missense mutations (p.Pro40Thr, p.Ala23Glu, and p.Asp86Asn). In silico analysis of the three-dimensional structures indicated that each missense substitution was likely pathogenic through reduced stability of the mutant or reduced affinity for cyclin-dependent kinases 4 and 6, and in vitro studies demonstrated that each of the mutations impaired CDKN2B-induced suppression of proliferation in an RCC cell line. These findings identify germline CDKN2B mutations as a novel cause of familial RCC. Significance: Germline loss-of-function CDKN2B mutations were identified in a subset of patients with features of inherited RCC. Detection of germline CDKN2B mutations will have an impact on familial cancer screening and might prove to influence the management of disseminated disease. Cancer Discov 5(7) 723–9. ©2015 AACR. This article is highlighted in the In This Issue feature, p. 681.
Publisher: Oxford University Press (OUP)
Date: 02-06-2022
DOI: 10.1093/BIB/BBAC165
Abstract: Proteins are capable of highly specific interactions and are responsible for a wide range of functions, making them attractive in the pursuit of new therapeutic options. Previous studies focusing on overall geometry of protein–protein interfaces, however, concluded that PPI interfaces were generally flat. More recently, this idea has been challenged by their structural and thermodynamic characterisation, suggesting the existence of concave binding sites that are closer in character to traditional small-molecule binding sites, rather than exhibiting complete flatness. Here, we present a large-scale analysis of binding geometry and physicochemical properties of all protein–protein interfaces available in the Protein Data Bank. In this review, we provide a comprehensive overview of the protein–protein interface landscape, including evidence that even for overall larger, more flat interfaces that utilize discontinuous interacting regions, small and potentially druggable pockets are utilized at binding sites.
Publisher: Cold Spring Harbor Laboratory
Date: 13-12-2021
DOI: 10.1101/2021.12.11.21267504
Abstract: The detection of adverse drug reactions (ADRs) is critical to our understanding of the safety and risk-benefit profile of medications. With an incidence that has not changed over the last 30 years, ADRs are a significant source of patient morbidity, responsible for 5-10% of acute care hospital admissions worldwide. Spontaneous reporting of ADRs has long been the standard method of reporting; however, this approach is known to have high rates of under-reporting, a problem that limits pharmacovigilance efforts. Automated ADR reporting presents an alternative pathway to increase reporting rates, although this may be limited by over-reporting of other drug-related adverse events. We developed a deep learning natural language processing algorithm to identify ADRs in discharge summaries at a single academic hospital centre. Our model was developed in two stages: first, a pre-trained model (DeBERTa) was further pre-trained on 1.1 million unlabelled clinical documents; second, this model was fine-tuned to detect ADR mentions in a corpus of 861 annotated discharge summaries. This model was compared to a version without the pre-training step, and a model fine-tuned from the ClinicalBERT model, which has demonstrated state-of-the-art performance on other pharmacovigilance tasks. To ensure that our algorithm could differentiate ADRs from other drug-related adverse events, the annotated corpus was enriched for both validated ADR reports and confounding drug-related adverse events using. The final model demonstrated good performance with a ROC-AUC of 0.955 (95% CI 0.946 - 0.963) for the task of identifying discharge summaries containing ADR mentions, significantly outperforming the two comparator models.
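The ROC-AUC quoted in this abstract can be read as the probability that a randomly chosen ADR-positive document receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that rank-based definition; the function name and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Rank-based ROC-AUC: fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth flags (1 = document mentions an ADR).
    scores: iterable of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute ROC-AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: two positive and two negative documents.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

This pairwise form is quadratic in the number of documents, which is fine for illustration; a production evaluation would more likely use a sorted-rank implementation from an established statistics library.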
Publisher: Oxford University Press (OUP)
Date: 24-02-2022
DOI: 10.1093/BIB/BBAC042
Abstract: Herbicides have revolutionised weed management, increased crop yields and improved profitability allowing for an increase in worldwide food security. Their widespread use, however, has also led to a rise in resistance and concerns about their environmental impact. Despite the need for potent and safe herbicidal molecules, no herbicide with a new mode of action has reached the market in 30 years. Although development of computational approaches has proven invaluable to guide rational drug discovery pipelines, leading to higher hit rates and lower attrition due to poor toxicity, little has been done in contrast for herbicide design. To fill this gap, we have developed cropCSM, a computational platform to help identify new, potent, nontoxic and environmentally safe herbicides. By using a knowledge-based approach, we identified physicochemical properties and substructures enriched in safe herbicides. By representing the small molecules as a graph, we leveraged these insights to guide the development of predictive models trained and tested on the largest collected data set of molecules with experimentally characterised herbicidal profiles to date (over 4500 compounds). In addition, we developed six new environmental and human toxicity predictors, spanning five different species to assist in molecule prioritisation. cropCSM was able to correctly identify 97% of herbicides currently available commercially, while predicting toxicity profiles with accuracies of up to 92%. We believe cropCSM will be an essential tool for the enrichment of screening libraries and to guide the development of potent and safe herbicides. We have made the method freely available through a user-friendly webserver at biosig.unimelb.edu.au/crop_csm.
Publisher: Springer Science and Business Media LLC
Date: 22-10-2020
DOI: 10.1038/S41598-020-74648-Y
Abstract: Rifampicin resistance is a major therapeutic challenge, particularly in tuberculosis, leprosy, P. aeruginosa and S. aureus infections, where it develops via missense mutations in gene rpoB. Previously we have highlighted that these mutations reduce protein affinities within the RNA polymerase complex, subsequently reducing nucleic acid affinity. Here, we have used these insights to develop a computational rifampicin resistance predictor capable of identifying resistant mutations even outside the well-defined rifampicin resistance determining region (RRDR), using clinical M. tuberculosis sequencing information. Our tool successfully identified up to 90.9% of M. tuberculosis rpoB variants correctly, with sensitivity of 92.2%, specificity of 83.6% and MCC of 0.69, outperforming the current gold-standard GeneXpert-MTB/RIF. We show our model can be translated to other clinically relevant organisms: M. leprae, P. aeruginosa and S. aureus, despite weak sequence identity. Our method was implemented as an interactive tool, SUSPECT-RIF (StrUctural Susceptibility PrEdiCTion for RIFampicin), freely available at biosig.unimelb.edu.au/suspect_rif/.
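The sensitivity, specificity and MCC figures above all derive from a confusion matrix of resistant and susceptible calls. As a hedged illustration of how those three metrics relate, the sketch below computes them from raw counts; the function name and the example counts are hypothetical and do not reproduce the study's data.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and MCC from confusion counts.

    tp/fn: resistant variants predicted resistant / susceptible.
    tn/fp: susceptible variants predicted susceptible / resistant.
    Ratios with an empty denominator are reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, chosen only to exercise the function.
example = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

MCC is often reported alongside sensitivity and specificity because, unlike accuracy, it stays informative when resistant and susceptible variants are unevenly represented.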
Publisher: Association for Research in Vision and Ophthalmology (ARVO)
Date: 18-10-2017
Abstract: The aim of this article is to report the investigation of the structural features of ABCA4, a protein associated with a genetic retinal disease. A new database collecting knowledge of ABCA4 structure may facilitate predictions about the possible functional consequences of gene mutations observed in clinical practice. In order to correlate structural and functional effects of the observed mutations, the structure of mouse P-glycoprotein was used as a template for homology modeling. The obtained structural information and genetic data are the basis of our relational database (ABCA4Database). Sequence variability among all ABCA4-deposited entries was calculated and reported as Shannon entropy score at the residue level. The three-dimensional model of ABCA4 structure was used to locate the spatial distribution of the observed variable regions. Our predictions from structural in silico tools were able to accurately link the functional effects of mutations to phenotype. The development of the ABCA4Database gathers all the available genetic and structural information, yielding a global view of the molecular basis of some retinal diseases. ABCA4 modeled structure provides a molecular basis on which to analyze protein sequence mutations related to genetic retinal disease in order to predict the risk of retinal disease across all possible ABCA4 mutations. Additionally, our ABCA4 predicted structure is a good starting point for the creation of a new data analysis model, appropriate for precision medicine, in order to develop a deeper knowledge network of the disease and to improve the management of patients.
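The abstract reports sequence variability as a Shannon entropy score at the residue level. A minimal sketch of that calculation over the columns of a multiple sequence alignment follows; the function names and the toy alignment are assumptions for illustration, and gap characters are simply treated as one more symbol, which may differ from the authors' handling.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Per-position entropy for a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Hypothetical three-residue alignment of four sequences.
toy_alignment = ["MKT", "MKS", "MRT", "MRS"]
toy_scores = alignment_entropy(toy_alignment)
```

A score of 0 marks a fully conserved position, and higher scores mark positions where the deposited sequences disagree more.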
Publisher: BMJ
Date: 30-09-2023
Abstract: Amyotrophic lateral sclerosis (ALS) is a progressively fatal, neurodegenerative disease associated with both motor and non-motor symptoms, including frontotemporal dementia. Approximately 10% of cases are genetically inherited (familial ALS), while the majority are sporadic. Mutations across a wide range of genes have been associated; however, the underlying molecular effects of these mutations and their relation to phenotypes remain poorly explored. We initially curated an extensive list (n = 1343) of missense mutations identified in the clinical literature, which spanned 111 unique genes. Of these, mutations in genes SOD1, FUS and TDP43 were analysed using in silico biophysical tools, which characterised changes in protein stability, interactions, localisation and function. The effects of pathogenic and non-pathogenic mutations within these genes were statistically compared to highlight underlying molecular drivers. Compared with previous ALS-dedicated databases, we have curated the most extensive missense mutation database to date and observed a twofold increase in unique implicated genes, and almost a threefold increase in the number of mutations. Our gene-specific analysis identified distinct molecular drivers across the different proteins, where SOD1 mutations primarily reduced protein stability and dimer formation, and those in FUS and TDP-43 were present within disordered regions, suggesting different mechanisms of aggregate formation. Using our three genes as case studies, we identified distinct insights which can drive further research to better understand ALS. The information curated in our database can serve as a resource for similar gene-specific analyses, further improving the current understanding of disease, crucial for the development of treatment strategies.
Publisher: Wiley
Date: 14-08-2008
DOI: 10.1002/PROT.22187
Abstract: In this study, we carried out a comparative analysis between two classical methodologies to prospect residue contacts in proteins: the traditional cutoff dependent (CD) approach and cutoff free Delaunay tessellation (DT). In addition, two alternative coarse-grained forms to represent residues were tested: using alpha carbon (CA) and side chain geometric center (GC). A database was built, comprising three top classes: all alpha, all beta, and alpha/beta. We found that the cutoff value at about 7.0 Å emerges as an important distance parameter. Up to 7.0 Å, CD and DT properties are unified, which implies that at this distance all contacts are complete and legitimate (not occluded). We also have shown that DT has an intrinsic missing edges problem when mapping the first layer of neighbors. In proteins, it may produce systematic errors affecting mainly the contact network in beta chains with CA. The almost-Delaunay (AD) approach has been proposed to solve this DT problem. We found that even AD may not be an advantageous solution. As a consequence, in the strict range up to 7.0 Å, the CD approach proved to be a simpler, more complete, and more reliable technique than DT or AD. Finally, we have shown that coarse-grained residue representations may introduce bias in the analysis of neighbors in cutoffs up to 6.8 Å, with CA favoring alpha proteins and GC favoring beta proteins. This provides an additional argument pointing to the value of 7.0 Å as an important lower bound cutoff to be used in contact analysis of proteins.
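The cutoff-dependent (CD) approach described above can be sketched in a few lines: two residues are in contact when their representative points (for example alpha carbons) lie within the chosen distance. The function name and the toy coordinates below are illustrative assumptions; only the 7.0 Å default comes from the abstract.

```python
import math
from itertools import combinations


def contacts_within_cutoff(coords, cutoff=7.0):
    """Return index pairs (i, j), i < j, of residues whose points are within cutoff.

    coords: one (x, y, z) tuple per residue, e.g. alpha-carbon or side-chain
    geometric-centre coordinates in angstroms.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if math.dist(a, b) <= cutoff
    ]


# Hypothetical coordinates: three points spaced 3.8 apart on a line, one far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_contacts = contacts_within_cutoff(toy_coords)
```

Swapping alpha-carbon coordinates for side-chain geometric centres changes only the input, which makes it easy to compare the two coarse-grained representations the study discusses.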
Publisher: Public Library of Science (PLoS)
Date: 28-07-2016
Publisher: Oxford University Press (OUP)
Date: 16-10-2015
DOI: 10.1093/NAR/GKU966
Publisher: Oxford University Press (OUP)
Date: 04-04-2017
DOI: 10.1093/NAR/GKX236
Publisher: Oxford University Press (OUP)
Date: 09-12-2021
DOI: 10.1093/BIB/BBAB512
Abstract: Protein-carbohydrate interactions are crucial for many cellular processes but can be challenging to biologically characterise. To improve our understanding and ability to model these molecular interactions, we used a carefully curated set of 370 protein-carbohydrate complexes with experimental structural and biophysical data in order to train and validate a new tool, cutoff scanning matrix (CSM)-carbohydrate, using machine learning algorithms to accurately predict their binding affinity and rank docking poses as a scoring function. Information on both protein and carbohydrate complementarity, in terms of shape and chemistry, was captured using graph-based structural signatures. Across both training and independent test sets, we achieved comparable Pearson’s correlations of 0.72 under cross-validation [root mean square error (RMSE) of 1.58 kcal/mol] and 0.67 on the independent test (RMSE of 1.72 kcal/mol), providing confidence in the generalisability and robustness of the final model. Similar performance was obtained across mono-, di- and oligosaccharides, further highlighting the applicability of this approach to the study of larger complexes. We show CSM-carbohydrate significantly outperformed previous approaches and have implemented our method and make all data freely available through both a user-friendly web interface and application programming interface, to facilitate programmatic access at biosig.unimelb.edu.au/csm_carbohydrate/. We believe CSM-carbohydrate will be an invaluable tool for helping assess docking poses and the effects of mutations on protein-carbohydrate affinity, unravelling important aspects that drive binding recognition.
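Pearson's correlation and RMSE are the two figures used here, and in several other abstracts on this page, to compare predicted and experimental binding affinities. The sketch below shows how both are computed from paired values; the function names and the toy numbers are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def rmse(predicted, observed):
    """Root mean square error, in the same units as the inputs (e.g. kcal/mol)."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical predicted vs experimental affinities (kcal/mol).
toy_predicted = [-5.0, -6.5, -7.0, -8.2]
toy_observed = [-5.4, -6.0, -7.5, -8.0]
toy_r = pearson_r(toy_predicted, toy_observed)
toy_rmse = rmse(toy_predicted, toy_observed)
```

The two metrics are complementary: correlation measures whether predictions rank complexes correctly, while RMSE measures how far the predicted values sit from the measurements.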
Publisher: Elsevier BV
Date: 2023
Publisher: BMJ
Date: 31-01-2018
DOI: 10.1136/JMEDGENET-2017-105127
Abstract: Germline pathogenic variants in SDHB/SDHC/SDHD are the most frequent causes of inherited phaeochromocytomas/paragangliomas. Insufficient information regarding penetrance and phenotypic variability hinders optimum management of mutation carriers. We estimate penetrance for symptomatic tumours and elucidate genotype–phenotype correlations in a large cohort of SDHB/SDHC/SDHD mutation carriers. We conducted a retrospective survey of 1832 individuals referred for genetic testing due to a personal or family history of phaeochromocytoma/paraganglioma. 876 patients (401 previously reported) had a germline mutation in SDHB/SDHC/SDHD (n=673/43/160). Tumour risks were correlated with in silico structural prediction analyses. Tumour risks analysis provided novel penetrance estimates and genotype–phenotype correlations. In addition to tumour type susceptibility differences for individual genes, we confirmed that the SDHD: p.Pro81Leu mutation has a distinct phenotype and identified increased age-related tumour risks with highly destabilising SDHB missense mutations. By Kaplan-Meier analysis, the penetrance (cumulative risk of clinically apparent tumours) in SDHB and (paternally inherited) SDHD mutation-positive non-probands (n=371/67 with detailed clinical information) by age 60 years was 21.8% (95% CI 15.2% to 27.9%) and 43.2% (95% CI 25.4% to 56.7%), respectively. Risk of malignant disease at age 60 years in non-proband SDHB mutation carriers was 4.2% (95% CI 1.1% to 7.2%). With retrospective cohort analysis to adjust for ascertainment, cumulative tumour risks for SDHB mutation carriers at ages 60 years and 80 years were 23.9% (95% CI 20.9% to 27.4%) and 30.6% (95% CI 26.8% to 34.7%). Overall risks of clinically apparent tumours for SDHB mutation carriers are substantially lower than initially estimated and will improve counselling of affected families. Specific genotype–tumour risk associations provide a basis for novel investigative strategies into succinate dehydrogenase-related mechanisms of tumourigenesis and the development of personalised management for SDHB/SDHC/SDHD mutation carriers.
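The penetrance figures in this abstract come from Kaplan-Meier analysis, where penetrance by a given age is one minus the estimated tumour-free survival at that age. The following is a minimal sketch of the standard product-limit estimator; the function names and the five-carrier toy cohort are hypothetical, and confidence intervals are not computed.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of tumour-free survival.

    times: age at tumour diagnosis or at last follow-up (censoring).
    events: True if a tumour was observed at that age, False if censored.
    Returns a list of (age, survival) points at each age with an event.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        age = data[i][0]
        observed = 0   # tumours diagnosed at this age
        leaving = 0    # everyone leaving the risk set at this age
        while i < len(data) and data[i][0] == age:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
            curve.append((age, survival))
        at_risk -= leaving
    return curve


def penetrance_at(curve, age):
    """Cumulative tumour risk by the given age: 1 minus survival at that age."""
    survival = 1.0
    for t, s in curve:
        if t <= age:
            survival = s
    return 1.0 - survival


# Hypothetical carriers: ages and whether a tumour was observed.
toy_curve = kaplan_meier([30, 40, 40, 50, 60], [True, False, True, True, False])
toy_risk_by_45 = penetrance_at(toy_curve, 45)
```

Censored carriers still count in the risk set until their last follow-up age, which is what lets the estimator use incomplete follow-up without discarding those individuals.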
Publisher: Elsevier BV
Date: 2021
Publisher: Springer Science and Business Media LLC
Date: 25-03-2015
DOI: 10.1038/EJHG.2015.60
Publisher: Oxford University Press (OUP)
Date: 26-06-2014
DOI: 10.1093/HMG/DDU321
Publisher: BMJ
Date: 06-09-2019
DOI: 10.1136/JMEDGENET-2019-106214
Abstract: Pathogenic germline variants in subunits of succinate dehydrogenase (SDHB, SDHC and SDHD) are broadly associated with disease subtypes of phaeochromocytoma–paraganglioma (PPGL) syndrome. Our objective was to investigate the role of variant type (ie, missense vs truncating) in determining tumour phenotype. Three independent datasets comprising 950 PPGL and head and neck paraganglioma (HNPGL) patients were analysed for associations of variant type with tumour type and age-related tumour risk. All patients were carriers of pathogenic germline variants in the SDHB, SDHC or SDHD genes. Truncating SDH variants were significantly over-represented in clinical cases compared with missense variants, and carriers of SDHD truncating variants had a significantly higher risk for PPGL (p < .001), an earlier age of diagnosis (p < .0001) and a greater risk for PPGL/HNPGL comorbidity compared with carriers of missense variants. Carriers of SDHB truncating variants displayed a trend towards increased risk of PPGL, and all three SDH genes showed a trend towards over-representation of missense variants in HNPGL cases. Overall, variant types conferred PPGL risk in the (highest-to-lowest) sequence SDHB truncating, SDHB missense, SDHD truncating and SDHD missense, with the opposite pattern apparent for HNPGL (p < .001). SDHD truncating variants represent a distinct group, with a clinical phenotype reminiscent of but not identical to SDHB. We propose that surveillance and counselling of carriers of SDHD should be tailored by variant type. The clinical impact of truncating SDHx variants is distinct from missense variants and suggests that residual SDH protein subunit function determines risk and site of disease.
Publisher: Oxford University Press (OUP)
Date: 09-12-2011
DOI: 10.1093/BIOINFORMATICS/BTR680
Abstract: Motivation: Protein–protein interfaces contain important information about molecular recognition. The discovery of conserved patterns is essential for understanding how substrates and inhibitors are bound and for predicting molecular binding. When an inhibitor binds to different enzymes (e.g. dissimilar sequences, structures or mechanisms, what we call cross-inhibition), identification of invariants is a difficult task for which traditional methods may fail. Results: To clarify how cross-inhibition happens, we model the problem, propose and evaluate a methodology called HydroPaCe to detect conserved patterns. Interfaces are modeled as graphs of atomic apolar interactions and hydrophobic patches are computed and summarized by centroids (HP-centroids), and their conservation is detected. Despite sequence and structure dissimilarity, our method achieves an appropriate level of abstraction to obtain invariant properties in cross-inhibition. We show examples in which HP-centroids successfully predicted enzymes that could be inhibited by the studied inhibitors according to BRENDA database. Availability: www.dcc.ufmg.br/~raquelcm/hydropace Contact: valdetemg@ufmg.br, raquelcm@dcc.ufmg.br, santoro@icb.ufmg.br Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Elsevier BV
Date: 11-2023
Publisher: Cold Spring Harbor Laboratory
Date: 21-10-2022
DOI: 10.1101/2022.10.17.512613
Abstract: The ability to identify B-cell epitopes is an essential step in vaccine design, immunodiagnostic tests, and antibody production. Several computational approaches have been proposed to identify, from an antigen protein, which residues are likely to be part of an epitope, but have limited performance on relatively homogeneous data sets and lack interpretability, limiting biological insights that could be derived. To address these limitations, we have developed epitope1D, an explainable machine learning method capable of accurately identifying linear B-cell epitopes, leveraging two new descriptors: a graph-based signature representation of protein sequences, based on our well established CSM (Cutoff Scanning Matrix) algorithm and Organism Ontology information. Our model achieved Area Under the ROC curve of up to 0.935 on cross-validation and blind tests, demonstrating robust performance and outperforming state-of-the-art tools. epitope1D has been made available as a user-friendly web server interface and API at biosig.lab.uq.edu.au/epitope1d.
Publisher: Oxford University Press (OUP)
Date: 04-10-2021
Abstract: While protein–nucleic acid interactions are pivotal for many crucial biological processes, limited experimental data has made the development of computational approaches to characterise these interactions a challenge. Consequently, most approaches to understand the effects of missense mutations on protein–nucleic acid affinity have focused on single-point mutations and have presented a limited performance on independent data sets. To overcome this, we have curated the largest dataset of experimentally measured effects of mutations on nucleic acid binding affinity to date, encompassing 856 single-point mutations and 141 multiple-point mutations across 155 experimentally solved complexes. This was used in combination with an optimized version of our graph-based signatures to develop mmCSM-NA (biosig.unimelb.edu.au/mmcsm_na), the first scalable method capable of quantitatively and accurately predicting the effects of multiple-point mutations on nucleic acid binding affinities. mmCSM-NA obtained a Pearson's correlation of up to 0.67 (RMSE of 1.06 kcal/mol) on single-point mutations under cross-validation, and up to 0.65 on independent non-redundant datasets of multiple-point mutations (RMSE of 1.12 kcal/mol), outperforming similar tools. mmCSM-NA is freely available as an easy-to-use web-server and API. We believe it will be an invaluable tool to shed light on the role of mutations affecting protein–nucleic acid interactions in diseases.
Publisher: American Chemical Society (ACS)
Date: 22-04-2015
Publisher: Elsevier BV
Date: 2021
Publisher: Springer Science and Business Media LLC
Date: 20-03-2017
DOI: 10.1038/S41525-017-0009-4
Abstract: We characterize a novel human cohesinopathy originated from a familial germline mutation of the gene encoding the cohesin subunit STAG2, which we propose to call STAG2-related X-linked Intellectual Deficiency. Five individuals carry a STAG2 p.Ser327Asn (c.980G>A) variant that perfectly cosegregates with a phenotype of syndromic mental retardation in a characteristic X-linked recessive pattern. Although patient-derived cells did not show overt sister-chromatid cohesion defects, they exhibited altered cell cycle profiles and gene expression patterns that were consistent with cohesin deficiency. The protein level of STAG2 in patient cells was normal. Interestingly, STAG2 S327 is located at a conserved site crucial for binding to SCC1 and cohesin regulators. When expressed in human cells, the STAG2 p.Ser327Asn mutant is defective in binding to SCC1 and other cohesin subunits and regulators. Thus, decreased amount of intact cohesin likely underlies the phenotypes of STAG2-SXLID. Intriguingly, recombinant STAG2 p.Ser327Asn binds normally to SCC1, WAPL, and SGO1 in vitro, suggesting the existence of unknown in vivo mechanisms that regulate the interaction between STAG2 and SCC1.
Publisher: Springer US
Date: 18-08-2020
Publisher: Elsevier BV
Date: 10-2021
Publisher: Oxford University Press (OUP)
Date: 23-05-2016
DOI: 10.1093/NAR/GKW458
Publisher: Oxford University Press (OUP)
Date: 21-10-2021
DOI: 10.1093/BIB/BBAB423
Abstract: The ability to identify antigenic determinants of pathogens, or epitopes, is fundamental to guide rational vaccine development and immunotherapies, which are particularly relevant for rapid pandemic response. A range of computational tools has been developed over the past two decades to assist in epitope prediction; however, they have presented limited performance and generalization, particularly for the identification of conformational B-cell epitopes. Here, we present epitope3D, a novel scalable machine learning method capable of accurately identifying conformational epitopes trained and evaluated on the largest curated epitope data set to date. Our method uses the concept of graph-based signatures to model epitope and non-epitope regions as graphs and extract distance patterns that are used as evidence to train and test predictive models. We show epitope3D outperforms available alternative approaches, achieving Matthews correlation coefficient and F1-scores of 0.55 and 0.57 on cross-validation and 0.45 and 0.36 during independent blind tests, respectively.
Publisher: Oxford University Press (OUP)
Date: 2021
Abstract: G protein-coupled receptors (GPCRs) can selectively bind to many types of ligands, ranging from light-sensitive compounds, ions, hormones, pheromones and neurotransmitters, modulating cell physiology. Considering their role in many essential cellular processes, they are one of the most targeted protein families, with over a third of all approved drugs modulating GPCR signalling. Despite this, the large diversity of receptors and their multipass transmembrane architectures make the identification and development of novel, specific and safe GPCR ligands a challenge. While computational approaches have the potential to assist GPCR drug development, they have presented limited performance and generalization capabilities. Here, we explored the use of graph-based signatures to develop pdCSM-GPCR, a method capable of rapidly and accurately screening potential GPCR ligands. Bioactivity data (IC50, EC50, Ki and Kd) for individual GPCRs were curated. After curation, we used the data for developing predictive models for 36 major GPCR targets, across 4 classes (A, B, C and F). Our models compose the most comprehensive computational resource for GPCR bioactivity prediction to date. Across stratified 10-fold cross-validation and blind tests, our approach achieved Pearson’s correlations of up to 0.89, significantly outperforming previous methods. Interpreting our results, we identified common important features of potent GPCR ligands, which tend to have bicyclic rings, leading to higher levels of aromaticity. We believe pdCSM-GPCR will be an invaluable tool to assist screening efforts, enriching compound libraries and ranking candidates for further experimental validation. pdCSM-GPCR predictive models and datasets used have been made available via a freely accessible and easy-to-use web server at biosig.unimelb.edu.au/pdcsm_gpcr/. Supplementary data are available at Bioinformatics Advances online.
Publisher: American Chemical Society (ACS)
Date: 02-07-2020
Publisher: Springer Science and Business Media LLC
Date: 07-07-2016
DOI: 10.1038/SREP29575
Abstract: The ability to predict how a mutation affects ligand binding is an essential step in understanding, anticipating and improving the design of new treatments for drug resistance and in understanding genetic diseases. Here we present mCSM-lig, a structure-guided computational approach for quantifying the effects of single-point missense mutations on affinities of small molecules for proteins. mCSM-lig uses graph-based signatures to represent the wild-type environment of mutations and small-molecule chemical features and changes in protein stability as evidence to train a predictive model using a representative set of protein-ligand complexes from the Platinum database. We show our method provides a very good correlation with experimental data (up to ρ = 0.67) and is effective in predicting a range of chemotherapeutic, antiviral and antibiotic resistance mutations, providing useful insights for genotypic screening and to guide drug development. mCSM-lig also provides insights into understanding Mendelian disease mutations and as a tool for guiding protein design. mCSM-lig is freely available as a web server at structure.bioc.cam.ac.uk/mcsm_lig.
Publisher: Oxford University Press (OUP)
Date: 29-06-2023
DOI: 10.1093/BIOINFORMATICS/BTAD402
Abstract: With the development of sequencing techniques, the discovery of new proteins significantly exceeds the human capacity and resources for experimentally characterizing protein functions. Localization, EC numbers, and GO terms with the structure-based Cutoff Scanning Matrix (LEGO-CSM) is a comprehensive web-based resource that fills this gap by leveraging the well-established and robust graph-based signatures to supervised learning models using both protein sequence and structure information to accurately model protein function in terms of Subcellular Localization, Enzyme Commission (EC) numbers, and Gene Ontology (GO) terms. We show our models perform as well as or better than alternative approaches, achieving area under the receiver operating characteristic curve of up to 0.93 for subcellular localization, up to 0.93 for EC, and up to 0.81 for GO terms on independent blind tests. LEGO-CSM’s web server is freely available at biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM’s models can be downloaded at biosig.lab.uq.edu.au/lego_csm/data.
Publisher: Oxford University Press (OUP)
Date: 07-06-2023
DOI: 10.1093/NAR/GKAD472
Abstract: Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at biosig.lab.uq.edu.au/ddmut.
Publisher: Oxford University Press (OUP)
Date: 08-02-2013
DOI: 10.1093/BIOINFORMATICS/BTT058
Abstract: Motivation: Receptor-ligand interactions are a central phenomenon in most biological systems. They are characterized by molecular recognition, a complex process mainly driven by physicochemical and structural properties of both receptor and ligand. Understanding and predicting these interactions are major steps towards protein ligand prediction, target identification, lead discovery and drug design. Results: We propose a novel graph-based–binding pocket signature called aCSM, which proved to be efficient and effective in handling large-scale protein ligand prediction tasks. We compare our results with those described in the literature and demonstrate that our algorithm overcomes the competitor’s techniques. Finally, we predict novel ligands for proteins from Trypanosoma cruzi, the parasite responsible for Chagas disease, and validate them in silico via a docking protocol, showing the applicability of the method in suggesting ligands for pockets in a real-world scenario. Availability and implementation: Datasets and the source code are available at www.dcc.ufmg.br/∼dpires/acsm. Contact: dpires@dcc.ufmg.br or raquelcm@dcc.ufmg.br Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 10-04-2023
DOI: 10.1093/BIB/BBAD114
Abstract: The ability to identify B-cell epitopes is an essential step in vaccine design, immunodiagnostic tests and antibody production. Several computational approaches have been proposed to identify, from an antigen protein or peptide sequence, which residues are more likely to be part of an epitope, but have limited performance on relatively homogeneous data sets and lack interpretability, limiting biological insights that could otherwise be obtained. To address these limitations, we have developed epitope1D, an explainable machine learning method capable of accurately identifying linear B-cell epitopes, leveraging two new descriptors: a graph-based signature representation of protein sequences, based on our well-established Cutoff Scanning Matrix algorithm and Organism Ontology information. Our model achieved Areas Under the ROC curve of up to 0.935 on cross-validation and blind tests, demonstrating robust performance. A comprehensive comparison to alternative methods using distinct benchmark data sets was also employed, with our model outperforming state-of-the-art tools. epitope1D represents not only a significant advance in predictive performance, but also allows biologically meaningful features to be combined and used for model interpretation. epitope1D has been made available as a user-friendly web server interface and application programming interface at biosig.lab.uq.edu.au/epitope1d/.
Publisher: Oxford University Press (OUP)
Date: 21-02-2022
DOI: 10.1093/BIB/BBAC025
Abstract: Changes in protein sequence can have dramatic effects on how proteins fold, their stability and dynamics. Over the last 20 years, pioneering methods have been developed to try to estimate the effects of missense mutations on protein stability, leveraging growing availability of protein 3D structures. These, however, have been developed and validated using experimentally derived structures and biophysical measurements. A large proportion of protein structures remain to be experimentally elucidated and, while many studies have based their conclusions on predictions made using homology models, there has been no systematic evaluation of the reliability of these tools in the absence of experimental structural data. We have, therefore, systematically investigated the performance and robustness of ten widely used structural methods when presented with homology models built using templates at a range of sequence identity levels (from 15% to 95%) and contrasted performance with sequence-based tools, as a baseline. We found there is indeed performance deterioration on homology models built using templates with sequence identity below 40%, where sequence-based tools might become preferable. This was most marked for mutations in solvent exposed residues and stabilizing mutations. As structure prediction tools improve, the reliability of these predictors is expected to follow, however we strongly suggest that these factors should be taken into consideration when interpreting results from structure-based predictors of mutation effects on protein stability.
Publisher: Oxford University Press (OUP)
Date: 30-04-2018
DOI: 10.1093/NAR/GKY300
Publisher: Oxford University Press (OUP)
Date: 23-10-2021
DOI: 10.1093/NAR/GKAA925
Abstract: Proteins are intricate, dynamic structures, and small changes in their amino acid sequences can lead to large effects on their folding, stability and dynamics. To facilitate the further development and evaluation of methods to predict these changes, we have developed ThermoMutDB, a manually curated database containing & ,669 experimental data of thermodynamic parameters for wild type and mutant proteins. This represents an increase of 83% in unique mutations over previous databases and includes thermodynamic information on 204 new proteins. During manual curation we have also corrected annotation errors in previously curated entries. Associated with each entry, we have included information on the unfolding Gibbs free energy and melting temperature change, and have associated entries with available experimental structural information. ThermoMutDB supports users to contribute to new data points and programmatic access to the database via a RESTful API. ThermoMutDB is freely available at: biosig.unimelb.edu.au/thermomutdb.
Publisher: Cold Spring Harbor Laboratory
Date: 04-2023
DOI: 10.1101/2023.03.31.535182
Abstract: Glycoside hydrolases (GHs) are a diverse group of enzymes that catalyze the hydrolysis of glycosidic bonds. The Carbohydrate-Active enZymes (CAZy) classification organizes GHs into families based on sequence data and function, with fewer than 1% of the predicted proteins characterized biochemically. Consideration of genomic context can provide clues to infer possible enzyme activities for proteins of unknown function. We used the MultiGeneBLAST tool to discover a gene cluster in Marinovum sp., a member of the marine Roseobacter clade, that encodes homologues of enzymes belonging to the sulfoquinovose monooxygenase pathway for sulfosugar catabolism. This cluster lacks a gene encoding a classical family GH31 sulfoquinovosidase candidate, but instead includes an uncharacterized family GH13 protein (MsGH13) that we hypothesized could be a non-classical sulfoquinovosidase. Surprisingly, recombinant MsGH13 lacks sulfoquinovosidase activity and is a broad spectrum α-glucosidase that is active on a diverse array of α-linked disaccharides, including: maltose, sucrose, nigerose, trehalose, isomaltose, and kojibiose. Using AlphaFold, a 3D model for the MsGH13 enzyme was constructed that predicted its active site shared close similarity with an α-glucosidase from Halomonas sp. H11 of the same GH13 subfamily that shows narrower substrate specificity.
Publisher: Oxford University Press (OUP)
Date: 26-11-2014
DOI: 10.1093/BIOINFORMATICS/BTT691
Abstract: Motivation: Mutations play fundamental roles in evolution by introducing diversity into genomes. Missense mutations in structural genes may become either selectively advantageous or disadvantageous to the organism by affecting protein stability and/or interfering with interactions between partners. Thus, the ability to predict the impact of mutations on protein stability and interactions is of significant value, particularly in understanding the effects of Mendelian and somatic mutations on the progression of disease. Here, we propose a novel approach to the study of missense mutations, called mCSM, which relies on graph-based signatures. These encode distance patterns between atoms and are used to represent the protein residue environment and to train predictive models. To understand the roles of mutations in disease, we have evaluated their impacts not only on protein stability but also on protein–protein and protein–nucleic acid interactions. Results: We show that mCSM performs as well as or better than other methods that are used widely. The mCSM signatures were successfully used in different tasks demonstrating that the impact of a mutation can be correlated with the atomic-distance patterns surrounding an amino acid residue. We showed that mCSM can predict stability changes of a wide range of mutations occurring in the tumour suppressor protein p53, demonstrating the applicability of the proposed method in a challenging disease scenario. Availability and implementation: A web server is available at structure.bioc.cam.ac.uk/mcsm. Contact: dpires@dcc.ufmg.br or tom@cryst.bioc.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Springer Science and Business Media LLC
Date: 22-01-2016
DOI: 10.1038/SREP19848
Abstract: Despite interest in associating polymorphisms with clinical or experimental phenotypes, functional interpretation of mutation data has lagged behind generation of data from modern high-throughput techniques and the accurate prediction of the molecular impact of a mutation remains a non-trivial task. We present here an integrated knowledge-driven computational workflow designed to evaluate the effects of experimental and disease missense mutations on protein structure and interactions. We exemplify its application with analyses of saturation mutagenesis of DBR1 and Gal4 and show that the experimental phenotypes for over 80% of the mutations correlate well with predicted effects of mutations on protein stability and RNA binding affinity. We also show that analysis of mutations in VHL using our workflow provides valuable insights into the effects of mutations and their links to the risk of developing renal carcinoma. Taken together the analyses of the three examples demonstrate that structural bioinformatics tools, when applied in a systematic, integrated way, can rapidly analyse a given system to provide a powerful approach for predicting structural and functional effects of thousands of mutations in order to reveal molecular mechanisms leading to a phenotype. Missense or non-synonymous mutations are nucleotide substitutions that alter the amino acid sequence of a protein. Their effects can range from modifying transcription, translation, processing and splicing, localization, changing stability of the protein, altering its dynamics or interactions with other proteins, nucleic acids and ligands, including small molecules and metal ions. The advent of high-throughput techniques including sequencing and saturation mutagenesis has provided large amounts of phenotypic data linked to mutations. However, one of the hurdles has been understanding and quantifying the effects of a particular mutation and how they translate into a given phenotype. One approach to overcome this is to use robust, accurate and scalable computational methods to understand and correlate structural effects of mutations with disease.
Publisher: Springer Science and Business Media LLC
Date: 22-12-2022
Publisher: Oxford University Press (OUP)
Date: 29-04-2017
DOI: 10.1093/NAR/GKX337
Publisher: American Chemical Society (ACS)
Date: 11-10-2022
Abstract: The design of novel, safe, and effective drugs to treat human diseases is a challenging venture, with toxicity being one of the main sources of attrition at later stages of development. Failure due to toxicity incurs a significant increase in costs and time to market, with multiple drugs being withdrawn from the market due to their adverse effects. Cardiotoxicity, for instance, was responsible for the failure of drugs such as fenspiride, propoxyphene, and valdecoxib. While significant effort has been dedicated to mitigate this issue by developing computational approaches that aim to identify molecules likely to be toxic, including quantitative structure-activity relationship models and machine learning methods, current approaches present limited performance and interpretability. To overcome these, we propose a new web-based computational method, cardioToxCSM, which can predict six types of cardiac toxicity outcomes, including arrhythmia, cardiac failure, heart block, hERG toxicity, hypertension, and myocardial infarction, efficiently and accurately. cardioToxCSM was developed using the concept of graph-based signatures, molecular descriptors, toxicophore matchings, and molecular fingerprints, leveraging explainable machine learning, and was validated internally via different cross validation schemes and externally via low-redundancy blind sets. The models presented robust performances with areas under ROC curves of up to 0.898 on 5-fold cross-validation, consistent with metrics on blind tests. Additionally, our models provide interpretation of the predictions by identifying whether substructures that are commonly enriched in toxic compounds were present. We believe cardioToxCSM will provide valuable insight into the potential cardiotoxicity of small molecules early on drug screening efforts. The method is made freely available as a web server at biosig.lab.uq.edu.au/cardiotoxcsm.
Publisher: Oxford University Press (OUP)
Date: 12-05-2020
DOI: 10.1093/BIOINFORMATICS/BTAA480
Abstract: EasyVS is a web-based platform built to simplify molecule library selection and virtual screening. With an intuitive interface, the tool allows users to go from selecting a protein target with a known structure and tailoring a purchasable molecule library to performing and visualizing docking in a few clicks. Our system also allows users to filter screening libraries based on molecule properties, cluster molecules by similarity and personalize docking parameters. EasyVS is freely available as an easy-to-use web interface at biosig.unimelb.edu.au/easyvs. Contact: douglas.pires@unimelb.edu.au or david.ascher@unimelb.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: American Chemical Society (ACS)
Date: 11-2021
Publisher: American Chemical Society (ACS)
Date: 02-07-2021
Publisher: American Chemical Society (ACS)
Date: 03-01-2023
Publisher: Wiley
Date: 02-03-2017
DOI: 10.1002/MGG3.279
Publisher: Oxford University Press (OUP)
Date: 26-10-2023
DOI: 10.1093/HMG/DDAD181
Publisher: Oxford University Press (OUP)
Date: 24-04-2015
DOI: 10.1093/BIOINFORMATICS/BTV223
Abstract: Summary: PDBest (PDB Enhanced Structures Toolkit) is a user-friendly, freely available platform for acquiring, manipulating and normalizing protein structures in a high-throughput and seamless fashion. With an intuitive graphical interface it allows users with no programming background to download and manipulate their files. The platform also exports protocols, enabling users to easily share PDB searching and filtering criteria, enhancing analysis reproducibility. Availability and implementation: PDBest installation packages are freely available for several platforms at www.pdbest.dcc.ufmg.br Contact: wellisson@dcc.ufmg.br, dpires@dcc.ufmg.br, raquelcm@dcc.ufmg.br Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 29-05-2021
DOI: 10.1093/NAR/GKAB428
Abstract: The identification of disease-causal variants is non-trivial. By mapping population variation from over 448,000 exome and genome sequences to over 81,000 experimental structures and homology models of the human proteome, we have calculated both regional intolerance to missense variation (Missense Tolerance Ratio, MTR), using a sliding window of 21–41 codons, and introduce a new 3D spatial intolerance to missense variation score (3D Missense Tolerance Ratio, MTR3D), using spheres of 5–8 Å. We show that the MTR3D is less biased by regions with limited data and more accurately identifies regions under purifying selection than estimates relying on the sequence alone. Intolerant regions were highly enriched for both ClinVar pathogenic and COSMIC somatic missense variants (Mann–Whitney U test P < 2.2 × 10^−16). Further, we combine sequence- and spatial-based scores to generate a consensus score, MTRX, which distinguishes pathogenic from benign variants more accurately than either score separately (AUC = 0.85). The MTR3D server enables easy visualisation of population variation, MTR, MTR3D and MTRX scores across the entire gene and protein structure for & ,000 human genes and & ,000 alternate transcripts, including both Ensembl and RefSeq transcripts. MTR3D is freely available by user-friendly web-interface and API at biosig.unimelb.edu.au/mtr3d/.
Publisher: Informa UK Limited
Date: 11-05-2017
DOI: 10.1080/17460441.2017.1322579
Abstract: Mutations introduce diversity into genomes, leading to selective changes and driving evolution. These changes have contributed to the emergence of many of the current major health concerns of the 21st century, from the development of genetic diseases and cancers to the rise and spread of drug resistance. The experimental systematic testing of all mutations in a system of interest is impractical and not cost-effective, which has created interest in the development of computational tools to understand the molecular consequences of mutations to aid and guide rational experimentation. Areas covered: Here, the authors discuss the recent development of computational methods to understand the effects of coding mutations to protein function and interactions, particularly in the context of the 3D structure of the protein. Expert opinion: While significant progress has been made in terms of innovative tools to understand and quantify the different range of effects in which a mutation or a set of mutations can give rise to a phenotype, a great gap still exists when integrating these predictions and drawing causality conclusions linking variants. This often requires a detailed understanding of the system being perturbed. However, as part of the drug development process it can be used preemptively in a similar fashion to pharmacokinetics predictions, to guide development of therapeutics to help guide the design and analysis of clinical trials, patient treatment and public health policy strategies.
Publisher: Oxford University Press (OUP)
Date: 29-06-2023
DOI: 10.1093/BIOINFORMATICS/BTAD392
Abstract: While antibodies have been ground-breaking therapeutic agents, the structural determinants for antibody binding specificity remain to be fully elucidated, which is compounded by the virtually unlimited repertoire of antigens they can recognize. Here, we have explored the structural landscapes of antibody–antigen interfaces to identify the structural determinants driving target recognition by assessing concavity and interatomic interactions. We found that complementarity-determining regions utilized deeper concavity with their longer H3 loops, especially H3 loops of nanobody showing the deepest use of concavity. Of all amino acid residues found in complementarity-determining regions, tryptophan used deeper concavity, especially in nanobodies, making it suitable for leveraging concave antigen surfaces. Similarly, antigens utilized arginine to bind to deeper pockets of the antibody surface. Our findings fill a gap in knowledge about the antibody specificity, binding affinity, and the nature of antibody–antigen interface features, which will lead to a better understanding of how antibodies can be more effective to target druggable sites on antigen surfaces. The data and scripts are available at: github.com/YoochanMyung/scripts.
Publisher: Springer Science and Business Media LLC
Date: 12-2011
Publisher: Portland Press Ltd.
Date: 15-08-2016
DOI: 10.1042/BST20160080
Abstract: Polypyrimidine tract binding protein (PTBP1) is a heterogeneous nuclear ribonucleoprotein (hnRNP) that plays roles in most stages of the life-cycle of pre-mRNA and mRNAs in the nucleus and cytoplasm. PTBP1 has four RNA binding domains of the RNA recognition motif (RRM) family, each of which can bind to pyrimidine motifs. In addition, RRM2 can interact via its dorsal surface with proteins containing short peptide ligands known as PTB RRM2 interacting (PRI) motifs, originally found in the protein Raver1. Here we review our recent progress in understanding the interactions of PTB with RNA and with various proteins containing PRI ligands.
Publisher: Springer Berlin Heidelberg
Date: 2014
Publisher: Public Library of Science (PLoS)
Date: 13-12-2016
Publisher: Oxford University Press (OUP)
Date: 21-05-2022
DOI: 10.1093/BIB/BBAC178
Abstract: Metals are present in >30% of proteins found in nature and assist them to perform important biological functions, including storage, transport, signal transduction and enzymatic activity. Traditional and experimental techniques for metal-binding site prediction are usually costly and time-consuming, making computational tools that can assist in these predictions of significant importance. Here we present Genetic Active Site Search (GASS)-Metal, a new method for protein metal-binding site prediction. The method relies on a parallel genetic algorithm to find candidate metal-binding sites that are structurally similar to curated templates from M-CSA and MetalPDB. GASS-Metal was thoroughly validated using homologous proteins and conservative mutations of residues, showing a robust performance. The ability of GASS-Metal to identify metal-binding sites was also compared with state-of-the-art methods, outperforming similar methods and achieving an MCC of up to 0.57 and detecting up to 96.1% of the sites correctly. GASS-Metal is freely available at gassmetal.unifei.edu.br. The GASS-Metal source code is available at androizidoro/gassmetal-local.
Publisher: Wiley
Date: 26-10-2022
DOI: 10.1002/ART.42296
Abstract: Deep learning has emerged as the leading method in machine learning, spawning a rapidly growing field of academic research and commercial applications across medicine. Deep learning could have particular relevance to rheumatology if correctly utilized. The greatest benefits of deep learning methods are seen with unstructured data frequently found in rheumatology, such as images and text, where traditional machine learning methods have struggled to unlock the trove of information held within these data formats. The basis for this success comes from the ability of deep learning to learn the structure of the underlying data. It is no surprise that the first areas of medicine that have started to experience impact from deep learning heavily rely on interpreting visual data, such as triaging radiology workflows and computer‐assisted colonoscopy. Applications in rheumatology are beginning to emerge, with recent successes in areas as diverse as detecting joint erosions on plain radiography, predicting future rheumatoid arthritis disease activity, and identifying halo sign on temporal artery ultrasound. Given the important role deep learning methods are likely to play in the future of rheumatology, it is imperative that rheumatologists understand the methods and assumptions that underlie the deep learning algorithms in widespread use today, their limitations and the landscape of deep learning research that will inform algorithm development, and clinical decision support tools of the future. The best applications of deep learning in rheumatology must be informed by the clinical experience of rheumatologists, so that algorithms can be developed to tackle the most relevant clinical problems.
Publisher: Oxford University Press (OUP)
Date: 21-05-2018
DOI: 10.1093/NAR/GKY375
Publisher: Oxford University Press (OUP)
Date: 21-06-2022
DOI: 10.1093/BIB/BBAC216
Abstract: The rate of biological data generation has increased dramatically in recent years, which has driven the importance of databases as a resource to guide innovation and the generation of biological insights. Given the complexity and scale of these databases, automatic data classification is often required. Biological data sets are often hierarchical in nature, with varying degrees of complexity, imposing different challenges to train, test and validate accurate and generalizable classification models. While some approaches to classify hierarchical data have been proposed, no guidelines regarding their utility, applicability and limitations have been explored or implemented. These include ‘Local’ approaches considering the hierarchy, building models per level or node, and ‘Global’ hierarchical classification, using a flat classification approach. To fill this gap, here we have systematically contrasted the performance of ‘Local per Level’ and ‘Local per Node’ approaches with a ‘Global’ approach applied to two different hierarchical datasets: BioLip and CATH. The results show how different components of hierarchical data sets, such as variation coefficient and prediction by depth, can guide the choice of appropriate classification schemes. Finally, we provide guidelines to support this process when embarking on a hierarchical classification task, which will help optimize computational resources and predictive performance.
Publisher: Oxford University Press (OUP)
Date: 23-08-2022
DOI: 10.1093/BIB/BBAC337
Abstract: Drug discovery is a lengthy, costly and high-risk endeavour that is further convoluted by high attrition rates in later development stages. Toxicity has been one of the main causes of failure during clinical trials, increasing drug development time and costs. To facilitate early identification and optimisation of toxicity profiles, several computational tools emerged aiming at improving success rates by timely pre-screening drug candidates. Despite these efforts, there is an increasing demand for platforms capable of assessing both environmental as well as human-based toxicity properties at large scale. Here, we present toxCSM, a comprehensive computational platform for the study and optimisation of toxicity profiles of small molecules. toxCSM leverages on the well-established concepts of graph-based signatures, molecular descriptors and similarity scores to develop 36 models for predicting a range of toxicity properties, which can assist in developing safer drugs and agrochemicals. toxCSM achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) of up to 0.99 and Pearson’s correlation coefficients of up to 0.94 on 10-fold cross-validation, with comparable performance on blind test sets, outperforming all alternative methods. toxCSM is freely available as a user-friendly web server and API at biosig.lab.uq.edu.au/toxcsm.
Publisher: Oxford University Press (OUP)
Date: 14-05-2014
DOI: 10.1093/NAR/GKU411
Publisher: Oxford University Press (OUP)
Date: 07-05-2022
DOI: 10.1093/NAR/GKAC323
Abstract: Proteins are essential macromolecules for the maintenance of living systems. Many of them perform their function by interacting with other molecules in regions called binding sites. The identification and characterization of these regions are of fundamental importance to determine protein function, being a fundamental step in processes such as drug design and discovery. However, identifying such binding regions is not trivial due to the drawbacks of experimental methods, which are costly and time-consuming. Here we propose GRaSP-web, a web server that uses GRaSP (Graph-based Residue neighborhood Strategy to Predict binding sites), a residue-centric method based on graphs that uses machine learning to predict putative ligand binding site residues. The method outperformed 6 state-of-the-art residue-centric methods (MCC of 0.61). Also, GRaSP-web is scalable as it takes 10-20 seconds to predict binding sites for a protein complex (the state-of-the-art residue-centric method takes 2-5h on the average). It proved to be consistent in predicting binding sites for bound/unbound structures (MCC 0.61 for both) and for a large dataset of multi-chain proteins (4500 entries, MCC 0.61). GRaSPWeb is freely available at grasp.ufv.br.
Publisher: Elsevier BV
Date: 12-2021
Publisher: Oxford University Press (OUP)
Date: 24-04-2021
DOI: 10.1093/NAR/GKAB273
Abstract: Protein–protein interactions play a crucial role in all cellular functions and biological processes and mutations leading to their disruption are enriched in many diseases. While a number of computational methods to assess the effects of variants on protein–protein binding affinity have been proposed, they are in general limited to the analysis of single point mutations and have been shown to perform poorly on independent test sets. Here, we present mmCSM-PPI, a scalable and effective machine learning model for accurately assessing changes in protein–protein binding affinity caused by single and multiple missense mutations. We expanded our well-established graph-based signatures in order to capture physicochemical and geometrical properties of multiple wild-type residue environments and integrated them with substitution scores and dynamics terms from normal mode analysis. mmCSM-PPI was able to achieve a Pearson's correlation of up to 0.75 (RMSE = 1.64 kcal/mol) under 10-fold cross-validation and 0.70 (RMSE = 2.06 kcal/mol) on a non-redundant blind test, outperforming existing methods. Our method is freely available as a user-friendly and easy-to-use web server and API at biosig.unimelb.edu.au/mmcsm_ppi.
Publisher: Oxford University Press (OUP)
Date: 26-10-2020
DOI: 10.1093/BIOINFORMATICS/BTZ779
Abstract: A lack of accurate computational tools to guide rational mutagenesis has made affinity maturation a recurrent challenge in antibody (Ab) development. We previously showed that graph-based signatures can be used to predict the effects of mutations on Ab binding affinity. Here we present an updated and refined version of this approach, mCSM-AB2, capable of accurately modelling the effects of mutations on Ab–antigen binding affinity, through the inclusion of evolutionary and energetic terms. Using a new and expanded database of over 1800 mutations with experimental binding measurements and structural information, mCSM-AB2 achieved a Pearson’s correlation of 0.73 and 0.77 across training and blind tests, respectively, outperforming available methods currently used for rational Ab engineering. mCSM-AB2 is available as a user-friendly and freely accessible web server providing rapid analysis of both individual mutations or the entire binding interface to guide rational antibody affinity maturation at biosig.unimelb.edu.au/mcsm_ab2. Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 22-05-2019
DOI: 10.1093/NAR/GKZ383
Abstract: Protein–protein Interactions are involved in most fundamental biological processes, with disease causing mutations enriched at their interfaces. Here we present mCSM-PPI2, a novel machine learning computational tool designed to more accurately predict the effects of missense mutations on protein–protein interaction binding affinity. mCSM-PPI2 uses graph-based structural signatures to model effects of variations on the inter-residue interaction network, evolutionary information, complex network metrics and energetic terms to generate an optimised predictor. We demonstrate that our method outperforms previous methods, ranking first among 26 others on CAPRI blind tests. mCSM-PPI2 is freely available as a user friendly webserver at biosig.unimelb.edu.au/mcsm_ppi2/.
Publisher: Springer Science and Business Media LLC
Date: 23-03-2016
Publisher: Cold Spring Harbor Laboratory
Date: 26-09-2021
DOI: 10.1101/2021.09.26.461876
Abstract: Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods have led to protein structure predictions that have reached the accuracy of experimentally determined models. While this has been independently verified, the implementation of these methods across structural biology applications remains to be tested. Here, we evaluate the use of AlphaFold 2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modelling of interactions; and modelling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modelled when compared to homology modelling, identifying structural features rarely seen in the PDB. AF2-based predictions of protein disorder and protein complexes surpass state-of-the-art tools and AF2 models can be used across diverse applications equally well compared to experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life science research.
Publisher: Oxford University Press (OUP)
Date: 04-11-2021
DOI: 10.1093/BIOINFORMATICS/BTAB762
Abstract: Understanding antibody–antigen interactions is key to improving their binding affinities and specificities. While experimental approaches are fundamental for developing new therapeutics, computational methods can provide quick assessment of binding landscapes, guiding experimental design. Despite this, little effort has been devoted to accurately predicting the binding affinity between antibodies and antigens and to develop tailored docking scoring functions for this type of interaction. Here, we developed CSM-AB, a machine learning method capable of predicting antibody–antigen binding affinity by modelling interaction interfaces as graph-based signatures. CSM-AB outperformed alternative methods achieving a Pearson's correlation of up to 0.64 on blind tests. We also show CSM-AB can accurately rank near-native poses, working effectively as a docking scoring function. We believe CSM-AB will be an invaluable tool to assist in the development of new immunotherapies. CSM-AB is freely available as a user-friendly web interface and API at biosig.unimelb.edu.au/csm_ab/datasets. Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 05-05-2016
DOI: 10.1093/NAR/GKW390
Publisher: Informa UK Limited
Date: 28-04-2019
No related grants have been discovered for Douglas Pires.