ARDC Research Link Australia

Publication

Can simple codon pair usage predict protein-protein interaction?

Publisher: Royal Society of Chemistry (RSC)

Date: 2012

Abstract: Deciphering functional interactions between proteins is one of the great challenges in biology. Sequence-based, homology-free encoding schemes have increasingly been applied to develop promising protein-protein interaction (PPI) predictors using statistical or machine learning methods. Here we analyze the relationship between codon pair usage and PPIs in yeast. We show that the codon pair usage of interacting protein pairs differs significantly from that expected at random. This motivates a novel approach for predicting PPIs, termed CCPPI, which uses codon pair frequency differences as input to a Support Vector Machine predictor. Ten-fold cross-validation tests on yeast PPI datasets with balanced positive-to-negative ratios indicate that CCPPI performs better than other sequence-based encoding schemes. Moreover, it ranks best when tested on an unbalanced large-scale dataset. Although CCPPI, like many PPI predictors, is subject to high false positive rates, statistical analyses of the predicted true positives confirm that its success is partly attributable to its ability to capture proteomic co-expression and functional similarities between interacting protein pairs. Our findings suggest that the codon pairs of interacting protein pairs evolve in a coordinated manner and consequently provide additional information beyond amino acid-based encoding schemes. CCPPI is freely available at: protein.cau.edu.cn/ccppi.
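The abstract does not define "codon pair frequency difference" precisely. The sketch below is one plausible reading, assumed rather than taken from the CCPPI paper: count adjacent, in-frame codon pairs in each protein's coding sequence, convert the counts to relative frequencies over the 4,096 possible pairs, and take the element-wise absolute difference for a candidate protein pair. The function names `codon_pair_frequencies` and `pair_feature` are hypothetical, and the SVM training step is not shown.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, giving 64 * 64 = 4096 possible codon pairs.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
PAIR_INDEX = {a + b: i for i, (a, b) in enumerate(product(CODONS, repeat=2))}


def codon_pair_frequencies(cds: str) -> list[float]:
    """Relative frequency of each adjacent, in-frame codon pair in a coding sequence."""
    cds = cds.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Keep only pairs made of standard nucleotides (e.g. drops codons containing 'N').
    pairs = [a + b for a, b in zip(codons, codons[1:]) if a + b in PAIR_INDEX]
    freqs = [0.0] * len(PAIR_INDEX)
    if not pairs:
        return freqs
    for pair, n in Counter(pairs).items():
        freqs[PAIR_INDEX[pair]] = n / len(pairs)
    return freqs


def pair_feature(cds_a: str, cds_b: str) -> list[float]:
    """Hypothetical PPI feature: absolute difference of two codon pair frequency vectors."""
    fa, fb = codon_pair_frequencies(cds_a), codon_pair_frequencies(cds_b)
    return [abs(x - y) for x, y in zip(fa, fb)]
```

Using an absolute difference makes the feature symmetric, so the vector fed to a classifier does not depend on which partner of a candidate pair is listed first.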

Publication

PhosphoPredict: A bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection

Publisher: Springer Science and Business Media LLC

Date: 31-07-2017

DOI: 10.1038/S41598-017-07199-4

Abstract: Protein phosphorylation is a major form of post-translational modification (PTM) that regulates diverse cellular processes. In silico methods for phosphorylation site prediction can provide a useful and complementary strategy for complete phosphoproteome annotation. Here, we present a novel bioinformatics tool, PhosphoPredict, that combines protein sequence and functional features to predict kinase-specific substrates and their associated phosphorylation sites for 12 human kinases and kinase families, including ATM, CDKs, GSK-3, MAPKs, PKA, PKB, PKC, and SRC. To elucidate critical determinants, we identified the feature subsets that were most informative and relevant for predicting substrate specificity for each individual kinase family. Extensive benchmarking experiments based on both five-fold cross-validation and independent tests indicated that the performance of PhosphoPredict is competitive with that of several other popular prediction tools, including KinasePhos, PPSP, GPS, and Musite. We found that combining protein functional and sequence features significantly improves phosphorylation site prediction performance across all kinases. Application of PhosphoPredict to the entire human proteome identified 150 to 800 potential phosphorylation substrates for each of the 12 kinases or kinase families. PhosphoPredict significantly extends the bioinformatics portfolio for kinase function analysis and will facilitate high-throughput identification of kinase-specific phosphorylation sites, thereby contributing to both basic and translational research programs.

Publication

hCKSAAP_UbSite: Improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties

Publisher: Elsevier BV

Date: 08-2013

DOI: 10.1016/J.BBAPAP.2013.04.006

Abstract: As one of the most common post-translational modifications, ubiquitination regulates the quantity and function of a variety of proteins. Experimental and clinical investigations have also suggested crucial roles for ubiquitination in several human diseases. The complicated sequence context of human ubiquitination sites revealed by proteomic studies highlights the need for effective computational strategies to predict human ubiquitination sites. Here we report a novel human-specific ubiquitination site predictor built by integrating multiple complementary classifiers. First, a Support Vector Machine (SVM) classifier was constructed based on the composition of k-spaced amino acid pairs (CKSAAP) encoding, which we used previously in our yeast ubiquitination site predictor. To further exploit the pattern and properties of the ubiquitination sites and their flanking residues, three additional SVM classifiers were constructed using the binary amino acid encoding, the AAindex physicochemical property encoding and the protein aggregation propensity encoding, respectively. After the four classifiers were integrated by logistic regression, the resulting predictor, termed hCKSAAP_UbSite, achieved an area under the ROC curve (AUC) of 0.770 in a 5-fold cross-validation test on a class-balanced training dataset. On a class-balanced independent testing dataset containing 3419 ubiquitination sites, hCKSAAP_UbSite also performed robustly, with an AUC of 0.757. It consistently outperformed the predictor using the CKSAAP encoding alone and two other publicly available predictors that are not human-specific. Given its promising performance on our large-scale datasets, hCKSAAP_UbSite has been made publicly available at our server (protein.cau.edu.cn/cksaap_ubsite/).
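CKSAAP encoding, as it is commonly described, represents a sequence window by the composition of residue pairs separated by k intervening residues, for each k from 0 up to some k_max, giving 400 values per gap. The minimal sketch below follows that description; the window length, the default `k_max=3` and the handling of padding characters such as 'X' are assumptions for illustration, not the settings used by hCKSAAP_UbSite. The binary, AAindex and aggregation-propensity encodings mentioned in the abstract are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(window: str, k_max: int = 3) -> list[float]:
    """Composition of k-spaced amino acid pairs for k = 0..k_max.

    k = 0 means adjacent residues. For each k, counts are divided by the number
    of k-spaced pairs in the window (len(window) - k - 1). Pairs containing a
    non-standard residue or padding (assumed here to be 'X') still count toward
    that denominator but are not assigned to any of the 400 bins.
    """
    window = window.upper()
    features: list[float] = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(AA_PAIRS, 0)
        n_pairs = len(window) - k - 1
        for i in range(max(n_pairs, 0)):
            pair = window[i] + window[i + k + 1]
            if pair in counts:
                counts[pair] += 1
        features.extend(counts[p] / n_pairs if n_pairs > 0 else 0.0 for p in AA_PAIRS)
    return features
```

For example, `cksaap("MKVLAAGKTEE", k_max=2)` returns 1,200 values (400 per gap), a fixed-length vector of the kind an SVM classifier can consume regardless of residue identity at the centre.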

Publication

Predicting residue-residue contacts and helix-helix interactions in transmembrane proteins using an integrative feature-based random forest approach

Publisher: Public Library of Science (PLoS)

Date: 28-10-2011

DOI: 10.1371/JOURNAL.PONE.0026767

Publication

Three chromosome-level duck genome assemblies provide insights into genomic variation during domestication

Publisher: Springer Science and Business Media LLC

Date: 11-10-2021

DOI: 10.1038/S41467-021-26272-1

Abstract: Domestic ducks are raised for meat, eggs and feather down, and almost all varieties are descended from the Mallard (Anas platyrhynchos). Here, we report chromosome-level, high-quality genome assemblies for meat and laying duck breeds, and for the Mallard. Our new genomic databases contain annotations for thousands of new protein-coding genes and recover a large proportion of the presumed "missing genes" in birds. We obtain the entire genomic sequences of the C-type lectin (CTL) family members that regulate eggshell biomineralization. Our population and comparative genomics analyses provide more than 36 million sequence variants between duck populations. Furthermore, a mutant cell line confirms the predicted anti-adipogenic function of NR2F2 in the duck, and we uncover mutations specific to the Pekin duck that potentially affect adipose deposition. Our study provides insights into avian evolution and the genetics of oviparity, and will be a rich resource for the future genetic improvement of commercial traits in the duck.

Publication

An Integrative Computational Framework Based on a Two-Step Random Forest Algorithm Improves Prediction of Zinc-Binding Sites in Proteins

Publisher: Public Library of Science (PLoS)

Date: 14-11-2012

DOI: 10.1371/JOURNAL.PONE.0049716

Publication

Structural propensities of human ubiquitination sites: accessibility, centrality and local conformation

Publisher: Public Library of Science (PLoS)

Date: 11-12-2013

DOI: 10.1371/JOURNAL.PONE.0083167

Publication

ZincExplorer: An accurate hybrid method to improve the prediction of zinc-binding sites from protein sequences

Publisher: Royal Society of Chemistry (RSC)

Date: 2013

DOI: 10.1039/C3MB70100J

Abstract: As one of the most important trace elements within an organism, zinc is involved in numerous biological processes and closely implicated in various diseases. Zinc ions are important for proteins to perform their functional roles. To provide in-depth functional annotation of zinc-binding proteins, an initial but crucial step is the accurate recognition of zinc-binding sites. Motivated by the biological importance of zinc, we propose ZincExplorer, a new hybrid method for predicting zinc-binding sites from protein sequences. It integrates the outputs of three different types of predictors: SVM-, cluster- and template-based. ZincExplorer predicts zinc binding for four types of amino acids, collectively termed CHEDs (i.e. Cys, His, Asp and Glu). In the 5-fold cross-validation test on the CHEDs, it achieved a high AURPC (area under the recall-precision curve) of 0.851 and, at 70.0% recall, a precision of 85.6% (specificity = 98.4%, MCC = 0.747). When tested on an independent dataset containing 2023 zinc-binding CHEDs and 14,493 non-zinc-binding CHEDs, it achieved an AURPC about 3-8% higher than that of two other sequence-based predictors. Moreover, ZincExplorer can also identify the interdependent relationships (IRs) among predicted zinc-binding sites bound to the same zinc ion, which makes it a useful tool for in-depth zinc-binding site annotation.
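The ZincExplorer abstract reports precision, recall, specificity and MCC at a single operating point. Because non-zinc-binding CHEDs far outnumber zinc-binding ones in the independent set (14,493 vs 2,023), a high specificity can coexist with a noticeably lower precision, which is one reason precision-recall summaries such as AURPC are reported. The sketch below shows the standard confusion-matrix definitions of these metrics; the `Confusion` class and the `score >= threshold` decision rule are illustrative assumptions, not ZincExplorer's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # zinc-binding sites predicted as binding
    fp: int  # non-binding sites predicted as binding
    tn: int  # non-binding sites predicted as non-binding
    fn: int  # zinc-binding sites missed

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp) if self.tp + self.fp else 0.0

    def recall(self) -> float:  # also called sensitivity
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else 0.0

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp) if self.tn + self.fp else 0.0

    def mcc(self) -> float:
        """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
        denom = math.sqrt((self.tp + self.fp) * (self.tp + self.fn)
                          * (self.tn + self.fp) * (self.tn + self.fn))
        return (self.tp * self.tn - self.fp * self.fn) / denom if denom else 0.0


def confusion_at_threshold(scores, labels, threshold):
    """Count outcomes when a residue is called zinc-binding if its score >= threshold."""
    c = Confusion(0, 0, 0, 0)
    for score, is_binding in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_binding:
            c.tp += 1
        elif predicted:
            c.fp += 1
        elif is_binding:
            c.fn += 1
        else:
            c.tn += 1
    return c
```

To report a figure such as "precision at 70% recall", one would lower the threshold until `recall()` first reaches 0.70 and read off `precision()`, `specificity()` and `mcc()` at that point.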

Publication

Interaction of Positional Isomers of Quercetin Glucuronides with the Transporter ABCC2 (cMOAT, MRP2)

Publisher: American Society for Pharmacology & Experimental Therapeutics (ASPET)

Date: 03-05-2007

DOI: 10.1124/DMD.106.014241

Abstract: The exporter ABCC2 (cMOAT, MRP2) is a membrane-bound protein on the apical side of enterocytes and hepatic biliary vessels that transports leukotriene C(4), glutathione, some conjugated bile salts, drugs, xenobiotics, and phytonutrients. The latter class includes quercetin, a bioactive flavonoid found in foods such as onions, apples, tea, and wine. No three-dimensional (3D) structure of ABCC2 is available. We predicted its 3D structure by in silico modeling, which showed that 3-[[3-[2-(7-chloroquinolin-2-yl)vinyl]phenyl]-(2-dimethylcarbamoylethylsulfanyl)methylsulfanyl] propionic acid (MK571) binds most tightly to the putative binding site. We then tested the computational prediction experimentally by measuring the interaction of ABCC2 with all quercetin monoglucuronides occurring in vivo (quercetin substituted with glucuronic acid at the 3-, 3'-, 4'-, and 7-hydroxyl groups). The 4'-O-beta-D-glucuronide is predicted in silico to interact most strongly and the 3-O-beta-D-glucuronide most weakly, and this prediction is supported experimentally by binding and competition assays on ABCC2-overexpressing, baculovirus-infected Sf9 cells. To test transport in situ, we examined the effect of two ABCC2 inhibitors, MK571 and cyclosporin A, on the transport into the medium of quercetin glucuronides produced intracellularly by Caco2 cells. The inhibitors reduced the amount of all quercetin glucuronides in the medium. The results show that the molecular model of ABCC2 agrees well with experimentally determined ABCC2-ligand interactions and, importantly, that the interaction of ABCC2 with quercetin glucuronides depends on the position and nature of substitution.

Publication

Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features

Publisher: Springer Science and Business Media LLC

Date: 21-07-2014

DOI: 10.1038/SREP05765

Publication

Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues

Publisher: Public Library of Science (PLoS)

Date: 19-07-2012

DOI: 10.1371/JOURNAL.PONE.0041370

Ziding Zhang

Researcher

Publications

Can simple codon pair usage predict protein-protein interaction?

PhosphoPredict: A bioinformatics tool for prediction of human kinase-specific phosphorylation substrates and sites by integrating heterogeneous feature selection

hCKSAAP_UbSite: Improved prediction of human ubiquitination sites by exploiting amino acid pattern and properties

Predicting residue-residue contacts and helix-helix interactions in transmembrane proteins using an integrative feature-based random forest approach

Three chromosome-level duck genome assemblies provide insights into genomic variation during domestication

An Integrative Computational Framework Based on a Two-Step Random Forest Algorithm Improves Prediction of Zinc-Binding Sites in Proteins

Structural propensities of human ubiquitination sites: accessibility, centrality and local conformation

ZincExplorer: An accurate hybrid method to improve the prediction of zinc-binding sites from protein sequences

Interaction of Positional Isomers of Quercetin Glucuronides with the Transporter ABCC2 (cMOAT, MRP2)

Accurate in silico identification of species-specific acetylation sites by integrating protein sequence-derived and functional features

Identification of catalytic residues using a novel feature that integrates the microenvironment and geometrical location properties of residues

Related Organisations

Tianjin University

China Agricultural University

Related Funding Activities
