ORCID Profile
0000-0002-9958-5699
Current Organisations
Griffith University - Gold Coast Campus, Shenzhen Bay Laboratory
In Research Link Australia (RLA), "Research Topics" refer to ANZSRC FOR and SEO codes. These topics are either sourced from ANZSRC FOR and SEO codes listed in researchers' related grants or generated by a large language model (LLM) based on their publications.
Cognitive Science | Biochemistry and Cell Biology | Knowledge Representation and Machine Learning | Bioinformatics Software | Molecular Evolution | Analytical Biochemistry | Bioinformatics | Cell Development, Proliferation and Death | Systems Biology
Expanding Knowledge in the Biological Sciences | Expanding Knowledge in the Information and Computing Sciences | Flora, Fauna and Biodiversity of environments not elsewhere classified | Information Processing Services (incl. Data Entry and Capture) | Human Pharmaceutical Products not elsewhere classified
Publisher: Wiley
Date: 11-10-2008
DOI: 10.1002/PROT.21654
Abstract: How to make an objective assignment of secondary structures based on a protein structure is an unsolved problem. Defining the boundaries between helix, sheet, and coil structures is arbitrary, and commonly accepted standard assignments do not exist. Here, we propose a criterion that assesses secondary structure assignment based on the similarity of the secondary structures assigned to pairwise sequence-alignment benchmarks, where these benchmarks are determined by prior structural alignments of the protein pairs. This criterion is used to rank six secondary structure assignment methods: STRIDE, DSSP, SECSTR, KAKSI, P-SEA, and SEGNO with three established sequence-alignment benchmarks (PREFAB, SABmark, and SALIGN). STRIDE and KAKSI achieve comparable success rates in assigning the same secondary structure elements to structurally aligned residues in the three benchmarks. Their success rates are between 1-4% higher than those of the other four methods. The consensus of STRIDE, KAKSI, SECSTR, and P-SEA, called SKSP, improves assignments over the best single method in each benchmark by an additional 1%. These results support the usefulness of the sequence-alignment benchmarks as a means to evaluate secondary structure assignment. The SKSP server and the benchmarks can be accessed at sparks.informatics.iupui.edu
Publisher: Wiley
Date: 05-2007
DOI: 10.1110/PS.062597307
Publisher: Elsevier BV
Date: 02-2003
DOI: 10.1016/S0022-2836(02)01329-3
Abstract: Cooperative binding of ligands to proteins can serve to increase their efficiency and to regulate their activity. Thus, understanding of the mechanism of cooperativity is one of the central concerns of molecular biology. For the tetrameric human hemoglobin (HbA), the cooperative mechanism involves a reasonably well understood combination of tertiary and quaternary changes that occur during the binding process. The dimeric hemoglobin of Scapharca (HbI), which is composed of subunits with the same fold as in HbA, is also highly cooperative but the structural changes on ligand binding are small. A re-orientation of Phe97 in the binding pocket and changes in the number of interfacial water molecules have been implicated in the cooperative mechanism. To explore the role of these factors, we have investigated models of partially liganded intermediate states of HbI with molecular dynamics simulation methods. Since, unlike HbA, no structures for intermediates are available, they were constructed by combining subunits from the unliganded and liganded dimers. Two structurally distinct intermediates were examined, and it was shown that the transition between the two intermediates is directly coupled to the number of interfacial water molecules. Further, it was found that there is a well-defined water channel that connects the interface between the subunits to bulk water. The bottleneck (gate) of the channel, which can be open or closed, is made of hydrophilic residues. The implication of the present results for the cooperative mechanism of HbI is discussed.
Publisher: Springer Science and Business Media LLC
Date: 13-05-2021
DOI: 10.1038/S41467-021-23100-4
Abstract: Refining modelled structures to approach experimental accuracy is one of the most challenging problems in molecular biology. Despite many years’ efforts, the progress in protein or RNA structure refinement has been slow because the global minimum given by the energy scores is not at the experimentally determined “native” structure. Here, we propose a fully knowledge-based energy function that captures the full orientation dependence of base–base, base–oxygen and oxygen–oxygen interactions with the RNA backbone modelled by rotameric states and internal energies. A total of 4000 quantum-mechanical calculations were performed to reweight base–base statistical potentials for minimizing possible effects of indirect interactions. The resulting BRiQ knowledge-based potential, equipped with a nucleobase-centric sampling algorithm, provides a robust improvement in refining near-native RNA models generated by a wide variety of modelling techniques.
Publisher: Hindawi Limited
Date: 10-07-2017
DOI: 10.1002/HUMU.23283
Abstract: Synonymous single-nucleotide variants (SNVs), although they do not alter the encoded protein sequences, have been implicated in many genetic diseases. Experimental studies indicate that synonymous SNVs can lead to changes in the secondary and tertiary structures of DNA and RNA, thereby affecting translational efficiency, cotranslational protein folding as well as the binding of DNA-/RNA-binding proteins. However, the importance of these various features in disease phenotypes is not clearly understood. Here, we have built a support vector machine (SVM) model (termed DDIG-SN) as a means to discriminate disease-causing synonymous variants. The model was trained and evaluated on nearly 900 disease-causing variants. The method achieves robust performance with the area under the receiver operating characteristic curve of 0.84 and 0.85 for protein-stratified 10-fold cross-validation and independent testing, respectively. We were able to show that the disease-causing effects in the immediate proximity to exon-intron junctions (1-3 bp) are driven by the loss of splicing motif strength, whereas the gain of splicing motif strength is the primary cause in regions further away from the splice site (4-69 bp). The method is available as a part of the DDIG server at dig.
Publisher: Hindawi Limited
Date: 09-2019
DOI: 10.1002/HUMU.23843
Publisher: eLife Sciences Publications, Ltd
Date: 11-09-2023
Publisher: Oxford University Press (OUP)
Date: 07-2022
DOI: 10.1093/NAR/GKI360
Publisher: Oxford University Press (OUP)
Date: 22-05-2021
DOI: 10.1093/BIOINFORMATICS/BTAB391
Abstract: The accuracy of RNA secondary and tertiary structure prediction can be significantly improved by using structural restraints derived from evolutionary coupling or direct coupling analysis. Currently, these coupling analyses rely on manually curated multiple sequence alignments collected in the Rfam database, which contains 3016 families. By comparison, millions of non-coding RNA sequences are known. Here, we established RNAcmap, a fully automatic pipeline that enables evolutionary coupling analysis for any RNA sequence. The homology search was based on the covariance model built by INFERNAL according to two secondary structure predictors: a folding-based algorithm RNAfold and the latest deep-learning method SPOT-RNA. We showed that the performance of RNAcmap is less dependent on the specific evolutionary coupling tool but is more dependent on the accuracy of the secondary structure predictor, with the best performance given by RNAcmap (SPOT-RNA). The performance of RNAcmap (SPOT-RNA) is comparable to that based on Rfam-supplied alignment and consistent for those sequences that are not in Rfam collections. Further improvement can be made with a simple meta predictor RNAcmap (SPOT-RNA/RNAfold) depending on which secondary structure predictor can find more homologous sequences. Reliable base-pairing information generated from RNAcmap, for RNAs with high effective homologous sequences, in particular, will be useful for aiding RNA structure prediction. RNAcmap is available as a web server at erver/rnacmap/ and as a standalone application along with the datasets at parks-lab-org/RNAcmap_standalone. A platform independent and fully configured docker image of RNAcmap is also provided at /jaswindersingh2/rnacmap. Supplementary data are available at Bioinformatics online.
Publisher: Hindawi Limited
Date: 09-2019
DOI: 10.1002/HUMU.23838
Publisher: Wiley
Date: 05-02-2016
DOI: 10.1002/JCC.24298
Abstract: An important unsolved problem in molecular and structural biology is the protein folding and structure prediction problem. One major bottleneck for solving this is the lack of an accurate energy to discriminate near-native conformations against other possible conformations. Here we have developed sDFIRE energy function, which is an optimized linear combination of DFIRE (the Distance-scaled Finite Ideal gas Reference state based Energy), the orientation dependent (polar-polar and polar-nonpolar) statistical potentials, and the matching scores between predicted and model structural properties including predicted main-chain torsion angles and solvent accessible surface area. The weights for these scoring terms are optimized by three widely used decoy sets consisting of a total of 134 proteins. Independent tests on CASP8 and CASP9 decoy sets indicate that sDFIRE outperforms other state-of-the-art energy functions in selecting near native structures and in the Pearson's correlation coefficient between the energy score and structural accuracy of the model (measured by TM-score).
Publisher: Oxford University Press (OUP)
Date: 07-12-2018
DOI: 10.1093/BIOINFORMATICS/BTY1006
Abstract: Sequence-based prediction of one dimensional structural properties of proteins has been a long-standing subproblem of protein structure prediction. Recently, prediction accuracy has been significantly improved due to the rapid expansion of protein sequence and structure libraries and advances in deep learning techniques, such as residual convolutional networks (ResNets) and Long-Short-Term Memory Cells in Bidirectional Recurrent Neural Networks (LSTM-BRNNs). Here we leverage an ensemble of LSTM-BRNN and ResNet models, together with predicted residue-residue contact maps, to continue the push towards the attainable limit of prediction for 3- and 8-state secondary structure, backbone angles (θ, τ, ϕ and ψ), half-sphere exposure, contact numbers and solvent accessible surface area (ASA). The new method, named SPOT-1D, achieves similar, high performance on a large validation set and test set (≈1000 proteins in each set), suggesting robust performance for unseen data. For the large test set, it achieves 87% and 77% in 3- and 8-state secondary structure prediction and 0.82 and 0.86 in correlation coefficients between predicted and measured ASA and contact numbers, respectively. Comparison to current state-of-the-art techniques reveals substantial improvement in secondary structure and backbone angle prediction. In particular, 44% of 40-residue fragment structures constructed from predicted backbone Cα-based θ and τ angles are less than 6 Å root-mean-squared-distance from their native conformations, nearly 20% better than the next best. The method is expected to be useful for advancing protein structure and function prediction. SPOT-1D and its data is available at: Supplementary data are available at Bioinformatics online.
Publisher: Wiley
Date: 13-06-2005
DOI: 10.1002/JCC.20251
Abstract: We developed a method for fast decoy clustering by using reference root-mean-squared distance (rRMSD) rather than commonly used pairwise RMSD (pRMSD) values. For 41 proteins with 2000 decoys each, the computing efficiency increases nine times without a significant change in the accuracy of near-native selections. Tests on additional protein decoys based on different reference conformations confirmed this result. Further analysis indicates that the pRMSD and rRMSD values are highly correlated (with an average correlation coefficient of 0.82) and the clusters obtained from pRMSD and rRMSD values are highly similar (the representative structures of the top five largest clusters from the two methods are 74% identical). SCUD (Structure ClUstering of Decoys) with an automatic cutoff value is available at theory.med.buffalo.edu.
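The cost argument behind reference-based clustering can be illustrated with a minimal sketch. This is not the SCUD algorithm, whose details are not given in the abstract; it only shows that N distances to one reference replace N(N-1)/2 pairwise distances. The function names, the pre-superimposed coordinates, and the cutoff are all assumptions made for illustration.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) tuples.

    Assumption: the structures are already superimposed, so no
    optimal rotation or translation is applied here.
    """
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def reference_distances(decoys, reference):
    """One RMSD per decoy (N calculations instead of N*(N-1)/2)."""
    return [rmsd(d, reference) for d in decoys]

def largest_cluster(ref_dist, cutoff):
    """Indices of the largest group of decoys whose reference RMSD
    values lie within `cutoff` of a common centre decoy (hypothetical
    clustering rule, chosen only for simplicity)."""
    best = []
    for ri in ref_dist:
        members = [j for j, rj in enumerate(ref_dist) if abs(ri - rj) <= cutoff]
        if len(members) > len(best):
            best = members
    return best

def shifted(structure, dx):
    """Toy decoy generator: translate a structure along x by dx."""
    return [(x + dx, y, z) for x, y, z in structure]

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
decoys = [shifted(reference, dx) for dx in (0.1, 0.2, 0.3, 5.0)]
ref_dist = reference_distances(decoys, reference)
cluster = largest_cluster(ref_dist, cutoff=0.15)
```

In a fixed frame, two decoys that are close to each other must have similar distances to the reference (triangle inequality), but the converse does not hold. That is why the abstract's reported correlation between pRMSD and rRMSD matters for the approach.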
Publisher: Wiley
Date: 11-05-2010
DOI: 10.1002/PROT.22746
Publisher: Wiley
Date: 24-06-2005
DOI: 10.1002/PROT.20576
Abstract: We entered the CAPRI experiment during the middle of Round 4 and have submitted predictions for all 6 targets released since then. We used the following procedures for docking prediction: (1) the identification of possible binding region(s) of a target based on known biological information, (2) rigid-body sampling around the binding region(s) by using the docking program ZDOCK, (3) ranking of the sampled complex conformations by employing the DFIRE-based statistical energy function, (4) clustering based on pairwise root-mean-square distance and the DFIRE energy, and (5) manual inspection and relaxation of the side-chain conformations of the top-ranked structures by geometric constraint. Reasonable predictions were made for 4 of the 6 targets. The best fractions of native contacts within the top 10 models are 89.1% for Target 12, 54.3% for Target 13, 29.3% for Target 14, and 94.1% for Target 18. The origin of successes and failures is discussed.
Publisher: Public Library of Science (PLoS)
Date: 02-05-2014
Publisher: Springer Science and Business Media LLC
Date: 16-03-2017
DOI: 10.1038/SREP44150
Abstract: Understanding ethanol-induced stresses and responses in biofuel-producing bacteria at systems level has significant implications in engineering more efficient biofuel producers. We present a computational study of transcriptomic and genomic data of both ethanol-stressed and ethanol-adapted E. coli cells with computationally predicted ethanol-binding proteins and experimentally identified ethanol tolerance genes. Our analysis suggests: (1) ethanol damages cell wall and membrane integrity, causing increased stresses, particularly reactive oxygen species, which damages DNA and reduces the O2 level; (2) decreased cross-membrane proton gradient from membrane damage, coupled with hypoxia, leads to reduced ATP production by aerobic respiration, driving cells to rely more on fatty acid oxidation, anaerobic respiration and fermentation for ATP production; (3) the reduced ATP generation results in substantially decreased synthesis of macromolecules; (4) ethanol can directly bind 213 proteins including transcription factors, altering their functions; (5) all these changes together induce multiple stress responses, reduced biosynthesis, cell viability and growth; and (6) ethanol-adapted E. coli cells restore the majority of these reduced activities through selection of specific genomic mutations and alteration of stress responses, ultimately restoring normal ATP production, macromolecule biosynthesis, and growth. These new insights into the energy and mass balance will inform design of more ethanol-tolerant strains.
Publisher: Elsevier BV
Date: 03-2016
DOI: 10.1016/J.JMB.2016.01.012
Abstract: Protein engineering and characterisation of non-synonymous single nucleotide variants (SNVs) require accurate prediction of protein stability changes (ΔΔGu) induced by single amino acid substitutions. Here, we have developed a new prediction method called Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM), which comprises five specialised support vector machine (SVM) models and makes the final prediction from a consensus of two models selected based on the predicted secondary structure and accessible surface area of the mutated residue. The new method is applicable to single-domain monomeric proteins and can predict ΔΔGu with a protein sequence and mutation as the only inputs. EASE-MM yielded a Pearson correlation coefficient of 0.53-0.59 in 10-fold cross-validation and independent testing and was able to outperform other sequence-based methods. When compared to structure-based energy functions, EASE-MM achieved a comparable or better performance. The application to a large dataset of human germline non-synonymous SNVs showed that the disease-causing variants tend to be associated with larger magnitudes of ΔΔGu predicted with EASE-MM. The EASE-MM web-server is available at erver/ease.
Publisher: Wiley
Date: 14-08-2009
DOI: 10.1002/PROT.22193
Publisher: Oxford University Press (OUP)
Date: 05-05-2020
DOI: 10.1093/BIOINFORMATICS/BTAA292
Abstract: Molecular docking is a widely used technique for large-scale virtual screening of the interactions between small-molecule ligands and their target proteins. However, docking methods often perform poorly for metalloproteins due to additional complexity from the three-way interactions among amino-acid residues, metal ions and ligands. This is a significant problem because zinc proteins alone comprise about 10% of all available protein structures in the protein databank. Here, we developed GM-DockZn that is dedicated for ligand docking to zinc proteins. Unlike the existing docking methods developed specifically for zinc proteins, GM-DockZn samples ligand conformations directly using a geometric grid around the ideal zinc-coordination positions of seven discovered coordination motifs, which were found from the survey of known zinc proteins complexed with a single ligand. GM-DockZn has the best performance in sampling near-native poses with correct coordination atoms and numbers within the top 50 and top 10 predictions when compared to several state-of-the-art techniques. This is true not only for a non-redundant dataset of zinc proteins but also for a homolog set of different ligand and zinc-coordination systems for the same zinc proteins. Similar superior performance of GM-DockZn for near-native-pose sampling was also observed for docking to apo-structures and cross-docking between different ligand complex structures of the same protein. The highest success rate for sampling nearest near-native poses within top 5 and top 1 was achieved by combining GM-DockZn for conformational sampling with GOLD for ranking. The proposed geometry-based sampling technique will be useful for ligand docking to other metalloproteins. GM-DockZn is freely available at www.qmclab.com/ for academic users. Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 11-06-2011
DOI: 10.1093/BIOINFORMATICS/BTR350
Abstract: Motivation: In recent years, development of a single-method fold-recognition server lags behind consensus and multiple template techniques. However, a good consensus prediction relies on the accuracy of individual methods. This article reports our efforts to further improve a single-method fold recognition technique called SPARKS by changing the alignment scoring function and incorporating the SPINE-X techniques that make improved prediction of secondary structure, backbone torsion angle and solvent accessible surface area. Results: The new method called SPARKS-X was tested with the SALIGN benchmark for alignment accuracy, Lindahl and SCOP benchmarks for fold recognition, and CASP 9 blind test for structure prediction. The method is compared to several state-of-the-art techniques such as HHPRED and BoostThreader. Results show that SPARKS-X is one of the best single-method fold recognition techniques. We further note that incorporating multiple templates and refinement in model building will likely further improve SPARKS-X. Availability: The method is available as a SPARKS-X server at sparks.informatics.iupui.edu/ Contact: yqzhou@iupui.edu
Publisher: Oxford University Press (OUP)
Date: 18-04-2017
DOI: 10.1093/BIOINFORMATICS/BTX218
Abstract: The accuracy of predicting protein local and global structural properties such as secondary structure and solvent accessible surface area has been stagnant for many years because of the challenge of accounting for non-local interactions between amino acid residues that are close in three-dimensional structural space but far from each other in their sequence positions. All existing machine-learning techniques relied on a sliding window of 10–20 amino acid residues to capture some ‘short to intermediate’ non-local interactions. Here, we employed Long Short-Term Memory (LSTM) Bidirectional Recurrent Neural Networks (BRNNs) which are capable of capturing long range interactions without using a window. We showed that the application of LSTM-BRNN to the prediction of protein structural properties makes the most significant improvement for residues with the most long-range contacts (|i-j| & ) over a previous window-based, deep-learning method SPIDER2. Capturing long-range interactions allows the accuracy of three-state secondary structure prediction to reach 84% and the correlation coefficient between predicted and actual solvent accessible surface areas to reach 0.80, plus a reduction of 5%, 10%, 5% and 10% in the mean absolute error for backbone ϕ, ψ, θ and τ angles, respectively, from SPIDER2. More significantly, 27% of 182724 40-residue models directly constructed from predicted Cα atom-based θ and τ have similar structures to their corresponding native structures (6Å RMSD or less), which is 3% better than models built by ϕ and ψ angles. We expect the method to be useful for assisting protein structure and function prediction. The method is available as a SPIDER3 server and standalone package at sparks-lab.org. Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 07-01-2015
DOI: 10.1093/BIOINFORMATICS/BTU862
Abstract: Motivation: Frameshifting (FS) indels and nonsense (NS) variants disrupt the protein-coding sequence downstream of the mutation site by changing the reading frame or introducing a premature termination codon, respectively. Despite such drastic changes to the protein sequence, FS indels and NS variants have been discovered in healthy individuals. How to discriminate disease-causing from neutral FS indels and NS variants is an understudied problem. Results: We have built a machine learning method called DDIG-in (FS) based on real human genetic variations from the Human Gene Mutation Database (inherited disease-causing) and the 1000 Genomes Project (GP) (putatively neutral). The method incorporates both sequence and predicted structural features and yields a robust performance by 10-fold cross-validation and independent tests on both FS indels and NS variants. We showed that human-derived NS variants and FS indels derived from animal orthologs can be effectively employed for independent testing of our method trained on human-derived FS indels. DDIG-in (FS) achieves a Matthews correlation coefficient (MCC) of 0.59, a sensitivity of 86%, and a specificity of 72% for FS indels. Application of DDIG-in (FS) to NS variants yields essentially the same performance (MCC of 0.43) as a method that was specifically trained for NS variants. DDIG-in (FS) was shown to make a significant improvement over existing techniques. Availability and implementation: The DDIG-in web-server for predicting NS variants, FS indels, and non-frameshifting (NFS) indels is available at dig. Contact: yaoqi.zhou@griffith.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
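For readers unfamiliar with the metrics quoted in this abstract, the following minimal sketch computes sensitivity, specificity and MCC from a confusion matrix. The counts are hypothetical: they assume 100 disease-causing and 100 neutral variants, chosen only so that sensitivity and specificity match the quoted 86% and 72%. The abstract does not state the actual class sizes, and MCC depends on them.

```python
import math

def sensitivity(tp, fn):
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of true negatives recovered: TN / (TN + FP)."""
    return tn / (tn + fp)

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical, balanced counts (not taken from the paper).
tp, fn = 86, 14   # disease-causing variants: predicted correctly / missed
tn, fp = 72, 28   # neutral variants: predicted correctly / false alarms

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
score = mcc(tp, fp, tn, fn)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={score:.2f}")
```

Under this balanced-class assumption the MCC comes out near 0.59; with a different class ratio the same sensitivity and specificity would give a different MCC.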
Publisher: Elsevier BV
Date: 1999
Abstract: Since the internal motions of proteins play an essential role in their biological function, it is important to characterize them in a fundamental way. The Lindemann criterion for the solid state is applied to molecular dynamics simulations and temperature-dependent X-ray diffraction data of proteins. It is found that the interior of native proteins is solid-like, while their surface is liquid-like. When the entire protein becomes solid-like at low temperature (approximately 220 K), the protein is inactive. Thus, the surface-molten solid nature of proteins in their native state permits the dynamics required for function, while preserving their stability. Comparison with rare gas clusters and polymer models indicates that their thermodynamic phase diagrams have many elements in common with those of proteins.
Publisher: Ovid Technologies (Wolters Kluwer Health)
Date: 21-08-2006
Publisher: Research Square Platform LLC
Date: 05-07-2023
DOI: 10.21203/RS.3.RS-2567304/V2
Abstract: Despite their importance in a wide range of living organisms, self-cleaving ribozymes in the human genome are few and poorly studied. Here, we performed deep mutational scanning and covariance analysis of two previously proposed self-cleaving ribozymes (LINE-1 and OR4K15). We found that the functional regions for both ribozymes are made of two short segments, connected by a non-functional loop with a total of 46 and 47 contiguous nucleotides only. The discovery makes them the shortest known self-cleaving ribozymes. Moreover, the above functional regions are circularly permuted with two nearly identical catalytic internal loops, supported by two stems of different lengths. This new self-cleaving ribozyme class, named lantern ribozymes for their shape, is similar to the catalytic region of the twister sister ribozymes in terms of sequence and secondary structure. However, the nucleotides at the cleavage sites have shown that mutational effects on lantern ribozymes are different from twister sister ribozymes. The discovery of lantern ribozymes reveals a new ribozyme class with the simplest and, perhaps, the most primitive structure needed for self-cleavage.
Publisher: American Chemical Society (ACS)
Date: 17-08-2018
Abstract: It has been long established that cis conformations of amino acid residues play many biologically important roles despite their rare occurrence in protein structure. Because of this rarity, few methods have been developed for predicting cis isomers from protein sequences, most of which are based on outdated datasets and lack the means for independent testing. In this work, using a database of >10000 high-resolution protein structures, we update the statistics of cis isomers and develop a sequence-based prediction technique using an ensemble of residual convolutional and long short-term memory bidirectional recurrent neural networks that allow learning from the whole protein sequence. We show that ensembling eight neural network models yields maximum Matthews correlation coefficient values of approximately 0.35 for cis-Pro isomers and 0.1 for cis-nonPro residues. The method should be useful for prioritizing functionally important residues in cis isomers for experimental validations and improving the sampling of rare protein conformations for ab initio protein structure prediction.
Publisher: AIP Publishing
Date: 29-10-2002
DOI: 10.1063/1.1514574
Abstract: Protein topology, which refers to the arrangement of secondary structures of proteins, has been extensively investigated to examine its role in protein folding. However, recent studies show that topology alone cannot account for the variation of folding behaviors observed in some proteins of the same structural family. In a recent work, we showed that the native structure of the second β hairpin of protein G predicts a folding mechanism that is different from topology-based models. Here, we continue to examine how much one can learn about folding mechanism from native structure. This work focuses on fragment B of Staphylococcal protein A (BpA) – a three-helix (H1, H2, and H3) bundle protein. Using a recently developed all-atom (except nonpolar hydrogen) Gō model interacting with simple discontinuous potentials, the folding of the model BpA was observed in 112 out of 249 trajectories within 50 h of CPU times on a Pentium PC (1 GHz). The model successfully captured several specific properties of BpA that have been observed experimentally. These include the higher stability of H3 compared to H1 and H2, and the higher stability of the H2–H3 microdomain compared to the H1–H2 microdomain. These specific details were not produced by a topology-based square-well model of BpA. Thus, the result further supports the important role of sidechain packing in determining the specific pathway of protein folding. Additional 96 000 short simulations were performed to locate the transition states of the two folding pathways. The limitation of the Gō model and its possible improvement are also discussed.
Publisher: Springer Science and Business Media LLC
Date: 27-11-2019
DOI: 10.1038/S41467-019-13395-9
Abstract: The majority of our human genome transcribes into noncoding RNAs with unknown structures and functions. Obtaining functional clues for noncoding RNAs requires accurate base-pairing or secondary-structure prediction. However, the performance of such predictions by current folding-based algorithms has stagnated for more than a decade. Here, we propose the use of deep contextual learning for base-pair prediction including those noncanonical and non-nested (pseudoknot) base pairs stabilized by tertiary interactions. Since only <250 nonredundant, high-resolution RNA structures are available for model training, we utilize transfer learning from a model initially trained with a recent high-quality bpRNA dataset of >10,000 nonredundant RNAs made available through comparative analysis. The resulting method achieves large, statistically significant improvement in predicting all base pairs, noncanonical and non-nested base pairs in particular. The proposed method (SPOT-RNA), with a freely available server and standalone software, should be useful for improving RNA structure modeling, sequence alignment, and functional annotations.
Publisher: Research Square Platform LLC
Date: 07-03-2023
DOI: 10.21203/RS.3.RS-2567304/V1
Abstract: Despite their importance in a wide range of living organisms, self-cleaving ribozymes in the human genome are few and poorly studied. Here, we performed deep mutational scanning and covariance analysis of two previously proposed self-cleaving ribozymes (LINE-1 and OR4K15 ribozymes). We found that the functional regions for both ribozymes are made of two short segments, connected by a non-functional loop with a total of 46 and 47 contiguous nucleotides only. The discovery makes them the shortest known self-cleaving ribozymes. Moreover, the above functional regions of LINE-1 and OR4K15 ribozymes are circularly permuted with two nearly identical catalytic internal loops, supported by two stems of different lengths. This new self-cleaving ribozyme family, named lantern ribozymes for their shape, is similar to the catalytic core region of the twister sister ribozymes in terms of sequence and secondary structure. However, the nucleotides at the cleavage sites have shown that mutational effects on lantern ribozymes are different from twister sister ribozymes. Lacking a stem loop for stabilizing the core active region and two mismatches in the internal loops may force lantern ribozymes to adopt a tertiary structure (and functional mechanisms) different from twister sister, requiring further studies. Nevertheless, the discovery of the lantern ribozymes reveals a new ribozyme family with the simplest and, perhaps, the most primitive structure needed for self-cleavage.
Publisher: Wiley
Date: 06-12-1996
DOI: 10.1002/(SICI)1097-0282(199602)38:2<273::AID-BIP11>3.0.CO;2-G
Publisher: Elsevier BV
Date: 08-2002
Publisher: Wiley
Date: 07-09-2010
Publisher: Springer Science and Business Media LLC
Date: 08-04-2017
Publisher: Springer New York
Date: 2017
Publisher: Springer Science and Business Media LLC
Date: 28-11-2019
DOI: 10.1186/S13059-019-1847-4
Abstract: Single nucleotide variants (SNVs) in intronic regions have yet to be systematically investigated for their disease-causing potential. Using known pathogenic and neutral intronic SNVs (iSNVs) as training data, we develop the RegSNPs-intron algorithm based on a random forest classifier that integrates RNA splicing, protein structure, and evolutionary conservation features. RegSNPs-intron showed excellent performance in evaluating the pathogenic impacts of iSNVs. Using a high-throughput functional reporter assay called ASSET-seq (ASsay for Splicing using ExonTrap and sequencing), we evaluate the impact of RegSNPs-intron predictions on splicing outcome. Together, RegSNPs-intron and ASSET-seq enable effective prioritization of iSNVs for disease pathogenesis.
Publisher: Elsevier BV
Date: 06-2020
DOI: 10.1016/J.SBI.2019.11.009
Abstract: Protein glycosylation is the most complex and prevalent post-translational modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins it is important to gain an insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms combined with site-specific identification and occupancy are necessary steps in this direction. Computational platforms have continued to mature, assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but they frequently support only dedicated workflows, and users rely on the manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation, which is now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software with an emphasis on those that are actively maintained and designed to support current analytical workflows.
Publisher: SAGE Publications
Date: 09-2015
Publisher: Springer New York
Date: 2014
Publisher: Elsevier BV
Date: 02-2019
Publisher: Oxford University Press (OUP)
Date: 04-06-2010
DOI: 10.1093/BIOINFORMATICS/BTQ295
Abstract: Motivation: Template-based prediction of DNA binding proteins requires not only structural similarity between target and template structures but also prediction of binding affinity between the target and DNA to ensure binding. Here, we propose to predict protein–DNA binding affinity by introducing a new volume-fraction correction to a statistical energy function based on a distance-scaled, finite, ideal-gas reference (DFIRE) state. Results: We showed that this energy function together with the structural alignment program TM-align achieves the Matthews correlation coefficient (MCC) of 0.76 with an accuracy of 98%, a precision of 93% and a sensitivity of 64%, for predicting DNA binding proteins in a benchmark of 179 DNA binding proteins and 3797 non-binding proteins. The MCC value is substantially higher than the best MCC value of 0.69 given by previous methods. Application of this method to 2235 structural genomics targets uncovered 37 as DNA binding proteins, 27 (73%) of which are putatively DNA binding and only 1 protein whose annotated functions do not contain DNA binding, while the remaining proteins have unknown function. The method provides a highly accurate and sensitive technique for structure-based prediction of DNA binding proteins. Availability: The method is implemented as a part of the Structure-based function-Prediction On-line Tools (SPOT) package available at pot Contact: yqzhou@iupui.edu
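The metrics quoted in the abstract above (accuracy, precision, sensitivity and MCC) all follow from a single 2x2 confusion matrix. The sketch below is illustrative only: the counts are hypothetical values back-calculated from the reported percentages and benchmark sizes (179 binding and 3797 non-binding proteins), not figures taken from the paper, and the function name is an assumption.

```python
import math


def confusion_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, sensitivity, MCC) for a 2x2 confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, precision, sensitivity, mcc


# Hypothetical counts, reconstructed so that they are consistent with the
# reported 64% sensitivity and 93% precision on 179 positives / 3797 negatives.
acc, prec, sens, mcc = confusion_metrics(tp=115, fp=9, fn=64, tn=3788)
print(f"accuracy={acc:.2f} precision={prec:.2f} sensitivity={sens:.2f} MCC={mcc:.2f}")
```

Because the negatives outnumber the positives by roughly 20 to 1, accuracy stays high even with many missed binders, which is why MCC is the more informative summary for a benchmark this imbalanced.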
Publisher: Wiley
Date: 17-12-2019
DOI: 10.1002/JCC.26132
Abstract: Protein structure determination has been one of the most challenging problems in molecular biology for the past 60 years. Here we present an ab initio protein tertiary-structure prediction method assisted by predicted contact maps from SPOT-Contact and predicted dihedral angles from SPIDER 3. These predicted properties were then fed to the crystallography and NMR system (CNS) for restrained structure modeling. The resulting structures are first evaluated by the potential energy calculated by CNS, followed by the dDFIRE energy function for model selection. The method, called SPOT-Fold, has been tested on 241 CASP targets between 67 and 670 amino acid residues and 60 randomly selected globular proteins under 100 amino acids. The method has a comparable accuracy to other contact-map-based modeling techniques. © 2019 Wiley Periodicals, Inc.
Publisher: Proceedings of the National Academy of Sciences
Date: 27-12-2013
Publisher: Springer Science and Business Media LLC
Date: 07-08-2019
Publisher: Oxford University Press (OUP)
Date: 24-12-2019
DOI: 10.1093/NAR/GKZ1192
Abstract: Despite the large number of noncoding RNAs in the human genome and their roles in many diseases including cancer, we know very little about them due to the lack of structural clues. The centerpiece of the structural clues is the full RNA base-pairing structure of secondary and tertiary contacts that can be precisely obtained only from costly and time-consuming 3D structure determination. Here, we performed deep mutational scanning of the self-cleaving CPEB3 ribozyme by error-prone PCR and showed that a library of & × 10^4 single-to-triple mutants is sufficient to infer 25 of 26 base pairs including non-nested, nonhelical, and noncanonical base pairs with both sensitivity and precision at 96%. Such accurate inference was further confirmed by a twister ribozyme at 100% precision with only noncanonical base pairs as false negatives. The performance resulted from analyzing covariation-induced deviation of activity by utilizing both functional and nonfunctional variants for unsupervised classification, followed by Monte Carlo (MC) simulated annealing with mutation-derived scores. Highly accurate inference can also be obtained by combining MC with evolution/direct coupling analysis, R-scape or epistasis analysis. The results highlight the usefulness of deep mutational scanning for high-accuracy structural inference of self-cleaving ribozymes with implications for other structured RNAs that permit high-throughput functional selections.
Publisher: Proceedings of the National Academy of Sciences
Date: 11-11-2013
Abstract: Linker H1 histones control the accessibility of linker DNA between two neighboring nucleosomes to DNA-binding proteins and regulate chromatin folding. We investigated the structure of the H1–nucleosome complex through a combination of multidimensional nuclear magnetic resonance spectroscopy, site-directed mutagenesis-isothermal-titration calorimetry and computational design/modeling. The results lead to a unique structural model for the globular domain of H1 in complex with the nucleosome that contains residue-level information and have implications for the dynamics of chromatin in vivo. In addition, our approach will be useful for testing the hypothesis that the globular domain of H1 variants might have distinct binding geometries within the nucleosome, and thereby contribute to the heterogeneity of chromatin structure.
Publisher: Springer New York
Date: 28-10-2016
DOI: 10.1007/978-1-4939-6406-2_10
Abstract: A fast accessible surface area (ASA) predictor is presented. In this new approach no residue mutation profiles generated by multiple sequence alignments are used as inputs. Instead, we use only single sequence information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both far more efficient than sequence-alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for ASAquick are available from Research and Information Systems at mamiris.com and from the Battelle Center for Mathematical Medicine at mathmed.org.
Publisher: Springer Science and Business Media LLC
Date: 2013
Publisher: Springer New York
Date: 28-10-2017
DOI: 10.1007/978-1-4939-6406-2_12
Abstract: Over the past decade, it has become evident that a large proportion of proteins contain intrinsically disordered regions, which play important roles in pivotal cellular functions. Many computational tools have been developed with the aim of identifying the level and location of disorder within a protein. In this chapter, we describe a neural-network-based technique called SPINE-D that employs a unique three-state design and can accurately capture disordered residues in both short and long disordered regions. SPINE-D was trained on a large database of 4229 non-redundant proteins, and yielded an AUC of 0.86 on a cross-validation test and 0.89 on an independent test. SPINE-D can also detect a semi-disordered state that is associated with induced folders and aggregation-prone regions in disordered proteins and weakly stable or locally unfolded regions in structured proteins. We implement an online web service and an offline stand-alone program for SPINE-D; both are freely available at sparks-lab.org/SPINE-D/. We then walk you through how to use the online and offline SPINE-D in making disorder predictions, and examine the disorder and semi-disorder prediction in a case study on the p53 protein.
Publisher: Springer Science and Business Media LLC
Date: 23-09-2021
Publisher: Proceedings of the National Academy of Sciences
Date: 23-12-1997
Abstract: The calculated folding thermodynamics of a simple off-lattice three-helix-bundle protein model under equilibrium conditions shows the experimentally observed protein transitions: a collapse transition, a disordered-to-ordered globule transition, a globule to native-state transition, and the transition from the active native state to a frozen inactive state. The cooperativity and physical origin of the various transitions are explored with a single “optimization” parameter and characterized with the Lindemann criterion for liquid versus solid-state dynamics. Below the folding temperature, the model has a simple free energy surface with a single basin near the native state; the surface is similar to that calculated from a simulation of the same three-helix-bundle protein with an all-atom representation [Boczko, E. M. & Brooks III, C. L. (1995) Science 269, 393–396].
Publisher: MDPI AG
Date: 18-08-2021
Abstract: This paper outlines the development of Indigenist Health Humanities as a new and innovative field of research building an intellectual collective capable of bridging the knowledge gap that hinders current efforts to close the gap in Indigenous health inequality. Bringing together health and the humanities through the particularity of Indigenous scholarship, a deeper understanding of the human experience of health will be developed alongside a greater understanding of the enablers to building a transdisciplinary collective of Indigenist researchers. The potential benefits include a more sustainable, relational, and ethical approach to advancing new knowledge, and health outcomes, for Indigenous people in its fullest sense.
Publisher: Public Library of Science (PLoS)
Date: 18-06-2020
Publisher: Oxford University Press (OUP)
Date: 05-01-2017
DOI: 10.1093/BIOINFORMATICS/BTW829
Abstract: The high cost of drug discovery motivates the development of accurate virtual screening tools. Binding-homology, which takes advantage of known protein–ligand binding pairs, has emerged as a powerful discrimination technique. In order to exploit all available binding data, modelled structures of ligand-binding sequences may be used to create an expanded structural binding template library. SPOT-Ligand 2 has demonstrated significantly improved screening performance over its previous version by expanding the template library 15 times over the previous one. It also performed better than or similar to other binding-homology approaches on the DUD and DUD-E benchmarks. The server is available online at sparks-lab.org. Supplementary data are available at Bioinformatics online.
Publisher: American Chemical Society (ACS)
Date: 19-08-2005
DOI: 10.1021/BI050785R
Abstract: To test whether the folding process of a large protein can be understood on the basis of the folding behavior of the domains that constitute it, we coupled two well-studied small α-helical proteins, the B-domain of protein A (60 amino acids) and Rd-apocytochrome b562 (Rd-apocyt b562, 106 amino acids), by fusing the C-terminal helix of the B-domain of protein A with the N-terminal helix of Rd-apocyt b562 without changing their hydrophobic core residues. The success of the design was confirmed by determining the structure of the engineered protein with multidimensional NMR methods. Kinetic studies showed that the logarithms of the folding/unfolding rate constants of the engineered protein are linearly dependent on concentrations of guanidinium chloride in the measurable range from 1.7 to 4 M. Their slopes (m-values) are close to those of Rd-apocyt b562. In addition, the 1H-15N HSQC spectrum taken at 1.5 M guanidinium chloride reveals that only the Rd-apocyt b562 domain in the designed protein remained folded. These results suggest that the two domains have weak energetic coupling. Interestingly, the redesigned protein folds faster than Rd-apocyt b562, suggesting that the fused helix stabilizes the rate-limiting transition state.
Publisher: Elsevier BV
Date: 04-1991
Publisher: Springer Science and Business Media LLC
Date: 23-09-1999
DOI: 10.1038/43940
Publisher: Wiley
Date: 25-03-2018
DOI: 10.1002/PROT.25489
Abstract: Designing protein sequences that can fold into a given structure is a well-known inverse protein-folding problem. One important characteristic to attain for a protein design program is the ability to recover wild-type sequences given their native backbone structures. The highest average sequence identity accuracy achieved by current protein-design programs in this problem is around 30%, achieved by our previous system, SPIN. SPIN is a program that predicts sequences compatible with a provided structure using a neural network with fragment-based local and energy-based nonlocal profiles. Our new model, SPIN2, uses a deep neural network and additional structural features to improve on SPIN. SPIN2 achieves over 34% in sequence recovery in 10-fold cross-validation and independent tests, a 4% improvement over the previous version. The sequence profiles generated from SPIN2 are expected to be useful for improving existing fold recognition and protein design techniques. SPIN2 is available at sparks-lab.org.
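Sequence recovery, the headline metric in the abstract above, is commonly computed as the fraction of positions at which a designed sequence matches the wild-type sequence for the same backbone. The following is a minimal sketch of that calculation, not SPIN2's own evaluation code; the function name and the two ten-residue sequences are made up for illustration.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(designed) != len(native):
        raise ValueError("sequences must have the same length (same backbone)")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Hypothetical toy sequences; real evaluations average this over many proteins.
native = "MKTAYIAKQR"
designed = "MKSAYLAKER"
print(f"recovery = {sequence_recovery(designed, native):.0%}")
```

On this toy pair, 7 of 10 positions match. A reported average of about 34% means that, across the test proteins, roughly one in three designed positions reproduces the native residue.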
Publisher: Elsevier BV
Date: 09-2007
Publisher: Springer Science and Business Media LLC
Date: 09-2017
DOI: 10.1038/549031C
Publisher: World Scientific Pub Co Pte Lt
Date: 10-2005
DOI: 10.1142/S0219720005001430
Abstract: Statistical energy functions are discrete (or stepwise) energy functions that lack van der Waals repulsion. As a result, they are often applied directly to a given structure (native or decoy) without further energy minimization being performed on the structure. However, the full benefit (or hidden defect) of an energy function cannot be revealed without energy minimization. This paper tests a recently developed, all-atom statistical energy function by energy minimization with a fixed secondary helical structure in dihedral space. This is accomplished by combining the statistical energy function based on a distance-scaled finite ideal-gas reference (DFIRE) state with a simple repulsive interaction and an improper torsion energy function. The energy function was used to minimize 2000 random initial structures of 41 small and medium-sized helical proteins in a dihedral space with a fixed helical region. Results indicate that near-native structures for most studied proteins can be obtained by minimization alone. The average minimum root-mean-squared distance (rmsd) from the native structure for all 41 proteins is 4.1 Å. The energy function (together with a simple clustering of similar structures) also makes a reasonable selection of near-native structures from minimized structures. The average rmsd value and the average rank for the best structure in the top five are 6.8 Å and 2.4, respectively. The accuracy of the structures sampled and the structure selections can be improved significantly with the removal of flexible terminal regions in rmsd calculations and in minimization and with the increase in the number of minimizations. The minimized structures form an excellent decoy set for testing other energy functions because most structures are well-packed with minimum hard-core overlaps with correct hydrophobic/hydrophilic partitioning. They are available online at .
Publisher: Springer Science and Business Media LLC
Date: 31-05-2013
Publisher: American Society for Microbiology
Date: 09-2016
DOI: 10.1128/IAI.00414-16
Abstract: Plasmodium falciparum is the most virulent human malaria parasite because of its ability to cytoadhere in the microvasculature. Nonhuman primate studies demonstrated relationships among knob expression, cytoadherence, and infectivity. This has not been examined in humans. Cultured clinical-grade P. falciparum parasites (NF54, 7G8, and 3D7B) and ex vivo-derived cell banks were characterized. Knob and knob-associated histidine-rich protein expression, CD36 adhesion, and antibody recognition of parasitized erythrocytes (PEs) were evaluated. Parasites from the cell banks were administered to malaria-naive human volunteers to explore infectivity. For the NF54 and 3D7B cell banks, blood was collected from the study participants for in vitro characterization. All parasites were infective in vivo. However, infectivity of NF54 was dramatically reduced. In vitro characterization revealed that unlike other cell bank parasites, NF54 PEs lacked knobs and did not cytoadhere. Recognition of NF54 PEs by immune sera was observed, suggesting P. falciparum erythrocyte membrane protein 1 expression. Subsequent recovery of knob expression and CD36-mediated adhesion were observed in PEs derived from participants infected with NF54. Knobless cell bank parasites have a dramatic reduction in infectivity and the ability to adhere to CD36. Subsequent infection of malaria-naive volunteers restored knob expression and CD36-mediated cytoadherence, thereby showing that the human environment can modulate virulence.
Publisher: Elsevier BV
Date: 2019
Publisher: SAGE Publications
Date: 02-04-2020
Abstract: On 28 August 2019, the Full Federal Court of Australia handed down its decision in four test cases selected by the Court as representative of over 50 cases brought on behalf of asylum seekers on Nauru and Papua New Guinea against the Minister for Home Affairs (the test cases). The test cases confirmed that the Federal Court does have the jurisdiction to hear negligence claims brought by immigration transferees offshore, but that the jurisdiction of the Federal Court has limits.
Publisher: American Chemical Society (ACS)
Date: 16-02-2005
DOI: 10.1021/JM049314D
Abstract: We developed a knowledge-based statistical energy function for protein-ligand, protein-protein, and protein-DNA complexes by using 19 atom types and a distance-scaled finite ideal-gas reference (DFIRE) state. The correlation coefficients between experimentally measured protein-ligand binding affinities and those predicted by the DFIRE energy function are around 0.63 for one training set and two testing sets. The energy function also makes highly accurate predictions of binding affinities of protein-protein and protein-DNA complexes. Correlation coefficients between theoretical and experimental results are 0.73 for 82 protein-protein (peptide) complexes and 0.83 for 45 protein-DNA complexes, despite the fact that the structures of protein-protein (peptide) and protein-DNA complexes were not used in training the energy function. The results of the DFIRE energy function on protein-ligand complexes are compared to the published results of 12 other scoring functions generated from either physical-based, knowledge-based, or empirical methods. They include AutoDock, X-Score, DrugScore, four scoring functions in Cerius 2 (LigScore, PLP, PMF, and LUDI), four scoring functions in SYBYL (F-Score, G-Score, D-Score, and ChemScore), and BLEEP. While the DFIRE energy function is only moderately successful in ranking native or near-native conformations, it yields the strongest correlation between theoretical and experimental binding affinities of the testing sets and between rmsd values and energy scores of docking decoys in a benchmark of 100 protein-ligand complexes. The parameters and the program of the all-atom DFIRE energy function are freely available for academic users at theory.med.buffalo.edu.
Publisher: Informa UK Limited
Date: 03-1989
Publisher: Springer Science and Business Media LLC
Date: 09-08-2019
Publisher: Wiley
Date: 17-05-2007
DOI: 10.1002/PROT.21459
Abstract: Recognizing the structural similarity without significant sequence identity (called fold recognition) is the key for bridging the gap between the number of known protein sequences and the number of structures solved. Previously, we developed a fold-recognition method called SP(3) which combines sequence-derived sequence profiles, secondary-structure profiles and residue-depth dependent, structure-derived sequence profiles. The use of residue-depth-dependent profiles makes SP(3) one of the best automatic predictors in CASP 6. Because residue depth (RD) and solvent accessible surface area (solvent accessibility) are complementary in describing the exposure of a residue to solvent, we test whether or not incorporation of solvent-accessibility profiles into SP(3) could further increase the accuracy of fold recognition. The resulting method, called SP(4), was tested in SALIGN benchmark for alignment accuracy and Lindahl, LiveBench 8 and CASP7 blind prediction for fold recognition sensitivity and model-structure accuracy. For remote homologs, SP(4) is found to consistently improve over SP(3) in the accuracy of sequence alignment and predicted structural models as well as in the sensitivity of fold recognition. Our result suggests that RD and solvent accessibility can be used concurrently for improving the accuracy and sensitivity of fold recognition. The SP(4) server and its local usage package are available on sparks.informatics.iupui.edu/SP4.
Publisher: Wiley
Date: 02-04-2004
DOI: 10.1002/PROT.20007
Abstract: An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential, so that a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in the ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting family, superfamily, and fold similarities in the Lindahl benchmark, respectively. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is established for academic users on theory.med.buffalo.edu.
Publisher: Hindawi Limited
Date: 03-10-2017
DOI: 10.1002/HUMU.23111
Publisher: Springer Science and Business Media LLC
Date: 21-12-2017
Publisher: Mary Ann Liebert Inc
Date: 05-2020
Abstract: The folding of a protein structure is a process governed by both local and nonlocal interactions. While incorporating local dependencies into a machine learning algorithm for protein structure prediction is simple and has been exploited for some time, the modeling of long-range dependences which result from structurally-neighboring residues has only recently begun to be addressed. Structural properties designed to localize the prediction space from direct tertiary structure prediction, such as secondary structure, contact maps, and intrinsic disorder, among others, have begun to greatly benefit from machine learning models capable of modeling a widened, potentially global protein context. This has led to a direct enhancement of the quality of predicted tertiary structures through both the optimization of structural constraints and improved reliability of alignments to structural templates. These improvements have stemmed from the application of recurrent and convolutional neural network architectures effective not only at innate sequential context propagation but also deep feature extraction due to novel skip connections and normalization techniques allowing for greatly enhanced error back-propagation. The recent results from independent blind testing in Critical Assessment of protein Structure Prediction 13 have signaled the beginning of a new generation of protein structure prediction through the utilization of these contextual techniques. The ripples from advancements in the determination of one-dimensional and two-dimensional structural properties have us moving ever closer to the solution of the protein structure prediction problem.
Publisher: Hindawi Limited
Date: 16-05-2017
DOI: 10.1002/HUMU.23235
Publisher: Hindawi Limited
Date: 21-11-2018
DOI: 10.1002/HUMU.23358
Abstract: Many genetic diseases exhibit considerable epidemiological comorbidity and common symptoms, which provokes debate about the extent of their etiological overlap. The rapid growth in the number of known disease-causing mutations in the Human Gene Mutation Database (HGMD) has allowed us to characterize genetic similarities between diseases by ascertaining the extent to which identical genetic mutations are shared between diseases. Using this approach, we show that 41.6% of disease pairs in all possible pairs (42,083) exhibit a significant sharing of mutations (P value < 0.05). These mutation-related disease pairs are in agreement with heritability-based disease-disease relations in 48 neurological and psychiatric disease pairs (Spearman's correlation coefficient = 0.50, P value = 3.4 × 10
Publisher: Royal Society of Chemistry (RSC)
Date: 2013
DOI: 10.1039/C3MB70167K
Publisher: Informa UK Limited
Date: 11-2011
Publisher: Cold Spring Harbor Laboratory
Date: 07-07-2023
DOI: 10.1101/2023.07.07.548080
Abstract: Recent advances in deep learning have significantly improved the ability to infer protein sequences directly from protein structures for fixed-backbone design. The methods have evolved from the early use of multi-layer perceptrons to convolutional neural networks, transformers, and graph neural networks (GNNs). However, the conventional approach of constructing a K-nearest-neighbors (KNN) graph for a GNN has limited the utilization of edge information, which plays a critical role in network performance. Here we introduce SPIN-CGNN, which uses protein contact maps to define nearest neighbors. Together with auxiliary edge updates and selective kernels, we found that SPIN-CGNN provided a comparable performance in refolding ability by AlphaFold2 to the current state-of-the-art techniques but a significant improvement over them in terms of sequence recovery, perplexity, deviation from amino-acid compositions of native sequences, conservation of hydrophobic positions, and low complexity regions, according to tests on unseen structures and “hallucinated” structures. Results suggest that low complexity regions in the sequences designed by deep learning techniques remain to be improved when compared to the native sequences.
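The contrast drawn in the abstract above, between a K-nearest-neighbors graph and a contact-map graph, can be illustrated with a toy example. This is a sketch under stated assumptions and not SPIN-CGNN's implementation: the function names, the 8 Å cutoff, the value of k and the four made-up coordinates are all hypothetical.

```python
import math


def pairwise_distances(coords):
    """Full distance matrix between residue coordinates (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def knn_edges(coords, k):
    """Directed edges from every residue to its k nearest residues, however far away."""
    d = pairwise_distances(coords)
    edges = set()
    for i, row in enumerate(d):
        others = [j for j in range(len(coords)) if j != i]
        for j in sorted(others, key=lambda j: row[j])[:k]:
            edges.add((i, j))
    return edges


def contact_edges(coords, cutoff):
    """Directed edges between residues closer than a distance cutoff (a contact map)."""
    d = pairwise_distances(coords)
    n = len(coords)
    return {(i, j) for i in range(n) for j in range(n) if i != j and d[i][j] <= cutoff}


# Made-up coordinates in angstroms: three packed residues and one isolated residue.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
knn = knn_edges(coords, k=2)                   # fixed degree for every residue
contacts = contact_edges(coords, cutoff=8.0)   # degree follows local packing
print(sorted(knn))
print(sorted(contacts))
```

In this toy case the KNN graph still links the isolated residue to partners more than 20 Å away, whereas the contact graph gives it no edges, so the neighbor count tracks local packing density.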
Publisher: Wiley
Date: 07-2008
Publisher: Informa UK Limited
Date: 03-1989
Publisher: Oxford University Press (OUP)
Date: 27-10-2020
DOI: 10.1093/BIOINFORMATICS/BTAA652
Abstract: RNA solvent accessibility, similar to protein solvent accessibility, reflects the structural regions that are accessible to solvents or other functional biomolecules, and plays an important role for structural and functional characterization. Unlike protein solvent accessibility, only a few tools are available for predicting RNA solvent accessibility despite the fact that millions of RNA transcripts have unknown structures and functions. Also, these tools have limited accuracy. Here, we have developed RNAsnap2 that uses a dilated convolutional neural network with a new feature, based on predicted base-pairing probabilities from LinearPartition. Using the same training set from the recent predictor RNAsol, RNAsnap2 provides an 11% improvement in median Pearson Correlation Coefficient (PCC) and 9% improvement in mean absolute errors for the same test set of 45 RNA chains. A larger improvement (22% in median PCC) is observed for 31 newly deposited RNA chains that are non-redundant and independent from the training and the test sets. A single-sequence version of RNAsnap2 (i.e. without using sequence profiles generated from homology search by Infernal) has achieved comparable performance to the profile-based RNAsol. In addition, RNAsnap2 has achieved comparable performance for protein-bound and protein-free RNAs. Both RNAsnap2 and RNAsnap2 (SingleSeq) are expected to be useful for searching structural signatures and locating functional regions of non-coding RNAs. Standalone-versions of RNAsnap2 and RNAsnap2 (SingleSeq) are available at aswindersingh2/RNAsnap2. Direct prediction can also be made at erver/rnasnap2. The datasets used in this research can also be downloaded from the GITHUB and the webserver mentioned above. Supplementary data are available at Bioinformatics online.
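The two evaluation measures named in the abstract above, the Pearson correlation coefficient (PCC) and the mean absolute error, can be computed per RNA chain from predicted and actual solvent-accessibility values. The sketch below uses only the standard library; the function names and the numeric values are invented for illustration and are not taken from the RNAsnap2 datasets.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mean_absolute_error(x, y):
    """Average absolute difference between predicted and actual values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Invented per-nucleotide accessibility values for one short chain.
actual = [120.0, 85.0, 40.0, 15.0, 60.0, 150.0]
predicted = [110.0, 90.0, 55.0, 30.0, 50.0, 140.0]
print(f"PCC = {pearson(predicted, actual):.2f}")
print(f"MAE = {mean_absolute_error(predicted, actual):.1f}")
```

Reporting the median PCC over chains, as the abstract does, limits the influence of a few very short or unusual chains on the summary.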
Publisher: Wiley
Date: 02-03-2006
DOI: 10.1002/PROT.20934
Abstract: Solvent accessibility, one of the key properties of amino acid residues in proteins, can be used to assist protein structure prediction. Various approaches such as neural network, support vector machines, probability profiles, information theory, Bayesian theory, logistic function, and multiple linear regression have been developed for solvent accessibility prediction. In this article, a much simpler quadratic programming method based on the buriability parameter set of amino acid residues is developed. The new method, called QBES (Quadratic programming and Buriability Energy function for Solvent accessibility prediction), is reasonably accurate for predicting the real value of solvent accessibility. By using a dataset of 30 proteins to optimize three parameters, the average correlation coefficients between the predicted and actual solvent accessibility are about 0.5 for all four independent test sets ranging from 126 to 513 proteins. The method is efficient. It takes only 20 min for a regular PC to obtain results of 30 proteins with an average length of 263 amino acids. Although the proposed method is less accurate than a few more sophisticated methods based on neural network or support vector machines, this is the first attempt to predict solvent accessibility by energy optimization with constraints. Possible improvements and other applications of the method are discussed.
Publisher: Elsevier BV
Date: 04-1991
Publisher: Cold Spring Harbor Laboratory
Date: 02-11-2017
Abstract: As most RNA structures are elusive to structure determination, obtaining solvent accessible surface areas (ASAs) of nucleotides in an RNA structure is an important first step to characterize potential functional sites and core structural regions. Here, we developed RNAsnap, the first machine-learning method trained on protein-bound RNA structures for solvent accessibility prediction. Built on sequence profiles from multiple sequence alignment (RNAsnap-prof), the method provided robust prediction in fivefold cross-validation and an independent test (Pearson correlation coefficients, r, between predicted and actual ASA values are 0.66 and 0.63, respectively). Application of the method to 6178 mRNAs revealed its positive correlation to mRNA accessibility by dimethyl sulphate (DMS) experimentally measured in vivo (r = 0.37) but not in vitro (r = 0.07), despite the lack of training on mRNAs and the fact that DMS accessibility is only an approximation to solvent accessibility. We further found strong association across coding and noncoding regions between predicted solvent accessibility of the mutation site of a single nucleotide variant (SNV) and the frequency of that variant in the population for 2.2 million SNVs obtained in the 1000 Genomes Project. Moreover, mapping solvent accessibility of RNAs to the human genome indicated that introns, 5′ cap of 5′ and 3′ cap of 3′ untranslated regions, are more solvent accessible, consistent with their respective functional roles. These results support conformational selections as the mechanism for the formation of RNA–protein complexes and highlight the utility of genome-scale characterization of RNA tertiary structures by RNAsnap. The server and its stand-alone downloadable version are available at sparks-lab.org.
Publisher: Wiley
Date: 16-04-2004
DOI: 10.1002/PROT.20019
Abstract: Extracting knowledge-based statistical potential from known structures of proteins is proved to be a simple, effective method to obtain an approximate free-energy function. However, the different compositions of amino acid residues at the core, the surface, and the binding interface of proteins prohibited the establishment of a unified statistical potential for folding and binding despite the fact that the physical basis of the interaction (water-mediated interaction between amino acids) is the same. Recently, a physical state of ideal gas, rather than a statistically averaged state, has been used as the reference state for extracting the net interaction energy between amino acid residues of monomeric proteins. Here, we find that this monomer-based potential is more accurate than an existing all-atom knowledge-based potential trained with interfacial structures of dimers in distinguishing native complex structures from docking decoys (100% success rate vs. 52% in 21 dimer/trimer decoy sets). It is also more accurate than a recently developed semiphysical empirical free-energy functional enhanced by an orientation-dependent hydrogen-bonding potential in distinguishing native state from Rosetta docking decoys (94% success rate vs. 74% in 31 antibody-antigen and other complexes based on Z score). In addition, the monomer potential achieved a 93% success rate in distinguishing true dimeric interfaces from artificial crystal interfaces. More importantly, without additional parameters, the potential provides an accurate prediction of binding free energy of protein-peptide and protein-protein complexes (a correlation coefficient of 0.87 and a root-mean-square deviation of 1.76 kcal/mol with 69 experimental data points). This work marks a significant step toward a unified knowledge-based potential that quantitatively captures the common physical principle underlying folding and binding. 
A Web server for academic users, established for the prediction of binding free energy and the energy evaluation of the protein-protein complexes, may be found at theory.med.buffalo.edu.
Publisher: Wiley
Date: 2004
DOI: 10.1110/PS.03162804
Abstract: We have performed discontinuous molecular dynamics simulations of the thermodynamics and stability of a tetrameric beta-sheet complex that contains four identical four-stranded antiparallel beta-sheet peptides. The potential used in the simulation is a hybrid Go-type potential characterized by the bias gap parameter g, an artificial measure of the preference of a model protein for its native state, and the intermolecular contact parameter eta, which measures the ratio of intermolecular to intramolecular native attractions. Despite the simplicity of the model, a complex set of thermodynamic transitions for the beta-sheet complex is revealed that shows there are three distinct oligomer (partially ordered, ordered, and highly ordered beta-sheet complex) states and four noninteracting monomers phases. The thermodynamic properties of the three oligomer states strongly depend on both the size of the intermolecular contact parameter eta and the temperature. The partially ordered beta-sheet complex is made up of four ordered globules and is observed at intermediate to large eta at high temperatures. The ordered beta-sheet complex contains four native beta-sheets and is located at small to intermediate eta at low temperatures in the phase diagram. The highly ordered beta-sheet complex has fully-stiff beta-sheet strands, the same as the global energy minimum structure, and is observed for all eta at low temperatures.
Publisher: Cold Spring Harbor Laboratory
Date: 03-02-2023
DOI: 10.1101/2023.02.01.526559
Abstract: Recent success of AlphaFold2 in protein structure prediction relied heavily on co-evolutionary information derived from homologous protein sequences found in the huge, integrated database of protein sequences (Big Fantastic Database). In contrast, the existing nucleotide databases were not consolidated to facilitate wider and deeper homology search. Here, we built a comprehensive database by including the noncoding RNA sequences from RNAcentral, the transcriptome assembly and metagenome assembly from MG-RAST, the genomic sequences from Genome Warehouse (GWH), and the genomic sequences from MGnify, in addition to NCBI’s nucleotide database (nt) and its subsets. The resulting MARS database (Master database of All possible RNA sequences) is 20-fold larger than NCBI’s nt database or 60-fold larger than RNAcentral. The new dataset along with a new split-search strategy allows a substantial improvement in homology search over existing state-of-the-art techniques. It also yields more accurate and more sensitive multiple sequence alignments (MSA) than manually curated MSAs from Rfam for the majority of structured RNAs mapped to Rfam. The results indicate that MARS coupled with the fully automatic homology search tool RNAcmap will be useful for improved structural and functional inference of noncoding RNAs.
Publisher: Wiley
Date: 02-02-2016
DOI: 10.1002/JCC.24314
Abstract: Protein-peptide interactions are essential for all cellular processes including DNA repair, replication, gene-expression, and metabolism. As most protein-peptide interactions are uncharacterized, it is cost effective to investigate them computationally as the first step. All existing approaches for predicting protein-peptide binding sites, however, are based on protein structures despite the fact that the structures for most proteins are not yet solved. This article proposes the first machine-learning method called SPRINT to make Sequence-based prediction of Protein-peptide Residue-level Interactions. SPRINT yields a robust and consistent performance in 10-fold cross-validation and an independent test. The most important feature is evolution-generated sequence profiles. For the test set (1056 binding and non-binding residues), it yields a Matthews correlation coefficient of 0.326 with a sensitivity of 64% and a specificity of 68%. This sequence-based technique is comparable to or more accurate than structure-based methods for peptide-binding site prediction. SPRINT is available as an online server at: © 2016 Wiley Periodicals, Inc.
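The sensitivity, specificity, and Matthews correlation coefficient quoted above all derive from one 2×2 confusion matrix. The sketch below shows how the three relate; the function name and the counts are hypothetical and chosen only for illustration, not taken from SPRINT's actual predictions.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true binding residues recovered
    specificity = tn / (tn + fp)   # fraction of non-binding residues rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts (100 binding and 100 non-binding residues).
sens, spec, mcc = binary_metrics(tp=64, fp=32, tn=68, fn=36)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.3f}")
```

Unlike sensitivity and specificity taken separately, the MCC uses all four cells of the matrix, which is why it is commonly reported when binding and non-binding residues are imbalanced.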
Publisher: Wiley
Date: 12-12-2004
DOI: 10.1002/PROT.10584
Abstract: The average contribution of an individual residue to folding stability and its dependence on buried accessible surface area (ASA) are obtained by two different approaches. One is based on experimental mutation data, and the other uses a new knowledge-based atom-atom potential of mean force. We show that the contribution of a residue has a significant correlation with buried ASA and the regression slopes of 20 amino acid residues (called the buriability) are all positive (pro-burial). The buriability parameter provides a quantitative measure of the driving force for the burial of a residue. The large buriability gap observed between hydrophobic and hydrophilic residues is responsible for the burial of hydrophobic residues in soluble proteins. Possible factors that contribute to the buriability gap are discussed.
Publisher: Wiley
Date: 02-11-2012
DOI: 10.1002/JCC.21968
Publisher: American Chemical Society (ACS)
Date: 05-02-2005
DOI: 10.1021/PR049805M
Abstract: The domain graph of domains and domain combinations of Arabidopsis thaliana is established based on the Pfam 14.0 database and analyzed via comparison with 10 eukaryotic, 30 bacterial, and 16 archaeal proteomes. The comparative analysis of the domain graphs provides a useful platform for revealing global insights on the evolution of the plant kingdom. More importantly, it is a powerful tool for searching not only the possible new function of both plant-specific and nonspecific domains via specific domain combinations in Arabidopsis thaliana but also the functional role of unknown domains. As an example, we present the functional link between ubiquitin and Myb_DNA-binding domains via Bromodomain as the plant specific evidence for the association between transcription and ubiquitin. We further show that PentatricoPeptide Repeats (PPR) proteins have plant-specific links with a wide variety of domains responsible for RNA binding/metabolism, modulation of protein-protein interactions, ubiquitin-conjugation, cell growth/maintenance, catalysis, and others. This further supports the recently proposed association of PPR proteins with specific RNA transcripts and defined effector proteins. Moreover, the domain graph built from tissue-specific genes is frequently associated with DNA binding domains, suggesting that the differentiation of tissue cell types is contributed mostly by tissue-specific transcriptional process. DOGMA (DOmain Graph via coMparitive analysis for Arabidopsis thaliana) is available on-line with a variety of search tools at theory.med.buffalo.edu/DOGMA. The database, which allows user-specified search for plant specific domains and their combinations, will be useful as an additional tool for annotation of the proteins that play specific roles in plants and other organisms.
Publisher: Public Library of Science (PLoS)
Date: 21-04-2016
Publisher: Wiley
Date: 14-08-2018
DOI: 10.1002/CPPS.75
Abstract: Protein‐carbohydrate interaction is essential for biological systems, and carbohydrate‐binding proteins (CBPs) are important targets when designing antiviral and anticancer drugs. Due to the high cost and difficulty associated with experimental approaches, many computational methods have been developed as complementary approaches to predict CBPs or carbohydrate‐binding sites. However, most of these computational methods are not publicly available. Here, we provide a comprehensive review of related studies and demonstrate our two recently developed bioinformatics methods. The method SPOT‐CBP is a template‐based method for detecting CBPs based on structure through structural homology search combined with a knowledge‐based scoring function. This method can yield model complex structure in addition to accurate prediction of CBPs. Furthermore, it has been observed that similarly accurate predictions can be made using structures from homology modeling, which has significantly expanded its applicability. The other method, SPRINT‐CBH, is a de novo approach that predicts binding residues directly from protein sequences by using sequence information and predicted structural properties. This approach does not need structurally similar templates and thus is not limited by the current database of known protein‐carbohydrate complex structures. These two complementary methods are available at sparks‐lab.org . © 2018 by John Wiley & Sons, Inc.
Publisher: Oxford University Press (OUP)
Date: 29-10-2020
DOI: 10.1093/NAR/GKAA931
Abstract: We present DescribePROT, the database of predicted amino acid-level descriptors of structure and function of proteins. DescribePROT delivers a comprehensive collection of 13 complementary descriptors predicted using 10 popular and accurate algorithms for 83 complete proteomes that cover key model organisms. The current version includes 7.8 billion predictions for close to 600 million amino acids in 1.4 million proteins. The descriptors encompass sequence conservation, position specific scoring matrix, secondary structure, solvent accessibility, intrinsic disorder, disordered linkers, signal peptides, MoRFs and interactions with proteins, DNA and RNAs. Users can search DescribePROT by the amino acid sequence and the UniProt accession number and entry name. The pre-computed results are made available instantaneously. The predictions can be accessed via an interactive graphical interface that allows simultaneous analysis of multiple descriptors and can also be downloaded in structured formats at the protein, proteome and whole database scale. The putative annotations included in DescribePROT are useful for a broad range of studies, including investigations of protein function, applied projects focusing on therapeutics and diseases, and the development of predictors for other protein sequence descriptors. Future releases will expand the coverage of DescribePROT. DescribePROT can be accessed at ervers/DESCRIBEPROT/.
Publisher: Wiley
Date: 19-06-2014
DOI: 10.1002/PROT.24620
Publisher: Wiley
Date: 09-01-2009
DOI: 10.1002/PROT.22329
Publisher: Springer US
Date: 2007
Publisher: No publisher found
Date: 2014
Publisher: Wiley
Date: 05-10-2018
DOI: 10.1002/JCC.25534
Abstract: Predicting protein structure from sequence alone is challenging. Thus, the majority of methods for protein structure prediction rely on evolutionary information from multiple sequence alignments. In previous work we showed that Long Short-Term Bidirectional Recurrent Neural Networks (LSTM-BRNNs) improved over regular neural networks by better capturing intra-sequence dependencies. Here we show a single-sequence-based prediction method employing LSTM-BRNNs (SPIDER3-Single), that consistently achieves Q3 accuracy of 72.5%, and correlation coefficient of 0.67 between predicted and actual solvent accessible surface area. Moreover, it yields reasonably accurate prediction of eight-state secondary structure, main-chain angles (backbone ϕ and ψ torsion angles and Cα-atom-based θ and τ angles), half-sphere exposure, and contact number. The method is more accurate than the corresponding evolutionary-based method for proteins with few sequence homologs, and computationally efficient for large-scale screening of protein-structural properties. It is available as an option in the SPIDER3 server, and a standalone version for download, at sparks-lab.org. © 2018 Wiley Periodicals, Inc.
Publisher: Springer New York
Date: 2019
DOI: 10.1007/978-1-4939-9045-0_27
Abstract: Plant long noncoding RNAs (lncRNAs) play important functional roles in various biological processes. Most databases deposit all plant lncRNA candidates produced by high-throughput experimental and/or computational techniques. There are several databases for experimentally validated lncRNAs. However, these databases are small in scale (with a few hundreds of lncRNAs only) and specific in their focuses (plants, diseases, or interactions). Thus, we established EVLncRNAs by curating lncRNAs validated by low-throughput experiments (up to May 1, 2016) and integrating specific databases (lncRNAdb, LncRNADisease, Lnc2Cancer, and PLNlncRbase) with additional functional and disease-specific information not covered previously. The current version of EVLncRNAs contains 1543 lncRNAs from 77 species, including 428 plant lncRNAs from 44 plant species. Compared to PLNlncRbase, our dataset does not contain any lncRNAs from microarray and deep sequencing. Moreover, 40% of entries contain new information (interaction and additional information from NCBI and Ensembl). The database allows users to browse, search, and download as well as to submit experimentally validated lncRNAs. The database is available at biophy.dzu.edu.cn/EVLncRNAs.
Publisher: Springer New York
Date: 28-10-2016
DOI: 10.1007/978-1-4939-6406-2_6
Abstract: Predicting one-dimensional structure properties has played an important role in improving prediction of protein three-dimensional structures and functions. The most commonly predicted properties are secondary structure and accessible surface area (ASA) representing local and nonlocal structural characteristics, respectively. Secondary structure prediction is further complemented by prediction of continuous main-chain torsional angles. Here we describe a newly developed method SPIDER2 that utilizes three iterations of deep learning neural networks to improve the prediction accuracy of several structural properties simultaneously. For an independent test set of 1199 proteins SPIDER2 achieves 82% accuracy for secondary structure prediction, 0.76 for the correlation coefficient between predicted and actual solvent accessible surface area, 19° and 30° for mean absolute errors of backbone φ and ψ angles, respectively, and 8° and 32° for mean absolute errors of Cα-based θ and τ angles, respectively. The method provides state-of-the-art, all-in-one accurate prediction of local structure and solvent accessible surface area. The method is implemented as a web server, along with a standalone package, both available on our website: sparks-lab.org.
Publisher: Springer Science and Business Media LLC
Date: 05-2021
DOI: 10.1007/S11033-021-06476-W
Abstract: Argonaute proteins are highly conserved and widely expressed in almost all organisms. They not only play a critical role in the biogenesis of small RNAs but also defend against invading nucleic acids via small RNA or DNA-mediated gene silencing pathways. One functional mechanism of Argonaute proteins is acting as a nucleic-acid-guided endonuclease, which can cleave targets complementary to DNA or RNA guides. The cleavage then leads to translational silencing directly or indirectly by recruiting additional silencing proteins. Here, we summarized the latest research progress in structural and biological studies of Argonaute proteins and pointed out their potential applications in the field of gene editing.
Publisher: Springer Science and Business Media LLC
Date: 09-1988
DOI: 10.1007/BF01011655
Publisher: American Chemical Society (ACS)
Date: 19-01-2002
DOI: 10.1021/JP013824R
Publisher: Elsevier BV
Date: 07-2020
Publisher: AIP Publishing
Date: 25-02-2019
DOI: 10.1063/1.5082351
Abstract: Experiments have shown that cholesterol influences the membrane permeability of small molecules, amino acids, and cell-penetrating peptides. However, their exact translocation mechanisms under the influence of cholesterol remain poorly understood. Given the practical importance of cell-penetrating peptides and the existence of varied cholesterol contents in different cell types, it is necessary to examine the permeation of amino acids in cholesterol-containing membranes at an atomic level of detail. Here, bias-exchange metadynamics simulations were employed to investigate the molecular mechanism of the membrane permeation of two amino acids, Arg and Trp, that are important for cell-penetrating peptides, in the presence of different concentrations of cholesterol. We found that the free energy barrier of Arg+ (the protonated form) permeation increased linearly as the cholesterol concentration increased, whereas the barrier of Trp permeation had a rapid increase from 0 mol. % to 20 mol. % cholesterol-containing membranes and remained nearly unchanged from 20 mol. % to 40 mol. % cholesterol-containing membranes. Arg0 becomes slightly more stable than Arg+ at the center of the dipalmitoylphosphatidylcholine (DPPC) membrane with 40 mol. % cholesterol concentrations. As a result, Arg+ has a similar permeability to Trp at 0 mol. % and 20 mol. % cholesterol, but a significantly lower permeability than Trp at 40 mol. % cholesterol. This difference is caused by the gradual reduction of water defects for Arg+ as the cholesterol concentration increases but a lack of water defects for Trp in cholesterol-containing membranes. Strong but different orientation dependence between Arg+ and Trp permeations is observed. These results provide an improved microscopic understanding of amino-acid permeation through cholesterol-containing DPPC membrane systems.
Publisher: Springer Science and Business Media LLC
Date: 03-09-2018
DOI: 10.1038/S41598-018-31241-8
Abstract: The interaction of carbohydrate-binding proteins (CBPs) with their corresponding glycan ligands is challenging to study both experimentally and computationally. This is in part due to their low binding affinity, high flexibility, and the lack of a linear sequence in carbohydrates, as exists in nucleic acids and proteins. We recently described a function-prediction technique called SPOT-Struc that identifies CBPs by global structural alignment and binding-affinity prediction. Here we experimentally determined the carbohydrate specificity and binding affinity of YesU (RCSB PDB ID: 1oq1), an uncharacterized protein from Bacillus subtilis that SPOT-Struc predicted would bind high mannose-type glycans. Glycan array analyses, however, revealed glycan binding patterns similar to those exhibited by fucose (Fuc)-binding lectins, with SPR analysis revealing high affinity binding to Lewis x and lacto-N-fucopentaose III. Structure based alignment of YesU revealed high similarity to the legume lectins UEA-I and GS-IV, and docking of Lewis x into YesU revealed a complex structure model with predicted binding affinity of −4.3 kcal/mol. Moreover, the adherence of B. subtilis to intestinal cells was significantly inhibited by Le x and Le y but not by non-fucosylated glycans, suggesting the interaction of YesU with fucosylated glycans may be involved in the adhesion of B. subtilis to the gastrointestinal tract of mammals.
Publisher: Oxford University Press (OUP)
Date: 11-03-2021
DOI: 10.1093/BIOINFORMATICS/BTAB165
Abstract: The recent discovery of numerous non-coding RNAs (long non-coding RNAs, in particular) has transformed our perception about the roles of RNAs in living organisms. Our ability to understand them, however, is hampered by our inability to solve their secondary and tertiary structures in high resolution efficiently by existing experimental techniques. Computational prediction of RNA secondary structure, on the other hand, has received much-needed improvement, recently, through deep learning of a large approximate data, followed by transfer learning with gold-standard base-pairing structures from high-resolution 3-D structures. Here, we expand this single-sequence-based learning to the use of evolutionary profiles and mutational coupling. The new method allows large improvement not only in canonical base-pairs (RNA secondary structures) but more so in base-pairing associated with tertiary interactions such as pseudoknots, non-canonical and lone base-pairs. In particular, it is highly accurate for those RNAs of more than 1000 homologous sequences by achieving >0.8 F1-score (harmonic mean of sensitivity and precision) for 14/16 RNAs tested. The method can also significantly improve base-pairing prediction by incorporating artificial but functional homologous sequences generated from deep mutational scanning without any modification. The fully automatic method (publicly available as server and standalone software) should provide the scientific community a new powerful tool to capture not only the secondary structure but also tertiary base-pairing information for building three-dimensional models. It also highlights the future of accurately solving the base-pairing structure by using a large number of natural and/or artificial homologous sequences. Standalone-version of SPOT-RNA2 is available at aswindersingh2/SPOT-RNA2. Direct prediction can also be made at erver/spot-rna2/.
The datasets used in this research can also be downloaded from the GITHUB and the webserver mentioned above. Supplementary data are available at Bioinformatics online.
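The F1-score mentioned in the abstract above is the harmonic mean of sensitivity and precision over predicted base pairs. A minimal sketch of that calculation follows, assuming base pairs are represented as index tuples; the function name and the toy pairs are hypothetical and are not taken from the SPOT-RNA2 software.

```python
def base_pair_f1(predicted, reference):
    """F1-score between two collections of base pairs given as (i, j) tuples."""
    # Treat (i, j) and (j, i) as the same pair.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in reference}
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)     # correct pairs among predicted pairs
    sensitivity = true_pos / len(ref)    # correct pairs among reference pairs
    return 2 * precision * sensitivity / (precision + sensitivity)

# Toy hairpin: three of four reference pairs recovered, two spurious pairs.
reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (2, 19), (3, 18), (5, 16), (6, 15)]
score = base_pair_f1(predicted, reference)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is dominated by the smaller of the two terms, a high F1 requires both few missed pairs and few spurious ones.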
Publisher: eLife Sciences Publications, Ltd
Date: 11-09-2023
DOI: 10.7554/ELIFE.90254
Publisher: Springer Science and Business Media LLC
Date: 03-2006
DOI: 10.1007/BF02904509
Publisher: Wiley
Date: 10-07-2007
DOI: 10.1002/PROT.21498
Publisher: Springer Science and Business Media LLC
Date: 17-06-2011
Abstract: Intrinsically disordered proteins play important roles in various cellular activities and their prevalence was implicated in a number of human diseases. The knowledge of the content of the intrinsic disorder in proteins is useful for a variety of studies including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and for the analysis of disorder-related protein functions. The above investigations currently utilize the disorder content derived from the per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates development of novel tools for direct and accurate sequence-based prediction of the disorder content. We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction when compared with the content extracted from the local window-based residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with low, 0.05, mean squared error and high, 0.68, Pearson correlation. This is a statistically significant improvement over the content computed from outputs of ten modern disorder predictors on a test dataset with proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of the disorder content. DisCon is a high-quality alternative for high-throughput annotation of the disorder content.
We also empirically demonstrate that DisCon's predictions can be used to improve binary annotations of the disordered residues from the real-value disorder propensities generated by current residue-level disorder predictors. The web server that implements DisCon is available at biomine.ece.ualberta.ca/DisCon/.
Publisher: Wiley
Date: 20-01-2009
DOI: 10.1002/PROT.22343
Publisher: Oxford University Press (OUP)
Date: 13-05-2021
DOI: 10.1093/BIOINFORMATICS/BTAB316
Abstract: Knowing protein secondary and other one-dimensional structural properties is essential for accurate protein structure and function prediction. As a result, many methods have been developed for predicting these one-dimensional structural properties. However, most methods relied on evolutionary information that may not exist for many proteins due to a lack of sequence homologs. Moreover, it is computationally intensive to obtain evolutionary information as the library of protein sequences continues to expand exponentially. Here, we developed a new single-sequence method called SPOT-1D-Single based on a large training dataset of 39 120 proteins deposited prior to 2016 and an ensemble of hybrid long-short-term-memory bidirectional neural network and convolutional neural network. We showed that SPOT-1D-Single consistently improves over SPIDER3-Single and ProteinUnet for secondary structure, solvent accessibility, contact number and backbone angles prediction for all seven independent test sets (TEST2018, SPOT-2016, SPOT-2016-HQ, SPOT-2018, SPOT-2018-HQ, CASP12 and CASP13 free-modeling targets). For example, the predicted three-state secondary structure’s accuracy ranges from 72.12% to 74.28% by SPOT-1D-Single, compared to 69.1–72.6% by SPIDER3-Single and 70.6–73% by ProteinUnet. SPOT-1D-Single also predicts SS3 and SS8 with 6.24% and 6.98% better accuracy than SPOT-1D on SPOT-2018 proteins with no homologs (Neff = 1), respectively. The new method’s improvement over existing techniques is due to a larger training set combined with ensembled learning. Standalone-version of SPOT-1D-Single is available at as-preet/SPOT-1D-Single. Direct prediction can also be made at erver/spot-1d-single. The datasets used in this research can also be downloaded from GitHub. Supplementary data are available at Bioinformatics online.
Publisher: American Chemical Society (ACS)
Date: 29-12-2009
DOI: 10.1021/BI8017043
Publisher: Wiley
Date: 09-10-2002
DOI: 10.1002/PROT.10241
Abstract: The stability scale of 20 amino acid residues is derived from a database of 1023 mutation experiments on 35 proteins. The resulting scale of hydrophobic residues has an excellent correlation with the octanol-to-water transfer free energy corrected with an additional Flory-Huggins molar-volume term (correlation coefficient r = 0.95, slope = 1.05, and a near zero intercept). Thus, hydrophobic contribution to folding stability is characterized remarkably well by transfer experiments. However, no corresponding correlation is found for hydrophilic residues. Both the hydrophilic portion and the entire scale, however, correlate strongly with average burial accessible surface (r = 0.76 and 0.97, respectively). Such a strong correlation leads to a near uniform value of the atomic solvation parameters for atoms C, S, O/N, O(-0.5), and N(+0.5,1). All are in the range of 12-28 cal mol⁻¹ Å⁻², close to the original estimate of hydrophobic contribution of 25-30 cal mol⁻¹ Å⁻² to folding stability. Without any adjustable parameters, the new stability scale and new atomic solvation parameters yielded an accurate prediction of protein-protein binding free energy for a separate database of 21 protein-protein complexes (r = 0.80 and slope = 1.06, and r = 0.83 and slope = 0.93, respectively).
Publisher: Wiley
Date: 27-06-2007
DOI: 10.1002/PROT.21492
Abstract: In this article, we perform a dynamic Monte Carlo simulation study of the helix-coil transition by using a bond-fluctuation lattice model. The results of the simulations are compared with those predicted by the Zimm-Bragg statistical thermodynamic theory with propagation and nucleation parameters determined from simulation data. The Zimm-Bragg theory provides a satisfactory description of the helix-coil transition of a homopolypeptide chain of 32 residues (N = 32). For such a medium-length chain, however, the analytical equation based on a widely-used large-N approximation to the Zimm-Bragg theory is not suitable to predict the average length of helical blocks at low temperatures when helicity is high. We propose an analytical large-eigenvalue (lambda) approximation. The new equation yields a significantly improved agreement on the average helix-block length with the original Zimm-Bragg theory for both medium and long chain lengths in the entire temperature range. Nevertheless, even the original Zimm-Bragg theory does not provide an accurate description of helix-coil transition for longer chains. We assume that the single-residue nucleation of helix formation as suggested in the original Zimm-Bragg model might be responsible for this deviation. A mechanism of nucleation by a short helical block is proposed by us and provides a significantly improved agreement with our simulation data.
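For orientation, the textbook two-state transfer-matrix form of the Zimm-Bragg model can be evaluated exactly for a finite chain. The sketch below implements that standard formulation only; it is not the paper's lattice simulation, its large-eigenvalue approximation, or its proposed block-nucleation mechanism. The symbols `s` (propagation) and `sigma` (nucleation) follow common usage, and the function names are hypothetical.

```python
import math

def zimm_bragg_partition(n, s, sigma):
    """Partition function of an n-residue chain with coil/helix states.

    Statistical weights: coil = 1, helix after helix = s,
    helix after coil = sigma * s (nucleation penalty).
    """
    # z_c, z_h: summed weight of all partial chains ending in coil / helix.
    z_c, z_h = 1.0, 0.0  # a virtual coil residue precedes the chain
    for _ in range(n):
        z_c, z_h = z_c + z_h, sigma * s * z_c + s * z_h
    return z_c + z_h

def helicity(n, s, sigma, eps=1e-6):
    """Mean helical fraction, theta = (1/n) d ln Z / d ln s (central difference)."""
    upper = math.log(zimm_bragg_partition(n, s * math.exp(eps), sigma))
    lower = math.log(zimm_bragg_partition(n, s * math.exp(-eps), sigma))
    return (upper - lower) / (2.0 * eps * n)

theta_independent = helicity(32, 1.0, 1.0)   # sigma = 1: residues are independent
theta_cooperative = helicity(32, 1.0, 1e-3)  # strong nucleation penalty
print(theta_independent, theta_cooperative)
```

With `sigma = 1` the partition function reduces to `(1 + s)**n`, so the helicity is `s / (1 + s)`. A small `sigma` lowers the helicity of a short chain at `s = 1`, which is the finite-length effect that large-N approximations handle poorly.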
Publisher: Wiley
Date: 04-03-2011
DOI: 10.1002/JCC.21747
Publisher: Oxford University Press (OUP)
Date: 29-07-2017
DOI: 10.1093/NAR/GKX677
Publisher: Elsevier BV
Date: 11-1999
Publisher: Wiley
Date: 11-2002
DOI: 10.1110/PS.0217002
Abstract: The distance-dependent structure-derived potentials developed so far all employed a reference state that can be characterized as a residue (atom)-averaged state. Here, we establish a new reference state called the distance-scaled, finite ideal-gas reference (DFIRE) state. The reference state is used to construct a residue-specific all-atom potential of mean force from a database of 1011 nonhomologous (less than 30% homology) protein structures with resolution less than 2 Å. The new all-atom potential recognizes more native proteins from 32 multiple decoy sets, and raises an average Z-score by 1.4 units more than two previously developed, residue-specific, all-atom knowledge-based potentials. When only backbone and C(beta) atoms are used in scoring, the performance of the DFIRE-based potential, although worse than that of the all-atom version, is comparable to those of the previously developed potentials on the all-atom level. In addition, the DFIRE-based all-atom potential provides the most accurate prediction of the stabilities of 895 mutants among three knowledge-based all-atom potentials. Comparison with several physical-based potentials is made.
Publisher: Oxford University Press (OUP)
Date: 03-10-2023
DOI: 10.1093/NSR/NWAD259
Publisher: Annual Reviews
Date: 06-05-2013
DOI: 10.1146/ANNUREV-BIOPHYS-083012-130315
Abstract: In the past decade, a concerted effort to successfully capture specific tertiary packing interactions produced specific three-dimensional structures for many de novo designed proteins that are validated by nuclear magnetic resonance and/or X-ray crystallographic techniques. However, the success rate of computational design remains low. In this review, we provide an overview of experimentally validated, de novo designed proteins and compare four available programs, RosettaDesign, EGAD, Liang-Grishin, and RosettaDesign-SR, by assessing designed sequences computationally. Computational assessment includes the recovery of native sequences, the calculation of sizes of hydrophobic patches and total solvent-accessible surface area, and the prediction of structural properties such as intrinsic disorder, secondary structures, and three-dimensional structures. This computational assessment, together with a recent community-wide experiment in assessing scoring functions for interface design, suggests that the next-generation protein-design scoring function will come from the right balance of complementary interaction terms. Such balance may be found when more negative experimental data become available as part of a training set.
Publisher: American Chemical Society (ACS)
Date: 22-09-2016
Abstract: Carbohydrate-binding proteins play significant roles in many diseases including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthews correlation coefficient of 0.34 and 0.29, respectively, for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 genome project. SPRINT-CBH is available as an online server at erver/SPRINT-CBH.
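For readers unfamiliar with the Matthews correlation coefficient quoted in the abstract above, the following is a minimal sketch of how it is computed from confusion-matrix counts. The function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator vanishes (a common convention,
    assumed here) so that degenerate predictions do not raise.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for an imbalanced binding/nonbinding residue set.
mcc = matthews_corrcoef(tp=30, tn=900, fp=40, fn=30)
print(round(mcc, 3))
```

Unlike plain accuracy, this measure stays informative when nonbinding residues greatly outnumber binding ones, which is presumably why it is reported for the unbalanced test.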
Publisher: Wiley
Date: 23-01-2008
DOI: 10.1002/PROT.21940
Abstract: The backbone structure of a protein is largely determined by the phi and psi torsion angles. Thus, knowing these angles, even approximately, will be very useful for protein-structure prediction. However, in a previous work, a sequence-based, real-value prediction of the psi angle could only achieve a mean absolute error of 54 degrees (83 degrees, 35 degrees, 33 degrees for coil, strand, and helix residues, respectively) between predicted and actual angles. Moreover, a real-value prediction of the phi angle is not yet available. This article employs a neural-network-based approach to improve psi prediction by taking advantage of angle periodicity and applies the new method to the prediction of phi angles. The 10-fold-cross-validated mean absolute error for the new method is 38 degrees (58 degrees, 33 degrees, 22 degrees for coil, strand, and helix, respectively) for psi and 25 degrees (35 degrees, 22 degrees, 16 degrees for coil, strand, and helix, respectively) for phi. The real-value prediction is comparable to or more accurate than predictions based on multistate classification of the phi-psi map. More accurate prediction of real-value angles will likely be useful for improving the accuracy of fold recognition and ab initio protein-structure prediction. The Real-SPINE 2.0 server is available on the website sparks.informatics.iupui.edu.
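Because torsion angles are periodic, an error between a predicted and an actual angle should never exceed 180 degrees. The sketch below shows one way to compute a periodicity-aware mean absolute error. It illustrates only the error measure, under the assumption that angles are given in degrees; the abstract does not describe how the network itself exploits periodicity, and the function names are hypothetical.

```python
def angular_abs_error(pred_deg: float, true_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def mean_absolute_angle_error(pred, true) -> float:
    """Mean of the periodic absolute errors over paired angle lists."""
    if len(pred) != len(true) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(angular_abs_error(p, t) for p, t in zip(pred, true)) / len(pred)


# 179 and -179 degrees are only 2 degrees apart on the circle, not 358.
print(mean_absolute_angle_error([179.0, 10.0], [-179.0, 350.0]))
```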
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Date: 10-2015
Publisher: Informa UK Limited
Date: 20-12-1995
Publisher: Informa UK Limited
Date: 26-07-2019
Publisher: American Chemical Society (ACS)
Date: 26-04-2003
DOI: 10.1021/JA029855X
Abstract: More than 22 000 folding kinetic simulations were performed to study the temperature dependence of the distribution of first passage time (FPT) for the folding of an all-atom Gō-like model of the second beta-hairpin fragment of protein G. We find that the mean FPT (MFPT) for folding has a U (or V)-shaped dependence on the temperature with a minimum at a characteristic optimal folding temperature T(opt). The optimal folding temperature T(opt) is located between the thermodynamic folding transition temperature and the solidification temperature based on the Lindemann criterion for the solid. Both the T(opt) and the MFPT decrease when the energy bias gap against nonnative contacts increases. The high-order moments are nearly constant when the temperature is higher than T(opt) and start to diverge when the temperature is lower than T(opt). The distribution of FPT is close to a log-normal-like distribution at T ≥ T(opt). At even lower temperatures, the distribution starts to develop long power-law-like tails, indicating the non-self-averaging intermittent behavior of the folding dynamics. It is demonstrated that the distribution of FPT can also be calculated reliably from the derivative of the fraction not folded (or fraction folded), a measurable quantity by routine ensemble-averaged experimental techniques at dilute protein concentrations.
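The final sentence of the abstract above relies on the relation f(t) = -dS/dt, where S(t) is the fraction of trajectories not yet folded at time t and f(t) is the FPT density. The sketch below estimates f(t) by forward differences on a time grid. The grid, the toy first passage times, and the function names are assumptions made for illustration and do not come from the paper.

```python
def fraction_not_folded(fpts, t: float) -> float:
    """Fraction of trajectories whose first passage time exceeds t."""
    return sum(1 for x in fpts if x > t) / len(fpts)


def fpt_density_from_survival(fpts, t_grid):
    """Estimate the FPT density f(t) = -dS/dt by forward differences.

    Returns one value per grid interval [t_grid[i], t_grid[i + 1]].
    """
    s = [fraction_not_folded(fpts, t) for t in t_grid]
    return [
        -(s[i + 1] - s[i]) / (t_grid[i + 1] - t_grid[i])
        for i in range(len(t_grid) - 1)
    ]


# Toy first passage times (arbitrary units) and a uniform grid.
toy_fpts = [1.0, 2.0, 3.0, 4.0]
grid = [0.5, 1.5, 2.5, 3.5, 4.5]
print(fpt_density_from_survival(toy_fpts, grid))
```

With real data, the same derivative could be applied to an experimentally measured fraction-not-folded curve instead of simulated passage times.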
Publisher: Oxford University Press (OUP)
Date: 28-07-2006
DOI: 10.1093/NAR/GKL454
Publisher: American Chemical Society (ACS)
Date: 26-05-2021
Publisher: Oxford University Press (OUP)
Date: 15-03-2018
DOI: 10.1093/NAR/GKY192
Publisher: Wiley
Date: 02-2004
DOI: 10.1110/PS.03348304
Publisher: Wiley
Date: 05-2012
DOI: 10.1002/PRO.2066
Publisher: Oxford University Press (OUP)
Date: 26-09-2018
DOI: 10.1093/BIOINFORMATICS/BTX614
Abstract: Protein–peptide interactions are one of the most important biological interactions and play a crucial role in many diseases including cancer. Therefore, knowledge of these interactions provides invaluable insights into all cellular processes, functional mechanisms, and drug discovery. Protein–peptide interactions can be analyzed by studying the structures of protein–peptide complexes. However, only a small portion has known complex structures and experimental determination of protein–peptide interaction is costly and inefficient. Thus, predicting peptide-binding sites computationally will be useful to improve efficiency and cost effectiveness of experimental studies. Here, we established a machine learning method called SPRINT-Str (Structure-based prediction of protein–Peptide Residue-level Interaction) to use structural information for predicting protein–peptide binding residues. These predicted binding residues are then employed to infer the peptide-binding site by a clustering algorithm. SPRINT-Str achieves robust and consistent results for prediction of protein–peptide binding regions in terms of residues and sites. Matthews correlation coefficients (MCC) for 10-fold cross validation and the independent test set are 0.27 and 0.293, respectively, and the corresponding areas under the curve are 0.775 and 0.782. The prediction outperforms other state-of-the-art methods, including our previously developed sequence-based method. A further spatial neighbor clustering of predicted binding residues leads to prediction of binding sites at 20–116% higher coverage than the next best method at all precision levels in the test set. The application of SPRINT-Str to protein binding with DNA, RNA and carbohydrate confirms the method's capability of separating peptide-binding sites from other functional sites.
More importantly, similar performance in prediction of binding residues and sites is obtained when experimentally determined structures are replaced by unbound structures or quality model structures built from homologs, indicating its wide applicability. erver/SPRINT-Str Supplementary data are available at Bioinformatics online.
Publisher: Wiley
Date: 2005
DOI: 10.1002/PROT.20732
Abstract: Two single-method servers, SPARKS 2 and SP3, participated in automatic-server predictions in CASP6. The overall results for all as well as detailed performance in comparative modeling targets are presented. It is shown that both SPARKS 2 and SP3 are able to recognize their corresponding best templates for all easy comparative modeling targets. The alignment accuracy, however, is not always the best among all the servers. Possible factors are discussed. SPARKS 2 and SP3 fold recognition servers, as well as their executables, are freely available for all academic users on theory.med.buffalo.edu.
Publisher: No publisher found
Date: 2013
Publisher: Bentham Science Publishers Ltd.
Date: 09-2011
Publisher: Elsevier BV
Date: 12-2019
Publisher: Public Library of Science (PLoS)
Date: 27-06-2012
Publisher: Springer Science and Business Media LLC
Date: 09-2011
Publisher: Springer Science and Business Media LLC
Date: 22-01-2016
Publisher: Oxford University Press (OUP)
Date: 10-10-2016
DOI: 10.1093/BIOINFORMATICS/BTV580
Abstract: Motivation: The three-dimensional tertiary structure of a protein at near-atomic-level resolution provides insight into its function and evolution. As protein structure decides its functionality, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classification of protein function. Given the rapidly growing rate of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. Results: The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. The benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean-squared distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. Availability and implementation: SPalignNS was implemented in C++. The source code, binary executable, and a web server version are freely available at: sparks-lab.org Contact: yaoqi.zhou@griffith.edu.au
Publisher: Springer Science and Business Media LLC
Date: 02-2018
Publisher: Wiley
Date: 10-2009
DOI: 10.1002/PROT.22252
Publisher: Queensland University of Technology
Date: 26-11-2020
DOI: 10.5204/IJCJSD.1691
Abstract: This article explains the way that Australian coroners’ courts often fail Aboriginal and Torres Strait Islander peoples. We discuss the gap between the expectations of families of the deceased and the realities of the process of the coroner’s court. The discussion is illustrated with reference to real-life examples, drawn from the authors’ experiences representing the families of the deceased.
Publisher: Wiley
Date: 12-09-2014
DOI: 10.1002/JCC.23718
Abstract: Because of a nearly constant distance between two neighbouring Cα atoms, local backbone structure of proteins can be represented accurately by the angle between C(αi-1)-C(αi)-C(αi+1) (θ) and a dihedral angle rotated about the C(αi)-C(αi+1) bond (τ). θ and τ angles, as representatives of the structural properties of three to four amino-acid residues, offer a description of backbone conformations that is complementary to φ and ψ angles (single residue) and secondary structures (>3 residues). Here, we report the first machine-learning technique for sequence-based prediction of θ and τ angles. Predicted angles based on an independent test have a mean absolute error of 9° for θ and 34° for τ with a distribution on the θ-τ plane close to that of native values. The average root-mean-square distance of 10-residue fragment structures constructed from predicted θ and τ angles is only 1.9 Å from their corresponding native structures. Predicted θ and τ angles are expected to be complementary to predicted φ and ψ angles and secondary structures for use in model validation and template-based as well as template-free structure prediction. The deep neural network learning technique is available as an on-line server called Structural Property prediction with Integrated DEep neuRal network (SPIDER) at sparks-lab.org.
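The two Cα-based angles defined in the abstract above can be computed directly from coordinates. The sketch below is a plain-Python illustration: `theta` is the angle at the middle of three consecutive Cα atoms, and `tau` is the dihedral defined by four consecutive Cα atoms about the bond between the second and third. The sign convention (cis = 0°, values between -180° and 180°) and the function names are assumptions, and the coordinates are toy points rather than real protein data.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _norm(a):
    return math.sqrt(_dot(a, a))


def theta(p0, p1, p2) -> float:
    """Angle in degrees at p1 formed by the points p0-p1-p2."""
    v1, v2 = _sub(p0, p1), _sub(p2, p1)
    cos_angle = _dot(v1, v2) / (_norm(v1) * _norm(v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def tau(p0, p1, p2, p3) -> float:
    """Dihedral angle in degrees about the p1-p2 bond."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = _norm(b1) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy coordinates: a right angle followed by a quarter-turn twist.
ca = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)]
print(theta(ca[0], ca[1], ca[2]), tau(*ca))
```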
Publisher: Oxford University Press (OUP)
Date: 02-2022
DOI: 10.1093/BIOINFORMATICS/BTAC053
Abstract: Accurate prediction of protein contact-map is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist for many proteins due to lack of naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor utilizing the output of a pre-trained language model ESM-1b as an input along with a large training set and an ensemble of residual neural networks. We showed that the proposed method makes a significant improvement over a single-sequence-based predictor SSCpred with 15% improvement in the F1-score for the independent CASP14-FM test set. It also outperforms evolutionary-profile-based methods trRosetta and SPOT-Contact with 48.7% and 48.5% respective improvement in the F1-score on the proteins without homologs (Neff = 1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction. Stand-alone-version of SPOT-Contact-LM is available at as-preet/SPOT-Contact-Single. Direct prediction can also be made at erver/spot-contact-single. The datasets used in this research can also be downloaded from the GitHub. Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 22-08-2016
DOI: 10.1093/BIOINFORMATICS/BTW549
Abstract: Motivation: Predictions of backbone structures and solvent accessible surface area of proteins benefit from continuous real values because this removes the arbitrariness of defining boundaries between different secondary-structure and solvent-accessibility states. However, the lack of a confidence score for predicted values has limited their applications. Here we investigated whether or not we can make a reasonable prediction of absolute errors for predicted backbone torsion angles, Cα-atom-based angles and torsion angles, solvent accessibility, contact numbers and half-sphere exposures by employing deep neural networks. Results: We found that angle-based errors can be predicted most accurately, with a Spearman correlation coefficient (SPC) between predicted and actual errors of about 0.6. This is followed by solvent accessibility (SPC∼0.5). The errors on contact-based structural properties are most difficult to predict (SPC between 0.2 and 0.3). We showed that predicted errors are significantly better error indicators than the average errors based on secondary-structure and amino-acid residue types. We further demonstrated the usefulness of predicted errors in model quality assessment. These error or confidence indicators are expected to be useful for prediction, assessment, and refinement of protein structures. Availability and Implementation: The method is available at sparks-lab.org as a part of SPIDER2 package. Contact: yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher: Elsevier BV
Date: 06-2004
Publisher: Elsevier BV
Date: 05-2011
Publisher: Cold Spring Harbor Laboratory
Date: 20-06-2021
DOI: 10.1101/2021.06.19.449089
Abstract: Accurate prediction of protein contact-map is essential for accurate protein structure and function prediction. As a result, many methods have been developed for protein contact map prediction. However, most methods rely on protein-sequence-evolutionary information, which may not exist for many proteins due to lack of naturally occurring homologous sequences. Moreover, generating evolutionary profiles is computationally intensive. Here, we developed a contact-map predictor utilizing the output of a pre-trained language model ESM-1b as an input along with a large training set and an ensemble of residual neural networks. We showed that the proposed method makes a significant improvement over a single-sequence-based predictor SSCpred with 15% improvement in the F1-score for the independent CASP14-FM test set. It also outperforms evolutionary-profile-based methods TrRosetta and SPOT-Contact with 48.7% and 48.5% respective improvement in the F1-score on the proteins without homologs (Neff=1) in the independent SPOT-2018 set. The new method provides a much faster and reasonably accurate alternative to evolution-based methods, useful for large-scale prediction. Stand-alone-version of SPOT-Contact-Single is available at as-preet/SPOT-Contact-Single. Direct prediction can also be made at erver/spot-contact-single. The datasets used in this research can also be downloaded from the GitHub. jaspreetsingh2@griffithuni.edu.au, k.paliwal@griffith.edu.au, and zhouyq@szbl.ac.cn Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 08-09-2021
DOI: 10.1093/BIOINFORMATICS/BTAB643
Abstract: Protein–protein interactions (PPI) play crucial roles in many biological processes, and identifying PPI sites is an important step for mechanistic understanding of diseases and design of novel drugs. Since experimental approaches for PPI site identification are expensive and time-consuming, many computational methods have been developed as screening tools. However, these methods are mostly based on features of neighboring residues in sequence, and are thus limited in capturing spatial information. We propose a deep graph-based framework, deep Graph convolutional network for Protein–Protein-Interacting Site prediction (GraphPPIS), in which the PPI site prediction problem is converted into a graph node classification task and solved by deep learning using the initial residual and identity mapping techniques. We showed that a deeper architecture (up to eight layers) allows significant performance improvement over other sequence-based and structure-based methods by more than 12.5% and 10.5% on AUPRC and MCC, respectively. Further analyses indicated that the predicted interacting sites by GraphPPIS are more spatially clustered and closer to the native ones even when false-positive predictions are made. The results highlight the importance of capturing spatially neighboring residues for interacting site prediction. The datasets, the pre-computed features, and the source codes along with the pre-trained models of GraphPPIS are available at iomed-AI/GraphPPIS. The GraphPPIS web server is freely available at biomed.nscc-gz.cn/apps/GraphPPIS. Supplementary data are available at Bioinformatics online.
Publisher: Cold Spring Harbor Laboratory
Date: 16-10-2021
DOI: 10.1101/2021.10.16.464622
Abstract: Protein language models have emerged as an alternative to multiple sequence alignment for enriching sequence information and improving downstream prediction tasks such as biophysical, structural, and functional properties. Here we show that a combination of traditional one-hot encoding with the embeddings from two different language models (ProtTrans and ESM-1b) allows a leap in accuracy over single-sequence based techniques in predicting protein 1D secondary and tertiary structural properties, including backbone torsion angles, solvent accessibility and contact numbers. This large improvement leads to an accuracy comparable to or better than the current state-of-the-art techniques for predicting these 1D structural properties based on sequence profiles generated from multiple sequence alignments. The high-accuracy prediction in both secondary and tertiary structural properties indicates that it is possible to make highly accurate prediction of protein structures without homologous sequences, the remaining obstacle in the post AlphaFold2 era.
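The input combination described above, one-hot encoding concatenated with per-residue language-model embeddings, can be sketched as follows. The embeddings here are tiny hypothetical placeholders; the actual models' output dimensions, the handling of non-standard residues, and the function names are all assumptions rather than details taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence: str):
    """One 20-dimensional indicator vector per residue.

    Residues outside the standard 20 map to an all-zero vector
    (an assumption made for this sketch).
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vectors = []
    for aa in sequence:
        vec = [0.0] * len(AMINO_ACIDS)
        if aa in index:
            vec[index[aa]] = 1.0
        vectors.append(vec)
    return vectors


def combine_features(sequence: str, *embeddings):
    """Concatenate one-hot vectors with each embedding, residue by residue."""
    features = one_hot(sequence)
    for emb in embeddings:
        if len(emb) != len(sequence):
            raise ValueError("each embedding needs one vector per residue")
        features = [f + list(e) for f, e in zip(features, emb)]
    return features


# Hypothetical 2-dimensional embeddings standing in for two language models.
emb_a = [[0.1, 0.2], [0.3, 0.4]]
emb_b = [[0.5, 0.6], [0.7, 0.8]]
combined = combine_features("AC", emb_a, emb_b)
print(len(combined), len(combined[0]))
```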
Publisher: Mary Ann Liebert Inc
Date: 06-2020
Abstract: Noncoding RNAs are increasingly found to play a wide variety of roles in living organisms. Yet, their functional mechanisms are poorly understood because their structures are difficult to determine experimentally. As a result, developing more effective computational techniques to predict RNA structures becomes an increasingly urgent task. One key challenge in RNA structure prediction is the lack of an accurate free energy function to guide RNA folding and discriminate native and near-native structures from decoy conformations. In this study, we developed an all-atom distance-dependent knowledge-based energy function for RNA that is based on a reference state (distance-scaled finite ideal-gas reference state, DFIRE) proven successful for protein structure discrimination. Using four separate benchmarks including RNA puzzles, we found that this DFIRE-based RNA statistical energy function is able to discriminate native and near-native structures against decoys with performance comparable with or better than several existing scoring functions. The energy function is expected to be useful for improving the detection of RNA near-native structures.
Publisher: Cold Spring Harbor Laboratory
Date: 16-03-2023
DOI: 10.1101/2023.03.15.532863
Abstract: Compared to proteins, DNA and RNA are more difficult languages to interpret because 4-letter-coded DNA/RNA sequences have less information content than 20-letter-coded protein sequences. While BERT (Bidirectional Encoder Representations from Transformers)-like language models have been developed for RNA, they are ineffective at capturing the evolutionary information from homologous sequences because unlike proteins, RNA sequences are less conserved. Here, we have developed an unsupervised Multiple sequence-alignment-based RNA language model (RNA-MSM) by utilizing homologous sequences from an automatic pipeline, RNAcmap. The resulting unsupervised, two-dimensional attention maps and one-dimensional embeddings from RNA-MSM can be directly mapped with high accuracy to 2D base pairing probabilities and 1D solvent accessibilities, respectively. Further fine-tuning led to significantly improved performance on these two downstream tasks over existing state-of-the-art techniques. We anticipate that the pre-trained RNA-MSM model can be fine-tuned on many other tasks related to RNA structure and function.
Publisher: Wiley
Date: 07-02-2008
DOI: 10.1002/PROT.21968
Abstract: Proteins fold into unique three-dimensional structures by specific, orientation-dependent interactions between amino acid residues. Here, we extract orientation-dependent interactions from protein structures by treating each polar atom as a dipole with a direction. The resulting statistical energy function successfully refolds 13 out of 16 fully unfolded secondary-structure terminal regions of 10-23 amino acid residues in 15 small proteins. Dissecting the orientation-dependent energy function reveals that the orientation preference between hydrogen-bonded atoms is not enough to account for the structural specificity of proteins. The result has significant implications on the theoretical and experimental searches for specific interactions involved in protein folding and molecular recognition between proteins and other biologically active molecules.
Publisher: Oxford University Press (OUP)
Date: 05-12-2017
DOI: 10.1093/BIOINFORMATICS/BTW668
Abstract: The quality of the fragment library determines the efficiency of fragment assembly, an approach that is widely used in most de novo protein-structure prediction algorithms. Conventional fragment libraries are constructed mainly based on the identities of amino acids, sometimes facilitated by predicted information including dihedral angles and secondary structures. However, it remains challenging to identify near-native fragment structures with low sequence homology. We introduce a novel fragment-library-construction algorithm, LRFragLib, to improve the detection of near-native low-homology fragments of 7–10 residues, using a multi-stage, flexible selection protocol. Based on logistic regression scoring models, LRFragLib outperforms existing techniques by achieving a significantly higher precision and a comparable coverage on recent CASP protein sets in sampling near-native structures. The method also has a comparable computational efficiency to the fastest existing techniques with substantially reduced memory usage. The source code is available for download at 166.111.152.91/Downloads.html Supplementary data are available at Bioinformatics online.
Publisher: MDPI AG
Date: 16-03-2018
DOI: 10.3390/IJMS19030885
Publisher: SAGE Publications
Date: 10-09-2020
Abstract: Between 2013 and 2019, an estimated 200 children seeking asylum in Australia were detained on the island of Nauru. In 2018, 15 of these children developed the rare and life-threatening pervasive refusal syndrome (PRS). This paper describes the PRS case cluster, the complexities faced by clinicians managing these cases, and the lessons that can be learned from this outbreak. The emergence of PRS on Nauru highlighted the risks of long-term detention of children in settings that are unable to meet their physical and psycho-social needs. The case cluster also underscored (a) the difficulties faced by doctors working in conditions where their medical and legal obligations may be in direct conflict, and (b) the role of clinicians in patient advocacy.
Publisher: Elsevier BV
Date: 11-1992
Publisher: Public Library of Science (PLoS)
Date: 04-06-2008
Publisher: Wiley
Date: 13-02-2007
DOI: 10.1002/PROT.21298
Abstract: An integrated system of neural networks, called SPINE, is established and optimized for predicting structural properties of proteins. SPINE is applied to three-state secondary-structure and residue-solvent-accessibility (RSA) prediction in this paper. The integrated neural networks are carefully trained with a large dataset of 2640 chains, sequence profiles generated from multiple sequence alignment, representative amino acid properties, a slow learning rate, overfitting protection, and an optimized sliding-window size. More than 200,000 weights in SPINE are optimized by maximizing the accuracy measured by Q(3) (the percentage of correctly classified residues). SPINE yields a 10-fold cross-validated accuracy of 79.5% (80.0% for chains of length between 50 and 300) in secondary-structure prediction after one-month (CPU time) training on 22 processors. An accuracy of 87.5% is achieved for exposed residues (RSA >95%). The latter approaches the theoretical upper limit of 88-90% accuracy in assigning secondary structures. An accuracy of 73% for three-state solvent-accessibility prediction (25%/75% cutoff) and 79.3% for two-state prediction (25% cutoff) is also obtained.
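The Q(3) measure named above is simply the percentage of residues whose predicted three-state label matches the assigned one. A minimal sketch, assuming per-residue labels are given as strings over H (helix), E (strand), and C (coil); the function name and the example strings are hypothetical.

```python
def q3_accuracy(predicted: str, assigned: str) -> float:
    """Percentage of residues whose predicted state equals the assigned state."""
    if len(predicted) != len(assigned) or not assigned:
        raise ValueError("label strings must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, assigned))
    return 100.0 * matches / len(assigned)


# Toy example: one of eight residues is misclassified.
print(q3_accuracy("HHHHEECC", "HHHHEEEC"))
```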
Publisher: Oxford University Press (OUP)
Date: 05-09-2019
DOI: 10.1093/BIOINFORMATICS/BTZ691
Abstract: Protein intrinsic disorder describes the tendency of sequence residues to not fold into a rigid three-dimensional shape by themselves. However, some of these disordered regions can transition from disorder to order when interacting with another molecule in segments known as molecular recognition features (MoRFs). Previous analysis has shown that these MoRF regions are indirectly encoded within the prediction of residue disorder as low-confidence predictions [i.e. in a semi-disordered state P(D)≈0.5]. Thus, what has been learned for disorder prediction may be transferable to MoRF prediction. Transferring the internal characterization of protein disorder for the prediction of MoRF residues would allow us to take advantage of the large training set available for disorder prediction, enabling the training of larger analytical models than is currently feasible on the small number of currently available annotated MoRF proteins. In this paper, we propose a new method for MoRF prediction by transfer learning from the SPOT-Disorder2 ensemble models built for disorder prediction. We confirm that directly training on the MoRF set with a randomly initialized model yields substantially poorer performance on independent test sets than by using the transfer-learning-based method SPOT-MoRF, for both deep and simple networks. Its comparison to current state-of-the-art techniques reveals its superior performance in identifying MoRF binding regions in proteins across two independent testing sets, including our new dataset of & protein chains. These test chains share & % sequence similarity to all training and validation proteins used in SPOT-Disorder2 and SPOT-MoRF, and provide a much-needed large-scale update on the performance of current MoRF predictors. The method is expected to be useful in locating functional disordered regions in proteins. SPOT-MoRF and its data are available as a web server and as a standalone program at: ack/server/SPOT-MoRF/index.php. 
Supplementary data are available at Bioinformatics online.
Publisher: Elsevier BV
Date: 12-2013
Publisher: Wiley
Date: 09-03-2009
DOI: 10.1002/PROT.22384
Publisher: Informa UK Limited
Date: 20-12-1989
Publisher: Elsevier BV
Date: 2004
Publisher: Wiley
Date: 07-2003
DOI: 10.1110/PS.0305103
Abstract: Helices in membrane-spanning regions are more tightly packed than the helices in soluble proteins. Thus, we introduce a method that uses a simple scale of burial propensity and a new algorithm to predict transmembrane helical (TMH) segments and a positive-inside rule to predict amino-terminal orientation. The method (the topology predictor of transmembrane helical proteins using mean burial propensity [THUMBUP]) correctly predicted the topology of 55 of 73 proteins (or 75%) with known three-dimensional structures (the 3D helix database). This level of accuracy can be reached by MEMSAT 1.8 (a 200-parameter model-recognition method) and a new HMM-based method (a 111-parameter hidden Markov model, UMDHMM(TMHP)) if they were retrained with the 73-protein database. Thus, a method based on a physicochemical property can provide topology prediction as accurate as those methods based on more complicated statistical models and learning algorithms for the proteins with accurately known structures. Commonly used HMM-based methods and MEMSAT 1.8 were trained with a combination of the partial 3D helix database and a 1D helix database of TMH proteins in which topology information was obtained by gene fusion and other experimental techniques. These methods provide a significantly poorer prediction for the topology of TMH proteins in the 3D helix database. This suggests that the 1D helix database, because of its inaccuracy, should be avoided as either a training or testing database. A Web server of THUMBUP and UMDHMM(TMHP) is established for academic users at hys_bio/service.htm. The 3D helix database is also available from the same Web site.
Publisher: MDPI AG
Date: 13-04-2018
DOI: 10.3390/IJMS19041186
Publisher: Wiley
Date: 25-09-2014
DOI: 10.1002/PROT.24682
Publisher: No publisher found
Date: 2013
Publisher: Wiley
Date: 22-11-2013
DOI: 10.1002/PROT.24441
Publisher: Oxford University Press (OUP)
Date: 18-08-2021
DOI: 10.1093/BIOINFORMATICS/BTAB598
Abstract: Despite many successes, de novo protein design is not yet a solved problem as its success rate remains low. The low success rate is largely because we do not yet have an accurate energy function for describing the solvent-mediated interaction between amino acid residues in a protein chain. Previous studies showed that an energy function based on series expansions with its parameters optimized for side-chain and loop conformations can lead to one of the most accurate methods for side chain (OSCAR) and loop prediction (LEAP). Following the same strategy, we developed an energy function based on series expansions with the parameters optimized in four separate stages (recovering single-residue types without and with orientation dependence, selecting loop decoys and maintaining the composition of amino acids). We tested the energy function for de novo design by using Monte Carlo simulated annealing. The method for protein design (OSCAR-Design) is found to be as accurate as OSCAR and LEAP for side-chain and loop prediction, respectively. In de novo design, it can recover native residue types ranging from 38% to 43% depending on test sets, conserve hydrophobic/hydrophilic residues at ∼75%, and yield the overall similarity in amino acid compositions at more than 90%. These performance measures are all statistically significantly better than several protein design programs compared. Moreover, the largest hydrophobic patch areas in designed proteins are near or smaller than those in native proteins. Thus, an energy function based on series expansion can be made useful for protein design. The Linux executable version is freely available for academic users at zhouyq-lab.szbl.ac.cn/resources/.
Publisher: Wiley
Date: 27-09-2019
Publisher: American Chemical Society (ACS)
Date: 12-12-2022
Publisher: Wiley
Date: 15-09-2014
DOI: 10.1002/JCC.23730
Abstract: Carbohydrate-binding proteins (CBPs) are potential biomarkers and drug targets. However, the interactions between carbohydrates and proteins are challenging to study experimentally and computationally because of their low binding affinity, high flexibility, and the lack of a linear sequence in carbohydrates as exists in RNA, DNA, and proteins. Here, we describe a structure-based function-prediction technique called SPOT-Struc that identifies carbohydrate-recognizing proteins and their binding amino acid residues by structural alignment program SPalign and binding affinity scoring according to a knowledge-based statistical potential based on the distance-scaled finite-ideal gas reference state (DFIRE). The leave-one-out cross-validation of the method on 113 carbohydrate-binding domains and 3442 noncarbohydrate binding proteins yields a Matthews correlation coefficient of 0.56 for SPalign alone and 0.63 for SPOT-Struc (SPalign + binding affinity scoring) for CBP prediction. SPOT-Struc is a technique with high positive predictive value (79% correct predictions in all positive CBP predictions) with a reasonable sensitivity (52% positive predictions in all CBPs). The sensitivity of the method was changed slightly when applied to 31 APO (unbound) structures found in the protein databank (14/31 for APO versus 15/31 for HOLO). The result of SPOT-Struc will not change significantly if highly homologous templates were used. SPOT-Struc predicted 19 out of 2076 structural genome targets as CBPs. In particular, one uncharacterized protein in Bacillus subtilis (1oq1A) was matched to galectin-9 from Mus musculus. Thus, SPOT-Struc is useful for uncovering novel carbohydrate-binding proteins. SPOT-Struc is available at sparks-lab.org.
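The three evaluation measures quoted in this abstract (Matthews correlation coefficient, positive predictive value and sensitivity) all follow from a binary confusion matrix. The sketch below is a minimal illustration under that assumption. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the SPOT-Struc paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return MCC, positive predictive value and sensitivity for a binary classifier.

    tp, fp, tn, fn are counts of true positives, false positives,
    true negatives and false negatives (hypothetical inputs).
    """
    # Positive predictive value: fraction of positive predictions that are correct.
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # Sensitivity: fraction of actual positives that are predicted positive.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"mcc": mcc, "ppv": ppv, "sensitivity": sensitivity}


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(binary_metrics(tp=50, fp=10, tn=30, fn=10))
```

Because MCC uses all four cells of the confusion matrix, it stays informative when negatives greatly outnumber positives, as with 3442 non-binding proteins against 113 binding domains.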
Publisher: Elsevier BV
Date: 2002
Publisher: AIP Publishing
Date: 05-1990
DOI: 10.1063/1.458486
Abstract: A formally exact nonlocal density-functional expansion procedure for direct correlation functions developed earlier by Stell for a homogeneous system, and extended by Blum and Stell, Sullivan and Stell, and ourselves to various inhomogeneous systems, is used here to derive nonlocal integral-equation approximations. Two of the simplest of these approximations (zeroth order), which we shall characterize here as the hydrostatic Percus–Yevick (HPY) approximation and the hydrostatic hypernetted-chain (HHNC) approximation, respectively, are shown to be capable of accounting for wetting transitions on the basis of general theoretical considerations. Before turning to such transitions, we investigate in this first paper of a series the case of homogeneous hard-sphere fluids and hard spheres near a hard wall as well as the case of hard spheres inside a slit pore. Numerical results show that the HHNC approximation is better than the HNC approximation for both the homogeneous and inhomogeneous systems considered here while the HPY approximation appears to overcorrect the PY approximation.
Publisher: AIP Publishing
Date: 10-06-2008
DOI: 10.1063/1.2936832
Abstract: This paper examines the folding mechanism of an individual β-hairpin in the presence of other hairpins by using an off-lattice model of a small triple-stranded antiparallel β-sheet protein, Pin1 WW domain. The turn zipper model and the hydrophobic collapse model originally developed for a single β-hairpin in the literature are confirmed to be useful in describing β-hairpins in the model Pin1 WW domain. We find that the mechanism for folding a specific hairpin is independent of whether it folds first or second, but the formation process is significantly dependent on temperature. More specifically, the β1-β2 hairpin folds via the turn zipper model at a low temperature and the hydrophobic collapse model at a high temperature, while the folding of the β2-β3 hairpin follows the turn zipper model at both temperatures. The change in folding mechanisms is interpreted by the interplay between contact stability (enthalpy) and loop lengths (entropy), the effect of which is temperature dependent.
Publisher: Hindawi Limited
Date: 09-2019
DOI: 10.1002/HUMU.23875
Publisher: American Chemical Society (ACS)
Date: 30-01-2019
Abstract: Peptide-binding domains have been successfully targeted in therapeutic applications. However, many peptide-binding proteins (PBPs) remain uncharacterized. Computational prediction of peptide-domain interfaces is challenging due to short lengths, lack of well-defined structures, and limited conservation of peptide motifs. Here we present SPOT-peptide, a template-based protocol for the simultaneous prediction of peptide-binding domains and peptide binding sites independent of specific peptide composition. SPOT-peptide leverages the dogmatic relationship between protein structure and function to predict peptide-binding characteristics for an unknown target based on remote structural homologues. In a leave-homologue out benchmark evaluation, PBPs are discriminated with a Matthews correlation coefficient (MCC) of 0.420 and the correct binding sites are identified in 80% of the predicted PBPs. Furthermore, replacing the holo target structures with equivalent structures in the apo conformation only marginally diminished PBP recovery. The method is available as a web server at om/SPOT-peptide .
Publisher: Wiley
Date: 02-2004
DOI: 10.1110/PS.03411904
Publisher: No publisher found
Date: 2013
Publisher: Wiley
Date: 13-04-2016
DOI: 10.1002/JCC.24380
Abstract: Structure-based virtual screening usually involves docking of a library of chemical compounds onto the functional pocket of the target receptor so as to discover novel classes of ligands. However, the overall success rate remains low and screening a large library is computationally intensive. An alternative to this "ab initio" approach is virtual screening by binding homology search. In this approach, potential ligands are predicted based on similar interaction pairs (similarity in receptors and ligands). SPOT-Ligand is an approach that integrates ligand similarity by Tanimoto coefficient and receptor similarity by protein structure alignment program SPalign. The method was found to yield a consistent performance in DUD and DUD-E docking benchmarks even if model structures were employed. It improves over docking methods (DOCK6 and AUTODOCK Vina) and has a performance comparable to or better than other binding-homology methods (FINDsite and PoLi) with higher computational efficiency. The server is available at sparks-lab.org. © 2016 Wiley Periodicals, Inc.
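The ligand-similarity half of this approach rests on the Tanimoto coefficient, which for two binary fingerprints is the number of shared "on" bits divided by the number of bits set in either. The following is a minimal sketch of that formula only. Representing a fingerprint as a Python set of bit indices and the function name `tanimoto` are assumptions made for illustration, not details of SPOT-Ligand.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        # Convention chosen here: two empty fingerprints have similarity 0.
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical fingerprints sharing two of four distinct bits.
    print(tanimoto({1, 2, 3}, {2, 3, 4}))  # 0.5
```

The coefficient ranges from 0 (no shared bits) to 1 (identical fingerprints), so it can be combined with a receptor-similarity score on a common scale.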
Publisher: Springer Science and Business Media LLC
Date: 05-04-2017
Publisher: Wiley
Date: 14-05-2018
DOI: 10.1002/JCC.25353
Abstract: Malonylation is a recently discovered post-translational modification (PTM) in which a malonyl group attaches to a lysine (K) amino acid residue of a protein. In this work, a novel machine learning model, SPRINT-Mal, is developed to predict malonylation sites by employing sequence and predicted structural features. Evolutionary information and physicochemical properties are found to be the two most discriminative features whereas a structural feature called half-sphere exposure provides additional improvement to the prediction performance. SPRINT-Mal trained on mouse data yields robust performance for 10-fold cross validation and independent test set with Area Under the Curve (AUC) values of 0.74 and 0.76 and Matthews' Correlation Coefficient (MCC) of 0.213 and 0.20, respectively. Moreover, SPRINT-Mal achieved comparable performance when testing on H. sapiens proteins without species-specific training but not in bacterium S. erythraea. This suggests similar underlying physicochemical mechanisms between mouse and human but not between mouse and bacterium. SPRINT-Mal is freely available as an online server at: erver/SPRINT-Mal/. © 2018 Wiley Periodicals, Inc.
Publisher: Oxford University Press (OUP)
Date: 14-07-2005
DOI: 10.1093/BIOINFORMATICS/BTI582
Abstract: Motivation: Multiple sequence alignment is an essential part of bioinformatics tools for a genome-scale study of genes and their evolution relations. However, making an accurate alignment between remote homologs is challenging. Here, we develop a method, called SPEM, that aligns multiple sequences using pre-processed sequence profiles and predicted secondary structures for pairwise alignment, consistency-based scoring for refinement of the pairwise alignment and a progressive algorithm for final multiple alignment. Results: The alignment accuracy of SPEM is compared with those of established methods such as ClustalW, T-Coffee, MUSCLE, ProbCons and PRALINEPSI in easy (homologs) and hard (remote homologs) benchmarks. Results indicate that the average sum of pairwise alignment scores given by SPEM are 7–15% higher than those of the methods compared in aligning remote homologs (sequence identity & %). Its accuracy for aligning homologs (sequence identity & %) is statistically indistinguishable from those of the state-of-the-art techniques such as ProbCons or MUSCLE 6.0. Availability: The SPEM server and its executables are available on theory.med.buffalo.edu Contact: yqzhou@buffalo.edu
Publisher: Wiley
Date: 07-2005
DOI: 10.1110/PS.041311005
Publisher: Springer Science and Business Media LLC
Date: 11-1988
DOI: 10.1007/BF01133263
Publisher: Wiley
Date: 08-11-2010
DOI: 10.1002/PROT.22842
Publisher: AIP Publishing
Date: 20-06-2008
DOI: 10.1063/1.2937135
Abstract: Reaching the native states of small proteins, a necessary step towards a comprehensive understanding of the folding mechanisms, has remained a tremendous challenge to ab initio protein folding simulations despite extensive effort. In this work, the folding process of the B domain of protein A (BdpA) has been simulated by both conventional and replica exchange molecular dynamics using the AMBER FF03 all-atom force field. Starting from an extended chain, a total of 40 conventional (each to 1.0μs) and two sets of replica exchange (each to 200.0ns per replica) molecular dynamics simulations were performed with different generalized-Born solvation models and temperature control schemes. The improvements in both the force field and solvent model allowed successful simulations of the folding process to the native state as demonstrated by the 0.80Å Cα root mean square deviation (RMSD) of the best folded structure. The most populated conformation was the native folded structure with a high population. This was a significant improvement over the 2.8Å Cα RMSD of the best nativelike structures from previous ab initio folding studies on BdpA. To the best of our knowledge, our results demonstrate, for the first time, that ab initio simulations can reach the native state of BdpA. Consistent with experimental observations, including Φ-value analyses, formation of the helix II/III hairpin was a crucial step that provides a template upon which helix I could form and the folding process could complete. Early formation of helix III was observed, which is consistent with the experimental results of higher residual helical content of isolated helix III among the three helices. The calculated temperature-dependent profile and the melting temperature were in close agreement with the experimental results. The simulations further revealed that phenylalanine 31 may play a critical role in achieving the correct packing of the three helices, which is consistent with the experimental observation.
In addition to the mechanistic studies, an ab initio structure prediction was also conducted based on both the physical energy and a statistical potential. Based on the lowest physical energy, the predicted structure was 2.0Å Cα RMSD away from the experimentally determined structure.
Publisher: Wiley
Date: 07-2002
DOI: 10.1110/PS.0205002
Abstract: Alpha helices, beta strands, and loops are the basic building blocks of protein structure. The folding kinetics of alpha helices and beta strands have been investigated extensively. However, little is known about the formation of loops. Experimental studies show that for some proteins, the formation of a single loop is the rate-determining step for folding, whereas for others, a loop (or turn) can misfold to serve as the hinge loop region for domain-swapped species. Computer simulations of an all-atom model of fragment B of Staphylococcal protein A found that the formation of a single loop initiates the dominant folding pathway. On the other hand, the stability analysis of intermediates suggests that the same loop is a likely candidate to serve as a hinge loop for domain swapping. To interpret the simulation result, we developed a simple structural parameter: the loop contact distance (LCD), or the sequence distance of contacting residues between a loop and the rest of the protein. The parameter is applied to a number of other proteins, including SH3 domains and prion protein. The results suggest that a locally interacting loop (low LCD) can either promote folding or serve as the hinge region for domain swapping. Thus, there is an intimate connection between folding and domain swapping, a possible cause of misfolding and aggregation.
Publisher: Oxford University Press (OUP)
Date: 03-2002
DOI: 10.1093/CHROMSCI/40.3.122
Abstract: A solid-phase microextraction (SPME) sampling method is developed to evaluate indoor exposure to benzene, toluene, ethylbenzene, xylene, and styrene with gas chromatography and flame ionization detection for quantitative analysis. An SPME holder with a 100-µm polydimethylsiloxane (PDMS) and 65-µm PDMS-divinylbenzene fiber coating is tested in different air relative humidity conditions. The method gives good resolution, shows a linear response, is repeatable, and presents high sensitivity. This method is compared with National Institute of Occupational Safety and Health (NIOSH) active sampling.
Publisher: Oxford University Press (OUP)
Date: 22-11-2020
DOI: 10.1093/NAR/GKAA1076
Abstract: Long non-coding RNAs (lncRNAs) play important functional roles in many diverse biological processes. However, not all expressed lncRNAs are functional. Thus, it is necessary to manually collect all experimentally validated functional lncRNAs (EVlncRNA) with their sequences, structures, and functions annotated in a central database. The first release of such a database (EVLncRNAs) was made using the literature prior to 1 May 2016. Since then (till 15 May 2020), 19 245 articles related to lncRNAs have been published. In EVLncRNAs 2.0, these articles were manually examined for a major expansion of the data collected. Specifically, the number of annotated EVlncRNAs, associated diseases, lncRNA-disease associations, and interaction records were increased by 260%, 320%, 484% and 537%, respectively. Moreover, the database has added several new categories: 8 lncRNA structures, 33 exosomal lncRNAs, 188 circular RNAs, and 1079 drug-resistant, chemoresistant, and stress-resistant lncRNAs. All records have been checked against known retracted and fake articles. This release also comes with a highly interactive visual interaction network that helps users track the underlying relations among lncRNAs, miRNAs, proteins, genes and other functional elements. Furthermore, it provides links to four new bioinformatics tools with improved data browsing and searching functionality. EVLncRNAs 2.0 is freely available at www.sdklab-biophysics-dzu.net/EVLncRNAs2/.
Publisher: Public Library of Science (PLoS)
Date: 14-09-2017
Publisher: Informa UK Limited
Date: 10-12-1995
Publisher: Wiley
Date: 28-02-2002
DOI: 10.1002/PROT.10065
Abstract: Predicting the folding mechanism of the second beta-hairpin fragment of the Ig-binding domain B of streptococcal protein G is unexpectedly challenging for simplified reduced models because the models developed so far indicated a different folding mechanism from what was suggested from high-temperature unfolding and equilibrium free-energy surface analysis based on established all-atom empirical force fields in explicit or implicit solvent. This happened despite the use of empirical residue-based interactions, multibody hydrophobic interactions, and inclusions of hydrogen bonding effects in the simplified models. This article employs a recently developed all-atom (except nonpolar hydrogens) model interacting with simple square-well potentials to fold the peptide fragment by molecular dynamics simulation methods. In this study, 193 out of 200 trajectories are folded at two reduced temperatures (3.5 and 3.7) close to the transition temperature T* approximately 4.0. Each simulation takes <7 h of CPU time on a Pentium 800-MHz PC. Folding of the new all-atom model is found to be initiated by collapse before the formation of main-chain hydrogen bonds. This verifies the mechanism proposed from previous all-atom unfolding and equilibrium simulations. The new model further predicts that the collapse is initiated by two nucleation contacts (a hydrophilic contact between D46 and T49 and a hydrophobic contact between Y45 and F52), in agreement with recent NMR measurements. The results suggest that atomic packing and native contact interactions play a dominant role in folding mechanism.
Publisher: Oxford University Press (OUP)
Date: 22-12-2011
DOI: 10.1093/NAR/GKQ1266
Publisher: Elsevier BV
Date: 11-2009
Publisher: Cold Spring Harbor Laboratory
Date: 10-08-2020
DOI: 10.1101/2020.08.08.242636
Abstract: The accuracy of RNA secondary and tertiary structure prediction can be significantly improved by using structural restraints derived from evolutionary or direct coupling analysis. Currently, these coupling analyses rely on manually curated multiple sequence alignments collected in the Rfam database, which contains 3016 families. By comparison, millions of non-coding RNA sequences are known. Here, we established RNAcmap, a fully automatic method that enables evolutionary coupling analysis for any RNA sequence. The homology search was based on the covariance model built by Infernal according to two secondary structure predictors: a folding-based algorithm RNAfold and the latest deep-learning method SPOT-RNA. We show that the performance of RNAcmap is less dependent on the specific evolutionary coupling tool but is more dependent on the accuracy of the secondary structure predictor, with the best performance given by RNAcmap (SPOT-RNA). The performance of RNAcmap (SPOT-RNA) is comparable to that based on Rfam-supplied alignment and consistent for those sequences that are not in Rfam collections. Further improvement can be made with a simple meta predictor RNAcmap (SPOT-RNA/RNAfold) depending on which secondary structure predictor can find more homologous sequences. Reliable base-pairing information generated from RNAcmap, for RNAs with high effective homologous sequences, in particular, will be useful for aiding RNA structure prediction. RNAcmap is available as a web server at erver/rnacmap/ and as a standalone application along with the datasets at parks-lab-org/RNAcmap .
Publisher: Bentham Science Publishers Ltd.
Date: 09-2011
Publisher: Wiley
Date: 10-12-2013
DOI: 10.1002/JCC.23509
Publisher: American Physical Society (APS)
Date: 23-09-1996
Publisher: No publisher found
Date: 2013
Publisher: Wiley
Date: 25-05-2012
DOI: 10.1002/PROT.24100
Publisher: Wiley
Date: 21-11-2018
DOI: 10.1002/JCC.25124
Abstract: Determining the flexibility of structured biomolecules is important for understanding their biological functions. One quantitative measurement of flexibility is the atomic Debye-Waller factor or temperature B-factor. Most existing studies are limited to temperature B-factors of proteins and their prediction. Only one method attempted to predict temperature B-factors of ribosomal RNA. Here, we developed and compared machine-learning techniques in prediction of temperature B-factors of RNAs. The best model based on Support Vector Machines yields Pearson's correlation coefficient at 0.51 for fivefold cross validation and 0.50 for the independent test. Analysis of the performance indicates that the model has the best performance on rRNAs, tRNAs, and protein-bound RNAs, for long chains in particular. The server is available at erver/RNAflex. © 2017 Wiley Periodicals, Inc.
Publisher: Wiley
Date: 02-11-2005
DOI: 10.1002/PROT.20308
Publisher: Oxford University Press (OUP)
Date: 25-06-2022
DOI: 10.1093/BIOINFORMATICS/BTAC421
Abstract: Recently, AlphaFold2 achieved high experimental accuracy for the majority of proteins in Critical Assessment of Structure Prediction (CASP 14). This raises the hope that one day, we may achieve the same feat for RNA structure prediction for those structured RNAs, which is as fundamentally and practically important as protein structure prediction. One major factor in the recent advancement of protein structure prediction is the highly accurate prediction of distance-based contact maps of proteins. Here, we showed that by integrating deep learning with physics-inferred secondary structures, co-evolutionary information and multiple sequence-alignment sampling, we can achieve RNA contact-map prediction at a level of accuracy similar to that in protein contact-map prediction. More importantly, highly accurate prediction for top L long-range contacts can be assured for those RNAs with a high effective number of homologous sequences (Neff & 50). The initial use of the predicted contact map as distance-based restraints confirmed its usefulness in 3D structure prediction. SPOT-RNA-2D is available as a web server at erver/spot-rna-2d/ and as a standalone program at aswindersingh2/SPOT-RNA-2D. Supplementary data are available at Bioinformatics online.
Publisher: Oxford University Press (OUP)
Date: 14-11-2015
DOI: 10.1093/BIOINFORMATICS/BTV665
Abstract: Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability and implementation: The method is available at sparks-lab.org. Contact: yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
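The measured (structure-derived) HSEβ that such a predictor is trained against can be illustrated by counting neighbouring Cα atoms on either side of the plane perpendicular to a residue's Cα-Cβ vector. The sketch below covers that counting step only, not the SPIDER-HSE sequence-based predictor. The 13 Å radius, the function name `half_sphere_exposure` and the toy coordinates are assumptions made for illustration.

```python
import math


def half_sphere_exposure(ca_coords, cb_coords, idx, radius=13.0):
    """Count Cα atoms in the upper and lower half spheres around residue idx.

    ca_coords, cb_coords: lists of (x, y, z) tuples, one per residue.
    The Cα->Cβ vector of residue idx defines the 'up' direction (HSEβ-style).
    radius is an assumed cutoff in Å.
    """
    origin = ca_coords[idx]
    direction = [b - a for a, b in zip(origin, cb_coords[idx])]
    up = down = 0
    for j, other in enumerate(ca_coords):
        if j == idx:
            continue
        diff = [o - a for a, o in zip(origin, other)]
        if math.sqrt(sum(d * d for d in diff)) > radius:
            continue  # outside the sphere, not a contact
        # The sign of the dot product decides which half sphere the neighbour is in.
        if sum(d * v for d, v in zip(diff, direction)) >= 0:
            up += 1
        else:
            down += 1
    return up, down


if __name__ == "__main__":
    # Toy coordinates: one neighbour above, one below, one out of range.
    ca = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (0.0, 0.0, 20.0)]
    cb = [(0.0, 0.0, 1.5)] * 4
    print(half_sphere_exposure(ca, cb, 0))  # (1, 1)
```

The sum of the two counts is the ordinary contact number for the same radius. HSEα would replace the Cα-Cβ direction with one built from neighbouring Cα-Cα vectors.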
Publisher: Wiley
Date: 05-2004
DOI: 10.1110/PS.03587604
Publisher: American Chemical Society (ACS)
Date: 03-0066
DOI: 10.1021/PR050366G
Abstract: Molecular networks in cells are organized into functional modules, where genes in the same module interact densely with each other and participate in the same biological process. Thus, identification of modules from molecular networks is an important step toward a better understanding of how cells function through the molecular networks. Here, we propose a simple, automatic method, called MC(2), to identify functional modules by enumerating and merging cliques in the protein-interaction data from large-scale experiments. Application of MC(2) to the S. cerevisiae protein-interaction data produces 84 modules, whose sizes range from 4 to 69 genes. The majority of the discovered modules are significantly enriched with a highly specific process term (at least 4 levels below root) and a specific cellular component in Gene Ontology (GO) tree. The average fraction of genes with the most enriched GO term for all modules is 82% for specific biological processes and 78% for specific cellular components. In addition, the predicted modules are enriched with coexpressed proteins. These modules are found to be useful for annotating unknown genes and uncovering novel functions of known genes. MC(2) is efficient, and takes only about 5 min to identify modules from the current yeast gene interaction network with a typical PC (Intel Xeon 2.5 GHz CPU and 512 MB memory). The CPU time of MC(2) is affordable (12 h) even when the number of interactions is increased by a factor of 10. MC(2) and its results are publicly available on theory.med.buffalo.edu/MC2.
Publisher: American Chemical Society (ACS)
Date: 05-11-2018
Abstract: Recognizing the widespread existence of intrinsically disordered regions in proteins spurred the development of computational techniques for their detection. All existing techniques can be classified into methods relying on single-sequence information and those relying on evolutionary sequence profiles generated from multiple-sequence alignments. The methods based on sequence profiles are, in general, more accurate because the presence or absence of conserved amino acid residues in a protein sequence provides important information on the structural and functional roles of the residues. However, the wide applicability of profile-based techniques is limited by time-consuming calculation of sequence profiles. Here we demonstrate that the performance gap between profile-based techniques and single-sequence methods can be reduced by using an ensemble of deep recurrent and convolutional neural networks that allow whole-sequence learning. In particular, the single-sequence method (called SPOT-Disorder-Single) is more accurate than SPOT-Disorder (a profile-based method) for proteins with few homologous sequences and comparable for proteins in predicting long-disordered regions. The method performance is robust across four independent test sets with different amounts of short- and long-disordered regions. SPOT-Disorder-Single is available as a Web server and as a standalone program at ack/server/SPOT-Disorder-Single .
Publisher: Springer Science and Business Media LLC
Date: 22-06-2015
DOI: 10.1038/SREP11476
Abstract: Direct prediction of protein structure from sequence is a challenging problem. An effective approach is to break it up into independent sub-problems. These sub-problems such as prediction of protein secondary structure can then be solved independently. In a previous study, we found that an iterative use of predicted secondary structure and backbone torsion angles can further improve secondary structure and torsion angle prediction. In this study, we expand the iterative features to include solvent accessible surface area and backbone angles and dihedrals based on Cα atoms. By using a deep learning neural network in three iterations, we achieved 82% accuracy for secondary structure prediction, 0.76 for the correlation coefficient between predicted and actual solvent accessible surface area, 19° and 30° for mean absolute errors of backbone φ and ψ angles, respectively and 8° and 32° for mean absolute errors of Cα-based θ and τ angles, respectively, for an independent test dataset of 1199 proteins. The accuracy of the method is slightly lower for 72 CASP 11 targets but much higher than those of model structures from current state-of-the-art techniques. This suggests the potentially beneficial use of these predicted properties for model assessment and ranking.
Publisher: Springer Science and Business Media LLC
Date: 21-03-2017
DOI: 10.1038/NCOMMS14902
Abstract: Reliable determination of binding kinetics and affinity of DNA hybridization and single-base mismatches plays an essential role in systems biology, personalized and precision medicine. The standard tools are optical-based sensors that are difficult to operate at low cost and to miniaturize for high-throughput measurement. Biosensors based on nanowire field-effect transistors have been developed, but reliable and cost-effective fabrication remains a challenge. Here, we demonstrate that a graphene single-crystal domain patterned into multiple channels can measure time- and concentration-dependent DNA hybridization kinetics and affinity reliably and sensitively, with a detection limit of 10 pM for DNA. It can distinguish single-base mutations quantitatively in real time. An analytical model is developed to estimate probe density, efficiency of hybridization and the maximum sensor response. The results suggest a promising future for cost-effective, high-throughput screening of drug candidates, genetic variations and disease biomarkers by using an integrated, miniaturized, all-electrical multiplexed, graphene-based DNA array.
Publisher: American Chemical Society (ACS)
Date: 04-01-2022
Publisher: Wiley
Date: 30-03-2007
DOI: 10.1002/PROT.21408
Abstract: Proteins can move freely in three-dimensional space. As a result, their structural properties, such as solvent accessible surface area, backbone dihedral angles, and atomic distances, are continuous variables. However, these properties are often arbitrarily divided into a few classes to facilitate prediction by statistical learning techniques. In this work, we establish an integrated system of neural networks (called Real-SPINE) for real-value prediction and apply the method to predict residue-solvent accessibility and backbone psi dihedral angles of proteins based on information derived from sequences only. Real-SPINE is trained with a large data set of 2640 protein chains, sequence profiles generated from multiple sequence alignment, representative amino-acid properties, a slow learning rate, overfitting protection, and predicted secondary structures. The method optimizes more than 200,000 weights and yields a 10-fold cross-validated Pearson's correlation coefficient (PCC) of 0.74 between predicted and actual solvent accessible surface areas and 0.62 between predicted and actual psi angles. In particular, 90% of 2640 proteins have a PCC value greater than 0.6 between predicted and actual solvent-accessible surface areas. The results of Real-SPINE can be compared with the best reported correlation coefficients of 0.64-0.67 for solvent-accessible surface areas and 0.47 for psi angles. The real-SPINE server, executable programs, and datasets are freely available on sparks.informatics.iupui.edu.
Publisher: Oxford University Press (OUP)
Date: 23-03-2019
DOI: 10.1093/BIOINFORMATICS/BTZ215
Abstract: Protein glycosylation is one of the most abundant post-translational modifications that plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, due to the poor ionization efficiency and microheterogeneity of glycopeptides identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N-/O-linked glycosylation sites, respectively. The method, called SPRINT-Gly, achieved consistent results between ten-fold cross validation and independent test for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well in human glycoproteins and vice versa, however, due to significant differences in O-linked sites separate models were generated. Overall, SPRINT-Gly is 18% and 50% higher in Matthews correlation coefficient than the next best method compared in N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure and sequence-based features. erver/SPRINT-Gly/ Supplementary data are available at Bioinformatics online.
Publisher: Wiley
Date: 23-12-2016
DOI: 10.1002/JCC.24285
Abstract: Protein structure prediction is a long-standing problem in molecular biology. Due to lack of an accurate energy function, it is often difficult to know whether the sampling algorithm or the energy function is the most important factor for failure of locating near-native conformations of proteins. This article examines the size dependence of sampling effectiveness by using a perfect "energy function": the root-mean-squared distance from the target native structure. Using protein targets up to 460 residues from critical assessment of structure prediction techniques (CASP11, 2014), we show that the accuracy of near native structures sampled is relatively independent of protein sizes but strongly depends on the errors of predicted torsion angles. Even with 40% out-of-range angle prediction, 2 Å or less near-native conformation can be sampled. The result supports that the poor energy function is one of the bottlenecks of structure prediction and predicted torsion angles are useful for overcoming the bottleneck by restricting the sampling space in the absence of a perfect energy function.
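As a rough illustration of the "perfect energy function" described in the abstract above, the sketch below computes the root-mean-squared distance between a sampled conformation and a native structure. It assumes the two coordinate sets are already superimposed and listed in the same atom order; the article's actual procedure may differ, and the function name `rmsd` is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-squared distance between two lists of (x, y, z) tuples.

    Assumes both structures are pre-superimposed and in the same atom
    order (an assumption of this sketch, not stated in the abstract).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))
```

A sampler guided by this quantity always knows how far it is from the target, which is why it isolates sampling errors from energy-function errors.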
Publisher: Cold Spring Harbor Laboratory
Date: 07-10-2022
DOI: 10.1101/2022.10.03.510702
Abstract: Unlike 20-letter-coded proteins, RNA homologous sequences are notoriously difficult to detect because their 4-letter-coded sequences can quickly lose their sequence identity. As a result, employing secondary structures has been found necessary to improve the sensitivity and the accuracy of homolog search. However, exact secondary structures are often not known. Consequently, Rfam, the de facto gold standard of RNA homologous families, has to rely on manual curation and experimental secondary structure if available. Here, we showed that using a combination of BLAST and iterative INFERNAL searches along with an expanded sequence database leads to multiple sequence alignments (MSAs) that are comparable to those provided by Rfam, according to secondary structure extracted from mutational coupling analysis and alignment accuracy when compared to structure alignment. The fully automatic tool (RNAcmap2) enables homolog search, multiple sequence alignment, and mutational coupling analysis for any non-Rfam RNA sequences with Rfam-like performance.
Publisher: No publisher found
Date: 2013
Location: No location found
Location: United States of America
Start Date: 2018
End Date: 2020
Funder: Australian Research Council
Start Date: 2017
End Date: 2019
Funder: National Health and Medical Research Council
Start Date: 2015
End Date: 2016
Funder: Australian Research Council
Start Date: 2015
End Date: 2017
Funder: National Health and Medical Research Council
Start Date: 2015
End Date: 2015
Funder: Australian Research Council
Start Date: 2021
End Date: 12-2023
Amount: $570,000.00
Funder: Australian Research Council
Start Date: 09-2016
End Date: 12-2018
Amount: $241,564.00
Funder: Australian Research Council
Start Date: 2015
End Date: 01-2016
Amount: $540,000.00
Funder: Australian Research Council
Start Date: 2018
End Date: 12-2020
Amount: $337,946.00
Funder: Australian Research Council