Current Bioinformatics - Current Issue
Volume 19, Issue 10, 2024
- Life Sciences, Systems Biology & Bioinformatics, Biochemical Research Methods, Mathematical & Computational Biology
Advances in Deep Learning Assisted Drug Discovery Methods: A Self-review
Authors: Haiping Zhang and Konda Mani Saravanan
Artificial intelligence is a field within computer science that endeavors to replicate the intricate structures and operational mechanisms of the human brain. Machine learning is a subfield of artificial intelligence that focuses on developing models by analyzing training data. Deep learning is a distinct subfield within artificial intelligence, characterized by models that depict geometric transformations across multiple layers. Deep learning has shown significant promise in various domains, including health and life sciences, and has recently been applied successfully in drug discovery. In this self-review, we present recent methods developed with the aid of deep learning. The objective is to give a brief overview of our group's current advances in drug discovery. We systematically discuss experimental evidence and proof-of-concept examples for the deep learning-based models we developed, such as DeepBindBC, DeepPep, and DeepBindRG. These developments not only shed light on the existing challenges but also emphasize the achievements and prospects for future progress in drug discovery and development.
A-RFP: An Adaptive Residue Flexibility Prediction Method Improving Protein-ligand Docking Based on Homologous Proteins
Authors: Chuqi Lei, Senbiao Fang, Yaohang Li, Fei Guo and Min Li
Background: Computational molecular docking plays an important role in determining the precise receptor-ligand conformation, which makes it a powerful tool for drug discovery. Over the past 30 years, most computational docking methods have treated the receptor structure as a rigid body, although flexible docking often yields higher accuracy. The main disadvantage of flexible docking is its significantly higher computational cost. Because different protein pocket residues exhibit different degrees of flexibility, semi-flexible docking methods, which balance rigid and flexible docking, have succeeded in predicting highly accurate conformations at a relatively low computational cost.
Methods: In our study, the number of flexible pocket residues was assessed by quantitative analysis, and a novel adaptive residue flexibility prediction method, named A-RFP, was proposed to improve docking performance. Based on homologous information, a joint strategy predicts pocket residue flexibility by combining RMSD, the distance between the residue sidechain and the ligand, and the sidechain orientation. For each receptor-ligand pair, A-RFP provides a docking conformation with the optimal affinity.
Results: By analyzing the docking affinities of 3507 target-ligand pairs at 5 different numbers of flexible residues ranging from 0 to 10, we found a general trend that a larger number of flexible residues improves the docking results obtained with AutoDock Vina. However, a certain number of counterexamples still exist. To validate the effectiveness of A-RFP, we assessed it in a small-scale virtual screening on 5 proteins, which confirmed that A-RFP could enhance docking performance. Flexible-receptor virtual screening on a low-similarity dataset with 85 receptors further validated the accuracy of the comprehensive residue flexibility evaluation. Moreover, we studied three receptors with FDA-approved drugs, which further showed that A-RFP can play a suitable role in ligand discovery.
Conclusion: Our analysis confirms that the screening performance obtained with different numbers of flexible residues varies widely across receptors. This suggests that a fine-grained docking method would offset the aforementioned deficiency. Thus, we presented A-RFP, an adaptive pocket residue flexibility prediction method based on homologous information. Leaving aside computational resources and time costs, A-RFP provides the optimal docking result.
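One of the three signals the abstract combines is RMSD. As a minimal sketch of that quantity only, the snippet below computes the RMSD of one residue's sidechain atoms between two structures. It assumes the structures are already superimposed and that atoms are listed in the same order; the coordinates are hypothetical, and this is not the A-RFP scoring strategy itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical sidechain atom positions (angstroms) of one pocket residue
# in two homologous structures that are assumed to be superimposed already.
residue_in_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
residue_in_b = [(0.0, 0.0, 0.0), (1.5, 0.0, 1.0), (1.5, 1.5, 2.0)]

print(round(rmsd(residue_in_a, residue_in_b), 3))
```

A larger value across homologous structures would suggest a more mobile sidechain; how A-RFP weighs this against distance and orientation is not specified in the abstract.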
STNMDA: A Novel Model for Predicting Potential Microbe-Drug Associations with Structure-Aware Transformer
Authors: Liu Fan, Xiaoyu Yang, Lei Wang and Xianyou Zhu
Introduction: Microbes are intimately involved in the physiological and pathological processes of numerous diseases. There is a critical need for new drugs to combat microbe-induced diseases in clinical settings. Predicting potential microbe-drug associations is therefore essential for both disease treatment and novel drug discovery. However, verifying these relationships through traditional wet-lab approaches is costly and time-consuming.
Methods: We proposed an efficient computational model, STNMDA, that integrates a Structure-Aware Transformer (SAT) with a Deep Neural Network (DNN) classifier to infer latent microbe-drug associations. STNMDA began with a “random walk with restart” approach to construct a heterogeneous network using Gaussian kernel similarity and functional similarity measures for microorganisms and drugs. This heterogeneous network was then fed into the SAT to extract attribute features and graph structures for each drug and microbe node. Finally, the DNN classifier calculated the probability of associations between microbes and drugs.
Results: Extensive experiments showed that STNMDA outperformed existing state-of-the-art models on the MDAD and aBiofilm databases. In addition, case validations demonstrated the feasibility of STNMDA in confirming associations between microbes and drugs.
Conclusion: STNMDA shows promise as a valuable tool for future prediction of microbe-drug associations.
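The Methods mention a random walk with restart. The sketch below illustrates the generic technique on a small, hypothetical four-node network, assuming a column-normalized transition matrix and a single seed node. The function name, restart probability, and example graph are illustrative assumptions, not details taken from STNMDA.

```python
def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the scores stop changing."""
    n = len(adjacency)
    # Column-normalize so that each column of the transition matrix sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical undirected path graph: 0 - 1 - 2 - 3.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seed=0)
print([round(s, 3) for s in scores])
```

The steady-state scores decay with distance from the seed, which is why such scores are commonly used as a proximity measure between nodes.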
Genotype and Phenotype Association Analysis Based on Multi-omics Statistical Data
Authors: Xinpeng Guo, Yafei Song, Dongyan Xu, Xueping Jin and Xuequn Shang
Background: Multi-omics analysis of clinical data faces several issues: too few omics data types and relatively small sample sizes, owing to the protection of patients' privacy and the data management requirements of various institutions, together with the relatively large number of features in each omics data type. This paper describes the analysis of multi-omics pathway relationships using statistical data in the absence of clinical data.
Methods: We proposed a novel approach that exploits easily accessible statistics in public databases. This approach introduces phenotypic associations that are not included in the clinical data and uses these data to build a three-layer heterogeneous network. To simplify the analysis, we decomposed the three-layer network into two two-layer networks to predict the weights of the inter-layer associations. The weights of the two networks were merged using a hyperparameter β, and k-fold cross-validation was then used to evaluate the accuracy of the method. In calculating the weights of the two-layer networks, the random walk with restart (RWR) with fixed restart probability was combined with PBMDA and CIPHER to generate the PCRWR with biased weights and improved accuracy.
Results: The area under the receiver operating characteristic curve (AUC) increased by approximately 7% in the case of the RWR with initial weights.
Conclusion: Multi-omics statistical data were used to establish genotype and phenotype correlation networks for analysis, with an effect similar to that of clinical multi-omics analysis.
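The Results are reported as the area under the ROC curve. As a small illustration of what that number measures, the sketch below computes AUC as the fraction of positive-negative pairs that are ranked correctly, counting ties as half. The labels and scores are hypothetical and unrelated to the paper's data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical association labels (1 = known association) and predicted weights.
example_labels = [1, 1, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.2]

print(round(roc_auc(example_labels, example_scores), 4))
```

The pairwise form is quadratic in the number of examples; it is used here for clarity rather than speed.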
Enhancing Drug-Target Binding Affinity Prediction through Deep Learning and Protein Secondary Structure Integration
Authors: Runhua Zhang, Baozhong Zhu, Tengsheng Jiang, Zhiming Cui and Hongjie Wu
Background: Conventional approaches to drug discovery are often lengthy and costly. To expedite the discovery of new drugs, the use of artificial intelligence (AI) to predict drug-target binding affinity (DTA) has emerged as a crucial approach. Despite the proliferation of deep learning methods for DTA prediction, many of these methods concentrate primarily on the amino acid sequence of proteins. Yet interactions between drug compounds and targets occur within distinct segments of the protein structure, whereas the primary sequence mainly captures global protein features. Consequently, the sequence alone falls short of fully elucidating the intricate relationship between drugs and their respective targets.
Objective: This study aims to employ advanced deep learning techniques to predict DTA while incorporating information about the secondary structure of proteins.
Methods: Both the primary sequence and the secondary structure of the protein were used for protein representation. The primary sequence served as the global feature, while the secondary structure served as the local feature. Convolutional neural networks and graph neural networks were used to independently model the intricate features of target proteins and drug compounds. This approach enhanced our ability to capture drug-target interactions.
Results: We have introduced a novel method for predicting DTA. Compared with DeepDTA, our approach achieves a 3.9% increase in the Concordance Index (CI) and a 34% reduction in Mean Squared Error (MSE) when evaluated on the KIBA dataset.
Conclusion: Our results demonstrate that augmenting DTA prediction with the protein's secondary structure as a local feature yields significantly improved accuracy compared with relying solely on the primary structure.
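The Results are stated in terms of the Concordance Index and Mean Squared Error. The sketch below shows how these two metrics are commonly computed for affinity regression, assuming the usual pairwise definition of CI in which ties in the prediction count as half. The affinity values are hypothetical and not taken from the KIBA dataset.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order."""
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(n):
            if y_true[i] > y_true[j]:  # only pairs with distinct true affinities
                comparable += 1
                if y_pred[i] > y_pred[j]:
                    concordant += 1.0
                elif y_pred[i] == y_pred[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs: all true values are equal")
    return concordant / comparable


# Hypothetical measured and predicted affinity scores for four drug-target pairs.
measured = [11.2, 12.5, 10.1, 13.0]
predicted = [11.0, 12.0, 10.5, 11.5]

print(round(concordance_index(measured, predicted), 4))
print(round(mean_squared_error(measured, predicted), 4))
```

CI rewards getting the ranking of affinities right, while MSE penalizes the size of each error, so a model can improve on one without improving on the other.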
Sia-m7G: Predicting m7G Sites through the Siamese Neural Network with an Attention Mechanism
Authors: Jia Zheng and Yetong Zhou
Background: The chemical modification of RNA plays a crucial role in many biological processes. N7-methylguanosine (m7G), one of the most important epigenetic modifications, plays an important role in gene expression, processing metabolism, and protein synthesis. Detecting the exact location of m7G sites in the transcriptome is key to understanding their mechanism in gene expression. On the basis of experimentally validated data, several machine learning and deep learning tools have been designed to identify internal m7G sites and have shown advantages over traditional experimental methods in terms of speed, cost-effectiveness and robustness.
Aims: In this study, we aim to develop a computational model to help predict the exact location of m7G sites in humans.
Objective: Simple and advanced encoding methods and deep learning networks are designed to achieve excellent m7G prediction efficiently.
Methods: Three types of feature extraction and six classification algorithms were tested to identify m7G sites. Our final model, named Sia-m7G, adopts one-hot encoding and a carefully designed Siamese neural network with an attention mechanism. In addition, multiple 10-fold cross-validation tests were conducted to evaluate our predictor.
Results: Sia-m7G achieved the highest sensitivity, specificity and accuracy on 10-fold cross-validation tests compared with six other m7G predictors. Nucleotide preference and model visualization analyses were conducted to strengthen the interpretability of Sia-m7G and provide a further understanding of m7G site fragments in genomic sequences.
Conclusion: Sia-m7G has significant advantages over other classifiers and predictors, which demonstrates the strength of the Siamese neural network algorithm in identifying m7G sites.
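The final model takes one-hot encoded sequences as input. The sketch below shows one common way to one-hot encode an RNA fragment. The A, C, G, U column order, the conversion of T to U, and the all-zero row for unknown symbols are assumptions made for illustration, not details confirmed by the abstract.

```python
NUCLEOTIDES = "ACGU"  # assumed column order of the encoding


def one_hot_encode(sequence):
    """Map each nucleotide to a 4-dimensional indicator vector."""
    encoded = []
    # Assumption: DNA-style input is accepted by treating T as U.
    for base in sequence.upper().replace("T", "U"):
        # Symbols outside A, C, G, U (for example N) become an all-zero row.
        encoded.append([1 if base == nt else 0 for nt in NUCLEOTIDES])
    return encoded


# Hypothetical three-nucleotide fragment centered on a guanosine-free example.
for row in one_hot_encode("GAU"):
    print(row)
```

Each sequence of length L becomes an L x 4 matrix, which is the shape a convolutional or Siamese network would typically consume.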
Integrated Machine Learning Algorithms for Stratification of Patients with Bladder Cancer
Authors: Yuanyuan He, Haodong Wei, Siqing Liao, Ruiming Ou, Yuqiang Xiong, Yongchun Zuo and Lei Yang
Background: Bladder cancer is a prevalent malignancy globally, characterized by rising incidence and mortality rates. Stratifying bladder cancer patients into different subtypes is crucial for effective treatment. Therefore, there is a need to develop a stratification model specific to bladder cancer.
Purpose: This study aims to establish a prognostic prediction model for bladder cancer, with the primary goal of accurately predicting prognosis and treatment outcomes.
Methods: We collected 10 bladder cancer datasets sourced from the Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases and the IMvigor210 dataset. Machine learning algorithms based on feature selection were used to generate 96 models for establishing a risk score for each patient. Based on the risk score, all patients were classified into two risk groups.
Results: The two groups of bladder cancer patients exhibited significant differences in prognosis, biological functions, and drug sensitivity. A nomogram model demonstrated that the risk score had a robust predictive effect with good clinical utility.
Conclusion: The risk score constructed in this study can be used to predict the prognosis, response to drug treatment, and response to immunotherapy of bladder cancer patients, supporting personalized clinical treatment of bladder cancer.
CFCN: An HLA-peptide Prediction Model based on Taylor Extension Theory and Multi-view Learning
Authors: Bing Rao, Bing Han, Leyi Wei, Zeyu Zhang, Xinbo Jiang and Balachandran Manavalan
Background: With the rapid development of biotechnology, many approaches to treating cancer have been proposed. In recent years, neo-peptide-based methods have made significant contributions; an essential prerequisite for them is the binding between peptides and HLA molecules. However, this binding is hard to predict, and prediction accuracy still needs to improve.
Methods: We propose the Crossed Feature Correction Network (CFCN), a deep learning method that automatically extracts and adaptively learns the discriminative features in HLA-peptide binding in order to make more accurate predictions on HLA-peptide binding tasks. Its encoding and feature extraction process for peptides, together with feature fusion between the fine-grained and coarse-grained levels, gives it many advantages on the given tasks.
Results: The experiments show that CFCN achieves better overall performance than other advanced models in many aspects.
Conclusion: We also consider using multi-view learning methods for the feature fusion process in order to uncover further relations among binding features. Finally, we package our model as a tool for further research on binding tasks.