Current Bioinformatics - Volume 15, Issue 4, 2020
Relevance of Molecular Docking Studies in Drug Designing
Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
Molecular docking is used to position the computer-generated 3D structure of small ligands into a receptor structure in a variety of orientations, conformations and positions. This method is useful in drug discovery and medicinal chemistry, providing insights into molecular recognition. Docking has become an integral part of Computer-Aided Drug Design and Discovery (CADDD). Traditional docking methods are limited by their semi-flexible or static treatment of targets and ligands. Over the last decade, advances in computational methods, proteomics and genomics have led to the development of docking methods that incorporate protein-ligand flexibility and the different binding conformations. Accounting for receptor flexibility yields more accurate binding pose predictions and a more rational depiction of protein binding interactions with the ligand. Protein flexibility has been included by generating protein ensembles or by dynamic docking methods. Dynamic docking considers solvation and entropic effects and fully explores drug-receptor binding and recognition from both energetic and mechanistic points of view. Although dynamic docking is computationally expensive for a fast-paced drug discovery program, it is increasingly used to screen large compound libraries to identify potential drugs. This review gives a quick introduction to the available docking methods and to their applications and limitations in drug discovery.
-
-
-
Analysis and Comparison of RNA Pseudouridine Site Prediction Tools
Background: Pseudouridine (Ψ) is the most abundant RNA modification and has important functions in a series of biological and cellular processes. Although experimental techniques have contributed greatly to identifying Ψ sites, they are still labor-intensive and cost-ineffective. In the past few years, a series of computational approaches have been developed that provide rapid and efficient ways to identify Ψ sites. Results: To give readers a clear picture of recent developments in this important area, this review summarizes and compares the representative computational approaches developed for identifying Ψ sites. Future directions in computationally identifying Ψ sites are discussed as well. Conclusion: We anticipate that this review will provide novel insights into research on pseudouridine modification.
-
-
-
Finding Community of Brain Networks Based on Neighbor Index and DPSO with Dynamic Crossover
Authors: Jie Zhang, Junhong Feng and Fang-Xiang Wu
Background: Brain networks provide an effective way to analyze brain function and detect brain disease. Brain networks contain some important neural unit modules, which carry meaningful biological insights. Objective: We therefore need to find the optimal neural unit modules effectively and efficiently. Method: In this study, we propose a novel algorithm to find community modules of brain networks by combining a Neighbor Index and Discrete Particle Swarm Optimization (DPSO) with dynamic crossover, abbreviated as NIDPSO. This study differs from existing ones in that NIDPSO is the first such method proposed for finding community modules of brain networks, and it does not need the number of communities to be predefined or estimated in advance. Results: We generate a neighbor index table to reduce ineffective searches and design a novel coding by which the community can be determined without computing the distances among vertices in brain networks. Furthermore, dynamic crossover and mutation operators are designed to modify NIDPSO so as to alleviate the premature convergence of DPSO. Conclusion: Numerical results on several resting-state functional MRI brain networks demonstrate that NIDPSO outperforms or is comparable to other competing methods in terms of modularity, coverage and conductance metrics.
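The three evaluation metrics named in this abstract can be illustrated with a minimal sketch, assuming an undirected, unweighted graph and the standard textbook definitions (Newman modularity, the fraction of intra-community edges for coverage, and cut size over the smaller side's volume for conductance). The function name `partition_metrics` and the toy graph are hypothetical and are not taken from the paper, which may define or average these quantities differently.

```python
from collections import defaultdict

def partition_metrics(edges, community):
    """Return (modularity, coverage, mean conductance) of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # edges with both endpoints in the community
    cut = defaultdict(int)    # edges with exactly one endpoint in the community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        cu, cv = community[u], community[v]
        if cu == cv:
            intra[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1

    # Volume of a community = sum of the degrees of its nodes.
    volume = defaultdict(int)
    for node, d in degree.items():
        volume[community[node]] += d

    labels = set(community.values())
    modularity = sum(intra[c] / m - (volume[c] / (2 * m)) ** 2 for c in labels)
    coverage = sum(intra[c] for c in labels) / m

    conductances = []
    for c in labels:
        denom = min(volume[c], 2 * m - volume[c])
        conductances.append(cut[c] / denom if denom else 0.0)
    mean_conductance = sum(conductances) / len(conductances)
    return modularity, coverage, mean_conductance

# Toy example (hypothetical): two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q, cov, cond = partition_metrics(toy_edges, toy_partition)
```

Higher modularity and coverage, and lower conductance, indicate a partition whose communities are dense inside and sparsely connected to the rest of the network.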
-
-
-
Predicting Protein Phosphorylation Sites Based on Deep Learning
Authors: Haixia Long, Zhao Sun, Manzhi Li, Hai Y. Fu and Ming Cai Lin
Background: Protein phosphorylation is one of the most important Post-translational Modifications (PTMs), occurring at the amino acid residues serine (S), threonine (T), and tyrosine (Y). It plays critical roles in protein structure and function prediction. With the development of novel high-throughput sequencing technologies, a huge number of protein sequences are being generated and stored in databases. Objective: It is of great importance in both basic research and drug development to quickly and accurately predict which S, T, or Y residues can be phosphorylated. Methods: To solve this problem, a novel hybrid deep learning model combining a convolutional neural network and a bi-directional long short-term memory recurrent neural network (CNN+BLSTM) is proposed for predicting phosphorylation sites in proteins. The model consists of a sequence of layers that transform the input data into an output class: the convolution layer captures higher-level abstraction features of amino acids, while the recurrent layer captures long-term dependencies between amino acids to improve predictions. The joint model learns interactions between higher-level features derived from the protein sequence to predict the phosphorylated sites. Results: We compared our model with two canonical methods, iPhos-PseEn and MusiteDeep. A 5-fold cross-validation indicated that CNN+BLSTM outperforms the two competitors on various evaluation metrics, including the areas under the receiver operating characteristic and precision-recall curves, the Matthews correlation coefficient, F-measure, and accuracy. Conclusion: CNN+BLSTM is promising for identifying potential protein phosphorylation sites for further experimental validation.
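Three of the threshold-based metrics listed in this abstract (accuracy, F-measure and the Matthews correlation coefficient) can be computed from a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name `binary_metrics` and the label vectors are illustrative assumptions and are not data from the paper.

```python
import math

def binary_metrics(y_true, y_pred):
    """Return accuracy, F-measure (F1) and Matthews correlation coefficient.

    Labels are 1 (phosphorylated site) or 0 (non-site).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # MCC is conventionally set to 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

# Hypothetical labels for eight candidate S/T/Y residues.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
scores = binary_metrics(truth, predicted)
```

Unlike accuracy, the Matthews correlation coefficient stays informative when phosphorylated sites are much rarer than non-sites, which is one reason it is commonly reported alongside accuracy in site-prediction studies.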
-
-
-
A Sequential Ensemble Model for Communicable Disease Forecasting
Authors: Nashreen Sultana, Nonita Sharma, Krishna P. Sharma and Shobhit Verma
Background: Ensemble building is a popular method for improving model accuracy in classification as well as regression problems. Objective: In this work, we propose a sequential ensemble model to predict the number of incidences of communicable diseases such as influenza, hand, foot and mouth disease (HFMD), and diarrhea, and compare it with applied prediction models. Methods: Weekly datasets for the three diseases, namely influenza, HFMD, and diarrhea, were collected from the official government site of Hong Kong for the years 2010 to 2018. The data were preprocessed by log transformation and z-score transformation. The proposed sequential ensemble model was applied to the processed dataset to predict future occurrences. Results: The proposed ensemble model was compared against standard support vector regression (SVR) using different error metrics: root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE). On all three disease datasets, the proposed ensemble model gives better results than the standard SVR model. Conclusion: The main objective of this work is to minimize the prediction error; the proposed sequential ensemble model shows a significant improvement in terms of prediction errors.
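The preprocessing and error metrics mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: using `log1p` (log of 1 + x, so that zero counts are tolerated) and the population standard deviation are assumptions, and all function names and numbers are hypothetical.

```python
import math
from statistics import mean, pstdev

def log_zscore(counts):
    """Log-transform weekly counts, then standardize to zero mean, unit variance."""
    logged = [math.log1p(c) for c in counts]  # assumption: log(1 + x)
    mu, sigma = mean(logged), pstdev(logged)
    if sigma == 0:
        raise ValueError("constant series cannot be z-scored")
    return [(v - mu) / sigma for v in logged]

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))

def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def mape(actual, predicted):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))

# Hypothetical weekly case counts and forecasts.
observed = [100, 200, 400]
forecast = [110, 190, 380]
errors = (rmse(observed, forecast), mae(observed, forecast), mape(observed, forecast))
standardized = log_zscore(observed)
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the observed count, so the three metrics together give a fuller comparison between two forecasting models.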
-
-
-
SimExact – An Efficient Method to Compute Function Similarity Between Proteins Using Gene Ontology
Authors: Najmul Ikram, Muhammad A. Qadir and Muhammad Tanvir Afzal
Background: The rapidly growing protein and annotation databases necessitate the development of efficient tools to process this valuable information. Biologists frequently need to find proteins similar to a given protein, for which BLAST tools are commonly used. With the development of biomedical ontologies, e.g. Gene Ontology, methods were designed to measure function (semantic) similarity between two proteins. These methods work well on protein pairs but are not suitable for protein query processing. Objective: Our aim is to make it possible to search for similar proteins in an acceptable time. Methods: A novel method, SimExact, is proposed for high-speed searching of functionally similar proteins. Results: The experiments in this study show that SimExact gives the correct results required for protein searching. A fully functional prototype of an online tool (www.datafurnish.com/protsem.php) is provided that generates a ranked list of the proteins similar to a query protein, with a response time of less than 20 seconds in our setup. SimExact was also used to search for protein pairs with a high disparity between function similarity and sequence similarity. Conclusion: SimExact makes such searches practical; they would otherwise not be possible in a reasonable time.
-
-
-
Identification of Novel Key Targets and Candidate Drugs in Oral Squamous Cell Carcinoma
Authors: Juan Liu, Xinjie Lian, Feng Liu, Xueling Yan, Chunyan Cheng, Lijia Cheng, Xiaolin Sun and Zheng Shi
Background: Oral Squamous Cell Carcinoma (OSCC) is the most common malignant epithelial neoplasm. It ranks among the top 10 cancers by incidence, with a poor prognosis and low survival rates. New breakthroughs in therapeutic strategies are therefore needed to improve the survival rate of patients with OSCC. Objective: Since targeted therapy is considered one of the most promising therapeutic strategies in cancer, it is of great significance to identify novel targets and drugs for the treatment of OSCC. Methods: A series of bioinformatics approaches were used to identify the hub proteins and their potential agents. Microarray analysis and several online functional activity network analyses were first used to recognize drug targets in OSCC. Subsequently, molecular docking was used to screen potential drugs for these targets from the Specs chemistry database. At the same time, a ligand-based virtual screening model was also evaluated. Results: In this study, two microarray datasets (GSE31056, GSE23558) were first selected and analyzed to obtain 681 consensus candidate genes. We then selected 33 candidate genes based on whether they belong to the kinases and transcription factors, and further clustered candidate hub targets by function and signaling pathway with significant enrichment analysis, using the DAVID and STRING online databases. Next, the core PPI network was identified, and we manually selected GRB2 and IGF1 as the key drug targets according to the network analysis and previous references. Lastly, virtual screening was performed to identify potential small molecules that could target these two proteins; such small molecules can serve as promising candidate agents for future drug development. Conclusion: In summary, our study might provide novel insights into the underlying molecular events of OSCC, and the candidate targets and candidate agents we discovered could be used as promising therapeutic strategies for the treatment of OSCC.
-
-
-
A Novel Integrative Approach for Non-coding RNA Classification Based on Deep Learning
Background: Molecular biomarkers offer new ways to understand many disease processes. Non-coding RNAs (ncRNAs) as biomarkers play a crucial role in several cellular activities, which are highly correlated with many human diseases, especially cancer. The classification and identification of ncRNAs have become a critical issue because of their application as biomarkers in many human diseases. Objective: Most existing computational tools for ncRNA classification are used to classify only one type of ncRNA. They are based on structural information or specific known features. Furthermore, these tools suffer from a lack of significant and validated features. Therefore, the performance of these methods is not always satisfactory. Methods: We propose a novel approach named imCnC for ncRNA classification based on multisource deep learning, which integrates several data sources, such as genomic and epigenomic data, to identify several ncRNA types. We also propose an optimization technique to visualize the feature patterns extracted by the multisource CNN model in order to measure the epigenomic features of each ncRNA type. Results: The computational results on a dataset of 16 human ncRNA classes downloaded from RFAM show that imCnC outperforms the existing tools. Indeed, imCnC achieved an accuracy of 94.18%. In addition, our method enables the discovery of new ncRNA features by using an optimization technique to measure and visualize the feature patterns of the imCnC classifier.
-
-
-
A Machine Learning-based Diagnosis of Thyroid Cancer Using Thyroid Nodules Ultrasound Images
Authors: Xuesi Ma, Baohang Xi, Yi Zhang, Lijuan Zhu, Xin Sui, Geng Tian and Jialiang Yang
Background: Ultrasound is one of the routine tests for the diagnosis of thyroid cancer. The diagnostic accuracy depends largely on the correct interpretation of ultrasound images of thyroid nodules. However, human eye-based image recognition is usually subjective and sometimes error-prone, especially for less experienced doctors, which creates a need for computer-aided diagnostic systems. Objective: To the best of our knowledge, there is no well-maintained ultrasound image database for the Chinese population. In addition, although there are several computational methods for image-based thyroid cancer detection, a comparison among them is missing. Finally, the effects of features such as the choice of distance measure have not been assessed. This study aims to address these limitations and proposes a highly accurate image-based thyroid cancer diagnosis system, which can better assist doctors in the diagnosis of thyroid cancer. Methods: We first establish a novel thyroid nodule ultrasound image database consisting of 508 images collected from the Third Hospital of Hebei Medical University in China. The clinical information for the patients is also collected from the hospital, where 415 patients are diagnosed as benign and 93 as malignant by doctors following a standard diagnosis procedure. We develop and apply five machine learning methods to the dataset: a deep neural network, a support vector machine, the center clustering method, k-nearest neighbor, and logistic regression. Results: Experimental results show that the deep neural network outperforms the other diagnosis methods, with an average cross-validation accuracy of 0.87 over 10 runs. We also explore the performance of four image distance measures, namely the Euclidean distance, the Manhattan distance, the Chebyshev distance, and the Minkowski distance, among which the Chebyshev distance performs best. The resource can be directly used to aid doctors in thyroid cancer diagnosis and treatment. Conclusions: The paper establishes a novel thyroid nodule ultrasound image database and develops a highly accurate image-based thyroid cancer diagnosis system, which can better assist doctors in the diagnosis of thyroid cancer.
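The four distance measures compared in this abstract have standard definitions, sketched below for feature vectors of equal length. How the paper turns an ultrasound image into a vector is not stated here, so the inputs, the function names and the small nearest-neighbor helper are hypothetical illustrations only.

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p >= 1 (p=1 is Manhattan, p=2 is Euclidean)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def chebyshev(a, b):
    """Largest absolute coordinate difference (limit of Minkowski as p grows)."""
    return max(abs(x - y) for x, y in zip(a, b))

def nearest_label(query, samples, labels, distance):
    """1-nearest-neighbor prediction with a pluggable distance function."""
    best = min(range(len(samples)), key=lambda i: distance(query, samples[i]))
    return labels[best]

# Hypothetical two-dimensional feature vectors.
train = [[0.0, 0.0], [5.0, 5.0]]
train_labels = ["benign", "malignant"]
prediction = nearest_label([1.0, 1.0], train, train_labels, chebyshev)
```

Passing the distance as a function argument makes it straightforward to rerun the same classifier with each measure and compare the resulting accuracies.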
-
-
-
Application of a Deep Matrix Factorization Model on Integrated Gene Expression Data
Authors: Yong-Jing Hao, Mi-Xiao Hou, Ying-Lian Gao, Jin-Xing Liu and Xiang-Zhen Kong
Background: Non-negative Matrix Factorization (NMF) has been used extensively on gene expression data. However, most NMF-based methods have single-layer structures, which may perform poorly on complex data. Deep learning, with its carefully designed hierarchical structure, has shown significant advantages in learning data features. Objective: The objectives are, on the one hand, to discover differentially expressed genes in gene expression data and, on the other hand, to obtain better sample clustering results. This can provide reference value for the prevention and treatment of cancer. Method: In this paper, we apply a deep NMF method called Deep Semi-NMF to integrated gene expression data. In each layer, the coefficient matrix is directly decomposed into the basis matrix and coefficient matrix of the next layer. We apply this factorization model to The Cancer Genome Atlas (TCGA) genomic data. Results: The experimental results demonstrate the superiority of the Deep Semi-NMF method in identifying differentially expressed genes and clustering samples. Conclusion: The Deep Semi-NMF model decomposes a matrix into multiple matrices whose product forms a matrix. It can also improve the clustering performance of samples while uncovering more accurate key genes for disease treatment.
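The layer-wise decomposition described in the Method can be written compactly. The notation below is an assumption introduced for illustration and does not come from the abstract: X is the data matrix, Z_i the basis matrix of layer i, H_i the non-negative coefficient matrix of layer i, and m the number of layers.

```latex
\[
\begin{aligned}
X       &\approx Z_1 H_1, \\
H_1     &\approx Z_2 H_2, \\
        &\;\;\vdots \\
H_{m-1} &\approx Z_m H_m, \\
\Longrightarrow\quad X &\approx Z_1 Z_2 \cdots Z_m H_m, \qquad H_i \ge 0 \ \text{for } i = 1, \dots, m .
\end{aligned}
\]
```

In the semi-NMF setting only the coefficient matrices are constrained to be non-negative, while the basis matrices may take mixed signs; the last coefficient matrix is the one typically used for clustering the samples.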
-
-
-
ConvsPPIS: Identifying Protein-protein Interaction Sites by an Ensemble Convolutional Neural Network with Feature Graph
Authors: Huaixu Zhu, Xiuquan Du and Yu Yao
Background/Objective: Protein-protein interactions are essential for most cellular processes; thus, unveiling how proteins interact is a crucial question that can be better understood by recognizing which residues participate in the interaction. Although many computational approaches have been proposed to predict interface residues, their feature perspective and model learning ability are not sufficient to achieve ideal results. Our objective is therefore to improve predictive performance by considering the feature perspective and a new learning algorithm. Method: In this study, we propose an ensemble deep convolutional neural network that explores the context and positional context of consecutive residues within a protein sub-sequence. Specifically, unlike the feature view of previous methods, ConvsPPIS uses evolutionary, physicochemical, and structural protein characteristics to construct a separate feature graph for each. Three independent deep convolutional neural networks are then trained, one on each type of feature graph, to learn the underlying pattern in the sub-sequence. Lastly, we integrate these three deep networks into an ensemble predictor that leverages the complementary information of those features to predict potential interface residues. Results: Comparative experiments were conducted using 10-fold cross-validation. The results indicate that ConvsPPIS achieves superior performance on the DBv5-Sel dataset, with an accuracy of 88%. Additional experiments on the CAPRI-Alone dataset demonstrate that ConvsPPIS also has better prediction performance. Conclusion: The ConvsPPIS method provides a new perspective for capturing protein feature expression to identify protein-protein interaction sites. The results demonstrate the superiority of this method.