Current Bioinformatics - Volume 18, Issue 7, 2023
Convolutional Neural Networks: A Promising Deep Learning Architecture for Biological Sequence Analysis
Authors: Chinju John, Jayakrushna Sahoo, Manu Madhavan and Oommen K. Mathew
Deep learning is opening up possibilities once considered beyond human reach. Recently, it has entered the world of biological data, where it is used to handle the diverse patterns found in data derived from biomolecules. Convolutional neural networks (CNNs), among the most widely used and effective deep learning architectures, can uncover hidden patterns in these data, especially in biological sequences, and these network variants outperform traditional bioinformatics tools on long-standing sequence-analysis tasks. This work introduces the basics of CNN architecture and shows how it can be applied to biological sequence analysis, giving the reader a clearer view of CNNs, their basic working principles, and their application to biological sequences. The critical steps of a deep learning workflow are described in detail: data preprocessing, architecture design, model training, hyperparameter tuning, and evaluation metrics. A comparative analysis of CNN architectures developed for protein family classification is also presented. This review contributes to understanding the concepts behind deep learning architectures and their applications in biological sequence analysis, and it can substantially lower the knowledge barrier around deep learning concepts and their implementation, especially for readers trained in pure biology.
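As a minimal illustration of the ideas this review introduces, the Python sketch below one-hot encodes a DNA sequence and passes it through a single convolutional filter, a ReLU, and global max pooling. It is not the authors' code: the filter weights, the bias of -3.0 and the motif "GATA" are hypothetical, chosen by hand so that the filter fires only on an exact match; in a trained CNN these weights are learned from data.

```python
# Minimal sketch: one-hot encoding + one 1D convolutional filter (valid padding,
# stride 1) + ReLU + global max pooling. The filter weights are hypothetical.

ALPHABET = "ACGT"


def one_hot(seq):
    """Return one 4-element vector per nucleotide, in A, C, G, T channel order."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq.upper()]


def conv1d(x, kernel, bias=0.0):
    """Slide a (width x 4) kernel along the sequence; return one score per window."""
    width = len(kernel)
    scores = []
    for start in range(len(x) - width + 1):
        total = bias
        for offset in range(width):
            for channel in range(len(ALPHABET)):
                total += x[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores


def relu(values):
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    return max(values)


# Hypothetical hand-set filter: +1 for each position matching the motif "GATA".
gata_filter = [[1.0 if letter == base else 0.0 for letter in ALPHABET] for base in "GATA"]

x = one_hot("CCGATACC")
# With bias -3.0 only a full 4/4 match survives the ReLU.
feature_map = relu(conv1d(x, gata_filter, bias=-3.0))
print(feature_map)
print(global_max_pool(feature_map))
```

Global max pooling makes the output independent of where the motif occurs, which is one reason CNNs suit sequence data; a real model stacks many such filters and learns them.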
Advances in Peptide/Protein Structure Prediction Tools and their Relevance for Structural Biology in the Last Decade
Peptides and proteins are involved in many biological processes at the molecular level. Characterizing and determining their three-dimensional structures has helped researchers unravel the chemical and biological roles of these macromolecules. Over the past 50 years, peptide and protein structures have been determined by experimental methods, including nuclear magnetic resonance (NMR), X-ray crystallography, and cryo-electron microscopy (cryo-EM). As a result, a growing number of atomic coordinates for peptides and proteins have been deposited in public databases, supporting the development of computational tools for predicting unknown 3D structures. In the last decade, a race for innovative methods has emerged in the computational sciences, including more complex algorithms for predicting biological activity and structure. Consequently, theoretical peptide/protein models have reached a new level of accuracy when compared with experimentally determined structures. Machine learning and deep learning approaches, for instance, incorporate fundamental aspects of peptide/protein geometry together with physical and biological knowledge from experimentally determined structures to build more precise computational models. Computational strategies that have aided structural biology include comparative (homology) modeling, threading, and ab initio modeling and, more recently, prediction tools based on machine learning and deep learning. With this in mind, we provide a retrospective of protein and peptide structure prediction tools, highlighting their advances and limitations and how they have helped researchers answer crucial biological questions.
Machine Learning Applications in the Study of Parkinson’s Disease: A Systematic Review
Background: Parkinson’s disease is a common neurodegenerative disorder that has been studied from multiple perspectives using several data modalities. Given the size and complexity of these data, machine learning has emerged as a useful approach for analyzing them for different purposes. These methods have been successfully applied to a broad range of tasks, including diagnosing Parkinson’s disease and assessing its severity. In recent years, the number of published articles that use machine learning to analyze data from Parkinson’s disease patients has grown substantially. Objective: Our goal was to perform a comprehensive systematic review of the studies that applied machine learning to Parkinson’s disease data. Methods: We retrieved articles indexed in PubMed, Scopus, and Web of Science up to March 15, 2022. After screening, 255 articles were included in this review. Results: We classified the articles by data type and summarized their characteristics, such as outcomes of interest, main algorithms, sample size, data sources, and model performance. Conclusion: This review summarizes the main advances in the use of machine learning for the study of Parkinson’s disease, as well as the growing interest of the research community in this area.
EpiSemble: A Novel Ensemble-based Machine-learning Framework for Prediction of DNA N6-methyladenine Sites Using Hybrid Features Selection Approach for Crops
Authors: Dipro Sinha, Tanwy Dasmandal, Md Yeasin, Dwijesh C. Mishra, Anil Rai and Sunil Archak
Aim: The study aimed to develop a robust and more precise 6mA methylation prediction tool to help researchers study the epigenetic behaviour of crop plants. Background: N6-methyladenine (6mA) is one of the predominant epigenetic modifications and is involved in a variety of biological processes in all three domains of life. Although in vitro approaches detect epigenetic alterations more precisely, they are resource-intensive and time-consuming. Artificial intelligence-based in silico methods have helped overcome these bottlenecks. Methods: A novel machine learning framework was developed by combining four techniques: ensemble machine learning, a hybrid approach to feature selection, the addition of new features such as the Average Mutual Information Profile (AMIP), and bootstrap sampling. Four feature sets, namely dinucleotide frequency, GC content, AMIP, and nucleotide chemical properties, were used to vectorize the DNA sequences. Nine machine learning models (support vector machine, random forest, k-nearest neighbor, artificial neural network, multiple logistic regression, decision tree, naïve Bayes, AdaBoost, and gradient boosting) were trained on the relevant features extracted by the feature selection module. The three best-performing models were then combined into a robust ensemble model to predict sequences containing 6mA sites. Results: EpiSemble, a novel ensemble model, was developed for the prediction of 6mA methylation sites. The new model improved accuracy by 7.0%, 3.74%, and 6.65% over existing models on the RiceChen, RiceLv, and Arabidopsis datasets, respectively. An R package, EpiSemble, based on the new model is available at https://cran.r-project.org/web/packages/EpiSemble/index.html.
Conclusion: The EpiSemble model combines AMIP as a novel feature with integrated feature selection modules, bootstrap sampling, and an ensemble technique to improve the accuracy of 6mA site prediction in plants. To our knowledge, this is the first R package developed for predicting epigenetic sites in crop plant genomes, and it is expected to help plant researchers in their future work.
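A rough Python sketch of three of the feature sets named above. The definitions are assumptions rather than EpiSemble's own code (the R package is the reference implementation): dinucleotide frequency is taken as the relative frequency of overlapping 2-mers, and the AMIP is computed, as in common usage, as the mutual information between bases k positions apart for k = 1..K, with marginals estimated from the same position pairs. Chemical-property features, feature selection, bootstrapping and the ensemble are not shown.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def dinucleotide_frequency(seq):
    """Relative frequency of each of the 16 overlapping dinucleotides (AA, AC, ..., TT)."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {a + b: counts[a + b] / total for a, b in product(BASES, repeat=2)}


def average_mutual_information(seq, k):
    """Mutual information (in bits) between bases separated by distance k (assumed AMIP term)."""
    seq = seq.upper()
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    info = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        info += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return info


def amip(seq, max_k):
    """Profile of mutual information values for distances 1..max_k."""
    return [average_mutual_information(seq, k) for k in range(1, max_k + 1)]


def feature_vector(seq, max_k=3):
    """Concatenate 16 dinucleotide frequencies, GC content and max_k AMIP values."""
    return list(dinucleotide_frequency(seq).values()) + [gc_content(seq)] + amip(seq, max_k)


# Hypothetical toy sequence; real inputs would be fixed-length windows around candidate adenines.
vec = feature_vector("ACGTACGTGGCA", max_k=3)
print(len(vec), vec[:4])
```

The sequence must be longer than max_k so that every distance has at least one position pair.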
Survival Prediction of Esophageal Squamous Cell Carcinoma Based on the Prognostic Index and Sparrow Search Algorithm-Support Vector Machine
Authors: Yanfeng Wang, Wenhao Zhang, Yuli Yang, Junwei Sun and Lidong Wang
Aim: Esophageal squamous cell carcinoma (ESCC) is among the cancers with the highest incidence and mortality worldwide; recent studies show that its incidence is rising and its mortality rate remains high. An effective survival prediction model can assist physicians in treatment decisions and improve patients' survival outcomes. Introduction: In this study, an ESCC prognostic index and a survival prediction model based on blood indicators and TNM staging information are developed, and their effectiveness is analyzed. Methods: Kaplan-Meier survival analysis and Cox regression analysis are used to identify factors significantly associated with patient survival. Binary logistic regression is used to construct a prognostic index (PI) for ESCC. A survival prediction model for patients with ESCC is then established based on the sparrow search algorithm (SSA) and a support vector machine (SVM). Results: Eight factors significantly associated with patient survival are identified by Kaplan-Meier survival analysis and Cox regression analysis. The PI is divided into four stages, which reasonably reflect the survival of different patients. Compared with four existing models, the proposed sparrow search algorithm-support vector machine (SSA-SVM) model achieves higher prediction accuracy. Conclusion: To accurately and effectively predict the five-year survival rate of patients with ESCC, this paper proposes a survival prediction model based on Kaplan-Meier survival analysis, Cox regression analysis, binary logistic regression, and a support vector machine. The results show that the proposed method can accurately predict the five-year survival rate of ESCC patients.
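The Kaplan-Meier step in the Methods can be sketched in a few lines of Python. This is the textbook product-limit estimator S(t) = Π_{t_i ≤ t} (1 − d_i / n_i), where d_i is the number of deaths at event time t_i and n_i the number of patients still at risk just before it; the follow-up data below are made up, and the prognostic index and SSA-SVM model from the paper are not reproduced.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t), returned as [(t, S(t))] at each distinct death time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group tied times; censored patients at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before time t."""
    s = 1.0
    for event_time, value in curve:
        if event_time <= t:
            s = value
        else:
            break
    return s


# Hypothetical follow-up times (months) and death indicators for six patients.
times = [6, 7, 10, 15, 19, 25]
events = [1, 0, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
print(curve)
print(survival_at(curve, 12))
```

A five-year survival rate, as targeted in the paper, would be read off the same curve with survival_at(curve, 60) when time is measured in months.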
Predicting Herb-disease Associations Through Graph Convolutional Network
Authors: Xuan Hu, You Lu, Geng Tian, Pingping Bing, Bing Wang and Binsheng He
Background: In recent years, herbs have become very popular worldwide as a form of complementary and alternative medicine (CAM). However, there are many types of herbs and diseases, and their associations cannot all be revealed experimentally. Identifying new therapeutic indications for herbs, that is, drug repositioning, is a critical supplement to new drug development. Because exploring herb-disease associations with wet-lab techniques is time-consuming and laborious, reliable computational methods are urgently needed to fill this gap. In this study, we first preprocessed the herbs and their indications in the TCM-Suit database, a comprehensive, accurate, and integrated traditional Chinese medicine database, to obtain a herb-disease association network. We then proposed a novel model based on a graph convolutional network (GCN) to infer potential new associations between herbs and diseases. Methods: In our method, effective features of herbs and diseases were extracted by a multi-layer GCN; a layer attention mechanism was then introduced to combine the features learned by the different GCN layers, and jump connections were added to reduce the over-smoothing caused by stacking multiple GCN layers. Finally, the reconstructed herb-disease association network was generated by a bilinear decoder. We applied our model together with four other methods (SCMFDD, BNNR, LRMCMDA, and DRHGCN) to predict herb-disease associations. In five-fold cross-validation, our model achieved the highest area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and recall of all the methods compared.
Conclusion: We further used our model to predict candidate herbs for Alzheimer's disease and identified the compounds mediating these herb-disease associations through the herb-compound-gene-disease network. These findings are supported by the relevant literature.
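To make the Methods concrete, the following Python sketch chains the pieces named there on a toy herb-disease graph: GCN layers with the standard symmetric normalisation, a softmax-weighted combination of every layer's output as a stand-in for layer attention, and a bilinear decoder that scores a herb-disease pair. The toy graph, weights, attention logits and decoder matrix Q are all hypothetical (in practice they are learned), and because the final embedding mixes shallow and deep layers it only approximates the effect of the authors' jump connections; it is not their implementation.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of an (n x m) and an (m x p) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2 used by standard GCN layers."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One GCN propagation step: ReLU(A_norm H W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_norm, h), w)]


def layer_attention(layer_outputs, scores):
    """Weighted sum of per-layer embeddings with softmax(scores) weights."""
    exps = [math.exp(s) for s in scores]
    weights = [e / sum(exps) for e in exps]
    n, d = len(layer_outputs[0]), len(layer_outputs[0][0])
    return [[sum(w * out[i][j] for w, out in zip(weights, layer_outputs)) for j in range(d)]
            for i in range(n)]


def bilinear_score(h_u, q, h_v):
    """sigmoid(h_u^T Q h_v): predicted probability that nodes u and v are associated."""
    z = sum(h_u[i] * q[i][j] * h_v[j] for i in range(len(h_u)) for j in range(len(h_v)))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical graph: nodes 0-1 are herbs, nodes 2-3 are diseases.
adj = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
]
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot node IDs
w1 = [[0.5, -0.2], [0.1, 0.4], [0.3, 0.3], [-0.1, 0.6]]  # hypothetical weights
w2 = [[0.7, 0.1], [0.2, 0.9]]

a_norm = normalized_adjacency(adj)
h1 = gcn_layer(a_norm, features, w1)
h2 = gcn_layer(a_norm, h1, w2)
embeddings = layer_attention([h1, h2], scores=[0.0, 0.0])  # hypothetical attention logits
q = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical decoder matrix
score = bilinear_score(embeddings[0], q, embeddings[3])  # herb 0 vs disease 3 (unobserved pair)
print(score)
```

Ranking all unobserved herb-disease pairs by this score is how such a model proposes candidate associations, such as herbs for a given disease.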