Volume 19, Issue 9

Current Bioinformatics - Volume 19, Issue 9, 2024

Volume 19, Issue 9, 2024

- Deep Learning in DNA- and RNA-sequence Analysis: Advances and New Challenges
  
  By Xiangtao Li
  
  https://doi.org/10.2174/157489361909240611150749
  More Less
  
  Add to my favourites
  
  Email this

- FMDVSerPred: A Novel Computational Solution for Foot-and-mouth Disease Virus Classification and Serotype Prediction Prevalent in Asia Using VP1 Nucleotide Sequence Data
  
  Authors: Samarendra Das, Soumen Pal, Samyak Mahapatra, Jitendra K. Biswal, Sukanta K. Pradhan, Aditya P. Sahoo and Rabindra P. Singh
  
  https://doi.org/10.2174/0115748936278851231213110653
  More Less
  
  Background: Three serotypes of Foot-and-mouth disease (FMD) virus have been circulating in Asia, which are commonly identified by serological assays. Such tests are timeconsuming and also need a bio-containment facility for execution. To the best of our knowledge, no computational solution is available in the literature to predict the FMD virus serotypes. Thus, this necessitates the urgent need for user-friendly tools for FMD virus serotyping. Methods: We presented a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. Besides, various data pre-processing techniques are implemented in the approach for better model prediction. We used sequence data of 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the developed computational solution in a wet lab setup through collecting and sequencing of 12 virus isolates reported in India. Here, the computational solution is implemented in two user-friendly tools, i.e., online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and R statistical software package (https://github.com/sam-dfmd/FMDVSerPred). Results: The random forest machine learning model is implemented in the computational solution, as it outperformed seven other machine learning models when evaluated on ten test and independent datasets. Furthermore, the developed computational solution provided validation accuracies of up to 99.87% on test data, up to 98.64%, and 90.24% on independent data reported from Asian countries, including India and its seven neighboring countries, respectively. In addition, our approach was successfully used for predicting serotypes of field FMD virus isolates reported from various parts of India. Conclusion: The high-throughput sequencing combined with machine learning offers a promising solution to FMD virus serotyping.
  
  Add to my favourites
  
  Email this

- DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers
  
  Authors: Necla N. Soylu and Emre Sefer
  
  https://doi.org/10.2174/0115748936283134240109054157
  More Less
  
  Introduction: More recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have performed the best on some language tasks by contextualizing word embeddings for a better dynamic representation. Their proteinspecific versions, such as ProtBERT, generated dynamic protein sequence embeddings, which resulted in better performance for several bioinformatics tasks. Besides, a number of different protein post-translational modifications are prominent in cellular tasks such as development and differentiation. The current biological experiments can detect these modifications, but within a longer duration and with a significant cost. Methods: In this paper, to comprehend the accompanying biological processes concisely and more rapidly, we propose DEEPPTM to predict protein post-translational modification (PTM) sites from protein sequences more efficiently. Different than the current methods, DEEPPTM enhances the modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications over different species. Results: Human and mouse ROC AUCs for predicting Succinylation modifications were 0.793 and 0.661 respectively, once 10-fold cross-validation is applied. Similarly, we have obtained 0.776, 0.764, and 0.734 ROC AUC scores on inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DEEPPTM lessens the time spent in laboratory experiments while outperforming the competing methods as well as baselines on inferring all 4 modification sites. In our case, attention-based deep learning methods such as vision transformers look more favorable to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Conclusion: Additionally, the protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.
  
  Add to my favourites
  
  Email this

- Integration of Artificial Intelligence, Machine Learning and Deep Learning Techniques in Genomics: Review on Computational Perspectives for NGS Analysis of DNA and RNA Seq Data
  
  Authors: Chandrashekar K., Vidya Niranjan, Adarsh Vishal and Anagha S. Setlur
  
  https://doi.org/10.2174/0115748936284044240108074937
  More Less
  
  In the current state of genomics and biomedical research, the utilization of Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) have emerged as paradigm shifters. While traditional NGS DNA and RNA sequencing analysis pipelines have been sound in decoding genetic information, the sequencing data's volume and complexity have surged. There is a demand for more efficient and accurate methods of analysis. This has led to dependency on AI/ML and DL approaches. This paper highlights these tool approaches to ease combat the limitations and generate better results, with the help of pipeline automation and integration of these tools into the NGS DNA and RNA-seq pipeline we can improve the quality of research as large data sets can be processed using Deep Learning tools. Automation helps reduce labor-intensive tasks and helps researchers to focus on other frontiers of research. In the traditional pipeline all tasks from quality check to the variant identification in the case of SNP detection take a huge amount of computational time and manually the researcher has to input codes to prevent manual human errors, but with the power of automation, we can run the whole process in comparatively lesser time and smoother as the automated pipeline can run for multiple files instead of the one single file observed in the traditional pipeline. In conclusion, this review paper sheds light on the transformative impact of DL's integration into traditional pipelines and its role in optimizing computational time. Additionally, it highlights the growing importance of AI-driven solutions in advancing genomics research and enabling data-intensive biomedical applications.
  
  Add to my favourites
  
  Email this

- Prospects of Identifying Alternative Splicing Events from Single-Cell RNA Sequencing Data
  
  Authors: Jiacheng Wang and Lei Yuan
  
  https://doi.org/10.2174/0115748936279561231214072041
  More Less
  
  Background: The advent of single-cell RNA sequencing (scRNA-seq) technology has offered unprecedented opportunities to unravel cellular heterogeneity and functions. Yet, despite its success in unraveling gene expression heterogeneity, accurately identifying and interpreting alternative splicing events from scRNA-seq data remains a formidable challenge. With advancing technology and algorithmic innovations, the prospect of accurately identifying alternative splicing events from scRNA-seq data is becoming increasingly promising. Objective: This perspective aims to uncover the intricacies of splicing at the single-cell level and their potential implications for health and disease. It seeks to harness scRNA-seq's transformative power in revealing cell-specific alternative splicing dynamics and aims to propel our understanding of gene regulation within individual cells to new heights. Methods: The perspective grounds its method on recent literature along with the experimental protocols of single-cell RNA-seq and methods to identify and quantify the alternative splicing events from scRNA-seq data. Results: This perspective outlines the promising potential, challenges, and methodologies for leveraging different scRNA-seq technologies to identify and study alternative splicing events, with a focus on advancing our understanding of gene regulation at the single-cell level. Conclusion: This perspective explores the prospects of utilizing scRNA-seq data to identify and study alternative splicing events, highlighting their potential, challenges, methodologies, biological insights, and future directions.
  
  Add to my favourites
  
  Email this

- Application of Deep Learning Neural Networks in Computer-Aided Drug Discovery: A Review
  
  Authors: Jay S. Mathivanan, Victor Violet Dhayabaran, Mary Rajathei David, Muthugobal Bagayalakshmi Karuna Nidhi, Karuppasamy Muthuvel Prasath and Suvaiyarasan Suvaithenamudhan
  
  https://doi.org/10.2174/0115748936276510231123121404
  More Less
  
  Computer-aided drug design has an important role in drug development and design. It has become a thriving area of research in the pharmaceutical industry to accelerate the drug discovery process. Deep learning, a subdivision of artificial intelligence, is widely applied to advance new drug development and design opportunities. This article reviews the recent technology that uses deep learning techniques to ameliorate the understanding of drug-target interactions in computer-aided drug discovery based on the prior knowledge acquired from various literature. In general, deep learning models can be trained to predict the binding affinity between the protein-ligand complexes and protein structures or generate protein-ligand complexes in structure-based drug discovery. In other words, artificial neural networks and deep learning algorithms, especially graph convolutional neural networks and generative adversarial networks, can be applied to drug discovery. Graph convolutional neural network effectively captures the interactions and structural information between atoms and molecules, which can be enforced to predict the binding affinity between protein and ligand. Also, the ligand molecules with the desired properties can be generated using generative adversarial networks.
  
  Add to my favourites
  
  Email this

- Prediction of Drug Pathway-based Disease Classes using Multiple Properties of Drugs
  
  Authors: Lei Chen and Linyang Li
  
  https://doi.org/10.2174/0115748936284973240105115444
  More Less
  
  Background: Drug repositioning now is an important research area in drug discovery as it can accelerate the procedures of discovering novel effects of existing drugs. However, it is challenging to screen out possible effects for given drugs. Designing computational methods are a quick and cheap way to complete this task. Most existing computational methods infer the relationships between drugs and diseases. The pathway-based disease classification reported in KEGG provides us a new way to investigate drug repositioning as such classification can be applied to drugs. A predicted class of a given drug suggests latent diseases it can treat. Objective: The purpose of this study is to set up efficient multi-label classifiers to predict the classes of drugs. Methods: We adopt three types of drug information to generate drug features, including drug pathway information, label information and drug network. For the first two types, drugs are first encoded into binary vectors, which are further processed by singular value decomposition. For the third type, the network embedding algorithm, Mashup, is employed to yield drug features. Above features are combined and fed into RAndom k-labELsets (RAKEL) to construct multi-label classifiers, where support vector machine is selected as the base classification algorithm. Results: The ten-fold cross-validation results show that the classifiers provide high performance with accuracy higher than 0.95 and absolute true higher than 0.92. The case study indicates the novel effects of three drugs, i.e., they may treat new diseases. Conclusion: The proposed classifiers have high performance and are superiority to the classifiers with other classic algorithms and drug information. Furthermore, they have the ability to discover new effects of drugs.
  
  Add to my favourites
  
  Email this

- P4PC: A Portal for Bioinformatics Resources of piRNAs and circRNAs
  
  Authors: Yajun Liu, Ru Li, Yulian Ding, Xinhong Hei and Fang-Xiang Wu
  
  https://doi.org/10.2174/0115748936289420240117100823
  More Less
  
  Background: PIWI-interacting RNAs (piRNAs) and circular RNAs (circRNAs) are two kinds of non-coding RNAs (ncRNAs) that play important roles in epigenetic regulation, transcriptional regulation, post-transcriptional regulation of many biological processes. Although there exist various resources, it is still challenging to select such resources for specific research projects on ncRNAs. Methods: In order to facilitate researchers in finding the appropriate bioinformatics sources for studying ncRNAs, we created a novel portal named P4PC that provides computational tools and data sources of piRNAs and circRNAs. Results: 249 computational tools, 126 databases and 420 papers are manually curated in P4PC. All entries in P4PC are classified in 5 groups and 26 subgroups. The list of resources is summarized in the first page of each group. Conclusion: According to their research proposes, users can quickly select proper resources for their research projects by viewing detail information and comments in P4PC. Database URL is http://www.ibiomedical.net/Portal4PC/ and https://43.138.46.5/Portal4PC/.
  
  Add to my favourites
  
  Email this

- MSSD: An Efficient Method for Constructing Accurate and Stable Phylogenetic Networks by Merging Subtrees of Equal Depth
  
  Authors: Jiajie Xing, Xu Song, Meiju Yu, Juan Wang and Jing Yu
  
  https://doi.org/10.2174/0115748936256923230927081102
  More Less
  
  Background: Systematic phylogenetic networks are essential for studying the evolutionary relationships and diversity among species. These networks are particularly important for capturing non-tree-like processes resulting from reticulate evolutionary events. However, existing methods for constructing phylogenetic networks are influenced by the order of inputs. The different orders can lead to inconsistent experimental results. Moreover, constructing a network for large datasets is time-consuming and the network often does not include all of the input tree nodes. Aims: This paper aims to propose a novel method, called as MSSD, which can construct a phylogenetic network from gene trees by Merging Subtrees with the Same Depth in a bottom-up way. Methods: The MSSD first decomposes trees into subtrees based on depth. Then it merges subtrees with the same depth from 0 to the maximum depth. For all subtrees of one depth, it inserts each subtree into the current networks by means of identical subtrees. Results: We test the MSSD on the simulated data and real data. The experimental results show that the networks constructed by the MSSD can represent all input trees and the MSSD is more stable than other methods. The MSSD can construct networks faster and the constructed networks have more similar information with the input trees than other methods. Conclusion: MSSD is a powerful tool for studying the evolutionary relationships among species in biologyand is free available at https://github.com/xingjiajie2023/MSSD.
  
  Add to my favourites
  
  Email this

Most Cited Most Cited RSS feed

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6- methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan
More Less

Current Bioinformatics - Volume 19, Issue 9, 2024

Volume 19, Issue 9, 2024

Volumes & issues

Most Read This Month

Most Cited Most Cited RSS feed