Current Bioinformatics - Volume 19, Issue 9, 2024
-
-
FMDVSerPred: A Novel Computational Solution for Foot-and-mouth Disease Virus Classification and Serotype Prediction Prevalent in Asia Using VP1 Nucleotide Sequence Data
Background: Three serotypes of foot-and-mouth disease (FMD) virus have been circulating in Asia and are commonly identified by serological assays. Such tests are time-consuming and require a bio-containment facility. To the best of our knowledge, no computational solution is available in the literature to predict FMD virus serotypes, so there is an urgent need for user-friendly tools for FMD virus serotyping. Methods: We present a computational solution based on a machine-learning model for FMD virus classification and serotype prediction. Various data pre-processing techniques are implemented in the approach to improve model prediction. We used sequence data of 2509 FMD virus isolates reported from India and seven other Asian FMD-endemic countries for model training, testing, and validation. We also studied the utility of the solution in a wet-lab setup by collecting and sequencing 12 virus isolates reported in India. The solution is implemented in two user-friendly tools: an online web-prediction server (https://nifmd-bbf.icar.gov.in/FMDVSerPred) and an R statistical software package (https://github.com/sam-dfmd/FMDVSerPred). Results: The random forest model is implemented in the computational solution because it outperformed seven other machine learning models when evaluated on ten test and independent datasets. The solution achieved validation accuracies of up to 99.87% on test data, and of up to 98.64% and 90.24% on independent data reported from India and from its seven neighboring countries, respectively. In addition, our approach was successfully used to predict serotypes of field FMD virus isolates reported from various parts of India. Conclusion: High-throughput sequencing combined with machine learning offers a promising solution to FMD virus serotyping.
-
-
-
DeepPTM: Protein Post-translational Modification Prediction from Protein Sequences by Combining Deep Protein Language Model with Vision Transformers
Authors: Necla N. Soylu and Emre Sefer
Introduction: Recent self-supervised deep language models, such as Bidirectional Encoder Representations from Transformers (BERT), have performed best on some language tasks by contextualizing word embeddings for a better dynamic representation. Their protein-specific versions, such as ProtBERT, generate dynamic protein sequence embeddings, which have improved performance on several bioinformatics tasks. In addition, a number of protein post-translational modifications are prominent in cellular processes such as development and differentiation. Current biological experiments can detect these modifications, but they take a long time and incur significant cost. Methods: In this paper, to understand the accompanying biological processes more concisely and rapidly, we propose DeepPTM to predict protein post-translational modification (PTM) sites from protein sequences more efficiently. Unlike current methods, DeepPTM improves modification prediction performance by integrating specialized ProtBERT-based protein embeddings with attention-based vision transformers (ViT), and reveals the associations between different modification types and protein sequence content. Additionally, it can infer several different modifications across different species. Results: Under 10-fold cross-validation, the human and mouse ROC AUCs for predicting succinylation modifications were 0.793 and 0.661, respectively. Similarly, we obtained ROC AUC scores of 0.776, 0.764, and 0.734 for inferring ubiquitination, crotonylation, and glycation sites, respectively. According to detailed computational experiments, DeepPTM reduces the time spent in laboratory experiments while outperforming the competing methods as well as baselines on inferring all 4 modification sites. Conclusion: In our case, attention-based deep learning methods such as vision transformers appear better suited to learning from ProtBERT features than more traditional deep learning and machine learning techniques. Additionally, the protein-specific ProtBERT model is more effective than the original BERT embeddings for PTM prediction tasks. Our code and datasets can be found at https://github.com/seferlab/deepptm.
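The ROC AUC values quoted in this abstract can be read as the probability that a randomly chosen modified site receives a higher score than a randomly chosen unmodified site. The following is a minimal sketch of that pairwise definition in plain Python. It is not the authors' evaluation code, and the labels and scores are made-up illustration values, not data from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 values (1 = modified site, an assumed encoding).
    scores: iterable of predicted scores, higher meaning more likely modified.
    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy predictions for five candidate sites.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 5 of the 6 positive/negative pairs are ordered correctly
```

In a 10-fold cross-validation setting, a function like this would be applied to the held-out fold of each split and the per-fold values summarized; the abstract does not state how the folds were aggregated.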
-
-
-
Integration of Artificial Intelligence, Machine Learning and Deep Learning Techniques in Genomics: Review on Computational Perspectives for NGS Analysis of DNA and RNA Seq Data
Authors: Chandrashekar K., Vidya Niranjan, Adarsh Vishal and Anagha S. Setlur
In genomics and biomedical research, the use of Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) has emerged as a paradigm shift. While traditional NGS DNA and RNA sequencing analysis pipelines have been sound in decoding genetic information, the volume and complexity of sequencing data have surged, creating demand for more efficient and accurate methods of analysis. This has led to reliance on AI/ML and DL approaches. This paper highlights how these tools help overcome the limitations of traditional pipelines and generate better results. With pipeline automation and the integration of these tools into NGS DNA and RNA-seq pipelines, the quality of research can improve, as large data sets can be processed using DL tools. Automation reduces labor-intensive tasks and lets researchers focus on other frontiers of research. In the traditional pipeline, all tasks from quality check to variant identification (in the case of SNP detection) take a huge amount of computational time, and the researcher has to enter commands manually, which is prone to human error. With automation, the whole process runs more smoothly and in comparatively less time, because an automated pipeline can process multiple files rather than the single file handled at a time in the traditional pipeline. In conclusion, this review sheds light on the transformative impact of DL's integration into traditional pipelines and its role in optimizing computational time. Additionally, it highlights the growing importance of AI-driven solutions in advancing genomics research and enabling data-intensive biomedical applications.
-
-
-
Prospects of Identifying Alternative Splicing Events from Single-Cell RNA Sequencing Data
Authors: Jiacheng Wang and Lei Yuan
Background: The advent of single-cell RNA sequencing (scRNA-seq) technology has offered unprecedented opportunities to unravel cellular heterogeneity and functions. Yet, despite its success in revealing gene expression heterogeneity, accurately identifying and interpreting alternative splicing events from scRNA-seq data remains a formidable challenge. With advancing technology and algorithmic innovations, the prospect of accurately identifying alternative splicing events from scRNA-seq data is becoming increasingly promising. Objective: This perspective aims to uncover the intricacies of splicing at the single-cell level and their potential implications for health and disease. It seeks to harness the power of scRNA-seq to reveal cell-specific alternative splicing dynamics and to advance our understanding of gene regulation within individual cells. Methods: This perspective draws on recent literature, the experimental protocols of scRNA-seq, and methods for identifying and quantifying alternative splicing events from scRNA-seq data. Results: This perspective outlines the potential, challenges, and methodologies for leveraging different scRNA-seq technologies to identify and study alternative splicing events, with a focus on advancing our understanding of gene regulation at the single-cell level. Conclusion: This perspective explores the prospects of using scRNA-seq data to identify and study alternative splicing events, highlighting their potential, challenges, methodologies, biological insights, and future directions.
-
-
-
Application of Deep Learning Neural Networks in Computer-Aided Drug Discovery: A Review
Computer-aided drug design plays an important role in drug development and design. It has become a thriving area of research in the pharmaceutical industry as a way to accelerate the drug discovery process. Deep learning, a subfield of artificial intelligence, is widely applied to open up new opportunities in drug development and design. Drawing on prior knowledge from the literature, this article reviews recent technology that uses deep learning techniques to improve the understanding of drug-target interactions in computer-aided drug discovery. In general, in structure-based drug discovery, deep learning models can be trained to predict the binding affinity of protein-ligand complexes from protein structures or to generate protein-ligand complexes. In particular, artificial neural networks and deep learning algorithms, especially graph convolutional neural networks and generative adversarial networks, can be applied to drug discovery. Graph convolutional neural networks effectively capture the interactions and structural information between atoms and molecules, which can be used to predict the binding affinity between protein and ligand. In addition, ligand molecules with desired properties can be generated using generative adversarial networks.
-
-
-
Prediction of Drug Pathway-based Disease Classes using Multiple Properties of Drugs
Authors: Lei Chen and Linyang Li
Background: Drug repositioning is now an important research area in drug discovery, as it can accelerate the discovery of novel effects of existing drugs. However, it is challenging to screen out possible effects for given drugs. Designing computational methods is a quick and cheap way to complete this task. Most existing computational methods infer the relationships between drugs and diseases. The pathway-based disease classification reported in KEGG provides a new way to investigate drug repositioning, as this classification can be applied to drugs. A predicted class of a given drug suggests latent diseases it can treat. Objective: The purpose of this study is to set up efficient multi-label classifiers to predict the classes of drugs. Methods: We adopt three types of drug information to generate drug features: drug pathway information, label information, and a drug network. For the first two types, drugs are first encoded into binary vectors, which are further processed by singular value decomposition. For the third type, the network embedding algorithm Mashup is employed to yield drug features. These features are combined and fed into RAndom k-labELsets (RAKEL) to construct multi-label classifiers, with support vector machine selected as the base classification algorithm. Results: The ten-fold cross-validation results show that the classifiers provide high performance, with accuracy higher than 0.95 and absolute true higher than 0.92. The case study indicates novel effects of three drugs, i.e., they may treat new diseases. Conclusion: The proposed classifiers have high performance and are superior to classifiers built with other classic algorithms and drug information. Furthermore, they are able to discover new effects of drugs.
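The abstract reports two multi-label measures, accuracy and absolute true, without defining them. A minimal sketch follows, assuming the definitions commonly used in multi-label classification: accuracy as the mean ratio of intersection to union between the true and predicted label sets, and absolute true as the fraction of samples whose predicted set matches the true set exactly. The drug classes and predictions below are hypothetical illustration values.

```python
def multilabel_accuracy(true_sets, pred_sets):
    """Mean |true & pred| / |true | pred| over samples (assumed definition)."""
    total = 0.0
    for true, pred in zip(true_sets, pred_sets):
        union = true | pred
        # Two empty label sets are treated as a perfect match.
        total += len(true & pred) / len(union) if union else 1.0
    return total / len(true_sets)


def absolute_true(true_sets, pred_sets):
    """Fraction of samples whose predicted label set equals the true set."""
    matches = sum(true == pred for true, pred in zip(true_sets, pred_sets))
    return matches / len(true_sets)


# Hypothetical class labels for four drugs.
true_sets = [{"A"}, {"A", "B"}, {"C"}, {"B", "C"}]
pred_sets = [{"A"}, {"A"}, {"C"}, {"B", "C"}]

print(multilabel_accuracy(true_sets, pred_sets))  # (1 + 0.5 + 1 + 1) / 4
print(absolute_true(true_sets, pred_sets))        # 3 exact matches out of 4
```

Absolute true is the stricter of the two, since a prediction that misses one of several classes counts as fully wrong, which is consistent with it being the lower of the two figures reported above.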
-
-
-
P4PC: A Portal for Bioinformatics Resources of piRNAs and circRNAs
Authors: Yajun Liu, Ru Li, Yulian Ding, Xinhong Hei and Fang-Xiang Wu
Background: PIWI-interacting RNAs (piRNAs) and circular RNAs (circRNAs) are two kinds of non-coding RNAs (ncRNAs) that play important roles in the epigenetic, transcriptional, and post-transcriptional regulation of many biological processes. Although various resources exist, it is still challenging to select suitable ones for specific research projects on ncRNAs. Methods: To help researchers find appropriate bioinformatics resources for studying ncRNAs, we created a novel portal named P4PC that provides computational tools and data sources for piRNAs and circRNAs. Results: 249 computational tools, 126 databases and 420 papers are manually curated in P4PC. All entries in P4PC are classified into 5 groups and 26 subgroups. The list of resources is summarized on the first page of each group. Conclusion: According to their research purposes, users can quickly select proper resources for their research projects by viewing the detailed information and comments in P4PC. The database is available at http://www.ibiomedical.net/Portal4PC/ and https://43.138.46.5/Portal4PC/.
-
-
-
MSSD: An Efficient Method for Constructing Accurate and Stable Phylogenetic Networks by Merging Subtrees of Equal Depth
Authors: Jiajie Xing, Xu Song, Meiju Yu, Juan Wang and Jing Yu
Background: Systematic phylogenetic networks are essential for studying the evolutionary relationships and diversity among species. These networks are particularly important for capturing non-tree-like processes resulting from reticulate evolutionary events. However, existing methods for constructing phylogenetic networks are influenced by the order of inputs, and different orders can lead to inconsistent experimental results. Moreover, constructing a network for large datasets is time-consuming, and the network often does not include all of the input tree nodes. Aims: This paper proposes a novel method, called MSSD, which constructs a phylogenetic network from gene trees by Merging Subtrees with the Same Depth in a bottom-up way. Methods: MSSD first decomposes trees into subtrees based on depth. It then merges subtrees with the same depth, from depth 0 to the maximum depth. For all subtrees of one depth, it inserts each subtree into the current network by means of identical subtrees. Results: We test MSSD on simulated data and real data. The experimental results show that the networks constructed by MSSD can represent all input trees and that MSSD is more stable than other methods. MSSD also constructs networks faster, and the constructed networks preserve more of the information in the input trees than those of other methods. Conclusion: MSSD is a powerful tool for studying the evolutionary relationships among species in biology and is freely available at https://github.com/xingjiajie2023/MSSD.
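The first step described in this abstract, decomposing trees into subtrees grouped by depth, can be illustrated with a short sketch. This is not the MSSD implementation: it assumes that a leaf has depth 0 and that an internal node's depth is one more than the deepest of its children, which is one plausible reading of the bottom-up order, and it uses a hypothetical nested-tuple tree encoding. The merging of subtrees into a network is not shown.

```python
from collections import defaultdict


def group_subtrees_by_depth(tree):
    """Group every subtree of `tree` by its depth.

    A tree is a leaf label (str) or a tuple of child trees (assumed encoding).
    Depth is assumed to be 0 for a leaf and 1 + max child depth otherwise.
    """
    groups = defaultdict(list)

    def visit(node):
        if isinstance(node, str):
            depth = 0
        else:
            depth = 1 + max(visit(child) for child in node)
        groups[depth].append(node)
        return depth

    visit(tree)
    return dict(groups)


# Hypothetical gene tree with five taxa.
tree = (("a", "b"), ("c", ("d", "e")))
groups = group_subtrees_by_depth(tree)
for depth in sorted(groups):
    print(depth, groups[depth])
```

Processing the groups in increasing depth order means that every subtree is handled only after all of its own subtrees, which matches the bottom-up order the abstract describes.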