Current Bioinformatics - Volume 19, Issue 7, 2024

- Metabolomics: Recent Advances and Future Prospects Unveiled
  
  Authors: Shweta Sharma, Garima Singh and Mymoona Akhter
  
  https://doi.org/10.2174/0115748936270744231115110329
  
  In the era of genomics, fueled by advanced technologies and analytical tools, metabolomics has become a vital component in biomedical research. Its significance spans various domains, encompassing biomarker identification, uncovering underlying mechanisms and pathways, as well as the exploration of new drug targets and precision medicine. This article presents a comprehensive overview of the latest developments in metabolomics techniques, emphasizing their wide-ranging applications across diverse research fields and underscoring their immense potential for future advancements.

- An Explainable Multichannel Model for COVID-19 Time Series Prediction
  
  Authors: Hongjian He, Jiang Xie, Xinwei Lu, Dingkai Huang and Wenjun Zhang
  
  https://doi.org/10.2174/1574893618666230727160507
  
  Introduction: The COVID-19 pandemic has affected every country and changed people's lives. Accurate prediction of COVID-19 trends can help prevent further spread of the outbreak. However, changes in the environment affect COVID-19 prediction performance, and previous models are limited in practical applications. Methods: STE-COVIDNet, an explainable multichannel deep learning model for time series prediction with spatial, temporal, and environmental channels, was proposed. Time series data on COVID-19 infections, weather, in-state population mobility, and vaccination in the USA were collected from May 2020 to October 2021. In the environmental channel of STE-COVIDNet, an attention mechanism was applied to extract significant environmental factors related to the spread of COVID-19, and the attention weights of these factors were compared with the actual situation. Results: STE-COVIDNet outperformed other advanced models for predicting COVID-19 infection cases. The attention-weight analysis was consistent with existing studies and reports. The influence of the same environmental factors on the spread of COVID-19 was found to vary across time and region, which explains why previous studies on the relationship between the environment and COVID-19 reached different findings in different regions and periods. Conclusion: STE-COVIDNet is an explainable model that can adapt to environmental changes and thus improve predictive performance.
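
The abstract does not describe how STE-COVIDNet computes its attention weights. As a hedged illustration of the general idea, the Python sketch below scores each environmental factor's feature vector against a query vector with scaled dot-product attention and normalizes the scores with a softmax, so the resulting weights sum to 1 and can be ranked. The factor names, embeddings, and query vector are hypothetical; in a trained model they would be learned, and this is not the authors' architecture.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(factors: Dict[str, List[float]],
           query: List[float]) -> Tuple[Dict[str, float], List[float]]:
    """Scaled dot-product attention over named factors (hypothetical
    embeddings). Returns per-factor weights and the weighted context vector."""
    names = list(factors)
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, factors[n])) / scale for n in names]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * factors[n][i] for w, n in zip(weights, names)) for i in range(dim)]
    return dict(zip(names, weights)), context


if __name__ == "__main__":
    # Hypothetical 3-dimensional embeddings for three environmental factors.
    factors = {
        "temperature": [0.9, 0.1, 0.0],
        "mobility": [0.2, 0.8, 0.1],
        "vaccination": [0.0, 0.3, 0.9],
    }
    weights, context = attend(factors, query=[1.0, 0.5, 0.0])
    for name, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {w:.3f}")
```

Because the weights form a probability distribution over factors, comparing them across regions or time windows is one way such a model's attention could be checked against reported environmental effects.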

- DeepEpi: Deep Learning Model for Predicting Gene Expression Regulation Based on Epigenetic Histone Modifications
  
  Authors: Rania Hamdy, Yasser Omar and Fahima Maghraby
  
  https://doi.org/10.2174/1574893618666230818121046
  
  Background: Histone modification is a vital element of gene expression regulation. The way in which these proteins bind to DNA affects whether a gene can be expressed. Although these modifications do not alter the DNA sequence itself, they can influence how it is transcribed. Objective: Each spatial location in DNA has its own function, so the spatial arrangement of chromatin modifications affects how a gene is expressed. Gene regulation also depends on which combinations of histone modifications are present on a gene, on the spatial distribution pattern of these modifications, and on the length of the gene region over which they are read. This study therefore aims to model long-range spatial genomic data and the complex dependencies among histone modification reads. Methods: A convolutional neural network (CNN) is used to model all data features; it can detect patterns in histone signals while preserving their spatial information. Long short-term memory (LSTM) networks, in vanilla, bidirectional, or stacked form, are used to retain long-range histone signals. The study also combines these methods, either through a ConvLSTM or by using them together with a self-attention mechanism. Results: The combination of CNN and LSTM with self-attention achieved an area under the curve (AUC) of 88.87% across 56 cell types. Conclusion: This result outperforms the current state-of-the-art model and provides insight into how combinatorial interactions between histone modification marks can control gene expression. The source code is available at https://github.com/RaniaHamdy/DeepEpi.
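
A minimal sketch of the CNN idea in the Methods, assuming the input for one gene is a matrix of genomic bins (rows) by histone marks (columns): one convolutional filter slides along the bins, a ReLU keeps positive pattern matches, and non-overlapping max pooling shrinks the map while keeping coarse positions. The bin layout, kernel values, and helper names are hypothetical; the DeepEpi implementation itself is in the linked repository.

```python
from typing import List

Matrix = List[List[float]]  # rows = genomic bins, columns = histone marks


def conv1d(signal: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Valid 1D convolution along the bin axis (cross-correlation, as in most
    deep learning libraries). The kernel spans all marks, so each output
    value scores one window of consecutive bins; ReLU is applied."""
    width = len(kernel)
    out = []
    for start in range(len(signal) - width + 1):
        acc = bias
        for k in range(width):
            acc += sum(s * w for s, w in zip(signal[start + k], kernel[k]))
        out.append(max(acc, 0.0))  # ReLU
    return out


def max_pool(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]
```

In the toy call `conv1d([[0, 0], [1, 0], [1, 1], [0, 0]], [[1, 0], [1, 0]])`, which returns `[1.0, 2.0, 1.0]`, the filter responds most strongly where the first mark is present in two consecutive bins, and the output position records where that happens.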

- Toxicity Prediction for Immune Thrombocytopenia Caused by Drugs Based on Logistic Regression with Feature Importance
  
  Authors: Osphanie Mentari, Muhammad Shujaat, Hilal Tayara and Kil T. Chong
  
  https://doi.org/10.2174/0115748936269606231001140647
  
  Background: Toxicity prediction is one of the problems in drug discovery that artificial intelligence can help solve. In drug-induced immune thrombocytopenia, toxicity can appear five to ten days after drug exposure as significant bleeding caused by drug-dependent antibodies. In clinical trials, when this condition occurs, all drugs the patient is taking should be stopped, although this is sometimes not possible, especially for older patients who depend on their medication. Being able to predict toxicity in drug-induced immune thrombocytopenia is therefore very important. Computational techniques such as machine learning can predict toxicity better than empirical techniques, owing to their lower cost and faster processing. Objective: Previous studies used the k-nearest neighbors (KNN) method, whose performance still needs to be improved. This study proposes logistic regression to improve accuracy. Methods: We present a new machine learning model for drug-induced immune thrombocytopenia. The model extracts several features from Simplified Molecular Input Line Entry System (SMILES) representations. These features are fused and cleaned, and the most important ones are selected with the SelectKBest method. The model uses logistic regression optimized and tuned by grid search cross-validation. Results: The highest accuracy, 80%, was obtained with the combination of PADEL, CDK, RDKIT, MORDRED, and BLUEDESC features. Conclusion: Our proposed model outperforms previous studies in accuracy. The information and source code are available on GitHub: https://github.com/Osphanie/Thrombocytopenia
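
The Methods pair SelectKBest feature selection with a grid-searched logistic regression. The pure-Python sketch below assumes SelectKBest's default ANOVA F-test scoring (the statistic scikit-learn's `f_classif` computes) and fits the logistic regression by batch gradient descent. The learning rate, epoch count, and L2 strength are illustrative defaults, not the authors' tuned values, the helper names are hypothetical, and descriptor extraction from SMILES is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def f_scores(X: Matrix, y: Sequence[int]) -> List[float]:
    """One-way ANOVA F statistic of each feature against a binary label."""
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        groups = [[v for v, t in zip(col, y) if t == c] for c in (0, 1)]
        grand = sum(col) / n
        means = [sum(g) / len(g) for g in groups]
        ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
        ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
        if ssw == 0:
            scores.append(math.inf if ssb > 0 else 0.0)
        else:
            # df_between = 1 (two classes), df_within = n - 2
            scores.append(ssb / (ssw / (n - 2)))
    return scores


def select_k_best(scores: List[float], k: int) -> List[int]:
    """Indices of the k highest-scoring features, in original column order."""
    return sorted(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])


def sigmoid(z: float) -> float:
    # Split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: Matrix, y: Sequence[int], lr: float = 0.1,
                 epochs: int = 500, l2: float = 0.0) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss plus (l2 / 2) * ||w||^2."""
    n, d = len(y), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for row, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - t
            for i, xi in enumerate(row):
                gw[i] += err * xi
            gb += err
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(w: List[float], b: float, X: Matrix) -> List[int]:
    """Class 1 when the predicted probability is at least 0.5."""
    return [int(sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5) for row in X]
```

A grid search in the spirit of GridSearchCV would loop over candidate values of `k`, `lr`, and `l2`, score each combination by cross-validated accuracy, and keep the best. With scikit-learn available, the same pipeline would typically be assembled from `SelectKBest`, `LogisticRegression`, `Pipeline`, and `GridSearchCV`.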

- Prediction of Super-enhancers Based on Mean-shift Undersampling
  
  Authors: Han Cheng, Shumei Ding and Cangzhi Jia
  
  https://doi.org/10.2174/0115748936268302231110111456
  
  Background: Super-enhancers are clusters of enhancers defined by the binding occupancy of master transcription factors, chromatin regulators, or chromatin marks. Super-enhancers have been reported to be transcriptionally more active and more cell-type-specific than regular enhancers, so it is necessary to distinguish super-enhancers from regular enhancers. A variety of computational methods have been proposed as auxiliary tools for identifying super-enhancers. However, most of them rely on ChIP-seq data, and when such data are unavailable the predictor either cannot run or fails to achieve satisfactory performance. Objective: This study aims to propose a stacking computational model based on the fusion of multiple features to identify super-enhancers in both human and mouse. Methods: Mean-shift was used to cluster the majority-class samples, and four balanced datasets for mouse and three for human were selected to train the stacking model. Five types of sequence information are used as input to the XGBoost classifier, and the average of the probability outputs from each classifier is taken as the final classification result. Results: Ten-fold cross-validation and cross-cell-line validation show that our method outperforms other existing methods. The source code and datasets are available at https://github.com/Cheng-Han-max/SE_voting. Conclusion: Feature importance analysis indicates that Mismatch accounts for the highest proportion of the top 20 most important features.
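
The abstract states that mean-shift clusters the majority-class samples, several balanced training sets are drawn, and the classifiers' probability outputs are averaged. It does not say how samples are drawn from the clusters, so the sketch below assumes a flat kernel, a mode-merging threshold of half the bandwidth, and a round-robin draw across clusters. All helper names are hypothetical, and the XGBoost training step is omitted.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = List[float]


def mean_shift(points: List[Vector], bandwidth: float,
               iters: int = 100, tol: float = 1e-6) -> List[int]:
    """Flat-kernel mean-shift: move each point to the mean of its neighbours
    within `bandwidth` until it stops moving, then group nearby modes."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(iters):
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            if not window:
                break
            new = [sum(c) / len(window) for c in zip(*window)]
            shift = math.dist(new, x)
            x = new
            if shift < tol:
                break
        modes.append(x)
    centers: List[Vector] = []
    labels = []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:  # assumed merge threshold
                labels.append(idx)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def balanced_subsets(labels: List[int], n_minority: int,
                     n_subsets: int, seed: int = 0) -> List[List[int]]:
    """Hypothetical scheme: each subset draws n_minority majority-class
    indices round-robin across clusters, so every cluster is represented.
    Each subset would then be paired with all minority samples."""
    rng = random.Random(seed)
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    subsets = []
    for _ in range(n_subsets):
        pools = [rng.sample(members, len(members)) for members in clusters.values()]
        chosen: List[int] = []
        while len(chosen) < n_minority and any(pools):
            for pool in pools:
                if pool and len(chosen) < n_minority:
                    chosen.append(pool.pop())
        subsets.append(sorted(chosen))
    return subsets


def soft_vote(prob_lists: List[Sequence[float]]) -> List[float]:
    """Average each sample's positive-class probability over all classifiers."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]
```

Drawing round-robin across clusters keeps small majority-class clusters from being dropped by chance, which is the usual motivation for clustering before undersampling instead of sampling uniformly.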

- Discovering Microbe-disease Associations with Weighted Graph Convolution Networks and Taxonomy Common Tree
  
  Authors: Jieqi Xing, Yu Shi, Xiaoquan Su and Shunyao Wu
  
  https://doi.org/10.2174/0115748936270441231116093650
  
  Background: Microbe-disease associations are integral to understanding complex diseases and their screening procedures. Objective: Although numerous computational methods have been developed to detect these associations, their performance remains limited because they make inadequate use of weighted inherent similarities and the microbial taxonomic hierarchy. To address this limitation, we introduce WTHMDA (weighted taxonomic heterogeneous network-based microbe-disease association), a novel deep learning framework. Methods: WTHMDA combines a weighted graph convolution network with the microbial taxonomy common tree to predict microbe-disease associations. The framework extracts multiple microbe similarities from the taxonomy common tree, which are used to construct a microbe-disease heterogeneous interaction network. A weighted DeepWalk algorithm produces node embeddings that incorporate the similarity weights. A deep neural network (DNN) model then accurately predicts microbe-disease associations from this interaction network. Results: Extensive experiments on multiple datasets and case studies show that WTHMDA outperforms existing approaches, particularly in predicting unknown associations. Conclusion: Our proposed method offers a new strategy for discovering microbe-disease links, with strong performance that makes identifying disease risk more feasible.
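
Weighted DeepWalk differs from standard DeepWalk in how the next node of each random walk is chosen. A minimal sketch, assuming the heterogeneous network is stored as an adjacency map of similarity weights, picks each step with probability proportional to edge weight. The node names and weights are hypothetical, and the skip-gram step that turns walks into embeddings is not shown.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def weighted_walk(graph: Graph, start: str, length: int,
                  rng: random.Random) -> List[str]:
    """Random walk whose next node is drawn with probability proportional
    to the edge weight; stops early at a node with no neighbours."""
    walk = [start]
    while len(walk) < length:
        nbrs = graph.get(walk[-1], {})
        if not nbrs:
            break
        nodes = list(nbrs)
        walk.append(rng.choices(nodes, weights=[nbrs[n] for n in nodes], k=1)[0])
    return walk


def generate_walks(graph: Graph, walks_per_node: int, length: int,
                   seed: int = 0) -> List[List[str]]:
    """Start `walks_per_node` walks from every node, shuffling start order
    on each pass as DeepWalk does."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        walks.extend(weighted_walk(graph, n, length, rng) for n in nodes)
    return walks


if __name__ == "__main__":
    # Hypothetical microbe-microbe and microbe-disease weights.
    g = {
        "m1": {"m2": 0.9, "d1": 0.1},
        "m2": {"m1": 0.9, "d1": 0.5},
        "d1": {"m1": 0.1, "m2": 0.5},
    }
    for walk in generate_walks(g, walks_per_node=1, length=5):
        print(" -> ".join(walk))
```

Standard DeepWalk would pick uniformly among neighbours instead; the weighted version lets strongly similar microbes co-occur more often in walks and therefore end up closer in embedding space.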

- Stacking-Kcr: A Stacking Model for Predicting the Crotonylation Sites of Lysine by Fusing Serial and Automatic Encoder
  
  Authors: Ying Liang, Suhui Li, Xiya You, You Guo and Jianjun Tang
  
  https://doi.org/10.2174/0115748936272040231117114252
  
  Background: Protein lysine crotonylation (Kcr), a recently discovered and important post-translational modification (PTM), is typically localized at transcription start sites and regulates gene expression; it is associated with a variety of pathological conditions, such as developmental defects and malignant transformation. Objective: Identifying Kcr sites helps uncover its biological mechanisms and develop new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, which calls for new computational techniques. Methods: To identify Kcr sites accurately, we propose an ensemble learning model called Stacking-Kcr. First, features are extracted from sequence information, physicochemical properties, and sequence fragment similarity. Next, the sequence-information and physicochemical-property features are fused in two ways: with an autoencoder and by serial concatenation. The two fused feature sets and the sequence fragment similarity features are then fed into four base classifiers, a meta-classifier is built on the first-level predictions, and it produces the final predictions. Results: Under five-fold cross-validation, the model achieved an accuracy of 0.828 and an AUC of 0.910, a clear advantage over traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which are 1.7% and 0.8% higher, respectively, than other state-of-the-art tools. Additionally, when trained on phosphorylation sites, Stacking-Kcr outperformed the current model. Conclusion: These outcomes are further evidence of Stacking-Kcr's strong application potential and generalization performance.
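
The abstract describes a two-level stacking scheme: base classifiers produce first-level predictions and a meta-classifier combines them. It does not name the four base classifiers or the meta-classifier, so the sketch below uses a toy nearest-centroid model for both and builds the first-level predictions out-of-fold, a standard way to keep the meta-classifier from training on predictions the base models made for their own training samples. The feature views, fold scheme, and helper names are assumptions, not the authors' code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]  # returns P(class = 1)
Learner = Callable[[List[Vector], Sequence[int]], Model]


def centroid_learner(X: List[Vector], y: Sequence[int]) -> Model:
    """Toy stand-in classifier: P(1) grows as x gets closer to the
    class-1 centroid than to the class-0 centroid."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Vector) -> float:
        d0, d1 = math.dist(x, cents[0]), math.dist(x, cents[1])
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5

    return predict


def out_of_fold(learners: List[Learner], views: List[List[Vector]],
                y: Sequence[int], k: int) -> List[Vector]:
    """First-level predictions: entry [i][j] comes from base learner j
    trained only on the folds that exclude sample i (fold = i % k)."""
    n = len(y)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for j, (learn, X) in enumerate(zip(learners, views)):
            model = learn([X[i] for i in train], [y[i] for i in train])
            for i in range(fold, n, k):
                meta[i][j] = model(X[i])
    return meta


def fit_stacking(learners: List[Learner], views: List[List[Vector]],
                 y: Sequence[int], meta_learner: Learner,
                 k: int = 4) -> Callable[[List[Vector]], float]:
    """Train the meta-classifier on out-of-fold predictions, then refit
    every base learner on all data for use at prediction time."""
    meta_model = meta_learner(out_of_fold(learners, views, y, k), y)
    bases = [learn(X, y) for learn, X in zip(learners, views)]

    def predict(sample_views: List[Vector]) -> float:
        return meta_model([b(v) for b, v in zip(bases, sample_views)])

    return predict
```

Each entry of `views` stands for one feature set (for example, one of the fused feature sets), so every base learner sees its own representation of the same samples.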

Most Cited

- A Review of Ensemble Methods in Bioinformatics
  
  Authors: Pengyi Yang, Yee Hwa Yang, Bing B. Zhou and Albert Y. Zomaya
- Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
  
  Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
- Distance-based Support Vector Machine to Predict DNA N6-methyladenine Modification
  
  Authors: Haoyu Zhang, Quan Zou, Ying Ju, Chenggang Song and Dong Chen
- A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
  
  Authors: Jun Zhang and Bin Liu
- Molecular Genetic Markers: Discovery, Applications, Data Storage and Visualisation
  
  Authors: Chris Duran, Nikki Appleby, David Edwards and Jacqueline Batley
- A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
  
  Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
- Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
  
  Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
- Relevance of Molecular Docking Studies in Drug Designing
  
  Authors: Ritu Jakhar, Mehak Dangi, Alka Khichi and Anil K. Chhillar
- The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
  
  Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
- Gene Expression Profile Classification: A Review
  
  Authors: Musa H. Asyali, Dilek Colak, Omer Demirkaya and Mehmet S. Inan