Current Bioinformatics - Volume 19, Issue 7, 2024
Metabolomics: Recent Advances and Future Prospects Unveiled
Authors: Shweta Sharma, Garima Singh and Mymoona Akhter
In the era of genomics, fueled by advanced technologies and analytical tools, metabolomics has become a vital component in biomedical research. Its significance spans various domains, encompassing biomarker identification, uncovering underlying mechanisms and pathways, as well as the exploration of new drug targets and precision medicine. This article presents a comprehensive overview of the latest developments in metabolomics techniques, emphasizing their wide-ranging applications across diverse research fields and underscoring their immense potential for future advancements.
An Explainable Multichannel Model for COVID-19 Time Series Prediction
Authors: Hongjian He, Jiang Xie, Xinwei Lu, Dingkai Huang and Wenjun Zhang
Introduction: The COVID-19 pandemic has affected every country and changed people's lives. Accurate prediction of COVID-19 trends can help prevent the further spread of the outbreak. However, the changing environment affects COVID-19 prediction performance, and previous models are limited in practical applications. Methods: STE-COVIDNet, an explainable multichannel deep learning model with spatial, temporal and environmental channels for time series prediction, was proposed. Time series data on COVID-19 infection, weather, in-state population mobility, and vaccination were collected from May 2020 to October 2021 in the USA. In the environmental channel of STE-COVIDNet, an attention mechanism was applied to extract significant environmental factors related to the spread of COVID-19. In addition, the attention weights of these factors were compared with the actual situation. Results: STE-COVIDNet was found to be superior to other advanced prediction models of COVID-19 infection cases. The attention weight analysis was consistent with existing studies and reports. The influence of the same environmental factors on the spread of COVID-19 was found to vary across time and region, which explains why findings of previous studies on the relationship between the environment and COVID-19 vary by region and time. Conclusion: STE-COVIDNet is an explainable model that can adapt to environmental changes and thus improve predictive performance.
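The role of attention weights in the environmental channel can be pictured with a minimal sketch. This is not the STE-COVIDNet implementation: the factor names and raw scores below are hypothetical, and a plain softmax is assumed as a stand-in for the model's scoring function.

```python
import math

def attention_weights(scores):
    """Turn raw relevance scores into weights that sum to 1 (softmax)."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Hypothetical raw scores for one region and one time window.
raw_scores = {"temperature": 0.2, "mobility": 1.5, "vaccination": 0.9}
weights = attention_weights(raw_scores)

# Rank factors from most to least heavily weighted for this window.
ranking = sorted(weights, key=weights.get, reverse=True)
print(ranking)
```

Recomputing the weights per region and per time window is what would let the same factor rank high in one setting and low in another.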
DeepEpi: Deep Learning Model for Predicting Gene Expression Regulation Based on Epigenetic Histone Modifications
Authors: Rania Hamdy, Yasser Omar and Fahima Maghraby
Background: Histone modification is a vital element in gene expression regulation. The way in which these proteins bind to the DNA affects whether or not a gene may be expressed. Although these factors do not change how DNA is constructed, they can influence how it is transcribed. Objective: Each spatial location in DNA has its own function, so the spatial arrangement of chromatin modifications affects how a gene is expressed. Gene regulation is also affected by the combination of histone modifications present on the gene, and it depends on the spatial distribution of these modifications and on how far their reads extend across a gene region. This study therefore aims to model long-range spatial genome data and the complex dependencies among histone reads. Methods: A Convolutional Neural Network (CNN) is used to model all data features in this paper. It can detect patterns in histone signals and preserve the spatial information of these patterns. The study also uses the concept of memory in long short-term memory (LSTM) networks, with vanilla LSTM, bidirectional LSTM, or stacked LSTM, to preserve long-range histone signals. Additionally, it combines these methods using ConvLSTM, or uses them together with the aid of a self-attention mechanism. Results: The combination of CNN and LSTM with the self-attention mechanism obtained an Area Under the Curve (AUC) score of 88.87% over 56 cell types. Conclusion: This result outperforms the present state-of-the-art model and provides insight into how combinatorial interactions between histone modification marks can control gene expression. The source code is available at https://github.com/RaniaHamdy/DeepEpi.
Toxicity Prediction for Immune Thrombocytopenia Caused by Drugs Based on Logistic Regression with Feature Importance
Authors: Osphanie Mentari, Muhammad Shujaat, Hilal Tayara and Kil T. Chong
Background: One of the problems in drug discovery that can be solved by artificial intelligence is toxicity prediction. In drug-induced immune thrombocytopenia, toxicity can appear in patients after five to ten days as significant bleeding caused by drug-dependent antibodies. In clinical trials, when this condition occurs, all the drugs consumed by patients should be stopped, although sometimes this is not possible, especially for older patients who are dependent on their medication. Therefore, being able to predict toxicity in drug-induced immune thrombocytopenia is very important. Computational technologies, such as machine learning, can help predict toxicity better than empirical techniques owing to their lower cost and faster processing. Objective: Previous studies used the KNN method. However, the performance of these approaches needs to be enhanced. This study proposes a logistic regression model to improve accuracy scores. Methods: In this study, we present a new model for drug-induced immune thrombocytopenia using a machine learning method. Our model extracts several features from the Simplified Molecular Input Line Entry System (SMILES). These features were fused and cleaned, and the important features were selected using the SelectKBest method. The model uses a logistic regression classifier that is optimized and tuned by grid search cross-validation. Results: The highest accuracy, 80%, was obtained when combining features from PADEL, CDK, RDKIT, MORDRED, and BLUEDESC. Conclusion: Our proposed model outperforms previous studies in terms of accuracy. The information and source code are accessible online on GitHub: https://github.com/Osphanie/Thrombocytopenia
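The combination of SelectKBest feature selection with a grid-searched logistic regression can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, and the scaler, the `f_classif` scoring function, the parameter grid, and the five-fold setting are choices made here for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a fused molecular descriptor matrix (assumption).
X, y = make_classification(
    n_samples=200, n_features=50, n_informative=8, random_state=0
)

# Selection sits inside the pipeline so it is refit on each training fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: number of kept features and regularization strength.
param_grid = {
    "select__k": [5, 10, 20],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Keeping the selector inside the pipeline avoids leaking information from the validation folds into the feature ranking.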
Prediction of Super-enhancers Based on Mean-shift Undersampling
Authors: Han Cheng, Shumei Ding and Cangzhi Jia
Background: Super-enhancers are clusters of enhancers defined based on the binding occupancy of master transcription factors, chromatin regulators, or chromatin marks. It has been reported that super-enhancers are transcriptionally more active and more cell-type-specific than regular enhancers. Therefore, it is necessary to distinguish super-enhancers from regular enhancers. A variety of computational methods have been proposed as auxiliary tools to identify super-enhancers. However, most methods use ChIP-seq data, and when these data are missing the predictor cannot run or fails to achieve satisfactory performance. Objective: The aim of this study is to propose a stacking computational model based on the fusion of multiple features to identify super-enhancers in both human and mouse species. Methods: This work adopted mean-shift to cluster majority class samples and selected four sets of balanced datasets for mouse and three sets of balanced datasets for human to train the stacking model. Five types of sequence information are used as input to the XGBoost classifier, and the average of the probability outputs from each classifier is taken as the final classification result. Results: The results of 10-fold cross-validation and cross-cell-line validation show that our method has superior performance compared to other existing methods. The source code and datasets are available at https://github.com/Cheng-Han-max/SE_voting. Conclusion: The analysis of feature importance indicates that Mismatch accounts for the highest proportion among the top 20 important features.
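The final step, averaging the probability outputs of several classifiers, can be sketched in a few lines. This covers only the averaging, not the mean-shift undersampling or the XGBoost training; the probabilities, the number of models, and the 0.5 threshold are hypothetical values chosen for illustration.

```python
def average_probabilities(per_model_probs):
    """Average positive-class probabilities across models, sample by sample."""
    n_models = len(per_model_probs)
    return [sum(column) / n_models for column in zip(*per_model_probs)]

def classify(probabilities, threshold=0.5):
    """Label a sample 1 when its averaged probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical outputs of three classifiers for four candidate regions.
model_outputs = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.7, 0.5],
    [0.7, 0.3, 0.5, 0.3],
]
averaged = average_probabilities(model_outputs)
labels = classify(averaged)
print(labels)
```

Averaging lets each classifier, trained on a different balanced subset, contribute equally to the final call.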
Discovering Microbe-disease Associations with Weighted Graph Convolution Networks and Taxonomy Common Tree
Authors: Jieqi Xing, Yu Shi, Xiaoquan Su and Shunyao Wu
Background: Microbe-disease associations are integral to understanding complex diseases and their screening procedures. Objective: While numerous computational methods have been developed to detect these associations, their performance remains limited due to inadequate utilization of weighted inherent similarities and microbial taxonomy hierarchy. To address this limitation, we have introduced WTHMDA (weighted taxonomic heterogeneous network-based microbe-disease association), a novel deep learning framework. Methods: WTHMDA combines a weighted graph convolution network and the microbial taxonomy common tree to predict microbe-disease associations effectively. The framework extracts multiple microbe similarities from the taxonomy common tree, facilitating the construction of a microbe-disease heterogeneous interaction network. Utilizing a weighted DeepWalk algorithm, node embeddings in the network incorporate weight information from the similarities. Subsequently, a deep neural network (DNN) model accurately predicts microbe-disease associations based on this interaction network. Results: Extensive experiments on multiple datasets and case studies demonstrate WTHMDA's superiority over existing approaches, particularly in predicting unknown associations. Conclusion: Our proposed method offers a new strategy for discovering microbe-disease linkages, showcasing remarkable performance and enhancing the feasibility of identifying disease risk.
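The walk-generation step of a weighted DeepWalk can be sketched as a random walk whose transitions are proportional to edge weights. This is a minimal illustration, not the WTHMDA code: the toy network, node names, weights, walk length, and number of walks per node are all hypothetical.

```python
import random

def weighted_walk(graph, start, length, rng):
    """Walk up to `length` nodes, picking each next node in proportion to edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1], {})
        if not neighbors:
            break  # dead end: stop early
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# Hypothetical toy network: two microbes and two diseases with similarity weights.
graph = {
    "microbe_A": {"disease_X": 0.9, "microbe_B": 0.1},
    "microbe_B": {"microbe_A": 0.1, "disease_Y": 0.9},
    "disease_X": {"microbe_A": 1.0},
    "disease_Y": {"microbe_B": 1.0},
}
rng = random.Random(42)  # fixed seed so the walks are reproducible
walks = [weighted_walk(graph, node, 6, rng) for node in graph for _ in range(3)]
print(walks[0])
```

In DeepWalk-style methods, walks like these are treated as sentences and passed to a skip-gram model to learn node embeddings, so heavier edges shape which nodes end up close together.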
Stacking-Kcr: A Stacking Model for Predicting the Crotonylation Sites of Lysine by Fusing Serial and Automatic Encoder
Authors: Ying Liang, Suhui Li, Xiya You, You Guo and Jianjun Tang
Background: Protein lysine crotonylation (Kcr), a newly discovered and important post-translational modification (PTM), is typically localized at the transcription start site and regulates gene expression. It is associated with a variety of pathological conditions such as developmental defects and malignant transformation. Objective: Identifying Kcr sites is advantageous for the discovery of its biological mechanism and the development of new drugs for related diseases. However, traditional experimental methods for identifying Kcr sites are expensive and inefficient, necessitating the development of new computational techniques. Methods: In this work, to accurately identify Kcr sites, we propose an ensemble learning model called Stacking-Kcr. First, features are extracted from sequence information, physicochemical properties, and sequence fragment similarity. Then, the sequence information and physicochemical property features are fused using an automatic encoder and serial fusion, respectively. Finally, the two fused features and the sequence fragment similarity features are each input into the four base classifiers, a meta classifier is constructed using the first-level prediction results, and the final predictions are obtained. Results: In five-fold cross-validation, the model achieved an accuracy of 0.828 and an AUC of 0.910, showing that the Stacking-Kcr method has clear advantages over traditional machine learning methods. On independent test sets, Stacking-Kcr achieved an accuracy of 84.89% and an AUC of 92.21%, which were 1.7% and 0.8% higher, respectively, than those of other state-of-the-art tools. Additionally, we trained Stacking-Kcr on phosphorylation sites, and the result is superior to the current model. Conclusion: These outcomes are additional evidence that Stacking-Kcr has strong application potential and generalization performance.